* [PATCH v2 1/7] genirq/msi: Store the IOMMU IOVA directly in msi_desc instead of iommu_cookie
2025-02-20 1:31 [PATCH v2 0/7] iommu: Add MSI mapping support with nested SMMU (Part-1 core) Nicolin Chen
@ 2025-02-20 1:31 ` Nicolin Chen
2025-02-21 9:28 ` Thomas Gleixner
2025-02-20 1:31 ` [PATCH v2 2/7] genirq/msi: Refactor iommu_dma_compose_msi_msg() Nicolin Chen
` (6 subsequent siblings)
7 siblings, 1 reply; 34+ messages in thread
From: Nicolin Chen @ 2025-02-20 1:31 UTC (permalink / raw)
To: jgg, kevin.tian, tglx, maz
Cc: joro, will, robin.murphy, shuah, iommu, linux-kernel,
linux-arm-kernel, linux-kselftest, eric.auger, baolu.lu, yi.l.liu,
yury.norov, jacob.pan, patches
From: Jason Gunthorpe <jgg@nvidia.com>
The IOMMU translation for MSI message addresses has been a 2-step process,
separated in time:
1) iommu_dma_prepare_msi(): A cookie pointer containing the IOVA address
is stored in the MSI descriptor when an MSI interrupt is allocated.
2) iommu_dma_compose_msi_msg(): this cookie pointer is used to compute a
translated message address.
This has an inherent lifetime problem for the pointer stored in the cookie
that must remain valid between the two steps. However, there is no locking
at the irq layer that helps protect the lifetime. Today, this works under
the assumption that the iommu domain is not changed while MSI interrupts
are being programmed. This is true for normal DMA API users within the kernel,
as the iommu domain is attached before the driver is probed and cannot be
changed while a driver is attached.
Classic VFIO type1 also prevented changing the iommu domain while VFIO was
running as it does not support changing the "container" after starting up.
However, iommufd has improved this so that the iommu domain can be changed
during VFIO operation. This potentially allows userspace to directly race
VFIO_DEVICE_ATTACH_IOMMUFD_PT (which calls iommu_attach_group()) and
VFIO_DEVICE_SET_IRQS (which calls into iommu_dma_compose_msi_msg()).
This potentially causes both the cookie pointer and the unlocked call to
iommu_get_domain_for_dev() on the MSI translation path to become UAFs.
Fix the MSI cookie UAF by removing the cookie pointer. The translated IOVA
address is already known during iommu_dma_prepare_msi() and cannot change.
Thus, it can simply be stored as an integer in the MSI descriptor.
A following patch will fix the other UAF in iommu_get_domain_for_dev(), by
using the IOMMU group mutex.
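As an illustration only, the packing described above can be modelled in userspace C. This is a sketch, not the kernel code: the names msi_desc_model, model_set_iova() and model_translate() are hypothetical, and the explicit 64-bit cast before the shift is an assumption added here so the result does not depend on how the compiler promotes a 58-bit bitfield.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model of the two fields the patch adds to struct msi_desc. */
struct msi_desc_model {
	uint64_t iommu_msi_iova : 58;	/* IOVA with the low 'shift' bits dropped */
	uint64_t iommu_msi_shift : 6;	/* log2 of the MSI granule; 0 means "no IOVA" */
};

/* Mirrors msi_desc_set_iommu_msi_iova(): store the granule-aligned IOVA. */
static void model_set_iova(struct msi_desc_model *desc, uint64_t msi_iova,
			   unsigned int msi_shift)
{
	assert(msi_shift < 64);
	desc->iommu_msi_iova = msi_iova >> msi_shift;
	desc->iommu_msi_shift = msi_shift;
}

/*
 * Mirrors the compose step: replace the page-frame bits of the doorbell
 * address with the stored IOVA and keep the offset within the granule.
 */
static uint64_t model_translate(const struct msi_desc_model *desc,
				uint64_t doorbell)
{
	uint64_t shift = desc->iommu_msi_shift;
	uint64_t base;

	if (!shift)
		return doorbell;	/* no translation recorded */

	/* Cast first so the shift is done in full 64-bit precision. */
	base = (uint64_t)desc->iommu_msi_iova << shift;
	return base | (doorbell & ((UINT64_C(1) << shift) - 1));
}
```

Because only integers are stored, nothing in the descriptor can dangle if the domain is replaced between the prepare and compose steps; a stale IOVA may still be programmed, but no freed memory is dereferenced.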
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
include/linux/msi.h | 33 ++++++++++++---------------------
drivers/iommu/dma-iommu.c | 28 +++++++++++++---------------
2 files changed, 25 insertions(+), 36 deletions(-)
diff --git a/include/linux/msi.h b/include/linux/msi.h
index b10093c4d00e..fc4f3774c3af 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -166,6 +166,10 @@ struct msi_desc_data {
* @dev: Pointer to the device which uses this descriptor
* @msg: The last set MSI message cached for reuse
* @affinity: Optional pointer to a cpu affinity mask for this descriptor
+ * @iommu_msi_iova: Optional shifted IOVA from the IOMMU to override the msi_addr.
+ * Only used if iommu_msi_shift != 0
+ * @iommu_msi_shift: Indicates how many bits of the original address should be
+ * preserved when using iommu_msi_iova.
* @sysfs_attr: Pointer to sysfs device attribute
*
* @write_msi_msg: Callback that may be called when the MSI message
@@ -184,7 +188,8 @@ struct msi_desc {
struct msi_msg msg;
struct irq_affinity_desc *affinity;
#ifdef CONFIG_IRQ_MSI_IOMMU
- const void *iommu_cookie;
+ u64 iommu_msi_iova : 58;
+ u64 iommu_msi_shift : 6;
#endif
#ifdef CONFIG_SYSFS
struct device_attribute *sysfs_attrs;
@@ -285,28 +290,14 @@ struct msi_desc *msi_next_desc(struct device *dev, unsigned int domid,
#define msi_desc_to_dev(desc) ((desc)->dev)
-#ifdef CONFIG_IRQ_MSI_IOMMU
-static inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc)
-{
- return desc->iommu_cookie;
-}
-
-static inline void msi_desc_set_iommu_cookie(struct msi_desc *desc,
- const void *iommu_cookie)
-{
- desc->iommu_cookie = iommu_cookie;
-}
-#else
-static inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc)
+static inline void msi_desc_set_iommu_msi_iova(struct msi_desc *desc, u64 msi_iova,
+ unsigned int msi_shift)
{
- return NULL;
-}
-
-static inline void msi_desc_set_iommu_cookie(struct msi_desc *desc,
- const void *iommu_cookie)
-{
-}
+#ifdef CONFIG_IRQ_MSI_IOMMU
+ desc->iommu_msi_iova = msi_iova >> msi_shift;
+ desc->iommu_msi_shift = msi_shift;
#endif
+}
int msi_domain_insert_msi_desc(struct device *dev, unsigned int domid,
struct msi_desc *init_desc);
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 2a9fa0c8cc00..0f0caf59023c 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1815,7 +1815,7 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
static DEFINE_MUTEX(msi_prepare_lock); /* see below */
if (!domain || !domain->iova_cookie) {
- desc->iommu_cookie = NULL;
+ msi_desc_set_iommu_msi_iova(desc, 0, 0);
return 0;
}
@@ -1827,11 +1827,12 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
mutex_lock(&msi_prepare_lock);
msi_page = iommu_dma_get_msi_page(dev, msi_addr, domain);
mutex_unlock(&msi_prepare_lock);
-
- msi_desc_set_iommu_cookie(desc, msi_page);
-
if (!msi_page)
return -ENOMEM;
+
+ msi_desc_set_iommu_msi_iova(
+ desc, msi_page->iova,
+ ilog2(cookie_msi_granule(domain->iova_cookie)));
return 0;
}
@@ -1842,18 +1843,15 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
*/
void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
{
- struct device *dev = msi_desc_to_dev(desc);
- const struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
- const struct iommu_dma_msi_page *msi_page;
+#ifdef CONFIG_IRQ_MSI_IOMMU
+ if (desc->iommu_msi_shift) {
+ u64 msi_iova = desc->iommu_msi_iova << desc->iommu_msi_shift;
- msi_page = msi_desc_get_iommu_cookie(desc);
-
- if (!domain || !domain->iova_cookie || WARN_ON(!msi_page))
- return;
-
- msg->address_hi = upper_32_bits(msi_page->iova);
- msg->address_lo &= cookie_msi_granule(domain->iova_cookie) - 1;
- msg->address_lo += lower_32_bits(msi_page->iova);
+ msg->address_hi = upper_32_bits(msi_iova);
+ msg->address_lo = lower_32_bits(msi_iova) |
+ (msg->address_lo & ((1 << desc->iommu_msi_shift) - 1));
+ }
+#endif
}
static int iommu_dma_init(void)
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH v2 1/7] genirq/msi: Store the IOMMU IOVA directly in msi_desc instead of iommu_cookie
2025-02-20 1:31 ` [PATCH v2 1/7] genirq/msi: Store the IOMMU IOVA directly in msi_desc instead of iommu_cookie Nicolin Chen
@ 2025-02-21 9:28 ` Thomas Gleixner
2025-02-21 11:10 ` Joerg Roedel
2025-02-21 14:05 ` Jason Gunthorpe
0 siblings, 2 replies; 34+ messages in thread
From: Thomas Gleixner @ 2025-02-21 9:28 UTC (permalink / raw)
To: Nicolin Chen, jgg, kevin.tian, maz
Cc: joro, will, robin.murphy, shuah, iommu, linux-kernel,
linux-arm-kernel, linux-kselftest, eric.auger, baolu.lu, yi.l.liu,
yury.norov, jacob.pan, patches
On Wed, Feb 19 2025 at 17:31, Nicolin Chen wrote:
> Fix the MSI cookie UAF by removing the cookie pointer. The translated IOVA
> address is already known during iommu_dma_prepare_msi() and cannot change.
> Thus, it can simply be stored as an integer in the MSI descriptor.
>
> A following patch will fix the other UAF in iommu_get_domain_for_dev(), by
> using the IOMMU group mutex.
"A following patch" has no meaning once the current one is
applied. Simply say:
The other UAF in iommu_get_domain_for_dev() will be addressed
separately, by ....
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
With that fixed:
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 1/7] genirq/msi: Store the IOMMU IOVA directly in msi_desc instead of iommu_cookie
2025-02-21 9:28 ` Thomas Gleixner
@ 2025-02-21 11:10 ` Joerg Roedel
2025-02-21 13:41 ` Jason Gunthorpe
2025-02-21 14:05 ` Jason Gunthorpe
1 sibling, 1 reply; 34+ messages in thread
From: Joerg Roedel @ 2025-02-21 11:10 UTC (permalink / raw)
To: Nicolin Chen
Cc: Thomas Gleixner, jgg, kevin.tian, maz, will, robin.murphy, shuah,
iommu, linux-kernel, linux-arm-kernel, linux-kselftest,
eric.auger, baolu.lu, yi.l.liu, yury.norov, jacob.pan, patches
Hi Nicolin,
On Fri, Feb 21, 2025 at 10:28:20AM +0100, Thomas Gleixner wrote:
> On Wed, Feb 19 2025 at 17:31, Nicolin Chen wrote:
> > Fix the MSI cookie UAF by removing the cookie pointer. The translated IOVA
> > address is already known during iommu_dma_prepare_msi() and cannot change.
> > Thus, it can simply be stored as an integer in the MSI descriptor.
> >
> > A following patch will fix the other UAF in iommu_get_domain_for_dev(), by
> > using the IOMMU group mutex.
>
> "A following patch" has no meaning once the current one is
> applied. Simply say:
>
> The other UAF in iommu_get_domain_for_dev() will be addressed
> separately, by ....
>
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
>
> With that fixed:
>
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Can you please send a v3 with updated commit message and all the
review/acked tags added? I will pick it up then.
Thanks,
Joerg
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 1/7] genirq/msi: Store the IOMMU IOVA directly in msi_desc instead of iommu_cookie
2025-02-21 11:10 ` Joerg Roedel
@ 2025-02-21 13:41 ` Jason Gunthorpe
2025-02-21 14:00 ` Joerg Roedel
0 siblings, 1 reply; 34+ messages in thread
From: Jason Gunthorpe @ 2025-02-21 13:41 UTC (permalink / raw)
To: Joerg Roedel
Cc: Nicolin Chen, Thomas Gleixner, kevin.tian, maz, will,
robin.murphy, shuah, iommu, linux-kernel, linux-arm-kernel,
linux-kselftest, eric.auger, baolu.lu, yi.l.liu, yury.norov,
jacob.pan, patches
On Fri, Feb 21, 2025 at 12:10:46PM +0100, Joerg Roedel wrote:
> Hi Nicolin,
>
> On Fri, Feb 21, 2025 at 10:28:20AM +0100, Thomas Gleixner wrote:
> > On Wed, Feb 19 2025 at 17:31, Nicolin Chen wrote:
> > > Fix the MSI cookie UAF by removing the cookie pointer. The translated IOVA
> > > address is already known during iommu_dma_prepare_msi() and cannot change.
> > > Thus, it can simply be stored as an integer in the MSI descriptor.
> > >
> > > A following patch will fix the other UAF in iommu_get_domain_for_dev(), by
> > > using the IOMMU group mutex.
> >
> > "A following patch" has no meaning once the current one is
> > applied. Simply say:
> >
> > The other UAF in iommu_get_domain_for_dev() will be addressed
> > separately, by ....
> >
> > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> >
> > With that fixed:
> >
> > Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
>
> Can you please send a v3 with updated commit message and all the
> review/acked tags added? I will pick it up then.
Can I send you a PR instead? I'd like it on a branch so we can work on
the iommufd-specific bits that were in v1.
Jason
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 1/7] genirq/msi: Store the IOMMU IOVA directly in msi_desc instead of iommu_cookie
2025-02-21 13:41 ` Jason Gunthorpe
@ 2025-02-21 14:00 ` Joerg Roedel
0 siblings, 0 replies; 34+ messages in thread
From: Joerg Roedel @ 2025-02-21 14:00 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Nicolin Chen, Thomas Gleixner, kevin.tian, maz, will,
robin.murphy, shuah, iommu, linux-kernel, linux-arm-kernel,
linux-kselftest, eric.auger, baolu.lu, yi.l.liu, yury.norov,
jacob.pan, patches
Hi Jason,
On Fri, Feb 21, 2025 at 09:41:12AM -0400, Jason Gunthorpe wrote:
> Can I send you a PR instead? I'd like it on a branch so we can work on
> the iommufd-specific bits that were in v1.
Yes, that works as well.
Thanks,
Joerg
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 1/7] genirq/msi: Store the IOMMU IOVA directly in msi_desc instead of iommu_cookie
2025-02-21 9:28 ` Thomas Gleixner
2025-02-21 11:10 ` Joerg Roedel
@ 2025-02-21 14:05 ` Jason Gunthorpe
1 sibling, 0 replies; 34+ messages in thread
From: Jason Gunthorpe @ 2025-02-21 14:05 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Nicolin Chen, kevin.tian, maz, joro, will, robin.murphy, shuah,
iommu, linux-kernel, linux-arm-kernel, linux-kselftest,
eric.auger, baolu.lu, yi.l.liu, yury.norov, jacob.pan, patches
On Fri, Feb 21, 2025 at 10:28:20AM +0100, Thomas Gleixner wrote:
> On Wed, Feb 19 2025 at 17:31, Nicolin Chen wrote:
> > Fix the MSI cookie UAF by removing the cookie pointer. The translated IOVA
> > address is already known during iommu_dma_prepare_msi() and cannot change.
> > Thus, it can simply be stored as an integer in the MSI descriptor.
> >
> > A following patch will fix the other UAF in iommu_get_domain_for_dev(), by
> > using the IOMMU group mutex.
>
> "A following patch" has no meaning once the current one is
> applied. Simply say:
>
> The other UAF in iommu_get_domain_for_dev() will be addressed
> separately, by ....
I used this paragraph:
The other UAF related to iommu_get_domain_for_dev() will be addressed in
patch "iommu: Make iommu_dma_prepare_msi() into a generic operation" by
using the IOMMU group mutex.
Thanks,
Jason
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v2 2/7] genirq/msi: Refactor iommu_dma_compose_msi_msg()
2025-02-20 1:31 [PATCH v2 0/7] iommu: Add MSI mapping support with nested SMMU (Part-1 core) Nicolin Chen
2025-02-20 1:31 ` [PATCH v2 1/7] genirq/msi: Store the IOMMU IOVA directly in msi_desc instead of iommu_cookie Nicolin Chen
@ 2025-02-20 1:31 ` Nicolin Chen
2025-02-21 9:28 ` Thomas Gleixner
2025-02-20 1:31 ` [PATCH v2 3/7] iommu: Make iommu_dma_prepare_msi() into a generic operation Nicolin Chen
` (5 subsequent siblings)
7 siblings, 1 reply; 34+ messages in thread
From: Nicolin Chen @ 2025-02-20 1:31 UTC (permalink / raw)
To: jgg, kevin.tian, tglx, maz
Cc: joro, will, robin.murphy, shuah, iommu, linux-kernel,
linux-arm-kernel, linux-kselftest, eric.auger, baolu.lu, yi.l.liu,
yury.norov, jacob.pan, patches
From: Jason Gunthorpe <jgg@nvidia.com>
The two-step process to translate the MSI address involves two functions,
iommu_dma_prepare_msi() and iommu_dma_compose_msi_msg().
Previously iommu_dma_compose_msi_msg() needed to be in the iommu layer as
it had to dereference the opaque cookie pointer. Now, the previous patch
changed the cookie pointer into an integer so there is no longer any need
for the iommu layer to be involved.
Further, the call sites of iommu_dma_compose_msi_msg() all follow the same
pattern of setting the MSI message address_hi/lo to the untranslated address and then
immediately calling iommu_dma_compose_msi_msg().
Refactor iommu_dma_compose_msi_msg() into msi_msg_set_addr() that directly
accepts the u64 version of the address and simplifies all the callers.
Move the new helper to linux/msi.h since it has nothing to do with iommu.
Aside from refactoring, this logically prepares for the next patch, which
allows multiple implementation options for iommu_dma_prepare_msi(). So, it
does not make sense to have the iommu_dma_compose_msi_msg() in dma-iommu.c
as it no longer provides the only iommu_dma_prepare_msi() implementation.
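A rough userspace sketch of the resulting calling convention follows. It is not the kernel helper: msg_model, desc_model, model_msg_set_addr() and model_compose_msi_msg() are hypothetical names, and plain integer fields stand in for the msi_desc bitfields. It only shows that an irqchip callback now passes its untranslated doorbell address once, instead of writing address_hi/lo and then asking the iommu layer to patch them.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for struct msi_msg. */
struct msg_model {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/* Stand-in for the IOMMU fields of struct msi_desc. */
struct desc_model {
	uint64_t iova_pfn;	/* IOVA >> shift, as stored at prepare time */
	unsigned int shift;	/* 0 means no IOVA override is recorded */
};

/* Models msi_msg_set_addr(): pick the IOVA if one is recorded, else msi_addr. */
static void model_msg_set_addr(const struct desc_model *desc,
			       struct msg_model *msg, uint64_t msi_addr)
{
	uint64_t addr = msi_addr;

	assert(desc->shift < 64);
	if (desc->shift) {
		uint64_t mask = (UINT64_C(1) << desc->shift) - 1;

		/* Keep the offset inside the granule, swap the page bits. */
		addr = (desc->iova_pfn << desc->shift) | (msi_addr & mask);
	}
	msg->address_hi = (uint32_t)(addr >> 32);
	msg->address_lo = (uint32_t)addr;
}

/* Hypothetical irqchip callback: set the payload, then the address, once. */
static void model_compose_msi_msg(const struct desc_model *desc,
				  struct msg_model *msg, uint64_t doorbell,
				  uint32_t hwirq)
{
	msg->data = hwirq;
	model_msg_set_addr(desc, msg, doorbell);
}
```

With a recorded shift of 12 and a stored page number of 0x8000, a doorbell at 0x100002040 is composed as 0x8000040; with shift 0 the doorbell address is written unchanged.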
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
include/linux/iommu.h | 6 ------
include/linux/msi.h | 28 ++++++++++++++++++++++++++++
drivers/iommu/dma-iommu.c | 18 ------------------
drivers/irqchip/irq-gic-v2m.c | 5 +----
drivers/irqchip/irq-gic-v3-its.c | 13 +++----------
drivers/irqchip/irq-gic-v3-mbi.c | 12 ++++--------
drivers/irqchip/irq-ls-scfg-msi.c | 5 ++---
7 files changed, 38 insertions(+), 49 deletions(-)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 38c65e92ecd0..caee952febd4 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -1508,7 +1508,6 @@ static inline void iommu_debugfs_setup(void) {}
int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base);
int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr);
-void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg);
#else /* CONFIG_IOMMU_DMA */
@@ -1524,11 +1523,6 @@ static inline int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_a
{
return 0;
}
-
-static inline void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
-{
-}
-
#endif /* CONFIG_IOMMU_DMA */
/*
diff --git a/include/linux/msi.h b/include/linux/msi.h
index fc4f3774c3af..8d97b890faec 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -299,6 +299,34 @@ static inline void msi_desc_set_iommu_msi_iova(struct msi_desc *desc, u64 msi_io
#endif
}
+/**
+ * msi_msg_set_addr() - Set MSI address in an MSI message
+ *
+ * @desc: MSI descriptor that may carry an IOVA base address for MSI via @iommu_msi_iova/shift
+ * @msg: Target MSI message to set its address_hi and address_lo
+ * @msi_addr: Physical address to set the MSI message
+ *
+ * Notes:
+ * - Override @msi_addr using the IOVA base address in the @desc if @iommu_msi_shift is set
+ * - Otherwise, simply set @msi_addr to @msg
+ */
+static inline void msi_msg_set_addr(struct msi_desc *desc, struct msi_msg *msg,
+ phys_addr_t msi_addr)
+{
+#ifdef CONFIG_IRQ_MSI_IOMMU
+ if (desc->iommu_msi_shift) {
+ u64 msi_iova = desc->iommu_msi_iova << desc->iommu_msi_shift;
+
+ msg->address_hi = upper_32_bits(msi_iova);
+ msg->address_lo = lower_32_bits(msi_iova) |
+ (msi_addr & ((1 << desc->iommu_msi_shift) - 1));
+ return;
+ }
+#endif
+ msg->address_hi = upper_32_bits(msi_addr);
+ msg->address_lo = lower_32_bits(msi_addr);
+}
+
int msi_domain_insert_msi_desc(struct device *dev, unsigned int domid,
struct msi_desc *init_desc);
/**
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 0f0caf59023c..bf91e014d179 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1836,24 +1836,6 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
return 0;
}
-/**
- * iommu_dma_compose_msi_msg() - Apply translation to an MSI message
- * @desc: MSI descriptor prepared by iommu_dma_prepare_msi()
- * @msg: MSI message containing target physical address
- */
-void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
-{
-#ifdef CONFIG_IRQ_MSI_IOMMU
- if (desc->iommu_msi_shift) {
- u64 msi_iova = desc->iommu_msi_iova << desc->iommu_msi_shift;
-
- msg->address_hi = upper_32_bits(msi_iova);
- msg->address_lo = lower_32_bits(msi_iova) |
- (msg->address_lo & ((1 << desc->iommu_msi_shift) - 1));
- }
-#endif
-}
-
static int iommu_dma_init(void)
{
if (is_kdump_kernel())
diff --git a/drivers/irqchip/irq-gic-v2m.c b/drivers/irqchip/irq-gic-v2m.c
index be35c5349986..57e0470e0d13 100644
--- a/drivers/irqchip/irq-gic-v2m.c
+++ b/drivers/irqchip/irq-gic-v2m.c
@@ -87,9 +87,6 @@ static void gicv2m_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
struct v2m_data *v2m = irq_data_get_irq_chip_data(data);
phys_addr_t addr = gicv2m_get_msi_addr(v2m, data->hwirq);
- msg->address_hi = upper_32_bits(addr);
- msg->address_lo = lower_32_bits(addr);
-
if (v2m->flags & GICV2M_GRAVITON_ADDRESS_ONLY)
msg->data = 0;
else
@@ -97,7 +94,7 @@ static void gicv2m_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
if (v2m->flags & GICV2M_NEEDS_SPI_OFFSET)
msg->data -= v2m->spi_offset;
- iommu_dma_compose_msi_msg(irq_data_get_msi_desc(data), msg);
+ msi_msg_set_addr(irq_data_get_msi_desc(data), msg, addr);
}
static struct irq_chip gicv2m_irq_chip = {
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 8c3ec5734f1e..ce0bf70b9eaf 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1809,17 +1809,10 @@ static u64 its_irq_get_msi_base(struct its_device *its_dev)
static void its_irq_compose_msi_msg(struct irq_data *d, struct msi_msg *msg)
{
struct its_device *its_dev = irq_data_get_irq_chip_data(d);
- struct its_node *its;
- u64 addr;
-
- its = its_dev->its;
- addr = its->get_msi_base(its_dev);
-
- msg->address_lo = lower_32_bits(addr);
- msg->address_hi = upper_32_bits(addr);
- msg->data = its_get_event_id(d);
- iommu_dma_compose_msi_msg(irq_data_get_msi_desc(d), msg);
+ msg->data = its_get_event_id(d);
+ msi_msg_set_addr(irq_data_get_msi_desc(d), msg,
+ its_dev->its->get_msi_base(its_dev));
}
static int its_irq_set_irqchip_state(struct irq_data *d,
diff --git a/drivers/irqchip/irq-gic-v3-mbi.c b/drivers/irqchip/irq-gic-v3-mbi.c
index 3fe870f8ee17..a6510128611e 100644
--- a/drivers/irqchip/irq-gic-v3-mbi.c
+++ b/drivers/irqchip/irq-gic-v3-mbi.c
@@ -147,22 +147,18 @@ static const struct irq_domain_ops mbi_domain_ops = {
static void mbi_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
{
- msg[0].address_hi = upper_32_bits(mbi_phys_base + GICD_SETSPI_NSR);
- msg[0].address_lo = lower_32_bits(mbi_phys_base + GICD_SETSPI_NSR);
msg[0].data = data->parent_data->hwirq;
-
- iommu_dma_compose_msi_msg(irq_data_get_msi_desc(data), msg);
+ msi_msg_set_addr(irq_data_get_msi_desc(data), &msg[0],
+ mbi_phys_base + GICD_SETSPI_NSR);
}
static void mbi_compose_mbi_msg(struct irq_data *data, struct msi_msg *msg)
{
mbi_compose_msi_msg(data, msg);
- msg[1].address_hi = upper_32_bits(mbi_phys_base + GICD_CLRSPI_NSR);
- msg[1].address_lo = lower_32_bits(mbi_phys_base + GICD_CLRSPI_NSR);
msg[1].data = data->parent_data->hwirq;
-
- iommu_dma_compose_msi_msg(irq_data_get_msi_desc(data), &msg[1]);
+ msi_msg_set_addr(irq_data_get_msi_desc(data), &msg[1],
+ mbi_phys_base + GICD_CLRSPI_NSR);
}
static bool mbi_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
diff --git a/drivers/irqchip/irq-ls-scfg-msi.c b/drivers/irqchip/irq-ls-scfg-msi.c
index c0e1aafe468c..3cb80796cc7c 100644
--- a/drivers/irqchip/irq-ls-scfg-msi.c
+++ b/drivers/irqchip/irq-ls-scfg-msi.c
@@ -87,8 +87,6 @@ static void ls_scfg_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
{
struct ls_scfg_msi *msi_data = irq_data_get_irq_chip_data(data);
- msg->address_hi = upper_32_bits(msi_data->msiir_addr);
- msg->address_lo = lower_32_bits(msi_data->msiir_addr);
msg->data = data->hwirq;
if (msi_affinity_flag) {
@@ -98,7 +96,8 @@ static void ls_scfg_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
msg->data |= cpumask_first(mask);
}
- iommu_dma_compose_msi_msg(irq_data_get_msi_desc(data), msg);
+ msi_msg_set_addr(irq_data_get_msi_desc(data), msg,
+ msi_data->msiir_addr);
}
static int ls_scfg_msi_set_affinity(struct irq_data *irq_data,
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH v2 2/7] genirq/msi: Refactor iommu_dma_compose_msi_msg()
2025-02-20 1:31 ` [PATCH v2 2/7] genirq/msi: Refactor iommu_dma_compose_msi_msg() Nicolin Chen
@ 2025-02-21 9:28 ` Thomas Gleixner
0 siblings, 0 replies; 34+ messages in thread
From: Thomas Gleixner @ 2025-02-21 9:28 UTC (permalink / raw)
To: Nicolin Chen, jgg, kevin.tian, maz
Cc: joro, will, robin.murphy, shuah, iommu, linux-kernel,
linux-arm-kernel, linux-kselftest, eric.auger, baolu.lu, yi.l.liu,
yury.norov, jacob.pan, patches
On Wed, Feb 19 2025 at 17:31, Nicolin Chen wrote:
> From: Jason Gunthorpe <jgg@nvidia.com>
>
> The two-step process to translate the MSI address involves two functions,
> iommu_dma_prepare_msi() and iommu_dma_compose_msi_msg().
>
> Previously iommu_dma_compose_msi_msg() needed to be in the iommu layer as
> it had to dereference the opaque cookie pointer. Now, the previous patch
> changed the cookie pointer into an integer so there is no longer any need
> for the iommu layer to be involved.
>
> Further, the call sites of iommu_dma_compose_msi_msg() all follow the same
> pattern of setting the MSI message address_hi/lo to the untranslated address and then
> immediately calling iommu_dma_compose_msi_msg().
>
> Refactor iommu_dma_compose_msi_msg() into msi_msg_set_addr() that directly
> accepts the u64 version of the address and simplifies all the callers.
>
> Move the new helper to linux/msi.h since it has nothing to do with iommu.
>
> Aside from refactoring, this logically prepares for the next patch, which
> allows multiple implementation options for iommu_dma_prepare_msi(). So, it
> does not make sense to have the iommu_dma_compose_msi_msg() in dma-iommu.c
> as it no longer provides the only iommu_dma_prepare_msi() implementation.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v2 3/7] iommu: Make iommu_dma_prepare_msi() into a generic operation
2025-02-20 1:31 [PATCH v2 0/7] iommu: Add MSI mapping support with nested SMMU (Part-1 core) Nicolin Chen
2025-02-20 1:31 ` [PATCH v2 1/7] genirq/msi: Store the IOMMU IOVA directly in msi_desc instead of iommu_cookie Nicolin Chen
2025-02-20 1:31 ` [PATCH v2 2/7] genirq/msi: Refactor iommu_dma_compose_msi_msg() Nicolin Chen
@ 2025-02-20 1:31 ` Nicolin Chen
2025-02-21 15:39 ` Robin Murphy
2025-02-20 1:31 ` [PATCH v2 4/7] irqchip: Have CONFIG_IRQ_MSI_IOMMU be selected by irqchips that need it Nicolin Chen
` (4 subsequent siblings)
7 siblings, 1 reply; 34+ messages in thread
From: Nicolin Chen @ 2025-02-20 1:31 UTC (permalink / raw)
To: jgg, kevin.tian, tglx, maz
Cc: joro, will, robin.murphy, shuah, iommu, linux-kernel,
linux-arm-kernel, linux-kselftest, eric.auger, baolu.lu, yi.l.liu,
yury.norov, jacob.pan, patches
From: Jason Gunthorpe <jgg@nvidia.com>
SW_MSI lets the IOMMU translate an MSI message before the MSI message
is delivered to the interrupt controller. On such systems, an iommu_domain
must have a translation for the MSI message for interrupts to work.
The IRQ subsystem will call into IOMMU to request that a physical page be
set up to receive MSI messages, and the IOMMU then sets an IOVA that maps
to that physical page. Ultimately the IOVA is programmed into the device
via the msi_msg.
Generalize this by allowing iommu_domain owners to provide implementations
of this mapping. Add a function pointer in struct iommu_domain to allow a
domain owner to provide its own implementation.
Have dma-iommu supply its implementation for IOMMU_DOMAIN_DMA types during
the iommu_get_dma_cookie() path. For IOMMU_DOMAIN_UNMANAGED types used by
VFIO (and iommufd for now), have the same iommu_dma_sw_msi set as well in
the iommu_get_msi_cookie() path.
Hold the group mutex while in iommu_dma_prepare_msi() to ensure the domain
doesn't change or become freed while running. Races with IRQ operations
from VFIO and domain changes from iommufd are possible here.
Replace the msi_prepare_lock with a lockdep assertion for the group mutex
as documentation. For dma-iommu.c, each iommu_domain is unique to a
group.
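The dispatch described above can be sketched in userspace C, with a pthread mutex standing in for the group mutex and a boolean standing in for the lockdep assertion. All names here (group_model, domain_model, desc_model, model_dma_sw_msi(), model_prepare_msi()) are hypothetical, and the 4 KiB granule and fixed IOVA are assumptions made for the example; a real sw_msi implementation would map msi_addr into the domain.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct domain_model;
struct group_model;

/* Stand-in for struct msi_desc; 'group' models desc->dev->iommu_group. */
struct desc_model {
	struct group_model *group;
	uint64_t iova;
	unsigned int shift;
};

typedef int (*sw_msi_fn)(struct domain_model *domain, struct desc_model *desc,
			 uint64_t msi_addr);

struct domain_model {
	sw_msi_fn sw_msi;	/* NULL when the domain needs no MSI mapping */
	uint64_t msi_iova;	/* assumed: IOVA reserved for the doorbell page */
};

struct group_model {
	pthread_mutex_t mutex;
	bool mutex_held;	/* crude stand-in for a lockdep assertion */
	struct domain_model *domain;
};

/* Domain-owner callback: runs only with the group mutex held. */
static int model_dma_sw_msi(struct domain_model *domain,
			    struct desc_model *desc, uint64_t msi_addr)
{
	(void)msi_addr;		/* a real implementation would map this page */
	assert(desc->group->mutex_held);
	desc->iova = domain->msi_iova;
	desc->shift = 12;	/* assumed 4 KiB MSI granule */
	return 0;
}

/* Models the generic entry point: lock, look up the domain, dispatch. */
static int model_prepare_msi(struct desc_model *desc, uint64_t msi_addr)
{
	struct group_model *group = desc->group;
	int ret = 0;

	if (!group)
		return 0;

	pthread_mutex_lock(&group->mutex);
	group->mutex_held = true;
	if (group->domain && group->domain->sw_msi)
		ret = group->domain->sw_msi(group->domain, desc, msi_addr);
	group->mutex_held = false;
	pthread_mutex_unlock(&group->mutex);
	return ret;
}
```

Reading group->domain and calling through it under the same lock is what keeps the domain from being swapped or freed in the middle of the call.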
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
include/linux/iommu.h | 44 ++++++++++++++++++++++++++-------------
drivers/iommu/dma-iommu.c | 33 +++++++++++++----------------
drivers/iommu/iommu.c | 29 ++++++++++++++++++++++++++
3 files changed, 73 insertions(+), 33 deletions(-)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index caee952febd4..761c5e186de9 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -44,6 +44,8 @@ struct iommu_dma_cookie;
struct iommu_fault_param;
struct iommufd_ctx;
struct iommufd_viommu;
+struct msi_desc;
+struct msi_msg;
#define IOMMU_FAULT_PERM_READ (1 << 0) /* read */
#define IOMMU_FAULT_PERM_WRITE (1 << 1) /* write */
@@ -216,6 +218,12 @@ struct iommu_domain {
struct iommu_domain_geometry geometry;
struct iommu_dma_cookie *iova_cookie;
int (*iopf_handler)(struct iopf_group *group);
+
+#if IS_ENABLED(CONFIG_IRQ_MSI_IOMMU)
+ int (*sw_msi)(struct iommu_domain *domain, struct msi_desc *desc,
+ phys_addr_t msi_addr);
+#endif
+
void *fault_data;
union {
struct {
@@ -234,6 +242,16 @@ struct iommu_domain {
};
};
+static inline void iommu_domain_set_sw_msi(
+ struct iommu_domain *domain,
+ int (*sw_msi)(struct iommu_domain *domain, struct msi_desc *desc,
+ phys_addr_t msi_addr))
+{
+#if IS_ENABLED(CONFIG_IRQ_MSI_IOMMU)
+ domain->sw_msi = sw_msi;
+#endif
+}
+
static inline bool iommu_is_dma_domain(struct iommu_domain *domain)
{
return domain->type & __IOMMU_DOMAIN_DMA_API;
@@ -1470,6 +1488,18 @@ static inline ioasid_t iommu_alloc_global_pasid(struct device *dev)
static inline void iommu_free_global_pasid(ioasid_t pasid) {}
#endif /* CONFIG_IOMMU_API */
+#ifdef CONFIG_IRQ_MSI_IOMMU
+#ifdef CONFIG_IOMMU_API
+int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr);
+#else
+static inline int iommu_dma_prepare_msi(struct msi_desc *desc,
+ phys_addr_t msi_addr)
+{
+ return 0;
+}
+#endif /* CONFIG_IOMMU_API */
+#endif /* CONFIG_IRQ_MSI_IOMMU */
+
#if IS_ENABLED(CONFIG_LOCKDEP) && IS_ENABLED(CONFIG_IOMMU_API)
void iommu_group_mutex_assert(struct device *dev);
#else
@@ -1503,26 +1533,12 @@ static inline void iommu_debugfs_setup(void) {}
#endif
#ifdef CONFIG_IOMMU_DMA
-#include <linux/msi.h>
-
int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base);
-
-int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr);
-
#else /* CONFIG_IOMMU_DMA */
-
-struct msi_desc;
-struct msi_msg;
-
static inline int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
{
return -ENODEV;
}
-
-static inline int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
-{
- return 0;
-}
#endif /* CONFIG_IOMMU_DMA */
/*
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index bf91e014d179..3b58244e6344 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -24,6 +24,7 @@
#include <linux/memremap.h>
#include <linux/mm.h>
#include <linux/mutex.h>
+#include <linux/msi.h>
#include <linux/of_iommu.h>
#include <linux/pci.h>
#include <linux/scatterlist.h>
@@ -102,6 +103,9 @@ static int __init iommu_dma_forcedac_setup(char *str)
}
early_param("iommu.forcedac", iommu_dma_forcedac_setup);
+static int iommu_dma_sw_msi(struct iommu_domain *domain, struct msi_desc *desc,
+ phys_addr_t msi_addr);
+
/* Number of entries per flush queue */
#define IOVA_DEFAULT_FQ_SIZE 256
#define IOVA_SINGLE_FQ_SIZE 32768
@@ -398,6 +402,7 @@ int iommu_get_dma_cookie(struct iommu_domain *domain)
return -ENOMEM;
mutex_init(&domain->iova_cookie->mutex);
+ iommu_domain_set_sw_msi(domain, iommu_dma_sw_msi);
return 0;
}
@@ -429,6 +434,7 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
cookie->msi_iova = base;
domain->iova_cookie = cookie;
+ iommu_domain_set_sw_msi(domain, iommu_dma_sw_msi);
return 0;
}
EXPORT_SYMBOL(iommu_get_msi_cookie);
@@ -443,6 +449,9 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iommu_dma_msi_page *msi, *tmp;
+ if (domain->sw_msi != iommu_dma_sw_msi)
+ return;
+
if (!cookie)
return;
@@ -1800,33 +1809,19 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
return NULL;
}
-/**
- * iommu_dma_prepare_msi() - Map the MSI page in the IOMMU domain
- * @desc: MSI descriptor, will store the MSI page
- * @msi_addr: MSI target address to be mapped
- *
- * Return: 0 on success or negative error code if the mapping failed.
- */
-int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
+static int iommu_dma_sw_msi(struct iommu_domain *domain, struct msi_desc *desc,
+ phys_addr_t msi_addr)
{
struct device *dev = msi_desc_to_dev(desc);
- struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
- struct iommu_dma_msi_page *msi_page;
- static DEFINE_MUTEX(msi_prepare_lock); /* see below */
+ const struct iommu_dma_msi_page *msi_page;
- if (!domain || !domain->iova_cookie) {
+ if (!domain->iova_cookie) {
msi_desc_set_iommu_msi_iova(desc, 0, 0);
return 0;
}
- /*
- * In fact the whole prepare operation should already be serialised by
- * irq_domain_mutex further up the callchain, but that's pretty subtle
- * on its own, so consider this locking as failsafe documentation...
- */
- mutex_lock(&msi_prepare_lock);
+ iommu_group_mutex_assert(dev);
msi_page = iommu_dma_get_msi_page(dev, msi_addr, domain);
- mutex_unlock(&msi_prepare_lock);
if (!msi_page)
return -ENOMEM;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 870c3cdbd0f6..022bf96a18c5 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3596,3 +3596,32 @@ int iommu_replace_group_handle(struct iommu_group *group,
return ret;
}
EXPORT_SYMBOL_NS_GPL(iommu_replace_group_handle, "IOMMUFD_INTERNAL");
+
+#if IS_ENABLED(CONFIG_IRQ_MSI_IOMMU)
+/**
+ * iommu_dma_prepare_msi() - Map the MSI page in the IOMMU domain
+ * @desc: MSI descriptor, will store the MSI page
+ * @msi_addr: MSI target address to be mapped
+ *
+ * The implementation of sw_msi() should take msi_addr and map it to
+ * an IOVA in the domain and call msi_desc_set_iommu_msi_iova() with the
+ * mapping information.
+ *
+ * Return: 0 on success or negative error code if the mapping failed.
+ */
+int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
+{
+ struct device *dev = msi_desc_to_dev(desc);
+ struct iommu_group *group = dev->iommu_group;
+ int ret = 0;
+
+ if (!group)
+ return 0;
+
+ mutex_lock(&group->mutex);
+ if (group->domain && group->domain->sw_msi)
+ ret = group->domain->sw_msi(group->domain, desc, msi_addr);
+ mutex_unlock(&group->mutex);
+ return ret;
+}
+#endif /* CONFIG_IRQ_MSI_IOMMU */
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH v2 3/7] iommu: Make iommu_dma_prepare_msi() into a generic operation
2025-02-20 1:31 ` [PATCH v2 3/7] iommu: Make iommu_dma_prepare_msi() into a generic operation Nicolin Chen
@ 2025-02-21 15:39 ` Robin Murphy
2025-02-21 16:44 ` Jason Gunthorpe
0 siblings, 1 reply; 34+ messages in thread
From: Robin Murphy @ 2025-02-21 15:39 UTC (permalink / raw)
To: Nicolin Chen, jgg, kevin.tian, tglx, maz
Cc: joro, will, shuah, iommu, linux-kernel, linux-arm-kernel,
linux-kselftest, eric.auger, baolu.lu, yi.l.liu, yury.norov,
jacob.pan, patches
On 2025-02-20 1:31 am, Nicolin Chen wrote:
> From: Jason Gunthorpe <jgg@nvidia.com>
>
> SW_MSI supports IOMMU to translate an MSI message before the MSI message
> is delivered to the interrupt controller. On such systems, an iommu_domain
> must have a translation for the MSI message for interrupts to work.
>
> The IRQ subsystem will call into IOMMU to request that a physical page be
> set up to receive MSI messages, and the IOMMU then sets an IOVA that maps
> to that physical page. Ultimately the IOVA is programmed into the device
> via the msi_msg.
>
> Generalize this by allowing iommu_domain owners to provide implementations
> of this mapping. Add a function pointer in struct iommu_domain to allow a
> domain owner to provide its own implementation.
>
> Have dma-iommu supply its implementation for IOMMU_DOMAIN_DMA types during
> the iommu_get_dma_cookie() path. For IOMMU_DOMAIN_UNMANAGED types used by
> VFIO (and iommufd for now), have the same iommu_dma_sw_msi set as well in
> the iommu_get_msi_cookie() path.
>
> Hold the group mutex while in iommu_dma_prepare_msi() to ensure the domain
> doesn't change or become freed while running. Races with IRQ operations
> from VFIO and domain changes from iommufd are possible here.
>
> Replace the msi_prepare_lock with a lockdep assertion for the group mutex
> as documentation. For dma-iommu.c, each iommu_domain is unique to a
> group.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
> include/linux/iommu.h | 44 ++++++++++++++++++++++++++-------------
> drivers/iommu/dma-iommu.c | 33 +++++++++++++----------------
> drivers/iommu/iommu.c | 29 ++++++++++++++++++++++++++
> 3 files changed, 73 insertions(+), 33 deletions(-)
>
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index caee952febd4..761c5e186de9 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -44,6 +44,8 @@ struct iommu_dma_cookie;
> struct iommu_fault_param;
> struct iommufd_ctx;
> struct iommufd_viommu;
> +struct msi_desc;
> +struct msi_msg;
>
> #define IOMMU_FAULT_PERM_READ (1 << 0) /* read */
> #define IOMMU_FAULT_PERM_WRITE (1 << 1) /* write */
> @@ -216,6 +218,12 @@ struct iommu_domain {
> struct iommu_domain_geometry geometry;
> struct iommu_dma_cookie *iova_cookie;
> int (*iopf_handler)(struct iopf_group *group);
> +
> +#if IS_ENABLED(CONFIG_IRQ_MSI_IOMMU)
> + int (*sw_msi)(struct iommu_domain *domain, struct msi_desc *desc,
> + phys_addr_t msi_addr);
> +#endif
> +
> void *fault_data;
> union {
> struct {
> @@ -234,6 +242,16 @@ struct iommu_domain {
> };
> };
>
> +static inline void iommu_domain_set_sw_msi(
> + struct iommu_domain *domain,
> + int (*sw_msi)(struct iommu_domain *domain, struct msi_desc *desc,
> + phys_addr_t msi_addr))
> +{
> +#if IS_ENABLED(CONFIG_IRQ_MSI_IOMMU)
> + domain->sw_msi = sw_msi;
> +#endif
> +}
Yuck. Realistically we are going to have no more than two different
implementations of this; a fiddly callback interface seems overkill. All
we should need in the domain is a simple indicator of *which* MSI
translation scheme is in use (if it can't be squeezed into the domain
type itself), then iommu_dma_prepare_msi() can simply dispatch between
iommu-dma and IOMMUFD based on that, and then it's easy to solve all the
other fragility issues too.
Thanks,
Robin.
> +
> static inline bool iommu_is_dma_domain(struct iommu_domain *domain)
> {
> return domain->type & __IOMMU_DOMAIN_DMA_API;
> @@ -1470,6 +1488,18 @@ static inline ioasid_t iommu_alloc_global_pasid(struct device *dev)
> static inline void iommu_free_global_pasid(ioasid_t pasid) {}
> #endif /* CONFIG_IOMMU_API */
>
> +#ifdef CONFIG_IRQ_MSI_IOMMU
> +#ifdef CONFIG_IOMMU_API
> +int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr);
> +#else
> +static inline int iommu_dma_prepare_msi(struct msi_desc *desc,
> + phys_addr_t msi_addr)
> +{
> + return 0;
> +}
> +#endif /* CONFIG_IOMMU_API */
> +#endif /* CONFIG_IRQ_MSI_IOMMU */
> +
> #if IS_ENABLED(CONFIG_LOCKDEP) && IS_ENABLED(CONFIG_IOMMU_API)
> void iommu_group_mutex_assert(struct device *dev);
> #else
> @@ -1503,26 +1533,12 @@ static inline void iommu_debugfs_setup(void) {}
> #endif
>
> #ifdef CONFIG_IOMMU_DMA
> -#include <linux/msi.h>
> -
> int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base);
> -
> -int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr);
> -
> #else /* CONFIG_IOMMU_DMA */
> -
> -struct msi_desc;
> -struct msi_msg;
> -
> static inline int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
> {
> return -ENODEV;
> }
> -
> -static inline int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
> -{
> - return 0;
> -}
> #endif /* CONFIG_IOMMU_DMA */
>
> /*
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index bf91e014d179..3b58244e6344 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -24,6 +24,7 @@
> #include <linux/memremap.h>
> #include <linux/mm.h>
> #include <linux/mutex.h>
> +#include <linux/msi.h>
> #include <linux/of_iommu.h>
> #include <linux/pci.h>
> #include <linux/scatterlist.h>
> @@ -102,6 +103,9 @@ static int __init iommu_dma_forcedac_setup(char *str)
> }
> early_param("iommu.forcedac", iommu_dma_forcedac_setup);
>
> +static int iommu_dma_sw_msi(struct iommu_domain *domain, struct msi_desc *desc,
> + phys_addr_t msi_addr);
> +
> /* Number of entries per flush queue */
> #define IOVA_DEFAULT_FQ_SIZE 256
> #define IOVA_SINGLE_FQ_SIZE 32768
> @@ -398,6 +402,7 @@ int iommu_get_dma_cookie(struct iommu_domain *domain)
> return -ENOMEM;
>
> mutex_init(&domain->iova_cookie->mutex);
> + iommu_domain_set_sw_msi(domain, iommu_dma_sw_msi);
> return 0;
> }
>
> @@ -429,6 +434,7 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
>
> cookie->msi_iova = base;
> domain->iova_cookie = cookie;
> + iommu_domain_set_sw_msi(domain, iommu_dma_sw_msi);
> return 0;
> }
> EXPORT_SYMBOL(iommu_get_msi_cookie);
> @@ -443,6 +449,9 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
> struct iommu_dma_cookie *cookie = domain->iova_cookie;
> struct iommu_dma_msi_page *msi, *tmp;
>
> + if (domain->sw_msi != iommu_dma_sw_msi)
> + return;
> +
> if (!cookie)
> return;
>
> @@ -1800,33 +1809,19 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
> return NULL;
> }
>
> -/**
> - * iommu_dma_prepare_msi() - Map the MSI page in the IOMMU domain
> - * @desc: MSI descriptor, will store the MSI page
> - * @msi_addr: MSI target address to be mapped
> - *
> - * Return: 0 on success or negative error code if the mapping failed.
> - */
> -int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
> +static int iommu_dma_sw_msi(struct iommu_domain *domain, struct msi_desc *desc,
> + phys_addr_t msi_addr)
> {
> struct device *dev = msi_desc_to_dev(desc);
> - struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> - struct iommu_dma_msi_page *msi_page;
> - static DEFINE_MUTEX(msi_prepare_lock); /* see below */
> + const struct iommu_dma_msi_page *msi_page;
>
> - if (!domain || !domain->iova_cookie) {
> + if (!domain->iova_cookie) {
> msi_desc_set_iommu_msi_iova(desc, 0, 0);
> return 0;
> }
>
> - /*
> - * In fact the whole prepare operation should already be serialised by
> - * irq_domain_mutex further up the callchain, but that's pretty subtle
> - * on its own, so consider this locking as failsafe documentation...
> - */
> - mutex_lock(&msi_prepare_lock);
> + iommu_group_mutex_assert(dev);
> msi_page = iommu_dma_get_msi_page(dev, msi_addr, domain);
> - mutex_unlock(&msi_prepare_lock);
> if (!msi_page)
> return -ENOMEM;
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 870c3cdbd0f6..022bf96a18c5 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -3596,3 +3596,32 @@ int iommu_replace_group_handle(struct iommu_group *group,
> return ret;
> }
> EXPORT_SYMBOL_NS_GPL(iommu_replace_group_handle, "IOMMUFD_INTERNAL");
> +
> +#if IS_ENABLED(CONFIG_IRQ_MSI_IOMMU)
> +/**
> + * iommu_dma_prepare_msi() - Map the MSI page in the IOMMU domain
> + * @desc: MSI descriptor, will store the MSI page
> + * @msi_addr: MSI target address to be mapped
> + *
> + * The implementation of sw_msi() should take msi_addr and map it to
> + * an IOVA in the domain and call msi_desc_set_iommu_msi_iova() with the
> + * mapping information.
> + *
> + * Return: 0 on success or negative error code if the mapping failed.
> + */
> +int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
> +{
> + struct device *dev = msi_desc_to_dev(desc);
> + struct iommu_group *group = dev->iommu_group;
> + int ret = 0;
> +
> + if (!group)
> + return 0;
> +
> + mutex_lock(&group->mutex);
> + if (group->domain && group->domain->sw_msi)
> + ret = group->domain->sw_msi(group->domain, desc, msi_addr);
> + mutex_unlock(&group->mutex);
> + return ret;
> +}
> +#endif /* CONFIG_IRQ_MSI_IOMMU */
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 3/7] iommu: Make iommu_dma_prepare_msi() into a generic operation
2025-02-21 15:39 ` Robin Murphy
@ 2025-02-21 16:44 ` Jason Gunthorpe
2025-02-27 11:21 ` Robin Murphy
0 siblings, 1 reply; 34+ messages in thread
From: Jason Gunthorpe @ 2025-02-21 16:44 UTC (permalink / raw)
To: Robin Murphy
Cc: Nicolin Chen, kevin.tian, tglx, maz, joro, will, shuah, iommu,
linux-kernel, linux-arm-kernel, linux-kselftest, eric.auger,
baolu.lu, yi.l.liu, yury.norov, jacob.pan, patches
On Fri, Feb 21, 2025 at 03:39:45PM +0000, Robin Murphy wrote:
> Yuck. Realistically we are going to have no more than two different
> implementations of this; a fiddly callback interface seems overkill. All we
> should need in the domain is a simple indicator of *which* MSI translation
> scheme is in use (if it can't be squeezed into the domain type itself), then
> iommu_dma_prepare_msi() can simply dispatch between iommu-dma and IOMMUFD
> based on that, and then it's easy to solve all the other fragility issues
> too.
That would make module dependency problems, we have so far avoided
having the core kernel hard depend on iommufd.
Jason
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 3/7] iommu: Make iommu_dma_prepare_msi() into a generic operation
2025-02-21 16:44 ` Jason Gunthorpe
@ 2025-02-27 11:21 ` Robin Murphy
2025-02-27 15:32 ` Jason Gunthorpe
0 siblings, 1 reply; 34+ messages in thread
From: Robin Murphy @ 2025-02-27 11:21 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Nicolin Chen, kevin.tian, tglx, maz, joro, will, shuah, iommu,
linux-kernel, linux-arm-kernel, linux-kselftest, eric.auger,
baolu.lu, yi.l.liu, yury.norov, jacob.pan, patches
On 2025-02-21 4:44 pm, Jason Gunthorpe wrote:
> On Fri, Feb 21, 2025 at 03:39:45PM +0000, Robin Murphy wrote:
>> Yuck. Realistically we are going to have no more than two different
>> implementations of this; a fiddly callback interface seems overkill. All we
>> should need in the domain is a simple indicator of *which* MSI translation
>> scheme is in use (if it can't be squeezed into the domain type itself), then
>> iommu_dma_prepare_msi() can simply dispatch between iommu-dma and IOMMUFD
>> based on that, and then it's easy to solve all the other fragility issues
>> too.
>
> That would make module dependency problems, we have so far avoided
> having the core kernel hard depend on iommufd.
It wouldn't need a hard dependency, it's easy to have a trivial built-in
stub function which becomes valid once the module loads - you literally
have the iommufd_driver infrastructure for precisely that sort of thing
already. All I'm saying is to hide the callback detail in the IOMMUFD
code because being IOMMUFD modular is unique to IOMMUFD and not the rest
of the core code's problem.
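
For readers following along, the "built-in stub that becomes valid once the module loads" idea can be sketched in plain userspace C as below. This is only an illustration under stated assumptions: every example_* name is hypothetical, the structures are reduced stand-ins rather than the kernel's definitions, and a real kernel version would also need locking or RCU around the hook pointer, which is omitted here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures, reduced for the sketch. */
enum example_owner { EXAMPLE_OWNER_DMA, EXAMPLE_OWNER_IOMMUFD };

struct example_domain { enum example_owner owner; };
struct example_msi_desc { uint64_t iova; };

typedef int (*example_sw_msi_fn)(struct example_domain *domain,
				 struct example_msi_desc *desc,
				 uint64_t msi_addr);

/* Built-in side: the hook stays NULL until the modular code registers it. */
static example_sw_msi_fn example_module_hook;

void example_register_hook(example_sw_msi_fn fn) { example_module_hook = fn; }
void example_unregister_hook(void) { example_module_hook = NULL; }

/* Stand-in for the built-in (dma-iommu style) path: identity map for show. */
static int example_builtin_sw_msi(struct example_domain *domain,
				  struct example_msi_desc *desc,
				  uint64_t msi_addr)
{
	(void)domain;
	desc->iova = msi_addr;
	return 0;
}

/* Stand-in for the modular path: always reports one fixed, made-up IOVA. */
int example_module_sw_msi(struct example_domain *domain,
			  struct example_msi_desc *desc, uint64_t msi_addr)
{
	(void)domain;
	(void)msi_addr;
	desc->iova = 0x8000000;
	return 0;
}

/* Core dispatch: pick the scheme from an indicator stored in the domain. */
int example_prepare_msi(struct example_domain *domain,
			struct example_msi_desc *desc, uint64_t msi_addr)
{
	assert(domain && desc);

	switch (domain->owner) {
	case EXAMPLE_OWNER_DMA:
		return example_builtin_sw_msi(domain, desc, msi_addr);
	case EXAMPLE_OWNER_IOMMUFD:
		/* Module not loaded: nothing can service this domain. */
		if (!example_module_hook)
			return -ENODEV;
		return example_module_hook(domain, desc, msi_addr);
	}
	return -EINVAL;
}
```

The point of the shape is that the callback pointer lives next to the modular code's registration helpers instead of in every domain, so the domain itself only carries the owner indicator.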
And frankly otherwise, what even is the benefit of moving the
iova_cookie pointer into the union if we have to replace it with another
whole pointer to make it work? This is just adding more code and more
complexity in order to make struct iommu_domain... the same size it
already is :/
Thanks,
Robin.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 3/7] iommu: Make iommu_dma_prepare_msi() into a generic operation
2025-02-27 11:21 ` Robin Murphy
@ 2025-02-27 15:32 ` Jason Gunthorpe
2025-02-27 17:46 ` Nicolin Chen
0 siblings, 1 reply; 34+ messages in thread
From: Jason Gunthorpe @ 2025-02-27 15:32 UTC (permalink / raw)
To: Robin Murphy
Cc: Nicolin Chen, kevin.tian, tglx, maz, joro, will, shuah, iommu,
linux-kernel, linux-arm-kernel, linux-kselftest, eric.auger,
baolu.lu, yi.l.liu, yury.norov, jacob.pan, patches
On Thu, Feb 27, 2025 at 11:21:28AM +0000, Robin Murphy wrote:
> It wouldn't need a hard dependency, it's easy to have a trivial built-in
> stub function which becomes valid once the module loads - you literally have
> the iommufd_driver infrastructure for precisely that sort of thing already.
Yes, but I also kinda dislike using it because it bloats the built in
kernel for a narrow use case..
> All I'm saying is to hide the callback detail in the IOMMUFD code because
> being IOMMUFD modular is unique to IOMMUFD and not the rest of the core
> code's problem.
Maybe we could use a global function pointer set/cleared on iommufd
module load?
Regardless, we need to first find a way for the core code to tell if
the domain is iommufd owned or not.
We should also make it so we can tell if dma-iommu.c is linked to that
domain (eg vfio or the default_domain), then we can do the iova_cookie
move without changing the destruction flows. This would be the missing
union struct tag you mentioned in the other email.
What I've been thinking of is changing type into flags. I think we
have now removed type from all drivers so this should be a small
enough work.
Nicolin should be able to look into some followup here, it is not a
small change.
> And frankly otherwise, what even is the benefit of moving the iova_cookie
> pointer into the union if we have to replace it with another whole pointer
> to make it work?
It makes a lot more semantic sense that the domain owners all share a
single "private data" pointer.
> This is just adding more code and more complexity in
> order to make struct iommu_domain... the same size it already is :/
That we get back the space we spent on sw_msi is a nice bonus.
Jason
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 3/7] iommu: Make iommu_dma_prepare_msi() into a generic operation
2025-02-27 15:32 ` Jason Gunthorpe
@ 2025-02-27 17:46 ` Nicolin Chen
2025-02-27 19:47 ` Jason Gunthorpe
0 siblings, 1 reply; 34+ messages in thread
From: Nicolin Chen @ 2025-02-27 17:46 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Robin Murphy, kevin.tian, tglx, maz, joro, will, shuah, iommu,
linux-kernel, linux-arm-kernel, linux-kselftest, eric.auger,
baolu.lu, yi.l.liu, yury.norov, jacob.pan, patches
On Thu, Feb 27, 2025 at 11:32:42AM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 27, 2025 at 11:21:28AM +0000, Robin Murphy wrote:
> > All I'm saying is to hide the callback detail in the IOMMUFD code because
> > being IOMMUFD modular is unique to IOMMUFD and not the rest of the core
> > code's problem.
>
> Maybe we could use a global function pointer set/cleared on iommufd
> module load?
>
> Regardless, we need to first find a way for the core code to tell if
> the domain is iommufd owned or not.
>
> We should also make it so we can tell if dma-iommu.c is linked to that
> domain (eg vfio or the default_domain), then we can do the iova_cookie
> move without changing the destruction flows. This would be the missing
> union struct tag you mentioned in the other email.
>
> What I've been thinking of is changing type into flags. I think we
> have now removed type from all drivers so this should be a small
> enough work.
>
> Nicolin should be able to look into some followup here, it is not a
> small change.
>
> > And frankly otherwise, what even is the benefit of moving the iova_cookie
> > pointer into the union if we have to replace it with another whole pointer
> > to make it work?
>
> It makes a lot more semantic sense that the domain owners all share a
> single "private data" pointer.
I found it a bit confusing to use "owner" as the domain->owner isn't
the same thing in this context. Maybe it should be "driver_ops"?
Then, "owner" could be another op structure that holds the owner-
specific things, such as:
    enum iommu_domain_owner { DMA/VFIO/IOMMUFD }; // or flag?
    union {
        iova_cookie;    // DMA
        msi_cookie;     // VFIO
        iommufd_hwpt;   // IOMMUFD
    }
    (*sw_msi);
?
Thanks
Nicolin
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 3/7] iommu: Make iommu_dma_prepare_msi() into a generic operation
2025-02-27 17:46 ` Nicolin Chen
@ 2025-02-27 19:47 ` Jason Gunthorpe
0 siblings, 0 replies; 34+ messages in thread
From: Jason Gunthorpe @ 2025-02-27 19:47 UTC (permalink / raw)
To: Nicolin Chen
Cc: Robin Murphy, kevin.tian, tglx, maz, joro, will, shuah, iommu,
linux-kernel, linux-arm-kernel, linux-kselftest, eric.auger,
baolu.lu, yi.l.liu, yury.norov, jacob.pan, patches
On Thu, Feb 27, 2025 at 09:46:55AM -0800, Nicolin Chen wrote:
> I found it a bit confusing to use "owner" as the domain->owner isn't
> the same thing in this context. Maybe it should be "driver_ops"?
Maybe, but I wouldn't churn it
> Then, "owner" could be another op structure that holds the owner-
> specific things, such as:
> enum iommu_domain_owner { DMA/VFIO/IOMMUFD}; // or flag?
I was thinking about breaking type into something like this:
    u32 private_data_owner:2  // DMA/IOMMUFD/None
    u32 translation_type:3    // paging/identity/sva/platform/blocked/nested
    u32 dma_fq:1              // true/false
    u32 dma_api_domain:1      // true/false
Which is close to how it already is with just some breaking up of the
bits differently.. Get rid of the word unmanaged and drop the
IOMMU_DOMAIN_* defines.
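
As a rough, compilable illustration of that breakup (a userspace sketch only; the example_* enums, the struct name and the helper functions are hypothetical and not taken from any posted patch), the four fields could look like this:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical value names for the two multi-bit fields. */
enum example_owner {
	EXAMPLE_OWNER_NONE,
	EXAMPLE_OWNER_DMA,
	EXAMPLE_OWNER_IOMMUFD,
};

enum example_translation {
	EXAMPLE_TT_PAGING,
	EXAMPLE_TT_IDENTITY,
	EXAMPLE_TT_SVA,
	EXAMPLE_TT_PLATFORM,
	EXAMPLE_TT_BLOCKED,
	EXAMPLE_TT_NESTED,
};

/* Seven bits in total, replacing a single combined type word. */
struct example_domain_flags {
	unsigned int private_data_owner : 2;	/* enum example_owner */
	unsigned int translation_type : 3;	/* enum example_translation */
	unsigned int dma_fq : 1;
	unsigned int dma_api_domain : 1;
};

struct example_domain_flags example_make_flags(enum example_owner owner,
					       enum example_translation type,
					       bool dma_api)
{
	struct example_domain_flags f = { 0 };

	/* Six translation types fit in the 3-bit field. */
	assert(type <= EXAMPLE_TT_NESTED);
	f.private_data_owner = owner;
	f.translation_type = type;
	f.dma_api_domain = dma_api;
	return f;
}

/* The owner field acts as the tag for the shared private-data pointer. */
bool example_has_dma_cookie(const struct example_domain_flags *f)
{
	return f->private_data_owner == EXAMPLE_OWNER_DMA;
}
```

With an explicit owner field, a teardown path could check example_has_dma_cookie() instead of comparing a function pointer, which is the kind of "union tag" discussed earlier in the thread.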
I also wanted to separate the "policy" enum that determines which of
the three default domains you get from the type. Lots of type
combinations are not allowed as policy.
Jason
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v2 4/7] irqchip: Have CONFIG_IRQ_MSI_IOMMU be selected by irqchips that need it
2025-02-20 1:31 [PATCH v2 0/7] iommu: Add MSI mapping support with nested SMMU (Part-1 core) Nicolin Chen
` (2 preceding siblings ...)
2025-02-20 1:31 ` [PATCH v2 3/7] iommu: Make iommu_dma_prepare_msi() into a generic operation Nicolin Chen
@ 2025-02-20 1:31 ` Nicolin Chen
2025-02-21 9:30 ` Thomas Gleixner
2025-02-21 14:48 ` Jason Gunthorpe
2025-02-20 1:31 ` [PATCH v2 5/7] iommu: Turn fault_data to iommufd private pointer Nicolin Chen
` (3 subsequent siblings)
7 siblings, 2 replies; 34+ messages in thread
From: Nicolin Chen @ 2025-02-20 1:31 UTC (permalink / raw)
To: jgg, kevin.tian, tglx, maz
Cc: joro, will, robin.murphy, shuah, iommu, linux-kernel,
linux-arm-kernel, linux-kselftest, eric.auger, baolu.lu, yi.l.liu,
yury.norov, jacob.pan, patches
From: Jason Gunthorpe <jgg@nvidia.com>
Currently, IRQ_MSI_IOMMU is selected if DMA_IOMMU is available to provide
an implementation for iommu_dma_prepare/compose_msi_msg(). However, it'll
make more sense for irqchips that call prepare/compose to select it, and
that will trigger all the additional code and data to be compiled into
the kernel.
If IRQ_MSI_IOMMU is selected with no IOMMU side implementation, then the
prepare/compose() will be NOP stubs.
If IRQ_MSI_IOMMU is not selected by an irqchip, then the related code on
the iommu side is compiled out.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/Kconfig | 1 -
drivers/irqchip/Kconfig | 4 ++++
kernel/irq/Kconfig | 1 +
3 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index ec1b5e32b972..5124e7431fe3 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -154,7 +154,6 @@ config IOMMU_DMA
select DMA_OPS_HELPERS
select IOMMU_API
select IOMMU_IOVA
- select IRQ_MSI_IOMMU
select NEED_SG_DMA_LENGTH
select NEED_SG_DMA_FLAGS if SWIOTLB
diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
index c11b9965c4ad..64658a1c3aa1 100644
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -28,6 +28,7 @@ config ARM_GIC_V2M
select ARM_GIC
select IRQ_MSI_LIB
select PCI_MSI
+ select IRQ_MSI_IOMMU
config GIC_NON_BANKED
bool
@@ -38,12 +39,14 @@ config ARM_GIC_V3
select PARTITION_PERCPU
select GENERIC_IRQ_EFFECTIVE_AFF_MASK if SMP
select HAVE_ARM_SMCCC_DISCOVERY
+ select IRQ_MSI_IOMMU
config ARM_GIC_V3_ITS
bool
select GENERIC_MSI_IRQ
select IRQ_MSI_LIB
default ARM_GIC_V3
+ select IRQ_MSI_IOMMU
config ARM_GIC_V3_ITS_FSL_MC
bool
@@ -408,6 +411,7 @@ config LS_EXTIRQ
config LS_SCFG_MSI
def_bool y if SOC_LS1021A || ARCH_LAYERSCAPE
+ select IRQ_MSI_IOMMU
depends on PCI_MSI
config PARTITION_PERCPU
diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
index 5432418c0fea..9636aed20401 100644
--- a/kernel/irq/Kconfig
+++ b/kernel/irq/Kconfig
@@ -100,6 +100,7 @@ config GENERIC_MSI_IRQ
bool
select IRQ_DOMAIN_HIERARCHY
+# irqchip drivers should select this if they call iommu_dma_prepare_msi()
config IRQ_MSI_IOMMU
bool
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH v2 4/7] irqchip: Have CONFIG_IRQ_MSI_IOMMU be selected by irqchips that need it
2025-02-20 1:31 ` [PATCH v2 4/7] irqchip: Have CONFIG_IRQ_MSI_IOMMU be selected by irqchips that need it Nicolin Chen
@ 2025-02-21 9:30 ` Thomas Gleixner
2025-02-21 14:48 ` Jason Gunthorpe
1 sibling, 0 replies; 34+ messages in thread
From: Thomas Gleixner @ 2025-02-21 9:30 UTC (permalink / raw)
To: Nicolin Chen, jgg, kevin.tian, maz
Cc: joro, will, robin.murphy, shuah, iommu, linux-kernel,
linux-arm-kernel, linux-kselftest, eric.auger, baolu.lu, yi.l.liu,
yury.norov, jacob.pan, patches
On Wed, Feb 19 2025 at 17:31, Nicolin Chen wrote:
> From: Jason Gunthorpe <jgg@nvidia.com>
>
> Currently, IRQ_MSI_IOMMU is selected if DMA_IOMMU is available to provide
> an implementation for iommu_dma_prepare/compose_msi_msg(). However, it'll
> make more sense for irqchips that call prepare/compose to select it, and
> that will trigger all the additional code and data to be compiled into
> the kernel.
>
> If IRQ_MSI_IOMMU is selected with no IOMMU side implementation, then the
> prepare/compose() will be NOP stubs.
>
> If IRQ_MSI_IOMMU is not selected by an irqchip, then the related code on
> the iommu side is compiled out.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
I don't think I have conflicting changes here, so the MSI/IRQ related
changes can be routed through the IOMMU tree along with the rest.
Thanks,
tglx
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 4/7] irqchip: Have CONFIG_IRQ_MSI_IOMMU be selected by irqchips that need it
2025-02-20 1:31 ` [PATCH v2 4/7] irqchip: Have CONFIG_IRQ_MSI_IOMMU be selected by irqchips that need it Nicolin Chen
2025-02-21 9:30 ` Thomas Gleixner
@ 2025-02-21 14:48 ` Jason Gunthorpe
1 sibling, 0 replies; 34+ messages in thread
From: Jason Gunthorpe @ 2025-02-21 14:48 UTC (permalink / raw)
To: Nicolin Chen
Cc: kevin.tian, tglx, maz, joro, will, robin.murphy, shuah, iommu,
linux-kernel, linux-arm-kernel, linux-kselftest, eric.auger,
baolu.lu, yi.l.liu, yury.norov, jacob.pan, patches
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -154,7 +154,6 @@ config IOMMU_DMA
> select DMA_OPS_HELPERS
> select IOMMU_API
> select IOMMU_IOVA
> - select IRQ_MSI_IOMMU
> select NEED_SG_DMA_LENGTH
> select NEED_SG_DMA_FLAGS if SWIOTLB
Because of the above this patch needs to add:
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -449,8 +449,10 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iommu_dma_msi_page *msi, *tmp;
+#if IS_ENABLED(CONFIG_IRQ_MSI_IOMMU)
if (domain->sw_msi != iommu_dma_sw_msi)
return;
+#endif
if (!cookie)
return;
I fixed it up
I think the above if can be deleted with the sketch I showed in the
last email since the put_dma_cookie will only ever be called on the
default domain or on the vfio domain which guarantees it is not
iommufd or something else using the union.
Jason
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v2 5/7] iommu: Turn fault_data to iommufd private pointer
2025-02-20 1:31 [PATCH v2 0/7] iommu: Add MSI mapping support with nested SMMU (Part-1 core) Nicolin Chen
` (3 preceding siblings ...)
2025-02-20 1:31 ` [PATCH v2 4/7] irqchip: Have CONFIG_IRQ_MSI_IOMMU be selected by irqchips that need it Nicolin Chen
@ 2025-02-20 1:31 ` Nicolin Chen
2025-02-20 17:50 ` Jason Gunthorpe
2025-02-20 1:31 ` [PATCH v2 6/7] iommufd: Implement sw_msi support natively Nicolin Chen
` (2 subsequent siblings)
7 siblings, 1 reply; 34+ messages in thread
From: Nicolin Chen @ 2025-02-20 1:31 UTC (permalink / raw)
To: jgg, kevin.tian, tglx, maz
Cc: joro, will, robin.murphy, shuah, iommu, linux-kernel,
linux-arm-kernel, linux-kselftest, eric.auger, baolu.lu, yi.l.liu,
yury.norov, jacob.pan, patches
A "fault_data" was added exclusively for the iommufd_fault_iopf_handler()
used by IOPF/PRI use cases, along with the attach_handle. Now, the iommufd
version of the sw_msi function will reuse the attach_handle and fault_data
for a non-fault case.
Rename "fault_data" to "iommufd_hwpt" so as not to confine it to a "fault"
case. Move it into a union to be the iommufd private pointer. A following
patch will move the iova_cookie to the union for dma-iommu too after the
iommufd_sw_msi implementation is added.
Since we have two unions now, add some simple comments for readability.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
include/linux/iommu.h | 6 ++++--
drivers/iommu/iommufd/fault.c | 2 +-
drivers/iommu/iommufd/hw_pagetable.c | 2 +-
3 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 761c5e186de9..e93d2e918599 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -224,8 +224,10 @@ struct iommu_domain {
phys_addr_t msi_addr);
#endif
- void *fault_data;
- union {
+ union { /* Pointer usable by owner of the domain */
+ struct iommufd_hw_pagetable *iommufd_hwpt; /* iommufd */
+ };
+ union { /* Fault handler */
struct {
iommu_fault_handler_t handler;
void *handler_token;
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
index 931a3fbe6e32..c48d72c9668c 100644
--- a/drivers/iommu/iommufd/fault.c
+++ b/drivers/iommu/iommufd/fault.c
@@ -329,7 +329,7 @@ int iommufd_fault_iopf_handler(struct iopf_group *group)
struct iommufd_hw_pagetable *hwpt;
struct iommufd_fault *fault;
- hwpt = group->attach_handle->domain->fault_data;
+ hwpt = group->attach_handle->domain->iommufd_hwpt;
fault = hwpt->fault;
spin_lock(&fault->lock);
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 598be26a14e2..2641d50f46cf 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -406,10 +406,10 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
}
hwpt->fault = fault;
hwpt->domain->iopf_handler = iommufd_fault_iopf_handler;
- hwpt->domain->fault_data = hwpt;
refcount_inc(&fault->obj.users);
iommufd_put_object(ucmd->ictx, &fault->obj);
}
+ hwpt->domain->iommufd_hwpt = hwpt;
cmd->out_hwpt_id = hwpt->obj.id;
rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH v2 5/7] iommu: Turn fault_data to iommufd private pointer
2025-02-20 1:31 ` [PATCH v2 5/7] iommu: Turn fault_data to iommufd private pointer Nicolin Chen
@ 2025-02-20 17:50 ` Jason Gunthorpe
0 siblings, 0 replies; 34+ messages in thread
From: Jason Gunthorpe @ 2025-02-20 17:50 UTC (permalink / raw)
To: Nicolin Chen
Cc: kevin.tian, tglx, maz, joro, will, robin.murphy, shuah, iommu,
linux-kernel, linux-arm-kernel, linux-kselftest, eric.auger,
baolu.lu, yi.l.liu, yury.norov, jacob.pan, patches
On Wed, Feb 19, 2025 at 05:31:40PM -0800, Nicolin Chen wrote:
> A "fault_data" was added exclusively for the iommufd_fault_iopf_handler()
> used by IOPF/PRI use cases, along with the attach_handle. Now, the iommufd
> version of the sw_msi function will reuse the attach_handle and fault_data
> for a non-fault case.
>
> Rename "fault_data" to "iommufd_hwpt" so as not to confine it to a "fault"
> case. Move it into a union to be the iommufd private pointer. A following
> patch will move the iova_cookie to the union for dma-iommu too after the
> iommufd_sw_msi implementation is added.
>
> Since we have two unions now, add some simple comments for readability.
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
> include/linux/iommu.h | 6 ++++--
> drivers/iommu/iommufd/fault.c | 2 +-
> drivers/iommu/iommufd/hw_pagetable.c | 2 +-
> 3 files changed, 6 insertions(+), 4 deletions(-)
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v2 6/7] iommufd: Implement sw_msi support natively
2025-02-20 1:31 [PATCH v2 0/7] iommu: Add MSI mapping support with nested SMMU (Part-1 core) Nicolin Chen
` (4 preceding siblings ...)
2025-02-20 1:31 ` [PATCH v2 5/7] iommu: Turn fault_data to iommufd private pointer Nicolin Chen
@ 2025-02-20 1:31 ` Nicolin Chen
2025-02-21 14:51 ` Jason Gunthorpe
2025-02-27 19:33 ` Jason Gunthorpe
2025-02-20 1:31 ` [PATCH v2 7/7] iommu: Turn iova_cookie to dma-iommu private pointer Nicolin Chen
2025-02-21 14:59 ` [PATCH v2 0/7] iommu: Add MSI mapping support with nested SMMU (Part-1 core) Jason Gunthorpe
7 siblings, 2 replies; 34+ messages in thread
From: Nicolin Chen @ 2025-02-20 1:31 UTC (permalink / raw)
To: jgg, kevin.tian, tglx, maz
Cc: joro, will, robin.murphy, shuah, iommu, linux-kernel,
linux-arm-kernel, linux-kselftest, eric.auger, baolu.lu, yi.l.liu,
yury.norov, jacob.pan, patches
From: Jason Gunthorpe <jgg@nvidia.com>
iommufd has a model where the iommu_domain can be changed while the VFIO
device is attached. In this case, the MSI should continue to work. This
corner case has not worked because the dma-iommu implementation of sw_msi
is tied to a single domain.
Implement the sw_msi mapping directly and use a global per-fd table to
associate assigned IOVA to the MSI pages. This allows the MSI pages to
be loaded into a domain before it is attached ensuring that MSI is not
disrupted.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/iommufd/iommufd_private.h | 23 +++-
drivers/iommu/iommufd/device.c | 160 ++++++++++++++++++++----
drivers/iommu/iommufd/hw_pagetable.c | 3 +
drivers/iommu/iommufd/main.c | 9 ++
4 files changed, 172 insertions(+), 23 deletions(-)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 8e0e3ab64747..246297452a44 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -19,6 +19,22 @@ struct iommu_group;
struct iommu_option;
struct iommufd_device;
+struct iommufd_sw_msi_map {
+ struct list_head sw_msi_item;
+ phys_addr_t sw_msi_start;
+ phys_addr_t msi_addr;
+ unsigned int pgoff;
+ unsigned int id;
+};
+
+/* Bitmap of struct iommufd_sw_msi_map::id */
+struct iommufd_sw_msi_maps {
+ DECLARE_BITMAP(bitmap, 64);
+};
+
+int iommufd_sw_msi(struct iommu_domain *domain, struct msi_desc *desc,
+ phys_addr_t msi_addr);
+
struct iommufd_ctx {
struct file *file;
struct xarray objects;
@@ -26,6 +42,10 @@ struct iommufd_ctx {
wait_queue_head_t destroy_wait;
struct rw_semaphore ioas_creation_lock;
+ struct mutex sw_msi_lock;
+ struct list_head sw_msi_list;
+ unsigned int sw_msi_id;
+
u8 account_mode;
/* Compatibility with VFIO no iommu */
u8 no_iommu_mode;
@@ -283,10 +303,10 @@ struct iommufd_hwpt_paging {
struct iommufd_ioas *ioas;
bool auto_domain : 1;
bool enforce_cache_coherency : 1;
- bool msi_cookie : 1;
bool nest_parent : 1;
/* Head at iommufd_ioas::hwpt_list */
struct list_head hwpt_item;
+ struct iommufd_sw_msi_maps present_sw_msi;
};
struct iommufd_hwpt_nested {
@@ -383,6 +403,7 @@ struct iommufd_group {
struct iommu_group *group;
struct iommufd_hw_pagetable *hwpt;
struct list_head device_list;
+ struct iommufd_sw_msi_maps required_sw_msi;
phys_addr_t sw_msi_start;
};
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 0786290b4056..d03c7f9e9530 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -5,6 +5,7 @@
#include <linux/iommufd.h>
#include <linux/slab.h>
#include <uapi/linux/iommufd.h>
+#include <linux/msi.h>
#include "../iommu-priv.h"
#include "io_pagetable.h"
@@ -293,36 +294,151 @@ u32 iommufd_device_to_id(struct iommufd_device *idev)
}
EXPORT_SYMBOL_NS_GPL(iommufd_device_to_id, "IOMMUFD");
+/*
+ * Get a iommufd_sw_msi_map for the msi physical address requested by the irq
+ * layer. The mapping to IOVA is global to the iommufd file descriptor, every
+ * domain that is attached to a device using the same MSI parameters will use
+ * the same IOVA.
+ */
+static struct iommufd_sw_msi_map *
+iommufd_sw_msi_get_map(struct iommufd_ctx *ictx, phys_addr_t msi_addr,
+ phys_addr_t sw_msi_start)
+{
+ struct iommufd_sw_msi_map *cur;
+ unsigned int max_pgoff = 0;
+
+ lockdep_assert_held(&ictx->sw_msi_lock);
+
+ list_for_each_entry(cur, &ictx->sw_msi_list, sw_msi_item) {
+ if (cur->sw_msi_start != sw_msi_start)
+ continue;
+ max_pgoff = max(max_pgoff, cur->pgoff + 1);
+ if (cur->msi_addr == msi_addr)
+ return cur;
+ }
+
+ if (ictx->sw_msi_id >=
+ BITS_PER_BYTE * sizeof_field(struct iommufd_sw_msi_maps, bitmap))
+ return ERR_PTR(-EOVERFLOW);
+
+ cur = kzalloc(sizeof(*cur), GFP_KERNEL);
+ if (!cur)
+ cur = ERR_PTR(-ENOMEM);
+ cur->sw_msi_start = sw_msi_start;
+ cur->msi_addr = msi_addr;
+ cur->pgoff = max_pgoff;
+ cur->id = ictx->sw_msi_id++;
+ list_add_tail(&cur->sw_msi_item, &ictx->sw_msi_list);
+ return cur;
+}
+
+static int iommufd_sw_msi_install(struct iommufd_ctx *ictx,
+ struct iommufd_hwpt_paging *hwpt_paging,
+ struct iommufd_sw_msi_map *msi_map)
+{
+ unsigned long iova;
+
+ lockdep_assert_held(&ictx->sw_msi_lock);
+
+ iova = msi_map->sw_msi_start + msi_map->pgoff * PAGE_SIZE;
+ if (!test_bit(msi_map->id, hwpt_paging->present_sw_msi.bitmap)) {
+ int rc;
+
+ rc = iommu_map(hwpt_paging->common.domain, iova,
+ msi_map->msi_addr, PAGE_SIZE,
+ IOMMU_WRITE | IOMMU_READ | IOMMU_MMIO,
+ GFP_KERNEL_ACCOUNT);
+ if (rc)
+ return rc;
+ __set_bit(msi_map->id, hwpt_paging->present_sw_msi.bitmap);
+ }
+ return 0;
+}
+
+/*
+ * Called by the irq code if the platform translates the MSI address through the
+ * IOMMU. msi_addr is the physical address of the MSI page. iommufd will
+ * allocate a fd global iova for the physical page that is the same on all
+ * domains and devices.
+ */
+#ifdef CONFIG_IRQ_MSI_IOMMU
+int iommufd_sw_msi(struct iommu_domain *domain, struct msi_desc *desc,
+ phys_addr_t msi_addr)
+{
+ struct device *dev = msi_desc_to_dev(desc);
+ struct iommufd_hwpt_paging *hwpt_paging;
+ struct iommu_attach_handle *raw_handle;
+ struct iommufd_attach_handle *handle;
+ struct iommufd_sw_msi_map *msi_map;
+ struct iommufd_ctx *ictx;
+ unsigned long iova;
+ int rc;
+
+ /*
+ * It is safe to call iommu_attach_handle_get() here because the iommu
+ * core code invokes this under the group mutex which also prevents any
+ * change of the attach handle for the duration of this function.
+ */
+ iommu_group_mutex_assert(dev);
+
+ raw_handle =
+ iommu_attach_handle_get(dev->iommu_group, IOMMU_NO_PASID, 0);
+ if (IS_ERR(raw_handle))
+ return 0;
+ hwpt_paging = find_hwpt_paging(domain->iommufd_hwpt);
+
+ handle = to_iommufd_handle(raw_handle);
+ /* No IOMMU_RESV_SW_MSI means no change to the msi_msg */
+ if (handle->idev->igroup->sw_msi_start == PHYS_ADDR_MAX)
+ return 0;
+
+ ictx = handle->idev->ictx;
+ guard(mutex)(&ictx->sw_msi_lock);
+ /*
+ * The input msi_addr is the exact byte offset of the MSI doorbell, we
+ * assume the caller has checked that it is contained with a MMIO region
+ * that is secure to map at PAGE_SIZE.
+ */
+ msi_map = iommufd_sw_msi_get_map(handle->idev->ictx,
+ msi_addr & PAGE_MASK,
+ handle->idev->igroup->sw_msi_start);
+ if (IS_ERR(msi_map))
+ return PTR_ERR(msi_map);
+
+ rc = iommufd_sw_msi_install(ictx, hwpt_paging, msi_map);
+ if (rc)
+ return rc;
+ __set_bit(msi_map->id, handle->idev->igroup->required_sw_msi.bitmap);
+
+ iova = msi_map->sw_msi_start + msi_map->pgoff * PAGE_SIZE;
+ msi_desc_set_iommu_msi_iova(desc, iova, PAGE_SHIFT);
+ return 0;
+}
+#endif
+
static int iommufd_group_setup_msi(struct iommufd_group *igroup,
struct iommufd_hwpt_paging *hwpt_paging)
{
- phys_addr_t sw_msi_start = igroup->sw_msi_start;
- int rc;
+ struct iommufd_ctx *ictx = igroup->ictx;
+ struct iommufd_sw_msi_map *cur;
+
+ if (igroup->sw_msi_start == PHYS_ADDR_MAX)
+ return 0;
/*
- * If the IOMMU driver gives a IOMMU_RESV_SW_MSI then it is asking us to
- * call iommu_get_msi_cookie() on its behalf. This is necessary to setup
- * the MSI window so iommu_dma_prepare_msi() can install pages into our
- * domain after request_irq(). If it is not done interrupts will not
- * work on this domain.
- *
- * FIXME: This is conceptually broken for iommufd since we want to allow
- * userspace to change the domains, eg switch from an identity IOAS to a
- * DMA IOAS. There is currently no way to create a MSI window that
- * matches what the IRQ layer actually expects in a newly created
- * domain.
+ * Install all the MSI pages the device has been using into the domain
*/
- if (sw_msi_start != PHYS_ADDR_MAX && !hwpt_paging->msi_cookie) {
- rc = iommu_get_msi_cookie(hwpt_paging->common.domain,
- sw_msi_start);
+ guard(mutex)(&ictx->sw_msi_lock);
+ list_for_each_entry(cur, &ictx->sw_msi_list, sw_msi_item) {
+ int rc;
+
+ if (cur->sw_msi_start != igroup->sw_msi_start ||
+ !test_bit(cur->id, igroup->required_sw_msi.bitmap))
+ continue;
+
+ rc = iommufd_sw_msi_install(ictx, hwpt_paging, cur);
if (rc)
return rc;
-
- /*
- * iommu_get_msi_cookie() can only be called once per domain,
- * it returns -EBUSY on later calls.
- */
- hwpt_paging->msi_cookie = true;
}
return 0;
}
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 2641d50f46cf..7de6e914232e 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -156,6 +156,7 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
goto out_abort;
}
}
+ iommu_domain_set_sw_msi(hwpt->domain, iommufd_sw_msi);
/*
* Set the coherency mode before we do iopt_table_add_domain() as some
@@ -251,6 +252,7 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
goto out_abort;
}
hwpt->domain->owner = ops;
+ iommu_domain_set_sw_msi(hwpt->domain, iommufd_sw_msi);
if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED)) {
rc = -EINVAL;
@@ -307,6 +309,7 @@ iommufd_viommu_alloc_hwpt_nested(struct iommufd_viommu *viommu, u32 flags,
goto out_abort;
}
hwpt->domain->owner = viommu->iommu_dev->ops;
+ iommu_domain_set_sw_msi(hwpt->domain, iommufd_sw_msi);
if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED)) {
rc = -EINVAL;
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index ccf616462a1c..b6fa9fd11bc1 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -227,6 +227,8 @@ static int iommufd_fops_open(struct inode *inode, struct file *filp)
xa_init(&ictx->groups);
ictx->file = filp;
init_waitqueue_head(&ictx->destroy_wait);
+ mutex_init(&ictx->sw_msi_lock);
+ INIT_LIST_HEAD(&ictx->sw_msi_list);
filp->private_data = ictx;
return 0;
}
@@ -234,6 +236,8 @@ static int iommufd_fops_open(struct inode *inode, struct file *filp)
static int iommufd_fops_release(struct inode *inode, struct file *filp)
{
struct iommufd_ctx *ictx = filp->private_data;
+ struct iommufd_sw_msi_map *next;
+ struct iommufd_sw_msi_map *cur;
struct iommufd_object *obj;
/*
@@ -262,6 +266,11 @@ static int iommufd_fops_release(struct inode *inode, struct file *filp)
break;
}
WARN_ON(!xa_empty(&ictx->groups));
+
+ mutex_destroy(&ictx->sw_msi_lock);
+ list_for_each_entry_safe(cur, next, &ictx->sw_msi_list, sw_msi_item)
+ kfree(cur);
+
kfree(ictx);
return 0;
}
--
2.43.0
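
For illustration, a minimal userspace sketch of the allocation scheme used
by iommufd_sw_msi_get_map() in the patch above. It assumes a fixed-size
array in place of the kernel list and mutex, and 4K pages. All sketch_*
names and SKETCH_PAGE_SIZE are hypothetical and not part of the patch. It
shows that a repeated lookup of the same MSI page returns the same entry,
and that each new page gets the next page offset inside the sw_msi window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumption: 4K pages */
#define SKETCH_MAX_MAPS 64		/* mirrors the 64-bit bitmap in the patch */

/* Hypothetical stand-in for struct iommufd_sw_msi_map */
struct sketch_msi_map {
	uint64_t sw_msi_start;	/* base IOVA of the SW MSI window */
	uint64_t msi_addr;	/* page-aligned physical doorbell address */
	unsigned int pgoff;	/* page offset inside the window */
	unsigned int id;	/* bit number used by the per-domain bitmaps */
};

/* Hypothetical stand-in for the per-fd table kept in struct iommufd_ctx */
struct sketch_ctx {
	struct sketch_msi_map maps[SKETCH_MAX_MAPS];
	unsigned int nr_maps;
};

/* Return the existing entry for msi_addr, or allocate the next page offset. */
static struct sketch_msi_map *sketch_get_map(struct sketch_ctx *ctx,
					     uint64_t msi_addr,
					     uint64_t sw_msi_start)
{
	struct sketch_msi_map *cur;
	unsigned int max_pgoff = 0;
	unsigned int i;

	assert(ctx != NULL);

	for (i = 0; i < ctx->nr_maps; i++) {
		cur = &ctx->maps[i];
		if (cur->sw_msi_start != sw_msi_start)
			continue;
		if (cur->pgoff + 1 > max_pgoff)
			max_pgoff = cur->pgoff + 1;
		if (cur->msi_addr == msi_addr)
			return cur;
	}

	/* Table full: the sketch returns NULL where the patch returns an error */
	if (ctx->nr_maps >= SKETCH_MAX_MAPS)
		return NULL;

	cur = &ctx->maps[ctx->nr_maps];
	cur->sw_msi_start = sw_msi_start;
	cur->msi_addr = msi_addr;
	cur->pgoff = max_pgoff;
	cur->id = ctx->nr_maps++;
	return cur;
}

/* IOVA programmed into the MSI message for this entry */
static uint64_t sketch_map_iova(const struct sketch_msi_map *map)
{
	return map->sw_msi_start + (uint64_t)map->pgoff * SKETCH_PAGE_SIZE;
}
```

Because the table is global to the context rather than to one domain, any
domain attached later can be given the same IOVA-to-page mappings before
the switch happens.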
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH v2 6/7] iommufd: Implement sw_msi support natively
2025-02-20 1:31 ` [PATCH v2 6/7] iommufd: Implement sw_msi support natively Nicolin Chen
@ 2025-02-21 14:51 ` Jason Gunthorpe
2025-02-27 19:33 ` Jason Gunthorpe
1 sibling, 0 replies; 34+ messages in thread
From: Jason Gunthorpe @ 2025-02-21 14:51 UTC (permalink / raw)
To: Nicolin Chen
Cc: kevin.tian, tglx, maz, joro, will, robin.murphy, shuah, iommu,
linux-kernel, linux-arm-kernel, linux-kselftest, eric.auger,
baolu.lu, yi.l.liu, yury.norov, jacob.pan, patches
On Wed, Feb 19, 2025 at 05:31:41PM -0800, Nicolin Chen wrote:
> +/*
> + * Get a iommufd_sw_msi_map for the msi physical address requested by the irq
> + * layer. The mapping to IOVA is global to the iommufd file descriptor, every
> + * domain that is attached to a device using the same MSI parameters will use
> + * the same IOVA.
> + */
> +static struct iommufd_sw_msi_map *
> +iommufd_sw_msi_get_map(struct iommufd_ctx *ictx, phys_addr_t msi_addr,
> + phys_addr_t sw_msi_start)
> +{
This ends up being never called if !CONFIG_IRQ_MSI_IOMMU because the
sw_msi doesn't exist.
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -300,7 +300,7 @@ EXPORT_SYMBOL_NS_GPL(iommufd_device_to_id, "IOMMUFD");
* domain that is attached to a device using the same MSI parameters will use
* the same IOVA.
*/
-static struct iommufd_sw_msi_map *
+static __maybe_unused struct iommufd_sw_msi_map *
iommufd_sw_msi_get_map(struct iommufd_ctx *ictx, phys_addr_t msi_addr,
phys_addr_t sw_msi_start)
{
Fixed it up
Jason
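
For illustration, a small userspace sketch of why the annotation is needed.
It assumes GCC or Clang; SKETCH_MAYBE_UNUSED, SKETCH_FEATURE and the
function names are hypothetical stand-ins for __maybe_unused,
CONFIG_IRQ_MSI_IOMMU and the iommufd functions. A static helper whose only
caller sits under an #ifdef would otherwise trigger an unused-function
warning when the option is off.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's __maybe_unused (GCC/Clang attribute) */
#define SKETCH_MAYBE_UNUSED __attribute__((unused))

/* Helper whose only in-tree caller lives under the #ifdef below */
static SKETCH_MAYBE_UNUSED int sketch_next_pgoff(int pgoff)
{
	assert(pgoff >= 0);
	return pgoff + 1;
}

#ifdef SKETCH_FEATURE	/* hypothetical option, off by default */
int sketch_entry(int pgoff)
{
	return sketch_next_pgoff(pgoff);
}
#endif
```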
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 6/7] iommufd: Implement sw_msi support natively
2025-02-20 1:31 ` [PATCH v2 6/7] iommufd: Implement sw_msi support natively Nicolin Chen
2025-02-21 14:51 ` Jason Gunthorpe
@ 2025-02-27 19:33 ` Jason Gunthorpe
1 sibling, 0 replies; 34+ messages in thread
From: Jason Gunthorpe @ 2025-02-27 19:33 UTC (permalink / raw)
To: Nicolin Chen
Cc: kevin.tian, tglx, maz, joro, will, robin.murphy, shuah, iommu,
linux-kernel, linux-arm-kernel, linux-kselftest, eric.auger,
baolu.lu, yi.l.liu, yury.norov, jacob.pan, patches
On Wed, Feb 19, 2025 at 05:31:41PM -0800, Nicolin Chen wrote:
> + cur = kzalloc(sizeof(*cur), GFP_KERNEL);
> + if (!cur)
> + cur = ERR_PTR(-ENOMEM);
^^^^^^^^^^^^^^^^
> + cur->sw_msi_start = sw_msi_start;
> + cur->msi_addr = msi_addr;
> + cur->pgoff = max_pgoff;
> + cur->id = ictx->sw_msi_id++;
> + list_add_tail(&cur->sw_msi_item, &ictx->sw_msi_list);
> + return cur;
> +}
Dan pointed out this should have been
return ERR_PTR(-ENOMEM);
I fixed it up
Thanks,
Jason
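
For illustration, a minimal userspace sketch of the corrected error path.
The ERR_PTR(), PTR_ERR() and IS_ERR() definitions below are simplified
stand-ins for the kernel helpers, and msi_map_new() with its fail_alloc
parameter is hypothetical, used only to simulate an allocation failure.
The point is that the error pointer is returned immediately instead of
being assigned to cur and then dereferenced.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for the kernel's err.h helpers */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, trimmed-down version of the map entry */
struct msi_map {
	unsigned long msi_addr;
	unsigned int pgoff;
};

/* fail_alloc is a test hook that simulates kzalloc() returning NULL */
static struct msi_map *msi_map_new(unsigned long msi_addr, unsigned int pgoff,
				   int fail_alloc)
{
	struct msi_map *cur = fail_alloc ? NULL : calloc(1, sizeof(*cur));

	if (!cur)
		return ERR_PTR(-ENOMEM);	/* return, never write through it */

	cur->msi_addr = msi_addr;
	cur->pgoff = pgoff;
	return cur;
}

/* Exercise both the success and the failure path */
static void msi_map_selfcheck(void)
{
	struct msi_map *m = msi_map_new(0x1000, 2, 0);

	assert(!IS_ERR(m));
	free(m);
	assert(IS_ERR(msi_map_new(0x1000, 2, 1)));
}
```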
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v2 7/7] iommu: Turn iova_cookie to dma-iommu private pointer
2025-02-20 1:31 [PATCH v2 0/7] iommu: Add MSI mapping support with nested SMMU (Part-1 core) Nicolin Chen
` (5 preceding siblings ...)
2025-02-20 1:31 ` [PATCH v2 6/7] iommufd: Implement sw_msi support natively Nicolin Chen
@ 2025-02-20 1:31 ` Nicolin Chen
2025-02-20 17:50 ` Jason Gunthorpe
2025-02-21 14:39 ` Jason Gunthorpe
2025-02-21 14:59 ` [PATCH v2 0/7] iommu: Add MSI mapping support with nested SMMU (Part-1 core) Jason Gunthorpe
7 siblings, 2 replies; 34+ messages in thread
From: Nicolin Chen @ 2025-02-20 1:31 UTC (permalink / raw)
To: jgg, kevin.tian, tglx, maz
Cc: joro, will, robin.murphy, shuah, iommu, linux-kernel,
linux-arm-kernel, linux-kselftest, eric.auger, baolu.lu, yi.l.liu,
yury.norov, jacob.pan, patches
Now that iommufd does not rely on dma-iommu.c for any purpose, we can
combine the dma-iommu.c iova_cookie and the iommufd_hwpt under the same
union. This union is effectively 'owner data' and can be used by the
entity that allocated the domain. Note that legacy vfio type1 flows
continue to use dma-iommu.c for sw_msi and still need iova_cookie.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
include/linux/iommu.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index e93d2e918599..99dd72998cb7 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -216,7 +216,6 @@ struct iommu_domain {
const struct iommu_ops *owner; /* Whose domain_alloc we came from */
unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
struct iommu_domain_geometry geometry;
- struct iommu_dma_cookie *iova_cookie;
int (*iopf_handler)(struct iopf_group *group);
#if IS_ENABLED(CONFIG_IRQ_MSI_IOMMU)
@@ -225,6 +224,7 @@ struct iommu_domain {
#endif
union { /* Pointer usable by owner of the domain */
+ struct iommu_dma_cookie *iova_cookie; /* dma-iommu */
struct iommufd_hw_pagetable *iommufd_hwpt; /* iommufd */
};
union { /* Fault handler */
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH v2 7/7] iommu: Turn iova_cookie to dma-iommu private pointer
2025-02-20 1:31 ` [PATCH v2 7/7] iommu: Turn iova_cookie to dma-iommu private pointer Nicolin Chen
@ 2025-02-20 17:50 ` Jason Gunthorpe
2025-02-21 14:39 ` Jason Gunthorpe
1 sibling, 0 replies; 34+ messages in thread
From: Jason Gunthorpe @ 2025-02-20 17:50 UTC (permalink / raw)
To: Nicolin Chen
Cc: kevin.tian, tglx, maz, joro, will, robin.murphy, shuah, iommu,
linux-kernel, linux-arm-kernel, linux-kselftest, eric.auger,
baolu.lu, yi.l.liu, yury.norov, jacob.pan, patches
On Wed, Feb 19, 2025 at 05:31:42PM -0800, Nicolin Chen wrote:
> Now that iommufd does not rely on dma-iommu.c for any purpose, we can
> combine the dma-iommu.c iova_cookie and the iommufd_hwpt under the same
> union. This union is effectively 'owner data' and can be used by the
> entity that allocated the domain. Note that legacy vfio type1 flows
> continue to use dma-iommu.c for sw_msi and still need iova_cookie.
>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
> include/linux/iommu.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 7/7] iommu: Turn iova_cookie to dma-iommu private pointer
2025-02-20 1:31 ` [PATCH v2 7/7] iommu: Turn iova_cookie to dma-iommu private pointer Nicolin Chen
2025-02-20 17:50 ` Jason Gunthorpe
@ 2025-02-21 14:39 ` Jason Gunthorpe
2025-02-21 15:23 ` Robin Murphy
2025-02-26 2:25 ` Nicolin Chen
1 sibling, 2 replies; 34+ messages in thread
From: Jason Gunthorpe @ 2025-02-21 14:39 UTC (permalink / raw)
To: Nicolin Chen
Cc: kevin.tian, tglx, maz, joro, will, robin.murphy, shuah, iommu,
linux-kernel, linux-arm-kernel, linux-kselftest, eric.auger,
baolu.lu, yi.l.liu, yury.norov, jacob.pan, patches
On Wed, Feb 19, 2025 at 05:31:42PM -0800, Nicolin Chen wrote:
> Now that iommufd does not rely on dma-iommu.c for any purpose, we can
> combine the dma-iommu.c iova_cookie and the iommufd_hwpt under the same
> union. This union is effectively 'owner data' and can be used by the
> entity that allocated the domain. Note that legacy vfio type1 flows
> continue to use dma-iommu.c for sw_msi and still need iova_cookie.
>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
> include/linux/iommu.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index e93d2e918599..99dd72998cb7 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -216,7 +216,6 @@ struct iommu_domain {
> const struct iommu_ops *owner; /* Whose domain_alloc we came from */
> unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
> struct iommu_domain_geometry geometry;
> - struct iommu_dma_cookie *iova_cookie;
> int (*iopf_handler)(struct iopf_group *group);
>
> #if IS_ENABLED(CONFIG_IRQ_MSI_IOMMU)
> @@ -225,6 +224,7 @@ struct iommu_domain {
> #endif
>
> union { /* Pointer usable by owner of the domain */
> + struct iommu_dma_cookie *iova_cookie; /* dma-iommu */
> struct iommufd_hw_pagetable *iommufd_hwpt; /* iommufd */
> };
Ugh, there is a problem here:
void iommu_domain_free(struct iommu_domain *domain)
{
if (domain->type == IOMMU_DOMAIN_SVA)
mmdrop(domain->mm);
iommu_put_dma_cookie(domain);
It calls into dma-iommu for all domain types
And if !CONFIG_IRQ_MSI_IOMMU then this isn't possible to protect it
against iommufd owning the cookie union:
#if IS_ENABLED(CONFIG_IRQ_MSI_IOMMU)
if (domain->sw_msi != iommu_dma_sw_msi)
return;
#endif
I came up with the below, but I will drop this patch for now; you can
repost it as a cleanup series.
Jason
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 3b58244e6344a5..31d53552dc4790 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -418,6 +418,7 @@ int iommu_get_dma_cookie(struct iommu_domain *domain)
* number of PAGE_SIZE mappings necessary to cover every MSI doorbell address
* used by the devices attached to @domain.
*/
+#if IS_ENABLED(CONFIG_IRQ_MSI_IOMMU)
int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
{
struct iommu_dma_cookie *cookie;
@@ -439,6 +440,13 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
}
EXPORT_SYMBOL(iommu_get_msi_cookie);
+void iommu_put_msi_cookie(struct iommu_domain *domain)
+{
+ iommu_put_dma_cookie(domain);
+}
+EXPORT_SYMBOL_GPL(iommu_put_msi_cookie);
+#endif
+
/**
* iommu_put_dma_cookie - Release a domain's DMA mapping resources
* @domain: IOMMU domain previously prepared by iommu_get_dma_cookie() or
@@ -449,8 +457,10 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iommu_dma_msi_page *msi, *tmp;
+#if IS_ENABLED(CONFIG_IRQ_MSI_IOMMU)
if (domain->sw_msi != iommu_dma_sw_msi)
return;
+#endif
if (!cookie)
return;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 022bf96a18c5e4..f07544b290e5b1 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -456,6 +456,12 @@ static int iommu_init_device(struct device *dev, const struct iommu_ops *ops)
return ret;
}
+static void iommu_default_domain_free(struct iommu_domain *domain)
+{
+ iommu_put_dma_cookie(domain);
+ iommu_domain_free(domain);
+}
+
static void iommu_deinit_device(struct device *dev)
{
struct iommu_group *group = dev->iommu_group;
@@ -494,7 +500,7 @@ static void iommu_deinit_device(struct device *dev)
*/
if (list_empty(&group->devices)) {
if (group->default_domain) {
- iommu_domain_free(group->default_domain);
+ iommu_default_domain_free(group->default_domain);
group->default_domain = NULL;
}
if (group->blocking_domain) {
@@ -2023,7 +2029,6 @@ void iommu_domain_free(struct iommu_domain *domain)
{
if (domain->type == IOMMU_DOMAIN_SVA)
mmdrop(domain->mm);
- iommu_put_dma_cookie(domain);
if (domain->ops->free)
domain->ops->free(domain);
}
@@ -3000,7 +3005,7 @@ static int iommu_setup_default_domain(struct iommu_group *group,
out_free_old:
if (old_dom)
- iommu_domain_free(old_dom);
+ iommu_default_domain_free(old_dom);
return ret;
err_restore_domain:
@@ -3009,7 +3014,7 @@ static int iommu_setup_default_domain(struct iommu_group *group,
group, old_dom, IOMMU_SET_DOMAIN_MUST_SUCCEED);
err_restore_def_domain:
if (old_dom) {
- iommu_domain_free(dom);
+ iommu_default_domain_free(dom);
group->default_domain = old_dom;
}
return ret;
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 50ebc9593c9d70..b5bb946c9c1b19 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2271,6 +2271,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
if (!iommu_attach_group(d->domain,
group->iommu_group)) {
list_add(&group->next, &d->group_list);
+ iommu_put_msi_cookie(domain->domain);
iommu_domain_free(domain->domain);
kfree(domain);
goto done;
@@ -2316,6 +2317,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
out_detach:
iommu_detach_group(domain->domain, group->iommu_group);
out_domain:
+ iommu_put_msi_cookie(domain->domain);
iommu_domain_free(domain->domain);
vfio_iommu_iova_free(&iova_copy);
vfio_iommu_resv_free(&group_resv_regions);
@@ -2496,6 +2498,7 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
vfio_iommu_unmap_unpin_reaccount(iommu);
}
}
+ iommu_put_msi_cookie(domain->domain);
iommu_domain_free(domain->domain);
list_del(&domain->next);
kfree(domain);
@@ -2567,6 +2570,7 @@ static void vfio_release_domain(struct vfio_domain *domain)
kfree(group);
}
+ iommu_put_msi_cookie(domain->domain);
iommu_domain_free(domain->domain);
}
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 99dd72998cb7f7..082274e8ba6a3d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -1534,12 +1534,16 @@ void iommu_debugfs_setup(void);
static inline void iommu_debugfs_setup(void) {}
#endif
-#ifdef CONFIG_IOMMU_DMA
+#if defined(CONFIG_IOMMU_DMA) && IS_ENABLED(CONFIG_IRQ_MSI_IOMMU)
int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base);
+void iommu_put_msi_cookie(struct iommu_domain *domain);
#else /* CONFIG_IOMMU_DMA */
static inline int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
{
- return -ENODEV;
+ return 0;
+}
+static inline void iommu_put_msi_cookie(struct iommu_domain *domain)
+{
}
#endif /* CONFIG_IOMMU_DMA */
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH v2 7/7] iommu: Turn iova_cookie to dma-iommu private pointer
2025-02-21 14:39 ` Jason Gunthorpe
@ 2025-02-21 15:23 ` Robin Murphy
2025-02-21 16:48 ` Jason Gunthorpe
2025-02-26 2:25 ` Nicolin Chen
1 sibling, 1 reply; 34+ messages in thread
From: Robin Murphy @ 2025-02-21 15:23 UTC (permalink / raw)
To: Jason Gunthorpe, Nicolin Chen
Cc: kevin.tian, tglx, maz, joro, will, shuah, iommu, linux-kernel,
linux-arm-kernel, linux-kselftest, eric.auger, baolu.lu, yi.l.liu,
yury.norov, jacob.pan, patches
On 2025-02-21 2:39 pm, Jason Gunthorpe wrote:
> On Wed, Feb 19, 2025 at 05:31:42PM -0800, Nicolin Chen wrote:
>> Now that iommufd does not rely on dma-iommu.c for any purpose, we can
>> combine the dma-iommu.c iova_cookie and the iommufd_hwpt under the same
>> union. This union is effectively 'owner data' and can be used by the
>> entity that allocated the domain. Note that legacy vfio type1 flows
>> continue to use dma-iommu.c for sw_msi and still need iova_cookie.
>>
>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
>> ---
>> include/linux/iommu.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index e93d2e918599..99dd72998cb7 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -216,7 +216,6 @@ struct iommu_domain {
>> const struct iommu_ops *owner; /* Whose domain_alloc we came from */
>> unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
>> struct iommu_domain_geometry geometry;
>> - struct iommu_dma_cookie *iova_cookie;
>> int (*iopf_handler)(struct iopf_group *group);
>>
>> #if IS_ENABLED(CONFIG_IRQ_MSI_IOMMU)
>> @@ -225,6 +224,7 @@ struct iommu_domain {
>> #endif
>>
>> union { /* Pointer usable by owner of the domain */
>> + struct iommu_dma_cookie *iova_cookie; /* dma-iommu */
>> struct iommufd_hw_pagetable *iommufd_hwpt; /* iommufd */
>> };
>
> Ugh, there is a problem here:
>
> void iommu_domain_free(struct iommu_domain *domain)
> {
> if (domain->type == IOMMU_DOMAIN_SVA)
> mmdrop(domain->mm);
> iommu_put_dma_cookie(domain);
>
> It calls into dma-iommu for all domain types
>
> And if !CONFIG_IRQ_MSI_IOMMU then this isn't possible to protect it
> against iommufd owning the cookie union:
>
> #if IS_ENABLED(CONFIG_IRQ_MSI_IOMMU)
> if (domain->sw_msi != iommu_dma_sw_msi)
> return;
> #endif
>
> I came up with the below, but I will drop this patch for now; you can
> repost it as a cleanup series.
Eww... What's the issue with just checking the domain type in
iommu_put_dma_cookie()? Is it that IOMMUFD and VFIO type 1 are both
doing their own different thing with IOMMU_DOMAIN_UNMANAGED?
In general it seems like a bad smell to have a union in a structure with
not enough information within that structure itself to know which union
member is valid... :/
Thanks,
Robin.
>
> Jason
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 3b58244e6344a5..31d53552dc4790 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -418,6 +418,7 @@ int iommu_get_dma_cookie(struct iommu_domain *domain)
> * number of PAGE_SIZE mappings necessary to cover every MSI doorbell address
> * used by the devices attached to @domain.
> */
> +#if IS_ENABLED(CONFIG_IRQ_MSI_IOMMU)
> int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
> {
> struct iommu_dma_cookie *cookie;
> @@ -439,6 +440,13 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
> }
> EXPORT_SYMBOL(iommu_get_msi_cookie);
>
> +void iommu_put_msi_cookie(struct iommu_domain *domain)
> +{
> + iommu_put_dma_cookie(domain);
> +}
> +EXPORT_SYMBOL_GPL(iommu_put_msi_cookie);
> +#endif
> +
> /**
> * iommu_put_dma_cookie - Release a domain's DMA mapping resources
> * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie() or
> @@ -449,8 +457,10 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
> struct iommu_dma_cookie *cookie = domain->iova_cookie;
> struct iommu_dma_msi_page *msi, *tmp;
>
> +#if IS_ENABLED(CONFIG_IRQ_MSI_IOMMU)
> if (domain->sw_msi != iommu_dma_sw_msi)
> return;
> +#endif
>
> if (!cookie)
> return;
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 022bf96a18c5e4..f07544b290e5b1 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -456,6 +456,12 @@ static int iommu_init_device(struct device *dev, const struct iommu_ops *ops)
> return ret;
> }
>
> +static void iommu_default_domain_free(struct iommu_domain *domain)
> +{
> + iommu_put_dma_cookie(domain);
> + iommu_domain_free(domain);
> +}
> +
> static void iommu_deinit_device(struct device *dev)
> {
> struct iommu_group *group = dev->iommu_group;
> @@ -494,7 +500,7 @@ static void iommu_deinit_device(struct device *dev)
> */
> if (list_empty(&group->devices)) {
> if (group->default_domain) {
> - iommu_domain_free(group->default_domain);
> + iommu_default_domain_free(group->default_domain);
> group->default_domain = NULL;
> }
> if (group->blocking_domain) {
> @@ -2023,7 +2029,6 @@ void iommu_domain_free(struct iommu_domain *domain)
> {
> if (domain->type == IOMMU_DOMAIN_SVA)
> mmdrop(domain->mm);
> - iommu_put_dma_cookie(domain);
> if (domain->ops->free)
> domain->ops->free(domain);
> }
> @@ -3000,7 +3005,7 @@ static int iommu_setup_default_domain(struct iommu_group *group,
>
> out_free_old:
> if (old_dom)
> - iommu_domain_free(old_dom);
> + iommu_default_domain_free(old_dom);
> return ret;
>
> err_restore_domain:
> @@ -3009,7 +3014,7 @@ static int iommu_setup_default_domain(struct iommu_group *group,
> group, old_dom, IOMMU_SET_DOMAIN_MUST_SUCCEED);
> err_restore_def_domain:
> if (old_dom) {
> - iommu_domain_free(dom);
> + iommu_default_domain_free(dom);
> group->default_domain = old_dom;
> }
> return ret;
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 50ebc9593c9d70..b5bb946c9c1b19 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -2271,6 +2271,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> if (!iommu_attach_group(d->domain,
> group->iommu_group)) {
> list_add(&group->next, &d->group_list);
> + iommu_put_msi_cookie(domain->domain);
> iommu_domain_free(domain->domain);
> kfree(domain);
> goto done;
> @@ -2316,6 +2317,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> out_detach:
> iommu_detach_group(domain->domain, group->iommu_group);
> out_domain:
> + iommu_put_msi_cookie(domain->domain);
> iommu_domain_free(domain->domain);
> vfio_iommu_iova_free(&iova_copy);
> vfio_iommu_resv_free(&group_resv_regions);
> @@ -2496,6 +2498,7 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
> vfio_iommu_unmap_unpin_reaccount(iommu);
> }
> }
> + iommu_put_msi_cookie(domain->domain);
> iommu_domain_free(domain->domain);
> list_del(&domain->next);
> kfree(domain);
> @@ -2567,6 +2570,7 @@ static void vfio_release_domain(struct vfio_domain *domain)
> kfree(group);
> }
>
> + iommu_put_msi_cookie(domain->domain);
> iommu_domain_free(domain->domain);
> }
>
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 99dd72998cb7f7..082274e8ba6a3d 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -1534,12 +1534,16 @@ void iommu_debugfs_setup(void);
> static inline void iommu_debugfs_setup(void) {}
> #endif
>
> -#ifdef CONFIG_IOMMU_DMA
> +#if defined(CONFIG_IOMMU_DMA) && IS_ENABLED(CONFIG_IRQ_MSI_IOMMU)
> int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base);
> +void iommu_put_msi_cookie(struct iommu_domain *domain);
> #else /* CONFIG_IOMMU_DMA */
> static inline int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
> {
> - return -ENODEV;
> + return 0;
> +}
> +static inline void iommu_put_msi_cookie(struct iommu_domain *domain)
> +{
> }
> #endif /* CONFIG_IOMMU_DMA */
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 7/7] iommu: Turn iova_cookie to dma-iommu private pointer
2025-02-21 15:23 ` Robin Murphy
@ 2025-02-21 16:48 ` Jason Gunthorpe
0 siblings, 0 replies; 34+ messages in thread
From: Jason Gunthorpe @ 2025-02-21 16:48 UTC (permalink / raw)
To: Robin Murphy
Cc: Nicolin Chen, kevin.tian, tglx, maz, joro, will, shuah, iommu,
linux-kernel, linux-arm-kernel, linux-kselftest, eric.auger,
baolu.lu, yi.l.liu, yury.norov, jacob.pan, patches
On Fri, Feb 21, 2025 at 03:23:22PM +0000, Robin Murphy wrote:
> Eww... What's the issue with just checking the domain type in
> iommu_put_dma_cookie()? Is it that IOMMUFD and VFIO type 1 are both doing
> their own different thing with IOMMU_DOMAIN_UNMANAGED?
Yes
> In general it seems like a bad smell to have a union in a structure with not
> enough information within that structure itself to know which union member
> is valid... :/
The concept is the opaque pointer belongs only to the caller that
allocated and owns the domain. The core iommu code should never look
at it or touch it.
The problem is with the mandatory call to dma-iommu in the free path -
dma-iommu code should never be invoked outside of VFIO and the default
domain cases.
So the little rework I sketched makes the caller responsible for knowing
whether dma-iommu is operating that domain. Only then does it call the
dma-iommu related functions, and the core code never touches the union
content.
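
As a rough illustration of that ownership rule, here is a minimal userspace
sketch. It is not the kernel code: struct domain, the model_*() helpers and
the two *_like_release() functions are hypothetical stand-ins invented for
this example. The point it tries to show is that the free path in the "core"
never inspects the union, and only the owner that asked for an MSI cookie
releases it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a domain; not the kernel's struct iommu_domain. */
struct domain {
	union {
		void *iova_cookie;	/* meaningful only to the dma-iommu user */
		void *iommufd_priv;	/* meaningful only to the iommufd user */
	};
};

/* Counts live cookies so that a leak or a stray put is observable. */
static int cookies_live;

/* Model of "get MSI cookie": only called by an owner that uses dma-iommu. */
static int model_get_msi_cookie(struct domain *d)
{
	d->iova_cookie = malloc(1);
	if (!d->iova_cookie)
		return -ENOMEM;
	cookies_live++;
	return 0;
}

/* Model of "put MSI cookie": the owner must have taken a cookie before. */
static void model_put_msi_cookie(struct domain *d)
{
	assert(d->iova_cookie != NULL);
	free(d->iova_cookie);
	d->iova_cookie = NULL;
	cookies_live--;
}

/* Model of the core free path: it never reads or writes the union. */
static void model_domain_free(struct domain *d)
{
	(void)d;
}

/* VFIO-like owner: it requested the cookie, so it releases it before free. */
static void vfio_like_release(struct domain *d)
{
	model_put_msi_cookie(d);
	model_domain_free(d);
}

/* iommufd-like owner: it stored its own pointer and never calls dma-iommu. */
static void iommufd_like_release(struct domain *d)
{
	d->iommufd_priv = NULL;
	model_domain_free(d);
}
```

With this split, no tag inside the structure is needed to pick the valid union
member, because the decision lives in whichever owner allocated the domain.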
Jason
* Re: [PATCH v2 7/7] iommu: Turn iova_cookie to dma-iommu private pointer
2025-02-21 14:39 ` Jason Gunthorpe
2025-02-21 15:23 ` Robin Murphy
@ 2025-02-26 2:25 ` Nicolin Chen
2025-02-26 17:36 ` Jason Gunthorpe
1 sibling, 1 reply; 34+ messages in thread
From: Nicolin Chen @ 2025-02-26 2:25 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: kevin.tian, tglx, maz, joro, will, robin.murphy, shuah, iommu,
linux-kernel, linux-arm-kernel, linux-kselftest, eric.auger,
baolu.lu, yi.l.liu, yury.norov, jacob.pan, patches
On Fri, Feb 21, 2025 at 10:39:59AM -0400, Jason Gunthorpe wrote:
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 99dd72998cb7f7..082274e8ba6a3d 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -1534,12 +1534,16 @@ void iommu_debugfs_setup(void);
> static inline void iommu_debugfs_setup(void) {}
> #endif
>
> -#ifdef CONFIG_IOMMU_DMA
> +#if defined(CONFIG_IOMMU_DMA) && IS_ENABLED(CONFIG_IRQ_MSI_IOMMU)
> int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base);
> +void iommu_put_msi_cookie(struct iommu_domain *domain);
> #else /* CONFIG_IOMMU_DMA */
> static inline int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
> {
> - return -ENODEV;
> + return 0;
Should we keep the -ENODEV here for !CONFIG_IOMMU_DMA?
Nicolin
* Re: [PATCH v2 7/7] iommu: Turn iova_cookie to dma-iommu private pointer
2025-02-26 2:25 ` Nicolin Chen
@ 2025-02-26 17:36 ` Jason Gunthorpe
2025-02-26 18:57 ` Nicolin Chen
0 siblings, 1 reply; 34+ messages in thread
From: Jason Gunthorpe @ 2025-02-26 17:36 UTC (permalink / raw)
To: Nicolin Chen
Cc: kevin.tian, tglx, maz, joro, will, robin.murphy, shuah, iommu,
linux-kernel, linux-arm-kernel, linux-kselftest, eric.auger,
baolu.lu, yi.l.liu, yury.norov, jacob.pan, patches
On Tue, Feb 25, 2025 at 06:25:27PM -0800, Nicolin Chen wrote:
> On Fri, Feb 21, 2025 at 10:39:59AM -0400, Jason Gunthorpe wrote:
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 99dd72998cb7f7..082274e8ba6a3d 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -1534,12 +1534,16 @@ void iommu_debugfs_setup(void);
> > static inline void iommu_debugfs_setup(void) {}
> > #endif
> >
> > -#ifdef CONFIG_IOMMU_DMA
> > +#if defined(CONFIG_IOMMU_DMA) && IS_ENABLED(CONFIG_IRQ_MSI_IOMMU)
> > int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base);
> > +void iommu_put_msi_cookie(struct iommu_domain *domain);
> > #else /* CONFIG_IOMMU_DMA */
> > static inline int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
> > {
> > - return -ENODEV;
> > + return 0;
>
> Should we keep the -ENODEV here for !CONFIG_IOMMU_DMA?
My feeling was if the system doesn't have an IRQ driver that needs
MSI_IOMMU but does have an IOMMU driver that reports SW_MSI reserved
regions then iommufd/vfio should not fail.
I don't think it is realistic that we'd ever hit this return.
Jason
* Re: [PATCH v2 7/7] iommu: Turn iova_cookie to dma-iommu private pointer
2025-02-26 17:36 ` Jason Gunthorpe
@ 2025-02-26 18:57 ` Nicolin Chen
2025-02-26 19:18 ` Jason Gunthorpe
0 siblings, 1 reply; 34+ messages in thread
From: Nicolin Chen @ 2025-02-26 18:57 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: kevin.tian, tglx, maz, joro, will, robin.murphy, shuah, iommu,
linux-kernel, linux-arm-kernel, linux-kselftest, eric.auger,
baolu.lu, yi.l.liu, yury.norov, jacob.pan, patches
On Wed, Feb 26, 2025 at 01:36:10PM -0400, Jason Gunthorpe wrote:
> On Tue, Feb 25, 2025 at 06:25:27PM -0800, Nicolin Chen wrote:
> > On Fri, Feb 21, 2025 at 10:39:59AM -0400, Jason Gunthorpe wrote:
> > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > > index 99dd72998cb7f7..082274e8ba6a3d 100644
> > > --- a/include/linux/iommu.h
> > > +++ b/include/linux/iommu.h
> > > @@ -1534,12 +1534,16 @@ void iommu_debugfs_setup(void);
> > > static inline void iommu_debugfs_setup(void) {}
> > > #endif
> > >
> > > -#ifdef CONFIG_IOMMU_DMA
> > > +#if defined(CONFIG_IOMMU_DMA) && IS_ENABLED(CONFIG_IRQ_MSI_IOMMU)
> > > int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base);
> > > +void iommu_put_msi_cookie(struct iommu_domain *domain);
> > > #else /* CONFIG_IOMMU_DMA */
> > > static inline int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
> > > {
> > > - return -ENODEV;
> > > + return 0;
> >
> > Should we keep the -ENODEV here for !CONFIG_IOMMU_DMA?
>
> My feeling was if the system doesn't have an IRQ driver that needs
> MSI_IOMMU but does have an IOMMU driver that reports SW_MSI reserved
> regions then iommufd/vfio should not fail.
OK, I see. But we are also changing the behavior for the
!CONFIG_IOMMU_DMA configuration, in which case all other iommu
functions seem to return -ENODEV. And I assume we would need a
justification for such a change?
Perhaps this can be made explicit, just to keep things consistent:
/* NOP if IOMMU driver reports SW_MSI reserved regions */
return IS_ENABLED(CONFIG_IOMMU_DMA) ? 0 : -ENODEV;
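
To make the two options concrete, here is a small userspace model of what the
stub would return under each policy. It is only a sketch: the enum, the
boolean parameters standing in for the Kconfig symbols, and both function
names are assumptions made for this example, and the kernel's IS_ENABLED()
macro is replaced by plain booleans.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* The two stub behaviours being compared in this thread (names made up). */
enum stub_policy {
	STUB_ALWAYS_SUCCESS,		/* "return 0;" as in the quoted diff */
	STUB_ENODEV_WITHOUT_IOMMU_DMA,	/* IS_ENABLED(CONFIG_IOMMU_DMA) ? 0 : -ENODEV */
};

/*
 * Per the quoted #if, the real function is built only when both options are
 * set. The booleans stand in for CONFIG_IOMMU_DMA and CONFIG_IRQ_MSI_IOMMU.
 */
static bool real_impl_built(bool iommu_dma, bool irq_msi_iommu)
{
	return iommu_dma && irq_msi_iommu;
}

/* What the inline stub returns when the real function is compiled out. */
static int stub_get_msi_cookie(enum stub_policy policy, bool iommu_dma,
			       bool irq_msi_iommu)
{
	/* The stub exists only in configurations without the real function. */
	assert(!real_impl_built(iommu_dma, irq_msi_iommu));

	if (policy == STUB_ENODEV_WITHOUT_IOMMU_DMA && !iommu_dma)
		return -ENODEV;
	return 0;
}
```

The two policies differ only when IOMMU_DMA is off. With IOMMU_DMA on and
IRQ_MSI_IOMMU off, both variants report success.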
> I don't think it is realistic that we'd ever hit this return.
Yeah, the only caller is VFIO, which makes quite a few IOMMU function
calls before reaching this one. So it would already have returned
-ENODEV from one of those earlier calls.
Thanks
Nicolin
* Re: [PATCH v2 7/7] iommu: Turn iova_cookie to dma-iommu private pointer
2025-02-26 18:57 ` Nicolin Chen
@ 2025-02-26 19:18 ` Jason Gunthorpe
0 siblings, 0 replies; 34+ messages in thread
From: Jason Gunthorpe @ 2025-02-26 19:18 UTC (permalink / raw)
To: Nicolin Chen
Cc: kevin.tian, tglx, maz, joro, will, robin.murphy, shuah, iommu,
linux-kernel, linux-arm-kernel, linux-kselftest, eric.auger,
baolu.lu, yi.l.liu, yury.norov, jacob.pan, patches
On Wed, Feb 26, 2025 at 10:57:00AM -0800, Nicolin Chen wrote:
> OK, I see. But we are also changing the behavior for the
> !CONFIG_IOMMU_DMA configuration, in which case all other iommu
> functions seem to return -ENODEV. And I assume we would need a
> justification for such a change?
>
> Perhaps this can be made explicit, just to keep things consistent:
> /* NOP if IOMMU driver reports SW_MSI reserved regions */
> return IS_ENABLED(CONFIG_IOMMU_DMA) ? 0 : -ENODEV;
Hurm, I dunno. If we don't have an IRQ driver that supports it, then it
doesn't matter that IOMMU_DMA is off either, because there is nothing
that would call into IOMMU_DMA in the first place.
Success still seems like the right answer on such a weirdo kconfig.
Jason
* Re: [PATCH v2 0/7] iommu: Add MSI mapping support with nested SMMU (Part-1 core)
2025-02-20 1:31 [PATCH v2 0/7] iommu: Add MSI mapping support with nested SMMU (Part-1 core) Nicolin Chen
` (6 preceding siblings ...)
2025-02-20 1:31 ` [PATCH v2 7/7] iommu: Turn iova_cookie to dma-iommu private pointer Nicolin Chen
@ 2025-02-21 14:59 ` Jason Gunthorpe
7 siblings, 0 replies; 34+ messages in thread
From: Jason Gunthorpe @ 2025-02-21 14:59 UTC (permalink / raw)
To: Nicolin Chen
Cc: kevin.tian, tglx, maz, joro, will, robin.murphy, shuah, iommu,
linux-kernel, linux-arm-kernel, linux-kselftest, eric.auger,
baolu.lu, yi.l.liu, yury.norov, jacob.pan, patches
On Wed, Feb 19, 2025 at 05:31:35PM -0800, Nicolin Chen wrote:
>
> Jason Gunthorpe (5):
> genirq/msi: Store the IOMMU IOVA directly in msi_desc instead of
> iommu_cookie
> genirq/msi: Refactor iommu_dma_compose_msi_msg()
> iommu: Make iommu_dma_prepare_msi() into a generic operation
> irqchip: Have CONFIG_IRQ_MSI_IOMMU be selected by irqchips that need
> it
> iommufd: Implement sw_msi support natively
>
> Nicolin Chen (2):
> iommu: Turn fault_data to iommufd private pointer
I dropped this patch:
> iommu: Turn iova_cookie to dma-iommu private pointer
And fixed up the two compilation issues found by building on my x86
config, plus Thomas's language update.
It is headed toward linux-next. Give it till Monday for a PR to Joerg,
just in case there are more randconfig issues.
Thanks,
Jason