* [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-21 13:35 [PATCH 0/5] iommu: Domain allocation enhancements Vasant Hegde
@ 2024-08-21 13:35 ` Vasant Hegde
2024-08-21 16:31 ` Jason Gunthorpe
` (4 more replies)
2024-08-21 13:35 ` [PATCH 2/5] iommu/amd: Separate page table setup from domain allocation Vasant Hegde
` (3 subsequent siblings)
4 siblings, 5 replies; 41+ messages in thread
From: Vasant Hegde @ 2024-08-21 13:35 UTC (permalink / raw)
To: iommu, joro
Cc: will, robin.murphy, suravee.suthikulpanit, jgg, yi.l.liu,
baolu.lu, kevin.tian, Vasant Hegde
From: Jason Gunthorpe <jgg@ziepe.ca>
Currently drivers calls iommu_paging_domain_alloc(dev) to get an
UNAMANAGED domain. This is not sufficient to support PASID with
UNAMANAGED domain as some HW like AMD requires certain page table type
to support PASIDs.
Also domain_alloc_paging() passes device as param for domain
allocation. This is not sufficient for AMD driver to decide the right
page table.
Hence add iommu_paging_domain_alloc_flags() API which takes flags as
parameter. Also update default domain allocation path to use this new
API whenever device is PASID capable. HW driver should implement
domain_alloc_user() to allocate PASID capable domain.
Finally introduce new flag (IOMMUFD_HWPT_ALLOC_PASID) to
domain_alloc_users() ops. If both IOMMU and device supports PASID it
will allocate domain. Otherwise return error.
Signed-off-by: Jason Gunthorpe <jgg@ziepe.ca>
[Added __iommu_paging_domain_alloc_flags() and description - Vasant]
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
---
@Jason,
Notice that I have added __iommu_paging_domain_alloc_flags() so that
it can call iommu_domain_init() with appropriate domain type.
I think instead of having separate function it may be better to
enhance __iommu_domain_alloc() such that:
- Keep below changes from this patch
- iommu_domain_init()
- iommu_get_dma_cookie call inside iommu_setup_default_domain()
- modify __iommu_domain_alloc() to additional param (flags)
- iommu_paging_domain_alloc_flags() will call __iommu_domain_alloc()
Does above flow looks good?
-Vasant
drivers/iommu/iommu.c | 103 +++++++++++++++++++++++++----------
include/linux/iommu.h | 16 +++++-
include/uapi/linux/iommufd.h | 6 ++
3 files changed, 94 insertions(+), 31 deletions(-)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index ed6c5cb60c5a..abf537c605d5 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -32,6 +32,7 @@
#include <trace/events/iommu.h>
#include <linux/sched/mm.h>
#include <linux/msi.h>
+#include <uapi/linux/iommufd.h>
#include "dma-iommu.h"
#include "iommu-priv.h"
@@ -99,6 +100,9 @@ static int __iommu_attach_device(struct iommu_domain *domain,
struct device *dev);
static int __iommu_attach_group(struct iommu_domain *domain,
struct iommu_group *group);
+static struct iommu_domain *__iommu_paging_domain_alloc_flags(struct device *dev,
+ unsigned int type,
+ unsigned int flags);
enum {
IOMMU_SET_DOMAIN_MUST_SUCCEED = 1 << 0,
@@ -1589,8 +1593,19 @@ EXPORT_SYMBOL_GPL(fsl_mc_device_group);
static struct iommu_domain *
__iommu_group_alloc_default_domain(struct iommu_group *group, int req_type)
{
+ struct device *dev = iommu_group_first_dev(group);
+
if (group->default_domain && group->default_domain->type == req_type)
return group->default_domain;
+
+ /*
+ * When allocating the DMA API domain assume that the driver is going to
+ * use PASID and make sure the RID's domain is PASID compatible.
+ */
+ if (req_type & __IOMMU_DOMAIN_PAGING)
+ return __iommu_paging_domain_alloc_flags(dev, req_type,
+ dev->iommu->max_pasids ? IOMMUFD_HWPT_ALLOC_PASID : 0);
+
return __iommu_group_domain_alloc(group, req_type);
}
@@ -1934,6 +1949,22 @@ void iommu_set_fault_handler(struct iommu_domain *domain,
}
EXPORT_SYMBOL_GPL(iommu_set_fault_handler);
+static void iommu_domain_init(struct iommu_domain *domain, unsigned int type,
+ const struct iommu_ops *ops)
+{
+ domain->type = type;
+ domain->owner = ops;
+ if (!domain->ops)
+ domain->ops = ops->default_domain_ops;
+
+ /*
+ * If not already set, assume all sizes by default; the driver
+ * may override this later
+ */
+ if (!domain->pgsize_bitmap)
+ domain->pgsize_bitmap = ops->pgsize_bitmap;
+}
+
static struct iommu_domain *__iommu_domain_alloc(const struct iommu_ops *ops,
struct device *dev,
unsigned int type)
@@ -1962,27 +1993,7 @@ static struct iommu_domain *__iommu_domain_alloc(const struct iommu_ops *ops,
if (!domain)
return ERR_PTR(-ENOMEM);
- domain->type = type;
- domain->owner = ops;
- /*
- * If not already set, assume all sizes by default; the driver
- * may override this later
- */
- if (!domain->pgsize_bitmap)
- domain->pgsize_bitmap = ops->pgsize_bitmap;
-
- if (!domain->ops)
- domain->ops = ops->default_domain_ops;
-
- if (iommu_is_dma_domain(domain)) {
- int rc;
-
- rc = iommu_get_dma_cookie(domain);
- if (rc) {
- iommu_domain_free(domain);
- return ERR_PTR(rc);
- }
- }
+ iommu_domain_init(domain, type, ops);
return domain;
}
@@ -2030,21 +2041,49 @@ struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus)
}
EXPORT_SYMBOL_GPL(iommu_domain_alloc);
+static struct iommu_domain *__iommu_paging_domain_alloc_flags(struct device *dev,
+ unsigned int type,
+ unsigned int flags)
+{
+ const struct iommu_ops *ops = dev_iommu_ops(dev);
+ struct iommu_domain *domain;
+
+ if (!dev_has_iommu(dev))
+ return ERR_PTR(-ENODEV);
+
+ if (ops->domain_alloc_paging && !flags)
+ domain = ops->domain_alloc_paging(dev);
+ else if (ops->domain_alloc_user)
+ domain = ops->domain_alloc_user(dev, flags, NULL, NULL);
+ else if (ops->domain_alloc && !flags)
+ domain = ops->domain_alloc(IOMMU_DOMAIN_UNMANAGED);
+ else
+ return ERR_PTR(-EOPNOTSUPP);
+
+ if (IS_ERR(domain))
+ return domain;
+ if (!domain)
+ return ERR_PTR(-ENOMEM);
+
+ iommu_domain_init(domain, type, ops);
+ return domain;
+}
+
/**
- * iommu_paging_domain_alloc() - Allocate a paging domain
+ * iommu_paging_domain_alloc_flags() - Allocate a paging domain
* @dev: device for which the domain is allocated
+ * @flags: Bitmap of iommufd_hwpt_alloc_flags
*
* Allocate a paging domain which will be managed by a kernel driver. Return
* allocated domain if successful, or a ERR pointer for failure.
*/
-struct iommu_domain *iommu_paging_domain_alloc(struct device *dev)
+struct iommu_domain *iommu_paging_domain_alloc_flags(struct device *dev,
+ unsigned int flags)
{
- if (!dev_has_iommu(dev))
- return ERR_PTR(-ENODEV);
-
- return __iommu_domain_alloc(dev_iommu_ops(dev), dev, IOMMU_DOMAIN_UNMANAGED);
+ return __iommu_paging_domain_alloc_flags(dev,
+ IOMMU_DOMAIN_UNMANAGED, flags);
}
-EXPORT_SYMBOL_GPL(iommu_paging_domain_alloc);
+EXPORT_SYMBOL_GPL(iommu_paging_domain_alloc_flags);
void iommu_domain_free(struct iommu_domain *domain)
{
@@ -2965,6 +3004,14 @@ static int iommu_setup_default_domain(struct iommu_group *group,
if (group->default_domain == dom)
return 0;
+ if (iommu_is_dma_domain(dom)) {
+ ret = iommu_get_dma_cookie(dom);
+ if (ret) {
+ iommu_domain_free(dom);
+ return ret;
+ }
+ }
+
/*
* IOMMU_RESV_DIRECT and IOMMU_RESV_DIRECT_RELAXABLE regions must be
* mapped before their device is attached, in order to guarantee
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 04cbdae0052e..ca1813b2fa64 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -511,8 +511,8 @@ static inline int __iommu_copy_struct_from_user_array(
* the caller iommu_domain_alloc() returns.
* @domain_alloc_user: Allocate an iommu domain corresponding to the input
* parameters as defined in include/uapi/linux/iommufd.h.
- * Unlike @domain_alloc, it is called only by IOMMUFD and
- * must fully initialize the new domain before return.
+ * Unlike @domain_alloc, it must fully initialize the new
+ * domain before return.
* Upon success, if the @user_data is valid and the @parent
* points to a kernel-managed domain, the new domain must be
* IOMMU_DOMAIN_NESTED type; otherwise, the @parent must be
@@ -789,7 +789,11 @@ extern bool iommu_present(const struct bus_type *bus);
extern bool device_iommu_capable(struct device *dev, enum iommu_cap cap);
extern bool iommu_group_has_isolated_msi(struct iommu_group *group);
extern struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus);
-struct iommu_domain *iommu_paging_domain_alloc(struct device *dev);
+struct iommu_domain *iommu_paging_domain_alloc_flags(struct device *dev, unsigned int flags);
+static inline struct iommu_domain *iommu_paging_domain_alloc(struct device *dev)
+{
+ return iommu_paging_domain_alloc_flags(dev, 0);
+}
extern void iommu_domain_free(struct iommu_domain *domain);
extern int iommu_attach_device(struct iommu_domain *domain,
struct device *dev);
@@ -1096,6 +1100,12 @@ static inline struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus
return NULL;
}
+struct inline iommu_domain *iommu_paging_domain_alloc_flags(struct device *dev,
+ unsigned int flags)
+{
+ return ERR_PTR(-ENODEV);
+}
+
static inline struct iommu_domain *iommu_paging_domain_alloc(struct device *dev)
{
return ERR_PTR(-ENODEV);
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 4dde745cfb7e..eba07ddb141e 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -359,11 +359,17 @@ struct iommu_vfio_ioas {
* enforced on device attachment
* @IOMMU_HWPT_FAULT_ID_VALID: The fault_id field of hwpt allocation data is
* valid.
+ * @IOMMUFD_HWPT_ALLOC_PASID: When the domain is used on a device, with no
+ * PASID, the device will support later attaching
+ * a PASID as well. Some HW requires a specific
+ * domain format on the device to allow PASID to
+ * work.
*/
enum iommufd_hwpt_alloc_flags {
IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
IOMMU_HWPT_ALLOC_DIRTY_TRACKING = 1 << 1,
IOMMU_HWPT_FAULT_ID_VALID = 1 << 2,
+ IOMMUFD_HWPT_ALLOC_PASID = 1 << 3,
};
/**
--
2.31.1
^ permalink raw reply related [flat|nested] 41+ messages in thread* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-21 13:35 ` [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags Vasant Hegde
@ 2024-08-21 16:31 ` Jason Gunthorpe
2024-08-22 1:50 ` Baolu Lu
` (2 more replies)
2024-08-22 1:38 ` Baolu Lu
` (3 subsequent siblings)
4 siblings, 3 replies; 41+ messages in thread
From: Jason Gunthorpe @ 2024-08-21 16:31 UTC (permalink / raw)
To: Vasant Hegde
Cc: iommu, joro, will, robin.murphy, suravee.suthikulpanit, yi.l.liu,
baolu.lu, kevin.tian
On Wed, Aug 21, 2024 at 01:35:50PM +0000, Vasant Hegde wrote:
> From: Jason Gunthorpe <jgg@ziepe.ca>
>
> Currently drivers calls iommu_paging_domain_alloc(dev) to get an
> UNAMANAGED domain. This is not sufficient to support PASID with
> UNAMANAGED domain as some HW like AMD requires certain page table type
> to support PASIDs.
>
> Also domain_alloc_paging() passes device as param for domain
> allocation. This is not sufficient for AMD driver to decide the right
> page table.
>
> Hence add iommu_paging_domain_alloc_flags() API which takes flags as
> parameter. Also update default domain allocation path to use this new
> API whenever device is PASID capable. HW driver should implement
> domain_alloc_user() to allocate PASID capable domain.
>
> Finally introduce new flag (IOMMUFD_HWPT_ALLOC_PASID) to
> domain_alloc_users() ops. If both IOMMU and device supports PASID it
> will allocate domain. Otherwise return error.
>
> Signed-off-by: Jason Gunthorpe <jgg@ziepe.ca>
> [Added __iommu_paging_domain_alloc_flags() and description - Vasant]
> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
> ---
> @Jason,
> Notice that I have added __iommu_paging_domain_alloc_flags() so that
> it can call iommu_domain_init() with appropriate domain type.
It is okay, but also the type could be fixed in
__iommu_group_alloc_default_domain() using something like:
ret->type |= req_type;
> I think instead of having separate function it may be better to
> enhance __iommu_domain_alloc() such that:
> - Keep below changes from this patch
> - iommu_domain_init()
> - iommu_get_dma_cookie call inside iommu_setup_default_domain()
> - modify __iommu_domain_alloc() to additional param (flags)
> - iommu_paging_domain_alloc_flags() will call __iommu_domain_alloc()
My expectation was to basically remove iommu_domain_alloc() entirely
once Lu's work is merged.
Instead we'd have these direct APIs:
iommu_domain_alloc_paging_flags()
iommu_group_alloc_blocking_domain()
iommu_group_alloc_identity_domain()
And then iommu_group_alloc_default_domain() would just call them in
the right order maybe like this:
if (req_type & __IOMMU_DOMAIN_PAGING)
return iommu_domain_alloc_paging_flags();
if (req_type && req_type != IOMMU_DOMAIN_IDENTITY)
return ERR_PTR(-EOPNOTSUPP);
if (req_type || iommu_def_domain_type == IOMMU_DOMAIN_IDENTITY) {
dom = iommu_group_alloc_identity_domain();
if (req_type || !IS_ERR(dom))
return dom;
/*
* if iommu_def_domain_type == IDENTITY fails then fall through
* to PAGING
*/
}
/* iommu_def_domain_type is PAGING */
dom = iommu_domain_alloc_paging_flags();
if (IS_ERR(dom))
return dom;
pr_warn("Failed to allocate default IOMMU domain of type %u for group %s - Falling back to IOMMU_DOMAIN_DMA",
iommu_def_domain_type, group->name);
return dom;
Which is the only place we need to make a decision based on a type
input.
Hoping Lu's final few patches make it this cycle
> +static void iommu_domain_init(struct iommu_domain *domain, unsigned int type,
> + const struct iommu_ops *ops)
> +{
> + domain->type = type;
> + domain->owner = ops;
> + if (!domain->ops)
> + domain->ops = ops->default_domain_ops;
> +
> + /*
> + * If not already set, assume all sizes by default; the driver
> + * may override this later
> + */
> + if (!domain->pgsize_bitmap)
> + domain->pgsize_bitmap = ops->pgsize_bitmap;
> +}
Pedantically the pgsize_bitmap is only needed for paging domains, but
it is OK like this too.
> @@ -359,11 +359,17 @@ struct iommu_vfio_ioas {
> * enforced on device attachment
> * @IOMMU_HWPT_FAULT_ID_VALID: The fault_id field of hwpt allocation data is
> * valid.
> + * @IOMMUFD_HWPT_ALLOC_PASID: When the domain is used on a device, with no
> + * PASID, the device will support later attaching
> + * a PASID as well. Some HW requires a specific
> + * domain format on the device to allow PASID to
> + * work.
Maybe:
Requests a domain that can be used with PASID. The domain can be
attached to any PASID on the device. Any domain attached to the
non-PASID part of the device must also be flaged, otherwise attaching
a PASID will blocked.
Yi will need to add a check that IOMMUFD_HWPT_ALLOC_PASID was
specified on the RID domain while processing attach on the PASID
domain.
> enum iommufd_hwpt_alloc_flags {
> IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
> IOMMU_HWPT_ALLOC_DIRTY_TRACKING = 1 << 1,
> IOMMU_HWPT_FAULT_ID_VALID = 1 << 2,
> + IOMMUFD_HWPT_ALLOC_PASID = 1 << 3,
Let's keep the consistent spelling IOMMU_HWPT_ALLOC_PASID
I think the series looks OK. The naming of domain_alloc_user() is now
a little bit weird, it has now turned into
domain_alloc_paging_extended().
Thanks,
Jason
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-21 16:31 ` Jason Gunthorpe
@ 2024-08-22 1:50 ` Baolu Lu
2024-08-22 12:43 ` Jason Gunthorpe
2024-08-22 11:27 ` Yi Liu
2024-08-26 8:36 ` Vasant Hegde
2 siblings, 1 reply; 41+ messages in thread
From: Baolu Lu @ 2024-08-22 1:50 UTC (permalink / raw)
To: Jason Gunthorpe, Vasant Hegde
Cc: baolu.lu, iommu, joro, will, robin.murphy, suravee.suthikulpanit,
yi.l.liu, kevin.tian
On 8/22/24 12:31 AM, Jason Gunthorpe wrote:
>> I think instead of having separate function it may be better to
>> enhance __iommu_domain_alloc() such that:
>> - Keep below changes from this patch
>> - iommu_domain_init()
>> - iommu_get_dma_cookie call inside iommu_setup_default_domain()
>> - modify __iommu_domain_alloc() to additional param (flags)
>> - iommu_paging_domain_alloc_flags() will call __iommu_domain_alloc()
> My expectation was to basically remove iommu_domain_alloc() entirely
> once Lu's work is merged.
>
> Instead we'd have these direct APIs:
> iommu_domain_alloc_paging_flags()
Is it possible to use different domain allocation APIs for kernel DMA
and user-space DMA? Right now, we differentiate between these two types
of domains using IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_UNMANAGED.
I'm thinking about this because the Intel iommu driver has a similar
need to AMD. They both recommend using different page table formats for
IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_UNMANAGED, which is currently stopping
us from implementing domain_alloc_paging in the Intel iommu driver.
Thanks,
baolu
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-22 1:50 ` Baolu Lu
@ 2024-08-22 12:43 ` Jason Gunthorpe
2024-08-23 2:47 ` Baolu Lu
0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2024-08-22 12:43 UTC (permalink / raw)
To: Baolu Lu
Cc: Vasant Hegde, iommu, joro, will, robin.murphy,
suravee.suthikulpanit, yi.l.liu, kevin.tian
On Thu, Aug 22, 2024 at 09:50:57AM +0800, Baolu Lu wrote:
> On 8/22/24 12:31 AM, Jason Gunthorpe wrote:
> > > I think instead of having separate function it may be better to
> > > enhance __iommu_domain_alloc() such that:
> > > - Keep below changes from this patch
> > > - iommu_domain_init()
> > > - iommu_get_dma_cookie call inside iommu_setup_default_domain()
> > > - modify __iommu_domain_alloc() to additional param (flags)
> > > - iommu_paging_domain_alloc_flags() will call __iommu_domain_alloc()
> > My expectation was to basically remove iommu_domain_alloc() entirely
> > once Lu's work is merged.
> >
> > Instead we'd have these direct APIs:
> > iommu_domain_alloc_paging_flags()
>
> Is it possible to use different domain allocation APIs for kernel DMA
> and user-space DMA? Right now, we differentiate between these two types
> of domains using IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_UNMANAGED.
I really don't want to have such a distinction.
> I'm thinking about this because the Intel iommu driver has a similar
> need to AMD. They both recommend using different page table formats for
> IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_UNMANAGED, which is currently stopping
> us from implementing domain_alloc_paging in the Intel iommu driver.
Why? What exactly is the issue?
It is inhernetly wrong to behave differently based on DMA API or VFIO.
They are not different things.
If you have different behaviors and different properies, like AMD's
PASID, then they should be described and mapped to some kind of flag.
Otherwise the driver should always allocate a paging domain that gives
the highest performance.
Jason
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-22 12:43 ` Jason Gunthorpe
@ 2024-08-23 2:47 ` Baolu Lu
2024-08-26 8:08 ` Tian, Kevin
` (2 more replies)
0 siblings, 3 replies; 41+ messages in thread
From: Baolu Lu @ 2024-08-23 2:47 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: baolu.lu, Vasant Hegde, iommu, joro, will, robin.murphy,
suravee.suthikulpanit, yi.l.liu, kevin.tian
On 8/22/24 8:43 PM, Jason Gunthorpe wrote:
> On Thu, Aug 22, 2024 at 09:50:57AM +0800, Baolu Lu wrote:
>> On 8/22/24 12:31 AM, Jason Gunthorpe wrote:
>>>> I think instead of having separate function it may be better to
>>>> enhance __iommu_domain_alloc() such that:
>>>> - Keep below changes from this patch
>>>> - iommu_domain_init()
>>>> - iommu_get_dma_cookie call inside iommu_setup_default_domain()
>>>> - modify __iommu_domain_alloc() to additional param (flags)
>>>> - iommu_paging_domain_alloc_flags() will call __iommu_domain_alloc()
>>> My expectation was to basically remove iommu_domain_alloc() entirely
>>> once Lu's work is merged.
>>>
>>> Instead we'd have these direct APIs:
>>> iommu_domain_alloc_paging_flags()
>> Is it possible to use different domain allocation APIs for kernel DMA
>> and user-space DMA? Right now, we differentiate between these two types
>> of domains using IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_UNMANAGED.
> I really don't want to have such a distinction.
>
>> I'm thinking about this because the Intel iommu driver has a similar
>> need to AMD. They both recommend using different page table formats for
>> IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_UNMANAGED, which is currently stopping
>> us from implementing domain_alloc_paging in the Intel iommu driver.
> Why? What exactly is the issue?
>
> It is inhernetly wrong to behave differently based on DMA API or VFIO.
> They are not different things.
>
> If you have different behaviors and different properies, like AMD's
> PASID, then they should be described and mapped to some kind of flag.
>
> Otherwise the driver should always allocate a paging domain that gives
> the highest performance.
It relates to Intel VT-d's nested translation. Intel VT-d has two types
of page table formats for DMA translation: first level and second level.
In nested translation, the first level page table is used for first-
stage translation, and the second level page table is used for second-
stage translation.
The iommu driver for vIOMMU in the guest kernel must use the first level
page table format for kernel DMA. This page table will then be nested on
a second level page table in the VMM host kernel.
Our current design uses the first level page table for both the host and
guest kernel for simplicity. This is why we use different page table
formats for IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_UNMANAGED.
We considered determining the page table format based on whether the
iommu has caching mode capability. This would result in the first level
page table format being used for guest kernel DMA and the second level
page table format being used for host kernel DMA. However, this approach
creates an inconsistency between the host and guest kernels.
I am not sure about other architectures, like AMD, ARM and RISC-V.
Perhaps all of them have the similar need?
Thanks,
baolu
^ permalink raw reply [flat|nested] 41+ messages in thread* RE: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-23 2:47 ` Baolu Lu
@ 2024-08-26 8:08 ` Tian, Kevin
2024-08-26 8:34 ` Baolu Lu
2024-08-26 8:47 ` Vasant Hegde
2024-08-26 13:45 ` Jason Gunthorpe
2 siblings, 1 reply; 41+ messages in thread
From: Tian, Kevin @ 2024-08-26 8:08 UTC (permalink / raw)
To: Baolu Lu, Jason Gunthorpe
Cc: Vasant Hegde, iommu@lists.linux.dev, joro@8bytes.org,
will@kernel.org, robin.murphy@arm.com,
suravee.suthikulpanit@amd.com, Liu, Yi L
> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Friday, August 23, 2024 10:48 AM
>
> On 8/22/24 8:43 PM, Jason Gunthorpe wrote:
> > On Thu, Aug 22, 2024 at 09:50:57AM +0800, Baolu Lu wrote:
> >> On 8/22/24 12:31 AM, Jason Gunthorpe wrote:
> >>>> I think instead of having separate function it may be better to
> >>>> enhance __iommu_domain_alloc() such that:
> >>>> - Keep below changes from this patch
> >>>> - iommu_domain_init()
> >>>> - iommu_get_dma_cookie call inside
> iommu_setup_default_domain()
> >>>> - modify __iommu_domain_alloc() to additional param (flags)
> >>>> - iommu_paging_domain_alloc_flags() will call
> __iommu_domain_alloc()
> >>> My expectation was to basically remove iommu_domain_alloc() entirely
> >>> once Lu's work is merged.
> >>>
> >>> Instead we'd have these direct APIs:
> >>> iommu_domain_alloc_paging_flags()
> >> Is it possible to use different domain allocation APIs for kernel DMA
> >> and user-space DMA? Right now, we differentiate between these two
> types
> >> of domains using IOMMU_DOMAIN_DMA and
> IOMMU_DOMAIN_UNMANAGED.
> > I really don't want to have such a distinction.
> >
> >> I'm thinking about this because the Intel iommu driver has a similar
> >> need to AMD. They both recommend using different page table formats
> for
> >> IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_UNMANAGED, which is
Where is such recommendation coming from?
> currently stopping
> >> us from implementing domain_alloc_paging in the Intel iommu driver.
> > Why? What exactly is the issue?
> >
> > It is inhernetly wrong to behave differently based on DMA API or VFIO.
> > They are not different things.
> >
> > If you have different behaviors and different properies, like AMD's
> > PASID, then they should be described and mapped to some kind of flag.
> >
> > Otherwise the driver should always allocate a paging domain that gives
> > the highest performance.
>
> It relates to Intel VT-d's nested translation. Intel VT-d has two types
> of page table formats for DMA translation: first level and second level.
> In nested translation, the first level page table is used for first-
> stage translation, and the second level page table is used for second-
> stage translation.
>
> The iommu driver for vIOMMU in the guest kernel must use the first level
> page table format for kernel DMA. This page table will then be nested on
> a second level page table in the VMM host kernel.
If a 'legacy-only' vIOMMU is exposed the guest kernel will certainly
use the 2nd stage page table.
nested is an optimization manageable by VMM. Not something that
the kernel driver needs to restrict.
>
> Our current design uses the first level page table for both the host and
> guest kernel for simplicity. This is why we use different page table
> formats for IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_UNMANAGED.
As you said it's the current 'design', not an arch limitation. 😊
>
> We considered determining the page table format based on whether the
> iommu has caching mode capability. This would result in the first level
> page table format being used for guest kernel DMA and the second level
> page table format being used for host kernel DMA. However, this approach
> creates an inconsistency between the host and guest kernels.
Why would one care about such consistency between the host and the guest?
In the end the iommu driver needs to decide based on the requested hwpt
type and available iommu cap to decide the format.
e.g. is there a problem with a simple policy below?
- Default use stage-1 for both DMA/UNMANAGED, if nesting_parent is
not specified and stage-1 is supported by hw
- Otherwise use stage-2
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-26 8:08 ` Tian, Kevin
@ 2024-08-26 8:34 ` Baolu Lu
2024-08-26 8:59 ` Tian, Kevin
0 siblings, 1 reply; 41+ messages in thread
From: Baolu Lu @ 2024-08-26 8:34 UTC (permalink / raw)
To: Tian, Kevin, Jason Gunthorpe
Cc: baolu.lu, Vasant Hegde, iommu@lists.linux.dev, joro@8bytes.org,
will@kernel.org, robin.murphy@arm.com,
suravee.suthikulpanit@amd.com, Liu, Yi L
On 2024/8/26 16:08, Tian, Kevin wrote:
>> From: Baolu Lu<baolu.lu@linux.intel.com>
>> Sent: Friday, August 23, 2024 10:48 AM
>>
>> On 8/22/24 8:43 PM, Jason Gunthorpe wrote:
>>> On Thu, Aug 22, 2024 at 09:50:57AM +0800, Baolu Lu wrote:
>>>> On 8/22/24 12:31 AM, Jason Gunthorpe wrote:
>>>>>> I think instead of having separate function it may be better to
>>>>>> enhance __iommu_domain_alloc() such that:
>>>>>> - Keep below changes from this patch
>>>>>> - iommu_domain_init()
>>>>>> - iommu_get_dma_cookie call inside
>> iommu_setup_default_domain()
>>>>>> - modify __iommu_domain_alloc() to additional param (flags)
>>>>>> - iommu_paging_domain_alloc_flags() will call
>> __iommu_domain_alloc()
>>>>> My expectation was to basically remove iommu_domain_alloc() entirely
>>>>> once Lu's work is merged.
>>>>>
>>>>> Instead we'd have these direct APIs:
>>>>> iommu_domain_alloc_paging_flags()
>>>> Is it possible to use different domain allocation APIs for kernel DMA
>>>> and user-space DMA? Right now, we differentiate between these two
>> types
>>>> of domains using IOMMU_DOMAIN_DMA and
>> IOMMU_DOMAIN_UNMANAGED.
>>> I really don't want to have such a distinction.
>>>
>>>> I'm thinking about this because the Intel iommu driver has a similar
>>>> need to AMD. They both recommend using different page table formats
>> for
>>>> IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_UNMANAGED, which is
> Where is such recommendation coming from?
>
>> currently stopping
>>>> us from implementing domain_alloc_paging in the Intel iommu driver.
>>> Why? What exactly is the issue?
>>>
>>> It is inhernetly wrong to behave differently based on DMA API or VFIO.
>>> They are not different things.
>>>
>>> If you have different behaviors and different properies, like AMD's
>>> PASID, then they should be described and mapped to some kind of flag.
>>>
>>> Otherwise the driver should always allocate a paging domain that gives
>>> the highest performance.
>> It relates to Intel VT-d's nested translation. Intel VT-d has two types
>> of page table formats for DMA translation: first level and second level.
>> In nested translation, the first level page table is used for first-
>> stage translation, and the second level page table is used for second-
>> stage translation.
>>
>> The iommu driver for vIOMMU in the guest kernel must use the first level
>> page table format for kernel DMA. This page table will then be nested on
>> a second level page table in the VMM host kernel.
> If a 'legacy-only' vIOMMU is exposed the guest kernel will certainly
> use the 2nd stage page table.
>
> nested is an optimization manageable by VMM. Not something that
> the kernel driver needs to restrict.
>
>> Our current design uses the first level page table for both the host and
>> guest kernel for simplicity. This is why we use different page table
>> formats for IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_UNMANAGED.
> As you said it's the current 'design', not an arch limitation. 😊
>
>> We considered determining the page table format based on whether the
>> iommu has caching mode capability. This would result in the first level
>> page table format being used for guest kernel DMA and the second level
>> page table format being used for host kernel DMA. However, this approach
>> creates an inconsistency between the host and guest kernels.
> Why would one care about such consistency between the host and the guest?
>
> In the end the iommu driver needs to decide based on the requested hwpt
> type and available iommu cap to decide the format.
>
> e.g. is there a problem with a simple policy below?
>
> - Default use stage-1 for both DMA/UNMANAGED, if nesting_parent is
> not specified and stage-1 is supported by hw
> - Otherwise use stage-2
First-stage has some limitations for an UNMANAGED domain. For example,
- No separate controls for the Access/Dirty page tracking;
- No Write-only permission support;
- No page-level control for forcing snoop.
Thanks,
baolu
^ permalink raw reply [flat|nested] 41+ messages in thread
* RE: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-26 8:34 ` Baolu Lu
@ 2024-08-26 8:59 ` Tian, Kevin
2024-08-26 13:51 ` Jason Gunthorpe
0 siblings, 1 reply; 41+ messages in thread
From: Tian, Kevin @ 2024-08-26 8:59 UTC (permalink / raw)
To: Baolu Lu, Jason Gunthorpe
Cc: Vasant Hegde, iommu@lists.linux.dev, joro@8bytes.org,
will@kernel.org, robin.murphy@arm.com,
suravee.suthikulpanit@amd.com, Liu, Yi L
> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Monday, August 26, 2024 4:34 PM
>
> On 2024/8/26 16:08, Tian, Kevin wrote:
> >> From: Baolu Lu<baolu.lu@linux.intel.com>
> >> Sent: Friday, August 23, 2024 10:48 AM
> >>
> >> On 8/22/24 8:43 PM, Jason Gunthorpe wrote:
> >>> On Thu, Aug 22, 2024 at 09:50:57AM +0800, Baolu Lu wrote:
> >>>> On 8/22/24 12:31 AM, Jason Gunthorpe wrote:
> >>>>>> I think instead of having separate function it may be better to
> >>>>>> enhance __iommu_domain_alloc() such that:
> >>>>>> - Keep below changes from this patch
> >>>>>> - iommu_domain_init()
> >>>>>> - iommu_get_dma_cookie call inside
> >> iommu_setup_default_domain()
> >>>>>> - modify __iommu_domain_alloc() to additional param (flags)
> >>>>>> - iommu_paging_domain_alloc_flags() will call
> >> __iommu_domain_alloc()
> >>>>> My expectation was to basically remove iommu_domain_alloc()
> entirely
> >>>>> once Lu's work is merged.
> >>>>>
> >>>>> Instead we'd have these direct APIs:
> >>>>> iommu_domain_alloc_paging_flags()
> >>>> Is it possible to use different domain allocation APIs for kernel DMA
> >>>> and user-space DMA? Right now, we differentiate between these two
> >> types
> >>>> of domains using IOMMU_DOMAIN_DMA and
> >> IOMMU_DOMAIN_UNMANAGED.
> >>> I really don't want to have such a distinction.
> >>>
> >>>> I'm thinking about this because the Intel iommu driver has a similar
> >>>> need to AMD. They both recommend using different page table formats
> >> for
> >>>> IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_UNMANAGED, which is
> > Where is such recommendation coming from?
> >
> >> currently stopping
> >>>> us from implementing domain_alloc_paging in the Intel iommu driver.
> >>> Why? What exactly is the issue?
> >>>
> >>> It is inhernetly wrong to behave differently based on DMA API or VFIO.
> >>> They are not different things.
> >>>
> >>> If you have different behaviors and different properies, like AMD's
> >>> PASID, then they should be described and mapped to some kind of flag.
> >>>
> >>> Otherwise the driver should always allocate a paging domain that gives
> >>> the highest performance.
> >> It relates to Intel VT-d's nested translation. Intel VT-d has two types
> >> of page table formats for DMA translation: first level and second level.
> >> In nested translation, the first level page table is used for first-
> >> stage translation, and the second level page table is used for second-
> >> stage translation.
> >>
> >> The iommu driver for vIOMMU in the guest kernel must use the first level
> >> page table format for kernel DMA. This page table will then be nested on
> >> a second level page table in the VMM host kernel.
> > If a 'legacy-only' vIOMMU is exposed the guest kernel will certainly
> > use the 2nd stage page table.
> >
> > nested is an optimization manageable by VMM. Not something that
> > the kernel driver needs to restrict.
> >
> >> Our current design uses the first level page table for both the host and
> >> guest kernel for simplicity. This is why we use different page table
> >> formats for IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_UNMANAGED.
> > As you said it's the current 'design', not an arch limitation. 😊
> >
> >> We considered determining the page table format based on whether the
> >> iommu has caching mode capability. This would result in the first level
> >> page table format being used for guest kernel DMA and the second level
> >> page table format being used for host kernel DMA. However, this
> approach
> >> creates an inconsistency between the host and guest kernels.
> > Why would one care about such consistency between the host and the
> guest?
> >
> > In the end the iommu driver needs to decide based on the requested hwpt
> > type and available iommu cap to decide the format.
> >
> > e.g. is there a problem with a simple policy below?
> >
> > - Default use stage-1 for both DMA/UNMANAGED, if nesting_parent is
> > not specified and stage-1 is supported by hw
> > - Otherwise use stage-2
>
> First-stage has some limitations for an UNMANAGED domain. For example,
>
> - No separate controls for the Access/Dirty page tracking;
You can do sw control upon whether to pre-set the a/d bit?
> - No Write-only permission support;
What is the actual usage? What about a vIOMMU which
only supports stage-1?
> - No page-level control for forcing snoop.
>
again we don't have such page-granular snoop usage now and
in scalable mode we always set PGSNP field in the PASID entry
which makes the PTE snoop bit unused.
iirc arm smmu is using this policy too. Jason, is there any
similar limitations between stage-1 vs. stage-2 as above ?
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-26 8:59 ` Tian, Kevin
@ 2024-08-26 13:51 ` Jason Gunthorpe
0 siblings, 0 replies; 41+ messages in thread
From: Jason Gunthorpe @ 2024-08-26 13:51 UTC (permalink / raw)
To: Tian, Kevin
Cc: Baolu Lu, Vasant Hegde, iommu@lists.linux.dev, joro@8bytes.org,
will@kernel.org, robin.murphy@arm.com,
suravee.suthikulpanit@amd.com, Liu, Yi L
On Mon, Aug 26, 2024 at 08:59:47AM +0000, Tian, Kevin wrote:
> > First-stage has some limitations for an UNMANAGED domain. For example,
If these are important then they need their own flags for iommufd to
request them.
> > - No separate controls for the Access/Dirty page tracking;
>
> You can do sw control upon whether to pre-set the a/d bit?
The VMM can specify
IOMMU_HWPT_ALLOC_NEST_PARENT | IOMMU_HWPT_ALLOC_DIRTY_TRACKING
If this is important?
> > - No Write-only permission support;
>
> What is the actual usage? What about a vIOMMU which
> only supports stage-1?
Is there some usage for this?
> > - No page-level control for forcing snoop.
>
> again we don't have such page-granular snoop usage now and
> in scalable mode we always set PGSNP field in the PASID entry
> which makes the PTE snoop bit unused.
Right no-snoop is expected to be domain global.
> iirc arm smmu is using this policy too. Jason, is there any
> similar limitations between stage-1 vs. stage-2 as above ?
There are some difference, but more around performance. I think ARM
would prefer to use the S2 in many cases because it may decrease the
table depth by 1.
AMD prefers to use their S2 for performance because it has more page
size options.
Jason
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-23 2:47 ` Baolu Lu
2024-08-26 8:08 ` Tian, Kevin
@ 2024-08-26 8:47 ` Vasant Hegde
2024-08-26 13:45 ` Jason Gunthorpe
2 siblings, 0 replies; 41+ messages in thread
From: Vasant Hegde @ 2024-08-26 8:47 UTC (permalink / raw)
To: Baolu Lu, Jason Gunthorpe
Cc: iommu, joro, will, robin.murphy, suravee.suthikulpanit, yi.l.liu,
kevin.tian
Baolu,
On 8/23/2024 8:17 AM, Baolu Lu wrote:
> On 8/22/24 8:43 PM, Jason Gunthorpe wrote:
>> On Thu, Aug 22, 2024 at 09:50:57AM +0800, Baolu Lu wrote:
>>> On 8/22/24 12:31 AM, Jason Gunthorpe wrote:
>>>>> I think instead of having separate function it may be better to
>>>>> enhance __iommu_domain_alloc() such that:
>>>>> - Keep below changes from this patch
>>>>> - iommu_domain_init()
>>>>> - iommu_get_dma_cookie call inside iommu_setup_default_domain()
>>>>> - modify __iommu_domain_alloc() to additional param (flags)
>>>>> - iommu_paging_domain_alloc_flags() will call __iommu_domain_alloc()
>>>> My expectation was to basically remove iommu_domain_alloc() entirely
>>>> once Lu's work is merged.
.../...
>
> It relates to Intel VT-d's nested translation. Intel VT-d has two types
> of page table formats for DMA translation: first level and second level.
> In nested translation, the first level page table is used for first-
> stage translation, and the second level page table is used for second-
> stage translation.
>
> The iommu driver for vIOMMU in the guest kernel must use the first level
> page table format for kernel DMA. This page table will then be nested on
> a second level page table in the VMM host kernel.
>
> Our current design uses the first level page table for both the host and
> guest kernel for simplicity. This is why we use different page table
> formats for IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_UNMANAGED.
>
> We considered determining the page table format based on whether the
> iommu has caching mode capability. This would result in the first level
> page table format being used for guest kernel DMA and the second level
> page table format being used for host kernel DMA. However, this approach
> creates an inconsistency between the host and guest kernels.
>
> I am not sure about other architectures, like AMD, ARM and RISC-V.
> Perhaps all of them have the similar need?
AMD driver has caching mode capability (amd_iommu_np_cache).
-Vasant
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-23 2:47 ` Baolu Lu
2024-08-26 8:08 ` Tian, Kevin
2024-08-26 8:47 ` Vasant Hegde
@ 2024-08-26 13:45 ` Jason Gunthorpe
2 siblings, 0 replies; 41+ messages in thread
From: Jason Gunthorpe @ 2024-08-26 13:45 UTC (permalink / raw)
To: Baolu Lu, Vasant Hegde
Cc: iommu, joro, will, robin.murphy, suravee.suthikulpanit, yi.l.liu,
kevin.tian
On Fri, Aug 23, 2024 at 10:47:46AM +0800, Baolu Lu wrote:
> > Why? What exactly is the issue?
> >
> > It is inhernetly wrong to behave differently based on DMA API or VFIO.
> > They are not different things.
> >
> > If you have different behaviors and different properies, like AMD's
> > PASID, then they should be described and mapped to some kind of flag.
> >
> > Otherwise the driver should always allocate a paging domain that gives
> > the highest performance.
>
> It relates to Intel VT-d's nested translation. Intel VT-d has two types
> of page table formats for DMA translation: first level and second level.
> In nested translation, the first level page table is used for first-
> stage translation, and the second level page table is used for second-
> stage translation.
>
> The iommu driver for vIOMMU in the guest kernel must use the first level
> page table format for kernel DMA. This page table will then be nested on
> a second level page table in the VMM host kernel.
The guest kernel must know that the IOMMU can only support the first
level format in this case! There is no other option!
> Our current design uses the first level page table for both the host and
> guest kernel for simplicity. This is why we use different page table
> formats for IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_UNMANAGED.
You should just always use the first level format for every paging
domain except when IOMMU_HWPT_ALLOC_NEST_PARENT is set.
IOMMU_HWPT_ALLOC_NEST_PARENT was added specifically to solve this
problem.
Ideally the guest kernel will reject IOMMU_HWPT_ALLOC_NEST_PARENT as
not supported when allocated or attached. ARM does.
> I am not sure about other architectures, like AMD, ARM and RISC-V.
> Perhaps all of them have the similar need?
ARM/AMD/VTD all have the two page table problem.
ARM informs the guest if S1/S2 is available via IDR which is like
VT-D's ECAP. Only S1 can be used in the nested guest VM.
AMD has a global amd_iommu_pgtable, it must be set to AMD_IOMMU_V2
within a nested guest VM. The only way I saw for that to happen was
via kernel command line parameter? Was expecting some ACPI or
FEATURE_XXX thing? Vasant?
VTD should add some discoverability that the second stage is not
supported and behave as above.
Jason
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-21 16:31 ` Jason Gunthorpe
2024-08-22 1:50 ` Baolu Lu
@ 2024-08-22 11:27 ` Yi Liu
2024-08-22 12:44 ` Jason Gunthorpe
2024-08-26 8:36 ` Vasant Hegde
2 siblings, 1 reply; 41+ messages in thread
From: Yi Liu @ 2024-08-22 11:27 UTC (permalink / raw)
To: Jason Gunthorpe, Vasant Hegde
Cc: iommu, joro, will, robin.murphy, suravee.suthikulpanit, baolu.lu,
kevin.tian
On 2024/8/22 00:31, Jason Gunthorpe wrote:
>> @@ -359,11 +359,17 @@ struct iommu_vfio_ioas {
>> * enforced on device attachment
>> * @IOMMU_HWPT_FAULT_ID_VALID: The fault_id field of hwpt allocation data is
>> * valid.
>> + * @IOMMUFD_HWPT_ALLOC_PASID: When the domain is used on a device, with no
>> + * PASID, the device will support later attaching
>> + * a PASID as well. Some HW requires a specific
>> + * domain format on the device to allow PASID to
>> + * work.
>
> Maybe:
>
> Requests a domain that can be used with PASID. The domain can be
> attached to any PASID on the device. Any domain attached to the
> non-PASID part of the device must also be flaged, otherwise attaching
> a PASID will blocked.
>
> Yi will need to add a check that IOMMUFD_HWPT_ALLOC_PASID was
> specified on the RID domain while processing attach on the PASID
> domain.
this means if userspace wants to user pasid on a device, it should allocate
hwpt with this flag, no matter the hwpt is going to be attached to RID or
pasid. is it?
I still have a confusion. Say on AMD, is it a legitimate configuration that
the RID is attached to a domain that uses V1 page table while pasid is
attached to domains that use V2 page table? From architecture p.o.v., it
seems allowed.
--
Regards,
Yi Liu
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-22 11:27 ` Yi Liu
@ 2024-08-22 12:44 ` Jason Gunthorpe
2024-08-23 8:58 ` Yi Liu
0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2024-08-22 12:44 UTC (permalink / raw)
To: Yi Liu
Cc: Vasant Hegde, iommu, joro, will, robin.murphy,
suravee.suthikulpanit, baolu.lu, kevin.tian
On Thu, Aug 22, 2024 at 07:27:39PM +0800, Yi Liu wrote:
> On 2024/8/22 00:31, Jason Gunthorpe wrote:
>
> > > @@ -359,11 +359,17 @@ struct iommu_vfio_ioas {
> > > * enforced on device attachment
> > > * @IOMMU_HWPT_FAULT_ID_VALID: The fault_id field of hwpt allocation data is
> > > * valid.
> > > + * @IOMMUFD_HWPT_ALLOC_PASID: When the domain is used on a device, with no
> > > + * PASID, the device will support later attaching
> > > + * a PASID as well. Some HW requires a specific
> > > + * domain format on the device to allow PASID to
> > > + * work.
> >
> > Maybe:
> >
> > Requests a domain that can be used with PASID. The domain can be
> > attached to any PASID on the device. Any domain attached to the
> > non-PASID part of the device must also be flaged, otherwise attaching
> > a PASID will blocked.
> >
> > Yi will need to add a check that IOMMUFD_HWPT_ALLOC_PASID was
> > specified on the RID domain while processing attach on the PASID
> > domain.
>
> this means if userspace wants to user pasid on a device, it should allocate
> hwpt with this flag, no matter the hwpt is going to be attached to RID or
> pasid. is it?
Yes
> I still have a confusion. Say on AMD, is it a legitimate configuration that
> the RID is attached to a domain that uses V1 page table while pasid is
> attached to domains that use V2 page table? From architecture p.o.v., it
> seems allowed.
I think that is exactly what they have been saying their design cannot
do.
If you want to use PASID you must install a GCR3 table full of V2
tables. Doing this denies use of the V1 format for anything except a
NESTING_PARENT.
Jason
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-22 12:44 ` Jason Gunthorpe
@ 2024-08-23 8:58 ` Yi Liu
2024-08-24 14:47 ` Vasant Hegde
0 siblings, 1 reply; 41+ messages in thread
From: Yi Liu @ 2024-08-23 8:58 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Vasant Hegde, iommu, joro, will, robin.murphy,
suravee.suthikulpanit, baolu.lu, kevin.tian
On 2024/8/22 20:44, Jason Gunthorpe wrote:
> On Thu, Aug 22, 2024 at 07:27:39PM +0800, Yi Liu wrote:
>> On 2024/8/22 00:31, Jason Gunthorpe wrote:
>>
>>>> @@ -359,11 +359,17 @@ struct iommu_vfio_ioas {
>>>> * enforced on device attachment
>>>> * @IOMMU_HWPT_FAULT_ID_VALID: The fault_id field of hwpt allocation data is
>>>> * valid.
>>>> + * @IOMMUFD_HWPT_ALLOC_PASID: When the domain is used on a device, with no
>>>> + * PASID, the device will support later attaching
>>>> + * a PASID as well. Some HW requires a specific
>>>> + * domain format on the device to allow PASID to
>>>> + * work.
>>>
>>> Maybe:
>>>
>>> Requests a domain that can be used with PASID. The domain can be
>>> attached to any PASID on the device. Any domain attached to the
>>> non-PASID part of the device must also be flaged, otherwise attaching
>>> a PASID will blocked.
>>>
>>> Yi will need to add a check that IOMMUFD_HWPT_ALLOC_PASID was
>>> specified on the RID domain while processing attach on the PASID
>>> domain.
>>
>> this means if userspace wants to user pasid on a device, it should allocate
>> hwpt with this flag, no matter the hwpt is going to be attached to RID or
>> pasid. is it?
>
> Yes
>
>> I still have a confusion. Say on AMD, is it a legitimate configuration that
>> the RID is attached to a domain that uses V1 page table while pasid is
>> attached to domains that use V2 page table? From architecture p.o.v., it
>> seems allowed.
>
> I think that is exactly what they have been saying their design cannot
> do.
ok. I don't think Intel has this limitation. RID/PASID is free to use s1
or s2 format page table. So such enforcement is nop on Intel platform.
>
> If you want to use PASID you must install a GCR3 table full of V2
> tables. Doing this denies use of the V1 format for anything except a
> NESTING_PARENT.
thanks for the explanation. So if pasid is to be used, amd iommu driver
just programs the GCR3 tables and the request without pasid would use the
v2 page table installed at pasid#0. It does not use the
DTE->host_page_table at all. I remember the DTE->host_page_table uses v1
format.
--
Regards,
Yi Liu
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-23 8:58 ` Yi Liu
@ 2024-08-24 14:47 ` Vasant Hegde
2024-08-28 21:52 ` Jacob Pan
0 siblings, 1 reply; 41+ messages in thread
From: Vasant Hegde @ 2024-08-24 14:47 UTC (permalink / raw)
To: Yi Liu, Jason Gunthorpe
Cc: iommu, joro, will, robin.murphy, suravee.suthikulpanit, baolu.lu,
kevin.tian
Yi,
On 8/23/2024 2:28 PM, Yi Liu wrote:
> On 2024/8/22 20:44, Jason Gunthorpe wrote:
>> On Thu, Aug 22, 2024 at 07:27:39PM +0800, Yi Liu wrote:
>>> On 2024/8/22 00:31, Jason Gunthorpe wrote:
>>>
>>>>> @@ -359,11 +359,17 @@ struct iommu_vfio_ioas {
>>>>> * enforced on device attachment
>>>>> * @IOMMU_HWPT_FAULT_ID_VALID: The fault_id field of hwpt allocation
>>>>> data is
>>>>> * valid.
>>>>> + * @IOMMUFD_HWPT_ALLOC_PASID: When the domain is used on a device, with no
>>>>> + * PASID, the device will support later attaching
>>>>> + * a PASID as well. Some HW requires a specific
>>>>> + * domain format on the device to allow PASID to
>>>>> + * work.
>>>>
>>>> Maybe:
>>>>
>>>> Requests a domain that can be used with PASID. The domain can be
>>>> attached to any PASID on the device. Any domain attached to the
>>>> non-PASID part of the device must also be flaged, otherwise attaching
>>>> a PASID will blocked.
>>>>
>>>> Yi will need to add a check that IOMMUFD_HWPT_ALLOC_PASID was
>>>> specified on the RID domain while processing attach on the PASID
>>>> domain.
>>>
>>> this means if userspace wants to user pasid on a device, it should allocate
>>> hwpt with this flag, no matter the hwpt is going to be attached to RID or
>>> pasid. is it?
>>
>> Yes>>
>>> I still have a confusion. Say on AMD, is it a legitimate configuration that
>>> the RID is attached to a domain that uses V1 page table while pasid is
>>> attached to domains that use V2 page table? From architecture p.o.v., it
>>> seems allowed.
>>
>> I think that is exactly what they have been saying their design cannot
>> do.
>
> ok. I don't think Intel has this limitation. RID/PASID is free to use s1
> or s2 format page table. So such enforcement is nop on Intel platform.
>
>>
>> If you want to use PASID you must install a GCR3 table full of V2
>> tables. Doing this denies use of the V1 format for anything except a
>> NESTING_PARENT.
>
> thanks for the explanation. So if pasid is to be used, amd iommu driver
> just programs the GCR3 tables and the request without pasid would use the
> v2 page table installed at pasid#0. It does not use the
> DTE->host_page_table at all. I remember the DTE->host_page_table uses v1
> format.
Right. Only V2 page table supports PASID. So when user space requests hwpt with
PASID AMD driver will allocate V2 page table with GCR3 setup and updated DTE. So
it can handle attaching Device ID (uses PASID#0) and PASID (as GCR3 table is
already setup).
Note that when we use NESTED page table it will have both v1 (L2 / stage 2
level) and v2 page table (L1 / stage 1 level). Even in this setup V2 page table
inside guest can support PASID.
Hope this clarifies.
-Vasant
>
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-24 14:47 ` Vasant Hegde
@ 2024-08-28 21:52 ` Jacob Pan
2024-08-29 10:51 ` Vasant Hegde
0 siblings, 1 reply; 41+ messages in thread
From: Jacob Pan @ 2024-08-28 21:52 UTC (permalink / raw)
To: Vasant Hegde
Cc: Yi Liu, Jason Gunthorpe, iommu, joro, will, robin.murphy,
suravee.suthikulpanit, baolu.lu, kevin.tian
Hi Vasant,
On Sat, 24 Aug 2024 20:17:45 +0530
Vasant Hegde <vasant.hegde@amd.com> wrote:
> Yi,
>
>
> On 8/23/2024 2:28 PM, Yi Liu wrote:
> > On 2024/8/22 20:44, Jason Gunthorpe wrote:
> >> On Thu, Aug 22, 2024 at 07:27:39PM +0800, Yi Liu wrote:
> >>> On 2024/8/22 00:31, Jason Gunthorpe wrote:
> >>>
> >>>>> @@ -359,11 +359,17 @@ struct iommu_vfio_ioas {
> >>>>> * enforced on device
> >>>>> attachment
> >>>>> * @IOMMU_HWPT_FAULT_ID_VALID: The fault_id field of hwpt
> >>>>> allocation data is
> >>>>> * valid.
> >>>>> + * @IOMMUFD_HWPT_ALLOC_PASID: When the domain is used on a
> >>>>> device, with no
> >>>>> + * PASID, the device will support
> >>>>> later attaching
> >>>>> + * a PASID as well. Some HW
> >>>>> requires a specific
> >>>>> + * domain format on the device to
> >>>>> allow PASID to
> >>>>> + * work.
> >>>>
> >>>> Maybe:
> >>>>
> >>>> Requests a domain that can be used with PASID. The domain can
> >>>> be attached to any PASID on the device. Any domain attached to
> >>>> the non-PASID part of the device must also be flaged, otherwise
> >>>> attaching a PASID will blocked.
> >>>>
> >>>> Yi will need to add a check that IOMMUFD_HWPT_ALLOC_PASID was
> >>>> specified on the RID domain while processing attach on the PASID
> >>>> domain.
> >>>
> >>> this means if userspace wants to user pasid on a device, it
> >>> should allocate hwpt with this flag, no matter the hwpt is going
> >>> to be attached to RID or pasid. is it?
> >>
> >> Yes>>
> >>> I still have a confusion. Say on AMD, is it a legitimate
> >>> configuration that the RID is attached to a domain that uses V1
> >>> page table while pasid is attached to domains that use V2 page
> >>> table? From architecture p.o.v., it seems allowed.
> >>
> >> I think that is exactly what they have been saying their design
> >> cannot do.
> >
> > ok. I don't think Intel has this limitation. RID/PASID is free to
> > use s1 or s2 format page table. So such enforcement is nop on Intel
> > platform.
> >>
> >> If you want to use PASID you must install a GCR3 table full of V2
> >> tables. Doing this denies use of the V1 format for anything except
> >> a NESTING_PARENT.
> >
> > thanks for the explanation. So if pasid is to be used, amd iommu
> > driver just programs the GCR3 tables and the request without pasid
> > would use the v2 page table installed at pasid#0. It does not use
> > the DTE->host_page_table at all. I remember the
> > DTE->host_page_table uses v1 format.
>
>
> Right. Only V2 page table supports PASID. So when user space requests
> hwpt with PASID AMD driver will allocate V2 page table with GCR3
> setup and updated DTE. So it can handle attaching Device ID (uses
> PASID#0) and PASID (as GCR3 table is already setup).
>
> Note that when we use NESTED page table it will have both v1 (L2 /
> stage 2 level) and v2 page table (L1 / stage 1 level). Even in this
> setup V2 page table inside guest can support PASID.
>
Sorry I am still confused. So V2 can support both PASID and no PASID
(untagged DMA). For the nested case, if the guest wants to allocate
a UNMANAGED domain, would you use V1? If this code in [3/5] below runs
in the guest:
static struct iommu_domain *amd_iommu_domain_alloc(unsigned int type)
/*
* Force IOMMU v1 page table when allocating
* domain for pass-through devices.
*/
if (type == IOMMU_DOMAIN_UNMANAGED)
pgtable = AMD_IOMMU_V1;
Hardware-wise, in nested case, can V1 be used in L1/S1 for DMA w/o
PASID?
Thanks,
Jacob
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-28 21:52 ` Jacob Pan
@ 2024-08-29 10:51 ` Vasant Hegde
2024-08-29 12:10 ` Jason Gunthorpe
0 siblings, 1 reply; 41+ messages in thread
From: Vasant Hegde @ 2024-08-29 10:51 UTC (permalink / raw)
To: Jacob Pan
Cc: Yi Liu, Jason Gunthorpe, iommu, joro, will, robin.murphy,
suravee.suthikulpanit, baolu.lu, kevin.tian
Hi Jacob,
On 8/29/2024 3:22 AM, Jacob Pan wrote:
> Hi Vasant,
>
> On Sat, 24 Aug 2024 20:17:45 +0530
> Vasant Hegde <vasant.hegde@amd.com> wrote:
>
>> Yi,
>>
>>
>> On 8/23/2024 2:28 PM, Yi Liu wrote:
>>> On 2024/8/22 20:44, Jason Gunthorpe wrote:
>>>> On Thu, Aug 22, 2024 at 07:27:39PM +0800, Yi Liu wrote:
>>>>> On 2024/8/22 00:31, Jason Gunthorpe wrote:
>>>>>
>>>>>>> @@ -359,11 +359,17 @@ struct iommu_vfio_ioas {
>>>>>>> * enforced on device
>>>>>>> attachment
>>>>>>> * @IOMMU_HWPT_FAULT_ID_VALID: The fault_id field of hwpt
>>>>>>> allocation data is
>>>>>>> * valid.
>>>>>>> + * @IOMMUFD_HWPT_ALLOC_PASID: When the domain is used on a
>>>>>>> device, with no
>>>>>>> + * PASID, the device will support
>>>>>>> later attaching
>>>>>>> + * a PASID as well. Some HW
>>>>>>> requires a specific
>>>>>>> + * domain format on the device to
>>>>>>> allow PASID to
>>>>>>> + * work.
>>>>>>
>>>>>> Maybe:
>>>>>>
>>>>>> Requests a domain that can be used with PASID. The domain can
>>>>>> be attached to any PASID on the device. Any domain attached to
>>>>>> the non-PASID part of the device must also be flaged, otherwise
>>>>>> attaching a PASID will blocked.
>>>>>>
>>>>>> Yi will need to add a check that IOMMUFD_HWPT_ALLOC_PASID was
>>>>>> specified on the RID domain while processing attach on the PASID
>>>>>> domain.
>>>>>
>>>>> this means if userspace wants to user pasid on a device, it
>>>>> should allocate hwpt with this flag, no matter the hwpt is going
>>>>> to be attached to RID or pasid. is it?
>>>>
>>>> Yes>>
>>>>> I still have a confusion. Say on AMD, is it a legitimate
>>>>> configuration that the RID is attached to a domain that uses V1
>>>>> page table while pasid is attached to domains that use V2 page
>>>>> table? From architecture p.o.v., it seems allowed.
>>>>
>>>> I think that is exactly what they have been saying their design
>>>> cannot do.
>>>
>>> ok. I don't think Intel has this limitation. RID/PASID is free to
>>> use s1 or s2 format page table. So such enforcement is nop on Intel
>>> platform.
>>>>
>>>> If you want to use PASID you must install a GCR3 table full of V2
>>>> tables. Doing this denies use of the V1 format for anything except
>>>> a NESTING_PARENT.
>>>
>>> thanks for the explanation. So if pasid is to be used, amd iommu
>>> driver just programs the GCR3 tables and the request without pasid
>>> would use the v2 page table installed at pasid#0. It does not use
>>> the DTE->host_page_table at all. I remember the
>>> DTE->host_page_table uses v1 format.
>>
>>
>> Right. Only V2 page table supports PASID. So when user space requests
>> hwpt with PASID AMD driver will allocate V2 page table with GCR3
>> setup and updated DTE. So it can handle attaching Device ID (uses
>> PASID#0) and PASID (as GCR3 table is already setup).
>>
>> Note that when we use NESTED page table it will have both v1 (L2 /
>> stage 2 level) and v2 page table (L1 / stage 1 level). Even in this
>> setup V2 page table inside guest can support PASID.
>>
> Sorry I am still confused. So V2 can support both PASID and no PASID
> (untagged DMA). For the nested case, if the guest wants to allocate
> a UNMANAGED domain, would you use V1? If this code in [3/5] below runs
> in the guest:
In nested scenario, Guest will always use V2 page table (and host will use V1
page table).
>
> static struct iommu_domain *amd_iommu_domain_alloc(unsigned int type)
>
> /*
> * Force IOMMU v1 page table when allocating
> * domain for pass-through devices.
> */
> if (type == IOMMU_DOMAIN_UNMANAGED)
> pgtable = AMD_IOMMU_V1;
>
> Hardware-wise, in nested case, can V1 be used in L1/S1 for DMA w/o
> PASID?
Actually no. L1 (Guest) will use V2 page table. I am also implementing
domain_alloc_paging()... I believe that should be sufficient. If something is
still missing we should address that as part of nested support.
-Vasant
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-29 10:51 ` Vasant Hegde
@ 2024-08-29 12:10 ` Jason Gunthorpe
2024-08-29 12:47 ` Vasant Hegde
0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2024-08-29 12:10 UTC (permalink / raw)
To: Vasant Hegde
Cc: Jacob Pan, Yi Liu, iommu, joro, will, robin.murphy,
suravee.suthikulpanit, baolu.lu, kevin.tian
On Thu, Aug 29, 2024 at 04:21:32PM +0530, Vasant Hegde wrote:
> >> Note that when we use NESTED page table it will have both v1 (L2 /
> >> stage 2 level) and v2 page table (L1 / stage 1 level). Even in this
> >> setup V2 page table inside guest can support PASID.
> >>
> > Sorry I am still confused. So V2 can support both PASID and no PASID
> > (untagged DMA). For the nested case, if the guest wants to allocate
> > a UNMANAGED domain, would you use V1? If this code in [3/5] below runs
> > in the guest:
>
> In nested scenario, Guest will always use V2 page table (and host will use V1
> page table).
Sure, that is how it must work but..
> > static struct iommu_domain *amd_iommu_domain_alloc(unsigned int type)
> >
> > /*
> > * Force IOMMU v1 page table when allocating
> > * domain for pass-through devices.
> > */
> > if (type == IOMMU_DOMAIN_UNMANAGED)
> > pgtable = AMD_IOMMU_V1;
> >
> > Hardware-wise, in nested case, can V1 be used in L1/S1 for DMA w/o
> > PASID?
>
> Actually no. L1 (Guest) will use V2 page table.
Where is the code for this? How does it work today?
I have the same confusion as Jacob where I don't see the driver making
the distinction - ie as above...
Jason
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-29 12:10 ` Jason Gunthorpe
@ 2024-08-29 12:47 ` Vasant Hegde
2024-08-29 13:11 ` Jason Gunthorpe
` (2 more replies)
0 siblings, 3 replies; 41+ messages in thread
From: Vasant Hegde @ 2024-08-29 12:47 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Jacob Pan, Yi Liu, iommu, joro, will, robin.murphy,
suravee.suthikulpanit, baolu.lu, kevin.tian
Jason, Jacob,
On 8/29/2024 5:40 PM, Jason Gunthorpe wrote:
> On Thu, Aug 29, 2024 at 04:21:32PM +0530, Vasant Hegde wrote:
>>>> Note that when we use NESTED page table it will have both v1 (L2 /
>>>> stage 2 level) and v2 page table (L1 / stage 1 level). Even in this
>>>> setup V2 page table inside guest can support PASID.
>>>>
>>> Sorry I am still confused. So V2 can support both PASID and no PASID
>>> (untagged DMA). For the nested case, if the guest wants to allocate
>>> a UNMANAGED domain, would you use V1? If this code in [3/5] below runs
>>> in the guest:
>>
>> In nested scenario, Guest will always use V2 page table (and host will use V1
>> page table).
>
> Sure, that is how it must work but..
>
>>> static struct iommu_domain *amd_iommu_domain_alloc(unsigned int type)
>>>
>>> /*
>>> * Force IOMMU v1 page table when allocating
>>> * domain for pass-through devices.
>>> */
>>> if (type == IOMMU_DOMAIN_UNMANAGED)
>>> pgtable = AMD_IOMMU_V1;
>>>
>>> Hardware-wise, in nested case, can V1 be used in L1/S1 for DMA w/o
>>> PASID?
>>
>> Actually no. L1 (Guest) will use V2 page table.
>
> Where is the code for this? How does it work today?
>
> I have the same confusion as Jacob where I don't see the driver making
> the distinction - ie as above...
Sorry. I was only thinking about HW vIOMMU support.
Today we have two modes:
1) Device passthrough : No IOMMU support inside guest. Host will use V1 page table.
2) Emulated amd-iommu driver in qemu :
For some reason (which I am not aware) emulated AMD IOMMU driver in qemu
implemented V1 page table. So if we use emulated IOMMU it will use V1 page table
in guest. But this doesn't support DMA for passthrough devices.
Hence it worked fine so far.
For HW-vIOMMU where we will have nested page table, Host will be in V1 and Guest
will be in V2 page table. We need to add few checks to detect and use
appropriate page table. It should be part of nested page table support.
-Vasant
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-29 12:47 ` Vasant Hegde
@ 2024-08-29 13:11 ` Jason Gunthorpe
2024-09-11 10:54 ` Vasant Hegde
2024-08-29 17:40 ` Jacob Pan
[not found] ` <66d0b2a1.630a0220.1dd301.daceSMTPIN_ADDED_BROKEN@mx.google.com>
2 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2024-08-29 13:11 UTC (permalink / raw)
To: Vasant Hegde
Cc: Jacob Pan, Yi Liu, iommu, joro, will, robin.murphy,
suravee.suthikulpanit, baolu.lu, kevin.tian
On Thu, Aug 29, 2024 at 06:17:38PM +0530, Vasant Hegde wrote:
> For HW-vIOMMU where we will have nested page table, Host will be in V1 and Guest
> will be in V2 page table. We need to add few checks to detect and use
> appropriate page table. It should be part of nested page table support.
You need to push those checks ASAP..
Guest enablement needs to be fast tracked to distros ahead of building
the hypervisor component.
Jason
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-29 13:11 ` Jason Gunthorpe
@ 2024-09-11 10:54 ` Vasant Hegde
0 siblings, 0 replies; 41+ messages in thread
From: Vasant Hegde @ 2024-09-11 10:54 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Jacob Pan, Yi Liu, iommu, joro, will, robin.murphy,
suravee.suthikulpanit, baolu.lu, kevin.tian
Jason,
On 8/29/2024 6:41 PM, Jason Gunthorpe wrote:
> On Thu, Aug 29, 2024 at 06:17:38PM +0530, Vasant Hegde wrote:
>
>> For HW-vIOMMU where we will have nested page table, Host will be in V1 and Guest
>> will be in V2 page table. We need to add few checks to detect and use
>> appropriate page table. It should be part of nested page table support.
>
> You need to push those checks ASAP..
>
> Guest enablement needs to be fast tracked to distros ahead of building
> the hypervisor component.
Sure. We are actively working on adding this support.
-Vasant
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-29 12:47 ` Vasant Hegde
2024-08-29 13:11 ` Jason Gunthorpe
@ 2024-08-29 17:40 ` Jacob Pan
[not found] ` <66d0b2a1.630a0220.1dd301.daceSMTPIN_ADDED_BROKEN@mx.google.com>
2 siblings, 0 replies; 41+ messages in thread
From: Jacob Pan @ 2024-08-29 17:40 UTC (permalink / raw)
To: Vasant Hegde
Cc: Jason Gunthorpe, Yi Liu, iommu, joro, will, robin.murphy,
suravee.suthikulpanit, baolu.lu, kevin.tian
Hi Vasant,
On Thu, 29 Aug 2024 18:17:38 +0530
Vasant Hegde <vasant.hegde@amd.com> wrote:
> Jason, Jacob,
>
>
> On 8/29/2024 5:40 PM, Jason Gunthorpe wrote:
> > On Thu, Aug 29, 2024 at 04:21:32PM +0530, Vasant Hegde wrote:
> >>>> Note that when we use NESTED page table it will have both v1 (L2
> >>>> / stage 2 level) and v2 page table (L1 / stage 1 level). Even in
> >>>> this setup V2 page table inside guest can support PASID.
> >>>>
> >>> Sorry I am still confused. So V2 can support both PASID and no
> >>> PASID (untagged DMA). For the nested case, if the guest wants to
> >>> allocate a UNMANAGED domain, would you use V1? If this code in
> >>> [3/5] below runs in the guest:
> >>
> >> In nested scenario, Guest will always use V2 page table (and host
> >> will use V1 page table).
> >
> > Sure, that is how it must work but..
> >
> >>> static struct iommu_domain *amd_iommu_domain_alloc(unsigned int
> >>> type)
> >>>
> >>> /*
> >>> * Force IOMMU v1 page table when allocating
> >>> * domain for pass-through devices.
> >>> */
> >>> if (type == IOMMU_DOMAIN_UNMANAGED)
> >>> pgtable = AMD_IOMMU_V1;
> >>>
> >>> Hardware-wise, in nested case, can V1 be used in L1/S1 for DMA w/o
> >>> PASID?
> >>
> >> Actually no. L1 (Guest) will use V2 page table.
> >
> > Where is the code for this? How does it work today?
> >
> > I have the same confusion as Jacob where I don't see the driver
> > making the distinction - ie as above...
>
> Sorry. I was only thinking about HW vIOMMU support.
>
> Today we have two modes:
> 1) Device passthrough : No IOMMU support inside guest. Host will use
> V1 page table.
>
> 2) Emulated amd-iommu driver in qemu :
> For some reason (which I am not aware) emulated AMD IOMMU driver in
> qemu implemented V1 page table. So if we use emulated IOMMU it will
> use V1 page table in guest. But this doesn't support DMA for
> passthrough devices.
>
> Hence it worked fine so far.
>
> For HW-vIOMMU where we will have nested page table, Host will be in
> V1 and Guest will be in V2 page table. We need to add few checks to
> detect and use appropriate page table. It should be part of nested
> page table support.
>
Thanks for explaining. I am curious to see how vIOMMU is virtualized,
since for each guest DTE (assume it is the same format as host), you
would need a Host Page Table Root Pointer (SPA) then add PASID support
in GCR3 table.
I don't know where the guest can get the host page table root pointer,
perhaps just ignored?
On Intel, guest can just do S1 only, S2 pointer can be ignored
architectually based on PGTT (PASID Granular Translation Type).
Thanks,
Jacob
^ permalink raw reply [flat|nested] 41+ messages in thread[parent not found: <66d0b2a1.630a0220.1dd301.daceSMTPIN_ADDED_BROKEN@mx.google.com>]
* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
[not found] ` <66d0b2a1.630a0220.1dd301.daceSMTPIN_ADDED_BROKEN@mx.google.com>
@ 2024-08-30 15:00 ` Jason Gunthorpe
0 siblings, 0 replies; 41+ messages in thread
From: Jason Gunthorpe @ 2024-08-30 15:00 UTC (permalink / raw)
To: Jacob Pan
Cc: Vasant Hegde, Yi Liu, iommu, joro, will, robin.murphy,
suravee.suthikulpanit, baolu.lu, kevin.tian
On Thu, Aug 29, 2024 at 10:40:47AM -0700, Jacob Pan wrote:
> I don't know where the guest can get the host page table root pointer,
> perhaps just ignored?
The vDTE created by the guest will have host translation disabled, and
no host page table pointer. The vDTE will only point to a GCR3
table. Same as ARM.
> On Intel, guest can just do S1 only, S2 pointer can be ignored
> architectually based on PGTT (PASID Granular Translation Type).
Yes, same idea.
AMD is just lacking a bit to tell the guest that the HW doesn't
support host translation :( :( :(
Jason
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-21 16:31 ` Jason Gunthorpe
2024-08-22 1:50 ` Baolu Lu
2024-08-22 11:27 ` Yi Liu
@ 2024-08-26 8:36 ` Vasant Hegde
2024-08-26 13:56 ` Jason Gunthorpe
2 siblings, 1 reply; 41+ messages in thread
From: Vasant Hegde @ 2024-08-26 8:36 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, joro, will, robin.murphy, suravee.suthikulpanit, yi.l.liu,
baolu.lu, kevin.tian
Jason,
On 8/21/2024 10:01 PM, Jason Gunthorpe wrote:
> On Wed, Aug 21, 2024 at 01:35:50PM +0000, Vasant Hegde wrote:
>> From: Jason Gunthorpe <jgg@ziepe.ca>
>>
>> Currently drivers calls iommu_paging_domain_alloc(dev) to get an
>> UNAMANAGED domain. This is not sufficient to support PASID with
>> UNAMANAGED domain as some HW like AMD requires certain page table type
>> to support PASIDs.
>>
>> Also domain_alloc_paging() passes device as param for domain
>> allocation. This is not sufficient for AMD driver to decide the right
>> page table.
>>
>> Hence add iommu_paging_domain_alloc_flags() API which takes flags as
>> parameter. Also update default domain allocation path to use this new
>> API whenever device is PASID capable. HW driver should implement
>> domain_alloc_user() to allocate PASID capable domain.
>>
>> Finally introduce new flag (IOMMUFD_HWPT_ALLOC_PASID) to
>> domain_alloc_users() ops. If both IOMMU and device supports PASID it
>> will allocate domain. Otherwise return error.
>>
>> Signed-off-by: Jason Gunthorpe <jgg@ziepe.ca>
>> [Added __iommu_paging_domain_alloc_flags() and description - Vasant]
>> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
>> ---
>> @Jason,
>> Notice that I have added __iommu_paging_domain_alloc_flags() so that
>> it can call iommu_domain_init() with appropriate domain type.
>
> It is okay, but also the type could be fixed in
> __iommu_group_alloc_default_domain() using something like:
>
> ret->type |= req_type;
Sorry. I didn't get it. You mean domain->type? If yes, its set in
iommu_domain_init(). (That's why I had to introduce
__iommu_group_alloc_default_domain()).
>
>> I think instead of having separate function it may be better to
>> enhance __iommu_domain_alloc() such that:
>> - Keep below changes from this patch
>> - iommu_domain_init()
>> - iommu_get_dma_cookie call inside iommu_setup_default_domain()
>> - modify __iommu_domain_alloc() to additional param (flags)
>> - iommu_paging_domain_alloc_flags() will call __iommu_domain_alloc()
>
> My expectation was to basically remove iommu_domain_alloc() entirely
> once Lu's work is merged.
>
> Instead we'd have these direct APIs:
> iommu_domain_alloc_paging_flags()
> iommu_group_alloc_blocking_domain()
> iommu_group_alloc_identity_domain()
Sure (For AMD driver, we may have to tweak a bit to allocate_identity domain,
but doable).
>
> And then iommu_group_alloc_default_domain() would just call them in
> the right order maybe like this:
>
> if (req_type & __IOMMU_DOMAIN_PAGING)
> return iommu_domain_alloc_paging_flags();
> if (req_type && req_type != IOMMU_DOMAIN_IDENTITY)
> return ERR_PTR(-EOPNOTSUPP);
>
> if (req_type || iommu_def_domain_type == IOMMU_DOMAIN_IDENTITY) {
> dom = iommu_group_alloc_identity_domain();
> if (req_type || !IS_ERR(dom))
> return dom;
> /*
> * if iommu_def_domain_type == IDENTITY fails then fall through
> * to PAGING
> */
> }
>
> /* iommu_def_domain_type is PAGING */
> dom = iommu_domain_alloc_paging_flags();
> if (IS_ERR(dom))
> return dom;
>
> pr_warn("Failed to allocate default IOMMU domain of type %u for group %s - Falling back to IOMMU_DOMAIN_DMA",
> iommu_def_domain_type, group->name);
> return dom;
>
> Which is the only place we need to make a decision based on a type
> input.
>
> Hoping Lu's final few patches make it this cycle
>
>> +static void iommu_domain_init(struct iommu_domain *domain, unsigned int type,
>> + const struct iommu_ops *ops)
>> +{
>> + domain->type = type;
>> + domain->owner = ops;
>> + if (!domain->ops)
>> + domain->ops = ops->default_domain_ops;
>> +
>> + /*
>> + * If not already set, assume all sizes by default; the driver
>> + * may override this later
>> + */
>> + if (!domain->pgsize_bitmap)
>> + domain->pgsize_bitmap = ops->pgsize_bitmap;
>> +}
>
> Pedantically the pgsize_bitmap is only needed for paging domains, but
> it is OK like this too.
Right. Lets keep it for now. I can do a cleanup patch later.
>
>> @@ -359,11 +359,17 @@ struct iommu_vfio_ioas {
>> * enforced on device attachment
>> * @IOMMU_HWPT_FAULT_ID_VALID: The fault_id field of hwpt allocation data is
>> * valid.
>> + * @IOMMUFD_HWPT_ALLOC_PASID: When the domain is used on a device, with no
>> + * PASID, the device will support later attaching
>> + * a PASID as well. Some HW requires a specific
>> + * domain format on the device to allow PASID to
>> + * work.
>
> Maybe:
>
> Requests a domain that can be used with PASID. The domain can be
> attached to any PASID on the device. Any domain attached to the
> non-PASID part of the device must also be flaged, otherwise attaching
> a PASID will blocked.
Ok.
>
> Yi will need to add a check that IOMMUFD_HWPT_ALLOC_PASID was
> specified on the RID domain while processing attach on the PASID
> domain.
>
>> enum iommufd_hwpt_alloc_flags {
>> IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
>> IOMMU_HWPT_ALLOC_DIRTY_TRACKING = 1 << 1,
>> IOMMU_HWPT_FAULT_ID_VALID = 1 << 2,
>> + IOMMUFD_HWPT_ALLOC_PASID = 1 << 3,
>
> Let's keep the consistent spelling IOMMU_HWPT_ALLOC_PASID
Sorry. Its my fault. Will fix it.
>
> I think the series looks OK. The naming of domain_alloc_user() is now
> a little bit weird, it has now turned into
> domain_alloc_paging_extended().
Ack. Do you want me to do it now -OR- May be wait until all changes settles down
(above described group() related changes, this series, Baolu's series, etc)?
-Vasant
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-26 8:36 ` Vasant Hegde
@ 2024-08-26 13:56 ` Jason Gunthorpe
2024-08-29 12:34 ` Vasant Hegde
0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2024-08-26 13:56 UTC (permalink / raw)
To: Vasant Hegde
Cc: iommu, joro, will, robin.murphy, suravee.suthikulpanit, yi.l.liu,
baolu.lu, kevin.tian
On Mon, Aug 26, 2024 at 02:06:27PM +0530, Vasant Hegde wrote:
> >> @Jason,
> >> Notice that I have added __iommu_paging_domain_alloc_flags() so that
> >> it can call iommu_domain_init() with appropriate domain type.
> >
> > It is okay, but also the type could be fixed in
> > __iommu_group_alloc_default_domain() using something like:
> >
> > ret->type |= req_type;
>
> Sorry. I didn't get it. You mean domain->type?
Sorry, yes. I mean the path that knows it is allocating for the DMA
API can or in the DMA API flags after allocation succeeds and we don't
have to muddle lower levels by passing them all around.
> >> I think instead of having separate function it may be better to
> >> enhance __iommu_domain_alloc() such that:
> >> - Keep below changes from this patch
> >> - iommu_domain_init()
> >> - iommu_get_dma_cookie call inside iommu_setup_default_domain()
> >> - modify __iommu_domain_alloc() to additional param (flags)
> >> - iommu_paging_domain_alloc_flags() will call __iommu_domain_alloc()
> >
> > My expectation was to basically remove iommu_domain_alloc() entirely
> > once Lu's work is merged.
> >
> > Instead we'd have these direct APIs:
> > iommu_domain_alloc_paging_flags()
> > iommu_group_alloc_blocking_domain()
> > iommu_group_alloc_identity_domain()
>
> Sure (For AMD driver, we may have to tweak a bit to allocate_identity domain,
> but doable).
Why does it need to allocate_identity ?
> > I think the series looks OK. The naming of domain_alloc_user() is now
> > a little bit weird, it has now turned into
> > domain_alloc_paging_extended().
>
> Ack. Do you want me to do it now -OR- May be wait until all changes settles down
> (above described group() related changes, this series, Baolu's series, etc)?
I think if we rename things it will clash with the smmu nesting
series, lets live with it for now.
Jason
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-26 13:56 ` Jason Gunthorpe
@ 2024-08-29 12:34 ` Vasant Hegde
0 siblings, 0 replies; 41+ messages in thread
From: Vasant Hegde @ 2024-08-29 12:34 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, joro, will, robin.murphy, suravee.suthikulpanit, yi.l.liu,
baolu.lu, kevin.tian
Jason,
On 8/26/2024 7:26 PM, Jason Gunthorpe wrote:
> On Mon, Aug 26, 2024 at 02:06:27PM +0530, Vasant Hegde wrote:
>>>> @Jason,
>>>> Notice that I have added __iommu_paging_domain_alloc_flags() so that
>>>> it can call iommu_domain_init() with appropriate domain type.
>>>
>>> It is okay, but also the type could be fixed in
>>> __iommu_group_alloc_default_domain() using something like:
>>>
>>> ret->type |= req_type;
>>
>> Sorry. I didn't get it. You mean domain->type?
>
> Sorry, yes. I mean the path that knows it is allocating for the DMA
> API can or in the DMA API flags after allocation succeeds and we don't
> have to muddle lower levels by passing them all around.
Ok.
>
>>>> I think instead of having separate function it may be better to
>>>> enhance __iommu_domain_alloc() such that:
>>>> - Keep below changes from this patch
>>>> - iommu_domain_init()
>>>> - iommu_get_dma_cookie call inside iommu_setup_default_domain()
>>>> - modify __iommu_domain_alloc() to additional param (flags)
>>>> - iommu_paging_domain_alloc_flags() will call __iommu_domain_alloc()
>>>
>>> My expectation was to basically remove iommu_domain_alloc() entirely
>>> once Lu's work is merged.
>>>
>>> Instead we'd have these direct APIs:
>>> iommu_domain_alloc_paging_flags()
>>> iommu_group_alloc_blocking_domain()
>>> iommu_group_alloc_identity_domain()
>>
>> Sure (For AMD driver, we may have to tweak a bit to allocate_identity domain,
>> but doable).
>
> Why does it need to allocate_identity ?
Ignore. I got mixed up with global identity domain and this one.
>
>>> I think the series looks OK. The naming of domain_alloc_user() is now
>>> a little bit weird, it has now turned into
>>> domain_alloc_paging_extended().
>>
>> Ack. Do you want me to do it now -OR- May be wait until all changes settles down
>> (above described group() related changes, this series, Baolu's series, etc)?
>
> I think if we rename things it will clash with the smmu nesting
> series, lets live with it for now.
Sure.
-Vasant
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-21 13:35 ` [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags Vasant Hegde
2024-08-21 16:31 ` Jason Gunthorpe
@ 2024-08-22 1:38 ` Baolu Lu
2024-08-22 12:40 ` Jason Gunthorpe
2024-08-26 6:09 ` Vasant Hegde
2024-08-22 2:10 ` kernel test robot
` (2 subsequent siblings)
4 siblings, 2 replies; 41+ messages in thread
From: Baolu Lu @ 2024-08-22 1:38 UTC (permalink / raw)
To: Vasant Hegde, iommu, joro
Cc: baolu.lu, will, robin.murphy, suravee.suthikulpanit, jgg,
yi.l.liu, kevin.tian
On 8/21/24 9:35 PM, Vasant Hegde wrote:
> From: Jason Gunthorpe<jgg@ziepe.ca>
>
> Currently drivers calls iommu_paging_domain_alloc(dev) to get an
> UNAMANAGED domain. This is not sufficient to support PASID with
> UNAMANAGED domain as some HW like AMD requires certain page table type
> to support PASIDs.
>
> Also domain_alloc_paging() passes device as param for domain
> allocation. This is not sufficient for AMD driver to decide the right
> page table.
>
> Hence add iommu_paging_domain_alloc_flags() API which takes flags as
> parameter. Also update default domain allocation path to use this new
> API whenever device is PASID capable. HW driver should implement
> domain_alloc_user() to allocate PASID capable domain.
>
> Finally introduce new flag (IOMMUFD_HWPT_ALLOC_PASID) to
> domain_alloc_users() ops. If both IOMMU and device supports PASID it
> will allocate domain. Otherwise return error.
I would suggest one patch for one thing. This will make the changes
easier to review, especially for those who don't have much context. It
might be helpful to split this into some smaller changes:
- Code refactoring with no functionality change;
- Introduce new domain allocation API;
- Introduce domain allocation flag;
- Code refactoring afterwards.
>
> Signed-off-by: Jason Gunthorpe<jgg@ziepe.ca>
> [Added __iommu_paging_domain_alloc_flags() and description - Vasant]
> Signed-off-by: Vasant Hegde<vasant.hegde@amd.com>
[...]
> ---+static struct iommu_domain *__iommu_paging_domain_alloc_flags(struct device *dev,
> + unsigned int type,
> + unsigned int flags)
> +{
> + const struct iommu_ops *ops = dev_iommu_ops(dev);
> + struct iommu_domain *domain;
> +
> + if (!dev_has_iommu(dev))
> + return ERR_PTR(-ENODEV);
This check is unnecessary as you have already called dev_iommu_ops(),
which assumes that the device is iommu probed.
If you are worrying about the caller passing a non-iommu-probed device,
you can make it like this:
const struct iommu_ops *ops;
if (!dev_has_iommu(dev))
return ERR_PTR(-ENODEV);
ops = dev_iommu_ops(dev);
> +
> + if (ops->domain_alloc_paging && !flags)
> + domain = ops->domain_alloc_paging(dev);
> + else if (ops->domain_alloc_user)
> + domain = ops->domain_alloc_user(dev, flags, NULL, NULL);
> + else if (ops->domain_alloc && !flags)
> + domain = ops->domain_alloc(IOMMU_DOMAIN_UNMANAGED);
> + else
> + return ERR_PTR(-EOPNOTSUPP);
> +
> + if (IS_ERR(domain))
> + return domain;
> + if (!domain)
> + return ERR_PTR(-ENOMEM);
> +
> + iommu_domain_init(domain, type, ops);
> + return domain;
> +}
> +
> /**
> - * iommu_paging_domain_alloc() - Allocate a paging domain
> + * iommu_paging_domain_alloc_flags() - Allocate a paging domain
> * @dev: device for which the domain is allocated
> + * @flags: Bitmap of iommufd_hwpt_alloc_flags
> *
> * Allocate a paging domain which will be managed by a kernel driver. Return
> * allocated domain if successful, or a ERR pointer for failure.
> */
> -struct iommu_domain *iommu_paging_domain_alloc(struct device *dev)
> +struct iommu_domain *iommu_paging_domain_alloc_flags(struct device *dev,
> + unsigned int flags)
> {
> - if (!dev_has_iommu(dev))
> - return ERR_PTR(-ENODEV);
> -
> - return __iommu_domain_alloc(dev_iommu_ops(dev), dev, IOMMU_DOMAIN_UNMANAGED);
> + return __iommu_paging_domain_alloc_flags(dev,
> + IOMMU_DOMAIN_UNMANAGED, flags);
> }
> -EXPORT_SYMBOL_GPL(iommu_paging_domain_alloc);
> +EXPORT_SYMBOL_GPL(iommu_paging_domain_alloc_flags);
Some external modules are still using iommu_paging_domain_alloc(). Would
it break the kernel build if you simply remove it here?
[...]
Thanks,
baolu
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-22 1:38 ` Baolu Lu
@ 2024-08-22 12:40 ` Jason Gunthorpe
2024-08-23 2:04 ` Baolu Lu
2024-08-26 6:09 ` Vasant Hegde
1 sibling, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2024-08-22 12:40 UTC (permalink / raw)
To: Baolu Lu
Cc: Vasant Hegde, iommu, joro, will, robin.murphy,
suravee.suthikulpanit, yi.l.liu, kevin.tian
On Thu, Aug 22, 2024 at 09:38:31AM +0800, Baolu Lu wrote:
> > -EXPORT_SYMBOL_GPL(iommu_paging_domain_alloc);
> > +EXPORT_SYMBOL_GPL(iommu_paging_domain_alloc_flags);
>
> Some external modules are still using iommu_paging_domain_alloc(). Would
> it break the kernel build if you simply remove it here?
modules have to be recompiled if the kernel is changed, they will pick
up the inline in the header and shift to this API. But generally we
don't worry about external/out of tree modules..
Jason
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-22 12:40 ` Jason Gunthorpe
@ 2024-08-23 2:04 ` Baolu Lu
0 siblings, 0 replies; 41+ messages in thread
From: Baolu Lu @ 2024-08-23 2:04 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: baolu.lu, Vasant Hegde, iommu, joro, will, robin.murphy,
suravee.suthikulpanit, yi.l.liu, kevin.tian
On 8/22/24 8:40 PM, Jason Gunthorpe wrote:
> On Thu, Aug 22, 2024 at 09:38:31AM +0800, Baolu Lu wrote:
>
>>> -EXPORT_SYMBOL_GPL(iommu_paging_domain_alloc);
>>> +EXPORT_SYMBOL_GPL(iommu_paging_domain_alloc_flags);
>> Some external modules are still using iommu_paging_domain_alloc(). Would
>> it break the kernel build if you simply remove it here?
> modules have to be recompiled if the kernel is changed, they will pick
> up the inline in the header and shift to this API. But generally we
> don't worry about external/out of tree modules..
I see. I overlooked below inline helper.
+static inline struct iommu_domain *iommu_paging_domain_alloc(struct
device *dev)
+{
+ return iommu_paging_domain_alloc_flags(dev, 0);
+}
Thanks,
baolu
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-22 1:38 ` Baolu Lu
2024-08-22 12:40 ` Jason Gunthorpe
@ 2024-08-26 6:09 ` Vasant Hegde
1 sibling, 0 replies; 41+ messages in thread
From: Vasant Hegde @ 2024-08-26 6:09 UTC (permalink / raw)
To: Baolu Lu, iommu, joro
Cc: will, robin.murphy, suravee.suthikulpanit, jgg, yi.l.liu,
kevin.tian
Hi Baolu,
On 8/22/2024 7:08 AM, Baolu Lu wrote:
> On 8/21/24 9:35 PM, Vasant Hegde wrote:
>> From: Jason Gunthorpe<jgg@ziepe.ca>
>>
>> Currently drivers calls iommu_paging_domain_alloc(dev) to get an
>> UNAMANAGED domain. This is not sufficient to support PASID with
>> UNAMANAGED domain as some HW like AMD requires certain page table type
>> to support PASIDs.
>>
>> Also domain_alloc_paging() passes device as param for domain
>> allocation. This is not sufficient for AMD driver to decide the right
>> page table.
>>
>> Hence add iommu_paging_domain_alloc_flags() API which takes flags as
>> parameter. Also update default domain allocation path to use this new
>> API whenever device is PASID capable. HW driver should implement
>> domain_alloc_user() to allocate PASID capable domain.
>>
>> Finally introduce new flag (IOMMUFD_HWPT_ALLOC_PASID) to
>> domain_alloc_users() ops. If both IOMMU and device supports PASID it
>> will allocate domain. Otherwise return error.
>
> I would suggest one patch for one thing. This will make the changes
> easier to review, especially for those who don't have much context. It
> might be helpful to split this into some smaller changes:
>
> - Code refactoring with no functionality change;
> - Introduce new domain allocation API;
> - Introduce domain allocation flag;
> - Code refactoring afterwards.
Ack. Will split it into smaller patch.
>
>>
>> Signed-off-by: Jason Gunthorpe<jgg@ziepe.ca>
>> [Added __iommu_paging_domain_alloc_flags() and description - Vasant]
>> Signed-off-by: Vasant Hegde<vasant.hegde@amd.com>
>
>
> [...]
>
>> ---+static struct iommu_domain *__iommu_paging_domain_alloc_flags(struct
>> device *dev,
>> + unsigned int type,
>> + unsigned int flags)
>> +{
>> + const struct iommu_ops *ops = dev_iommu_ops(dev);
>> + struct iommu_domain *domain;
>> +
>> + if (!dev_has_iommu(dev))
>> + return ERR_PTR(-ENODEV);
>
> This check is unnecessary as you have already called dev_iommu_ops(),
> which assumes that the device is iommu probed.
>
> If you are worrying about the caller passing a non-iommu-probed device,
> you can make it like this:
>
> const struct iommu_ops *ops;
>
> if (!dev_has_iommu(dev))
> return ERR_PTR(-ENODEV);
>
> ops = dev_iommu_ops(dev);
Ack. Will fix it.
-Vasant
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-21 13:35 ` [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags Vasant Hegde
2024-08-21 16:31 ` Jason Gunthorpe
2024-08-22 1:38 ` Baolu Lu
@ 2024-08-22 2:10 ` kernel test robot
2024-08-22 3:03 ` kernel test robot
2024-08-22 5:07 ` kernel test robot
4 siblings, 0 replies; 41+ messages in thread
From: kernel test robot @ 2024-08-22 2:10 UTC (permalink / raw)
To: Vasant Hegde, iommu, joro
Cc: llvm, oe-kbuild-all, will, robin.murphy, suravee.suthikulpanit,
jgg, yi.l.liu, baolu.lu, kevin.tian, Vasant Hegde
Hi Vasant,
kernel test robot noticed the following build errors:
[auto build test ERROR on linus/master]
[also build test ERROR on v6.11-rc4 next-20240821]
[cannot apply to joro-iommu/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Vasant-Hegde/iommu-Enhance-domain-allocation-code-to-take-additional-flags/20240821-214124
base: linus/master
patch link: https://lore.kernel.org/r/20240821133554.7405-2-vasant.hegde%40amd.com
patch subject: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
config: arm-footbridge_defconfig (https://download.01.org/0day-ci/archive/20240822/202408220940.YjSy3Avg-lkp@intel.com/config)
compiler: clang version 20.0.0git (https://github.com/llvm/llvm-project 26670e7fa4f032a019d23d56c6a02926e854e8af)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240822/202408220940.YjSy3Avg-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408220940.YjSy3Avg-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from drivers/pci/pci-driver.c:7:
In file included from include/linux/pci.h:1646:
In file included from include/linux/dmapool.h:14:
In file included from include/linux/scatterlist.h:8:
In file included from include/linux/mm.h:2228:
include/linux/vmstat.h:517:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
517 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
| ~~~~~~~~~~~ ^ ~~~
In file included from drivers/pci/pci-driver.c:23:
include/linux/iommu.h:1103:1: error: declaration of anonymous struct must be a definition
1103 | struct inline iommu_domain *iommu_paging_domain_alloc_flags(struct device *dev,
| ^
include/linux/iommu.h:1103:1: warning: declaration does not declare anything [-Wmissing-declarations]
>> drivers/pci/pci-driver.c:1657:9: error: call to undeclared function 'iommu_device_use_default_domain'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
1657 | ret = iommu_device_use_default_domain(dev);
| ^
>> drivers/pci/pci-driver.c:1670:3: error: call to undeclared function 'iommu_device_unuse_default_domain'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
1670 | iommu_device_unuse_default_domain(dev);
| ^
2 warnings and 3 errors generated.
vim +/iommu_device_use_default_domain +1657 drivers/pci/pci-driver.c
02e0bea6c83c657 Phil Sutter 2017-01-18 1629
07397df29e57cde Nipun Gupta 2018-04-28 1630 /**
07397df29e57cde Nipun Gupta 2018-04-28 1631 * pci_dma_configure - Setup DMA configuration
07397df29e57cde Nipun Gupta 2018-04-28 1632 * @dev: ptr to dev structure
07397df29e57cde Nipun Gupta 2018-04-28 1633 *
07397df29e57cde Nipun Gupta 2018-04-28 1634 * Function to update PCI devices's DMA configuration using the same
07397df29e57cde Nipun Gupta 2018-04-28 1635 * info from the OF node or ACPI node of host bridge's parent (if any).
07397df29e57cde Nipun Gupta 2018-04-28 1636 */
07397df29e57cde Nipun Gupta 2018-04-28 1637 static int pci_dma_configure(struct device *dev)
07397df29e57cde Nipun Gupta 2018-04-28 1638 {
512881eacfa72c2 Lu Baolu 2022-04-18 1639 struct pci_driver *driver = to_pci_driver(dev->driver);
07397df29e57cde Nipun Gupta 2018-04-28 1640 struct device *bridge;
07397df29e57cde Nipun Gupta 2018-04-28 1641 int ret = 0;
07397df29e57cde Nipun Gupta 2018-04-28 1642
07397df29e57cde Nipun Gupta 2018-04-28 1643 bridge = pci_get_host_bridge_device(to_pci_dev(dev));
07397df29e57cde Nipun Gupta 2018-04-28 1644
07397df29e57cde Nipun Gupta 2018-04-28 1645 if (IS_ENABLED(CONFIG_OF) && bridge->parent &&
07397df29e57cde Nipun Gupta 2018-04-28 1646 bridge->parent->of_node) {
3d6ce86ee79465e Christoph Hellwig 2018-05-03 1647 ret = of_dma_configure(dev, bridge->parent->of_node, true);
07397df29e57cde Nipun Gupta 2018-04-28 1648 } else if (has_acpi_companion(bridge)) {
07397df29e57cde Nipun Gupta 2018-04-28 1649 struct acpi_device *adev = to_acpi_device_node(bridge->fwnode);
07397df29e57cde Nipun Gupta 2018-04-28 1650
e5361ca29f2fea3 Robin Murphy 2018-12-06 1651 ret = acpi_dma_configure(dev, acpi_get_dma_attr(adev));
07397df29e57cde Nipun Gupta 2018-04-28 1652 }
07397df29e57cde Nipun Gupta 2018-04-28 1653
07397df29e57cde Nipun Gupta 2018-04-28 1654 pci_put_host_bridge_device(bridge);
512881eacfa72c2 Lu Baolu 2022-04-18 1655
512881eacfa72c2 Lu Baolu 2022-04-18 1656 if (!ret && !driver->driver_managed_dma) {
512881eacfa72c2 Lu Baolu 2022-04-18 @1657 ret = iommu_device_use_default_domain(dev);
512881eacfa72c2 Lu Baolu 2022-04-18 1658 if (ret)
512881eacfa72c2 Lu Baolu 2022-04-18 1659 arch_teardown_dma_ops(dev);
512881eacfa72c2 Lu Baolu 2022-04-18 1660 }
512881eacfa72c2 Lu Baolu 2022-04-18 1661
07397df29e57cde Nipun Gupta 2018-04-28 1662 return ret;
07397df29e57cde Nipun Gupta 2018-04-28 1663 }
07397df29e57cde Nipun Gupta 2018-04-28 1664
512881eacfa72c2 Lu Baolu 2022-04-18 1665 static void pci_dma_cleanup(struct device *dev)
512881eacfa72c2 Lu Baolu 2022-04-18 1666 {
512881eacfa72c2 Lu Baolu 2022-04-18 1667 struct pci_driver *driver = to_pci_driver(dev->driver);
512881eacfa72c2 Lu Baolu 2022-04-18 1668
512881eacfa72c2 Lu Baolu 2022-04-18 1669 if (!driver->driver_managed_dma)
512881eacfa72c2 Lu Baolu 2022-04-18 @1670 iommu_device_unuse_default_domain(dev);
512881eacfa72c2 Lu Baolu 2022-04-18 1671 }
512881eacfa72c2 Lu Baolu 2022-04-18 1672
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-21 13:35 ` [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags Vasant Hegde
` (2 preceding siblings ...)
2024-08-22 2:10 ` kernel test robot
@ 2024-08-22 3:03 ` kernel test robot
2024-08-22 5:07 ` kernel test robot
4 siblings, 0 replies; 41+ messages in thread
From: kernel test robot @ 2024-08-22 3:03 UTC (permalink / raw)
To: Vasant Hegde, iommu, joro
Cc: llvm, oe-kbuild-all, will, robin.murphy, suravee.suthikulpanit,
jgg, yi.l.liu, baolu.lu, kevin.tian, Vasant Hegde
Hi Vasant,
kernel test robot noticed the following build errors:
[auto build test ERROR on linus/master]
[also build test ERROR on v6.11-rc4 next-20240821]
[cannot apply to joro-iommu/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Vasant-Hegde/iommu-Enhance-domain-allocation-code-to-take-additional-flags/20240821-214124
base: linus/master
patch link: https://lore.kernel.org/r/20240821133554.7405-2-vasant.hegde%40amd.com
patch subject: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
config: i386-randconfig-004-20240822 (https://download.01.org/0day-ci/archive/20240822/202408221020.wjaKCVvB-lkp@intel.com/config)
compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240822/202408221020.wjaKCVvB-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408221020.wjaKCVvB-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from drivers/gpu/drm/arm/display/komeda/komeda_dev.c:8:
include/linux/iommu.h:1103:1: error: declaration of anonymous struct must be a definition
1103 | struct inline iommu_domain *iommu_paging_domain_alloc_flags(struct device *dev,
| ^
include/linux/iommu.h:1103:1: warning: declaration does not declare anything [-Wmissing-declarations]
include/linux/iommu.h:1478:48: warning: declaration of 'struct msi_desc' will not be visible outside of this function [-Wvisibility]
1478 | static inline int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
| ^
include/linux/iommu.h:1483:53: warning: declaration of 'struct msi_desc' will not be visible outside of this function [-Wvisibility]
1483 | static inline void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
| ^
>> drivers/gpu/drm/arm/display/komeda/komeda_dev.c:250:16: error: call to undeclared function 'iommu_get_domain_for_dev'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
250 | mdev->iommu = iommu_get_domain_for_dev(mdev->dev);
| ^
>> drivers/gpu/drm/arm/display/komeda/komeda_dev.c:250:14: error: incompatible integer to pointer conversion assigning to 'struct iommu_domain *' from 'int' [-Wint-conversion]
250 | mdev->iommu = iommu_get_domain_for_dev(mdev->dev);
| ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3 warnings and 3 errors generated.
vim +250 drivers/gpu/drm/arm/display/komeda/komeda_dev.c
b25bc78f8a0750 james qian wang (Arm Technology China 2019-12-10 179)
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 180) struct komeda_dev *komeda_dev_create(struct device *dev)
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 181) {
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 182) struct platform_device *pdev = to_platform_device(dev);
b25bc78f8a0750 james qian wang (Arm Technology China 2019-12-10 183) komeda_identify_func komeda_identify;
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 184) struct komeda_dev *mdev;
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 185) int err = 0;
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 186)
b25bc78f8a0750 james qian wang (Arm Technology China 2019-12-10 187) komeda_identify = of_device_get_match_data(dev);
b25bc78f8a0750 james qian wang (Arm Technology China 2019-12-10 188) if (!komeda_identify)
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 189) return ERR_PTR(-ENODEV);
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 190)
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 191) mdev = devm_kzalloc(dev, sizeof(*mdev), GFP_KERNEL);
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 192) if (!mdev)
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 193) return ERR_PTR(-ENOMEM);
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 194)
20d84aa8417910 james qian wang (Arm Technology China 2019-01-22 195) mutex_init(&mdev->lock);
20d84aa8417910 james qian wang (Arm Technology China 2019-01-22 196)
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 197) mdev->dev = dev;
50ec5b563bed04 Markus Elfring 2019-09-21 198 mdev->reg_base = devm_platform_ioremap_resource(pdev, 0);
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 199) if (IS_ERR(mdev->reg_base)) {
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 200) DRM_ERROR("Map register space failed.\n");
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 201) err = PTR_ERR(mdev->reg_base);
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 202) mdev->reg_base = NULL;
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 203) goto err_cleanup;
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 204) }
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 205)
6f84da0c74f12f james qian wang (Arm Technology China 2019-06-05 206) mdev->aclk = devm_clk_get(dev, "aclk");
6f84da0c74f12f james qian wang (Arm Technology China 2019-06-05 207) if (IS_ERR(mdev->aclk)) {
28be315c9c0c0b james qian wang (Arm Technology China 2019-06-05 208) DRM_ERROR("Get engine clk failed.\n");
6f84da0c74f12f james qian wang (Arm Technology China 2019-06-05 209) err = PTR_ERR(mdev->aclk);
6f84da0c74f12f james qian wang (Arm Technology China 2019-06-05 210) mdev->aclk = NULL;
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 211) goto err_cleanup;
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 212) }
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 213)
6f84da0c74f12f james qian wang (Arm Technology China 2019-06-05 214) clk_prepare_enable(mdev->aclk);
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 215)
b25bc78f8a0750 james qian wang (Arm Technology China 2019-12-10 216) mdev->funcs = komeda_identify(mdev->reg_base, &mdev->chip);
b25bc78f8a0750 james qian wang (Arm Technology China 2019-12-10 217) if (!mdev->funcs) {
b25bc78f8a0750 james qian wang (Arm Technology China 2019-12-10 218) DRM_ERROR("Failed to identify the HW.\n");
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 219) err = -ENODEV;
2ebb6701654e0d Lowry Li (Arm Technology China 2019-09-23 220) goto disable_clk;
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 221) }
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 222)
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 223) DRM_INFO("Found ARM Mali-D%x version r%dp%d\n",
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 224) MALIDP_CORE_ID_PRODUCT_ID(mdev->chip.core_id),
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 225) MALIDP_CORE_ID_MAJOR(mdev->chip.core_id),
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 226) MALIDP_CORE_ID_MINOR(mdev->chip.core_id));
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 227)
981d29d2db7c96 james qian wang (Arm Technology China 2019-01-03 228) mdev->funcs->init_format_table(mdev);
981d29d2db7c96 james qian wang (Arm Technology China 2019-01-03 229)
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 230) err = mdev->funcs->enum_resources(mdev);
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 231) if (err) {
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 232) DRM_ERROR("enumerate display resource failed.\n");
2ebb6701654e0d Lowry Li (Arm Technology China 2019-09-23 233) goto disable_clk;
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 234) }
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 235)
29e56aec911dd7 james qian wang (Arm Technology China 2019-01-03 236) err = komeda_parse_dt(dev, mdev);
29e56aec911dd7 james qian wang (Arm Technology China 2019-01-03 237) if (err) {
29e56aec911dd7 james qian wang (Arm Technology China 2019-01-03 238) DRM_ERROR("parse device tree failed.\n");
2ebb6701654e0d Lowry Li (Arm Technology China 2019-09-23 239) goto disable_clk;
29e56aec911dd7 james qian wang (Arm Technology China 2019-01-03 240) }
29e56aec911dd7 james qian wang (Arm Technology China 2019-01-03 241)
321e925c5813c2 james qian wang (Arm Technology China 2019-01-22 242) err = komeda_assemble_pipelines(mdev);
321e925c5813c2 james qian wang (Arm Technology China 2019-01-22 243) if (err) {
321e925c5813c2 james qian wang (Arm Technology China 2019-01-22 244) DRM_ERROR("assemble display pipelines failed.\n");
2ebb6701654e0d Lowry Li (Arm Technology China 2019-09-23 245) goto disable_clk;
321e925c5813c2 james qian wang (Arm Technology China 2019-01-22 246) }
321e925c5813c2 james qian wang (Arm Technology China 2019-01-22 247)
1c831ade9f352d Robin Murphy 2020-09-03 248 dma_set_max_seg_size(dev, U32_MAX);
a260e0b847f079 Lowry Li (Arm Technology China 2019-04-11 249)
e87cae37f6006f Lowry Li (Arm Technology China 2019-06-06 @250) mdev->iommu = iommu_get_domain_for_dev(mdev->dev);
e87cae37f6006f Lowry Li (Arm Technology China 2019-06-06 251) if (!mdev->iommu)
e87cae37f6006f Lowry Li (Arm Technology China 2019-06-06 252) DRM_INFO("continue without IOMMU support!\n");
e87cae37f6006f Lowry Li (Arm Technology China 2019-06-06 253)
2ebb6701654e0d Lowry Li (Arm Technology China 2019-09-23 254) clk_disable_unprepare(mdev->aclk);
2ebb6701654e0d Lowry Li (Arm Technology China 2019-09-23 255)
55223394d56bab james qian wang (Arm Technology China 2019-01-22 256) err = sysfs_create_group(&dev->kobj, &komeda_sysfs_attr_group);
55223394d56bab james qian wang (Arm Technology China 2019-01-22 257) if (err) {
55223394d56bab james qian wang (Arm Technology China 2019-01-22 258) DRM_ERROR("create sysfs group failed.\n");
55223394d56bab james qian wang (Arm Technology China 2019-01-22 259) goto err_cleanup;
55223394d56bab james qian wang (Arm Technology China 2019-01-22 260) }
55223394d56bab james qian wang (Arm Technology China 2019-01-22 261)
8894cd5824e500 Mihail Atanassov 2019-11-07 262 mdev->err_verbosity = KOMEDA_DEV_PRINT_ERR_EVENTS;
8894cd5824e500 Mihail Atanassov 2019-11-07 263
7d3cfb70a604d2 james qian wang (Arm Technology China 2019-01-22 264) komeda_debugfs_init(mdev);
7d3cfb70a604d2 james qian wang (Arm Technology China 2019-01-22 265)
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 266) return mdev;
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 267)
2ebb6701654e0d Lowry Li (Arm Technology China 2019-09-23 268) disable_clk:
2ebb6701654e0d Lowry Li (Arm Technology China 2019-09-23 269) clk_disable_unprepare(mdev->aclk);
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 270) err_cleanup:
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 271) komeda_dev_destroy(mdev);
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 272) return ERR_PTR(err);
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 273) }
bd628c1bed7902 james qian wang (Arm Technology China 2019-01-03 274)
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
2024-08-21 13:35 ` [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags Vasant Hegde
` (3 preceding siblings ...)
2024-08-22 3:03 ` kernel test robot
@ 2024-08-22 5:07 ` kernel test robot
4 siblings, 0 replies; 41+ messages in thread
From: kernel test robot @ 2024-08-22 5:07 UTC (permalink / raw)
To: Vasant Hegde, iommu, joro
Cc: oe-kbuild-all, will, robin.murphy, suravee.suthikulpanit, jgg,
yi.l.liu, baolu.lu, kevin.tian, Vasant Hegde
Hi Vasant,
kernel test robot noticed the following build warnings:
[auto build test WARNING on linus/master]
[also build test WARNING on v6.11-rc4 next-20240821]
[cannot apply to joro-iommu/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Vasant-Hegde/iommu-Enhance-domain-allocation-code-to-take-additional-flags/20240821-214124
base: linus/master
patch link: https://lore.kernel.org/r/20240821133554.7405-2-vasant.hegde%40amd.com
patch subject: [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags
config: x86_64-randconfig-075-20240822 (https://download.01.org/0day-ci/archive/20240822/202408221237.HWmdmfvc-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240822/202408221237.HWmdmfvc-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408221237.HWmdmfvc-lkp@intel.com/
All warnings (new ones prefixed by >>):
In file included from <command-line>:
include/linux/compiler_types.h:236:16: error: expected '{' before 'inline'
236 | #define inline inline __gnu_inline __inline_maybe_unused notrace
| ^~~~~~
include/linux/iommu.h:1103:8: note: in expansion of macro 'inline'
1103 | struct inline iommu_domain *iommu_paging_domain_alloc_flags(struct device *dev,
| ^~~~~~
>> include/linux/compiler_types.h:236:16: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]
236 | #define inline inline __gnu_inline __inline_maybe_unused notrace
| ^~~~~~
include/linux/iommu.h:1103:8: note: in expansion of macro 'inline'
1103 | struct inline iommu_domain *iommu_paging_domain_alloc_flags(struct device *dev,
| ^~~~~~
vim +/inline +236 include/linux/compiler_types.h
71391bdd2e9aab Xiaozhou Liu 2018-12-14 228
71391bdd2e9aab Xiaozhou Liu 2018-12-14 229 /*
71391bdd2e9aab Xiaozhou Liu 2018-12-14 230 * Prefer gnu_inline, so that extern inline functions do not emit an
71391bdd2e9aab Xiaozhou Liu 2018-12-14 231 * externally visible function. This makes extern inline behave as per gnu89
71391bdd2e9aab Xiaozhou Liu 2018-12-14 232 * semantics rather than c99. This prevents multiple symbol definition errors
71391bdd2e9aab Xiaozhou Liu 2018-12-14 233 * of extern inline functions at link time.
71391bdd2e9aab Xiaozhou Liu 2018-12-14 234 * A lot of inline functions can cause havoc with function tracing.
71391bdd2e9aab Xiaozhou Liu 2018-12-14 235 */
889b3c1245de48 Masahiro Yamada 2020-04-06 @236 #define inline inline __gnu_inline __inline_maybe_unused notrace
71391bdd2e9aab Xiaozhou Liu 2018-12-14 237
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 2/5] iommu/amd: Separate page table setup from domain allocation
2024-08-21 13:35 [PATCH 0/5] iommu: Domain allocation enhancements Vasant Hegde
2024-08-21 13:35 ` [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags Vasant Hegde
@ 2024-08-21 13:35 ` Vasant Hegde
2024-08-21 16:40 ` Jason Gunthorpe
2024-08-21 13:35 ` [PATCH 3/5] iommu/amd: Pass page table type to pdomain_setup_pgtable() Vasant Hegde
` (2 subsequent siblings)
4 siblings, 1 reply; 41+ messages in thread
From: Vasant Hegde @ 2024-08-21 13:35 UTC (permalink / raw)
To: iommu, joro
Cc: will, robin.murphy, suravee.suthikulpanit, jgg, yi.l.liu,
baolu.lu, kevin.tian, Vasant Hegde
Currently protection_domain_alloc() allocates domain and also sets up
page table. Page table setup is required for PAGING domain only. Domain
type like SVA doesn't need page table. Hence move page table setup code
to separate function.
Also SVA domain allocation path does not call pdomain_setup_pgtable().
Hence removed IOMMU_DOMAIN_SVA type check.
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
---
drivers/iommu/amd/iommu.c | 46 ++++++++++++++++++++++++---------------
1 file changed, 28 insertions(+), 18 deletions(-)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index b19e8c0f48fa..33930e08bd4c 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2302,29 +2302,38 @@ static int protection_domain_init_v2(struct protection_domain *pdom)
struct protection_domain *protection_domain_alloc(unsigned int type)
{
- struct io_pgtable_ops *pgtbl_ops;
struct protection_domain *domain;
- int pgtable;
- int ret;
domain = kzalloc(sizeof(*domain), GFP_KERNEL);
if (!domain)
return NULL;
domain->id = domain_id_alloc();
- if (!domain->id)
- goto out_err;
+ if (!domain->id) {
+ kfree(domain);
+ return NULL;
+ }
spin_lock_init(&domain->lock);
INIT_LIST_HEAD(&domain->dev_list);
INIT_LIST_HEAD(&domain->dev_data_list);
domain->nid = NUMA_NO_NODE;
+ domain->domain.type = type;
+ return domain;
+}
+
+static int pdom_setup_pgtable(struct protection_domain *domain,
+ unsigned int type)
+{
+ struct io_pgtable_ops *pgtbl_ops;
+ int pgtable;
+ int ret;
+
switch (type) {
/* No need to allocate io pgtable ops in passthrough mode */
case IOMMU_DOMAIN_IDENTITY:
- case IOMMU_DOMAIN_SVA:
- return domain;
+ return 0;
case IOMMU_DOMAIN_DMA:
pgtable = amd_iommu_pgtable;
break;
@@ -2336,9 +2345,8 @@ struct protection_domain *protection_domain_alloc(unsigned int type)
pgtable = AMD_IOMMU_V1;
break;
default:
- goto out_err;
+ return -EINVAL;
}
-
switch (pgtable) {
case AMD_IOMMU_V1:
ret = protection_domain_init_v1(domain, DEFAULT_PGTABLE_LEVEL);
@@ -2347,21 +2355,17 @@ struct protection_domain *protection_domain_alloc(unsigned int type)
ret = protection_domain_init_v2(domain);
break;
default:
- ret = -EINVAL;
- break;
+ return -EINVAL;
}
if (ret)
- goto out_err;
+ return ret;
pgtbl_ops = alloc_io_pgtable_ops(pgtable, &domain->iop.pgtbl_cfg, domain);
if (!pgtbl_ops)
- goto out_err;
+ return -EINVAL;
- return domain;
-out_err:
- protection_domain_free(domain);
- return NULL;
+ return 0;
}
static inline u64 dma_max_address(void)
@@ -2384,6 +2388,7 @@ static struct iommu_domain *do_iommu_domain_alloc(unsigned int type,
bool dirty_tracking = flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
struct protection_domain *domain;
struct amd_iommu *iommu = NULL;
+ int ret;
if (dev)
iommu = get_amd_iommu_from_dev(dev);
@@ -2402,12 +2407,17 @@ static struct iommu_domain *do_iommu_domain_alloc(unsigned int type,
if (!domain)
return ERR_PTR(-ENOMEM);
+ ret = pdom_setup_pgtable(domain, type);
+ if (ret) {
+ protection_domain_free(domain);
+ return ERR_PTR(ret);
+ }
+
domain->domain.geometry.aperture_start = 0;
domain->domain.geometry.aperture_end = dma_max_address();
domain->domain.geometry.force_aperture = true;
if (iommu) {
- domain->domain.type = type;
domain->domain.pgsize_bitmap = iommu->iommu.ops->pgsize_bitmap;
domain->domain.ops = iommu->iommu.ops->default_domain_ops;
--
2.31.1
^ permalink raw reply related [flat|nested] 41+ messages in thread* Re: [PATCH 2/5] iommu/amd: Separate page table setup from domain allocation
2024-08-21 13:35 ` [PATCH 2/5] iommu/amd: Separate page table setup from domain allocation Vasant Hegde
@ 2024-08-21 16:40 ` Jason Gunthorpe
0 siblings, 0 replies; 41+ messages in thread
From: Jason Gunthorpe @ 2024-08-21 16:40 UTC (permalink / raw)
To: Vasant Hegde
Cc: iommu, joro, will, robin.murphy, suravee.suthikulpanit, yi.l.liu,
baolu.lu, kevin.tian
On Wed, Aug 21, 2024 at 01:35:51PM +0000, Vasant Hegde wrote:
> Currently protection_domain_alloc() allocates domain and also sets up
> page table. Page table setup is required for PAGING domain only. Domain
> type like SVA doesn't need page table. Hence move page table setup code
> to separate function.
>
> Also SVA domain allocation path does not call pdomain_setup_pgtable().
> Hence removed IOMMU_DOMAIN_SVA type check.
>
> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
> ---
> drivers/iommu/amd/iommu.c | 46 ++++++++++++++++++++++++---------------
> 1 file changed, 28 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index b19e8c0f48fa..33930e08bd4c 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -2302,29 +2302,38 @@ static int protection_domain_init_v2(struct protection_domain *pdom)
>
> struct protection_domain *protection_domain_alloc(unsigned int type)
> {
> - struct io_pgtable_ops *pgtbl_ops;
> struct protection_domain *domain;
> - int pgtable;
> - int ret;
>
> domain = kzalloc(sizeof(*domain), GFP_KERNEL);
> if (!domain)
> return NULL;
>
> domain->id = domain_id_alloc();
> - if (!domain->id)
> - goto out_err;
> + if (!domain->id) {
> + kfree(domain);
> + return NULL;
> + }
>
> spin_lock_init(&domain->lock);
> INIT_LIST_HEAD(&domain->dev_list);
> INIT_LIST_HEAD(&domain->dev_data_list);
> domain->nid = NUMA_NO_NODE;
>
> + domain->domain.type = type;
In the end drivers won't be touching type at all because none of the
ops provide type. When the static identity domain is added this should
be removed.
It looks not too bad to do a basic static identity domain, it would be
good to put it as patch 2 and then remove type here and in later
patches.
> @@ -2402,12 +2407,17 @@ static struct iommu_domain *do_iommu_domain_alloc(unsigned int type,
> if (!domain)
> return ERR_PTR(-ENOMEM);
>
> + ret = pdom_setup_pgtable(domain, type);
> + if (ret) {
> + protection_domain_free(domain);
> + return ERR_PTR(ret);
> + }
> +
> domain->domain.geometry.aperture_start = 0;
> domain->domain.geometry.aperture_end = dma_max_address();
> domain->domain.geometry.force_aperture = true;
>
> if (iommu) {
> - domain->domain.type = type;
> domain->domain.pgsize_bitmap = iommu->iommu.ops->pgsize_bitmap;
> domain->domain.ops = iommu->iommu.ops->default_domain_ops;
This existing code is wrong anyhow, type and ops are set by the core,
and pgsize_bitmap is being taken from the wrong value.. I am about to
send a series fixing some of that.
Jason
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 3/5] iommu/amd: Pass page table type to pdomain_setup_pgtable()
2024-08-21 13:35 [PATCH 0/5] iommu: Domain allocation enhancements Vasant Hegde
2024-08-21 13:35 ` [PATCH 1/5] iommu: Enhance domain allocation code to take additional flags Vasant Hegde
2024-08-21 13:35 ` [PATCH 2/5] iommu/amd: Separate page table setup from domain allocation Vasant Hegde
@ 2024-08-21 13:35 ` Vasant Hegde
2024-08-21 13:35 ` [PATCH 4/5] iommu/amd: Enhance domain_alloc_user() to allocate PASID capable domain Vasant Hegde
2024-08-21 13:35 ` [PATCH 5/5] iommu/amd: Add iommu_ops->domain_alloc_paging support Vasant Hegde
4 siblings, 0 replies; 41+ messages in thread
From: Vasant Hegde @ 2024-08-21 13:35 UTC (permalink / raw)
To: iommu, joro
Cc: will, robin.murphy, suravee.suthikulpanit, jgg, yi.l.liu,
baolu.lu, kevin.tian, Vasant Hegde
Current code forces v1 page table for UNMANAGED domain and global page
table type (amd_iommu_pgtable) for rest of paging domain.
Following patch series adds support for domain_alloc_paging() ops. Also
enhances domain_alloc_user() to allocate page table based on 'flags.
Hence pass page table type to parameter to pdomain_setup_pgtable(). So
that caller can decide right page table type. Also update
dma_max_address() to take pgtable as parameter.
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
---
drivers/iommu/amd/iommu.c | 50 +++++++++++++++++++--------------------
1 file changed, 24 insertions(+), 26 deletions(-)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 33930e08bd4c..817a26727894 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2324,29 +2324,14 @@ struct protection_domain *protection_domain_alloc(unsigned int type)
}
static int pdom_setup_pgtable(struct protection_domain *domain,
- unsigned int type)
+ int pgtable)
{
struct io_pgtable_ops *pgtbl_ops;
- int pgtable;
int ret;
- switch (type) {
- /* No need to allocate io pgtable ops in passthrough mode */
- case IOMMU_DOMAIN_IDENTITY:
+ if (!(domain->domain.type & __IOMMU_DOMAIN_PAGING))
return 0;
- case IOMMU_DOMAIN_DMA:
- pgtable = amd_iommu_pgtable;
- break;
- /*
- * Force IOMMU v1 page table when allocating
- * domain for pass-through devices.
- */
- case IOMMU_DOMAIN_UNMANAGED:
- pgtable = AMD_IOMMU_V1;
- break;
- default:
- return -EINVAL;
- }
+
switch (pgtable) {
case AMD_IOMMU_V1:
ret = protection_domain_init_v1(domain, DEFAULT_PGTABLE_LEVEL);
@@ -2368,9 +2353,9 @@ static int pdom_setup_pgtable(struct protection_domain *domain,
return 0;
}
-static inline u64 dma_max_address(void)
+static inline u64 dma_max_address(int pgtable)
{
- if (amd_iommu_pgtable == AMD_IOMMU_V1)
+ if (pgtable == AMD_IOMMU_V1)
return ~0ULL;
/* V2 with 4/5 level page table */
@@ -2383,7 +2368,8 @@ static bool amd_iommu_hd_support(struct amd_iommu *iommu)
}
static struct iommu_domain *do_iommu_domain_alloc(unsigned int type,
- struct device *dev, u32 flags)
+ struct device *dev,
+ u32 flags, int pgtable)
{
bool dirty_tracking = flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
struct protection_domain *domain;
@@ -2407,18 +2393,22 @@ static struct iommu_domain *do_iommu_domain_alloc(unsigned int type,
if (!domain)
return ERR_PTR(-ENOMEM);
- ret = pdom_setup_pgtable(domain, type);
+ ret = pdom_setup_pgtable(domain, pgtable);
if (ret) {
protection_domain_free(domain);
return ERR_PTR(ret);
}
domain->domain.geometry.aperture_start = 0;
- domain->domain.geometry.aperture_end = dma_max_address();
+ domain->domain.geometry.aperture_end = dma_max_address(pgtable);
domain->domain.geometry.force_aperture = true;
if (iommu) {
- domain->domain.pgsize_bitmap = iommu->iommu.ops->pgsize_bitmap;
+ if (pgtable == AMD_IOMMU_V2)
+ domain->domain.pgsize_bitmap = AMD_IOMMU_PGSIZES_V2;
+ else
+ domain->domain.pgsize_bitmap = iommu->iommu.ops->pgsize_bitmap;
+
domain->domain.ops = iommu->iommu.ops->default_domain_ops;
if (dirty_tracking)
@@ -2431,8 +2421,16 @@ static struct iommu_domain *do_iommu_domain_alloc(unsigned int type,
static struct iommu_domain *amd_iommu_domain_alloc(unsigned int type)
{
struct iommu_domain *domain;
+ int pgtable = amd_iommu_pgtable;
+
+ /*
+ * Force IOMMU v1 page table when allocating
+ * domain for pass-through devices.
+ */
+ if (type == IOMMU_DOMAIN_UNMANAGED)
+ pgtable = AMD_IOMMU_V1;
- domain = do_iommu_domain_alloc(type, NULL, 0);
+ domain = do_iommu_domain_alloc(type, NULL, 0, pgtable);
if (IS_ERR(domain))
return NULL;
@@ -2450,7 +2448,7 @@ amd_iommu_domain_alloc_user(struct device *dev, u32 flags,
if ((flags & ~IOMMU_HWPT_ALLOC_DIRTY_TRACKING) || parent || user_data)
return ERR_PTR(-EOPNOTSUPP);
- return do_iommu_domain_alloc(type, dev, flags);
+ return do_iommu_domain_alloc(type, dev, flags, AMD_IOMMU_V1);
}
void amd_iommu_domain_free(struct iommu_domain *dom)
--
2.31.1
^ permalink raw reply related [flat|nested] 41+ messages in thread* [PATCH 4/5] iommu/amd: Enhance domain_alloc_user() to allocate PASID capable domain
2024-08-21 13:35 [PATCH 0/5] iommu: Domain allocation enhancements Vasant Hegde
` (2 preceding siblings ...)
2024-08-21 13:35 ` [PATCH 3/5] iommu/amd: Pass page table type to pdomain_setup_pgtable() Vasant Hegde
@ 2024-08-21 13:35 ` Vasant Hegde
2024-08-21 13:35 ` [PATCH 5/5] iommu/amd: Add iommu_ops->domain_alloc_paging support Vasant Hegde
4 siblings, 0 replies; 41+ messages in thread
From: Vasant Hegde @ 2024-08-21 13:35 UTC (permalink / raw)
To: iommu, joro
Cc: will, robin.murphy, suravee.suthikulpanit, jgg, yi.l.liu,
baolu.lu, kevin.tian, Vasant Hegde
Allocate domain with v2 page table if IOMMUFD_HWPT_ALLOC_PASID flag is
passed to ops->domain_alloc_user().
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
---
drivers/iommu/amd/iommu.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 817a26727894..a4e6789f0487 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2443,12 +2443,23 @@ amd_iommu_domain_alloc_user(struct device *dev, u32 flags,
const struct iommu_user_data *user_data)
{
+ struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
unsigned int type = IOMMU_DOMAIN_UNMANAGED;
+ int pgtable = AMD_IOMMU_V1;
if ((flags & ~IOMMU_HWPT_ALLOC_DIRTY_TRACKING) || parent || user_data)
return ERR_PTR(-EOPNOTSUPP);
- return do_iommu_domain_alloc(type, dev, flags, AMD_IOMMU_V1);
+ /* Allocate v2 page table if IOMMU and device supports PASID. */
+ if (flags & IOMMUFD_HWPT_ALLOC_PASID) {
+ if (amd_iommu_pasid_supported() &&
+ pdev_pasid_supported(dev_data))
+ pgtable = AMD_IOMMU_V2;
+ else
+ return ERR_PTR(-EINVAL);
+ }
+
+ return do_iommu_domain_alloc(type, dev, flags, pgtable);
}
void amd_iommu_domain_free(struct iommu_domain *dom)
--
2.31.1
^ permalink raw reply related [flat|nested] 41+ messages in thread* [PATCH 5/5] iommu/amd: Add iommu_ops->domain_alloc_paging support
2024-08-21 13:35 [PATCH 0/5] iommu: Domain allocation enhancements Vasant Hegde
` (3 preceding siblings ...)
2024-08-21 13:35 ` [PATCH 4/5] iommu/amd: Enhance domain_alloc_user() to allocate PASID capable domain Vasant Hegde
@ 2024-08-21 13:35 ` Vasant Hegde
2024-08-21 15:57 ` Jason Gunthorpe
4 siblings, 1 reply; 41+ messages in thread
From: Vasant Hegde @ 2024-08-21 13:35 UTC (permalink / raw)
To: iommu, joro
Cc: will, robin.murphy, suravee.suthikulpanit, jgg, yi.l.liu,
baolu.lu, kevin.tian, Vasant Hegde
Add support to allocate paging domain. Note that core will call
domain_alloc_user() to allocate PASID capable domain. Hence its not
checking device capability and allocates page table based on
amd_iommu_pgtable.
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
---
drivers/iommu/amd/iommu.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a4e6789f0487..7bd1ce5469ae 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2437,6 +2437,12 @@ static struct iommu_domain *amd_iommu_domain_alloc(unsigned int type)
return domain;
}
+static struct iommu_domain *amd_iommu_domain_alloc_paging(struct device *dev)
+{
+ return do_iommu_domain_alloc(IOMMU_DOMAIN_DMA,
+ dev, 0, amd_iommu_pgtable);
+}
+
static struct iommu_domain *
amd_iommu_domain_alloc_user(struct device *dev, u32 flags,
struct iommu_domain *parent,
@@ -2879,6 +2885,7 @@ static int amd_iommu_dev_disable_feature(struct device *dev,
const struct iommu_ops amd_iommu_ops = {
.capable = amd_iommu_capable,
.domain_alloc = amd_iommu_domain_alloc,
+ .domain_alloc_paging = amd_iommu_domain_alloc_paging,
.domain_alloc_user = amd_iommu_domain_alloc_user,
.domain_alloc_sva = amd_iommu_domain_alloc_sva,
.probe_device = amd_iommu_probe_device,
--
2.31.1
^ permalink raw reply related [flat|nested] 41+ messages in thread* Re: [PATCH 5/5] iommu/amd: Add iommu_ops->domain_alloc_paging support
2024-08-21 13:35 ` [PATCH 5/5] iommu/amd: Add iommu_ops->domain_alloc_paging support Vasant Hegde
@ 2024-08-21 15:57 ` Jason Gunthorpe
2024-09-11 10:44 ` Vasant Hegde
0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2024-08-21 15:57 UTC (permalink / raw)
To: Vasant Hegde
Cc: iommu, joro, will, robin.murphy, suravee.suthikulpanit, yi.l.liu,
baolu.lu, kevin.tian
On Wed, Aug 21, 2024 at 01:35:54PM +0000, Vasant Hegde wrote:
> Add support to allocate paging domain. Note that core will call
> domain_alloc_user() to allocate PASID capable domain. Hence its not
> checking device capability and allocates page table based on
> amd_iommu_pgtable.
>
> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
> ---
> drivers/iommu/amd/iommu.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index a4e6789f0487..7bd1ce5469ae 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -2437,6 +2437,12 @@ static struct iommu_domain *amd_iommu_domain_alloc(unsigned int type)
> return domain;
> }
>
> +static struct iommu_domain *amd_iommu_domain_alloc_paging(struct device *dev)
> +{
> + return do_iommu_domain_alloc(IOMMU_DOMAIN_DMA,
> + dev, 0, amd_iommu_pgtable);
> +}
> +
> static struct iommu_domain *
> amd_iommu_domain_alloc_user(struct device *dev, u32 flags,
> struct iommu_domain *parent,
> @@ -2879,6 +2885,7 @@ static int amd_iommu_dev_disable_feature(struct device *dev,
> const struct iommu_ops amd_iommu_ops = {
> .capable = amd_iommu_capable,
> .domain_alloc = amd_iommu_domain_alloc,
> + .domain_alloc_paging = amd_iommu_domain_alloc_paging,
Can you include the identity domain conversion too and remove
domain_alloc entirely in this series?
It looks like it would be pretty simple?
Jason
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH 5/5] iommu/amd: Add iommu_ops->domain_alloc_paging support
2024-08-21 15:57 ` Jason Gunthorpe
@ 2024-09-11 10:44 ` Vasant Hegde
0 siblings, 0 replies; 41+ messages in thread
From: Vasant Hegde @ 2024-09-11 10:44 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, joro, will, robin.murphy, suravee.suthikulpanit, yi.l.liu,
baolu.lu, kevin.tian
Jason,
On 8/21/2024 9:27 PM, Jason Gunthorpe wrote:
> On Wed, Aug 21, 2024 at 01:35:54PM +0000, Vasant Hegde wrote:
>> Add support to allocate paging domain. Note that core will call
>> domain_alloc_user() to allocate PASID capable domain. Hence its not
>> checking device capability and allocates page table based on
>> amd_iommu_pgtable.
>>
>> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
>> ---
>> drivers/iommu/amd/iommu.c | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
>> index a4e6789f0487..7bd1ce5469ae 100644
>> --- a/drivers/iommu/amd/iommu.c
>> +++ b/drivers/iommu/amd/iommu.c
>> @@ -2437,6 +2437,12 @@ static struct iommu_domain *amd_iommu_domain_alloc(unsigned int type)
>> return domain;
>> }
>>
>> +static struct iommu_domain *amd_iommu_domain_alloc_paging(struct device *dev)
>> +{
>> + return do_iommu_domain_alloc(IOMMU_DOMAIN_DMA,
>> + dev, 0, amd_iommu_pgtable);
>> +}
>> +
>> static struct iommu_domain *
>> amd_iommu_domain_alloc_user(struct device *dev, u32 flags,
>> struct iommu_domain *parent,
>> @@ -2879,6 +2885,7 @@ static int amd_iommu_dev_disable_feature(struct device *dev,
>> const struct iommu_ops amd_iommu_ops = {
>> .capable = amd_iommu_capable,
>> .domain_alloc = amd_iommu_domain_alloc,
>> + .domain_alloc_paging = amd_iommu_domain_alloc_paging,
>
> Can you include the identity domain conversion too and remove
> domain_alloc entirely in this series?
I have included global identity domain in v2 but deferred removing
domain_alloc() path. Once all these changes settle I will remove domain_alloc
and some more cleanup of domain free path.
-Vasant
^ permalink raw reply [flat|nested] 41+ messages in thread