- * [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper
  2023-03-09 10:53 [PATCH v1 00/14] Add Nested Translation Support for SMMUv3 Nicolin Chen
@ 2023-03-09 10:53 ` Nicolin Chen
  2023-03-09 12:51   ` Robin Murphy
  2023-03-10 10:14   ` Eric Auger
  2023-03-09 10:53 ` [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3 Nicolin Chen
                   ` (12 subsequent siblings)
  13 siblings, 2 replies; 165+ messages in thread
From: Nicolin Chen @ 2023-03-09 10:53 UTC (permalink / raw)
  To: jgg, robin.murphy, will
  Cc: eric.auger, kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
ITS virtualization on ARM is done via hypercalls, so the kernel
handles all IOVA mappings for the MSI doorbell in iommu_dma_prepare_msi()
and iommu_dma_compose_msi_msg(). The current virtualization solution with
a 2-stage nested translation setup is to do 1:1 IOVA mappings at stage-1
guest-level IO page table via a RMR region in guest-level IORT, aligning
with an IOVA region that's predefined and mapped in the host kernel:
  [stage-2 host level]
  #define MSI_IOVA_BASE		0x8000000
  #define MSI_IOVA_LENGTH	0x100000
  ...
  iommu_get_msi_cookie():
	cookie->msi_iova = MSI_IOVA_BASE;
  ...
  iommu_dma_prepare_msi(its_pa):
	domain = iommu_get_domain_for_dev(dev);
	iommu_dma_get_msi_page(its_pa, domain):
		cookie = domain->iova_cookie;
		iova = iommu_dma_alloc_iova():
			return cookie->msi_iova - size;
		iommu_map(iova, its_pa, ...);
  [stage-1 guest level]
  // Define in IORT a RMR [MSI_IOVA_BASE, MSI_IOVA_LENGTH]
  ...
  iommu_create_device_direct_mappings():
	iommu_map(iova=MSI_IOVA_BASE, pa=MSI_IOVA_BASE, len=MSI_IOVA_LENGTH);
This solution calling iommu_get_domain_for_dev() needs the device to get
attached to a host-level iommu_domain that has the msi_cookie.
On the other hand, IOMMUFD designs two iommu_domain objects to represent
the two stages: a stage-1 domain (IOMMU_DOMAIN_NESTED type) and a stage-2
domain (IOMMU_DOMAIN_UNMANAGED type). In this design, the device will be
attached to the stage-1 domain representing a guest-level IO page table,
or a Context Descriptor Table in SMMU terms.
This is obviously a mismatch, as the iommu_get_domain_for_dev() does not
return the correct domain pointer in iommu_dma_prepare_msi().
Add an iommu_get_unmanaged_domain helper to allow drivers to return the
correct IOMMU_DOMAIN_UNMANAGED iommu_domain having the IOVA mappings for
the msi_cookie. Keep it in the iommu-priv header for internal use only.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/dma-iommu.c  |  5 +++--
 drivers/iommu/iommu-priv.h | 15 +++++++++++++++
 include/linux/iommu.h      |  2 ++
 3 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 99b2646cb5c7..6b0409d0ff85 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -31,6 +31,7 @@
 #include <linux/vmalloc.h>
 
 #include "dma-iommu.h"
+#include "iommu-priv.h"
 
 struct iommu_dma_msi_page {
 	struct list_head	list;
@@ -1652,7 +1653,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
 int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
 {
 	struct device *dev = msi_desc_to_dev(desc);
-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iommu_domain *domain = iommu_get_unmanaged_domain(dev);
 	struct iommu_dma_msi_page *msi_page;
 	static DEFINE_MUTEX(msi_prepare_lock); /* see below */
 
@@ -1685,7 +1686,7 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
 void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
 {
 	struct device *dev = msi_desc_to_dev(desc);
-	const struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	const struct iommu_domain *domain = iommu_get_unmanaged_domain(dev);
 	const struct iommu_dma_msi_page *msi_page;
 
 	msi_page = msi_desc_get_iommu_cookie(desc);
diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
index a6e694f59f64..da8044da9ad8 100644
--- a/drivers/iommu/iommu-priv.h
+++ b/drivers/iommu/iommu-priv.h
@@ -15,6 +15,21 @@ static inline const struct iommu_ops *dev_iommu_ops(struct device *dev)
 	return dev->iommu->iommu_dev->ops;
 }
 
+static inline struct iommu_domain *iommu_get_unmanaged_domain(struct device *dev)
+{
+	const struct iommu_ops *ops;
+
+	if (!dev->iommu || !dev->iommu->iommu_dev)
+		goto attached_domain;
+
+	ops = dev_iommu_ops(dev);
+	if (ops->get_unmanaged_domain)
+		return ops->get_unmanaged_domain(dev);
+
+attached_domain:
+	return iommu_get_domain_for_dev(dev);
+}
+
 int iommu_group_replace_domain(struct iommu_group *group,
 			       struct iommu_domain *new_domain);
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 080278c8154d..76c65cc4fc15 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -275,6 +275,8 @@ struct iommu_ops {
 						  struct iommu_domain *parent,
 						  const void *user_data);
 
+	struct iommu_domain *(*get_unmanaged_domain)(struct device *dev);
+
 	struct iommu_device *(*probe_device)(struct device *dev);
 	void (*release_device)(struct device *dev);
 	void (*probe_finalize)(struct device *dev);
-- 
2.39.2
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
- * Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper
  2023-03-09 10:53 ` [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper Nicolin Chen
@ 2023-03-09 12:51   ` Robin Murphy
  2023-03-09 14:19     ` Jason Gunthorpe
  2023-03-10  8:41     ` Eric Auger
  2023-03-10 10:14   ` Eric Auger
  1 sibling, 2 replies; 165+ messages in thread
From: Robin Murphy @ 2023-03-09 12:51 UTC (permalink / raw)
  To: Nicolin Chen, jgg, will
  Cc: eric.auger, kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
On 2023-03-09 10:53, Nicolin Chen wrote:
> The nature of ITS virtualization on ARM is done via hypercalls, so kernel
> handles all IOVA mappings for the MSI doorbell in iommu_dma_prepare_msi()
> and iommu_dma_compose_msi_msg(). The current virtualization solution with
> a 2-stage nested translation setup is to do 1:1 IOVA mappings at stage-1
> guest-level IO page table via a RMR region in guest-level IORT, aligning
> with an IOVA region that's predefined and mapped in the host kernel:
> 
>    [stage-2 host level]
>    #define MSI_IOVA_BASE		0x8000000
>    #define MSI_IOVA_LENGTH	0x100000
>    ...
>    iommu_get_msi_cookie():
> 	cookie->msi_iova = MSI_IOVA_BASE;
>    ...
>    iommu_dma_prepare_msi(its_pa):
> 	domain = iommu_get_domain_for_dev(dev);
> 	iommu_dma_get_msi_page(its_pa, domain):
> 		cookie = domain->iova_cookie;
> 		iova = iommu_dma_alloc_iova():
> 			return cookie->msi_iova - size;
> 		iommu_map(iova, its_pa, ...);
> 
>    [stage-1 guest level]
>    // Define in IORT a RMR [MSI_IOVA_BASE, MSI_IOVA_LENGTH]
>    ...
>    iommu_create_device_direct_mappings():
> 	iommu_map(iova=MSI_IOVA_BASE, pa=MSI_IOVA_BASE, len=MSI_IOVA_LENGTH);
> 
> This solution calling iommu_get_domain_for_dev() needs the device to get
> attached to a host-level iommu_domain that has the msi_cookie.
> 
> On the other hand, IOMMUFD designs two iommu_domain objects to represent
> the two stages: a stage-1 domain (IOMMU_DOMAIN_NESTED type) and a stage-2
> domain (IOMMU_DOMAIN_UNMANAGED type). In this design, the device will be
> attached to the stage-1 domain representing a guest-level IO page table,
> or a Context Descriptor Table in SMMU's term.
> 
> This is obviously a mismatch, as the iommu_get_domain_for_dev() does not
> return the correct domain pointer in iommu_dma_prepare_msi().
> 
> Add an iommu_get_unmanaged_domain helper to allow drivers to return the
> correct IOMMU_DOMAIN_UNMANAGED iommu_domain having the IOVA mappings for
> the msi_cookie. Keep it in the iommu-priv header for internal use only.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>   drivers/iommu/dma-iommu.c  |  5 +++--
>   drivers/iommu/iommu-priv.h | 15 +++++++++++++++
>   include/linux/iommu.h      |  2 ++
>   3 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 99b2646cb5c7..6b0409d0ff85 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -31,6 +31,7 @@
>   #include <linux/vmalloc.h>
>   
>   #include "dma-iommu.h"
> +#include "iommu-priv.h"
>   
>   struct iommu_dma_msi_page {
>   	struct list_head	list;
> @@ -1652,7 +1653,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
>   int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
>   {
>   	struct device *dev = msi_desc_to_dev(desc);
> -	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +	struct iommu_domain *domain = iommu_get_unmanaged_domain(dev);
This still doesn't make sense - most of the time this will be expected 
to return the default DMA/identity domain if that's what the device is 
currently using. We can't know whether the current domain is managed or 
not until we look at it.
Just like every other caller of iommu_get_domain_for_dev(), what we want 
here is the current kernel-owned domain that we can inspect and maybe do 
standard IOMMU API things with. Why can't iommu_get_domain_for_dev() 
simply maintain that established usage model and return the kernel-owned 
s2_domain from a nested domain automatically? No IOMMU API user expects 
or needs it to return anything else (and IOMMUFD should certainly not be 
losing track of a nested domain within its own higher-level abstractions 
and needing to fall back on iommu_get_domain_for_dev()), so I really 
don't see a valid reason to overcomplicate things.
Please note I stress "valid" since I'm not buying arbitrarily made-up 
conceptual purity arguments. A nested domain cannot be the "one true 
domain" that is an opaque combination of S1+S2; the IOMMU API view has 
to be more like the device is attached to both the nested domain and the 
parent stage 2 domain somewhat in parallel. Even when nesting is active, 
the S2 domain still exists as a domain in its own right, and still needs 
to be visible and operated on as such, for instance if memory is 
hotplugged in or out of the VM.
TBH I'd also move the s2_domain pointer into the iommu_domain itself, 
since it's going to be a common feature for all nesting implementations, 
thus there seems little need to indirect lookups through the drivers at all.
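A minimal user-space sketch of that suggestion, not kernel code: it assumes a hypothetical s2_domain pointer kept in the domain itself, and every type and function name below is a simplified stand-in invented for illustration rather than the real kernel definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel types. */
enum domain_type { DOMAIN_DMA, DOMAIN_UNMANAGED, DOMAIN_NESTED };

struct domain {
	enum domain_type type;
	struct domain *s2_domain;	/* parent stage 2; set only for DOMAIN_NESTED */
};

struct device {
	struct domain *attached;	/* domain the device is currently attached to */
};

/*
 * Models the suggested behaviour: callers always get a kernel-owned
 * domain back. For a nested attachment that is the parent stage-2
 * domain; for anything else it is the attached domain itself.
 */
struct domain *get_domain_for_dev(const struct device *dev)
{
	struct domain *d = dev->attached;

	if (d && d->type == DOMAIN_NESTED) {
		assert(d->s2_domain != NULL);	/* a nested domain always has a parent */
		return d->s2_domain;
	}
	return d;
}
```

With this shape the MSI code could keep calling one lookup function and would see the stage-2 domain whenever nesting is active, without a per-driver op.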
Thanks,
Robin.
>   	struct iommu_dma_msi_page *msi_page;
>   	static DEFINE_MUTEX(msi_prepare_lock); /* see below */
>   
> @@ -1685,7 +1686,7 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
>   void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
>   {
>   	struct device *dev = msi_desc_to_dev(desc);
> -	const struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +	const struct iommu_domain *domain = iommu_get_unmanaged_domain(dev);
>   	const struct iommu_dma_msi_page *msi_page;
>   
>   	msi_page = msi_desc_get_iommu_cookie(desc);
> diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
> index a6e694f59f64..da8044da9ad8 100644
> --- a/drivers/iommu/iommu-priv.h
> +++ b/drivers/iommu/iommu-priv.h
> @@ -15,6 +15,21 @@ static inline const struct iommu_ops *dev_iommu_ops(struct device *dev)
>   	return dev->iommu->iommu_dev->ops;
>   }
>   
> +static inline struct iommu_domain *iommu_get_unmanaged_domain(struct device *dev)
> +{
> +	const struct iommu_ops *ops;
> +
> +	if (!dev->iommu || !dev->iommu->iommu_dev)
> +		goto attached_domain;
> +
> +	ops = dev_iommu_ops(dev);
> +	if (ops->get_unmanaged_domain)
> +		return ops->get_unmanaged_domain(dev);
> +
> +attached_domain:
> +	return iommu_get_domain_for_dev(dev);
> +}
> +
>   int iommu_group_replace_domain(struct iommu_group *group,
>   			       struct iommu_domain *new_domain);
>   
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 080278c8154d..76c65cc4fc15 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -275,6 +275,8 @@ struct iommu_ops {
>   						  struct iommu_domain *parent,
>   						  const void *user_data);
>   
> +	struct iommu_domain *(*get_unmanaged_domain)(struct device *dev);
> +
>   	struct iommu_device *(*probe_device)(struct device *dev);
>   	void (*release_device)(struct device *dev);
>   	void (*probe_finalize)(struct device *dev);
- * Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper
  2023-03-09 12:51   ` Robin Murphy
@ 2023-03-09 14:19     ` Jason Gunthorpe
  2023-03-09 19:04       ` Robin Murphy
  2023-03-10  8:41     ` Eric Auger
  1 sibling, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-09 14:19 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Nicolin Chen, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Thu, Mar 09, 2023 at 12:51:20PM +0000, Robin Murphy wrote:
> Please note I stress "valid" since I'm not buying arbitrarily made-up
> conceptual purity arguments. A nested domain cannot be the "one true domain"
> that is an opaque combination of S1+S2; the IOMMU API view has to be more
> like the device is attached to both the nested domain and the parent stage 2
> domain somewhat in parallel.
I strongly disagree with this.
The design we have from the core perspective is an opaque domain that
is a hidden combination of S1/S2 inside the driver. We do not want to
change the basic design of the iommu core: there is only one domain
attached to a device/group at a time.
This patch should be seen as a temporary hack to allow the ARM ITS
stuff to hobble on a little longer. We already know that iommufd use
cases are incompatible with the design and we need to fix it. The
fixed solution must have iommufd install the ITS pages at domain
allocation time and so it will not need these APIs at all. This
temporary code should not dictate the overall design of the iommu core.
If we believe exposing the S1/S2 relationships through the iommu core
is necessary for another legitimate purpose I would like to hear about
it. In my opinion using APIs to peek into these details is far more
likely to be buggy and long term I prefer to block the ability to
learn the S2 externally from iommufd completely.
Thus the overall design of the iommu core APIs is not being changed.
The core API design follows this logic with and without nesting:
   iommu_attach_device(domain);
   WARN_ON(domain != iommu_get_domain_for_dev());
The hack in this patch gets its own special single-use APIs so we can
purge them once they are not needed and do not confusingly contaminate
the whole design. For this reason the ops call back should only be
implemented by SMMUv3.
> Even when nesting is active, the S2 domain still exists as a domain
> in its own right, and still needs to be visible and operated on as
> such, for instance if memory is hotplugged in or out of the VM.
It exists in iommufd and iommufd will operate it. This is not a
problem.
iommufd is not using a dual attach model.
The S2 is provided to the S1's domain allocation function as creation
data. The returned S1 domain opaquely embeds the S2. The embedded S2
cannot be changed once the S1 domain is allocated.
Attaching the S1 always causes the embedded S2 to be used too - they
are not separable so we don't have APIs talking about
"attaching" the S2.
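A rough user-space sketch of this model, under stated assumptions: the S2 is passed at S1 allocation, cannot be changed afterwards, and the device only ever has the one attached domain. All names here are hypothetical stand-ins, not the iommufd or driver API.

```c
#include <assert.h>
#include <stddef.h>

struct s2_domain {
	int vmid;	/* illustrative payload only */
};

struct nested_domain {
	struct s2_domain *const parent;	/* fixed at allocation, never reassigned */
};

struct dev_state {
	const struct nested_domain *attached;	/* exactly one domain at a time */
};

/* The S2 is creation data for the S1: it must be supplied up front. */
struct nested_domain nested_alloc(struct s2_domain *parent)
{
	assert(parent != NULL);
	struct nested_domain d = { .parent = parent };
	return d;
}

/* Attaching the S1 implicitly brings its embedded S2 along. */
void attach_nested(struct dev_state *dev, const struct nested_domain *d)
{
	dev->attached = d;
}

/* What the core sees: the single attached domain. */
const struct nested_domain *current_domain(const struct dev_state *dev)
{
	return dev->attached;
}

/* What only the driver is meant to look at: the embedded S2. */
struct s2_domain *driver_s2(const struct dev_state *dev)
{
	return dev->attached ? dev->attached->parent : NULL;
}
```

The const-qualified parent pointer is one way to express "cannot be changed once allocated" in plain C; the compiler rejects any later assignment to it.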
Regards,
Jason
- * Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper
  2023-03-09 14:19     ` Jason Gunthorpe
@ 2023-03-09 19:04       ` Robin Murphy
  2023-03-10  0:23         ` Jason Gunthorpe
  0 siblings, 1 reply; 165+ messages in thread
From: Robin Murphy @ 2023-03-09 19:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nicolin Chen, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On 2023-03-09 14:19, Jason Gunthorpe wrote:
> On Thu, Mar 09, 2023 at 12:51:20PM +0000, Robin Murphy wrote:
> 
>> Please note I stress "valid" since I'm not buying arbitrarily made-up
>> conceptual purity arguments. A nested domain cannot be the "one true domain"
>> that is an opaque combination of S1+S2; the IOMMU API view has to be more
>> like the device is attached to both the nested domain and the parent stage 2
>> domain somewhat in parallel.
> 
> I strongly disagree with this.
> 
> The design we have from the core perspective is an opaque domain that
> is a hidden combination of S1/S2 inside the driver. We do not want to
> change the basic design of the iommu core: there is only one domain
> attached to a device/group at a time.
> 
> This patch should be seen as a temporary hack to allow the ARM ITS
> stuff to hobble on a little longer. We already know that iommufd use
> cases are incompatible with the design and we need to fix it. The
> fixed solution must have iommufd install the ITS pages at domain
> allocation time and so it will not need these APIs at all. This
> temporary code should not dictate the overall design of the iommu core.
> 
> If we believe exposing the S1/S2 relationships through the iommu core
> is necessary for another legitimate purpose I would like to hear about
> it. In my opinion using APIs to peek into these details is far more
> likely to be buggy and long term I prefer to block the ability to
> learn the S2 externally from iommufd completely.
> 
> Thus the overall design of the iommu core APIs is not being changed.
> The core API design follows this logic with and without nesting:
>     iommu_attach_device(domain);
>     WARN_ON(domain != iommu_get_domain_for_dev());
That is indeed one of many conditions that are true today, but the facts 
are that nothing makes that specific assertion, nothing should ever 
*need* to make that specific assertion, and any driver sufficiently dumb 
to not bother keeping track of its own domain and relying on 
iommu_get_domain_for_dev() to retrieve it should most definitely not be 
allowed anywhere near nesting.
The overall design of the iommu core APIs *is* being changed, because 
the current design is also that iommu_get_domain_for_dev() always 
returns the correct domain that iommu_dma_prepare_msi() needs, which 
breaks with nesting. You are literally insisting on changing this core 
API, to work around intentionally breaking an existing behaviour which 
could easily be preserved (far less invasively), for the sake of 
preserving some other theoretical behaviour that IS NOT USEFUL.
The overall design of the iommu core APIs *is* being changed, because 
the core API design also follows this logic for any domain type:
	domain = iommu_get_domain_for_dev();
	phys = iommu_iova_to_phys(domain, iova);
	//phys meaningfully represents whether iova was valid
Yes, even blocking and identity domains, because drivers ACTUALLY DO 
THIS. I'm not sure there even is a single correct thing that nesting 
domains could do to satisfy all the possible expectations that callers 
of iommu_iova_to_phys() may have. However if the grand design says it's 
not OK for iommu_get_domain_for_dev() to return what 
iommu_dma_prepare_msi() needs even though nobody else should ever be 
passing a nesting domain to it, then it's also not OK for 
iommu_iova_to_phys() to crash or lie and return 0 when a valid 
translation (by some notion) exists, even though nobody should ever pass 
a nesting domain in there either.
Forgive me for getting wound up, but I'm a pragmatist and my tolerance 
for ignoring reality is low.
> The hack in this patch gets its own special single-use APIs so we can
> purge them once they are not needed and do not confusingly contaminate
> the whole design. For this reason the ops call back should only be
> implemented by SMMUv3.
> 
>> Even when nesting is active, the S2 domain still exists as a domain
>> in its own right, and still needs to be visible and operated on as
>> such, for instance if memory is hotplugged in or out of the VM.
> 
> It exists in iommufd and iommufd will operate it. This is not a
> problem.
> 
> iommufd is not using a dual attach model.
> 
> The S2 is provided to the S1's domain allocation function as creation
> data. The returned S1 domain opaquely embeds the S2. The embedded S2
> cannot be changed once the S1 domain is allocated.
> 
> Attaching the S1 always causes the embedded S2 to be used too - they
> are not separable so we don't have APIs talking about
> "attaching" the S2.
Yes, that is one way of viewing things, but it's far from the only way. 
The typical lifecycle will involve starting the VM with S2 alone, then 
enabling nesting later - we can view that as allocating a nested domain 
based on S2, then "replacing" S2 with nested, but we could equally view 
it as just attaching the nested domain on top of the existing S2, like 
pushing to a stack (I do agree that trying to model it as two completely 
independent and orthogonal attachments would not be sensible). It's 
really just semantics of how we prefer to describe things, and whether 
the s2_domain pointer is passed up-front to domain_alloc or later to 
attach_dev.
The opaque nested domain looks clean in some ways, but on the other hand 
it also heavily obfuscates how translations may be shared between one 
nested domain and other nested and non-nested domains, such that 
changing mappings in one may affect others. This is a major and 
potentially surprising paradigm shift away from the current notion that 
all domains represent individual isolated address spaces, so abstracting 
it more openly within the core API internals would be clearer about 
what's really going on. And putting common concepts at common levels of 
abstraction makes things simpler and is why we have core API code at all.
Frankly it's not like we don't already have various API-internal stuff 
in struct iommu_domain that nobody external should ever be looking at, 
but if that angst remains overwhelming then I can't imagine it being all 
that much work to move the definition to iommu-priv.h - AFAICS it 
shouldn't need much more than some helpers for the handful of 
iommu_get_domain_for_dev() users currently inspecting the type, 
pgsize_bitmap, or geometry - which would ultimately be a good deal 
neater and more productive than adding yet more special-case ops that 
every driver is just going to implement identically.
And to even imagine the future possibility of things like S2 pagetable 
sharing with KVM, or unpinned S2 with ATS/PRI or SMMU stalls, by 
tunnelling more private nesting interfaces directly between IOMMUFD and 
individual drivers without some degree of common IOMMU API abstraction 
in between... no. Just... no.
Thanks,
Robin.
- * Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper
  2023-03-09 19:04       ` Robin Murphy
@ 2023-03-10  0:23         ` Jason Gunthorpe
  0 siblings, 0 replies; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-10  0:23 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Nicolin Chen, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Thu, Mar 09, 2023 at 07:04:19PM +0000, Robin Murphy wrote:
> You are literally insisting on changing this core API, to work
> around intentionally breaking an existing behaviour which could easily be
> preserved (far less invasively), for the sake of preserving some other
> theoretical behaviour that IS NOT USEFUL.
No! I am insisting that the core API *make logical sense* and not be a
random mishmash of things that just happen to work. I'm not optimizing
for LOC here.
The end goal is to remove access to the S2 and not have this API at
all. It makes no sense to have a temporary step that makes the S2 even
more available then go and undo that.
This is a 'code smell' annotation that it needs to use a special API
because this code has a direct special requirement based on ARM's
definition to work on the S2 of the nest.
The goal for iommufd is to make ITS page mapping to have already
happened at domain allocation time. It cannot be deferred until
irq_domain_ops.alloc() time. Obviously once we do that we don't have a
need to obtain the S2 from a nest domain.
> The overall design of the iommu core APIs *is* being changed, because the
> core API design also follows this logic for any domain type:
> 
> 	domain = iommu_get_domain_for_dev();
> 	phys = iommu_iova_to_phys(domain, iova);
> 	//phys meaningfully represents whether iova was valid
If a nesting domain is used then this really should not translate the
IOVA through the S2.
IMHO the proper answer for a nesting domain is the same as for an SVA
domain - EOPNOTSUPP.
But more importantly it is illegal to call this API unless the caller
already has some range lock on the IOVA being queried. It is
impossible to obtain a range lock on a NEST or SVA, so this is not
allowed to be called on those domains.
Currently it looks like it crashes if something calls it with an
SVA/NEST to drive this point home :)
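As a rough illustration of the EOPNOTSUPP answer, here is a user-space sketch, assuming a toy domain with a single linear mapping. The function name, the int-plus-out-parameter signature, and the -ENOENT code for an unmapped IOVA are all hypothetical choices for the sketch; the in-kernel iommu_iova_to_phys() returns a phys_addr_t instead.

```c
#include <errno.h>
#include <stdint.h>

enum dom_kind { DOM_PAGING, DOM_NESTED, DOM_SVA };

/* Toy domain: one linear IOVA -> phys window, enough to show the policy. */
struct dom {
	enum dom_kind kind;
	uint64_t iova_base;
	uint64_t phys_base;
	uint64_t len;
};

/*
 * Returns 0 and fills *phys on success, -EOPNOTSUPP for domain kinds
 * where the query is not meaningful (nested, SVA), and -ENOENT when
 * the IOVA is not mapped.
 */
int dom_iova_to_phys(const struct dom *d, uint64_t iova, uint64_t *phys)
{
	if (d->kind != DOM_PAGING)
		return -EOPNOTSUPP;
	if (iova < d->iova_base || iova - d->iova_base >= d->len)
		return -ENOENT;
	*phys = d->phys_base + (iova - d->iova_base);
	return 0;
}
```

Returning a distinct error for nested and SVA domains lets a caller tell "not a valid question for this domain" apart from "no translation exists", which a bare 0 physical address cannot express.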
> Forgive me for getting wound up, but I'm a pragmatist and my tolerance for
> ignoring reality is low.
Well, I would like this settled and it seems bike-shedding to me.
> > Attaching the S1 always causes the embedded S2 to be used too - they
> > are not separable so we don't have APIs talking about
> > "attaching" the S2.
> 
> Yes, that is one way of viewing things, but it's far from the only
> way.
Sure, but it is the design we are going with. It is the design that
was built into iommufd from day 1.
If there is a good reason to change it, I'm open to hear it, but we've
gone through a lot of use cases now and it is holding up well.
> typical lifecycle will involve starting the VM with S2 alone, then enabling
> nesting later - we can view that as allocating a nested domain based on S2,
> then "replacing" S2 with nested, but we could equally view it as just
> attaching the nested domain on top of the existing S2, like pushing to a
> stack (I do agree that trying to model it as two completely independent and
> orthogonal attachments would not be sensible). It's really just semantics of
> how we prefer to describe things, and whether the s2_domain pointer is
> passed up-front to domain_alloc or later to attach_dev.
We settled on domain_alloc time for a pretty basic reason - it keeps
the invariant that the IOVA to Phys of an iommu_domain is
universal. Meaning IOVA to Phys always gives the same answer
regardless of what device is attached.
We directly disallow mixing a S1 with different S2's. I also don't
like the idea of 'first to reach attach_dev assigns the S2', partially
initialized iommu_domains have not worked well so far IMHO.
I think this makes it easier for the IOMMU to assign cache tags as
the S1 iommu_domain always gives the same translation so it can
trivially be assigned to a cache tag.
Ie a basic Intel implementation can assign a DID to the S1. We know
that no matter what device the NEST iommu domain is used with, the DID is
the correct cache tag because of the universal translation rule.
ARM will assign a VMID to the S2, the S1 is actually a CD table handle
and has its own invalidation.
AFAICT AMD will assign a DID per-device to the S1, because they don't
have ASIDs in their PASID table :\
> The opaque nested domain looks clean in some ways, but on the other hand it
> also heavily obfuscates how translations may be shared between one nested
> domain and other nested and non-nested domains, such that changing mappings
> in one may affect others.
Well, it keeps the logic in iommufd which should be the only user of
this stuff.
If we develop a non-iommufd user then we can revisit where the
abstractions should live, but for now iommufd is handling the thin
common abstraction.
> This is a major and potentially surprising
> paradigm shift away from the current notion that all domains represent
> individual isolated address spaces, so abstracting it more openly within the
> core API internals would be clearer about what's really going on. And
> putting common concepts at common levels of abstraction makes things simpler
> and is why we have core API code at all.
I'm all for more common concepts, but I'm pragmatic here - I'd like to
see duplicated code in the drivers become unduplicated by the
abstraction. I'm not sure what this is in the S2 area, so far I
haven't noticed anything in the ARM and Intel patch series.
Without a need for S2 pointers in any code outside iommufd and the
driver I'd prefer to keep APIs out of there to prevent abuse.
> ultimately be a good deal neater and more productive than adding yet more
> special-case ops that every driver is just going to implement identically.
To be clear for this patch only SMMUv3 will implement this op because
only SMMUv3 has this ITS problem. Once we fix it the op will be
deleted.
It really is a hack because we are all too scared to fix this properly
right now :)
> And to even imagine the future possibility of things like S2 pagetable
> sharing with KVM, or unpinned S2 with ATS/PRI or SMMU stalls, by tunnelling
> more private nesting interfaces directly between IOMMUFD and individual
> drivers without some degree of common IOMMU API abstraction in between...
> no. Just... no.
I'm all for abstractions that remove duplicated code from drivers, I'm
just not sure right now what they will actually be...
Like I say, I don't view this as duplicated code, this is a hack for ARM
to temporarily break the architectural model of a hidden S2, not true
lasting design.
Thanks,
Jason
- * Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper
  2023-03-09 12:51   ` Robin Murphy
  2023-03-09 14:19     ` Jason Gunthorpe
@ 2023-03-10  8:41     ` Eric Auger
  2023-03-10 15:55       ` Jason Gunthorpe
  1 sibling, 1 reply; 165+ messages in thread
From: Eric Auger @ 2023-03-10  8:41 UTC (permalink / raw)
  To: Robin Murphy, Nicolin Chen, jgg, will
  Cc: kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
Hi,
On 3/9/23 13:51, Robin Murphy wrote:
> On 2023-03-09 10:53, Nicolin Chen wrote:
>> The nature of ITS virtualization on ARM is done via hypercalls, so
>> kernel
>> handles all IOVA mappings for the MSI doorbell in
>> iommu_dma_prepare_msi()
>> and iommu_dma_compose_msi_msg(). The current virtualization solution
>> with
>> a 2-stage nested translation setup is to do 1:1 IOVA mappings at stage-1
>> guest-level IO page table via a RMR region in guest-level IORT, aligning
>> with an IOVA region that's predefined and mapped in the host kernel:
>>
>>    [stage-2 host level]
>>    #define MSI_IOVA_BASE        0x8000000
>>    #define MSI_IOVA_LENGTH    0x100000
>>    ...
>>    iommu_get_msi_cookie():
>>     cookie->msi_iova = MSI_IOVA_BASE;
>>    ...
>>    iommu_dma_prepare_msi(its_pa):
>>     domain = iommu_get_domain_for_dev(dev);
>>     iommu_dma_get_msi_page(its_pa, domain):
>>         cookie = domain->iova_cookie;
>>         iova = iommu_dma_alloc_iova():
>>             return cookie->msi_iova - size;
>>         iommu_map(iova, its_pa, ...);
>>
>>    [stage-1 guest level]
>>    // Define in IORT a RMR [MSI_IOVA_BASE, MSI_IOVA_LENGTH]
>>    ...
>>    iommu_create_device_direct_mappings():
>>     iommu_map(iova=MSI_IOVA_BASE, pa=MSI_IOVA_BASE,
>> len=MSI_IOVA_LENGTH);
>>
>> This solution calling iommu_get_domain_for_dev() needs the device to get
>> attached to a host-level iommu_domain that has the msi_cookie.
>>
>> On the other hand, IOMMUFD designs two iommu_domain objects to represent
>> the two stages: a stage-1 domain (IOMMU_DOMAIN_NESTED type) and a
>> stage-2
>> domain (IOMMU_DOMAIN_UNMANAGED type). In this design, the device will be
>> attached to the stage-1 domain representing a guest-level IO page table,
>> or a Context Descriptor Table in SMMU's term.
>>
>> This is obviously a mismatch, as the iommu_get_domain_for_dev() does not
>> return the correct domain pointer in iommu_dma_prepare_msi().
>>
>> Add an iommu_get_unmanaged_domain helper to allow drivers to return the
>> correct IOMMU_DOMAIN_UNMANAGED iommu_domain having the IOVA mappings for
>> the msi_cookie. Keep it in the iommu-priv header for internal use only.
>>
>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
>> ---
>>   drivers/iommu/dma-iommu.c  |  5 +++--
>>   drivers/iommu/iommu-priv.h | 15 +++++++++++++++
>>   include/linux/iommu.h      |  2 ++
>>   3 files changed, 20 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index 99b2646cb5c7..6b0409d0ff85 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -31,6 +31,7 @@
>>   #include <linux/vmalloc.h>
>>     #include "dma-iommu.h"
>> +#include "iommu-priv.h"
>>     struct iommu_dma_msi_page {
>>       struct list_head    list;
>> @@ -1652,7 +1653,7 @@ static struct iommu_dma_msi_page
>> *iommu_dma_get_msi_page(struct device *dev,
>>   int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
>>   {
>>       struct device *dev = msi_desc_to_dev(desc);
>> -    struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>> +    struct iommu_domain *domain = iommu_get_unmanaged_domain(dev);
>
> This still doesn't make sense - most of the time this will be expected
> to return the default DMA/identity domain if that's what the device is
> currently using. We can't know whether the current domain is managed
> or not until we look at it.
I tend to agree with Robin here. This was first introduced by
[PATCH v7 21/22] iommu/dma: Add support for mapping MSIs
https://lore.kernel.org/all/2273af20d844bd618c6a90b57e639700328ebf7f.1473695704.git.robin.murphy@arm.com/
even before support for the VFIO use case, which came later on. So using
the "unmanaged" terminology sounds improper to me, at least.
Couldn't we use a parent/child terminology, as used in the past in
[RFC v2] /dev/iommu uAPI proposal <https://lore.kernel.org/all/BN9PR11MB5433B1E4AE5B0480369F97178C189@BN9PR11MB5433.namprd11.prod.outlook.com/#r>
This would still hold for the former use case.
Thanks
Eric
>
> Just like every other caller of iommu_get_domain_for_dev(), what we
> want here is the current kernel-owned domain that we can inspect and
> maybe do standard IOMMU API things with. Why can't
> iommu_get_domain_for_dev() simply maintain that established usage
> model and return the kernel-owned s2_domain from a nested domain
> automatically? No IOMMU API user expects or needs it to return
> anything else (and IOMMUFD should certainly not be losing track of a
> nested domain within its own higher-level abstractions and needing to
> fall back on iommu_get_domain_for_dev()), so I really don't see a
> valid reason to overcomplicate things.
>
> Please note I stress "valid" since I'm not buying arbitrarily made-up
> conceptual purity arguments. A nested domain cannot be the "one true
> domain" that is an opaque combination of S1+S2; the IOMMU API view has
> to be more like the device is attached to both the nested domain and
> the parent stage 2 domain somewhat in parallel. Even when nesting is
> active, the S2 domain still exists as a domain in its own right, and
> still needs to be visible and operated on as such, for instance if
> memory is hotplugged in or out of the VM.
>
> TBH I'd also move the s2_domain pointer into the iommu_domain itself,
> since it's going to be a common feature for all nesting
> implementations, thus there seems little need to indirect lookups
> through the drivers at all.
>
> Thanks,
> Robin.
>
>>       struct iommu_dma_msi_page *msi_page;
>>       static DEFINE_MUTEX(msi_prepare_lock); /* see below */
>>   @@ -1685,7 +1686,7 @@ int iommu_dma_prepare_msi(struct msi_desc
>> *desc, phys_addr_t msi_addr)
>>   void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct
>> msi_msg *msg)
>>   {
>>       struct device *dev = msi_desc_to_dev(desc);
>> -    const struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>> +    const struct iommu_domain *domain =
>> iommu_get_unmanaged_domain(dev);
>>       const struct iommu_dma_msi_page *msi_page;
>>         msi_page = msi_desc_get_iommu_cookie(desc);
>> diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
>> index a6e694f59f64..da8044da9ad8 100644
>> --- a/drivers/iommu/iommu-priv.h
>> +++ b/drivers/iommu/iommu-priv.h
>> @@ -15,6 +15,21 @@ static inline const struct iommu_ops
>> *dev_iommu_ops(struct device *dev)
>>       return dev->iommu->iommu_dev->ops;
>>   }
>>   +static inline struct iommu_domain
>> *iommu_get_unmanaged_domain(struct device *dev)
>> +{
>> +    const struct iommu_ops *ops;
>> +
>> +    if (!dev->iommu || !dev->iommu->iommu_dev)
>> +        goto attached_domain;
>> +
>> +    ops = dev_iommu_ops(dev);
>> +    if (ops->get_unmanaged_domain)
>> +        return ops->get_unmanaged_domain(dev);
>> +
>> +attached_domain:
>> +    return iommu_get_domain_for_dev(dev);
>> +}
>> +
>>   int iommu_group_replace_domain(struct iommu_group *group,
>>                      struct iommu_domain *new_domain);
>>   diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 080278c8154d..76c65cc4fc15 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -275,6 +275,8 @@ struct iommu_ops {
>>                             struct iommu_domain *parent,
>>                             const void *user_data);
>>   +    struct iommu_domain *(*get_unmanaged_domain)(struct device *dev);
>> +
>>       struct iommu_device *(*probe_device)(struct device *dev);
>>       void (*release_device)(struct device *dev);
>>       void (*probe_finalize)(struct device *dev);
>
^ permalink raw reply	[flat|nested] 165+ messages in thread
- * Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper
  2023-03-10  8:41     ` Eric Auger
@ 2023-03-10 15:55       ` Jason Gunthorpe
  2023-03-16  1:21         ` Nicolin Chen
  0 siblings, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-10 15:55 UTC (permalink / raw)
  To: Eric Auger
  Cc: Robin Murphy, Nicolin Chen, will, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Fri, Mar 10, 2023 at 09:41:01AM +0100, Eric Auger wrote:
> I tend to agree with Robin here. This was first introduced by
> 
> [PATCH v7 21/22] iommu/dma: Add support for mapping MSIs <https://lore.kernel.org/all/2273af20d844bd618c6a90b57e639700328ebf7f.1473695704.git.robin.murphy@arm.com/#r>
> https://lore.kernel.org/all/2273af20d844bd618c6a90b57e639700328ebf7f.1473695704.git.robin.murphy@arm.com/
Presumably it had to use iommu_get_domain_for_dev() instead of
iommu_get_dma_domain() to support ARM 32 arm-iommu, i.e. it is poking
into the arm-iommu owned domain as well. VFIO just ended up being the
same flow.
> even before the support un VFIO use case which came later on. So
> using the "unmanaged" terminology sounds improper to me, at least.
> Couldn't we use a parent/child terminology as used in the past in
No objection to a better name...
Actually how about if we write it like this? Robin would you be
happier? I think it much more clearly explains why this function is
special within our single domain attachment model.
"get_unmanaged_msi_domain" seems like a name that is much more narrowly
specific to the purpose.
int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
{
	struct device *dev = msi_desc_to_dev(desc);
	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
	struct iommu_dma_msi_page *msi_page;
	static DEFINE_MUTEX(msi_prepare_lock); /* see below */
	desc->iommu_cookie = NULL;
	/*
	 * This probably shouldn't happen as the ARM32 systems should only have
	 * NULL if arm-iommu has been disconnected during setup/destruction.
	 * Assume it is an identity domain.
	 */
	if (!domain)
		return 0;
	/* Caller is expected to use msi_addr for the page */
	if (domain->type == IOMMU_DOMAIN_IDENTITY)
		return 0;
	/*
	 * The current domain is some driver opaque thing. We assume the
	 * driver/user knows what it is doing regarding ARM ITS MSI pages and we
	 * want to try to install the page into some kind of kernel owned
	 * unmanaged domain. Eg for nesting this will install the ITS page into
	 * the S2 domain and then we assume that the S1 domain has independently
	 * made it mapped at the same address.
	 */
	// FIXME wrap in a function
	if (domain->type != IOMMU_DOMAIN_UNMANAGED &&
	    domain->ops->get_unmanaged_msi_domain)
		domain = domain->ops->get_unmanaged_msi_domain(domain);
	if (!domain || domain->type != IOMMU_DOMAIN_UNMANAGED)
		return -EINVAL;
	// ???
	if (!domain->iova_cookie)
		return 0;
	/*
	 * In fact the whole prepare operation should already be serialised by
	 * irq_domain_mutex further up the callchain, but that's pretty subtle
	 * on its own, so consider this locking as failsafe documentation...
	 */
	mutex_lock(&msi_prepare_lock);
	msi_page = iommu_dma_get_msi_page(dev, msi_addr, domain);
	mutex_unlock(&msi_prepare_lock);
	msi_desc_set_iommu_cookie(desc, msi_page);
	if (!msi_page)
		return -ENOMEM;
	return 0;
}
Jason
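For readers following the control flow rather than the kernel details, the
decision order in the sketch above can be modelled in user space. This is
only an illustration under stated assumptions: the model_* types, the
"parent" field and model_parent_of() are hypothetical stand-ins rather than
kernel definitions, the MSI cookie is reduced to a flag, and the op name
follows the "get_unmanaged_msi_domain" suggestion from this message.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel domain types; not real definitions. */
enum model_domain_type {
	MODEL_DOMAIN_IDENTITY,
	MODEL_DOMAIN_UNMANAGED,
	MODEL_DOMAIN_NESTED,
};

struct model_domain {
	enum model_domain_type type;
	int has_msi_cookie;		/* models domain->iova_cookie != NULL */
	struct model_domain *parent;	/* assumed link from S1 to S2 */
	/* models the proposed per-domain op; NULL when not implemented */
	struct model_domain *(*get_unmanaged_msi_domain)(struct model_domain *d);
};

/* Hypothetical driver callback: a nested domain reports its parent. */
struct model_domain *model_parent_of(struct model_domain *d)
{
	return d->parent;
}

/*
 * Mirrors the order of checks in the sketch. Returns 0 or -EINVAL; on
 * return, *target is the domain that would receive the MSI page mapping,
 * or NULL when nothing would be mapped.
 */
int model_prepare_msi(struct model_domain *domain,
		      struct model_domain **target)
{
	*target = NULL;

	/* No domain at all: treated like identity, nothing to map. */
	if (!domain)
		return 0;

	/* Identity domain: the caller uses msi_addr directly. */
	if (domain->type == MODEL_DOMAIN_IDENTITY)
		return 0;

	/* Opaque domain: ask the driver for a kernel-owned unmanaged one. */
	if (domain->type != MODEL_DOMAIN_UNMANAGED &&
	    domain->get_unmanaged_msi_domain)
		domain = domain->get_unmanaged_msi_domain(domain);

	if (!domain || domain->type != MODEL_DOMAIN_UNMANAGED)
		return -EINVAL;

	/* Unmanaged domain without an MSI cookie: nothing to map. */
	if (!domain->has_msi_cookie)
		return 0;

	*target = domain;
	return 0;
}
```

In this model, passing a nested domain whose callback returns an unmanaged
parent with has_msi_cookie set yields 0 with *target pointing at the
parent, while a nested domain without the callback yields -EINVAL.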
^ permalink raw reply	[flat|nested] 165+ messages in thread
- * Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper
  2023-03-10 15:55       ` Jason Gunthorpe
@ 2023-03-16  1:21         ` Nicolin Chen
  2023-03-16 18:42           ` Robin Murphy
  0 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-16  1:21 UTC (permalink / raw)
  To: Robin Murphy, Jason Gunthorpe
  Cc: Eric Auger, will, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
Hi Robin,
What do you think about Jason's proposal below? I'd like to see
us come to an agreement on an acceptable solution...
Thanks
Nic
On Fri, Mar 10, 2023 at 11:55:07AM -0400, Jason Gunthorpe wrote:
> On Fri, Mar 10, 2023 at 09:41:01AM +0100, Eric Auger wrote:
> 
> > I tend to agree with Robin here. This was first introduced by
> > 
> > [PATCH v7 21/22] iommu/dma: Add support for mapping MSIs <https://lore.kernel.org/all/2273af20d844bd618c6a90b57e639700328ebf7f.1473695704.git.robin.murphy@arm.com/#r>
> > https://lore.kernel.org/all/2273af20d844bd618c6a90b57e639700328ebf7f.1473695704.git.robin.murphy@arm.com/
> 
> Presumably it had to use the iommu_get_domain_for_dev() instead of
> iommu_get_dma_domain() to support ARM 32 arm-iommu. Ie it is poking
> into the arm-iommu owned domain as well. VFIO just ended being the
> same flow
> 
> > even before the support un VFIO use case which came later on. So
> > using the "unmanaged" terminology sounds improper to me, at least.
> > Couldn't we use a parent/child terminology as used in the past in
> 
> No objection to a better name...
> 
> Actually how about if we write it like this? Robin would you be
> happier? I think it much more clearly explains why this function is
> special within our single domain attachment model.
> 
> "get_unmanaged_msi_domain" seems like a much more narrowly specific to
> the purpose name.
> 
> int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
> {
> 	struct device *dev = msi_desc_to_dev(desc);
> 	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> 	struct iommu_dma_msi_page *msi_page;
> 	static DEFINE_MUTEX(msi_prepare_lock); /* see below */
> 
> 	desc->iommu_cookie = NULL;
> 
> 	/*
> 	 * This probably shouldn't happen as the ARM32 systems should only have
> 	 * NULL if arm-iommu has been disconnected during setup/destruction.
> 	 * Assume it is an identity domain.
> 	 */
> 	if (!domain)
> 		return 0;
> 
> 	/* Caller is expected to use msi_addr for the page */
> 	if (domain->type == IOMMU_DOMAIN_IDENTITY)
> 		return 0;
> 
> 	/*
> 	 * The current domain is some driver opaque thing. We assume the
> 	 * driver/user knows what it is doing regarding ARM ITS MSI pages and we
> 	 * want to try to install the page into some kind of kernel owned
> 	 * unmanaged domain. Eg for nesting this will install the ITS page into
> 	 * the S2 domain and then we assume that the S1 domain has independently
> 	 * made it mapped at the same address.
> 	 */
> 	// FIXME wrap in a function
> 	if (domain->type != IOMMU_DOMAIN_UNMANAGED &&
> 	    domain->ops->get_unmanaged_msi_domain)
> 		domain = domain->ops->get_unmanaged_msi_domain(domain);
> 
> 	if (!domain || domain->type != IOMMU_DOMAIN_UNMANAGED)
> 		return -EINVAL;
> 
> 	// ???
> 	if (!domain->iova_cookie)
> 		return 0;
> 
> 	/*
> 	 * In fact the whole prepare operation should already be serialised by
> 	 * irq_domain_mutex further up the callchain, but that's pretty subtle
> 	 * on its own, so consider this locking as failsafe documentation...
> 	 */
> 	mutex_lock(&msi_prepare_lock);
> 	msi_page = iommu_dma_get_msi_page(dev, msi_addr, domain);
> 	mutex_unlock(&msi_prepare_lock);
> 
> 	msi_desc_set_iommu_cookie(desc, msi_page);
> 
> 	if (!msi_page)
> 		return -ENOMEM;
> 	return 0;
> }
> 
> Jason
^ permalink raw reply	[flat|nested] 165+ messages in thread
- * Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper
  2023-03-16  1:21         ` Nicolin Chen
@ 2023-03-16 18:42           ` Robin Murphy
  2023-03-16 20:01             ` Nicolin Chen
  2023-03-20 12:51             ` Jason Gunthorpe
  0 siblings, 2 replies; 165+ messages in thread
From: Robin Murphy @ 2023-03-16 18:42 UTC (permalink / raw)
  To: Nicolin Chen, Jason Gunthorpe
  Cc: Eric Auger, will, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On 16/03/2023 1:21 am, Nicolin Chen wrote:
> Hi Robin,
> 
> How do you think about Jason's proposal below? I'd like to see
> us come to an agreement on an acceptable solution...
I think it's so thoroughly broken that I suspect Cunningham's law might
be at play, but fine, you win :) Hopefully it's sufficiently obvious how
the other pieces would fit around the patch below. FWIW I'd still prefer
a generic domain->s2_domain pointer rather than any op at all, but I'm
happy enough with this compromise.
Thanks,
Robin.
----->8-----
Subject: [PATCH] iommu/dma: Support MSIs through nested domains
Currently, iommu-dma is the only place outside of IOMMUFD and drivers
which might need to be aware of the stage 2 domain encapsulated within
a nested domain. This would be in the legacy-VFIO-style case where we're
using host-managed MSIs with an identity mapping at stage 1, where it is
the underlying stage 2 domain which owns an MSI cookie and holds the
corresponding dynamic mappings. Hook up the new op to resolve what we
need from a nested domain.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
  drivers/iommu/dma-iommu.c | 18 ++++++++++++++++--
  1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 99b2646cb5c7..66b0d5fa49f8 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1642,6 +1642,20 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
  	return NULL;
  }
  
+/*
+ * Nested domains may not have an MSI cookie or accept mappings, but they may
+ * be related to a domain which does, so we let them tell us what they need.
+ */
+static struct iommu_domain *iommu_dma_get_msi_mapping_domain(struct device *dev)
+{
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+	if (domain && domain->type == IOMMU_DOMAIN_NESTED &&
+	    domain->ops->get_msi_mapping_domain)
+		domain = domain->ops->get_msi_mapping_domain(domain);
+	return domain;
+}
+
  /**
   * iommu_dma_prepare_msi() - Map the MSI page in the IOMMU domain
   * @desc: MSI descriptor, will store the MSI page
@@ -1652,7 +1666,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
  int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
  {
  	struct device *dev = msi_desc_to_dev(desc);
-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iommu_domain *domain = iommu_dma_get_msi_mapping_domain(dev);
  	struct iommu_dma_msi_page *msi_page;
  	static DEFINE_MUTEX(msi_prepare_lock); /* see below */
  
@@ -1685,7 +1699,7 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
  void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
  {
  	struct device *dev = msi_desc_to_dev(desc);
-	const struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	const struct iommu_domain *domain = iommu_dma_get_msi_mapping_domain(dev);
  	const struct iommu_dma_msi_page *msi_page;
  
  	msi_page = msi_desc_get_iommu_cookie(desc);
-- 
2.39.2.101.g768bb238c484.dirty
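The lookup added by the patch above can be exercised outside the kernel
with a small model. This is a sketch under stated assumptions, not the
kernel code: the sketch_* types and the "s2" field are hypothetical
stand-ins, and sketch_nested_to_s2() is an invented example of what a
driver's get_msi_mapping_domain op might do. Only the shape of the check
(nested type plus an optional op, otherwise the attached domain is
returned unchanged) is taken from the patch.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not real definitions. */
enum sketch_domain_type {
	SKETCH_DOMAIN_DMA,
	SKETCH_DOMAIN_UNMANAGED,
	SKETCH_DOMAIN_NESTED,
};

struct sketch_domain;

struct sketch_domain_ops {
	/* optional: lets a nested domain name the domain holding MSI mappings */
	struct sketch_domain *(*get_msi_mapping_domain)(struct sketch_domain *domain);
};

struct sketch_domain {
	enum sketch_domain_type type;
	const struct sketch_domain_ops *ops;	/* assumed non-NULL, as in the patch */
	struct sketch_domain *s2;		/* assumed driver-private S1 -> S2 link */
};

/* Invented driver op: a nested (stage 1) domain resolves to its stage 2. */
struct sketch_domain *sketch_nested_to_s2(struct sketch_domain *domain)
{
	return domain->s2;
}

const struct sketch_domain_ops sketch_nested_ops = {
	.get_msi_mapping_domain = sketch_nested_to_s2,
};

/* Ops table for domains that do not implement the op. */
const struct sketch_domain_ops sketch_plain_ops = {
	.get_msi_mapping_domain = NULL,
};

/*
 * Same check as the patch, with the attached domain passed in directly
 * instead of being looked up from a struct device.
 */
struct sketch_domain *sketch_get_msi_mapping_domain(struct sketch_domain *attached)
{
	struct sketch_domain *domain = attached;

	if (domain && domain->type == SKETCH_DOMAIN_NESTED &&
	    domain->ops->get_msi_mapping_domain)
		domain = domain->ops->get_msi_mapping_domain(domain);
	return domain;
}
```

With this model, a nested domain using sketch_nested_ops resolves to its
s2 domain, while a DMA or unmanaged domain, or a nested domain whose ops
lack the callback, comes back unchanged, so existing callers see no
difference in behaviour.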
^ permalink raw reply related	[flat|nested] 165+ messages in thread
- * Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper
  2023-03-16 18:42           ` Robin Murphy
@ 2023-03-16 20:01             ` Nicolin Chen
  2023-03-20 12:51             ` Jason Gunthorpe
  1 sibling, 0 replies; 165+ messages in thread
From: Nicolin Chen @ 2023-03-16 20:01 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Jason Gunthorpe, Eric Auger, will, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Thu, Mar 16, 2023 at 06:42:07PM +0000, Robin Murphy wrote:
> On 16/03/2023 1:21 am, Nicolin Chen wrote:
> > Hi Robin,
> > 
> > How do you think about Jason's proposal below? I'd like to see
> > us come to an agreement on an acceptable solution...
> 
> I think it's so thoroughly broken that I suspect Cunningham's law might
> be at play, but fine, you win :) Hopefully it's sufficiently obvious how
> the other pieces would fit around the patch below. FWIW I'd still prefer
> a generic domain->s2_domain pointer rather than any op at all, but I'm
> happy enough with this compromise.
Oh, I appreciate that! Looks like we can move on with a v2 :)
I will include this patch replacing mine in the next version.
Thanks!
Nic
> ----->8-----
> Subject: [PATCH] iommu/dma: Support MSIs through nested domains
> 
> Currently, iommu-dma is the only place outside of IOMMUFD and drivers
> which might need to be aware of the stage 2 domain encapsulated within
> a nested domain. This would be in the legacy-VFIO-style case where we're
> using host-managed MSIs with an identity mapping at stage 1, where it is
> the underlying stage 2 domain which owns an MSI cookie and holds the
> corresponding dynamic mappings. Hook up the new op to resolve what we
> need from a nested domain.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/dma-iommu.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 99b2646cb5c7..66b0d5fa49f8 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1642,6 +1642,20 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
>        return NULL;
>  }
> 
> +/*
> + * Nested domains may not have an MSI cookie or accept mappings, but they may
> + * be related to a domain which does, so we let them tell us what they need.
> + */
> +static struct iommu_domain *iommu_dma_get_msi_mapping_domain(struct device *dev)
> +{
> +       struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +
> +       if (domain && domain->type == IOMMU_DOMAIN_NESTED &&
> +           domain->ops->get_msi_mapping_domain)
> +               domain = domain->ops->get_msi_mapping_domain(domain);
> +       return domain;
> +}
> +
>  /**
>   * iommu_dma_prepare_msi() - Map the MSI page in the IOMMU domain
>   * @desc: MSI descriptor, will store the MSI page
> @@ -1652,7 +1666,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
>  int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
>  {
>        struct device *dev = msi_desc_to_dev(desc);
> -       struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +       struct iommu_domain *domain = iommu_dma_get_msi_mapping_domain(dev);
>        struct iommu_dma_msi_page *msi_page;
>        static DEFINE_MUTEX(msi_prepare_lock); /* see below */
> 
> @@ -1685,7 +1699,7 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
>  void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
>  {
>        struct device *dev = msi_desc_to_dev(desc);
> -       const struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +       const struct iommu_domain *domain = iommu_dma_get_msi_mapping_domain(dev);
>        const struct iommu_dma_msi_page *msi_page;
> 
>        msi_page = msi_desc_get_iommu_cookie(desc);
> --
> 2.39.2.101.g768bb238c484.dirty
> 
^ permalink raw reply	[flat|nested] 165+ messages in thread
- * Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper
  2023-03-16 18:42           ` Robin Murphy
  2023-03-16 20:01             ` Nicolin Chen
@ 2023-03-20 12:51             ` Jason Gunthorpe
  1 sibling, 0 replies; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-20 12:51 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Nicolin Chen, Eric Auger, will, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Thu, Mar 16, 2023 at 06:42:07PM +0000, Robin Murphy wrote:
> On 16/03/2023 1:21 am, Nicolin Chen wrote:
> > Hi Robin,
> > 
> > How do you think about Jason's proposal below? I'd like to see
> > us come to an agreement on an acceptable solution...
> 
> I think it's so thoroughly broken that I suspect Cunningham's law might
> be at play, but fine, you win :)
Not to belabor this, but what was wrong with it?
Jason
^ permalink raw reply	[flat|nested] 165+ messages in thread 
 
 
 
 
 
- * Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper
  2023-03-09 10:53 ` [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper Nicolin Chen
  2023-03-09 12:51   ` Robin Murphy
@ 2023-03-10 10:14   ` Eric Auger
  2023-03-10 15:33     ` Jason Gunthorpe
  1 sibling, 1 reply; 165+ messages in thread
From: Eric Auger @ 2023-03-10 10:14 UTC (permalink / raw)
  To: Nicolin Chen, jgg, robin.murphy, will
  Cc: kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
Hi Nicolin,
On 3/9/23 11:53, Nicolin Chen wrote:
> The nature of ITS virtualization on ARM is done via hypercalls, so kernel
> handles all IOVA mappings for the MSI doorbell in iommu_dma_prepare_msi()
> and iommu_dma_compose_msi_msg(). The current virtualization solution with
> a 2-stage nested translation setup is to do 1:1 IOVA mappings at stage-1
Note that if we still intend to use that trick, there is a known issue on
the kernel side that needs to be fixed.
The ARM DEN 0049E.b IORT specification mandates that when
RMRs are present, the OS must preserve the PCIe configuration
performed by the boot FW.
As discussed in the past, enforcing this causes issues with PCI devices
with IO ports. See qemu commit
40c3472a29c9 ("Revert "acpi/gpex: Inform os to keep firmware resource
map""). This seemed to require a fix at the kernel level. I am not sure
this fix has been worked on.
Thanks
Eric
> guest-level IO page table via a RMR region in guest-level IORT, aligning
> with an IOVA region that's predefined and mapped in the host kernel:
>
>   [stage-2 host level]
>   #define MSI_IOVA_BASE		0x8000000
>   #define MSI_IOVA_LENGTH	0x100000
>   ...
>   iommu_get_msi_cookie():
> 	cookie->msi_iova = MSI_IOVA_BASE;
>   ...
>   iommu_dma_prepare_msi(its_pa):
> 	domain = iommu_get_domain_for_dev(dev);
> 	iommu_dma_get_msi_page(its_pa, domain):
> 		cookie = domain->iova_cookie;
> 		iova = iommu_dma_alloc_iova():
> 			return cookie->msi_iova - size;
> 		iommu_map(iova, its_pa, ...);
>
>   [stage-1 guest level]
>   // Define in IORT a RMR [MSI_IOVA_BASE, MSI_IOVA_LENGTH]
>   ...
>   iommu_create_device_direct_mappings():
> 	iommu_map(iova=MSI_IOVA_BASE, pa=MSI_IOVA_BASE, len=MSI_IOVA_LENGTH);
>
> This solution calling iommu_get_domain_for_dev() needs the device to get
> attached to a host-level iommu_domain that has the msi_cookie.
>
> On the other hand, IOMMUFD designs two iommu_domain objects to represent
> the two stages: a stage-1 domain (IOMMU_DOMAIN_NESTED type) and a stage-2
> domain (IOMMU_DOMAIN_UNMANAGED type). In this design, the device will be
> attached to the stage-1 domain representing a guest-level IO page table,
> or a Context Descriptor Table in SMMU's term.
>
> This is obviously a mismatch, as the iommu_get_domain_for_dev() does not
> return the correct domain pointer in iommu_dma_prepare_msi().
>
> Add an iommu_get_unmanaged_domain helper to allow drivers to return the
> correct IOMMU_DOMAIN_UNMANAGED iommu_domain having the IOVA mappings for
> the msi_cookie. Keep it in the iommu-priv header for internal use only.
>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  drivers/iommu/dma-iommu.c  |  5 +++--
>  drivers/iommu/iommu-priv.h | 15 +++++++++++++++
>  include/linux/iommu.h      |  2 ++
>  3 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 99b2646cb5c7..6b0409d0ff85 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -31,6 +31,7 @@
>  #include <linux/vmalloc.h>
>  
>  #include "dma-iommu.h"
> +#include "iommu-priv.h"
>  
>  struct iommu_dma_msi_page {
>  	struct list_head	list;
> @@ -1652,7 +1653,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
>  int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
>  {
>  	struct device *dev = msi_desc_to_dev(desc);
> -	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +	struct iommu_domain *domain = iommu_get_unmanaged_domain(dev);
>  	struct iommu_dma_msi_page *msi_page;
>  	static DEFINE_MUTEX(msi_prepare_lock); /* see below */
>  
> @@ -1685,7 +1686,7 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
>  void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
>  {
>  	struct device *dev = msi_desc_to_dev(desc);
> -	const struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +	const struct iommu_domain *domain = iommu_get_unmanaged_domain(dev);
>  	const struct iommu_dma_msi_page *msi_page;
>  
>  	msi_page = msi_desc_get_iommu_cookie(desc);
> diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
> index a6e694f59f64..da8044da9ad8 100644
> --- a/drivers/iommu/iommu-priv.h
> +++ b/drivers/iommu/iommu-priv.h
> @@ -15,6 +15,21 @@ static inline const struct iommu_ops *dev_iommu_ops(struct device *dev)
>  	return dev->iommu->iommu_dev->ops;
>  }
>  
> +static inline struct iommu_domain *iommu_get_unmanaged_domain(struct device *dev)
> +{
> +	const struct iommu_ops *ops;
> +
> +	if (!dev->iommu || !dev->iommu->iommu_dev)
> +		goto attached_domain;
> +
> +	ops = dev_iommu_ops(dev);
> +	if (ops->get_unmanaged_domain)
> +		return ops->get_unmanaged_domain(dev);
> +
> +attached_domain:
> +	return iommu_get_domain_for_dev(dev);
> +}
> +
>  int iommu_group_replace_domain(struct iommu_group *group,
>  			       struct iommu_domain *new_domain);
>  
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 080278c8154d..76c65cc4fc15 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -275,6 +275,8 @@ struct iommu_ops {
>  						  struct iommu_domain *parent,
>  						  const void *user_data);
>  
> +	struct iommu_domain *(*get_unmanaged_domain)(struct device *dev);
> +
>  	struct iommu_device *(*probe_device)(struct device *dev);
>  	void (*release_device)(struct device *dev);
>  	void (*probe_finalize)(struct device *dev);
^ permalink raw reply	[flat|nested] 165+ messages in thread
- * Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper
  2023-03-10 10:14   ` Eric Auger
@ 2023-03-10 15:33     ` Jason Gunthorpe
  2023-03-10 15:44       ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-10 15:33 UTC (permalink / raw)
  To: Eric Auger
  Cc: Nicolin Chen, robin.murphy, will, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Fri, Mar 10, 2023 at 11:14:59AM +0100, Eric Auger wrote:
> Hi Nicolin,
> 
> On 3/9/23 11:53, Nicolin Chen wrote:
> > The nature of ITS virtualization on ARM is done via hypercalls, so kernel
> > handles all IOVA mappings for the MSI doorbell in iommu_dma_prepare_msi()
> > and iommu_dma_compose_msi_msg(). The current virtualization solution with
> > a 2-stage nested translation setup is to do 1:1 IOVA mappings at stage-1
> Note that if we still intend to use that trick there is a known issue at
> kernel side that needs to be fixed.
> 
> ARM DEN 0049E.b IORT specification mandates that when
> RMRs are present, the OS must preserve PCIe configuration
> performed by the boot FW.
This limitation doesn't seem necessary for this MSI stuff?
What is it for?
Jason
^ permalink raw reply	[flat|nested] 165+ messages in thread 
- * RE: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper
  2023-03-10 15:33     ` Jason Gunthorpe
@ 2023-03-10 15:44       ` Shameerali Kolothum Thodi
  2023-03-10 15:56         ` Jason Gunthorpe
  0 siblings, 1 reply; 165+ messages in thread
From: Shameerali Kolothum Thodi @ 2023-03-10 15:44 UTC (permalink / raw)
  To: Jason Gunthorpe, Eric Auger
  Cc: Nicolin Chen, robin.murphy@arm.com, will@kernel.org,
	kevin.tian@intel.com, baolu.lu@linux.intel.com, joro@8bytes.org,
	jean-philippe@linaro.org, linux-arm-kernel@lists.infradead.org,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org
> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgg@nvidia.com]
> Sent: 10 March 2023 15:33
> To: Eric Auger <eric.auger@redhat.com>
> Cc: Nicolin Chen <nicolinc@nvidia.com>; robin.murphy@arm.com;
> will@kernel.org; kevin.tian@intel.com; baolu.lu@linux.intel.com;
> joro@8bytes.org; Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; jean-philippe@linaro.org;
> linux-arm-kernel@lists.infradead.org; iommu@lists.linux.dev;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain
> helper
> 
> On Fri, Mar 10, 2023 at 11:14:59AM +0100, Eric Auger wrote:
> > Hi Nicolin,
> >
> > On 3/9/23 11:53, Nicolin Chen wrote:
> > > The nature of ITS virtualization on ARM is done via hypercalls, so
> > > kernel handles all IOVA mappings for the MSI doorbell in
> > > iommu_dma_prepare_msi() and iommu_dma_compose_msi_msg(). The
> current
> > > virtualization solution with a 2-stage nested translation setup is
> > > to do 1:1 IOVA mappings at stage-1
> > Note that if we still intend to use that trick there is a known issue
> > at kernel side that needs to be fixed.
> >
> > ARM DEN 0049E.b IORT specification mandates that when RMRs are
> > present, the OS must preserve PCIe configuration performed by the boot
> > FW.
> 
> This limitation doesn't seem necessary for this MSI stuff?
> 
> What is it for?
That is to make sure the Stream IDs specified in the RMR are still valid and are
not being reassigned by the OS. The kernel checks for this in iort_rmr_has_dev():
https://lore.kernel.org/linux-arm-kernel/20220420164836.1181-5-shameerali.kolothum.thodi@huawei.com/
Thanks,
Shameer
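
A minimal sketch of the idea behind that check, assuming an RMR node is
modelled as a list of stream-ID ranges. The struct rmr_sid_range and the
helper rmr_has_sid() are hypothetical names used for illustration only; this
is not the kernel's iort_rmr_has_dev() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one ID mapping attached to an RMR node. */
struct rmr_sid_range {
	uint32_t base;   /* first stream ID covered by the RMR */
	uint32_t count;  /* number of consecutive stream IDs */
};

/*
 * Return true if @sid is covered by any of the @n ranges.
 * If the OS gave the device a different stream ID after boot, this
 * check fails and the reserved region no longer matches the device.
 */
static bool rmr_has_sid(const struct rmr_sid_range *ranges, size_t n,
			uint32_t sid)
{
	assert(ranges != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (sid >= ranges[i].base &&
		    sid - ranges[i].base < ranges[i].count)
			return true;
	}
	return false;
}
```

The point of the sketch is only that the RMR is keyed by stream ID, so
anything that changes a device's stream ID after firmware wrote the table
silently detaches the device from its reserved region.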
- * Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper
  2023-03-10 15:44       ` Shameerali Kolothum Thodi
@ 2023-03-10 15:56         ` Jason Gunthorpe
  2023-03-10 16:07           ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-10 15:56 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: Eric Auger, Nicolin Chen, robin.murphy@arm.com, will@kernel.org,
	kevin.tian@intel.com, baolu.lu@linux.intel.com, joro@8bytes.org,
	jean-philippe@linaro.org, linux-arm-kernel@lists.infradead.org,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org
On Fri, Mar 10, 2023 at 03:44:02PM +0000, Shameerali Kolothum Thodi wrote:
> 
> 
> > -----Original Message-----
> > From: Jason Gunthorpe [mailto:jgg@nvidia.com]
> > Sent: 10 March 2023 15:33
> > To: Eric Auger <eric.auger@redhat.com>
> > Cc: Nicolin Chen <nicolinc@nvidia.com>; robin.murphy@arm.com;
> > will@kernel.org; kevin.tian@intel.com; baolu.lu@linux.intel.com;
> > joro@8bytes.org; Shameerali Kolothum Thodi
> > <shameerali.kolothum.thodi@huawei.com>; jean-philippe@linaro.org;
> > linux-arm-kernel@lists.infradead.org; iommu@lists.linux.dev;
> > linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain
> > helper
> > 
> > On Fri, Mar 10, 2023 at 11:14:59AM +0100, Eric Auger wrote:
> > > Hi Nicolin,
> > >
> > > On 3/9/23 11:53, Nicolin Chen wrote:
> > > > The nature of ITS virtualization on ARM is done via hypercalls, so
> > > > kernel handles all IOVA mappings for the MSI doorbell in
> > > > iommu_dma_prepare_msi() and iommu_dma_compose_msi_msg(). The
> > current
> > > > virtualization solution with a 2-stage nested translation setup is
> > > > to do 1:1 IOVA mappings at stage-1
> > > Note that if we still intend to use that trick there is a known issue
> > > at kernel side that needs to be fixed.
> > >
> > > ARM DEN 0049E.b IORT specification mandates that when RMRs are
> > > present, the OS must preserve PCIe configuration performed by the boot
> > > FW.
> > 
> > This limitation doesn't seem necessary for this MSI stuff?
> > 
> > What is it for?
> 
> That is to make sure the Stream Ids specified in RMR are still valid and is not being
> reassigned by OS. The kernel checks for this(iort_rmr_has_dev()),
> https://lore.kernel.org/linux-arm-kernel/20220420164836.1181-5-shameerali.kolothum.thodi@huawei.com/
So "boot configuration" is more like "don't change the RIDs"? I.e., don't
enable SR-IOV?
Jason
- * RE: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper
  2023-03-10 15:56         ` Jason Gunthorpe
@ 2023-03-10 16:07           ` Shameerali Kolothum Thodi
  2023-03-10 16:21             ` Jason Gunthorpe
  0 siblings, 1 reply; 165+ messages in thread
From: Shameerali Kolothum Thodi @ 2023-03-10 16:07 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Eric Auger, Nicolin Chen, robin.murphy@arm.com, will@kernel.org,
	kevin.tian@intel.com, baolu.lu@linux.intel.com, joro@8bytes.org,
	jean-philippe@linaro.org, linux-arm-kernel@lists.infradead.org,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org
> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgg@nvidia.com]
> Sent: 10 March 2023 15:57
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: Eric Auger <eric.auger@redhat.com>; Nicolin Chen
> <nicolinc@nvidia.com>; robin.murphy@arm.com; will@kernel.org;
> kevin.tian@intel.com; baolu.lu@linux.intel.com; joro@8bytes.org;
> jean-philippe@linaro.org; linux-arm-kernel@lists.infradead.org;
> iommu@lists.linux.dev; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain
> helper
> 
> On Fri, Mar 10, 2023 at 03:44:02PM +0000, Shameerali Kolothum Thodi
> wrote:
> >
> >
> > > -----Original Message-----
> > > From: Jason Gunthorpe [mailto:jgg@nvidia.com]
> > > Sent: 10 March 2023 15:33
> > > To: Eric Auger <eric.auger@redhat.com>
> > > Cc: Nicolin Chen <nicolinc@nvidia.com>; robin.murphy@arm.com;
> > > will@kernel.org; kevin.tian@intel.com; baolu.lu@linux.intel.com;
> > > joro@8bytes.org; Shameerali Kolothum Thodi
> > > <shameerali.kolothum.thodi@huawei.com>; jean-philippe@linaro.org;
> > > linux-arm-kernel@lists.infradead.org; iommu@lists.linux.dev;
> > > linux-kernel@vger.kernel.org
> > > Subject: Re: [PATCH v1 01/14] iommu: Add
> iommu_get_unmanaged_domain
> > > helper
> > >
> > > On Fri, Mar 10, 2023 at 11:14:59AM +0100, Eric Auger wrote:
> > > > Hi Nicolin,
> > > >
> > > > On 3/9/23 11:53, Nicolin Chen wrote:
> > > > > The nature of ITS virtualization on ARM is done via hypercalls,
> > > > > so kernel handles all IOVA mappings for the MSI doorbell in
> > > > > iommu_dma_prepare_msi() and iommu_dma_compose_msi_msg().
> The
> > > current
> > > > > virtualization solution with a 2-stage nested translation setup
> > > > > is to do 1:1 IOVA mappings at stage-1
> > > > Note that if we still intend to use that trick there is a known
> > > > issue at kernel side that needs to be fixed.
> > > >
> > > > ARM DEN 0049E.b IORT specification mandates that when RMRs are
> > > > present, the OS must preserve PCIe configuration performed by the
> > > > boot FW.
> > >
> > > This limitation doesn't seem necessary for this MSI stuff?
> > >
> > > What is it for?
> >
> > That is to make sure the Stream Ids specified in RMR are still valid
> > and is not being reassigned by OS. The kernel checks for
> > this(iort_rmr_has_dev()),
> >
> > https://lore.kernel.org/linux-arm-kernel/20220420164836.1181-5-shameerali.kolothum.thodi@huawei.com/
> 
> So "boot configration" is more like "don't change the RIDs"? Ie don't enable
> SRIOV?
Yes. I don't think it will work with SR-IOV if you can't guarantee the
RMR-specified SID.
Thanks,
Shameer
- * Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper
  2023-03-10 16:07           ` Shameerali Kolothum Thodi
@ 2023-03-10 16:21             ` Jason Gunthorpe
  2023-03-10 16:30               ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-10 16:21 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: Eric Auger, Nicolin Chen, robin.murphy@arm.com, will@kernel.org,
	kevin.tian@intel.com, baolu.lu@linux.intel.com, joro@8bytes.org,
	jean-philippe@linaro.org, linux-arm-kernel@lists.infradead.org,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org
On Fri, Mar 10, 2023 at 04:07:38PM +0000, Shameerali Kolothum Thodi wrote:
> > > https://lore.kernel.org/linux-arm-kernel/20220420164836.1181-5-shameerali.kolothum.thodi@huawei.com/
> > 
> > So "boot configration" is more like "don't change the RIDs"? Ie don't enable
> > SRIOV?
> 
> Yes. Don't think it will work with SR-IOV if you can't guarantee the RMR specified 
> SID.
So I think we are probably good then, because vSR-IOV is already not
supported by qemu, so it is impossible for a VM to change the PCI
configuration in a way that would alter the RID to SID mapping?
Jason
- * RE: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper
  2023-03-10 16:21             ` Jason Gunthorpe
@ 2023-03-10 16:30               ` Shameerali Kolothum Thodi
  2023-03-10 17:03                 ` Jason Gunthorpe
  2023-03-16 19:51                 ` Nicolin Chen
  0 siblings, 2 replies; 165+ messages in thread
From: Shameerali Kolothum Thodi @ 2023-03-10 16:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Eric Auger, Nicolin Chen, robin.murphy@arm.com, will@kernel.org,
	kevin.tian@intel.com, baolu.lu@linux.intel.com, joro@8bytes.org,
	jean-philippe@linaro.org, linux-arm-kernel@lists.infradead.org,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org
> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgg@nvidia.com]
> Sent: 10 March 2023 16:21
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: Eric Auger <eric.auger@redhat.com>; Nicolin Chen
> <nicolinc@nvidia.com>; robin.murphy@arm.com; will@kernel.org;
> kevin.tian@intel.com; baolu.lu@linux.intel.com; joro@8bytes.org;
> jean-philippe@linaro.org; linux-arm-kernel@lists.infradead.org;
> iommu@lists.linux.dev; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain
> helper
> 
> On Fri, Mar 10, 2023 at 04:07:38PM +0000, Shameerali Kolothum Thodi
> wrote:
> > >
> > > > https://lore.kernel.org/linux-arm-kernel/20220420164836.1181-5-shameerali.kolothum.thodi@huawei.com/
> > >
> > > So "boot configration" is more like "don't change the RIDs"? Ie don't
> enable
> > > SRIOV?
> >
> > Yes. Don't think it will work with SR-IOV if you can't guarantee the RMR
> specified
> > SID.
> 
> So I think we are probably good them because vSR-IOV is already not
> supported by qemu, so it impossible for a VM to change the PCI
> configuration in a way that would alter the RID to SID mapping?
> 
Provided we fix the issue mentioned by Eric. This was discussed here previously:
https://lore.kernel.org/linux-arm-kernel/bb3688c7-8f42-039e-e22f-6529078da97d@redhat.com/
Thanks,
Shameer
- * Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper
  2023-03-10 16:30               ` Shameerali Kolothum Thodi
@ 2023-03-10 17:03                 ` Jason Gunthorpe
  2023-03-22 16:07                   ` Eric Auger
  2023-03-16 19:51                 ` Nicolin Chen
  1 sibling, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-10 17:03 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: Eric Auger, Nicolin Chen, robin.murphy@arm.com, will@kernel.org,
	kevin.tian@intel.com, baolu.lu@linux.intel.com, joro@8bytes.org,
	jean-philippe@linaro.org, linux-arm-kernel@lists.infradead.org,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org
On Fri, Mar 10, 2023 at 04:30:03PM +0000, Shameerali Kolothum Thodi wrote:
> 
> 
> > -----Original Message-----
> > From: Jason Gunthorpe [mailto:jgg@nvidia.com]
> > Sent: 10 March 2023 16:21
> > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> > Cc: Eric Auger <eric.auger@redhat.com>; Nicolin Chen
> > <nicolinc@nvidia.com>; robin.murphy@arm.com; will@kernel.org;
> > kevin.tian@intel.com; baolu.lu@linux.intel.com; joro@8bytes.org;
> > jean-philippe@linaro.org; linux-arm-kernel@lists.infradead.org;
> > iommu@lists.linux.dev; linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain
> > helper
> > 
> > On Fri, Mar 10, 2023 at 04:07:38PM +0000, Shameerali Kolothum Thodi
> > wrote:
> > > >
> > > > > https://lore.kernel.org/linux-arm-kernel/20220420164836.1181-5-shameerali.kolothum.thodi@huawei.com/
> > > >
> > > > So "boot configration" is more like "don't change the RIDs"? Ie don't
> > enable
> > > > SRIOV?
> > >
> > > Yes. Don't think it will work with SR-IOV if you can't guarantee the RMR
> > specified
> > > SID.
> > 
> > So I think we are probably good them because vSR-IOV is already not
> > supported by qemu, so it impossible for a VM to change the PCI
> > configuration in a way that would alter the RID to SID mapping?
> > 
> 
> Provided we fix the issue mentioned by Eric. This was discussed here previously,
> 
> https://lore.kernel.org/linux-arm-kernel/bb3688c7-8f42-039e-e22f-6529078da97d@redhat.com/
Ah, I see, so that we don't renumber the buses during PCI discovery.
It seems like Eric's issue is overly broad if we just want to block
RID reassignment that doesn't impact MMIO layout.
But, still, why do we care about this?
The vIOMMU should virtualize the vSIDs right? So why does qemu give a
vSID list to the guest anyhow? Shouldn't the guest use an algorithmic
calculation from the vRID so that qemu can reverse it to the correct
vPCI device and thus the correct vfio_device and then dev id in the
iommu_domain?
Jason
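
The "algorithmic calculation" suggested above can be sketched as follows,
assuming the vSID is simply the vRID plus a fixed per-segment offset. The
offset VSID_BASE and the helper names are hypothetical; a real vIOMMU model
may derive the vSID differently.

```c
#include <assert.h>
#include <stdint.h>

/* A PCI requester ID packs bus[15:8], device[7:3], function[2:0]. */
static uint16_t pci_rid(uint8_t bus, uint8_t dev, uint8_t fn)
{
	assert(dev < 32 && fn < 8);
	return (uint16_t)((bus << 8) | (dev << 3) | fn);
}

/* Hypothetical fixed offset of the vSID space for one PCI segment. */
#define VSID_BASE 0x10000u

/* Forward direction: what the guest would compute from its vRID. */
static uint32_t vrid_to_vsid(uint16_t vrid)
{
	return VSID_BASE + vrid;
}

/* Reverse direction: what the VMM would do when it sees a vSID. */
static uint16_t vsid_to_vrid(uint32_t vsid)
{
	assert(vsid >= VSID_BASE && vsid - VSID_BASE <= 0xffffu);
	return (uint16_t)(vsid - VSID_BASE);
}
```

Because the mapping is a pure function of the vRID, no vSID list has to be
handed to the guest, and the VMM can always invert it to find the vPCI
device.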
- * Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper
  2023-03-10 17:03                 ` Jason Gunthorpe
@ 2023-03-22 16:07                   ` Eric Auger
  2023-03-22 17:02                     ` Jason Gunthorpe
  0 siblings, 1 reply; 165+ messages in thread
From: Eric Auger @ 2023-03-22 16:07 UTC (permalink / raw)
  To: Jason Gunthorpe, Shameerali Kolothum Thodi
  Cc: Nicolin Chen, robin.murphy@arm.com, will@kernel.org,
	kevin.tian@intel.com, baolu.lu@linux.intel.com, joro@8bytes.org,
	jean-philippe@linaro.org, linux-arm-kernel@lists.infradead.org,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org
Hi Jason,
On 3/10/23 18:03, Jason Gunthorpe wrote:
> On Fri, Mar 10, 2023 at 04:30:03PM +0000, Shameerali Kolothum Thodi wrote:
>>
>>> -----Original Message-----
>>> From: Jason Gunthorpe [mailto:jgg@nvidia.com]
>>> Sent: 10 March 2023 16:21
>>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
>>> Cc: Eric Auger <eric.auger@redhat.com>; Nicolin Chen
>>> <nicolinc@nvidia.com>; robin.murphy@arm.com; will@kernel.org;
>>> kevin.tian@intel.com; baolu.lu@linux.intel.com; joro@8bytes.org;
>>> jean-philippe@linaro.org; linux-arm-kernel@lists.infradead.org;
>>> iommu@lists.linux.dev; linux-kernel@vger.kernel.org
>>> Subject: Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain
>>> helper
>>>
>>> On Fri, Mar 10, 2023 at 04:07:38PM +0000, Shameerali Kolothum Thodi
>>> wrote:
>>>>>> https://lore.kernel.org/linux-arm-kernel/20220420164836.1181-5-shameerali.kolothum.thodi@huawei.com/
>>>>> So "boot configration" is more like "don't change the RIDs"? Ie don't
>>> enable
>>>>> SRIOV?
>>>> Yes. Don't think it will work with SR-IOV if you can't guarantee the RMR
>>> specified
>>>> SID.
>>> So I think we are probably good them because vSR-IOV is already not
>>> supported by qemu, so it impossible for a VM to change the PCI
>>> configuration in a way that would alter the RID to SID mapping?
>>>
>> Provided we fix the issue mentioned by Eric. This was discussed here previously,
>>
>> https://lore.kernel.org/linux-arm-kernel/bb3688c7-8f42-039e-e22f-6529078da97d@redhat.com/
> Ah, I see so that we don't renumber the buses during PCI discovery..
>
> It seems like Eric's issue is overly broad if we just want to block
> RID reassignment that doesn't impact MMIO layout.
IORT spec says
"
If reserved memory regions are present, the OS must preserve PCIe
configuration performed by the boot firmware. This preservation is
required to ensure functional continuity of the endpoints that are
using the reserved memory regions. Therefore, RMR nodes must be
supported by the inclusion of the PCI Firmware defined _DSM for
ignoring PCI boot configuration, Function 5, in the ACPI device object
of the PCIe host bridge in ACPI namespace. The _DSM method should
return a value of 0 to indicate that the OS must honour the PCI
configuration that the firmware has done at boot time. See [PCIFW] for
more details on this _DSM method.
"
Enforcing preservation was attempted in the past in QEMU and then
reverted due to the aforementioned bug.
qemu commit: 40c3472a29 Revert "acpi/gpex: Inform os to keep firmware resource map"
So if we want to rely on RMRs and re-introduce that change, I don't see
how we can avoid fixing the kernel issue.
>
> But, still, why do we care about this?
>
> The vIOMMU should virtualize the vSIDs right? So why does qemu give a
> vSID list to the guest anyhow? Shouldn't the guest use an algorithmic
> calculation from the vRID so that qemu can reverse it to the correct
> vPCI device and thus the correct vfio_device and then dev id in the
> iommu_domain?
I don't understand how this changes the above picture?
Thanks
Eric
>
> Jason
>
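
The _DSM semantics in the IORT spec text quoted in this message can be
sketched as below. The struct dsm_result and the helper name are
hypothetical stand-ins for whatever the OS gets back from evaluating _DSM
Function 5 on the host bridge; treating an absent method as "no preservation
requested" is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result of evaluating _DSM Function 5 on a host bridge. */
struct dsm_result {
	bool present;    /* false if the method is absent or failed */
	uint64_t value;  /* integer returned by the method */
};

/*
 * Per the quoted text, a return value of 0 means the OS must honour
 * the PCI configuration done by firmware at boot. Any other outcome
 * is treated here as leaving the OS free to reassign resources.
 */
static bool os_must_preserve_pci_config(struct dsm_result r)
{
	return r.present && r.value == 0;
}
```

A firmware that publishes RMRs without this method, as proposed later in the
thread, would land in the "free to reassign" branch of this sketch.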
- * Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper
  2023-03-22 16:07                   ` Eric Auger
@ 2023-03-22 17:02                     ` Jason Gunthorpe
  2023-03-22 17:41                       ` Eric Auger
  0 siblings, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-22 17:02 UTC (permalink / raw)
  To: Eric Auger
  Cc: Shameerali Kolothum Thodi, Nicolin Chen, robin.murphy@arm.com,
	will@kernel.org, kevin.tian@intel.com, baolu.lu@linux.intel.com,
	joro@8bytes.org, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
On Wed, Mar 22, 2023 at 05:07:39PM +0100, Eric Auger wrote:
> > It seems like Eric's issue is overly broad if we just want to block
> > RID reassignment that doesn't impact MMIO layout.
> IORT spec says
> 
> "
> If reserved memory regions are present, the OS must preserve PCIe
> configuration performed by the boot firmware. This preservation is
> required to ensure functional continuity of the endpoints that are
> using the reserved memory regions. Therefore, RMR nodes must be
> supported by the inclusion of the PCI Firmware defined _DSM for
> ignoring PCI boot configuration, Function 5, in the ACPI device object
> of the PCIe host bridge in ACPI namespace. The _DSM method should
> return a value of 0 to indicate that the OS must honour the PCI
> configuration that the firmware has done at boot time. See [PCIFW] for
> more details on this _DSM method.
> "
I would say this spec language is overly broad. If the FW knows the
reserved memory regions it creates are not sensitive to PCI layout
then it should not be forced to set this flag.
> > But, still, why do we care about this?
> >
> > The vIOMMU should virtualize the vSIDs right? So why does qemu give a
> > vSID list to the guest anyhow? Shouldn't the guest use an algorithmic
> > calculation from the vRID so that qemu can reverse it to the correct
> > vPCI device and thus the correct vfio_device and then dev id in the
> > iommu_domain?
> I don't understand how this changes the above picture?
We are forced to use RMR because of the hacky GIC ITS stuff.
ITS placement is not sensitive to PCI layout.
ITS is not sensitive to bus numbers/etc.
vSID to dev_id should also be taken care of by QEMU even if bus
numbers change and doesn't need to be fixed.
So let's have a reason why we need to do all this weird stuff beyond
"the spec says so".
If there is no actual functional issue we should not restrict the
guest and provide RMR without the DSM method. Someone should go and
update the spec if this offends them :)
Jason
- * Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper
  2023-03-22 17:02                     ` Jason Gunthorpe
@ 2023-03-22 17:41                       ` Eric Auger
  2023-03-22 18:07                         ` Jason Gunthorpe
  0 siblings, 1 reply; 165+ messages in thread
From: Eric Auger @ 2023-03-22 17:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Shameerali Kolothum Thodi, Nicolin Chen, robin.murphy@arm.com,
	will@kernel.org, kevin.tian@intel.com, baolu.lu@linux.intel.com,
	joro@8bytes.org, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
Hi Jason,
On 3/22/23 18:02, Jason Gunthorpe wrote:
> On Wed, Mar 22, 2023 at 05:07:39PM +0100, Eric Auger wrote:
>
>>> It seems like Eric's issue is overly broad if we just want to block
>>> RID reassignment that doesn't impact MMIO layout.
>> IORT spec says
>>
>> "
>> If reserved memory regions are present, the OS must preserve PCIe
>> configuration performed by the boot firmware. This preservation is
>> required to ensure functional continuity of the endpoints that are
>> using the reserved memory regions. Therefore, RMR nodes must be
>> supported by the inclusion of the PCI Firmware defined _DSM for
>> ignoring PCI boot configuration, Function 5, in the ACPI device object
>> of the PCIe host bridge in ACPI namespace. The _DSM method should
>> return a value of 0 to indicate that the OS must honour the PCI
>> configuration that the firmware has done at boot time. See [PCIFW] for
>> more details on this _DSM method.
>> "
> I would say this spec language is overly broad. If the FW knows the
> reserved memory regions it creates are not sensitive to PCI layout
> then it should not be forced to set this flag.
But do we have any guarantee that the bus numbers can't change? I thought the
guest was allowed to renumber at will. Thinking about it further, all RID
ID mappings should be affected by this concern, not only the RID-to-RMR
ones. What am I missing?
>
>>> But, still, why do we care about this?
>>>
>>> The vIOMMU should virtualize the vSIDs right? So why does qemu give a
>>> vSID list to the guest anyhow? Shouldn't the guest use an algorithmic
>>> calculation from the vRID so that qemu can reverse it to the correct
>>> vPCI device and thus the correct vfio_device and then dev id in the
>>> iommu_domain?
>> I don't understand how this changes the above picture?
> We are forced to use RMR because of the hacky GIC ITS stuff.
Well, we are not obliged to use RMRs. My first revisions did not use them
and created a non-direct S1 mapping. This is just a convenience that
simplifies the integration and was nicely suggested by Jean.
>
> ITS placement is not sensitive to PCI layout.
>
> ITS is not sensitive to bus numbers/etc.
>
> vSID to dev_id should also be taken care of by QEMU even if bus
> numbers change and doesn't need to be fixed.
agreed, hence the above question.
>
> So let's have a reason why we need to do all this weird stuff beyond
> the spec says so.
>
> If there is no actual functional issue we should not restrict the
> guest and provide RMR without the DSM method. Someone should go and
> update the spec if this offends them :)
>
> Jason
>
Thanks
Eric
- * Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper
  2023-03-22 17:41                       ` Eric Auger
@ 2023-03-22 18:07                         ` Jason Gunthorpe
  0 siblings, 0 replies; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-22 18:07 UTC (permalink / raw)
  To: Eric Auger
  Cc: Shameerali Kolothum Thodi, Nicolin Chen, robin.murphy@arm.com,
	will@kernel.org, kevin.tian@intel.com, baolu.lu@linux.intel.com,
	joro@8bytes.org, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
On Wed, Mar 22, 2023 at 06:41:42PM +0100, Eric Auger wrote:
> > I would say this spec language is overly broad. If the FW knows the
> > reserved memory regions it creates are not sensitive to PCI layout
> > then it should not be forced to set this flag.
> 
> But do we have any guarantee the bus numbers can't change. I thought the
> guest was allowed to re-number at will? While further thinking at it,
> all RID ID mappings should be affected by this concern, I mean not only
> RID 2 RMRs? What do I miss?
Bus number changing is allowed, but qemu should not be sensitive to
this.
qemu always knows the current guest assigned bus number for the vPCI,
since it traps the bus number changes like anything else.
Thus when an STE is configured, qemu has access to accurate data to
convert the vSID to the vPCI and vfio_device, even if the bus numbers
have changed since boot.
> > We are forced to use RMR because of the hacky GIC ITS stuff.
> well we are not obliged to use RMRs. My first revisions did not use it
> and created a non direct S1 mapping. This is just a commodity that
> simplifies the integration and was nicely suggested by jean.
I understand it is ARM's architectural preference.
Personally I would prefer the vGIC model include the ITS page itself
and that the guest put the ITS page into the S1 mapping in the usual
way. But we are a long way away from that..
Jason
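
The argument about bus renumbering can be sketched as a small model of the
VMM side, assuming the vSID equals the vRID and that each passed-through
function carries a host-side dev_id that never changes. The struct vpci_dev
and both helpers are hypothetical and much simpler than real bridge
handling, where bus numbers come from bridge secondary-bus registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical VMM-side record of one passed-through vPCI function. */
struct vpci_dev {
	uint8_t bus;      /* current guest-assigned bus number */
	uint8_t devfn;    /* device[7:3] | function[2:0] */
	uint32_t dev_id;  /* host-side handle, stable across renumbering */
};

/* Called when the VMM traps a guest write that renumbers a bus. */
static void vpci_renumber_bus(struct vpci_dev *devs, size_t n,
			      uint8_t old_bus, uint8_t new_bus)
{
	assert(devs != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		if (devs[i].bus == old_bus)
			devs[i].bus = new_bus;
}

/* Resolve a vSID using the current bus numbers; NULL if nothing matches. */
static const struct vpci_dev *vpci_lookup(const struct vpci_dev *devs,
					  size_t n, uint16_t vsid)
{
	for (size_t i = 0; i < n; i++)
		if (((devs[i].bus << 8) | devs[i].devfn) == vsid)
			return &devs[i];
	return NULL;
}
```

Since the lookup always uses the bus number the VMM last trapped, the vSID
written into an STE after renumbering still resolves to the same dev_id.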
- * Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper
  2023-03-10 16:30               ` Shameerali Kolothum Thodi
  2023-03-10 17:03                 ` Jason Gunthorpe
@ 2023-03-16 19:51                 ` Nicolin Chen
  2023-03-16 19:56                   ` Shameerali Kolothum Thodi
  1 sibling, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-16 19:51 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, Eric Auger
  Cc: Jason Gunthorpe, robin.murphy@arm.com, will@kernel.org,
	kevin.tian@intel.com, baolu.lu@linux.intel.com, joro@8bytes.org,
	jean-philippe@linaro.org, linux-arm-kernel@lists.infradead.org,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org
Hi Shameer/Eric,
On Fri, Mar 10, 2023 at 04:30:03PM +0000, Shameerali Kolothum Thodi wrote:
> > -----Original Message-----
> > From: Jason Gunthorpe [mailto:jgg@nvidia.com]
> > Sent: 10 March 2023 16:21
> > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> > Cc: Eric Auger <eric.auger@redhat.com>; Nicolin Chen
> > <nicolinc@nvidia.com>; robin.murphy@arm.com; will@kernel.org;
> > kevin.tian@intel.com; baolu.lu@linux.intel.com; joro@8bytes.org;
> > jean-philippe@linaro.org; linux-arm-kernel@lists.infradead.org;
> > iommu@lists.linux.dev; linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain
> > helper
> >
> > On Fri, Mar 10, 2023 at 04:07:38PM +0000, Shameerali Kolothum Thodi
> > wrote:
> > > >
> > > > > https://lore.kernel.org/linux-arm-kernel/20220420164836.1181-5-shameerali.kolothum.thodi@huawei.com/
> > > >
> > > > So "boot configration" is more like "don't change the RIDs"? Ie don't
> > enable
> > > > SRIOV?
> > >
> > > Yes. Don't think it will work with SR-IOV if you can't guarantee the RMR
> > specified
> > > SID.
> >
> > So I think we are probably good them because vSR-IOV is already not
> > supported by qemu, so it impossible for a VM to change the PCI
> > configuration in a way that would alter the RID to SID mapping?
> >
> 
> Provided we fix the issue mentioned by Eric. This was discussed here previously,
> 
> https://lore.kernel.org/linux-arm-kernel/bb3688c7-8f42-039e-e22f-6529078da97d@redhat.com/
Have we fixed the issue? I saw Lorenzo replying in that thread:
https://lore.kernel.org/linux-arm-kernel/Yi8nV8H4Jjlpadmk@lpieralisi/
Or, what's remaining here regarding this topic? Is it a blocker?
Thanks
Nic
- * RE: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper
  2023-03-16 19:51                 ` Nicolin Chen
@ 2023-03-16 19:56                   ` Shameerali Kolothum Thodi
  2023-03-22 15:44                     ` Eric Auger
  0 siblings, 1 reply; 165+ messages in thread
From: Shameerali Kolothum Thodi @ 2023-03-16 19:56 UTC (permalink / raw)
  To: Nicolin Chen, Eric Auger, lpieralisi@kernel.org
  Cc: Jason Gunthorpe, robin.murphy@arm.com, will@kernel.org,
	kevin.tian@intel.com, baolu.lu@linux.intel.com, joro@8bytes.org,
	jean-philippe@linaro.org, linux-arm-kernel@lists.infradead.org,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org
> -----Original Message-----
> From: Nicolin Chen [mailto:nicolinc@nvidia.com]
> Sent: 16 March 2023 19:51
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> Eric Auger <eric.auger@redhat.com>
> Cc: Jason Gunthorpe <jgg@nvidia.com>; robin.murphy@arm.com;
> will@kernel.org; kevin.tian@intel.com; baolu.lu@linux.intel.com;
> joro@8bytes.org; jean-philippe@linaro.org;
> linux-arm-kernel@lists.infradead.org; iommu@lists.linux.dev;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain
> helper
> 
> Hi Shameer/Eric,
> 
> On Fri, Mar 10, 2023 at 04:30:03PM +0000, Shameerali Kolothum Thodi
> wrote:
> > > -----Original Message-----
> > > From: Jason Gunthorpe [mailto:jgg@nvidia.com]
> > > Sent: 10 March 2023 16:21
> > > To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>
> > > Cc: Eric Auger <eric.auger@redhat.com>; Nicolin Chen
> > > <nicolinc@nvidia.com>; robin.murphy@arm.com; will@kernel.org;
> > > kevin.tian@intel.com; baolu.lu@linux.intel.com; joro@8bytes.org;
> > > jean-philippe@linaro.org; linux-arm-kernel@lists.infradead.org;
> > > iommu@lists.linux.dev; linux-kernel@vger.kernel.org
> > > Subject: Re: [PATCH v1 01/14] iommu: Add
> iommu_get_unmanaged_domain
> > > helper
> > >
> > > On Fri, Mar 10, 2023 at 04:07:38PM +0000, Shameerali Kolothum Thodi
> > > wrote:
> > > > >
> > >
> https://lore.kernel.org/linux-arm-kernel/20220420164836.1181-5-shame
> > > er
> > > > > > ali.kolothum.thodi@huawei.com/
> > > > >
> > > > > So "boot configration" is more like "don't change the RIDs"? Ie
> > > > > don't
> > > enable
> > > > > SRIOV?
> > > >
> > > > Yes. Don't think it will work with SR-IOV if you can't guarantee
> > > > the RMR
> > > specified
> > > > SID.
> > >
> > > So I think we are probably good them because vSR-IOV is already not
> > > supported by qemu, so it impossible for a VM to change the PCI
> > > configuration in a way that would alter the RID to SID mapping?
> > >
> >
> > Provided we fix the issue mentioned by Eric. This was discussed here
> > previously,
> >
> > https://lore.kernel.org/linux-arm-kernel/bb3688c7-8f42-039e-e22f-65290
> > 78da97d@redhat.com/
> 
> Have we fixed the issue? I saw Lorenzo replying in that thread:
> https://lore.kernel.org/linux-arm-kernel/Yi8nV8H4Jjlpadmk@lpieralisi/
> 
> Or, what's remaining here regarding this topic? Is it a blocker?
[+Lorenzo]
Not sure it is fixed yet. Also, assuming we take the RMR path, do we plan to
support DT-based guests at all?
Thanks,
Shameer
- * Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper
  2023-03-16 19:56                   ` Shameerali Kolothum Thodi
@ 2023-03-22 15:44                     ` Eric Auger
  0 siblings, 0 replies; 165+ messages in thread
From: Eric Auger @ 2023-03-22 15:44 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, Nicolin Chen, lpieralisi@kernel.org
  Cc: Jason Gunthorpe, robin.murphy@arm.com, will@kernel.org,
	kevin.tian@intel.com, baolu.lu@linux.intel.com, joro@8bytes.org,
	jean-philippe@linaro.org, linux-arm-kernel@lists.infradead.org,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org
Hi,
On 3/16/23 20:56, Shameerali Kolothum Thodi wrote:
>
>> -----Original Message-----
>> From: Nicolin Chen [mailto:nicolinc@nvidia.com]
>> Sent: 16 March 2023 19:51
>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
>> Eric Auger <eric.auger@redhat.com>
>> Cc: Jason Gunthorpe <jgg@nvidia.com>; robin.murphy@arm.com;
>> will@kernel.org; kevin.tian@intel.com; baolu.lu@linux.intel.com;
>> joro@8bytes.org; jean-philippe@linaro.org;
>> linux-arm-kernel@lists.infradead.org; iommu@lists.linux.dev;
>> linux-kernel@vger.kernel.org
>> Subject: Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain
>> helper
>>
>> Hi Shameer/Eric,
>>
>> On Fri, Mar 10, 2023 at 04:30:03PM +0000, Shameerali Kolothum Thodi
>> wrote:
>>>> -----Original Message-----
>>>> From: Jason Gunthorpe [mailto:jgg@nvidia.com]
>>>> Sent: 10 March 2023 16:21
>>>> To: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>
>>>> Cc: Eric Auger <eric.auger@redhat.com>; Nicolin Chen
>>>> <nicolinc@nvidia.com>; robin.murphy@arm.com; will@kernel.org;
>>>> kevin.tian@intel.com; baolu.lu@linux.intel.com; joro@8bytes.org;
>>>> jean-philippe@linaro.org; linux-arm-kernel@lists.infradead.org;
>>>> iommu@lists.linux.dev; linux-kernel@vger.kernel.org
>>>> Subject: Re: [PATCH v1 01/14] iommu: Add
>> iommu_get_unmanaged_domain
>>>> helper
>>>>
>>>> On Fri, Mar 10, 2023 at 04:07:38PM +0000, Shameerali Kolothum Thodi
>>>> wrote:
>>>>>>> https://lore.kernel.org/linux-arm-kernel/20220420164836.1181-5-shameerali.kolothum.thodi@huawei.com/
>>>>>> So "boot configuration" is more like "don't change the RIDs"? I.e.
>>>>>> don't enable SRIOV?
>>>>> Yes. Don't think it will work with SR-IOV if you can't guarantee
>>>>> the RMR specified SID.
>>>> So I think we are probably good then because vSR-IOV is already not
>>>> supported by qemu, so it is impossible for a VM to change the PCI
>>>> configuration in a way that would alter the RID to SID mapping?
>>>>
>>> Provided we fix the issue mentioned by Eric. This was discussed here
>>> previously,
>>>
>>> https://lore.kernel.org/linux-arm-kernel/bb3688c7-8f42-039e-e22f-6529078da97d@redhat.com/
>> Have we fixed the issue? I saw Lorenzo replying in that thread:
>> https://lore.kernel.org/linux-arm-kernel/Yi8nV8H4Jjlpadmk@lpieralisi/
>>
>> Or, what's remaining here regarding this topic? Is it a blocker?
> [+Lorenzo]
I am not aware of any change in the situation.
Thanks
Eric
>
> Not sure it is fixed yet. Also, assuming we take the RMR path, do we plan to
> support DT-based guests at all?
>
> Thanks,
> Shameer
>
- * [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-09 10:53 [PATCH v1 00/14] Add Nested Translation Support for SMMUv3 Nicolin Chen
  2023-03-09 10:53 ` [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper Nicolin Chen
@ 2023-03-09 10:53 ` Nicolin Chen
  2023-03-09 13:42   ` Jean-Philippe Brucker
  2023-03-09 10:53 ` [PATCH v1 03/14] iommufd/device: Setup MSI on kernel-managed domains Nicolin Chen
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-09 10:53 UTC (permalink / raw)
  To: jgg, robin.murphy, will
  Cc: eric.auger, kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
Add the following data structures for corresponding ioctls:
               iommu_hwpt_arm_smmuv3 => IOMMUFD_CMD_HWPT_ALLOC
    iommu_hwpt_invalidate_arm_smmuv3 => IOMMUFD_CMD_HWPT_INVALIDATE
Also, add IOMMU_HW_INFO_TYPE_ARM_SMMUV3 and IOMMU_HWPT_TYPE_ARM_SMMUV3
to the header and corresponding type/size arrays.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/hw_pagetable.c |  4 +++
 drivers/iommu/iommufd/main.c         |  1 +
 include/uapi/linux/iommufd.h         | 50 ++++++++++++++++++++++++++++
 3 files changed, 55 insertions(+)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 8f9985bddeeb..5e798b2f9a3a 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -173,6 +173,7 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 static const size_t iommufd_hwpt_alloc_data_size[] = {
 	[IOMMU_HWPT_TYPE_DEFAULT] = 0,
 	[IOMMU_HWPT_TYPE_VTD_S1] = sizeof(struct iommu_hwpt_intel_vtd),
+	[IOMMU_HWPT_TYPE_ARM_SMMUV3] = sizeof(struct iommu_hwpt_arm_smmuv3),
 };
 
 /*
@@ -183,6 +184,8 @@ const u64 iommufd_hwpt_type_bitmaps[] =  {
 	[IOMMU_HW_INFO_TYPE_DEFAULT] = BIT_ULL(IOMMU_HWPT_TYPE_DEFAULT),
 	[IOMMU_HW_INFO_TYPE_INTEL_VTD] = BIT_ULL(IOMMU_HWPT_TYPE_DEFAULT) |
 					 BIT_ULL(IOMMU_HWPT_TYPE_VTD_S1),
+	[IOMMU_HW_INFO_TYPE_ARM_SMMUV3] = BIT_ULL(IOMMU_HWPT_TYPE_DEFAULT) |
+					  BIT_ULL(IOMMU_HWPT_TYPE_ARM_SMMUV3),
 };
 
 /* Return true if type is supported, otherwise false */
@@ -329,6 +332,7 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
  */
 static const size_t iommufd_hwpt_invalidate_info_size[] = {
 	[IOMMU_HWPT_TYPE_VTD_S1] = sizeof(struct iommu_hwpt_invalidate_intel_vtd),
+	[IOMMU_HWPT_TYPE_ARM_SMMUV3] = sizeof(struct iommu_hwpt_invalidate_arm_smmuv3),
 };
 
 int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd)
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 514db4c26927..0b0097af7c86 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -280,6 +280,7 @@ union ucmd_buffer {
 	 * path.
 	 */
 	struct iommu_hwpt_invalidate_intel_vtd vtd;
+	struct iommu_hwpt_invalidate_arm_smmuv3 smmuv3;
 };
 
 struct iommufd_ioctl_op {
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 2a6c326391b2..0d5551b1b2be 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -352,10 +352,13 @@ struct iommu_vfio_ioas {
  * enum iommu_hwpt_type - IOMMU HWPT Type
  * @IOMMU_HWPT_TYPE_DEFAULT: default
  * @IOMMU_HWPT_TYPE_VTD_S1: Intel VT-d stage-1 page table
+ * @IOMMU_HWPT_TYPE_ARM_SMMUV3: ARM SMMUv3 stage-1 Context Descriptor
+ *                              table
  */
 enum iommu_hwpt_type {
 	IOMMU_HWPT_TYPE_DEFAULT,
 	IOMMU_HWPT_TYPE_VTD_S1,
+	IOMMU_HWPT_TYPE_ARM_SMMUV3,
 };
 
 /**
@@ -411,6 +414,28 @@ struct iommu_hwpt_intel_vtd {
 	__u32 __reserved;
 };
 
+/**
+ * struct iommu_hwpt_arm_smmuv3 - ARM SMMUv3 specific page table data
+ *
+ * @flags: page table entry attributes
+ * @s2vmid: Virtual machine identifier
+ * @s1ctxptr: Stage-1 context descriptor pointer
+ * @s1cdmax: Number of CDs pointed to by s1ContextPtr
+ * @s1fmt: Stage-1 Format
+ * @s1dss: Default substream
+ */
+struct iommu_hwpt_arm_smmuv3 {
+#define IOMMU_SMMUV3_FLAG_S2	(1 << 0) /* if unset, stage-1 */
+#define IOMMU_SMMUV3_FLAG_VMID	(1 << 1) /* vmid override */
+	__u64 flags;
+	__u32 s2vmid;
+	__u32 __reserved;
+	__u64 s1ctxptr;
+	__u64 s1cdmax;
+	__u64 s1fmt;
+	__u64 s1dss;
+};
+
 /**
  * struct iommu_hwpt_alloc - ioctl(IOMMU_HWPT_ALLOC)
  * @size: sizeof(struct iommu_hwpt_alloc)
@@ -446,6 +471,8 @@ struct iommu_hwpt_intel_vtd {
  * +------------------------------+-------------------------------------+-----------+
  * | IOMMU_HWPT_TYPE_VTD_S1       |      struct iommu_hwpt_intel_vtd    |    HWPT   |
  * +------------------------------+-------------------------------------+-----------+
+ * | IOMMU_HWPT_TYPE_ARM_SMMUV3   |      struct iommu_hwpt_arm_smmuv3   | IOAS/HWPT |
+ * +------------------------------+-------------------------------------+-----------+
  */
 struct iommu_hwpt_alloc {
 	__u32 size;
@@ -463,10 +490,12 @@ struct iommu_hwpt_alloc {
 /**
  * enum iommu_hw_info_type - IOMMU Hardware Info Types
  * @IOMMU_HW_INFO_TYPE_INTEL_VTD: Intel VT-d iommu info type
+ * @IOMMU_HW_INFO_TYPE_ARM_SMMUV3: ARM SMMUv3 iommu info type
  */
 enum iommu_hw_info_type {
 	IOMMU_HW_INFO_TYPE_DEFAULT,
 	IOMMU_HW_INFO_TYPE_INTEL_VTD,
+	IOMMU_HW_INFO_TYPE_ARM_SMMUV3,
 };
 
 /**
@@ -591,6 +620,25 @@ struct iommu_hwpt_invalidate_intel_vtd {
 	__u64 nb_granules;
 };
 
+/**
+ * struct iommu_hwpt_invalidate_arm_smmuv3 - ARM SMMUv3 cache invalidation info
+ * @flags: boolean attributes of cache invalidation command
+ * @opcode: opcode of cache invalidation command
+ * @ssid: SubStream ID
+ * @granule_size: page/block size of the mapping in bytes
+ * @range: IOVA range to invalidate
+ */
+struct iommu_hwpt_invalidate_arm_smmuv3 {
+#define IOMMU_SMMUV3_CMDQ_TLBI_VA_LEAF	(1 << 0)
+	__u64 flags;
+	__u8 opcode;
+	__u8 padding[3];
+	__u32 asid;
+	__u32 ssid;
+	__u32 granule_size;
+	struct iommu_iova_range range;
+};
+
 /**
  * struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE)
  * @size: sizeof(struct iommu_hwpt_invalidate)
@@ -609,6 +657,8 @@ struct iommu_hwpt_invalidate_intel_vtd {
  * +------------------------------+----------------------------------------+
  * | IOMMU_HWPT_TYPE_VTD_S1       | struct iommu_hwpt_invalidate_intel_vtd |
  * +------------------------------+----------------------------------------+
+ * | IOMMU_HWPT_TYPE_ARM_SMMUV3   | struct iommu_hwpt_invalidate_arm_smmuv3|
+ * +------------------------------+----------------------------------------+
  */
 struct iommu_hwpt_invalidate {
 	__u32 size;
-- 
2.39.2
- * Re: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-09 10:53 ` [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3 Nicolin Chen
@ 2023-03-09 13:42   ` Jean-Philippe Brucker
  2023-03-09 14:48     ` Jason Gunthorpe
                       ` (3 more replies)
  0 siblings, 4 replies; 165+ messages in thread
From: Jean-Philippe Brucker @ 2023-03-09 13:42 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: jgg, robin.murphy, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, linux-arm-kernel, iommu, linux-kernel,
	yi.l.liu
Hi Nicolin,
On Thu, Mar 09, 2023 at 02:53:38AM -0800, Nicolin Chen wrote:
> Add the following data structures for corresponding ioctls:
>                iommu_hwpt_arm_smmuv3 => IOMMUFD_CMD_HWPT_ALLOC
>     iommu_hwpt_invalidate_arm_smmuv3 => IOMMUFD_CMD_HWPT_INVALIDATE
> 
> Also, add IOMMU_HW_INFO_TYPE_ARM_SMMUV3 and IOMMU_HWPT_TYPE_ARM_SMMUV3
> to the header and corresponding type/size arrays.
> 
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> +/**
> + * struct iommu_hwpt_arm_smmuv3 - ARM SMMUv3 specific page table data
> + *
> + * @flags: page table entry attributes
> + * @s2vmid: Virtual machine identifier
> + * @s1ctxptr: Stage-1 context descriptor pointer
> + * @s1cdmax: Number of CDs pointed to by s1ContextPtr
> + * @s1fmt: Stage-1 Format
> + * @s1dss: Default substream
> + */
> +struct iommu_hwpt_arm_smmuv3 {
> +#define IOMMU_SMMUV3_FLAG_S2	(1 << 0) /* if unset, stage-1 */
I don't understand the purpose of this flag, since the structure only
provides stage-1 configuration fields.
> +#define IOMMU_SMMUV3_FLAG_VMID	(1 << 1) /* vmid override */
Doesn't this break isolation? The VMID space is global for the SMMU, so
the guest could access cached mappings of another device.
> +	__u64 flags;
> +	__u32 s2vmid;
> +	__u32 __reserved;
> +	__u64 s1ctxptr;
> +	__u64 s1cdmax;
> +	__u64 s1fmt;
> +	__u64 s1dss;
> +};
> +
> +/**
> + * struct iommu_hwpt_invalidate_arm_smmuv3 - ARM SMMUv3 cache invalidation info
> + * @flags: boolean attributes of cache invalidation command
> + * @opcode: opcode of cache invalidation command
> + * @ssid: SubStream ID
> + * @granule_size: page/block size of the mapping in bytes
> + * @range: IOVA range to invalidate
> + */
> +struct iommu_hwpt_invalidate_arm_smmuv3 {
> +#define IOMMU_SMMUV3_CMDQ_TLBI_VA_LEAF	(1 << 0)
> +	__u64 flags;
> +	__u8 opcode;
> +	__u8 padding[3];
> +	__u32 asid;
> +	__u32 ssid;
> +	__u32 granule_size;
> +	struct iommu_iova_range range;
> +};
Although we can keep the alloc and hardware info separate for each IOMMU
architecture, we should try to come up with common invalidation methods.
It matters because things like vSVA, or just efficient dynamic mappings,
will require optimal invalidation latency. A paravirtual interface like
vhost-iommu can help with that, as the host kernel will handle guest
invalidations directly instead of doing a round-trip to host userspace
(and we'll likely want the same path for PRI.)
Supporting HW page tables for a common PV IOMMU does require some
architecture-specific knowledge, but invalidation messages contain roughly
the same information on all architectures. The PV IOMMU won't include
command opcodes for each possible architecture if one generic command does
the same job.
Ideally I'd like something like this for vhost-iommu:
* slow path through userspace for complex requests like attach-table and
  probe, where the VMM can decode arch-specific information and translate
  them to iommufd and vhost-iommu ioctls to update the configuration.
* fast path within the kernel for performance-critical requests like
  invalidate, page request and response. It would be absurd for the
  vhost-iommu driver to translate generic invalidation requests from the
  guest into arch-specific commands with special opcodes, when the next
  step is calling the IOMMU driver which does that for free.
During previous discussions we came up with generic invalidations that
could fit both Arm and x86 [1][2]. The only difference was the ASID
(called archid/id in those proposals) which VT-d didn't need. Could we try
to build on that?
[1] https://elixir.bootlin.com/linux/v5.17/source/include/uapi/linux/iommu.h#L161
[2] https://lists.oasis-open.org/archives/virtio-dev/202102/msg00014.html
Thanks,
Jean
- * Re: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-09 13:42   ` Jean-Philippe Brucker
@ 2023-03-09 14:48     ` Jason Gunthorpe
  2023-03-09 18:26       ` Jean-Philippe Brucker
  2023-03-10  4:50       ` Nicolin Chen
  2023-03-09 15:26     ` Shameerali Kolothum Thodi
                       ` (2 subsequent siblings)
  3 siblings, 2 replies; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-09 14:48 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Nicolin Chen, robin.murphy, will, eric.auger, kevin.tian,
	baolu.lu, joro, shameerali.kolothum.thodi, linux-arm-kernel,
	iommu, linux-kernel, yi.l.liu
On Thu, Mar 09, 2023 at 01:42:17PM +0000, Jean-Philippe Brucker wrote:
> Although we can keep the alloc and hardware info separate for each IOMMU
> architecture, we should try to come up with common invalidation methods.
The invalidation language is tightly linked to the actual cache block
and cache tag in the IOMMU HW design. Generality will lose or
obfuscate the necessary specificity that is required for creating real
vIOMMUs.
Further, invalidation is a fast path, it is crazy to take a vIOMMU of
a real HW receiving a native invalidation request, mangle it to some
obfuscated kernel version and then de-mangle it again in the kernel
driver. IMHO ideally qemu will simply point the invalidation at the
WQE in the SW vIOMMU command queue and invoke the ioctl. (Nicolin, we
should check more into this)
The purpose of these interfaces is to support high performance full
functionality vIOMMUs of the real HW, we should not lose sight of
that goal.
We are actually planning to go further and expose direct invalidation
work queues complete with HW doorbells to userspace. This obviously
must be in native HW format.
Nicolin, I think we should tweak the uAPI here so that the
invalidation opaque data has a format tagged on its own, instead of
re-using the HWPT tag. I.e. you can have an ARM SMMUv3 invalidate type
tag and also a virtio-viommu invalidate type tag.
This will allow Jean to put the invalidation decoding in the iommu
drivers if we think that is the right direction. virtio can
standardize it as a "HW format".
> Ideally I'd like something like this for vhost-iommu:
> 
> * slow path through userspace for complex requests like attach-table and
>   probe, where the VMM can decode arch-specific information and translate
>   them to iommufd and vhost-iommu ioctls to update the configuration.
> 
> * fast path within the kernel for performance-critical requests like
>   invalidate, page request and response. It would be absurd for the
>   vhost-iommu driver to translate generic invalidation requests from the
>   guest into arch-specific commands with special opcodes, when the next
>   step is calling the IOMMU driver which does that for free.
Someone has to do the conversion. If you don't think virtio should do
it then I'd be OK to add a type tag for virtio format invalidation and
put it in the IOMMU driver.
But given virtio overall already has to know *a lot* about how the HW
it is wrapping works I don't think it is necessarily absurd for virtio
to do the conversion. I'd like to evaluate this in patches in context
with how much other unique HW code ends up in kernel-side vhost-iommu.
However, I don't know the rationale for virtio-viommu, it seems like a
strange direction to me. All the iommu drivers have native command
queues. ARM and AMD are both supporting native command queues directly
in the guest, complete with a direct guest MMIO doorbell ring.
If someone wants to optimize this I'd think the way to do it is to use
virtio like techniques to put SW command queue processing in the
kernel iommu driver and continue to use the HW native interface in the
VM.
What benefit comes from replacing the HW native interface with virtio?
Especially on ARM where the native interface is pretty clean?
> During previous discussions we came up with generic invalidations that
> could fit both Arm and x86 [1][2]. The only difference was the ASID
> (called archid/id in those proposals) which VT-d didn't need. Could we try
> to build on that?
IMHO this was just unioning all the different invalidation types
together. It makes sense for something like virtio but it is
illogical/obfuscated as a user/kernel interface since it still
requires a userspace HW driver to understand what subset of the
invalidations are used on the actual HW.
Jason
- * Re: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-09 14:48     ` Jason Gunthorpe
@ 2023-03-09 18:26       ` Jean-Philippe Brucker
  2023-03-09 21:01         ` Jason Gunthorpe
  2023-03-10  4:50       ` Nicolin Chen
  1 sibling, 1 reply; 165+ messages in thread
From: Jean-Philippe Brucker @ 2023-03-09 18:26 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nicolin Chen, robin.murphy, will, eric.auger, kevin.tian,
	baolu.lu, joro, shameerali.kolothum.thodi, linux-arm-kernel,
	iommu, linux-kernel, yi.l.liu
On Thu, Mar 09, 2023 at 10:48:50AM -0400, Jason Gunthorpe wrote:
> On Thu, Mar 09, 2023 at 01:42:17PM +0000, Jean-Philippe Brucker wrote:
> 
> > Although we can keep the alloc and hardware info separate for each IOMMU
> > architecture, we should try to come up with common invalidation methods.
> 
> The invalidation language is tightly linked to the actual cache block
> and cache tag in the IOMMU HW design.
Concretely though, what are the incompatibilities between the HW designs?
They all need to remove a range of TLB entries, using some address space
tag. But if there is an actual difference I do need to know.
> Generality will lose or
> obfuscate the necessary specificity that is required for creating real
> vIOMMUs.
> 
> Further, invalidation is a fast path, it is crazy to take a vIOMMU of
> a real HW receiving a native invalidation request, mangle it to some
> obfuscated kernel version and then de-mangle it again in the kernel
> driver. IMHO ideally qemu will simply point the invalidation at the
> WQE in the SW vIOMMU command queue and invoke the ioctl. (Nicolin, we
> should check more into this)
Avoiding copying a few bytes won't make up for the extra context switches
to userspace. An emulated IOMMU can easily decode commands and translate
them to generic kernel structures, in a handful of CPU cycles, just like
they decode STEs. It's what they do, and it's the opposite of obfuscation.
> 
> The purpose of these interfaces is to support high performance full
> functionality vIOMMUs of the real HW, we should not lose sight of
> that goal.
> 
> We are actually planning to go further and expose direct invalidation
> work queues complete with HW doorbells to userspace. This obviously
> must be in native HW format.
Doesn't seem relevant since direct access to command queue wouldn't use
this struct.
> 
> Nicolin, I think we should tweak the uAPI here so that the
> invalidation opaque data has a format tagged on its own, instead of
> re-using the HWPT tag. I.e. you can have an ARM SMMUv3 invalidate type
> tag and also a virtio-viommu invalidate type tag.
> 
> This will allow Jean to put the invalidation decoding in the iommu
> drivers if we think that is the right direction. virtio can
> standardize it as a "HW format".
> 
> > Ideally I'd like something like this for vhost-iommu:
> > 
> > * slow path through userspace for complex requests like attach-table and
> >   probe, where the VMM can decode arch-specific information and translate
> >   them to iommufd and vhost-iommu ioctls to update the configuration.
> > 
> > * fast path within the kernel for performance-critical requests like
> >   invalidate, page request and response. It would be absurd for the
> >   vhost-iommu driver to translate generic invalidation requests from the
> >   guest into arch-specific commands with special opcodes, when the next
> >   step is calling the IOMMU driver which does that for free.
> 
> Someone has to do the conversion. If you don't think virtio should do
> it then I'd be OK to add a type tag for virtio format invalidation and
> put it in the IOMMU driver.
Implementing two invalidation formats in each IOMMU driver does not seem
practical.
> 
> But given virtio overall already has to know *a lot* about how the HW
> it is wrapping works I don't think it is necessarily absurd for virtio
> to do the conversion. I'd like to evaluate this in patches in context
> with how much other unique HW code ends up in kernel-side vhost-iommu.
Ideally none. I'd rather leave those, attach and probe, in userspace and
if possible compatible with iommufd to avoid register decoding. 
> 
> However, I don't know the rationale for virtio-viommu, it seems like a
> strange direction to me.
A couple of reasons are relevant here: non-QEMU VMMs don't want to emulate
all vendor IOMMUs, new architectures get vIOMMU mostly for free, and vhost
provides a faster path. Also the ability to optimize paravirtual
interfaces for things like combined invalidation (IOTLB+ATC) or, later,
nested page requests.
For a while the main vIOMMU use-case was assignment to guest userspace,
mainly DPDK, which works great with a generic and slow map/unmap
interface. Since vSVA is still a niche use-case, and nesting without page
faults requires pinning the whole guest memory, map/unmap still seems more
desirable to me. But there is some renewed interest in supporting page
tables with virtio-iommu for the reasons above.
> All the iommu drivers have native command
> queues. ARM and AMD are both supporting native command queues directly
> in the guest, complete with a direct guest MMIO doorbell ring.
Arm SMMUv3 mandates a single global command queue (SMMUv2 uses registers).
An SMMUv3 can optionally implement multiple command queues, though I don't
know if they can be safely assigned to guests. For a lot of SMMUv3
implementations that have a single queue and for other architectures, we
can do better than hardware emulation.
> 
> If someone wants to optimize this I'd think the way to do it is to use
> virtio like techniques to put SW command queue processing in the
> kernel iommu driver and continue to use the HW native interface in the
> VM.
I didn't get which kernel this is.
> 
> What benefit comes from replacing the HW native interface with virtio?
> Especially on ARM where the native interface is pretty clean?
> 
> > During previous discussions we came up with generic invalidations that
> > could fit both Arm and x86 [1][2]. The only difference was the ASID
> > (called archid/id in those proposals) which VT-d didn't need. Could we try
> > to build on that?
> 
> IMHO this was just unioning all the different invalidation types
> together. It makes sense for something like virtio but it is
> illogical/obfuscated as a user/kernel interface since it still
> requires a userspace HW driver to understand what subset of the
> invalidations are used on the actual HW.
As above, decoding arch-specific structures into generic ones is what an
emulated IOMMU does, and it doesn't make a performance difference in which
format it forwards that to the kernel. The host IOMMU driver checks the
guest request and copies them into the command queue. Whether that request
comes in the form of a structure binary-compatible with Arm SMMUvX.Y, or
some generic structure, does not make a difference.
Thanks,
Jean
- * Re: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-09 18:26       ` Jean-Philippe Brucker
@ 2023-03-09 21:01         ` Jason Gunthorpe
  2023-03-10 12:16           ` Jean-Philippe Brucker
  2023-03-10 14:52           ` Robin Murphy
  0 siblings, 2 replies; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-09 21:01 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Nicolin Chen, robin.murphy, will, eric.auger, kevin.tian,
	baolu.lu, joro, shameerali.kolothum.thodi, linux-arm-kernel,
	iommu, linux-kernel, yi.l.liu
On Thu, Mar 09, 2023 at 06:26:59PM +0000, Jean-Philippe Brucker wrote:
> On Thu, Mar 09, 2023 at 10:48:50AM -0400, Jason Gunthorpe wrote:
> > On Thu, Mar 09, 2023 at 01:42:17PM +0000, Jean-Philippe Brucker wrote:
> > 
> > > Although we can keep the alloc and hardware info separate for each IOMMU
> > > architecture, we should try to come up with common invalidation methods.
> > 
> > The invalidation language is tightly linked to the actual cache block
> > and cache tag in the IOMMU HW design.
> 
> Concretely though, what are the incompatibilities between the HW designs?
> They all need to remove a range of TLB entries, using some address space
> tag. But if there is an actual difference I do need to know.
For instance the address space tags and the cache entries they match
to are wildly different.
ARM uses a fine grained ASID and Intel uses a shared ASID called a DID
and incorporates the PASID into the cache tag.
AMD uses something called a DID that covers a different set of stuff
than the Intel DID, and it doesn't seem to work for nesting. AMD uses
PASID as the primary nested cache tag.
Superficially you can say all three have an ASID and you can have an
invalidate ASID Operation and make it "look" the same, but the actual
behavior is totally ill defined and the whole thing is utterly
obfuscated as to what does it actually MEAN.
But this doesn't matter for virtio. You have already got a spec that
defines invalidation in terms of virtio objects that sort of match
things like iommu_domains. I hope the virtio
VIRTIO_IOMMU_INVAL_S_DOMAIN is very well defined as to exactly what
objects a DOMAIN match applies to. The job of the hypervisor is to
translate that to whatever invalidation(s) the real HW requires.
I.e. virtio is going to say "invalidate this domain" and expect the
hypervisor to spew it to every attached PASID if that is what the HW
requires (e.g. AMD), or do a single ASID invalidation (Intel, sometimes).
But when a vIOMMU gets a vDID or vPASID invalidation command it
doesn't mean the same thing as virtio. The driver must invalidate
exactly what the vIOMMU programming model says to invalidate because
the guest is going to spew more invalidations to cover what it
needs. Over invalidation would be a performance problem.
Exposing these subtle differences to userspace is necessary. What I do
not want is leaking those differences through an ill-defined "generic"
interface.
Even more importantly Intel and ARM should not have to fight about the
subtle implementation specific details of the specification of the
"generic" interface. If Intel needs something dumb to make their
viommu work well then they should simply be able to do it. I don't
want to spend 6 months of pointless arguing about language details in
an unnecessary "generic" spec.
> > The purpose of these interfaces is to support high performance full
> > functionality vIOMMUs of the real HW, we should not lose sight of
> > that goal.
> > 
> > We are actually planning to go further and expose direct invalidation
> > work queues complete with HW doorbells to userspace. This obviously
> > must be in native HW format.
> 
> Doesn't seem relevant since direct access to command queue wouldn't use
> this struct.
The point is our design direction with iommufd is to expose the raw HW
to userspace, not to obfuscate it with ill defined generalizations.
> > Someone has to do the conversion. If you don't think virtio should do
> > it then I'd be OK to add a type tag for virtio format invalidation and
> > put it in the IOMMU driver.
> 
> Implementing two invalidation formats in each IOMMU driver does not seem
> practical.
I don't see why.
The advantage of the kernel side is that the implementation is not
strong ABI. If we want to adjust and review the virtio invalidation
path as new HW comes along we can, so long as it is only in the
kernel.
On the other hand if we mess up the uABI for iommufd we are stuck with
it.
The safest and best uABI for iommufd is the HW native uABI because it,
almost by definition, cannot be wrong.
Anyhow, I'm still not very convinced adapting to virtio invalidation
format should be in iommu drivers. I think what you end up with for
virtio is that Intel/AMD have some nice common code to invalidate an
iommu_domain address range (probably even the existing invalidation
interface), and SMMUv3 is just totally different and special.
This is because SMMUv3 has no option to keep the PASID table in the
hypervisor so you are sadly forced to expose both the native ASID and
native PASID caches to the virtio protocol.
Given that the VM virtio driver has to have SMMUv3 specific code to
handle the CD table it must get, I don't see the problem with also
having SMMUv3 specific code in the hypervisor virtio driver to handle
invalidating based on the CD table.
Really, I want to see patches implementing all of this before we make
any decision on what the ops interface is for virtio-iommu kernel
side.
> > However, I don't know the rationale for virtio-viommu, it seems like a
> > strange direction to me.
> 
> A couple of reasons are relevant here: non-QEMU VMMs don't want to emulate
> all vendor IOMMUs, new architectures get vIOMMU mostly for free,
So your argument is you can implement a simple map/unmap API riding
on the common IOMMU API and this is portable?
Seems sensible, but that falls apart pretty quickly when we talk about
nesting.. I don't think we can avoid VMM components to set this up, so
it stops being portable. At that point I'm back to asking why not use
the real HW model?
> > All the iommu drivers have native command
> > queues. ARM and AMD are both supporting native command queues directly
> > in the guest, complete with a direct guest MMIO doorbell ring.
> 
> Arm SMMUv3 mandates a single global command queue (SMMUv2 uses
> registers). An SMMUv3 can optionally implement multiple command
> queues, though I don't know if they can be safely assigned to
> guests.
It is not standardized by ARM, but it can be (and has been) done.
> For a lot of SMMUv3 implementations that have a single queue and for
> other architectures, we can do better than hardware emulation.
How is using a SW emulated virtio formatted queue better than using a
SW emulated SMMUv3 ECMDQ?
The vSMMUv3 driver controls what capabilities are shown to the guest, so
it can definitely create an ECMDQ enabled device and do something like
assign the secondary ECMDQs to hypervisor kernel SW queues the same way
virtio does.
I don't think there is a very solid argument that virtio-iommu is
necessary to get faster invalidation.
> > If someone wants to optimize this I'd think the way to do it is to use
> > virtio like techniques to put SW command queue processing in the
> > kernel iommu driver and continue to use the HW native interface in the
> > VM.
> 
> I didn't get which kernel this is.
hypervisor kernel.
> > IMHO this was just unioning all the different invalidation types
> > together. It makes sense for something like virtio but it is
> > illogical/obfuscated as a user/kernel interface since it still
> > requires a userspace HW driver to understand what subset of the
> > invalidations are used on the actual HW.
> 
> As above, decoding arch-specific structures into generic ones is what an
> emulated IOMMU does,
No, it is what virtio wants to do. We are deliberately trying not to
do that for real accelerated HW vIOMMU emulators.
> and it doesn't make a performance difference in which
> format it forwards that to the kernel. The host IOMMU driver checks the
> guest request and copies them into the command queue. Whether that request
> comes in the form of a structure binary-compatible with Arm SMMUvX.Y, or
> some generic structure, does not make a difference.
It is not the structure layouts that matter!
It is the semantic meaning of each request, on each unique piece of
hardware. We actually want to leak the subtle semantic differences to
userspace.
Doing that and continuing to give them the same command label is
exactly the kind of obfuscated ill defined "generic" interface I do
not want.
Jason
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply	[flat|nested] 165+ messages in thread 
- * Re: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-09 21:01         ` Jason Gunthorpe
@ 2023-03-10 12:16           ` Jean-Philippe Brucker
  2023-03-10 14:52           ` Robin Murphy
  1 sibling, 0 replies; 165+ messages in thread
From: Jean-Philippe Brucker @ 2023-03-10 12:16 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nicolin Chen, robin.murphy, will, eric.auger, kevin.tian,
	baolu.lu, joro, shameerali.kolothum.thodi, linux-arm-kernel,
	iommu, linux-kernel, yi.l.liu
On Thu, Mar 09, 2023 at 05:01:15PM -0400, Jason Gunthorpe wrote:
> > Concretely though, what are the incompatibilities between the HW designs?
> > They all need to remove a range of TLB entries, using some address space
> > tag. But if there is an actual difference I do need to know.
> 
> For instance the address space tags and the cache entries they match
> to are wildly different.
> 
> ARM uses a fine grained ASID and Intel uses a shared ASID called a DID
> and incorporates the PASID into the cache tag.
> 
> AMD uses something called a DID that covers a different set of stuff
> than the Intel DID, and it doesn't seem to work for nesting. AMD uses
> PASID as the primary nested cache tag.
Thanks, we'll look into that
> This is because SMMUv3 has no option to keep the PASID table in the
> hypervisor so you are sadly forced to expose both the native ASID and
> native PASID caches to the virtio protocol.
It is possible to keep the PASID table in the host, but you need a way to
allocate GPA since the SMMU accesses it after stage-2 translation. I think
that necessarily requires a PV interface, but you could look into it.
Anyway, even with that, ATC invalidations take a PASID.
> 
> Given that the VM virtio driver has to have SMMUv3 specific code to
> handle the CD table it must get, I don't see the problem with also
> having SMMUv3 specific code in the hypervisor virtio driver to handle
> invalidating based on the CD table.
There isn't much we can't do, I'm just hoping to build something
straightforward instead of having to work around awkward interfaces
> > A couple of reasons are relevant here: non-QEMU VMMs don't want to emulate
> > all vendor IOMMUs, new architectures get vIOMMU mostly for free,
> 
> So your argument is you can implement a simple map/unmap API riding
> on the common IOMMU API and this is portable?
> 
> Seems sensible, but that falls apart pretty quickly when we talk about
> nesting.. I don't think we can avoid VMM components to set this up, so
> it stops being portable. At that point I'm back to asking why not use
> the real HW model?
A single VMM component that shovels data from the virtqueue to the kernel
API and back, rather than four different hardware emulations, four
different queues, four different device tables. It's obviously better for
VMMs that don't do full-system emulation like QEMU, especially as they
generally already implement a virtio transport. Smaller attack surface,
fewer bugs.
The VMM developer gets a multi-platform vIOMMU without having to study all
the different architecture manuals. There is a small amount of HW specific
data in there, but it only relates to table formats. 
Ideally it wouldn't need any HW knowledge, but that would require the
APIs to be aligned: instead of ID registers we pass plain features, and
invalidations don't require HW specific opcodes. Otherwise there is going
to be a layer of glue everywhere, which is what I'm trying to avoid here.
> 
> > > All the iommu drivers have native command
> > > queues. ARM and AMD are both supporting native command queues directly
> > > in the guest, complete with a direct guest MMIO doorbell ring.
> > 
> > Arm SMMUv3 mandates a single global command queue (SMMUv2 uses
> > registers). An SMMUv3 can optionally implement multiple command
> > queues, though I don't know if they can be safely assigned to
> > guests.
> 
> It is not standardized by ARM, but it can be (and has been) done.
> 
> > For a lot of SMMUv3 implementations that have a single queue and for
> > other architectures, we can do better than hardware emulation.
> 
> How is using a SW emulated virtio formatted queue better than using a
> SW emulated SMMUv3 ECMDQ?
We don't need to repeat it for all IOMMU architectures, nor emulate a new
queue in the kernel. The first motivator for virtio-iommu was to avoid
emulating hardware in the kernel. The SMMU maintainer saw how painful that
was to do for the GIC, saw that there is a virtualization queue readily
available in vhost and, well, it just made sense. Still does.
> > As above, decoding arch-specific structures into generic ones is what an
> > emulated IOMMU does,
> 
> No, it is what virtio wants to do. We are deliberately trying not to
> do that for real accelerated HW vIOMMU emulators.
Yes there is a line somewhere, and I'd prefer it to be the page table.
Given how many possible hardware combinations exist and how many more will
show up, it would be good to abstract things where possible.
> 
> > and it doesn't make a performance difference in which
> > format it forwards that to the kernel. The host IOMMU driver checks the
> > guest request and copies them into the command queue. Whether that request
> > comes in the form of a structure binary-compatible with Arm SMMUvX.Y, or
> > some generic structure, does not make a difference.
> 
> It is not the structure layouts that matter!
> 
> It is the semantic meaning of each request, on each unique piece of
> hardware. We actually want to leak the subtle semantic differences to
> userspace.
These are hardware emulations, of course they have to know about hardware
semantics. The QEMU IOMMUs can work in TCG mode where they decode and
handle everything themselves.
Thanks,
Jean
- * Re: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-09 21:01         ` Jason Gunthorpe
  2023-03-10 12:16           ` Jean-Philippe Brucker
@ 2023-03-10 14:52           ` Robin Murphy
  2023-03-10 15:25             ` Jason Gunthorpe
  1 sibling, 1 reply; 165+ messages in thread
From: Robin Murphy @ 2023-03-10 14:52 UTC (permalink / raw)
  To: Jason Gunthorpe, Jean-Philippe Brucker
  Cc: Nicolin Chen, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, linux-arm-kernel, iommu, linux-kernel,
	yi.l.liu
On 2023-03-09 21:01, Jason Gunthorpe wrote:
>> For a lot of SMMUv3 implementations that have a single queue and for
>> other architectures, we can do better than hardware emulation.
> 
> How is using a SW emulated virtio formatted queue better than using a
> SW emulated SMMUv3 ECMDQ?
Since it's not been said, the really big thing is that virtio explicitly 
informs the host whenever the guest maps something. Emulating SMMUv3 
means the host has to chase all the pagetable pointers in guest memory 
and trap writes such that it has visibility of invalid->valid 
transitions and can update the physical shadow pagetable correspondingly.
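As a rough illustration of the first half of that point, the sketch
below shows a host keeping its shadow mappings up to date purely from
explicit guest MAP/UNMAP requests, with no pagetable walking or write
trapping. All names are hypothetical and a flat array stands in for a
real pagetable; this is not the virtio-iommu implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_MAPPINGS 16

/* One guest-requested mapping: [iova, iova + len) -> [pa, pa + len). */
struct sketch_mapping {
	uint64_t iova;
	uint64_t pa;
	uint64_t len;
};

/* Host-side shadow state, updated only by explicit requests. */
struct sketch_shadow {
	struct sketch_mapping map[SKETCH_MAX_MAPPINGS];
	size_t nr;
};

/* Handle an explicit MAP request from the guest. */
int sketch_map(struct sketch_shadow *s, uint64_t iova, uint64_t pa,
	       uint64_t len)
{
	assert(s && s->nr <= SKETCH_MAX_MAPPINGS);

	/* Reject empty ranges and ranges that wrap around. */
	if (len == 0 || iova + len - 1 < iova || pa + len - 1 < pa)
		return -EINVAL;

	for (size_t i = 0; i < s->nr; i++) {
		const struct sketch_mapping *m = &s->map[i];

		/* Overlap test on inclusive last addresses. */
		if (iova <= m->iova + m->len - 1 && m->iova <= iova + len - 1)
			return -EEXIST;
	}
	if (s->nr == SKETCH_MAX_MAPPINGS)
		return -ENOSPC;

	s->map[s->nr++] = (struct sketch_mapping){ iova, pa, len };
	return 0;
}

/* Handle an explicit UNMAP request; only exact ranges are accepted. */
int sketch_unmap(struct sketch_shadow *s, uint64_t iova, uint64_t len)
{
	for (size_t i = 0; i < s->nr; i++) {
		if (s->map[i].iova == iova && s->map[i].len == len) {
			/* Fill the hole with the last entry. */
			s->map[i] = s->map[--s->nr];
			return 0;
		}
	}
	return -ENOENT;
}

/* Look up an IOVA in the shadow state. */
int sketch_translate(const struct sketch_shadow *s, uint64_t iova,
		     uint64_t *pa)
{
	for (size_t i = 0; i < s->nr; i++) {
		const struct sketch_mapping *m = &s->map[i];

		if (iova >= m->iova && iova - m->iova < m->len) {
			*pa = m->pa + (iova - m->iova);
			return 0;
		}
	}
	return -ENOENT;
}
```

With an emulated SMMUv3 there is no equivalent of sketch_map() being
called by the guest, which is why the host has to discover
invalid->valid transitions itself.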
FWIW we spent quite some time on and off discussing something like 
VT-d's "caching mode", but never found a convincing argument that it was 
a gap which needed filling, since we already had hardware nesting for 
maximum performance and a paravirtualisation option for efficient 
emulation. Thus full SMMUv3 emulation seems to just sit at the bottom as 
the maximum-compatibility option for pushing an unmodified legacy 
bare-metal software stack into a VM where nesting isn't available.
Cheers,
Robin.
- * Re: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-10 14:52           ` Robin Murphy
@ 2023-03-10 15:25             ` Jason Gunthorpe
  2023-03-10 15:57               ` Robin Murphy
  0 siblings, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-10 15:25 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Jean-Philippe Brucker, Nicolin Chen, will, eric.auger, kevin.tian,
	baolu.lu, joro, shameerali.kolothum.thodi, linux-arm-kernel,
	iommu, linux-kernel, yi.l.liu
On Fri, Mar 10, 2023 at 02:52:42PM +0000, Robin Murphy wrote:
> On 2023-03-09 21:01, Jason Gunthorpe wrote:
> > > For a lot of SMMUv3 implementations that have a single queue and for
> > > other architectures, we can do better than hardware emulation.
> > 
> > How is using a SW emulated virtio formatted queue better than using a
> > SW emulated SMMUv3 ECMDQ?
> 
> Since it's not been said, the really big thing is that virtio explicitly
> informs the host whenever the guest maps something. Emulating SMMUv3 means
> the host has to chase all the pagetable pointers in guest memory and trap
> writes such that it has visibility of invalid->valid transitions and can
> update the physical shadow pagetable correspondingly.
Sorry, I mean in the context of future virtio-iommu that is providing
nested translation.
eg why would anyone want to use virtio to provide SMMUv3 based HW
accelerated nesting?
Jean suggested that the invalidation flow for virtio-iommu could be
faster because it is in kernel, but I'm saying that we could also make
the SMMUv3 invalidation in-kernel with the same basic technique. (and
actively wondering if we should put more focus on that)
I understand the appeal of the virtio scheme with its current
map/unmap interface.
I could also see some appeal of a simple virtio-iommu SVA that could
point map a CPU page table as an option. The guest already has to know
how to manage these anyhow so it is nicely general.
If iommufd could provide a general cross-driver API to set exactly
that scenario up then VMM code could also be general. That seems
pretty interesting.
But if the plan is to expose more detailed stuff like the CD or GCR3
PASID tables as something the guest has to manipulate and then a bunch
of special invalidation to support that, and VMM code to back it, then
I'm questioning the whole point. We lost the generality.
Just use the normal HW accelerated SMMUv3 nesting model instead.
If virtio-iommu SVA is really important for ARM then I'd suggest
SMMUv3 should gain a new HW capability to allow the CD table to be
in hypervisor memory so it works consistently for virtio-iommu SVA.
Jason
- * Re: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-10 15:25             ` Jason Gunthorpe
@ 2023-03-10 15:57               ` Robin Murphy
  2023-03-10 16:03                 ` Jason Gunthorpe
  2023-03-17 10:04                 ` Tian, Kevin
  0 siblings, 2 replies; 165+ messages in thread
From: Robin Murphy @ 2023-03-10 15:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jean-Philippe Brucker, Nicolin Chen, will, eric.auger, kevin.tian,
	baolu.lu, joro, shameerali.kolothum.thodi, linux-arm-kernel,
	iommu, linux-kernel, yi.l.liu
On 2023-03-10 15:25, Jason Gunthorpe wrote:
> On Fri, Mar 10, 2023 at 02:52:42PM +0000, Robin Murphy wrote:
>> On 2023-03-09 21:01, Jason Gunthorpe wrote:
>>>> For a lot of SMMUv3 implementations that have a single queue and for
>>>> other architectures, we can do better than hardware emulation.
>>>
>>> How is using a SW emulated virtio formatted queue better than using a
>>> SW emulated SMMUv3 ECMDQ?
>>
>> Since it's not been said, the really big thing is that virtio explicitly
>> informs the host whenever the guest maps something. Emulating SMMUv3 means
>> the host has to chase all the pagetable pointers in guest memory and trap
>> writes such that it has visibility of invalid->valid transitions and can
>> update the physical shadow pagetable correspondingly.
> 
> Sorry, I mean in the context of future virtio-iommu that is providing
> nested translation.
Ah, that's probably me missing the context again.
> eg why would anyone want to use virtio to provide SMMUv3 based HW
> accelerated nesting?
> 
> Jean suggested that the invalidation flow for virtio-iommu could be
> faster because it is in kernel, but I'm saying that we could also make
> the SMMUv3 invalidation in-kernel with the same basic technique. (and
> actively wondering if we should put more focus on that)
> 
> I understand the appeal of the virtio scheme with its current
> map/unmap interface.
> 
> I could also see some appeal of a simple virtio-iommu SVA that could
> point map a CPU page table as an option. The guest already has to know
> how to manage these anyhow so it is nicely general.
> 
> If iommufd could provide a general cross-driver API to set exactly
> that scenario up then VMM code could also be general. That seems
> pretty interesting.
Indeed, I've always assumed the niche for virtio would be that kind of 
in-between use-case using nesting to accelerate simple translation, 
where we plug a guest-owned pagetable into a host-owned context. That 
way the guest retains the simple virtio interface and only needs to 
understand a pagetable format (or as you say, simply share a CPU 
pagetable) without having to care about the nitty-gritty of all the 
IOMMU-specific moving parts around it. For guests that want to get into 
more advanced stuff like managing their own PASID tables, pushing them 
towards "native" nesting probably does make more sense.
> But if the plan is to expose more detailed stuff like the CD or GCR3
> PASID tables as something the guest has to manipulate and then a bunch
> of special invalidation to support that, and VMM code to back it, then
> I'm questioning the whole point. We lost the generality.
> 
> Just use the normal HW accelerated SMMUv3 nesting model instead.
> 
> If virtio-iommu SVA is really important for ARM then I'd suggest
> SMMUv3 should gain a new HW capability to allow the CD table to be
> in hypervisor memory so it works consistently for virtio-iommu SVA.
Oh, maybe I should have read this far before reasoning the exact same 
thing from scratch... oh well, this time I'm not going to go back and 
edit :)
Thanks,
Robin.
- * Re: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-10 15:57               ` Robin Murphy
@ 2023-03-10 16:03                 ` Jason Gunthorpe
  2023-03-17 10:10                   ` Tian, Kevin
  2023-03-17 10:04                 ` Tian, Kevin
  1 sibling, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-10 16:03 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Jean-Philippe Brucker, Nicolin Chen, will, eric.auger, kevin.tian,
	baolu.lu, joro, shameerali.kolothum.thodi, linux-arm-kernel,
	iommu, linux-kernel, yi.l.liu
On Fri, Mar 10, 2023 at 03:57:27PM +0000, Robin Murphy wrote:
> about the nitty-gritty of all the IOMMU-specific moving parts around it. For
> guests that want to get into more advanced stuff like managing their own
> PASID tables, pushing them towards "native" nesting probably does make more
> sense.
IMHO with the simplified virtio model I would say the guest should
not have its own PASID table.
Use a hypervisor trap to install a PASID and let the hypervisor driver
handle this abstractly. If abstraction is the whole point and benefit
then virtio should lean into that.
This also means virtio protocol doesn't do PASID invalidation. It
invalidates an ASID and the hypervisor takes care of whatever it is
connected to. Very simple and general for the VM.
Adding a S1 iommu_domain op for invalidate address range is perfectly
fine and the virtio kernel hypervisor driver can call it generically.
The primary reason to have guest-owned PASID tables is CC stuff, which
definitely won't be part of virtio-iommu.
Jason
- * RE: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-10 16:03                 ` Jason Gunthorpe
@ 2023-03-17 10:10                   ` Tian, Kevin
  0 siblings, 0 replies; 165+ messages in thread
From: Tian, Kevin @ 2023-03-17 10:10 UTC (permalink / raw)
  To: Jason Gunthorpe, Robin Murphy
  Cc: Jean-Philippe Brucker, Nicolin Chen, will@kernel.org,
	eric.auger@redhat.com, baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, Liu, Yi L
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, March 11, 2023 12:03 AM
> 
> On Fri, Mar 10, 2023 at 03:57:27PM +0000, Robin Murphy wrote:
> 
> > about the nitty-gritty of all the IOMMU-specific moving parts around it. For
> > guests that want to get into more advanced stuff like managing their own
> > PASID tables, pushing them towards "native" nesting probably does make
> more
> > sense.
> 
> IMHO with the simplified virtio model I would say the guest should
> not have its own PASID table.
> 
> Use a hypervisor trap to install a PASID and let the hypervisor driver
> handle this abstractly. If abstraction is the whole point and benefit
> then virtio should lean into that.
> 
> This also means virtio protocol doesn't do PASID invalidation. It
> invalidates an ASID and the hypervisor takes care of whatever it is
> connected to. Very simple and general for the VM.
This sounds fair, if ASID here refers to a general ID identifying the
page table instead of the ARM specific ASID. 😊
But the guest still needs to manage the PASID and program the PASID
into the assigned device to tag DMA.
> 
> Adding a S1 iommu_domain op for invalidate address range is perfectly
> fine and the virtio kernel hypervisor driver can call it generically.
> 
> The primary reason to have guest-owned PASID tables is CC stuff, which
> definitely won't be part of virtio-iommu.
> 
This fits Intel well.
- * RE: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-10 15:57               ` Robin Murphy
  2023-03-10 16:03                 ` Jason Gunthorpe
@ 2023-03-17 10:04                 ` Tian, Kevin
  1 sibling, 0 replies; 165+ messages in thread
From: Tian, Kevin @ 2023-03-17 10:04 UTC (permalink / raw)
  To: Robin Murphy, Jason Gunthorpe
  Cc: Jean-Philippe Brucker, Nicolin Chen, will@kernel.org,
	eric.auger@redhat.com, baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, Liu, Yi L
> From: Robin Murphy <robin.murphy@arm.com>
> Sent: Friday, March 10, 2023 11:57 PM
> 
> >
> > If iommufd could provide a general cross-driver API to set exactly
> > that scenario up then VMM code could also be general. That seems
> > pretty interesting.
> 
> Indeed, I've always assumed the niche for virtio would be that kind of
> in-between use-case using nesting to accelerate simple translation,
> where we plug a guest-owned pagetable into a host-owned context. That
> way the guest retains the simple virtio interface and only needs to
> understand a pagetable format (or as you say, simply share a CPU
> pagetable) without having to care about the nitty-gritty of all the
> IOMMU-specific moving parts around it. For guests that want to get into
> more advanced stuff like managing their own PASID tables, pushing them
> towards "native" nesting probably does make more sense.
> 
The interesting thing is that we cannot expose both virtio-iommu and
an emulated vIOMMU to one guest to choose from. So if the guest has
been using virtio-iommu for whatever reason, naturally it may
want more advanced features on it too.
- * Re: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-09 14:48     ` Jason Gunthorpe
  2023-03-09 18:26       ` Jean-Philippe Brucker
@ 2023-03-10  4:50       ` Nicolin Chen
  2023-03-10 12:54         ` Jean-Philippe Brucker
  2023-03-10 16:06         ` Jason Gunthorpe
  1 sibling, 2 replies; 165+ messages in thread
From: Nicolin Chen @ 2023-03-10  4:50 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jean-Philippe Brucker, robin.murphy, will, eric.auger, kevin.tian,
	baolu.lu, joro, shameerali.kolothum.thodi, linux-arm-kernel,
	iommu, linux-kernel, yi.l.liu
On Thu, Mar 09, 2023 at 10:48:50AM -0400, Jason Gunthorpe wrote:
> Nicolin, I think we should tweak the uAPI here so that the
> invalidation opaque data has a format tagged on its own, instead of
> re-using the HWPT tag. Ie you can have a ARM SMMUv3 invalidate type
> tag and also a virtio-viommu invalidate type tag.
The invalidation tag is shared with the hwpt allocation. Does
it mean that virtio-iommu won't have its own allocation tag?
Thanks
Nic
- * Re: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-10  4:50       ` Nicolin Chen
@ 2023-03-10 12:54         ` Jean-Philippe Brucker
  2023-03-10 14:00           ` Jason Gunthorpe
  2023-03-10 16:06         ` Jason Gunthorpe
  1 sibling, 1 reply; 165+ messages in thread
From: Jean-Philippe Brucker @ 2023-03-10 12:54 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Jason Gunthorpe, robin.murphy, will, eric.auger, kevin.tian,
	baolu.lu, joro, shameerali.kolothum.thodi, linux-arm-kernel,
	iommu, linux-kernel, yi.l.liu
On Thu, Mar 09, 2023 at 08:50:52PM -0800, Nicolin Chen wrote:
> On Thu, Mar 09, 2023 at 10:48:50AM -0400, Jason Gunthorpe wrote:
> 
> > Nicolin, I think we should tweak the uAPI here so that the
> > invalidation opaque data has a format tagged on its own, instead of
> > re-using the HWPT tag. Ie you can have a ARM SMMUv3 invalidate type
> > tag and also a virtio-viommu invalidate type tag.
> 
> The invalidation tag is shared with the hwpt allocation. Does
> it mean that virtio-iommu won't have its own allocation tag?
I'm not entirely sure what you mean by allocation tag. For example with
SMMU, when attaching page tables (SMMUv2), the guest passes an ASID at
allocation, and when it modifies that address space it passes the same
ASID for invalidation. When attaching PASID tables (SMMUv3), it writes the
ASID/PASID in the PASID table, and passes both in the invalidation.
Note that none of this is set in stone. It copies the Linux API we
originally discussed, but we were waiting for progress on that front
before committing to anything. Now we'll probably align to the new API
where possible, leaving out what doesn't work for virtio-iommu.
Thanks,
Jean
- * Re: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-10 12:54         ` Jean-Philippe Brucker
@ 2023-03-10 14:00           ` Jason Gunthorpe
  0 siblings, 0 replies; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-10 14:00 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Nicolin Chen, robin.murphy, will, eric.auger, kevin.tian,
	baolu.lu, joro, shameerali.kolothum.thodi, linux-arm-kernel,
	iommu, linux-kernel, yi.l.liu
On Fri, Mar 10, 2023 at 12:54:53PM +0000, Jean-Philippe Brucker wrote:
> On Thu, Mar 09, 2023 at 08:50:52PM -0800, Nicolin Chen wrote:
> > On Thu, Mar 09, 2023 at 10:48:50AM -0400, Jason Gunthorpe wrote:
> > 
> > > Nicolin, I think we should tweak the uAPI here so that the
> > > invalidation opaque data has a format tagged on its own, instead of
> > > re-using the HWPT tag. Ie you can have a ARM SMMUv3 invalidate type
> > > tag and also a virtio-viommu invalidate type tag.
> > 
> > The invalidation tag is shared with the hwpt allocation. Does
> > it mean that virtio-iommu won't have its own allocation tag?
> 
> I'm not entirely sure what you mean by allocation tag. 
He means the tag identifying the allocation driver specific data is
the same tag that is passed in to identify the invalidation driver
specific data, with the notion that the allocation data and
invalidation data would be in the same driver's format.
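Purely as an illustration of the alternative raised earlier in the
thread, where the invalidation data carries its own type tag instead of
inheriting the HWPT allocation tag, a dispatch could look roughly like
the sketch below. None of these names or layouts come from the series or
from iommufd; they are hypothetical placeholders.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical invalidation format tags, separate from any HWPT tag. */
enum sketch_inval_type {
	SKETCH_INVAL_ARM_SMMUV3 = 1,
	SKETCH_INVAL_VIRTIO = 2,
};

/* Placeholder payload layouts, one per format. */
struct sketch_smmuv3_inval { uint64_t cmd[2]; };
struct sketch_virtio_inval { uint32_t domain; uint32_t flags; };

/* What userspace would pass in: a format tag plus opaque data. */
struct sketch_inval_req {
	uint32_t inval_type;
	uint32_t data_len;
	const void *data;
};

struct sketch_inval_handler {
	uint32_t inval_type;
	uint32_t data_len;
	int (*handle)(const void *data);
};

/* A HWPT may accept more than one invalidation format. */
struct sketch_hwpt {
	const struct sketch_inval_handler *handlers;
	size_t nr_handlers;
};

static int sketch_native_calls, sketch_virtio_calls;

static int sketch_handle_native(const void *data)
{
	(void)data;
	sketch_native_calls++;
	return 0;
}

static int sketch_handle_virtio(const void *data)
{
	(void)data;
	sketch_virtio_calls++;
	return 0;
}

static const struct sketch_inval_handler sketch_smmuv3_handlers[] = {
	{ SKETCH_INVAL_ARM_SMMUV3, sizeof(struct sketch_smmuv3_inval),
	  sketch_handle_native },
	{ SKETCH_INVAL_VIRTIO, sizeof(struct sketch_virtio_inval),
	  sketch_handle_virtio },
};

const struct sketch_hwpt sketch_smmuv3_hwpt = {
	.handlers = sketch_smmuv3_handlers,
	.nr_handlers = 2,
};

/* Dispatch on the invalidation tag, not on how the HWPT was allocated. */
int sketch_invalidate(const struct sketch_hwpt *hwpt,
		      const struct sketch_inval_req *req)
{
	assert(hwpt && req);

	for (size_t i = 0; i < hwpt->nr_handlers; i++) {
		const struct sketch_inval_handler *h = &hwpt->handlers[i];

		if (h->inval_type != req->inval_type)
			continue;
		if (h->data_len != req->data_len)
			return -EINVAL;
		return h->handle(req->data);
	}
	return -EOPNOTSUPP;
}
```

The only point of the sketch is that the set of accepted invalidation
formats is looked up independently of the allocation format.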
> Note that none of this is set in stone. It copies the Linux API we
> originally discussed, but we were waiting for progress on that front
> before committing to anything. Now we'll probably align to the new API
> where possible, leaving out what doesn't work for virtio-iommu.
IMHO virtio-iommu should stand alone and make sense with its own
internal object model.
eg I would probably try not to have guests invalidate PASID. Have a
strong ASID model and in most cases have the hypervisor track where
the ASIDs are mapped to PASID/etc and rely on the hypervisor to spew
the invalidations to PASID as required. It is more abstracted from the
actual HW for the guest. The guest can simply say it changed an IOPTE
under a certain ASID.
The ugly wrinkle is SMMUv3 but perhaps your idea of allowing the
hypervisor to manage the CD table in guest memory is reasonable.
IMHO it is a missing SMMUv3 HW feature that the CD table doesn't have
the option to be in hypervisor memory. AMD allows both options - so
I'm not sure I would invest a huge amount to make special cases to
support this... Assume a SMMUv3 update might gain the option someday.
Jason
- * Re: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-10  4:50       ` Nicolin Chen
  2023-03-10 12:54         ` Jean-Philippe Brucker
@ 2023-03-10 16:06         ` Jason Gunthorpe
  2023-03-16  0:59           ` Nicolin Chen
  1 sibling, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-10 16:06 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Jean-Philippe Brucker, robin.murphy, will, eric.auger, kevin.tian,
	baolu.lu, joro, shameerali.kolothum.thodi, linux-arm-kernel,
	iommu, linux-kernel, yi.l.liu
On Thu, Mar 09, 2023 at 08:50:52PM -0800, Nicolin Chen wrote:
> On Thu, Mar 09, 2023 at 10:48:50AM -0400, Jason Gunthorpe wrote:
> 
> > Nicolin, I think we should tweak the uAPI here so that the
> > invalidation opaque data has a format tagged on its own, instead of
> > re-using the HWPT tag. I.e. you can have an ARM SMMUv3 invalidate type
> > tag and also a virtio-iommu invalidate type tag.
> 
> The invalidation tag is shared with the hwpt allocation. Does
> it mean that virtio-iommu won't have its own allocation tag?
We probably shouldn't assume it will.
Jason
- * Re: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-10 16:06         ` Jason Gunthorpe
@ 2023-03-16  0:59           ` Nicolin Chen
  0 siblings, 0 replies; 165+ messages in thread
From: Nicolin Chen @ 2023-03-16  0:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jean-Philippe Brucker, robin.murphy, will, eric.auger, kevin.tian,
	baolu.lu, joro, shameerali.kolothum.thodi, linux-arm-kernel,
	iommu, linux-kernel, yi.l.liu
On Fri, Mar 10, 2023 at 12:06:18PM -0400, Jason Gunthorpe wrote:
> On Thu, Mar 09, 2023 at 08:50:52PM -0800, Nicolin Chen wrote:
> > On Thu, Mar 09, 2023 at 10:48:50AM -0400, Jason Gunthorpe wrote:
> > 
> > > Nicolin, I think we should tweak the uAPI here so that the
> > > invalidation opaque data has a format tagged on its own, instead of
> > > re-using the HWPT tag. I.e. you can have an ARM SMMUv3 invalidate type
> > > tag and also a virtio-iommu invalidate type tag.
> > 
> > The invalidation tag is shared with the hwpt allocation. Does
> > it mean that virtio-iommu won't have its own allocation tag?
> 
> We probably shouldn't assume it will.
In that case, why do we need an invalidation tag/type on its
own? Can't we use an IOMMU_HWPT_TYPE_VIRTIO tag for allocation
and invalidation together for virtio?
Or did you mean that we should define a flag inside the data
structure like this?
struct iommu_hwpt_invalidate_arm_smmuv3 {
#define IOMMU_SMMUV3_CMDQ_TLBI_VA_LEAF  (1 << 0)
#define IOMMU_SMMUV3_FORMAT_VIRTIO	(1ULL << 63)	/* 64-bit constant: flags is __u64 */
	__u64 flags;
};
Thanks
Nic
- * RE: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-09 13:42   ` Jean-Philippe Brucker
  2023-03-09 14:48     ` Jason Gunthorpe
@ 2023-03-09 15:26     ` Shameerali Kolothum Thodi
  2023-03-09 15:40       ` Jason Gunthorpe
  2023-03-10  5:04     ` Nicolin Chen
  2023-03-10 11:33     ` Eric Auger
  3 siblings, 1 reply; 165+ messages in thread
From: Shameerali Kolothum Thodi @ 2023-03-09 15:26 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Nicolin Chen
  Cc: jgg@nvidia.com, robin.murphy@arm.com, will@kernel.org,
	eric.auger@redhat.com, kevin.tian@intel.com,
	baolu.lu@linux.intel.com, joro@8bytes.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, yi.l.liu@intel.com
On 09 March 2023 13:42, Jean-Philippe Brucker wrote:
> Hi Nicolin,
> 
> On Thu, Mar 09, 2023 at 02:53:38AM -0800, Nicolin Chen wrote:
> > Add the following data structures for corresponding ioctls:
> >                iommu_hwpt_arm_smmuv3 => IOMMUFD_CMD_HWPT_ALLOC
> >     iommu_hwpt_invalidate_arm_smmuv3 => IOMMUFD_CMD_HWPT_INVALIDATE
> >
> > Also, add IOMMU_HW_INFO_TYPE_ARM_SMMUV3 and IOMMU_PGTBL_TYPE_ARM_SMMUV3_S1
> > to the header and corresponding type/size arrays.
> >
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> 
> > +/**
> > + * struct iommu_hwpt_arm_smmuv3 - ARM SMMUv3 specific page table data
> > + *
> > + * @flags: page table entry attributes
> > + * @s2vmid: Virtual machine identifier
> > + * @s1ctxptr: Stage-1 context descriptor pointer
> > + * @s1cdmax: Number of CDs pointed to by s1ContextPtr
> > + * @s1fmt: Stage-1 Format
> > + * @s1dss: Default substream
> > + */
> > +struct iommu_hwpt_arm_smmuv3 {
> > +#define IOMMU_SMMUV3_FLAG_S2	(1 << 0) /* if unset, stage-1 */
> 
> I don't understand the purpose of this flag, since the structure only
> provides stage-1 configuration fields
> 
> > +#define IOMMU_SMMUV3_FLAG_VMID	(1 << 1) /* vmid override */
> 
> Doesn't this break isolation?  The VMID space is global for the SMMU, so
> the guest could access cached mappings of another device
On platforms that support BTM [1], we may need the VMID allocated by KVM.
But again, getting that from userspace doesn't look safe. I have attempted to revise
the earlier RFC to pin and use the KVM VMID from SMMUv3 here [2].
But the problem is getting the KVM instance associated with the device. Currently I am
going through the VFIO layer to retrieve the KVM instance (vfio_device->kvm).
On the previous RFC discussion thread [3], Jean proposed:
" In the new design we can require from the start that creating a nesting IOMMU
container through /dev/iommu *must* come with a KVM context, that way
we're sure to reuse the existing VMID. "
Is that something we can still do, or is there a better way to handle this now?
Thanks,
Shameer
1. https://lore.kernel.org/linux-arm-kernel/YEEUocRn3IfIDpLj@myrica/T/#m478f7e7d5dcb729e02721beda35efa12c1d20707
2. https://github.com/hisilicon/kernel-dev/commits/iommufd-v6.2-rc4-nesting-arm-btm-v2
3. https://lore.kernel.org/linux-arm-kernel/YEEUocRn3IfIDpLj@myrica/T/#m11cde7534943ea7cf35f534cb809a023eabd9da3
- * Re: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-09 15:26     ` Shameerali Kolothum Thodi
@ 2023-03-09 15:40       ` Jason Gunthorpe
  2023-03-09 15:51         ` Shameerali Kolothum Thodi
  2023-03-10  5:18         ` Nicolin Chen
  0 siblings, 2 replies; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-09 15:40 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: Jean-Philippe Brucker, Nicolin Chen, robin.murphy@arm.com,
	will@kernel.org, eric.auger@redhat.com, kevin.tian@intel.com,
	baolu.lu@linux.intel.com, joro@8bytes.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, yi.l.liu@intel.com
On Thu, Mar 09, 2023 at 03:26:12PM +0000, Shameerali Kolothum Thodi wrote:
> On platforms that support BTM [1], we may need the VMID allocated by KVM.
> But again, getting that from userspace doesn't look safe. I have attempted to revise
> the earlier RFC to pin and use the KVM VMID from SMMUv3 here [2].
Gurk
> " In the new design we can require from the start that creating a nesting IOMMU
> container through /dev/iommu *must* come with a KVM context, that way
> we're sure to reuse the existing VMID. "
I've been dreading this but yes I expect we will eventually need to
connect kvm and iommufd together. The iommu driver can receive a kvm
pointer as part of the alloc domain operation to do any setups like
this.
If there is no KVM it should either fail to set up the domain or set up
a domain disconnected from KVM.
If IOMMU HW and KVM HW are using the same ID number space then
arguably the two kernel drivers need to use a shared ID allocator in
the arch, regardless of what iommufd/etc does. Using KVM should not be
mandatory for iommufd.
For ARM cases where there is no shared VMID space with KVM, the ARM
VMID should be somehow assigned to the iommufd_ctx itself and the alloc
domain op should receive it from there.
Nicolin, that seems to be missing in this series? I'm not entirely
sure how to elegantly code it :\
Jason
- * RE: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-09 15:40       ` Jason Gunthorpe
@ 2023-03-09 15:51         ` Shameerali Kolothum Thodi
  2023-03-09 15:59           ` Jason Gunthorpe
  2023-03-10  5:18         ` Nicolin Chen
  1 sibling, 1 reply; 165+ messages in thread
From: Shameerali Kolothum Thodi @ 2023-03-09 15:51 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jean-Philippe Brucker, Nicolin Chen, robin.murphy@arm.com,
	will@kernel.org, eric.auger@redhat.com, kevin.tian@intel.com,
	baolu.lu@linux.intel.com, joro@8bytes.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, yi.l.liu@intel.com
On 09 March 2023 15:40, Jason Gunthorpe wrote:
> On Thu, Mar 09, 2023 at 03:26:12PM +0000, Shameerali Kolothum Thodi wrote:
>
> > On platforms that support BTM [1], we may need the VMID allocated by KVM.
> > But again, getting that from userspace doesn't look safe. I have attempted to revise
> > the earlier RFC to pin and use the KVM VMID from SMMUv3 here [2].
>
> Gurk
>
> > " In the new design we can require from the start that creating a nesting IOMMU
> > container through /dev/iommu *must* come with a KVM context, that way
> > we're sure to reuse the existing VMID. "
>
> I've been dreading this but yes I expect we will eventually need to
> connect kvm and iommufd together. The iommu driver can receive a kvm
> pointer as part of the alloc domain operation to do any setups like
> this.
That will make life easier :)
 
> If there is no KVM it should either fail to set up the domain or set up
> a domain disconnected from KVM.
>
If there is no KVM, the SMMUv3 can fall back to its internal VMID allocation, I guess.
And my intention was to use the KVM VMID only if the platform supports
BTM.
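Roughly, as a sketch only (the types and the trivial counter below are hypothetical stand-ins, not the driver's real structures or allocator), the selection I have in mind is:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's real types. */
struct kvm_ctx { uint16_t vmid; };
struct smmu_dev { bool has_btm; uint16_t next_vmid; };

/* Use the KVM VMID only when a KVM context exists and BTM is supported. */
static uint16_t pick_s2_vmid(struct smmu_dev *smmu, const struct kvm_ctx *kvm)
{
	if (kvm != NULL && smmu->has_btm)
		return kvm->vmid;
	/* Otherwise fall back to the SMMU's own (here: trivial) allocator. */
	return smmu->next_vmid++;
}
```

This ignores the question of the two ID spaces colliding; if they are shared, the fallback allocator would have to be the same one KVM uses.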
> If IOMMU HW and KVM HW are using the same ID number space then
> arguably the two kernel drivers need to use a shared ID allocator in
> the arch, regardless of what iommufd/etc does. Using KVM should not be
> mandatory for iommufd.
> 
> For ARM cases where there is no shared VMID space with KVM, the ARM
> VMID should be somehow assigned to the iommufd_ctx itself and the alloc
> domain op should receive it from there.
Is there any use of VMID outside SMMUv3? I was thinking that if the nested domain alloc
doesn't provide the KVM instance, then SMMUv3 can use its internal VMID.
Thanks,
Shameer
> Nicolin, that seems to be missing in this series? I'm not entirely
> sure how to elegantly code it :\
> 
> Jason
- * Re: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-09 15:51         ` Shameerali Kolothum Thodi
@ 2023-03-09 15:59           ` Jason Gunthorpe
  2023-03-09 16:07             ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-09 15:59 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: Jean-Philippe Brucker, Nicolin Chen, robin.murphy@arm.com,
	will@kernel.org, eric.auger@redhat.com, kevin.tian@intel.com,
	baolu.lu@linux.intel.com, joro@8bytes.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, yi.l.liu@intel.com
On Thu, Mar 09, 2023 at 03:51:42PM +0000, Shameerali Kolothum Thodi wrote:
> > For ARM cases where there is no shared VMID space with KVM, the ARM
> > VMID should be somehow assigned to the iommufd_ctx itself and the alloc
> > domain op should receive it from there.
> 
> Is there any use of VMID outside SMMUv3? I was thinking if nested domain alloc
> doesn't provide the KVM instance, then SMMUv3 can use its internal VMID. 
When we talk about exposing an SMMUv3 IOMMU CMDQ directly to userspace then
VMID is the security token that protects it.
So in that environment every domain under the same iommufd should
share the same VMID so that the CMDQs also share the same VMID.
I expect this to be a common sort of requirement as we will see
userspace command queues in the other HW as well.
So, I suppose the answer for now is that ARM SMMUv3 should just
allocate one VMID per iommu_domain and there should be no VMID in the
uapi at all.
Moving all iommu_domains to share the same VMID is a future patch.
Though.. I have no idea how vVMID is handled in the SMMUv3
architecture. I suppose the guest IOMMU HW caps are set in a way that
it knows it does not have VMID?
Jason
- * RE: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-09 15:59           ` Jason Gunthorpe
@ 2023-03-09 16:07             ` Shameerali Kolothum Thodi
  2023-03-10  5:26               ` Nicolin Chen
  0 siblings, 1 reply; 165+ messages in thread
From: Shameerali Kolothum Thodi @ 2023-03-09 16:07 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jean-Philippe Brucker, Nicolin Chen, robin.murphy@arm.com,
	will@kernel.org, eric.auger@redhat.com, kevin.tian@intel.com,
	baolu.lu@linux.intel.com, joro@8bytes.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, yi.l.liu@intel.com
On 09 March 2023 16:00, Jason Gunthorpe wrote:
> On Thu, Mar 09, 2023 at 03:51:42PM +0000, Shameerali Kolothum Thodi wrote:
>
> > > For ARM cases where there is no shared VMID space with KVM, the ARM
> > > VMID should be somehow assigned to the iommufd_ctx itself and the alloc
> > > domain op should receive it from there.
> >
> > Is there any use of VMID outside SMMUv3? I was thinking if nested domain alloc
> > doesn't provide the KVM instance, then SMMUv3 can use its internal VMID.
>
> When we talk about exposing an SMMUv3 IOMMU CMDQ directly to userspace then
> VMID is the security token that protects it.
>
> So in that environment every domain under the same iommufd should
> share the same VMID so that the CMDQs also share the same VMID.
> 
> I expect this to be a common sort of requirement as we will see
> userspace command queues in the other HW as well.
> 
> So, I suppose the answer for now is that ARM SMMUv3 should just
> allocate one VMID per iommu_domain and there should be no VMID in the
> uapi at all.
> 
> Moving all iommu_domains to share the same VMID is a future patch.
> 
> Though.. I have no idea how vVMID is handled in the SMMUv3
> architecture. I suppose the guest IOMMU HW caps are set in a way that
> it knows it does not have VMID?
I think the guest only sets up the SMMUv3 S1 stage and it doesn't use VMID.
Thanks,
Shameer
- * Re: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-09 16:07             ` Shameerali Kolothum Thodi
@ 2023-03-10  5:26               ` Nicolin Chen
  2023-03-10  5:36                 ` Nicolin Chen
  0 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-10  5:26 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: Jason Gunthorpe, Jean-Philippe Brucker, robin.murphy@arm.com,
	will@kernel.org, eric.auger@redhat.com, kevin.tian@intel.com,
	baolu.lu@linux.intel.com, joro@8bytes.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, yi.l.liu@intel.com
On Thu, Mar 09, 2023 at 04:07:54PM +0000, Shameerali Kolothum Thodi wrote:
> I think the guest only sets up the SMMUv3 S1 stage and it doesn't use VMID.
Yea, a vmid is only allocated in an S2 domain allocation. So,
a guest allocating only S1 domains always sets VMID=0. Yet, I
think that the hypervisor or somewhere in the host kernel should
replace the VMID=0 with a unified VMID.
Thanks
Nic
- * Re: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-10  5:26               ` Nicolin Chen
@ 2023-03-10  5:36                 ` Nicolin Chen
  2023-03-10 12:55                   ` Jason Gunthorpe
  0 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-10  5:36 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: Jason Gunthorpe, Jean-Philippe Brucker, robin.murphy@arm.com,
	will@kernel.org, eric.auger@redhat.com, kevin.tian@intel.com,
	baolu.lu@linux.intel.com, joro@8bytes.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, yi.l.liu@intel.com
On Thu, Mar 09, 2023 at 09:26:57PM -0800, Nicolin Chen wrote:
> On Thu, Mar 09, 2023 at 04:07:54PM +0000, Shameerali Kolothum Thodi wrote:
> > I think the guest only sets up the SMMUv3 S1 stage and it doesn't use VMID.
> 
> Yea, a vmid is only allocated in an S2 domain allocation. So,
> a guest allocating only S1 domains always sets VMID=0. Yet, I
> think that the hypervisor or somewhere in the host kernel should
> replace the VMID=0 with a unified VMID.
Ah, I just recalled a conversation with Jason that a VM should only
have one S2 domain. In that case, the VMID is already unified?
Thanks
Nic
- * Re: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-10  5:36                 ` Nicolin Chen
@ 2023-03-10 12:55                   ` Jason Gunthorpe
  0 siblings, 0 replies; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-10 12:55 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Shameerali Kolothum Thodi, Jean-Philippe Brucker,
	robin.murphy@arm.com, will@kernel.org, eric.auger@redhat.com,
	kevin.tian@intel.com, baolu.lu@linux.intel.com, joro@8bytes.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, yi.l.liu@intel.com
On Thu, Mar 09, 2023 at 09:36:18PM -0800, Nicolin Chen wrote:
> > Yea, a vmid is only allocated in an S2 domain allocation. So,
> > a guest allocating only S1 domains always sets VMID=0. Yet, I
> > think that the hypervisor or somewhere in the host kernel should
> > replace the VMID=0 with a unified VMID.
> 
> Ah, I just recalled a conversation with Jason that a VM should only
> have one S2 domain. In that case, the VMID is already unified?
Not required per se, but yes, most likely qemu would run that way.
But you can't just re-use the VMID however you like. AFAIK the VMID is
the cache tag for the S2 IOPTEs, so every VMID must refer to the same
S2 translation.
You can't mix different S2's with the same VMID.
Thus you are stuck with the single S2 model in qemu if you want to use
a userspace CMDQ.
I suppose that suggests that if KVM supplies the VMID then it is
assigned to a singular S2 iommu_domain also.
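To spell out the constraint, a sketch only (the table and names are hypothetical, nothing like this exists in the series): whoever hands out VMIDs has to refuse to tag a second S2 with a VMID that already tags a different one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VMID 16 /* tiny table for illustration only */

/* Hypothetical stand-in for an S2 iommu_domain. */
struct s2_domain { int id; };

/* vmid_owner[v] is the single S2 whose IOPTEs may be cached under VMID v. */
static const struct s2_domain *vmid_owner[MAX_VMID];

/* Returns 0 on success, -1 if the VMID is out of range or tags another S2. */
static int bind_vmid(uint16_t vmid, const struct s2_domain *s2)
{
	if (vmid >= MAX_VMID)
		return -1;
	if (vmid_owner[vmid] != NULL && vmid_owner[vmid] != s2)
		return -1;
	vmid_owner[vmid] = s2;
	return 0;
}
```

Re-binding the same S2 is harmless; it is only a different S2 under the same VMID that would let stale cache entries cross translations.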
Jason
- * Re: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-09 15:40       ` Jason Gunthorpe
  2023-03-09 15:51         ` Shameerali Kolothum Thodi
@ 2023-03-10  5:18         ` Nicolin Chen
  1 sibling, 0 replies; 165+ messages in thread
From: Nicolin Chen @ 2023-03-10  5:18 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Shameerali Kolothum Thodi, Jean-Philippe Brucker,
	robin.murphy@arm.com, will@kernel.org, eric.auger@redhat.com,
	kevin.tian@intel.com, baolu.lu@linux.intel.com, joro@8bytes.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, yi.l.liu@intel.com
On Thu, Mar 09, 2023 at 11:40:16AM -0400, Jason Gunthorpe wrote:
> On Thu, Mar 09, 2023 at 03:26:12PM +0000, Shameerali Kolothum Thodi wrote:
> 
> > On platforms that support BTM [1], we may need the VMID allocated by KVM.
> > But again, getting that from userspace doesn't look safe. I have attempted to revise
> > the earlier RFC to pin and use the KVM VMID from SMMUv3 here [2].
> 
> Gurk
> 
> > " In the new design we can require from the start that creating a nesting IOMMU
> > container through /dev/iommu *must* come with a KVM context, that way
> > we're sure to reuse the existing VMID. "
> 
> I've been dreading this but yes I expect we will eventually need to
> connect kvm and iommufd together. The iommu driver can receive a kvm
> pointer as part of the alloc domain operation to do any setups like
> this.
> 
> If there is no KVM it should either fail to set up the domain or set up
> a domain disconnected from KVM.
> 
> If IOMMU HW and KVM HW are using the same ID number space then
> arguably the two kernel drivers need to use a shared ID allocator in
> the arch, regardless of what iommufd/etc does. Using KVM should not be
> mandatory for iommufd.
> 
> For ARM cases where there is no shared VMID space with KVM, the ARM
> VMID should be somehow assigned to the iommufd_ctx itself and the alloc
> domain op should receive it from there.
> 
> Nicolin, that seems to be missing in this series? I'm not entirely
> sure how to elegantly code it :\
Yea, it's missing. The VMID thing is supposed to be a sneak peek
of my next VCMDQ solution. Now it seems that BTM needs this too.
Remember that my previous VCMDQ series had a big complication to
share VMID across the passthrough devices in the same VM? During
that patch review, we concluded that IOMMUFD would simply align
VMIDs using a unified ctx ID or so, IIRC.
Thanks
Nic
- * Re: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-09 13:42   ` Jean-Philippe Brucker
  2023-03-09 14:48     ` Jason Gunthorpe
  2023-03-09 15:26     ` Shameerali Kolothum Thodi
@ 2023-03-10  5:04     ` Nicolin Chen
  2023-03-10 11:33     ` Eric Auger
  3 siblings, 0 replies; 165+ messages in thread
From: Nicolin Chen @ 2023-03-10  5:04 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: jgg, robin.murphy, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, linux-arm-kernel, iommu, linux-kernel,
	yi.l.liu
Hi Jean,
Allow me to reply to part of your email:
On Thu, Mar 09, 2023 at 01:42:17PM +0000, Jean-Philippe Brucker wrote:
> > +struct iommu_hwpt_arm_smmuv3 {
> > +#define IOMMU_SMMUV3_FLAG_S2 (1 << 0) /* if unset, stage-1 */
> 
> I don't understand the purpose of this flag, since the structure only
> provides stage-1 configuration fields
I should probably have described this flag better. It is used
to allocate a stage-2 domain for a nested translation setup.
By default, allocating a kernel-managed domain produces an
S1-format IO page table, at the ARM_SMMU_DOMAIN_S1 stage. But
a nested kernel-managed domain needs an S2 format, at
ARM_SMMU_DOMAIN_S2.

So although the structure seems to provide only stage-1 info,
it's used for both stages. And a stage-2 allocation will only
need s2vmid if the VMID flag is set (explained below).
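As a sketch of the intent (the flag values are copied from the patch; the enum and helper names are local stand-ins, and ignoring s2vmid for a stage-1 allocation is my assumption here):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as posted in the patch under review. */
#define IOMMU_SMMUV3_FLAG_S2	(1 << 0) /* if unset, stage-1 */
#define IOMMU_SMMUV3_FLAG_VMID	(1 << 1) /* vmid override */

/* Local stand-in for the driver's stage enum; values are illustrative. */
enum smmu_stage { STAGE_S1, STAGE_S2 };

/* The S2 flag alone decides which page table format gets allocated. */
static enum smmu_stage stage_from_flags(uint64_t flags)
{
	return (flags & IOMMU_SMMUV3_FLAG_S2) ? STAGE_S2 : STAGE_S1;
}

/* s2vmid is only consulted for a stage-2 allocation with the VMID flag set. */
static bool uses_s2vmid(uint64_t flags)
{
	return stage_from_flags(flags) == STAGE_S2 &&
	       (flags & IOMMU_SMMUV3_FLAG_VMID);
}
```

The s1* fields would then only matter in the stage-1 case.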
> > +#define IOMMU_SMMUV3_FLAG_VMID       (1 << 1) /* vmid override */
> 
> Doesn't this break isolation?  The VMID space is global for the SMMU, so
> the guest could access cached mappings of another device
This flag isn't mature yet. I kept it from my internal RFC to
see if we can have a better solution. There are use cases on
certain platforms where the VMIDs across all devices in the
same VM need to be aligned.
Thanks
Nic
- * Re: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-09 13:42   ` Jean-Philippe Brucker
                       ` (2 preceding siblings ...)
  2023-03-10  5:04     ` Nicolin Chen
@ 2023-03-10 11:33     ` Eric Auger
  2023-03-10 12:51       ` Jason Gunthorpe
  3 siblings, 1 reply; 165+ messages in thread
From: Eric Auger @ 2023-03-10 11:33 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Nicolin Chen
  Cc: jgg, robin.murphy, will, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, linux-arm-kernel, iommu, linux-kernel,
	yi.l.liu
Hi,
On 3/9/23 14:42, Jean-Philippe Brucker wrote:
> Hi Nicolin,
>
> On Thu, Mar 09, 2023 at 02:53:38AM -0800, Nicolin Chen wrote:
>> Add the following data structures for corresponding ioctls:
>>                iommu_hwpt_arm_smmuv3 => IOMMUFD_CMD_HWPT_ALLOC
>>     iommu_hwpt_invalidate_arm_smmuv3 => IOMMUFD_CMD_HWPT_INVALIDATE
>>
>> Also, add IOMMU_HW_INFO_TYPE_ARM_SMMUV3 and IOMMU_PGTBL_TYPE_ARM_SMMUV3_S1
>> to the header and corresponding type/size arrays.
>>
>> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
>> +/**
>> + * struct iommu_hwpt_arm_smmuv3 - ARM SMMUv3 specific page table data
>> + *
>> + * @flags: page table entry attributes
>> + * @s2vmid: Virtual machine identifier
>> + * @s1ctxptr: Stage-1 context descriptor pointer
>> + * @s1cdmax: Number of CDs pointed to by s1ContextPtr
>> + * @s1fmt: Stage-1 Format
>> + * @s1dss: Default substream
>> + */
>> +struct iommu_hwpt_arm_smmuv3 {
>> +#define IOMMU_SMMUV3_FLAG_S2	(1 << 0) /* if unset, stage-1 */
> I don't understand the purpose of this flag, since the structure only
> provides stage-1 configuration fields
>
>> +#define IOMMU_SMMUV3_FLAG_VMID	(1 << 1) /* vmid override */
> Doesn't this break isolation?  The VMID space is global for the SMMU, so
> the guest could access cached mappings of another device
>
>> +	__u64 flags;
>> +	__u32 s2vmid;
>> +	__u32 __reserved;
>> +	__u64 s1ctxptr;
>> +	__u64 s1cdmax;
>> +	__u64 s1fmt;
>> +	__u64 s1dss;
>> +};
>> +
>
>> +/**
>> + * struct iommu_hwpt_invalidate_arm_smmuv3 - ARM SMMUv3 cache invalidation info
>> + * @flags: boolean attributes of cache invalidation command
>> + * @opcode: opcode of cache invalidation command
>> + * @ssid: SubStream ID
>> + * @granule_size: page/block size of the mapping in bytes
>> + * @range: IOVA range to invalidate
>> + */
>> +struct iommu_hwpt_invalidate_arm_smmuv3 {
>> +#define IOMMU_SMMUV3_CMDQ_TLBI_VA_LEAF	(1 << 0)
>> +	__u64 flags;
>> +	__u8 opcode;
>> +	__u8 padding[3];
>> +	__u32 asid;
>> +	__u32 ssid;
>> +	__u32 granule_size;
>> +	struct iommu_iova_range range;
>> +};
> Although we can keep the alloc and hardware info separate for each IOMMU
> architecture, we should try to come up with common invalidation methods.
>
> It matters because things like vSVA, or just efficient dynamic mappings,
> will require optimal invalidation latency. A paravirtual interface like
> vhost-iommu can help with that, as the host kernel will handle guest
> invalidations directly instead of doing a round-trip to host userspace
> (and we'll likely want the same path for PRI.)
>
> Supporting HW page tables for a common PV IOMMU does require some
> architecture-specific knowledge, but invalidation messages contain roughly
> the same information on all architectures. The PV IOMMU won't include
> command opcodes for each possible architecture if one generic command does
> the same job.
>
> Ideally I'd like something like this for vhost-iommu:
>
> * slow path through userspace for complex requests like attach-table and
>   probe, where the VMM can decode arch-specific information and translate
>   them to iommufd and vhost-iommu ioctls to update the configuration.
>
> * fast path within the kernel for performance-critical requests like
>   invalidate, page request and response. It would be absurd for the
>   vhost-iommu driver to translate generic invalidation requests from the
>   guest into arch-specific commands with special opcodes, when the next
>   step is calling the IOMMU driver which does that for free.
>
> During previous discussions we came up with generic invalidations that
> could fit both Arm and x86 [1][2]. The only difference was the ASID
> (called archid/id in those proposals) which VT-d didn't need. Could we try
> to build on that?
I do agree with Jean. We spent a lot of effort all together to define
this generic invalidation API, and unless there is a compelling reason that
prevents us from using it, we should try to reuse it.
Thanks
Eric
>
> [1] https://elixir.bootlin.com/linux/v5.17/source/include/uapi/linux/iommu.h#L161
> [2] https://lists.oasis-open.org/archives/virtio-dev/202102/msg00014.html
>
> Thanks,
> Jean
>
- * Re: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-10 11:33     ` Eric Auger
@ 2023-03-10 12:51       ` Jason Gunthorpe
  2023-03-17 10:17         ` Tian, Kevin
  0 siblings, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-10 12:51 UTC (permalink / raw)
  To: Eric Auger
  Cc: Jean-Philippe Brucker, Nicolin Chen, robin.murphy, will,
	kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	linux-arm-kernel, iommu, linux-kernel, yi.l.liu
On Fri, Mar 10, 2023 at 12:33:12PM +0100, Eric Auger wrote:
> I do agree with Jean. We spent a lot of effort all together to define
> this generic invalidation API, and unless there is a compelling reason that
> prevents us from using it, we should try to reuse it.
That's the compelling reason in a nutshell right there.
A lot of time was invested to create something that might be
general. We still don't know if it is well defined and general. Even
more time is going to be required on it before it could go forward. In
future more time will be needed for every future HW to try and fit
into it. We don't even know if it will scale to future HW. Nobody has
even checked what today's POWER and S390 HW need.
vs, this stuff was made in a few days. We know it is correct as a uAPI
since it mirrors the HW and we know it is scalable to different HW
schemes if they come up.
So I don't see a good reason to take a risk on a "general" uAPI. If we
make this wrong it could seriously damage the main goal of iommufd -
to build accelerated vIOMMU models.
Especially since the motivating reason in this thread - use it for
virtio-iommu - doesn't even want to use it as a uAPI!
If we get a vhost-virtio then we can decide what to do in-kernel and
maybe this general API returns as an in-kernel API. I don't know, we
need to see what this thing ends up looking like.
Jason
- * RE: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
  2023-03-10 12:51       ` Jason Gunthorpe
@ 2023-03-17 10:17         ` Tian, Kevin
  0 siblings, 0 replies; 165+ messages in thread
From: Tian, Kevin @ 2023-03-17 10:17 UTC (permalink / raw)
  To: Jason Gunthorpe, Eric Auger
  Cc: Jean-Philippe Brucker, Nicolin Chen, robin.murphy@arm.com,
	will@kernel.org, baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, Liu, Yi L
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, March 10, 2023 8:52 PM
> 
> On Fri, Mar 10, 2023 at 12:33:12PM +0100, Eric Auger wrote:
> 
> > I do agree with Jean. We spent a lot of effort all together to define
> > this generic invalidation API, and unless there is a compelling reason that
> > prevents us from using it, we should try to reuse it.
> 
> That's the compelling reason in a nutshell right there.
> 
> A lot of time was invested to create something that might be
> general. We still don't know if it is well defined and general. Even
> more time is going to be required on it before it could go forward. In
> future more time will be needed for every future HW to try and fit
> into it. We don't even know if it will scale to future HW. Nobody has
> even checked what today's POWER and S390 HW need.
> 
> vs, this stuff was made in a few days. We know it is correct as a uAPI
> since it mirrors the HW and we know it is scalable to different HW
> schemes if they come up.
> 
> So I don't see a good reason to take a risk on a "general" uAPI. If we
> make this wrong it could seriously damage the main goal of iommufd -
> to build accelerated vIOMMU models.
> 
I agree with this point. We can add a virtio format when it comes.
- * [PATCH v1 03/14] iommufd/device: Setup MSI on kernel-managed domains
  2023-03-09 10:53 [PATCH v1 00/14] Add Nested Translation Support for SMMUv3 Nicolin Chen
  2023-03-09 10:53 ` [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper Nicolin Chen
  2023-03-09 10:53 ` [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3 Nicolin Chen
@ 2023-03-09 10:53 ` Nicolin Chen
  2023-03-10 16:45   ` Eric Auger
  2023-03-09 10:53 ` [PATCH v1 04/14] iommu/arm-smmu-v3: Add arm_smmu_hw_info Nicolin Chen
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-09 10:53 UTC (permalink / raw)
  To: jgg, robin.murphy, will
  Cc: eric.auger, kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
IOMMU_RESV_SW_MSI is a kernel-managed domain thing, so it should be set up
only on a kernel-managed domain. If the attaching domain is a user-managed
domain, redirect the hwpt to hwpt->parent to do it correctly.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/device.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index f95b558f5e95..a3e7d2889164 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -350,7 +350,8 @@ static int iommufd_group_setup_msi(struct iommufd_group *igroup,
 	 * call iommu_get_msi_cookie() on its behalf. This is necessary to setup
 	 * the MSI window so iommu_dma_prepare_msi() can install pages into our
 	 * domain after request_irq(). If it is not done interrupts will not
-	 * work on this domain.
+	 * work on this domain. And the msi_cookie should be always set into the
+	 * kernel-managed (parent) domain.
 	 *
 	 * FIXME: This is conceptually broken for iommufd since we want to allow
 	 * userspace to change the domains, eg switch from an identity IOAS to a
@@ -358,6 +359,8 @@ static int iommufd_group_setup_msi(struct iommufd_group *igroup,
 	 * matches what the IRQ layer actually expects in a newly created
 	 * domain.
 	 */
+	if (hwpt->parent)
+		hwpt = hwpt->parent;
 	if (sw_msi_start != PHYS_ADDR_MAX && !hwpt->msi_cookie) {
 		rc = iommu_get_msi_cookie(hwpt->domain, sw_msi_start);
 		if (rc)
-- 
2.39.2
- * Re: [PATCH v1 03/14] iommufd/device: Setup MSI on kernel-managed domains
  2023-03-09 10:53 ` [PATCH v1 03/14] iommufd/device: Setup MSI on kernel-managed domains Nicolin Chen
@ 2023-03-10 16:45   ` Eric Auger
  2023-03-11  0:17     ` Nicolin Chen
  0 siblings, 1 reply; 165+ messages in thread
From: Eric Auger @ 2023-03-10 16:45 UTC (permalink / raw)
  To: Nicolin Chen, jgg, robin.murphy, will
  Cc: kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
Hi Nicolin,
On 3/9/23 11:53, Nicolin Chen wrote:
> IOMMU_RESV_SW_MSI is a kernel-managed domain thing, so it should be set up
> only on a kernel-managed domain. If the attaching domain is a user-managed
> domain, redirect the hwpt to hwpt->parent to do it correctly.
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  drivers/iommu/iommufd/device.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> index f95b558f5e95..a3e7d2889164 100644
> --- a/drivers/iommu/iommufd/device.c
> +++ b/drivers/iommu/iommufd/device.c
> @@ -350,7 +350,8 @@ static int iommufd_group_setup_msi(struct iommufd_group *igroup,
>  	 * call iommu_get_msi_cookie() on its behalf. This is necessary to setup
>  	 * the MSI window so iommu_dma_prepare_msi() can install pages into our
>  	 * domain after request_irq(). If it is not done interrupts will not
> -	 * work on this domain.
> +	 * work on this domain. And the msi_cookie should be always set into the
s/And the/The/
> +	 * kernel-managed (parent) domain.
>  	 *
>  	 * FIXME: This is conceptually broken for iommufd since we want to allow
>  	 * userspace to change the domains, eg switch from an identity IOAS to a
> @@ -358,6 +359,8 @@ static int iommufd_group_setup_msi(struct iommufd_group *igroup,
>  	 * matches what the IRQ layer actually expects in a newly created
>  	 * domain.
>  	 */
> +	if (hwpt->parent)
> +		hwpt = hwpt->parent;
I guess there is a guarantee the parent hwpt is necessarily a
kernel-managed domain?
Is it that part of the spec that enforces it?
IOMMU_HWPT_ALLOC doc says:
" * A user-managed HWPT will be created from a given parent HWPT via @pt_id, in
 * which the parent HWPT must be allocated previously via the same ioctl from a
 * given IOAS.
"
Maybe specify that in the commit msg?
Thanks
Eric
>  	if (sw_msi_start != PHYS_ADDR_MAX && !hwpt->msi_cookie) {
>  		rc = iommu_get_msi_cookie(hwpt->domain, sw_msi_start);
>  		if (rc)
- * Re: [PATCH v1 03/14] iommufd/device: Setup MSI on kernel-managed domains
  2023-03-10 16:45   ` Eric Auger
@ 2023-03-11  0:17     ` Nicolin Chen
  0 siblings, 0 replies; 165+ messages in thread
From: Nicolin Chen @ 2023-03-11  0:17 UTC (permalink / raw)
  To: Eric Auger
  Cc: jgg, robin.murphy, will, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Fri, Mar 10, 2023 at 05:45:20PM +0100, Eric Auger wrote:
> Hi Nicolin,
> 
> On 3/9/23 11:53, Nicolin Chen wrote:
> > IOMMU_RESV_SW_MSI is a kernel-managed domain thing, so it should be set up
> > only on a kernel-managed domain. If the attaching domain is a user-managed
> > domain, redirect the hwpt to hwpt->parent to do it correctly.
> >
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > ---
> >  drivers/iommu/iommufd/device.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> > index f95b558f5e95..a3e7d2889164 100644
> > --- a/drivers/iommu/iommufd/device.c
> > +++ b/drivers/iommu/iommufd/device.c
> > @@ -350,7 +350,8 @@ static int iommufd_group_setup_msi(struct iommufd_group *igroup,
> >        * call iommu_get_msi_cookie() on its behalf. This is necessary to setup
> >        * the MSI window so iommu_dma_prepare_msi() can install pages into our
> >        * domain after request_irq(). If it is not done interrupts will not
> > -      * work on this domain.
> > +      * work on this domain. And the msi_cookie should be always set into the
> s/And the/The/
OK.
> > +      * kernel-managed (parent) domain.
> >        *
> >        * FIXME: This is conceptually broken for iommufd since we want to allow
> >        * userspace to change the domains, eg switch from an identity IOAS to a
> > @@ -358,6 +359,8 @@ static int iommufd_group_setup_msi(struct iommufd_group *igroup,
> >        * matches what the IRQ layer actually expects in a newly created
> >        * domain.
> >        */
> > +     if (hwpt->parent)
> > +             hwpt = hwpt->parent;
> I guess there is a guarantee the parent hwpt is necessarily a
> kernel-managed domain?
Yes. It must be.
> Is it that part of the spec that enforces it?
The hwpt_alloc() function has a sanity check to enforce that.
> IOMMU_HWPT_ALLOC doc says:
> " * A user-managed HWPT will be created from a given parent HWPT via @pt_id, in
>  * which the parent HWPT must be allocated previously via the same ioctl from a
>  * given IOAS.
> "
> Maybe specify that in the commit msg?
There is a paragraph just above that, for kernel-managed HWPT:
 * A normal HWPT will be created with the mappings from the given IOAS.
 * The @data_type for its allocation can be set to IOMMU_HWPT_TYPE_DEFAULT, or
 * another type (being listed below) to specialize a kernel-managed HWPT.
Perhaps we could replace "normal HWPT" with "kernel-managed
HWPT", to make it clearer.
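To make the relationship concrete, here is a rough userspace sketch of the two pieces discussed above: an allocation-time check that a parent is kernel-managed, and the parent redirect from the patch. struct hwpt and both helper names are hypothetical simplifications, and the check is only an assumption about what the sanity check in hwpt_alloc() looks like.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct iommufd_hw_pagetable. */
struct hwpt {
	struct hwpt *parent;	/* NULL for a kernel-managed HWPT */
};

/*
 * Assumed allocation-time check: a user-managed HWPT may only be created
 * on top of a kernel-managed parent, i.e. one that has no parent itself.
 */
static inline int hwpt_check_parent(const struct hwpt *parent)
{
	if (!parent)
		return 0;	/* allocating a kernel-managed HWPT */
	if (parent->parent)
		return -EINVAL;	/* parent is itself user-managed */
	return 0;
}

/* Same redirect as in the patch: the MSI cookie belongs to the parent. */
static inline struct hwpt *hwpt_msi_target(struct hwpt *hwpt)
{
	assert(hwpt);
	return hwpt->parent ? hwpt->parent : hwpt;
}
```

Given such a check at allocation time, a single hop to hwpt->parent is enough, because a parent can never have a parent of its own.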
Thanks
Nic
- * [PATCH v1 04/14] iommu/arm-smmu-v3: Add arm_smmu_hw_info
  2023-03-09 10:53 [PATCH v1 00/14] Add Nested Translation Support for SMMUv3 Nicolin Chen
                   ` (2 preceding siblings ...)
  2023-03-09 10:53 ` [PATCH v1 03/14] iommufd/device: Setup MSI on kernel-managed domains Nicolin Chen
@ 2023-03-09 10:53 ` Nicolin Chen
  2023-03-09 13:03   ` Robin Murphy
  2023-03-09 10:53 ` [PATCH v1 05/14] iommu/arm-smmu-v3: Remove ARM_SMMU_DOMAIN_NESTED Nicolin Chen
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-09 10:53 UTC (permalink / raw)
  To: jgg, robin.murphy, will
  Cc: eric.auger, kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
This is used to forward the host IDR values to the user space, so the
hypervisor and the guest VM can learn about the underlying hardware's
capabilities.
Also, set the driver_type to IOMMU_HW_INFO_TYPE_ARM_SMMUV3 to pass the
corresponding type sanity check in the core.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 25 +++++++++++++++++++++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 ++
 include/uapi/linux/iommufd.h                | 14 ++++++++++++
 3 files changed, 41 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index f2425b0f0cd6..c1aac695ae0d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2005,6 +2005,29 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
 	}
 }
 
+static void *arm_smmu_hw_info(struct device *dev, u32 *length)
+{
+	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+	struct iommu_hw_info_smmuv3 *info;
+	void *base_idr;
+	int i;
+
+	if (!master || !master->smmu)
+		return ERR_PTR(-ENODEV);
+
+	info = kzalloc(sizeof(*info), GFP_KERNEL);
+	if (!info)
+		return ERR_PTR(-ENOMEM);
+
+	base_idr = master->smmu->base + ARM_SMMU_IDR0;
+	for (i = 0; i <= 5; i++)
+		info->idr[i] = readl_relaxed(base_idr + 0x4 * i);
+
+	*length = sizeof(*info);
+
+	return info;
+}
+
 static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 {
 	struct arm_smmu_domain *smmu_domain;
@@ -2845,6 +2868,7 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
 
 static struct iommu_ops arm_smmu_ops = {
 	.capable		= arm_smmu_capable,
+	.hw_info		= arm_smmu_hw_info,
 	.domain_alloc		= arm_smmu_domain_alloc,
 	.probe_device		= arm_smmu_probe_device,
 	.release_device		= arm_smmu_release_device,
@@ -2857,6 +2881,7 @@ static struct iommu_ops arm_smmu_ops = {
 	.page_response		= arm_smmu_page_response,
 	.def_domain_type	= arm_smmu_def_domain_type,
 	.pgsize_bitmap		= -1UL, /* Restricted during device attach */
+	.driver_type		= IOMMU_HW_INFO_TYPE_ARM_SMMUV3,
 	.owner			= THIS_MODULE,
 	.default_domain_ops = &(const struct iommu_domain_ops) {
 		.attach_dev		= arm_smmu_attach_dev,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 8d772ea8a583..ba2b4562f4b2 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -14,6 +14,8 @@
 #include <linux/mmzone.h>
 #include <linux/sizes.h>
 
+#include <uapi/linux/iommufd.h>
+
 /* MMIO registers */
 #define ARM_SMMU_IDR0			0x0
 #define IDR0_ST_LVL			GENMASK(28, 27)
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 0d5551b1b2be..c7a37915b49c 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -519,6 +519,20 @@ struct iommu_hw_info_vtd {
 	__aligned_u64 ecap_reg;
 };
 
+/**
+ * struct iommu_hw_info_smmuv3 - ARM SMMUv3 device info
+ *
+ * @flags: Must be set to 0
+ * @__reserved: Must be 0
+ * @idr: Implemented features for the SMMU Non-secure programming interface.
+ *       Please refer to the chapters from 6.3.1 to 6.3.6 in the SMMUv3 Spec.
+ */
+struct iommu_hw_info_smmuv3 {
+	__u32 flags;
+	__u32 __reserved;
+	__u32 idr[6];
+};
+
 /**
  * struct iommu_hw_info - ioctl(IOMMU_DEVICE_GET_HW_INFO)
  * @size: sizeof(struct iommu_hw_info)
-- 
2.39.2
- * Re: [PATCH v1 04/14] iommu/arm-smmu-v3: Add arm_smmu_hw_info
  2023-03-09 10:53 ` [PATCH v1 04/14] iommu/arm-smmu-v3: Add arm_smmu_hw_info Nicolin Chen
@ 2023-03-09 13:03   ` Robin Murphy
  2023-03-10  1:17     ` Nicolin Chen
  0 siblings, 1 reply; 165+ messages in thread
From: Robin Murphy @ 2023-03-09 13:03 UTC (permalink / raw)
  To: Nicolin Chen, jgg, will
  Cc: eric.auger, kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
On 2023-03-09 10:53, Nicolin Chen wrote:
> This is used to forward the host IDR values to the user space, so the
> hypervisor and the guest VM can learn about the underlying hardware's
> capabilities.
> 
> Also, set the driver_type to IOMMU_HW_INFO_TYPE_ARM_SMMUV3 to pass the
> corresponding type sanity check in the core.
> 
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 25 +++++++++++++++++++++
>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 ++
>   include/uapi/linux/iommufd.h                | 14 ++++++++++++
>   3 files changed, 41 insertions(+)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index f2425b0f0cd6..c1aac695ae0d 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2005,6 +2005,29 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
>   	}
>   }
>   
> +static void *arm_smmu_hw_info(struct device *dev, u32 *length)
> +{
> +	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> +	struct iommu_hw_info_smmuv3 *info;
> +	void *base_idr;
> +	int i;
> +
> +	if (!master || !master->smmu)
> +		return ERR_PTR(-ENODEV);
> +
> +	info = kzalloc(sizeof(*info), GFP_KERNEL);
> +	if (!info)
> +		return ERR_PTR(-ENOMEM);
> +
> +	base_idr = master->smmu->base + ARM_SMMU_IDR0;
> +	for (i = 0; i <= 5; i++)
> +		info->idr[i] = readl_relaxed(base_idr + 0x4 * i);
You need to take firmware overrides etc. into account here. In 
particular, features like BTM may need to be hidden to work around 
errata either in the system integration or the SMMU itself. It isn't 
reasonable to expect every VMM to be aware of every erratum and 
workaround, and there may even be workarounds where we need to go out of 
our way to prevent guests from trying to use certain features in order 
to maintain correctness at S2.
In general this should probably follow the same principle as KVM, where 
we only expose sanitised feature registers representing the 
functionality the host understands. Code written today is almost 
guaranteed to be running on hardware released in 2030, at least *somewhere*.
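As a rough illustration of that principle, a sanitising step could mask each raw IDR value with a per-register allow-list before it is copied out. This is only a sketch in plain userspace C: struct idr_policy, sanitise_idr() and SMMU_NUM_IDR are hypothetical names, and real handling would need to treat multi-bit fields individually rather than as a flat mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SMMU_NUM_IDR 6	/* IDR0..IDR5, as read by the patch */

/*
 * Hypothetical per-register masks of the bits the host is willing to
 * expose. Real values would have to come from the driver's own feature
 * probing, firmware overrides and errata handling.
 */
struct idr_policy {
	uint32_t allow[SMMU_NUM_IDR];
};

/* Report only what the host understands and has not had to hide. */
static inline void sanitise_idr(const uint32_t raw[SMMU_NUM_IDR],
				const struct idr_policy *policy,
				uint32_t out[SMMU_NUM_IDR])
{
	assert(raw && policy && out);
	for (size_t i = 0; i < SMMU_NUM_IDR; i++)
		out[i] = raw[i] & policy->allow[i];
}
```

A default-deny policy (all masks zero, with bits enabled one by one) would also keep features of future hardware hidden until the host learns about them.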
Thanks,
Robin.
> +
> +	*length = sizeof(*info);
> +
> +	return info;
> +}
> +
>   static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
>   {
>   	struct arm_smmu_domain *smmu_domain;
> @@ -2845,6 +2868,7 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
>   
>   static struct iommu_ops arm_smmu_ops = {
>   	.capable		= arm_smmu_capable,
> +	.hw_info		= arm_smmu_hw_info,
>   	.domain_alloc		= arm_smmu_domain_alloc,
>   	.probe_device		= arm_smmu_probe_device,
>   	.release_device		= arm_smmu_release_device,
> @@ -2857,6 +2881,7 @@ static struct iommu_ops arm_smmu_ops = {
>   	.page_response		= arm_smmu_page_response,
>   	.def_domain_type	= arm_smmu_def_domain_type,
>   	.pgsize_bitmap		= -1UL, /* Restricted during device attach */
> +	.driver_type		= IOMMU_HW_INFO_TYPE_ARM_SMMUV3,
>   	.owner			= THIS_MODULE,
>   	.default_domain_ops = &(const struct iommu_domain_ops) {
>   		.attach_dev		= arm_smmu_attach_dev,
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 8d772ea8a583..ba2b4562f4b2 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -14,6 +14,8 @@
>   #include <linux/mmzone.h>
>   #include <linux/sizes.h>
>   
> +#include <uapi/linux/iommufd.h>
> +
>   /* MMIO registers */
>   #define ARM_SMMU_IDR0			0x0
>   #define IDR0_ST_LVL			GENMASK(28, 27)
> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> index 0d5551b1b2be..c7a37915b49c 100644
> --- a/include/uapi/linux/iommufd.h
> +++ b/include/uapi/linux/iommufd.h
> @@ -519,6 +519,20 @@ struct iommu_hw_info_vtd {
>   	__aligned_u64 ecap_reg;
>   };
>   
> +/**
> + * struct iommu_hw_info_smmuv3 - ARM SMMUv3 device info
> + *
> + * @flags: Must be set to 0
> + * @__reserved: Must be 0
> + * @idr: Implemented features for the SMMU Non-secure programming interface.
> + *       Please refer to the chapters from 6.3.1 to 6.3.6 in the SMMUv3 Spec.
> + */
> +struct iommu_hw_info_smmuv3 {
> +	__u32 flags;
> +	__u32 __reserved;
> +	__u32 idr[6];
> +};
> +
>   /**
>    * struct iommu_hw_info - ioctl(IOMMU_DEVICE_GET_HW_INFO)
>    * @size: sizeof(struct iommu_hw_info)
- * Re: [PATCH v1 04/14] iommu/arm-smmu-v3: Add arm_smmu_hw_info
  2023-03-09 13:03   ` Robin Murphy
@ 2023-03-10  1:17     ` Nicolin Chen
  2023-03-10 15:28       ` Robin Murphy
  0 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-10  1:17 UTC (permalink / raw)
  To: Robin Murphy
  Cc: jgg, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
Hi Robin,
Thanks for the inputs.
On Thu, Mar 09, 2023 at 01:03:41PM +0000, Robin Murphy wrote:
> On 2023-03-09 10:53, Nicolin Chen wrote:
> > This is used to forward the host IDR values to the user space, so the
> > hypervisor and the guest VM can learn about the underlying hardware's
> > capabilities.
> > 
> > Also, set the driver_type to IOMMU_HW_INFO_TYPE_ARM_SMMUV3 to pass the
> > corresponding type sanity check in the core.
> > 
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > ---
> >   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 25 +++++++++++++++++++++
> >   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 ++
> >   include/uapi/linux/iommufd.h                | 14 ++++++++++++
> >   3 files changed, 41 insertions(+)
> > 
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index f2425b0f0cd6..c1aac695ae0d 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -2005,6 +2005,29 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
> >       }
> >   }
> > 
> > +static void *arm_smmu_hw_info(struct device *dev, u32 *length)
> > +{
> > +     struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> > +     struct iommu_hw_info_smmuv3 *info;
> > +     void *base_idr;
> > +     int i;
> > +
> > +     if (!master || !master->smmu)
> > +             return ERR_PTR(-ENODEV);
> > +
> > +     info = kzalloc(sizeof(*info), GFP_KERNEL);
> > +     if (!info)
> > +             return ERR_PTR(-ENOMEM);
> > +
> > +     base_idr = master->smmu->base + ARM_SMMU_IDR0;
> > +     for (i = 0; i <= 5; i++)
> > +             info->idr[i] = readl_relaxed(base_idr + 0x4 * i);
> 
> You need to take firmware overrides etc. into account here. In
> particular, features like BTM may need to be hidden to work around
> errata either in the system integration or the SMMU itself. It isn't
> reasonable to expect every VMM to be aware of every erratum and
> workaround, and there may even be workarounds where we need to go out of
> our way to prevent guests from trying to use certain features in order
> to maintain correctness at S2.
Perhaps we can add some overrides for errata after this?
I'm having some trouble finding the errata docs. Would it be
possible for you to point me to them with a link?
> In general this should probably follow the same principle as KVM, where
> we only expose sanitised feature registers representing the
> functionality the host understands. Code written today is almost
> guaranteed to be running on hardware released in 2030, at least *somewhere*.
Yes.
Thanks
Nicolin
- * Re: [PATCH v1 04/14] iommu/arm-smmu-v3: Add arm_smmu_hw_info
  2023-03-10  1:17     ` Nicolin Chen
@ 2023-03-10 15:28       ` Robin Murphy
  2023-03-16  0:13         ` Nicolin Chen
  0 siblings, 1 reply; 165+ messages in thread
From: Robin Murphy @ 2023-03-10 15:28 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: jgg, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On 2023-03-10 01:17, Nicolin Chen wrote:
> Hi Robin,
> 
> Thanks for the inputs.
> 
> On Thu, Mar 09, 2023 at 01:03:41PM +0000, Robin Murphy wrote:
>> On 2023-03-09 10:53, Nicolin Chen wrote:
>>> This is used to forward the host IDR values to the user space, so the
>>> hypervisor and the guest VM can learn about the underlying hardware's
>>> capabilities.
>>>
>>> Also, set the driver_type to IOMMU_HW_INFO_TYPE_ARM_SMMUV3 to pass the
>>> corresponding type sanity check in the core.
>>>
>>> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
>>> ---
>>>    drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 25 +++++++++++++++++++++
>>>    drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 ++
>>>    include/uapi/linux/iommufd.h                | 14 ++++++++++++
>>>    3 files changed, 41 insertions(+)
>>>
>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> index f2425b0f0cd6..c1aac695ae0d 100644
>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> @@ -2005,6 +2005,29 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
>>>        }
>>>    }
>>>
>>> +static void *arm_smmu_hw_info(struct device *dev, u32 *length)
>>> +{
>>> +     struct arm_smmu_master *master = dev_iommu_priv_get(dev);
>>> +     struct iommu_hw_info_smmuv3 *info;
>>> +     void *base_idr;
>>> +     int i;
>>> +
>>> +     if (!master || !master->smmu)
>>> +             return ERR_PTR(-ENODEV);
>>> +
>>> +     info = kzalloc(sizeof(*info), GFP_KERNEL);
>>> +     if (!info)
>>> +             return ERR_PTR(-ENOMEM);
>>> +
>>> +     base_idr = master->smmu->base + ARM_SMMU_IDR0;
>>> +     for (i = 0; i <= 5; i++)
>>> +             info->idr[i] = readl_relaxed(base_idr + 0x4 * i);
>>
>> You need to take firmware overrides etc. into account here. In
>> particular, features like BTM may need to be hidden to work around
>> errata either in the system integration or the SMMU itself. It isn't
>> reasonable to expect every VMM to be aware of every erratum and
>> workaround, and there may even be workarounds where we need to go out of
>> our way to prevent guests from trying to use certain features in order
>> to maintain correctness at S2.
> 
> We can add a bit of overrides after this for errata, perhaps?
> 
> I have some trouble with finding the errata docs. Would it be
> possible for you to direct me to it with a link maybe?
The key Arm term is "Software Developer Errata Notice", or just SDEN.
Here are the ones for MMU-600 and MMU-700:
https://developer.arm.com/documentation/SDEN-946810/latest/
https://developer.arm.com/documentation/SDEN-1786925/latest/
Note that until now it has been extremely fortunate that in pretty much 
every case Linux either hasn't supported the affected feature at all, or 
has happened to avoid meeting the conditions. Once we do introduce 
nesting support that all goes out the window (and I'll have to think 
more when reviewing new errata in future...)
I've been putting off revisiting all the existing errata to figure out
what we'd need to do until new nesting patches appeared, so I'll try to
get to that soon now. I think in many cases it's likely to be best to
just disallow nesting entirely on affected implementations.
Thanks,
Robin.
>> In general this should probably follow the same principle as KVM, where
>> we only expose sanitised feature registers representing the
>> functionality the host understands. Code written today is almost
>> guaranteed to be running on hardware released in 2030, at least *somewhere*.
> 
> Yes.
> 
> Thanks
> Nicolin
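A minimal sketch of the "sanitised feature register" idea discussed above,
assuming BTM sits at bit 5 of IDR0 (as in the Linux driver header, to the
best of my reading). HOST_FEAT_BTM and sanitise_idr0() are hypothetical
names used only for illustration; they are not part of the posted patch.

```c
#include <stdint.h>

/* Assumed bit position of BTM in SMMU_IDR0. */
#define IDR0_BTM       (1u << 5)

/* Hypothetical host-side flag: set only when the host considers BTM safe. */
#define HOST_FEAT_BTM  (1u << 0)

/*
 * Return the IDR0 value to report to user space.
 * BTM is hidden unless the host has explicitly enabled it, so a VMM
 * never sees a feature that firmware or an erratum workaround disabled.
 */
static uint32_t sanitise_idr0(uint32_t raw_idr0, uint32_t host_features)
{
	uint32_t idr0 = raw_idr0;

	if (!(host_features & HOST_FEAT_BTM))
		idr0 &= ~IDR0_BTM;

	return idr0;
}
```

The point of the design is that the decision lives in one place in the
host driver, rather than every VMM having to know every erratum.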
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply	[flat|nested] 165+ messages in thread
- * Re: [PATCH v1 04/14] iommu/arm-smmu-v3: Add arm_smmu_hw_info
  2023-03-10 15:28       ` Robin Murphy
@ 2023-03-16  0:13         ` Nicolin Chen
  2023-03-16 15:19           ` Robin Murphy
  0 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-16  0:13 UTC (permalink / raw)
  To: Robin Murphy
  Cc: jgg, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Fri, Mar 10, 2023 at 03:28:56PM +0000, Robin Murphy wrote:
> On 2023-03-10 01:17, Nicolin Chen wrote:
> > Hi Robin,
> > 
> > Thanks for the inputs.
> > 
> > On Thu, Mar 09, 2023 at 01:03:41PM +0000, Robin Murphy wrote:
> > > On 2023-03-09 10:53, Nicolin Chen wrote:
> > > > This is used to forward the host IDR values to the user space, so the
> > > > hypervisor and the guest VM can learn about the underlying hardware's
> > > > capabilities.
> > > > 
> > > > Also, set the driver_type to IOMMU_HW_INFO_TYPE_ARM_SMMUV3 to pass the
> > > > corresponding type sanity in the core.
> > > > 
> > > > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > > > ---
> > > >    drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 25 +++++++++++++++++++++
> > > >    drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 ++
> > > >    include/uapi/linux/iommufd.h                | 14 ++++++++++++
> > > >    3 files changed, 41 insertions(+)
> > > > 
> > > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > > index f2425b0f0cd6..c1aac695ae0d 100644
> > > > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > > @@ -2005,6 +2005,29 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
> > > >        }
> > > >    }
> > > > 
> > > > +static void *arm_smmu_hw_info(struct device *dev, u32 *length)
> > > > +{
> > > > +     struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> > > > +     struct iommu_hw_info_smmuv3 *info;
> > > > +     void *base_idr;
> > > > +     int i;
> > > > +
> > > > +     if (!master || !master->smmu)
> > > > +             return ERR_PTR(-ENODEV);
> > > > +
> > > > +     info = kzalloc(sizeof(*info), GFP_KERNEL);
> > > > +     if (!info)
> > > > +             return ERR_PTR(-ENOMEM);
> > > > +
> > > > +     base_idr = master->smmu->base + ARM_SMMU_IDR0;
> > > > +     for (i = 0; i <= 5; i++)
> > > > +             info->idr[i] = readl_relaxed(base_idr + 0x4 * i);
> > > 
> > > You need to take firmware overrides etc. into account here. In
> > > particular, features like BTM may need to be hidden to work around
> > > errata either in the system integration or the SMMU itself. It isn't
> > > reasonable to expect every VMM to be aware of every erratum and
> > > workaround, and there may even be workarounds where we need to go out of
> > > our way to prevent guests from trying to use certain features in order
> > > to maintain correctness at S2.
> > 
> > We can add a bit of overrides after this for errata, perhaps?
> > 
> > I have some trouble with finding the errata docs. Would it be
> > possible for you to direct me to it with a link maybe?
> 
> The key Arm term is "Software Developer Errata Notice", or just SDEN.
> Here's the ones for MMU-600 and MMU-700:
> 
> https://developer.arm.com/documentation/SDEN-946810/latest/
This page shows "Arm CoreLink MMU-600 System Memory Management
Unit Software Developer Errata Notice" but the downloaded file
is "Arm CoreLink CI-700 Coherent Interconnect" errata notice.
And I don't quite understand what it's about.
> https://developer.arm.com/documentation/SDEN-1786925/latest/
Yea, this one I got an "MMU-700 System Memory Management Unit"
SMMU errata file that I can read and understand.
> Note that until now it has been extremely fortunate that in pretty much
> every case Linux either hasn't supported the affected feature at all, or
> has happened to avoid meeting the conditions. Once we do introduce
> nesting support that all goes out the window (and I'll have to think
> more when reviewing new errata in future...)
> 
> I've been putting off revisiting all the existing errata to figure out
> what we'd need to do until new nesting patches appeared, so I'll try to
> get to that soon now. I think in many cases it's likely to be best to
> just disallowing nesting entirely on affected implementations.
Do we already have a list of "affected implementations"? Or
would we need to make such a list now? In the latter case, can
these affected implementations be detected from their IDR0-5
registers, so that we can simply do something in hw_info()?
Thanks
Nic
^ permalink raw reply	[flat|nested] 165+ messages in thread
- * Re: [PATCH v1 04/14] iommu/arm-smmu-v3: Add arm_smmu_hw_info
  2023-03-16  0:13         ` Nicolin Chen
@ 2023-03-16 15:19           ` Robin Murphy
  2023-03-16 20:06             ` Nicolin Chen
  0 siblings, 1 reply; 165+ messages in thread
From: Robin Murphy @ 2023-03-16 15:19 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: jgg, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On 16/03/2023 12:13 am, Nicolin Chen wrote:
> On Fri, Mar 10, 2023 at 03:28:56PM +0000, Robin Murphy wrote:
>> On 2023-03-10 01:17, Nicolin Chen wrote:
>>> Hi Robin,
>>>
>>> Thanks for the inputs.
>>>
>>> On Thu, Mar 09, 2023 at 01:03:41PM +0000, Robin Murphy wrote:
>>>> On 2023-03-09 10:53, Nicolin Chen wrote:
>>>>> This is used to forward the host IDR values to the user space, so the
>>>>> hypervisor and the guest VM can learn about the underlying hardware's
>>>>> capabilities.
>>>>>
>>>>> Also, set the driver_type to IOMMU_HW_INFO_TYPE_ARM_SMMUV3 to pass the
>>>>> corresponding type sanity in the core.
>>>>>
>>>>> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
>>>>> ---
>>>>>     drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 25 +++++++++++++++++++++
>>>>>     drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 ++
>>>>>     include/uapi/linux/iommufd.h                | 14 ++++++++++++
>>>>>     3 files changed, 41 insertions(+)
>>>>>
>>>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>>> index f2425b0f0cd6..c1aac695ae0d 100644
>>>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>>> @@ -2005,6 +2005,29 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
>>>>>         }
>>>>>     }
>>>>>
>>>>> +static void *arm_smmu_hw_info(struct device *dev, u32 *length)
>>>>> +{
>>>>> +     struct arm_smmu_master *master = dev_iommu_priv_get(dev);
>>>>> +     struct iommu_hw_info_smmuv3 *info;
>>>>> +     void *base_idr;
>>>>> +     int i;
>>>>> +
>>>>> +     if (!master || !master->smmu)
>>>>> +             return ERR_PTR(-ENODEV);
>>>>> +
>>>>> +     info = kzalloc(sizeof(*info), GFP_KERNEL);
>>>>> +     if (!info)
>>>>> +             return ERR_PTR(-ENOMEM);
>>>>> +
>>>>> +     base_idr = master->smmu->base + ARM_SMMU_IDR0;
>>>>> +     for (i = 0; i <= 5; i++)
>>>>> +             info->idr[i] = readl_relaxed(base_idr + 0x4 * i);
>>>>
>>>> You need to take firmware overrides etc. into account here. In
>>>> particular, features like BTM may need to be hidden to work around
>>>> errata either in the system integration or the SMMU itself. It isn't
>>>> reasonable to expect every VMM to be aware of every erratum and
>>>> workaround, and there may even be workarounds where we need to go out of
>>>> our way to prevent guests from trying to use certain features in order
>>>> to maintain correctness at S2.
>>>
>>> We can add a bit of overrides after this for errata, perhaps?
>>>
>>> I have some trouble with finding the errata docs. Would it be
>>> possible for you to direct me to it with a link maybe?
>>
>> The key Arm term is "Software Developer Errata Notice", or just SDEN.
>> Here's the ones for MMU-600 and MMU-700:
>>
>> https://developer.arm.com/documentation/SDEN-946810/latest/
> 
> This page shows "Arm CoreLink MMU-600 System Memory Management
> Unit Software Developer Errata Notice" but the downloaded file
> is "Arm CoreLink CI-700 Coherent Interconnect" errata notice.
> And I don't quite understand what it's about.
Oh, wonderful... I've reported that now, hopefully it gets fixed soon...
>> https://developer.arm.com/documentation/SDEN-1786925/latest/
> 
> Yea, this one I got an "MMU-700 System Memory Management Unit"
> SMMU errata file that I can read and understand.
> 
>> Note that until now it has been extremely fortunate that in pretty much
>> every case Linux either hasn't supported the affected feature at all, or
>> has happened to avoid meeting the conditions. Once we do introduce
>> nesting support that all goes out the window (and I'll have to think
>> more when reviewing new errata in future...)
>>
>> I've been putting off revisiting all the existing errata to figure out
>> what we'd need to do until new nesting patches appeared, so I'll try to
>> get to that soon now. I think in many cases it's likely to be best to
>> just disallowing nesting entirely on affected implementations.
> 
> Do we have already a list of "affected implementations"? Or,
> we would need to make such a list now? In a latter case, can
> these affected implementations be detected from their IRD0-5
> registers, so that we can simply do something in hw_info()?
Somewhere I have a patch that adds all the IIDR stuff needed for this, 
but I never sent it upstream since the erratum itself was an early 
MMU-600 one which in practice doesn't matter. I'll dig that out and 
update it with what I have in mind.
Thanks,
Robin.
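A rough sketch of what IIDR-based detection could look like, assuming the
SMMU_IIDR field layout and the Arm/MMU-600 ID values below (my reading of
the spec and the Linux header; please verify before relying on them). The
policy in nesting_allowed() and its min_fixed_variant parameter are
hypothetical and are not taken from the patch mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed SMMU_IIDR field layout. */
#define IIDR_IMPLEMENTER(x)  ((x) & 0xfffu)          /* bits [11:0]  */
#define IIDR_REVISION(x)     (((x) >> 12) & 0xfu)    /* bits [15:12] */
#define IIDR_VARIANT(x)      (((x) >> 16) & 0xfu)    /* bits [19:16] */
#define IIDR_PRODUCTID(x)    (((x) >> 20) & 0xfffu)  /* bits [31:20] */

#define IMPLEMENTER_ARM      0x43bu  /* assumed JEP106 code for Arm */
#define PRODUCTID_MMU_600    0x483u  /* assumed MMU-600 product ID */

/*
 * Hypothetical policy: refuse nesting on MMU-600 parts whose variant is
 * older than min_fixed_variant; allow it everywhere else.
 */
static bool nesting_allowed(uint32_t iidr, uint32_t min_fixed_variant)
{
	if (IIDR_IMPLEMENTER(iidr) != IMPLEMENTER_ARM)
		return true;
	if (IIDR_PRODUCTID(iidr) != PRODUCTID_MMU_600)
		return true;

	return IIDR_VARIANT(iidr) >= min_fixed_variant;
}
```

Note that this keys off IIDR rather than IDR0-5: the IDR registers
describe features, while IIDR identifies the implementation and revision.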
^ permalink raw reply	[flat|nested] 165+ messages in thread
- * Re: [PATCH v1 04/14] iommu/arm-smmu-v3: Add arm_smmu_hw_info
  2023-03-16 15:19           ` Robin Murphy
@ 2023-03-16 20:06             ` Nicolin Chen
  2023-04-12  7:47               ` Nicolin Chen
  0 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-16 20:06 UTC (permalink / raw)
  To: Robin Murphy
  Cc: jgg, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Thu, Mar 16, 2023 at 03:19:27PM +0000, Robin Murphy wrote:
> > > Note that until now it has been extremely fortunate that in pretty much
> > > every case Linux either hasn't supported the affected feature at all, or
> > > has happened to avoid meeting the conditions. Once we do introduce
> > > nesting support that all goes out the window (and I'll have to think
> > > more when reviewing new errata in future...)
> > > 
> > > I've been putting off revisiting all the existing errata to figure out
> > > what we'd need to do until new nesting patches appeared, so I'll try to
> > > get to that soon now. I think in many cases it's likely to be best to
> > > just disallowing nesting entirely on affected implementations.
> > 
> > Do we have already a list of "affected implementations"? Or,
> > we would need to make such a list now? In a latter case, can
> > these affected implementations be detected from their IRD0-5
> > registers, so that we can simply do something in hw_info()?
> 
> Somewhere I have a patch that adds all the IIDR stuff needed for this,
> but I never sent it upstream since the erratum itself was an early
> MMU-600 one which in practice doesn't matter. I'll dig that out and
> update it with what I have in mind.
Nice!
Perhaps we should merge that first, or include it in this series
if you don't mind, so that we would be less worried about any
affected platform when releasing the new Linux version that has
this nesting feature.
Thanks!
Nic
^ permalink raw reply	[flat|nested] 165+ messages in thread 
- * Re: [PATCH v1 04/14] iommu/arm-smmu-v3: Add arm_smmu_hw_info
  2023-03-16 20:06             ` Nicolin Chen
@ 2023-04-12  7:47               ` Nicolin Chen
  0 siblings, 0 replies; 165+ messages in thread
From: Nicolin Chen @ 2023-04-12  7:47 UTC (permalink / raw)
  To: Robin Murphy
  Cc: jgg, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
Hi Robin,
On Thu, Mar 16, 2023 at 01:06:17PM -0700, Nicolin Chen wrote:
> On Thu, Mar 16, 2023 at 03:19:27PM +0000, Robin Murphy wrote:
> 
> > > > Note that until now it has been extremely fortunate that in pretty much
> > > > every case Linux either hasn't supported the affected feature at all, or
> > > > has happened to avoid meeting the conditions. Once we do introduce
> > > > nesting support that all goes out the window (and I'll have to think
> > > > more when reviewing new errata in future...)
> > > > 
> > > > I've been putting off revisiting all the existing errata to figure out
> > > > what we'd need to do until new nesting patches appeared, so I'll try to
> > > > get to that soon now. I think in many cases it's likely to be best to
> > > > just disallowing nesting entirely on affected implementations.
> > > 
> > > Do we have already a list of "affected implementations"? Or,
> > > we would need to make such a list now? In a latter case, can
> > > these affected implementations be detected from their IRD0-5
> > > registers, so that we can simply do something in hw_info()?
> > 
> > Somewhere I have a patch that adds all the IIDR stuff needed for this,
> > but I never sent it upstream since the erratum itself was an early
> > MMU-600 one which in practice doesn't matter. I'll dig that out and
> > update it with what I have in mind.
> 
> Nice!
> 
> Perhaps we should merge that first, or include in this series
> if you don't mind, so that we would be less worried about any
> affected platform when releasing the new Linux version having
> this nesting feature.
I just want to see if there's a possibility of adding the
patch that you mentioned above in the near term?
I'd like to send a v2 of this series for another round of
review before the next -rc1, so it'd be nicer to include
that.
Thanks
Nic
^ permalink raw reply	[flat|nested] 165+ messages in thread
- * [PATCH v1 05/14] iommu/arm-smmu-v3: Remove ARM_SMMU_DOMAIN_NESTED
  2023-03-09 10:53 [PATCH v1 00/14] Add Nested Translation Support for SMMUv3 Nicolin Chen
                   ` (3 preceding siblings ...)
  2023-03-09 10:53 ` [PATCH v1 04/14] iommu/arm-smmu-v3: Add arm_smmu_hw_info Nicolin Chen
@ 2023-03-09 10:53 ` Nicolin Chen
  2023-03-10 16:39   ` Eric Auger
  2023-03-09 10:53 ` [PATCH v1 06/14] iommu/arm-smmu-v3: Unset corresponding STE fields when s2_cfg is NULL Nicolin Chen
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-09 10:53 UTC (permalink / raw)
  To: jgg, robin.murphy, will
  Cc: eric.auger, kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
IOMMUFD designs two iommu_domain pointers to represent two stages. The S1
iommu_domain (IOMMU_DOMAIN_NESTED type) represents the Context Descriptor
table in the user space. The S2 iommu_domain (IOMMU_DOMAIN_UNMANAGED type)
represents the translation table in the kernel, owned by a hypervisor.
So there is no remaining use case for ARM_SMMU_DOMAIN_NESTED. Drop it, and
use the type IOMMU_DOMAIN_NESTED instead.
Also drop the unused arm_smmu_enable_nesting(). A following patch will
configure the correct smmu_domain->stage.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 18 ------------------
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 -
 2 files changed, 19 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index c1aac695ae0d..c5616145e2a3 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1279,7 +1279,6 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 			s1_cfg = &smmu_domain->s1_cfg;
 			break;
 		case ARM_SMMU_DOMAIN_S2:
-		case ARM_SMMU_DOMAIN_NESTED:
 			s2_cfg = &smmu_domain->s2_cfg;
 			break;
 		default:
@@ -2220,7 +2219,6 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
 		fmt = ARM_64_LPAE_S1;
 		finalise_stage_fn = arm_smmu_domain_finalise_s1;
 		break;
-	case ARM_SMMU_DOMAIN_NESTED:
 	case ARM_SMMU_DOMAIN_S2:
 		ias = smmu->ias;
 		oas = smmu->oas;
@@ -2747,21 +2745,6 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
 	return group;
 }
 
-static int arm_smmu_enable_nesting(struct iommu_domain *domain)
-{
-	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
-	int ret = 0;
-
-	mutex_lock(&smmu_domain->init_mutex);
-	if (smmu_domain->smmu)
-		ret = -EPERM;
-	else
-		smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED;
-	mutex_unlock(&smmu_domain->init_mutex);
-
-	return ret;
-}
-
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
 	return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2890,7 +2873,6 @@ static struct iommu_ops arm_smmu_ops = {
 		.flush_iotlb_all	= arm_smmu_flush_iotlb_all,
 		.iotlb_sync		= arm_smmu_iotlb_sync,
 		.iova_to_phys		= arm_smmu_iova_to_phys,
-		.enable_nesting		= arm_smmu_enable_nesting,
 		.free			= arm_smmu_domain_free,
 	}
 };
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index ba2b4562f4b2..233bfc377267 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -704,7 +704,6 @@ struct arm_smmu_master {
 enum arm_smmu_domain_stage {
 	ARM_SMMU_DOMAIN_S1 = 0,
 	ARM_SMMU_DOMAIN_S2,
-	ARM_SMMU_DOMAIN_NESTED,
 	ARM_SMMU_DOMAIN_BYPASS,
 };
 
-- 
2.39.2
^ permalink raw reply related	[flat|nested] 165+ messages in thread
- * Re: [PATCH v1 05/14] iommu/arm-smmu-v3: Remove ARM_SMMU_DOMAIN_NESTED
  2023-03-09 10:53 ` [PATCH v1 05/14] iommu/arm-smmu-v3: Remove ARM_SMMU_DOMAIN_NESTED Nicolin Chen
@ 2023-03-10 16:39   ` Eric Auger
  2023-03-10 17:05     ` Jason Gunthorpe
  2023-03-11  0:23     ` Nicolin Chen
  0 siblings, 2 replies; 165+ messages in thread
From: Eric Auger @ 2023-03-10 16:39 UTC (permalink / raw)
  To: Nicolin Chen, jgg, robin.murphy, will
  Cc: kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
Hi Nicolin,
On 3/9/23 11:53, Nicolin Chen wrote:
> IOMMUFD designs two iommu_domain pointers to represent two stages. The S1
s/designs/uses?
> iommu_domain (IOMMU_DOMAIN_NESTED type) represents the Context Descriptor
> table in the user space. The S2 iommu_domain (IOMMU_DOMAIN_UNMANAGED type)
> represents the translation table in the kernel, owned by a hypervisor.
>
> So there comes to no use case of the ARM_SMMU_DOMAIN_NESTED. Drop it, and
> use the type IOMMU_DOMAIN_NESTED instead.
The last sentence may be rephrased, since this patch does not use
IOMMU_DOMAIN_NESTED anywhere:
"The generic IOMMU_DOMAIN_NESTED type will be used in the nested SMMU
implementation instead."
>
> Also drop the unused arm_smmu_enable_nesting(). One following patche will
> configure the correct smmu_domain->stage.
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 18 ------------------
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 -
If you go this way, you may also remove it from arm/arm-smmu/arm-smmu.c.
Then, if I am not wrong, no other driver implements the enable_nesting
callback. Shouldn't we also remove it and its companion
iommu_enable_nesting()?
Thanks
Eric
>  2 files changed, 19 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index c1aac695ae0d..c5616145e2a3 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1279,7 +1279,6 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>  			s1_cfg = &smmu_domain->s1_cfg;
>  			break;
>  		case ARM_SMMU_DOMAIN_S2:
> -		case ARM_SMMU_DOMAIN_NESTED:
>  			s2_cfg = &smmu_domain->s2_cfg;
>  			break;
>  		default:
> @@ -2220,7 +2219,6 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
>  		fmt = ARM_64_LPAE_S1;
>  		finalise_stage_fn = arm_smmu_domain_finalise_s1;
>  		break;
> -	case ARM_SMMU_DOMAIN_NESTED:
>  	case ARM_SMMU_DOMAIN_S2:
>  		ias = smmu->ias;
>  		oas = smmu->oas;
> @@ -2747,21 +2745,6 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
>  	return group;
>  }
>  
> -static int arm_smmu_enable_nesting(struct iommu_domain *domain)
> -{
> -	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> -	int ret = 0;
> -
> -	mutex_lock(&smmu_domain->init_mutex);
> -	if (smmu_domain->smmu)
> -		ret = -EPERM;
> -	else
> -		smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED;
> -	mutex_unlock(&smmu_domain->init_mutex);
> -
> -	return ret;
> -}
> -
>  static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
>  {
>  	return iommu_fwspec_add_ids(dev, args->args, 1);
> @@ -2890,7 +2873,6 @@ static struct iommu_ops arm_smmu_ops = {
>  		.flush_iotlb_all	= arm_smmu_flush_iotlb_all,
>  		.iotlb_sync		= arm_smmu_iotlb_sync,
>  		.iova_to_phys		= arm_smmu_iova_to_phys,
> -		.enable_nesting		= arm_smmu_enable_nesting,
>  		.free			= arm_smmu_domain_free,
>  	}
>  };
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index ba2b4562f4b2..233bfc377267 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -704,7 +704,6 @@ struct arm_smmu_master {
>  enum arm_smmu_domain_stage {
>  	ARM_SMMU_DOMAIN_S1 = 0,
>  	ARM_SMMU_DOMAIN_S2,
> -	ARM_SMMU_DOMAIN_NESTED,
>  	ARM_SMMU_DOMAIN_BYPASS,
>  };
>  
^ permalink raw reply	[flat|nested] 165+ messages in thread
- * Re: [PATCH v1 05/14] iommu/arm-smmu-v3: Remove ARM_SMMU_DOMAIN_NESTED
  2023-03-10 16:39   ` Eric Auger
@ 2023-03-10 17:05     ` Jason Gunthorpe
  2023-03-11  0:24       ` Nicolin Chen
  2023-03-11  0:23     ` Nicolin Chen
  1 sibling, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-10 17:05 UTC (permalink / raw)
  To: Eric Auger
  Cc: Nicolin Chen, robin.murphy, will, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Fri, Mar 10, 2023 at 05:39:22PM +0100, Eric Auger wrote:
> > Also drop the unused arm_smmu_enable_nesting(). One following patche will
> > configure the correct smmu_domain->stage.
> >
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > ---
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 18 ------------------
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 -
> If you go this way you may also remove it from arm/arm-smmu/arm-smmu.c.
> Then if I am not wrong no other driver does implement enable_nesting cb.
> Shouldn't we also remove it and fellow iommu_enable_nesting()?
Yes, let's just put this patch in the series please:
https://lore.kernel.org/kvm/0-v1-0093c9b0e345+19-vfio_no_nesting_jgg@nvidia.com/
Jason
^ permalink raw reply	[flat|nested] 165+ messages in thread 
- * Re: [PATCH v1 05/14] iommu/arm-smmu-v3: Remove ARM_SMMU_DOMAIN_NESTED
  2023-03-10 17:05     ` Jason Gunthorpe
@ 2023-03-11  0:24       ` Nicolin Chen
  0 siblings, 0 replies; 165+ messages in thread
From: Nicolin Chen @ 2023-03-11  0:24 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Eric Auger, robin.murphy, will, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Fri, Mar 10, 2023 at 01:05:36PM -0400, Jason Gunthorpe wrote:
> On Fri, Mar 10, 2023 at 05:39:22PM +0100, Eric Auger wrote:
> 
> > > Also drop the unused arm_smmu_enable_nesting(). One following patche will
> > > configure the correct smmu_domain->stage.
> > >
> > > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > > ---
> > >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 18 ------------------
> > >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 -
> > If you go this way you may also remove it from arm/arm-smmu/arm-smmu.c.
> > Then if I am not wrong no other driver does implement enable_nesting cb.
> > Shouldn't we also remove it and fellow iommu_enable_nesting()?
> 
> Yes, lets just put this patch in the series please:
> 
> https://lore.kernel.org/kvm/0-v1-0093c9b0e345+19-vfio_no_nesting_jgg@nvidia.com/
Oh. Didn't read this before sending my previous reply..
Will do that.
Nic
^ permalink raw reply	[flat|nested] 165+ messages in thread
- * Re: [PATCH v1 05/14] iommu/arm-smmu-v3: Remove ARM_SMMU_DOMAIN_NESTED
  2023-03-10 16:39   ` Eric Auger
  2023-03-10 17:05     ` Jason Gunthorpe
@ 2023-03-11  0:23     ` Nicolin Chen
  1 sibling, 0 replies; 165+ messages in thread
From: Nicolin Chen @ 2023-03-11  0:23 UTC (permalink / raw)
  To: Eric Auger
  Cc: jgg, robin.murphy, will, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Fri, Mar 10, 2023 at 05:39:22PM +0100, Eric Auger wrote:
> Hi Nicolin,
> 
> On 3/9/23 11:53, Nicolin Chen wrote:
> > IOMMUFD designs two iommu_domain pointers to represent two stages. The S1
> s/designs/uses?
> > iommu_domain (IOMMU_DOMAIN_NESTED type) represents the Context Descriptor
> > table in the user space. The S2 iommu_domain (IOMMU_DOMAIN_UNMANAGED type)
> > represents the translation table in the kernel, owned by a hypervisor.
> >
> > So there comes to no use case of the ARM_SMMU_DOMAIN_NESTED. Drop it, and
> > use the type IOMMU_DOMAIN_NESTED instead.
> last sentence may be rephrased as this patch does not use
> IOMMU_DOMAIN_NESTED anywhere:
> Generic IOMMU_DOMAIN_NESTED type will be used in nested SMMU
> implementation instead.
> >
> > Also drop the unused arm_smmu_enable_nesting(). One following patche will
> > configure the correct smmu_domain->stage.
> >
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > ---
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 18 ------------------
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 -
> If you go this way you may also remove it from arm/arm-smmu/arm-smmu.c.
> Then if I am not wrong no other driver does implement enable_nesting cb.
> Shouldn't we also remove it and fellow iommu_enable_nesting()?
We had a small discussion before this community version, where
Robin mentioned that we can remove that too after the nesting
series gets merged. Yet, I didn't want to touch the v2 driver
with this series, since no nesting change is being added to it.
And a few months ago, Jason had a patch removing that whole API
from the top. Perhaps that one can be resent after all?
Thanks
Nic
^ permalink raw reply	[flat|nested] 165+ messages in thread
- * [PATCH v1 06/14] iommu/arm-smmu-v3: Unset corresponding STE fields when s2_cfg is NULL
  2023-03-09 10:53 [PATCH v1 00/14] Add Nested Translation Support for SMMUv3 Nicolin Chen
                   ` (4 preceding siblings ...)
  2023-03-09 10:53 ` [PATCH v1 05/14] iommu/arm-smmu-v3: Remove ARM_SMMU_DOMAIN_NESTED Nicolin Chen
@ 2023-03-09 10:53 ` Nicolin Chen
  2023-03-09 13:13   ` Robin Murphy
  2023-03-09 10:53 ` [PATCH v1 07/14] iommu/arm-smmu-v3: Add STRTAB_STE_0_CFG_NESTED for 2-stage translation Nicolin Chen
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-09 10:53 UTC (permalink / raw)
  To: jgg, robin.murphy, will
  Cc: eric.auger, kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
From: Eric Auger <eric.auger@redhat.com>
Although the spec does not seem to mention this, on some implementations,
when the STE configuration switches from an S1+S2 cfg to an S1-only one,
a C_BAD_STE error would happen if dst[3] (S2TTB) is not reset.
Explicitly reset the two higher 64-bit fields, dst[2] and dst[3], to
prevent that.
Note that this is not a bug at this moment, since a 2-stage translation
setup is not yet enabled, until the following patches add its support.
Reported-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index c5616145e2a3..29e36448d23b 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1361,6 +1361,9 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 		dst[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK);
 
 		val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS);
+	}  else {
+		dst[2] = 0;
+		dst[3] = 0;
 	}
 
 	if (master->ats_enabled)
-- 
2.39.2
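For readers following along outside the kernel tree, the effect of the hunk above can be sketched in plain C. This is only an illustration under stated assumptions: ste_write_s2_words() and struct s2_cfg_sketch are hypothetical names, the packing of dst[2] is simplified and is not the real STE field layout, and endianness conversion and field masks are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STE_DWORDS 8 /* an SMMUv3 STE is 64 bytes, i.e. eight 64-bit words */

/* Hypothetical, cut-down stand-in for struct arm_smmu_s2_cfg. */
struct s2_cfg_sketch {
	uint16_t vmid;
	uint64_t vttbr;
	uint64_t vtcr;
};

/*
 * Fill the stage-2 words of an STE. With a NULL s2_cfg the two words are
 * cleared, so no stale S2TTB survives an S1+S2 -> S1-only switch.
 * The dst[2] packing here is illustrative only.
 */
void ste_write_s2_words(uint64_t dst[STE_DWORDS],
			const struct s2_cfg_sketch *s2_cfg)
{
	assert(dst != NULL);

	if (s2_cfg) {
		dst[2] = (uint64_t)s2_cfg->vmid | (s2_cfg->vtcr << 32);
		dst[3] = s2_cfg->vttbr;
	} else {
		dst[2] = 0;
		dst[3] = 0;
	}
}
```

Calling ste_write_s2_words(ste, NULL) on an entry that previously held a stage-2 configuration leaves both words zero, which is the state the affected implementations reportedly expect for an S1-only STE.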
- * Re: [PATCH v1 06/14] iommu/arm-smmu-v3: Unset corresponding STE fields when s2_cfg is NULL
  2023-03-09 10:53 ` [PATCH v1 06/14] iommu/arm-smmu-v3: Unset corresponding STE fields when s2_cfg is NULL Nicolin Chen
@ 2023-03-09 13:13   ` Robin Murphy
  2023-03-09 18:24     ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 165+ messages in thread
From: Robin Murphy @ 2023-03-09 13:13 UTC (permalink / raw)
  To: Nicolin Chen, jgg, will
  Cc: eric.auger, kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
On 2023-03-09 10:53, Nicolin Chen wrote:
> From: Eric Auger <eric.auger@redhat.com>
> 
> Despite the spec does not seem to mention this, on some implementations,
> when the STE configuration switches from an S1+S2 cfg to an S1 only one,
> a C_BAD_STE error would happen if dst[3] (S2TTB) is not reset.
Can you provide more details, since it's not clear whether this is a 
hardware erratum workaround or a bodge around the driver itself doing 
something wrong like not doing a proper break-before-make transition of 
the STE. The architecture explicitly states that all the STE.S2* fields 
except S2VMID and potentially S2S are ignored when Stage 2 is bypassed.
Thanks,
Robin.
> Explicitly reset those two higher 64b fields, to prevent that.
> 
> Note that this is not a bug at this moment, since a 2-stage translation
> setup is not yet enabled, until the following patches add its support.
> 
> Reported-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index c5616145e2a3..29e36448d23b 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1361,6 +1361,9 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>   		dst[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK);
>   
>   		val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS);
> +	}  else {
> +		dst[2] = 0;
> +		dst[3] = 0;
>   	}
>   
>   	if (master->ats_enabled)
- * RE: [PATCH v1 06/14] iommu/arm-smmu-v3: Unset corresponding STE fields when s2_cfg is NULL
  2023-03-09 13:13   ` Robin Murphy
@ 2023-03-09 18:24     ` Shameerali Kolothum Thodi
  2023-03-10  1:54       ` Nicolin Chen
  0 siblings, 1 reply; 165+ messages in thread
From: Shameerali Kolothum Thodi @ 2023-03-09 18:24 UTC (permalink / raw)
  To: Robin Murphy, Nicolin Chen, jgg@nvidia.com, will@kernel.org
  Cc: eric.auger@redhat.com, kevin.tian@intel.com,
	baolu.lu@linux.intel.com, joro@8bytes.org,
	jean-philippe@linaro.org, linux-arm-kernel@lists.infradead.org,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org
> -----Original Message-----
> From: Robin Murphy [mailto:robin.murphy@arm.com]
> Sent: 09 March 2023 13:13
> To: Nicolin Chen <nicolinc@nvidia.com>; jgg@nvidia.com; will@kernel.org
> Cc: eric.auger@redhat.com; kevin.tian@intel.com; baolu.lu@linux.intel.com;
> joro@8bytes.org; Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; jean-philippe@linaro.org;
> linux-arm-kernel@lists.infradead.org; iommu@lists.linux.dev;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v1 06/14] iommu/arm-smmu-v3: Unset corresponding
> STE fields when s2_cfg is NULL
> 
> On 2023-03-09 10:53, Nicolin Chen wrote:
> > From: Eric Auger <eric.auger@redhat.com>
> >
> > Despite the spec does not seem to mention this, on some implementations,
> > when the STE configuration switches from an S1+S2 cfg to an S1 only one,
> > a C_BAD_STE error would happen if dst[3] (S2TTB) is not reset.
> 
> Can you provide more details, since it's not clear whether this is a
> hardware erratum workaround or a bodge around the driver itself doing
> something wrong like not doing a proper break-before-make transition of
> the STE. The architecture explicitly states that all the STE.S2* fields
> except S2VMID and potentially S2S are ignored when Stage 2 is bypassed.
It took a while to locate the email thread where this was discussed:
https://patchwork.kernel.org/cover/11449895/#23244457
This was observed on a HiSilicon implementation where, once the SMMUv3 has been configured in
both Stage 1 and Stage 2 (nested) mode, it is not possible to configure it back
to Stage 1 mode for the same device (stream ID).
IIRC, the SMMUv3 implementation on these boards expects the S2TTB field in the STE to be set to zero
when using S1; otherwise it reports a C_BAD_STE error. :(
You are right that the specification doesn't demand this, and I am not sure whether there is any other
hardware that requires this.
Could we please have this with a comment added in the code?
Thanks,
Shameer
- * Re: [PATCH v1 06/14] iommu/arm-smmu-v3: Unset corresponding STE fields when s2_cfg is NULL
  2023-03-09 18:24     ` Shameerali Kolothum Thodi
@ 2023-03-10  1:54       ` Nicolin Chen
  0 siblings, 0 replies; 165+ messages in thread
From: Nicolin Chen @ 2023-03-10  1:54 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: Robin Murphy, jgg@nvidia.com, will@kernel.org,
	eric.auger@redhat.com, kevin.tian@intel.com,
	baolu.lu@linux.intel.com, joro@8bytes.org,
	jean-philippe@linaro.org, linux-arm-kernel@lists.infradead.org,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org
On Thu, Mar 09, 2023 at 06:24:29PM +0000, Shameerali Kolothum Thodi wrote:
> > -----Original Message-----
> > From: Robin Murphy [mailto:robin.murphy@arm.com]
> > Sent: 09 March 2023 13:13
> > To: Nicolin Chen <nicolinc@nvidia.com>; jgg@nvidia.com; will@kernel.org
> > Cc: eric.auger@redhat.com; kevin.tian@intel.com; baolu.lu@linux.intel.com;
> > joro@8bytes.org; Shameerali Kolothum Thodi
> > <shameerali.kolothum.thodi@huawei.com>; jean-philippe@linaro.org;
> > linux-arm-kernel@lists.infradead.org; iommu@lists.linux.dev;
> > linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH v1 06/14] iommu/arm-smmu-v3: Unset corresponding
> > STE fields when s2_cfg is NULL
> >
> > On 2023-03-09 10:53, Nicolin Chen wrote:
> > > From: Eric Auger <eric.auger@redhat.com>
> > >
> > > Despite the spec does not seem to mention this, on some implementations,
> > > when the STE configuration switches from an S1+S2 cfg to an S1 only one,
> > > a C_BAD_STE error would happen if dst[3] (S2TTB) is not reset.
> >
> > Can you provide more details, since it's not clear whether this is a
> > hardware erratum workaround or a bodge around the driver itself doing
> > something wrong like not doing a proper break-before-make transition of
> > the STE. The architecture explicitly states that all the STE.S2* fields
> > except S2VMID and potentially S2S are ignored when Stage 2 is bypassed.
> 
> Took a while to locate the email thread where this was discussed,
> https://patchwork.kernel.org/cover/11449895/#23244457
> 
> This was observed on a HiSilicon implementation where, if the SMMUv3 is configured with
> both Stage 1 and Stage 2 (nested) mode once, then it is not possible to configure it back
> for Stage 1 mode for the same device(stream id).
> 
> IIRC, the SMMUv3 implementation on these boards expects to set the S2TTB field in STE to zero
> when using S1, otherwise it reports C_BAD_STE error. :(
> 
> You are right that the specification doesn't demand this and I am not sure there are any other
> Hardware that requires this.
> 
> Could we please have this with a comment added in the code?
Yes, I can add that, and put that link in the commit message too.
Thanks
Nicolin
- * [PATCH v1 07/14] iommu/arm-smmu-v3: Add STRTAB_STE_0_CFG_NESTED for 2-stage translation
  2023-03-09 10:53 [PATCH v1 00/14] Add Nested Translation Support for SMMUv3 Nicolin Chen
                   ` (5 preceding siblings ...)
  2023-03-09 10:53 ` [PATCH v1 06/14] iommu/arm-smmu-v3: Unset corresponding STE fields when s2_cfg is NULL Nicolin Chen
@ 2023-03-09 10:53 ` Nicolin Chen
  2023-03-09 10:53 ` [PATCH v1 08/14] iommu/arm-smmu-v3: Prepare for nested domain support Nicolin Chen
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 165+ messages in thread
From: Nicolin Chen @ 2023-03-09 10:53 UTC (permalink / raw)
  To: jgg, robin.murphy, will
  Cc: eric.auger, kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
From: Eric Auger <eric.auger@redhat.com>
The value of the STRTAB_STE_0_CFG field can be 0b111 as the configuration
for a 2-stage translation, meaning that both S1 and S2 are valid. Add it
and mark the ste_live accordingly.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 +
 2 files changed, 2 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 29e36448d23b..21d819979865 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1292,6 +1292,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 			break;
 		case STRTAB_STE_0_CFG_S1_TRANS:
 		case STRTAB_STE_0_CFG_S2_TRANS:
+		case STRTAB_STE_0_CFG_NESTED:
 			ste_live = true;
 			break;
 		case STRTAB_STE_0_CFG_ABORT:
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 233bfc377267..1a93eeb993ea 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -208,6 +208,7 @@
 #define STRTAB_STE_0_CFG_BYPASS		4
 #define STRTAB_STE_0_CFG_S1_TRANS	5
 #define STRTAB_STE_0_CFG_S2_TRANS	6
+#define STRTAB_STE_0_CFG_NESTED		7
 
 #define STRTAB_STE_0_S1FMT		GENMASK_ULL(5, 4)
 #define STRTAB_STE_0_S1FMT_LINEAR	0
-- 
2.39.2
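As a rough illustration of how the new value fits with the existing ones, the sketch below decodes the Config field from the first STE word and decides whether the entry counts as live. It assumes the valid bit is bit 0 and Config occupies bits [3:1]; the names ste_get_cfg(), ste_is_live() and the STE_* macros are made up for this example and are not the driver's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values of the 3-bit STE Config field, as listed in the header hunk. */
enum ste_cfg {
	STE_CFG_ABORT    = 0,
	STE_CFG_BYPASS   = 4,
	STE_CFG_S1_TRANS = 5,
	STE_CFG_S2_TRANS = 6,
	STE_CFG_NESTED   = 7, /* 0b111: stage 1 and stage 2 both enabled */
};

/* Assumed bit positions within the first 64-bit word of the STE. */
#define STE_0_V         (1ULL << 0)
#define STE_0_CFG_SHIFT 1
#define STE_0_CFG_MASK  (0x7ULL << STE_0_CFG_SHIFT)

unsigned int ste_get_cfg(uint64_t ste0)
{
	return (unsigned int)((ste0 & STE_0_CFG_MASK) >> STE_0_CFG_SHIFT);
}

/* An entry is live when it is valid and performs some translation. */
bool ste_is_live(uint64_t ste0)
{
	if (!(ste0 & STE_0_V))
		return false;

	switch (ste_get_cfg(ste0)) {
	case STE_CFG_S1_TRANS:
	case STE_CFG_S2_TRANS:
	case STE_CFG_NESTED:
		return true;
	default:
		return false;
	}
}
```

Without the extra case label, a valid entry with Config 0b111 would fall through to the default arm of the switch, which is why the one-line change in the .c file accompanies the new define.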
- * [PATCH v1 08/14] iommu/arm-smmu-v3: Prepare for nested domain support
  2023-03-09 10:53 [PATCH v1 00/14] Add Nested Translation Support for SMMUv3 Nicolin Chen
                   ` (6 preceding siblings ...)
  2023-03-09 10:53 ` [PATCH v1 07/14] iommu/arm-smmu-v3: Add STRTAB_STE_0_CFG_NESTED for 2-stage translation Nicolin Chen
@ 2023-03-09 10:53 ` Nicolin Chen
  2023-03-10 20:39   ` Robin Murphy
  2023-03-09 10:53 ` [PATCH v1 09/14] iommu/arm-smmu-v3: Implement arm_smmu_get_unmanaged_domain Nicolin Chen
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-09 10:53 UTC (permalink / raw)
  To: jgg, robin.murphy, will
  Cc: eric.auger, kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
In a nested translation setup, the device is attached to a stage-1 domain
that represents the guest-level Context Descriptor table. A Stream Table
Entry for a 2-stage translation needs both the stage-1 Context Descriptor
table info and the stage-2 Translation table information, i.e. a pair of
s1_cfg and s2_cfg.
Add an "s2" pointer in struct arm_smmu_domain, so a nested stage-1 domain
can simply reach its stage-2 domain for the s2_cfg pointer. Also, add
a to_s2_cfg() helper for this purpose, and use it in the appropriate places.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 25 +++++++++++++++++++--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 +
 2 files changed, 24 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 21d819979865..fee5977feef3 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -100,6 +100,24 @@ static void parse_driver_options(struct arm_smmu_device *smmu)
 	} while (arm_smmu_options[++i].opt);
 }
 
+static struct arm_smmu_s2_cfg *to_s2_cfg(struct arm_smmu_domain *smmu_domain)
+{
+	if (!smmu_domain)
+		return NULL;
+
+	switch (smmu_domain->stage) {
+	case ARM_SMMU_DOMAIN_S1:
+		if (smmu_domain->s2)
+			return &smmu_domain->s2->s2_cfg;
+		return NULL;
+	case ARM_SMMU_DOMAIN_S2:
+		return &smmu_domain->s2_cfg;
+	case ARM_SMMU_DOMAIN_BYPASS:
+	default:
+		return NULL;
+	}
+}
+
 /* Low-level queue manipulation functions */
 static bool queue_has_space(struct arm_smmu_ll_queue *q, u32 n)
 {
@@ -1277,6 +1295,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 		switch (smmu_domain->stage) {
 		case ARM_SMMU_DOMAIN_S1:
 			s1_cfg = &smmu_domain->s1_cfg;
+			s2_cfg = to_s2_cfg(smmu_domain);
 			break;
 		case ARM_SMMU_DOMAIN_S2:
 			s2_cfg = &smmu_domain->s2_cfg;
@@ -1846,6 +1865,7 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
 static void arm_smmu_tlb_inv_context(void *cookie)
 {
 	struct arm_smmu_domain *smmu_domain = cookie;
+	struct arm_smmu_s2_cfg *s2_cfg = to_s2_cfg(smmu_domain);
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_cmdq_ent cmd;
 
@@ -1860,7 +1880,7 @@ static void arm_smmu_tlb_inv_context(void *cookie)
 		arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg.cd.asid);
 	} else {
 		cmd.opcode	= CMDQ_OP_TLBI_S12_VMALL;
-		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
+		cmd.tlbi.vmid	= s2_cfg->vmid;
 		arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
 	}
 	arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
@@ -1931,6 +1951,7 @@ static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
 					  size_t granule, bool leaf,
 					  struct arm_smmu_domain *smmu_domain)
 {
+	struct arm_smmu_s2_cfg *s2_cfg = to_s2_cfg(smmu_domain);
 	struct arm_smmu_cmdq_ent cmd = {
 		.tlbi = {
 			.leaf	= leaf,
@@ -1943,7 +1964,7 @@ static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
 		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd.asid;
 	} else {
 		cmd.opcode	= CMDQ_OP_TLBI_S2_IPA;
-		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
+		cmd.tlbi.vmid	= s2_cfg->vmid;
 	}
 	__arm_smmu_tlb_inv_range(&cmd, iova, size, granule, smmu_domain);
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 1a93eeb993ea..6cf516852721 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -709,6 +709,7 @@ enum arm_smmu_domain_stage {
 };
 
 struct arm_smmu_domain {
+	struct arm_smmu_domain		*s2;
 	struct arm_smmu_device		*smmu;
 	struct mutex			init_mutex; /* Protects smmu pointer */
 
-- 
2.39.2
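The lookup performed by to_s2_cfg() can be tried out in user space with cut-down types. The sketch below is an approximation only: struct domain, struct s2_cfg, enum stage and s2_vmid_for_tlbi() are hypothetical stand-ins, and the assert in the second function marks the assumption that callers on the stage-2 invalidation path only run for domains that resolve to a stage-2 config.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum stage { STAGE_S1, STAGE_S2, STAGE_BYPASS };

struct s2_cfg {
	uint16_t vmid;
};

struct domain {
	struct domain *s2;	/* set only on a nested stage-1 domain */
	enum stage stage;
	struct s2_cfg s2_cfg;
};

/* Resolve the stage-2 config a domain translates through, if any. */
struct s2_cfg *to_s2_cfg(struct domain *d)
{
	if (!d)
		return NULL;

	switch (d->stage) {
	case STAGE_S1:
		return d->s2 ? &d->s2->s2_cfg : NULL;
	case STAGE_S2:
		return &d->s2_cfg;
	default:
		return NULL;
	}
}

/* VMID for a stage-2 TLB invalidation; the caller must have an S2 config. */
uint16_t s2_vmid_for_tlbi(struct domain *d)
{
	struct s2_cfg *cfg = to_s2_cfg(d);

	assert(cfg != NULL);
	return cfg->vmid;
}
```

A standalone stage-1 domain yields NULL here, so any caller that dereferences the result unconditionally depends on never being reached for such a domain.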
- * Re: [PATCH v1 08/14] iommu/arm-smmu-v3: Prepare for nested domain support
  2023-03-09 10:53 ` [PATCH v1 08/14] iommu/arm-smmu-v3: Prepare for nested domain support Nicolin Chen
@ 2023-03-10 20:39   ` Robin Murphy
  2023-03-11 12:40     ` Nicolin Chen
  0 siblings, 1 reply; 165+ messages in thread
From: Robin Murphy @ 2023-03-10 20:39 UTC (permalink / raw)
  To: Nicolin Chen, jgg, will
  Cc: eric.auger, kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
On 2023-03-09 10:53, Nicolin Chen wrote:
> In a nested translation setup, the device is attached to a stage-1 domain
> that represents the guest-level Context Descriptor table. A Stream Table
> Entry for a 2-stage translation needs both the stage-1 Context Descriptor
> table info and the stage-2 Translation table information, i.e. a pair of
> s1_cfg and s2_cfg.
> 
> Add an "s2" pointer in struct arm_smmu_domain, so a nested stage-1 domain
> can simply navigate its stage-2 domain for the s2_cfg pointer. Also, add
> a to_s2_cfg() helper for this purpose, and use it at proper places.
> 
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 25 +++++++++++++++++++--
>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 +
>   2 files changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 21d819979865..fee5977feef3 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -100,6 +100,24 @@ static void parse_driver_options(struct arm_smmu_device *smmu)
>   	} while (arm_smmu_options[++i].opt);
>   }
>   
> +static struct arm_smmu_s2_cfg *to_s2_cfg(struct arm_smmu_domain *smmu_domain)
> +{
> +	if (!smmu_domain)
> +		return NULL;
> +
> +	switch (smmu_domain->stage) {
> +	case ARM_SMMU_DOMAIN_S1:
> +		if (smmu_domain->s2)
> +			return &smmu_domain->s2->s2_cfg;
> +		return NULL;
> +	case ARM_SMMU_DOMAIN_S2:
> +		return &smmu_domain->s2_cfg;
> +	case ARM_SMMU_DOMAIN_BYPASS:
> +	default:
> +		return NULL;
> +	}
> +}
> +
>   /* Low-level queue manipulation functions */
>   static bool queue_has_space(struct arm_smmu_ll_queue *q, u32 n)
>   {
> @@ -1277,6 +1295,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>   		switch (smmu_domain->stage) {
>   		case ARM_SMMU_DOMAIN_S1:
>   			s1_cfg = &smmu_domain->s1_cfg;
> +			s2_cfg = to_s2_cfg(smmu_domain);
TBH I'd say you only need a 2-line change here. All the other cases 
below are when the stage is guaranteed to be ARM_SMMU_DOMAIN_S2 (once 
ARM_SMMU_DOMAIN_NESTED is gone), so pretending it might be otherwise 
seems unnecessarily confusing.
Thanks,
Robin.
>   			break;
>   		case ARM_SMMU_DOMAIN_S2:
>   			s2_cfg = &smmu_domain->s2_cfg;
> @@ -1846,6 +1865,7 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
>   static void arm_smmu_tlb_inv_context(void *cookie)
>   {
>   	struct arm_smmu_domain *smmu_domain = cookie;
> +	struct arm_smmu_s2_cfg *s2_cfg = to_s2_cfg(smmu_domain);
>   	struct arm_smmu_device *smmu = smmu_domain->smmu;
>   	struct arm_smmu_cmdq_ent cmd;
>   
> @@ -1860,7 +1880,7 @@ static void arm_smmu_tlb_inv_context(void *cookie)
>   		arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg.cd.asid);
>   	} else {
>   		cmd.opcode	= CMDQ_OP_TLBI_S12_VMALL;
> -		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
> +		cmd.tlbi.vmid	= s2_cfg->vmid;
>   		arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
>   	}
>   	arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
> @@ -1931,6 +1951,7 @@ static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>   					  size_t granule, bool leaf,
>   					  struct arm_smmu_domain *smmu_domain)
>   {
> +	struct arm_smmu_s2_cfg *s2_cfg = to_s2_cfg(smmu_domain);
>   	struct arm_smmu_cmdq_ent cmd = {
>   		.tlbi = {
>   			.leaf	= leaf,
> @@ -1943,7 +1964,7 @@ static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>   		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd.asid;
>   	} else {
>   		cmd.opcode	= CMDQ_OP_TLBI_S2_IPA;
> -		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
> +		cmd.tlbi.vmid	= s2_cfg->vmid;
>   	}
>   	__arm_smmu_tlb_inv_range(&cmd, iova, size, granule, smmu_domain);
>   
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 1a93eeb993ea..6cf516852721 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -709,6 +709,7 @@ enum arm_smmu_domain_stage {
>   };
>   
>   struct arm_smmu_domain {
> +	struct arm_smmu_domain		*s2;
>   	struct arm_smmu_device		*smmu;
>   	struct mutex			init_mutex; /* Protects smmu pointer */
>   
- * Re: [PATCH v1 08/14] iommu/arm-smmu-v3: Prepare for nested domain support
  2023-03-10 20:39   ` Robin Murphy
@ 2023-03-11 12:40     ` Nicolin Chen
  0 siblings, 0 replies; 165+ messages in thread
From: Nicolin Chen @ 2023-03-11 12:40 UTC (permalink / raw)
  To: Robin Murphy
  Cc: jgg, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Fri, Mar 10, 2023 at 08:39:20PM +0000, Robin Murphy wrote:
> > @@ -1277,6 +1295,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
> >               switch (smmu_domain->stage) {
> >               case ARM_SMMU_DOMAIN_S1:
> >                       s1_cfg = &smmu_domain->s1_cfg;
> > +                     s2_cfg = to_s2_cfg(smmu_domain);
> 
> TBH I'd say you only need a 2-line change here. All the other cases
> below are when the stage is guaranteed to be ARM_SMMU_DOMAIN_S2 (once
> ARM_SMMU_DOMAIN_NESTED is gone), so pretending it might be otherwise
> seems unnecessarily confusing.
Oh right...I will drop those.
Thanks!
Nic
- * [PATCH v1 09/14] iommu/arm-smmu-v3: Implement arm_smmu_get_unmanaged_domain
  2023-03-09 10:53 [PATCH v1 00/14] Add Nested Translation Support for SMMUv3 Nicolin Chen
                   ` (7 preceding siblings ...)
  2023-03-09 10:53 ` [PATCH v1 08/14] iommu/arm-smmu-v3: Prepare for nested domain support Nicolin Chen
@ 2023-03-09 10:53 ` Nicolin Chen
  2023-03-09 10:53 ` [PATCH v1 10/14] iommu/arm-smmu-v3: Pass in user_cfg to arm_smmu_domain_finalise Nicolin Chen
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 165+ messages in thread
From: Nicolin Chen @ 2023-03-09 10:53 UTC (permalink / raw)
  To: jgg, robin.murphy, will
  Cc: eric.auger, kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
In a 1-stage translation setup, a device is attached to an iommu_domain
(ARM_SMMU_DOMAIN_S1) of IOMMU_DOMAIN_UNMANAGED type.
In a 2-stage translation setup, a device is attached to an iommu_domain
(ARM_SMMU_DOMAIN_S1) of IOMMU_DOMAIN_NESTED type, which must have
a valid "s2" pointer to an iommu_domain (ARM_SMMU_DOMAIN_S2) of
IOMMU_DOMAIN_UNMANAGED type.
Add a function to return the correct iommu_domain pointer accordingly.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index fee5977feef3..18ab5d516cf2 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2082,6 +2082,17 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 	return &smmu_domain->domain;
 }
 
+static struct iommu_domain *arm_smmu_get_unmanaged_domain(struct device *dev)
+{
+	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+	struct arm_smmu_domain *smmu_domain = master->domain;
+
+	if (smmu_domain->s2)
+		return &smmu_domain->s2->domain;
+
+	return &smmu_domain->domain;
+}
+
 static int arm_smmu_bitmap_alloc(unsigned long *map, int span)
 {
 	int idx, size = 1 << span;
@@ -2878,6 +2889,7 @@ static struct iommu_ops arm_smmu_ops = {
 	.capable		= arm_smmu_capable,
 	.hw_info		= arm_smmu_hw_info,
 	.domain_alloc		= arm_smmu_domain_alloc,
+	.get_unmanaged_domain	= arm_smmu_get_unmanaged_domain,
 	.probe_device		= arm_smmu_probe_device,
 	.release_device		= arm_smmu_release_device,
 	.device_group		= arm_smmu_device_group,
-- 
2.39.2
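The selection rule described in the commit message can be modelled with a few stand-in types. Everything named *_sketch below, along with the DOMAIN_* enum, is hypothetical and exists only for this example; the assert documents an assumption of the sketch (the device is already attached) rather than something the patch itself checks.

```c
#include <assert.h>
#include <stddef.h>

enum domain_type_sketch { DOMAIN_UNMANAGED, DOMAIN_NESTED };

struct iommu_domain_sketch {
	enum domain_type_sketch type;
};

struct smmu_domain_sketch {
	struct smmu_domain_sketch *s2;	/* non-NULL only for a nested S1 */
	struct iommu_domain_sketch domain;
};

struct master_sketch {
	struct smmu_domain_sketch *domain;	/* currently attached domain */
};

/*
 * Return the unmanaged domain backing a device: the stage-2 parent when
 * the device sits on a nested stage-1 domain, else the attached domain.
 */
struct iommu_domain_sketch *get_unmanaged_domain(struct master_sketch *master)
{
	struct smmu_domain_sketch *d = master->domain;

	assert(d != NULL);	/* sketch-only assumption: device is attached */

	if (d->s2)
		return &d->s2->domain;
	return &d->domain;
}
```

In the nested case the returned domain is the stage-2 one, which per the commit message is the domain of IOMMU_DOMAIN_UNMANAGED type.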
- * [PATCH v1 10/14] iommu/arm-smmu-v3: Pass in user_cfg to arm_smmu_domain_finalise
  2023-03-09 10:53 [PATCH v1 00/14] Add Nested Translation Support for SMMUv3 Nicolin Chen
                   ` (8 preceding siblings ...)
  2023-03-09 10:53 ` [PATCH v1 09/14] iommu/arm-smmu-v3: Implement arm_smmu_get_unmanaged_domain Nicolin Chen
@ 2023-03-09 10:53 ` Nicolin Chen
  2023-03-09 10:53 ` [PATCH v1 11/14] iommu/arm-smmu-v3: Add arm_smmu_domain_alloc_user Nicolin Chen
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 165+ messages in thread
From: Nicolin Chen @ 2023-03-09 10:53 UTC (permalink / raw)
  To: jgg, robin.murphy, will
  Cc: eric.auger, kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
The struct iommu_hwpt_arm_smmuv3 contains the userspace Stream Table Entry
info (for ARM_SMMU_DOMAIN_S1) and an "S2" flag (for ARM_SMMU_DOMAIN_S2).
Pass in a valid user_cfg pointer, so arm_smmu_domain_finalise() can handle
both types of user domain finalizations.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 18ab5d516cf2..2d29f7320570 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -26,6 +26,7 @@
 #include <linux/pci.h>
 #include <linux/pci-ats.h>
 #include <linux/platform_device.h>
+#include <uapi/linux/iommufd.h>
 
 #include "arm-smmu-v3.h"
 #include "../../dma-iommu.h"
@@ -2223,7 +2224,8 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
 }
 
 static int arm_smmu_domain_finalise(struct iommu_domain *domain,
-				    struct arm_smmu_master *master)
+				    struct arm_smmu_master *master,
+				    const struct iommu_hwpt_arm_smmuv3 *user_cfg)
 {
 	int ret;
 	unsigned long ias, oas;
@@ -2235,12 +2237,18 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
 				 struct io_pgtable_cfg *);
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	bool user_cfg_s2 = user_cfg && (user_cfg->flags & IOMMU_SMMUV3_FLAG_S2);
 
 	if (domain->type == IOMMU_DOMAIN_IDENTITY) {
 		smmu_domain->stage = ARM_SMMU_DOMAIN_BYPASS;
 		return 0;
 	}
 
+	if (user_cfg_s2 && !(smmu->features & ARM_SMMU_FEAT_TRANS_S2))
+		return -EINVAL;
+	if (user_cfg_s2)
+		smmu_domain->stage = ARM_SMMU_DOMAIN_S2;
+
 	/* Restrict the stage to what we can actually support */
 	if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1))
 		smmu_domain->stage = ARM_SMMU_DOMAIN_S2;
@@ -2484,7 +2492,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 
 	if (!smmu_domain->smmu) {
 		smmu_domain->smmu = smmu;
-		ret = arm_smmu_domain_finalise(domain, master);
+		ret = arm_smmu_domain_finalise(domain, master, NULL);
 		if (ret) {
 			smmu_domain->smmu = NULL;
 			goto out_unlock;
-- 
2.39.2
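The stage selection added to arm_smmu_domain_finalise() can be isolated into a small function for experimentation. The sketch below mirrors only the checks visible in the hunk above; select_stage(), struct user_cfg_sketch and the FEAT_*/USER_FLAG_* macros are hypothetical names, and whatever the real function does after these checks is deliberately not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define FEAT_TRANS_S1 (1u << 0)	/* hardware supports stage-1 translation */
#define FEAT_TRANS_S2 (1u << 1)	/* hardware supports stage-2 translation */
#define USER_FLAG_S2  (1u << 0)	/* userspace asks for a stage-2 domain */

enum stage { STAGE_S1, STAGE_S2 };

struct user_cfg_sketch {
	unsigned int flags;
};

/*
 * Pick the translation stage for a domain. *stage carries the current
 * choice in and the final choice out; on error it is left untouched.
 */
int select_stage(unsigned int features,
		 const struct user_cfg_sketch *user_cfg, enum stage *stage)
{
	bool user_s2 = user_cfg && (user_cfg->flags & USER_FLAG_S2);

	assert(stage != NULL);

	/* A user-requested stage 2 needs hardware that can do it */
	if (user_s2 && !(features & FEAT_TRANS_S2))
		return -EINVAL;
	if (user_s2)
		*stage = STAGE_S2;

	/* Restrict the stage to what the hardware can actually support */
	if (!(features & FEAT_TRANS_S1))
		*stage = STAGE_S2;

	return 0;
}
```

A NULL user_cfg, as passed from arm_smmu_attach_dev() in the last hunk, never triggers the -EINVAL path and leaves the stage decision to the feature checks.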
- * [PATCH v1 11/14] iommu/arm-smmu-v3: Add arm_smmu_domain_alloc_user
  2023-03-09 10:53 [PATCH v1 00/14] Add Nested Translation Support for SMMUv3 Nicolin Chen
                   ` (9 preceding siblings ...)
  2023-03-09 10:53 ` [PATCH v1 10/14] iommu/arm-smmu-v3: Pass in user_cfg to arm_smmu_domain_finalise Nicolin Chen
@ 2023-03-09 10:53 ` Nicolin Chen
  2023-03-24 15:28   ` Eric Auger
  2023-03-24 15:33   ` Eric Auger
  2023-03-09 10:53 ` [PATCH v1 12/14] iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED type of allocations Nicolin Chen
                   ` (2 subsequent siblings)
  13 siblings, 2 replies; 165+ messages in thread
From: Nicolin Chen @ 2023-03-09 10:53 UTC (permalink / raw)
  To: jgg, robin.murphy, will
  Cc: eric.auger, kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
The arm_smmu_domain_alloc_user callback function lets userspace
allocate iommu_domains: a standalone stage-1 domain, a nested stage-1
domain, or a nested stage-2 domain. The input user_data is of type
struct iommu_hwpt_arm_smmuv3, which carries the configuration of a nested
stage-1 or a nested stage-2 iommu_domain. A NULL user_data simply selects
a standalone stage-1 domain allocation.
Add a common helper, __arm_smmu_domain_alloc(), to support that.
Since ops->domain_alloc_user has a valid dev pointer, the master pointer
is available when calling __arm_smmu_domain_alloc() in this case, meaning
that arm_smmu_domain_finalise() can be done at the allocation stage. This
allows IOMMUFD to initialize the hw_pagetable for the domain.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 95 ++++++++++++++-------
 1 file changed, 65 insertions(+), 30 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 2d29f7320570..5ff74edfbd68 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2053,36 +2053,6 @@ static void *arm_smmu_hw_info(struct device *dev, u32 *length)
 	return info;
 }
 
-static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
-{
-	struct arm_smmu_domain *smmu_domain;
-
-	if (type == IOMMU_DOMAIN_SVA)
-		return arm_smmu_sva_domain_alloc();
-
-	if (type != IOMMU_DOMAIN_UNMANAGED &&
-	    type != IOMMU_DOMAIN_DMA &&
-	    type != IOMMU_DOMAIN_DMA_FQ &&
-	    type != IOMMU_DOMAIN_IDENTITY)
-		return NULL;
-
-	/*
-	 * Allocate the domain and initialise some of its data structures.
-	 * We can't really do anything meaningful until we've added a
-	 * master.
-	 */
-	smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL);
-	if (!smmu_domain)
-		return NULL;
-
-	mutex_init(&smmu_domain->init_mutex);
-	INIT_LIST_HEAD(&smmu_domain->devices);
-	spin_lock_init(&smmu_domain->devices_lock);
-	INIT_LIST_HEAD(&smmu_domain->mmu_notifiers);
-
-	return &smmu_domain->domain;
-}
-
 static struct iommu_domain *arm_smmu_get_unmanaged_domain(struct device *dev)
 {
 	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
@@ -2893,10 +2863,75 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
 	arm_smmu_sva_remove_dev_pasid(domain, dev, pasid);
 }
 
+static struct iommu_domain *
+__arm_smmu_domain_alloc(unsigned type,
+			struct arm_smmu_domain *s2,
+			struct arm_smmu_master *master,
+			const struct iommu_hwpt_arm_smmuv3 *user_cfg)
+{
+	struct arm_smmu_domain *smmu_domain;
+	struct iommu_domain *domain;
+	int ret = 0;
+
+	if (type == IOMMU_DOMAIN_SVA)
+		return arm_smmu_sva_domain_alloc();
+
+	if (type != IOMMU_DOMAIN_UNMANAGED &&
+	    type != IOMMU_DOMAIN_DMA &&
+	    type != IOMMU_DOMAIN_DMA_FQ &&
+	    type != IOMMU_DOMAIN_IDENTITY)
+		return NULL;
+
+	/*
+	 * Allocate the domain and initialise some of its data structures.
+	 * We can't really finalise the domain unless a master is given.
+	 */
+	smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL);
+	if (!smmu_domain)
+		return NULL;
+	domain = &smmu_domain->domain;
+
+	domain->type = type;
+	domain->ops = arm_smmu_ops.default_domain_ops;
+
+	mutex_init(&smmu_domain->init_mutex);
+	INIT_LIST_HEAD(&smmu_domain->devices);
+	spin_lock_init(&smmu_domain->devices_lock);
+	INIT_LIST_HEAD(&smmu_domain->mmu_notifiers);
+
+	if (master) {
+		smmu_domain->smmu = master->smmu;
+		ret = arm_smmu_domain_finalise(domain, master, user_cfg);
+		if (ret) {
+			kfree(smmu_domain);
+			return NULL;
+		}
+	}
+
+	return domain;
+}
+
+static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
+{
+	return __arm_smmu_domain_alloc(type, NULL, NULL, NULL);
+}
+
+static struct iommu_domain *
+arm_smmu_domain_alloc_user(struct device *dev, struct iommu_domain *parent,
+			   const void *user_data)
+{
+	const struct iommu_hwpt_arm_smmuv3 *user_cfg = user_data;
+	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+	unsigned type = IOMMU_DOMAIN_UNMANAGED;
+
+	return __arm_smmu_domain_alloc(type, NULL, master, user_cfg);
+}
+
 static struct iommu_ops arm_smmu_ops = {
 	.capable		= arm_smmu_capable,
 	.hw_info		= arm_smmu_hw_info,
 	.domain_alloc		= arm_smmu_domain_alloc,
+	.domain_alloc_user	= arm_smmu_domain_alloc_user,
 	.get_unmanaged_domain	= arm_smmu_get_unmanaged_domain,
 	.probe_device		= arm_smmu_probe_device,
 	.release_device		= arm_smmu_release_device,
-- 
2.39.2
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related	[flat|nested] 165+ messages in thread
- * Re: [PATCH v1 11/14] iommu/arm-smmu-v3: Add arm_smmu_domain_alloc_user
  2023-03-09 10:53 ` [PATCH v1 11/14] iommu/arm-smmu-v3: Add arm_smmu_domain_alloc_user Nicolin Chen
@ 2023-03-24 15:28   ` Eric Auger
  2023-03-24 17:40     ` Nicolin Chen
  2023-03-24 15:33   ` Eric Auger
  1 sibling, 1 reply; 165+ messages in thread
From: Eric Auger @ 2023-03-24 15:28 UTC (permalink / raw)
  To: Nicolin Chen, jgg, robin.murphy, will
  Cc: kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
Hi Nicolin,
On 3/9/23 11:53, Nicolin Chen wrote:
> The arm_smmu_domain_alloc_user callback function lets userspace allocate
> iommu_domains, such as a standalone stage-1 domain, a nested stage-1
> domain, or a nested stage-2 domain. The input user_data is of type
> struct iommu_hwpt_arm_smmuv3, which contains the configuration of a nested
> stage-1 or a nested stage-2 iommu_domain. A NULL user_data simply selects
> a standalone stage-1 domain allocation.
>
> Add a common helper function, __arm_smmu_domain_alloc(), to support that.
>
> Since ops->domain_alloc_user has a valid dev pointer, the master pointer
> is available when calling __arm_smmu_domain_alloc() in this case, meaning
> that arm_smmu_domain_finalise() can be done at the allocation stage. This
> allows IOMMUFD to initialize the hw_pagetable for the domain.
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 95 ++++++++++++++-------
>  1 file changed, 65 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 2d29f7320570..5ff74edfbd68 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2053,36 +2053,6 @@ static void *arm_smmu_hw_info(struct device *dev, u32 *length)
>  	return info;
>  }
>  
> -static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
> -{
> -	struct arm_smmu_domain *smmu_domain;
> -
> -	if (type == IOMMU_DOMAIN_SVA)
> -		return arm_smmu_sva_domain_alloc();
> -
> -	if (type != IOMMU_DOMAIN_UNMANAGED &&
> -	    type != IOMMU_DOMAIN_DMA &&
> -	    type != IOMMU_DOMAIN_DMA_FQ &&
> -	    type != IOMMU_DOMAIN_IDENTITY)
> -		return NULL;
> -
> -	/*
> -	 * Allocate the domain and initialise some of its data structures.
> -	 * We can't really do anything meaningful until we've added a
> -	 * master.
> -	 */
> -	smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL);
> -	if (!smmu_domain)
> -		return NULL;
> -
> -	mutex_init(&smmu_domain->init_mutex);
> -	INIT_LIST_HEAD(&smmu_domain->devices);
> -	spin_lock_init(&smmu_domain->devices_lock);
> -	INIT_LIST_HEAD(&smmu_domain->mmu_notifiers);
> -
> -	return &smmu_domain->domain;
> -}
> -
>  static struct iommu_domain *arm_smmu_get_unmanaged_domain(struct device *dev)
>  {
>  	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> @@ -2893,10 +2863,75 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
>  	arm_smmu_sva_remove_dev_pasid(domain, dev, pasid);
>  }
>  
> +static struct iommu_domain *
> +__arm_smmu_domain_alloc(unsigned type,
> +			struct arm_smmu_domain *s2,
> +			struct arm_smmu_master *master,
> +			const struct iommu_hwpt_arm_smmuv3 *user_cfg)
> +{
> +	struct arm_smmu_domain *smmu_domain;
> +	struct iommu_domain *domain;
> +	int ret = 0;
> +
> +	if (type == IOMMU_DOMAIN_SVA)
> +		return arm_smmu_sva_domain_alloc();
> +
> +	if (type != IOMMU_DOMAIN_UNMANAGED &&
> +	    type != IOMMU_DOMAIN_DMA &&
> +	    type != IOMMU_DOMAIN_DMA_FQ &&
> +	    type != IOMMU_DOMAIN_IDENTITY)
> +		return NULL;
> +
> +	/*
> +	 * Allocate the domain and initialise some of its data structures.
> +	 * We can't really finalise the domain unless a master is given.
> +	 */
> +	smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL);
> +	if (!smmu_domain)
> +		return NULL;
> +	domain = &smmu_domain->domain;
> +
> +	domain->type = type;
> +	domain->ops = arm_smmu_ops.default_domain_ops;
Compared to the original code, that's something new. Please can you
explain why this is added in this patch?
> +
> +	mutex_init(&smmu_domain->init_mutex);
> +	INIT_LIST_HEAD(&smmu_domain->devices);
> +	spin_lock_init(&smmu_domain->devices_lock);
> +	INIT_LIST_HEAD(&smmu_domain->mmu_notifiers);
> +
> +	if (master) {
> +		smmu_domain->smmu = master->smmu;
> +		ret = arm_smmu_domain_finalise(domain, master, user_cfg);
> +		if (ret) {
> +			kfree(smmu_domain);
> +			return NULL;
> +		}
> +	}
> +
> +	return domain;
> +}
> +
> +static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
> +{
> +	return __arm_smmu_domain_alloc(type, NULL, NULL, NULL);
> +}
> +
> +static struct iommu_domain *
> +arm_smmu_domain_alloc_user(struct device *dev, struct iommu_domain *parent,
> +			   const void *user_data)
> +{
> +	const struct iommu_hwpt_arm_smmuv3 *user_cfg = user_data;
> +	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> +	unsigned type = IOMMU_DOMAIN_UNMANAGED;
Is there any guarantee that master is non-NULL? Don't we want to check?
> +
> +	return __arm_smmu_domain_alloc(type, NULL, master, user_cfg);
> +}
> +
>  static struct iommu_ops arm_smmu_ops = {
>  	.capable		= arm_smmu_capable,
>  	.hw_info		= arm_smmu_hw_info,
>  	.domain_alloc		= arm_smmu_domain_alloc,
> +	.domain_alloc_user	= arm_smmu_domain_alloc_user,
>  	.get_unmanaged_domain	= arm_smmu_get_unmanaged_domain,
>  	.probe_device		= arm_smmu_probe_device,
>  	.release_device		= arm_smmu_release_device,
Thanks
Eric
^ permalink raw reply	[flat|nested] 165+ messages in thread
- * Re: [PATCH v1 11/14] iommu/arm-smmu-v3: Add arm_smmu_domain_alloc_user
  2023-03-24 15:28   ` Eric Auger
@ 2023-03-24 17:40     ` Nicolin Chen
  2023-03-24 17:50       ` Jason Gunthorpe
  0 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-24 17:40 UTC (permalink / raw)
  To: Eric Auger
  Cc: jgg, robin.murphy, will, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
Hi Eric,
Thanks for the review.
On Fri, Mar 24, 2023 at 04:28:26PM +0100, Eric Auger wrote:
> > +static struct iommu_domain *
> > +__arm_smmu_domain_alloc(unsigned type,
> > +                     struct arm_smmu_domain *s2,
> > +                     struct arm_smmu_master *master,
> > +                     const struct iommu_hwpt_arm_smmuv3 *user_cfg)
> > +{
> > +     struct arm_smmu_domain *smmu_domain;
> > +     struct iommu_domain *domain;
> > +     int ret = 0;
> > +
> > +     if (type == IOMMU_DOMAIN_SVA)
> > +             return arm_smmu_sva_domain_alloc();
> > +
> > +     if (type != IOMMU_DOMAIN_UNMANAGED &&
> > +         type != IOMMU_DOMAIN_DMA &&
> > +         type != IOMMU_DOMAIN_DMA_FQ &&
> > +         type != IOMMU_DOMAIN_IDENTITY)
> > +             return NULL;
> > +
> > +     /*
> > +      * Allocate the domain and initialise some of its data structures.
> > +      * We can't really finalise the domain unless a master is given.
> > +      */
> > +     smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL);
> > +     if (!smmu_domain)
> > +             return NULL;
> > +     domain = &smmu_domain->domain;
> > +
> > +     domain->type = type;
> > +     domain->ops = arm_smmu_ops.default_domain_ops;
> Compared to the original code, that's something new. Please can you
> explain why this is added in this patch?
Yeah, I probably should have mentioned in the commit message that
this function is called by IOMMUFD directly via
ops->domain_alloc_user(), without a wrapper, whereas the other callers
all go through the __iommu_domain_alloc() helper in the iommu core,
where the ops pointer gets set up.
So, this is something new, in order to work with IOMMUFD.
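As a rough userspace illustration of the point above (every demo_* name is hypothetical and only models the idea): a core-side wrapper can fill in the ops pointer after the driver callback returns, but a callback that is invoked directly has to set ops itself.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for iommu_domain and its ops table. */
struct demo_domain_ops {
	const char *name;
};

struct demo_domain {
	const struct demo_domain_ops *ops;
};

static const struct demo_domain_ops demo_default_ops = { .name = "default" };

/* Driver callback that leaves ops unset, relying on a wrapper to fill it. */
static struct demo_domain *demo_driver_alloc(void)
{
	return calloc(1, sizeof(struct demo_domain));
}

/* Models a core helper: calls the driver, then sets up the ops pointer. */
static struct demo_domain *demo_core_alloc(void)
{
	struct demo_domain *d = demo_driver_alloc();

	if (d)
		d->ops = &demo_default_ops;
	return d;
}

/* Models a callback invoked with no wrapper: it must set ops on its own. */
static struct demo_domain *demo_driver_alloc_user(void)
{
	struct demo_domain *d = calloc(1, sizeof(*d));

	if (d)
		d->ops = &demo_default_ops;
	return d;
}
```

In this sketch, a domain obtained straight from demo_driver_alloc() has a NULL ops pointer, which is the situation the new assignment in __arm_smmu_domain_alloc() avoids for the IOMMUFD path.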
Thanks
Nicolin
^ permalink raw reply	[flat|nested] 165+ messages in thread
- * Re: [PATCH v1 11/14] iommu/arm-smmu-v3: Add arm_smmu_domain_alloc_user
  2023-03-24 17:40     ` Nicolin Chen
@ 2023-03-24 17:50       ` Jason Gunthorpe
  2023-03-24 18:00         ` Nicolin Chen
  0 siblings, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-24 17:50 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Eric Auger, robin.murphy, will, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Fri, Mar 24, 2023 at 10:40:46AM -0700, Nicolin Chen wrote:
> Hi Eric,
> 
> Thanks for the review.
> 
> On Fri, Mar 24, 2023 at 04:28:26PM +0100, Eric Auger wrote:
> 
> > > +static struct iommu_domain *
> > > +__arm_smmu_domain_alloc(unsigned type,
> > > +                     struct arm_smmu_domain *s2,
> > > +                     struct arm_smmu_master *master,
> > > +                     const struct iommu_hwpt_arm_smmuv3 *user_cfg)
> > > +{
> > > +     struct arm_smmu_domain *smmu_domain;
> > > +     struct iommu_domain *domain;
> > > +     int ret = 0;
> > > +
> > > +     if (type == IOMMU_DOMAIN_SVA)
> > > +             return arm_smmu_sva_domain_alloc();
> > > +
> > > +     if (type != IOMMU_DOMAIN_UNMANAGED &&
> > > +         type != IOMMU_DOMAIN_DMA &&
> > > +         type != IOMMU_DOMAIN_DMA_FQ &&
> > > +         type != IOMMU_DOMAIN_IDENTITY)
> > > +             return NULL;
> > > +
> > > +     /*
> > > +      * Allocate the domain and initialise some of its data structures.
> > > +      * We can't really finalise the domain unless a master is given.
> > > +      */
> > > +     smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL);
> > > +     if (!smmu_domain)
> > > +             return NULL;
> > > +     domain = &smmu_domain->domain;
> > > +
> > > +     domain->type = type;
> > > +     domain->ops = arm_smmu_ops.default_domain_ops;
> > Compared to the original code, that's something new. Please can you
> > explain why this is added in this patch?
> 
> Yeah, I probably should have mentioned in the commit message that
> this function is called by IOMMUFD directly via
> ops->domain_alloc_user(), without a wrapper, whereas the other callers
> all go through the __iommu_domain_alloc() helper in the iommu core,
> where the ops pointer gets set up.
> 
> So, this is something new, in order to work with IOMMUFD.
But using default_domain_ops is not great, the ops should be set based
on the domain type being created and the various different flavours
should have their own types and ops.
So name the existing ops something logical like 'unmanaged_domain_ops'
and move it out of the inline initializer.
Make another ops for identity like shown here to get the ball rolling:
 https://lore.kernel.org/r/20230324111127.221640-1-steven.price@arm.com
There is a whole bunch of tidying here to make things follow the op
per type design.
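A minimal sketch of what an "ops per type" selection might look like, written as plain userspace C. The enum, the ops tables, and the can_map field are all assumptions made for illustration; they are not the kernel's iommu_domain_ops and not the layout the v2 series ended up with.

```c
#include <stddef.h>

/* Hypothetical domain types, loosely mirroring the flavours discussed. */
enum demo_domain_type {
	DEMO_DOMAIN_UNMANAGED,
	DEMO_DOMAIN_IDENTITY,
	DEMO_DOMAIN_NESTED,
	DEMO_DOMAIN_UNKNOWN,
};

/* Hypothetical ops table; can_map marks whether map/unmap would be wired. */
struct demo_domain_ops {
	const char *name;
	int can_map;
};

/* One named ops table per flavour instead of a single inline default. */
static const struct demo_domain_ops demo_unmanaged_domain_ops = {
	.name = "unmanaged", .can_map = 1,
};
static const struct demo_domain_ops demo_identity_domain_ops = {
	.name = "identity", .can_map = 0,
};
static const struct demo_domain_ops demo_nested_domain_ops = {
	.name = "nested", .can_map = 0,
};

/* Pick the ops from the type being created; NULL for unsupported types. */
static const struct demo_domain_ops *
demo_ops_for_type(enum demo_domain_type type)
{
	switch (type) {
	case DEMO_DOMAIN_UNMANAGED:
		return &demo_unmanaged_domain_ops;
	case DEMO_DOMAIN_IDENTITY:
		return &demo_identity_domain_ops;
	case DEMO_DOMAIN_NESTED:
		return &demo_nested_domain_ops;
	default:
		return NULL;
	}
}
```

The design point is that the allocator never needs to reach into a global default: the type alone determines which table a new domain gets, and an unknown type fails at allocation time.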
Jason
^ permalink raw reply	[flat|nested] 165+ messages in thread
- * Re: [PATCH v1 11/14] iommu/arm-smmu-v3: Add arm_smmu_domain_alloc_user
  2023-03-24 17:50       ` Jason Gunthorpe
@ 2023-03-24 18:00         ` Nicolin Chen
  0 siblings, 0 replies; 165+ messages in thread
From: Nicolin Chen @ 2023-03-24 18:00 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Eric Auger, robin.murphy, will, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Fri, Mar 24, 2023 at 02:50:42PM -0300, Jason Gunthorpe wrote:
> On Fri, Mar 24, 2023 at 10:40:46AM -0700, Nicolin Chen wrote:
> > Hi Eric,
> > 
> > Thanks for the review.
> > 
> > On Fri, Mar 24, 2023 at 04:28:26PM +0100, Eric Auger wrote:
> > 
> > > > +static struct iommu_domain *
> > > > +__arm_smmu_domain_alloc(unsigned type,
> > > > +                     struct arm_smmu_domain *s2,
> > > > +                     struct arm_smmu_master *master,
> > > > +                     const struct iommu_hwpt_arm_smmuv3 *user_cfg)
> > > > +{
> > > > +     struct arm_smmu_domain *smmu_domain;
> > > > +     struct iommu_domain *domain;
> > > > +     int ret = 0;
> > > > +
> > > > +     if (type == IOMMU_DOMAIN_SVA)
> > > > +             return arm_smmu_sva_domain_alloc();
> > > > +
> > > > +     if (type != IOMMU_DOMAIN_UNMANAGED &&
> > > > +         type != IOMMU_DOMAIN_DMA &&
> > > > +         type != IOMMU_DOMAIN_DMA_FQ &&
> > > > +         type != IOMMU_DOMAIN_IDENTITY)
> > > > +             return NULL;
> > > > +
> > > > +     /*
> > > > +      * Allocate the domain and initialise some of its data structures.
> > > > +      * We can't really finalise the domain unless a master is given.
> > > > +      */
> > > > +     smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL);
> > > > +     if (!smmu_domain)
> > > > +             return NULL;
> > > > +     domain = &smmu_domain->domain;
> > > > +
> > > > +     domain->type = type;
> > > > +     domain->ops = arm_smmu_ops.default_domain_ops;
> > > Compared to the original code, that's something new. Please can you
> > > explain why this is added in this patch?
> > 
> > Yeah, I probably should have mentioned in the commit message that
> > this function is called by IOMMUFD directly via
> > ops->domain_alloc_user(), without a wrapper, whereas the other callers
> > all go through the __iommu_domain_alloc() helper in the iommu core,
> > where the ops pointer gets set up.
> > 
> > So, this is something new, in order to work with IOMMUFD.
> 
> But using default_domain_ops is not great, the ops should be set based
> on the domain type being created and the various different flavours
> should have their own types and ops.
> 
> So name the existing ops something logical like 'unmanaged_domain_ops'
> and move it out of the inline initializer.
> 
> Make another ops for identity like shown here to get the ball rolling:
> 
>  https://lore.kernel.org/r/20230324111127.221640-1-steven.price@arm.com
> 
> There is a whole bunch of tidying here to make things follow the op
> per type design.
Thanks for the suggestion. Will add a patch doing that in v2.
Nicolin
^ permalink raw reply	[flat|nested] 165+ messages in thread
 
 
 
- * Re: [PATCH v1 11/14] iommu/arm-smmu-v3: Add arm_smmu_domain_alloc_user
  2023-03-09 10:53 ` [PATCH v1 11/14] iommu/arm-smmu-v3: Add arm_smmu_domain_alloc_user Nicolin Chen
  2023-03-24 15:28   ` Eric Auger
@ 2023-03-24 15:33   ` Eric Auger
  2023-03-24 17:43     ` Nicolin Chen
  1 sibling, 1 reply; 165+ messages in thread
From: Eric Auger @ 2023-03-24 15:33 UTC (permalink / raw)
  To: Nicolin Chen, jgg, robin.murphy, will
  Cc: kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
On 3/9/23 11:53, Nicolin Chen wrote:
> The arm_smmu_domain_alloc_user callback function lets userspace allocate
> iommu_domains, such as a standalone stage-1 domain, a nested stage-1
> domain, or a nested stage-2 domain. The input user_data is of type
> struct iommu_hwpt_arm_smmuv3, which contains the configuration of a nested
> stage-1 or a nested stage-2 iommu_domain. A NULL user_data simply selects
> a standalone stage-1 domain allocation.
>
> Add a common helper function, __arm_smmu_domain_alloc(), to support that.
>
> Since ops->domain_alloc_user has a valid dev pointer, the master pointer
> is available when calling __arm_smmu_domain_alloc() in this case, meaning
> that arm_smmu_domain_finalise() can be done at the allocation stage. This
> allows IOMMUFD to initialize the hw_pagetable for the domain.
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 95 ++++++++++++++-------
>  1 file changed, 65 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 2d29f7320570..5ff74edfbd68 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2053,36 +2053,6 @@ static void *arm_smmu_hw_info(struct device *dev, u32 *length)
>  	return info;
>  }
>  
> -static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
> -{
> -	struct arm_smmu_domain *smmu_domain;
> -
> -	if (type == IOMMU_DOMAIN_SVA)
> -		return arm_smmu_sva_domain_alloc();
> -
> -	if (type != IOMMU_DOMAIN_UNMANAGED &&
> -	    type != IOMMU_DOMAIN_DMA &&
> -	    type != IOMMU_DOMAIN_DMA_FQ &&
> -	    type != IOMMU_DOMAIN_IDENTITY)
> -		return NULL;
> -
> -	/*
> -	 * Allocate the domain and initialise some of its data structures.
> -	 * We can't really do anything meaningful until we've added a
> -	 * master.
> -	 */
> -	smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL);
> -	if (!smmu_domain)
> -		return NULL;
> -
> -	mutex_init(&smmu_domain->init_mutex);
> -	INIT_LIST_HEAD(&smmu_domain->devices);
> -	spin_lock_init(&smmu_domain->devices_lock);
> -	INIT_LIST_HEAD(&smmu_domain->mmu_notifiers);
> -
> -	return &smmu_domain->domain;
> -}
> -
>  static struct iommu_domain *arm_smmu_get_unmanaged_domain(struct device *dev)
>  {
>  	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> @@ -2893,10 +2863,75 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
>  	arm_smmu_sva_remove_dev_pasid(domain, dev, pasid);
>  }
>  
> +static struct iommu_domain *
> +__arm_smmu_domain_alloc(unsigned type,
> +			struct arm_smmu_domain *s2,
I think you should rather introduce the s2 param in "iommu/arm-smmu-v3:
Support IOMMU_DOMAIN_NESTED type of allocations", because it is not used
at all in this patch, nor really explained in the commit message.
Thanks
Eric
> +			struct arm_smmu_master *master,
> +			const struct iommu_hwpt_arm_smmuv3 *user_cfg)
> +{
> +	struct arm_smmu_domain *smmu_domain;
> +	struct iommu_domain *domain;
> +	int ret = 0;
> +
> +	if (type == IOMMU_DOMAIN_SVA)
> +		return arm_smmu_sva_domain_alloc();
> +
> +	if (type != IOMMU_DOMAIN_UNMANAGED &&
> +	    type != IOMMU_DOMAIN_DMA &&
> +	    type != IOMMU_DOMAIN_DMA_FQ &&
> +	    type != IOMMU_DOMAIN_IDENTITY)
> +		return NULL;
> +
> +	/*
> +	 * Allocate the domain and initialise some of its data structures.
> +	 * We can't really finalise the domain unless a master is given.
> +	 */
> +	smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL);
> +	if (!smmu_domain)
> +		return NULL;
> +	domain = &smmu_domain->domain;
> +
> +	domain->type = type;
> +	domain->ops = arm_smmu_ops.default_domain_ops;
> +
> +	mutex_init(&smmu_domain->init_mutex);
> +	INIT_LIST_HEAD(&smmu_domain->devices);
> +	spin_lock_init(&smmu_domain->devices_lock);
> +	INIT_LIST_HEAD(&smmu_domain->mmu_notifiers);
> +
> +	if (master) {
> +		smmu_domain->smmu = master->smmu;
> +		ret = arm_smmu_domain_finalise(domain, master, user_cfg);
> +		if (ret) {
> +			kfree(smmu_domain);
> +			return NULL;
> +		}
> +	}
> +
> +	return domain;
> +}
> +
> +static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
> +{
> +	return __arm_smmu_domain_alloc(type, NULL, NULL, NULL);
> +}
> +
> +static struct iommu_domain *
> +arm_smmu_domain_alloc_user(struct device *dev, struct iommu_domain *parent,
> +			   const void *user_data)
> +{
> +	const struct iommu_hwpt_arm_smmuv3 *user_cfg = user_data;
> +	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> +	unsigned type = IOMMU_DOMAIN_UNMANAGED;
> +
> +	return __arm_smmu_domain_alloc(type, NULL, master, user_cfg);
> +}
> +
>  static struct iommu_ops arm_smmu_ops = {
>  	.capable		= arm_smmu_capable,
>  	.hw_info		= arm_smmu_hw_info,
>  	.domain_alloc		= arm_smmu_domain_alloc,
> +	.domain_alloc_user	= arm_smmu_domain_alloc_user,
>  	.get_unmanaged_domain	= arm_smmu_get_unmanaged_domain,
>  	.probe_device		= arm_smmu_probe_device,
>  	.release_device		= arm_smmu_release_device,
^ permalink raw reply	[flat|nested] 165+ messages in thread
- * Re: [PATCH v1 11/14] iommu/arm-smmu-v3: Add arm_smmu_domain_alloc_user
  2023-03-24 15:33   ` Eric Auger
@ 2023-03-24 17:43     ` Nicolin Chen
  0 siblings, 0 replies; 165+ messages in thread
From: Nicolin Chen @ 2023-03-24 17:43 UTC (permalink / raw)
  To: Eric Auger
  Cc: jgg, robin.murphy, will, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Fri, Mar 24, 2023 at 04:33:31PM +0100, Eric Auger wrote:
> > @@ -2893,10 +2863,75 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
> >       arm_smmu_sva_remove_dev_pasid(domain, dev, pasid);
> >  }
> >
> > +static struct iommu_domain *
> > +__arm_smmu_domain_alloc(unsigned type,
> > +                     struct arm_smmu_domain *s2,
> I think you should rather introduce the s2 param in "iommu/arm-smmu-v3:
> Support IOMMU_DOMAIN_NESTED type of allocations", because it is not used
> at all in this patch, nor really explained in the commit message.
OK. I will move it.
Thanks!
Nicolin
^ permalink raw reply	[flat|nested] 165+ messages in thread 
 
 
- * [PATCH v1 12/14] iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED type of allocations
  2023-03-09 10:53 [PATCH v1 00/14] Add Nested Translation Support for SMMUv3 Nicolin Chen
                   ` (10 preceding siblings ...)
  2023-03-09 10:53 ` [PATCH v1 11/14] iommu/arm-smmu-v3: Add arm_smmu_domain_alloc_user Nicolin Chen
@ 2023-03-09 10:53 ` Nicolin Chen
  2023-03-09 13:20   ` Robin Murphy
  2023-03-24 15:44   ` Eric Auger
  2023-03-09 10:53 ` [PATCH v1 13/14] iommu/arm-smmu-v3: Add CMDQ_OP_TLBI_NH_VAA and CMDQ_OP_TLBI_NH_ALL Nicolin Chen
  2023-03-09 10:53 ` [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user Nicolin Chen
  13 siblings, 2 replies; 165+ messages in thread
From: Nicolin Chen @ 2023-03-09 10:53 UTC (permalink / raw)
  To: jgg, robin.murphy, will
  Cc: eric.auger, kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
Add domain allocation support for the IOMMU_DOMAIN_NESTED type. This
includes the "finalise" part, which records the user space Stream Table
Entry info in the domain.
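The nested "finalise" step can be pictured with the following userspace sketch. The demo_* structures and feature bits are hypothetical stand-ins; only the field names s1fmt, s1cdmax and s1ctxptr come from the patch. The NULL check on user_cfg is an extra guard added for this example and is not part of the posted diff.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits standing in for ARM_SMMU_FEAT_TRANS_S1/S2. */
#define DEMO_FEAT_TRANS_S1	(1u << 0)
#define DEMO_FEAT_TRANS_S2	(1u << 1)

/* Hypothetical mirror of the user-provided stage-1 configuration. */
struct demo_user_cfg {
	uint32_t s1fmt;
	uint32_t s1cdmax;
	uint64_t s1ctxptr;
};

/* Hypothetical mirror of the driver-side stage-1 configuration. */
struct demo_s1_cfg {
	uint32_t s1fmt;
	uint32_t s1cdmax;
	uint64_t cdtab_dma;
};

/*
 * Record the user space stage-1 settings, but only when the hardware
 * implements both translation stages. Returns 0 or a negative errno.
 */
static int demo_finalise_nested(uint32_t features,
				const struct demo_user_cfg *user_cfg,
				struct demo_s1_cfg *out)
{
	const uint32_t both = DEMO_FEAT_TRANS_S1 | DEMO_FEAT_TRANS_S2;

	if ((features & both) != both)
		return -EINVAL;
	if (!user_cfg)		/* extra guard, assumed for this sketch */
		return -EINVAL;

	out->s1fmt = user_cfg->s1fmt;
	out->s1cdmax = user_cfg->s1cdmax;
	out->cdtab_dma = user_cfg->s1ctxptr;
	return 0;
}
```

Nothing is written to the output structure on failure, so a caller can treat a non-zero return as "domain not configured" without further cleanup.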
Co-developed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 38 +++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 5ff74edfbd68..1f318b5e0921 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2214,6 +2214,19 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
 		return 0;
 	}
 
+	if (domain->type == IOMMU_DOMAIN_NESTED) {
+		if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1) ||
+		    !(smmu->features & ARM_SMMU_FEAT_TRANS_S2)) {
+			dev_dbg(smmu->dev, "does not implement two stages\n");
+			return -EINVAL;
+		}
+		smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
+		smmu_domain->s1_cfg.s1fmt = user_cfg->s1fmt;
+		smmu_domain->s1_cfg.s1cdmax = user_cfg->s1cdmax;
+		smmu_domain->s1_cfg.cdcfg.cdtab_dma = user_cfg->s1ctxptr;
+		return 0;
+	}
+
 	if (user_cfg_s2 && !(smmu->features & ARM_SMMU_FEAT_TRANS_S2))
 		return -EINVAL;
 	if (user_cfg_s2)
@@ -2863,6 +2876,11 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
 	arm_smmu_sva_remove_dev_pasid(domain, dev, pasid);
 }
 
+static const struct iommu_domain_ops arm_smmu_nested_domain_ops = {
+	.attach_dev		= arm_smmu_attach_dev,
+	.free			= arm_smmu_domain_free,
+};
+
 static struct iommu_domain *
 __arm_smmu_domain_alloc(unsigned type,
 			struct arm_smmu_domain *s2,
@@ -2877,11 +2895,15 @@ __arm_smmu_domain_alloc(unsigned type,
 		return arm_smmu_sva_domain_alloc();
 
 	if (type != IOMMU_DOMAIN_UNMANAGED &&
+	    type != IOMMU_DOMAIN_NESTED &&
 	    type != IOMMU_DOMAIN_DMA &&
 	    type != IOMMU_DOMAIN_DMA_FQ &&
 	    type != IOMMU_DOMAIN_IDENTITY)
 		return NULL;
 
+	if (s2 && s2->stage != ARM_SMMU_DOMAIN_S2)
+		return NULL;
+
 	/*
 	 * Allocate the domain and initialise some of its data structures.
 	 * We can't really finalise the domain unless a master is given.
@@ -2889,10 +2911,14 @@ __arm_smmu_domain_alloc(unsigned type,
 	smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL);
 	if (!smmu_domain)
 		return NULL;
+	smmu_domain->s2 = s2;
 	domain = &smmu_domain->domain;
 
 	domain->type = type;
-	domain->ops = arm_smmu_ops.default_domain_ops;
+	if (s2)
+		domain->ops = &arm_smmu_nested_domain_ops;
+	else
+		domain->ops = arm_smmu_ops.default_domain_ops;
 
 	mutex_init(&smmu_domain->init_mutex);
 	INIT_LIST_HEAD(&smmu_domain->devices);
@@ -2923,8 +2949,16 @@ arm_smmu_domain_alloc_user(struct device *dev, struct iommu_domain *parent,
 	const struct iommu_hwpt_arm_smmuv3 *user_cfg = user_data;
 	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
 	unsigned type = IOMMU_DOMAIN_UNMANAGED;
+	struct arm_smmu_domain *s2 = NULL;
+
+	if (parent) {
+		if (parent->ops != arm_smmu_ops.default_domain_ops)
+			return NULL;
+		type = IOMMU_DOMAIN_NESTED;
+		s2 = to_smmu_domain(parent);
+	}
 
-	return __arm_smmu_domain_alloc(type, NULL, master, user_cfg);
+	return __arm_smmu_domain_alloc(type, s2, master, user_cfg);
 }
 
 static struct iommu_ops arm_smmu_ops = {
-- 
2.39.2
^ permalink raw reply related	[flat|nested] 165+ messages in thread
- * Re: [PATCH v1 12/14] iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED type of allocations
  2023-03-09 10:53 ` [PATCH v1 12/14] iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED type of allocations Nicolin Chen
@ 2023-03-09 13:20   ` Robin Murphy
  2023-03-09 14:28     ` Robin Murphy
  2023-03-24 15:44   ` Eric Auger
  1 sibling, 1 reply; 165+ messages in thread
From: Robin Murphy @ 2023-03-09 13:20 UTC (permalink / raw)
  To: Nicolin Chen, jgg, will
  Cc: eric.auger, kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
On 2023-03-09 10:53, Nicolin Chen wrote:
> Add domain allocation support for the IOMMU_DOMAIN_NESTED type. This
> includes the "finalise" part, which records the user space Stream Table
> Entry info in the domain.
> 
> Co-developed-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 38 +++++++++++++++++++--
>   1 file changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 5ff74edfbd68..1f318b5e0921 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2214,6 +2214,19 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
>   		return 0;
>   	}
>   
> +	if (domain->type == IOMMU_DOMAIN_NESTED) {
> +		if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1) ||
> +		    !(smmu->features & ARM_SMMU_FEAT_TRANS_S2)) {
> +			dev_dbg(smmu->dev, "does not implement two stages\n");
> +			return -EINVAL;
> +		}
> +		smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
> +		smmu_domain->s1_cfg.s1fmt = user_cfg->s1fmt;
> +		smmu_domain->s1_cfg.s1cdmax = user_cfg->s1cdmax;
> +		smmu_domain->s1_cfg.cdcfg.cdtab_dma = user_cfg->s1ctxptr;
> +		return 0;
How's that going to work? If the caller's asked for something we can't 
provide, returning something else and hoping it fails later is not 
sensible, we should just fail right here. It's even more worrying if 
there's a chance it *won't* fail later, and a guest ends up with 
"nested" translation giving it full access to host PA space :/
Thanks,
Robin.
> +	}
> +
>   	if (user_cfg_s2 && !(smmu->features & ARM_SMMU_FEAT_TRANS_S2))
>   		return -EINVAL;
>   	if (user_cfg_s2)
> @@ -2863,6 +2876,11 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
>   	arm_smmu_sva_remove_dev_pasid(domain, dev, pasid);
>   }
>   
> +static const struct iommu_domain_ops arm_smmu_nested_domain_ops = {
> +	.attach_dev		= arm_smmu_attach_dev,
> +	.free			= arm_smmu_domain_free,
> +};
> +
>   static struct iommu_domain *
>   __arm_smmu_domain_alloc(unsigned type,
>   			struct arm_smmu_domain *s2,
> @@ -2877,11 +2895,15 @@ __arm_smmu_domain_alloc(unsigned type,
>   		return arm_smmu_sva_domain_alloc();
>   
>   	if (type != IOMMU_DOMAIN_UNMANAGED &&
> +	    type != IOMMU_DOMAIN_NESTED &&
>   	    type != IOMMU_DOMAIN_DMA &&
>   	    type != IOMMU_DOMAIN_DMA_FQ &&
>   	    type != IOMMU_DOMAIN_IDENTITY)
>   		return NULL;
>   
> +	if (s2 && s2->stage != ARM_SMMU_DOMAIN_S2)
> +		return NULL;
> +
>   	/*
>   	 * Allocate the domain and initialise some of its data structures.
>   	 * We can't really finalise the domain unless a master is given.
> @@ -2889,10 +2911,14 @@ __arm_smmu_domain_alloc(unsigned type,
>   	smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL);
>   	if (!smmu_domain)
>   		return NULL;
> +	smmu_domain->s2 = s2;
>   	domain = &smmu_domain->domain;
>   
>   	domain->type = type;
> -	domain->ops = arm_smmu_ops.default_domain_ops;
> +	if (s2)
> +		domain->ops = &arm_smmu_nested_domain_ops;
> +	else
> +		domain->ops = arm_smmu_ops.default_domain_ops;
>   
>   	mutex_init(&smmu_domain->init_mutex);
>   	INIT_LIST_HEAD(&smmu_domain->devices);
> @@ -2923,8 +2949,16 @@ arm_smmu_domain_alloc_user(struct device *dev, struct iommu_domain *parent,
>   	const struct iommu_hwpt_arm_smmuv3 *user_cfg = user_data;
>   	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
>   	unsigned type = IOMMU_DOMAIN_UNMANAGED;
> +	struct arm_smmu_domain *s2 = NULL;
> +
> +	if (parent) {
> +		if (parent->ops != arm_smmu_ops.default_domain_ops)
> +			return NULL;
> +		type = IOMMU_DOMAIN_NESTED;
> +		s2 = to_smmu_domain(parent);
> +	}
>   
> -	return __arm_smmu_domain_alloc(type, NULL, master, user_cfg);
> +	return __arm_smmu_domain_alloc(type, s2, master, user_cfg);
>   }
>   
>   static struct iommu_ops arm_smmu_ops = {
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply	[flat|nested] 165+ messages in thread
- * Re: [PATCH v1 12/14] iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED type of allocations
  2023-03-09 13:20   ` Robin Murphy
@ 2023-03-09 14:28     ` Robin Murphy
  2023-03-10  1:34       ` Nicolin Chen
  0 siblings, 1 reply; 165+ messages in thread
From: Robin Murphy @ 2023-03-09 14:28 UTC (permalink / raw)
  To: Nicolin Chen, jgg, will
  Cc: eric.auger, kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
On 2023-03-09 13:20, Robin Murphy wrote:
> On 2023-03-09 10:53, Nicolin Chen wrote:
>> Add domain allocation support for IOMMU_DOMAIN_NESTED type. This includes
>> the "finalise" part to log in the user space Stream Table Entry info.
>>
>> Co-developed-by: Eric Auger <eric.auger@redhat.com>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
>> ---
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 38 +++++++++++++++++++--
>>   1 file changed, 36 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 5ff74edfbd68..1f318b5e0921 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -2214,6 +2214,19 @@ static int arm_smmu_domain_finalise(struct 
>> iommu_domain *domain,
>>           return 0;
>>       }
>> +    if (domain->type == IOMMU_DOMAIN_NESTED) {
>> +        if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1) ||
>> +            !(smmu->features & ARM_SMMU_FEAT_TRANS_S2)) {
>> +            dev_dbg(smmu->dev, "does not implement two stages\n");
>> +            return -EINVAL;
>> +        }
>> +        smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
>> +        smmu_domain->s1_cfg.s1fmt = user_cfg->s1fmt;
>> +        smmu_domain->s1_cfg.s1cdmax = user_cfg->s1cdmax;
>> +        smmu_domain->s1_cfg.cdcfg.cdtab_dma = user_cfg->s1ctxptr;
>> +        return 0;
> 
> How's that going to work? If the caller's asked for something we can't 
> provide, returning something else and hoping it fails later is not 
> sensible, we should just fail right here. It's even more worrying if 
> there's a chance it *won't* fail later, and a guest ends up with 
> "nested" translation giving it full access to host PA space :/
Oops, apologies - in part thanks to the confusing indentation, I managed 
to miss the early return and misread this all being under the if 
condition for nesting not being supported. Sorry for the confusion :(
Thanks,
Robin.
>> +    }
>> +
>>       if (user_cfg_s2 && !(smmu->features & ARM_SMMU_FEAT_TRANS_S2))
>>           return -EINVAL;
>>       if (user_cfg_s2)
>> @@ -2863,6 +2876,11 @@ static void arm_smmu_remove_dev_pasid(struct 
>> device *dev, ioasid_t pasid)
>>       arm_smmu_sva_remove_dev_pasid(domain, dev, pasid);
>>   }
>> +static const struct iommu_domain_ops arm_smmu_nested_domain_ops = {
>> +    .attach_dev        = arm_smmu_attach_dev,
>> +    .free            = arm_smmu_domain_free,
>> +};
>> +
>>   static struct iommu_domain *
>>   __arm_smmu_domain_alloc(unsigned type,
>>               struct arm_smmu_domain *s2,
>> @@ -2877,11 +2895,15 @@ __arm_smmu_domain_alloc(unsigned type,
>>           return arm_smmu_sva_domain_alloc();
>>       if (type != IOMMU_DOMAIN_UNMANAGED &&
>> +        type != IOMMU_DOMAIN_NESTED &&
>>           type != IOMMU_DOMAIN_DMA &&
>>           type != IOMMU_DOMAIN_DMA_FQ &&
>>           type != IOMMU_DOMAIN_IDENTITY)
>>           return NULL;
>> +    if (s2 && s2->stage != ARM_SMMU_DOMAIN_S2)
>> +        return NULL;
>> +
>>       /*
>>        * Allocate the domain and initialise some of its data structures.
>>        * We can't really finalise the domain unless a master is given.
>> @@ -2889,10 +2911,14 @@ __arm_smmu_domain_alloc(unsigned type,
>>       smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL);
>>       if (!smmu_domain)
>>           return NULL;
>> +    smmu_domain->s2 = s2;
>>       domain = &smmu_domain->domain;
>>       domain->type = type;
>> -    domain->ops = arm_smmu_ops.default_domain_ops;
>> +    if (s2)
>> +        domain->ops = &arm_smmu_nested_domain_ops;
>> +    else
>> +        domain->ops = arm_smmu_ops.default_domain_ops;
>>       mutex_init(&smmu_domain->init_mutex);
>>       INIT_LIST_HEAD(&smmu_domain->devices);
>> @@ -2923,8 +2949,16 @@ arm_smmu_domain_alloc_user(struct device *dev, 
>> struct iommu_domain *parent,
>>       const struct iommu_hwpt_arm_smmuv3 *user_cfg = user_data;
>>       struct arm_smmu_master *master = dev_iommu_priv_get(dev);
>>       unsigned type = IOMMU_DOMAIN_UNMANAGED;
>> +    struct arm_smmu_domain *s2 = NULL;
>> +
>> +    if (parent) {
>> +        if (parent->ops != arm_smmu_ops.default_domain_ops)
>> +            return NULL;
>> +        type = IOMMU_DOMAIN_NESTED;
>> +        s2 = to_smmu_domain(parent);
>> +    }
>> -    return __arm_smmu_domain_alloc(type, NULL, master, user_cfg);
>> +    return __arm_smmu_domain_alloc(type, s2, master, user_cfg);
>>   }
>>   static struct iommu_ops arm_smmu_ops = {
> 
- * Re: [PATCH v1 12/14] iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED type of allocations
  2023-03-09 14:28     ` Robin Murphy
@ 2023-03-10  1:34       ` Nicolin Chen
  0 siblings, 0 replies; 165+ messages in thread
From: Nicolin Chen @ 2023-03-10  1:34 UTC (permalink / raw)
  To: Robin Murphy
  Cc: jgg, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Thu, Mar 09, 2023 at 02:28:09PM +0000, Robin Murphy wrote:
> On 2023-03-09 13:20, Robin Murphy wrote:
> > On 2023-03-09 10:53, Nicolin Chen wrote:
> > > Add domain allocation support for IOMMU_DOMAIN_NESTED type. This includes
> > > the "finalise" part to log in the user space Stream Table Entry info.
> > > 
> > > Co-developed-by: Eric Auger <eric.auger@redhat.com>
> > > Signed-off-by: Eric Auger <eric.auger@redhat.com>
> > > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > > ---
> > >   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 38 +++++++++++++++++++--
> > >   1 file changed, 36 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > index 5ff74edfbd68..1f318b5e0921 100644
> > > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > @@ -2214,6 +2214,19 @@ static int arm_smmu_domain_finalise(struct
> > > iommu_domain *domain,
> > >           return 0;
> > >       }
> > > +    if (domain->type == IOMMU_DOMAIN_NESTED) {
> > > +        if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1) ||
> > > +            !(smmu->features & ARM_SMMU_FEAT_TRANS_S2)) {
> > > +            dev_dbg(smmu->dev, "does not implement two stages\n");
> > > +            return -EINVAL;
> > > +        }
> > > +        smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
> > > +        smmu_domain->s1_cfg.s1fmt = user_cfg->s1fmt;
> > > +        smmu_domain->s1_cfg.s1cdmax = user_cfg->s1cdmax;
> > > +        smmu_domain->s1_cfg.cdcfg.cdtab_dma = user_cfg->s1ctxptr;
> > > +        return 0;
> > 
> > How's that going to work? If the caller's asked for something we can't
> > provide, returning something else and hoping it fails later is not
> > sensible, we should just fail right here. It's even more worrying if
> > there's a chance it *won't* fail later, and a guest ends up with
> > "nested" translation giving it full access to host PA space :/
> 
> Oops, apologies - in part thanks to the confusing indentation, I managed
> to miss the early return and misread this all being under the if
> condition for nesting not being supported. Sorry for the confusion :(
Perhaps this can help readability, considering that we have
multiple places checking the TRANS_S1 and TRANS_S2 features:
	bool feat_has_s1 = smmu->features & ARM_SMMU_FEAT_TRANS_S1;
	bool feat_has_s2 = smmu->features & ARM_SMMU_FEAT_TRANS_S2;
	if (domain->type == IOMMU_DOMAIN_NESTED) {
		if (!feat_has_s1 || !feat_has_s2) {
			dev_dbg(smmu->dev, "does not implement two stages\n");
			return -EINVAL;
		}
		...
		return 0;
	}
	if (user_cfg_s2 && !feat_has_s2)
		return -EINVAL;
	...
	if (!feat_has_s1)
		smmu_domain->stage = ARM_SMMU_DOMAIN_S2;
	if (!feat_has_s2)
		smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
Would you like this?
Thanks
Nic
 
 
- * Re: [PATCH v1 12/14] iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED type of allocations
  2023-03-09 10:53 ` [PATCH v1 12/14] iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED type of allocations Nicolin Chen
  2023-03-09 13:20   ` Robin Murphy
@ 2023-03-24 15:44   ` Eric Auger
  2023-03-24 16:30     ` Jason Gunthorpe
  2023-03-24 17:50     ` Nicolin Chen
  1 sibling, 2 replies; 165+ messages in thread
From: Eric Auger @ 2023-03-24 15:44 UTC (permalink / raw)
  To: Nicolin Chen, jgg, robin.murphy, will
  Cc: kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
Hi Nicolin,
On 3/9/23 11:53, Nicolin Chen wrote:
> Add domain allocation support for IOMMU_DOMAIN_NESTED type. This includes
> the "finalise" part to log in the user space Stream Table Entry info.
Please explain the domain ops specialization.
>
> Co-developed-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 38 +++++++++++++++++++--
>  1 file changed, 36 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 5ff74edfbd68..1f318b5e0921 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2214,6 +2214,19 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
>  		return 0;
>  	}
>  
> +	if (domain->type == IOMMU_DOMAIN_NESTED) {
> +		if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1) ||
> +		    !(smmu->features & ARM_SMMU_FEAT_TRANS_S2)) {
> +			dev_dbg(smmu->dev, "does not implement two stages\n");
> +			return -EINVAL;
> +		}
> +		smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
> +		smmu_domain->s1_cfg.s1fmt = user_cfg->s1fmt;
> +		smmu_domain->s1_cfg.s1cdmax = user_cfg->s1cdmax;
> +		smmu_domain->s1_cfg.cdcfg.cdtab_dma = user_cfg->s1ctxptr;
> +		return 0;
> +	}
> +
>  	if (user_cfg_s2 && !(smmu->features & ARM_SMMU_FEAT_TRANS_S2))
>  		return -EINVAL;
>  	if (user_cfg_s2)
> @@ -2863,6 +2876,11 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
>  	arm_smmu_sva_remove_dev_pasid(domain, dev, pasid);
>  }
>  
> +static const struct iommu_domain_ops arm_smmu_nested_domain_ops = {
> +	.attach_dev		= arm_smmu_attach_dev,
> +	.free			= arm_smmu_domain_free,
> +};
> +
>  static struct iommu_domain *
>  __arm_smmu_domain_alloc(unsigned type,
>  			struct arm_smmu_domain *s2,
> @@ -2877,11 +2895,15 @@ __arm_smmu_domain_alloc(unsigned type,
>  		return arm_smmu_sva_domain_alloc();
>  
>  	if (type != IOMMU_DOMAIN_UNMANAGED &&
> +	    type != IOMMU_DOMAIN_NESTED &&
>  	    type != IOMMU_DOMAIN_DMA &&
>  	    type != IOMMU_DOMAIN_DMA_FQ &&
>  	    type != IOMMU_DOMAIN_IDENTITY)
>  		return NULL;
>  
> +	if (s2 && s2->stage != ARM_SMMU_DOMAIN_S2)
> +		return NULL;
> +
>  	/*
>  	 * Allocate the domain and initialise some of its data structures.
>  	 * We can't really finalise the domain unless a master is given.
> @@ -2889,10 +2911,14 @@ __arm_smmu_domain_alloc(unsigned type,
>  	smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL);
>  	if (!smmu_domain)
>  		return NULL;
> +	smmu_domain->s2 = s2;
>  	domain = &smmu_domain->domain;
>  
>  	domain->type = type;
> -	domain->ops = arm_smmu_ops.default_domain_ops;
> +	if (s2)
> +		domain->ops = &arm_smmu_nested_domain_ops;
> +	else
> +		domain->ops = arm_smmu_ops.default_domain_ops;
>  
>  	mutex_init(&smmu_domain->init_mutex);
>  	INIT_LIST_HEAD(&smmu_domain->devices);
> @@ -2923,8 +2949,16 @@ arm_smmu_domain_alloc_user(struct device *dev, struct iommu_domain *parent,
>  	const struct iommu_hwpt_arm_smmuv3 *user_cfg = user_data;
>  	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
>  	unsigned type = IOMMU_DOMAIN_UNMANAGED;
> +	struct arm_smmu_domain *s2 = NULL;
> +
> +	if (parent) {
> +		if (parent->ops != arm_smmu_ops.default_domain_ops)
> +			return NULL;
> +		type = IOMMU_DOMAIN_NESTED;
> +		s2 = to_smmu_domain(parent);
> +	}
Please can you explain the (use) case where !parent. This creates an
unmanaged S1?
Thanks
Eric
>  
> -	return __arm_smmu_domain_alloc(type, NULL, master, user_cfg);
> +	return __arm_smmu_domain_alloc(type, s2, master, user_cfg);
>  }
>  
>  static struct iommu_ops arm_smmu_ops = {
- * Re: [PATCH v1 12/14] iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED type of allocations
  2023-03-24 15:44   ` Eric Auger
@ 2023-03-24 16:30     ` Jason Gunthorpe
  2023-03-24 17:50     ` Nicolin Chen
  1 sibling, 0 replies; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-24 16:30 UTC (permalink / raw)
  To: Eric Auger
  Cc: Nicolin Chen, robin.murphy, will, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Fri, Mar 24, 2023 at 04:44:58PM +0100, Eric Auger wrote:
> Please can you explain the (use) case where !parent. This creates an
> unmanaged S1?
If parent is not specified then userspace can force the IOPTE format
to be S1 or S2 of a normal unmanaged domain.
Not sure there is a usecase, but it seems reasonable to support. It
would be useful if there is further parameterization of the S1 like
limiting the number of address bits or something.
Jason
- * Re: [PATCH v1 12/14] iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED type of allocations
  2023-03-24 15:44   ` Eric Auger
  2023-03-24 16:30     ` Jason Gunthorpe
@ 2023-03-24 17:50     ` Nicolin Chen
  2023-03-24 17:51       ` Jason Gunthorpe
  1 sibling, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-24 17:50 UTC (permalink / raw)
  To: Eric Auger
  Cc: jgg, robin.murphy, will, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Fri, Mar 24, 2023 at 04:44:58PM +0100, Eric Auger wrote:
> > @@ -2923,8 +2949,16 @@ arm_smmu_domain_alloc_user(struct device *dev, struct iommu_domain *parent,
> >       const struct iommu_hwpt_arm_smmuv3 *user_cfg = user_data;
> >       struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> >       unsigned type = IOMMU_DOMAIN_UNMANAGED;
> > +     struct arm_smmu_domain *s2 = NULL;
> > +
> > +     if (parent) {
> > +             if (parent->ops != arm_smmu_ops.default_domain_ops)
> > +                     return NULL;
> > +             type = IOMMU_DOMAIN_NESTED;
> > +             s2 = to_smmu_domain(parent);
> > +     }
> Please can you explain the (use) case where !parent. This creates an
> unmanaged S1?
It creates an unmanaged domain. Whether it becomes an unmanaged S1
or an unmanaged S2 domain is decided in the finalise() function,
which checks the S2 flag and sets the stage accordingly.
I think that I could add a few lines of comments inline or at
the top of the function to ease the readability.
Thanks
Nicolin
- * Re: [PATCH v1 12/14] iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED type of allocations
  2023-03-24 17:50     ` Nicolin Chen
@ 2023-03-24 17:51       ` Jason Gunthorpe
  2023-03-24 17:55         ` Nicolin Chen
  0 siblings, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-24 17:51 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Eric Auger, robin.murphy, will, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Fri, Mar 24, 2023 at 10:50:34AM -0700, Nicolin Chen wrote:
> On Fri, Mar 24, 2023 at 04:44:58PM +0100, Eric Auger wrote:
> > > @@ -2923,8 +2949,16 @@ arm_smmu_domain_alloc_user(struct device *dev, struct iommu_domain *parent,
> > >       const struct iommu_hwpt_arm_smmuv3 *user_cfg = user_data;
> > >       struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> > >       unsigned type = IOMMU_DOMAIN_UNMANAGED;
> > > +     struct arm_smmu_domain *s2 = NULL;
> > > +
> > > +     if (parent) {
> > > +             if (parent->ops != arm_smmu_ops.default_domain_ops)
> > > +                     return NULL;
> > > +             type = IOMMU_DOMAIN_NESTED;
> > > +             s2 = to_smmu_domain(parent);
> > > +     }
> > Please can you explain the (use) case where !parent. This creates an
> > unmanaged S1?
> 
> It creates an unmanaged domain. Whether it becomes an unmanaged S1
> or an unmanaged S2 domain is decided in the finalise() function,
> which checks the S2 flag and sets the stage accordingly.
This also needs to be fixed up, the alloc_user should not return
incompletely initialized domains.
Jason
- * Re: [PATCH v1 12/14] iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED type of allocations
  2023-03-24 17:51       ` Jason Gunthorpe
@ 2023-03-24 17:55         ` Nicolin Chen
  0 siblings, 0 replies; 165+ messages in thread
From: Nicolin Chen @ 2023-03-24 17:55 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Eric Auger, robin.murphy, will, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Fri, Mar 24, 2023 at 02:51:45PM -0300, Jason Gunthorpe wrote:
> On Fri, Mar 24, 2023 at 10:50:34AM -0700, Nicolin Chen wrote:
> > On Fri, Mar 24, 2023 at 04:44:58PM +0100, Eric Auger wrote:
> > > > @@ -2923,8 +2949,16 @@ arm_smmu_domain_alloc_user(struct device *dev, struct iommu_domain *parent,
> > > >       const struct iommu_hwpt_arm_smmuv3 *user_cfg = user_data;
> > > >       struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> > > >       unsigned type = IOMMU_DOMAIN_UNMANAGED;
> > > > +     struct arm_smmu_domain *s2 = NULL;
> > > > +
> > > > +     if (parent) {
> > > > +             if (parent->ops != arm_smmu_ops.default_domain_ops)
> > > > +                     return NULL;
> > > > +             type = IOMMU_DOMAIN_NESTED;
> > > > +             s2 = to_smmu_domain(parent);
> > > > +     }
> > > Please can you explain the (use) case where !parent. This creates an
> > > unmanaged S1?
> > 
> > It creates an unmanaged domain. Whether it becomes an unmanaged S1
> > or an unmanaged S2 domain is decided in the finalise() function,
> > which checks the S2 flag and sets the stage accordingly.
> 
> This also needs to be fixed up, the alloc_user should not return
> incompletely initialized domains.
finalise() is called at the end of __arm_smmu_domain_alloc(), so
alloc_user, by passing a dev pointer, actually completes the
initialization.
Thanks
Nicolin
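For readers following the exchange above, here is a rough standalone
sketch of the "allocate, then finalise if a master is given" flow being
described. It is not the driver code: fake_master, fake_domain,
fake_finalise() and fake_domain_alloc() are hypothetical names made up
for illustration, and the stage selection is a simplification of what
finalise() does. The point it tries to show is only that a caller who
supplies a master never gets a half-initialised domain back.

```c
#include <stdlib.h>

enum stage { STAGE_UNSET, STAGE_S1, STAGE_S2 };

/* Hypothetical stand-ins for the master and domain structures. */
struct fake_master { int has_s1; int has_s2; };
struct fake_domain { enum stage stage; int finalised; };

/* Pick a stage from the hardware features; fail rather than fall back. */
static int fake_finalise(struct fake_domain *d, const struct fake_master *m,
			 int want_s2)
{
	if (want_s2 && !m->has_s2)
		return -1;
	if (!m->has_s1 && !m->has_s2)
		return -1;
	d->stage = (want_s2 || !m->has_s1) ? STAGE_S2 : STAGE_S1;
	d->finalised = 1;
	return 0;
}

/*
 * Allocate a domain. With a master, finalise before returning so the
 * caller never sees an incompletely initialised object. Without one,
 * the stage stays unset until a later attach.
 */
static struct fake_domain *fake_domain_alloc(const struct fake_master *m,
					     int want_s2)
{
	struct fake_domain *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	if (m && fake_finalise(d, m, want_s2)) {
		free(d);
		return NULL;
	}
	return d;
}
```

With a master that lacks stage 2, a stage-2 request returns NULL here
instead of handing back a domain that would only fail later.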
 
 
 
 
- * [PATCH v1 13/14] iommu/arm-smmu-v3: Add CMDQ_OP_TLBI_NH_VAA and CMDQ_OP_TLBI_NH_ALL
  2023-03-09 10:53 [PATCH v1 00/14] Add Nested Translation Support for SMMUv3 Nicolin Chen
                   ` (11 preceding siblings ...)
  2023-03-09 10:53 ` [PATCH v1 12/14] iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED type of allocations Nicolin Chen
@ 2023-03-09 10:53 ` Nicolin Chen
  2023-03-09 13:44   ` Robin Murphy
  2023-03-09 10:53 ` [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user Nicolin Chen
  13 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-09 10:53 UTC (permalink / raw)
  To: jgg, robin.murphy, will
  Cc: eric.auger, kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
With a nested translation setup, a stage-1 Context Descriptor table can be
managed by a guest OS in the user space. So, the kernel driver should not
assume that the guest OS will use a user space device driver that doesn't
support TLBI_NH_VAA and TLBI_NH_ALL commands.
Add them in the arm_smmu_cmdq_build_cmd(), to prepare for support of these
two TLBI invalidation requests from the guest level.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 4 ++++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 ++
 2 files changed, 6 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 1f318b5e0921..ac63185ae268 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -277,6 +277,9 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 		/* Cover the entire SID range */
 		cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_RANGE, 31);
 		break;
+	case CMDQ_OP_TLBI_NH_VAA:
+		ent->tlbi.asid = 0;
+		fallthrough;
 	case CMDQ_OP_TLBI_NH_VA:
 		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
 		fallthrough;
@@ -301,6 +304,7 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 	case CMDQ_OP_TLBI_NH_ASID:
 		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid);
 		fallthrough;
+	case CMDQ_OP_TLBI_NH_ALL:
 	case CMDQ_OP_TLBI_S12_VMALL:
 		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
 		break;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 6cf516852721..6181d6cd8b51 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -454,8 +454,10 @@ struct arm_smmu_cmdq_ent {
 			};
 		} cfgi;
 
+		#define CMDQ_OP_TLBI_NH_ALL	0x10
 		#define CMDQ_OP_TLBI_NH_ASID	0x11
 		#define CMDQ_OP_TLBI_NH_VA	0x12
+		#define CMDQ_OP_TLBI_NH_VAA	0x13
 		#define CMDQ_OP_TLBI_EL2_ALL	0x20
 		#define CMDQ_OP_TLBI_EL2_ASID	0x21
 		#define CMDQ_OP_TLBI_EL2_VA	0x22
-- 
2.39.2
- * Re: [PATCH v1 13/14] iommu/arm-smmu-v3: Add CMDQ_OP_TLBI_NH_VAA and CMDQ_OP_TLBI_NH_ALL
  2023-03-09 10:53 ` [PATCH v1 13/14] iommu/arm-smmu-v3: Add CMDQ_OP_TLBI_NH_VAA and CMDQ_OP_TLBI_NH_ALL Nicolin Chen
@ 2023-03-09 13:44   ` Robin Murphy
  2023-03-10  1:19     ` Nicolin Chen
  0 siblings, 1 reply; 165+ messages in thread
From: Robin Murphy @ 2023-03-09 13:44 UTC (permalink / raw)
  To: Nicolin Chen, jgg, will
  Cc: eric.auger, kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
On 2023-03-09 10:53, Nicolin Chen wrote:
> With a nested translation setup, a stage-1 Context Descriptor table can be
> managed by a guest OS in the user space. So, the kernel driver should not
> assume that the guest OS will use a user space device driver that doesn't
> support TLBI_NH_VAA and TLBI_NH_ALL commands.
> 
> Add them in the arm_smmu_cmdq_build_cmd(), to prepare for support of these
> two TLBI invalidation requests from the guest level.
> 
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 4 ++++
>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 ++
>   2 files changed, 6 insertions(+)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 1f318b5e0921..ac63185ae268 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -277,6 +277,9 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
>   		/* Cover the entire SID range */
>   		cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_RANGE, 31);
>   		break;
> +	case CMDQ_OP_TLBI_NH_VAA:
> +		ent->tlbi.asid = 0;
This is backwards - NH_VA is a superset of NH_VAA (not to mention that 
quietly modifying the input argument is ugly; in fact it might be nice 
if ent was const here).
Please follow the existing pattern, and decouple NH_VA from EL2_VA if 
necessary.
Thanks,
Robin.
> +		fallthrough;
>   	case CMDQ_OP_TLBI_NH_VA:
>   		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
>   		fallthrough;
> @@ -301,6 +304,7 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
>   	case CMDQ_OP_TLBI_NH_ASID:
>   		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid);
>   		fallthrough;
> +	case CMDQ_OP_TLBI_NH_ALL:
>   	case CMDQ_OP_TLBI_S12_VMALL:
>   		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
>   		break;
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 6cf516852721..6181d6cd8b51 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -454,8 +454,10 @@ struct arm_smmu_cmdq_ent {
>   			};
>   		} cfgi;
>   
> +		#define CMDQ_OP_TLBI_NH_ALL	0x10
>   		#define CMDQ_OP_TLBI_NH_ASID	0x11
>   		#define CMDQ_OP_TLBI_NH_VA	0x12
> +		#define CMDQ_OP_TLBI_NH_VAA	0x13
>   		#define CMDQ_OP_TLBI_EL2_ALL	0x20
>   		#define CMDQ_OP_TLBI_EL2_ASID	0x21
>   		#define CMDQ_OP_TLBI_EL2_VA	0x22
- * Re: [PATCH v1 13/14] iommu/arm-smmu-v3: Add CMDQ_OP_TLBI_NH_VAA and CMDQ_OP_TLBI_NH_ALL
  2023-03-09 13:44   ` Robin Murphy
@ 2023-03-10  1:19     ` Nicolin Chen
  0 siblings, 0 replies; 165+ messages in thread
From: Nicolin Chen @ 2023-03-10  1:19 UTC (permalink / raw)
  To: Robin Murphy
  Cc: jgg, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Thu, Mar 09, 2023 at 01:44:34PM +0000, Robin Murphy wrote:
> On 2023-03-09 10:53, Nicolin Chen wrote:
> > With a nested translation setup, a stage-1 Context Descriptor table can be
> > managed by a guest OS in the user space. So, the kernel driver should not
> > assume that the guest OS will use a user space device driver that doesn't
> > support TLBI_NH_VAA and TLBI_NH_ALL commands.
> > 
> > Add them in the arm_smmu_cmdq_build_cmd(), to prepare for support of these
> > two TLBI invalidation requests from the guest level.
> > 
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > ---
> >   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 4 ++++
> >   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 ++
> >   2 files changed, 6 insertions(+)
> > 
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index 1f318b5e0921..ac63185ae268 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -277,6 +277,9 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
> >               /* Cover the entire SID range */
> >               cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_RANGE, 31);
> >               break;
> > +     case CMDQ_OP_TLBI_NH_VAA:
> > +             ent->tlbi.asid = 0;
> 
> This is backwards - NH_VA is a superset of NH_VAA (not to mention that
> quietly modifying the input argument is ugly; in fact it might be nice
> if ent was const here).
I see.
> Please follow the existing pattern, and decouple NH_VA from EL2_VA if
> necessary.
OK. I was trying to keep it neat, but it looks like decoupling
is the right way.
Thanks
Nic
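As a rough illustration of the review point above (NH_VA carries
everything NH_VAA does plus an ASID, and the input should not be
modified), here is a minimal standalone sketch. It is not the driver's
arm_smmu_cmdq_build_cmd(): tlbi_ent, tlbi_put_va() and tlbi_build_cmd()
are hypothetical names, and the field positions are a simplified layout
assumed for this example only. The two opcode values are the ones from
the header hunk in the patch.

```c
#include <stdint.h>

/* Opcode values as listed in the patch's header hunk. */
#define OP_TLBI_NH_VA	0x12
#define OP_TLBI_NH_VAA	0x13

/* Simplified field layout, assumed for this sketch only. */
#define TLBI_0_VMID_SHIFT	32
#define TLBI_0_ASID_SHIFT	48
#define TLBI_1_LEAF		(1ULL << 0)
#define TLBI_1_VA_MASK		(~0xfffULL)

/* Hypothetical, cut-down command entry. */
struct tlbi_ent {
	uint8_t  opcode;
	uint16_t asid;
	uint16_t vmid;
	int      leaf;
	uint64_t addr;
};

/* Fields shared by every by-VA invalidation in this sketch. */
static void tlbi_put_va(uint64_t cmd[2], const struct tlbi_ent *ent)
{
	cmd[0] |= (uint64_t)ent->vmid << TLBI_0_VMID_SHIFT;
	if (ent->leaf)
		cmd[1] |= TLBI_1_LEAF;
	cmd[1] |= ent->addr & TLBI_1_VA_MASK;
}

/*
 * Build a two-word command. ent is const, so the ASID is simply left
 * out for NH_VAA instead of being zeroed in the caller's structure.
 * Returns 0 on success, -1 for an opcode this sketch does not handle.
 */
static int tlbi_build_cmd(uint64_t cmd[2], const struct tlbi_ent *ent)
{
	cmd[0] = ent->opcode;
	cmd[1] = 0;

	switch (ent->opcode) {
	case OP_TLBI_NH_VA:
		/* NH_VA is the NH_VAA encoding plus the ASID. */
		cmd[0] |= (uint64_t)ent->asid << TLBI_0_ASID_SHIFT;
		/* fall through */
	case OP_TLBI_NH_VAA:
		tlbi_put_va(cmd, ent);
		return 0;
	default:
		return -1;
	}
}
```

Ordering the cases this way keeps NH_VAA as the base case and lets
NH_VA add its extra field on top, which is one way to read "follow the
existing pattern".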
 
 
- * [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-09 10:53 [PATCH v1 00/14] Add Nested Translation Support for SMMUv3 Nicolin Chen
                   ` (12 preceding siblings ...)
  2023-03-09 10:53 ` [PATCH v1 13/14] iommu/arm-smmu-v3: Add CMDQ_OP_TLBI_NH_VAA and CMDQ_OP_TLBI_NH_ALL Nicolin Chen
@ 2023-03-09 10:53 ` Nicolin Chen
  2023-03-09 14:49   ` Robin Murphy
  13 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-09 10:53 UTC (permalink / raw)
  To: jgg, robin.murphy, will
  Cc: eric.auger, kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
Add arm_smmu_cache_invalidate_user() function for user space to invalidate
TLB entries and Context Descriptors, since either an IO page table entry
or a Context Descriptor in the user space is still cached by the hardware.
The input user_data is defined in "struct iommu_hwpt_invalidate_arm_smmuv3"
that contains the essential data for corresponding invalidation commands.
Co-developed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 56 +++++++++++++++++++++
 1 file changed, 56 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index ac63185ae268..7d73eab5e7f4 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2880,9 +2880,65 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
 	arm_smmu_sva_remove_dev_pasid(domain, dev, pasid);
 }
 
+static void arm_smmu_cache_invalidate_user(struct iommu_domain *domain,
+					   void *user_data)
+{
+	struct iommu_hwpt_invalidate_arm_smmuv3 *inv_info = user_data;
+	struct arm_smmu_cmdq_ent cmd = { .opcode = inv_info->opcode };
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	size_t granule_size = inv_info->granule_size;
+	unsigned long iova = 0;
+	size_t size = 0;
+	int ssid = 0;
+
+	if (!smmu || !smmu_domain->s2 || domain->type != IOMMU_DOMAIN_NESTED)
+		return;
+
+	switch (inv_info->opcode) {
+	case CMDQ_OP_CFGI_CD:
+	case CMDQ_OP_CFGI_CD_ALL:
+		return arm_smmu_sync_cd(smmu_domain, inv_info->ssid, true);
+	case CMDQ_OP_TLBI_NH_VA:
+		cmd.tlbi.asid = inv_info->asid;
+		fallthrough;
+	case CMDQ_OP_TLBI_NH_VAA:
+		if (!granule_size || !(granule_size & smmu->pgsize_bitmap) ||
+		    granule_size & ~(1ULL << __ffs(granule_size)))
+			return;
+
+		iova = inv_info->range.start;
+		size = inv_info->range.last - inv_info->range.start + 1;
+		if (!size)
+			return;
+
+		cmd.tlbi.vmid = smmu_domain->s2->s2_cfg.vmid;
+		cmd.tlbi.leaf = inv_info->flags & IOMMU_SMMUV3_CMDQ_TLBI_VA_LEAF;
+		__arm_smmu_tlb_inv_range(&cmd, iova, size, granule_size, smmu_domain);
+		break;
+	case CMDQ_OP_TLBI_NH_ASID:
+		cmd.tlbi.asid = inv_info->asid;
+		fallthrough;
+	case CMDQ_OP_TLBI_NH_ALL:
+		cmd.tlbi.vmid = smmu_domain->s2->s2_cfg.vmid;
+		arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
+		break;
+	case CMDQ_OP_ATC_INV:
+		ssid = inv_info->ssid;
+		iova = inv_info->range.start;
+		size = inv_info->range.last - inv_info->range.start + 1;
+		break;
+	default:
+		return;
+	}
+
+	arm_smmu_atc_inv_domain(smmu_domain, ssid, iova, size);
+}
+
 static const struct iommu_domain_ops arm_smmu_nested_domain_ops = {
 	.attach_dev		= arm_smmu_attach_dev,
 	.free			= arm_smmu_domain_free,
+	.cache_invalidate_user	= arm_smmu_cache_invalidate_user,
 };
 
 static struct iommu_domain *
-- 
2.39.2
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-09 10:53 ` [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user Nicolin Chen
@ 2023-03-09 14:49   ` Robin Murphy
  2023-03-09 15:31     ` Jason Gunthorpe
                       ` (2 more replies)
  0 siblings, 3 replies; 165+ messages in thread
From: Robin Murphy @ 2023-03-09 14:49 UTC (permalink / raw)
  To: Nicolin Chen, jgg, will
  Cc: eric.auger, kevin.tian, baolu.lu, joro, shameerali.kolothum.thodi,
	jean-philippe, linux-arm-kernel, iommu, linux-kernel
On 2023-03-09 10:53, Nicolin Chen wrote:
> Add arm_smmu_cache_invalidate_user() function for user space to invalidate
> TLB entries and Context Descriptors, since either an IO page table entry
> or a Context Descriptor in the user space is still cached by the hardware.
> 
> The input user_data is defined in "struct iommu_hwpt_invalidate_arm_smmuv3"
> that contains the essential data for corresponding invalidation commands.
> 
> Co-developed-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 56 +++++++++++++++++++++
>   1 file changed, 56 insertions(+)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index ac63185ae268..7d73eab5e7f4 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2880,9 +2880,65 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
>   	arm_smmu_sva_remove_dev_pasid(domain, dev, pasid);
>   }
>   
> +static void arm_smmu_cache_invalidate_user(struct iommu_domain *domain,
> +					   void *user_data)
> +{
> +	struct iommu_hwpt_invalidate_arm_smmuv3 *inv_info = user_data;
> +	struct arm_smmu_cmdq_ent cmd = { .opcode = inv_info->opcode };
> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +	size_t granule_size = inv_info->granule_size;
> +	unsigned long iova = 0;
> +	size_t size = 0;
> +	int ssid = 0;
> +
> +	if (!smmu || !smmu_domain->s2 || domain->type != IOMMU_DOMAIN_NESTED)
> +		return;
> +
> +	switch (inv_info->opcode) {
> +	case CMDQ_OP_CFGI_CD:
> +	case CMDQ_OP_CFGI_CD_ALL:
> +		return arm_smmu_sync_cd(smmu_domain, inv_info->ssid, true);
Since we let the guest choose its own S1Fmt (and S1CDMax, yet not 
S1DSS?), how can we assume leaf = true here?
> +	case CMDQ_OP_TLBI_NH_VA:
> +		cmd.tlbi.asid = inv_info->asid;
> +		fallthrough;
> +	case CMDQ_OP_TLBI_NH_VAA:
> +		if (!granule_size || !(granule_size & smmu->pgsize_bitmap) ||
Non-range invalidations with TG=0 are perfectly legal, and should not be 
ignored.
> +		    granule_size & ~(1ULL << __ffs(granule_size)))
If that's intended to mean is_power_of_2(), please just use is_power_of_2().
> +			return;
> +
> +		iova = inv_info->range.start;
> +		size = inv_info->range.last - inv_info->range.start + 1;
If the design here is that user_data is so deeply driver-specific and 
special to the point that it can't possibly be passed as a type-checked 
union of the known and publicly-visible UAPI types that it is, wouldn't 
it make sense to just encode the whole thing in the expected format and 
not have to make these kinds of niggling little conversions at both ends?
> +		if (!size)
> +			return;
> +
> +		cmd.tlbi.vmid = smmu_domain->s2->s2_cfg.vmid;
> +		cmd.tlbi.leaf = inv_info->flags & IOMMU_SMMUV3_CMDQ_TLBI_VA_LEAF;
> +		__arm_smmu_tlb_inv_range(&cmd, iova, size, granule_size, smmu_domain);
> +		break;
> +	case CMDQ_OP_TLBI_NH_ASID:
> +		cmd.tlbi.asid = inv_info->asid;
> +		fallthrough;
> +	case CMDQ_OP_TLBI_NH_ALL:
> +		cmd.tlbi.vmid = smmu_domain->s2->s2_cfg.vmid;
> +		arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
> +		break;
> +	case CMDQ_OP_ATC_INV:
> +		ssid = inv_info->ssid;
> +		iova = inv_info->range.start;
> +		size = inv_info->range.last - inv_info->range.start + 1;
> +		break;
Can we do any better than multiplying every single ATC_INV command, even 
for random bogus StreamIDs, into multiple commands across every physical 
device? In fact, I'm not entirely confident this isn't problematic, if 
the guest wishes to send invalidations for one device specifically while 
it's put some other device into a state where sending it a command would 
do something bad. At the very least, it's liable to be confusing if the 
guest sends a command for one StreamID but gets an error back for a 
different one.
And if we expect ATS, what about PRI? Per patch #4 you're currently 
offering that to the guest as well.
> +	default:
> +		return;
What about NSNH_ALL? That still needs to invalidate all the S1 context 
that the guest *thinks* it's invalidating.
Also, perhaps I've overlooked something obvious, but what's the 
procedure for reflecting illegal commands back to userspace? Some of the 
things we're silently ignoring here would be expected to raise 
CERROR_ILL. Same goes for all the other fault events which may occur due 
to invalid S1 config, come to think of it.
Thanks,
Robin.
> +	}
> +
> +	arm_smmu_atc_inv_domain(smmu_domain, ssid, iova, size);
> +}
> +
>   static const struct iommu_domain_ops arm_smmu_nested_domain_ops = {
>   	.attach_dev		= arm_smmu_attach_dev,
>   	.free			= arm_smmu_domain_free,
> +	.cache_invalidate_user	= arm_smmu_cache_invalidate_user,
>   };
>   
>   static struct iommu_domain *
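
For illustration only, here is a small userspace sketch of the granule check discussed above. It compares the open-coded bit test from the patch with a plain power-of-two test. The helper names (lowest_set_bit, patch_rejects, is_pow2) are hypothetical stand-ins, __builtin_ctzll is assumed as a GCC/Clang substitute for the kernel's __ffs(), and the pgsize_bitmap part of the condition is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's __ffs(): index of the lowest set bit.
 * Must not be called with x == 0. */
static inline unsigned int lowest_set_bit(uint64_t x)
{
	return (unsigned int)__builtin_ctzll(x);
}

/* The open-coded test from the patch (minus the pgsize_bitmap check):
 * true when x is zero or has more than one bit set. */
static inline bool patch_rejects(uint64_t x)
{
	return !x || (x & ~(1ULL << lowest_set_bit(x)));
}

/* Same logic as the kernel's is_power_of_2() in <linux/log2.h>. */
static inline bool is_pow2(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}
```

Under these assumptions the two forms agree for every input, i.e. patch_rejects(x) is simply !is_pow2(x), which is why the named helper reads better. Note that neither form addresses the separate point that a non-range invalidation with TG=0 is legal.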
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-09 14:49   ` Robin Murphy
@ 2023-03-09 15:31     ` Jason Gunthorpe
  2023-03-10  4:20       ` Nicolin Chen
  2023-03-17  9:24       ` Tian, Kevin
  2023-03-10  3:51     ` Nicolin Chen
  2023-03-17  9:47     ` Tian, Kevin
  2 siblings, 2 replies; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-09 15:31 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Nicolin Chen, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Thu, Mar 09, 2023 at 02:49:14PM +0000, Robin Murphy wrote:
> If the design here is that user_data is so deeply driver-specific and
> special to the point that it can't possibly be passed as a type-checked
> union of the known and publicly-visible UAPI types that it is, wouldn't it
> make sense to just encode the whole thing in the expected format and not
> have to make these kinds of niggling little conversions at both ends?
Yes, I suspect the design for ARM should have the input be the entire
actual command work queue entry. There is no reason to burn CPU cycles
in userspace marshalling it to something else and then decode it again
in the kernel. Organize things to point the ioctl directly at the
queue entry, and the kernel can do a single memcpy from guest
controlled pages to kernel memory then parse it?
More broadly, maybe this should be able to process a list of commands?
If the queue has a number of invalidations, batching them to the kernel
sure would be nice.
Maybe also for Intel? Kevin?
> Also, perhaps I've overlooked something obvious, but what's the procedure
> for reflecting illegal commands back to userspace? Some of the things we're
> silently ignoring here would be expected to raise CERROR_ILL. Same goes for
> all the other fault events which may occur due to invalid S1 config, come to
> think of it.
Perhaps the ioctl should fail and the userspace viommu should inject
this CERROR_ILL?
But I'm also wondering if we are making a mistake by not just having the
kernel driver expose a SW work queue in its native format, with the
ioctl being only 'read the queue'. Then it could (asynchronously!)
push back answers, real or emulated, as well, including all error
indications.
I think we went down this synchronous one-ioctl-per-invalidation path
because that was what the original generic stuff wanted to do. Is it
what we really want? Kevin, what is your perspective?
Jason
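
To make the "native format plus batching" idea above concrete, here is a rough userspace sketch, not a proposed uAPI. It assumes a 16-byte command entry with the opcode in bits [7:0] of the first 64-bit word, and opcode values that I believe match the SMMUv3 spec and arm-smmu-v3.h (please verify against both). struct raw_cmd, cmd_is_supported() and process_batch() are hypothetical names. The return value tells the caller where processing stopped, so a userspace vSMMU could raise CERROR_ILL for that entry.

```c
#include <stddef.h>
#include <stdint.h>

/* One SMMUv3 command queue entry: two 64-bit words (16 bytes). */
struct raw_cmd {
	uint64_t w[2];
};

/* Opcode values assumed from the SMMUv3 spec; verify before relying on them. */
enum {
	OP_CFGI_CD       = 0x05,
	OP_CFGI_CD_ALL   = 0x06,
	OP_TLBI_NH_ALL   = 0x10,
	OP_TLBI_NH_ASID  = 0x11,
	OP_TLBI_NH_VA    = 0x12,
	OP_TLBI_NH_VAA   = 0x13,
	OP_TLBI_NSNH_ALL = 0x30,
	OP_ATC_INV       = 0x40,
};

/* The opcode is carried in bits [7:0] of word 0. */
static inline uint8_t cmd_opcode(const struct raw_cmd *c)
{
	return (uint8_t)(c->w[0] & 0xff);
}

/* Hypothetical policy: guest commands the host is willing to forward. */
static int cmd_is_supported(uint8_t op)
{
	switch (op) {
	case OP_CFGI_CD:
	case OP_CFGI_CD_ALL:
	case OP_TLBI_NH_ALL:
	case OP_TLBI_NH_ASID:
	case OP_TLBI_NH_VA:
	case OP_TLBI_NH_VAA:
	case OP_TLBI_NSNH_ALL:
	case OP_ATC_INV:
		return 1;
	default:
		return 0;
	}
}

/*
 * Walk a batch of n guest commands and return how many were accepted.
 * If the result is less than n, cmds[result] is the first illegal entry.
 */
static size_t process_batch(const struct raw_cmd *cmds, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!cmd_is_supported(cmd_opcode(&cmds[i])))
			break;
		/* A real implementation would fix up VMID/SID and issue it here. */
	}
	return i;
}
```

Returning a count rather than a bare error code is one possible design choice: it lets the caller advance the guest's consumer index past the commands that did complete before reporting the failing one.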
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-09 15:31     ` Jason Gunthorpe
@ 2023-03-10  4:20       ` Nicolin Chen
  2023-03-10 16:19         ` Jason Gunthorpe
  2023-03-17  9:24       ` Tian, Kevin
  1 sibling, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-10  4:20 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Robin Murphy, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Thu, Mar 09, 2023 at 11:31:04AM -0400, Jason Gunthorpe wrote:
> On Thu, Mar 09, 2023 at 02:49:14PM +0000, Robin Murphy wrote:
> 
> > If the design here is that user_data is so deeply driver-specific and
> > special to the point that it can't possibly be passed as a type-checked
> > union of the known and publicly-visible UAPI types that it is, wouldn't it
> > make sense to just encode the whole thing in the expected format and not
> > have to make these kinds of niggling little conversions at both ends?
> 
> Yes, I suspect the design for ARM should have the input be the entire
> actual command work queue entry. There is no reason to burn CPU cycles
> in userspace marshalling it to something else and then decode it again
> in the kernel. Organize things to point the ioctl directly at the
> queue entry, and the kernel can do a single memcpy from guest
> controlled pages to kernel memory then parse it?
There can still be complications in doing something straightforward
like that. Firstly, the consumer and producer indexes might need
to be synced between the host and kernel? Secondly, things like
SID and VMID fields in the commands need to be replaced manually
when the host kernel reads commands out, which means that there
needs to be a translation table (or tables) in the host kernel to
replace those fields. These are actually features of the VCMDQ
hardware itself.
Though I am not sure how many CPU cycles get burned, it can at
least simplify the uAPI a bit and meanwhile address the
multiplying issue with the ATC_INV command that Robin raised, so
long as we ensure the consumer and producer indexes don't get
messed up between host and guest?
Thanks
Nic
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-10  4:20       ` Nicolin Chen
@ 2023-03-10 16:19         ` Jason Gunthorpe
  2023-03-11 11:56           ` Nicolin Chen
  2023-03-17  9:41           ` Tian, Kevin
  0 siblings, 2 replies; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-10 16:19 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Robin Murphy, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Thu, Mar 09, 2023 at 08:20:03PM -0800, Nicolin Chen wrote:
> On Thu, Mar 09, 2023 at 11:31:04AM -0400, Jason Gunthorpe wrote:
> > On Thu, Mar 09, 2023 at 02:49:14PM +0000, Robin Murphy wrote:
> > 
> > > If the design here is that user_data is so deeply driver-specific and
> > > special to the point that it can't possibly be passed as a type-checked
> > > union of the known and publicly-visible UAPI types that it is, wouldn't it
> > > make sense to just encode the whole thing in the expected format and not
> > > have to make these kinds of niggling little conversions at both ends?
> > 
> > Yes, I suspect the design for ARM should have the input be the entire
> > actual command work queue entry. There is no reason to burn CPU cycles
> > in userspace marshalling it to something else and then decode it again
> > in the kernel. Organize things to point the ioctl directly at the
> > queue entry, and the kernel can do a single memcpy from guest
> > controlled pages to kernel memory then parse it?
> 
> There still can be complications to do something straightforward
> like that. 
> Firstly, the consumer and producer indexes might need
> to be synced between the host and kernel?
No, qemu would handle this. The kernel would just read the command
entries it is told by qemu to read, which qemu has already sorted out.
> Secondly, things like SID and VMID fields in the commands need to
> be replaced manually when the host kernel reads commands out, which
> means that there need to be a translation table(s) in the host
> kernel to replace those fields. These actually are parts of the
> features of VCMDQ hardware itself.
VMID should be ignored in a guest request.
SID translation is a good point. Can qemu do this? How does SID
translation work with VCMDQ in HW? (Jean this is exactly the sort of
tiny detail that the generic interface ignored)
What I'm broadly thinking is if we have to make the infrastructure for
VCMDQ HW accelerated invalidation then it is not a big step to also
have the kernel SW path use the same infrastructure just with a CPU
wake up instead of a MMIO poke.
Ie we have a SW version of VCMDQ to speed up SMMUv3 cases without HW
support.
I suspect the answer to Robin's question on how to handle errors is
the most important deciding factor. If we have to capture and relay
actual HW errors back to userspace that really suggests we should do
something different than a synchronous ioctl.
Jason
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-10 16:19         ` Jason Gunthorpe
@ 2023-03-11 11:56           ` Nicolin Chen
  2023-03-11 12:53             ` Nicolin Chen
  2023-03-20 13:03             ` Jason Gunthorpe
  2023-03-17  9:41           ` Tian, Kevin
  1 sibling, 2 replies; 165+ messages in thread
From: Nicolin Chen @ 2023-03-11 11:56 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Robin Murphy, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Fri, Mar 10, 2023 at 12:19:50PM -0400, Jason Gunthorpe wrote:
> On Thu, Mar 09, 2023 at 08:20:03PM -0800, Nicolin Chen wrote:
> > On Thu, Mar 09, 2023 at 11:31:04AM -0400, Jason Gunthorpe wrote:
> > > On Thu, Mar 09, 2023 at 02:49:14PM +0000, Robin Murphy wrote:
> > > 
> > > > If the design here is that user_data is so deeply driver-specific and
> > > > special to the point that it can't possibly be passed as a type-checked
> > > > union of the known and publicly-visible UAPI types that it is, wouldn't it
> > > > make sense to just encode the whole thing in the expected format and not
> > > > have to make these kinds of niggling little conversions at both ends?
> > > 
> > > Yes, I suspect the design for ARM should have the input be the entire
> > > actual command work queue entry. There is no reason to burn CPU cycles
> > > in userspace marshalling it to something else and then decode it again
> > > in the kernel. Organize things to point the ioctl directly at the
> > > queue entry, and the kernel can do a single memcpy from guest
> > > controlled pages to kernel memory then parse it?
> > 
> > There still can be complications to do something straightforward
> > like that. 
> 
> > Firstly, the consumer and producer indexes might need
> > to be synced between the host and kernel?
> 
> No, qemu would handle this. The kernel would just read the command
> entries it is told by qemu to read which qemu has already sorted out.
Then, instead of sending commands, we would forward the consumer index?
> > Secondly, things like SID and VMID fields in the commands need to
> > be replaced manually when the host kernel reads commands out, which
> > means that there need to be a translation table(s) in the host
> > kernel to replace those fields. These actually are parts of the
> > features of VCMDQ hardware itself.
> 
> VMID should be ignored in a guest request.
The guest always sets the VMID fields to zero, but the host should then
fill them in for most TLBI commands.
VCMDQ has a register to set the VMID explicitly, so hardware can fill
in the VMID fields automatically.
> SID translation is a good point. Can qemu do this? How does SID
> translation work with VCMDQ in HW? (Jean this is exactly the sort of
> tiny detail that the generic interface ignored)
VCMDQ has multiple pairs of MATCH and REPLACE registers to set
up hardware lookup table for SIDs. So hardware can do the job,
replacing the SID fields in the TLBI commands.
> What I'm broadly thinking is if we have to make the infrastructure for
> VCMDQ HW accelerated invalidation then it is not a big step to also
> have the kernel SW path use the same infrastructure just with a CPU
> wake up instead of a MMIO poke.
> 
> Ie we have a SW version of VCMDQ to speed up SMMUv3 cases without HW
> support.
Very interesting idea!
I recall that one difficulty is to pass the vSID from the guest
down to the host kernel driver and to link with the pSID. What I
did previously for VCMDQ was to set the SID_MATCH register with
iommu_group_id(group) and set the SID_REPLACE register with the
pSID. Then hyper will use the iommu_group_id to search for the
pair of the registers, and to set vSID. Perhaps we should think
of something smarter.
> I suspect the answer to Robin's question on how to handle errors is
> the most important deciding factor. If we have to capture and relay
> actual HW errors back to userspace that really suggests we should do
> something different than a synchronous ioctl.
Would a synchronous ioctl mean returning some value, rather than
defining cache_invalidate_user as void like we are doing now? A fault
injection pathway to report CERROR asynchronously is what we've
been doing though -- even with Eric's previous VFIO solution.
Thanks
Nic
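
As a sketch of the VMID handling described above (the guest value is ignored and the host supplies its own), a software path could overwrite the field in the raw command word before issuing it. This assumes the VMID occupies bits [47:32] of word 0 in TLBI commands, as I understand the layout from the driver's CMDQ_TLBI_0_VMID definition. The macro and function names below are hypothetical.

```c
#include <stdint.h>

/* Assumed position of the VMID field in word 0 of a TLBI command. */
#define TLBI_0_VMID_SHIFT	32
#define TLBI_0_VMID_MASK	(0xffffULL << TLBI_0_VMID_SHIFT)

/* Replace whatever VMID the guest wrote with the host-owned one,
 * leaving the opcode, ASID and all other bits untouched. */
static inline uint64_t tlbi_set_vmid(uint64_t w0, uint16_t host_vmid)
{
	return (w0 & ~TLBI_0_VMID_MASK) |
	       ((uint64_t)host_vmid << TLBI_0_VMID_SHIFT);
}
```

This is the step that VCMDQ performs in hardware via its VMID register; the sketch only shows what a CPU-side equivalent might look like.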
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-11 11:56           ` Nicolin Chen
@ 2023-03-11 12:53             ` Nicolin Chen
  2023-03-20 13:03             ` Jason Gunthorpe
  1 sibling, 0 replies; 165+ messages in thread
From: Nicolin Chen @ 2023-03-11 12:53 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Robin Murphy, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Sat, Mar 11, 2023 at 03:56:56AM -0800, Nicolin Chen wrote:
 
> I recall that one difficulty is to pass the vSID from the guest
> down to the host kernel driver and to link with the pSID. What I
> did previously for VCMDQ was to set the SID_MATCH register with
> iommu_group_id(group) and set the SID_REPLACE register with the
> pSID. Then hyper will use the iommu_group_id to search for the
> pair of the registers, and to set vSID. Perhaps we should think
> of something smarter.
I just found that the CFGI_STE command has the SID field, yet
we just didn't pack it in the data structure for a hwpt_alloc
ioctl. So, perhaps it isn't that difficult at all. I'll try a
bit of a test run next week.
Thanks
Nic
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-11 11:56           ` Nicolin Chen
  2023-03-11 12:53             ` Nicolin Chen
@ 2023-03-20 13:03             ` Jason Gunthorpe
  2023-03-20 15:56               ` Nicolin Chen
  1 sibling, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-20 13:03 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Robin Murphy, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Sat, Mar 11, 2023 at 03:56:50AM -0800, Nicolin Chen wrote:
> I recall that one difficulty is to pass the vSID from the guest
> down to the host kernel driver and to link with the pSID. What I
> did previously for VCMDQ was to set the SID_MATCH register with
> iommu_group_id(group) and set the SID_REPLACE register with the
> pSID. Then hyper will use the iommu_group_id to search for the
> pair of the registers, and to set vSID. Perhaps we should think
> of something smarter.
We need an ioctl for this, I think. To load a map of vSID to dev_id
into the driver. Kernel will convert dev_id to pSID. Driver will
program the map into HW.
SW path will program the map into an xarray.
> > I suspect the answer to Robin's question on how to handle errors is
> > the most important deciding factor. If we have to capture and relay
> > actual HW errors back to userspace that really suggests we should do
> > something different than a synchronous ioctl.
> 
> A synchronous ioctl is to return some values other than defining
> cache_invalidate_user as void, like we are doing now? A fault
> injection pathway to report CERROR asynchronously is what we've
> been doing though -- even with Eric's previous VFIO solution.
Where is this? How does it look?
Jason
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-20 13:03             ` Jason Gunthorpe
@ 2023-03-20 15:56               ` Nicolin Chen
  2023-03-20 16:04                 ` Jason Gunthorpe
  0 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-20 15:56 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Robin Murphy, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Mon, Mar 20, 2023 at 10:03:04AM -0300, Jason Gunthorpe wrote:
> On Sat, Mar 11, 2023 at 03:56:50AM -0800, Nicolin Chen wrote:
> 
> > I recall that one difficulty is to pass the vSID from the guest
> > down to the host kernel driver and to link with the pSID. What I
> > did previously for VCMDQ was to set the SID_MATCH register with
> > iommu_group_id(group) and set the SID_REPLACE register with the
> > pSID. Then hyper will use the iommu_group_id to search for the
> > pair of the registers, and to set vSID. Perhaps we should think
> > of something smarter.
> 
> We need an ioctl for this, I think. To load a map of vSID to dev_id
> into the driver. Kernel will convert dev_id to pSID. Driver will
> program the map into HW.
Can we just pass a vSID via the alloc ioctl like this?
-----------------------------------------------------------
@@ -429,7 +429,7 @@ struct iommu_hwpt_arm_smmuv3 {
 #define IOMMU_SMMUV3_FLAG_VMID (1 << 1) /* vmid override */
        __u64 flags;
        __u32 s2vmid;
-       __u32 __reserved;
+       __u32 sid;
        __u64 s1ctxptr;
        __u64 s1cdmax;
        __u64 s1fmt;
-----------------------------------------------------------
An alloc is initiated by an SMMU_CMD_CFGI_STE command that has
an SID field anyway.
> SW path will program the map into an xarray
I found a tricky thing about SIDs in the SMMU driver when doing
this experiment: the SMMU kernel driver mostly handles devices
using struct arm_smmu_master. However, an arm_smmu_master might
have num_streams > 1, meaning a device can have multiple SIDs.
Though it seems that PCI devices might not be in this scope, a
plain xarray might not work for other types of devices in the
long run, if there are any?
> > > I suspect the answer to Robin's question on how to handle errors is
> > > the most important deciding factor. If we have to capture and relay
> > > actual HW errors back to userspace that really suggests we should do
> > > something different than a synchronous ioctl.
> > 
> > A synchronous ioctl is to return some values other than defining
> > cache_invalidate_user as void, like we are doing now? A fault
> > injection pathway to report CERROR asynchronously is what we've
> > been doing though -- even with Eric's previous VFIO solution.
> 
> Where is this? How does it look?
That's postponed with the PRI support, right? My use case does
not need PRI actually, but a fault injection pathway to guests.
This pathway should be able to take care of any CERROR (detected
by a host interrupt) or something funky in cache_invalidate_user
requests itself?
Thanks
Nic
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-20 15:56               ` Nicolin Chen
@ 2023-03-20 16:04                 ` Jason Gunthorpe
  2023-03-20 16:59                   ` Nicolin Chen
  0 siblings, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-20 16:04 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Robin Murphy, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Mon, Mar 20, 2023 at 08:56:00AM -0700, Nicolin Chen wrote:
> On Mon, Mar 20, 2023 at 10:03:04AM -0300, Jason Gunthorpe wrote:
> > On Sat, Mar 11, 2023 at 03:56:50AM -0800, Nicolin Chen wrote:
> > 
> > > I recall that one difficulty is to pass the vSID from the guest
> > > down to the host kernel driver and to link with the pSID. What I
> > > did previously for VCMDQ was to set the SID_MATCH register with
> > > iommu_group_id(group) and set the SID_REPLACE register with the
> > > pSID. Then hyper will use the iommu_group_id to search for the
> > > pair of the registers, and to set vSID. Perhaps we should think
> > > of something smarter.
> > 
> > We need an ioctl for this, I think. To load a map of vSID to dev_id
> > into the driver. Kernel will convert dev_id to pSID. Driver will
> > program the map into HW.
> 
> Can we just pass a vSID via the alloc ioctl like this?
> 
> -----------------------------------------------------------
> @@ -429,7 +429,7 @@ struct iommu_hwpt_arm_smmuv3 {
>  #define IOMMU_SMMUV3_FLAG_VMID (1 << 1) /* vmid override */
>         __u64 flags;
>         __u32 s2vmid;
> -       __u32 __reserved;
> +       __u32 sid;
>         __u64 s1ctxptr;
>         __u64 s1cdmax;
>         __u64 s1fmt;
> -----------------------------------------------------------
> 
> An alloc is initiated by an SMMU_CMD_CFGI_STE command that has
> an SID field anyway.
No, a HWPT is not a device or a SID. A HWPT is an ASID in the ARM
model.
dev_id is the SID.
The cfgi_ste will carry the vSID, which is mapped to an iommufd dev_id.
The kernel has to translate the vSID to the dev_id to the pSID to
issue an ATC invalidation for the correct entity.
> > SW path will program the map into an xarray
> 
> I found a tricky thing about SIDs in the SMMU driver when doing
> this experiment: the SMMU kernel driver mostly handles devices
> using struct arm_smmu_master. However, an arm_smmu_master might
> have a num_streams>1, meaning a device can have multiple SIDs.
> Though it seems that PCI devices might not be in this scope, a
> plain xarray might not work for other type of devices in a long
> run, if there'd be?
You'd replicate each of the vSIDs of the extra SIDs in the xarray.
> > > cache_invalidate_user as void, like we are doing now? A fault
> > > injection pathway to report CERROR asynchronously is what we've
> > > been doing though -- even with Eric's previous VFIO solution.
> > 
> > Where is this? How does it look?
> 
> That's postponed with the PRI support, right? My use case does
> not need PRI actually, but a fault injection pathway to guests.
> This pathway should be able to take care of any CERROR (detected
> by a host interrupt) or something funky in cache_invalidate_user
> requests itself?
I would expect that if invalidation can fail that we have a way to
signal that failure back to the guest.
Jason
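
The vSID to pSID translation discussed above could be modelled roughly as follows. This is a userspace sketch only: a small fixed array stands in for the xarray, the dev_id step is collapsed into a direct vSID to pSID pair, and every name (struct vsid_map, vsid_map_add, vsid_map_lookup, VSID_MAP_MAX) is hypothetical. A master with several streams simply gets one entry per vSID, as suggested.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VSID_MAP_MAX	16	/* arbitrary capacity for the sketch */

struct vsid_entry {
	uint32_t vsid;	/* StreamID as seen by the guest */
	uint32_t psid;	/* physical StreamID on the host SMMU */
};

struct vsid_map {
	struct vsid_entry e[VSID_MAP_MAX];
	size_t n;
};

/* Add one vSID -> pSID pair; fails on a duplicate vSID or a full table. */
static bool vsid_map_add(struct vsid_map *m, uint32_t vsid, uint32_t psid)
{
	for (size_t i = 0; i < m->n; i++)
		if (m->e[i].vsid == vsid)
			return false;
	if (m->n == VSID_MAP_MAX)
		return false;
	m->e[m->n].vsid = vsid;
	m->e[m->n].psid = psid;
	m->n++;
	return true;
}

/* Translate a guest StreamID; returns false for an unknown (bogus) vSID,
 * so the caller can reject the command instead of fanning it out. */
static bool vsid_map_lookup(const struct vsid_map *m, uint32_t vsid,
			    uint32_t *psid)
{
	for (size_t i = 0; i < m->n; i++) {
		if (m->e[i].vsid == vsid) {
			*psid = m->e[i].psid;
			return true;
		}
	}
	return false;
}
```

With a lookup like this, an ATC_INV for one vSID targets exactly one physical stream, and a bogus vSID fails the lookup rather than being multiplied across every device in the domain.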
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-20 16:04                 ` Jason Gunthorpe
@ 2023-03-20 16:59                   ` Nicolin Chen
  2023-03-20 18:45                     ` Jason Gunthorpe
  0 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-20 16:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Robin Murphy, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Mon, Mar 20, 2023 at 01:04:35PM -0300, Jason Gunthorpe wrote:
> > > We need an ioctl for this, I think. To load a map of vSID to dev_id
> > > into the driver. Kernel will convert dev_id to pSID. Driver will
> > > program the map into HW.
> > 
> > Can we just pass a vSID via the alloc ioctl like this?
> > 
> > -----------------------------------------------------------
> > @@ -429,7 +429,7 @@ struct iommu_hwpt_arm_smmuv3 {
> >  #define IOMMU_SMMUV3_FLAG_VMID (1 << 1) /* vmid override */
> >         __u64 flags;
> >         __u32 s2vmid;
> > -       __u32 __reserved;
> > +       __u32 sid;
> >         __u64 s1ctxptr;
> >         __u64 s1cdmax;
> >         __u64 s1fmt;
> > -----------------------------------------------------------
> > 
> > An alloc is initiated by an SMMU_CMD_CFGI_STE command that has
> > an SID field anyway.
> 
> No, a HWPT is not a device or a SID. a HWPT is an ASID in the ARM
> model.
> 
> dev_id is the SID.
> 
> The cfgi_ste will carry the vSID which is mapped to a iommufd dev_id.
> 
> The kernel has to translate the vSID to the dev_id to the pSID to
> issue an ATC invalidation for the correct entity.
OK. This narrative makes sense. I think our solution (the entire
stack) here mixes up two pairs of terms: HWPT/ASID and STE/SID.
What QEMU does is trap an SMMU_CMD_CFGI_STE command and send
the host an HWPT alloc ioctl. The former is based on an SID
or a device, while the latter is based on an ASID.
So the correct way should be for QEMU to maintain an ASID-based
list, corresponding to the s1ctxptr from STEs, and only send an
alloc ioctl upon a new s1ctxptr/ASID. Meanwhile, at every trap
of SMMU_CMD_CFGI_STE, it calls a separate ioctl to tie a vSID to
a dev_id (and pSID accordingly).
In other words, an SMMU_CMD_CFGI_STE should do a mandatory SID
ioctl and an optional HWPT alloc ioctl (only allocating a HWPT if
the s1ctxptr in the STE is new).
What could be a good prototype of the ioctl? Would it be a VFIO
device one or an IOMMUFD one?
> > > SW path will program the map into an xarray
> > 
> > I found a tricky thing about SIDs in the SMMU driver when doing
> > this experiment: the SMMU kernel driver mostly handles devices
> > using struct arm_smmu_master. However, an arm_smmu_master might
> > have a num_streams>1, meaning a device can have multiple SIDs.
> > Though it seems that PCI devices might not be in this scope, a
> > plain xarray might not work for other type of devices in a long
> > run, if there'd be?
> 
> You'd replicate each of the vSIDs of the extra SIDs in the xarray.
Noted it down.
> > > > cache_invalidate_user as void, like we are doing now? A fault
> > > > injection pathway to report CERROR asynchronously is what we've
> > > > been doing though -- even with Eric's previous VFIO solution.
> > > 
> > > Where is this? How does it look?
> > 
> > That's postponed with the PRI support, right? My use case does
> > not need PRI actually, but a fault injection pathway to guests.
> > This pathway should be able to take care of any CERROR (detected
> > by a host interrupt) or something funky in cache_invalidate_user
> > requests itself?
> 
> I would expect that if invalidation can fail that we have a way to
> signal that failure back to the guest.
That's plausible to me, and it could apply to a translation
fault too. So, should we add back the iommufd infrastructure
for the fault injection (without PRI), in v2?
Thanks
Nic
^ permalink raw reply	[flat|nested] 165+ messages in thread
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-20 16:59                   ` Nicolin Chen
@ 2023-03-20 18:45                     ` Jason Gunthorpe
  2023-03-20 21:22                       ` Nicolin Chen
  0 siblings, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-20 18:45 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Robin Murphy, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Mon, Mar 20, 2023 at 09:59:45AM -0700, Nicolin Chen wrote:
> On Mon, Mar 20, 2023 at 01:04:35PM -0300, Jason Gunthorpe wrote:
> 
> > > > We need an ioctl for this, I think. To load a map of vSID to dev_id
> > > > into the driver. Kernel will convert dev_id to pSID. Driver will
> > > > program the map into HW.
> > > 
> > > Can we just pass a vSID via the alloc ioctl like this?
> > > 
> > > -----------------------------------------------------------
> > > @@ -429,7 +429,7 @@ struct iommu_hwpt_arm_smmuv3 {
> > >  #define IOMMU_SMMUV3_FLAG_VMID (1 << 1) /* vmid override */
> > >         __u64 flags;
> > >         __u32 s2vmid;
> > > -       __u32 __reserved;
> > > +       __u32 sid;
> > >         __u64 s1ctxptr;
> > >         __u64 s1cdmax;
> > >         __u64 s1fmt;
> > > -----------------------------------------------------------
> > > 
> > > An alloc is initiated by an SMMU_CMD_CFGI_STE command that has
> > > an SID field anyway.
> > 
> > No, a HWPT is not a device or a SID. a HWPT is an ASID in the ARM
> > model.
> > 
> > dev_id is the SID.
> > 
> > The cfgi_ste will carry the vSID which is mapped to a iommufd dev_id.
> > 
> > The kernel has to translate the vSID to the dev_id to the pSID to
> > issue an ATC invalidation for the correct entity.
> 
> OK. This narrative makes sense. I think our solution (the entire
> stack) here mixes these two terms between HWPT/ASID and STE/SID.
HWPT is an "ASID/DID" on Intel and a CD table on SMMUv3.
> What QEMU does is trapping an SMMU_CMD_CFGI_STE command to send
> the host an HWPT alloc ioctl. The former one is based on an SID
> or a device, while the latter one is based on ASID.
> 
> So the correct way should be for QEMU to maintain an ASID-based
> list, corresponding to the s1ctxptr from STEs, and only send an
> alloc ioctl upon a new s1ctxptr/ASID. Meanwhile, at every trap
> of SMMU_CMD_CFGI_STE, it calls a separate ioctl to tie a vSID to
> a dev_id (and pSID accordingly).
It is not the ASID, it is just the s1ctxptr values; de-duplicate them.
Do something about SMMUv3 not being able to interwork iommu_domains
across instances.
> In another word, an SMMU_CMD_CFGI_STE should do a mandatory SID
> ioctl and an optional HWPT alloc ioctl (only allocates a HWPT if
> the s1ctxptr in the STE is new).
No, there is no SID ioctl at the STE stage.
The vSID was decided by qemu before the VM booted. It created it when
it built the vRID and the vPCI device. The vSID is tied to the vfio
device FD.
Somehow the VM knows the relationship between vSID and vPCI/vRID. IIRC
this is passed in through ACPI from qemu.
So the vSID is an alias for the dev_id in iommufd language, and qemu
always has a translation table for it.
So CFGI_STE maps to allocating a de-duplicated HWPT for the CD table,
and then a replace operation on the device FD represented by the vSID
to change the pSTE to point to the HWPT.
The HWPT is effectively the "shadow STE".
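A minimal userspace sketch of the de-duplication step, assuming the VMM
keeps a small table keyed by s1ctxptr. Every name here (hwpt_cache,
hwpt_cache_get, the fake ID counter) is hypothetical and only illustrates
"allocate a HWPT the first time a CD table pointer is seen, reuse it
afterwards"; it is not the iommufd or qemu API.

```c
#include <stddef.h>
#include <stdint.h>

#define HWPT_CACHE_SLOTS 16	/* arbitrary size for the sketch */

struct hwpt_entry {
	uint64_t s1ctxptr;	/* guest CD table pointer read from the vSTE */
	uint32_t hwpt_id;	/* stand-in for the ID an alloc ioctl would return */
	int in_use;
};

struct hwpt_cache {
	struct hwpt_entry slots[HWPT_CACHE_SLOTS];
	uint32_t next_id;	/* fake allocator: IDs start at 1 */
};

/*
 * Look up the HWPT for a CD table pointer and allocate only when the
 * pointer has not been seen before. Returns the HWPT ID, or 0 if the
 * cache is full. *is_new reports whether an alloc would have been issued.
 */
static uint32_t hwpt_cache_get(struct hwpt_cache *cache, uint64_t s1ctxptr,
			       int *is_new)
{
	struct hwpt_entry *free_slot = NULL;
	size_t i;

	*is_new = 0;
	for (i = 0; i < HWPT_CACHE_SLOTS; i++) {
		struct hwpt_entry *e = &cache->slots[i];

		if (e->in_use && e->s1ctxptr == s1ctxptr)
			return e->hwpt_id;	/* same CD table: reuse */
		if (!e->in_use && !free_slot)
			free_slot = e;
	}
	if (!free_slot)
		return 0;

	free_slot->in_use = 1;
	free_slot->s1ctxptr = s1ctxptr;
	free_slot->hwpt_id = ++cache->next_id;
	*is_new = 1;
	return free_slot->hwpt_id;
}
```

On a CFGI_STE trap the VMM would call hwpt_cache_get() with the s1ctxptr
from the vSTE, and then do the replace on the device FD with the returned
ID whether or not it was newly allocated.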
> What could be a good prototype of the ioctl? Would it be a VFIO
> device one or IOMMUFD one?
If we load the vSID table it should be an iommufd one, linked to the
ARM SMMUv3 driver, and it would probably take in a pointer to an array of
vSID/dev_id pairs. Maybe an add/remove type of operation.
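To make the add/remove idea concrete, here is a rough userspace sketch of
such a table. The struct and function names (vsid_pair, vsid_table_add and
so on) are made up for illustration; a real kernel implementation would
more likely sit on an xarray, and no uAPI with this shape exists today.

```c
#include <stddef.h>
#include <stdint.h>

#define VSID_TABLE_MAX 32	/* arbitrary capacity for the sketch */

struct vsid_pair {
	uint32_t vsid;		/* stream ID as seen by the guest */
	uint32_t dev_id;	/* iommufd device ID it aliases */
};

struct vsid_table {
	struct vsid_pair pairs[VSID_TABLE_MAX];
	size_t count;
};

/* Find the dev_id for a vSID. Returns 0 on success, -1 if unknown. */
static int vsid_table_lookup(const struct vsid_table *t, uint32_t vsid,
			     uint32_t *dev_id)
{
	size_t i;

	for (i = 0; i < t->count; i++) {
		if (t->pairs[i].vsid == vsid) {
			*dev_id = t->pairs[i].dev_id;
			return 0;
		}
	}
	return -1;
}

/* Add one pair. A vSID may appear only once; a dev_id may repeat. */
static int vsid_table_add(struct vsid_table *t, uint32_t vsid, uint32_t dev_id)
{
	uint32_t unused;

	if (t->count == VSID_TABLE_MAX || !vsid_table_lookup(t, vsid, &unused))
		return -1;
	t->pairs[t->count].vsid = vsid;
	t->pairs[t->count].dev_id = dev_id;
	t->count++;
	return 0;
}

/* Remove one pair by vSID, filling the hole with the last entry. */
static int vsid_table_remove(struct vsid_table *t, uint32_t vsid)
{
	size_t i;

	for (i = 0; i < t->count; i++) {
		if (t->pairs[i].vsid == vsid) {
			t->pairs[i] = t->pairs[--t->count];
			return 0;
		}
	}
	return -1;
}
```

A device with num_streams > 1 would simply get one entry per vSID, all
pointing at the same dev_id.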
> > I would expect that if invalidation can fail that we have a way to
> > signal that failure back to the guest.
> 
> That's plausible to me, and it could apply to a translation
> fault too. So, should we add back the iommufd infrastructure
> for the fault injection (without PRI), in v2?
It would be nice if things were not so big. I don't think we need to
tackle translation faults at this time, but we should be thinking about
what an invalidation cmd fault converts into.
Jason
^ permalink raw reply	[flat|nested] 165+ messages in thread
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-20 18:45                     ` Jason Gunthorpe
@ 2023-03-20 21:22                       ` Nicolin Chen
  2023-03-20 22:19                         ` Jason Gunthorpe
  0 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-20 21:22 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Robin Murphy, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Mon, Mar 20, 2023 at 03:45:54PM -0300, Jason Gunthorpe wrote:
> On Mon, Mar 20, 2023 at 09:59:45AM -0700, Nicolin Chen wrote:
> > On Mon, Mar 20, 2023 at 01:04:35PM -0300, Jason Gunthorpe wrote:
> > 
> > > > > We need an ioctl for this, I think. To load a map of vSID to dev_id
> > > > > into the driver. Kernel will convert dev_id to pSID. Driver will
> > > > > program the map into HW.
> > > > 
> > > > Can we just pass a vSID via the alloc ioctl like this?
> > > > 
> > > > -----------------------------------------------------------
> > > > @@ -429,7 +429,7 @@ struct iommu_hwpt_arm_smmuv3 {
> > > >  #define IOMMU_SMMUV3_FLAG_VMID (1 << 1) /* vmid override */
> > > >         __u64 flags;
> > > >         __u32 s2vmid;
> > > > -       __u32 __reserved;
> > > > +       __u32 sid;
> > > >         __u64 s1ctxptr;
> > > >         __u64 s1cdmax;
> > > >         __u64 s1fmt;
> > > > -----------------------------------------------------------
> > > > 
> > > > An alloc is initiated by an SMMU_CMD_CFGI_STE command that has
> > > > an SID field anyway.
> > > 
> > > No, a HWPT is not a device or a SID. a HWPT is an ASID in the ARM
> > > model.
> > > 
> > > dev_id is the SID.
> > > 
> > > The cfgi_ste will carry the vSID which is mapped to a iommufd dev_id.
> > > 
> > > The kernel has to translate the vSID to the dev_id to the pSID to
> > > issue an ATC invalidation for the correct entity.
> > 
> > OK. This narrative makes sense. I think our solution (the entire
> > stack) here mixes these two terms between HWPT/ASID and STE/SID.
> 
> HWPT is an "ASID/DID" on Intel and a CD table on SMMUv3
> 
> > What QEMU does is trapping an SMMU_CMD_CFGI_STE command to send
> > the host an HWPT alloc ioctl. The former one is based on an SID
> > or a device, while the latter one is based on ASID.
> > 
> > So the correct way should be for QEMU to maintain an ASID-based
> > list, corresponding to the s1ctxptr from STEs, and only send an
> > alloc ioctl upon a new s1ctxptr/ASID. Meanwhile, at every trap
> > of SMMU_CMD_CFGI_STE, it calls a separate ioctl to tie a vSID to
> > a dev_id (and pSID accordingly).
> 
> It is not ASID, it just s1ctxptr's - de-duplicate them.
SMMU has an "ASID" too, and it's one per CD table. It can also be
seen as one per iommu_domain.
The following are lines from arm_smmu_domain_finalise_s1():
	...
	ret = xa_alloc(&arm_smmu_asid_xa, &asid, &cfg->cd,
		       XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL);
	...
	cfg->cd.asid    = (u16)asid;
	...
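For readers without the driver source at hand, a userspace analogue of
that bounded allocation is sketched below. It only mirrors the range
[1, (1 << asid_bits) - 1]; the bitmap, the names asid_pool/asid_alloc/
asid_free and the fixed 8-bit width are assumptions for the sketch, not
the driver's xarray code.

```c
#include <stdint.h>

#define ASID_BITS 8			/* assumption: 8-bit ASIDs */
#define ASID_MAX  ((1u << ASID_BITS) - 1)

struct asid_pool {
	uint8_t used[(ASID_MAX + 8) / 8];	/* one bit per ID, bit 0 unused */
};

/* Hand out the lowest free ASID in [1, ASID_MAX], or -1 if none is left. */
static int asid_alloc(struct asid_pool *p)
{
	unsigned int id;

	for (id = 1; id <= ASID_MAX; id++) {
		uint8_t bit = (uint8_t)(1u << (id % 8));

		if (!(p->used[id / 8] & bit)) {
			p->used[id / 8] |= bit;
			return (int)id;
		}
	}
	return -1;
}

/* Return an ASID to the pool; out-of-range IDs are ignored. */
static void asid_free(struct asid_pool *p, unsigned int id)
{
	if (id >= 1 && id <= ASID_MAX)
		p->used[id / 8] &= (uint8_t)~(1u << (id % 8));
}
```

ID 0 is never handed out, matching the lower limit of 1 in the XA_LIMIT()
call above.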
> Do something about SMMUv3 not being able to interwork iommu_domains
> across instances
I don't follow this one. Device instances?
> > In another word, an SMMU_CMD_CFGI_STE should do a mandatory SID
> > ioctl and an optional HWPT alloc ioctl (only allocates a HWPT if
> > the s1ctxptr in the STE is new).
> 
> No, there is no SID ioctl at the STE stage.
> 
> The vSID was decided by qemu before the VM booted. It created it when
> it built the vRID and the vPCI device. The vSID is tied to the vfio
> device FD.
> 
> Somehow the VM knows the relationship between vSID and vPCI/vRID. IIRC
> this is passed in through ACPI from qemu.
Yes.
> So vSID is an alais for the dev_id in iommfd language, and quemu
> always has a translation table for it.
I see.
> So CFGI_STE maps to allocating a de-duplicated HWPT for the CD table,
> and then a replace operation on the device FD represented by the vSID
> to change the pSTE to point to the HWPT.
> 
> The HWPT is effectively the "shadow STE".
IIUC, the ioctl that links vSID/dev_id should happen at
the stage when the VM boots, while the HWPT alloc ioctl happens
at CFGI_STE.
> > What could be a good prototype of the ioctl? Would it be a VFIO
> > device one or IOMMUFD one?
> 
> If we load the vSID table it should be a iommufd one, linked to the
> ARM SMMUv3 driver and probably take in a pointer to an array of
> vSID/dev_id pairs. Maybe an add/remove type of operation.
Will try some solution.
> > > I would expect that if invalidation can fail that we have a way to
> > > signal that failure back to the guest.
> > 
> > That's plausible to me, and it could apply to a translation
> > fault too. So, should we add back the iommufd infrastructure
> > for the fault injection (without PRI), in v2?
> 
> It would be nice if things were not so big, I don't think we need to
> tackle translation fault at this time, but we should be thinking about
> what invalidation cmd fault converts into.
Will see if we can add a compact one, or some other solution
for invalidation fault only.
Thanks
Nic
^ permalink raw reply	[flat|nested] 165+ messages in thread
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-20 21:22                       ` Nicolin Chen
@ 2023-03-20 22:19                         ` Jason Gunthorpe
  2023-03-22 20:57                           ` Nicolin Chen
  0 siblings, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-20 22:19 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Robin Murphy, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Mon, Mar 20, 2023 at 02:22:42PM -0700, Nicolin Chen wrote:
> > > What QEMU does is trapping an SMMU_CMD_CFGI_STE command to send
> > > the host an HWPT alloc ioctl. The former one is based on an SID
> > > or a device, while the latter one is based on ASID.
> > > 
> > > So the correct way should be for QEMU to maintain an ASID-based
> > > list, corresponding to the s1ctxptr from STEs, and only send an
> > > alloc ioctl upon a new s1ctxptr/ASID. Meanwhile, at every trap
> > > of SMMU_CMD_CFGI_STE, it calls a separate ioctl to tie a vSID to
> > > a dev_id (and pSID accordingly).
> > 
> > It is not ASID, it just s1ctxptr's - de-duplicate them.
> 
> SMMU has "ASID" too. And it's one per CD table. It can be also
> seen as one per iommu_domain.
Yes and no. The ASID in ARM is per CD entry (CDE), not per CD table. It is
associated with each TTB0/1 pointer and is effectively the handle for
the IOPTEs.
Every iommu_domain that has a TTB0/1 (i.e. represents IOPTEs) should
have an ASID.
The "nested" iommu_domains don't represent IOPTEs and don't have ASIDs.
The nested domains are just "shadow STEs".
> > Do something about SMMUv3 not being able to interwork iommu_domains
> > across instances
> 
> I don't follow this one. Device instances?
There is some code that makes sure each iommu_domain is hooked to only
one smmu driver instance, IIRC.
 
> IIUIC, the ioctl for the link of vSID/dev_id should happen at
> the stage when boot boots, while the HWPT alloc ioctl happens
> at CFGI_STE.
Yes
 
> > > What could be a good prototype of the ioctl? Would it be a VFIO
> > > device one or IOMMUFD one?
> > 
> > If we load the vSID table it should be a iommufd one, linked to the
> > ARM SMMUv3 driver and probably take in a pointer to an array of
> > vSID/dev_id pairs. Maybe an add/remove type of operation.
> 
> Will try some solution.
It is only necessary if you want to do batching.
For non-batching, the SID invalidation should be done differently, with
a device_id input instead. That is a bit tricky to organize as you
want iommufd to get back a 'struct device *' from the ID.
Jason
^ permalink raw reply	[flat|nested] 165+ messages in thread 
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-20 22:19                         ` Jason Gunthorpe
@ 2023-03-22 20:57                           ` Nicolin Chen
  2023-03-23 12:17                             ` Jason Gunthorpe
  0 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-22 20:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Robin Murphy, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Mon, Mar 20, 2023 at 07:19:34PM -0300, Jason Gunthorpe wrote:
> > IIUIC, the ioctl for the link of vSID/dev_id should happen at
> > the stage when boot boots, while the HWPT alloc ioctl happens
> > at CFGI_STE.
> 
> Yes
>  
> > > > What could be a good prototype of the ioctl? Would it be a VFIO
> > > > device one or IOMMUFD one?
> > > 
> > > If we load the vSID table it should be a iommufd one, linked to the
> > > ARM SMMUv3 driver and probably take in a pointer to an array of
> > > vSID/dev_id pairs. Maybe an add/remove type of operation.
> > 
> > Will try some solution.
> 
> It is only necessary if you want to do batching
> 
> For non-batching the SID invalidation should be done differently with
> a device_id input instead. That is a bit tricky to organize as you
> want iommufd to get back a 'struct device *' from the ID.
I am wondering whether we need to have dev_id, i.e. IOMMUFD,
in play with the link of pSID<->vSID, as I am thinking of a
simplified approach by passing the vSID via the hwpt alloc
structure when we allocate an S2 domain.
The arm_smmu_domain_alloc_user() takes this vSID and a dev
pointer, so it can easily tie the vSID to the dev's pSID.
By doing so, we wouldn't need a new ioctl anymore.
Thanks
Nic
^ permalink raw reply	[flat|nested] 165+ messages in thread 
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-22 20:57                           ` Nicolin Chen
@ 2023-03-23 12:17                             ` Jason Gunthorpe
  0 siblings, 0 replies; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-23 12:17 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Robin Murphy, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Wed, Mar 22, 2023 at 01:57:23PM -0700, Nicolin Chen wrote:
> On Mon, Mar 20, 2023 at 07:19:34PM -0300, Jason Gunthorpe wrote:
> 
> > > IIUIC, the ioctl for the link of vSID/dev_id should happen at
> > > the stage when boot boots, while the HWPT alloc ioctl happens
> > > at CFGI_STE.
> > 
> > Yes
> >  
> > > > > What could be a good prototype of the ioctl? Would it be a VFIO
> > > > > device one or IOMMUFD one?
> > > > 
> > > > If we load the vSID table it should be a iommufd one, linked to the
> > > > ARM SMMUv3 driver and probably take in a pointer to an array of
> > > > vSID/dev_id pairs. Maybe an add/remove type of operation.
> > > 
> > > Will try some solution.
> > 
> > It is only necessary if you want to do batching
> > 
> > For non-batching the SID invalidation should be done differently with
> > a device_id input instead. That is a bit tricky to organize as you
> > want iommufd to get back a 'struct device *' from the ID.
> 
> I am wondering whether we need to have dev_id, i.e. IOMMUFD,
> in play with the link of pSID<->vSID, as I am thinking of a
> simplified approach by passing the vSID via the hwpt alloc
> structure when we allocate an S2 domain.
No, that doesn't make sense. The vSID is per-STE, the S2 domain is
fully shared. You can't put SID information in the iommu_domains.
Jason
^ permalink raw reply	[flat|nested] 165+ messages in thread 
- * RE: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-10 16:19         ` Jason Gunthorpe
  2023-03-11 11:56           ` Nicolin Chen
@ 2023-03-17  9:41           ` Tian, Kevin
  2023-03-17 14:24             ` Nicolin Chen
  2023-03-20 12:59             ` Jason Gunthorpe
  1 sibling, 2 replies; 165+ messages in thread
From: Tian, Kevin @ 2023-03-17  9:41 UTC (permalink / raw)
  To: Jason Gunthorpe, Nicolin Chen
  Cc: Robin Murphy, will@kernel.org, eric.auger@redhat.com,
	baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, March 11, 2023 12:20 AM
> 
> What I'm broadly thinking is if we have to make the infrastructure for
> VCMDQ HW accelerated invalidation then it is not a big step to also
> have the kernel SW path use the same infrastructure just with a CPU
> wake up instead of a MMIO poke.
> 
> Ie we have a SW version of VCMDQ to speed up SMMUv3 cases without HW
> support.
> 
I thought about this in the VT-d context. It looks like there are some
difficulties.
The most prominent one is that the head/tail of the VT-d invalidation queue
are in MMIO registers. Handling it in the kernel iommu driver suggests
reading the virtual tail register and updating the virtual head register.
That kind of moves some vIOMMU awareness into the kernel which, iirc, is not
a welcomed model.
vhost doesn't have this problem as its vring structure fully resides in
memory including ring tail/head. As long as kernel vhost driver understands
the structure and can send/receive notification to/from kvm then the
in-kernel acceleration works seamlessly.
Not sure whether SMMU has a similar obstacle to VT-d. But this is my
impression of why vhost-iommu was preferred when talking about such
optimization before.
^ permalink raw reply	[flat|nested] 165+ messages in thread 
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-17  9:41           ` Tian, Kevin
@ 2023-03-17 14:24             ` Nicolin Chen
  2023-03-20 12:59             ` Jason Gunthorpe
  1 sibling, 0 replies; 165+ messages in thread
From: Nicolin Chen @ 2023-03-17 14:24 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Jason Gunthorpe, Robin Murphy, will@kernel.org,
	eric.auger@redhat.com, baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
On Fri, Mar 17, 2023 at 09:41:34AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Saturday, March 11, 2023 12:20 AM
> >
> > What I'm broadly thinking is if we have to make the infrastructure for
> > VCMDQ HW accelerated invalidation then it is not a big step to also
> > have the kernel SW path use the same infrastructure just with a CPU
> > wake up instead of a MMIO poke.
> >
> > Ie we have a SW version of VCMDQ to speed up SMMUv3 cases without HW
> > support.
> >
> 
> I thought about this in VT-d context. Looks there are some difficulties.
> 
> The most prominent one is that head/tail of the VT-d invalidation queue
> are in MMIO registers. Handling it in kernel iommu driver suggests
> reading virtual tail register and updating virtual head register. Kind of
> moving some vIOMMU awareness into the kernel which, iirc, is not
> a welcomed model.
I had a similar question in another email:
"Firstly, the consumer and producer indexes might need
 to be synced between the host and kernel?"
And Jason replied with this:
"No, qemu would handle this. The kernel would just read the command
 entries it is told by qemu to read which qemu has already sorted out."
Maybe there is no need for concern about the head/tail readings?
> vhost doesn't have this problem as its vring structure fully resides in
> memory including ring tail/head. As long as kernel vhost driver understands
> the structure and can send/receive notification to/from kvm then the
> in-kernel acceleration works seamlessly.
> 
> Not sure whether SMMU has similar obstacle as VT-d. But this is my
> impression why vhost-iommu is preferred when talking about such
> optimization before.
SMMU has a similar pair of head/tail pointers for the invalidation
queue (consumer/producer indexes and command queue in SMMU terms).
Thanks
Nic
^ permalink raw reply	[flat|nested] 165+ messages in thread 
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-17  9:41           ` Tian, Kevin
  2023-03-17 14:24             ` Nicolin Chen
@ 2023-03-20 12:59             ` Jason Gunthorpe
  2023-03-20 16:12               ` Nicolin Chen
  1 sibling, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-20 12:59 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Nicolin Chen, Robin Murphy, will@kernel.org,
	eric.auger@redhat.com, baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
On Fri, Mar 17, 2023 at 09:41:34AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Saturday, March 11, 2023 12:20 AM
> > 
> > What I'm broadly thinking is if we have to make the infrastructure for
> > VCMDQ HW accelerated invalidation then it is not a big step to also
> > have the kernel SW path use the same infrastructure just with a CPU
> > wake up instead of a MMIO poke.
> > 
> > Ie we have a SW version of VCMDQ to speed up SMMUv3 cases without HW
> > support.
> > 
> 
> I thought about this in VT-d context. Looks there are some difficulties.
> 
> The most prominent one is that head/tail of the VT-d invalidation queue
> are in MMIO registers. Handling it in kernel iommu driver suggests
> reading virtual tail register and updating virtual head register. Kind of 
> moving some vIOMMU awareness into the kernel which, iirc, is not
> a welcomed model.
qemu would trap the MMIO and generate an IOCTL with the written head
pointer. It isn't as efficient as having the kernel do the trap, but
does give batching.
Jason
^ permalink raw reply	[flat|nested] 165+ messages in thread 
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-20 12:59             ` Jason Gunthorpe
@ 2023-03-20 16:12               ` Nicolin Chen
  2023-03-20 18:00                 ` Jason Gunthorpe
  0 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-20 16:12 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: Robin Murphy, will@kernel.org, eric.auger@redhat.com,
	baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
On Mon, Mar 20, 2023 at 09:59:23AM -0300, Jason Gunthorpe wrote:
> On Fri, Mar 17, 2023 at 09:41:34AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Saturday, March 11, 2023 12:20 AM
> > > 
> > > What I'm broadly thinking is if we have to make the infrastructure for
> > > VCMDQ HW accelerated invalidation then it is not a big step to also
> > > have the kernel SW path use the same infrastructure just with a CPU
> > > wake up instead of a MMIO poke.
> > > 
> > > Ie we have a SW version of VCMDQ to speed up SMMUv3 cases without HW
> > > support.
> > > 
> > 
> > I thought about this in VT-d context. Looks there are some difficulties.
> > 
> > The most prominent one is that head/tail of the VT-d invalidation queue
> > are in MMIO registers. Handling it in kernel iommu driver suggests
> > reading virtual tail register and updating virtual head register. Kind of 
> > moving some vIOMMU awareness into the kernel which, iirc, is not
> > a welcomed model.
> 
> qemu would trap the MMIO and generate an IOCTL with the written head
> pointer. It isn't as efficient as having the kernel do the trap, but
> does give batching.
Rephrasing that to put into a design: the IOCTL would pass a
user pointer to the queue, the size of the queue, then a head
pointer and a tail pointer? Then the kernel reads out all the
commands between the head and the tail and handles all those
invalidation commands only?
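A rough sketch of what the consuming side could look like, assuming
SMMUv3-style consumer/producer values that carry one wrap bit above the
index bits and 16-byte command entries. The names (inv_cmd, drain_cmds,
sum_first_dword) are hypothetical, and a real kernel path would copy the
entries from user memory rather than dereference them directly.

```c
#include <stdint.h>

/* One command entry: two 64-bit words (assumed 16-byte format). */
struct inv_cmd {
	uint64_t data[2];
};

typedef void (*inv_handler)(const struct inv_cmd *cmd, void *ctx);

/*
 * Walk every entry from cons up to (not including) prod and pass it to
 * the handler. cons/prod hold log2size index bits plus one wrap bit.
 * Returns the number of entries handled, or -1 if the pointers are
 * inconsistent (more pending entries than the queue can hold).
 */
static int drain_cmds(const struct inv_cmd *queue, unsigned int log2size,
		      uint32_t cons, uint32_t prod,
		      inv_handler handle, void *ctx)
{
	uint32_t nents = 1u << log2size;
	uint32_t mask = (nents << 1) - 1;	/* index bits + wrap bit */
	uint32_t pending = (prod - cons) & mask;
	uint32_t i;

	if (pending > nents)
		return -1;

	for (i = 0; i < pending; i++)
		handle(&queue[(cons + i) & (nents - 1)], ctx);

	return (int)pending;
}

/* Example handler: accumulate the first word of each command. */
static void sum_first_dword(const struct inv_cmd *cmd, void *ctx)
{
	*(uint64_t *)ctx += cmd->data[0];
}
```

With this shape, qemu keeps owning the guest-visible registers and only
tells the kernel which window of the queue to read, which is the point
Jason made earlier about the indexes.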
Thanks
Nic
^ permalink raw reply	[flat|nested] 165+ messages in thread 
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-20 16:12               ` Nicolin Chen
@ 2023-03-20 18:00                 ` Jason Gunthorpe
  2023-03-21  8:34                   ` Tian, Kevin
  0 siblings, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-20 18:00 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Tian, Kevin, Robin Murphy, will@kernel.org, eric.auger@redhat.com,
	baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
On Mon, Mar 20, 2023 at 09:12:06AM -0700, Nicolin Chen wrote:
> On Mon, Mar 20, 2023 at 09:59:23AM -0300, Jason Gunthorpe wrote:
> > On Fri, Mar 17, 2023 at 09:41:34AM +0000, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > Sent: Saturday, March 11, 2023 12:20 AM
> > > > 
> > > > What I'm broadly thinking is if we have to make the infrastructure for
> > > > VCMDQ HW accelerated invalidation then it is not a big step to also
> > > > have the kernel SW path use the same infrastructure just with a CPU
> > > > wake up instead of a MMIO poke.
> > > > 
> > > > Ie we have a SW version of VCMDQ to speed up SMMUv3 cases without HW
> > > > support.
> > > > 
> > > 
> > > I thought about this in VT-d context. Looks there are some difficulties.
> > > 
> > > The most prominent one is that head/tail of the VT-d invalidation queue
> > > are in MMIO registers. Handling it in kernel iommu driver suggests
> > > reading virtual tail register and updating virtual head register. Kind of 
> > > moving some vIOMMU awareness into the kernel which, iirc, is not
> > > a welcomed model.
> > 
> > qemu would trap the MMIO and generate an IOCTL with the written head
> > pointer. It isn't as efficient as having the kernel do the trap, but
> > does give batching.
> 
> Rephrasing that to put into a design: the IOCTL would pass a
> user pointer to the queue, the size of the queue, then a head
> pointer and a tail pointer? Then the kernel reads out all the
> commands between the head and the tail and handles all those
> invalidation commands only?
Yes, that is one possible design
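To make the rephrased design concrete, here is a minimal userspace C sketch of the consume step: walk a power-of-two ring from the consumer index to the producer index and hand each 16-byte command to a handler. All names (struct cmdq_batch, cmdq_consume, cmd_count) are hypothetical and not part of this series. A kernel version would read the entries with copy_from_user() instead of dereferencing the pointer, and the real SMMUv3 PROD/CONS registers also carry a wrap bit that this sketch leaves out.

```c
#include <stddef.h>
#include <stdint.h>

/* One SMMUv3 command is 16 bytes (two 64-bit words). */
struct cmd {
	uint64_t data[2];
};

/* Hypothetical ioctl payload; none of these names come from the series. */
struct cmdq_batch {
	const struct cmd *queue;	/* base of the guest command queue */
	uint32_t nents;			/* number of entries, a power of two */
	uint32_t cons;			/* first unconsumed entry */
	uint32_t prod;			/* one past the last valid entry */
};

typedef int (*cmd_handler)(const struct cmd *cmd, void *priv);

/*
 * Walk [cons, prod), wrapping at nents, and stop at the first command the
 * handler rejects. Returns the new consumer index. Without a wrap bit,
 * cons == prod means "empty", so at most nents - 1 commands fit in a batch.
 */
uint32_t cmdq_consume(const struct cmdq_batch *b, cmd_handler fn, void *priv)
{
	uint32_t mask = b->nents - 1;
	uint32_t idx = b->cons & mask;
	uint32_t end = b->prod & mask;

	while (idx != end) {
		if (fn(&b->queue[idx], priv))
			break;
		idx = (idx + 1) & mask;
	}
	return idx;
}

/* Example handler: only counts the commands it is shown. */
int cmd_count(const struct cmd *cmd, void *priv)
{
	(void)cmd;
	(*(unsigned int *)priv)++;
	return 0;
}
```

Returning the new consumer index lets the caller report how far the batch got, which is what a VMM would need to update the virtual CONS register after a partial failure.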
Jason
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply	[flat|nested] 165+ messages in thread 
- * RE: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-20 18:00                 ` Jason Gunthorpe
@ 2023-03-21  8:34                   ` Tian, Kevin
  2023-03-21 11:48                     ` Jason Gunthorpe
  0 siblings, 1 reply; 165+ messages in thread
From: Tian, Kevin @ 2023-03-21  8:34 UTC (permalink / raw)
  To: Jason Gunthorpe, Nicolin Chen
  Cc: Robin Murphy, will@kernel.org, eric.auger@redhat.com,
	baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, March 21, 2023 2:01 AM
> 
> On Mon, Mar 20, 2023 at 09:12:06AM -0700, Nicolin Chen wrote:
> > On Mon, Mar 20, 2023 at 09:59:23AM -0300, Jason Gunthorpe wrote:
> > > On Fri, Mar 17, 2023 at 09:41:34AM +0000, Tian, Kevin wrote:
> > > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > > Sent: Saturday, March 11, 2023 12:20 AM
> > > > >
> > > > > What I'm broadly thinking is if we have to make the infrastructure for
> > > > > VCMDQ HW accelerated invalidation then it is not a big step to also
> > > > > have the kernel SW path use the same infrastructure just with a CPU
> > > > > wake up instead of a MMIO poke.
> > > > >
> > > > > > Ie we have a SW version of VCMDQ to speed up SMMUv3 cases without HW
> > > > > > support.
> > > > >
> > > >
> > > > I thought about this in VT-d context. Looks there are some difficulties.
> > > >
> > > > > The most prominent one is that head/tail of the VT-d invalidation queue
> > > > > are in MMIO registers. Handling it in kernel iommu driver suggests
> > > > reading virtual tail register and updating virtual head register. Kind of
> > > > moving some vIOMMU awareness into the kernel which, iirc, is not
> > > > a welcomed model.
> > >
> > > qemu would trap the MMIO and generate an IOCTL with the written head
> > > pointer. It isn't as efficient as having the kernel do the trap, but
> > > does give batching.
> >
> > Rephrasing that to put into a design: the IOCTL would pass a
> > user pointer to the queue, the size of the queue, then a head
> > pointer and a tail pointer? Then the kernel reads out all the
> > commands between the head and the tail and handles all those
> > invalidation commands only?
> 
> Yes, that is one possible design
> 
If we cannot have the short path in the kernel then I'm not sure of the
value of using a native format and queue in the uAPI. Batching can
be enabled over any format.
Btw, probably a dumb question: the current invalidation IOCTL is
per hwpt. If we pick a native format, does that suggest making the IOCTL
per iommufd, given that the native format is per IOMMU and could carry
a scope bigger than a hwpt?
^ permalink raw reply	[flat|nested] 165+ messages in thread 
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-21  8:34                   ` Tian, Kevin
@ 2023-03-21 11:48                     ` Jason Gunthorpe
  2023-03-22  6:42                       ` Nicolin Chen
  2023-03-24  8:47                       ` Tian, Kevin
  0 siblings, 2 replies; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-21 11:48 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Nicolin Chen, Robin Murphy, will@kernel.org,
	eric.auger@redhat.com, baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
On Tue, Mar 21, 2023 at 08:34:00AM +0000, Tian, Kevin wrote:
> > > Rephrasing that to put into a design: the IOCTL would pass a
> > > user pointer to the queue, the size of the queue, then a head
> > > pointer and a tail pointer? Then the kernel reads out all the
> > > commands between the head and the tail and handles all those
> > > invalidation commands only?
> > 
> > Yes, that is one possible design
> 
> If we cannot have the short path in the kernel then I'm not sure the
> value of using native format and queue in the uAPI. Batching can
> be enabled over any format.
SMMUv3 will have a hardware short path where the HW itself runs the
VM's command queue and does this logic.
So I like the symmetry of the SW path being close to that.
> Btw probably a dumb question. The current invalidation IOCTL is
> per hwpt. If picking a native format does it suggest making the IOCTL
> per iommufd given native format is per IOMMU and could carry
> scope bigger than a hwpt.
At least on SMMUv3 it depends on what happens with VMID.
If we can tie the VMID to the iommu_domain then the invalidation has
to flow through the iommu_domain to pick up the VMID.
If the VMID is tied to the entire iommufd_ctx then it can flow
independently.
Jason
^ permalink raw reply	[flat|nested] 165+ messages in thread 
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-21 11:48                     ` Jason Gunthorpe
@ 2023-03-22  6:42                       ` Nicolin Chen
  2023-03-22 12:43                         ` Jason Gunthorpe
  2023-03-24  9:02                         ` Tian, Kevin
  2023-03-24  8:47                       ` Tian, Kevin
  1 sibling, 2 replies; 165+ messages in thread
From: Nicolin Chen @ 2023-03-22  6:42 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: Robin Murphy, will@kernel.org, eric.auger@redhat.com,
	baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
On Tue, Mar 21, 2023 at 08:48:31AM -0300, Jason Gunthorpe wrote:
> On Tue, Mar 21, 2023 at 08:34:00AM +0000, Tian, Kevin wrote:
> 
> > > > Rephrasing that to put into a design: the IOCTL would pass a
> > > > user pointer to the queue, the size of the queue, then a head
> > > > pointer and a tail pointer? Then the kernel reads out all the
> > > > commands between the head and the tail and handles all those
> > > > invalidation commands only?
> > > 
> > > Yes, that is one possible design
> > 
> > If we cannot have the short path in the kernel then I'm not sure the
> > value of using native format and queue in the uAPI. Batching can
> > be enabled over any format.
> 
> SMMUv3 will have a hardware short path where the HW itself runs the
> VM's command queue and does this logic.
> 
> So I like the symmetry of the SW path being close to that.
A tricky thing here that I just realized:
With VCMDQ, the guest will have two CMDQs. One is the vSMMU's
CMDQ handling all non-TLBI commands like CMD_CFGI_STE via the
invalidation IOCTL, and the other hardware accelerated VCMDQ
handling all TLBI commands by the HW. In this setup, we will
need a VCMDQ kernel driver to dispatch commands into the two
different queues.
Yet, it feels a bit different with this SW path exposing the
entire SMMU CMDQ, since now theoretically non-TLBI and TLBI
commands can be interlaced in one batch, so the hypervisor
should go through the queue first to handle and delete all
non-TLBI commands, and then forward the CMDQ to the host to
run the remaining TLBI commands, if there are any.
> > Btw probably a dumb question. The current invalidation IOCTL is
> > per hwpt. If picking a native format does it suggest making the IOCTL
> > per iommufd given native format is per IOMMU and could carry
> > scope bigger than a hwpt.
> 
> At least on SMMUv3 it depends on what happens with VMID.
> 
> If we can tie the VMID to the iommu_domain then the invalidation has
> to flow through the iommu_domain to pick up the VMID.
Yes. This is what we do now. An invalidation handler finds the
corresponding S2 domain pointer to pick up the VMID. And it'd
be safe, until the S2 domain gets replaced with another domain
I think?
> If the VMID is tied to the entire iommufd_ctx then it can flow
> independently.
One more thing about the VMID unification is that the SMMU might
have a limitation on the VMID range:
	smmu->vmid_bits = reg & IDR0_VMID16 ? 16 : 8;
	...
	vmid = arm_smmu_bitmap_alloc(smmu->vmid_map, smmu->vmid_bits);
So, we'd likely need a CAP for that, to apply some limitation
with the iommufd_ctx too?
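As a rough illustration of that range limit, the sketch below is a standalone userspace VMID allocator whose usable range depends on whether 16-bit VMIDs are supported, in the spirit of the vmid_bits / arm_smmu_bitmap_alloc() lines quoted above. The struct and function names are hypothetical. If the VMID were tied to an iommufd_ctx spanning several SMMU instances, the width would presumably have to be the smallest one among them; that is an assumption, not something this series implements.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_VMID_BITS 16

/* Hypothetical pool: one bit per VMID, sized for the 16-bit case. */
struct vmid_pool {
	unsigned int bits;			/* 8 or 16 */
	uint8_t used[(1u << MAX_VMID_BITS) / 8];
};

void vmid_pool_init(struct vmid_pool *p, bool has_vmid16)
{
	memset(p, 0, sizeof(*p));
	p->bits = has_vmid16 ? 16 : 8;
}

/* Return the lowest free VMID, or -1 once the range is exhausted. */
int vmid_alloc(struct vmid_pool *p)
{
	unsigned int limit = 1u << p->bits;

	for (unsigned int id = 0; id < limit; id++) {
		uint8_t bit = (uint8_t)(1u << (id % 8));

		if (!(p->used[id / 8] & bit)) {
			p->used[id / 8] |= bit;
			return (int)id;
		}
	}
	return -1;
}

void vmid_free(struct vmid_pool *p, unsigned int id)
{
	if (id < (1u << p->bits))
		p->used[id / 8] &= (uint8_t)~(1u << (id % 8));
}
```

With 8-bit VMIDs the pool runs dry after 256 allocations, which is the kind of limit a capability bit would have to expose to userspace.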
Thanks
Nic
^ permalink raw reply	[flat|nested] 165+ messages in thread 
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-22  6:42                       ` Nicolin Chen
@ 2023-03-22 12:43                         ` Jason Gunthorpe
  2023-03-22 17:11                           ` Nicolin Chen
  2023-03-24  9:02                         ` Tian, Kevin
  1 sibling, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-22 12:43 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Tian, Kevin, Robin Murphy, will@kernel.org, eric.auger@redhat.com,
	baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
On Tue, Mar 21, 2023 at 11:42:25PM -0700, Nicolin Chen wrote:
> On Tue, Mar 21, 2023 at 08:48:31AM -0300, Jason Gunthorpe wrote:
> > On Tue, Mar 21, 2023 at 08:34:00AM +0000, Tian, Kevin wrote:
> > 
> > > > > Rephrasing that to put into a design: the IOCTL would pass a
> > > > > user pointer to the queue, the size of the queue, then a head
> > > > > pointer and a tail pointer? Then the kernel reads out all the
> > > > > commands between the head and the tail and handles all those
> > > > > invalidation commands only?
> > > > 
> > > > Yes, that is one possible design
> > > 
> > > If we cannot have the short path in the kernel then I'm not sure the
> > > value of using native format and queue in the uAPI. Batching can
> > > be enabled over any format.
> > 
> > SMMUv3 will have a hardware short path where the HW itself runs the
> > VM's command queue and does this logic.
> > 
> > So I like the symmetry of the SW path being close to that.
> 
> A tricky thing here that I just realized:
> 
> With VCMDQ, the guest will have two CMDQs. One is the vSMMU's
> CMDQ handling all non-TLBI commands like CMD_CFGI_STE via the
> invalidation IOCTL, and the other hardware accelerated VCMDQ
> handling all TLBI commands by the HW. In this setup, we will
> need a VCMDQ kernel driver to dispatch commands into the two
> different queues.
You mean a VM kernel driver? Yes, that was always the point: the VM
would use the extra CMDQs only for invalidation.
The main CMDQ would work as today through a trap.
> Yet, it feels a bit different with this SW path exposing the
> entire SMMU CMDQ, since now theoretically non-TLBI and TLBI
> commands can be interlaced in one batch, so the hypervisor
> should go through the queue first to handle and delete all
> non-TLBI commands, and then forward the CMDQ to the host to
> run remaining TLBI commands, if there's any.
Yes, there are a few different ways to handle this and still preserve
batching. It is part of the reason it would be hard to make the kernel
natively parse the commandq
On the other hand, we could add some more native kernel support for a
SW emulated vCMDQ and that might be interesting for performance.
One of the biggest reasons to use nesting is to get to vSVA and
invalidation performance is very important in a vSVA environment. We
should not ignore this in the design.
> > If the VMID is tied to the entire iommufd_ctx then it can flow
> > independently.
> 
> One more thing about the VMID unification is that SMMU might
> have limitation on the VMID range:
> 	smmu->vmid_bits = reg & IDR0_VMID16 ? 16 : 8;
> 	...
> 	vmid = arm_smmu_bitmap_alloc(smmu->vmid_map, smmu->vmid_bits);
> 
> So, we'd likely need a CAP for that, to apply some limitation
> with the iommufd_ctx too?
I'd imagine the driver would have to allocate its internal data
against the iommufd_ctx
I'm not sure how best to organize that if it is the way to go.
Do we have a use case for more than one S2 iommu_domain on ARM?
Jason
^ permalink raw reply	[flat|nested] 165+ messages in thread 
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-22 12:43                         ` Jason Gunthorpe
@ 2023-03-22 17:11                           ` Nicolin Chen
  2023-03-22 17:28                             ` Jason Gunthorpe
  0 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-22 17:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Tian, Kevin, Robin Murphy, will@kernel.org, eric.auger@redhat.com,
	baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
On Wed, Mar 22, 2023 at 09:43:43AM -0300, Jason Gunthorpe wrote:
> On Tue, Mar 21, 2023 at 11:42:25PM -0700, Nicolin Chen wrote:
> > On Tue, Mar 21, 2023 at 08:48:31AM -0300, Jason Gunthorpe wrote:
> > > On Tue, Mar 21, 2023 at 08:34:00AM +0000, Tian, Kevin wrote:
> > > 
> > > > > > Rephrasing that to put into a design: the IOCTL would pass a
> > > > > > user pointer to the queue, the size of the queue, then a head
> > > > > > pointer and a tail pointer? Then the kernel reads out all the
> > > > > > commands between the head and the tail and handles all those
> > > > > > invalidation commands only?
> > > > > 
> > > > > Yes, that is one possible design
> > > > 
> > > > If we cannot have the short path in the kernel then I'm not sure the
> > > > value of using native format and queue in the uAPI. Batching can
> > > > be enabled over any format.
> > > 
> > > SMMUv3 will have a hardware short path where the HW itself runs the
> > > VM's command queue and does this logic.
> > > 
> > > So I like the symmetry of the SW path being close to that.
> > 
> > A tricky thing here that I just realized:
> > 
> > With VCMDQ, the guest will have two CMDQs. One is the vSMMU's
> > CMDQ handling all non-TLBI commands like CMD_CFGI_STE via the
> > invalidation IOCTL, and the other hardware accelerated VCMDQ
> > handling all TLBI commands by the HW. In this setup, we will
> > need a VCMDQ kernel driver to dispatch commands into the two
> > different queues.
> 
> You mean a VM kernel driver? Yes that was always the point, the VM
> would use the extra CMDQ's only for invalidation
Yes, I was saying the guest kernel driver would dispatch the
commands.
> The main CMDQ would work as today through a trap.
Yes.
> > Yet, it feels a bit different with this SW path exposing the
> > entire SMMU CMDQ, since now theoretically non-TLBI and TLBI
> > commands can be interlaced in one batch, so the hypervisor
> > should go through the queue first to handle and delete all
> > non-TLBI commands, and then forward the CMDQ to the host to
> > run remaining TLBI commands, if there's any.
> 
> Yes, there are a few different ways to handle this and still preserve
> batching. It is part of the reason it would be hard to make the kernel
> natively parse the commandq
Yea. I think the way I described above might be the cleanest,
since the host kernel would only handle the leftover TLBI
commands? I am open to other, better ideas, if there are any.
> On the other hand, we could add some more native kernel support for a
> SW emulated vCMDQ and that might be interesting for performance.
That's something I have thought about too. But it would feel
like changing the "hardware" of the VM, right? If the host
kernel enables nesting, then we'd have this extra queue for
TLBI commands. From the driver perspective, it would feel
like detecting an extra feature bit in the HW register, but
there's no such bit in the SMMU HW spec :)
Yet, would you please elaborate on how it impacts performance?
I can only see the benefit of isolation, from having a SW
emulated VCMDQ exclusively for TLBI commands vs. having a
single CMDQ interlacing different commands, because both of
them require trapping and some sort of dispatching.
> One of the biggest reasons to use nesting is to get to vSVA and
> invalidation performance is very important in a vSVA environment. We
> should not ignore this in the design.
> 
> > > If the VMID is tied to the entire iommufd_ctx then it can flow
> > > independently.
> > 
> > One more thing about the VMID unification is that SMMU might
> > have limitation on the VMID range:
> > 	smmu->vmid_bits = reg & IDR0_VMID16 ? 16 : 8;
> > 	...
> > 	vmid = arm_smmu_bitmap_alloc(smmu->vmid_map, smmu->vmid_bits);
> > 
> > So, we'd likely need a CAP for that, to apply some limitation
> > with the iommufd_ctx too?
> 
> I'd imagine the driver would have to allocate its internal data
> against the iommufd_ctx
> 
> I'm not sure how best to organize that if it is the way to go.
> 
> Do we have a use case for more than one S2 iommu_domain on ARM?
In the previous VFIO solution from Eric, a nested iommu_domain
represented an S1+S2 two-stage setup. Since every CMD_CFGI_STE
could trigger an iommu_domain allocation of that, there could
be multiple S2 domains, when we have 2+ passthrough devices.
That's why I had quite a few patches for VMID unification in the
old VCMDQ series.
But now, we have only one S2 domain that works well with multiple
devices. So, I can't really think of a use case that needs two
S2 domains. Yet, I am not very sure.
Btw, just to confirm my understanding, a use case having two
or more iommu_domains means an S2 iommu_domain replacement,
right? I.e. a running S2 iommu_domain gets replaced on the fly
by a different S2 iommu_domain holding a different VMID, while
the IOAS still has the previous mappings? When would that
actually happen in the real world?
Thanks
Nic
^ permalink raw reply	[flat|nested] 165+ messages in thread 
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-22 17:11                           ` Nicolin Chen
@ 2023-03-22 17:28                             ` Jason Gunthorpe
  2023-03-22 19:21                               ` Nicolin Chen
  0 siblings, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-22 17:28 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Tian, Kevin, Robin Murphy, will@kernel.org, eric.auger@redhat.com,
	baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
On Wed, Mar 22, 2023 at 10:11:33AM -0700, Nicolin Chen wrote:
> > Yes, there are a few different ways to handle this and still preserve
> > batching. It is part of the reason it would be hard to make the kernel
> > natively parse the commandq
> 
> Yea. I think the way I described above might be the cleanest,
> since the host kernel would only handle all the leftover TLBI
> commands? I am open for other better idea, if there's any.
It seems best to have userspace take a first pass over the cmdq and
then send what it didn't handle to the kernel
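A minimal C sketch of such a first pass, under stated assumptions: the VMM scans the trapped batch, keeps the TLBI commands in order for forwarding to the kernel, and counts the rest as handled locally. The function names are hypothetical, and the opcode values are believed to match the CMDQ_OP_* constants in the Linux arm-smmu-v3 driver but should be verified against the header before relying on them. Only a small subset of opcodes is listed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One SMMUv3 command is 16 bytes; the opcode sits in bits [7:0]. */
struct cmd {
	uint64_t data[2];
};

/* Assumed opcode values, illustrative subset only. */
enum {
	OP_CFGI_STE	= 0x03,
	OP_CFGI_CD	= 0x05,
	OP_TLBI_NH_ASID	= 0x11,
	OP_TLBI_NH_VA	= 0x12,
	OP_CMD_SYNC	= 0x46,
};

uint8_t cmd_opcode(const struct cmd *c)
{
	return (uint8_t)(c->data[0] & 0xff);
}

bool cmd_is_tlbi(const struct cmd *c)
{
	switch (cmd_opcode(c)) {
	case OP_TLBI_NH_ASID:
	case OP_TLBI_NH_VA:
		return true;
	default:
		return false;
	}
}

/*
 * Copy the TLBI commands from in[] to out[], preserving their order, and
 * count everything else in *local (the VMM would emulate those itself).
 * out[] must have room for n entries. Returns the number of commands
 * left for the kernel.
 */
size_t cmdq_first_pass(const struct cmd *in, size_t n,
		       struct cmd *out, size_t *local)
{
	size_t fwd = 0;

	for (size_t i = 0; i < n; i++) {
		if (cmd_is_tlbi(&in[i]))
			out[fwd++] = in[i];
		else
			(*local)++;
	}
	return fwd;
}
```

Copying into a second array keeps the guest queue untouched, so the VMM can still complete a trailing CMD_SYNC only after the forwarded batch has returned.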
> > On the other hand, we could add some more native kernel support for a
> > SW emulated vCMDQ and that might be interesting for performance.
> 
> That's something I have thought about too. But it would feel
> like changing the "hardware" of the VM, right? If the host
> kernel enables nesting, then we'd have this extra queue for
> > TLBI commands. From the driver perspective, it would feel
> like detecting an extra feature bit in the HW register, but
> there's no such bit in the SMMU HW spec :)
You'd trigger it the same way vCMDQ triggers. It is basically SW
emulated vCMDQ.
> Yet, would you please elaborate how it impacts performance?
> I can only see the benefit of isolation, from having a SW
> emulated VCMDQ exclusively for TLBI commands v.s. having a
> single CMDQ interlacing different commands, because both of
> them require trapping and some sort of dispatching.
In theory we could make it work like virtio-iommu where the
doorbell ring for the SW emulated vCMDQ is delivered directly to a
kernel thread and chop a bunch of latency out of it.
The issue is latency to complete invalidation as in a vSVA scenario
the virtual process MM will block on IOMMU invalidation whenever it
does any mm_struct maintenance. Ie you slow a vast set of
operations. The less latency the better.
> Btw, just to confirm my understanding, a use case having two
> or more iommu_domains means an S2 iommu_domain replacement,
> right? I.e. a running S2 iommu_domain gets replaced on the fly
> by a different S2 iommu_domain holding a different VMID, while
> the IOAS still has the previous mappings? When would that
> actually happen in the real world?
It doesn't have to be replace - what is needed is that every vPCI
device connected to the same SMMU instance be using the same S2 and
thus the same VM_ID.
IOW every SID must be linked to the same VM_ID or invalidation commands
will not be properly processed.
qemu would have to have multiple SMMU instances according to S2
domains, which is probably true anyhow since we need to know what
physical SMMU instance to deliver the invalidation to anyhow.
Jason
^ permalink raw reply	[flat|nested] 165+ messages in thread 
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-22 17:28                             ` Jason Gunthorpe
@ 2023-03-22 19:21                               ` Nicolin Chen
  2023-03-22 19:41                                 ` Jason Gunthorpe
  0 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-22 19:21 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Tian, Kevin, Robin Murphy, will@kernel.org, eric.auger@redhat.com,
	baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
On Wed, Mar 22, 2023 at 02:28:38PM -0300, Jason Gunthorpe wrote:
> On Wed, Mar 22, 2023 at 10:11:33AM -0700, Nicolin Chen wrote:
> 
> > > Yes, there are a few different ways to handle this and still preserve
> > > batching. It is part of the reason it would be hard to make the kernel
> > > natively parse the commandq
> > 
> > Yea. I think the way I described above might be the cleanest,
> > since the host kernel would only handle all the leftover TLBI
> > commands? I am open for other better idea, if there's any.
> 
> It seems best to have userspace take a first pass over the cmdq and
> then send what it didn't handle to the kernel
Yes. I can go ahead with this approach for v2.
> > > On the other hand, we could add some more native kernel support for a
> > > SW emulated vCMDQ and that might be interesting for performance.
> > 
> > That's something I have thought about too. But it would feel
> > like changing the "hardware" of the VM, right? If the host
> > kernel enables nesting, then we'd have this extra queue for
> > > TLBI commands. From the driver perspective, it would feel
> > like detecting an extra feature bit in the HW register, but
> > there's no such bit in the SMMU HW spec :)
> 
> You'd trigger it the same way vCMDQ triggers. It is basically SW
> emulated vCMDQ.
It still feels like something very big. Off the top of my head,
we'd need a pair of new emulated registers for consumer and
producer indexes, and perhaps some configuration registers
too. How should we put them into the MMIO space? Maybe we could
emulate that via ECMDQ? So, for QEMU, the SMMU device model
always has the ECMDQ feature so we can have this extra MMIO
space for a separate CMDQ.
> > Yet, would you please elaborate how it impacts performance?
> > I can only see the benefit of isolation, from having a SW
> > emulated VCMDQ exclusively for TLBI commands v.s. having a
> > single CMDQ interlacing different commands, because both of
> > them require trapping and some sort of dispatching.
> 
> In theory we could make it work like virtio-iommu where the
> doorbell ring for the SW emulated vCMDQ is delivered directly to a
> kernel thread and chop a bunch of latency out of it.
With a SW emulated VCMDQ, the dispatching is moved to the
guest kernel, v.s. the hypervisor. I still don't see a big
improvement here. Perhaps we should run a benchmark with
some experimental changes.
> The issue is latency to complete invalidation as in a vSVA scenario
> the virtual process MM will block on IOMMU invalidation whenever it
> does any mm_struct maintenance. Ie you slow a vast set of
> operations. The less latency the better.
Yea. If it has a noticeable perf gain, we should do that.
Do you prefer this to happen with this series? I would think
of adding this at a later stage, although I am not sure if
the uAPI would be completely compatible. It seems to me that
we would need a different uAPI, so as to set up a queue at an
earlier stage, and then to ring a bell when QEMU traps any
incoming commands in the emulated VCMDQ.
> > Btw, just to confirm my understanding, a use case having two
> > or more iommu_domains means an S2 iommu_domain replacement,
> > right? I.e. a running S2 iommu_domain gets replaced on the fly
> > by a different S2 iommu_domain holding a different VMID, while
> > the IOAS still has the previous mappings? When would that
> > actually happen in the real world?
> 
> It doesn't have to be replace - what is needed is that every vPCI
> device connected to the same SMMU instance be using the same S2 and
> thus the same VM_ID.
> 
> IOW every SID must be linked to the same VM_ID or invalidation commands
> will not be properly processed.
> 
> qemu would have to have multiple SMMU instances according to S2
> domains, which is probably true anyhow since we need to know what
> physical SMMU instance to deliver the invalidation to anyhow.
I am not 100% following this part. So, you mean that we're
safe if we only have one SMMU instance, because there'd be
only one S2 domain, while multiple S2 domains would happen
if we have multiple SMMU instances?
Can we still use the same S2 domain for multiple instances?
Our approach of setting up a stage-2 mapping in QEMU is to
map the entire guest memory. I don't see a point in having
a separate S2 domain, even if there are multiple instances?
Btw, from a private discussion with Eric, he expressed the
difficulty of adding multiple SMMU instances in QEMU, as it
would complicate the device and ACPI components. For VCMDQ,
we do need a multi-instance environment, because there are
multiple physical pairs of SMMU+VCMDQ, i.e. multiple VCMDQ
MMIO regions being attached/used by different devices. So,
I have been exploring a different approach by creating an
internal multiplication inside VCMDQ...
Thanks
Nic
^ permalink raw reply	[flat|nested] 165+ messages in thread 
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-22 19:21                               ` Nicolin Chen
@ 2023-03-22 19:41                                 ` Jason Gunthorpe
  2023-03-22 20:43                                   ` Nicolin Chen
  0 siblings, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-22 19:41 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Tian, Kevin, Robin Murphy, will@kernel.org, eric.auger@redhat.com,
	baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
On Wed, Mar 22, 2023 at 12:21:27PM -0700, Nicolin Chen wrote:
> Do you prefer this to happen with this series? 
No, I just don't want to exclude doing it someday if people are
interested in optimizing this. As I said in the other thread I'd rather
optimize SMMUv3 emulation than try to use virtio-iommu to make it run
faster.
> the uAPI would be completely compatible. It seems to me that
> we would need a different uAPI, so as to setup a queue in an
> earlier stage, and then to ring a bell when QEMU traps any
> incoming commands in the emulated VCMDQ.
Yes, it would need more uAPI. Let's just make sure there is room and
maybe think a bit about what it would look like.
You should also draft through the HW vCMDQ stuff to ensure it fits
in here nicely.
 
> > > Btw, just to confirm my understanding, a use case having two
> > > or more iommu_domains means an S2 iommu_domain replacement,
> > > right? I.e. a running S2 iommu_domain gets replaced on the fly
> > > by a different S2 iommu_domain holding a different VMID, while
> > > the IOAS still has the previous mappings? When would that
> > > actually happen in the real world?
> > 
> > It doesn't have to be replace - what is needed is that every vPCI
> > device connected to the same SMMU instance be using the same S2 and
> > thus the same VM_ID.
> > 
> > IOW every SID must be linked to the same VM_ID or invalidation commands
> > will not be properly processed.
> > 
> > qemu would have to have multiple SMMU instances according to S2
> > domains, which is probably true anyhow since we need to know what
> > physical SMMU instance to deliver the invalidation to anyhow.
> 
> I am not 100% following this part. So, you mean that we're
> safe if we only have one SMMU instance, because there'd be
> only one S2 domain, while multiple S2 domains would happen
> if we have multiple SMMU instances?
Yes, that would happen today, especially since each smmu has its own
vm_id allocator IIRC
 
> Can we still use the same S2 domain for multiple instances?
I think not today.
At the core, if we share the same S2 domain then it is a problem to
figure out what smmu instance to send the invalidation command to. EG
if the userspace invalidates ASID 1 you'd have to replicate
invalidation to all SMMU instances, even if ASID 1 is used by only a
single SID/STE that has a single SMMU instance backing it.
So I think for ARM we want to reflect the physical SMMU instances into
vSMMU instances and that feels best done by having a unique S2
iommu_domain for each SMMU instance. Then we know that an invalidation
for a SMMU instance is delivered to that S2's singular CMDQ and things
like vCMDQ become possible.
> Our approach of setting up a stage-2 mapping in QEMU is to
> map the entire guest memory. I don't see a point in having
> a separate S2 domain, even if there are multiple instances?
And then this is the drawback, we don't really want to have duplicated
S2 page tables in the system for every stage 2.
Maybe we have made a mistake by allowing the S2 to be an unmanaged
domain. Perhaps we should create the S2 out of an unmanaged domain
like the S1.
Then the rules could be
 - Unmanaged domain can be used with every smmu instance, only one
   copy of the page table. The ASID in the iommu_domain is
   kernel-global
 - S2 domain is a child of a shared unmanaged domain. It can be used
   only with the SMMU it is associated with, it has a per-SMMU VM ID
 - S1 domain is a child of a S2 domain, it can be used only with the
   SMMU its S2 is associated with, just because
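
The three rules can be sketched as a small parent/child model. This is only an illustration under assumed names: none of these structs or helpers exist in the kernel, and `pgtbl` stands in for whatever the real page table handle would be.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct smmu_instance {
	int id;
};

/* Shared parent: owns the only copy of the page table. */
struct unmanaged_domain {
	void *pgtbl;
};

/* Per-SMMU S2: borrows the parent's page table, owns a per-SMMU VMID. */
struct s2_domain {
	struct unmanaged_domain *parent;
	struct smmu_instance *smmu;
	unsigned int vmid;
};

/* S1: child of one S2, so it inherits that S2's SMMU restriction. */
struct s1_domain {
	struct s2_domain *s2;
};

/* Rule 1: the unmanaged domain is usable with every SMMU instance. */
static bool unmanaged_can_attach(const struct unmanaged_domain *d,
				 const struct smmu_instance *smmu)
{
	(void)d;
	(void)smmu;
	return true;
}

/* Rule 2: an S2 is usable only with the SMMU it is associated with. */
static bool s2_can_attach(const struct s2_domain *s2,
			  const struct smmu_instance *smmu)
{
	return s2->smmu == smmu;
}

/* Rule 3: an S1 is usable only with the SMMU of its parent S2. */
static bool s1_can_attach(const struct s1_domain *s1,
			  const struct smmu_instance *smmu)
{
	assert(s1->s2 != NULL);
	return s2_can_attach(s1->s2, smmu);
}
```

Two `s2_domain` objects for two SMMU instances would then carry different VMIDs while pointing at the same parent, so only one page table exists.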
> Btw, from a private discussion with Eric, he expressed the
> difficulty of adding multiple SMMU instances in QEMU, as it
> would complicate the device and ACPI components. 
I'm not surprised by this, but for efficiency we probably have to do
this. Eric am I wrong?
qemu shouldn't have to do it immediately, but the kernel uAPI should
allow for a VMM that is optimized. We shouldn't exclude this by
mis-designing the kernel uAPI. qemu can replicate the invalidations
itself to make an inefficient single vSMMU.
> For VCMDQ, we do need a multi-instance environment, because there
> are multiple physical pairs of SMMU+VCMDQ, i.e. multiple VCMDQ MMIO
> regions being attached/used by different devices. 
Yes. IMHO vCMDQ is the sane design here - invalidation performance is
important, having a kernel-bypass way to do it is ideal. I understand
AMD has a similar kernel-bypass queue approach for their stuff too. I
think everyone will eventually need to do this, especially for CC
applications. Having the hypervisor able to interfere with
invalidation feels like an attack vector.
So we should focus on long term designs that allow kernel-bypass to
work, and I don't see a way to hide multi-instance and still truly
support vCMDQ??
> So, I have been exploring a different approach by creating an
> internal multiplication inside VCMDQ...
How can that work?
You'd have to have the guest VM to know to replicate to different
vCMDQ's? Which isn't the standard SMMU programming model anymore..
Jason
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply	[flat|nested] 165+ messages in thread 
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-22 19:41                                 ` Jason Gunthorpe
@ 2023-03-22 20:43                                   ` Nicolin Chen
  2023-03-23 12:16                                     ` Jason Gunthorpe
  0 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-22 20:43 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Tian, Kevin, Robin Murphy, will@kernel.org, eric.auger@redhat.com,
	baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
On Wed, Mar 22, 2023 at 04:41:32PM -0300, Jason Gunthorpe wrote:
> On Wed, Mar 22, 2023 at 12:21:27PM -0700, Nicolin Chen wrote:
> 
> > Do you prefer this to happen with this series? 
> 
> No, I just don't want to exclude doing it someday if people are
> interested to optimize this. As I said in the other thread I'd rather
> optimize SMMUv3 emulation than try to use virtio-iommu to make it run
> faster.
Got it. I will then just focus on reworking the invalidation
data structure with a list of command queue info.
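
As a rough picture of what such queue info could carry, the sketch below follows the head/tail idea floated earlier in this thread: user space hands over a ring plus a head and a tail, and the consumer reads everything in between. The struct layout and every name in it are assumptions for illustration only, not the uAPI of this series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical queue description; not the layout used by this series. */
struct inv_queue_info {
	const uint64_t *entries; /* ring of raw commands */
	uint32_t nents;          /* number of slots in the ring */
	uint32_t head;           /* first unread slot */
	uint32_t tail;           /* one past the last valid slot */
};

/*
 * Copy the commands between head and tail into out[], wrapping at the
 * end of the ring, and advance head. Returns the number copied.
 */
static uint32_t consume_cmds(struct inv_queue_info *q,
			     uint64_t *out, uint32_t out_len)
{
	uint32_t n = 0;

	assert(q->nents > 0);
	while (q->head != q->tail && n < out_len) {
		out[n++] = q->entries[q->head];
		q->head = (q->head + 1) % q->nents;
	}
	return n;
}
```

A real interface would also have to validate the user pointer and the indices; the sketch leaves that out.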
> > the uAPI would be completely compatible. It seems to me that
> > we would need a different uAPI, so as to setup a queue in an
> > earlier stage, and then to ring a bell when QEMU traps any
> > incoming commands in the emulated VCMDQ.
> 
> Yes, it would need more uAPI. Lets just make sure there is room and
> maybe think a bit about what it would look like.
> 
> You should also draft through the HW vCMDQ stuff to ensure it fits
> in here nicely.
Yes.
  
> > > > Btw, just to confirm my understanding, a use case having two
> > > > or more iommu_domains means an S2 iommu_domain replacement,
> > > > right? I.e. a running S2 iommu_domain gets replaced on the fly
> > > > by a different S2 iommu_domain holding a different VMID, while
> > > > the IOAS still has the previous mappings? When would that
> > > > actually happen in the real world?
> > > 
> > > It doesn't have to be replace - what is needed is that every vPCI
> > > device connected to the same SMMU instance be using the same S2 and
> > > thus the same VM_ID.
> > > 
> > > IOW every SID must be linked to the same VM_ID or invalidation commands
> > > will not be properly processed.
> > > 
> > > qemu would have to have multiple SMMU instances according to S2
> > > domains, which is probably true anyhow since we need to know what
> > > physical SMMU instance to deliver the invalidation to anyhow.
> > 
> > I am not 100% following this part. So, you mean that we're
> > safe if we only have one SMMU instance, because there'd be
> > only one S2 domain, while multiple S2 domains would happen
> > if we have multiple SMMU instances?
> 
> Yes, that would happen today, especially since each smmu has its own
> vm_id allocator IIRC
>  
> > Can we still use the same S2 domain for multiple instances?
> 
> I think not today.
> 
> At the core, if we share the same S2 domain then it is a problem to
> figure out what smmu instance to send the invalidation command to. EG
> if the userspace invalidates ASID 1 you'd have to replicate
> invalidation to all SMMU instances. Even if ASID 1 is used by only a
> single SID/STE that has a single SMMU instance backing it.
Oh, right. That would be a perf drawback from potentially
unnecessary IOTLB misses, because with a single instance QEMU
has to broadcast that invalidation to all SMMU instances.
> So I think for ARM we want to reflect the physical SMMU instances into
> vSMMU instances and that feels best done by having a unique S2
> iommu_domain for each SMMU instance. Then we know that an invalidation
> for a SMMU instance is delivered to that S2's singular CMDQ and things
> like vCMDQ become possible.
In that environment, do we still need a VMID unification?
 
> > Our approach of setting up a stage-2 mapping in QEMU is to
> > map the entire guest memory. I don't see a point in having
> > a separate S2 domain, even if there are multiple instances?
> 
> And then this is the drawback, we don't really want to have duplicated
> S2 page tables in the system for every stage 2.
> 
> Maybe we have made a mistake by allowing the S2 to be an unmanaged
> domain. Perhaps we should create the S2 out of an unmanaged domain
> like the S1.
> 
> Then the rules could be
>  - Unmanaged domain can be used with every smmu instance, only one
>    copy of the page table. The ASID in the iommu_domain is
>    kernel-global
>  - S2 domain is a child of a shared unmanaged domain. It can be used
>    only with the SMMU it is associated with, it has a per-SMMU VM ID
>  - S1 domain is a child of a S2 domain, it can be used only with the
>    SMMU its S2 is associated with, just because
The actual S2 pagetable has to stay at the unmanaged domain
for IOAS_MAP, while we maintain an s2_cfg data structure in
the shadow S2 domain per SMMU instance that has its own VMID
but a shared S2 page table pointer?
Hmm... Feels very complicated to me. How does that help?
> > Btw, from a private discussion with Eric, he expressed the
> > difficulty of adding multiple SMMU instances in QEMU, as it
> > would complicate the device and ACPI components. 
> 
> I'm not surprised by this, but for efficiency we probably have to do
> this. Eric am I wrong?
> 
> qemu shouldn't have to do it immediately, but the kernel uAPI should
> allow for a VMM that is optimized. We shouldn't exclude this by
> mis-designing the kernel uAPI. qemu can replicate the invalidations
> itself to make an inefficient single vSMMU.
> 
> > For VCMDQ, we do need a multi-instance environment, because there
> > are multiple physical pairs of SMMU+VCMDQ, i.e. multiple VCMDQ MMIO
> > regions being attached/used by different devices. 
> 
> Yes. IMHO vCMDQ is the sane design here - invalidation performance is
> important, having a kernel-bypass way to do it is ideal. I understand
> AMD has a similar kernel-bypass queue approach for their stuff too. I
> think everyone will eventually need to do this, especially for CC
> applications. Having the hypervisor able to interfere with
> invalidation feels like an attack vector.
> 
> So we should focus on long term designs that allow kernel-bypass to
> work, and I don't see a way to hide multi-instance and still truly
> support vCMDQ??
Well, I agree and hope people across the board decide to move
towards the multi-instance direction.
> > So, I have been exploring a different approach by creating an
> > internal multiplication inside VCMDQ...
> 
> How can that work?
> 
> You'd have to have the guest VM to know to replicate to different
> vCMDQ's? Which isn't the standard SMMU programming model anymore..
VCMDQ has multiple VINTFs (Virtual Interfaces) that are supposed
to be used by the host to expose to multiple VMs.
In a multi-SMMU environment, every single SMMU+VCMDQ instance
would have one VINTF only that contains one or more VCMDQs. In
this case, passthrough devices behind different physical SMMU
instances are straightforwardly attached to different vSMMUs.
However, if we can't have multiple vSMMU instances, the guest
VM (its HW) would enable multiple VINTFs corresponding to the
number of physical SMMU/VCMDQ instances, for devices to attach
accordingly. That means I need to figure out a way to pin the
devices onto those VINTFs, by somehow passing their physical
SMMU IDs. The latest progress that I made is to have a bit of
a hack in the DSDT table by inserting a physical SMMU ID to
every single passthrough device node, though I still need to
confirm the legality of doing that...
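
The pinning problem in the single-vSMMU case boils down to a lookup from a device's physical SMMU ID to the VINTF that fronts that SMMU. A minimal sketch, assuming the ID has somehow reached the guest (for instance via the DSDT hack above); `struct vintf` and `vintf_for_device()` are hypothetical names, not part of any driver:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: one VINTF per physical SMMU+VCMDQ pair. */
struct vintf {
	int phys_smmu_id;
};

/*
 * Return the index of the VINTF backing the given physical SMMU,
 * or -1 if the device sits behind an SMMU with no VINTF enabled.
 */
static int vintf_for_device(const struct vintf *vintfs, size_t n,
			    int dev_phys_smmu_id)
{
	assert(vintfs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (vintfs[i].phys_smmu_id == dev_phys_smmu_id)
			return (int)i;
	}
	return -1;
}
```

With one vSMMU per physical SMMU this lookup disappears, since the attachment to a vSMMU already identifies the physical instance.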
Thanks
Nic
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-22 20:43                                   ` Nicolin Chen
@ 2023-03-23 12:16                                     ` Jason Gunthorpe
  2023-03-23 18:13                                       ` Nicolin Chen
  0 siblings, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-23 12:16 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Tian, Kevin, Robin Murphy, will@kernel.org, eric.auger@redhat.com,
	baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
On Wed, Mar 22, 2023 at 01:43:59PM -0700, Nicolin Chen wrote:
> > So I think for ARM we want to reflect the physical SMMU instances into
> > vSMMU instances and that feels best done by having a unique S2
> > iommu_domain for each SMMU instance. Then we know that an invalidation
> > for a SMMU instance is delivered to that S2's singular CMDQ and things
> > like vCMDQ become possible.
> 
> In that environment, do we still need a VMID unification?
If each S2 is per-smmu-instance then the VMID can be local to the SMMU
instance
> > > Our approach of setting up a stage-2 mapping in QEMU is to
> > > map the entire guest memory. I don't see a point in having
> > > a separate S2 domain, even if there are multiple instances?
> > 
> > And then this is the drawback, we don't really want to have duplicated
> > S2 page tables in the system for every stage 2.
> > 
> > Maybe we have made a mistake by allowing the S2 to be an unmanaged
> > domain. Perhaps we should create the S2 out of an unmanaged domain
> > like the S1.
> > 
> > Then the rules could be
> >  - Unmanaged domain can be used with every smmu instance, only one
> >    copy of the page table. The ASID in the iommu_domain is
> >    kernel-global
> >  - S2 domain is a child of a shared unmanaged domain. It can be used
> >    only with the SMMU it is associated with, it has a per-SMMU VM ID
> >  - S1 domain is a child of a S2 domain, it can be used only with the
> >    SMMU its S2 is associated with, just because
> 
> The actual S2 pagetable has to stay at the unmanaged domain
> for IOAS_MAP, while we maintain an s2_cfg data structure in
> the shadow S2 domain per SMMU instance that has its own VMID
> but a shared S2 page table pointer?
Yes
> Hmm... Feels very complicated to me. How does that help?
It de-duplicates the page table across multiple SMMU instances.
> > > So, I have been exploring a different approach by creating an
> > > internal multiplication inside VCMDQ...
> > 
> > How can that work?
> > 
> > You'd have to have the guest VM to know to replicate to different
> > vCMDQ's? Which isn't the standard SMMU programming model anymore..
> 
> VCMDQ has multiple VINTFs (Virtual Interfaces) that are supposed
> to be used by the host to expose to multiple VMs.
> 
> In a multi-SMMU environment, every single SMMU+VCMDQ instance
> would have one VINTF only that contains one or more VCMDQs. In
> this case, passthrough devices behind different physical SMMU
> instances are straightforwardly attached to different vSMMUs.
Yes, this is the obvious simple implementation
> However, if we can't have multiple vSMMU instances, the guest
> VM (its HW) would enable multiple VINTFs corresponding to the
> number of physical SMMU/VCMDQ instances, for devices to attach
> accordingly. That means I need to figure out a way to pin the
> devices onto those VINTFs, by somehow passing their physical
> SMMU IDs. 
And a way to request the correctly bound vCMDQ from the guest as well.
Sounds really messy, I'd think multi-smmu is the much cleaner choice
Jason
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-23 12:16                                     ` Jason Gunthorpe
@ 2023-03-23 18:13                                       ` Nicolin Chen
  2023-03-23 18:27                                         ` Jason Gunthorpe
  0 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-23 18:13 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Tian, Kevin, Robin Murphy, will@kernel.org, eric.auger@redhat.com,
	baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
On Thu, Mar 23, 2023 at 09:16:51AM -0300, Jason Gunthorpe wrote:
> On Wed, Mar 22, 2023 at 01:43:59PM -0700, Nicolin Chen wrote:
> 
> > > So I think for ARM we want to reflect the physical SMMU instances into
> > > vSMMU instances and that feels best done by having a unique S2
> > > iommu_domain for each SMMU instance. Then we know that an invalidation
> > > for a SMMU instance is delivered to that S2's singular CMDQ and things
> > > like vCMDQ become possible.
> > 
> > In that environment, do we still need a VMID unification?
> 
> If each S2 is per-smmu-instance then the VMID can be local to the SMMU
> instance
It sounds like that is related to the multi-SMMU instance too?
Anyway, it's good to think that we have a way out from requiring
this VMID unification.
> > > > Our approach of setting up a stage-2 mapping in QEMU is to
> > > > map the entire guest memory. I don't see a point in having
> > > > a separate S2 domain, even if there are multiple instances?
> > > 
> > > And then this is the drawback, we don't really want to have duplicated
> > > S2 page tables in the system for every stage 2.
> > > 
> > > Maybe we have made a mistake by allowing the S2 to be an unmanaged
> > > domain. Perhaps we should create the S2 out of an unmanaged domain
> > > like the S1.
> > > 
> > > Then the rules could be
> > >  - Unmanaged domain can be used with every smmu instance, only one
> > >    copy of the page table. The ASID in the iommu_domain is
> > >    kernel-global
> > >  - S2 domain is a child of a shared unmanaged domain. It can be used
> > >    only with the SMMU it is associated with, it has a per-SMMU VM ID
> > >  - S1 domain is a child of a S2 domain, it can be used only with the
> > >    SMMU its S2 is associated with, just because
> > 
> > The actual S2 pagetable has to stay at the unmanaged domain
> > for IOAS_MAP, while we maintain an s2_cfg data structure in
> > the shadow S2 domain per SMMU instance that has its own VMID
> > but a shared S2 page table pointer?
> 
> Yes
> 
> > Hmm... Feels very complicated to me. How does that help?
> 
> It de-duplicates the page table across multiple SMMU instances.
Oh. So that the s2_cfg data structures can have a shared S2
IOPT while having different VMIDs. This would be a big rework.
It changes the two-domain design for nesting. Should we do
this at a later stage when supporting multi-SMMU instance or
now? And I am not sure Intel would need this...
> > > > So, I have been exploring a different approach by creating an
> > > > internal multiplication inside VCMDQ...
> > > 
> > > How can that work?
> > > 
> > > You'd have to have the guest VM to know to replicate to different
> > > vCMDQ's? Which isn't the standard SMMU programming model anymore..
> > 
> > VCMDQ has multiple VINTFs (Virtual Interfaces) that are supposed
> > to be used by the host to expose to multiple VMs.
> > 
> > In a multi-SMMU environment, every single SMMU+VCMDQ instance
> > would have one VINTF only that contains one or more VCMDQs. In
> > this case, passthrough devices behind different physical SMMU
> > instances are straightforwardly attached to different vSMMUs.
> 
> Yes, this is the obvious simple implementation
> 
> > However, if we can't have multiple vSMMU instances, the guest
> > VM (its HW) would enable multiple VINTFs corresponding to the
> > number of physical SMMU/VCMDQ instances, for devices to attach
> > accordingly. That means I need to figure out a way to pin the
> > devices onto those VINTFs, by somehow passing their physical
> > SMMU IDs. 
> 
> And a way to request the correctly bound vCMDQ from the guest as well.
> Sounds really messy, I'd think multi-smmu is the much cleaner choice
Yes. I agree, we would need the entire QEMU community to give
consent to change that though.
Thanks!
Nicolin
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-23 18:13                                       ` Nicolin Chen
@ 2023-03-23 18:27                                         ` Jason Gunthorpe
  0 siblings, 0 replies; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-23 18:27 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Tian, Kevin, Robin Murphy, will@kernel.org, eric.auger@redhat.com,
	baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
On Thu, Mar 23, 2023 at 11:13:48AM -0700, Nicolin Chen wrote:
> Oh. So that the s2_cfg data structures can have a shared S2
> IOPT while having different VMIDs. This would be a big rework.
> It changes the two-domain design for nesting. Should we do
> this at a later stage when supporting multi-SMMU instance or
> now? And I am not sure Intel would need this...
If we do nothing right now then the S2 unmanaged iommu_domain will
carry the vm_id and it will be locked to a single SMMU instance.
To support multi-instance HW qemu would have to duplicate the entire
S2 unmanaged domain to get different vm_ids.
This is basically status-quo today because SMMU already doesn't
support sharing the unmanaged iommu_domain between instances.
If we chart a path to using a dedicated S2 domain then qemu side would
have to change to make a normal HWPT to back the S2 and then create a
real S2 as a child.
This implies that the request for S2 has to be in the driver data
today so that the driver knows if it should enable the unmanaged
domain for S2 operation and lock it to an instance.
So long as that is OK we are probably OK to be incremental..
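
A minimal sketch of that incremental behaviour, assuming the S2 request arrives as a flag in the driver data at allocation time. `struct alloc_data`, `struct domain` and both helpers are hypothetical and only model the "lock to the first instance" rule:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct smmu_instance {
	int id;
};

/* Hypothetical driver data passed in when the domain is allocated. */
struct alloc_data {
	bool want_s2;
};

struct domain {
	bool s2;                           /* set up for stage-2 operation */
	const struct smmu_instance *owner; /* instance the domain is locked to */
};

static struct domain domain_alloc(const struct alloc_data *data)
{
	struct domain d = {
		.s2 = data != NULL && data->want_s2,
		.owner = NULL,
	};
	return d;
}

/* The first attach locks the domain; other instances are refused. */
static int domain_attach(struct domain *d, const struct smmu_instance *smmu)
{
	assert(d != NULL && smmu != NULL);
	if (d->owner == NULL)
		d->owner = smmu;
	return d->owner == smmu ? 0 : -EINVAL;
}
```

Under this model a VMM with devices behind two SMMU instances would need two such domains, which is the duplication described above.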
> > And a way to request the correctly bound vCMDQ from the guest as well.
> > Sounds really messy, I'd think multi-smmu is the much cleaner choice
> 
> Yes. I agree, we would need the entire QEMU community to give
> consent to change that though.
I suppose it wasn't consent, it was that someone needs to do the
difficult work.
Jason
- * RE: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-22  6:42                       ` Nicolin Chen
  2023-03-22 12:43                         ` Jason Gunthorpe
@ 2023-03-24  9:02                         ` Tian, Kevin
  2023-03-24 14:57                           ` Jason Gunthorpe
  1 sibling, 1 reply; 165+ messages in thread
From: Tian, Kevin @ 2023-03-24  9:02 UTC (permalink / raw)
  To: Nicolin Chen, Jason Gunthorpe
  Cc: Robin Murphy, will@kernel.org, eric.auger@redhat.com,
	baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, March 22, 2023 2:42 PM
> 
> On Tue, Mar 21, 2023 at 08:48:31AM -0300, Jason Gunthorpe wrote:
> > On Tue, Mar 21, 2023 at 08:34:00AM +0000, Tian, Kevin wrote:
> >
> > > > > Rephrasing that to put into a design: the IOCTL would pass a
> > > > > user pointer to the queue, the size of the queue, then a head
> > > > > pointer and a tail pointer? Then the kernel reads out all the
> > > > > commands between the head and the tail and handles all those
> > > > > invalidation commands only?
> > > >
> > > > Yes, that is one possible design
> > >
> > > If we cannot have the short path in the kernel then I'm not sure the
> > > value of using native format and queue in the uAPI. Batching can
> > > be enabled over any format.
> >
> > SMMUv3 will have a hardware short path where the HW itself runs the
> > VM's command queue and does this logic.
> >
> > So I like the symmetry of the SW path being close to that.
> 
> A tricky thing here that I just realized:
> 
> With VCMDQ, the guest will have two CMDQs. One is the vSMMU's
> CMDQ handling all non-TLBI commands like CMD_CFGI_STE via the
> invalidation IOCTL, and the other hardware accelerated VCMDQ
> handling all TLBI commands by the HW. In this setup, we will
> need a VCMDQ kernel driver to dispatch commands into the two
> different queues.
> 
why doesn't hw generate a vm-exit for unsupported CMDs in VCMDQ
and then let them be emulated by vSMMU? such events should be rare
once map/unmap are being conducted...
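
For reference, the two-queue split quoted above amounts to a per-command routing decision on the guest side. The sketch below uses made-up command classes rather than real SMMUv3 opcodes, and all names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command classes; real opcodes are not modelled. */
enum cmd_class { CMD_CLASS_TLBI, CMD_CLASS_OTHER };

enum queue_id { QUEUE_VCMDQ, QUEUE_VSMMU_CMDQ };

struct dispatch_stats {
	unsigned int vcmdq;      /* handled directly by the HW queue */
	unsigned int vsmmu_cmdq; /* trapped and emulated via the ioctl path */
};

/* TLBI commands go to the HW-accelerated queue, the rest is emulated. */
static enum queue_id pick_queue(enum cmd_class c)
{
	return c == CMD_CLASS_TLBI ? QUEUE_VCMDQ : QUEUE_VSMMU_CMDQ;
}

static void dispatch(const enum cmd_class *cmds, size_t n,
		     struct dispatch_stats *st)
{
	assert(st != NULL);
	for (size_t i = 0; i < n; i++) {
		if (pick_queue(cmds[i]) == QUEUE_VCMDQ)
			st->vcmdq++;
		else
			st->vsmmu_cmdq++;
	}
}
```

The alternative raised here would drop `pick_queue()` from the guest entirely and let the hardware signal commands it cannot handle.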
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-24  9:02                         ` Tian, Kevin
@ 2023-03-24 14:57                           ` Jason Gunthorpe
  2023-03-24 17:35                             ` Nicolin Chen
  0 siblings, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-24 14:57 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Nicolin Chen, Robin Murphy, will@kernel.org,
	eric.auger@redhat.com, baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
On Fri, Mar 24, 2023 at 09:02:34AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Wednesday, March 22, 2023 2:42 PM
> > 
> > On Tue, Mar 21, 2023 at 08:48:31AM -0300, Jason Gunthorpe wrote:
> > > On Tue, Mar 21, 2023 at 08:34:00AM +0000, Tian, Kevin wrote:
> > >
> > > > > > Rephrasing that to put into a design: the IOCTL would pass a
> > > > > > user pointer to the queue, the size of the queue, then a head
> > > > > > pointer and a tail pointer? Then the kernel reads out all the
> > > > > > commands between the head and the tail and handles all those
> > > > > > invalidation commands only?
> > > > >
> > > > > Yes, that is one possible design
> > > >
> > > > If we cannot have the short path in the kernel then I'm not sure the
> > > > value of using native format and queue in the uAPI. Batching can
> > > > be enabled over any format.
> > >
> > > SMMUv3 will have a hardware short path where the HW itself runs the
> > > VM's command queue and does this logic.
> > >
> > > So I like the symmetry of the SW path being close to that.
> > 
> > A tricky thing here that I just realized:
> > 
> > With VCMDQ, the guest will have two CMDQs. One is the vSMMU's
> > CMDQ handling all non-TLBI commands like CMD_CFGI_STE via the
> > invalidation IOCTL, and the other hardware accelerated VCMDQ
> > handling all TLBI commands by the HW. In this setup, we will
> > need a VCMDQ kernel driver to dispatch commands into the two
> > different queues.
> > 
> 
> why doesn't hw generate a vm-exit for unsupported CMDs in VCMDQ
> and then let them be emulated by vSMMU? such events should be rare
> once map/unmap are being conducted...
IIRC vcmdq is defined to only process invalidations, so it would be a
driver error to send anything else. I think this is what Nicolin
means.  Most likely to use it the VM would have to see the nvidia acpi
extension and activate vcmdq in the VM.
If you suggest to overlay the main cmdq with the vcmdq and then don't
tell the guest about it.. Robin suggested something similar.
This idea would be a half and half, the HW would run the queue and the
doorbell and generate error interrupts back to the hypervisor and tell
it that the queue is paused and ask it to fix the failed entry and
restart.
I could see this as an interesting solution, but I don't know if this
HW can support it..
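
The half-and-half flow can be sketched as a tiny state machine: the HW consumes invalidations on its own, pauses on the first command it cannot run, and the hypervisor emulates that entry and restarts the queue. Whether the real hardware can do this is exactly the open question; the names and the two command kinds below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical command kinds; real SMMUv3 opcodes are not modelled. */
enum cmd_kind { CMD_TLBI, CMD_CFGI_STE };

struct hw_queue {
	const enum cmd_kind *cmds;
	size_t prod;  /* one past the last queued command */
	size_t cons;  /* next command the HW will execute */
	bool paused;  /* set when the HW stops on an unsupported command */
};

/* HW side: execute invalidations, pause on anything else. */
static void hw_run(struct hw_queue *q)
{
	while (!q->paused && q->cons < q->prod) {
		if (q->cmds[q->cons] != CMD_TLBI) {
			q->paused = true; /* would raise an error interrupt */
			return;
		}
		q->cons++;
	}
}

/* Hypervisor side: emulate the failed entry, step past it, restart. */
static void hyp_fix_and_restart(struct hw_queue *q, unsigned int *emulated)
{
	assert(q->paused);
	(*emulated)++;
	q->cons++;
	q->paused = false;
	hw_run(q);
}
```

For a queue holding TLBI, CFGI_STE, TLBI, the HW pauses at index 1, the hypervisor emulates one command, and the HW then drains the rest.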
Jason
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-24 14:57                           ` Jason Gunthorpe
@ 2023-03-24 17:35                             ` Nicolin Chen
  2023-03-28  3:03                               ` Tian, Kevin
  0 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-24 17:35 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: Robin Murphy, will@kernel.org, eric.auger@redhat.com,
	baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
On Fri, Mar 24, 2023 at 11:57:09AM -0300, Jason Gunthorpe wrote:
> On Fri, Mar 24, 2023 at 09:02:34AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Wednesday, March 22, 2023 2:42 PM
> > > 
> > > On Tue, Mar 21, 2023 at 08:48:31AM -0300, Jason Gunthorpe wrote:
> > > > On Tue, Mar 21, 2023 at 08:34:00AM +0000, Tian, Kevin wrote:
> > > >
> > > > > > > Rephrasing that to put into a design: the IOCTL would pass a
> > > > > > > user pointer to the queue, the size of the queue, then a head
> > > > > > > pointer and a tail pointer? Then the kernel reads out all the
> > > > > > > commands between the head and the tail and handles all those
> > > > > > > invalidation commands only?
> > > > > >
> > > > > > Yes, that is one possible design
> > > > >
> > > > > If we cannot have the short path in the kernel then I'm not sure the
> > > > > value of using native format and queue in the uAPI. Batching can
> > > > > be enabled over any format.
> > > >
> > > > SMMUv3 will have a hardware short path where the HW itself runs the
> > > > VM's command queue and does this logic.
> > > >
> > > > So I like the symmetry of the SW path being close to that.
> > > 
> > > A tricky thing here that I just realized:
> > > 
> > > With VCMDQ, the guest will have two CMDQs. One is the vSMMU's
> > > CMDQ handling all non-TLBI commands like CMD_CFGI_STE via the
> > > invalidation IOCTL, and the other hardware accelerated VCMDQ
> > > handling all TLBI commands by the HW. In this setup, we will
> > > need a VCMDQ kernel driver to dispatch commands into the two
> > > different queues.
> > > 
> > 
> > why doesn't hw generate a vm-exit for unsupported CMDs in VCMDQ
> > and then let them be emulated by vSMMU? such events should be rare
> > once map/unmap are being conducted...
> 
> IIRC vcmdq is defined to only process invalidations, so it would be a
> driver error to send anything else. I think this is what Nicolin
> means.  Most likely to use it the VM would have to see the nvidia acpi
> extension and activate vcmdq in the VM.
> 
> If you suggest to overlay the main cmdq with the vcmdq and then don't
> tell the guest about it.. Robin suggested something similar.
Yea, I remember that too, from the email that I received from
Robin on Christmas Eve :)
Yet, I haven't had a chance to run any experiments with that.
> This idea would be a half and half, the HW would run the queue and the
> doorbell and generate error interrupts back to the hypervisor and tell
> it that the queue is paused and ask it to fix the failed entry and
> restart.
>
> I could see this as an interesting solution, but I don't know if this
> HW can support it..
It possibly can, since an unsupported command will trigger an
Illegal Command interrupt, then the IRQ handler could read it
out of the CMDQ. Again, I'd need to run some experiments, once
this SMMU nesting series has settled down to a certain level.
One immediate thing about this solution is that we still need
multi-CMDQ support per SMMU instance, besides multi-SMMU
instance support. This might be implemented as the ECMDQ
I guess. But I am not sure if there is any ECMDQ HW available,
so that we can add its support first, to fit VCMDQ into it.
Overall, interesting topics! I'd like to carry on along the
way of this series, hoping we can figure out something smart
and solid to implement :)
Thanks
Nicolin
- * RE: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-24 17:35                             ` Nicolin Chen
@ 2023-03-28  3:03                               ` Tian, Kevin
  0 siblings, 0 replies; 165+ messages in thread
From: Tian, Kevin @ 2023-03-28  3:03 UTC (permalink / raw)
  To: Nicolin Chen, Jason Gunthorpe
  Cc: Robin Murphy, will@kernel.org, eric.auger@redhat.com,
	baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Saturday, March 25, 2023 1:35 AM
> 
> On Fri, Mar 24, 2023 at 11:57:09AM -0300, Jason Gunthorpe wrote:
> >
> > If you suggest to overlay the main cmdq with the vcmdq and then don't
> > tell the guest about it.. Robin suggested something similar.
yes, that's my point.
> 
> Yea, I remember that too, from the email that I received from
> Robin on Christmas Eve :)
> 
> Yet, I haven't got a chance to run some experiment with that.
> 
> > This idea would be a half and half, the HW would run the queue and the
> > doorbell and generate error interrupts back to the hypervisor and tell
> > it that the queue is paused and ask it to fix the failed entry and
> > restart.
> >
> > I could see this as an interesting solution, but I don't know if this
> > HW can support it..
> 
> It possibly can, since an unsupported command will trigger an
> Illegal Command interrupt, then the IRQ handler could read it
> out of the CMDQ. Again, I'd need to run some experiment, once
> this SMMU nesting series is settled down to certain level.
> 
Also, you want to ensure that the error is of a recoverable type, so
that once sw fixes it the hw can continue to behave correctly.
- * RE: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-21 11:48                     ` Jason Gunthorpe
  2023-03-22  6:42                       ` Nicolin Chen
@ 2023-03-24  8:47                       ` Tian, Kevin
  2023-03-24 14:44                         ` Jason Gunthorpe
  1 sibling, 1 reply; 165+ messages in thread
From: Tian, Kevin @ 2023-03-24  8:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nicolin Chen, Robin Murphy, will@kernel.org,
	eric.auger@redhat.com, baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, March 21, 2023 7:49 PM
> 
> On Tue, Mar 21, 2023 at 08:34:00AM +0000, Tian, Kevin wrote:
> 
> > > > Rephrasing that to put into a design: the IOCTL would pass a
> > > > user pointer to the queue, the size of the queue, then a head
> > > > pointer and a tail pointer? Then the kernel reads out all the
> > > > commands between the head and the tail and handles all those
> > > > invalidation commands only?
> > >
> > > Yes, that is one possible design
> >
> > If we cannot have the short path in the kernel then I'm not sure the
> > value of using native format and queue in the uAPI. Batching can
> > be enabled over any format.
> 
> SMMUv3 will have a hardware short path where the HW itself runs the
> VM's command queue and does this logic.
> 
> So I like the symmetry of the SW path being close to that.
> 
Out of curiosity: VCMDQ is per SMMU. Does it imply that Qemu needs
to create multiple vSMMU instances if the devices assigned to it are behind
different physical SMMUs (plus one instance specifically for emulated
devices), to match a VCMDQ with a specific device?
btw is VCMDQ in the standard SMMU spec or an NVIDIA-specific extension?
If the latter, does it require extra changes in the guest smmu driver?
The symmetry of the SW path has another merit beyond performance.
It allows live migration to fall back to the sw short-path with not-so-bad
overhead when the dest machine cannot afford the same number of
VCMDQs as the src.
But still the main open question for the in-kernel short-path is what the
framework would be to move part of the vIOMMU emulation into the kernel.
If this can be done cleanly then it's better than vhost-iommu, which lags
behind significantly regarding advanced features. But if it cannot
be done cleanly, leaving each vendor to move random emulation logic
into the kernel, then vhost-iommu sounds more friendly to the kernel,
though lots of work remains to fill the feature gap.
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-24  8:47                       ` Tian, Kevin
@ 2023-03-24 14:44                         ` Jason Gunthorpe
  2023-03-28  2:48                           ` Tian, Kevin
  0 siblings, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-24 14:44 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Nicolin Chen, Robin Murphy, will@kernel.org,
	eric.auger@redhat.com, baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
On Fri, Mar 24, 2023 at 08:47:20AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, March 21, 2023 7:49 PM
> > 
> > On Tue, Mar 21, 2023 at 08:34:00AM +0000, Tian, Kevin wrote:
> > 
> > > > > Rephrasing that to put into a design: the IOCTL would pass a
> > > > > user pointer to the queue, the size of the queue, then a head
> > > > > pointer and a tail pointer? Then the kernel reads out all the
> > > > > commands between the head and the tail and handles all those
> > > > > invalidation commands only?
> > > >
> > > > Yes, that is one possible design
> > >
> > > If we cannot have the short path in the kernel then I'm not sure the
> > > value of using native format and queue in the uAPI. Batching can
> > > be enabled over any format.
> > 
> > SMMUv3 will have a hardware short path where the HW itself runs the
> > VM's command queue and does this logic.
> > 
> > So I like the symmetry of the SW path being close to that.
> > 
> 
> Out of curiosity. VCMDQ is per SMMU. Does it imply that Qemu needs
> to create multiple vSMMU instances if devices assigned to it are behind
> different physical SMMUs (plus one instance specific for emulated
> devices), to match VCMDQ with a specific device?
Yes
> btw is VCMDQ in standard SMMU spec or a NVIDIA specific extension?
> If the latter does it require extra changes in guest smmu driver?
It is a mash up of ARM standard ECMDQ with a few additions. I hope ARM
will standardize something someday
> The symmetry of the SW path has another merit beyond performance.
> It allows live migration falling back to the sw short-path with not-so-bad
> overhead when the dest machine cannot afford the same number of
> VCMDQ's as the src.
Well, that requires SW emulation of the VCMDQ thing, but yes
 
> But still the main open for in-kernel short-path is what would be the
> framework to move part of vIOMMU emulation into the kernel. If this
> can be done cleanly then it's better than vhost-iommu which lags
> behind significantly regarding advanced features. But if it cannot
> be done cleanly leaving each vendor move random emulation logic
> into the kernel then vhost-iommu sounds more friendly to the kernel
>  though lots of work remains to fill the feature gap.
I assume there are reasonable ways to hook the kernel to kvm, vhost
does it. I've never looked at it. At worst we need to factor some of
the vhost code into some library to allow it.
We want a kernel thread to wake up on a doorbell ring basically.
Jason
- * RE: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-24 14:44                         ` Jason Gunthorpe
@ 2023-03-28  2:48                           ` Tian, Kevin
  2023-03-28 12:26                             ` Jason Gunthorpe
  0 siblings, 1 reply; 165+ messages in thread
From: Tian, Kevin @ 2023-03-28  2:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nicolin Chen, Robin Murphy, will@kernel.org,
	eric.auger@redhat.com, baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, March 24, 2023 10:45 PM
> 
> > But still the main open for in-kernel short-path is what would be the
> > framework to move part of vIOMMU emulation into the kernel. If this
> > can be done cleanly then it's better than vhost-iommu which lags
> > behind significantly regarding advanced features. But if it cannot
> > be done cleanly leaving each vendor move random emulation logic
> > into the kernel then vhost-iommu sounds more friendly to the kernel
> >  though lots of work remains to fill the feature gap.
> 
> I assume there are reasonable ways to hook the kernel to kvm, vhost
> does it. I've never looked at it. At worst we need to factor some of
> the vhost code into some library to allow it.
> 
> We want a kernel thread to wakeup on a doorbell ring basically.
> 
kvm supports ioeventfd for the doorbell purpose.
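As a rough illustration of the doorbell semantics only, here is a minimal
user-space sketch built on a plain eventfd(2). With KVM the fd would
additionally be bound to a guest MMIO address through the KVM_IOEVENTFD
ioctl, which is not shown here; the demo_* names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Create the doorbell fd. Non-blocking so that draining never stalls. */
static int demo_doorbell_create(void)
{
	return eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
}

/* Stand-in for the guest store that KVM would turn into an eventfd signal. */
static int demo_doorbell_ring(int efd)
{
	uint64_t one = 1;

	assert(efd >= 0);
	return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Consumer side: returns how many rings were coalesced since the last
 * drain, or 0 if nothing is pending. Reading resets the counter.
 */
static uint64_t demo_doorbell_drain(int efd)
{
	uint64_t count = 0;

	if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return 0;
	return count;
}
```

Note that several rings collapse into one wakeup, so the consumer has to
re-read the ring indices rather than assume one command per signal.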
Aside from that, I'm not sure which part of vhost can be generalized
for use by other vIOMMUs. vhost is an in-memory ring structure plus a
doorbell, so it's easy to fit in the kernel.
But emulated vIOMMUs typically use an MMIO-based ring structure,
which requires that 1) kvm provides a synchronous ioeventfd for
MMIO-based head/tail emulation; 2) the userspace vIOMMU shares its virtual
register page with the kernel, which can then update the virtual tail/head
registers w/o exiting to userspace; 3) the kernel thread can
selectively exit to userspace for cmds which it cannot directly handle.
Those require a new framework to be established.
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-28  2:48                           ` Tian, Kevin
@ 2023-03-28 12:26                             ` Jason Gunthorpe
  2023-03-31  8:09                               ` Tian, Kevin
  0 siblings, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-28 12:26 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Nicolin Chen, Robin Murphy, will@kernel.org,
	eric.auger@redhat.com, baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
On Tue, Mar 28, 2023 at 02:48:31AM +0000, Tian, Kevin wrote:
> But emulated vIOMMUs are typically MMIO-based ring structure
> which requires 1) kvm provides a synchronous ioeventfd for MMIO
> based head/tail emulation; 2) userspace vIOMMU shares its virtual
> register page with the kernel which can then update virtual tail/head
> registers w/o exiting to the userspace; 3) the kernel thread can
> selectively exit to userspace for cmds which it cannot directly handle.
What is needed is for the kvm side to capture the store, execute it to
some backing memory, and also trigger the eventfd.
It shouldn't need to be synchronous.
For SMMU the interface is laid out with unique 4k pages per-CMDQ that
contain the 3 relevant 8 byte values.
So we could mmap a page from the kernel that has the 3 values. qemu
would install the page in the kvm memory map and it would
arrange things so that stores reach the 8 bytes and trigger an
eventfd.
Kernel simply reads the cons index after the eventfd, looks in the
IOAS to get the queue memory and does the operation async.
It is not especially conceptually difficult..
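A minimal user-space sketch of that flow, under stated assumptions: the
per-CMDQ page holds three 64-bit values (base, prod, cons), commands are
16 bytes, and the indices are free-running rather than using the SMMUv3
wrap bit. All demo_* names are hypothetical, not anything in the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE  4096u
#define DEMO_CMD_DWORDS 2 /* one command = 2 x 64-bit words (16 bytes) */

/* Hypothetical layout: one 4k page per command queue, three 8-byte values. */
struct demo_cmdq_page {
	uint64_t base; /* guest address of the queue memory (unused here) */
	uint64_t prod; /* advanced by the guest when it rings the doorbell */
	uint64_t cons; /* advanced by the consumer as commands complete */
	uint8_t pad[DEMO_PAGE_SIZE - 3 * sizeof(uint64_t)];
};

_Static_assert(sizeof(struct demo_cmdq_page) == DEMO_PAGE_SIZE,
	       "one page per CMDQ");

typedef int (*demo_cmd_fn)(const uint64_t cmd[DEMO_CMD_DWORDS], void *ctx);

/*
 * Consume every command in [cons, prod). nents must be a power of two.
 * Stops at the first command the handler rejects, leaving cons pointing
 * at it. Returns the number of commands handled.
 */
static size_t demo_consume(struct demo_cmdq_page *pg, const uint64_t *queue,
			   size_t nents, demo_cmd_fn handle, void *ctx)
{
	size_t done = 0;

	assert(nents && (nents & (nents - 1)) == 0);
	while (pg->cons != pg->prod) {
		size_t slot = (size_t)(pg->cons & (nents - 1));

		if (handle(&queue[slot * DEMO_CMD_DWORDS], ctx))
			break;
		pg->cons++;
		done++;
	}
	return done;
}

/* Example handler: count commands, reject opcode 0xff as unsupported. */
static int demo_count_handler(const uint64_t cmd[DEMO_CMD_DWORDS], void *ctx)
{
	if ((cmd[0] & 0xff) == 0xff)
		return -1;
	(*(size_t *)ctx)++;
	return 0;
}
```

Stopping at the rejected entry with cons left in place is one way to hand
the failing command back to userspace; other policies are possible.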
Jason
- * RE: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-28 12:26                             ` Jason Gunthorpe
@ 2023-03-31  8:09                               ` Tian, Kevin
  0 siblings, 0 replies; 165+ messages in thread
From: Tian, Kevin @ 2023-03-31  8:09 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nicolin Chen, Robin Murphy, will@kernel.org,
	eric.auger@redhat.com, baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, March 28, 2023 8:27 PM
> 
> On Tue, Mar 28, 2023 at 02:48:31AM +0000, Tian, Kevin wrote:
> 
> > But emulated vIOMMUs are typically MMIO-based ring structure
> > which requires 1) kvm provides a synchronous ioeventfd for MMIO
> > based head/tail emulation; 2) userspace vIOMMU shares its virtual
> > register page with the kernel which can then update virtual tail/head
> > registers w/o exiting to the userspace; 3) the kernel thread can
> > selectively exit to userspace for cmds which it cannot directly handle.
> 
> What is needed is for the kvm side to capture the store, execute it to
> some backing memory, and also trigger the eventfd.
> 
> It shouldn't need to be synchronous.
Correct
> 
> For SMMU the interface is laid out with unique 4k pages per-CMDQ that
> contain the 3 relevant 8 byte values.
VT-d has only one invalidation queue, with the relevant registers mixed
in with other VT-d registers in the same 4k page. But this should be fine
as long as the new mechanism allows specifying which offsets in the mapped
page fall into the fast path.
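To make the "which offsets" idea concrete, a small sketch of how such a
description could be checked. The window structure and helper are
hypothetical, purely to illustrate the shape of the interface:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u

/* Hypothetical: one window of the mapped register page on the fast path. */
struct demo_fastpath_window {
	uint32_t offset; /* byte offset within the 4k page */
	uint32_t len;    /* window length in bytes */
};

/*
 * True if a store of len bytes at offset lies entirely inside one of the
 * registered windows; anything else would take the slow path to userspace.
 */
static bool demo_store_is_fastpath(const struct demo_fastpath_window *win,
				   size_t nwin, uint32_t offset, uint32_t len)
{
	size_t i;

	assert(len > 0);
	if (offset >= DEMO_PAGE_SIZE || len > DEMO_PAGE_SIZE - offset)
		return false;
	for (i = 0; i < nwin; i++) {
		if (offset >= win[i].offset && len <= win[i].len &&
		    offset - win[i].offset <= win[i].len - len)
			return true;
	}
	return false;
}
```

Stores that straddle a window boundary are treated as slow path here, which
keeps the fast path limited to exactly the registers that were opted in.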
> 
> So we could mmap a page from the kernel that has the 3 values. qemu
> would install the page in the kvm memory map and it would
> arrange things so that stores reach the 8 bytes and trigger an
> eventfd.
> 
> Kernel simply reads the cons index after the eventfd, looks in the
> IOAS to get the queue memory and does the operation async.
> 
> It is not especially conceptually difficult..
> 
Looks so, at least in concept.
btw regarding the initial nesting support on smmu, do you want
to follow this unique 4k layout plus the native cmdq format, or just
the latter (i.e. the cmd format is native but head/tail/start is defined
in a sw-customized way)?
If the latter, I wonder whether it's necessary to generalize it so
that the batching format is vendor-agnostic while the specific
cmd/descriptor format is vendor-specific.
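One possible shape for that split, as a sketch only: a generic batch header
carrying an opaque array of fixed-size entries, with a per-vendor callback
that understands the entry format. The structure, field names and helpers
below are hypothetical and not part of any existing uAPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical vendor-agnostic batch header. */
struct demo_inv_batch {
	uint32_t data_type;  /* selects the vendor-specific entry format */
	uint32_t entry_len;  /* size of one entry in bytes */
	uint32_t entry_num;  /* in: entries supplied; out: entries handled */
	const void *entries; /* array of entry_num entries */
};

typedef int (*demo_entry_fn)(const void *entry, uint32_t entry_len);

/* Walk the batch; on failure report how many entries were completed. */
static int demo_run_batch(struct demo_inv_batch *b, demo_entry_fn fn,
			  uint32_t min_len)
{
	uint32_t i;

	assert(b && fn);
	if (b->entry_len < min_len) {
		b->entry_num = 0;
		return -1;
	}
	for (i = 0; i < b->entry_num; i++) {
		const uint8_t *e = (const uint8_t *)b->entries +
				   (size_t)i * b->entry_len;
		int ret = fn(e, b->entry_len);

		if (ret) {
			b->entry_num = i;
			return ret;
		}
	}
	return 0;
}

/* Vendor-specific part: a 16-byte command; opcode 0 is treated as invalid. */
static int demo_smmu_entry(const void *entry, uint32_t entry_len)
{
	uint64_t cmd[2];

	(void)entry_len;
	memcpy(cmd, entry, sizeof(cmd));
	return (cmd[0] & 0xff) ? 0 : -1;
}
```

Writing the completed count back into entry_num lets userspace find the
failing entry without a second, vendor-specific error channel.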
Thanks
Kevin
- * RE: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-09 15:31     ` Jason Gunthorpe
  2023-03-10  4:20       ` Nicolin Chen
@ 2023-03-17  9:24       ` Tian, Kevin
  1 sibling, 0 replies; 165+ messages in thread
From: Tian, Kevin @ 2023-03-17  9:24 UTC (permalink / raw)
  To: Jason Gunthorpe, Robin Murphy
  Cc: Nicolin Chen, will@kernel.org, eric.auger@redhat.com,
	baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, March 9, 2023 11:31 PM
> 
> > Also, perhaps I've overlooked something obvious, but what's the procedure
> > for reflecting illegal commands back to userspace? Some of the things we're
> > silently ignoring here would be expected to raise CERROR_ILL. Same goes for
> > all the other fault events which may occur due to invalid S1 config, come to
> > think of it.
> 
> Perhaps the ioctl should fail and the userspace viommu should inject
> this CERROR_ILL?
> 
> But I'm also wondering if we are making a mistake to not just have the
> kernel driver to expose a SW work queue in its native format and the
> ioctl is only just 'read the queue'. Then it could (asynchronously!)
> push back answers, real or emulated, as well, including all error
> indications.
> 
> I think we got down this synchronous one-ioctl-per-invalidation path
> because that was what the original generic stuff wanted to do. Is it
> what we really want? Kevin what is your perspective?
> 
That's an interesting idea. I think the original synchronous model
also matches how the intel-iommu driver works today. Most of the time
it issues one synchronous invalidation at a time.
Another problem is how to map the invalidation scope in the native
descriptor format to the affected devices.
VT-d allows per-DID invalidation. This needs extra information to map
a vDID to the affected devices in the kernel.
It also allows a global invalidation type which invalidates all vDIDs. This
might be easy, by simply looping over every device bound to the iommufd_ctx.
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-09 14:49   ` Robin Murphy
  2023-03-09 15:31     ` Jason Gunthorpe
@ 2023-03-10  3:51     ` Nicolin Chen
  2023-03-10 17:53       ` Robin Murphy
  2023-03-17  9:47     ` Tian, Kevin
  2 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-10  3:51 UTC (permalink / raw)
  To: Robin Murphy
  Cc: jgg, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Thu, Mar 09, 2023 at 02:49:14PM +0000, Robin Murphy wrote:
> On 2023-03-09 10:53, Nicolin Chen wrote:
> > Add arm_smmu_cache_invalidate_user() function for user space to invalidate
> > TLB entries and Context Descriptors, since either an IO page table entry
> > or a Context Descriptor in the user space is still cached by the hardware.
> > 
> > The input user_data is defined in "struct iommu_hwpt_invalidate_arm_smmuv3"
> > that contains the essential data for corresponding invalidation commands.
> > 
> > Co-developed-by: Eric Auger <eric.auger@redhat.com>
> > Signed-off-by: Eric Auger <eric.auger@redhat.com>
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > ---
> >   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 56 +++++++++++++++++++++
> >   1 file changed, 56 insertions(+)
> > 
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index ac63185ae268..7d73eab5e7f4 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -2880,9 +2880,65 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
> >       arm_smmu_sva_remove_dev_pasid(domain, dev, pasid);
> >   }
> > 
> > +static void arm_smmu_cache_invalidate_user(struct iommu_domain *domain,
> > +                                        void *user_data)
> > +{
> > +     struct iommu_hwpt_invalidate_arm_smmuv3 *inv_info = user_data;
> > +     struct arm_smmu_cmdq_ent cmd = { .opcode = inv_info->opcode };
> > +     struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> > +     struct arm_smmu_device *smmu = smmu_domain->smmu;
> > +     size_t granule_size = inv_info->granule_size;
> > +     unsigned long iova = 0;
> > +     size_t size = 0;
> > +     int ssid = 0;
> > +
> > +     if (!smmu || !smmu_domain->s2 || domain->type != IOMMU_DOMAIN_NESTED)
> > +             return;
> > +
> > +     switch (inv_info->opcode) {
> > +     case CMDQ_OP_CFGI_CD:
> > +     case CMDQ_OP_CFGI_CD_ALL:
> > +             return arm_smmu_sync_cd(smmu_domain, inv_info->ssid, true);
> 
> Since we let the guest choose its own S1Fmt (and S1CDMax, yet not
> S1DSS?), how can we assume leaf = true here?
The s1dss is forwarded in the user_data structure too. So, the
driver should have set that down to the nested STE as well. Will
add this missing pathway.
And you are right that the guest OS can use a 2-level table, so
we should set leaf = false to cover all cases, I think.
> > +     case CMDQ_OP_TLBI_NH_VA:
> > +             cmd.tlbi.asid = inv_info->asid;
> > +             fallthrough;
> > +     case CMDQ_OP_TLBI_NH_VAA:
> > +             if (!granule_size || !(granule_size & smmu->pgsize_bitmap) ||
> 
> Non-range invalidations with TG=0 are perfectly legal, and should not be
> ignored.
I assume that you are talking about the pgsize_bitmap check.
QEMU embeds a !tg case into the granule_size [1]. So it might
not be straightforward to cover that case. Let me see how to
untangle different cases and handle them accordingly.
[1] https://patchew.org/QEMU/20200824094811.15439-1-peter.maydell@linaro.org/20200824094811.15439-9-peter.maydell@linaro.org/
> > +                 granule_size & ~(1ULL << __ffs(granule_size)))
> 
> If that's intended to mean is_power_of_2(), please just use is_power_of_2().
> 
> > +                     return;
> > +
> > +             iova = inv_info->range.start;
> > +             size = inv_info->range.last - inv_info->range.start + 1;
> 
> If the design here is that user_data is so deeply driver-specific and
> special to the point that it can't possibly be passed as a type-checked
> union of the known and publicly-visible UAPI types that it is, wouldn't
> it make sense to just encode the whole thing in the expected format and
> not have to make these kinds of niggling little conversions at both ends?
Hmm, that makes sense to me.
I just tracked back the history of Eric's previous work. There
was a mismatch between guest and host, in that RIL wasn't
supported by the host hardware. Now, the guest can get whatever
information it needs from the host to send only supported commands.
> > +             if (!size)
> > +                     return;
> > +
> > +             cmd.tlbi.vmid = smmu_domain->s2->s2_cfg.vmid;
> > +             cmd.tlbi.leaf = inv_info->flags & IOMMU_SMMUV3_CMDQ_TLBI_VA_LEAF;
> > +             __arm_smmu_tlb_inv_range(&cmd, iova, size, granule_size, smmu_domain);
> > +             break;
> > +     case CMDQ_OP_TLBI_NH_ASID:
> > +             cmd.tlbi.asid = inv_info->asid;
> > +             fallthrough;
> > +     case CMDQ_OP_TLBI_NH_ALL:
> > +             cmd.tlbi.vmid = smmu_domain->s2->s2_cfg.vmid;
> > +             arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
> > +             break;
> > +     case CMDQ_OP_ATC_INV:
> > +             ssid = inv_info->ssid;
> > +             iova = inv_info->range.start;
> > +             size = inv_info->range.last - inv_info->range.start + 1;
> > +             break;
> 
> Can we do any better than multiplying every single ATC_INV command, even
> for random bogus StreamIDs, into multiple commands across every physical
> device? In fact, I'm not entirely confident this isn't problematic, if
> the guest wishes to send invalidations for one device specifically while
> it's put some other device into a state where sending it a command would
> do something bad. At the very least, it's liable to be confusing if the
> guest sends a command for one StreamID but gets an error back for a
> different one.
We'd need here an sid translation from the guest value to the
host value to specify a device, so as not to multiply the cmd
with the device list, if I understand it correctly?
> And if we expect ATS, what about PRI? Per patch #4 you're currently
> offering that to the guest as well.
Oh, I should probably have blocked PRI. PRI and fault injection
will follow after the basic 2-stage translation patches. And I
don't have any hardware that supports PRI to test with.
> 
> > +     default:
> > +             return;
> 
> What about NSNH_ALL? That still needs to invalidate all the S1 context
> that the guest *thinks* it's invalidating.
NSNH_ALL is translated to NH_ALL at the guest level. But maybe
it should have been done here instead.
Thanks
Nic
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-10  3:51     ` Nicolin Chen
@ 2023-03-10 17:53       ` Robin Murphy
  2023-03-10 18:49         ` Jason Gunthorpe
  2023-03-11 12:38         ` Nicolin Chen
  0 siblings, 2 replies; 165+ messages in thread
From: Robin Murphy @ 2023-03-10 17:53 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: jgg, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On 2023-03-10 03:51, Nicolin Chen wrote:
> On Thu, Mar 09, 2023 at 02:49:14PM +0000, Robin Murphy wrote:
>> On 2023-03-09 10:53, Nicolin Chen wrote:
>>> Add arm_smmu_cache_invalidate_user() function for user space to invalidate
>>> TLB entries and Context Descriptors, since either an IO page table entry
>>> or a Context Descriptor in the user space is still cached by the hardware.
>>>
>>> The input user_data is defined in "struct iommu_hwpt_invalidate_arm_smmuv3"
>>> that contains the essential data for corresponding invalidation commands.
>>>
>>> Co-developed-by: Eric Auger <eric.auger@redhat.com>
>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
>>> ---
>>>    drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 56 +++++++++++++++++++++
>>>    1 file changed, 56 insertions(+)
>>>
>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> index ac63185ae268..7d73eab5e7f4 100644
>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> @@ -2880,9 +2880,65 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
>>>        arm_smmu_sva_remove_dev_pasid(domain, dev, pasid);
>>>    }
>>>
>>> +static void arm_smmu_cache_invalidate_user(struct iommu_domain *domain,
>>> +                                        void *user_data)
>>> +{
>>> +     struct iommu_hwpt_invalidate_arm_smmuv3 *inv_info = user_data;
>>> +     struct arm_smmu_cmdq_ent cmd = { .opcode = inv_info->opcode };
>>> +     struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>> +     struct arm_smmu_device *smmu = smmu_domain->smmu;
>>> +     size_t granule_size = inv_info->granule_size;
>>> +     unsigned long iova = 0;
>>> +     size_t size = 0;
>>> +     int ssid = 0;
>>> +
>>> +     if (!smmu || !smmu_domain->s2 || domain->type != IOMMU_DOMAIN_NESTED)
>>> +             return;
>>> +
>>> +     switch (inv_info->opcode) {
>>> +     case CMDQ_OP_CFGI_CD:
>>> +     case CMDQ_OP_CFGI_CD_ALL:
>>> +             return arm_smmu_sync_cd(smmu_domain, inv_info->ssid, true);
>>
>> Since we let the guest choose its own S1Fmt (and S1CDMax, yet not
>> S1DSS?), how can we assume leaf = true here?
> 
> The s1dss is forwarded in the user_data structure too. So, the
> driver should have set that too down to a nested STE. Will add
> this missing pathway.
> 
> And you are right that the guest OS can use a 2-level table, so
> we should set leaf = false to cover all cases, I think.
> 
>>> +     case CMDQ_OP_TLBI_NH_VA:
>>> +             cmd.tlbi.asid = inv_info->asid;
>>> +             fallthrough;
>>> +     case CMDQ_OP_TLBI_NH_VAA:
>>> +             if (!granule_size || !(granule_size & smmu->pgsize_bitmap) ||
>>
>> Non-range invalidations with TG=0 are perfectly legal, and should not be
>> ignored.
> 
> I assume that you are talking about the pgsize_bitmap check.
> 
> QEMU embeds a !tg case into the granule_size [1]. So it might
> not be straightforward to cover that case. Let me see how to
> untangle different cases and handle them accordingly.
Oh, double-checking patch #2, that might be me misunderstanding the 
interface. I hadn't realised that the UAPI was apparently modelled on 
arm_smmu_tlb_inv_range_asid() rather than actual SMMU commands :)
I really think UAPI should reflect the hardware and encode TG and TTL 
directly. Especially since there's technically a flaw in the current 
driver where we assume TTL in cases where it isn't actually known, thus 
may potentially fail to invalidate level 2 block entries when removing a 
level 1 table, since io-pgtable passes the level 3 granule in that case. 
When range invalidation came along, the distinction between "all leaves 
are definitely at the last level" and "use last-level granularity to 
make sure everything at any level is hit" started to matter, but the 
interface never caught up. It hasn't seemed desperately urgent to fix 
(who does 1GB+ unmaps outside of VFIO teardown anyway?), but we must 
definitely not bake the same mistake into user ABI.
Of course, there might then be cases where we need to transform 
non-range commands into range commands for the sake of workarounds, but 
that's our own problem to deal with.
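For illustration, a sketch of what carrying TG and TTL directly could look
like, including the "level unknown" case described above. The arithmetic
follows my reading of the encoding (TG 1/2/3 for 4K/16K/64K, TTL 0 meaning
no level hint); the struct and the demo_* helpers are hypothetical and are
not the proposed UAPI.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-visible fields mirroring the hardware encoding. */
struct demo_tlbi_va {
	uint64_t addr;
	uint8_t tg;  /* 0 = not specified, 1 = 4K, 2 = 16K, 3 = 64K */
	uint8_t ttl; /* 0 = no level hint, 1..3 = level of the leaf entry */
};

/* Integer log2 of a power of two. */
static unsigned int demo_ilog2(uint64_t v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Convert a leaf page shift of 12/14/16 into TG 1/2/3. */
static uint8_t demo_tg(unsigned int leaf_shift)
{
	return (uint8_t)((leaf_shift - 10) / 2);
}

/* Level at which an entry mapping granule bytes sits for this page size. */
static uint8_t demo_ttl(uint64_t granule, unsigned int leaf_shift)
{
	return (uint8_t)(4 - (demo_ilog2(granule) - 3) / (leaf_shift - 3));
}

/*
 * Build the command fields. If the caller does not actually know that all
 * leaves sit at the granule's level, TTL stays 0 so that entries at any
 * level are hit.
 */
static struct demo_tlbi_va demo_make_tlbi(uint64_t addr, uint64_t granule,
					  unsigned int leaf_shift,
					  int level_known)
{
	struct demo_tlbi_va cmd = { .addr = addr };

	assert(granule && (granule & (granule - 1)) == 0);
	cmd.tg = demo_tg(leaf_shift);
	cmd.ttl = level_known ? demo_ttl(granule, leaf_shift) : 0;
	return cmd;
}
```

With 4K pages this gives TTL 3 for a 4K granule, 2 for 2M and 1 for 1G,
while the table-removal case passes level_known = 0 and gets no hint.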
> [1] https://patchew.org/QEMU/20200824094811.15439-1-peter.maydell@linaro.org/20200824094811.15439-9-peter.maydell@linaro.org/
> 
>>> +                 granule_size & ~(1ULL << __ffs(granule_size)))
>>
>> If that's intended to mean is_power_of_2(), please just use is_power_of_2().
>>
>>> +                     return;
>>> +
>>> +             iova = inv_info->range.start;
>>> +             size = inv_info->range.last - inv_info->range.start + 1;
>>
>> If the design here is that user_data is so deeply driver-specific and
>> special to the point that it can't possibly be passed as a type-checked
>> union of the known and publicly-visible UAPI types that it is, wouldn't
>> it make sense to just encode the whole thing in the expected format and
>> not have to make these kinds of niggling little conversions at both ends?
> 
> Hmm, that makes sense to me.
> 
> I just tracked back the history of Eric's previous work. There
> was a mismatch between guest and host that RIL isn't supported
> by the hardware. Now, guest can have whatever information it'd
> need from the host to send supported instructions.
> 
>>> +             if (!size)
>>> +                     return;
>>> +
>>> +             cmd.tlbi.vmid = smmu_domain->s2->s2_cfg.vmid;
>>> +             cmd.tlbi.leaf = inv_info->flags & IOMMU_SMMUV3_CMDQ_TLBI_VA_LEAF;
>>> +             __arm_smmu_tlb_inv_range(&cmd, iova, size, granule_size, smmu_domain);
>>> +             break;
>>> +     case CMDQ_OP_TLBI_NH_ASID:
>>> +             cmd.tlbi.asid = inv_info->asid;
>>> +             fallthrough;
>>> +     case CMDQ_OP_TLBI_NH_ALL:
>>> +             cmd.tlbi.vmid = smmu_domain->s2->s2_cfg.vmid;
>>> +             arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
>>> +             break;
>>> +     case CMDQ_OP_ATC_INV:
>>> +             ssid = inv_info->ssid;
>>> +             iova = inv_info->range.start;
>>> +             size = inv_info->range.last - inv_info->range.start + 1;
>>> +             break;
>>
>> Can we do any better than multiplying every single ATC_INV command, even
>> for random bogus StreamIDs, into multiple commands across every physical
>> device? In fact, I'm not entirely confident this isn't problematic, if
>> the guest wishes to send invalidations for one device specifically while
>> it's put some other device into a state where sending it a command would
>> do something bad. At the very least, it's liable to be confusing if the
>> guest sends a command for one StreamID but gets an error back for a
>> different one.
> 
> We'd need here an sid translation from the guest value to the
> host value to specify a device, so as not to multiply the cmd
> with the device list, if I understand it correctly?
I guess it depends on whether IOMMUFD is aware of the vSID->device 
relationships that the VMM is using. If so, then it should be OK for the 
VMM to pass through the vSID directly, and we can translate and 
sanity-check it internally. Otherwise, the interface might have to 
require the VMM to translate vSID->RID and pass the corresponding host 
RID, which we can then map back to a SID (userspace cannot do the full 
vSID->SID by itself, and even if it could that would probably be more 
awkward to validate).
>> And if we expect ATS, what about PRI? Per patch #4 you're currently
>> offering that to the guest as well.
> 
> Oh, I should have probably blocked PRI. The PRI and the fault
> injection will be followed after the basic 2-stage translation
> patches. And I don't have a supporting hardware to test PRI.
> 
>>
>>> +     default:
>>> +             return;
>>
>> What about NSNH_ALL? That still needs to invalidate all the S1 context
>> that the guest *thinks* it's invalidating.
> 
> NSNH_ALL is translated to NH_ALL at the guest level. But maybe
> it should have been done here instead.
Yes. It seems the worst of both worlds to have an interface which takes 
raw opcodes rather than an enum of supported commands, but still 
requires userspace to know which opcodes are supported and which ones 
don't work as expected even though they are entirely reasonable to use 
in the context of the stage-1-only SMMU being emulated.
Thanks,
Robin.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply	[flat|nested] 165+ messages in thread
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-10 17:53       ` Robin Murphy
@ 2023-03-10 18:49         ` Jason Gunthorpe
  2023-03-11 12:38         ` Nicolin Chen
  1 sibling, 0 replies; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-10 18:49 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Nicolin Chen, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Fri, Mar 10, 2023 at 05:53:46PM +0000, Robin Murphy wrote:
> I guess it depends on whether IOMMUFD is aware of the vSID->device
> relationships that the VMM is using. If so, then it should be OK for the VMM
> to pass through the vSID directly, and we can translate and sanity-check it
> internally. Otherwise, the interface might have to require the VMM to
> translate vSID->RID and pass the corresponding host RID, which we can then
> map back to a SID (userspace cannot do the full vSID->SID by itself, and
> even if it could that would probably be more awkward to validate).
The thing we have in iommufd is the "idevid", i.e. the handle for
the 'struct device', which is also the handle for the physical SID in
the iommu.
The trouble is that there is not such an easy way for the iommu driver
to translate an idevid at this point since it would have to call out
from a built-in kernel driver to the iommufd module :( :( We have to
eventually solve that but I was hoping it wouldn't have to be on the
fast path...
So, having a vSID xarray in the driver that holds the struct device *
is possibly a good thing. Especially if the vCMDQ scheme needs the
same information.
Jason
^ permalink raw reply	[flat|nested] 165+ messages in thread 
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-10 17:53       ` Robin Murphy
  2023-03-10 18:49         ` Jason Gunthorpe
@ 2023-03-11 12:38         ` Nicolin Chen
  2023-03-13 13:07           ` Robin Murphy
  1 sibling, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-11 12:38 UTC (permalink / raw)
  To: Robin Murphy
  Cc: jgg, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Fri, Mar 10, 2023 at 05:53:46PM +0000, Robin Murphy wrote:
> > > > +     case CMDQ_OP_TLBI_NH_VA:
> > > > +             cmd.tlbi.asid = inv_info->asid;
> > > > +             fallthrough;
> > > > +     case CMDQ_OP_TLBI_NH_VAA:
> > > > +             if (!granule_size || !(granule_size & smmu->pgsize_bitmap) ||
> > > 
> > > Non-range invalidations with TG=0 are perfectly legal, and should not be
> > > ignored.
> > 
> > I assume that you are talking about the pgsize_bitmap check.
> > 
> > QEMU embeds a !tg case into the granule_size [1]. So it might
> > not be straightforward to cover that case. Let me see how to
> > untangle different cases and handle them accordingly.
> 
> Oh, double-checking patch #2, that might be me misunderstanding the
> interface. I hadn't realised that the UAPI was apparently modelled on
> arm_smmu_tlb_inv_range_asid() rather than actual SMMU commands :)
Yea. In fact, most of the invalidation info in QEMU was packed
for the previously defined general cache invalidation structure,
and the range invalidation part is still not quite independent.
> I really think UAPI should reflect the hardware and encode TG and TTL
> directly. Especially since there's technically a flaw in the current
> driver where we assume TTL in cases where it isn't actually known, thus
> may potentially fail to invalidate level 2 block entries when removing a
> level 1 table, since io-pgtable passes the level 3 granule in that case.
Do you mean something like hw_info forwarding pgsize_bitmap/tg
to the guest? Or the other direction?
> When range invalidation came along, the distinction between "all leaves
> are definitely at the last level" and "use last-level granularity to
> make sure everything at any level is hit" started to matter, but the
> interface never caught up. It hasn't seemed desperately urgent to fix
> (who does 1GB+ unmaps outside of VFIO teardown anyway?), but we must
> definitely not bake the same mistake into user ABI.
> 
> Of course, there might then be cases where we need to transform
> non-range commands into range commands for the sake of workarounds, but
> that's our own problem to deal with.
Noted it down.
> > > What about NSNH_ALL? That still needs to invalidate all the S1 context
> > > that the guest *thinks* it's invalidating.
> > 
> > NSNH_ALL is translated to NH_ALL at the guest level. But maybe
> > it should have been done here instead.
> 
> Yes. It seems the worst of both worlds to have an interface which takes
> raw opcodes rather than an enum of supported commands, but still
> requires userspace to know which opcodes are supported and which ones
> don't work as expected even though they are entirely reasonable to use
> in the context of the stage-1-only SMMU being emulated.
Maybe a list of supported TLBI commands via the hw_info uAPI?
Thanks
Nic
^ permalink raw reply	[flat|nested] 165+ messages in thread 
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-11 12:38         ` Nicolin Chen
@ 2023-03-13 13:07           ` Robin Murphy
  2023-03-16  0:01             ` Nicolin Chen
  0 siblings, 1 reply; 165+ messages in thread
From: Robin Murphy @ 2023-03-13 13:07 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: jgg, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On 2023-03-11 12:38, Nicolin Chen wrote:
> On Fri, Mar 10, 2023 at 05:53:46PM +0000, Robin Murphy wrote:
> 
>>>>> +     case CMDQ_OP_TLBI_NH_VA:
>>>>> +             cmd.tlbi.asid = inv_info->asid;
>>>>> +             fallthrough;
>>>>> +     case CMDQ_OP_TLBI_NH_VAA:
>>>>> +             if (!granule_size || !(granule_size & smmu->pgsize_bitmap) ||
>>>>
>>>> Non-range invalidations with TG=0 are perfectly legal, and should not be
>>>> ignored.
>>>
>>> I assume that you are talking about the pgsize_bitmap check.
>>>
>>> QEMU embeds a !tg case into the granule_size [1]. So it might
>>> not be straightforward to cover that case. Let me see how to
>>> untangle different cases and handle them accordingly.
>>
>> Oh, double-checking patch #2, that might be me misunderstanding the
>> interface. I hadn't realised that the UAPI was apparently modelled on
>> arm_smmu_tlb_inv_range_asid() rather than actual SMMU commands :)
> 
> Yea. In fact, most of the invalidation info in QEMU was packed
> for the previously defined general cache invalidation structure,
> and the range invalidation part is still not quite independent.
> 
>> I really think UAPI should reflect the hardware and encode TG and TTL
>> directly. Especially since there's technically a flaw in the current
>> driver where we assume TTL in cases where it isn't actually known, thus
>> may potentially fail to invalidate level 2 block entries when removing a
>> level 1 table, since io-pgtable passes the level 3 granule in that case.
> 
> Do you mean something like hw_info forwarding pgsize_bitmap/tg
> to the guest? Or the other direction?
I mean if the interface wants to support range invalidations in a way 
which works correctly, then it should ideally carry both the TG and TTL 
fields from the guest command straight through to the host. If not, then 
at the very least the host must always assume TTL=0, because it cannot 
correctly infer otherwise once the guest command's original intent has 
been lost.
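As an illustration of what "carrying TG and TTL straight through" could look like, here is a minimal sketch that decodes the address-related fields from a raw 16-byte TLBI_NH_VA command. The bit positions are assumptions mirrored from the CMDQ_TLBI_* masks in the Linux arm-smmu-v3 driver header and should be checked against the SMMUv3 specification; `tlbi_va_fields` and `tlbi_va_decode` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract bits [hi:lo] of a 64-bit word. */
static uint64_t bits(uint64_t v, unsigned hi, unsigned lo)
{
	assert(hi >= lo && hi < 64);
	unsigned width = hi - lo + 1;
	uint64_t mask = width == 64 ? ~0ULL : ((1ULL << width) - 1);
	return (v >> lo) & mask;
}

struct tlbi_va_fields {		/* hypothetical container for the sketch */
	uint8_t opcode;		/* cmd[0] bits [7:0] */
	uint8_t num;		/* cmd[0] bits [16:12], range only */
	uint8_t scale;		/* cmd[0] bits [24:20], range only */
	uint16_t vmid;		/* cmd[0] bits [47:32] */
	uint16_t asid;		/* cmd[0] bits [63:48] */
	bool leaf;		/* cmd[1] bit 0 */
	uint8_t ttl;		/* cmd[1] bits [9:8], 0 = level unknown */
	uint8_t tg;		/* cmd[1] bits [11:10], 0 = non-range */
	uint64_t addr;		/* cmd[1] bits [63:12], kept in place */
};

/* Decode without interpreting: the guest's TG/TTL survive untouched. */
static struct tlbi_va_fields tlbi_va_decode(const uint64_t cmd[2])
{
	struct tlbi_va_fields f;

	f.opcode = (uint8_t)bits(cmd[0], 7, 0);
	f.num    = (uint8_t)bits(cmd[0], 16, 12);
	f.scale  = (uint8_t)bits(cmd[0], 24, 20);
	f.vmid   = (uint16_t)bits(cmd[0], 47, 32);
	f.asid   = (uint16_t)bits(cmd[0], 63, 48);
	f.leaf   = bits(cmd[1], 0, 0) != 0;
	f.ttl    = (uint8_t)bits(cmd[1], 9, 8);
	f.tg     = (uint8_t)bits(cmd[1], 11, 10);
	f.addr   = cmd[1] & ~0xfffULL;
	return f;
}
```

The point of the sketch is that nothing is inferred: if the guest said TTL=0, the host sees TTL=0 and does not have to guess a level from a granule size.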
>> When range invalidation came along, the distinction between "all leaves
>> are definitely at the last level" and "use last-level granularity to
>> make sure everything at any level is hit" started to matter, but the
>> interface never caught up. It hasn't seemed desperately urgent to fix
>> (who does 1GB+ unmaps outside of VFIO teardown anyway?), but we must
>> definitely not bake the same mistake into user ABI.
>>
>> Of course, there might then be cases where we need to transform
>> non-range commands into range commands for the sake of workarounds, but
>> that's our own problem to deal with.
> 
> Noted it down.
> 
>>>> What about NSNH_ALL? That still needs to invalidate all the S1 context
>>>> that the guest *thinks* it's invalidating.
>>>
>>> NSNH_ALL is translated to NH_ALL at the guest level. But maybe
>>> it should have been done here instead.
>>
>> Yes. It seems the worst of both worlds to have an interface which takes
>> raw opcodes rather than an enum of supported commands, but still
>> requires userspace to know which opcodes are supported and which ones
>> don't work as expected even though they are entirely reasonable to use
>> in the context of the stage-1-only SMMU being emulated.
> 
> Maybe a list of supported TLBI commands via the hw_info uAPI?
I don't think it's all that difficult to implicitly support all commands 
that are valid for a stage-1-only SMMU, it just needs the right 
interface design to be capable of encoding them all completely and 
unambiguously. Coming back to the previous point about the address 
encoding, I think that means basing it more directly on the actual 
SMMUv3 commands, rather than on io-pgtable's abstraction of invalidation 
with SMMUv3 opcodes bolted on.
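One way to picture "implicitly support all commands valid for a stage-1-only SMMU" is a single classification step on the raw opcode, so userspace never needs to know which opcodes the host treats specially. The sketch below is only an assumption about how that might look: the opcode values are taken from the SMMUv3 command encodings as I understand them, the enum and function names are hypothetical, and which opcodes belong in which bucket is a design decision this thread has not settled.

```c
#include <stdint.h>

enum {				/* assumed SMMUv3 opcode values */
	OP_CFGI_CD	  = 0x05,
	OP_CFGI_CD_ALL	  = 0x06,
	OP_TLBI_NH_ALL	  = 0x10,
	OP_TLBI_NH_ASID	  = 0x11,
	OP_TLBI_NH_VA	  = 0x12,
	OP_TLBI_NH_VAA	  = 0x13,
	OP_TLBI_NSNH_ALL  = 0x30,
	OP_ATC_INV	  = 0x40,
	OP_CMD_SYNC	  = 0x46,
};

enum guest_cmd_action {		/* hypothetical result of classification */
	GUEST_CMD_FORWARD,	/* issue as-is after fixing up VMID/SID */
	GUEST_CMD_REWRITE,	/* valid for the guest, needs a host substitute */
	GUEST_CMD_LOCAL,	/* handled by the emulation, not sent to hardware */
	GUEST_CMD_REJECT,	/* not meaningful for a stage-1-only vSMMU */
};

static enum guest_cmd_action classify_guest_cmd(uint64_t cmd0)
{
	switch (cmd0 & 0xff) {
	case OP_TLBI_NH_ALL:
	case OP_TLBI_NH_ASID:
	case OP_TLBI_NH_VA:
	case OP_TLBI_NH_VAA:
	case OP_CFGI_CD:
	case OP_CFGI_CD_ALL:
	case OP_ATC_INV:
		return GUEST_CMD_FORWARD;
	case OP_TLBI_NSNH_ALL:
		/* The guest owns only stage 1, so NH_ALL under its VMID suffices. */
		return GUEST_CMD_REWRITE;
	case OP_CMD_SYNC:
		return GUEST_CMD_LOCAL;
	default:
		return GUEST_CMD_REJECT;
	}
}
```

Keeping the NSNH_ALL substitution on the host side is what removes the need for userspace to know that NSNH_ALL "doesn't work as expected".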
Thanks,
Robin.
^ permalink raw reply	[flat|nested] 165+ messages in thread 
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-13 13:07           ` Robin Murphy
@ 2023-03-16  0:01             ` Nicolin Chen
  2023-03-16 14:58               ` Robin Murphy
  0 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-16  0:01 UTC (permalink / raw)
  To: Robin Murphy
  Cc: jgg, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Mon, Mar 13, 2023 at 01:07:42PM +0000, Robin Murphy wrote:
> On 2023-03-11 12:38, Nicolin Chen wrote:
> > On Fri, Mar 10, 2023 at 05:53:46PM +0000, Robin Murphy wrote:
> > 
> > > > > > +     case CMDQ_OP_TLBI_NH_VA:
> > > > > > +             cmd.tlbi.asid = inv_info->asid;
> > > > > > +             fallthrough;
> > > > > > +     case CMDQ_OP_TLBI_NH_VAA:
> > > > > > +             if (!granule_size || !(granule_size & smmu->pgsize_bitmap) ||
> > > > > 
> > > > > Non-range invalidations with TG=0 are perfectly legal, and should not be
> > > > > ignored.
> > > > 
> > > > I assume that you are talking about the pgsize_bitmap check.
> > > > 
> > > > QEMU embeds a !tg case into the granule_size [1]. So it might
> > > > not be straightforward to cover that case. Let me see how to
> > > > untangle different cases and handle them accordingly.
> > > 
> > > Oh, double-checking patch #2, that might be me misunderstanding the
> > > interface. I hadn't realised that the UAPI was apparently modelled on
> > > arm_smmu_tlb_inv_range_asid() rather than actual SMMU commands :)
> > 
> > Yea. In fact, most of the invalidation info in QEMU was packed
> > for the previously defined general cache invalidation structure,
> > and the range invalidation part is still not quite independent.
> > 
> > > I really think UAPI should reflect the hardware and encode TG and TTL
> > > directly. Especially since there's technically a flaw in the current
> > > driver where we assume TTL in cases where it isn't actually known, thus
> > > may potentially fail to invalidate level 2 block entries when removing a
> > > level 1 table, since io-pgtable passes the level 3 granule in that case.
> > 
> > Do you mean something like hw_info forwarding pgsize_bitmap/tg
> > to the guest? Or the other direction?
> 
> I mean if the interface wants to support range invalidations in a way
> which works correctly, then it should ideally carry both the TG and TTL
> fields from the guest command straight through to the host. If not, then
> at the very least the host must always assume TTL=0, because it cannot
> correctly infer otherwise once the guest command's original intent has
> been lost.
Oh, it's about the hypervisor simply forwarding the entire CMD to
the host side. Jason is suggesting a fast approach: letting the
host kernel read the CMDQ directly to get the raw CMD. Perhaps
that would address this comment about TG/TTL too.
I wonder if there could be cases other than a WAR (workaround)
where the TG/TTL fields from the guest aren't supported by the
host. Should the host then handle it with a different CMD?
> > > When range invalidation came along, the distinction between "all leaves
> > > are definitely at the last level" and "use last-level granularity to
> > > make sure everything at any level is hit" started to matter, but the
> > > interface never caught up. It hasn't seemed desperately urgent to fix
> > > (who does 1GB+ unmaps outside of VFIO teardown anyway?), but we must
> > > definitely not bake the same mistake into user ABI.
> > > 
> > > Of course, there might then be cases where we need to transform
> > > non-range commands into range commands for the sake of workarounds, but
> > > that's our own problem to deal with.
> > 
> > Noted it down.
> > 
> > > > > What about NSNH_ALL? That still needs to invalidate all the S1 context
> > > > > that the guest *thinks* it's invalidating.
> > > > 
> > > > NSNH_ALL is translated to NH_ALL at the guest level. But maybe
> > > > it should have been done here instead.
> > > 
> > > Yes. It seems the worst of both worlds to have an interface which takes
> > > raw opcodes rather than an enum of supported commands, but still
> > > requires userspace to know which opcodes are supported and which ones
> > > don't work as expected even though they are entirely reasonable to use
> > > in the context of the stage-1-only SMMU being emulated.
> > 
> > Maybe a list of supported TLBI commands via the hw_info uAPI?
> 
> I don't think it's all that difficult to implicitly support all commands
> that are valid for a stage-1-only SMMU, it just needs the right
> interface design to be capable of encoding them all completely and
> unambiguously. Coming back to the previous point about the address
> encoding, I think that means basing it more directly on the actual
> SMMUv3 commands, rather than on io-pgtable's abstraction of invalidation
> with SMMUv3 opcodes bolted on.
Yea, with the actual commands from the guest, the host can do
something with its supported commands, I think.
Thanks
Nicolin
^ permalink raw reply	[flat|nested] 165+ messages in thread 
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-16  0:01             ` Nicolin Chen
@ 2023-03-16 14:58               ` Robin Murphy
  2023-03-16 21:09                 ` Nicolin Chen
  0 siblings, 1 reply; 165+ messages in thread
From: Robin Murphy @ 2023-03-16 14:58 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: jgg, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On 2023-03-16 00:01, Nicolin Chen wrote:
> On Mon, Mar 13, 2023 at 01:07:42PM +0000, Robin Murphy wrote:
>> On 2023-03-11 12:38, Nicolin Chen wrote:
>>> On Fri, Mar 10, 2023 at 05:53:46PM +0000, Robin Murphy wrote:
>>>
>>>>>>> +     case CMDQ_OP_TLBI_NH_VA:
>>>>>>> +             cmd.tlbi.asid = inv_info->asid;
>>>>>>> +             fallthrough;
>>>>>>> +     case CMDQ_OP_TLBI_NH_VAA:
>>>>>>> +             if (!granule_size || !(granule_size & smmu->pgsize_bitmap) ||
>>>>>>
>>>>>> Non-range invalidations with TG=0 are perfectly legal, and should not be
>>>>>> ignored.
>>>>>
>>>>> I assume that you are talking about the pgsize_bitmap check.
>>>>>
>>>>> QEMU embeds a !tg case into the granule_size [1]. So it might
>>>>> not be straightforward to cover that case. Let me see how to
>>>>> untangle different cases and handle them accordingly.
>>>>
>>>> Oh, double-checking patch #2, that might be me misunderstanding the
>>>> interface. I hadn't realised that the UAPI was apparently modelled on
>>>> arm_smmu_tlb_inv_range_asid() rather than actual SMMU commands :)
>>>
>>> Yea. In fact, most of the invalidation info in QEMU was packed
>>> for the previously defined general cache invalidation structure,
>>> and the range invalidation part is still not quite independent.
>>>
>>>> I really think UAPI should reflect the hardware and encode TG and TTL
>>>> directly. Especially since there's technically a flaw in the current
>>>> driver where we assume TTL in cases where it isn't actually known, thus
>>>> may potentially fail to invalidate level 2 block entries when removing a
>>>> level 1 table, since io-pgtable passes the level 3 granule in that case.
>>>
>>> Do you mean something like hw_info forwarding pgsize_bitmap/tg
>>> to the guest? Or the other direction?
>>
>> I mean if the interface wants to support range invalidations in a way
>> which works correctly, then it should ideally carry both the TG and TTL
>> fields from the guest command straight through to the host. If not, then
>> at the very least the host must always assume TTL=0, because it cannot
>> correctly infer otherwise once the guest command's original intent has
>> been lost.
> 
> Oh, it's about hypervisor simply forwarding the entire CMD to
> the host side. Jason is suggesting a fast approach by letting
> host kernel read the CMDQ directly to get the raw CMD. Perhaps
> that would address this comment about TG/TTL too.
That did cross my mind, but given the usage model, having host userspace 
give guest memory whose contents it can't control (unless it pauses the 
whole VM on all CPUs) directly to the host kernel just seems to invite 
more potential problems than necessary. Commands aren't big, so I think 
it's fair to expect the VMM to marshal them into host memory, and save 
the host kernel from ever having to reason about any races or other 
emulation details which may exist between a VM and its VMM.
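The marshalling argument can be sketched as "snapshot once, then only ever look at the snapshot". The following is a minimal illustration, assuming 16-byte commands made of two 64-bit words; `smmu_cmd`, `cmd_snapshot` and `cmd_opcode` are hypothetical names, and a real VMM would also need whatever memory-ordering guarantees its guest-memory accessors provide.

```c
#include <assert.h>
#include <stdint.h>

struct smmu_cmd {		/* private copy owned by the VMM */
	uint64_t dw[2];
};

/*
 * Copy one command out of guest-visible queue memory. The slot is read
 * exactly once per word; validation and forwarding then use the copy,
 * so a vCPU rewriting the slot afterwards cannot change what the host
 * kernel is asked to execute.
 */
static struct smmu_cmd cmd_snapshot(const volatile uint64_t *guest_slot)
{
	struct smmu_cmd c;

	assert(guest_slot);
	c.dw[0] = guest_slot[0];
	c.dw[1] = guest_slot[1];
	return c;
}

/* Opcode of an already-snapshotted command (bits [7:0] of word 0). */
static uint8_t cmd_opcode(const struct smmu_cmd *c)
{
	return (uint8_t)(c->dw[0] & 0xff);
}
```

The snapshot is what would be handed to the invalidation ioctl, which keeps any guest-versus-VMM race entirely on the userspace side of the interface.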
> I wonder if there could be other case than a WAR, where TG/TTL
> fields from the guest's aren't supported by the host. And then
> should the host handle it with a different CMD?
As Eric found previously, there's a clear benefit in emulating range 
invalidation for guests even if the underlying hardware doesn't support 
it, to minimise trapping. But that's not hard, and the patch as-is is 
already achieving it. All we need to be careful to avoid is issuing 
hardware commands with *less* scope than guest originally asked for - if 
the guest asks for a nonsense TG/TTL which doesn't match its current 
context, that's fine. The interface just has to ensure that a VMM's SMMU 
emulation *is* able to make a nested S1 context behave as expected by 
the architecture; we don't need to care if a guest uses the architecture 
wrong, since it's only hurting itself.
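To make the "never less scope than the guest asked for" rule checkable, here is a small sketch of the range arithmetic, assuming the architectural formula Range = (NUM + 1) * 2^SCALE * granule and the TG encoding 1/2/3 = 4K/16K/64K. The helper names are hypothetical; the result tells a host without range support how many single-granule invalidations it would need to issue to cover the guest's request.

```c
#include <assert.h>
#include <stdint.h>

/* log2 of the translation granule for a TG field value; 0 if TG == 0. */
static unsigned tg_to_shift(uint8_t tg)
{
	switch (tg) {
	case 1: return 12;	/* 4KB */
	case 2: return 14;	/* 16KB */
	case 3: return 16;	/* 64KB */
	default: return 0;	/* TG == 0: not a range command */
	}
}

/*
 * Bytes covered by a range invalidation, from the raw NUM/SCALE/TG
 * fields. Returns 0 for TG == 0, where the command names one address.
 */
static uint64_t tlbi_range_bytes(uint8_t num, uint8_t scale, uint8_t tg)
{
	unsigned shift = tg_to_shift(tg);

	assert(num < 32 && scale < 32);	/* both fields are 5 bits wide */
	if (!shift)
		return 0;
	return ((uint64_t)num + 1) << (scale + shift);
}

/* Granule-sized invalidations needed to emulate the same range. */
static uint64_t tlbi_range_pages(uint8_t num, uint8_t scale, uint8_t tg)
{
	unsigned shift = tg_to_shift(tg);

	return shift ? tlbi_range_bytes(num, scale, tg) >> shift : 0;
}
```

For example, NUM=3, SCALE=1, TG=1 describes (3 + 1) * 2 = 8 granules of 4KB, so an emulation that issues fewer than 8 per-page commands (or sets a more specific TTL than the guest gave) would be invalidating less than was requested.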
>>>> When range invalidation came along, the distinction between "all leaves
>>>> are definitely at the last level" and "use last-level granularity to
>>>> make sure everything at any level is hit" started to matter, but the
>>>> interface never caught up. It hasn't seemed desperately urgent to fix
>>>> (who does 1GB+ unmaps outside of VFIO teardown anyway?), but we must
>>>> definitely not bake the same mistake into user ABI.
>>>>
>>>> Of course, there might then be cases where we need to transform
>>>> non-range commands into range commands for the sake of workarounds, but
>>>> that's our own problem to deal with.
>>>
>>> Noted it down.
>>>
>>>>>> What about NSNH_ALL? That still needs to invalidate all the S1 context
>>>>>> that the guest *thinks* it's invalidating.
>>>>>
>>>>> NSNH_ALL is translated to NH_ALL at the guest level. But maybe
>>>>> it should have been done here instead.
>>>>
>>>> Yes. It seems the worst of both worlds to have an interface which takes
>>>> raw opcodes rather than an enum of supported commands, but still
>>>> requires userspace to know which opcodes are supported and which ones
>>>> don't work as expected even though they are entirely reasonable to use
>>>> in the context of the stage-1-only SMMU being emulated.
>>>
>>> Maybe a list of supported TLBI commands via the hw_info uAPI?
>>
>> I don't think it's all that difficult to implicitly support all commands
>> that are valid for a stage-1-only SMMU, it just needs the right
>> interface design to be capable of encoding them all completely and
>> unambiguously. Coming back to the previous point about the address
>> encoding, I think that means basing it more directly on the actual
>> SMMUv3 commands, rather than on io-pgtable's abstraction of invalidation
>> with SMMUv3 opcodes bolted on.
> 
> Yea, with the actual commands from the guest, the host can do
> something with its supported commands, I think.
The one slightly fiddly case, of course, is CMD_SYNC, but I think that's 
just a matter for clear documentation of the expectations and behaviour.
Thanks,
Robin.
^ permalink raw reply	[flat|nested] 165+ messages in thread 
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-16 14:58               ` Robin Murphy
@ 2023-03-16 21:09                 ` Nicolin Chen
  2023-03-20  1:32                   ` Nicolin Chen
  0 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-16 21:09 UTC (permalink / raw)
  To: Robin Murphy
  Cc: jgg, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Thu, Mar 16, 2023 at 02:58:39PM +0000, Robin Murphy wrote:
> > > > > I really think UAPI should reflect the hardware and encode TG and TTL
> > > > > directly. Especially since there's technically a flaw in the current
> > > > > driver where we assume TTL in cases where it isn't actually known, thus
> > > > > may potentially fail to invalidate level 2 block entries when removing a
> > > > > level 1 table, since io-pgtable passes the level 3 granule in that case.
> > > > 
> > > > Do you mean something like hw_info forwarding pgsize_bitmap/tg
> > > > to the guest? Or the other direction?
> > > 
> > > I mean if the interface wants to support range invalidations in a way
> > > which works correctly, then it should ideally carry both the TG and TTL
> > > fields from the guest command straight through to the host. If not, then
> > > at the very least the host must always assume TTL=0, because it cannot
> > > correctly infer otherwise once the guest command's original intent has
> > > been lost.
> > 
> > Oh, it's about hypervisor simply forwarding the entire CMD to
> > the host side. Jason is suggesting a fast approach by letting
> > host kernel read the CMDQ directly to get the raw CMD. Perhaps
> > that would address this comment about TG/TTL too.
> 
> That did cross my mind, but given the usage model, having host userspace
> give guest memory whose contents it can't control (unless it pauses the
> whole VM on all CPUs) directly to the host kernel just seems to invite
> more potential problems than necessary. Commands aren't big, so I think
> it's fair to expect the VMM to marshal them into host memory, and save
> the host kernel from ever having to reason about any races or other
> emulation details which may exist between a VM and its VMM.
An invalidation ioctl is executed synchronously from the top
level in QEMU when it traps a CMDQ_PROD write. So whether QEMU
packs the fields of a command into a data structure or just
forwards the command directly, the exposure to race conditions
seems to be the same?
> > I wonder if there could be other case than a WAR, where TG/TTL
> > fields from the guest's aren't supported by the host. And then
> > should the host handle it with a different CMD?
> 
> As Eric found previously, there's a clear benefit in emulating range
> invalidation for guests even if the underlying hardware doesn't support
> it, to minimise trapping. But that's not hard, and the patch as-is is
> already achieving it. All we need to be careful to avoid is issuing
> hardware commands with *less* scope than guest originally asked for - if
> the guest asks for a nonsense TG/TTL which doesn't match its current
> context, that's fine. The interface just has to ensure that a VMM's SMMU
> emulation *is* able to make a nested S1 context behave as expected by
> the architecture; we don't need to care if a guest uses the architecture
> wrong, since it's only hurting itself.
Agreed. Yet, similar to moving the emulation of TLBI_NSNH_ALL
from QEMU to the kernel, could we move the emulation of the other
TLBI commands to the kernel too? Then a hypervisor wouldn't
need to know which TLBI commands the host supports, and could
simply rely on the host to emulate each command with whatever
commands the host can actually issue. That would address
one of your comments in the conversation below?
> > > > > > > What about NSNH_ALL? That still needs to invalidate all the S1 context
> > > > > > > that the guest *thinks* it's invalidating.
> > > > > > 
> > > > > > NSNH_ALL is translated to NH_ALL at the guest level. But maybe
> > > > > > it should have been done here instead.
> > > > > 
> > > > > Yes. It seems the worst of both worlds to have an interface which takes
> > > > > raw opcodes rather than an enum of supported commands, but still
> > > > > requires userspace to know which opcodes are supported and which ones
> > > > > don't work as expected even though they are entirely reasonable to use
> > > > > in the context of the stage-1-only SMMU being emulated.
> > > > 
> > > > Maybe a list of supported TLBI commands via the hw_info uAPI?
> > > 
> > > I don't think it's all that difficult to implicitly support all commands
> > > that are valid for a stage-1-only SMMU, it just needs the right
> > > interface design to be capable of encoding them all completely and
> > > unambiguously. Coming back to the previous point about the address
> > > encoding, I think that means basing it more directly on the actual
> > > SMMUv3 commands, rather than on io-pgtable's abstraction of invalidation
> > > with SMMUv3 opcodes bolted on.
> > 
> > Yea, with the actual commands from the guest, the host can do
> > something with its supported commands, I think.
> 
> The one slightly fiddly case, of course, is CMD_SYNC, but I think that's
> just a matter for clear documentation of the expectations and behaviour.
What could be odd about CMD_SYNC?
Actually with QEMU, an ioctl for a CMD execution is initiated
by a CMDQ_PROD write trapped by QEMU, so a CMD_SYNC only
triggers an IRQ in this setup.
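For readers less familiar with the trap path described here, the following is a rough sketch of a VMM draining an emulated command queue when the guest writes the producer register. It assumes the usual SMMUv3 queue pointer layout (index in the low log2size bits, wrap flag in the next bit, queue empty when producer equals consumer); `vcmdq`, `vcmdq_consume` and `count_sync_handler` are hypothetical names, not QEMU or kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

struct vcmdq {			/* hypothetical emulated queue state */
	const uint64_t *base;	/* 2^log2size entries of two 64-bit words */
	unsigned log2size;
	uint32_t cons;		/* index plus wrap bit, like the register */
};

typedef void (*cmd_handler)(const uint64_t cmd[2], void *ctx);

/* Drain every command between cons and the newly written prod value. */
static unsigned vcmdq_consume(struct vcmdq *q, uint32_t prod,
			      cmd_handler handle, void *ctx)
{
	uint32_t idx_mask = (1u << q->log2size) - 1;
	uint32_t ptr_mask = (1u << (q->log2size + 1)) - 1; /* index + wrap */
	unsigned done = 0;

	assert(q->base && handle && q->log2size < 20);
	prod &= ptr_mask;
	while (q->cons != prod) {
		const uint64_t *cmd = &q->base[(q->cons & idx_mask) * 2];

		handle(cmd, ctx);	/* e.g. forward it via the ioctl */
		/* Incrementing index+wrap as one counter flips wrap on overflow. */
		q->cons = (q->cons + 1) & ptr_mask;
		done++;
	}
	return done;
}

/* Example handler: count CMD_SYNC entries (assumed opcode 0x46). */
static void count_sync_handler(const uint64_t cmd[2], void *ctx)
{
	if ((cmd[0] & 0xff) == 0x46)
		(*(unsigned *)ctx)++;
}
```

In this model every command before a CMD_SYNC has already been executed synchronously by the time the SYNC entry is reached, which is why the SYNC itself reduces to completion signalling; the "fiddly" part is documenting that expectation for interfaces that batch commands.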
Thanks
Nic
^ permalink raw reply	[flat|nested] 165+ messages in thread 
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-16 21:09                 ` Nicolin Chen
@ 2023-03-20  1:32                   ` Nicolin Chen
  2023-03-20 13:11                     ` Jason Gunthorpe
  0 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-20  1:32 UTC (permalink / raw)
  To: Robin Murphy
  Cc: jgg, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Thu, Mar 16, 2023 at 02:09:08PM -0700, Nicolin Chen wrote:
> On Thu, Mar 16, 2023 at 02:58:39PM +0000, Robin Murphy wrote:
> 
> > > > > > I really think UAPI should reflect the hardware and encode TG and TTL
> > > > > > directly. Especially since there's technically a flaw in the current
> > > > > > driver where we assume TTL in cases where it isn't actually known, thus
> > > > > > may potentially fail to invalidate level 2 block entries when removing a
> > > > > > level 1 table, since io-pgtable passes the level 3 granule in that case.
> > > > > 
> > > > > Do you mean something like hw_info forwarding pgsize_bitmap/tg
> > > > > to the guest? Or the other direction?
> > > > 
> > > > I mean if the interface wants to support range invalidations in a way
> > > > which works correctly, then it should ideally carry both the TG and TTL
> > > > fields from the guest command straight through to the host. If not, then
> > > > at the very least the host must always assume TTL=0, because it cannot
> > > > correctly infer otherwise once the guest command's original intent has
> > > > been lost.
> > > 
> > > Oh, it's about hypervisor simply forwarding the entire CMD to
> > > the host side. Jason is suggesting a fast approach by letting
> > > host kernel read the CMDQ directly to get the raw CMD. Perhaps
> > > that would address this comment about TG/TTL too.
> > 
> > That did cross my mind, but given the usage model, having host userspace
> > give guest memory whose contents it can't control (unless it pauses the
> > whole VM on all CPUs) directly to the host kernel just seems to invite
> > more potential problems than necessary. Commands aren't big, so I think
> > it's fair to expect the VMM to marshal them into host memory, and save
> > the host kernel from ever having to reason about any races or other
> > emulation details which may exist between a VM and its VMM.
> 
> An invalidation ioctl is synchronously executed from the top
> level in QEMU when it traps any CMDQ_PROD write. So, either
> packing the fields of a command into a data structure or just
> forwarding the command directly, it seems to be the same for
> the matter of worrying about race conditions?
I think I misread your reply here :)
What you suggested is exactly forwarding the command, as opposed
to the host reading the guest's command queue memory.
Although I haven't fully understood Jason's "sorting" approach,
this could already simplify the data structure holding all the
fields, by passing a "__u64 cmd[2]" alone. Sample code:
+struct iommu_hwpt_invalidate_arm_smmuv3 {
+       struct iommu_iova_range range;
+       __u64 cmd[2];
+};
then...
+       cmd[0] = inv_info->cmd[0];
+       cmd[1] = inv_info->cmd[1];
+       switch (cmd[0] & 0xff) {
+       case CMDQ_OP_TLBI_NSNH_ALL:
+               cmd[0] &= ~0xffULL;
+               cmd[0] |= CMDQ_OP_TLBI_NH_ALL;
+               fallthrough;
+       case CMDQ_OP_TLBI_NH_VA:
+       case CMDQ_OP_TLBI_NH_VAA:
+       case CMDQ_OP_TLBI_NH_ALL:
+       case CMDQ_OP_TLBI_NH_ASID:
+               cmd[0] &= ~CMDQ_TLBI_0_VMID;
+               cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, smmu_domain->s2->s2_cfg.vmid);
+               arm_smmu_cmdq_issue_cmdlist(smmu, cmd, 1, true);
+               break;
+       case CMDQ_OP_CFGI_CD:
+       case CMDQ_OP_CFGI_CD_ALL:
+               arm_smmu_sync_cd(smmu_domain,
+                                FIELD_GET(CMDQ_CFGI_0_SSID, cmd[0]), false);
+               break;
+       default:
+               return;
+       }
We could probably do batch forwarding too, if it's worthwhile?
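To make the fixup above easier to experiment with outside the kernel, here
is a minimal userspace sketch of the same idea, plus a batch variant.
Everything prefixed with sk_/SK_ is hypothetical. The opcode values and the
VMID position (bits [47:32] of the first word) are assumed to match the
driver's CMDQ_* definitions and should be checked against arm-smmu-v3.h
before any reuse.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values; assumed to match the driver's CMDQ_* definitions. */
#define SK_OP_MASK		0xffULL
#define SK_OP_TLBI_NH_ALL	0x10ULL
#define SK_OP_TLBI_NH_ASID	0x11ULL
#define SK_OP_TLBI_NH_VA	0x12ULL
#define SK_OP_TLBI_NH_VAA	0x13ULL
#define SK_OP_TLBI_NSNH_ALL	0x30ULL
#define SK_TLBI_0_VMID		(0xffffULL << 32)	/* bits [47:32] of w[0] */

/* One raw 16-byte command, as the VMM would copy it out of the guest queue. */
struct sk_cmd {
	uint64_t w[2];
};

/*
 * Rewrite one guest TLBI command so that it is scoped to the host-owned
 * stage-2 VMID. Returns 0 on success, or -1 for an opcode this sketch
 * does not forward (the command is left untouched in that case).
 */
static int sk_fixup_tlbi(struct sk_cmd *cmd, uint16_t host_vmid)
{
	uint64_t w0;

	assert(cmd != NULL);
	w0 = cmd->w[0];

	switch (w0 & SK_OP_MASK) {
	case SK_OP_TLBI_NSNH_ALL:
		/* A guest "invalidate everything" must stay inside its VMID. */
		w0 = (w0 & ~SK_OP_MASK) | SK_OP_TLBI_NH_ALL;
		break;
	case SK_OP_TLBI_NH_ALL:
	case SK_OP_TLBI_NH_ASID:
	case SK_OP_TLBI_NH_VA:
	case SK_OP_TLBI_NH_VAA:
		break;
	default:
		return -1;
	}

	/* Discard whatever VMID the guest wrote and substitute the host's. */
	w0 &= ~SK_TLBI_0_VMID;
	w0 |= (uint64_t)host_vmid << 32;
	cmd->w[0] = w0;
	return 0;
}

/*
 * Batch variant: fix up cmds[0..n) in order and stop at the first
 * unsupported opcode. Returns how many commands are ready to be issued.
 */
static size_t sk_fixup_batch(struct sk_cmd *cmds, size_t n, uint16_t host_vmid)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (sk_fixup_tlbi(&cmds[i], host_vmid))
			break;
	}
	return i;
}
```

Stopping at the first unsupported opcode is only one possible policy. It
keeps the guest's command ordering intact and lets the caller report
exactly which entry was rejected.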
Thanks
Nic
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-20  1:32                   ` Nicolin Chen
@ 2023-03-20 13:11                     ` Jason Gunthorpe
  2023-03-20 15:28                       ` Nicolin Chen
  0 siblings, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-20 13:11 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Robin Murphy, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Sun, Mar 19, 2023 at 06:32:03PM -0700, Nicolin Chen wrote:
> +struct iommu_hwpt_invalidate_arm_smmuv3 {
> +       struct iommu_iova_range range;
what is this?
> +       __u64 cmd[2];
> +};
You still have to do something with the SID. We can't just allow any
unvalidated SID value - the driver has to check the incoming SID
against the allowed SIDs for this iommufd_ctx.
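As a rough illustration of that check, a userspace sketch might look like
the following. The sk_ctx structure and its flat SID array are purely
hypothetical stand-ins for whatever the context really tracks, and the
CMD_ATC_INV opcode and StreamID position (bits [63:32] of the first word)
are assumptions to be verified against the driver. Only ATC_INV is checked
here; a real handler would have to cover every command that carries a SID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_OP_MASK		0xffULL
#define SK_OP_ATC_INV		0x40ULL	/* assumed opcode of CMD_ATC_INV */
#define SK_ATC_0_SID_SHIFT	32	/* assumed: StreamID in bits [63:32] */

/* Hypothetical per-context record of the StreamIDs userspace may name. */
struct sk_ctx {
	const uint32_t *sids;
	size_t nr_sids;
};

/* Linear membership test; fine for a sketch, too slow for many devices. */
static bool sk_sid_allowed(const struct sk_ctx *ctx, uint32_t sid)
{
	size_t i;

	for (i = 0; i < ctx->nr_sids; i++) {
		if (ctx->sids[i] == sid)
			return true;
	}
	return false;
}

/*
 * Commands that are not ATC_INV are accepted without a SID check;
 * an ATC_INV naming a StreamID outside the context is rejected.
 */
static bool sk_cmd_permitted(const struct sk_ctx *ctx, const uint64_t cmd[2])
{
	assert(ctx != NULL && cmd != NULL);

	if ((cmd[0] & SK_OP_MASK) != SK_OP_ATC_INV)
		return true;

	return sk_sid_allowed(ctx, (uint32_t)(cmd[0] >> SK_ATC_0_SID_SHIFT));
}
```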
Jason
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-20 13:11                     ` Jason Gunthorpe
@ 2023-03-20 15:28                       ` Nicolin Chen
  2023-03-20 16:01                         ` Jason Gunthorpe
  0 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-20 15:28 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Robin Murphy, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Mon, Mar 20, 2023 at 10:11:54AM -0300, Jason Gunthorpe wrote:
> On Sun, Mar 19, 2023 at 06:32:03PM -0700, Nicolin Chen wrote:
> 
> > +struct iommu_hwpt_invalidate_arm_smmuv3 {
> > +       struct iommu_iova_range range;
> 
> what is this?
Not used. A copy-n-paste mistake :(
> 
> > +       __u64 cmd[2];
> > +};
> 
> You still have to do something with the SID. We can't just allow any
> un-validated SID value - the driver has to check the incoming SID
> against allowed SIDs for this iommufd_ctx
Hmm, that's something "missing" even in the current design.
Yet, most of the TLBI commands don't hold an SID field. So,
a hypervisor that only traps a queue write-pointer movement
cannot get the exact vSID for a TLBI command. What our QEMU
code currently does is simply broadcast to all the devices
on the list of devices attached to the vSMMU, which means
that such an enforcement in the kernel would basically just
allow any vSID (device) that's attached to the domain?
Thanks
Nic
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-20 15:28                       ` Nicolin Chen
@ 2023-03-20 16:01                         ` Jason Gunthorpe
  2023-03-20 16:35                           ` Nicolin Chen
  0 siblings, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-20 16:01 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Robin Murphy, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Mon, Mar 20, 2023 at 08:28:05AM -0700, Nicolin Chen wrote:
> On Mon, Mar 20, 2023 at 10:11:54AM -0300, Jason Gunthorpe wrote:
> > On Sun, Mar 19, 2023 at 06:32:03PM -0700, Nicolin Chen wrote:
> > 
> > > +struct iommu_hwpt_invalidate_arm_smmuv3 {
> > > +       struct iommu_iova_range range;
> > 
> > what is this?
> 
> Not used. A copy-n-paste mistake :(
> 
> > 
> > > +       __u64 cmd[2];
> > > +};
> > 
> > You still have to do something with the SID. We can't just allow any
> > un-validated SID value - the driver has to check the incoming SID
> > against allowed SIDs for this iommufd_ctx
> 
> Hmm, that's something "missing" even in the current design.
> 
> Yet, most of the TLBI commands don't hold an SID field. So,
> the hypervisor only trapping a queue write-pointer movement
> cannot get the exact vSID for a TLBI command. What our QEMU
> code currently does is simply broadcasting all the devices
> on the list of attaching devices to the vSMMU, which means
> that such an enforcement in the kernel would basically just
> allow any vSID (device) that's attached to the domain?
SID is only used for managing the ATC as far as I know. That is because
the ASID doesn't convey enough information to determine which PCI RID
to generate an ATC invalidation for.
We shouldn't be broadcasting, for efficiency; at least it should not be
baked into the API.
You need to know what devices the vSID is targeting and issue
invalidations only for those devices.
Jason
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-20 16:01                         ` Jason Gunthorpe
@ 2023-03-20 16:35                           ` Nicolin Chen
  2023-03-20 18:07                             ` Jason Gunthorpe
  0 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-20 16:35 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Robin Murphy, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Mon, Mar 20, 2023 at 01:01:53PM -0300, Jason Gunthorpe wrote:
> On Mon, Mar 20, 2023 at 08:28:05AM -0700, Nicolin Chen wrote:
> > On Mon, Mar 20, 2023 at 10:11:54AM -0300, Jason Gunthorpe wrote:
> > > On Sun, Mar 19, 2023 at 06:32:03PM -0700, Nicolin Chen wrote:
> > > 
> > > > +struct iommu_hwpt_invalidate_arm_smmuv3 {
> > > > +       struct iommu_iova_range range;
> > > 
> > > what is this?
> > 
> > Not used. A copy-n-paste mistake :(
> > 
> > > 
> > > > +       __u64 cmd[2];
> > > > +};
> > > 
> > > You still have to do something with the SID. We can't just allow any
> > > un-validated SID value - the driver has to check the incoming SID
> > > against allowed SIDs for this iommufd_ctx
> > 
> > Hmm, that's something "missing" even in the current design.
> > 
> > Yet, most of the TLBI commands don't hold an SID field. So,
> > the hypervisor only trapping a queue write-pointer movement
> > cannot get the exact vSID for a TLBI command. What our QEMU
> > code currently does is simply broadcasting all the devices
> > on the list of attaching devices to the vSMMU, which means
> > that such an enforcement in the kernel would basically just
> > allow any vSID (device) that's attached to the domain?
> 
> SID is only used for managing the ATC as far as I know. It is because
> the ASID doesn't convey enough information to determine what PCI RID
> to generate an ATC invalidation for.
Yes. And a CD invalidation too, though the kernel eventually
would do a broadcast to all devices that are using the same
CD.
> We shouldn't be broadcasting for efficiency, at least it should not be
> baked into the API.
> 
> You need to know what devices the vSID is targeting and issue
> invalidations only for those devices.
I agree with that, yet cannot think of a way to derive that
from a vSID. The QEMU code, emulating a physical SMMU, only
reads the commands from the queue, without knowing which
device (vSID) actually sent them.
I can probably improve the current solution, which does a full
broadcast, by using the ASID fields from the commands, yet that
would only narrow it down to an ASID-based broadcast...
Thanks
Nic
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-20 16:35                           ` Nicolin Chen
@ 2023-03-20 18:07                             ` Jason Gunthorpe
  2023-03-20 20:46                               ` Nicolin Chen
  0 siblings, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-20 18:07 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Robin Murphy, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Mon, Mar 20, 2023 at 09:35:20AM -0700, Nicolin Chen wrote:
> > You need to know what devices the vSID is targeting and issue
> > invalidations only for those devices.
> 
> I agree with that, yet cannot think of a solution to achieve
> that out of vSID. QEMU code by means of emulating a physical
> SMMU only reads the commands from the queue, without knowing
> which device (vSID) actually sent these commands.
Huh?
CMD_ATC_INV has the SID.
Other commands have the ASID.
You never need to cross an ASID to a SID or vice versa.
If the guest is aware of ATS it will issue CMD_ATC_INV with vSIDs, and
the hypervisor just needs to convert vSID to pSID.
Otherwise vSID doesn't matter because it isn't used in the invalidation
API and you are just handling ASIDs that only need the VM_ID scope
applied.
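A minimal sketch of that vSID to pSID conversion, written as plain
userspace C, could look like this. The sk_sid_map table is a hypothetical
structure the hypervisor would maintain per assigned device; the
CMD_ATC_INV opcode and the StreamID field position (bits [63:32] of the
first word) are assumed from the SMMUv3 command layout and should be
double-checked. Commands without a SID pass through untouched, since they
only need the VMID scope applied elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_OP_MASK	0xffULL
#define SK_OP_ATC_INV	0x40ULL			/* assumed opcode */
#define SK_ATC_0_SID	(0xffffffffULL << 32)	/* assumed: bits [63:32] */

/* Hypothetical mapping entry, one per device assigned to the VM. */
struct sk_sid_map {
	uint32_t vsid;	/* StreamID as the guest sees it */
	uint32_t psid;	/* StreamID of the physical device */
};

/*
 * Translate the StreamID of a guest CMD_ATC_INV in place.
 * Returns 0 if the command is ready to forward, -1 if the vSID has no
 * mapping (the command is left untouched). Any other opcode is passed
 * through unchanged and reported as success.
 */
static int sk_translate_sid(uint64_t cmd[2], const struct sk_sid_map *map,
			    size_t n)
{
	uint32_t vsid;
	size_t i;

	assert(cmd != NULL);

	if ((cmd[0] & SK_OP_MASK) != SK_OP_ATC_INV)
		return 0;

	vsid = (uint32_t)(cmd[0] >> 32);
	for (i = 0; i < n; i++) {
		if (map[i].vsid != vsid)
			continue;
		/* Swap only the SID field; SSID and address bits stay as is. */
		cmd[0] = (cmd[0] & ~SK_ATC_0_SID) |
			 ((uint64_t)map[i].psid << 32);
		return 0;
	}
	return -1;
}
```

Failing on an unknown vSID, rather than falling back to a broadcast, keeps
the invalidation targeted at exactly the device the guest named.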
Jason
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-20 18:07                             ` Jason Gunthorpe
@ 2023-03-20 20:46                               ` Nicolin Chen
  2023-03-20 22:14                                 ` Jason Gunthorpe
  0 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-20 20:46 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Robin Murphy, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Mon, Mar 20, 2023 at 03:07:13PM -0300, Jason Gunthorpe wrote:
> On Mon, Mar 20, 2023 at 09:35:20AM -0700, Nicolin Chen wrote:
> 
> > > You need to know what devices the vSID is targeting and issue
> > > invalidations only for those devices.
> > 
> > I agree with that, yet cannot think of a solution to achieve
> > that out of vSID. QEMU code by means of emulating a physical
> > SMMU only reads the commands from the queue, without knowing
> > which device (vSID) actually sent these commands.
> 
> Huh?
> 
> CMD_ATC_INV has the SID
> 
> Other commands have the ASID.
> 
> You never need to cross an ASID to a SID or vice versa.
> 
> If the guest is aware of ATS it will issue CMD_ATC_INV with vSIDs, and
> the hypervisor just needs to convert vSID to pSID.
> 
> Otherwise vSID doesn't matter because it isn't used in the invalidation
> API and you are just handling ASIDs that only need the VM_ID scope
> applied.
Yea, I was thinking of your point (at the top) about how we could
ensure that an invalidation is targeting a correct vSID. So,
that narrative was only about CMD_ATC_INV...
Actually, we don't forward CMD_ATC_INV in QEMU. In another
thread, Kevin also asked whether we need to support that
in the host at all. And I plan to drop CMD_ATC_INV from the
list of cache_invalidate_user(), following his comments and
the QEMU situation. Our uAPI, whether it forwards the commands
or a package of queue info, should be able to cover this in
the future whenever we think it's required.
Combining the two parts above, we probably don't need to know
at this moment which vSID an invalidation is targeting, nor
to only allow it to execute for those devices, since the rest
of the commands are all ASID based.
Thanks
Nic
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-20 20:46                               ` Nicolin Chen
@ 2023-03-20 22:14                                 ` Jason Gunthorpe
  2023-03-22  5:14                                   ` Nicolin Chen
  0 siblings, 1 reply; 165+ messages in thread
From: Jason Gunthorpe @ 2023-03-20 22:14 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Robin Murphy, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Mon, Mar 20, 2023 at 01:46:52PM -0700, Nicolin Chen wrote:
> On Mon, Mar 20, 2023 at 03:07:13PM -0300, Jason Gunthorpe wrote:
> > On Mon, Mar 20, 2023 at 09:35:20AM -0700, Nicolin Chen wrote:
> > 
> > > > You need to know what devices the vSID is targeting and issue
> > > > invalidations only for those devices.
> > > 
> > > I agree with that, yet cannot think of a solution to achieve
> > > that out of vSID. QEMU code by means of emulating a physical
> > > SMMU only reads the commands from the queue, without knowing
> > > which device (vSID) actually sent these commands.
> > 
> > Huh?
> > 
> > CMD_ATC_INV has the SID
> > 
> > Other commands have the ASID.
> > 
> > You never need to cross an ASID to a SID or vice versa.
> > 
> > If the guest is aware of ATS it will issue CMD_ATC_INV with vSIDs, and
> > the hypervisor just needs to convert vSID to pSID.
> > 
> > Otherwise vSID doesn't matter because it isn't used in the invalidation
> > API and you are just handling ASIDs that only need the VM_ID scope
> > applied.
> 
> Yea, I was thinking of your point (at the top) how we could
> ensure if an invalidation is targeting a correct vSID. So,
> that narrative was only about CMD_ATC_INV...
> 
> Actually, we don't forward CMD_ATC_INV in QEMU. In another
> thread, Kevin also remarked whether we need to support that
> in the host or not. And I plan to drop CMD_ATC_INV from the
> list of cache_invalidate_user(), following his comments and
> the QEMU situation. Our uAPI, either forwarding the commands
> or a package of queue info, should be able to cover this in
> the future whenever we think it's required.
Something has to generate CMD_ATC_INV.
How do you plan to generate this from the hypervisor based on ASID
invalidations?
The hypervisor doesn't know which ASIDs are connected to which SIDs to
generate the ATC invalidation?
Intel is different, they know what devices the vDID is connected to,
so when they get a vDID invalidation they can elaborate it into an ATC
invalidation. ARM doesn't have that information.
Jason
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-20 22:14                                 ` Jason Gunthorpe
@ 2023-03-22  5:14                                   ` Nicolin Chen
  2023-03-24  8:55                                     ` Tian, Kevin
  0 siblings, 1 reply; 165+ messages in thread
From: Nicolin Chen @ 2023-03-22  5:14 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Robin Murphy, will, eric.auger, kevin.tian, baolu.lu, joro,
	shameerali.kolothum.thodi, jean-philippe, linux-arm-kernel, iommu,
	linux-kernel
On Mon, Mar 20, 2023 at 07:14:17PM -0300, Jason Gunthorpe wrote:
> On Mon, Mar 20, 2023 at 01:46:52PM -0700, Nicolin Chen wrote:
> > On Mon, Mar 20, 2023 at 03:07:13PM -0300, Jason Gunthorpe wrote:
> > > On Mon, Mar 20, 2023 at 09:35:20AM -0700, Nicolin Chen wrote:
> > > 
> > > > > You need to know what devices the vSID is targeting and issue
> > > > > invalidations only for those devices.
> > > > 
> > > > I agree with that, yet cannot think of a solution to achieve
> > > > that out of vSID. QEMU code by means of emulating a physical
> > > > SMMU only reads the commands from the queue, without knowing
> > > > which device (vSID) actually sent these commands.
> > > 
> > > Huh?
> > > 
> > > CMD_ATC_INV has the SID
> > > 
> > > Other commands have the ASID.
> > > 
> > > You never need to cross an ASID to a SID or vice versa.
> > > 
> > > If the guest is aware of ATS it will issue CMD_ATC_INV with vSIDs, and
> > > the hypervisor just needs to convert vSID to pSID.
> > > 
> > > Otherwise vSID doesn't matter because it isn't used in the invalidation
> > > API and you are just handling ASIDs that only need the VM_ID scope
> > > applied.
> > 
> > Yea, I was thinking of your point (at the top) how we could
> > ensure if an invalidation is targeting a correct vSID. So,
> > that narrative was only about CMD_ATC_INV...
> > 
> > Actually, we don't forward CMD_ATC_INV in QEMU. In another
> > thread, Kevin also remarked whether we need to support that
> > in the host or not. And I plan to drop CMD_ATC_INV from the
> > list of cache_invalidate_user(), following his comments and
> > the QEMU situation. Our uAPI, either forwarding the commands
> > or a package of queue info, should be able to cover this in
> > the future whenever we think it's required.
> 
> Something has to generate CMD_ATC_INV.
>
> How do you plan to generate this from the hypervisor based on ASID
> invalidations?
>
> The hypervisor doesn't know what ASIDs are connected to what SIDs to
> generate the ATC?
> 
> Intel is different, they know what devices the vDID is connected to,
> > so when they get a vDID invalidation they can elaborate it into an ATC
> invalidation. ARM doesn't have that information.
I see. Perhaps vSMMU still needs to forward CMD_ATC_INV. And,
as you suggested, it should go through a vSID sanity check by
the host handler. We can find the corresponding pSID to check
if the device is associated with the iommu_domain?
Thanks
Nic
- * RE: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-22  5:14                                   ` Nicolin Chen
@ 2023-03-24  8:55                                     ` Tian, Kevin
  0 siblings, 0 replies; 165+ messages in thread
From: Tian, Kevin @ 2023-03-24  8:55 UTC (permalink / raw)
  To: Nicolin Chen, Jason Gunthorpe
  Cc: Robin Murphy, will@kernel.org, eric.auger@redhat.com,
	baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, March 22, 2023 1:15 PM
> 
> >
> > Something has to generate CMD_ATC_INV.
> >
> > How do you plan to generate this from the hypervisor based on ASID
> > invalidations?
> >
> > The hypervisor doesn't know what ASIDs are connected to what SIDs to
> > generate the ATC?
> >
> > Intel is different, they know what devices the vDID is connected to,
> > > so when they get a vDID invalidation they can elaborate it into an ATC
> > invalidation. ARM doesn't have that information.
> 
> I see. Perhaps vSMMU still needs to forward CMD_ATC_INV. And,
Ah that's quite a different story. 😊
- * RE: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-09 14:49   ` Robin Murphy
  2023-03-09 15:31     ` Jason Gunthorpe
  2023-03-10  3:51     ` Nicolin Chen
@ 2023-03-17  9:47     ` Tian, Kevin
  2023-03-17 14:16       ` Nicolin Chen
  2 siblings, 1 reply; 165+ messages in thread
From: Tian, Kevin @ 2023-03-17  9:47 UTC (permalink / raw)
  To: Robin Murphy, Nicolin Chen, jgg@nvidia.com, will@kernel.org
  Cc: eric.auger@redhat.com, baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
> From: Robin Murphy <robin.murphy@arm.com>
> Sent: Thursday, March 9, 2023 10:49 PM
> > +	case CMDQ_OP_ATC_INV:
> > +		ssid = inv_info->ssid;
> > +		iova = inv_info->range.start;
> > +		size = inv_info->range.last - inv_info->range.start + 1;
> > +		break;
> 
> Can we do any better than multiplying every single ATC_INV command, even
> for random bogus StreamIDs, into multiple commands across every physical
> device? In fact, I'm not entirely confident this isn't problematic, if
> the guest wishes to send invalidations for one device specifically while
> it's put some other device into a state where sending it a command would
> do something bad. At the very least, it's liable to be confusing if the
> guest sends a command for one StreamID but gets an error back for a
> different one.
> 
Or do we need to support this cmd at all?
For vt-d we always implicitly invalidate the ATC following an iotlb invalidation
request from userspace. Then the vIOMMU just treats it as a nop in the
virtual queue.
IMHO a sane iommu driver should always invalidate both iotlb and atc
together. I'm not sure there is a valid usage where the iotlb is invalidated
while the atc is left with some stale mappings.
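To illustrate the coupling described here, below is a toy userspace model,
not VT-d or SMMU driver code. All sk_ names are hypothetical and the
counters merely stand in for real invalidation commands. It shows a single
userspace IOTLB request being followed by an ATC flush for every
ATS-enabled device in the domain, so the guest's own ATC command can be a
nop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device record: only ATS-enabled devices hold an ATC. */
struct sk_dev {
	uint32_t sid;
	bool ats_enabled;
	unsigned int atc_flushes;
};

/* Hypothetical domain: the host knows every device attached to it. */
struct sk_domain {
	struct sk_dev *devs;
	size_t nr_devs;
	unsigned int iotlb_flushes;
};

/*
 * Handle one userspace IOTLB invalidation request: flush the IOTLB,
 * then every ATC that may have cached a translation from it.
 * Returns the number of ATC invalidations issued.
 */
static size_t sk_invalidate(struct sk_domain *d)
{
	size_t i, issued = 0;

	assert(d != NULL);

	d->iotlb_flushes++;			/* stands in for the IOTLB command */
	for (i = 0; i < d->nr_devs; i++) {
		if (!d->devs[i].ats_enabled)
			continue;
		d->devs[i].atc_flushes++;	/* stands in for the ATC command */
		issued++;
	}
	return issued;
}
```

This only works when the host can enumerate the devices behind the
invalidated translation, which is the property the model takes for
granted.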
- * Re: [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user
  2023-03-17  9:47     ` Tian, Kevin
@ 2023-03-17 14:16       ` Nicolin Chen
  0 siblings, 0 replies; 165+ messages in thread
From: Nicolin Chen @ 2023-03-17 14:16 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Robin Murphy, jgg@nvidia.com, will@kernel.org,
	eric.auger@redhat.com, baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com, jean-philippe@linaro.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
On Fri, Mar 17, 2023 at 09:47:47AM +0000, Tian, Kevin wrote:
> > From: Robin Murphy <robin.murphy@arm.com>
> > Sent: Thursday, March 9, 2023 10:49 PM
> > > +   case CMDQ_OP_ATC_INV:
> > > +           ssid = inv_info->ssid;
> > > +           iova = inv_info->range.start;
> > > +           size = inv_info->range.last - inv_info->range.start + 1;
> > > +           break;
> >
> > Can we do any better than multiplying every single ATC_INV command, even
> > for random bogus StreamIDs, into multiple commands across every physical
> > device? In fact, I'm not entirely confident this isn't problematic, if
> > the guest wishes to send invalidations for one device specifically while
> > it's put some other device into a state where sending it a command would
> > do something bad. At the very least, it's liable to be confusing if the
> > guest sends a command for one StreamID but gets an error back for a
> > different one.
> >
> 
> Or do we need to support this cmd at all?
> 
> For vt-d we always implicitly invalidate the ATC following an iotlb invalidation
> request from userspace. Then the vIOMMU just treats it as a nop in the
> virtual queue.
> 
> IMHO a sane iommu driver should always invalidate both iotlb and atc
> together. I'm not sure there is a valid usage where the iotlb is invalidated
> while the atc is left with some stale mappings.
The vSMMU code in QEMU actually doesn't forward this command. So,
I guess you are right that this support isn't needed here, and we
may just drop it.
Thanks!
Nic