All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
@ 2024-06-28  6:43 Vasant Hegde
  2024-06-28 12:23 ` Baolu Lu
  2024-06-28 13:03 ` Jason Gunthorpe
  0 siblings, 2 replies; 30+ messages in thread
From: Vasant Hegde @ 2024-06-28  6:43 UTC (permalink / raw)
  To: Joerg Roedel, Jason Gunthorpe, iommu@lists.linux.dev
  Cc: Suravee Suthikulpanit, Will Deacon, Robin Murphy, Baolu Lu

Hi All,

We are working on adding domain_alloc_paging() support in AMD driver and came
across below issue.

BACKGROUND:
============
- AMD IOMMU HW has two different page tables : V1 (host page table) and V2
(guest page table). Only V2 page table supports PASID and PRI features.

- With V2 page table we have an aliasing issue. Hence we added
per-device-domain-id when domain is configured with v2 page table. See upstream
commit 87a6f1f22c97 ("iommu/amd: Introduce per-device domain ID to fix potential
TLB aliasing issue")



PROBLEM :
=========
With iommu_ops->domain_alloc_paging(dev) API AMD driver will chose best page
table based on device capabilities (V2 for PASID capable device and V1 page
table for rest of the devices).

But we would like to continue enforcing V1 page table for UNMANAGED domain. As
in terms of IOMMU caching, it performs better than V2 page table.

Also while adding SVA in AMD driver Jason mentioned that we should support PASID
with UNMANAGED domain. As I understand currently we don't have this feature in
upstream but we would like to support it in future. Keeping this use case also
in mind, we came up with below two options :

  1 - Pass domain type : domain_alloc_paging(dev, type)
   When we add PASID with UNMANAGED domain we need a way to differentiate the
domain 'type' (See below attach sample code)

  2 - Introduce new flag for domain_alloc_paging()
    Something like :
    #define DOMAIN_FLAG_UNMANAGED	0x01
    #define DOMAIN_FLAG_UNMANAGED_SVA	0x02

    domain_alloc_paging(dev, flag)
    For now we will have 'DOMAIN_PAGING_UNMANAGED'.

Please let us know which one is preferred -OR- is there any other better way to
handle this.

-Vasant


---------------
commit bdf17f429dfb76b5a9a5714045ce3f38a4ee98b3
Author: Vasant Hegde <vasant.hegde@amd.com>
Date:   Fri Jun 28 11:56:18 2024 +0530

    [RFC] iommu: domain_alloc_paging enhancement

    Prototype to show how domain_alloc_paging() looks like. This doesn't even
    compile as I have not modified other driver which has domain_alloc_paging()
    implementation.

    Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index c2703599bb16..e401c2d9ab8e 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2429,6 +2429,22 @@ static struct iommu_domain
*amd_iommu_domain_alloc(unsigned int type)
 	return domain;
 }

+static struct iommu_domain *amd_iommu_domain_alloc_paging(struct device *dev,
u32 type)
+{
+	struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
+	int pgtable = amd_iommu_pgtable;
+
+	/* TODO: Add SNP and other checks */
+
+	/* Force V1 page table for UNMANAGED domain */
+	if (type == IOMMU_DOMAIN_UNMANAGED)
+		pgtable = AMD_IOMMU_V1;
+	else if (dev_is_pci(dev) && pdev_pasid_supported(dev_data))
+		pgtable = AMD_IOMMU_V2;
+
+	return do_iommu_domain_alloc(IOMMU_DOMAIN_DMA, dev, 0, pgtable);
+}
+
 static struct iommu_domain *
 amd_iommu_domain_alloc_user(struct device *dev, u32 flags,
 			    struct iommu_domain *parent,
@@ -2860,6 +2876,7 @@ static int amd_iommu_dev_disable_feature(struct device *dev,
 const struct iommu_ops amd_iommu_ops = {
 	.capable = amd_iommu_capable,
 	.domain_alloc = amd_iommu_domain_alloc,
+	.domain_alloc_paging = amd_iommu_domain_alloc_paging,
 	.domain_alloc_user = amd_iommu_domain_alloc_user,
 	.domain_alloc_sva = amd_iommu_domain_alloc_sva,
 	.probe_device = amd_iommu_probe_device,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 9df7cc75c1bc..7fb95f90158e 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1952,7 +1952,7 @@ static struct iommu_domain *__iommu_domain_alloc(const
struct iommu_ops *ops,
 	else if (alloc_type == IOMMU_DOMAIN_BLOCKED && ops->blocked_domain)
 		return ops->blocked_domain;
 	else if (type & __IOMMU_DOMAIN_PAGING && ops->domain_alloc_paging)
-		domain = ops->domain_alloc_paging(dev);
+		domain = ops->domain_alloc_paging(dev, type);
 	else if (ops->domain_alloc)
 		domain = ops->domain_alloc(alloc_type);
 	else
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 17b3f36ad843..d408eb70e086 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -192,6 +192,9 @@ struct iommu_domain_geometry {
 #define IOMMU_DOMAIN_BLOCKED	(0U)
 #define IOMMU_DOMAIN_IDENTITY	(__IOMMU_DOMAIN_PT)
 #define IOMMU_DOMAIN_UNMANAGED	(__IOMMU_DOMAIN_PAGING)
+/* XXX: Introduce this type when we add PASID support for UNMANAGED domain */
+#define IOMMU_DOMAIN_UNMANAGED_SVA (__IOMMU_DOMAIN_PAGING |    \
+				    __IOMMU_DOMAIN_SVA)
 #define IOMMU_DOMAIN_DMA	(__IOMMU_DOMAIN_PAGING |	\
 				 __IOMMU_DOMAIN_DMA_API)
 #define IOMMU_DOMAIN_DMA_FQ	(__IOMMU_DOMAIN_PAGING |	\
@@ -557,7 +560,7 @@ struct iommu_ops {
 	struct iommu_domain *(*domain_alloc_user)(
 		struct device *dev, u32 flags, struct iommu_domain *parent,
 		const struct iommu_user_data *user_data);
-	struct iommu_domain *(*domain_alloc_paging)(struct device *dev);
+	struct iommu_domain *(*domain_alloc_paging)(struct device *dev, u32
iommu_domain_type);
 	struct iommu_domain *(*domain_alloc_sva)(struct device *dev,
 						 struct mm_struct *mm);


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-06-28  6:43 [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver Vasant Hegde
@ 2024-06-28 12:23 ` Baolu Lu
  2024-06-28 13:06   ` Robin Murphy
  2024-06-28 14:50   ` Vasant Hegde
  2024-06-28 13:03 ` Jason Gunthorpe
  1 sibling, 2 replies; 30+ messages in thread
From: Baolu Lu @ 2024-06-28 12:23 UTC (permalink / raw)
  To: Vasant Hegde, Joerg Roedel, Jason Gunthorpe,
	iommu@lists.linux.dev
  Cc: baolu.lu, Suravee Suthikulpanit, Will Deacon, Robin Murphy

On 2024/6/28 14:43, Vasant Hegde wrote:
> We are working on adding domain_alloc_paging() support in AMD driver and came
> across below issue.
> 
> BACKGROUND:
> ============
> - AMD IOMMU HW has two different page tables : V1 (host page table) and V2
> (guest page table). Only V2 page table supports PASID and PRI features.
> 
> - With V2 page table we have an aliasing issue. Hence we added
> per-device-domain-id when domain is configured with v2 page table. See upstream
> commit 87a6f1f22c97 ("iommu/amd: Introduce per-device domain ID to fix potential
> TLB aliasing issue")
> 
> 
> 
> PROBLEM :
> =========
> With iommu_ops->domain_alloc_paging(dev) API AMD driver will chose best page
> table based on device capabilities (V2 for PASID capable device and V1 page
> table for rest of the devices).
> 
> But we would like to continue enforcing V1 page table for UNMANAGED domain. As
> in terms of IOMMU caching, it performs better than V2 page table.
> 
> Also while adding SVA in AMD driver Jason mentioned that we should support PASID
> with UNMANAGED domain. As I understand currently we don't have this feature in
> upstream but we would like to support it in future. Keeping this use case also
> in mind, we came up with below two options :
> 
>    1 - Pass domain type : domain_alloc_paging(dev, type)
>     When we add PASID with UNMANAGED domain we need a way to differentiate the
> domain 'type' (See below attach sample code)

The domain type is not enough for the iommu driver to differentiate
between device or PASID. For example, a domain of type UNMANAGED could
be attached to either a device or a PASID. Furthermore, SVA domain is
not a paging domain.

>    2 - Introduce new flag for domain_alloc_paging()
>      Something like :
>      #define DOMAIN_FLAG_UNMANAGED	0x01
>      #define DOMAIN_FLAG_UNMANAGED_SVA	0x02
> 
>      domain_alloc_paging(dev, flag)
>      For now we will have 'DOMAIN_PAGING_UNMANAGED'.

Add an allocation flag seems to work. It doesn't indicate any type of
domain. Instead, it indicates the required iommu feature.

Something like

#define DOMAIN_ALLOC_FLAG_PASID		0x1

means the allocated domain requires PASID to support on both device and
IOMMU. If the hardware lacks this support, it should return failure.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-06-28  6:43 [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver Vasant Hegde
  2024-06-28 12:23 ` Baolu Lu
@ 2024-06-28 13:03 ` Jason Gunthorpe
  2024-06-28 17:49   ` Vasant Hegde
  1 sibling, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2024-06-28 13:03 UTC (permalink / raw)
  To: Vasant Hegde
  Cc: Joerg Roedel, iommu@lists.linux.dev, Suravee Suthikulpanit,
	Will Deacon, Robin Murphy, Baolu Lu

On Fri, Jun 28, 2024 at 12:13:43PM +0530, Vasant Hegde wrote:
> Hi All,
> 
> We are working on adding domain_alloc_paging() support in AMD driver and came
> across below issue.
> 
> BACKGROUND:
> ============
> - AMD IOMMU HW has two different page tables : V1 (host page table) and V2
> (guest page table). Only V2 page table supports PASID and PRI features.
> 
> - With V2 page table we have an aliasing issue. Hence we added
> per-device-domain-id when domain is configured with v2 page table. See upstream
> commit 87a6f1f22c97 ("iommu/amd: Introduce per-device domain ID to fix potential
> TLB aliasing issue")

Yes, but IIRC this is a shortcut to developing a proper packing
algorithm to optimize the IOTLB. HW like this that has aliasing issues
needs some more complex SW support to get optimal usage.

ie you can share DIDs if devices have a logically equivilant GCR3
table. Optimizing this is a SW problem inside the driver and should
not leak out to API.

> With iommu_ops->domain_alloc_paging(dev) API AMD driver will chose best page
> table based on device capabilities (V2 for PASID capable device and V1 page
> table for rest of the devices).

Yes, this is correct.

> But we would like to continue enforcing V1 page table for UNMANAGED domain. As
> in terms of IOMMU caching, it performs better than V2 page table.

We are getting rid of UNMANAGED domains. And we are adding PASID
support to VFIO.

There is no reason VFIO should have a V1 domain by default and end up
with a non-working VFIO PASID API. That doesn't make any sense.

You need to actually explain when and why you need V1 page table
support in the VFIO context.

> Also while adding SVA in AMD driver Jason mentioned that we should support PASID
> with UNMANAGED domain. As I understand currently we don't have this feature in
> upstream but we would like to support it in future.

My patch series enables it on SMMUv3 and Intel VT-d already supports
it. Only AMD is missing the funtionality, which is why I keep asking
you to structure things properly so it gets it too :)

> Please let us know which one is preferred -OR- is there any other better way to
> handle this.

Neither is really going to work. VFIO will have to assume the user
will want to use the PASID API and will always request a PASID capable
domain anyhow.

We already have a path where the VFIO userspace can request a v1
domain by using the NESTING_PARENT flags during user domain
allocation.

If it is really important and logical we could also add a NO_PASID
flag to hint to the driver that userspace doesn't want to use PASID in
combination with this domain. That could also trigger v1.

But I don't see any option here that doesn't involve userspace itself
making a request and indicating it wants a degrated VFIO
functionality.

Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-06-28 12:23 ` Baolu Lu
@ 2024-06-28 13:06   ` Robin Murphy
  2024-06-28 17:08     ` Vasant Hegde
  2024-06-28 14:50   ` Vasant Hegde
  1 sibling, 1 reply; 30+ messages in thread
From: Robin Murphy @ 2024-06-28 13:06 UTC (permalink / raw)
  To: Baolu Lu, Vasant Hegde, Joerg Roedel, Jason Gunthorpe,
	iommu@lists.linux.dev
  Cc: Suravee Suthikulpanit, Will Deacon

On 28/06/2024 1:23 pm, Baolu Lu wrote:
> On 2024/6/28 14:43, Vasant Hegde wrote:
>> We are working on adding domain_alloc_paging() support in AMD driver 
>> and came
>> across below issue.
>>
>> BACKGROUND:
>> ============
>> - AMD IOMMU HW has two different page tables : V1 (host page table) 
>> and V2
>> (guest page table). Only V2 page table supports PASID and PRI features.
>>
>> - With V2 page table we have an aliasing issue. Hence we added
>> per-device-domain-id when domain is configured with v2 page table. See 
>> upstream
>> commit 87a6f1f22c97 ("iommu/amd: Introduce per-device domain ID to fix 
>> potential
>> TLB aliasing issue")
>>
>>
>>
>> PROBLEM :
>> =========
>> With iommu_ops->domain_alloc_paging(dev) API AMD driver will chose 
>> best page
>> table based on device capabilities (V2 for PASID capable device and V1 
>> page
>> table for rest of the devices).
>>
>> But we would like to continue enforcing V1 page table for UNMANAGED 
>> domain. As
>> in terms of IOMMU caching, it performs better than V2 page table.
>>
>> Also while adding SVA in AMD driver Jason mentioned that we should 
>> support PASID
>> with UNMANAGED domain. As I understand currently we don't have this 
>> feature in
>> upstream but we would like to support it in future. Keeping this use 
>> case also
>> in mind, we came up with below two options :
>>
>>    1 - Pass domain type : domain_alloc_paging(dev, type)
>>     When we add PASID with UNMANAGED domain we need a way to 
>> differentiate the
>> domain 'type' (See below attach sample code)
> 
> The domain type is not enough for the iommu driver to differentiate
> between device or PASID. For example, a domain of type UNMANAGED could
> be attached to either a device or a PASID. Furthermore, SVA domain is
> not a paging domain.
> 
>>    2 - Introduce new flag for domain_alloc_paging()
>>      Something like :
>>      #define DOMAIN_FLAG_UNMANAGED    0x01
>>      #define DOMAIN_FLAG_UNMANAGED_SVA    0x02
>>
>>      domain_alloc_paging(dev, flag)
>>      For now we will have 'DOMAIN_PAGING_UNMANAGED'.
> 
> Add an allocation flag seems to work. It doesn't indicate any type of
> domain. Instead, it indicates the required iommu feature.
> 
> Something like
> 
> #define DOMAIN_ALLOC_FLAG_PASID        0x1
> 
> means the allocated domain requires PASID to support on both device and
> IOMMU. If the hardware lacks this support, it should return failure.

Yeah, we could do with something sufficiently general, as we'll also 
have a need for passing pagetable-format options through to drivers to 
be able to replace iommu_set_pgtable_quirks().

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-06-28 12:23 ` Baolu Lu
  2024-06-28 13:06   ` Robin Murphy
@ 2024-06-28 14:50   ` Vasant Hegde
  2024-06-28 15:34     ` Jason Gunthorpe
  1 sibling, 1 reply; 30+ messages in thread
From: Vasant Hegde @ 2024-06-28 14:50 UTC (permalink / raw)
  To: Baolu Lu, Joerg Roedel, Jason Gunthorpe, iommu@lists.linux.dev
  Cc: Suravee Suthikulpanit, Will Deacon, Robin Murphy

Hi Lu,


On 6/28/2024 5:53 PM, Baolu Lu wrote:
> On 2024/6/28 14:43, Vasant Hegde wrote:
>> We are working on adding domain_alloc_paging() support in AMD driver and came
>> across below issue.
>>
>> BACKGROUND:
>> ============
>> - AMD IOMMU HW has two different page tables : V1 (host page table) and V2
>> (guest page table). Only V2 page table supports PASID and PRI features.
>>
>> - With V2 page table we have an aliasing issue. Hence we added
>> per-device-domain-id when domain is configured with v2 page table. See upstream
>> commit 87a6f1f22c97 ("iommu/amd: Introduce per-device domain ID to fix potential
>> TLB aliasing issue")
>>
>>
>>
>> PROBLEM :
>> =========
>> With iommu_ops->domain_alloc_paging(dev) API AMD driver will chose best page
>> table based on device capabilities (V2 for PASID capable device and V1 page
>> table for rest of the devices).
>>
>> But we would like to continue enforcing V1 page table for UNMANAGED domain. As
>> in terms of IOMMU caching, it performs better than V2 page table.
>>
>> Also while adding SVA in AMD driver Jason mentioned that we should support PASID
>> with UNMANAGED domain. As I understand currently we don't have this feature in
>> upstream but we would like to support it in future. Keeping this use case also
>> in mind, we came up with below two options :
>>
>>    1 - Pass domain type : domain_alloc_paging(dev, type)
>>     When we add PASID with UNMANAGED domain we need a way to differentiate the
>> domain 'type' (See below attach sample code)
> 
> The domain type is not enough for the iommu driver to differentiate
> between device or PASID. For example, a domain of type UNMANAGED could
> be attached to either a device or a PASID. Furthermore, SVA domain is
> not a paging domain.
> 
>>    2 - Introduce new flag for domain_alloc_paging()
>>      Something like :
>>      #define DOMAIN_FLAG_UNMANAGED    0x01
>>      #define DOMAIN_FLAG_UNMANAGED_SVA    0x02
>>
>>      domain_alloc_paging(dev, flag)
>>      For now we will have 'DOMAIN_PAGING_UNMANAGED'.
> 
> Add an allocation flag seems to work. It doesn't indicate any type of
> domain. Instead, it indicates the required iommu feature.
> 
> Something like
> 
> #define DOMAIN_ALLOC_FLAG_PASID        0x1
> 
> means the allocated domain requires PASID to support on both device and
> IOMMU. If the hardware lacks this support, it should return failure.

Sure. Flag is fine for us. We can do
	domain_alloc_paging(dev, flag)

But that means:
  - For IOMMU_DOMAIN_DMA[_FQ] : pass DOMAIN_ALLOC_FLAG_PASID
  - DOMAIN_PAGING_UNMANAGED  : *No* flags

.. and when PASID with UNMANAGED domain support is added, we should pass
ALLOC_FLAG_PASID.

Is this fine for other drivers? Do you see any issue?

-Vasant



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-06-28 14:50   ` Vasant Hegde
@ 2024-06-28 15:34     ` Jason Gunthorpe
  0 siblings, 0 replies; 30+ messages in thread
From: Jason Gunthorpe @ 2024-06-28 15:34 UTC (permalink / raw)
  To: Vasant Hegde
  Cc: Baolu Lu, Joerg Roedel, iommu@lists.linux.dev,
	Suravee Suthikulpanit, Will Deacon, Robin Murphy

On Fri, Jun 28, 2024 at 08:20:50PM +0530, Vasant Hegde wrote:

> > means the allocated domain requires PASID to support on both device and
> > IOMMU. If the hardware lacks this support, it should return failure.
> 
> Sure. Flag is fine for us. We can do
> 	domain_alloc_paging(dev, flag)
> 
> But that means:
>   - For IOMMU_DOMAIN_DMA[_FQ] : pass DOMAIN_ALLOC_FLAG_PASID
>   - DOMAIN_PAGING_UNMANAGED  : *No* flags

As I said, VFIO would have to set the flag once it gets PASID support,
which is likely to happen in a few months.

So this won't solve the problem you stated of poor DID caching.

IMHO if you want to optimize the DID assignment then that is a
algorithm contained in the driver and we don't need to disturb any of
the core code or uAPI for you to make that optimization.

JAson

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-06-28 13:06   ` Robin Murphy
@ 2024-06-28 17:08     ` Vasant Hegde
  2024-06-28 17:58       ` Robin Murphy
  0 siblings, 1 reply; 30+ messages in thread
From: Vasant Hegde @ 2024-06-28 17:08 UTC (permalink / raw)
  To: Robin Murphy, Baolu Lu, Joerg Roedel, Jason Gunthorpe,
	iommu@lists.linux.dev
  Cc: Suravee Suthikulpanit, Will Deacon

Hi Robin,

On 6/28/2024 6:36 PM, Robin Murphy wrote:
> On 28/06/2024 1:23 pm, Baolu Lu wrote:
>> On 2024/6/28 14:43, Vasant Hegde wrote:
>>> We are working on adding domain_alloc_paging() support in AMD driver and came
>>> across below issue.
>>>

.../...

>> Add an allocation flag seems to work. It doesn't indicate any type of
>> domain. Instead, it indicates the required iommu feature.
>>
>> Something like
>>
>> #define DOMAIN_ALLOC_FLAG_PASID        0x1
>>
>> means the allocated domain requires PASID to support on both device and
>> IOMMU. If the hardware lacks this support, it should return failure.
> 
> Yeah, we could do with something sufficiently general, as we'll also have a need
> for passing pagetable-format options through to drivers to be able to replace
> iommu_set_pgtable_quirks().

Sorry. I doubt I understood it completely. You mean to say:
Enhance iommu_domain_alloc() such that caller can pass any special quirks they
need and then pass those quirk flag along with the other flags we need (like
PASID support or not) to indivisual driver? So that at the end we are remove
ops->set_pgtable_quirks().

Is that what you are thinking or something else?

-Vasant


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-06-28 13:03 ` Jason Gunthorpe
@ 2024-06-28 17:49   ` Vasant Hegde
  2024-06-28 18:04     ` Jason Gunthorpe
  0 siblings, 1 reply; 30+ messages in thread
From: Vasant Hegde @ 2024-06-28 17:49 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Joerg Roedel, iommu@lists.linux.dev, Suravee Suthikulpanit,
	Will Deacon, Robin Murphy, Baolu Lu

Hi Jason,

On 6/28/2024 6:33 PM, Jason Gunthorpe wrote:
> On Fri, Jun 28, 2024 at 12:13:43PM +0530, Vasant Hegde wrote:
>> Hi All,
>>
>> We are working on adding domain_alloc_paging() support in AMD driver and came
>> across below issue.
>>
>> BACKGROUND:
>> ============
>> - AMD IOMMU HW has two different page tables : V1 (host page table) and V2
>> (guest page table). Only V2 page table supports PASID and PRI features.
>>
>> - With V2 page table we have an aliasing issue. Hence we added
>> per-device-domain-id when domain is configured with v2 page table. See upstream
>> commit 87a6f1f22c97 ("iommu/amd: Introduce per-device domain ID to fix potential
>> TLB aliasing issue")
> 
> Yes, but IIRC this is a shortcut to developing a proper packing
> algorithm to optimize the IOTLB. HW like this that has aliasing issues
> needs some more complex SW support to get optimal usage.
> 
> ie you can share DIDs if devices have a logically equivilant GCR3
> table. Optimizing this is a SW problem inside the driver and should
> not leak out to API.

Its not just SW issue. With V1 page table HW supports variable page size. (not
just fixed 4k, 2M and 1G). So in terms of HW cache management, V1 page table is
better.

Another issue is with VFIO device passthrough and mixed device passthrough (few
w/ PASID and few w/o PASID), type of domain we endup allocating is depends on
the order in which VFIO requested for domain allocation. So its not deterministic.

> 
>> With iommu_ops->domain_alloc_paging(dev) API AMD driver will chose best page
>> table based on device capabilities (V2 for PASID capable device and V1 page
>> table for rest of the devices).
> 
> Yes, this is correct.
> 
>> But we would like to continue enforcing V1 page table for UNMANAGED domain. As
>> in terms of IOMMU caching, it performs better than V2 page table.
> 
> We are getting rid of UNMANAGED domains. And we are adding PASID
> support to VFIO.
> 
> There is no reason VFIO should have a V1 domain by default and end up
> with a non-working VFIO PASID API. That doesn't make any sense.
> 
> You need to actually explain when and why you need V1 page table
> support in the VFIO context.

Adding further to above explanation, driver should have enough information to
decide page table type.

> 
>> Also while adding SVA in AMD driver Jason mentioned that we should support PASID
>> with UNMANAGED domain. As I understand currently we don't have this feature in
>> upstream but we would like to support it in future.
> 
> My patch series enables it on SMMUv3 and Intel VT-d already supports
> it. Only AMD is missing the funtionality, which is why I keep asking
> you to structure things properly so it gets it too :)

I have not looked into this series yet. We will look into it after fixing domain
allocation path and few other things.

By the way, can you point me to your series please?

> 
>> Please let us know which one is preferred -OR- is there any other better way to
>> handle this.
> 
> Neither is really going to work. VFIO will have to assume the user
> will want to use the PASID API and will always request a PASID capable
> domain anyhow.

Even to make PASID support with VFIO, some way it should communicate the driver
saying like "I need PASID capable domain". So that driver can allocate with
right page table type.

> 
> We already have a path where the VFIO userspace can request a v1
> domain by using the NESTING_PARENT flags during user domain
> allocation.

That's with domain_alloc_user() API right? As I understand that should work fine
for AMD driver.

> 
> If it is really important and logical we could also add a NO_PASID
> flag to hint to the driver that userspace doesn't want to use PASID in
> combination with this domain. That could also trigger v1.
> 
> But I don't see any option here that doesn't involve userspace itself
> making a request and indicating it wants a degrated VFIO
> functionality.

That's along an option. We can explore it later.

-Vasant


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-06-28 17:08     ` Vasant Hegde
@ 2024-06-28 17:58       ` Robin Murphy
  2024-07-01 10:28         ` Vasant Hegde
  0 siblings, 1 reply; 30+ messages in thread
From: Robin Murphy @ 2024-06-28 17:58 UTC (permalink / raw)
  To: Vasant Hegde, Baolu Lu, Joerg Roedel, Jason Gunthorpe,
	iommu@lists.linux.dev
  Cc: Suravee Suthikulpanit, Will Deacon

On 28/06/2024 6:08 pm, Vasant Hegde wrote:
> Hi Robin,
> 
> On 6/28/2024 6:36 PM, Robin Murphy wrote:
>> On 28/06/2024 1:23 pm, Baolu Lu wrote:
>>> On 2024/6/28 14:43, Vasant Hegde wrote:
>>>> We are working on adding domain_alloc_paging() support in AMD driver and came
>>>> across below issue.
>>>>
> 
> .../...
> 
>>> Add an allocation flag seems to work. It doesn't indicate any type of
>>> domain. Instead, it indicates the required iommu feature.
>>>
>>> Something like
>>>
>>> #define DOMAIN_ALLOC_FLAG_PASID        0x1
>>>
>>> means the allocated domain requires PASID to support on both device and
>>> IOMMU. If the hardware lacks this support, it should return failure.
>>
>> Yeah, we could do with something sufficiently general, as we'll also have a need
>> for passing pagetable-format options through to drivers to be able to replace
>> iommu_set_pgtable_quirks().
> 
> Sorry. I doubt I understood it completely. You mean to say:
> Enhance iommu_domain_alloc() such that caller can pass any special quirks they
> need and then pass those quirk flag along with the other flags we need (like
> PASID support or not) to indivisual driver? So that at the end we are remove
> ops->set_pgtable_quirks().
> 
> Is that what you are thinking or something else?

Basically, what set_pgtable_quirks does currently will need to be known 
at domain_alloc_paging time in order to convert arm-smmu to the latter 
properly. You don't need to worry about the public interface for it just 
yet, but if we're going to start extending the driver-level interface 
already then it would be good to plan ahead for what other domain 
properties we may want to accommodate callers requesting. The things we 
have as io-pgtable quirks today are definite ones (and those would 
happen to be OK as simple flags), but it's always seemed like a good 
idea for callers to be able to indicate their preference for other 
things like VA size (or even full domain geometry), even perhaps 
specific io-pgtable formats, such that the IOMMU driver can make the 
most effective choice about what it actually allocates.

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-06-28 17:49   ` Vasant Hegde
@ 2024-06-28 18:04     ` Jason Gunthorpe
  2024-07-01 10:48       ` Vasant Hegde
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2024-06-28 18:04 UTC (permalink / raw)
  To: Vasant Hegde
  Cc: Joerg Roedel, iommu@lists.linux.dev, Suravee Suthikulpanit,
	Will Deacon, Robin Murphy, Baolu Lu

On Fri, Jun 28, 2024 at 11:19:23PM +0530, Vasant Hegde wrote:
> Hi Jason,
> 
> On 6/28/2024 6:33 PM, Jason Gunthorpe wrote:
> > On Fri, Jun 28, 2024 at 12:13:43PM +0530, Vasant Hegde wrote:
> >> Hi All,
> >>
> >> We are working on adding domain_alloc_paging() support in AMD driver and came
> >> across below issue.
> >>
> >> BACKGROUND:
> >> ============
> >> - AMD IOMMU HW has two different page tables : V1 (host page table) and V2
> >> (guest page table). Only V2 page table supports PASID and PRI features.
> >>
> >> - With V2 page table we have an aliasing issue. Hence we added
> >> per-device-domain-id when domain is configured with v2 page table. See upstream
> >> commit 87a6f1f22c97 ("iommu/amd: Introduce per-device domain ID to fix potential
> >> TLB aliasing issue")
> > 
> > Yes, but IIRC this is a shortcut to developing a proper packing
> > algorithm to optimize the IOTLB. HW like this that has aliasing issues
> > needs some more complex SW support to get optimal usage.
> > 
> > ie you can share DIDs if devices have a logically equivilant GCR3
> > table. Optimizing this is a SW problem inside the driver and should
> > not leak out to API.
> 
> Its not just SW issue. With V1 page table HW supports variable page size. (not
> just fixed 4k, 2M and 1G). So in terms of HW cache management, V1 page table is
> better.

Okay, that does make alot of sense.

> Another issue is with VFIO device passthrough and mixed device passthrough (few
> w/ PASID and few w/o PASID), type of domain we endup allocating is depends on
> the order in which VFIO requested for domain allocation. So its not deterministic.

Yes, VFIO is not good about optimizing disjoint domain types to
minimize domain requirements. This probably does need some more
work. You have the other problem too where if you attach the non-pasid
device first then pasid will be blocked and this is not expected
either.

> By the way, can you point me to your series please?

https://lore.kernel.org/linux-iommu/0-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvidia.com/
 
> > 
> >> Please let us know which one is preferred -OR- is there any other better way to
> >> handle this.
> > 
> > Neither is really going to work. VFIO will have to assume the user
> > will want to use the PASID API and will always request a PASID capable
> > domain anyhow.
> 
> Even to make PASID support with VFIO, some way it should communicate the driver
> saying like "I need PASID capable domain". So that driver can allocate with
> right page table type.

Well, that is already implicit in the fact it asked for the domain from a
PASID capable device.

What we perhaps need is for VFIO to have some way to evaluate a lot of
devices together and decide on the best domain configuration for the
full set. It prpbably makes little sense to have a v1 page table and
then a copy with a v2 page table, for the PASID device.

> > We already have a path where the VFIO userspace can request a v1
> > domain by using the NESTING_PARENT flags during user domain
> > allocation.
> 
> That's with domain_alloc_user() API right? As I understand that should work fine
> for AMD driver.

Yes I expect so.
 
> > But I don't see any option here that doesn't involve userspace itself
> > making a request and indicating it wants a degrated VFIO
> > functionality.
> 
> That's along an option. We can explore it later.

But there is no later.

Your said your desire is to get a v1 domain even if the device
supports PASID, and VFIO will get PASID support likely before you post
your patches for this. Yi's work looks almost done to me.

Then VFIO will just request the V2 domain anyhow and you are right
back to the starting problem again.

Given your remarks it may make sense that VIFO disable PASID support
by default so that by default AMD does not have a performance
regression. You should start discussing this with Alex in Yi's series
to come to a decision. It makes sense to me at least.

If that is the choice then the idea of adding some flags/etc would
work well.

Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-06-28 17:58       ` Robin Murphy
@ 2024-07-01 10:28         ` Vasant Hegde
  0 siblings, 0 replies; 30+ messages in thread
From: Vasant Hegde @ 2024-07-01 10:28 UTC (permalink / raw)
  To: Robin Murphy, Baolu Lu, Joerg Roedel, Jason Gunthorpe,
	iommu@lists.linux.dev
  Cc: Suravee Suthikulpanit, Will Deacon

Hi Robin,


On 6/28/2024 11:28 PM, Robin Murphy wrote:
> On 28/06/2024 6:08 pm, Vasant Hegde wrote:
>> Hi Robin,
>>
>> On 6/28/2024 6:36 PM, Robin Murphy wrote:
>>> On 28/06/2024 1:23 pm, Baolu Lu wrote:
>>>> On 2024/6/28 14:43, Vasant Hegde wrote:
>>>>> We are working on adding domain_alloc_paging() support in AMD driver and came
>>>>> across below issue.
>>>>>
>>
>> .../...
>>
>>>> Add an allocation flag seems to work. It doesn't indicate any type of
>>>> domain. Instead, it indicates the required iommu feature.
>>>>
>>>> Something like
>>>>
>>>> #define DOMAIN_ALLOC_FLAG_PASID        0x1
>>>>
>>>> means the allocated domain requires PASID to support on both device and
>>>> IOMMU. If the hardware lacks this support, it should return failure.
>>>
>>> Yeah, we could do with something sufficiently general, as we'll also have a need
>>> for passing pagetable-format options through to drivers to be able to replace
>>> iommu_set_pgtable_quirks().
>>
>> Sorry. I doubt I understood it completely. You mean to say:
>> Enhance iommu_domain_alloc() such that caller can pass any special quirks they
>> need and then pass those quirk flag along with the other flags we need (like
>> PASID support or not) to indivisual driver? So that at the end we are remove
>> ops->set_pgtable_quirks().
>>
>> Is that what you are thinking or something else?
> 
> Basically, what set_pgtable_quirks does currently will need to be known at
> domain_alloc_paging time in order to convert arm-smmu to the latter properly.
> You don't need to worry about the public interface for it just yet, but if we're
> going to start extending the driver-level interface already then it would be
> good to plan ahead for what other domain properties we may want to accommodate
> callers requesting. The things we have as io-pgtable quirks today are definite
> ones (and those would happen to be OK as simple flags), but it's always seemed
> like a good idea for callers to be able to indicate their preference for other
> things like VA size (or even full domain geometry), even perhaps specific
> io-pgtable formats, such that the IOMMU driver can make the most effective
> choice about what it actually allocates.

Thanks for explaining. It does make sense to pass enough details to individual
drivers.

-Vasant

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-06-28 18:04     ` Jason Gunthorpe
@ 2024-07-01 10:48       ` Vasant Hegde
  2024-07-01 17:26         ` Jason Gunthorpe
  0 siblings, 1 reply; 30+ messages in thread
From: Vasant Hegde @ 2024-07-01 10:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Joerg Roedel, iommu@lists.linux.dev, Suravee Suthikulpanit,
	Will Deacon, Robin Murphy, Baolu Lu

Joson,


On 6/28/2024 11:34 PM, Jason Gunthorpe wrote:
> On Fri, Jun 28, 2024 at 11:19:23PM +0530, Vasant Hegde wrote:
>> Hi Jason,
>>
>> On 6/28/2024 6:33 PM, Jason Gunthorpe wrote:
>>> On Fri, Jun 28, 2024 at 12:13:43PM +0530, Vasant Hegde wrote:
>>>> Hi All,
>>>>
>>>> We are working on adding domain_alloc_paging() support in AMD driver and came
>>>> across below issue.
>>>>
>>>> BACKGROUND:
>>>> ============
>>>> - AMD IOMMU HW has two different page tables : V1 (host page table) and V2
>>>> (guest page table). Only V2 page table supports PASID and PRI features.
>>>>
>>>> - With V2 page table we have an aliasing issue. Hence we added
>>>> per-device-domain-id when domain is configured with v2 page table. See upstream
>>>> commit 87a6f1f22c97 ("iommu/amd: Introduce per-device domain ID to fix potential
>>>> TLB aliasing issue")
>>>
>>> Yes, but IIRC this is a shortcut to developing a proper packing
>>> algorithm to optimize the IOTLB. HW like this that has aliasing issues
>>> needs some more complex SW support to get optimal usage.
>>>
>>> ie you can share DIDs if devices have a logically equivilant GCR3
>>> table. Optimizing this is a SW problem inside the driver and should
>>> not leak out to API.
>>
>> Its not just SW issue. With V1 page table HW supports variable page size. (not
>> just fixed 4k, 2M and 1G). So in terms of HW cache management, V1 page table is
>> better.
> 
> Okay, that does make alot of sense.
> 
>> Another issue is with VFIO device passthrough and mixed device passthrough (few
>> w/ PASID and few w/o PASID), type of domain we endup allocating is depends on
>> the order in which VFIO requested for domain allocation. So its not deterministic.
> 
> Yes, VFIO is not good about optimizing disjoint domain types to
> minimize domain requirements. This probably does need some more
> work. You have the other problem too where if you attach the non-pasid
> device first then pasid will be blocked and this is not expected
> either.

Right. If we solve the domain allocation issue, then we can fix our attach
device path to solve this (Yes. It needs code change. But its doable).

> 
>> By the way, can you point me to your series please?
> 
> https://lore.kernel.org/linux-iommu/0-v9-5cd718286059+79186-smmuv3_newapi_p2b_jgg@nvidia.com/

Thanks.

>  
>>>
>>>> Please let us know which one is preferred -OR- is there any other better way to
>>>> handle this.
>>>
>>> Neither is really going to work. VFIO will have to assume the user
>>> will want to use the PASID API and will always request a PASID capable
>>> domain anyhow.
>>
>> Even to make PASID support with VFIO, some way it should communicate the driver
>> saying like "I need PASID capable domain". So that driver can allocate with
>> right page table type.
> 
> Well, that is already implicit in the fact it asked for the domain from a
> PASID capable device.

All we need a explicit indication in domain_alloc_paging() saying allocate
*PASID* capable domain.

> 
> What we perhaps need is for VFIO to have some way to evaluate a lot of
> devices together and decide on the best domain configuration for the
> full set. It prpbably makes little sense to have a v1 page table and
> then a copy with a v2 page table, for the PASID device.
> 
>>> We already have a path where the VFIO userspace can request a v1
>>> domain by using the NESTING_PARENT flags during user domain
>>> allocation.
>>
>> That's with domain_alloc_user() API right? As I understand that should work fine
>> for AMD driver.
> 
> Yes I expect so.

Thanks.

>  
>>> But I don't see any option here that doesn't involve userspace itself
>>> making a request and indicating it wants a degrated VFIO
>>> functionality.
>>
>> That's along an option. We can explore it later.
> 
> But there is no later.
> 
> Your said your desire is to get a v1 domain even if the device
> supports PASID, and VFIO will get PASID support likely before you post
> your patches for this. Yi's work looks almost done to me.
> 
> Then VFIO will just request the V2 domain anyhow and you are right
> back to the starting problem again.

Let me see if I can put together here.


- Basically we want domain_alloc_paging() to explicitly tell domain requirement
(ex: allocate PASID capable UNMANAGED domain)
  If we do that then we will fix ordering issue I described earlier.
- Fix attach device path so that for 'PASID capable UNMANAGED domain' :
  non-PASID capable device will use AMD V2 page table with GCR3[0]
  PASID capable device will setup its GCR3 table as well and it should work fine.

Net we need a enhancement to domain_alloc_paging(). Otherwise I don't see a way
to reliable implement things.

Regarding performance impact: Thanks for pointing out. I will start discussion
in that thread. Lets see if we can come up with something.

But these two can go independently?

-Vasant

> 
> Given your remarks it may make sense that VIFO disable PASID support
> by default so that by default AMD does not have a performance
> regression. You should start discussing this with Alex in Yi's series
> to come to a decision. It makes sense to me at least.
> 
> If that is the choice then the idea of adding some flags/etc would
> work well.
> 
> Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-07-01 10:48       ` Vasant Hegde
@ 2024-07-01 17:26         ` Jason Gunthorpe
  2024-07-03  5:42           ` Vasant Hegde
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2024-07-01 17:26 UTC (permalink / raw)
  To: Vasant Hegde
  Cc: Joerg Roedel, iommu@lists.linux.dev, Suravee Suthikulpanit,
	Will Deacon, Robin Murphy, Baolu Lu

On Mon, Jul 01, 2024 at 04:18:01PM +0530, Vasant Hegde wrote:

> > Then VFIO will just request the V2 domain anyhow and you are right
> > back to the starting problem again.
> 
> Let me see if I can put together here.
> 
> 
> - Basically we want domain_alloc_paging() to explicitly tell domain requirement
> (ex: allocate PASID capable UNMANAGED domain)
>   If we do that then we will fix ordering issue I described earlier.
> - Fix attach device path so that for 'PASID capable UNMANAGED domain' :
>   non-PASID capable device will use AMD V2 page table with GCR3[0]
>   PASID capable device will setup its GCR3 table as well and it should work fine.
> 
> Net we need a enhancement to domain_alloc_paging(). Otherwise I don't see a way
> to reliable implement things.

Yes, some flag at domain allocation time is clearly needed

But you need the whole story, including when will VFIO set the flag,
which requires a uAPI someplace.

> But these two can go independently?

If PASID goes with a default-on uAPI then you loose what you are
asking for as VFIO will always request PASID.

So the overall plan has to be sorted out now..

Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-07-01 17:26         ` Jason Gunthorpe
@ 2024-07-03  5:42           ` Vasant Hegde
  2024-07-03  6:57             ` Yi Liu
  0 siblings, 1 reply; 30+ messages in thread
From: Vasant Hegde @ 2024-07-03  5:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Joerg Roedel, iommu@lists.linux.dev, Suravee Suthikulpanit,
	Will Deacon, Robin Murphy, Baolu Lu

Jason,


On 7/1/2024 10:56 PM, Jason Gunthorpe wrote:
> On Mon, Jul 01, 2024 at 04:18:01PM +0530, Vasant Hegde wrote:
> 
>>> Then VFIO will just request the V2 domain anyhow and you are right
>>> back to the starting problem again.
>>
>> Let me see if I can put together here.
>>
>>
>> - Basically we want domain_alloc_paging() to explicitly tell domain requirement
>> (ex: allocate PASID capable UNMANAGED domain)
>>   If we do that then we will fix ordering issue I described earlier.
>> - Fix attach device path so that for 'PASID capable UNMANAGED domain' :
>>   non-PASID capable device will use AMD V2 page table with GCR3[0]
>>   PASID capable device will setup its GCR3 table as well and it should work fine.
>>
>> Net we need a enhancement to domain_alloc_paging(). Otherwise I don't see a way
>> to reliable implement things.
> 
> Yes, some flag at domain allocation time is clearly needed

Right. I will try to get this going and then we can fine tune with Robin's
suggestion.

> 
> But you need the whole story, including when will VFIO set the flag,
> which requires a uAPI someplace.
> 
>> But these two can go independently?
> 
> If PASID goes with a default-on uAPI then you loose what you are
> asking for as VFIO will always request PASID.
> 
> So the overall plan has to be sorted out now..

Is this the Yi's thread you were referring ?

https://lore.kernel.org/linux-iommu/20240628090557.50898-1-yi.l.liu@intel.com/

-Vasant

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-07-03  5:42           ` Vasant Hegde
@ 2024-07-03  6:57             ` Yi Liu
  2024-07-09 18:23               ` Jason Gunthorpe
  2024-07-11 10:15               ` Vasant Hegde
  0 siblings, 2 replies; 30+ messages in thread
From: Yi Liu @ 2024-07-03  6:57 UTC (permalink / raw)
  To: Vasant Hegde, Jason Gunthorpe
  Cc: Joerg Roedel, iommu@lists.linux.dev, Suravee Suthikulpanit,
	Will Deacon, Robin Murphy, Baolu Lu

On 2024/7/3 13:42, Vasant Hegde wrote:
> Jason,
> 
> 
> On 7/1/2024 10:56 PM, Jason Gunthorpe wrote:
>> On Mon, Jul 01, 2024 at 04:18:01PM +0530, Vasant Hegde wrote:
>>
>>>> Then VFIO will just request the V2 domain anyhow and you are right
>>>> back to the starting problem again.
>>>
>>> Let me see if I can put together here.
>>>
>>>
>>> - Basically we want domain_alloc_paging() to explicitly tell domain requirement
>>> (ex: allocate PASID capable UNMANAGED domain)
>>>    If we do that then we will fix ordering issue I described earlier.
>>> - Fix attach device path so that for 'PASID capable UNMANAGED domain' :
>>>    non-PASID capable device will use AMD V2 page table with GCR3[0]
>>>    PASID capable device will setup its GCR3 table as well and it should work fine.
>>>
>>> Net we need a enhancement to domain_alloc_paging(). Otherwise I don't see a way
>>> to reliable implement things.
>>
>> Yes, some flag at domain allocation time is clearly needed
> 
> Right. I will try to get this going and then we can fine tune with Robin's
> suggestion.
> 
>>
>> But you need the whole story, including when will VFIO set the flag,
>> which requires a uAPI someplace.
>>
>>> But these two can go independently?
>>
>> If PASID goes with a default-on uAPI then you loose what you are
>> asking for as VFIO will always request PASID.
>>
>> So the overall plan has to be sorted out now..
> 
> Is this the Yi's thread you were referring ?
> 
> https://lore.kernel.org/linux-iommu/20240628090557.50898-1-yi.l.liu@intel.com/

This is part of the whole work. What Jason refers may be the below one.
It includes how VFIO report the PASID capability to userspace. There is
an open to be closed though. I'll loop you in the next version. :)

https://lore.kernel.org/kvm/20240412082121.33382-1-yi.l.liu@intel.com/

BTW. For AMD, can a single device be attached to multiple V1 page table?
IIRC. looks like AMD has only one V1 page table pointer per BDF. But there
is a PASID table which has multiple cr3 page table pointers.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-07-03  6:57             ` Yi Liu
@ 2024-07-09 18:23               ` Jason Gunthorpe
  2024-07-10  4:17                 ` Yi Liu
  2024-07-11 10:15               ` Vasant Hegde
  1 sibling, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2024-07-09 18:23 UTC (permalink / raw)
  To: Yi Liu
  Cc: Vasant Hegde, Joerg Roedel, iommu@lists.linux.dev,
	Suravee Suthikulpanit, Will Deacon, Robin Murphy, Baolu Lu

On Wed, Jul 03, 2024 at 02:57:51PM +0800, Yi Liu wrote:

> BTW. For AMD, can a single device be attached to multiple V1 page table?
> IIRC. looks like AMD has only one V1 page table pointer per BDF. But there
> is a PASID table which has multiple cr3 page table pointers.

Yes, ARM and AMD both have kinds of iommu_domains that cannot be
attached to a PASID.

We are going to have to deal with that as well at some
point. Currently ARM is always creating PASID compatible iommu
domains.

While AMD would like to use a RID-only type by default.

Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-07-09 18:23               ` Jason Gunthorpe
@ 2024-07-10  4:17                 ` Yi Liu
  2024-07-11 23:49                   ` Jason Gunthorpe
  0 siblings, 1 reply; 30+ messages in thread
From: Yi Liu @ 2024-07-10  4:17 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Vasant Hegde, Joerg Roedel, iommu@lists.linux.dev,
	Suravee Suthikulpanit, Will Deacon, Robin Murphy, Baolu Lu

On 2024/7/10 02:23, Jason Gunthorpe wrote:
> On Wed, Jul 03, 2024 at 02:57:51PM +0800, Yi Liu wrote:
> 
>> BTW. For AMD, can a single device be attached to multiple V1 page table?
>> IIRC. looks like AMD has only one V1 page table pointer per BDF. But there
>> is a PASID table which has multiple cr3 page table pointers.
> 
> Yes, ARM and AMD both have kinds of iommu_domains that cannot be
> attached to a PASID.

I see. VT-d does not have such a limitation AFAIK. BTW. SVA domain shall
not be attached to RID I suppose. This should be a common restriction
across vendors. Do you think it is valuable to let iommu core block such
an attempt?

> We are going to have to deal with that as well at some
> point. Currently ARM is always creating PASID compatible iommu
> domains.
> 
> While AMD would like to use a RID-only type by default.

thanks for the explanation.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-07-03  6:57             ` Yi Liu
  2024-07-09 18:23               ` Jason Gunthorpe
@ 2024-07-11 10:15               ` Vasant Hegde
  2024-07-11 13:56                 ` Yi Liu
  1 sibling, 1 reply; 30+ messages in thread
From: Vasant Hegde @ 2024-07-11 10:15 UTC (permalink / raw)
  To: Yi Liu, Jason Gunthorpe
  Cc: Joerg Roedel, iommu@lists.linux.dev, Suravee Suthikulpanit,
	Will Deacon, Robin Murphy, Baolu Lu

Hi Yi,


On 7/3/2024 12:27 PM, Yi Liu wrote:
> On 2024/7/3 13:42, Vasant Hegde wrote:
>> Jason,
>>
>>
>> On 7/1/2024 10:56 PM, Jason Gunthorpe wrote:
>>> On Mon, Jul 01, 2024 at 04:18:01PM +0530, Vasant Hegde wrote:
>>>
>>>>> Then VFIO will just request the V2 domain anyhow and you are right
>>>>> back to the starting problem again.
>>>>
>>>> Let me see if I can put together here.
>>>>
>>>>
>>>> - Basically we want domain_alloc_paging() to explicitly tell domain requirement
>>>> (ex: allocate PASID capable UNMANAGED domain)
>>>>    If we do that then we will fix ordering issue I described earlier.
>>>> - Fix attach device path so that for 'PASID capable UNMANAGED domain' :
>>>>    non-PASID capable device will use AMD V2 page table with GCR3[0]
>>>>    PASID capable device will setup its GCR3 table as well and it should work
>>>> fine.
>>>>
>>>> Net we need a enhancement to domain_alloc_paging(). Otherwise I don't see a way
>>>> to reliable implement things.
>>>
>>> Yes, some flag at domain allocation time is clearly needed
>>
>> Right. I will try to get this going and then we can fine tune with Robin's
>> suggestion.
>>
>>>
>>> But you need the whole story, including when will VFIO set the flag,
>>> which requires a uAPI someplace.
>>>
>>>> But these two can go independently?
>>>
>>> If PASID goes with a default-on uAPI then you loose what you are
>>> asking for as VFIO will always request PASID.
>>>
>>> So the overall plan has to be sorted out now..
>>
>> Is this the Yi's thread you were referring ?
>>
>> https://lore.kernel.org/linux-iommu/20240628090557.50898-1-yi.l.liu@intel.com/
> 
> This is part of the whole work. What Jason refers may be the below one.
> It includes how VFIO report the PASID capability to userspace. There is
> an open to be closed though. I'll loop you in the next version. :)

Thank you Yi. I will start looking into this series.


> 
> https://lore.kernel.org/kvm/20240412082121.33382-1-yi.l.liu@intel.com/
> 
> BTW. For AMD, can a single device be attached to multiple V1 page table?
> IIRC. looks like AMD has only one V1 page table pointer per BDF. But there
> is a PASID table which has multiple cr3 page table pointers.

Jason covered this part in his response. Basically AMD has two page table :
  V1 which doesn't support PASID.
  V2 which support PASID

By default when host is booted it will be in V1 page table. What we are planning
is we will implement domain_alloc_paging() interface in AMD driver. As part of that:
  - PASID capable device will be placed in V2 page table, rest will be in V1
page table

For UNMANAGED domain (like vfio-pci), we would need a indication from core layer
saying whether we want to allocate PASID capable domain or not. So that
internally we can allocate appropriate page table type.

That means from VFIO layer we need to pass this detail to IOMMU layer.
For this part, I will start watching your patch. May be we can start discussion
in that thread itself.

-Vasant



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-07-11 10:15               ` Vasant Hegde
@ 2024-07-11 13:56                 ` Yi Liu
  2024-07-12  1:39                   ` Baolu Lu
  2024-07-15 10:39                   ` Vasant Hegde
  0 siblings, 2 replies; 30+ messages in thread
From: Yi Liu @ 2024-07-11 13:56 UTC (permalink / raw)
  To: Vasant Hegde, Jason Gunthorpe
  Cc: Joerg Roedel, iommu@lists.linux.dev, Suravee Suthikulpanit,
	Will Deacon, Robin Murphy, Baolu Lu

On 2024/7/11 18:15, Vasant Hegde wrote:
> Hi Yi,
> 
> 
> On 7/3/2024 12:27 PM, Yi Liu wrote:
>> On 2024/7/3 13:42, Vasant Hegde wrote:
>>> Jason,
>>>
>>>
>>> On 7/1/2024 10:56 PM, Jason Gunthorpe wrote:
>>>> On Mon, Jul 01, 2024 at 04:18:01PM +0530, Vasant Hegde wrote:
>>>>
>>>>>> Then VFIO will just request the V2 domain anyhow and you are right
>>>>>> back to the starting problem again.
>>>>>
>>>>> Let me see if I can put together here.
>>>>>
>>>>>
>>>>> - Basically we want domain_alloc_paging() to explicitly tell domain requirement
>>>>> (ex: allocate PASID capable UNMANAGED domain)
>>>>>     If we do that then we will fix ordering issue I described earlier.
>>>>> - Fix attach device path so that for 'PASID capable UNMANAGED domain' :
>>>>>     non-PASID capable device will use AMD V2 page table with GCR3[0]
>>>>>     PASID capable device will setup its GCR3 table as well and it should work
>>>>> fine.
>>>>>
>>>>> Net we need a enhancement to domain_alloc_paging(). Otherwise I don't see a way
>>>>> to reliable implement things.
>>>>
>>>> Yes, some flag at domain allocation time is clearly needed
>>>
>>> Right. I will try to get this going and then we can fine tune with Robin's
>>> suggestion.
>>>
>>>>
>>>> But you need the whole story, including when will VFIO set the flag,
>>>> which requires a uAPI someplace.
>>>>
>>>>> But these two can go independently?
>>>>
>>>> If PASID goes with a default-on uAPI then you loose what you are
>>>> asking for as VFIO will always request PASID.
>>>>
>>>> So the overall plan has to be sorted out now..
>>>
>>> Is this the Yi's thread you were referring ?
>>>
>>> https://lore.kernel.org/linux-iommu/20240628090557.50898-1-yi.l.liu@intel.com/
>>
>> This is part of the whole work. What Jason refers may be the below one.
>> It includes how VFIO report the PASID capability to userspace. There is
>> an open to be closed though. I'll loop you in the next version. :)
> 
> Thank you Yi. I will start looking into this series.
> 
> 
>>
>> https://lore.kernel.org/kvm/20240412082121.33382-1-yi.l.liu@intel.com/
>>
>> BTW. For AMD, can a single device be attached to multiple V1 page table?
>> IIRC. looks like AMD has only one V1 page table pointer per BDF. But there
>> is a PASID table which has multiple cr3 page table pointers.
> 
> Jason covered this part in his response. Basically AMD has two page table :
>    V1 which doesn't support PASID.
>    V2 which support PASID
> 
> By default when host is booted it will be in V1 page table. What we are planning
> is we will implement domain_alloc_paging() interface in AMD driver. As part of that:
>    - PASID capable device will be placed in V2 page table, rest will be in V1
> page table

looks like the format of V1 and V2 page tables are different. Can the V2
page table be attached to a device (RID) and used to do address translation
for the DMAs w/o PASID? I have an impression that a RID can only use V1
page table. I may be wrong here. :)

> For UNMANAGED domain (like vfio-pci), we would need a indication from core layer
> saying whether we want to allocate PASID capable domain or not. So that
> internally we can allocate appropriate page table type.
> 
> That means from VFIO layer we need to pass this detail to IOMMU layer.
> For this part, I will start watching your patch. May be we can start discussion
> in that thread itself.
> 

yeah. Looks like you need to pass a hint to iommu driver for domain
allocation. For the usage vfio+iommufd, it is fine as the domain_alloc_user
accepts a flag. For legacy vfio (using vfio iommu type1) usage, it's a bit
tough as the domain allocation is within kernel, userspace does not have
any input on it so far. And vfio type1 does not expose the domain concept
to userspace, it only exposes container which is kind of an address space
object. This means no good way for userspace to control the domain
allocation.

If it's only a matter when PASID is enabled, perhaps we can skip vfio iommu
type1 as it is not supposed to support PASID.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-07-10  4:17                 ` Yi Liu
@ 2024-07-11 23:49                   ` Jason Gunthorpe
  2024-07-12 13:40                     ` Robin Murphy
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2024-07-11 23:49 UTC (permalink / raw)
  To: Yi Liu
  Cc: Vasant Hegde, Joerg Roedel, iommu@lists.linux.dev,
	Suravee Suthikulpanit, Will Deacon, Robin Murphy, Baolu Lu

On Wed, Jul 10, 2024 at 12:17:17PM +0800, Yi Liu wrote:
> On 2024/7/10 02:23, Jason Gunthorpe wrote:
> > On Wed, Jul 03, 2024 at 02:57:51PM +0800, Yi Liu wrote:
> > 
> > > BTW. For AMD, can a single device be attached to multiple V1 page table?
> > > IIRC. looks like AMD has only one V1 page table pointer per BDF. But there
> > > is a PASID table which has multiple cr3 page table pointers.
> > 
> > Yes, ARM and AMD both have kinds of iommu_domains that cannot be
> > attached to a PASID.
> 
> I see. VT-d does not have such a limitation AFAIK. 

Yeah, it is nice for software that VT-d put the right bits in to let
all paths select either format.

> BTW. SVA domain shall not be attached to RID I suppose. This should
> be a common restriction across vendors. Do you think it is valuable
> to let iommu core block such an attempt?

Yeah, maybe.

Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-07-11 13:56                 ` Yi Liu
@ 2024-07-12  1:39                   ` Baolu Lu
  2024-07-12  2:43                     ` Yi Liu
  2024-07-15 10:39                   ` Vasant Hegde
  1 sibling, 1 reply; 30+ messages in thread
From: Baolu Lu @ 2024-07-12  1:39 UTC (permalink / raw)
  To: Yi Liu, Vasant Hegde, Jason Gunthorpe
  Cc: baolu.lu, Joerg Roedel, iommu@lists.linux.dev,
	Suravee Suthikulpanit, Will Deacon, Robin Murphy

On 7/11/24 9:56 PM, Yi Liu wrote:
> 
>> For UNMANAGED domain (like vfio-pci), we would need a indication from 
>> core layer
>> saying whether we want to allocate PASID capable domain or not. So that
>> internally we can allocate appropriate page table type.
>>
>> That means from VFIO layer we need to pass this detail to IOMMU layer.
>> For this part, I will start watching your patch. May be we can start 
>> discussion
>> in that thread itself.
>>
> 
> yeah. Looks like you need to pass a hint to iommu driver for domain
> allocation. For the usage vfio+iommufd, it is fine as the domain_alloc_user
> accepts a flag. For legacy vfio (using vfio iommu type1) usage, it's a bit
> tough as the domain allocation is within kernel, userspace does not have
> any input on it so far. And vfio type1 does not expose the domain concept
> to userspace, it only exposes container which is kind of an address space
> object. This means no good way for userspace to control the domain
> allocation.
> 
> If it's only a matter when PASID is enabled, perhaps we can skip vfio iommu
> type1 as it is not supposed to support PASID.

As we have already agreed that new features should go through IOMMUFD
and vfio/type1 is only kept for legacy compatible purpose, I don't think
we need to carry any PASID bits in the vifo path. Did I miss anything?

Thanks,
baolu

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-07-12  1:39                   ` Baolu Lu
@ 2024-07-12  2:43                     ` Yi Liu
  0 siblings, 0 replies; 30+ messages in thread
From: Yi Liu @ 2024-07-12  2:43 UTC (permalink / raw)
  To: Baolu Lu, Vasant Hegde, Jason Gunthorpe
  Cc: Joerg Roedel, iommu@lists.linux.dev, Suravee Suthikulpanit,
	Will Deacon, Robin Murphy

On 2024/7/12 09:39, Baolu Lu wrote:
> On 7/11/24 9:56 PM, Yi Liu wrote:
>>
>>> For UNMANAGED domain (like vfio-pci), we would need a indication from 
>>> core layer
>>> saying whether we want to allocate PASID capable domain or not. So that
>>> internally we can allocate appropriate page table type.
>>>
>>> That means from VFIO layer we need to pass this detail to IOMMU layer.
>>> For this part, I will start watching your patch. May be we can start 
>>> discussion
>>> in that thread itself.
>>>
>>
>> yeah. Looks like you need to pass a hint to iommu driver for domain
>> allocation. For the usage vfio+iommufd, it is fine as the domain_alloc_user
>> accepts a flag. For legacy vfio (using vfio iommu type1) usage, it's a bit
>> tough as the domain allocation is within kernel, userspace does not have
>> any input on it so far. And vfio type1 does not expose the domain concept
>> to userspace, it only exposes container which is kind of an address space
>> object. This means no good way for userspace to control the domain
>> allocation.
>>
>> If it's only a matter when PASID is enabled, perhaps we can skip vfio iommu
>> type1 as it is not supposed to support PASID.
> 
> As we have already agreed that new features should go through IOMMUFD
> and vfio/type1 is only kept for legacy compatible purpose, I don't think
> we need to carry any PASID bits in the vifo path. Did I miss anything?

yes, we do. I have below kdoc for it in my pasid series [1]. Let's wait for
Vasant's confirmation if it's only a matter of pasid. It's not quite clear
to me based on my limited AMD iommu knowledge yet. :)

+ * Associate a pasid (of a cdev device) with an address space within the
+ * bound iommufd. Undo by VFIO_DEVICE_PASID_DETACH_IOMMUFD_PT or device fd
+ * close. This is only allowed on cdev fds.

[1] https://lore.kernel.org/kvm/20240412082121.33382-4-yi.l.liu@intel.com/

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-07-11 23:49                   ` Jason Gunthorpe
@ 2024-07-12 13:40                     ` Robin Murphy
  2024-07-12 13:53                       ` Jason Gunthorpe
  2024-07-15  8:46                       ` Yi Liu
  0 siblings, 2 replies; 30+ messages in thread
From: Robin Murphy @ 2024-07-12 13:40 UTC (permalink / raw)
  To: Jason Gunthorpe, Yi Liu
  Cc: Vasant Hegde, Joerg Roedel, iommu@lists.linux.dev,
	Suravee Suthikulpanit, Will Deacon, Baolu Lu

On 12/07/2024 12:49 am, Jason Gunthorpe wrote:
> On Wed, Jul 10, 2024 at 12:17:17PM +0800, Yi Liu wrote:
>> On 2024/7/10 02:23, Jason Gunthorpe wrote:
>>> On Wed, Jul 03, 2024 at 02:57:51PM +0800, Yi Liu wrote:
>>>
>>>> BTW. For AMD, can a single device be attached to multiple V1 page table?
>>>> IIRC. looks like AMD has only one V1 page table pointer per BDF. But there
>>>> is a PASID table which has multiple cr3 page table pointers.
>>>
>>> Yes, ARM and AMD both have kinds of iommu_domains that cannot be
>>> attached to a PASID.
>>
>> I see. VT-d does not have such a limitation AFAIK.
> 
> Yeah, it is nice for software that VT-d put the right bits in to let
> all paths select either format.
> 
>> BTW. SVA domain shall not be attached to RID I suppose. This should
>> be a common restriction across vendors. Do you think it is valuable
>> to let iommu core block such an attempt?
> 
> Yeah, maybe.

There's no functional reason to disallow it for Arm SMMUs, at least, so 
it's something that could potentially want relaxing back to per-driver 
in future. I've heard murmurings over the years about potential 
use-cases for SVA with platform and/or non-PASID PCI devices, but never 
anything concrete for existing hardware. I imagine we can expect to see 
more SVA-interested on-chip accelerators in future, but whether they'll 
all be nice and support Substreams (i.e. "platform PASIDs") is the 
question...

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-07-12 13:40                     ` Robin Murphy
@ 2024-07-12 13:53                       ` Jason Gunthorpe
  2024-07-12 15:12                         ` Robin Murphy
  2024-07-15  8:46                       ` Yi Liu
  1 sibling, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2024-07-12 13:53 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Yi Liu, Vasant Hegde, Joerg Roedel, iommu@lists.linux.dev,
	Suravee Suthikulpanit, Will Deacon, Baolu Lu

On Fri, Jul 12, 2024 at 02:40:37PM +0100, Robin Murphy wrote:
> > > BTW. SVA domain shall not be attached to RID I suppose. This should
> > > be a common restriction across vendors. Do you think it is valuable
> > > to let iommu core block such an attempt?
> > 
> > Yeah, maybe.
> 
> There's no functional reason to disallow it for Arm SMMUs, at least, so it's
> something that could potentially want relaxing back to per-driver in
> future.

I think everyone can technically do SVA as everyone should be able to
do PRI on the RID. We want PRI on RID anyhow for unpinned guest KVM
support someday.

My specific "maybe" was because SVA is troublesome for MSI, on ARM we
won't have an ITS page mapped for instance.

> I've heard murmurings over the years about potential use-cases for SVA with
> platform and/or non-PASID PCI devices, but never anything concrete for
> existing hardware. I imagine we can expect to see more SVA-interested
> on-chip accelerators in future, but whether they'll all be nice and support
> Substreams (i.e. "platform PASIDs") is the question...

So I guess they don't send MSI down those special stream ids? It makes
sense as a use case to me.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-07-12 13:53                       ` Jason Gunthorpe
@ 2024-07-12 15:12                         ` Robin Murphy
  2024-07-12 15:19                           ` Jason Gunthorpe
  0 siblings, 1 reply; 30+ messages in thread
From: Robin Murphy @ 2024-07-12 15:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yi Liu, Vasant Hegde, Joerg Roedel, iommu@lists.linux.dev,
	Suravee Suthikulpanit, Will Deacon, Baolu Lu

On 12/07/2024 2:53 pm, Jason Gunthorpe wrote:
> On Fri, Jul 12, 2024 at 02:40:37PM +0100, Robin Murphy wrote:
>>>> BTW. SVA domain shall not be attached to RID I suppose. This should
>>>> be a common restriction across vendors. Do you think it is valuable
>>>> to let iommu core block such an attempt?
>>>
>>> Yeah, maybe.
>>
>> There's no functional reason to disallow it for Arm SMMUs, at least, so it's
>> something that could potentially want relaxing back to per-driver in
>> future.
> 
> I think everyone can technically do SVA as everyone should be able to
> do PRI on the RID. We want PRI on RID anyhow for unpinned guest KVM
> support someday.
> 
> My specific "maybe" was because SVA is troublesome for MSI, on ARM we
> won't have an ITS page mapped for instance.

Yeah, I think the trick for that is going to be to require the device to 
be happy with shoving any kernel-owned mappings (MSIs, firmware, etc.) 
into the TTB1 range.

>> I've heard murmurings over the years about potential use-cases for SVA with
>> platform and/or non-PASID PCI devices, but never anything concrete for
>> existing hardware. I imagine we can expect to see more SVA-interested
>> on-chip accelerators in future, but whether they'll all be nice and support
>> Substreams (i.e. "platform PASIDs") is the question...
> 
> So I guess they don't send MSI down those special stream ids? It makes
> sense as a use case to me.

Indeed I'd hope that people follow the intuitive usage model and MSIs 
are sent without a SubstreamID, but I guess we're just going to have to 
wait and see what gets built. Of course, the one real SSID-capable 
device I *have* seen so far still only has wired interrupts anyway...

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-07-12 15:12                         ` Robin Murphy
@ 2024-07-12 15:19                           ` Jason Gunthorpe
  0 siblings, 0 replies; 30+ messages in thread
From: Jason Gunthorpe @ 2024-07-12 15:19 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Yi Liu, Vasant Hegde, Joerg Roedel, iommu@lists.linux.dev,
	Suravee Suthikulpanit, Will Deacon, Baolu Lu

On Fri, Jul 12, 2024 at 04:12:43PM +0100, Robin Murphy wrote:
> On 12/07/2024 2:53 pm, Jason Gunthorpe wrote:
> > On Fri, Jul 12, 2024 at 02:40:37PM +0100, Robin Murphy wrote:
> > > > > BTW. SVA domain shall not be attached to RID I suppose. This should
> > > > > be a common restriction across vendors. Do you think it is valuable
> > > > > to let iommu core block such an attempt?
> > > > 
> > > > Yeah, maybe.
> > > 
> > > There's no functional reason to disallow it for Arm SMMUs, at least, so it's
> > > something that could potentially want relaxing back to per-driver in
> > > future.
> > 
> > I think everyone can technically do SVA as everyone should be able to
> > do PRI on the RID. We want PRI on RID anyhow for unpinned guest KVM
> > support someday.
> > 
> > My specific "maybe" was because SVA is troublesome for MSI, on ARM we
> > won't have an ITS page mapped for instance.
> 
> Yeah, I think the trick for that is going to be to require the device to be
> happy with shoving any kernel-owned mappings (MSIs, firmware, etc.) into the
> TTB1 range.

Oh that's a really clever trick!

Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-07-12 13:40                     ` Robin Murphy
  2024-07-12 13:53                       ` Jason Gunthorpe
@ 2024-07-15  8:46                       ` Yi Liu
  1 sibling, 0 replies; 30+ messages in thread
From: Yi Liu @ 2024-07-15  8:46 UTC (permalink / raw)
  To: Robin Murphy, Jason Gunthorpe
  Cc: Vasant Hegde, Joerg Roedel, iommu@lists.linux.dev,
	Suravee Suthikulpanit, Will Deacon, Baolu Lu

On 2024/7/12 21:40, Robin Murphy wrote:
> On 12/07/2024 12:49 am, Jason Gunthorpe wrote:
>> On Wed, Jul 10, 2024 at 12:17:17PM +0800, Yi Liu wrote:
>>> On 2024/7/10 02:23, Jason Gunthorpe wrote:
>>>> On Wed, Jul 03, 2024 at 02:57:51PM +0800, Yi Liu wrote:
>>>>
>>>>> BTW. For AMD, can a single device be attached to multiple V1 page table?
>>>>> IIRC. looks like AMD has only one V1 page table pointer per BDF. But 
>>>>> there
>>>>> is a PASID table which has multiple cr3 page table pointers.
>>>>
>>>> Yes, ARM and AMD both have kinds of iommu_domains that cannot be
>>>> attached to a PASID.
>>>
>>> I see. VT-d does not have such a limitation AFAIK.
>>
>> Yeah, it is nice for software that VT-d put the right bits in to let
>> all paths select either format.
>>
>>> BTW. SVA domain shall not be attached to RID I suppose. This should
>>> be a common restriction across vendors. Do you think it is valuable
>>> to let iommu core block such an attempt?
>>
>> Yeah, maybe.
> 
> There's no functional reason to disallow it for Arm SMMUs, at least, so 
> it's something that could potentially want relaxing back to per-driver in 
> future. I've heard murmurings over the years about potential use-cases for 
> SVA with platform and/or non-PASID PCI devices, but never anything concrete 
> for existing hardware. I imagine we can expect to see more SVA-interested 
> on-chip accelerators in future, but whether they'll all be nice and support 
> Substreams (i.e. "platform PASIDs") is the question...

I see. Then let's keep the core as it is now.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-07-11 13:56                 ` Yi Liu
  2024-07-12  1:39                   ` Baolu Lu
@ 2024-07-15 10:39                   ` Vasant Hegde
  2024-07-16  7:43                     ` Yi Liu
  1 sibling, 1 reply; 30+ messages in thread
From: Vasant Hegde @ 2024-07-15 10:39 UTC (permalink / raw)
  To: Yi Liu, Jason Gunthorpe
  Cc: Joerg Roedel, iommu@lists.linux.dev, Suravee Suthikulpanit,
	Will Deacon, Robin Murphy, Baolu Lu

Hi,


On 7/11/2024 7:26 PM, Yi Liu wrote:
> On 2024/7/11 18:15, Vasant Hegde wrote:
>> Hi Yi,
>>
>>
>> On 7/3/2024 12:27 PM, Yi Liu wrote:
>>> On 2024/7/3 13:42, Vasant Hegde wrote:
>>>> Jason,
>>>>
>>>>
>>>> On 7/1/2024 10:56 PM, Jason Gunthorpe wrote:
>>>>> On Mon, Jul 01, 2024 at 04:18:01PM +0530, Vasant Hegde wrote:


.../...

>>>
>>> https://lore.kernel.org/kvm/20240412082121.33382-1-yi.l.liu@intel.com/
>>>
>>> BTW. For AMD, can a single device be attached to multiple V1 page table?
>>> IIRC. looks like AMD has only one V1 page table pointer per BDF. But there
>>> is a PASID table which has multiple cr3 page table pointers.
>>
>> Jason covered this part in his response. Basically AMD has two page table :
>>    V1 which doesn't support PASID.
>>    V2 which support PASID
>>
>> By default when host is booted it will be in V1 page table. What we are planning
>> is we will implement domain_alloc_paging() interface in AMD driver. As part of
>> that:
>>    - PASID capable device will be placed in V2 page table, rest will be in V1
>> page table
> 
> looks like the format of V1 and V2 page tables are different. Can the V2
> page table be attached to a device (RID) and used to do address translation
> for the DMAs w/o PASID? I have an impression that a RID can only use V1
> page table. I may be wrong here. :)

We can attach V2 page table to a device. DMAs without PASID will use GCR3[0]
(PASID zero).

V1 is AMD IOMMU specific, V2 is compatible with CPU page table hence it can
support PASID.



> 
>> For UNMANAGED domain (like vfio-pci), we would need a indication from core layer
>> saying whether we want to allocate PASID capable domain or not. So that
>> internally we can allocate appropriate page table type.
>>
>> That means from VFIO layer we need to pass this detail to IOMMU layer.
>> For this part, I will start watching your patch. May be we can start discussion
>> in that thread itself.
>>
> 
> yeah. Looks like you need to pass a hint to iommu driver for domain
> allocation. For the usage vfio+iommufd, it is fine as the domain_alloc_user
> accepts a flag. For legacy vfio (using vfio iommu type1) usage, it's a bit
> tough as the domain allocation is within kernel, userspace does not have
> any input on it so far. And vfio type1 does not expose the domain concept
> to userspace, it only exposes container which is kind of an address space
> object. This means no good way for userspace to control the domain
> allocation.
> 
> If it's only a matter when PASID is enabled, perhaps we can skip vfio iommu
> type1 as it is not supposed to support PASID.

Right. My understanding is with vfio IOMMU we will not support PASID. SO it
should be fine.


-Vasant

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-07-15 10:39                   ` Vasant Hegde
@ 2024-07-16  7:43                     ` Yi Liu
  2024-07-16 13:41                       ` Jason Gunthorpe
  0 siblings, 1 reply; 30+ messages in thread
From: Yi Liu @ 2024-07-16  7:43 UTC (permalink / raw)
  To: Vasant Hegde, Jason Gunthorpe
  Cc: Joerg Roedel, iommu@lists.linux.dev, Suravee Suthikulpanit,
	Will Deacon, Robin Murphy, Baolu Lu

On 2024/7/15 18:39, Vasant Hegde wrote:
> Hi,
> 
> 
> On 7/11/2024 7:26 PM, Yi Liu wrote:
>> On 2024/7/11 18:15, Vasant Hegde wrote:
>>> Hi Yi,
>>>
>>>
>>> On 7/3/2024 12:27 PM, Yi Liu wrote:
>>>> On 2024/7/3 13:42, Vasant Hegde wrote:
>>>>> Jason,
>>>>>
>>>>>
>>>>> On 7/1/2024 10:56 PM, Jason Gunthorpe wrote:
>>>>>> On Mon, Jul 01, 2024 at 04:18:01PM +0530, Vasant Hegde wrote:
> 
> 
> .../...
> 
>>>>
>>>> https://lore.kernel.org/kvm/20240412082121.33382-1-yi.l.liu@intel.com/
>>>>
>>>> BTW. For AMD, can a single device be attached to multiple V1 page table?
>>>> IIRC. looks like AMD has only one V1 page table pointer per BDF. But there
>>>> is a PASID table which has multiple cr3 page table pointers.
>>>
>>> Jason covered this part in his response. Basically AMD has two page table :
>>>     V1 which doesn't support PASID.
>>>     V2 which support PASID
>>>
>>> By default when host is booted it will be in V1 page table. What we are planning
>>> is we will implement domain_alloc_paging() interface in AMD driver. As part of
>>> that:
>>>     - PASID capable device will be placed in V2 page table, rest will be in V1
>>> page table
>>
>> looks like the format of V1 and V2 page tables are different. Can the V2
>> page table be attached to a device (RID) and used to do address translation
>> for the DMAs w/o PASID? I have an impression that a RID can only use V1
>> page table. I may be wrong here. :)
> 
> We can attach V2 page table to a device. DMAs without PASID will use GCR3[0]
> (PASID zero).

got it. :)

> V1 is AMD IOMMU specific, V2 is compatible with CPU page table hence it can
> support PASID.
> 

yeah, this suits my understanding.

> 
>>
>>> For UNMANAGED domain (like vfio-pci), we would need a indication from core layer
>>> saying whether we want to allocate PASID capable domain or not. So that
>>> internally we can allocate appropriate page table type.
>>>
>>> That means from VFIO layer we need to pass this detail to IOMMU layer.
>>> For this part, I will start watching your patch. May be we can start discussion
>>> in that thread itself.
>>>
>>
>> yeah. Looks like you need to pass a hint to iommu driver for domain
>> allocation. For the usage vfio+iommufd, it is fine as the domain_alloc_user
>> accepts a flag. For legacy vfio (using vfio iommu type1) usage, it's a bit
>> tough as the domain allocation is within kernel, userspace does not have
>> any input on it so far. And vfio type1 does not expose the domain concept
>> to userspace, it only exposes container which is kind of an address space
>> object. This means no good way for userspace to control the domain
>> allocation.
>>
>> If it's only a matter when PASID is enabled, perhaps we can skip vfio iommu
>> type1 as it is not supposed to support PASID.
> 
> Right. My understanding is with vfio IOMMU we will not support PASID. SO it
> should be fine.

then you only need to take good care of the iommufd hwpt allocation uapi,
and the domain_alloc_user() callback to achieve your goal of this thread.
Anyhow, I'll still loop you in my pasid series in case anything missed
here. :)

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver
  2024-07-16  7:43                     ` Yi Liu
@ 2024-07-16 13:41                       ` Jason Gunthorpe
  0 siblings, 0 replies; 30+ messages in thread
From: Jason Gunthorpe @ 2024-07-16 13:41 UTC (permalink / raw)
  To: Yi Liu
  Cc: Vasant Hegde, Joerg Roedel, iommu@lists.linux.dev,
	Suravee Suthikulpanit, Will Deacon, Robin Murphy, Baolu Lu

On Tue, Jul 16, 2024 at 03:43:35PM +0800, Yi Liu wrote:
> > Right. My understanding is with vfio IOMMU we will not support PASID. SO it
> > should be fine.
> 
> then you only need to take good care of the iommufd hwpt allocation uapi,
> and the domain_alloc_user() callback to achieve your goal of this thread.
> Anyhow, I'll still loop you in my pasid series in case anything missed
> here. :)

Right, Yi's series to add PASID support to VFIO is the primary issue
here, that has to be done in some way to accommodate this AMD
situation.

Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2024-07-16 13:41 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-28  6:43 [RFC] iommu_ops->domain_alloc_paging() enhancement to support AMD IOMMU driver Vasant Hegde
2024-06-28 12:23 ` Baolu Lu
2024-06-28 13:06   ` Robin Murphy
2024-06-28 17:08     ` Vasant Hegde
2024-06-28 17:58       ` Robin Murphy
2024-07-01 10:28         ` Vasant Hegde
2024-06-28 14:50   ` Vasant Hegde
2024-06-28 15:34     ` Jason Gunthorpe
2024-06-28 13:03 ` Jason Gunthorpe
2024-06-28 17:49   ` Vasant Hegde
2024-06-28 18:04     ` Jason Gunthorpe
2024-07-01 10:48       ` Vasant Hegde
2024-07-01 17:26         ` Jason Gunthorpe
2024-07-03  5:42           ` Vasant Hegde
2024-07-03  6:57             ` Yi Liu
2024-07-09 18:23               ` Jason Gunthorpe
2024-07-10  4:17                 ` Yi Liu
2024-07-11 23:49                   ` Jason Gunthorpe
2024-07-12 13:40                     ` Robin Murphy
2024-07-12 13:53                       ` Jason Gunthorpe
2024-07-12 15:12                         ` Robin Murphy
2024-07-12 15:19                           ` Jason Gunthorpe
2024-07-15  8:46                       ` Yi Liu
2024-07-11 10:15               ` Vasant Hegde
2024-07-11 13:56                 ` Yi Liu
2024-07-12  1:39                   ` Baolu Lu
2024-07-12  2:43                     ` Yi Liu
2024-07-15 10:39                   ` Vasant Hegde
2024-07-16  7:43                     ` Yi Liu
2024-07-16 13:41                       ` Jason Gunthorpe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.