* [PATCH v2 0/2] iommu/amd: Avoid setting C-bit for MMIO addresses
@ 2025-11-03 14:00 Wei Wang
  2025-11-03 14:00 ` [PATCH v2 1/2] iommu/amd: Add IOMMU_PROT_IE flag for memory encryption Wei Wang
  2025-11-03 14:00 ` [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses Wei Wang
  0 siblings, 2 replies; 20+ messages in thread
From: Wei Wang @ 2025-11-03 14:00 UTC (permalink / raw)
  To: alex, jgg, suravee.suthikulpanit, thomas.lendacky, joro
  Cc: kevin.tian, wei.w.wang, linux-kernel, iommu

When SME is enabled, iommu_v1_map_pages() currently sets the C-bit for
all physical addresses. This is correct for system RAM, since the C-bit
is required by SME to indicate encrypted memory and ensure proper
encryption/decryption. However, applying the C-bit to MMIO addresses is
incorrect: devices and PCIe switches do not currently interpret the
C-bit, and setting it can break PCIe peer-to-peer communication. To
prevent this, avoid setting the C-bit when the physical address is
backed by MMIO.

Note: this patchset only updates vfio_iommu_type1. Corresponding changes
to iommufd to pass the IOMMU_MMIO prot flag will be added if this
approach is accepted.

v1->v2 changes:
- v1 used page_is_ram() in the AMD IOMMU driver to detect non-RAM
  addresses, avoiding changes to upper-layer callers (vfio and iommufd).
  v2 instead lets upper layers explicitly indicate MMIO mappings via the
  IOMMU_MMIO prot flag. This avoids the potential overhead of
  page_is_ram(). (suggested by Jason Gunthorpe)

v1 link: https://lkml.org/lkml/2025/10/23/1211

Wei Wang (2):
  iommu/amd: Add IOMMU_PROT_IE flag for memory encryption
  vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses

 drivers/iommu/amd/amd_iommu_types.h |  3 ++-
 drivers/iommu/amd/io_pgtable.c      |  7 +++++--
 drivers/iommu/amd/iommu.c           |  2 ++
 drivers/vfio/vfio_iommu_type1.c     | 14 +++++++++-----
 4 files changed, 18 insertions(+), 8 deletions(-)

-- 
2.51.1

^ permalink raw reply	[flat|nested] 20+ messages in thread
* [PATCH v2 1/2] iommu/amd: Add IOMMU_PROT_IE flag for memory encryption
  2025-11-03 14:00 [PATCH v2 0/2] iommu/amd: Avoid setting C-bit for MMIO addresses Wei Wang
@ 2025-11-03 14:00 ` Wei Wang
  2025-11-07  1:02   ` Jason Gunthorpe
  2025-11-10  9:55   ` Vasant Hegde
  2025-11-03 14:00 ` [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses Wei Wang
  1 sibling, 2 replies; 20+ messages in thread
From: Wei Wang @ 2025-11-03 14:00 UTC (permalink / raw)
  To: alex, jgg, suravee.suthikulpanit, thomas.lendacky, joro
  Cc: kevin.tian, wei.w.wang, linux-kernel, iommu

Introduce the IOMMU_PROT_IE flag to allow callers of iommu_v1_map_pages()
to explicitly request memory encryption for specific mappings.

With SME enabled, the C-bit (encryption bit) in IOMMU page table entries
is now set only when IOMMU_PROT_IE is specified. This provides
fine-grained control over which IOVAs are encrypted through the IOMMU
page tables.

Current PCIe devices and switches do not interpret the C-bit, so applying
it to MMIO mappings would break PCIe peer-to-peer communication. Update
the implementation to restrict C-bit usage to non-MMIO backed IOVAs.

Fixes: 2543a786aa25 ("iommu/amd: Allow the AMD IOMMU to work with memory encryption")
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Wei Wang <wei.w.wang@hotmail.com>
---
 drivers/iommu/amd/amd_iommu_types.h | 3 ++-
 drivers/iommu/amd/io_pgtable.c      | 7 +++++--
 drivers/iommu/amd/iommu.c           | 2 ++
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index a698a2e7ce2a..5b6ce0286a16 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -442,9 +442,10 @@
 #define IOMMU_PTE_PAGE(pte)	(iommu_phys_to_virt((pte) & IOMMU_PAGE_MASK))
 #define IOMMU_PTE_MODE(pte)	(((pte) >> 9) & 0x07)
 
-#define IOMMU_PROT_MASK 0x03
+#define IOMMU_PROT_MASK (IOMMU_PROT_IR | IOMMU_PROT_IW | IOMMU_PROT_IE)
 #define IOMMU_PROT_IR 0x01
 #define IOMMU_PROT_IW 0x02
+#define IOMMU_PROT_IE 0x04
 
 #define IOMMU_UNITY_MAP_FLAG_EXCL_RANGE (1 << 2)
 
diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index 70c2f5b1631b..ae5032dd3b2f 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -367,11 +367,14 @@ static int iommu_v1_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
 	if (!iommu_pages_list_empty(&freelist))
 		updated = true;
 
+	if (prot & IOMMU_PROT_IE)
+		paddr = __sme_set(paddr);
+
 	if (count > 1) {
-		__pte = PAGE_SIZE_PTE(__sme_set(paddr), pgsize);
+		__pte = PAGE_SIZE_PTE(paddr, pgsize);
 		__pte |= PM_LEVEL_ENC(7) | IOMMU_PTE_PR | IOMMU_PTE_FC;
 	} else
-		__pte = __sme_set(paddr) | IOMMU_PTE_PR | IOMMU_PTE_FC;
+		__pte = paddr | IOMMU_PTE_PR | IOMMU_PTE_FC;
 
 	if (prot & IOMMU_PROT_IR)
 		__pte |= IOMMU_PTE_IR;
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 2e1865daa1ce..eaf024e9dff0 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2762,6 +2762,8 @@ static int amd_iommu_map_pages(struct iommu_domain *dom, unsigned long iova,
 		prot |= IOMMU_PROT_IR;
 	if (iommu_prot & IOMMU_WRITE)
 		prot |= IOMMU_PROT_IW;
+	if (!(iommu_prot & IOMMU_MMIO))
+		prot |= IOMMU_PROT_IE;
 
 	if (ops->map_pages) {
 		ret = ops->map_pages(ops, iova, paddr, pgsize,
-- 
2.51.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/2] iommu/amd: Add IOMMU_PROT_IE flag for memory encryption 2025-11-03 14:00 ` [PATCH v2 1/2] iommu/amd: Add IOMMU_PROT_IE flag for memory encryption Wei Wang @ 2025-11-07 1:02 ` Jason Gunthorpe 2025-11-07 2:39 ` Wei Wang 2025-11-10 9:55 ` Vasant Hegde 1 sibling, 1 reply; 20+ messages in thread From: Jason Gunthorpe @ 2025-11-07 1:02 UTC (permalink / raw) To: Wei Wang Cc: alex, suravee.suthikulpanit, thomas.lendacky, joro, kevin.tian, linux-kernel, iommu On Mon, Nov 03, 2025 at 10:00:33PM +0800, Wei Wang wrote: > Introduce the IOMMU_PROT_IE flag to allow callers of iommu_v1_map_pages() > to explicitly request memory encryption for specific mappings. > > With SME enabled, the C-bit (encryption bit) in IOMMU page table entries > is now set only when IOMMU_PROT_IE is specified. This provides > fine-grained control over which IOVAs are encrypted through the IOMMU > page tables. > > Current PCIe devices and switches do not interpret the C-bit, so applying > it to MMIO mappings would break PCIe peer‑to‑peer communication. Update > the implementation to restrict C-bit usage to non‑MMIO backed IOVAs. > > Fixes: 2543a786aa25 ("iommu/amd: Allow the AMD IOMMU to work with memory encryption") > Suggested-by: Jason Gunthorpe <jgg@nvidia.com> > Signed-off-by: Wei Wang <wei.w.wang@hotmail.com> > --- > drivers/iommu/amd/amd_iommu_types.h | 3 ++- > drivers/iommu/amd/io_pgtable.c | 7 +++++-- > drivers/iommu/amd/iommu.c | 2 ++ > 3 files changed, 9 insertions(+), 3 deletions(-) Since Joerg took the iommupt patches this will need to be rebased on his tree, I think it will be simpler.. Jason ^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: [PATCH v2 1/2] iommu/amd: Add IOMMU_PROT_IE flag for memory encryption 2025-11-07 1:02 ` Jason Gunthorpe @ 2025-11-07 2:39 ` Wei Wang 0 siblings, 0 replies; 20+ messages in thread From: Wei Wang @ 2025-11-07 2:39 UTC (permalink / raw) To: Jason Gunthorpe Cc: alex@shazbot.org, suravee.suthikulpanit@amd.com, thomas.lendacky@amd.com, joro@8bytes.org, kevin.tian@intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev On Friday, November 7, 2025 9:02 AM, Jason Gunthorpe wrote: > On Mon, Nov 03, 2025 at 10:00:33PM +0800, Wei Wang wrote: > > Introduce the IOMMU_PROT_IE flag to allow callers of > > iommu_v1_map_pages() to explicitly request memory encryption for > specific mappings. > > > > With SME enabled, the C-bit (encryption bit) in IOMMU page table > > entries is now set only when IOMMU_PROT_IE is specified. This provides > > fine-grained control over which IOVAs are encrypted through the IOMMU > > page tables. > > > > Current PCIe devices and switches do not interpret the C-bit, so > > applying it to MMIO mappings would break PCIe peer‑to‑peer > > communication. Update the implementation to restrict C-bit usage to > non‑MMIO backed IOVAs. > > > > Fixes: 2543a786aa25 ("iommu/amd: Allow the AMD IOMMU to work with > > memory encryption") > > Suggested-by: Jason Gunthorpe <jgg@nvidia.com> > > Signed-off-by: Wei Wang <wei.w.wang@hotmail.com> > > --- > > drivers/iommu/amd/amd_iommu_types.h | 3 ++- > > drivers/iommu/amd/io_pgtable.c | 7 +++++-- > > drivers/iommu/amd/iommu.c | 2 ++ > > 3 files changed, 9 insertions(+), 3 deletions(-) > > Since Joerg took the iommupt patches this will need to be rebased on his > tree, I think it will be simpler.. OK, I will have a check, thanks. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/2] iommu/amd: Add IOMMU_PROT_IE flag for memory encryption 2025-11-03 14:00 ` [PATCH v2 1/2] iommu/amd: Add IOMMU_PROT_IE flag for memory encryption Wei Wang 2025-11-07 1:02 ` Jason Gunthorpe @ 2025-11-10 9:55 ` Vasant Hegde 2025-11-11 1:18 ` Wei Wang 1 sibling, 1 reply; 20+ messages in thread From: Vasant Hegde @ 2025-11-10 9:55 UTC (permalink / raw) To: Wei Wang, alex, jgg, suravee.suthikulpanit, thomas.lendacky, joro Cc: kevin.tian, linux-kernel, iommu Hi Wei, On 11/3/2025 7:30 PM, Wei Wang wrote: > Introduce the IOMMU_PROT_IE flag to allow callers of iommu_v1_map_pages() > to explicitly request memory encryption for specific mappings. > > With SME enabled, the C-bit (encryption bit) in IOMMU page table entries > is now set only when IOMMU_PROT_IE is specified. This provides > fine-grained control over which IOVAs are encrypted through the IOMMU > page tables. > > Current PCIe devices and switches do not interpret the C-bit, so applying > it to MMIO mappings would break PCIe peer‑to‑peer communication. Update > the implementation to restrict C-bit usage to non‑MMIO backed IOVAs. Right. Quote from AMD Programmers Manual Vol2, "any pages corresponding to MMIO addresses must be configured with the C-bit clear." > > Fixes: 2543a786aa25 ("iommu/amd: Allow the AMD IOMMU to work with memory encryption") > Suggested-by: Jason Gunthorpe <jgg@nvidia.com> > Signed-off-by: Wei Wang <wei.w.wang@hotmail.com> > ---> drivers/iommu/amd/amd_iommu_types.h | 3 ++- > drivers/iommu/amd/io_pgtable.c | 7 +++++-- May be apply same fix for io_pgtable_v2.c as well? (Of course filename changed with generic pt series). 
-Vasant > drivers/iommu/amd/iommu.c | 2 ++ > 3 files changed, 9 insertions(+), 3 deletions(-) > > diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h > index a698a2e7ce2a..5b6ce0286a16 100644 > --- a/drivers/iommu/amd/amd_iommu_types.h > +++ b/drivers/iommu/amd/amd_iommu_types.h > @@ -442,9 +442,10 @@ > #define IOMMU_PTE_PAGE(pte) (iommu_phys_to_virt((pte) & IOMMU_PAGE_MASK)) > #define IOMMU_PTE_MODE(pte) (((pte) >> 9) & 0x07) > > -#define IOMMU_PROT_MASK 0x03 > +#define IOMMU_PROT_MASK (IOMMU_PROT_IR | IOMMU_PROT_IW | IOMMU_PROT_IE) > #define IOMMU_PROT_IR 0x01 > #define IOMMU_PROT_IW 0x02 > +#define IOMMU_PROT_IE 0x04 > > #define IOMMU_UNITY_MAP_FLAG_EXCL_RANGE (1 << 2) > > diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c > index 70c2f5b1631b..ae5032dd3b2f 100644 > --- a/drivers/iommu/amd/io_pgtable.c > +++ b/drivers/iommu/amd/io_pgtable.c > @@ -367,11 +367,14 @@ static int iommu_v1_map_pages(struct io_pgtable_ops *ops, unsigned long iova, > if (!iommu_pages_list_empty(&freelist)) > updated = true; > > + if (prot & IOMMU_PROT_IE) > + paddr = __sme_set(paddr); > + > if (count > 1) { > - __pte = PAGE_SIZE_PTE(__sme_set(paddr), pgsize); > + __pte = PAGE_SIZE_PTE(paddr, pgsize); > __pte |= PM_LEVEL_ENC(7) | IOMMU_PTE_PR | IOMMU_PTE_FC; > } else > - __pte = __sme_set(paddr) | IOMMU_PTE_PR | IOMMU_PTE_FC; > + __pte = paddr | IOMMU_PTE_PR | IOMMU_PTE_FC; > > if (prot & IOMMU_PROT_IR) > __pte |= IOMMU_PTE_IR; > diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c > index 2e1865daa1ce..eaf024e9dff0 100644 > --- a/drivers/iommu/amd/iommu.c > +++ b/drivers/iommu/amd/iommu.c > @@ -2762,6 +2762,8 @@ static int amd_iommu_map_pages(struct iommu_domain *dom, unsigned long iova, > prot |= IOMMU_PROT_IR; > if (iommu_prot & IOMMU_WRITE) > prot |= IOMMU_PROT_IW; > + if (!(iommu_prot & IOMMU_MMIO)) > + prot |= IOMMU_PROT_IE; > > if (ops->map_pages) { > ret = ops->map_pages(ops, iova, paddr, pgsize, ^ 
permalink raw reply [flat|nested] 20+ messages in thread
* RE: [PATCH v2 1/2] iommu/amd: Add IOMMU_PROT_IE flag for memory encryption 2025-11-10 9:55 ` Vasant Hegde @ 2025-11-11 1:18 ` Wei Wang 2025-11-11 4:44 ` Vasant Hegde 0 siblings, 1 reply; 20+ messages in thread From: Wei Wang @ 2025-11-11 1:18 UTC (permalink / raw) To: Vasant Hegde, alex@shazbot.org, jgg@nvidia.com, suravee.suthikulpanit@amd.com, thomas.lendacky@amd.com, joro@8bytes.org Cc: kevin.tian@intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev On Monday, November 10, 2025 5:55 PM, Vasant Hegde wrote: > To: Wei Wang <wei.w.wang@hotmail.com>; alex@shazbot.org; > jgg@nvidia.com; suravee.suthikulpanit@amd.com; > thomas.lendacky@amd.com; joro@8bytes.org > Cc: kevin.tian@intel.com; linux-kernel@vger.kernel.org; > iommu@lists.linux.dev > Subject: Re: [PATCH v2 1/2] iommu/amd: Add IOMMU_PROT_IE flag for > memory encryption > > Hi Wei, > > On 11/3/2025 7:30 PM, Wei Wang wrote: > > Introduce the IOMMU_PROT_IE flag to allow callers of > > iommu_v1_map_pages() to explicitly request memory encryption for > specific mappings. > > > > With SME enabled, the C-bit (encryption bit) in IOMMU page table > > entries is now set only when IOMMU_PROT_IE is specified. This provides > > fine-grained control over which IOVAs are encrypted through the IOMMU > > page tables. > > > > Current PCIe devices and switches do not interpret the C-bit, so > > applying it to MMIO mappings would break PCIe peer‑to‑peer > > communication. Update the implementation to restrict C-bit usage to > non‑MMIO backed IOVAs. > > Right. Quote from AMD Programmers Manual Vol2, "any pages > corresponding to MMIO addresses must be configured with the C-bit clear." > Yes, thanks. 
> > > > Fixes: 2543a786aa25 ("iommu/amd: Allow the AMD IOMMU to work with > > memory encryption") > > Suggested-by: Jason Gunthorpe <jgg@nvidia.com> > > Signed-off-by: Wei Wang <wei.w.wang@hotmail.com> > > ---> drivers/iommu/amd/amd_iommu_types.h | 3 ++- > > drivers/iommu/amd/io_pgtable.c | 7 +++++-- > > May be apply same fix for io_pgtable_v2.c as well? (Of course filename > changed with generic pt series). Yes. I was uncertain about the 1st stage mapping as it has a usage for GVA->GPA mappings, and for the trusted MMIO case, we do need the C-bit added to GPA. But since vIOMMU isn’t supported for SNP guests, and the trusted MMIO isn't ready yet, I think it should be safe to proceed with this now. The above consideration can be re-visited when the trusted MMIO gets landed. I'll add it in the next version and see if others might have a different perspective on this. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/2] iommu/amd: Add IOMMU_PROT_IE flag for memory encryption 2025-11-11 1:18 ` Wei Wang @ 2025-11-11 4:44 ` Vasant Hegde 0 siblings, 0 replies; 20+ messages in thread From: Vasant Hegde @ 2025-11-11 4:44 UTC (permalink / raw) To: Wei Wang, alex@shazbot.org, jgg@nvidia.com, suravee.suthikulpanit@amd.com, thomas.lendacky@amd.com, joro@8bytes.org Cc: kevin.tian@intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev Wei, On 11/11/2025 6:48 AM, Wei Wang wrote: > On Monday, November 10, 2025 5:55 PM, Vasant Hegde wrote: >> To: Wei Wang <wei.w.wang@hotmail.com>; alex@shazbot.org; >> jgg@nvidia.com; suravee.suthikulpanit@amd.com; >> thomas.lendacky@amd.com; joro@8bytes.org >> Cc: kevin.tian@intel.com; linux-kernel@vger.kernel.org; >> iommu@lists.linux.dev >> Subject: Re: [PATCH v2 1/2] iommu/amd: Add IOMMU_PROT_IE flag for >> memory encryption >> >> Hi Wei, >> >> On 11/3/2025 7:30 PM, Wei Wang wrote: >>> Introduce the IOMMU_PROT_IE flag to allow callers of >>> iommu_v1_map_pages() to explicitly request memory encryption for >> specific mappings. >>> >>> With SME enabled, the C-bit (encryption bit) in IOMMU page table >>> entries is now set only when IOMMU_PROT_IE is specified. This provides >>> fine-grained control over which IOVAs are encrypted through the IOMMU >>> page tables. >>> >>> Current PCIe devices and switches do not interpret the C-bit, so >>> applying it to MMIO mappings would break PCIe peer‑to‑peer >>> communication. Update the implementation to restrict C-bit usage to >> non‑MMIO backed IOVAs. >> >> Right. Quote from AMD Programmers Manual Vol2, "any pages >> corresponding to MMIO addresses must be configured with the C-bit clear." >> > > Yes, thanks. 
> >>> >>> Fixes: 2543a786aa25 ("iommu/amd: Allow the AMD IOMMU to work with >>> memory encryption") >>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> >>> Signed-off-by: Wei Wang <wei.w.wang@hotmail.com> >>> ---> drivers/iommu/amd/amd_iommu_types.h | 3 ++- >>> drivers/iommu/amd/io_pgtable.c | 7 +++++-- >> >> May be apply same fix for io_pgtable_v2.c as well? (Of course filename >> changed with generic pt series). > > Yes. I was uncertain about the 1st stage mapping as it has a usage for > GVA->GPA mappings, and for the trusted MMIO case, we do need the > C-bit added to GPA. But since vIOMMU isn’t supported for SNP guests, > and the trusted MMIO isn't ready yet, I think it should be safe to proceed > with this now. The above consideration can be re-visited when the trusted > MMIO gets landed. Right. We use v2 (guest) page table to support PASID/PRI in host. Agree. for now lets fix the v2 page table as well. We will fix it for Secure vIOMMU when we add support. > > I'll add it in the next version and see if others might have a different > perspective on this. Thanks! -Vasant ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses
  2025-11-03 14:00 [PATCH v2 0/2] iommu/amd: Avoid setting C-bit for MMIO addresses Wei Wang
  2025-11-03 14:00 ` [PATCH v2 1/2] iommu/amd: Add IOMMU_PROT_IE flag for memory encryption Wei Wang
@ 2025-11-03 14:00 ` Wei Wang
  2025-11-07  1:03   ` Jason Gunthorpe
  1 sibling, 1 reply; 20+ messages in thread
From: Wei Wang @ 2025-11-03 14:00 UTC (permalink / raw)
  To: alex, jgg, suravee.suthikulpanit, thomas.lendacky, joro
  Cc: kevin.tian, wei.w.wang, linux-kernel, iommu

Before requesting the IOMMU driver to map an IOVA to a physical address,
set the IOMMU_MMIO flag in dma->prot when the physical address
corresponds to MMIO. This allows the IOMMU driver to handle MMIO
mappings specially. For example, on AMD CPUs with SME enabled, the AMD
IOMMU driver avoids setting the C-bit if iommu_map() is called with
IOMMU_MMIO set in prot. This prevents issues with PCIe P2P
communication, since current PCIe switches and devices do not interpret
the C-bit correctly.

Signed-off-by: Wei Wang <wei.w.wang@hotmail.com>
---
 drivers/vfio/vfio_iommu_type1.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 5167bec14e36..f5c56e227f9a 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -583,7 +583,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
  * returned initial pfn are provided; subsequent pfns are contiguous.
  */
 static long vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
-			   unsigned long npages, int prot, unsigned long *pfn,
+			   unsigned long npages, int *prot, unsigned long *pfn,
 			   struct vfio_batch *batch)
 {
 	unsigned long pin_pages = min_t(unsigned long, npages, batch->capacity);
@@ -591,7 +591,7 @@ static long vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
 	unsigned int flags = 0;
 	long ret;
 
-	if (prot & IOMMU_WRITE)
+	if (*prot & IOMMU_WRITE)
 		flags |= FOLL_WRITE;
 
 	mmap_read_lock(mm);
@@ -601,6 +601,7 @@ static long vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
 		*pfn = page_to_pfn(batch->pages[0]);
 		batch->size = ret;
 		batch->offset = 0;
+		*prot &= ~IOMMU_MMIO;
 		goto done;
 	} else if (!ret) {
 		ret = -EFAULT;
@@ -615,7 +616,7 @@ static long vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
 		unsigned long addr_mask;
 
 		ret = follow_fault_pfn(vma, mm, vaddr, pfn, &addr_mask,
-				       prot & IOMMU_WRITE);
+				       *prot & IOMMU_WRITE);
 		if (ret == -EAGAIN)
 			goto retry;
 
@@ -629,6 +630,9 @@ static long vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
 				ret = -EFAULT;
 			}
 		}
+
+		if (vma->vm_flags & VM_IO)
+			*prot |= IOMMU_MMIO;
 	}
 done:
 	mmap_read_unlock(mm);
@@ -709,7 +713,7 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 		cond_resched();
 
 		/* Empty batch, so refill it. */
-		ret = vaddr_get_pfns(mm, vaddr, npage, dma->prot,
+		ret = vaddr_get_pfns(mm, vaddr, npage, &dma->prot,
 				     &pfn, batch);
 		if (ret < 0)
 			goto unpin_out;
@@ -850,7 +854,7 @@ static int vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr,
 
 	vfio_batch_init_single(&batch);
 
-	ret = vaddr_get_pfns(mm, vaddr, 1, dma->prot, pfn_base, &batch);
+	ret = vaddr_get_pfns(mm, vaddr, 1, &dma->prot, pfn_base, &batch);
 	if (ret != 1)
 		goto out;
 
-- 
2.51.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses 2025-11-03 14:00 ` [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses Wei Wang @ 2025-11-07 1:03 ` Jason Gunthorpe 2025-11-07 2:38 ` Wei Wang 0 siblings, 1 reply; 20+ messages in thread From: Jason Gunthorpe @ 2025-11-07 1:03 UTC (permalink / raw) To: Wei Wang Cc: alex, suravee.suthikulpanit, thomas.lendacky, joro, kevin.tian, linux-kernel, iommu On Mon, Nov 03, 2025 at 10:00:34PM +0800, Wei Wang wrote: > Before requesting the IOMMU driver to map an IOVA to a physical address, > set the IOMMU_MMIO flag in dma->prot when the physical address corresponds > to MMIO. This allows the IOMMU driver to handle MMIO mappings specially. > For example, on AMD CPUs with SME enabled, the AMD IOMMU driver avoids > setting the C-bit if iommu_map() is called with IOMMU_MMIO set in prot. > This prevents issues with PCIe P2P communication, since current PCIe > switches and devices do not interpret the C-bit correctly. > > Signed-off-by: Wei Wang <wei.w.wang@hotmail.com> > --- > drivers/vfio/vfio_iommu_type1.c | 14 +++++++++----- > 1 file changed, 9 insertions(+), 5 deletions(-) This may be the best you can do with vfio type1, but just because the VMA is special doesn't necessarily mean it is MMIO, nor does it mean it is decrypted memory. I think to really make this work fully properly going forward people are going to have to use iommufd's dmabuf. Jason ^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses 2025-11-07 1:03 ` Jason Gunthorpe @ 2025-11-07 2:38 ` Wei Wang 2025-11-07 14:16 ` Jason Gunthorpe 0 siblings, 1 reply; 20+ messages in thread From: Wei Wang @ 2025-11-07 2:38 UTC (permalink / raw) To: Jason Gunthorpe Cc: alex@shazbot.org, suravee.suthikulpanit@amd.com, thomas.lendacky@amd.com, joro@8bytes.org, kevin.tian@intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev On Friday, November 7, 2025 9:04 AM, Jason Gunthorpe wrote: > On Mon, Nov 03, 2025 at 10:00:34PM +0800, Wei Wang wrote: > > Before requesting the IOMMU driver to map an IOVA to a physical > > address, set the IOMMU_MMIO flag in dma->prot when the physical > > address corresponds to MMIO. This allows the IOMMU driver to handle > MMIO mappings specially. > > For example, on AMD CPUs with SME enabled, the AMD IOMMU driver > avoids > > setting the C-bit if iommu_map() is called with IOMMU_MMIO set in prot. > > This prevents issues with PCIe P2P communication, since current PCIe > > switches and devices do not interpret the C-bit correctly. > > > > Signed-off-by: Wei Wang <wei.w.wang@hotmail.com> > > --- > > drivers/vfio/vfio_iommu_type1.c | 14 +++++++++----- > > 1 file changed, 9 insertions(+), 5 deletions(-) > > This may be the best you can do with vfio type1, but just because the VMA is > special doesn't necessarily mean it is MMIO, nor does it mean it is decrypted > memory. I think here vfio type1 only needs to provide the info about "MMIO or not" (the decision to encrypt MMIO or not rests with the vendor IOMMU driver). Why might a region not be MMIO when vma->flags includes VM_IO | VM_PFNMAP? (are you aware of any real examples in use?) 
For reference, BAR MMIO regions are explicitly mapped with these flags in vfio_pci_core_mmap() : vm_flags_set(vma, VM_ALLOW_ANY_UNCACHED | VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);" The only exception I can think of is nested virtualization, where an emulated device (with vMMIO emulated using system RAM) is passed through to a nested guest. But this might not be commonly used in practice (no performance gain as physical device pass through), and the lack of encryption for such vMMIO should not be a concern, IMHO. Its security model aligns essentially with that of the host (i.e., physical MMIO data is not encrypted on the host, and the same principle applies to emulated vMMIO in nested environments). Also the same for physical devices (as opposed to virtual devices) passed to a nested guest. > > I think to really make this work fully properly going forward people are going > to have to use iommufd's dmabuf. Yeah, I'll also patch for iommufd. We still need to account for the case that many users are still relying on legacy VFIO type1 (will also have some backport work of this patch). ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses 2025-11-07 2:38 ` Wei Wang @ 2025-11-07 14:16 ` Jason Gunthorpe [not found] ` <SI2PR01MB4393E04163E5AC9FD45D56EFDCC3A@SI2PR01MB4393.apcprd01.prod.exchangelabs.com> 0 siblings, 1 reply; 20+ messages in thread From: Jason Gunthorpe @ 2025-11-07 14:16 UTC (permalink / raw) To: Wei Wang Cc: alex@shazbot.org, suravee.suthikulpanit@amd.com, thomas.lendacky@amd.com, joro@8bytes.org, kevin.tian@intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev On Fri, Nov 07, 2025 at 02:38:50AM +0000, Wei Wang wrote: > On Friday, November 7, 2025 9:04 AM, Jason Gunthorpe wrote: > > On Mon, Nov 03, 2025 at 10:00:34PM +0800, Wei Wang wrote: > > > Before requesting the IOMMU driver to map an IOVA to a physical > > > address, set the IOMMU_MMIO flag in dma->prot when the physical > > > address corresponds to MMIO. This allows the IOMMU driver to handle > > MMIO mappings specially. > > > For example, on AMD CPUs with SME enabled, the AMD IOMMU driver > > avoids > > > setting the C-bit if iommu_map() is called with IOMMU_MMIO set in prot. > > > This prevents issues with PCIe P2P communication, since current PCIe > > > switches and devices do not interpret the C-bit correctly. > > > > > > Signed-off-by: Wei Wang <wei.w.wang@hotmail.com> > > > --- > > > drivers/vfio/vfio_iommu_type1.c | 14 +++++++++----- > > > 1 file changed, 9 insertions(+), 5 deletions(-) > > > > This may be the best you can do with vfio type1, but just because the VMA is > > special doesn't necessarily mean it is MMIO, nor does it mean it is decrypted > > memory. > > I think here vfio type1 only needs to provide the info about "MMIO or not" > (the decision to encrypt MMIO or not rests with the vendor IOMMU driver). > > Why might a region not be MMIO when vma->flags includes VM_IO | VM_PFNMAP? > (are you aware of any real examples in use?) VM_IO should indicate MMIO, yes, but we don't actually check that in this type 1 path.. 
> > I think to really make this work fully properly going forward people are going > > to have to use iommufd's dmabuf. > > Yeah, I'll also patch for iommufd. We still need to account for the > case that many users are still relying on legacy VFIO type1 (will > also have some backport work of this patch). I think my dmabuf patch for iommufd already does this properly. Jason ^ permalink raw reply [flat|nested] 20+ messages in thread
[parent not found: <SI2PR01MB4393E04163E5AC9FD45D56EFDCC3A@SI2PR01MB4393.apcprd01.prod.exchangelabs.com>]
* Re: [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses [not found] ` <SI2PR01MB4393E04163E5AC9FD45D56EFDCC3A@SI2PR01MB4393.apcprd01.prod.exchangelabs.com> @ 2025-11-07 15:57 ` Jason Gunthorpe 2025-11-07 16:19 ` Wei Wang 0 siblings, 1 reply; 20+ messages in thread From: Jason Gunthorpe @ 2025-11-07 15:57 UTC (permalink / raw) To: Wei Wang Cc: alex@shazbot.org, suravee.suthikulpanit@amd.com, thomas.lendacky@amd.com, joro@8bytes.org, kevin.tian@intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev On Fri, Nov 07, 2025 at 03:49:17PM +0000, Wei Wang wrote: > > (are you aware of any real examples in use?) > > VM_IO should indicate MMIO, yes, but we don't actually check that in > > this type 1 path.. > Is it because VFIO type1 didn’t need to check for MMIO before? > (not sure how this impacts this patch adding the VM_IO check for MMIO > :) ) Okay, but it still doesn't mean it has to be decrypted.. Jason ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses 2025-11-07 15:57 ` Jason Gunthorpe @ 2025-11-07 16:19 ` Wei Wang 2025-11-07 16:36 ` Jason Gunthorpe 0 siblings, 1 reply; 20+ messages in thread From: Wei Wang @ 2025-11-07 16:19 UTC (permalink / raw) To: Jason Gunthorpe Cc: alex@shazbot.org, suravee.suthikulpanit@amd.com, thomas.lendacky@amd.com, joro@8bytes.org, kevin.tian@intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev On Friday, November 7, 2025 11:57 PM, Jason Gunthorpe wrote: On Fri, Nov 07, 2025 at 03:49:17PM +0000, Wei Wang wrote: > > (are you aware of any real examples in use?) > > VM_IO should indicate MMIO, yes, but we don't actually check that in > > this type 1 path.. > Is it because VFIO type1 didn’t need to check for MMIO before? > (not sure how this impacts this patch adding the VM_IO check for MMIO > :) ) > Okay, but it still doesn't mean it has to be decrypted.. I think "decrypted or not" is the job of the 1st patch. For now, MMIO cannot be encrypted, particularly not via sme_set(). If MMIO encryption is ever introduced in the future, a new flag (probably different from sme_me_mask) would need to be added. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses 2025-11-07 16:19 ` Wei Wang @ 2025-11-07 16:36 ` Jason Gunthorpe 2025-11-07 17:56 ` Tom Lendacky 0 siblings, 1 reply; 20+ messages in thread From: Jason Gunthorpe @ 2025-11-07 16:36 UTC (permalink / raw) To: Wei Wang Cc: alex@shazbot.org, suravee.suthikulpanit@amd.com, thomas.lendacky@amd.com, joro@8bytes.org, kevin.tian@intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev On Fri, Nov 07, 2025 at 04:19:35PM +0000, Wei Wang wrote: > On Friday, November 7, 2025 11:57 PM, Jason Gunthorpe wrote: > On Fri, Nov 07, 2025 at 03:49:17PM +0000, Wei Wang wrote: > > > (are you aware of any real examples in use?) > > > VM_IO should indicate MMIO, yes, but we don't actually check that in > > > this type 1 path.. > > > Is it because VFIO type1 didn’t need to check for MMIO before? > > (not sure how this impacts this patch adding the VM_IO check for MMIO > > :) ) > > > Okay, but it still doesn't mean it has to be decrypted.. > > I think "decrypted or not" is the job of the 1st patch. For now, > MMIO cannot be encrypted, particularly not via sme_set(). If MMIO > encryption is ever introduced in the future, a new flag (probably > different from sme_me_mask) would need to be added. The kernel is using "decrypted" as some weirdo code-word to mean the memory is shared with the hypervisor. Only on AMD does it even have anything to do with actual memory encryption. However when I look at swiotlb and dma coherent mmap I see it calls set_memory_decrypted(), uses pgprot_decrypted(), but still uses __sme_set() when forming the iommu page table?? So why is that OK, but MMIO needs to avoid the sme_set() in the iommu page table? IOW I would like to hear from AMD some clear rules when sme_set needs to be called and when it isn't. Then we can decide if VM_IO is sufficient and so on. Jason ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses 2025-11-07 16:36 ` Jason Gunthorpe @ 2025-11-07 17:56 ` Tom Lendacky 2025-11-07 18:32 ` Jason Gunthorpe 0 siblings, 1 reply; 20+ messages in thread From: Tom Lendacky @ 2025-11-07 17:56 UTC (permalink / raw) To: Jason Gunthorpe, Wei Wang Cc: alex@shazbot.org, suravee.suthikulpanit@amd.com, joro@8bytes.org, kevin.tian@intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev, Alexey Kardashevskiy On 11/7/25 10:36, Jason Gunthorpe wrote: > On Fri, Nov 07, 2025 at 04:19:35PM +0000, Wei Wang wrote: >> On Friday, November 7, 2025 11:57 PM, Jason Gunthorpe wrote: >> On Fri, Nov 07, 2025 at 03:49:17PM +0000, Wei Wang wrote: >>> > (are you aware of any real examples in use?) >>> > VM_IO should indicate MMIO, yes, but we don't actually check that in >>> > this type 1 path.. >> >>> Is it because VFIO type1 didn’t need to check for MMIO before? >>> (not sure how this impacts this patch adding the VM_IO check for MMIO >>> :) ) >> >>> Okay, but it still doesn't mean it has to be decrypted.. >> >> I think "decrypted or not" is the job of the 1st patch. For now, >> MMIO cannot be encrypted, particularly not via sme_set(). If MMIO >> encryption is ever introduced in the future, a new flag (probably >> different from sme_me_mask) would need to be added. > > The kernel is using "decrypted" as some weirdo code-word to mean the > memory is shared with the hypervisor. Only on AMD does it even have > anything to do with actual memory encryption. > > However when I look at swiotlb and dma coherent mmap I see it calls > set_memory_decrypted(), uses pgprot_decrypted(), but still uses > __sme_set() when forming the iommu page table?? > > So why is that OK, but MMIO needs to avoid the sme_set() in the iommu > page table? > > IOW I would like to hear from AMD some clear rules when sme_set needs > to be called and when it isn't. 
When you are on bare-metal, or in the hypervisor, System Memory Encryption (SME) deals with the encryption bit set in the page table entries (including the nested page table entries for guests). If the encryption bit is not set (decrypted), data does not get encrypted when written to memory and does not get decrypted when read from memory. If the encryption bit is set (encrypted), data gets encrypted when written to memory and decrypted when read from memory. MMIO, since it does not go through the memory controller, does not support encryption capabilities and so should not have the encryption bit set as it isn't recognized as system memory. On the hypervisor, when using the IOMMU, SWIOTLB is not used and I/O to and from system memory (DMA) will be encrypted and/or decrypted if the encryption bit is set in the I/O page table leaf entry. If the IOMMU is not enabled, then SWIOTLB is only used if the device does not support DMA addressing at or above the encryption bit location. In the guest (prior to Trusted I/O / TDISP), decrypted (or shared) memory is used because a device cannot DMA to or from guest memory using the guest encryption key. So all DMA must go to "decrypted" memory or be bounce-buffered through "decrypted" memory (SWIOTLB) - basically memory that does not get encrypted/decrypted using the guest encryption key. It is not until we get to Trusted I/O / TDISP where devices will be able to DMA directly to guest encrypted memory and guests will require secure MMIO addresses which will need the encryption bit set (Alexey can correct me on the TIO statements if they aren't correct, as he is closer to it all). I hope I've explained it in a way that makes sense. Thanks, Tom > > Then we can decide if VM_IO is sufficient and so on. > > Jason ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses 2025-11-07 17:56 ` Tom Lendacky @ 2025-11-07 18:32 ` Jason Gunthorpe 2025-11-07 19:59 ` Tom Lendacky 0 siblings, 1 reply; 20+ messages in thread From: Jason Gunthorpe @ 2025-11-07 18:32 UTC (permalink / raw) To: Tom Lendacky Cc: Wei Wang, alex@shazbot.org, suravee.suthikulpanit@amd.com, joro@8bytes.org, kevin.tian@intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev, Alexey Kardashevskiy On Fri, Nov 07, 2025 at 11:56:51AM -0600, Tom Lendacky wrote: > When you are on bare-metal, or in the hypervisor, System Memory Encryption > (SME) deals with the encryption bit set in the page table entries > (including the nested page table entries for guests). So "decrypted" means something about AMD's unique memory encryption scheme on bare metal but in a CC guest it is a cross arch 'shared with hypervisor' flag? What about CXL memory? What about ZONE_DEVICE coherent memory? Do these get the C bit set too? :( :( :( > In the guest (prior to Trusted I/O / TDISP), decrypted (or shared) memory > is used because a device cannot DMA to or from guest memory using the > guest encryption key. So all DMA must go to "decrypted" memory or be > bounce-buffered through "decrypted" memory (SWIOTLB) - basically memory > that does not get encrypted/decrypted using the guest encryption key. Where is the code for this? As I wrote we always do sme_set in the iommu driver, even on guests, even for "decrypted" bounce buffered memory. That sounds like a bug by your explanation? Does this mean vIOMMU has never worked in AMD CC guests? > It is not until we get to Trusted I/O / TDISP where devices will be able > to DMA directly to guest encrypted memory and guests will require secure > MMIO addresses which will need the encryption bit set (Alexey can correct > me on the TIO statements if they aren't correct, as he is closer to it all). 
So in this case we do need to do sme_set on MMIO even though that MMIO is not using the DRAM encryption key? Jason ^ permalink raw reply [flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses 2025-11-07 18:32 ` Jason Gunthorpe @ 2025-11-07 19:59 ` Tom Lendacky 2025-11-10 6:28 ` Wei Wang ` (2 more replies) 0 siblings, 3 replies; 20+ messages in thread From: Tom Lendacky @ 2025-11-07 19:59 UTC (permalink / raw) To: Jason Gunthorpe Cc: Wei Wang, alex@shazbot.org, suravee.suthikulpanit@amd.com, joro@8bytes.org, kevin.tian@intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev, Alexey Kardashevskiy On 11/7/25 12:32, Jason Gunthorpe wrote: > On Fri, Nov 07, 2025 at 11:56:51AM -0600, Tom Lendacky wrote: > >> When you are on bare-metal, or in the hypervisor, System Memory Encryption >> (SME) deals with the encryption bit set in the page table entries >> (including the nested page table entries for guests). > > So "decrypted" means something about AMD's unique memory encryption > scheme on bare metal but in a CC guest it is a cross arch 'shared with > hypervisor' flag? Note, that if the encryption bit is not set in the guest, then the host encryption key is used if the underlying NPT leaf entry has the encryption bit set. In that case, both the host and guest can read the memory, with the memory still being encrypted in physical memory. > > What about CXL memory? What about ZONE_DEVICE coherent memory? Do > these get the C bit set too? When CXL memory is presented as system memory to the OS it does support the encryption bit. So when pages are allocated for the guest, the memory pages will be encrypted with the guest key. Not sure what you mean by ZONE_DEVICE coherent memory. Is it presented to the system as system physical memory that the hypervisor can allocate as guest memory? > > :( :( :( > >> In the guest (prior to Trusted I/O / TDISP), decrypted (or shared) memory >> is used because a device cannot DMA to or from guest memory using the >> guest encryption key. 
So all DMA must go to "decrypted" memory or be >> bounce-buffered through "decrypted" memory (SWIOTLB) - basically memory >> that does not get encrypted/decrypted using the guest encryption key. > > Where is the code for this? As I wrote we always do sme_set in the > iommu driver, even on guests, even for "decrypted" bounce buffered > memory. > > That sounds like a bug by your explanation? > > Does this mean vIOMMU has never worked in AMD CC guests? I assume by vIOMMU you mean a VMM-emulated IOMMU in the guest. This does not work today with AMD CC guests since it requires the hypervisor to read the guest IOMMU buffers in order to emulate the behavior and those buffers are encrypted. So there is no vIOMMU support today in AMD CC guests. There was a patch series submitted a while back to allocate the IOMMU buffers in shared memory in order to support a (non-secure) vIOMMU in the guest in order to support >255 vCPUs, but that was rejected in favor of using kvm-msi-ext-dest-id. https://lore.kernel.org/linux-iommu/20240430152430.4245-1-suravee.suthikulpanit@amd.com/ > >> It is not until we get to Trusted I/O / TDISP where devices will be able >> to DMA directly to guest encrypted memory and guests will require secure >> MMIO addresses which will need the encryption bit set (Alexey can correct >> me on the TIO statements if they aren't correct, as he is closer to it all). > > So in this case we do need to do sme_set on MMIO even though that MMIO > is not using the dram encryption key? @Alexey will be able to provide more details on how this works. Thanks, Tom > > Jason ^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses 2025-11-07 19:59 ` Tom Lendacky @ 2025-11-10 6:28 ` Wei Wang 2025-11-10 9:55 ` Vasant Hegde 2025-11-18 14:36 ` Jason Gunthorpe 2 siblings, 0 replies; 20+ messages in thread From: Wei Wang @ 2025-11-10 6:28 UTC (permalink / raw) To: Tom Lendacky, Jason Gunthorpe Cc: alex@shazbot.org, suravee.suthikulpanit@amd.com, joro@8bytes.org, kevin.tian@intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev, Alexey Kardashevskiy On Saturday, November 8, 2025 3:59 AM, Tom Lendacky wrote: > On 11/7/25 12:32, Jason Gunthorpe wrote: > > On Fri, Nov 07, 2025 at 11:56:51AM -0600, Tom Lendacky wrote: > > > >> When you are on bare-metal, or in the hypervisor, System Memory > >> Encryption > >> (SME) deals with the encryption bit set in the page table entries > >> (including the nested page table entries for guests). > > > > So "decrypted" means something about AMD's unique memory encryption > > scheme on bare metal but in a CC guest it is a cross arch 'shared with > > hypervisor' flag? > > Note, that if the encryption bit is not set in the guest, then the host > encryption key is used if the underlying NPT leaf entry has the encryption bit > set. In that case, both the host and guest can read the memory, with the > memory still being encrypted in physical memory. > > > > > What about CXL memory? What about ZONE_DEVICE coherent memory? > Do > > these get the C bit set too? > > When CXL memory is presented as system memory to the OS it does support > the encryption bit. So when pages are allocated for the guest, the memory > pages will be encrypted with the guest key. > > Not sure what you mean by ZONE_DEVICE coherent memory. Is it presented > to the system as system physical memory that the hypervisor can allocate as > guest memory? 
> > > > > :( :( :( > > > >> In the guest (prior to Trusted I/O / TDISP), decrypted (or shared) > >> memory is used because a device cannot DMA to or from guest memory > >> using the guest encryption key. So all DMA must go to "decrypted" > >> memory or be bounce-buffered through "decrypted" memory (SWIOTLB) > - > >> basically memory that does not get encrypted/decrypted using the guest > encryption key. > > > > Where is the code for this? As I wrote we always do sme_set in the > > iommu driver, even on guests, even for "decrypted" bounce buffered > > memory. > > > > That sounds like a bug by your explanation? > > > > Does this mean vIOMMU has never worked in AMD CC guests? > > I assume by vIOMMU you mean a VMM-emulated IOMMU in the guest. This > does does not work today with AMD CC guests since it requires the > hypervisor to read the guest IOMMU buffers in order to emulate the > behavior and those buffers are encrypted. So there is no vIOMMU support > today in AMD CC guests. > > There was a patch series submitted a while back to allocate the IOMMU > buffers in shared memory in order to support a (non-secure) vIOMMU in the > guest in order to support >255 vCPUs, but that was rejected in favor of using > kvm-msi-ext-dest-id. > > https://lore.kernel.org/linux-iommu/20240430152430.4245-1- > suravee.suthikulpanit@amd.com/ > > > > >> It is not until we get to Trusted I/O / TDISP where devices will be > >> able to DMA directly to guest encrypted memory and guests will > >> require secure MMIO addresses which will need the encryption bit set > >> (Alexey can correct me on the TIO statements if they aren't correct, as he > is closer to it all). > > > > So in this case we do need to do sme_set on MMIO even though that > MMIO > > is not using the dram encryption key? > > @Alexey will be able to provide more details on how this works. 
Let me also share my perspective on the questions raised: In the TEE-IO case, trusted device MMIO is mapped to a private Guest Physical Address (FYI: this can be checked in the SEV-TIO whitepaper and the Intel TDX Connect architecture spec); that is, the C-bit is added to the GPA, not to the host physical address. So this case is not related to the updates introduced by this patch, which handles the C-bit for MMIO physical addresses (via iommu_v1_map_pages()). Also, the "encryption" bit (C-bit for AMD, S-bit for Intel) in the MMIO GPA does not actually engage the encryption engine (e.g. SME) in the memory controller for data encryption (communication with the trusted device is encrypted via IDE). This bit is used for other, non-encryption purposes. For ZONE_DEVICE memory (memory hosted on devices), IIUC, it is accessed via PCIe (not through the on-die memory controller), so SME is not used to encrypt this type of memory. If we pass through such a device to the guest using VFIO type1, it will be treated as device MMIO that bypasses the C-bit setting added by this patch, which I think is the expected behavior. For CXL memory, when it is used as the guest's system memory, it does not go through the VFIO PCI BAR MMIO pass-through mechanism, so it also falls outside the scope of the changes in this patch. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses 2025-11-07 19:59 ` Tom Lendacky 2025-11-10 6:28 ` Wei Wang @ 2025-11-10 9:55 ` Vasant Hegde 2025-11-18 14:36 ` Jason Gunthorpe 2 siblings, 0 replies; 20+ messages in thread From: Vasant Hegde @ 2025-11-10 9:55 UTC (permalink / raw) To: Tom Lendacky, Jason Gunthorpe Cc: Wei Wang, alex@shazbot.org, suravee.suthikulpanit@amd.com, joro@8bytes.org, kevin.tian@intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev, Alexey Kardashevskiy On 11/8/2025 1:29 AM, Tom Lendacky wrote: > On 11/7/25 12:32, Jason Gunthorpe wrote: >> On Fri, Nov 07, 2025 at 11:56:51AM -0600, Tom Lendacky wrote: >> >>> When you are on bare-metal, or in the hypervisor, System Memory Encryption >>> (SME) deals with the encryption bit set in the page table entries >>> (including the nested page table entries for guests). >> >> So "decrypted" means something about AMD's unique memory encryption >> scheme on bare metal but in a CC guest it is a cross arch 'shared with >> hypervisor' flag? > > Note, that if the encryption bit is not set in the guest, then the host > encryption key is used if the underlying NPT leaf entry has the encryption > bit set. In that case, both the host and guest can read the memory, with > the memory still being encrypted in physical memory. > >> >> What about CXL memory? What about ZONE_DEVICE coherent memory? Do >> these get the C bit set too? > > When CXL memory is presented as system memory to the OS it does support > the encryption bit. So when pages are allocated for the guest, the memory > pages will be encrypted with the guest key. > > Not sure what you mean by ZONE_DEVICE coherent memory. Is it presented to > the system as system physical memory that the hypervisor can allocate as > guest memory? 
> >> >> :( :( :( >> >>> In the guest (prior to Trusted I/O / TDISP), decrypted (or shared) memory >>> is used because a device cannot DMA to or from guest memory using the >>> guest encryption key. So all DMA must go to "decrypted" memory or be >>> bounce-buffered through "decrypted" memory (SWIOTLB) - basically memory >>> that does not get encrypted/decrypted using the guest encryption key. >> >> Where is the code for this? As I wrote we always do sme_set in the >> iommu driver, even on guests, even for "decrypted" bounce buffered >> memory. >> >> That sounds like a bug by your explanation? >> >> Does this mean vIOMMU has never worked in AMD CC guests? > > I assume by vIOMMU you mean a VMM-emulated IOMMU in the guest. This does > does not work today with AMD CC guests since it requires the hypervisor to > read the guest IOMMU buffers in order to emulate the behavior and those > buffers are encrypted. So there is no vIOMMU support today in AMD CC > guests. > > There was a patch series submitted a while back to allocate the IOMMU > buffers in shared memory in order to support a (non-secure) vIOMMU in the > guest in order to support >255 vCPUs, but that was rejected in favor of > using kvm-msi-ext-dest-id. > > https://lore.kernel.org/linux-iommu/20240430152430.4245-1-suravee.suthikulpanit@amd.com/ > >> >>> It is not until we get to Trusted I/O / TDISP where devices will be able >>> to DMA directly to guest encrypted memory and guests will require secure >>> MMIO addresses which will need the encryption bit set (Alexey can correct >>> me on the TIO statements if they aren't correct, as he is closer to it all). >> >> So in this case we do need to do sme_set on MMIO even though that MMIO >> is not using the dram encryption key? Yes. Its mapped to GPA (at least IOMMU VF MMIO BAR, I believe its same for TIO device as well) and we need to set 'C' bit. -Vasan > > @Alexey will be able to provide more details on how this works. 
> > Thanks, > Tom > >> >> Jason > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses 2025-11-07 19:59 ` Tom Lendacky 2025-11-10 6:28 ` Wei Wang 2025-11-10 9:55 ` Vasant Hegde @ 2025-11-18 14:36 ` Jason Gunthorpe 2 siblings, 0 replies; 20+ messages in thread From: Jason Gunthorpe @ 2025-11-18 14:36 UTC (permalink / raw) To: Tom Lendacky Cc: Wei Wang, alex@shazbot.org, suravee.suthikulpanit@amd.com, joro@8bytes.org, kevin.tian@intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev, Alexey Kardashevskiy On Fri, Nov 07, 2025 at 01:59:00PM -0600, Tom Lendacky wrote: > On 11/7/25 12:32, Jason Gunthorpe wrote: > > On Fri, Nov 07, 2025 at 11:56:51AM -0600, Tom Lendacky wrote: > > > >> When you are on bare-metal, or in the hypervisor, System Memory Encryption > >> (SME) deals with the encryption bit set in the page table entries > >> (including the nested page table entries for guests). > > > > So "decrypted" means something about AMD's unique memory encryption > > scheme on bare metal but in a CC guest it is a cross arch 'shared with > > hypervisor' flag? > > Note, that if the encryption bit is not set in the guest, then the host > encryption key is used if the underlying NPT leaf entry has the encryption > bit set. In that case, both the host and guest can read the memory, with > the memory still being encrypted in physical memory. Sure, so in the guest it is simply a 'shared with hypervisor' flag and does not directly indicate if the memory controller did encryption or not. > > What about CXL memory? What about ZONE_DEVICE coherent memory? Do > > these get the C bit set too? > > When CXL memory is presented as system memory to the OS it does support > the encryption bit. So when pages are allocated for the guest, the memory > pages will be encrypted with the guest key. > > Not sure what you mean by ZONE_DEVICE coherent memory. Is it presented to > the system as system physical memory that the hypervisor can allocate as > guest memory?
This is an option for CXL memory on CXL type 2 devices - ie GPU memory. It is coherent but it is managed by a driver not by the core OS as system memory. > There was a patch series submitted a while back to allocate the IOMMU > buffers in shared memory in order to support a (non-secure) vIOMMU in the > guest in order to support >255 vCPUs, but that was rejected in favor of > using kvm-msi-ext-dest-id. Yes, but that was incomplete, it only did the data structures and only really worked for interrupt remapping. It left the actual iommu broken since we don't clear the C bit on swiotlb. Jason ^ permalink raw reply [flat|nested] 20+ messages in thread
End of thread (newest message: 2025-11-18 14:37 UTC).
Thread overview: 20+ messages -- links below jump to the message on this page --
2025-11-03 14:00 [PATCH v2 0/2] iommu/amd: Avoid setting C-bit for MMIO addresses Wei Wang
2025-11-03 14:00 ` [PATCH v2 1/2] iommu/amd: Add IOMMU_PROT_IE flag for memory encryption Wei Wang
2025-11-07 1:02 ` Jason Gunthorpe
2025-11-07 2:39 ` Wei Wang
2025-11-10 9:55 ` Vasant Hegde
2025-11-11 1:18 ` Wei Wang
2025-11-11 4:44 ` Vasant Hegde
2025-11-03 14:00 ` [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses Wei Wang
2025-11-07 1:03 ` Jason Gunthorpe
2025-11-07 2:38 ` Wei Wang
2025-11-07 14:16 ` Jason Gunthorpe
[not found] ` <SI2PR01MB4393E04163E5AC9FD45D56EFDCC3A@SI2PR01MB4393.apcprd01.prod.exchangelabs.com>
2025-11-07 15:57 ` Jason Gunthorpe
2025-11-07 16:19 ` Wei Wang
2025-11-07 16:36 ` Jason Gunthorpe
2025-11-07 17:56 ` Tom Lendacky
2025-11-07 18:32 ` Jason Gunthorpe
2025-11-07 19:59 ` Tom Lendacky
2025-11-10 6:28 ` Wei Wang
2025-11-10 9:55 ` Vasant Hegde
2025-11-18 14:36 ` Jason Gunthorpe