* [PATCH v2 0/2] iommu/amd: Avoid setting C-bit for MMIO addresses
@ 2025-11-03 14:00 Wei Wang
  2025-11-03 14:00 ` [PATCH v2 1/2] iommu/amd: Add IOMMU_PROT_IE flag for memory encryption Wei Wang
  2025-11-03 14:00 ` [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses Wei Wang
  0 siblings, 2 replies; 20+ messages in thread
From: Wei Wang @ 2025-11-03 14:00 UTC (permalink / raw)
  To: alex, jgg, suravee.suthikulpanit, thomas.lendacky, joro
  Cc: kevin.tian, wei.w.wang, linux-kernel, iommu

When SME is enabled, iommu_v1_map_pages() currently sets the C-bit for
all physical addresses. This is correct for system RAM, since the C-bit
is required by SME to indicate encrypted memory and ensure proper
encryption/decryption. However, applying the C-bit to MMIO addresses is
incorrect: devices and PCIe switches do not currently interpret the
C-bit, and setting it can break PCIe peer-to-peer communication. To
prevent this, avoid setting the C-bit when the physical address is
backed by MMIO.

Note: this patchset only updates vfio_iommu_type1. Corresponding changes
to iommufd to pass the IOMMU_MMIO prot flag will be added if this
approach is accepted.

v1->v2 changes:
- v1 used page_is_ram() in the AMD IOMMU driver to detect non-RAM
  addresses, avoiding changes to upper-layer callers (vfio and iommufd).
  v2 instead lets upper layers explicitly indicate MMIO mappings via the
  IOMMU_MMIO prot flag. This avoids the potential overhead of
  page_is_ram(). (suggested by Jason Gunthorpe)

v1 link: https://lkml.org/lkml/2025/10/23/1211

Wei Wang (2):
  iommu/amd: Add IOMMU_PROT_IE flag for memory encryption
  vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses

 drivers/iommu/amd/amd_iommu_types.h |  3 ++-
 drivers/iommu/amd/io_pgtable.c      |  7 +++++--
 drivers/iommu/amd/iommu.c           |  2 ++
 drivers/vfio/vfio_iommu_type1.c     | 14 +++++++++-----
 4 files changed, 18 insertions(+), 8 deletions(-)

-- 
2.51.1

^ permalink raw reply	[flat|nested] 20+ messages in thread
* [PATCH v2 1/2] iommu/amd: Add IOMMU_PROT_IE flag for memory encryption
  2025-11-03 14:00 [PATCH v2 0/2] iommu/amd: Avoid setting C-bit for MMIO addresses Wei Wang
@ 2025-11-03 14:00 ` Wei Wang
  2025-11-07  1:02   ` Jason Gunthorpe
  2025-11-10  9:55   ` Vasant Hegde
  2025-11-03 14:00 ` [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses Wei Wang
  1 sibling, 2 replies; 20+ messages in thread
From: Wei Wang @ 2025-11-03 14:00 UTC (permalink / raw)
  To: alex, jgg, suravee.suthikulpanit, thomas.lendacky, joro
  Cc: kevin.tian, wei.w.wang, linux-kernel, iommu

Introduce the IOMMU_PROT_IE flag to allow callers of iommu_v1_map_pages()
to explicitly request memory encryption for specific mappings.

With SME enabled, the C-bit (encryption bit) in IOMMU page table entries
is now set only when IOMMU_PROT_IE is specified. This provides
fine-grained control over which IOVAs are encrypted through the IOMMU
page tables.

Current PCIe devices and switches do not interpret the C-bit, so applying
it to MMIO mappings would break PCIe peer-to-peer communication. Update
the implementation to restrict C-bit usage to non-MMIO backed IOVAs.

Fixes: 2543a786aa25 ("iommu/amd: Allow the AMD IOMMU to work with memory encryption")
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Wei Wang <wei.w.wang@hotmail.com>
---
 drivers/iommu/amd/amd_iommu_types.h | 3 ++-
 drivers/iommu/amd/io_pgtable.c      | 7 +++++--
 drivers/iommu/amd/iommu.c           | 2 ++
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index a698a2e7ce2a..5b6ce0286a16 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -442,9 +442,10 @@
 #define IOMMU_PTE_PAGE(pte)	(iommu_phys_to_virt((pte) & IOMMU_PAGE_MASK))
 #define IOMMU_PTE_MODE(pte)	(((pte) >> 9) & 0x07)
 
-#define IOMMU_PROT_MASK 0x03
+#define IOMMU_PROT_MASK (IOMMU_PROT_IR | IOMMU_PROT_IW | IOMMU_PROT_IE)
 #define IOMMU_PROT_IR 0x01
 #define IOMMU_PROT_IW 0x02
+#define IOMMU_PROT_IE 0x04
 
 #define IOMMU_UNITY_MAP_FLAG_EXCL_RANGE (1 << 2)
 
diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index 70c2f5b1631b..ae5032dd3b2f 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -367,11 +367,14 @@ static int iommu_v1_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
 	if (!iommu_pages_list_empty(&freelist))
 		updated = true;
 
+	if (prot & IOMMU_PROT_IE)
+		paddr = __sme_set(paddr);
+
 	if (count > 1) {
-		__pte = PAGE_SIZE_PTE(__sme_set(paddr), pgsize);
+		__pte = PAGE_SIZE_PTE(paddr, pgsize);
 		__pte |= PM_LEVEL_ENC(7) | IOMMU_PTE_PR | IOMMU_PTE_FC;
 	} else
-		__pte = __sme_set(paddr) | IOMMU_PTE_PR | IOMMU_PTE_FC;
+		__pte = paddr | IOMMU_PTE_PR | IOMMU_PTE_FC;
 
 	if (prot & IOMMU_PROT_IR)
 		__pte |= IOMMU_PTE_IR;
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 2e1865daa1ce..eaf024e9dff0 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2762,6 +2762,8 @@ static int amd_iommu_map_pages(struct iommu_domain *dom, unsigned long iova,
 		prot |= IOMMU_PROT_IR;
 	if (iommu_prot & IOMMU_WRITE)
 		prot |= IOMMU_PROT_IW;
+	if (!(iommu_prot & IOMMU_MMIO))
+		prot |= IOMMU_PROT_IE;
 
 	if (ops->map_pages) {
 		ret = ops->map_pages(ops, iova, paddr, pgsize,
-- 
2.51.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/2] iommu/amd: Add IOMMU_PROT_IE flag for memory encryption 2025-11-03 14:00 ` [PATCH v2 1/2] iommu/amd: Add IOMMU_PROT_IE flag for memory encryption Wei Wang @ 2025-11-07 1:02 ` Jason Gunthorpe 2025-11-07 2:39 ` Wei Wang 2025-11-10 9:55 ` Vasant Hegde 1 sibling, 1 reply; 20+ messages in thread From: Jason Gunthorpe @ 2025-11-07 1:02 UTC (permalink / raw) To: Wei Wang Cc: alex, suravee.suthikulpanit, thomas.lendacky, joro, kevin.tian, linux-kernel, iommu On Mon, Nov 03, 2025 at 10:00:33PM +0800, Wei Wang wrote: > Introduce the IOMMU_PROT_IE flag to allow callers of iommu_v1_map_pages() > to explicitly request memory encryption for specific mappings. > > With SME enabled, the C-bit (encryption bit) in IOMMU page table entries > is now set only when IOMMU_PROT_IE is specified. This provides > fine-grained control over which IOVAs are encrypted through the IOMMU > page tables. > > Current PCIe devices and switches do not interpret the C-bit, so applying > it to MMIO mappings would break PCIe peer‑to‑peer communication. Update > the implementation to restrict C-bit usage to non‑MMIO backed IOVAs. > > Fixes: 2543a786aa25 ("iommu/amd: Allow the AMD IOMMU to work with memory encryption") > Suggested-by: Jason Gunthorpe <jgg@nvidia.com> > Signed-off-by: Wei Wang <wei.w.wang@hotmail.com> > --- > drivers/iommu/amd/amd_iommu_types.h | 3 ++- > drivers/iommu/amd/io_pgtable.c | 7 +++++-- > drivers/iommu/amd/iommu.c | 2 ++ > 3 files changed, 9 insertions(+), 3 deletions(-) Since Joerg took the iommupt patches this will need to be rebased on his tree, I think it will be simpler.. Jason ^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: [PATCH v2 1/2] iommu/amd: Add IOMMU_PROT_IE flag for memory encryption 2025-11-07 1:02 ` Jason Gunthorpe @ 2025-11-07 2:39 ` Wei Wang 0 siblings, 0 replies; 20+ messages in thread From: Wei Wang @ 2025-11-07 2:39 UTC (permalink / raw) To: Jason Gunthorpe Cc: alex@shazbot.org, suravee.suthikulpanit@amd.com, thomas.lendacky@amd.com, joro@8bytes.org, kevin.tian@intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev On Friday, November 7, 2025 9:02 AM, Jason Gunthorpe wrote: > On Mon, Nov 03, 2025 at 10:00:33PM +0800, Wei Wang wrote: > > Introduce the IOMMU_PROT_IE flag to allow callers of > > iommu_v1_map_pages() to explicitly request memory encryption for > specific mappings. > > > > With SME enabled, the C-bit (encryption bit) in IOMMU page table > > entries is now set only when IOMMU_PROT_IE is specified. This provides > > fine-grained control over which IOVAs are encrypted through the IOMMU > > page tables. > > > > Current PCIe devices and switches do not interpret the C-bit, so > > applying it to MMIO mappings would break PCIe peer‑to‑peer > > communication. Update the implementation to restrict C-bit usage to > non‑MMIO backed IOVAs. > > > > Fixes: 2543a786aa25 ("iommu/amd: Allow the AMD IOMMU to work with > > memory encryption") > > Suggested-by: Jason Gunthorpe <jgg@nvidia.com> > > Signed-off-by: Wei Wang <wei.w.wang@hotmail.com> > > --- > > drivers/iommu/amd/amd_iommu_types.h | 3 ++- > > drivers/iommu/amd/io_pgtable.c | 7 +++++-- > > drivers/iommu/amd/iommu.c | 2 ++ > > 3 files changed, 9 insertions(+), 3 deletions(-) > > Since Joerg took the iommupt patches this will need to be rebased on his > tree, I think it will be simpler.. OK, I will have a check, thanks. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/2] iommu/amd: Add IOMMU_PROT_IE flag for memory encryption 2025-11-03 14:00 ` [PATCH v2 1/2] iommu/amd: Add IOMMU_PROT_IE flag for memory encryption Wei Wang 2025-11-07 1:02 ` Jason Gunthorpe @ 2025-11-10 9:55 ` Vasant Hegde 2025-11-11 1:18 ` Wei Wang 1 sibling, 1 reply; 20+ messages in thread From: Vasant Hegde @ 2025-11-10 9:55 UTC (permalink / raw) To: Wei Wang, alex, jgg, suravee.suthikulpanit, thomas.lendacky, joro Cc: kevin.tian, linux-kernel, iommu Hi Wei, On 11/3/2025 7:30 PM, Wei Wang wrote: > Introduce the IOMMU_PROT_IE flag to allow callers of iommu_v1_map_pages() > to explicitly request memory encryption for specific mappings. > > With SME enabled, the C-bit (encryption bit) in IOMMU page table entries > is now set only when IOMMU_PROT_IE is specified. This provides > fine-grained control over which IOVAs are encrypted through the IOMMU > page tables. > > Current PCIe devices and switches do not interpret the C-bit, so applying > it to MMIO mappings would break PCIe peer‑to‑peer communication. Update > the implementation to restrict C-bit usage to non‑MMIO backed IOVAs. Right. Quote from AMD Programmers Manual Vol2, "any pages corresponding to MMIO addresses must be configured with the C-bit clear." > > Fixes: 2543a786aa25 ("iommu/amd: Allow the AMD IOMMU to work with memory encryption") > Suggested-by: Jason Gunthorpe <jgg@nvidia.com> > Signed-off-by: Wei Wang <wei.w.wang@hotmail.com> > ---> drivers/iommu/amd/amd_iommu_types.h | 3 ++- > drivers/iommu/amd/io_pgtable.c | 7 +++++-- May be apply same fix for io_pgtable_v2.c as well? (Of course filename changed with generic pt series). 
-Vasant > drivers/iommu/amd/iommu.c | 2 ++ > 3 files changed, 9 insertions(+), 3 deletions(-) > > diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h > index a698a2e7ce2a..5b6ce0286a16 100644 > --- a/drivers/iommu/amd/amd_iommu_types.h > +++ b/drivers/iommu/amd/amd_iommu_types.h > @@ -442,9 +442,10 @@ > #define IOMMU_PTE_PAGE(pte) (iommu_phys_to_virt((pte) & IOMMU_PAGE_MASK)) > #define IOMMU_PTE_MODE(pte) (((pte) >> 9) & 0x07) > > -#define IOMMU_PROT_MASK 0x03 > +#define IOMMU_PROT_MASK (IOMMU_PROT_IR | IOMMU_PROT_IW | IOMMU_PROT_IE) > #define IOMMU_PROT_IR 0x01 > #define IOMMU_PROT_IW 0x02 > +#define IOMMU_PROT_IE 0x04 > > #define IOMMU_UNITY_MAP_FLAG_EXCL_RANGE (1 << 2) > > diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c > index 70c2f5b1631b..ae5032dd3b2f 100644 > --- a/drivers/iommu/amd/io_pgtable.c > +++ b/drivers/iommu/amd/io_pgtable.c > @@ -367,11 +367,14 @@ static int iommu_v1_map_pages(struct io_pgtable_ops *ops, unsigned long iova, > if (!iommu_pages_list_empty(&freelist)) > updated = true; > > + if (prot & IOMMU_PROT_IE) > + paddr = __sme_set(paddr); > + > if (count > 1) { > - __pte = PAGE_SIZE_PTE(__sme_set(paddr), pgsize); > + __pte = PAGE_SIZE_PTE(paddr, pgsize); > __pte |= PM_LEVEL_ENC(7) | IOMMU_PTE_PR | IOMMU_PTE_FC; > } else > - __pte = __sme_set(paddr) | IOMMU_PTE_PR | IOMMU_PTE_FC; > + __pte = paddr | IOMMU_PTE_PR | IOMMU_PTE_FC; > > if (prot & IOMMU_PROT_IR) > __pte |= IOMMU_PTE_IR; > diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c > index 2e1865daa1ce..eaf024e9dff0 100644 > --- a/drivers/iommu/amd/iommu.c > +++ b/drivers/iommu/amd/iommu.c > @@ -2762,6 +2762,8 @@ static int amd_iommu_map_pages(struct iommu_domain *dom, unsigned long iova, > prot |= IOMMU_PROT_IR; > if (iommu_prot & IOMMU_WRITE) > prot |= IOMMU_PROT_IW; > + if (!(iommu_prot & IOMMU_MMIO)) > + prot |= IOMMU_PROT_IE; > > if (ops->map_pages) { > ret = ops->map_pages(ops, iova, paddr, pgsize, ^ 
permalink raw reply [flat|nested] 20+ messages in thread
* RE: [PATCH v2 1/2] iommu/amd: Add IOMMU_PROT_IE flag for memory encryption 2025-11-10 9:55 ` Vasant Hegde @ 2025-11-11 1:18 ` Wei Wang 2025-11-11 4:44 ` Vasant Hegde 0 siblings, 1 reply; 20+ messages in thread From: Wei Wang @ 2025-11-11 1:18 UTC (permalink / raw) To: Vasant Hegde, alex@shazbot.org, jgg@nvidia.com, suravee.suthikulpanit@amd.com, thomas.lendacky@amd.com, joro@8bytes.org Cc: kevin.tian@intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev On Monday, November 10, 2025 5:55 PM, Vasant Hegde wrote: > To: Wei Wang <wei.w.wang@hotmail.com>; alex@shazbot.org; > jgg@nvidia.com; suravee.suthikulpanit@amd.com; > thomas.lendacky@amd.com; joro@8bytes.org > Cc: kevin.tian@intel.com; linux-kernel@vger.kernel.org; > iommu@lists.linux.dev > Subject: Re: [PATCH v2 1/2] iommu/amd: Add IOMMU_PROT_IE flag for > memory encryption > > Hi Wei, > > On 11/3/2025 7:30 PM, Wei Wang wrote: > > Introduce the IOMMU_PROT_IE flag to allow callers of > > iommu_v1_map_pages() to explicitly request memory encryption for > specific mappings. > > > > With SME enabled, the C-bit (encryption bit) in IOMMU page table > > entries is now set only when IOMMU_PROT_IE is specified. This provides > > fine-grained control over which IOVAs are encrypted through the IOMMU > > page tables. > > > > Current PCIe devices and switches do not interpret the C-bit, so > > applying it to MMIO mappings would break PCIe peer‑to‑peer > > communication. Update the implementation to restrict C-bit usage to > non‑MMIO backed IOVAs. > > Right. Quote from AMD Programmers Manual Vol2, "any pages > corresponding to MMIO addresses must be configured with the C-bit clear." > Yes, thanks. 
> > > > Fixes: 2543a786aa25 ("iommu/amd: Allow the AMD IOMMU to work with > > memory encryption") > > Suggested-by: Jason Gunthorpe <jgg@nvidia.com> > > Signed-off-by: Wei Wang <wei.w.wang@hotmail.com> > > ---> drivers/iommu/amd/amd_iommu_types.h | 3 ++- > > drivers/iommu/amd/io_pgtable.c | 7 +++++-- > > May be apply same fix for io_pgtable_v2.c as well? (Of course filename > changed with generic pt series). Yes. I was uncertain about the 1st stage mapping as it has a usage for GVA->GPA mappings, and for the trusted MMIO case, we do need the C-bit added to GPA. But since vIOMMU isn’t supported for SNP guests, and the trusted MMIO isn't ready yet, I think it should be safe to proceed with this now. The above consideration can be re-visited when the trusted MMIO gets landed. I'll add it in the next version and see if others might have a different perspective on this. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/2] iommu/amd: Add IOMMU_PROT_IE flag for memory encryption 2025-11-11 1:18 ` Wei Wang @ 2025-11-11 4:44 ` Vasant Hegde 0 siblings, 0 replies; 20+ messages in thread From: Vasant Hegde @ 2025-11-11 4:44 UTC (permalink / raw) To: Wei Wang, alex@shazbot.org, jgg@nvidia.com, suravee.suthikulpanit@amd.com, thomas.lendacky@amd.com, joro@8bytes.org Cc: kevin.tian@intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev Wei, On 11/11/2025 6:48 AM, Wei Wang wrote: > On Monday, November 10, 2025 5:55 PM, Vasant Hegde wrote: >> To: Wei Wang <wei.w.wang@hotmail.com>; alex@shazbot.org; >> jgg@nvidia.com; suravee.suthikulpanit@amd.com; >> thomas.lendacky@amd.com; joro@8bytes.org >> Cc: kevin.tian@intel.com; linux-kernel@vger.kernel.org; >> iommu@lists.linux.dev >> Subject: Re: [PATCH v2 1/2] iommu/amd: Add IOMMU_PROT_IE flag for >> memory encryption >> >> Hi Wei, >> >> On 11/3/2025 7:30 PM, Wei Wang wrote: >>> Introduce the IOMMU_PROT_IE flag to allow callers of >>> iommu_v1_map_pages() to explicitly request memory encryption for >> specific mappings. >>> >>> With SME enabled, the C-bit (encryption bit) in IOMMU page table >>> entries is now set only when IOMMU_PROT_IE is specified. This provides >>> fine-grained control over which IOVAs are encrypted through the IOMMU >>> page tables. >>> >>> Current PCIe devices and switches do not interpret the C-bit, so >>> applying it to MMIO mappings would break PCIe peer‑to‑peer >>> communication. Update the implementation to restrict C-bit usage to >> non‑MMIO backed IOVAs. >> >> Right. Quote from AMD Programmers Manual Vol2, "any pages >> corresponding to MMIO addresses must be configured with the C-bit clear." >> > > Yes, thanks. 
> >>> >>> Fixes: 2543a786aa25 ("iommu/amd: Allow the AMD IOMMU to work with >>> memory encryption") >>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> >>> Signed-off-by: Wei Wang <wei.w.wang@hotmail.com> >>> ---> drivers/iommu/amd/amd_iommu_types.h | 3 ++- >>> drivers/iommu/amd/io_pgtable.c | 7 +++++-- >> >> May be apply same fix for io_pgtable_v2.c as well? (Of course filename >> changed with generic pt series). > > Yes. I was uncertain about the 1st stage mapping as it has a usage for > GVA->GPA mappings, and for the trusted MMIO case, we do need the > C-bit added to GPA. But since vIOMMU isn’t supported for SNP guests, > and the trusted MMIO isn't ready yet, I think it should be safe to proceed > with this now. The above consideration can be re-visited when the trusted > MMIO gets landed. Right. We use v2 (guest) page table to support PASID/PRI in host. Agree. for now lets fix the v2 page table as well. We will fix it for Secure vIOMMU when we add support. > > I'll add it in the next version and see if others might have a different > perspective on this. Thanks! -Vasant ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses
  2025-11-03 14:00 [PATCH v2 0/2] iommu/amd: Avoid setting C-bit for MMIO addresses Wei Wang
  2025-11-03 14:00 ` [PATCH v2 1/2] iommu/amd: Add IOMMU_PROT_IE flag for memory encryption Wei Wang
@ 2025-11-03 14:00 ` Wei Wang
  2025-11-07  1:03   ` Jason Gunthorpe
  1 sibling, 1 reply; 20+ messages in thread
From: Wei Wang @ 2025-11-03 14:00 UTC (permalink / raw)
  To: alex, jgg, suravee.suthikulpanit, thomas.lendacky, joro
  Cc: kevin.tian, wei.w.wang, linux-kernel, iommu

Before requesting the IOMMU driver to map an IOVA to a physical address,
set the IOMMU_MMIO flag in dma->prot when the physical address
corresponds to MMIO. This allows the IOMMU driver to handle MMIO
mappings specially. For example, on AMD CPUs with SME enabled, the AMD
IOMMU driver avoids setting the C-bit if iommu_map() is called with
IOMMU_MMIO set in prot. This prevents issues with PCIe P2P
communication, since current PCIe switches and devices do not interpret
the C-bit correctly.

Signed-off-by: Wei Wang <wei.w.wang@hotmail.com>
---
 drivers/vfio/vfio_iommu_type1.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 5167bec14e36..f5c56e227f9a 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -583,7 +583,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
  * returned initial pfn are provided; subsequent pfns are contiguous.
  */
 static long vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
-			   unsigned long npages, int prot, unsigned long *pfn,
+			   unsigned long npages, int *prot, unsigned long *pfn,
 			   struct vfio_batch *batch)
 {
 	unsigned long pin_pages = min_t(unsigned long, npages, batch->capacity);
@@ -591,7 +591,7 @@ static long vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
 	unsigned int flags = 0;
 	long ret;
 
-	if (prot & IOMMU_WRITE)
+	if (*prot & IOMMU_WRITE)
 		flags |= FOLL_WRITE;
 
 	mmap_read_lock(mm);
@@ -601,6 +601,7 @@ static long vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
 		*pfn = page_to_pfn(batch->pages[0]);
 		batch->size = ret;
 		batch->offset = 0;
+		*prot &= ~IOMMU_MMIO;
 		goto done;
 	} else if (!ret) {
 		ret = -EFAULT;
@@ -615,7 +616,7 @@ static long vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
 		unsigned long addr_mask;
 
 		ret = follow_fault_pfn(vma, mm, vaddr, pfn, &addr_mask,
-				       prot & IOMMU_WRITE);
+				       *prot & IOMMU_WRITE);
 		if (ret == -EAGAIN)
 			goto retry;
 
@@ -629,6 +630,9 @@ static long vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
 				ret = -EFAULT;
 			}
 		}
+
+		if (vma->vm_flags & VM_IO)
+			*prot |= IOMMU_MMIO;
 	}
 done:
 	mmap_read_unlock(mm);
@@ -709,7 +713,7 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 		cond_resched();
 
 		/* Empty batch, so refill it. */
-		ret = vaddr_get_pfns(mm, vaddr, npage, dma->prot,
+		ret = vaddr_get_pfns(mm, vaddr, npage, &dma->prot,
 				     &pfn, batch);
 		if (ret < 0)
 			goto unpin_out;
@@ -850,7 +854,7 @@ static int vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr,
 
 	vfio_batch_init_single(&batch);
 
-	ret = vaddr_get_pfns(mm, vaddr, 1, dma->prot, pfn_base, &batch);
+	ret = vaddr_get_pfns(mm, vaddr, 1, &dma->prot, pfn_base, &batch);
 	if (ret != 1)
 		goto out;
 
-- 
2.51.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses 2025-11-03 14:00 ` [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses Wei Wang @ 2025-11-07 1:03 ` Jason Gunthorpe 2025-11-07 2:38 ` Wei Wang 0 siblings, 1 reply; 20+ messages in thread From: Jason Gunthorpe @ 2025-11-07 1:03 UTC (permalink / raw) To: Wei Wang Cc: alex, suravee.suthikulpanit, thomas.lendacky, joro, kevin.tian, linux-kernel, iommu On Mon, Nov 03, 2025 at 10:00:34PM +0800, Wei Wang wrote: > Before requesting the IOMMU driver to map an IOVA to a physical address, > set the IOMMU_MMIO flag in dma->prot when the physical address corresponds > to MMIO. This allows the IOMMU driver to handle MMIO mappings specially. > For example, on AMD CPUs with SME enabled, the AMD IOMMU driver avoids > setting the C-bit if iommu_map() is called with IOMMU_MMIO set in prot. > This prevents issues with PCIe P2P communication, since current PCIe > switches and devices do not interpret the C-bit correctly. > > Signed-off-by: Wei Wang <wei.w.wang@hotmail.com> > --- > drivers/vfio/vfio_iommu_type1.c | 14 +++++++++----- > 1 file changed, 9 insertions(+), 5 deletions(-) This may be the best you can do with vfio type1, but just because the VMA is special doesn't necessarily mean it is MMIO, nor does it mean it is decrypted memory. I think to really make this work fully properly going forward people are going to have to use iommufd's dmabuf. Jason ^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses 2025-11-07 1:03 ` Jason Gunthorpe @ 2025-11-07 2:38 ` Wei Wang 2025-11-07 14:16 ` Jason Gunthorpe 0 siblings, 1 reply; 20+ messages in thread From: Wei Wang @ 2025-11-07 2:38 UTC (permalink / raw) To: Jason Gunthorpe Cc: alex@shazbot.org, suravee.suthikulpanit@amd.com, thomas.lendacky@amd.com, joro@8bytes.org, kevin.tian@intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev On Friday, November 7, 2025 9:04 AM, Jason Gunthorpe wrote: > On Mon, Nov 03, 2025 at 10:00:34PM +0800, Wei Wang wrote: > > Before requesting the IOMMU driver to map an IOVA to a physical > > address, set the IOMMU_MMIO flag in dma->prot when the physical > > address corresponds to MMIO. This allows the IOMMU driver to handle > MMIO mappings specially. > > For example, on AMD CPUs with SME enabled, the AMD IOMMU driver > avoids > > setting the C-bit if iommu_map() is called with IOMMU_MMIO set in prot. > > This prevents issues with PCIe P2P communication, since current PCIe > > switches and devices do not interpret the C-bit correctly. > > > > Signed-off-by: Wei Wang <wei.w.wang@hotmail.com> > > --- > > drivers/vfio/vfio_iommu_type1.c | 14 +++++++++----- > > 1 file changed, 9 insertions(+), 5 deletions(-) > > This may be the best you can do with vfio type1, but just because the VMA is > special doesn't necessarily mean it is MMIO, nor does it mean it is decrypted > memory. I think here vfio type1 only needs to provide the info about "MMIO or not" (the decision to encrypt MMIO or not rests with the vendor IOMMU driver). Why might a region not be MMIO when vma->flags includes VM_IO | VM_PFNMAP? (are you aware of any real examples in use?) 
For reference, BAR MMIO regions are explicitly mapped with these flags in vfio_pci_core_mmap() : vm_flags_set(vma, VM_ALLOW_ANY_UNCACHED | VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);" The only exception I can think of is nested virtualization, where an emulated device (with vMMIO emulated using system RAM) is passed through to a nested guest. But this might not be commonly used in practice (no performance gain as physical device pass through), and the lack of encryption for such vMMIO should not be a concern, IMHO. Its security model aligns essentially with that of the host (i.e., physical MMIO data is not encrypted on the host, and the same principle applies to emulated vMMIO in nested environments). Also the same for physical devices (as opposed to virtual devices) passed to a nested guest. > > I think to really make this work fully properly going forward people are going > to have to use iommufd's dmabuf. Yeah, I'll also patch for iommufd. We still need to account for the case that many users are still relying on legacy VFIO type1 (will also have some backport work of this patch). ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses 2025-11-07 2:38 ` Wei Wang @ 2025-11-07 14:16 ` Jason Gunthorpe [not found] ` <SI2PR01MB4393E04163E5AC9FD45D56EFDCC3A@SI2PR01MB4393.apcprd01.prod.exchangelabs.com> 0 siblings, 1 reply; 20+ messages in thread From: Jason Gunthorpe @ 2025-11-07 14:16 UTC (permalink / raw) To: Wei Wang Cc: alex@shazbot.org, suravee.suthikulpanit@amd.com, thomas.lendacky@amd.com, joro@8bytes.org, kevin.tian@intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev On Fri, Nov 07, 2025 at 02:38:50AM +0000, Wei Wang wrote: > On Friday, November 7, 2025 9:04 AM, Jason Gunthorpe wrote: > > On Mon, Nov 03, 2025 at 10:00:34PM +0800, Wei Wang wrote: > > > Before requesting the IOMMU driver to map an IOVA to a physical > > > address, set the IOMMU_MMIO flag in dma->prot when the physical > > > address corresponds to MMIO. This allows the IOMMU driver to handle > > MMIO mappings specially. > > > For example, on AMD CPUs with SME enabled, the AMD IOMMU driver > > avoids > > > setting the C-bit if iommu_map() is called with IOMMU_MMIO set in prot. > > > This prevents issues with PCIe P2P communication, since current PCIe > > > switches and devices do not interpret the C-bit correctly. > > > > > > Signed-off-by: Wei Wang <wei.w.wang@hotmail.com> > > > --- > > > drivers/vfio/vfio_iommu_type1.c | 14 +++++++++----- > > > 1 file changed, 9 insertions(+), 5 deletions(-) > > > > This may be the best you can do with vfio type1, but just because the VMA is > > special doesn't necessarily mean it is MMIO, nor does it mean it is decrypted > > memory. > > I think here vfio type1 only needs to provide the info about "MMIO or not" > (the decision to encrypt MMIO or not rests with the vendor IOMMU driver). > > Why might a region not be MMIO when vma->flags includes VM_IO | VM_PFNMAP? > (are you aware of any real examples in use?) VM_IO should indicate MMIO, yes, but we don't actually check that in this type 1 path.. 
> > I think to really make this work fully properly going forward people are going > > to have to use iommufd's dmabuf. > > Yeah, I'll also patch for iommufd. We still need to account for the > case that many users are still relying on legacy VFIO type1 (will > also have some backport work of this patch). I think my dmabuf patch for iommufd already does this properly. Jason ^ permalink raw reply [flat|nested] 20+ messages in thread
[parent not found: <SI2PR01MB4393E04163E5AC9FD45D56EFDCC3A@SI2PR01MB4393.apcprd01.prod.exchangelabs.com>]
* Re: [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses [not found] ` <SI2PR01MB4393E04163E5AC9FD45D56EFDCC3A@SI2PR01MB4393.apcprd01.prod.exchangelabs.com> @ 2025-11-07 15:57 ` Jason Gunthorpe 2025-11-07 16:19 ` Wei Wang 0 siblings, 1 reply; 20+ messages in thread From: Jason Gunthorpe @ 2025-11-07 15:57 UTC (permalink / raw) To: Wei Wang Cc: alex@shazbot.org, suravee.suthikulpanit@amd.com, thomas.lendacky@amd.com, joro@8bytes.org, kevin.tian@intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev On Fri, Nov 07, 2025 at 03:49:17PM +0000, Wei Wang wrote: > > (are you aware of any real examples in use?) > > VM_IO should indicate MMIO, yes, but we don't actually check that in > > this type 1 path.. > Is it because VFIO type1 didn’t need to check for MMIO before? > (not sure how this impacts this patch adding the VM_IO check for MMIO > :) ) Okay, but it still doesn't mean it has to be decrypted.. Jason ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses 2025-11-07 15:57 ` Jason Gunthorpe @ 2025-11-07 16:19 ` Wei Wang 2025-11-07 16:36 ` Jason Gunthorpe 0 siblings, 1 reply; 20+ messages in thread From: Wei Wang @ 2025-11-07 16:19 UTC (permalink / raw) To: Jason Gunthorpe Cc: alex@shazbot.org, suravee.suthikulpanit@amd.com, thomas.lendacky@amd.com, joro@8bytes.org, kevin.tian@intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev On Friday, November 7, 2025 11:57 PM, Jason Gunthorpe wrote: On Fri, Nov 07, 2025 at 03:49:17PM +0000, Wei Wang wrote: > > (are you aware of any real examples in use?) > > VM_IO should indicate MMIO, yes, but we don't actually check that in > > this type 1 path.. > Is it because VFIO type1 didn’t need to check for MMIO before? > (not sure how this impacts this patch adding the VM_IO check for MMIO > :) ) > Okay, but it still doesn't mean it has to be decrypted.. I think "decrypted or not" is the job of the 1st patch. For now, MMIO cannot be encrypted, particularly not via sme_set(). If MMIO encryption is ever introduced in the future, a new flag (probably different from sme_me_mask) would need to be added. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses 2025-11-07 16:19 ` Wei Wang @ 2025-11-07 16:36 ` Jason Gunthorpe 2025-11-07 17:56 ` Tom Lendacky 0 siblings, 1 reply; 20+ messages in thread From: Jason Gunthorpe @ 2025-11-07 16:36 UTC (permalink / raw) To: Wei Wang Cc: alex@shazbot.org, suravee.suthikulpanit@amd.com, thomas.lendacky@amd.com, joro@8bytes.org, kevin.tian@intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev On Fri, Nov 07, 2025 at 04:19:35PM +0000, Wei Wang wrote: > On Friday, November 7, 2025 11:57 PM, Jason Gunthorpe wrote: > On Fri, Nov 07, 2025 at 03:49:17PM +0000, Wei Wang wrote: > > > (are you aware of any real examples in use?) > > > VM_IO should indicate MMIO, yes, but we don't actually check that in > > > this type 1 path.. > > > Is it because VFIO type1 didn’t need to check for MMIO before? > > (not sure how this impacts this patch adding the VM_IO check for MMIO > > :) ) > > > Okay, but it still doesn't mean it has to be decrypted.. > > I think "decrypted or not" is the job of the 1st patch. For now, > MMIO cannot be encrypted, particularly not via sme_set(). If MMIO > encryption is ever introduced in the future, a new flag (probably > different from sme_me_mask) would need to be added. The kernel is using "decrypted" as some weirdo code-word to mean the memory is shared with the hypervisor. Only on AMD does it even have anything to do with actual memory encryption. However when I look at swiotlb and dma coherent mmap I see it calls set_memory_decrypted(), uses pgprot_decrypted(), but still uses __sme_set() when forming the iommu page table?? So why is that OK, but MMIO needs to avoid the sme_set() in the iommu page table? IOW I would like to hear from AMD some clear rules when sme_set needs to be called and when it isn't. Then we can decide if VM_IO is sufficient and so on. Jason ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses 2025-11-07 16:36 ` Jason Gunthorpe @ 2025-11-07 17:56 ` Tom Lendacky 2025-11-07 18:32 ` Jason Gunthorpe 0 siblings, 1 reply; 20+ messages in thread From: Tom Lendacky @ 2025-11-07 17:56 UTC (permalink / raw) To: Jason Gunthorpe, Wei Wang Cc: alex@shazbot.org, suravee.suthikulpanit@amd.com, joro@8bytes.org, kevin.tian@intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev, Alexey Kardashevskiy On 11/7/25 10:36, Jason Gunthorpe wrote: > On Fri, Nov 07, 2025 at 04:19:35PM +0000, Wei Wang wrote: >> On Friday, November 7, 2025 11:57 PM, Jason Gunthorpe wrote: >> On Fri, Nov 07, 2025 at 03:49:17PM +0000, Wei Wang wrote: >>> > (are you aware of any real examples in use?) >>> > VM_IO should indicate MMIO, yes, but we don't actually check that in >>> > this type 1 path.. >> >>> Is it because VFIO type1 didn’t need to check for MMIO before? >>> (not sure how this impacts this patch adding the VM_IO check for MMIO >>> :) ) >> >>> Okay, but it still doesn't mean it has to be decrypted.. >> >> I think "decrypted or not" is the job of the 1st patch. For now, >> MMIO cannot be encrypted, particularly not via sme_set(). If MMIO >> encryption is ever introduced in the future, a new flag (probably >> different from sme_me_mask) would need to be added. > > The kernel is using "decrypted" as some weirdo code-word to mean the > memory is shared with the hypervisor. Only on AMD does it even have > anything to do with actual memory encryption. > > However when I look at swiotlb and dma coherent mmap I see it calls > set_memory_decrypted(), uses pgprot_decrypted(), but still uses > __sme_set() when forming the iommu page table?? > > So why is that OK, but MMIO needs to avoid the sme_set() in the iommu > page table? > > IOW I would like to hear from AMD some clear rules when sme_set needs > to be called and when it isn't. 
When you are on bare-metal, or in the hypervisor, System Memory Encryption (SME) deals with the encryption bit set in the page table entries (including the nested page table entries for guests). If the encryption bit is not set (decrypted), data does not get encrypted when written to memory and does not get decrypted when read from memory. If the encryption bit is set (encrypted), data gets encrypted when written to memory and decrypted when read from memory. MMIO, since it does not go through the memory controller, does not support encryption capabilities and so should not have the encryption bit set as it isn't recognized as system memory. On the hypervisor, when using the IOMMU, SWIOTLB is not used and I/O to and from system memory (DMA) will be encrypted and/or decrypted if the encryption bit is set in the I/O page table leaf entry. If the IOMMU is not enabled, then SWIOTLB is only used if the device does not support DMA addressing at or above the encryption bit location. In the guest (prior to Trusted I/O / TDISP), decrypted (or shared) memory is used because a device cannot DMA to or from guest memory using the guest encryption key. So all DMA must go to "decrypted" memory or be bounce-buffered through "decrypted" memory (SWIOTLB) - basically memory that does not get encrypted/decrypted using the guest encryption key. It is not until we get to Trusted I/O / TDISP where devices will be able to DMA directly to guest encrypted memory and guests will require secure MMIO addresses which will need the encryption bit set (Alexey can correct me on the TIO statements if they aren't correct, as he is closer to it all). I hope I've explained it in a way that makes sense. Thanks, Tom > > Then we can decide if VM_IO is sufficient and so on. > > Jason ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses 2025-11-07 17:56 ` Tom Lendacky @ 2025-11-07 18:32 ` Jason Gunthorpe 2025-11-07 19:59 ` Tom Lendacky 0 siblings, 1 reply; 20+ messages in thread From: Jason Gunthorpe @ 2025-11-07 18:32 UTC (permalink / raw) To: Tom Lendacky Cc: Wei Wang, alex@shazbot.org, suravee.suthikulpanit@amd.com, joro@8bytes.org, kevin.tian@intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev, Alexey Kardashevskiy On Fri, Nov 07, 2025 at 11:56:51AM -0600, Tom Lendacky wrote: > When you are on bare-metal, or in the hypervisor, System Memory Encryption > (SME) deals with the encryption bit set in the page table entries > (including the nested page table entries for guests). So "decrypted" means something about AMD's unique memory encryption scheme on bare metal but in a CC guest it is a cross arch 'shared with hypervisor' flag? What about CXL memory? What about ZONE_DEVICE coherent memory? Do these get the C bit set too? :( :( :( > In the guest (prior to Trusted I/O / TDISP), decrypted (or shared) memory > is used because a device cannot DMA to or from guest memory using the > guest encryption key. So all DMA must go to "decrypted" memory or be > bounce-buffered through "decrypted" memory (SWIOTLB) - basically memory > that does not get encrypted/decrypted using the guest encryption key. Where is the code for this? As I wrote we always do sme_set in the iommu driver, even on guests, even for "decrypted" bounce buffered memory. That sounds like a bug by your explanation? Does this mean vIOMMU has never worked in AMD CC guests? > It is not until we get to Trusted I/O / TDISP where devices will be able > to DMA directly to guest encrypted memory and guests will require secure > MMIO addresses which will need the encryption bit set (Alexey can correct > me on the TIO statements if they aren't correct, as he is closer to it all). 
So in this case we do need to do sme_set on MMIO even though that MMIO is not using the DRAM encryption key? Jason ^ permalink raw reply [flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses 2025-11-07 18:32 ` Jason Gunthorpe @ 2025-11-07 19:59 ` Tom Lendacky 2025-11-10 6:28 ` Wei Wang ` (2 more replies) 0 siblings, 3 replies; 20+ messages in thread From: Tom Lendacky @ 2025-11-07 19:59 UTC (permalink / raw) To: Jason Gunthorpe Cc: Wei Wang, alex@shazbot.org, suravee.suthikulpanit@amd.com, joro@8bytes.org, kevin.tian@intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev, Alexey Kardashevskiy On 11/7/25 12:32, Jason Gunthorpe wrote: > On Fri, Nov 07, 2025 at 11:56:51AM -0600, Tom Lendacky wrote: > >> When you are on bare-metal, or in the hypervisor, System Memory Encryption >> (SME) deals with the encryption bit set in the page table entries >> (including the nested page table entries for guests). > > So "decrypted" means something about AMD's unique memory encryption > scheme on bare metal but in a CC guest it is a cross arch 'shared with > hypervisor' flag? Note, that if the encryption bit is not set in the guest, then the host encryption key is used if the underlying NPT leaf entry has the encryption bit set. In that case, both the host and guest can read the memory, with the memory still being encrypted in physical memory. > > What about CXL memory? What about ZONE_DEVICE coherent memory? Do > these get the C bit set too? When CXL memory is presented as system memory to the OS it does support the encryption bit. So when pages are allocated for the guest, the memory pages will be encrypted with the guest key. Not sure what you mean by ZONE_DEVICE coherent memory. Is it presented to the system as system physical memory that the hypervisor can allocate as guest memory? > > :( :( :( > >> In the guest (prior to Trusted I/O / TDISP), decrypted (or shared) memory >> is used because a device cannot DMA to or from guest memory using the >> guest encryption key. 
So all DMA must go to "decrypted" memory or be >> bounce-buffered through "decrypted" memory (SWIOTLB) - basically memory >> that does not get encrypted/decrypted using the guest encryption key. > > Where is the code for this? As I wrote we always do sme_set in the > iommu driver, even on guests, even for "decrypted" bounce buffered > memory. > > That sounds like a bug by your explanation? > > Does this mean vIOMMU has never worked in AMD CC guests? I assume by vIOMMU you mean a VMM-emulated IOMMU in the guest. This does not work today with AMD CC guests since it requires the hypervisor to read the guest IOMMU buffers in order to emulate the behavior and those buffers are encrypted. So there is no vIOMMU support today in AMD CC guests. There was a patch series submitted a while back to allocate the IOMMU buffers in shared memory in order to support a (non-secure) vIOMMU in the guest in order to support >255 vCPUs, but that was rejected in favor of using kvm-msi-ext-dest-id. https://lore.kernel.org/linux-iommu/20240430152430.4245-1-suravee.suthikulpanit@amd.com/ > >> It is not until we get to Trusted I/O / TDISP where devices will be able >> to DMA directly to guest encrypted memory and guests will require secure >> MMIO addresses which will need the encryption bit set (Alexey can correct >> me on the TIO statements if they aren't correct, as he is closer to it all). > > So in this case we do need to do sme_set on MMIO even though that MMIO > is not using the dram encryption key? @Alexey will be able to provide more details on how this works. Thanks, Tom > > Jason ^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses 2025-11-07 19:59 ` Tom Lendacky @ 2025-11-10 6:28 ` Wei Wang 2025-11-10 9:55 ` Vasant Hegde 2025-11-18 14:36 ` Jason Gunthorpe 2 siblings, 0 replies; 20+ messages in thread From: Wei Wang @ 2025-11-10 6:28 UTC (permalink / raw) To: Tom Lendacky, Jason Gunthorpe Cc: alex@shazbot.org, suravee.suthikulpanit@amd.com, joro@8bytes.org, kevin.tian@intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev, Alexey Kardashevskiy On Saturday, November 8, 2025 3:59 AM, Tom Lendacky wrote: > On 11/7/25 12:32, Jason Gunthorpe wrote: > > On Fri, Nov 07, 2025 at 11:56:51AM -0600, Tom Lendacky wrote: > > > >> When you are on bare-metal, or in the hypervisor, System Memory > >> Encryption > >> (SME) deals with the encryption bit set in the page table entries > >> (including the nested page table entries for guests). > > > > So "decrypted" means something about AMD's unique memory encryption > > scheme on bare metal but in a CC guest it is a cross arch 'shared with > > hypervisor' flag? > > Note, that if the encryption bit is not set in the guest, then the host > encryption key is used if the underlying NPT leaf entry has the encryption bit > set. In that case, both the host and guest can read the memory, with the > memory still being encrypted in physical memory. > > > > > What about CXL memory? What about ZONE_DEVICE coherent memory? > Do > > these get the C bit set too? > > When CXL memory is presented as system memory to the OS it does support > the encryption bit. So when pages are allocated for the guest, the memory > pages will be encrypted with the guest key. > > Not sure what you mean by ZONE_DEVICE coherent memory. Is it presented > to the system as system physical memory that the hypervisor can allocate as > guest memory? 
> > > > > :( :( :( > > > >> In the guest (prior to Trusted I/O / TDISP), decrypted (or shared) > >> memory is used because a device cannot DMA to or from guest memory > >> using the guest encryption key. So all DMA must go to "decrypted" > >> memory or be bounce-buffered through "decrypted" memory (SWIOTLB) > - > >> basically memory that does not get encrypted/decrypted using the guest > encryption key. > > > > Where is the code for this? As I wrote we always do sme_set in the > > iommu driver, even on guests, even for "decrypted" bounce buffered > > memory. > > > > That sounds like a bug by your explanation? > > > > Does this mean vIOMMU has never worked in AMD CC guests? > > I assume by vIOMMU you mean a VMM-emulated IOMMU in the guest. This > does does not work today with AMD CC guests since it requires the > hypervisor to read the guest IOMMU buffers in order to emulate the > behavior and those buffers are encrypted. So there is no vIOMMU support > today in AMD CC guests. > > There was a patch series submitted a while back to allocate the IOMMU > buffers in shared memory in order to support a (non-secure) vIOMMU in the > guest in order to support >255 vCPUs, but that was rejected in favor of using > kvm-msi-ext-dest-id. > > https://lore.kernel.org/linux-iommu/20240430152430.4245-1- > suravee.suthikulpanit@amd.com/ > > > > >> It is not until we get to Trusted I/O / TDISP where devices will be > >> able to DMA directly to guest encrypted memory and guests will > >> require secure MMIO addresses which will need the encryption bit set > >> (Alexey can correct me on the TIO statements if they aren't correct, as he > is closer to it all). > > > > So in this case we do need to do sme_set on MMIO even though that > MMIO > > is not using the dram encryption key? > > @Alexey will be able to provide more details on how this works. 
Let me also share my perspective on the questions raised: In the TEE-IO case, trusted device MMIO is mapped to a private Guest Physical Address (FYI: this can be checked in the SEV-TIO whitepaper and the Intel TDX Connect architecture spec); that is, the C-bit is added to the GPA, not to the host physical address. So this case is not related to the updates introduced by this patch, which handles the C-bit for MMIO physical addresses (via iommu_v1_map_pages()). Also, the "encryption" bit (C-bit for AMD, S-bit for Intel) in the MMIO GPA does not actually engage the encryption engine (e.g. SME) in the memory controller for data encryption (communication with the trusted device is encrypted via IDE). This bit is used for other, non-encryption purposes. For ZONE_DEVICE memory (memory hosted on devices), IIUC, it is accessed via PCIe (not through the on-die memory controller), so SME is not used to encrypt this type of memory. If we pass through such a device to the guest using VFIO type1, it will be treated as device MMIO that bypasses the C-bit setting added by this patch, which I think is the expected behavior. For CXL memory, when it is used as the guest's system memory, it does not go through the VFIO PCI BAR MMIO pass-through mechanism, so it also falls outside the scope of the changes in this patch. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses 2025-11-07 19:59 ` Tom Lendacky 2025-11-10 6:28 ` Wei Wang @ 2025-11-10 9:55 ` Vasant Hegde 2025-11-18 14:36 ` Jason Gunthorpe 2 siblings, 0 replies; 20+ messages in thread From: Vasant Hegde @ 2025-11-10 9:55 UTC (permalink / raw) To: Tom Lendacky, Jason Gunthorpe Cc: Wei Wang, alex@shazbot.org, suravee.suthikulpanit@amd.com, joro@8bytes.org, kevin.tian@intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev, Alexey Kardashevskiy On 11/8/2025 1:29 AM, Tom Lendacky wrote: > On 11/7/25 12:32, Jason Gunthorpe wrote: >> On Fri, Nov 07, 2025 at 11:56:51AM -0600, Tom Lendacky wrote: >> >>> When you are on bare-metal, or in the hypervisor, System Memory Encryption >>> (SME) deals with the encryption bit set in the page table entries >>> (including the nested page table entries for guests). >> >> So "decrypted" means something about AMD's unique memory encryption >> scheme on bare metal but in a CC guest it is a cross arch 'shared with >> hypervisor' flag? > > Note, that if the encryption bit is not set in the guest, then the host > encryption key is used if the underlying NPT leaf entry has the encryption > bit set. In that case, both the host and guest can read the memory, with > the memory still being encrypted in physical memory. > >> >> What about CXL memory? What about ZONE_DEVICE coherent memory? Do >> these get the C bit set too? > > When CXL memory is presented as system memory to the OS it does support > the encryption bit. So when pages are allocated for the guest, the memory > pages will be encrypted with the guest key. > > Not sure what you mean by ZONE_DEVICE coherent memory. Is it presented to > the system as system physical memory that the hypervisor can allocate as > guest memory? 
> >> >> :( :( :( >> >>> In the guest (prior to Trusted I/O / TDISP), decrypted (or shared) memory >>> is used because a device cannot DMA to or from guest memory using the >>> guest encryption key. So all DMA must go to "decrypted" memory or be >>> bounce-buffered through "decrypted" memory (SWIOTLB) - basically memory >>> that does not get encrypted/decrypted using the guest encryption key. >> >> Where is the code for this? As I wrote we always do sme_set in the >> iommu driver, even on guests, even for "decrypted" bounce buffered >> memory. >> >> That sounds like a bug by your explanation? >> >> Does this mean vIOMMU has never worked in AMD CC guests? > > I assume by vIOMMU you mean a VMM-emulated IOMMU in the guest. This does > does not work today with AMD CC guests since it requires the hypervisor to > read the guest IOMMU buffers in order to emulate the behavior and those > buffers are encrypted. So there is no vIOMMU support today in AMD CC > guests. > > There was a patch series submitted a while back to allocate the IOMMU > buffers in shared memory in order to support a (non-secure) vIOMMU in the > guest in order to support >255 vCPUs, but that was rejected in favor of > using kvm-msi-ext-dest-id. > > https://lore.kernel.org/linux-iommu/20240430152430.4245-1-suravee.suthikulpanit@amd.com/ > >> >>> It is not until we get to Trusted I/O / TDISP where devices will be able >>> to DMA directly to guest encrypted memory and guests will require secure >>> MMIO addresses which will need the encryption bit set (Alexey can correct >>> me on the TIO statements if they aren't correct, as he is closer to it all). >> >> So in this case we do need to do sme_set on MMIO even though that MMIO >> is not using the dram encryption key? Yes. Its mapped to GPA (at least IOMMU VF MMIO BAR, I believe its same for TIO device as well) and we need to set 'C' bit. -Vasan > > @Alexey will be able to provide more details on how this works. 
> > Thanks, > Tom > >> >> Jason > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses 2025-11-07 19:59 ` Tom Lendacky 2025-11-10 6:28 ` Wei Wang 2025-11-10 9:55 ` Vasant Hegde @ 2025-11-18 14:36 ` Jason Gunthorpe 2 siblings, 0 replies; 20+ messages in thread From: Jason Gunthorpe @ 2025-11-18 14:36 UTC (permalink / raw) To: Tom Lendacky Cc: Wei Wang, alex@shazbot.org, suravee.suthikulpanit@amd.com, joro@8bytes.org, kevin.tian@intel.com, linux-kernel@vger.kernel.org, iommu@lists.linux.dev, Alexey Kardashevskiy On Fri, Nov 07, 2025 at 01:59:00PM -0600, Tom Lendacky wrote: > On 11/7/25 12:32, Jason Gunthorpe wrote: > > On Fri, Nov 07, 2025 at 11:56:51AM -0600, Tom Lendacky wrote: > > > >> When you are on bare-metal, or in the hypervisor, System Memory Encryption > >> (SME) deals with the encryption bit set in the page table entries > >> (including the nested page table entries for guests). > > > > So "decrypted" means something about AMD's unique memory encryption > > scheme on bare metal but in a CC guest it is a cross arch 'shared with > > hypervisor' flag? > > Note, that if the encryption bit is not set in the guest, then the host > encryption key is used if the underlying NPT leaf entry has the encryption > bit set. In that case, both the host and guest can read the memory, with > the memory still being encrypted in physical memory. Sure, so in the guest it is simply a 'shared with hypervisor' flag and does not directly indicate if the memory controller did encryption or not. > > What about CXL memory? What about ZONE_DEVICE coherent memory? Do > > these get the C bit set too? > > When CXL memory is presented as system memory to the OS it does support > the encryption bit. So when pages are allocated for the guest, the memory > pages will be encrypted with the guest key. > > Not sure what you mean by ZONE_DEVICE coherent memory. Is it presented to > the system as system physical memory that the hypervisor can allocate as > guest memory?
This is an option for CXL memory on CXL type 2 devices - ie GPU memory. It is coherent but it is managed by a driver not by the core OS as system memory. > There was a patch series submitted a while back to allocate the IOMMU > buffers in shared memory in order to support a (non-secure) vIOMMU in the > guest in order to support >255 vCPUs, but that was rejected in favor of > using kvm-msi-ext-dest-id. Yes, but that was incomplete, it only did the data structures and only really worked for interrupt remapping. It left the actual iommu broken since we don't clear the C bit on swiotlb. Jason ^ permalink raw reply [flat|nested] 20+ messages in thread
End of thread (newest message: 2025-11-18 14:37 UTC).
Thread overview: 20+ messages -- links below jump to the message on this page --
2025-11-03 14:00 [PATCH v2 0/2] iommu/amd: Avoid setting C-bit for MMIO addresses Wei Wang
2025-11-03 14:00 ` [PATCH v2 1/2] iommu/amd: Add IOMMU_PROT_IE flag for memory encryption Wei Wang
2025-11-07 1:02 ` Jason Gunthorpe
2025-11-07 2:39 ` Wei Wang
2025-11-10 9:55 ` Vasant Hegde
2025-11-11 1:18 ` Wei Wang
2025-11-11 4:44 ` Vasant Hegde
2025-11-03 14:00 ` [PATCH v2 2/2] vfio/type1: Set IOMMU_MMIO in dma->prot for MMIO-backed addresses Wei Wang
2025-11-07 1:03 ` Jason Gunthorpe
2025-11-07 2:38 ` Wei Wang
2025-11-07 14:16 ` Jason Gunthorpe
[not found] ` <SI2PR01MB4393E04163E5AC9FD45D56EFDCC3A@SI2PR01MB4393.apcprd01.prod.exchangelabs.com>
2025-11-07 15:57 ` Jason Gunthorpe
2025-11-07 16:19 ` Wei Wang
2025-11-07 16:36 ` Jason Gunthorpe
2025-11-07 17:56 ` Tom Lendacky
2025-11-07 18:32 ` Jason Gunthorpe
2025-11-07 19:59 ` Tom Lendacky
2025-11-10 6:28 ` Wei Wang
2025-11-10 9:55 ` Vasant Hegde
2025-11-18 14:36 ` Jason Gunthorpe