public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1] iommu/amd: Set C-bit only for RAM-backed PTEs in IOMMU page tables
@ 2025-10-23 15:15 Wei Wang
  2025-10-23 16:01 ` Jason Gunthorpe
  0 siblings, 1 reply; 6+ messages in thread
From: Wei Wang @ 2025-10-23 15:15 UTC (permalink / raw)
  To: suravee.suthikulpanit, thomas.lendacky, jroedel
  Cc: kevin.tian, jgg, linux-kernel, iommu, Wei Wang

When SME is enabled, iommu_v1_map_pages() currently sets the C-bit for
all physical addresses. This is correct for RAM, since the C-bit is
required by SME to indicate encrypted memory and ensure proper
encryption/decryption.

However, applying the C-bit to MMIO addresses is incorrect. Devices and
PCIe switches do not interpret the C-bit currently, and doing so can break
PCIe peer-to-peer communication. To avoid this, only set the C-bit when
the physical address is backed by RAM, and leave MMIO mappings unchanged.

Fixes: 2543a786aa25 ("iommu/amd: Allow the AMD IOMMU to work with memory encryption")
Signed-off-by: Wei Wang <wei.w.wang@hotmail.com>
---
 drivers/iommu/amd/io_pgtable.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index 70c2f5b1631b..6f395940d0a4 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -353,6 +353,9 @@ static int iommu_v1_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
 	if (!(prot & IOMMU_PROT_MASK))
 		goto out;
 
+	if (sme_me_mask && page_is_ram(PHYS_PFN(paddr)))
+		paddr = __sme_set(paddr);
+
 	while (pgcount > 0) {
 		count = PAGE_SIZE_PTE_COUNT(pgsize);
 		pte   = alloc_pte(pgtable, iova, pgsize, NULL, gfp, &updated);
@@ -368,10 +371,10 @@ static int iommu_v1_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
 			updated = true;
 
 		if (count > 1) {
-			__pte = PAGE_SIZE_PTE(__sme_set(paddr), pgsize);
+			__pte = PAGE_SIZE_PTE(paddr, pgsize);
 			__pte |= PM_LEVEL_ENC(7) | IOMMU_PTE_PR | IOMMU_PTE_FC;
 		} else
-			__pte = __sme_set(paddr) | IOMMU_PTE_PR | IOMMU_PTE_FC;
+			__pte = paddr | IOMMU_PTE_PR | IOMMU_PTE_FC;
 
 		if (prot & IOMMU_PROT_IR)
 			__pte |= IOMMU_PTE_IR;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] iommu/amd: Set C-bit only for RAM-backed PTEs in IOMMU page tables
  2025-10-23 15:15 [PATCH v1] iommu/amd: Set C-bit only for RAM-backed PTEs in IOMMU page tables Wei Wang
@ 2025-10-23 16:01 ` Jason Gunthorpe
  2025-10-24  3:05   ` Wei Wang
  0 siblings, 1 reply; 6+ messages in thread
From: Jason Gunthorpe @ 2025-10-23 16:01 UTC (permalink / raw)
  To: Wei Wang
  Cc: suravee.suthikulpanit, thomas.lendacky, jroedel, kevin.tian,
	linux-kernel, iommu

On Thu, Oct 23, 2025 at 11:15:43PM +0800, Wei Wang wrote:
> When SME is enabled, iommu_v1_map_pages() currently sets the C-bit for
> all physical addresses. This is correct for RAM, since the C-bit is
> required by SME to indicate encrypted memory and ensure proper
> encryption/decryption.
> 
> However, applying the C-bit to MMIO addresses is incorrect. Devices and
> PCIe switches do not interpret the C-bit currently, and doing so can break
> PCIe peer-to-peer communication. To avoid this, only set the C-bit when
> the physical address is backed by RAM, and leave MMIO mappings unchanged.
> 
> Fixes: 2543a786aa25 ("iommu/amd: Allow the AMD IOMMU to work with memory encryption")
> Signed-off-by: Wei Wang <wei.w.wang@hotmail.com>
> ---
>  drivers/iommu/amd/io_pgtable.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
> index 70c2f5b1631b..6f395940d0a4 100644
> --- a/drivers/iommu/amd/io_pgtable.c
> +++ b/drivers/iommu/amd/io_pgtable.c
> @@ -353,6 +353,9 @@ static int iommu_v1_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
>  	if (!(prot & IOMMU_PROT_MASK))
>  		goto out;
>  
> +	if (sme_me_mask && page_is_ram(PHYS_PFN(paddr)))
> +		paddr = __sme_set(paddr);

It needs to use the IOMMU_MMIO flag not page_is_ram, which I think got
mangled by the time it reached here..

Though broadly this points to a larger problem, the iommu domain code
should not be trying to guess if a mapping is private or not, this
needs to be passed in from higher level code which knows what state
the PFN is..

Jason

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [PATCH v1] iommu/amd: Set C-bit only for RAM-backed PTEs in IOMMU page tables
  2025-10-23 16:01 ` Jason Gunthorpe
@ 2025-10-24  3:05   ` Wei Wang
  2025-10-24 11:40     ` Jason Gunthorpe
  0 siblings, 1 reply; 6+ messages in thread
From: Wei Wang @ 2025-10-24  3:05 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: suravee.suthikulpanit@amd.com, thomas.lendacky@amd.com,
	joro@8bytes.org, kevin.tian@intel.com,
	linux-kernel@vger.kernel.org, iommu@lists.linux.dev

On Friday, October 24, 2025 12:02 AM, Jason Gunthorpe wrote:
> On Thu, Oct 23, 2025 at 11:15:43PM +0800, Wei Wang wrote:
> > When SME is enabled, iommu_v1_map_pages() currently sets the C-bit for
> > all physical addresses. This is correct for RAM, since the C-bit is
> > required by SME to indicate encrypted memory and ensure proper
> > encryption/decryption.
> >
> > However, applying the C-bit to MMIO addresses is incorrect. Devices
> > and PCIe switches do not interpret the C-bit currently, and doing so
> > can break PCIe peer-to-peer communication. To avoid this, only set the
> > C-bit when the physical address is backed by RAM, and leave MMIO
> mappings unchanged.
> >
> > Fixes: 2543a786aa25 ("iommu/amd: Allow the AMD IOMMU to work with
> > memory encryption")
> > Signed-off-by: Wei Wang <wei.w.wang@hotmail.com>
> > ---
> >  drivers/iommu/amd/io_pgtable.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/iommu/amd/io_pgtable.c
> > b/drivers/iommu/amd/io_pgtable.c index 70c2f5b1631b..6f395940d0a4
> > 100644
> > --- a/drivers/iommu/amd/io_pgtable.c
> > +++ b/drivers/iommu/amd/io_pgtable.c
> > @@ -353,6 +353,9 @@ static int iommu_v1_map_pages(struct
> io_pgtable_ops *ops, unsigned long iova,
> >  	if (!(prot & IOMMU_PROT_MASK))
> >  		goto out;
> >
> > +	if (sme_me_mask && page_is_ram(PHYS_PFN(paddr)))
> > +		paddr = __sme_set(paddr);
> 
> It needs to use the IOMMU_MMIO flag not page_is_ram, which I think got
> mangled by the time it reached here..

Could you please elaborate how page_is_ram() would be mangled when
reaching here?
(I might not have fully understood your point. My understanding was that
the result from page_is_ram(), regarding whether a physical address
corresponds to system RAM, is stable after the system boots. One exception
would be RAM hotplug, but this is the same for the case using IOMMU_MMIO
from callers. If the stability of this result is indeed problematic, then
other users of page_is_ram() and /proc/iomem would also be at risk)

> 
> Though broadly this points to a larger problem, the iommu domain code
> should not be trying to guess if a mapping is private or not, this needs to be
> passed in from higher level code which knows what state the PFN is..

Please note that this patch is not intended to add an interface allowing users
to specify whether a requested physical address is expected to be mapped as
Private or not. Instead, it implements a sanity or correctness check within the
IOMMU driver to validate whether a user-supplied address _can_ be mapped
with the Private bit (RAM is the case that "can" currently, and since the driver
can already determine whether a PFN is RAM or not, I'm not sure why it needs
an interface for users to tell the driver).

Even if a need arises in the future to add a new interface for users to indicate
that the IOVA->PA mapping expects a Private bit to PA, a new flag (e.g.,
IOMMU_ENCRYPT) would be more semantically appropriate than IOMMU_MMIO.
The validation introduced by this patch would still be necessary. I think this check
Is essentially similar to other guardrails or gatekeeping functions at lower driver
layers.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] iommu/amd: Set C-bit only for RAM-backed PTEs in IOMMU page tables
  2025-10-24  3:05   ` Wei Wang
@ 2025-10-24 11:40     ` Jason Gunthorpe
  2025-10-24 14:23       ` Wei Wang
  0 siblings, 1 reply; 6+ messages in thread
From: Jason Gunthorpe @ 2025-10-24 11:40 UTC (permalink / raw)
  To: Wei Wang
  Cc: suravee.suthikulpanit@amd.com, thomas.lendacky@amd.com,
	joro@8bytes.org, kevin.tian@intel.com,
	linux-kernel@vger.kernel.org, iommu@lists.linux.dev

On Fri, Oct 24, 2025 at 03:05:05AM +0000, Wei Wang wrote:
> On Friday, October 24, 2025 12:02 AM, Jason Gunthorpe wrote:
> > On Thu, Oct 23, 2025 at 11:15:43PM +0800, Wei Wang wrote:
> > > When SME is enabled, iommu_v1_map_pages() currently sets the C-bit for
> > > all physical addresses. This is correct for RAM, since the C-bit is
> > > required by SME to indicate encrypted memory and ensure proper
> > > encryption/decryption.
> > >
> > > However, applying the C-bit to MMIO addresses is incorrect. Devices
> > > and PCIe switches do not interpret the C-bit currently, and doing so
> > > can break PCIe peer-to-peer communication. To avoid this, only set the
> > > C-bit when the physical address is backed by RAM, and leave MMIO
> > mappings unchanged.
> > >
> > > Fixes: 2543a786aa25 ("iommu/amd: Allow the AMD IOMMU to work with
> > > memory encryption")
> > > Signed-off-by: Wei Wang <wei.w.wang@hotmail.com>
> > > ---
> > >  drivers/iommu/amd/io_pgtable.c | 7 +++++--
> > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/iommu/amd/io_pgtable.c
> > > b/drivers/iommu/amd/io_pgtable.c index 70c2f5b1631b..6f395940d0a4
> > > 100644
> > > --- a/drivers/iommu/amd/io_pgtable.c
> > > +++ b/drivers/iommu/amd/io_pgtable.c
> > > @@ -353,6 +353,9 @@ static int iommu_v1_map_pages(struct
> > io_pgtable_ops *ops, unsigned long iova,
> > >  	if (!(prot & IOMMU_PROT_MASK))
> > >  		goto out;
> > >
> > > +	if (sme_me_mask && page_is_ram(PHYS_PFN(paddr)))
> > > +		paddr = __sme_set(paddr);
> > 
> > It needs to use the IOMMU_MMIO flag not page_is_ram, which I think got
> > mangled by the time it reached here..
> 
> Could you please elaborate how page_is_ram() would be mangled when
> reaching here?

Sorry not page_is_ram(), but prot - it starts out with something that
had IOMMU_MMIO and got mangled.

> > Though broadly this points to a larger problem, the iommu domain code
> > should not be trying to guess if a mapping is private or not, this needs to be
> > passed in from higher level code which knows what state the PFN is..
> 
> Please note that this patch is not intended to add an interface allowing users
> to specify whether a requested physical address is expected to be mapped as
> Private or not. Instead, it implements a sanity or correctness check
> within the

It is not a santiy check, a sanity check would fail the mapping. This
is changing what PTE is created by trying to guess properties about
the pfn.

> IOMMU driver to validate whether a user-supplied address _can_ be mapped
> with the Private bit (RAM is the case that "can" currently, and since the driver
> can already determine whether a PFN is RAM or not, I'm not sure why it needs
> an interface for users to tell the driver).

As I understand AMD's architecture the hypervisor runs with all ram as
encrypted and has to set the C bit for any dram. The MMIO is only
protected by the RMP and does not have a C bit set.

So even in a TDISP world with private MMIO we still end up with
system DRAM being treated differently than MMIO.

It really does seem like IOMMU_MMIO is the right direction here.

Again we should not be trying to guess if something is "ram" or not
deep inside the iommu code. We have IOMMU_MMIO specifically to tell
the iommu if it is ram or not.

Jason

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] iommu/amd: Set C-bit only for RAM-backed PTEs in IOMMU page tables
  2025-10-24 11:40     ` Jason Gunthorpe
@ 2025-10-24 14:23       ` Wei Wang
  2025-10-24 14:34         ` Jason Gunthorpe
  0 siblings, 1 reply; 6+ messages in thread
From: Wei Wang @ 2025-10-24 14:23 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: suravee.suthikulpanit@amd.com, thomas.lendacky@amd.com,
	joro@8bytes.org, kevin.tian@intel.com,
	linux-kernel@vger.kernel.org, iommu@lists.linux.dev

On Friday, October 24, 2025 11:40 AM, Jason Gunthorpe wrote:
> It is not a santiy check, a sanity check would fail the mapping. This
> is changing what PTE is created by trying to guess properties about
> the pfn.

It's a check and handling for now. If, later, it becomes necessary for callers
to pass an IOMMU_* flag to request the Private bit and upon the
check (via page_is_ram) failure, an error will be returned to the caller. 

> IOMMU driver to validate whether a user-supplied address _can_ be mapped
> with the Private bit (RAM is the case that "can" currently, and since the driver
> can already determine whether a PFN is RAM or not, I'm not sure why it needs
> an interface for users to tell the driver).

> As I understand AMD's architecture the hypervisor runs with all ram as
> encrypted and has to set the C bit for any dram. The MMIO is only
> protected by the RMP and does not have a C bit set.

> So even in a TDISP world with private MMIO we still end up with
> system DRAM being treated differently than MMIO.

Yes. In fact, the mentioned P2P broken issue also occurs on the host (this
is the primary case that the patch aims to address). The SME feature is used
in the host/native environment (not discussing the SNP guest case), and in
this context the C-bit is added to the host physical address.

> It really does seem like IOMMU_MMIO is the right direction here.

> Again we should not be trying to guess if something is "ram" or not
> deep inside the iommu code. We have IOMMU_MMIO specifically to tell
> the iommu if it is ram or not.

Sorry I think my main confusion here is why this is considered a ‘guess’ of RAM.
The function page_is_ram() clearly returns whether it is RAM, right? If so,
why is there a need to add an interface and require users (e.g., p2pdma,
vfio etc.) to tell the IOMMU driver that it is RAM or not?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] iommu/amd: Set C-bit only for RAM-backed PTEs in IOMMU page tables
  2025-10-24 14:23       ` Wei Wang
@ 2025-10-24 14:34         ` Jason Gunthorpe
  0 siblings, 0 replies; 6+ messages in thread
From: Jason Gunthorpe @ 2025-10-24 14:34 UTC (permalink / raw)
  To: Wei Wang
  Cc: suravee.suthikulpanit@amd.com, thomas.lendacky@amd.com,
	joro@8bytes.org, kevin.tian@intel.com,
	linux-kernel@vger.kernel.org, iommu@lists.linux.dev

On Fri, Oct 24, 2025 at 02:23:54PM +0000, Wei Wang wrote:
> > Again we should not be trying to guess if something is "ram" or not
> > deep inside the iommu code. We have IOMMU_MMIO specifically to tell
> > the iommu if it is ram or not.
> 
> Sorry I think my main confusion here is why this is considered a ‘guess’ of RAM.
> The function page_is_ram() clearly returns whether it is RAM, right? 

IIRC it is not reliable. We have a lot of things in modern systems
that are cachable ram-like objects that page_is_ram() may or may not
return true on.

Further, page_is_ram is very expensive:

int __weak page_is_ram(unsigned long pfn)
{
        return walk_system_ram_range(pfn, 1, NULL, __is_ram) == 1;

We really don't want to be doing something like that for every IOMMU
PTE.

For your immediate problem IOMMU_MMIO is better, but broadly I think
this will need some attention later. I'm not sure how something like
CXL cachable ram is supposed to work through these APIs, or if it
should have the C bit set.

Jason

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-10-24 14:34 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-23 15:15 [PATCH v1] iommu/amd: Set C-bit only for RAM-backed PTEs in IOMMU page tables Wei Wang
2025-10-23 16:01 ` Jason Gunthorpe
2025-10-24  3:05   ` Wei Wang
2025-10-24 11:40     ` Jason Gunthorpe
2025-10-24 14:23       ` Wei Wang
2025-10-24 14:34         ` Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox