public inbox for linux-arm-kernel@lists.infradead.org
From: Jason Gunthorpe <jgg@ziepe.ca>
To: Will Deacon <will@kernel.org>
Cc: Evangelos Petrongonas <epetron@amazon.de>,
	Robin Murphy <robin.murphy@arm.com>,
	Joerg Roedel <joro@8bytes.org>,
	Nicolin Chen <nicolinc@nvidia.com>,
	Pranjal Shrivastava <praan@google.com>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, nh-open-source@amazon.com,
	Zeev Zilberman <zeev@amazon.com>
Subject: Re: [PATCH] iommu/arm-smmu-v3: Allow disabling Stage 1 translation
Date: Thu, 23 Apr 2026 11:23:26 -0300
Message-ID: <20260423142326.GP3611611@ziepe.ca>
In-Reply-To: <aenqxasLJ8yKYqbT@willie-the-truck>

On Thu, Apr 23, 2026 at 10:47:49AM +0100, Will Deacon wrote:
> On Thu, Apr 23, 2026 at 10:44:08AM +0100, Will Deacon wrote:
> > On Wed, Apr 22, 2026 at 01:23:51PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Apr 22, 2026 at 06:44:31AM +0000, Evangelos Petrongonas wrote:
> > > > The motivation is live update of the hypervisor: we want to kexec into a
> > > > new kernel while keeping DMA from passthrough devices flowing, which
> > > > means the SMMU's translation state has to survive the handover. The Live
> > > > Update Orchestrator work [1] and the in-progress  "iommu: Add live
> > > > update state preservation" series [2] are building exactly this plumbing
> > > > on top of KHO; [2]'s cover letter calls out Arm SMMUv3 support as future
> > > > work, and an earlier RFC from Amazon [3] sketched the same idea for
> > > > iommufd.
> > > 
> > > It would be appropriate to keep this patch with the rest of that out
> > > of tree pile, for example in the series that enables s2 only support
> > > in smmuv3.
> > > 
> > > > For this use case, Stage 2 is materially easier to persist than Stage 1,
> > > > for structural rather than performance reasons: 
> > > 
> > > I don't think so. The driver needs to know each and every STE that
> > > will survive KHO. The ones that don't survive need to be reset to
> > > abort STEs. From that point it is trivial enough to include the CD
> > > memory in the preservation.
> > > 
> > > It would help to send a preparation series to switch the ARM STE and
> > > CD logic away from dma_alloc_coherent and use iommu-pages instead,
> > > since we only expect iommu-pages to support preservation..
> > 
> > Does iommu-pages provide a mechanism to map the memory as non-cacheable
> > if the SMMU isn't coherent? 

No, it has to use CMOs today.

It looks like all the stuff dma_alloc_coherent does to make a
non-cached mapping is pretty arch specific. I don't know if there is
a way we could make more general code get a struct page into an
uncached KVA and meet all the arch rules?

I also think dma_alloc_coherent is far too complex, with pools and
more, to support KHO.

> > I really don't want to entertain CMOs for the queues.
> 
> Sorry, I said "queues" here but I was really referring to any of the
> current dma_alloc_coherent() allocations and it's the CDs that matter
> in this thread.

Queues shouldn't change; they are too performance sensitive.

> The rationale being that:
> 
> 1. A cacheable mapping is going to pollute the cache unnecessarily.
> 2. Reasoning about atomicity and ordering is a lot more subtle with CMOs.

The page table suffers from all of these drawbacks, and the STE/CD is
touched a lot less frequently. It is kind of odd to focus on these
issues with STE/CD when the page table is a much bigger problem.

STE/CD is pretty simple now: there is only one place to put the CMO
and the ordering is all handled with that shared code. We no longer
care about ordering beyond requiring that all the writes be visible to
HW before issuing the CMDQ invalidation command, which is the same
environment as the pagetable.
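
To make that ordering rule concrete, here is a minimal userspace sketch.
It is only a model under stated assumptions: every name in it
(model_smmu, model_publish, model_write_entry, model_can_invalidate,
model_demo) is hypothetical and none of it is driver code. The CMO is
represented by a counter plus a copy from a "CPU view" into a "HW view"
of one 64-byte entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_ENTRY_QWORDS 8	/* an STE or CD is 64 bytes */

/* Hypothetical userspace model; none of these names exist in the driver. */
struct model_smmu {
	uint64_t cpu_view[MODEL_ENTRY_QWORDS];	/* CPU's (possibly cached) copy */
	uint64_t hw_view[MODEL_ENTRY_QWORDS];	/* what the SMMU would fetch */
	bool coherent;
	unsigned int cmo_count;
};

/* The single place where a cache clean would go. */
static void model_publish(struct model_smmu *s)
{
	if (!s->coherent)
		s->cmo_count++;	/* stands in for cleaning the entry to memory */
	/* On a coherent model the HW simply observes the CPU writes. */
	memcpy(s->hw_view, s->cpu_view, sizeof(s->hw_view));
}

/* Shared write path: update the entry, then publish it. */
static void model_write_entry(struct model_smmu *s, const uint64_t *target)
{
	memcpy(s->cpu_view, target, sizeof(s->cpu_view));
	model_publish(s);
}

/* True if HW already sees every write, so an invalidation may be issued. */
static bool model_can_invalidate(const struct model_smmu *s)
{
	return memcmp(s->hw_view, s->cpu_view, sizeof(s->hw_view)) == 0;
}

/* Walk through the flow once on a non-coherent model. */
static unsigned int model_demo(void)
{
	struct model_smmu s = { .coherent = false };
	uint64_t ste[MODEL_ENTRY_QWORDS] = { 0x1, 0x2 };

	model_write_entry(&s, ste);
	assert(model_can_invalidate(&s));
	return s.cmo_count;
}
```

The point of the sketch is that callers never reason about cache
maintenance themselves; they only rely on the shared write path having
published the entry before the invalidation command is queued.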

> 3. It seems like a pretty invasive driver change to support live update,
>    which isn't relevant for a lot of systems.

That's sort of the whole story of live update. Trying to keep it
small means using the abstractions that support it like iommu-pages.

IMHO live update is OK to require coherent only, so at worst it could
use iommu-pages on coherent systems and keep using the
dma_alloc_coherent() for others.
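
As a rough sketch of that fallback, assuming the choice is made once
from the SMMU's coherency: the enum, the function names and the
"preservable" rule below are all hypothetical illustrations, not
existing driver or iommu-pages interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not the driver's API. */
enum model_backend {
	MODEL_BACKEND_IOMMU_PAGES,	/* cacheable memory, preservable */
	MODEL_BACKEND_DMA_COHERENT,	/* today's allocation path */
};

/* Coherent SMMUs take the iommu-pages path, everything else is unchanged. */
static enum model_backend model_pick_backend(bool smmu_coherent)
{
	return smmu_coherent ? MODEL_BACKEND_IOMMU_PAGES
			     : MODEL_BACKEND_DMA_COHERENT;
}

/* Assumption: only iommu-pages backed tables can survive the handover. */
static bool model_can_preserve(enum model_backend backend)
{
	return backend == MODEL_BACKEND_IOMMU_PAGES;
}

/* Quick self-check of the two cases described above. */
static void model_selfcheck(void)
{
	assert(model_can_preserve(model_pick_backend(true)));
	assert(!model_can_preserve(model_pick_backend(false)));
}
```

With this split, non-coherent systems see no behavioural change and
simply do not offer live update for the STE/CD memory.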

I also don't like this "lot of systems" thing. I don't want these
powerful capabilities locked up in some giant CSP's proprietary
kernel.  I want all the companies in the cloud market to have access
to the same feature set. That's what open source is supposed to be
driving toward. I have several interesting use cases for this
functionality already.

It will probably run $50-100B of AI cloud servers at least, and I think
that is enough justification.

Jason


