From: Pranjal Shrivastava <praan@google.com>
To: Samiullah Khawaja <skhawaja@google.com>
Cc: Baolu Lu <baolu.lu@linux.intel.com>,
David Woodhouse <dwmw2@infradead.org>,
Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>,
Jason Gunthorpe <jgg@ziepe.ca>,
Robin Murphy <robin.murphy@arm.com>,
Kevin Tian <kevin.tian@intel.com>,
Alex Williamson <alex@shazbot.org>, Shuah Khan <shuah@kernel.org>,
iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
kvm@vger.kernel.org, Saeed Mahameed <saeedm@nvidia.com>,
Adithya Jayachandran <ajayachandra@nvidia.com>,
Parav Pandit <parav@nvidia.com>,
Leon Romanovsky <leonro@nvidia.com>, William Tu <witu@nvidia.com>,
Pratyush Yadav <pratyush@kernel.org>,
Pasha Tatashin <pasha.tatashin@soleen.com>,
David Matlack <dmatlack@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Chris Li <chrisl@kernel.org>, Vipin Sharma <vipinsh@google.com>,
YiFei Zhu <zhuyifei@google.com>
Subject: Re: [PATCH v2 07/16] iommu/vt-d: Implement device and iommu preserve/unpreserve ops
Date: Tue, 19 May 2026 14:40:05 +0000
Message-ID: <agx2RW_jujXbsiea@google.com>
In-Reply-To: <agtux6SrjpRYvMit@google.com>
On Mon, May 18, 2026 at 08:32:42PM +0000, Samiullah Khawaja wrote:
> On Fri, May 08, 2026 at 02:36:56AM +0000, Samiullah Khawaja wrote:
> > On Thu, May 07, 2026 at 02:25:14PM +0800, Baolu Lu wrote:
> > > On 4/28/26 01:56, Samiullah Khawaja wrote:
> > > > Add implementation of the device and iommu preservation in a separate
> > > > file. Also set the device and iommu preserve/unpreserve ops in the
> > > > struct iommu_ops.
> > > >
> > > > During normal shutdown the iommu translation is disabled. Since the root
> > > > table is preserved during live update, it needs to be cleaned up and the
> > > > context entries of the unpreserved devices need to be cleared.
> > >
> > > This is not related to preserve/unpreserve ops and could be made in a
> > > separated patch?
> >
> > Agreed. I will move this stuff to a separate patch.
> > >
> > > >
> > > > Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
> > > > ---
> > > > MAINTAINERS | 1 +
> > > > drivers/iommu/intel/Makefile | 1 +
> > > > drivers/iommu/intel/iommu.c | 52 +++++++++++-
> > > > drivers/iommu/intel/iommu.h | 28 +++++++
> > > > drivers/iommu/intel/liveupdate.c | 139 +++++++++++++++++++++++++++++++
> > > > drivers/iommu/iommu.c | 18 ++++
> > > > include/linux/iommu-liveupdate.h | 10 +++
> > > > include/linux/iommu.h | 14 ++++
> > > > include/linux/kho/abi/iommu.h | 18 ++++
> > > > 9 files changed, 277 insertions(+), 4 deletions(-)
> > > > create mode 100644 drivers/iommu/intel/liveupdate.c
> > > >
>
> [snip]
> > >
> > > > +{
> > > > + struct context_entry *context;
> > > > + int ret;
> > > > + int i;
> > > > +
> > > > + for (i = 0; i < ROOT_ENTRY_NR; i++) {
> > > > + /*
> > > > + * Alloc the context tables now to make sure the iommu unit is
> > > > + * properly preserved. These might stay unused and wastes around
> > > > + * 32MB max in scalable mode.
> > > > + */
> > >
> > > Instead of allocating and preserving context tables for all root entries
> > > (as noted, can waste up to 32MB), could we restrict this only to the
> > > entries possibly in use by active PCI devices?
> >
> > I think the hotplug devices or VFs created through SR-IOV will be missed
> > that way. Lets say device A is preserved and the associated iommu is
> > also preserved. And then a new device B is hotplugged and preserved,
> > then the context table for that will be missed.
>
> Ok I thought about it a little more and basically we have following
> things to consider when we preserve context tables,
>
> - The devices can be hotplugged and preserved, so the context tables of
> those need to be preserved if we don't allocate all of them first time
> we preserve iommu, as done here.
> - New context tables can be added (after hotplug) for unpreserved
> devices. And if we don't get another iommu preserve call after these
> are added, those remain unpreserved, so during shutdown those entries
> need to be removed from root table or preserved for simplicity.
>
> To solve this we can,
>
> 1. Either preserve the new context table when it is added for a preserved
> iommu. This can be done in iommu_context_addr(). This is simpler and
> no tracking needed.
>
> 2. Or track the preserved context tables using a bitmap and then preserve
> them incrementally whenever a device is preserved. On shutdown during
> cleanup, we can clear the entries for unpreserved context tables from
> root table.
>
> I am inclined towards second option. WDYT?

Thinking out loud here, I agree that shifting away from the 32MB
pre-allocation is the right direction. I'm wondering if we can avoid the
overhead of introducing a new tracking bitmap (Option 2) altogether?

Since the IOMMU serialization is a strict dependency for device tracking,
could we move the context table preservation directly into the
device-level op, intel_iommu_preserve_device()?

Whenever a specific device is preserved on-demand:

1. It queries the parent IOMMU to fetch the allocated context table
   backing its info->bus.
2. It calls iommu_preserve_page(context) for that table. Because KHO's
   tracking handles duplicates, this should be fine even if multiple
   devices reside on the same bus.

Regarding Scalable Mode, we would just need a simple check in that path:

	/* intel_iommu_preserve_device */

	/*
	 * Preserve the primary/lower context table backing this bus.
	 * alloc == 0: only look up an existing table, never allocate one.
	 */
	context = iommu_context_addr(info->iommu, info->bus, 0, 0);
	if (context)
		iommu_preserve_page(context);

	/*
	 * If scalable mode is active, preserve the upper context table as
	 * well (devfn >= 0x80 selects the upper half of the root entry).
	 */
	if (sm_supported(info->iommu)) {
		context = iommu_context_addr(info->iommu, info->bus, 0x80, 0);
		if (context)
			iommu_preserve_page(context);
	}

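
To make the intended behaviour concrete, here is a rough userspace model
of the flow above. It is only a sketch and not kernel code: struct
model_iommu and the model_* helpers are made up for illustration, and it
assumes the preserve call is idempotent, which is the property I'm hoping
KHO's duplicate handling gives us.

```c
#include <stdbool.h>

#define ROOT_ENTRY_NR 256

/* Hypothetical stand-in for one IOMMU unit (not a kernel structure). */
struct model_iommu {
	bool scalable;                    /* models sm_supported() */
	bool allocated[ROOT_ENTRY_NR][2]; /* [bus][0] lower, [bus][1] upper table */
	bool preserved[ROOT_ENTRY_NR][2];
	int nr_preserved;                 /* distinct tables preserved so far */
};

/*
 * Models an idempotent preserve: tables that were never allocated are
 * skipped, and a repeated call for the same table is a no-op.
 */
static void model_preserve_table(struct model_iommu *iommu, int bus, int half)
{
	if (!iommu->allocated[bus][half] || iommu->preserved[bus][half])
		return;

	iommu->preserved[bus][half] = true;
	iommu->nr_preserved++;
}

/* Models the device-level op: preserve only the tables backing this bus. */
static void model_preserve_device(struct model_iommu *iommu, int bus)
{
	model_preserve_table(iommu, bus, 0);
	if (iommu->scalable)
		model_preserve_table(iommu, bus, 1);
}
```

In this model, preserving two devices on bus 3 (both tables allocated)
and then a hotplugged device on bus 7 (only the lower table allocated)
leaves exactly three tables preserved, without any up-front allocation
and without a separate bitmap on the caller's side.
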
WDYT?
>
> I think we will have to do similar stuff for PASID also down the road to
> preserve pasid_tables in PASID directory.
> >
> > Since we don't track the context_tables that are preserved, there is no
> > way to incrementally preserve the new ones. Let me look into the behaviour
> > of KHO, maybe we can make the preserve call idempotent and do these
> > incrementally.
> > >
> > > > + spin_lock(&iommu->lock);
> > > > + context = iommu_context_addr(iommu, i, 0, 1);
> > > > + spin_unlock(&iommu->lock);
> > > > + if (!context) {
> > > > + ret = -ENOMEM;
> > > > + goto error;
> > > > + }
>
[snip]
Thanks,
Praan