From: Samiullah Khawaja <skhawaja@google.com>
To: Pranjal Shrivastava <praan@google.com>
Cc: David Woodhouse <dwmw2@infradead.org>,
Lu Baolu <baolu.lu@linux.intel.com>,
Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>,
Jason Gunthorpe <jgg@ziepe.ca>,
Robin Murphy <robin.murphy@arm.com>,
Kevin Tian <kevin.tian@intel.com>,
Alex Williamson <alex@shazbot.org>,
Shuah Khan <shuah@kernel.org>,
iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
kvm@vger.kernel.org, Saeed Mahameed <saeedm@nvidia.com>,
Adithya Jayachandran <ajayachandra@nvidia.com>,
Parav Pandit <parav@nvidia.com>,
Leon Romanovsky <leonro@nvidia.com>,
William Tu <witu@nvidia.com>,
Pratyush Yadav <pratyush@kernel.org>,
Pasha Tatashin <pasha.tatashin@soleen.com>,
David Matlack <dmatlack@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Chris Li <chrisl@kernel.org>, Vipin Sharma <vipinsh@google.com>,
YiFei Zhu <zhuyifei@google.com>
Subject: Re: [PATCH 08/14] iommu: Restore and reattach preserved domains to devices
Date: Mon, 23 Mar 2026 18:02:36 +0000 [thread overview]
Message-ID: <acFu5vTAcKULIYOt@google.com> (raw)
In-Reply-To: <acBmNM_NIVamBrpm@google.com>
On Sun, Mar 22, 2026 at 09:59:16PM +0000, Pranjal Shrivastava wrote:
>On Tue, Feb 03, 2026 at 10:09:42PM +0000, Samiullah Khawaja wrote:
>> Restore the preserved domains by restoring the page tables using restore
>> IOMMU domain op. Reattach the preserved domain to the device during
>> default domain setup. While attaching, reuse the domain ID that was used
>> in the previous kernel. The context entry setup is not needed as that is
>> preserved during liveupdate.
>>
>> Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
>> ---
>> drivers/iommu/intel/iommu.c | 40 ++++++++++++++++++------------
>> drivers/iommu/intel/iommu.h | 3 ++-
>> drivers/iommu/intel/nested.c | 2 +-
>> drivers/iommu/iommu.c | 47 ++++++++++++++++++++++++++++++++++--
>> drivers/iommu/liveupdate.c | 31 ++++++++++++++++++++++++
>> include/linux/iommu-lu.h | 8 ++++++
>> 6 files changed, 112 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>> index 8acb7f8a7627..83faad53f247 100644
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -1029,7 +1029,8 @@ static bool first_level_by_default(struct intel_iommu *iommu)
>> return true;
>> }
>>
>> -int domain_attach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
>> +int domain_attach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu,
>> + int restore_did)
>> {
>> struct iommu_domain_info *info, *curr;
>> int num, ret = -ENOSPC;
>> @@ -1049,8 +1050,11 @@ int domain_attach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
>> return 0;
>> }
>>
>> - num = ida_alloc_range(&iommu->domain_ida, IDA_START_DID,
>> - cap_ndoms(iommu->cap) - 1, GFP_KERNEL);
>> + if (restore_did >= 0)
>
>I just looked at the code and saw the comment for IDA_START_DID [1] and
>I realized we probably shouldn't allow domain_ids 0 or 1 here, as they
>are reserved?
>
>It seems that DID 0 essentially acts as a blocking domain, and DID 1 is
>used for identity / FLPT_DEFAULT_DID (comparing it to S1DSS in Arm)
>
>If a corrupted KHO image passes restore_did = 0, this check evaluates to
>true. The driver will bypass the IDA, assign DID 0, and program it into
>the context entries. Wouldn't we accidentally attach the device to a
>blocking mode where the VT-d hardware just silently drops/faults all DMA?
>
>Should this check be:
>if (restore_did >= IDA_START_DID) ?
Agreed. Will update.
>
>> + num = restore_did;
>> + else
>> + num = ida_alloc_range(&iommu->domain_ida, IDA_START_DID,
>> + cap_ndoms(iommu->cap) - 1, GFP_KERNEL);
>> if (num < 0) {
>> pr_err("%s: No free domain ids\n", iommu->name);
>> goto err_unlock;
>> @@ -1321,10 +1325,14 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
>> {
>> struct device_domain_info *info = dev_iommu_priv_get(dev);
>> struct intel_iommu *iommu = info->iommu;
>> + struct device_ser *device_ser = NULL;
>> unsigned long flags;
>> int ret;
>>
>> - ret = domain_attach_iommu(domain, iommu);
>> + device_ser = dev_iommu_restored_state(dev);
>> +
>> + ret = domain_attach_iommu(domain, iommu,
>> + dev_iommu_restore_did(dev, &domain->domain));
>> if (ret)
>> return ret;
>>
>> @@ -1337,16 +1345,18 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
>
>[ ------- >8 snip -------- ]
>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index c0632cb5b570..8103b5372364 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -18,6 +18,7 @@
>> #include <linux/errno.h>
>> #include <linux/host1x_context_bus.h>
>> #include <linux/iommu.h>
>> +#include <linux/iommu-lu.h>
>> #include <linux/iommufd.h>
>> #include <linux/idr.h>
>> #include <linux/err.h>
>> @@ -489,6 +490,10 @@ static int iommu_init_device(struct device *dev)
>> goto err_free;
>> }
>>
>> +#ifdef CONFIG_IOMMU_LIVEUPDATE
>> + dev->iommu->device_ser = iommu_get_device_preserved_data(dev);
>> +#endif
>> +
>> iommu_dev = ops->probe_device(dev);
>> if (IS_ERR(iommu_dev)) {
>> ret = PTR_ERR(iommu_dev);
>> @@ -2149,6 +2154,13 @@ static int __iommu_attach_device(struct iommu_domain *domain,
>> ret = domain->ops->attach_dev(domain, dev, old);
>> if (ret)
>> return ret;
>> +
>> +#ifdef CONFIG_IOMMU_LIVEUPDATE
>> + /* The associated state can be unset once restored. */
>> + if (dev_iommu_restored_state(dev))
>> + WRITE_ONCE(dev->iommu->device_ser, NULL);
>> +#endif
>> +
>> dev->iommu->attach_deferred = 0;
>> trace_attach_device_to_domain(dev);
>> return 0;
>> @@ -3061,6 +3073,34 @@ int iommu_fwspec_add_ids(struct device *dev, const u32 *ids, int num_ids)
>> }
>> EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
>>
>> +static struct iommu_domain *__iommu_group_maybe_restore_domain(struct iommu_group *group)
>> +{
>> + struct device_ser *device_ser;
>> + struct iommu_domain *domain;
>> + struct device *dev;
>> +
>> + dev = iommu_group_first_dev(group);
>> + if (!dev_is_pci(dev))
>> + return NULL;
>> +
>> + device_ser = dev_iommu_restored_state(dev);
>> + if (!device_ser)
>> + return NULL;
>> +
>> + domain = iommu_restore_domain(dev, device_ser);
>> + if (WARN_ON(IS_ERR(domain)))
>> + return NULL;
>> +
>> + /*
>> + * The group is owned by the entity (a preserved iommufd) that provided
>> + * this token in the previous kernel. It will be used to reclaim it
>> + * later.
>> + */
>> + group->owner = (void *)device_ser->token;
>
>Umm.. this is a nak as it looks troubled for the following reasons:
>
>1. The group->owner and group->owner_cnt are being updated directly
>without holding group->mutex. We shouldn't bypass the lock as this
>violates the group's concurrency protections and can cause APIs like
>iommu_device_claim_dma_owner[2] to race or fail.
This function is called with group->mutex held during the setup of
default domain during device iommu probe. So this should fine. I will
add a lockdep to make it clear.
>
>2. Stuff like this shouldn't be open-coded. Presumably, this was
>open-coded because the standard API (__iommu_take_dma_ownership) forces
>the IOMMU group into a blocking_domain, which conflicts with the fact
>that we just restored an active paging domain?
>
>I think we should introduce a proper helper that acquires the lock,
>safely sets the opaque owner, and manages the domain state without
>forcing it into a blocking domain? Please note that the helper shouldn't
>be exposed / exported, we wouldn't want any abusive calls to claim owner
>without attaching to a blocked domain.
The init/restore of owner fields in group should only be done during
restore when restoring the iommu domain for the group with mutex already
held. Adding a separate helper might allow setting it up at some other
place even in the core, so want to avoid that.
>
>3. Please note that owner is an opaque "pointer" but we set it to a u64.
>Setting an opaque void *owner directly to a user-controlled u64 token is
>dangerous. If a malicious or unlucky userspace provides a token value
>that happens to perfectly match a valid kernel pointer (e.g. another
>subsystem's cookie), we'll have an aliasing collision.
>
>If that subsystem subsequently calls iommu_device_claim_dma_owner(), the
>if (group->owner == owner) check will falsely pass, incorrectly granting
>DMA ownership and breaking device isolation.
>
>I understand we can't restore the original owner pointer because the
>original iommufd file/context from the previous kernel no longer exists.
>However, to avoid aliasing could we dynamically allocate a small wrapper
>(e.g., struct iommu_lu_owner { u64 token; }) and set group->owner to the
>uniquely allocated pointer?
Agreed. Will update.
>
>4. In the full RFC [3], we seem to be "transferring" the ownership but
>here we seem to set the owner_cnt = 1 what if it was > 1 earlier? The
>iommu_device_release_dma_owner seems to handle multiple release calls
>well so this shouldn't lead to crashes but the "owning" entity could use
>this count for some book keeping.. it'd be nice if we could have a
>proper restore_owner API / helper here.
This is being done during iommu probe of the device when the default
domain is being set on the group, so group will not have any existing
owner. I will add a WARN for this. The transfer of ownership will happen
after the iommufd is restored. Also note that once initialized here, the
device/group will not be owned by any other driver or iommufd until the
ownership transfer is done.
>
>> + group->owner_cnt = 1;
>> + return domain;
>> +}
>> +
>> /**
>> * iommu_setup_default_domain - Set the default_domain for the group
>> * @group: Group to change
>> @@ -3075,8 +3115,8 @@ static int iommu_setup_default_domain(struct iommu_group *group,
>> int target_type)
>> {
>> struct iommu_domain *old_dom = group->default_domain;
>> + struct iommu_domain *dom, *restored_domain;
>> struct group_device *gdev;
>> - struct iommu_domain *dom;
>> bool direct_failed;
>> int req_type;
>> int ret;
>> @@ -3120,6 +3160,9 @@ static int iommu_setup_default_domain(struct iommu_group *group,
>> /* We must set default_domain early for __iommu_device_set_domain */
>> group->default_domain = dom;
>> if (!group->domain) {
>> + restored_domain = __iommu_group_maybe_restore_domain(group);
>> + if (!restored_domain)
>> + restored_domain = dom;
>> /*
>> * Drivers are not allowed to fail the first domain attach.
>> * The only way to recover from this is to fail attaching the
>> @@ -3127,7 +3170,7 @@ static int iommu_setup_default_domain(struct iommu_group *group,
>> * in group->default_domain so it is freed after.
>> */
>> ret = __iommu_group_set_domain_internal(
>> - group, dom, IOMMU_SET_DOMAIN_MUST_SUCCEED);
>> + group, restored_domain, IOMMU_SET_DOMAIN_MUST_SUCCEED);
>> if (WARN_ON(ret))
>> goto out_free_old;
>> } else {
>> diff --git a/drivers/iommu/liveupdate.c b/drivers/iommu/liveupdate.c
>> index 83eb609b3fd7..6b211436ad25 100644
>> --- a/drivers/iommu/liveupdate.c
>> +++ b/drivers/iommu/liveupdate.c
>> @@ -501,3 +501,34 @@ void iommu_unpreserve_device(struct iommu_domain *domain, struct device *dev)
>>
>> iommu_unpreserve_locked(iommu->iommu_dev);
>> }
>> +
>> +struct iommu_domain *iommu_restore_domain(struct device *dev, struct device_ser *ser)
>> +{
>> + struct iommu_domain_ser *domain_ser;
>> + struct iommu_lu_flb_obj *flb_obj;
>> + struct iommu_domain *domain;
>> + int ret;
>> +
>> + domain_ser = __va(ser->domain_iommu_ser.domain_phys);
>> +
>> + ret = liveupdate_flb_get_incoming(&iommu_flb, (void **)&flb_obj);
>> + if (ret)
>> + return ERR_PTR(ret);
>> +
>> + guard(mutex)(&flb_obj->lock);
>> + if (domain_ser->restored_domain)
>> + return domain_ser->restored_domain;
>> +
>> + domain_ser->obj.incoming = true;
>> + domain = iommu_paging_domain_alloc(dev);
>> + if (IS_ERR(domain))
>> + return domain;
>> +
>> + ret = domain->ops->restore(domain, domain_ser);
>> + if (ret)
>
>did we miss free-ing domain here?
Yes, Ankit pointed out same. Will update this.
>
>> + return ERR_PTR(ret);
>> +
>> + domain->preserved_state = domain_ser;
>> + domain_ser->restored_domain = domain;
>> + return domain;
>> +}
>> diff --git a/include/linux/iommu-lu.h b/include/linux/iommu-lu.h
>> index 48c07514a776..4879abaf83d3 100644
>> --- a/include/linux/iommu-lu.h
>> +++ b/include/linux/iommu-lu.h
>> @@ -65,6 +65,8 @@ static inline int dev_iommu_restore_did(struct device *dev, struct iommu_domain
>> return -1;
>> }
>>
>> +struct iommu_domain *iommu_restore_domain(struct device *dev,
>> + struct device_ser *ser);
>> int iommu_for_each_preserved_device(iommu_preserved_device_iter_fn fn,
>> void *arg);
>> struct device_ser *iommu_get_device_preserved_data(struct device *dev);
>> @@ -95,6 +97,12 @@ static inline void *iommu_domain_restored_state(struct iommu_domain *domain)
>> return NULL;
>> }
>>
>> +static inline struct iommu_domain *iommu_restore_domain(struct device *dev,
>> + struct device_ser *ser)
>> +{
>> + return NULL;
>> +}
>> +
>> static inline int iommu_for_each_preserved_device(iommu_preserved_device_iter_fn fn, void *arg)
>> {
>> return -EOPNOTSUPP;
>
>Thanks,
>Praan
>
>[1] https://elixir.bootlin.com/linux/v7.0-rc4/source/drivers/iommu/intel/iommu.h#L795
>[2] https://elixir.bootlin.com/linux/v7.0-rc4/source/drivers/iommu/iommu.c#L3367
Thanks,
Sami
next prev parent reply other threads:[~2026-03-23 18:02 UTC|newest]
Thread overview: 95+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-03 22:09 [PATCH 00/14] iommu: Add live update state preservation Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 01/14] iommu: Implement IOMMU LU FLB callbacks Samiullah Khawaja
2026-03-11 21:07 ` Pranjal Shrivastava
2026-03-12 16:43 ` Samiullah Khawaja
2026-03-12 23:43 ` Pranjal Shrivastava
2026-03-13 16:47 ` Samiullah Khawaja
2026-03-13 15:36 ` Pranjal Shrivastava
2026-03-13 16:58 ` Samiullah Khawaja
2026-03-16 22:54 ` Vipin Sharma
2026-03-17 1:06 ` Samiullah Khawaja
2026-03-23 23:27 ` Vipin Sharma
2026-02-03 22:09 ` [PATCH 02/14] iommu: Implement IOMMU core liveupdate skeleton Samiullah Khawaja
2026-03-12 23:10 ` Pranjal Shrivastava
2026-03-13 18:42 ` Samiullah Khawaja
2026-03-17 20:09 ` Pranjal Shrivastava
2026-03-17 20:13 ` Samiullah Khawaja
2026-03-17 20:23 ` Pranjal Shrivastava
2026-03-17 21:03 ` Vipin Sharma
2026-03-18 18:51 ` Pranjal Shrivastava
2026-03-18 17:49 ` Samiullah Khawaja
2026-03-17 19:58 ` Vipin Sharma
2026-03-17 20:33 ` Samiullah Khawaja
2026-03-24 19:06 ` Vipin Sharma
2026-03-24 19:45 ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 03/14] liveupdate: luo_file: Add internal APIs for file preservation Samiullah Khawaja
2026-03-18 10:00 ` Pranjal Shrivastava
2026-03-18 16:54 ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 04/14] iommu/pages: Add APIs to preserve/unpreserve/restore iommu pages Samiullah Khawaja
2026-03-03 16:42 ` Ankit Soni
2026-03-03 18:41 ` Samiullah Khawaja
2026-03-20 17:27 ` Pranjal Shrivastava
2026-03-20 18:12 ` Samiullah Khawaja
2026-03-17 20:59 ` Vipin Sharma
2026-03-20 9:28 ` Pranjal Shrivastava
2026-03-20 18:27 ` Samiullah Khawaja
2026-03-20 11:01 ` Pranjal Shrivastava
2026-03-20 18:56 ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 05/14] iommupt: Implement preserve/unpreserve/restore callbacks Samiullah Khawaja
2026-03-20 21:57 ` Pranjal Shrivastava
2026-03-23 16:41 ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 06/14] iommu/vt-d: Implement device and iommu preserve/unpreserve ops Samiullah Khawaja
2026-03-19 16:04 ` Vipin Sharma
2026-03-19 16:27 ` Samiullah Khawaja
2026-03-20 23:01 ` Pranjal Shrivastava
2026-03-21 13:27 ` Pranjal Shrivastava
2026-03-23 18:32 ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 07/14] iommu/vt-d: Restore IOMMU state and reclaimed domain ids Samiullah Khawaja
2026-03-19 20:54 ` Vipin Sharma
2026-03-20 1:05 ` Samiullah Khawaja
2026-03-22 19:51 ` Pranjal Shrivastava
2026-03-23 19:33 ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 08/14] iommu: Restore and reattach preserved domains to devices Samiullah Khawaja
2026-03-10 5:16 ` Ankit Soni
2026-03-10 21:47 ` Samiullah Khawaja
2026-03-22 21:59 ` Pranjal Shrivastava
2026-03-23 18:02 ` Samiullah Khawaja [this message]
2026-02-03 22:09 ` [PATCH 09/14] iommu/vt-d: preserve PASID table of preserved device Samiullah Khawaja
2026-03-23 18:19 ` Pranjal Shrivastava
2026-03-23 18:51 ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 10/14] iommufd-lu: Implement ioctl to let userspace mark an HWPT to be preserved Samiullah Khawaja
2026-03-19 23:35 ` Vipin Sharma
2026-03-20 0:40 ` Samiullah Khawaja
2026-03-20 23:34 ` Vipin Sharma
2026-03-23 16:24 ` Samiullah Khawaja
2026-03-25 14:37 ` Pranjal Shrivastava
2026-03-25 17:31 ` Samiullah Khawaja
2026-03-25 18:55 ` Pranjal Shrivastava
2026-03-25 20:19 ` Samiullah Khawaja
2026-03-25 20:36 ` Pranjal Shrivastava
2026-03-25 20:46 ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 11/14] iommufd-lu: Persist iommu hardware pagetables for live update Samiullah Khawaja
2026-02-25 23:47 ` Samiullah Khawaja
2026-03-03 5:56 ` Ankit Soni
2026-03-03 18:51 ` Samiullah Khawaja
2026-03-23 20:28 ` Vipin Sharma
2026-03-23 21:34 ` Samiullah Khawaja
2026-03-25 20:08 ` Pranjal Shrivastava
2026-03-25 20:32 ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 12/14] iommufd: Add APIs to preserve/unpreserve a vfio cdev Samiullah Khawaja
2026-03-23 20:59 ` Vipin Sharma
2026-03-23 21:38 ` Samiullah Khawaja
2026-03-25 20:24 ` Pranjal Shrivastava
2026-03-25 20:41 ` Samiullah Khawaja
2026-03-25 21:23 ` Pranjal Shrivastava
2026-03-26 0:16 ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 13/14] vfio/pci: Preserve the iommufd state of the " Samiullah Khawaja
2026-02-17 4:18 ` Ankit Soni
2026-03-03 18:35 ` Samiullah Khawaja
2026-03-23 21:17 ` Vipin Sharma
2026-03-23 22:07 ` Samiullah Khawaja
2026-03-24 20:30 ` Vipin Sharma
2026-03-25 20:55 ` Pranjal Shrivastava
2026-02-03 22:09 ` [PATCH 14/14] iommufd/selftest: Add test to verify iommufd preservation Samiullah Khawaja
2026-03-23 22:18 ` Vipin Sharma
2026-03-25 21:05 ` Pranjal Shrivastava
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=acFu5vTAcKULIYOt@google.com \
--to=skhawaja@google.com \
--cc=ajayachandra@nvidia.com \
--cc=akpm@linux-foundation.org \
--cc=alex@shazbot.org \
--cc=baolu.lu@linux.intel.com \
--cc=chrisl@kernel.org \
--cc=dmatlack@google.com \
--cc=dwmw2@infradead.org \
--cc=iommu@lists.linux.dev \
--cc=jgg@ziepe.ca \
--cc=joro@8bytes.org \
--cc=kevin.tian@intel.com \
--cc=kvm@vger.kernel.org \
--cc=leonro@nvidia.com \
--cc=linux-kernel@vger.kernel.org \
--cc=parav@nvidia.com \
--cc=pasha.tatashin@soleen.com \
--cc=praan@google.com \
--cc=pratyush@kernel.org \
--cc=robin.murphy@arm.com \
--cc=saeedm@nvidia.com \
--cc=shuah@kernel.org \
--cc=vipinsh@google.com \
--cc=will@kernel.org \
--cc=witu@nvidia.com \
--cc=zhuyifei@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox