public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Samiullah Khawaja <skhawaja@google.com>
To: Pranjal Shrivastava <praan@google.com>
Cc: David Woodhouse <dwmw2@infradead.org>,
	 Lu Baolu <baolu.lu@linux.intel.com>,
	Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>,
	 Jason Gunthorpe <jgg@ziepe.ca>,
	Robin Murphy <robin.murphy@arm.com>,
	 Kevin Tian <kevin.tian@intel.com>,
	Alex Williamson <alex@shazbot.org>,
	 Shuah Khan <shuah@kernel.org>,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	 kvm@vger.kernel.org, Saeed Mahameed <saeedm@nvidia.com>,
	 Adithya Jayachandran <ajayachandra@nvidia.com>,
	Parav Pandit <parav@nvidia.com>,
	 Leon Romanovsky <leonro@nvidia.com>,
	William Tu <witu@nvidia.com>,
	 Pratyush Yadav <pratyush@kernel.org>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	 David Matlack <dmatlack@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Chris Li <chrisl@kernel.org>, Vipin Sharma <vipinsh@google.com>,
	 YiFei Zhu <zhuyifei@google.com>
Subject: Re: [PATCH 02/14] iommu: Implement IOMMU core liveupdate skeleton
Date: Tue, 17 Mar 2026 20:13:47 +0000	[thread overview]
Message-ID: <abm1esL1dzN7NTC7@google.com> (raw)
In-Reply-To: <abm09vCaAMtiQPVs@google.com>

On Tue, Mar 17, 2026 at 08:09:26PM +0000, Pranjal Shrivastava wrote:
>On Fri, Mar 13, 2026 at 06:42:05PM +0000, Samiullah Khawaja wrote:
>> On Thu, Mar 12, 2026 at 11:10:36PM +0000, Pranjal Shrivastava wrote:
>> > On Tue, Feb 03, 2026 at 10:09:36PM +0000, Samiullah Khawaja wrote:
>> > > Add IOMMU domain ops that can be implemented by the IOMMU drivers if
>> > > they support IOMMU domain preservation across liveupdate. The new IOMMU
>> > > domain preserve, unpreserve and restore APIs call these ops to perform
>> > > respective live update operations.
>> > >
>> > > Similarly add IOMMU ops to preserve/unpreserve a device. These can be
>> > > implemented by the IOMMU drivers that support preservation of devices
>> > > that have their IOMMU domains preserved. During device preservation the
>> > > state of the associated IOMMU is also preserved. The device can only be
>> > > preserved if the attached iommu domain is preserved and the associated
>> > > iommu supports preservation.
>> > >
>> > > The preserved state of the device and IOMMU needs to be fetched during
>> > > shutdown and boot in the next kernel. Add APIs that can be used to fetch
>> > > the preserved state of a device and IOMMU. The APIs will only be used
>> > > during shutdown and after liveupdate so no locking needed.
>> > >
>> > > Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
>> > > ---
>> > >  drivers/iommu/iommu.c      |   3 +
>> > >  drivers/iommu/liveupdate.c | 326 +++++++++++++++++++++++++++++++++++++
>> > >  include/linux/iommu-lu.h   | 119 ++++++++++++++
>> > >  include/linux/iommu.h      |  32 ++++
>> > >  4 files changed, 480 insertions(+)
>> > >
>> > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> > > index 4926a43118e6..c0632cb5b570 100644
>> > > --- a/drivers/iommu/iommu.c
>> > > +++ b/drivers/iommu/iommu.c
>> > > @@ -389,6 +389,9 @@ static struct dev_iommu *dev_iommu_get(struct device *dev)
>> > >
>> > >  	mutex_init(&param->lock);
>> > >  	dev->iommu = param;
>> > > +#ifdef CONFIG_IOMMU_LIVEUPDATE
>> > > +	dev->iommu->device_ser = NULL;
>> > > +#endif
>> > >  	return param;
>> > >  }
>> > >
>> > > diff --git a/drivers/iommu/liveupdate.c b/drivers/iommu/liveupdate.c
>> > > index 6189ba32ff2c..83eb609b3fd7 100644
>> > > --- a/drivers/iommu/liveupdate.c
>> > > +++ b/drivers/iommu/liveupdate.c
>> > > @@ -11,6 +11,7 @@
>> > >  #include <linux/liveupdate.h>
>> > >  #include <linux/iommu-lu.h>
>> > >  #include <linux/iommu.h>
>> > > +#include <linux/pci.h>
>> > >  #include <linux/errno.h>
>> > >
>> > >  static void iommu_liveupdate_restore_objs(u64 next)
>> > > @@ -175,3 +176,328 @@ int iommu_liveupdate_unregister_flb(struct liveupdate_file_handler *handler)
>> > >  	return liveupdate_unregister_flb(handler, &iommu_flb);
>> > >  }
>> > >  EXPORT_SYMBOL(iommu_liveupdate_unregister_flb);
>> > > +
>> > > +int iommu_for_each_preserved_device(iommu_preserved_device_iter_fn fn,
>> > > +				    void *arg)
>> > > +{
>> > > +	struct iommu_lu_flb_obj *obj;
>> > > +	struct devices_ser *devices;
>> > > +	int ret, i, idx;
>> > > +
>> > > +	ret = liveupdate_flb_get_incoming(&iommu_flb, (void **)&obj);
>> > > +	if (ret)
>> > > +		return -ENOENT;
>> > > +
>> > > +	devices = __va(obj->ser->devices_phys);
>> > > +	for (i = 0, idx = 0; i < obj->ser->nr_devices; ++i, ++idx) {
>> > > +		if (idx >= MAX_DEVICE_SERS) {
>> > > +			devices = __va(devices->objs.next_objs);
>> > > +			idx = 0;
>> > > +		}
>> > > +
>> > > +		if (devices->devices[idx].obj.deleted)
>> > > +			continue;
>> > > +
>> > > +		ret = fn(&devices->devices[idx], arg);
>> > > +		if (ret)
>> > > +			return ret;
>> > > +	}
>> > > +
>> > > +	return 0;
>> > > +}
>> > > +EXPORT_SYMBOL(iommu_for_each_preserved_device);
>> > > +
>> > > +static inline bool device_ser_match(struct device_ser *match,
>> > > +				    struct pci_dev *pdev)
>> > > +{
>> > > +	return match->devid == pci_dev_id(pdev) && match->pci_domain == pci_domain_nr(pdev->bus);
>> > > +}
>> > > +
>> > > +struct device_ser *iommu_get_device_preserved_data(struct device *dev)
>> > > +{
>> > > +	struct iommu_lu_flb_obj *obj;
>> > > +	struct devices_ser *devices;
>> > > +	int ret, i, idx;
>> > > +
>> > > +	if (!dev_is_pci(dev))
>> > > +		return NULL;
>> > > +
>> > > +	ret = liveupdate_flb_get_incoming(&iommu_flb, (void **)&obj);
>> > > +	if (ret)
>> > > +		return NULL;
>> > > +
>> > > +	devices = __va(obj->ser->devices_phys);
>> > > +	for (i = 0, idx = 0; i < obj->ser->nr_devices; ++i, ++idx) {
>> > > +		if (idx >= MAX_DEVICE_SERS) {
>> > > +			devices = __va(devices->objs.next_objs);
>> > > +			idx = 0;
>> > > +		}
>> > > +
>> > > +		if (devices->devices[idx].obj.deleted)
>> > > +			continue;
>> > > +
>> > > +		if (device_ser_match(&devices->devices[idx], to_pci_dev(dev))) {
>> > > +			devices->devices[idx].obj.incoming = true;
>> > > +			return &devices->devices[idx];
>> > > +		}
>> > > +	}
>> > > +
>> > > +	return NULL;
>> > > +}
>> > > +EXPORT_SYMBOL(iommu_get_device_preserved_data);
>> > > +
>> > > +struct iommu_ser *iommu_get_preserved_data(u64 token, enum iommu_lu_type type)
>> > > +{
>> > > +	struct iommu_lu_flb_obj *obj;
>> > > +	struct iommus_ser *iommus;
>> > > +	int ret, i, idx;
>> > > +
>> > > +	ret = liveupdate_flb_get_incoming(&iommu_flb, (void **)&obj);
>> > > +	if (ret)
>> > > +		return NULL;
>> > > +
>> > > +	iommus = __va(obj->ser->iommus_phys);
>> > > +	for (i = 0, idx = 0; i < obj->ser->nr_iommus; ++i, ++idx) {
>> > > +		if (idx >= MAX_IOMMU_SERS) {
>> > > +			iommus = __va(iommus->objs.next_objs);
>> > > +			idx = 0;
>> > > +		}
>> > > +
>> > > +		if (iommus->iommus[idx].obj.deleted)
>> > > +			continue;
>> > > +
>> > > +		if (iommus->iommus[idx].token == token &&
>> > > +		    iommus->iommus[idx].type == type)
>> > > +			return &iommus->iommus[idx];
>> > > +	}
>> > > +
>> > > +	return NULL;
>> > > +}
>> > > +EXPORT_SYMBOL(iommu_get_preserved_data);
>> > > +
>
>Also, I don't see these helpers being used anywhere in this series yet?
>Should we add them with the phase-2 series? When we actually "Restore"
>the domain?

Phase 1 does restore the iommu and domains during boot. But the
iommufd/hwpt state is not restored or re-associated with the restored
domains.

These helpers are used to fetch the preserved state during boot.

Thanks,
Sami
>
>Thanks,
>Praan
>
>> > > +static int reserve_obj_ser(struct iommu_objs_ser **objs_ptr, u64 max_objs)
>> >
>> > Isn't this more of an "insert" / "populate" / write_ser_entry? We can
>> > rename it to something like iommu_ser_push_entry / iommu_ser_write_entry
>>
>> This is reserving an object in the objects array. The object is filled
>> once reservation is done. Maybe I can call it alloc_obj_ser or
>> alloc_entry_ser.
>> >
>> > > +{
>> > > +	struct iommu_objs_ser *next_objs, *objs = *objs_ptr;
>> >
>> > Not loving these names :(
>> >
>> > TBH, the reserve_obj_ser function isn't too readable, esp. with all the
>> > variable names, here and in the lu header.. I've had to go back n forth
>> > to the first patch. For example, here next_objs can be next_objs_page &
>> > objs can be curr_page. (PTAL at my reply on PATCH 01 about renaming).
>>
>> Basically with current naming there are "serialization objects" in
>> "serializatoin object arrays". The object is a "base" type with
>> "inherited" types for iommu, domain etc. I will add explanation of all
>> this in the ABI header.
>>
>> Maybe we can name it to:
>>
>> struct iommu_obj_ser;
>> struct iommu_obj_array_ser; or struct iommu_obj_ser_array;
>>
>> Then for reserve/alloc, we use names like "next_obj_array" and
>> "curr_obj_array"?
>> >
>> > > +	int idx;
>> > > +
>> > > +	if (objs->nr_objs == max_objs) {
>> > > +		next_objs = kho_alloc_preserve(PAGE_SIZE);
>> > > +		if (IS_ERR(next_objs))
>> > > +			return PTR_ERR(next_objs);
>> > > +
>> > > +		objs->next_objs = virt_to_phys(next_objs);
>> > > +		objs = next_objs;
>> > > +		*objs_ptr = objs;
>> > > +		objs->nr_objs = 0;
>> > > +		objs->next_objs = 0;
>> >
>> > This seems redundant, no need to zero these out, kho_alloc_preserve
>> > passes __GFP_ZERO to folio_alloc [1], which should z-alloc the pages.
>>
>> Agreed. Will remove this.
>> >
>> > > +	}
>> > > +
>> > > +	idx = objs->nr_objs++;
>> > > +	return idx;
>> > > +}
>> >
>> > Just to give a mental model to fellow reviewers, here's how this is laid
>> > out:
>> >
>> > ----------------------------------------------------------------------
>> > [ PAGE START ]                                                       |
>> > ----------------------------------------------------------------------
>> > | iommu_objs_ser (The Page Header)                                   |
>> > |   - next_objs: 0x0000 (End of the page-chain)                      |
>> > |   - nr_objs: 2                                                     |
>> > ----------------------------------------------------------------------
>> > | ITEM 0: iommu_domain_ser                                           |
>> > |   [ iommu_obj_ser (The entry header) ]                             |
>> > |     - idx: 0                                                       |
>> > |     - ref_count: 1                                                 |
>> > |     - deleted: 0                                                   |
>> > |   [ Domain Data ]                                                  |
>> > ----------------------------------------------------------------------
>> > | ITEM 1: iommu_domain_ser                                           |
>> > |   [ iommu_obj_ser (The Price Tag) ]                                |
>> > |     - idx: 1                                                       |
>> > |     - ref_count: 1                                                 |
>> > |     - deleted: 0                                                   |
>> > |   [ Domain Data ]                                                  |
>> > ----------------------------------------------------------------------
>> > | ... (Empty space for more domains) ...                             |
>> > |                                                                    |
>> > ----------------------------------------------------------------------
>> > [ PAGE END ]                                                         |
>> > ----------------------------------------------------------------------
>>
>> +1
>>
>> Will add a table in the header as you suggested.
>> >
>> > > +
>> > > +int iommu_domain_preserve(struct iommu_domain *domain, struct iommu_domain_ser **ser)
>> > > +{
>> > > +	struct iommu_domain_ser *domain_ser;
>> > > +	struct iommu_lu_flb_obj *flb_obj;
>> > > +	int idx, ret;
>> > > +
>> > > +	if (!domain->ops->preserve)
>> > > +		return -EOPNOTSUPP;
>> > > +
>> > > +	ret = liveupdate_flb_get_outgoing(&iommu_flb, (void **)&flb_obj);
>> > > +	if (ret)
>> > > +		return ret;
>> > > +
>> > > +	guard(mutex)(&flb_obj->lock);
>> > > +	idx = reserve_obj_ser((struct iommu_objs_ser **)&flb_obj->iommu_domains,
>> > > +			      MAX_IOMMU_DOMAIN_SERS);
>> > > +	if (idx < 0)
>> > > +		return idx;
>> > > +
>> > > +	domain_ser = &flb_obj->iommu_domains->iommu_domains[idx];
>> >
>> > This is slightly less-readable as well, I understand we're trying to:
>> >
>> > iommu_domains_ser -> iommu_domain_ser[idx] but the same name
>> > (iommu_domains) makes it difficult to read.. we should rename this as:
>>
>> Agreed.
>>
>> I will update it to,
>>
>> &flb_obj->curr_iommu_domains->domain_array[idx]
>> >
>> > &flb_obj->iommu_domains_page->domain_entries[idx] or something for
>> > better readability..
>> >
>> > Also, let's add a comment explaining that reserve_obj_ser actually
>> > advances the flb_obj ptr when necessary..
>>
>> Agreed.
>> >
>> > IIUC, we start with PAGE 0 initially, store it's phys in the
>> > iommu_flb_preserve op (the iommu_ser_phys et al) & then we go on
>> > alloc-ing more pages and keep storing the "current" active page with the
>> > liveupdate core. Now when we jump into the new kernel, we read the
>> > ser_phys and then follow the page chain, right?
>>
>> Yes, we do an initial allocation of arrays for each object type and then
>> later allocate more as needed. The flb_obj holds the address of
>> currently active array.
>>
>> I will add this explanation in the header.
>> >
>> > > +	idx = flb_obj->ser->nr_domains++;
>> > > +	domain_ser->obj.idx = idx;
>> > > +	domain_ser->obj.ref_count = 1;
>> > > +
>> > > +	ret = domain->ops->preserve(domain, domain_ser);
>> > > +	if (ret) {
>> > > +		domain_ser->obj.deleted = true;
>> > > +		return ret;
>> > > +	}
>> > > +
>> > > +	domain->preserved_state = domain_ser;
>> > > +	*ser = domain_ser;
>> > > +	return 0;
>> > > +}
>> > > +EXPORT_SYMBOL_GPL(iommu_domain_preserve);
>> > > +
>> > > +void iommu_domain_unpreserve(struct iommu_domain *domain)
>> > > +{
>> > > +	struct iommu_domain_ser *domain_ser;
>> > > +	struct iommu_lu_flb_obj *flb_obj;
>> > > +	int ret;
>> > > +
>> > > +	if (!domain->ops->unpreserve)
>> > > +		return;
>> > > +
>> > > +	ret = liveupdate_flb_get_outgoing(&iommu_flb, (void **)&flb_obj);
>> > > +	if (ret)
>> > > +		return;
>> > > +
>> > > +	guard(mutex)(&flb_obj->lock);
>> > > +
>> > > +	/*
>> > > +	 * There is no check for attached devices here. The correctness relies
>> > > +	 * on the Live Update Orchestrator's session lifecycle. All resources
>> > > +	 * (iommufd, vfio devices) are preserved within a single session. If the
>> > > +	 * session is torn down, the .unpreserve callbacks for all files will be
>> > > +	 * invoked, ensuring a consistent cleanup without needing explicit
>> > > +	 * refcounting for the serialized objects here.
>> > > +	 */
>> > > +	domain_ser = domain->preserved_state;
>> > > +	domain->ops->unpreserve(domain, domain_ser);
>> > > +	domain_ser->obj.deleted = true;
>> > > +	domain->preserved_state = NULL;
>> > > +}
>> > > +EXPORT_SYMBOL_GPL(iommu_domain_unpreserve);
>> > > +
>> > > +static int iommu_preserve_locked(struct iommu_device *iommu)
>> > > +{
>> > > +	struct iommu_lu_flb_obj *flb_obj;
>> > > +	struct iommu_ser *iommu_ser;
>> > > +	int idx, ret;
>> > > +
>> > > +	if (!iommu->ops->preserve)
>> > > +		return -EOPNOTSUPP;
>> > > +
>> > > +	if (iommu->outgoing_preserved_state) {
>> > > +		iommu->outgoing_preserved_state->obj.ref_count++;
>> > > +		return 0;
>> > > +	}
>> > > +
>> > > +	ret = liveupdate_flb_get_outgoing(&iommu_flb, (void **)&flb_obj);
>> > > +	if (ret)
>> > > +		return ret;
>> > > +
>> > > +	idx = reserve_obj_ser((struct iommu_objs_ser **)&flb_obj->iommus,
>> > > +			      MAX_IOMMU_SERS);
>> > > +	if (idx < 0)
>> > > +		return idx;
>> > > +
>> > > +	iommu_ser = &flb_obj->iommus->iommus[idx];
>> > > +	idx = flb_obj->ser->nr_iommus++;
>> > > +	iommu_ser->obj.idx = idx;
>> > > +	iommu_ser->obj.ref_count = 1;
>> > > +
>> > > +	ret = iommu->ops->preserve(iommu, iommu_ser);
>> > > +	if (ret)
>> > > +		iommu_ser->obj.deleted = true;
>> > > +
>> > > +	iommu->outgoing_preserved_state = iommu_ser;
>> > > +	return ret;
>> > > +}
>> > > +
>> > > +static void iommu_unpreserve_locked(struct iommu_device *iommu)
>> > > +{
>> > > +	struct iommu_ser *iommu_ser = iommu->outgoing_preserved_state;
>> > > +
>> > > +	iommu_ser->obj.ref_count--;
>> > > +	if (iommu_ser->obj.ref_count)
>> > > +		return;
>> > > +
>> > > +	iommu->outgoing_preserved_state = NULL;
>> > > +	iommu->ops->unpreserve(iommu, iommu_ser);
>> > > +	iommu_ser->obj.deleted = true;
>> > > +}
>> > > +
>> > > +int iommu_preserve_device(struct iommu_domain *domain,
>> > > +			  struct device *dev, u64 token)
>> > > +{
>> > > +	struct iommu_lu_flb_obj *flb_obj;
>> > > +	struct device_ser *device_ser;
>> > > +	struct dev_iommu *iommu;
>> > > +	struct pci_dev *pdev;
>> > > +	int ret, idx;
>> > > +
>> > > +	if (!dev_is_pci(dev))
>> > > +		return -EOPNOTSUPP;
>> > > +
>> > > +	if (!domain->preserved_state)
>> > > +		return -EINVAL;
>> > > +
>> > > +	pdev = to_pci_dev(dev);
>> > > +	iommu = dev->iommu;
>> > > +	if (!iommu->iommu_dev->ops->preserve_device ||
>> > > +	    !iommu->iommu_dev->ops->preserve)
>> > > +		return -EOPNOTSUPP;
>> > > +
>> > > +	ret = liveupdate_flb_get_outgoing(&iommu_flb, (void **)&flb_obj);
>> > > +	if (ret)
>> > > +		return ret;
>> > > +
>> > > +	guard(mutex)(&flb_obj->lock);
>> > > +	idx = reserve_obj_ser((struct iommu_objs_ser **)&flb_obj->devices,
>> > > +			      MAX_DEVICE_SERS);
>> > > +	if (idx < 0)
>> > > +		return idx;
>> > > +
>> > > +	device_ser = &flb_obj->devices->devices[idx];
>> > > +	idx = flb_obj->ser->nr_devices++;
>> > > +	device_ser->obj.idx = idx;
>> > > +	device_ser->obj.ref_count = 1;
>> > > +
>> > > +	ret = iommu_preserve_locked(iommu->iommu_dev);
>> > > +	if (ret) {
>> > > +		device_ser->obj.deleted = true;
>> > > +		return ret;
>> > > +	}
>> > > +
>> > > +	device_ser->domain_iommu_ser.domain_phys = __pa(domain->preserved_state);
>> > > +	device_ser->domain_iommu_ser.iommu_phys = __pa(iommu->iommu_dev->outgoing_preserved_state);
>> > > +	device_ser->devid = pci_dev_id(pdev);
>> > > +	device_ser->pci_domain = pci_domain_nr(pdev->bus);
>> > > +	device_ser->token = token;
>> > > +
>> > > +	ret = iommu->iommu_dev->ops->preserve_device(dev, device_ser);
>> > > +	if (ret) {
>> > > +		device_ser->obj.deleted = true;
>> > > +		iommu_unpreserve_locked(iommu->iommu_dev);
>> > > +		return ret;
>> > > +	}
>> > > +
>> > > +	dev->iommu->device_ser = device_ser;
>> > > +	return 0;
>> > > +}
>> > > +
>> > > +void iommu_unpreserve_device(struct iommu_domain *domain, struct device *dev)
>> > > +{
>> > > +	struct iommu_lu_flb_obj *flb_obj;
>> > > +	struct device_ser *device_ser;
>> > > +	struct dev_iommu *iommu;
>> > > +	struct pci_dev *pdev;
>> > > +	int ret;
>> > > +
>> > > +	if (!dev_is_pci(dev))
>> > > +		return;
>> > > +
>> > > +	pdev = to_pci_dev(dev);
>> > > +	iommu = dev->iommu;
>> > > +	if (!iommu->iommu_dev->ops->unpreserve_device ||
>> > > +	    !iommu->iommu_dev->ops->unpreserve)
>> > > +		return;
>> > > +
>> > > +	ret = liveupdate_flb_get_outgoing(&iommu_flb, (void **)&flb_obj);
>> > > +	if (WARN_ON(ret))
>> > > +		return;
>> > > +
>> > > +	guard(mutex)(&flb_obj->lock);
>> > > +	device_ser = dev_iommu_preserved_state(dev);
>> > > +	if (WARN_ON(!device_ser))
>> > > +		return;
>> > > +
>> > > +	iommu->iommu_dev->ops->unpreserve_device(dev, device_ser);
>> > > +	dev->iommu->device_ser = NULL;
>> > > +
>> > > +	iommu_unpreserve_locked(iommu->iommu_dev);
>> > > +}
>> >
>> > I'm wondering ig we should guard these APIs against accidental or
>> > potential abuse by other in-kernel drivers. Since the Live Update
>> > Orchestrator (LUO) is architecturally designed around user-space
>> > driven sequences (IOCTLs, specific mutex ordering, etc.), for now.
>> >
>> > Since the header-file is also under include/linux, we should avoid any
>> > possibility where we end up having drivers depending on these API.
>> > We could add a check based on dma owner:
>> >
>> > +	/* Only devices explicitly claimed by a user-space driver
>> > +	 * (VFIO/IOMMUFD) are eligible for Live Update preservation.
>> > +	 */
>> > +	if (!iommu_dma_owner_claimed(dev))
>> > +		return -EPERM;
>> >
>> > This should ensure we aren't creating 'zombie' preserved states for
>> > devices not managed by IOMMUFD/VFIO.
>>
>> Agreed. I will update this.
>> >
>> >
>> > > diff --git a/include/linux/iommu-lu.h b/include/linux/iommu-lu.h
>> > > index 59095d2f1bb2..48c07514a776 100644
>> > > --- a/include/linux/iommu-lu.h
>> > > +++ b/include/linux/iommu-lu.h
>> > > @@ -8,9 +8,128 @@
>> > >  #ifndef _LINUX_IOMMU_LU_H
>> > >  #define _LINUX_IOMMU_LU_H
>> > >
>> > > +#include <linux/device.h>
>> > > +#include <linux/iommu.h>
>> > >  #include <linux/liveupdate.h>
>> > >  #include <linux/kho/abi/iommu.h>
>> > >
>> > > +typedef int (*iommu_preserved_device_iter_fn)(struct device_ser *ser,
>> > > +					      void *arg);
>> > > +#ifdef CONFIG_IOMMU_LIVEUPDATE
>> > > +static inline void *dev_iommu_preserved_state(struct device *dev)
>> > > +{
>> > > +	struct device_ser *ser;
>> > > +
>> > > +	if (!dev->iommu)
>> > > +		return NULL;
>> > > +
>> > > +	ser = dev->iommu->device_ser;
>> > > +	if (ser && !ser->obj.incoming)
>> > > +		return ser;
>> > > +
>> > > +	return NULL;
>> > > +}
>> > > +
>> > > +static inline void *dev_iommu_restored_state(struct device *dev)
>> > > +{
>> > > +	struct device_ser *ser;
>> > > +
>> > > +	if (!dev->iommu)
>> > > +		return NULL;
>> > > +
>> > > +	ser = dev->iommu->device_ser;
>> > > +	if (ser && ser->obj.incoming)
>> > > +		return ser;
>> > > +
>> > > +	return NULL;
>> > > +}
>> > > +
>> > > +static inline void *iommu_domain_restored_state(struct iommu_domain *domain)
>> > > +{
>> > > +	struct iommu_domain_ser *ser;
>> > > +
>> > > +	ser = domain->preserved_state;
>> > > +	if (ser && ser->obj.incoming)
>> > > +		return ser;
>> > > +
>> > > +	return NULL;
>> > > +}
>> > > +
>> > > +static inline int dev_iommu_restore_did(struct device *dev, struct iommu_domain *domain)
>> > > +{
>> > > +	struct device_ser *ser = dev_iommu_restored_state(dev);
>> > > +
>> > > +	if (ser && iommu_domain_restored_state(domain))
>> > > +		return ser->domain_iommu_ser.did;
>> > > +
>> > > +	return -1;
>> > > +}
>> > > +
>> > > +int iommu_for_each_preserved_device(iommu_preserved_device_iter_fn fn,
>> > > +				    void *arg);
>> > > +struct device_ser *iommu_get_device_preserved_data(struct device *dev);
>> > > +struct iommu_ser *iommu_get_preserved_data(u64 token, enum iommu_lu_type type);
>> > > +int iommu_domain_preserve(struct iommu_domain *domain, struct iommu_domain_ser **ser);
>> > > +void iommu_domain_unpreserve(struct iommu_domain *domain);
>> > > +int iommu_preserve_device(struct iommu_domain *domain,
>> > > +			  struct device *dev, u64 token);
>> > > +void iommu_unpreserve_device(struct iommu_domain *domain, struct device *dev);
>> > > +#else
>> > > +static inline void *dev_iommu_preserved_state(struct device *dev)
>> > > +{
>> > > +	return NULL;
>> > > +}
>> > > +
>> > > +static inline void *dev_iommu_restored_state(struct device *dev)
>> > > +{
>> > > +	return NULL;
>> > > +}
>> > > +
>> > > +static inline int dev_iommu_restore_did(struct device *dev, struct iommu_domain *domain)
>> > > +{
>> > > +	return -1;
>> > > +}
>> > > +
>> > > +static inline void *iommu_domain_restored_state(struct iommu_domain *domain)
>> > > +{
>> > > +	return NULL;
>> > > +}
>> > > +
>> > > +static inline int iommu_for_each_preserved_device(iommu_preserved_device_iter_fn fn, void *arg)
>> > > +{
>> > > +	return -EOPNOTSUPP;
>> > > +}
>> > > +
>> > > +static inline struct device_ser *iommu_get_device_preserved_data(struct device *dev)
>> > > +{
>> > > +	return NULL;
>> > > +}
>> > > +
>> > > +static inline struct iommu_ser *iommu_get_preserved_data(u64 token, enum iommu_lu_type type)
>> > > +{
>> > > +	return NULL;
>> > > +}
>> > > +
>> > > +static inline int iommu_domain_preserve(struct iommu_domain *domain, struct iommu_domain_ser **ser)
>> > > +{
>> > > +	return -EOPNOTSUPP;
>> > > +}
>> > > +
>> > > +static inline void iommu_domain_unpreserve(struct iommu_domain *domain)
>> > > +{
>> > > +}
>> > > +
>> > > +static inline int iommu_preserve_device(struct iommu_domain *domain,
>> > > +					struct device *dev, u64 token)
>> > > +{
>> > > +	return -EOPNOTSUPP;
>> > > +}
>> > > +
>> > > +static inline void iommu_unpreserve_device(struct iommu_domain *domain, struct device *dev)
>> > > +{
>> > > +}
>> > > +#endif
>> > > +
>> > >  int iommu_liveupdate_register_flb(struct liveupdate_file_handler *handler);
>> > >  int iommu_liveupdate_unregister_flb(struct liveupdate_file_handler *handler);
>> > >
>> > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> > > index 54b8b48c762e..bd949c1ce7c5 100644
>> > > --- a/include/linux/iommu.h
>> > > +++ b/include/linux/iommu.h
>> > > @@ -14,6 +14,8 @@
>> > >  #include <linux/err.h>
>> > >  #include <linux/of.h>
>> > >  #include <linux/iova_bitmap.h>
>> > > +#include <linux/atomic.h>
>> > > +#include <linux/kho/abi/iommu.h>
>> > >  #include <uapi/linux/iommufd.h>
>> > >
>> > >  #define IOMMU_READ	(1 << 0)
>> > > @@ -248,6 +250,10 @@ struct iommu_domain {
>> > >  			struct list_head next;
>> > >  		};
>> > >  	};
>> > > +
>> > > +#ifdef CONFIG_IOMMU_LIVEUPDATE
>> > > +	struct iommu_domain_ser *preserved_state;
>> > > +#endif
>> > >  };
>> > >
>> > >  static inline bool iommu_is_dma_domain(struct iommu_domain *domain)
>> > > @@ -647,6 +653,10 @@ __iommu_copy_struct_to_user(const struct iommu_user_data *dst_data,
>> > >   *               resources shared/passed to user space IOMMU instance. Associate
>> > >   *               it with a nesting @parent_domain. It is required for driver to
>> > >   *               set @viommu->ops pointing to its own viommu_ops
>> > > + * @preserve_device: Preserve state of a device for liveupdate.
>> > > + * @unpreserve_device: Unpreserve state that was preserved earlier.
>> > > + * @preserve: Preserve state of iommu translation hardware for liveupdate.
>> > > + * @unpreserve: Unpreserve state of iommu that was preserved earlier.
>> > >   * @owner: Driver module providing these ops
>> > >   * @identity_domain: An always available, always attachable identity
>> > >   *                   translation.
>> > > @@ -703,6 +713,11 @@ struct iommu_ops {
>> > >  			   struct iommu_domain *parent_domain,
>> > >  			   const struct iommu_user_data *user_data);
>> > >
>> > > +	int (*preserve_device)(struct device *dev, struct device_ser *device_ser);
>> > > +	void (*unpreserve_device)(struct device *dev, struct device_ser *device_ser);
>> >
>> > Nit: Let's move the _device ops under the comment:
>> > `/* Per device IOMMU features */`
>>
>> I will move these.
>> >
>> > > +	int (*preserve)(struct iommu_device *iommu, struct iommu_ser *iommu_ser);
>> > > +	void (*unpreserve)(struct iommu_device *iommu, struct iommu_ser *iommu_ser);
>> > > +
>> >
>> > I'm wondering if there's any benefit to add these ops under a #ifdef ?
>>
>> These should be NULL when liveupdate is disabled, so #ifdef should not
>> be needed. Not sure how much space we save if don't define these. I will
>> move these in #ifdef anyway.
>> >
>> > >  	const struct iommu_domain_ops *default_domain_ops;
>> > >  	struct module *owner;
>> > >  	struct iommu_domain *identity_domain;
>> > > @@ -749,6 +764,11 @@ struct iommu_ops {
>> > >   *                           specific mechanisms.
>> > >   * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*)
>> > >   * @free: Release the domain after use.
>> > > + * @preserve: Preserve the iommu domain for liveupdate.
>> > > + *            Returns 0 on success, a negative errno on failure.
>> > > + * @unpreserve: Unpreserve the iommu domain that was preserved earlier.
>> > > + * @restore: Restore the iommu domain after liveupdate.
>> > > + *           Returns 0 on success, a negative errno on failure.
>> > >   */
>> > >  struct iommu_domain_ops {
>> > >  	int (*attach_dev)(struct iommu_domain *domain, struct device *dev,
>> > > @@ -779,6 +799,9 @@ struct iommu_domain_ops {
>> > >  				  unsigned long quirks);
>> > >
>> > >  	void (*free)(struct iommu_domain *domain);
>> > > +	int (*preserve)(struct iommu_domain *domain, struct iommu_domain_ser *ser);
>> > > +	void (*unpreserve)(struct iommu_domain *domain, struct iommu_domain_ser *ser);
>> > > +	int (*restore)(struct iommu_domain *domain, struct iommu_domain_ser *ser);
>> > >  };
>> > >
>> > >  /**
>> > > @@ -790,6 +813,8 @@ struct iommu_domain_ops {
>> > >   * @singleton_group: Used internally for drivers that have only one group
>> > >   * @max_pasids: number of supported PASIDs
>> > >   * @ready: set once iommu_device_register() has completed successfully
>> > > + * @outgoing_preserved_state: preserved iommu state of outgoing kernel for
>> > > + * liveupdate.
>> > >   */
>> > >  struct iommu_device {
>> > >  	struct list_head list;
>> > > @@ -799,6 +824,10 @@ struct iommu_device {
>> > >  	struct iommu_group *singleton_group;
>> > >  	u32 max_pasids;
>> > >  	bool ready;
>> > > +
>> > > +#ifdef CONFIG_IOMMU_LIVEUPDATE
>> > > +	struct iommu_ser *outgoing_preserved_state;
>> > > +#endif
>> > >  };
>> > >
>> > >  /**
>> > > @@ -853,6 +882,9 @@ struct dev_iommu {
>> > >  	u32				pci_32bit_workaround:1;
>> > >  	u32				require_direct:1;
>> > >  	u32				shadow_on_flush:1;
>> > > +#ifdef CONFIG_IOMMU_LIVEUPDATE
>> > > +	struct device_ser		*device_ser;
>> > > +#endif
>> > >  };
>> > >
>> > >  int iommu_device_register(struct iommu_device *iommu,
>> >
>> > Thanks,
>> > Praan
>> >
>> > [1] https://elixir.bootlin.com/linux/v7.0-rc3/source/kernel/liveupdate/kexec_handover.c#L1182

  reply	other threads:[~2026-03-17 20:13 UTC|newest]

Thread overview: 95+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-03 22:09 [PATCH 00/14] iommu: Add live update state preservation Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 01/14] iommu: Implement IOMMU LU FLB callbacks Samiullah Khawaja
2026-03-11 21:07   ` Pranjal Shrivastava
2026-03-12 16:43     ` Samiullah Khawaja
2026-03-12 23:43       ` Pranjal Shrivastava
2026-03-13 16:47         ` Samiullah Khawaja
2026-03-13 15:36       ` Pranjal Shrivastava
2026-03-13 16:58         ` Samiullah Khawaja
2026-03-16 22:54   ` Vipin Sharma
2026-03-17  1:06     ` Samiullah Khawaja
2026-03-23 23:27       ` Vipin Sharma
2026-02-03 22:09 ` [PATCH 02/14] iommu: Implement IOMMU core liveupdate skeleton Samiullah Khawaja
2026-03-12 23:10   ` Pranjal Shrivastava
2026-03-13 18:42     ` Samiullah Khawaja
2026-03-17 20:09       ` Pranjal Shrivastava
2026-03-17 20:13         ` Samiullah Khawaja [this message]
2026-03-17 20:23           ` Pranjal Shrivastava
2026-03-17 21:03             ` Vipin Sharma
2026-03-18 18:51               ` Pranjal Shrivastava
2026-03-18 17:49             ` Samiullah Khawaja
2026-03-17 19:58   ` Vipin Sharma
2026-03-17 20:33     ` Samiullah Khawaja
2026-03-24 19:06       ` Vipin Sharma
2026-03-24 19:45         ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 03/14] liveupdate: luo_file: Add internal APIs for file preservation Samiullah Khawaja
2026-03-18 10:00   ` Pranjal Shrivastava
2026-03-18 16:54     ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 04/14] iommu/pages: Add APIs to preserve/unpreserve/restore iommu pages Samiullah Khawaja
2026-03-03 16:42   ` Ankit Soni
2026-03-03 18:41     ` Samiullah Khawaja
2026-03-20 17:27       ` Pranjal Shrivastava
2026-03-20 18:12         ` Samiullah Khawaja
2026-03-17 20:59   ` Vipin Sharma
2026-03-20  9:28     ` Pranjal Shrivastava
2026-03-20 18:27       ` Samiullah Khawaja
2026-03-20 11:01     ` Pranjal Shrivastava
2026-03-20 18:56       ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 05/14] iommupt: Implement preserve/unpreserve/restore callbacks Samiullah Khawaja
2026-03-20 21:57   ` Pranjal Shrivastava
2026-03-23 16:41     ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 06/14] iommu/vt-d: Implement device and iommu preserve/unpreserve ops Samiullah Khawaja
2026-03-19 16:04   ` Vipin Sharma
2026-03-19 16:27     ` Samiullah Khawaja
2026-03-20 23:01   ` Pranjal Shrivastava
2026-03-21 13:27     ` Pranjal Shrivastava
2026-03-23 18:32     ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 07/14] iommu/vt-d: Restore IOMMU state and reclaimed domain ids Samiullah Khawaja
2026-03-19 20:54   ` Vipin Sharma
2026-03-20  1:05     ` Samiullah Khawaja
2026-03-22 19:51   ` Pranjal Shrivastava
2026-03-23 19:33     ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 08/14] iommu: Restore and reattach preserved domains to devices Samiullah Khawaja
2026-03-10  5:16   ` Ankit Soni
2026-03-10 21:47     ` Samiullah Khawaja
2026-03-22 21:59   ` Pranjal Shrivastava
2026-03-23 18:02     ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 09/14] iommu/vt-d: preserve PASID table of preserved device Samiullah Khawaja
2026-03-23 18:19   ` Pranjal Shrivastava
2026-03-23 18:51     ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 10/14] iommufd-lu: Implement ioctl to let userspace mark an HWPT to be preserved Samiullah Khawaja
2026-03-19 23:35   ` Vipin Sharma
2026-03-20  0:40     ` Samiullah Khawaja
2026-03-20 23:34       ` Vipin Sharma
2026-03-23 16:24         ` Samiullah Khawaja
2026-03-25 14:37   ` Pranjal Shrivastava
2026-03-25 17:31     ` Samiullah Khawaja
2026-03-25 18:55       ` Pranjal Shrivastava
2026-03-25 20:19         ` Samiullah Khawaja
2026-03-25 20:36           ` Pranjal Shrivastava
2026-03-25 20:46             ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 11/14] iommufd-lu: Persist iommu hardware pagetables for live update Samiullah Khawaja
2026-02-25 23:47   ` Samiullah Khawaja
2026-03-03  5:56   ` Ankit Soni
2026-03-03 18:51     ` Samiullah Khawaja
2026-03-23 20:28   ` Vipin Sharma
2026-03-23 21:34     ` Samiullah Khawaja
2026-03-25 20:08   ` Pranjal Shrivastava
2026-03-25 20:32     ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 12/14] iommufd: Add APIs to preserve/unpreserve a vfio cdev Samiullah Khawaja
2026-03-23 20:59   ` Vipin Sharma
2026-03-23 21:38     ` Samiullah Khawaja
2026-03-25 20:24   ` Pranjal Shrivastava
2026-03-25 20:41     ` Samiullah Khawaja
2026-03-25 21:23       ` Pranjal Shrivastava
2026-03-26  0:16         ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 13/14] vfio/pci: Preserve the iommufd state of the " Samiullah Khawaja
2026-02-17  4:18   ` Ankit Soni
2026-03-03 18:35     ` Samiullah Khawaja
2026-03-23 21:17   ` Vipin Sharma
2026-03-23 22:07     ` Samiullah Khawaja
2026-03-24 20:30       ` Vipin Sharma
2026-03-25 20:55   ` Pranjal Shrivastava
2026-02-03 22:09 ` [PATCH 14/14] iommufd/selftest: Add test to verify iommufd preservation Samiullah Khawaja
2026-03-23 22:18   ` Vipin Sharma
2026-03-25 21:05   ` Pranjal Shrivastava

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=abm1esL1dzN7NTC7@google.com \
    --to=skhawaja@google.com \
    --cc=ajayachandra@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex@shazbot.org \
    --cc=baolu.lu@linux.intel.com \
    --cc=chrisl@kernel.org \
    --cc=dmatlack@google.com \
    --cc=dwmw2@infradead.org \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@ziepe.ca \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=leonro@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=parav@nvidia.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=praan@google.com \
    --cc=pratyush@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=saeedm@nvidia.com \
    --cc=shuah@kernel.org \
    --cc=vipinsh@google.com \
    --cc=will@kernel.org \
    --cc=witu@nvidia.com \
    --cc=zhuyifei@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox