From: Samiullah Khawaja <skhawaja@google.com>
To: Vipin Sharma <vipinsh@google.com>
Cc: David Woodhouse <dwmw2@infradead.org>,
Lu Baolu <baolu.lu@linux.intel.com>,
Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>,
Jason Gunthorpe <jgg@ziepe.ca>,
Robin Murphy <robin.murphy@arm.com>,
Kevin Tian <kevin.tian@intel.com>,
Alex Williamson <alex@shazbot.org>,
Shuah Khan <shuah@kernel.org>,
iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
kvm@vger.kernel.org, Saeed Mahameed <saeedm@nvidia.com>,
Adithya Jayachandran <ajayachandra@nvidia.com>,
Parav Pandit <parav@nvidia.com>,
Leon Romanovsky <leonro@nvidia.com>,
William Tu <witu@nvidia.com>,
Pratyush Yadav <pratyush@kernel.org>,
Pasha Tatashin <pasha.tatashin@soleen.com>,
David Matlack <dmatlack@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Chris Li <chrisl@kernel.org>,
Pranjal Shrivastava <praan@google.com>,
YiFei Zhu <zhuyifei@google.com>
Subject: Re: [PATCH 01/14] iommu: Implement IOMMU LU FLB callbacks
Date: Tue, 17 Mar 2026 01:06:34 +0000 [thread overview]
Message-ID: <abidn8EGmi88wpCr@google.com> (raw)
In-Reply-To: <20260316165018.GA1768676.vipinsh@google.com>
On Mon, Mar 16, 2026 at 03:54:50PM -0700, Vipin Sharma wrote:
>On Tue, Feb 03, 2026 at 10:09:35PM +0000, Samiullah Khawaja wrote:
>> +config IOMMU_LIVEUPDATE
>> + bool "IOMMU live update state preservation support"
>> + depends on LIVEUPDATE && IOMMUFD
>> + help
>> + Enable support for preserving IOMMU state across a kexec live update.
>> +
>> + This allows devices managed by iommufd to maintain their DMA mappings
>> + during kexec base kernel update.
>> +
>> + If unsure, say N.
>> +
>
>Do we need a separate config? Can't we just use CONFIG_LIVEUPDATE?
We have a separate CONFIG here so that the phase 1/2 split for iommu
preservation doesn't break the vfio preservation. See following
discussion in the RFCv2:
https://lore.kernel.org/all/aYEpHBYxlQxhXrwl@google.com/
>
>> menuconfig IOMMU_SUPPORT
>> bool "IOMMU Hardware Support"
>> depends on MMU
>> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
>> index 0275821f4ef9..b3715c5a6b97 100644
>> --- a/drivers/iommu/Makefile
>> +++ b/drivers/iommu/Makefile
>> @@ -15,6 +15,7 @@ obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
>> obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
>> obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE_KUNIT_TEST) += io-pgtable-arm-selftests.o
>> obj-$(CONFIG_IOMMU_IO_PGTABLE_DART) += io-pgtable-dart.o
>> +obj-$(CONFIG_IOMMU_LIVEUPDATE) += liveupdate.o
>
>It seems like there is a sorted order for CONFIG_IOMMU_* in the
>Makefile, lets keep it same if possible.
Will fix in the next revision.
>
>> +static void iommu_liveupdate_free_objs(u64 next, bool incoming)
>> +{
>> + struct iommu_objs_ser *objs;
>> +
>> + while (next) {
>> + objs = __va(next);
>
>There is also call to phys_to_virt() in other functions in this patch.
>Should we use the same here to be consistent?
Agreed. I will fix this.
>
>> + next = objs->next_objs;
>> +
>> + if (!incoming)
>> + kho_unpreserve_free(objs);
>> + else
>> + folio_put(virt_to_folio(objs));
>> + }
>> +}
>
>Instead of passing boolean, and calling with different arguments, I
>think it will be simpler to just have two functions
>
>- iommu_liveupdate_unpreserve()
>- iommu_liveupdate_folio_put()
This is a helper function to free the serialized state without
duplicating multiple checks for various type of state (iommu,
iommu_domain and devices).
Do you think maybe I should add these two functions and make it call the
helper?
>
>> +
>> +static void iommu_liveupdate_flb_free(struct iommu_lu_flb_obj *obj)
>> +{
>> + if (obj->iommu_domains)
>> + iommu_liveupdate_free_objs(obj->ser->iommu_domains_phys, false);
>> +
>> + if (obj->devices)
>> + iommu_liveupdate_free_objs(obj->ser->devices_phys, false);
>> +
>> + if (obj->iommus)
>> + iommu_liveupdate_free_objs(obj->ser->iommus_phys, false);
>> +
>> + kho_unpreserve_free(obj->ser);
>> + kfree(obj);
>> +}
>> +
>> +static int iommu_liveupdate_flb_preserve(struct liveupdate_flb_op_args *argp)
>> +{
>> + struct iommu_lu_flb_obj *obj;
>> + struct iommu_lu_flb_ser *ser;
>> + void *mem;
>> +
>> + obj = kzalloc(sizeof(*obj), GFP_KERNEL);
>> + if (!obj)
>> + return -ENOMEM;
>> +
>> + mutex_init(&obj->lock);
>> + mem = kho_alloc_preserve(sizeof(*ser));
>> + if (IS_ERR(mem))
>> + goto err_free;
>> +
>> + ser = mem;
>> + obj->ser = ser;
>> +
>> + mem = kho_alloc_preserve(PAGE_SIZE);
>> + if (IS_ERR(mem))
>> + goto err_free;
>> +
>> + obj->iommu_domains = mem;
>> + ser->iommu_domains_phys = virt_to_phys(obj->iommu_domains);
>> +
>> + mem = kho_alloc_preserve(PAGE_SIZE);
>> + if (IS_ERR(mem))
>> + goto err_free;
>> +
>> + obj->devices = mem;
>> + ser->devices_phys = virt_to_phys(obj->devices);
>> +
>> + mem = kho_alloc_preserve(PAGE_SIZE);
>> + if (IS_ERR(mem))
>> + goto err_free;
>> +
>> + obj->iommus = mem;
>> + ser->iommus_phys = virt_to_phys(obj->iommus);
>> +
>> + argp->obj = obj;
>> + argp->data = virt_to_phys(ser);
>> + return 0;
>> +
>> +err_free:
>> + iommu_liveupdate_flb_free(obj);
>
>Generally, I have seen in the function goto will call corresponding
>error tags, and free corresponding allocations and all the one which
>happend before. It is easier to read code that way. I know you are
>combining the free call from iommu_liveupdate_flb_unpreserve() also.
>IMHO, code readability will be better this way.
I had that originally when I was writing this function, but it gets
really cluttered :(. Instead it is more clean without code duplication
using this one cleanup function here to free the state on error and also
when doing unpreserve. Please consider this a "destroy" function of obj
and it can be called from 2 places,
- Error during allocation of internal state.
- During unpreserve.
>
>> + return PTR_ERR(mem);
>> +}
>> +
>> +static void iommu_liveupdate_flb_unpreserve(struct liveupdate_flb_op_args *argp)
>> +{
>> + iommu_liveupdate_flb_free(argp->obj);
>> +}
>> +
>> +static void iommu_liveupdate_flb_finish(struct liveupdate_flb_op_args *argp)
>> +{
>> + struct iommu_lu_flb_obj *obj = argp->obj;
>> +
>> + if (obj->iommu_domains)
>> + iommu_liveupdate_free_objs(obj->ser->iommu_domains_phys, true);
>
>Can there be the case where obj->iommu_domains is NULL but
>obj->ser->iommu_domains_phys is not? If that is not possible, I will
>just simplify the patch and unconditionally call
>iommu_liveupdate_free_objs()?
Are you suggesting that on flb_finish() the obj->iommu_domains should be
non-NULL as flb_retrieve() succeeded? If yes, then that is correct. I
will update this to call the free_objs() without checking
obj->iommu_domains. I will do same for other types.
>
>> +
>> +static int iommu_liveupdate_flb_retrieve(struct liveupdate_flb_op_args *argp)
>> +{
>> + struct iommu_lu_flb_obj *obj;
>> + struct iommu_lu_flb_ser *ser;
>> +
>> + obj = kzalloc(sizeof(*obj), GFP_ATOMIC);
>> + if (!obj)
>> + return -ENOMEM;
>
>Is kzalloc() failure here recoverable whereas iommu_liveupdate_restore_objs()
>below is not? If it is not recoverable should there be a BUG_ON here?
Interesting... This should be recoverable as there is no corruption or
bad state. LUO will propagate this to caller and it should be handle
properly. I will make sure that this is handled in init.
>
>> +
>> + mutex_init(&obj->lock);
>> + BUG_ON(!kho_restore_folio(argp->data));
>> + ser = phys_to_virt(argp->data);
>> + obj->ser = ser;
>> +
>> + iommu_liveupdate_restore_objs(ser->iommu_domains_phys);
>> + obj->iommu_domains = phys_to_virt(ser->iommu_domains_phys);
>
>Can iommu_liveupdate_restore_obj() just return virtual address and we
>can simplify code to:
>
> obj->iommu_domains = iommu_liveupdate_restore_objs(ser->iommu_domains_phys);
Yes that is a good idea. I will change this.
>
>> +
>> + iommu_liveupdate_restore_objs(ser->devices_phys);
>> + obj->devices = phys_to_virt(ser->devices_phys);
>> +
>> + iommu_liveupdate_restore_objs(ser->iommus_phys);
>> + obj->iommus = phys_to_virt(ser->iommus_phys);
>> +
>> + argp->obj = obj;
>> +
>> + return 0;
>> +}
>> +
>> diff --git a/include/linux/iommu-lu.h b/include/linux/iommu-lu.h
>
>I will recommend to use full name and not short "lu". iommu-liveupdate.h
>seems more readable and not too long.
Agreed. I will change this.
>
>> +#define MAX_IOMMU_SERS ((PAGE_SIZE - sizeof(struct iommus_ser)) / sizeof(struct iommu_ser))
>> +#define MAX_IOMMU_DOMAIN_SERS \
>> + ((PAGE_SIZE - sizeof(struct iommu_domains_ser)) / sizeof(struct iommu_domain_ser))
>> +#define MAX_DEVICE_SERS ((PAGE_SIZE - sizeof(struct devices_ser)) / sizeof(struct device_ser))
>
>This is per page limit, not whole serialization limit. May be we can
>name something like:
>
>- MAX_IOMMU_SERS_PER_PAGE, or
>- MAX_IOMMU_SERS_PAGE_CAPACITY
Agreed.
>
>> +
>> +struct iommu_lu_flb_obj {
>> + struct mutex lock;
>> + struct iommu_lu_flb_ser *ser;
>> +
>> + struct iommu_domains_ser *iommu_domains;
>> + struct iommus_ser *iommus;
>> + struct devices_ser *devices;
>> +} __packed;
>> +
>
>I think naming scheme used here is little hard to absorb when we have so
>many individual structs in this header file. Specifically, struct names like:
>
>- iommu_domains_ser vs iommu_domain_ser
>- iommus_ser vs iommu_ser
>- devices_ser vs device_ser
>- iommu_objs_ser vs iommu_obj_ser
>
>First three are showing container and its elements relation, however,
>last one doesn't have that relation but naming is same there.
>
>I will recommend to change the naming scheme of containers to something like:
>
> struct iommu_domain_ser_[hdr|header|table|arr] {};
> struct iommu_ser_hdr {}
> struct device_ser_hdr {}
>
>Individual element of container can be same.
>
>For objs, something like:
> iommu_objs_ser -> iommu_hdr_meta
>
>
Agreed. The singular vs plural for object vs aggregate is tricky. I will
rework these names. I am thinking something like following based on the
feedback on this patch,
struct iommu_ser_hdr; <= object hdr.
struct iommu_ser_arr_hdr <= array of objects hdr.
struct iommu_domain_ser <= contains a preserved domain.
struct iommu_domain_ser_arr <= array of domains.
Thanks,
Sami
next prev parent reply other threads:[~2026-03-17 1:06 UTC|newest]
Thread overview: 109+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-03 22:09 [PATCH 00/14] iommu: Add live update state preservation Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 01/14] iommu: Implement IOMMU LU FLB callbacks Samiullah Khawaja
2026-03-11 21:07 ` Pranjal Shrivastava
2026-03-12 16:43 ` Samiullah Khawaja
2026-03-12 23:43 ` Pranjal Shrivastava
2026-03-13 16:47 ` Samiullah Khawaja
2026-03-13 15:36 ` Pranjal Shrivastava
2026-03-13 16:58 ` Samiullah Khawaja
2026-04-10 13:51 ` Jason Gunthorpe
2026-04-13 6:41 ` Tian, Kevin
2026-03-16 22:54 ` Vipin Sharma
2026-03-17 1:06 ` Samiullah Khawaja [this message]
2026-03-23 23:27 ` Vipin Sharma
2026-02-03 22:09 ` [PATCH 02/14] iommu: Implement IOMMU core liveupdate skeleton Samiullah Khawaja
2026-03-12 23:10 ` Pranjal Shrivastava
2026-03-13 18:42 ` Samiullah Khawaja
2026-03-17 20:09 ` Pranjal Shrivastava
2026-03-17 20:13 ` Samiullah Khawaja
2026-03-17 20:23 ` Pranjal Shrivastava
2026-03-17 21:03 ` Vipin Sharma
2026-03-18 18:51 ` Pranjal Shrivastava
2026-03-18 17:49 ` Samiullah Khawaja
2026-03-17 19:58 ` Vipin Sharma
2026-03-17 20:33 ` Samiullah Khawaja
2026-03-24 19:06 ` Vipin Sharma
2026-03-24 19:45 ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 03/14] liveupdate: luo_file: Add internal APIs for file preservation Samiullah Khawaja
2026-03-18 10:00 ` Pranjal Shrivastava
2026-03-18 16:54 ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 04/14] iommu/pages: Add APIs to preserve/unpreserve/restore iommu pages Samiullah Khawaja
2026-03-03 16:42 ` Ankit Soni
2026-03-03 18:41 ` Samiullah Khawaja
2026-03-20 17:27 ` Pranjal Shrivastava
2026-03-20 18:12 ` Samiullah Khawaja
2026-04-10 14:13 ` Jason Gunthorpe
2026-04-10 22:13 ` Samiullah Khawaja
2026-03-17 20:59 ` Vipin Sharma
2026-03-20 9:28 ` Pranjal Shrivastava
2026-03-20 18:27 ` Samiullah Khawaja
2026-03-20 11:01 ` Pranjal Shrivastava
2026-03-20 18:56 ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 05/14] iommupt: Implement preserve/unpreserve/restore callbacks Samiullah Khawaja
2026-03-20 21:57 ` Pranjal Shrivastava
2026-03-23 16:41 ` Samiullah Khawaja
2026-04-10 14:16 ` Jason Gunthorpe
2026-04-10 23:02 ` Samiullah Khawaja
2026-04-10 23:16 ` Jason Gunthorpe
2026-04-13 19:31 ` Samiullah Khawaja
2026-04-13 22:33 ` Jason Gunthorpe
2026-04-13 23:28 ` Samiullah Khawaja
2026-04-13 23:40 ` Jason Gunthorpe
2026-02-03 22:09 ` [PATCH 06/14] iommu/vt-d: Implement device and iommu preserve/unpreserve ops Samiullah Khawaja
2026-03-19 16:04 ` Vipin Sharma
2026-03-19 16:27 ` Samiullah Khawaja
2026-03-20 23:01 ` Pranjal Shrivastava
2026-03-21 13:27 ` Pranjal Shrivastava
2026-03-23 18:32 ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 07/14] iommu/vt-d: Restore IOMMU state and reclaimed domain ids Samiullah Khawaja
2026-03-19 20:54 ` Vipin Sharma
2026-03-20 1:05 ` Samiullah Khawaja
2026-03-22 19:51 ` Pranjal Shrivastava
2026-03-23 19:33 ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 08/14] iommu: Restore and reattach preserved domains to devices Samiullah Khawaja
2026-03-10 5:16 ` Ankit Soni
2026-03-10 21:47 ` Samiullah Khawaja
2026-03-22 21:59 ` Pranjal Shrivastava
2026-03-23 18:02 ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 09/14] iommu/vt-d: preserve PASID table of preserved device Samiullah Khawaja
2026-03-23 18:19 ` Pranjal Shrivastava
2026-03-23 18:51 ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 10/14] iommufd-lu: Implement ioctl to let userspace mark an HWPT to be preserved Samiullah Khawaja
2026-03-19 23:35 ` Vipin Sharma
2026-03-20 0:40 ` Samiullah Khawaja
2026-03-20 23:34 ` Vipin Sharma
2026-03-23 16:24 ` Samiullah Khawaja
2026-03-25 14:37 ` Pranjal Shrivastava
2026-03-25 17:31 ` Samiullah Khawaja
2026-03-25 18:55 ` Pranjal Shrivastava
2026-03-25 20:19 ` Samiullah Khawaja
2026-03-25 20:36 ` Pranjal Shrivastava
2026-03-25 20:46 ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 11/14] iommufd-lu: Persist iommu hardware pagetables for live update Samiullah Khawaja
2026-02-25 23:47 ` Samiullah Khawaja
2026-03-03 5:56 ` Ankit Soni
2026-03-03 18:51 ` Samiullah Khawaja
2026-03-23 20:28 ` Vipin Sharma
2026-03-23 21:34 ` Samiullah Khawaja
2026-03-25 20:08 ` Pranjal Shrivastava
2026-03-25 20:32 ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 12/14] iommufd: Add APIs to preserve/unpreserve a vfio cdev Samiullah Khawaja
2026-03-23 20:59 ` Vipin Sharma
2026-03-23 21:38 ` Samiullah Khawaja
2026-03-25 20:24 ` Pranjal Shrivastava
2026-03-25 20:41 ` Samiullah Khawaja
2026-03-25 21:23 ` Pranjal Shrivastava
2026-03-26 0:16 ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 13/14] vfio/pci: Preserve the iommufd state of the " Samiullah Khawaja
2026-02-17 4:18 ` Ankit Soni
2026-03-03 18:35 ` Samiullah Khawaja
2026-03-23 21:17 ` Vipin Sharma
2026-03-23 22:07 ` Samiullah Khawaja
2026-03-24 20:30 ` Vipin Sharma
2026-03-25 20:55 ` Pranjal Shrivastava
2026-02-03 22:09 ` [PATCH 14/14] iommufd/selftest: Add test to verify iommufd preservation Samiullah Khawaja
2026-03-23 22:18 ` Vipin Sharma
2026-03-27 18:32 ` Samiullah Khawaja
2026-03-25 21:05 ` Pranjal Shrivastava
2026-03-27 18:25 ` Samiullah Khawaja
2026-03-27 18:40 ` Samiullah Khawaja
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=abidn8EGmi88wpCr@google.com \
--to=skhawaja@google.com \
--cc=ajayachandra@nvidia.com \
--cc=akpm@linux-foundation.org \
--cc=alex@shazbot.org \
--cc=baolu.lu@linux.intel.com \
--cc=chrisl@kernel.org \
--cc=dmatlack@google.com \
--cc=dwmw2@infradead.org \
--cc=iommu@lists.linux.dev \
--cc=jgg@ziepe.ca \
--cc=joro@8bytes.org \
--cc=kevin.tian@intel.com \
--cc=kvm@vger.kernel.org \
--cc=leonro@nvidia.com \
--cc=linux-kernel@vger.kernel.org \
--cc=parav@nvidia.com \
--cc=pasha.tatashin@soleen.com \
--cc=praan@google.com \
--cc=pratyush@kernel.org \
--cc=robin.murphy@arm.com \
--cc=saeedm@nvidia.com \
--cc=shuah@kernel.org \
--cc=vipinsh@google.com \
--cc=will@kernel.org \
--cc=witu@nvidia.com \
--cc=zhuyifei@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.