All of lore.kernel.org
 help / color / mirror / Atom feed
From: Samiullah Khawaja <skhawaja@google.com>
To: Pranjal Shrivastava <praan@google.com>
Cc: David Woodhouse <dwmw2@infradead.org>,
	 Lu Baolu <baolu.lu@linux.intel.com>,
	Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>,
	 Jason Gunthorpe <jgg@ziepe.ca>,
	Robin Murphy <robin.murphy@arm.com>,
	 Kevin Tian <kevin.tian@intel.com>,
	Alex Williamson <alex@shazbot.org>,
	 Shuah Khan <shuah@kernel.org>,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	 kvm@vger.kernel.org, Saeed Mahameed <saeedm@nvidia.com>,
	 Adithya Jayachandran <ajayachandra@nvidia.com>,
	Parav Pandit <parav@nvidia.com>,
	 Leon Romanovsky <leonro@nvidia.com>,
	William Tu <witu@nvidia.com>,
	 Pratyush Yadav <pratyush@kernel.org>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	 David Matlack <dmatlack@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Chris Li <chrisl@kernel.org>, Vipin Sharma <vipinsh@google.com>,
	 YiFei Zhu <zhuyifei@google.com>
Subject: Re: [PATCH 07/14] iommu/vt-d: Restore IOMMU state and reclaimed domain ids
Date: Mon, 23 Mar 2026 19:33:52 +0000	[thread overview]
Message-ID: <acGNBlN1Nwe59L3_@google.com> (raw)
In-Reply-To: <acBIL052TH39FpNK@google.com>

On Sun, Mar 22, 2026 at 07:51:11PM +0000, Pranjal Shrivastava wrote:
>On Tue, Feb 03, 2026 at 10:09:41PM +0000, Samiullah Khawaja wrote:
>> During boot fetch the preserved state of IOMMU unit and if found then
>> restore the state.
>>
>> - Reuse the root_table that was preserved in the previous kernel.
>> - Reclaim the domain ids of the preserved domains for each preserved
>>   devices so these are not acquired by another domain.
>>
>> Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
>> ---
>>  drivers/iommu/intel/iommu.c      | 26 +++++++++++++++------
>>  drivers/iommu/intel/iommu.h      |  7 ++++++
>>  drivers/iommu/intel/liveupdate.c | 40 ++++++++++++++++++++++++++++++++
>>  3 files changed, 66 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>> index c95de93fb72f..8acb7f8a7627 100644
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -222,12 +222,12 @@ static void clear_translation_pre_enabled(struct intel_iommu *iommu)
>>  	iommu->flags &= ~VTD_FLAG_TRANS_PRE_ENABLED;
>>  }
>>
>> -static void init_translation_status(struct intel_iommu *iommu)
>> +static void init_translation_status(struct intel_iommu *iommu, bool restoring)
>>  {
>>  	u32 gsts;
>>
>>  	gsts = readl(iommu->reg + DMAR_GSTS_REG);
>> -	if (gsts & DMA_GSTS_TES)
>> +	if (!restoring && (gsts & DMA_GSTS_TES))
>>  		iommu->flags |= VTD_FLAG_TRANS_PRE_ENABLED;
>>  }
>>
>> @@ -670,10 +670,16 @@ void dmar_fault_dump_ptes(struct intel_iommu *iommu, u16 source_id,
>>  #endif
>>
>>  /* iommu handling */
>> -static int iommu_alloc_root_entry(struct intel_iommu *iommu)
>> +static int iommu_alloc_root_entry(struct intel_iommu *iommu, struct iommu_ser *restored_state)
>>  {
>>  	struct root_entry *root;
>>
>> +	if (restored_state) {
>> +		intel_iommu_liveupdate_restore_root_table(iommu, restored_state);
>> +		__iommu_flush_cache(iommu, iommu->root_entry, ROOT_SIZE);
>> +		return 0;
>> +	}
>
>Instead of putting this inside the allocator, shouldn't init_dmars and
>intel_iommu_add check for iommu_ser and call
>intel_iommu_liveupdate_restore_root_table() directly, bypassing the
>allocation entirely? This looks like it could be a stand-alone function
>which has nothing to do with allocation.

Agreed. Will move the check out into the caller.
>
>> +
>>  	root = iommu_alloc_pages_node_sz(iommu->node, GFP_ATOMIC, SZ_4K);
>>  	if (!root) {
>>  		pr_err("Allocating root entry for %s failed\n",
>> @@ -1614,6 +1620,7 @@ static int copy_translation_tables(struct intel_iommu *iommu)
>>
>>  static int __init init_dmars(void)
>>  {
>> +	struct iommu_ser *iommu_ser = NULL;
>>  	struct dmar_drhd_unit *drhd;
>>  	struct intel_iommu *iommu;
>>  	int ret;
>> @@ -1636,8 +1643,10 @@ static int __init init_dmars(void)
>>  						   intel_pasid_max_id);
>>  		}
>>
>> +		iommu_ser = iommu_get_preserved_data(iommu->reg_phys, IOMMU_INTEL);
>> +
>>  		intel_iommu_init_qi(iommu);
>> -		init_translation_status(iommu);
>> +		init_translation_status(iommu, !!iommu_ser);
>>
>>  		if (translation_pre_enabled(iommu) && !is_kdump_kernel()) {
>>  			iommu_disable_translation(iommu);
>> @@ -1651,7 +1660,7 @@ static int __init init_dmars(void)
>>  		 * we could share the same root & context tables
>>  		 * among all IOMMU's. Need to Split it later.
>>  		 */
>> -		ret = iommu_alloc_root_entry(iommu);
>> +		ret = iommu_alloc_root_entry(iommu, iommu_ser);
>>  		if (ret)
>>  			goto free_iommu;
>>
>> @@ -2110,15 +2119,18 @@ int dmar_parse_one_satc(struct acpi_dmar_header *hdr, void *arg)
>>  static int intel_iommu_add(struct dmar_drhd_unit *dmaru)
>>  {
>>  	struct intel_iommu *iommu = dmaru->iommu;
>> +	struct iommu_ser *iommu_ser = NULL;
>>  	int ret;
>>
>
>Nit: Add: /* Fetch the preserved context using MMIO base as a token */ ?

Will add.
>
>> +	iommu_ser = iommu_get_preserved_data(iommu->reg_phys, IOMMU_INTEL);
>> +
>>  	/*
>>  	 * Disable translation if already enabled prior to OS handover.
>>  	 */
>> -	if (iommu->gcmd & DMA_GCMD_TE)
>> +	if (!iommu_ser && iommu->gcmd & DMA_GCMD_TE)
>>  		iommu_disable_translation(iommu);
>>
>> -	ret = iommu_alloc_root_entry(iommu);
>> +	ret = iommu_alloc_root_entry(iommu, iommu_ser);
>
>I understand that iommu_get_preserved_data() will eventually return NULL
>after the flb_finish op has executed (based on the LUO IOCTLs dropping
>the incoming state), but I'm sensing a potential UAF/double-restore
>issue here that could happen during the boot window.
>
>I believe we could restore the same context multiple times? I see
>intel_iommu_add() is called from both dmar_device_add() and
>dmar_device_remove() paths, and the ACPI probe has the following
>sequence [1]:
>
>static int acpi_pci_root_add(struct acpi_device *device, ...)
>{
>	// ...
>	if (hotadd && dmar_device_add(handle)) {
>		result = -ENXIO;
>		goto end;
>	}
>
>	// ...
>	root->bus = pci_acpi_scan_root(root);
>	if (!root->bus) {
>		// ...
>		result = -ENODEV;
>		goto remove_dmar;
>	}
>	// ...
>
>remove_dmar:
>	if (hotadd)
>		dmar_device_remove(handle);
>end:
>	return result;
>}
>
>If we successfully restored a domain during dmar_device_add(), but the
>ACPI probe fails later (e.g., pci_acpi_scan_root fails), we jump to
>remove_dmar. This tears down the DMAR unit, it unwinds via
>dmar_device_remove() which eventually calls dmar_iommu_hotplug(false)
>where we:
>
>	disable_dmar_iommu(iommu);
>	free_dmar_iommu(iommu);
>
>At this point, the root table folios are freed back to the allocator.
>
>However, if a re-scan is then triggered before the FLB drops the
>incoming state, we would call:
>
>dmar_device_add() -> intel_iommu_add() -> iommu_alloc_root_entry() again
>
>Because the KHO state wasn't marked as deleted/consumed,
>iommu_get_preserved_data() will hand us the exact same iommu_ser pointer?
>
>In which case, we'd call kho_restore_folio(iommu_ser->intel.root_table)
>on a physical page that might have already been reallocated?
>
>Shouldn't the restored state be explicitly marked as consumed
>(obj.deleted = 1), and shouldn't the driver properly unpreserve/clean up
>the KHO tracking during the free_dmar_iommu() teardown path?

Thats a good point.

I think on disable/free the restored state should not be freed abruptly.
The iommu should be able to reuse the same state. We just need to mark
it as consumed/restored. I will rework the disable/free code paths to
handle this.
>
>>  	if (ret)
>>  		goto out;
>>
>> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
>> index 70032e86437d..d7bf63aff17d 100644
>> --- a/drivers/iommu/intel/iommu.h
>> +++ b/drivers/iommu/intel/iommu.h
>> @@ -1283,6 +1283,8 @@ int intel_iommu_preserve_device(struct device *dev, struct device_ser *device_se
>>  void intel_iommu_unpreserve_device(struct device *dev, struct device_ser *device_ser);
>>  int intel_iommu_preserve(struct iommu_device *iommu, struct iommu_ser *iommu_ser);
>>  void intel_iommu_unpreserve(struct iommu_device *iommu, struct iommu_ser *iommu_ser);
>> +void intel_iommu_liveupdate_restore_root_table(struct intel_iommu *iommu,
>> +					       struct iommu_ser *iommu_ser);
>>  #else
>>  static inline int intel_iommu_preserve_device(struct device *dev, struct device_ser *device_ser)
>>  {
>> @@ -1301,6 +1303,11 @@ static inline int intel_iommu_preserve(struct iommu_device *iommu, struct iommu_
>>  static inline void intel_iommu_unpreserve(struct iommu_device *iommu, struct iommu_ser *iommu_ser)
>>  {
>>  }
>> +
>> +static inline void intel_iommu_liveupdate_restore_root_table(struct intel_iommu *iommu,
>> +							     struct iommu_ser *iommu_ser)
>> +{
>> +}
>>  #endif
>>
>>  #ifdef CONFIG_INTEL_IOMMU_SVM
>> diff --git a/drivers/iommu/intel/liveupdate.c b/drivers/iommu/intel/liveupdate.c
>> index 82ba1daf1711..6dcb5783d1db 100644
>> --- a/drivers/iommu/intel/liveupdate.c
>> +++ b/drivers/iommu/intel/liveupdate.c
>> @@ -73,6 +73,46 @@ static int preserve_iommu_context(struct intel_iommu *iommu)
>>  	return ret;
>>  }
>>
>> +static void restore_iommu_context(struct intel_iommu *iommu)
>> +{
>> +	struct context_entry *context;
>> +	int i;
>> +
>> +	for (i = 0; i < ROOT_ENTRY_NR; i++) {
>> +		context = iommu_context_addr(iommu, i, 0, 0);
>> +		if (context)
>> +			BUG_ON(!kho_restore_folio(virt_to_phys(context)));
>> +
>> +		if (!sm_supported(iommu))
>> +			continue;
>> +
>> +		context = iommu_context_addr(iommu, i, 0x80, 0);
>> +		if (context)
>> +			BUG_ON(!kho_restore_folio(virt_to_phys(context)));
>> +	}
>> +}
>> +
>> +static int __restore_used_domain_ids(struct device_ser *ser, void *arg)
>> +{
>> +	int id = ser->domain_iommu_ser.did;
>> +	struct intel_iommu *iommu = arg;
>> +
>
>Shouldn't we check if the did actually belongs to the iommu instance?
>iommu_for_each_preserved_device() iterates over all preserved devices in
>the system. However, here (__restore_used_domain_ids) we allocate the
>device's did in the current iommu->domain_ida without checking if that
>device actually belongs to the current IOMMU?

Yes, this needs to be checked. I will add this.
>
>On multi-IOMMU systems, this will cause every IOMMU's IDA to be
>cross-polluted with the domain IDs of devices attached to other IOMMUs.
>We must verify the device belongs to this specific IOMMU first, maybe:
>
>if (ser->domain_iommu_ser.iommu_phys != iommu->reg_phys)
>		return 0;
>
>> +	ida_alloc_range(&iommu->domain_ida, id, id, GFP_ATOMIC);
>> +	return 0;
>> +}
>> +
>> +void intel_iommu_liveupdate_restore_root_table(struct intel_iommu *iommu,
>> +					       struct iommu_ser *iommu_ser)
>> +{
>> +	BUG_ON(!kho_restore_folio(iommu_ser->intel.root_table));
>> +	iommu->root_entry = __va(iommu_ser->intel.root_table);
>> +
>> +	restore_iommu_context(iommu);
>> +	iommu_for_each_preserved_device(__restore_used_domain_ids, iommu);
>> +	pr_info("Restored IOMMU[0x%llx] Root Table at: 0x%llx\n",
>> +		iommu->reg_phys, iommu_ser->intel.root_table);
>> +}
>> +
>>  int intel_iommu_preserve_device(struct device *dev, struct device_ser *device_ser)
>>  {
>>  	struct device_domain_info *info = dev_iommu_priv_get(dev);
>
>Thanks,
>Praan
>
>[1] https://elixir.bootlin.com/linux/v7.0-rc4/source/drivers/acpi/pci_root.c#L728

Thanks,
Sami

  reply	other threads:[~2026-03-23 19:33 UTC|newest]

Thread overview: 109+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-03 22:09 [PATCH 00/14] iommu: Add live update state preservation Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 01/14] iommu: Implement IOMMU LU FLB callbacks Samiullah Khawaja
2026-03-11 21:07   ` Pranjal Shrivastava
2026-03-12 16:43     ` Samiullah Khawaja
2026-03-12 23:43       ` Pranjal Shrivastava
2026-03-13 16:47         ` Samiullah Khawaja
2026-03-13 15:36       ` Pranjal Shrivastava
2026-03-13 16:58         ` Samiullah Khawaja
2026-04-10 13:51     ` Jason Gunthorpe
2026-04-13  6:41       ` Tian, Kevin
2026-03-16 22:54   ` Vipin Sharma
2026-03-17  1:06     ` Samiullah Khawaja
2026-03-23 23:27       ` Vipin Sharma
2026-02-03 22:09 ` [PATCH 02/14] iommu: Implement IOMMU core liveupdate skeleton Samiullah Khawaja
2026-03-12 23:10   ` Pranjal Shrivastava
2026-03-13 18:42     ` Samiullah Khawaja
2026-03-17 20:09       ` Pranjal Shrivastava
2026-03-17 20:13         ` Samiullah Khawaja
2026-03-17 20:23           ` Pranjal Shrivastava
2026-03-17 21:03             ` Vipin Sharma
2026-03-18 18:51               ` Pranjal Shrivastava
2026-03-18 17:49             ` Samiullah Khawaja
2026-03-17 19:58   ` Vipin Sharma
2026-03-17 20:33     ` Samiullah Khawaja
2026-03-24 19:06       ` Vipin Sharma
2026-03-24 19:45         ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 03/14] liveupdate: luo_file: Add internal APIs for file preservation Samiullah Khawaja
2026-03-18 10:00   ` Pranjal Shrivastava
2026-03-18 16:54     ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 04/14] iommu/pages: Add APIs to preserve/unpreserve/restore iommu pages Samiullah Khawaja
2026-03-03 16:42   ` Ankit Soni
2026-03-03 18:41     ` Samiullah Khawaja
2026-03-20 17:27       ` Pranjal Shrivastava
2026-03-20 18:12         ` Samiullah Khawaja
2026-04-10 14:13           ` Jason Gunthorpe
2026-04-10 22:13             ` Samiullah Khawaja
2026-03-17 20:59   ` Vipin Sharma
2026-03-20  9:28     ` Pranjal Shrivastava
2026-03-20 18:27       ` Samiullah Khawaja
2026-03-20 11:01     ` Pranjal Shrivastava
2026-03-20 18:56       ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 05/14] iommupt: Implement preserve/unpreserve/restore callbacks Samiullah Khawaja
2026-03-20 21:57   ` Pranjal Shrivastava
2026-03-23 16:41     ` Samiullah Khawaja
2026-04-10 14:16     ` Jason Gunthorpe
2026-04-10 23:02       ` Samiullah Khawaja
2026-04-10 23:16         ` Jason Gunthorpe
2026-04-13 19:31           ` Samiullah Khawaja
2026-04-13 22:33             ` Jason Gunthorpe
2026-04-13 23:28               ` Samiullah Khawaja
2026-04-13 23:40                 ` Jason Gunthorpe
2026-02-03 22:09 ` [PATCH 06/14] iommu/vt-d: Implement device and iommu preserve/unpreserve ops Samiullah Khawaja
2026-03-19 16:04   ` Vipin Sharma
2026-03-19 16:27     ` Samiullah Khawaja
2026-03-20 23:01   ` Pranjal Shrivastava
2026-03-21 13:27     ` Pranjal Shrivastava
2026-03-23 18:32     ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 07/14] iommu/vt-d: Restore IOMMU state and reclaimed domain ids Samiullah Khawaja
2026-03-19 20:54   ` Vipin Sharma
2026-03-20  1:05     ` Samiullah Khawaja
2026-03-22 19:51   ` Pranjal Shrivastava
2026-03-23 19:33     ` Samiullah Khawaja [this message]
2026-02-03 22:09 ` [PATCH 08/14] iommu: Restore and reattach preserved domains to devices Samiullah Khawaja
2026-03-10  5:16   ` Ankit Soni
2026-03-10 21:47     ` Samiullah Khawaja
2026-03-22 21:59   ` Pranjal Shrivastava
2026-03-23 18:02     ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 09/14] iommu/vt-d: preserve PASID table of preserved device Samiullah Khawaja
2026-03-23 18:19   ` Pranjal Shrivastava
2026-03-23 18:51     ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 10/14] iommufd-lu: Implement ioctl to let userspace mark an HWPT to be preserved Samiullah Khawaja
2026-03-19 23:35   ` Vipin Sharma
2026-03-20  0:40     ` Samiullah Khawaja
2026-03-20 23:34       ` Vipin Sharma
2026-03-23 16:24         ` Samiullah Khawaja
2026-03-25 14:37   ` Pranjal Shrivastava
2026-03-25 17:31     ` Samiullah Khawaja
2026-03-25 18:55       ` Pranjal Shrivastava
2026-03-25 20:19         ` Samiullah Khawaja
2026-03-25 20:36           ` Pranjal Shrivastava
2026-03-25 20:46             ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 11/14] iommufd-lu: Persist iommu hardware pagetables for live update Samiullah Khawaja
2026-02-25 23:47   ` Samiullah Khawaja
2026-03-03  5:56   ` Ankit Soni
2026-03-03 18:51     ` Samiullah Khawaja
2026-03-23 20:28   ` Vipin Sharma
2026-03-23 21:34     ` Samiullah Khawaja
2026-03-25 20:08   ` Pranjal Shrivastava
2026-03-25 20:32     ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 12/14] iommufd: Add APIs to preserve/unpreserve a vfio cdev Samiullah Khawaja
2026-03-23 20:59   ` Vipin Sharma
2026-03-23 21:38     ` Samiullah Khawaja
2026-03-25 20:24   ` Pranjal Shrivastava
2026-03-25 20:41     ` Samiullah Khawaja
2026-03-25 21:23       ` Pranjal Shrivastava
2026-03-26  0:16         ` Samiullah Khawaja
2026-02-03 22:09 ` [PATCH 13/14] vfio/pci: Preserve the iommufd state of the " Samiullah Khawaja
2026-02-17  4:18   ` Ankit Soni
2026-03-03 18:35     ` Samiullah Khawaja
2026-03-23 21:17   ` Vipin Sharma
2026-03-23 22:07     ` Samiullah Khawaja
2026-03-24 20:30       ` Vipin Sharma
2026-03-25 20:55   ` Pranjal Shrivastava
2026-02-03 22:09 ` [PATCH 14/14] iommufd/selftest: Add test to verify iommufd preservation Samiullah Khawaja
2026-03-23 22:18   ` Vipin Sharma
2026-03-27 18:32     ` Samiullah Khawaja
2026-03-25 21:05   ` Pranjal Shrivastava
2026-03-27 18:25     ` Samiullah Khawaja
2026-03-27 18:40       ` Samiullah Khawaja

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=acGNBlN1Nwe59L3_@google.com \
    --to=skhawaja@google.com \
    --cc=ajayachandra@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex@shazbot.org \
    --cc=baolu.lu@linux.intel.com \
    --cc=chrisl@kernel.org \
    --cc=dmatlack@google.com \
    --cc=dwmw2@infradead.org \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@ziepe.ca \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=leonro@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=parav@nvidia.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=praan@google.com \
    --cc=pratyush@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=saeedm@nvidia.com \
    --cc=shuah@kernel.org \
    --cc=vipinsh@google.com \
    --cc=will@kernel.org \
    --cc=witu@nvidia.com \
    --cc=zhuyifei@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.