Date: Fri, 20 Mar 2026 23:01:59 +0000
From: Pranjal Shrivastava
To: Samiullah Khawaja
Cc: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
	Jason Gunthorpe, Robin Murphy, Kevin Tian, Alex Williamson,
	Shuah Khan, iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, Saeed Mahameed, Adithya Jayachandran,
	Parav Pandit, Leon Romanovsky, William Tu, Pratyush Yadav,
	Pasha Tatashin, David Matlack, Andrew Morton, Chris Li,
	Vipin Sharma, YiFei Zhu
Subject: Re: [PATCH 06/14] iommu/vt-d: Implement device and iommu preserve/unpreserve ops
References: <20260203220948.2176157-1-skhawaja@google.com>
	<20260203220948.2176157-7-skhawaja@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20260203220948.2176157-7-skhawaja@google.com>

On Tue, Feb 03, 2026 at 10:09:40PM +0000, Samiullah Khawaja wrote:
> Add implementation of the device and iommu preservation in a separate
> file. Also set the device and iommu preserve/unpreserve ops in the
> struct iommu_ops.
> 
> During normal shutdown the iommu translation is disabled. Since the root
> table is preserved during live update, it needs to be cleaned up and the
> context entries of the unpreserved devices need to be cleared.
> 
> Signed-off-by: Samiullah Khawaja
> ---
>  drivers/iommu/intel/Makefile     |   1 +
>  drivers/iommu/intel/iommu.c      |  47 ++++++++++-
>  drivers/iommu/intel/iommu.h      |  27 +++++++
>  drivers/iommu/intel/liveupdate.c | 134 +++++++++++++++++++++++++++++++
>  4 files changed, 205 insertions(+), 4 deletions(-)
>  create mode 100644 drivers/iommu/intel/liveupdate.c
> 
> diff --git a/drivers/iommu/intel/Makefile b/drivers/iommu/intel/Makefile
> index ada651c4a01b..d38fc101bc35 100644
> --- a/drivers/iommu/intel/Makefile
> +++ b/drivers/iommu/intel/Makefile
> @@ -6,3 +6,4 @@ obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += debugfs.o
>  obj-$(CONFIG_INTEL_IOMMU_SVM) += svm.o
>  obj-$(CONFIG_IRQ_REMAP) += irq_remapping.o
>  obj-$(CONFIG_INTEL_IOMMU_PERF_EVENTS) += perfmon.o
> +obj-$(CONFIG_IOMMU_LIVEUPDATE) += liveupdate.o
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 134302fbcd92..c95de93fb72f 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -16,6 +16,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -52,6 +53,8 @@ static int rwbf_quirk;
> 
>  #define rwbf_required(iommu)	(rwbf_quirk || cap_rwbf((iommu)->cap))
> 
> +static bool __maybe_clean_unpreserved_context_entries(struct intel_iommu *iommu);
> +
>  /*
>   * set to 1 to panic kernel if can't successfully enable VT-d
>   * (used when kernel is launched w/ TXT)
> @@ -60,8 +63,6 @@ static int force_on = 0;
>  static int intel_iommu_tboot_noforce;
>  static int no_platform_optin;
> 
> -#define ROOT_ENTRY_NR (VTD_PAGE_SIZE/sizeof(struct root_entry))
> -
>  /*
>   * Take a root_entry and return the Lower Context Table Pointer (LCTP)
>   * if marked present.
> @@ -2378,8 +2379,10 @@ void intel_iommu_shutdown(void)
>  		/* Disable PMRs explicitly here. */
>  		iommu_disable_protect_mem_regions(iommu);
> 
> -		/* Make sure the IOMMUs are switched off */
> -		iommu_disable_translation(iommu);
> +		if (!__maybe_clean_unpreserved_context_entries(iommu)) {
> +			/* Make sure the IOMMUs are switched off */
> +			iommu_disable_translation(iommu);
> +		}
>  	}
>  }
> 
> @@ -2902,6 +2905,38 @@ static const struct iommu_dirty_ops intel_second_stage_dirty_ops = {
>  	.set_dirty_tracking = intel_iommu_set_dirty_tracking,
>  };
> 
> +#ifdef CONFIG_IOMMU_LIVEUPDATE
> +static bool __maybe_clean_unpreserved_context_entries(struct intel_iommu *iommu)
> +{
> +	struct device_domain_info *info;
> +	struct pci_dev *pdev = NULL;
> +
> +	if (!iommu->iommu.outgoing_preserved_state)
> +		return false;
> +
> +	for_each_pci_dev(pdev) {

Shouldn't we *also* clean context entries for non-PCI devices? AFAICT
the current code simply turns off translation during shutdown, but with
this change the cleanup loop only runs for_each_pci_dev(), so it
completely ignores any non-PCI devices (like platform devices) that
might be attached to this IOMMU.

Since the IOMMU is left powered on during the handover, skipping non-PCI
devices in this cleanup loop leaves their context entries intact and
active in the hardware root table. If the new kernel doesn't
reclaim/overwrite those entries, zombie DMA could survive the KHO
handover.
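For the record, a completely untested sketch of what a bus-agnostic
second pass could look like, reusing this patch's
dev_iommu_preserved_state() and domain_context_clear() helpers.
bus_for_each_dev() and platform_bus_type are the stock driver-model
APIs; whether the platform bus is the right (or only) extra bus to walk
here is an open question:

```c
/*
 * Sketch only: per-device callback usable with bus_for_each_dev(),
 * mirroring the checks done in the PCI loop above.
 */
static int __clean_unpreserved_dev(struct device *dev, void *data)
{
	struct intel_iommu *iommu = data;
	struct device_domain_info *info = dev_iommu_priv_get(dev);

	/* Skip devices not owned by this IOMMU. */
	if (!info || info->iommu != iommu)
		return 0;

	/* Preserved devices keep their context entries across the handover. */
	if (dev_iommu_preserved_state(dev))
		return 0;

	domain_context_clear(info);
	return 0;
}

/* ... then, next to the for_each_pci_dev() walk: */
bus_for_each_dev(&platform_bus_type, NULL, iommu, __clean_unpreserved_dev);
```

(The PCI walk could arguably be folded into the same callback via
bus_for_each_dev(&pci_bus_type, ...), too.)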
> +		info = dev_iommu_priv_get(&pdev->dev);
> +		if (!info)
> +			continue;
> +
> +		if (info->iommu != iommu)
> +			continue;
> +
> +		if (dev_iommu_preserved_state(&pdev->dev))
> +			continue;
> +
> +		domain_context_clear(info);
> +	}
> +
> +	return true;
> +}
> +#else
> +static bool __maybe_clean_unpreserved_context_entries(struct intel_iommu *iommu)
> +{
> +	return false;
> +}
> +#endif
> +
>  static struct iommu_domain *
>  intel_iommu_domain_alloc_second_stage(struct device *dev,
>  				      struct intel_iommu *iommu, u32 flags)
> @@ -3925,6 +3960,10 @@ const struct iommu_ops intel_iommu_ops = {
>  	.is_attach_deferred = intel_iommu_is_attach_deferred,
>  	.def_domain_type = device_def_domain_type,
>  	.page_response = intel_iommu_page_response,
> +	.preserve_device = intel_iommu_preserve_device,
> +	.unpreserve_device = intel_iommu_unpreserve_device,
> +	.preserve = intel_iommu_preserve,
> +	.unpreserve = intel_iommu_unpreserve,
>  };
> 
>  static void quirk_iommu_igfx(struct pci_dev *dev)
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index 25c5e22096d4..70032e86437d 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -557,6 +557,8 @@ struct root_entry {
>  	u64 hi;
>  };
> 
> +#define ROOT_ENTRY_NR (VTD_PAGE_SIZE / sizeof(struct root_entry))
> +
>  /*
>   * low 64 bits:
>   * 0: present
> @@ -1276,6 +1278,31 @@ static inline int iopf_for_domain_replace(struct iommu_domain *new,
>  	return 0;
>  }
> 
> +#ifdef CONFIG_IOMMU_LIVEUPDATE
> +int intel_iommu_preserve_device(struct device *dev, struct device_ser *device_ser);
> +void intel_iommu_unpreserve_device(struct device *dev, struct device_ser *device_ser);
> +int intel_iommu_preserve(struct iommu_device *iommu, struct iommu_ser *iommu_ser);
> +void intel_iommu_unpreserve(struct iommu_device *iommu, struct iommu_ser *iommu_ser);
> +#else
> +static inline int intel_iommu_preserve_device(struct device *dev, struct device_ser *device_ser)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
> +static inline
> +void intel_iommu_unpreserve_device(struct device *dev, struct device_ser *device_ser)
> +{
> +}
> +
> +static inline int intel_iommu_preserve(struct iommu_device *iommu, struct iommu_ser *iommu_ser)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
> +static inline void intel_iommu_unpreserve(struct iommu_device *iommu, struct iommu_ser *iommu_ser)
> +{
> +}
> +#endif
> +
>  #ifdef CONFIG_INTEL_IOMMU_SVM
>  void intel_svm_check(struct intel_iommu *iommu);
>  struct iommu_domain *intel_svm_domain_alloc(struct device *dev,
> diff --git a/drivers/iommu/intel/liveupdate.c b/drivers/iommu/intel/liveupdate.c
> new file mode 100644
> index 000000000000..82ba1daf1711
> --- /dev/null
> +++ b/drivers/iommu/intel/liveupdate.c
> @@ -0,0 +1,134 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +/*
> + * Copyright (C) 2025, Google LLC
> + * Author: Samiullah Khawaja
> + */
> +
> +#define pr_fmt(fmt) "iommu: liveupdate: " fmt
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#include "iommu.h"
> +#include "../iommu-pages.h"
> +
> +static void unpreserve_iommu_context(struct intel_iommu *iommu, int end)
> +{
> +	struct context_entry *context;
> +	int i;
> +
> +	if (end < 0)
> +		end = ROOT_ENTRY_NR;
> +
> +	for (i = 0; i < end; i++) {
> +		context = iommu_context_addr(iommu, i, 0, 0);
> +		if (context)
> +			iommu_unpreserve_page(context);
> +
> +		if (!sm_supported(iommu))
> +			continue;
> +
> +		context = iommu_context_addr(iommu, i, 0x80, 0);
> +		if (context)
> +			iommu_unpreserve_page(context);
> +	}
> +}
> +
> +static int preserve_iommu_context(struct intel_iommu *iommu)
> +{
> +	struct context_entry *context;
> +	int ret;
> +	int i;
> +
> +	for (i = 0; i < ROOT_ENTRY_NR; i++) {
> +		context = iommu_context_addr(iommu, i, 0, 0);
> +		if (context) {
> +			ret = iommu_preserve_page(context);
> +			if (ret)
> +				goto error;
> +		}
> +
> +		if (!sm_supported(iommu))
> +			continue;
> +
> +		context = iommu_context_addr(iommu, i, 0x80, 0);
> +		if (context) {
> +			ret = iommu_preserve_page(context);
> +			if (ret)
> +				goto error_sm;
> +		}
> +	}
> +
> +	return 0;
> +
> +error_sm:
> +	context = iommu_context_addr(iommu, i, 0, 0);
> +	iommu_unpreserve_page(context);
> +error:
> +	unpreserve_iommu_context(iommu, i);
> +	return ret;
> +}
> +
> +int intel_iommu_preserve_device(struct device *dev, struct device_ser *device_ser)
> +{
> +	struct device_domain_info *info = dev_iommu_priv_get(dev);
> +
> +	if (!dev_is_pci(dev))
> +		return -EOPNOTSUPP;

Maybe also WARN_ON_ONCE?

> +
> +	if (!info)
> +		return -EINVAL;
> +
> +	device_ser->domain_iommu_ser.did = domain_id_iommu(info->domain, info->iommu);
> +	return 0;
> +}
> +
> +void intel_iommu_unpreserve_device(struct device *dev, struct device_ser *device_ser)
> +{
> +}

Nit: we can just not populate the unpreserve_device op here; we don't
need an empty function.

> +
> +int intel_iommu_preserve(struct iommu_device *iommu_dev, struct iommu_ser *ser)
> +{
> +	struct intel_iommu *iommu;
> +	int ret;
> +
> +	iommu = container_of(iommu_dev, struct intel_iommu, iommu);
> +
> +	spin_lock(&iommu->lock);

Minor note: we get here with the session->mutex held, i.e. a
sleepable/preemptible context, whereas spin_lock() disables preemption
and puts us in atomic context. iommu_context_addr() happens to be safe
here because it uses GFP_ATOMIC and we pass alloc=0, but we are relying
heavily on the assumption that iommu_context_addr() and the other
helpers will never sleep or attempt a GFP_KERNEL allocation under the
hood. Given how easy it would be for a future refactor to accidentally
introduce a sleeping lock inside one of these helpers, I'd recommend
adding an explicit comment warning that this runs in atomic context
(same for unpreserve below).
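To make that assumption harder to break silently, something along these
lines at the top of preserve_iommu_context() (sketch only, names as in
this patch) would both document the constraint and let lockdep enforce
the locking half of it:

```c
static int preserve_iommu_context(struct intel_iommu *iommu)
{
	/*
	 * Atomic context: called under spin_lock(&iommu->lock) from
	 * intel_iommu_preserve(). Nothing below may sleep; in
	 * particular, iommu_context_addr() must keep being called
	 * with alloc=0 so it never does a GFP_KERNEL allocation or
	 * takes a sleeping lock.
	 */
	lockdep_assert_held(&iommu->lock);

	/* ... rest of the function unchanged ... */
}
```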
> +	ret = preserve_iommu_context(iommu);
> +	if (ret)
> +		goto err;
> +
> +	ret = iommu_preserve_page(iommu->root_entry);
> +	if (ret) {
> +		unpreserve_iommu_context(iommu, -1);

Nit: the same argument-overloading discussion as Vipin pointed out in
the other patch (here, -1 meaning "unpreserve all entries").

> +		goto err;
> +	}
> +
> +	ser->intel.phys_addr = iommu->reg_phys;
> +	ser->intel.root_table = __pa(iommu->root_entry);
> +	ser->type = IOMMU_INTEL;
> +	ser->token = ser->intel.phys_addr;
> +	spin_unlock(&iommu->lock);
> +
> +	return 0;
> +err:
> +	spin_unlock(&iommu->lock);
> +	return ret;
> +}
> +
> +void intel_iommu_unpreserve(struct iommu_device *iommu_dev, struct iommu_ser *iommu_ser)
> +{
> +	struct intel_iommu *iommu;
> +
> +	iommu = container_of(iommu_dev, struct intel_iommu, iommu);
> +
> +	spin_lock(&iommu->lock);
> +	unpreserve_iommu_context(iommu, -1);
> +	iommu_unpreserve_page(iommu->root_entry);
> +	spin_unlock(&iommu->lock);
> +}

Thanks,
Praan