From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-pl1-f172.google.com (mail-pl1-f172.google.com [209.85.214.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B02FA29B8CF for ; Fri, 8 May 2026 02:37:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.172 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778207826; cv=none; b=ZPcvCFcGFNRS/gSXG3tdW8ZOd+zQfUFT8/9ruCQYWwDnuSxiFSfHkQuR4BiEJiUZg+NtC0jQ9Wk1KyE3BnTnqYsjqNTz0IvG5mtUJZx3148URY+tfqeJSvZejXOi2MuL9JMaR9RcyzoS+eDdq1CeYItPSLtEUgNSUtTbxKmAi1k= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778207826; c=relaxed/simple; bh=7XLyoxiXiPTIEisakBIKoC72jfxRTH7b0MQFmabanUQ=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=rvoUyiGYrOFivCye+N+u6/8L+NItF3yulcxCJjjBwBc+6TCWKkUYZWbfVAXKAW9mZHvB1ZVJ0JpICF7iT/Ii1YqSu0ltvln9sFHWdEowrzYyf6+YAVKFCY+pg0Mg4Ob2XdGe8OFfBa3Rq7Vs0BnFbiHPr32LGrEFUnFzfqCEvuw= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=cga14DaF; arc=none smtp.client-ip=209.85.214.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="cga14DaF" Received: by mail-pl1-f172.google.com with SMTP id d9443c01a7336-2ba3b9bcf69so32735ad.0 for ; Thu, 07 May 2026 19:37:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20251104; t=1778207821; x=1778812621; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=8wvvHwrpsrKEDb2OQono0ByW7h8DSGBYa8qQB5qgDR4=; b=cga14DaF/uij3lEBtrWBQcLpNMyvqswkkn7Nx6XZh5m74YodDo7uNAlZ7w9cBYhxAh c1tKHYAFBccazny9+ldsRu2k5Kl5Ng1TB6nJRs2cpnCyiIU3dYOpkyEGAW4Ae/HIqh/+ nO2pYtku184aBCSMBVqf6oCZFXvrQRd2PdM1HntmVw0HuOU15FhcUdWPYoym4vVEqVbE Y6ZSjpwKbYfTQZIDeAcxO8nh/uQRPg0ROcrttuX1D8zh6SATdqm+Kt6fusYPC7js4gIL PswRPWecDuFnHN16n6Pg4Epcsat8qHGUTZt/bu1vbgdbqok8yWvTFl6K7UrE9EnEEsyK rssg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1778207821; x=1778812621; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-gg:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=8wvvHwrpsrKEDb2OQono0ByW7h8DSGBYa8qQB5qgDR4=; b=fK3X0MYDt2+WciDW/dEoPetNxJEeX/E7Qv2Dr1/Sd2s5xJs+Y4mAIABshXr8xqfqAg UwKpL2JjvKVlTN6p3oMI3BwOQaaEe6znhGssXUmL3rIMD6DYZJEkHEIigqcVeH75eBGR exyCxKxc3gwAIrKRLbpruZtSARNJygPw59vzUT7/ihX4aMvfZAsEqpAYO3IFN+CHsGxB SsKadw4Nu3yMi/EOP0r/Hvog5I7MRdv+fs+Th6sq9Ozfd6QzBw0GgFjLLttEulsNNfSM yhutHhSm4AM/vxs0SNP3q2yJNxvTVVd2cP2ftFYosgFd9QqTI//qJ84yQnzmZynUhEhT AYNQ== X-Forwarded-Encrypted: i=1; AFNElJ9jGyrl7dZRzc+9y8lv/hKEFrI0fHO1XJecK/PAYsq1dmVU6bnSRvCbcJblrQ6+eg6MetmHu8+twsP8xqE=@vger.kernel.org X-Gm-Message-State: AOJu0YylONaqvk8TW7TwVJmLB7PtPfvlaQ+3cbUEGMetAidK+F3bXaLg vTknqrtMLhumA6k9i3/y1Q821d1U71JRyDEX0jBzIPyG5QizOK18CPq/Swya21EoqQ== X-Gm-Gg: Acq92OHKvLXB667FFJayzoMLa8Fk2jhBAaUdZQ8vZYbfVoIFGy5zGHqITNnPUJse57U 2SHG6w56pXIuYdoBECAVrpPfk4jyGR5s8CexftHD4xgkku8Gexhze/FZnaLSiuMpdfvo3J8SyWA O1y4Qs6pbIaHdfVqI3jIMXIVhjzre9/PzEHferlrHn8ODvGW0PiUpc/wgpHf68Q7GX9l7oO3aqW nG0F1nuFXrQemD/uh38ibgHSzeqb9jgJUO05ueAJVmULf2v+M6UAUYbWVJ7z/i278jZnE3eoPpA MoZkDLnQ6q58PU5CoUqu/FeheZAT/pwHrJ4pd2ZZfWQd69WOq3HMqn2M0P+P87Pl5U60K4yuVs6 pMh0LL1U0R7APKctjVateyNKeUa9SGTyAEQpN92YNGe3DfqeCur7bfc9fuu0USrTyWSlUYW3imJ xUYBBDUgRcUwboj2DDesvDEJHYmUkWGBR9spiZrEXtsSIO4VyILl4CGH2lha93+/8lfOeDXg== X-Received: by 2002:a17:902:c40b:b0:2b0:d1f4:f601 with SMTP id d9443c01a7336-2baee82c9famr1274245ad.15.1778207820573; Thu, 07 May 2026 19:37:00 -0700 (PDT) Received: from google.com (153.46.83.34.bc.googleusercontent.com. [34.83.46.153]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-839679c80e7sm12834136b3a.31.2026.05.07.19.36.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 07 May 2026 19:36:59 -0700 (PDT) Date: Fri, 8 May 2026 02:36:56 +0000 From: Samiullah Khawaja To: Baolu Lu Cc: David Woodhouse , Joerg Roedel , Will Deacon , Jason Gunthorpe , Robin Murphy , Kevin Tian , Alex Williamson , Shuah Khan , iommu@lists.linux.dev, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Saeed Mahameed , Adithya Jayachandran , Parav Pandit , Leon Romanovsky , William Tu , Pratyush Yadav , Pasha Tatashin , David Matlack , Andrew Morton , Chris Li , Pranjal Shrivastava , Vipin Sharma , YiFei Zhu Subject: Re: [PATCH v2 07/16] iommu/vt-d: Implement device and iommu preserve/unpreserve ops Message-ID: References: <20260427175633.1978233-1-skhawaja@google.com> <20260427175633.1978233-8-skhawaja@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Disposition: inline In-Reply-To: On Thu, May 07, 2026 at 02:25:14PM +0800, Baolu Lu wrote: >On 4/28/26 01:56, Samiullah Khawaja wrote: >>Add implementation of the device and iommu presevation in a separate >>file. Also set the device and iommu preserve/unpreserve ops in the >>struct iommu_ops. >> >>During normal shutdown the iommu translation is disabled. Since the root >>table is preserved during live update, it needs to be cleaned up and the >>context entries of the unpreserved devices need to be cleared. > >This is not related to preserve/unpreserve ops and could be made in a >separated patch? Agreed. I will move this stuff to a separate patch. > >> >>Signed-off-by: Samiullah Khawaja >>--- >> MAINTAINERS | 1 + >> drivers/iommu/intel/Makefile | 1 + >> drivers/iommu/intel/iommu.c | 52 +++++++++++- >> drivers/iommu/intel/iommu.h | 28 +++++++ >> drivers/iommu/intel/liveupdate.c | 139 +++++++++++++++++++++++++++++++ >> drivers/iommu/iommu.c | 18 ++++ >> include/linux/iommu-liveupdate.h | 10 +++ >> include/linux/iommu.h | 14 ++++ >> include/linux/kho/abi/iommu.h | 18 ++++ >> 9 files changed, 277 insertions(+), 4 deletions(-) >> create mode 100644 drivers/iommu/intel/liveupdate.c >> >>diff --git a/MAINTAINERS b/MAINTAINERS >>index 980041955abc..9f5c02c6c8c1 100644 >>--- a/MAINTAINERS >>+++ b/MAINTAINERS >>@@ -13495,6 +13495,7 @@ M: Samiullah Khawaja >> R: Pranjal Shrivastava >> L: iommu@lists.linux.dev >> S: Maintained >>+F: drivers/iommu/intel/liveupdate.c > >This file is deeply integrated into the Intel IOMMU driver. Maintaining >it separately from the rest of the driver would add unnecessary >complexity. I suggest merging this entry into the existing Intel IOMMU >block so they can be managed together. I understand your concern. This file is driver specific and should definitely be routed through your tree. My goal with listing it here wasn't to change the merge path, but rather to ensure I am responsible for the liveupdate-specific logic in this file. Because this file acts as the backend to the generic Live Update framework, keeping it tied to the Liveupdate IOMMU entry helps guarantee that future changes stay architecturally aligned and tested across the whole liveupdate feature. However, I think I will add a dedicated 'IOMMU VT-d LIVEUPDATE' block where you are also listed as maintainer, and the patches for this file should absolutely continue to flow through the Intel IOMMU tree. > >> F: drivers/iommu/liveupdate.c >> F: include/linux/iommu-liveupdate.h >> F: include/linux/kho/abi/iommu.h >>diff --git a/drivers/iommu/intel/Makefile b/drivers/iommu/intel/Makefile >>index ada651c4a01b..d38fc101bc35 100644 >>+ [snip] >>+static void unpreserve_iommu_context_table(struct intel_iommu *iommu, int end) >>+{ >>+ struct context_entry *context; >>+ int i; >>+ >>+ for (i = 0; i < end; i++) { >>+ context = iommu_context_addr(iommu, i, 0, 0); >>+ if (context) >>+ iommu_unpreserve_page(context); >>+ >>+ if (!sm_supported(iommu)) >>+ continue; >>+ >>+ context = iommu_context_addr(iommu, i, 0x80, 0); >>+ if (context) >>+ iommu_unpreserve_page(context); >>+ } >>+} >>+ >>+static int preserve_iommu_context_table(struct intel_iommu *iommu) > >Since this function preserves all context tables, should it be renamed >to preserve_iommu_context_tables()? Agreed. Will change this. > >>+{ >>+ struct context_entry *context; >>+ int ret; >>+ int i; >>+ >>+ for (i = 0; i < ROOT_ENTRY_NR; i++) { >>+ /* >>+ * Alloc the context tables now to make sure the iommu unit is >>+ * properly preserved. These might stay unused and wastes around >>+ * 32MB max in scalable mode. >>+ */ > >Instead of allocating and preserving context tables for all root entries >(as noted, can waste up to 32MB), could we restrict this only to the >entries possibly in use by active PCI devices? I think the hotplug devices or VFs created through SR-IOV will be missed that way. Lets say device A is preserved and the associated iommu is also preserved. And then a new device B is hotplugged and preserved, then the context table for that will be missed. Since we don't track the context_tables that are preserved, there is no way to incremently preserve the new-ones. Let me look into the behaviour of KHO, maybe we can make the preserve call idempotent and do these incrementally. > >>+ spin_lock(&iommu->lock); >>+ context = iommu_context_addr(iommu, i, 0, 1); >>+ spin_unlock(&iommu->lock); >>+ if (!context) { >>+ ret = -ENOMEM; >>+ goto error; >>+ } [snip] >>diff --git a/include/linux/iommu.h b/include/linux/iommu.h >>index 1c424b32c5fc..999be5127c65 100644 >>--- a/include/linux/iommu.h >>+++ b/include/linux/iommu.h >>@@ -1207,6 +1207,20 @@ static inline void *dev_iommu_priv_get(struct device *dev) >> void dev_iommu_priv_set(struct device *dev, void *priv); >>+typedef int (*iommu_dev_iter_fn)(struct device *dev, >>+ struct iommu_device *iommu, void *arg); >>+ >>+/** >>+ * struct iommu_dev_iter - Iterator for devices attached to an IOMMU >>+ */ >>+struct iommu_dev_iter { >>+ struct iommu_device *iommu; >>+ iommu_dev_iter_fn fn; >>+ void *arg; >>+}; >>+ >>+void iommu_for_each_dev(struct iommu_dev_iter *iter); > >Is a generic helper necessary here? I am concerned about potential races >with concurrent operations especially probe_device or release_device. >And also the hot-plug/unplug scenarios? > >Since the current preservation logic is limited to PCI devices, it might >be safer and simpler to use the existing for_each_pci_dev() for this >specific case instead of introducing a new generic helper? The helper is used late during shutdown to clean up context entries for devices that are not preserved. During this phase, the system is quiescent and hotplug is disabled, so there are no races with concurrent probe or release operations. I was using for_each_pci_dev() in previous version, but the iterator is needed to clear context entries of both PCI and non-PCI devices that are unpreserved. Please also see related comments on previous version: https://lore.kernel.org/all/ab3R54-kyHGDEW9L@google.com/ > >>+ >> extern struct mutex iommu_probe_device_lock; >> int iommu_probe_device(struct device *dev); [snip] >>+ }; >> } __packed; >> /** > >Thanks, >baolu > Thanks, Sami