Date: Tue, 17 Mar 2026 20:23:38 +0000
From: Pranjal Shrivastava
To: Samiullah Khawaja
Cc: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon, Jason Gunthorpe,
	Robin Murphy, Kevin Tian, Alex Williamson, Shuah Khan,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	Saeed Mahameed, Adithya Jayachandran, Parav Pandit, Leon Romanovsky,
	William Tu, Pratyush Yadav, Pasha Tatashin, David Matlack,
	Andrew Morton, Chris Li, Vipin Sharma, YiFei Zhu
Subject: Re: [PATCH 02/14] iommu: Implement IOMMU core liveupdate skeleton
References: <20260203220948.2176157-1-skhawaja@google.com>
 <20260203220948.2176157-3-skhawaja@google.com>

On Tue, Mar 17, 2026 at 08:13:47PM +0000, Samiullah Khawaja wrote:
> On Tue, Mar 17, 2026 at 08:09:26PM +0000, Pranjal Shrivastava wrote:
> > On Fri, Mar 13, 2026 at 06:42:05PM +0000, Samiullah Khawaja wrote:
> > > On Thu, Mar 12, 2026 at 11:10:36PM +0000, Pranjal Shrivastava wrote:
> > > > On Tue, Feb 03, 2026 at 10:09:36PM +0000, Samiullah Khawaja wrote:
> > > > > Add IOMMU domain ops that can be implemented by the IOMMU drivers if
> > > > > they support IOMMU domain preservation across liveupdate.
The new IOMMU
> > > > > domain preserve, unpreserve and restore APIs call these ops to perform
> > > > > respective live update operations.
> > > > >
> > > > > Similarly add IOMMU ops to preserve/unpreserve a device. These can be
> > > > > implemented by the IOMMU drivers that support preservation of devices
> > > > > that have their IOMMU domains preserved. During device preservation the
> > > > > state of the associated IOMMU is also preserved. The device can only be
> > > > > preserved if the attached iommu domain is preserved and the associated
> > > > > iommu supports preservation.
> > > > >
> > > > > The preserved state of the device and IOMMU needs to be fetched during
> > > > > shutdown and boot in the next kernel. Add APIs that can be used to fetch
> > > > > the preserved state of a device and IOMMU. The APIs will only be used
> > > > > during shutdown and after liveupdate so no locking needed.
> > > > >
> > > > > Signed-off-by: Samiullah Khawaja
> > > > > ---
> > > > >  drivers/iommu/iommu.c      |   3 +
> > > > >  drivers/iommu/liveupdate.c | 326 +++++++++++++++++++++++++++++++++++++
> > > > >  include/linux/iommu-lu.h   | 119 ++++++++++++++
> > > > >  include/linux/iommu.h      |  32 ++++
> > > > >  4 files changed, 480 insertions(+)
> > > > >
> > > > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > > > > index 4926a43118e6..c0632cb5b570 100644
> > > > > --- a/drivers/iommu/iommu.c
> > > > > +++ b/drivers/iommu/iommu.c
> > > > > @@ -389,6 +389,9 @@ static struct dev_iommu *dev_iommu_get(struct device *dev)
> > > > >
> > > > >  	mutex_init(&param->lock);
> > > > >  	dev->iommu = param;
> > > > > +#ifdef CONFIG_IOMMU_LIVEUPDATE
> > > > > +	dev->iommu->device_ser = NULL;
> > > > > +#endif
> > > > >  	return param;
> > > > >  }
> > > > >
> > > > > diff --git a/drivers/iommu/liveupdate.c b/drivers/iommu/liveupdate.c
> > > > > index 6189ba32ff2c..83eb609b3fd7 100644
> > > > > --- a/drivers/iommu/liveupdate.c
> > > > > +++ b/drivers/iommu/liveupdate.c
> > > > > @@ -11,6
+11,7 @@
> > > > >  #include
> > > > >  #include
> > > > >  #include
> > > > > +#include
> > > > >  #include
> > > > >
> > > > >  static void iommu_liveupdate_restore_objs(u64 next)
> > > > > @@ -175,3 +176,328 @@ int iommu_liveupdate_unregister_flb(struct liveupdate_file_handler *handler)
> > > > >  	return liveupdate_unregister_flb(handler, &iommu_flb);
> > > > >  }
> > > > >  EXPORT_SYMBOL(iommu_liveupdate_unregister_flb);
> > > > > +
> > > > > +int iommu_for_each_preserved_device(iommu_preserved_device_iter_fn fn,
> > > > > +				    void *arg)
> > > > > +{
> > > > > +	struct iommu_lu_flb_obj *obj;
> > > > > +	struct devices_ser *devices;
> > > > > +	int ret, i, idx;
> > > > > +
> > > > > +	ret = liveupdate_flb_get_incoming(&iommu_flb, (void **)&obj);
> > > > > +	if (ret)
> > > > > +		return -ENOENT;
> > > > > +
> > > > > +	devices = __va(obj->ser->devices_phys);
> > > > > +	for (i = 0, idx = 0; i < obj->ser->nr_devices; ++i, ++idx) {
> > > > > +		if (idx >= MAX_DEVICE_SERS) {
> > > > > +			devices = __va(devices->objs.next_objs);
> > > > > +			idx = 0;
> > > > > +		}
> > > > > +
> > > > > +		if (devices->devices[idx].obj.deleted)
> > > > > +			continue;
> > > > > +
> > > > > +		ret = fn(&devices->devices[idx], arg);
> > > > > +		if (ret)
> > > > > +			return ret;
> > > > > +	}
> > > > > +
> > > > > +	return 0;
> > > > > +}
> > > > > +EXPORT_SYMBOL(iommu_for_each_preserved_device);
> > > > > +
> > > > > +static inline bool device_ser_match(struct device_ser *match,
> > > > > +				    struct pci_dev *pdev)
> > > > > +{
> > > > > +	return match->devid == pci_dev_id(pdev) && match->pci_domain == pci_domain_nr(pdev->bus);
> > > > > +}
> > > > > +
> > > > > +struct device_ser *iommu_get_device_preserved_data(struct device *dev)
> > > > > +{
> > > > > +	struct iommu_lu_flb_obj *obj;
> > > > > +	struct devices_ser *devices;
> > > > > +	int ret, i, idx;
> > > > > +
> > > > > +	if (!dev_is_pci(dev))
> > > > > +		return NULL;
> > > > > +
> > > > > +	ret = liveupdate_flb_get_incoming(&iommu_flb, (void
**)&obj);
> > > > > +	if (ret)
> > > > > +		return NULL;
> > > > > +
> > > > > +	devices = __va(obj->ser->devices_phys);
> > > > > +	for (i = 0, idx = 0; i < obj->ser->nr_devices; ++i, ++idx) {
> > > > > +		if (idx >= MAX_DEVICE_SERS) {
> > > > > +			devices = __va(devices->objs.next_objs);
> > > > > +			idx = 0;
> > > > > +		}
> > > > > +
> > > > > +		if (devices->devices[idx].obj.deleted)
> > > > > +			continue;
> > > > > +
> > > > > +		if (device_ser_match(&devices->devices[idx], to_pci_dev(dev))) {
> > > > > +			devices->devices[idx].obj.incoming = true;
> > > > > +			return &devices->devices[idx];
> > > > > +		}
> > > > > +	}
> > > > > +
> > > > > +	return NULL;
> > > > > +}
> > > > > +EXPORT_SYMBOL(iommu_get_device_preserved_data);
> > > > > +
> > > > > +struct iommu_ser *iommu_get_preserved_data(u64 token, enum iommu_lu_type type)
> > > > > +{
> > > > > +	struct iommu_lu_flb_obj *obj;
> > > > > +	struct iommus_ser *iommus;
> > > > > +	int ret, i, idx;
> > > > > +
> > > > > +	ret = liveupdate_flb_get_incoming(&iommu_flb, (void **)&obj);
> > > > > +	if (ret)
> > > > > +		return NULL;
> > > > > +
> > > > > +	iommus = __va(obj->ser->iommus_phys);
> > > > > +	for (i = 0, idx = 0; i < obj->ser->nr_iommus; ++i, ++idx) {
> > > > > +		if (idx >= MAX_IOMMU_SERS) {
> > > > > +			iommus = __va(iommus->objs.next_objs);
> > > > > +			idx = 0;
> > > > > +		}
> > > > > +
> > > > > +		if (iommus->iommus[idx].obj.deleted)
> > > > > +			continue;
> > > > > +
> > > > > +		if (iommus->iommus[idx].token == token &&
> > > > > +		    iommus->iommus[idx].type == type)
> > > > > +			return &iommus->iommus[idx];
> > > > > +	}
> > > > > +
> > > > > +	return NULL;
> > > > > +}
> > > > > +EXPORT_SYMBOL(iommu_get_preserved_data);
> > > > > +
> >
> > Also, I don't see these helpers being used anywhere in this series yet?
> > Should we add them with the phase-2 series? When we actually "Restore"
> > the domain?
>
> Phase 1 does restore the iommu and domains during boot.
But the
> iommufd/hwpt state is not restored or re-associated with the restored
> domains.
>
> These helpers are used to fetch the preserved state during boot.
>
> Thanks,
> Sami

I see these are used in PATCH 8. Should we introduce these helpers
there, alongside iommu_restore_domain, where they are actually used?

Thanks,
Praan

> > >
> > > > > +static int reserve_obj_ser(struct iommu_objs_ser **objs_ptr, u64 max_objs)
> > > >
> > > > Isn't this more of an "insert" / "populate" / write_ser_entry? We can
> > > > rename it to something like iommu_ser_push_entry / iommu_ser_write_entry
> > >
> > > This is reserving an object in the objects array. The object is filled
> > > once reservation is done. Maybe I can call it alloc_obj_ser or
> > > alloc_entry_ser.
> > >
> > > > > +{
> > > > > +	struct iommu_objs_ser *next_objs, *objs = *objs_ptr;
> > > >
> > > > Not loving these names :(
> > > >
> > > > TBH, the reserve_obj_ser function isn't too readable, esp. with all the
> > > > variable names, here and in the lu header.. I've had to go back n forth
> > > > to the first patch. For example, here next_objs can be next_objs_page &
> > > > objs can be curr_page. (PTAL at my reply on PATCH 01 about renaming).
> > >
> > > Basically with current naming there are "serialization objects" in
> > > "serialization object arrays". The object is a "base" type with
> > > "inherited" types for iommu, domain etc. I will add explanation of all
> > > this in the ABI header.
> > >
> > > Maybe we can name it to:
> > >
> > > struct iommu_obj_ser;
> > > struct iommu_obj_array_ser; or struct iommu_obj_ser_array;
> > >
> > > Then for reserve/alloc, we use names like "next_obj_array" and
> > > "curr_obj_array"?
> > > >
> > > > > +	int idx;
> > > > > +
> > > > > +	if (objs->nr_objs == max_objs) {
> > > > > +		next_objs = kho_alloc_preserve(PAGE_SIZE);
> > > > > +		if (IS_ERR(next_objs))
> > > > > +			return PTR_ERR(next_objs);
> > > > > +
> > > > > +		objs->next_objs = virt_to_phys(next_objs);
> > > > > +		objs = next_objs;
> > > > > +		*objs_ptr = objs;
> > > > > +		objs->nr_objs = 0;
> > > > > +		objs->next_objs = 0;
> > > >
> > > > This seems redundant, no need to zero these out, kho_alloc_preserve
> > > > passes __GFP_ZERO to folio_alloc [1], which should z-alloc the pages.
> > >
> > > Agreed. Will remove this.
> > >
> > > > > +	}
> > > > > +
> > > > > +	idx = objs->nr_objs++;
> > > > > +	return idx;
> > > > > +}
> > > >
> > > > Just to give a mental model to fellow reviewers, here's how this is laid
> > > > out:
> > > >
> > > > ----------------------------------------------------------------------
> > > > [ PAGE START ]
> > > > ----------------------------------------------------------------------
> > > > | iommu_objs_ser (The Page Header)                                  |
> > > > |   - next_objs: 0x0000 (End of the page-chain)                     |
> > > > |   - nr_objs: 2                                                    |
> > > > ----------------------------------------------------------------------
> > > > | ITEM 0: iommu_domain_ser                                          |
> > > > |   [ iommu_obj_ser (The entry header) ]                            |
> > > > |     - idx: 0                                                      |
> > > > |     - ref_count: 1                                                |
> > > > |     - deleted: 0                                                  |
> > > > |   [ Domain Data ]                                                 |
> > > > ----------------------------------------------------------------------
> > > > | ITEM 1: iommu_domain_ser                                          |
> > > > |   [ iommu_obj_ser (The entry header) ]                            |
> > > > |     - idx: 1                                                      |
> > > > |     - ref_count: 1                                                |
> > > > |     - deleted: 0                                                  |
> > > > |   [ Domain Data ]                                                 |
> > > > ----------------------------------------------------------------------
> > > > | ... (Empty space for more domains) ...                            |
> > > > |                                                                   |
> > > > ----------------------------------------------------------------------
> > > > [ PAGE END ]
> > > > ----------------------------------------------------------------------
> > >
> > > +1
> > >
> > > Will add a table in the header as you suggested.
> > >
> > > > > +
> > > > > +int iommu_domain_preserve(struct iommu_domain *domain, struct iommu_domain_ser **ser)
> > > > > +{
> > > > > +	struct iommu_domain_ser *domain_ser;
> > > > > +	struct iommu_lu_flb_obj *flb_obj;
> > > > > +	int idx, ret;
> > > > > +
> > > > > +	if (!domain->ops->preserve)
> > > > > +		return -EOPNOTSUPP;
> > > > > +
> > > > > +	ret = liveupdate_flb_get_outgoing(&iommu_flb, (void **)&flb_obj);
> > > > > +	if (ret)
> > > > > +		return ret;
> > > > > +
> > > > > +	guard(mutex)(&flb_obj->lock);
> > > > > +	idx = reserve_obj_ser((struct iommu_objs_ser **)&flb_obj->iommu_domains,
> > > > > +			      MAX_IOMMU_DOMAIN_SERS);
> > > > > +	if (idx < 0)
> > > > > +		return idx;
> > > > > +
> > > > > +	domain_ser = &flb_obj->iommu_domains->iommu_domains[idx];
> > > >
> > > > This is slightly less-readable as well, I understand we're trying to:
> > > >
> > > > iommu_domains_ser -> iommu_domain_ser[idx] but the same name
> > > > (iommu_domains) makes it difficult to read.. we should rename this as:
> > >
> > > Agreed.
> > >
> > > I will update it to,
> > >
> > > &flb_obj->curr_iommu_domains->domain_array[idx]
> > >
> > > > &flb_obj->iommu_domains_page->domain_entries[idx] or something for
> > > > better readability..
> > > >
> > > > Also, let's add a comment explaining that reserve_obj_ser actually
> > > > advances the flb_obj ptr when necessary..
> > >
> > > Agreed.
> > >
> > > > IIUC, we start with PAGE 0 initially, store its phys in the
> > > > iommu_flb_preserve op (the iommu_ser_phys et al) & then we go on
> > > > alloc-ing more pages and keep storing the "current" active page with the
> > > > liveupdate core.
Now when we jump into the new kernel, we read the
> > > > ser_phys and then follow the page chain, right?
> > >
> > > Yes, we do an initial allocation of arrays for each object type and then
> > > later allocate more as needed. The flb_obj holds the address of
> > > currently active array.
> > >
> > > I will add this explanation in the header.
> > >
> > > > > +	idx = flb_obj->ser->nr_domains++;
> > > > > +	domain_ser->obj.idx = idx;
> > > > > +	domain_ser->obj.ref_count = 1;
> > > > > +
> > > > > +	ret = domain->ops->preserve(domain, domain_ser);
> > > > > +	if (ret) {
> > > > > +		domain_ser->obj.deleted = true;
> > > > > +		return ret;
> > > > > +	}
> > > > > +
> > > > > +	domain->preserved_state = domain_ser;
> > > > > +	*ser = domain_ser;
> > > > > +	return 0;
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(iommu_domain_preserve);
> > > > > +
> > > > > +void iommu_domain_unpreserve(struct iommu_domain *domain)
> > > > > +{
> > > > > +	struct iommu_domain_ser *domain_ser;
> > > > > +	struct iommu_lu_flb_obj *flb_obj;
> > > > > +	int ret;
> > > > > +
> > > > > +	if (!domain->ops->unpreserve)
> > > > > +		return;
> > > > > +
> > > > > +	ret = liveupdate_flb_get_outgoing(&iommu_flb, (void **)&flb_obj);
> > > > > +	if (ret)
> > > > > +		return;
> > > > > +
> > > > > +	guard(mutex)(&flb_obj->lock);
> > > > > +
> > > > > +	/*
> > > > > +	 * There is no check for attached devices here. The correctness relies
> > > > > +	 * on the Live Update Orchestrator's session lifecycle. All resources
> > > > > +	 * (iommufd, vfio devices) are preserved within a single session. If the
> > > > > +	 * session is torn down, the .unpreserve callbacks for all files will be
> > > > > +	 * invoked, ensuring a consistent cleanup without needing explicit
> > > > > +	 * refcounting for the serialized objects here.
> > > > > +	 */
> > > > > +	domain_ser = domain->preserved_state;
> > > > > +	domain->ops->unpreserve(domain, domain_ser);
> > > > > +	domain_ser->obj.deleted = true;
> > > > > +	domain->preserved_state = NULL;
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(iommu_domain_unpreserve);
> > > > > +
> > > > > +static int iommu_preserve_locked(struct iommu_device *iommu)
> > > > > +{
> > > > > +	struct iommu_lu_flb_obj *flb_obj;
> > > > > +	struct iommu_ser *iommu_ser;
> > > > > +	int idx, ret;
> > > > > +
> > > > > +	if (!iommu->ops->preserve)
> > > > > +		return -EOPNOTSUPP;
> > > > > +
> > > > > +	if (iommu->outgoing_preserved_state) {
> > > > > +		iommu->outgoing_preserved_state->obj.ref_count++;
> > > > > +		return 0;
> > > > > +	}
> > > > > +
> > > > > +	ret = liveupdate_flb_get_outgoing(&iommu_flb, (void **)&flb_obj);
> > > > > +	if (ret)
> > > > > +		return ret;
> > > > > +
> > > > > +	idx = reserve_obj_ser((struct iommu_objs_ser **)&flb_obj->iommus,
> > > > > +			      MAX_IOMMU_SERS);
> > > > > +	if (idx < 0)
> > > > > +		return idx;
> > > > > +
> > > > > +	iommu_ser = &flb_obj->iommus->iommus[idx];
> > > > > +	idx = flb_obj->ser->nr_iommus++;
> > > > > +	iommu_ser->obj.idx = idx;
> > > > > +	iommu_ser->obj.ref_count = 1;
> > > > > +
> > > > > +	ret = iommu->ops->preserve(iommu, iommu_ser);
> > > > > +	if (ret)
> > > > > +		iommu_ser->obj.deleted = true;
> > > > > +
> > > > > +	iommu->outgoing_preserved_state = iommu_ser;
> > > > > +	return ret;
> > > > > +}
> > > > > +
> > > > > +static void iommu_unpreserve_locked(struct iommu_device *iommu)
> > > > > +{
> > > > > +	struct iommu_ser *iommu_ser = iommu->outgoing_preserved_state;
> > > > > +
> > > > > +	iommu_ser->obj.ref_count--;
> > > > > +	if (iommu_ser->obj.ref_count)
> > > > > +		return;
> > > > > +
> > > > > +	iommu->outgoing_preserved_state = NULL;
> > > > > +	iommu->ops->unpreserve(iommu, iommu_ser);
> > > > > +	iommu_ser->obj.deleted = true;
> > > > > +}
> > > > > +
> > > > > +int iommu_preserve_device(struct iommu_domain
*domain,
> > > > > +			  struct device *dev, u64 token)
> > > > > +{
> > > > > +	struct iommu_lu_flb_obj *flb_obj;
> > > > > +	struct device_ser *device_ser;
> > > > > +	struct dev_iommu *iommu;
> > > > > +	struct pci_dev *pdev;
> > > > > +	int ret, idx;
> > > > > +
> > > > > +	if (!dev_is_pci(dev))
> > > > > +		return -EOPNOTSUPP;
> > > > > +
> > > > > +	if (!domain->preserved_state)
> > > > > +		return -EINVAL;
> > > > > +
> > > > > +	pdev = to_pci_dev(dev);
> > > > > +	iommu = dev->iommu;
> > > > > +	if (!iommu->iommu_dev->ops->preserve_device ||
> > > > > +	    !iommu->iommu_dev->ops->preserve)
> > > > > +		return -EOPNOTSUPP;
> > > > > +
> > > > > +	ret = liveupdate_flb_get_outgoing(&iommu_flb, (void **)&flb_obj);
> > > > > +	if (ret)
> > > > > +		return ret;
> > > > > +
> > > > > +	guard(mutex)(&flb_obj->lock);
> > > > > +	idx = reserve_obj_ser((struct iommu_objs_ser **)&flb_obj->devices,
> > > > > +			      MAX_DEVICE_SERS);
> > > > > +	if (idx < 0)
> > > > > +		return idx;
> > > > > +
> > > > > +	device_ser = &flb_obj->devices->devices[idx];
> > > > > +	idx = flb_obj->ser->nr_devices++;
> > > > > +	device_ser->obj.idx = idx;
> > > > > +	device_ser->obj.ref_count = 1;
> > > > > +
> > > > > +	ret = iommu_preserve_locked(iommu->iommu_dev);
> > > > > +	if (ret) {
> > > > > +		device_ser->obj.deleted = true;
> > > > > +		return ret;
> > > > > +	}
> > > > > +
> > > > > +	device_ser->domain_iommu_ser.domain_phys = __pa(domain->preserved_state);
> > > > > +	device_ser->domain_iommu_ser.iommu_phys = __pa(iommu->iommu_dev->outgoing_preserved_state);
> > > > > +	device_ser->devid = pci_dev_id(pdev);
> > > > > +	device_ser->pci_domain = pci_domain_nr(pdev->bus);
> > > > > +	device_ser->token = token;
> > > > > +
> > > > > +	ret = iommu->iommu_dev->ops->preserve_device(dev, device_ser);
> > > > > +	if (ret) {
> > > > > +		device_ser->obj.deleted = true;
> > > > > +		iommu_unpreserve_locked(iommu->iommu_dev);
> > > > > +		return ret;
> > > > > +	}
> > > > > +
> > > > > +	dev->iommu->device_ser =
device_ser;
> > > > > +	return 0;
> > > > > +}
> > > > > +
> > > > > +void iommu_unpreserve_device(struct iommu_domain *domain, struct device *dev)
> > > > > +{
> > > > > +	struct iommu_lu_flb_obj *flb_obj;
> > > > > +	struct device_ser *device_ser;
> > > > > +	struct dev_iommu *iommu;
> > > > > +	struct pci_dev *pdev;
> > > > > +	int ret;
> > > > > +
> > > > > +	if (!dev_is_pci(dev))
> > > > > +		return;
> > > > > +
> > > > > +	pdev = to_pci_dev(dev);
> > > > > +	iommu = dev->iommu;
> > > > > +	if (!iommu->iommu_dev->ops->unpreserve_device ||
> > > > > +	    !iommu->iommu_dev->ops->unpreserve)
> > > > > +		return;
> > > > > +
> > > > > +	ret = liveupdate_flb_get_outgoing(&iommu_flb, (void **)&flb_obj);
> > > > > +	if (WARN_ON(ret))
> > > > > +		return;
> > > > > +
> > > > > +	guard(mutex)(&flb_obj->lock);
> > > > > +	device_ser = dev_iommu_preserved_state(dev);
> > > > > +	if (WARN_ON(!device_ser))
> > > > > +		return;
> > > > > +
> > > > > +	iommu->iommu_dev->ops->unpreserve_device(dev, device_ser);
> > > > > +	dev->iommu->device_ser = NULL;
> > > > > +
> > > > > +	iommu_unpreserve_locked(iommu->iommu_dev);
> > > > > +}
> > > >
> > > > I'm wondering if we should guard these APIs against accidental or
> > > > potential abuse by other in-kernel drivers, since the Live Update
> > > > Orchestrator (LUO) is architecturally designed around user-space
> > > > driven sequences (IOCTLs, specific mutex ordering, etc.), for now.
> > > >
> > > > Since the header-file is also under include/linux, we should avoid any
> > > > possibility where we end up having drivers depending on these APIs.
> > > > We could add a check based on dma owner:
> > > >
> > > > +	/* Only devices explicitly claimed by a user-space driver
> > > > +	 * (VFIO/IOMMUFD) are eligible for Live Update preservation.
> > > > +	 */
> > > > +	if (!iommu_dma_owner_claimed(dev))
> > > > +		return -EPERM;
> > > >
> > > > This should ensure we aren't creating 'zombie' preserved states for
> > > > devices not managed by IOMMUFD/VFIO.
> > >
> > > Agreed. I will update this.
> > >
> > > > > diff --git a/include/linux/iommu-lu.h b/include/linux/iommu-lu.h
> > > > > index 59095d2f1bb2..48c07514a776 100644
> > > > > --- a/include/linux/iommu-lu.h
> > > > > +++ b/include/linux/iommu-lu.h
> > > > > @@ -8,9 +8,128 @@
> > > > >  #ifndef _LINUX_IOMMU_LU_H
> > > > >  #define _LINUX_IOMMU_LU_H
> > > > >
> > > > > +#include
> > > > > +#include
> > > > >  #include
> > > > >  #include
> > > > >
> > > > > +typedef int (*iommu_preserved_device_iter_fn)(struct device_ser *ser,
> > > > > +					      void *arg);
> > > > > +#ifdef CONFIG_IOMMU_LIVEUPDATE
> > > > > +static inline void *dev_iommu_preserved_state(struct device *dev)
> > > > > +{
> > > > > +	struct device_ser *ser;
> > > > > +
> > > > > +	if (!dev->iommu)
> > > > > +		return NULL;
> > > > > +
> > > > > +	ser = dev->iommu->device_ser;
> > > > > +	if (ser && !ser->obj.incoming)
> > > > > +		return ser;
> > > > > +
> > > > > +	return NULL;
> > > > > +}
> > > > > +
> > > > > +static inline void *dev_iommu_restored_state(struct device *dev)
> > > > > +{
> > > > > +	struct device_ser *ser;
> > > > > +
> > > > > +	if (!dev->iommu)
> > > > > +		return NULL;
> > > > > +
> > > > > +	ser = dev->iommu->device_ser;
> > > > > +	if (ser && ser->obj.incoming)
> > > > > +		return ser;
> > > > > +
> > > > > +	return NULL;
> > > > > +}
> > > > > +
> > > > > +static inline void *iommu_domain_restored_state(struct iommu_domain *domain)
> > > > > +{
> > > > > +	struct iommu_domain_ser *ser;
> > > > > +
> > > > > +	ser = domain->preserved_state;
> > > > > +	if (ser && ser->obj.incoming)
> > > > > +		return ser;
> > > > > +
> > > > > +	return NULL;
> > > > > +}
> > > > > +
> > > > > +static inline int dev_iommu_restore_did(struct device *dev, struct iommu_domain *domain)
> > > > > +{
> > > > > +	struct device_ser *ser = dev_iommu_restored_state(dev);
> > > > > +
> > > > > +	if (ser && iommu_domain_restored_state(domain))
> > > > > +		return ser->domain_iommu_ser.did;
> > > > > +
> > > > > +	return -1;
> > > > > +}
> > > > > +
> > > > > +int iommu_for_each_preserved_device(iommu_preserved_device_iter_fn fn,
> > > > > +				    void *arg);
> > > > > +struct device_ser *iommu_get_device_preserved_data(struct device *dev);
> > > > > +struct iommu_ser *iommu_get_preserved_data(u64 token, enum iommu_lu_type type);
> > > > > +int iommu_domain_preserve(struct iommu_domain *domain, struct iommu_domain_ser **ser);
> > > > > +void iommu_domain_unpreserve(struct iommu_domain *domain);
> > > > > +int iommu_preserve_device(struct iommu_domain *domain,
> > > > > +			  struct device *dev, u64 token);
> > > > > +void iommu_unpreserve_device(struct iommu_domain *domain, struct device *dev);
> > > > > +#else
> > > > > +static inline void *dev_iommu_preserved_state(struct device *dev)
> > > > > +{
> > > > > +	return NULL;
> > > > > +}
> > > > > +
> > > > > +static inline void *dev_iommu_restored_state(struct device *dev)
> > > > > +{
> > > > > +	return NULL;
> > > > > +}
> > > > > +
> > > > > +static inline int dev_iommu_restore_did(struct device *dev, struct iommu_domain *domain)
> > > > > +{
> > > > > +	return -1;
> > > > > +}
> > > > > +
> > > > > +static inline void *iommu_domain_restored_state(struct iommu_domain *domain)
> > > > > +{
> > > > > +	return NULL;
> > > > > +}
> > > > > +
> > > > > +static inline int iommu_for_each_preserved_device(iommu_preserved_device_iter_fn fn, void *arg)
> > > > > +{
> > > > > +	return -EOPNOTSUPP;
> > > > > +}
> > > > > +
> > > > > +static inline struct device_ser *iommu_get_device_preserved_data(struct device *dev)
> > > > > +{
> > > > > +	return NULL;
> > > > > +}
> > > > > +
> > > > > +static inline struct iommu_ser *iommu_get_preserved_data(u64 token, enum iommu_lu_type type)
> > > > > +{
> > > > > +	return NULL;
> > > > > +}
> > > > > +
> > > > > +static inline int iommu_domain_preserve(struct iommu_domain *domain, struct iommu_domain_ser **ser)
> > > > > +{
> > > > > +	return -EOPNOTSUPP;
> > > > > +}
> > > > > +
> > > > > +static inline void iommu_domain_unpreserve(struct iommu_domain *domain)
> > > > > +{
> > > > > +}
> > > > > +
> > > > > +static inline int iommu_preserve_device(struct iommu_domain *domain,
> > > > > +					struct device *dev, u64 token)
> > > > > +{
> > > > > +	return -EOPNOTSUPP;
> > > > > +}
> > > > > +
> > > > > +static inline void iommu_unpreserve_device(struct iommu_domain *domain, struct device *dev)
> > > > > +{
> > > > > +}
> > > > > +#endif
> > > > > +
> > > > >  int iommu_liveupdate_register_flb(struct liveupdate_file_handler *handler);
> > > > >  int iommu_liveupdate_unregister_flb(struct liveupdate_file_handler *handler);
> > > > >
> > > > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > > > > index 54b8b48c762e..bd949c1ce7c5 100644
> > > > > --- a/include/linux/iommu.h
> > > > > +++ b/include/linux/iommu.h
> > > > > @@ -14,6 +14,8 @@
> > > > >  #include
> > > > >  #include
> > > > >  #include
> > > > > +#include
> > > > > +#include
> > > > >  #include
> > > > >
> > > > >  #define IOMMU_READ	(1 << 0)
> > > > > @@ -248,6 +250,10 @@ struct iommu_domain {
> > > > >  			struct list_head next;
> > > > >  		};
> > > > >  	};
> > > > > +
> > > > > +#ifdef CONFIG_IOMMU_LIVEUPDATE
> > > > > +	struct iommu_domain_ser *preserved_state;
> > > > > +#endif
> > > > >  };
> > > > >
> > > > >  static inline bool iommu_is_dma_domain(struct iommu_domain *domain)
> > > > > @@ -647,6 +653,10 @@ __iommu_copy_struct_to_user(const struct iommu_user_data *dst_data,
> > > > >   *	resources shared/passed to user space IOMMU instance. Associate
> > > > >   *	it with a nesting @parent_domain. It is required for driver to
> > > > >   *	set @viommu->ops pointing to its own viommu_ops
> > > > > + * @preserve_device: Preserve state of a device for liveupdate.
> > > > > + * @unpreserve_device: Unpreserve state that was preserved earlier.
> > > > > + * @preserve: Preserve state of iommu translation hardware for liveupdate.
> > > > > + * @unpreserve: Unpreserve state of iommu that was preserved earlier.
> > > > >   * @owner: Driver module providing these ops
> > > > >   * @identity_domain: An always available, always attachable identity
> > > > >   *	translation.
> > > > > @@ -703,6 +713,11 @@ struct iommu_ops {
> > > > >  					 struct iommu_domain *parent_domain,
> > > > >  					 const struct iommu_user_data *user_data);
> > > > >
> > > > > +	int (*preserve_device)(struct device *dev, struct device_ser *device_ser);
> > > > > +	void (*unpreserve_device)(struct device *dev, struct device_ser *device_ser);
> > > >
> > > > Nit: Let's move the _device ops under the comment:
> > > > `/* Per device IOMMU features */`
> > >
> > > I will move these.
> > >
> > > > > +	int (*preserve)(struct iommu_device *iommu, struct iommu_ser *iommu_ser);
> > > > > +	void (*unpreserve)(struct iommu_device *iommu, struct iommu_ser *iommu_ser);
> > > > > +
> > > >
> > > > I'm wondering if there's any benefit to add these ops under a #ifdef ?
> > >
> > > These should be NULL when liveupdate is disabled, so #ifdef should not
> > > be needed. Not sure how much space we save if we don't define these. I will
> > > move these in #ifdef anyway.
> > >
> > > > >  	const struct iommu_domain_ops *default_domain_ops;
> > > > >  	struct module *owner;
> > > > >  	struct iommu_domain *identity_domain;
> > > > > @@ -749,6 +764,11 @@ struct iommu_ops {
> > > > >   *                       specific mechanisms.
> > > > >   * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*)
> > > > >   * @free: Release the domain after use.
> > > > > + * @preserve: Preserve the iommu domain for liveupdate.
> > > > > + *	Returns 0 on success, a negative errno on failure.
> > > > > + * @unpreserve: Unpreserve the iommu domain that was preserved earlier.
> > > > > + * @restore: Restore the iommu domain after liveupdate.
> > > > > + *	Returns 0 on success, a negative errno on failure.
> > > > >   */
> > > > >  struct iommu_domain_ops {
> > > > >  	int (*attach_dev)(struct iommu_domain *domain, struct device *dev,
> > > > > @@ -779,6 +799,9 @@ struct iommu_domain_ops {
> > > > >  				    unsigned long quirks);
> > > > >
> > > > >  	void (*free)(struct iommu_domain *domain);
> > > > > +	int (*preserve)(struct iommu_domain *domain, struct iommu_domain_ser *ser);
> > > > > +	void (*unpreserve)(struct iommu_domain *domain, struct iommu_domain_ser *ser);
> > > > > +	int (*restore)(struct iommu_domain *domain, struct iommu_domain_ser *ser);
> > > > >  };
> > > > >
> > > > >  /**
> > > > > @@ -790,6 +813,8 @@ struct iommu_domain_ops {
> > > > >   * @singleton_group: Used internally for drivers that have only one group
> > > > >   * @max_pasids: number of supported PASIDs
> > > > >   * @ready: set once iommu_device_register() has completed successfully
> > > > > + * @outgoing_preserved_state: preserved iommu state of outgoing kernel for
> > > > > + *	liveupdate.
> > > > >   */
> > > > >  struct iommu_device {
> > > > >  	struct list_head list;
> > > > > @@ -799,6 +824,10 @@ struct iommu_device {
> > > > >  	struct iommu_group *singleton_group;
> > > > >  	u32 max_pasids;
> > > > >  	bool ready;
> > > > > +
> > > > > +#ifdef CONFIG_IOMMU_LIVEUPDATE
> > > > > +	struct iommu_ser *outgoing_preserved_state;
> > > > > +#endif
> > > > >  };
> > > > >
> > > > >  /**
> > > > > @@ -853,6 +882,9 @@ struct dev_iommu {
> > > > >  	u32 pci_32bit_workaround:1;
> > > > >  	u32 require_direct:1;
> > > > >  	u32 shadow_on_flush:1;
> > > > > +#ifdef CONFIG_IOMMU_LIVEUPDATE
> > > > > +	struct device_ser *device_ser;
> > > > > +#endif
> > > > >  };
> > > > >
> > > > >  int iommu_device_register(struct iommu_device *iommu,
> > > >
> > > > Thanks,
> > > > Praan
> > > >
> > > > [1] https://elixir.bootlin.com/linux/v7.0-rc3/source/kernel/liveupdate/kexec_handover.c#L1182