From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 13 Mar 2026 18:42:05 +0000
From: Samiullah Khawaja
To: Pranjal Shrivastava
Cc: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
	Jason Gunthorpe, Robin Murphy, Kevin Tian, Alex Williamson,
	Shuah Khan, iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, Saeed Mahameed, Adithya Jayachandran,
	Parav Pandit, Leon Romanovsky, William Tu, Pratyush Yadav,
	Pasha Tatashin, David Matlack, Andrew Morton, Chris Li,
	Vipin Sharma, YiFei Zhu
Subject: Re: [PATCH 02/14] iommu: Implement IOMMU core liveupdate skeleton
Message-ID:
References: <20260203220948.2176157-1-skhawaja@google.com>
 <20260203220948.2176157-3-skhawaja@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Disposition: inline
In-Reply-To:

On Thu, Mar 12, 2026 at 11:10:36PM +0000, Pranjal Shrivastava wrote:
>On Tue, Feb 03, 2026 at 10:09:36PM +0000, Samiullah Khawaja wrote:
>> Add IOMMU domain ops that can be implemented by the IOMMU drivers if
>> they support IOMMU domain preservation across liveupdate. The new IOMMU
>> domain preserve, unpreserve and restore APIs call these ops to perform
>> respective live update operations.
>>
>> Similarly add IOMMU ops to preserve/unpreserve a device. These can be
>> implemented by the IOMMU drivers that support preservation of devices
>> that have their IOMMU domains preserved. During device preservation the
>> state of the associated IOMMU is also preserved. The device can only be
>> preserved if the attached iommu domain is preserved and the associated
>> iommu supports preservation.
>>
>> The preserved state of the device and IOMMU needs to be fetched during
>> shutdown and boot in the next kernel. Add APIs that can be used to fetch
>> the preserved state of a device and IOMMU. The APIs will only be used
>> during shutdown and after liveupdate, so no locking is needed.
>>
>> Signed-off-by: Samiullah Khawaja
>> ---
>>  drivers/iommu/iommu.c      |   3 +
>>  drivers/iommu/liveupdate.c | 326 +++++++++++++++++++++++++++++++++++++
>>  include/linux/iommu-lu.h   | 119 ++++++++++++++
>>  include/linux/iommu.h      |  32 ++++
>>  4 files changed, 480 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 4926a43118e6..c0632cb5b570 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -389,6 +389,9 @@ static struct dev_iommu *dev_iommu_get(struct device *dev)
>>
>>  	mutex_init(&param->lock);
>>  	dev->iommu = param;
>> +#ifdef CONFIG_IOMMU_LIVEUPDATE
>> +	dev->iommu->device_ser = NULL;
>> +#endif
>>  	return param;
>>  }
>>
>> diff --git a/drivers/iommu/liveupdate.c b/drivers/iommu/liveupdate.c
>> index 6189ba32ff2c..83eb609b3fd7 100644
>> --- a/drivers/iommu/liveupdate.c
>> +++ b/drivers/iommu/liveupdate.c
>> @@ -11,6 +11,7 @@
>>  #include
>>  #include
>>  #include
>> +#include
>>  #include
>>
>>  static void iommu_liveupdate_restore_objs(u64 next)
>> @@ -175,3 +176,328 @@ int iommu_liveupdate_unregister_flb(struct liveupdate_file_handler *handler)
>>  	return liveupdate_unregister_flb(handler, &iommu_flb);
>>  }
>>  EXPORT_SYMBOL(iommu_liveupdate_unregister_flb);
>> +
>> +int iommu_for_each_preserved_device(iommu_preserved_device_iter_fn fn,
>> +				    void *arg)
>> +{
>> +	struct iommu_lu_flb_obj *obj;
>> +	struct devices_ser *devices;
>> +	int ret, i, idx;
>> +
>> +	ret = liveupdate_flb_get_incoming(&iommu_flb, (void **)&obj);
>> +	if (ret)
>> +		return -ENOENT;
>> +
>> +	devices = __va(obj->ser->devices_phys);
>> +	for (i = 0, idx = 0; i < obj->ser->nr_devices; ++i, ++idx) {
>> +		if (idx >= MAX_DEVICE_SERS) {
>> +			devices = __va(devices->objs.next_objs);
>> +			idx = 0;
>> +		}
>> +
>> +		if (devices->devices[idx].obj.deleted)
>> +			continue;
>> +
>> +		ret = fn(&devices->devices[idx], arg);
>> +		if (ret)
>> +			return ret;
>> +	}
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL(iommu_for_each_preserved_device);
>> +
>> +static inline bool device_ser_match(struct device_ser *match,
>> +				    struct pci_dev *pdev)
>> +{
>> +	return match->devid == pci_dev_id(pdev) && match->pci_domain == pci_domain_nr(pdev->bus);
>> +}
>> +
>> +struct device_ser *iommu_get_device_preserved_data(struct device *dev)
>> +{
>> +	struct iommu_lu_flb_obj *obj;
>> +	struct devices_ser *devices;
>> +	int ret, i, idx;
>> +
>> +	if (!dev_is_pci(dev))
>> +		return NULL;
>> +
>> +	ret = liveupdate_flb_get_incoming(&iommu_flb, (void **)&obj);
>> +	if (ret)
>> +		return NULL;
>> +
>> +	devices = __va(obj->ser->devices_phys);
>> +	for (i = 0, idx = 0; i < obj->ser->nr_devices; ++i, ++idx) {
>> +		if (idx >= MAX_DEVICE_SERS) {
>> +			devices = __va(devices->objs.next_objs);
>> +			idx = 0;
>> +		}
>> +
>> +		if (devices->devices[idx].obj.deleted)
>> +			continue;
>> +
>> +		if (device_ser_match(&devices->devices[idx], to_pci_dev(dev))) {
>> +			devices->devices[idx].obj.incoming = true;
>> +			return &devices->devices[idx];
>> +		}
>> +	}
>> +
>> +	return NULL;
>> +}
>> +EXPORT_SYMBOL(iommu_get_device_preserved_data);
>> +
>> +struct iommu_ser *iommu_get_preserved_data(u64 token, enum iommu_lu_type type)
>> +{
>> +	struct iommu_lu_flb_obj *obj;
>> +	struct iommus_ser *iommus;
>> +	int ret, i, idx;
>> +
>> +	ret = liveupdate_flb_get_incoming(&iommu_flb, (void **)&obj);
>> +	if (ret)
>> +		return NULL;
>> +
>> +	iommus = __va(obj->ser->iommus_phys);
>> +	for (i = 0, idx = 0; i < obj->ser->nr_iommus; ++i, ++idx) {
>> +		if (idx >= MAX_IOMMU_SERS) {
>> +			iommus = __va(iommus->objs.next_objs);
>> +			idx = 0;
>> +		}
>> +
>> +		if (iommus->iommus[idx].obj.deleted)
>> +			continue;
>> +
>> +		if (iommus->iommus[idx].token == token &&
>> +		    iommus->iommus[idx].type == type)
>> +			return &iommus->iommus[idx];
>> +	}
>> +
>> +	return NULL;
>> +}
>> +EXPORT_SYMBOL(iommu_get_preserved_data);
>> +
>> +static int reserve_obj_ser(struct iommu_objs_ser **objs_ptr, u64 max_objs)
>
>Isn't this more of an "insert" / "populate" / write_ser_entry? We can
>rename it to something like iommu_ser_push_entry / iommu_ser_write_entry

This is reserving an object in the objects array. The object is filled
once the reservation is done. Maybe I can call it alloc_obj_ser or
alloc_entry_ser.

>
>> +{
>> +	struct iommu_objs_ser *next_objs, *objs = *objs_ptr;
>
>Not loving these names :(
>
>TBH, the reserve_obj_ser function isn't too readable, esp. with all the
>variable names, here and in the lu header.. I've had to go back n forth
>to the first patch. For example, here next_objs can be next_objs_page &
>objs can be curr_page. (PTAL at my reply on PATCH 01 about renaming).

Basically with the current naming there are "serialization objects" in
"serialization object arrays". The object is a "base" type with
"inherited" types for iommu, domain etc. I will add an explanation of
all this in the ABI header. Maybe we can rename them to:

struct iommu_obj_ser;
struct iommu_obj_array_ser; or struct iommu_obj_ser_array;

Then for reserve/alloc, we use names like "next_obj_array" and
"curr_obj_array"?

>
>> +	int idx;
>> +
>> +	if (objs->nr_objs == max_objs) {
>> +		next_objs = kho_alloc_preserve(PAGE_SIZE);
>> +		if (IS_ERR(next_objs))
>> +			return PTR_ERR(next_objs);
>> +
>> +		objs->next_objs = virt_to_phys(next_objs);
>> +		objs = next_objs;
>> +		*objs_ptr = objs;
>> +		objs->nr_objs = 0;
>> +		objs->next_objs = 0;
>
>This seems redundant, no need to zero these out, kho_alloc_preserve
>passes __GFP_ZERO to folio_alloc [1], which should z-alloc the pages.

Agreed. Will remove this.
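To make the page-chain reservation concrete, here is a minimal userspace model of the scheme (names and sizes are hypothetical; calloc stands in for kho_alloc_preserve, a plain pointer stands in for the phys address, and an int entry stands in for the serialized object):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified model of one serialization-array page: a header followed by
 * a fixed-capacity entry array, chained to the next page via next_objs. */
#define MAX_OBJS 4

struct objs_page {
	uint64_t nr_objs;            /* entries used in this page */
	struct objs_page *next_objs; /* next page in the chain, or NULL */
	int entries[MAX_OBJS];       /* stand-in for iommu_domain_ser etc. */
};

/* Model of reserve_obj_ser(): if the current page is full, allocate a
 * fresh zeroed page (kho_alloc_preserve in the kernel), link it into the
 * chain, make it the current page, then hand out the next free slot
 * index in the current page. */
static int reserve_obj(struct objs_page **cur)
{
	struct objs_page *page = *cur;

	if (page->nr_objs == MAX_OBJS) {
		struct objs_page *next = calloc(1, sizeof(*next));

		if (!next)
			return -1;
		page->next_objs = next;
		*cur = next;
		page = next;
		/* calloc already zeroed nr_objs/next_objs */
	}
	return (int)page->nr_objs++;
}
```

With MAX_OBJS == 4, the fifth reservation links a second page and returns slot 0 of that page, while the first page stays full at nr_objs == 4.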
>
>> +	}
>> +
>> +	idx = objs->nr_objs++;
>> +	return idx;
>> +}
>
>Just to give a mental model to fellow reviewers, here's how this is laid
>out:
>
>---------------------------------------------------------------------
>[ PAGE START ]                                                      |
>---------------------------------------------------------------------
>| iommu_objs_ser (The Page Header)                                 |
>|   - next_objs: 0x0000 (End of the page-chain)                    |
>|   - nr_objs: 2                                                   |
>---------------------------------------------------------------------
>| ITEM 0: iommu_domain_ser                                         |
>|   [ iommu_obj_ser (The entry header) ]                           |
>|     - idx: 0                                                     |
>|     - ref_count: 1                                               |
>|     - deleted: 0                                                 |
>|   [ Domain Data ]                                                |
>---------------------------------------------------------------------
>| ITEM 1: iommu_domain_ser                                         |
>|   [ iommu_obj_ser (The entry header) ]                           |
>|     - idx: 1                                                     |
>|     - ref_count: 1                                               |
>|     - deleted: 0                                                 |
>|   [ Domain Data ]                                                |
>---------------------------------------------------------------------
>| ... (Empty space for more domains) ...                           |
>|                                                                  |
>---------------------------------------------------------------------
>[ PAGE END ]                                                        |
>---------------------------------------------------------------------

+1 Will add a table in the header as you suggested.

>
>> +
>> +int iommu_domain_preserve(struct iommu_domain *domain, struct iommu_domain_ser **ser)
>> +{
>> +	struct iommu_domain_ser *domain_ser;
>> +	struct iommu_lu_flb_obj *flb_obj;
>> +	int idx, ret;
>> +
>> +	if (!domain->ops->preserve)
>> +		return -EOPNOTSUPP;
>> +
>> +	ret = liveupdate_flb_get_outgoing(&iommu_flb, (void **)&flb_obj);
>> +	if (ret)
>> +		return ret;
>> +
>> +	guard(mutex)(&flb_obj->lock);
>> +	idx = reserve_obj_ser((struct iommu_objs_ser **)&flb_obj->iommu_domains,
>> +			      MAX_IOMMU_DOMAIN_SERS);
>> +	if (idx < 0)
>> +		return idx;
>> +
>> +	domain_ser = &flb_obj->iommu_domains->iommu_domains[idx];
>
>This is slightly less-readable as well, I understand we're trying to:
>
>iommu_domains_ser -> iommu_domain_ser[idx] but the same name
>(iommu_domains) makes it difficult to read.. we should rename this as:

Agreed. I will update it to,

&flb_obj->curr_iommu_domains->domain_array[idx]

>
>&flb_obj->iommu_domains_page->domain_entries[idx] or something for
>better readability..
>
>Also, let's add a comment explaining that reserve_obj_ser actually
>advances the flb_obj ptr when necessary..

Agreed.

>
>IIUC, we start with PAGE 0 initially, store its phys in the
>iommu_flb_preserve op (the iommu_ser_phys et al) & then we go on
>alloc-ing more pages and keep storing the "current" active page with the
>liveupdate core. Now when we jump into the new kernel, we read the
>ser_phys and then follow the page chain, right?

Yes, we do an initial allocation of arrays for each object type and then
later allocate more as needed. The flb_obj holds the address of the
currently active array. I will add this explanation in the header.

>
>> +	idx = flb_obj->ser->nr_domains++;
>> +	domain_ser->obj.idx = idx;
>> +	domain_ser->obj.ref_count = 1;
>> +
>> +	ret = domain->ops->preserve(domain, domain_ser);
>> +	if (ret) {
>> +		domain_ser->obj.deleted = true;
>> +		return ret;
>> +	}
>> +
>> +	domain->preserved_state = domain_ser;
>> +	*ser = domain_ser;
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_domain_preserve);
>> +
>> +void iommu_domain_unpreserve(struct iommu_domain *domain)
>> +{
>> +	struct iommu_domain_ser *domain_ser;
>> +	struct iommu_lu_flb_obj *flb_obj;
>> +	int ret;
>> +
>> +	if (!domain->ops->unpreserve)
>> +		return;
>> +
>> +	ret = liveupdate_flb_get_outgoing(&iommu_flb, (void **)&flb_obj);
>> +	if (ret)
>> +		return;
>> +
>> +	guard(mutex)(&flb_obj->lock);
>> +
>> +	/*
>> +	 * There is no check for attached devices here. The correctness relies
>> +	 * on the Live Update Orchestrator's session lifecycle. All resources
>> +	 * (iommufd, vfio devices) are preserved within a single session. If the
>> +	 * session is torn down, the .unpreserve callbacks for all files will be
>> +	 * invoked, ensuring a consistent cleanup without needing explicit
>> +	 * refcounting for the serialized objects here.
>> +	 */
>> +	domain_ser = domain->preserved_state;
>> +	domain->ops->unpreserve(domain, domain_ser);
>> +	domain_ser->obj.deleted = true;
>> +	domain->preserved_state = NULL;
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_domain_unpreserve);
>> +
>> +static int iommu_preserve_locked(struct iommu_device *iommu)
>> +{
>> +	struct iommu_lu_flb_obj *flb_obj;
>> +	struct iommu_ser *iommu_ser;
>> +	int idx, ret;
>> +
>> +	if (!iommu->ops->preserve)
>> +		return -EOPNOTSUPP;
>> +
>> +	if (iommu->outgoing_preserved_state) {
>> +		iommu->outgoing_preserved_state->obj.ref_count++;
>> +		return 0;
>> +	}
>> +
>> +	ret = liveupdate_flb_get_outgoing(&iommu_flb, (void **)&flb_obj);
>> +	if (ret)
>> +		return ret;
>> +
>> +	idx = reserve_obj_ser((struct iommu_objs_ser **)&flb_obj->iommus,
>> +			      MAX_IOMMU_SERS);
>> +	if (idx < 0)
>> +		return idx;
>> +
>> +	iommu_ser = &flb_obj->iommus->iommus[idx];
>> +	idx = flb_obj->ser->nr_iommus++;
>> +	iommu_ser->obj.idx = idx;
>> +	iommu_ser->obj.ref_count = 1;
>> +
>> +	ret = iommu->ops->preserve(iommu, iommu_ser);
>> +	if (ret)
>> +		iommu_ser->obj.deleted = true;
>> +
>> +	iommu->outgoing_preserved_state = iommu_ser;
>> +	return ret;
>> +}
>> +
>> +static void iommu_unpreserve_locked(struct iommu_device *iommu)
>> +{
>> +	struct iommu_ser *iommu_ser = iommu->outgoing_preserved_state;
>> +
>> +	iommu_ser->obj.ref_count--;
>> +	if (iommu_ser->obj.ref_count)
>> +		return;
>> +
>> +	iommu->outgoing_preserved_state = NULL;
>> +	iommu->ops->unpreserve(iommu, iommu_ser);
>> +	iommu_ser->obj.deleted = true;
>> +}
>> +
>> +int iommu_preserve_device(struct iommu_domain *domain,
>> +			  struct device *dev, u64 token)
>> +{
>> +	struct iommu_lu_flb_obj *flb_obj;
>> +	struct device_ser *device_ser;
>> +	struct dev_iommu *iommu;
>> +	struct pci_dev *pdev;
>> +	int ret, idx;
>> +
>> +	if (!dev_is_pci(dev))
>> +		return -EOPNOTSUPP;
>> +
>> +	if (!domain->preserved_state)
>> +		return -EINVAL;
>> +
>> +	pdev = to_pci_dev(dev);
>> +	iommu = dev->iommu;
>> +	if (!iommu->iommu_dev->ops->preserve_device ||
>> +	    !iommu->iommu_dev->ops->preserve)
>> +		return -EOPNOTSUPP;
>> +
>> +	ret = liveupdate_flb_get_outgoing(&iommu_flb, (void **)&flb_obj);
>> +	if (ret)
>> +		return ret;
>> +
>> +	guard(mutex)(&flb_obj->lock);
>> +	idx = reserve_obj_ser((struct iommu_objs_ser **)&flb_obj->devices,
>> +			      MAX_DEVICE_SERS);
>> +	if (idx < 0)
>> +		return idx;
>> +
>> +	device_ser = &flb_obj->devices->devices[idx];
>> +	idx = flb_obj->ser->nr_devices++;
>> +	device_ser->obj.idx = idx;
>> +	device_ser->obj.ref_count = 1;
>> +
>> +	ret = iommu_preserve_locked(iommu->iommu_dev);
>> +	if (ret) {
>> +		device_ser->obj.deleted = true;
>> +		return ret;
>> +	}
>> +
>> +	device_ser->domain_iommu_ser.domain_phys = __pa(domain->preserved_state);
>> +	device_ser->domain_iommu_ser.iommu_phys = __pa(iommu->iommu_dev->outgoing_preserved_state);
>> +	device_ser->devid = pci_dev_id(pdev);
>> +	device_ser->pci_domain = pci_domain_nr(pdev->bus);
>> +	device_ser->token = token;
>> +
>> +	ret = iommu->iommu_dev->ops->preserve_device(dev, device_ser);
>> +	if (ret) {
>> +		device_ser->obj.deleted = true;
>> +		iommu_unpreserve_locked(iommu->iommu_dev);
>> +		return ret;
>> +	}
>> +
>> +	dev->iommu->device_ser = device_ser;
>> +	return 0;
>> +}
>> +
>> +void iommu_unpreserve_device(struct iommu_domain *domain, struct device *dev)
>> +{
>> +	struct iommu_lu_flb_obj *flb_obj;
>> +	struct device_ser *device_ser;
>> +	struct dev_iommu *iommu;
>> +	struct pci_dev *pdev;
>> +	int ret;
>> +
>> +	if (!dev_is_pci(dev))
>> +		return;
>> +
>> +	pdev = to_pci_dev(dev);
>> +	iommu = dev->iommu;
>> +	if (!iommu->iommu_dev->ops->unpreserve_device ||
>> +	    !iommu->iommu_dev->ops->unpreserve)
>> +		return;
>> +
>> +	ret = liveupdate_flb_get_outgoing(&iommu_flb, (void **)&flb_obj);
>> +	if (WARN_ON(ret))
>> +		return;
>> +
>> +	guard(mutex)(&flb_obj->lock);
>> +	device_ser = dev_iommu_preserved_state(dev);
>> +	if (WARN_ON(!device_ser))
>> +		return;
>> +
>> +	iommu->iommu_dev->ops->unpreserve_device(dev, device_ser);
>> +	dev->iommu->device_ser = NULL;
>> +
>> +	iommu_unpreserve_locked(iommu->iommu_dev);
>> +}
>
>I'm wondering if we should guard these APIs against accidental or
>potential abuse by other in-kernel drivers, since the Live Update
>Orchestrator (LUO) is architecturally designed around user-space
>driven sequences (IOCTLs, specific mutex ordering, etc.), for now.
>
>Since the header file is also under include/linux, we should avoid any
>possibility where we end up having drivers depending on these APIs.
>We could add a check based on the dma owner:
>
>+	/* Only devices explicitly claimed by a user-space driver
>+	 * (VFIO/IOMMUFD) are eligible for Live Update preservation.
>+	 */
>+	if (!iommu_dma_owner_claimed(dev))
>+		return -EPERM;
>
>This should ensure we aren't creating 'zombie' preserved states for
>devices not managed by IOMMUFD/VFIO.

Agreed. I will update this.
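The per-IOMMU sharing above (first device serializes the IOMMU state, later devices only bump the refcount, the last unpreserve tombstones it) can be sketched as a small userspace model (hypothetical names; a static object stands in for the slot reserved in the ser array):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of the obj header fields used by iommu_preserve_locked()/
 * iommu_unpreserve_locked(). */
struct ser_obj {
	int ref_count;
	bool deleted;
};

struct model_iommu {
	struct ser_obj *preserved; /* outgoing_preserved_state */
};

static struct ser_obj state;       /* stand-in for a ser-array slot */

/* First caller serializes the IOMMU state once; subsequent callers for
 * devices behind the same IOMMU just share it via the refcount. */
static int model_preserve(struct model_iommu *iommu)
{
	if (iommu->preserved) {
		iommu->preserved->ref_count++;
		return 0;
	}
	state.ref_count = 1;
	state.deleted = false;
	iommu->preserved = &state;
	return 0;
}

/* Drop one reference; only the last user tombstones the entry so the
 * next kernel skips it when walking the preserved arrays. */
static void model_unpreserve(struct model_iommu *iommu)
{
	struct ser_obj *obj = iommu->preserved;

	if (--obj->ref_count)
		return;
	iommu->preserved = NULL;
	obj->deleted = true;
}
```

Two preserves followed by two unpreserves leave the state intact after the first unpreserve and deleted only after the second, matching the lifecycle described in the patch.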
>
>> diff --git a/include/linux/iommu-lu.h b/include/linux/iommu-lu.h
>> index 59095d2f1bb2..48c07514a776 100644
>> --- a/include/linux/iommu-lu.h
>> +++ b/include/linux/iommu-lu.h
>> @@ -8,9 +8,128 @@
>>  #ifndef _LINUX_IOMMU_LU_H
>>  #define _LINUX_IOMMU_LU_H
>>
>> +#include
>> +#include
>>  #include
>>  #include
>>
>> +typedef int (*iommu_preserved_device_iter_fn)(struct device_ser *ser,
>> +					      void *arg);
>> +#ifdef CONFIG_IOMMU_LIVEUPDATE
>> +static inline void *dev_iommu_preserved_state(struct device *dev)
>> +{
>> +	struct device_ser *ser;
>> +
>> +	if (!dev->iommu)
>> +		return NULL;
>> +
>> +	ser = dev->iommu->device_ser;
>> +	if (ser && !ser->obj.incoming)
>> +		return ser;
>> +
>> +	return NULL;
>> +}
>> +
>> +static inline void *dev_iommu_restored_state(struct device *dev)
>> +{
>> +	struct device_ser *ser;
>> +
>> +	if (!dev->iommu)
>> +		return NULL;
>> +
>> +	ser = dev->iommu->device_ser;
>> +	if (ser && ser->obj.incoming)
>> +		return ser;
>> +
>> +	return NULL;
>> +}
>> +
>> +static inline void *iommu_domain_restored_state(struct iommu_domain *domain)
>> +{
>> +	struct iommu_domain_ser *ser;
>> +
>> +	ser = domain->preserved_state;
>> +	if (ser && ser->obj.incoming)
>> +		return ser;
>> +
>> +	return NULL;
>> +}
>> +
>> +static inline int dev_iommu_restore_did(struct device *dev, struct iommu_domain *domain)
>> +{
>> +	struct device_ser *ser = dev_iommu_restored_state(dev);
>> +
>> +	if (ser && iommu_domain_restored_state(domain))
>> +		return ser->domain_iommu_ser.did;
>> +
>> +	return -1;
>> +}
>> +
>> +int iommu_for_each_preserved_device(iommu_preserved_device_iter_fn fn,
>> +				    void *arg);
>> +struct device_ser *iommu_get_device_preserved_data(struct device *dev);
>> +struct iommu_ser *iommu_get_preserved_data(u64 token, enum iommu_lu_type type);
>> +int iommu_domain_preserve(struct iommu_domain *domain, struct iommu_domain_ser **ser);
>> +void iommu_domain_unpreserve(struct iommu_domain *domain);
>> +int iommu_preserve_device(struct iommu_domain *domain,
>> +			  struct device *dev, u64 token);
>> +void iommu_unpreserve_device(struct iommu_domain *domain, struct device *dev);
>> +#else
>> +static inline void *dev_iommu_preserved_state(struct device *dev)
>> +{
>> +	return NULL;
>> +}
>> +
>> +static inline void *dev_iommu_restored_state(struct device *dev)
>> +{
>> +	return NULL;
>> +}
>> +
>> +static inline int dev_iommu_restore_did(struct device *dev, struct iommu_domain *domain)
>> +{
>> +	return -1;
>> +}
>> +
>> +static inline void *iommu_domain_restored_state(struct iommu_domain *domain)
>> +{
>> +	return NULL;
>> +}
>> +
>> +static inline int iommu_for_each_preserved_device(iommu_preserved_device_iter_fn fn, void *arg)
>> +{
>> +	return -EOPNOTSUPP;
>> +}
>> +
>> +static inline struct device_ser *iommu_get_device_preserved_data(struct device *dev)
>> +{
>> +	return NULL;
>> +}
>> +
>> +static inline struct iommu_ser *iommu_get_preserved_data(u64 token, enum iommu_lu_type type)
>> +{
>> +	return NULL;
>> +}
>> +
>> +static inline int iommu_domain_preserve(struct iommu_domain *domain, struct iommu_domain_ser **ser)
>> +{
>> +	return -EOPNOTSUPP;
>> +}
>> +
>> +static inline void iommu_domain_unpreserve(struct iommu_domain *domain)
>> +{
>> +}
>> +
>> +static inline int iommu_preserve_device(struct iommu_domain *domain,
>> +					struct device *dev, u64 token)
>> +{
>> +	return -EOPNOTSUPP;
>> +}
>> +
>> +static inline void iommu_unpreserve_device(struct iommu_domain *domain, struct device *dev)
>> +{
>> +}
>> +#endif
>> +
>>  int iommu_liveupdate_register_flb(struct liveupdate_file_handler *handler);
>>  int iommu_liveupdate_unregister_flb(struct liveupdate_file_handler *handler);
>>
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 54b8b48c762e..bd949c1ce7c5 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -14,6 +14,8 @@
>>  #include
>>  #include
>>  #include
>> +#include
>> +#include
>>  #include
>>
>>  #define IOMMU_READ	(1 << 0)
>> @@ -248,6 +250,10 @@ struct iommu_domain {
>>  		struct list_head next;
>>  	};
>>  };
>> +
>> +#ifdef CONFIG_IOMMU_LIVEUPDATE
>> +	struct iommu_domain_ser *preserved_state;
>> +#endif
>>  };
>>
>>  static inline bool iommu_is_dma_domain(struct iommu_domain *domain)
>> @@ -647,6 +653,10 @@ __iommu_copy_struct_to_user(const struct iommu_user_data *dst_data,
>>   *                resources shared/passed to user space IOMMU instance. Associate
>>   *                it with a nesting @parent_domain. It is required for driver to
>>   *                set @viommu->ops pointing to its own viommu_ops
>> + * @preserve_device: Preserve state of a device for liveupdate.
>> + * @unpreserve_device: Unpreserve state that was preserved earlier.
>> + * @preserve: Preserve state of iommu translation hardware for liveupdate.
>> + * @unpreserve: Unpreserve state of iommu that was preserved earlier.
>>   * @owner: Driver module providing these ops
>>   * @identity_domain: An always available, always attachable identity
>>   *                   translation.
>> @@ -703,6 +713,11 @@ struct iommu_ops {
>>  				  struct iommu_domain *parent_domain,
>>  				  const struct iommu_user_data *user_data);
>>
>> +	int (*preserve_device)(struct device *dev, struct device_ser *device_ser);
>> +	void (*unpreserve_device)(struct device *dev, struct device_ser *device_ser);
>
>Nit: Let's move the _device ops under the comment:
>`/* Per device IOMMU features */`

I will move these.

>
>> +	int (*preserve)(struct iommu_device *iommu, struct iommu_ser *iommu_ser);
>> +	void (*unpreserve)(struct iommu_device *iommu, struct iommu_ser *iommu_ser);
>> +
>
>I'm wondering if there's any benefit to add these ops under a #ifdef ?

These should be NULL when liveupdate is disabled, so a #ifdef should not
be needed. Not sure how much space we save if we don't define these. I
will move these under #ifdef anyway.

>
>>  	const struct iommu_domain_ops *default_domain_ops;
>>  	struct module *owner;
>>  	struct iommu_domain *identity_domain;
>> @@ -749,6 +764,11 @@ struct iommu_ops {
>>   *                       specific mechanisms.
>>   * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*)
>>   * @free: Release the domain after use.
>> + * @preserve: Preserve the iommu domain for liveupdate.
>> + *            Returns 0 on success, a negative errno on failure.
>> + * @unpreserve: Unpreserve the iommu domain that was preserved earlier.
>> + * @restore: Restore the iommu domain after liveupdate.
>> + *           Returns 0 on success, a negative errno on failure.
>>   */
>>  struct iommu_domain_ops {
>>  	int (*attach_dev)(struct iommu_domain *domain, struct device *dev,
>> @@ -779,6 +799,9 @@ struct iommu_domain_ops {
>>  				    unsigned long quirks);
>>
>>  	void (*free)(struct iommu_domain *domain);
>> +	int (*preserve)(struct iommu_domain *domain, struct iommu_domain_ser *ser);
>> +	void (*unpreserve)(struct iommu_domain *domain, struct iommu_domain_ser *ser);
>> +	int (*restore)(struct iommu_domain *domain, struct iommu_domain_ser *ser);
>>  };
>>
>>  /**
>> @@ -790,6 +813,8 @@ struct iommu_domain_ops {
>>   * @singleton_group: Used internally for drivers that have only one group
>>   * @max_pasids: number of supported PASIDs
>>   * @ready: set once iommu_device_register() has completed successfully
>> + * @outgoing_preserved_state: preserved iommu state of outgoing kernel for
>> + *                            liveupdate.
>>   */
>>  struct iommu_device {
>>  	struct list_head list;
>> @@ -799,6 +824,10 @@ struct iommu_device {
>>  	struct iommu_group *singleton_group;
>>  	u32 max_pasids;
>>  	bool ready;
>> +
>> +#ifdef CONFIG_IOMMU_LIVEUPDATE
>> +	struct iommu_ser *outgoing_preserved_state;
>> +#endif
>>  };
>>
>>  /**
>> @@ -853,6 +882,9 @@ struct dev_iommu {
>>  	u32 pci_32bit_workaround:1;
>>  	u32 require_direct:1;
>>  	u32 shadow_on_flush:1;
>> +#ifdef CONFIG_IOMMU_LIVEUPDATE
>> +	struct device_ser *device_ser;
>> +#endif
>>  };
>>
>>  int iommu_device_register(struct iommu_device *iommu,
>
>Thanks,
>Praan
>
>[1] https://elixir.bootlin.com/linux/v7.0-rc3/source/kernel/liveupdate/kexec_handover.c#L1182