From: Bjorn Helgaas <helgaas@kernel.org>
To: David Matlack <dmatlack@google.com>
Cc: "Alex Williamson" <alex@shazbot.org>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Adithya Jayachandran" <ajayachandra@nvidia.com>,
"Alexander Graf" <graf@amazon.com>,
"Alex Mastro" <amastro@fb.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Ankit Agrawal" <ankita@nvidia.com>,
"Arnd Bergmann" <arnd@arndb.de>,
"Askar Safin" <safinaskar@gmail.com>,
"Borislav Petkov (AMD)" <bp@alien8.de>,
"Chris Li" <chrisl@kernel.org>,
"Dapeng Mi" <dapeng1.mi@linux.intel.com>,
"David Rientjes" <rientjes@google.com>,
"Feng Tang" <feng.tang@linux.alibaba.com>,
"Jacob Pan" <jacob.pan@linux.microsoft.com>,
"Jason Gunthorpe" <jgg@nvidia.com>,
"Jason Gunthorpe" <jgg@ziepe.ca>,
"Jonathan Corbet" <corbet@lwn.net>,
"Josh Hilke" <jrhilke@google.com>, "Kees Cook" <kees@kernel.org>,
"Kevin Tian" <kevin.tian@intel.com>,
kexec@lists.infradead.org, kvm@vger.kernel.org,
"Leon Romanovsky" <leon@kernel.org>,
"Leon Romanovsky" <leonro@nvidia.com>,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-kselftest@vger.kernel.org, linux-mm@kvack.org,
linux-pci@vger.kernel.org, "Li RongQing" <lirongqing@baidu.com>,
"Lukas Wunner" <lukas@wunner.de>,
"Marco Elver" <elver@google.com>,
"Michał Winiarski" <michal.winiarski@intel.com>,
"Mike Rapoport" <rppt@kernel.org>,
"Parav Pandit" <parav@nvidia.com>,
"Pasha Tatashin" <pasha.tatashin@soleen.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
"Pawan Gupta" <pawan.kumar.gupta@linux.intel.com>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
"Pranjal Shrivastava" <praan@google.com>,
"Pratyush Yadav" <pratyush@kernel.org>,
"Raghavendra Rao Ananta" <rananta@google.com>,
"Randy Dunlap" <rdunlap@infradead.org>,
"Rodrigo Vivi" <rodrigo.vivi@intel.com>,
"Saeed Mahameed" <saeedm@nvidia.com>,
"Samiullah Khawaja" <skhawaja@google.com>,
"Shuah Khan" <skhan@linuxfoundation.org>,
"Vipin Sharma" <vipinsh@google.com>,
"Vivek Kasireddy" <vivek.kasireddy@intel.com>,
"William Tu" <witu@nvidia.com>, "Yi Liu" <yi.l.liu@intel.com>,
"Zhu Yanjun" <yanjun.zhu@linux.dev>
Subject: Re: [PATCH v3 02/24] PCI: Add API to track PCI devices preserved across Live Update
Date: Mon, 30 Mar 2026 17:54:37 -0500 [thread overview]
Message-ID: <20260330225437.GA111390@bhelgaas> (raw)
In-Reply-To: <acWne_ZCcF4YQN25@google.com>
On Thu, Mar 26, 2026 at 09:39:07PM +0000, David Matlack wrote:
> On 2026-03-25 06:12 PM, Bjorn Helgaas wrote:
> > On Mon, Mar 23, 2026 at 11:57:54PM +0000, David Matlack wrote:
> > > Add an API to enable the PCI subsystem to participate in a Live Update
> > > and track all devices that are being preserved by drivers. Since this
> > > support is still under development, hide it behind a new Kconfig
> > > PCI_LIVEUPDATE that is marked experimental.
> ...
> > This sets "dev->liveupdate_incoming = false", and the only place we
> > check that is in pci_liveupdate_retrieve(). In particular, there's
> > nothing in the driver bind/unbind paths that seems related. I guess
> > pci_liveupdate_finish() just means the driver can't call
> > pci_liveupdate_retrieve() any more?
>
> liveupdate_incoming is used by VFIO in patch 10:
>
> https://lore.kernel.org/kvm/20260323235817.1960573-11-dmatlack@google.com/
>
> Fundamentally, I think drivers will need to know that the device they
> are dealing with was preserved across the Live Update so they can react
> accordingly and this is how they know. This feels like an appropriate
> responsibility to delegate to the PCI core since it can be common across
> all PCI devices, rather than requiring drivers to store their own state
> about which devices were preserved. I suspect PCI core will also use
> liveupdate_incoming in the future (e.g. to avoid assigning new BARs) as
> we implement more of the device preservation.
Yes. It's easier to review if this is added at the point where it is
used.
> And in case you are also wondering about liveupdate_outgoing, I forsee
> that being used for things like skipping disabling bus mastering in
> pci_device_shutdown().
>
> I think it would be a good idea to try to split this patch up, so there
> is more breathing room to explain this context in the commit messages.
Sounds good.
> > > + * Don't both accounting for VFs that could be created after this
> > > + * since preserving VFs is not supported yet. Also don't account
> > > + * for devices that could be hot-plugged after this since preserving
> > > + * hot-plugged devices across Live Update is not yet an expected
> > > + * use-case.
> >
> > s/Don't both accounting/Don't bother accounting/ ? not sure of intent
>
> "Don't bother" was the intent.
>
> > I suspect the important thing here is that this allocates space for
> > preserving X devices, and each subsequent pci_liveupdate_preserve()
> > call from a driver uses up one of those slots.
> >
> > My guess is this is just an allocation issue and from that point of
> > view there's no actual problem with enabling VFs or hot-adding devices
> > after this point; it's just that pci_liveupdate_preserve() will fail
> > after X calls.
>
> Yes that is correct.
Mentioning VFs in the comment is a slight misdirection when the actual
concern is just about the number of devices.
> I see that a lot of your comments are about these WARN_ONs so do you
> have any general guidance on how I should be handling them?
If it's practical to arrange it so we dereference a NULL pointer or
similar, that's my preference because it doesn't take extra code and
it's impossible to ignore. Sometimes people add "if (!ptr) return
-EINVAL;" to internal functions where "ptr" should never be NULL. IMO
cases like that should just use assume "ptr" is valid and use it.
Likely not a practical strategy in your case.
> > > + if (pci_WARN_ONCE(dev, !dev_ser, "Cannot find preserved device!"))
> >
> > Seems like an every-time sort of message if this indicates a driver bug?
> >
> > It's enough of a hassle to convince myself that pci_WARN_ONCE()
> > returns the value that caused the warning that I would prefer:
> >
> > if (!dev_ser) {
> > pci_warn(...) or pci_WARN_ONCE(...)
> > return;
> > }
>
> For "this should really never happen" warnings, which is the case here,
> my preference is to use WARN_ON_ONCE() since you only need to see it
> happen once to know there is a bug somewhere, and logging every time can
> lead to overwhelmingly interleaved logs if it happens too many times.
I'm objecting more to using the return value of pci_WARN_ONCE() than
the warning itself. It's not really obvious what WARN_ONCE() should
return and kind of a hassle to figure it out, so I think it's clearer
in this case to test dev_ser directly.
> > > + for (i = ser->nr_devices; i > 0; i--) {
> > > + struct pci_dev_ser *prev = &ser->devices[i - 1];
> > > + int cmp = pci_dev_ser_cmp(&new, prev);
> > > +
> > > + /*
> > > + * This should never happen unless there is a kernel bug or
> > > + * corruption that causes the state in struct pci_ser to get out
> > > + * of sync with struct pci_dev.
> >
> > Huh. Same comment as above. I don't think this is telling me
> > anything useful. I guess what happened is we're trying to preserve X
> > and X is already in "ser", but we should have returned -EBUSY above
> > for that case. If we're just saying memory corruption could cause
> > bugs, I think that's pointless.
> >
> > Actually I'm not even sure we should check for this.
> >
> > > + */
> > > + if (WARN_ON_ONCE(!cmp))
> > > + return -EBUSY;
>
> This is another "this should really never happen" check. I could just
> return without warning but this is a sign that something is very wrong
> somewhere in the kernel and it is trivial to just add WARN_ON_ONCE() so
> that it gets flagged in dmesg. In my experience that can be very helpful
> to track down logic bugs during developemt and rare race conditions at
> scale in production environments.
OK. Maybe just remove the comment. It's self-evident that
WARN_ON_ONCE() is a "shouldn't happen" situation, and I don't think
the comment contains useful information.
> > > +u32 pci_liveupdate_incoming_nr_devices(void)
> > > +{
> > > + struct pci_ser *ser;
> > > +
> > > + if (pci_liveupdate_flb_get_incoming(&ser))
> > > + return 0;
> >
> > Seems slightly overcomplicated to return various error codes from
> > pci_liveupdate_flb_get_incoming(), only to throw them away here and
> > special-case the "return 0". I think you *could* set
> > "ser->nr_devices" to zero at entry to
> > pci_liveupdate_flb_get_incoming() and make this just:
> >
> > pci_liveupdate_flb_get_incoming(&ser);
> > return ser->nr_devices;
>
> pci_liveupdate_flb_get_incoming() fetches the preserved pci_ser struct
> from LUO (the struct that the previous kernel allocated and populated).
> If pci_liveupdate_flb_get_incoming() returns an error, it means there
> was no struct pci_ser preserved by the previous kernel (or at least not
> that the current kernel is compatible with), so we return 0 here to
> indicate that 0 devices were preserved.
Right. Here's what I was thinking:
pci_liveupdate_flb_get_incoming(...)
{
struct pci_ser *ser = *serp;
ser->nr_devices = 0;
ret = liveupdate_flb_get_incoming(...);
...
if (ret == -ENOENT) {
pr_info_once("PCI: No incoming FLB data detected during Live Update");
return;
}
WARN_ONCE(ret, "PCI: Failed to retrieve incoming ...");
}
u32 pci_liveupdate_incoming_nr_devices(void)
{
pci_liveupdate_flb_get_incoming(&ser);
return ser->nr_devices;
}
> > > +++ b/include/linux/kho/abi/pci.h
> >
> > It seems like most of include/linux/ is ABI, so does kho/abi/ need to
> > be separated out in its own directory?
>
> include/linux/kho/abi/ contains all of the structs, enums, etc. that are
> handed off between kernels during a Live Update. If almost anything
> changes in this directory, it breaks our ability to upgrade/downgrade
> via Live Update. That's why it's split off into its own directory.
>
> include/linux/ is not part of the Live Update ABI. Changes to those
> headers to not affect our ability to upgrade/downgrade via Live Update.
>
> > It's kind of unusual for the hierarchy to be this deep, especially
> > since abi/ is the only thing in include/linux/kho/.
>
> Yes I agree, but that is outside the scope of this patchset I think.
> This directory already exists.
Agreed.
Bjorn
next prev parent reply other threads:[~2026-03-30 22:54 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-23 23:57 [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
2026-03-23 23:57 ` [PATCH v3 01/24] liveupdate: Export symbols needed by modules David Matlack
2026-03-23 23:57 ` [PATCH v3 02/24] PCI: Add API to track PCI devices preserved across Live Update David Matlack
2026-03-25 20:06 ` David Matlack
2026-03-25 23:12 ` Bjorn Helgaas
2026-03-26 21:39 ` David Matlack
2026-03-30 22:54 ` Bjorn Helgaas [this message]
2026-03-31 17:33 ` Samiullah Khawaja
2026-04-02 21:28 ` Yanjun.Zhu
2026-04-03 17:24 ` Chris Li
2026-04-03 21:58 ` David Matlack
2026-04-05 16:56 ` Zhu Yanjun
2026-03-23 23:57 ` [PATCH v3 03/24] PCI: Require Live Update preserved devices are in singleton iommu_groups David Matlack
2026-03-24 13:07 ` Yi Liu
2026-03-24 18:00 ` David Matlack
2026-03-25 11:12 ` Yi Liu
2026-03-25 17:29 ` David Matlack
2026-03-25 23:13 ` Bjorn Helgaas
2026-03-23 23:57 ` [PATCH v3 04/24] PCI: Inherit bus numbers from previous kernel during Live Update David Matlack
2026-03-23 23:57 ` [PATCH v3 05/24] docs: liveupdate: Add documentation for PCI David Matlack
2026-03-23 23:57 ` [PATCH v3 06/24] vfio/pci: Register a file handler with Live Update Orchestrator David Matlack
2026-03-24 13:07 ` Yi Liu
2026-03-24 16:33 ` David Matlack
2026-03-23 23:57 ` [PATCH v3 07/24] vfio/pci: Preserve vfio-pci device files across Live Update David Matlack
2026-03-24 13:08 ` Yi Liu
2026-03-24 16:46 ` David Matlack
2026-03-27 23:39 ` Samiullah Khawaja
2026-03-23 23:58 ` [PATCH v3 08/24] vfio/pci: Retrieve preserved device files after " David Matlack
2026-03-24 13:08 ` Yi Liu
2026-03-24 17:05 ` David Matlack
2026-03-23 23:58 ` [PATCH v3 09/24] vfio/pci: Notify PCI subsystem about devices preserved across " David Matlack
2026-03-23 23:58 ` [PATCH v3 10/24] vfio: Enforce preserved devices are retrieved via LIVEUPDATE_SESSION_RETRIEVE_FD David Matlack
2026-03-23 23:58 ` [PATCH v3 11/24] vfio/pci: Store incoming Live Update state in struct vfio_pci_core_device David Matlack
2026-03-23 23:58 ` [PATCH v3 12/24] vfio/pci: Skip reset of preserved device after Live Update David Matlack
2026-03-23 23:58 ` [PATCH v3 13/24] docs: liveupdate: Add documentation for VFIO PCI David Matlack
2026-03-23 23:58 ` [PATCH v3 14/24] selftests/liveupdate: Move luo_test_utils.* into a reusable library David Matlack
2026-03-23 23:58 ` [PATCH v3 15/24] selftests/liveupdate: Add helpers to preserve/retrieve FDs David Matlack
2026-03-23 23:58 ` [PATCH v3 16/24] vfio: selftests: Build liveupdate library in VFIO selftests David Matlack
2026-03-23 23:58 ` [PATCH v3 17/24] vfio: selftests: Add Makefile support for TEST_GEN_PROGS_EXTENDED David Matlack
2026-03-23 23:58 ` [PATCH v3 18/24] vfio: selftests: Add vfio_pci_liveupdate_uapi_test David Matlack
2026-03-23 23:58 ` [PATCH v3 19/24] vfio: selftests: Initialize vfio_pci_device using a VFIO cdev FD David Matlack
2026-03-23 23:58 ` [PATCH v3 20/24] vfio: selftests: Add vfio_pci_liveupdate_kexec_test David Matlack
2026-03-23 23:58 ` [PATCH v3 21/24] vfio: selftests: Expose iommu_modes to tests David Matlack
2026-03-23 23:58 ` [PATCH v3 22/24] vfio: selftests: Expose low-level helper routines for setting up struct vfio_pci_device David Matlack
2026-03-23 23:58 ` [PATCH v3 23/24] vfio: selftests: Verify that opening VFIO device fails during Live Update David Matlack
2026-03-23 23:58 ` [PATCH v3 24/24] vfio: selftests: Add continuous DMA to vfio_pci_liveupdate_kexec_test David Matlack
2026-03-24 6:42 ` [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files Askar Safin
2026-03-26 20:43 ` David Matlack
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260330225437.GA111390@bhelgaas \
--to=helgaas@kernel.org \
--cc=ajayachandra@nvidia.com \
--cc=akpm@linux-foundation.org \
--cc=alex@shazbot.org \
--cc=amastro@fb.com \
--cc=ankita@nvidia.com \
--cc=arnd@arndb.de \
--cc=bhelgaas@google.com \
--cc=bp@alien8.de \
--cc=chrisl@kernel.org \
--cc=corbet@lwn.net \
--cc=dapeng1.mi@linux.intel.com \
--cc=dmatlack@google.com \
--cc=elver@google.com \
--cc=feng.tang@linux.alibaba.com \
--cc=graf@amazon.com \
--cc=jacob.pan@linux.microsoft.com \
--cc=jgg@nvidia.com \
--cc=jgg@ziepe.ca \
--cc=jrhilke@google.com \
--cc=kees@kernel.org \
--cc=kevin.tian@intel.com \
--cc=kexec@lists.infradead.org \
--cc=kvm@vger.kernel.org \
--cc=leon@kernel.org \
--cc=leonro@nvidia.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-pci@vger.kernel.org \
--cc=lirongqing@baidu.com \
--cc=lukas@wunner.de \
--cc=michal.winiarski@intel.com \
--cc=parav@nvidia.com \
--cc=pasha.tatashin@soleen.com \
--cc=paulmck@kernel.org \
--cc=pawan.kumar.gupta@linux.intel.com \
--cc=peterz@infradead.org \
--cc=praan@google.com \
--cc=pratyush@kernel.org \
--cc=rananta@google.com \
--cc=rdunlap@infradead.org \
--cc=rientjes@google.com \
--cc=rodrigo.vivi@intel.com \
--cc=rppt@kernel.org \
--cc=saeedm@nvidia.com \
--cc=safinaskar@gmail.com \
--cc=skhan@linuxfoundation.org \
--cc=skhawaja@google.com \
--cc=vipinsh@google.com \
--cc=vivek.kasireddy@intel.com \
--cc=witu@nvidia.com \
--cc=yanjun.zhu@linux.dev \
--cc=yi.l.liu@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox