From: Bjorn Helgaas <helgaas@kernel.org>
To: David Matlack <dmatlack@google.com>
Cc: "Alex Williamson" <alex@shazbot.org>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Adithya Jayachandran" <ajayachandra@nvidia.com>,
"Alexander Graf" <graf@amazon.com>,
"Alex Mastro" <amastro@fb.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Ankit Agrawal" <ankita@nvidia.com>,
"Arnd Bergmann" <arnd@arndb.de>,
"Askar Safin" <safinaskar@gmail.com>,
"Borislav Petkov (AMD)" <bp@alien8.de>,
"Chris Li" <chrisl@kernel.org>,
"Dapeng Mi" <dapeng1.mi@linux.intel.com>,
"David Rientjes" <rientjes@google.com>,
"Feng Tang" <feng.tang@linux.alibaba.com>,
"Jacob Pan" <jacob.pan@linux.microsoft.com>,
"Jason Gunthorpe" <jgg@nvidia.com>,
"Jason Gunthorpe" <jgg@ziepe.ca>,
"Jonathan Corbet" <corbet@lwn.net>,
"Josh Hilke" <jrhilke@google.com>, "Kees Cook" <kees@kernel.org>,
"Kevin Tian" <kevin.tian@intel.com>,
kexec@lists.infradead.org, kvm@vger.kernel.org,
"Leon Romanovsky" <leon@kernel.org>,
"Leon Romanovsky" <leonro@nvidia.com>,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-kselftest@vger.kernel.org, linux-mm@kvack.org,
linux-pci@vger.kernel.org, "Li RongQing" <lirongqing@baidu.com>,
"Lukas Wunner" <lukas@wunner.de>,
"Marco Elver" <elver@google.com>,
"Michał Winiarski" <michal.winiarski@intel.com>,
"Mike Rapoport" <rppt@kernel.org>,
"Parav Pandit" <parav@nvidia.com>,
"Pasha Tatashin" <pasha.tatashin@soleen.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
"Pawan Gupta" <pawan.kumar.gupta@linux.intel.com>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
"Pranjal Shrivastava" <praan@google.com>,
"Pratyush Yadav" <pratyush@kernel.org>,
"Raghavendra Rao Ananta" <rananta@google.com>,
"Randy Dunlap" <rdunlap@infradead.org>,
"Rodrigo Vivi" <rodrigo.vivi@intel.com>,
"Saeed Mahameed" <saeedm@nvidia.com>,
"Samiullah Khawaja" <skhawaja@google.com>,
"Shuah Khan" <skhan@linuxfoundation.org>,
"Vipin Sharma" <vipinsh@google.com>,
"Vivek Kasireddy" <vivek.kasireddy@intel.com>,
"William Tu" <witu@nvidia.com>, "Yi Liu" <yi.l.liu@intel.com>,
"Zhu Yanjun" <yanjun.zhu@linux.dev>
Subject: Re: [PATCH v3 02/24] PCI: Add API to track PCI devices preserved across Live Update
Date: Mon, 30 Mar 2026 17:54:37 -0500 [thread overview]
Message-ID: <20260330225437.GA111390@bhelgaas> (raw)
In-Reply-To: <acWne_ZCcF4YQN25@google.com>
On Thu, Mar 26, 2026 at 09:39:07PM +0000, David Matlack wrote:
> On 2026-03-25 06:12 PM, Bjorn Helgaas wrote:
> > On Mon, Mar 23, 2026 at 11:57:54PM +0000, David Matlack wrote:
> > > Add an API to enable the PCI subsystem to participate in a Live Update
> > > and track all devices that are being preserved by drivers. Since this
> > > support is still under development, hide it behind a new Kconfig
> > > PCI_LIVEUPDATE that is marked experimental.
> ...
> > This sets "dev->liveupdate_incoming = false", and the only place we
> > check that is in pci_liveupdate_retrieve(). In particular, there's
> > nothing in the driver bind/unbind paths that seems related. I guess
> > pci_liveupdate_finish() just means the driver can't call
> > pci_liveupdate_retrieve() any more?
>
> liveupdate_incoming is used by VFIO in patch 10:
>
> https://lore.kernel.org/kvm/20260323235817.1960573-11-dmatlack@google.com/
>
> Fundamentally, I think drivers will need to know that the device they
> are dealing with was preserved across the Live Update so they can react
> accordingly and this is how they know. This feels like an appropriate
> responsibility to delegate to the PCI core since it can be common across
> all PCI devices, rather than requiring drivers to store their own state
> about which devices were preserved. I suspect PCI core will also use
> liveupdate_incoming in the future (e.g. to avoid assigning new BARs) as
> we implement more of the device preservation.
Yes. It's easier to review if this is added at the point where it is
used.
> And in case you are also wondering about liveupdate_outgoing, I forsee
> that being used for things like skipping disabling bus mastering in
> pci_device_shutdown().
>
> I think it would be a good idea to try to split this patch up, so there
> is more breathing room to explain this context in the commit messages.
Sounds good.
> > > + * Don't both accounting for VFs that could be created after this
> > > + * since preserving VFs is not supported yet. Also don't account
> > > + * for devices that could be hot-plugged after this since preserving
> > > + * hot-plugged devices across Live Update is not yet an expected
> > > + * use-case.
> >
> > s/Don't both accounting/Don't bother accounting/ ? not sure of intent
>
> "Don't bother" was the intent.
>
> > I suspect the important thing here is that this allocates space for
> > preserving X devices, and each subsequent pci_liveupdate_preserve()
> > call from a driver uses up one of those slots.
> >
> > My guess is this is just an allocation issue and from that point of
> > view there's no actual problem with enabling VFs or hot-adding devices
> > after this point; it's just that pci_liveupdate_preserve() will fail
> > after X calls.
>
> Yes that is correct.
Mentioning VFs in the comment is a slight misdirection when the actual
concern is just about the number of devices.
> I see that a lot of your comments are about these WARN_ONs so do you
> have any general guidance on how I should be handling them?
If it's practical to arrange it so we dereference a NULL pointer or
similar, that's my preference because it doesn't take extra code and
it's impossible to ignore. Sometimes people add "if (!ptr) return
-EINVAL;" to internal functions where "ptr" should never be NULL. IMO
cases like that should just use assume "ptr" is valid and use it.
Likely not a practical strategy in your case.
> > > + if (pci_WARN_ONCE(dev, !dev_ser, "Cannot find preserved device!"))
> >
> > Seems like an every-time sort of message if this indicates a driver bug?
> >
> > It's enough of a hassle to convince myself that pci_WARN_ONCE()
> > returns the value that caused the warning that I would prefer:
> >
> > if (!dev_ser) {
> > pci_warn(...) or pci_WARN_ONCE(...)
> > return;
> > }
>
> For "this should really never happen" warnings, which is the case here,
> my preference is to use WARN_ON_ONCE() since you only need to see it
> happen once to know there is a bug somewhere, and logging every time can
> lead to overwhelmingly interleaved logs if it happens too many times.
I'm objecting more to using the return value of pci_WARN_ONCE() than
the warning itself. It's not really obvious what WARN_ONCE() should
return and kind of a hassle to figure it out, so I think it's clearer
in this case to test dev_ser directly.
> > > + for (i = ser->nr_devices; i > 0; i--) {
> > > + struct pci_dev_ser *prev = &ser->devices[i - 1];
> > > + int cmp = pci_dev_ser_cmp(&new, prev);
> > > +
> > > + /*
> > > + * This should never happen unless there is a kernel bug or
> > > + * corruption that causes the state in struct pci_ser to get out
> > > + * of sync with struct pci_dev.
> >
> > Huh. Same comment as above. I don't think this is telling me
> > anything useful. I guess what happened is we're trying to preserve X
> > and X is already in "ser", but we should have returned -EBUSY above
> > for that case. If we're just saying memory corruption could cause
> > bugs, I think that's pointless.
> >
> > Actually I'm not even sure we should check for this.
> >
> > > + */
> > > + if (WARN_ON_ONCE(!cmp))
> > > + return -EBUSY;
>
> This is another "this should really never happen" check. I could just
> return without warning but this is a sign that something is very wrong
> somewhere in the kernel and it is trivial to just add WARN_ON_ONCE() so
> that it gets flagged in dmesg. In my experience that can be very helpful
> to track down logic bugs during developemt and rare race conditions at
> scale in production environments.
OK. Maybe just remove the comment. It's self-evident that
WARN_ON_ONCE() is a "shouldn't happen" situation, and I don't think
the comment contains useful information.
> > > +u32 pci_liveupdate_incoming_nr_devices(void)
> > > +{
> > > + struct pci_ser *ser;
> > > +
> > > + if (pci_liveupdate_flb_get_incoming(&ser))
> > > + return 0;
> >
> > Seems slightly overcomplicated to return various error codes from
> > pci_liveupdate_flb_get_incoming(), only to throw them away here and
> > special-case the "return 0". I think you *could* set
> > "ser->nr_devices" to zero at entry to
> > pci_liveupdate_flb_get_incoming() and make this just:
> >
> > pci_liveupdate_flb_get_incoming(&ser);
> > return ser->nr_devices;
>
> pci_liveupdate_flb_get_incoming() fetches the preserved pci_ser struct
> from LUO (the struct that the previous kernel allocated and populated).
> If pci_liveupdate_flb_get_incoming() returns an error, it means there
> was no struct pci_ser preserved by the previous kernel (or at least not
> that the current kernel is compatible with), so we return 0 here to
> indicate that 0 devices were preserved.
Right. Here's what I was thinking:
pci_liveupdate_flb_get_incoming(...)
{
struct pci_ser *ser = *serp;
ser->nr_devices = 0;
ret = liveupdate_flb_get_incoming(...);
...
if (ret == -ENOENT) {
pr_info_once("PCI: No incoming FLB data detected during Live Update");
return;
}
WARN_ONCE(ret, "PCI: Failed to retrieve incoming ...");
}
u32 pci_liveupdate_incoming_nr_devices(void)
{
pci_liveupdate_flb_get_incoming(&ser);
return ser->nr_devices;
}
> > > +++ b/include/linux/kho/abi/pci.h
> >
> > It seems like most of include/linux/ is ABI, so does kho/abi/ need to
> > be separated out in its own directory?
>
> include/linux/kho/abi/ contains all of the structs, enums, etc. that are
> handed off between kernels during a Live Update. If almost anything
> changes in this directory, it breaks our ability to upgrade/downgrade
> via Live Update. That's why it's split off into its own directory.
>
> include/linux/ is not part of the Live Update ABI. Changes to those
> headers to not affect our ability to upgrade/downgrade via Live Update.
>
> > It's kind of unusual for the hierarchy to be this deep, especially
> > since abi/ is the only thing in include/linux/kho/.
>
> Yes I agree, but that is outside the scope of this patchset I think.
> This directory already exists.
Agreed.
Bjorn
next prev parent reply other threads:[~2026-03-30 22:54 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-23 23:57 [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files David Matlack
2026-03-23 23:57 ` [PATCH v3 01/24] liveupdate: Export symbols needed by modules David Matlack
2026-03-23 23:57 ` [PATCH v3 02/24] PCI: Add API to track PCI devices preserved across Live Update David Matlack
2026-03-25 20:06 ` David Matlack
2026-03-25 23:12 ` Bjorn Helgaas
2026-03-26 21:39 ` David Matlack
2026-03-30 22:54 ` Bjorn Helgaas [this message]
2026-03-31 17:33 ` Samiullah Khawaja
2026-04-02 21:28 ` Yanjun.Zhu
2026-04-03 17:24 ` Chris Li
2026-04-03 21:58 ` David Matlack
2026-04-05 16:56 ` Zhu Yanjun
2026-04-06 16:06 ` David Matlack
2026-04-06 18:09 ` Yanjun.Zhu
2026-03-23 23:57 ` [PATCH v3 03/24] PCI: Require Live Update preserved devices are in singleton iommu_groups David Matlack
2026-03-24 13:07 ` Yi Liu
2026-03-24 18:00 ` David Matlack
2026-03-25 11:12 ` Yi Liu
2026-03-25 17:29 ` David Matlack
2026-03-25 23:13 ` Bjorn Helgaas
2026-03-23 23:57 ` [PATCH v3 04/24] PCI: Inherit bus numbers from previous kernel during Live Update David Matlack
2026-03-23 23:57 ` [PATCH v3 05/24] docs: liveupdate: Add documentation for PCI David Matlack
2026-03-23 23:57 ` [PATCH v3 06/24] vfio/pci: Register a file handler with Live Update Orchestrator David Matlack
2026-03-24 13:07 ` Yi Liu
2026-03-24 16:33 ` David Matlack
2026-03-23 23:57 ` [PATCH v3 07/24] vfio/pci: Preserve vfio-pci device files across Live Update David Matlack
2026-03-24 13:08 ` Yi Liu
2026-03-24 16:46 ` David Matlack
2026-03-27 23:39 ` Samiullah Khawaja
2026-04-21 17:40 ` David Matlack
2026-04-21 18:44 ` Jason Gunthorpe
2026-04-21 19:02 ` David Matlack
2026-04-21 19:20 ` Jason Gunthorpe
2026-03-23 23:58 ` [PATCH v3 08/24] vfio/pci: Retrieve preserved device files after " David Matlack
2026-03-24 13:08 ` Yi Liu
2026-03-24 17:05 ` David Matlack
2026-03-23 23:58 ` [PATCH v3 09/24] vfio/pci: Notify PCI subsystem about devices preserved across " David Matlack
2026-03-23 23:58 ` [PATCH v3 10/24] vfio: Enforce preserved devices are retrieved via LIVEUPDATE_SESSION_RETRIEVE_FD David Matlack
2026-03-23 23:58 ` [PATCH v3 11/24] vfio/pci: Store incoming Live Update state in struct vfio_pci_core_device David Matlack
2026-03-23 23:58 ` [PATCH v3 12/24] vfio/pci: Skip reset of preserved device after Live Update David Matlack
2026-03-23 23:58 ` [PATCH v3 13/24] docs: liveupdate: Add documentation for VFIO PCI David Matlack
2026-03-23 23:58 ` [PATCH v3 14/24] selftests/liveupdate: Move luo_test_utils.* into a reusable library David Matlack
2026-03-23 23:58 ` [PATCH v3 15/24] selftests/liveupdate: Add helpers to preserve/retrieve FDs David Matlack
2026-03-23 23:58 ` [PATCH v3 16/24] vfio: selftests: Build liveupdate library in VFIO selftests David Matlack
2026-03-23 23:58 ` [PATCH v3 17/24] vfio: selftests: Add Makefile support for TEST_GEN_PROGS_EXTENDED David Matlack
2026-03-23 23:58 ` [PATCH v3 18/24] vfio: selftests: Add vfio_pci_liveupdate_uapi_test David Matlack
2026-03-23 23:58 ` [PATCH v3 19/24] vfio: selftests: Initialize vfio_pci_device using a VFIO cdev FD David Matlack
2026-03-23 23:58 ` [PATCH v3 20/24] vfio: selftests: Add vfio_pci_liveupdate_kexec_test David Matlack
2026-03-23 23:58 ` [PATCH v3 21/24] vfio: selftests: Expose iommu_modes to tests David Matlack
2026-03-23 23:58 ` [PATCH v3 22/24] vfio: selftests: Expose low-level helper routines for setting up struct vfio_pci_device David Matlack
2026-03-23 23:58 ` [PATCH v3 23/24] vfio: selftests: Verify that opening VFIO device fails during Live Update David Matlack
2026-03-23 23:58 ` [PATCH v3 24/24] vfio: selftests: Add continuous DMA to vfio_pci_liveupdate_kexec_test David Matlack
2026-03-24 6:42 ` [PATCH v3 00/24] vfio/pci: Base Live Update support for VFIO device files Askar Safin
2026-03-26 20:43 ` David Matlack
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260330225437.GA111390@bhelgaas \
--to=helgaas@kernel.org \
--cc=ajayachandra@nvidia.com \
--cc=akpm@linux-foundation.org \
--cc=alex@shazbot.org \
--cc=amastro@fb.com \
--cc=ankita@nvidia.com \
--cc=arnd@arndb.de \
--cc=bhelgaas@google.com \
--cc=bp@alien8.de \
--cc=chrisl@kernel.org \
--cc=corbet@lwn.net \
--cc=dapeng1.mi@linux.intel.com \
--cc=dmatlack@google.com \
--cc=elver@google.com \
--cc=feng.tang@linux.alibaba.com \
--cc=graf@amazon.com \
--cc=jacob.pan@linux.microsoft.com \
--cc=jgg@nvidia.com \
--cc=jgg@ziepe.ca \
--cc=jrhilke@google.com \
--cc=kees@kernel.org \
--cc=kevin.tian@intel.com \
--cc=kexec@lists.infradead.org \
--cc=kvm@vger.kernel.org \
--cc=leon@kernel.org \
--cc=leonro@nvidia.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-pci@vger.kernel.org \
--cc=lirongqing@baidu.com \
--cc=lukas@wunner.de \
--cc=michal.winiarski@intel.com \
--cc=parav@nvidia.com \
--cc=pasha.tatashin@soleen.com \
--cc=paulmck@kernel.org \
--cc=pawan.kumar.gupta@linux.intel.com \
--cc=peterz@infradead.org \
--cc=praan@google.com \
--cc=pratyush@kernel.org \
--cc=rananta@google.com \
--cc=rdunlap@infradead.org \
--cc=rientjes@google.com \
--cc=rodrigo.vivi@intel.com \
--cc=rppt@kernel.org \
--cc=saeedm@nvidia.com \
--cc=safinaskar@gmail.com \
--cc=skhan@linuxfoundation.org \
--cc=skhawaja@google.com \
--cc=vipinsh@google.com \
--cc=vivek.kasireddy@intel.com \
--cc=witu@nvidia.com \
--cc=yanjun.zhu@linux.dev \
--cc=yi.l.liu@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.