From: Pratyush Yadav <pratyush@kernel.org>
To: David Matlack <dmatlack@google.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>,
kexec@lists.infradead.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-pci@vger.kernel.org,
Adithya Jayachandran <ajayachandra@nvidia.com>,
Alexander Graf <graf@amazon.com>,
Alex Williamson <alex@shazbot.org>,
Bjorn Helgaas <bhelgaas@google.com>,
Chris Li <chrisl@kernel.org>,
David Rientjes <rientjes@google.com>,
Jacob Pan <jacob.pan@linux.microsoft.com>,
Jason Gunthorpe <jgg@nvidia.com>,
Jonathan Corbet <corbet@lwn.net>,
Josh Hilke <jrhilke@google.com>,
Leon Romanovsky <leonro@nvidia.com>,
Lukas Wunner <lukas@wunner.de>, Mike Rapoport <rppt@kernel.org>,
Parav Pandit <parav@nvidia.com>,
Pranjal Shrivastava <praan@google.com>,
Pratyush Yadav <pratyush@kernel.org>,
Saeed Mahameed <saeedm@nvidia.com>,
Samiullah Khawaja <skhawaja@google.com>,
Shuah Khan <skhan@linuxfoundation.org>,
Vipin Sharma <vipinsh@google.com>, William Tu <witu@nvidia.com>,
Yi Liu <yi.l.liu@intel.com>
Subject: Re: [PATCH v6 03/12] PCI: liveupdate: Track incoming preserved PCI devices
Date: Thu, 25 Jun 2026 16:35:02 +0200
Message-ID: <2vxzechulmcp.fsf@kernel.org>
In-Reply-To: <ajBgj_aSuzMZG47e@google.com> (David Matlack's message of "Mon, 15 Jun 2026 20:29:03 +0000")
Hi David,
On Mon, Jun 15 2026, David Matlack wrote:
> On 2026-06-14 01:38 PM, Pasha Tatashin wrote:
>> On Fri, 22 May 2026 20:24:01 +0000, David Matlack <dmatlack@google.com> wrote:
[...]
>> > + }
>> > +
>> > + pci_info(dev, "Device was preserved by previous kernel across Live Update\n");
>> > + dev->liveupdate.incoming = dev_ser;
>> > +
>> > + /*
>> > + * Hold the ref on the incoming FLB until pci_liveupdate_finish() so
>> > + * that dev->liveupdate.incoming does not get freed while it is in use.
>> > + */
>>
>> How would that work? If finish is not called FLB stays around until the
>> next reboot.
>
> True... I think if the PCI core trusts drivers to call
> pci_liveupdate_finish() then we don't need to hold onto the incoming
> reference here.
That was my point when I was arguing against refcounts on outgoing FLBs.
This is very easy to abuse, especially when we are talking about device
drivers. And this refcounting mechanism makes the FLB no longer
file-lifecycle-bound, since now it is entirely up to drivers to decide
the lifecycle of this data.
I have been thinking about this a bit more over the last couple of days,
and I wonder if we are doing this right. Here's an idea I have been
thinking of.
We should make live update a first-class citizen in PCI. Instead of
patching in liveupdate via the liveupdate.incoming field, and letting
drivers figure out when to use it, we should separate out probe and
retrieve paths entirely.
Probe and retrieve are fundamentally different operations. While they
may share some common initialization logic for the _software_ state, how
they interface with the hardware is completely different. I think mixing
the two will turn driver code into spaghetti, with liveupdate checks
sprayed all over.
This series doesn't add support for any drivers, but looking at some of
the code we have downstream, I see this problem. The liveupdate code is
all over the place in the driver and it is very hard to wrap one's head
around how the device is actually retrieved.
So I think PCI core should track preserved devices, and if the device is
preserved, it should skip the probe and wait for retrieve. Retrieve does
the full initialization of the device. This fits in with the LUO model
as well. You can make retrieve a callback of struct pci_driver and do
some wrappers to talk with LUO, so device drivers don't directly
interface with LUO at all.
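To make the split concrete, here is a rough userspace sketch of the
dispatch I have in mind. It is only a model, not kernel code: the
retrieve callback, the mock_* types, and the return codes are all
made up for illustration, and the real struct pci_driver has no such
member today.

```c
/* Userspace model only. None of these names are existing kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum mock_bind_result { MOCK_BOUND = 0, MOCK_DEFERRED = 1, MOCK_ERROR = -1 };

struct mock_pci_dev {
	bool preserved;  /* set by the core from the incoming preserved data */
	bool bound;
	int hw_resets;   /* how many times the "hardware" was reinitialized */
};

struct mock_pci_driver {
	int (*probe)(struct mock_pci_dev *dev);
	int (*retrieve)(struct mock_pci_dev *dev);  /* hypothetical callback */
};

/* Cold-boot path: free to reset and reprogram the device. */
int mock_probe(struct mock_pci_dev *dev)
{
	dev->hw_resets++;
	return 0;
}

/* Live update path: adopt the running device without resetting it. */
int mock_retrieve(struct mock_pci_dev *dev)
{
	(void)dev;
	return 0;
}

/* Core-side bind: preserved devices skip probe and wait for retrieve. */
enum mock_bind_result mock_core_bind(const struct mock_pci_driver *drv,
				     struct mock_pci_dev *dev)
{
	assert(drv != NULL && dev != NULL);
	if (dev->preserved)
		return MOCK_DEFERRED;
	if (drv->probe == NULL || drv->probe(dev) != 0)
		return MOCK_ERROR;
	dev->bound = true;
	return MOCK_BOUND;
}

/* Core-side wrapper that would sit between LUO and the driver. */
enum mock_bind_result mock_core_retrieve(const struct mock_pci_driver *drv,
					 struct mock_pci_dev *dev)
{
	assert(drv != NULL && dev != NULL);
	if (!dev->preserved || drv->retrieve == NULL || drv->retrieve(dev) != 0)
		return MOCK_ERROR;
	dev->bound = true;
	return MOCK_BOUND;
}
```

The point of the model is that the driver never checks "was I
preserved?" itself. The core picks the path, so probe stays a pure
cold-boot routine and retrieve holds all of the live update logic.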
We should do similar things on the shutdown path. Shutdown is a
fundamentally different operation from freeze, and so we should separate
them out as well.
This solves the lifetime problem as well. When PCI core is initializing,
it knows for sure that no retrievals are going to happen. That's because
none of the drivers have registered yet. So it can safely access the FLB
and initialize its state. After that, drivers can register themselves
and start accepting retrieve() calls. Once the last driver goes away,
the FLB is freed automatically.
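As a rough illustration of that ordering, here is one possible reading
of it as a userspace model. Everything in it is hypothetical: the
mock_* names are invented, and in the real thing the release would be
driven by LUO rather than by a counter in the PCI core.

```c
/* Userspace model only. All names are hypothetical, not kernel or LUO APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct mock_flb {
	int nr_preserved;        /* stand-in for the serialized device list */
};

struct mock_core {
	struct mock_flb *flb;    /* incoming data, owned by the core */
	int nr_drivers;          /* registered drivers that may still retrieve */
	int nr_preserved;        /* core state copied out of the FLB at init */
};

/* Runs before any driver can register, so nothing races with the read. */
void mock_core_init(struct mock_core *core, int nr_preserved)
{
	core->flb = malloc(sizeof(*core->flb));
	assert(core->flb != NULL);
	core->flb->nr_preserved = nr_preserved;
	core->nr_preserved = core->flb->nr_preserved;
	core->nr_drivers = 0;
}

void mock_driver_register(struct mock_core *core)
{
	core->nr_drivers++;
}

/* Dropping the last driver releases the FLB; drivers hold no ref of their own. */
void mock_driver_unregister(struct mock_core *core)
{
	assert(core->nr_drivers > 0);
	if (--core->nr_drivers == 0) {
		free(core->flb);
		core->flb = NULL;
	}
}

bool mock_flb_alive(const struct mock_core *core)
{
	return core->flb != NULL;
}
```

In this model no driver can keep the data alive by forgetting to call
a finish function, because drivers never take a reference in the
first place.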
I am sorry for suggesting a big refactor at v6, but the early versions
looked good to me at the time, and I only thought more deeply about this
when trying to figure out how we can make the lifetimes cleaner.
What do you think? Does this make sense?
--
Regards,
Pratyush Yadav