linux-mm.kvack.org archive mirror
* [Hypervisor Live Update] Notes from August 11, 2025
@ 2025-08-24 20:03 David Rientjes
From: David Rientjes @ 2025-08-24 20:03 UTC (permalink / raw)
  To: Alexander Graf, Anthony Yznaga, Dave Hansen, David Hildenbrand,
	David Matlack, Frank van der Linden, James Gowans,
	Jason Gunthorpe, Junaid Shahid, Mike Rapoport, Pankaj Gupta,
	Pasha Tatashin, Pratyush Yadav, Praveen Kumar, Vipin Sharma,
	Vishal Annapurve, Woodhouse, David
  Cc: linux-mm, kexec

Hi everybody,

Here are the notes from the last Hypervisor Live Update call that happened 
on Monday, August 11.  Thanks to everybody who was involved!

These notes are intended to bring people who could not attend the call up
to speed as well as keep the conversation going in between meetings.

----->o-----
We discussed the current status of LUO v3 which was out for review.  Pasha
had pinged akpm about merging this into mm-unstable.  There were several
changes from v2 -> v3 as a result of discussion upstream.  Specifically,
there was discussion about how file descriptors could be orchestrated
independently from each other; individual fds could be moved into a
prepare state before the global state changed to prepare.
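
As a rough illustration of per-fd orchestration, here is a toy userspace
model; it is not the LUO interface.  The class, method, and state names
are hypothetical, and the behavior of the global transition (preparing
any fd that is not yet prepared) is an assumption, not something stated
in the series:

```python
from enum import Enum


class State(Enum):
    NORMAL = "normal"
    PREPARED = "prepared"


class ToyOrchestrator:
    """Toy model: each preserved fd has its own state next to the global one."""

    def __init__(self):
        self.global_state = State.NORMAL
        self.fds = {}  # fd number -> State

    def register(self, fd):
        self.fds[fd] = State.NORMAL

    def prepare_fd(self, fd):
        # An individual fd can be prepared while the global state is
        # still normal.
        self.fds[fd] = State.PREPARED

    def prepare_all(self):
        # Assumption: the global transition prepares every fd that has
        # not been prepared individually.
        for fd in self.fds:
            self.fds[fd] = State.PREPARED
        self.global_state = State.PREPARED
```

In this model, calling prepare_fd(3) leaves both the global state and
every other fd untouched until prepare_all() runs.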

There were also changes so that only one userspace agent could control LUO
at a time.  There were also design changes around the IOMMU API based on
feedback from Jason (making it more similar to iommufd for extensibility
in the future).

----->o-----
We discussed Live Update Orchestrator Daemon (luod) and the external
design doc[1] that was shared upstream; this includes a description of the
luod lifecycle (started early, cannot be killed by systemd, and continues
after the reboot phase).

We discussed sessions in luod.  Previous discussions suggested that these
should be managed in the kernel in LUO, but there was a realization that
they could be fully managed in userspace through luod itself.  This is
intuitive since there is only one agent that can control LUO at any given
time.  Any user that wants to preserve state with LUO would then create a
session through luod, identified by a UUID (a 128-bit value).  The UUID is
required on the other side of the live update to identify the session.

Imagine a VMM establishing a session and luod monitoring that connection
so that when the connection dies, everything is removed from that session.
If the VMM quits but still wants to preserve state across the live update,
it can commit everything to a session so that everything associated with
the VMM can be preserved (after entering the prepare phase).  On the other
side of the live update, the client then has to provide the session ID,
which ensures it is connected to the right data.
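
The session lifecycle described above can be sketched as a small
in-memory model.  luod has no implementation yet, so everything here is
hypothetical: the class and method names are invented for illustration,
and uuid4 is just one way to obtain a 128-bit identifier:

```python
import uuid


class ToySessionManager:
    """Toy model of luod-style sessions keyed by a 128-bit UUID."""

    def __init__(self):
        self.sessions = {}  # uuid.UUID -> {"items": list, "committed": bool}

    def create(self):
        sid = uuid.uuid4()  # assumption: any 128-bit UUID would do
        self.sessions[sid] = {"items": [], "committed": False}
        return sid

    def preserve(self, sid, item):
        self.sessions[sid]["items"].append(item)

    def commit(self, sid):
        # A committed session outlives the client's connection.
        self.sessions[sid]["committed"] = True

    def on_disconnect(self, sid):
        # If the connection dies before a commit, drop everything that
        # was attached to the session.
        session = self.sessions.get(sid)
        if session is not None and not session["committed"]:
            del self.sessions[sid]

    def reclaim(self, sid):
        # After the live update the client must present the session ID;
        # an unknown ID raises KeyError.
        return self.sessions.pop(sid)["items"]
```

A client that disconnects without committing loses its session, while a
client that commits first can later call reclaim() with the same ID.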

Non-privileged processes can also preserve state if the admin process
makes the socket accessible to non-privileged processes such as the VMM.

luoctl was described as the CLI for interacting with luod.  It would allow
dumping the state in JSON format, including for debugging purposes.

Pasha expressed a general concern about security including for a
compromised host; he did not believe that LUO should enforce that security
boundary but rather wanted to have a security review that would help to
establish luod's role for security.

There was no implementation to go with this at this point; it was only in
the design phase and people were highly encouraged to provide feedback on
its design.

----->o-----
Chris Li discussed PCI preservation and the latest RFC patch series that
was posted.  He received great feedback on the mailing list for RFC v1
especially from Jason, including for how to handle config registers.
Jason suggested starting by preserving as little state as possible rather
than preserving everything from the start (MSI config registers should
likely be recreated on the other side of the live update).

Jason said one of the biggest open blockers was how the kernel would
resynchronize with all the stuff that wasn't reinitialized.  For example,
for MSI, we save the registers but there was a question about what happens
the next time someone allocates an interrupt.  Jason
suggested that the interrupts should be reset by the new kernel on the
other side of the live update (interrupt preservation was never the plan
so far).

Vipin Sharma asked about MSI preservation never having been the plan; he
suggested that KVM be able to check that some interrupt happened.  Jason
said the plan to date has been that interrupts are lost during KHO.  David
Matlack echoed this: the plan for now is that any interrupts during
live update are lost.  In the future, posted interrupt preservation may be
possible but this could be very complex.

There was general feedback given to avoid leaving behind placeholders in
the series that would have to be addressed later and potentially by other
developers.

Chris asked for feedback on the scope of v2 and the minimal viable patch
series to make progress.  Jason suggested preserving the bus master bit.
However, he suggested the very first series of patches should likely focus
on allowing DMA to continue to system memory.  Chris agreed with this.
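
For context on the bus master bit: in PCI configuration space, the 16-bit
command register sits at offset 0x04 and bit 2 is Bus Master Enable.  A
device with that bit cleared cannot initiate DMA, which is why it matters
for letting DMA continue.  The helper below is only an illustration that
decodes the bit from a raw config-space buffer; the function name is
hypothetical and this is not code from the RFC series:

```python
import struct

PCI_COMMAND = 0x04         # offset of the 16-bit command register
PCI_COMMAND_MASTER = 0x4   # bit 2: Bus Master Enable


def bus_master_enabled(config: bytes) -> bool:
    """Return True if Bus Master Enable is set in a config-space dump."""
    # Config space is little-endian; read the 16-bit command register.
    (command,) = struct.unpack_from("<H", config, PCI_COMMAND)
    return bool(command & PCI_COMMAND_MASTER)
```

On Linux, such a buffer could come from reading a device's sysfs config
file, but any bytes-like object holding at least the first 6 bytes of
config space works with this helper.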

Chris asked about which PCI devices would be good to start with.  There
was general feedback to start with a network device that does interrupts
and basic DMA (like e1000).  Jason suggested that, after that, we shift
toward PCI DMA with an IOMMU present and start talking about how to
preserve the IOMMU configuration.

----->o-----
Pasha brought up the point that, in the context of Chris's RFC series, we
have not yet discussed how to pass old data to the new kernel.  He
suggested spending time to design this.  Chris said this was out of scope
for the original patch series but now that the minimal viable approach has
been discussed, it would be possible to discuss this in a future call.
This was not predicted to use the device tree, but likely something closer
to KSTATE.

----->o-----
David Matlack noted that the deadline for proposals for the live update
microconference was September 10[2].

----->o-----
Next meeting will be on Monday, August 25 at 8am PDT (UTC-7); everybody is
welcome: https://meet.google.com/rjn-dmzu-hgq

Topics for the next meeting:

 - discussion on whether these recorded sessions should appear on YouTube
   to share rather than only on the shared drive
 - discussion on latest status of LUO for the upstream kernel after rebase
   on top of 6.17-rc1
 - update on feedback received for luod and the next steps as we head into
   implementation
 - update on the latest status of PCI preservation, registration, and
   initialization
 - [15 min] KSerial serialization protocol designed for exchanging data
   between live update kernels (using BTF to extract data from, or deposit
   data into, kernel-native C structs with unique member IDs)
   + overlap with KSTATE and next steps for implementation
 - later: testing methodology to allow downstream consumers to qualify
   that live update works from one version to another
 - later: reducing blackout window during live update

Please let me know if you'd like to propose additional topics for
discussion, thank you!

[1] https://tinyurl.com/luoddesign
[2] https://lpc.events/event/19/contributions/2004/

