From: Chenghao Duan <duanchenghao@kylinos.cn>
To: Maximilian Heyne <mheyne@amazon.de>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>,
Alexander Graf <graf@amazon.com>,
"Andersen, Tycho" <Tycho.Andersen@amd.com>,
Anthony Yznaga <anthony.yznaga@oracle.com>,
Baolu Lu <baolu.lu@linux.intel.com>,
David Hildenbrand <david@kernel.org>,
David Matlack <dmatlack@google.com>,
James Gowans <jgowans@amazon.com>,
Jason Gunthorpe <jgg@nvidia.com>, Mike Rapoport <rppt@kernel.org>,
Pankaj Gupta <pankaj.gupta.linux@gmail.com>,
Pratyush Yadav <pratyush@kernel.org>,
Praveen Kumar <kpraveen.lkml@gmail.com>,
Vipin Sharma <vipinsh@google.com>,
Vishal Annapurve <vannapurve@google.com>,
"Woodhouse, David" <dwmw@amazon.co.uk>,
Luca Boccassi <luca.boccassi@gmail.com>,
Samiullah Khawaja <skhawaja@google.com>,
Jork Loeser <jloeser@linux.microsoft.com>,
linux-mm@kvack.org, kexec@lists.infradead.org,
linux-kernel@vger.kernel.org, jianghaoran@kylinos.cn
Subject: Re: [Hypervisor Live Update] Notes from June 1, 2026
Date: Fri, 3 Jul 2026 17:02:41 +0800
Message-ID: <20260703090241.GA148322@chenghao-pc>
In-Reply-To: <20260702-pay-effect-93be8ab0@mheyne-amazon>
On Thu, Jul 02, 2026 at 12:00:16PM +0000, Maximilian Heyne wrote:
> On Thu, Jul 02, 2026 at 02:02:02PM +0800, Chenghao Duan wrote:
> > On Sun, Jun 07, 2026 at 12:06:01PM -0400, Pasha Tatashin wrote:
> > > Hi everybody,
> > >
> > > Here are the notes from the Hypervisor Live Update call that happened on
> > > Monday, June 1. Thanks to everybody who was involved!
> > >
> > > These notes are intended to bring people up to speed who could not
> > > attend the call as well as keep the conversation going in between
> > > meetings.
> > >
> > > ----->o-----
> > > LPC 2026 Call for Proposals
> > >
> > > The Call for Proposals for the Live Update Microconference at LPC 2026
> > > is officially open. Please submit your topics and proposals before the
> > > deadline on July 24th.
> > >
> > > https://lore.kernel.org/all/ahcc3Qyuy7Oy03Iq@plex
> > >
> > > ----->o-----
> > > KHO Xarray Implementation & Core Data Structures
> > >
> > > Pratyush is collaborating with Mike on a KHO fallback allocation
> > > strategy for memblock. Alongside this, Pratyush is designing a
> > > serialized, sparse "KHO Xarray" data structure to lift current mapping
> > > restrictions across all three memfd types (shared, hugeTLB, and
> > > guest_memfd). By allowing runtime page faults and allocation tracking
> > > post-preservation, this avoids flat vmalloc array scalability
> > > limitations.
> > >
> > > Potential wider use cases for the KHO Xarray were discussed:
> > > - MSHV sparse bitmap tracking.
> > > - IOMMU page table tracking (Samiullah will evaluate domain/device tree
> > > association fit).
> > > - PCI/VFIO sparse tracking via Bus/Device/Function (BDF) key spaces.
> > >
> > > Slab/Cache Preservation vs. Linked Blocks:
> > > David Matlack noted that using an Xarray page per PCI device would be
> > > too expensive given their small struct sizes. Pratyush suggested
> > > preserving slab caches via dedicated kmem_cache flags to manage small,
> > > arbitrary allocations. As an immediate alternative, Pasha's ongoing LUO
> > > limits refactor series introduces a highly compact block-linked list
> > > structure optimized for runtime file/session tracking. David Matlack
> > > will review if this fits the PCI core tracking requirements.
> > >
> > > ----->o-----
> > > LUO Limit Removal & PCI Core Status
> > >
> > > LUO Refactor: Pasha is updating the LUO series to address Pratyush's
> > > comments (primarily renaming iterator functions) and plans to send out
> > > v2 shortly. Given that LUO is not yet in fleet production, the group
> > > agreed to fast-track this into the upcoming merge window to align with
> > > systemd's fdstore integration.
> > >
> > > PCI Core v6: David Matlack sent out v6 incorporating two critical fixes
> > > spotted by Sachiko regarding get/put semantics and double-retrieval
> > > failures. Review tags from the live update team are needed to help
> > > secure Bjorn's Ack once he returns from vacation next week.
> > >
> > > ----->o-----
> > > IOMMU Persistence & Process Memory
> > >
> > > IOMMU v3: Samiullah is addressing recent review feedback on the IOMMU
> > > persistence series and intends to post v3 by the end of this week. The
> > > associated development roadmap document has received positive
> > > stakeholder attention.
> > >
> > > CRIU & vmsplice: Maximilian's investigation into optimizing vmsplice
> > > for copy-less data preservation is deferred but still in the
> > > pipeline, with potential future collaboration with Google's tmpfs splice
> > > efforts.
> >
> > I’ve also been researching a combination solution integrating CRIU and
> > KHO. My approach stores all image data dumped by CRIU into memfd, then
> > persists those memfd objects via KHO/LUO.
>
> I've experimented with exactly the same approach, with one addition: if a
> process already has memfds, don't dump them (to yet another memfd) but
> preserve them directly via KHO.
>
> >
> > I’ve reviewed the historical meeting notes and would like to clarify:
> > does the CRIU solution discussed in the meetings aim to save the full
> > set of a process’s metadata and data, or only the anonymous memory and
> > shared memory allocated during the process runtime?
>
> I've been the only one bringing this up in the meetings and my idea is
> to enable a reboot with negligible process downtime. So save the state of
> a process, kexec and resume the process. Currently, preservation and
> restorations are quite slow when processes have a lot of anonymous
> memory as this needs to be moved to a preservable memfd first. So what
> I'm researching is how I can convert anonymous memory efficiently into
> something that can be preserved (currently memfd).
>
> And to answer your question, I'd say save data and metadata in memfd's.
> As the alternative would be to save the metadata on disk which would be
> slow.
>

Thank you very much for your explanation.

I know vmsplice() enables zero-copy operations on user memory pages. Used
together with splice(), it can establish references between user pages
and file descriptors. Could this be the optimal solution to the current
bottleneck, i.e. moving anonymous memory into a preservable memfd?
Or are better alternatives already available?

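To make the question concrete, here is a minimal userspace sketch of the
vmsplice() + splice() path, under stated assumptions. The helper name
anon_to_memfd() and the 32 KiB chunk size are hypothetical and only for
illustration; this is not CRIU's or KHO's actual code. Note that per
splice(2), SPLICE_F_MOVE is only a hint and has been a no-op since
Linux 2.6.21, so on current kernels the second step most likely still
copies the data into the memfd.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Stay well below the default pipe capacity (16 pipe buffers). */
#define CHUNK (32 * 1024)

/*
 * Hypothetical helper: push `len` bytes starting at `buf` into a fresh
 * memfd via a pipe. Returns the memfd on success, -1 on failure.
 */
int anon_to_memfd(const void *buf, size_t len)
{
	int pipefd[2];
	const char *p = buf;
	size_t left = len;
	int memfd;

	memfd = memfd_create("anon-dump", MFD_CLOEXEC);
	if (memfd < 0)
		return -1;
	if (pipe(pipefd) < 0) {
		close(memfd);
		return -1;
	}

	while (left > 0) {
		struct iovec iov = {
			.iov_base = (void *)p,
			.iov_len = left < CHUNK ? left : CHUNK,
		};
		ssize_t in, pending;

		/* Step 1: reference the user pages from the pipe (no copy). */
		in = vmsplice(pipefd[1], &iov, 1, 0);
		if (in <= 0)
			goto fail;

		/* Step 2: drain the pipe into the memfd at its file offset. */
		for (pending = in; pending > 0; ) {
			ssize_t out = splice(pipefd[0], NULL, memfd, NULL,
					     (size_t)pending, SPLICE_F_MOVE);
			if (out <= 0)
				goto fail;
			pending -= out;
		}

		p += in;
		left -= (size_t)in;
	}

	close(pipefd[0]);
	close(pipefd[1]);
	return memfd;

fail:
	close(pipefd[0]);
	close(pipefd[1]);
	close(memfd);
	return -1;
}
```

A caller would pass the start and length of an anonymous mapping and then
hand the returned memfd to the preservation step. The source buffer must
not be modified between vmsplice() and the matching splice(), since the
pipe only holds references to the pages.
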
Chenghao
> >
> > Chenghao
> > >
> > > ----->o-----
> > > guest_memfd Enlightenment & VMM Documentation
> > >
> > > Tarun debriefed the community on his upstream presentation regarding the
> > > initial guest_memfd preservation patch series (currently covering fully
> > > shared mappings with page-sized folios).
> > >
> > > Key design and architecture alignments include:
> > > - VM File Association: guest_memfd requires an active 'struct kvm'
> > > context to be retrieved. VMMs must preserve the parent VM file
> > > alongside guest_memfd, using LUO tokens to re-link them on the
> > > incoming kernel path. This sets the stage for future private
> > > mapping/secure EPT table tracking.
> > > - Relaxed Fault Logic: The group agreed to drop strict upfront pre-fault
> > > checks. Instead, standard runtime page-fault semantics will apply. If
> > > a guest page fault occurs post-preservation, it will bubble up via
> > > standard KVM_RUN ioctl exits to the VMM, which can safely pause vCPUs
> > > and retry the fault post-kexec.
> > > - Centralized VMM Documentation: Pasha and David Matlack proposed
> > > creating a centralized guide under live_update/vmm detailing the
> > > overall live update flow, timing constraints, and subsystem
> > > requirements to assist external QEMU and VMM developers.
> > >
> > > ----->o-----
> > > Next meeting will be on Monday, June 15 at 8am PDT (UTC-7), everybody is
> > > welcome: https://meet.google.com/rjn-dmzu-hgq
> > >
> > > Note: I am going to be traveling on June 15th, David Matlack is going to
> > > be hosting it.
> > >
> > > Topics for the next meeting:
> > > - Presentation of VFIO roadmap (Vipin and David Matlack)
> > > - Status of KHO Xarray development and slab preservation feasibility
> > > - Review of PCI core changes v7 and upstream merge coordination
> > > - IOMMU persistence v3 review feedback
> > > - Detailed review of guest_memfd v2 and VMM interaction documentation
> > > - Review and coordination of LPC 2026 Microconference topic submissions
> > > - later: KHO support for Confidential VMs including page table
> > > preservation and pinning
> > > - later: versioning support for luod to negotiate
> > > - later: KHO enlightenment for ASI
> > > - later: update on PCI preservation series and next steps
> > > - later: testing methodology to allow downstream consumers to qualify
> > > that live update works from one version to another
> > > - later: reducing blackout window during live update, including deferred
> > > struct page initialization
> > >
> > > Please let me know if you'd like to propose additional topics for
> > > discussion, thank you!
>
>
>
> Amazon Web Services Development Center Germany GmbH
> Tamara-Danz-Str. 13
> 10243 Berlin
> Geschaeftsfuehrung: Christof Hellmis, Andreas Stieger
> Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
> Sitz: Berlin
> Ust-ID: DE 365 538 597