* [PATCH v3 0/9] liveupdate: kvm: guest_memfd preservation
@ 2026-06-22 18:48 Tarun Sahu
2026-06-22 18:48 ` [PATCH v3 1/9] liveupdate: Add LIVEUPDATE_GUEST_MEMFD config option Tarun Sahu
` (9 more replies)
0 siblings, 10 replies; 18+ messages in thread
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
Hello,
This is the non-RFC patch series for guest_memfd preservation. After
multiple discussions in the hypervisor liveupdate meeting and the
guest_memfd bi-weekly meeting, the design for basic guest_memfd
preservation support is final. This series covers only guest_memfd
files that are fully shared (no private memory support) and backed by
PAGE_SIZE pages.
Steps to test:
1. Compile the kernel with CONFIG_LIVEUPDATE_GUEST_MEMFD=y
2. Boot the kernel with the command line: kho=on liveupdate=on
3. Run the following kselftest
$ .selftests/kvm/guest_memfd_preservation_test --stage 1
$ <kexec> --reuse-cmdline
$ .selftests/kvm/guest_memfd_preservation_test --stage 2
NOTE: Assert the following:
$ ls /dev/liveupdate
$ ls /dev/kvm
$ dmesg | grep liveupdate # (should have kvm_vm_luo &&
# guest_memfd_luo handler registered)
The changes are rebased on:
kvm/next + liveupdate/next (merge) + [3] + [4] + [5]
Where,
[3]: luo: conversion of serialized_data to KHOSER_PTR
[4]: luo: APIs to retrieve file internally from session
[5]: selftests: liveupdate selftests library
Here is the github repo:
https://github.com/tar-unix/linux/tree/gmem-pre
V3 <- RFC V2 [2]
1. Finalized the design
2. Resolved sashiko-reported bugs
3. Used KHOSER_PTR instead of raw serialized_data, as per [3]
RFC V2 [2] <- RFC V1 [1]
1. Removed mem_attr_array as it is not needed for fully-shared
2. Removed pre-faulted condition
3. Added vm_type preservation for ARM64.
4. Removed liveupdate_get_file_incoming api patch as it is sent
separately [4] by Samiullah.
[1] https://lore.kernel.org/all/cover.1779080766.git.tarunsahu@google.com/
[2] https://lore.kernel.org/all/c054ba0fb2639932bbe354420d3f4f84cce84905.1780676742.git.tarunsahu@google.com/
[3] https://lore.kernel.org/all/20260622111215.4157974-1-tarunsahu@google.com/
[4] https://lore.kernel.org/all/20260613012521.835490-1-skhawaja@google.com/
[5] https://lore.kernel.org/all/20260612214512.464146-1-vipinsh@google.com/
Tarun Sahu (9):
liveupdate: Add LIVEUPDATE_GUEST_MEMFD config option
kvm: Prepare core VM structs and helpers for LUO support
kvm: kvm_luo: Allow kvm preservation with LUO
kvm: guest_memfd: Move internal definitions and helper to new header
kvm: guest_memfd: Add support for freezing and unfreezing mappings
kvm: guest_memfd_luo: add support for guest_memfd preservation
docs: add documentation for guest_memfd preservation via LUO
selftests: kvm: Split ____vm_create() to expose init helpers
selftests: kvm: Add guest_memfd_preservation_test
Documentation/core-api/liveupdate.rst | 1 +
Documentation/liveupdate/vmm.rst | 107 ++++
MAINTAINERS | 14 +
include/linux/kho/abi/kvm.h | 106 ++++
include/linux/kvm_host.h | 14 +
kernel/liveupdate/Kconfig | 15 +
tools/testing/selftests/kvm/Makefile.kvm | 6 +-
.../kvm/guest_memfd_preservation_test.c | 236 +++++++++
.../testing/selftests/kvm/include/kvm_util.h | 2 +
tools/testing/selftests/kvm/lib/kvm_util.c | 26 +-
virt/kvm/Makefile.kvm | 1 +
virt/kvm/guest_memfd.c | 185 +++++--
virt/kvm/guest_memfd.h | 44 ++
virt/kvm/guest_memfd_luo.c | 497 ++++++++++++++++++
virt/kvm/kvm_luo.c | 195 +++++++
virt/kvm/kvm_main.c | 94 +++-
virt/kvm/kvm_mm.h | 15 +
17 files changed, 1477 insertions(+), 81 deletions(-)
create mode 100644 Documentation/liveupdate/vmm.rst
create mode 100644 include/linux/kho/abi/kvm.h
create mode 100644 tools/testing/selftests/kvm/guest_memfd_preservation_test.c
create mode 100644 virt/kvm/guest_memfd.h
create mode 100644 virt/kvm/guest_memfd_luo.c
create mode 100644 virt/kvm/kvm_luo.c
--
2.55.0.rc0.786.g65d90a0328-goog
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v3 1/9] liveupdate: Add LIVEUPDATE_GUEST_MEMFD config option
2026-06-22 18:48 [PATCH v3 0/9] liveupdate: kvm: guest_memfd preservation Tarun Sahu
@ 2026-06-22 18:48 ` Tarun Sahu
2026-06-22 18:48 ` [PATCH v3 2/9] kvm: Prepare core VM structs and helpers for LUO support Tarun Sahu
` (8 subsequent siblings)
9 siblings, 0 replies; 18+ messages in thread
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
Introduce the LIVEUPDATE_GUEST_MEMFD Kconfig option. This option
enables live update support for KVM guest_memfd files, enabling
guest_memfd-backed memory preservation across kernel upgrades.
Currently, only guest_memfd files that are fully shared and pre-faulted
are supported.
Signed-off-by: Tarun Sahu <tarunsahu@google.com>
---
kernel/liveupdate/Kconfig | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/kernel/liveupdate/Kconfig b/kernel/liveupdate/Kconfig
index c13af38..2490f9a 100644
--- a/kernel/liveupdate/Kconfig
+++ b/kernel/liveupdate/Kconfig
@@ -86,4 +86,19 @@ config LIVEUPDATE_MEMFD
If unsure, say N.
+config LIVEUPDATE_GUEST_MEMFD
+ bool "Live update support for guest_memfd"
+ depends on LIVEUPDATE
+ depends on KVM_GUEST_MEMFD
+ default LIVEUPDATE
+ help
+ Enable live update support for KVM guest_memfd files. This allows
+ preserving VM Memory backed by guest_memfd file across kernel live
+ updates.
+
+ This can only be used for the guest_memfd that are fully-shared
+ and pre-faulted.
+
+ If unsure, say N.
+
endmenu
--
2.55.0.rc0.786.g65d90a0328-goog
* [PATCH v3 2/9] kvm: Prepare core VM structs and helpers for LUO support
2026-06-22 18:48 [PATCH v3 0/9] liveupdate: kvm: guest_memfd preservation Tarun Sahu
2026-06-22 18:48 ` [PATCH v3 1/9] liveupdate: Add LIVEUPDATE_GUEST_MEMFD config option Tarun Sahu
@ 2026-06-22 18:48 ` Tarun Sahu
2026-06-22 19:01 ` sashiko-bot
2026-06-22 18:48 ` [PATCH v3 3/9] kvm: kvm_luo: Allow kvm preservation with LUO Tarun Sahu
` (7 subsequent siblings)
9 siblings, 1 reply; 18+ messages in thread
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
Introduce core infrastructure to support VM preservation with LUO.
The first two changes are pure refactoring with no functional change;
the third introduces a new member in struct kvm.
- Move ITOA_MAX_LEN to kvm_mm.h for reuse by upcoming kvm_luo code.
- Add a public kvm_create_vm_file() helper wrapping kvm_create_vm()
and anon_inode_getfile() to provide a unified VM file creation API.
- Track a weak reference to the backing file in struct kvm under
CONFIG_LIVEUPDATE_GUEST_MEMFD to enable reverse file resolution
without circular lifetime dependencies.
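As an illustration of the weak-reference idea above, here is a minimal
userspace sketch using C11 atomics. It is not the kernel code: the
model_* types and functions are hypothetical stand-ins, and the model
does not reproduce RCU, which in the kernel is what keeps the file
memory valid while a lockless reader attempts get_file_active(). The
point is only that the VM stores the pointer without taking a
reference, a reader gets a real reference only if the file is still
live, and the release path clears the pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's struct file / struct kvm. */
struct model_file {
	atomic_long refcount;	/* 0 means the file is being torn down */
};

struct model_vm {
	_Atomic(struct model_file *) vm_file;	/* weak: holds no reference */
};

/* Take a reference only if the object is still live (refcount != 0). */
static struct model_file *model_file_tryget(struct model_file *f)
{
	long old = atomic_load(&f->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&f->refcount, &old, old + 1))
			return f;
	}
	return NULL;
}

/* Resolve the weak pointer into a strong reference, or NULL. */
static struct model_file *model_vm_get_file(struct model_vm *vm)
{
	struct model_file *f;

	assert(vm != NULL);
	f = atomic_load(&vm->vm_file);
	return f ? model_file_tryget(f) : NULL;
}

/* Release path: clear the weak pointer so later lookups fail cleanly. */
static void model_vm_file_release(struct model_vm *vm)
{
	assert(vm != NULL);
	atomic_store(&vm->vm_file, NULL);
}

/* Drop a reference; returns 1 when this was the last one. */
static int model_file_put(struct model_file *f)
{
	return atomic_fetch_sub(&f->refcount, 1) == 1;
}
```

A caller that gets a non-NULL result from model_vm_get_file() owns a
reference and must drop it with model_file_put(); a NULL result means
the file was already released or is on its way out.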
Signed-off-by: Tarun Sahu <tarunsahu@google.com>
---
include/linux/kvm_host.h | 14 +++++++
virt/kvm/kvm_main.c | 79 +++++++++++++++++++++++++++++-----------
virt/kvm/kvm_mm.h | 3 ++
3 files changed, 75 insertions(+), 21 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ab8cfae..cbb5eb9 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -874,6 +874,18 @@ struct kvm {
#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
/* Protected by slots_lock (for writes) and RCU (for reads) */
struct xarray mem_attr_array;
+#endif
+#ifdef CONFIG_LIVEUPDATE_GUEST_MEMFD
+ /*
+ * Weak reference to the VFS file backing this KVM instance. Stored
+ * without incrementing the file refcount to prevent a circular lifetime
+ * dependency (since file->private_data already pins this struct kvm).
+ * Used exclusively to resolve the file pointer back from struct kvm.
+ *
+ * Written/cleared via rcu_assign_pointer() and read locklessly under
+ * RCU (e.g. via get_file_active() to prevent ABA races).
+ */
+ struct file *vm_file;
#endif
char stats_id[KVM_STATS_NAME_SIZE];
};
@@ -1074,7 +1086,9 @@ void kvm_get_kvm(struct kvm *kvm);
bool kvm_get_kvm_safe(struct kvm *kvm);
void kvm_put_kvm(struct kvm *kvm);
bool file_is_kvm(struct file *file);
+struct file *kvm_create_vm_file(unsigned long type, const char *fdname);
void kvm_put_kvm_no_destroy(struct kvm *kvm);
+void kvm_uevent_notify_vm_create(struct kvm *kvm);
static inline struct kvm_memslots *__kvm_memslots(struct kvm *kvm, int as_id)
{
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e44c20c..14c3254 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -67,9 +67,6 @@
#include <linux/kvm_dirty_ring.h>
-/* Worst case buffer size needed for holding an integer. */
-#define ITOA_MAX_LEN 12
-
MODULE_AUTHOR("Qumranet");
MODULE_DESCRIPTION("Kernel-based Virtual Machine (KVM) Hypervisor");
MODULE_LICENSE("GPL");
@@ -1349,6 +1346,19 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
{
struct kvm *kvm = filp->private_data;
+#ifdef CONFIG_LIVEUPDATE_GUEST_MEMFD
+ /*
+ * Clear the weak reference of the vm file.
+ * In case vm file is closed by userspace, but kvm still has
+ * other users like vCPUs, clearing this pointer ensures
+ * that we don't have a dangling pointer to a closed file.
+ *
+ * Cleared via rcu_assign_pointer() to ensure proper memory visibility
+ * for concurrent lockless readers under RCU.
+ */
+ rcu_assign_pointer(kvm->vm_file, NULL);
+#endif
+
kvm_irqfd_release(kvm);
kvm_put_kvm(kvm);
@@ -5477,11 +5487,47 @@ bool file_is_kvm(struct file *file)
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(file_is_kvm);
+struct file *kvm_create_vm_file(unsigned long type, const char *fdname)
+{
+ struct kvm *kvm = kvm_create_vm(type, fdname);
+ struct file *file;
+
+ if (IS_ERR(kvm))
+ return ERR_CAST(kvm);
+
+ file = anon_inode_getfile("kvm-vm", &kvm_vm_fops, kvm, O_RDWR);
+ if (IS_ERR(file)) {
+ kvm_put_kvm(kvm);
+ return file;
+ }
+
+#ifdef CONFIG_LIVEUPDATE_GUEST_MEMFD
+ /*
+ * Weak reference to the file (without get_file()) to prevent a circular
+ * dependency. Safe because the file's release path clears this pointer
+ * and drops its reference to the VM.
+ *
+ * Written via rcu_assign_pointer() because the pointer can be read
+ * locklessly under RCU (e.g., in kvm_gmem_luo_preserve() via
+ * get_file_active() to prevent lockless ABA races).
+ */
+ rcu_assign_pointer(kvm->vm_file, file);
+#endif
+
+ /*
+ * Don't call kvm_put_kvm anymore at this point; file->f_op is
+ * already set, with ->release() being kvm_vm_release(). In error
+ * cases it will be called by the final fput(file) and will take
+ * care of doing kvm_put_kvm(kvm).
+ */
+
+ return file;
+}
+
static int kvm_dev_ioctl_create_vm(unsigned long type)
{
char fdname[ITOA_MAX_LEN + 1];
int r, fd;
- struct kvm *kvm;
struct file *file;
fd = get_unused_fd_flags(O_CLOEXEC);
@@ -5490,31 +5536,17 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
snprintf(fdname, sizeof(fdname), "%d", fd);
- kvm = kvm_create_vm(type, fdname);
- if (IS_ERR(kvm)) {
- r = PTR_ERR(kvm);
- goto put_fd;
- }
-
- file = anon_inode_getfile("kvm-vm", &kvm_vm_fops, kvm, O_RDWR);
+ file = kvm_create_vm_file(type, fdname);
if (IS_ERR(file)) {
r = PTR_ERR(file);
- goto put_kvm;
+ goto put_fd;
}
- /*
- * Don't call kvm_put_kvm anymore at this point; file->f_op is
- * already set, with ->release() being kvm_vm_release(). In error
- * cases it will be called by the final fput(file) and will take
- * care of doing kvm_put_kvm(kvm).
- */
- kvm_uevent_notify_change(KVM_EVENT_CREATE_VM, kvm);
+ kvm_uevent_notify_change(KVM_EVENT_CREATE_VM, file->private_data);
fd_install(fd, file);
return fd;
-put_kvm:
- kvm_put_kvm(kvm);
put_fd:
put_unused_fd(fd);
return r;
@@ -6342,6 +6374,11 @@ static void kvm_uevent_notify_change(unsigned int type, struct kvm *kvm)
kfree(env);
}
+void kvm_uevent_notify_vm_create(struct kvm *kvm)
+{
+ kvm_uevent_notify_change(KVM_EVENT_CREATE_VM, kvm);
+}
+
static void kvm_init_debug(void)
{
const struct file_operations *fops;
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index 7510ca9..6241617 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -6,6 +6,9 @@
#include <linux/kvm.h>
#include <linux/kvm_types.h>
+/* Worst case buffer size needed for holding an integer as a string. */
+#define ITOA_MAX_LEN 12
+
/*
* Architectures can choose whether to use an rwlock or spinlock
* for the mmu_lock. These macros, for use in common code
--
2.55.0.rc0.786.g65d90a0328-goog
* [PATCH v3 3/9] kvm: kvm_luo: Allow kvm preservation with LUO
2026-06-22 18:48 [PATCH v3 0/9] liveupdate: kvm: guest_memfd preservation Tarun Sahu
2026-06-22 18:48 ` [PATCH v3 1/9] liveupdate: Add LIVEUPDATE_GUEST_MEMFD config option Tarun Sahu
2026-06-22 18:48 ` [PATCH v3 2/9] kvm: Prepare core VM structs and helpers for LUO support Tarun Sahu
@ 2026-06-22 18:48 ` Tarun Sahu
2026-06-22 19:06 ` sashiko-bot
2026-06-22 18:48 ` [PATCH v3 4/9] kvm: guest_memfd: Move internal definitions and helper to new header Tarun Sahu
` (6 subsequent siblings)
9 siblings, 1 reply; 18+ messages in thread
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
Introduce KVM VM preservation support for Live Update Orchestrator.
Register an LUO file handler for KVM files to serialize and
deserialize necessary VM state across live updates. Currently, this
preserves the VM type. This implementation provides the necessary
infrastructure and dependencies for the upcoming guest_memfd
preservation support, and it can be extended to preserve more VM
state in the future.
Retrieval simply creates the KVM VM and populates it with the retrieved
data. The only catch is that there is no way to know which fd will be
assigned to this KVM file, so an atomically incremented ID is used for
the fdname.
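The fdname scheme described above can be sketched in userspace C as
follows. This is an illustration under assumptions, not the patch
itself: next_restored_vm_name() is a hypothetical helper, and C11
atomic_fetch_add() plus one stands in for the kernel's
atomic_inc_return(). ITOA_MAX_LEN uses the same value as the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Same value as in the patch; enough for any 32-bit int in decimal. */
#define ITOA_MAX_LEN 12

/* Static storage, so the counter starts at zero. */
static atomic_int restored_vm_id;

/*
 * Hypothetical helper: format the next unique ID into buf and return it.
 * atomic_fetch_add() returns the old value, so add 1 to get the new one.
 */
static int next_restored_vm_name(char *buf, size_t len)
{
	int id;

	assert(buf != NULL && len > 0);
	id = atomic_fetch_add(&restored_vm_id, 1) + 1;
	snprintf(buf, len, "%d", id);
	return id;
}
```

Because the counter is only ever incremented, two VMs retrieved
concurrently get distinct names even though neither has a file
descriptor number at that point.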
This change also updates the MAINTAINERS list for kvm_luo.c.
Signed-off-by: Tarun Sahu <tarunsahu@google.com>
---
MAINTAINERS | 11 ++
include/linux/kho/abi/kvm.h | 39 ++++++++
virt/kvm/Makefile.kvm | 1 +
virt/kvm/kvm_luo.c | 195 ++++++++++++++++++++++++++++++++++++
virt/kvm/kvm_main.c | 8 ++
virt/kvm/kvm_mm.h | 8 ++
6 files changed, 262 insertions(+)
create mode 100644 include/linux/kho/abi/kvm.h
create mode 100644 virt/kvm/kvm_luo.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 5dbc8a6..7c000e6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14411,6 +14411,17 @@ S: Maintained
F: Documentation/devicetree/bindings/leds/backlight/kinetic,ktz8866.yaml
F: drivers/video/backlight/ktz8866.c
+KVM LIVE UPDATE
+M: Pasha Tatashin <pasha.tatashin@soleen.com>
+M: Mike Rapoport <rppt@kernel.org>
+M: Pratyush Yadav <pratyush@kernel.org>
+R: Tarun Sahu <tarunsahu@google.com>
+L: kexec@lists.infradead.org
+L: kvm@vger.kernel.org
+S: Maintained
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux.git
+F: virt/kvm/kvm_luo.c
+
KVM PARAVIRT (KVM/paravirt)
M: Paolo Bonzini <pbonzini@redhat.com>
R: Vitaly Kuznetsov <vkuznets@redhat.com>
diff --git a/include/linux/kho/abi/kvm.h b/include/linux/kho/abi/kvm.h
new file mode 100644
index 0000000..718db68
--- /dev/null
+++ b/include/linux/kho/abi/kvm.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026, Google LLC.
+ * Tarun Sahu <tarunsahu@google.com>
+ *
+ * KVM Preservation ABI for Live Update Orchestrator (LUO)
+ */
+#ifndef _LINUX_KHO_ABI_KVM_H
+#define _LINUX_KHO_ABI_KVM_H
+
+#include <linux/types.h>
+#include <linux/kho/abi/kexec_handover.h>
+
+/**
+ * DOC: KVM Live Update ABI
+ *
+ * KVM uses the ABI defined below for preserving its state
+ * across a kexec reboot using the LUO.
+ *
+ * The state is serialized into a packed structure `struct kvm_luo_ser`
+ * which is handed over to the next kernel via the KHO mechanism.
+ *
+ * This interface is a contract. Any modification to the structure layout
+ * constitutes a breaking change. Such changes require incrementing the
+ * version number in the KVM_LUO_FH_COMPATIBLE compatibility string.
+ */
+
+/**
+ * struct kvm_luo_ser - Main serialization structure for a KVM VM.
+ * @type: The type of VM.
+ */
+struct kvm_luo_ser {
+ u64 type;
+} __packed;
+
+/* The compatibility string for KVM VM file handler */
+#define KVM_LUO_FH_COMPATIBLE "kvm_vm_luo_v1"
+
+#endif /* _LINUX_KHO_ABI_KVM_H */
diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
index d047d4c..c1a9621 100644
--- a/virt/kvm/Makefile.kvm
+++ b/virt/kvm/Makefile.kvm
@@ -13,3 +13,4 @@ kvm-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o
kvm-$(CONFIG_HAVE_KVM_DIRTY_RING) += $(KVM)/dirty_ring.o
kvm-$(CONFIG_HAVE_KVM_PFNCACHE) += $(KVM)/pfncache.o
kvm-$(CONFIG_KVM_GUEST_MEMFD) += $(KVM)/guest_memfd.o
+kvm-$(CONFIG_LIVEUPDATE_GUEST_MEMFD) += $(KVM)/kvm_luo.o
diff --git a/virt/kvm/kvm_luo.c b/virt/kvm/kvm_luo.c
new file mode 100644
index 0000000..6728877
--- /dev/null
+++ b/virt/kvm/kvm_luo.c
@@ -0,0 +1,195 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (c) 2026, Google LLC.
+ * Tarun Sahu <tarunsahu@google.com>
+ *
+ * KVM VM Preservation for Live Update Orchestrator (LUO)
+ */
+
+/**
+ * DOC: KVM VM Preservation via LUO
+ *
+ * Overview
+ * ========
+ *
+ * KVM virtual machines (VMs) can be preserved over a kexec reboot using the
+ * Live Update Orchestrator (LUO) file preservation. This allows userspace
+ * to preserve KVM VM state across kexec reboots.
+ *
+ * The preservation is not intended to be fully transparent. Only specific
+ * VM configuration and state are preserved, while other aspects of the VM
+ * must be re-established or re-configured by userspace after retrieval.
+ *
+ * Preserved Properties
+ * ====================
+ *
+ * The following properties of the KVM VM are preserved across kexec:
+ *
+ * VM Type
+ * The VM type (e.g., on x86 architecture, the vm_type parameter) is
+ * preserved.
+ *
+ * Non-Preserved Properties
+ * ========================
+ *
+ * The preservation does not cover:
+ *
+ * - vCPUs and vCPU states
+ * - Memory slot layout (memslots)
+ * - Interrupt controllers and IRQ routings
+ * - Coalesced MMIO zones
+ * - Device bindings (VFIO/Eventfds)
+ * - Active paging or guest registers state
+ * - etc
+ */
+#include <linux/liveupdate.h>
+#include <linux/kvm_host.h>
+#include <linux/pagemap.h>
+#include <linux/file.h>
+#include <linux/err.h>
+#include <linux/anon_inodes.h>
+#include <linux/magic.h>
+#include <linux/kexec_handover.h>
+#include <linux/kho/abi/kexec_handover.h>
+#include <linux/kho/abi/kvm.h>
+#include "kvm_mm.h"
+
+static bool kvm_luo_can_preserve(struct liveupdate_file_handler *handler,
+ struct file *file)
+{
+ return file_is_kvm(file);
+}
+
+static int kvm_luo_preserve(struct liveupdate_file_op_args *args)
+{
+ DECLARE_KHOSER_PTR(sd, struct kvm_luo_ser *);
+ struct kvm *kvm = args->file->private_data;
+ struct kvm_luo_ser *ser;
+
+ if (kvm->vm_dead || kvm->vm_bugged)
+ return -EINVAL;
+
+ ser = kho_alloc_preserve(sizeof(*ser));
+ if (IS_ERR(ser))
+ return PTR_ERR(ser);
+
+#if defined(CONFIG_X86)
+ ser->type = kvm->arch.vm_type;
+#elif defined(CONFIG_ARM64)
+ ser->type = kvm_phys_shift(&kvm->arch.mmu);
+ if (kvm_vm_is_protected(kvm))
+ ser->type |= KVM_VM_TYPE_ARM_PROTECTED;
+
+#else
+ ser->type = 0;
+#endif
+
+ KHOSER_STORE_PTR(sd, ser);
+ KHOSER_COPY_TYPEUNSAFE(args->serialized_data, sd);
+
+ return 0;
+}
+
+static atomic_t restored_vm_id = ATOMIC_INIT(0);
+
+static int kvm_luo_retrieve(struct liveupdate_file_op_args *args)
+{
+ char fdname[ITOA_MAX_LEN + 1];
+ struct kvm_luo_ser *ser;
+ struct file *file;
+ struct kvm *kvm;
+ int err = 0;
+
+ ser = KHOSER_LOAD_PTR(args->serialized_data);
+ if (!ser)
+ return -EINVAL;
+
+ snprintf(fdname, sizeof(fdname), "%d",
+ atomic_inc_return(&restored_vm_id));
+
+ file = kvm_create_vm_file(ser->type, fdname);
+ if (IS_ERR(file)) {
+ err = PTR_ERR(file);
+ goto err_free_ser;
+ }
+
+ kvm = file->private_data;
+
+ args->file = file;
+ kho_restore_free(ser);
+
+ kvm_uevent_notify_vm_create(kvm);
+ return 0;
+
+err_free_ser:
+ kho_restore_free(ser);
+ return err;
+}
+
+static void kvm_luo_unpreserve(struct liveupdate_file_op_args *args)
+{
+ struct kvm_luo_ser *ser;
+
+ /*
+ * in case preservation failed, args->serialized_data will
+ * be NULL and kvm_luo_preserve takes care of cleaning up.
+ * If preserve succeeds, this condition fails and unpreserve
+ * function takes care of cleaning up.
+ */
+ ser = KHOSER_LOAD_PTR(args->serialized_data);
+ if (WARN_ON_ONCE(!ser))
+ return;
+
+ kho_unpreserve_free(ser);
+}
+
+static void kvm_luo_finish(struct liveupdate_file_op_args *args)
+{
+ struct kvm_luo_ser *ser;
+
+ /*
+ * If retrieve_status is true or set to error, nothing to do here.
+ * Already cleaned up in kvm_luo_retrieve().
+ */
+ if (args->retrieve_status)
+ return;
+
+ ser = KHOSER_LOAD_PTR(args->serialized_data);
+ if (!ser)
+ return;
+
+ kho_restore_free(ser);
+}
+
+static const struct liveupdate_file_ops kvm_luo_file_ops = {
+ .can_preserve = kvm_luo_can_preserve,
+ .preserve = kvm_luo_preserve,
+ .retrieve = kvm_luo_retrieve,
+ .unpreserve = kvm_luo_unpreserve,
+ .finish = kvm_luo_finish,
+ .owner = THIS_MODULE,
+};
+
+static struct liveupdate_file_handler kvm_luo_handler = {
+ .ops = &kvm_luo_file_ops,
+ .compatible = KVM_LUO_FH_COMPATIBLE,
+};
+
+int kvm_luo_init(void)
+{
+ int err = liveupdate_register_file_handler(&kvm_luo_handler);
+
+ if (err && err != -EOPNOTSUPP) {
+ pr_err("Could not register kvm_vm_luo handler: %pe\n", ERR_PTR(err));
+ return err;
+ }
+
+ return 0;
+}
+
+void kvm_luo_exit(void)
+{
+ liveupdate_unregister_file_handler(&kvm_luo_handler);
+}
+
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 14c3254..d9c3dd1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -6577,6 +6577,10 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
if (r)
goto err_virt;
+ r = kvm_luo_init();
+ if (r)
+ goto err_luo;
+
/*
* Registration _must_ be the very last thing done, as this exposes
* /dev/kvm to userspace, i.e. all infrastructure must be setup!
@@ -6590,6 +6594,8 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
return 0;
err_register:
+ kvm_luo_exit();
+err_luo:
kvm_uninit_virtualization();
err_virt:
kvm_gmem_exit();
@@ -6619,6 +6625,8 @@ void kvm_exit(void)
*/
misc_deregister(&kvm_dev);
+ kvm_luo_exit();
+
kvm_uninit_virtualization();
debugfs_remove_recursive(kvm_debugfs_dir);
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index 6241617..8719871 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -100,4 +100,12 @@ static inline void kvm_gmem_unbind(struct kvm_memory_slot *slot)
}
#endif /* CONFIG_KVM_GUEST_MEMFD */
+#ifdef CONFIG_LIVEUPDATE_GUEST_MEMFD
+int kvm_luo_init(void);
+void kvm_luo_exit(void);
+#else
+static inline int kvm_luo_init(void) { return 0; }
+static inline void kvm_luo_exit(void) {}
+#endif /* CONFIG_LIVEUPDATE_GUEST_MEMFD */
+
#endif /* __KVM_MM_H__ */
--
2.55.0.rc0.786.g65d90a0328-goog
* [PATCH v3 4/9] kvm: guest_memfd: Move internal definitions and helper to new header
2026-06-22 18:48 [PATCH v3 0/9] liveupdate: kvm: guest_memfd preservation Tarun Sahu
` (2 preceding siblings ...)
2026-06-22 18:48 ` [PATCH v3 3/9] kvm: kvm_luo: Allow kvm preservation with LUO Tarun Sahu
@ 2026-06-22 18:48 ` Tarun Sahu
2026-06-22 18:48 ` [PATCH v3 5/9] kvm: guest_memfd: Add support for freezing and unfreezing mappings Tarun Sahu
` (5 subsequent siblings)
9 siblings, 0 replies; 18+ messages in thread
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
To support guest_memfd memory preservation with LUO, the guest_memfd LUO
code needs to access guest_memfd internals and reconstruct guest_memfd
file instances from preserved state.
Extract gmem_file, gmem_inode, and the GMEM_I() helper from guest_memfd.c
into a new internal header virt/kvm/guest_memfd.h.
Additionally, split __kvm_gmem_create() to expose a non-static
__kvm_gmem_create_file() helper. This helper returns a struct file
instead of a file descriptor, enabling file creation and initialization
without installing it into a file descriptor table.
Signed-off-by: Tarun Sahu <tarunsahu@google.com>
---
virt/kvm/guest_memfd.c | 68 +++++++++++++++++-------------------------
virt/kvm/guest_memfd.h | 39 ++++++++++++++++++++++++
2 files changed, 67 insertions(+), 40 deletions(-)
create mode 100644 virt/kvm/guest_memfd.h
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 8669068..fe1adc9b 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -7,38 +7,12 @@
#include <linux/mempolicy.h>
#include <linux/pseudo_fs.h>
#include <linux/pagemap.h>
+#include "guest_memfd.h"
#include "kvm_mm.h"
static struct vfsmount *kvm_gmem_mnt;
-/*
- * A guest_memfd instance can be associated multiple VMs, each with its own
- * "view" of the underlying physical memory.
- *
- * The gmem's inode is effectively the raw underlying physical storage, and is
- * used to track properties of the physical memory, while each gmem file is
- * effectively a single VM's view of that storage, and is used to track assets
- * specific to its associated VM, e.g. memslots=>gmem bindings.
- */
-struct gmem_file {
- struct kvm *kvm;
- struct xarray bindings;
- struct list_head entry;
-};
-
-struct gmem_inode {
- struct shared_policy policy;
- struct inode vfs_inode;
- struct list_head gmem_file_list;
-
- u64 flags;
-};
-
-static __always_inline struct gmem_inode *GMEM_I(struct inode *inode)
-{
- return container_of(inode, struct gmem_inode, vfs_inode);
-}
#define kvm_gmem_for_each_file(f, inode) \
list_for_each_entry(f, &GMEM_I(inode)->gmem_file_list, entry)
@@ -557,23 +531,17 @@ bool __weak kvm_arch_supports_gmem_init_shared(struct kvm *kvm)
return true;
}
-static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
+struct file *__kvm_gmem_create_file(struct kvm *kvm, loff_t size, u64 flags)
{
static const char *name = "[kvm-gmem]";
struct gmem_file *f;
struct inode *inode;
struct file *file;
- int fd, err;
-
- fd = get_unused_fd_flags(0);
- if (fd < 0)
- return fd;
+ int err;
f = kzalloc_obj(*f);
- if (!f) {
- err = -ENOMEM;
- goto err_fd;
- }
+ if (!f)
+ return ERR_PTR(-ENOMEM);
/* __fput() will take care of fops_put(). */
if (!fops_get(&kvm_gmem_fops)) {
@@ -612,8 +580,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
xa_init(&f->bindings);
list_add(&f->entry, &GMEM_I(inode)->gmem_file_list);
- fd_install(fd, file);
- return fd;
+ return file;
err_inode:
iput(inode);
@@ -621,7 +588,28 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
fops_put(&kvm_gmem_fops);
err_gmem:
kfree(f);
-err_fd:
+ return ERR_PTR(err);
+}
+
+static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
+{
+ struct file *file;
+ int fd, err;
+
+ fd = get_unused_fd_flags(0);
+ if (fd < 0)
+ return fd;
+
+ file = __kvm_gmem_create_file(kvm, size, flags);
+ if (IS_ERR(file)) {
+ err = PTR_ERR(file);
+ goto err_put_fd;
+ }
+
+ fd_install(fd, file);
+ return fd;
+
+err_put_fd:
put_unused_fd(fd);
return err;
}
diff --git a/virt/kvm/guest_memfd.h b/virt/kvm/guest_memfd.h
new file mode 100644
index 0000000..c528b04
--- /dev/null
+++ b/virt/kvm/guest_memfd.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __KVM_GUEST_MEMFD_H__
+#define __KVM_GUEST_MEMFD_H__ 1
+
+#include <linux/kvm_host.h>
+#include <linux/fs.h>
+#include <linux/mempolicy.h>
+
+/*
+ * A guest_memfd instance can be associated multiple VMs, each with its own
+ * "view" of the underlying physical memory.
+ *
+ * The gmem's inode is effectively the raw underlying physical storage, and is
+ * used to track properties of the physical memory, while each gmem file is
+ * effectively a single VM's view of that storage, and is used to track assets
+ * specific to its associated VM, e.g. memslots=>gmem bindings.
+ */
+struct gmem_file {
+ struct kvm *kvm;
+ struct xarray bindings;
+ struct list_head entry;
+};
+
+struct gmem_inode {
+ struct shared_policy policy;
+ struct inode vfs_inode;
+ struct list_head gmem_file_list;
+
+ u64 flags;
+};
+
+static inline struct gmem_inode *GMEM_I(struct inode *inode)
+{
+ return container_of(inode, struct gmem_inode, vfs_inode);
+}
+
+struct file *__kvm_gmem_create_file(struct kvm *kvm, loff_t size, u64 flags);
+
+#endif /* __KVM_GUEST_MEMFD_H__ */
--
2.55.0.rc0.786.g65d90a0328-goog
* [PATCH v3 5/9] kvm: guest_memfd: Add support for freezing and unfreezing mappings
2026-06-22 18:48 [PATCH v3 0/9] liveupdate: kvm: guest_memfd preservation Tarun Sahu
` (3 preceding siblings ...)
2026-06-22 18:48 ` [PATCH v3 4/9] kvm: guest_memfd: Move internal definitions and helper to new header Tarun Sahu
@ 2026-06-22 18:48 ` Tarun Sahu
2026-06-22 19:01 ` sashiko-bot
2026-06-22 18:48 ` [PATCH v3 6/9] kvm: guest_memfd_luo: add support for guest_memfd preservation Tarun Sahu
` (4 subsequent siblings)
9 siblings, 1 reply; 18+ messages in thread
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
Introduce a freeze state on gmem_inode that blocks fallocate() calls and
any new page-fault allocation. This prevents the gmem file from being
modified while it is being preserved.
SRCU synchronizes the freeze call: the writer blocks until all readers
have finished, and readers are re-entrant.
If a fault fails because the inode is frozen, it returns -EPERM and the
VM exits to userspace. Userspace must handle this properly, as every new
fault will fail until the mapping is unfrozen.
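The freeze protocol described above can be modelled in userspace roughly
as follows. This is a sketch under assumptions, not the kernel
implementation: the model_gmem type and its functions are hypothetical,
and a plain reader counter stands in for SRCU (real SRCU waits only for
readers that started before the writer, without a shared counter). It
shows the ordering that matters: readers enter the read side before
checking the flag, and the writer publishes the flag before waiting.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a guest_memfd inode. */
struct model_gmem {
	atomic_bool frozen;
	atomic_int readers;	/* crude substitute for SRCU read sections */
	atomic_int nr_pages;	/* pages allocated so far */
};

/* Allocation path: enter the read side, then refuse if frozen. */
static int model_gmem_alloc_page(struct model_gmem *g)
{
	int ret = 0;

	assert(g != NULL);
	atomic_fetch_add(&g->readers, 1);	/* roughly srcu_read_lock() */
	if (atomic_load(&g->frozen))
		ret = -EPERM;
	else
		atomic_fetch_add(&g->nr_pages, 1);
	atomic_fetch_sub(&g->readers, 1);	/* roughly srcu_read_unlock() */
	return ret;
}

/* Freeze path: publish the flag, then wait for in-flight readers. */
static void model_gmem_set_frozen(struct model_gmem *g, bool freeze)
{
	assert(g != NULL);
	atomic_store(&g->frozen, freeze);
	if (freeze) {
		/* Roughly synchronize_srcu(): wait for readers to drain. */
		while (atomic_load(&g->readers) != 0)
			;
	}
}
```

Once model_gmem_set_frozen(g, true) returns, no allocation that missed
the flag can still be in progress, so the page count is stable until the
mapping is unfrozen.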
Signed-off-by: Tarun Sahu <tarunsahu@google.com>
---
virt/kvm/guest_memfd.c | 117 +++++++++++++++++++++++++++++++++++++----
virt/kvm/guest_memfd.h | 5 ++
2 files changed, 111 insertions(+), 11 deletions(-)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index fe1adc9b..a4d9d34 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -7,11 +7,13 @@
#include <linux/mempolicy.h>
#include <linux/pseudo_fs.h>
#include <linux/pagemap.h>
+#include <linux/srcu.h>
#include "guest_memfd.h"
#include "kvm_mm.h"
static struct vfsmount *kvm_gmem_mnt;
+static struct srcu_struct kvm_gmem_freeze_srcu;
#define kvm_gmem_for_each_file(f, inode) \
@@ -96,6 +98,7 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
/* TODO: Support huge pages. */
struct mempolicy *policy;
struct folio *folio;
+ int idx;
/*
* Fast-path: See if folio is already present in mapping to avoid
@@ -105,12 +108,20 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
if (!IS_ERR(folio))
return folio;
+ idx = srcu_read_lock(&kvm_gmem_freeze_srcu);
+ if (kvm_gmem_is_frozen(inode)) {
+ srcu_read_unlock(&kvm_gmem_freeze_srcu, idx);
+ return ERR_PTR(-EPERM);
+ }
+
policy = mpol_shared_policy_lookup(&GMEM_I(inode)->policy, index);
folio = __filemap_get_folio_mpol(inode->i_mapping, index,
FGP_LOCK | FGP_CREAT,
mapping_gfp_mask(inode->i_mapping), policy);
mpol_cond_put(policy);
+ srcu_read_unlock(&kvm_gmem_freeze_srcu, idx);
+
/*
* External interfaces like kvm_gmem_get_pfn() support dealing
* with hugepages to a degree, but internally, guest_memfd currently
@@ -273,16 +284,30 @@ static long kvm_gmem_allocate(struct inode *inode, loff_t offset, loff_t len)
static long kvm_gmem_fallocate(struct file *file, int mode, loff_t offset,
loff_t len)
{
+ struct inode *inode = file_inode(file);
int ret;
+ int idx;
- if (!(mode & FALLOC_FL_KEEP_SIZE))
- return -EOPNOTSUPP;
+ idx = srcu_read_lock(&kvm_gmem_freeze_srcu);
+ if (kvm_gmem_is_frozen(inode)) {
+ srcu_read_unlock(&kvm_gmem_freeze_srcu, idx);
+ return -EPERM;
+ }
- if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
- return -EOPNOTSUPP;
+ if (!(mode & FALLOC_FL_KEEP_SIZE)) {
+ ret = -EOPNOTSUPP;
+ goto out;
+ }
- if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len))
- return -EINVAL;
+ if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) {
+ ret = -EOPNOTSUPP;
+ goto out;
+ }
+
+ if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len)) {
+ ret = -EINVAL;
+ goto out;
+ }
if (mode & FALLOC_FL_PUNCH_HOLE)
ret = kvm_gmem_punch_hole(file_inode(file), offset, len);
@@ -291,6 +316,9 @@ static long kvm_gmem_fallocate(struct file *file, int mode, loff_t offset,
if (!ret)
file_modified(file);
+
+out:
+ srcu_read_unlock(&kvm_gmem_freeze_srcu, idx);
return ret;
}
@@ -948,7 +976,9 @@ static void kvm_gmem_destroy_inode(struct inode *inode)
static void kvm_gmem_free_inode(struct inode *inode)
{
- kmem_cache_free(kvm_gmem_inode_cachep, GMEM_I(inode));
+ struct gmem_inode *gi = GMEM_I(inode);
+
+ kmem_cache_free(kvm_gmem_inode_cachep, gi);
}
static const struct super_operations kvm_gmem_super_operations = {
@@ -1005,12 +1035,21 @@ int kvm_gmem_init(struct module *module)
if (!kvm_gmem_inode_cachep)
return -ENOMEM;
+ ret = init_srcu_struct(&kvm_gmem_freeze_srcu);
+ if (ret)
+ goto err_cache;
+
ret = kvm_gmem_init_mount();
- if (ret) {
- kmem_cache_destroy(kvm_gmem_inode_cachep);
- return ret;
- }
+ if (ret)
+ goto err_srcu;
+
return 0;
+
+err_srcu:
+ cleanup_srcu_struct(&kvm_gmem_freeze_srcu);
+err_cache:
+ kmem_cache_destroy(kvm_gmem_inode_cachep);
+ return ret;
}
void kvm_gmem_exit(void)
@@ -1018,5 +1057,61 @@ void kvm_gmem_exit(void)
kern_unmount(kvm_gmem_mnt);
kvm_gmem_mnt = NULL;
rcu_barrier();
+ cleanup_srcu_struct(&kvm_gmem_freeze_srcu);
kmem_cache_destroy(kvm_gmem_inode_cachep);
}
+
+/**
+ * kvm_gmem_freeze - Freeze or unfreeze a guest_memfd inode mapping.
+ * @inode: The guest_memfd inode.
+ * @freeze: True to freeze, false to unfreeze.
+ *
+ * This API is used strictly during the live update / preservation transition
+ * window to prevent host userspace and guest-side faults from making any
+ * mapping modifications (such as fallocate or page fault allocation)
+ * to the guest_memfd page cache.
+ *
+ * Synchronization Strategy (Sleepable RCU):
+ * To avoid high-contention VFS locks (like inode_lock or
+ * filemap_invalidate_lock) on the vCPU page fault hot paths, readers check
+ * the frozen flag under a system-wide SRCU (`kvm_gmem_freeze_srcu`), and
+ * freezing waits for in-flight readers with synchronize_srcu().
+ *
+ * Global vs. Per-Inode SRCU
+ * =========================
+ * A single system-wide global static `srcu_struct` is used instead of a
+ * per-inode SRCU structure to completely prevent unprivileged users from
+ * exhausting the host's per-CPU memory allocator. Because
+ * `init_srcu_struct()` allocates per-CPU memory via `alloc_percpu()`, which
+ * is not accounted by memory cgroups (memcg),
+ * a per-inode SRCU structure would allow a tenant to bypass cgroup limits and
+ * trigger a system-wide Out-of-Memory (OOM) crash simply by spawning a large
+ * number of guest_memfd file descriptors (bounded only by RLIMIT_NOFILE).
+ *
+ * Flag Modification Note:
+ * Since `GUEST_MEMFD_F_MAPPING_FROZEN` is the ONLY flag in
+ * `GMEM_I(inode)->flags` that is mutated dynamically at runtime (all other
+ * flags are creation-time flags which remain strictly read-only), there is
+ * no possibility of concurrent bit-modification races. Therefore, a standard
+ * `WRITE_ONCE` is fully safe and does not require complex `cmpxchg`
+ * synchronization loops.
+ */
+void kvm_gmem_freeze(struct inode *inode, bool freeze)
+{
+ u64 flags = READ_ONCE(GMEM_I(inode)->flags);
+
+ if (freeze)
+ flags |= GUEST_MEMFD_F_MAPPING_FROZEN;
+ else
+ flags &= ~GUEST_MEMFD_F_MAPPING_FROZEN;
+
+ WRITE_ONCE(GMEM_I(inode)->flags, flags);
+
+ if (freeze)
+ synchronize_srcu(&kvm_gmem_freeze_srcu);
+}
+
+bool kvm_gmem_is_frozen(struct inode *inode)
+{
+ return READ_ONCE(GMEM_I(inode)->flags) & GUEST_MEMFD_F_MAPPING_FROZEN;
+}
diff --git a/virt/kvm/guest_memfd.h b/virt/kvm/guest_memfd.h
index c528b04..028c348 100644
--- a/virt/kvm/guest_memfd.h
+++ b/virt/kvm/guest_memfd.h
@@ -29,11 +29,16 @@ struct gmem_inode {
u64 flags;
};
+/* Internal kernel-only flags (must not overlap with UAPI flags) */
+#define GUEST_MEMFD_F_MAPPING_FROZEN (1ULL << 63)
+
static inline struct gmem_inode *GMEM_I(struct inode *inode)
{
return container_of(inode, struct gmem_inode, vfs_inode);
}
struct file *__kvm_gmem_create_file(struct kvm *kvm, loff_t size, u64 flags);
+void kvm_gmem_freeze(struct inode *inode, bool freeze);
+bool kvm_gmem_is_frozen(struct inode *inode);
#endif /* __KVM_GUEST_MEMFD_H__ */
--
2.55.0.rc0.786.g65d90a0328-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v3 6/9] kvm: guest_memfd_luo: add support for guest_memfd preservation
2026-06-22 18:48 [PATCH v3 0/9] liveupdate: kvm: guest_memfd preservation Tarun Sahu
` (4 preceding siblings ...)
2026-06-22 18:48 ` [PATCH v3 5/9] kvm: guest_memfd: Add support for freezing and unfreezing mappings Tarun Sahu
@ 2026-06-22 18:48 ` Tarun Sahu
2026-06-22 19:08 ` sashiko-bot
2026-06-22 18:48 ` [PATCH v3 7/9] docs: add documentation for guest_memfd preservation via LUO Tarun Sahu
` (3 subsequent siblings)
9 siblings, 1 reply; 18+ messages in thread
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
This patch sets up the basic infrastructure to preserve guest_memfd.
Currently, only fully shared guest_memfd backed by PAGE_SIZE pages is
supported.
It uses the INIT_SHARED flag to check shareability and
kvm_arch_has_private_mem() to check that conversion of memory to private
is not supported.
Preservation is straightforward: it walks the folios in the page cache
and serializes them.
kvm_gmem_freeze() is called on preserve to freeze the guest_memfd
inode. This blocks changes to the inode mapping through fallocate calls
and fails any new fault allocation on or after preservation.
This change also updates the MAINTAINERS list.
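For reference, the per-folio record that the walk fills in can be
mirrored in userspace as a rough sketch. The names folio_ser,
folio_ser_make() and FOLIO_UPTODATE are illustrative stand-ins for
struct guest_memfd_luo_folio_ser and GUEST_MEMFD_LUO_FOLIO_UPTODATE,
uint64_t stands in for u64, and the size check assumes GCC/Clang
bit-field layout.

```c
#include <assert.h>
#include <stdint.h>

#define FOLIO_UPTODATE (1u << 0)	/* mirrors GUEST_MEMFD_LUO_FOLIO_UPTODATE */

/* Userspace mirror of the serialized per-folio record (illustrative). */
struct folio_ser {
	uint64_t pfn:52;	/* page frame number of the folio */
	uint64_t flags:12;	/* per-folio state flags */
	uint64_t index;		/* page offset of the folio within the file */
} __attribute__((packed));

/* pfn and flags share one 64-bit word, so each record is two words. */
static_assert(sizeof(struct folio_ser) == 16, "unexpected record size");

/* Build one record the way the second walk pass fills folios_ser[count]. */
static struct folio_ser folio_ser_make(uint64_t pfn, uint64_t index, int uptodate)
{
	struct folio_ser s = {
		.pfn = pfn,	/* silently truncated to the low 52 bits */
		.flags = uptodate ? FOLIO_UPTODATE : 0,
		.index = index,
	};

	return s;
}
```

Since the pfn field is 52 bits wide, any PFN that does not fit is
truncated on store; the sketch makes that visible rather than guarding
against it.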
Signed-off-by: Tarun Sahu <tarunsahu@google.com>
---
MAINTAINERS | 1 +
include/linux/kho/abi/kvm.h | 79 +++++-
virt/kvm/Makefile.kvm | 2 +-
virt/kvm/guest_memfd_luo.c | 497 ++++++++++++++++++++++++++++++++++++
virt/kvm/kvm_main.c | 7 +
virt/kvm/kvm_mm.h | 4 +
6 files changed, 583 insertions(+), 7 deletions(-)
create mode 100644 virt/kvm/guest_memfd_luo.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 7c000e6..d1d699ce 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14420,6 +14420,7 @@ L: kexec@lists.infradead.org
L: kvm@vger.kernel.org
S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux.git
+F: virt/kvm/guest_memfd_luo.c
F: virt/kvm/kvm_luo.c
KVM PARAVIRT (KVM/paravirt)
diff --git a/include/linux/kho/abi/kvm.h b/include/linux/kho/abi/kvm.h
index 718db68..42074d7 100644
--- a/include/linux/kho/abi/kvm.h
+++ b/include/linux/kho/abi/kvm.h
@@ -9,20 +9,23 @@
#define _LINUX_KHO_ABI_KVM_H
#include <linux/types.h>
+#include <linux/bits.h>
#include <linux/kho/abi/kexec_handover.h>
/**
- * DOC: KVM Live Update ABI
+ * DOC: KVM and guest_memfd Live Update ABI
*
- * KVM uses the ABI defined below for preserving its state
+ * KVM and guest_memfd use the ABI defined below for preserving their states
* across a kexec reboot using the LUO.
*
- * The state is serialized into a packed structure `struct kvm_luo_ser`
- * which is handed over to the next kernel via the KHO mechanism.
+ * The state is serialized into packed structures (struct kvm_luo_ser and
+ * struct guest_memfd_luo_ser) which are handed over to the next kernel via
+ * the KHO mechanism.
*
- * This interface is a contract. Any modification to the structure layout
+ * This interface is a contract. Any modification to the structure layouts
* constitutes a breaking change. Such changes require incrementing the
- * version number in the KVM_LUO_FH_COMPATIBLE compatibility string.
+ * version number in the KVM_LUO_FH_COMPATIBLE or
+ * GUEST_MEMFD_LUO_FH_COMPATIBLE compatibility strings.
*/
/**
@@ -36,4 +39,68 @@ struct kvm_luo_ser {
/* The compatibility string for KVM VM file handler */
#define KVM_LUO_FH_COMPATIBLE "kvm_vm_luo_v1"
+/**
+ * struct guest_memfd_luo_folio_ser - Serialization layout for a single folio in guest_memfd.
+ * @pfn: Page Frame Number of the folio.
+ * @index: Page offset of the folio within the file.
+ * @flags: State flags associated with the folio.
+ */
+struct guest_memfd_luo_folio_ser {
+ u64 pfn:52;
+ u64 flags:12;
+ u64 index;
+} __packed;
+
+/**
+ * GUEST_MEMFD_LUO_FOLIO_UPTODATE - The folio is up-to-date.
+ *
+ * Per-folio flag recording whether the folio was up-to-date when preserved.
+ */
+#define GUEST_MEMFD_LUO_FOLIO_UPTODATE BIT(0)
+
+
+/**
+ * GUEST_MEMFD_LUO_FLAG_MMAP - The guest_memfd supports mmap.
+ *
+ * This flag indicates that the guest_memfd supports host-side mmap.
+ */
+#define GUEST_MEMFD_LUO_FLAG_MMAP BIT(0)
+
+/**
+ * GUEST_MEMFD_LUO_FLAG_INIT_SHARED - Initialize memory as shared.
+ *
+ * This flag indicates that the guest_memfd has been initialized as shared
+ * memory.
+ */
+#define GUEST_MEMFD_LUO_FLAG_INIT_SHARED BIT(1)
+
+/**
+ * GUEST_MEMFD_LUO_SUPPORTED_FLAGS - Supported guest_memfd LUO flags mask.
+ *
+ * A mask of all guest_memfd preservation flags supported by this version
+ * of the KVM LUO ABI.
+ */
+#define GUEST_MEMFD_LUO_SUPPORTED_FLAGS (GUEST_MEMFD_LUO_FLAG_MMAP | \
+ GUEST_MEMFD_LUO_FLAG_INIT_SHARED)
+
+/**
+ * struct guest_memfd_luo_ser - Main serialization structure for guest_memfd.
+ * @size: The size of the file in bytes.
+ * @flags: File-level flags.
+ * @nr_folios: Number of folios in the folios array.
+ * @vm_token: Token of the associated KVM VM instance.
+ * @folios: KHO vmalloc descriptor pointing to the array of
+ * struct guest_memfd_luo_folio_ser.
+ */
+struct guest_memfd_luo_ser {
+ u64 size;
+ u64 flags;
+ u64 nr_folios;
+ u64 vm_token;
+ struct kho_vmalloc folios;
+} __packed;
+
+/* The compatibility string for GUEST_MEMFD file handler */
+#define GUEST_MEMFD_LUO_FH_COMPATIBLE "guest_memfd_luo_v1"
+
#endif /* _LINUX_KHO_ABI_KVM_H */
diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
index c1a9621..d30fca0 100644
--- a/virt/kvm/Makefile.kvm
+++ b/virt/kvm/Makefile.kvm
@@ -13,4 +13,4 @@ kvm-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o
kvm-$(CONFIG_HAVE_KVM_DIRTY_RING) += $(KVM)/dirty_ring.o
kvm-$(CONFIG_HAVE_KVM_PFNCACHE) += $(KVM)/pfncache.o
kvm-$(CONFIG_KVM_GUEST_MEMFD) += $(KVM)/guest_memfd.o
-kvm-$(CONFIG_LIVEUPDATE_GUEST_MEMFD) += $(KVM)/kvm_luo.o
+kvm-$(CONFIG_LIVEUPDATE_GUEST_MEMFD) += $(KVM)/guest_memfd_luo.o $(KVM)/kvm_luo.o
diff --git a/virt/kvm/guest_memfd_luo.c b/virt/kvm/guest_memfd_luo.c
new file mode 100644
index 0000000..c242b1d
--- /dev/null
+++ b/virt/kvm/guest_memfd_luo.c
@@ -0,0 +1,497 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (c) 2026, Google LLC.
+ * Tarun Sahu <tarunsahu@google.com>
+ *
+ * Guestmemfd Preservation for Live Update Orchestrator (LUO)
+ */
+
+/**
+ * DOC: Guestmemfd Preservation via LUO
+ *
+ * Overview
+ * ========
+ *
+ * Guest memory file descriptors (guest_memfd) can be preserved over a kexec
+ * reboot using the Live Update Orchestrator (LUO) file preservation. This
+ * allows userspace to preserve VM memory across kexec reboots.
+ *
+ * The preservation is not intended to be transparent. Only select properties
+ * of the guest_memfd are preserved, while others are reset to default.
+ *
+ * Preserved Properties
+ * ====================
+ *
+ * The following properties of guest_memfd are preserved across kexec:
+ *
+ * File Size
+ * The size of the file is preserved.
+ *
+ * File Contents
+ * All folios present in the page cache are preserved.
+ *
+ * File-level Flags
+ * The file-level flags (such as MMAP support and INIT_SHARED default mapping)
+ * are preserved.
+ *
+ * Non-Preserved Properties
+ * ========================
+ *
+ * NUMA Memory Policy
+ * NUMA memory policies associated with the guest_memfd are not preserved.
+ */
+#include <linux/liveupdate.h>
+#include <linux/kvm_host.h>
+#include <linux/pagemap.h>
+#include <linux/file.h>
+#include <linux/err.h>
+#include <linux/anon_inodes.h>
+#include <linux/magic.h>
+#include <linux/kexec_handover.h>
+#include <linux/kho/abi/kexec_handover.h>
+#include <linux/kho/abi/kvm.h>
+#include "guest_memfd.h"
+#include "kvm_mm.h"
+
+
+static int kvm_gmem_luo_walk_folios(struct address_space *mapping,
+ pgoff_t end_index, struct guest_memfd_luo_folio_ser *folios_ser,
+ u64 *out_count)
+{
+ struct folio_batch fbatch;
+ pgoff_t index = 0;
+ u64 count = 0;
+ int err = 0;
+
+ folio_batch_init(&fbatch);
+ while (index < end_index) {
+ unsigned int nr, i;
+
+ nr = filemap_get_folios(mapping, &index, end_index - 1, &fbatch);
+ if (nr == 0)
+ break;
+
+ for (i = 0; i < nr; i++) {
+ struct folio *folio = fbatch.folios[i];
+
+ if (folios_ser) {
+ if (folio_test_hwpoison(folio)) {
+ err = -EHWPOISON;
+ folio_batch_release(&fbatch);
+ goto out;
+ }
+ err = kho_preserve_folio(folio);
+ if (err) {
+ folio_batch_release(&fbatch);
+ goto out;
+ }
+
+ folios_ser[count].pfn = folio_pfn(folio);
+ folios_ser[count].index = folio->index;
+ folios_ser[count].flags = folio_test_uptodate(folio) ?
+ GUEST_MEMFD_LUO_FOLIO_UPTODATE : 0;
+ }
+ count++;
+ }
+ folio_batch_release(&fbatch);
+ cond_resched();
+ }
+
+out:
+ *out_count = count;
+ return err;
+}
+
+static bool kvm_gmem_luo_can_preserve(struct liveupdate_file_handler *handler, struct file *file)
+{
+ struct inode *inode = file_inode(file);
+ struct gmem_file *gmem_file;
+ struct kvm *kvm;
+
+ if (inode->i_sb->s_magic != GUEST_MEMFD_MAGIC)
+ return 0;
+
+ gmem_file = file->private_data;
+ if (!gmem_file)
+ return 0;
+
+ /*
+ * Only Fully-shared guest_memfd preservation is supported
+ */
+ if (GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_INIT_SHARED)
+ return 0;
+
+ /*
+ * It makes sure that no memory can converted to private
+ * even if it was initially fully shared (in-place conversions are
+ * prevented).
+ */
+ kvm = gmem_file->kvm;
+ if (kvm_arch_has_private_mem(kvm))
+ return 0;
+
+ if (mapping_large_folio_support(inode->i_mapping))
+ return 0;
+
+ return 1;
+}
+
+static int kvm_gmem_luo_preserve(struct liveupdate_file_op_args *args)
+{
+ DECLARE_KHOSER_PTR(sd, struct guest_memfd_luo_ser *);
+ struct guest_memfd_luo_folio_ser *folios_ser = NULL;
+ u64 count = 0, gmem_flags, abi_flags = 0;
+ struct guest_memfd_luo_ser *ser;
+ struct address_space *mapping;
+ struct gmem_file *gmem_file;
+ struct inode *inode;
+ pgoff_t end_index;
+ struct kvm *kvm;
+ int err = 0;
+ long size;
+
+ inode = file_inode(args->file);
+ kvm_gmem_freeze(inode, true);
+
+ mapping = inode->i_mapping;
+ size = i_size_read(inode);
+ if (!size) {
+ err = -EINVAL;
+ goto err_unfreeze_inode;
+ }
+
+ if (WARN_ON_ONCE(!PAGE_ALIGNED(size))) {
+ err = -EINVAL;
+ goto err_unfreeze_inode;
+ }
+
+ gmem_file = args->file->private_data;
+ kvm = gmem_file->kvm;
+
+ gmem_flags = READ_ONCE(GMEM_I(inode)->flags);
+ if (gmem_flags & ~(GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED
+ | GUEST_MEMFD_F_MAPPING_FROZEN)) {
+ err = -EOPNOTSUPP;
+ goto err_unfreeze_inode;
+ }
+
+ if (gmem_flags & GUEST_MEMFD_FLAG_MMAP)
+ abi_flags |= GUEST_MEMFD_LUO_FLAG_MMAP;
+ if (gmem_flags & GUEST_MEMFD_FLAG_INIT_SHARED)
+ abi_flags |= GUEST_MEMFD_LUO_FLAG_INIT_SHARED;
+
+ end_index = size >> PAGE_SHIFT;
+
+ ser = kho_alloc_preserve(sizeof(*ser));
+ if (IS_ERR(ser)) {
+ err = PTR_ERR(ser);
+ goto err_unfreeze_inode;
+ }
+
+ /* First pass: Count the folios present in the page cache */
+ err = kvm_gmem_luo_walk_folios(mapping, end_index, NULL, &count);
+ if (err)
+ goto err_free_ser;
+
+ ser->size = size;
+ ser->flags = abi_flags;
+ ser->nr_folios = count;
+ ser->vm_token = 0; /* Set later by kvm_gmem_luo_freeze() */
+
+ if (count > 0) {
+ folios_ser = vcalloc(count, sizeof(*folios_ser));
+ if (!folios_ser) {
+ err = -ENOMEM;
+ goto err_free_ser;
+ }
+
+ /* Second pass: Fill the metadata array and preserve folios */
+ err = kvm_gmem_luo_walk_folios(mapping, end_index, folios_ser, &count);
+ if (err)
+ goto err_unpreserve_unlocked;
+
+ if (WARN_ON_ONCE(count != ser->nr_folios)) {
+ err = -EINVAL;
+ goto err_unpreserve_unlocked;
+ }
+ }
+
+ if (count > 0) {
+ err = kho_preserve_vmalloc(folios_ser, &ser->folios);
+ if (err)
+ goto err_unpreserve_unlocked;
+ }
+
+ KHOSER_STORE_PTR(sd, ser);
+ KHOSER_COPY_TYPEUNSAFE(args->serialized_data, sd);
+ args->private_data = folios_ser;
+
+ return 0;
+
+err_unpreserve_unlocked:
+ for (long i = (long)count - 1; i >= 0; i--) {
+ struct folio *folio = pfn_folio(folios_ser[i].pfn);
+
+ kho_unpreserve_folio(folio);
+ }
+ vfree(folios_ser);
+err_free_ser:
+ kho_unpreserve_free(ser);
+err_unfreeze_inode:
+ kvm_gmem_freeze(inode, false);
+ return err;
+}
+
+static int kvm_gmem_luo_freeze(struct liveupdate_file_op_args *args)
+{
+ struct guest_memfd_luo_ser *ser;
+ struct gmem_file *gmem_file;
+ struct kvm *kvm;
+ struct file *kvm_file;
+ u64 vm_token;
+ int err;
+
+ ser = KHOSER_LOAD_PTR(args->serialized_data);
+ if (WARN_ON_ONCE(!ser))
+ return -EINVAL;
+
+ gmem_file = args->file->private_data;
+ kvm = gmem_file->kvm;
+
+ /*
+ * Obtain a strong reference to kvm->vm_file to prevent the SLAB_TYPESAFE_BY_RCU
+ * file memory from being reallocated while it is being processed.
+ */
+ kvm_file = get_file_active(&kvm->vm_file);
+ if (!kvm_file)
+ return -ENOENT;
+
+ err = liveupdate_get_token_outgoing(args->session, kvm_file, &vm_token);
+ fput(kvm_file);
+ if (err)
+ return err;
+
+ ser->vm_token = vm_token;
+ return 0;
+}
+
+static void kvm_gmem_luo_discard_folios(
+ const struct guest_memfd_luo_folio_ser *folios_ser,
+ u64 nr_folios, u64 start_idx)
+{
+ long i;
+
+ for (i = start_idx; i < nr_folios; i++) {
+ struct folio *folio;
+ phys_addr_t phys;
+
+ if (!folios_ser[i].pfn)
+ continue;
+
+ phys = PFN_PHYS(folios_ser[i].pfn);
+ folio = kho_restore_folio(phys);
+ if (folio)
+ folio_put(folio);
+ }
+}
+
+static void kvm_gmem_luo_unpreserve(struct liveupdate_file_op_args *args)
+{
+ struct guest_memfd_luo_folio_ser *folios_ser = args->private_data;
+ struct guest_memfd_luo_ser *ser;
+ long i;
+
+ ser = KHOSER_LOAD_PTR(args->serialized_data);
+ if (WARN_ON_ONCE(!ser))
+ return;
+
+ if (ser->nr_folios > 0)
+ kho_unpreserve_vmalloc(&ser->folios);
+ for (i = ser->nr_folios - 1; i >= 0; i--) {
+ struct folio *folio;
+
+ if (!folios_ser[i].pfn)
+ continue;
+
+ folio = pfn_folio(folios_ser[i].pfn);
+ kho_unpreserve_folio(folio);
+ }
+ vfree(folios_ser);
+
+ kho_unpreserve_free(ser);
+ kvm_gmem_freeze(file_inode(args->file), false);
+}
+
+static int kvm_gmem_luo_retrieve(struct liveupdate_file_op_args *args)
+{
+ struct guest_memfd_luo_folio_ser *folios_ser = NULL;
+ struct guest_memfd_luo_ser *ser;
+ struct kvm *kvm = NULL;
+ struct file *vm_file;
+ struct inode *inode;
+ struct file *file;
+ u64 gmem_flags = 0;
+ int err = 0;
+ long i = 0;
+
+ ser = KHOSER_LOAD_PTR(args->serialized_data);
+ if (!ser)
+ return -EINVAL;
+
+ if (ser->flags & ~GUEST_MEMFD_LUO_SUPPORTED_FLAGS) {
+ err = -EOPNOTSUPP;
+ goto err_free_ser;
+ }
+
+ if (ser->flags & GUEST_MEMFD_LUO_FLAG_MMAP)
+ gmem_flags |= GUEST_MEMFD_FLAG_MMAP;
+ if (ser->flags & GUEST_MEMFD_LUO_FLAG_INIT_SHARED)
+ gmem_flags |= GUEST_MEMFD_FLAG_INIT_SHARED;
+
+ err = liveupdate_get_file_incoming(args->session, ser->vm_token, &vm_file);
+ if (err) {
+ pr_warn("gmem: provided VM FD token (%llx) on preserve is incorrect\n",
+ ser->vm_token);
+ goto err_free_ser;
+ }
+
+ if (file_is_kvm(vm_file))
+ kvm = vm_file->private_data;
+
+ /*
+ * Release the temporary reference taken by the liveupdate_get_file_incoming
+ * call. LUO still holds a reference.
+ */
+ fput(vm_file);
+
+ if (!kvm) {
+ err = -EINVAL;
+ goto err_free_ser;
+ }
+
+ file = __kvm_gmem_create_file(kvm, ser->size, gmem_flags);
+ if (IS_ERR(file)) {
+ err = PTR_ERR(file);
+ goto err_free_ser;
+ }
+
+ inode = file_inode(file);
+
+ if (ser->nr_folios) {
+ folios_ser = kho_restore_vmalloc(&ser->folios);
+ if (!folios_ser) {
+ err = -EINVAL;
+ goto err_destroy_file;
+ }
+
+ for (i = 0; i < ser->nr_folios; i++) {
+ struct folio *folio;
+ phys_addr_t phys;
+
+ if (!folios_ser[i].pfn)
+ continue;
+
+ phys = PFN_PHYS(folios_ser[i].pfn);
+ folio = kho_restore_folio(phys);
+ if (!folio) {
+ pr_err("gmem: failed to restore folio at %pa\n", &phys);
+ err = -EIO;
+ goto err_put_remaining_folios;
+ }
+
+ err = filemap_add_folio(inode->i_mapping, folio, folios_ser[i].index,
+ GFP_KERNEL);
+ if (err) {
+ pr_err("gmem: failed to add folio to page cache\n");
+ folio_put(folio);
+ goto err_put_remaining_folios;
+ }
+
+ if (folios_ser[i].flags & GUEST_MEMFD_LUO_FOLIO_UPTODATE)
+ folio_mark_uptodate(folio);
+ folio_unlock(folio);
+ folio_put(folio);
+ }
+ vfree(folios_ser);
+ }
+
+ args->file = file;
+ kho_restore_free(ser);
+ return 0;
+
+err_put_remaining_folios:
+ i++;
+err_destroy_file:
+ fput(file);
+err_free_ser:
+ if (ser->nr_folios) {
+ if (!folios_ser)
+ folios_ser = kho_restore_vmalloc(&ser->folios);
+ if (folios_ser) {
+ kvm_gmem_luo_discard_folios(folios_ser, ser->nr_folios, i);
+ vfree(folios_ser);
+ }
+ }
+ kho_restore_free(ser);
+ return err;
+}
+
+static void kvm_gmem_luo_finish(struct liveupdate_file_op_args *args)
+{
+ struct guest_memfd_luo_ser *ser;
+ struct guest_memfd_luo_folio_ser *folios_ser;
+
+ /* Nothing to do if retrieve ran: whether it succeeded or failed,
+ * cleanup was already handled in the retrieve call.
+ */
+ if (args->retrieve_status)
+ return;
+
+ ser = KHOSER_LOAD_PTR(args->serialized_data);
+ if (!ser)
+ return;
+
+ if (ser->nr_folios) {
+ folios_ser = kho_restore_vmalloc(&ser->folios);
+ if (folios_ser) {
+ kvm_gmem_luo_discard_folios(folios_ser, ser->nr_folios, 0);
+ vfree(folios_ser);
+ }
+ }
+
+ kho_restore_free(ser);
+}
+
+static const struct liveupdate_file_ops kvm_gmem_luo_file_ops = {
+ .can_preserve = kvm_gmem_luo_can_preserve,
+ .preserve = kvm_gmem_luo_preserve,
+ .freeze = kvm_gmem_luo_freeze,
+ .retrieve = kvm_gmem_luo_retrieve,
+ .unpreserve = kvm_gmem_luo_unpreserve,
+ .finish = kvm_gmem_luo_finish,
+ .owner = THIS_MODULE,
+};
+
+static struct liveupdate_file_handler kvm_gmem_luo_handler = {
+ .ops = &kvm_gmem_luo_file_ops,
+ .compatible = GUEST_MEMFD_LUO_FH_COMPATIBLE,
+};
+
+int kvm_gmem_luo_init(void)
+{
+ int err = liveupdate_register_file_handler(&kvm_gmem_luo_handler);
+
+ if (err && err != -EOPNOTSUPP) {
+ pr_err("Could not register guest_memfd LUO file handler: %pe\n", ERR_PTR(err));
+ return err;
+ }
+
+ return 0;
+}
+
+void kvm_gmem_luo_exit(void)
+{
+ liveupdate_unregister_file_handler(&kvm_gmem_luo_handler);
+}
+
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d9c3dd1..e8e2f10 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -6581,6 +6581,10 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
if (r)
goto err_luo;
+ r = kvm_gmem_luo_init();
+ if (r)
+ goto err_gmem_luo;
+
/*
* Registration _must_ be the very last thing done, as this exposes
* /dev/kvm to userspace, i.e. all infrastructure must be setup!
@@ -6594,6 +6598,8 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
return 0;
err_register:
+ kvm_gmem_luo_exit();
+err_gmem_luo:
kvm_luo_exit();
err_luo:
kvm_uninit_virtualization();
@@ -6625,6 +6631,7 @@ void kvm_exit(void)
*/
misc_deregister(&kvm_dev);
+ kvm_gmem_luo_exit();
kvm_luo_exit();
kvm_uninit_virtualization();
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index 8719871..1295ff8 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -103,9 +103,13 @@ static inline void kvm_gmem_unbind(struct kvm_memory_slot *slot)
#ifdef CONFIG_LIVEUPDATE_GUEST_MEMFD
int kvm_luo_init(void);
void kvm_luo_exit(void);
+int kvm_gmem_luo_init(void);
+void kvm_gmem_luo_exit(void);
#else
static inline int kvm_luo_init(void) { return 0; }
static inline void kvm_luo_exit(void) {}
+static inline int kvm_gmem_luo_init(void) { return 0; }
+static inline void kvm_gmem_luo_exit(void) {}
#endif /* CONFIG_LIVEUPDATE_GUEST_MEMFD */
#endif /* __KVM_MM_H__ */
--
2.55.0.rc0.786.g65d90a0328-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v3 7/9] docs: add documentation for guest_memfd preservation via LUO
2026-06-22 18:48 [PATCH v3 0/9] liveupdate: kvm: guest_memfd preservation Tarun Sahu
` (5 preceding siblings ...)
2026-06-22 18:48 ` [PATCH v3 6/9] kvm: guest_memfd_luo: add support for guest_memfd preservation Tarun Sahu
@ 2026-06-22 18:48 ` Tarun Sahu
2026-06-22 18:54 ` sashiko-bot
2026-06-22 18:48 ` [PATCH v3 8/9] selftests: kvm: Split ____vm_create() to expose init helpers Tarun Sahu
` (2 subsequent siblings)
9 siblings, 1 reply; 18+ messages in thread
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
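Add the documentation under the "Preserving file descriptors" section
of LUO's documentation.
While at it, fix the inverted GUEST_MEMFD_FLAG_INIT_SHARED check in
kvm_gmem_luo_can_preserve() and a typo in the adjacent comment.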
Signed-off-by: Tarun Sahu <tarunsahu@google.com>
---
Documentation/core-api/liveupdate.rst | 1 +
Documentation/liveupdate/vmm.rst | 107 ++++++++++++++++++++++++++
MAINTAINERS | 1 +
virt/kvm/guest_memfd_luo.c | 4 +-
4 files changed, 111 insertions(+), 2 deletions(-)
create mode 100644 Documentation/liveupdate/vmm.rst
diff --git a/Documentation/core-api/liveupdate.rst b/Documentation/core-api/liveupdate.rst
index 5a292d0..bac58a3 100644
--- a/Documentation/core-api/liveupdate.rst
+++ b/Documentation/core-api/liveupdate.rst
@@ -34,6 +34,7 @@ The following types of file descriptors can be preserved
:maxdepth: 1
../mm/memfd_preservation
+ ../liveupdate/vmm
Public API
==========
diff --git a/Documentation/liveupdate/vmm.rst b/Documentation/liveupdate/vmm.rst
new file mode 100644
index 0000000..8353e23
--- /dev/null
+++ b/Documentation/liveupdate/vmm.rst
@@ -0,0 +1,107 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+=============================
+VM & Guest_Memfd Preservation
+=============================
+
+.. kernel-doc:: virt/kvm/kvm_luo.c
+ :doc: KVM VM Preservation via LUO
+
+.. kernel-doc:: virt/kvm/guest_memfd_luo.c
+ :doc: Guestmemfd Preservation via LUO
+
+VMM Instructions
+================
+
+This section describes the requirements, scope, conditions, and
+ordering constraints that a Virtual Machine Monitor (VMM) must adhere
+to for successful preservation and retrieval of guest_memfd files
+across a Live Update Orchestrator (LUO) sequence.
+
+Scope and Limitations
+---------------------
+
+At this stage, the scope of guest_memfd preservation is restricted to:
+
+1. **Fully Shared guest_memfd**:
+ Only fully shared guest_memfd is supported at this time. Systems
+ that support CoCo VMs (which use private guest_memfd) do not
+ support preservation.
+
+2. **Standard Page Size**:
+ Only guest_memfd backed by standard page size (``PAGE_SIZE``,
+ order-0) pages is supported. Large/huge page backing (e.g.,
+ hugetlb guest_memfd) is not supported.
+
+Any Virtual Machine (VM) whose memory is fully backed by such
+guest_memfd files can be preserved across live update.
+
+VMM Actions and Conditions during Live Update
+---------------------------------------------
+
+During the live update sequence, the kernel introduces a *freezing*
+phase for the guest_memfd inode. Freezing prevents any modifications to
+the guest_memfd page cache. Specifically, once a guest_memfd mapping is
+frozen:
+
+- Any subsequent ``fallocate`` calls on the guest_memfd file descriptor
+ will fail and return ``-EPERM``.
+- Any new page faults (guest-side or host-userspace-side) that require
+ folio allocation will fail and return ``-EPERM``.
+
+To prevent vCPUs or VMM helper threads from failing due to these
+``-EPERM`` errors, the VMM must implement one of the following
+strategies:
+
+1. **Pause the VM (Recommended)**:
+ The VMM should pause/suspend all vCPUs before invoking the
+ preservation or freezing of the VM and guest_memfd files. This
+ ensures no new page faults or memory accesses can occur while the
+ guest_memfd is frozen.
+
+2. **Handle Fault Failures**:
+ If the VM is not paused, the VMM must be prepared to handle VM
+ exits or user page fault errors resulting from the ``-EPERM``
+ failures. The VMM must take appropriate action, such as
+ immediately pausing the VM, or aborting the live update sequence
+ (by tearing down or unpreserving the live update session).
+
+Preservation and Retrieval Ordering
+-----------------------------------
+
+Preservation Order
+~~~~~~~~~~~~~~~~~~
+
+There is no strict ordering requirement for initiating the
+preservation of the KVM VM file and the guest_memfd files; they are
+preserved independently. However, if kexec is triggered with a guest_memfd
+preserved but without its VM file preserved, kexec will fail.
+
+Retrieval Order
+~~~~~~~~~~~~~~~
+
+Similarly, there is no strict ordering required for retrieving the VM
+and guest_memfd files. Files can be retrieved in any order.
+
+If a guest_memfd file is retrieved but its VM file is not, and
+luo_finish is then called, the VM file is lost and the guest_memfd
+file remains without it.
+
+NOTE: Before initiating preservation or retrieval, make sure that the
+kvm module is loaded (/dev/kvm must be available).
+
+
+VM & Guest_Memfd Preservation ABI
+=================================
+
+.. kernel-doc:: include/linux/kho/abi/kvm.h
+ :doc: KVM and guest_memfd Live Update ABI
+
+.. kernel-doc:: include/linux/kho/abi/kvm.h
+ :internal:
+
+See Also
+========
+
+- :doc:`/core-api/liveupdate`
+- :doc:`/userspace-api/liveupdate`
diff --git a/MAINTAINERS b/MAINTAINERS
index d1d699ce..e27b677 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14420,6 +14420,7 @@ L: kexec@lists.infradead.org
L: kvm@vger.kernel.org
S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux.git
+F: Documentation/liveupdate/vmm.rst
F: virt/kvm/guest_memfd_luo.c
F: virt/kvm/kvm_luo.c
diff --git a/virt/kvm/guest_memfd_luo.c b/virt/kvm/guest_memfd_luo.c
index c242b1d..8411fe8 100644
--- a/virt/kvm/guest_memfd_luo.c
+++ b/virt/kvm/guest_memfd_luo.c
@@ -119,11 +119,11 @@ static bool kvm_gmem_luo_can_preserve(struct liveupdate_file_handler *handler, s
/*
* Only Fully-shared guest_memfd preservation is supported
*/
- if (GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_INIT_SHARED)
+ if (!(GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_INIT_SHARED))
return 0;
/*
- * It makes sure that no memory can converted to private
+ * It makes sure that no memory can be converted to private
* even if it was initially fully shared (in-place conversions are
* prevented).
*/
--
2.55.0.rc0.786.g65d90a0328-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v3 8/9] selftests: kvm: Split ____vm_create() to expose init helpers
2026-06-22 18:48 [PATCH v3 0/9] liveupdate: kvm: guest_memfd preservation Tarun Sahu
` (6 preceding siblings ...)
2026-06-22 18:48 ` [PATCH v3 7/9] docs: add documentation for guest_memfd preservation via LUO Tarun Sahu
@ 2026-06-22 18:48 ` Tarun Sahu
2026-06-22 18:48 ` [PATCH v3 9/9] selftests: kvm: Add guest_memfd_preservation_test Tarun Sahu
2026-06-22 18:55 ` [PATCH v3 0/9] liveupdate: kvm: guest_memfd preservation tarunsahu
9 siblings, 0 replies; 18+ messages in thread
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
Refactor `____vm_create()` in the KVM selftest library to extract its
initialization steps into separate, reusable internal helpers.
Introduce `vm_init_fields()` and `vm_init_memory_properties()`. This
allows advanced test setups to initialize the VM fields and the memory
properties independently, which is required by upcoming test cases
that restore preserved VMs. No functional change is intended for
existing tests.
Signed-off-by: Tarun Sahu <tarunsahu@google.com>
---
.../testing/selftests/kvm/include/kvm_util.h | 2 ++
tools/testing/selftests/kvm/lib/kvm_util.c | 26 +++++++++++++------
2 files changed, 20 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index 04a9101..88de0e7 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -471,6 +471,8 @@ const char *vm_guest_mode_string(u32 i);
void kvm_vm_free(struct kvm_vm *vmp);
void kvm_vm_restart(struct kvm_vm *vmp);
+void vm_init_fields(struct kvm_vm *vm, struct vm_shape shape);
+void vm_init_memory_properties(struct kvm_vm *vm);
void kvm_vm_release(struct kvm_vm *vmp);
void kvm_vm_elf_load(struct kvm_vm *vm, const char *filename);
int kvm_memfd_alloc(size_t size, bool hugepages);
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 195f3fd..dc576b8 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -276,13 +276,8 @@ __weak void vm_populate_gva_bitmap(struct kvm_vm *vm)
(1ULL << (vm->va_bits - 1)) >> vm->page_shift);
}
-struct kvm_vm *____vm_create(struct vm_shape shape)
+void vm_init_fields(struct kvm_vm *vm, struct vm_shape shape)
{
- struct kvm_vm *vm;
-
- vm = calloc(1, sizeof(*vm));
- TEST_ASSERT(vm != NULL, "Insufficient Memory");
-
INIT_LIST_HEAD(&vm->vcpus);
vm->regions.gpa_tree = RB_ROOT;
vm->regions.hva_tree = RB_ROOT;
@@ -380,9 +375,10 @@ struct kvm_vm *____vm_create(struct vm_shape shape)
if (vm->pa_bits != 40)
vm->type = KVM_VM_TYPE_ARM_IPA_SIZE(vm->pa_bits);
#endif
+}
- vm_open(vm);
-
+void vm_init_memory_properties(struct kvm_vm *vm)
+{
/* Limit to VA-bit canonical virtual addresses. */
vm->vpages_valid = sparsebit_alloc();
vm_populate_gva_bitmap(vm);
@@ -392,6 +388,20 @@ struct kvm_vm *____vm_create(struct vm_shape shape)
/* Allocate and setup memory for guest. */
vm->vpages_mapped = sparsebit_alloc();
+}
+
+struct kvm_vm *____vm_create(struct vm_shape shape)
+{
+ struct kvm_vm *vm;
+
+ vm = calloc(1, sizeof(*vm));
+ TEST_ASSERT(vm != NULL, "Insufficient Memory");
+
+ vm_init_fields(vm, shape);
+
+ vm_open(vm);
+
+ vm_init_memory_properties(vm);
return vm;
}
--
2.55.0.rc0.786.g65d90a0328-goog
* [PATCH v3 9/9] selftests: kvm: Add guest_memfd_preservation_test
2026-06-22 18:48 [PATCH v3 0/9] liveupdate: kvm: guest_memfd preservation Tarun Sahu
` (7 preceding siblings ...)
2026-06-22 18:48 ` [PATCH v3 8/9] selftests: kvm: Split ____vm_create() to expose init helpers Tarun Sahu
@ 2026-06-22 18:48 ` Tarun Sahu
2026-06-22 19:13 ` sashiko-bot
2026-06-22 18:55 ` [PATCH v3 0/9] liveupdate: kvm: guest_memfd preservation tarunsahu
9 siblings, 1 reply; 18+ messages in thread
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
Add a new KVM selftest `guest_memfd_preservation_test` to verify that
guest memory backed by guest_memfd is preserved properly.
The test uses the KVM selftests framework to create a new VM and map
two memory slots into it: one holds the code executed inside the VM,
and the other is the guest_memfd whose memory the guest code writes.
In Stage 1: Once the data is written, the VM exits and the test waits
for the user to trigger the kexec.
In Stage 2: A new VM is created from the retrieved KVM file, and two
memory slots are assigned again: one for the guest code and another for
the retrieved guest_memfd, whose memory is verified by the guest code.
If verification succeeds, the test passes.
// The kernel is compiled with CONFIG_LIVEUPDATE_GUEST_MEMFD=y and booted
// with the kho=on liveupdate=on command line parameters.
$ ./selftests/kvm/guest_memfd_preservation_test --stage 1
$ <kexec>
$ ./selftests/kvm/guest_memfd_preservation_test --stage 2
Signed-off-by: Tarun Sahu <tarunsahu@google.com>
---
MAINTAINERS | 1 +
tools/testing/selftests/kvm/Makefile.kvm | 6 +-
.../kvm/guest_memfd_preservation_test.c | 236 ++++++++++++++++++
3 files changed, 242 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/kvm/guest_memfd_preservation_test.c
diff --git a/MAINTAINERS b/MAINTAINERS
index e27b677..d0033a9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14421,6 +14421,7 @@ L: kvm@vger.kernel.org
S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux.git
F: Documentation/liveupdate/vmm.rst
+F: tools/testing/selftests/kvm/guest_memfd_preservation_test.c
F: virt/kvm/guest_memfd_luo.c
F: virt/kvm/kvm_luo.c
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index d28a057..d5bc8be2 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -164,6 +164,8 @@ TEST_GEN_PROGS_x86 += pre_fault_memory_test
# Compiled outputs used by test targets
TEST_GEN_PROGS_EXTENDED_x86 += x86/nx_huge_pages_test
+# Manual test that forks a persistent background daemon; skip auto CI run
+TEST_GEN_PROGS_EXTENDED_x86 += guest_memfd_preservation_test
TEST_GEN_PROGS_arm64 = $(TEST_GEN_PROGS_COMMON)
TEST_GEN_PROGS_arm64 += arm64/aarch32_id_regs
@@ -258,6 +260,7 @@ OVERRIDE_TARGETS = 1
# which causes the environment variable to override the makefile).
include ../lib.mk
include ../cgroup/lib/libcgroup.mk
+include ../liveupdate/lib/libliveupdate.mk
INSTALL_HDR_PATH = $(top_srcdir)/usr
LINUX_HDR_PATH = $(INSTALL_HDR_PATH)/include/
@@ -312,7 +315,8 @@ LIBKVM_S := $(filter %.S,$(LIBKVM))
LIBKVM_C_OBJ := $(patsubst %.c, $(OUTPUT)/%.o, $(LIBKVM_C))
LIBKVM_S_OBJ := $(patsubst %.S, $(OUTPUT)/%.o, $(LIBKVM_S))
LIBKVM_STRING_OBJ := $(patsubst %.c, $(OUTPUT)/%.o, $(LIBKVM_STRING))
-LIBKVM_OBJS = $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ) $(LIBKVM_STRING_OBJ) $(LIBCGROUP_O)
+LIBKVM_OBJS = $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ) $(LIBKVM_STRING_OBJ) \
+ $(LIBCGROUP_O) $(LIBLIVEUPDATE_O)
SPLIT_TEST_GEN_PROGS := $(patsubst %, $(OUTPUT)/%, $(SPLIT_TESTS))
SPLIT_TEST_GEN_OBJ := $(patsubst %, $(OUTPUT)/$(ARCH)/%.o, $(SPLIT_TESTS))
diff --git a/tools/testing/selftests/kvm/guest_memfd_preservation_test.c b/tools/testing/selftests/kvm/guest_memfd_preservation_test.c
new file mode 100644
index 0000000..c0a20e7
--- /dev/null
+++ b/tools/testing/selftests/kvm/guest_memfd_preservation_test.c
@@ -0,0 +1,236 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2026, Google LLC.
+ *
+ * Author: Tarun Sahu <tarunsahu@google.com>
+ *
+ * Test for VM and guest_memfd preservation across kexec (Live Update) via LUO.
+ *
+ * NOTE: This is a MANUAL test and is excluded from automated CI/testing
+ * frameworks because Stage 1 daemonizes into the background to pin resources
+ * and requires a human operator to manually trigger kexec before Stage 2
+ * is executed. Running Stage 1 automatically would leak the background daemon
+ * and cause CI runners to falsely interpret it as a passed test.
+ *
+ * Usage:
+ * Stage 1: ./guest_memfd_preservation_test --stage 1
+ * Stage 2: ./guest_memfd_preservation_test --stage 2
+ */
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <errno.h>
+#include <stdio.h>
+#include <fcntl.h>
+#include <sys/mman.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/ioctl.h>
+#include <linux/sizes.h>
+#include <linux/falloc.h>
+
+#include "kvm_util.h"
+#include "processor.h"
+#include "test_util.h"
+#include "ucall_common.h"
+#include "../kselftest.h"
+#include "../kselftest_harness.h"
+
+#include <libliveupdate.h>
+
+#define SESSION_NAME "gmem_vm_preservation_session"
+#define VM_TOKEN 0x1001
+#define GMEM_TOKEN 0x1002
+
+#define STATE_SESSION_NAME "gmem_preservation_state"
+#define STATE_TOKEN 0x999
+
+#define GMEM_SIZE (16ULL * 1024 * 1024)
+#define DATA_SIZE (5ULL * 1024 * 1024)
+
+static size_t page_size;
+
+/* Deterministic byte pattern generation based on offset */
+static inline uint8_t get_pattern_byte(size_t offset)
+{
+ return (uint8_t)(offset ^ 0x5A);
+}
+
+static void guest_code_phase1(uint64_t gpa, uint64_t size, uint64_t data_size)
+{
+ uint8_t *mem = (uint8_t *)gpa;
+ size_t i;
+
+ for (i = 0; i < data_size; i++)
+ mem[i] = get_pattern_byte(i);
+
+ GUEST_DONE();
+}
+
+static void guest_code_phase2(uint64_t gpa, uint64_t size, uint64_t data_size)
+{
+ uint8_t *mem = (uint8_t *)gpa;
+ size_t i;
+
+ for (i = 0; i < data_size; i++) {
+ uint8_t val = get_pattern_byte(i);
+
+ __GUEST_ASSERT(mem[i] == val,
+ "Data mismatch at offset %lu! Expected 0x%x, got 0x%x",
+ i, val, mem[i]);
+ }
+
+ GUEST_DONE();
+}
+
+static void run_stage_1(int luo_fd)
+{
+ uint64_t flags = GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED;
+ int gmem_fd, session_fd, ret;
+ const uint64_t gpa = SZ_4G;
+ struct kvm_vcpu *vcpu;
+ const int slot = 1;
+ struct kvm_vm *vm;
+
+ ksft_print_msg("[STAGE 1] Starting pre-kexec setup...\n");
+
+ ksft_print_msg("[STAGE 1] Creating state file for next stage (2)...\n");
+ create_state_file(luo_fd, STATE_SESSION_NAME, STATE_TOKEN, 2);
+
+ vm = __vm_create_shape_with_one_vcpu(VM_SHAPE_DEFAULT, &vcpu, 1,
+ guest_code_phase1);
+ gmem_fd = vm_create_guest_memfd(vm, GMEM_SIZE, flags);
+ vm_set_user_memory_region2(vm, slot, KVM_MEM_GUEST_MEMFD, gpa, GMEM_SIZE, NULL,
+ gmem_fd, 0);
+
+ for (size_t i = 0; i < GMEM_SIZE; i += page_size)
+ virt_pg_map(vm, gpa + i, gpa + i);
+
+ vcpu_args_set(vcpu, 3, gpa, GMEM_SIZE, DATA_SIZE);
+
+ vcpu_run(vcpu);
+ TEST_ASSERT_EQ(get_ucall(vcpu, NULL), UCALL_DONE);
+
+ ksft_print_msg("[STAGE 1] Creating session '%s' and preserving VM/guest_memfd...\n",
+ SESSION_NAME);
+ session_fd = luo_create_session(luo_fd, SESSION_NAME);
+ TEST_ASSERT(session_fd >= 0, "Failed to create LUO session");
+
+ ret = luo_session_preserve_fd(session_fd, vm->fd, VM_TOKEN);
+ TEST_ASSERT(ret == 0, "Failed to preserve VM file descriptor");
+
+ ret = luo_session_preserve_fd(session_fd, gmem_fd, GMEM_TOKEN);
+ TEST_ASSERT(ret == 0, "Failed to preserve guest_memfd file descriptor");
+
+ printf("\n============================================================\n");
+ printf("Phase 1 Completed Successfully!\n");
+ printf("VM file and guest_memfd file have been preserved via LUO.\n");
+ printf("Tokens: VM_TOKEN=0x%x, GMEM_TOKEN=0x%x\n", VM_TOKEN, GMEM_TOKEN);
+ printf("Machine Size: %llu MB, Data Size: %llu MB\n", GMEM_SIZE / SZ_1M,
+ DATA_SIZE / SZ_1M);
+ printf("------------------------------------------------------------\n");
+
+ close(luo_fd);
+ daemonize_and_wait();
+}
+
+static struct kvm_vm *vm_create_from_fd(int resurrected_vm_fd,
+ struct vm_shape shape)
+{
+ struct kvm_vm *vm;
+
+ vm = calloc(1, sizeof(*vm));
+ TEST_ASSERT(vm != NULL, "Insufficient Memory");
+
+ vm_init_fields(vm, shape);
+
+ vm->kvm_fd = open_path_or_exit(KVM_DEV_PATH, O_RDWR);
+ vm->fd = resurrected_vm_fd;
+
+ if (kvm_has_cap(KVM_CAP_BINARY_STATS_FD))
+ vm->stats.fd = vm_get_stats_fd(vm);
+ else
+ vm->stats.fd = -1;
+
+ vm_init_memory_properties(vm);
+
+ return vm;
+}
+
+static void run_stage_2(int luo_fd, int state_session_fd)
+{
+ int retrieved_vm_fd, retrieved_gmem_fd, session_fd, stage;
+ struct vm_shape shape = VM_SHAPE_DEFAULT;
+ const uint64_t gpa = SZ_4G;
+ struct kvm_vcpu *vcpu;
+ const int slot = 1;
+ struct kvm_vm *vm;
+
+ ksft_print_msg("[STAGE 2] Starting post-kexec verification...\n");
+
+ restore_and_read_stage(state_session_fd, STATE_TOKEN, &stage);
+ if (stage != 2)
+ fail_exit("Expected stage 2, but state file contains %d", stage);
+
+ ksft_print_msg("[STAGE 2] Retrieving session '%s'...\n", SESSION_NAME);
+ session_fd = luo_retrieve_session(luo_fd, SESSION_NAME);
+ TEST_ASSERT(session_fd >= 0, "Failed to retrieve LUO session");
+
+ retrieved_vm_fd = luo_session_retrieve_fd(session_fd, VM_TOKEN);
+ TEST_ASSERT(retrieved_vm_fd >= 0, "Failed to retrieve VM file descriptor");
+
+ retrieved_gmem_fd = luo_session_retrieve_fd(session_fd, GMEM_TOKEN);
+ TEST_ASSERT(retrieved_gmem_fd >= 0, "Failed to retrieve guest_memfd file descriptor");
+
+ vm = vm_create_from_fd(retrieved_vm_fd, shape);
+
+ u64 nr_pages = 2048; /* 8MB is plenty for slot0 pages */
+
+ vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, 0, 0, nr_pages, 0);
+ kvm_vm_elf_load(vm, program_invocation_name);
+
+ for (int i = 0; i < NR_MEM_REGIONS; i++)
+ vm->memslots[i] = 0;
+
+ struct userspace_mem_region *slot0 = memslot2region(vm, 0);
+
+ ucall_init(vm, slot0->region.guest_phys_addr + slot0->region.memory_size);
+
+ vm_set_user_memory_region2(vm, slot, KVM_MEM_GUEST_MEMFD, gpa, GMEM_SIZE, NULL,
+ retrieved_gmem_fd, 0);
+
+ for (size_t i = 0; i < GMEM_SIZE; i += page_size)
+ virt_pg_map(vm, gpa + i, gpa + i);
+
+ vcpu = vm_vcpu_add(vm, 0, guest_code_phase2);
+ kvm_arch_vm_finalize_vcpus(vm);
+
+ vcpu_args_set(vcpu, 3, gpa, GMEM_SIZE, DATA_SIZE);
+
+ printf("Resuming / Running VM in Phase 2...\n");
+ vcpu_run(vcpu);
+ TEST_ASSERT_EQ(get_ucall(vcpu, NULL), UCALL_DONE);
+
+ printf("\nSUCCESS: Phase 2 Complete! All 5MB complex data verified intact!\n");
+
+ luo_session_finish(session_fd);
+ close(session_fd);
+
+ ksft_print_msg("[STAGE 2] Finalizing state session...\n");
+ if (luo_session_finish(state_session_fd) < 0)
+ fail_exit("luo_session_finish for state session");
+ close(state_session_fd);
+
+ /* This will also close the vm_fd */
+ kvm_vm_free(vm);
+ close(retrieved_gmem_fd);
+}
+
+int main(int argc, char *argv[])
+{
+ TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
+ page_size = getpagesize();
+
+ return luo_test(argc, argv, STATE_SESSION_NAME,
+ run_stage_1, run_stage_2);
+}
--
2.55.0.rc0.786.g65d90a0328-goog
* Re: [PATCH v3 7/9] docs: add documentation for guest_memfd preservation via LUO
2026-06-22 18:48 ` [PATCH v3 7/9] docs: add documentation for guest_memfd preservation via LUO Tarun Sahu
@ 2026-06-22 18:54 ` sashiko-bot
2026-06-22 19:04 ` tarunsahu
0 siblings, 1 reply; 18+ messages in thread
From: sashiko-bot @ 2026-06-22 18:54 UTC (permalink / raw)
To: Tarun Sahu; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] The patch includes an undocumented logic bug fix in C code while being labeled as a pure documentation change.
- [Low] Typos and grammatical errors in the new documentation.
--
commit eb552713cd5f980d69c6b422467b4dfbb265bad1
Author: Tarun Sahu <tarunsahu@google.com>
docs: add documentation for guest_memfd preservation via LUO
This commit adds documentation under the "Preserving file descriptors" section
of LUO's documentation. It describes the requirements and limitations for
guest_memfd preservation.
> diff --git a/Documentation/liveupdate/vmm.rst b/Documentation/liveupdate/vmm.rst
> new file mode 100644
> index 0000000000000..8353e236471d2
> --- /dev/null
> +++ b/Documentation/liveupdate/vmm.rst
[ ... ]
> +Scope and Limitations
> +---------------------
> +
> +At this stage, the scope of guest_memfd preservation is restricted to:
> +
> +1. **Fully Shared guest_memfd**:
> + This time only fully shared guest_memfd supported. Any system that
> + supports coco vm (which uses private guest_memfd), will not support
> + the preservation.
[Severity: Low]
This isn't a bug, but there are some grammatical issues here. "This time only
fully shared guest_memfd supported" seems like it should be "At this time, only
fully shared guest_memfd is supported". Also, "coco vm" should probably be
capitalized as "CoCo VM".
[ ... ]
> +Retrieval Order
> +~~~~~~~~~~~~~~~
> +
> +Similarly, there is no strict ordering required for retrieving the VM
> +and guest_memfd files. Any file can be retrieved at any order.
> +
> +If guest_memfd file is retrieved and VM file is not retrieved, and
> +luo_finish is called, then vm_file will be lost and guest_memfd file
> +will be hanging around.
> +
> +NOTE: Before Initiating the preservation/retirval, it is necessary to make
[Severity: Low]
This isn't a bug, but there is a typo here: "retirval" should be "retrieval".
> +sure that the kvm module is loaded (/dev/kvm must be available).
> +
> +
[ ... ]
> diff --git a/virt/kvm/guest_memfd_luo.c b/virt/kvm/guest_memfd_luo.c
> index c242b1d14c0ac..8411fe86dc2a1 100644
> --- a/virt/kvm/guest_memfd_luo.c
> +++ b/virt/kvm/guest_memfd_luo.c
> @@ -119,11 +119,11 @@ static bool kvm_gmem_luo_can_preserve(struct liveupdate_file_handler *handler, s
> /*
> * Only Fully-shared guest_memfd preservation is supported
> */
> - if (GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_INIT_SHARED)
> + if (!(GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_INIT_SHARED))
> return 0;
[Severity: Medium]
Should this functional logic fix in kvm_gmem_luo_can_preserve() be split into
its own patch?
The commit message explicitly states this is a documentation change, but this
modifies the condition for GUEST_MEMFD_FLAG_INIT_SHARED in C code, which
violates the rule of one logical change per commit and makes tracking bug
fixes difficult.
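To make the behavioral change in the hunk above concrete, here is a minimal userspace sketch of the two conditions. It is an illustration only: `DEMO_FLAG_INIT_SHARED` and both helper names are hypothetical, the flag value is a demo value rather than the UAPI definition, and the remaining checks in kvm_gmem_luo_can_preserve() are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Demo value only; the real flag comes from the KVM UAPI headers. */
#define DEMO_FLAG_INIT_SHARED (1ULL << 1)

/* Pre-fix condition: bails out when the flag IS set. */
bool demo_can_preserve_old(uint64_t flags)
{
	if (flags & DEMO_FLAG_INIT_SHARED)
		return false;
	return true;
}

/* Post-fix condition: bails out when the flag is NOT set. */
bool demo_can_preserve_new(uint64_t flags)
{
	if (!(flags & DEMO_FLAG_INIT_SHARED))
		return false;
	return true;
}
```

With the old condition a fully shared guest_memfd is rejected and a non-shared one passes this check, which is the opposite of what the comment in the hunk states. That makes this a functional fix rather than a documentation tweak.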
>
> /*
> - * It makes sure that no memory can converted to private
> + * It makes sure that no memory can be converted to private
> * even if it was initially fully shared (in-place conversions are
> * prevented).
> */
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260622184851.2309827-1-tarunsahu@google.com?part=7
* Re: [PATCH v3 0/9] liveupdate: kvm: guest_memfd preservation
2026-06-22 18:48 [PATCH v3 0/9] liveupdate: kvm: guest_memfd preservation Tarun Sahu
` (8 preceding siblings ...)
2026-06-22 18:48 ` [PATCH v3 9/9] selftests: kvm: Add guest_memfd_preservation_test Tarun Sahu
@ 2026-06-22 18:55 ` tarunsahu
9 siblings, 0 replies; 18+ messages in thread
From: tarunsahu @ 2026-06-22 18:55 UTC (permalink / raw)
To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
Shuah Khan, Pratyush Yadav, Pasha Tatashin, seanjc, ackerleytng,
aneesh.kumar, fvdl, sagis, david, dmatlack, mark.rutland
Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
+ Adding more people to the series (To:) whom I missed in my original message.
~Tarun
Tarun Sahu <tarunsahu@google.com> writes:
> Hello,
> This is Non-RFC patch series for guest_memfd preservation. After
> having multiple discussion across hypervisor liveupdate meeting,
> guest_memfd bi-weekly meeting, the design for the basic support of
> guest_memfd preservation is final. This series is going to include
> guest_memfd which are fully shared and does not support private mem
> and backed by PAGE_SIZE pages.
>
> Steps to test:
> 1. Compile Kernel with CONFIG_LIVEUPDATE_GUEST_MEMFD=y
> 2. boot kernel with command line: kho=on liveupdate=on
> 3. run the following kselftest
> $ .selftests/kvm/guest_memfd_preservation_test --stage 1
> $ <kexec> --reuse-cmdline
> $ .selftests/kvm/guest_memfd_preservation_test --stage 2
>
> NOTE: Assert the following:
> $ ls /dev/liveupdate
> $ ls /dev/kvm
> $ dmesg | grep liveupdate # (should have kvm_vm_luo &&
> # guest_memfd_luo handler registered)
>
> The changes are rebased on:
> kvm/next + liveupdate/next (merge) + [3] + [4] + [5]
> Where,
> [3]: luo: conversion of serialized_data to KHOSER_PTR
> [4]: luo: APIs to retrieve file internally from session
> [5]: selftests: liveupdate sefltests library
> Here is the github repo:
> https://github.com/tar-unix/linux/tree/gmem-pre
>
> V3 <- RFC V2 [2]
> 1. Finalize the design
> 2. resolve sashiko reported bugs
> 3. Use of KHOSER_PTR instead of raw serialized_data as per [3]
>
> RFC V2 [2] <- RFC V1 [1]
> 1. Removed mem_attr_array as it is not needed for fully-shared
> 2. Removed pre-faulted condition
> 3. Added vm_type preservation for ARM64.
> 4. Removed liveupdate_get_file_incoming api patch as it is sent
> separately [4] by Samiullah.
>
> [1] https://lore.kernel.org/all/cover.1779080766.git.tarunsahu@google.com/
> [2] https://lore.kernel.org/all/c054ba0fb2639932bbe354420d3f4f84cce84905.1780676742.git.tarunsahu@google.com/
> [3] https://lore.kernel.org/all/20260622111215.4157974-1-tarunsahu@google.com/
> [4] https://lore.kernel.org/all/20260613012521.835490-1-skhawaja@google.com/
> [5] https://lore.kernel.org/all/20260612214512.464146-1-vipinsh@google.com/
>
> Tarun Sahu (9):
> liveupdate: Add LIVEUPDATE_GUEST_MEMFD config option
> kvm: Prepare core VM structs and helpers for LUO support
> kvm: kvm_luo: Allow kvm preservation with LUO
> kvm: guest_memfd: Move internal definitions and helper to new header
> kvm: guest_memfd: Add support for freezing and unfreezing mappings
> kvm: guest_memfd_luo: add support for guest_memfd preservation
> docs: add documentation for guest_memfd preservation via LUO
> selftests: kvm: Split ____vm_create() to expose init helpers
> selftests: kvm: Add guest_memfd_preservation_test
>
> Documentation/core-api/liveupdate.rst | 1 +
> Documentation/liveupdate/vmm.rst | 107 ++++
> MAINTAINERS | 14 +
> include/linux/kho/abi/kvm.h | 106 ++++
> include/linux/kvm_host.h | 14 +
> kernel/liveupdate/Kconfig | 15 +
> tools/testing/selftests/kvm/Makefile.kvm | 6 +-
> .../kvm/guest_memfd_preservation_test.c | 236 +++++++++
> .../testing/selftests/kvm/include/kvm_util.h | 2 +
> tools/testing/selftests/kvm/lib/kvm_util.c | 26 +-
> virt/kvm/Makefile.kvm | 1 +
> virt/kvm/guest_memfd.c | 185 +++++--
> virt/kvm/guest_memfd.h | 44 ++
> virt/kvm/guest_memfd_luo.c | 497 ++++++++++++++++++
> virt/kvm/kvm_luo.c | 195 +++++++
> virt/kvm/kvm_main.c | 94 +++-
> virt/kvm/kvm_mm.h | 15 +
> 17 files changed, 1477 insertions(+), 81 deletions(-)
> create mode 100644 Documentation/liveupdate/vmm.rst
> create mode 100644 include/linux/kho/abi/kvm.h
> create mode 100644 tools/testing/selftests/kvm/guest_memfd_preservation_test.c
> create mode 100644 virt/kvm/guest_memfd.h
> create mode 100644 virt/kvm/guest_memfd_luo.c
> create mode 100644 virt/kvm/kvm_luo.c
>
> --
> 2.55.0.rc0.786.g65d90a0328-goog
* Re: [PATCH v3 2/9] kvm: Prepare core VM structs and helpers for LUO support
2026-06-22 18:48 ` [PATCH v3 2/9] kvm: Prepare core VM structs and helpers for LUO support Tarun Sahu
@ 2026-06-22 19:01 ` sashiko-bot
0 siblings, 0 replies; 18+ messages in thread
From: sashiko-bot @ 2026-06-22 19:01 UTC (permalink / raw)
To: Tarun Sahu; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Low] Missing `__rcu` annotation for the `vm_file` field in `struct kvm`.
--
commit 742b05343deb7763672dc99dce96c4773cc18720
Author: Tarun Sahu <tarunsahu@google.com>
kvm: Prepare core VM structs and helpers for LUO support
This commit introduces core infrastructure to support VM preservation with
LUO. It refactors VM file creation into a unified kvm_create_vm_file()
helper and tracks a weak reference to the backing file in struct kvm to
enable reverse file resolution.
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
[ ... ]
> @@ -874,6 +874,18 @@ struct kvm {
> #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> /* Protected by slots_lock (for writes) and RCU (for reads) */
> struct xarray mem_attr_array;
> +#endif
> +#ifdef CONFIG_LIVEUPDATE_GUEST_MEMFD
> + /*
> + * Weak reference to the VFS file backing this KVM instance. Stored
> + * without incrementing the file refcount to prevent a circular lifetime
> + * dependency (since file->private_data already pins this struct kvm).
> + * Used exclusively to resolve the file pointer back from struct kvm.
> + *
> + * Written/cleared via rcu_assign_pointer() and read locklessly under
> + * RCU (e.g. via get_file_active() to prevent ABA races).
> + */
> + struct file *vm_file;
[Severity: Low]
Should this pointer be annotated with __rcu?
It is documented as being read locklessly under RCU and is updated using
rcu_assign_pointer() below. Without the annotation, sparse static analysis
will likely generate warnings about incorrect types and different address
spaces when the pointer is updated.
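For illustration only, here is a userspace C11 sketch of the publish/read pairing that the quoted comment describes. The `demo_*` names and the empty `demo_rcu` marker are hypothetical stand-ins, and C11 atomics only approximate the kernel primitives. In the kernel, the member would presumably be declared as `struct file __rcu *vm_file;` so that sparse can check every access.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Stand-in for the kernel's __rcu marker, which only sparse interprets. */
#define demo_rcu

struct demo_file {
	int id;
};

struct demo_kvm {
	/* Weak reference: published by the writer, read locklessly. */
	struct demo_file *_Atomic demo_rcu vm_file;
};

/* Rough analogue of rcu_assign_pointer(): a release store. */
void demo_publish(struct demo_kvm *kvm, struct demo_file *f)
{
	atomic_store_explicit(&kvm->vm_file, f, memory_order_release);
}

/* Rough analogue of rcu_dereference(): an acquire load. */
struct demo_file *demo_read(struct demo_kvm *kvm)
{
	return atomic_load_explicit(&kvm->vm_file, memory_order_acquire);
}
```

The marker changes no generated code in either setting. Its value is that the annotation and the accessor macros must agree, which is what the static checker verifies.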
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
[ ... ]
> +struct file *kvm_create_vm_file(unsigned long type, const char *fdname)
> +{
[ ... ]
> +#ifdef CONFIG_LIVEUPDATE_GUEST_MEMFD
> + /*
> + * Weak reference to the file (without get_file()) to prevent a circular
> + * dependency. Safe because the file's release path clears this pointer
> + * and drops its reference to the VM.
> + *
> + * Written via rcu_assign_pointer() because the pointer can be read
> + * locklessly under RCU (e.g., in kvm_gmem_luo_preserve() via
> + * get_file_active() to prevent lockless ABA races).
> + */
> + rcu_assign_pointer(kvm->vm_file, file);
[Severity: Low]
Since kvm->vm_file is updated here in kvm_create_vm_file() using
rcu_assign_pointer(), does the member in struct kvm require the __rcu
modifier to prevent sparse warnings?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260622184851.2309827-1-tarunsahu@google.com?part=2
* Re: [PATCH v3 5/9] kvm: guest_memfd: Add support for freezing and unfreezing mappings
2026-06-22 18:48 ` [PATCH v3 5/9] kvm: guest_memfd: Add support for freezing and unfreezing mappings Tarun Sahu
@ 2026-06-22 19:01 ` sashiko-bot
0 siblings, 0 replies; 18+ messages in thread
From: sashiko-bot @ 2026-06-22 19:01 UTC (permalink / raw)
To: Tarun Sahu; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
New issues:
- [High] Global SRCU read lock held across unbounded fallocate operations causes system-wide DoS for guest_memfd freezing.
Pre-existing issues:
- [High] Calling `file_modified(file)` without holding the `inode_lock()`.
--
commit 76fd5a6866a1e58bea715f4038d1ddf2c323cc9d
Author: Tarun Sahu <tarunsahu@google.com>
kvm: guest_memfd: Add support for freezing and unfreezing mappings
This commit adds a freeze state on guest_memfd inodes to prevent fallocate and
new page faults during VM live updates. It synchronizes these freezing states
using a system-wide SRCU lock.
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index fe1adc9ba48d1..a4d9d348e4d16 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
[ ... ]
> @@ -273,16 +284,30 @@ static long kvm_gmem_allocate(struct inode *inode, loff_t offset, loff_t len)
> static long kvm_gmem_fallocate(struct file *file, int mode, loff_t offset,
> loff_t len)
> {
> + struct inode *inode = file_inode(file);
> int ret;
> + int idx;
>
> - if (!(mode & FALLOC_FL_KEEP_SIZE))
> - return -EOPNOTSUPP;
> + idx = srcu_read_lock(&kvm_gmem_freeze_srcu);
> + if (kvm_gmem_is_frozen(inode)) {
> + srcu_read_unlock(&kvm_gmem_freeze_srcu, idx);
> + return -EPERM;
> + }
>
> - if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
> - return -EOPNOTSUPP;
> + if (!(mode & FALLOC_FL_KEEP_SIZE)) {
> + ret = -EOPNOTSUPP;
> + goto out;
> + }
>
> - if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len))
> - return -EINVAL;
> + if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) {
> + ret = -EOPNOTSUPP;
> + goto out;
> + }
> +
> + if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len)) {
> + ret = -EINVAL;
> + goto out;
> + }
>
> if (mode & FALLOC_FL_PUNCH_HOLE)
> ret = kvm_gmem_punch_hole(file_inode(file), offset, len);
> else
> ret = kvm_gmem_allocate(file_inode(file), offset, len);
[Severity: High]
Could holding this global SRCU read lock across potentially unbounded
operations cause a system-wide denial of service?
When kvm_gmem_fallocate() calls kvm_gmem_allocate() or
kvm_gmem_punch_hole(), the operation iterates over a potentially massive
memory region. While kvm_gmem_allocate() yields the CPU via cond_resched(),
it does not drop this global SRCU read lock.
If another task on the host concurrently calls kvm_gmem_freeze() to preserve
a different VM, its synchronize_srcu() could block indefinitely waiting for
this tenant's operation to complete. Should the SRCU lock be dropped
and reacquired during long loops to prevent cross-tenant starvation?
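One possible shape of the "drop and reacquire" idea, as a userspace sketch under stated assumptions: a reader counter and a flag built on C11 atomics stand in for the SRCU domain and the frozen state, and every `demo_*` name is hypothetical. It is not the patch's code and makes no claim about how guest_memfd should report partial progress.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_gmem {
	atomic_int readers;	/* tasks currently inside a read-side section */
	atomic_bool frozen;
	size_t done;		/* units processed so far (single worker assumed) */
	size_t longest;		/* most units handled inside one section */
};

void demo_init(struct demo_gmem *g)
{
	atomic_init(&g->readers, 0);
	atomic_init(&g->frozen, false);
	g->done = 0;
	g->longest = 0;
}

/* Freezer: set the flag, then wait out the sections already in flight. */
void demo_freeze(struct demo_gmem *g)
{
	atomic_store(&g->frozen, true);
	while (atomic_load(&g->readers) != 0)
		;	/* bounded wait: a section covers at most one chunk */
}

/* Process 'total' units, re-checking the frozen state once per chunk. */
int demo_allocate(struct demo_gmem *g, size_t total, size_t chunk)
{
	size_t left = total;

	assert(chunk > 0);
	while (left > 0) {
		size_t n = left < chunk ? left : chunk;

		atomic_fetch_add(&g->readers, 1);	/* enter section */
		if (atomic_load(&g->frozen)) {
			atomic_fetch_sub(&g->readers, 1);
			return -EPERM;
		}
		g->done += n;	/* the bounded piece of real work goes here */
		if (n > g->longest)
			g->longest = n;
		atomic_fetch_sub(&g->readers, 1);	/* leave section */
		left -= n;
	}
	return 0;
}
```

The point of the sketch is that the freezer waits for at most one chunk of work instead of the whole range. The cost is that an operation can now be interrupted halfway, so a real implementation would have to define what a partially completed fallocate returns.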
>
> if (!ret)
> file_modified(file);
[Severity: High]
This is a pre-existing issue, but does calling file_modified() here violate
the VFS locking requirements?
The VFS API requires the caller of file_modified() to hold the file's
inode_lock (or i_rwsem) because it performs non-atomic timestamp updates
and security privilege clearing. Neither vfs_fallocate() nor
kvm_gmem_fallocate() appears to acquire the inode lock before this call.
> +
> +out:
> + srcu_read_unlock(&kvm_gmem_freeze_srcu, idx);
> return ret;
> }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260622184851.2309827-1-tarunsahu@google.com?part=5
* Re: [PATCH v3 7/9] docs: add documentation for guest_memfd preservation via LUO
2026-06-22 18:54 ` sashiko-bot
@ 2026-06-22 19:04 ` tarunsahu
0 siblings, 0 replies; 18+ messages in thread
From: tarunsahu @ 2026-06-22 19:04 UTC (permalink / raw)
To: sashiko-reviews
Cc: kvm, kexec, Pasha Tatashin, seanjc, Mike Rapoport, Pratyush Yadav,
mark.rutland, ackerleytng, Paolo Bonzini, Alexander Graf
sashiko-bot@kernel.org writes:
>
>> diff --git a/virt/kvm/guest_memfd_luo.c b/virt/kvm/guest_memfd_luo.c
>> index c242b1d14c0ac..8411fe86dc2a1 100644
>> --- a/virt/kvm/guest_memfd_luo.c
>> +++ b/virt/kvm/guest_memfd_luo.c
>> @@ -119,11 +119,11 @@ static bool kvm_gmem_luo_can_preserve(struct liveupdate_file_handler *handler, s
>> /*
>> * Only Fully-shared guest_memfd preservation is supported
>> */
>> - if (GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_INIT_SHARED)
>> + if (!(GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_INIT_SHARED))
>> return 0;
>
> [Severity: Medium]
> Should this functional logic fix in kvm_gmem_luo_can_preserve() be split into
> its own patch?
>
> The commit message explicitly states this is a documentation change, but this
> modifies the condition for GUEST_MEMFD_FLAG_INIT_SHARED in C code, which
> violates the rule of one logical change per commit and makes tracking bug
> fixes difficult.
>
Oops, this (and the hunk below) should be part of the previous patch. I
will fix it in the next version.
>>
>> /*
>> - * It makes sure that no memory can converted to private
>> + * It makes sure that no memory can be converted to private
>> * even if it was initially fully shared (in-place conversions are
>> * prevented).
>> */
>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260622184851.2309827-1-tarunsahu@google.com?part=7
* Re: [PATCH v3 3/9] kvm: kvm_luo: Allow kvm preservation with LUO
2026-06-22 18:48 ` [PATCH v3 3/9] kvm: kvm_luo: Allow kvm preservation with LUO Tarun Sahu
@ 2026-06-22 19:06 ` sashiko-bot
0 siblings, 0 replies; 18+ messages in thread
From: sashiko-bot @ 2026-06-22 19:06 UTC (permalink / raw)
To: Tarun Sahu; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Low] The macro `KHOSER_COPY_TYPEUNSAFE` is undefined in the kernel tree, causing a build failure.
- [Low] `KHOSER_LOAD_PTR` is incorrectly invoked with a `u64` argument instead of a union, causing a build failure.
--
commit c233e1cbb1e57dbd4789fe720442add1dfd010f1
Author: Tarun Sahu <tarunsahu@google.com>
kvm: kvm_luo: Allow kvm preservation with LUO
This commit introduces KVM VM preservation support for the Live Update
Orchestrator (LUO). It registers a file handler to serialize and deserialize
VM state across live updates, primarily focusing on preserving the VM type.
> diff --git a/virt/kvm/kvm_luo.c b/virt/kvm/kvm_luo.c
> new file mode 100644
> index 0000000000000..6728877f16daf
> --- /dev/null
> +++ b/virt/kvm/kvm_luo.c
> @@ -0,0 +1,195 @@
[ ... ]
> +static int kvm_luo_preserve(struct liveupdate_file_op_args *args)
> +{
> + DECLARE_KHOSER_PTR(sd, struct kvm_luo_ser *);
> + struct kvm *kvm = args->file->private_data;
> + struct kvm_luo_ser *ser;
[ ... ]
> + KHOSER_STORE_PTR(sd, ser);
> + KHOSER_COPY_TYPEUNSAFE(args->serialized_data, sd);
[Severity: Low]
Is this macro defined somewhere else in the tree? It looks like
KHOSER_COPY_TYPEUNSAFE is missing from <linux/kho/abi/kexec_handover.h>
and other headers, which causes a build failure here.
> +
> + return 0;
> +}
> +
> +static atomic_t restored_vm_id = ATOMIC_INIT(0);
> +
> +static int kvm_luo_retrieve(struct liveupdate_file_op_args *args)
> +{
> + char fdname[ITOA_MAX_LEN + 1];
> + struct kvm_luo_ser *ser;
> + struct file *file;
> + struct kvm *kvm;
> + int err = 0;
> +
> + ser = KHOSER_LOAD_PTR(args->serialized_data);
[Severity: Low]
Does this code compile? KHOSER_LOAD_PTR() expects its argument to be a union
declared with DECLARE_KHOSER_PTR(), which provides the expected .ptr and
.phys fields.
Since args->serialized_data is a plain u64, this causes a compilation error
because it tries to access struct/union members on a primitive type.
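To illustrate the mismatch described above, here is a minimal userspace sketch. The DEMO_* macros and struct demo_ser are hypothetical stand-ins, not the kernel's KHOSER_* definitions; they only assume, as the review states, that the load helper reads members of a union. In userspace the "phys" field simply holds the pointer value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical typed serialized pointer; NOT the kernel's KHOSER macros. */
#define DEMO_DECLARE_SER_PTR(name, type) \
	union { uint64_t phys; type ptr; } name

/* Both helpers touch union members, so they need the union type. */
#define DEMO_STORE_PTR(dest, val) ((dest).phys = (uint64_t)(uintptr_t)(val))
#define DEMO_LOAD_PTR(src) ((__typeof__((src).ptr))(uintptr_t)(src).phys)

struct demo_ser {
	uint64_t type;
};

/* Producer side: stash a typed pointer into a plain 64-bit slot. */
static uint64_t demo_pack(struct demo_ser *ser)
{
	DEMO_DECLARE_SER_PTR(sd, struct demo_ser *);

	DEMO_STORE_PTR(sd, ser);
	return sd.phys;
}

/* Consumer side: rebuild the union from the raw value before loading. */
static struct demo_ser *demo_unpack(uint64_t raw)
{
	DEMO_DECLARE_SER_PTR(sd, struct demo_ser *);

	sd.phys = raw;
	/* DEMO_LOAD_PTR(raw) would not compile: a u64 has no .ptr or .phys. */
	return DEMO_LOAD_PTR(sd);
}
```

In this model the consumer has to copy the raw value into a declared union first; whether the series should do that or change the field type is left to the author.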
> + if (!ser)
> + return -EINVAL;
> +
> + snprintf(fdname, sizeof(fdname), "%d",
> + atomic_inc_return(&restored_vm_id));
> +
> + file = kvm_create_vm_file(ser->type, fdname);
[ ... ]
> +static void kvm_luo_unpreserve(struct liveupdate_file_op_args *args)
> +{
> + struct kvm_luo_ser *ser;
> +
> + /*
> + * in case preservation failed, args->serialized_data will
> + * be NULL and kvm_luo_preserve takes care of cleaning up.
> + * If preserve succeeds, this condition fails and unpreserve
> + * function takes care of cleaning up.
> + */
> + ser = KHOSER_LOAD_PTR(args->serialized_data);
[Severity: Low]
Similar to the issue in kvm_luo_retrieve(), args->serialized_data is a plain
u64 rather than the union KHOSER_LOAD_PTR() expects. Will this cause a build
failure?
> + if (WARN_ON_ONCE(!ser))
> + return;
> +
> + kho_unpreserve_free(ser);
> +}
> +
> +static void kvm_luo_finish(struct liveupdate_file_op_args *args)
> +{
> + struct kvm_luo_ser *ser;
> +
> + /*
> + * If retrieve_status is true or set to error, nothing to do here.
> + * Already cleaned up in kvm_luo_retrieve().
> + */
> + if (args->retrieve_status)
> + return;
> +
> + ser = KHOSER_LOAD_PTR(args->serialized_data);
[Severity: Low]
Is there a type mismatch here as well? args->serialized_data is being passed
to KHOSER_LOAD_PTR() as a plain u64.
> + if (!ser)
> + return;
> +
> + kho_restore_free(ser);
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260622184851.2309827-1-tarunsahu@google.com?part=3
* Re: [PATCH v3 6/9] kvm: guest_memfd_luo: add support for guest_memfd preservation
2026-06-22 18:48 ` [PATCH v3 6/9] kvm: guest_memfd_luo: add support for guest_memfd preservation Tarun Sahu
@ 2026-06-22 19:08 ` sashiko-bot
0 siblings, 0 replies; 18+ messages in thread
From: sashiko-bot @ 2026-06-22 19:08 UTC (permalink / raw)
To: Tarun Sahu; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [Critical] Missing `kvm_get_kvm()` in `kvm_gmem_luo_retrieve` causes a Use-After-Free of the KVM instance.
- [Critical] Restoring guest_memfd with flags not supported by the current architecture or KVM configuration.
- [High] Missing validation of `ser->size` and `folios_ser[i].index` during guest_memfd retrieval.
- [Low] The patch calls non-existent LUO functions and accesses a non-existent field in `struct liveupdate_file_op_args`, preventing compilation.
--
commit 9f1f13985c0cce15e0b3564ac25bf7a759e9dd5e
Author: Tarun Sahu <tarunsahu@google.com>
kvm: guest_memfd_luo: add support for guest_memfd preservation
This patch sets up the basic infrastructure to preserve the guest_memfd across
kexec reboots using LUO. It walks through the folios, serializes them, and
freezes the guest_memfd inode to avoid changes. During retrieval, the folios
are restored into the page cache.
> diff --git a/virt/kvm/guest_memfd_luo.c b/virt/kvm/guest_memfd_luo.c
> new file mode 100644
> index 0000000000000..c242b1d14c0ac
> --- /dev/null
> +++ b/virt/kvm/guest_memfd_luo.c
[ ... ]
> +static int kvm_gmem_luo_freeze(struct liveupdate_file_op_args *args)
> +{
[ ... ]
> + /*
> + * Obtain a strong reference to kvm->vm_file to prevent the SLAB_TYPESAFE_BY_RCU
> + * file memory from being reallocated while it is being processed.
> + */
> + kvm_file = get_file_active(&kvm->vm_file);
> + if (!kvm_file)
> + return -ENOENT;
> +
> + err = liveupdate_get_token_outgoing(args->session, kvm_file, &vm_token);
[Severity: Low]
Is args->session defined in struct liveupdate_file_op_args?
Looking at the definition in include/linux/liveupdate.h, there is no session
field, and the function liveupdate_get_token_outgoing appears to be
undefined in the kernel tree.
Will this cause a compilation failure?
[ ... ]
> +static int kvm_gmem_luo_retrieve(struct liveupdate_file_op_args *args)
> +{
[ ... ]
> + if (ser->flags & GUEST_MEMFD_LUO_FLAG_MMAP)
> + gmem_flags |= GUEST_MEMFD_FLAG_MMAP;
> + if (ser->flags & GUEST_MEMFD_LUO_FLAG_INIT_SHARED)
> + gmem_flags |= GUEST_MEMFD_FLAG_INIT_SHARED;
> +
> + err = liveupdate_get_file_incoming(args->session, ser->vm_token, &vm_file);
[Severity: Low]
Similar to the freeze path, is args->session defined, and is
liveupdate_get_file_incoming available in the current tree?
[ ... ]
> + if (!kvm) {
> + err = -EINVAL;
> + goto err_free_ser;
> + }
> +
> + file = __kvm_gmem_create_file(kvm, ser->size, gmem_flags);
[Severity: Critical]
Does this miss a kvm_get_kvm() call?
In kvm_gmem_luo_retrieve(), __kvm_gmem_create_file() associates the new
guest_memfd file with the kvm instance. However, without calling
kvm_get_kvm(), the lifecycle linkage is broken.
If the userspace process closes the VM file descriptor before closing the
retrieved guest_memfd, could the KVM instance be freed prematurely, leading
to a use-after-free when the guest_memfd is eventually released?
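The lifetime concern above can be modelled in userspace as follows. This is only a sketch: struct demo_vm, struct demo_gmem and the demo_* helpers are hypothetical and do not show where the real code should take its reference. The point is that the dependent object pins the VM, so the VM survives until the last user is gone regardless of close order.

```c
#include <stdlib.h>

struct demo_vm {
	int refs;	/* stand-in for a kernel reference count */
	int *freed;	/* set to 1 when the last reference is dropped */
};

static struct demo_vm *demo_vm_create(int *freed)
{
	struct demo_vm *vm = malloc(sizeof(*vm));

	if (!vm)
		return NULL;
	vm->refs = 1;	/* reference held by the VM file descriptor */
	vm->freed = freed;
	*freed = 0;
	return vm;
}

static void demo_vm_get(struct demo_vm *vm)
{
	vm->refs++;
}

static void demo_vm_put(struct demo_vm *vm)
{
	if (--vm->refs == 0) {
		*vm->freed = 1;
		free(vm);
	}
}

struct demo_gmem {
	struct demo_vm *vm;
};

static struct demo_gmem *demo_gmem_create(struct demo_vm *vm)
{
	struct demo_gmem *gmem = malloc(sizeof(*gmem));

	if (!gmem)
		return NULL;
	demo_vm_get(vm);	/* pin the VM for the lifetime of this object */
	gmem->vm = vm;
	return gmem;
}

static void demo_gmem_release(struct demo_gmem *gmem)
{
	demo_vm_put(gmem->vm);	/* drop the reference taken at creation */
	free(gmem);
}
```

Without the demo_vm_get() call in demo_gmem_create(), dropping the VM's own reference first would free the VM while gmem->vm still points at it.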
[Severity: Critical]
Are gmem_flags validated against the current architecture's supported flags?
Unlike kvm_gmem_create(), the retrieval path passes the deserialized flags
directly to __kvm_gmem_create_file() without checking
kvm_gmem_get_supported_flags().
Could an outdated or manipulated KHO image force the creation of a
guest_memfd with mmap enabled on architectures or confidential
configurations that explicitly forbid it?
[Severity: High]
Is ser->size validated before creating the file?
The standard creation path enforces that the size is greater than 0 and page
aligned.
Without these checks, could an invalid size cause issues in the guest_memfd?
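A rough sketch of the two checks asked about above (flags against a supported mask, then size non-zero and page aligned). The DEMO_* flag bits and demo_validate_restore() are hypothetical names for illustration, not the KVM UAPI values or an existing helper; the supported mask and page size are passed in by the caller.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch; not the KVM UAPI values. */
#define DEMO_FLAG_MMAP		(1ULL << 0)
#define DEMO_FLAG_INIT_SHARED	(1ULL << 1)

/*
 * Validate deserialized creation parameters before acting on them.
 * Returns 0 if they are acceptable, -EINVAL otherwise.
 */
static int demo_validate_restore(uint64_t flags, uint64_t supported,
				 uint64_t size, uint64_t page_size)
{
	/* Reject any flag bit the current configuration does not allow. */
	if (flags & ~supported)
		return -EINVAL;

	/* Size must be non-zero and a multiple of the page size. */
	if (size == 0 || page_size == 0 || size % page_size)
		return -EINVAL;

	return 0;
}
```

Doing this before the file is created keeps the error path simple, since nothing has to be torn down when the serialized data is rejected.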
> + if (IS_ERR(file)) {
> + err = PTR_ERR(file);
> + goto err_free_ser;
> + }
> +
> + inode = file_inode(file);
> +
> + if (ser->nr_folios) {
> + folios_ser = kho_restore_vmalloc(&ser->folios);
> + if (!folios_ser) {
> + err = -EINVAL;
> + goto err_destroy_file;
> + }
> +
> + for (i = 0; i < ser->nr_folios; i++) {
> + struct folio *folio;
> + phys_addr_t phys;
[ ... ]
> + err = filemap_add_folio(inode->i_mapping, folio, folios_ser[i].index,
> + GFP_KERNEL);
[Severity: High]
Are bounds checks needed for folios_ser[i].index?
The deserialized index is trusted implicitly.
If a manipulated KHO image provides out-of-bounds indices greater than or
equal to the file size divided by the page size, could this insert
inaccessible folios into the page cache that consume memory until the file
is closed?
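One possible shape for such a bounds check, as a userspace sketch. struct demo_folio_ser and demo_check_indices() are hypothetical; the only assumption taken from the comment above is that a valid index is strictly less than the file size divided by the page size.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical serialized entry; only the index matters for this check. */
struct demo_folio_ser {
	uint64_t index;
};

/*
 * Reject any entry whose page index falls outside the file.
 * Returns 0 if every index is in range, -EINVAL otherwise.
 */
static int demo_check_indices(const struct demo_folio_ser *ents, size_t nr,
			      uint64_t size, uint64_t page_size)
{
	uint64_t nr_pages;
	size_t i;

	if (page_size == 0)
		return -EINVAL;

	nr_pages = size / page_size;
	for (i = 0; i < nr; i++) {
		if (ents[i].index >= nr_pages)
			return -EINVAL;
	}
	return 0;
}
```

Running the check over the whole array before inserting the first folio avoids having to unwind a partially populated page cache.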
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260622184851.2309827-1-tarunsahu@google.com?part=6
* Re: [PATCH v3 9/9] selftests: kvm: Add guest_memfd_preservation_test
2026-06-22 18:48 ` [PATCH v3 9/9] selftests: kvm: Add guest_memfd_preservation_test Tarun Sahu
@ 2026-06-22 19:13 ` sashiko-bot
0 siblings, 0 replies; 18+ messages in thread
From: sashiko-bot @ 2026-06-22 19:13 UTC (permalink / raw)
To: Tarun Sahu; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Stage 2 test failure due to re-creation of preserved KVM resources (memslots and vCPU)
--
commit 323482e513bb6a4836ceedaaec02c412bc10f974
Author: Tarun Sahu <tarunsahu@google.com>
selftests: kvm: Add guest_memfd_preservation_test
This commit adds a new KVM selftest to verify that guest memory backed
by guest_memfd is preserved properly across kexec via LUO. It introduces
a two-stage test to save state before kexec and verify it afterwards.
> diff --git a/tools/testing/selftests/kvm/guest_memfd_preservation_test.c b/tools/testing/selftests/kvm/guest_memfd_preservation_test.c
> new file mode 100644
> index 0000000000000..c0a20e788ba79
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/guest_memfd_preservation_test.c
[ ... ]
> +static void run_stage_2(int luo_fd, int state_session_fd)
> +{
> + int retrieved_vm_fd, retrieved_gmem_fd, session_fd, stage;
> + struct vm_shape shape = VM_SHAPE_DEFAULT;
> + const uint64_t gpa = SZ_4G;
> + struct kvm_vcpu *vcpu;
> + const int slot = 1;
> + struct kvm_vm *vm;
[ ... ]
> + retrieved_gmem_fd = luo_session_retrieve_fd(session_fd, GMEM_TOKEN);
> + TEST_ASSERT(retrieved_gmem_fd >= 0, "Failed to retrieve guest_memfd file descriptor");
> +
> + vm = vm_create_from_fd(retrieved_vm_fd, shape);
> +
> + u64 nr_pages = 2048; /* 8MB is plenty for slot0 pages */
> +
> + vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, 0, 0, nr_pages, 0);
[Severity: High]
Does this ioctl fail with -EINVAL because slot 0 already exists in the
preserved KVM VM with a different size? The preserved VM from stage 1 retains
its kernel-side state, which we attempt to blindly re-create here without
respecting the preserved state.
> + kvm_vm_elf_load(vm, program_invocation_name);
> +
> + for (int i = 0; i < NR_MEM_REGIONS; i++)
> + vm->memslots[i] = 0;
> +
> + struct userspace_mem_region *slot0 = memslot2region(vm, 0);
> +
> + ucall_init(vm, slot0->region.guest_phys_addr + slot0->region.memory_size);
> +
> + vm_set_user_memory_region2(vm, slot, KVM_MEM_GUEST_MEMFD, gpa, GMEM_SIZE, NULL,
> + retrieved_gmem_fd, 0);
[Severity: High]
Could this fail with -EINVAL because modifying an existing private
KVM_MEM_GUEST_MEMFD memslot is explicitly forbidden by KVM?
> +
> + for (size_t i = 0; i < GMEM_SIZE; i += page_size)
> + virt_pg_map(vm, gpa + i, gpa + i);
> +
> + vcpu = vm_vcpu_add(vm, 0, guest_code_phase2);
[Severity: High]
Will this call fail with -EEXIST since vCPU ID 0 was already created
in Stage 1 and remains preserved within the retrieved KVM VM?
> + kvm_arch_vm_finalize_vcpus(vm);
> +
> + vcpu_args_set(vcpu, 3, gpa, GMEM_SIZE, DATA_SIZE);
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260622184851.2309827-1-tarunsahu@google.com?part=9