Linux Documentation

* [PATCH 4/4] mips: vmcore_info: export mips arch-specific struct offsets to vmcoreinfo
From: Pnina Feder @ 2026-06-22 21:14 UTC
  To: Andrew Morton, Baoquan He, Mike Rapoport, Pasha Tatashin,
	Pratyush Yadav, Thomas Bogendoerfer, Paul Walmsley,
	Palmer Dabbelt, Albert Ou
  Cc: Dave Young, Jonathan Corbet, Alexandre Ghiti, kexec, linux-kernel,
	linux-mips, linux-riscv, linux-doc, Pnina Feder
In-Reply-To: <20260622211430.4008899-1-pnina.feder@mobileye.com>

Export MIPS architecture-specific struct offsets needed by the
vmcore-tasks tool: the signal frame layout and register context
structures used to reconstruct user-space register state from a
vmcore dump, plus the page table constants needed to translate
user virtual addresses.

Signed-off-by: Pnina Feder <pnina.feder@mobileye.com>
---
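As a rough illustration of how a dump reader might combine these
entries (a sketch under stated assumptions, not the vmcore-tasks
implementation): the saved user registers inside a signal frame can be
reached by chaining OFFSET(rt_sigframe.rs_uc), OFFSET(ucontext.uc_mcontext)
from the generic patch, and OFFSET(sigcontext.sc_regs). The struct and
function names below are hypothetical, and the frame address is assumed
to have been located already.

```c
#include <stdint.h>

/* Offsets as a reader would take them from vmcoreinfo (caller-supplied). */
struct sigframe_offsets {
	uint64_t rt_sigframe_uc;    /* OFFSET(rt_sigframe.rs_uc) */
	uint64_t ucontext_mcontext; /* OFFSET(ucontext.uc_mcontext) */
	uint64_t sigcontext_regs;   /* OFFSET(sigcontext.sc_regs) */
};

/*
 * Hypothetical helper: user virtual address of the saved register
 * array, given the address of an rt_sigframe on the user stack.
 */
static uint64_t saved_regs_addr(uint64_t frame_addr,
				const struct sigframe_offsets *o)
{
	uint64_t uc = frame_addr + o->rt_sigframe_uc;  /* struct ucontext */
	uint64_t mc = uc + o->ucontext_mcontext;       /* struct sigcontext */

	return mc + o->sigcontext_regs;                /* sc_regs[] */
}
```

The result is still a user virtual address; it has to be translated
through the task's page tables before the registers can be read from
the dump.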
 .../admin-guide/kdump/vmcoreinfo.rst          | 34 +++++++++++++++++++
 arch/mips/kernel/Makefile                     |  1 +
 arch/mips/kernel/signal.c                     |  8 +++++
 arch/mips/kernel/vmcore_info.c                | 22 ++++++++++++
 4 files changed, 65 insertions(+)
 create mode 100644 arch/mips/kernel/vmcore_info.c

diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
index 3c364434b846..4af32ddf5615 100644
--- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
+++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
@@ -494,6 +494,40 @@ Used to get the vmalloc_start address from the high_memory symbol.
 
 The maximum number of CPUs.
 
+MIPS
+====
+
+(rt_sigframe, rs_uc)
+--------------------
+
+Offset of the ucontext member within the MIPS rt_sigframe structure.
+Used to locate the signal context within a signal frame on the user
+stack.
+
+(sigcontext, sc_regs)
+---------------------
+
+Offset of the saved register array within struct sigcontext. Used to
+extract user-space register state from signal frames in a vmcore dump.
+
+PAGE_SHIFT
+----------
+
+The base-2 logarithm of the page size. Used for page frame number
+calculations during address translation.
+
+_PFN_MASK|_PAGE_PRESENT|_PAGE_VALID|_PAGE_GLOBAL
+-------------------------------------------------
+
+Page table entry bit masks and flags. Used for walking MIPS page tables
+and translating virtual to physical addresses in a vmcore dump.
+
+PTRS_PER_PGD|PTRS_PER_PMD|PTRS_PER_PTE
+---------------------------------------
+
+Number of entries per page table level. Used for page table walking
+during virtual-to-physical address translation.
+
 powerpc
 =======
 
diff --git a/arch/mips/kernel/Makefile b/arch/mips/kernel/Makefile
index 95a1e674fd67..99f2961f6ee1 100644
--- a/arch/mips/kernel/Makefile
+++ b/arch/mips/kernel/Makefile
@@ -24,6 +24,7 @@ CFLAGS_REMOVE_perf_event_mipsxx.o = $(CC_FLAGS_FTRACE)
 endif
 
 obj-$(CONFIG_CEVT_BCM1480)	+= cevt-bcm1480.o
+obj-$(CONFIG_VMCORE_INFO)	+= vmcore_info.o
 obj-$(CONFIG_CEVT_R4K)		+= cevt-r4k.o
 obj-$(CONFIG_CEVT_DS1287)	+= cevt-ds1287.o
 obj-$(CONFIG_CEVT_GT641XX)	+= cevt-gt641xx.o
diff --git a/arch/mips/kernel/signal.c b/arch/mips/kernel/signal.c
index 4a10f18a8806..f2241f52fa17 100644
--- a/arch/mips/kernel/signal.c
+++ b/arch/mips/kernel/signal.c
@@ -26,6 +26,7 @@
 #include <linux/syscalls.h>
 #include <linux/uaccess.h>
 #include <linux/resume_user_mode.h>
+#include <linux/vmcore_info.h>
 
 #include <asm/abi.h>
 #include <asm/asm.h>
@@ -62,6 +63,13 @@ struct rt_sigframe {
 	struct ucontext rs_uc;
 };
 
+#ifdef CONFIG_VMCORE_INFO
+void mips_rt_signal_frame(void)
+{
+	VMCOREINFO_OFFSET(rt_sigframe, rs_uc);
+}
+#endif
+
 #ifdef CONFIG_MIPS_FP_SUPPORT
 
 /*
diff --git a/arch/mips/kernel/vmcore_info.c b/arch/mips/kernel/vmcore_info.c
new file mode 100644
index 000000000000..5d7fdc662065
--- /dev/null
+++ b/arch/mips/kernel/vmcore_info.c
@@ -0,0 +1,22 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/vmcore_info.h>
+
+#include <asm/pgtable.h>
+#include <asm/sigcontext.h>
+
+extern void mips_rt_signal_frame(void);
+
+void arch_crash_save_vmcoreinfo(void)
+{
+	mips_rt_signal_frame();
+	VMCOREINFO_OFFSET(sigcontext, sc_regs);
+	VMCOREINFO_NUMBER(PAGE_SHIFT);
+	VMCOREINFO_NUMBER(_PFN_MASK);
+	VMCOREINFO_NUMBER(_PAGE_PRESENT);
+	VMCOREINFO_NUMBER(_PAGE_VALID);
+	VMCOREINFO_NUMBER(_PAGE_GLOBAL);
+	VMCOREINFO_NUMBER(PTRS_PER_PGD);
+	VMCOREINFO_NUMBER(PTRS_PER_PMD);
+	VMCOREINFO_NUMBER(PTRS_PER_PTE);
+}
-- 
2.43.0



* [PATCH 3/4] riscv: vmcore_info: export riscv arch-specific struct offsets to vmcoreinfo
From: Pnina Feder @ 2026-06-22 21:14 UTC
  To: Andrew Morton, Baoquan He, Mike Rapoport, Pasha Tatashin,
	Pratyush Yadav, Thomas Bogendoerfer, Paul Walmsley,
	Palmer Dabbelt, Albert Ou
  Cc: Dave Young, Jonathan Corbet, Alexandre Ghiti, kexec, linux-kernel,
	linux-mips, linux-riscv, linux-doc, Pnina Feder
In-Reply-To: <20260622211430.4008899-1-pnina.feder@mobileye.com>

Export RISC-V architecture-specific values needed by the
vmcore-tasks tool: the signal frame layout and register context
offsets used to reconstruct user-space register state from a
vmcore dump, plus STACK_ALIGN and _PAGE_PFN_SHIFT.

Signed-off-by: Pnina Feder <pnina.feder@mobileye.com>
---
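A minimal sketch of how NUMBER(_PAGE_PFN_SHIFT) might be used once a
leaf page table entry has been found: shift the PFN out of the PTE and
combine it with the in-page offset of the virtual address. The function
name is hypothetical. The pfn_mask parameter is an assumption (this
patch does not export a PFN mask), as is passing the page shift in
explicitly.

```c
#include <stdint.h>

/*
 * Hypothetical helper: turn a leaf PTE into a physical address.
 * pfn_mask selects the PFN field of the PTE (caller-supplied
 * assumption), pfn_shift is NUMBER(_PAGE_PFN_SHIFT), and page_shift
 * is the base-2 logarithm of the page size.
 */
static uint64_t pte_to_phys(uint64_t pte, uint64_t pfn_mask,
			    unsigned int pfn_shift, unsigned int page_shift,
			    uint64_t vaddr)
{
	uint64_t pfn = (pte & pfn_mask) >> pfn_shift;
	uint64_t page_off = vaddr & ((UINT64_C(1) << page_shift) - 1);

	return (pfn << page_shift) | page_off;
}
```

A real reader would also have to check the PTE's valid bit and handle
huge-page leaves at upper levels; both are left out here.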
 .../admin-guide/kdump/vmcoreinfo.rst          | 26 +++++++++++++++++++
 arch/riscv/kernel/signal.c                    |  8 ++++++
 arch/riscv/kernel/vmcore_info.c               | 11 ++++++++
 3 files changed, 45 insertions(+)

diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
index 36103b3cdc05..3c364434b846 100644
--- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
+++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
@@ -595,6 +595,32 @@ va_kernel_pa_offset
 Indicates the offset between the kernel virtual and physical mappings.
 Used to translate virtual to physical addresses.
 
+STACK_ALIGN
+-----------
+
+Stack alignment requirement for the architecture. Used to locate signal
+frames on the user stack.
+
+(sigcontext, sc_regs)
+---------------------
+
+Offset of the saved register array within struct sigcontext. Used to
+extract user-space register state from signal frames in a vmcore dump.
+
+_PAGE_PFN_SHIFT
+---------------
+
+The bit shift to extract the PFN from a page table entry. Used for
+virtual-to-physical address translation when walking page tables from
+a vmcore dump.
+
+(rt_sigframe, uc)
+-----------------
+
+Offset of the ucontext member within the RISC-V rt_sigframe structure.
+Used to locate the signal context (and thus saved registers) within a
+signal frame on the user stack.
+
 Task and VMA metadata
 =====================
 
diff --git a/arch/riscv/kernel/signal.c b/arch/riscv/kernel/signal.c
index 59784dc117e4..eb03c0ea6aae 100644
--- a/arch/riscv/kernel/signal.c
+++ b/arch/riscv/kernel/signal.c
@@ -13,6 +13,7 @@
 #include <linux/resume_user_mode.h>
 #include <linux/linkage.h>
 #include <linux/entry-common.h>
+#include <linux/vmcore_info.h>
 
 #include <asm/ucontext.h>
 #include <asm/vdso.h>
@@ -40,6 +41,13 @@ struct rt_sigframe {
 #endif
 };
 
+#ifdef CONFIG_VMCORE_INFO
+void riscv_rt_signal_frame(void)
+{
+	VMCOREINFO_OFFSET(rt_sigframe, uc);
+}
+#endif
+
 #ifdef CONFIG_FPU
 static long restore_fp_state(struct pt_regs *regs,
 			     union __riscv_fp_state __user *sc_fpregs)
diff --git a/arch/riscv/kernel/vmcore_info.c b/arch/riscv/kernel/vmcore_info.c
index c27efceec3cc..dd174042dba3 100644
--- a/arch/riscv/kernel/vmcore_info.c
+++ b/arch/riscv/kernel/vmcore_info.c
@@ -3,6 +3,12 @@
 #include <linux/vmcore_info.h>
 #include <linux/pagemap.h>
 
+#include <asm/processor.h>
+#include <asm/pgtable-bits.h>
+#include <asm/sigcontext.h>
+
+extern void riscv_rt_signal_frame(void);
+
 static inline u64 get_satp_value(void)
 {
 	return csr_read(CSR_SATP);
@@ -28,4 +34,9 @@ void arch_crash_save_vmcoreinfo(void)
 						kernel_map.va_kernel_pa_offset);
 	vmcoreinfo_append_str("KERNELOFFSET=%lx\n", kaslr_offset());
 	vmcoreinfo_append_str("NUMBER(satp)=0x%llx\n", get_satp_value());
+	riscv_rt_signal_frame();
+
+	VMCOREINFO_NUMBER(STACK_ALIGN);
+	VMCOREINFO_OFFSET(sigcontext, sc_regs);
+	VMCOREINFO_NUMBER(_PAGE_PFN_SHIFT);
 }
-- 
2.43.0



* [PATCH 2/4] vmcoreinfo: export task and mm struct offsets to vmcoreinfo
From: Pnina Feder @ 2026-06-22 21:14 UTC
  To: Andrew Morton, Baoquan He, Mike Rapoport, Pasha Tatashin,
	Pratyush Yadav, Thomas Bogendoerfer, Paul Walmsley,
	Palmer Dabbelt, Albert Ou
  Cc: Dave Young, Jonathan Corbet, Alexandre Ghiti, kexec, linux-kernel,
	linux-mips, linux-riscv, linux-doc, Pnina Feder
In-Reply-To: <20260622211430.4008899-1-pnina.feder@mobileye.com>

Export the struct offsets and sizes needed by the vmcore-tasks tool
to walk task lists, extract register state, and enumerate VMAs from
a vmcore dump. This includes offsets into task_struct, mm_struct,
vm_area_struct, and related structures that are not already covered
by existing vmcoreinfo exports.

Signed-off-by: Pnina Feder <pnina.feder@mobileye.com>
---
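To show what walking the task list from these entries could look like,
here is a hedged sketch. It starts at SYMBOL(init_task), follows
list_head.next through OFFSET(task_struct.tasks), and subtracts the
offset to recover each task_struct address (the container_of step).
The flat image, the base address, both function names, and the
host-endian 64-bit pointer read are all assumptions made to keep the
example self-contained; a real reader would translate kernel virtual
addresses into the dump instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read one 64-bit pointer at addr from a flat image mapped at base. */
static int read_u64(const uint8_t *img, size_t len, uint64_t base,
		    uint64_t addr, uint64_t *out)
{
	if (len < sizeof(*out) || addr < base ||
	    addr - base > len - sizeof(*out))
		return -1;
	memcpy(out, img + (addr - base), sizeof(*out));
	return 0;
}

/*
 * Collect task_struct addresses by walking the circular tasks list.
 * Returns the number of tasks, or -1 on a bad read or when the list
 * does not close within max_tasks entries.
 */
static long collect_tasks(const uint8_t *img, size_t len, uint64_t base,
			  uint64_t init_task, uint64_t tasks_off,
			  uint64_t *tasks_out, long max_tasks)
{
	uint64_t head = init_task + tasks_off;
	uint64_t node = head;
	long n = 0;

	while (n < max_tasks) {
		tasks_out[n++] = node - tasks_off; /* container_of() */
		/* list_head.next is the first member of struct list_head */
		if (read_u64(img, len, base, node, &node) != 0)
			return -1;
		if (node == head)
			return n;
	}
	return -1;
}
```

The max_tasks bound is a design choice for post-mortem data: a
corrupted list in a crashed kernel may never return to its head.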
 .../admin-guide/kdump/vmcoreinfo.rst          | 77 +++++++++++++++++++
 kernel/vmcore_info.c                          | 60 +++++++++++++++
 2 files changed, 137 insertions(+)

diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
index 7663c610fe90..36103b3cdc05 100644
--- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
+++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
@@ -594,3 +594,80 @@ va_kernel_pa_offset
 
 Indicates the offset between the kernel virtual and physical mappings.
 Used to translate virtual to physical addresses.
+
+Task and VMA metadata
+=====================
+
+The following vmcoreinfo entries export struct offsets and sizes needed
+to walk task lists, extract register state, and enumerate VMAs from a
+vmcore dump without requiring kernel debug symbols (DWARF/BTF). Used by
+the vmcore-tasks userspace tool for lightweight post-mortem crash
+analysis.
+
+init_task
+---------
+
+The address of the initial task (swapper). Used as the starting point
+to walk the circular task list via the tasks member.
+
+(task_struct, tasks)|(task_struct, pid)|(task_struct, tgid)|(task_struct, comm)|(task_struct, mm)|(task_struct, stack)|(task_struct, signal)|(task_struct, flags)|(task_struct, __state)|(task_struct, exit_state)|(task_struct, thread_node)
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+
+Offsets into task_struct needed to extract per-task metadata: process
+name, PID/TGID, task state, kernel stack pointer, mm_struct pointer,
+signal_struct pointer, and thread group linkage.
+
+(signal_struct, thread_head)|(signal_struct, nr_threads)
+--------------------------------------------------------
+
+Offsets into signal_struct for walking the thread group list and
+determining the number of threads.
+
+(mm_struct, mm_mt)|(mm_struct, pgd)|(mm_struct, start_brk)|(mm_struct, brk)|(mm_struct, start_stack)
+----------------------------------------------------------------------------------------------------
+
+Offsets into mm_struct for accessing the VMA maple tree, page global
+directory, and memory layout boundaries.
+
+Maple tree internals
+--------------------
+
+Offsets for maple_tree, maple_node, maple_range_64, maple_arange_64,
+and maple_metadata structures. These are needed to walk the maple tree
+that stores VMAs (mm_struct.mm_mt) from a vmcore dump.
+
+(vm_area_struct, vm_start)|(vm_area_struct, vm_end)|(vm_area_struct, vm_flags)|(vm_area_struct, vm_file)|(vm_area_struct, vm_mm)
+-------------------------------------------------------------------------------------------------------------------------------
+
+Offsets into vm_area_struct for extracting VMA boundaries, permissions,
+backing file, and owning mm_struct.
+
+(file, f_path)|(path, dentry)|(dentry, d_name)|(dentry, d_parent)|(qstr, hash_len)|(qstr, name)
+------------------------------------------------------------------------------------------------
+
+Offsets for traversing file -> path -> dentry -> name to reconstruct
+the filename backing a VMA.
+
+THREAD_SIZE
+-----------
+
+The size of the kernel stack. Used to locate the pt_regs saved at the
+top of the kernel stack for each task.
+
+(ucontext, uc_mcontext)
+-----------------------
+
+Offset of the machine context within struct ucontext. Used to locate
+saved registers within a signal frame.
+
+__NR_rt_sigreturn
+-----------------
+
+The rt_sigreturn syscall number. Used to identify signal frame return
+trampolines on the user stack during backtrace reconstruction.
+
+CONFIG_PGTABLE_LEVELS|PMD_SHIFT|PGDIR_SHIFT
+--------------------------------------------
+
+Page table geometry constants. Used for walking page tables to translate
+user virtual addresses to physical addresses in a vmcore dump.
diff --git a/kernel/vmcore_info.c b/kernel/vmcore_info.c
index 8614430ca212..f963274ab1a2 100644
--- a/kernel/vmcore_info.c
+++ b/kernel/vmcore_info.c
@@ -17,6 +17,7 @@
 
 #include <asm/page.h>
 #include <asm/sections.h>
+#include <asm/ucontext.h>
 
 #include "kallsyms_internal.h"
 #include "kexec_internal.h"
@@ -244,6 +245,65 @@ static int __init crash_save_vmcoreinfo_init(void)
 	VMCOREINFO_SYMBOL(kallsyms_offsets);
 #endif /* CONFIG_KALLSYMS */
 
+	VMCOREINFO_SYMBOL(init_task);
+	VMCOREINFO_STRUCT_SIZE(task_struct);
+	VMCOREINFO_OFFSET(task_struct, tasks);
+	VMCOREINFO_OFFSET(task_struct, thread_node);
+	VMCOREINFO_OFFSET(task_struct, pid);
+	VMCOREINFO_OFFSET(task_struct, tgid);
+	VMCOREINFO_OFFSET(task_struct, exit_state);
+	VMCOREINFO_OFFSET(task_struct, __state);
+	VMCOREINFO_OFFSET(task_struct, flags);
+	VMCOREINFO_OFFSET(task_struct, comm);
+	VMCOREINFO_OFFSET(task_struct, stack);
+	VMCOREINFO_OFFSET(task_struct, signal);
+	VMCOREINFO_OFFSET(signal_struct, thread_head);
+	VMCOREINFO_OFFSET(signal_struct, nr_threads);
+	VMCOREINFO_OFFSET(task_struct, mm);
+	VMCOREINFO_STRUCT_SIZE(mm_struct);
+	VMCOREINFO_OFFSET(mm_struct, mm_mt);
+	VMCOREINFO_OFFSET(mm_struct, pgd);
+	VMCOREINFO_OFFSET(mm_struct, start_brk);
+	VMCOREINFO_OFFSET(mm_struct, brk);
+	VMCOREINFO_OFFSET(mm_struct, start_stack);
+	VMCOREINFO_STRUCT_SIZE(maple_tree);
+	VMCOREINFO_OFFSET(maple_tree, ma_root);
+	VMCOREINFO_OFFSET(maple_tree, ma_flags);
+	VMCOREINFO_STRUCT_SIZE(maple_node);
+	VMCOREINFO_OFFSET(maple_node, slot);
+	VMCOREINFO_OFFSET(maple_node, parent);
+	VMCOREINFO_OFFSET(maple_node, ma64);
+	VMCOREINFO_OFFSET(maple_node, mr64);
+	VMCOREINFO_OFFSET(maple_range_64, pivot);
+	VMCOREINFO_OFFSET(maple_range_64, slot);
+	VMCOREINFO_OFFSET(maple_metadata, end);
+	VMCOREINFO_OFFSET(maple_metadata, gap);
+	VMCOREINFO_OFFSET(maple_arange_64, pivot);
+	VMCOREINFO_OFFSET(maple_arange_64, slot);
+	VMCOREINFO_OFFSET(maple_arange_64, gap);
+	VMCOREINFO_OFFSET(maple_arange_64, meta);
+	VMCOREINFO_STRUCT_SIZE(vm_area_struct);
+	VMCOREINFO_OFFSET(vm_area_struct, vm_start);
+	VMCOREINFO_OFFSET(vm_area_struct, vm_end);
+	VMCOREINFO_OFFSET(vm_area_struct, vm_flags);
+	VMCOREINFO_OFFSET(vm_area_struct, vm_file);
+	VMCOREINFO_OFFSET(vm_area_struct, vm_mm);
+	VMCOREINFO_STRUCT_SIZE(file);
+	VMCOREINFO_OFFSET(file, f_path);
+	VMCOREINFO_OFFSET(path, dentry);
+	VMCOREINFO_STRUCT_SIZE(dentry);
+	VMCOREINFO_OFFSET(dentry, d_name);
+	VMCOREINFO_OFFSET(dentry, d_parent);
+	VMCOREINFO_OFFSET(qstr, hash_len);
+	VMCOREINFO_OFFSET(qstr, name);
+	VMCOREINFO_NUMBER(THREAD_SIZE);
+	VMCOREINFO_STRUCT_SIZE(pt_regs);
+	VMCOREINFO_OFFSET(ucontext, uc_mcontext);
+	VMCOREINFO_NUMBER(__NR_rt_sigreturn);
+	VMCOREINFO_NUMBER(CONFIG_PGTABLE_LEVELS);
+	VMCOREINFO_NUMBER(PMD_SHIFT);
+	VMCOREINFO_NUMBER(PGDIR_SHIFT);
+
 	arch_crash_save_vmcoreinfo();
 	update_vmcoreinfo_note();
 
-- 
2.43.0



* [PATCH 1/4] vmcoreinfo: increase vmcoreinfo buffer to 8KB
From: Pnina Feder @ 2026-06-22 21:14 UTC
  To: Andrew Morton, Baoquan He, Mike Rapoport, Pasha Tatashin,
	Pratyush Yadav, Thomas Bogendoerfer, Paul Walmsley,
	Palmer Dabbelt, Albert Ou
  Cc: Dave Young, Jonathan Corbet, Alexandre Ghiti, kexec, linux-kernel,
	linux-mips, linux-riscv, linux-doc, Pnina Feder
In-Reply-To: <20260622211430.4008899-1-pnina.feder@mobileye.com>

Additional metadata will be exported to vmcoreinfo, requiring more
buffer space than a single 4KB page provides.

Change VMCOREINFO_BYTES from PAGE_SIZE to a fixed SZ_8K. This
decouples the buffer size from the page size, avoiding waste on
architectures with large pages (e.g. 16KB on MIPS, 64KB on arm64)
while providing enough space on 4KB-page architectures like RISC-V.

The existing allocation in kimage_crash_copy_vmcoreinfo() already
uses get_order() and DIV_ROUND_UP(), so it correctly rounds up to
whole pages regardless of the constant's value.

Signed-off-by: Pnina Feder <pnina.feder@mobileye.com>
---
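The rounding described above can be checked with a small arithmetic
sketch. It only mirrors the DIV_ROUND_UP() calculation; the function
name is hypothetical and the page sizes in the note are the examples
from the commit message.

```c
#include <stddef.h>

/* Whole pages needed to hold a buffer of the given size. */
static size_t pages_needed(size_t bytes, size_t page_size)
{
	return (bytes + page_size - 1) / page_size;
}
```

With an 8192-byte buffer this gives two pages for 4KB pages and one
page for 16KB or 64KB pages.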
 include/linux/vmcore_info.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/vmcore_info.h b/include/linux/vmcore_info.h
index e71518caacdf..612dcf7b9ecd 100644
--- a/include/linux/vmcore_info.h
+++ b/include/linux/vmcore_info.h
@@ -20,7 +20,8 @@
 				     CRASH_CORE_NOTE_NAME_BYTES +	\
 				     CRASH_CORE_NOTE_DESC_BYTES)

-#define VMCOREINFO_BYTES	   PAGE_SIZE
+/* Fixed size independent of PAGE_SIZE to avoid waste on large-page archs */
+#define VMCOREINFO_BYTES	   SZ_8K
 #define VMCOREINFO_NOTE_NAME	   "VMCOREINFO"
 #define VMCOREINFO_NOTE_NAME_BYTES ALIGN(sizeof(VMCOREINFO_NOTE_NAME), 4)
 #define VMCOREINFO_NOTE_SIZE	   ((CRASH_CORE_NOTE_HEAD_BYTES * 2) +	\
-- 
2.43.0


* [PATCH 0/4] vmcore-tasks: export per-task metadata to vmcoreinfo
From: Pnina Feder @ 2026-06-22 21:14 UTC
  To: Andrew Morton, Baoquan He, Mike Rapoport, Pasha Tatashin,
	Pratyush Yadav, Thomas Bogendoerfer, Paul Walmsley,
	Palmer Dabbelt, Albert Ou
  Cc: Dave Young, Jonathan Corbet, Alexandre Ghiti, kexec, linux-kernel,
	linux-mips, linux-riscv, linux-doc, Pnina Feder

This series extends vmcoreinfo with struct offsets and sizes needed by
the vmcore-tasks userspace tool to extract per-task state from a vmcore
dump without requiring kernel debug symbols (DWARF/BTF).

The vmcore-tasks tool reads /proc/vmcore (or a saved vmcore file) and
reconstructs, for each task:
  - task name, pid, state, flags
  - VMA list (start, end, flags, backing file)
  - user register state (saved on the kernel stack at kernel entry)
  - user-space backtrace with VMA/filename mapping
  - kernel dmesg buffer

This provides a lightweight post-mortem crash analysis capability for
production environments where full debug info (DWARF/BTF) is not
available.
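For readers unfamiliar with the format, a minimal sketch of how a
consumer might look up one of these entries follows. It assumes the
vmcoreinfo note is available as newline-separated KEY=value text (for
example "OFFSET(task_struct.tasks)=..." or "SYMBOL(init_task)=..."),
and that the caller knows whether a value is decimal or hexadecimal.
The function name is hypothetical and this is not the vmcore-tasks
code.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Look up "key=" at the start of a line in the vmcoreinfo text and
 * parse the value in the given base (10 or 16). Returns 0 on success,
 * -1 if the key is missing or the value is not a number.
 */
static int vmcoreinfo_get(const char *text, const char *key, int base,
			  unsigned long long *out)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		if (strncmp(line, key, klen) == 0 && line[klen] == '=') {
			const char *val = line + klen + 1;
			char *end;
			unsigned long long v = strtoull(val, &end, base);

			if (end == val)
				return -1;
			*out = v;
			return 0;
		}
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

Matching on the full key plus '=' keeps "OFFSET(task_struct.tasks)"
from matching a longer key that merely shares its prefix.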

The companion userspace tool has been submitted to kexec-tools:
  https://lore.kernel.org/all/20260622205550.1087163-1-pnina.feder@mobileye.com/

The series is structured as follows:

  Patch 1: Increase vmcoreinfo buffer from PAGE_SIZE to a fixed SZ_8K,
           decoupled from page size to avoid waste on large-page
           architectures (MIPS 16KB, arm64 64KB).

  Patch 2: Export generic struct offsets (task_struct, mm_struct,
           vm_area_struct, maple_tree, file/dentry/path, pt_regs,
           signal_struct) needed to walk task lists and VMAs.

  Patch 3: Export RISC-V arch-specific offsets (signal frame layouts,
           register context structures) for user register extraction.

  Patch 4: Export MIPS arch-specific offsets (signal frame layouts,
           register context structures) for user register extraction.

Additional architecture support (arm64, x86, etc.) can follow the
same pattern established by patches 3 and 4.

Tested on MIPS64 (QEMU Malta) and RISC-V with the full kdump pipeline:
primary kernel -> kexec panic -> crash kernel -> vmcore-tasks analysis.

Pnina Feder (4):
  vmcoreinfo: increase vmcoreinfo buffer to 8KB
  vmcoreinfo: export task and mm struct offsets to vmcoreinfo
  riscv: vmcore_info: export riscv arch-specific struct offsets to
    vmcoreinfo
  mips: vmcore_info: export mips arch-specific struct offsets to
    vmcoreinfo

 .../admin-guide/kdump/vmcoreinfo.rst          | 137 ++++++++++++++++++
 arch/mips/kernel/Makefile                     |   1 +
 arch/mips/kernel/signal.c                     |   8 +
 arch/mips/kernel/vmcore_info.c                |  22 +++
 arch/riscv/kernel/signal.c                    |   8 +
 arch/riscv/kernel/vmcore_info.c               |  11 ++
 include/linux/vmcore_info.h                   |   3 +-
 kernel/vmcore_info.c                          |  60 ++++++++
 8 files changed, 249 insertions(+), 1 deletion(-)
 create mode 100644 arch/mips/kernel/vmcore_info.c

-- 
2.43.0



* Re: [PATCH 1/4] nfs: store the full NFS fileid in inode->i_ino
From: Mark Brown @ 2026-06-22 21:05 UTC
  To: Jeff Layton
  Cc: Trond Myklebust, Anna Schumaker, Jonathan Corbet, Shuah Khan,
	linux-nfs, linux-kernel, linux-doc
In-Reply-To: <20260512-nfsino-v1-1-284720522f4c@kernel.org>


On Tue, May 12, 2026 at 12:12:42PM -0400, Jeff Layton wrote:
> Now that inode->i_ino is a 64-bit value, store the full NFS fileid in
> it directly instead of an XOR-folded hash. This makes NFS_FILEID() and
> set_nfs_fileid() operate on inode->i_ino rather than the separate
> nfsi->fileid field.

This patch is in -next now and is triggering a failure in the LTP
ioctl10.c test for me on arm:

tst_buffers.c:57: TINFO: Test is using guarded buffers
tst_test.c:2047: TINFO: LTP version: 20260130
tst_test.c:2050: TINFO: Tested kernel: 7.1.0-next-20260622 #1 SMP @1782128788 armv7l

...

ioctl10.c:111: TFAIL: q->inode (11493907226) != entry.vm_inode (4294967295)

arm64 seems unaffected. I didn't really investigate, but I'll note that
unsigned long is 32 bits on arm.

Full log:

   https://lava.sirena.org.uk/scheduler/job/2904745#L3852

bisect log with more test job links:

git bisect start
# status: waiting for both good and bad commits
# good: [7f5d1580a3723e4ea89001a67a24d9f350e15c01] Merge branch 'for-linux-next-fixes' of https://gitlab.freedesktop.org/drm/misc/kernel.git
git bisect good 7f5d1580a3723e4ea89001a67a24d9f350e15c01
# status: waiting for bad commit, 1 good commit known
# bad: [948efecf22e49aa4bf55bb73ec79a0ddcfd38571] Add linux-next specific files for 20260622
git bisect bad 948efecf22e49aa4bf55bb73ec79a0ddcfd38571
# test job: [3c54940fe511142cfe574022c3b703271982d64c] https://lava.sirena.org.uk/scheduler/job/2905311
# bad: [3c54940fe511142cfe574022c3b703271982d64c] Merge branch 'drm-next' of https://gitlab.freedesktop.org/drm/kernel.git
git bisect bad 3c54940fe511142cfe574022c3b703271982d64c
# test job: [80895ca480e9a42f961914ae5c947a66c130b344] https://lava.sirena.org.uk/scheduler/job/2905400
# good: [80895ca480e9a42f961914ae5c947a66c130b344] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux.git
git bisect good 80895ca480e9a42f961914ae5c947a66c130b344
# test job: [2b8c085b832b07b3f7f3b7b7d06388920daf2a54] https://lava.sirena.org.uk/scheduler/job/2905436
# bad: [2b8c085b832b07b3f7f3b7b7d06388920daf2a54] Merge branch 'fs-next' of linux-next
git bisect bad 2b8c085b832b07b3f7f3b7b7d06388920daf2a54
# test job: [034e46edded1d4fc91f53c16c53f82b1c5908ca5] https://lava.sirena.org.uk/scheduler/job/2905486
# bad: [034e46edded1d4fc91f53c16c53f82b1c5908ca5] Merge branch 'linux-next' of git://git.linux-nfs.org/projects/anna/linux-nfs.git
git bisect bad 034e46edded1d4fc91f53c16c53f82b1c5908ca5
# test job: [5f03612db546bdffbcc1ebd343d055612948317c] https://lava.sirena.org.uk/scheduler/job/2905541
# good: [5f03612db546bdffbcc1ebd343d055612948317c] Merge branch 'for_next' of https://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git
git bisect good 5f03612db546bdffbcc1ebd343d055612948317c
# test job: [eb3dd8eb882bf0d1daacd0debc0f3e946a3ee1b8] https://lava.sirena.org.uk/scheduler/job/2905673
# good: [eb3dd8eb882bf0d1daacd0debc0f3e946a3ee1b8] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git
git bisect good eb3dd8eb882bf0d1daacd0debc0f3e946a3ee1b8
# test job: [b1819a4e1d531b4b1d06405fbe73e5e20c402b53] https://lava.sirena.org.uk/scheduler/job/2905815
# good: [b1819a4e1d531b4b1d06405fbe73e5e20c402b53] ksmbd: sleep interruptibly in the durable handle scavenger
git bisect good b1819a4e1d531b4b1d06405fbe73e5e20c402b53
# test job: [17d90b68c3a3d7d7e95b49e1fe9381a723f637a8] https://lava.sirena.org.uk/scheduler/job/2906138
# bad: [17d90b68c3a3d7d7e95b49e1fe9381a723f637a8] sunrpc: fix uninitialized xprt_create_args structure
git bisect bad 17d90b68c3a3d7d7e95b49e1fe9381a723f637a8
# test job: [35168eb947f230aaa35fd8416a30563ef89f5421] https://lava.sirena.org.uk/scheduler/job/2906213
# bad: [35168eb947f230aaa35fd8416a30563ef89f5421] NFS: fix eof updates after NFSv4.2 fallocate/zero-range
git bisect bad 35168eb947f230aaa35fd8416a30563ef89f5421
# test job: [37957478be021b92981aa4c99b69f308d3b784d0] https://lava.sirena.org.uk/scheduler/job/2863766
# bad: [37957478be021b92981aa4c99b69f308d3b784d0] sunrpc: Fix error handling in rpc_sysfs_xprt_switch_add_xprt_store()
git bisect bad 37957478be021b92981aa4c99b69f308d3b784d0
# test job: [0e06a884f5ba6226829441bfc656ff9f5e9e90ac] https://lava.sirena.org.uk/scheduler/job/2863828
# bad: [0e06a884f5ba6226829441bfc656ff9f5e9e90ac] nfs: remove nfs_compat_user_ino64() and deprecate enable_ino64
git bisect bad 0e06a884f5ba6226829441bfc656ff9f5e9e90ac
# test job: [0cad7630425f4c9ee0dfa376ff8bf60c88ff2566] https://lava.sirena.org.uk/scheduler/job/2864357
# bad: [0cad7630425f4c9ee0dfa376ff8bf60c88ff2566] nfs: store the full NFS fileid in inode->i_ino
git bisect bad 0cad7630425f4c9ee0dfa376ff8bf60c88ff2566
# first bad commit: [0cad7630425f4c9ee0dfa376ff8bf60c88ff2566] nfs: store the full NFS fileid in inode->i_ino



* Re: [PATCH] Docs/driver-api/uio-howto: document mmap_prepare callback
From: Randy Dunlap @ 2026-06-22 20:10 UTC
  To: Doehyun Baek, Greg Kroah-Hartman, Jonathan Corbet, Shuah Khan
  Cc: Andrew Morton, Vlastimil Babka, Lorenzo Stoakes, linux-doc,
	linux-kernel
In-Reply-To: <20260622181821.1195257-1-doehyunbaek@gmail.com>



On 6/22/26 11:18 AM, Doehyun Baek wrote:
> The UIO howto still documents an mmap callback in struct uio_info.
> That field was replaced by mmap_prepare, which takes a struct
> vm_area_desc.
> 
> A UIO driver following the current howto no longer builds because
> struct uio_info has no mmap member. Update the documented callback
> signature and matching text to match the current API.
> 
> Fixes: 933f05f58ac6 ("uio: replace deprecated mmap hook with mmap_prepare in uio_info")
> Signed-off-by: Doehyun Baek <doehyunbaek@gmail.com>

Acked-by: Randy Dunlap <rdunlap@infradead.org>
Thanks.

> ---
>  Documentation/driver-api/uio-howto.rst | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/driver-api/uio-howto.rst b/Documentation/driver-api/uio-howto.rst
> index 907ffa3b38f5..c08472dfbcfe 100644
> --- a/Documentation/driver-api/uio-howto.rst
> +++ b/Documentation/driver-api/uio-howto.rst
> @@ -246,10 +246,10 @@ the members are required, others are optional.
>     hardware interrupt number. The flags given here will be used in the
>     call to :c:func:`request_irq()`.
>  
> --  ``int (*mmap)(struct uio_info *info, struct vm_area_struct *vma)``:
> +-  ``int (*mmap_prepare)(struct uio_info *info, struct vm_area_desc *desc)``:
>     Optional. If you need a special :c:func:`mmap()`
>     function, you can set it here. If this pointer is not NULL, your
> -   :c:func:`mmap()` will be called instead of the built-in one.
> +   ``mmap_prepare`` will be called instead of the built-in one.
>  
>  -  ``int (*open)(struct uio_info *info, struct inode *inode)``:
>     Optional. You might want to have your own :c:func:`open()`,
> 
> base-commit: 1dc18801be29bc54709aa355b8acd80e183b03cd

-- 
~Randy


* Re: [PATCH v3 08/12] fs/resctrl: Make info/kernel_mode writable and identify the bound group
From: Babu Moger @ 2026-06-22 19:03 UTC
  To: Reinette Chatre, corbet, tony.luck, Dave.Martin, james.morse,
	tglx, bp, dave.hansen
  Cc: skhan, x86, mingo, hpa, akpm, rdunlap, pawan.kumar.gupta,
	feng.tang, dapeng1.mi, kees, elver, lirongqing, paulmck, bhelgaas,
	seanjc, alexandre.chartre, yazen.ghannam, peterz, chang.seok.bae,
	kim.phillips, xin, naveen, thomas.lendacky, linux-doc,
	linux-kernel, eranian, peternewman
In-Reply-To: <510ee961-b3a3-41ef-857f-6dc210b6eb83@intel.com>

Hi Reinette,

On 6/22/26 11:47, Reinette Chatre wrote:
> Hi Babu,
> 
> On 6/18/26 6:29 PM, Babu Moger wrote:
>> On 6/16/26 18:42, Reinette Chatre wrote:
>>> On 4/30/26 4:24 PM, Babu Moger wrote:
> 
> ...
> 
>>>> +/**
>>>> + * rdtgroup_config_kmode_clear() - Tear down the kernel-mode binding on @rdtgrp
>>>> + * @rdtgrp:    Resctrl group whose kernel-mode binding is being released.
>>>> + *        May be %NULL when no group is currently bound, in which case
>>>> + *        this is a no-op.
>>>> + * @kmode:    Kernel-mode policy currently active on @rdtgrp, as a
>>>> + *        BIT(&enum resctrl_kernel_modes) value.  When this is
>>>> + *        BIT(INHERIT_CTRL_AND_MON) the hardware tear-down is skipped
>>>> + *        because no MSR was previously programmed.
>>>> + *
>>>> + * Disables the kernel-mode binding on the CPUs @rdtgrp covers (its
>>>> + * @kmode_cpu_mask, or all online CPUs when that mask is empty) and resets
>>>> + * the per-group bookkeeping (@kmode and @kmode_cpu_mask).  This is the
>>>> + * disable counterpart of rdtgroup_config_kmode() and exists so that a write
>>>> + * that transitions the active mode to BIT(INHERIT_CTRL_AND_MON) -- which
>>>> + * skips rdtgroup_config_kmode() entirely -- still tears down the previously
>>>> + * bound group instead of leaving stale enable bits behind.
>>>> + *
>>>> + * On allocation failure the function returns -ENOMEM and leaves both the
>>>> + * hardware state and @rdtgrp's bookkeeping unchanged so the caller can fail
>>>> + * the operation atomically and last_cmd_status reflects reality.
>>>> + *
>>>> + * Context: Caller must hold rdtgroup_mutex.
>>>> + *
>>>> + * Return: 0 on success (including the @rdtgrp == %NULL and INHERIT cases),
>>>> + * -ENOMEM if cpumask allocation fails.
>>>> + */
>>>> +static int rdtgroup_config_kmode_clear(struct rdtgroup *rdtgrp, int kmode)
>>>> +{
>>>> +    cpumask_var_t disable_mask;
>>>> +    u32 closid, rmid;
>>>> +
>>>> +    if (!rdtgrp)
>>>> +        return 0;
>>>> +
>>>> +    if (kmode == BIT(INHERIT_CTRL_AND_MON))
>>>> +        goto out_clear;
>>>> +
>>>> +    if (!zalloc_cpumask_var(&disable_mask, GFP_KERNEL))
>>>> +        return -ENOMEM;
>>>> +
>>>> +    if (rdtgrp->type == RDTMON_GROUP) {
>>>> +        closid = rdtgrp->mon.parent->closid;
>>>> +        rmid = rdtgrp->mon.rmid;
>>>> +    } else {
>>>> +        closid = rdtgrp->closid;
>>>> +        rmid = rdtgrp->mon.rmid;
>>>> +    }
>>>
>>
>> I can use it directly as shown below. I don't need to check for RDTMON_GROUP.
>>
>>      closid = rdtgrp->closid;
>>       rmid = rdtgrp->mon.rmid;
>>
>>
>>> Same comment as above ... but actually, why is closid/rmid needed at all? This
>>> function is intended to *reset* the kernel mode so needing a valid/active closid and
>>> rmid does not look right.
>>
>> This is a bit tricky. I may need CLOSID/RMID in
>> resctrl_arch_configure_kmode(). According to the specification, only
>> the PLZA_EN field is allowed to differ across CPUs where PLZA is
>> enabled; all other fields must remain consistent across CPUs within
>> the same domain. If CLOSID/RMID are not passed, it could result in
>> inconsistent values across CPUs.
> 
> 
> I see. Let's revisit this in the next version. It is not quite clear to me how
> the rework of cpu_mask wrangling will impact the resctrl_arch_configure_kmode()
> calls. To simplify this for now resctrl could continue to provide closid and rmid
> to architecture (with the API documentation in include/linux/resctrl.h documenting
> why it is provided and that it may be unused by architecture).
> 

Sounds good. Let's revisit this in the next version.

> 
> 
>>>> +
>>>> +    /*
>>>> +     * Split "<mode>:group=<spec>"; the ":group=<spec>" suffix is optional
>>>> +     * and when omitted the default control group (&rdtgroup_default) is used.
>>>> +     */
>>>> +    group_str = strstr(buf, ":group=");
>>>> +    if (group_str) {
>>>> +        *group_str = '\0';
>>>> +        group_str += strlen(":group=");
>>>> +    }
>>>> +    mode_str = buf;
>>>> +
>>>> +    mutex_lock(&rdtgroup_mutex);
>>>> +    rdt_last_cmd_clear();
>>>> +
>>>> +    for (i = 0; i < RESCTRL_NUM_KERNEL_MODES; i++)
>>>> +        if (!strcmp(mode_str, resctrl_mode_str[i]))
>>>> +            break;
>>>> +    if (i == RESCTRL_NUM_KERNEL_MODES) {
>>>> +        rdt_last_cmd_puts("Unknown kernel mode\n");
>>>> +        ret = -EINVAL;
>>>> +        goto out_unlock;
>>>> +    }
>>>> +
>>>> +    if (!(resctrl_kcfg.kmode & BIT(i))) {
>>>> +        rdt_last_cmd_puts("Kernel mode not available\n");
>>>> +        ret = -EINVAL;
>>>> +        goto out_unlock;
>>>> +    }
>>>> +
>>>> +    kmode = BIT(i);
>>>
>>> Can kmode be of enum type to be assigned the actual enum value to avoid all these BIT(enum value) usages?
>>
>> You mean?
>>
>> enum resctrl_kernel_modes {
>>      INHERIT_CTRL_AND_MON        = 1U << 0,  /* 1 */
>>      GLOBAL_ASSIGN_CTRL_INHERIT_MON    = 1U << 1,  /* 2 */
>>      GLOBAL_ASSIGN_CTRL_ASSIGN_MON    = 1U << 2,  /* 4 */
>> };
>>
>> #define RESCTRL_NUM_KERNEL_MODES  3
> 
> No. I mean:
> 	enum resctrl_kernel_mode kmode;
> ... with a change like this code like below can be simplified:
> 
>>>> +    if (kmode == BIT(GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU) &&
> 
> 	kmode == GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU

Sure. Will do.

Thanks
Babu


^ permalink raw reply
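
The enum-typed kmode suggested in the review above could look roughly like
the userspace sketch below. It is only an illustration under stated
assumptions: the mode name strings are hypothetical (the contents of
resctrl_mode_str[] are not shown in the thread), parse_kmode() and its
return codes are invented for the example, and only the three modes listed
in the reply are included. The point is that BIT() is applied once, when
testing the supported-modes mask, and every later comparison uses the plain
enum value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BIT(n) (1U << (n))

/* Plain index-valued enum, as suggested in the review. */
enum resctrl_kernel_mode {
	INHERIT_CTRL_AND_MON,
	GLOBAL_ASSIGN_CTRL_INHERIT_MON,
	GLOBAL_ASSIGN_CTRL_ASSIGN_MON,
	RESCTRL_NUM_KERNEL_MODES,
};

/* Hypothetical names: the real resctrl_mode_str[] contents are not shown. */
static const char *const mode_names[RESCTRL_NUM_KERNEL_MODES] = {
	[INHERIT_CTRL_AND_MON]           = "inherit_ctrl_and_mon",
	[GLOBAL_ASSIGN_CTRL_INHERIT_MON] = "global_assign_ctrl_inherit_mon",
	[GLOBAL_ASSIGN_CTRL_ASSIGN_MON]  = "global_assign_ctrl_assign_mon",
};

/*
 * Parse "<mode>[:group=<spec>]" in place.
 * Returns 0 on success, -1 for an unknown mode, -2 for a mode that is
 * not set in the @supported bitmask. On success *group points at <spec>
 * or is NULL when the ":group=" suffix was omitted.
 */
static int parse_kmode(char *buf, unsigned int supported,
		       enum resctrl_kernel_mode *kmode, char **group)
{
	char *group_str;
	int i;

	assert(buf && kmode && group);

	/* Cut the optional suffix off so buf holds only the mode name. */
	group_str = strstr(buf, ":group=");
	if (group_str) {
		*group_str = '\0';
		group_str += strlen(":group=");
	}

	for (i = 0; i < RESCTRL_NUM_KERNEL_MODES; i++)
		if (!strcmp(buf, mode_names[i]))
			break;
	if (i == RESCTRL_NUM_KERNEL_MODES)
		return -1;

	/* The only place a bit is derived from the enum value. */
	if (!(supported & BIT(i)))
		return -2;

	*kmode = (enum resctrl_kernel_mode)i;
	*group = group_str;
	return 0;
}
```

A caller would then compare with "kmode == GLOBAL_ASSIGN_CTRL_ASSIGN_MON"
directly instead of wrapping each constant in BIT().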

* Re: [PATCH v3 0/9] liveupdate: kvm: guest_memfd preservation
From: tarunsahu @ 2026-06-22 18:55 UTC (permalink / raw)
  To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
	Shuah Khan, Pratyush Yadav, Pasha Tatashin, seanjc, ackerleytng,
	aneesh.kumar, fvdl, sagis, david, dmatlack, mark.rutland
  Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
In-Reply-To: <20260622184851.2309827-1-tarunsahu@google.com>


+ Adding more people to the series (To:) whom I missed in my original message.

~Tarun

Tarun Sahu <tarunsahu@google.com> writes:

> Hello,
> This is the non-RFC patch series for guest_memfd preservation. After
> multiple discussions across the hypervisor liveupdate meeting and the
> guest_memfd bi-weekly meeting, the design for basic support of
> guest_memfd preservation is final. This series covers only
> guest_memfd instances that are fully shared, do not support private
> memory, and are backed by PAGE_SIZE pages.
>
> Steps to test:
> 1. Compile Kernel with CONFIG_LIVEUPDATE_GUEST_MEMFD=y
> 2. boot kernel with command line: kho=on liveupdate=on
> 3. run the following kselftest
> 	$ ./selftests/kvm/guest_memfd_preservation_test --stage 1
> 	$ <kexec> --reuse-cmdline
> 	$ ./selftests/kvm/guest_memfd_preservation_test --stage 2
>
> NOTE: Assert the following:
> 	$ ls /dev/liveupdate
> 	$ ls /dev/kvm
> 	$ dmesg | grep liveupdate # (should have kvm_vm_luo &&
> 		# guest_memfd_luo handler registered)
>
> The changes are rebased on:
> 	kvm/next + liveupdate/next (merge) + [3] + [4] + [5]
> 	Where,
> 	[3]: luo: conversion of serialized_data to KHOSER_PTR
> 	[4]: luo: APIs to retrieve file internally from session
> 	[5]: selftests: liveupdate selftests library
> Here is the github repo:
> 	https://github.com/tar-unix/linux/tree/gmem-pre
>
> V3 <- RFC V2 [2]
> 1. Finalize the design
> 2. resolve sashiko reported bugs
> 3. Use of KHOSER_PTR instead of raw serialized_data as per [3]
>
> RFC V2 [2] <- RFC V1 [1]
> 1. Removed mem_attr_array as it is not needed for fully-shared
> 2. Removed pre-faulted condition
> 3. Added vm_type preservation for ARM64.
> 4. Removed liveupdate_get_file_incoming api patch as it is sent
>    separately [4] by Samiullah.
>
> [1] https://lore.kernel.org/all/cover.1779080766.git.tarunsahu@google.com/
> [2] https://lore.kernel.org/all/c054ba0fb2639932bbe354420d3f4f84cce84905.1780676742.git.tarunsahu@google.com/
> [3] https://lore.kernel.org/all/20260622111215.4157974-1-tarunsahu@google.com/
> [4] https://lore.kernel.org/all/20260613012521.835490-1-skhawaja@google.com/
> [5] https://lore.kernel.org/all/20260612214512.464146-1-vipinsh@google.com/
>
> Tarun Sahu (9):
>   liveupdate: Add LIVEUPDATE_GUEST_MEMFD config option
>   kvm: Prepare core VM structs and helpers for LUO support
>   kvm: kvm_luo: Allow kvm preservation with LUO
>   kvm: guest_memfd: Move internal definitions and helper to new header
>   kvm: guest_memfd: Add support for freezing and unfreezing mappings
>   kvm: guest_memfd_luo: add support for guest_memfd preservation
>   docs: add documentation for guest_memfd preservation via LUO
>   selftests: kvm: Split ____vm_create() to expose init helpers
>   selftests: kvm: Add guest_memfd_preservation_test
>
>  Documentation/core-api/liveupdate.rst         |   1 +
>  Documentation/liveupdate/vmm.rst              | 107 ++++
>  MAINTAINERS                                   |  14 +
>  include/linux/kho/abi/kvm.h                   | 106 ++++
>  include/linux/kvm_host.h                      |  14 +
>  kernel/liveupdate/Kconfig                     |  15 +
>  tools/testing/selftests/kvm/Makefile.kvm      |   6 +-
>  .../kvm/guest_memfd_preservation_test.c       | 236 +++++++++
>  .../testing/selftests/kvm/include/kvm_util.h  |   2 +
>  tools/testing/selftests/kvm/lib/kvm_util.c    |  26 +-
>  virt/kvm/Makefile.kvm                         |   1 +
>  virt/kvm/guest_memfd.c                        | 185 +++++--
>  virt/kvm/guest_memfd.h                        |  44 ++
>  virt/kvm/guest_memfd_luo.c                    | 497 ++++++++++++++++++
>  virt/kvm/kvm_luo.c                            | 195 +++++++
>  virt/kvm/kvm_main.c                           |  94 +++-
>  virt/kvm/kvm_mm.h                             |  15 +
>  17 files changed, 1477 insertions(+), 81 deletions(-)
>  create mode 100644 Documentation/liveupdate/vmm.rst
>  create mode 100644 include/linux/kho/abi/kvm.h
>  create mode 100644 tools/testing/selftests/kvm/guest_memfd_preservation_test.c
>  create mode 100644 virt/kvm/guest_memfd.h
>  create mode 100644 virt/kvm/guest_memfd_luo.c
>  create mode 100644 virt/kvm/kvm_luo.c
>
> -- 
> 2.55.0.rc0.786.g65d90a0328-goog

^ permalink raw reply

* [PATCH v3 9/9] selftests: kvm: Add guest_memfd_preservation_test
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
  To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
	Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
  Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
In-Reply-To: <20260622184851.2309827-1-tarunsahu@google.com>

Add a new KVM selftest `guest_memfd_preservation_test` to verify that
guest memory backed by guest_memfd is preserved properly.

The test uses the KVM selftests framework: it creates a new VM and maps
two memory slots into it. One holds the code executed inside the VM;
the other is the guest_memfd whose memory the guest code writes.

In Stage 1: Once the data is written, the VM exits and waits for the
user to trigger the kexec.

In Stage 2: A new VM is created from the retrieved VM file descriptor
and two memory slots are assigned again: one for the guest code and one
for the retrieved guest_memfd, whose memory is verified by the guest
code. If verification succeeds, the test passes.

// Kernel is compiled with CONFIG_LIVEUPDATE_GUEST_MEMFD and booted
// with the kho=on liveupdate=on command line parameters.

$ ./selftests/kvm/guest_memfd_preservation_test --stage 1
$ <kexec>
$ ./selftests/kvm/guest_memfd_preservation_test --stage 2

Signed-off-by: Tarun Sahu <tarunsahu@google.com>
---
 MAINTAINERS                                   |   1 +
 tools/testing/selftests/kvm/Makefile.kvm      |   6 +-
 .../kvm/guest_memfd_preservation_test.c       | 236 ++++++++++++++++++
 3 files changed, 242 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/kvm/guest_memfd_preservation_test.c

diff --git a/MAINTAINERS b/MAINTAINERS
index e27b677..d0033a9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14421,6 +14421,7 @@ L:	kvm@vger.kernel.org
 S:	Maintained
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux.git
 F:	Documentation/liveupdate/vmm.rst
+F:	tools/testing/selftests/kvm/guest_memfd_preservation_test.c
 F:	virt/kvm/guest_memfd_luo.c
 F:	virt/kvm/kvm_luo.c
 
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index d28a057..d5bc8be2 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -164,6 +164,8 @@ TEST_GEN_PROGS_x86 += pre_fault_memory_test
 
 # Compiled outputs used by test targets
 TEST_GEN_PROGS_EXTENDED_x86 += x86/nx_huge_pages_test
+# Manual test that forks a persistent background daemon; skip auto CI run
+TEST_GEN_PROGS_EXTENDED_x86 += guest_memfd_preservation_test
 
 TEST_GEN_PROGS_arm64 = $(TEST_GEN_PROGS_COMMON)
 TEST_GEN_PROGS_arm64 += arm64/aarch32_id_regs
@@ -258,6 +260,7 @@ OVERRIDE_TARGETS = 1
 # which causes the environment variable to override the makefile).
 include ../lib.mk
 include ../cgroup/lib/libcgroup.mk
+include ../liveupdate/lib/libliveupdate.mk
 
 INSTALL_HDR_PATH = $(top_srcdir)/usr
 LINUX_HDR_PATH = $(INSTALL_HDR_PATH)/include/
@@ -312,7 +315,8 @@ LIBKVM_S := $(filter %.S,$(LIBKVM))
 LIBKVM_C_OBJ := $(patsubst %.c, $(OUTPUT)/%.o, $(LIBKVM_C))
 LIBKVM_S_OBJ := $(patsubst %.S, $(OUTPUT)/%.o, $(LIBKVM_S))
 LIBKVM_STRING_OBJ := $(patsubst %.c, $(OUTPUT)/%.o, $(LIBKVM_STRING))
-LIBKVM_OBJS = $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ) $(LIBKVM_STRING_OBJ) $(LIBCGROUP_O)
+LIBKVM_OBJS = $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ) $(LIBKVM_STRING_OBJ) \
+						$(LIBCGROUP_O) $(LIBLIVEUPDATE_O)
 SPLIT_TEST_GEN_PROGS := $(patsubst %, $(OUTPUT)/%, $(SPLIT_TESTS))
 SPLIT_TEST_GEN_OBJ := $(patsubst %, $(OUTPUT)/$(ARCH)/%.o, $(SPLIT_TESTS))
 
diff --git a/tools/testing/selftests/kvm/guest_memfd_preservation_test.c b/tools/testing/selftests/kvm/guest_memfd_preservation_test.c
new file mode 100644
index 0000000..c0a20e7
--- /dev/null
+++ b/tools/testing/selftests/kvm/guest_memfd_preservation_test.c
@@ -0,0 +1,236 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2026, Google LLC.
+ *
+ * Author: Tarun Sahu <tarunsahu@google.com>
+ *
+ * Test for VM and guest_memfd preservation across kexec (Live Update) via LUO.
+ *
+ * NOTE: This is a MANUAL test and is excluded from automated CI/testing
+ * frameworks because Stage 1 daemonizes into the background to pin resources
+ * and requires a human operator to manually trigger kexec before Stage 2
+ * is executed. Running Stage 1 automatically would leak the background daemon
+ * and cause CI runners to falsely interpret it as a passed test.
+ *
+ * Usage:
+ * Stage 1: ./guest_memfd_preservation_test --stage 1
+ * Stage 2: ./guest_memfd_preservation_test --stage 2
+ */
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <errno.h>
+#include <stdio.h>
+#include <fcntl.h>
+#include <sys/mman.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/ioctl.h>
+#include <linux/sizes.h>
+#include <linux/falloc.h>
+
+#include "kvm_util.h"
+#include "processor.h"
+#include "test_util.h"
+#include "ucall_common.h"
+#include "../kselftest.h"
+#include "../kselftest_harness.h"
+
+#include <libliveupdate.h>
+
+#define SESSION_NAME "gmem_vm_preservation_session"
+#define VM_TOKEN 0x1001
+#define GMEM_TOKEN 0x1002
+
+#define STATE_SESSION_NAME "gmem_preservation_state"
+#define STATE_TOKEN 0x999
+
+#define GMEM_SIZE (16ULL * 1024 * 1024)
+#define DATA_SIZE (5ULL * 1024 * 1024)
+
+static size_t page_size;
+
+/* Deterministic byte pattern generation based on offset */
+static inline uint8_t get_pattern_byte(size_t offset)
+{
+	return (uint8_t)(offset ^ 0x5A);
+}
+
+static void guest_code_phase1(uint64_t gpa, uint64_t size, uint64_t data_size)
+{
+	uint8_t *mem = (uint8_t *)gpa;
+	size_t i;
+
+	for (i = 0; i < data_size; i++)
+		mem[i] = get_pattern_byte(i);
+
+	GUEST_DONE();
+}
+
+static void guest_code_phase2(uint64_t gpa, uint64_t size, uint64_t data_size)
+{
+	uint8_t *mem = (uint8_t *)gpa;
+	size_t i;
+
+	for (i = 0; i < data_size; i++) {
+		uint8_t val = get_pattern_byte(i);
+
+		__GUEST_ASSERT(mem[i] == val,
+			       "Data mismatch at offset %lu! Expected 0x%x, got 0x%x",
+			       i, val, mem[i]);
+	}
+
+	GUEST_DONE();
+}
+
+static void run_stage_1(int luo_fd)
+{
+	uint64_t flags = GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED;
+	int gmem_fd, session_fd, ret;
+	const uint64_t gpa = SZ_4G;
+	struct kvm_vcpu *vcpu;
+	const int slot = 1;
+	struct kvm_vm *vm;
+
+	ksft_print_msg("[STAGE 1] Starting pre-kexec setup...\n");
+
+	ksft_print_msg("[STAGE 1] Creating state file for next stage (2)...\n");
+	create_state_file(luo_fd, STATE_SESSION_NAME, STATE_TOKEN, 2);
+
+	vm = __vm_create_shape_with_one_vcpu(VM_SHAPE_DEFAULT, &vcpu, 1,
+					guest_code_phase1);
+	gmem_fd = vm_create_guest_memfd(vm, GMEM_SIZE, flags);
+	vm_set_user_memory_region2(vm, slot, KVM_MEM_GUEST_MEMFD, gpa, GMEM_SIZE, NULL,
+				 gmem_fd, 0);
+
+	for (size_t i = 0; i < GMEM_SIZE; i += page_size)
+		virt_pg_map(vm, gpa + i, gpa + i);
+
+	vcpu_args_set(vcpu, 3, gpa, GMEM_SIZE, DATA_SIZE);
+
+	vcpu_run(vcpu);
+	TEST_ASSERT_EQ(get_ucall(vcpu, NULL), UCALL_DONE);
+
+	ksft_print_msg("[STAGE 1] Creating session '%s' and preserving VM/guest_memfd...\n",
+		       SESSION_NAME);
+	session_fd = luo_create_session(luo_fd, SESSION_NAME);
+	TEST_ASSERT(session_fd >= 0, "Failed to create LUO session");
+
+	ret = luo_session_preserve_fd(session_fd, vm->fd, VM_TOKEN);
+	TEST_ASSERT(ret == 0, "Failed to preserve VM file descriptor");
+
+	ret = luo_session_preserve_fd(session_fd, gmem_fd, GMEM_TOKEN);
+	TEST_ASSERT(ret == 0, "Failed to preserve guest_memfd file descriptor");
+
+	printf("\n============================================================\n");
+	printf("Phase 1 Complete Successfully!\n");
+	printf("VM file and guest_memfd file have been preserved via LUO.\n");
+	printf("Tokens: VM_TOKEN=0x%x, GMEM_TOKEN=0x%x\n", VM_TOKEN, GMEM_TOKEN);
+	printf("Machine Size: %llu MB, Data Size: %llu MB\n", GMEM_SIZE / SZ_1M,
+				 DATA_SIZE / SZ_1M);
+	printf("------------------------------------------------------------\n");
+
+	close(luo_fd);
+	daemonize_and_wait();
+}
+
+static struct kvm_vm *vm_create_from_fd(int resurrected_vm_fd,
+					struct vm_shape shape)
+{
+	struct kvm_vm *vm;
+
+	vm = calloc(1, sizeof(*vm));
+	TEST_ASSERT(vm != NULL, "Insufficient Memory");
+
+	vm_init_fields(vm, shape);
+
+	vm->kvm_fd = open_path_or_exit(KVM_DEV_PATH, O_RDWR);
+	vm->fd = resurrected_vm_fd;
+
+	if (kvm_has_cap(KVM_CAP_BINARY_STATS_FD))
+		vm->stats.fd = vm_get_stats_fd(vm);
+	else
+		vm->stats.fd = -1;
+
+	vm_init_memory_properties(vm);
+
+	return vm;
+}
+
+static void run_stage_2(int luo_fd, int state_session_fd)
+{
+	int retrieved_vm_fd, retrieved_gmem_fd, session_fd, stage;
+	struct vm_shape shape = VM_SHAPE_DEFAULT;
+	const uint64_t gpa = SZ_4G;
+	struct kvm_vcpu *vcpu;
+	const int slot = 1;
+	struct kvm_vm *vm;
+
+	ksft_print_msg("[STAGE 2] Starting post-kexec verification...\n");
+
+	restore_and_read_stage(state_session_fd, STATE_TOKEN, &stage);
+	if (stage != 2)
+		fail_exit("Expected stage 2, but state file contains %d", stage);
+
+	ksft_print_msg("[STAGE 2] Retrieving session '%s'...\n", SESSION_NAME);
+	session_fd = luo_retrieve_session(luo_fd, SESSION_NAME);
+	TEST_ASSERT(session_fd >= 0, "Failed to retrieve LUO session");
+
+	retrieved_vm_fd = luo_session_retrieve_fd(session_fd, VM_TOKEN);
+	TEST_ASSERT(retrieved_vm_fd >= 0, "Failed to retrieve VM file descriptor");
+
+	retrieved_gmem_fd = luo_session_retrieve_fd(session_fd, GMEM_TOKEN);
+	TEST_ASSERT(retrieved_gmem_fd >= 0, "Failed to retrieve guest_memfd file descriptor");
+
+	vm = vm_create_from_fd(retrieved_vm_fd, shape);
+
+	u64 nr_pages = 2048; /* 8MB is plenty for slot0 pages */
+
+	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, 0, 0, nr_pages, 0);
+	kvm_vm_elf_load(vm, program_invocation_name);
+
+	for (int i = 0; i < NR_MEM_REGIONS; i++)
+		vm->memslots[i] = 0;
+
+	struct userspace_mem_region *slot0 = memslot2region(vm, 0);
+
+	ucall_init(vm, slot0->region.guest_phys_addr + slot0->region.memory_size);
+
+	vm_set_user_memory_region2(vm, slot, KVM_MEM_GUEST_MEMFD, gpa, GMEM_SIZE, NULL,
+				   retrieved_gmem_fd, 0);
+
+	for (size_t i = 0; i < GMEM_SIZE; i += page_size)
+		virt_pg_map(vm, gpa + i, gpa + i);
+
+	vcpu = vm_vcpu_add(vm, 0, guest_code_phase2);
+	kvm_arch_vm_finalize_vcpus(vm);
+
+	vcpu_args_set(vcpu, 3, gpa, GMEM_SIZE, DATA_SIZE);
+
+	printf("Resuming / Running VM in Phase 2...\n");
+	vcpu_run(vcpu);
+	TEST_ASSERT_EQ(get_ucall(vcpu, NULL), UCALL_DONE);
+
+	printf("\nSUCCESS: Phase 2 Complete! All 5MB complex data verified intact!\n");
+
+	luo_session_finish(session_fd);
+	close(session_fd);
+
+	ksft_print_msg("[STAGE 2] Finalizing state session...\n");
+	if (luo_session_finish(state_session_fd) < 0)
+		fail_exit("luo_session_finish for state session");
+	close(state_session_fd);
+
+	/* This will also close the vm_fd */
+	kvm_vm_free(vm);
+	close(retrieved_gmem_fd);
+}
+
+int main(int argc, char *argv[])
+{
+	TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
+	page_size = getpagesize();
+
+	return luo_test(argc, argv, STATE_SESSION_NAME,
+			run_stage_1, run_stage_2);
+}
-- 
2.55.0.rc0.786.g65d90a0328-goog


^ permalink raw reply related
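
The write-then-verify scheme used by guest_code_phase1() and
guest_code_phase2() in the patch above can be tried outside a guest with
the sketch below. It assumes an ordinary userspace buffer in place of
guest_memfd-backed memory; fill_pattern() and verify_pattern() are
hypothetical helper names, and only the pattern function itself is taken
from the test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same deterministic pattern as get_pattern_byte() in the selftest. */
static inline uint8_t pattern_byte(size_t offset)
{
	return (uint8_t)(offset ^ 0x5A);
}

/* "Stage 1": write the pattern into the first @len bytes of @mem. */
static void fill_pattern(uint8_t *mem, size_t len)
{
	size_t i;

	assert(mem != NULL);
	for (i = 0; i < len; i++)
		mem[i] = pattern_byte(i);
}

/*
 * "Stage 2": recompute the pattern and compare.
 * Returns the offset of the first mismatch, or @len if all bytes match.
 */
static size_t verify_pattern(const uint8_t *mem, size_t len)
{
	size_t i;

	assert(mem != NULL);
	for (i = 0; i < len; i++)
		if (mem[i] != pattern_byte(i))
			return i;
	return len;
}
```

Because the cast keeps only the low 8 bits of the offset, the pattern in
this sketch repeats every 256 bytes. A check of this shape catches
corrupted or zeroed bytes, but it would not notice two whole pages being
swapped; mixing the page index into the pattern would be one way to cover
that case.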

* [PATCH v3 8/9] selftests: kvm: Split ____vm_create() to expose init helpers
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
  To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
	Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
  Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
In-Reply-To: <20260622184851.2309827-1-tarunsahu@google.com>

Refactor `____vm_create()` in the KVM selftest library to extract its
initialization steps into separate, reusable internal helpers.

Introduce `vm_init_fields()` and `vm_init_memory_properties()`. This
lets advanced test setups initialize the VM fields and the memory
properties independently of opening the VM, which is required by
upcoming test cases that restore preserved VMs. No functional changes
are introduced for the existing tests.

Signed-off-by: Tarun Sahu <tarunsahu@google.com>
---
 .../testing/selftests/kvm/include/kvm_util.h  |  2 ++
 tools/testing/selftests/kvm/lib/kvm_util.c    | 26 +++++++++++++------
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index 04a9101..88de0e7 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -471,6 +471,8 @@ const char *vm_guest_mode_string(u32 i);
 
 void kvm_vm_free(struct kvm_vm *vmp);
 void kvm_vm_restart(struct kvm_vm *vmp);
+void vm_init_fields(struct kvm_vm *vm, struct vm_shape shape);
+void vm_init_memory_properties(struct kvm_vm *vm);
 void kvm_vm_release(struct kvm_vm *vmp);
 void kvm_vm_elf_load(struct kvm_vm *vm, const char *filename);
 int kvm_memfd_alloc(size_t size, bool hugepages);
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 195f3fd..dc576b8 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -276,13 +276,8 @@ __weak void vm_populate_gva_bitmap(struct kvm_vm *vm)
 		(1ULL << (vm->va_bits - 1)) >> vm->page_shift);
 }
 
-struct kvm_vm *____vm_create(struct vm_shape shape)
+void vm_init_fields(struct kvm_vm *vm, struct vm_shape shape)
 {
-	struct kvm_vm *vm;
-
-	vm = calloc(1, sizeof(*vm));
-	TEST_ASSERT(vm != NULL, "Insufficient Memory");
-
 	INIT_LIST_HEAD(&vm->vcpus);
 	vm->regions.gpa_tree = RB_ROOT;
 	vm->regions.hva_tree = RB_ROOT;
@@ -380,9 +375,10 @@ struct kvm_vm *____vm_create(struct vm_shape shape)
 	if (vm->pa_bits != 40)
 		vm->type = KVM_VM_TYPE_ARM_IPA_SIZE(vm->pa_bits);
 #endif
+}
 
-	vm_open(vm);
-
+void vm_init_memory_properties(struct kvm_vm *vm)
+{
 	/* Limit to VA-bit canonical virtual addresses. */
 	vm->vpages_valid = sparsebit_alloc();
 	vm_populate_gva_bitmap(vm);
@@ -392,6 +388,20 @@ struct kvm_vm *____vm_create(struct vm_shape shape)
 
 	/* Allocate and setup memory for guest. */
 	vm->vpages_mapped = sparsebit_alloc();
+}
+
+struct kvm_vm *____vm_create(struct vm_shape shape)
+{
+	struct kvm_vm *vm;
+
+	vm = calloc(1, sizeof(*vm));
+	TEST_ASSERT(vm != NULL, "Insufficient Memory");
+
+	vm_init_fields(vm, shape);
+
+	vm_open(vm);
+
+	vm_init_memory_properties(vm);
 
 	return vm;
 }
-- 
2.55.0.rc0.786.g65d90a0328-goog


^ permalink raw reply related
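
The shape of the refactor above (a constructor split into helpers so that a
second constructor can reuse them around an already-open file descriptor)
can be sketched in isolation as follows. Everything here is a hypothetical
stand-in: struct toy_vm, the toy_* functions and the fixed descriptor
number are invented for illustration and are not part of the KVM selftest
library.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct kvm_vm. */
struct toy_vm {
	int fd;        /* VM file descriptor */
	int mode;      /* stands in for the fields derived from vm_shape */
	int mem_ready; /* stands in for vpages_valid/vpages_mapped setup */
};

/* Step 1: fields that depend only on the requested shape. */
static void toy_vm_init_fields(struct toy_vm *vm, int mode)
{
	vm->mode = mode;
	vm->fd = -1;
}

/* Step 3: bookkeeping that needs the fields but not a fresh open. */
static void toy_vm_init_memory_properties(struct toy_vm *vm)
{
	vm->mem_ready = 1;
}

/* Step 2 stand-in for vm_open(); returns a made-up descriptor. */
static int toy_vm_open(void)
{
	return 3;
}

/* Original path: allocate, init fields, open, init memory properties. */
static struct toy_vm *toy_vm_create(int mode)
{
	struct toy_vm *vm = calloc(1, sizeof(*vm));

	assert(vm != NULL);
	toy_vm_init_fields(vm, mode);
	vm->fd = toy_vm_open();
	toy_vm_init_memory_properties(vm);
	return vm;
}

/* Restore path: same helpers, but adopt an existing descriptor. */
static struct toy_vm *toy_vm_create_from_fd(int existing_fd, int mode)
{
	struct toy_vm *vm = calloc(1, sizeof(*vm));

	assert(vm != NULL);
	toy_vm_init_fields(vm, mode);
	vm->fd = existing_fd;
	toy_vm_init_memory_properties(vm);
	return vm;
}
```

Keeping the open step outside both helpers is what lets the restore path
skip it; the original constructor still performs the three steps in the
same order as before the split.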

* [PATCH v3 6/9] kvm: guest_memfd_luo: add support for guest_memfd preservation
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
  To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
	Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
  Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
In-Reply-To: <20260622184851.2309827-1-tarunsahu@google.com>

This patch sets up the basic infrastructure to preserve guest_memfd.
Currently only guest_memfd instances that are fully shared and backed
by PAGE_SIZE pages are supported.

It uses the INIT_SHARED flag to check shareability and
kvm_arch_has_private_mem() to confirm that conversion of memory to
private is not supported.

Preservation is straightforward: it walks through the folios and
serializes them.

The preserve path calls kvm_gmem_freeze(), which freezes the guest_memfd
inode. This prevents changes to the inode mapping through fallocate
calls and fails any new fault allocation on or after preservation.

This change also updates the MAINTAINERS list.
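
The count-then-fill walk described above can be sketched in userspace as
follows. This is a simplified model under stated assumptions, not the
kernel code: struct toy_folio and the array standing in for the page cache
are hypothetical, calloc() replaces vcalloc(), and nothing is handed to
KHO. The entry layout mirrors the 52-bit pfn / 12-bit flags split of
struct guest_memfd_luo_folio_ser and relies on the GCC/Clang packed
attribute and 64-bit bit-fields.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FOLIO_UPTODATE (1U << 0)

/* Mirrors the layout of struct guest_memfd_luo_folio_ser. */
struct folio_ser {
	uint64_t pfn:52;
	uint64_t flags:12;
	uint64_t index;
} __attribute__((packed));

/* Hypothetical stand-in for one page cache slot. */
struct toy_folio {
	uint64_t pfn;
	int present;
	int uptodate;
};

/*
 * Walk all slots. With @out == NULL only count the present folios;
 * otherwise also record one entry per present folio.
 */
static uint64_t walk(const struct toy_folio *cache, uint64_t nr_slots,
		     struct folio_ser *out)
{
	uint64_t i, count = 0;

	for (i = 0; i < nr_slots; i++) {
		if (!cache[i].present)
			continue;
		if (out) {
			out[count].pfn = cache[i].pfn;
			out[count].index = i;
			out[count].flags = cache[i].uptodate ? FOLIO_UPTODATE : 0;
		}
		count++;
	}
	return count;
}

/* First pass sizes the array, second pass fills it. */
static struct folio_ser *serialize(const struct toy_folio *cache,
				   uint64_t nr_slots, uint64_t *nr_out)
{
	uint64_t count = walk(cache, nr_slots, NULL);
	struct folio_ser *ser = NULL;

	*nr_out = 0;
	if (count) {
		ser = calloc(count, sizeof(*ser));
		if (!ser)
			return NULL;
		/* Both passes must agree; the cache may not change between them. */
		if (walk(cache, nr_slots, ser) != count) {
			free(ser);
			return NULL;
		}
	}
	*nr_out = count;
	return ser;
}
```

The two passes only agree if the set of folios cannot change in between,
which is the role the freeze plays in the description above.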

Signed-off-by: Tarun Sahu <tarunsahu@google.com>
---
 MAINTAINERS                 |   1 +
 include/linux/kho/abi/kvm.h |  79 +++++-
 virt/kvm/Makefile.kvm       |   2 +-
 virt/kvm/guest_memfd_luo.c  | 497 ++++++++++++++++++++++++++++++++++++
 virt/kvm/kvm_main.c         |   7 +
 virt/kvm/kvm_mm.h           |   4 +
 6 files changed, 583 insertions(+), 7 deletions(-)
 create mode 100644 virt/kvm/guest_memfd_luo.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 7c000e6..d1d699ce 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14420,6 +14420,7 @@ L:	kexec@lists.infradead.org
 L:	kvm@vger.kernel.org
 S:	Maintained
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux.git
+F:	virt/kvm/guest_memfd_luo.c
 F:	virt/kvm/kvm_luo.c
 
 KVM PARAVIRT (KVM/paravirt)
diff --git a/include/linux/kho/abi/kvm.h b/include/linux/kho/abi/kvm.h
index 718db68..42074d7 100644
--- a/include/linux/kho/abi/kvm.h
+++ b/include/linux/kho/abi/kvm.h
@@ -9,20 +9,23 @@
 #define _LINUX_KHO_ABI_KVM_H
 
 #include <linux/types.h>
+#include <linux/bits.h>
 #include <linux/kho/abi/kexec_handover.h>
 
 /**
- * DOC: KVM Live Update ABI
+ * DOC: KVM and guest_memfd Live Update ABI
  *
- * KVM uses the ABI defined below for preserving its state
+ * KVM and guest_memfd use the ABI defined below for preserving their states
  * across a kexec reboot using the LUO.
  *
- * The state is serialized into a packed structure `struct kvm_luo_ser`
- * which is handed over to the next kernel via the KHO mechanism.
+ * The state is serialized into packed structures (struct kvm_luo_ser and
+ * struct guest_memfd_luo_ser) which are handed over to the next kernel via
+ * the KHO mechanism.
  *
- * This interface is a contract. Any modification to the structure layout
+ * This interface is a contract. Any modification to the structure layouts
  * constitutes a breaking change. Such changes require incrementing the
- * version number in the KVM_LUO_FH_COMPATIBLE compatibility string.
+ * version number in the KVM_LUO_FH_COMPATIBLE or
+ * GUEST_MEMFD_LUO_FH_COMPATIBLE compatibility strings.
  */
 
 /**
@@ -36,4 +39,68 @@ struct kvm_luo_ser {
 /* The compatibility string for KVM VM file handler */
 #define KVM_LUO_FH_COMPATIBLE	"kvm_vm_luo_v1"
 
+/**
+ * struct guest_memfd_luo_folio_ser - Serialization layout for a single folio in guest_memfd.
+ * @pfn:   Page Frame Number of the folio.
+ * @index: Page offset of the folio within the file.
+ * @flags: State flags associated with the folio.
+ */
+struct guest_memfd_luo_folio_ser {
+	u64 pfn:52;
+	u64 flags:12;
+	u64 index;
+} __packed;
+
+/**
+ * GUEST_MEMFD_LUO_FOLIO_UPTODATE - The folio is up-to-date.
+ *
+ * This flag is per folio to check if the folio is uptodate.
+ */
+#define GUEST_MEMFD_LUO_FOLIO_UPTODATE	BIT(0)
+
+
+/**
+ * GUEST_MEMFD_LUO_FLAG_MMAP - The guest_memfd supports mmap.
+ *
+ * This flag indicates that the guest_memfd supports host-side mmap.
+ */
+#define GUEST_MEMFD_LUO_FLAG_MMAP		BIT(0)
+
+/**
+ * GUEST_MEMFD_LUO_FLAG_INIT_SHARED - Initialize memory as shared.
+ *
+ * This flag indicates that the guest_memfd has been initialized as shared
+ * memory.
+ */
+#define GUEST_MEMFD_LUO_FLAG_INIT_SHARED	BIT(1)
+
+/**
+ * GUEST_MEMFD_LUO_SUPPORTED_FLAGS - Supported guest_memfd LUO flags mask.
+ *
+ * A mask of all guest_memfd preservation flags supported by this version
+ * of the KVM LUO ABI.
+ */
+#define GUEST_MEMFD_LUO_SUPPORTED_FLAGS	(GUEST_MEMFD_LUO_FLAG_MMAP | \
+						 GUEST_MEMFD_LUO_FLAG_INIT_SHARED)
+
+/**
+ * struct guest_memfd_luo_ser - Main serialization structure for guest_memfd.
+ * @size:      The size of the file in bytes.
+ * @flags:     File-level flags.
+ * @nr_folios: Number of folios in the folios array.
+ * @vm_token:  Token of the associated KVM VM instance.
+ * @folios:    KHO vmalloc descriptor pointing to the array of
+ *             struct guest_memfd_luo_folio_ser.
+ */
+struct guest_memfd_luo_ser {
+	u64 size;
+	u64 flags;
+	u64 nr_folios;
+	u64 vm_token;
+	struct kho_vmalloc folios;
+} __packed;
+
+/* The compatibility string for GUEST_MEMFD file handler */
+#define GUEST_MEMFD_LUO_FH_COMPATIBLE	"guest_memfd_luo_v1"
+
 #endif /* _LINUX_KHO_ABI_KVM_H */
diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
index c1a9621..d30fca0 100644
--- a/virt/kvm/Makefile.kvm
+++ b/virt/kvm/Makefile.kvm
@@ -13,4 +13,4 @@ kvm-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o
 kvm-$(CONFIG_HAVE_KVM_DIRTY_RING) += $(KVM)/dirty_ring.o
 kvm-$(CONFIG_HAVE_KVM_PFNCACHE) += $(KVM)/pfncache.o
 kvm-$(CONFIG_KVM_GUEST_MEMFD) += $(KVM)/guest_memfd.o
-kvm-$(CONFIG_LIVEUPDATE_GUEST_MEMFD) += $(KVM)/kvm_luo.o
+kvm-$(CONFIG_LIVEUPDATE_GUEST_MEMFD) += $(KVM)/guest_memfd_luo.o $(KVM)/kvm_luo.o
diff --git a/virt/kvm/guest_memfd_luo.c b/virt/kvm/guest_memfd_luo.c
new file mode 100644
index 0000000..c242b1d
--- /dev/null
+++ b/virt/kvm/guest_memfd_luo.c
@@ -0,0 +1,497 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (c) 2026, Google LLC.
+ * Tarun Sahu <tarunsahu@google.com>
+ *
+ * Guestmemfd Preservation for Live Update Orchestrator (LUO)
+ */
+
+/**
+ * DOC: Guestmemfd Preservation via LUO
+ *
+ * Overview
+ * ========
+ *
+ * Guest memory file descriptors (guest_memfd) can be preserved over a kexec
+ * reboot using the Live Update Orchestrator (LUO) file preservation. This
+ * allows userspace to preserve VM memory across kexec reboots.
+ *
+ * The preservation is not intended to be transparent. Only select properties
+ * of the guest_memfd are preserved, while others are reset to default.
+ *
+ * Preserved Properties
+ * ====================
+ *
+ * The following properties of guest_memfd are preserved across kexec:
+ *
+ * File Size
+ *   The size of the file is preserved.
+ *
+ * File Contents
+ *   All folios present in the page cache are preserved.
+ *
+ * File-level Flags
+ *   The file-level flags (such as MMAP support and INIT_SHARED default mapping)
+ *   are preserved.
+ *
+ * Non-Preserved Properties
+ * ========================
+ *
+ * NUMA Memory Policy
+ *   NUMA memory policies associated with the guest_memfd are not preserved.
+ */
+#include <linux/liveupdate.h>
+#include <linux/kvm_host.h>
+#include <linux/pagemap.h>
+#include <linux/file.h>
+#include <linux/err.h>
+#include <linux/anon_inodes.h>
+#include <linux/magic.h>
+#include <linux/kexec_handover.h>
+#include <linux/kho/abi/kexec_handover.h>
+#include <linux/kho/abi/kvm.h>
+#include "guest_memfd.h"
+#include "kvm_mm.h"
+
+
+static int kvm_gmem_luo_walk_folios(struct address_space *mapping,
+		pgoff_t end_index, struct guest_memfd_luo_folio_ser *folios_ser,
+		u64 *out_count)
+{
+	struct folio_batch fbatch;
+	pgoff_t index = 0;
+	u64 count = 0;
+	int err = 0;
+
+	folio_batch_init(&fbatch);
+	while (index < end_index) {
+		unsigned int nr, i;
+
+		nr = filemap_get_folios(mapping, &index, end_index - 1, &fbatch);
+		if (nr == 0)
+			break;
+
+		for (i = 0; i < nr; i++) {
+			struct folio *folio = fbatch.folios[i];
+
+			if (folios_ser) {
+				if (folio_test_hwpoison(folio)) {
+					err = -EHWPOISON;
+					folio_batch_release(&fbatch);
+					goto out;
+				}
+				err = kho_preserve_folio(folio);
+				if (err) {
+					folio_batch_release(&fbatch);
+					goto out;
+				}
+
+				folios_ser[count].pfn = folio_pfn(folio);
+				folios_ser[count].index = folio->index;
+				folios_ser[count].flags = folio_test_uptodate(folio) ?
+							  GUEST_MEMFD_LUO_FOLIO_UPTODATE : 0;
+			}
+			count++;
+		}
+		folio_batch_release(&fbatch);
+		cond_resched();
+	}
+
+out:
+	*out_count = count;
+	return err;
+}
+
+static bool kvm_gmem_luo_can_preserve(struct liveupdate_file_handler *handler, struct file *file)
+{
+	struct inode *inode = file_inode(file);
+	struct gmem_file *gmem_file;
+	struct kvm *kvm;
+
+	if (inode->i_sb->s_magic != GUEST_MEMFD_MAGIC)
+		return 0;
+
+	gmem_file = file->private_data;
+	if (!gmem_file)
+		return 0;
+
+	/*
+	 * Only Fully-shared guest_memfd preservation is supported
+	 */
+	if (!(GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_INIT_SHARED))
+		return 0;
+
+	/*
+	 * It makes sure that no memory can be converted to private
+	 * even if it was initially fully shared (in-place conversions are
+	 * prevented).
+	 */
+	kvm = gmem_file->kvm;
+	if (kvm_arch_has_private_mem(kvm))
+		return 0;
+
+	if (mapping_large_folio_support(inode->i_mapping))
+		return 0;
+
+	return 1;
+}
+
+static int kvm_gmem_luo_preserve(struct liveupdate_file_op_args *args)
+{
+	DECLARE_KHOSER_PTR(sd, struct guest_memfd_luo_ser *);
+	struct guest_memfd_luo_folio_ser *folios_ser = NULL;
+	u64 count = 0, gmem_flags, abi_flags = 0;
+	struct guest_memfd_luo_ser *ser;
+	struct address_space *mapping;
+	struct gmem_file *gmem_file;
+	struct inode *inode;
+	pgoff_t end_index;
+	struct kvm *kvm;
+	int err = 0;
+	long size;
+
+	inode = file_inode(args->file);
+	kvm_gmem_freeze(inode, true);
+
+	mapping = inode->i_mapping;
+	size = i_size_read(inode);
+	if (!size) {
+		err = -EINVAL;
+		goto err_unfreeze_inode;
+	}
+
+	if (WARN_ON_ONCE(!PAGE_ALIGNED(size))) {
+		err = -EINVAL;
+		goto err_unfreeze_inode;
+	}
+
+	gmem_file = args->file->private_data;
+	kvm = gmem_file->kvm;
+
+	gmem_flags = READ_ONCE(GMEM_I(inode)->flags);
+	if (gmem_flags & ~(GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED
+				| GUEST_MEMFD_F_MAPPING_FROZEN)) {
+		err = -EOPNOTSUPP;
+		goto err_unfreeze_inode;
+	}
+
+	if (gmem_flags & GUEST_MEMFD_FLAG_MMAP)
+		abi_flags |= GUEST_MEMFD_LUO_FLAG_MMAP;
+	if (gmem_flags & GUEST_MEMFD_FLAG_INIT_SHARED)
+		abi_flags |= GUEST_MEMFD_LUO_FLAG_INIT_SHARED;
+
+	end_index = size >> PAGE_SHIFT;
+
+	ser = kho_alloc_preserve(sizeof(*ser));
+	if (IS_ERR(ser)) {
+		err = PTR_ERR(ser);
+		goto err_unfreeze_inode;
+	}
+
+	/* First pass: Count the folios present in the page cache */
+	err = kvm_gmem_luo_walk_folios(mapping, end_index, NULL, &count);
+	if (err)
+		goto err_free_ser;
+
+	ser->size = size;
+	ser->flags = abi_flags;
+	ser->nr_folios = count;
+	ser->vm_token = 0; /* Set later, in kvm_gmem_luo_freeze() */
+
+	if (count > 0) {
+		folios_ser = vcalloc(count, sizeof(*folios_ser));
+		if (!folios_ser) {
+			err = -ENOMEM;
+			goto err_free_ser;
+		}
+
+		/* Second pass: Fill the metadata array and preserve folios */
+		err = kvm_gmem_luo_walk_folios(mapping, end_index, folios_ser, &count);
+		if (err)
+			goto err_unpreserve_unlocked;
+
+		if (WARN_ON_ONCE(count != ser->nr_folios)) {
+			err = -EINVAL;
+			goto err_unpreserve_unlocked;
+		}
+	}
+
+	if (count > 0) {
+		err = kho_preserve_vmalloc(folios_ser, &ser->folios);
+		if (err)
+			goto err_unpreserve_unlocked;
+	}
+
+	KHOSER_STORE_PTR(sd, ser);
+	KHOSER_COPY_TYPEUNSAFE(args->serialized_data, sd);
+	args->private_data = folios_ser;
+
+	return 0;
+
+err_unpreserve_unlocked:
+	for (long i = (long)count - 1; i >= 0; i--) {
+		struct folio *folio = pfn_folio(folios_ser[i].pfn);
+
+		kho_unpreserve_folio(folio);
+	}
+	vfree(folios_ser);
+err_free_ser:
+	kho_unpreserve_free(ser);
+err_unfreeze_inode:
+	kvm_gmem_freeze(inode, false);
+	return err;
+}
+
+static int kvm_gmem_luo_freeze(struct liveupdate_file_op_args *args)
+{
+	struct guest_memfd_luo_ser *ser;
+	struct gmem_file *gmem_file;
+	struct kvm *kvm;
+	struct file *kvm_file;
+	u64 vm_token;
+	int err;
+
+	ser = KHOSER_LOAD_PTR(args->serialized_data);
+	if (WARN_ON_ONCE(!ser))
+		return -EINVAL;
+
+	gmem_file = args->file->private_data;
+	kvm = gmem_file->kvm;
+
+	/*
+	 * Obtain a strong reference to kvm->vm_file to prevent the SLAB_TYPESAFE_BY_RCU
+	 * file memory from being reallocated while it is being processed.
+	 */
+	kvm_file = get_file_active(&kvm->vm_file);
+	if (!kvm_file)
+		return -ENOENT;
+
+	err = liveupdate_get_token_outgoing(args->session, kvm_file, &vm_token);
+	fput(kvm_file);
+	if (err)
+		return err;
+
+	ser->vm_token = vm_token;
+	return 0;
+}
+
+static void kvm_gmem_luo_discard_folios(
+	const struct guest_memfd_luo_folio_ser *folios_ser,
+	u64 nr_folios, u64 start_idx)
+{
+	long i;
+
+	for (i = start_idx; i < nr_folios; i++) {
+		struct folio *folio;
+		phys_addr_t phys;
+
+		if (!folios_ser[i].pfn)
+			continue;
+
+		phys = PFN_PHYS(folios_ser[i].pfn);
+		folio = kho_restore_folio(phys);
+		if (folio)
+			folio_put(folio);
+	}
+}
+
+static void kvm_gmem_luo_unpreserve(struct liveupdate_file_op_args *args)
+{
+	struct guest_memfd_luo_folio_ser *folios_ser = args->private_data;
+	struct guest_memfd_luo_ser *ser;
+	long i;
+
+	ser = KHOSER_LOAD_PTR(args->serialized_data);
+	if (WARN_ON_ONCE(!ser))
+		return;
+
+	if (ser->nr_folios > 0)
+		kho_unpreserve_vmalloc(&ser->folios);
+	for (i = ser->nr_folios - 1; i >= 0; i--) {
+		struct folio *folio;
+
+		if (!folios_ser[i].pfn)
+			continue;
+
+		folio = pfn_folio(folios_ser[i].pfn);
+		kho_unpreserve_folio(folio);
+	}
+	vfree(folios_ser);
+
+	kho_unpreserve_free(ser);
+	kvm_gmem_freeze(file_inode(args->file), false);
+}
+
+static int kvm_gmem_luo_retrieve(struct liveupdate_file_op_args *args)
+{
+	struct guest_memfd_luo_folio_ser *folios_ser = NULL;
+	struct guest_memfd_luo_ser *ser;
+	struct kvm *kvm = NULL;
+	struct file *vm_file;
+	struct inode *inode;
+	struct file *file;
+	u64 gmem_flags = 0;
+	int err = 0;
+	long i = 0;
+
+	ser = KHOSER_LOAD_PTR(args->serialized_data);
+	if (!ser)
+		return -EINVAL;
+
+	if (ser->flags & ~GUEST_MEMFD_LUO_SUPPORTED_FLAGS) {
+		err = -EOPNOTSUPP;
+		goto err_free_ser;
+	}
+
+	if (ser->flags & GUEST_MEMFD_LUO_FLAG_MMAP)
+		gmem_flags |= GUEST_MEMFD_FLAG_MMAP;
+	if (ser->flags & GUEST_MEMFD_LUO_FLAG_INIT_SHARED)
+		gmem_flags |= GUEST_MEMFD_FLAG_INIT_SHARED;
+
+	err = liveupdate_get_file_incoming(args->session, ser->vm_token, &vm_file);
+	if (err) {
+		pr_warn("gmem: provided VM FD token (%llx) on preserve is incorrect\n",
+						ser->vm_token);
+		goto err_free_ser;
+	}
+
+	if (file_is_kvm(vm_file))
+		kvm = vm_file->private_data;
+
+	/*
+	 * Release the temporary reference taken by the liveupdate_get_file_incoming
+	 * call. LUO still holds a reference.
+	 */
+	fput(vm_file);
+
+	if (!kvm) {
+		err = -EINVAL;
+		goto err_free_ser;
+	}
+
+	file = __kvm_gmem_create_file(kvm, ser->size, gmem_flags);
+	if (IS_ERR(file)) {
+		err = PTR_ERR(file);
+		goto err_free_ser;
+	}
+
+	inode = file_inode(file);
+
+	if (ser->nr_folios) {
+		folios_ser = kho_restore_vmalloc(&ser->folios);
+		if (!folios_ser) {
+			err = -EINVAL;
+			goto err_destroy_file;
+		}
+
+		for (i = 0; i < ser->nr_folios; i++) {
+			struct folio *folio;
+			phys_addr_t phys;
+
+			if (!folios_ser[i].pfn)
+				continue;
+
+			phys = PFN_PHYS(folios_ser[i].pfn);
+			folio = kho_restore_folio(phys);
+			if (!folio) {
+				pr_err("gmem: failed to restore folio at %pa\n", &phys);
+				err = -EIO;
+				goto err_put_remaining_folios;
+			}
+
+			err = filemap_add_folio(inode->i_mapping, folio, folios_ser[i].index,
+						GFP_KERNEL);
+			if (err) {
+				pr_err("gmem: failed to add folio to page cache\n");
+				folio_put(folio);
+				goto err_put_remaining_folios;
+			}
+
+			if (folios_ser[i].flags & GUEST_MEMFD_LUO_FOLIO_UPTODATE)
+				folio_mark_uptodate(folio);
+			folio_unlock(folio);
+			folio_put(folio);
+		}
+		vfree(folios_ser);
+	}
+
+	args->file = file;
+	kho_restore_free(ser);
+	return 0;
+
+err_put_remaining_folios:
+	i++;
+err_destroy_file:
+	fput(file);
+err_free_ser:
+	if (ser->nr_folios) {
+		if (!folios_ser)
+			folios_ser = kho_restore_vmalloc(&ser->folios);
+		if (folios_ser) {
+			kvm_gmem_luo_discard_folios(folios_ser, ser->nr_folios, i);
+			vfree(folios_ser);
+		}
+	}
+	kho_restore_free(ser);
+	return err;
+}
+
+static void kvm_gmem_luo_finish(struct liveupdate_file_op_args *args)
+{
+	struct guest_memfd_luo_ser *ser;
+	struct guest_memfd_luo_folio_ser *folios_ser;
+
+	/* Nothing to do if retrieve was attempted: whether it succeeded or
+	 * failed, kvm_gmem_luo_retrieve() has already cleaned up.
+	 */
+	if (args->retrieve_status)
+		return;
+
+	ser = KHOSER_LOAD_PTR(args->serialized_data);
+	if (!ser)
+		return;
+
+	if (ser->nr_folios) {
+		folios_ser = kho_restore_vmalloc(&ser->folios);
+		if (folios_ser) {
+			kvm_gmem_luo_discard_folios(folios_ser, ser->nr_folios, 0);
+			vfree(folios_ser);
+		}
+	}
+
+	kho_restore_free(ser);
+}
+
+static const struct liveupdate_file_ops kvm_gmem_luo_file_ops = {
+	.can_preserve = kvm_gmem_luo_can_preserve,
+	.preserve = kvm_gmem_luo_preserve,
+	.freeze = kvm_gmem_luo_freeze,
+	.retrieve = kvm_gmem_luo_retrieve,
+	.unpreserve = kvm_gmem_luo_unpreserve,
+	.finish = kvm_gmem_luo_finish,
+	.owner = THIS_MODULE,
+};
+
+static struct liveupdate_file_handler kvm_gmem_luo_handler = {
+	.ops = &kvm_gmem_luo_file_ops,
+	.compatible = GUEST_MEMFD_LUO_FH_COMPATIBLE,
+};
+
+int kvm_gmem_luo_init(void)
+{
+	int err = liveupdate_register_file_handler(&kvm_gmem_luo_handler);
+
+	if (err && err != -EOPNOTSUPP) {
+		pr_err("Could not register guest_memfd LUO file handler: %pe\n", ERR_PTR(err));
+		return err;
+	}
+
+	return 0;
+}
+
+void kvm_gmem_luo_exit(void)
+{
+	liveupdate_unregister_file_handler(&kvm_gmem_luo_handler);
+}
+
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d9c3dd1..e8e2f10 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -6581,6 +6581,10 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
 	if (r)
 		goto err_luo;
 
+	r = kvm_gmem_luo_init();
+	if (r)
+		goto err_gmem_luo;
+
 	/*
 	 * Registration _must_ be the very last thing done, as this exposes
 	 * /dev/kvm to userspace, i.e. all infrastructure must be setup!
@@ -6594,6 +6598,8 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
 	return 0;
 
 err_register:
+	kvm_gmem_luo_exit();
+err_gmem_luo:
 	kvm_luo_exit();
 err_luo:
 	kvm_uninit_virtualization();
@@ -6625,6 +6631,7 @@ void kvm_exit(void)
 	 */
 	misc_deregister(&kvm_dev);
 
+	kvm_gmem_luo_exit();
 	kvm_luo_exit();
 
 	kvm_uninit_virtualization();
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index 8719871..1295ff8 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -103,9 +103,13 @@ static inline void kvm_gmem_unbind(struct kvm_memory_slot *slot)
 #ifdef CONFIG_LIVEUPDATE_GUEST_MEMFD
 int kvm_luo_init(void);
 void kvm_luo_exit(void);
+int kvm_gmem_luo_init(void);
+void kvm_gmem_luo_exit(void);
 #else
 static inline int kvm_luo_init(void) { return 0; }
 static inline void kvm_luo_exit(void) {}
+static inline int kvm_gmem_luo_init(void) { return 0; }
+static inline void kvm_gmem_luo_exit(void) {}
 #endif /* CONFIG_LIVEUPDATE_GUEST_MEMFD */
 
 #endif /* __KVM_MM_H__ */
-- 
2.55.0.rc0.786.g65d90a0328-goog



* [PATCH v3 7/9] docs: add documentation for guest_memfd preservation via LUO
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
  To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
	Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
  Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
In-Reply-To: <20260622184851.2309827-1-tarunsahu@google.com>

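Add the documentation under the "Preserving file descriptors" section
of LUO's documentation.

Also fix the inverted GUEST_MEMFD_FLAG_INIT_SHARED check in
kvm_gmem_luo_can_preserve() and a typo in the adjacent comment.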

Signed-off-by: Tarun Sahu <tarunsahu@google.com>
---
 Documentation/core-api/liveupdate.rst |   1 +
 Documentation/liveupdate/vmm.rst      | 107 ++++++++++++++++++++++++++
 MAINTAINERS                           |   1 +
 virt/kvm/guest_memfd_luo.c            |   4 +-
 4 files changed, 111 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/liveupdate/vmm.rst

diff --git a/Documentation/core-api/liveupdate.rst b/Documentation/core-api/liveupdate.rst
index 5a292d0..bac58a3 100644
--- a/Documentation/core-api/liveupdate.rst
+++ b/Documentation/core-api/liveupdate.rst
@@ -34,6 +34,7 @@ The following types of file descriptors can be preserved
    :maxdepth: 1
 
    ../mm/memfd_preservation
+   ../liveupdate/vmm
 
 Public API
 ==========
diff --git a/Documentation/liveupdate/vmm.rst b/Documentation/liveupdate/vmm.rst
new file mode 100644
index 0000000..8353e23
--- /dev/null
+++ b/Documentation/liveupdate/vmm.rst
@@ -0,0 +1,107 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+=============================
+VM & Guest_Memfd Preservation
+=============================
+
+.. kernel-doc:: virt/kvm/kvm_luo.c
+   :doc: KVM VM Preservation via LUO
+
+.. kernel-doc:: virt/kvm/guest_memfd_luo.c
+   :doc: Guest_Memfd Preservation via LUO
+
+VMM Instructions
+================
+
+This section describes the requirements, scope, conditions, and
+ordering constraints that a Virtual Machine Monitor (VMM) must adhere
+to for successful preservation and retrieval of guest_memfd files
+across a Live Update Orchestrator (LUO) sequence.
+
+Scope and Limitations
+---------------------
+
+At this stage, the scope of guest_memfd preservation is restricted to:
+
+1. **Fully Shared guest_memfd**:
+   Currently, only fully shared guest_memfd is supported. A VM that
+   supports private guest_memfd memory (e.g. a CoCo VM) cannot be
+   preserved.
+
+2. **Standard Page Size**:
+   Only guest_memfd backed by standard page size (``PAGE_SIZE``,
+   order-0) pages is supported. Large/huge page backing (e.g.,
+   hugetlb guest_memfd) is not supported.
+
+Any Virtual Machine (VM) whose memory is fully backed by such
+guest_memfd files can be preserved across a live update.
+
+VMM Actions and Conditions during Live Update
+---------------------------------------------
+
+During the live update sequence, the kernel introduces a *freezing*
+phase for the guest_memfd inode. Freezing prevents any modifications to
+the guest_memfd page cache. Specifically, once a guest_memfd mapping is
+frozen:
+
+- Any subsequent ``fallocate`` calls on the guest_memfd file descriptor
+  will fail and return ``-EPERM``.
+- Any new page faults (guest-side or host-userspace-side) that require
+  folio allocation will fail and return ``-EPERM``.
+
+To prevent vCPUs or VMM helper threads from failing due to these
+``-EPERM`` errors, the VMM must implement one of the following
+strategies:
+
+1. **Pause the VM (Recommended)**:
+   The VMM should pause/suspend all vCPUs before invoking the
+   preservation or freezing of the VM and guest_memfd files. This
+   ensures no new page faults or memory accesses can occur while the
+   guest_memfd is frozen.
+
+2. **Handle Fault Failures**:
+   If the VM is not paused, the VMM must be prepared to handle VM
+   exits or user page fault errors resulting from the ``-EPERM``
+   failures. The VMM must take appropriate action, such as
+   immediately pausing the VM, or aborting the live update sequence
+   (by tearing down or unpreserving the live update session).
+
+Preservation and Retrieval Ordering
+-----------------------------------
+
+Preservation Order
+~~~~~~~~~~~~~~~~~~
+
+There is no strict ordering requirement for initiating the
+preservation of the KVM VM file and the guest_memfd files; they are
+preserved independently. However, if kexec is triggered with a
+guest_memfd preserved but its VM file not preserved, kexec will fail.
+
+Retrieval Order
+~~~~~~~~~~~~~~~
+
+Similarly, there is no strict ordering requirement for retrieving the
+VM and guest_memfd files. The files can be retrieved in any order.
+
+If a guest_memfd file is retrieved but its VM file is not, and
+luo_finish is then called, the VM file is lost and the guest_memfd
+file is left orphaned.
+
+NOTE: Before initiating preservation or retrieval, make sure that the
+kvm module is loaded (``/dev/kvm`` must be available).
+
+
+VM & Guest_Memfd Preservation ABI
+=================================
+
+.. kernel-doc:: include/linux/kho/abi/kvm.h
+   :doc: guest_memfd Live Update ABI
+
+.. kernel-doc:: include/linux/kho/abi/kvm.h
+   :internal:
+
+See Also
+========
+
+- :doc:`/core-api/liveupdate`
+- :doc:`/userspace-api/liveupdate`
diff --git a/MAINTAINERS b/MAINTAINERS
index d1d699ce..e27b677 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14420,6 +14420,7 @@ L:	kexec@lists.infradead.org
 L:	kvm@vger.kernel.org
 S:	Maintained
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux.git
+F:	Documentation/liveupdate/vmm.rst
 F:	virt/kvm/guest_memfd_luo.c
 F:	virt/kvm/kvm_luo.c
 
diff --git a/virt/kvm/guest_memfd_luo.c b/virt/kvm/guest_memfd_luo.c
index c242b1d..8411fe8 100644
--- a/virt/kvm/guest_memfd_luo.c
+++ b/virt/kvm/guest_memfd_luo.c
@@ -119,11 +119,11 @@ static bool kvm_gmem_luo_can_preserve(struct liveupdate_file_handler *handler, s
 	/*
 	 * Only Fully-shared guest_memfd preservation is supported
 	 */
-	if (GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_INIT_SHARED)
+	if (!(GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_INIT_SHARED))
 		return 0;
 
 	/*
-	 * It makes sure that no memory can converted to private
+	 * It makes sure that no memory can be converted to private
 	 * even if it was initially fully shared (in-place conversions are
 	 * prevented).
 	 */
-- 
2.55.0.rc0.786.g65d90a0328-goog



* [PATCH v3 5/9] kvm: guest_memfd: Add support for freezing and unfreezing mappings
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
  To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
	Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
  Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
In-Reply-To: <20260622184851.2309827-1-tarunsahu@google.com>

Introduce a freeze state on gmem_inode that rejects fallocate calls and
any new page fault allocation. This prevents the gmem file from being
modified while it is being preserved.

An SRCU lock synchronises the freeze call: the writer blocks until all
in-flight readers have finished, and readers are re-entrant.

If a fault fails because the mapping is frozen, it returns -EPERM and
exits to userspace (VM exit). Userspace must handle this properly, as
every new fault that needs a folio allocation will fail.
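
A VMM that does not pause the VM before the freeze could classify the
resulting errors along the following lines. This is only a minimal
userspace sketch and not part of this series: the enum and
gmem_classify_error() are hypothetical names, and it assumes that EPERM
seen on a guest_memfd operation during a live update means the mapping
is frozen.

```c
#include <errno.h>

/* Hypothetical VMM-side policy; none of these names exist in the kernel. */
enum gmem_freeze_action {
	GMEM_ACTION_NONE,	/* operation succeeded, nothing to do */
	GMEM_ACTION_PAUSE_VM,	/* mapping frozen: pause vCPUs or abort the update */
	GMEM_ACTION_FAIL,	/* unrelated error: propagate to the caller */
};

/*
 * Map a positive errno value from a guest_memfd fallocate() or from a
 * failed fault to the action the VMM should take.
 */
static enum gmem_freeze_action gmem_classify_error(int err)
{
	if (err == 0)
		return GMEM_ACTION_NONE;
	if (err == EPERM)
		return GMEM_ACTION_PAUSE_VM;
	return GMEM_ACTION_FAIL;
}
```

Pausing all vCPUs before preservation avoids this path entirely; the
classification only matters for a VMM that keeps the VM running.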

Signed-off-by: Tarun Sahu <tarunsahu@google.com>
---
 virt/kvm/guest_memfd.c | 117 +++++++++++++++++++++++++++++++++++++----
 virt/kvm/guest_memfd.h |   5 ++
 2 files changed, 111 insertions(+), 11 deletions(-)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index fe1adc9b..a4d9d34 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -7,11 +7,13 @@
 #include <linux/mempolicy.h>
 #include <linux/pseudo_fs.h>
 #include <linux/pagemap.h>
+#include <linux/srcu.h>
 #include "guest_memfd.h"
 
 #include "kvm_mm.h"
 
 static struct vfsmount *kvm_gmem_mnt;
+static struct srcu_struct kvm_gmem_freeze_srcu;
 
 
 #define kvm_gmem_for_each_file(f, inode) \
@@ -96,6 +98,7 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
 	/* TODO: Support huge pages. */
 	struct mempolicy *policy;
 	struct folio *folio;
+	int idx;
 
 	/*
 	 * Fast-path: See if folio is already present in mapping to avoid
@@ -105,12 +108,20 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
 	if (!IS_ERR(folio))
 		return folio;
 
+	idx = srcu_read_lock(&kvm_gmem_freeze_srcu);
+	if (kvm_gmem_is_frozen(inode)) {
+		srcu_read_unlock(&kvm_gmem_freeze_srcu, idx);
+		return ERR_PTR(-EPERM);
+	}
+
 	policy = mpol_shared_policy_lookup(&GMEM_I(inode)->policy, index);
 	folio = __filemap_get_folio_mpol(inode->i_mapping, index,
 					 FGP_LOCK | FGP_CREAT,
 					 mapping_gfp_mask(inode->i_mapping), policy);
 	mpol_cond_put(policy);
 
+	srcu_read_unlock(&kvm_gmem_freeze_srcu, idx);
+
 	/*
 	 * External interfaces like kvm_gmem_get_pfn() support dealing
 	 * with hugepages to a degree, but internally, guest_memfd currently
@@ -273,16 +284,30 @@ static long kvm_gmem_allocate(struct inode *inode, loff_t offset, loff_t len)
 static long kvm_gmem_fallocate(struct file *file, int mode, loff_t offset,
 			       loff_t len)
 {
+	struct inode *inode = file_inode(file);
 	int ret;
+	int idx;
 
-	if (!(mode & FALLOC_FL_KEEP_SIZE))
-		return -EOPNOTSUPP;
+	idx = srcu_read_lock(&kvm_gmem_freeze_srcu);
+	if (kvm_gmem_is_frozen(inode)) {
+		srcu_read_unlock(&kvm_gmem_freeze_srcu, idx);
+		return -EPERM;
+	}
 
-	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
-		return -EOPNOTSUPP;
+	if (!(mode & FALLOC_FL_KEEP_SIZE)) {
+		ret = -EOPNOTSUPP;
+		goto out;
+	}
 
-	if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len))
-		return -EINVAL;
+	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) {
+		ret = -EOPNOTSUPP;
+		goto out;
+	}
+
+	if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len)) {
+		ret = -EINVAL;
+		goto out;
+	}
 
 	if (mode & FALLOC_FL_PUNCH_HOLE)
 		ret = kvm_gmem_punch_hole(file_inode(file), offset, len);
@@ -291,6 +316,9 @@ static long kvm_gmem_fallocate(struct file *file, int mode, loff_t offset,
 
 	if (!ret)
 		file_modified(file);
+
+out:
+	srcu_read_unlock(&kvm_gmem_freeze_srcu, idx);
 	return ret;
 }
 
@@ -948,7 +976,9 @@ static void kvm_gmem_destroy_inode(struct inode *inode)
 
 static void kvm_gmem_free_inode(struct inode *inode)
 {
-	kmem_cache_free(kvm_gmem_inode_cachep, GMEM_I(inode));
+	struct gmem_inode *gi = GMEM_I(inode);
+
+	kmem_cache_free(kvm_gmem_inode_cachep, gi);
 }
 
 static const struct super_operations kvm_gmem_super_operations = {
@@ -1005,12 +1035,21 @@ int kvm_gmem_init(struct module *module)
 	if (!kvm_gmem_inode_cachep)
 		return -ENOMEM;
 
+	ret = init_srcu_struct(&kvm_gmem_freeze_srcu);
+	if (ret)
+		goto err_cache;
+
 	ret = kvm_gmem_init_mount();
-	if (ret) {
-		kmem_cache_destroy(kvm_gmem_inode_cachep);
-		return ret;
-	}
+	if (ret)
+		goto err_srcu;
+
 	return 0;
+
+err_srcu:
+	cleanup_srcu_struct(&kvm_gmem_freeze_srcu);
+err_cache:
+	kmem_cache_destroy(kvm_gmem_inode_cachep);
+	return ret;
 }
 
 void kvm_gmem_exit(void)
@@ -1018,5 +1057,61 @@ void kvm_gmem_exit(void)
 	kern_unmount(kvm_gmem_mnt);
 	kvm_gmem_mnt = NULL;
 	rcu_barrier();
+	cleanup_srcu_struct(&kvm_gmem_freeze_srcu);
 	kmem_cache_destroy(kvm_gmem_inode_cachep);
 }
+
+/**
+ * kvm_gmem_freeze - Freeze or unfreeze a guest_memfd inode mapping.
+ * @inode: The guest_memfd inode.
+ * @freeze: True to freeze, false to unfreeze.
+ *
+ * This API is used strictly during the live update / preservation transition
+ * window to prevent host userspace and guest-side faults from making any
+ * mapping modifications (such as fallocate or page fault allocation)
+ * to the guest_memfd page cache.
+ *
+ * Synchronization Strategy (Sleepable RCU):
+ * To avoid high-contention VFS locks (like inode_lock or
+ * filemap_invalidate_lock) on the vCPU page fault hot paths, this subsystem
+ * implements a lightweight, system-wide Sleepable RCU (SRCU) mechanism
+ * (`kvm_gmem_freeze_srcu`).
+ *
+ * Global vs. Per-Inode SRCU
+ * =========================
+ * A single system-wide global static `srcu_struct` is used instead of a
+ * per-inode SRCU structure to completely prevent unprivileged users from
+ * exhausting the host's per-CPU memory allocator. Because
+ * `init_srcu_struct()` allocates per-CPU memory via `alloc_percpu()`, which
+ * is not accounted by memory cgroups (memcg),
+ * a per-inode SRCU structure would allow a tenant to bypass cgroup limits and
+ * trigger a system-wide Out-of-Memory (OOM) crash simply by spawning a large
+ * number of guest_memfd file descriptors (bounded only by RLIMIT_NOFILE).
+ *
+ * Flag Modification Note:
+ * Since `GUEST_MEMFD_F_MAPPING_FROZEN` is the ONLY flag in
+ * `GMEM_I(inode)->flags` that is mutated dynamically at runtime (all other
+ * flags are creation-time flags which remain strictly read-only), there is
+ * no possibility of concurrent bit-modification races. Therefore, a standard
+ * `WRITE_ONCE` is fully safe and does not require complex `cmpxchg`
+ * synchronization loops.
+ */
+void kvm_gmem_freeze(struct inode *inode, bool freeze)
+{
+	u64 flags = READ_ONCE(GMEM_I(inode)->flags);
+
+	if (freeze)
+		flags |= GUEST_MEMFD_F_MAPPING_FROZEN;
+	else
+		flags &= ~GUEST_MEMFD_F_MAPPING_FROZEN;
+
+	WRITE_ONCE(GMEM_I(inode)->flags, flags);
+
+	if (freeze)
+		synchronize_srcu(&kvm_gmem_freeze_srcu);
+}
+
+bool kvm_gmem_is_frozen(struct inode *inode)
+{
+	return READ_ONCE(GMEM_I(inode)->flags) & GUEST_MEMFD_F_MAPPING_FROZEN;
+}
diff --git a/virt/kvm/guest_memfd.h b/virt/kvm/guest_memfd.h
index c528b04..028c348 100644
--- a/virt/kvm/guest_memfd.h
+++ b/virt/kvm/guest_memfd.h
@@ -29,11 +29,16 @@ struct gmem_inode {
 	u64 flags;
 };
 
+/* Internal kernel-only flags (must not overlap with UAPI flags) */
+#define GUEST_MEMFD_F_MAPPING_FROZEN	(1ULL << 63)
+
 static inline struct gmem_inode *GMEM_I(struct inode *inode)
 {
 	return container_of(inode, struct gmem_inode, vfs_inode);
 }
 
 struct file *__kvm_gmem_create_file(struct kvm *kvm, loff_t size, u64 flags);
+void kvm_gmem_freeze(struct inode *inode, bool freeze);
+bool kvm_gmem_is_frozen(struct inode *inode);
 
 #endif /* __KVM_GUEST_MEMFD_H__ */
-- 
2.55.0.rc0.786.g65d90a0328-goog



* [PATCH v3 4/9] kvm: guest_memfd: Move internal definitions and helper to new header
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
  To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
	Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
  Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
In-Reply-To: <20260622184851.2309827-1-tarunsahu@google.com>

To support guest_memfd memory preservation with LUO, the guest_memfd
LUO code needs to access guest_memfd internals and reconstruct
guest_memfd file instances from preserved state.

Extract gmem_file, gmem_inode, and the GMEM_I() helper from guest_memfd.c
into a new internal header virt/kvm/guest_memfd.h.

Additionally, split __kvm_gmem_create() to expose a non-static
__kvm_gmem_create_file() helper. This helper returns a struct file
instead of a file descriptor, enabling file creation and initialization
without installing it into a file descriptor table.

Signed-off-by: Tarun Sahu <tarunsahu@google.com>
---
 virt/kvm/guest_memfd.c | 68 +++++++++++++++++-------------------------
 virt/kvm/guest_memfd.h | 39 ++++++++++++++++++++++++
 2 files changed, 67 insertions(+), 40 deletions(-)
 create mode 100644 virt/kvm/guest_memfd.h

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 8669068..fe1adc9b 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -7,38 +7,12 @@
 #include <linux/mempolicy.h>
 #include <linux/pseudo_fs.h>
 #include <linux/pagemap.h>
+#include "guest_memfd.h"
 
 #include "kvm_mm.h"
 
 static struct vfsmount *kvm_gmem_mnt;
 
-/*
- * A guest_memfd instance can be associated multiple VMs, each with its own
- * "view" of the underlying physical memory.
- *
- * The gmem's inode is effectively the raw underlying physical storage, and is
- * used to track properties of the physical memory, while each gmem file is
- * effectively a single VM's view of that storage, and is used to track assets
- * specific to its associated VM, e.g. memslots=>gmem bindings.
- */
-struct gmem_file {
-	struct kvm *kvm;
-	struct xarray bindings;
-	struct list_head entry;
-};
-
-struct gmem_inode {
-	struct shared_policy policy;
-	struct inode vfs_inode;
-	struct list_head gmem_file_list;
-
-	u64 flags;
-};
-
-static __always_inline struct gmem_inode *GMEM_I(struct inode *inode)
-{
-	return container_of(inode, struct gmem_inode, vfs_inode);
-}
 
 #define kvm_gmem_for_each_file(f, inode) \
 	list_for_each_entry(f, &GMEM_I(inode)->gmem_file_list, entry)
@@ -557,23 +531,17 @@ bool __weak kvm_arch_supports_gmem_init_shared(struct kvm *kvm)
 	return true;
 }
 
-static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
+struct file *__kvm_gmem_create_file(struct kvm *kvm, loff_t size, u64 flags)
 {
 	static const char *name = "[kvm-gmem]";
 	struct gmem_file *f;
 	struct inode *inode;
 	struct file *file;
-	int fd, err;
-
-	fd = get_unused_fd_flags(0);
-	if (fd < 0)
-		return fd;
+	int err;
 
 	f = kzalloc_obj(*f);
-	if (!f) {
-		err = -ENOMEM;
-		goto err_fd;
-	}
+	if (!f)
+		return ERR_PTR(-ENOMEM);
 
 	/* __fput() will take care of fops_put(). */
 	if (!fops_get(&kvm_gmem_fops)) {
@@ -612,8 +580,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 	xa_init(&f->bindings);
 	list_add(&f->entry, &GMEM_I(inode)->gmem_file_list);
 
-	fd_install(fd, file);
-	return fd;
+	return file;
 
 err_inode:
 	iput(inode);
@@ -621,7 +588,28 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 	fops_put(&kvm_gmem_fops);
 err_gmem:
 	kfree(f);
-err_fd:
+	return ERR_PTR(err);
+}
+
+static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
+{
+	struct file *file;
+	int fd, err;
+
+	fd = get_unused_fd_flags(0);
+	if (fd < 0)
+		return fd;
+
+	file = __kvm_gmem_create_file(kvm, size, flags);
+	if (IS_ERR(file)) {
+		err = PTR_ERR(file);
+		goto err_put_fd;
+	}
+
+	fd_install(fd, file);
+	return fd;
+
+err_put_fd:
 	put_unused_fd(fd);
 	return err;
 }
diff --git a/virt/kvm/guest_memfd.h b/virt/kvm/guest_memfd.h
new file mode 100644
index 0000000..c528b04
--- /dev/null
+++ b/virt/kvm/guest_memfd.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __KVM_GUEST_MEMFD_H__
+#define __KVM_GUEST_MEMFD_H__ 1
+
+#include <linux/kvm_host.h>
+#include <linux/fs.h>
+#include <linux/mempolicy.h>
+
+/*
+ * A guest_memfd instance can be associated with multiple VMs, each with its
+ * own "view" of the underlying physical memory.
+ *
+ * The gmem's inode is effectively the raw underlying physical storage, and is
+ * used to track properties of the physical memory, while each gmem file is
+ * effectively a single VM's view of that storage, and is used to track assets
+ * specific to its associated VM, e.g. memslots=>gmem bindings.
+ */
+struct gmem_file {
+	struct kvm *kvm;
+	struct xarray bindings;
+	struct list_head entry;
+};
+
+struct gmem_inode {
+	struct shared_policy policy;
+	struct inode vfs_inode;
+	struct list_head gmem_file_list;
+
+	u64 flags;
+};
+
+static inline struct gmem_inode *GMEM_I(struct inode *inode)
+{
+	return container_of(inode, struct gmem_inode, vfs_inode);
+}
+
+struct file *__kvm_gmem_create_file(struct kvm *kvm, loff_t size, u64 flags);
+
+#endif /* __KVM_GUEST_MEMFD_H__ */
-- 
2.55.0.rc0.786.g65d90a0328-goog



* [PATCH v3 3/9] kvm: kvm_luo: Allow kvm preservation with LUO
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
  To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
	Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
  Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
In-Reply-To: <20260622184851.2309827-1-tarunsahu@google.com>

Introduce KVM VM preservation support for Live Update Orchestrator.

Register an LUO file handler for KVM files to serialize and
deserialize necessary VM state across live updates. Currently, this
preserves the VM type. This implementation provides the necessary
infrastructure and dependencies for the upcoming guest_memfd
preservation support. It can be extended to preserve more VM state
in the future.

Retrieve simply creates the VM and populates it with the retrieved
data. The only catch is that there is no way to know which fd will be
assigned to this KVM file, so an atomically incremented ID is used for
the fdname.
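
The naming scheme can be illustrated with a small userspace analogue.
This is only a sketch under assumptions: it uses C11 atomics rather
than the kernel's atomic API, and the "kvm-luo-%lu" format and the
helper name are hypothetical, not taken from kvm_luo.c.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

/* Monotonic counter shared by all retrieve calls (hypothetical). */
static atomic_ulong luo_vm_next_id;

/*
 * Build a unique name for a retrieved VM file. atomic_fetch_add()
 * returns the value held before the increment, so concurrent callers
 * never observe the same id. Returns the snprintf() result.
 */
static int luo_vm_make_name(char *buf, size_t len)
{
	unsigned long id = atomic_fetch_add(&luo_vm_next_id, 1);

	return snprintf(buf, len, "kvm-luo-%lu", id);
}
```

A counter keeps the names unique without needing to know the fd number
that userspace will eventually receive for the retrieved file.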

This change also updates the MAINTAINERS list for kvm_luo.c.

Signed-off-by: Tarun Sahu <tarunsahu@google.com>
---
 MAINTAINERS                 |  11 ++
 include/linux/kho/abi/kvm.h |  39 ++++++++
 virt/kvm/Makefile.kvm       |   1 +
 virt/kvm/kvm_luo.c          | 195 ++++++++++++++++++++++++++++++++++++
 virt/kvm/kvm_main.c         |   8 ++
 virt/kvm/kvm_mm.h           |   8 ++
 6 files changed, 262 insertions(+)
 create mode 100644 include/linux/kho/abi/kvm.h
 create mode 100644 virt/kvm/kvm_luo.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 5dbc8a6..7c000e6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14411,6 +14411,17 @@ S:	Maintained
 F:	Documentation/devicetree/bindings/leds/backlight/kinetic,ktz8866.yaml
 F:	drivers/video/backlight/ktz8866.c
 
+KVM LIVE UPDATE
+M:	Pasha Tatashin <pasha.tatashin@soleen.com>
+M:	Mike Rapoport <rppt@kernel.org>
+M:	Pratyush Yadav <pratyush@kernel.org>
+R:	Tarun Sahu <tarunsahu@google.com>
+L:	kexec@lists.infradead.org
+L:	kvm@vger.kernel.org
+S:	Maintained
+T:	git git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux.git
+F:	virt/kvm/kvm_luo.c
+
 KVM PARAVIRT (KVM/paravirt)
 M:	Paolo Bonzini <pbonzini@redhat.com>
 R:	Vitaly Kuznetsov <vkuznets@redhat.com>
diff --git a/include/linux/kho/abi/kvm.h b/include/linux/kho/abi/kvm.h
new file mode 100644
index 0000000..718db68
--- /dev/null
+++ b/include/linux/kho/abi/kvm.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026, Google LLC.
+ * Tarun Sahu <tarunsahu@google.com>
+ *
+ * KVM Preservation ABI for Live Update Orchestrator (LUO)
+ */
+#ifndef _LINUX_KHO_ABI_KVM_H
+#define _LINUX_KHO_ABI_KVM_H
+
+#include <linux/types.h>
+#include <linux/kho/abi/kexec_handover.h>
+
+/**
+ * DOC: KVM Live Update ABI
+ *
+ * KVM uses the ABI defined below for preserving its state
+ * across a kexec reboot using the LUO.
+ *
+ * The state is serialized into a packed structure `struct kvm_luo_ser`
+ * which is handed over to the next kernel via the KHO mechanism.
+ *
+ * This interface is a contract. Any modification to the structure layout
+ * constitutes a breaking change. Such changes require incrementing the
+ * version number in the KVM_LUO_FH_COMPATIBLE compatibility string.
+ */
+
+/**
+ * struct kvm_luo_ser - Main serialization structure for a KVM VM.
+ * @type:         The type of VM.
+ */
+struct kvm_luo_ser {
+	u64 type;
+} __packed;
+
+/* The compatibility string for KVM VM file handler */
+#define KVM_LUO_FH_COMPATIBLE	"kvm_vm_luo_v1"
+
+#endif /* _LINUX_KHO_ABI_KVM_H */
diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
index d047d4c..c1a9621 100644
--- a/virt/kvm/Makefile.kvm
+++ b/virt/kvm/Makefile.kvm
@@ -13,3 +13,4 @@ kvm-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o
 kvm-$(CONFIG_HAVE_KVM_DIRTY_RING) += $(KVM)/dirty_ring.o
 kvm-$(CONFIG_HAVE_KVM_PFNCACHE) += $(KVM)/pfncache.o
 kvm-$(CONFIG_KVM_GUEST_MEMFD) += $(KVM)/guest_memfd.o
+kvm-$(CONFIG_LIVEUPDATE_GUEST_MEMFD) += $(KVM)/kvm_luo.o
diff --git a/virt/kvm/kvm_luo.c b/virt/kvm/kvm_luo.c
new file mode 100644
index 0000000..6728877
--- /dev/null
+++ b/virt/kvm/kvm_luo.c
@@ -0,0 +1,195 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (c) 2026, Google LLC.
+ * Tarun Sahu <tarunsahu@google.com>
+ *
+ * KVM VM Preservation for Live Update Orchestrator (LUO)
+ */
+
+/**
+ * DOC: KVM VM Preservation via LUO
+ *
+ * Overview
+ * ========
+ *
+ * KVM virtual machines (VMs) can be preserved over a kexec reboot using the
+ * Live Update Orchestrator (LUO) file preservation. This allows userspace
+ * to preserve KVM VM state across kexec reboots.
+ *
+ * The preservation is not intended to be fully transparent. Only specific
+ * VM configuration and state are preserved, while other aspects of the VM
+ * must be re-established or re-configured by userspace after retrieval.
+ *
+ * Preserved Properties
+ * ====================
+ *
+ * The following properties of the KVM VM are preserved across kexec:
+ *
+ * VM Type
+ *   The VM type (e.g., on x86 architecture, the vm_type parameter) is
+ *   preserved.
+ *
+ * Non-Preserved Properties
+ * ========================
+ *
+ * The preservation does not cover:
+ *
+ * - vCPUs and vCPU states
+ * - Memory slot layout (memslots)
+ * - Interrupt controllers and IRQ routings
+ * - Coalesced MMIO zones
+ * - Device bindings (VFIO/Eventfds)
+ * - Active paging or guest registers state
+ * - etc
+ */
+#include <linux/liveupdate.h>
+#include <linux/kvm_host.h>
+#include <linux/pagemap.h>
+#include <linux/file.h>
+#include <linux/err.h>
+#include <linux/anon_inodes.h>
+#include <linux/magic.h>
+#include <linux/kexec_handover.h>
+#include <linux/kho/abi/kexec_handover.h>
+#include <linux/kho/abi/kvm.h>
+#include "kvm_mm.h"
+
+static bool kvm_luo_can_preserve(struct liveupdate_file_handler *handler,
+				 struct file *file)
+{
+	return file_is_kvm(file);
+}
+
+static int kvm_luo_preserve(struct liveupdate_file_op_args *args)
+{
+	DECLARE_KHOSER_PTR(sd, struct kvm_luo_ser *);
+	struct kvm *kvm = args->file->private_data;
+	struct kvm_luo_ser *ser;
+
+	if (kvm->vm_dead || kvm->vm_bugged)
+		return -EINVAL;
+
+	ser = kho_alloc_preserve(sizeof(*ser));
+	if (IS_ERR(ser))
+		return PTR_ERR(ser);
+
+#if defined(CONFIG_X86)
+	ser->type = kvm->arch.vm_type;
+#elif defined(CONFIG_ARM64)
+	ser->type = kvm_phys_shift(&kvm->arch.mmu);
+	if (kvm_vm_is_protected(kvm))
+		ser->type |= KVM_VM_TYPE_ARM_PROTECTED;
+
+#else
+	ser->type = 0;
+#endif
+
+	KHOSER_STORE_PTR(sd, ser);
+	KHOSER_COPY_TYPEUNSAFE(args->serialized_data, sd);
+
+	return 0;
+}
+
+static atomic_t restored_vm_id = ATOMIC_INIT(0);
+
+static int kvm_luo_retrieve(struct liveupdate_file_op_args *args)
+{
+	char fdname[ITOA_MAX_LEN + 1];
+	struct kvm_luo_ser *ser;
+	struct file *file;
+	struct kvm *kvm;
+	int err = 0;
+
+	ser = KHOSER_LOAD_PTR(args->serialized_data);
+	if (!ser)
+		return -EINVAL;
+
+	snprintf(fdname, sizeof(fdname), "%d",
+		 atomic_inc_return(&restored_vm_id));
+
+	file = kvm_create_vm_file(ser->type, fdname);
+	if (IS_ERR(file)) {
+		err = PTR_ERR(file);
+		goto err_free_ser;
+	}
+
+	kvm = file->private_data;
+
+	args->file = file;
+	kho_restore_free(ser);
+
+	kvm_uevent_notify_vm_create(kvm);
+	return 0;
+
+err_free_ser:
+	kho_restore_free(ser);
+	return err;
+}
+
+static void kvm_luo_unpreserve(struct liveupdate_file_op_args *args)
+{
+	struct kvm_luo_ser *ser;
+
+	/*
+	 * If preservation failed, args->serialized_data is NULL and
+	 * kvm_luo_preserve() is responsible for its own cleanup.
+	 * If preservation succeeded, the pointer is non-NULL and this
+	 * function frees the serialized state.
+	 */
+	ser = KHOSER_LOAD_PTR(args->serialized_data);
+	if (WARN_ON_ONCE(!ser))
+		return;
+
+	kho_unpreserve_free(ser);
+}
+
+static void kvm_luo_finish(struct liveupdate_file_op_args *args)
+{
+	struct kvm_luo_ser *ser;
+
+	/*
+	 * If retrieve_status is true or set to error, nothing to do here.
+	 * Already cleaned up in kvm_luo_retrieve().
+	 */
+	if (args->retrieve_status)
+		return;
+
+	ser = KHOSER_LOAD_PTR(args->serialized_data);
+	if (!ser)
+		return;
+
+	kho_restore_free(ser);
+}
+
+static const struct liveupdate_file_ops kvm_luo_file_ops = {
+	.can_preserve = kvm_luo_can_preserve,
+	.preserve = kvm_luo_preserve,
+	.retrieve = kvm_luo_retrieve,
+	.unpreserve = kvm_luo_unpreserve,
+	.finish = kvm_luo_finish,
+	.owner = THIS_MODULE,
+};
+
+static struct liveupdate_file_handler kvm_luo_handler = {
+	.ops = &kvm_luo_file_ops,
+	.compatible = KVM_LUO_FH_COMPATIBLE,
+};
+
+int kvm_luo_init(void)
+{
+	int err = liveupdate_register_file_handler(&kvm_luo_handler);
+
+	if (err && err != -EOPNOTSUPP) {
+		pr_err("Could not register kvm_vm_luo handler: %pe\n", ERR_PTR(err));
+		return err;
+	}
+
+	return 0;
+}
+
+void kvm_luo_exit(void)
+{
+	liveupdate_unregister_file_handler(&kvm_luo_handler);
+}
+
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 14c3254..d9c3dd1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -6577,6 +6577,10 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
 	if (r)
 		goto err_virt;
 
+	r = kvm_luo_init();
+	if (r)
+		goto err_luo;
+
 	/*
 	 * Registration _must_ be the very last thing done, as this exposes
 	 * /dev/kvm to userspace, i.e. all infrastructure must be setup!
@@ -6590,6 +6594,8 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
 	return 0;
 
 err_register:
+	kvm_luo_exit();
+err_luo:
 	kvm_uninit_virtualization();
 err_virt:
 	kvm_gmem_exit();
@@ -6619,6 +6625,8 @@ void kvm_exit(void)
 	 */
 	misc_deregister(&kvm_dev);
 
+	kvm_luo_exit();
+
 	kvm_uninit_virtualization();
 
 	debugfs_remove_recursive(kvm_debugfs_dir);
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index 6241617..8719871 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -100,4 +100,12 @@ static inline void kvm_gmem_unbind(struct kvm_memory_slot *slot)
 }
 #endif /* CONFIG_KVM_GUEST_MEMFD */
 
+#ifdef CONFIG_LIVEUPDATE_GUEST_MEMFD
+int kvm_luo_init(void);
+void kvm_luo_exit(void);
+#else
+static inline int kvm_luo_init(void) { return 0; }
+static inline void kvm_luo_exit(void) {}
+#endif /* CONFIG_LIVEUPDATE_GUEST_MEMFD */
+
 #endif /* __KVM_MM_H__ */
-- 
2.55.0.rc0.786.g65d90a0328-goog


^ permalink raw reply related

* [PATCH v3 2/9] kvm: Prepare core VM structs and helpers for LUO support
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
  To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
	Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
  Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
In-Reply-To: <20260622184851.2309827-1-tarunsahu@google.com>

Introduce core infrastructure to support VM preservation with LUO.

The first two changes are pure refactoring with no functional change;
the third introduces a new member in struct kvm.
- Move ITOA_MAX_LEN to kvm_mm.h for reuse by upcoming kvm_luo code.
- Add a public kvm_create_vm_file() helper wrapping kvm_create_vm()
  and anon_inode_getfile() to provide a unified VM file creation API.
- Track a weak reference to the backing file in struct kvm under
  CONFIG_LIVEUPDATE_GUEST_MEMFD to enable reverse file resolution
  without circular lifetime dependencies.
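
As a rough illustration of the first item, here is a user-space sketch
(not part of the patch) of why 12 bytes is enough. It assumes a 32-bit
int, and itoa_len() is a hypothetical helper: the longest decimal
rendering is "-2147483648", which is 11 characters plus the NUL.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Same value as the kernel macro moved by this patch. */
#define ITOA_MAX_LEN 12

/*
 * Hypothetical helper: format @v the way the KVM code formats an fd or
 * VM id, and return the number of characters written (excluding the NUL).
 */
static int itoa_len(int v)
{
	char buf[ITOA_MAX_LEN + 1];
	int n = snprintf(buf, sizeof(buf), "%d", v);

	/* snprintf() signals truncation by returning >= the buffer size. */
	assert(n > 0 && n < (int)sizeof(buf));
	return n;
}
```

Callers in the series declare the buffer as ITOA_MAX_LEN + 1 bytes, so
there is a spare byte beyond the worst case shown here.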

Signed-off-by: Tarun Sahu <tarunsahu@google.com>
---
 include/linux/kvm_host.h | 14 +++++++
 virt/kvm/kvm_main.c      | 79 +++++++++++++++++++++++++++++-----------
 virt/kvm/kvm_mm.h        |  3 ++
 3 files changed, 75 insertions(+), 21 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ab8cfae..cbb5eb9 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -874,6 +874,18 @@ struct kvm {
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
 	/* Protected by slots_lock (for writes) and RCU (for reads) */
 	struct xarray mem_attr_array;
+#endif
+#ifdef CONFIG_LIVEUPDATE_GUEST_MEMFD
+	/*
+	 * Weak reference to the VFS file backing this KVM instance. Stored
+	 * without incrementing the file refcount to prevent a circular lifetime
+	 * dependency (since file->private_data already pins this struct kvm).
+	 * Used exclusively to resolve the file pointer back from struct kvm.
+	 *
+	 * Written/cleared via rcu_assign_pointer() and read locklessly under
+	 * RCU (e.g. via get_file_active() to prevent ABA races).
+	 */
+	struct file *vm_file;
 #endif
 	char stats_id[KVM_STATS_NAME_SIZE];
 };
@@ -1074,7 +1086,9 @@ void kvm_get_kvm(struct kvm *kvm);
 bool kvm_get_kvm_safe(struct kvm *kvm);
 void kvm_put_kvm(struct kvm *kvm);
 bool file_is_kvm(struct file *file);
+struct file *kvm_create_vm_file(unsigned long type, const char *fdname);
 void kvm_put_kvm_no_destroy(struct kvm *kvm);
+void kvm_uevent_notify_vm_create(struct kvm *kvm);
 
 static inline struct kvm_memslots *__kvm_memslots(struct kvm *kvm, int as_id)
 {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e44c20c..14c3254 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -67,9 +67,6 @@
 #include <linux/kvm_dirty_ring.h>
 
 
-/* Worst case buffer size needed for holding an integer. */
-#define ITOA_MAX_LEN 12
-
 MODULE_AUTHOR("Qumranet");
 MODULE_DESCRIPTION("Kernel-based Virtual Machine (KVM) Hypervisor");
 MODULE_LICENSE("GPL");
@@ -1349,6 +1346,19 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
 {
 	struct kvm *kvm = filp->private_data;
 
+#ifdef CONFIG_LIVEUPDATE_GUEST_MEMFD
+	/*
+	 * Clear the weak reference of the vm file.
+	 * In case vm file is closed by userspace, but kvm still has
+	 * other users like vCPUs, clearing this pointer ensures
+	 * that we don't have a dangling pointer to a closed file.
+	 *
+	 * Cleared via rcu_assign_pointer() to ensure proper memory visibility
+	 * for concurrent lockless readers under RCU.
+	 */
+	rcu_assign_pointer(kvm->vm_file, NULL);
+#endif
+
 	kvm_irqfd_release(kvm);
 
 	kvm_put_kvm(kvm);
@@ -5477,11 +5487,47 @@ bool file_is_kvm(struct file *file)
 }
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(file_is_kvm);
 
+struct file *kvm_create_vm_file(unsigned long type, const char *fdname)
+{
+	struct kvm *kvm = kvm_create_vm(type, fdname);
+	struct file *file;
+
+	if (IS_ERR(kvm))
+		return ERR_CAST(kvm);
+
+	file = anon_inode_getfile("kvm-vm", &kvm_vm_fops, kvm, O_RDWR);
+	if (IS_ERR(file)) {
+		kvm_put_kvm(kvm);
+		return file;
+	}
+
+#ifdef CONFIG_LIVEUPDATE_GUEST_MEMFD
+	/*
+	 * Weak reference to the file (without get_file()) to prevent a circular
+	 * dependency. Safe because the file's release path clears this pointer
+	 * and drops its reference to the VM.
+	 *
+	 * Written via rcu_assign_pointer() because the pointer can be read
+	 * locklessly under RCU (e.g., in kvm_gmem_luo_preserve() via
+	 * get_file_active() to prevent lockless ABA races).
+	 */
+	rcu_assign_pointer(kvm->vm_file, file);
+#endif
+
+	/*
+	 * Don't call kvm_put_kvm anymore at this point; file->f_op is
+	 * already set, with ->release() being kvm_vm_release().  In error
+	 * cases it will be called by the final fput(file) and will take
+	 * care of doing kvm_put_kvm(kvm).
+	 */
+
+	return file;
+}
+
 static int kvm_dev_ioctl_create_vm(unsigned long type)
 {
 	char fdname[ITOA_MAX_LEN + 1];
 	int r, fd;
-	struct kvm *kvm;
 	struct file *file;
 
 	fd = get_unused_fd_flags(O_CLOEXEC);
@@ -5490,31 +5536,17 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
 
 	snprintf(fdname, sizeof(fdname), "%d", fd);
 
-	kvm = kvm_create_vm(type, fdname);
-	if (IS_ERR(kvm)) {
-		r = PTR_ERR(kvm);
-		goto put_fd;
-	}
-
-	file = anon_inode_getfile("kvm-vm", &kvm_vm_fops, kvm, O_RDWR);
+	file = kvm_create_vm_file(type, fdname);
 	if (IS_ERR(file)) {
 		r = PTR_ERR(file);
-		goto put_kvm;
+		goto put_fd;
 	}
 
-	/*
-	 * Don't call kvm_put_kvm anymore at this point; file->f_op is
-	 * already set, with ->release() being kvm_vm_release().  In error
-	 * cases it will be called by the final fput(file) and will take
-	 * care of doing kvm_put_kvm(kvm).
-	 */
-	kvm_uevent_notify_change(KVM_EVENT_CREATE_VM, kvm);
+	kvm_uevent_notify_change(KVM_EVENT_CREATE_VM, file->private_data);
 
 	fd_install(fd, file);
 	return fd;
 
-put_kvm:
-	kvm_put_kvm(kvm);
 put_fd:
 	put_unused_fd(fd);
 	return r;
@@ -6342,6 +6374,11 @@ static void kvm_uevent_notify_change(unsigned int type, struct kvm *kvm)
 	kfree(env);
 }
 
+void kvm_uevent_notify_vm_create(struct kvm *kvm)
+{
+	kvm_uevent_notify_change(KVM_EVENT_CREATE_VM, kvm);
+}
+
 static void kvm_init_debug(void)
 {
 	const struct file_operations *fops;
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index 7510ca9..6241617 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -6,6 +6,9 @@
 #include <linux/kvm.h>
 #include <linux/kvm_types.h>
 
+/* Worst case buffer size needed for holding an integer as a string. */
+#define ITOA_MAX_LEN 12
+
 /*
  * Architectures can choose whether to use an rwlock or spinlock
  * for the mmu_lock.  These macros, for use in common code
-- 
2.55.0.rc0.786.g65d90a0328-goog


^ permalink raw reply related

* [PATCH v3 1/9] liveupdate: Add LIVEUPDATE_GUEST_MEMFD config option
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
  To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
	Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
  Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel
In-Reply-To: <20260622184851.2309827-1-tarunsahu@google.com>

Introduce the LIVEUPDATE_GUEST_MEMFD Kconfig option. This option
enables live update support for KVM guest_memfd files, enabling
guest_memfd-backed memory preservation across kernel upgrades.

Currently this supports only guest_memfd files that are fully shared
and pre-faulted.

Signed-off-by: Tarun Sahu <tarunsahu@google.com>
---
 kernel/liveupdate/Kconfig | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/kernel/liveupdate/Kconfig b/kernel/liveupdate/Kconfig
index c13af38..2490f9a 100644
--- a/kernel/liveupdate/Kconfig
+++ b/kernel/liveupdate/Kconfig
@@ -86,4 +86,19 @@ config LIVEUPDATE_MEMFD
 
 	  If unsure, say N.
 
+config LIVEUPDATE_GUEST_MEMFD
+	bool "Live update support for guest_memfd"
+	depends on LIVEUPDATE
+	depends on KVM_GUEST_MEMFD
+	default LIVEUPDATE
+	help
+	  Enable live update support for KVM guest_memfd files. This allows
+	  preserving VM memory backed by guest_memfd files across kernel live
+	  updates.
+
+	  This can only be used for guest_memfd files that are fully shared
+	  and pre-faulted.
+
+	  If unsure, say N.
+
 endmenu
-- 
2.55.0.rc0.786.g65d90a0328-goog


^ permalink raw reply related

* [PATCH v3 0/9] liveupdate: kvm: guest_memfd preservation
From: Tarun Sahu @ 2026-06-22 18:48 UTC (permalink / raw)
  To: Jonathan Corbet, Mike Rapoport, Paolo Bonzini, Alexander Graf,
	Shuah Khan, Pratyush Yadav, Tarun Sahu, Pasha Tatashin
  Cc: kvm, linux-mm, kexec, linux-doc, linux-kselftest, linux-kernel

Hello,
This is the non-RFC patch series for guest_memfd preservation. After
multiple discussions in the hypervisor liveupdate meeting and the
guest_memfd bi-weekly meeting, the design for basic guest_memfd
preservation support is final. This series covers only guest_memfd
files that are fully shared and backed by PAGE_SIZE pages; private
memory is not supported.

Steps to test:
1. Compile Kernel with CONFIG_LIVEUPDATE_GUEST_MEMFD=y
2. boot kernel with command line: kho=on liveupdate=on
3. run the following kselftest
	$ .selftests/kvm/guest_memfd_preservation_test --stage 1
	$ <kexec> --reuse-cmdline
	$ .selftests/kvm/guest_memfd_preservation_test --stage 2

NOTE: Assert the following:
	$ ls /dev/liveupdate
	$ ls /dev/kvm
	$ dmesg | grep liveupdate # (should have kvm_vm_luo &&
		# guest_memfd_luo handler registered)

The changes are rebased on:
	kvm/next + liveupdate/next (merge) + [3] + [4] + [5]
	Where,
	[3]: luo: conversion of serialized_data to KHOSER_PTR
	[4]: luo: APIs to retrieve file internally from session
	[5]: selftests: liveupdate selftests library
Here is the github repo:
	https://github.com/tar-unix/linux/tree/gmem-pre

V3 <- RFC V2 [2]
1. Finalize the design
2. Resolve bugs reported by Sashiko
3. Use of KHOSER_PTR instead of raw serialized_data as per [3]

RFC V2 [2] <- RFC V1 [1]
1. Removed mem_attr_array as it is not needed for fully-shared
2. Removed pre-faulted condition
3. Added vm_type preservation for ARM64.
4. Removed liveupdate_get_file_incoming api patch as it is sent
   separately [4] by Samiullah.

[1] https://lore.kernel.org/all/cover.1779080766.git.tarunsahu@google.com/
[2] https://lore.kernel.org/all/c054ba0fb2639932bbe354420d3f4f84cce84905.1780676742.git.tarunsahu@google.com/
[3] https://lore.kernel.org/all/20260622111215.4157974-1-tarunsahu@google.com/
[4] https://lore.kernel.org/all/20260613012521.835490-1-skhawaja@google.com/
[5] https://lore.kernel.org/all/20260612214512.464146-1-vipinsh@google.com/

Tarun Sahu (9):
  liveupdate: Add LIVEUPDATE_GUEST_MEMFD config option
  kvm: Prepare core VM structs and helpers for LUO support
  kvm: kvm_luo: Allow kvm preservation with LUO
  kvm: guest_memfd: Move internal definitions and helper to new header
  kvm: guest_memfd: Add support for freezing and unfreezing mappings
  kvm: guest_memfd_luo: add support for guest_memfd preservation
  docs: add documentation for guest_memfd preservation via LUO
  selftests: kvm: Split ____vm_create() to expose init helpers
  selftests: kvm: Add guest_memfd_preservation_test

 Documentation/core-api/liveupdate.rst         |   1 +
 Documentation/liveupdate/vmm.rst              | 107 ++++
 MAINTAINERS                                   |  14 +
 include/linux/kho/abi/kvm.h                   | 106 ++++
 include/linux/kvm_host.h                      |  14 +
 kernel/liveupdate/Kconfig                     |  15 +
 tools/testing/selftests/kvm/Makefile.kvm      |   6 +-
 .../kvm/guest_memfd_preservation_test.c       | 236 +++++++++
 .../testing/selftests/kvm/include/kvm_util.h  |   2 +
 tools/testing/selftests/kvm/lib/kvm_util.c    |  26 +-
 virt/kvm/Makefile.kvm                         |   1 +
 virt/kvm/guest_memfd.c                        | 185 +++++--
 virt/kvm/guest_memfd.h                        |  44 ++
 virt/kvm/guest_memfd_luo.c                    | 497 ++++++++++++++++++
 virt/kvm/kvm_luo.c                            | 195 +++++++
 virt/kvm/kvm_main.c                           |  94 +++-
 virt/kvm/kvm_mm.h                             |  15 +
 17 files changed, 1477 insertions(+), 81 deletions(-)
 create mode 100644 Documentation/liveupdate/vmm.rst
 create mode 100644 include/linux/kho/abi/kvm.h
 create mode 100644 tools/testing/selftests/kvm/guest_memfd_preservation_test.c
 create mode 100644 virt/kvm/guest_memfd.h
 create mode 100644 virt/kvm/guest_memfd_luo.c
 create mode 100644 virt/kvm/kvm_luo.c

-- 
2.55.0.rc0.786.g65d90a0328-goog


^ permalink raw reply

* Re: [PATCH v3 1/2] dt-bindings: iio: dac: Add AD5529R
From: Conor Dooley @ 2026-06-22 18:39 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Nuno Sá, Rodrigo Alencar, Janani Sunil, Janani Sunil,
	Lars-Peter Clausen, Michael Hennerich, David Lechner,
	Nuno Sá, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Philipp Zabel, Jonathan Corbet, Shuah Khan,
	linux-iio, devicetree, linux-kernel, linux-doc, Mark Brown
In-Reply-To: <20260622172911.48259a0c@jic23-huawei>

[-- Attachment #1: Type: text/plain, Size: 2959 bytes --]

On Mon, Jun 22, 2026 at 05:29:11PM +0100, Jonathan Cameron wrote:
> > > > Yeah. It's not clear to me how that works for the microchip devices
> > > > (I suspect it doesn't!)
> > > > 
> > > > Just thinking as I type, but could we do something a bit nasty with
> > > > a gpio mux that doesn't actually switch but represents the GPIO being
> > > > shared?  Given this is all tied to the spi bus that should all happen
> > > > under serializing locks. 
> > > > 
> > > > Agreed though that this would be nicer as an SPI thing that let
> > > > us specify that a single CS is shared by multiple devices and there
> > > > is some other signal acting to select which one we are talking to.
> > > >   
> > > 
> > > If the device-addressing on the same chip-select is to be handled
> > > by the spi framework, wouldn't we lose device-specific features?
> > > 
> > > I understand that this multi-device feature is there mostly to extend the
> > > channel count from 16 to 32, 48 or 64. I suppose the command:
> > > 
> > > 	"MULTI DEVICE SW LDAC MODE"
> > > 
> > > exists so that software can update channel values across multiple devices.  
> > 
> > Right! You do have a point! I agree the main driver for a feature like
> > this is likely to extend the channel count and effectively "aggregate"
> > devices.
> > 
> > But I would say that even with the spi solution the MULTI DEVICE stuff
> > should be doable (as we still need a sort of adi,pin-id property). 
> > 
> > But yes, I do feel that the whole feature is for aggregation so seeing
> > one device with 32 channels is the expectation here? Rather than seeing
> > two devices with 16 channels.
> 
> Agreed - if we have messages that address both devices at once that needs
> to be a unified driver and given they are about triggering simultaneous
> update of all channels it needs to look like one big device.
> This ends up similar to how we handle daisy chain devices.
> 
> The question of what to do on devices that don't have this feature
> is rather different. Good thing you read the datasheet :)

I'm not sure it really is, the intent for the microchip devices I think
is pretty similar. The mcp3911 datasheet cites three-phase power
metering using three devices as a typical use-case, for example.
Probably creating an amalgamated device is a good fit there too?

I assume an amalgamated device for this ADI product means per-channel ID
properties? If so, I think they should be made generic and the Microchip
products retrofitted to use them, with a fallback to the proprietary
property. Not going to ask for the support for multiple devices in those
drivers, since the current way doesn't work and there'd be no loss of
support. Someone from Microchip can do that. The proprietary property
to generic conversion should be straightforward and provides weight to
an argument for this being generic, since that'd be three devices that
can all share?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* [PATCH] Docs/driver-api/uio-howto: document mmap_prepare callback
From: Doehyun Baek @ 2026-06-22 18:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Jonathan Corbet, Shuah Khan
  Cc: Andrew Morton, Vlastimil Babka, Lorenzo Stoakes, linux-doc,
	linux-kernel, Doehyun Baek

The UIO howto still documents an mmap callback in struct uio_info.
That field was replaced by mmap_prepare, which takes a struct
vm_area_desc.

A UIO driver following the current howto no longer builds because
struct uio_info has no mmap member. Update the documented callback
signature and matching text to match the current API.

Fixes: 933f05f58ac6 ("uio: replace deprecated mmap hook with mmap_prepare in uio_info")
Signed-off-by: Doehyun Baek <doehyunbaek@gmail.com>
---
 Documentation/driver-api/uio-howto.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/driver-api/uio-howto.rst b/Documentation/driver-api/uio-howto.rst
index 907ffa3b38f5..c08472dfbcfe 100644
--- a/Documentation/driver-api/uio-howto.rst
+++ b/Documentation/driver-api/uio-howto.rst
@@ -246,10 +246,10 @@ the members are required, others are optional.
    hardware interrupt number. The flags given here will be used in the
    call to :c:func:`request_irq()`.
 
--  ``int (*mmap)(struct uio_info *info, struct vm_area_struct *vma)``:
+-  ``int (*mmap_prepare)(struct uio_info *info, struct vm_area_desc *desc)``:
    Optional. If you need a special :c:func:`mmap()`
    function, you can set it here. If this pointer is not NULL, your
-   :c:func:`mmap()` will be called instead of the built-in one.
+   ``mmap_prepare`` will be called instead of the built-in one.
 
 -  ``int (*open)(struct uio_info *info, struct inode *inode)``:
    Optional. You might want to have your own :c:func:`open()`,

base-commit: 1dc18801be29bc54709aa355b8acd80e183b03cd
-- 
2.43.0


^ permalink raw reply related

* Re: [RFC PATCH 0/2] kasan: hw_tags: Add option to tag only at allocation time
From: Catalin Marinas @ 2026-06-22 17:13 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Dev Jain, ryabinin.a.a, akpm, corbet, glider, andreyknvl, dvyukov,
	vincenzo.frascino, kasan-dev, linux-mm, linux-kernel, skhan,
	workflows, linux-doc, linux-arm-kernel, ryan.roberts,
	anshuman.khandual, kaleshsingh, 21cnbao, david, will
In-Reply-To: <2208123f-8a51-483b-aa93-c35d8d053d25@kernel.org>

Hi Harry,

On Mon, Jun 22, 2026 at 09:42:10PM +0900, Harry Yoo wrote:
> On 6/19/26 10:19 PM, Catalin Marinas wrote:
> > On Thu, Jun 18, 2026 at 10:35:15PM +0900, Harry Yoo wrote:
> >> On 6/12/26 1:44 PM, Dev Jain wrote:
> >>> Now, when a memory object will be freed, it will retain the random tag it
> >>> had at allocation time. This compromises on catching UAF bugs, till the
> >>> time the object is not reallocated, at which point it will have a new
> >>> random tag.
> >>>
> >>> Hence, not catching "use-after-free-before-reallocation" and not catching
> >>> "double-free" will be the compromise for reduced KASAN overhead.
> >>
> >> I doubt users who care about security enough to enable HW_TAGS KASAN
> >> are willing to compromise on security just to save a few instructions
> >> to store tags in the free path.
> >>
> >> To me, it looks like too much of a compromise on security for little
> >> performance gain.
> > 
> > I don't think there's much compromise on security for use-after-free.
> 
> I think it depends... OH, WAIT! I see what you mean.
> 
> You mean use-after-free before reallocation does not lead to much
> compromise on security because objects are initialized after allocation?
> 
> You're probably right.
> 
> Hmm, but stores to e.g. the free pointer, fields initialized by a
> constructor, or fields accessed under SLAB_TYPESAFE_BY_RCU semantics
> after free will go undetected if they happen before reallocation.

Even with SLAB_TYPESAFE_BY_RCU, the object isn't tagged on free either
(or realloc, only if the actual slab page ends up freed). But we don't
get type confusion for such slab.

However, without tagging on free, one could argue that it reduces
security for cases where the page is re-allocated as untagged - e.g. all
user pages mapped without PROT_MTE. Currently we have a deterministic
tag check fault if the page is coloured as KASAN_TAG_INVALID. I think
for this patch, it might be better to only do such skip on free in
kasan_poison_slab() rather than kasan_poison(). Freed pages would then
be tagged.

An alternative would be tagging on free only with a new tag and skipping
it on re-alloc. But we'd need to track whether it's a completely new
allocation or a reused object (I haven't looked, but I'm pretty sure
it's doable).

-- 
Catalin

^ permalink raw reply

* Re: [PATCH RFC v5 6/6] iio: osf: register IIO devices from capabilities
From: Jonathan Cameron @ 2026-06-22 17:07 UTC (permalink / raw)
  To: Jinseob Kim
  Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, David Lechner,
	Nuno Sá, Andy Shevchenko, Jonathan Corbet, Shuah Khan,
	linux-iio, devicetree, linux-doc, linux-kernel
In-Reply-To: <20260616072242.3942-7-kimjinseob88@gmail.com>

On Tue, 16 Jun 2026 16:22:42 +0900
Jinseob Kim <kimjinseob88@gmail.com> wrote:

> Register IIO devices for supported Open Sensor Fusion capability entries
> and push received samples into IIO buffers when enabled.
> 
> Signed-off-by: Jinseob Kim <kimjinseob88@gmail.com>
Sashiko had a few comments.  The last one on the uninitialized heap
memory needs a new version of the fix from me.

Hopefully I'll get to that in the next few days.

https://sashiko.dev/#/patchset/20260529121005.1470-1-kimjinseob88%40gmail.com

The one about intermediate build issues (if correct) suggests you didn't
ensure this series builds after each patch. Please make sure to do that
to avoid breaking bisectability of the kernel.

Thanks,

Jonathan

> ---
>  drivers/iio/opensensorfusion/Kconfig    |  11 +-
>  drivers/iio/opensensorfusion/Makefile   |   3 +-
>  drivers/iio/opensensorfusion/osf_core.c | 253 ++++++++++++++++++++--
>  drivers/iio/opensensorfusion/osf_core.h |  52 +++++
>  drivers/iio/opensensorfusion/osf_iio.c  | 275 ++++++++++++++++++++++++
>  drivers/iio/opensensorfusion/osf_iio.h  |  22 ++
>  6 files changed, 586 insertions(+), 30 deletions(-)
>  create mode 100644 drivers/iio/opensensorfusion/osf_iio.c
>  create mode 100644 drivers/iio/opensensorfusion/osf_iio.h
> 
> diff --git a/drivers/iio/opensensorfusion/Kconfig b/drivers/iio/opensensorfusion/Kconfig
> index d393eb3aa..8b9376d28 100644
> --- a/drivers/iio/opensensorfusion/Kconfig
> +++ b/drivers/iio/opensensorfusion/Kconfig
> @@ -5,11 +5,10 @@ config OPEN_SENSOR_FUSION
>  	depends on IIO
>  	depends on SERIAL_DEV_BUS
>  	select CRC32
> +	select IIO_BUFFER
> +	select IIO_KFIFO_BUF
>  	help
> -	  Build the Open Sensor Fusion UART receive path.
> +	  Build the Open Sensor Fusion UART IIO driver.
>  
> -	  The driver receives OSF protocol frames over a serdev UART.
> -	  Frames are decoded and validated before being passed to the
> -	  driver core.
> -	  This patch only adds the transport path.
> -	  IIO device registration is added separately.
> +	  The driver receives OSF protocol frames over a serdev UART and
> +	  registers IIO devices for supported capability entries.
Avoid this churn. I wouldn't worry about the help text being a little
forward-looking when it is added in the earlier patch; go directly to
the final text there.

> diff --git a/drivers/iio/opensensorfusion/osf_core.c b/drivers/iio/opensensorfusion/osf_core.c
> index 137fb7166..61ef55646 100644
> --- a/drivers/iio/opensensorfusion/osf_core.c
> +++ b/drivers/iio/opensensorfusion/osf_core.c

>  
> -static int osf_core_validate_sensor_sample(const struct osf_frame *frame)
> +static int osf_core_register_capabilities(struct osf_device *osf,
> +					  const struct osf_capability_cache *cache)
>  {
> +	struct iio_dev *indio_dev;
> +	unsigned int i;
> +	int ret;
> +
> +	if (osf->capability_cache.valid)
> +		return 0;
> +
> +	for (i = 0; i < cache->capability_count; i++) {
> +		if (!osf_iio_sensor_supported(cache->entries[i].sensor_type,
> +					      cache->entries[i].channel_count))
> +			continue;
> +
> +		if (osf_core_capability_is_duplicate(cache, i))
> +			return -EEXIST;
> +	}
> +
> +	for (i = 0; i < cache->capability_count; i++) {
> +		if (!osf_iio_sensor_supported(cache->entries[i].sensor_type,
> +					      cache->entries[i].channel_count))
> +			continue;
> +
> +		ret = osf_iio_register_sensor(osf->dev, &cache->entries[i],
> +					      osf, &indio_dev);
> +		if (ret)
> +			goto err_unregister;
> +
> +		osf->iio_devs[osf->iio_dev_count].sensor_type =
> +			cache->entries[i].sensor_type;
> +		osf->iio_devs[osf->iio_dev_count].sensor_index =
> +			cache->entries[i].sensor_index;
> +		osf->iio_devs[osf->iio_dev_count].indio_dev = indio_dev;
> +		osf->iio_dev_count++;

Probably use a designated initializer for this one
		osf->iio_devs[osf->iio_dev_count++] = (struct osf_iio_binding) {
			.sensor_type = ...

		};

Not a problem if the lines are over 80 chars given this should be generally easier
to read.
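
A minimal user-space sketch of that form, for illustration only. The
struct name osf_iio_binding, the field types, the OSF_MAX_IIO_DEVS bound
and the osf_bind() helper are all assumptions; only the field names come
from the quoted patch.

```c
#include <stddef.h>

#define OSF_MAX_IIO_DEVS 8	/* assumed bound, for illustration only */

struct iio_dev;			/* opaque here; the real type lives in the IIO core */

struct osf_iio_binding {	/* hypothetical name and field types */
	unsigned int sensor_type;
	unsigned int sensor_index;
	struct iio_dev *indio_dev;
};

struct osf_device {		/* trimmed to the fields used below */
	struct osf_iio_binding iio_devs[OSF_MAX_IIO_DEVS];
	unsigned int iio_dev_count;
};

/* Record one binding; returns 0 on success, -1 if the table is full. */
static int osf_bind(struct osf_device *osf, unsigned int type,
		    unsigned int index, struct iio_dev *indio_dev)
{
	if (osf->iio_dev_count >= OSF_MAX_IIO_DEVS)
		return -1;

	/* One assignment replaces the three per-field stores and the increment. */
	osf->iio_devs[osf->iio_dev_count++] = (struct osf_iio_binding) {
		.sensor_type = type,
		.sensor_index = index,
		.indio_dev = indio_dev,
	};
	return 0;
}
```

A side effect of the compound literal is that any member not named in
the initializer is zeroed, which the per-field stores do not guarantee.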

> +
> +static int osf_core_handle_sensor_sample(struct osf_device *osf,
> +					 const struct osf_frame *frame)
> +{
> +	struct osf_latest_sample *latest;
>  	struct osf_sensor_sample sample;
> +	struct iio_dev *indio_dev;
> +	s32 values[OSF_MAX_SAMPLE_CHANNELS] = { };
> +	unsigned int i;
> +	int ret;
> +
> +	ret = osf_protocol_decode_sensor_sample(frame, &sample);
> +	if (ret)
> +		return ret;
> +
> +	if (sample.channel_count > OSF_MAX_SAMPLE_CHANNELS)
> +		return -E2BIG;
> +
> +	for (i = 0; i < sample.channel_count; i++) {
> +		ret = osf_protocol_sensor_sample_value(&sample, i, &values[i]);
> +		if (ret)
> +			return ret;
> +	}
>  
> -	return osf_protocol_decode_sensor_sample(frame, &sample);
> +	mutex_lock(&osf->latest_lock);

This may well be better as a scoped_guard()

> +	latest = osf_core_find_latest_sample(osf, sample.sensor_type,
> +					     sample.sensor_index);
> +	if (!latest) {
> +		mutex_unlock(&osf->latest_lock);

scoped_guard() would allow you to return here without worrying
about the manual unlock.
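
A rough user-space sketch of the idea, not kernel code: it imitates a
scope-bound lock with the compiler's cleanup attribute (GCC and Clang)
and a pthread mutex. LOCK_SCOPE(), struct sample_dev and store_latest()
are names invented for this sketch; in the driver you would use
scoped_guard(mutex, &osf->latest_lock) from cleanup.h instead.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define MAX_CHANNELS 4		/* hypothetical limit for this sketch */

struct sample_dev {
	pthread_mutex_t latest_lock;
	int32_t values[MAX_CHANNELS];
	unsigned int channel_count;
	int have_slot;		/* stands in for the slot lookup succeeding */
	int valid;
};

/* Called by the compiler whenever the guarded variable leaves scope. */
static void unlock_cleanup(pthread_mutex_t **m)
{
	pthread_mutex_unlock(*m);
}

/* Take the lock now and release it when the enclosing block is left. */
#define LOCK_SCOPE(m)							\
	pthread_mutex_t *scope_lock					\
		__attribute__((cleanup(unlock_cleanup), unused)) =	\
		(pthread_mutex_lock(m), (m))

static int store_latest(struct sample_dev *d, const int32_t *values,
			unsigned int count)
{
	if (count > MAX_CHANNELS)
		return -E2BIG;

	{
		LOCK_SCOPE(&d->latest_lock);

		if (!d->have_slot)
			return -E2BIG;	/* lock is dropped on this return */

		memcpy(d->values, values, count * sizeof(*values));
		d->channel_count = count;
		d->valid = 1;
	}	/* lock is dropped here on the normal path */

	/* Work that must run unlocked (the IIO push in the patch) goes here. */
	return 0;
}
```

The point of the scoped form is that the unlocked tail of the function
stays outside the block, so the critical section is visible at a glance.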

> +		return -E2BIG;
> +	}
> +
> +	memcpy(latest->values, values, sizeof(values));
> +	latest->sensor_type = sample.sensor_type;
> +	latest->sensor_index = sample.sensor_index;
> +	latest->channel_count = sample.channel_count;
> +	latest->sample_format = sample.sample_format;
> +	latest->scale_nano = sample.scale_nano;
> +	latest->sequence = frame->sequence;
> +	latest->timestamp_us = frame->timestamp_us;
> +	latest->valid = true;
> +	osf->last_sequence = frame->sequence;
> +	mutex_unlock(&osf->latest_lock);
> +
> +	indio_dev = osf_core_find_iio_dev(osf, sample.sensor_type,
> +					  sample.sensor_index);
> +	if (!indio_dev)
> +		return 0;
> +
> +	return osf_iio_push_sample(indio_dev, values, sample.channel_count);
>  }

>  
> @@ -73,27 +260,47 @@ int osf_core_receive_frame(struct osf_device *osf, const u8 *buf, size_t len)
>  
>  	switch (frame.message_type) {
>  	case OSF_MSG_SENSOR_SAMPLE:
> -		ret = osf_core_validate_sensor_sample(&frame);
> -		break;
> +		return osf_core_handle_sensor_sample(osf, &frame);
>  	case OSF_MSG_DEVICE_STATUS:
> -		ret = osf_core_validate_device_status(&frame);
> -		break;
> +		return osf_core_handle_device_status(osf, &frame);
>  	case OSF_MSG_CAPABILITY_REPORT:
> -		ret = osf_core_validate_capability_report(&frame);
> -		break;
> +		return osf_core_handle_capability_report(osf, &frame);
>  	default:
>  		if (frame.message_type >= OSF_RESERVED_MSG_FIRST &&
>  		    frame.message_type <= OSF_RESERVED_MSG_LAST)
> -			ret = 0;
> -		else if (frame.message_type >= OSF_VENDOR_PRIVATE_FIRST)
> -			ret = 0;
> -		else
> -			ret = -EOPNOTSUPP;
> -		break;
> +			return 0;
> +		if (frame.message_type >= OSF_VENDOR_PRIVATE_FIRST)
> +			return 0;
> +		return -EOPNOTSUPP;
>  	}

See if you can rework original code to reduce the churn here.

> +}
> +
> +int osf_core_read_latest_sample(struct osf_device *osf, u16 sensor_type,
> +				u16 sensor_index, unsigned int channel,
> +				s32 *value)
> +{
> +	const struct osf_latest_sample *latest;
> +	unsigned int i;
> +	int ret = -ENODATA;
> +
> +	if (!osf || !value)
> +		return -EINVAL;
> +
> +	mutex_lock(&osf->latest_lock);

Looks like a good place to use guard(mutex)(&osf->latest_lock);
Remember to include cleanup.h

> +	for (i = 0; i < osf->latest_sample_count; i++) {
> +		latest = &osf->latest_samples[i];
> +		if (latest->sensor_type != sensor_type ||
> +		    latest->sensor_index != sensor_index)
> +			continue;
> +
> +		if (!latest->valid || channel >= latest->channel_count)
> +			break;
>  
> -	if (!ret)
> -		osf->last_sequence = frame.sequence;
> +		*value = latest->values[channel];
> +		ret = 0;
With guard, you can return directly here.
> +		break;
> +	}
> +	mutex_unlock(&osf->latest_lock);
This gets handled automatically on leaving scope

Then if you get here you can just do
	return -ENODATA;
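
Putting those three comments together, here is a user-space sketch of
the resulting shape. It is an approximation only: GUARD_MUTEX() is a
stand-in built on the compiler cleanup attribute and pthreads, and
struct sample_cache and read_latest() are invented names. The in-kernel
version would simply be guard(mutex)(&osf->latest_lock).

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SAMPLES 2		/* hypothetical sizes for this sketch */
#define MAX_CHANNELS 3

struct latest_sample {
	uint16_t sensor_type;
	uint16_t sensor_index;
	unsigned int channel_count;
	int32_t values[MAX_CHANNELS];
	bool valid;
};

struct sample_cache {
	pthread_mutex_t lock;
	struct latest_sample samples[MAX_SAMPLES];
	unsigned int count;
};

static void mutex_unlock_cleanup(pthread_mutex_t **m)
{
	pthread_mutex_unlock(*m);
}

/* Lock now, unlock automatically when the enclosing scope is left. */
#define GUARD_MUTEX(m)							\
	pthread_mutex_t *guard_						\
		__attribute__((cleanup(mutex_unlock_cleanup), unused)) =	\
		(pthread_mutex_lock(m), (m))

static int read_latest(struct sample_cache *c, uint16_t sensor_type,
		       uint16_t sensor_index, unsigned int channel,
		       int32_t *value)
{
	unsigned int i;

	if (!c || !value)
		return -EINVAL;

	GUARD_MUTEX(&c->lock);

	for (i = 0; i < c->count; i++) {
		const struct latest_sample *s = &c->samples[i];

		if (s->sensor_type != sensor_type ||
		    s->sensor_index != sensor_index)
			continue;

		if (!s->valid || channel >= s->channel_count)
			break;

		*value = s->values[channel];
		return 0;	/* direct return, no ret variable needed */
	}

	return -ENODATA;
}
```

With this shape both the ret local and the trailing unlock go away.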

>  
>  	return ret;
>  }


> diff --git a/drivers/iio/opensensorfusion/osf_iio.c b/drivers/iio/opensensorfusion/osf_iio.c
> new file mode 100644
> index 000000000..862a797f4
> --- /dev/null
> +++ b/drivers/iio/opensensorfusion/osf_iio.c

> +
> +bool osf_iio_sensor_supported(u16 sensor_type, u16 channel_count)
> +{
> +	return !!osf_iio_find_sensor_spec(sensor_type, channel_count);
The !! is getting used a lot less in modern kernel code. Linus Torvalds
once pointed out how hard it is to read.  Maybe != NULL is clearer, and
let the compiler do the optimization if it wants.
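
A small user-space sketch of the explicit form. It assumes the lookup
returns a pointer to a table entry, or NULL when nothing matches; the
table contents and the shortened names are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sensor_spec {
	uint16_t sensor_type;
	uint16_t channel_count;
	const char *name;
};

/* Made-up table entries, only here so the lookup has something to find. */
static const struct sensor_spec sensor_specs[] = {
	{ .sensor_type = 1, .channel_count = 3, .name = "accel" },
	{ .sensor_type = 2, .channel_count = 1, .name = "temp" },
};

static const struct sensor_spec *find_sensor_spec(uint16_t sensor_type,
						  uint16_t channel_count)
{
	size_t i;

	for (i = 0; i < sizeof(sensor_specs) / sizeof(sensor_specs[0]); i++) {
		if (sensor_specs[i].sensor_type == sensor_type &&
		    sensor_specs[i].channel_count == channel_count)
			return &sensor_specs[i];
	}

	return NULL;
}

static bool sensor_supported(uint16_t sensor_type, uint16_t channel_count)
{
	/* The comparison already yields a bool, so no !! is needed. */
	return find_sensor_spec(sensor_type, channel_count) != NULL;
}
```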

> +}
> +
> +const char *osf_iio_sensor_name(u16 sensor_type)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(osf_iio_sensor_specs); i++) {
> +		if (osf_iio_sensor_specs[i].sensor_type == sensor_type)
> +			return osf_iio_sensor_specs[i].name;
> +	}
> +
> +	return NULL;
> +}

> +}


> +
> +int osf_iio_push_sample(struct iio_dev *indio_dev, const s32 *values,
> +			unsigned int channel_count)

As you are comparing it with the reported number of channels from
spec->channel_count, I would match the type with that (u16 I think).

> +{
> +	struct osf_iio_state *state = iio_priv(indio_dev);
> +	s64 timestamp;
> +
> +	if (channel_count != state->spec->channel_count)
> +		return -EPROTO;
> +
> +	/* This is only a fast path; IIO rechecks buffer state while pushing. */
> +	if (!iio_buffer_enabled(indio_dev))
> +		return 0;
> +
> +	timestamp = iio_get_time_ns(indio_dev);
> +
> +	return iio_push_to_buffers_with_ts_unaligned(indio_dev, values,
> +						     channel_count * sizeof(*values),
> +						     timestamp);
> +}


^ permalink raw reply

* Re: [PATCH v3 8/8] docs: misc: amd-sbi: Document SBTSI userspace interface
From: Randy Dunlap @ 2026-06-22 16:57 UTC (permalink / raw)
  To: Akshay Gupta, linux-doc, linux-kernel, linux-hwmon
  Cc: corbet, skhan, linux, arnd, gregkh, NaveenKrishna.Chatradhi,
	Anand.Umarji, Prathima.Lk
In-Reply-To: <20260622135821.2190260-9-Akshay.Gupta@amd.com>



On 6/22/26 6:58 AM, Akshay Gupta wrote:
> From: Prathima <Prathima.Lk@amd.com>
> 
> - Document AMD sideband IOCTL description defined
>   for SBTSI and its usage.
>   User space C-APIs are made available by esmi_oob_library [1],
>   which is provided by the E-SMS project [2].
> 
>   Link: https://github.com/amd/esmi_oob_library [1]
>   Link: https://www.amd.com/en/developer/e-sms.html [2]
> 
> Include a user-space open example for /dev/sbtsi-* and list auxiliary
> bus sysfs paths.
> 
> Reviewed-by: Akshay Gupta <Akshay.Gupta@amd.com>
> Signed-off-by: Prathima <Prathima.Lk@amd.com>
> ---
> Changes since v2:
> - Update misc node names info as per socket
> 
> Changes since v1:
> - Elaborate the document
>  Documentation/misc-devices/amd-sbi.rst | 68 ++++++++++++++++++++++++++
>  1 file changed, 68 insertions(+)
> 
> diff --git a/Documentation/misc-devices/amd-sbi.rst b/Documentation/misc-devices/amd-sbi.rst
> index f91ddadefe48..fbbbc504119f 100644
> --- a/Documentation/misc-devices/amd-sbi.rst
> +++ b/Documentation/misc-devices/amd-sbi.rst
> @@ -48,6 +48,60 @@ Access restrictions:
>   * APML Mailbox messages and Register xfer access are read-write,
>   * CPUID and MCA_MSR access is read-only.
>  
> +SBTSI device
> +============
> +
> +sbtsi driver under the drivers/misc/amd-sbi creates miscdevice

   The sbtsi driver in the drivers/misc/amd-sbi/ directory creates a miscdevice

> +/dev/sbtsi-* to let user space programs run APML TSI register xfer

                                                                 transfer
?

> +commands.
> +
> +The driver supports both I2C and I3C transports for SB-TSI targets.
> +The transport is selected by the bus where the device is enumerated.
> +
> +Misc device:
> + * In 1P socket 0: /dev/sbtsi-4c
> + * In 2P socket 0: /dev/sbtsi-4c, socket 1: /dev/sbtsi-48
> +
> +.. code-block:: bash
> +
> +   $ ls -al /dev/sbtsi-4c
> +   crw-------    1 root     root       10, 116 Apr  2 05:22 /dev/sbtsi-4c
> +
> +
> +Access restrictions:
> + * Only root user is allowed to open the file.
> + * APML TSI Register xfer access is read-write.

                        transfer
?

> +
> +SBTSI hwmon interface
> +=====================
[snip]

-- 
~Randy


^ permalink raw reply

* Re: [PATCH 0/2] tracing: Move trace_printk.h out of kernel.h
From: Steven Rostedt @ 2026-06-22 16:51 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Peter Zijlstra, linux-kernel, linux-trace-kernel,
	Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Linus Torvalds, Sebastian Andrzej Siewior, John Ogness,
	Thomas Gleixner, Julia Lawall, Yury Norov, linux-doc,
	linux-kbuild, linuxppc-dev, dri-devel, linux-stm32,
	linux-arm-kernel, linux-rdma, linux-usb, linux-ext4, linux-nfs,
	kvm, intel-gfx
In-Reply-To: <08b3c961-18bb-43d9-8d7f-8a87bcad0afa@infradead.org>

On Mon, 22 Jun 2026 09:40:45 -0700
Randy Dunlap <rdunlap@infradead.org> wrote:

> > Did you forget your C 101 class? If you use a function, you gotta
> > include the relevant header.  
> 
> Also item #1 in Documentation/process/submit-checklist.rst.

What is that? Remove all trace_printk()s before you submit?

Because that is what you should do. But now you also need to remember
to remove the include <linux/trace_printk.h> too. Or, I guess if
someone uses it a lot, they may just keep it in their files without the
trace_printk()s.

-- Steve

^ permalink raw reply
