* [PATCH v6 00/20] Live Update Orchestrator
@ 2025-11-15 23:33 Pasha Tatashin
2025-11-15 23:33 ` [PATCH v6 01/20] liveupdate: luo_core: luo_ioctl: " Pasha Tatashin
` (19 more replies)
0 siblings, 20 replies; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-15 23:33 UTC (permalink / raw)
To: pratyush, jasonmiu, graf, pasha.tatashin, rppt, dmatlack,
rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu,
hughd, skhawaja, chrisl
This series introduces the Live Update Orchestrator, a kernel subsystem
designed to facilitate live kernel updates using a kexec-based reboot.
This capability is critical for cloud environments, allowing hypervisors
to be updated with minimal downtime for running virtual machines. LUO
achieves this by preserving the state of selected resources, such as
memory, devices and their dependencies, across the kernel transition.
As a key feature, this series includes support for preserving memfd file
descriptors, which allows critical in-memory data, such as guest RAM or
any other large memory region, to be maintained in RAM across the kexec
reboot.
Other series that build on LUO are the VFIO [1], IOMMU [2], and PCI [3]
preservation series.
A GitHub repository for this series is available at [4].
The core of LUO is a framework for managing the lifecycle of preserved
resources through a userspace-driven interface. Key features include:
- Session Management
A userspace agent (e.g. luod [5]) creates named sessions, each
represented by a file descriptor, through the centralized agent that
controls /dev/liveupdate. The lifecycle of all preserved resources within a
session is tied to this FD, ensuring automatic kernel cleanup if the
controlling userspace agent crashes or exits unexpectedly.
- File Preservation
A handler-based framework allows specific file types (demonstrated
here with memfd) to be preserved. Handlers manage the serialization,
restoration, and lifecycle of their specific file types.
- File-Lifecycle-Bound State
A new mechanism for managing shared global state whose lifecycle is
tied to the preservation of one or more files. This is crucial for
subsystems like IOMMU or HugeTLB, where multiple file descriptors may
depend on a single, shared underlying resource that must be preserved
only once.
- KHO Integration
LUO drives the Kexec Handover framework programmatically to pass its
serialized metadata to the next kernel. The LUO state is finalized and
added to the kexec image just before the reboot is triggered. This
step is expected to be removed once stateless KHO [6] is merged.
- Userspace Interface
Control is provided via ioctl commands on /dev/liveupdate for creating
and retrieving sessions, as well as on session file descriptors for
managing individual files.
- Testing
The series includes a set of selftests, including userspace API
validation, kexec-based lifecycle tests for various session and file
scenarios, and a new in-kernel test module to validate the FLB logic.
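As a rough illustration of the userspace entry point described above, the
sketch below only opens the control device and reports failure as a negative
errno. The /dev/liveupdate path comes from this cover letter; the helper name
luo_open_device() is hypothetical, and the session and file ioctls are
deliberately left out because they are defined in later patches.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of this series): open the LUO control
 * device at @path, normally "/dev/liveupdate".
 *
 * Returns a file descriptor on success or a negative errno on failure,
 * for example -ENOENT when the device node does not exist.
 */
static int luo_open_device(const char *path)
{
	int fd = open(path, O_RDWR | O_CLOEXEC);

	if (fd < 0)
		return -errno;

	return fd;
}
```

An agent would call luo_open_device("/dev/liveupdate") once at startup and
treat a negative return as "live update not available on this kernel",
since the device is only registered when LUO is built in and enabled.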
Changelog since v5 [7]
- Moved internal luo_alloc/free_* memory helpers to generic
kho_alloc/free_* APIs, and submitted as a separate KHO series [8].
- Moved the liveupdate_reboot() invocation from kernel/reboot.c to
kernel_kexec() in kernel/kexec_core.c.
- Moved generic KHO enabling patches (debugfs, kimage logic) out of this
series and into the base KHO series.
- Feedback: Addressed review comments from Mike Rapoport and Pratyush
Yadav.
[1] https://lore.kernel.org/all/20251018000713.677779-1-vipinsh@google.com/
[2] https://lore.kernel.org/linux-iommu/20250928190624.3735830-1-skhawaja@google.com
[3] https://lore.kernel.org/linux-pci/20250916-luo-pci-v2-0-c494053c3c08@kernel.org
[4] https://github.com/googleprodkernel/linux-liveupdate/tree/luo/v6
[5] https://tinyurl.com/luoddesign
[6] https://lore.kernel.org/all/20251020100306.2709352-1-jasonmiu@google.com
[7] https://lore.kernel.org/all/20251107210526.257742-1-pasha.tatashin@soleen.com
[8] https://lore.kernel.org/all/20251114190002.3311679-1-pasha.tatashin@soleen.com
Pasha Tatashin (14):
liveupdate: luo_core: luo_ioctl: Live Update Orchestrator
liveupdate: luo_core: integrate with KHO
kexec: call liveupdate_reboot() before kexec
liveupdate: luo_session: add sessions support
liveupdate: luo_ioctl: add user interface
liveupdate: luo_file: implement file systems callbacks
liveupdate: luo_session: Add ioctls for file preservation
liveupdate: luo_flb: Introduce File-Lifecycle-Bound global state
docs: add luo documentation
MAINTAINERS: add liveupdate entry
selftests/liveupdate: Add userspace API selftests
selftests/liveupdate: Add kexec-based selftest for session lifecycle
selftests/liveupdate: Add kexec test for multiple and empty sessions
tests/liveupdate: Add in-kernel liveupdate test
Pratyush Yadav (6):
mm: shmem: use SHMEM_F_* flags instead of VM_* flags
mm: shmem: allow freezing inode mapping
mm: shmem: export some functions to internal.h
liveupdate: luo_file: add private argument to store runtime state
mm: memfd_luo: allow preserving memfd
docs: add documentation for memfd preservation via LUO
Documentation/core-api/index.rst | 1 +
Documentation/core-api/liveupdate.rst | 71 ++
Documentation/mm/index.rst | 1 +
Documentation/mm/memfd_preservation.rst | 23 +
Documentation/userspace-api/index.rst | 1 +
.../userspace-api/ioctl/ioctl-number.rst | 2 +
Documentation/userspace-api/liveupdate.rst | 20 +
MAINTAINERS | 15 +
include/linux/liveupdate.h | 265 +++++
include/linux/liveupdate/abi/luo.h | 238 +++++
include/linux/liveupdate/abi/memfd.h | 88 ++
include/linux/shmem_fs.h | 23 +
include/uapi/linux/liveupdate.h | 216 +++++
kernel/kexec_core.c | 5 +
kernel/liveupdate/Kconfig | 27 +
kernel/liveupdate/Makefile | 9 +
kernel/liveupdate/luo_core.c | 252 +++++
kernel/liveupdate/luo_file.c | 906 ++++++++++++++++++
kernel/liveupdate/luo_flb.c | 658 +++++++++++++
kernel/liveupdate/luo_internal.h | 95 ++
kernel/liveupdate/luo_ioctl.c | 223 +++++
kernel/liveupdate/luo_session.c | 600 ++++++++++++
lib/Kconfig.debug | 23 +
lib/tests/Makefile | 1 +
lib/tests/liveupdate.c | 143 +++
mm/Makefile | 1 +
mm/internal.h | 6 +
mm/memfd_luo.c | 671 +++++++++++++
mm/shmem.c | 50 +-
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/liveupdate/.gitignore | 3 +
tools/testing/selftests/liveupdate/Makefile | 40 +
tools/testing/selftests/liveupdate/config | 5 +
.../testing/selftests/liveupdate/do_kexec.sh | 16 +
.../testing/selftests/liveupdate/liveupdate.c | 348 +++++++
.../selftests/liveupdate/luo_kexec_simple.c | 114 +++
.../selftests/liveupdate/luo_multi_session.c | 190 ++++
.../selftests/liveupdate/luo_test_utils.c | 168 ++++
.../selftests/liveupdate/luo_test_utils.h | 39 +
39 files changed, 5539 insertions(+), 19 deletions(-)
create mode 100644 Documentation/core-api/liveupdate.rst
create mode 100644 Documentation/mm/memfd_preservation.rst
create mode 100644 Documentation/userspace-api/liveupdate.rst
create mode 100644 include/linux/liveupdate.h
create mode 100644 include/linux/liveupdate/abi/luo.h
create mode 100644 include/linux/liveupdate/abi/memfd.h
create mode 100644 include/uapi/linux/liveupdate.h
create mode 100644 kernel/liveupdate/luo_core.c
create mode 100644 kernel/liveupdate/luo_file.c
create mode 100644 kernel/liveupdate/luo_flb.c
create mode 100644 kernel/liveupdate/luo_internal.h
create mode 100644 kernel/liveupdate/luo_ioctl.c
create mode 100644 kernel/liveupdate/luo_session.c
create mode 100644 lib/tests/liveupdate.c
create mode 100644 mm/memfd_luo.c
create mode 100644 tools/testing/selftests/liveupdate/.gitignore
create mode 100644 tools/testing/selftests/liveupdate/Makefile
create mode 100644 tools/testing/selftests/liveupdate/config
create mode 100755 tools/testing/selftests/liveupdate/do_kexec.sh
create mode 100644 tools/testing/selftests/liveupdate/liveupdate.c
create mode 100644 tools/testing/selftests/liveupdate/luo_kexec_simple.c
create mode 100644 tools/testing/selftests/liveupdate/luo_multi_session.c
create mode 100644 tools/testing/selftests/liveupdate/luo_test_utils.c
create mode 100644 tools/testing/selftests/liveupdate/luo_test_utils.h
--
2.52.0.rc1.455.g30608eb744-goog
* [PATCH v6 01/20] liveupdate: luo_core: luo_ioctl: Live Update Orchestrator
2025-11-15 23:33 [PATCH v6 00/20] Live Update Orchestrator Pasha Tatashin
@ 2025-11-15 23:33 ` Pasha Tatashin
2025-11-17 2:54 ` Andrew Morton
2025-11-18 15:45 ` Pratyush Yadav
2025-11-15 23:33 ` [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO Pasha Tatashin
` (18 subsequent siblings)
19 siblings, 2 replies; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-15 23:33 UTC (permalink / raw)
Introduce LUO, a mechanism intended to facilitate kernel updates while
keeping designated devices operational across the transition (e.g., via
kexec). The primary use case is updating hypervisors with minimal
disruption to running virtual machines. For the userspace side of a
hypervisor update we have copyless migration; LUO is for updating the kernel.
This initial patch lays the groundwork for the LUO subsystem.
Further functionality, including the implementation of state transition
logic, integration with KHO, and hooks for subsystems and file
descriptors, will be added in subsequent patches.
Create a character device at /dev/liveupdate.
A new uAPI header, <uapi/linux/liveupdate.h>, will define the necessary
structures. The magic number for IOCTL is registered in
Documentation/userspace-api/ioctl/ioctl-number.rst.
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
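For readers unfamiliar with the extensible ioctl format documented in the
new uAPI header, the following userspace model sketches the size check it
describes: the caller states its structure size in the leading u32, and any
bytes beyond what the kernel understands must be zero. This is only a sketch
under stated assumptions, not the kernel implementation: the function name
and the min_size/known_size parameters are hypothetical, and returning
-EINVAL for an undersized structure is an assumption, since the header text
only spells out the E2BIG case.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the size-prefixed argument check.
 *
 * @arg:        the structure as passed by userspace
 * @user_size:  size claimed by userspace (the leading u32 of the structure)
 * @min_size:   smallest layout accepted for this command (assumption)
 * @known_size: size of the layout this kernel understands
 */
static int luo_model_check_arg(const void *arg, uint32_t user_size,
			       size_t min_size, size_t known_size)
{
	const unsigned char *bytes = arg;
	size_t i;

	/* Too small to hold the fields this command requires. */
	if (user_size < min_size)
		return -EINVAL;

	/* Newer userspace: every byte the kernel does not know must be 0. */
	for (i = known_size; i < user_size; i++) {
		if (bytes[i])
			return -E2BIG;
	}

	return 0;
}
```

With this rule, a newer userspace can always pass its larger structure: an
older kernel accepts it as long as the fields it does not know about are
left zeroed, and rejects it with E2BIG otherwise.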
.../userspace-api/ioctl/ioctl-number.rst | 2 +
include/linux/liveupdate.h | 35 ++++++++
include/uapi/linux/liveupdate.h | 46 ++++++++++
kernel/liveupdate/Kconfig | 27 ++++++
kernel/liveupdate/Makefile | 6 ++
kernel/liveupdate/luo_core.c | 86 +++++++++++++++++++
kernel/liveupdate/luo_ioctl.c | 45 ++++++++++
7 files changed, 247 insertions(+)
create mode 100644 include/linux/liveupdate.h
create mode 100644 include/uapi/linux/liveupdate.h
create mode 100644 kernel/liveupdate/luo_core.c
create mode 100644 kernel/liveupdate/luo_ioctl.c
diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 7c527a01d1cf..7232b3544cec 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -385,6 +385,8 @@ Code Seq# Include File Comments
0xB8 01-02 uapi/misc/mrvl_cn10k_dpi.h Marvell CN10K DPI driver
0xB8 all uapi/linux/mshv.h Microsoft Hyper-V /dev/mshv driver
<mailto:linux-hyperv@vger.kernel.org>
+0xBA 00-0F uapi/linux/liveupdate.h Pasha Tatashin
+ <mailto:pasha.tatashin@soleen.com>
0xC0 00-0F linux/usb/iowarrior.h
0xCA 00-0F uapi/misc/cxl.h Dead since 6.15
0xCA 10-2F uapi/misc/ocxl.h
diff --git a/include/linux/liveupdate.h b/include/linux/liveupdate.h
new file mode 100644
index 000000000000..730b76625fec
--- /dev/null
+++ b/include/linux/liveupdate.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+#ifndef _LINUX_LIVEUPDATE_H
+#define _LINUX_LIVEUPDATE_H
+
+#include <linux/bug.h>
+#include <linux/types.h>
+#include <linux/list.h>
+
+#ifdef CONFIG_LIVEUPDATE
+
+/* Return true if live update orchestrator is enabled */
+bool liveupdate_enabled(void);
+
+/* Called during kexec to tell LUO that the system has entered reboot */
+int liveupdate_reboot(void);
+
+#else /* CONFIG_LIVEUPDATE */
+
+static inline bool liveupdate_enabled(void)
+{
+ return false;
+}
+
+static inline int liveupdate_reboot(void)
+{
+ return 0;
+}
+
+#endif /* CONFIG_LIVEUPDATE */
+#endif /* _LINUX_LIVEUPDATE_H */
diff --git a/include/uapi/linux/liveupdate.h b/include/uapi/linux/liveupdate.h
new file mode 100644
index 000000000000..df34c1642c4d
--- /dev/null
+++ b/include/uapi/linux/liveupdate.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+/*
+ * Userspace interface for /dev/liveupdate
+ * Live Update Orchestrator
+ *
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+
+#ifndef _UAPI_LIVEUPDATE_H
+#define _UAPI_LIVEUPDATE_H
+
+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+/**
+ * DOC: General ioctl format
+ *
+ * The ioctl interface follows a general format to allow for extensibility. Each
+ * ioctl is passed in a structure pointer as the argument providing the size of
+ * the structure in the first u32. The kernel checks that any structure space
+ * beyond what it understands is 0. This allows userspace to use the backward
+ * compatible portion while consistently using the newer, larger, structures.
+ *
+ * ioctls use a standard meaning for common errnos:
+ *
+ * - ENOTTY: The IOCTL number itself is not supported at all
+ * - E2BIG: The IOCTL number is supported, but the provided structure has
+ * non-zero in a part the kernel does not understand.
+ * - EOPNOTSUPP: The IOCTL number is supported, and the structure is
+ * understood, however a known field has a value the kernel does not
+ * understand or support.
+ * - EINVAL: Everything about the IOCTL was understood, but a field is not
+ * correct.
+ * - ENOENT: A provided token does not exist.
+ * - ENOMEM: Out of memory.
+ * - EOVERFLOW: Mathematics overflowed.
+ *
+ * As well as additional errnos, within specific ioctls.
+ */
+
+/* The ioctl type, documented in ioctl-number.rst */
+#define LIVEUPDATE_IOCTL_TYPE 0xBA
+
+#endif /* _UAPI_LIVEUPDATE_H */
diff --git a/kernel/liveupdate/Kconfig b/kernel/liveupdate/Kconfig
index a973a54447de..90857dccb359 100644
--- a/kernel/liveupdate/Kconfig
+++ b/kernel/liveupdate/Kconfig
@@ -1,4 +1,10 @@
# SPDX-License-Identifier: GPL-2.0-only
+#
+# Copyright (c) 2025, Google LLC.
+# Pasha Tatashin <pasha.tatashin@soleen.com>
+#
+# Live Update Orchestrator
+#
menu "Live Update and Kexec HandOver"
depends on !DEFERRED_STRUCT_PAGE_INIT
@@ -51,4 +57,25 @@ config KEXEC_HANDOVER_ENABLE_DEFAULT
The default behavior can still be overridden at boot time by
passing 'kho=off'.
+config LIVEUPDATE
+ bool "Live Update Orchestrator"
+ depends on KEXEC_HANDOVER
+ help
+ Enable the Live Update Orchestrator. Live Update is a mechanism,
+ typically based on kexec, that allows the kernel to be updated
+ while keeping selected devices operational across the transition.
+ These devices are intended to be reclaimed by the new kernel and
+ re-attached to their original workload without requiring a device
+ reset.
+
+ Ability to handover a device from current to the next kernel depends
+ on specific support within device drivers and related kernel
+ subsystems.
+
+ This feature primarily targets virtual machine hosts to quickly update
+ the kernel hypervisor with minimal disruption to the running virtual
+ machines.
+
+ If unsure, say N.
+
endmenu
diff --git a/kernel/liveupdate/Makefile b/kernel/liveupdate/Makefile
index f52ce1ebcf86..413722002b7a 100644
--- a/kernel/liveupdate/Makefile
+++ b/kernel/liveupdate/Makefile
@@ -1,5 +1,11 @@
# SPDX-License-Identifier: GPL-2.0
+luo-y := \
+ luo_core.o \
+ luo_ioctl.o
+
obj-$(CONFIG_KEXEC_HANDOVER) += kexec_handover.o
obj-$(CONFIG_KEXEC_HANDOVER_DEBUG) += kexec_handover_debug.o
obj-$(CONFIG_KEXEC_HANDOVER_DEBUGFS) += kexec_handover_debugfs.o
+
+obj-$(CONFIG_LIVEUPDATE) += luo.o
diff --git a/kernel/liveupdate/luo_core.c b/kernel/liveupdate/luo_core.c
new file mode 100644
index 000000000000..0e1ab19fa1cd
--- /dev/null
+++ b/kernel/liveupdate/luo_core.c
@@ -0,0 +1,86 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+
+/**
+ * DOC: Live Update Orchestrator (LUO)
+ *
+ * Live Update is a specialized, kexec-based reboot process that allows a
+ * running kernel to be updated from one version to another while preserving
+ * the state of selected resources and keeping designated hardware devices
+ * operational. For these devices, DMA activity may continue throughout the
+ * kernel transition.
+ *
+ * While the primary use case driving this work is supporting live updates of
+ * the Linux kernel when it is used as a hypervisor in cloud environments, the
+ * LUO framework itself is designed to be workload-agnostic. Much like Kernel
+ * Live Patching, which applies security fixes regardless of the workload,
+ * Live Update facilitates a full kernel version upgrade for any type of system.
+ *
+ * For example, a non-hypervisor system running an in-memory cache like
+ * memcached with many gigabytes of data can use LUO. The userspace service
+ * can place its cache into a memfd, have its state preserved by LUO, and
+ * restore it immediately after the kernel kexec.
+ *
+ * Whether the system is running virtual machines, containers, a
+ * high-performance database, or networking services, LUO's primary goal is to
+ * enable a full kernel update by preserving critical userspace state and
+ * keeping essential devices operational.
+ *
+ * The core of LUO is a mechanism that tracks the progress of a live update,
+ * along with a callback API that allows other kernel subsystems to participate
+ * in the process. Example subsystems that can hook into LUO include: kvm,
+ * iommu, interrupts, vfio, participating filesystems, and memory management.
+ *
+ * LUO uses Kexec Handover to transfer memory state from the current kernel to
+ * the next kernel. For more details see
+ * Documentation/core-api/kho/concepts.rst.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/kobject.h>
+#include <linux/liveupdate.h>
+
+static struct {
+ bool enabled;
+} luo_global;
+
+static int __init early_liveupdate_param(char *buf)
+{
+ return kstrtobool(buf, &luo_global.enabled);
+}
+early_param("liveupdate", early_liveupdate_param);
+
+/* Public Functions */
+
+/**
+ * liveupdate_reboot() - Kernel reboot notifier for live update final
+ * serialization.
+ *
+ * This function is invoked directly from the reboot() syscall pathway
+ * if kexec is in progress.
+ *
+ * If any callback fails, this function aborts KHO, undoes the freeze()
+ * callbacks, and returns an error.
+ */
+int liveupdate_reboot(void)
+{
+ return 0;
+}
+
+/**
+ * liveupdate_enabled() - Check if the live update feature is enabled.
+ *
+ * This function returns the state of the live update feature flag, which
+ * can be controlled via the ``liveupdate`` kernel command-line parameter.
+ *
+ * Return: true if live update is enabled, false otherwise.
+ */
+bool liveupdate_enabled(void)
+{
+ return luo_global.enabled;
+}
diff --git a/kernel/liveupdate/luo_ioctl.c b/kernel/liveupdate/luo_ioctl.c
new file mode 100644
index 000000000000..44d365185f7c
--- /dev/null
+++ b/kernel/liveupdate/luo_ioctl.c
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+
+#include <linux/liveupdate.h>
+#include <linux/miscdevice.h>
+
+struct luo_device_state {
+ struct miscdevice miscdev;
+};
+
+static const struct file_operations luo_fops = {
+ .owner = THIS_MODULE,
+};
+
+static struct luo_device_state luo_dev = {
+ .miscdev = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "liveupdate",
+ .fops = &luo_fops,
+ },
+};
+
+static int __init liveupdate_ioctl_init(void)
+{
+ if (!liveupdate_enabled())
+ return 0;
+
+ return misc_register(&luo_dev.miscdev);
+}
+module_init(liveupdate_ioctl_init);
+
+static void __exit liveupdate_exit(void)
+{
+ misc_deregister(&luo_dev.miscdev);
+}
+module_exit(liveupdate_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Pasha Tatashin");
+MODULE_DESCRIPTION("Live Update Orchestrator");
+MODULE_VERSION("0.1");
--
2.52.0.rc1.455.g30608eb744-goog
* [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
2025-11-15 23:33 [PATCH v6 00/20] Live Update Orchestrator Pasha Tatashin
2025-11-15 23:33 ` [PATCH v6 01/20] liveupdate: luo_core: luo_ioctl: " Pasha Tatashin
@ 2025-11-15 23:33 ` Pasha Tatashin
2025-11-16 12:43 ` Mike Rapoport
2025-11-15 23:33 ` [PATCH v6 03/20] kexec: call liveupdate_reboot() before kexec Pasha Tatashin
` (17 subsequent siblings)
19 siblings, 1 reply; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-15 23:33 UTC (permalink / raw)
Integrate the LUO with the KHO framework to enable passing LUO state
across a kexec reboot.
When LUO transitions to the "prepared" state, it tells KHO to finalize,
so that all memory segments added to the KHO preservation list are
preserved. Once the "prepared" state is reached, no new segments can be
preserved. If LUO is canceled, it also tells KHO to cancel the
serialization, so that LUO can later enter the prepared state again.
This patch introduces the following changes:
- During the KHO finalization phase, allocate an FDT blob.
- Populate this FDT with a LUO compatibility string ("luo-v1").
LUO now depends on `CONFIG_KEXEC_HANDOVER`. The core state transition
logic (`luo_do_*_calls`) remains unimplemented in this patch.
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
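To make the handling of the "liveupdate-number" property easier to follow,
here is a small userspace model of the read side and of the increment applied
when the outgoing tree is built. It mirrors what the diff below does (an
exact 8-byte length check, an unaligned read, and "incoming + 1" for the
next kernel), but the function names are hypothetical and memcpy() stands in
for the kernel's get_unaligned().

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of reading the incoming "liveupdate-number" property.
 * @prop may be unaligned for a u64: FDT property data is only guaranteed
 * to be 4-byte aligned.
 */
static int luo_model_read_num(const void *prop, int len, uint64_t *num)
{
	/* As in the patch: the property must exist and be exactly 8 bytes. */
	if (!prop || len != (int)sizeof(*num))
		return -EINVAL;

	/* memcpy() is the portable userspace equivalent of get_unaligned(). */
	memcpy(num, prop, sizeof(*num));

	return 0;
}

/* The outgoing tree advertises the incoming count plus one. */
static uint64_t luo_model_next_num(uint64_t incoming)
{
	return incoming + 1;
}
```

On a cold boot there is no incoming tree, so the stored count stays 0 and
the first outgoing tree carries 1. Note also that the patch writes the value
with fdt_property() as raw bytes and reads it back the same way, so the
number is kept in the CPU's native byte order rather than in the big-endian
form that libfdt's integer helpers would produce.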
include/linux/liveupdate/abi/luo.h | 54 ++++++++++
kernel/liveupdate/luo_core.c | 153 ++++++++++++++++++++++++++++-
2 files changed, 206 insertions(+), 1 deletion(-)
create mode 100644 include/linux/liveupdate/abi/luo.h
diff --git a/include/linux/liveupdate/abi/luo.h b/include/linux/liveupdate/abi/luo.h
new file mode 100644
index 000000000000..9483a294287f
--- /dev/null
+++ b/include/linux/liveupdate/abi/luo.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+
+/**
+ * DOC: Live Update Orchestrator ABI
+ *
+ * This header defines the stable Application Binary Interface used by the
+ * Live Update Orchestrator to pass state from a pre-update kernel to a
+ * post-update kernel. The ABI is built upon the Kexec HandOver framework
+ * and uses a Flattened Device Tree to describe the preserved data.
+ *
+ * This interface is a contract. Any modification to the FDT structure, node
+ * properties, compatible strings, or the layout of the `__packed` serialization
+ * structures defined here constitutes a breaking change. Such changes require
+ * incrementing the version number in the relevant `_COMPATIBLE` string to
+ * prevent a new kernel from misinterpreting data from an old kernel.
+ *
+ * FDT Structure Overview:
+ * The entire LUO state is encapsulated within a single KHO entry named "LUO".
+ * This entry contains an FDT with the following layout:
+ *
+ * .. code-block:: none
+ *
+ * / {
+ * compatible = "luo-v1";
+ * liveupdate-number = <...>;
+ * };
+ *
+ * Main LUO Node (/):
+ *
+ * - compatible: "luo-v1"
+ * Identifies the overall LUO ABI version.
+ * - liveupdate-number: u64
+ * A counter tracking the number of successful live updates performed.
+ */
+
+#ifndef _LINUX_LIVEUPDATE_ABI_LUO_H
+#define _LINUX_LIVEUPDATE_ABI_LUO_H
+
+/*
+ * The LUO FDT hooks all LUO state for sessions, fds, etc.
+ * In the root it also carries the "liveupdate-number" 64-bit property that
+ * corresponds to the number of live-updates performed on this machine.
+ */
+#define LUO_FDT_SIZE PAGE_SIZE
+#define LUO_FDT_KHO_ENTRY_NAME "LUO"
+#define LUO_FDT_COMPATIBLE "luo-v1"
+#define LUO_FDT_LIVEUPDATE_NUM "liveupdate-number"
+
+#endif /* _LINUX_LIVEUPDATE_ABI_LUO_H */
diff --git a/kernel/liveupdate/luo_core.c b/kernel/liveupdate/luo_core.c
index 0e1ab19fa1cd..4a213b262b9f 100644
--- a/kernel/liveupdate/luo_core.c
+++ b/kernel/liveupdate/luo_core.c
@@ -42,11 +42,24 @@
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/io.h>
+#include <linux/kexec_handover.h>
#include <linux/kobject.h>
+#include <linux/libfdt.h>
#include <linux/liveupdate.h>
+#include <linux/liveupdate/abi/luo.h>
+#include <linux/mm.h>
+#include <linux/sizes.h>
+#include <linux/string.h>
+#include <linux/unaligned.h>
+
+#include "kexec_handover_internal.h"
static struct {
bool enabled;
+ void *fdt_out;
+ void *fdt_in;
+ u64 liveupdate_num;
} luo_global;
static int __init early_liveupdate_param(char *buf)
@@ -55,6 +68,129 @@ static int __init early_liveupdate_param(char *buf)
}
early_param("liveupdate", early_liveupdate_param);
+static int __init luo_early_startup(void)
+{
+ phys_addr_t fdt_phys;
+ int err, ln_size;
+ const void *ptr;
+
+ if (!kho_is_enabled()) {
+ if (liveupdate_enabled())
+ pr_warn("Disabling liveupdate because KHO is disabled\n");
+ luo_global.enabled = false;
+ return 0;
+ }
+
+ /* Retrieve LUO subtree, and verify its format. */
+ err = kho_retrieve_subtree(LUO_FDT_KHO_ENTRY_NAME, &fdt_phys);
+ if (err) {
+ if (err != -ENOENT) {
+ pr_err("failed to retrieve FDT '%s' from KHO: %pe\n",
+ LUO_FDT_KHO_ENTRY_NAME, ERR_PTR(err));
+ return err;
+ }
+
+ return 0;
+ }
+
+ luo_global.fdt_in = phys_to_virt(fdt_phys);
+ err = fdt_node_check_compatible(luo_global.fdt_in, 0,
+ LUO_FDT_COMPATIBLE);
+ if (err) {
+ pr_err("FDT '%s' is incompatible with '%s' [%d]\n",
+ LUO_FDT_KHO_ENTRY_NAME, LUO_FDT_COMPATIBLE, err);
+
+ return -EINVAL;
+ }
+
+ ln_size = 0;
+ ptr = fdt_getprop(luo_global.fdt_in, 0, LUO_FDT_LIVEUPDATE_NUM,
+ &ln_size);
+ if (!ptr || ln_size != sizeof(luo_global.liveupdate_num)) {
+ pr_err("Unable to get live update number '%s' [%d]\n",
+ LUO_FDT_LIVEUPDATE_NUM, ln_size);
+
+ return -EINVAL;
+ }
+
+ luo_global.liveupdate_num = get_unaligned((u64 *)ptr);
+ pr_info("Retrieved live update data, liveupdate number: %llu\n",
+ luo_global.liveupdate_num);
+
+ return 0;
+}
+
+static int __init liveupdate_early_init(void)
+{
+ int err;
+
+ err = luo_early_startup();
+ if (err) {
+ pr_err("The incoming tree failed to initialize properly [%pe], disabling live update\n",
+ ERR_PTR(err));
+ luo_global.enabled = false;
+ }
+
+ return err;
+}
+early_initcall(liveupdate_early_init);
+
+/* Called during boot to create outgoing LUO fdt tree */
+static int __init luo_fdt_setup(void)
+{
+ const u64 ln = luo_global.liveupdate_num + 1;
+ void *fdt_out;
+ int err;
+
+ fdt_out = kho_alloc_preserve(LUO_FDT_SIZE);
+ if (IS_ERR(fdt_out)) {
+ pr_err("failed to allocate/preserve FDT memory\n");
+ return PTR_ERR(fdt_out);
+ }
+
+ err = fdt_create(fdt_out, LUO_FDT_SIZE);
+ err |= fdt_finish_reservemap(fdt_out);
+ err |= fdt_begin_node(fdt_out, "");
+ err |= fdt_property_string(fdt_out, "compatible", LUO_FDT_COMPATIBLE);
+ err |= fdt_property(fdt_out, LUO_FDT_LIVEUPDATE_NUM, &ln, sizeof(ln));
+ err |= fdt_end_node(fdt_out);
+ err |= fdt_finish(fdt_out);
+ if (err)
+ goto exit_free;
+
+ err = kho_add_subtree(LUO_FDT_KHO_ENTRY_NAME, fdt_out);
+ if (err)
+ goto exit_free;
+ luo_global.fdt_out = fdt_out;
+
+ return 0;
+
+exit_free:
+ kho_unpreserve_free(fdt_out);
+ pr_err("failed to prepare LUO FDT: %d\n", err);
+
+ return err;
+}
+
+/*
+ * late initcall because it initializes the outgoing tree that is needed only
+ * once userspace starts using /dev/liveupdate.
+ */
+static int __init luo_late_startup(void)
+{
+ int err;
+
+ if (!liveupdate_enabled())
+ return 0;
+
+ err = luo_fdt_setup();
+ if (err)
+ luo_global.enabled = false;
+
+ return err;
+}
+late_initcall(luo_late_startup);
+
/* Public Functions */
/**
@@ -69,7 +205,22 @@ early_param("liveupdate", early_liveupdate_param);
*/
int liveupdate_reboot(void)
{
- return 0;
+ int err;
+
+ if (!liveupdate_enabled())
+ return 0;
+
+ err = kho_finalize();
+ if (err) {
+ pr_err("kho_finalize failed %d\n", err);
+ /*
+ * kho_finalize() may return libfdt errors; to avoid passing unknown
+ * errors to userspace, change this to EAGAIN.
+ */
+ err = -EAGAIN;
+ }
+
+ return err;
}
/**
--
2.52.0.rc1.455.g30608eb744-goog
* [PATCH v6 03/20] kexec: call liveupdate_reboot() before kexec
2025-11-15 23:33 [PATCH v6 00/20] Live Update Orchestrator Pasha Tatashin
2025-11-15 23:33 ` [PATCH v6 01/20] liveupdate: luo_core: luo_ioctl: " Pasha Tatashin
2025-11-15 23:33 ` [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO Pasha Tatashin
@ 2025-11-15 23:33 ` Pasha Tatashin
2025-11-16 12:44 ` Mike Rapoport
2025-11-15 23:33 ` [PATCH v6 04/20] liveupdate: luo_session: add sessions support Pasha Tatashin
` (16 subsequent siblings)
19 siblings, 1 reply; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-15 23:33 UTC (permalink / raw)
Modify kernel_kexec() to call liveupdate_reboot().
This ensures that the Live Update Orchestrator is notified just
before the kernel executes the kexec jump. The liveupdate_reboot()
function triggers the final freeze event, allowing participating
FDs to perform last-minute checks or state saving within the blackout
window.
If liveupdate_reboot() returns an error (indicating a failure during
LUO finalization), the kexec operation is aborted to avoid proceeding
with an inconsistent state, and the error is returned to userspace.
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
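The resulting control flow can be modeled in plain C as follows. This is an
illustrative sketch, not kernel code: the model_* functions and the
finalize_fn callback are hypothetical stand-ins, with the callback playing
the role of kho_finalize() from patch 02 and the "jumped" flag standing in
for the call to machine_kexec().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for kho_finalize(); returns 0 or a negative error. */
typedef int (*finalize_fn)(void);

/* Model of liveupdate_reboot() as implemented in patch 02 of this series. */
static int model_liveupdate_reboot(bool enabled, finalize_fn finalize)
{
	if (!enabled)
		return 0;

	/* Any finalization failure is reported to the caller as -EAGAIN. */
	if (finalize())
		return -EAGAIN;

	return 0;
}

/*
 * Model of the new check in kernel_kexec(): the jump to the next kernel is
 * only attempted when live update finalization succeeded.
 */
static int model_kernel_kexec(bool enabled, finalize_fn finalize, bool *jumped)
{
	int error = model_liveupdate_reboot(enabled, finalize);

	*jumped = false;
	if (error)
		return error;	/* corresponds to "goto Unlock" */

	*jumped = true;		/* stands in for machine_kexec() */
	return 0;
}

/* Example finalizers for exercising both paths. */
static int finalize_ok(void)
{
	return 0;
}

static int finalize_fail(void)
{
	return -3;	/* arbitrary non-errno failure code */
}
```

The point of the model is that a failed finalization never reaches the jump:
the caller sees EAGAIN, the current kernel keeps running, and the kexec can
be retried later. With live update disabled, the finalizer is not consulted
and kexec proceeds as before.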
kernel/kexec_core.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index a8890dd03a1d..3122235c225b 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -15,6 +15,7 @@
#include <linux/kexec.h>
#include <linux/mutex.h>
#include <linux/list.h>
+#include <linux/liveupdate.h>
#include <linux/highmem.h>
#include <linux/syscalls.h>
#include <linux/reboot.h>
@@ -1145,6 +1146,10 @@ int kernel_kexec(void)
goto Unlock;
}
+ error = liveupdate_reboot();
+ if (error)
+ goto Unlock;
+
#ifdef CONFIG_KEXEC_JUMP
if (kexec_image->preserve_context) {
/*
--
2.52.0.rc1.455.g30608eb744-goog
* [PATCH v6 04/20] liveupdate: luo_session: add sessions support
From: Pasha Tatashin @ 2025-11-15 23:33 UTC (permalink / raw)
Introduce the concept of "Live Update Sessions" within the LUO framework.
LUO sessions provide a mechanism to group and manage `struct file *`
instances (representing file descriptors) that need to be preserved
across a kexec-based live update.
Each session is identified by a unique name and acts as a container
for file objects whose state is critical to a userspace workload, such
as a virtual machine or a high-performance database, aiming to maintain
their functionality across a kernel transition.
This groundwork establishes the framework for preserving file-backed
state across kernel updates, with the actual file data preservation
mechanisms to be implemented in subsequent patches.
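As a rough illustration of the serialized layout this patch defines, the
userspace sketch below mirrors the two `__packed` structures and the
LUO_SESSION_MAX arithmetic. The names session_header_ser, session_ser,
session_capacity() and session_array() are hypothetical stand-ins, and a
4 KiB page size is assumed; this is not the kernel code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions for this sketch: 4 KiB pages, 16-page block as in the patch. */
#define SKETCH_PAGE_SIZE   4096u
#define SKETCH_PGCNT       16u
#define SKETCH_NAME_LENGTH 56	/* LIVEUPDATE_SESSION_NAME_LENGTH in this patch */

/* Mirrors struct luo_session_header_ser: sits at the start of the block. */
struct session_header_ser {
	uint64_t pgcnt;
	uint64_t count;
} __attribute__((packed));

/* Mirrors struct luo_session_ser: one entry per preserved session. */
struct session_ser {
	char name[SKETCH_NAME_LENGTH];
	uint64_t files;
	uint64_t pgcnt;
	uint64_t count;
} __attribute__((packed));

/* Number of entries of entry_size bytes that fit after the header. */
static size_t session_capacity(size_t entry_size)
{
	size_t block = (size_t)SKETCH_PGCNT * SKETCH_PAGE_SIZE;

	assert(entry_size > 0);
	return (block - sizeof(struct session_header_ser)) / entry_size;
}

/* The entry array starts immediately after the header, as in the patch. */
static struct session_ser *session_array(struct session_header_ser *hdr)
{
	return (struct session_ser *)(hdr + 1);
}
```

With a 56-byte name each entry is 80 bytes, so the 16-page block holds
819 entries, matching the comment above LUO_SESSION_PGCNT. By the same
arithmetic, an 88-byte entry (a 64-byte name) would fit 744 entries.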
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
include/linux/liveupdate/abi/luo.h | 83 +++++-
include/uapi/linux/liveupdate.h | 3 +
kernel/liveupdate/Makefile | 3 +-
kernel/liveupdate/luo_core.c | 10 +
kernel/liveupdate/luo_internal.h | 52 ++++
kernel/liveupdate/luo_session.c | 421 +++++++++++++++++++++++++++++
6 files changed, 570 insertions(+), 2 deletions(-)
create mode 100644 kernel/liveupdate/luo_internal.h
create mode 100644 kernel/liveupdate/luo_session.c
diff --git a/include/linux/liveupdate/abi/luo.h b/include/linux/liveupdate/abi/luo.h
index 9483a294287f..03a177ae232e 100644
--- a/include/linux/liveupdate/abi/luo.h
+++ b/include/linux/liveupdate/abi/luo.h
@@ -28,6 +28,11 @@
* / {
* compatible = "luo-v1";
* liveupdate-number = <...>;
+ *
+ * luo-session {
+ * compatible = "luo-session-v1";
+ * luo-session-header = <phys_addr_of_session_header_ser>;
+ * };
* };
*
* Main LUO Node (/):
@@ -36,14 +41,40 @@
* Identifies the overall LUO ABI version.
* - liveupdate-number: u64
* A counter tracking the number of successful live updates performed.
+ *
+ * Session Node (luo-session):
+ * This node describes all preserved user-space sessions.
+ *
+ * - compatible: "luo-session-v1"
+ * Identifies the session ABI version.
+ * - luo-session-header: u64
+ * The physical address of a `struct luo_session_header_ser`. This structure
+ * is the header for a contiguous block of memory containing an array of
+ * `struct luo_session_ser`, one for each preserved session.
+ *
+ * Serialization Structures:
+ * The FDT properties point to memory regions containing arrays of simple,
+ * `__packed` structures. These structures contain the actual preserved state.
+ *
+ * - struct luo_session_header_ser:
+ * Header for the session array. Contains the total page count of the
+ * preserved memory block and the number of `struct luo_session_ser`
+ * entries that follow.
+ *
+ * - struct luo_session_ser:
+ * Metadata for a single session, including its name and a physical pointer
+ * to another preserved memory block containing an array of
+ * `struct luo_file_ser` for all files in that session.
*/
#ifndef _LINUX_LIVEUPDATE_ABI_LUO_H
#define _LINUX_LIVEUPDATE_ABI_LUO_H
+#include <uapi/linux/liveupdate.h>
+
/*
* The LUO FDT hooks all LUO state for sessions, fds, etc.
- * In the root it allso carries "liveupdate-number" 64-bit property that
+ * In the root it also carries "liveupdate-number" 64-bit property that
* corresponds to the number of live-updates performed on this machine.
*/
#define LUO_FDT_SIZE PAGE_SIZE
@@ -51,4 +82,54 @@
#define LUO_FDT_COMPATIBLE "luo-v1"
#define LUO_FDT_LIVEUPDATE_NUM "liveupdate-number"
+/*
+ * LUO FDT session node
+ * LUO_FDT_SESSION_HEADER: is a u64 physical address of struct
+ * luo_session_header_ser
+ */
+#define LUO_FDT_SESSION_NODE_NAME "luo-session"
+#define LUO_FDT_SESSION_COMPATIBLE "luo-session-v1"
+#define LUO_FDT_SESSION_HEADER "luo-session-header"
+
+/**
+ * struct luo_session_header_ser - Header for the serialized session data block.
+ * @pgcnt: The total size, in pages, of the entire preserved memory block
+ * that this header describes.
+ * @count: The number of 'struct luo_session_ser' entries that immediately
+ * follow this header in the memory block.
+ *
+ * This structure is located at the beginning of a contiguous block of
+ * physical memory preserved across the kexec. It provides the necessary
+ * metadata to interpret the array of session entries that follow.
+ */
+struct luo_session_header_ser {
+ u64 pgcnt;
+ u64 count;
+} __packed;
+
+/**
+ * struct luo_session_ser - Represents the serialized metadata for a LUO session.
+ * @name: The unique name of the session, copied from the `luo_session`
+ * structure.
+ * @files: The physical address of a contiguous memory block that holds
+ * the serialized state of files.
+ * @pgcnt: The number of pages occupied by the `files` memory block.
+ * @count: The total number of files that were part of this session during
+ * serialization. Used for iteration and validation during
+ * restoration.
+ *
+ * This structure is used to package session-specific metadata for transfer
+ * between kernels via Kexec Handover. An array of these structures (one per
+ * session) is created and passed to the new kernel, allowing it to reconstruct
+ * the session context.
+ *
+ * If this structure is modified, LUO_FDT_SESSION_COMPATIBLE must be updated.
+ */
+struct luo_session_ser {
+ char name[LIVEUPDATE_SESSION_NAME_LENGTH];
+ u64 files;
+ u64 pgcnt;
+ u64 count;
+} __packed;
+
#endif /* _LINUX_LIVEUPDATE_ABI_LUO_H */
diff --git a/include/uapi/linux/liveupdate.h b/include/uapi/linux/liveupdate.h
index df34c1642c4d..d2ef2f7e0dbd 100644
--- a/include/uapi/linux/liveupdate.h
+++ b/include/uapi/linux/liveupdate.h
@@ -43,4 +43,7 @@
/* The ioctl type, documented in ioctl-number.rst */
#define LIVEUPDATE_IOCTL_TYPE 0xBA
+/* The maximum length of session name including null termination */
+#define LIVEUPDATE_SESSION_NAME_LENGTH 56
+
#endif /* _UAPI_LIVEUPDATE_H */
diff --git a/kernel/liveupdate/Makefile b/kernel/liveupdate/Makefile
index 413722002b7a..83285e7ad726 100644
--- a/kernel/liveupdate/Makefile
+++ b/kernel/liveupdate/Makefile
@@ -2,7 +2,8 @@
luo-y := \
luo_core.o \
- luo_ioctl.o
+ luo_ioctl.o \
+ luo_session.o
obj-$(CONFIG_KEXEC_HANDOVER) += kexec_handover.o
obj-$(CONFIG_KEXEC_HANDOVER_DEBUG) += kexec_handover_debug.o
diff --git a/kernel/liveupdate/luo_core.c b/kernel/liveupdate/luo_core.c
index 4a213b262b9f..653cdca5e25d 100644
--- a/kernel/liveupdate/luo_core.c
+++ b/kernel/liveupdate/luo_core.c
@@ -54,6 +54,7 @@
#include <linux/unaligned.h>
#include "kexec_handover_internal.h"
+#include "luo_internal.h"
static struct {
bool enabled;
@@ -117,6 +118,10 @@ static int __init luo_early_startup(void)
pr_info("Retrieved live update data, liveupdate number: %lld\n",
luo_global.liveupdate_num);
+ err = luo_session_setup_incoming(luo_global.fdt_in);
+ if (err)
+ return err;
+
return 0;
}
@@ -153,6 +158,7 @@ static int __init luo_fdt_setup(void)
err |= fdt_begin_node(fdt_out, "");
err |= fdt_property_string(fdt_out, "compatible", LUO_FDT_COMPATIBLE);
err |= fdt_property(fdt_out, LUO_FDT_LIVEUPDATE_NUM, &ln, sizeof(ln));
+ err |= luo_session_setup_outgoing(fdt_out);
err |= fdt_end_node(fdt_out);
err |= fdt_finish(fdt_out);
if (err)
@@ -210,6 +216,10 @@ int liveupdate_reboot(void)
if (!liveupdate_enabled())
return 0;
+ err = luo_session_serialize();
+ if (err)
+ return err;
+
err = kho_finalize();
if (err) {
pr_err("kho_finalize failed %d\n", err);
diff --git a/kernel/liveupdate/luo_internal.h b/kernel/liveupdate/luo_internal.h
new file mode 100644
index 000000000000..245373edfa6f
--- /dev/null
+++ b/kernel/liveupdate/luo_internal.h
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+
+#ifndef _LINUX_LUO_INTERNAL_H
+#define _LINUX_LUO_INTERNAL_H
+
+#include <linux/liveupdate.h>
+
+/**
+ * struct luo_session - Represents an active or incoming Live Update session.
+ * @name: A unique name for this session, used for identification and
+ * retrieval.
+ * @files_list: An ordered list of files associated with this session,
+ * ordered by preservation time.
+ * @ser: Pointer to the serialized data for this session.
+ * @count: A counter tracking the number of files currently stored in the
+ * @files_list for this session.
+ * @list: A list_head member used to link this session into a global list
+ * of either outgoing (to be preserved) or incoming (restored from
+ * previous kernel) sessions.
+ * @retrieved: A boolean flag indicating whether this session has been
+ * retrieved by a consumer in the new kernel.
+ * @mutex: Session lock, protects @files_list and @count.
+ * @files: The physically contiguous memory block that holds the serialized
+ * state of files.
+ * @pgcnt: The number of pages @files occupies.
+ */
+struct luo_session {
+ char name[LIVEUPDATE_SESSION_NAME_LENGTH];
+ struct list_head files_list;
+ struct luo_session_ser *ser;
+ long count;
+ struct list_head list;
+ bool retrieved;
+ struct mutex mutex;
+ struct luo_file_ser *files;
+ u64 pgcnt;
+};
+
+int luo_session_create(const char *name, struct file **filep);
+int luo_session_retrieve(const char *name, struct file **filep);
+int __init luo_session_setup_outgoing(void *fdt);
+int __init luo_session_setup_incoming(void *fdt);
+int luo_session_serialize(void);
+int luo_session_deserialize(void);
+bool luo_session_is_deserialized(void);
+
+#endif /* _LINUX_LUO_INTERNAL_H */
diff --git a/kernel/liveupdate/luo_session.c b/kernel/liveupdate/luo_session.c
new file mode 100644
index 000000000000..cb74bfaba479
--- /dev/null
+++ b/kernel/liveupdate/luo_session.c
@@ -0,0 +1,421 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+
+/**
+ * DOC: LUO Sessions
+ *
+ * LUO Sessions provide the core mechanism for grouping and managing `struct
+ * file *` instances that need to be preserved across a kexec-based live
+ * update. Each session acts as a named container for a set of file objects,
+ * allowing a userspace agent to manage the lifecycle of resources critical to a
+ * workload.
+ *
+ * Core Concepts:
+ *
+ * - Named Containers: Sessions are identified by a unique, user-provided name,
+ * which is used for both creation in the current kernel and retrieval in the
+ * next kernel.
+ *
+ * - Userspace Interface: Session management is driven from userspace via
+ * ioctls on /dev/liveupdate.
+ *
+ * - Serialization: Session metadata is preserved using the KHO framework. When
+ * a live update is triggered via kexec, an array of `struct luo_session_ser`
+ * is populated and placed in a preserved memory region. An FDT node is also
+ * created, containing the physical address of the header that precedes this
+ * array; the header records the number of sessions.
+ *
+ * Session Lifecycle:
+ *
+ * 1. Creation: A userspace agent calls `luo_session_create()` to create a
+ * new, empty session and receives a file descriptor for it.
+ *
+ * 2. Serialization: When the `reboot(LINUX_REBOOT_CMD_KEXEC)` syscall is
+ * made, `luo_session_serialize()` is called. It iterates through all
+ * active sessions and writes their metadata into a memory area preserved
+ * by KHO.
+ *
+ * 3. Deserialization (in new kernel): After kexec, `luo_session_deserialize()`
+ * runs, reading the serialized data and creating a list of `struct
+ * luo_session` objects representing the preserved sessions.
+ *
+ * 4. Retrieval: A userspace agent in the new kernel can then call
+ * `luo_session_retrieve()` with a session name to get a new file
+ * descriptor and access the preserved state.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/anon_inodes.h>
+#include <linux/cleanup.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/io.h>
+#include <linux/kexec_handover.h>
+#include <linux/libfdt.h>
+#include <linux/list.h>
+#include <linux/liveupdate.h>
+#include <linux/liveupdate/abi/luo.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/unaligned.h>
+#include <uapi/linux/liveupdate.h>
+#include "luo_internal.h"
+
+/* 16 4K pages, give space for 819 sessions */
+#define LUO_SESSION_PGCNT 16ul
+#define LUO_SESSION_MAX (((LUO_SESSION_PGCNT << PAGE_SHIFT) - \
+ sizeof(struct luo_session_header_ser)) / \
+ sizeof(struct luo_session_ser))
+
+/**
+ * struct luo_session_header - Header struct for managing LUO sessions.
+ * @count: The number of sessions currently tracked in the @list.
+ * @list: The head of the linked list of `struct luo_session` instances.
+ * @rwsem: A read-write semaphore providing synchronized access to the
+ * session list and other fields in this structure.
+ * @header_ser: The header data of serialization array.
+ * @ser: The serialized session data (an array of
+ * `struct luo_session_ser`).
+ * @active: Set to true when first initialized. If previous kernel did not
+ * send session data, active stays false for incoming.
+ */
+struct luo_session_header {
+ long count;
+ struct list_head list;
+ struct rw_semaphore rwsem;
+ struct luo_session_header_ser *header_ser;
+ struct luo_session_ser *ser;
+ bool active;
+};
+
+/**
+ * struct luo_session_global - Global container for managing LUO sessions.
+ * @incoming: The sessions passed from the previous kernel.
+ * @outgoing: The sessions that are going to be passed to the next kernel.
+ * @deserialized: The sessions have been deserialized once /dev/liveupdate
+ * has been opened.
+ */
+struct luo_session_global {
+ struct luo_session_header incoming;
+ struct luo_session_header outgoing;
+ bool deserialized;
+};
+
+static struct luo_session_global luo_session_global;
+
+static struct luo_session *luo_session_alloc(const char *name)
+{
+ struct luo_session *session = kzalloc(sizeof(*session), GFP_KERNEL);
+
+ if (!session)
+ return ERR_PTR(-ENOMEM);
+
+ strscpy(session->name, name, sizeof(session->name));
+ INIT_LIST_HEAD(&session->files_list);
+ INIT_LIST_HEAD(&session->list);
+ mutex_init(&session->mutex);
+ session->count = 0;
+
+ return session;
+}
+
+static void luo_session_free(struct luo_session *session)
+{
+ WARN_ON(session->count);
+ WARN_ON(!list_empty(&session->files_list));
+ mutex_destroy(&session->mutex);
+ kfree(session);
+}
+
+static int luo_session_insert(struct luo_session_header *sh,
+ struct luo_session *session)
+{
+ struct luo_session *it;
+
+ guard(rwsem_write)(&sh->rwsem);
+
+ /*
+ * For outgoing we should make sure there is room in serialization array
+ * for new session.
+ */
+ if (sh == &luo_session_global.outgoing) {
+ if (sh->count == LUO_SESSION_MAX)
+ return -ENOMEM;
+ }
+
+ /*
+ * For small number of sessions this loop won't hurt performance
+ * but if we ever start using a lot of sessions, this might
+ * become a bottleneck during deserialization time, as it would
+ * cause O(n*n) complexity.
+ */
+ list_for_each_entry(it, &sh->list, list) {
+ if (!strncmp(it->name, session->name, sizeof(it->name)))
+ return -EEXIST;
+ }
+ list_add_tail(&session->list, &sh->list);
+ sh->count++;
+
+ return 0;
+}
+
+static void luo_session_remove(struct luo_session_header *sh,
+ struct luo_session *session)
+{
+ guard(rwsem_write)(&sh->rwsem);
+ list_del(&session->list);
+ sh->count--;
+}
+
+static int luo_session_release(struct inode *inodep, struct file *filep)
+{
+ struct luo_session *session = filep->private_data;
+ struct luo_session_header *sh;
+
+ /* If retrieved is set, it means this session is from incoming list */
+ if (session->retrieved)
+ sh = &luo_session_global.incoming;
+ else
+ sh = &luo_session_global.outgoing;
+
+ luo_session_remove(sh, session);
+ luo_session_free(session);
+
+ return 0;
+}
+
+static const struct file_operations luo_session_fops = {
+ .owner = THIS_MODULE,
+ .release = luo_session_release,
+};
+
+/* Create a "struct file" for session */
+static int luo_session_getfile(struct luo_session *session, struct file **filep)
+{
+ char name_buf[128];
+ struct file *file;
+
+ guard(mutex)(&session->mutex);
+ snprintf(name_buf, sizeof(name_buf), "[luo_session] %s", session->name);
+ file = anon_inode_getfile(name_buf, &luo_session_fops, session, O_RDWR);
+ if (IS_ERR(file))
+ return PTR_ERR(file);
+
+ *filep = file;
+
+ return 0;
+}
+
+int luo_session_create(const char *name, struct file **filep)
+{
+ struct luo_session *session;
+ int err;
+
+ session = luo_session_alloc(name);
+ if (IS_ERR(session))
+ return PTR_ERR(session);
+
+ err = luo_session_insert(&luo_session_global.outgoing, session);
+ if (err)
+ goto err_free;
+
+ err = luo_session_getfile(session, filep);
+ if (err)
+ goto err_remove;
+
+ return 0;
+
+err_remove:
+ luo_session_remove(&luo_session_global.outgoing, session);
+err_free:
+ luo_session_free(session);
+
+ return err;
+}
+
+int luo_session_retrieve(const char *name, struct file **filep)
+{
+ struct luo_session_header *sh = &luo_session_global.incoming;
+ struct luo_session *session = NULL;
+ struct luo_session *it;
+ int err;
+
+ scoped_guard(rwsem_read, &sh->rwsem) {
+ list_for_each_entry(it, &sh->list, list) {
+ if (!strncmp(it->name, name, sizeof(it->name))) {
+ session = it;
+ break;
+ }
+ }
+ }
+
+ if (!session)
+ return -ENOENT;
+
+ scoped_guard(mutex, &session->mutex) {
+ if (session->retrieved)
+ return -EINVAL;
+ }
+
+ err = luo_session_getfile(session, filep);
+ if (!err) {
+ scoped_guard(mutex, &session->mutex)
+ session->retrieved = true;
+ }
+
+ return err;
+}
+
+int __init luo_session_setup_outgoing(void *fdt_out)
+{
+ struct luo_session_header_ser *header_ser;
+ u64 header_ser_pa;
+ int err;
+
+ header_ser = kho_alloc_preserve(LUO_SESSION_PGCNT << PAGE_SHIFT);
+ if (IS_ERR(header_ser))
+ return PTR_ERR(header_ser);
+ header_ser_pa = virt_to_phys(header_ser);
+
+ err = fdt_begin_node(fdt_out, LUO_FDT_SESSION_NODE_NAME);
+ err |= fdt_property_string(fdt_out, "compatible",
+ LUO_FDT_SESSION_COMPATIBLE);
+ err |= fdt_property(fdt_out, LUO_FDT_SESSION_HEADER, &header_ser_pa,
+ sizeof(header_ser_pa));
+ err |= fdt_end_node(fdt_out);
+
+ if (err)
+ goto err_unpreserve;
+
+ header_ser->pgcnt = LUO_SESSION_PGCNT;
+ INIT_LIST_HEAD(&luo_session_global.outgoing.list);
+ init_rwsem(&luo_session_global.outgoing.rwsem);
+ luo_session_global.outgoing.header_ser = header_ser;
+ luo_session_global.outgoing.ser = (void *)(header_ser + 1);
+ luo_session_global.outgoing.active = true;
+
+ return 0;
+
+err_unpreserve:
+ kho_unpreserve_free(header_ser);
+ return err;
+}
+
+int __init luo_session_setup_incoming(void *fdt_in)
+{
+ struct luo_session_header_ser *header_ser;
+ int err, header_size, offset;
+ u64 header_ser_pa;
+ const void *ptr;
+
+ offset = fdt_subnode_offset(fdt_in, 0, LUO_FDT_SESSION_NODE_NAME);
+ if (offset < 0) {
+ pr_err("Unable to get session node: [%s]\n",
+ LUO_FDT_SESSION_NODE_NAME);
+ return -EINVAL;
+ }
+
+ err = fdt_node_check_compatible(fdt_in, offset,
+ LUO_FDT_SESSION_COMPATIBLE);
+ if (err) {
+ pr_err("Session node incompatible [%s]\n",
+ LUO_FDT_SESSION_COMPATIBLE);
+ return -EINVAL;
+ }
+
+ header_size = 0;
+ ptr = fdt_getprop(fdt_in, offset, LUO_FDT_SESSION_HEADER, &header_size);
+ if (!ptr || header_size != sizeof(u64)) {
+ pr_err("Unable to get session header '%s' [%d]\n",
+ LUO_FDT_SESSION_HEADER, header_size);
+ return -EINVAL;
+ }
+
+ header_ser_pa = get_unaligned((u64 *)ptr);
+ header_ser = phys_to_virt(header_ser_pa);
+
+ luo_session_global.incoming.header_ser = header_ser;
+ luo_session_global.incoming.ser = (void *)(header_ser + 1);
+ INIT_LIST_HEAD(&luo_session_global.incoming.list);
+ init_rwsem(&luo_session_global.incoming.rwsem);
+ luo_session_global.incoming.active = true;
+
+ return 0;
+}
+
+bool luo_session_is_deserialized(void)
+{
+ return luo_session_global.deserialized;
+}
+
+int luo_session_deserialize(void)
+{
+ struct luo_session_header *sh = &luo_session_global.incoming;
+ int err;
+
+ if (luo_session_is_deserialized())
+ return 0;
+
+ luo_session_global.deserialized = true;
+ if (!sh->active) {
+ INIT_LIST_HEAD(&sh->list);
+ init_rwsem(&sh->rwsem);
+ return 0;
+ }
+
+ for (int i = 0; i < sh->header_ser->count; i++) {
+ struct luo_session *session;
+
+ session = luo_session_alloc(sh->ser[i].name);
+ if (IS_ERR(session)) {
+ pr_warn("Failed to allocate session [%s] during deserialization %pe\n",
+ sh->ser[i].name, session);
+ return PTR_ERR(session);
+ }
+
+ err = luo_session_insert(sh, session);
+ if (err) {
+ pr_warn("Failed to insert session [%s] %pe\n",
+ session->name, ERR_PTR(err));
+ luo_session_free(session);
+ return err;
+ }
+
+ session->count = sh->ser[i].count;
+ session->files = sh->ser[i].files ? phys_to_virt(sh->ser[i].files) : 0;
+ session->pgcnt = sh->ser[i].pgcnt;
+ }
+
+ kho_restore_free(sh->header_ser);
+ sh->header_ser = NULL;
+ sh->ser = NULL;
+
+ return 0;
+}
+
+int luo_session_serialize(void)
+{
+ struct luo_session_header *sh = &luo_session_global.outgoing;
+ struct luo_session *session;
+ int i = 0;
+
+ guard(rwsem_write)(&sh->rwsem);
+ list_for_each_entry(session, &sh->list, list) {
+ strscpy(sh->ser[i].name, session->name,
+ sizeof(sh->ser[i].name));
+ sh->ser[i].count = session->count;
+ sh->ser[i].files = session->files ? virt_to_phys(session->files) : 0;
+ sh->ser[i].pgcnt = session->pgcnt;
+ i++;
+ }
+ sh->header_ser->count = sh->count;
+
+ return 0;
+}
--
2.52.0.rc1.455.g30608eb744-goog
* [PATCH v6 05/20] liveupdate: luo_ioctl: add user interface
From: Pasha Tatashin @ 2025-11-15 23:33 UTC (permalink / raw)
Introduce the user-space interface for the Live Update Orchestrator
via ioctl commands, enabling external control over the live update
process and management of preserved resources.
A single userspace agent is expected to drive the live update;
therefore, only one process can hold this device open at any given
time.
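The single-opener policy can be modelled in userspace roughly as follows.
This is only a sketch using C11 atomics in place of the kernel's
atomic_cmpxchg(); struct luo_dev_model, model_open() and model_release()
are hypothetical names introduced for illustration.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical model of the device state: one flag guards the opener slot. */
struct luo_dev_model {
	atomic_int in_use;
};

/* Claim the device; fails with -EBUSY if another opener already holds it. */
static int model_open(struct luo_dev_model *dev)
{
	int expected = 0;

	/* Atomically flip 0 -> 1; a second caller sees 1 and is refused. */
	if (!atomic_compare_exchange_strong(&dev->in_use, &expected, 1))
		return -EBUSY;
	return 0;
}

/* Release the device so that the next open attempt can succeed. */
static void model_release(struct luo_dev_model *dev)
{
	atomic_store(&dev->in_use, 0);
}
```

The compare-and-exchange makes the check and the claim one atomic step,
so two racing openers cannot both observe the device as free.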
The following ioctl commands are introduced:
LIVEUPDATE_IOCTL_CREATE_SESSION
Provides a way for userspace to create a named session for grouping file
descriptors that need to be preserved. It returns a new file descriptor
representing the session.
LIVEUPDATE_IOCTL_RETRIEVE_SESSION
Allows the userspace agent in the new kernel to reclaim a preserved
session by its name, receiving a new file descriptor to manage the
restored resources.
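From userspace, both commands could be driven along the lines of the
sketch below. It mirrors the uapi definitions from this patch locally so
that it builds without the new header; a real agent would presumably
include <linux/liveupdate.h> instead. The helper luo_session_fd() and the
shared struct liveupdate_ioctl_session_args are hypothetical names (the
two uapi structures have the same layout in this patch).

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Local mirror of the uapi values in this patch (assumption, see above). */
#define LIVEUPDATE_IOCTL_TYPE          0xBA
#define LIVEUPDATE_SESSION_NAME_LENGTH 64

/* Hypothetical shared type: same layout as both uapi argument structs. */
struct liveupdate_ioctl_session_args {
	uint32_t size;	/* input: sizeof() of this structure */
	int32_t fd;	/* output: new session file descriptor */
	uint8_t name[LIVEUPDATE_SESSION_NAME_LENGTH];
};

#define LIVEUPDATE_IOCTL_CREATE_SESSION   _IO(LIVEUPDATE_IOCTL_TYPE, 0x00)
#define LIVEUPDATE_IOCTL_RETRIEVE_SESSION _IO(LIVEUPDATE_IOCTL_TYPE, 0x01)

/*
 * Issue a create or retrieve request on an open /dev/liveupdate descriptor.
 * Returns the session fd on success, or -1 with errno set on failure.
 */
static int luo_session_fd(int luo_fd, unsigned long cmd, const char *name)
{
	struct liveupdate_ioctl_session_args args = {
		.size = sizeof(args),
		.fd = -1,
	};
	size_t len = strlen(name);

	/* The name must fit including its terminating NUL byte. */
	if (len >= sizeof(args.name)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	memcpy(args.name, name, len);	/* remainder is already zeroed */

	if (ioctl(luo_fd, cmd, &args) < 0)
		return -1;
	return args.fd;
}
```

An agent would open /dev/liveupdate once, call
luo_session_fd(fd, LIVEUPDATE_IOCTL_CREATE_SESSION, "vm0") before the
update, and pass LIVEUPDATE_IOCTL_RETRIEVE_SESSION with the same name in
the new kernel; "vm0" is just an example name.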
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
include/uapi/linux/liveupdate.h | 66 +++++++++++-
kernel/liveupdate/luo_internal.h | 21 ++++
kernel/liveupdate/luo_ioctl.c | 178 +++++++++++++++++++++++++++++++
3 files changed, 264 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/liveupdate.h b/include/uapi/linux/liveupdate.h
index d2ef2f7e0dbd..6e04254ee535 100644
--- a/include/uapi/linux/liveupdate.h
+++ b/include/uapi/linux/liveupdate.h
@@ -44,6 +44,70 @@
#define LIVEUPDATE_IOCTL_TYPE 0xBA
/* The maximum length of session name including null termination */
-#define LIVEUPDATE_SESSION_NAME_LENGTH 56
+#define LIVEUPDATE_SESSION_NAME_LENGTH 64
+
+/* The /dev/liveupdate ioctl commands */
+enum {
+ LIVEUPDATE_CMD_BASE = 0x00,
+ LIVEUPDATE_CMD_CREATE_SESSION = LIVEUPDATE_CMD_BASE,
+ LIVEUPDATE_CMD_RETRIEVE_SESSION = 0x01,
+};
+
+/**
+ * struct liveupdate_ioctl_create_session - ioctl(LIVEUPDATE_IOCTL_CREATE_SESSION)
+ * @size: Input; sizeof(struct liveupdate_ioctl_create_session)
+ * @fd: Output; The new file descriptor for the created session.
+ * @name: Input; A null-terminated string for the session name, max
+ * length %LIVEUPDATE_SESSION_NAME_LENGTH including termination
+ * char.
+ *
+ * Creates a new live update session for managing preserved resources.
+ * This ioctl can only be called on the main /dev/liveupdate device.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+struct liveupdate_ioctl_create_session {
+ __u32 size;
+ __s32 fd;
+ __u8 name[LIVEUPDATE_SESSION_NAME_LENGTH];
+};
+
+#define LIVEUPDATE_IOCTL_CREATE_SESSION \
+ _IO(LIVEUPDATE_IOCTL_TYPE, LIVEUPDATE_CMD_CREATE_SESSION)
+
+/**
+ * struct liveupdate_ioctl_retrieve_session - ioctl(LIVEUPDATE_IOCTL_RETRIEVE_SESSION)
+ * @size: Input; sizeof(struct liveupdate_ioctl_retrieve_session)
+ * @fd: Output; The new file descriptor for the retrieved session.
+ * @name: Input; A null-terminated string identifying the session to retrieve.
+ * The name must exactly match the name used when the session was
+ * created in the previous kernel.
+ *
+ * Retrieves a handle (a new file descriptor) for a preserved session by its
+ * name. This is the primary mechanism for a userspace agent to regain control
+ * of its preserved resources after a live update.
+ *
+ * The userspace application provides the null-terminated `name` of a session
+ * it created before the live update. If a preserved session with a matching
+ * name is found, the kernel instantiates it and returns a new file descriptor
+ * in the `fd` field. This new session FD can then be used for all file-specific
+ * operations, such as restoring individual file descriptors with
+ * LIVEUPDATE_SESSION_RETRIEVE_FD.
+ *
+ * It is the responsibility of the userspace application to know the names of
+ * the sessions it needs to retrieve. If no session with the given name is
+ * found, the ioctl will fail with -ENOENT.
+ *
+ * This ioctl can only be called on the main /dev/liveupdate device when the
+ * system is in the LIVEUPDATE_STATE_UPDATED state.
+ */
+struct liveupdate_ioctl_retrieve_session {
+ __u32 size;
+ __s32 fd;
+ __u8 name[LIVEUPDATE_SESSION_NAME_LENGTH];
+};
+
+#define LIVEUPDATE_IOCTL_RETRIEVE_SESSION \
+ _IO(LIVEUPDATE_IOCTL_TYPE, LIVEUPDATE_CMD_RETRIEVE_SESSION)
#endif /* _UAPI_LIVEUPDATE_H */
diff --git a/kernel/liveupdate/luo_internal.h b/kernel/liveupdate/luo_internal.h
index 245373edfa6f..5185ad37a8c1 100644
--- a/kernel/liveupdate/luo_internal.h
+++ b/kernel/liveupdate/luo_internal.h
@@ -9,6 +9,27 @@
#define _LINUX_LUO_INTERNAL_H
#include <linux/liveupdate.h>
+#include <linux/uaccess.h>
+
+struct luo_ucmd {
+ void __user *ubuffer;
+ u32 user_size;
+ void *cmd;
+};
+
+static inline int luo_ucmd_respond(struct luo_ucmd *ucmd,
+ size_t kernel_cmd_size)
+{
+ /*
+ * Copy the minimum of what the user provided and what we actually
+ * have.
+ */
+ if (copy_to_user(ucmd->ubuffer, ucmd->cmd,
+ min_t(size_t, ucmd->user_size, kernel_cmd_size))) {
+ return -EFAULT;
+ }
+ return 0;
+}
/**
* struct luo_session - Represents an active or incoming Live Update session.
diff --git a/kernel/liveupdate/luo_ioctl.c b/kernel/liveupdate/luo_ioctl.c
index 44d365185f7c..367385efa962 100644
--- a/kernel/liveupdate/luo_ioctl.c
+++ b/kernel/liveupdate/luo_ioctl.c
@@ -5,15 +5,192 @@
* Pasha Tatashin <pasha.tatashin@soleen.com>
*/
+/**
+ * DOC: LUO ioctl Interface
+ *
+ * This is the ioctl-based user-space control interface for the LUO subsystem.
+ * It registers a character device, typically found at ``/dev/liveupdate``,
+ * which allows a userspace agent to manage the LUO state machine and its
+ * associated resources, such as preservable file descriptors.
+ *
+ * To ensure that the state machine is controlled by a single entity, access
+ * to this device is exclusive: only one process is permitted to have
+ * ``/dev/liveupdate`` open at any given time. Subsequent open attempts will
+ * fail with -EBUSY until the first process closes its file descriptor.
+ * This singleton model simplifies state management by preventing conflicting
+ * commands from multiple userspace agents.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/atomic.h>
+#include <linux/errno.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
#include <linux/liveupdate.h>
#include <linux/miscdevice.h>
+#include <uapi/linux/liveupdate.h>
+#include "luo_internal.h"
struct luo_device_state {
struct miscdevice miscdev;
+ atomic_t in_use;
+};
+
+static int luo_ioctl_create_session(struct luo_ucmd *ucmd)
+{
+ struct liveupdate_ioctl_create_session *argp = ucmd->cmd;
+ struct file *file;
+ int err;
+
+ argp->fd = get_unused_fd_flags(O_CLOEXEC);
+ if (argp->fd < 0)
+ return argp->fd;
+
+ err = luo_session_create(argp->name, &file);
+ if (err)
+ goto err_put_fd;
+
+ err = luo_ucmd_respond(ucmd, sizeof(*argp));
+ if (err)
+ goto err_put_file;
+
+ fd_install(argp->fd, file);
+
+ return 0;
+
+err_put_file:
+ fput(file);
+err_put_fd:
+ put_unused_fd(argp->fd);
+
+ return err;
+}
+
+static int luo_ioctl_retrieve_session(struct luo_ucmd *ucmd)
+{
+ struct liveupdate_ioctl_retrieve_session *argp = ucmd->cmd;
+ struct file *file;
+ int err;
+
+ argp->fd = get_unused_fd_flags(O_CLOEXEC);
+ if (argp->fd < 0)
+ return argp->fd;
+
+ err = luo_session_retrieve(argp->name, &file);
+ if (err < 0)
+ goto err_put_fd;
+
+ err = luo_ucmd_respond(ucmd, sizeof(*argp));
+ if (err)
+ goto err_put_file;
+
+ fd_install(argp->fd, file);
+
+ return 0;
+
+err_put_file:
+ fput(file);
+err_put_fd:
+ put_unused_fd(argp->fd);
+
+ return err;
+}
+
+static int luo_open(struct inode *inodep, struct file *filep)
+{
+ struct luo_device_state *ldev = container_of(filep->private_data,
+ struct luo_device_state,
+ miscdev);
+
+ if (atomic_cmpxchg(&ldev->in_use, 0, 1))
+ return -EBUSY;
+
+ luo_session_deserialize();
+
+ return 0;
+}
+
+static int luo_release(struct inode *inodep, struct file *filep)
+{
+ struct luo_device_state *ldev = container_of(filep->private_data,
+ struct luo_device_state,
+ miscdev);
+ atomic_set(&ldev->in_use, 0);
+
+ return 0;
+}
+
+union ucmd_buffer {
+ struct liveupdate_ioctl_create_session create;
+ struct liveupdate_ioctl_retrieve_session retrieve;
+};
+
+struct luo_ioctl_op {
+ unsigned int size;
+ unsigned int min_size;
+ unsigned int ioctl_num;
+ int (*execute)(struct luo_ucmd *ucmd);
+};
+
+#define IOCTL_OP(_ioctl, _fn, _struct, _last) \
+ [_IOC_NR(_ioctl) - LIVEUPDATE_CMD_BASE] = { \
+ .size = sizeof(_struct) + \
+ BUILD_BUG_ON_ZERO(sizeof(union ucmd_buffer) < \
+ sizeof(_struct)), \
+ .min_size = offsetofend(_struct, _last), \
+ .ioctl_num = _ioctl, \
+ .execute = _fn, \
+ }
+
+static const struct luo_ioctl_op luo_ioctl_ops[] = {
+ IOCTL_OP(LIVEUPDATE_IOCTL_CREATE_SESSION, luo_ioctl_create_session,
+ struct liveupdate_ioctl_create_session, name),
+ IOCTL_OP(LIVEUPDATE_IOCTL_RETRIEVE_SESSION, luo_ioctl_retrieve_session,
+ struct liveupdate_ioctl_retrieve_session, name),
};
+static long luo_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
+{
+ const struct luo_ioctl_op *op;
+ struct luo_ucmd ucmd = {};
+ union ucmd_buffer buf;
+ unsigned int nr;
+ int err;
+
+ nr = _IOC_NR(cmd);
+ if (nr < LIVEUPDATE_CMD_BASE ||
+ (nr - LIVEUPDATE_CMD_BASE) >= ARRAY_SIZE(luo_ioctl_ops)) {
+ return -EINVAL;
+ }
+
+ ucmd.ubuffer = (void __user *)arg;
+ err = get_user(ucmd.user_size, (u32 __user *)ucmd.ubuffer);
+ if (err)
+ return err;
+
+ op = &luo_ioctl_ops[nr - LIVEUPDATE_CMD_BASE];
+ if (op->ioctl_num != cmd)
+ return -ENOIOCTLCMD;
+ if (ucmd.user_size < op->min_size)
+ return -EINVAL;
+
+ ucmd.cmd = &buf;
+ err = copy_struct_from_user(ucmd.cmd, op->size, ucmd.ubuffer,
+ ucmd.user_size);
+ if (err)
+ return err;
+
+ return op->execute(&ucmd);
+}
+
static const struct file_operations luo_fops = {
.owner = THIS_MODULE,
+ .open = luo_open,
+ .release = luo_release,
+ .unlocked_ioctl = luo_ioctl,
};
static struct luo_device_state luo_dev = {
@@ -22,6 +199,7 @@ static struct luo_device_state luo_dev = {
.name = "liveupdate",
.fops = &luo_fops,
},
+ .in_use = ATOMIC_INIT(0),
};
static int __init liveupdate_ioctl_init(void)
--
2.52.0.rc1.455.g30608eb744-goog
^ permalink raw reply related [flat|nested] 92+ messages in thread
* [PATCH v6 06/20] liveupdate: luo_file: implement file systems callbacks
2025-11-15 23:33 [PATCH v6 00/20] Live Update Orchestrator Pasha Tatashin
` (4 preceding siblings ...)
2025-11-15 23:33 ` [PATCH v6 05/20] liveupdate: luo_ioctl: add user interface Pasha Tatashin
@ 2025-11-15 23:33 ` Pasha Tatashin
2025-11-16 18:15 ` Mike Rapoport
2025-11-18 17:38 ` David Matlack
2025-11-15 23:33 ` [PATCH v6 07/20] liveupdate: luo_session: Add ioctls for file preservation Pasha Tatashin
` (13 subsequent siblings)
19 siblings, 2 replies; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-15 23:33 UTC (permalink / raw)
To: pratyush, jasonmiu, graf, pasha.tatashin, rppt, dmatlack,
rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu,
hughd, skhawaja, chrisl
This patch implements the core mechanism for managing preserved
files throughout the live update lifecycle. It provides the logic to
invoke the file handler callbacks (preserve, unpreserve, freeze,
unfreeze, retrieve, and finish) at the appropriate stages.
During the reboot phase, luo_file_freeze() serializes the final
metadata for each file (handler compatible string, token, and data
handle) into a memory region preserved by KHO. In the new kernel,
luo_file_deserialize() reconstructs the in-memory file list from this
data, preparing the session for retrieval.
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
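The freeze rollback described in the commit message can be modelled in
plain userspace C. This is only a rough sketch of the control flow, not
the kernel code: every name below (model_file, model_freeze_all, and so
on) is hypothetical and exists purely for illustration. It shows files
being frozen in preservation order and, on the first failure, only the
files that were already frozen being rolled back.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_file {
	int frozen;		/* 1 while the file is in the frozen state */
	int fail_freeze;	/* set to make model_freeze_one() fail */
};

/* Stand-in for a handler's .freeze() callback. */
static int model_freeze_one(struct model_file *f)
{
	if (f->fail_freeze)
		return -1;
	f->frozen = 1;
	return 0;
}

/* Stand-in for a handler's .unfreeze() callback. */
static void model_unfreeze_one(struct model_file *f)
{
	f->frozen = 0;
}

/*
 * Freeze files in order. On the first failure, unfreeze only the entries
 * that were frozen before the failing one, then report the error.
 */
static int model_freeze_all(struct model_file *files, size_t n)
{
	size_t i, j;

	for (i = 0; i < n; i++) {
		int err = model_freeze_one(&files[i]);

		if (err) {
			for (j = 0; j < i; j++)
				model_unfreeze_one(&files[j]);
			return err;
		}
	}

	return 0;
}
```

In this model the failing entry itself is never unfrozen, which matches
the way __luo_file_unfreeze() stops at the failed entry.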
include/linux/liveupdate.h | 109 ++++
include/linux/liveupdate/abi/luo.h | 22 +
kernel/liveupdate/Makefile | 1 +
kernel/liveupdate/luo_file.c | 887 +++++++++++++++++++++++++++++
kernel/liveupdate/luo_internal.h | 9 +
5 files changed, 1028 insertions(+)
create mode 100644 kernel/liveupdate/luo_file.c
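For readers checking the "128 files per session" figure, the arithmetic
can be reproduced in userspace. The sketch below mirrors the layout of
struct luo_file_ser from this patch, but the model_* names are
hypothetical and the 4 KiB page size is an assumption; on a
configuration with larger pages the limit would be higher.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_COMPAT_LENGTH	48	/* mirrors LIVEUPDATE_HNDL_COMPAT_LENGTH */
#define MODEL_PAGE_SIZE		4096u	/* assumption: 4 KiB pages */
#define MODEL_FILE_PGCNT	2u	/* mirrors LUO_FILE_PGCNT */

/* Userspace mirror of the packed on-wire record; illustration only. */
struct model_file_ser {
	char compatible[MODEL_COMPAT_LENGTH];
	uint64_t data;
	uint64_t token;
} __attribute__((packed));

/* Number of records that fit in the preserved per-session buffer. */
static size_t model_file_max(void)
{
	return (MODEL_FILE_PGCNT * MODEL_PAGE_SIZE) /
	       sizeof(struct model_file_ser);
}
```

Each record is 48 + 8 + 8 bytes, so two 4 KiB pages hold 8192 / 64
records.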
diff --git a/include/linux/liveupdate.h b/include/linux/liveupdate.h
index 730b76625fec..4a5d4dd9905a 100644
--- a/include/linux/liveupdate.h
+++ b/include/linux/liveupdate.h
@@ -10,6 +10,88 @@
#include <linux/bug.h>
#include <linux/types.h>
#include <linux/list.h>
+#include <linux/liveupdate/abi/luo.h>
+#include <uapi/linux/liveupdate.h>
+
+struct liveupdate_file_handler;
+struct liveupdate_session;
+struct file;
+
+/**
+ * struct liveupdate_file_op_args - Arguments for file operation callbacks.
+ * @handler: The file handler being called.
+ * @session: The session this file belongs to.
+ * @retrieved: The retrieve status for the 'can_finish / finish'
+ * operation.
+ * @file: The file object. For retrieve: [OUT] The callback sets
+ * this to the new file. For other ops: [IN] The caller sets
+ * this to the file being operated on.
+ * @serialized_data: The opaque u64 handle; preserve/freeze may update
+ * this field.
+ *
+ * This structure bundles all parameters for the file operation callbacks.
+ * The 'serialized_data' and 'file' fields are used for both input and output.
+ */
+struct liveupdate_file_op_args {
+ struct liveupdate_file_handler *handler;
+ struct liveupdate_session *session;
+ bool retrieved;
+ struct file *file;
+ u64 serialized_data;
+};
+
+/**
+ * struct liveupdate_file_ops - Callbacks for live-updatable files.
+ * @can_preserve: Required. Lightweight check to see if this handler is
+ * compatible with the given file.
+ * @preserve: Required. Performs state-saving for the file.
+ * @unpreserve: Required. Cleans up any resources allocated by @preserve.
+ * @freeze: Optional. Final actions just before kernel transition.
+ * @unfreeze: Optional. Undo freeze operations.
+ * @retrieve: Required. Restores the file in the new kernel.
+ * @can_finish: Optional. Check if this FD can finish, i.e. all restoration
+ * pre-requirements for this FD are satisfied. Called prior to
+ * finish, in order to do successful finish calls for all
+ * resources in the session.
+ * @finish: Required. Final cleanup in the new kernel.
+ * @owner: Module reference
+ *
+ * All operations (except can_preserve) receive a pointer to a
+ * 'struct liveupdate_file_op_args' containing the necessary context.
+ */
+struct liveupdate_file_ops {
+ bool (*can_preserve)(struct liveupdate_file_handler *handler,
+ struct file *file);
+ int (*preserve)(struct liveupdate_file_op_args *args);
+ void (*unpreserve)(struct liveupdate_file_op_args *args);
+ int (*freeze)(struct liveupdate_file_op_args *args);
+ void (*unfreeze)(struct liveupdate_file_op_args *args);
+ int (*retrieve)(struct liveupdate_file_op_args *args);
+ bool (*can_finish)(struct liveupdate_file_op_args *args);
+ void (*finish)(struct liveupdate_file_op_args *args);
+ struct module *owner;
+};
+
+/**
+ * struct liveupdate_file_handler - Represents a handler for a live-updatable file type.
+ * @ops: Callback functions
+ * @compatible: The compatibility string (e.g., "memfd-v1", "vfiofd-v1")
+ * that uniquely identifies the file type this handler
+ * supports. This is matched against the compatible string
+ * associated with individual &struct file instances.
+ * @list: Used for linking this handler instance into a global
+ * list of registered file handlers.
+ *
+ * Modules that want to support live update for specific file types should
+ * register an instance of this structure. LUO uses this registration to
+ * determine if a given file can be preserved and to find the appropriate
+ * operations to manage its state across the update.
+ */
+struct liveupdate_file_handler {
+ const struct liveupdate_file_ops *ops;
+ const char compatible[LIVEUPDATE_HNDL_COMPAT_LENGTH];
+ struct list_head list;
+};
#ifdef CONFIG_LIVEUPDATE
@@ -19,6 +101,16 @@ bool liveupdate_enabled(void);
/* Called during kexec to tell LUO that entered into reboot */
int liveupdate_reboot(void);
+int liveupdate_register_file_handler(struct liveupdate_file_handler *h);
+
+/* kernel can internally retrieve files */
+int liveupdate_get_file_incoming(struct liveupdate_session *s, u64 token,
+ struct file **filep);
+
+/* Get a token for an outgoing file, or -ENOENT if file is not preserved */
+int liveupdate_get_token_outgoing(struct liveupdate_session *s,
+ struct file *file, u64 *tokenp);
+
#else /* CONFIG_LIVEUPDATE */
static inline bool liveupdate_enabled(void)
@@ -31,5 +123,22 @@ static inline int liveupdate_reboot(void)
return 0;
}
+static inline int liveupdate_register_file_handler(struct liveupdate_file_handler *h)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline int liveupdate_get_file_incoming(struct liveupdate_session *s,
+ u64 token, struct file **filep)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline int liveupdate_get_token_outgoing(struct liveupdate_session *s,
+ struct file *file, u64 *tokenp)
+{
+ return -EOPNOTSUPP;
+}
+
#endif /* CONFIG_LIVEUPDATE */
#endif /* _LINUX_LIVEUPDATE_H */
diff --git a/include/linux/liveupdate/abi/luo.h b/include/linux/liveupdate/abi/luo.h
index 03a177ae232e..3a596ca1907b 100644
--- a/include/linux/liveupdate/abi/luo.h
+++ b/include/linux/liveupdate/abi/luo.h
@@ -65,6 +65,11 @@
* Metadata for a single session, including its name and a physical pointer
* to another preserved memory block containing an array of
* `struct luo_file_ser` for all files in that session.
+ *
+ * - struct luo_file_ser:
+ * Metadata for a single preserved file. Contains the `compatible` string to
+ * find the correct handler in the new kernel, a user-provided `token` for
+ * identification, and an opaque `data` handle for the handler to use.
*/
#ifndef _LINUX_LIVEUPDATE_ABI_LUO_H
@@ -132,4 +137,21 @@ struct luo_session_ser {
u64 count;
} __packed;
+/* The max size is set so it can be reliably used during serialization */
+#define LIVEUPDATE_HNDL_COMPAT_LENGTH 48
+
+/**
+ * struct luo_file_ser - Represents a single serialized preserved file.
+ * @compatible: File handler compatible string.
+ * @data: Private data
+ * @token: User provided token for this file
+ *
+ * If this structure is modified, LUO_SESSION_COMPATIBLE must be updated.
+ */
+struct luo_file_ser {
+ char compatible[LIVEUPDATE_HNDL_COMPAT_LENGTH];
+ u64 data;
+ u64 token;
+} __packed;
+
#endif /* _LINUX_LIVEUPDATE_ABI_LUO_H */
diff --git a/kernel/liveupdate/Makefile b/kernel/liveupdate/Makefile
index 83285e7ad726..c2252a2ad7bd 100644
--- a/kernel/liveupdate/Makefile
+++ b/kernel/liveupdate/Makefile
@@ -2,6 +2,7 @@
luo-y := \
luo_core.o \
+ luo_file.o \
luo_ioctl.o \
luo_session.o
diff --git a/kernel/liveupdate/luo_file.c b/kernel/liveupdate/luo_file.c
new file mode 100644
index 000000000000..dae27a69a09f
--- /dev/null
+++ b/kernel/liveupdate/luo_file.c
@@ -0,0 +1,887 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+
+/**
+ * DOC: LUO File Descriptors
+ *
+ * LUO provides the infrastructure to preserve specific, stateful file
+ * descriptors across a kexec-based live update. The primary goal is to allow
+ * workloads, such as virtual machines using vfio, memfd, or iommufd, to
+ * retain access to their essential resources without interruption.
+ *
+ * The framework is built around a callback-based handler model and a well-
+ * defined lifecycle for each preserved file.
+ *
+ * Handler Registration:
+ * Kernel modules responsible for a specific file type (e.g., memfd, vfio)
+ * register a &struct liveupdate_file_handler. This handler provides a set of
+ * callbacks that LUO invokes at different stages of the update process, most
+ * notably:
+ *
+ * - can_preserve(): A lightweight check to determine if the handler is
+ * compatible with a given 'struct file'.
+ * - preserve(): The heavyweight operation that saves the file's state and
+ * returns an opaque u64 handle. It runs while vCPUs are still running.
+ * LUO becomes the owner of this file until the session is closed or the
+ * file is finished.
+ * - unpreserve(): Cleans up any resources allocated by .preserve(), called
+ * if the preservation process is aborted before the reboot (i.e. session is
+ * closed).
+ * - freeze(): A final pre-reboot opportunity to prepare the state for kexec.
+ * At this point we are already in the reboot() syscall, so userspace can
+ * no longer mutate the file.
+ * - unfreeze(): Undoes the actions of .freeze(), called if the live update
+ * is aborted after the freeze phase.
+ * - retrieve(): Reconstructs the file in the new kernel from the preserved
+ * handle.
+ * - finish(): Performs final checks and cleanup in the new kernel. After a
+ * successful finish call, LUO gives up ownership of this file.
+ *
+ * File Preservation Lifecycle happy path:
+ *
+ * 1. Preserve (Normal Operation): A userspace agent preserves files one by one
+ * via an ioctl. For each file, luo_preserve_file() finds a compatible
+ * handler, calls its .preserve() op, and creates an internal &struct
+ * luo_file to track the live state.
+ *
+ * 2. Freeze (Pre-Reboot): Just before the kexec, luo_file_freeze() is called.
+ * It iterates through all preserved files, calls their respective .freeze()
+ * ops, and serializes their final metadata (compatible string, token, and
+ * data handle) into a contiguous memory block for KHO.
+ *
+ * 3. Deserialize (New Kernel - Early Boot): After kexec, luo_file_deserialize()
+ * runs. It reads the serialized data from the KHO memory region and
+ * reconstructs the in-memory list of &struct luo_file instances for the new
+ * kernel, linking them to their corresponding handlers.
+ *
+ * 4. Retrieve (New Kernel - Userspace Ready): The userspace agent can now
+ * restore file descriptors by providing a token. luo_retrieve_file()
+ * searches for the matching token, calls the handler's .retrieve() op to
+ * re-create the 'struct file', and returns a new FD. Files can be
+ * retrieved in ANY order.
+ *
+ * 5. Finish (New Kernel - Cleanup): Once retrieval for a session is complete,
+ * luo_file_finish() is called. It iterates through all files,
+ * invokes their .finish() ops for final cleanup, and releases all
+ * associated kernel resources.
+ *
+ * File Preservation Lifecycle unhappy paths:
+ *
+ * 1. Abort Before Reboot: If the userspace agent aborts the live update
+ * process before calling reboot (e.g., by closing the session file
+ * descriptor), the session's release handler calls
+ * luo_file_unpreserve_files(). This invokes the .unpreserve() callback on
+ * all preserved files, ensuring all allocated resources are cleaned up and
+ * returning the system to a clean state.
+ *
+ * 2. Freeze Failure: During the reboot() syscall, if any handler's .freeze()
+ * op fails, the .unfreeze() op is invoked on all previously *successful*
+ * freezes to roll back their state. The reboot() syscall then returns an
+ * error to userspace, canceling the live update.
+ *
+ * 3. Finish Failure: In the new kernel, if a handler's .can_finish() check
+ * fails, the luo_file_finish() operation is aborted. LUO retains ownership
+ * of all files within that session, including those that were not yet
+ * processed. The userspace agent can attempt to call the finish operation
+ * again later. If the issue cannot be resolved, these resources will be held
+ * by LUO until the next live update cycle, at which point they will be
+ * discarded.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/cleanup.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/kexec_handover.h>
+#include <linux/liveupdate.h>
+#include <linux/liveupdate/abi/luo.h>
+#include <linux/module.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include "luo_internal.h"
+
+static LIST_HEAD(luo_file_handler_list);
+
+/* 2 4K pages, give space for 128 files per session */
+#define LUO_FILE_PGCNT 2ul
+#define LUO_FILE_MAX \
+ ((LUO_FILE_PGCNT << PAGE_SHIFT) / sizeof(struct luo_file_ser))
+
+/**
+ * struct luo_file - Represents a single preserved file instance.
+ * @fh: Pointer to the &struct liveupdate_file_handler that manages
+ * this type of file.
+ * @file: Pointer to the kernel's &struct file that is being preserved.
+ * This is NULL in the new kernel until the file is successfully
+ * retrieved.
+ * @serialized_data: The opaque u64 handle to the serialized state of the file.
+ * This handle is passed back to the handler's .freeze(),
+ * .retrieve(), and .finish() callbacks, allowing it to track
+ * and update its serialized state across phases.
+ * @retrieved: A flag indicating whether a user/kernel in the new kernel has
+ * successfully called retrieve() on this file. This prevents
+ * multiple retrieval attempts.
+ * @mutex: A mutex that protects the fields of this specific instance
+ * (e.g., @retrieved, @file), ensuring that operations like
+ * retrieving or finishing a file are atomic.
+ * @list: The list_head linking this instance into its parent
+ * session's list of preserved files.
+ * @token: The user-provided unique token used to identify this file.
+ *
+ * This structure is the core in-kernel representation of a single file being
+ * managed through a live update. An instance is created by luo_preserve_file()
+ * to link a 'struct file' to its corresponding handler, a user-provided token,
+ * and the serialized state handle returned by the handler's .preserve()
+ * operation.
+ *
+ * These instances are tracked in a per-session list. The @serialized_data
+ * field, which holds a handle to the file's serialized state, may be updated
+ * during the .freeze() callback before being serialized for the next kernel.
+ * After reboot, these structures are recreated by luo_file_deserialize() and
+ * are finally cleaned up by luo_file_finish().
+ */
+struct luo_file {
+ struct liveupdate_file_handler *fh;
+ struct file *file;
+ u64 serialized_data;
+ bool retrieved;
+ struct mutex mutex;
+ struct list_head list;
+ u64 token;
+};
+
+static int luo_session_alloc_files_mem(struct luo_session *session)
+{
+ size_t size;
+ void *mem;
+
+ if (session->files)
+ return 0;
+
+ WARN_ON_ONCE(session->count);
+
+ size = LUO_FILE_PGCNT << PAGE_SHIFT;
+ mem = kho_alloc_preserve(size);
+ if (IS_ERR(mem))
+ return PTR_ERR(mem);
+
+ session->files = mem;
+ session->pgcnt = LUO_FILE_PGCNT;
+
+ return 0;
+}
+
+static void luo_session_free_files_mem(struct luo_session *session)
+{
+ /* If session has files, no need to free preservation memory */
+ if (session->count)
+ return;
+
+ if (!session->files)
+ return;
+
+ kho_unpreserve_free(session->files);
+ session->files = NULL;
+ session->pgcnt = 0;
+}
+
+static bool luo_token_is_used(struct luo_session *session, u64 token)
+{
+ struct luo_file *iter;
+
+ list_for_each_entry(iter, &session->files_list, list) {
+ if (iter->token == token)
+ return true;
+ }
+
+ return false;
+}
+
+/**
+ * luo_preserve_file - Initiate the preservation of a file descriptor.
+ * @session: The session to which the preserved file will be added.
+ * @token: A unique, user-provided identifier for the file.
+ * @fd: The file descriptor to be preserved.
+ *
+ * This function orchestrates the first phase of preserving a file. Upon entry,
+ * it takes a reference to the 'struct file' via fget(), effectively making LUO
+ * a co-owner of the file. This reference is held until the file is either
+ * unpreserved or successfully finished in the next kernel, preventing the file
+ * from being prematurely destroyed.
+ *
+ * Preservation itself consists of the following steps, performed in this
+ * order:
+ *
+ * 1. Validates that the @token is not already in use within the session.
+ * 2. Ensures the session's memory for files serialization is allocated
+ * (allocates if needed).
+ * 3. Iterates through registered handlers, calling can_preserve() to find one
+ * compatible with the given @fd.
+ * 4. Calls the handler's .preserve() operation, which saves the file's state
+ * and returns an opaque private data handle.
+ * 5. Adds the new instance to the session's internal list.
+ *
+ * On success, LUO takes a reference to the 'struct file' and considers it
+ * under its management until it is unpreserved or finished.
+ *
+ * In case of any failure, all intermediate allocations (file reference, memory
+ * for the 'luo_file' struct, etc.) are cleaned up before returning an error.
+ *
+ * Context: Can be called from an ioctl handler during normal system operation.
+ * Return: 0 on success. Returns a negative errno on failure:
+ * -EEXIST if the token is already used.
+ * -EBADF if the file descriptor is invalid.
+ * -ENOSPC if the session is full.
+ * -ENOENT if no compatible handler is found.
+ * -ENOMEM on memory allocation failure.
+ * Other errors might be returned by .preserve().
+ */
+int luo_preserve_file(struct luo_session *session, u64 token, int fd)
+{
+ struct liveupdate_file_op_args args = {0};
+ struct liveupdate_file_handler *fh;
+ struct luo_file *luo_file;
+ struct file *file;
+ int err;
+
+ lockdep_assert_held(&session->mutex);
+
+ if (luo_token_is_used(session, token))
+ return -EEXIST;
+
+ file = fget(fd);
+ if (!file)
+ return -EBADF;
+
+ err = luo_session_alloc_files_mem(session);
+ if (err)
+ goto exit_err;
+
+ if (session->count == LUO_FILE_MAX) {
+ err = -ENOSPC;
+ goto exit_err;
+ }
+
+ err = -ENOENT;
+ list_for_each_entry(fh, &luo_file_handler_list, list) {
+ if (fh->ops->can_preserve(fh, file)) {
+ err = 0;
+ break;
+ }
+ }
+
+ /* err is still -ENOENT if no handler was found */
+ if (err)
+ goto exit_err;
+
+ luo_file = kzalloc(sizeof(*luo_file), GFP_KERNEL);
+ if (!luo_file) {
+ err = -ENOMEM;
+ goto exit_err;
+ }
+
+ luo_file->file = file;
+ luo_file->fh = fh;
+ luo_file->token = token;
+ luo_file->retrieved = false;
+ mutex_init(&luo_file->mutex);
+
+ args.handler = fh;
+ args.session = (struct liveupdate_session *)session;
+ args.file = file;
+ err = fh->ops->preserve(&args);
+ if (err) {
+ mutex_destroy(&luo_file->mutex);
+ kfree(luo_file);
+ goto exit_err;
+ } else {
+ luo_file->serialized_data = args.serialized_data;
+ list_add_tail(&luo_file->list, &session->files_list);
+ session->count++;
+ }
+
+ return 0;
+
+exit_err:
+ fput(file);
+ luo_session_free_files_mem(session);
+
+ return err;
+}
+
+/**
+ * luo_file_unpreserve_files - Unpreserves all files from a session.
+ * @session: The session to be cleaned up.
+ *
+ * This function serves as the primary cleanup path for a session. It is
+ * invoked when the userspace agent closes the session's file descriptor.
+ *
+ * For each file, it performs the following cleanup actions:
+ * 1. Calls the handler's .unpreserve() callback to allow the handler to
+ * release any resources it allocated.
+ * 2. Removes the file from the session's internal tracking list.
+ * 3. Releases the reference to the 'struct file' that was taken by
+ * luo_preserve_file() via fput(), returning ownership.
+ * 4. Frees the memory associated with the internal 'struct luo_file'.
+ *
+ * After all individual files are unpreserved, it frees the contiguous memory
+ * block that was allocated to hold their serialization data.
+ */
+void luo_file_unpreserve_files(struct luo_session *session)
+{
+ struct luo_file *luo_file;
+
+ lockdep_assert_held(&session->mutex);
+
+ while (!list_empty(&session->files_list)) {
+ struct liveupdate_file_op_args args = {0};
+
+ luo_file = list_last_entry(&session->files_list,
+ struct luo_file, list);
+
+ args.handler = luo_file->fh;
+ args.session = (struct liveupdate_session *)session;
+ args.file = luo_file->file;
+ args.serialized_data = luo_file->serialized_data;
+ luo_file->fh->ops->unpreserve(&args);
+
+ list_del(&luo_file->list);
+ session->count--;
+
+ fput(luo_file->file);
+ mutex_destroy(&luo_file->mutex);
+ kfree(luo_file);
+ }
+
+ luo_session_free_files_mem(session);
+}
+
+static int luo_file_freeze_one(struct luo_session *session,
+ struct luo_file *luo_file)
+{
+ int err = 0;
+
+ guard(mutex)(&luo_file->mutex);
+
+ if (luo_file->fh->ops->freeze) {
+ struct liveupdate_file_op_args args = {0};
+
+ args.handler = luo_file->fh;
+ args.session = (struct liveupdate_session *)session;
+ args.file = luo_file->file;
+ args.serialized_data = luo_file->serialized_data;
+
+ err = luo_file->fh->ops->freeze(&args);
+ if (!err)
+ luo_file->serialized_data = args.serialized_data;
+ }
+
+ return err;
+}
+
+static void luo_file_unfreeze_one(struct luo_session *session,
+ struct luo_file *luo_file)
+{
+ guard(mutex)(&luo_file->mutex);
+
+ if (luo_file->fh->ops->unfreeze) {
+ struct liveupdate_file_op_args args = {0};
+
+ args.handler = luo_file->fh;
+ args.session = (struct liveupdate_session *)session;
+ args.file = luo_file->file;
+ args.serialized_data = luo_file->serialized_data;
+
+ luo_file->fh->ops->unfreeze(&args);
+ }
+
+ luo_file->serialized_data = 0;
+}
+
+static void __luo_file_unfreeze(struct luo_session *session,
+ struct luo_file *failed_entry)
+{
+ struct list_head *files_list = &session->files_list;
+ struct luo_file *luo_file;
+
+ list_for_each_entry(luo_file, files_list, list) {
+ if (luo_file == failed_entry)
+ break;
+
+ luo_file_unfreeze_one(session, luo_file);
+ }
+
+ memset(session->files, 0, session->pgcnt << PAGE_SHIFT);
+}
+
+/**
+ * luo_file_freeze - Freezes all preserved files and serializes their metadata.
+ * @session: The session whose files are to be frozen.
+ *
+ * This function is called from the reboot() syscall path, just before the
+ * kernel transitions to the new image via kexec. Its purpose is to perform the
+ * final preparation and serialization of all preserved files in the session.
+ *
+ * It iterates through each preserved file in FIFO order (the order of
+ * preservation) and performs two main actions:
+ *
+ * 1. Freezes the File: It calls the handler's .freeze() callback for each
+ * file. This gives the handler a final opportunity to quiesce the device or
+ * prepare its state for the upcoming reboot. The handler may update its
+ * private data handle during this step.
+ *
+ * 2. Serializes Metadata: After a successful freeze, it copies the final file
+ * metadata—the handler's compatible string, the user token, and the final
+ * private data handle—into the pre-allocated contiguous memory buffer
+ * (session->files) that will be handed over to the next kernel via KHO.
+ *
+ * Error Handling (Rollback):
+ * This function is atomic. If any handler's .freeze() operation fails, the
+ * entire live update is aborted. The __luo_file_unfreeze() helper is
+ * immediately called to invoke the .unfreeze() op on all files that were
+ * successfully frozen before the point of failure, rolling them back to a
+ * running state. The function then returns an error, causing the reboot()
+ * syscall to fail.
+ *
+ * Context: Called only from the liveupdate_reboot() path.
+ * Return: 0 on success, or a negative errno on failure.
+ */
+int luo_file_freeze(struct luo_session *session)
+{
+ struct luo_file_ser *file_ser = session->files;
+ struct luo_file *luo_file;
+ int err;
+ int i;
+
+ lockdep_assert_held(&session->mutex);
+
+ if (!session->count)
+ return 0;
+
+ if (WARN_ON(!file_ser))
+ return -EINVAL;
+
+ i = 0;
+ list_for_each_entry(luo_file, &session->files_list, list) {
+ err = luo_file_freeze_one(session, luo_file);
+ if (err < 0) {
+ pr_warn("Freeze failed for session[%s] token[%#0llx] handler[%s] err[%pe]\n",
+ session->name, luo_file->token,
+ luo_file->fh->compatible, ERR_PTR(err));
+ goto exit_err;
+ }
+
+ strscpy(file_ser[i].compatible, luo_file->fh->compatible,
+ sizeof(file_ser[i].compatible));
+ file_ser[i].data = luo_file->serialized_data;
+ file_ser[i].token = luo_file->token;
+ i++;
+ }
+
+ return 0;
+
+exit_err:
+ __luo_file_unfreeze(session, luo_file);
+
+ return err;
+}
+
+/**
+ * luo_file_unfreeze - Unfreezes all files in a session.
+ * @session: The session whose files are to be unfrozen.
+ *
+ * This function rolls back the state of all files in a session after the freeze
+ * phase has begun but must be aborted. It is the counterpart to
+ * luo_file_freeze().
+ *
+ * It invokes the __luo_file_unfreeze() helper with a NULL argument, which
+ * signals the helper to iterate through all files in the session and call
+ * their respective .unfreeze() handler callbacks.
+ *
+ * Context: This is called when the live update is aborted during
+ * the reboot() syscall, after luo_file_freeze() has been called.
+ */
+void luo_file_unfreeze(struct luo_session *session)
+{
+ lockdep_assert_held(&session->mutex);
+
+ if (!session->count)
+ return;
+
+ __luo_file_unfreeze(session, NULL);
+}
+
+/**
+ * luo_retrieve_file - Restores a preserved file from a session by its token.
+ * @session: The session from which to retrieve the file.
+ * @token: The unique token identifying the file to be restored.
+ * @filep: Output parameter; on success, this is populated with a pointer
+ * to the newly retrieved 'struct file'.
+ *
+ * This function is the primary mechanism for recreating a file in the new
+ * kernel after a live update. It searches the session's list of deserialized
+ * files for an entry matching the provided @token.
+ *
+ * The operation is idempotent: if a file has already been successfully
+ * retrieved, this function will simply return a pointer to the existing
+ * 'struct file' and report success without re-executing the retrieve
+ * operation. This is handled by checking the 'retrieved' flag under a lock.
+ *
+ * File retrieval can happen in any order; it is not bound by the order of
+ * preservation.
+ *
+ * Context: Can be called from an ioctl or other in-kernel code in the new
+ * kernel.
+ * Return: 0 on success. Returns a negative errno on failure:
+ * -ENOENT if no file with the matching token is found.
+ * Any error code returned by the handler's .retrieve() op.
+ */
+int luo_retrieve_file(struct luo_session *session, u64 token,
+ struct file **filep)
+{
+ struct liveupdate_file_op_args args = {0};
+ struct luo_file *luo_file;
+ int err;
+
+ lockdep_assert_held(&session->mutex);
+
+ if (list_empty(&session->files_list))
+ return -ENOENT;
+
+ list_for_each_entry(luo_file, &session->files_list, list) {
+ if (luo_file->token == token)
+ break;
+ }
+
+ if (list_entry_is_head(luo_file, &session->files_list, list))
+ return -ENOENT;
+
+ guard(mutex)(&luo_file->mutex);
+ if (luo_file->retrieved) {
+ /*
+ * Someone is asking for this file again, so get a reference
+ * for them.
+ */
+ get_file(luo_file->file);
+ *filep = luo_file->file;
+ return 0;
+ }
+
+ args.handler = luo_file->fh;
+ args.session = (struct liveupdate_session *)session;
+ args.serialized_data = luo_file->serialized_data;
+ err = luo_file->fh->ops->retrieve(&args);
+ if (!err) {
+ luo_file->file = args.file;
+
+ /* Get reference so we can keep this file in LUO until finish */
+ get_file(luo_file->file);
+ *filep = luo_file->file;
+ luo_file->retrieved = true;
+ }
+
+ return err;
+}
+
+static int luo_file_can_finish_one(struct luo_session *session,
+ struct luo_file *luo_file)
+{
+ bool can_finish = true;
+
+ guard(mutex)(&luo_file->mutex);
+
+ if (luo_file->fh->ops->can_finish) {
+ struct liveupdate_file_op_args args = {0};
+
+ args.handler = luo_file->fh;
+ args.session = (struct liveupdate_session *)session;
+ args.file = luo_file->file;
+ args.serialized_data = luo_file->serialized_data;
+ args.retrieved = luo_file->retrieved;
+ can_finish = luo_file->fh->ops->can_finish(&args);
+ }
+
+ return can_finish ? 0 : -EBUSY;
+}
+
+static void luo_file_finish_one(struct luo_session *session,
+ struct luo_file *luo_file)
+{
+ struct liveupdate_file_op_args args = {0};
+
+ guard(mutex)(&luo_file->mutex);
+
+ args.handler = luo_file->fh;
+ args.session = (struct liveupdate_session *)session;
+ args.file = luo_file->file;
+ args.serialized_data = luo_file->serialized_data;
+ args.retrieved = luo_file->retrieved;
+
+ luo_file->fh->ops->finish(&args);
+}
+
+/**
+ * luo_file_finish - Completes the lifecycle for all files in a session.
+ * @session: The session to be finalized.
+ *
+ * This function orchestrates the final teardown of a live update session in the
+ * new kernel. It should be called after all necessary files have been
+ * retrieved and the userspace agent is ready to release the preserved state.
+ *
+ * The function iterates through all tracked files. For each file, it performs
+ * the following sequence of cleanup actions:
+ *
+ * 1. If file is not yet retrieved, retrieves it, and calls can_finish() on
+ * every file in the session. If all can_finish return true, continue to
+ * finish.
+ * 2. Calls the handler's .finish() callback (via luo_file_finish_one) to
+ * allow for final resource cleanup within the handler.
+ * 3. Releases LUO's ownership reference on the 'struct file' via fput(). This
+ * is the counterpart to the get_file() call in luo_retrieve_file().
+ * 4. Removes the 'struct luo_file' from the session's internal list.
+ * 5. Frees the memory for the 'struct luo_file' instance itself.
+ *
+ * After successfully finishing all individual files, it frees the
+ * contiguous memory block that was used to transfer the serialized metadata
+ * from the previous kernel.
+ *
+ * Error Handling (Atomic Failure):
+ * This operation is atomic. If any handler's .can_finish() op fails, the entire
+ * function aborts immediately and returns an error.
+ *
+ * Context: Can be called from an ioctl handler in the new kernel.
+ * Return: 0 on success, or a negative errno on failure.
+ */
+int luo_file_finish(struct luo_session *session)
+{
+ struct list_head *files_list = &session->files_list;
+ struct luo_file *luo_file;
+ int err;
+
+ if (!session->count)
+ return 0;
+
+ lockdep_assert_held(&session->mutex);
+
+ list_for_each_entry(luo_file, files_list, list) {
+ err = luo_file_can_finish_one(session, luo_file);
+ if (err)
+ return err;
+ }
+
+ while (!list_empty(&session->files_list)) {
+ luo_file = list_last_entry(&session->files_list,
+ struct luo_file, list);
+
+ luo_file_finish_one(session, luo_file);
+
+ if (luo_file->file)
+ fput(luo_file->file);
+ list_del(&luo_file->list);
+ session->count--;
+ mutex_destroy(&luo_file->mutex);
+ kfree(luo_file);
+ }
+
+ if (session->files) {
+ kho_restore_free(session->files);
+ session->files = NULL;
+ session->pgcnt = 0;
+ }
+
+ return 0;
+}
+
+/**
+ * luo_file_deserialize - Reconstructs the list of preserved files in the new kernel.
+ * @session: The incoming session containing the serialized file data from KHO.
+ *
+ * This function is called during the early boot process of the new kernel. It
+ * takes the raw, contiguous memory block of 'struct luo_file_ser' entries,
+ * provided by the previous kernel, and transforms it back into a live,
+ * in-memory linked list of 'struct luo_file' instances.
+ *
+ * For each serialized entry, it performs the following steps:
+ * 1. Reads the 'compatible' string.
+ * 2. Searches the global list of registered file handlers for one that
+ * matches the compatible string.
+ * 3. Allocates a new 'struct luo_file'.
+ * 4. Populates the new structure with the deserialized data (token, private
+ * data handle) and links it to the found handler. The 'file' pointer is
+ * initialized to NULL, as the file has not been retrieved yet.
+ * 5. Adds the new 'struct luo_file' to the session's files_list.
+ *
+ * This prepares the session for userspace, which can later call
+ * luo_retrieve_file() to restore the actual file descriptors.
+ *
+ * Context: Called from session deserialization.
+ */
+int luo_file_deserialize(struct luo_session *session)
+{
+ struct luo_file_ser *file_ser;
+ u64 i;
+
+ lockdep_assert_held(&session->mutex);
+
+ if (!session->files)
+ return 0;
+
+ file_ser = session->files;
+ for (i = 0; i < session->count; i++) {
+ struct liveupdate_file_handler *fh;
+ bool handler_found = false;
+ struct luo_file *luo_file;
+
+ list_for_each_entry(fh, &luo_file_handler_list, list) {
+ if (!strcmp(fh->compatible, file_ser[i].compatible)) {
+ handler_found = true;
+ break;
+ }
+ }
+
+ if (!handler_found) {
+ pr_warn("No registered handler for compatible '%s'\n",
+ file_ser[i].compatible);
+ return -ENOENT;
+ }
+
+ luo_file = kzalloc(sizeof(*luo_file), GFP_KERNEL);
+ if (!luo_file)
+ return -ENOMEM;
+
+ luo_file->fh = fh;
+ luo_file->file = NULL;
+ luo_file->serialized_data = file_ser[i].data;
+ luo_file->token = file_ser[i].token;
+ luo_file->retrieved = false;
+ mutex_init(&luo_file->mutex);
+ list_add_tail(&luo_file->list, &session->files_list);
+ }
+
+ return 0;
+}
+
+/**
+ * liveupdate_register_file_handler - Register a file handler with LUO.
+ * @fh: Pointer to a caller-allocated &struct liveupdate_file_handler.
+ * The caller must initialize this structure, including a unique
+ * 'compatible' string and valid 'ops' callbacks. This function adds the
+ * handler to the global list of supported file handlers.
+ *
+ * Context: Typically called during module initialization for file types that
+ * support live update preservation.
+ *
+ * Return: 0 on success. Negative errno on failure.
+ */
+int liveupdate_register_file_handler(struct liveupdate_file_handler *fh)
+{
+ static DEFINE_MUTEX(register_file_handler_lock);
+ struct liveupdate_file_handler *fh_iter;
+
+ if (!liveupdate_enabled())
+ return -EOPNOTSUPP;
+
+ /*
+ * Once sessions have been deserialized, it is too late to register
+ * file handlers.
+ */
+ if (WARN_ON(luo_session_is_deserialized()))
+ return -EBUSY;
+
+ /* Sanity check that all required callbacks are set */
+ if (!fh->ops->preserve || !fh->ops->unpreserve ||
+ !fh->ops->retrieve || !fh->ops->finish) {
+ return -EINVAL;
+ }
+
+ guard(mutex)(&register_file_handler_lock);
+ list_for_each_entry(fh_iter, &luo_file_handler_list, list) {
+ if (!strcmp(fh_iter->compatible, fh->compatible)) {
+ pr_err("File handler registration failed: Compatible string '%s' already registered.\n",
+ fh->compatible);
+ return -EEXIST;
+ }
+ }
+
+ if (!try_module_get(fh->ops->owner))
+ return -EAGAIN;
+
+ INIT_LIST_HEAD(&fh->list);
+ list_add_tail(&fh->list, &luo_file_handler_list);
+
+ return 0;
+}
+
+/**
+ * liveupdate_get_token_outgoing - Get the token for a preserved file.
+ * @s: The outgoing liveupdate session.
+ * @file: The file object to search for.
+ * @tokenp: Output parameter for the found token.
+ *
+ * Searches the list of preserved files in an outgoing session for a matching
+ * file object. If found, the corresponding user-provided token is returned.
+ *
+ * This function is intended for in-kernel callers that need to correlate a
+ * file with its liveupdate token.
+ *
+ * Context: Can be called from any context that can acquire the session mutex.
+ * Return: 0 on success, -ENOENT if the file is not preserved in this session.
+ */
+int liveupdate_get_token_outgoing(struct liveupdate_session *s,
+ struct file *file, u64 *tokenp)
+{
+ struct luo_session *session = (struct luo_session *)s;
+ struct luo_file *luo_file;
+ int err = -ENOENT;
+
+ list_for_each_entry(luo_file, &session->files_list, list) {
+ if (luo_file->file == file) {
+ if (tokenp)
+ *tokenp = luo_file->token;
+ err = 0;
+ break;
+ }
+ }
+
+ return err;
+}
+
+/**
+ * liveupdate_get_file_incoming - Retrieves a preserved file for in-kernel use.
+ * @s: The incoming liveupdate session (restored from the previous kernel).
+ * @token: The unique token identifying the file to retrieve.
+ * @filep: On success, this will be populated with a pointer to the retrieved
+ * 'struct file'.
+ *
+ * Provides a kernel-internal API for other subsystems to retrieve their
+ * preserved files after a live update. This function is a simple wrapper
+ * around luo_retrieve_file(), allowing callers to find a file by its token.
+ *
+ * The operation is idempotent; subsequent calls for the same token will return
+ * a pointer to the same 'struct file' object.
+ *
+ * The caller receives a pointer to the file with a reference incremented. The
+ * file's lifetime is managed by LUO and any userspace file
+ * descriptors. If the caller needs to hold a reference to the file beyond the
+ * immediate scope, it must call get_file() itself.
+ *
+ * Context: Can be called from any context in the new kernel that has a handle
+ * to a restored session.
+ * Return: 0 on success. Returns -ENOENT if no file with the matching token is
+ * found, or any other negative errno on failure.
+ */
+int liveupdate_get_file_incoming(struct liveupdate_session *s, u64 token,
+ struct file **filep)
+{
+ struct luo_session *session = (struct luo_session *)s;
+
+ return luo_retrieve_file(session, token, filep);
+}
diff --git a/kernel/liveupdate/luo_internal.h b/kernel/liveupdate/luo_internal.h
index 5185ad37a8c1..1a36f2383123 100644
--- a/kernel/liveupdate/luo_internal.h
+++ b/kernel/liveupdate/luo_internal.h
@@ -70,4 +70,13 @@ int luo_session_serialize(void);
int luo_session_deserialize(void);
bool luo_session_is_deserialized(void);
+int luo_preserve_file(struct luo_session *session, u64 token, int fd);
+void luo_file_unpreserve_files(struct luo_session *session);
+int luo_file_freeze(struct luo_session *session);
+void luo_file_unfreeze(struct luo_session *session);
+int luo_retrieve_file(struct luo_session *session, u64 token,
+ struct file **filep);
+int luo_file_finish(struct luo_session *session);
+int luo_file_deserialize(struct luo_session *session);
+
#endif /* _LINUX_LUO_INTERNAL_H */
--
2.52.0.rc1.455.g30608eb744-goog
* [PATCH v6 07/20] liveupdate: luo_session: Add ioctls for file preservation
2025-11-15 23:33 [PATCH v6 00/20] Live Update Orchestrator Pasha Tatashin
` (5 preceding siblings ...)
2025-11-15 23:33 ` [PATCH v6 06/20] liveupdate: luo_file: implement file systems callbacks Pasha Tatashin
@ 2025-11-15 23:33 ` Pasha Tatashin
2025-11-16 18:25 ` Mike Rapoport
2025-11-15 23:33 ` [PATCH v6 08/20] liveupdate: luo_flb: Introduce File-Lifecycle-Bound global state Pasha Tatashin
` (12 subsequent siblings)
19 siblings, 1 reply; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-15 23:33 UTC (permalink / raw)
Introduce the userspace interface and internal logic required to
manage the lifecycle of file descriptors within a session. Previously, a
session was merely a container; this change makes it a functional
management unit.
A new set of ioctl commands is added. They operate on the file
descriptor returned by CREATE_SESSION and allow userspace to:
- LIVEUPDATE_SESSION_PRESERVE_FD: Add a file descriptor to a session
to be preserved across the live update.
- LIVEUPDATE_SESSION_RETRIEVE_FD: Retrieve a preserved file in the
new kernel using its unique token.
- LIVEUPDATE_SESSION_FINISH: Finish the session in the new kernel,
releasing the kernel's ownership of its preserved resources.
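As a rough userspace illustration of the commands listed above, the
sketch below fills in the argument structures for PRESERVE_FD and
RETRIEVE_FD. It is not part of the patch. The struct layouts mirror the
uapi definitions in this patch using <stdint.h> types so that the example
builds without the new header; the struct and helper names
(session_preserve_fd, make_preserve_args, and so on) are hypothetical,
and the ioctl() call itself is only indicated in comments.

```c
#include <stdint.h>
#include <string.h>

/*
 * Local mirrors of the uapi structs from this patch. Assumption: the
 * layout matches include/uapi/linux/liveupdate.h as posted (u32 size,
 * s32 fd, 8-byte aligned u64 token).
 */
struct session_preserve_fd {
	uint32_t size;
	int32_t fd;
	uint64_t token __attribute__((aligned(8)));
};

struct session_retrieve_fd {
	uint32_t size;
	int32_t fd;
	uint64_t token __attribute__((aligned(8)));
};

/* Build the argument for preserving 'fd' under a caller-chosen token. */
struct session_preserve_fd make_preserve_args(int fd, uint64_t token)
{
	struct session_preserve_fd args;

	memset(&args, 0, sizeof(args));
	args.size = sizeof(args);	/* kernel checks this against its minimum */
	args.fd = fd;
	args.token = token;
	/* Old kernel: ioctl(session_fd, LIVEUPDATE_SESSION_PRESERVE_FD, &args) */
	return args;
}

/* Build the argument for retrieving the file preserved under 'token'. */
struct session_retrieve_fd make_retrieve_args(uint64_t token)
{
	struct session_retrieve_fd args;

	memset(&args, 0, sizeof(args));
	args.size = sizeof(args);
	args.fd = -1;			/* output field, filled in by the kernel */
	args.token = token;
	/* New kernel: ioctl(session_fd, LIVEUPDATE_SESSION_RETRIEVE_FD, &args) */
	return args;
}
```

In this sketch the same token value links the two calls: it is chosen by
userspace before the update and presented again afterwards.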
The session's .release handler is now state-aware. When a session's
file descriptor is closed, an outgoing session has its files
unpreserved, while a retrieved (incoming) session is finished. The
session and its associated file resources are then freed, unless
finishing an incoming session fails.
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
include/uapi/linux/liveupdate.h | 103 ++++++++++++++++++
kernel/liveupdate/luo_session.c | 187 +++++++++++++++++++++++++++++++-
2 files changed, 286 insertions(+), 4 deletions(-)
diff --git a/include/uapi/linux/liveupdate.h b/include/uapi/linux/liveupdate.h
index 6e04254ee535..3902ffab4c53 100644
--- a/include/uapi/linux/liveupdate.h
+++ b/include/uapi/linux/liveupdate.h
@@ -53,6 +53,14 @@ enum {
LIVEUPDATE_CMD_RETRIEVE_SESSION = 0x01,
};
+/* ioctl commands for session file descriptors */
+enum {
+ LIVEUPDATE_CMD_SESSION_BASE = 0x40,
+ LIVEUPDATE_CMD_SESSION_PRESERVE_FD = LIVEUPDATE_CMD_SESSION_BASE,
+ LIVEUPDATE_CMD_SESSION_RETRIEVE_FD = 0x41,
+ LIVEUPDATE_CMD_SESSION_FINISH = 0x42,
+};
+
/**
* struct liveupdate_ioctl_create_session - ioctl(LIVEUPDATE_IOCTL_CREATE_SESSION)
* @size: Input; sizeof(struct liveupdate_ioctl_create_session)
@@ -110,4 +118,99 @@ struct liveupdate_ioctl_retrieve_session {
#define LIVEUPDATE_IOCTL_RETRIEVE_SESSION \
_IO(LIVEUPDATE_IOCTL_TYPE, LIVEUPDATE_CMD_RETRIEVE_SESSION)
+/* Session specific IOCTLs */
+
+/**
+ * struct liveupdate_session_preserve_fd - ioctl(LIVEUPDATE_SESSION_PRESERVE_FD)
+ * @size: Input; sizeof(struct liveupdate_session_preserve_fd)
+ * @fd: Input; The user-space file descriptor to be preserved.
+ * @token: Input; An opaque, unique token for the preserved resource.
+ *
+ * Holds parameters for preserving a file descriptor.
+ *
+ * User sets the @fd field identifying the file descriptor to preserve
+ * (e.g., memfd, kvm, iommufd, VFIO). The kernel validates if this FD type
+ * and its dependencies are supported for preservation. If validation passes,
+ * the kernel marks the FD internally and *initiates the process* of preparing
+ * its state for saving. The actual snapshotting of the state typically occurs
+ * during the subsequent %LIVEUPDATE_IOCTL_PREPARE execution phase, though
+ * some finalization might occur during freeze.
+ * On successful validation and initiation, the kernel associates the @token
+ * field, an opaque identifier supplied by the user, with the preserved resource.
+ * This token confirms the FD is targeted for preservation and is required for
+ * the subsequent %LIVEUPDATE_SESSION_RETRIEVE_FD call after the live update.
+ *
+ * Return: 0 on success (validation passed, preservation initiated), negative
+ * error code on failure (e.g., unsupported FD type, dependency issue,
+ * validation failed).
+ */
+struct liveupdate_session_preserve_fd {
+ __u32 size;
+ __s32 fd;
+ __aligned_u64 token;
+};
+
+#define LIVEUPDATE_SESSION_PRESERVE_FD \
+ _IO(LIVEUPDATE_IOCTL_TYPE, LIVEUPDATE_CMD_SESSION_PRESERVE_FD)
+
+/**
+ * struct liveupdate_session_retrieve_fd - ioctl(LIVEUPDATE_SESSION_RETRIEVE_FD)
+ * @size: Input; sizeof(struct liveupdate_session_retrieve_fd)
+ * @fd: Output; The new file descriptor representing the fully restored
+ * kernel resource.
+ * @token: Input; The opaque token that was used to preserve the resource.
+ *
+ * Retrieve a previously preserved file descriptor.
+ *
+ * User sets the @token field to the value passed to a successful
+ * %LIVEUPDATE_SESSION_PRESERVE_FD call before the live update. On success,
+ * the kernel restores the state (saved during the PREPARE/FREEZE phases)
+ * associated with the token and populates the @fd field with a new file
+ * descriptor referencing the restored resource in the current (new) kernel.
+ * This operation must be performed *before* signaling completion via
+ * %LIVEUPDATE_SESSION_FINISH.
+ *
+ * Return: 0 on success, negative error code on failure (e.g., invalid token).
+ */
+struct liveupdate_session_retrieve_fd {
+ __u32 size;
+ __s32 fd;
+ __aligned_u64 token;
+};
+
+#define LIVEUPDATE_SESSION_RETRIEVE_FD \
+ _IO(LIVEUPDATE_IOCTL_TYPE, LIVEUPDATE_CMD_SESSION_RETRIEVE_FD)
+
+/**
+ * struct liveupdate_session_finish - ioctl(LIVEUPDATE_SESSION_FINISH)
+ * @size: Input; sizeof(struct liveupdate_session_finish)
+ * @reserved: Input; Must be zero. Reserved for future use.
+ *
+ * Signals the completion of the restoration process for a retrieved session.
+ * This is the final operation that should be performed on a session file
+ * descriptor after a live update.
+ *
+ * This ioctl must be called once all required file descriptors for the session
+ * have been successfully retrieved (using %LIVEUPDATE_SESSION_RETRIEVE_FD) and
+ * are fully restored from the userspace and kernel perspective.
+ *
+ * Upon success, the kernel releases its ownership of the preserved resources
+ * associated with this session. This allows internal resources to be freed,
+ * typically by decrementing reference counts on the underlying preserved
+ * objects.
+ *
+ * If this operation fails, the resources remain preserved in memory. Userspace
+ * may attempt to call finish again. The resources will otherwise be reset
+ * during the next live update cycle.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+struct liveupdate_session_finish {
+ __u32 size;
+ __u32 reserved;
+};
+
+#define LIVEUPDATE_SESSION_FINISH \
+ _IO(LIVEUPDATE_IOCTL_TYPE, LIVEUPDATE_CMD_SESSION_FINISH)
+
#endif /* _UAPI_LIVEUPDATE_H */
diff --git a/kernel/liveupdate/luo_session.c b/kernel/liveupdate/luo_session.c
index cb74bfaba479..82ba6e3578f5 100644
--- a/kernel/liveupdate/luo_session.c
+++ b/kernel/liveupdate/luo_session.c
@@ -174,26 +174,189 @@ static void luo_session_remove(struct luo_session_header *sh,
sh->count--;
}
+static int luo_session_finish_one(struct luo_session *session)
+{
+ guard(mutex)(&session->mutex);
+ return luo_file_finish(session);
+}
+
+static void luo_session_unfreeze_one(struct luo_session *session)
+{
+ guard(mutex)(&session->mutex);
+ luo_file_unfreeze(session);
+}
+
+static int luo_session_freeze_one(struct luo_session *session)
+{
+ guard(mutex)(&session->mutex);
+ return luo_file_freeze(session);
+}
+
static int luo_session_release(struct inode *inodep, struct file *filep)
{
struct luo_session *session = filep->private_data;
struct luo_session_header *sh;
+ int err = 0;
/* If retrieved is set, it means this session is from incoming list */
- if (session->retrieved)
+ if (session->retrieved) {
sh = &luo_session_global.incoming;
- else
+
+ err = luo_session_finish_one(session);
+ if (err) {
+ pr_warn("Unable to finish session [%s] on release\n",
+ session->name);
+ } else {
+ luo_session_remove(sh, session);
+ luo_session_free(session);
+ }
+
+ } else {
sh = &luo_session_global.outgoing;
- luo_session_remove(sh, session);
- luo_session_free(session);
+ scoped_guard(mutex, &session->mutex)
+ luo_file_unpreserve_files(session);
+ luo_session_remove(sh, session);
+ luo_session_free(session);
+ }
+
+ return err;
+}
+
+static int luo_session_preserve_fd(struct luo_session *session,
+ struct luo_ucmd *ucmd)
+{
+ struct liveupdate_session_preserve_fd *argp = ucmd->cmd;
+ int err;
+
+ guard(mutex)(&session->mutex);
+ err = luo_preserve_file(session, argp->token, argp->fd);
+ if (err)
+ return err;
+
+ err = luo_ucmd_respond(ucmd, sizeof(*argp));
+ if (err)
+ pr_warn("The file was successfully preserved, but response to user failed\n");
+
+ return err;
+}
+
+static int luo_session_retrieve_fd(struct luo_session *session,
+ struct luo_ucmd *ucmd)
+{
+ struct liveupdate_session_retrieve_fd *argp = ucmd->cmd;
+ struct file *file;
+ int err;
+
+ argp->fd = get_unused_fd_flags(O_CLOEXEC);
+ if (argp->fd < 0)
+ return argp->fd;
+
+ guard(mutex)(&session->mutex);
+ err = luo_retrieve_file(session, argp->token, &file);
+ if (err < 0)
+ goto err_put_fd;
+
+ err = luo_ucmd_respond(ucmd, sizeof(*argp));
+ if (err)
+ goto err_put_file;
+
+ fd_install(argp->fd, file);
return 0;
+
+err_put_file:
+ fput(file);
+err_put_fd:
+ put_unused_fd(argp->fd);
+
+ return err;
+}
+
+static int luo_session_finish(struct luo_session *session,
+ struct luo_ucmd *ucmd)
+{
+ struct liveupdate_session_finish *argp = ucmd->cmd;
+ int err = luo_session_finish_one(session);
+
+ if (err)
+ return err;
+
+ return luo_ucmd_respond(ucmd, sizeof(*argp));
+}
+
+union ucmd_buffer {
+ struct liveupdate_session_finish finish;
+ struct liveupdate_session_preserve_fd preserve;
+ struct liveupdate_session_retrieve_fd retrieve;
+};
+
+struct luo_ioctl_op {
+ unsigned int size;
+ unsigned int min_size;
+ unsigned int ioctl_num;
+ int (*execute)(struct luo_session *session, struct luo_ucmd *ucmd);
+};
+
+#define IOCTL_OP(_ioctl, _fn, _struct, _last) \
+ [_IOC_NR(_ioctl) - LIVEUPDATE_CMD_SESSION_BASE] = { \
+ .size = sizeof(_struct) + \
+ BUILD_BUG_ON_ZERO(sizeof(union ucmd_buffer) < \
+ sizeof(_struct)), \
+ .min_size = offsetofend(_struct, _last), \
+ .ioctl_num = _ioctl, \
+ .execute = _fn, \
+ }
+
+static const struct luo_ioctl_op luo_session_ioctl_ops[] = {
+ IOCTL_OP(LIVEUPDATE_SESSION_FINISH, luo_session_finish,
+ struct liveupdate_session_finish, reserved),
+ IOCTL_OP(LIVEUPDATE_SESSION_PRESERVE_FD, luo_session_preserve_fd,
+ struct liveupdate_session_preserve_fd, token),
+ IOCTL_OP(LIVEUPDATE_SESSION_RETRIEVE_FD, luo_session_retrieve_fd,
+ struct liveupdate_session_retrieve_fd, token),
+};
+
+static long luo_session_ioctl(struct file *filep, unsigned int cmd,
+ unsigned long arg)
+{
+ struct luo_session *session = filep->private_data;
+ const struct luo_ioctl_op *op;
+ struct luo_ucmd ucmd = {};
+ union ucmd_buffer buf;
+ unsigned int nr;
+ int ret;
+
+ nr = _IOC_NR(cmd);
+ if (nr < LIVEUPDATE_CMD_SESSION_BASE || (nr - LIVEUPDATE_CMD_SESSION_BASE) >=
+ ARRAY_SIZE(luo_session_ioctl_ops)) {
+ return -EINVAL;
+ }
+
+ ucmd.ubuffer = (void __user *)arg;
+ ret = get_user(ucmd.user_size, (u32 __user *)ucmd.ubuffer);
+ if (ret)
+ return ret;
+
+ op = &luo_session_ioctl_ops[nr - LIVEUPDATE_CMD_SESSION_BASE];
+ if (op->ioctl_num != cmd)
+ return -ENOIOCTLCMD;
+ if (ucmd.user_size < op->min_size)
+ return -EINVAL;
+
+ ucmd.cmd = &buf;
+ ret = copy_struct_from_user(ucmd.cmd, op->size, ucmd.ubuffer,
+ ucmd.user_size);
+ if (ret)
+ return ret;
+
+ return op->execute(session, &ucmd);
}
static const struct file_operations luo_session_fops = {
.owner = THIS_MODULE,
.release = luo_session_release,
+ .unlocked_ioctl = luo_session_ioctl,
};
/* Create a "struct file" for session */
@@ -391,6 +554,8 @@ int luo_session_deserialize(void)
session->count = sh->ser[i].count;
session->files = sh->ser[i].files ? phys_to_virt(sh->ser[i].files) : 0;
session->pgcnt = sh->ser[i].pgcnt;
+ scoped_guard(mutex, &session->mutex)
+ luo_file_deserialize(session);
}
kho_restore_free(sh->header_ser);
@@ -405,9 +570,14 @@ int luo_session_serialize(void)
struct luo_session_header *sh = &luo_session_global.outgoing;
struct luo_session *session;
int i = 0;
+ int err;
guard(rwsem_write)(&sh->rwsem);
list_for_each_entry(session, &sh->list, list) {
+ err = luo_session_freeze_one(session);
+ if (err)
+ goto err_undo;
+
strscpy(sh->ser[i].name, session->name,
sizeof(sh->ser[i].name));
sh->ser[i].count = session->count;
@@ -418,4 +588,13 @@ int luo_session_serialize(void)
sh->header_ser->count = sh->count;
return 0;
+
+err_undo:
+ list_for_each_entry_continue_reverse(session, &sh->list, list) {
+ luo_session_unfreeze_one(session);
+ i--;
+ memset(&sh->ser[i], 0, sizeof(sh->ser[i]));
+ }
+
+ return err;
}
--
2.52.0.rc1.455.g30608eb744-goog
* [PATCH v6 08/20] liveupdate: luo_flb: Introduce File-Lifecycle-Bound global state
2025-11-15 23:33 [PATCH v6 00/20] Live Update Orchestrator Pasha Tatashin
` (6 preceding siblings ...)
2025-11-15 23:33 ` [PATCH v6 07/20] liveupdate: luo_session: Add ioctls for file preservation Pasha Tatashin
@ 2025-11-15 23:33 ` Pasha Tatashin
2025-11-17 9:39 ` Mike Rapoport
2025-11-15 23:33 ` [PATCH v6 09/20] docs: add luo documentation Pasha Tatashin
` (11 subsequent siblings)
19 siblings, 1 reply; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-15 23:33 UTC (permalink / raw)
Introduce a mechanism for managing global kernel state whose lifecycle
is tied to the preservation of one or more files. This is necessary for
subsystems where multiple preserved file descriptors depend on a single,
shared underlying resource.
An example is HugeTLB, where multiple file descriptors such as memfd and
guest_memfd may rely on the state of a single HugeTLB subsystem.
Preserving this state for each individual file would be redundant and
incorrect. The state should be preserved only once when the first file
is preserved, and restored/finished only once the last file is handled.
This patch introduces File-Lifecycle-Bound (FLB) objects to solve this
problem. An FLB is a global, reference-counted object with a defined set
of operations:
- A file handler (struct liveupdate_file_handler) declares a dependency
on one or more FLBs via a new registration function,
liveupdate_register_flb().
- When the first file depending on an FLB is preserved, the FLB's
.preserve() callback is invoked to save the shared global state. The
reference count is then incremented for each subsequent file.
- Conversely, when the last file is unpreserved (before reboot) or
finished (after reboot), the FLB's .unpreserve() or .finish() callback
is invoked to clean up the global resource.
The implementation includes:
- A new set of ABI definitions (luo_flb_ser, luo_flb_head_ser) and a
corresponding FDT node (luo-flb) to serialize the state of all active
FLBs and pass them via Kexec Handover.
- Core logic in luo_flb.c to manage FLB registration, reference
counting, and the invocation of lifecycle callbacks.
- An API (liveupdate_flb_*_locked/*_unlock) for other kernel subsystems
to safely access the live object managed by an FLB, both before and
after the live update.
This framework provides the necessary infrastructure for more complex
subsystems like IOMMU, VFIO, and KVM to integrate with the Live Update
Orchestrator.
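The first-file/last-file rule described above can be modelled in a few
lines of plain C. This is only a sketch of the reference-counting idea,
not the code in luo_flb.c: the type and function names (flb_model,
flb_model_file_preserve, flb_model_file_unpreserve) are hypothetical,
the callbacks are replaced by counters, and locking is omitted.

```c
/* Toy model of one FLB shared by several preserved files. */
struct flb_model {
	long count;		/* preserved files currently depending on the FLB */
	int preserve_calls;	/* stands in for invocations of ops->preserve() */
	int unpreserve_calls;	/* stands in for invocations of ops->unpreserve() */
};

/* A dependent file is being preserved. */
void flb_model_file_preserve(struct flb_model *flb)
{
	/* Only the first dependent file saves the shared global state. */
	if (flb->count == 0)
		flb->preserve_calls++;
	flb->count++;
}

/* A dependent file is being unpreserved (update aborted before reboot). */
void flb_model_file_unpreserve(struct flb_model *flb)
{
	if (flb->count == 0)
		return;		/* nothing preserved; ignore in this model */

	flb->count--;
	/* Only the last dependent file tears the shared state down. */
	if (flb->count == 0)
		flb->unpreserve_calls++;
}
```

With two files sharing the model, the preserve counter is bumped once on
the first call and the unpreserve counter once on the final call, which
matches the "only once" requirement stated for HugeTLB above.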
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
include/linux/liveupdate.h | 116 +++++
include/linux/liveupdate/abi/luo.h | 76 ++++
kernel/liveupdate/Makefile | 1 +
kernel/liveupdate/luo_core.c | 7 +-
kernel/liveupdate/luo_file.c | 8 +
kernel/liveupdate/luo_flb.c | 658 +++++++++++++++++++++++++++++
kernel/liveupdate/luo_internal.h | 7 +
7 files changed, 872 insertions(+), 1 deletion(-)
create mode 100644 kernel/liveupdate/luo_flb.c
diff --git a/include/linux/liveupdate.h b/include/linux/liveupdate.h
index 4a5d4dd9905a..36a831ae3ead 100644
--- a/include/linux/liveupdate.h
+++ b/include/linux/liveupdate.h
@@ -14,6 +14,7 @@
#include <uapi/linux/liveupdate.h>
struct liveupdate_file_handler;
+struct liveupdate_flb;
struct liveupdate_session;
struct file;
@@ -81,6 +82,7 @@ struct liveupdate_file_ops {
* associated with individual &struct file instances.
* @list: Used for linking this handler instance into a global
* list of registered file handlers.
+ * @flb_list: A list of FLB dependencies.
*
* Modules that want to support live update for specific file types should
* register an instance of this structure. LUO uses this registration to
@@ -91,6 +93,80 @@ struct liveupdate_file_handler {
const struct liveupdate_file_ops *ops;
const char compatible[LIVEUPDATE_HNDL_COMPAT_LENGTH];
struct list_head list;
+ struct list_head flb_list;
+};
+
+/**
+ * struct liveupdate_flb_op_args - Arguments for FLB operation callbacks.
+ * @flb: The global FLB instance for which this call is performed.
+ * @data: For .preserve(): [OUT] The callback sets this field.
+ * For .unpreserve(): [IN] The handle from .preserve().
+ * For .retrieve(): [IN] The handle from .preserve().
+ * @obj: For .preserve(): [OUT] Sets this to the live object.
+ * For .retrieve(): [OUT] Sets this to the live object.
+ * For .finish(): [IN] The live object from .retrieve().
+ *
+ * This structure bundles all parameters for the FLB operation callbacks.
+ */
+struct liveupdate_flb_op_args {
+ struct liveupdate_flb *flb;
+ u64 data;
+ void *obj;
+};
+
+/**
+ * struct liveupdate_flb_ops - Callbacks for global File-Lifecycle-Bound data.
+ * @preserve: Called when the first file using this FLB is preserved.
+ * The callback must save its state and return a single,
+ * self-contained u64 handle by setting the 'argp->data'
+ * field and 'argp->obj'.
+ * @unpreserve: Called when the last file using this FLB is unpreserved
+ * (aborted before reboot). Receives the handle via
+ * 'argp->data' and live object via 'argp->obj'.
+ * @retrieve: Called on-demand in the new kernel, the first time a
+ * component requests access to the shared object. It receives
+ * the preserved handle via 'argp->data' and must reconstruct
+ * the live object, returning it by setting the 'argp->obj'
+ * field.
+ * @finish: Called in the new kernel when the last file using this FLB
+ * is finished. Receives the live object via 'argp->obj' for
+ * cleanup.
+ * @owner: Module reference
+ *
+ * Operations that manage global shared data with file bound lifecycle,
+ * triggered by the first file that uses it and concluded by the last file that
+ * uses it, across all sessions.
+ */
+struct liveupdate_flb_ops {
+ int (*preserve)(struct liveupdate_flb_op_args *argp);
+ void (*unpreserve)(struct liveupdate_flb_op_args *argp);
+ int (*retrieve)(struct liveupdate_flb_op_args *argp);
+ void (*finish)(struct liveupdate_flb_op_args *argp);
+ struct module *owner;
+};
+
+/**
+ * struct liveupdate_flb - A global definition for a shared data object.
+ * @ops: Callback functions
+ * @compatible: The compatibility string (e.g., "iommu-core-v1")
+ * that uniquely identifies the FLB type this handler
+ * supports. This is matched against the compatible string
+ * associated with individual &struct liveupdate_flb
+ * instances.
+ * @list: A global list of registered FLBs.
+ * @internal: Internal state, set in liveupdate_init_flb().
+ *
+ * This struct is the "template" that a driver registers to define a shared,
+ * file-lifecycle-bound object. The actual runtime state (the live object,
+ * refcount, etc.) is managed internally by the LUO core.
+ * Use liveupdate_init_flb() to initialize this struct before using it in
+ * other functions.
+ */
+struct liveupdate_flb {
+ const struct liveupdate_flb_ops *ops;
+ const char compatible[LIVEUPDATE_FLB_COMPAT_LENGTH];
+ struct list_head list;
+ void *internal;
};
#ifdef CONFIG_LIVEUPDATE
@@ -111,6 +187,17 @@ int liveupdate_get_file_incoming(struct liveupdate_session *s, u64 token,
int liveupdate_get_token_outgoing(struct liveupdate_session *s,
struct file *file, u64 *tokenp);
+/* Before using FLB for the first time it should be initialized */
+int liveupdate_init_flb(struct liveupdate_flb *flb);
+
+int liveupdate_register_flb(struct liveupdate_file_handler *h,
+ struct liveupdate_flb *flb);
+
+int liveupdate_flb_incoming_locked(struct liveupdate_flb *flb, void **objp);
+void liveupdate_flb_incoming_unlock(struct liveupdate_flb *flb, void *obj);
+int liveupdate_flb_outgoing_locked(struct liveupdate_flb *flb, void **objp);
+void liveupdate_flb_outgoing_unlock(struct liveupdate_flb *flb, void *obj);
+
#else /* CONFIG_LIVEUPDATE */
static inline bool liveupdate_enabled(void)
@@ -140,5 +227,34 @@ static inline int liveupdate_get_token_outgoing(struct liveupdate_session *s,
return -EOPNOTSUPP;
}
+static inline int liveupdate_init_flb(struct liveupdate_flb *flb)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline int liveupdate_register_flb(struct liveupdate_file_handler *h,
+ struct liveupdate_flb *flb)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline int liveupdate_flb_incoming_locked(struct liveupdate_flb *flb,
+ void **objp)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline void liveupdate_flb_incoming_unlock(struct liveupdate_flb *flb,
+ void *obj) { }
+
+static inline int liveupdate_flb_outgoing_locked(struct liveupdate_flb *flb,
+ void **objp)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline void liveupdate_flb_outgoing_unlock(struct liveupdate_flb *flb,
+ void *obj) { }
+
#endif /* CONFIG_LIVEUPDATE */
#endif /* _LINUX_LIVEUPDATE_H */
diff --git a/include/linux/liveupdate/abi/luo.h b/include/linux/liveupdate/abi/luo.h
index 3a596ca1907b..85596ce68c16 100644
--- a/include/linux/liveupdate/abi/luo.h
+++ b/include/linux/liveupdate/abi/luo.h
@@ -33,6 +33,11 @@
* compatible = "luo-session-v1";
* luo-session-header = <phys_addr_of_session_header_ser>;
* };
+ *
+ * luo-flb {
+ * compatible = "luo-flb-v1";
+ * luo-flb-header = <phys_addr_of_flb_header_ser>;
+ * };
* };
*
* Main LUO Node (/):
@@ -52,6 +57,17 @@
* is the header for a contiguous block of memory containing an array of
* `struct luo_session_ser`, one for each preserved session.
*
+ * File-Lifecycle-Bound Node (luo-flb):
+ * This node describes all preserved global objects whose lifecycle is bound
+ * to that of the preserved files (e.g., shared IOMMU state).
+ *
+ * - compatible: "luo-flb-v1"
+ * Identifies the FLB ABI version.
+ * - luo-flb-header: u64
+ * The physical address of a `struct luo_flb_header_ser`. This structure is
+ * the header for a contiguous block of memory containing an array of
+ * `struct luo_flb_ser`, one for each preserved global object.
+ *
* Serialization Structures:
* The FDT properties point to memory regions containing arrays of simple,
* `__packed` structures. These structures contain the actual preserved state.
@@ -70,6 +86,16 @@
* Metadata for a single preserved file. Contains the `compatible` string to
* find the correct handler in the new kernel, a user-provided `token` for
* identification, and an opaque `data` handle for the handler to use.
+ *
+ * - struct luo_flb_header_ser:
+ * Header for the FLB array. Contains the total page count of the
+ * preserved memory block and the number of `struct luo_flb_ser` entries
+ * that follow.
+ *
+ * - struct luo_flb_ser:
+ * Metadata for a single preserved global object. Contains its `name`
+ * (compatible string), an opaque `data` handle, and `count`, the
+ * number of files depending on it.
*/
#ifndef _LINUX_LIVEUPDATE_ABI_LUO_H
@@ -154,4 +180,54 @@ struct luo_file_ser {
u64 token;
} __packed;
+/* The max size is set so it can be reliably used during serialization */
+#define LIVEUPDATE_FLB_COMPAT_LENGTH 48
+
+#define LUO_FDT_FLB_NODE_NAME "luo-flb"
+#define LUO_FDT_FLB_COMPATIBLE "luo-flb-v1"
+#define LUO_FDT_FLB_HEADER "luo-flb-header"
+
+/**
+ * struct luo_flb_header_ser - Header for the serialized FLB data block.
+ * @pgcnt: The total number of pages occupied by the entire preserved memory
+ * region, including this header and the subsequent array of
+ * &struct luo_flb_ser entries.
+ * @count: The number of &struct luo_flb_ser entries that follow this header
+ * in the memory block.
+ *
+ * This structure is located at the physical address specified by the
+ * `LUO_FDT_FLB_HEADER` FDT property. It provides the new kernel with the
+ * necessary information to find and iterate over the array of preserved
+ * File-Lifecycle-Bound objects and to manage the underlying memory.
+ *
+ * If this structure is modified, LUO_FDT_FLB_COMPATIBLE must be updated.
+ */
+struct luo_flb_header_ser {
+ u64 pgcnt;
+ u64 count;
+} __packed;
+
+/**
+ * struct luo_flb_ser - Represents the serialized state of a single FLB object.
+ * @name: The unique compatibility string of the FLB object, used to find the
+ * corresponding &struct liveupdate_flb handler in the new kernel.
+ * @data: The opaque u64 handle returned by the FLB's .preserve() operation
+ * in the old kernel. This handle encapsulates the entire state needed
+ * for restoration.
+ * @count: The reference count at the time of serialization; i.e., the number
+ * of preserved files that depended on this FLB. This is used by the
+ * new kernel to correctly manage the FLB's lifecycle.
+ *
+ * An array of these structures is created in a preserved memory region and
+ * passed to the new kernel. Each entry allows the LUO core to restore one
+ * global, shared object.
+ *
+ * If this structure is modified, LUO_FDT_FLB_COMPATIBLE must be updated.
+ */
+struct luo_flb_ser {
+ char name[LIVEUPDATE_FLB_COMPAT_LENGTH];
+ u64 data;
+ u64 count;
+} __packed;
+
#endif /* _LINUX_LIVEUPDATE_ABI_LUO_H */
diff --git a/kernel/liveupdate/Makefile b/kernel/liveupdate/Makefile
index c2252a2ad7bd..8d5a8354ad5a 100644
--- a/kernel/liveupdate/Makefile
+++ b/kernel/liveupdate/Makefile
@@ -3,6 +3,7 @@
luo-y := \
luo_core.o \
luo_file.o \
+ luo_flb.o \
luo_ioctl.o \
luo_session.o
diff --git a/kernel/liveupdate/luo_core.c b/kernel/liveupdate/luo_core.c
index 653cdca5e25d..7c3932b6f96f 100644
--- a/kernel/liveupdate/luo_core.c
+++ b/kernel/liveupdate/luo_core.c
@@ -122,7 +122,9 @@ static int __init luo_early_startup(void)
if (err)
return err;
- return 0;
+ err = luo_flb_setup_incoming(luo_global.fdt_in);
+
+ return err;
}
static int __init liveupdate_early_init(void)
@@ -159,6 +161,7 @@ static int __init luo_fdt_setup(void)
err |= fdt_property_string(fdt_out, "compatible", LUO_FDT_COMPATIBLE);
err |= fdt_property(fdt_out, LUO_FDT_LIVEUPDATE_NUM, &ln, sizeof(ln));
err |= luo_session_setup_outgoing(fdt_out);
+ err |= luo_flb_setup_outgoing(fdt_out);
err |= fdt_end_node(fdt_out);
err |= fdt_finish(fdt_out);
if (err)
@@ -220,6 +223,8 @@ int liveupdate_reboot(void)
if (err)
return err;
+ luo_flb_serialize();
+
err = kho_finalize();
if (err) {
pr_err("kho_finalize failed %d\n", err);
diff --git a/kernel/liveupdate/luo_file.c b/kernel/liveupdate/luo_file.c
index dae27a69a09f..3d3bd84cb281 100644
--- a/kernel/liveupdate/luo_file.c
+++ b/kernel/liveupdate/luo_file.c
@@ -282,6 +282,10 @@ int luo_preserve_file(struct luo_session *session, u64 token, int fd)
if (err)
goto exit_err;
+ err = luo_flb_file_preserve(fh);
+ if (err)
+ goto exit_err;
+
luo_file = kzalloc(sizeof(*luo_file), GFP_KERNEL);
if (!luo_file) {
err = -ENOMEM;
@@ -301,6 +305,7 @@ int luo_preserve_file(struct luo_session *session, u64 token, int fd)
if (err) {
mutex_destroy(&luo_file->mutex);
kfree(luo_file);
+ luo_flb_file_unpreserve(fh);
goto exit_err;
} else {
luo_file->serialized_data = args.serialized_data;
@@ -352,6 +357,7 @@ void luo_file_unpreserve_files(struct luo_session *session)
args.file = luo_file->file;
args.serialized_data = luo_file->serialized_data;
luo_file->fh->ops->unpreserve(&args);
+ luo_flb_file_unpreserve(luo_file->fh);
list_del(&luo_file->list);
session->count--;
@@ -624,6 +630,7 @@ static void luo_file_finish_one(struct luo_session *session,
args.file = luo_file->file;
args.serialized_data = luo_file->serialized_data;
args.retrieved = luo_file->retrieved;
+ luo_flb_file_finish(luo_file->fh);
luo_file->fh->ops->finish(&args);
}
@@ -815,6 +822,7 @@ int liveupdate_register_file_handler(struct liveupdate_file_handler *fh)
return -EAGAIN;
INIT_LIST_HEAD(&fh->list);
+ INIT_LIST_HEAD(&fh->flb_list);
list_add_tail(&fh->list, &luo_file_handler_list);
return 0;
diff --git a/kernel/liveupdate/luo_flb.c b/kernel/liveupdate/luo_flb.c
new file mode 100644
index 000000000000..47fcd3d74eb5
--- /dev/null
+++ b/kernel/liveupdate/luo_flb.c
@@ -0,0 +1,658 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+
+/**
+ * DOC: LUO File Lifecycle Bound Global Data
+ *
+ * File-Lifecycle-Bound (FLB) objects provide a mechanism for managing global
+ * state that is shared across multiple live-updatable files. The lifecycle of
+ * this shared state is tied to the preservation of the files that depend on it.
+ *
+ * An FLB represents a global resource, such as the IOMMU core state, that is
+ * required by multiple file descriptors (e.g., all VFIO fds).
+ *
+ * The preservation of the FLB's state is triggered when the *first* file
+ * depending on it is preserved. The cleanup of this state (unpreserve or
+ * finish) is triggered when the *last* file depending on it is unpreserved or
+ * finished.
+ *
+ * Handler Dependency: A file handler declares its dependency on one or more
+ * FLBs by registering them via liveupdate_register_flb().
+ *
+ * Callback Model: Each FLB is defined by a set of operations
+ * (&struct liveupdate_flb_ops) that LUO invokes at key points:
+ *
+ * - .preserve(): Called for the first file. Saves global state.
+ * - .unpreserve(): Called for the last file (if aborted pre-reboot).
+ * - .retrieve(): Called on-demand in the new kernel to restore the state.
+ * - .finish(): Called for the last file in the new kernel for cleanup.
+ *
+ * This reference-counted approach ensures that shared state is saved exactly
+ * once and restored exactly once, regardless of how many files depend on it,
+ * and that its lifecycle is correctly managed across the kexec transition.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/cleanup.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/io.h>
+#include <linux/kexec_handover.h>
+#include <linux/libfdt.h>
+#include <linux/list.h>
+#include <linux/liveupdate.h>
+#include <linux/liveupdate/abi/luo.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/unaligned.h>
+#include "luo_internal.h"
+
+#define LUO_FLB_PGCNT 1ul
+#define LUO_FLB_MAX (((LUO_FLB_PGCNT << PAGE_SHIFT) - \
+ sizeof(struct luo_flb_header_ser)) / sizeof(struct luo_flb_ser))
+
+struct luo_flb_header {
+ struct luo_flb_header_ser *header_ser;
+ struct luo_flb_ser *ser;
+ bool active;
+};
+
+struct luo_flb_global {
+ struct luo_flb_header incoming;
+ struct luo_flb_header outgoing;
+ struct list_head list;
+ long count;
+};
+
+static struct luo_flb_global luo_flb_global = {
+ .list = LIST_HEAD_INIT(luo_flb_global.list),
+};
+
+/*
+ * struct luo_flb_link - Links an FLB definition to a file handler's internal
+ * list of dependencies.
+ * @flb: A pointer to the registered &struct liveupdate_flb definition.
+ * @list: The list_head for linking.
+ */
+struct luo_flb_link {
+ struct liveupdate_flb *flb;
+ struct list_head list;
+};
+
+/*
+ * struct luo_flb_state - Holds the runtime state for one FLB lifecycle path.
+ * @count: The number of preserved files currently depending on this FLB.
+ * This is used to trigger the preserve/unpreserve/finish ops on the
+ * first/last file.
+ * @data: The opaque u64 handle returned by .preserve() or passed to
+ * .retrieve().
+ * @obj: The live kernel object returned by .preserve() or .retrieve().
+ * @lock: A mutex that protects all fields within this structure, providing
+ * the synchronization service for the FLB's ops.
+ */
+struct luo_flb_state {
+ long count;
+ u64 data;
+ void *obj;
+ struct mutex lock;
+};
+
+/*
+ * struct luo_flb_internal - Keep separate incoming and outgoing states.
+ * @outgoing: The runtime state for the pre-reboot (preserve/unpreserve)
+ * lifecycle.
+ * @incoming: The runtime state for the post-reboot (retrieve/finish)
+ * lifecycle.
+ */
+struct luo_flb_internal {
+ struct luo_flb_state outgoing;
+ struct luo_flb_state incoming;
+};
+
+static int luo_flb_file_preserve_one(struct liveupdate_flb *flb)
+{
+ struct luo_flb_internal *internal = flb->internal;
+
+ scoped_guard(mutex, &internal->outgoing.lock) {
+ if (!internal->outgoing.count) {
+ struct liveupdate_flb_op_args args = {0};
+ int err;
+
+ args.flb = flb;
+ err = flb->ops->preserve(&args);
+ if (err)
+ return err;
+ internal->outgoing.data = args.data;
+ internal->outgoing.obj = args.obj;
+ }
+ internal->outgoing.count++;
+ }
+
+ return 0;
+}
+
+static void luo_flb_file_unpreserve_one(struct liveupdate_flb *flb)
+{
+ struct luo_flb_internal *internal = flb->internal;
+
+ scoped_guard(mutex, &internal->outgoing.lock) {
+ internal->outgoing.count--;
+ if (!internal->outgoing.count) {
+ struct liveupdate_flb_op_args args = {0};
+
+ args.flb = flb;
+ args.data = internal->outgoing.data;
+ args.obj = internal->outgoing.obj;
+
+ if (flb->ops->unpreserve)
+ flb->ops->unpreserve(&args);
+
+ internal->outgoing.data = 0;
+ internal->outgoing.obj = NULL;
+ }
+ }
+}
+
+static int luo_flb_retrieve_one(struct liveupdate_flb *flb)
+{
+ struct luo_flb_header *fh = &luo_flb_global.incoming;
+ struct luo_flb_internal *internal = flb->internal;
+ struct liveupdate_flb_op_args args = {0};
+ bool found = false;
+ int err;
+
+ guard(mutex)(&internal->incoming.lock);
+
+ if (internal->incoming.obj)
+ return 0;
+
+ if (!fh->active)
+ return -ENODATA;
+
+ for (int i = 0; i < fh->header_ser->count; i++) {
+ if (!strcmp(fh->ser[i].name, flb->compatible)) {
+ internal->incoming.data = fh->ser[i].data;
+ internal->incoming.count = fh->ser[i].count;
+ found = true;
+ break;
+ }
+ }
+
+ if (!found)
+ return -ENOENT;
+
+ args.flb = flb;
+ args.data = internal->incoming.data;
+
+ err = flb->ops->retrieve(&args);
+ if (err)
+ return err;
+
+ internal->incoming.obj = args.obj;
+
+ if (WARN_ON_ONCE(!internal->incoming.obj))
+ return -EIO;
+
+ return 0;
+}
+
+static void luo_flb_file_finish_one(struct liveupdate_flb *flb)
+{
+ struct luo_flb_internal *internal = flb->internal;
+ long count;
+
+ scoped_guard(mutex, &internal->incoming.lock)
+ count = --internal->incoming.count;
+
+ if (!count) {
+ struct liveupdate_flb_op_args args = {0};
+
+ if (!internal->incoming.obj) {
+ int err = luo_flb_retrieve_one(flb);
+
+ if (WARN_ON(err))
+ return;
+ }
+
+ scoped_guard(mutex, &internal->incoming.lock) {
+ args.flb = flb;
+ args.obj = internal->incoming.obj;
+ flb->ops->finish(&args);
+
+ internal->incoming.data = 0;
+ internal->incoming.obj = NULL;
+ }
+ }
+}
+
+/**
+ * luo_flb_file_preserve - Notifies FLBs that a file is about to be preserved.
+ * @h: The file handler for the preserved file.
+ *
+ * This function iterates through all FLBs associated with the given file
+ * handler. It increments the reference count for each FLB. If the count becomes
+ * 1, it triggers the FLB's .preserve() callback to save the global state.
+ *
+ * This operation is all-or-nothing. If any FLB's .preserve() op fails, the
+ * references taken earlier in this call are dropped again, which calls
+ * .unpreserve() on every FLB whose count returns to zero.
+ *
+ * Context: Called from luo_preserve_file()
+ * Return: 0 on success, or a negative errno on failure.
+ */
+int luo_flb_file_preserve(struct liveupdate_file_handler *h)
+{
+ struct luo_flb_link *iter;
+ int err = 0;
+
+ list_for_each_entry(iter, &h->flb_list, list) {
+ err = luo_flb_file_preserve_one(iter->flb);
+ if (err)
+ goto exit_err;
+ }
+
+ return 0;
+
+exit_err:
+ list_for_each_entry_continue_reverse(iter, &h->flb_list, list)
+ luo_flb_file_unpreserve_one(iter->flb);
+
+ return err;
+}
+
+/**
+ * luo_flb_file_unpreserve - Notifies FLBs that a dependent file was unpreserved.
+ * @h: The file handler for the unpreserved file.
+ *
+ * This function iterates through all FLBs associated with the given file
+ * handler, in reverse order of registration. It decrements the reference count
+ * for each FLB. If the count becomes 0, it triggers the FLB's .unpreserve()
+ * callback to clean up the global state.
+ *
+ * Context: Called when a preserved file is being cleaned up before reboot
+ * (e.g., from luo_file_unpreserve_files()).
+ */
+void luo_flb_file_unpreserve(struct liveupdate_file_handler *h)
+{
+ struct luo_flb_link *iter;
+
+ list_for_each_entry_reverse(iter, &h->flb_list, list)
+ luo_flb_file_unpreserve_one(iter->flb);
+}
+
+/**
+ * luo_flb_file_finish - Notifies FLBs that a dependent file has been finished.
+ * @h: The file handler for the finished file.
+ *
+ * This function iterates through all FLBs associated with the given file
+ * handler, in reverse order of registration. It decrements the incoming
+ * reference count for each FLB. If the count becomes 0, it triggers the FLB's
+ * .finish() callback for final cleanup in the new kernel.
+ *
+ * Context: Called from luo_file_finish() for each file being finished.
+ */
+void luo_flb_file_finish(struct liveupdate_file_handler *h)
+{
+ struct luo_flb_link *iter;
+
+ list_for_each_entry_reverse(iter, &h->flb_list, list)
+ luo_flb_file_finish_one(iter->flb);
+}
+
+/**
+ * liveupdate_init_flb - Initializes a liveupdate FLB structure.
+ * @flb: The &struct liveupdate_flb to initialize.
+ *
+ * This function must be called to prepare an FLB structure before it can be
+ * used with liveupdate_register_flb() or any other LUO functions.
+ *
+ * Context: Typically called once from a subsystem's module init function for
+ * each global FLB object that the module defines.
+ *
+ * Return: 0 on success, -ENOMEM if memory allocation fails, or -EOPNOTSUPP
+ * when live update is disabled or not configured.
+ */
+int liveupdate_init_flb(struct liveupdate_flb *flb)
+{
+ struct luo_flb_internal *internal;
+
+ if (!liveupdate_enabled())
+ return -EOPNOTSUPP;
+
+ internal = kzalloc(sizeof(*internal), GFP_KERNEL);
+ if (!internal)
+ return -ENOMEM;
+
+ mutex_init(&internal->incoming.lock);
+ mutex_init(&internal->outgoing.lock);
+
+ flb->internal = internal;
+ INIT_LIST_HEAD(&flb->list);
+
+ return 0;
+}
+
+/**
+ * liveupdate_register_flb - Associate an FLB with a file handler and register it globally.
+ * @h: The file handler that will now depend on the FLB.
+ * @flb: The File-Lifecycle-Bound object to associate.
+ *
+ * Establishes a dependency, informing the LUO core that whenever a file of
+ * type @h is preserved, the state of @flb must also be managed.
+ *
+ * On the first registration of a given @flb object, it is added to a global
+ * registry. This function checks for duplicate registrations, both for a
+ * specific handler and globally, and ensures the total number of unique
+ * FLBs does not exceed the system limit.
+ *
+ * Context: Typically called from a subsystem's module init function after
+ * both the handler and the FLB have been defined and initialized.
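+ * Return: 0 on success. Returns a negative errno on failure:
+ * -EINVAL if arguments are NULL or not initialized.
+ * -EBUSY if sessions have already been deserialized.
+ * -ENOMEM on memory allocation failure.
+ * -EEXIST if this FLB is already registered with this handler, or if
+ * another FLB with the same compatible string is registered.
+ * -ENOSPC if the maximum number of global FLBs has been reached.
+ * -EAGAIN if a reference to the FLB owner module cannot be taken.
+ * -EOPNOTSUPP if live update is disabled or not configured.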
+ */
+int liveupdate_register_flb(struct liveupdate_file_handler *h,
+ struct liveupdate_flb *flb)
+{
+ struct luo_flb_internal *internal = flb->internal;
+ struct luo_flb_link *link __free(kfree) = NULL;
+ static DEFINE_MUTEX(register_flb_lock);
+ struct liveupdate_flb *gflb;
+ struct luo_flb_link *iter;
+
+ if (!liveupdate_enabled())
+ return -EOPNOTSUPP;
+
+ if (WARN_ON(!h || !flb || !internal))
+ return -EINVAL;
+
+ if (WARN_ON(!flb->ops->preserve || !flb->ops->unpreserve ||
+ !flb->ops->retrieve || !flb->ops->finish)) {
+ return -EINVAL;
+ }
+
+ /*
+ * FLBs cannot be registered once sessions and files have been
+ * deserialized: deserialization uses file handlers, and FLBs attach
+ * to file handlers.
+ */
+ if (WARN_ON(luo_session_is_deserialized()))
+ return -EBUSY;
+
+ /*
+ * File handler must already be registered, as registration
+ * initializes the flb_list
+ */
+ if (WARN_ON(list_empty(&h->list)))
+ return -EINVAL;
+
+ link = kzalloc(sizeof(*link), GFP_KERNEL);
+ if (!link)
+ return -ENOMEM;
+
+ guard(mutex)(&register_flb_lock);
+
+ /* Check that this FLB is not already linked to this file handler */
+ list_for_each_entry(iter, &h->flb_list, list) {
+ if (iter->flb == flb)
+ return -EEXIST;
+ }
+
+ /* If this FLB is not yet on the global list, add it */
+ if (list_empty(&flb->list)) {
+ if (luo_flb_global.count == LUO_FLB_MAX)
+ return -ENOSPC;
+
+ /* Check that compatible string is unique in global list */
+ list_for_each_entry(gflb, &luo_flb_global.list, list) {
+ if (!strcmp(gflb->compatible, flb->compatible))
+ return -EEXIST;
+ }
+
+ if (!try_module_get(flb->ops->owner))
+ return -EAGAIN;
+
+ list_add_tail(&flb->list, &luo_flb_global.list);
+ luo_flb_global.count++;
+ }
+
+ /* Finally, link the FLB to the file handler */
+ link->flb = flb;
+ list_add_tail(&no_free_ptr(link)->list, &h->flb_list);
+
+ return 0;
+}
+
+/**
+ * liveupdate_flb_incoming_locked - Lock and retrieve the incoming FLB object.
+ * @flb: The FLB definition.
+ * @objp: Output parameter; will be populated with the live shared object.
+ *
+ * Acquires the FLB's internal lock and returns a pointer to its shared live
+ * object for the incoming (post-reboot) path.
+ *
+ * If this is the first time the object is requested in the new kernel, this
+ * function will trigger the FLB's .retrieve() callback to reconstruct the
+ * object from its preserved state. Subsequent calls will return the same
+ * cached object.
+ *
+ * The caller MUST call liveupdate_flb_incoming_unlock() to release the lock.
+ *
+ * Return: 0 on success, or a negative errno on failure. -ENODATA means there
+ * is no incoming FLB data, -ENOENT means this FLB was not found in the
+ * incoming data, and -EOPNOTSUPP means live update is disabled or not
+ * configured.
+ */
+int liveupdate_flb_incoming_locked(struct liveupdate_flb *flb, void **objp)
+{
+ struct luo_flb_internal *internal = flb->internal;
+
+ if (!liveupdate_enabled())
+ return -EOPNOTSUPP;
+
+ if (WARN_ON(!internal))
+ return -EINVAL;
+
+ if (!internal->incoming.obj) {
+ int err = luo_flb_retrieve_one(flb);
+
+ if (err)
+ return err;
+ }
+
+ mutex_lock(&internal->incoming.lock);
+ *objp = internal->incoming.obj;
+
+ return 0;
+}
+
+/**
+ * liveupdate_flb_incoming_unlock - Unlock an incoming FLB object.
+ * @flb: The FLB definition.
+ * @obj: The object that was returned by the _locked call (used for validation).
+ *
+ * Releases the internal lock acquired by liveupdate_flb_incoming_locked().
+ */
+void liveupdate_flb_incoming_unlock(struct liveupdate_flb *flb, void *obj)
+{
+ struct luo_flb_internal *internal = flb->internal;
+
+ lockdep_assert_held(&internal->incoming.lock);
+ internal->incoming.obj = obj;
+ mutex_unlock(&internal->incoming.lock);
+}
+
+/**
+ * liveupdate_flb_outgoing_locked - Lock and retrieve the outgoing FLB object.
+ * @flb: The FLB definition.
+ * @objp: Output parameter; will be populated with the live shared object.
+ *
+ * Acquires the FLB's internal lock and returns a pointer to its shared live
+ * object for the outgoing (pre-reboot) path.
+ *
+ * This function assumes the object has already been created by the FLB's
+ * .preserve() callback, which is triggered when the first dependent file
+ * is preserved.
+ *
+ * The caller MUST call liveupdate_flb_outgoing_unlock() to release the lock.
+ *
+ * Return: 0 on success, or a negative errno on failure.
+ */
+int liveupdate_flb_outgoing_locked(struct liveupdate_flb *flb, void **objp)
+{
+ struct luo_flb_internal *internal = flb->internal;
+
+ if (!liveupdate_enabled())
+ return -EOPNOTSUPP;
+
+ if (WARN_ON(!internal))
+ return -EINVAL;
+
+ mutex_lock(&internal->outgoing.lock);
+
+ /* The object must exist if any file is being preserved */
+ if (WARN_ON_ONCE(!internal->outgoing.obj)) {
+ mutex_unlock(&internal->outgoing.lock);
+ return -ENOENT;
+ }
+
+ *objp = internal->outgoing.obj;
+
+ return 0;
+}
+
+/**
+ * liveupdate_flb_outgoing_unlock - Unlock an outgoing FLB object.
+ * @flb: The FLB definition.
+ * @obj: The object that was returned by the _locked call (used for validation).
+ *
+ * Releases the internal lock acquired by liveupdate_flb_outgoing_locked().
+ */
+void liveupdate_flb_outgoing_unlock(struct liveupdate_flb *flb, void *obj)
+{
+ struct luo_flb_internal *internal = flb->internal;
+
+ lockdep_assert_held(&internal->outgoing.lock);
+ internal->outgoing.obj = obj;
+ mutex_unlock(&internal->outgoing.lock);
+}
+
+int __init luo_flb_setup_outgoing(void *fdt_out)
+{
+ struct luo_flb_header_ser *header_ser;
+ u64 header_ser_pa;
+ int err;
+
+ header_ser = kho_alloc_preserve(LUO_FLB_PGCNT << PAGE_SHIFT);
+ if (IS_ERR(header_ser))
+ return PTR_ERR(header_ser);
+
+ header_ser_pa = virt_to_phys(header_ser);
+
+ err = fdt_begin_node(fdt_out, LUO_FDT_FLB_NODE_NAME);
+ err |= fdt_property_string(fdt_out, "compatible",
+ LUO_FDT_FLB_COMPATIBLE);
+ err |= fdt_property(fdt_out, LUO_FDT_FLB_HEADER, &header_ser_pa,
+ sizeof(header_ser_pa));
+ err |= fdt_end_node(fdt_out);
+
+ if (err)
+ goto err_unpreserve;
+
+ header_ser->pgcnt = LUO_FLB_PGCNT;
+ luo_flb_global.outgoing.header_ser = header_ser;
+ luo_flb_global.outgoing.ser = (void *)(header_ser + 1);
+ luo_flb_global.outgoing.active = true;
+
+ return 0;
+
+err_unpreserve:
+ kho_unpreserve_free(header_ser);
+
+ return err;
+}
+
+int __init luo_flb_setup_incoming(void *fdt_in)
+{
+ struct luo_flb_header_ser *header_ser;
+ int err, header_size, offset;
+ const void *ptr;
+ u64 header_ser_pa;
+
+ offset = fdt_subnode_offset(fdt_in, 0, LUO_FDT_FLB_NODE_NAME);
+ if (offset < 0) {
+ pr_err("Unable to get FLB node [%s]\n", LUO_FDT_FLB_NODE_NAME);
+
+ return -ENOENT;
+ }
+
+ err = fdt_node_check_compatible(fdt_in, offset,
+ LUO_FDT_FLB_COMPATIBLE);
+ if (err) {
+ pr_err("FLB node is incompatible with '%s' [%d]\n",
+ LUO_FDT_FLB_COMPATIBLE, err);
+
+ return -EINVAL;
+ }
+
+ header_size = 0;
+ ptr = fdt_getprop(fdt_in, offset, LUO_FDT_FLB_HEADER, &header_size);
+ if (!ptr || header_size != sizeof(u64)) {
+ pr_err("Unable to get FLB header property '%s' [%d]\n",
+ LUO_FDT_FLB_HEADER, header_size);
+
+ return -EINVAL;
+ }
+
+ header_ser_pa = get_unaligned((u64 *)ptr);
+ header_ser = phys_to_virt(header_ser_pa);
+
+ luo_flb_global.incoming.header_ser = header_ser;
+ luo_flb_global.incoming.ser = (void *)(header_ser + 1);
+ luo_flb_global.incoming.active = true;
+
+ return 0;
+}
+
+/**
+ * luo_flb_serialize - Serializes all active FLB objects for KHO.
+ *
+ * This function is called from the reboot path. It iterates through all
+ * registered File-Lifecycle-Bound (FLB) objects. For each FLB that has been
+ * preserved (i.e., its reference count is greater than zero), it writes its
+ * metadata into the memory region designated for Kexec Handover.
+ *
+ * The serialized data includes the FLB's compatibility string, its opaque
+ * data handle, and the final reference count. This allows the new kernel to
+ * find the appropriate handler and reconstruct the FLB's state.
+ *
+ * Context: Called from liveupdate_reboot() just before kho_finalize().
+ */
+void luo_flb_serialize(void)
+{
+ struct luo_flb_header *fh = &luo_flb_global.outgoing;
+ struct liveupdate_flb *flb;
+ int i = 0;
+
+ list_for_each_entry(flb, &luo_flb_global.list, list) {
+ struct luo_flb_internal *internal = flb->internal;
+
+ if (internal->outgoing.count > 0) {
+ strscpy(fh->ser[i].name, flb->compatible,
+ sizeof(fh->ser[i].name));
+ fh->ser[i].data = internal->outgoing.data;
+ fh->ser[i].count = internal->outgoing.count;
+ i++;
+ }
+ }
+
+ fh->header_ser->count = i;
+}
diff --git a/kernel/liveupdate/luo_internal.h b/kernel/liveupdate/luo_internal.h
index 1a36f2383123..389fb102775f 100644
--- a/kernel/liveupdate/luo_internal.h
+++ b/kernel/liveupdate/luo_internal.h
@@ -79,4 +79,11 @@ int luo_retrieve_file(struct luo_session *session, u64 token,
int luo_file_finish(struct luo_session *session);
int luo_file_deserialize(struct luo_session *session);
+int luo_flb_file_preserve(struct liveupdate_file_handler *h);
+void luo_flb_file_unpreserve(struct liveupdate_file_handler *h);
+void luo_flb_file_finish(struct liveupdate_file_handler *h);
+int __init luo_flb_setup_outgoing(void *fdt);
+int __init luo_flb_setup_incoming(void *fdt);
+void luo_flb_serialize(void);
+
#endif /* _LINUX_LUO_INTERNAL_H */
--
2.52.0.rc1.455.g30608eb744-goog
^ permalink raw reply related [flat|nested] 92+ messages in thread
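For readers following patch 08/20 above, the first-file/last-file reference
counting and the header-plus-array layout can be tried out in userspace. The
following is only a rough sketch, not kernel code: every toy_* name, the
0xabcd handle, and the 4 KiB page size are assumptions made for illustration.
Only the two packed structures and the capacity arithmetic are copied from
the patch (struct luo_flb_header_ser, struct luo_flb_ser, LUO_FLB_MAX).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE_ASSUMED 4096u	/* assumption: 4 KiB pages */
#define COMPAT_LEN 48		/* LIVEUPDATE_FLB_COMPAT_LENGTH in the patch */

/* Mirrors struct luo_flb_header_ser from the patch. */
struct flb_header_ser {
	uint64_t pgcnt;
	uint64_t count;
} __attribute__((packed));

/* Mirrors struct luo_flb_ser from the patch. */
struct flb_ser {
	char name[COMPAT_LEN];
	uint64_t data;
	uint64_t count;
} __attribute__((packed));

static_assert(sizeof(struct flb_ser) == 64, "48-byte name plus two u64");

/* Same arithmetic as LUO_FLB_MAX with LUO_FLB_PGCNT == 1. */
#define FLB_MAX ((PAGE_SIZE_ASSUMED - sizeof(struct flb_header_ser)) / \
		 sizeof(struct flb_ser))

/* Hypothetical stand-in for the outgoing state of one FLB. */
struct toy_flb {
	const char *compatible;
	long count;
	uint64_t data;
	int preserve_calls;
	int unpreserve_calls;
};

/* The first dependent file triggers the (simulated) .preserve() callback. */
static void toy_file_preserve(struct toy_flb *flb)
{
	if (!flb->count) {
		flb->data = 0xabcd;	/* placeholder for the opaque handle */
		flb->preserve_calls++;
	}
	flb->count++;
}

/* The last dependent file triggers the (simulated) .unpreserve() callback. */
static void toy_file_unpreserve(struct toy_flb *flb)
{
	if (!--flb->count) {
		flb->data = 0;
		flb->unpreserve_calls++;
	}
}

/* Write each FLB with a non-zero count after the header, then set the count. */
static void toy_serialize(void *page, const struct toy_flb *flbs, int n)
{
	struct flb_header_ser *hdr = page;
	struct flb_ser *ser = (struct flb_ser *)(hdr + 1);
	uint64_t i = 0;

	for (int k = 0; k < n && i < FLB_MAX; k++) {
		if (flbs[k].count <= 0)
			continue;
		snprintf(ser[i].name, sizeof(ser[i].name), "%s",
			 flbs[k].compatible);
		ser[i].data = flbs[k].data;
		ser[i].count = (uint64_t)flbs[k].count;
		i++;
	}
	hdr->pgcnt = 1;
	hdr->count = i;
}

/* Find an entry by compatible string, as the retrieve path does by name. */
static const struct flb_ser *toy_lookup(const void *page, const char *compat)
{
	const struct flb_header_ser *hdr = page;
	const struct flb_ser *ser = (const struct flb_ser *)(hdr + 1);

	for (uint64_t i = 0; i < hdr->count; i++)
		if (!strcmp(ser[i].name, compat))
			return &ser[i];
	return NULL;
}
```

Under these assumptions each entry is 64 bytes and the header is 16 bytes, so
one 4 KiB page holds 63 entries. Preserving three files and unpreserving one
against the same toy_flb leaves preserve_calls at 1 and count at 2, and
toy_serialize() then emits a single entry carrying that count.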
* [PATCH v6 09/20] docs: add luo documentation
2025-11-15 23:33 [PATCH v6 00/20] Live Update Orchestrator Pasha Tatashin
` (7 preceding siblings ...)
2025-11-15 23:33 ` [PATCH v6 08/20] liveupdate: luo_flb: Introduce File-Lifecycle-Bound global state Pasha Tatashin
@ 2025-11-15 23:33 ` Pasha Tatashin
2025-11-15 23:33 ` [PATCH v6 10/20] MAINTAINERS: add liveupdate entry Pasha Tatashin
` (10 subsequent siblings)
19 siblings, 0 replies; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-15 23:33 UTC (permalink / raw)
To: pratyush, jasonmiu, graf, pasha.tatashin, rppt, dmatlack,
rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu,
hughd, skhawaja, chrisl
Add the documentation files for the Live Update Orchestrator.
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
Documentation/core-api/index.rst | 1 +
Documentation/core-api/liveupdate.rst | 64 ++++++++++++++++++++++
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/liveupdate.rst | 20 +++++++
4 files changed, 86 insertions(+)
create mode 100644 Documentation/core-api/liveupdate.rst
create mode 100644 Documentation/userspace-api/liveupdate.rst
diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index 6cbdcbfa79c3..5eb0fbbbc323 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -138,6 +138,7 @@ Documents that don't fit elsewhere or which have yet to be categorized.
:maxdepth: 1
librs
+ liveupdate
netlink
.. only:: subproject and html
diff --git a/Documentation/core-api/liveupdate.rst b/Documentation/core-api/liveupdate.rst
new file mode 100644
index 000000000000..deacc098d024
--- /dev/null
+++ b/Documentation/core-api/liveupdate.rst
@@ -0,0 +1,64 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+========================
+Live Update Orchestrator
+========================
+:Author: Pasha Tatashin <pasha.tatashin@soleen.com>
+
+.. kernel-doc:: kernel/liveupdate/luo_core.c
+ :doc: Live Update Orchestrator (LUO)
+
+LUO Sessions
+============
+.. kernel-doc:: kernel/liveupdate/luo_session.c
+ :doc: LUO Sessions
+
+LUO Preserving File Descriptors
+===============================
+.. kernel-doc:: kernel/liveupdate/luo_file.c
+ :doc: LUO File Descriptors
+
+LUO File Lifecycle Bound Global Data
+====================================
+.. kernel-doc:: kernel/liveupdate/luo_flb.c
+ :doc: LUO File Lifecycle Bound Global Data
+
+Live Update Orchestrator ABI
+============================
+.. kernel-doc:: include/linux/liveupdate/abi/luo.h
+ :doc: Live Update Orchestrator ABI
+
+Public API
+==========
+.. kernel-doc:: include/linux/liveupdate.h
+
+.. kernel-doc:: include/linux/liveupdate/abi/luo.h
+
+.. kernel-doc:: kernel/liveupdate/luo_core.c
+ :export:
+
+.. kernel-doc:: kernel/liveupdate/luo_flb.c
+ :export:
+
+.. kernel-doc:: kernel/liveupdate/luo_file.c
+ :export:
+
+Internal API
+============
+.. kernel-doc:: kernel/liveupdate/luo_core.c
+ :internal:
+
+.. kernel-doc:: kernel/liveupdate/luo_flb.c
+ :internal:
+
+.. kernel-doc:: kernel/liveupdate/luo_session.c
+ :internal:
+
+.. kernel-doc:: kernel/liveupdate/luo_file.c
+ :internal:
+
+See Also
+========
+
+- :doc:`Live Update uAPI </userspace-api/liveupdate>`
+- :doc:`/core-api/kho/concepts`
diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
index b8c73be4fb11..8a61ac4c1bf1 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -61,6 +61,7 @@ Everything else
:maxdepth: 1
ELF
+ liveupdate
netlink/index
sysfs-platform_profile
vduse
diff --git a/Documentation/userspace-api/liveupdate.rst b/Documentation/userspace-api/liveupdate.rst
new file mode 100644
index 000000000000..04210a6cf6d6
--- /dev/null
+++ b/Documentation/userspace-api/liveupdate.rst
@@ -0,0 +1,20 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================
+Live Update uAPI
+================
+:Author: Pasha Tatashin <pasha.tatashin@soleen.com>
+
+ioctl interface
+===============
+.. kernel-doc:: kernel/liveupdate/luo_ioctl.c
+ :doc: LUO ioctl Interface
+
+ioctl uAPI
+===========
+.. kernel-doc:: include/uapi/linux/liveupdate.h
+
+See Also
+========
+
+- :doc:`Live Update Orchestrator </core-api/liveupdate>`
--
2.52.0.rc1.455.g30608eb744-goog
* [PATCH v6 10/20] MAINTAINERS: add liveupdate entry
From: Pasha Tatashin @ 2025-11-15 23:33 UTC (permalink / raw)
Add a MAINTAINERS file entry for the new Live Update Orchestrator
introduced in previous patches.
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
MAINTAINERS | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 500789529359..bc9f5c6f0e80 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14464,6 +14464,17 @@ F: kernel/module/livepatch.c
F: samples/livepatch/
F: tools/testing/selftests/livepatch/
+LIVE UPDATE
+M: Pasha Tatashin <pasha.tatashin@soleen.com>
+L: linux-kernel@vger.kernel.org
+S: Maintained
+F: Documentation/core-api/liveupdate.rst
+F: Documentation/userspace-api/liveupdate.rst
+F: include/linux/liveupdate.h
+F: include/linux/liveupdate/
+F: include/uapi/linux/liveupdate.h
+F: kernel/liveupdate/
+
LLC (802.2)
L: netdev@vger.kernel.org
S: Odd fixes
--
2.52.0.rc1.455.g30608eb744-goog
* [PATCH v6 11/20] mm: shmem: use SHMEM_F_* flags instead of VM_* flags
From: Pasha Tatashin @ 2025-11-15 23:33 UTC (permalink / raw)
From: Pratyush Yadav <ptyadav@amazon.de>
shmem_inode_info::flags can have the VM flags VM_NORESERVE and
VM_LOCKED. These are used to suppress pre-accounting or to lock the
pages in the inode respectively. Using the VM flags directly makes it
difficult to add shmem-specific flags that are unrelated to VM behavior
since one would need to find a VM flag not used by shmem and re-purpose
it.
Introduce SHMEM_F_NORESERVE and SHMEM_F_LOCKED which represent the same
information, but their bits are independent of the VM flags. Callers can
still pass VM_NORESERVE to shmem_get_inode(), but it gets transformed to
the shmem-specific flag internally.
No functional changes intended.
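
The translation described above can be modelled in a few lines of
standalone C. This is only an illustrative userspace sketch: the DEMO_*
constants and the demo_vm_to_shmem_flags() helper are hypothetical
stand-ins, and the bit values are not the kernel's real definitions.

```c
#include <assert.h>

/* Hypothetical stand-ins; the values are illustrative, not the kernel's. */
#define DEMO_VM_NORESERVE      (1UL << 21)
#define DEMO_SHMEM_F_NORESERVE (1UL << 0)
#define DEMO_SHMEM_F_LOCKED    (1UL << 1)

/* The point of the change: the shmem bits no longer depend on the VM bits. */
static_assert(DEMO_SHMEM_F_NORESERVE != DEMO_VM_NORESERVE,
	      "shmem flag bits are independent of VM flag bits");

/*
 * Callers keep passing VM flags; only the "no reserve" request is carried
 * over, expressed as the shmem-private flag.
 */
static unsigned long demo_vm_to_shmem_flags(unsigned long vm_flags)
{
	return (vm_flags & DEMO_VM_NORESERVE) ? DEMO_SHMEM_F_NORESERVE : 0;
}
```

Because the shmem-private bits start at bit 0, a later shmem-only flag
can take the next free bit without searching for an unused VM flag.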
Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
include/linux/shmem_fs.h | 6 ++++++
mm/shmem.c | 28 +++++++++++++++-------------
2 files changed, 21 insertions(+), 13 deletions(-)
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 0e47465ef0fd..650874b400b5 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -10,6 +10,7 @@
#include <linux/xattr.h>
#include <linux/fs_parser.h>
#include <linux/userfaultfd_k.h>
+#include <linux/bits.h>
struct swap_iocb;
@@ -19,6 +20,11 @@ struct swap_iocb;
#define SHMEM_MAXQUOTAS 2
#endif
+/* Suppress pre-accounting of the entire object size. */
+#define SHMEM_F_NORESERVE BIT(0)
+/* Disallow swapping. */
+#define SHMEM_F_LOCKED BIT(1)
+
struct shmem_inode_info {
spinlock_t lock;
unsigned int seals; /* shmem seals */
diff --git a/mm/shmem.c b/mm/shmem.c
index 58701d14dd96..1d5036dec08a 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -175,20 +175,20 @@ static inline struct shmem_sb_info *SHMEM_SB(struct super_block *sb)
*/
static inline int shmem_acct_size(unsigned long flags, loff_t size)
{
- return (flags & VM_NORESERVE) ?
+ return (flags & SHMEM_F_NORESERVE) ?
0 : security_vm_enough_memory_mm(current->mm, VM_ACCT(size));
}
static inline void shmem_unacct_size(unsigned long flags, loff_t size)
{
- if (!(flags & VM_NORESERVE))
+ if (!(flags & SHMEM_F_NORESERVE))
vm_unacct_memory(VM_ACCT(size));
}
static inline int shmem_reacct_size(unsigned long flags,
loff_t oldsize, loff_t newsize)
{
- if (!(flags & VM_NORESERVE)) {
+ if (!(flags & SHMEM_F_NORESERVE)) {
if (VM_ACCT(newsize) > VM_ACCT(oldsize))
return security_vm_enough_memory_mm(current->mm,
VM_ACCT(newsize) - VM_ACCT(oldsize));
@@ -206,7 +206,7 @@ static inline int shmem_reacct_size(unsigned long flags,
*/
static inline int shmem_acct_blocks(unsigned long flags, long pages)
{
- if (!(flags & VM_NORESERVE))
+ if (!(flags & SHMEM_F_NORESERVE))
return 0;
return security_vm_enough_memory_mm(current->mm,
@@ -215,7 +215,7 @@ static inline int shmem_acct_blocks(unsigned long flags, long pages)
static inline void shmem_unacct_blocks(unsigned long flags, long pages)
{
- if (flags & VM_NORESERVE)
+ if (flags & SHMEM_F_NORESERVE)
vm_unacct_memory(pages * VM_ACCT(PAGE_SIZE));
}
@@ -1551,7 +1551,7 @@ int shmem_writeout(struct folio *folio, struct swap_iocb **plug,
int nr_pages;
bool split = false;
- if ((info->flags & VM_LOCKED) || sbinfo->noswap)
+ if ((info->flags & SHMEM_F_LOCKED) || sbinfo->noswap)
goto redirty;
if (!total_swap_pages)
@@ -2910,15 +2910,15 @@ int shmem_lock(struct file *file, int lock, struct ucounts *ucounts)
* ipc_lock_object() when called from shmctl_do_lock(),
* no serialization needed when called from shm_destroy().
*/
- if (lock && !(info->flags & VM_LOCKED)) {
+ if (lock && !(info->flags & SHMEM_F_LOCKED)) {
if (!user_shm_lock(inode->i_size, ucounts))
goto out_nomem;
- info->flags |= VM_LOCKED;
+ info->flags |= SHMEM_F_LOCKED;
mapping_set_unevictable(file->f_mapping);
}
- if (!lock && (info->flags & VM_LOCKED) && ucounts) {
+ if (!lock && (info->flags & SHMEM_F_LOCKED) && ucounts) {
user_shm_unlock(inode->i_size, ucounts);
- info->flags &= ~VM_LOCKED;
+ info->flags &= ~SHMEM_F_LOCKED;
mapping_clear_unevictable(file->f_mapping);
}
retval = 0;
@@ -3062,7 +3062,7 @@ static struct inode *__shmem_get_inode(struct mnt_idmap *idmap,
spin_lock_init(&info->lock);
atomic_set(&info->stop_eviction, 0);
info->seals = F_SEAL_SEAL;
- info->flags = flags & VM_NORESERVE;
+ info->flags = (flags & VM_NORESERVE) ? SHMEM_F_NORESERVE : 0;
info->i_crtime = inode_get_mtime(inode);
info->fsflags = (dir == NULL) ? 0 :
SHMEM_I(dir)->fsflags & SHMEM_FL_INHERITED;
@@ -5804,8 +5804,10 @@ static inline struct inode *shmem_get_inode(struct mnt_idmap *idmap,
/* common code */
static struct file *__shmem_file_setup(struct vfsmount *mnt, const char *name,
- loff_t size, unsigned long flags, unsigned int i_flags)
+ loff_t size, unsigned long vm_flags,
+ unsigned int i_flags)
{
+ unsigned long flags = (vm_flags & VM_NORESERVE) ? SHMEM_F_NORESERVE : 0;
struct inode *inode;
struct file *res;
@@ -5822,7 +5824,7 @@ static struct file *__shmem_file_setup(struct vfsmount *mnt, const char *name,
return ERR_PTR(-ENOMEM);
inode = shmem_get_inode(&nop_mnt_idmap, mnt->mnt_sb, NULL,
- S_IFREG | S_IRWXUGO, 0, flags);
+ S_IFREG | S_IRWXUGO, 0, vm_flags);
if (IS_ERR(inode)) {
shmem_unacct_size(flags, size);
return ERR_CAST(inode);
--
2.52.0.rc1.455.g30608eb744-goog
* [PATCH v6 12/20] mm: shmem: allow freezing inode mapping
From: Pasha Tatashin @ 2025-11-15 23:33 UTC (permalink / raw)
From: Pratyush Yadav <ptyadav@amazon.de>
To prepare a shmem inode for live update via the Live Update
Orchestrator (LUO), its index -> folio mappings must be serialized. Once
the mappings are serialized, they cannot change since it would cause the
serialized data to become inconsistent. This can be done by pinning the
folios to avoid migration, and by making sure no folios can be added to
or removed from the inode.
While mechanisms to pin folios already exist, the only way to stop
folios from being added or removed is to apply the grow and shrink file
seals. But file seals come with their own semantics, one of which is
that they can't be removed. This doesn't work for live update, since a
live update can be cancelled or can fail, in which case the seals would
have to be removed and the file's normal functionality restored.
Introduce SHMEM_F_MAPPING_FROZEN to indicate this instead. It is
internal to shmem and is not directly exposed to userspace. It functions
similarly to F_SEAL_GROW | F_SEAL_SHRINK, but additionally disallows
hole punching, and can be removed.
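
The intended semantics can be sketched as a small userspace model. This
is not the kernel code: struct demo_inode and the demo_* helpers are
hypothetical, and they only mirror the checks this patch adds to the
setattr and write paths (the fallocate path is refused in the same way).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for SHMEM_F_MAPPING_FROZEN. */
#define DEMO_F_MAPPING_FROZEN (1u << 2)

/* Hypothetical, heavily simplified inode: just flags and a size. */
struct demo_inode {
	unsigned int flags;
	long long size;
};

/* Unlike a file seal, the frozen state can be set and cleared again. */
static void demo_mapping_freeze(struct demo_inode *inode, bool freeze)
{
	if (freeze)
		inode->flags |= DEMO_F_MAPPING_FROZEN;
	else
		inode->flags &= ~DEMO_F_MAPPING_FROZEN;
}

/* Resizing is refused outright while the mapping is frozen. */
static int demo_truncate(struct demo_inode *inode, long long newsize)
{
	if (inode->flags & DEMO_F_MAPPING_FROZEN)
		return -EPERM;
	inode->size = newsize;
	return 0;
}

/* Writes stay allowed while frozen, as long as they do not extend the file. */
static int demo_write(struct demo_inode *inode, long long pos, long long len)
{
	if ((inode->flags & DEMO_F_MAPPING_FROZEN) && pos + len > inode->size)
		return -EPERM;
	if (pos + len > inode->size)
		inode->size = pos + len;
	return 0;
}
```

Clearing the flag restores normal behaviour, which is the property that
file seals cannot provide when a live update is cancelled.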
Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
include/linux/shmem_fs.h | 17 +++++++++++++++++
mm/shmem.c | 12 +++++++++++-
2 files changed, 28 insertions(+), 1 deletion(-)
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 650874b400b5..a9f5db472a39 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -24,6 +24,14 @@ struct swap_iocb;
#define SHMEM_F_NORESERVE BIT(0)
/* Disallow swapping. */
#define SHMEM_F_LOCKED BIT(1)
+/*
+ * Disallow growing, shrinking, or hole punching in the inode. Combined with
+ * folio pinning, makes sure the inode's mapping stays fixed.
+ *
+ * In some ways similar to F_SEAL_GROW | F_SEAL_SHRINK, but can be removed and
+ * isn't directly visible to userspace.
+ */
+#define SHMEM_F_MAPPING_FROZEN BIT(2)
struct shmem_inode_info {
spinlock_t lock;
@@ -186,6 +194,15 @@ static inline bool shmem_file(struct file *file)
return shmem_mapping(file->f_mapping);
}
+/* Must be called with inode lock taken exclusive. */
+static inline void shmem_i_mapping_freeze(struct inode *inode, bool freeze)
+{
+ if (freeze)
+ SHMEM_I(inode)->flags |= SHMEM_F_MAPPING_FROZEN;
+ else
+ SHMEM_I(inode)->flags &= ~SHMEM_F_MAPPING_FROZEN;
+}
+
/*
* If fallocate(FALLOC_FL_KEEP_SIZE) has been used, there may be pages
* beyond i_size's notion of EOF, which fallocate has committed to reserving:
diff --git a/mm/shmem.c b/mm/shmem.c
index 1d5036dec08a..05c3db840257 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1292,7 +1292,8 @@ static int shmem_setattr(struct mnt_idmap *idmap,
loff_t newsize = attr->ia_size;
/* protected by i_rwsem */
- if ((newsize < oldsize && (info->seals & F_SEAL_SHRINK)) ||
+ if ((info->flags & SHMEM_F_MAPPING_FROZEN) ||
+ (newsize < oldsize && (info->seals & F_SEAL_SHRINK)) ||
(newsize > oldsize && (info->seals & F_SEAL_GROW)))
return -EPERM;
@@ -3289,6 +3290,10 @@ shmem_write_begin(const struct kiocb *iocb, struct address_space *mapping,
return -EPERM;
}
+ if (unlikely((info->flags & SHMEM_F_MAPPING_FROZEN) &&
+ pos + len > inode->i_size))
+ return -EPERM;
+
ret = shmem_get_folio(inode, index, pos + len, &folio, SGP_WRITE);
if (ret)
return ret;
@@ -3662,6 +3667,11 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
inode_lock(inode);
+ if (info->flags & SHMEM_F_MAPPING_FROZEN) {
+ error = -EPERM;
+ goto out;
+ }
+
if (mode & FALLOC_FL_PUNCH_HOLE) {
struct address_space *mapping = file->f_mapping;
loff_t unmap_start = round_up(offset, PAGE_SIZE);
--
2.52.0.rc1.455.g30608eb744-goog
* [PATCH v6 13/20] mm: shmem: export some functions to internal.h
From: Pasha Tatashin @ 2025-11-15 23:33 UTC (permalink / raw)
From: Pratyush Yadav <ptyadav@amazon.de>
shmem_inode_acct_blocks(), shmem_recalc_inode(), and
shmem_add_to_page_cache() are used by shmem_alloc_and_add_folio(). The
Live Update Orchestrator (LUO) will need the same functionality to
recreate memfd files after a live update, so make these functions
non-static and declare them in mm/internal.h.
Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
mm/internal.h | 6 ++++++
mm/shmem.c | 10 +++++-----
2 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index 1561fc2ff5b8..4ba155524f80 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1562,6 +1562,12 @@ void __meminit __init_page_from_nid(unsigned long pfn, int nid);
unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
int priority);
+int shmem_add_to_page_cache(struct folio *folio,
+ struct address_space *mapping,
+ pgoff_t index, void *expected, gfp_t gfp);
+int shmem_inode_acct_blocks(struct inode *inode, long pages);
+bool shmem_recalc_inode(struct inode *inode, long alloced, long swapped);
+
#ifdef CONFIG_SHRINKER_DEBUG
static inline __printf(2, 0) int shrinker_debugfs_name_alloc(
struct shrinker *shrinker, const char *fmt, va_list ap)
diff --git a/mm/shmem.c b/mm/shmem.c
index 05c3db840257..c3dc4af59c14 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -219,7 +219,7 @@ static inline void shmem_unacct_blocks(unsigned long flags, long pages)
vm_unacct_memory(pages * VM_ACCT(PAGE_SIZE));
}
-static int shmem_inode_acct_blocks(struct inode *inode, long pages)
+int shmem_inode_acct_blocks(struct inode *inode, long pages)
{
struct shmem_inode_info *info = SHMEM_I(inode);
struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
@@ -435,7 +435,7 @@ static void shmem_free_inode(struct super_block *sb, size_t freed_ispace)
*
* Return: true if swapped was incremented from 0, for shmem_writeout().
*/
-static bool shmem_recalc_inode(struct inode *inode, long alloced, long swapped)
+bool shmem_recalc_inode(struct inode *inode, long alloced, long swapped)
{
struct shmem_inode_info *info = SHMEM_I(inode);
bool first_swapped = false;
@@ -861,9 +861,9 @@ static void shmem_update_stats(struct folio *folio, int nr_pages)
/*
* Somewhat like filemap_add_folio, but error if expected item has gone.
*/
-static int shmem_add_to_page_cache(struct folio *folio,
- struct address_space *mapping,
- pgoff_t index, void *expected, gfp_t gfp)
+int shmem_add_to_page_cache(struct folio *folio,
+ struct address_space *mapping,
+ pgoff_t index, void *expected, gfp_t gfp)
{
XA_STATE_ORDER(xas, &mapping->i_pages, index, folio_order(folio));
unsigned long nr = folio_nr_pages(folio);
--
2.52.0.rc1.455.g30608eb744-goog
* [PATCH v6 14/20] liveupdate: luo_file: add private argument to store runtime state
From: Pasha Tatashin @ 2025-11-15 23:34 UTC (permalink / raw)
From: Pratyush Yadav <pratyush@kernel.org>
Currently file handlers only get the serialized_data field to store
their state. This field has a pointer to the serialized state of the
file, and it becomes a part of LUO file's serialized state.
File handlers may also need some runtime state to track information that
shouldn't make it into the serialized data.
One such example is a vmalloc pointer. While kho_preserve_vmalloc()
preserves the memory backing a vmalloc allocation, it does not store the
original vmap pointer, since that has no use being passed to the next
kernel. The pointer is needed to free the memory in case the file is
unpreserved.
Provide a private field in struct luo_file and pass it to all the
callbacks. The field can be set by preserve, and must be freed by
unpreserve.
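
The ownership rule for the private field can be illustrated with a
userspace model. Everything named demo_* is hypothetical: struct
demo_op_args is a cut-down stand-in for struct liveupdate_file_op_args,
and malloc()/free() stand in for whatever runtime allocation a real
handler would track.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct liveupdate_file_op_args. */
struct demo_op_args {
	uint64_t serialized_data;	/* handed over to the next kernel */
	void *private_data;		/* runtime-only, never serialized */
};

/* Runtime state that has no meaning in the next kernel. */
struct demo_runtime_state {
	void *buffer;	/* e.g. the original pointer needed to free memory */
};

/* preserve: allocate the runtime state and publish it via private_data. */
static int demo_preserve(struct demo_op_args *args)
{
	struct demo_runtime_state *st = malloc(sizeof(*st));

	if (!st)
		return -ENOMEM;
	st->buffer = malloc(64);
	if (!st->buffer) {
		free(st);
		return -ENOMEM;
	}
	args->serialized_data = 0x1000;	/* placeholder handle */
	args->private_data = st;
	return 0;
}

/* unpreserve: the handler that set private_data is the one that frees it. */
static void demo_unpreserve(struct demo_op_args *args)
{
	struct demo_runtime_state *st = args->private_data;

	assert(st);
	free(st->buffer);
	free(st);
	args->private_data = NULL;
}
```

In the patch itself the core stores the pointer in struct luo_file after
.preserve() succeeds and copies it back into the argument structure
before calling .unpreserve(), .freeze(), and .unfreeze().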
Signed-off-by: Pratyush Yadav <pratyush@kernel.org>
Co-developed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
include/linux/liveupdate.h | 5 +++++
kernel/liveupdate/luo_file.c | 9 +++++++++
2 files changed, 14 insertions(+)
diff --git a/include/linux/liveupdate.h b/include/linux/liveupdate.h
index 36a831ae3ead..defc69a1985d 100644
--- a/include/linux/liveupdate.h
+++ b/include/linux/liveupdate.h
@@ -29,6 +29,10 @@ struct file;
* this to the file being operated on.
* @serialized_data: The opaque u64 handle, preserve/prepare/freeze may update
* this field.
+ * @private_data: Private data for the file used to hold runtime state that
+ * is not preserved. Set by the handler's .preserve()
+ * callback, and must be freed in the handler's
+ * .unpreserve() callback.
*
* This structure bundles all parameters for the file operation callbacks.
* The 'data' and 'file' fields are used for both input and output.
@@ -39,6 +43,7 @@ struct liveupdate_file_op_args {
bool retrieved;
struct file *file;
u64 serialized_data;
+ void *private_data;
};
/**
diff --git a/kernel/liveupdate/luo_file.c b/kernel/liveupdate/luo_file.c
index 3d3bd84cb281..df337c9c4f21 100644
--- a/kernel/liveupdate/luo_file.c
+++ b/kernel/liveupdate/luo_file.c
@@ -126,6 +126,10 @@ static LIST_HEAD(luo_file_handler_list);
* This handle is passed back to the handler's .freeze(),
* .retrieve(), and .finish() callbacks, allowing it to track
* and update its serialized state across phases.
+ * @private_data: Pointer to the private data for the file used to hold runtime
+ * state that is not preserved. Set by the handler's .preserve()
+ * callback, and must be freed in the handler's .unpreserve()
+ * callback.
* @retrieved: A flag indicating whether a user/kernel in the new kernel has
* successfully called retrieve() on this file. This prevents
* multiple retrieval attempts.
@@ -152,6 +156,7 @@ struct luo_file {
struct liveupdate_file_handler *fh;
struct file *file;
u64 serialized_data;
+ void *private_data;
bool retrieved;
struct mutex mutex;
struct list_head list;
@@ -309,6 +314,7 @@ int luo_preserve_file(struct luo_session *session, u64 token, int fd)
goto exit_err;
} else {
luo_file->serialized_data = args.serialized_data;
+ luo_file->private_data = args.private_data;
list_add_tail(&luo_file->list, &session->files_list);
session->count++;
}
@@ -356,6 +362,7 @@ void luo_file_unpreserve_files(struct luo_session *session)
args.session = (struct liveupdate_session *)session;
args.file = luo_file->file;
args.serialized_data = luo_file->serialized_data;
+ args.private_data = luo_file->private_data;
luo_file->fh->ops->unpreserve(&args);
luo_flb_file_unpreserve(luo_file->fh);
@@ -384,6 +391,7 @@ static int luo_file_freeze_one(struct luo_session *session,
args.session = (struct liveupdate_session *)session;
args.file = luo_file->file;
args.serialized_data = luo_file->serialized_data;
+ args.private_data = luo_file->private_data;
err = luo_file->fh->ops->freeze(&args);
if (!err)
@@ -405,6 +413,7 @@ static void luo_file_unfreeze_one(struct luo_session *session,
args.session = (struct liveupdate_session *)session;
args.file = luo_file->file;
args.serialized_data = luo_file->serialized_data;
+ args.private_data = luo_file->private_data;
luo_file->fh->ops->unfreeze(&args);
}
--
2.52.0.rc1.455.g30608eb744-goog
* [PATCH v6 15/20] mm: memfd_luo: allow preserving memfd
From: Pasha Tatashin @ 2025-11-15 23:34 UTC (permalink / raw)
From: Pratyush Yadav <ptyadav@amazon.de>
The ability to preserve a memfd allows userspace to use KHO and LUO to
transfer its memory contents to the next kernel. This is useful in many
ways. For one, it can be used with IOMMUFD as the backing store for
IOMMU page tables. Preserving IOMMUFD is essential for performing a
hypervisor live update with passthrough devices. memfd support provides
the first building block for making that possible.
For another, for applications with a large amount of memory that takes
time to reconstruct, reboots to consume kernel upgrades can be very
expensive. memfd with LUO gives those applications reboot-persistent
memory that they can use to quickly save and reconstruct that state.
While memfd is backed by either hugetlbfs or shmem, only shmem is
supported for now. To be more precise, only anonymous shmem files are
supported.
The handover to the next kernel is not transparent. Not all properties
of the file are preserved; only its memory contents, position, and size
are. The recreated file gets the UID and GID of the task doing the
restore, and the task's cgroup gets charged with the memory.
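
The documentation added in mm/memfd_luo.c by this patch notes that the
FD_CLOEXEC flag and file seals are among the properties that are reset,
and that both must be re-applied via fcntl(). A minimal userspace sketch
of that step follows. The helper name is hypothetical, and how the file
descriptor is retrieved after the live update is deliberately left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>

/* Fallback for builds where the GNU extensions in <fcntl.h> are not visible. */
#ifndef F_ADD_SEALS
#define F_ADD_SEALS (1024 + 9)
#endif

/*
 * Re-apply properties that are documented as not preserved.
 *
 * `fd` is assumed to be a memfd that was already retrieved after the live
 * update. `seals` is a mask of F_SEAL_* values to add, or 0 for none.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int demo_reapply_memfd_props(int fd, int seals)
{
	int fdflags = fcntl(fd, F_GETFD);

	if (fdflags < 0)
		return -1;
	if (fcntl(fd, F_SETFD, fdflags | FD_CLOEXEC) < 0)
		return -1;
	if (seals && fcntl(fd, F_ADD_SEALS, seals) < 0)
		return -1;
	return 0;
}
```

An application that relied on, say, F_SEAL_SHRINK before the update
would pass that mask here before handing the descriptor on.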
Once preserved, the file cannot grow or shrink, and all its pages are
pinned to avoid migrations and swapping. The file can still be read from
or written to.
Use vmalloc to allocate the buffer that holds the serialized folio
array, and preserve it using kho_preserve_vmalloc(). Unlike a physically
contiguous allocation, this is not bound by a maximum allocation size.
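
Each entry of that array is described in the ABI header as a packed
64-bit value: the upper 52 bits hold the PFN and the lower 12 bits are
reserved for flags. A minimal sketch of such packing, assuming
hypothetical flag bit assignments and demo_* helper names that are not
taken from the patch:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_FLAG_BITS	12
#define DEMO_FLAG_MASK	((UINT64_C(1) << DEMO_FLAG_BITS) - 1)

/* Hypothetical bit assignments; the patch may number the flags differently. */
#define DEMO_FLAG_DIRTY		(UINT64_C(1) << 0)
#define DEMO_FLAG_UPTODATE	(UINT64_C(1) << 1)

/* Pack a PFN into the upper 52 bits and the flags into the lower 12. */
static uint64_t demo_foliodesc_pack(uint64_t pfn, uint64_t flags)
{
	assert((flags & ~DEMO_FLAG_MASK) == 0);
	assert(pfn < (UINT64_C(1) << 52));
	return (pfn << DEMO_FLAG_BITS) | flags;
}

/* Recover the PFN from a packed descriptor. */
static uint64_t demo_foliodesc_pfn(uint64_t desc)
{
	return desc >> DEMO_FLAG_BITS;
}

/* Recover the flag bits from a packed descriptor. */
static uint64_t demo_foliodesc_flags(uint64_t desc)
{
	return desc & DEMO_FLAG_MASK;
}
```

With 4 KiB pages the low 12 bits of a page-aligned physical address are
always zero, which is why a PFN and 12 flag bits fit in one u64.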
Co-developed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
---
MAINTAINERS | 2 +
include/linux/liveupdate/abi/memfd.h | 88 ++++
mm/Makefile | 1 +
mm/memfd_luo.c | 671 +++++++++++++++++++++++++++
4 files changed, 762 insertions(+)
create mode 100644 include/linux/liveupdate/abi/memfd.h
create mode 100644 mm/memfd_luo.c
diff --git a/MAINTAINERS b/MAINTAINERS
index bc9f5c6f0e80..ad9fee6dc605 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14466,6 +14466,7 @@ F: tools/testing/selftests/livepatch/
LIVE UPDATE
M: Pasha Tatashin <pasha.tatashin@soleen.com>
+R: Pratyush Yadav <pratyush@kernel.org>
L: linux-kernel@vger.kernel.org
S: Maintained
F: Documentation/core-api/liveupdate.rst
@@ -14474,6 +14475,7 @@ F: include/linux/liveupdate.h
F: include/linux/liveupdate/
F: include/uapi/linux/liveupdate.h
F: kernel/liveupdate/
+F: mm/memfd_luo.c
LLC (802.2)
L: netdev@vger.kernel.org
diff --git a/include/linux/liveupdate/abi/memfd.h b/include/linux/liveupdate/abi/memfd.h
new file mode 100644
index 000000000000..bf848e5bd1de
--- /dev/null
+++ b/include/linux/liveupdate/abi/memfd.h
@@ -0,0 +1,88 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ *
+ * Copyright (C) 2025 Amazon.com Inc. or its affiliates.
+ * Pratyush Yadav <ptyadav@amazon.de>
+ */
+
+#ifndef _LINUX_LIVEUPDATE_ABI_MEMFD_H
+#define _LINUX_LIVEUPDATE_ABI_MEMFD_H
+
+/**
+ * DOC: memfd Live Update ABI
+ *
+ * This header defines the ABI for preserving the state of a memfd across a
+ * kexec reboot using the LUO.
+ *
+ * The state is serialized into a Flattened Device Tree which is then handed
+ * over to the next kernel via the KHO mechanism. The FDT is passed as the
+ * opaque `data` handle in the file handler callbacks.
+ *
+ * This interface is a contract. Any modification to the FDT structure,
+ * node properties, compatible string, or the layout of the serialization
+ * structures defined here constitutes a breaking change. Such changes require
+ * incrementing the version number in the MEMFD_LUO_FH_COMPATIBLE string.
+ *
+ * FDT Structure Overview:
+ * The memfd state is contained within a single FDT with the following layout:
+ *
+ * .. code-block:: none
+ *
+ * / {
+ * pos = <...>;
+ * size = <...>;
+ * nr_folios = <...>;
+ * folios = < ... binary data ... >;
+ * };
+ *
+ * Node Properties:
+ * - pos: u64
+ * The file's current position (f_pos).
+ * - size: u64
+ * The total size of the file in bytes (i_size).
+ * - nr_folios: u64
+ * Number of folios in folios array. Only present when size > 0.
+ * - folios: struct kho_vmalloc
+ * KHO vmalloc preservation for an array of &struct memfd_luo_folio_ser,
+ * one for each preserved folio from the original file's mapping. Only
+ * present when size > 0.
+ */
+
+/**
+ * struct memfd_luo_folio_ser - Serialized state of a single folio.
+ * @foliodesc: A packed 64-bit value containing both the PFN and status flags of
+ * the preserved folio. The upper 52 bits store the PFN, and the
+ * lower 12 bits are reserved for flags (e.g., dirty, uptodate).
+ * @index: The page offset (pgoff_t) of the folio within the original file's
+ * address space. This is used to correctly position the folio
+ * during restoration.
+ *
+ * This structure represents the minimal information required to restore a
+ * single folio in the new kernel. An array of these structs forms the binary
+ * data for the "folios" property in the handover FDT.
+ */
+struct memfd_luo_folio_ser {
+ u64 foliodesc;
+ u64 index;
+};
+
+/* The strings used for memfd KHO FDT sub-tree. */
+
+/* 64-bit pos value for the preserved memfd */
+#define MEMFD_FDT_POS "pos"
+
+/* 64-bit size value of the preserved memfd */
+#define MEMFD_FDT_SIZE "size"
+
+#define MEMFD_FDT_FOLIOS "folios"
+
+/* Number of folios in the folios array. */
+#define MEMFD_FDT_NR_FOLIOS "nr_folios"
+
+/* The compatibility string for memfd file handler */
+#define MEMFD_LUO_FH_COMPATIBLE "memfd-v1"
+
+#endif /* _LINUX_LIVEUPDATE_ABI_MEMFD_H */
diff --git a/mm/Makefile b/mm/Makefile
index 21abb3353550..7738ec416f00 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -100,6 +100,7 @@ obj-$(CONFIG_NUMA) += memory-tiers.o
obj-$(CONFIG_DEVICE_MIGRATION) += migrate_device.o
obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o
obj-$(CONFIG_PAGE_COUNTER) += page_counter.o
+obj-$(CONFIG_LIVEUPDATE) += memfd_luo.o
obj-$(CONFIG_MEMCG_V1) += memcontrol-v1.o
obj-$(CONFIG_MEMCG) += memcontrol.o vmpressure.o
ifdef CONFIG_SWAP
diff --git a/mm/memfd_luo.c b/mm/memfd_luo.c
new file mode 100644
index 000000000000..4c1d16db2cff
--- /dev/null
+++ b/mm/memfd_luo.c
@@ -0,0 +1,671 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ *
+ * Copyright (C) 2025 Amazon.com Inc. or its affiliates.
+ * Pratyush Yadav <ptyadav@amazon.de>
+ */
+
+/**
+ * DOC: Memfd Preservation via LUO
+ *
+ * Overview
+ * ========
+ *
+ * Memory file descriptors (memfd) can be preserved over a kexec using the Live
+ * Update Orchestrator (LUO) file preservation. This allows userspace to
+ * transfer its memory contents to the next kernel after a kexec.
+ *
+ * The preservation is not intended to be transparent. Only select properties of
+ * the file are preserved. All others are reset to default. The preserved
+ * properties are described below.
+ *
+ * .. note::
+ * The LUO API is not stabilized yet, so the preserved properties of a memfd
+ * are also not stable and are subject to backwards incompatible changes.
+ *
+ * .. note::
+ * Currently a memfd backed by Hugetlb is not supported. Memfds created
+ * with ``MFD_HUGETLB`` will be rejected.
+ *
+ * Preserved Properties
+ * ====================
+ *
+ * The following properties of the memfd are preserved across kexec:
+ *
+ * File Contents
+ * All data stored in the file is preserved.
+ *
+ * File Size
+ * The size of the file is preserved. Holes in the file are filled by
+ * allocating pages for them during preservation.
+ *
+ * File Position
+ * The current file position is preserved, allowing applications to continue
+ * reading/writing from their last position.
+ *
+ * File Status Flags
+ * memfds are always opened with ``O_RDWR`` and ``O_LARGEFILE``. This property
+ * is maintained.
+ *
+ * Non-Preserved Properties
+ * ========================
+ *
+ * Any property that is not preserved must be assumed to be reset to its
+ * default. This section describes the more notable of those
+ * properties.
+ *
+ * ``FD_CLOEXEC`` flag
+ * A memfd can be created with the ``MFD_CLOEXEC`` flag that sets the
+ * ``FD_CLOEXEC`` on the file. This flag is not preserved and must be set
+ * again after restore via ``fcntl()``.
+ *
+ * Seals
+ * File seals are not preserved. The file is unsealed on restore and if
+ * needed, must be sealed again via ``fcntl()``.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/bits.h>
+#include <linux/err.h>
+#include <linux/file.h>
+#include <linux/io.h>
+#include <linux/kexec_handover.h>
+#include <linux/libfdt.h>
+#include <linux/liveupdate.h>
+#include <linux/liveupdate/abi/memfd.h>
+#include <linux/shmem_fs.h>
+#include <linux/vmalloc.h>
+#include "internal.h"
+
+#define PRESERVED_PFN_MASK GENMASK(63, 12)
+#define PRESERVED_PFN_SHIFT 12
+#define PRESERVED_FLAG_DIRTY BIT(0)
+#define PRESERVED_FLAG_UPTODATE BIT(1)
+
+#define PRESERVED_FOLIO_PFN(desc) (((desc) & PRESERVED_PFN_MASK) >> PRESERVED_PFN_SHIFT)
+#define PRESERVED_FOLIO_FLAGS(desc) ((desc) & ~PRESERVED_PFN_MASK)
+#define PRESERVED_FOLIO_MKDESC(pfn, flags) (((pfn) << PRESERVED_PFN_SHIFT) | (flags))
+
+struct memfd_luo_private {
+ struct memfd_luo_folio_ser *pfolios;
+ u64 nr_folios;
+};
+
+static struct memfd_luo_folio_ser *memfd_luo_preserve_folios(struct file *file, void *fdt,
+ u64 *nr_foliosp)
+{
+ struct inode *inode = file_inode(file);
+ struct memfd_luo_folio_ser *pfolios;
+ struct kho_vmalloc *kho_vmalloc;
+ unsigned int max_folios;
+ long i, size, nr_pinned;
+ struct folio **folios;
+ int err = -EINVAL;
+ pgoff_t offset;
+ u64 nr_folios;
+
+ size = i_size_read(inode);
+ /*
+ * If the file has zero size, then the folios and nr_folios properties
+ * are not set.
+ */
+ if (!size) {
+ *nr_foliosp = 0;
+ return NULL;
+ }
+
+ /*
+ * Guess the number of folios based on inode size. Real number might end
+ * up being smaller if there are higher order folios.
+ */
+ max_folios = PAGE_ALIGN(size) / PAGE_SIZE;
+ folios = kvmalloc_array(max_folios, sizeof(*folios), GFP_KERNEL);
+ if (!folios)
+ return ERR_PTR(-ENOMEM);
+
+ /*
+ * Pin the folios so they don't move around behind our back. This also
+ * ensures none of the folios are in CMA -- which ensures they don't
+ * fall in KHO scratch memory. It also moves swapped out folios back to
+ * memory.
+ *
+ * A side effect of doing this is that it allocates a folio for all
+ * indices in the file. This might waste memory on sparse memfds. If
+ * that is really a problem in the future, we can have a
+ * memfd_pin_folios() variant that does not allocate a page on empty
+ * slots.
+ */
+ nr_pinned = memfd_pin_folios(file, 0, size - 1, folios, max_folios,
+ &offset);
+ if (nr_pinned < 0) {
+ err = nr_pinned;
+ pr_err("failed to pin folios: %d\n", err);
+ goto err_free_folios;
+ }
+ nr_folios = nr_pinned;
+
+ err = fdt_property(fdt, MEMFD_FDT_NR_FOLIOS, &nr_folios, sizeof(nr_folios));
+ if (err)
+ goto err_unpin;
+
+ err = fdt_property_placeholder(fdt, MEMFD_FDT_FOLIOS, sizeof(*kho_vmalloc),
+ (void **)&kho_vmalloc);
+ if (err) {
+ pr_err("Failed to reserve '%s' property in FDT: %s\n",
+ MEMFD_FDT_FOLIOS, fdt_strerror(err));
+ err = -ENOMEM;
+ goto err_unpin;
+ }
+
+ pfolios = vcalloc(nr_folios, sizeof(*pfolios));
+ if (!pfolios) {
+ err = -ENOMEM;
+ goto err_unpin;
+ }
+
+ for (i = 0; i < nr_folios; i++) {
+ struct memfd_luo_folio_ser *pfolio = &pfolios[i];
+ struct folio *folio = folios[i];
+ unsigned int flags = 0;
+ unsigned long pfn;
+
+ err = kho_preserve_folio(folio);
+ if (err)
+ goto err_unpreserve;
+
+ pfn = folio_pfn(folio);
+ if (folio_test_dirty(folio))
+ flags |= PRESERVED_FLAG_DIRTY;
+ if (folio_test_uptodate(folio))
+ flags |= PRESERVED_FLAG_UPTODATE;
+
+ pfolio->foliodesc = PRESERVED_FOLIO_MKDESC(pfn, flags);
+ pfolio->index = folio->index;
+ }
+
+ err = kho_preserve_vmalloc(pfolios, kho_vmalloc);
+ if (err)
+ goto err_unpreserve;
+
+ kvfree(folios);
+ *nr_foliosp = nr_folios;
+ return pfolios;
+
+err_unpreserve:
+ i--;
+ for (; i >= 0; i--)
+ kho_unpreserve_folio(folios[i]);
+ vfree(pfolios);
+err_unpin:
+ unpin_folios(folios, nr_folios);
+err_free_folios:
+ kvfree(folios);
+ return ERR_PTR(err);
+}
+
+static void memfd_luo_unpreserve_folios(void *fdt, struct memfd_luo_folio_ser *pfolios,
+ u64 nr_folios)
+{
+ struct kho_vmalloc *kho_vmalloc;
+ long i;
+
+ if (!nr_folios)
+ return;
+
+ kho_vmalloc = (struct kho_vmalloc *)fdt_getprop(fdt, 0, MEMFD_FDT_FOLIOS, NULL);
+ /* The FDT was created by this kernel so expect it to be sane. */
+ WARN_ON_ONCE(!kho_vmalloc);
+ kho_unpreserve_vmalloc(kho_vmalloc);
+
+ for (i = 0; i < nr_folios; i++) {
+ const struct memfd_luo_folio_ser *pfolio = &pfolios[i];
+ struct folio *folio;
+
+ if (!pfolio->foliodesc)
+ continue;
+
+ folio = pfn_folio(PRESERVED_FOLIO_PFN(pfolio->foliodesc));
+
+ kho_unpreserve_folio(folio);
+ unpin_folio(folio);
+ }
+
+ vfree(pfolios);
+}
+
+static struct memfd_luo_folio_ser *memfd_luo_fdt_folios(const void *fdt, u64 *nr_folios)
+{
+ const struct kho_vmalloc *kho_vmalloc;
+ struct memfd_luo_folio_ser *pfolios;
+ const u64 *nr;
+ int len;
+
+ nr = fdt_getprop(fdt, 0, MEMFD_FDT_NR_FOLIOS, &len);
+ if (!nr || len != sizeof(*nr)) {
+ pr_err("invalid '%s' property\n", MEMFD_FDT_NR_FOLIOS);
+ return NULL;
+ }
+
+ kho_vmalloc = fdt_getprop(fdt, 0, MEMFD_FDT_FOLIOS, &len);
+ if (!kho_vmalloc || len != sizeof(*kho_vmalloc)) {
+ pr_err("invalid '%s' property\n", MEMFD_FDT_FOLIOS);
+ return NULL;
+ }
+
+ pfolios = kho_restore_vmalloc(kho_vmalloc);
+ if (!pfolios)
+ return NULL;
+
+ *nr_folios = *nr;
+ return pfolios;
+}
+
+static void *memfd_luo_create_fdt(void)
+{
+ struct folio *fdt_folio;
+ int err = 0;
+ void *fdt;
+
+ /*
+ * The FDT only contains a couple of properties and a kho_vmalloc
+ * object. One page should be enough for that.
+ */
+ fdt_folio = folio_alloc(GFP_KERNEL | __GFP_ZERO, 0);
+ if (!fdt_folio)
+ return NULL;
+
+ fdt = folio_address(fdt_folio);
+
+ err |= fdt_create(fdt, folio_size(fdt_folio));
+ err |= fdt_finish_reservemap(fdt);
+ err |= fdt_begin_node(fdt, "");
+ if (err)
+ goto free;
+
+ return fdt;
+
+free:
+ folio_put(fdt_folio);
+ return NULL;
+}
+
+static int memfd_luo_finish_fdt(void *fdt)
+{
+ int err;
+
+ err = fdt_end_node(fdt);
+ if (err)
+ return err;
+
+ return fdt_finish(fdt);
+}
+
+static int memfd_luo_preserve(struct liveupdate_file_op_args *args)
+{
+ struct inode *inode = file_inode(args->file);
+ struct memfd_luo_folio_ser *pfolios;
+ struct memfd_luo_private *private;
+ u64 pos, nr_folios;
+ int err = 0;
+ void *fdt;
+ long size;
+
+ private = kmalloc(sizeof(*private), GFP_KERNEL);
+ if (!private)
+ return -ENOMEM;
+
+ inode_lock(inode);
+ shmem_i_mapping_freeze(inode, true);
+
+ size = i_size_read(inode);
+
+ fdt = memfd_luo_create_fdt();
+ if (!fdt) {
+ err = -ENOMEM;
+ goto err_unlock;
+ }
+
+ pos = args->file->f_pos;
+ err = fdt_property(fdt, MEMFD_FDT_POS, &pos, sizeof(pos));
+ if (err)
+ goto err_free_fdt;
+
+ err = fdt_property(fdt, MEMFD_FDT_SIZE, &size, sizeof(size));
+ if (err)
+ goto err_free_fdt;
+
+ pfolios = memfd_luo_preserve_folios(args->file, fdt, &nr_folios);
+ if (IS_ERR(pfolios)) {
+ err = PTR_ERR(pfolios);
+ goto err_free_fdt;
+ }
+
+ err = memfd_luo_finish_fdt(fdt);
+ if (err)
+ goto err_unpreserve_folios;
+
+ err = kho_preserve_folio(virt_to_folio(fdt));
+ if (err)
+ goto err_unpreserve_folios;
+
+ inode_unlock(inode);
+
+ private->pfolios = pfolios;
+ private->nr_folios = nr_folios;
+ args->private_data = private;
+ args->serialized_data = virt_to_phys(fdt);
+ return 0;
+
+err_unpreserve_folios:
+ memfd_luo_unpreserve_folios(fdt, pfolios, nr_folios);
+err_free_fdt:
+ folio_put(virt_to_folio(fdt));
+err_unlock:
+ shmem_i_mapping_freeze(inode, false);
+ inode_unlock(inode);
+ kfree(private);
+ return err;
+}
+
+static int memfd_luo_freeze(struct liveupdate_file_op_args *args)
+{
+ u64 pos = args->file->f_pos;
+ void *fdt;
+ int err;
+
+ if (WARN_ON_ONCE(!args->serialized_data))
+ return -EINVAL;
+
+ fdt = phys_to_virt(args->serialized_data);
+
+ /*
+ * The pos might have changed since preserve. Everything else stays the
+ * same.
+ */
+ err = fdt_setprop(fdt, 0, MEMFD_FDT_POS, &pos, sizeof(pos));
+ if (err)
+ return err;
+
+ return 0;
+}
+
+static void memfd_luo_unpreserve(struct liveupdate_file_op_args *args)
+{
+ struct memfd_luo_private *private = args->private_data;
+ struct inode *inode = file_inode(args->file);
+ struct folio *fdt_folio;
+ void *fdt;
+
+ if (WARN_ON_ONCE(!args->serialized_data || !args->private_data))
+ return;
+
+ inode_lock(inode);
+ shmem_i_mapping_freeze(inode, false);
+
+ fdt = phys_to_virt(args->serialized_data);
+ fdt_folio = virt_to_folio(fdt);
+
+ memfd_luo_unpreserve_folios(fdt, private->pfolios, private->nr_folios);
+
+ kho_unpreserve_folio(fdt_folio);
+ folio_put(fdt_folio);
+ inode_unlock(inode);
+ kfree(private);
+}
+
+static struct folio *memfd_luo_get_fdt(u64 data)
+{
+ return kho_restore_folio((phys_addr_t)data);
+}
+
+static void memfd_luo_discard_folios(const struct memfd_luo_folio_ser *pfolios,
+ long nr_folios)
+{
+ long i;
+
+ for (i = 0; i < nr_folios; i++) {
+ const struct memfd_luo_folio_ser *pfolio = &pfolios[i];
+ struct folio *folio;
+ phys_addr_t phys;
+
+ if (!pfolio->foliodesc)
+ continue;
+
+ phys = PFN_PHYS(PRESERVED_FOLIO_PFN(pfolio->foliodesc));
+ folio = kho_restore_folio(phys);
+ if (!folio) {
+ pr_warn_ratelimited("Unable to restore folio at physical address: %pa\n",
+ &phys);
+ continue;
+ }
+
+ folio_put(folio);
+ }
+}
+
+static void memfd_luo_finish(struct liveupdate_file_op_args *args)
+{
+ const struct memfd_luo_folio_ser *pfolios;
+ struct folio *fdt_folio;
+ const void *fdt;
+ u64 nr_folios;
+
+ if (args->retrieved)
+ return;
+
+ fdt_folio = memfd_luo_get_fdt(args->serialized_data);
+ if (!fdt_folio) {
+ pr_err("failed to restore memfd FDT\n");
+ return;
+ }
+
+ fdt = folio_address(fdt_folio);
+
+ pfolios = memfd_luo_fdt_folios(fdt, &nr_folios);
+ if (!pfolios)
+ goto out;
+
+ memfd_luo_discard_folios(pfolios, nr_folios);
+ vfree(pfolios);
+
+out:
+ folio_put(fdt_folio);
+}
+
+static int memfd_luo_retrieve_folios(struct file *file, const void *fdt)
+{
+ const struct memfd_luo_folio_ser *pfolios;
+ struct inode *inode = file_inode(file);
+ struct address_space *mapping;
+ struct folio *folio;
+ u64 nr_folios;
+ long i = 0;
+ int err;
+
+ /* Careful: folios don't exist in FDT on zero-size files. */
+ if (!inode->i_size)
+ return 0;
+
+ pfolios = memfd_luo_fdt_folios(fdt, &nr_folios);
+ if (!pfolios) {
+ pr_err("failed to fetch preserved folio list\n");
+ return -EINVAL;
+ }
+
+ inode = file->f_inode;
+ mapping = inode->i_mapping;
+
+ for (; i < nr_folios; i++) {
+ const struct memfd_luo_folio_ser *pfolio = &pfolios[i];
+ phys_addr_t phys;
+ u64 index;
+ int flags;
+
+ if (!pfolio->foliodesc)
+ continue;
+
+ phys = PFN_PHYS(PRESERVED_FOLIO_PFN(pfolio->foliodesc));
+ folio = kho_restore_folio(phys);
+ if (!folio) {
+ pr_err("Unable to restore folio at physical address: %pa\n",
+ &phys);
+ goto put_folios;
+ }
+ index = pfolio->index;
+ flags = PRESERVED_FOLIO_FLAGS(pfolio->foliodesc);
+
+ /* Set up the folio for insertion. */
+ __folio_set_locked(folio);
+ __folio_set_swapbacked(folio);
+
+ err = mem_cgroup_charge(folio, NULL, mapping_gfp_mask(mapping));
+ if (err) {
+ pr_err("shmem: failed to charge folio index %ld: %d\n",
+ i, err);
+ goto unlock_folio;
+ }
+
+ err = shmem_add_to_page_cache(folio, mapping, index, NULL,
+ mapping_gfp_mask(mapping));
+ if (err) {
+ pr_err("shmem: failed to add to page cache folio index %ld: %d\n",
+ i, err);
+ goto unlock_folio;
+ }
+
+ if (flags & PRESERVED_FLAG_UPTODATE)
+ folio_mark_uptodate(folio);
+ if (flags & PRESERVED_FLAG_DIRTY)
+ folio_mark_dirty(folio);
+
+ err = shmem_inode_acct_blocks(inode, 1);
+ if (err) {
+ pr_err("shmem: failed to account folio index %ld: %d\n",
+ i, err);
+ goto unlock_folio;
+ }
+
+ shmem_recalc_inode(inode, 1, 0);
+ folio_add_lru(folio);
+ folio_unlock(folio);
+ folio_put(folio);
+ }
+
+ vfree(pfolios);
+ return 0;
+
+unlock_folio:
+ folio_unlock(folio);
+ folio_put(folio);
+ i++;
+put_folios:
+ /*
+ * Note: don't free the folios already added to the file. They will be
+ * freed when the file is freed. Free the ones not added yet here.
+ */
+ for (; i < nr_folios; i++) {
+ const struct memfd_luo_folio_ser *pfolio = &pfolios[i];
+
+ folio = kho_restore_folio(PFN_PHYS(PRESERVED_FOLIO_PFN(pfolio->foliodesc)));
+ if (folio)
+ folio_put(folio);
+ }
+
+ vfree(pfolios);
+ return err;
+}
+
+static int memfd_luo_retrieve(struct liveupdate_file_op_args *args)
+{
+ struct folio *fdt_folio;
+ const u64 *pos, *size;
+ struct file *file;
+ int len, ret = 0;
+ const void *fdt;
+
+ fdt_folio = memfd_luo_get_fdt(args->serialized_data);
+ if (!fdt_folio)
+ return -ENOENT;
+
+ fdt = page_to_virt(folio_page(fdt_folio, 0));
+
+ size = fdt_getprop(fdt, 0, MEMFD_FDT_SIZE, &len);
+ if (!size || len != sizeof(u64)) {
+ pr_err("invalid '%s' property\n", MEMFD_FDT_SIZE);
+ ret = -EINVAL;
+ goto put_fdt;
+ }
+
+ pos = fdt_getprop(fdt, 0, MEMFD_FDT_POS, &len);
+ if (!pos || len != sizeof(u64)) {
+ pr_err("invalid '%s' property\n", MEMFD_FDT_POS);
+ ret = -EINVAL;
+ goto put_fdt;
+ }
+
+ file = shmem_file_setup("", 0, VM_NORESERVE);
+
+ if (IS_ERR(file)) {
+ ret = PTR_ERR(file);
+ pr_err("failed to setup file: %d\n", ret);
+ goto put_fdt;
+ }
+
+ vfs_setpos(file, *pos, MAX_LFS_FILESIZE);
+ file->f_inode->i_size = *size;
+
+ ret = memfd_luo_retrieve_folios(file, fdt);
+ if (ret)
+ goto put_file;
+
+ args->file = file;
+ folio_put(fdt_folio);
+ return 0;
+
+put_file:
+ fput(file);
+put_fdt:
+ folio_put(fdt_folio);
+ return ret;
+}
+
+static bool memfd_luo_can_preserve(struct liveupdate_file_handler *handler,
+ struct file *file)
+{
+ struct inode *inode = file_inode(file);
+
+ return shmem_file(file) && !inode->i_nlink;
+}
+
+static const struct liveupdate_file_ops memfd_luo_file_ops = {
+ .freeze = memfd_luo_freeze,
+ .finish = memfd_luo_finish,
+ .retrieve = memfd_luo_retrieve,
+ .preserve = memfd_luo_preserve,
+ .unpreserve = memfd_luo_unpreserve,
+ .can_preserve = memfd_luo_can_preserve,
+ .owner = THIS_MODULE,
+};
+
+static struct liveupdate_file_handler memfd_luo_handler = {
+ .ops = &memfd_luo_file_ops,
+ .compatible = MEMFD_LUO_FH_COMPATIBLE,
+};
+
+static int __init memfd_luo_init(void)
+{
+ int err = liveupdate_register_file_handler(&memfd_luo_handler);
+
+ if (err && err != -EOPNOTSUPP) {
+ pr_err("Could not register memfd LUO file handler: %pe\n", ERR_PTR(err));
+
+ return err;
+ }
+
+ return 0;
+}
+late_initcall(memfd_luo_init);
--
2.52.0.rc1.455.g30608eb744-goog
* [PATCH v6 16/20] docs: add documentation for memfd preservation via LUO
From: Pasha Tatashin @ 2025-11-15 23:34 UTC (permalink / raw)
From: Pratyush Yadav <ptyadav@amazon.de>
Add the documentation under the "Preserving file descriptors" section of
LUO's documentation.
Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
Co-developed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
Documentation/core-api/liveupdate.rst | 7 +++++++
Documentation/mm/index.rst | 1 +
Documentation/mm/memfd_preservation.rst | 23 +++++++++++++++++++++++
MAINTAINERS | 1 +
4 files changed, 32 insertions(+)
create mode 100644 Documentation/mm/memfd_preservation.rst
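Note for readers of the ABI documentation: each struct memfd_luo_folio_ser
entry packs a page frame number and two state flags into its 64-bit
foliodesc field. The sketch below mirrors the private macros in
mm/memfd_luo.c (PFN in bits 63:12, dirty flag in bit 0, uptodate flag in
bit 1) as standalone C. The macro and function names here are illustrative
only and are not part of the series.

```c
#include <stdint.h>

/* Illustrative stand-ins for the PRESERVED_* macros in mm/memfd_luo.c. */
#define FOLIO_PFN_SHIFT		12
#define FOLIO_PFN_MASK		(~UINT64_C(0) << FOLIO_PFN_SHIFT) /* bits 63:12 */
#define FOLIO_FLAG_DIRTY	(UINT64_C(1) << 0)
#define FOLIO_FLAG_UPTODATE	(UINT64_C(1) << 1)

/* Pack a PFN and flag bits into one descriptor. */
static uint64_t foliodesc_make(uint64_t pfn, uint64_t flags)
{
	return (pfn << FOLIO_PFN_SHIFT) | flags;
}

/* Extract the PFN from a descriptor. */
static uint64_t foliodesc_pfn(uint64_t desc)
{
	return (desc & FOLIO_PFN_MASK) >> FOLIO_PFN_SHIFT;
}

/* Extract the flag bits (everything below the PFN field). */
static uint64_t foliodesc_flags(uint64_t desc)
{
	return desc & ~FOLIO_PFN_MASK;
}
```

For example, PFN 0x12345 with both flags set encodes to 0x12345003. The
restore paths in mm/memfd_luo.c skip entries whose foliodesc is 0.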
diff --git a/Documentation/core-api/liveupdate.rst b/Documentation/core-api/liveupdate.rst
index deacc098d024..384de79a2457 100644
--- a/Documentation/core-api/liveupdate.rst
+++ b/Documentation/core-api/liveupdate.rst
@@ -28,6 +28,13 @@ Live Update Orchestrator ABI
.. kernel-doc:: include/linux/liveupdate/abi/luo.h
:doc: Live Update Orchestrator ABI
+The following types of file descriptors can be preserved:
+
+.. toctree::
+ :maxdepth: 1
+
+ ../mm/memfd_preservation
+
Public API
==========
.. kernel-doc:: include/linux/liveupdate.h
diff --git a/Documentation/mm/index.rst b/Documentation/mm/index.rst
index ba6a8872849b..7aa2a8886908 100644
--- a/Documentation/mm/index.rst
+++ b/Documentation/mm/index.rst
@@ -48,6 +48,7 @@ documentation, or deleted if it has served its purpose.
hugetlbfs_reserv
ksm
memory-model
+ memfd_preservation
mmu_notifier
multigen_lru
numa
diff --git a/Documentation/mm/memfd_preservation.rst b/Documentation/mm/memfd_preservation.rst
new file mode 100644
index 000000000000..4f09c3921893
--- /dev/null
+++ b/Documentation/mm/memfd_preservation.rst
@@ -0,0 +1,23 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+==========================
+Memfd Preservation via LUO
+==========================
+
+.. kernel-doc:: mm/memfd_luo.c
+ :doc: Memfd Preservation via LUO
+
+Memfd Preservation ABI
+======================
+
+.. kernel-doc:: include/linux/liveupdate/abi/memfd.h
+ :doc: memfd Live Update ABI
+
+.. kernel-doc:: include/linux/liveupdate/abi/memfd.h
+ :internal:
+
+See Also
+========
+
+- :doc:`/core-api/liveupdate`
+- :doc:`/core-api/kho/concepts`
diff --git a/MAINTAINERS b/MAINTAINERS
index ad9fee6dc605..6ffe4425adbf 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14470,6 +14470,7 @@ R: Pratyush Yadav <pratyush@kernel.org>
L: linux-kernel@vger.kernel.org
S: Maintained
F: Documentation/core-api/liveupdate.rst
+F: Documentation/mm/memfd_preservation.rst
F: Documentation/userspace-api/liveupdate.rst
F: include/linux/liveupdate.h
F: include/linux/liveupdate/
--
2.52.0.rc1.455.g30608eb744-goog
* [PATCH v6 17/20] selftests/liveupdate: Add userspace API selftests
From: Pasha Tatashin @ 2025-11-15 23:34 UTC (permalink / raw)
Introduce a selftest suite for LUO. These tests validate the core
userspace-facing API provided by the /dev/liveupdate device and its
associated ioctls.
The suite covers fundamental device behavior, session management, and
the file preservation mechanism using memfd as a test case. This
provides regression testing for the LUO uAPI.
The following functionality is verified:
Device Access:
Basic open and close operations on /dev/liveupdate.
Enforcement of exclusive device access (verifying EBUSY on a
second open).
Session Management:
Successful creation of sessions with unique names.
Failure to create sessions with duplicate names.
File Preservation:
Preserving a single memfd and verifying its content remains
intact post-preservation.
Preserving multiple memfds within a single session, each with
unique data.
A complex scenario involving multiple sessions, each containing
a mix of empty and data-filled memfds.
Note: This test suite is limited to verifying the pre-kexec
functionality of LUO (e.g., session creation, file preservation).
The post-kexec restoration of resources is not covered, as the kselftest
framework does not currently support orchestrating a reboot and
continuing execution in the new kernel.
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
MAINTAINERS | 1 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/liveupdate/.gitignore | 1 +
tools/testing/selftests/liveupdate/Makefile | 7 +
tools/testing/selftests/liveupdate/config | 5 +
.../testing/selftests/liveupdate/liveupdate.c | 348 ++++++++++++++++++
6 files changed, 363 insertions(+)
create mode 100644 tools/testing/selftests/liveupdate/.gitignore
create mode 100644 tools/testing/selftests/liveupdate/Makefile
create mode 100644 tools/testing/selftests/liveupdate/config
create mode 100644 tools/testing/selftests/liveupdate/liveupdate.c
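Because this suite stops before kexec, the userspace steps that follow a
successful retrieve are not exercised. Per the mm/memfd_luo.c documentation
earlier in the series, FD_CLOEXEC is not preserved and has to be set again
with fcntl(). A minimal sketch of that step follows, using only standard
fcntl() calls. The helper name reapply_cloexec() is hypothetical, and the fd
is assumed to be one handed back by LUO after restore.

```c
#include <fcntl.h>

/*
 * Hypothetical helper, not part of this series: set FD_CLOEXEC on an fd
 * while keeping any other descriptor flags. Returns 0 on success, or -1
 * with errno set by fcntl().
 */
static int reapply_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -1;

	return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0;
}
```

Seals are documented as non-preserved as well. Re-sealing would go through
fcntl(fd, F_ADD_SEALS, ...) and is left out of this sketch.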
diff --git a/MAINTAINERS b/MAINTAINERS
index 6ffe4425adbf..5a1ed783de20 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14477,6 +14477,7 @@ F: include/linux/liveupdate/
F: include/uapi/linux/liveupdate.h
F: kernel/liveupdate/
F: mm/memfd_luo.c
+F: tools/testing/selftests/liveupdate/
LLC (802.2)
L: netdev@vger.kernel.org
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index c46ebdb9b8ef..56e44a98d6a5 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -54,6 +54,7 @@ TARGETS += kvm
TARGETS += landlock
TARGETS += lib
TARGETS += livepatch
+TARGETS += liveupdate
TARGETS += lkdtm
TARGETS += lsm
TARGETS += membarrier
diff --git a/tools/testing/selftests/liveupdate/.gitignore b/tools/testing/selftests/liveupdate/.gitignore
new file mode 100644
index 000000000000..af6e773cf98f
--- /dev/null
+++ b/tools/testing/selftests/liveupdate/.gitignore
@@ -0,0 +1 @@
+/liveupdate
diff --git a/tools/testing/selftests/liveupdate/Makefile b/tools/testing/selftests/liveupdate/Makefile
new file mode 100644
index 000000000000..2a573c36016e
--- /dev/null
+++ b/tools/testing/selftests/liveupdate/Makefile
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0-only
+CFLAGS += -Wall -O2 -Wno-unused-function
+CFLAGS += $(KHDR_INCLUDES)
+
+TEST_GEN_PROGS += liveupdate
+
+include ../lib.mk
diff --git a/tools/testing/selftests/liveupdate/config b/tools/testing/selftests/liveupdate/config
new file mode 100644
index 000000000000..c0c7e7cc484e
--- /dev/null
+++ b/tools/testing/selftests/liveupdate/config
@@ -0,0 +1,5 @@
+CONFIG_KEXEC_FILE=y
+CONFIG_KEXEC_HANDOVER=y
+CONFIG_KEXEC_HANDOVER_DEBUGFS=y
+CONFIG_KEXEC_HANDOVER_DEBUG=y
+CONFIG_LIVEUPDATE=y
diff --git a/tools/testing/selftests/liveupdate/liveupdate.c b/tools/testing/selftests/liveupdate/liveupdate.c
new file mode 100644
index 000000000000..c2878e3d5ef9
--- /dev/null
+++ b/tools/testing/selftests/liveupdate/liveupdate.c
@@ -0,0 +1,348 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+
+/*
+ * Selftests for the Live Update Orchestrator.
+ * This test suite verifies the functionality and behavior of the
+ * /dev/liveupdate character device and its session management capabilities.
+ *
+ * Tests include:
+ * - Device access: basic open/close, and enforcement of exclusive access.
+ * - Session management: creation of unique sessions, and duplicate name detection.
+ * - Resource preservation: successfully preserving individual and multiple memfds,
+ * verifying contents remain accessible.
+ * - Complex multi-session scenarios involving mixed empty and populated files.
+ */
+
+#include <errno.h>
+#include <fcntl.h>
+#include <string.h>
+#include <sys/ioctl.h>
+#include <unistd.h>
+
+#include <linux/liveupdate.h>
+
+#include "../kselftest.h"
+#include "../kselftest_harness.h"
+
+#define LIVEUPDATE_DEV "/dev/liveupdate"
+
+FIXTURE(liveupdate_device) {
+ int fd1;
+ int fd2;
+};
+
+FIXTURE_SETUP(liveupdate_device)
+{
+ self->fd1 = -1;
+ self->fd2 = -1;
+}
+
+FIXTURE_TEARDOWN(liveupdate_device)
+{
+ if (self->fd1 >= 0)
+ close(self->fd1);
+ if (self->fd2 >= 0)
+ close(self->fd2);
+}
+
+/*
+ * Test Case: Basic Open and Close
+ *
+ * Verifies that the /dev/liveupdate device can be opened and subsequently
+ * closed without errors. Skips if the device does not exist.
+ */
+TEST_F(liveupdate_device, basic_open_close)
+{
+ self->fd1 = open(LIVEUPDATE_DEV, O_RDWR);
+
+ if (self->fd1 < 0 && errno == ENOENT)
+ SKIP(return, "%s does not exist.", LIVEUPDATE_DEV);
+
+ ASSERT_GE(self->fd1, 0);
+ ASSERT_EQ(close(self->fd1), 0);
+ self->fd1 = -1;
+}
+
+/*
+ * Test Case: Exclusive Open Enforcement
+ *
+ * Verifies that the /dev/liveupdate device can only be opened by one process
+ * at a time. It checks that a second attempt to open the device fails with
+ * the EBUSY error code.
+ */
+TEST_F(liveupdate_device, exclusive_open)
+{
+ self->fd1 = open(LIVEUPDATE_DEV, O_RDWR);
+
+ if (self->fd1 < 0 && errno == ENOENT)
+ SKIP(return, "%s does not exist.", LIVEUPDATE_DEV);
+
+ ASSERT_GE(self->fd1, 0);
+ self->fd2 = open(LIVEUPDATE_DEV, O_RDWR);
+ EXPECT_LT(self->fd2, 0);
+ EXPECT_EQ(errno, EBUSY);
+}
+
+/* Helper function to create a LUO session via ioctl. */
+static int create_session(int lu_fd, const char *name)
+{
+ struct liveupdate_ioctl_create_session args = {};
+
+ args.size = sizeof(args);
+ strncpy((char *)args.name, name, sizeof(args.name) - 1);
+
+ if (ioctl(lu_fd, LIVEUPDATE_IOCTL_CREATE_SESSION, &args))
+ return -errno;
+
+ return args.fd;
+}
+
+/*
+ * Test Case: Create Duplicate Session
+ *
+ * Verifies that attempting to create two sessions with the same name fails
+ * on the second attempt with EEXIST.
+ */
+TEST_F(liveupdate_device, create_duplicate_session)
+{
+ int session_fd1, session_fd2;
+
+ self->fd1 = open(LIVEUPDATE_DEV, O_RDWR);
+ if (self->fd1 < 0 && errno == ENOENT)
+ SKIP(return, "%s does not exist", LIVEUPDATE_DEV);
+
+ ASSERT_GE(self->fd1, 0);
+
+ session_fd1 = create_session(self->fd1, "duplicate-session-test");
+ ASSERT_GE(session_fd1, 0);
+
+ session_fd2 = create_session(self->fd1, "duplicate-session-test");
+ EXPECT_LT(session_fd2, 0);
+ EXPECT_EQ(-session_fd2, EEXIST);
+
+ ASSERT_EQ(close(session_fd1), 0);
+}
+
+/*
+ * Test Case: Create Distinct Sessions
+ *
+ * Verifies that creating two sessions with different names succeeds.
+ */
+TEST_F(liveupdate_device, create_distinct_sessions)
+{
+ int session_fd1, session_fd2;
+
+ self->fd1 = open(LIVEUPDATE_DEV, O_RDWR);
+ if (self->fd1 < 0 && errno == ENOENT)
+ SKIP(return, "%s does not exist", LIVEUPDATE_DEV);
+
+ ASSERT_GE(self->fd1, 0);
+
+ session_fd1 = create_session(self->fd1, "distinct-session-1");
+ ASSERT_GE(session_fd1, 0);
+
+ session_fd2 = create_session(self->fd1, "distinct-session-2");
+ ASSERT_GE(session_fd2, 0);
+
+ ASSERT_EQ(close(session_fd1), 0);
+ ASSERT_EQ(close(session_fd2), 0);
+}
+
+static int preserve_fd(int session_fd, int fd_to_preserve, __u64 token)
+{
+ struct liveupdate_session_preserve_fd args = {};
+
+ args.size = sizeof(args);
+ args.fd = fd_to_preserve;
+ args.token = token;
+
+ if (ioctl(session_fd, LIVEUPDATE_SESSION_PRESERVE_FD, &args))
+ return -errno;
+
+ return 0;
+}
+
+/*
+ * Test Case: Preserve MemFD
+ *
+ * Verifies that a valid memfd can be successfully preserved in a session and
+ * that its contents remain intact after the preservation call.
+ */
+TEST_F(liveupdate_device, preserve_memfd)
+{
+ const char *test_str = "hello liveupdate";
+ char read_buf[64] = {};
+ int session_fd, mem_fd;
+
+ self->fd1 = open(LIVEUPDATE_DEV, O_RDWR);
+ if (self->fd1 < 0 && errno == ENOENT)
+ SKIP(return, "%s does not exist", LIVEUPDATE_DEV);
+ ASSERT_GE(self->fd1, 0);
+
+ session_fd = create_session(self->fd1, "preserve-memfd-test");
+ ASSERT_GE(session_fd, 0);
+
+ mem_fd = memfd_create("test-memfd", 0);
+ ASSERT_GE(mem_fd, 0);
+
+ ASSERT_EQ(write(mem_fd, test_str, strlen(test_str)), strlen(test_str));
+ ASSERT_EQ(preserve_fd(session_fd, mem_fd, 0x1234), 0);
+ ASSERT_EQ(close(session_fd), 0);
+
+ ASSERT_EQ(lseek(mem_fd, 0, SEEK_SET), 0);
+ ASSERT_EQ(read(mem_fd, read_buf, sizeof(read_buf)), strlen(test_str));
+ ASSERT_STREQ(read_buf, test_str);
+ ASSERT_EQ(close(mem_fd), 0);
+}
+
+/*
+ * Test Case: Preserve Multiple MemFDs
+ *
+ * Verifies that multiple memfds can be preserved in a single session,
+ * each with a unique token, and that their contents remain distinct and
+ * correct after preservation.
+ */
+TEST_F(liveupdate_device, preserve_multiple_memfds)
+{
+ const char *test_str1 = "data for memfd one";
+ const char *test_str2 = "data for memfd two";
+ char read_buf[64] = {};
+ int session_fd, mem_fd1, mem_fd2;
+
+ self->fd1 = open(LIVEUPDATE_DEV, O_RDWR);
+ if (self->fd1 < 0 && errno == ENOENT)
+ SKIP(return, "%s does not exist", LIVEUPDATE_DEV);
+ ASSERT_GE(self->fd1, 0);
+
+ session_fd = create_session(self->fd1, "preserve-multi-memfd-test");
+ ASSERT_GE(session_fd, 0);
+
+ mem_fd1 = memfd_create("test-memfd-1", 0);
+ ASSERT_GE(mem_fd1, 0);
+ mem_fd2 = memfd_create("test-memfd-2", 0);
+ ASSERT_GE(mem_fd2, 0);
+
+ ASSERT_EQ(write(mem_fd1, test_str1, strlen(test_str1)), strlen(test_str1));
+ ASSERT_EQ(write(mem_fd2, test_str2, strlen(test_str2)), strlen(test_str2));
+
+ ASSERT_EQ(preserve_fd(session_fd, mem_fd1, 0xAAAA), 0);
+ ASSERT_EQ(preserve_fd(session_fd, mem_fd2, 0xBBBB), 0);
+
+ memset(read_buf, 0, sizeof(read_buf));
+ ASSERT_EQ(lseek(mem_fd1, 0, SEEK_SET), 0);
+ ASSERT_EQ(read(mem_fd1, read_buf, sizeof(read_buf)), strlen(test_str1));
+ ASSERT_STREQ(read_buf, test_str1);
+
+ memset(read_buf, 0, sizeof(read_buf));
+ ASSERT_EQ(lseek(mem_fd2, 0, SEEK_SET), 0);
+ ASSERT_EQ(read(mem_fd2, read_buf, sizeof(read_buf)), strlen(test_str2));
+ ASSERT_STREQ(read_buf, test_str2);
+
+ ASSERT_EQ(close(mem_fd1), 0);
+ ASSERT_EQ(close(mem_fd2), 0);
+ ASSERT_EQ(close(session_fd), 0);
+}
+
+/*
+ * Test Case: Preserve Complex Scenario
+ *
+ * Verifies a more complex scenario with multiple sessions and a mix of empty
+ * and non-empty memfds distributed across them.
+ */
+TEST_F(liveupdate_device, preserve_complex_scenario)
+{
+ const char *data1 = "data for session 1";
+ const char *data2 = "data for session 2";
+ char read_buf[64] = {};
+ int session_fd1, session_fd2;
+ int mem_fd_data1, mem_fd_empty1, mem_fd_data2, mem_fd_empty2;
+
+ self->fd1 = open(LIVEUPDATE_DEV, O_RDWR);
+ if (self->fd1 < 0 && errno == ENOENT)
+ SKIP(return, "%s does not exist", LIVEUPDATE_DEV);
+ ASSERT_GE(self->fd1, 0);
+
+ session_fd1 = create_session(self->fd1, "complex-session-1");
+ ASSERT_GE(session_fd1, 0);
+ session_fd2 = create_session(self->fd1, "complex-session-2");
+ ASSERT_GE(session_fd2, 0);
+
+ mem_fd_data1 = memfd_create("data1", 0);
+ ASSERT_GE(mem_fd_data1, 0);
+ ASSERT_EQ(write(mem_fd_data1, data1, strlen(data1)), strlen(data1));
+
+ mem_fd_empty1 = memfd_create("empty1", 0);
+ ASSERT_GE(mem_fd_empty1, 0);
+
+ mem_fd_data2 = memfd_create("data2", 0);
+ ASSERT_GE(mem_fd_data2, 0);
+ ASSERT_EQ(write(mem_fd_data2, data2, strlen(data2)), strlen(data2));
+
+ mem_fd_empty2 = memfd_create("empty2", 0);
+ ASSERT_GE(mem_fd_empty2, 0);
+
+ ASSERT_EQ(preserve_fd(session_fd1, mem_fd_data1, 0x1111), 0);
+ ASSERT_EQ(preserve_fd(session_fd1, mem_fd_empty1, 0x2222), 0);
+ ASSERT_EQ(preserve_fd(session_fd2, mem_fd_data2, 0x3333), 0);
+ ASSERT_EQ(preserve_fd(session_fd2, mem_fd_empty2, 0x4444), 0);
+
+ ASSERT_EQ(lseek(mem_fd_data1, 0, SEEK_SET), 0);
+ ASSERT_EQ(read(mem_fd_data1, read_buf, sizeof(read_buf)), strlen(data1));
+ ASSERT_STREQ(read_buf, data1);
+
+ memset(read_buf, 0, sizeof(read_buf));
+ ASSERT_EQ(lseek(mem_fd_data2, 0, SEEK_SET), 0);
+ ASSERT_EQ(read(mem_fd_data2, read_buf, sizeof(read_buf)), strlen(data2));
+ ASSERT_STREQ(read_buf, data2);
+
+ ASSERT_EQ(lseek(mem_fd_empty1, 0, SEEK_SET), 0);
+ ASSERT_EQ(read(mem_fd_empty1, read_buf, sizeof(read_buf)), 0);
+
+ ASSERT_EQ(lseek(mem_fd_empty2, 0, SEEK_SET), 0);
+ ASSERT_EQ(read(mem_fd_empty2, read_buf, sizeof(read_buf)), 0);
+
+ ASSERT_EQ(close(mem_fd_data1), 0);
+ ASSERT_EQ(close(mem_fd_empty1), 0);
+ ASSERT_EQ(close(mem_fd_data2), 0);
+ ASSERT_EQ(close(mem_fd_empty2), 0);
+ ASSERT_EQ(close(session_fd1), 0);
+ ASSERT_EQ(close(session_fd2), 0);
+}
+
+/*
+ * Test Case: Preserve Unsupported File Descriptor
+ *
+ * Verifies that attempting to preserve a file descriptor that does not have
+ * a registered Live Update handler fails gracefully.
+ * Uses /dev/null as a representative of a file type (character device)
+ * that is not supported by the orchestrator.
+ */
+TEST_F(liveupdate_device, preserve_unsupported_fd)
+{
+ int session_fd, unsupported_fd;
+ int ret;
+
+ self->fd1 = open(LIVEUPDATE_DEV, O_RDWR);
+ if (self->fd1 < 0 && errno == ENOENT)
+ SKIP(return, "%s does not exist", LIVEUPDATE_DEV);
+ ASSERT_GE(self->fd1, 0);
+
+ session_fd = create_session(self->fd1, "unsupported-fd-test");
+ ASSERT_GE(session_fd, 0);
+
+ unsupported_fd = open("/dev/null", O_RDWR);
+ ASSERT_GE(unsupported_fd, 0);
+
+ ret = preserve_fd(session_fd, unsupported_fd, 0xDEAD);
+ EXPECT_EQ(ret, -ENOENT);
+
+ ASSERT_EQ(close(unsupported_fd), 0);
+ ASSERT_EQ(close(session_fd), 0);
+}
+
+TEST_HARNESS_MAIN
--
2.52.0.rc1.455.g30608eb744-goog
* [PATCH v6 18/20] selftests/liveupdate: Add kexec-based selftest for session lifecycle
2025-11-15 23:33 [PATCH v6 00/20] Live Update Orchestrator Pasha Tatashin
` (16 preceding siblings ...)
2025-11-15 23:34 ` [PATCH v6 17/20] selftests/liveupdate: Add userspace API selftests Pasha Tatashin
@ 2025-11-15 23:34 ` Pasha Tatashin
2025-11-16 18:53 ` Zhu Yanjun
` (3 more replies)
2025-11-15 23:34 ` [PATCH v6 19/20] selftests/liveupdate: Add kexec test for multiple and empty sessions Pasha Tatashin
2025-11-15 23:34 ` [PATCH v6 20/20] tests/liveupdate: Add in-kernel liveupdate test Pasha Tatashin
19 siblings, 4 replies; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-15 23:34 UTC (permalink / raw)
Introduce a kexec-based selftest, luo_kexec_simple, to validate the
end-to-end lifecycle of a Live Update Orchestrator (LUO) session across
a reboot.
While existing tests verify the uAPI in a pre-reboot context, this test
ensures that the core functionality—preserving state via Kexec Handover
and restoring it in a new kernel—works as expected.
The test operates in two stages, managing its state across the reboot by
preserving a dedicated "state session" containing a memfd. This
mechanism dogfoods the LUO feature itself for state tracking, making the
test self-contained.
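As a rough illustration of the state-tracking idea (not the patch's actual
helpers), the sketch below stores a stage number as a decimal string in a
memfd and reads it back. The names stage_memfd_create() and
stage_memfd_read() are hypothetical, and no LUO ioctl is issued here; in
the real test the memfd is additionally preserved into the state session.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: store next_stage as a decimal string at the start
 * of a fresh one-page memfd. Returns the memfd, or -1 on failure.
 */
static int stage_memfd_create(int next_stage)
{
	long page_size = sysconf(_SC_PAGE_SIZE);
	char *map;
	int mfd;

	assert(page_size > 0);

	mfd = memfd_create("stage_demo", 0);
	if (mfd < 0)
		return -1;

	if (ftruncate(mfd, page_size) != 0) {
		close(mfd);
		return -1;
	}

	map = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, mfd, 0);
	if (map == MAP_FAILED) {
		close(mfd);
		return -1;
	}

	/* The rest of the page stays zero-filled, so the string is terminated. */
	snprintf(map, page_size, "%d", next_stage);
	munmap(map, page_size);

	return mfd;
}

/*
 * Hypothetical helper: read the stage back from offset 0, independent of
 * the current file position. Returns the stage, or -1 on read failure.
 */
static int stage_memfd_read(int mfd)
{
	char buf[32] = {0};

	if (pread(mfd, buf, sizeof(buf) - 1, 0) < 0)
		return -1;

	return atoi(buf);
}
```

Calling stage_memfd_create(2) and then stage_memfd_read() on the returned
descriptor gives back the stored stage. pread() is used in this sketch so
the result does not depend on the file offset; the patch itself reads from
a freshly retrieved descriptor.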
The test validates the following sequence:
Stage 1 (Pre-kexec):
- Creates a test session (test-session).
- Creates and preserves a memfd with a known data pattern into the test
session.
- Creates the state-tracking session to signal progression to Stage 2.
- Executes a kexec reboot via a helper script.
Stage 2 (Post-kexec):
- Retrieves the state-tracking session to confirm it is in the
post-reboot stage.
- Retrieves the preserved test session.
- Restores the memfd from the test session and verifies its contents
match the original data pattern written in Stage 1.
- Finalizes both the test and state sessions to ensure a clean
teardown.
The test relies on a helper script (do_kexec.sh) to perform the reboot
and a shared utility library (luo_test_utils.c) for common LUO
operations, keeping the main test logic clean and focused.
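The stage detection described above can be sketched as a small pure
function. This is only an illustration under the assumption that the
retrieve helper returns a descriptor >= 0 on success and -errno on
failure, as luo_retrieve_session() does in this patch; classify_stage()
and the enum names are hypothetical and do not appear in the series.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stage identifiers, used only by this sketch. */
enum luo_demo_stage {
	DEMO_STAGE_ERROR = -1,
	DEMO_STAGE_PRE_KEXEC = 1,
	DEMO_STAGE_POST_KEXEC = 2,
};

/*
 * Classify the return value of a luo_retrieve_session()-style helper:
 *   -ENOENT  -> the state session does not exist yet, so this is Stage 1
 *   fd >= 0  -> the state session was preserved, so this is Stage 2
 *   other    -> unexpected failure
 */
static enum luo_demo_stage classify_stage(int retrieve_ret)
{
	if (retrieve_ret == -ENOENT)
		return DEMO_STAGE_PRE_KEXEC;
	if (retrieve_ret >= 0)
		return DEMO_STAGE_POST_KEXEC;
	return DEMO_STAGE_ERROR;
}
```

Keeping the decision separate from the ioctl call makes the three
outcomes easy to check without a kexec-capable machine.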
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
tools/testing/selftests/liveupdate/.gitignore | 1 +
tools/testing/selftests/liveupdate/Makefile | 32 ++++
.../testing/selftests/liveupdate/do_kexec.sh | 16 ++
.../selftests/liveupdate/luo_kexec_simple.c | 114 ++++++++++++
.../selftests/liveupdate/luo_test_utils.c | 168 ++++++++++++++++++
.../selftests/liveupdate/luo_test_utils.h | 39 ++++
6 files changed, 370 insertions(+)
create mode 100755 tools/testing/selftests/liveupdate/do_kexec.sh
create mode 100644 tools/testing/selftests/liveupdate/luo_kexec_simple.c
create mode 100644 tools/testing/selftests/liveupdate/luo_test_utils.c
create mode 100644 tools/testing/selftests/liveupdate/luo_test_utils.h
diff --git a/tools/testing/selftests/liveupdate/.gitignore b/tools/testing/selftests/liveupdate/.gitignore
index af6e773cf98f..daeef116174d 100644
--- a/tools/testing/selftests/liveupdate/.gitignore
+++ b/tools/testing/selftests/liveupdate/.gitignore
@@ -1 +1,2 @@
/liveupdate
+/luo_kexec_simple
diff --git a/tools/testing/selftests/liveupdate/Makefile b/tools/testing/selftests/liveupdate/Makefile
index 2a573c36016e..1563ac84006a 100644
--- a/tools/testing/selftests/liveupdate/Makefile
+++ b/tools/testing/selftests/liveupdate/Makefile
@@ -1,7 +1,39 @@
# SPDX-License-Identifier: GPL-2.0-only
+
+KHDR_INCLUDES ?= -I../../../../usr/include
CFLAGS += -Wall -O2 -Wno-unused-function
CFLAGS += $(KHDR_INCLUDES)
+LDFLAGS += -static
+OUTPUT ?= .
+
+# --- Test Configuration (Edit this section when adding new tests) ---
+LUO_SHARED_SRCS := luo_test_utils.c
+LUO_SHARED_HDRS += luo_test_utils.h
+
+LUO_MANUAL_TESTS += luo_kexec_simple
+
+TEST_FILES += do_kexec.sh
TEST_GEN_PROGS += liveupdate
+# --- Automatic Rule Generation (Do not edit below) ---
+
+TEST_GEN_PROGS_EXTENDED += $(LUO_MANUAL_TESTS)
+
+# Define the full list of sources for each manual test.
+$(foreach test,$(LUO_MANUAL_TESTS), \
+ $(eval $(test)_SOURCES := $(test).c $(LUO_SHARED_SRCS)))
+
+# This loop automatically generates an explicit build rule for each manual test.
+# It includes dependencies on the shared headers and makes the output
+# executable.
+# Note the use of '$$' to escape automatic variables for the 'eval' command.
+$(foreach test,$(LUO_MANUAL_TESTS), \
+ $(eval $(OUTPUT)/$(test): $($(test)_SOURCES) $(LUO_SHARED_HDRS) \
+ $(call msg,LINK,,$$@) ; \
+ $(Q)$(LINK.c) $$^ $(LDLIBS) -o $$@ ; \
+ $(Q)chmod +x $$@ \
+ ) \
+)
+
include ../lib.mk
diff --git a/tools/testing/selftests/liveupdate/do_kexec.sh b/tools/testing/selftests/liveupdate/do_kexec.sh
new file mode 100755
index 000000000000..3c7c6cafbef8
--- /dev/null
+++ b/tools/testing/selftests/liveupdate/do_kexec.sh
@@ -0,0 +1,16 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+set -e
+
+# Use $KERNEL and $INITRAMFS to pass custom Kernel and optional initramfs
+
+KERNEL="${KERNEL:-/boot/bzImage}"
+set -- -l -s --reuse-cmdline "$KERNEL"
+
+INITRAMFS="${INITRAMFS:-/boot/initramfs}"
+if [ -f "$INITRAMFS" ]; then
+ set -- "$@" --initrd="$INITRAMFS"
+fi
+
+kexec "$@"
+kexec -e
diff --git a/tools/testing/selftests/liveupdate/luo_kexec_simple.c b/tools/testing/selftests/liveupdate/luo_kexec_simple.c
new file mode 100644
index 000000000000..67ab6ebf9eec
--- /dev/null
+++ b/tools/testing/selftests/liveupdate/luo_kexec_simple.c
@@ -0,0 +1,114 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ *
+ * A simple selftest to validate the end-to-end lifecycle of a LUO session
+ * across a single kexec reboot.
+ */
+
+#include "luo_test_utils.h"
+
+/* Test-specific constants */
+#define KEXEC_SCRIPT "./do_kexec.sh"
+#define TEST_SESSION_NAME "test-session"
+#define TEST_MEMFD_TOKEN 0x1A
+#define TEST_MEMFD_DATA "hello kexec world"
+
+/* Constants for the state-tracking mechanism, specific to this test file. */
+#define STATE_SESSION_NAME "kexec_simple_state"
+#define STATE_MEMFD_TOKEN 999
+
+/* Stage 1: Executed before the kexec reboot. */
+static void run_stage_1(int luo_fd)
+{
+ int session_fd;
+
+ ksft_print_msg("[STAGE 1] Starting pre-kexec setup...\n");
+
+ ksft_print_msg("[STAGE 1] Creating state file for next stage (2)...\n");
+ create_state_file(luo_fd, STATE_SESSION_NAME, STATE_MEMFD_TOKEN, 2);
+
+ ksft_print_msg("[STAGE 1] Creating session '%s' and preserving memfd...\n",
+ TEST_SESSION_NAME);
+ session_fd = luo_create_session(luo_fd, TEST_SESSION_NAME);
+ if (session_fd < 0)
+ fail_exit("luo_create_session for '%s'", TEST_SESSION_NAME);
+
+ if (create_and_preserve_memfd(session_fd, TEST_MEMFD_TOKEN,
+ TEST_MEMFD_DATA) < 0) {
+ fail_exit("create_and_preserve_memfd for token %#x",
+ TEST_MEMFD_TOKEN);
+ }
+
+ ksft_print_msg("[STAGE 1] Executing kexec...\n");
+ if (system(KEXEC_SCRIPT) != 0)
+ fail_exit("kexec script failed");
+ exit(EXIT_FAILURE);
+}
+
+/* Stage 2: Executed after the kexec reboot. */
+static void run_stage_2(int luo_fd, int state_session_fd)
+{
+ int session_fd, mfd, stage;
+
+ ksft_print_msg("[STAGE 2] Starting post-kexec verification...\n");
+
+ restore_and_read_stage(state_session_fd, STATE_MEMFD_TOKEN, &stage);
+ if (stage != 2)
+ fail_exit("Expected stage 2, but state file contains %d", stage);
+
+ ksft_print_msg("[STAGE 2] Retrieving session '%s'...\n", TEST_SESSION_NAME);
+ session_fd = luo_retrieve_session(luo_fd, TEST_SESSION_NAME);
+ if (session_fd < 0)
+ fail_exit("luo_retrieve_session for '%s'", TEST_SESSION_NAME);
+
+ ksft_print_msg("[STAGE 2] Restoring and verifying memfd (token %#x)...\n",
+ TEST_MEMFD_TOKEN);
+ mfd = restore_and_verify_memfd(session_fd, TEST_MEMFD_TOKEN,
+ TEST_MEMFD_DATA);
+ if (mfd < 0)
+ fail_exit("restore_and_verify_memfd for token %#x", TEST_MEMFD_TOKEN);
+ close(mfd);
+
+ ksft_print_msg("[STAGE 2] Test data verified successfully.\n");
+ ksft_print_msg("[STAGE 2] Finalizing test session...\n");
+ if (luo_session_finish(session_fd) < 0)
+ fail_exit("luo_session_finish for test session");
+ close(session_fd);
+
+ ksft_print_msg("[STAGE 2] Finalizing state session...\n");
+ if (luo_session_finish(state_session_fd) < 0)
+ fail_exit("luo_session_finish for state session");
+ close(state_session_fd);
+
+ ksft_print_msg("\n--- SIMPLE KEXEC TEST PASSED ---\n");
+}
+
+int main(int argc, char *argv[])
+{
+ int luo_fd;
+ int state_session_fd;
+
+ luo_fd = luo_open_device();
+ if (luo_fd < 0)
+ ksft_exit_skip("Failed to open %s. Is the luo module loaded?\n",
+ LUO_DEVICE);
+
+ /*
+ * Determine the stage by attempting to retrieve the state session.
+ * If it doesn't exist (ENOENT), we are in Stage 1 (pre-kexec).
+ */
+ state_session_fd = luo_retrieve_session(luo_fd, STATE_SESSION_NAME);
+ if (state_session_fd == -ENOENT) {
+ run_stage_1(luo_fd);
+ } else if (state_session_fd >= 0) {
+ /* We got a valid handle, pass it directly to stage 2 */
+ run_stage_2(luo_fd, state_session_fd);
+ } else {
+ fail_exit("Failed to check for state session");
+ }
+
+ close(luo_fd);
+}
diff --git a/tools/testing/selftests/liveupdate/luo_test_utils.c b/tools/testing/selftests/liveupdate/luo_test_utils.c
new file mode 100644
index 000000000000..0a24105cbc54
--- /dev/null
+++ b/tools/testing/selftests/liveupdate/luo_test_utils.c
@@ -0,0 +1,168 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+
+#define _GNU_SOURCE
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/ioctl.h>
+#include <sys/syscall.h>
+#include <sys/mman.h>
+#include <errno.h>
+#include <stdarg.h>
+
+#include "luo_test_utils.h"
+
+int luo_open_device(void)
+{
+ return open(LUO_DEVICE, O_RDWR);
+}
+
+int luo_create_session(int luo_fd, const char *name)
+{
+ struct liveupdate_ioctl_create_session arg = { .size = sizeof(arg) };
+
+ snprintf((char *)arg.name, LIVEUPDATE_SESSION_NAME_LENGTH, "%.*s",
+ LIVEUPDATE_SESSION_NAME_LENGTH - 1, name);
+
+ if (ioctl(luo_fd, LIVEUPDATE_IOCTL_CREATE_SESSION, &arg) < 0)
+ return -errno;
+
+ return arg.fd;
+}
+
+int luo_retrieve_session(int luo_fd, const char *name)
+{
+ struct liveupdate_ioctl_retrieve_session arg = { .size = sizeof(arg) };
+
+ snprintf((char *)arg.name, LIVEUPDATE_SESSION_NAME_LENGTH, "%.*s",
+ LIVEUPDATE_SESSION_NAME_LENGTH - 1, name);
+
+ if (ioctl(luo_fd, LIVEUPDATE_IOCTL_RETRIEVE_SESSION, &arg) < 0)
+ return -errno;
+
+ return arg.fd;
+}
+
+int create_and_preserve_memfd(int session_fd, int token, const char *data)
+{
+ struct liveupdate_session_preserve_fd arg = { .size = sizeof(arg) };
+ long page_size = sysconf(_SC_PAGE_SIZE);
+ void *map = MAP_FAILED;
+ int mfd = -1, ret = -1;
+
+ mfd = memfd_create("test_mfd", 0);
+ if (mfd < 0)
+ return -errno;
+
+ if (ftruncate(mfd, page_size) != 0)
+ goto out;
+
+ map = mmap(NULL, page_size, PROT_WRITE, MAP_SHARED, mfd, 0);
+ if (map == MAP_FAILED)
+ goto out;
+
+ snprintf(map, page_size, "%s", data);
+ munmap(map, page_size);
+
+ arg.fd = mfd;
+ arg.token = token;
+ if (ioctl(session_fd, LIVEUPDATE_SESSION_PRESERVE_FD, &arg) < 0)
+ goto out;
+
+ ret = 0;
+out:
+ if (ret != 0 && errno != 0)
+ ret = -errno;
+ if (mfd >= 0)
+ close(mfd);
+ return ret;
+}
+
+int restore_and_verify_memfd(int session_fd, int token,
+ const char *expected_data)
+{
+ struct liveupdate_session_retrieve_fd arg = { .size = sizeof(arg) };
+ long page_size = sysconf(_SC_PAGE_SIZE);
+ void *map = MAP_FAILED;
+ int mfd = -1, ret = -1;
+
+ arg.token = token;
+ if (ioctl(session_fd, LIVEUPDATE_SESSION_RETRIEVE_FD, &arg) < 0)
+ return -errno;
+ mfd = arg.fd;
+
+ map = mmap(NULL, page_size, PROT_READ, MAP_SHARED, mfd, 0);
+ if (map == MAP_FAILED)
+ goto out;
+
+ if (expected_data && strcmp(expected_data, map) != 0) {
+ ksft_print_msg("Data mismatch! Expected '%s', Got '%s'\n",
+ expected_data, (char *)map);
+ ret = -EINVAL;
+ goto out_munmap;
+ }
+
+ ret = mfd;
+out_munmap:
+ munmap(map, page_size);
+out:
+ if (ret < 0 && errno != 0)
+ ret = -errno;
+ if (ret < 0 && mfd >= 0)
+ close(mfd);
+ return ret;
+}
+
+int luo_session_finish(int session_fd)
+{
+ struct liveupdate_session_finish arg = { .size = sizeof(arg) };
+
+ if (ioctl(session_fd, LIVEUPDATE_SESSION_FINISH, &arg) < 0)
+ return -errno;
+
+ return 0;
+}
+
+void create_state_file(int luo_fd, const char *session_name, int token,
+ int next_stage)
+{
+ char buf[32];
+ int state_session_fd;
+
+ state_session_fd = luo_create_session(luo_fd, session_name);
+ if (state_session_fd < 0)
+ fail_exit("luo_create_session for state tracking");
+
+ snprintf(buf, sizeof(buf), "%d", next_stage);
+ if (create_and_preserve_memfd(state_session_fd, token, buf) < 0)
+ fail_exit("create_and_preserve_memfd for state tracking");
+
+ /*
+ * Do NOT close the session FD here; closing it unpreserves the session.
+ */
+}
+
+void restore_and_read_stage(int state_session_fd, int token, int *stage)
+{
+ char buf[32] = {0};
+ int mfd;
+
+ mfd = restore_and_verify_memfd(state_session_fd, token, NULL);
+ if (mfd < 0)
+ fail_exit("failed to restore state memfd");
+
+ if (read(mfd, buf, sizeof(buf) - 1) < 0)
+ fail_exit("failed to read state mfd");
+
+ *stage = atoi(buf);
+
+ close(mfd);
+}
diff --git a/tools/testing/selftests/liveupdate/luo_test_utils.h b/tools/testing/selftests/liveupdate/luo_test_utils.h
new file mode 100644
index 000000000000..093e787b9f4b
--- /dev/null
+++ b/tools/testing/selftests/liveupdate/luo_test_utils.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ *
+ * Utility functions for LUO kselftests.
+ */
+
+#ifndef LUO_TEST_UTILS_H
+#define LUO_TEST_UTILS_H
+
+#include <errno.h>
+#include <string.h>
+#include <linux/liveupdate.h>
+#include "../kselftest.h"
+
+#define LUO_DEVICE "/dev/liveupdate"
+
+#define fail_exit(fmt, ...) \
+ ksft_exit_fail_msg("[%s:%d] " fmt " (errno: %s)\n", \
+ __func__, __LINE__, ##__VA_ARGS__, strerror(errno))
+
+/* Generic LUO and session management helpers */
+int luo_open_device(void);
+int luo_create_session(int luo_fd, const char *name);
+int luo_retrieve_session(int luo_fd, const char *name);
+int luo_session_finish(int session_fd);
+
+/* Generic file preservation and restoration helpers */
+int create_and_preserve_memfd(int session_fd, int token, const char *data);
+int restore_and_verify_memfd(int session_fd, int token, const char *expected_data);
+
+/* Kexec state-tracking helpers */
+void create_state_file(int luo_fd, const char *session_name, int token,
+ int next_stage);
+void restore_and_read_stage(int state_session_fd, int token, int *stage);
+
+#endif /* LUO_TEST_UTILS_H */
--
2.52.0.rc1.455.g30608eb744-goog
* [PATCH v6 19/20] selftests/liveupdate: Add kexec test for multiple and empty sessions
2025-11-15 23:33 [PATCH v6 00/20] Live Update Orchestrator Pasha Tatashin
` (17 preceding siblings ...)
2025-11-15 23:34 ` [PATCH v6 18/20] selftests/liveupdate: Add kexec-based selftest for session lifecycle Pasha Tatashin
@ 2025-11-15 23:34 ` Pasha Tatashin
2025-11-15 23:34 ` [PATCH v6 20/20] tests/liveupdate: Add in-kernel liveupdate test Pasha Tatashin
19 siblings, 0 replies; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-15 23:34 UTC (permalink / raw)
Introduce a new kexec-based selftest, luo_multi_session, to
validate the end-to-end lifecycle of a more complex LUO scenario.
While the existing luo_kexec_simple test covers the basic end-to-end
lifecycle, it is limited to a single session with one preserved file.
This new test significantly expands coverage by verifying LUO's ability
to handle a mixed workload involving multiple sessions, some of which
are intentionally empty. This ensures that the LUO core correctly
preserves and restores the state of all session types across a reboot.
The test validates the following sequence:
Stage 1 (Pre-kexec):
- Creates two empty test sessions (multi-test-empty-1,
multi-test-empty-2).
- Creates a session with one preserved memfd (multi-test-files-1).
- Creates another session with two preserved memfds
(multi-test-files-2), each containing unique data.
- Creates a state-tracking session to manage the transition to
Stage 2.
- Executes a kexec reboot via the helper script.
Stage 2 (Post-kexec):
- Retrieves the state-tracking session to confirm it is in the
post-reboot stage.
- Retrieves all four test sessions (both the empty and non-empty
ones).
- For the non-empty sessions, restores the preserved memfds and
verifies their contents match the original data patterns.
- Finalizes all test sessions and the state session to ensure a clean
teardown and that all associated kernel resources are correctly
released.
This test provides greater confidence in the robustness of the LUO
framework by validating its behavior in a more realistic, multi-faceted
scenario.
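For orientation, the session layout exercised by this test can be written
down as a table. The sketch below is not part of the patch: the struct,
array, and function names are hypothetical, while the session names and
tokens are taken from luo_multi_session.c.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_FILES 2

/* Hypothetical description of one session and the tokens preserved in it. */
struct demo_session_plan {
	const char *name;
	size_t nr_files;
	unsigned int tokens[DEMO_MAX_FILES];
};

static const struct demo_session_plan demo_plan[] = {
	{ "multi-test-empty-1", 0, { 0 } },
	{ "multi-test-empty-2", 0, { 0 } },
	{ "multi-test-files-1", 1, { 0x1001 } },
	{ "multi-test-files-2", 2, { 0x2002, 0x3003 } },
};

#define DEMO_PLAN_LEN (sizeof(demo_plan) / sizeof(demo_plan[0]))

/* Total number of memfds preserved across all sessions. */
static size_t demo_plan_total_files(void)
{
	size_t i, total = 0;

	for (i = 0; i < DEMO_PLAN_LEN; i++) {
		assert(demo_plan[i].nr_files <= DEMO_MAX_FILES);
		total += demo_plan[i].nr_files;
	}
	return total;
}

/* Number of sessions that carry no preserved files. */
static size_t demo_plan_empty_sessions(void)
{
	size_t i, empty = 0;

	for (i = 0; i < DEMO_PLAN_LEN; i++)
		if (demo_plan[i].nr_files == 0)
			empty++;
	return empty;
}
```

The state-tracking session (kexec_multi_state) is deliberately left out
of the table, since it belongs to the test harness rather than to the
workload under test.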
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
tools/testing/selftests/liveupdate/.gitignore | 1 +
tools/testing/selftests/liveupdate/Makefile | 1 +
.../selftests/liveupdate/luo_multi_session.c | 190 ++++++++++++++++++
3 files changed, 192 insertions(+)
create mode 100644 tools/testing/selftests/liveupdate/luo_multi_session.c
diff --git a/tools/testing/selftests/liveupdate/.gitignore b/tools/testing/selftests/liveupdate/.gitignore
index daeef116174d..42a15a8d5d9e 100644
--- a/tools/testing/selftests/liveupdate/.gitignore
+++ b/tools/testing/selftests/liveupdate/.gitignore
@@ -1,2 +1,3 @@
/liveupdate
/luo_kexec_simple
+/luo_multi_session
diff --git a/tools/testing/selftests/liveupdate/Makefile b/tools/testing/selftests/liveupdate/Makefile
index 1563ac84006a..6ee6efeec62d 100644
--- a/tools/testing/selftests/liveupdate/Makefile
+++ b/tools/testing/selftests/liveupdate/Makefile
@@ -11,6 +11,7 @@ LUO_SHARED_SRCS := luo_test_utils.c
LUO_SHARED_HDRS += luo_test_utils.h
LUO_MANUAL_TESTS += luo_kexec_simple
+LUO_MANUAL_TESTS += luo_multi_session
TEST_FILES += do_kexec.sh
diff --git a/tools/testing/selftests/liveupdate/luo_multi_session.c b/tools/testing/selftests/liveupdate/luo_multi_session.c
new file mode 100644
index 000000000000..c9955f1b6e97
--- /dev/null
+++ b/tools/testing/selftests/liveupdate/luo_multi_session.c
@@ -0,0 +1,190 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ *
+ * A selftest to validate the end-to-end lifecycle of multiple LUO sessions
+ * across a kexec reboot, including empty sessions and sessions with multiple
+ * files.
+ */
+
+#include "luo_test_utils.h"
+
+#define KEXEC_SCRIPT "./do_kexec.sh"
+
+#define SESSION_EMPTY_1 "multi-test-empty-1"
+#define SESSION_EMPTY_2 "multi-test-empty-2"
+#define SESSION_FILES_1 "multi-test-files-1"
+#define SESSION_FILES_2 "multi-test-files-2"
+
+#define MFD1_TOKEN 0x1001
+#define MFD2_TOKEN 0x2002
+#define MFD3_TOKEN 0x3003
+
+#define MFD1_DATA "Data for session files 1"
+#define MFD2_DATA "First file for session files 2"
+#define MFD3_DATA "Second file for session files 2"
+
+#define STATE_SESSION_NAME "kexec_multi_state"
+#define STATE_MEMFD_TOKEN 998
+
+/* Stage 1: Executed before the kexec reboot. */
+static void run_stage_1(int luo_fd)
+{
+ int s_empty1_fd, s_empty2_fd, s_files1_fd, s_files2_fd;
+
+ ksft_print_msg("[STAGE 1] Starting pre-kexec setup for multi-session test...\n");
+
+ ksft_print_msg("[STAGE 1] Creating state file for next stage (2)...\n");
+ create_state_file(luo_fd, STATE_SESSION_NAME, STATE_MEMFD_TOKEN, 2);
+
+ ksft_print_msg("[STAGE 1] Creating empty sessions '%s' and '%s'...\n",
+ SESSION_EMPTY_1, SESSION_EMPTY_2);
+ s_empty1_fd = luo_create_session(luo_fd, SESSION_EMPTY_1);
+ if (s_empty1_fd < 0)
+ fail_exit("luo_create_session for '%s'", SESSION_EMPTY_1);
+
+ s_empty2_fd = luo_create_session(luo_fd, SESSION_EMPTY_2);
+ if (s_empty2_fd < 0)
+ fail_exit("luo_create_session for '%s'", SESSION_EMPTY_2);
+
+ ksft_print_msg("[STAGE 1] Creating session '%s' with one memfd...\n",
+ SESSION_FILES_1);
+
+ s_files1_fd = luo_create_session(luo_fd, SESSION_FILES_1);
+ if (s_files1_fd < 0)
+ fail_exit("luo_create_session for '%s'", SESSION_FILES_1);
+ if (create_and_preserve_memfd(s_files1_fd, MFD1_TOKEN, MFD1_DATA) < 0) {
+ fail_exit("create_and_preserve_memfd for token %#x",
+ MFD1_TOKEN);
+ }
+
+ ksft_print_msg("[STAGE 1] Creating session '%s' with two memfds...\n",
+ SESSION_FILES_2);
+
+ s_files2_fd = luo_create_session(luo_fd, SESSION_FILES_2);
+ if (s_files2_fd < 0)
+ fail_exit("luo_create_session for '%s'", SESSION_FILES_2);
+ if (create_and_preserve_memfd(s_files2_fd, MFD2_TOKEN, MFD2_DATA) < 0) {
+ fail_exit("create_and_preserve_memfd for token %#x",
+ MFD2_TOKEN);
+ }
+ if (create_and_preserve_memfd(s_files2_fd, MFD3_TOKEN, MFD3_DATA) < 0) {
+ fail_exit("create_and_preserve_memfd for token %#x",
+ MFD3_TOKEN);
+ }
+
+ ksft_print_msg("[STAGE 1] Executing kexec...\n");
+
+ if (system(KEXEC_SCRIPT) != 0)
+ fail_exit("kexec script failed");
+
+ exit(EXIT_FAILURE);
+}
+
+/* Stage 2: Executed after the kexec reboot. */
+static void run_stage_2(int luo_fd, int state_session_fd)
+{
+ int s_empty1_fd, s_empty2_fd, s_files1_fd, s_files2_fd;
+ int mfd1, mfd2, mfd3, stage;
+
+ ksft_print_msg("[STAGE 2] Starting post-kexec verification...\n");
+
+ restore_and_read_stage(state_session_fd, STATE_MEMFD_TOKEN, &stage);
+ if (stage != 2) {
+ fail_exit("Expected stage 2, but state file contains %d",
+ stage);
+ }
+
+ ksft_print_msg("[STAGE 2] Retrieving all sessions...\n");
+ s_empty1_fd = luo_retrieve_session(luo_fd, SESSION_EMPTY_1);
+ if (s_empty1_fd < 0)
+ fail_exit("luo_retrieve_session for '%s'", SESSION_EMPTY_1);
+
+ s_empty2_fd = luo_retrieve_session(luo_fd, SESSION_EMPTY_2);
+ if (s_empty2_fd < 0)
+ fail_exit("luo_retrieve_session for '%s'", SESSION_EMPTY_2);
+
+ s_files1_fd = luo_retrieve_session(luo_fd, SESSION_FILES_1);
+ if (s_files1_fd < 0)
+ fail_exit("luo_retrieve_session for '%s'", SESSION_FILES_1);
+
+ s_files2_fd = luo_retrieve_session(luo_fd, SESSION_FILES_2);
+ if (s_files2_fd < 0)
+ fail_exit("luo_retrieve_session for '%s'", SESSION_FILES_2);
+
+ ksft_print_msg("[STAGE 2] Verifying contents of session '%s'...\n",
+ SESSION_FILES_1);
+ mfd1 = restore_and_verify_memfd(s_files1_fd, MFD1_TOKEN, MFD1_DATA);
+ if (mfd1 < 0)
+ fail_exit("restore_and_verify_memfd for token %#x", MFD1_TOKEN);
+ close(mfd1);
+
+ ksft_print_msg("[STAGE 2] Verifying contents of session '%s'...\n",
+ SESSION_FILES_2);
+
+ mfd2 = restore_and_verify_memfd(s_files2_fd, MFD2_TOKEN, MFD2_DATA);
+ if (mfd2 < 0)
+ fail_exit("restore_and_verify_memfd for token %#x", MFD2_TOKEN);
+ close(mfd2);
+
+ mfd3 = restore_and_verify_memfd(s_files2_fd, MFD3_TOKEN, MFD3_DATA);
+ if (mfd3 < 0)
+ fail_exit("restore_and_verify_memfd for token %#x", MFD3_TOKEN);
+ close(mfd3);
+
+ ksft_print_msg("[STAGE 2] Test data verified successfully.\n");
+
+ ksft_print_msg("[STAGE 2] Finalizing all test sessions...\n");
+ if (luo_session_finish(s_empty1_fd) < 0)
+ fail_exit("luo_session_finish for '%s'", SESSION_EMPTY_1);
+ close(s_empty1_fd);
+
+ if (luo_session_finish(s_empty2_fd) < 0)
+ fail_exit("luo_session_finish for '%s'", SESSION_EMPTY_2);
+ close(s_empty2_fd);
+
+ if (luo_session_finish(s_files1_fd) < 0)
+ fail_exit("luo_session_finish for '%s'", SESSION_FILES_1);
+ close(s_files1_fd);
+
+ if (luo_session_finish(s_files2_fd) < 0)
+ fail_exit("luo_session_finish for '%s'", SESSION_FILES_2);
+ close(s_files2_fd);
+
+ ksft_print_msg("[STAGE 2] Finalizing state session...\n");
+ if (luo_session_finish(state_session_fd) < 0)
+ fail_exit("luo_session_finish for state session");
+ close(state_session_fd);
+
+ ksft_print_msg("\n--- MULTI-SESSION KEXEC TEST PASSED ---\n");
+}
+
+int main(int argc, char *argv[])
+{
+ int luo_fd;
+ int state_session_fd;
+
+ luo_fd = luo_open_device();
+ if (luo_fd < 0)
+ ksft_exit_skip("Failed to open %s. Is the luo module loaded?\n",
+ LUO_DEVICE);
+
+ /*
+ * Determine the stage by attempting to retrieve the state session.
+ * If it doesn't exist (ENOENT), we are in Stage 1 (pre-kexec).
+ */
+ state_session_fd = luo_retrieve_session(luo_fd, STATE_SESSION_NAME);
+ if (state_session_fd == -ENOENT) {
+ run_stage_1(luo_fd);
+ } else if (state_session_fd >= 0) {
+ /* We got a valid handle, pass it directly to stage 2 */
+ run_stage_2(luo_fd, state_session_fd);
+ } else {
+ fail_exit("Failed to check for state session");
+ }
+
+ close(luo_fd);
+ return 0;
+}
--
2.52.0.rc1.455.g30608eb744-goog
* [PATCH v6 20/20] tests/liveupdate: Add in-kernel liveupdate test
2025-11-15 23:33 [PATCH v6 00/20] Live Update Orchestrator Pasha Tatashin
` (18 preceding siblings ...)
2025-11-15 23:34 ` [PATCH v6 19/20] selftests/liveupdate: Add kexec test for multiple and empty sessions Pasha Tatashin
@ 2025-11-15 23:34 ` Pasha Tatashin
2025-11-17 11:13 ` Mike Rapoport
19 siblings, 1 reply; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-15 23:34 UTC (permalink / raw)
Introduce an in-kernel test module to validate the core logic of the
Live Update Orchestrator's File-Lifecycle-Bound (FLB) feature. This
provides a low-level, controlled environment to test FLB registration
and callback invocation without requiring userspace interaction or
actual kexec reboots.
The test is enabled by the CONFIG_LIVEUPDATE_TEST Kconfig option.
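The hook is wired in with the usual config-gated stub pattern: a real
function when the option is set, an empty static inline otherwise, so the
call site needs no #ifdef. The following userspace sketch illustrates the
pattern only; DEMO_LIVEUPDATE_TEST, struct demo_file_handler, and the
demo_* functions are hypothetical stand-ins, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a file handler; not the kernel structure. */
struct demo_file_handler {
	const char *compatible;
	int test_hooks;	/* number of times the test hook ran */
};

#ifdef DEMO_LIVEUPDATE_TEST
/* Test build: the hook does real work. */
static void demo_test_register(struct demo_file_handler *h)
{
	h->test_hooks++;
}
#else
/* Normal build: the hook compiles away to nothing. */
static inline void demo_test_register(struct demo_file_handler *h)
{
	(void)h;
}
#endif

/* The call site is identical in both configurations. */
static int demo_register_file_handler(struct demo_file_handler *h)
{
	assert(h != NULL);
	demo_test_register(h);
	return 0;
}
```

Built without -DDEMO_LIVEUPDATE_TEST, registration succeeds and the hook
counter stays at zero; with the define, the counter is incremented once
per registration.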
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
include/linux/liveupdate/abi/luo.h | 5 +
kernel/liveupdate/luo_file.c | 2 +
kernel/liveupdate/luo_internal.h | 6 ++
lib/Kconfig.debug | 23 +++++
lib/tests/Makefile | 1 +
lib/tests/liveupdate.c | 143 +++++++++++++++++++++++++++++
6 files changed, 180 insertions(+)
create mode 100644 lib/tests/liveupdate.c
diff --git a/include/linux/liveupdate/abi/luo.h b/include/linux/liveupdate/abi/luo.h
index 85596ce68c16..cdcace9b48f5 100644
--- a/include/linux/liveupdate/abi/luo.h
+++ b/include/linux/liveupdate/abi/luo.h
@@ -230,4 +230,9 @@ struct luo_flb_ser {
u64 count;
} __packed;
+/* Kernel Live Update Test ABI */
+#ifdef CONFIG_LIVEUPDATE_TEST
+#define LIVEUPDATE_TEST_FLB_COMPATIBLE(i) "liveupdate-test-flb-v" #i
+#endif
+
#endif /* _LINUX_LIVEUPDATE_ABI_LUO_H */
diff --git a/kernel/liveupdate/luo_file.c b/kernel/liveupdate/luo_file.c
index df337c9c4f21..9a531096bdb5 100644
--- a/kernel/liveupdate/luo_file.c
+++ b/kernel/liveupdate/luo_file.c
@@ -834,6 +834,8 @@ int liveupdate_register_file_handler(struct liveupdate_file_handler *fh)
INIT_LIST_HEAD(&fh->flb_list);
list_add_tail(&fh->list, &luo_file_handler_list);
+ liveupdate_test_register(fh);
+
return 0;
}
diff --git a/kernel/liveupdate/luo_internal.h b/kernel/liveupdate/luo_internal.h
index 389fb102775f..c863cb051d49 100644
--- a/kernel/liveupdate/luo_internal.h
+++ b/kernel/liveupdate/luo_internal.h
@@ -86,4 +86,10 @@ int __init luo_flb_setup_outgoing(void *fdt);
int __init luo_flb_setup_incoming(void *fdt);
void luo_flb_serialize(void);
+#ifdef CONFIG_LIVEUPDATE_TEST
+void liveupdate_test_register(struct liveupdate_file_handler *h);
+#else
+static inline void liveupdate_test_register(struct liveupdate_file_handler *h) { }
+#endif
+
#endif /* _LINUX_LUO_INTERNAL_H */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 9a087826498a..eaa2af2bd963 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2803,6 +2803,29 @@ config LINEAR_RANGES_TEST
If unsure, say N.
+config LIVEUPDATE_TEST
+ bool "Live Update Kernel Test"
+ default n
+ depends on LIVEUPDATE
+ help
+ Enable a built-in kernel test module for the Live Update
+ Orchestrator.
+
+ This module validates the File-Lifecycle-Bound subsystem by
+ registering a set of mock FLB objects with any real file handlers
+ that support live update (such as the memfd handler).
+
+ When live update operations are performed, this test module will
+ output messages to the kernel log (dmesg), confirming that its
+ registration and various callback functions (preserve, retrieve,
+ finish, etc.) are being invoked correctly.
+
+ This is a debugging and regression testing tool for developers
+ working on the Live Update subsystem. It should not be enabled in
+ production kernels.
+
+ If unsure, say N.
+
config CMDLINE_KUNIT_TEST
tristate "KUnit test for cmdline API" if !KUNIT_ALL_TESTS
depends on KUNIT
diff --git a/lib/tests/Makefile b/lib/tests/Makefile
index f7460831cfdd..8e5c527a94ac 100644
--- a/lib/tests/Makefile
+++ b/lib/tests/Makefile
@@ -27,6 +27,7 @@ obj-$(CONFIG_LIST_KUNIT_TEST) += list-test.o
obj-$(CONFIG_KFIFO_KUNIT_TEST) += kfifo_kunit.o
obj-$(CONFIG_TEST_LIST_SORT) += test_list_sort.o
obj-$(CONFIG_LINEAR_RANGES_TEST) += test_linear_ranges.o
+obj-$(CONFIG_LIVEUPDATE_TEST) += liveupdate.o
CFLAGS_longest_symbol_kunit.o += $(call cc-disable-warning, missing-prototypes)
obj-$(CONFIG_LONGEST_SYM_KUNIT_TEST) += longest_symbol_kunit.o
diff --git a/lib/tests/liveupdate.c b/lib/tests/liveupdate.c
new file mode 100644
index 000000000000..05c05b8c1c22
--- /dev/null
+++ b/lib/tests/liveupdate.c
@@ -0,0 +1,143 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME " test: " fmt
+
+#include <linux/cleanup.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/liveupdate.h>
+#include <linux/module.h>
+#include "../../kernel/liveupdate/luo_internal.h"
+
+static const struct liveupdate_flb_ops test_flb_ops;
+#define DEFINE_TEST_FLB(i) { \
+ .ops = &test_flb_ops, \
+ .compatible = LIVEUPDATE_TEST_FLB_COMPATIBLE(i), \
+}
+
+/* Number of Test FLBs to register with every file handler */
+#define TEST_NFLBS 3
+static struct liveupdate_flb test_flbs[TEST_NFLBS] = {
+ DEFINE_TEST_FLB(0),
+ DEFINE_TEST_FLB(1),
+ DEFINE_TEST_FLB(2),
+};
+
+#define TEST_FLB_MAGIC_BASE 0xFEEDF00DCAFEBEE0ULL
+
+static int test_flb_preserve(struct liveupdate_flb_op_args *argp)
+{
+ ptrdiff_t index = argp->flb - test_flbs;
+
+ pr_info("%s: preserve was triggered\n", argp->flb->compatible);
+ argp->data = TEST_FLB_MAGIC_BASE + index;
+
+ return 0;
+}
+
+static void test_flb_unpreserve(struct liveupdate_flb_op_args *argp)
+{
+ pr_info("%s: unpreserve was triggered\n", argp->flb->compatible);
+}
+
+static int test_flb_retrieve(struct liveupdate_flb_op_args *argp)
+{
+ ptrdiff_t index = argp->flb - test_flbs;
+ u64 expected_data = TEST_FLB_MAGIC_BASE + index;
+
+ if (argp->data == expected_data) {
+ pr_info("%s: found flb data from the previous boot\n",
+ argp->flb->compatible);
+ argp->obj = (void *)argp->data;
+ } else {
+ pr_err("%s: ERROR - incorrect data handle: %llx, expected %llx\n",
+ argp->flb->compatible, argp->data, expected_data);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static void test_flb_finish(struct liveupdate_flb_op_args *argp)
+{
+ ptrdiff_t index = argp->flb - test_flbs;
+ void *expected_obj = (void *)(TEST_FLB_MAGIC_BASE + index);
+
+ if (argp->obj == expected_obj) {
+ pr_info("%s: finish was triggered\n", argp->flb->compatible);
+ } else {
+ pr_err("%s: ERROR - finish called with invalid object\n",
+ argp->flb->compatible);
+ }
+}
+
+static const struct liveupdate_flb_ops test_flb_ops = {
+ .preserve = test_flb_preserve,
+ .unpreserve = test_flb_unpreserve,
+ .retrieve = test_flb_retrieve,
+ .finish = test_flb_finish,
+ .owner = THIS_MODULE,
+};
+
+static void liveupdate_test_init(void)
+{
+ static DEFINE_MUTEX(init_lock);
+ static bool initialized;
+ int i;
+
+ guard(mutex)(&init_lock);
+
+ if (initialized)
+ return;
+
+ for (i = 0; i < TEST_NFLBS; i++) {
+ struct liveupdate_flb *flb = &test_flbs[i];
+ void *obj;
+ int err;
+
+ liveupdate_init_flb(flb);
+
+ err = liveupdate_flb_incoming_locked(flb, &obj);
+ if (!err) {
+ liveupdate_flb_incoming_unlock(flb, obj);
+ } else if (err != -ENODATA && err != -ENOENT) {
+ pr_err("liveupdate_flb_incoming_locked for %s failed: %pe\n",
+ flb->compatible, ERR_PTR(err));
+ }
+ }
+ initialized = true;
+}
+
+void liveupdate_test_register(struct liveupdate_file_handler *h)
+{
+ int err, i;
+
+ liveupdate_test_init();
+
+ for (i = 0; i < TEST_NFLBS; i++) {
+ struct liveupdate_flb *flb = &test_flbs[i];
+
+ err = liveupdate_register_flb(h, flb);
+ if (err)
+ pr_err("Failed to register %s %pe\n",
+ flb->compatible, ERR_PTR(err));
+ }
+
+ err = liveupdate_register_flb(h, &test_flbs[0]);
+ if (err != -EEXIST) {
+ pr_err("Failed: %s should be already registered, but got err: %pe\n",
+ test_flbs[0].compatible, ERR_PTR(err));
+ }
+
+ pr_info("Registered %d FLBs with file handler: [%s]\n",
+ TEST_NFLBS, h->compatible);
+}
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Pasha Tatashin <pasha.tatashin@soleen.com>");
+MODULE_DESCRIPTION("In-kernel test for LUO mechanism");
--
2.52.0.rc1.455.g30608eb744-goog
* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
2025-11-15 23:33 ` [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO Pasha Tatashin
@ 2025-11-16 12:43 ` Mike Rapoport
2025-11-16 14:55 ` Pasha Tatashin
0 siblings, 1 reply; 92+ messages in thread
From: Mike Rapoport @ 2025-11-16 12:43 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Sat, Nov 15, 2025 at 06:33:48PM -0500, Pasha Tatashin wrote:
> Integrate the LUO with the KHO framework to enable passing LUO state
> across a kexec reboot.
>
> When LUO is transitioned to a "prepared" state, it tells KHO to
> finalize, so all memory segments that were added to KHO preservation
> list are getting preserved. After "Prepared" state no new segments
> can be preserved. If LUO is canceled, it also tells KHO to cancel the
> serialization, and therefore, later LUO can go back into the prepared
> state.
>
> This patch introduces the following changes:
> - During the KHO finalization phase allocate FDT blob.
This happens much earlier, doesn't it?
> - Populate this FDT with a LUO compatibility string ("luo-v1").
>
> LUO now depends on `CONFIG_KEXEC_HANDOVER`. The core state transition
> logic (`luo_do_*_calls`) remains unimplemented in this patch.
>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
> include/linux/liveupdate/abi/luo.h | 54 ++++++++++
> kernel/liveupdate/luo_core.c | 153 ++++++++++++++++++++++++++++-
> 2 files changed, 206 insertions(+), 1 deletion(-)
> create mode 100644 include/linux/liveupdate/abi/luo.h
>
> diff --git a/include/linux/liveupdate/abi/luo.h b/include/linux/liveupdate/abi/luo.h
> new file mode 100644
> index 000000000000..9483a294287f
> --- /dev/null
> +++ b/include/linux/liveupdate/abi/luo.h
> @@ -0,0 +1,54 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +/*
> + * Copyright (c) 2025, Google LLC.
> + * Pasha Tatashin <pasha.tatashin@soleen.com>
> + */
> +
> +/**
> + * DOC: Live Update Orchestrator ABI
> + *
> + * This header defines the stable Application Binary Interface used by the
> + * Live Update Orchestrator to pass state from a pre-update kernel to a
> + * post-update kernel. The ABI is built upon the Kexec HandOver framework
> + * and uses a Flattened Device Tree to describe the preserved data.
> + *
> + * This interface is a contract. Any modification to the FDT structure, node
> + * properties, compatible strings, or the layout of the `__packed` serialization
> + * structures defined here constitutes a breaking change. Such changes require
> + * incrementing the version number in the relevant `_COMPATIBLE` string to
> + * prevent a new kernel from misinterpreting data from an old kernel.
I'd add a sentence that stresses that ABI changes are possible as long as
they include changes to the FDT version.
This is indeed implied by the last paragraph, but I think it's worth
spelling it out explicitly.
Another thing that I think this should mention is that compatibility is
only guaranteed for the kernels that use the same ABI version.
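To illustrate the rule, here is a simplified userspace model. `MODEL_LUO_COMPATIBLE` and `luo_abi_matches()` are hypothetical names, and "luo-v2" is a made-up example of a bumped version. In the kernel the check would presumably go through libfdt's fdt_node_check_compatible(), as the session code later in the series does.

```c
#include <stdbool.h>
#include <string.h>

/* The string comes from the quoted header; the macro name is hypothetical. */
#define MODEL_LUO_COMPATIBLE "luo-v1"

/*
 * Preserved state is accepted only when the incoming compatible string is
 * identical to the one this kernel was built with. Any other version,
 * older or newer, is treated as incompatible.
 */
static bool luo_abi_matches(const char *incoming)
{
	return incoming && strcmp(incoming, MODEL_LUO_COMPATIBLE) == 0;
}
```

With this rule there is no notion of "newer is fine": a kernel either speaks exactly the same ABI version or refuses the state.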
> + *
> + * FDT Structure Overview:
> + * The entire LUO state is encapsulated within a single KHO entry named "LUO".
> + * This entry contains an FDT with the following layout:
> + *
> + * .. code-block:: none
> + *
> + * / {
> + * compatible = "luo-v1";
> + * liveupdate-number = <...>;
> + * };
> + *
> + * Main LUO Node (/):
> + *
> + * - compatible: "luo-v1"
> + * Identifies the overall LUO ABI version.
> + * - liveupdate-number: u64
> + * A counter tracking the number of successful live updates performed.
> + */
...
> +static int __init liveupdate_early_init(void)
> +{
> + int err;
> +
> + err = luo_early_startup();
> + if (err) {
> + pr_err("The incoming tree failed to initialize properly [%pe], disabling live update\n",
> + ERR_PTR(err));
How do we report this to userspace?
I think the decision about what to do in this case belongs there. Even if it's
down to choosing between plain kexec and full reboot, it's still a policy
that should be implemented in userspace.
> + luo_global.enabled = false;
> + }
> +
> + return err;
> +}
> +early_initcall(liveupdate_early_init);
--
Sincerely yours,
Mike.
* Re: [PATCH v6 03/20] kexec: call liveupdate_reboot() before kexec
2025-11-15 23:33 ` [PATCH v6 03/20] kexec: call liveupdate_reboot() before kexec Pasha Tatashin
@ 2025-11-16 12:44 ` Mike Rapoport
0 siblings, 0 replies; 92+ messages in thread
From: Mike Rapoport @ 2025-11-16 12:44 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Sat, Nov 15, 2025 at 06:33:49PM -0500, Pasha Tatashin wrote:
> Modify the kernel_kexec() to call liveupdate_reboot().
>
> This ensures that the Live Update Orchestrator is notified just
> before the kernel executes the kexec jump. The liveupdate_reboot()
> function triggers the final freeze event, allowing participating
> FDs to perform last-minute checks or state saving within the blackout
> window.
>
> If liveupdate_reboot() returns an error (indicating a failure during
> LUO finalization), the kexec operation is aborted to prevent proceeding
> with an inconsistent state. An error is returned to user.
>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> ---
> kernel/kexec_core.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index a8890dd03a1d..3122235c225b 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -15,6 +15,7 @@
> #include <linux/kexec.h>
> #include <linux/mutex.h>
> #include <linux/list.h>
> +#include <linux/liveupdate.h>
> #include <linux/highmem.h>
> #include <linux/syscalls.h>
> #include <linux/reboot.h>
> @@ -1145,6 +1146,10 @@ int kernel_kexec(void)
> goto Unlock;
> }
>
> + error = liveupdate_reboot();
> + if (error)
> + goto Unlock;
> +
> #ifdef CONFIG_KEXEC_JUMP
> if (kexec_image->preserve_context) {
> /*
> --
> 2.52.0.rc1.455.g30608eb744-goog
>
--
Sincerely yours,
Mike.
* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
2025-11-16 12:43 ` Mike Rapoport
@ 2025-11-16 14:55 ` Pasha Tatashin
2025-11-16 19:16 ` Mike Rapoport
0 siblings, 1 reply; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-16 14:55 UTC (permalink / raw)
To: Mike Rapoport
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Sun, Nov 16, 2025 at 7:43 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Sat, Nov 15, 2025 at 06:33:48PM -0500, Pasha Tatashin wrote:
> > Integrate the LUO with the KHO framework to enable passing LUO state
> > across a kexec reboot.
> >
> > When LUO is transitioned to a "prepared" state, it tells KHO to
> > finalize, so all memory segments that were added to KHO preservation
> > list are getting preserved. After "Prepared" state no new segments
> > can be preserved. If LUO is canceled, it also tells KHO to cancel the
> > serialization, and therefore, later LUO can go back into the prepared
> > state.
> >
> > This patch introduces the following changes:
> > - During the KHO finalization phase allocate FDT blob.
>
> This happens much earlier, doesn't it?
Yes, it does. This commit log needs to be updated: it still talks about
prepare/cancel, which have been replaced with preserve/unfreeze since
v5.
>
> > - Populate this FDT with a LUO compatibility string ("luo-v1").
> >
> > LUO now depends on `CONFIG_KEXEC_HANDOVER`. The core state transition
> > logic (`luo_do_*_calls`) remains unimplemented in this patch.
> >
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> > ---
> > include/linux/liveupdate/abi/luo.h | 54 ++++++++++
> > kernel/liveupdate/luo_core.c | 153 ++++++++++++++++++++++++++++-
> > 2 files changed, 206 insertions(+), 1 deletion(-)
> > create mode 100644 include/linux/liveupdate/abi/luo.h
> >
> > diff --git a/include/linux/liveupdate/abi/luo.h b/include/linux/liveupdate/abi/luo.h
> > new file mode 100644
> > index 000000000000..9483a294287f
> > --- /dev/null
> > +++ b/include/linux/liveupdate/abi/luo.h
> > @@ -0,0 +1,54 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +/*
> > + * Copyright (c) 2025, Google LLC.
> > + * Pasha Tatashin <pasha.tatashin@soleen.com>
> > + */
> > +
> > +/**
> > + * DOC: Live Update Orchestrator ABI
> > + *
> > + * This header defines the stable Application Binary Interface used by the
> > + * Live Update Orchestrator to pass state from a pre-update kernel to a
> > + * post-update kernel. The ABI is built upon the Kexec HandOver framework
> > + * and uses a Flattened Device Tree to describe the preserved data.
> > + *
> > + * This interface is a contract. Any modification to the FDT structure, node
> > + * properties, compatible strings, or the layout of the `__packed` serialization
> > + * structures defined here constitutes a breaking change. Such changes require
> > + * incrementing the version number in the relevant `_COMPATIBLE` string to
> > + * prevent a new kernel from misinterpreting data from an old kernel.
>
> I'd add a sentence that stresses that ABI changes are possible as long as
> they include changes to the FDT version.
> This is indeed implied by the last paragraph, but I think it's worth
> spelling it out explicitly.
>
> Another thing that I think this should mention is that compatibility is
> only guaranteed for the kernels that use the same ABI version.
Sure, I will add both.
> > + *
> > + * FDT Structure Overview:
> > + * The entire LUO state is encapsulated within a single KHO entry named "LUO".
> > + * This entry contains an FDT with the following layout:
> > + *
> > + * .. code-block:: none
> > + *
> > + * / {
> > + * compatible = "luo-v1";
> > + * liveupdate-number = <...>;
> > + * };
> > + *
> > + * Main LUO Node (/):
> > + *
> > + * - compatible: "luo-v1"
> > + * Identifies the overall LUO ABI version.
> > + * - liveupdate-number: u64
> > + * A counter tracking the number of successful live updates performed.
> > + */
> ...
>
> > +static int __init liveupdate_early_init(void)
> > +{
> > + int err;
> > +
> > + err = luo_early_startup();
> > + if (err) {
> > + pr_err("The incoming tree failed to initialize properly [%pe], disabling live update\n",
> > + ERR_PTR(err));
>
> How do we report this to userspace?
> I think the decision about what to do in this case belongs there. Even if it's
> down to choosing between plain kexec and full reboot, it's still a policy
> that should be implemented in userspace.
I agree that policy belongs in userspace, and that is how we designed
it. In this specific failure case (ABI mismatch or corrupt FDT), the
preserved state is unrecoverable by the kernel. We cannot parse the
incoming data, so we cannot offer it to userspace.
We report this state by not registering the /dev/liveupdate device.
When the userspace agent attempts to initialize, it receives ENOENT.
At that point, the agent exercises its policy:
- Check dmesg for the specific error and report the failure to the
fleet control plane.
- Trigger a fresh (kexec or cold) reboot to reset unreclaimable resources.
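
A minimal sketch of that agent-side decision, assuming the agent only needs to distinguish "device present" from "device absent". The enum and `agent_probe()` are hypothetical and not taken from luod; the device path is passed in by the caller.

```c
#include <errno.h>
#include <fcntl.h>

enum agent_action {
	AGENT_PROCEED,		/* device present: go on to retrieve sessions */
	AGENT_FRESH_REBOOT,	/* device absent: report, then reboot */
	AGENT_REPORT_ERROR,	/* any other open() failure (hypothetical) */
};

/* Probe the live update device and map the result to a policy action. */
static enum agent_action agent_probe(const char *path, int *fd_out)
{
	int fd = open(path, O_RDWR);

	if (fd >= 0) {
		*fd_out = fd;	/* caller keeps the device open */
		return AGENT_PROCEED;
	}

	/* ENOENT means the kernel did not register the device at all. */
	return errno == ENOENT ? AGENT_FRESH_REBOOT : AGENT_REPORT_ERROR;
}
```

The agent would call this with "/dev/liveupdate" early in startup and act on the result before touching any preserved resources.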
Pasha
* Re: [PATCH v6 04/20] liveupdate: luo_session: add sessions support
2025-11-15 23:33 ` [PATCH v6 04/20] liveupdate: luo_session: add sessions support Pasha Tatashin
@ 2025-11-16 17:05 ` Mike Rapoport
2025-11-17 15:09 ` Pasha Tatashin
0 siblings, 1 reply; 92+ messages in thread
From: Mike Rapoport @ 2025-11-16 17:05 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Sat, Nov 15, 2025 at 06:33:50PM -0500, Pasha Tatashin wrote:
> Introduce the concept of "Live Update Sessions" within the LUO framework.
> LUO sessions provide a mechanism to group and manage `struct file *`
> instances (representing file descriptors) that need to be preserved
> across a kexec-based live update.
>
> Each session is identified by a unique name and acts as a container
> for file objects whose state is critical to a userspace workload, such
> as a virtual machine or a high-performance database, aiming to maintain
> their functionality across a kernel transition.
>
> This groundwork establishes the framework for preserving file-backed
> state across kernel updates, with the actual file data preservation
> mechanisms to be implemented in subsequent patches.
>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
> include/linux/liveupdate/abi/luo.h | 83 +++++-
> include/uapi/linux/liveupdate.h | 3 +
> kernel/liveupdate/Makefile | 3 +-
> kernel/liveupdate/luo_core.c | 10 +
> kernel/liveupdate/luo_internal.h | 52 ++++
> kernel/liveupdate/luo_session.c | 421 +++++++++++++++++++++++++++++
> 6 files changed, 570 insertions(+), 2 deletions(-)
> create mode 100644 kernel/liveupdate/luo_internal.h
> create mode 100644 kernel/liveupdate/luo_session.c
...
> +/**
> + * struct luo_session_ser - Represents the serialized metadata for a LUO session.
> + * @name: The unique name of the session, copied from the `luo_session`
> + * structure.
I'd phrase it as
The unique name of the session provided by the userspace at
the time of session creation.
> + * @files: The physical address of a contiguous memory block that holds
> + * the serialized state of files.
Maybe add "in this session" after "files"?
> + * @pgcnt: The number of pages occupied by the `files` memory block.
> + * @count: The total number of files that were part of this session during
> + * serialization. Used for iteration and validation during
> + * restoration.
> + *
> + * This structure is used to package session-specific metadata for transfer
> + * between kernels via Kexec Handover. An array of these structures (one per
> + * session) is created and passed to the new kernel, allowing it to reconstruct
> + * the session context.
> + *
> + * If this structure is modified, LUO_SESSION_COMPATIBLE must be updated.
This comment applies to the luo_session_header_ser description as well.
> + */
> +struct luo_session_ser {
> + char name[LIVEUPDATE_SESSION_NAME_LENGTH];
> + u64 files;
> + u64 pgcnt;
> + u64 count;
> +} __packed;
> +
> #endif /* _LINUX_LIVEUPDATE_ABI_LUO_H */
> diff --git a/include/uapi/linux/liveupdate.h b/include/uapi/linux/liveupdate.h
> index df34c1642c4d..d2ef2f7e0dbd 100644
> --- a/include/uapi/linux/liveupdate.h
> +++ b/include/uapi/linux/liveupdate.h
> @@ -43,4 +43,7 @@
> /* The ioctl type, documented in ioctl-number.rst */
> #define LIVEUPDATE_IOCTL_TYPE 0xBA
>
> +/* The maximum length of session name including null termination */
> +#define LIVEUPDATE_SESSION_NAME_LENGTH 56
You decided not to bump it to 64 in the end? ;-)
> +
> #endif /* _UAPI_LIVEUPDATE_H */
> diff --git a/kernel/liveupdate/Makefile b/kernel/liveupdate/Makefile
> index 413722002b7a..83285e7ad726 100644
> --- a/kernel/liveupdate/Makefile
> +++ b/kernel/liveupdate/Makefile
> @@ -2,7 +2,8 @@
>
> luo-y := \
> luo_core.o \
> - luo_ioctl.o
> + luo_ioctl.o \
> + luo_session.o
>
> obj-$(CONFIG_KEXEC_HANDOVER) += kexec_handover.o
> obj-$(CONFIG_KEXEC_HANDOVER_DEBUG) += kexec_handover_debug.o
...
> +int luo_session_retrieve(const char *name, struct file **filep)
> +{
> + struct luo_session_header *sh = &luo_session_global.incoming;
> + struct luo_session *session = NULL;
> + struct luo_session *it;
> + int err;
> +
> + scoped_guard(rwsem_read, &sh->rwsem) {
> + list_for_each_entry(it, &sh->list, list) {
> + if (!strncmp(it->name, name, sizeof(it->name))) {
> + session = it;
> + break;
> + }
> + }
> + }
> +
> + if (!session)
> + return -ENOENT;
> +
> + scoped_guard(mutex, &session->mutex) {
> + if (session->retrieved)
> + return -EINVAL;
> + }
> +
> + err = luo_session_getfile(session, filep);
> + if (!err) {
> + scoped_guard(mutex, &session->mutex)
> + session->retrieved = true;
Retaking the mutex here seems a bit odd.
Do we really have to lock session->mutex in luo_session_getfile()?
> + }
> +
> + return err;
> +}
...
> +int __init luo_session_setup_incoming(void *fdt_in)
> +{
> + struct luo_session_header_ser *header_ser;
> + int err, header_size, offset;
> + u64 header_ser_pa;
> + const void *ptr;
> +
> + offset = fdt_subnode_offset(fdt_in, 0, LUO_FDT_SESSION_NODE_NAME);
> + if (offset < 0) {
> + pr_err("Unable to get session node: [%s]\n",
> + LUO_FDT_SESSION_NODE_NAME);
> + return -EINVAL;
> + }
> +
> + err = fdt_node_check_compatible(fdt_in, offset,
> + LUO_FDT_SESSION_COMPATIBLE);
> + if (err) {
> + pr_err("Session node incompatible [%s]\n",
> + LUO_FDT_SESSION_COMPATIBLE);
> + return -EINVAL;
> + }
> +
> + header_size = 0;
> + ptr = fdt_getprop(fdt_in, offset, LUO_FDT_SESSION_HEADER, &header_size);
> + if (!ptr || header_size != sizeof(u64)) {
> + pr_err("Unable to get session header '%s' [%d]\n",
> + LUO_FDT_SESSION_HEADER, header_size);
> + return -EINVAL;
> + }
> +
> + header_ser_pa = get_unaligned((u64 *)ptr);
> + header_ser = phys_to_virt(header_ser_pa);
> +
> + luo_session_global.incoming.header_ser = header_ser;
> + luo_session_global.incoming.ser = (void *)(header_ser + 1);
> + INIT_LIST_HEAD(&luo_session_global.incoming.list);
> + init_rwsem(&luo_session_global.incoming.rwsem);
> + luo_session_global.incoming.active = true;
> +
> + return 0;
> +}
> +
> +bool luo_session_is_deserialized(void)
> +{
> + return luo_session_global.deserialized;
> +}
> +
> +int luo_session_deserialize(void)
> +{
> + struct luo_session_header *sh = &luo_session_global.incoming;
> + int err;
> +
> + if (luo_session_is_deserialized())
> + return 0;
> +
> + luo_session_global.deserialized = true;
> + if (!sh->active) {
> + INIT_LIST_HEAD(&sh->list);
> + init_rwsem(&sh->rwsem);
> + return 0;
How can this happen? luo_session_deserialize() is supposed to be called
from ioctl and luo_session_global.incoming should be set up way earlier.
And, why don't we initialize ->list and ->rwsem statically?
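A userspace model of what static initialization could look like. In the kernel this would use LIST_HEAD_INIT() and __RWSEM_INITIALIZER(); here `struct list_head` and LIST_HEAD_INIT() are re-declared locally, the rwsem is left out, and `model_session_header` is a hypothetical trimmed-down stand-in for struct luo_session_header.

```c
#include <stdbool.h>

/* Local re-declaration of the kernel's list head and its static initializer. */
struct list_head {
	struct list_head *next, *prev;
};
#define LIST_HEAD_INIT(name) { &(name), &(name) }

/* Hypothetical stand-in for struct luo_session_header (rwsem omitted). */
struct model_session_header {
	struct list_head list;
	bool active;
};

/* The list is valid and empty before any init code has run. */
static struct model_session_header model_incoming = {
	.list = LIST_HEAD_INIT(model_incoming.list),
};

static bool list_empty(const struct list_head *head)
{
	return head->next == head;
}
```

With the header initialized this way, the !active branch would have nothing left to set up.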
> + }
> +
> + for (int i = 0; i < sh->header_ser->count; i++) {
> + struct luo_session *session;
> +
> + session = luo_session_alloc(sh->ser[i].name);
> + if (IS_ERR(session)) {
> + pr_warn("Failed to allocate session [%s] during deserialization %pe\n",
> + sh->ser[i].name, session);
> + return PTR_ERR(session);
> + }
The allocated sessions still need to be freed if an insert fails ;-)
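Something along these lines, as a userspace model only. All `model_*` names are hypothetical, a singly linked list stands in for the session list, and the `fail_at` parameter simulates an insert failure. On any error the sessions linked in by earlier iterations are freed before returning.

```c
#include <errno.h>
#include <stdlib.h>

struct model_session {
	struct model_session *next;
	int id;
};

static int model_live_sessions;	/* allocations not yet freed */

static struct model_session *model_alloc(int id)
{
	struct model_session *s = calloc(1, sizeof(*s));

	if (s) {
		s->id = id;
		model_live_sessions++;
	}
	return s;
}

static void model_free(struct model_session *s)
{
	free(s);
	model_live_sessions--;
}

/* Free every session that earlier iterations already linked in. */
static void model_unwind(struct model_session **head)
{
	while (*head) {
		struct model_session *s = *head;

		*head = s->next;
		model_free(s);
	}
}

/* Deserialize @count sessions; @fail_at (or -1) simulates a failed insert. */
static int model_deserialize(struct model_session **head, int count,
			     int fail_at)
{
	for (int i = 0; i < count; i++) {
		struct model_session *s = model_alloc(i);

		if (!s) {
			model_unwind(head);
			return -ENOMEM;
		}
		if (i == fail_at) {
			model_free(s);		/* the one that failed */
			model_unwind(head);	/* plus all earlier ones */
			return -EEXIST;
		}
		s->next = *head;
		*head = s;
	}
	return 0;
}
```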
> +
> + err = luo_session_insert(sh, session);
> + if (err) {
> + luo_session_free(session);
> + pr_warn("Failed to insert session [%s] %pe\n",
> + session->name, ERR_PTR(err));
> + return err;
> + }
> +
> + session->count = sh->ser[i].count;
> + session->files = sh->ser[i].files ? phys_to_virt(sh->ser[i].files) : 0;
> + session->pgcnt = sh->ser[i].pgcnt;
> + }
> +
> + kho_restore_free(sh->header_ser);
> + sh->header_ser = NULL;
> + sh->ser = NULL;
> +
> + return 0;
> +}
--
Sincerely yours,
Mike.
* Re: [PATCH v6 05/20] liveupdate: luo_ioctl: add user interface
2025-11-15 23:33 ` [PATCH v6 05/20] liveupdate: luo_ioctl: add user interface Pasha Tatashin
@ 2025-11-16 17:15 ` Mike Rapoport
2025-11-17 14:22 ` Pasha Tatashin
0 siblings, 1 reply; 92+ messages in thread
From: Mike Rapoport @ 2025-11-16 17:15 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Sat, Nov 15, 2025 at 06:33:51PM -0500, Pasha Tatashin wrote:
> Introduce the user-space interface for the Live Update Orchestrator
> via ioctl commands, enabling external control over the live update
> process and management of preserved resources.
>
> The idea is that there is going to be a single userspace agent driving
> the live update, therefore, only a single process can ever hold this
> device opened at a time.
>
> The following ioctl commands are introduced:
>
> LIVEUPDATE_IOCTL_CREATE_SESSION
> Provides a way for userspace to create a named session for grouping file
> descriptors that need to be preserved. It returns a new file descriptor
> representing the session.
>
> LIVEUPDATE_IOCTL_RETRIEVE_SESSION
> Allows the userspace agent in the new kernel to reclaim a preserved
> session by its name, receiving a new file descriptor to manage the
> restored resources.
>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
> include/uapi/linux/liveupdate.h | 66 +++++++++++-
> kernel/liveupdate/luo_internal.h | 21 ++++
> kernel/liveupdate/luo_ioctl.c | 178 +++++++++++++++++++++++++++++++
> 3 files changed, 264 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/liveupdate.h b/include/uapi/linux/liveupdate.h
> index d2ef2f7e0dbd..6e04254ee535 100644
> --- a/include/uapi/linux/liveupdate.h
> +++ b/include/uapi/linux/liveupdate.h
> @@ -44,6 +44,70 @@
> #define LIVEUPDATE_IOCTL_TYPE 0xBA
>
> /* The maximum length of session name including null termination */
> -#define LIVEUPDATE_SESSION_NAME_LENGTH 56
> +#define LIVEUPDATE_SESSION_NAME_LENGTH 64
> +
> +/* The /dev/liveupdate ioctl commands */
> +enum {
> + LIVEUPDATE_CMD_BASE = 0x00,
> + LIVEUPDATE_CMD_CREATE_SESSION = LIVEUPDATE_CMD_BASE,
> + LIVEUPDATE_CMD_RETRIEVE_SESSION = 0x01,
> +};
> +
> +/**
> + * struct liveupdate_ioctl_create_session - ioctl(LIVEUPDATE_IOCTL_CREATE_SESSION)
> + * @size: Input; sizeof(struct liveupdate_ioctl_create_session)
> + * @fd: Output; The new file descriptor for the created session.
> + * @name: Input; A null-terminated string for the session name, max
> + * length %LIVEUPDATE_SESSION_NAME_LENGTH including termination
> + * char.
Nit: s/char/character/
> + *
> + * Creates a new live update session for managing preserved resources.
> + * This ioctl can only be called on the main /dev/liveupdate device.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +struct liveupdate_ioctl_create_session {
> + __u32 size;
> + __s32 fd;
> + __u8 name[LIVEUPDATE_SESSION_NAME_LENGTH];
> +};
> +
> +#define LIVEUPDATE_IOCTL_CREATE_SESSION \
> + _IO(LIVEUPDATE_IOCTL_TYPE, LIVEUPDATE_CMD_CREATE_SESSION)
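For reference, a sketch of how userspace might fill in the argument before calling the ioctl. The struct layout and the name length are copied from the quoted header, spelled with <stdint.h> types; `luo_prepare_create()` is a hypothetical helper, and the ioctl call itself is left out because it needs the real uapi header and device.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Value copied from the quoted uapi header. */
#define LIVEUPDATE_SESSION_NAME_LENGTH 64

/* Same layout as the quoted struct. */
struct liveupdate_ioctl_create_session {
	uint32_t size;
	int32_t fd;
	uint8_t name[LIVEUPDATE_SESSION_NAME_LENGTH];
};

/*
 * Hypothetical helper: prepare the argument for
 * ioctl(dev_fd, LIVEUPDATE_IOCTL_CREATE_SESSION, &args).
 */
static int luo_prepare_create(struct liveupdate_ioctl_create_session *args,
			      const char *name)
{
	size_t len = strlen(name);

	/* The limit includes the terminating NUL. */
	if (len >= LIVEUPDATE_SESSION_NAME_LENGTH)
		return -ENAMETOOLONG;

	memset(args, 0, sizeof(*args));
	args->size = sizeof(*args);	/* tells the kernel which layout we use */
	memcpy(args->name, name, len);	/* NUL already present from memset */
	return 0;
}
```

On success the kernel is expected to write the new session descriptor into `args->fd`.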
> +
> +/**
> + * struct liveupdate_ioctl_retrieve_session - ioctl(LIVEUPDATE_IOCTL_RETRIEVE_SESSION)
> + * @size: Input; sizeof(struct liveupdate_ioctl_retrieve_session)
> + * @fd: Output; The new file descriptor for the retrieved session.
> + * @name: Input; A null-terminated string identifying the session to retrieve.
> + * The name must exactly match the name used when the session was
> + * created in the previous kernel.
> + *
> + * Retrieves a handle (a new file descriptor) for a preserved session by its
> + * name. This is the primary mechanism for a userspace agent to regain control
> + * of its preserved resources after a live update.
> + *
> + * The userspace application provides the null-terminated `name` of a session
> + * it created before the live update. If a preserved session with a matching
> + * name is found, the kernel instantiates it and returns a new file descriptor
> + * in the `fd` field. This new session FD can then be used for all file-specific
> + * operations, such as restoring individual file descriptors with
> + * LIVEUPDATE_SESSION_RETRIEVE_FD.
> + *
> + * It is the responsibility of the userspace application to know the names of
> + * the sessions it needs to retrieve. If no session with the given name is
> + * found, the ioctl will fail with -ENOENT.
> + *
> + * This ioctl can only be called on the main /dev/liveupdate device when the
> + * system is in the LIVEUPDATE_STATE_UPDATED state.
> + */
> +struct liveupdate_ioctl_retrieve_session {
> + __u32 size;
> + __s32 fd;
> + __u8 name[LIVEUPDATE_SESSION_NAME_LENGTH];
> +};
> +
> +#define LIVEUPDATE_IOCTL_RETRIEVE_SESSION \
> + _IO(LIVEUPDATE_IOCTL_TYPE, LIVEUPDATE_CMD_RETRIEVE_SESSION)
>
> #endif /* _UAPI_LIVEUPDATE_H */
> diff --git a/kernel/liveupdate/luo_internal.h b/kernel/liveupdate/luo_internal.h
> index 245373edfa6f..5185ad37a8c1 100644
> --- a/kernel/liveupdate/luo_internal.h
> +++ b/kernel/liveupdate/luo_internal.h
> @@ -9,6 +9,27 @@
> #define _LINUX_LUO_INTERNAL_H
>
> #include <linux/liveupdate.h>
> +#include <linux/uaccess.h>
> +
> +struct luo_ucmd {
> + void __user *ubuffer;
> + u32 user_size;
> + void *cmd;
> +};
> +
> +static inline int luo_ucmd_respond(struct luo_ucmd *ucmd,
> + size_t kernel_cmd_size)
> +{
> + /*
> + * Copy the minimum of what the user provided and what we actually
> + * have.
> + */
> + if (copy_to_user(ucmd->ubuffer, ucmd->cmd,
> + min_t(size_t, ucmd->user_size, kernel_cmd_size))) {
> + return -EFAULT;
> + }
> + return 0;
> +}
>
> /**
> * struct luo_session - Represents an active or incoming Live Update session.
> diff --git a/kernel/liveupdate/luo_ioctl.c b/kernel/liveupdate/luo_ioctl.c
> index 44d365185f7c..367385efa962 100644
> --- a/kernel/liveupdate/luo_ioctl.c
> +++ b/kernel/liveupdate/luo_ioctl.c
> @@ -5,15 +5,192 @@
> * Pasha Tatashin <pasha.tatashin@soleen.com>
> */
>
> +/**
> + * DOC: LUO ioctl Interface
> + *
> + * The IOCTL user-space control interface for the LUO subsystem.
> + * It registers a character device, typically found at ``/dev/liveupdate``,
> + * which allows a userspace agent to manage the LUO state machine and its
> + * associated resources, such as preservable file descriptors.
> + *
> + * To ensure that the state machine is controlled by a single entity, access
> + * to this device is exclusive: only one process is permitted to have
> + * ``/dev/liveupdate`` open at any given time. Subsequent open attempts will
> + * fail with -EBUSY until the first process closes its file descriptor.
> + * This singleton model simplifies state management by preventing conflicting
> + * commands from multiple userspace agents.
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/atomic.h>
> +#include <linux/errno.h>
> +#include <linux/file.h>
> +#include <linux/fs.h>
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> #include <linux/liveupdate.h>
> #include <linux/miscdevice.h>
> +#include <uapi/linux/liveupdate.h>
> +#include "luo_internal.h"
>
> struct luo_device_state {
> struct miscdevice miscdev;
> + atomic_t in_use;
> +};
> +
> +static int luo_ioctl_create_session(struct luo_ucmd *ucmd)
> +{
> + struct liveupdate_ioctl_create_session *argp = ucmd->cmd;
> + struct file *file;
> + int err;
> +
> + argp->fd = get_unused_fd_flags(O_CLOEXEC);
> + if (argp->fd < 0)
> + return argp->fd;
> +
> + err = luo_session_create(argp->name, &file);
> + if (err)
> + goto err_put_fd;
> +
> + err = luo_ucmd_respond(ucmd, sizeof(*argp));
> + if (err)
> + goto err_put_file;
> +
> + fd_install(argp->fd, file);
> +
> + return 0;
> +
> +err_put_file:
> + fput(file);
> +err_put_fd:
> + put_unused_fd(argp->fd);
> +
> + return err;
> +}
> +
> +static int luo_ioctl_retrieve_session(struct luo_ucmd *ucmd)
> +{
> + struct liveupdate_ioctl_retrieve_session *argp = ucmd->cmd;
> + struct file *file;
> + int err;
> +
> + argp->fd = get_unused_fd_flags(O_CLOEXEC);
> + if (argp->fd < 0)
> + return argp->fd;
> +
> + err = luo_session_retrieve(argp->name, &file);
> + if (err < 0)
> + goto err_put_fd;
> +
> + err = luo_ucmd_respond(ucmd, sizeof(*argp));
> + if (err)
> + goto err_put_file;
> +
> + fd_install(argp->fd, file);
> +
> + return 0;
> +
> +err_put_file:
> + fput(file);
> +err_put_fd:
> + put_unused_fd(argp->fd);
> +
> + return err;
> +}
> +
> +static int luo_open(struct inode *inodep, struct file *filep)
> +{
> + struct luo_device_state *ldev = container_of(filep->private_data,
> + struct luo_device_state,
> + miscdev);
> +
> + if (atomic_cmpxchg(&ldev->in_use, 0, 1))
> + return -EBUSY;
> +
> + luo_session_deserialize();
Why is luo_session_deserialize() tied to the first open of the chardev?
> +
> + return 0;
> +}
> +
--
Sincerely yours,
Mike.
* Re: [PATCH v6 06/20] liveupdate: luo_file: implement file systems callbacks
2025-11-15 23:33 ` [PATCH v6 06/20] liveupdate: luo_file: implement file systems callbacks Pasha Tatashin
@ 2025-11-16 18:15 ` Mike Rapoport
2025-11-17 17:50 ` Pasha Tatashin
2025-11-18 17:38 ` David Matlack
1 sibling, 1 reply; 92+ messages in thread
From: Mike Rapoport @ 2025-11-16 18:15 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Sat, Nov 15, 2025 at 06:33:52PM -0500, Pasha Tatashin wrote:
> This patch implements the core mechanism for managing preserved
> files throughout the live update lifecycle. It provides the logic to
> invoke the file handler callbacks (preserve, unpreserve, freeze,
> unfreeze, retrieve, and finish) at the appropriate stages.
>
> During the reboot phase, luo_file_freeze() serializes the final
> metadata for each file (handler compatible string, token, and data
> handle) into a memory region preserved by KHO. In the new kernel,
> luo_file_deserialize() reconstructs the in-memory file list from this
> data, preparing the session for retrieval.
>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
> include/linux/liveupdate.h | 109 ++++
> include/linux/liveupdate/abi/luo.h | 22 +
> kernel/liveupdate/Makefile | 1 +
> kernel/liveupdate/luo_file.c | 887 +++++++++++++++++++++++++++++
> kernel/liveupdate/luo_internal.h | 9 +
> 5 files changed, 1028 insertions(+)
> create mode 100644 kernel/liveupdate/luo_file.c
>
> diff --git a/include/linux/liveupdate.h b/include/linux/liveupdate.h
> index 730b76625fec..4a5d4dd9905a 100644
> --- a/include/linux/liveupdate.h
> +++ b/include/linux/liveupdate.h
> @@ -10,6 +10,88 @@
> #include <linux/bug.h>
> #include <linux/types.h>
> #include <linux/list.h>
> +#include <linux/liveupdate/abi/luo.h>
> +#include <uapi/linux/liveupdate.h>
> +
> +struct liveupdate_file_handler;
> +struct liveupdate_session;
Why is struct liveupdate_session part of the public LUO API?
> +struct file;
> +
> +/**
> + * struct liveupdate_file_op_args - Arguments for file operation callbacks.
> + * @handler: The file handler being called.
> + * @session: The session this file belongs to.
> + * @retrieved: The retrieve status for the 'can_finish / finish'
> + * operation.
> + * @file: The file object. For retrieve: [OUT] The callback sets
> + * this to the new file. For other ops: [IN] The caller sets
> + * this to the file being operated on.
> + * @serialized_data: The opaque u64 handle, preserve/prepare/freeze may update
> + * this field.
> + *
> + * This structure bundles all parameters for the file operation callbacks.
> + * The 'data' and 'file' fields are used for both input and output.
> + */
> +struct liveupdate_file_op_args {
> + struct liveupdate_file_handler *handler;
> + struct liveupdate_session *session;
> + bool retrieved;
> + struct file *file;
> + u64 serialized_data;
> +};
> +
> +/**
> + * struct liveupdate_file_ops - Callbacks for live-updatable files.
> + * @can_preserve: Required. Lightweight check to see if this handler is
> + * compatible with the given file.
> + * @preserve: Required. Performs state-saving for the file.
> + * @unpreserve: Required. Cleans up any resources allocated by @preserve.
> + * @freeze: Optional. Final actions just before kernel transition.
> + * @unfreeze: Optional. Undo freeze operations.
> + * @retrieve: Required. Restores the file in the new kernel.
> + * @can_finish: Optional. Check if this FD can finish, i.e. all restoration
> + * pre-requirements for this FD are satisfied. Called prior to
> + * finish, in order to do successful finish calls for all
> + * resources in the session.
> + * @finish: Required. Final cleanup in the new kernel.
> + * @owner: Module reference
> + *
> + * All operations (except can_preserve) receive a pointer to a
> + * 'struct liveupdate_file_op_args' containing the necessary context.
> + */
> +struct liveupdate_file_ops {
> + bool (*can_preserve)(struct liveupdate_file_handler *handler,
> + struct file *file);
> + int (*preserve)(struct liveupdate_file_op_args *args);
> + void (*unpreserve)(struct liveupdate_file_op_args *args);
> + int (*freeze)(struct liveupdate_file_op_args *args);
> + void (*unfreeze)(struct liveupdate_file_op_args *args);
> + int (*retrieve)(struct liveupdate_file_op_args *args);
> + bool (*can_finish)(struct liveupdate_file_op_args *args);
> + void (*finish)(struct liveupdate_file_op_args *args);
> + struct module *owner;
> +};
> +
> +/**
> + * struct liveupdate_file_handler - Represents a handler for a live-updatable file type.
> + * @ops: Callback functions
> + * @compatible: The compatibility string (e.g., "memfd-v1", "vfiofd-v1")
> + * that uniquely identifies the file type this handler
> + * supports. This is matched against the compatible string
> + * associated with individual &struct file instances.
> + * @list: Used for linking this handler instance into a global
> + * list of registered file handlers.
> + *
> + * Modules that want to support live update for specific file types should
> + * register an instance of this structure. LUO uses this registration to
> + * determine if a given file can be preserved and to find the appropriate
> + * operations to manage its state across the update.
> + */
> +struct liveupdate_file_handler {
> + const struct liveupdate_file_ops *ops;
> + const char compatible[LIVEUPDATE_HNDL_COMPAT_LENGTH];
> + struct list_head list;
Did you consider using __private and ACCESS_PRIVATE() for the ->list
member here and in other structures visible outside kernel/liveupdate?
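For reference, a minimal userspace sketch of what that would look like. The
two macros below are the plain (non-sparse) fallbacks; under sparse, __private
becomes a noderef attribute and direct accesses get flagged. The sketch_*
names are hypothetical and only stand in for the handler and its list member.

```c
/* Non-sparse fallbacks: the annotation vanishes and the accessor is a plain member access. */
#define __private
#define ACCESS_PRIVATE(p, member) ((p)->member)

/* Hypothetical stand-in for struct list_head. */
struct sketch_list {
	struct sketch_list *next, *prev;
};

/* Hypothetical stand-in for struct liveupdate_file_handler. */
struct sketch_handler {
	const char *compatible;
	struct sketch_list __private list;	/* owned by the core, not by the registrant */
};

/* Core-side helper: the only code that is supposed to reach into ->list. */
static void sketch_handler_init(struct sketch_handler *h)
{
	struct sketch_list *l = &ACCESS_PRIVATE(h, list);

	l->next = l;
	l->prev = l;
}

/* Returns non-zero while the handler is not linked into any list. */
static int sketch_handler_is_unlinked(struct sketch_handler *h)
{
	return ACCESS_PRIVATE(h, list).next == &ACCESS_PRIVATE(h, list);
}
```

The point is only that handler authors never spell out h->list themselves;
every access goes through the accessor inside kernel/liveupdate.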
> +};
>
> #ifdef CONFIG_LIVEUPDATE
>
> @@ -19,6 +101,16 @@ bool liveupdate_enabled(void);
> /* Called during kexec to tell LUO that entered into reboot */
> int liveupdate_reboot(void);
>
> +int liveupdate_register_file_handler(struct liveupdate_file_handler *h);
> +
> +/* kernel can internally retrieve files */
> +int liveupdate_get_file_incoming(struct liveupdate_session *s, u64 token,
> + struct file **filep);
> +
> +/* Get a token for an outgoing file, or -ENOENT if file is not preserved */
> +int liveupdate_get_token_outgoing(struct liveupdate_session *s,
> + struct file *file, u64 *tokenp);
> +
> #else /* CONFIG_LIVEUPDATE */
>
> static inline bool liveupdate_enabled(void)
> @@ -31,5 +123,22 @@ static inline int liveupdate_reboot(void)
> return 0;
> }
>
> +static inline int liveupdate_register_file_handler(struct liveupdate_file_handler *h)
> +{
> + return -EOPNOTSUPP;
> +}
> +
> +static inline int liveupdate_get_file_incoming(struct liveupdate_session *s,
> + u64 token, struct file **filep)
> +{
> + return -EOPNOTSUPP;
> +}
> +
> +static inline int liveupdate_get_token_outgoing(struct liveupdate_session *s,
> + struct file *file, u64 *tokenp)
> +{
> + return -EOPNOTSUPP;
> +}
> +
> #endif /* CONFIG_LIVEUPDATE */
> #endif /* _LINUX_LIVEUPDATE_H */
> diff --git a/include/linux/liveupdate/abi/luo.h b/include/linux/liveupdate/abi/luo.h
> index 03a177ae232e..3a596ca1907b 100644
> --- a/include/linux/liveupdate/abi/luo.h
> +++ b/include/linux/liveupdate/abi/luo.h
> @@ -65,6 +65,11 @@
> * Metadata for a single session, including its name and a physical pointer
> * to another preserved memory block containing an array of
> * `struct luo_file_ser` for all files in that session.
> + *
> + * - struct luo_file_ser:
> + * Metadata for a single preserved file. Contains the `compatible` string to
> + * find the correct handler in the new kernel, a user-provided `token` for
> + * identification, and an opaque `data` handle for the handler to use.
> */
>
> #ifndef _LINUX_LIVEUPDATE_ABI_LUO_H
> @@ -132,4 +137,21 @@ struct luo_session_ser {
> u64 count;
> } __packed;
>
> +/* The max size is set so it can be reliably used during in serialization */
I failed to parse this comment.
> +#define LIVEUPDATE_HNDL_COMPAT_LENGTH 48
> +
> +/**
> + * struct luo_file_ser - Represents the serialized preserves files.
> + * @compatible: File handler compatible string.
> + * @data: Private data
> + * @token: User provided token for this file
> + *
> + * If this structure is modified, LUO_SESSION_COMPATIBLE must be updated.
> + */
> +struct luo_file_ser {
> + char compatible[LIVEUPDATE_HNDL_COMPAT_LENGTH];
> + u64 data;
> + u64 token;
> +} __packed;
> +
> #endif /* _LINUX_LIVEUPDATE_ABI_LUO_H */
> diff --git a/kernel/liveupdate/Makefile b/kernel/liveupdate/Makefile
> index 83285e7ad726..c2252a2ad7bd 100644
> --- a/kernel/liveupdate/Makefile
> +++ b/kernel/liveupdate/Makefile
> @@ -2,6 +2,7 @@
>
> luo-y := \
> luo_core.o \
> + luo_file.o \
> luo_ioctl.o \
> luo_session.o
>
> diff --git a/kernel/liveupdate/luo_file.c b/kernel/liveupdate/luo_file.c
> new file mode 100644
> index 000000000000..dae27a69a09f
> --- /dev/null
> +++ b/kernel/liveupdate/luo_file.c
> @@ -0,0 +1,887 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * Copyright (c) 2025, Google LLC.
> + * Pasha Tatashin <pasha.tatashin@soleen.com>
> + */
> +
> +/**
> + * DOC: LUO File Descriptors
> + *
> + * LUO provides the infrastructure to preserve specific, stateful file
> + * descriptors across a kexec-based live update. The primary goal is to allow
> + * workloads, such as virtual machines using vfio, memfd, or iommufd, to
> + * retain access to their essential resources without interruption.
> + *
> + * The framework is built around a callback-based handler model and a well-
> + * defined lifecycle for each preserved file.
> + *
> + * Handler Registration:
> + * Kernel modules responsible for a specific file type (e.g., memfd, vfio)
> + * register a &struct liveupdate_file_handler. This handler provides a set of
> + * callbacks that LUO invokes at different stages of the update process, most
> + * notably:
> + *
> + * - can_preserve(): A lightweight check to determine if the handler is
> + * compatible with a given 'struct file'.
> + * - preserve(): The heavyweight operation that saves the file's state and
> + * returns an opaque u64 handle, happens while vcpus are still running.
^ VCPUs
This narrows the description to a VM-only use case. In general, ->preserve()
may happen after VCPUs are suspended, although that is neither intended nor
desirable. LUO does not control the sequencing, so we can't claim anything
about VCPUs here.
> + * LUO becomes the owner of this file until session is closed or file is
> + * finished.
"file is finished" reads too vague to me.
> + * - unpreserve(): Cleans up any resources allocated by .preserve(), called
> + * if the preservation process is aborted before the reboot (i.e. session is
> + * closed).
> + * - freeze(): A final pre-reboot opportunity to prepare the state for kexec.
> + * We are already in reboot syscall, and therefore userspace cannot mutate
> + * the file anymore.
> + * - unfreeze(): Undoes the actions of .freeze(), called if the live update
> + * is aborted after the freeze phase.
> + * - retrieve(): Reconstructs the file in the new kernel from the preserved
> + * handle.
> + * - finish(): Performs final check and cleanup in the new kernel. After
> + * succesul finish call, LUO gives up ownership to this file.
> + *
> + * File Preservation Lifecycle happy path:
> + *
> + * 1. Preserve (Normal Operation): A userspace agent preserves files one by one
> + * via an ioctl. For each file, luo_preserve_file() finds a compatible
> + * handler, calls its .preserve() op, and creates an internal &struct
^ method or operation
> + * luo_file to track the live state.
> + *
> + * 2. Freeze (Pre-Reboot): Just before the kexec, luo_file_freeze() is called.
> + * It iterates through all preserved files, calls their respective .freeze()
> + * ops, and serializes their final metadata (compatible string, token, and
^ method or operation
> + * data handle) into a contiguous memory block for KHO.
> + *
> + * 3. Deserialize (New Kernel - Early Boot): After kexec, luo_file_deserialize()
From the code it seems that deserialization runs on the first open of
/dev/liveupdate; what am I missing?
> + * runs. It reads the serialized data from the KHO memory region and
> + * reconstructs the in-memory list of &struct luo_file instances for the new
> + * kernel, linking them to their corresponding handlers.
> + *
> + * 4. Retrieve (New Kernel - Userspace Ready): The userspace agent can now
> + * restore file descriptors by providing a token. luo_retrieve_file()
> + * searches for the matching token, calls the handler's .retrieve() op to
> + * re-create the 'struct file', and returns a new FD. Files can be
> + * retrieved in ANY order.
> + *
> + * 5. Finish (New Kernel - Cleanup): Once a session retrival is complete,
> + * luo_file_finish() is called. It iterates through all files,
> + * invokes their .finish() ops for final cleanup, and releases all
^ method
> + * associated kernel resources.
> + *
> + * File Preservation Lifecycle unhappy paths:
> + *
> + * 1. Abort Before Reboot: If the userspace agent aborts the live update
> + * process before calling reboot (e.g., by closing the session file
> + * descriptor), the session's release handler calls
> + * luo_file_unpreserve_files(). This invokes the .unpreserve() callback on
> + * all preserved files, ensuring all allocated resources are cleaned up and
> + * returning the system to a clean state.
> + *
> + * 2. Freeze Failure: During the reboot() syscall, if any handler's .freeze()
> + * op fails, the .unfreeze() op is invoked on all previously *successful*
> + * freezes to roll back their state. The reboot() syscall then returns an
> + * error to userspace, canceling the live update.
> + *
> + * 3. Finish Failure: In the new kernel, if a handler's .finish() op fails,
> + * the luo_file_finish() operation is aborted. LUO retains ownership of
> + * all files within that session, including those that were not yet
> + * processed. The userspace agent can attempt to call the finish operation
> + * again later. If the issue cannot be resolved, these resources will be held
> + * by LUO until the next live update cycle, at which point they will be
> + * discarded.
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/cleanup.h>
> +#include <linux/err.h>
> +#include <linux/errno.h>
> +#include <linux/file.h>
> +#include <linux/fs.h>
> +#include <linux/kexec_handover.h>
> +#include <linux/liveupdate.h>
> +#include <linux/liveupdate/abi/luo.h>
> +#include <linux/module.h>
> +#include <linux/sizes.h>
> +#include <linux/slab.h>
> +#include <linux/string.h>
> +#include "luo_internal.h"
> +
> +static LIST_HEAD(luo_file_handler_list);
> +
> +/* 2 4K pages, give space for 128 files per session */
> +#define LUO_FILE_PGCNT 2ul
> +#define LUO_FILE_MAX \
> + ((LUO_FILE_PGCNT << PAGE_SHIFT) / sizeof(struct luo_file_ser))
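The "128 files" figure in the comment can be checked with a small userspace
sketch. It copies the quoted struct luo_file_ser layout (uint64_t standing in
for u64) and takes the page shift as a parameter; luo_file_max_for() is a
hypothetical helper, not something from the patch. The number only holds when
PAGE_SHIFT is 12.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LIVEUPDATE_HNDL_COMPAT_LENGTH 48
#define LUO_FILE_PGCNT 2ul

/* Userspace copy of the quoted layout. */
struct luo_file_ser {
	char compatible[LIVEUPDATE_HNDL_COMPAT_LENGTH];
	uint64_t data;
	uint64_t token;
} __attribute__((packed));

/* 48 bytes of compatible string plus two 8-byte fields. */
static_assert(sizeof(struct luo_file_ser) == 64, "unexpected entry size");

/* Hypothetical helper: same formula as LUO_FILE_MAX, with the page shift passed in. */
static size_t luo_file_max_for(unsigned int page_shift)
{
	return (LUO_FILE_PGCNT << page_shift) / sizeof(struct luo_file_ser);
}
```

With 4K pages this gives 8192 / 64 = 128 entries. With 64K pages the same two
pages would hold 2048 entries, so the limit is tied to the page size rather
than being a fixed 128.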
> +
> +/**
> + * struct luo_file - Represents a single preserved file instance.
> + * @fh: Pointer to the &struct liveupdate_file_handler that manages
> + * this type of file.
> + * @file: Pointer to the kernel's &struct file that is being preserved.
> + * This is NULL in the new kernel until the file is successfully
> + * retrieved.
> + * @serialized_data: The opaque u64 handle to the serialized state of the file.
> + * This handle is passed back to the handler's .freeze(),
> + * .retrieve(), and .finish() callbacks, allowing it to track
> + * and update its serialized state across phases.
> + * @retrieved: A flag indicating whether a user/kernel in the new kernel has
> + * successfully called retrieve() on this file. This prevents
> + * multiple retrieval attempts.
> + * @mutex: A mutex that protects the fields of this specific instance
> + * (e.g., @retrieved, @file), ensuring that operations like
> + * retrieving or finishing a file are atomic.
> + * @list: The list_head linking this instance into its parent
> + * session's list of preserved files.
> + * @token: The user-provided unique token used to identify this file.
> + *
> + * This structure is the core in-kernel representation of a single file being
> + * managed through a live update. An instance is created by luo_preserve_file()
> + * to link a 'struct file' to its corresponding handler, a user-provided token,
> + * and the serialized state handle returned by the handler's .preserve()
> + * operation.
> + *
> + * These instances are tracked in a per-session list. The @serialized_data
> + * field, which holds a handle to the file's serialized state, may be updated
> + * during the .freeze() callback before being serialized for the next kernel.
> + * After reboot, these structures are recreated by luo_file_deserialize() and
> + * are finally cleaned up by luo_file_finish().
> + */
> +struct luo_file {
> + struct liveupdate_file_handler *fh;
> + struct file *file;
> + u64 serialized_data;
> + bool retrieved;
> + struct mutex mutex;
> + struct list_head list;
> + u64 token;
> +};
> +
> +static int luo_session_alloc_files_mem(struct luo_session *session)
It seems like this belongs to luo_session.c
> +{
> + size_t size;
> + void *mem;
> +
> + if (session->files)
> + return 0;
> +
> + WARN_ON_ONCE(session->count);
> +
> + size = LUO_FILE_PGCNT << PAGE_SHIFT;
> + mem = kho_alloc_preserve(size);
> + if (IS_ERR(mem))
> + return PTR_ERR(mem);
> +
> + session->files = mem;
> + session->pgcnt = LUO_FILE_PGCNT;
> +
> + return 0;
> +}
> +
> +static void luo_session_free_files_mem(struct luo_session *session)
> +{
Ditto
> + /* If session has files, no need to free preservation memory */
> + if (session->count)
> + return;
> +
> + if (!session->files)
> + return;
> +
> + kho_unpreserve_free(session->files);
> + session->files = NULL;
> + session->pgcnt = 0;
> +}
> +
> +static bool luo_token_is_used(struct luo_session *session, u64 token)
> +{
> + struct luo_file *iter;
> +
> + list_for_each_entry(iter, &session->files_list, list) {
And here again I'm not very fond of dereferencing session objects in
luo_file.
> + if (iter->token == token)
> + return true;
> + }
> +
> + return false;
> +}
> +
> +/**
> + * luo_preserve_file - Initiate the preservation of a file descriptor.
> + * @session: The session to which the preserved file will be added.
> + * @token: A unique, user-provided identifier for the file.
> + * @fd: The file descriptor to be preserved.
> + *
> + * This function orchestrates the first phase of preserving a file. Upon entry,
> + * it takes a reference to the 'struct file' via fget(), effectively making LUO
> + * a co-owner of the file. This reference is held until the file is either
> + * unpreserved or successfully finished in the next kernel, preventing the file
> + * from being prematurely destroyed.
> + *
> + * This function orchestrates the first phase of preserving a file. It performs
> + * the following steps:
> + *
> + * 1. Validates that the @token is not already in use within the session.
> + * 2. Ensures the session's memory for files serialization is allocated
> + * (allocates if needed).
> + * 3. Iterates through registered handlers, calling can_preserve() to find one
> + * compatible with the given @fd.
> + * 4. Calls the handler's .preserve() operation, which saves the file's state
> + * and returns an opaque private data handle.
> + * 5. Adds the new instance to the session's internal list.
> + *
> + * On success, LUO takes a reference to the 'struct file' and considers it
> + * under its management until it is unpreserved or finished.
> + *
> + * In case of any failure, all intermediate allocations (file reference, memory
> + * for the 'luo_file' struct, etc.) are cleaned up before returning an error.
> + *
> + * Context: Can be called from an ioctl handler during normal system operation.
> + * Return: 0 on success. Returns a negative errno on failure:
> + * -EEXIST if the token is already used.
> + * -EBADF if the file descriptor is invalid.
> + * -ENOSPC if the session is full.
> + * -ENOENT if no compatible handler is found.
> + * -ENOMEM on memory allocation failure.
> + * Other erros might be returned by .preserve().
> + */
> +int luo_preserve_file(struct luo_session *session, u64 token, int fd)
> +{
> + struct liveupdate_file_op_args args = {0};
> + struct liveupdate_file_handler *fh;
> + struct luo_file *luo_file;
> + struct file *file;
> + int err;
> +
> + lockdep_assert_held(&session->mutex);
> +
> + if (luo_token_is_used(session, token))
> + return -EEXIST;
> +
> + file = fget(fd);
> + if (!file)
> + return -EBADF;
> +
> + err = luo_session_alloc_files_mem(session);
> + if (err)
> + goto exit_err;
> +
> + if (session->count == LUO_FILE_MAX) {
> + err = -ENOSPC;
> + goto exit_err;
> + }
I believe session can be prepared and validated by the caller.
> +
> + err = -ENOENT;
> + list_for_each_entry(fh, &luo_file_handler_list, list) {
> + if (fh->ops->can_preserve(fh, file)) {
> + err = 0;
> + break;
> + }
> + }
> +
> + /* err is still -ENOENT if no handler was found */
> + if (err)
> + goto exit_err;
> +
> + luo_file = kzalloc(sizeof(*luo_file), GFP_KERNEL);
> + if (!luo_file) {
> + err = -ENOMEM;
> + goto exit_err;
> + }
> +
> + luo_file->file = file;
> + luo_file->fh = fh;
> + luo_file->token = token;
> + luo_file->retrieved = false;
> + mutex_init(&luo_file->mutex);
> +
> + args.handler = fh;
> + args.session = (struct liveupdate_session *)session;
Isn't args.session already struct liveupdate_session *?
> + args.file = file;
> + err = fh->ops->preserve(&args);
> + if (err) {
> + mutex_destroy(&luo_file->mutex);
> + kfree(luo_file);
> + goto exit_err;
> + } else {
> + luo_file->serialized_data = args.serialized_data;
> + list_add_tail(&luo_file->list, &session->files_list);
> + session->count++;
I'd use luo_session_add_file(struct luo_file *luo_file) or return luo_file
by reference to the caller.
Then the lockdep_assert_held() can go away as well.
> + }
> +
> + return 0;
> +
> +exit_err:
> + fput(file);
> + luo_session_free_files_mem(session);
The error handling in this function is a mess. Pasha, please, please, use
goto consistently.
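To illustrate what I mean by consistent, here is a rough userspace sketch, not
a drop-in replacement. Every name in it is a hypothetical stand-in: the
counters model the file reference, the session buffer and the tracking entry,
and fail_at just selects which step fails. It also assumes each step acquires
exactly one resource, which is simpler than what the patch does with the
session buffer.

```c
#include <errno.h>

/* Hypothetical bookkeeping, only so the unwinding can be observed. */
struct sketch_state {
	int file_refs;	/* models fget()/fput() */
	int mem_allocs;	/* models the session files buffer */
	int entries;	/* models kzalloc()/kfree() of the tracking entry */
};

/* Acquire in order; on failure jump to the label that undoes what is held so far. */
static int sketch_preserve(struct sketch_state *s, int fail_at)
{
	int err;

	s->file_refs++;			/* step 1: take the file reference */

	if (fail_at == 2) {		/* step 2: allocate the session buffer */
		err = -ENOMEM;
		goto err_put_file;
	}
	s->mem_allocs++;

	if (fail_at == 3) {		/* step 3: allocate the tracking entry */
		err = -ENOMEM;
		goto err_free_mem;
	}
	s->entries++;

	if (fail_at == 4) {		/* step 4: the handler callback fails */
		err = -EIO;
		goto err_free_entry;
	}

	return 0;

err_free_entry:
	s->entries--;
err_free_mem:
	s->mem_allocs--;
err_put_file:
	s->file_refs--;
	return err;
}
```

Each failure site names exactly one label and the labels unwind in reverse
order of acquisition, so there is no cleanup done inline in one branch and via
goto in another.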
> +
> + return err;
> +}
> +
> +/**
> + * luo_file_unpreserve_files - Unpreserves all files from a session.
> + * @session: The session to be cleaned up.
> + *
> + * This function serves as the primary cleanup path for a session. It is
> + * invoked when the userspace agent closes the session's file descriptor.
> + *
> + * For each file, it performs the following cleanup actions:
> + * 1. Calls the handler's .unpreserve() callback to allow the handler to
> + * release any resources it allocated.
> + * 2. Removes the file from the session's internal tracking list.
> + * 3. Releases the reference to the 'struct file' that was taken by
> + * luo_preserve_file() via fput(), returning ownership.
> + * 4. Frees the memory associated with the internal 'struct luo_file'.
> + *
> + * After all individual files are unpreserved, it frees the contiguous memory
> + * block that was allocated to hold their serialization data.
> + */
> +void luo_file_unpreserve_files(struct luo_session *session)
> +{
> + struct luo_file *luo_file;
> +
> + lockdep_assert_held(&session->mutex);
> +
> + while (!list_empty(&session->files_list)) {
I think the loop should be in luo_session.c and luo_file.c should
implement luo_file_unpreserve(struct luo_file *luo_file).
The same applies to other functions below that do something with all files
in the session. In my view luo_session should iterate through
luo_session.files_list and call luo_file methods for each luo_file object.
> + struct liveupdate_file_op_args args = {0};
> +
> + luo_file = list_last_entry(&session->files_list,
> + struct luo_file, list);
> +
> + args.handler = luo_file->fh;
> + args.session = (struct liveupdate_session *)session;
> + args.file = luo_file->file;
> + args.serialized_data = luo_file->serialized_data;
> + luo_file->fh->ops->unpreserve(&args);
> +
> + list_del(&luo_file->list);
> + session->count--;
> +
> + fput(luo_file->file);
> + mutex_destroy(&luo_file->mutex);
> + kfree(luo_file);
> + }
> +
> + luo_session_free_files_mem(session);
> +}
> +
> +static int luo_file_freeze_one(struct luo_session *session,
> + struct luo_file *luo_file)
> +{
> + int err = 0;
> +
> + guard(mutex)(&luo_file->mutex);
> +
> + if (luo_file->fh->ops->freeze) {
> + struct liveupdate_file_op_args args = {0};
> +
> + args.handler = luo_file->fh;
> + args.session = (struct liveupdate_session *)session;
> + args.file = luo_file->file;
> + args.serialized_data = luo_file->serialized_data;
> +
> + err = luo_file->fh->ops->freeze(&args);
> + if (!err)
> + luo_file->serialized_data = args.serialized_data;
> + }
> +
> + return err;
> +}
> +
> +static void luo_file_unfreeze_one(struct luo_session *session,
> + struct luo_file *luo_file)
> +{
> + guard(mutex)(&luo_file->mutex);
> +
> + if (luo_file->fh->ops->unfreeze) {
> + struct liveupdate_file_op_args args = {0};
> +
> + args.handler = luo_file->fh;
> + args.session = (struct liveupdate_session *)session;
> + args.file = luo_file->file;
> + args.serialized_data = luo_file->serialized_data;
> +
> + luo_file->fh->ops->unfreeze(&args);
> + }
> +
> + luo_file->serialized_data = 0;
> +}
> +
> +static void __luo_file_unfreeze(struct luo_session *session,
> + struct luo_file *failed_entry)
> +{
> + struct list_head *files_list = &session->files_list;
> + struct luo_file *luo_file;
> +
> + list_for_each_entry(luo_file, files_list, list) {
> + if (luo_file == failed_entry)
> + break;
> +
> + luo_file_unfreeze_one(session, luo_file);
> + }
> +
> + memset(session->files, 0, session->pgcnt << PAGE_SHIFT);
> +}
> +
> +/**
> + * luo_file_freeze - Freezes all preserved files and serializes their metadata.
> + * @session: The session whose files are to be frozen.
> + *
> + * This function is called from the reboot() syscall path, just before the
> + * kernel transitions to the new image via kexec. Its purpose is to perform the
> + * final preparation and serialization of all preserved files in the session.
> + *
> + * It iterates through each preserved file in FIFO order (the order of
> + * preservation) and performs two main actions:
> + *
> + * 1. Freezes the File: It calls the handler's .freeze() callback for each
> + * file. This gives the handler a final opportunity to quiesce the device or
> + * prepare its state for the upcoming reboot. The handler may update its
> + * private data handle during this step.
> + *
> + * 2. Serializes Metadata: After a successful freeze, it copies the final file
> + * metadata—the handler's compatible string, the user token, and the final
> + * private data handle—into the pre-allocated contiguous memory buffer
> + * (session->files) that will be handed over to the next kernel via KHO.
> + *
> + * Error Handling (Rollback):
> + * This function is atomic. If any handler's .freeze() operation fails, the
> + * entire live update is aborted. The __luo_file_unfreeze() helper is
> + * immediately called to invoke the .unfreeze() op on all files that were
> + * successfully frozen before the point of failure, rolling them back to a
> + * running state. The function then returns an error, causing the reboot()
> + * syscall to fail.
> + *
> + * Context: Called only from the liveupdate_reboot() path.
> + * Return: 0 on success, or a negative errno on failure.
> + */
> +int luo_file_freeze(struct luo_session *session)
> +{
> + struct luo_file_ser *file_ser = session->files;
> + struct luo_file *luo_file;
> + int err;
> + int i;
> +
> + lockdep_assert_held(&session->mutex);
> +
> + if (!session->count)
> + return 0;
> +
> + if (WARN_ON(!file_ser))
> + return -EINVAL;
> +
> + i = 0;
> + list_for_each_entry(luo_file, &session->files_list, list) {
> + err = luo_file_freeze_one(session, luo_file);
> + if (err < 0) {
> + pr_warn("Freeze failed for session[%s] token[%#0llx] handler[%s] err[%pe]\n",
> + session->name, luo_file->token,
> + luo_file->fh->compatible, ERR_PTR(err));
> + goto exit_err;
> + }
> +
> + strscpy(file_ser[i].compatible, luo_file->fh->compatible,
> + sizeof(file_ser[i].compatible));
> + file_ser[i].data = luo_file->serialized_data;
> + file_ser[i].token = luo_file->token;
> + i++;
> + }
> +
> + return 0;
> +
> +exit_err:
> + __luo_file_unfreeze(session, luo_file);
Maybe move frozen files to a local list, call __luo_file_unfreeze() with
that list and then splice it back to session->files_list?
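
Roughly along these lines, as a userspace model rather than kernel code.
struct node and the list_* helpers below are simplified stand-ins for
struct list_head and its helpers, and model_file / model_freeze() are
hypothetical names that only mimic the shape of luo_file and
luo_file_freeze(). The point of the sketch is that rollback walks the local
list only, so no "failed entry" sentinel is needed:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head (illustrative only). */
struct node {
	struct node *prev, *next;
};

static void list_init(struct node *h)
{
	h->prev = h->next = h;
}

static int list_is_empty(const struct node *h)
{
	return h->next == h;
}

static void list_unlink(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

static void list_append(struct node *n, struct node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Move every entry of @from to the front of @to, keeping their order. */
static void list_splice_front(struct node *from, struct node *to)
{
	if (list_is_empty(from))
		return;
	from->next->prev = to;
	from->prev->next = to->next;
	to->next->prev = from->prev;
	to->next = from->next;
	list_init(from);
}

/* Hypothetical model of struct luo_file; 'list' must stay the first member. */
struct model_file {
	struct node list;
	int fail_freeze;	/* make the modelled .freeze() fail */
	int frozen;
};

/* Freeze files in order; on failure unfreeze only those on the local list. */
static int model_freeze(struct node *files)
{
	struct node frozen;
	struct node *pos;
	int err = 0;

	assert(files != NULL);
	list_init(&frozen);

	while (!list_is_empty(files)) {
		struct model_file *f = (struct model_file *)files->next;

		if (f->fail_freeze) {
			err = -1;
			break;
		}
		f->frozen = 1;
		list_unlink(&f->list);
		list_append(&f->list, &frozen);
	}

	if (err) {
		for (pos = frozen.next; pos != &frozen; pos = pos->next)
			((struct model_file *)pos)->frozen = 0;
	}

	/* Restore the original order in both the success and the error case. */
	list_splice_front(&frozen, files);
	return err;
}
```

With three files queued and the third one failing, model_freeze() returns
the error, the first two are unfrozen again and the session list ends up in
its original order.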
> +
> + return err;
> +}
> +
> +/**
> + * luo_file_unfreeze - Unfreezes all files in a session.
> + * @session: The session whose files are to be unfrozen.
> + *
> + * This function rolls back the state of all files in a session after the freeze
> + * phase has begun but must be aborted. It is the counterpart to
> + * luo_file_freeze().
> + *
> + * It invokes the __luo_file_unfreeze() helper with a NULL argument, which
> + * signals the helper to iterate through all files in the session and call
> + * their respective .unfreeze() handler callbacks.
> + *
> + * Context: This is called when the live update is aborted during
> + * the reboot() syscall, after luo_file_freeze() has been called.
> + */
> +void luo_file_unfreeze(struct luo_session *session)
> +{
> + lockdep_assert_held(&session->mutex);
> +
> + if (!session->count)
> + return;
> +
> + __luo_file_unfreeze(session, NULL);
> +}
> +
> +/**
> + * luo_retrieve_file - Restores a preserved file from a session by its token.
> + * @session: The session from which to retrieve the file.
> + * @token: The unique token identifying the file to be restored.
> + * @filep: Output parameter; on success, this is populated with a pointer
> + * to the newly retrieved 'struct file'.
> + *
> + * This function is the primary mechanism for recreating a file in the new
> + * kernel after a live update. It searches the session's list of deserialized
> + * files for an entry matching the provided @token.
> + *
> + * The operation is idempotent: if a file has already been successfully
> + * retrieved, this function will simply return a pointer to the existing
> + * 'struct file' and report success without re-executing the retrieve
> + * operation. This is handled by checking the 'retrieved' flag under a lock.
> + *
> + * File retrieval can happen in any order; it is not bound by the order of
> + * preservation.
> + *
> + * Context: Can be called from an ioctl or other in-kernel code in the new
> + * kernel.
> + * Return: 0 on success. Returns a negative errno on failure:
> + * -ENOENT if no file with the matching token is found.
> + * Any error code returned by the handler's .retrieve() op.
> + */
> +int luo_retrieve_file(struct luo_session *session, u64 token,
> + struct file **filep)
> +{
> + struct liveupdate_file_op_args args = {0};
> + struct luo_file *luo_file;
> + int err;
> +
> + lockdep_assert_held(&session->mutex);
> +
> + if (list_empty(&session->files_list))
> + return -ENOENT;
> +
> + list_for_each_entry(luo_file, &session->files_list, list) {
> + if (luo_file->token == token)
> + break;
> + }
> +
> + if (luo_file->token != token)
> + return -ENOENT;
> +
> + guard(mutex)(&luo_file->mutex);
> + if (luo_file->retrieved) {
> + /*
> + * Someone is asking for this file again, so get a reference
> + * for them.
> + */
> + get_file(luo_file->file);
> + *filep = luo_file->file;
> + return 0;
> + }
> +
> + args.handler = luo_file->fh;
> + args.session = (struct liveupdate_session *)session;
> + args.serialized_data = luo_file->serialized_data;
> + err = luo_file->fh->ops->retrieve(&args);
> + if (!err) {
> + luo_file->file = args.file;
> +
> + /* Get reference so we can keep this file in LUO until finish */
> + get_file(luo_file->file);
> + *filep = luo_file->file;
> + luo_file->retrieved = true;
> + }
> +
> + return err;
> +}
> +
> +static int luo_file_can_finish_one(struct luo_session *session,
> + struct luo_file *luo_file)
> +{
> + bool can_finish = true;
> +
> + guard(mutex)(&luo_file->mutex);
> +
> + if (luo_file->fh->ops->can_finish) {
> + struct liveupdate_file_op_args args = {0};
> +
> + args.handler = luo_file->fh;
> + args.session = (struct liveupdate_session *)session;
> + args.file = luo_file->file;
> + args.serialized_data = luo_file->serialized_data;
> + args.retrieved = luo_file->retrieved;
> + can_finish = luo_file->fh->ops->can_finish(&args);
> + }
> +
> + return can_finish ? 0 : -EBUSY;
> +}
> +
> +static void luo_file_finish_one(struct luo_session *session,
> + struct luo_file *luo_file)
> +{
> + struct liveupdate_file_op_args args = {0};
> +
> + guard(mutex)(&luo_file->mutex);
> +
> + args.handler = luo_file->fh;
> + args.session = (struct liveupdate_session *)session;
> + args.file = luo_file->file;
> + args.serialized_data = luo_file->serialized_data;
> + args.retrieved = luo_file->retrieved;
> +
> + luo_file->fh->ops->finish(&args);
> +}
> +
> +/**
> + * luo_file_finish - Completes the lifecycle for all files in a session.
> + * @session: The session to be finalized.
> + *
> + * This function orchestrates the final teardown of a live update session in the
> + * new kernel. It should be called after all necessary files have been
> + * retrieved and the userspace agent is ready to release the preserved state.
> + *
> + * The function iterates through all tracked files. For each file, it performs
> + * the following sequence of cleanup actions:
> + *
> + * 1. If file is not yet retrieved, retrieves it, and calls can_finish() on
> + * every file in the session. If all can_finish return true, continue to
> + * finish.
> + * 2. Calls the handler's .finish() callback (via luo_file_finish_one) to
> + * allow for final resource cleanup within the handler.
> + * 3. Releases LUO's ownership reference on the 'struct file' via fput(). This
> + * is the counterpart to the get_file() call in luo_retrieve_file().
> + * 4. Removes the 'struct luo_file' from the session's internal list.
> + * 5. Frees the memory for the 'struct luo_file' instance itself.
> + *
> + * After successfully finishing all individual files, it frees the
> + * contiguous memory block that was used to transfer the serialized metadata
> + * from the previous kernel.
> + *
> + * Error Handling (Atomic Failure):
> + * This operation is atomic. If any handler's .can_finish() op fails, the entire
> + * function aborts immediately and returns an error.
> + *
> + * Context: Can be called from an ioctl handler in the new kernel.
> + * Return: 0 on success, or a negative errno on failure.
> + */
> +int luo_file_finish(struct luo_session *session)
> +{
> + struct list_head *files_list = &session->files_list;
> + struct luo_file *luo_file;
> + int err;
> +
> + if (!session->count)
> + return 0;
> +
> + lockdep_assert_held(&session->mutex);
> +
> + list_for_each_entry(luo_file, files_list, list) {
> + err = luo_file_can_finish_one(session, luo_file);
> + if (err)
> + return err;
> + }
> +
> + while (!list_empty(&session->files_list)) {
> + luo_file = list_last_entry(&session->files_list,
> + struct luo_file, list);
> +
> + luo_file_finish_one(session, luo_file);
> +
> + if (luo_file->file)
> + fput(luo_file->file);
> + list_del(&luo_file->list);
> + session->count--;
> + mutex_destroy(&luo_file->mutex);
> + kfree(luo_file);
> + }
> +
> + if (session->files) {
> + kho_restore_free(session->files);
> + session->files = NULL;
> + session->pgcnt = 0;
> + }
> +
> + return 0;
> +}
> +
> +/**
> + * luo_file_deserialize - Reconstructs the list of preserved files in the new kernel.
> + * @session: The incoming session containing the serialized file data from KHO.
> + *
> + * This function is called during the early boot process of the new kernel. It
> + * takes the raw, contiguous memory block of 'struct luo_file_ser' entries,
> + * provided by the previous kernel, and transforms it back into a live,
> + * in-memory linked list of 'struct luo_file' instances.
> + *
> + * For each serialized entry, it performs the following steps:
> + * 1. Reads the 'compatible' string.
> + * 2. Searches the global list of registered file handlers for one that
> + * matches the compatible string.
> + * 3. Allocates a new 'struct luo_file'.
> + * 4. Populates the new structure with the deserialized data (token, private
> + * data handle) and links it to the found handler. The 'file' pointer is
> + * initialized to NULL, as the file has not been retrieved yet.
> + * 5. Adds the new 'struct luo_file' to the session's files_list.
> + *
> + * This prepares the session for userspace, which can later call
> + * luo_retrieve_file() to restore the actual file descriptors.
> + *
> + * Context: Called from session deserialization.
> + */
> +int luo_file_deserialize(struct luo_session *session)
> +{
> + struct luo_file_ser *file_ser;
> + u64 i;
> +
> + lockdep_assert_held(&session->mutex);
> +
> + if (!session->files)
> + return 0;
> +
> + file_ser = session->files;
> + for (i = 0; i < session->count; i++) {
> + struct liveupdate_file_handler *fh;
> + bool handler_found = false;
> + struct luo_file *luo_file;
> +
> + list_for_each_entry(fh, &luo_file_handler_list, list) {
> + if (!strcmp(fh->compatible, file_ser[i].compatible)) {
> + handler_found = true;
> + break;
> + }
> + }
> +
> + if (!handler_found) {
> + pr_warn("No registered handler for compatible '%s'\n",
> + file_ser[i].compatible);
> + return -ENOENT;
> + }
> +
> + luo_file = kzalloc(sizeof(*luo_file), GFP_KERNEL);
> + if (!luo_file)
> + return -ENOMEM;
Shouldn't we free files allocated on the previous iterations?
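
Something like the following is what I have in mind, shown as a userspace
model and not as kernel code. model_session, model_file and the fail_at
knob are hypothetical; calloc()/free() stand in for kzalloc()/kfree(), and
a singly linked list stands in for session->files_list:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct model_file {
	struct model_file *next;
	unsigned long long token;
};

struct model_session {
	struct model_file *head;	/* stands in for session->files_list */
	struct model_file **tail;
};

static void model_session_init(struct model_session *s)
{
	s->head = NULL;
	s->tail = &s->head;
}

/* Free every entry added so far and leave the session empty again. */
static void model_free_files(struct model_session *s)
{
	while (s->head) {
		struct model_file *f = s->head;

		s->head = f->next;
		free(f);
	}
	s->tail = &s->head;
}

/*
 * Rebuild the list from @tokens. @fail_at is a test knob that simulates an
 * allocation failure at that index; pass n (or larger) for no failure.
 */
static int model_deserialize(struct model_session *s,
			     const unsigned long long *tokens, size_t n,
			     size_t fail_at)
{
	size_t i;

	assert(s->head == NULL);

	for (i = 0; i < n; i++) {
		struct model_file *f;

		f = (i == fail_at) ? NULL : calloc(1, sizeof(*f));
		if (!f) {
			/* Undo the previous iterations before bailing out. */
			model_free_files(s);
			return -ENOMEM;
		}

		f->token = tokens[i];
		*s->tail = f;
		s->tail = &f->next;
	}

	return 0;
}
```

The same unwinding would apply to the "no registered handler" error path,
which also returns after earlier entries have been added to the list.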
> +
> + luo_file->fh = fh;
> + luo_file->file = NULL;
> + luo_file->serialized_data = file_ser[i].data;
> + luo_file->token = file_ser[i].token;
> + luo_file->retrieved = false;
> + mutex_init(&luo_file->mutex);
> + list_add_tail(&luo_file->list, &session->files_list);
> + }
> +
> + return 0;
> +}
> +
> +/**
> + * liveupdate_register_file_handler - Register a file handler with LUO.
> + * @fh: Pointer to a caller-allocated &struct liveupdate_file_handler.
> + * The caller must initialize this structure, including a unique
> + * 'compatible' string and a valid 'fh' callbacks. This function adds the
> + * handler to the global list of supported file handlers.
> + *
> + * Context: Typically called during module initialization for file types that
> + * support live update preservation.
> + *
> + * Return: 0 on success. Negative errno on failure.
> + */
> +int liveupdate_register_file_handler(struct liveupdate_file_handler *fh)
> +{
> + static DEFINE_MUTEX(register_file_handler_lock);
> + struct liveupdate_file_handler *fh_iter;
> +
> + if (!liveupdate_enabled())
> + return -EOPNOTSUPP;
> +
> + /*
> + * Once sessions have been deserialized, file handlers cannot be
> + * registered, it is too late.
> + */
> + if (WARN_ON(luo_session_is_deserialized()))
> + return -EBUSY;
> +
> + /* Sanity check that all required callbacks are set */
> + if (!fh->ops->preserve || !fh->ops->unpreserve ||
> + !fh->ops->retrieve || !fh->ops->finish) {
> + return -EINVAL;
> + }
> +
> + guard(mutex)(&register_file_handler_lock);
> + list_for_each_entry(fh_iter, &luo_file_handler_list, list) {
> + if (!strcmp(fh_iter->compatible, fh->compatible)) {
> + pr_err("File handler registration failed: Compatible string '%s' already registered.\n",
> + fh->compatible);
> + return -EEXIST;
> + }
> + }
> +
> + if (!try_module_get(fh->ops->owner))
> + return -EAGAIN;
> +
> + INIT_LIST_HEAD(&fh->list);
> + list_add_tail(&fh->list, &luo_file_handler_list);
> +
> + return 0;
> +}
> +
> +/**
> + * liveupdate_get_token_outgoing - Get the token for a preserved file.
> + * @s: The outgoing liveupdate session.
> + * @file: The file object to search for.
> + * @tokenp: Output parameter for the found token.
> + *
> + * Searches the list of preserved files in an outgoing session for a matching
> + * file object. If found, the corresponding user-provided token is returned.
> + *
> + * This function is intended for in-kernel callers that need to correlate a
> + * file with its liveupdate token.
> + *
> + * Context: Can be called from any context that can acquire the session mutex.
> + * Return: 0 on success, -ENOENT if the file is not preserved in this session.
> + */
> +int liveupdate_get_token_outgoing(struct liveupdate_session *s,
> + struct file *file, u64 *tokenp)
> +{
This function is apparently unused.
> + struct luo_session *session = (struct luo_session *)s;
> + struct luo_file *luo_file;
> + int err = -ENOENT;
> +
> + list_for_each_entry(luo_file, &session->files_list, list) {
> + if (luo_file->file == file) {
> + if (tokenp)
> + *tokenp = luo_file->token;
> + err = 0;
> + break;
> + }
> + }
> +
> + return err;
> +}
> +
> +/**
> + * liveupdate_get_file_incoming - Retrieves a preserved file for in-kernel use.
> + * @s: The incoming liveupdate session (restored from the previous kernel).
> + * @token: The unique token identifying the file to retrieve.
> + * @filep: On success, this will be populated with a pointer to the retrieved
> + * 'struct file'.
> + *
> + * Provides a kernel-internal API for other subsystems to retrieve their
> + * preserved files after a live update. This function is a simple wrapper
> + * around luo_retrieve_file(), allowing callers to find a file by its token.
> + *
> + * The operation is idempotent; subsequent calls for the same token will return
> + * a pointer to the same 'struct file' object.
> + *
> + * The caller receives a pointer to the file with a reference incremented. The
> + * file's lifetime is managed by LUO and any userspace file
> + * descriptors. If the caller needs to hold a reference to the file beyond the
> + * immediate scope, it must call get_file() itself.
> + *
> + * Context: Can be called from any context in the new kernel that has a handle
> + * to a restored session.
> + * Return: 0 on success. Returns -ENOENT if no file with the matching token is
> + * found, or any other negative errno on failure.
> + */
> +int liveupdate_get_file_incoming(struct liveupdate_session *s, u64 token,
> + struct file **filep)
> +{
Ditto.
> + struct luo_session *session = (struct luo_session *)s;
> +
> + return luo_retrieve_file(session, token, filep);
> +}
> diff --git a/kernel/liveupdate/luo_internal.h b/kernel/liveupdate/luo_internal.h
> index 5185ad37a8c1..1a36f2383123 100644
> --- a/kernel/liveupdate/luo_internal.h
> +++ b/kernel/liveupdate/luo_internal.h
> @@ -70,4 +70,13 @@ int luo_session_serialize(void);
> int luo_session_deserialize(void);
> bool luo_session_is_deserialized(void);
>
> +int luo_preserve_file(struct luo_session *session, u64 token, int fd);
> +void luo_file_unpreserve_files(struct luo_session *session);
> +int luo_file_freeze(struct luo_session *session);
> +void luo_file_unfreeze(struct luo_session *session);
> +int luo_retrieve_file(struct luo_session *session, u64 token,
> + struct file **filep);
> +int luo_file_finish(struct luo_session *session);
> +int luo_file_deserialize(struct luo_session *session);
> +
> #endif /* _LINUX_LUO_INTERNAL_H */
> --
> 2.52.0.rc1.455.g30608eb744-goog
>
--
Sincerely yours,
Mike.
* Re: [PATCH v6 07/20] liveupdate: luo_session: Add ioctls for file preservation
2025-11-15 23:33 ` [PATCH v6 07/20] liveupdate: luo_session: Add ioctls for file preservation Pasha Tatashin
@ 2025-11-16 18:25 ` Mike Rapoport
2025-11-18 2:58 ` Pasha Tatashin
0 siblings, 1 reply; 92+ messages in thread
From: Mike Rapoport @ 2025-11-16 18:25 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Sat, Nov 15, 2025 at 06:33:53PM -0500, Pasha Tatashin wrote:
> Introducing the userspace interface and internal logic required to
> manage the lifecycle of file descriptors within a session. Previously, a
> session was merely a container; this change makes it a functional
> management unit.
>
> The following capabilities are added:
>
> A new set of ioctl commands are added, which operate on the file
> descriptor returned by CREATE_SESSION. This allows userspace to:
> - LIVEUPDATE_SESSION_PRESERVE_FD: Add a file descriptor to a session
> to be preserved across the live update.
> - LIVEUPDATE_SESSION_RETRIEVE_FD: Retrieve a preserved file in the
> new kernel using its unique token.
> - LIVEUPDATE_SESSION_FINISH: finish session
>
> The session's .release handler is enhanced to be state-aware. When a
> session's file descriptor is closed, it correctly unpreserves
> the session based on its current state before freeing all
> associated file resources.
>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
> include/uapi/linux/liveupdate.h | 103 ++++++++++++++++++
> kernel/liveupdate/luo_session.c | 187 +++++++++++++++++++++++++++++++-
> 2 files changed, 286 insertions(+), 4 deletions(-)
...
> static int luo_session_release(struct inode *inodep, struct file *filep)
> {
> struct luo_session *session = filep->private_data;
> struct luo_session_header *sh;
> + int err = 0;
>
> /* If retrieved is set, it means this session is from incoming list */
> - if (session->retrieved)
> + if (session->retrieved) {
> sh = &luo_session_global.incoming;
> - else
> +
> + err = luo_session_finish_one(session);
> + if (err) {
> + pr_warn("Unable to finish session [%s] on release\n",
> + session->name);
return err;
and then else can go away here and luo_session_remove() and
luo_session_free() can be moved outside if (session->retrieved).
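
A sketch of the resulting control flow, written as a small userspace model.
model_header, model_session and model_release() are hypothetical stand-ins
for luo_session_header, luo_session and luo_session_release(); the
pr_warn() and the session mutex are left out for brevity:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for luo_session_header and luo_session. */
struct model_header {
	int nr_sessions;
};

struct model_session {
	bool retrieved;		/* session came from the incoming list */
	int finish_err;		/* value the modelled finish step returns */
	bool unpreserved;
	bool freed;
};

static struct model_header model_incoming, model_outgoing;

static int model_release(struct model_session *s)
{
	struct model_header *sh;
	int err;

	assert(!s->freed);

	if (s->retrieved) {
		sh = &model_incoming;

		err = s->finish_err;	/* models luo_session_finish_one() */
		if (err)
			return err;	/* session stays on the list */
	} else {
		sh = &model_outgoing;
		s->unpreserved = true;	/* models luo_file_unpreserve_files() */
	}

	/* Common tail: models luo_session_remove() + luo_session_free(). */
	sh->nr_sessions--;
	s->freed = true;
	return 0;
}
```

Only the failed-finish case returns early; both the incoming and the
outgoing paths then share a single remove/free tail.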
> + } else {
> + luo_session_remove(sh, session);
> + luo_session_free(session);
> + }
> +
> + } else {
> sh = &luo_session_global.outgoing;
>
> - luo_session_remove(sh, session);
> - luo_session_free(session);
> + scoped_guard(mutex, &session->mutex)
> + luo_file_unpreserve_files(session);
> + luo_session_remove(sh, session);
> + luo_session_free(session);
> + }
> +
> + return err;
> +}
--
Sincerely yours,
Mike.
* Re: [PATCH v6 18/20] selftests/liveupdate: Add kexec-based selftest for session lifecycle
2025-11-15 23:34 ` [PATCH v6 18/20] selftests/liveupdate: Add kexec-based selftest for session lifecycle Pasha Tatashin
@ 2025-11-16 18:53 ` Zhu Yanjun
2025-11-17 18:23 ` Pasha Tatashin
2025-11-17 19:27 ` David Matlack
` (2 subsequent siblings)
3 siblings, 1 reply; 92+ messages in thread
From: Zhu Yanjun @ 2025-11-16 18:53 UTC (permalink / raw)
To: Pasha Tatashin, pratyush, jasonmiu, graf, rppt, dmatlack,
rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu,
hughd, skhawaja, chrisl
On 2025/11/15 15:34, Pasha Tatashin wrote:
> Introduce a kexec-based selftest, luo_kexec_simple, to validate the
> end-to-end lifecycle of a Live Update Orchestrator (LUO) session across
> a reboot.
>
> While existing tests verify the uAPI in a pre-reboot context, this test
> ensures that the core functionality—preserving state via Kexec Handover
> and restoring it in a new kernel—works as expected.
>
> The test operates in two stages, managing its state across the reboot by
> preserving a dedicated "state session" containing a memfd. This
> mechanism dogfoods the LUO feature itself for state tracking, making the
> test self-contained.
>
> The test validates the following sequence:
>
> Stage 1 (Pre-kexec):
> - Creates a test session (test-session).
> - Creates and preserves a memfd with a known data pattern into the test
> session.
> - Creates the state-tracking session to signal progression to Stage 2.
> - Executes a kexec reboot via a helper script.
>
> Stage 2 (Post-kexec):
> - Retrieves the state-tracking session to confirm it is in the
> post-reboot stage.
> - Retrieves the preserved test session.
> - Restores the memfd from the test session and verifies its contents
> match the original data pattern written in Stage 1.
> - Finalizes both the test and state sessions to ensure a clean
> teardown.
>
> The test relies on a helper script (do_kexec.sh) to perform the reboot
> and a shared utility library (luo_test_utils.c) for common LUO
> operations, keeping the main test logic clean and focused.
>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
> tools/testing/selftests/liveupdate/.gitignore | 1 +
> tools/testing/selftests/liveupdate/Makefile | 32 ++++
> .../testing/selftests/liveupdate/do_kexec.sh | 16 ++
> .../selftests/liveupdate/luo_kexec_simple.c | 114 ++++++++++++
> .../selftests/liveupdate/luo_test_utils.c | 168 ++++++++++++++++++
> .../selftests/liveupdate/luo_test_utils.h | 39 ++++
> 6 files changed, 370 insertions(+)
> create mode 100755 tools/testing/selftests/liveupdate/do_kexec.sh
> create mode 100644 tools/testing/selftests/liveupdate/luo_kexec_simple.c
> create mode 100644 tools/testing/selftests/liveupdate/luo_test_utils.c
> create mode 100644 tools/testing/selftests/liveupdate/luo_test_utils.h
>
> diff --git a/tools/testing/selftests/liveupdate/.gitignore b/tools/testing/selftests/liveupdate/.gitignore
> index af6e773cf98f..daeef116174d 100644
> --- a/tools/testing/selftests/liveupdate/.gitignore
> +++ b/tools/testing/selftests/liveupdate/.gitignore
> @@ -1 +1,2 @@
> /liveupdate
> +/luo_kexec_simple
> diff --git a/tools/testing/selftests/liveupdate/Makefile b/tools/testing/selftests/liveupdate/Makefile
> index 2a573c36016e..1563ac84006a 100644
> --- a/tools/testing/selftests/liveupdate/Makefile
> +++ b/tools/testing/selftests/liveupdate/Makefile
> @@ -1,7 +1,39 @@
> # SPDX-License-Identifier: GPL-2.0-only
> +
> +KHDR_INCLUDES ?= -I../../../../usr/include
> CFLAGS += -Wall -O2 -Wno-unused-function
> CFLAGS += $(KHDR_INCLUDES)
> +LDFLAGS += -static
> +OUTPUT ?= .
> +
> +# --- Test Configuration (Edit this section when adding new tests) ---
> +LUO_SHARED_SRCS := luo_test_utils.c
> +LUO_SHARED_HDRS += luo_test_utils.h
> +
> +LUO_MANUAL_TESTS += luo_kexec_simple
> +
> +TEST_FILES += do_kexec.sh
>
> TEST_GEN_PROGS += liveupdate
>
> +# --- Automatic Rule Generation (Do not edit below) ---
> +
> +TEST_GEN_PROGS_EXTENDED += $(LUO_MANUAL_TESTS)
> +
> +# Define the full list of sources for each manual test.
> +$(foreach test,$(LUO_MANUAL_TESTS), \
> + $(eval $(test)_SOURCES := $(test).c $(LUO_SHARED_SRCS)))
> +
> +# This loop automatically generates an explicit build rule for each manual test.
> +# It includes dependencies on the shared headers and makes the output
> +# executable.
> +# Note the use of '$$' to escape automatic variables for the 'eval' command.
> +$(foreach test,$(LUO_MANUAL_TESTS), \
> + $(eval $(OUTPUT)/$(test): $($(test)_SOURCES) $(LUO_SHARED_HDRS) \
> + $(call msg,LINK,,$$@) ; \
> + $(Q)$(LINK.c) $$^ $(LDLIBS) -o $$@ ; \
> + $(Q)chmod +x $$@ \
> + ) \
> +)
> +
> include ../lib.mk
> diff --git a/tools/testing/selftests/liveupdate/do_kexec.sh b/tools/testing/selftests/liveupdate/do_kexec.sh
> new file mode 100755
> index 000000000000..3c7c6cafbef8
> --- /dev/null
> +++ b/tools/testing/selftests/liveupdate/do_kexec.sh
> @@ -0,0 +1,16 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0
> +set -e
> +
> +# Use $KERNEL and $INITRAMFS to pass custom Kernel and optional initramfs
> +
> +KERNEL="${KERNEL:-/boot/bzImage}"
> +set -- -l -s --reuse-cmdline "$KERNEL"
> +
> +INITRAMFS="${INITRAMFS:-/boot/initramfs}"
> +if [ -f "$INITRAMFS" ]; then
> + set -- "$@" --initrd="$INITRAMFS"
> +fi
> +
> +kexec "$@"
> +kexec -e
Thanks a lot. A kernel image alone is not enough to boot the host.
Adding the initramfs avoids the crash when the host boots.
I have tested this to verify it.
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Zhu Yanjun
> diff --git a/tools/testing/selftests/liveupdate/luo_kexec_simple.c b/tools/testing/selftests/liveupdate/luo_kexec_simple.c
> new file mode 100644
> index 000000000000..67ab6ebf9eec
> --- /dev/null
> +++ b/tools/testing/selftests/liveupdate/luo_kexec_simple.c
> @@ -0,0 +1,114 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +/*
> + * Copyright (c) 2025, Google LLC.
> + * Pasha Tatashin <pasha.tatashin@soleen.com>
> + *
> + * A simple selftest to validate the end-to-end lifecycle of a LUO session
> + * across a single kexec reboot.
> + */
> +
> +#include "luo_test_utils.h"
> +
> +/* Test-specific constants are now defined locally */
> +#define KEXEC_SCRIPT "./do_kexec.sh"
> +#define TEST_SESSION_NAME "test-session"
> +#define TEST_MEMFD_TOKEN 0x1A
> +#define TEST_MEMFD_DATA "hello kexec world"
> +
> +/* Constants for the state-tracking mechanism, specific to this test file. */
> +#define STATE_SESSION_NAME "kexec_simple_state"
> +#define STATE_MEMFD_TOKEN 999
> +
> +/* Stage 1: Executed before the kexec reboot. */
> +static void run_stage_1(int luo_fd)
> +{
> + int session_fd;
> +
> + ksft_print_msg("[STAGE 1] Starting pre-kexec setup...\n");
> +
> + ksft_print_msg("[STAGE 1] Creating state file for next stage (2)...\n");
> + create_state_file(luo_fd, STATE_SESSION_NAME, STATE_MEMFD_TOKEN, 2);
> +
> + ksft_print_msg("[STAGE 1] Creating session '%s' and preserving memfd...\n",
> + TEST_SESSION_NAME);
> + session_fd = luo_create_session(luo_fd, TEST_SESSION_NAME);
> + if (session_fd < 0)
> + fail_exit("luo_create_session for '%s'", TEST_SESSION_NAME);
> +
> + if (create_and_preserve_memfd(session_fd, TEST_MEMFD_TOKEN,
> + TEST_MEMFD_DATA) < 0) {
> + fail_exit("create_and_preserve_memfd for token %#x",
> + TEST_MEMFD_TOKEN);
> + }
> +
> + ksft_print_msg("[STAGE 1] Executing kexec...\n");
> + if (system(KEXEC_SCRIPT) != 0)
> + fail_exit("kexec script failed");
> + exit(EXIT_FAILURE);
> +}
> +
> +/* Stage 2: Executed after the kexec reboot. */
> +static void run_stage_2(int luo_fd, int state_session_fd)
> +{
> + int session_fd, mfd, stage;
> +
> + ksft_print_msg("[STAGE 2] Starting post-kexec verification...\n");
> +
> + restore_and_read_stage(state_session_fd, STATE_MEMFD_TOKEN, &stage);
> + if (stage != 2)
> + fail_exit("Expected stage 2, but state file contains %d", stage);
> +
> + ksft_print_msg("[STAGE 2] Retrieving session '%s'...\n", TEST_SESSION_NAME);
> + session_fd = luo_retrieve_session(luo_fd, TEST_SESSION_NAME);
> + if (session_fd < 0)
> + fail_exit("luo_retrieve_session for '%s'", TEST_SESSION_NAME);
> +
> + ksft_print_msg("[STAGE 2] Restoring and verifying memfd (token %#x)...\n",
> + TEST_MEMFD_TOKEN);
> + mfd = restore_and_verify_memfd(session_fd, TEST_MEMFD_TOKEN,
> + TEST_MEMFD_DATA);
> + if (mfd < 0)
> + fail_exit("restore_and_verify_memfd for token %#x", TEST_MEMFD_TOKEN);
> + close(mfd);
> +
> + ksft_print_msg("[STAGE 2] Test data verified successfully.\n");
> + ksft_print_msg("[STAGE 2] Finalizing test session...\n");
> + if (luo_session_finish(session_fd) < 0)
> + fail_exit("luo_session_finish for test session");
> + close(session_fd);
> +
> + ksft_print_msg("[STAGE 2] Finalizing state session...\n");
> + if (luo_session_finish(state_session_fd) < 0)
> + fail_exit("luo_session_finish for state session");
> + close(state_session_fd);
> +
> + ksft_print_msg("\n--- SIMPLE KEXEC TEST PASSED ---\n");
> +}
> +
> +int main(int argc, char *argv[])
> +{
> + int luo_fd;
> + int state_session_fd;
> +
> + luo_fd = luo_open_device();
> + if (luo_fd < 0)
> + ksft_exit_skip("Failed to open %s. Is the luo module loaded?\n",
> + LUO_DEVICE);
> +
> + /*
> + * Determine the stage by attempting to retrieve the state session.
> + * If it doesn't exist (ENOENT), we are in Stage 1 (pre-kexec).
> + */
> + state_session_fd = luo_retrieve_session(luo_fd, STATE_SESSION_NAME);
> + if (state_session_fd == -ENOENT) {
> + run_stage_1(luo_fd);
> + } else if (state_session_fd >= 0) {
> + /* We got a valid handle, pass it directly to stage 2 */
> + run_stage_2(luo_fd, state_session_fd);
> + } else {
> + fail_exit("Failed to check for state session");
> + }
> +
> + close(luo_fd);
> +}
> diff --git a/tools/testing/selftests/liveupdate/luo_test_utils.c b/tools/testing/selftests/liveupdate/luo_test_utils.c
> new file mode 100644
> index 000000000000..0a24105cbc54
> --- /dev/null
> +++ b/tools/testing/selftests/liveupdate/luo_test_utils.c
> @@ -0,0 +1,168 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +/*
> + * Copyright (c) 2025, Google LLC.
> + * Pasha Tatashin <pasha.tatashin@soleen.com>
> + */
> +
> +#define _GNU_SOURCE
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <fcntl.h>
> +#include <unistd.h>
> +#include <sys/ioctl.h>
> +#include <sys/syscall.h>
> +#include <sys/mman.h>
> +#include <errno.h>
> +#include <stdarg.h>
> +
> +#include "luo_test_utils.h"
> +
> +int luo_open_device(void)
> +{
> + return open(LUO_DEVICE, O_RDWR);
> +}
> +
> +int luo_create_session(int luo_fd, const char *name)
> +{
> + struct liveupdate_ioctl_create_session arg = { .size = sizeof(arg) };
> +
> + snprintf((char *)arg.name, LIVEUPDATE_SESSION_NAME_LENGTH, "%.*s",
> + LIVEUPDATE_SESSION_NAME_LENGTH - 1, name);
> +
> + if (ioctl(luo_fd, LIVEUPDATE_IOCTL_CREATE_SESSION, &arg) < 0)
> + return -errno;
> +
> + return arg.fd;
> +}
> +
> +int luo_retrieve_session(int luo_fd, const char *name)
> +{
> + struct liveupdate_ioctl_retrieve_session arg = { .size = sizeof(arg) };
> +
> + snprintf((char *)arg.name, LIVEUPDATE_SESSION_NAME_LENGTH, "%.*s",
> + LIVEUPDATE_SESSION_NAME_LENGTH - 1, name);
> +
> + if (ioctl(luo_fd, LIVEUPDATE_IOCTL_RETRIEVE_SESSION, &arg) < 0)
> + return -errno;
> +
> + return arg.fd;
> +}
> +
> +int create_and_preserve_memfd(int session_fd, int token, const char *data)
> +{
> + struct liveupdate_session_preserve_fd arg = { .size = sizeof(arg) };
> + long page_size = sysconf(_SC_PAGE_SIZE);
> + void *map = MAP_FAILED;
> + int mfd = -1, ret = -1;
> +
> + mfd = memfd_create("test_mfd", 0);
> + if (mfd < 0)
> + return -errno;
> +
> + if (ftruncate(mfd, page_size) != 0)
> + goto out;
> +
> + map = mmap(NULL, page_size, PROT_WRITE, MAP_SHARED, mfd, 0);
> + if (map == MAP_FAILED)
> + goto out;
> +
> + snprintf(map, page_size, "%s", data);
> + munmap(map, page_size);
> +
> + arg.fd = mfd;
> + arg.token = token;
> + if (ioctl(session_fd, LIVEUPDATE_SESSION_PRESERVE_FD, &arg) < 0)
> + goto out;
> +
> + ret = 0;
> +out:
> + if (ret != 0 && errno != 0)
> + ret = -errno;
> + if (mfd >= 0)
> + close(mfd);
> + return ret;
> +}
> +
> +int restore_and_verify_memfd(int session_fd, int token,
> + const char *expected_data)
> +{
> + struct liveupdate_session_retrieve_fd arg = { .size = sizeof(arg) };
> + long page_size = sysconf(_SC_PAGE_SIZE);
> + void *map = MAP_FAILED;
> + int mfd = -1, ret = -1;
> +
> + arg.token = token;
> + if (ioctl(session_fd, LIVEUPDATE_SESSION_RETRIEVE_FD, &arg) < 0)
> + return -errno;
> + mfd = arg.fd;
> +
> + map = mmap(NULL, page_size, PROT_READ, MAP_SHARED, mfd, 0);
> + if (map == MAP_FAILED)
> + goto out;
> +
> + if (expected_data && strcmp(expected_data, map) != 0) {
> + ksft_print_msg("Data mismatch! Expected '%s', Got '%s'\n",
> + expected_data, (char *)map);
> + ret = -EINVAL;
> + goto out_munmap;
> + }
> +
> + ret = mfd;
> +out_munmap:
> + munmap(map, page_size);
> +out:
> + if (ret < 0 && errno != 0)
> + ret = -errno;
> + if (ret < 0 && mfd >= 0)
> + close(mfd);
> + return ret;
> +}
> +
> +int luo_session_finish(int session_fd)
> +{
> + struct liveupdate_session_finish arg = { .size = sizeof(arg) };
> +
> + if (ioctl(session_fd, LIVEUPDATE_SESSION_FINISH, &arg) < 0)
> + return -errno;
> +
> + return 0;
> +}
> +
> +void create_state_file(int luo_fd, const char *session_name, int token,
> + int next_stage)
> +{
> + char buf[32];
> + int state_session_fd;
> +
> + state_session_fd = luo_create_session(luo_fd, session_name);
> + if (state_session_fd < 0)
> + fail_exit("luo_create_session for state tracking");
> +
> + snprintf(buf, sizeof(buf), "%d", next_stage);
> + if (create_and_preserve_memfd(state_session_fd, token, buf) < 0)
> + fail_exit("create_and_preserve_memfd for state tracking");
> +
> + /*
> + * DO NOT close session FD, otherwise it is going to be unpreserved
> + */
> +}
> +
> +void restore_and_read_stage(int state_session_fd, int token, int *stage)
> +{
> + char buf[32] = {0};
> + int mfd;
> +
> + mfd = restore_and_verify_memfd(state_session_fd, token, NULL);
> + if (mfd < 0)
> + fail_exit("failed to restore state memfd");
> +
> + if (read(mfd, buf, sizeof(buf) - 1) < 0)
> + fail_exit("failed to read state mfd");
> +
> + *stage = atoi(buf);
> +
> + close(mfd);
> +}
> diff --git a/tools/testing/selftests/liveupdate/luo_test_utils.h b/tools/testing/selftests/liveupdate/luo_test_utils.h
> new file mode 100644
> index 000000000000..093e787b9f4b
> --- /dev/null
> +++ b/tools/testing/selftests/liveupdate/luo_test_utils.h
> @@ -0,0 +1,39 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +/*
> + * Copyright (c) 2025, Google LLC.
> + * Pasha Tatashin <pasha.tatashin@soleen.com>
> + *
> + * Utility functions for LUO kselftests.
> + */
> +
> +#ifndef LUO_TEST_UTILS_H
> +#define LUO_TEST_UTILS_H
> +
> +#include <errno.h>
> +#include <string.h>
> +#include <linux/liveupdate.h>
> +#include "../kselftest.h"
> +
> +#define LUO_DEVICE "/dev/liveupdate"
> +
> +#define fail_exit(fmt, ...) \
> + ksft_exit_fail_msg("[%s:%d] " fmt " (errno: %s)\n", \
> + __func__, __LINE__, ##__VA_ARGS__, strerror(errno))
> +
> +/* Generic LUO and session management helpers */
> +int luo_open_device(void);
> +int luo_create_session(int luo_fd, const char *name);
> +int luo_retrieve_session(int luo_fd, const char *name);
> +int luo_session_finish(int session_fd);
> +
> +/* Generic file preservation and restoration helpers */
> +int create_and_preserve_memfd(int session_fd, int token, const char *data);
> +int restore_and_verify_memfd(int session_fd, int token, const char *expected_data);
> +
> +/* Kexec state-tracking helpers */
> +void create_state_file(int luo_fd, const char *session_name, int token,
> + int next_stage);
> +void restore_and_read_stage(int state_session_fd, int token, int *stage);
> +
> +#endif /* LUO_TEST_UTILS_H */
* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
2025-11-16 14:55 ` Pasha Tatashin
@ 2025-11-16 19:16 ` Mike Rapoport
2025-11-17 18:29 ` Pasha Tatashin
0 siblings, 1 reply; 92+ messages in thread
From: Mike Rapoport @ 2025-11-16 19:16 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Sun, Nov 16, 2025 at 09:55:30AM -0500, Pasha Tatashin wrote:
> On Sun, Nov 16, 2025 at 7:43 AM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > > +static int __init liveupdate_early_init(void)
> > > +{
> > > + int err;
> > > +
> > > + err = luo_early_startup();
> > > + if (err) {
> > > + pr_err("The incoming tree failed to initialize properly [%pe], disabling live update\n",
> > > + ERR_PTR(err));
> >
> > How do we report this to the userspace?
> > I think the decision what to do in this case belongs there. Even if it's
> > down to choosing between plain kexec and full reboot, it's still a policy
> > that should be implemented in userspace.
>
> I agree that policy belongs in userspace, and that is how we designed
> it. In this specific failure case (ABI mismatch or corrupt FDT), the
> preserved state is unrecoverable by the kernel. We cannot parse the
> incoming data, so we cannot offer it to userspace.
>
> We report this state by not registering the /dev/liveupdate device.
> When the userspace agent attempts to initialize, it receives ENOENT.
> At that point, the agent exercises its policy:
>
> - Check dmesg for the specific error and report the failure to the
> fleet control plane.
Hmm, this is not nice. I think we still should register /dev/liveupdate and
let userspace discover this error via /dev/liveupdate ABIs.
> - Trigger a fresh (kexec or cold) reboot to reset unreclaimable resources.
>
> Pasha
>
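The agent-side policy described in the quoted text can be sketched in userspace C. This is only an illustration under stated assumptions: probe_liveupdate() and enum luo_avail are hypothetical names, not part of the series, and the only behavior taken from the thread is that opening the device fails with ENOENT when the early init failed and the node was never registered.

```c
#include <errno.h>
#include <fcntl.h>

/* Hypothetical classification of the probe result; not a kernel ABI. */
enum luo_avail {
	LUO_AVAILABLE,		/* device opened; live update can be used */
	LUO_UNAVAILABLE,	/* ENOENT: the device node is not registered */
	LUO_PROBE_ERROR,	/* any other failure, e.g. EACCES */
};

/*
 * Try to open the live update device at @path. On success the open
 * descriptor is stored in *@fdp; on failure *@fdp is set to -1.
 */
enum luo_avail probe_liveupdate(const char *path, int *fdp)
{
	int fd = open(path, O_RDWR);

	*fdp = fd;
	if (fd >= 0)
		return LUO_AVAILABLE;

	/* Per the thread: a failed early init leaves the node unregistered. */
	return errno == ENOENT ? LUO_UNAVAILABLE : LUO_PROBE_ERROR;
}
```

An agent would call this with "/dev/liveupdate" and, on LUO_UNAVAILABLE, report to its control plane and fall back to a fresh reboot. The objection raised above is that ENOENT alone cannot tell the agent why live update is unavailable.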
--
Sincerely yours,
Mike.
* Re: [PATCH v6 01/20] liveupdate: luo_core: luo_ioctl: Live Update Orchestrator
2025-11-15 23:33 ` [PATCH v6 01/20] liveupdate: luo_core: luo_ioctl: " Pasha Tatashin
@ 2025-11-17 2:54 ` Andrew Morton
2025-11-17 14:27 ` Pasha Tatashin
2025-11-18 15:45 ` Pratyush Yadav
1 sibling, 1 reply; 92+ messages in thread
From: Andrew Morton @ 2025-11-17 2:54 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, rppt, dmatlack, rientjes, corbet,
rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Sat, 15 Nov 2025 18:33:47 -0500 Pasha Tatashin <pasha.tatashin@soleen.com> wrote:
> Introduce LUO, a mechanism intended to facilitate kernel updates while
> keeping designated devices operational across the transition (e.g., via
> kexec).
Thanks, I updated mm.git's mm-unstable branch to this version. I
expect at least one more version as a result of feedback for this v6.
I wasn't able to reproduce Stephen's build error
(https://lkml.kernel.org/r/20251117093614.1490d048@canb.auug.org.au)
with this series.
* Re: [PATCH v6 08/20] liveupdate: luo_flb: Introduce File-Lifecycle-Bound global state
2025-11-15 23:33 ` [PATCH v6 08/20] liveupdate: luo_flb: Introduce File-Lifecycle-Bound global state Pasha Tatashin
@ 2025-11-17 9:39 ` Mike Rapoport
2025-11-18 3:54 ` Pasha Tatashin
0 siblings, 1 reply; 92+ messages in thread
From: Mike Rapoport @ 2025-11-17 9:39 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Sat, Nov 15, 2025 at 06:33:54PM -0500, Pasha Tatashin wrote:
> Introduce a mechanism for managing global kernel state whose lifecycle
> is tied to the preservation of one or more files. This is necessary for
> subsystems where multiple preserved file descriptors depend on a single,
> shared underlying resource.
>
> An example is HugeTLB, where multiple file descriptors such as memfd and
> guest_memfd may rely on the state of a single HugeTLB subsystem.
> Preserving this state for each individual file would be redundant and
> incorrect. The state should be preserved only once when the first file
> is preserved, and restored/finished only once the last file is handled.
>
> This patch introduces File-Lifecycle-Bound (FLB) objects to solve this
> problem. An FLB is a global, reference-counted object with a defined set
> of operations:
>
> - A file handler (struct liveupdate_file_handler) declares a dependency
> on one or more FLBs via a new registration function,
> liveupdate_register_flb().
> - When the first file depending on an FLB is preserved, the FLB's
> .preserve() callback is invoked to save the shared global state. The
> reference count is then incremented for each subsequent file.
> - Conversely, when the last file is unpreserved (before reboot) or
> finished (after reboot), the FLB's .unpreserve() or .finish() callback
> is invoked to clean up the global resource.
>
> The implementation includes:
>
> - A new set of ABI definitions (luo_flb_ser, luo_flb_head_ser) and a
> corresponding FDT node (luo-flb) to serialize the state of all active
> FLBs and pass them via Kexec Handover.
> - Core logic in luo_flb.c to manage FLB registration, reference
> counting, and the invocation of lifecycle callbacks.
> - An API (liveupdate_flb_*_locked/*_unlock) for other kernel subsystems
> to safely access the live object managed by an FLB, both before and
> after the live update.
>
> This framework provides the necessary infrastructure for more complex
> subsystems like IOMMU, VFIO, and KVM to integrate with the Live Update
> Orchestrator.
The concept makes sense to me, but it's hard to review the implementation
without an actual user.
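As a rough aid for reading the lifecycle in the commit message, here is a minimal userspace model of the "first file preserves, last file unpreserves" rule. It is a sketch, not the code in luo_flb.c: struct flb_model and every function name below are hypothetical, and locking, serialization and the retrieve/finish side are left out.

```c
#include <assert.h>

/* Hypothetical stand-in for an FLB; field names are illustrative only. */
struct flb_model {
	int (*preserve)(struct flb_model *flb);
	void (*unpreserve)(struct flb_model *flb);
	unsigned long count;	/* preserved files currently using this FLB */
	int preserve_calls;	/* bookkeeping for the example callbacks */
	int unpreserve_calls;
};

/* Called for every preserved file that depends on @flb. */
int flb_file_preserve(struct flb_model *flb)
{
	if (flb->count == 0) {
		int err = flb->preserve(flb);	/* first user saves shared state */

		if (err)
			return err;
	}
	flb->count++;
	return 0;
}

/* Called for every unpreserved file that depends on @flb. */
void flb_file_unpreserve(struct flb_model *flb)
{
	assert(flb->count > 0);
	if (--flb->count == 0)
		flb->unpreserve(flb);	/* last user drops shared state */
}

/* Example callbacks that only count how often they run. */
int demo_preserve(struct flb_model *flb)
{
	flb->preserve_calls++;
	return 0;
}

void demo_unpreserve(struct flb_model *flb)
{
	flb->unpreserve_calls++;
}
```

With two dependent files, demo_preserve() runs once on the first flb_file_preserve() and demo_unpreserve() runs once on the second flb_file_unpreserve().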
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
> include/linux/liveupdate.h | 116 +++++
> include/linux/liveupdate/abi/luo.h | 76 ++++
> kernel/liveupdate/Makefile | 1 +
> kernel/liveupdate/luo_core.c | 7 +-
> kernel/liveupdate/luo_file.c | 8 +
> kernel/liveupdate/luo_flb.c | 658 +++++++++++++++++++++++++++++
> kernel/liveupdate/luo_internal.h | 7 +
> 7 files changed, 872 insertions(+), 1 deletion(-)
> create mode 100644 kernel/liveupdate/luo_flb.c
>
> diff --git a/include/linux/liveupdate.h b/include/linux/liveupdate.h
> index 4a5d4dd9905a..36a831ae3ead 100644
> --- a/include/linux/liveupdate.h
> +++ b/include/linux/liveupdate.h
> @@ -14,6 +14,7 @@
> #include <uapi/linux/liveupdate.h>
>
> struct liveupdate_file_handler;
> +struct liveupdate_flb;
> struct liveupdate_session;
> struct file;
>
> @@ -81,6 +82,7 @@ struct liveupdate_file_ops {
> * associated with individual &struct file instances.
> * @list: Used for linking this handler instance into a global
> * list of registered file handlers.
> + * @flb_list: A list of FLB dependencies.
> *
> * Modules that want to support live update for specific file types should
> * register an instance of this structure. LUO uses this registration to
> @@ -91,6 +93,80 @@ struct liveupdate_file_handler {
> const struct liveupdate_file_ops *ops;
> const char compatible[LIVEUPDATE_HNDL_COMPAT_LENGTH];
> struct list_head list;
> + struct list_head flb_list;
> +};
> +
> +/**
> + * struct liveupdate_flb_op_args - Arguments for FLB operation callbacks.
> + * @flb: The global FLB instance for which this call is performed.
> + * @data: For .preserve(): [OUT] The callback sets this field.
> + * For .unpreserve(): [IN] The handle from .preserve().
> + * For .retrieve(): [IN] The handle from .preserve().
> + * @obj: For .preserve(): [OUT] Sets this to the live object.
> + * For .retrieve(): [OUT] Sets this to the live object.
> + * For .finish(): [IN] The live object from .retrieve().
> + *
> + * This structure bundles all parameters for the FLB operation callbacks.
> + */
> +struct liveupdate_flb_op_args {
> + struct liveupdate_flb *flb;
> + u64 data;
> + void *obj;
> +};
> +
> +/**
> + * struct liveupdate_flb_ops - Callbacks for global File-Lifecycle-Bound data.
> + * @preserve: Called when the first file using this FLB is preserved.
> + * The callback must save its state and return a single,
> + * self-contained u64 handle by setting the 'argp->data'
> + * field and 'argp->obj'.
> + * @unpreserve: Called when the last file using this FLB is unpreserved
> + * (aborted before reboot). Receives the handle via
> + * 'argp->data' and live object via 'argp->obj'.
> + * @retrieve: Called on-demand in the new kernel, the first time a
> + * component requests access to the shared object. It receives
> + * the preserved handle via 'argp->data' and must reconstruct
> + * the live object, returning it by setting the 'argp->obj'
> + * field.
> + * @finish: Called in the new kernel when the last file using this FLB
> + * is finished. Receives the live object via 'argp->obj' for
> + * cleanup.
> + * @owner: Module reference
> + *
> + * Operations that manage global shared data with file bound lifecycle,
> + * triggered by the first file that uses it and concluded by the last file that
> + * uses it, across all sessions.
> + */
> +struct liveupdate_flb_ops {
> + int (*preserve)(struct liveupdate_flb_op_args *argp);
> + void (*unpreserve)(struct liveupdate_flb_op_args *argp);
> + int (*retrieve)(struct liveupdate_flb_op_args *argp);
> + void (*finish)(struct liveupdate_flb_op_args *argp);
> + struct module *owner;
> +};
> +
> +/**
> + * struct liveupdate_flb - A global definition for a shared data object.
> + * @ops: Callback functions
> + * @compatible: The compatibility string (e.g., "iommu-core-v1")
> + * that uniquely identifies the FLB type this handler
> + * supports. This is matched against the compatible string
> + * associated with individual &struct liveupdate_flb
> + * instances.
> + * @list: A global list of registered FLBs.
> + * @internal: Internal state, set in liveupdate_init_flb().
> + *
> + * This struct is the "template" that a driver registers to define a shared,
> + * file-lifecycle-bound object. The actual runtime state (the live object,
> + * refcount, etc.) is managed internally by the LUO core.
> + * Use liveupdate_init_flb() to initialize this struct before using it in
> + * other functions.
> + */
> +struct liveupdate_flb {
> + const struct liveupdate_flb_ops *ops;
> + const char compatible[LIVEUPDATE_FLB_COMPAT_LENGTH];
> + struct list_head list;
> + void *internal;
Can't list be a part of internal?
And don't we usually call this .private rather than .internal?
> };
>
> #ifdef CONFIG_LIVEUPDATE
> @@ -111,6 +187,17 @@ int liveupdate_get_file_incoming(struct liveupdate_session *s, u64 token,
> int liveupdate_get_token_outgoing(struct liveupdate_session *s,
> struct file *file, u64 *tokenp);
>
> +/* Before using FLB for the first time it should be initialized */
> +int liveupdate_init_flb(struct liveupdate_flb *flb);
> +
> +int liveupdate_register_flb(struct liveupdate_file_handler *h,
> + struct liveupdate_flb *flb);
While these are obvious ...
> +
> +int liveupdate_flb_incoming_locked(struct liveupdate_flb *flb, void **objp);
> +void liveupdate_flb_incoming_unlock(struct liveupdate_flb *flb, void *obj);
> +int liveupdate_flb_outgoing_locked(struct liveupdate_flb *flb, void **objp);
> +void liveupdate_flb_outgoing_unlock(struct liveupdate_flb *flb, void *obj);
> +
... it's not very clear what these APIs are for and how they are going to be
used.
> #else /* CONFIG_LIVEUPDATE */
...
> +int liveupdate_register_flb(struct liveupdate_file_handler *h,
> + struct liveupdate_flb *flb)
> +{
> + struct luo_flb_internal *internal = flb->internal;
> + struct luo_flb_link *link __free(kfree) = NULL;
> + static DEFINE_MUTEX(register_flb_lock);
> + struct liveupdate_flb *gflb;
> + struct luo_flb_link *iter;
> +
> + if (!liveupdate_enabled())
> + return -EOPNOTSUPP;
> +
> + if (WARN_ON(!h || !flb || !internal))
> + return -EINVAL;
> +
> + if (WARN_ON(!flb->ops->preserve || !flb->ops->unpreserve ||
> + !flb->ops->retrieve || !flb->ops->finish)) {
> + return -EINVAL;
> + }
> +
> + /*
> + * Once session/files have been deserialized, FLBs cannot be registered,
> + * it is too late. Deserialization uses file handlers, and FLB registers
> + * to file handlers.
> + */
> + if (WARN_ON(luo_session_is_deserialized()))
> + return -EBUSY;
> +
> + /*
> + * File handler must already be registered, as it initializes the
> + * flb_list
> + */
> + if (WARN_ON(list_empty(&h->list)))
> + return -EINVAL;
> +
> + link = kzalloc(sizeof(*link), GFP_KERNEL);
> + if (!link)
> + return -ENOMEM;
> +
> + guard(mutex)(&register_flb_lock);
> +
> + /* Check that this FLB is not already linked to this file handler */
> + list_for_each_entry(iter, &h->flb_list, list) {
> + if (iter->flb == flb)
> + return -EEXIST;
> + }
> +
> + /* Is this FLB linked to global list ? */
Maybe:
/*
* If this FLB is not linked to the global list, this is the first time
* the FLB is registered
*/
> + if (list_empty(&flb->list)) {
> + if (luo_flb_global.count == LUO_FLB_MAX)
> + return -ENOSPC;
> +
> + /* Check that compatible string is unique in global list */
> + list_for_each_entry(gflb, &luo_flb_global.list, list) {
> + if (!strcmp(gflb->compatible, flb->compatible))
> + return -EEXIST;
> + }
> +
> + if (!try_module_get(flb->ops->owner))
> + return -EAGAIN;
> +
> + list_add_tail(&flb->list, &luo_flb_global.list);
> + luo_flb_global.count++;
> + }
> +
> + /* Finally, link the FLB to the file handler */
> + link->flb = flb;
> + list_add_tail(&no_free_ptr(link)->list, &h->flb_list);
> +
> + return 0;
> +}
> +
> +/**
> + * liveupdate_flb_incoming_locked - Lock and retrieve the incoming FLB object.
> + * @flb: The FLB definition.
> + * @objp: Output parameter; will be populated with the live shared object.
> + *
> + * Acquires the FLB's internal lock and returns a pointer to its shared live
> + * object for the incoming (post-reboot) path.
> + *
> + * If this is the first time the object is requested in the new kernel, this
> + * function will trigger the FLB's .retrieve() callback to reconstruct the
> + * object from its preserved state. Subsequent calls will return the same
> + * cached object.
> + *
> + * The caller MUST call liveupdate_flb_incoming_unlock() to release the lock.
> + *
> + * Return: 0 on success, or a negative errno on failure. -ENODATA means no
> + * incoming FLB data, -ENOENT means specific flb not found in the incoming
> + * data, and -EOPNOTSUPP when live update is disabled or not configured.
> + */
> +int liveupdate_flb_incoming_locked(struct liveupdate_flb *flb, void **objp)
> +{
> + struct luo_flb_internal *internal = flb->internal;
> +
> + if (!liveupdate_enabled())
> + return -EOPNOTSUPP;
> +
> + if (WARN_ON(!internal))
> + return -EINVAL;
> +
> + if (!internal->incoming.obj) {
> + int err = luo_flb_retrieve_one(flb);
> +
> + if (err)
> + return err;
> + }
> +
> + mutex_lock(&internal->incoming.lock);
> + *objp = internal->incoming.obj;
> +
> + return 0;
> +}
> +
> +/**
> + * liveupdate_flb_incoming_unlock - Unlock an incoming FLB object.
> + * @flb: The FLB definition.
> + * @obj: The object that was returned by the _locked call (used for validation).
> + *
> + * Releases the internal lock acquired by liveupdate_flb_incoming_locked().
> + */
> +void liveupdate_flb_incoming_unlock(struct liveupdate_flb *flb, void *obj)
> +{
> + struct luo_flb_internal *internal = flb->internal;
> +
> + lockdep_assert_held(&internal->incoming.lock);
> + internal->incoming.obj = obj;
The comment says obj is for validation and here it's assigned to flb.
Something is off here :)
> + mutex_unlock(&internal->incoming.lock);
> +}
> +
> +/**
> + * liveupdate_flb_outgoing_locked - Lock and retrieve the outgoing FLB object.
> + * @flb: The FLB definition.
> + * @objp: Output parameter; will be populated with the live shared object.
> + *
> + * Acquires the FLB's internal lock and returns a pointer to its shared live
> + * object for the outgoing (pre-reboot) path.
> + *
> + * This function assumes the object has already been created by the FLB's
> + * .preserve() callback, which is triggered when the first dependent file
> + * is preserved.
> + *
> + * The caller MUST call liveupdate_flb_outgoing_unlock() to release the lock.
> + *
> + * Return: 0 on success, or a negative errno on failure.
> + */
> +int liveupdate_flb_outgoing_locked(struct liveupdate_flb *flb, void **objp)
> +{
> + struct luo_flb_internal *internal = flb->internal;
> +
> + if (!liveupdate_enabled())
> + return -EOPNOTSUPP;
> +
> + if (WARN_ON(!internal))
> + return -EINVAL;
> +
> + mutex_lock(&internal->outgoing.lock);
> +
> + /* The object must exist if any file is being preserved */
> + if (WARN_ON_ONCE(!internal->outgoing.obj)) {
> + mutex_unlock(&internal->outgoing.lock);
> + return -ENOENT;
> + }
_incoming_locked() and _outgoing_locked() are nearly identical, so it seems
we can have the common part in a
static liveupdate_flb_locked(struct luo_flb_state *state).
liveupdate_flb_incoming_locked() will be a one-line wrapper and
liveupdate_flb_outgoing_locked() will have this WARN_ON if obj is NULL.
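The shape of that refactoring can be sketched in userspace C. This is an assumption-laden model, not the kernel code: a pthread mutex stands in for the kernel mutex, the on-demand .retrieve() step of the incoming path is omitted, and all function names are hypothetical. Only struct luo_flb_state is a name taken from the suggestion above.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-direction state (incoming or outgoing). */
struct luo_flb_state {
	pthread_mutex_t lock;
	void *obj;
};

/* Common part: take the lock and hand back the cached object. */
static int flb_locked(struct luo_flb_state *state, void **objp)
{
	pthread_mutex_lock(&state->lock);
	*objp = state->obj;
	return 0;
}

/* Incoming path: a thin wrapper (the retrieve step is not modeled). */
int flb_incoming_locked(struct luo_flb_state *incoming, void **objp)
{
	return flb_locked(incoming, objp);
}

/* Outgoing path: the object must already exist; fail with the lock dropped. */
int flb_outgoing_locked(struct luo_flb_state *outgoing, void **objp)
{
	int err = flb_locked(outgoing, objp);

	if (!err && !*objp) {
		pthread_mutex_unlock(&outgoing->lock);
		return -ENOENT;
	}
	return err;
}

/* One unlock helper serves both directions. */
void flb_unlock(struct luo_flb_state *state)
{
	pthread_mutex_unlock(&state->lock);
}
```

The same factoring would apply to the two _unlock() functions, which differ only in which state they touch.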
> +
> + *objp = internal->outgoing.obj;
> +
> + return 0;
> +}
> +
> +/**
> + * liveupdate_flb_outgoing_unlock - Unlock an outgoing FLB object.
> + * @flb: The FLB definition.
> + * @obj: The object that was returned by the _locked call (used for validation).
> + *
> + * Releases the internal lock acquired by liveupdate_flb_outgoing_locked().
> + */
> +void liveupdate_flb_outgoing_unlock(struct liveupdate_flb *flb, void *obj)
> +{
> + struct luo_flb_internal *internal = flb->internal;
> +
> + lockdep_assert_held(&internal->outgoing.lock);
> + internal->outgoing.obj = obj;
So is it assignment or validation? ;-)
This one is a copy of liveupdate_flb_incoming_unlock().
> + mutex_unlock(&internal->outgoing.lock);
> +}
> +
--
Sincerely yours,
Mike.
* Re: [PATCH v6 10/20] MAINTAINERS: add liveupdate entry
2025-11-15 23:33 ` [PATCH v6 10/20] MAINTAINERS: add liveupdate entry Pasha Tatashin
@ 2025-11-17 9:40 ` Mike Rapoport
2025-11-17 18:20 ` Pasha Tatashin
0 siblings, 1 reply; 92+ messages in thread
From: Mike Rapoport @ 2025-11-17 9:40 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Sat, Nov 15, 2025 at 06:33:56PM -0500, Pasha Tatashin wrote:
> Add a MAINTAINERS file entry for the new Live Update Orchestrator
> introduced in previous patches.
>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
> MAINTAINERS | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 500789529359..bc9f5c6f0e80 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -14464,6 +14464,17 @@ F: kernel/module/livepatch.c
> F: samples/livepatch/
> F: tools/testing/selftests/livepatch/
>
> +LIVE UPDATE
> +M: Pasha Tatashin <pasha.tatashin@soleen.com>
Please count me in :)
> +L: linux-kernel@vger.kernel.org
> +S: Maintained
> +F: Documentation/core-api/liveupdate.rst
> +F: Documentation/userspace-api/liveupdate.rst
> +F: include/linux/liveupdate.h
> +F: include/linux/liveupdate/
> +F: include/uapi/linux/liveupdate.h
> +F: kernel/liveupdate/
> +
> LLC (802.2)
> L: netdev@vger.kernel.org
> S: Odd fixes
> --
> 2.52.0.rc1.455.g30608eb744-goog
>
--
Sincerely yours,
Mike.
* Re: [PATCH v6 11/20] mm: shmem: use SHMEM_F_* flags instead of VM_* flags
2025-11-15 23:33 ` [PATCH v6 11/20] mm: shmem: use SHMEM_F_* flags instead of VM_* flags Pasha Tatashin
@ 2025-11-17 9:48 ` Mike Rapoport
2025-11-17 18:25 ` Pasha Tatashin
0 siblings, 1 reply; 92+ messages in thread
From: Mike Rapoport @ 2025-11-17 9:48 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Sat, Nov 15, 2025 at 06:33:57PM -0500, Pasha Tatashin wrote:
> From: Pratyush Yadav <ptyadav@amazon.de>
>
> shmem_inode_info::flags can have the VM flags VM_NORESERVE and
> VM_LOCKED. These are used to suppress pre-accounting or to lock the
> pages in the inode respectively. Using the VM flags directly makes it
> difficult to add shmem-specific flags that are unrelated to VM behavior
> since one would need to find a VM flag not used by shmem and re-purpose
> it.
>
> Introduce SHMEM_F_NORESERVE and SHMEM_F_LOCKED which represent the same
> information, but their bits are independent of the VM flags. Callers can
> still pass VM_NORESERVE to shmem_get_inode(), but it gets transformed to
> the shmem-specific flag internally.
>
> No functional changes intended.
>
> Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
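The translation at the heart of this patch can be shown as a small standalone sketch. The VM_* values below are illustrative stand-ins for the kernel definitions, and shmem_flags_from_vm() is a hypothetical helper name; the patch open-codes the same expression in __shmem_get_inode() and __shmem_file_setup().

```c
/* Illustrative values; the real definitions live in kernel headers. */
#define VM_LOCKED		0x00002000UL
#define VM_NORESERVE		0x00200000UL

#define SHMEM_F_NORESERVE	(1UL << 0)	/* BIT(0) in the patch */
#define SHMEM_F_LOCKED		(1UL << 1)	/* BIT(1) in the patch */

/* Map the one VM flag callers may pass into the shmem-private flag space. */
unsigned long shmem_flags_from_vm(unsigned long vm_flags)
{
	return (vm_flags & VM_NORESERVE) ? SHMEM_F_NORESERVE : 0;
}
```

Because the shmem bits no longer alias the VM bits, any other VM flag passed in is dropped instead of leaking into shmem_inode_info::flags.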
> ---
> include/linux/shmem_fs.h | 6 ++++++
> mm/shmem.c | 28 +++++++++++++++-------------
> 2 files changed, 21 insertions(+), 13 deletions(-)
>
> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> index 0e47465ef0fd..650874b400b5 100644
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -10,6 +10,7 @@
> #include <linux/xattr.h>
> #include <linux/fs_parser.h>
> #include <linux/userfaultfd_k.h>
> +#include <linux/bits.h>
>
> struct swap_iocb;
>
> @@ -19,6 +20,11 @@ struct swap_iocb;
> #define SHMEM_MAXQUOTAS 2
> #endif
>
> +/* Suppress pre-accounting of the entire object size. */
> +#define SHMEM_F_NORESERVE BIT(0)
> +/* Disallow swapping. */
> +#define SHMEM_F_LOCKED BIT(1)
> +
> struct shmem_inode_info {
> spinlock_t lock;
> unsigned int seals; /* shmem seals */
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 58701d14dd96..1d5036dec08a 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -175,20 +175,20 @@ static inline struct shmem_sb_info *SHMEM_SB(struct super_block *sb)
> */
> static inline int shmem_acct_size(unsigned long flags, loff_t size)
> {
> - return (flags & VM_NORESERVE) ?
> + return (flags & SHMEM_F_NORESERVE) ?
> 0 : security_vm_enough_memory_mm(current->mm, VM_ACCT(size));
> }
>
> static inline void shmem_unacct_size(unsigned long flags, loff_t size)
> {
> - if (!(flags & VM_NORESERVE))
> + if (!(flags & SHMEM_F_NORESERVE))
> vm_unacct_memory(VM_ACCT(size));
> }
>
> static inline int shmem_reacct_size(unsigned long flags,
> loff_t oldsize, loff_t newsize)
> {
> - if (!(flags & VM_NORESERVE)) {
> + if (!(flags & SHMEM_F_NORESERVE)) {
> if (VM_ACCT(newsize) > VM_ACCT(oldsize))
> return security_vm_enough_memory_mm(current->mm,
> VM_ACCT(newsize) - VM_ACCT(oldsize));
> @@ -206,7 +206,7 @@ static inline int shmem_reacct_size(unsigned long flags,
> */
> static inline int shmem_acct_blocks(unsigned long flags, long pages)
> {
> - if (!(flags & VM_NORESERVE))
> + if (!(flags & SHMEM_F_NORESERVE))
> return 0;
>
> return security_vm_enough_memory_mm(current->mm,
> @@ -215,7 +215,7 @@ static inline int shmem_acct_blocks(unsigned long flags, long pages)
>
> static inline void shmem_unacct_blocks(unsigned long flags, long pages)
> {
> - if (flags & VM_NORESERVE)
> + if (flags & SHMEM_F_NORESERVE)
> vm_unacct_memory(pages * VM_ACCT(PAGE_SIZE));
> }
>
> @@ -1551,7 +1551,7 @@ int shmem_writeout(struct folio *folio, struct swap_iocb **plug,
> int nr_pages;
> bool split = false;
>
> - if ((info->flags & VM_LOCKED) || sbinfo->noswap)
> + if ((info->flags & SHMEM_F_LOCKED) || sbinfo->noswap)
> goto redirty;
>
> if (!total_swap_pages)
> @@ -2910,15 +2910,15 @@ int shmem_lock(struct file *file, int lock, struct ucounts *ucounts)
> * ipc_lock_object() when called from shmctl_do_lock(),
> * no serialization needed when called from shm_destroy().
> */
> - if (lock && !(info->flags & VM_LOCKED)) {
> + if (lock && !(info->flags & SHMEM_F_LOCKED)) {
> if (!user_shm_lock(inode->i_size, ucounts))
> goto out_nomem;
> - info->flags |= VM_LOCKED;
> + info->flags |= SHMEM_F_LOCKED;
> mapping_set_unevictable(file->f_mapping);
> }
> - if (!lock && (info->flags & VM_LOCKED) && ucounts) {
> + if (!lock && (info->flags & SHMEM_F_LOCKED) && ucounts) {
> user_shm_unlock(inode->i_size, ucounts);
> - info->flags &= ~VM_LOCKED;
> + info->flags &= ~SHMEM_F_LOCKED;
> mapping_clear_unevictable(file->f_mapping);
> }
> retval = 0;
> @@ -3062,7 +3062,7 @@ static struct inode *__shmem_get_inode(struct mnt_idmap *idmap,
> spin_lock_init(&info->lock);
> atomic_set(&info->stop_eviction, 0);
> info->seals = F_SEAL_SEAL;
> - info->flags = flags & VM_NORESERVE;
> + info->flags = (flags & VM_NORESERVE) ? SHMEM_F_NORESERVE : 0;
> info->i_crtime = inode_get_mtime(inode);
> info->fsflags = (dir == NULL) ? 0 :
> SHMEM_I(dir)->fsflags & SHMEM_FL_INHERITED;
> @@ -5804,8 +5804,10 @@ static inline struct inode *shmem_get_inode(struct mnt_idmap *idmap,
> /* common code */
>
> static struct file *__shmem_file_setup(struct vfsmount *mnt, const char *name,
> - loff_t size, unsigned long flags, unsigned int i_flags)
> + loff_t size, unsigned long vm_flags,
> + unsigned int i_flags)
> {
> + unsigned long flags = (vm_flags & VM_NORESERVE) ? SHMEM_F_NORESERVE : 0;
> struct inode *inode;
> struct file *res;
>
> @@ -5822,7 +5824,7 @@ static struct file *__shmem_file_setup(struct vfsmount *mnt, const char *name,
> return ERR_PTR(-ENOMEM);
>
> inode = shmem_get_inode(&nop_mnt_idmap, mnt->mnt_sb, NULL,
> - S_IFREG | S_IRWXUGO, 0, flags);
> + S_IFREG | S_IRWXUGO, 0, vm_flags);
> if (IS_ERR(inode)) {
> shmem_unacct_size(flags, size);
> return ERR_CAST(inode);
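To illustrate why the plain mask (info->flags = flags & VM_NORESERVE) has to
become a translation once shmem has its own flag namespace, here is a small
userspace sketch. It is not kernel code: the VM_NORESERVE value is a stand-in
chosen for the model, and shmem_flags_from_vm() is a hypothetical helper name
that does not appear in the patch.

```c
#include <assert.h>

/* Stand-in values for this userspace model only. */
#define VM_NORESERVE       0x00200000UL  /* mm-wide VMA flag (model value) */
#define SHMEM_F_NORESERVE  (1UL << 0)    /* shmem-private flag, BIT(0) in the patch */

/* The two namespaces use different bits, so masking alone cannot convert them. */
_Static_assert((VM_NORESERVE & SHMEM_F_NORESERVE) == 0,
	       "VM and shmem flag bits do not overlap in this model");

/* Hypothetical helper: translate a VM flag word into shmem-private flags. */
static unsigned long shmem_flags_from_vm(unsigned long vm_flags)
{
	return (vm_flags & VM_NORESERVE) ? SHMEM_F_NORESERVE : 0;
}
```

With these values, a mask-only conversion would store bit 21 in info->flags,
and a later test against SHMEM_F_NORESERVE would never match.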
> --
> 2.52.0.rc1.455.g30608eb744-goog
>
--
Sincerely yours,
Mike.
* Re: [PATCH v6 12/20] mm: shmem: allow freezing inode mapping
2025-11-15 23:33 ` [PATCH v6 12/20] mm: shmem: allow freezing inode mapping Pasha Tatashin
@ 2025-11-17 10:08 ` Mike Rapoport
2025-11-18 4:13 ` Pasha Tatashin
0 siblings, 1 reply; 92+ messages in thread
From: Mike Rapoport @ 2025-11-17 10:08 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Sat, Nov 15, 2025 at 06:33:58PM -0500, Pasha Tatashin wrote:
> From: Pratyush Yadav <ptyadav@amazon.de>
>
> To prepare a shmem inode for live update via the Live Update
> Orchestrator (LUO), its index -> folio mappings must be serialized. Once
> the mappings are serialized, they cannot change since it would cause the
> serialized data to become inconsistent. This can be done by pinning the
> folios to avoid migration, and by making sure no folios can be added to
> or removed from the inode.
>
> While mechanisms to pin folios already exist, the only way to stop
> folios being added or removed are the grow and shrink file seals. But
> file seals come with their own semantics, one of which is that they
> can't be removed. This doesn't work with liveupdate since it can be
> cancelled or error out, which would need the seals to be removed and the
> file's normal functionality to be restored.
>
> Introduce SHMEM_F_MAPPING_FROZEN to indicate this instead. It is
> internal to shmem and is not directly exposed to userspace. It functions
> similar to F_SEAL_GROW | F_SEAL_SHRINK, but additionally disallows hole
> punching, and can be removed.
>
> Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
> include/linux/shmem_fs.h | 17 +++++++++++++++++
> mm/shmem.c | 12 +++++++++++-
> 2 files changed, 28 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> index 650874b400b5..a9f5db472a39 100644
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -24,6 +24,14 @@ struct swap_iocb;
> #define SHMEM_F_NORESERVE BIT(0)
> /* Disallow swapping. */
> #define SHMEM_F_LOCKED BIT(1)
> +/*
> + * Disallow growing, shrinking, or hole punching in the inode. Combined with
> + * folio pinning, makes sure the inode's mapping stays fixed.
> + *
> + * In some ways similar to F_SEAL_GROW | F_SEAL_SHRINK, but can be removed and
> + * isn't directly visible to userspace.
> + */
> +#define SHMEM_F_MAPPING_FROZEN BIT(2)
>
> struct shmem_inode_info {
> spinlock_t lock;
> @@ -186,6 +194,15 @@ static inline bool shmem_file(struct file *file)
> return shmem_mapping(file->f_mapping);
> }
>
> +/* Must be called with inode lock taken exclusive. */
> +static inline void shmem_i_mapping_freeze(struct inode *inode, bool freeze)
_mapping usually refers to operations on struct address_space.
It seems that all shmem methods that take inode are just shmem_<operation>,
so shmem_freeze() looks more appropriate.
> +{
> + if (freeze)
> + SHMEM_I(inode)->flags |= SHMEM_F_MAPPING_FROZEN;
> + else
> + SHMEM_I(inode)->flags &= ~SHMEM_F_MAPPING_FROZEN;
> +}
> +
> /*
> * If fallocate(FALLOC_FL_KEEP_SIZE) has been used, there may be pages
> * beyond i_size's notion of EOF, which fallocate has committed to reserving:
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 1d5036dec08a..05c3db840257 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1292,7 +1292,8 @@ static int shmem_setattr(struct mnt_idmap *idmap,
> loff_t newsize = attr->ia_size;
>
> /* protected by i_rwsem */
> - if ((newsize < oldsize && (info->seals & F_SEAL_SHRINK)) ||
> + if ((info->flags & SHMEM_F_MAPPING_FROZEN) ||
A corner case: if newsize == oldsize, this will be a false positive.
> + (newsize < oldsize && (info->seals & F_SEAL_SHRINK)) ||
> (newsize > oldsize && (info->seals & F_SEAL_GROW)))
> return -EPERM;
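One possible way to avoid the newsize == oldsize false positive, sketched as a
userspace model rather than as the actual fix: only treat the frozen flag as
an error when the size really changes. The function name and the MODEL_*
constants are assumptions made for this sketch; the seal values mirror
F_SEAL_SHRINK and F_SEAL_GROW from <linux/fcntl.h>.

```c
#include <assert.h>
#include <errno.h>

#define SHMEM_F_MAPPING_FROZEN (1UL << 2)  /* BIT(2), as in the patch */
#define MODEL_F_SEAL_SHRINK    0x0002      /* mirrors F_SEAL_SHRINK */
#define MODEL_F_SEAL_GROW      0x0004      /* mirrors F_SEAL_GROW */

/* Userspace model of the size check in shmem_setattr(); not kernel code. */
static int model_setattr_size_check(unsigned long flags, unsigned int seals,
				    long long oldsize, long long newsize)
{
	if ((newsize != oldsize && (flags & SHMEM_F_MAPPING_FROZEN)) ||
	    (newsize < oldsize && (seals & MODEL_F_SEAL_SHRINK)) ||
	    (newsize > oldsize && (seals & MODEL_F_SEAL_GROW)))
		return -EPERM;

	return 0;
}
```

In this model a same-size truncate on a frozen inode succeeds as a no-op,
while any real size change is still rejected.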
>
> @@ -3289,6 +3290,10 @@ shmem_write_begin(const struct kiocb *iocb, struct address_space *mapping,
> return -EPERM;
> }
>
> + if (unlikely((info->flags & SHMEM_F_MAPPING_FROZEN) &&
> + pos + len > inode->i_size))
> + return -EPERM;
> +
> ret = shmem_get_folio(inode, index, pos + len, &folio, SGP_WRITE);
> if (ret)
> return ret;
> @@ -3662,6 +3667,11 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
>
> inode_lock(inode);
>
> + if (info->flags & SHMEM_F_MAPPING_FROZEN) {
> + error = -EPERM;
> + goto out;
> + }
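For reference, the intended semantics of the frozen flag across the write and
fallocate paths can be modelled in userspace roughly as follows. This is a
sketch under assumptions: struct model_inode and the model_* functions are
invented for illustration and only mirror the checks visible in the hunks
above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SHMEM_F_MAPPING_FROZEN (1UL << 2)  /* BIT(2), as in the patch */

/* Minimal stand-in for the inode state the checks look at. */
struct model_inode {
	unsigned long flags;
	long long i_size;
};

/* Mirrors shmem_i_mapping_freeze(): set or clear the frozen bit. */
static void model_freeze(struct model_inode *inode, bool freeze)
{
	if (freeze)
		inode->flags |= SHMEM_F_MAPPING_FROZEN;
	else
		inode->flags &= ~SHMEM_F_MAPPING_FROZEN;
}

/* Writes inside i_size stay allowed; writes that would extend it do not. */
static int model_write_begin(const struct model_inode *inode,
			     long long pos, long long len)
{
	if ((inode->flags & SHMEM_F_MAPPING_FROZEN) &&
	    pos + len > inode->i_size)
		return -EPERM;
	return 0;
}

/* Any fallocate (allocation or hole punch) is refused while frozen. */
static int model_fallocate(const struct model_inode *inode)
{
	if (inode->flags & SHMEM_F_MAPPING_FROZEN)
		return -EPERM;
	return 0;
}
```

Unlike a file seal, clearing the bit restores the previous behaviour, which is
what makes the flag usable when a live update is cancelled.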
> +
> if (mode & FALLOC_FL_PUNCH_HOLE) {
> struct address_space *mapping = file->f_mapping;
> loff_t unmap_start = round_up(offset, PAGE_SIZE);
> --
> 2.52.0.rc1.455.g30608eb744-goog
>
--
Sincerely yours,
Mike.
* Re: [PATCH v6 13/20] mm: shmem: export some functions to internal.h
2025-11-15 23:33 ` [PATCH v6 13/20] mm: shmem: export some functions to internal.h Pasha Tatashin
@ 2025-11-17 10:14 ` Mike Rapoport
2025-11-17 18:43 ` Pasha Tatashin
0 siblings, 1 reply; 92+ messages in thread
From: Mike Rapoport @ 2025-11-17 10:14 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Sat, Nov 15, 2025 at 06:33:59PM -0500, Pasha Tatashin wrote:
> From: Pratyush Yadav <ptyadav@amazon.de>
>
> shmem_inode_acct_blocks(), shmem_recalc_inode(), and
> shmem_add_to_page_cache() are used by shmem_alloc_and_add_folio(). This
> functionality will also be used in the future by Live Update
> Orchestrator (LUO) to recreate memfd files after a live update.
I'd rephrase this a bit to say that it will be used by memfd integration
into LUO to emphasize this stays inside mm.
Other than that
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>
> Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
> mm/internal.h | 6 ++++++
> mm/shmem.c | 10 +++++-----
> 2 files changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 1561fc2ff5b8..4ba155524f80 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1562,6 +1562,12 @@ void __meminit __init_page_from_nid(unsigned long pfn, int nid);
> unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
> int priority);
>
> +int shmem_add_to_page_cache(struct folio *folio,
> + struct address_space *mapping,
> + pgoff_t index, void *expected, gfp_t gfp);
> +int shmem_inode_acct_blocks(struct inode *inode, long pages);
> +bool shmem_recalc_inode(struct inode *inode, long alloced, long swapped);
> +
> #ifdef CONFIG_SHRINKER_DEBUG
> static inline __printf(2, 0) int shrinker_debugfs_name_alloc(
> struct shrinker *shrinker, const char *fmt, va_list ap)
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 05c3db840257..c3dc4af59c14 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -219,7 +219,7 @@ static inline void shmem_unacct_blocks(unsigned long flags, long pages)
> vm_unacct_memory(pages * VM_ACCT(PAGE_SIZE));
> }
>
> -static int shmem_inode_acct_blocks(struct inode *inode, long pages)
> +int shmem_inode_acct_blocks(struct inode *inode, long pages)
> {
> struct shmem_inode_info *info = SHMEM_I(inode);
> struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
> @@ -435,7 +435,7 @@ static void shmem_free_inode(struct super_block *sb, size_t freed_ispace)
> *
> * Return: true if swapped was incremented from 0, for shmem_writeout().
> */
> -static bool shmem_recalc_inode(struct inode *inode, long alloced, long swapped)
> +bool shmem_recalc_inode(struct inode *inode, long alloced, long swapped)
> {
> struct shmem_inode_info *info = SHMEM_I(inode);
> bool first_swapped = false;
> @@ -861,9 +861,9 @@ static void shmem_update_stats(struct folio *folio, int nr_pages)
> /*
> * Somewhat like filemap_add_folio, but error if expected item has gone.
> */
> -static int shmem_add_to_page_cache(struct folio *folio,
> - struct address_space *mapping,
> - pgoff_t index, void *expected, gfp_t gfp)
> +int shmem_add_to_page_cache(struct folio *folio,
> + struct address_space *mapping,
> + pgoff_t index, void *expected, gfp_t gfp)
> {
> XA_STATE_ORDER(xas, &mapping->i_pages, index, folio_order(folio));
> unsigned long nr = folio_nr_pages(folio);
> --
> 2.52.0.rc1.455.g30608eb744-goog
>
--
Sincerely yours,
Mike.
* Re: [PATCH v6 14/20] liveupdate: luo_file: add private argument to store runtime state
2025-11-15 23:34 ` [PATCH v6 14/20] liveupdate: luo_file: add private argument to store runtime state Pasha Tatashin
@ 2025-11-17 10:15 ` Mike Rapoport
2025-11-17 18:45 ` Pasha Tatashin
0 siblings, 1 reply; 92+ messages in thread
From: Mike Rapoport @ 2025-11-17 10:15 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Sat, Nov 15, 2025 at 06:34:00PM -0500, Pasha Tatashin wrote:
> From: Pratyush Yadav <pratyush@kernel.org>
>
> Currently file handlers only get the serialized_data field to store
> their state. This field has a pointer to the serialized state of the
> file, and it becomes a part of LUO file's serialized state.
>
> File handlers can also need some runtime state to track information that
> shouldn't make it in the serialized data.
>
> One such example is a vmalloc pointer. While kho_preserve_vmalloc()
> preserves the memory backing a vmalloc allocation, it does not store the
> original vmap pointer, since that has no use being passed to the next
> kernel. The pointer is needed to free the memory in case the file is
> unpreserved.
>
> Provide a private field in struct luo_file and pass it to all the
> callbacks. The field can be set by preserve, and must be freed by
> unpreserve.
>
> Signed-off-by: Pratyush Yadav <pratyush@kernel.org>
> Co-developed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> ---
> include/linux/liveupdate.h | 5 +++++
> kernel/liveupdate/luo_file.c | 9 +++++++++
> 2 files changed, 14 insertions(+)
>
> diff --git a/include/linux/liveupdate.h b/include/linux/liveupdate.h
> index 36a831ae3ead..defc69a1985d 100644
> --- a/include/linux/liveupdate.h
> +++ b/include/linux/liveupdate.h
> @@ -29,6 +29,10 @@ struct file;
> * this to the file being operated on.
> * @serialized_data: The opaque u64 handle, preserve/prepare/freeze may update
> * this field.
> + * @private_data: Private data for the file used to hold runtime state that
> + * is not preserved. Set by the handler's .preserve()
> + * callback, and must be freed in the handler's
> + * .unpreserve() callback.
> *
> * This structure bundles all parameters for the file operation callbacks.
> * The 'data' and 'file' fields are used for both input and output.
> @@ -39,6 +43,7 @@ struct liveupdate_file_op_args {
> bool retrieved;
> struct file *file;
> u64 serialized_data;
> + void *private_data;
> };
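A rough userspace sketch of the lifecycle described in the kernel-doc above:
.preserve() allocates the runtime-only state and stores it in private_data,
and .unpreserve() frees it. Everything here is a model under assumptions:
struct model_op_args stands in for struct liveupdate_file_op_args, struct
model_private and the handle value 0x1000 are hypothetical, and malloc()/free()
replace the kernel allocators.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for struct liveupdate_file_op_args. */
struct model_op_args {
	uint64_t serialized_data;  /* handed over to the next kernel */
	void *private_data;        /* runtime-only state, never serialized */
};

/* Hypothetical runtime state, e.g. a pointer needed only for cleanup. */
struct model_private {
	void *vmap_ptr;
};

/* Model of a handler's .preserve(): allocate and publish the private state. */
static int model_preserve(struct model_op_args *args)
{
	struct model_private *priv = calloc(1, sizeof(*priv));

	if (!priv)
		return -1;

	priv->vmap_ptr = malloc(64);
	if (!priv->vmap_ptr) {
		free(priv);
		return -1;
	}

	args->serialized_data = 0x1000;  /* placeholder handle for the model */
	args->private_data = priv;
	return 0;
}

/* Model of a handler's .unpreserve(): release what .preserve() allocated. */
static void model_unpreserve(struct model_op_args *args)
{
	struct model_private *priv = args->private_data;

	if (!priv)
		return;

	free(priv->vmap_ptr);
	free(priv);
	args->private_data = NULL;
}
```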
>
> /**
> diff --git a/kernel/liveupdate/luo_file.c b/kernel/liveupdate/luo_file.c
> index 3d3bd84cb281..df337c9c4f21 100644
> --- a/kernel/liveupdate/luo_file.c
> +++ b/kernel/liveupdate/luo_file.c
> @@ -126,6 +126,10 @@ static LIST_HEAD(luo_file_handler_list);
> * This handle is passed back to the handler's .freeze(),
> * .retrieve(), and .finish() callbacks, allowing it to track
> * and update its serialized state across phases.
> + * @private_data: Pointer to the private data for the file used to hold runtime
> + * state that is not preserved. Set by the handler's .preserve()
> + * callback, and must be freed in the handler's .unpreserve()
> + * callback.
> * @retrieved: A flag indicating whether a user/kernel in the new kernel has
> * successfully called retrieve() on this file. This prevents
> * multiple retrieval attempts.
> @@ -152,6 +156,7 @@ struct luo_file {
> struct liveupdate_file_handler *fh;
> struct file *file;
> u64 serialized_data;
> + void *private_data;
> bool retrieved;
> struct mutex mutex;
> struct list_head list;
> @@ -309,6 +314,7 @@ int luo_preserve_file(struct luo_session *session, u64 token, int fd)
> goto exit_err;
> } else {
> luo_file->serialized_data = args.serialized_data;
> + luo_file->private_data = args.private_data;
> list_add_tail(&luo_file->list, &session->files_list);
> session->count++;
> }
> @@ -356,6 +362,7 @@ void luo_file_unpreserve_files(struct luo_session *session)
> args.session = (struct liveupdate_session *)session;
> args.file = luo_file->file;
> args.serialized_data = luo_file->serialized_data;
> + args.private_data = luo_file->private_data;
> luo_file->fh->ops->unpreserve(&args);
> luo_flb_file_unpreserve(luo_file->fh);
>
> @@ -384,6 +391,7 @@ static int luo_file_freeze_one(struct luo_session *session,
> args.session = (struct liveupdate_session *)session;
> args.file = luo_file->file;
> args.serialized_data = luo_file->serialized_data;
> + args.private_data = luo_file->private_data;
>
> err = luo_file->fh->ops->freeze(&args);
> if (!err)
> @@ -405,6 +413,7 @@ static void luo_file_unfreeze_one(struct luo_session *session,
> args.session = (struct liveupdate_session *)session;
> args.file = luo_file->file;
> args.serialized_data = luo_file->serialized_data;
> + args.private_data = luo_file->private_data;
>
> luo_file->fh->ops->unfreeze(&args);
> }
> --
> 2.52.0.rc1.455.g30608eb744-goog
>
--
Sincerely yours,
Mike.
* Re: [PATCH v6 15/20] mm: memfd_luo: allow preserving memfd
2025-11-15 23:34 ` [PATCH v6 15/20] mm: memfd_luo: allow preserving memfd Pasha Tatashin
@ 2025-11-17 11:03 ` Mike Rapoport
2025-11-19 21:56 ` Pasha Tatashin
0 siblings, 1 reply; 92+ messages in thread
From: Mike Rapoport @ 2025-11-17 11:03 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Sat, Nov 15, 2025 at 06:34:01PM -0500, Pasha Tatashin wrote:
> From: Pratyush Yadav <ptyadav@amazon.de>
>
> The ability to preserve a memfd allows userspace to use KHO and LUO to
> transfer its memory contents to the next kernel. This is useful in many
> ways. For one, it can be used with IOMMUFD as the backing store for
> IOMMU page tables. Preserving IOMMUFD is essential for performing a
> hypervisor live update with passthrough devices. memfd support provides
> the first building block for making that possible.
>
> For another, for applications with a large amount of memory that takes
> time to reconstruct, reboots to consume kernel upgrades can be very
> expensive. memfd with LUO gives those applications reboot-persistent
> memory that they can use to quickly save and reconstruct that state.
>
> While memfd is backed by either hugetlbfs or shmem, currently only
> support on shmem is added. To be more precise, support for anonymous
> shmem files is added.
>
> The handover to the next kernel is not transparent. Not all properties
> of the file are preserved; only its memory contents, position, and
> size. The recreated file gets the UID and GID of the task doing the
> restore, and the task's cgroup gets charged with the memory.
>
> Once preserved, the file cannot grow or shrink, and all its pages are
> pinned to avoid migrations and swapping. The file can still be read from
> or written to.
>
> Use vmalloc to get the buffer to hold the folios, and preserve
> it using kho_preserve_vmalloc(). This doesn't have the size limit.
>
> Co-developed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
The order of the Signed-off-by tags seems wrong; Pasha's should be the last one.
> ---
...
> +/**
> + * DOC: memfd Live Update ABI
> + *
> + * This header defines the ABI for preserving the state of a memfd across a
> + * kexec reboot using the LUO.
> + *
> + * The state is serialized into a Flattened Device Tree which is then handed
> + * over to the next kernel via the KHO mechanism. The FDT is passed as the
> + * opaque `data` handle in the file handler callbacks.
> + *
> + * This interface is a contract. Any modification to the FDT structure,
> + * node properties, compatible string, or the layout of the serialization
> + * structures defined here constitutes a breaking change. Such changes require
> + * incrementing the version number in the MEMFD_LUO_FH_COMPATIBLE string.
The same comment about contract as for the generic LUO documentation
applies here (https://lore.kernel.org/all/aRnG8wDSSAtkEI_z@kernel.org/)
> + *
> + * FDT Structure Overview:
> + * The memfd state is contained within a single FDT with the following layout:
...
> +static struct memfd_luo_folio_ser *memfd_luo_preserve_folios(struct file *file, void *fdt,
> + u64 *nr_foliosp)
> +{
If we are already returning nr_folios by reference, we might do it for
memfd_luo_folio_ser as well and make the function return int.
> + struct inode *inode = file_inode(file);
> + struct memfd_luo_folio_ser *pfolios;
> + struct kho_vmalloc *kho_vmalloc;
> + unsigned int max_folios;
> + long i, size, nr_pinned;
> + struct folio **folios;
pfolios and folios read as if the former is a pointer to the latter.
I'd s/pfolios/folios_ser/
> + int err = -EINVAL;
> + pgoff_t offset;
> + u64 nr_folios;
...
> + kvfree(folios);
> + *nr_foliosp = nr_folios;
> + return pfolios;
> +
> +err_unpreserve:
> + i--;
> + for (; i >= 0; i--)
Maybe a single line
for (--i; i >= 0; --i)
> + kho_unpreserve_folio(folios[i]);
> + vfree(pfolios);
> +err_unpin:
> + unpin_folios(folios, nr_folios);
> +err_free_folios:
> + kvfree(folios);
> + return ERR_PTR(err);
> +}
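To make the two suggestions above concrete (return int with the array passed
back by reference, and the single-line unwind loop), here is a userspace
sketch of that shape. It is only an illustration: struct model_item,
model_preserve_one() and the fail_at knob are hypothetical and stand in for
the folio preservation steps.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct model_item {
	bool preserved;
};

/* Hypothetical per-item step; fails at index fail_at (-1 means never). */
static int model_preserve_one(struct model_item *it, long idx, long fail_at)
{
	if (idx == fail_at)
		return -ENOMEM;
	it->preserved = true;
	return 0;
}

/* Returns 0 or a negative errno; results are written only on success. */
static int model_preserve_all(struct model_item *items, long nr, long fail_at,
			      long **out_ser, long *out_nr)
{
	long *ser;
	long i;
	int err;

	ser = calloc(nr ? nr : 1, sizeof(*ser));
	if (!ser)
		return -ENOMEM;

	for (i = 0; i < nr; i++) {
		err = model_preserve_one(&items[i], i, fail_at);
		if (err)
			goto err_unpreserve;
		ser[i] = i;
	}

	*out_ser = ser;
	*out_nr = nr;
	return 0;

err_unpreserve:
	/* Undo only the items that were preserved before the failure. */
	for (--i; i >= 0; --i)
		items[i].preserved = false;
	free(ser);
	return err;
}
```

With an int return value the caller no longer needs IS_ERR()/PTR_ERR() on the
result, and both outputs are left untouched when the function fails.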
> +
> +static void memfd_luo_unpreserve_folios(void *fdt, struct memfd_luo_folio_ser *pfolios,
> + u64 nr_folios)
> +{
> + struct kho_vmalloc *kho_vmalloc;
> + long i;
> +
> + if (!nr_folios)
> + return;
> +
> + kho_vmalloc = (struct kho_vmalloc *)fdt_getprop(fdt, 0, MEMFD_FDT_FOLIOS, NULL);
> + /* The FDT was created by this kernel so expect it to be sane. */
> + WARN_ON_ONCE(!kho_vmalloc);
The FDT won't have FOLIOS property if size was zero, will it?
I think that if we add kho_vmalloc handle to struct memfd_luo_private and
pass that around it will make things easier and simpler.
> + kho_unpreserve_vmalloc(kho_vmalloc);
> +
> + for (i = 0; i < nr_folios; i++) {
> + const struct memfd_luo_folio_ser *pfolio = &pfolios[i];
> + struct folio *folio;
> +
> + if (!pfolio->foliodesc)
> + continue;
How can this happen? Can pfolios be a sparse array?
> + folio = pfn_folio(PRESERVED_FOLIO_PFN(pfolio->foliodesc));
> +
> + kho_unpreserve_folio(folio);
> + unpin_folio(folio);
> + }
> +
> + vfree(pfolios);
> +}
...
> +static void memfd_luo_finish(struct liveupdate_file_op_args *args)
> +{
> + const struct memfd_luo_folio_ser *pfolios;
> + struct folio *fdt_folio;
> + const void *fdt;
> + u64 nr_folios;
> +
> + if (args->retrieved)
> + return;
> +
> + fdt_folio = memfd_luo_get_fdt(args->serialized_data);
> + if (!fdt_folio) {
> + pr_err("failed to restore memfd FDT\n");
> + return;
> + }
> +
> + fdt = folio_address(fdt_folio);
> +
> + pfolios = memfd_luo_fdt_folios(fdt, &nr_folios);
> + if (!pfolios)
> + goto out;
> +
> + memfd_luo_discard_folios(pfolios, nr_folios);
Doesn't this free the actual folios that were supposed to be preserved?
> + vfree(pfolios);
> +
> +out:
> + folio_put(fdt_folio);
> +}
...
> +static int memfd_luo_retrieve(struct liveupdate_file_op_args *args)
> +{
> + struct folio *fdt_folio;
> + const u64 *pos, *size;
> + struct file *file;
> + int len, ret = 0;
> + const void *fdt;
> +
> + fdt_folio = memfd_luo_get_fdt(args->serialized_data);
Why do we need to kho_restore_folio() twice? Here and in
memfd_luo_finish()?
> + if (!fdt_folio)
> + return -ENOENT;
> +
> + fdt = page_to_virt(folio_page(fdt_folio, 0));
folio_address()
--
Sincerely yours,
Mike.
* Re: [PATCH v6 20/20] tests/liveupdate: Add in-kernel liveupdate test
2025-11-15 23:34 ` [PATCH v6 20/20] tests/liveupdate: Add in-kernel liveupdate test Pasha Tatashin
@ 2025-11-17 11:13 ` Mike Rapoport
2025-11-17 19:00 ` Pasha Tatashin
0 siblings, 1 reply; 92+ messages in thread
From: Mike Rapoport @ 2025-11-17 11:13 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Sat, Nov 15, 2025 at 06:34:06PM -0500, Pasha Tatashin wrote:
> Introduce an in-kernel test module to validate the core logic of the
> Live Update Orchestrator's File-Lifecycle-Bound feature. This
> provides a low-level, controlled environment to test FLB registration
> and callback invocation without requiring userspace interaction or
> actual kexec reboots.
>
> The test is enabled by the CONFIG_LIVEUPDATE_TEST Kconfig option.
>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
> include/linux/liveupdate/abi/luo.h | 5 +
> kernel/liveupdate/luo_file.c | 2 +
> kernel/liveupdate/luo_internal.h | 6 ++
> lib/Kconfig.debug | 23 +++++
> lib/tests/Makefile | 1 +
> lib/tests/liveupdate.c | 143 +++++++++++++++++++++++++++++
> 6 files changed, 180 insertions(+)
> create mode 100644 lib/tests/liveupdate.c
>
> diff --git a/include/linux/liveupdate/abi/luo.h b/include/linux/liveupdate/abi/luo.h
> index 85596ce68c16..cdcace9b48f5 100644
> --- a/include/linux/liveupdate/abi/luo.h
> +++ b/include/linux/liveupdate/abi/luo.h
> @@ -230,4 +230,9 @@ struct luo_flb_ser {
> u64 count;
> } __packed;
>
> +/* Kernel Live Update Test ABI */
> +#ifdef CONFIG_LIVEUPDATE_TEST
> +#define LIVEUPDATE_TEST_FLB_COMPATIBLE(i) "liveupdate-test-flb-v" #i
> +#endif
> +
> #endif /* _LINUX_LIVEUPDATE_ABI_LUO_H */
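As a side note on the macro above: it relies on the preprocessor's
stringification operator and on adjacent string literal concatenation. A
small userspace sketch of how it expands follows; MODEL_INDEX and the two
accessor functions are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <string.h>

/* Same definition as in the patch. */
#define LIVEUPDATE_TEST_FLB_COMPATIBLE(i) "liveupdate-test-flb-v" #i

/* Hypothetical index macro, used to show that '#' does not expand its operand. */
#define MODEL_INDEX 1

/* A literal argument is stringified and concatenated: "...-v0". */
static const char *compat_literal(void)
{
	return LIVEUPDATE_TEST_FLB_COMPATIBLE(0);
}

/* A macro argument is stringified by name, not by value: "...-vMODEL_INDEX". */
static const char *compat_macro_arg(void)
{
	return LIVEUPDATE_TEST_FLB_COMPATIBLE(MODEL_INDEX);
}
```

So the macro is fine when called with literal indices, but a caller that
passes another macro would need an extra level of indirection to get the
value stringified.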
> diff --git a/kernel/liveupdate/luo_file.c b/kernel/liveupdate/luo_file.c
> index df337c9c4f21..9a531096bdb5 100644
> --- a/kernel/liveupdate/luo_file.c
> +++ b/kernel/liveupdate/luo_file.c
> @@ -834,6 +834,8 @@ int liveupdate_register_file_handler(struct liveupdate_file_handler *fh)
> INIT_LIST_HEAD(&fh->flb_list);
> list_add_tail(&fh->list, &luo_file_handler_list);
>
> + liveupdate_test_register(fh);
> +
Why can't this be called from the test?
> return 0;
> }
>
--
Sincerely yours,
Mike.
* Re: [PATCH v6 05/20] liveupdate: luo_ioctl: add user interface
2025-11-16 17:15 ` Mike Rapoport
@ 2025-11-17 14:22 ` Pasha Tatashin
0 siblings, 0 replies; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-17 14:22 UTC (permalink / raw)
To: Mike Rapoport
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
> > --- a/include/uapi/linux/liveupdate.h
> > +++ b/include/uapi/linux/liveupdate.h
> > @@ -44,6 +44,70 @@
> > #define LIVEUPDATE_IOCTL_TYPE 0xBA
> >
> > /* The maximum length of session name including null termination */
> > -#define LIVEUPDATE_SESSION_NAME_LENGTH 56
> > +#define LIVEUPDATE_SESSION_NAME_LENGTH 64
Ah, here I updated the session name length :-) I will move this change
to the proper patch.
> > +/**
> > + * struct liveupdate_ioctl_create_session - ioctl(LIVEUPDATE_IOCTL_CREATE_SESSION)
> > + * @size: Input; sizeof(struct liveupdate_ioctl_create_session)
> > + * @fd: Output; The new file descriptor for the created session.
> > + * @name: Input; A null-terminated string for the session name, max
> > + * length %LIVEUPDATE_SESSION_NAME_LENGTH including termination
> > + * char.
>
> Nit: ^ character
Done.
> > + if (atomic_cmpxchg(&ldev->in_use, 0, 1))
> > + return -EBUSY;
> > +
> > + luo_session_deserialize();
>
> Why luo_session_deserialize() is tied to the first open of the chardev?
Because by the time `/dev/liveupdate` is first opened, we expect
userspace to have finished loading the modules that might register
file handlers and FLBs with LUO, so we can deserialize the sessions
and find the rightful owners of all FDs. After this point, we also
forbid registering new FHs and FLBs.
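The ordering described here can be sketched as a small userspace model using
C11 atomics. This is an illustration under assumptions, not the driver code:
struct model_dev and the model_* functions are invented names, the -EBUSY
returned for a late registration is a guess, and the model assumes that
deserialization runs only on the first successful open.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_dev {
	atomic_int in_use;        /* single-opener guard */
	bool registration_open;   /* handlers may still register */
	int deserialize_calls;    /* how often sessions were deserialized */
};

static void model_dev_init(struct model_dev *d)
{
	atomic_init(&d->in_use, 0);
	d->registration_open = true;
	d->deserialize_calls = 0;
}

/* Registration is only accepted before the first open (assumed errno). */
static int model_register_handler(struct model_dev *d)
{
	return d->registration_open ? 0 : -EBUSY;
}

/* Only one opener at a time; the first open closes registration. */
static int model_open(struct model_dev *d)
{
	int expected = 0;

	if (!atomic_compare_exchange_strong(&d->in_use, &expected, 1))
		return -EBUSY;

	if (d->registration_open) {
		d->registration_open = false;
		d->deserialize_calls++;
	}
	return 0;
}

static void model_release(struct model_dev *d)
{
	atomic_store(&d->in_use, 0);
}
```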
Pasha
* Re: [PATCH v6 01/20] liveupdate: luo_core: luo_ioctl: Live Update Orchestrator
2025-11-17 2:54 ` Andrew Morton
@ 2025-11-17 14:27 ` Pasha Tatashin
0 siblings, 0 replies; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-17 14:27 UTC (permalink / raw)
To: Andrew Morton
Cc: pratyush, jasonmiu, graf, rppt, dmatlack, rientjes, corbet,
rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Sun, Nov 16, 2025 at 9:54 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Sat, 15 Nov 2025 18:33:47 -0500 Pasha Tatashin <pasha.tatashin@soleen.com> wrote:
>
> > Introduce LUO, a mechanism intended to facilitate kernel updates while
> > keeping designated devices operational across the transition (e.g., via
> > kexec).
>
> Thanks, I updated mm.git's mm-unstable branch to this version. I
> expect at least one more version as a result of feedback for this v6.
Thank you, Andrew! I plan to address all comments and send a v7 in
about a week. The comments/changes so far are minor, so I hope to land
this during the next merge window.
>
> I wasn't able to reproduce Stephen's build error
> (https://lkml.kernel.org/r/20251117093614.1490d048@canb.auug.org.au)
> with this series.
That build error was fixed with the KHO fix-up patch back on Friday.
>
* Re: [PATCH v6 04/20] liveupdate: luo_session: add sessions support
2025-11-16 17:05 ` Mike Rapoport
@ 2025-11-17 15:09 ` Pasha Tatashin
2025-11-17 21:11 ` Mike Rapoport
0 siblings, 1 reply; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-17 15:09 UTC (permalink / raw)
To: Mike Rapoport
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
> > +/**
> > + * struct luo_session_ser - Represents the serialized metadata for a LUO session.
> > + * @name: The unique name of the session, copied from the `luo_session`
> > + * structure.
>
> I'd phrase it as
>
> The unique name of the session provided by the userspace at
> the time of session creation.
Done
>
> > + * @files: The physical address of a contiguous memory block that holds
> > + * the serialized state of files.
>
> Maybe add ^ in this session?
Done
>
> > + * @pgcnt: The number of pages occupied by the `files` memory block.
> > + * @count: The total number of files that were part of this session during
> > + * serialization. Used for iteration and validation during
> > + * restoration.
> > + *
> > + * This structure is used to package session-specific metadata for transfer
> > + * between kernels via Kexec Handover. An array of these structures (one per
> > + * session) is created and passed to the new kernel, allowing it to reconstruct
> > + * the session context.
> > + *
> > + * If this structure is modified, LUO_SESSION_COMPATIBLE must be updated.
>
> This comment applies to the luo_session_header_ser description as well.
Done
>
> > + */
> > +struct luo_session_ser {
> > + char name[LIVEUPDATE_SESSION_NAME_LENGTH];
> > + u64 files;
> > + u64 pgcnt;
> > + u64 count;
> > +} __packed;
> > +
> > #endif /* _LINUX_LIVEUPDATE_ABI_LUO_H */
> > diff --git a/include/uapi/linux/liveupdate.h b/include/uapi/linux/liveupdate.h
> > index df34c1642c4d..d2ef2f7e0dbd 100644
> > --- a/include/uapi/linux/liveupdate.h
> > +++ b/include/uapi/linux/liveupdate.h
> > @@ -43,4 +43,7 @@
> > /* The ioctl type, documented in ioctl-number.rst */
> > #define LIVEUPDATE_IOCTL_TYPE 0xBA
> >
> > +/* The maximum length of session name including null termination */
> > +#define LIVEUPDATE_SESSION_NAME_LENGTH 56
>
> You decided not to bump it to 64 in the end? ;-)
I bumped it to 64, but only in the next patch. I will fix it in the next version.
>
> > +
> > #endif /* _UAPI_LIVEUPDATE_H */
> > diff --git a/kernel/liveupdate/Makefile b/kernel/liveupdate/Makefile
> > index 413722002b7a..83285e7ad726 100644
> > --- a/kernel/liveupdate/Makefile
> > +++ b/kernel/liveupdate/Makefile
> > @@ -2,7 +2,8 @@
> >
> > luo-y := \
> > luo_core.o \
> > - luo_ioctl.o
> > + luo_ioctl.o \
> > + luo_session.o
> >
> > obj-$(CONFIG_KEXEC_HANDOVER) += kexec_handover.o
> > obj-$(CONFIG_KEXEC_HANDOVER_DEBUG) += kexec_handover_debug.o
>
> ...
>
> > +int luo_session_retrieve(const char *name, struct file **filep)
> > +{
> > + struct luo_session_header *sh = &luo_session_global.incoming;
> > + struct luo_session *session = NULL;
> > + struct luo_session *it;
> > + int err;
> > +
> > + scoped_guard(rwsem_read, &sh->rwsem) {
> > + list_for_each_entry(it, &sh->list, list) {
> > + if (!strncmp(it->name, name, sizeof(it->name))) {
> > + session = it;
> > + break;
> > + }
> > + }
> > + }
> > +
> > + if (!session)
> > + return -ENOENT;
> > +
> > + scoped_guard(mutex, &session->mutex) {
> > + if (session->retrieved)
> > + return -EINVAL;
> > + }
> > +
> > + err = luo_session_getfile(session, filep);
> > + if (!err) {
> > + scoped_guard(mutex, &session->mutex)
> > + session->retrieved = true;
>
> Retaking the mutex here seems a bit odd.
> Do we really have to lock session->mutex in luo_session_getfile()?
Moved the locking out of luo_session_getfile(), and added
lockdep_assert_held(&session->mutex); to luo_session_getfile().
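For illustration only, the restructuring described above can be modelled in
userspace as follows. This is a minimal sketch, not code from the series: the
toy_* names are hypothetical, the toy mutex only tracks whether it is held,
and assert() stands in for lockdep_assert_held().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a kernel mutex; it only tracks whether it is held. */
struct toy_mutex {
	bool held;
};

static void toy_lock(struct toy_mutex *m)
{
	assert(!m->held);
	m->held = true;
}

static void toy_unlock(struct toy_mutex *m)
{
	assert(m->held);
	m->held = false;
}

struct toy_session {
	struct toy_mutex mutex;
	bool retrieved;
	int file;		/* stand-in for the session's struct file */
};

/* Caller must hold s->mutex; the assert models lockdep_assert_held(). */
static int toy_session_getfile(struct toy_session *s, int *filep)
{
	assert(s->mutex.held);
	*filep = s->file;
	return 0;
}

/* The check, the getfile call and the flag update share one critical section. */
static int toy_session_retrieve(struct toy_session *s, int *filep)
{
	int err;

	toy_lock(&s->mutex);
	if (s->retrieved) {
		err = -EINVAL;
	} else {
		err = toy_session_getfile(s, filep);
		if (!err)
			s->retrieved = true;
	}
	toy_unlock(&s->mutex);

	return err;
}
```

With the lock taken once by the caller, there is no window between checking
->retrieved and setting it in which a second caller could slip in.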
> > +int luo_session_deserialize(void)
> > +{
> > + struct luo_session_header *sh = &luo_session_global.incoming;
> > + int err;
> > +
> > + if (luo_session_is_deserialized())
> > + return 0;
> > +
> > + luo_session_global.deserialized = true;
> > + if (!sh->active) {
> > + INIT_LIST_HEAD(&sh->list);
> > + init_rwsem(&sh->rwsem);
> > + return 0;
>
> How this can happen? luo_session_deserialize() is supposed to be called
> from ioctl and luo_session_global.incoming should be set up way earlier.
No LUO was passed from the previous kernel, so
luo_session_global.incoming.active stays false, as it is not
participating.
> And, why don't we initialize ->list and ->rwsem statically?
Good idea, done.
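As a rough userspace sketch of what static initialization means here (the
toy_* names are hypothetical, and LIST_HEAD_INIT is re-defined locally to
mirror the kernel macro of the same name; in the kernel the rwsem would
presumably use __RWSEM_INITIALIZER() in the same way):

```c
#include <assert.h>
#include <stdbool.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Local copy of the kernel's static list initializer. */
#define LIST_HEAD_INIT(name) { &(name), &(name) }

struct toy_session_header {
	struct list_head list;
	bool active;
};

/* Initialized at compile time, so no runtime INIT_LIST_HEAD() is needed. */
static struct toy_session_header toy_incoming = {
	.list = LIST_HEAD_INIT(toy_incoming.list),
};

static bool toy_list_empty(const struct list_head *head)
{
	return head->next == head;
}
```

The list is valid and empty before any code runs, so the "not active" branch
no longer has to initialize anything.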
> > + }
> > +
> > + for (int i = 0; i < sh->header_ser->count; i++) {
> > + struct luo_session *session;
> > +
> > + session = luo_session_alloc(sh->ser[i].name);
> > + if (IS_ERR(session)) {
> > + pr_warn("Failed to allocate session [%s] during deserialization %pe\n",
> > + sh->ser[i].name, session);
> > + return PTR_ERR(session);
> > + }
>
> The allocated sessions still need to be freed if an insert fails ;-)
No. We have failed to deserialize, so the machine will need to be
rebooted by the user anyway in order to release the preserved resources.
This is something that Jason Gunthorpe also mentioned regarding IOMMU:
if something is not correct (i.e., if a session cannot finish for some
reason), don't add complicated "undo" code that cleans up all
resources. Instead, treat them as a memory leak and allow a reboot to
perform the cleanup.
While in this particular patch the clean-up looks simple, later in the
series we add file deserialization for each session to this
function. The clean-up would then have to free the resources for each
session we deserialized, and also free the resources for the files that
were deserialized for those sessions, only to still boot into a
"maintenance" mode where a bunch of resources are not accessible and
from which the machine would have to be rebooted to get back to a
normal state. This code would never be tested and never be used, so
let's use reboot to solve this problem, where devices are going to be
properly reset and memory is going to be properly freed.
* Re: [PATCH v6 06/20] liveupdate: luo_file: implement file systems callbacks
2025-11-16 18:15 ` Mike Rapoport
@ 2025-11-17 17:50 ` Pasha Tatashin
2025-11-20 17:20 ` Mike Rapoport
0 siblings, 1 reply; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-17 17:50 UTC (permalink / raw)
To: Mike Rapoport
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
> > +struct liveupdate_file_handler;
> > +struct liveupdate_session;
>
> Why struct liveupdate_session is a part of public LUO API?
It is an opaque version of the private "struct luo_session", in order to
give subsystems access to:
liveupdate_get_file_incoming(s, token, filep)
liveupdate_get_token_outgoing(s, file, tokenp)
For example, if your FD depends on another FD within a session, you
can check if another FD is already preserved via
liveupdate_get_token_outgoing(), and during retrieval time you can
retrieve the "struct file" for your dependency.
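A minimal userspace sketch of this opaque-handle pattern, under the
assumption that the public type is only ever forward-declared. All toy_*
names are hypothetical and the fixed-size array replaces the real list:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Public view: declared but never defined, so callers cannot look inside. */
struct toy_liveupdate_session;

/* Private view, known only to the core. */
struct toy_luo_file {
	const void *file;
	uint64_t token;
};

struct toy_luo_session {
	struct toy_luo_file files[4];
	size_t count;
};

/* Core side: hand out the opaque public pointer. */
static struct toy_liveupdate_session *toy_to_public(struct toy_luo_session *s)
{
	return (struct toy_liveupdate_session *)s;
}

/* Models liveupdate_get_token_outgoing(): is @file already preserved? */
static int toy_get_token_outgoing(struct toy_liveupdate_session *s,
				  const void *file, uint64_t *tokenp)
{
	struct toy_luo_session *session = (struct toy_luo_session *)s;

	for (size_t i = 0; i < session->count; i++) {
		if (session->files[i].file == file) {
			if (tokenp)
				*tokenp = session->files[i].token;
			return 0;
		}
	}

	return -ENOENT;
}
```

A handler whose FD depends on another FD would call the lookup from its
preserve callback and fail the preservation if the dependency is not found.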
> > +struct file;
> > +
> > +/**
> > + * struct liveupdate_file_op_args - Arguments for file operation callbacks.
> > + * @handler: The file handler being called.
> > + * @session: The session this file belongs to.
> > + * @retrieved: The retrieve status for the 'can_finish / finish'
> > + * operation.
> > + * @file: The file object. For retrieve: [OUT] The callback sets
> > + * this to the new file. For other ops: [IN] The caller sets
> > + * this to the file being operated on.
> > + * @serialized_data: The opaque u64 handle, preserve/prepare/freeze may update
> > + * this field.
> > + *
> > + * This structure bundles all parameters for the file operation callbacks.
> > + * The 'data' and 'file' fields are used for both input and output.
> > + */
> > +struct liveupdate_file_op_args {
> > + struct liveupdate_file_handler *handler;
> > + struct liveupdate_session *session;
> > + bool retrieved;
> > + struct file *file;
> > + u64 serialized_data;
> > +};
> > +
> > +/**
> > + * struct liveupdate_file_ops - Callbacks for live-updatable files.
> > + * @can_preserve: Required. Lightweight check to see if this handler is
> > + * compatible with the given file.
> > + * @preserve: Required. Performs state-saving for the file.
> > + * @unpreserve: Required. Cleans up any resources allocated by @preserve.
> > + * @freeze: Optional. Final actions just before kernel transition.
> > + * @unfreeze: Optional. Undo freeze operations.
> > + * @retrieve: Required. Restores the file in the new kernel.
> > + * @can_finish: Optional. Check if this FD can finish, i.e. all restoration
> > + * pre-requirements for this FD are satisfied. Called prior to
> > + * finish, in order to do successful finish calls for all
> > + * resources in the session.
> > + * @finish: Required. Final cleanup in the new kernel.
> > + * @owner: Module reference
> > + *
> > + * All operations (except can_preserve) receive a pointer to a
> > + * 'struct liveupdate_file_op_args' containing the necessary context.
> > + */
> > +struct liveupdate_file_ops {
> > + bool (*can_preserve)(struct liveupdate_file_handler *handler,
> > + struct file *file);
> > + int (*preserve)(struct liveupdate_file_op_args *args);
> > + void (*unpreserve)(struct liveupdate_file_op_args *args);
> > + int (*freeze)(struct liveupdate_file_op_args *args);
> > + void (*unfreeze)(struct liveupdate_file_op_args *args);
> > + int (*retrieve)(struct liveupdate_file_op_args *args);
> > + bool (*can_finish)(struct liveupdate_file_op_args *args);
> > + void (*finish)(struct liveupdate_file_op_args *args);
> > + struct module *owner;
> > +};
> > +
> > +/**
> > + * struct liveupdate_file_handler - Represents a handler for a live-updatable file type.
> > + * @ops: Callback functions
> > + * @compatible: The compatibility string (e.g., "memfd-v1", "vfiofd-v1")
> > + * that uniquely identifies the file type this handler
> > + * supports. This is matched against the compatible string
> > + * associated with individual &struct file instances.
> > + * @list: Used for linking this handler instance into a global
> > + * list of registered file handlers.
> > + *
> > + * Modules that want to support live update for specific file types should
> > + * register an instance of this structure. LUO uses this registration to
> > + * determine if a given file can be preserved and to find the appropriate
> > + * operations to manage its state across the update.
> > + */
> > +struct liveupdate_file_handler {
> > + const struct liveupdate_file_ops *ops;
> > + const char compatible[LIVEUPDATE_HNDL_COMPAT_LENGTH];
> > + struct list_head list;
>
> Did you consider using __private and ACCESS_PRIVATE() for the ->list
> member here and in other structures visible outside kernel/liveupdate?
I hadn't considered it, but that is a great suggestion. I will update
the headers to use __private/ACCESS_PRIVATE().
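Roughly, and only as a userspace sketch: the two macros below mirror the
kernel's definitions for builds without sparse, where the annotation compiles
away. The toy_* names are hypothetical.

```c
#include <assert.h>

/* Userspace fallbacks mirroring the kernel's non-sparse definitions. */
#define __private
#define ACCESS_PRIVATE(p, member) ((p)->member)

struct list_head {
	struct list_head *next, *prev;
};

struct toy_file_handler {
	const char *compatible;
	/* Core-only member: users of the header must not touch it. */
	struct list_head __private list;
};

/* Only the core goes through ACCESS_PRIVATE() to reach the member. */
static void toy_handler_init(struct toy_file_handler *fh)
{
	struct list_head *l = &ACCESS_PRIVATE(fh, list);

	l->next = l;
	l->prev = l;
}
```

Under sparse, a direct fh->list access outside the core would be flagged,
while ACCESS_PRIVATE() documents the intentional ones.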
> >
> > +/* The max size is set so it can be reliably used during in serialization */
>
> I failed to parse this comment.
Me too, I removed it. :-)
> > + * - can_preserve(): A lightweight check to determine if the handler is
> > + * compatible with a given 'struct file'.
> > + * - preserve(): The heavyweight operation that saves the file's state and
> > + * returns an opaque u64 handle, happens while vcpus are still running.
>
> ^ VCPUs
Done
>
> This narrows the description to VM-only usecase and in general ->preserve()
> may happen after VCPUs are suspended, although it's neither intended nor
> desirable. LUO does not control the sequencing so we can't claim here
> anything about VCPUs.
Agreed. While keeping VCPUs running is the target behavior for the
hypervisor use case to minimize downtime, the framework itself is
agnostic to the workload type and sequencing. Re-wrote:
* - preserve(): The heavyweight operation that saves the file's state and
* returns an opaque u64 handle. This is typically performed while the
* workload is still active to minimize the downtime during the
* actual reboot transition.
> > + * - unpreserve(): Cleans up any resources allocated by .preserve(), called
> > + * if the preservation process is aborted before the reboot (i.e. session is
> > + * closed).
> > + * - freeze(): A final pre-reboot opportunity to prepare the state for kexec.
> > + * We are already in reboot syscall, and therefore userspace cannot mutate
> > + * the file anymore.
> > + * - unfreeze(): Undoes the actions of .freeze(), called if the live update
> > + * is aborted after the freeze phase.
> > + * - retrieve(): Reconstructs the file in the new kernel from the preserved
> > + * handle.
> > + * - finish(): Performs final check and cleanup in the new kernel. After
> > + * successful finish call, LUO gives up ownership to this file.
> > + *
> > + * File Preservation Lifecycle happy path:
> > + *
> > + * 1. Preserve (Normal Operation): A userspace agent preserves files one by one
> > + * via an ioctl. For each file, luo_preserve_file() finds a compatible
> > + * handler, calls its .preserve() op, and creates an internal &struct
>
> ^ method or operation
Done
>
> > + * luo_file to track the live state.
> > + *
> > + * 2. Freeze (Pre-Reboot): Just before the kexec, luo_file_freeze() is called.
> > + * It iterates through all preserved files, calls their respective .freeze()
> > + * ops, and serializes their final metadata (compatible string, token, and
>
> ^ method or operation
>
> > + * data handle) into a contiguous memory block for KHO.
> > + *
> > + * 3. Deserialize (New Kernel - Early Boot): After kexec, luo_file_deserialize()
>
> From the code it seems that deserialization runs on the first open of
> /dev/liveupdate, what do I miss?
Updated:
* 3. Deserialize: After kexec, luo_file_deserialize() runs when session gets
* deserialized (which is when /dev/liveupdate is first opened). It reads the
* serialized data from the KHO memory region and reconstructs the in-memory
* list of &struct luo_file instances for the new kernel, linking them to
* their corresponding handlers.
>
> > + * runs. It reads the serialized data from the KHO memory region and
> > + * reconstructs the in-memory list of &struct luo_file instances for the new
> > + * kernel, linking them to their corresponding handlers.
> > + *
> > + * 4. Retrieve (New Kernel - Userspace Ready): The userspace agent can now
> > + * restore file descriptors by providing a token. luo_retrieve_file()
> > + * searches for the matching token, calls the handler's .retrieve() op to
> > + * re-create the 'struct file', and returns a new FD. Files can be
> > + * retrieved in ANY order.
> > + *
> > + * 5. Finish (New Kernel - Cleanup): Once a session retrieval is complete,
> > + * luo_file_finish() is called. It iterates through all files,
> > + * invokes their .finish() ops for final cleanup, and releases all
>
> ^ method
Done
>
> > + * associated kernel resources.
> > + *
> > + * File Preservation Lifecycle unhappy paths:
> > + *
> > + * 1. Abort Before Reboot: If the userspace agent aborts the live update
> > + * process before calling reboot (e.g., by closing the session file
> > + * descriptor), the session's release handler calls
> > + * luo_file_unpreserve_files(). This invokes the .unpreserve() callback on
> > + * all preserved files, ensuring all allocated resources are cleaned up and
> > + * returning the system to a clean state.
> > + *
> > + * 2. Freeze Failure: During the reboot() syscall, if any handler's .freeze()
> > + * op fails, the .unfreeze() op is invoked on all previously *successful*
> > + * freezes to roll back their state. The reboot() syscall then returns an
> > + * error to userspace, canceling the live update.
> > + *
> > + * 3. Finish Failure: In the new kernel, if a handler's .finish() op fails,
> > + * the luo_file_finish() operation is aborted. LUO retains ownership of
> > + * all files within that session, including those that were not yet
> > + * processed. The userspace agent can attempt to call the finish operation
> > + * again later. If the issue cannot be resolved, these resources will be held
> > + * by LUO until the next live update cycle, at which point they will be
> > + * discarded.
> > + */
> > +
> > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > +
> > +#include <linux/cleanup.h>
> > +#include <linux/err.h>
> > +#include <linux/errno.h>
> > +#include <linux/file.h>
> > +#include <linux/fs.h>
> > +#include <linux/kexec_handover.h>
> > +#include <linux/liveupdate.h>
> > +#include <linux/liveupdate/abi/luo.h>
> > +#include <linux/module.h>
> > +#include <linux/sizes.h>
> > +#include <linux/slab.h>
> > +#include <linux/string.h>
> > +#include "luo_internal.h"
> > +
> > +static LIST_HEAD(luo_file_handler_list);
> > +
> > +/* 2 4K pages, give space for 128 files per session */
> > +#define LUO_FILE_PGCNT 2ul
> > +#define LUO_FILE_MAX \
> > + ((LUO_FILE_PGCNT << PAGE_SHIFT) / sizeof(struct luo_file_ser))
> > +
> > +/**
> > + * struct luo_file - Represents a single preserved file instance.
> > + * @fh: Pointer to the &struct liveupdate_file_handler that manages
> > + * this type of file.
> > + * @file: Pointer to the kernel's &struct file that is being preserved.
> > + * This is NULL in the new kernel until the file is successfully
> > + * retrieved.
> > + * @serialized_data: The opaque u64 handle to the serialized state of the file.
> > + * This handle is passed back to the handler's .freeze(),
> > + * .retrieve(), and .finish() callbacks, allowing it to track
> > + * and update its serialized state across phases.
> > + * @retrieved: A flag indicating whether a user/kernel in the new kernel has
> > + * successfully called retrieve() on this file. This prevents
> > + * multiple retrieval attempts.
> > + * @mutex: A mutex that protects the fields of this specific instance
> > + * (e.g., @retrieved, @file), ensuring that operations like
> > + * retrieving or finishing a file are atomic.
> > + * @list: The list_head linking this instance into its parent
> > + * session's list of preserved files.
> > + * @token: The user-provided unique token used to identify this file.
> > + *
> > + * This structure is the core in-kernel representation of a single file being
> > + * managed through a live update. An instance is created by luo_preserve_file()
> > + * to link a 'struct file' to its corresponding handler, a user-provided token,
> > + * and the serialized state handle returned by the handler's .preserve()
> > + * operation.
> > + *
> > + * These instances are tracked in a per-session list. The @serialized_data
> > + * field, which holds a handle to the file's serialized state, may be updated
> > + * during the .freeze() callback before being serialized for the next kernel.
> > + * After reboot, these structures are recreated by luo_file_deserialize() and
> > + * are finally cleaned up by luo_file_finish().
> > + */
> > +struct luo_file {
> > + struct liveupdate_file_handler *fh;
> > + struct file *file;
> > + u64 serialized_data;
> > + bool retrieved;
> > + struct mutex mutex;
> > + struct list_head list;
> > + u64 token;
> > +};
> > +
> > +static int luo_session_alloc_files_mem(struct luo_session *session)
>
> It seems like this belongs to luo_session.c
It belongs here, but the name is wrong, so I renamed the alloc/free functions.
> > +{
> > + size_t size;
> > + void *mem;
> > +
> > + if (session->files)
> > + return 0;
> > +
> > + WARN_ON_ONCE(session->count);
> > +
> > + size = LUO_FILE_PGCNT << PAGE_SHIFT;
> > + mem = kho_alloc_preserve(size);
> > + if (IS_ERR(mem))
> > + return PTR_ERR(mem);
> > +
> > + session->files = mem;
> > + session->pgcnt = LUO_FILE_PGCNT;
> > +
> > + return 0;
> > +}
> > +
> > +static void luo_session_free_files_mem(struct luo_session *session)
> > +{
>
> Ditto
done.
>
> > + /* If session has files, no need to free preservation memory */
> > + if (session->count)
> > + return;
> > +
> > + if (!session->files)
> > + return;
> > +
> > + kho_unpreserve_free(session->files);
> > + session->files = NULL;
> > + session->pgcnt = 0;
> > +}
> > +
> > +static bool luo_token_is_used(struct luo_session *session, u64 token)
> > +{
> > + struct luo_file *iter;
> > +
> > + list_for_each_entry(iter, &session->files_list, list) {
>
> And here again I'm not very fond of dereferencing session objects in
> luo_file.
luo_file only accesses the session->files_* fields, which are allocated,
freed, and iterated inside luo_file.
>
> > + if (iter->token == token)
> > + return true;
> > + }
> > +
> > + return false;
> > +}
> > +
> > +/**
> > + * luo_preserve_file - Initiate the preservation of a file descriptor.
> > + * @session: The session to which the preserved file will be added.
> > + * @token: A unique, user-provided identifier for the file.
> > + * @fd: The file descriptor to be preserved.
> > + *
> > + * This function orchestrates the first phase of preserving a file. Upon entry,
> > + * it takes a reference to the 'struct file' via fget(), effectively making LUO
> > + * a co-owner of the file. This reference is held until the file is either
> > + * unpreserved or successfully finished in the next kernel, preventing the file
> > + * from being prematurely destroyed.
> > + *
> > + * This function orchestrates the first phase of preserving a file. It performs
> > + * the following steps:
> > + *
> > + * 1. Validates that the @token is not already in use within the session.
> > + * 2. Ensures the session's memory for files serialization is allocated
> > + * (allocates if needed).
> > + * 3. Iterates through registered handlers, calling can_preserve() to find one
> > + * compatible with the given @fd.
> > + * 4. Calls the handler's .preserve() operation, which saves the file's state
> > + * and returns an opaque private data handle.
> > + * 5. Adds the new instance to the session's internal list.
> > + *
> > + * On success, LUO takes a reference to the 'struct file' and considers it
> > + * under its management until it is unpreserved or finished.
> > + *
> > + * In case of any failure, all intermediate allocations (file reference, memory
> > + * for the 'luo_file' struct, etc.) are cleaned up before returning an error.
> > + *
> > + * Context: Can be called from an ioctl handler during normal system operation.
> > + * Return: 0 on success. Returns a negative errno on failure:
> > + * -EEXIST if the token is already used.
> > + * -EBADF if the file descriptor is invalid.
> > + * -ENOSPC if the session is full.
> > + * -ENOENT if no compatible handler is found.
> > + * -ENOMEM on memory allocation failure.
> > + * Other errors might be returned by .preserve().
> > + */
> > +int luo_preserve_file(struct luo_session *session, u64 token, int fd)
> > +{
> > + struct liveupdate_file_op_args args = {0};
> > + struct liveupdate_file_handler *fh;
> > + struct luo_file *luo_file;
> > + struct file *file;
> > + int err;
> > +
> > + lockdep_assert_held(&session->mutex);
> > +
> > + if (luo_token_is_used(session, token))
> > + return -EEXIST;
> > +
> > + file = fget(fd);
> > + if (!file)
> > + return -EBADF;
> > +
> > + err = luo_session_alloc_files_mem(session);
> > + if (err)
> > + goto exit_err;
> > +
> > + if (session->count == LUO_FILE_MAX) {
> > + err = -ENOSPC;
> > + goto exit_err;
> > + }
>
> I believe session can be prepared and validated by the caller.
The size of luo_files and other file-count-related limitations all belong
in luo_file.c.
>
> > +
> > + err = -ENOENT;
> > + list_for_each_entry(fh, &luo_file_handler_list, list) {
> > + if (fh->ops->can_preserve(fh, file)) {
> > + err = 0;
> > + break;
> > + }
> > + }
> > +
> > + /* err is still -ENOENT if no handler was found */
> > + if (err)
> > + goto exit_err;
> > +
> > + luo_file = kzalloc(sizeof(*luo_file), GFP_KERNEL);
> > + if (!luo_file) {
> > + err = -ENOMEM;
> > + goto exit_err;
> > + }
> > +
> > + luo_file->file = file;
> > + luo_file->fh = fh;
> > + luo_file->token = token;
> > + luo_file->retrieved = false;
> > + mutex_init(&luo_file->mutex);
> > +
> > + args.handler = fh;
> > + args.session = (struct liveupdate_session *)session;
>
> Isn't args.session already struct liveupdate_session *?
This casts (struct luo_session *) to the opaque public (struct
liveupdate_session *).
>
> > + args.file = file;
> > + err = fh->ops->preserve(&args);
> > + if (err) {
> > + mutex_destroy(&luo_file->mutex);
> > + kfree(luo_file);
> > + goto exit_err;
> > + } else {
> > + luo_file->serialized_data = args.serialized_data;
> > + list_add_tail(&luo_file->list, &session->files_list);
> > + session->count++;
>
> I'd use luo_session_add_file(struct luo_file *luo_file) or return luo_file
> by reference to the caller.
> Then the lockdep_assert_held() can go away as well.
Let's keep this. I do not think there is any architectural win from
disallowing luo_file from inserting itself directly into a session; both
are part of luo_*.
luo_session does not manage anything file-related: no
serialization/deserialization, no allocation/free, no
insertion/removal.
>
> > + }
> > +
> > + return 0;
> > +
> > +exit_err:
> > + fput(file);
> > + luo_session_free_files_mem(session);
>
> The error handling in this function is a mess. Pasha, please, please, use
> goto consistently.
How is this a mess? There is a single exit_err destination, no
exceptions, and no early returns except at the very top of the function,
before fget(), which makes total sense.
Do you want to add a separate destination for
luo_session_free_files_mem()? But that is not necessary; in many
places it is considered totally reasonable for free(NULL) to work
correctly...
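To make the argument concrete, here is a small userspace model of the
pattern being defended: one exit label, with a cleanup helper that tolerates
being called when there is nothing to free. It is a sketch only. The toy_*
names are hypothetical, calloc()/free() replace the KHO allocation, and a
flag replaces the handler lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct toy_session {
	void *files;	/* serialization buffer, allocated lazily */
	size_t count;	/* number of preserved files */
	size_t max;	/* capacity of the buffer, in files */
};

/* Safe at any time: frees the buffer only if it exists and is unused. */
static void toy_free_files_mem(struct toy_session *s)
{
	if (s->count || !s->files)
		return;

	free(s->files);
	s->files = NULL;
}

static int toy_alloc_files_mem(struct toy_session *s)
{
	if (s->files)
		return 0;

	s->files = calloc(1, 64);
	return s->files ? 0 : -ENOMEM;
}

static int toy_preserve(struct toy_session *s, int handler_found)
{
	int err;

	err = toy_alloc_files_mem(s);
	if (err)
		goto exit_err;

	if (s->count == s->max) {
		err = -ENOSPC;
		goto exit_err;
	}

	if (!handler_found) {
		err = -ENOENT;
		goto exit_err;
	}

	s->count++;
	return 0;

exit_err:
	/* One destination for every failure; the helper sorts out the rest. */
	toy_free_files_mem(s);
	return err;
}
```

Because the helper checks both the buffer and the file count, a failure on a
session that already holds files leaves the buffer in place, while a failure
on an empty session releases it.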
> > +
> > + return err;
> > +}
> > +
> > +/**
> > + * luo_file_unpreserve_files - Unpreserves all files from a session.
> > + * @session: The session to be cleaned up.
> > + *
> > + * This function serves as the primary cleanup path for a session. It is
> > + * invoked when the userspace agent closes the session's file descriptor.
> > + *
> > + * For each file, it performs the following cleanup actions:
> > + * 1. Calls the handler's .unpreserve() callback to allow the handler to
> > + * release any resources it allocated.
> > + * 2. Removes the file from the session's internal tracking list.
> > + * 3. Releases the reference to the 'struct file' that was taken by
> > + * luo_preserve_file() via fput(), returning ownership.
> > + * 4. Frees the memory associated with the internal 'struct luo_file'.
> > + *
> > + * After all individual files are unpreserved, it frees the contiguous memory
> > + * block that was allocated to hold their serialization data.
> > + */
> > +void luo_file_unpreserve_files(struct luo_session *session)
> > +{
> > + struct luo_file *luo_file;
> > +
> > + lockdep_assert_held(&session->mutex);
> > +
> > + while (!list_empty(&session->files_list)) {
>
> I think the loop should be in luo_session.c and luo_files.c should
> implement luo_file_unpreserve(struct luo_file *luo_file)
>
> The same applies to other functions below that do something with all files
> in the session. In my view luo_session should iterate through
> luo_session.files_list and call luo_file methods for each luo_file object.
Let's not do that. Operations on files within a session belong to
luo_file; operations on sessions within LUO belong to luo_session.
> > +int luo_file_freeze(struct luo_session *session)
> > +{
> > + struct luo_file_ser *file_ser = session->files;
> > + struct luo_file *luo_file;
> > + int err;
> > + int i;
> > +
> > + lockdep_assert_held(&session->mutex);
> > +
> > + if (!session->count)
> > + return 0;
> > +
> > + if (WARN_ON(!file_ser))
> > + return -EINVAL;
> > +
> > + i = 0;
> > + list_for_each_entry(luo_file, &session->files_list, list) {
> > + err = luo_file_freeze_one(session, luo_file);
> > + if (err < 0) {
> > + pr_warn("Freeze failed for session[%s] token[%#0llx] handler[%s] err[%pe]\n",
> > + session->name, luo_file->token,
> > + luo_file->fh->compatible, ERR_PTR(err));
> > + goto exit_err;
> > + }
> > +
> > + strscpy(file_ser[i].compatible, luo_file->fh->compatible,
> > + sizeof(file_ser[i].compatible));
> > + file_ser[i].data = luo_file->serialized_data;
> > + file_ser[i].token = luo_file->token;
> > + i++;
> > + }
> > +
> > + return 0;
> > +
> > +exit_err:
> > + __luo_file_unfreeze(session, luo_file);
>
> Maybe move frozen files to a local list, call __luo_file_unfreeze() with
> that list and then splice it back to session.files_list?
IMO, it would add unnecessary complications. The session is locked and
session->files_list is entirely under our control, so there is no need to
add complications with a private list.
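A userspace sketch of rolling back in place, without a private list. The
assumption here is that the rollback helper walks the same container and
stops at the entry whose freeze failed; the toy_* names are hypothetical and
an array replaces session->files_list.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_file {
	bool fail_freeze;	/* makes this entry's freeze step fail */
	bool frozen;
};

/* Undo freeze for every entry before @failed (exclusive). */
static void toy_unfreeze_until(struct toy_file *files,
			       const struct toy_file *failed)
{
	for (struct toy_file *f = files; f != failed; f++)
		f->frozen = false;
}

/* Freeze entries in order; on the first failure roll back the earlier ones. */
static int toy_freeze_all(struct toy_file *files, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (files[i].fail_freeze) {
			toy_unfreeze_until(files, &files[i]);
			return -EIO;
		}
		files[i].frozen = true;
	}

	return 0;
}
```

Since the container is only touched under the session lock, the failing entry
itself is enough of a marker for where the rollback has to stop.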
> > + luo_file = kzalloc(sizeof(*luo_file), GFP_KERNEL);
> > + if (!luo_file)
> > + return -ENOMEM;
>
> Shouldn't we free files allocated on the previous iterations?
No, for the same reason explained in luo_session.c :-)
>
> > +
> > + luo_file->fh = fh;
> > + luo_file->file = NULL;
> > + luo_file->serialized_data = file_ser[i].data;
> > + luo_file->token = file_ser[i].token;
> > + luo_file->retrieved = false;
> > + mutex_init(&luo_file->mutex);
> > + list_add_tail(&luo_file->list, &session->files_list);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +/**
> > + * liveupdate_register_file_handler - Register a file handler with LUO.
> > + * @fh: Pointer to a caller-allocated &struct liveupdate_file_handler.
> > + * The caller must initialize this structure, including a unique
> > + * 'compatible' string and a valid 'fh' callbacks. This function adds the
> > + * handler to the global list of supported file handlers.
> > + *
> > + * Context: Typically called during module initialization for file types that
> > + * support live update preservation.
> > + *
> > + * Return: 0 on success. Negative errno on failure.
> > + */
> > +int liveupdate_register_file_handler(struct liveupdate_file_handler *fh)
> > +{
> > + static DEFINE_MUTEX(register_file_handler_lock);
> > + struct liveupdate_file_handler *fh_iter;
> > +
> > + if (!liveupdate_enabled())
> > + return -EOPNOTSUPP;
> > +
> > + /*
> > + * Once sessions have been deserialized, file handlers cannot be
> > + * registered, it is too late.
> > + */
> > + if (WARN_ON(luo_session_is_deserialized()))
> > + return -EBUSY;
> > +
> > + /* Sanity check that all required callbacks are set */
> > + if (!fh->ops->preserve || !fh->ops->unpreserve ||
> > + !fh->ops->retrieve || !fh->ops->finish) {
> > + return -EINVAL;
> > + }
> > +
> > + guard(mutex)(&register_file_handler_lock);
> > + list_for_each_entry(fh_iter, &luo_file_handler_list, list) {
> > + if (!strcmp(fh_iter->compatible, fh->compatible)) {
> > + pr_err("File handler registration failed: Compatible string '%s' already registered.\n",
> > + fh->compatible);
> > + return -EEXIST;
> > + }
> > + }
> > +
> > + if (!try_module_get(fh->ops->owner))
> > + return -EAGAIN;
> > +
> > + INIT_LIST_HEAD(&fh->list);
> > + list_add_tail(&fh->list, &luo_file_handler_list);
> > +
> > + return 0;
> > +}
> > +
> > +/**
> > + * liveupdate_get_token_outgoing - Get the token for a preserved file.
> > + * @s: The outgoing liveupdate session.
> > + * @file: The file object to search for.
> > + * @tokenp: Output parameter for the found token.
> > + *
> > + * Searches the list of preserved files in an outgoing session for a matching
> > + * file object. If found, the corresponding user-provided token is returned.
> > + *
> > + * This function is intended for in-kernel callers that need to correlate a
> > + * file with its liveupdate token.
> > + *
> > + * Context: Can be called from any context that can acquire the session mutex.
> > + * Return: 0 on success, -ENOENT if the file is not preserved in this session.
> > + */
> > +int liveupdate_get_token_outgoing(struct liveupdate_session *s,
> > + struct file *file, u64 *tokenp)
> > +{
>
> This function is apparently unused.
>
> > + struct luo_session *session = (struct luo_session *)s;
> > + struct luo_file *luo_file;
> > + int err = -ENOENT;
> > +
> > + list_for_each_entry(luo_file, &session->files_list, list) {
> > + if (luo_file->file == file) {
> > + if (tokenp)
> > + *tokenp = luo_file->token;
> > + err = 0;
> > + break;
> > + }
> > + }
> > +
> > + return err;
> > +}
> > +
> > +/**
> > + * liveupdate_get_file_incoming - Retrieves a preserved file for in-kernel use.
> > + * @s: The incoming liveupdate session (restored from the previous kernel).
> > + * @token: The unique token identifying the file to retrieve.
> > + * @filep: On success, this will be populated with a pointer to the retrieved
> > + * 'struct file'.
> > + *
> > + * Provides a kernel-internal API for other subsystems to retrieve their
> > + * preserved files after a live update. This function is a simple wrapper
> > + * around luo_retrieve_file(), allowing callers to find a file by its token.
> > + *
> > + * The operation is idempotent; subsequent calls for the same token will return
> > + * a pointer to the same 'struct file' object.
> > + *
> > + * The caller receives a pointer to the file with a reference incremented. The
> > + * file's lifetime is managed by LUO and any userspace file
> > + * descriptors. If the caller needs to hold a reference to the file beyond the
> > + * immediate scope, it must call get_file() itself.
> > + *
> > + * Context: Can be called from any context in the new kernel that has a handle
> > + * to a restored session.
> > + * Return: 0 on success. Returns -ENOENT if no file with the matching token is
> > + * found, or any other negative errno on failure.
> > + */
> > +int liveupdate_get_file_incoming(struct liveupdate_session *s, u64 token,
> > + struct file **filep)
> > +{
>
> Ditto.
These two functions are part of the public API allowing dependency
tracking for vfio->iommu->memfd during preservation.
>
> > + struct luo_session *session = (struct luo_session *)s;
> > +
> > + return luo_retrieve_file(session, token, filep);
> > +}
> > diff --git a/kernel/liveupdate/luo_internal.h b/kernel/liveupdate/luo_internal.h
> > index 5185ad37a8c1..1a36f2383123 100644
> > --- a/kernel/liveupdate/luo_internal.h
> > +++ b/kernel/liveupdate/luo_internal.h
> > @@ -70,4 +70,13 @@ int luo_session_serialize(void);
> > int luo_session_deserialize(void);
> > bool luo_session_is_deserialized(void);
> >
> > +int luo_preserve_file(struct luo_session *session, u64 token, int fd);
> > +void luo_file_unpreserve_files(struct luo_session *session);
> > +int luo_file_freeze(struct luo_session *session);
> > +void luo_file_unfreeze(struct luo_session *session);
> > +int luo_retrieve_file(struct luo_session *session, u64 token,
> > + struct file **filep);
> > +int luo_file_finish(struct luo_session *session);
> > +int luo_file_deserialize(struct luo_session *session);
> > +
> > #endif /* _LINUX_LUO_INTERNAL_H */
> > --
> > 2.52.0.rc1.455.g30608eb744-goog
> >
>
> --
> Sincerely yours,
> Mike.
* Re: [PATCH v6 10/20] MAINTAINERS: add liveupdate entry
2025-11-17 9:40 ` Mike Rapoport
@ 2025-11-17 18:20 ` Pasha Tatashin
0 siblings, 0 replies; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-17 18:20 UTC (permalink / raw)
To: Mike Rapoport
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Mon, Nov 17, 2025 at 4:41 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Sat, Nov 15, 2025 at 06:33:56PM -0500, Pasha Tatashin wrote:
> > Add a MAINTAINERS file entry for the new Live Update Orchestrator
> > introduced in previous patches.
> >
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> > ---
> > MAINTAINERS | 11 +++++++++++
> > 1 file changed, 11 insertions(+)
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 500789529359..bc9f5c6f0e80 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -14464,6 +14464,17 @@ F: kernel/module/livepatch.c
> > F: samples/livepatch/
> > F: tools/testing/selftests/livepatch/
> >
> > +LIVE UPDATE
> > +M: Pasha Tatashin <pasha.tatashin@soleen.com>
>
> Please count me in :)
>
Sure, added.
> > +L: linux-kernel@vger.kernel.org
> > +S: Maintained
> > +F: Documentation/core-api/liveupdate.rst
> > +F: Documentation/userspace-api/liveupdate.rst
> > +F: include/linux/liveupdate.h
> > +F: include/linux/liveupdate/
> > +F: include/uapi/linux/liveupdate.h
> > +F: kernel/liveupdate/
> > +
> > LLC (802.2)
> > L: netdev@vger.kernel.org
> > S: Odd fixes
> > --
> > 2.52.0.rc1.455.g30608eb744-goog
> >
>
> --
> Sincerely yours,
> Mike.
* Re: [PATCH v6 18/20] selftests/liveupdate: Add kexec-based selftest for session lifecycle
2025-11-16 18:53 ` Zhu Yanjun
@ 2025-11-17 18:23 ` Pasha Tatashin
0 siblings, 0 replies; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-17 18:23 UTC (permalink / raw)
To: Zhu Yanjun
Cc: pratyush, jasonmiu, graf, rppt, dmatlack, rientjes, corbet,
rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm,
tj, yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
> Thanks a lot. Just with kernel image, it is not enough to boot the host.
> Adding initramfs will avoid the crash when the host boots.
> I have made tests to verify this.
>
> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Thank you!
* Re: [PATCH v6 11/20] mm: shmem: use SHMEM_F_* flags instead of VM_* flags
2025-11-17 9:48 ` Mike Rapoport
@ 2025-11-17 18:25 ` Pasha Tatashin
0 siblings, 0 replies; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-17 18:25 UTC (permalink / raw)
To: Mike Rapoport
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Mon, Nov 17, 2025 at 4:48 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Sat, Nov 15, 2025 at 06:33:57PM -0500, Pasha Tatashin wrote:
> > From: Pratyush Yadav <ptyadav@amazon.de>
> >
> > shmem_inode_info::flags can have the VM flags VM_NORESERVE and
> > VM_LOCKED. These are used to suppress pre-accounting or to lock the
> > pages in the inode respectively. Using the VM flags directly makes it
> > difficult to add shmem-specific flags that are unrelated to VM behavior
> > since one would need to find a VM flag not used by shmem and re-purpose
> > it.
> >
> > Introduce SHMEM_F_NORESERVE and SHMEM_F_LOCKED which represent the same
> > information, but their bits are independent of the VM flags. Callers can
> > still pass VM_NORESERVE to shmem_get_inode(), but it gets transformed to
> > the shmem-specific flag internally.
> >
> > No functional changes intended.
> >
> > Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
>
> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Thank you.
Pasha
* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
2025-11-16 19:16 ` Mike Rapoport
@ 2025-11-17 18:29 ` Pasha Tatashin
2025-11-17 21:05 ` Mike Rapoport
0 siblings, 1 reply; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-17 18:29 UTC (permalink / raw)
To: Mike Rapoport
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Sun, Nov 16, 2025 at 2:16 PM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Sun, Nov 16, 2025 at 09:55:30AM -0500, Pasha Tatashin wrote:
> > On Sun, Nov 16, 2025 at 7:43 AM Mike Rapoport <rppt@kernel.org> wrote:
> > >
> > > > +static int __init liveupdate_early_init(void)
> > > > +{
> > > > + int err;
> > > > +
> > > > + err = luo_early_startup();
> > > > + if (err) {
> > > > + pr_err("The incoming tree failed to initialize properly [%pe], disabling live update\n",
> > > > + ERR_PTR(err));
> > >
> > > How do we report this to the userspace?
> > > I think the decision what to do in this case belongs there. Even if it's
> > > down to choosing between plain kexec and full reboot, it's still a policy
> > > that should be implemented in userspace.
> >
> > I agree that policy belongs in userspace, and that is how we designed
> > it. In this specific failure case (ABI mismatch or corrupt FDT), the
> > preserved state is unrecoverable by the kernel. We cannot parse the
> > incoming data, so we cannot offer it to userspace.
> >
> > We report this state by not registering the /dev/liveupdate device.
> > When the userspace agent attempts to initialize, it receives ENOENT.
> > At that point, the agent exercises its policy:
> >
> > - Check dmesg for the specific error and report the failure to the
> > fleet control plane.
>
> Hmm, this is not nice. I think we still should register /dev/liveupdate and
> let userspace discover this error via /dev/liveupdate ABIs.
Not registering the device is the correct approach here for two reasons:
1. This follows the standard Linux driver pattern. If a driver fails
to initialize its underlying resources (hardware, firmware, or in this
case, the incoming FDT), it does not register a character device.
2. Registering a "zombie" device that exists solely to return errors
adds significant complexity. We would need to introduce a specific
"broken" state to the state machine and add checks to IOCTLs to reject
commands with a specific error code.
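For illustration only, here is a minimal userspace sketch of how an agent
might act on a missing device node. The enum, the function name luo_probe()
and the three-way classification are assumptions made for this example and
are not part of the series; only the /dev/liveupdate path and the ENOENT
behaviour come from the discussion above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical outcome codes for the agent's policy decision. */
enum luo_probe_result {
	LUO_PROBE_AVAILABLE,	/* device node opened successfully */
	LUO_PROBE_ABSENT,	/* ENOENT: no device node was registered */
	LUO_PROBE_ERROR,	/* any other errno (EACCES, EBUSY, ...) */
};

/*
 * Try to open the LUO character device at @path (hypothetical helper).
 * On success the open descriptor is stored in *fdp and the caller owns it.
 * On failure *fdp is set to -1 and errno is classified.
 */
static enum luo_probe_result luo_probe(const char *path, int *fdp)
{
	int fd;

	assert(fdp);

	fd = open(path, O_RDWR);
	if (fd >= 0) {
		*fdp = fd;
		return LUO_PROBE_AVAILABLE;
	}

	*fdp = -1;
	return errno == ENOENT ? LUO_PROBE_ABSENT : LUO_PROBE_ERROR;
}
```

An agent would call luo_probe("/dev/liveupdate", &fd) at startup and, on
LUO_PROBE_ABSENT, apply whatever its policy dictates, for example reporting
the failure to the control plane. Note that ENOENT alone cannot distinguish
"LUO not built in" from "incoming state failed to parse".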
Pasha
* Re: [PATCH v6 13/20] mm: shmem: export some functions to internal.h
2025-11-17 10:14 ` Mike Rapoport
@ 2025-11-17 18:43 ` Pasha Tatashin
0 siblings, 0 replies; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-17 18:43 UTC (permalink / raw)
To: Mike Rapoport
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Mon, Nov 17, 2025 at 5:14 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Sat, Nov 15, 2025 at 06:33:59PM -0500, Pasha Tatashin wrote:
> > From: Pratyush Yadav <ptyadav@amazon.de>
> >
> > shmem_inode_acct_blocks(), shmem_recalc_inode(), and
> > shmem_add_to_page_cache() are used by shmem_alloc_and_add_folio(). This
> > functionality will also be used in the future by Live Update
> > Orchestrator (LUO) to recreate memfd files after a live update.
>
> I'd rephrase this a bit to say that it will be used by memfd integration
> into LUO to emphasize this stays inside mm.
Done
>
> Other than that
>
> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Thank you.
* Re: [PATCH v6 14/20] liveupdate: luo_file: add private argument to store runtime state
2025-11-17 10:15 ` Mike Rapoport
@ 2025-11-17 18:45 ` Pasha Tatashin
0 siblings, 0 replies; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-17 18:45 UTC (permalink / raw)
To: Mike Rapoport
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
> > Signed-off-by: Pratyush Yadav <pratyush@kernel.org>
> > Co-developed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
>
> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Thank you!
Pasha
* Re: [PATCH v6 20/20] tests/liveupdate: Add in-kernel liveupdate test
2025-11-17 11:13 ` Mike Rapoport
@ 2025-11-17 19:00 ` Pasha Tatashin
2025-11-18 11:30 ` Mike Rapoport
0 siblings, 1 reply; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-17 19:00 UTC (permalink / raw)
To: Mike Rapoport
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
> > #endif /* _LINUX_LIVEUPDATE_ABI_LUO_H */
> > diff --git a/kernel/liveupdate/luo_file.c b/kernel/liveupdate/luo_file.c
> > index df337c9c4f21..9a531096bdb5 100644
> > --- a/kernel/liveupdate/luo_file.c
> > +++ b/kernel/liveupdate/luo_file.c
> > @@ -834,6 +834,8 @@ int liveupdate_register_file_handler(struct liveupdate_file_handler *fh)
> > INIT_LIST_HEAD(&fh->flb_list);
> > list_add_tail(&fh->list, &luo_file_handler_list);
> >
> > + liveupdate_test_register(fh);
> > +
>
> Why this cannot be called from the test?
Because the test does not have access to all the file handlers that are
registered with LUO.
Pasha
* Re: [PATCH v6 18/20] selftests/liveupdate: Add kexec-based selftest for session lifecycle
2025-11-15 23:34 ` [PATCH v6 18/20] selftests/liveupdate: Add kexec-based selftest for session lifecycle Pasha Tatashin
2025-11-16 18:53 ` Zhu Yanjun
@ 2025-11-17 19:27 ` David Matlack
2025-11-17 20:08 ` David Matlack
2025-11-18 0:06 ` David Matlack
2025-11-19 21:20 ` David Matlack
3 siblings, 1 reply; 92+ messages in thread
From: David Matlack @ 2025-11-17 19:27 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, rppt, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Sat, Nov 15, 2025 at 3:34 PM Pasha Tatashin
<pasha.tatashin@soleen.com> wrote:
> diff --git a/tools/testing/selftests/liveupdate/Makefile b/tools/testing/selftests/liveupdate/Makefile
> index 2a573c36016e..1563ac84006a 100644
> --- a/tools/testing/selftests/liveupdate/Makefile
> +++ b/tools/testing/selftests/liveupdate/Makefile
> @@ -1,7 +1,39 @@
> # SPDX-License-Identifier: GPL-2.0-only
> +
> +KHDR_INCLUDES ?= -I../../../../usr/include
You shouldn't need to set this variable or $(OUTPUT); both should be
provided by lib.mk. Maybe the include is too far down?
> CFLAGS += -Wall -O2 -Wno-unused-function
> CFLAGS += $(KHDR_INCLUDES)
> +LDFLAGS += -static
Is a static build really required, or is it just for your setup? If it's
setup-specific, I would recommend letting the user pass in -static via
EXTRA_CFLAGS. That's what we do in the KVM and VFIO selftests.
CFLAGS += $(EXTRA_CFLAGS)
Then the user can pass EXTRA_CFLAGS=-static on the command line.
> +OUTPUT ?= .
> +
> +# --- Test Configuration (Edit this section when adding new tests) ---
> +LUO_SHARED_SRCS := luo_test_utils.c
> +LUO_SHARED_HDRS += luo_test_utils.h
I would suggest using the -MD flag and Make's -include directive to
automatically handle headers. That way you don't need to add every
header to the Makefile for Make to detect changes. See the end of my email
for how to do this.
> +
> +LUO_MANUAL_TESTS += luo_kexec_simple
> +
> +TEST_FILES += do_kexec.sh
>
> TEST_GEN_PROGS += liveupdate
>
> +# --- Automatic Rule Generation (Do not edit below) ---
> +
> +TEST_GEN_PROGS_EXTENDED += $(LUO_MANUAL_TESTS)
> +
> +# Define the full list of sources for each manual test.
> +$(foreach test,$(LUO_MANUAL_TESTS), \
> + $(eval $(test)_SOURCES := $(test).c $(LUO_SHARED_SRCS)))
This does not build with Google's gbuild wrapper around make. I get
these errors (after fixing the semi-colon issue below):
clang: error: no such file or directory: 'luo_kexec_simple.c'
clang: error: no such file or directory: 'luo_test_utils.c'
clang: error: no such file or directory: 'luo_test_utils.h'
> +
> +# This loop automatically generates an explicit build rule for each manual test.
> +# It includes dependencies on the shared headers and makes the output
> +# executable.
> +# Note the use of '$$' to escape automatic variables for the 'eval' command.
> +$(foreach test,$(LUO_MANUAL_TESTS), \
> + $(eval $(OUTPUT)/$(test): $($(test)_SOURCES) $(LUO_SHARED_HDRS) \
> + $(call msg,LINK,,$$@) ; \
> + $(Q)$(LINK.c) $$^ $(LDLIBS) -o $$@ ; \
> + $(Q)chmod +x $$@ \
These semi-colons swallow any errors. I would recommend against using
a foreach and eval. Make supports pattern-based targets, so there's
really no need for loops. See below.
> + ) \
> +)
> +
> include ../lib.mk
Putting it all together, here is what I'd recommend for this Makefile
(drop-in replacement for the current Makefile). This will also make it
easier for me to share the library code with VFIO selftests, which
I'll need to do in the VFIO series.
(Sorry in advance for the line wrap. I had to send this through gmail.)
# SPDX-License-Identifier: GPL-2.0-only
LIBLIVEUPDATE_C += luo_test_utils.c
TEST_GEN_PROGS_EXTENDED += luo_kexec_simple
TEST_GEN_PROGS_EXTENDED += luo_multi_session
TEST_FILES += do_kexec.sh
include ../lib.mk
CFLAGS += $(KHDR_INCLUDES)
CFLAGS += -Wall -O2 -Wno-unused-function
CFLAGS += -MD
CFLAGS += $(EXTRA_CFLAGS)
LIBLIVEUPDATE_O := $(patsubst %.c, $(OUTPUT)/%.o, $(LIBLIVEUPDATE_C))
TEST_GEN_PROGS_EXTENDED_O += $(patsubst %, %.o, $(TEST_GEN_PROGS_EXTENDED))
TEST_DEP_FILES += $(patsubst %.o, %.d, $(LIBLIVEUPDATE_O))
TEST_DEP_FILES += $(patsubst %.o, %.d, $(TEST_GEN_PROGS_EXTENDED_O))
-include $(TEST_DEP_FILES)
$(LIBLIVEUPDATE_O): $(OUTPUT)/%.o: %.c
	$(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@
$(TEST_GEN_PROGS_EXTENDED): %: %.o $(LIBLIVEUPDATE_O)
	$(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH) $< $(LIBLIVEUPDATE_O) $(LDLIBS) -o $@
EXTRA_CLEAN += $(LIBLIVEUPDATE_O) $(TEST_GEN_PROGS_EXTENDED_O) $(TEST_DEP_FILES)
* Re: [PATCH v6 17/20] selftests/liveupdate: Add userspace API selftests
2025-11-15 23:34 ` [PATCH v6 17/20] selftests/liveupdate: Add userspace API selftests Pasha Tatashin
@ 2025-11-17 19:38 ` David Matlack
2025-11-17 20:16 ` Pasha Tatashin
0 siblings, 1 reply; 92+ messages in thread
From: David Matlack @ 2025-11-17 19:38 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, rppt, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Sat, Nov 15, 2025 at 3:34 PM Pasha Tatashin
<pasha.tatashin@soleen.com> wrote:
> diff --git a/tools/testing/selftests/liveupdate/.gitignore b/tools/testing/selftests/liveupdate/.gitignore
> new file mode 100644
> index 000000000000..af6e773cf98f
> --- /dev/null
> +++ b/tools/testing/selftests/liveupdate/.gitignore
> @@ -0,0 +1 @@
> +/liveupdate
I would recommend the following .gitignore so you don't have to keep
updating it every time there's a new executable or other build
artifact. This is what we use in the KVM and VFIO selftests.
# SPDX-License-Identifier: GPL-2.0-only
*
!/**/
!*.c
!*.h
!*.S
!*.sh
!*.mk
!.gitignore
!config
!Makefile
* Re: [PATCH v6 18/20] selftests/liveupdate: Add kexec-based selftest for session lifecycle
2025-11-17 19:27 ` David Matlack
@ 2025-11-17 20:08 ` David Matlack
2025-11-17 21:06 ` David Matlack
0 siblings, 1 reply; 92+ messages in thread
From: David Matlack @ 2025-11-17 20:08 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, rppt, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Mon, Nov 17, 2025 at 11:27 AM David Matlack <dmatlack@google.com> wrote:
> Putting it all together, here is what I'd recommend for this Makefile
> (drop-in replacement for the current Makefile). This will also make it
> easier for me to share the library code with VFIO selftests, which
> I'll need to do in the VFIO series.
>
> (Sorry in advance for the line wrap. I had to send this through gmail.)
Oops I dropped the build rule for liveupdate.c. Here it is with that included:
# SPDX-License-Identifier: GPL-2.0-only
LIBLIVEUPDATE_C += luo_test_utils.c
TEST_GEN_PROGS += liveupdate
TEST_GEN_PROGS_EXTENDED += luo_kexec_simple
TEST_GEN_PROGS_EXTENDED += luo_multi_session
TEST_FILES += do_kexec.sh
include ../lib.mk
CFLAGS += $(KHDR_INCLUDES)
CFLAGS += -Wall -O2 -Wno-unused-function
CFLAGS += -MD
CFLAGS += $(EXTRA_CFLAGS)
LIBLIVEUPDATE_O := $(patsubst %.c, $(OUTPUT)/%.o, $(LIBLIVEUPDATE_C))
TEST_PROGS := $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED)
TEST_PROGS_O := $(patsubst %, %.o, $(TEST_PROGS))
TEST_DEP_FILES += $(patsubst %.o, %.d, $(LIBLIVEUPDATE_O))
TEST_DEP_FILES += $(patsubst %.o, %.d, $(TEST_PROGS_O))
-include $(TEST_DEP_FILES)
$(LIBLIVEUPDATE_O): $(OUTPUT)/%.o: %.c
	$(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@
$(TEST_PROGS): %: %.o $(LIBLIVEUPDATE_O)
	$(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH) $< $(LIBLIVEUPDATE_O) $(LDLIBS) -o $@
EXTRA_CLEAN += $(LIBLIVEUPDATE_O)
EXTRA_CLEAN += $(TEST_PROGS_O)
EXTRA_CLEAN += $(TEST_DEP_FILES)
* Re: [PATCH v6 17/20] selftests/liveupdate: Add userspace API selftests
2025-11-17 19:38 ` David Matlack
@ 2025-11-17 20:16 ` Pasha Tatashin
0 siblings, 0 replies; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-17 20:16 UTC (permalink / raw)
To: David Matlack
Cc: pratyush, jasonmiu, graf, rppt, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Mon, Nov 17, 2025 at 2:39 PM David Matlack <dmatlack@google.com> wrote:
>
> On Sat, Nov 15, 2025 at 3:34 PM Pasha Tatashin
> <pasha.tatashin@soleen.com> wrote:
>
> > diff --git a/tools/testing/selftests/liveupdate/.gitignore b/tools/testing/selftests/liveupdate/.gitignore
> > new file mode 100644
> > index 000000000000..af6e773cf98f
> > --- /dev/null
> > +++ b/tools/testing/selftests/liveupdate/.gitignore
> > @@ -0,0 +1 @@
> > +/liveupdate
>
> I would recommend the following .gitignore so you don't have to keep
> updating it every time there's a new executable or other build
> artifact. This is what we use in the KVM and VFIO selftests.
Good idea, I will do that.
Thanks,
Pasha
>
> # SPDX-License-Identifier: GPL-2.0-only
> *
> !/**/
> !*.c
> !*.h
> !*.S
> !*.sh
> !*.mk
> !.gitignore
> !config
> !Makefile
* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
2025-11-17 18:29 ` Pasha Tatashin
@ 2025-11-17 21:05 ` Mike Rapoport
2025-11-18 4:22 ` Pasha Tatashin
0 siblings, 1 reply; 92+ messages in thread
From: Mike Rapoport @ 2025-11-17 21:05 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Mon, Nov 17, 2025 at 01:29:47PM -0500, Pasha Tatashin wrote:
> On Sun, Nov 16, 2025 at 2:16 PM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > On Sun, Nov 16, 2025 at 09:55:30AM -0500, Pasha Tatashin wrote:
> > > On Sun, Nov 16, 2025 at 7:43 AM Mike Rapoport <rppt@kernel.org> wrote:
> > > >
> > > > > +static int __init liveupdate_early_init(void)
> > > > > +{
> > > > > + int err;
> > > > > +
> > > > > + err = luo_early_startup();
> > > > > + if (err) {
> > > > > + pr_err("The incoming tree failed to initialize properly [%pe], disabling live update\n",
> > > > > + ERR_PTR(err));
> > > >
> > > > How do we report this to the userspace?
> > > > I think the decision what to do in this case belongs there. Even if it's
> > > > down to choosing between plain kexec and full reboot, it's still a policy
> > > > that should be implemented in userspace.
> > >
> > > I agree that policy belongs in userspace, and that is how we designed
> > > it. In this specific failure case (ABI mismatch or corrupt FDT), the
> > > preserved state is unrecoverable by the kernel. We cannot parse the
> > > incoming data, so we cannot offer it to userspace.
> > >
> > > We report this state by not registering the /dev/liveupdate device.
> > > When the userspace agent attempts to initialize, it receives ENOENT.
> > > At that point, the agent exercises its policy:
> > >
> > > - Check dmesg for the specific error and report the failure to the
> > > fleet control plane.
> >
> > Hmm, this is not nice. I think we still should register /dev/liveupdate and
> > let userspace discover this error via /dev/liveupdate ABIs.
>
> Not registering the device is the correct approach here for two reasons:
>
> 1. This follows the standard Linux driver pattern. If a driver fails
> to initialize its underlying resources (hardware, firmware, or in this
> case, the incoming FDT), it does not register a character device.
> 2. Registering a "zombie" device that exists solely to return errors
> adds significant complexity. We would need to introduce a specific
> "broken" state to the state machine and add checks to IOCTLs to reject
> commands with a specific error code.
You can avoid that complexity if you register the device with a different
fops, but that's a technicality.
Your point about treating the incoming FDT as an underlying resource that
failed to initialize makes sense, but nevertheless userspace needs a
reliable way to detect it and parsing dmesg is not something we should rely
on.
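To make the "different fops" idea concrete, here is a small userspace model,
not kernel code. It only shows that choosing an operations table once, at
registration time, removes the need for a "broken" state check in every
handler. All names (model_fops, model_select_fops() and so on) and the -EIO
return value are hypothetical and chosen for this sketch; the series defines
nothing like them.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical miniature of a file_operations table: one ioctl hook. */
struct model_fops {
	long (*ioctl)(unsigned int cmd);
};

/* Normal path: pretend every command succeeds. */
static long model_ioctl_ok(unsigned int cmd)
{
	(void)cmd;
	return 0;
}

/* Failed-init path: every command reports the same error. */
static long model_ioctl_broken(unsigned int cmd)
{
	(void)cmd;
	return -EIO;	/* error code picked for illustration only */
}

static const struct model_fops model_fops_ok = {
	.ioctl = model_ioctl_ok,
};

static const struct model_fops model_fops_broken = {
	.ioctl = model_ioctl_broken,
};

/*
 * Pick the table once, from the result of early init (0 or a negative
 * errno). The handlers themselves never need to test a "broken" flag.
 */
static const struct model_fops *model_select_fops(int init_err)
{
	assert(init_err <= 0);
	return init_err ? &model_fops_broken : &model_fops_ok;
}
```

With this shape, userspace could open the device and learn about the failure
from the error returned by its first command, rather than from dmesg.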
> Pasha
--
Sincerely yours,
Mike.
* Re: [PATCH v6 18/20] selftests/liveupdate: Add kexec-based selftest for session lifecycle
2025-11-17 20:08 ` David Matlack
@ 2025-11-17 21:06 ` David Matlack
2025-11-18 1:01 ` Pasha Tatashin
0 siblings, 1 reply; 92+ messages in thread
From: David Matlack @ 2025-11-17 21:06 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, rppt, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Mon, Nov 17, 2025 at 12:08 PM David Matlack <dmatlack@google.com> wrote:
>
> On Mon, Nov 17, 2025 at 11:27 AM David Matlack <dmatlack@google.com> wrote:
>
> > Putting it all together, here is what I'd recommend for this Makefile
> > (drop-in replacement for the current Makefile). This will also make it
> > easier for me to share the library code with VFIO selftests, which
> > I'll need to do in the VFIO series.
> >
> > (Sorry in advance for the line wrap. I had to send this through gmail.)
>
> Oops I dropped the build rule for liveupdate.c. Here it is with that included:
>
> # SPDX-License-Identifier: GPL-2.0-only
>
> LIBLIVEUPDATE_C += luo_test_utils.c
>
> TEST_GEN_PROGS += liveupdate
> TEST_GEN_PROGS_EXTENDED += luo_kexec_simple
> TEST_GEN_PROGS_EXTENDED += luo_multi_session
>
> TEST_FILES += do_kexec.sh
>
> include ../lib.mk
>
> CFLAGS += $(KHDR_INCLUDES)
> CFLAGS += -Wall -O2 -Wno-unused-function
> CFLAGS += -MD
> CFLAGS += $(EXTRA_CFLAGS)
>
> LIBLIVEUPDATE_O := $(patsubst %.c, $(OUTPUT)/%.o, $(LIBLIVEUPDATE_C))
> TEST_PROGS := $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED)
Correction: I forgot that TEST_PROGS is reserved for test shell
scripts, so this variable needs a different name.
> TEST_PROGS_O := $(patsubst %, %.o, $(TEST_PROGS))
>
> TEST_DEP_FILES += $(patsubst %.o, %.d, $(LIBLIVEUPDATE_O))
> TEST_DEP_FILES += $(patsubst %.o, %.d, $(TEST_PROGS_O))
> -include $(TEST_DEP_FILES)
>
> $(LIBLIVEUPDATE_O): $(OUTPUT)/%.o: %.c
> $(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@
>
> $(TEST_PROGS): %: %.o $(LIBLIVEUPDATE_O)
> $(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH) $< $(LIBLIVEUPDATE_O) $(LDLIBS) -o $@
>
> EXTRA_CLEAN += $(LIBLIVEUPDATE_O)
> EXTRA_CLEAN += $(TEST_PROGS_O)
> EXTRA_CLEAN += $(TEST_DEP_FILES)
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v6 04/20] liveupdate: luo_session: add sessions support
2025-11-17 15:09 ` Pasha Tatashin
@ 2025-11-17 21:11 ` Mike Rapoport
2025-11-18 4:28 ` Pasha Tatashin
0 siblings, 1 reply; 92+ messages in thread
From: Mike Rapoport @ 2025-11-17 21:11 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Mon, Nov 17, 2025 at 10:09:28AM -0500, Pasha Tatashin wrote:
>
> > > + }
> > > +
> > > + for (int i = 0; i < sh->header_ser->count; i++) {
> > > + struct luo_session *session;
> > > +
> > > + session = luo_session_alloc(sh->ser[i].name);
> > > + if (IS_ERR(session)) {
> > > + pr_warn("Failed to allocate session [%s] during deserialization %pe\n",
> > > + sh->ser[i].name, session);
> > > + return PTR_ERR(session);
> > > + }
> >
> > The allocated sessions still need to be freed if an insert fails ;-)
>
> No. We have failed to deserialize, so anyways the machine will need to
> be rebooted by the user in order to release the preserved resources.
>
> This is something that Jason Gunthorpe also mentioned regarding IOMMU:
> if something is not correct (i.e., if a session cannot finish for some
> reason), don't add complicated "undo" code that cleans up all
> resources. Instead, treat them as a memory leak and allow a reboot to
> perform the cleanup.
>
> While in this particular patch the clean-up looks simple, later in the
> series we are adding file deserialization to each session to this
> function. So, the clean-up will look like this: we would have to free
> the resources for each session we deserialized, and also free the
> resources for files that were deserialized for those sessions, only to
> still boot into a "maintenance" mode where a bunch of resources are not
> accessible from which the machine would have to be rebooted to get
> back to a normal state. This code will never be tested, and never be
> used, so let's use reboot to solve this problem, where devices are
> going to be properly reset, and memory is going to be properly freed.
A part of this explanation should be a comment in the code.
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v6 18/20] selftests/liveupdate: Add kexec-based selftest for session lifecycle
2025-11-15 23:34 ` [PATCH v6 18/20] selftests/liveupdate: Add kexec-based selftest for session lifecycle Pasha Tatashin
2025-11-16 18:53 ` Zhu Yanjun
2025-11-17 19:27 ` David Matlack
@ 2025-11-18 0:06 ` David Matlack
2025-11-18 1:08 ` Pasha Tatashin
2025-11-19 21:20 ` David Matlack
3 siblings, 1 reply; 92+ messages in thread
From: David Matlack @ 2025-11-18 0:06 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, rppt, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On 2025-11-15 06:34 PM, Pasha Tatashin wrote:
> +/* Stage 1: Executed before the kexec reboot. */
> +static void run_stage_1(int luo_fd)
> +{
> + int session_fd;
> +
> + ksft_print_msg("[STAGE 1] Starting pre-kexec setup...\n");
> +
> + ksft_print_msg("[STAGE 1] Creating state file for next stage (2)...\n");
> + create_state_file(luo_fd, STATE_SESSION_NAME, STATE_MEMFD_TOKEN, 2);
> +
> + ksft_print_msg("[STAGE 1] Creating session '%s' and preserving memfd...\n",
> + TEST_SESSION_NAME);
> + session_fd = luo_create_session(luo_fd, TEST_SESSION_NAME);
> + if (session_fd < 0)
> + fail_exit("luo_create_session for '%s'", TEST_SESSION_NAME);
> +
> + if (create_and_preserve_memfd(session_fd, TEST_MEMFD_TOKEN,
> + TEST_MEMFD_DATA) < 0) {
> + fail_exit("create_and_preserve_memfd for token %#x",
> + TEST_MEMFD_TOKEN);
> + }
> +
> + ksft_print_msg("[STAGE 1] Executing kexec...\n");
> + if (system(KEXEC_SCRIPT) != 0)
> + fail_exit("kexec script failed");
> + exit(EXIT_FAILURE);
Can we separate the kexec from the test and allow the user/automation to
trigger it however is appropriate for their system? The current
do_kexec.sh script does not do any sort of graceful shutdown, and I bet
everyone will have different ways of initiating kexec on their systems.
For example, something like this (but sleeping in the child instead of
busy waiting):
diff --git a/tools/testing/selftests/liveupdate/luo_kexec_simple.c b/tools/testing/selftests/liveupdate/luo_kexec_simple.c
index 67ab6ebf9eec..513693bfb77b 100644
--- a/tools/testing/selftests/liveupdate/luo_kexec_simple.c
+++ b/tools/testing/selftests/liveupdate/luo_kexec_simple.c
@@ -24,6 +24,7 @@
static void run_stage_1(int luo_fd)
{
int session_fd;
+ int ret;
ksft_print_msg("[STAGE 1] Starting pre-kexec setup...\n");
@@ -42,10 +43,17 @@ static void run_stage_1(int luo_fd)
TEST_MEMFD_TOKEN);
}
- ksft_print_msg("[STAGE 1] Executing kexec...\n");
- if (system(KEXEC_SCRIPT) != 0)
- fail_exit("kexec script failed");
- exit(EXIT_FAILURE);
+ ksft_print_msg("[STAGE 1] Forking child process to hold session open\n");
+ ret = fork();
+ if (ret < 0)
+ fail_exit("fork() failed");
+ if (!ret)
+ for (;;) {}
+
+ ksft_print_msg("[STAGE 1] Child Process: %d\n", ret);
+ ksft_print_msg("[STAGE 1] Complete!\n");
+ ksft_print_msg("[STAGE 1] Execute kexec to continue\n");
+ exit(0);
}
/* Stage 2: Executed after the kexec reboot. */
> +int main(int argc, char *argv[])
> +{
> + int luo_fd;
> + int state_session_fd;
> +
> + luo_fd = luo_open_device();
> + if (luo_fd < 0)
> + ksft_exit_skip("Failed to open %s. Is the luo module loaded?\n",
> + LUO_DEVICE);
> +
> + /*
> + * Determine the stage by attempting to retrieve the state session.
> + * If it doesn't exist (ENOENT), we are in Stage 1 (pre-kexec).
> + */
> + state_session_fd = luo_retrieve_session(luo_fd, STATE_SESSION_NAME);
I don't think the test should try to infer the stage from the state of
the system. If a user runs this test, then does the kexec, then runs
this test again and the session can't be retrieved, that should be a
test failure (not just run stage 1 again).
I think it'd be better to require the user to pass in what stage of the
test should be run when invoking the test. e.g.
$ ./luo_kexec_simple stage_2
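A minimal sketch of what taking the stage from the command line could look like. The function name parse_stage() and the "stage_1" spelling are assumptions; only "stage_2" appears in the example invocation above.

```c
#include <string.h>

/*
 * Hypothetical parser for the stage argument. Returns 1 or 2 for a
 * recognized stage name and -1 for anything else, so the caller can
 * fail the test instead of guessing the stage from system state.
 */
static int parse_stage(const char *arg)
{
	if (!arg)
		return -1;
	if (strcmp(arg, "stage_1") == 0)
		return 1;
	if (strcmp(arg, "stage_2") == 0)
		return 2;
	return -1;
}
```

With this shape, main() would call parse_stage(argv[1]) when argc is at least 2, and a stage 2 run that cannot retrieve the session would be reported as a failure rather than silently rerunning stage 1.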
^ permalink raw reply related [flat|nested] 92+ messages in thread
* Re: [PATCH v6 18/20] selftests/liveupdate: Add kexec-based selftest for session lifecycle
2025-11-17 21:06 ` David Matlack
@ 2025-11-18 1:01 ` Pasha Tatashin
0 siblings, 0 replies; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-18 1:01 UTC (permalink / raw)
To: David Matlack
Cc: pratyush, jasonmiu, graf, rppt, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
> > TEST_PROGS_O := $(patsubst %, %.o, $(TEST_PROGS))
> >
> > TEST_DEP_FILES += $(patsubst %.o, %.d, $(LIBLIVEUPDATE_O))
> > TEST_DEP_FILES += $(patsubst %.o, %.d, $(TEST_PROGS_O))
> > -include $(TEST_DEP_FILES)
> >
> > $(LIBLIVEUPDATE_O): $(OUTPUT)/%.o: %.c
> > $(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@
> >
> > $(TEST_PROGS): %: %.o $(LIBLIVEUPDATE_O)
> > $(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH) $< $(LIBLIVEUPDATE_O) $(LDLIBS) -o $@
> >
> > EXTRA_CLEAN += $(LIBLIVEUPDATE_O)
> > EXTRA_CLEAN += $(TEST_PROGS_O)
> > EXTRA_CLEAN += $(TEST_DEP_FILES)
Took your suggestion, thank you!
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v6 18/20] selftests/liveupdate: Add kexec-based selftest for session lifecycle
2025-11-18 0:06 ` David Matlack
@ 2025-11-18 1:08 ` Pasha Tatashin
0 siblings, 0 replies; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-18 1:08 UTC (permalink / raw)
To: David Matlack
Cc: pratyush, jasonmiu, graf, rppt, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Mon, Nov 17, 2025 at 7:06 PM David Matlack <dmatlack@google.com> wrote:
>
> On 2025-11-15 06:34 PM, Pasha Tatashin wrote:
>
> > +/* Stage 1: Executed before the kexec reboot. */
> > +static void run_stage_1(int luo_fd)
> > +{
> > + int session_fd;
> > +
> > + ksft_print_msg("[STAGE 1] Starting pre-kexec setup...\n");
> > +
> > + ksft_print_msg("[STAGE 1] Creating state file for next stage (2)...\n");
> > + create_state_file(luo_fd, STATE_SESSION_NAME, STATE_MEMFD_TOKEN, 2);
> > +
> > + ksft_print_msg("[STAGE 1] Creating session '%s' and preserving memfd...\n",
> > + TEST_SESSION_NAME);
> > + session_fd = luo_create_session(luo_fd, TEST_SESSION_NAME);
> > + if (session_fd < 0)
> > + fail_exit("luo_create_session for '%s'", TEST_SESSION_NAME);
> > +
> > + if (create_and_preserve_memfd(session_fd, TEST_MEMFD_TOKEN,
> > + TEST_MEMFD_DATA) < 0) {
> > + fail_exit("create_and_preserve_memfd for token %#x",
> > + TEST_MEMFD_TOKEN);
> > + }
> > +
> > + ksft_print_msg("[STAGE 1] Executing kexec...\n");
> > + if (system(KEXEC_SCRIPT) != 0)
> > + fail_exit("kexec script failed");
> > + exit(EXIT_FAILURE);
>
> Can we separate the kexec from the test and allow the user/automation to
> trigger it however is appropriate for their system? The current
> do_kexec.sh script does not do any sort of graceful shutdown, and I bet
> everyone will have different ways of initiating kexec on their systems.
Yes, this is a good idea, I am going to do what you suggested:
1. Take the stage as a command-line argument.
2. Let the user run the kexec command themselves.
Thank you,
Pasha
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v6 07/20] liveupdate: luo_session: Add ioctls for file preservation
2025-11-16 18:25 ` Mike Rapoport
@ 2025-11-18 2:58 ` Pasha Tatashin
0 siblings, 0 replies; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-18 2:58 UTC (permalink / raw)
To: Mike Rapoport
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
> > static int luo_session_release(struct inode *inodep, struct file *filep)
> > {
> > struct luo_session *session = filep->private_data;
> > struct luo_session_header *sh;
> > + int err = 0;
> >
> > /* If retrieved is set, it means this session is from incoming list */
> > - if (session->retrieved)
> > + if (session->retrieved) {
> > sh = &luo_session_global.incoming;
> > - else
> > +
> > + err = luo_session_finish_one(session);
> > + if (err) {
> > + pr_warn("Unable to finish session [%s] on release\n",
> > + session->name);
>
> return err;
>
> and then else can go away here and luo_session_remove() and
> luo_session_free() can be moved outside if (session->retrieved).
Done.
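For readers following along, the suggested control flow can be modelled in userspace roughly as below. This is only a sketch: the model_* types and helpers are stand-ins invented for illustration, not the kernel structures, and the selection of the incoming or outgoing list is left out.

```c
#include <stdbool.h>

/* Simplified stand-in for a session; not the real kernel layout. */
struct model_session {
	bool retrieved;   /* session came from the incoming list */
	int finish_err;   /* value the stubbed finish step returns */
	bool removed;     /* set once the stubbed remove/free step ran */
};

/* Stub for the finish step, which may fail. */
static int model_finish_one(struct model_session *s)
{
	return s->finish_err;
}

/* Stub for the remove-and-free step. */
static void model_remove_and_free(struct model_session *s)
{
	s->removed = true;
}

static int model_session_release(struct model_session *s)
{
	if (s->retrieved) {
		int err = model_finish_one(s);

		/* Return early: the session is neither removed nor freed. */
		if (err)
			return err;
	}

	/* Common tail, shared by retrieved and non-retrieved sessions. */
	model_remove_and_free(s);
	return 0;
}
```

The point of the early return is that the remove and free calls no longer need to be duplicated in both branches.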
Thanks,
Pasha
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v6 08/20] liveupdate: luo_flb: Introduce File-Lifecycle-Bound global state
2025-11-17 9:39 ` Mike Rapoport
@ 2025-11-18 3:54 ` Pasha Tatashin
2025-11-18 11:28 ` Mike Rapoport
0 siblings, 1 reply; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-18 3:54 UTC (permalink / raw)
To: Mike Rapoport
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
>
> The concept makes sense to me, but it's hard to review the implementation
> without an actual user.
There are three users: we will have HugeTLB support that is going to
be posted as RFC in a few weeks. Also, in two weeks we are going to
have an updated VFIO and IOMMU series posted both using FLBs. In the
meantime, this series provides an FLB in-kernel test that verifies
that multiple FLBs can be attached to File-Handlers, and the basic
interfaces are working.
> > +struct liveupdate_flb {
> > + const struct liveupdate_flb_ops *ops;
> > + const char compatible[LIVEUPDATE_FLB_COMPAT_LENGTH];
> > + struct list_head list;
> > + void *internal;
>
> Can't list be a part of internal?
Yes, I moved it inside internal. I also removed the liveupdate_init_flb()
function (initialization now happens automatically), switched to __private
as you suggested earlier, and removed the kmalloc() for the internal data,
so FLBs can be safely used early in boot.
> And don't we usually call this .private rather than .internal?
Renamed.
>
> > };
> >
> > #ifdef CONFIG_LIVEUPDATE
> > @@ -111,6 +187,17 @@ int liveupdate_get_file_incoming(struct liveupdate_session *s, u64 token,
> > int liveupdate_get_token_outgoing(struct liveupdate_session *s,
> > struct file *file, u64 *tokenp);
> >
> > +/* Before using FLB for the first time it should be initialized */
> > +int liveupdate_init_flb(struct liveupdate_flb *flb);
> > +
> > +int liveupdate_register_flb(struct liveupdate_file_handler *h,
> > + struct liveupdate_flb *flb);
>
> While these are obvious ...
>
> > +
> > +int liveupdate_flb_incoming_locked(struct liveupdate_flb *flb, void **objp);
> > +void liveupdate_flb_incoming_unlock(struct liveupdate_flb *flb, void *obj);
> > +int liveupdate_flb_outgoing_locked(struct liveupdate_flb *flb, void **objp);
> > +void liveupdate_flb_outgoing_unlock(struct liveupdate_flb *flb, void *obj);
> > +
>
> ... it's not very clear what these APIs are for and how they are going to be
> used.
They provide locked access to a global resource that is accessible either
while a file is being preserved or at any time during boot.
>
> > #else /* CONFIG_LIVEUPDATE */
>
> ...
>
> > +int liveupdate_register_flb(struct liveupdate_file_handler *h,
> > + struct liveupdate_flb *flb)
> > +{
> > + struct luo_flb_internal *internal = flb->internal;
> > + struct luo_flb_link *link __free(kfree) = NULL;
> > + static DEFINE_MUTEX(register_flb_lock);
> > + struct liveupdate_flb *gflb;
> > + struct luo_flb_link *iter;
> > +
> > + if (!liveupdate_enabled())
> > + return -EOPNOTSUPP;
> > +
> > + if (WARN_ON(!h || !flb || !internal))
> > + return -EINVAL;
> > +
> > + if (WARN_ON(!flb->ops->preserve || !flb->ops->unpreserve ||
> > + !flb->ops->retrieve || !flb->ops->finish)) {
> > + return -EINVAL;
> > + }
> > +
> > + /*
> > + * Once session/files have been deserialized, FLBs cannot be registered,
> > + * it is too late. Deserialization uses file handlers, and FLB registers
> > + * to file handlers.
> > + */
> > + if (WARN_ON(luo_session_is_deserialized()))
> > + return -EBUSY;
> > +
> > + /*
> > + * File handler must already be registered, as it initializes the
> > + * flb_list
> > + */
> > + if (WARN_ON(list_empty(&h->list)))
> > + return -EINVAL;
> > +
> > + link = kzalloc(sizeof(*link), GFP_KERNEL);
> > + if (!link)
> > + return -ENOMEM;
> > +
> > + guard(mutex)(&register_flb_lock);
> > +
> > + /* Check that this FLB is not already linked to this file handler */
> > + list_for_each_entry(iter, &h->flb_list, list) {
> > + if (iter->flb == flb)
> > + return -EEXIST;
> > + }
> > +
> > + /* Is this FLB linked to global list ? */
>
> Maybe:
>
> /*
> * If this FLB is not linked to global list it's first time the FLB
> * is registered
> */
Done
> > +/**
> > + * liveupdate_flb_incoming_unlock - Unlock an incoming FLB object.
> > + * @flb: The FLB definition.
> > + * @obj: The object that was returned by the _locked call (used for validation).
> > + *
> > + * Releases the internal lock acquired by liveupdate_flb_incoming_locked().
> > + */
> > +void liveupdate_flb_incoming_unlock(struct liveupdate_flb *flb, void *obj)
> > +{
> > + struct luo_flb_internal *internal = flb->internal;
> > +
> > + lockdep_assert_held(&internal->incoming.lock);
> > + internal->incoming.obj = obj;
>
> The comment says obj is for validation and here it's assigned to flb.
> Something is off here :)
Thank you for catching the stale comment; fixed.
> > + mutex_unlock(&internal->incoming.lock);
> > +}
> > +
> > +/**
> > + * liveupdate_flb_outgoing_locked - Lock and retrieve the outgoing FLB object.
> > + * @flb: The FLB definition.
> > + * @objp: Output parameter; will be populated with the live shared object.
> > + *
> > + * Acquires the FLB's internal lock and returns a pointer to its shared live
> > + * object for the outgoing (pre-reboot) path.
> > + *
> > + * This function assumes the object has already been created by the FLB's
> > + * .preserve() callback, which is triggered when the first dependent file
> > + * is preserved.
> > + *
> > + * The caller MUST call liveupdate_flb_outgoing_unlock() to release the lock.
> > + *
> > + * Return: 0 on success, or a negative errno on failure.
> > + */
> > +int liveupdate_flb_outgoing_locked(struct liveupdate_flb *flb, void **objp)
> > +{
> > + struct luo_flb_internal *internal = flb->internal;
> > +
> > + if (!liveupdate_enabled())
> > + return -EOPNOTSUPP;
> > +
> > + if (WARN_ON(!internal))
> > + return -EINVAL;
> > +
> > + mutex_lock(&internal->outgoing.lock);
> > +
> > + /* The object must exist if any file is being preserved */
> > + if (WARN_ON_ONCE(!internal->outgoing.obj)) {
> > + mutex_unlock(&internal->outgoing.lock);
> > + return -ENOENT;
> > + }
>
> _incoming_locked() and outgoing_locked() are nearly identical, it seems we
> can have the common part in a
> static liveupdate_flb_locked(struct luo_flb_state *state).
>
> liveupdate_flb_incoming_locked() will be oneline wrapper and
> liveupdate_flb_outgoing_locked() will have this WARN_ON if obj is NULL.
Done
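A userspace sketch of the factoring suggested above, using a pthread mutex in place of the kernel mutex. The model_* names are hypothetical, and the liveupdate_enabled() and NULL checks from the quoted code are omitted to keep the shape visible; the real incoming path may also do more than this model shows.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Stand-in for struct luo_flb_state: a lock plus the shared object. */
struct model_flb_state {
	pthread_mutex_t lock;
	void *obj;
};

/* Common part: take the lock and hand back whatever object is stored. */
static void model_flb_locked(struct model_flb_state *state, void **objp)
{
	pthread_mutex_lock(&state->lock);
	*objp = state->obj;
}

/* Incoming path: a thin wrapper around the common helper. */
static int model_flb_incoming_locked(struct model_flb_state *in, void **objp)
{
	model_flb_locked(in, objp);
	return 0;
}

/* Outgoing path: same helper, plus the "object must exist" check. */
static int model_flb_outgoing_locked(struct model_flb_state *out, void **objp)
{
	model_flb_locked(out, objp);
	if (!*objp) {
		/* Nothing was preserved: drop the lock before failing. */
		pthread_mutex_unlock(&out->lock);
		return -ENOENT;
	}
	return 0;
}
```

On success the caller still owns the lock and must release it with the matching unlock call, as the kernel-doc for the outgoing variant requires.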
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v6 12/20] mm: shmem: allow freezing inode mapping
2025-11-17 10:08 ` Mike Rapoport
@ 2025-11-18 4:13 ` Pasha Tatashin
0 siblings, 0 replies; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-18 4:13 UTC (permalink / raw)
To: Mike Rapoport
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
> > +/* Must be called with inode lock taken exclusive. */
> > +static inline void shmem_i_mapping_freeze(struct inode *inode, bool freeze)
>
> _mapping usually refers to operations on struct address_space.
> It seems that all shmem methods that take inode are just shmem_<operation>,
> so shmem_freeze() looks more appropriate.
Done, renamed to shmem_freeze()
>
> > +{
> > + if (freeze)
> > + SHMEM_I(inode)->flags |= SHMEM_F_MAPPING_FROZEN;
> > + else
> > + SHMEM_I(inode)->flags &= ~SHMEM_F_MAPPING_FROZEN;
> > +}
> > +
> > /*
> > * If fallocate(FALLOC_FL_KEEP_SIZE) has been used, there may be pages
> > * beyond i_size's notion of EOF, which fallocate has committed to reserving:
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index 1d5036dec08a..05c3db840257 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -1292,7 +1292,8 @@ static int shmem_setattr(struct mnt_idmap *idmap,
> > loff_t newsize = attr->ia_size;
> >
> > /* protected by i_rwsem */
> > - if ((newsize < oldsize && (info->seals & F_SEAL_SHRINK)) ||
> > + if ((info->flags & SHMEM_F_MAPPING_FROZEN) ||
>
> A corner case: if newsize == oldsize this will be a false positive
Added a fix.
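The actual fix is not shown in this thread. One way to avoid the false positive, sketched here as a standalone predicate with made-up names and flag values, is to let the frozen flag reject only real size changes:

```c
#include <stdbool.h>

/* Illustrative seal bits; not the kernel's F_SEAL_* values. */
#define MODEL_SEAL_SHRINK 0x1u
#define MODEL_SEAL_GROW   0x2u

/*
 * Hypothetical model of the size check in a setattr path: returns
 * true when the requested size change must be rejected.
 */
static bool model_resize_denied(bool frozen, unsigned int seals,
				long long oldsize, long long newsize)
{
	/* A frozen mapping rejects any change, but not a no-op resize. */
	if (frozen && newsize != oldsize)
		return true;
	if (newsize < oldsize && (seals & MODEL_SEAL_SHRINK))
		return true;
	if (newsize > oldsize && (seals & MODEL_SEAL_GROW))
		return true;
	return false;
}
```

With this shape, a truncate to the current size on a frozen inode succeeds, while growing or shrinking is still refused.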
Thanks,
Pasha
>
> > + (newsize < oldsize && (info->seals & F_SEAL_SHRINK)) ||
> > (newsize > oldsize && (info->seals & F_SEAL_GROW)))
> > return -EPERM;
> >
> > @@ -3289,6 +3290,10 @@ shmem_write_begin(const struct kiocb *iocb, struct address_space *mapping,
> > return -EPERM;
> > }
> >
> > + if (unlikely((info->flags & SHMEM_F_MAPPING_FROZEN) &&
> > + pos + len > inode->i_size))
> > + return -EPERM;
> > +
> > ret = shmem_get_folio(inode, index, pos + len, &folio, SGP_WRITE);
> > if (ret)
> > return ret;
> > @@ -3662,6 +3667,11 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> >
> > inode_lock(inode);
> >
> > + if (info->flags & SHMEM_F_MAPPING_FROZEN) {
> > + error = -EPERM;
> > + goto out;
> > + }
> > +
> > if (mode & FALLOC_FL_PUNCH_HOLE) {
> > struct address_space *mapping = file->f_mapping;
> > loff_t unmap_start = round_up(offset, PAGE_SIZE);
> > --
> > 2.52.0.rc1.455.g30608eb744-goog
> >
>
> --
> Sincerely yours,
> Mike.
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
2025-11-17 21:05 ` Mike Rapoport
@ 2025-11-18 4:22 ` Pasha Tatashin
2025-11-18 11:21 ` Mike Rapoport
0 siblings, 1 reply; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-18 4:22 UTC (permalink / raw)
To: Mike Rapoport
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
> You can avoid that complexity if you register the device with a different
> fops, but that's a technicality.
>
> Your point about treating the incoming FDT as an underlying resource that
> failed to initialize makes sense, but nevertheless userspace needs a
> reliable way to detect it and parsing dmesg is not something we should rely
> on.
I see two solutions:
1. LUO fails to retrieve the preserved data, the user gets informed by
not finding /dev/liveupdate, and studying the dmesg for what has
happened (in reality in fleets version mismatches should not be
happening, those should be detected in quals).
2. Create a zombie device to return some errno on open, and still
study dmesg to understand what really happened.
I think that 1 is better
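For illustration, the userspace side of option 1 could be as small as the check below. The helper name char_device_present() is hypothetical; the only thing taken from this thread is the /dev/liveupdate path.

```c
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: report whether a path exists and is a
 * character device. An agent would call it with "/dev/liveupdate"
 * and treat a false result as "LUO did not come up on this boot".
 */
static bool char_device_present(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return false;
	return S_ISCHR(st.st_mode);
}
```

Note that this cannot tell "LUO is not enabled" apart from "LUO failed to retrieve the preserved data"; it only reports that the device node is absent.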
Pasha
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v6 04/20] liveupdate: luo_session: add sessions support
2025-11-17 21:11 ` Mike Rapoport
@ 2025-11-18 4:28 ` Pasha Tatashin
0 siblings, 0 replies; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-18 4:28 UTC (permalink / raw)
To: Mike Rapoport
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Mon, Nov 17, 2025 at 4:11 PM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Mon, Nov 17, 2025 at 10:09:28AM -0500, Pasha Tatashin wrote:
> >
> > > > + }
> > > > +
> > > > + for (int i = 0; i < sh->header_ser->count; i++) {
> > > > + struct luo_session *session;
> > > > +
> > > > + session = luo_session_alloc(sh->ser[i].name);
> > > > + if (IS_ERR(session)) {
> > > > + pr_warn("Failed to allocate session [%s] during deserialization %pe\n",
> > > > + sh->ser[i].name, session);
> > > > + return PTR_ERR(session);
> > > > + }
> > >
> > > The allocated sessions still need to be freed if an insert fails ;-)
> >
> > No. We have failed to deserialize, so anyways the machine will need to
> > be rebooted by the user in order to release the preserved resources.
> >
> > > This is something that Jason Gunthorpe also mentioned regarding IOMMU:
> > if something is not correct (i.e., if a session cannot finish for some
> > reason), don't add complicated "undo" code that cleans up all
> > resources. Instead, treat them as a memory leak and allow a reboot to
> > perform the cleanup.
> >
> > While in this particular patch the clean-up looks simple, later in the
> > series we are adding file deserialization to each session to this
> > function. So, the clean-up will look like this: we would have to free
> > the resources for each session we deserialized, and also free the
> > resources for files that were deserialized for those sessions, only to
> > > still boot into a "maintenance" mode where a bunch of resources are not
> > accessible from which the machine would have to be rebooted to get
> > back to a normal state. This code will never be tested, and never be
> > used, so let's use reboot to solve this problem, where devices are
> > going to be properly reset, and memory is going to be properly freed.
>
> A part of this explanation should be a comment in the code.
Done.
>
> --
> Sincerely yours,
> Mike.
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
2025-11-18 4:22 ` Pasha Tatashin
@ 2025-11-18 11:21 ` Mike Rapoport
2025-11-18 14:03 ` Jason Gunthorpe
0 siblings, 1 reply; 92+ messages in thread
From: Mike Rapoport @ 2025-11-18 11:21 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Mon, Nov 17, 2025 at 11:22:54PM -0500, Pasha Tatashin wrote:
> > You can avoid that complexity if you register the device with a different
> > fops, but that's a technicality.
> >
> > Your point about treating the incoming FDT as an underlying resource that
> > failed to initialize makes sense, but nevertheless userspace needs a
> > reliable way to detect it and parsing dmesg is not something we should rely
> > on.
>
> I see two solutions:
>
> 1. LUO fails to retrieve the preserved data, the user gets informed by
> not finding /dev/liveupdate, and studying the dmesg for what has
> happened (in reality in fleets version mismatches should not be
> happening, those should be detected in quals).
> 2. Create a zombie device to return some errno on open, and still
> study dmesg to understand what really happened.
User should not study dmesg. We need another solution.
What's wrong with e.g. ioctl()?
> I think that 1 is better
>
> Pasha
--
Sincerely yours,
Mike.
* Re: [PATCH v6 08/20] liveupdate: luo_flb: Introduce File-Lifecycle-Bound global state
2025-11-18 3:54 ` Pasha Tatashin
@ 2025-11-18 11:28 ` Mike Rapoport
2025-11-18 15:37 ` Pasha Tatashin
0 siblings, 1 reply; 92+ messages in thread
From: Mike Rapoport @ 2025-11-18 11:28 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Mon, Nov 17, 2025 at 10:54:29PM -0500, Pasha Tatashin wrote:
> >
> > The concept makes sense to me, but it's hard to review the implementation
> > without an actual user.
>
> There are three users: we will have HugeTLB support that is going to
> be posted as RFC in a few weeks. Also, in two weeks we are going to
> have an updated VFIO and IOMMU series posted both using FLBs. In the
> mean time, this series provides an FLB in-kernel test that verifies
> that multiple FLBs can be attached to File-Handlers, and the basic
> interfaces are working.
Which means that essentially there won't be a real kernel user for FLB for
a while.
We usually don't merge dead code because some future patchset depends on
it.
I think it should stay in mm-nonmm-unstable if Andrew does not mind keeping
it there until the first user is going to land and then FLB will move
upstream along with that user.
If keeping FLB in mm tree is an issue we can set up an integration tree for
LUO/KHO.
--
Sincerely yours,
Mike.
* Re: [PATCH v6 20/20] tests/liveupdate: Add in-kernel liveupdate test
2025-11-17 19:00 ` Pasha Tatashin
@ 2025-11-18 11:30 ` Mike Rapoport
2025-11-18 18:56 ` Pasha Tatashin
0 siblings, 1 reply; 92+ messages in thread
From: Mike Rapoport @ 2025-11-18 11:30 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Mon, Nov 17, 2025 at 02:00:15PM -0500, Pasha Tatashin wrote:
> > > #endif /* _LINUX_LIVEUPDATE_ABI_LUO_H */
> > > diff --git a/kernel/liveupdate/luo_file.c b/kernel/liveupdate/luo_file.c
> > > index df337c9c4f21..9a531096bdb5 100644
> > > --- a/kernel/liveupdate/luo_file.c
> > > +++ b/kernel/liveupdate/luo_file.c
> > > @@ -834,6 +834,8 @@ int liveupdate_register_file_handler(struct liveupdate_file_handler *fh)
> > > INIT_LIST_HEAD(&fh->flb_list);
> > > list_add_tail(&fh->list, &luo_file_handler_list);
> > >
> > > + liveupdate_test_register(fh);
> > > +
> >
> > Why this cannot be called from the test?
>
> Because test does not have access to all file_handlers that are being
> registered with LUO.
Unless I'm missing something, an FLB user registers a file handler and
lets LUO know that it will need FLB. Why can't the test do the same?
> Pasha
--
Sincerely yours,
Mike.
* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
2025-11-18 11:21 ` Mike Rapoport
@ 2025-11-18 14:03 ` Jason Gunthorpe
2025-11-18 15:06 ` Mike Rapoport
0 siblings, 1 reply; 92+ messages in thread
From: Jason Gunthorpe @ 2025-11-18 14:03 UTC (permalink / raw)
To: Mike Rapoport
Cc: Pasha Tatashin, pratyush, jasonmiu, graf, dmatlack, rientjes,
corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu, hughd,
skhawaja, chrisl
On Tue, Nov 18, 2025 at 01:21:34PM +0200, Mike Rapoport wrote:
> On Mon, Nov 17, 2025 at 11:22:54PM -0500, Pasha Tatashin wrote:
> > > You can avoid that complexity if you register the device with a different
> > > fops, but that's technicality.
> > >
> > > Your point about treating the incoming FDT as an underlying resource that
> > > failed to initialize makes sense, but nevertheless userspace needs a
> > > reliable way to detect it and parsing dmesg is not something we should rely
> > > on.
> >
> > I see two solutions:
> >
> > 1. LUO fails to retrieve the preserved data, the user gets informed by
> > not finding /dev/liveupdate, and studying the dmesg for what has
> > happened (in reality in fleets version mismatches should not be
> > happening, those should be detected in quals).
> > 2. Create a zombie device to return some errno on open, and still
> > study dmesg to understand what really happened.
>
> User should not study dmesg. We need another solution.
> What's wrong with e.g. ioctl()?
It seems very dangerous to even boot at all if the next kernel doesn't
understand the serialization information..
IMHO I think we should not even be thinking about this, it is up to
the predecessor environment to prevent it from happening. The ideas to
use ELF metadata/etc to allow a pre-flight validation are the right
solution.
If we get into the next kernel and it receives information it cannot
process it should just BUG_ON and die, or some broad equivalent.
It is a catastrophic orchestration error, and we don't need some fine
grain recovery or userspace visibility. Crash dump the system and
reboot it.
IOW, I would not invest time in this.
Jason
* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
2025-11-18 14:03 ` Jason Gunthorpe
@ 2025-11-18 15:06 ` Mike Rapoport
2025-11-18 15:18 ` Pasha Tatashin
0 siblings, 1 reply; 92+ messages in thread
From: Mike Rapoport @ 2025-11-18 15:06 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Pasha Tatashin, pratyush, jasonmiu, graf, dmatlack, rientjes,
corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu, hughd,
skhawaja, chrisl
On Tue, Nov 18, 2025 at 10:03:00AM -0400, Jason Gunthorpe wrote:
> On Tue, Nov 18, 2025 at 01:21:34PM +0200, Mike Rapoport wrote:
> > On Mon, Nov 17, 2025 at 11:22:54PM -0500, Pasha Tatashin wrote:
> > > > You can avoid that complexity if you register the device with a different
> > > > fops, but that's technicality.
> > > >
> > > > Your point about treating the incoming FDT as an underlying resource that
> > > > failed to initialize makes sense, but nevertheless userspace needs a
> > > > reliable way to detect it and parsing dmesg is not something we should rely
> > > > on.
> > >
> > > I see two solutions:
> > >
> > > 1. LUO fails to retrieve the preserved data, the user gets informed by
> > > not finding /dev/liveupdate, and studying the dmesg for what has
> > > happened (in reality in fleets version mismatches should not be
> > > happening, those should be detected in quals).
> > > 2. Create a zombie device to return some errno on open, and still
> > > study dmesg to understand what really happened.
> >
> > User should not study dmesg. We need another solution.
> > What's wrong with e.g. ioctl()?
>
> It seems very dangerous to even boot at all if the next kernel doesn't
> understand the serialization information..
>
> IMHO I think we should not even be thinking about this, it is up to
> the predecessor environment to prevent it from happening. The ideas to
> use ELF metadata/etc to allow a pre-flight validation are the right
> solution.
>
> If we get into the next kernel and it receives information it cannot
> process it should just BUG_ON and die, or some broad equivalent.
> It is a catastrophic orchestration error, and we don't need some fine
> grain recovery or userspace visibility. Crash dump the system and
> reboot it.
I was under the impression that Pasha wanted to get to userspace no matter
what.
panic() in liveupdate_early_init() makes perfect sense to me. Parsing dmesg
does not.
> IOW, I would not invest time in this.
>
> Jason
--
Sincerely yours,
Mike.
* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
2025-11-18 15:06 ` Mike Rapoport
@ 2025-11-18 15:18 ` Pasha Tatashin
2025-11-18 15:36 ` Jason Gunthorpe
0 siblings, 1 reply; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-18 15:18 UTC (permalink / raw)
To: Mike Rapoport
Cc: Jason Gunthorpe, pratyush, jasonmiu, graf, dmatlack, rientjes,
corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu, hughd,
skhawaja, chrisl
On Tue, Nov 18, 2025 at 10:06 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Tue, Nov 18, 2025 at 10:03:00AM -0400, Jason Gunthorpe wrote:
> > On Tue, Nov 18, 2025 at 01:21:34PM +0200, Mike Rapoport wrote:
> > > On Mon, Nov 17, 2025 at 11:22:54PM -0500, Pasha Tatashin wrote:
> > > > > You can avoid that complexity if you register the device with a different
> > > > > fops, but that's technicality.
> > > > >
> > > > > Your point about treating the incoming FDT as an underlying resource that
> > > > > failed to initialize makes sense, but nevertheless userspace needs a
> > > > > reliable way to detect it and parsing dmesg is not something we should rely
> > > > > on.
> > > >
> > > > I see two solutions:
> > > >
> > > > 1. LUO fails to retrieve the preserved data, the user gets informed by
> > > > not finding /dev/liveupdate, and studying the dmesg for what has
> > > > happened (in reality in fleets version mismatches should not be
> > > > happening, those should be detected in quals).
> > > > 2. Create a zombie device to return some errno on open, and still
> > > > study dmesg to understand what really happened.
> > >
> > > User should not study dmesg. We need another solution.
> > > What's wrong with e.g. ioctl()?
> >
> > It seems very dangerous to even boot at all if the next kernel doesn't
> > understand the serialization information..
> >
> > IMHO I think we should not even be thinking about this, it is up to
> > the predecessor environment to prevent it from happening. The ideas to
> > use ELF metadata/etc to allow a pre-flight validation are the right
> > solution.
100% agreed, this is the goal.
> > If we get into the next kernel and it receives information it cannot
> > process it should just BUG_ON and die, or some broad equivalent.
I initially had a panic() that would kill the kernel, but after
further consideration, I realized that we can still boot into
"maintenance" mode and allow the user to decide when and how to reboot
the machine back to a normal state.
Crashing during early boot has its own disadvantages: the crash kernel
is not available. Also, because live-update has to be very fast, the
console is likely to be disabled. Therefore, getting to userspace and
allowing the user to investigate what happened (e.g., automatically
retrieving dmesg or a core dump and filing a bug) before rebooting
seems like the most sensible approach.
This won't leak data, as /dev/liveupdate is completely disabled, so
nothing preserved in memory will be recoverable.
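A minimal userspace sketch of how an agent could notice this condition
without parsing dmesg, assuming only the /dev/liveupdate path named in
this thread. The enum and the helper luo_probe_path() are hypothetical
names made up for illustration; they are not part of any LUO uAPI.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical classification for this sketch; not part of the LUO uAPI. */
enum luo_probe {
	LUO_PROBE_OK,		/* device node exists and could be opened */
	LUO_PROBE_ABSENT,	/* node is missing: LUO is disabled or failed to initialize */
	LUO_PROBE_ERROR,	/* node exists but open() failed for another reason */
};

/* Try to open the given device path and classify the result. */
static enum luo_probe luo_probe_path(const char *path)
{
	int fd = open(path, O_RDWR);

	if (fd >= 0) {
		close(fd);
		return LUO_PROBE_OK;
	}

	/* ENOENT means the node was never created. */
	return (errno == ENOENT) ? LUO_PROBE_ABSENT : LUO_PROBE_ERROR;
}
```

An agent would call luo_probe_path("/dev/liveupdate") early after boot
and, on LUO_PROBE_ABSENT, collect diagnostics and schedule a normal
reboot. It still cannot tell "LUO not configured" apart from "LUO failed
to retrieve the preserved data", which is the gap being discussed here.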
Pasha
* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
2025-11-18 15:18 ` Pasha Tatashin
@ 2025-11-18 15:36 ` Jason Gunthorpe
2025-11-18 15:46 ` Pasha Tatashin
0 siblings, 1 reply; 92+ messages in thread
From: Jason Gunthorpe @ 2025-11-18 15:36 UTC (permalink / raw)
To: Pasha Tatashin
Cc: Mike Rapoport, pratyush, jasonmiu, graf, dmatlack, rientjes,
corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu, hughd,
skhawaja, chrisl
On Tue, Nov 18, 2025 at 10:18:28AM -0500, Pasha Tatashin wrote:
> On Tue, Nov 18, 2025 at 10:06 AM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > On Tue, Nov 18, 2025 at 10:03:00AM -0400, Jason Gunthorpe wrote:
> > > On Tue, Nov 18, 2025 at 01:21:34PM +0200, Mike Rapoport wrote:
> > > > On Mon, Nov 17, 2025 at 11:22:54PM -0500, Pasha Tatashin wrote:
> > > > > > You can avoid that complexity if you register the device with a different
> > > > > > fops, but that's technicality.
> > > > > >
> > > > > > Your point about treating the incoming FDT as an underlying resource that
> > > > > > failed to initialize makes sense, but nevertheless userspace needs a
> > > > > > reliable way to detect it and parsing dmesg is not something we should rely
> > > > > > on.
> > > > >
> > > > > I see two solutions:
> > > > >
> > > > > 1. LUO fails to retrieve the preserved data, the user gets informed by
> > > > > not finding /dev/liveupdate, and studying the dmesg for what has
> > > > > happened (in reality in fleets version mismatches should not be
> > > > > happening, those should be detected in quals).
> > > > > 2. Create a zombie device to return some errno on open, and still
> > > > > study dmesg to understand what really happened.
> > > >
> > > > User should not study dmesg. We need another solution.
> > > > What's wrong with e.g. ioctl()?
> > >
> > > It seems very dangerous to even boot at all if the next kernel doesn't
> > > understand the serialization information..
> > >
> > > IMHO I think we should not even be thinking about this, it is up to
> > > the predecessor environment to prevent it from happening. The ideas to
> > > use ELF metadata/etc to allow a pre-flight validation are the right
> > > solution.
>
> 100% agreed, this is the goal.
>
> > > If we get into the next kernel and it receives information it cannot
> > > process it should just BUG_ON and die, or some broad equivalent.
>
> I initially had a panic() that would kill the kernel, but after
> further consideration, I realized that we can still boot into
> "maintenance" mode and allow the user to decide when and how to reboot
> the machine back to a normal state.
> This won't leak data, as /dev/liveupdate is completely disabled, so
> nothing preserved in memory will be recoverable.
This seems reasonable, but it is still dangerous.
At a minimum, KHO startup needs to either succeed, panic, or fail
to online most of the memory (i.e., run from the safe region only).
The above approach works better for things like VFIO or memfd where
you can boot significantly safely. Not sure about iommu though, if
iommu doesn't deserialize properly then it probably corrupts all
memory too.
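The three acceptable outcomes can be written down as a small decision
function. This is only a sketch of the policy under stated assumptions:
every name below is hypothetical, and running from the safe region only
is not something KHO is known to support today.

```c
#include <stdbool.h>

/* All names here are hypothetical and exist only for this sketch. */
enum kho_boot_outcome {
	KHO_BOOT_NORMAL,	/* preserved state was restored, online all memory */
	KHO_BOOT_SCRATCH_ONLY,	/* degraded: run from the safe region only */
	KHO_BOOT_PANIC,		/* no safe way to continue */
};

/*
 * Pick one of the three outcomes: success, a degraded boot that leaves
 * most memory offline, or a panic when neither is possible.
 */
static enum kho_boot_outcome kho_boot_policy(bool restored_ok,
					     bool can_limit_to_scratch)
{
	if (restored_ok)
		return KHO_BOOT_NORMAL;

	return can_limit_to_scratch ? KHO_BOOT_SCRATCH_ONLY : KHO_BOOT_PANIC;
}
```

The point of the sketch is that there is no fourth branch: a failed
restore never falls through to onlining all memory.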
Jason
* Re: [PATCH v6 08/20] liveupdate: luo_flb: Introduce File-Lifecycle-Bound global state
2025-11-18 11:28 ` Mike Rapoport
@ 2025-11-18 15:37 ` Pasha Tatashin
0 siblings, 0 replies; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-18 15:37 UTC (permalink / raw)
To: Mike Rapoport
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Tue, Nov 18, 2025 at 6:28 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Mon, Nov 17, 2025 at 10:54:29PM -0500, Pasha Tatashin wrote:
> > >
> > > The concept makes sense to me, but it's hard to review the implementation
> > > without an actual user.
> >
> > There are three users: we will have HugeTLB support that is going to
> > be posted as RFC in a few weeks. Also, in two weeks we are going to
> > have an updated VFIO and IOMMU series posted both using FLBs. In the
> > mean time, this series provides an FLB in-kernel test that verifies
> > that multiple FLBs can be attached to File-Handlers, and the basic
> > interfaces are working.
>
> Which means that essentially there won't be a real kernel user for FLB for
> a while.
> We usually don't merge dead code because some future patchset depends on
> it.
I understand the concern. I would prefer to merge FLB with the rest of
the LUO series; I don't view it as completely dead code since I have
added the in-kernel test that specifically exercises and validates
this API.
> I think it should stay in mm-nonmm-unstable if Andrew does not mind keeping
> it there until the first user is going to land and then FLB will move
> upstream along with that user.
My reasoning for pushing for inclusion now is that there are many
developers who currently depend on the FLB functionality. Having it in
a public tree, preferably upstream, or at least linux-next, would be
highly beneficial for their development and testing.
However, to avoid blocking the entire series, I am going to move the
FLB patch and the in-kernel test patch to be the last two patches in
LUOv7.
This way, the rest of the LUO series can be merged without them if
they are blocked. In that case, however, it would be best if the two
FLB patches stayed in the mm tree so that VFIO/IOMMU/PCI/HugeTLB
preservation developers can use them, as they all depend on a
functional FLB.
Pasha
* Re: [PATCH v6 01/20] liveupdate: luo_core: luo_ioctl: Live Update Orchestrator
2025-11-15 23:33 ` [PATCH v6 01/20] liveupdate: luo_core: luo_ioctl: " Pasha Tatashin
2025-11-17 2:54 ` Andrew Morton
@ 2025-11-18 15:45 ` Pratyush Yadav
2025-11-18 16:11 ` Pasha Tatashin
1 sibling, 1 reply; 92+ messages in thread
From: Pratyush Yadav @ 2025-11-18 15:45 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, rppt, dmatlack, rientjes, corbet,
rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm,
tj, yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, lennart,
brauner, linux-api, linux-fsdevel, saeedm, ajayachandra, jgg,
parav, leonro, witu, hughd, skhawaja, chrisl
On Sat, Nov 15 2025, Pasha Tatashin wrote:
> Introduce LUO, a mechanism intended to facilitate kernel updates while
> keeping designated devices operational across the transition (e.g., via
> kexec). The primary use case is updating hypervisors with minimal
> disruption to running virtual machines. For userspace side of hypervisor
> update we have copyless migration. LUO is for updating the kernel.
>
> This initial patch lays the groundwork for the LUO subsystem.
>
> Further functionality, including the implementation of state transition
> logic, integration with KHO, and hooks for subsystems and file
> descriptors, will be added in subsequent patches.
>
> Create a character device at /dev/liveupdate.
>
> A new uAPI header, <uapi/linux/liveupdate.h>, will define the necessary
> structures. The magic number for IOCTL is registered in
> Documentation/userspace-api/ioctl/ioctl-number.rst.
>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
[...]
> diff --git a/kernel/liveupdate/luo_core.c b/kernel/liveupdate/luo_core.c
> new file mode 100644
> index 000000000000..0e1ab19fa1cd
> --- /dev/null
> +++ b/kernel/liveupdate/luo_core.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * Copyright (c) 2025, Google LLC.
> + * Pasha Tatashin <pasha.tatashin@soleen.com>
> + */
> +
> +/**
> + * DOC: Live Update Orchestrator (LUO)
> + *
> + * Live Update is a specialized, kexec-based reboot process that allows a
> + * running kernel to be updated from one version to another while preserving
> + * the state of selected resources and keeping designated hardware devices
> + * operational. For these devices, DMA activity may continue throughout the
> + * kernel transition.
> + *
> + * While the primary use case driving this work is supporting live updates of
> + * the Linux kernel when it is used as a hypervisor in cloud environments, the
> + * LUO framework itself is designed to be workload-agnostic. Much like Kernel
> + * Live Patching, which applies security fixes regardless of the workload,
> + * Live Update facilitates a full kernel version upgrade for any type of system.
Nit: I think live update is very different from live patching. It has
very different limitations and advantages. In fact, I view live patching
and live update on two opposite ends of the "applying security patches"
spectrum. I think this line is going to mislead or confuse people.
I think it would be better to either spend more lines explaining the
difference between the two, or just drop it from here.
> + *
> + * For example, a non-hypervisor system running an in-memory cache like
> + * memcached with many gigabytes of data can use LUO. The userspace service
> + * can place its cache into a memfd, have its state preserved by LUO, and
> + * restore it immediately after the kernel kexec.
> + *
> + * Whether the system is running virtual machines, containers, a
> + * high-performance database, or networking services, LUO's primary goal is to
> + * enable a full kernel update by preserving critical userspace state and
> + * keeping essential devices operational.
> + *
> + * The core of LUO is a mechanism that tracks the progress of a live update,
> + * along with a callback API that allows other kernel subsystems to participate
> + * in the process. Example subsystems that can hook into LUO include: kvm,
> + * iommu, interrupts, vfio, participating filesystems, and memory management.
> + *
> + * LUO uses Kexec Handover to transfer memory state from the current kernel to
> + * the next kernel. For more details see
> + * Documentation/core-api/kho/concepts.rst.
> + */
> +
[...]
> diff --git a/kernel/liveupdate/luo_ioctl.c b/kernel/liveupdate/luo_ioctl.c
> new file mode 100644
> index 000000000000..44d365185f7c
> --- /dev/null
> +++ b/kernel/liveupdate/luo_ioctl.c
[...]
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Pasha Tatashin");
> +MODULE_DESCRIPTION("Live Update Orchestrator");
> +MODULE_VERSION("0.1");
Nit: do we really need the module version? I don't think LUO can even be
used as a module. What does this number mean then?
Other than these two nitpicks,
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
--
Regards,
Pratyush Yadav
* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
2025-11-18 15:36 ` Jason Gunthorpe
@ 2025-11-18 15:46 ` Pasha Tatashin
2025-11-18 16:15 ` Jason Gunthorpe
0 siblings, 1 reply; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-18 15:46 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Mike Rapoport, pratyush, jasonmiu, graf, dmatlack, rientjes,
corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu, hughd,
skhawaja, chrisl
> > This won't leak data, as /dev/liveupdate is completely disabled, so
> > nothing preserved in memory will be recoverable.
>
> This seems reasonable, but it is still dangerous.
>
> At the minimum the KHO startup either needs to succeed, panic, or fail
> to online most of the memory (ie run from the safe region only)
Allowing a degraded boot using only scratch memory sounds like a very
good compromise. This allows the live-update boot to stay alive as a
sort of "crash kernel," particularly since kdump functionality is not
available here. However, it would require some work in KHO to enable
such a feature.
> The above approach works better for things like VFIO or memfd where
> you can boot significantly safely. Not sure about iommu though, if
> iommu doesn't deserialize properly then it probably corrupts all
> memory too.
Yes, DMA may corrupt memory if KHO is broken, *but* we are discussing
recovery from a broken LUO: the KHO-preserved memory should still stay
preserved but unretrievable, so DMA activity should only happen to those
regions...
Pasha
>
> Jason
* Re: [PATCH v6 01/20] liveupdate: luo_core: luo_ioctl: Live Update Orchestrator
2025-11-18 15:45 ` Pratyush Yadav
@ 2025-11-18 16:11 ` Pasha Tatashin
0 siblings, 0 replies; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-18 16:11 UTC (permalink / raw)
To: Pratyush Yadav
Cc: jasonmiu, graf, rppt, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, lennart,
brauner, linux-api, linux-fsdevel, saeedm, ajayachandra, jgg,
parav, leonro, witu, hughd, skhawaja, chrisl
On Tue, Nov 18, 2025 at 10:46 AM Pratyush Yadav <pratyush@kernel.org> wrote:
>
> On Sat, Nov 15 2025, Pasha Tatashin wrote:
>
> > Introduce LUO, a mechanism intended to facilitate kernel updates while
> > keeping designated devices operational across the transition (e.g., via
> > kexec). The primary use case is updating hypervisors with minimal
> > disruption to running virtual machines. For userspace side of hypervisor
> > update we have copyless migration. LUO is for updating the kernel.
> >
> > This initial patch lays the groundwork for the LUO subsystem.
> >
> > Further functionality, including the implementation of state transition
> > logic, integration with KHO, and hooks for subsystems and file
> > descriptors, will be added in subsequent patches.
> >
> > Create a character device at /dev/liveupdate.
> >
> > A new uAPI header, <uapi/linux/liveupdate.h>, will define the necessary
> > structures. The magic number for IOCTL is registered in
> > Documentation/userspace-api/ioctl/ioctl-number.rst.
> >
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> [...]
> > diff --git a/kernel/liveupdate/luo_core.c b/kernel/liveupdate/luo_core.c
> > new file mode 100644
> > index 000000000000..0e1ab19fa1cd
> > --- /dev/null
> > +++ b/kernel/liveupdate/luo_core.c
> > @@ -0,0 +1,86 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +/*
> > + * Copyright (c) 2025, Google LLC.
> > + * Pasha Tatashin <pasha.tatashin@soleen.com>
> > + */
> > +
> > +/**
> > + * DOC: Live Update Orchestrator (LUO)
> > + *
> > + * Live Update is a specialized, kexec-based reboot process that allows a
> > + * running kernel to be updated from one version to another while preserving
> > + * the state of selected resources and keeping designated hardware devices
> > + * operational. For these devices, DMA activity may continue throughout the
> > + * kernel transition.
> > + *
> > + * While the primary use case driving this work is supporting live updates of
> > + * the Linux kernel when it is used as a hypervisor in cloud environments, the
> > + * LUO framework itself is designed to be workload-agnostic. Much like Kernel
> > + * Live Patching, which applies security fixes regardless of the workload,
> > + * Live Update facilitates a full kernel version upgrade for any type of system.
>
> Nit: I think live update is very different from live patching. It has
> very different limitations and advantages. In fact, I view live patching
> and live update on two opposite ends of the "applying security patches"
> spectrum. I think this line is going to mislead or confuse people.
>
> I think it would better to either spend more lines explaining the
> difference between the two, or just drop it from here.
I removed mentioning live-patching.
>
> > + *
> > + * For example, a non-hypervisor system running an in-memory cache like
> > + * memcached with many gigabytes of data can use LUO. The userspace service
> > + * can place its cache into a memfd, have its state preserved by LUO, and
> > + * restore it immediately after the kernel kexec.
> > + *
> > + * Whether the system is running virtual machines, containers, a
> > + * high-performance database, or networking services, LUO's primary goal is to
> > + * enable a full kernel update by preserving critical userspace state and
> > + * keeping essential devices operational.
> > + *
> > + * The core of LUO is a mechanism that tracks the progress of a live update,
> > + * along with a callback API that allows other kernel subsystems to participate
> > + * in the process. Example subsystems that can hook into LUO include: kvm,
> > + * iommu, interrupts, vfio, participating filesystems, and memory management.
> > + *
> > + * LUO uses Kexec Handover to transfer memory state from the current kernel to
> > + * the next kernel. For more details see
> > + * Documentation/core-api/kho/concepts.rst.
> > + */
> > +
> [...]
> > diff --git a/kernel/liveupdate/luo_ioctl.c b/kernel/liveupdate/luo_ioctl.c
> > new file mode 100644
> > index 000000000000..44d365185f7c
> > --- /dev/null
> > +++ b/kernel/liveupdate/luo_ioctl.c
> [...]
> > +MODULE_LICENSE("GPL");
> > +MODULE_AUTHOR("Pasha Tatashin");
> > +MODULE_DESCRIPTION("Live Update Orchestrator");
> > +MODULE_VERSION("0.1");
>
> Nit: do we really need the module version? I don't think LUO can even be
> used as a module. What does this number mean then?
Removed the above and also removed liveupdate_exit(). Also changed:
module_init(liveupdate_ioctl_init); to late_initcall(liveupdate_ioctl_init);
> Other than these two nitpicks,
>
> Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Thank you!
Pasha
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
2025-11-18 15:46 ` Pasha Tatashin
@ 2025-11-18 16:15 ` Jason Gunthorpe
2025-11-18 22:07 ` Pasha Tatashin
0 siblings, 1 reply; 92+ messages in thread
From: Jason Gunthorpe @ 2025-11-18 16:15 UTC (permalink / raw)
To: Pasha Tatashin
Cc: Mike Rapoport, pratyush, jasonmiu, graf, dmatlack, rientjes,
corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu, hughd,
skhawaja, chrisl
On Tue, Nov 18, 2025 at 10:46:35AM -0500, Pasha Tatashin wrote:
> > > This won't leak data, as /dev/liveupdate is completely disabled, so
> > > nothing preserved in memory will be recoverable.
> >
> > This seems reasonable, but it is still dangerous.
> >
> > At the minimum the KHO startup either needs to succeed, panic, or fail
> > to online most of the memory (ie run from the safe region only)
>
> Allowing degraded booting using only scratch memory sounds like a very
> good compromise. This allows the live-update boot to stay alive as a
> sort of "crash kernel," particularly since kdump functionality is not
> available here. However, it would require some work in KHO to enable
> such a feature.
>
> > The above approach works better for things like VFIO or memfd where
> > you can boot significantly safely. Not sure about iommu though, if
> > iommu doesn't deserialize properly then it probably corrupts all
> > memory too.
>
> Yes, DMA may corrupt memory if KHO is broken, *but* we are discussing
> recovery from a broken LUO: the KHO-preserved memory should still stay
> preserved but unretrievable, so DMA activity should only happen to those
> regions...
If the iommu is not preserved then normal iommu boot will possibly set
the translation to identity and it will scribble over random memory.
You can't rely on the translation being present and only reaching kho
preserved memory if the iommu can't restore itself.
Jason
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v6 06/20] liveupdate: luo_file: implement file systems callbacks
2025-11-15 23:33 ` [PATCH v6 06/20] liveupdate: luo_file: implement file systems callbacks Pasha Tatashin
2025-11-16 18:15 ` Mike Rapoport
@ 2025-11-18 17:38 ` David Matlack
2025-11-18 17:43 ` Pratyush Yadav
1 sibling, 1 reply; 92+ messages in thread
From: David Matlack @ 2025-11-18 17:38 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, rppt, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On 2025-11-15 06:33 PM, Pasha Tatashin wrote:
> This patch implements the core mechanism for managing preserved
> files throughout the live update lifecycle. It provides the logic to
> invoke the file handler callbacks (preserve, unpreserve, freeze,
> unfreeze, retrieve, and finish) at the appropriate stages.
>
> During the reboot phase, luo_file_freeze() serializes the final
> metadata for each file (handler compatible string, token, and data
> handle) into a memory region preserved by KHO. In the new kernel,
> luo_file_deserialize() reconstructs the in-memory file list from this
> data, preparing the session for retrieval.
>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> +int liveupdate_register_file_handler(struct liveupdate_file_handler *h);
Should there be a way to unregister a file handler?
If VFIO is built as module then I think it would need to be able to
unregister its file handler when the module is unloaded to avoid leaking
pointers to its text in LUO.
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v6 06/20] liveupdate: luo_file: implement file systems callbacks
2025-11-18 17:38 ` David Matlack
@ 2025-11-18 17:43 ` Pratyush Yadav
2025-11-18 17:58 ` Pasha Tatashin
0 siblings, 1 reply; 92+ messages in thread
From: Pratyush Yadav @ 2025-11-18 17:43 UTC (permalink / raw)
To: David Matlack
Cc: Pasha Tatashin, pratyush, jasonmiu, graf, rppt, rientjes, corbet,
rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm,
tj, yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, lennart,
brauner, linux-api, linux-fsdevel, saeedm, ajayachandra, jgg,
parav, leonro, witu, hughd, skhawaja, chrisl
On Tue, Nov 18 2025, David Matlack wrote:
> On 2025-11-15 06:33 PM, Pasha Tatashin wrote:
>> This patch implements the core mechanism for managing preserved
>> files throughout the live update lifecycle. It provides the logic to
>> invoke the file handler callbacks (preserve, unpreserve, freeze,
>> unfreeze, retrieve, and finish) at the appropriate stages.
>>
>> During the reboot phase, luo_file_freeze() serializes the final
>> metadata for each file (handler compatible string, token, and data
>> handle) into a memory region preserved by KHO. In the new kernel,
>> luo_file_deserialize() reconstructs the in-memory file list from this
>> data, preparing the session for retrieval.
>>
>> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
>
>> +int liveupdate_register_file_handler(struct liveupdate_file_handler *h);
>
> Should there be a way to unregister a file handler?
>
> If VFIO is built as module then I think it would need to be able to
> unregister its file handler when the module is unloaded to avoid leaking
> pointers to its text in LUO.
Good point. We also need this when using FLB. You would first do
liveupdate_register_file_handler(), and then do
liveupdate_register_flb(). If the latter fails, you would want to
unregister the file handler too.
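
A minimal userspace sketch of the unwind path described above. It is not
kernel code: every mock_* name is hypothetical, the FLB failure is injected
through a flag, and mock_unregister_file_handler() stands in for an
unregister call that v6 does not provide.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace mock: none of these names are the real LUO API. */
struct mock_file_handler {
	const char *compatible;
	bool registered;
};

/* Test knob: when true, the mock FLB registration fails. */
static bool mock_flb_should_fail;

static int mock_register_file_handler(struct mock_file_handler *fh)
{
	if (fh->registered)
		return -EBUSY;
	fh->registered = true;
	return 0;
}

/* Hypothetical counterpart to registration, absent from v6. */
static void mock_unregister_file_handler(struct mock_file_handler *fh)
{
	fh->registered = false;
}

static int mock_register_flb(struct mock_file_handler *fh)
{
	(void)fh;
	return mock_flb_should_fail ? -ENOMEM : 0;
}

/* Client init: register the handler, then the FLB; unwind on failure. */
static int mock_client_init(struct mock_file_handler *fh)
{
	int err;

	err = mock_register_file_handler(fh);
	if (err)
		return err;

	err = mock_register_flb(fh);
	if (err)
		goto err_unregister;

	return 0;

err_unregister:
	/* Without an unregister call the handler would stay registered. */
	mock_unregister_file_handler(fh);
	return err;
}
```

The goto-based unwind mirrors the usual kernel error-path style; without the
unregister step, a failed init would leave a handler registered with no FLB.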
--
Regards,
Pratyush Yadav
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v6 06/20] liveupdate: luo_file: implement file systems callbacks
2025-11-18 17:43 ` Pratyush Yadav
@ 2025-11-18 17:58 ` Pasha Tatashin
2025-11-18 18:17 ` Pratyush Yadav
2025-11-18 19:09 ` Jason Gunthorpe
0 siblings, 2 replies; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-18 17:58 UTC (permalink / raw)
To: Pratyush Yadav
Cc: David Matlack, jasonmiu, graf, rppt, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, lennart,
brauner, linux-api, linux-fsdevel, saeedm, ajayachandra, jgg,
parav, leonro, witu, hughd, skhawaja, chrisl
On Tue, Nov 18, 2025 at 12:43 PM Pratyush Yadav <pratyush@kernel.org> wrote:
>
> On Tue, Nov 18 2025, David Matlack wrote:
>
> > On 2025-11-15 06:33 PM, Pasha Tatashin wrote:
> >> This patch implements the core mechanism for managing preserved
> >> files throughout the live update lifecycle. It provides the logic to
> >> invoke the file handler callbacks (preserve, unpreserve, freeze,
> >> unfreeze, retrieve, and finish) at the appropriate stages.
> >>
> >> During the reboot phase, luo_file_freeze() serializes the final
> >> metadata for each file (handler compatible string, token, and data
> >> handle) into a memory region preserved by KHO. In the new kernel,
> >> luo_file_deserialize() reconstructs the in-memory file list from this
> >> data, preparing the session for retrieval.
> >>
> >> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> >
> >> +int liveupdate_register_file_handler(struct liveupdate_file_handler *h);
> >
> > Should there be a way to unregister a file handler?
> >
> > If VFIO is built as module then I think it would need to be able to
> > unregister its file handler when the module is unloaded to avoid leaking
> > pointers to its text in LUO.
I actually had full unregister functionality in v4 and earlier, but I
dropped it from this series to minimize the footprint and get the core
infrastructure landed first.
For now, safety is guaranteed because
liveupdate_register_file_handler() and liveupdate_register_flb() take
a module reference. This effectively pins any module that registers
with LUO, meaning those driver modules cannot be unloaded or upgraded
dynamically; they can only be updated via Live Update or a full reboot.
I plan to introduce unregister support in a future improvement to
relax this constraint. The design I have in mind is:
1. Unregistration will acquire the singleton lock on /dev/liveupdate
to ensure no new sessions can be created during teardown.
2. Verify that there are no incoming/outgoing sessions.
3. File-Handler can only be unregistered if there are no FLBs
currently registered against it.
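
The checks listed above could be modelled roughly as follows. This is a
userspace sketch under stated assumptions, not the planned implementation:
the structures, field names and the choice of -EBUSY as the error code are
all hypothetical, and a plain flag stands in for the /dev/liveupdate
singleton lock.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical global LUO state for the model. */
struct mock_luo_state {
	bool dev_locked;		/* stands in for the singleton lock */
	size_t incoming_sessions;
	size_t outgoing_sessions;
};

/* Hypothetical per-handler state. */
struct mock_handler {
	bool registered;
	size_t flb_count;		/* FLBs registered against this handler */
};

static int mock_try_unregister(struct mock_luo_state *luo,
			       struct mock_handler *fh)
{
	int err = 0;

	/* 1. Take the singleton lock so no new session can be created. */
	if (luo->dev_locked)
		return -EBUSY;
	luo->dev_locked = true;

	/* 2. Refuse while any incoming or outgoing session exists. */
	if (luo->incoming_sessions || luo->outgoing_sessions) {
		err = -EBUSY;
		goto out_unlock;
	}

	/* 3. Refuse while FLBs are still registered against the handler. */
	if (fh->flb_count) {
		err = -EBUSY;
		goto out_unlock;
	}

	fh->registered = false;

out_unlock:
	luo->dev_locked = false;
	return err;
}
```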
Pasha
> Good point. We also need this when using FLB. You would first do
> liveupdate_register_file_handler(), and then do
> liveupdate_register_flb(). If the latter fails, you would want to
> unregister the file handler too.
>
> --
> Regards,
> Pratyush Yadav
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v6 06/20] liveupdate: luo_file: implement file systems callbacks
2025-11-18 17:58 ` Pasha Tatashin
@ 2025-11-18 18:17 ` Pratyush Yadav
2025-11-18 19:09 ` Jason Gunthorpe
1 sibling, 0 replies; 92+ messages in thread
From: Pratyush Yadav @ 2025-11-18 18:17 UTC (permalink / raw)
To: Pasha Tatashin
Cc: Pratyush Yadav, David Matlack, jasonmiu, graf, rppt, rientjes,
corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
stuart.w.hayes, lennart, brauner, linux-api, linux-fsdevel,
saeedm, ajayachandra, jgg, parav, leonro, witu, hughd, skhawaja,
chrisl
On Tue, Nov 18 2025, Pasha Tatashin wrote:
> On Tue, Nov 18, 2025 at 12:43 PM Pratyush Yadav <pratyush@kernel.org> wrote:
>>
>> On Tue, Nov 18 2025, David Matlack wrote:
>>
>> > On 2025-11-15 06:33 PM, Pasha Tatashin wrote:
>> >> This patch implements the core mechanism for managing preserved
>> >> files throughout the live update lifecycle. It provides the logic to
>> >> invoke the file handler callbacks (preserve, unpreserve, freeze,
>> >> unfreeze, retrieve, and finish) at the appropriate stages.
>> >>
>> >> During the reboot phase, luo_file_freeze() serializes the final
>> >> metadata for each file (handler compatible string, token, and data
>> >> handle) into a memory region preserved by KHO. In the new kernel,
>> >> luo_file_deserialize() reconstructs the in-memory file list from this
>> >> data, preparing the session for retrieval.
>> >>
>> >> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
>> >
>> >> +int liveupdate_register_file_handler(struct liveupdate_file_handler *h);
>> >
>> > Should there be a way to unregister a file handler?
>> >
>> > If VFIO is built as module then I think it would need to be able to
>> > unregister its file handler when the module is unloaded to avoid leaking
>> > pointers to its text in LUO.
>
> I actually had full unregister functionality in v4 and earlier, but I
> dropped it from this series to minimize the footprint and get the core
> infrastructure landed first.
>
> For now, safety is guaranteed because
> liveupdate_register_file_handler() and liveupdate_register_flb() take
> a module reference. This effectively pins any module that registers
> with LUO, meaning those driver modules cannot be unloaded or upgraded
> dynamically; they can only be updated via Live Update or a full reboot.
What if liveupdate_register_flb() fails? It would need to unregister its
file handler too, since the file handler can't really work without its
FLB. Shouldn't happen in practice, but still LUO clients need a way to
handle this failure.
[...]
--
Regards,
Pratyush Yadav
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v6 20/20] tests/liveupdate: Add in-kernel liveupdate test
2025-11-18 11:30 ` Mike Rapoport
@ 2025-11-18 18:56 ` Pasha Tatashin
0 siblings, 0 replies; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-18 18:56 UTC (permalink / raw)
To: Mike Rapoport
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Tue, Nov 18, 2025 at 6:31 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Mon, Nov 17, 2025 at 02:00:15PM -0500, Pasha Tatashin wrote:
> > > > #endif /* _LINUX_LIVEUPDATE_ABI_LUO_H */
> > > > diff --git a/kernel/liveupdate/luo_file.c b/kernel/liveupdate/luo_file.c
> > > > index df337c9c4f21..9a531096bdb5 100644
> > > > --- a/kernel/liveupdate/luo_file.c
> > > > +++ b/kernel/liveupdate/luo_file.c
> > > > @@ -834,6 +834,8 @@ int liveupdate_register_file_handler(struct liveupdate_file_handler *fh)
> > > > INIT_LIST_HEAD(&fh->flb_list);
> > > > list_add_tail(&fh->list, &luo_file_handler_list);
> > > >
> > > > + liveupdate_test_register(fh);
> > > > +
> > >
> > > Why this cannot be called from the test?
> >
> > Because test does not have access to all file_handlers that are being
> > registered with LUO.
>
> Unless I'm missing something, an FLB user registers a file handler and
> lets LUO know that it will need FLB. Why can't the test do the same?
The test needs to attach to every registered file handler because we
want to ensure that FLB scales and works correctly with any file
handler. For this in-kernel test, there is no need to create our own
file type or to drive it from userspace (where a user would create a
file of that type and preserve it with LUO so that the FLB can be
allocated and checked). This in-kernel test is self-sufficient.
> > Pasha
>
> --
> Sincerely yours,
> Mike.
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v6 06/20] liveupdate: luo_file: implement file systems callbacks
2025-11-18 17:58 ` Pasha Tatashin
2025-11-18 18:17 ` Pratyush Yadav
@ 2025-11-18 19:09 ` Jason Gunthorpe
2025-11-18 19:31 ` Pasha Tatashin
1 sibling, 1 reply; 92+ messages in thread
From: Jason Gunthorpe @ 2025-11-18 19:09 UTC (permalink / raw)
To: Pasha Tatashin
Cc: Pratyush Yadav, David Matlack, jasonmiu, graf, rppt, rientjes,
corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
stuart.w.hayes, lennart, brauner, linux-api, linux-fsdevel,
saeedm, ajayachandra, parav, leonro, witu, hughd, skhawaja,
chrisl
On Tue, Nov 18, 2025 at 12:58:20PM -0500, Pasha Tatashin wrote:
> I actually had full unregister functionality in v4 and earlier, but I
> dropped it from this series to minimize the footprint and get the core
> infrastructure landed first.
I don't think this will make sense; there are enough error paths that we
can't have registers without unregisters to unwind them.
Jason
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v6 06/20] liveupdate: luo_file: implement file systems callbacks
2025-11-18 19:09 ` Jason Gunthorpe
@ 2025-11-18 19:31 ` Pasha Tatashin
0 siblings, 0 replies; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-18 19:31 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Pratyush Yadav, David Matlack, jasonmiu, graf, rppt, rientjes,
corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
stuart.w.hayes, lennart, brauner, linux-api, linux-fsdevel,
saeedm, ajayachandra, parav, leonro, witu, hughd, skhawaja,
chrisl
On Tue, Nov 18, 2025 at 2:09 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Tue, Nov 18, 2025 at 12:58:20PM -0500, Pasha Tatashin wrote:
> > I actually had full unregister functionality in v4 and earlier, but I
> > dropped it from this series to minimize the footprint and get the core
> > infrastructure landed first.
>
> I don't think this will make sense; there are enough error paths that we
> can't have registers without unregisters to unwind them.
I will add them back in LUOv7.
>
> Jason
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
2025-11-18 16:15 ` Jason Gunthorpe
@ 2025-11-18 22:07 ` Pasha Tatashin
2025-11-18 23:25 ` Jason Gunthorpe
0 siblings, 1 reply; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-18 22:07 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Mike Rapoport, pratyush, jasonmiu, graf, dmatlack, rientjes,
corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu, hughd,
skhawaja, chrisl
On Tue, Nov 18, 2025 at 11:15 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Tue, Nov 18, 2025 at 10:46:35AM -0500, Pasha Tatashin wrote:
> > > > This won't leak data, as /dev/liveupdate is completely disabled, so
> > > > nothing preserved in memory will be recoverable.
> > >
> > > This seems reasonable, but it is still dangerous.
> > >
> > > At the minimum the KHO startup either needs to succeed, panic, or fail
> > > to online most of the memory (ie run from the safe region only)
> >
> > Allowing degraded booting using only scratch memory sounds like a very
> > good compromise. This allows the live-update boot to stay alive as a
> > sort of "crash kernel," particularly since kdump functionality is not
> > available here. However, it would require some work in KHO to enable
> > such a feature.
> >
> > > The above approach works better for things like VFIO or memfd where
> > > you can boot significantly safely. Not sure about iommu though, if
> > > iommu doesn't deserialize properly then it probably corrupts all
> > > memory too.
> >
> > Yes, DMA may corrupt memory if KHO is broken, *but* we are discussing
> > recovery from a broken LUO: the KHO-preserved memory should still stay
> > preserved but unretrievable, so DMA activity should only happen to those
> > regions...
>
> If the iommu is not preserved then normal iommu boot will possibly set
> the translation to identity and it will scribble over random memory.
>
> You can't rely on the translation being present and only reaching kho
> preserved memory if the iommu can't restore itself.
In this case, we cannot even rely on having "safe" memory, i.e. a
scratch-only boot to preserve dmesg/core etc., which is unfortunate. Is
there a way to avoid defaulting to identity mode when we are booting
into the "maintenance" mode?
Thanks,
Pasha
>
> Jason
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
2025-11-18 22:07 ` Pasha Tatashin
@ 2025-11-18 23:25 ` Jason Gunthorpe
2025-11-19 3:03 ` Pasha Tatashin
0 siblings, 1 reply; 92+ messages in thread
From: Jason Gunthorpe @ 2025-11-18 23:25 UTC (permalink / raw)
To: Pasha Tatashin
Cc: Mike Rapoport, pratyush, jasonmiu, graf, dmatlack, rientjes,
corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu, hughd,
skhawaja, chrisl
On Tue, Nov 18, 2025 at 05:07:15PM -0500, Pasha Tatashin wrote:
> In this case, we cannot even rely on having "safe" memory, i.e. a
> scratch-only boot to preserve dmesg/core etc., which is unfortunate. Is
> there a way to avoid defaulting to identity mode when we are booting
> into the "maintenance" mode?
Maybe one could be created?
It's tricky though because you also really want to block drivers from
using the iommu if you don't know they are quieted and you can't do
that without parsing the KHO data, which you can't do because it
doesn't understand it..
IDK, I think the "maintenance" mode is something that is probably best
effort and shouldn't be relied on. It will work if the iommu data is
restored or other lucky conditions hit, so it is not useless, but it
is certainly not robust or guaranteed.
You are better to squirt a panic message out of the serial port and
hope for the best I guess.
Jason
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
2025-11-18 23:25 ` Jason Gunthorpe
@ 2025-11-19 3:03 ` Pasha Tatashin
0 siblings, 0 replies; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-19 3:03 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Mike Rapoport, pratyush, jasonmiu, graf, dmatlack, rientjes,
corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu, hughd,
skhawaja, chrisl
On Tue, Nov 18, 2025 at 6:25 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Tue, Nov 18, 2025 at 05:07:15PM -0500, Pasha Tatashin wrote:
>
> > In this case, we cannot even rely on having "safe" memory, i.e. a
> > scratch-only boot to preserve dmesg/core etc., which is unfortunate. Is
> > there a way to avoid defaulting to identity mode when we are booting
> > into the "maintenance" mode?
>
> Maybe one could be created?
>
> It's tricky though because you also really want to block drivers from
> using the iommu if you don't know they are quieted and you can't do
> that without parsing the KHO data, which you can't do because it
> doesn't understand it..
>
> IDK, I think the "maintenance" mode is something that is probably best
> effort and shouldn't be relied on. It will work if the iommu data is
> restored or other lucky conditions hit, so it is not useless, but it
> is certainly not robust or guaranteed.
Right, even kdump has always been best-effort; many types of crashes
do not make it to the crash kernel.
> You are better to squirt a panic message out of the serial port and
For early boot LUO mismatches, or if FLB data is inaccessible for any
reason, devices might go rogue, so triggering a panic during boot is
appropriate.
However, session and file data structures are deserialized later, when
/dev/liveupdate is first opened by userspace. If deserialization fails
at that stage, I think we should simply fail the open(/dev/liveupdate)
call with an error such as -EIO.
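
As a rough illustration of the second case, here is a userspace model of
lazy deserialization on first open. The mock_* names and the data_corrupt
flag are assumptions made for the sketch; it only shows the proposed
behaviour of mapping any deserialization failure to -EIO.

```c
#include <errno.h>
#include <stdbool.h>

/* Userspace mock; these names are illustrative, not the LUO code. */
struct mock_luo {
	bool deserialized;	/* set once session/file data is parsed */
	bool data_corrupt;	/* simulates unreadable preserved data */
};

static int mock_deserialize_sessions(struct mock_luo *luo)
{
	if (luo->data_corrupt)
		return -EINVAL;
	luo->deserialized = true;
	return 0;
}

/* Models open() of /dev/liveupdate; deserializes on first use. */
static int mock_liveupdate_open(struct mock_luo *luo)
{
	if (luo->deserialized)
		return 0;

	/* Any deserialization failure is reported to the caller as -EIO. */
	if (mock_deserialize_sessions(luo))
		return -EIO;

	return 0;
}
```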
Pasha
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v6 18/20] selftests/liveupdate: Add kexec-based selftest for session lifecycle
2025-11-15 23:34 ` [PATCH v6 18/20] selftests/liveupdate: Add kexec-based selftest for session lifecycle Pasha Tatashin
` (2 preceding siblings ...)
2025-11-18 0:06 ` David Matlack
@ 2025-11-19 21:20 ` David Matlack
2025-11-19 22:12 ` Pasha Tatashin
3 siblings, 1 reply; 92+ messages in thread
From: David Matlack @ 2025-11-19 21:20 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, rppt, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On 2025-11-15 06:34 PM, Pasha Tatashin wrote:
> diff --git a/tools/testing/selftests/liveupdate/do_kexec.sh b/tools/testing/selftests/liveupdate/do_kexec.sh
> new file mode 100755
> index 000000000000..3c7c6cafbef8
> --- /dev/null
> +++ b/tools/testing/selftests/liveupdate/do_kexec.sh
> @@ -0,0 +1,16 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0
> +set -e
> +
> +# Use $KERNEL and $INITRAMFS to pass custom Kernel and optional initramfs
It'd be nice to use proper command line options for KERNEL and INITRAMFS
instead of relying on environment variables.
e.g.
./do_kexec.sh -k <kernel> -i <initramfs>
> +
> +KERNEL="${KERNEL:-/boot/bzImage}"
> +set -- -l -s --reuse-cmdline "$KERNEL"
I've observed --reuse-cmdline causing overload of the kernel command
line when doing repeated kexecs, since it includes the built-in command
line (CONFIG_CMDLINE) which then also gets added by the next kernel
during boot.
Should we have something like this instead?
diff --git a/tools/testing/selftests/liveupdate/do_kexec.sh b/tools/testing/selftests/liveupdate/do_kexec.sh
index 3c7c6cafbef8..2590a870993d 100755
--- a/tools/testing/selftests/liveupdate/do_kexec.sh
+++ b/tools/testing/selftests/liveupdate/do_kexec.sh
@@ -4,8 +4,16 @@ set -e
# Use $KERNEL and $INITRAMFS to pass custom Kernel and optional initramfs
+# Determine the boot command line we need to pass to the kexec kernel. Note
+# that the kernel will append to it its builtin command line, so make sure we
+# subtract the builtin command to avoid accumulating kernel parameters and
+# eventually overflowing the command line.
+full_cmdline=$(cat /proc/cmdline)
+builtin_cmdline=$(zcat /proc/config.gz|grep CONFIG_CMDLINE=|cut -f2 -d\")
+cmdline=${full_cmdline/$builtin_cmdline /}
+
KERNEL="${KERNEL:-/boot/bzImage}"
-set -- -l -s --reuse-cmdline "$KERNEL"
+set -- -l -s --command-line="${cmdline}" "$KERNEL"
INITRAMFS="${INITRAMFS:-/boot/initramfs}"
if [ -f "$INITRAMFS" ]; then
> +
> +INITRAMFS="${INITRAMFS:-/boot/initramfs}"
> +if [ -f "$INITRAMFS" ]; then
> + set -- "$@" --initrd="$INITRAMFS"
> +fi
> +
> +kexec "$@"
> +kexec -e
Consider separating the kexec load into its own script, in case systems have
their own ways of shutting down for kexec.
e.g. a kexec_load.sh script that does everything that do_kexec.sh does except
the `kexec -e`. Then do_kexec.sh just calls kexec_load.sh and kexec -e.
^ permalink raw reply related [flat|nested] 92+ messages in thread
* Re: [PATCH v6 15/20] mm: memfd_luo: allow preserving memfd
2025-11-17 11:03 ` Mike Rapoport
@ 2025-11-19 21:56 ` Pasha Tatashin
2025-11-20 15:34 ` Pratyush Yadav
0 siblings, 1 reply; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-19 21:56 UTC (permalink / raw)
To: Mike Rapoport
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Mon, Nov 17, 2025 at 6:04 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Sat, Nov 15, 2025 at 06:34:01PM -0500, Pasha Tatashin wrote:
> > From: Pratyush Yadav <ptyadav@amazon.de>
> >
> > The ability to preserve a memfd allows userspace to use KHO and LUO to
> > transfer its memory contents to the next kernel. This is useful in many
> > ways. For one, it can be used with IOMMUFD as the backing store for
> > IOMMU page tables. Preserving IOMMUFD is essential for performing a
> > hypervisor live update with passthrough devices. memfd support provides
> > the first building block for making that possible.
> >
> > For another, for applications with a large amount of memory that takes time
> > to reconstruct, reboots to consume kernel upgrades can be very
> > expensive. memfd with LUO gives those applications reboot-persistent
> > memory that they can use to quickly save and reconstruct that state.
> >
> > While memfd is backed by either hugetlbfs or shmem, currently only
> > support on shmem is added. To be more precise, support for anonymous
> > shmem files is added.
> >
> > The handover to the next kernel is not transparent. Not all properties
> > of the file are preserved; only its memory contents, position, and
> > size are. The recreated file gets the UID and GID of the task doing the
> > restore, and the task's cgroup gets charged with the memory.
> >
> > Once preserved, the file cannot grow or shrink, and all its pages are
> > pinned to avoid migrations and swapping. The file can still be read from
> > or written to.
> >
> > Use vmalloc to get the buffer to hold the folios, and preserve
> > it using kho_preserve_vmalloc(). This doesn't have the size limit.
> >
> > Co-developed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> > Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
>
> The order of signed-offs seems wrong, Pasha's should be the last one.
Updated.
> > + * This interface is a contract. Any modification to the FDT structure,
> > + * node properties, compatible string, or the layout of the serialization
> > + * structures defined here constitutes a breaking change. Such changes require
> > + * incrementing the version number in the MEMFD_LUO_FH_COMPATIBLE string.
>
> The same comment about contract as for the generic LUO documentation
> applies here (https://lore.kernel.org/all/aRnG8wDSSAtkEI_z@kernel.org/)
Added.
>
> > + *
> > + * FDT Structure Overview:
> > + * The memfd state is contained within a single FDT with the following layout:
>
> ...
>
> > +static struct memfd_luo_folio_ser *memfd_luo_preserve_folios(struct file *file, void *fdt,
> > + u64 *nr_foliosp)
> > +{
>
> If we are already returning nr_folios by reference, we might do it for
> memfd_luo_folio_ser as well and make the function return int.
Done
>
> > + struct inode *inode = file_inode(file);
> > + struct memfd_luo_folio_ser *pfolios;
> > + struct kho_vmalloc *kho_vmalloc;
> > + unsigned int max_folios;
> > + long i, size, nr_pinned;
> > + struct folio **folios;
>
> pfolios and folios read like the former is a pointer to latter.
> I'd s/pfolios/folios_ser/
Done
> > + int err = -EINVAL;
> > + pgoff_t offset;
> > + u64 nr_folios;
>
> ...
>
> > + kvfree(folios);
> > + *nr_foliosp = nr_folios;
> > + return pfolios;
> > +
> > +err_unpreserve:
> > + i--;
> > + for (; i >= 0; i--)
>
> Maybe a single line
>
> for (--i; i >= 0; --i)
Done, but wrote it as:
for (i = i - 1; i >= 0; i--)
Which looks a little cleaner to me.
>
> > + kho_unpreserve_folio(folios[i]);
> > + vfree(pfolios);
> > +err_unpin:
> > + unpin_folios(folios, nr_folios);
> > +err_free_folios:
> > + kvfree(folios);
> > + return ERR_PTR(err);
> > +}
> > +
> > +static void memfd_luo_unpreserve_folios(void *fdt, struct memfd_luo_folio_ser *pfolios,
> > + u64 nr_folios)
> > +{
> > + struct kho_vmalloc *kho_vmalloc;
> > + long i;
> > +
> > + if (!nr_folios)
> > + return;
> > +
> > + kho_vmalloc = (struct kho_vmalloc *)fdt_getprop(fdt, 0, MEMFD_FDT_FOLIOS, NULL);
> > + /* The FDT was created by this kernel so expect it to be sane. */
> > + WARN_ON_ONCE(!kho_vmalloc);
>
> The FDT won't have FOLIOS property if size was zero, will it?
> I think that if we add kho_vmalloc handle to struct memfd_luo_private and
> pass that around it will make things easier and simpler.
I am actually thinking of removing FDTs and using a versioned struct directly.
>
> > + kho_unpreserve_vmalloc(kho_vmalloc);
> > +
> > + for (i = 0; i < nr_folios; i++) {
> > + const struct memfd_luo_folio_ser *pfolio = &pfolios[i];
> > + struct folio *folio;
> > +
> > + if (!pfolio->foliodesc)
> > + continue;
>
> How can this happen? Can pfolios be a sparse array?
With the current implementation of memfd_pin_folios, which populates
holes, this array will be dense. This check is defensive coding in
case we switch to a sparse preservation mechanism in the future. I
will add a comment and a WARN_ON_ONCE().
>
> > + folio = pfn_folio(PRESERVED_FOLIO_PFN(pfolio->foliodesc));
> > +
> > + kho_unpreserve_folio(folio);
> > + unpin_folio(folio);
> > + }
> > +
> > + vfree(pfolios);
> > +}
>
> ...
>
> > +static void memfd_luo_finish(struct liveupdate_file_op_args *args)
> > +{
> > + const struct memfd_luo_folio_ser *pfolios;
> > + struct folio *fdt_folio;
> > + const void *fdt;
> > + u64 nr_folios;
> > +
> > + if (args->retrieved)
> > + return;
> > +
> > + fdt_folio = memfd_luo_get_fdt(args->serialized_data);
> > + if (!fdt_folio) {
> > + pr_err("failed to restore memfd FDT\n");
> > + return;
> > + }
> > +
> > + fdt = folio_address(fdt_folio);
> > +
> > + pfolios = memfd_luo_fdt_folios(fdt, &nr_folios);
> > + if (!pfolios)
> > + goto out;
> > +
> > + memfd_luo_discard_folios(pfolios, nr_folios);
>
> Does not this free the actual folios that were supposed to be preserved?
It does, when memfd was not reclaimed.
>
> > + vfree(pfolios);
> > +
> > +out:
> > + folio_put(fdt_folio);
> > +}
>
> ...
>
> > +static int memfd_luo_retrieve(struct liveupdate_file_op_args *args)
> > +{
> > + struct folio *fdt_folio;
> > + const u64 *pos, *size;
> > + struct file *file;
> > + int len, ret = 0;
> > + const void *fdt;
> > +
> > + fdt_folio = memfd_luo_get_fdt(args->serialized_data);
>
> Why do we need to kho_restore_folio() twice? Here and in
> memfd_luo_finish()?
Here we retrieve the memfd and give it to userspace. In finish, we discard
whatever was not reclaimed.
>
> > + if (!fdt_folio)
> > + return -ENOENT;
> > +
> > + fdt = page_to_virt(folio_page(fdt_folio, 0));
>
> folio_address()
Done
>
>
> --
> Sincerely yours,
> Mike.
* Re: [PATCH v6 18/20] selftests/liveupdate: Add kexec-based selftest for session lifecycle
2025-11-19 21:20 ` David Matlack
@ 2025-11-19 22:12 ` Pasha Tatashin
0 siblings, 0 replies; 92+ messages in thread
From: Pasha Tatashin @ 2025-11-19 22:12 UTC (permalink / raw)
To: David Matlack
Cc: pratyush, jasonmiu, graf, rppt, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Wed, Nov 19, 2025 at 4:20 PM David Matlack <dmatlack@google.com> wrote:
>
> On 2025-11-15 06:34 PM, Pasha Tatashin wrote:
>
> > diff --git a/tools/testing/selftests/liveupdate/do_kexec.sh b/tools/testing/selftests/liveupdate/do_kexec.sh
> > new file mode 100755
> > index 000000000000..3c7c6cafbef8
> > --- /dev/null
> > +++ b/tools/testing/selftests/liveupdate/do_kexec.sh
> > @@ -0,0 +1,16 @@
> > +#!/bin/sh
> > +# SPDX-License-Identifier: GPL-2.0
> > +set -e
> > +
> > +# Use $KERNEL and $INITRAMFS to pass custom Kernel and optional initramfs
>
> It'd be nice to use proper command line options for KERNEL and INITRAMFS
> instead of relying on environment variables.
Now that tests and do_kexec are separate, I do not think we should
complicate do_kexec.sh to support every possible environment. On most
modern distros kexec is managed via systemd, and the load and reboot
commands are going to be handled through systemd. do_kexec.sh is meant
for a very simple environment, such as one with a busybox rootfs, to
perform selftests.
> e.g.
>
> ./do_kexec.sh -k <kernel> -i <initramfs>
>
> > +
> > +KERNEL="${KERNEL:-/boot/bzImage}"
> > +set -- -l -s --reuse-cmdline "$KERNEL"
>
> I've observed --reuse-cmdline causing overload of the kernel command
> line when doing repeated kexecs, since it includes the built-in command
> line (CONFIG_CMDLINE) which then also gets added by the next kernel
> during boot.
There is a problem with CONFIG_CMDLINE + KEXEC; ideally, it should be
addressed in the kernel.
>
> Should we have something like this instead?
>
> diff --git a/tools/testing/selftests/liveupdate/do_kexec.sh b/tools/testing/selftests/liveupdate/do_kexec.sh
> index 3c7c6cafbef8..2590a870993d 100755
> --- a/tools/testing/selftests/liveupdate/do_kexec.sh
> +++ b/tools/testing/selftests/liveupdate/do_kexec.sh
> @@ -4,8 +4,16 @@ set -e
>
> # Use $KERNEL and $INITRAMFS to pass custom Kernel and optional initramfs
>
> +# Determine the boot command line we need to pass to the kexec kernel. Note
> +# that the kernel will append to it its builtin command line, so make sure we
> +# subtract the builtin command to avoid accumulating kernel parameters and
> +# eventually overflowing the command line.
> +full_cmdline=$(cat /proc/cmdline)
> +builtin_cmdline=$(zcat /proc/config.gz|grep CONFIG_CMDLINE=|cut -f2 -d\")
This also implies we have /proc/config.gz, i.e. CONFIG_IKCONFIG_PROC ...
> +cmdline=${full_cmdline/$builtin_cmdline /}
> +
> KERNEL="${KERNEL:-/boot/bzImage}"
> -set -- -l -s --reuse-cmdline "$KERNEL"
> +set -- -l -s --command-line="${cmdline}" "$KERNEL"
>
> INITRAMFS="${INITRAMFS:-/boot/initramfs}"
> if [ -f "$INITRAMFS" ]; then
>
> > +
> > +INITRAMFS="${INITRAMFS:-/boot/initramfs}"
> > +if [ -f "$INITRAMFS" ]; then
> > + set -- "$@" --initrd="$INITRAMFS"
> > +fi
> > +
> > +kexec "$@"
> > +kexec -e
>
> Consider separating the kexec load into its own script, in case systems have
> their own ways of shutting down for kexec.
I think, if do_kexec.sh does not work (load + reboot), the user should
use whatever the standard way is on the distro to do kexec.
>
> e.g. a kexec_load.sh script that does everything that do_kexec.sh does except
> the `kexec -e`. Then do_kexec.sh just calls kexec_load.sh and kexec -e.
* Re: [PATCH v6 15/20] mm: memfd_luo: allow preserving memfd
2025-11-19 21:56 ` Pasha Tatashin
@ 2025-11-20 15:34 ` Pratyush Yadav
0 siblings, 0 replies; 92+ messages in thread
From: Pratyush Yadav @ 2025-11-20 15:34 UTC (permalink / raw)
To: Pasha Tatashin
Cc: Mike Rapoport, pratyush, jasonmiu, graf, dmatlack, rientjes,
corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
stuart.w.hayes, lennart, brauner, linux-api, linux-fsdevel,
saeedm, ajayachandra, jgg, parav, leonro, witu, hughd, skhawaja,
chrisl
On Wed, Nov 19 2025, Pasha Tatashin wrote:
> On Mon, Nov 17, 2025 at 6:04 AM Mike Rapoport <rppt@kernel.org> wrote:
>>
>> On Sat, Nov 15, 2025 at 06:34:01PM -0500, Pasha Tatashin wrote:
>> > From: Pratyush Yadav <ptyadav@amazon.de>
>> >
>> > The ability to preserve a memfd allows userspace to use KHO and LUO to
>> > transfer its memory contents to the next kernel. This is useful in many
>> > ways. For one, it can be used with IOMMUFD as the backing store for
>> > IOMMU page tables. Preserving IOMMUFD is essential for performing a
>> > hypervisor live update with passthrough devices. memfd support provides
>> > the first building block for making that possible.
>> >
>> > For another, for applications with a large amount of memory that takes time
>> > to reconstruct, reboots to consume kernel upgrades can be very
>> > expensive. memfd with LUO gives those applications reboot-persistent
>> > memory that they can use to quickly save and reconstruct that state.
>> >
>> > While memfd is backed by either hugetlbfs or shmem, currently only
>> > support on shmem is added. To be more precise, support for anonymous
>> > shmem files is added.
>> >
>> > The handover to the next kernel is not transparent. Not all properties
>> > of the file are preserved; only its memory contents, position, and
>> > size are. The recreated file gets the UID and GID of the task doing the
>> > restore, and the task's cgroup gets charged with the memory.
>> >
>> > Once preserved, the file cannot grow or shrink, and all its pages are
>> > pinned to avoid migrations and swapping. The file can still be read from
>> > or written to.
>> >
>> > Use vmalloc to get the buffer to hold the folios, and preserve
>> > it using kho_preserve_vmalloc(). This doesn't have the size limit.
>> >
>> > Co-developed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
>> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
>> > Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
[...]
>> > + struct inode *inode = file_inode(file);
>> > + struct memfd_luo_folio_ser *pfolios;
>> > + struct kho_vmalloc *kho_vmalloc;
>> > + unsigned int max_folios;
>> > + long i, size, nr_pinned;
>> > + struct folio **folios;
>>
>> pfolios and folios read like the former is a pointer to latter.
>> I'd s/pfolios/folios_ser/
folios_ser is a tricky name; it is very close to folio_ser (which is
what you might use for one member of the array).
I was bitten by this when hacking on some hugetlb preservation code. I
wrote folios_ser instead of folio_ser in a loop, and then had to spend
half an hour trying to figure out why the code wasn't working. It is
kinda hard to differentiate between the two visually.
Not that I have a better name off the top of my head. Just saying that
this naming causes weird readability problems.
>
> Done
>
[...]
--
Regards,
Pratyush Yadav
* Re: [PATCH v6 06/20] liveupdate: luo_file: implement file systems callbacks
2025-11-17 17:50 ` Pasha Tatashin
@ 2025-11-20 17:20 ` Mike Rapoport
0 siblings, 0 replies; 92+ messages in thread
From: Mike Rapoport @ 2025-11-20 17:20 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
On Mon, Nov 17, 2025 at 12:50:56PM -0500, Pasha Tatashin wrote:
> > > +struct liveupdate_file_handler;
> > > +struct liveupdate_session;
> >
> > Why struct liveupdate_session is a part of public LUO API?
>
> It is an opaque version of the private "struct luo_session", in order to
> give subsystems access to:
> liveupdate_get_file_incoming(s, token, filep)
> liveupdate_get_token_outgoing(s, file, tokenp)
>
> For example, if your FD depends on another FD within a session, you
> can check if another FD is already preserved via
> liveupdate_get_token_outgoing(), and during retrieval time you can
> retrieve the "struct file" for your dependency.
And it's essentially unused right now.
> > > + }
> > > +
> > > + return 0;
> > > +
> > > +exit_err:
> > > + fput(file);
> > > + luo_session_free_files_mem(session);
> >
> > The error handling in this function is a mess. Pasha, please, please, use
> > goto consistently.
>
> How is this a mess? There is a single exit_err destination, no
> exception, no early returns except at the very top of the function
> where we do early returns before fget() which makes total sense.
>
> Do you want to add a separate destination for
> luo_session_free_files_mem() ? But that is not necessary, in many
> places it is considered totally reasonable for free(NULL) to work
> correctly...
You have a mix of releasing resources with goto or inside if (err).
And while basic free() primitives like kfree() and vfree() work correctly
with NULL as a parameter, luo_session_free_files_mem() is already not a
basic primitive and it may grow over time. It already has two conditions
that essentially prevent anything from being freed, and this will only
grow with time.
So yes, I want a separate goto destination for freeing each resource and a
goto for
err = fh->ops->preserve(&args);
if (err)
case.
> > > + luo_file = kzalloc(sizeof(*luo_file), GFP_KERNEL);
> > > + if (!luo_file)
> > > + return -ENOMEM;
> >
> > Shouldn't we free files allocated on the previous iterations?
>
> No, for the same reason explained in luo_session.c :-)
A comment here as well please :)
> > > +int liveupdate_get_file_incoming(struct liveupdate_session *s, u64 token,
> > > + struct file **filep)
> > > +{
> >
> > Ditto.
>
> These two functions are part of the public API allowing dependency
> tracking for vfio->iommu->memfd during preservation.
So like with FLB, until we get actual users for them they are dead code.
And until it's clear how exactly dependency tracking for vfio->iommu->memfd
will work, we won't know if this API is useful at all or we'll need
something else in the end.
--
Sincerely yours,
Mike.
Thread overview: 92+ messages
2025-11-15 23:33 [PATCH v6 00/20] Live Update Orchestrator Pasha Tatashin
2025-11-15 23:33 ` [PATCH v6 01/20] liveupdate: luo_core: luo_ioctl: " Pasha Tatashin
2025-11-17 2:54 ` Andrew Morton
2025-11-17 14:27 ` Pasha Tatashin
2025-11-18 15:45 ` Pratyush Yadav
2025-11-18 16:11 ` Pasha Tatashin
2025-11-15 23:33 ` [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO Pasha Tatashin
2025-11-16 12:43 ` Mike Rapoport
2025-11-16 14:55 ` Pasha Tatashin
2025-11-16 19:16 ` Mike Rapoport
2025-11-17 18:29 ` Pasha Tatashin
2025-11-17 21:05 ` Mike Rapoport
2025-11-18 4:22 ` Pasha Tatashin
2025-11-18 11:21 ` Mike Rapoport
2025-11-18 14:03 ` Jason Gunthorpe
2025-11-18 15:06 ` Mike Rapoport
2025-11-18 15:18 ` Pasha Tatashin
2025-11-18 15:36 ` Jason Gunthorpe
2025-11-18 15:46 ` Pasha Tatashin
2025-11-18 16:15 ` Jason Gunthorpe
2025-11-18 22:07 ` Pasha Tatashin
2025-11-18 23:25 ` Jason Gunthorpe
2025-11-19 3:03 ` Pasha Tatashin
2025-11-15 23:33 ` [PATCH v6 03/20] kexec: call liveupdate_reboot() before kexec Pasha Tatashin
2025-11-16 12:44 ` Mike Rapoport
2025-11-15 23:33 ` [PATCH v6 04/20] liveupdate: luo_session: add sessions support Pasha Tatashin
2025-11-16 17:05 ` Mike Rapoport
2025-11-17 15:09 ` Pasha Tatashin
2025-11-17 21:11 ` Mike Rapoport
2025-11-18 4:28 ` Pasha Tatashin
2025-11-15 23:33 ` [PATCH v6 05/20] liveupdate: luo_ioctl: add user interface Pasha Tatashin
2025-11-16 17:15 ` Mike Rapoport
2025-11-17 14:22 ` Pasha Tatashin
2025-11-15 23:33 ` [PATCH v6 06/20] liveupdate: luo_file: implement file systems callbacks Pasha Tatashin
2025-11-16 18:15 ` Mike Rapoport
2025-11-17 17:50 ` Pasha Tatashin
2025-11-20 17:20 ` Mike Rapoport
2025-11-18 17:38 ` David Matlack
2025-11-18 17:43 ` Pratyush Yadav
2025-11-18 17:58 ` Pasha Tatashin
2025-11-18 18:17 ` Pratyush Yadav
2025-11-18 19:09 ` Jason Gunthorpe
2025-11-18 19:31 ` Pasha Tatashin
2025-11-15 23:33 ` [PATCH v6 07/20] liveupdate: luo_session: Add ioctls for file preservation Pasha Tatashin
2025-11-16 18:25 ` Mike Rapoport
2025-11-18 2:58 ` Pasha Tatashin
2025-11-15 23:33 ` [PATCH v6 08/20] liveupdate: luo_flb: Introduce File-Lifecycle-Bound global state Pasha Tatashin
2025-11-17 9:39 ` Mike Rapoport
2025-11-18 3:54 ` Pasha Tatashin
2025-11-18 11:28 ` Mike Rapoport
2025-11-18 15:37 ` Pasha Tatashin
2025-11-15 23:33 ` [PATCH v6 09/20] docs: add luo documentation Pasha Tatashin
2025-11-15 23:33 ` [PATCH v6 10/20] MAINTAINERS: add liveupdate entry Pasha Tatashin
2025-11-17 9:40 ` Mike Rapoport
2025-11-17 18:20 ` Pasha Tatashin
2025-11-15 23:33 ` [PATCH v6 11/20] mm: shmem: use SHMEM_F_* flags instead of VM_* flags Pasha Tatashin
2025-11-17 9:48 ` Mike Rapoport
2025-11-17 18:25 ` Pasha Tatashin
2025-11-15 23:33 ` [PATCH v6 12/20] mm: shmem: allow freezing inode mapping Pasha Tatashin
2025-11-17 10:08 ` Mike Rapoport
2025-11-18 4:13 ` Pasha Tatashin
2025-11-15 23:33 ` [PATCH v6 13/20] mm: shmem: export some functions to internal.h Pasha Tatashin
2025-11-17 10:14 ` Mike Rapoport
2025-11-17 18:43 ` Pasha Tatashin
2025-11-15 23:34 ` [PATCH v6 14/20] liveupdate: luo_file: add private argument to store runtime state Pasha Tatashin
2025-11-17 10:15 ` Mike Rapoport
2025-11-17 18:45 ` Pasha Tatashin
2025-11-15 23:34 ` [PATCH v6 15/20] mm: memfd_luo: allow preserving memfd Pasha Tatashin
2025-11-17 11:03 ` Mike Rapoport
2025-11-19 21:56 ` Pasha Tatashin
2025-11-20 15:34 ` Pratyush Yadav
2025-11-15 23:34 ` [PATCH v6 16/20] docs: add documentation for memfd preservation via LUO Pasha Tatashin
2025-11-15 23:34 ` [PATCH v6 17/20] selftests/liveupdate: Add userspace API selftests Pasha Tatashin
2025-11-17 19:38 ` David Matlack
2025-11-17 20:16 ` Pasha Tatashin
2025-11-15 23:34 ` [PATCH v6 18/20] selftests/liveupdate: Add kexec-based selftest for session lifecycle Pasha Tatashin
2025-11-16 18:53 ` Zhu Yanjun
2025-11-17 18:23 ` Pasha Tatashin
2025-11-17 19:27 ` David Matlack
2025-11-17 20:08 ` David Matlack
2025-11-17 21:06 ` David Matlack
2025-11-18 1:01 ` Pasha Tatashin
2025-11-18 0:06 ` David Matlack
2025-11-18 1:08 ` Pasha Tatashin
2025-11-19 21:20 ` David Matlack
2025-11-19 22:12 ` Pasha Tatashin
2025-11-15 23:34 ` [PATCH v6 19/20] selftests/liveupdate: Add kexec test for multiple and empty sessions Pasha Tatashin
2025-11-15 23:34 ` [PATCH v6 20/20] tests/liveupdate: Add in-kernel liveupdate test Pasha Tatashin
2025-11-17 11:13 ` Mike Rapoport
2025-11-17 19:00 ` Pasha Tatashin
2025-11-18 11:30 ` Mike Rapoport
2025-11-18 18:56 ` Pasha Tatashin