Linux userland API discussions
* [PATCH v6 04/20] liveupdate: luo_session: add sessions support
From: Pasha Tatashin @ 2025-11-15 23:33 UTC (permalink / raw)
  To: pratyush, jasonmiu, graf, pasha.tatashin, rppt, dmatlack,
	rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
	mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
	linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu,
	hughd, skhawaja, chrisl
In-Reply-To: <20251115233409.768044-1-pasha.tatashin@soleen.com>

Introduce the concept of "Live Update Sessions" within the LUO framework.
LUO sessions provide a mechanism to group and manage `struct file *`
instances (representing file descriptors) that need to be preserved
across a kexec-based live update.

Each session is identified by a unique name and acts as a container
for file objects whose state is critical to a userspace workload, such
as a virtual machine or a high-performance database, aiming to maintain
their functionality across a kernel transition.
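As a rough userspace illustration of the "unique name" rule, the sketch below models a session list as a fixed-size table and rejects duplicates and overflow with the same error codes the diff below uses (-EEXIST, -ENOMEM). The names session_registry, registry_insert and REGISTRY_MAX are hypothetical and exist only for this example; the -EINVAL check for empty or over-long names is an assumption of the sketch, not something this patch does.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SESSION_NAME_LENGTH 56  /* mirrors LIVEUPDATE_SESSION_NAME_LENGTH */
#define REGISTRY_MAX        4   /* hypothetical small cap, for illustration only */

/* Hypothetical stand-in for the kernel's list of outgoing sessions. */
struct session_registry {
	char names[REGISTRY_MAX][SESSION_NAME_LENGTH];
	size_t count;
};

/*
 * Returns 0 on success, -EINVAL for an empty or over-long name (assumption
 * of this sketch), -ENOMEM when the table is full, -EEXIST on a duplicate.
 */
static int registry_insert(struct session_registry *reg, const char *name)
{
	size_t len = strlen(name);

	if (len == 0 || len >= SESSION_NAME_LENGTH)
		return -EINVAL;
	if (reg->count == REGISTRY_MAX)
		return -ENOMEM;
	for (size_t i = 0; i < reg->count; i++) {
		if (!strncmp(reg->names[i], name, SESSION_NAME_LENGTH))
			return -EEXIST;
	}
	memcpy(reg->names[reg->count], name, len + 1);
	reg->count++;
	return 0;
}
```

Inserting "vm-1" twice into an empty registry would succeed the first time and return -EEXIST the second time, leaving the count at one.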

This groundwork establishes the framework for preserving file-backed
state across kernel updates, with the actual file data preservation
mechanisms to be implemented in subsequent patches.
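The serialized layout described in the ABI header below (a header followed immediately by an array of per-session entries) can be sketched in userspace C as follows. The struct fields mirror the patch; the session_* names, the GCC/Clang packed attribute spelling and the 4K page size are assumptions of this sketch. Under those assumptions the capacity works out to the 819 sessions mentioned in the comment in luo_session.c.

```c
#include <stddef.h>
#include <stdint.h>

#define SESSION_NAME_LENGTH 56     /* LIVEUPDATE_SESSION_NAME_LENGTH in the patch */
#define SESSION_PGCNT       16u    /* LUO_SESSION_PGCNT in the patch */
#define ASSUMED_PAGE_SIZE   4096u  /* assumption: 4K pages */

/* Userspace mirror of struct luo_session_header_ser. */
struct session_header_ser {
	uint64_t pgcnt;
	uint64_t count;
} __attribute__((packed));

/* Userspace mirror of struct luo_session_ser. */
struct session_ser {
	char name[SESSION_NAME_LENGTH];
	uint64_t files;
	uint64_t pgcnt;
	uint64_t count;
} __attribute__((packed));

/* Number of session entries that fit in the block after the header. */
static size_t session_capacity(void)
{
	size_t block = (size_t)SESSION_PGCNT * ASSUMED_PAGE_SIZE;

	return (block - sizeof(struct session_header_ser)) /
	       sizeof(struct session_ser);
}

/* The entry array starts immediately after the header. */
static struct session_ser *session_entries(struct session_header_ser *hdr)
{
	return (struct session_ser *)(hdr + 1);
}
```

With these definitions the header occupies 16 bytes and each entry 80 bytes, so a 65536-byte block holds (65536 - 16) / 80 = 819 entries.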

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
 include/linux/liveupdate/abi/luo.h |  83 +++++-
 include/uapi/linux/liveupdate.h    |   3 +
 kernel/liveupdate/Makefile         |   3 +-
 kernel/liveupdate/luo_core.c       |  10 +
 kernel/liveupdate/luo_internal.h   |  52 ++++
 kernel/liveupdate/luo_session.c    | 421 +++++++++++++++++++++++++++++
 6 files changed, 570 insertions(+), 2 deletions(-)
 create mode 100644 kernel/liveupdate/luo_internal.h
 create mode 100644 kernel/liveupdate/luo_session.c

diff --git a/include/linux/liveupdate/abi/luo.h b/include/linux/liveupdate/abi/luo.h
index 9483a294287f..03a177ae232e 100644
--- a/include/linux/liveupdate/abi/luo.h
+++ b/include/linux/liveupdate/abi/luo.h
@@ -28,6 +28,11 @@
  *     / {
  *         compatible = "luo-v1";
  *         liveupdate-number = <...>;
+ *
+ *         luo-session {
+ *             compatible = "luo-session-v1";
+ *             luo-session-header = <phys_addr_of_session_header_ser>;
+ *         };
  *     };
  *
  * Main LUO Node (/):
@@ -36,14 +41,40 @@
  *     Identifies the overall LUO ABI version.
  *   - liveupdate-number: u64
  *     A counter tracking the number of successful live updates performed.
+ *
+ * Session Node (luo-session):
+ *   This node describes all preserved user-space sessions.
+ *
+ *   - compatible: "luo-session-v1"
+ *     Identifies the session ABI version.
+ *   - luo-session-header: u64
+ *     The physical address of a `struct luo_session_header_ser`. This structure
+ *     is the header for a contiguous block of memory containing an array of
+ *     `struct luo_session_ser`, one for each preserved session.
+ *
+ * Serialization Structures:
+ *   The FDT properties point to memory regions containing arrays of simple,
+ *   `__packed` structures. These structures contain the actual preserved state.
+ *
+ *   - struct luo_session_header_ser:
+ *     Header for the session array. Contains the total page count of the
+ *     preserved memory block and the number of `struct luo_session_ser`
+ *     entries that follow.
+ *
+ *   - struct luo_session_ser:
+ *     Metadata for a single session, including its name and a physical pointer
+ *     to another preserved memory block containing an array of
+ *     `struct luo_file_ser` for all files in that session.
  */
 
 #ifndef _LINUX_LIVEUPDATE_ABI_LUO_H
 #define _LINUX_LIVEUPDATE_ABI_LUO_H
 
+#include <uapi/linux/liveupdate.h>
+
 /*
  * The LUO FDT hooks all LUO state for sessions, fds, etc.
- * In the root it allso carries "liveupdate-number" 64-bit property that
+ * In the root it also carries "liveupdate-number" 64-bit property that
  * corresponds to the number of live-updates performed on this machine.
  */
 #define LUO_FDT_SIZE		PAGE_SIZE
@@ -51,4 +82,54 @@
 #define LUO_FDT_COMPATIBLE	"luo-v1"
 #define LUO_FDT_LIVEUPDATE_NUM	"liveupdate-number"
 
+/*
+ * LUO FDT session node
+ * LUO_FDT_SESSION_HEADER:  is a u64 physical address of struct
+ *                          luo_session_header_ser
+ */
+#define LUO_FDT_SESSION_NODE_NAME	"luo-session"
+#define LUO_FDT_SESSION_COMPATIBLE	"luo-session-v1"
+#define LUO_FDT_SESSION_HEADER		"luo-session-header"
+
+/**
+ * struct luo_session_header_ser - Header for the serialized session data block.
+ * @pgcnt: The total size, in pages, of the entire preserved memory block
+ *         that this header describes.
+ * @count: The number of 'struct luo_session_ser' entries that immediately
+ *         follow this header in the memory block.
+ *
+ * This structure is located at the beginning of a contiguous block of
+ * physical memory preserved across the kexec. It provides the necessary
+ * metadata to interpret the array of session entries that follow.
+ */
+struct luo_session_header_ser {
+	u64 pgcnt;
+	u64 count;
+} __packed;
+
+/**
+ * struct luo_session_ser - Represents the serialized metadata for a LUO session.
+ * @name:    The unique name of the session, copied from the `luo_session`
+ *           structure.
+ * @files:   The physical address of a contiguous memory block that holds
+ *           the serialized state of files.
+ * @pgcnt:   The number of pages occupied by the `files` memory block.
+ * @count:   The total number of files that were part of this session during
+ *           serialization. Used for iteration and validation during
+ *           restoration.
+ *
+ * This structure is used to package session-specific metadata for transfer
+ * between kernels via Kexec Handover. An array of these structures (one per
+ * session) is created and passed to the new kernel, allowing it to reconstruct
+ * the session context.
+ *
+ * If this structure is modified, LUO_FDT_SESSION_COMPATIBLE must be updated.
+ */
+struct luo_session_ser {
+	char name[LIVEUPDATE_SESSION_NAME_LENGTH];
+	u64 files;
+	u64 pgcnt;
+	u64 count;
+} __packed;
+
 #endif /* _LINUX_LIVEUPDATE_ABI_LUO_H */
diff --git a/include/uapi/linux/liveupdate.h b/include/uapi/linux/liveupdate.h
index df34c1642c4d..d2ef2f7e0dbd 100644
--- a/include/uapi/linux/liveupdate.h
+++ b/include/uapi/linux/liveupdate.h
@@ -43,4 +43,7 @@
 /* The ioctl type, documented in ioctl-number.rst */
 #define LIVEUPDATE_IOCTL_TYPE		0xBA
 
+/* The maximum length of session name including null termination */
+#define LIVEUPDATE_SESSION_NAME_LENGTH 56
+
 #endif /* _UAPI_LIVEUPDATE_H */
diff --git a/kernel/liveupdate/Makefile b/kernel/liveupdate/Makefile
index 413722002b7a..83285e7ad726 100644
--- a/kernel/liveupdate/Makefile
+++ b/kernel/liveupdate/Makefile
@@ -2,7 +2,8 @@
 
 luo-y :=								\
 		luo_core.o						\
-		luo_ioctl.o
+		luo_ioctl.o						\
+		luo_session.o
 
 obj-$(CONFIG_KEXEC_HANDOVER)		+= kexec_handover.o
 obj-$(CONFIG_KEXEC_HANDOVER_DEBUG)	+= kexec_handover_debug.o
diff --git a/kernel/liveupdate/luo_core.c b/kernel/liveupdate/luo_core.c
index 4a213b262b9f..653cdca5e25d 100644
--- a/kernel/liveupdate/luo_core.c
+++ b/kernel/liveupdate/luo_core.c
@@ -54,6 +54,7 @@
 #include <linux/unaligned.h>
 
 #include "kexec_handover_internal.h"
+#include "luo_internal.h"
 
 static struct {
 	bool enabled;
@@ -117,6 +118,10 @@ static int __init luo_early_startup(void)
 	pr_info("Retrieved live update data, liveupdate number: %lld\n",
 		luo_global.liveupdate_num);
 
+	err = luo_session_setup_incoming(luo_global.fdt_in);
+	if (err)
+		return err;
+
 	return 0;
 }
 
@@ -153,6 +158,7 @@ static int __init luo_fdt_setup(void)
 	err |= fdt_begin_node(fdt_out, "");
 	err |= fdt_property_string(fdt_out, "compatible", LUO_FDT_COMPATIBLE);
 	err |= fdt_property(fdt_out, LUO_FDT_LIVEUPDATE_NUM, &ln, sizeof(ln));
+	err |= luo_session_setup_outgoing(fdt_out);
 	err |= fdt_end_node(fdt_out);
 	err |= fdt_finish(fdt_out);
 	if (err)
@@ -210,6 +216,10 @@ int liveupdate_reboot(void)
 	if (!liveupdate_enabled())
 		return 0;
 
+	err = luo_session_serialize();
+	if (err)
+		return err;
+
 	err = kho_finalize();
 	if (err) {
 		pr_err("kho_finalize failed %d\n", err);
diff --git a/kernel/liveupdate/luo_internal.h b/kernel/liveupdate/luo_internal.h
new file mode 100644
index 000000000000..245373edfa6f
--- /dev/null
+++ b/kernel/liveupdate/luo_internal.h
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+
+#ifndef _LINUX_LUO_INTERNAL_H
+#define _LINUX_LUO_INTERNAL_H
+
+#include <linux/liveupdate.h>
+
+/**
+ * struct luo_session - Represents an active or incoming Live Update session.
+ * @name:       A unique name for this session, used for identification and
+ *              retrieval.
+ * @files_list: An ordered list of files associated with this session, it is
+ *              ordered by preservation time.
+ * @ser:        Pointer to the serialized data for this session.
+ * @count:      A counter tracking the number of files currently stored in the
+ *              @files_list for this session.
+ * @list:       A list_head member used to link this session into a global list
+ *              of either outgoing (to be preserved) or incoming (restored from
+ *              previous kernel) sessions.
+ * @retrieved:  A boolean flag indicating whether this session has been
+ *              retrieved by a consumer in the new kernel.
+ * @mutex:      Session lock, protects files_list, and count.
+ * @files:      The physically contiguous memory block that holds the serialized
+ *              state of files.
+ * @pgcnt:      The number of pages @files occupy.
+ */
+struct luo_session {
+	char name[LIVEUPDATE_SESSION_NAME_LENGTH];
+	struct list_head files_list;
+	struct luo_session_ser *ser;
+	long count;
+	struct list_head list;
+	bool retrieved;
+	struct mutex mutex;
+	struct luo_file_ser *files;
+	u64 pgcnt;
+};
+
+int luo_session_create(const char *name, struct file **filep);
+int luo_session_retrieve(const char *name, struct file **filep);
+int __init luo_session_setup_outgoing(void *fdt);
+int __init luo_session_setup_incoming(void *fdt);
+int luo_session_serialize(void);
+int luo_session_deserialize(void);
+bool luo_session_is_deserialized(void);
+
+#endif /* _LINUX_LUO_INTERNAL_H */
diff --git a/kernel/liveupdate/luo_session.c b/kernel/liveupdate/luo_session.c
new file mode 100644
index 000000000000..cb74bfaba479
--- /dev/null
+++ b/kernel/liveupdate/luo_session.c
@@ -0,0 +1,421 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+
+/**
+ * DOC: LUO Sessions
+ *
+ * LUO Sessions provide the core mechanism for grouping and managing `struct
+ * file *` instances that need to be preserved across a kexec-based live
+ * update. Each session acts as a named container for a set of file objects,
+ * allowing a userspace agent to manage the lifecycle of resources critical to a
+ * workload.
+ *
+ * Core Concepts:
+ *
+ * - Named Containers: Sessions are identified by a unique, user-provided name,
+ *   which is used for both creation in the current kernel and retrieval in the
+ *   next kernel.
+ *
+ * - Userspace Interface: Session management is driven from userspace via
+ *   ioctls on /dev/liveupdate.
+ *
+ * - Serialization: Session metadata is preserved using the KHO framework. When
+ *   a live update is triggered via kexec, an array of `struct luo_session_ser`
+ *   is populated and placed in a preserved memory region. An FDT node holds
+ *   the physical address of the header that precedes this array; the header
+ *   records the number of sessions.
+ *
+ * Session Lifecycle:
+ *
+ * 1.  Creation: A userspace agent calls `luo_session_create()` to create a
+ *     new, empty session and receives a file descriptor for it.
+ *
+ * 2.  Serialization: When the `reboot(LINUX_REBOOT_CMD_KEXEC)` syscall is
+ *     made, `luo_session_serialize()` is called. It iterates through all
+ *     active sessions and writes their metadata into a memory area preserved
+ *     by KHO.
+ *
+ * 3.  Deserialization (in new kernel): After kexec, `luo_session_deserialize()`
+ *     runs, reading the serialized data and creating a list of `struct
+ *     luo_session` objects representing the preserved sessions.
+ *
+ * 4.  Retrieval: A userspace agent in the new kernel can then call
+ *     `luo_session_retrieve()` with a session name to get a new file
+ *     descriptor and access the preserved state.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/anon_inodes.h>
+#include <linux/cleanup.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/io.h>
+#include <linux/kexec_handover.h>
+#include <linux/libfdt.h>
+#include <linux/list.h>
+#include <linux/liveupdate.h>
+#include <linux/liveupdate/abi/luo.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/unaligned.h>
+#include <uapi/linux/liveupdate.h>
+#include "luo_internal.h"
+
+/* 16 4K pages, give space for 819 sessions */
+#define LUO_SESSION_PGCNT	16ul
+#define LUO_SESSION_MAX		(((LUO_SESSION_PGCNT << PAGE_SHIFT) -	\
+		sizeof(struct luo_session_header_ser)) /		\
+		sizeof(struct luo_session_ser))
+
+/**
+ * struct luo_session_header - Header struct for managing LUO sessions.
+ * @count:      The number of sessions currently tracked in the @list.
+ * @list:       The head of the linked list of `struct luo_session` instances.
+ * @rwsem:      A read-write semaphore providing synchronized access to the
+ *              session list and other fields in this structure.
+ * @header_ser: The header data of serialization array.
+ * @ser:        The serialized session data (an array of
+ *              `struct luo_session_ser`).
+ * @active:     Set to true when first initialized. If previous kernel did not
+ *              send session data, active stays false for incoming.
+ */
+struct luo_session_header {
+	long count;
+	struct list_head list;
+	struct rw_semaphore rwsem;
+	struct luo_session_header_ser *header_ser;
+	struct luo_session_ser *ser;
+	bool active;
+};
+
+/**
+ * struct luo_session_global - Global container for managing LUO sessions.
+ * @incoming:     The sessions passed from the previous kernel.
+ * @outgoing:     The sessions that are going to be passed to the next kernel.
+ * @deserialized: The sessions have been deserialized once /dev/liveupdate
+ *                has been opened.
+ */
+struct luo_session_global {
+	struct luo_session_header incoming;
+	struct luo_session_header outgoing;
+	bool deserialized;
+};
+
+static struct luo_session_global luo_session_global;
+
+static struct luo_session *luo_session_alloc(const char *name)
+{
+	struct luo_session *session = kzalloc(sizeof(*session), GFP_KERNEL);
+
+	if (!session)
+		return ERR_PTR(-ENOMEM);
+
+	strscpy(session->name, name, sizeof(session->name));
+	INIT_LIST_HEAD(&session->files_list);
+	INIT_LIST_HEAD(&session->list);
+	mutex_init(&session->mutex);
+	session->count = 0;
+
+	return session;
+}
+
+static void luo_session_free(struct luo_session *session)
+{
+	WARN_ON(session->count);
+	WARN_ON(!list_empty(&session->files_list));
+	mutex_destroy(&session->mutex);
+	kfree(session);
+}
+
+static int luo_session_insert(struct luo_session_header *sh,
+			      struct luo_session *session)
+{
+	struct luo_session *it;
+
+	guard(rwsem_write)(&sh->rwsem);
+
+	/*
+	 * For outgoing we should make sure there is room in serialization array
+	 * for new session.
+	 */
+	if (sh == &luo_session_global.outgoing) {
+		if (sh->count == LUO_SESSION_MAX)
+			return -ENOMEM;
+	}
+
+	/*
+	 * For small number of sessions this loop won't hurt performance
+	 * but if we ever start using a lot of sessions, this might
+	 * become a bottleneck during deserialization time, as it would
+	 * cause O(n*n) complexity.
+	 */
+	list_for_each_entry(it, &sh->list, list) {
+		if (!strncmp(it->name, session->name, sizeof(it->name)))
+			return -EEXIST;
+	}
+	list_add_tail(&session->list, &sh->list);
+	sh->count++;
+
+	return 0;
+}
+
+static void luo_session_remove(struct luo_session_header *sh,
+			       struct luo_session *session)
+{
+	guard(rwsem_write)(&sh->rwsem);
+	list_del(&session->list);
+	sh->count--;
+}
+
+static int luo_session_release(struct inode *inodep, struct file *filep)
+{
+	struct luo_session *session = filep->private_data;
+	struct luo_session_header *sh;
+
+	/* If retrieved is set, it means this session is from incoming list */
+	if (session->retrieved)
+		sh = &luo_session_global.incoming;
+	else
+		sh = &luo_session_global.outgoing;
+
+	luo_session_remove(sh, session);
+	luo_session_free(session);
+
+	return 0;
+}
+
+static const struct file_operations luo_session_fops = {
+	.owner = THIS_MODULE,
+	.release = luo_session_release,
+};
+
+/* Create a "struct file" for session */
+static int luo_session_getfile(struct luo_session *session, struct file **filep)
+{
+	char name_buf[128];
+	struct file *file;
+
+	guard(mutex)(&session->mutex);
+	snprintf(name_buf, sizeof(name_buf), "[luo_session] %s", session->name);
+	file = anon_inode_getfile(name_buf, &luo_session_fops, session, O_RDWR);
+	if (IS_ERR(file))
+		return PTR_ERR(file);
+
+	*filep = file;
+
+	return 0;
+}
+
+int luo_session_create(const char *name, struct file **filep)
+{
+	struct luo_session *session;
+	int err;
+
+	session = luo_session_alloc(name);
+	if (IS_ERR(session))
+		return PTR_ERR(session);
+
+	err = luo_session_insert(&luo_session_global.outgoing, session);
+	if (err)
+		goto err_free;
+
+	err = luo_session_getfile(session, filep);
+	if (err)
+		goto err_remove;
+
+	return 0;
+
+err_remove:
+	luo_session_remove(&luo_session_global.outgoing, session);
+err_free:
+	luo_session_free(session);
+
+	return err;
+}
+
+int luo_session_retrieve(const char *name, struct file **filep)
+{
+	struct luo_session_header *sh = &luo_session_global.incoming;
+	struct luo_session *session = NULL;
+	struct luo_session *it;
+	int err;
+
+	scoped_guard(rwsem_read, &sh->rwsem) {
+		list_for_each_entry(it, &sh->list, list) {
+			if (!strncmp(it->name, name, sizeof(it->name))) {
+				session = it;
+				break;
+			}
+		}
+	}
+
+	if (!session)
+		return -ENOENT;
+
+	scoped_guard(mutex, &session->mutex) {
+		if (session->retrieved)
+			return -EINVAL;
+	}
+
+	err = luo_session_getfile(session, filep);
+	if (!err) {
+		scoped_guard(mutex, &session->mutex)
+			session->retrieved = true;
+	}
+
+	return err;
+}
+
+int __init luo_session_setup_outgoing(void *fdt_out)
+{
+	struct luo_session_header_ser *header_ser;
+	u64 header_ser_pa;
+	int err;
+
+	header_ser = kho_alloc_preserve(LUO_SESSION_PGCNT << PAGE_SHIFT);
+	if (IS_ERR(header_ser))
+		return PTR_ERR(header_ser);
+	header_ser_pa = virt_to_phys(header_ser);
+
+	err = fdt_begin_node(fdt_out, LUO_FDT_SESSION_NODE_NAME);
+	err |= fdt_property_string(fdt_out, "compatible",
+				   LUO_FDT_SESSION_COMPATIBLE);
+	err |= fdt_property(fdt_out, LUO_FDT_SESSION_HEADER, &header_ser_pa,
+			    sizeof(header_ser_pa));
+	err |= fdt_end_node(fdt_out);
+
+	if (err)
+		goto err_unpreserve;
+
+	header_ser->pgcnt = LUO_SESSION_PGCNT;
+	INIT_LIST_HEAD(&luo_session_global.outgoing.list);
+	init_rwsem(&luo_session_global.outgoing.rwsem);
+	luo_session_global.outgoing.header_ser = header_ser;
+	luo_session_global.outgoing.ser = (void *)(header_ser + 1);
+	luo_session_global.outgoing.active = true;
+
+	return 0;
+
+err_unpreserve:
+	kho_unpreserve_free(header_ser);
+	return err;
+}
+
+int __init luo_session_setup_incoming(void *fdt_in)
+{
+	struct luo_session_header_ser *header_ser;
+	int err, header_size, offset;
+	u64 header_ser_pa;
+	const void *ptr;
+
+	offset = fdt_subnode_offset(fdt_in, 0, LUO_FDT_SESSION_NODE_NAME);
+	if (offset < 0) {
+		pr_err("Unable to get session node: [%s]\n",
+		       LUO_FDT_SESSION_NODE_NAME);
+		return -EINVAL;
+	}
+
+	err = fdt_node_check_compatible(fdt_in, offset,
+					LUO_FDT_SESSION_COMPATIBLE);
+	if (err) {
+		pr_err("Session node incompatible [%s]\n",
+		       LUO_FDT_SESSION_COMPATIBLE);
+		return -EINVAL;
+	}
+
+	header_size = 0;
+	ptr = fdt_getprop(fdt_in, offset, LUO_FDT_SESSION_HEADER, &header_size);
+	if (!ptr || header_size != sizeof(u64)) {
+		pr_err("Unable to get session header '%s' [%d]\n",
+		       LUO_FDT_SESSION_HEADER, header_size);
+		return -EINVAL;
+	}
+
+	header_ser_pa = get_unaligned((u64 *)ptr);
+	header_ser = phys_to_virt(header_ser_pa);
+
+	luo_session_global.incoming.header_ser = header_ser;
+	luo_session_global.incoming.ser = (void *)(header_ser + 1);
+	INIT_LIST_HEAD(&luo_session_global.incoming.list);
+	init_rwsem(&luo_session_global.incoming.rwsem);
+	luo_session_global.incoming.active = true;
+
+	return 0;
+}
+
+bool luo_session_is_deserialized(void)
+{
+	return luo_session_global.deserialized;
+}
+
+int luo_session_deserialize(void)
+{
+	struct luo_session_header *sh = &luo_session_global.incoming;
+	int err;
+
+	if (luo_session_is_deserialized())
+		return 0;
+
+	luo_session_global.deserialized = true;
+	if (!sh->active) {
+		INIT_LIST_HEAD(&sh->list);
+		init_rwsem(&sh->rwsem);
+		return 0;
+	}
+
+	for (int i = 0; i < sh->header_ser->count; i++) {
+		struct luo_session *session;
+
+		session = luo_session_alloc(sh->ser[i].name);
+		if (IS_ERR(session)) {
+			pr_warn("Failed to allocate session [%s] during deserialization %pe\n",
+				sh->ser[i].name, session);
+			return PTR_ERR(session);
+		}
+
+		err = luo_session_insert(sh, session);
+		if (err) {
+			pr_warn("Failed to insert session [%s] %pe\n",
+				session->name, ERR_PTR(err));
+			luo_session_free(session);
+			return err;
+		}
+
+		session->count = sh->ser[i].count;
+		session->files = sh->ser[i].files ? phys_to_virt(sh->ser[i].files) : NULL;
+		session->pgcnt = sh->ser[i].pgcnt;
+	}
+
+	kho_restore_free(sh->header_ser);
+	sh->header_ser = NULL;
+	sh->ser = NULL;
+
+	return 0;
+}
+
+int luo_session_serialize(void)
+{
+	struct luo_session_header *sh = &luo_session_global.outgoing;
+	struct luo_session *session;
+	int i = 0;
+
+	guard(rwsem_write)(&sh->rwsem);
+	list_for_each_entry(session, &sh->list, list) {
+		strscpy(sh->ser[i].name, session->name,
+			sizeof(sh->ser[i].name));
+		sh->ser[i].count = session->count;
+		sh->ser[i].files = session->files ? virt_to_phys(session->files) : 0;
+		sh->ser[i].pgcnt = session->pgcnt;
+		i++;
+	}
+	sh->header_ser->count = sh->count;
+
+	return 0;
+}
-- 
2.52.0.rc1.455.g30608eb744-goog



* [PATCH v6 03/20] kexec: call liveupdate_reboot() before kexec
From: Pasha Tatashin @ 2025-11-15 23:33 UTC (permalink / raw)
In-Reply-To: <20251115233409.768044-1-pasha.tatashin@soleen.com>

Modify kernel_kexec() to call liveupdate_reboot().

This ensures that the Live Update Orchestrator is notified just
before the kernel executes the kexec jump. The liveupdate_reboot()
function triggers the final freeze event, allowing participating
FDs to perform last-minute checks or state saving within the blackout
window.

If liveupdate_reboot() returns an error (indicating a failure during
LUO finalization), the kexec operation is aborted to prevent proceeding
with an inconsistent state. An error is returned to the user.
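The abort-on-error behaviour described above can be modelled in userspace with a small simulation. Everything here (struct kexec_sim, sim_liveupdate_reboot, sim_kernel_kexec) is a hypothetical stand-in for the kernel functions named in this patch; the sketch only shows the control flow, not the real kexec path.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical simulation state, not a kernel structure. */
struct kexec_sim {
	int  liveupdate_err;  /* value the simulated liveupdate_reboot() returns */
	bool jumped;          /* set once the simulated kexec jump is reached */
};

/* Stand-in for liveupdate_reboot(): reports the configured result. */
static int sim_liveupdate_reboot(const struct kexec_sim *sim)
{
	return sim->liveupdate_err;
}

/* Stand-in for kernel_kexec(): bail out before the jump on any error. */
static int sim_kernel_kexec(struct kexec_sim *sim)
{
	int error;

	error = sim_liveupdate_reboot(sim);
	if (error)
		return error;   /* abort: the jump is never reached */

	sim->jumped = true;     /* stands in for the actual kexec jump */
	return 0;
}
```

When the simulated liveupdate_reboot() returns -EAGAIN, sim_kernel_kexec() returns the same error and the jumped flag stays false; with a zero return the jump is reached.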

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
 kernel/kexec_core.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index a8890dd03a1d..3122235c225b 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -15,6 +15,7 @@
 #include <linux/kexec.h>
 #include <linux/mutex.h>
 #include <linux/list.h>
+#include <linux/liveupdate.h>
 #include <linux/highmem.h>
 #include <linux/syscalls.h>
 #include <linux/reboot.h>
@@ -1145,6 +1146,10 @@ int kernel_kexec(void)
 		goto Unlock;
 	}
 
+	error = liveupdate_reboot();
+	if (error)
+		goto Unlock;
+
 #ifdef CONFIG_KEXEC_JUMP
 	if (kexec_image->preserve_context) {
 		/*
-- 
2.52.0.rc1.455.g30608eb744-goog



* [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
From: Pasha Tatashin @ 2025-11-15 23:33 UTC (permalink / raw)
In-Reply-To: <20251115233409.768044-1-pasha.tatashin@soleen.com>

Integrate the LUO with the KHO framework to enable passing LUO state
across a kexec reboot.

When LUO transitions to the "prepared" state, it tells KHO to
finalize, so that all memory segments added to the KHO preservation
list are preserved. Once the "prepared" state is reached, no new
segments can be preserved. If LUO is canceled, it also tells KHO to
cancel the serialization, so that LUO can later return to the
prepared state.

This patch introduces the following changes:
- During the KHO finalization phase, allocate an FDT blob.
- Populate this FDT with a LUO compatibility string ("luo-v1").

LUO now depends on `CONFIG_KEXEC_HANDOVER`. The core state transition
logic (`luo_do_*_calls`) remains unimplemented in this patch.
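On the incoming side, the diff below reads the "liveupdate-number" property only after checking that it is present and exactly 8 bytes long, and it uses an unaligned load. A minimal userspace sketch of that pattern is shown here; read_u64_prop is a hypothetical helper, and memcpy() stands in for the kernel's get_unaligned().

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy a u64 out of a property blob only if the blob is present and has
 * exactly the expected size. memcpy() is used because the blob may not be
 * 8-byte aligned.
 */
static int read_u64_prop(const void *prop, int prop_len, uint64_t *out)
{
	if (!prop || prop_len != (int)sizeof(*out))
		return -EINVAL;

	memcpy(out, prop, sizeof(*out));
	return 0;
}
```

A missing property (NULL pointer) or a property of the wrong length yields -EINVAL, which mirrors how luo_early_startup() rejects a malformed incoming tree.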

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
 include/linux/liveupdate/abi/luo.h |  54 ++++++++++
 kernel/liveupdate/luo_core.c       | 153 ++++++++++++++++++++++++++++-
 2 files changed, 206 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/liveupdate/abi/luo.h

diff --git a/include/linux/liveupdate/abi/luo.h b/include/linux/liveupdate/abi/luo.h
new file mode 100644
index 000000000000..9483a294287f
--- /dev/null
+++ b/include/linux/liveupdate/abi/luo.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+
+/**
+ * DOC: Live Update Orchestrator ABI
+ *
+ * This header defines the stable Application Binary Interface used by the
+ * Live Update Orchestrator to pass state from a pre-update kernel to a
+ * post-update kernel. The ABI is built upon the Kexec HandOver framework
+ * and uses a Flattened Device Tree to describe the preserved data.
+ *
+ * This interface is a contract. Any modification to the FDT structure, node
+ * properties, compatible strings, or the layout of the `__packed` serialization
+ * structures defined here constitutes a breaking change. Such changes require
+ * incrementing the version number in the relevant `_COMPATIBLE` string to
+ * prevent a new kernel from misinterpreting data from an old kernel.
+ *
+ * FDT Structure Overview:
+ *   The entire LUO state is encapsulated within a single KHO entry named "LUO".
+ *   This entry contains an FDT with the following layout:
+ *
+ *   .. code-block:: none
+ *
+ *     / {
+ *         compatible = "luo-v1";
+ *         liveupdate-number = <...>;
+ *     };
+ *
+ * Main LUO Node (/):
+ *
+ *   - compatible: "luo-v1"
+ *     Identifies the overall LUO ABI version.
+ *   - liveupdate-number: u64
+ *     A counter tracking the number of successful live updates performed.
+ */
+
+#ifndef _LINUX_LIVEUPDATE_ABI_LUO_H
+#define _LINUX_LIVEUPDATE_ABI_LUO_H
+
+/*
+ * The LUO FDT hooks all LUO state for sessions, fds, etc.
+ * In the root it allso carries "liveupdate-number" 64-bit property that
+ * corresponds to the number of live-updates performed on this machine.
+ */
+#define LUO_FDT_SIZE		PAGE_SIZE
+#define LUO_FDT_KHO_ENTRY_NAME	"LUO"
+#define LUO_FDT_COMPATIBLE	"luo-v1"
+#define LUO_FDT_LIVEUPDATE_NUM	"liveupdate-number"
+
+#endif /* _LINUX_LIVEUPDATE_ABI_LUO_H */
diff --git a/kernel/liveupdate/luo_core.c b/kernel/liveupdate/luo_core.c
index 0e1ab19fa1cd..4a213b262b9f 100644
--- a/kernel/liveupdate/luo_core.c
+++ b/kernel/liveupdate/luo_core.c
@@ -42,11 +42,24 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <linux/io.h>
+#include <linux/kexec_handover.h>
 #include <linux/kobject.h>
+#include <linux/libfdt.h>
 #include <linux/liveupdate.h>
+#include <linux/liveupdate/abi/luo.h>
+#include <linux/mm.h>
+#include <linux/sizes.h>
+#include <linux/string.h>
+#include <linux/unaligned.h>
+
+#include "kexec_handover_internal.h"
 
 static struct {
 	bool enabled;
+	void *fdt_out;
+	void *fdt_in;
+	u64 liveupdate_num;
 } luo_global;
 
 static int __init early_liveupdate_param(char *buf)
@@ -55,6 +68,129 @@ static int __init early_liveupdate_param(char *buf)
 }
 early_param("liveupdate", early_liveupdate_param);
 
+static int __init luo_early_startup(void)
+{
+	phys_addr_t fdt_phys;
+	int err, ln_size;
+	const void *ptr;
+
+	if (!kho_is_enabled()) {
+		if (liveupdate_enabled())
+			pr_warn("Disabling liveupdate because KHO is disabled\n");
+		luo_global.enabled = false;
+		return 0;
+	}
+
+	/* Retrieve LUO subtree, and verify its format. */
+	err = kho_retrieve_subtree(LUO_FDT_KHO_ENTRY_NAME, &fdt_phys);
+	if (err) {
+		if (err != -ENOENT) {
+			pr_err("failed to retrieve FDT '%s' from KHO: %pe\n",
+			       LUO_FDT_KHO_ENTRY_NAME, ERR_PTR(err));
+			return err;
+		}
+
+		return 0;
+	}
+
+	luo_global.fdt_in = phys_to_virt(fdt_phys);
+	err = fdt_node_check_compatible(luo_global.fdt_in, 0,
+					LUO_FDT_COMPATIBLE);
+	if (err) {
+		pr_err("FDT '%s' is incompatible with '%s' [%d]\n",
+		       LUO_FDT_KHO_ENTRY_NAME, LUO_FDT_COMPATIBLE, err);
+
+		return -EINVAL;
+	}
+
+	ln_size = 0;
+	ptr = fdt_getprop(luo_global.fdt_in, 0, LUO_FDT_LIVEUPDATE_NUM,
+			  &ln_size);
+	if (!ptr || ln_size != sizeof(luo_global.liveupdate_num)) {
+		pr_err("Unable to get live update number '%s' [%d]\n",
+		       LUO_FDT_LIVEUPDATE_NUM, ln_size);
+
+		return -EINVAL;
+	}
+
+	luo_global.liveupdate_num = get_unaligned((u64 *)ptr);
+	pr_info("Retrieved live update data, liveupdate number: %lld\n",
+		luo_global.liveupdate_num);
+
+	return 0;
+}
+
+static int __init liveupdate_early_init(void)
+{
+	int err;
+
+	err = luo_early_startup();
+	if (err) {
+		pr_err("The incoming tree failed to initialize properly [%pe], disabling live update\n",
+		       ERR_PTR(err));
+		luo_global.enabled = false;
+	}
+
+	return err;
+}
+early_initcall(liveupdate_early_init);
+
+/* Called during boot to create outgoing LUO fdt tree */
+static int __init luo_fdt_setup(void)
+{
+	const u64 ln = luo_global.liveupdate_num + 1;
+	void *fdt_out;
+	int err;
+
+	fdt_out = kho_alloc_preserve(LUO_FDT_SIZE);
+	if (IS_ERR(fdt_out)) {
+		pr_err("failed to allocate/preserve FDT memory\n");
+		return PTR_ERR(fdt_out);
+	}
+
+	err = fdt_create(fdt_out, LUO_FDT_SIZE);
+	err |= fdt_finish_reservemap(fdt_out);
+	err |= fdt_begin_node(fdt_out, "");
+	err |= fdt_property_string(fdt_out, "compatible", LUO_FDT_COMPATIBLE);
+	err |= fdt_property(fdt_out, LUO_FDT_LIVEUPDATE_NUM, &ln, sizeof(ln));
+	err |= fdt_end_node(fdt_out);
+	err |= fdt_finish(fdt_out);
+	if (err)
+		goto exit_free;
+
+	err = kho_add_subtree(LUO_FDT_KHO_ENTRY_NAME, fdt_out);
+	if (err)
+		goto exit_free;
+	luo_global.fdt_out = fdt_out;
+
+	return 0;
+
+exit_free:
+	kho_unpreserve_free(fdt_out);
+	pr_err("failed to prepare LUO FDT: %d\n", err);
+
+	return err;
+}
+
+/*
+ * late initcall because it initializes the outgoing tree that is needed only
+ * once userspace starts using /dev/liveupdate.
+ */
+static int __init luo_late_startup(void)
+{
+	int err;
+
+	if (!liveupdate_enabled())
+		return 0;
+
+	err = luo_fdt_setup();
+	if (err)
+		luo_global.enabled = false;
+
+	return err;
+}
+late_initcall(luo_late_startup);
+
 /* Public Functions */
 
 /**
@@ -69,7 +205,22 @@ early_param("liveupdate", early_liveupdate_param);
  */
 int liveupdate_reboot(void)
 {
-	return 0;
+	int err;
+
+	if (!liveupdate_enabled())
+		return 0;
+
+	err = kho_finalize();
+	if (err) {
+		pr_err("kho_finalize failed %d\n", err);
+		/*
+		 * kho_finalize() may return libfdt errors. To avoid passing
+		 * unknown errors to userspace, change this to EAGAIN.
+		 */
+		err = -EAGAIN;
+	}
+
+	return err;
 }
 
 /**
-- 
2.52.0.rc1.455.g30608eb744-goog
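
For readers following the luo_core.c hunk above: the early-boot path reads the live update counter from an FDT property, rejects it if the length is not exactly that of a u64, and loads it with get_unaligned(); the outgoing tree then stores the counter plus one. A minimal userspace sketch of that round trip follows. The demo_* names are hypothetical and not part of the series, and memcpy() stands in for the kernel's get_unaligned().

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Load a u64 from a pointer that may not be 8-byte aligned. */
static uint64_t demo_get_unaligned_u64(const void *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/*
 * Validate a property blob the way the patch validates the live update
 * number: the pointer must be non-NULL and the length must match exactly.
 * Returns 0 on success or -EINVAL (hypothetical helper, userspace only).
 */
static int demo_read_counter(const void *prop, int len, uint64_t *out)
{
	if (!prop || len != (int)sizeof(*out))
		return -EINVAL;

	*out = demo_get_unaligned_u64(prop);
	return 0;
}

/* The outgoing tree carries the incoming counter plus one. */
static uint64_t demo_next_counter(uint64_t incoming)
{
	return incoming + 1;
}
```

The exact-length check matters because a property of any other size cannot be interpreted as the counter, and the patch treats that case as a failed handover rather than guessing.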



* [PATCH v6 01/20] liveupdate: luo_core: luo_ioctl: Live Update Orchestrator
From: Pasha Tatashin @ 2025-11-15 23:33 UTC (permalink / raw)
  To: pratyush, jasonmiu, graf, pasha.tatashin, rppt, dmatlack,
	rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
	mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
	linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu,
	hughd, skhawaja, chrisl
In-Reply-To: <20251115233409.768044-1-pasha.tatashin@soleen.com>

Introduce LUO, a mechanism intended to facilitate kernel updates while
keeping designated devices operational across the transition (e.g., via
kexec). The primary use case is updating hypervisors with minimal
disruption to running virtual machines. For the userspace side of a
hypervisor update we have copyless migration; LUO is for updating the kernel.

This initial patch lays the groundwork for the LUO subsystem.

Further functionality, including the implementation of state transition
logic, integration with KHO, and hooks for subsystems and file
descriptors, will be added in subsequent patches.

Create a character device at /dev/liveupdate.

A new uAPI header, <uapi/linux/liveupdate.h>, will define the necessary
structures. The magic number for IOCTL is registered in
Documentation/userspace-api/ioctl/ioctl-number.rst.

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
 .../userspace-api/ioctl/ioctl-number.rst      |  2 +
 include/linux/liveupdate.h                    | 35 ++++++++
 include/uapi/linux/liveupdate.h               | 46 ++++++++++
 kernel/liveupdate/Kconfig                     | 27 ++++++
 kernel/liveupdate/Makefile                    |  6 ++
 kernel/liveupdate/luo_core.c                  | 86 +++++++++++++++++++
 kernel/liveupdate/luo_ioctl.c                 | 45 ++++++++++
 7 files changed, 247 insertions(+)
 create mode 100644 include/linux/liveupdate.h
 create mode 100644 include/uapi/linux/liveupdate.h
 create mode 100644 kernel/liveupdate/luo_core.c
 create mode 100644 kernel/liveupdate/luo_ioctl.c

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 7c527a01d1cf..7232b3544cec 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -385,6 +385,8 @@ Code  Seq#    Include File                                             Comments
 0xB8  01-02  uapi/misc/mrvl_cn10k_dpi.h                                Marvell CN10K DPI driver
 0xB8  all    uapi/linux/mshv.h                                         Microsoft Hyper-V /dev/mshv driver
                                                                        <mailto:linux-hyperv@vger.kernel.org>
+0xBA  00-0F  uapi/linux/liveupdate.h                                   Pasha Tatashin
+                                                                       <mailto:pasha.tatashin@soleen.com>
 0xC0  00-0F  linux/usb/iowarrior.h
 0xCA  00-0F  uapi/misc/cxl.h                                           Dead since 6.15
 0xCA  10-2F  uapi/misc/ocxl.h
diff --git a/include/linux/liveupdate.h b/include/linux/liveupdate.h
new file mode 100644
index 000000000000..730b76625fec
--- /dev/null
+++ b/include/linux/liveupdate.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+#ifndef _LINUX_LIVEUPDATE_H
+#define _LINUX_LIVEUPDATE_H
+
+#include <linux/bug.h>
+#include <linux/types.h>
+#include <linux/list.h>
+
+#ifdef CONFIG_LIVEUPDATE
+
+/* Return true if live update orchestrator is enabled */
+bool liveupdate_enabled(void);
+
+/* Called during kexec to tell LUO that the system has entered reboot */
+int liveupdate_reboot(void);
+
+#else /* CONFIG_LIVEUPDATE */
+
+static inline bool liveupdate_enabled(void)
+{
+	return false;
+}
+
+static inline int liveupdate_reboot(void)
+{
+	return 0;
+}
+
+#endif /* CONFIG_LIVEUPDATE */
+#endif /* _LINUX_LIVEUPDATE_H */
diff --git a/include/uapi/linux/liveupdate.h b/include/uapi/linux/liveupdate.h
new file mode 100644
index 000000000000..df34c1642c4d
--- /dev/null
+++ b/include/uapi/linux/liveupdate.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+/*
+ * Userspace interface for /dev/liveupdate
+ * Live Update Orchestrator
+ *
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+
+#ifndef _UAPI_LIVEUPDATE_H
+#define _UAPI_LIVEUPDATE_H
+
+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+/**
+ * DOC: General ioctl format
+ *
+ * The ioctl interface follows a general format to allow for extensibility. Each
+ * ioctl is passed in a structure pointer as the argument providing the size of
+ * the structure in the first u32. The kernel checks that any structure space
+ * beyond what it understands is 0. This allows userspace to use the backward
+ * compatible portion while consistently using the newer, larger, structures.
+ *
+ * ioctls use a standard meaning for common errnos:
+ *
+ *  - ENOTTY: The IOCTL number itself is not supported at all
+ *  - E2BIG: The IOCTL number is supported, but the provided structure has
+ *    non-zero in a part the kernel does not understand.
+ *  - EOPNOTSUPP: The IOCTL number is supported, and the structure is
+ *    understood, however a known field has a value the kernel does not
+ *    understand or support.
+ *  - EINVAL: Everything about the IOCTL was understood, but a field is not
+ *    correct.
+ *  - ENOENT: A provided token does not exist.
+ *  - ENOMEM: Out of memory.
+ *  - EOVERFLOW: Mathematics overflowed.
+ *
+ * As well as additional errnos, within specific ioctls.
+ */
+
+/* The ioctl type, documented in ioctl-number.rst */
+#define LIVEUPDATE_IOCTL_TYPE		0xBA
+
+#endif /* _UAPI_LIVEUPDATE_H */
diff --git a/kernel/liveupdate/Kconfig b/kernel/liveupdate/Kconfig
index a973a54447de..90857dccb359 100644
--- a/kernel/liveupdate/Kconfig
+++ b/kernel/liveupdate/Kconfig
@@ -1,4 +1,10 @@
 # SPDX-License-Identifier: GPL-2.0-only
+#
+# Copyright (c) 2025, Google LLC.
+# Pasha Tatashin <pasha.tatashin@soleen.com>
+#
+# Live Update Orchestrator
+#
 
 menu "Live Update and Kexec HandOver"
 	depends on !DEFERRED_STRUCT_PAGE_INIT
@@ -51,4 +57,25 @@ config KEXEC_HANDOVER_ENABLE_DEFAULT
 	  The default behavior can still be overridden at boot time by
 	  passing 'kho=off'.
 
+config LIVEUPDATE
+	bool "Live Update Orchestrator"
+	depends on KEXEC_HANDOVER
+	help
+	  Enable the Live Update Orchestrator. Live Update is a mechanism,
+	  typically based on kexec, that allows the kernel to be updated
+	  while keeping selected devices operational across the transition.
+	  These devices are intended to be reclaimed by the new kernel and
+	  re-attached to their original workload without requiring a device
+	  reset.
+
+	  The ability to hand over a device from the current kernel to the
+	  next depends on specific support within device drivers and related
+	  kernel subsystems.
+
+	  This feature primarily targets virtual machine hosts to quickly update
+	  the kernel hypervisor with minimal disruption to the running virtual
+	  machines.
+
+	  If unsure, say N.
+
 endmenu
diff --git a/kernel/liveupdate/Makefile b/kernel/liveupdate/Makefile
index f52ce1ebcf86..413722002b7a 100644
--- a/kernel/liveupdate/Makefile
+++ b/kernel/liveupdate/Makefile
@@ -1,5 +1,11 @@
 # SPDX-License-Identifier: GPL-2.0
 
+luo-y :=								\
+		luo_core.o						\
+		luo_ioctl.o
+
 obj-$(CONFIG_KEXEC_HANDOVER)		+= kexec_handover.o
 obj-$(CONFIG_KEXEC_HANDOVER_DEBUG)	+= kexec_handover_debug.o
 obj-$(CONFIG_KEXEC_HANDOVER_DEBUGFS)	+= kexec_handover_debugfs.o
+
+obj-$(CONFIG_LIVEUPDATE)		+= luo.o
diff --git a/kernel/liveupdate/luo_core.c b/kernel/liveupdate/luo_core.c
new file mode 100644
index 000000000000..0e1ab19fa1cd
--- /dev/null
+++ b/kernel/liveupdate/luo_core.c
@@ -0,0 +1,86 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+
+/**
+ * DOC: Live Update Orchestrator (LUO)
+ *
+ * Live Update is a specialized, kexec-based reboot process that allows a
+ * running kernel to be updated from one version to another while preserving
+ * the state of selected resources and keeping designated hardware devices
+ * operational. For these devices, DMA activity may continue throughout the
+ * kernel transition.
+ *
+ * While the primary use case driving this work is supporting live updates of
+ * the Linux kernel when it is used as a hypervisor in cloud environments, the
+ * LUO framework itself is designed to be workload-agnostic. Much like Kernel
+ * Live Patching, which applies security fixes regardless of the workload,
+ * Live Update facilitates a full kernel version upgrade for any type of system.
+ *
+ * For example, a non-hypervisor system running an in-memory cache like
+ * memcached with many gigabytes of data can use LUO. The userspace service
+ * can place its cache into a memfd, have its state preserved by LUO, and
+ * restore it immediately after the kernel kexec.
+ *
+ * Whether the system is running virtual machines, containers, a
+ * high-performance database, or networking services, LUO's primary goal is to
+ * enable a full kernel update by preserving critical userspace state and
+ * keeping essential devices operational.
+ *
+ * The core of LUO is a mechanism that tracks the progress of a live update,
+ * along with a callback API that allows other kernel subsystems to participate
+ * in the process. Example subsystems that can hook into LUO include: kvm,
+ * iommu, interrupts, vfio, participating filesystems, and memory management.
+ *
+ * LUO uses Kexec Handover to transfer memory state from the current kernel to
+ * the next kernel. For more details see
+ * Documentation/core-api/kho/concepts.rst.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/kobject.h>
+#include <linux/liveupdate.h>
+
+static struct {
+	bool enabled;
+} luo_global;
+
+static int __init early_liveupdate_param(char *buf)
+{
+	return kstrtobool(buf, &luo_global.enabled);
+}
+early_param("liveupdate", early_liveupdate_param);
+
+/* Public Functions */
+
+/**
+ * liveupdate_reboot() - Kernel reboot notifier for live update final
+ * serialization.
+ *
+ * This function is invoked directly from the reboot() syscall pathway
+ * if kexec is in progress.
+ *
+ * If any callback fails, this function aborts KHO, undoes the freeze()
+ * callbacks, and returns an error.
+ */
+int liveupdate_reboot(void)
+{
+	return 0;
+}
+
+/**
+ * liveupdate_enabled - Check if the live update feature is enabled.
+ *
+ * This function returns the state of the live update feature flag, which
+ * can be controlled via the ``liveupdate`` kernel command-line parameter.
+ *
+ * Return: true if live update is enabled, false otherwise.
+ */
+bool liveupdate_enabled(void)
+{
+	return luo_global.enabled;
+}
diff --git a/kernel/liveupdate/luo_ioctl.c b/kernel/liveupdate/luo_ioctl.c
new file mode 100644
index 000000000000..44d365185f7c
--- /dev/null
+++ b/kernel/liveupdate/luo_ioctl.c
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+
+#include <linux/liveupdate.h>
+#include <linux/miscdevice.h>
+
+struct luo_device_state {
+	struct miscdevice miscdev;
+};
+
+static const struct file_operations luo_fops = {
+	.owner		= THIS_MODULE,
+};
+
+static struct luo_device_state luo_dev = {
+	.miscdev = {
+		.minor = MISC_DYNAMIC_MINOR,
+		.name  = "liveupdate",
+		.fops  = &luo_fops,
+	},
+};
+
+static int __init liveupdate_ioctl_init(void)
+{
+	if (!liveupdate_enabled())
+		return 0;
+
+	return misc_register(&luo_dev.miscdev);
+}
+module_init(liveupdate_ioctl_init);
+
+static void __exit liveupdate_exit(void)
+{
+	misc_deregister(&luo_dev.miscdev);
+}
+module_exit(liveupdate_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Pasha Tatashin");
+MODULE_DESCRIPTION("Live Update Orchestrator");
+MODULE_VERSION("0.1");
-- 
2.52.0.rc1.455.g30608eb744-goog



* [PATCH v6 00/20] Live Update Orchestrator
From: Pasha Tatashin @ 2025-11-15 23:33 UTC (permalink / raw)
  To: pratyush, jasonmiu, graf, pasha.tatashin, rppt, dmatlack,
	rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
	mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
	linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu,
	hughd, skhawaja, chrisl

This series introduces the Live Update Orchestrator, a kernel subsystem
designed to facilitate live kernel updates using a kexec-based reboot.
This capability is critical for cloud environments, allowing hypervisors
to be updated with minimal downtime for running virtual machines. LUO
achieves this by preserving the state of selected resources, such as
memory, devices and their dependencies, across the kernel transition.

As a key feature, this series includes support for preserving memfd file
descriptors, which allows critical in-memory data, such as guest RAM or
any other large memory region, to be maintained in RAM across the kexec
reboot.

Other series that build on LUO are the VFIO [1], IOMMU [2], and PCI [3]
preservation series.

A GitHub repository with this series is available at [4].

The core of LUO is a framework for managing the lifecycle of preserved
resources through a userspace-driven interface. Key features include:

- Session Management
  A userspace agent (e.g. luod [5]) creates named sessions, each
  represented by a file descriptor (via a centralized agent that controls
  /dev/liveupdate). The lifecycle of all preserved resources within a
  session is tied to this FD, ensuring automatic kernel cleanup if the
  controlling userspace agent crashes or exits unexpectedly.

- File Preservation
  A handler-based framework allows specific file types (demonstrated
  here with memfd) to be preserved. Handlers manage the serialization,
  restoration, and lifecycle of their specific file types.

- File-Lifecycle-Bound State
  A new mechanism for managing shared global state whose lifecycle is
  tied to the preservation of one or more files. This is crucial for
  subsystems like IOMMU or HugeTLB, where multiple file descriptors may
  depend on a single, shared underlying resource that must be preserved
  only once.

- KHO Integration
  LUO drives the Kexec Handover framework programmatically to pass its
  serialized metadata to the next kernel. The LUO state is finalized and
  added to the kexec image just before the reboot is triggered. In the
  future this step will also be removed once stateless KHO is
  merged [6].

- Userspace Interface
  Control is provided via ioctl commands on /dev/liveupdate for creating
  and retrieving sessions, as well as on session file descriptors for
  managing individual files.

- Testing
  The series includes a set of selftests, including userspace API
  validation, kexec-based lifecycle tests for various session and file
  scenarios, and a new in-kernel test module to validate the FLB logic.
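
As an illustration of the size-prefixed ioctl convention documented in include/uapi/linux/liveupdate.h (each argument structure starts with its size in a u32, and the kernel requires any bytes beyond what it understands to be zero, returning E2BIG otherwise), here is a minimal userspace sketch. The structures and the demo_copy_arg() helper are hypothetical and do not appear in the series; the EINVAL case for a too-short structure is likewise an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "old" argument layout: the size field comes first. */
struct demo_arg_v1 {
	uint32_t size;
	uint32_t flags;
};

/* Hypothetical "new" layout that appends a field to the old one. */
struct demo_arg_v2 {
	uint32_t size;
	uint32_t flags;
	uint64_t extra;
};

/*
 * Copy a caller structure of user_size bytes into a receiver that only
 * understands known_size bytes.
 *
 * - Bytes past known_size must all be zero, otherwise -E2BIG.
 * - A shorter caller structure is zero-extended to known_size.
 * - A structure too short to hold the size field is rejected with
 *   -EINVAL (an assumption made for this sketch).
 */
static int demo_copy_arg(void *dst, size_t known_size,
			 const void *src, size_t user_size)
{
	const unsigned char *p = src;
	size_t i;

	if (user_size < sizeof(uint32_t))
		return -EINVAL;

	for (i = known_size; i < user_size; i++) {
		if (p[i])
			return -E2BIG;
	}

	memset(dst, 0, known_size);
	memcpy(dst, src, user_size < known_size ? user_size : known_size);
	return 0;
}
```

With this shape, newer userspace can pass the larger structure to an older receiver as long as the fields the receiver does not know about are left at zero, which is the compatibility property the DOC comment describes.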

Changelog since v5 [7]

- Moved internal luo_alloc/free_* memory helpers to generic
  kho_alloc/free_* APIs, and submitted as a separate KHO series [8].

- Moved the liveupdate_reboot() invocation from kernel/reboot.c to
  kernel_kexec() in kernel/kexec_core.c.

- Moved generic KHO enabling patches (debugfs, kimage logic) out of this
  series and into the base KHO series.

- Feedback: Addressed review comments from Mike Rapoport and Pratyush
  Yadav.

[1] https://lore.kernel.org/all/20251018000713.677779-1-vipinsh@google.com/
[2] https://lore.kernel.org/linux-iommu/20250928190624.3735830-1-skhawaja@google.com
[3] https://lore.kernel.org/linux-pci/20250916-luo-pci-v2-0-c494053c3c08@kernel.org
[4] https://github.com/googleprodkernel/linux-liveupdate/tree/luo/v6
[5] https://tinyurl.com/luoddesign
[6] https://lore.kernel.org/all/20251020100306.2709352-1-jasonmiu@google.com
[7] https://lore.kernel.org/all/20251107210526.257742-1-pasha.tatashin@soleen.com
[8] https://lore.kernel.org/all/20251114190002.3311679-1-pasha.tatashin@soleen.com

Pasha Tatashin (14):
  liveupdate: luo_core: luo_ioctl: Live Update Orchestrator
  liveupdate: luo_core: integrate with KHO
  kexec: call liveupdate_reboot() before kexec
  liveupdate: luo_session: add sessions support
  liveupdate: luo_ioctl: add user interface
  liveupdate: luo_file: implement file systems callbacks
  liveupdate: luo_session: Add ioctls for file preservation
  liveupdate: luo_flb: Introduce File-Lifecycle-Bound global state
  docs: add luo documentation
  MAINTAINERS: add liveupdate entry
  selftests/liveupdate: Add userspace API selftests
  selftests/liveupdate: Add kexec-based selftest for session lifecycle
  selftests/liveupdate: Add kexec test for multiple and empty sessions
  tests/liveupdate: Add in-kernel liveupdate test

Pratyush Yadav (6):
  mm: shmem: use SHMEM_F_* flags instead of VM_* flags
  mm: shmem: allow freezing inode mapping
  mm: shmem: export some functions to internal.h
  liveupdate: luo_file: add private argument to store runtime state
  mm: memfd_luo: allow preserving memfd
  docs: add documentation for memfd preservation via LUO

 Documentation/core-api/index.rst              |   1 +
 Documentation/core-api/liveupdate.rst         |  71 ++
 Documentation/mm/index.rst                    |   1 +
 Documentation/mm/memfd_preservation.rst       |  23 +
 Documentation/userspace-api/index.rst         |   1 +
 .../userspace-api/ioctl/ioctl-number.rst      |   2 +
 Documentation/userspace-api/liveupdate.rst    |  20 +
 MAINTAINERS                                   |  15 +
 include/linux/liveupdate.h                    | 265 +++++
 include/linux/liveupdate/abi/luo.h            | 238 +++++
 include/linux/liveupdate/abi/memfd.h          |  88 ++
 include/linux/shmem_fs.h                      |  23 +
 include/uapi/linux/liveupdate.h               | 216 +++++
 kernel/kexec_core.c                           |   5 +
 kernel/liveupdate/Kconfig                     |  27 +
 kernel/liveupdate/Makefile                    |   9 +
 kernel/liveupdate/luo_core.c                  | 252 +++++
 kernel/liveupdate/luo_file.c                  | 906 ++++++++++++++++++
 kernel/liveupdate/luo_flb.c                   | 658 +++++++++++++
 kernel/liveupdate/luo_internal.h              |  95 ++
 kernel/liveupdate/luo_ioctl.c                 | 223 +++++
 kernel/liveupdate/luo_session.c               | 600 ++++++++++++
 lib/Kconfig.debug                             |  23 +
 lib/tests/Makefile                            |   1 +
 lib/tests/liveupdate.c                        | 143 +++
 mm/Makefile                                   |   1 +
 mm/internal.h                                 |   6 +
 mm/memfd_luo.c                                | 671 +++++++++++++
 mm/shmem.c                                    |  50 +-
 tools/testing/selftests/Makefile              |   1 +
 tools/testing/selftests/liveupdate/.gitignore |   3 +
 tools/testing/selftests/liveupdate/Makefile   |  40 +
 tools/testing/selftests/liveupdate/config     |   5 +
 .../testing/selftests/liveupdate/do_kexec.sh  |  16 +
 .../testing/selftests/liveupdate/liveupdate.c | 348 +++++++
 .../selftests/liveupdate/luo_kexec_simple.c   | 114 +++
 .../selftests/liveupdate/luo_multi_session.c  | 190 ++++
 .../selftests/liveupdate/luo_test_utils.c     | 168 ++++
 .../selftests/liveupdate/luo_test_utils.h     |  39 +
 39 files changed, 5539 insertions(+), 19 deletions(-)
 create mode 100644 Documentation/core-api/liveupdate.rst
 create mode 100644 Documentation/mm/memfd_preservation.rst
 create mode 100644 Documentation/userspace-api/liveupdate.rst
 create mode 100644 include/linux/liveupdate.h
 create mode 100644 include/linux/liveupdate/abi/luo.h
 create mode 100644 include/linux/liveupdate/abi/memfd.h
 create mode 100644 include/uapi/linux/liveupdate.h
 create mode 100644 kernel/liveupdate/luo_core.c
 create mode 100644 kernel/liveupdate/luo_file.c
 create mode 100644 kernel/liveupdate/luo_flb.c
 create mode 100644 kernel/liveupdate/luo_internal.h
 create mode 100644 kernel/liveupdate/luo_ioctl.c
 create mode 100644 kernel/liveupdate/luo_session.c
 create mode 100644 lib/tests/liveupdate.c
 create mode 100644 mm/memfd_luo.c
 create mode 100644 tools/testing/selftests/liveupdate/.gitignore
 create mode 100644 tools/testing/selftests/liveupdate/Makefile
 create mode 100644 tools/testing/selftests/liveupdate/config
 create mode 100755 tools/testing/selftests/liveupdate/do_kexec.sh
 create mode 100644 tools/testing/selftests/liveupdate/liveupdate.c
 create mode 100644 tools/testing/selftests/liveupdate/luo_kexec_simple.c
 create mode 100644 tools/testing/selftests/liveupdate/luo_multi_session.c
 create mode 100644 tools/testing/selftests/liveupdate/luo_test_utils.c
 create mode 100644 tools/testing/selftests/liveupdate/luo_test_utils.h

-- 
2.52.0.rc1.455.g30608eb744-goog



* Re: RFC: Serial port DTR/RTS - O_<something>
From: H. Peter Anvin @ 2025-11-15 22:29 UTC (permalink / raw)
  To: Ned Ulbricht, Maciej W. Rozycki
  Cc: Greg KH, Theodore Ts'o, Maarten Brock,
	linux-serial@vger.kernel.org, linux-api@vger.kernel.org, LKML
In-Reply-To: <6c26eea2-6f90-f48a-9488-e7480f086c70@netscape.net>

On 2025-11-15 13:29, Ned Ulbricht wrote:
> On 11/14/25 10:53, H. Peter Anvin wrote:
>> On November 14, 2025 10:49:09 AM PST, "Maciej W. Rozycki"
>> <macro@orcam.me.uk> wrote:
>>> On Thu, 13 Nov 2025, H. Peter Anvin wrote:
>>>
>>>>> I think this is going to be the most difficult.  I don't remember why I
>>>>> rejected the old submission, but maybe it would have modified the
>>>>> existing behaviour?  A new open flag "O_DO_NOT_TOUCH_ANYTHING" might be
>>>>> the simplest?
>>>>>
>>>>
>>>> Okay, so I'm going to toss out a couple suggestions for naming:
>>>>
>>>>     O_(PRE|FOR|N|NO)?(INIT|CONFIG|START)(DEV|HW|IO)?
>>>>     O_(NO?RESET|PREPARE)(DEV|HW|IO)?
>>>>     O_NO?TOUCH
>>>>     O_NYET ("not yet")
>>>>     
>>>> I think my personal preference at the moment is either O_NYET or O_PRECONFIG
>>>> or O_NYET; although it is perhaps a bit more "use case centric" than "what
>>>> actual effect it has" I think it might be clearer.  A -DEV, -HW or -IO suffix
>>>> would seem to needlessly preclude it being used for future similar use cases
>>>> for files that are not device nodes.
>>>
>>> Hmm, I'm unconvinced about any of these.
>>>
>>> How about O_FDONLY, to reflect that you are after a file descriptor only
>>> [snip]
> 
> Hi all,
> 
> Resurrecting a (private email) discussion from a few years back now, my
> personal preferences are:
> (1) O_KEEP
> (2) O_TTY_KEEP
> (3) O_TTY_NOINIT.
> 
> (Of course, naming an open() flag has got to be a paradigmatic
> invitation for bike-shedding...)
> 
> It's worth pointing out, though, that even though O_TTY_INIT doesn't
> generally appear in linux headers, that particular flag is documented in
> POSIX to have at least incompatible --perhaps even strictly opposite--
> behavior compared with this new proposed flag.
> 

I dislike O_TTY_* because it restricts the flag to the TTY use case.

	-hpa



* Re: RFC: Serial port DTR/RTS - O_<something>
From: Ned Ulbricht @ 2025-11-15 21:29 UTC (permalink / raw)
  To: H. Peter Anvin, Maciej W. Rozycki
  Cc: Greg KH, Theodore Ts'o, Maarten Brock,
	linux-serial@vger.kernel.org, linux-api@vger.kernel.org, LKML
In-Reply-To: <B72D6F71-7C0B-4C5A-8866-25D7946E0932@zytor.com>

On 11/14/25 10:53, H. Peter Anvin wrote:
> On November 14, 2025 10:49:09 AM PST, "Maciej W. Rozycki" <macro@orcam.me.uk> wrote:
>> On Thu, 13 Nov 2025, H. Peter Anvin wrote:
>>
>>>> I think this is going to be the most difficult.  I don't remember why I
>>>> rejected the old submission, but maybe it would have modified the
>>>> existing behaviour?  A new open flag "O_DO_NOT_TOUCH_ANYTHING" might be
>>>> the simplest?
>>>>
>>>
>>> Okay, so I'm going to toss out a couple suggestions for naming:
>>>
>>> 	O_(PRE|FOR|N|NO)?(INIT|CONFIG|START)(DEV|HW|IO)?
>>> 	O_(NO?RESET|PREPARE)(DEV|HW|IO)?
>>> 	O_NO?TOUCH
>>> 	O_NYET ("not yet")
>>> 	
>>> I think my personal preference at the moment is either O_NYET or O_PRECONFIG
>>> or O_NYET; although it is perhaps a bit more "use case centric" than "what
>>> actual effect it has" I think it might be clearer.  A -DEV, -HW or -IO suffix
>>> would seem to needlessly preclude it being used for future similar use cases
>>> for files that are not device nodes.
>>
>> Hmm, I'm unconvinced about any of these.
>>
>> How about O_FDONLY, to reflect that you are after a file descriptor only [snip]

Hi all,

Resurrecting a (private email) discussion from a few years back now, my
personal preferences are:
(1) O_KEEP
(2) O_TTY_KEEP
(3) O_TTY_NOINIT.

(Of course, naming an open() flag has got to be a paradigmatic
invitation for bike-shedding...)

It's worth pointing out, though, that even though O_TTY_INIT doesn't
generally appear in linux headers, that particular flag is documented in
POSIX to have at least incompatible --perhaps even strictly opposite--
behavior compared with this new proposed flag.

See The Open Group Base Specifications Issue 8 (IEEE Std 1003.1-2024):

| 11.1.1 Opening a Terminal Device File
|
| 3. ... The terminal parameters can be set to values that ensure the
| terminal behaves in a conforming manner by means of the O_TTY_INIT
| open flag....

https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap11.html

| open, openat — open file
|
| O_TTY_INIT

https://pubs.opengroup.org/onlinepubs/9799919799/

That's what motivates my first-glance preference to name this new flag,
which will have approximately opposite behavior, as O_TTY_NOINIT.

But as a generic abstraction, I prefer O_KEEP.

Ned
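
For context on the O_TTY_INIT point above: because the flag does not generally appear in Linux headers, portable code that wants the POSIX behavior usually has to guard its use. A minimal sketch of that guard follows; demo_tty_open_flags() is a hypothetical helper, and treating a missing O_TTY_INIT as 0 is an assumption of this sketch, not something proposed in the thread.

```c
#include <fcntl.h>

/*
 * O_TTY_INIT is specified by POSIX but may be absent from the system
 * headers. Falling back to 0 makes the flag a no-op where it is missing.
 */
#ifndef O_TTY_INIT
#define O_TTY_INIT 0
#endif

/*
 * Build open() flags for a terminal device. When want_init is non-zero the
 * caller asks for conforming terminal parameters via O_TTY_INIT; a flag
 * with "keep"/"noinit" semantics, as discussed here, would ask for roughly
 * the opposite and would therefore need its own, distinct bit.
 */
static int demo_tty_open_flags(int want_init)
{
	int flags = O_RDWR | O_NOCTTY;

	if (want_init)
		flags |= O_TTY_INIT;

	return flags;
}
```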


* Re: RFC: Serial port DTR/RTS - O_<something>
From: H. Peter Anvin @ 2025-11-14 18:53 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Greg KH, Theodore Ts'o, Maarten Brock,
	linux-serial@vger.kernel.org, linux-api@vger.kernel.org, LKML
In-Reply-To: <alpine.DEB.2.21.2511141836130.47194@angie.orcam.me.uk>

On November 14, 2025 10:49:09 AM PST, "Maciej W. Rozycki" <macro@orcam.me.uk> wrote:
>On Thu, 13 Nov 2025, H. Peter Anvin wrote:
>
>> > I think this is going to be the most difficult.  I don't remember why I
>> > rejected the old submission, but maybe it would have modified the
>> > existing behaviour?  A new open flag "O_DO_NOT_TOUCH_ANYTHING" might be
>> > the simplest?
>> > 
>> 
>> Okay, so I'm going to toss out a couple suggestions for naming:
>> 
>> 	O_(PRE|FOR|N|NO)?(INIT|CONFIG|START)(DEV|HW|IO)?
>> 	O_(NO?RESET|PREPARE)(DEV|HW|IO)?
>> 	O_NO?TOUCH
>> 	O_NYET ("not yet")
>> 	
>> I think my personal preference at the moment is either O_NYET or O_PRECONFIG
>> or O_NYET; although it is perhaps a bit more "use case centric" than "what
>> actual effect it has" I think it might be clearer.  A -DEV, -HW or -IO suffix
>> would seem to needlessly preclude it being used for future similar use cases
>> for files that are not device nodes.
>
> Hmm, I'm unconvinced about any of these.
>
> How about O_FDONLY, to reflect that you are after a file descriptor only 
>with no further actions at open time while avoiding the ambiguity of names 
>such as CONFIG vs NOCONFIG or speaking more broadly implying any specific 
>intent of use at all such as with CONFIG/INIT/PREPARE/RESET/whatever?
>
> I think O_FDONLY is concise, easy to spell/say/remember, and fits the 
>purpose.  Your call!
>
>  Maciej

Overlaps too much with O_PATH, and implies that communication isn't possible *after* device-dependent setup.


* Re: RFC: Serial port DTR/RTS - O_<something>
From: Maciej W. Rozycki @ 2025-11-14 18:49 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Greg KH, Theodore Ts'o, Maarten Brock,
	linux-serial@vger.kernel.org, linux-api@vger.kernel.org, LKML
In-Reply-To: <14b1bc5c-83ac-431f-a53b-14872024b969@zytor.com>

On Thu, 13 Nov 2025, H. Peter Anvin wrote:

> > I think this is going to be the most difficult.  I don't remember why I
> > rejected the old submission, but maybe it would have modified the
> > existing behaviour?  A new open flag "O_DO_NOT_TOUCH_ANYTHING" might be
> > the simplest?
> > 
> 
> Okay, so I'm going to toss out a couple suggestions for naming:
> 
> 	O_(PRE|FOR|N|NO)?(INIT|CONFIG|START)(DEV|HW|IO)?
> 	O_(NO?RESET|PREPARE)(DEV|HW|IO)?
> 	O_NO?TOUCH
> 	O_NYET ("not yet")
> 	
> I think my personal preference at the moment is either O_NYET or O_PRECONFIG
> or O_NYET; although it is perhaps a bit more "use case centric" than "what
> actual effect it has" I think it might be clearer.  A -DEV, -HW or -IO suffix
> would seem to needlessly preclude it being used for future similar use cases
> for files that are not device nodes.

 Hmm, I'm unconvinced about any of these.

 How about O_FDONLY, to reflect that you are after a file descriptor only 
with no further actions at open time while avoiding the ambiguity of names 
such as CONFIG vs NOCONFIG or speaking more broadly implying any specific 
intent of use at all such as with CONFIG/INIT/PREPARE/RESET/whatever?

 I think O_FDONLY is concise, easy to spell/say/remember, and fits the 
purpose.  Your call!

  Maciej


* [PATCH 2/2] man/man7/ip.7: Reword IP_PKTINFO's description
From: Jakub Głogowski @ 2025-11-14 14:29 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Jakub Głogowski, linux-man, LKML, Linux API, ej
In-Reply-To: <cover.1763130571.git.not@dzwdz.net>

I've heavily cut down the first paragraph (which wasn't really saying
anything), and emphasized the difference between how recvmsg(2) and
sendmsg(2) treat this struct.

"This works only for datagram oriented sockets" is redundant with
"Not supported for SOCK_STREAM", and the mention of sendmsg(2) was moved
down.

I called it a boolean option because that's how these were introduced at
the start of the section.

I've tried rewording ipi_spec_dst's effect on sendmsg to be a bit more
clear.

The only piece of new information which this adds is that you can use
the structure returned by recvmsg with sendmsg, which directly follows
from the preceding text.

RFC 3542, Section 6, directly calls out this use case for in6_pktinfo:

> Some UDP servers want to respond to client
> requests by sending their reply out the same interface on which the
> request was received and with the source IPv6 address of the reply
> equal to the destination IPv6 address of the request.  To do this the
> application can enable just the IPV6_RECVPKTINFO socket option and
> then use the received control information from recvmsg() as the
> outgoing control information for sendmsg().  The application need not
> examine or modify the in6_pktinfo structure at all.
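
The reply pattern the RFC describes for in6_pktinfo can be sketched for
IPv4 roughly as follows.  This is a minimal illustration, not part of the
patch: the helper names enable_pktinfo(), recv_with_pktinfo() and
send_with_pktinfo() are hypothetical, error handling is reduced to return
codes, and it assumes a Linux system where <netinet/in.h> exposes
struct in_pktinfo.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Control buffer sized for exactly one in_pktinfo, aligned for cmsghdr. */
union pktinfo_ctl {
	char buf[CMSG_SPACE(sizeof(struct in_pktinfo))];
	struct cmsghdr align;
};

/* Ask the kernel to attach an IP_PKTINFO ancillary message on receive. */
static int enable_pktinfo(int fd)
{
	int on = 1;

	return setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on));
}

/*
 * Receive one datagram and copy out its in_pktinfo.
 * Returns the payload length, or -1 on error or if no IP_PKTINFO
 * message was attached.
 */
static ssize_t recv_with_pktinfo(int fd, void *buf, size_t len,
				 struct sockaddr_in *peer,
				 struct in_pktinfo *pi)
{
	union pktinfo_ctl ctl;
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg = {
		.msg_name = peer, .msg_namelen = sizeof(*peer),
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = ctl.buf, .msg_controllen = sizeof(ctl.buf),
	};
	struct cmsghdr *c;
	ssize_t n = recvmsg(fd, &msg, 0);

	if (n < 0)
		return -1;

	for (c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
		if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
			memcpy(pi, CMSG_DATA(c), sizeof(*pi));
			return n;
		}
	}
	return -1;
}

/* Send one datagram, passing the given in_pktinfo back unmodified. */
static ssize_t send_with_pktinfo(int fd, const void *buf, size_t len,
				 const struct sockaddr_in *peer,
				 const struct in_pktinfo *pi)
{
	union pktinfo_ctl ctl;
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	struct msghdr msg = {
		.msg_name = (void *)peer, .msg_namelen = sizeof(*peer),
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = ctl.buf, .msg_controllen = sizeof(ctl.buf),
	};
	struct cmsghdr *c;

	memset(&ctl, 0, sizeof(ctl));
	c = CMSG_FIRSTHDR(&msg);
	assert(c != NULL);	/* the buffer holds exactly one header */
	c->cmsg_level = IPPROTO_IP;
	c->cmsg_type = IP_PKTINFO;
	c->cmsg_len = CMSG_LEN(sizeof(*pi));
	memcpy(CMSG_DATA(c), pi, sizeof(*pi));

	return sendmsg(fd, &msg, 0);
}
```

A server would call enable_pktinfo() once after socket(2), then hand the
in_pktinfo filled in by recv_with_pktinfo() straight to
send_with_pktinfo() together with the peer address.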

I'm not sure if this is the best place to document this, as the sendmsg
behavior is entirely unrelated to the IP_PKTINFO sockopt.  Maybe some of
the control messages should be broken out to another manpage?

Signed-off-by: Jakub Głogowski <not@dzwdz.net>
---
 man/man7/ip.7 | 49 +++++++++++++++++++++----------------------------
 1 file changed, 21 insertions(+), 28 deletions(-)

diff --git a/man/man7/ip.7 b/man/man7/ip.7
index a7f118b42..aa2508bc7 100644
--- a/man/man7/ip.7
+++ b/man/man7/ip.7
@@ -783,20 +783,13 @@ .SS Socket options
 .TP
 .BR IP_PKTINFO " (since Linux 2.2)"
 .\" Precisely: since Linux 2.1.68
-Pass an
-.B IP_PKTINFO
-ancillary message that contains a
-.I pktinfo
-structure that supplies some information about the incoming packet.
-This works only for datagram oriented sockets.
-The argument is a flag that tells the socket whether the
-.B IP_PKTINFO
-message should be passed or not.
-The message itself can be sent/retrieved
-only as a control message with a packet using
+If this boolean option is enabled,
 .BR recvmsg (2)
-or
-.BR sendmsg (2).
+outputs an
+.B IP_PKTINFO
+ancillary message containing an
+.I in_pktinfo
+structure.
 .IP
 .in +4n
 .EX
@@ -809,37 +802,37 @@ .SS Socket options
 .EE
 .in
 .IP
-When returned by
-.BR recvmsg (2) ,
+In this context,
 .I ipi_ifindex
 is the unique index of the interface the packet was received on.
 .I ipi_spec_dst
 is the preferred source address for replies to the given packet, and
 .I ipi_addr
-is the destination address in the packet header.
+is the destination address from the packet header.
 These addresses are usually the same,
 but can differ for broadcast or multicast packets.
 Note that, depending on the configured routes,
 .I ipi_spec_dst
 might belong to a different interface from the one that received the packet.
 .IP
-If
-.B IP_PKTINFO
-is passed to
-.BR sendmsg (2)
-and
+This structure can also be passed as an ancillary message to
+.BR sendmsg (2) .
+In that case,
 .\" This field is grossly misnamed
 .I ipi_spec_dst
-is not zero, then it is used as the local source address, for the routing
-table lookup, and for setting up IP source route options.
-When
+is used as the local source address
+(if non-zero),
+including for the purposes of setting up IP source route options.
+It's also used for the routing table lookup, unless
 .I ipi_ifindex
-is not zero, the primary local address of the interface specified by the
-index overwrites
-.I ipi_spec_dst
-for the routing table lookup.
+is non-zero \(en
+then the primary local address of that interface is used there instead.
 .I ipi_addr
 is ignored.
+The structure returned by
+.BR recvmsg (2)
+can be reused,
+which effectively sends a reply to the original packet.
 .IP
 Not supported for
 .B SOCK_STREAM
-- 
2.47.3


^ permalink raw reply related

* [PATCH 0/2] man7/ip.7: Clarify PKTINFO's docs
From: Jakub Głogowski @ 2025-11-14 14:29 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Jakub Głogowski, linux-man, LKML, Linux API, ej

I found the PKTINFO docs pretty confusing, so I tried clarifying them:
- being more specific about each field in the struct
  (e.g. "local address of the packet" for a received packet could've
  been interpreted in myriad ways),
- making the differences between sendmsg(2)'s and recvmsg(2)'s handling
  of that struct more explicit,
- and some other slight rewording to make it (IMO) more readable - I cut
  out most of a paragraph that wasn't really saying anything, etc.

I'm not sure if this should even be documented in ip(7) together with
the other sockopts, though?  sendmsg(2)'s handling of in_pktinfo is
completely unrelated to the IP_PKTINFO sockopt.  Documenting it in its
own manual page would also give us more room for subsection headings and
other formatting, examples, etc - instead of trying to cram it into
what's already an enormous manpage.

Same goes for some of the other more complex sockopts, I guess.


PS. sorry for not signing this email, but neomutt didn't want to
cooperate :/  I'll try to figure it out for any followup patches.


Jakub Głogowski (2):
  man/man7/ip.7: Clarify PKTINFO's semantics depending on packet
    direction
  man/man7/ip.7: Reword IP_PKTINFO's description

 man/man7/ip.7 | 57 +++++++++++++++++++++++++++------------------------
 1 file changed, 30 insertions(+), 27 deletions(-)

-- 
2.47.3


^ permalink raw reply

* [PATCH 1/2] man/man7/ip.7: Clarify PKTINFO's semantics depending on packet direction
From: Jakub Głogowski @ 2025-11-14 14:29 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Jakub Głogowski, linux-man, LKML, Linux API, ej
In-Reply-To: <cover.1763130571.git.not@dzwdz.net>

For recvmsg(2), ipi_spec_dst is set by ipv4_pktinfo_prepare() to the
result of fib_compute_spec_dst().  The latter was introduced in
	linux.git 35ebf65e851c6d97 ("ipv4: Create and use fib_compute_spec_dst() helper.").

Quoting its commit message:

> The specific destination is the host we direct unicast replies to.
> Usually this is the original packet source address, but if we are
> responding to a multicast or broadcast packet we have to use something
> different.
>
> Specifically we must use the source address we would use if we were to
> send a packet to the unicast source of the original packet.

Experimentation seems to confirm that behavior.

As for the note about ipi_spec_dst being on a different interface:
- For unicast packets (for which ipi_spec_dst is the original
  destination address), I believe this is trivially true because Linux
  uses the weak host model (unless there's some interaction with
  RTCF_LOCAL that I'm missing).
- For multicast/broadcast packets, fib_compute_spec_dst() only passes the
  original interface to the lookup in the context of L3M.  In
  particular, the original implementation (cited above) set iif and oif
  to 0.  Also, citing
	linux.git e7372197e15856ec ("net/ipv4: Set oif in fib_compute_spec_dst"),
  > If the device is not enslaved, oif is still 0 so no affect.

It doesn't seem like using an address specifically from the interface
the packet was received on was ever the intention.  I've also confirmed
this behavior (sending a multicast packet from another machine, whose IP
I've routed to a dummy interface).

I'm focusing on this because that's a misconception I had before
digging into the code - the sendmsg behavior explained in the same
paragraph made me think ipi_spec_dst was the (primary?) address of
ipi_ifindex.  I think this is worth clarifying.

I've made it explicit that ipi_addr isn't used by sendmsg because that's
another possible misconception.

The (first) extra comma in sendmsg's ipi_spec_dst's description is meant
to emphasize that it's used as the local source address _and_ for the
routing table lookup, as opposed to just affecting the routing table
lookup.
Stylistically it might be a bit weird but idk how to convey this better.

Apart from the cited commits I was referencing the linux-6.17.7 tarball.

__fib_validate_source (and the comment near it) might also be of
interest to people trying to figure out what "specific destinations"
are, exactly.

Signed-off-by: Jakub Głogowski <not@dzwdz.net>
---
 man/man7/ip.7 | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/man/man7/ip.7 b/man/man7/ip.7
index a92939cd0..a7f118b42 100644
--- a/man/man7/ip.7
+++ b/man/man7/ip.7
@@ -809,12 +809,20 @@ .SS Socket options
 .EE
 .in
 .IP
+When returned by
+.BR recvmsg (2) ,
 .I ipi_ifindex
 is the unique index of the interface the packet was received on.
 .I ipi_spec_dst
-is the local address of the packet and
+is the preferred source address for replies to the given packet, and
 .I ipi_addr
 is the destination address in the packet header.
+These addresses are usually the same,
+but can differ for broadcast or multicast packets.
+Note that, depending on the configured routes,
+.I ipi_spec_dst
+might belong to a different interface from the one that received the packet.
+.IP
 If
 .B IP_PKTINFO
 is passed to
@@ -822,14 +830,16 @@ .SS Socket options
 and
 .\" This field is grossly misnamed
 .I ipi_spec_dst
-is not zero, then it is used as the local source address for the routing
-table lookup and for setting up IP source route options.
+is not zero, then it is used as the local source address, for the routing
+table lookup, and for setting up IP source route options.
 When
 .I ipi_ifindex
 is not zero, the primary local address of the interface specified by the
 index overwrites
 .I ipi_spec_dst
 for the routing table lookup.
+.I ipi_addr
+is ignored.
 .IP
 Not supported for
 .B SOCK_STREAM
-- 
2.47.3


^ permalink raw reply related

* Re: [PATCH v5 02/22] liveupdate: luo_core: integrate with KHO
From: Pasha Tatashin @ 2025-11-14 14:48 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRcSpbwBabFjeYe3@kernel.org>

On Fri, Nov 14, 2025 at 6:30 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Fri, Nov 07, 2025 at 04:03:00PM -0500, Pasha Tatashin wrote:
> > Integrate the LUO with the KHO framework to enable passing LUO state
> > across a kexec reboot.
> >
> > When LUO is transitioned to a "prepared" state, it tells KHO to
> > finalize, so all memory segments that were added to KHO preservation
> > list are getting preserved. After "Prepared" state no new segments
> > can be preserved. If LUO is canceled, it also tells KHO to cancel the
> > serialization, and therefore, later LUO can go back into the prepared
> > state.
> >
> > This patch introduces the following changes:
> > - During the KHO finalization phase allocate FDT blob.
> > - Populate this FDT with a LUO compatibility string ("luo-v1").
> >
> > LUO now depends on `CONFIG_KEXEC_HANDOVER`. The core state transition
> > logic (`luo_do_*_calls`) remains unimplemented in this patch.
> >
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> > ---
> >  include/linux/liveupdate.h         |   6 +
> >  include/linux/liveupdate/abi/luo.h |  54 +++++++
> >  kernel/liveupdate/luo_core.c       | 243 ++++++++++++++++++++++++++++-
> >  kernel/liveupdate/luo_internal.h   |  17 ++
> >  mm/mm_init.c                       |   4 +
> >  5 files changed, 323 insertions(+), 1 deletion(-)
> >  create mode 100644 include/linux/liveupdate/abi/luo.h
> >  create mode 100644 kernel/liveupdate/luo_internal.h
> >
> > diff --git a/include/linux/liveupdate.h b/include/linux/liveupdate.h
> > index 730b76625fec..0be8804fc42a 100644
> > --- a/include/linux/liveupdate.h
> > +++ b/include/linux/liveupdate.h
> > @@ -13,6 +13,8 @@
> >
> >  #ifdef CONFIG_LIVEUPDATE
> >
> > +void __init liveupdate_init(void);
> > +
> >  /* Return true if live update orchestrator is enabled */
> >  bool liveupdate_enabled(void);
> >
> > @@ -21,6 +23,10 @@ int liveupdate_reboot(void);
> >
> >  #else /* CONFIG_LIVEUPDATE */
> >
> > +static inline void liveupdate_init(void)
> > +{
> > +}
>
> The common practice is to place brackets at the same line with function
> declaration.

Sure.

>
> ...
>
> > +static int __init luo_early_startup(void)
> > +{
> > +     phys_addr_t fdt_phys;
> > +     int err, ln_size;
> > +     const void *ptr;
> > +
> > +     if (!kho_is_enabled()) {
> > +             if (liveupdate_enabled())
> > +                     pr_warn("Disabling liveupdate because KHO is disabled\n");
> > +             luo_global.enabled = false;
> > +             return 0;
> > +     }
> > +
> > +     /* Retrieve LUO subtree, and verify its format. */
> > +     err = kho_retrieve_subtree(LUO_FDT_KHO_ENTRY_NAME, &fdt_phys);
> > +     if (err) {
> > +             if (err != -ENOENT) {
> > +                     pr_err("failed to retrieve FDT '%s' from KHO: %pe\n",
> > +                            LUO_FDT_KHO_ENTRY_NAME, ERR_PTR(err));
> > +                     return err;
> > +             }
> > +
> > +             return 0;
> > +     }
> > +
> > +     luo_global.fdt_in = __va(fdt_phys);
>
> phys_to_virt is clearer, isn't it?

Sure

>
> > +     err = fdt_node_check_compatible(luo_global.fdt_in, 0,
> > +                                     LUO_FDT_COMPATIBLE);
>
> ...
>
> > +void __init liveupdate_init(void)
> > +{
> > +     int err;
> > +
> > +     err = luo_early_startup();
> > +     if (err) {
> > +             pr_err("The incoming tree failed to initialize properly [%pe], disabling live update\n",
> > +                    ERR_PTR(err));
> > +             luo_global.enabled = false;
> > +     }
> > +}
> > +
> > +/* Called during boot to create LUO fdt tree */
>
>                          ^ create outgoing

OK

>
> > +static int __init luo_late_startup(void)
> > +{
> > +     int err;
> > +
> > +     if (!liveupdate_enabled())
> > +             return 0;
> > +
> > +     err = luo_fdt_setup();
> > +     if (err)
> > +             luo_global.enabled = false;
> > +
> > +     return err;
> > +}
> > +late_initcall(luo_late_startup);
>
> It would be nice to have a comment explaining why late_initcall() is fine
> and why there's no need to initialize the outgoing fdt earlier.

I will add a comment; basically it is fine because the outgoing data
structures are only used after we enter userspace.

>
> > +/**
> > + * luo_alloc_preserve - Allocate, zero, and preserve memory.
>
> I think this and the "free" counterparts would be useful for any KHO users,
> even those that don't need LUO.

I will move them to KHO.

>
> > + * @size: The number of bytes to allocate.
> > + *
> > + * Allocates a physically contiguous block of zeroed pages that is large
> > + * enough to hold @size bytes. The allocated memory is then registered with
> > + * KHO for preservation across a kexec.
> > + *
> > + * Note: The actual allocated size will be rounded up to the nearest
> > + * power-of-two page boundary.
> > + *
> > + * @return A virtual pointer to the allocated and preserved memory on success,
> > + * or an ERR_PTR() encoded error on failure.
> > + */
> > +void *luo_alloc_preserve(size_t size)
> > +{
> > +     struct folio *folio;
> > +     int order, ret;
> > +
> > +     if (!size)
> > +             return ERR_PTR(-EINVAL);
> > +
> > +     order = get_order(size);
> > +     if (order > MAX_PAGE_ORDER)
> > +             return ERR_PTR(-E2BIG);
>
> High order allocations would likely fail or at least cause a heavy reclaim.
> For now it seems that we won't be needing really large contiguous chunks so
> maybe limiting this to PAGE_ALLOC_COSTLY_ORDER?

Let's use MAX_PAGE_ORDER for now. My concern is that
PAGE_ALLOC_COSTLY_ORDER is too fragile to make part of the ABI. If the
allocation fails, the user will have to deal with it, since we return a
proper error code.
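
To make the two limits under discussion concrete, here is a small
user-space model of how a byte size maps to an allocation order. It is
only a sketch: model_get_order() and model_alloc_bytes() are hypothetical
names, and the constants assume 4 KiB pages with the common defaults
PAGE_ALLOC_COSTLY_ORDER = 3 and MAX_PAGE_ORDER = 10, which can differ by
architecture and configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration; real kernels may configure these differently. */
#define PAGE_SHIFT		12
#define PAGE_SIZE		(1UL << PAGE_SHIFT)
#define PAGE_ALLOC_COSTLY_ORDER	3
#define MAX_PAGE_ORDER		10

/*
 * User-space model of get_order(): the smallest order such that
 * (PAGE_SIZE << order) >= size. Zero is rejected, mirroring the
 * !size check in the quoted luo_alloc_preserve().
 */
static int model_get_order(size_t size)
{
	size_t pages;
	int order = 0;

	assert(size > 0);
	pages = ((size - 1) >> PAGE_SHIFT) + 1;	/* round up to whole pages */
	while (((size_t)1 << order) < pages)
		order++;
	return order;
}

/* Bytes actually reserved once the size is rounded up to a power-of-two page count. */
static size_t model_alloc_bytes(size_t size)
{
	return PAGE_SIZE << model_get_order(size);
}
```

Under these assumptions a 5-page request is rounded up to 8 pages
(order 3), 32 KiB is the largest size that stays within
PAGE_ALLOC_COSTLY_ORDER, and 4 MiB is the largest size that passes the
MAX_PAGE_ORDER check.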

> Later if we'd need higher order allocations we can try to allocate with
> __GFP_NORETRY or __GFP_RETRY_MAYFAIL with a fallback to vmalloc.
>
> > +
> > +     folio = folio_alloc(GFP_KERNEL | __GFP_ZERO, order);
> > +     if (!folio)
> > +             return ERR_PTR(-ENOMEM);
> > +
> > +     ret = kho_preserve_folio(folio);
> > +     if (ret) {
> > +             folio_put(folio);
> > +             return ERR_PTR(ret);
> > +     }
> > +
> > +     return folio_address(folio);
> > +}
> > +
>
> --
> Sincerely yours,
> Mike.

^ permalink raw reply

* Re: [PATCH v5 07/22] liveupdate: luo_ioctl: add user interface
From: Pasha Tatashin @ 2025-11-14 14:09 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRcnRFnqhm3jkqd3@kernel.org>

On Fri, Nov 14, 2025 at 7:58 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Fri, Nov 07, 2025 at 04:03:05PM -0500, Pasha Tatashin wrote:
> > Introduce the user-space interface for the Live Update Orchestrator
> > via ioctl commands, enabling external control over the live update
> > process and management of preserved resources.
> >
> > The idea is that there is going to be a single userspace agent driving
> > the live update, therefore, only a single process can ever hold this
> > device opened at a time.
> >
> > The following ioctl commands are introduced:
> >
> > LIVEUPDATE_IOCTL_CREATE_SESSION
> > Provides a way for userspace to create a named session for grouping file
> > descriptors that need to be preserved. It returns a new file descriptor
> > representing the session.
> >
> > LIVEUPDATE_IOCTL_RETRIEVE_SESSION
> > Allows the userspace agent in the new kernel to reclaim a preserved
> > session by its name, receiving a new file descriptor to manage the
> > restored resources.
> >
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> > ---
> >  include/uapi/linux/liveupdate.h  |  64 ++++++++++++
> >  kernel/liveupdate/luo_internal.h |  21 ++++
> >  kernel/liveupdate/luo_ioctl.c    | 173 +++++++++++++++++++++++++++++++
> >  3 files changed, 258 insertions(+)
>
> ...
>
> > +static int luo_ioctl_create_session(struct luo_ucmd *ucmd)
> > +{
> > +     struct liveupdate_ioctl_create_session *argp = ucmd->cmd;
> > +     struct file *file;
> > +     int ret;
> > +
> > +     argp->fd = get_unused_fd_flags(O_CLOEXEC);
> > +     if (argp->fd < 0)
> > +             return argp->fd;
> > +
> > +     ret = luo_session_create(argp->name, &file);
> > +     if (ret)
>
>                 put_unused_fd(fd) ?

Yes, thank you.

>
> > +             return ret;
> > +
> > +     ret = luo_ucmd_respond(ucmd, sizeof(*argp));
> > +     if (ret) {
> > +             fput(file);
> > +             put_unused_fd(argp->fd);
> > +             return ret;
> > +     }
>
> I think that using gotos for error handling is more appropriate here.

Sure, I will do that
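
For readers following along, the goto-based unwinding being suggested
could look roughly like the user-space model below. It is a sketch only,
not the code from a later revision: every mock_* helper and struct
mock_state are hypothetical stand-ins that merely count acquisitions and
releases, so that the two error paths can be checked for leaks.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical bookkeeping standing in for the fd table and file references. */
struct mock_state {
	int fds_reserved;	/* reserved but not yet installed descriptors */
	int file_refs;		/* outstanding references on the session file */
	int installed;		/* descriptors published to user space */
	int fail_create;	/* force the "create session" step to fail */
	int fail_respond;	/* force the "copy reply to user" step to fail */
};

static int mock_get_unused_fd(struct mock_state *s)
{
	s->fds_reserved++;
	return 3;		/* arbitrary descriptor number */
}

static void mock_put_unused_fd(struct mock_state *s)
{
	assert(s->fds_reserved > 0);	/* catch a double release */
	s->fds_reserved--;
}

static int mock_session_create(struct mock_state *s)
{
	if (s->fail_create)
		return -ENOMEM;
	s->file_refs++;
	return 0;
}

static void mock_fput(struct mock_state *s)
{
	assert(s->file_refs > 0);
	s->file_refs--;
}

static int mock_respond(const struct mock_state *s)
{
	return s->fail_respond ? -EFAULT : 0;
}

static void mock_fd_install(struct mock_state *s)
{
	s->fds_reserved--;	/* the reservation is consumed ... */
	s->installed++;		/* ... and the descriptor becomes visible */
}

/* Each failure jumps to the label that releases what was acquired so far. */
static int model_create_session(struct mock_state *s)
{
	int fd, ret;

	fd = mock_get_unused_fd(s);
	if (fd < 0)
		return fd;

	ret = mock_session_create(s);
	if (ret)
		goto err_put_fd;

	ret = mock_respond(s);
	if (ret)
		goto err_put_file;

	mock_fd_install(s);
	return 0;

err_put_file:
	mock_fput(s);
err_put_fd:
	mock_put_unused_fd(s);
	return ret;
}
```

The labels are ordered so that a later failure falls through the cleanup
of everything acquired earlier, which also covers the missing
put_unused_fd() pointed out above.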

>
> > +
> > +     fd_install(argp->fd, file);
> > +
> > +     return 0;
> > +}
> > +
> > +static int luo_ioctl_retrieve_session(struct luo_ucmd *ucmd)
> > +{
> > +     struct liveupdate_ioctl_retrieve_session *argp = ucmd->cmd;
> > +     struct file *file;
> > +     int ret;
> > +
> > +     argp->fd = get_unused_fd_flags(O_CLOEXEC);
> > +     if (argp->fd < 0)
> > +             return argp->fd;
> > +
> > +     ret = luo_session_retrieve(argp->name, &file);
> > +     if (ret < 0) {
> > +             put_unused_fd(argp->fd);
> > +
> > +             return ret;
> > +     }
> > +
> > +     ret = luo_ucmd_respond(ucmd, sizeof(*argp));
> > +     if (ret) {
> > +             fput(file);
> > +             put_unused_fd(argp->fd);
> > +             return ret;
> > +     }
>
> and here.

Sure

>
> > +
> > +     fd_install(argp->fd, file);
> > +
> > +     return 0;
> > +}
> > +
>
> --
> Sincerely yours,
> Mike.

^ permalink raw reply

* Re: [PATCH v5 06/22] liveupdate: luo_session: add sessions support
From: Pasha Tatashin @ 2025-11-14 14:07 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRclYgHYXQFJ2Fpn@kernel.org>

On Fri, Nov 14, 2025 at 7:50 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Fri, Nov 07, 2025 at 04:03:04PM -0500, Pasha Tatashin wrote:
> > Introduce concept of "Live Update Sessions" within the LUO framework.
> > LUO sessions provide a mechanism to group and manage `struct file *`
> > instances (representing file descriptors) that need to be preserved
> > across a kexec-based live update.
> >
> > Each session is identified by a unique name and acts as a container
> > for file objects whose state is critical to a userspace workload, such
> > as a virtual machine or a high-performance database, aiming to maintain
> > their functionality across a kernel transition.
> >
> > This groundwork establishes the framework for preserving file-backed
> > state across kernel updates, with the actual file data preservation
> > mechanisms to be implemented in subsequent patches.
> >
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> > ---
> >  include/linux/liveupdate/abi/luo.h |  81 ++++++
> >  include/uapi/linux/liveupdate.h    |   3 +
> >  kernel/liveupdate/Makefile         |   3 +-
> >  kernel/liveupdate/luo_core.c       |   9 +
> >  kernel/liveupdate/luo_internal.h   |  39 +++
> >  kernel/liveupdate/luo_session.c    | 405 +++++++++++++++++++++++++++++
> >  6 files changed, 539 insertions(+), 1 deletion(-)
> >  create mode 100644 kernel/liveupdate/luo_session.c
> >
> > diff --git a/include/linux/liveupdate/abi/luo.h b/include/linux/liveupdate/abi/luo.h
> > index 9483a294287f..37b9fecef3f7 100644
> > --- a/include/linux/liveupdate/abi/luo.h
> > +++ b/include/linux/liveupdate/abi/luo.h
> > @@ -28,6 +28,11 @@
> >   *     / {
> >   *         compatible = "luo-v1";
> >   *         liveupdate-number = <...>;
> > + *
> > + *         luo-session {
> > + *             compatible = "luo-session-v1";
> > + *             luo-session-head = <phys_addr_of_session_head_ser>;
> > + *         };
> >   *     };
> >   *
> >   * Main LUO Node (/):
> > @@ -36,11 +41,37 @@
> >   *     Identifies the overall LUO ABI version.
> >   *   - liveupdate-number: u64
> >   *     A counter tracking the number of successful live updates performed.
> > + *
> > + * Session Node (luo-session):
> > + *   This node describes all preserved user-space sessions.
> > + *
> > + *   - compatible: "luo-session-v1"
> > + *     Identifies the session ABI version.
> > + *   - luo-session-head: u64
> > + *     The physical address of a `struct luo_session_head_ser`. This structure is
> > + *     the header for a contiguous block of memory containing an array of
> > + *     `struct luo_session_ser`, one for each preserved session.
> > + *
> > + * Serialization Structures:
> > + *   The FDT properties point to memory regions containing arrays of simple,
> > + *   `__packed` structures. These structures contain the actual preserved state.
> > + *
> > + *   - struct luo_session_head_ser:
> > + *     Header for the session array. Contains the total page count of the
> > + *     preserved memory block and the number of `struct luo_session_ser`
> > + *     entries that follow.
> > + *
> > + *   - struct luo_session_ser:
> > + *     Metadata for a single session, including its name and a physical pointer
> > + *     to another preserved memory block containing an array of
> > + *     `struct luo_file_ser` for all files in that session.
> >   */
> >
> >  #ifndef _LINUX_LIVEUPDATE_ABI_LUO_H
> >  #define _LINUX_LIVEUPDATE_ABI_LUO_H
> >
> > +#include <uapi/linux/liveupdate.h>
> > +
> >  /*
> >   * The LUO FDT hooks all LUO state for sessions, fds, etc.
> >   * In the root it allso carries "liveupdate-number" 64-bit property that
> > @@ -51,4 +82,54 @@
> >  #define LUO_FDT_COMPATIBLE   "luo-v1"
> >  #define LUO_FDT_LIVEUPDATE_NUM       "liveupdate-number"
> >
> > +/*
> > + * LUO FDT session node
> > + * LUO_FDT_SESSION_HEAD:  is a u64 physical address of struct
> > + *                        luo_session_head_ser
> > + */
> > +#define LUO_FDT_SESSION_NODE_NAME    "luo-session"
> > +#define LUO_FDT_SESSION_COMPATIBLE   "luo-session-v1"
> > +#define LUO_FDT_SESSION_HEAD         "luo-session-head"
> > +
> > +/**
> > + * struct luo_session_head_ser - Header for the serialized session data block.
> > + * @pgcnt: The total size, in pages, of the entire preserved memory block
> > + *         that this header describes.
> > + * @count: The number of 'struct luo_session_ser' entries that immediately
> > + *         follow this header in the memory block.
> > + *
> > + * This structure is located at the beginning of a contiguous block of
> > + * physical memory preserved across the kexec. It provides the necessary
> > + * metadata to interpret the array of session entries that follow.
> > + */
> > +struct luo_session_head_ser {
> > +     u64 pgcnt;
> > +     u64 count;
> > +} __packed;
> > +
> > +/**
> > + * struct luo_session_ser - Represents the serialized metadata for a LUO session.
> > + * @name:    The unique name of the session, copied from the `luo_session`
> > + *           structure.
> > + * @files:   The physical address of a contiguous memory block that holds
> > + *           the serialized state of files.
> > + * @pgcnt:   The number of pages occupied by the `files` memory block.
> > + * @count:   The total number of files that were part of this session during
> > + *           serialization. Used for iteration and validation during
> > + *           restoration.
> > + *
> > + * This structure is used to package session-specific metadata for transfer
> > + * between kernels via Kexec Handover. An array of these structures (one per
> > + * session) is created and passed to the new kernel, allowing it to reconstruct
> > + * the session context.
> > + *
> > + * If this structure is modified, LUO_SESSION_COMPATIBLE must be updated.
> > + */
> > +struct luo_session_ser {
> > +     char name[LIVEUPDATE_SESSION_NAME_LENGTH];
> > +     u64 files;
> > +     u64 pgcnt;
> > +     u64 count;
> > +} __packed;
> > +
> >  #endif /* _LINUX_LIVEUPDATE_ABI_LUO_H */
> > diff --git a/include/uapi/linux/liveupdate.h b/include/uapi/linux/liveupdate.h
> > index df34c1642c4d..d2ef2f7e0dbd 100644
> > --- a/include/uapi/linux/liveupdate.h
> > +++ b/include/uapi/linux/liveupdate.h
> > @@ -43,4 +43,7 @@
> >  /* The ioctl type, documented in ioctl-number.rst */
> >  #define LIVEUPDATE_IOCTL_TYPE                0xBA
> >
> > +/* The maximum length of session name including null termination */
> > +#define LIVEUPDATE_SESSION_NAME_LENGTH 56
>
> Out of curiosity, why 56? :)

There is no architectural requirement. I picked 56 to be long enough
to contain a meaningful identifier, and to fit the luo_session_ser[]
array more efficiently. However, thinking about this now, I will bump
it up to 64 bytes, just so it does not look strange.
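
The size arithmetic behind the two candidate lengths can be checked in
user space. The sketch below mirrors the field layout of
struct luo_session_head_ser and struct luo_session_ser from the quoted
patch using fixed-width types; the model_* names, the 4096-byte page
size, and the entries_per_page() helper are assumptions made for
illustration and are not part of the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size */

/* Same fields as struct luo_session_head_ser in the quoted patch. */
struct model_head_ser {
	uint64_t pgcnt;
	uint64_t count;
} __attribute__((packed));

/* Same fields as struct luo_session_ser, with a 56-byte name. */
struct model_session_ser_56 {
	char name[56];
	uint64_t files;
	uint64_t pgcnt;
	uint64_t count;
} __attribute__((packed));

/* The same layout with the name bumped to 64 bytes. */
struct model_session_ser_64 {
	char name[64];
	uint64_t files;
	uint64_t pgcnt;
	uint64_t count;
} __attribute__((packed));

static_assert(sizeof(struct model_head_ser) == 16, "header is two u64s");
static_assert(sizeof(struct model_session_ser_56) == 80, "56 + 3 * 8");
static_assert(sizeof(struct model_session_ser_64) == 88, "64 + 3 * 8");

/* Number of entries that fit in one page after the header. */
static size_t entries_per_page(size_t entry_size)
{
	return (MODEL_PAGE_SIZE - sizeof(struct model_head_ser)) / entry_size;
}
```

Under these assumptions a 56-byte name gives 80-byte entries, 51 of which
fill a 4 KiB page exactly together with the 16-byte header, while a
64-byte name gives 88-byte entries and 46 per page.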

>
> > +
> >  #endif /* _UAPI_LIVEUPDATE_H */
> > diff --git a/kernel/liveupdate/Makefile b/kernel/liveupdate/Makefile
> > index 413722002b7a..83285e7ad726 100644
> > --- a/kernel/liveupdate/Makefile
> > +++ b/kernel/liveupdate/Makefile
> > @@ -2,7 +2,8 @@
> >
> >  luo-y :=                                                             \
> >               luo_core.o                                              \
> > -             luo_ioctl.o
> > +             luo_ioctl.o                                             \
> > +             luo_session.o
> >
> >  obj-$(CONFIG_KEXEC_HANDOVER)         += kexec_handover.o
> >  obj-$(CONFIG_KEXEC_HANDOVER_DEBUG)   += kexec_handover_debug.o
> > diff --git a/kernel/liveupdate/luo_core.c b/kernel/liveupdate/luo_core.c
> > index c1bd236bccb0..83257ab93ebb 100644
> > --- a/kernel/liveupdate/luo_core.c
> > +++ b/kernel/liveupdate/luo_core.c
> > @@ -116,6 +116,10 @@ static int __init luo_early_startup(void)
> >       pr_info("Retrieved live update data, liveupdate number: %lld\n",
> >               luo_global.liveupdate_num);
> >
> > +     err = luo_session_setup_incoming(luo_global.fdt_in);
> > +     if (err)
> > +             return err;
> > +
> >       return 0;
> >  }
> >
> > @@ -149,6 +153,7 @@ static int __init luo_fdt_setup(void)
> >       err |= fdt_begin_node(fdt_out, "");
> >       err |= fdt_property_string(fdt_out, "compatible", LUO_FDT_COMPATIBLE);
> >       err |= fdt_property(fdt_out, LUO_FDT_LIVEUPDATE_NUM, &ln, sizeof(ln));
> > +     err |= luo_session_setup_outgoing(fdt_out);
> >       err |= fdt_end_node(fdt_out);
> >       err |= fdt_finish(fdt_out);
> >       if (err)
> > @@ -202,6 +207,10 @@ int liveupdate_reboot(void)
> >       if (!liveupdate_enabled())
> >               return 0;
> >
> > +     err = luo_session_serialize();
> > +     if (err)
> > +             return err;
> > +
> >       err = kho_finalize();
> >       if (err) {
> >               pr_err("kho_finalize failed %d\n", err);
> > diff --git a/kernel/liveupdate/luo_internal.h b/kernel/liveupdate/luo_internal.h
> > index 29f47a69be0b..b4f2d1443c76 100644
> > --- a/kernel/liveupdate/luo_internal.h
> > +++ b/kernel/liveupdate/luo_internal.h
> > @@ -14,4 +14,43 @@ void *luo_alloc_preserve(size_t size);
> >  void luo_free_unpreserve(void *mem, size_t size);
> >  void luo_free_restore(void *mem, size_t size);
> >
> > +/**
> > + * struct luo_session - Represents an active or incoming Live Update session.
> > + * @name:       A unique name for this session, used for identification and
> > + *              retrieval.
> > + * @files_list: An ordered list of files associated with this session, it is
> > + *              ordered by preservation time.
> > + * @ser:        Pointer to the serialized data for this session.
> > + * @count:      A counter tracking the number of files currently stored in the
> > + *              @files_xa for this session.
>
>                    ^@files_list

Sure, thanks

>
> > + * @list:       A list_head member used to link this session into a global list
> > + *              of either outgoing (to be preserved) or incoming (restored from
> > + *              previous kernel) sessions.
> > + * @retrieved:  A boolean flag indicating whether this session has been
> > + *              retrieved by a consumer in the new kernel.
> > + * @mutex:      Session lock, protects files_list, and count.
> > + * @files:      The physically contiguous memory block that holds the serialized
> > + *              state of files.
> > + * @pgcnt:      The number of pages files occupy.
>
>                                       ^ @files

Ok

>
> > + */
> > +struct luo_session {
> > +     char name[LIVEUPDATE_SESSION_NAME_LENGTH];
> > +     struct list_head files_list;
> > +     struct luo_session_ser *ser;
> > +     long count;
> > +     struct list_head list;
> > +     bool retrieved;
> > +     struct mutex mutex;
> > +     struct luo_file_ser *files;
> > +     u64 pgcnt;
> > +};
> > +
> > +int luo_session_create(const char *name, struct file **filep);
> > +int luo_session_retrieve(const char *name, struct file **filep);
> > +int __init luo_session_setup_outgoing(void *fdt);
> > +int __init luo_session_setup_incoming(void *fdt);
> > +int luo_session_serialize(void);
> > +int luo_session_deserialize(void);
>
> The last four deal with all the sessions, maybe use plural in the function
> names.

luo_session_* is a common prefix, and I would prefer to keep it the same.
With plurals, these functions would become:

luo_session_serialize_sessions()
luo_session_deserialize_sessions()

But that becomes redundant, so let's keep them as is.

>
> > +bool luo_session_is_deserialized(void);
> > +
> >  #endif /* _LINUX_LUO_INTERNAL_H */
> > diff --git a/kernel/liveupdate/luo_session.c b/kernel/liveupdate/luo_session.c
> > new file mode 100644
> > index 000000000000..a3513118aa74
> > --- /dev/null
> > +++ b/kernel/liveupdate/luo_session.c
> > @@ -0,0 +1,405 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +/*
> > + * Copyright (c) 2025, Google LLC.
> > + * Pasha Tatashin <pasha.tatashin@soleen.com>
> > + */
> > +
> > +/**
> > + * DOC: LUO Sessions
> > + *
> > + * LUO Sessions provide the core mechanism for grouping and managing `struct
> > + * file *` instances that need to be preserved across a kexec-based live
> > + * update. Each session acts as a named container for a set of file objects,
> > + * allowing a userspace agent to manage the lifecycle of resources critical to a
> > + * workload.
> > + *
> > + * Core Concepts:
> > + *
> > + * - Named Containers: Sessions are identified by a unique, user-provided name,
> > + *   which is used for both creation in the current kernel and retrieval in the
> > + *   next kernel.
> > + *
> > + * - Userspace Interface: Session management is driven from userspace via
> > + *   ioctls on /dev/liveupdate.
> > + *
> > + * - Serialization: Session metadata is preserved using the KHO framework. When
> > + *   a live update is triggered via kexec, an array of `struct luo_session_ser`
> > + *   is populated and placed in a preserved memory region. An FDT node is also
> > + *   created, containing the count of sessions and the physical address of this
> > + *   array.
> > + *
> > + * Session Lifecycle:
> > + *
> > + * 1.  Creation: A userspace agent calls `luo_session_create()` to create a
> > + *     new, empty session and receives a file descriptor for it.
> > + *
> > + * 2.  Serialization: When the `reboot(LINUX_REBOOT_CMD_KEXEC)` syscall is
> > + *     made, `luo_session_serialize()` is called. It iterates through all
> > + *     active sessions and writes their metadata into a memory area preserved
> > + *     by KHO.
> > + *
> > + * 3.  Deserialization (in new kernel): After kexec, `luo_session_deserialize()`
> > + *     runs, reading the serialized data and creating a list of `struct
> > + *     luo_session` objects representing the preserved sessions.
> > + *
> > + * 4.  Retrieval: A userspace agent in the new kernel can then call
> > + *     `luo_session_retrieve()` with a session name to get a new file
> > + *     descriptor and access the preserved state.
> > + */
> > +
> > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > +
> > +#include <linux/anon_inodes.h>
> > +#include <linux/errno.h>
> > +#include <linux/file.h>
> > +#include <linux/fs.h>
> > +#include <linux/libfdt.h>
> > +#include <linux/liveupdate.h>
> > +#include <linux/liveupdate/abi/luo.h>
> > +#include <uapi/linux/liveupdate.h>
> > +#include "luo_internal.h"
> > +
> > +/* 16 4K pages, give space for 819 sessions */
> > +#define LUO_SESSION_PGCNT    16ul
> > +#define LUO_SESSION_MAX              (((LUO_SESSION_PGCNT << PAGE_SHIFT) -   \
> > +             sizeof(struct luo_session_head_ser)) /                  \
> > +             sizeof(struct luo_session_ser))
> > +
> > +/**
> > + * struct luo_session_head - Head struct for managing LUO sessions.
>
> Head of what? ;-)

Head of the incoming and outgoing session lists. It sounds appropriate
to me, but if you do not like that, how about luo_session_set?

> Maybe luo_session_list? Or even luo_sessions?

luo_session_list sounds good, but I prefer to reserve that name for
"struct list_head *" types. Let's keep luo_session

>
> > + * @count:    The number of sessions currently tracked in the @list.
> > + * @list:     The head of the linked list of `struct luo_session` instances.
> > + * @rwsem:    A read-write semaphore providing synchronized access to the
> > + *            session list and other fields in this structure.
> > + * @head_ser: The head data of serialization array.
>
>                     ^ header?

Yes, I am going to re-name all these to header.

>
> > + * @ser:      The serialized session data (an array of
> > + *            `struct luo_session_ser`).
> > + * @active:   Set to true when first initialized. If previous kernel did not
> > + *            send session data, active stays false for incoming.
> > + */
> > +struct luo_session_head {
> > +     long count;
> > +     struct list_head list;
> > +     struct rw_semaphore rwsem;
> > +     struct luo_session_head_ser *head_ser;
> > +     struct luo_session_ser *ser;
> > +     bool active;
> > +};
> > +
> > +/**
> > + * struct luo_session_global - Global container for managing LUO sessions.
> > + * @incoming:     The sessions passed from the previous kernel.
> > + * @outgoing:     The sessions that are going to be passed to the next kernel.
> > + * @deserialized: The sessions have been deserialized once /dev/liveupdate
> > + *                has been opened.
> > + */
> > +struct luo_session_global {
> > +     struct luo_session_head incoming;
> > +     struct luo_session_head outgoing;
> > +     bool deserialized;
> > +} luo_session_global;
>
> Should be static. And frankly, I don't think grouping two global variables
> into a struct gains much.

I have sent a separate fix-up patch to make it static.
>
> static struct luo_sessions luo_sessions_incoming;
> static struct luo_sessions luo_sessions_outgoing;

I prefer to group globals in a struct so that over time we do not end up
sprinkling them all over the file.


>
> reads clearer to me.
>
> > +
> > +static struct luo_session *luo_session_alloc(const char *name)
> > +{
> > +     struct luo_session *session = kzalloc(sizeof(*session), GFP_KERNEL);
> > +
> > +     if (!session)
> > +             return NULL;
> > +
> > +     strscpy(session->name, name, sizeof(session->name));
> > +     INIT_LIST_HEAD(&session->files_list);
> > +     session->count = 0;
>
> I'd move this after mutex_init(), a bit more readable IMHO.

Sure

>
> > +     INIT_LIST_HEAD(&session->list);
> > +     mutex_init(&session->mutex);
> > +
> > +     return session;
> > +}
> > +
> > +static void luo_session_free(struct luo_session *session)
> > +{
> > +     WARN_ON(session->count);
> > +     WARN_ON(!list_empty(&session->files_list));
> > +     mutex_destroy(&session->mutex);
> > +     kfree(session);
> > +}
> > +
> > +static int luo_session_insert(struct luo_session_head *sh,
> > +                           struct luo_session *session)
> > +{
> > +     struct luo_session *it;
> > +
> > +     guard(rwsem_write)(&sh->rwsem);
> > +
> > +     /*
> > +      * For outgoing we should make sure there is room in serialization array
> > +      * for new session.
> > +      */
> > +     if (sh == &luo_session_global.outgoing) {
> > +             if (sh->count == LUO_SESSION_MAX)
> > +                     return -ENOMEM;
> > +     }
>
> Not a big deal, but this could be outside the guard().

Yes, but then we would still need to check inside the guard as well. Since
this is not a performance-critical path, let's keep it cleaner and check
only inside.

>
> > +
> > +     /*
> > +      * For small number of sessions this loop won't hurt performance
> > +      * but if we ever start using a lot of sessions, this might
> > +      * become a bottleneck during deserialization time, as it would
> > +      * cause O(n*n) complexity.
> > +      */
>
> The loop is always O(n*n) in the worst case, no matter how many sessions
> there are ;-)

Yes, that is what the comment states. Realistically, the number of
sessions is not in the millions, so this should not be a problem. If it
ever becomes one, we can fix it.
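
To put a number on the cost discussed above, here is a minimal userspace
sketch, not the kernel code. It mirrors the order of checks in
luo_session_insert() (capacity first, then a linear duplicate-name scan)
and counts string comparisons. The array-based storage, the helper names,
and the absence of locking are assumptions made for illustration; 56 and
819 are the name length and session limit quoted in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN	56	/* LIVEUPDATE_SESSION_NAME_LENGTH in the patch */
#define MAX_SESSIONS	819	/* limit quoted in the patch comment */

static char names[MAX_SESSIONS][NAME_LEN];
static int count;
static unsigned long comparisons;

/* Hypothetical helper: forget all sessions and reset the counter. */
void session_reset(void)
{
	count = 0;
	comparisons = 0;
}

/* Same order as the patch: capacity check, duplicate scan, then append. */
int session_insert(const char *name)
{
	if (count == MAX_SESSIONS)
		return -ENOMEM;

	for (int i = 0; i < count; i++) {
		comparisons++;
		if (!strncmp(names[i], name, NAME_LEN))
			return -EEXIST;
	}

	snprintf(names[count], NAME_LEN, "%s", name);
	count++;

	return 0;
}

/* Insert MAX_SESSIONS unique names and report the comparisons made. */
unsigned long fill_all(void)
{
	char name[NAME_LEN];

	session_reset();
	for (int i = 0; i < MAX_SESSIONS; i++) {
		snprintf(name, sizeof(name), "session-%d", i);
		if (session_insert(name))
			break;
	}

	return comparisons;
}
```

Filling all 819 slots with unique names costs 819 * 818 / 2 = 334971
comparisons in this model, which is consistent with treating the linear
scan as a non-issue at this scale.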

>
> > +     list_for_each_entry(it, &sh->list, list) {
> > +             if (!strncmp(it->name, session->name, sizeof(it->name)))
> > +                     return -EEXIST;
> > +     }
> > +     list_add_tail(&session->list, &sh->list);
> > +     sh->count++;
> > +
> > +     return 0;
> > +}
> > +
> > +static void luo_session_remove(struct luo_session_head *sh,
> > +                            struct luo_session *session)
> > +{
> > +     guard(rwsem_write)(&sh->rwsem);
> > +     list_del(&session->list);
> > +     sh->count--;
> > +}
> > +
> > +static int luo_session_release(struct inode *inodep, struct file *filep)
> > +{
> > +     struct luo_session *session = filep->private_data;
> > +     struct luo_session_head *sh;
> > +
> > +     /* If retrieved is set, it means this session is from incoming list */
> > +     if (session->retrieved)
> > +             sh = &luo_session_global.incoming;
> > +     else
> > +             sh = &luo_session_global.outgoing;
>
> Maybe just add a backpointer to the list to struct luo_session?

This is the only place where this is used. I think it is more readable
than carrying an extra 8 bytes of state in every session.

>
> > +
> > +     luo_session_remove(sh, session);
> > +     luo_session_free(session);
> > +
> > +     return 0;
> > +}
> > +
> > +static const struct file_operations luo_session_fops = {
> > +     .owner = THIS_MODULE,
> > +     .release = luo_session_release,
> > +};
> > +
> > +/* Create a "struct file" for session */
> > +static int luo_session_getfile(struct luo_session *session, struct file **filep)
> > +{
> > +     char name_buf[128];
> > +     struct file *file;
> > +
> > +     guard(mutex)(&session->mutex);
> > +     snprintf(name_buf, sizeof(name_buf), "[luo_session] %s", session->name);
> > +     file = anon_inode_getfile(name_buf, &luo_session_fops, session, O_RDWR);
> > +     if (IS_ERR(file))
> > +             return PTR_ERR(file);
> > +
> > +     *filep = file;
> > +
> > +     return 0;
> > +}
> > +
> > +int luo_session_create(const char *name, struct file **filep)
> > +{
> > +     struct luo_session *session;
> > +     int err;
> > +
> > +     session = luo_session_alloc(name);
> > +     if (!session)
> > +             return -ENOMEM;
> > +
> > +     err = luo_session_insert(&luo_session_global.outgoing, session);
> > +     if (err) {
> > +             luo_session_free(session);
> > +             return err;
>
> Please goto err_free

Sure.

>
> > +     }
> > +
> > +     err = luo_session_getfile(session, filep);
> > +     if (err) {
> > +             luo_session_remove(&luo_session_global.outgoing, session);
> > +             luo_session_free(session);
>
> and goto err_remove

Sure.

>
> > +     }
> > +
> > +     return err;
> > +}
> > +
> > +int luo_session_retrieve(const char *name, struct file **filep)
> > +{
> > +     struct luo_session_head *sh = &luo_session_global.incoming;
> > +     struct luo_session *session = NULL;
> > +     struct luo_session *it;
> > +     int err;
> > +
> > +     scoped_guard(rwsem_read, &sh->rwsem) {
> > +             list_for_each_entry(it, &sh->list, list) {
> > +                     if (!strncmp(it->name, name, sizeof(it->name))) {
> > +                             session = it;
> > +                             break;
> > +                     }
> > +             }
> > +     }
> > +
> > +     if (!session)
> > +             return -ENOENT;
> > +
> > +     scoped_guard(mutex, &session->mutex) {
> > +             if (session->retrieved)
> > +                     return -EINVAL;
> > +     }
> > +
> > +     err = luo_session_getfile(session, filep);
> > +     if (!err) {
> > +             scoped_guard(mutex, &session->mutex)
> > +                     session->retrieved = true;
> > +     }
> > +
> > +     return err;
> > +}
> > +
> > +int __init luo_session_setup_outgoing(void *fdt_out)
> > +{
> > +     struct luo_session_head_ser *head_ser;
> > +     u64 head_ser_pa;
> > +     int err;
> > +
> > +     head_ser = luo_alloc_preserve(LUO_SESSION_PGCNT << PAGE_SHIFT);
> > +     if (IS_ERR(head_ser))
> > +             return PTR_ERR(head_ser);
> > +     head_ser_pa = __pa(head_ser);
>
> virt_to_phys please
>
> > +
> > +     err = fdt_begin_node(fdt_out, LUO_FDT_SESSION_NODE_NAME);
> > +     err |= fdt_property_string(fdt_out, "compatible",
> > +                                LUO_FDT_SESSION_COMPATIBLE);
> > +     err |= fdt_property(fdt_out, LUO_FDT_SESSION_HEAD, &head_ser_pa,
> > +                         sizeof(head_ser_pa));
> > +     err |= fdt_end_node(fdt_out);
> > +
> > +     if (err)
> > +             goto err_unpreserve;
> > +
> > +     head_ser->pgcnt = LUO_SESSION_PGCNT;
> > +     INIT_LIST_HEAD(&luo_session_global.outgoing.list);
> > +     init_rwsem(&luo_session_global.outgoing.rwsem);
> > +     luo_session_global.outgoing.head_ser = head_ser;
> > +     luo_session_global.outgoing.ser = (void *)(head_ser + 1);
> > +     luo_session_global.outgoing.active = true;
> > +
> > +     return 0;
> > +
> > +err_unpreserve:
> > +     luo_free_unpreserve(head_ser, LUO_SESSION_PGCNT << PAGE_SHIFT);
> > +     return err;
> > +}
> > +
> > +int __init luo_session_setup_incoming(void *fdt_in)
> > +{
> > +     struct luo_session_head_ser *head_ser;
> > +     int err, head_size, offset;
> > +     const void *ptr;
> > +     u64 head_ser_pa;
> > +
> > +     offset = fdt_subnode_offset(fdt_in, 0, LUO_FDT_SESSION_NODE_NAME);
> > +     if (offset < 0) {
> > +             pr_err("Unable to get session node: [%s]\n",
> > +                    LUO_FDT_SESSION_NODE_NAME);
> > +             return -EINVAL;
> > +     }
> > +
> > +     err = fdt_node_check_compatible(fdt_in, offset,
> > +                                     LUO_FDT_SESSION_COMPATIBLE);
> > +     if (err) {
> > +             pr_err("Session node incompatible [%s]\n",
> > +                    LUO_FDT_SESSION_COMPATIBLE);
> > +             return -EINVAL;
> > +     }
> > +
> > +     head_size = 0;
> > +     ptr = fdt_getprop(fdt_in, offset, LUO_FDT_SESSION_HEAD, &head_size);
> > +     if (!ptr || head_size != sizeof(u64)) {
> > +             pr_err("Unable to get session head '%s' [%d]\n",
> > +                    LUO_FDT_SESSION_HEAD, head_size);
> > +             return -EINVAL;
> > +     }
> > +
> > +     memcpy(&head_ser_pa, ptr, sizeof(u64));
> > +     head_ser = __va(head_ser_pa);
> > +
> > +     luo_session_global.incoming.head_ser = head_ser;
> > +     luo_session_global.incoming.ser = (void *)(head_ser + 1);
> > +     INIT_LIST_HEAD(&luo_session_global.incoming.list);
> > +     init_rwsem(&luo_session_global.incoming.rwsem);
> > +     luo_session_global.incoming.active = true;
> > +
> > +     return 0;
> > +}
> > +
> > +bool luo_session_is_deserialized(void)
> > +{
> > +     return luo_session_global.deserialized;
> > +}
> > +
> > +int luo_session_deserialize(void)
> > +{
> > +     struct luo_session_head *sh = &luo_session_global.incoming;
> > +
> > +     if (luo_session_is_deserialized())
> > +             return 0;
> > +
> > +     luo_session_global.deserialized = true;
>
> Shouldn't this be set after deserialization succeeded?

We run luo_session_deserialize() only once, even if it failed or was not
needed, so the flag is set at the beginning.
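
The consequence of setting the flag first can be shown with a small
userspace sketch. All names here are hypothetical stand-ins, not the
kernel code: once the first attempt has run, later calls return 0 without
retrying, even if that first attempt failed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static bool deserialized;	/* mirrors luo_session_global.deserialized */
static bool simulate_failure;	/* test knob, not part of the patch */
static int attempts;		/* how many times the real work ran */

/* Stand-in for the body of the deserialization loop. */
static int do_deserialize(void)
{
	attempts++;
	return simulate_failure ? -ENOMEM : 0;
}

int deserialize_once(void)
{
	if (deserialized)
		return 0;

	/* Set before doing the work, so a failure is never retried. */
	deserialized = true;

	return do_deserialize();
}
```

With simulate_failure set, the first call reports -ENOMEM and every later
call reports 0, so a caller that needs to know about the failure has to
act on the first return value.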

>
> > +     if (!sh->active) {
> > +             INIT_LIST_HEAD(&sh->list);
> > +             init_rwsem(&sh->rwsem);
> > +             return 0;
> > +     }
> > +
> > +     for (int i = 0; i < sh->head_ser->count; i++) {
> > +             struct luo_session *session;
> > +
> > +             session = luo_session_alloc(sh->ser[i].name);
> > +             if (!session) {
> > +                     pr_warn("Failed to allocate session [%s] during deserialization\n",
> > +                             sh->ser[i].name);
> > +                     return -ENOMEM;
> > +             }
> > +
> > +             if (luo_session_insert(sh, session)) {
> > +                     pr_warn("Failed to insert session due to name conflict [%s]\n",
> > +                             session->name);
> > +                     return -EEXIST;
>
> Need to free allocated sessions if an insert fails.

Thanks, I will fix it.
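
The fix itself is not shown in this thread. One possible shape, as a
userspace sketch with hypothetical names (a plain linked list and a
live_allocs counter stand in for the kernel structures), is to free the
session that could not be inserted before returning the error:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_LEN 56

struct session {
	char name[NAME_LEN];
	struct session *next;
};

static struct session *sessions;	/* singly linked list, newest first */
static int live_allocs;			/* allocations not yet freed */

static struct session *session_alloc(const char *name)
{
	struct session *s = calloc(1, sizeof(*s));

	if (!s)
		return NULL;
	snprintf(s->name, sizeof(s->name), "%s", name);
	live_allocs++;

	return s;
}

static void session_free(struct session *s)
{
	free(s);
	live_allocs--;
}

/* Reject duplicates, otherwise link the session into the list. */
static int session_insert(struct session *s)
{
	for (struct session *it = sessions; it; it = it->next) {
		if (!strncmp(it->name, s->name, sizeof(it->name)))
			return -EEXIST;
	}
	s->next = sessions;
	sessions = s;

	return 0;
}

int deserialize_names(const char *const *names, int n)
{
	for (int i = 0; i < n; i++) {
		struct session *s = session_alloc(names[i]);
		int err;

		if (!s)
			return -ENOMEM;

		err = session_insert(s);
		if (err) {
			session_free(s);	/* never linked, so free it here */
			return err;
		}
	}

	return 0;
}
```

This sketch only releases the session that failed to insert. Whether the
sessions already on the list should also be torn down is a separate
decision that it does not make.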

>
> > +             }
> > +
> > +             session->count = sh->ser[i].count;
> > +             session->files = __va(sh->ser[i].files);
> > +             session->pgcnt = sh->ser[i].pgcnt;
> > +     }
> > +
> > +     luo_free_restore(sh->head_ser, sh->head_ser->pgcnt << PAGE_SHIFT);
> > +     sh->head_ser = NULL;
> > +     sh->ser = NULL;
> > +
> > +     return 0;
> > +}
> > +
> > +int luo_session_serialize(void)
> > +{
> > +     struct luo_session_head *sh = &luo_session_global.outgoing;
> > +     struct luo_session *session;
> > +     int i = 0;
> > +
> > +     guard(rwsem_write)(&sh->rwsem);
> > +     list_for_each_entry(session, &sh->list, list) {
> > +             strscpy(sh->ser[i].name, session->name,
> > +                     sizeof(sh->ser[i].name));
> > +             sh->ser[i].count = session->count;
> > +             sh->ser[i].files = __pa(session->files);
> > +             sh->ser[i].pgcnt = session->pgcnt;
> > +             i++;
> > +     }
> > +     sh->head_ser->count = sh->count;
> > +
> > +     return 0;
> > +}
> > --
> > 2.51.2.1041.gc1ab5b90ca-goog
> >
> >
>
> --
> Sincerely yours,
> Mike.

* Re: [PATCH v5 07/22] liveupdate: luo_ioctl: add user interface
From: Mike Rapoport @ 2025-11-14 12:57 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251107210526.257742-8-pasha.tatashin@soleen.com>

On Fri, Nov 07, 2025 at 04:03:05PM -0500, Pasha Tatashin wrote:
> Introduce the user-space interface for the Live Update Orchestrator
> via ioctl commands, enabling external control over the live update
> process and management of preserved resources.
> 
> The idea is that there is going to be a single userspace agent driving
> the live update, therefore, only a single process can ever hold this
> device opened at a time.
> 
> The following ioctl commands are introduced:
> 
> LIVEUPDATE_IOCTL_CREATE_SESSION
> Provides a way for userspace to create a named session for grouping file
> descriptors that need to be preserved. It returns a new file descriptor
> representing the session.
> 
> LIVEUPDATE_IOCTL_RETRIEVE_SESSION
> Allows the userspace agent in the new kernel to reclaim a preserved
> session by its name, receiving a new file descriptor to manage the
> restored resources.
> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
>  include/uapi/linux/liveupdate.h  |  64 ++++++++++++
>  kernel/liveupdate/luo_internal.h |  21 ++++
>  kernel/liveupdate/luo_ioctl.c    | 173 +++++++++++++++++++++++++++++++
>  3 files changed, 258 insertions(+)

...
  
> +static int luo_ioctl_create_session(struct luo_ucmd *ucmd)
> +{
> +	struct liveupdate_ioctl_create_session *argp = ucmd->cmd;
> +	struct file *file;
> +	int ret;
> +
> +	argp->fd = get_unused_fd_flags(O_CLOEXEC);
> +	if (argp->fd < 0)
> +		return argp->fd;
> +
> +	ret = luo_session_create(argp->name, &file);
> +	if (ret)

		put_unused_fd(argp->fd) ?

> +		return ret;
> +
> +	ret = luo_ucmd_respond(ucmd, sizeof(*argp));
> +	if (ret) {
> +		fput(file);
> +		put_unused_fd(argp->fd);
> +		return ret;
> +	}

I think that using gotos for error handling is more appropriate here.

> +
> +	fd_install(argp->fd, file);
> +
> +	return 0;
> +}
> +
> +static int luo_ioctl_retrieve_session(struct luo_ucmd *ucmd)
> +{
> +	struct liveupdate_ioctl_retrieve_session *argp = ucmd->cmd;
> +	struct file *file;
> +	int ret;
> +
> +	argp->fd = get_unused_fd_flags(O_CLOEXEC);
> +	if (argp->fd < 0)
> +		return argp->fd;
> +
> +	ret = luo_session_retrieve(argp->name, &file);
> +	if (ret < 0) {
> +		put_unused_fd(argp->fd);
> +
> +		return ret;
> +	}
> +
> +	ret = luo_ucmd_respond(ucmd, sizeof(*argp));
> +	if (ret) {
> +		fput(file);
> +		put_unused_fd(argp->fd);
> +		return ret;
> +	}

and here.

> +
> +	fd_install(argp->fd, file);
> +
> +	return 0;
> +}
> +
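
The goto-based unwinding suggested for the two handlers above can be
sketched in userspace as follows. This is an illustration under stated
assumptions, not the kernel code: reserve_fd(), create_session(),
respond(), install_fd() and the counters are hypothetical stand-ins for
get_unused_fd_flags(), luo_session_create(), luo_ucmd_respond() and
fd_install(). It also releases the reserved fd when session creation
fails, which is the leak pointed out in luo_ioctl_create_session().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static int fds_reserved;	/* reserved but not yet installed fds */
static int files_open;		/* session files currently alive */
static bool fail_create;	/* test knobs, not part of the patch */
static bool fail_respond;

static int reserve_fd(void)
{
	fds_reserved++;
	return 3;
}

static void release_fd(int fd)
{
	(void)fd;
	fds_reserved--;
}

static int create_session(void)
{
	if (fail_create)
		return -ENOMEM;
	files_open++;
	return 0;
}

static void drop_file(void)
{
	files_open--;
}

static int respond(void)
{
	return fail_respond ? -EFAULT : 0;
}

static void install_fd(int fd)
{
	(void)fd;
	fds_reserved--;		/* the reservation is consumed */
}

int ioctl_create_session_sketch(void)
{
	int fd, ret;

	fd = reserve_fd();
	if (fd < 0)
		return fd;

	ret = create_session();
	if (ret)
		goto err_put_fd;

	ret = respond();
	if (ret)
		goto err_put_file;

	install_fd(fd);

	return 0;

err_put_file:
	drop_file();
err_put_fd:
	release_fd(fd);
	return ret;
}
```

Each label undoes exactly one earlier step and falls through to the next,
so adding a step later means adding one label rather than editing every
error branch.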

-- 
Sincerely yours,
Mike.

* Re: [PATCH v5 06/22] liveupdate: luo_session: add sessions support
From: Mike Rapoport @ 2025-11-14 12:49 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251107210526.257742-7-pasha.tatashin@soleen.com>

On Fri, Nov 07, 2025 at 04:03:04PM -0500, Pasha Tatashin wrote:
> Introduce concept of "Live Update Sessions" within the LUO framework.
> LUO sessions provide a mechanism to group and manage `struct file *`
> instances (representing file descriptors) that need to be preserved
> across a kexec-based live update.
> 
> Each session is identified by a unique name and acts as a container
> for file objects whose state is critical to a userspace workload, such
> as a virtual machine or a high-performance database, aiming to maintain
> their functionality across a kernel transition.
> 
> This groundwork establishes the framework for preserving file-backed
> state across kernel updates, with the actual file data preservation
> mechanisms to be implemented in subsequent patches.
> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
>  include/linux/liveupdate/abi/luo.h |  81 ++++++
>  include/uapi/linux/liveupdate.h    |   3 +
>  kernel/liveupdate/Makefile         |   3 +-
>  kernel/liveupdate/luo_core.c       |   9 +
>  kernel/liveupdate/luo_internal.h   |  39 +++
>  kernel/liveupdate/luo_session.c    | 405 +++++++++++++++++++++++++++++
>  6 files changed, 539 insertions(+), 1 deletion(-)
>  create mode 100644 kernel/liveupdate/luo_session.c
> 
> diff --git a/include/linux/liveupdate/abi/luo.h b/include/linux/liveupdate/abi/luo.h
> index 9483a294287f..37b9fecef3f7 100644
> --- a/include/linux/liveupdate/abi/luo.h
> +++ b/include/linux/liveupdate/abi/luo.h
> @@ -28,6 +28,11 @@
>   *     / {
>   *         compatible = "luo-v1";
>   *         liveupdate-number = <...>;
> + *
> + *         luo-session {
> + *             compatible = "luo-session-v1";
> + *             luo-session-head = <phys_addr_of_session_head_ser>;
> + *         };
>   *     };
>   *
>   * Main LUO Node (/):
> @@ -36,11 +41,37 @@
>   *     Identifies the overall LUO ABI version.
>   *   - liveupdate-number: u64
>   *     A counter tracking the number of successful live updates performed.
> + *
> + * Session Node (luo-session):
> + *   This node describes all preserved user-space sessions.
> + *
> + *   - compatible: "luo-session-v1"
> + *     Identifies the session ABI version.
> + *   - luo-session-head: u64
> + *     The physical address of a `struct luo_session_head_ser`. This structure is
> + *     the header for a contiguous block of memory containing an array of
> + *     `struct luo_session_ser`, one for each preserved session.
> + *
> + * Serialization Structures:
> + *   The FDT properties point to memory regions containing arrays of simple,
> + *   `__packed` structures. These structures contain the actual preserved state.
> + *
> + *   - struct luo_session_head_ser:
> + *     Header for the session array. Contains the total page count of the
> + *     preserved memory block and the number of `struct luo_session_ser`
> + *     entries that follow.
> + *
> + *   - struct luo_session_ser:
> + *     Metadata for a single session, including its name and a physical pointer
> + *     to another preserved memory block containing an array of
> + *     `struct luo_file_ser` for all files in that session.
>   */
>  
>  #ifndef _LINUX_LIVEUPDATE_ABI_LUO_H
>  #define _LINUX_LIVEUPDATE_ABI_LUO_H
>  
> +#include <uapi/linux/liveupdate.h>
> +
>  /*
>   * The LUO FDT hooks all LUO state for sessions, fds, etc.
>   * In the root it allso carries "liveupdate-number" 64-bit property that
> @@ -51,4 +82,54 @@
>  #define LUO_FDT_COMPATIBLE	"luo-v1"
>  #define LUO_FDT_LIVEUPDATE_NUM	"liveupdate-number"
>  
> +/*
> + * LUO FDT session node
> + * LUO_FDT_SESSION_HEAD:  is a u64 physical address of struct
> + *                        luo_session_head_ser
> + */
> +#define LUO_FDT_SESSION_NODE_NAME	"luo-session"
> +#define LUO_FDT_SESSION_COMPATIBLE	"luo-session-v1"
> +#define LUO_FDT_SESSION_HEAD		"luo-session-head"
> +
> +/**
> + * struct luo_session_head_ser - Header for the serialized session data block.
> + * @pgcnt: The total size, in pages, of the entire preserved memory block
> + *         that this header describes.
> + * @count: The number of 'struct luo_session_ser' entries that immediately
> + *         follow this header in the memory block.
> + *
> + * This structure is located at the beginning of a contiguous block of
> + * physical memory preserved across the kexec. It provides the necessary
> + * metadata to interpret the array of session entries that follow.
> + */
> +struct luo_session_head_ser {
> +	u64 pgcnt;
> +	u64 count;
> +} __packed;
> +
> +/**
> + * struct luo_session_ser - Represents the serialized metadata for a LUO session.
> + * @name:    The unique name of the session, copied from the `luo_session`
> + *           structure.
> + * @files:   The physical address of a contiguous memory block that holds
> + *           the serialized state of files.
> + * @pgcnt:   The number of pages occupied by the `files` memory block.
> + * @count:   The total number of files that were part of this session during
> + *           serialization. Used for iteration and validation during
> + *           restoration.
> + *
> + * This structure is used to package session-specific metadata for transfer
> + * between kernels via Kexec Handover. An array of these structures (one per
> + * session) is created and passed to the new kernel, allowing it to reconstruct
> + * the session context.
> + *
> + * If this structure is modified, LUO_FDT_SESSION_COMPATIBLE must be updated.
> + */
> +struct luo_session_ser {
> +	char name[LIVEUPDATE_SESSION_NAME_LENGTH];
> +	u64 files;
> +	u64 pgcnt;
> +	u64 count;
> +} __packed;
> +
>  #endif /* _LINUX_LIVEUPDATE_ABI_LUO_H */
> diff --git a/include/uapi/linux/liveupdate.h b/include/uapi/linux/liveupdate.h
> index df34c1642c4d..d2ef2f7e0dbd 100644
> --- a/include/uapi/linux/liveupdate.h
> +++ b/include/uapi/linux/liveupdate.h
> @@ -43,4 +43,7 @@
>  /* The ioctl type, documented in ioctl-number.rst */
>  #define LIVEUPDATE_IOCTL_TYPE		0xBA
>  
> +/* The maximum length of session name including null termination */
> +#define LIVEUPDATE_SESSION_NAME_LENGTH 56

Out of curiosity, why 56? :)
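
The question is not answered at this point in the thread, but the
arithmetic is easy to check: a 56-byte name plus three u64 fields makes
struct luo_session_ser 80 bytes, which is consistent with the "819
sessions" comment next to LUO_SESSION_PGCNT in luo_session.c. A userspace
sketch, assuming 4K pages and a GCC/Clang-style packed attribute, with
stand-in type and macro names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SESSION_NAME_LENGTH	56	/* LIVEUPDATE_SESSION_NAME_LENGTH */
#define SESSION_PGCNT		16ul	/* LUO_SESSION_PGCNT */
#define ASSUMED_PAGE_SHIFT	12	/* assumption: 4K pages */

/* Same field layout as struct luo_session_head_ser in the patch. */
struct session_head_ser {
	uint64_t pgcnt;
	uint64_t count;
} __attribute__((packed));

/* Same field layout as struct luo_session_ser in the patch. */
struct session_ser {
	char name[SESSION_NAME_LENGTH];
	uint64_t files;
	uint64_t pgcnt;
	uint64_t count;
} __attribute__((packed));

/* Mirrors the LUO_SESSION_MAX computation. */
size_t session_max(void)
{
	return ((SESSION_PGCNT << ASSUMED_PAGE_SHIFT) -
		sizeof(struct session_head_ser)) /
	       sizeof(struct session_ser);
}
```

Under these assumptions the block holds (65536 - 16) / 80 = 819 entries
with no bytes left over.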

> +
>  #endif /* _UAPI_LIVEUPDATE_H */
> diff --git a/kernel/liveupdate/Makefile b/kernel/liveupdate/Makefile
> index 413722002b7a..83285e7ad726 100644
> --- a/kernel/liveupdate/Makefile
> +++ b/kernel/liveupdate/Makefile
> @@ -2,7 +2,8 @@
>  
>  luo-y :=								\
>  		luo_core.o						\
> -		luo_ioctl.o
> +		luo_ioctl.o						\
> +		luo_session.o
>  
>  obj-$(CONFIG_KEXEC_HANDOVER)		+= kexec_handover.o
>  obj-$(CONFIG_KEXEC_HANDOVER_DEBUG)	+= kexec_handover_debug.o
> diff --git a/kernel/liveupdate/luo_core.c b/kernel/liveupdate/luo_core.c
> index c1bd236bccb0..83257ab93ebb 100644
> --- a/kernel/liveupdate/luo_core.c
> +++ b/kernel/liveupdate/luo_core.c
> @@ -116,6 +116,10 @@ static int __init luo_early_startup(void)
>  	pr_info("Retrieved live update data, liveupdate number: %lld\n",
>  		luo_global.liveupdate_num);
>  
> +	err = luo_session_setup_incoming(luo_global.fdt_in);
> +	if (err)
> +		return err;
> +
>  	return 0;
>  }
>  
> @@ -149,6 +153,7 @@ static int __init luo_fdt_setup(void)
>  	err |= fdt_begin_node(fdt_out, "");
>  	err |= fdt_property_string(fdt_out, "compatible", LUO_FDT_COMPATIBLE);
>  	err |= fdt_property(fdt_out, LUO_FDT_LIVEUPDATE_NUM, &ln, sizeof(ln));
> +	err |= luo_session_setup_outgoing(fdt_out);
>  	err |= fdt_end_node(fdt_out);
>  	err |= fdt_finish(fdt_out);
>  	if (err)
> @@ -202,6 +207,10 @@ int liveupdate_reboot(void)
>  	if (!liveupdate_enabled())
>  		return 0;
>  
> +	err = luo_session_serialize();
> +	if (err)
> +		return err;
> +
>  	err = kho_finalize();
>  	if (err) {
>  		pr_err("kho_finalize failed %d\n", err);
> diff --git a/kernel/liveupdate/luo_internal.h b/kernel/liveupdate/luo_internal.h
> index 29f47a69be0b..b4f2d1443c76 100644
> --- a/kernel/liveupdate/luo_internal.h
> +++ b/kernel/liveupdate/luo_internal.h
> @@ -14,4 +14,43 @@ void *luo_alloc_preserve(size_t size);
>  void luo_free_unpreserve(void *mem, size_t size);
>  void luo_free_restore(void *mem, size_t size);
>  
> +/**
> + * struct luo_session - Represents an active or incoming Live Update session.
> + * @name:       A unique name for this session, used for identification and
> + *              retrieval.
> + * @files_list: An ordered list of files associated with this session, it is
> + *              ordered by preservation time.
> + * @ser:        Pointer to the serialized data for this session.
> + * @count:      A counter tracking the number of files currently stored in the
> + *              @files_xa for this session.

		   ^@files_list

> + * @list:       A list_head member used to link this session into a global list
> + *              of either outgoing (to be preserved) or incoming (restored from
> + *              previous kernel) sessions.
> + * @retrieved:  A boolean flag indicating whether this session has been
> + *              retrieved by a consumer in the new kernel.
> + * @mutex:      Session lock, protects files_list, and count.
> + * @files:      The physically contiguous memory block that holds the serialized
> + *              state of files.
> + * @pgcnt:      The number of pages files occupy.

                                      ^ @files

> + */
> +struct luo_session {
> +	char name[LIVEUPDATE_SESSION_NAME_LENGTH];
> +	struct list_head files_list;
> +	struct luo_session_ser *ser;
> +	long count;
> +	struct list_head list;
> +	bool retrieved;
> +	struct mutex mutex;
> +	struct luo_file_ser *files;
> +	u64 pgcnt;
> +};
> +
> +int luo_session_create(const char *name, struct file **filep);
> +int luo_session_retrieve(const char *name, struct file **filep);
> +int __init luo_session_setup_outgoing(void *fdt);
> +int __init luo_session_setup_incoming(void *fdt);
> +int luo_session_serialize(void);
> +int luo_session_deserialize(void);

The last four deal with all the sessions, maybe use plural in the function
names.

> +bool luo_session_is_deserialized(void);
> +
>  #endif /* _LINUX_LUO_INTERNAL_H */
> diff --git a/kernel/liveupdate/luo_session.c b/kernel/liveupdate/luo_session.c
> new file mode 100644
> index 000000000000..a3513118aa74
> --- /dev/null
> +++ b/kernel/liveupdate/luo_session.c
> @@ -0,0 +1,405 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * Copyright (c) 2025, Google LLC.
> + * Pasha Tatashin <pasha.tatashin@soleen.com>
> + */
> +
> +/**
> + * DOC: LUO Sessions
> + *
> + * LUO Sessions provide the core mechanism for grouping and managing `struct
> + * file *` instances that need to be preserved across a kexec-based live
> + * update. Each session acts as a named container for a set of file objects,
> + * allowing a userspace agent to manage the lifecycle of resources critical to a
> + * workload.
> + *
> + * Core Concepts:
> + *
> + * - Named Containers: Sessions are identified by a unique, user-provided name,
> + *   which is used for both creation in the current kernel and retrieval in the
> + *   next kernel.
> + *
> + * - Userspace Interface: Session management is driven from userspace via
> + *   ioctls on /dev/liveupdate.
> + *
> + * - Serialization: Session metadata is preserved using the KHO framework. When
> + *   a live update is triggered via kexec, an array of `struct luo_session_ser`
> + *   is populated and placed in a preserved memory region. An FDT node is also
> + *   created, containing the count of sessions and the physical address of this
> + *   array.
> + *
> + * Session Lifecycle:
> + *
> + * 1.  Creation: A userspace agent calls `luo_session_create()` to create a
> + *     new, empty session and receives a file descriptor for it.
> + *
> + * 2.  Serialization: When the `reboot(LINUX_REBOOT_CMD_KEXEC)` syscall is
> + *     made, `luo_session_serialize()` is called. It iterates through all
> + *     active sessions and writes their metadata into a memory area preserved
> + *     by KHO.
> + *
> + * 3.  Deserialization (in new kernel): After kexec, `luo_session_deserialize()`
> + *     runs, reading the serialized data and creating a list of `struct
> + *     luo_session` objects representing the preserved sessions.
> + *
> + * 4.  Retrieval: A userspace agent in the new kernel can then call
> + *     `luo_session_retrieve()` with a session name to get a new file
> + *     descriptor and access the preserved state.
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/anon_inodes.h>
> +#include <linux/errno.h>
> +#include <linux/file.h>
> +#include <linux/fs.h>
> +#include <linux/libfdt.h>
> +#include <linux/liveupdate.h>
> +#include <linux/liveupdate/abi/luo.h>
> +#include <uapi/linux/liveupdate.h>
> +#include "luo_internal.h"
> +
> +/* 16 4K pages, give space for 819 sessions */
> +#define LUO_SESSION_PGCNT	16ul
> +#define LUO_SESSION_MAX		(((LUO_SESSION_PGCNT << PAGE_SHIFT) -	\
> +		sizeof(struct luo_session_head_ser)) /			\
> +		sizeof(struct luo_session_ser))
> +
> +/**
> + * struct luo_session_head - Head struct for managing LUO sessions.

Head of what? ;-)
Maybe luo_session_list? Or even luo_sessions?

> + * @count:    The number of sessions currently tracked in the @list.
> + * @list:     The head of the linked list of `struct luo_session` instances.
> + * @rwsem:    A read-write semaphore providing synchronized access to the
> + *            session list and other fields in this structure.
> + * @head_ser: The head data of serialization array.

	            ^ header?

> + * @ser:      The serialized session data (an array of
> + *            `struct luo_session_ser`).
> + * @active:   Set to true when first initialized. If previous kernel did not
> + *            send session data, active stays false for incoming.
> + */
> +struct luo_session_head {
> +	long count;
> +	struct list_head list;
> +	struct rw_semaphore rwsem;
> +	struct luo_session_head_ser *head_ser;
> +	struct luo_session_ser *ser;
> +	bool active;
> +};
> +
> +/**
> + * struct luo_session_global - Global container for managing LUO sessions.
> + * @incoming:     The sessions passed from the previous kernel.
> + * @outgoing:     The sessions that are going to be passed to the next kernel.
> + * @deserialized: The sessions have been deserialized once /dev/liveupdate
> + *                has been opened.
> + */
> +struct luo_session_global {
> +	struct luo_session_head incoming;
> +	struct luo_session_head outgoing;
> +	bool deserialized;
> +} luo_session_global;

Should be static. And frankly, I don't think grouping two global variables
into a struct gains much.

static struct luo_sessions luo_sessions_incoming;
static struct luo_sessions luo_sessions_outgoing;

reads clearer to me.

> +
> +static struct luo_session *luo_session_alloc(const char *name)
> +{
> +	struct luo_session *session = kzalloc(sizeof(*session), GFP_KERNEL);
> +
> +	if (!session)
> +		return NULL;
> +
> +	strscpy(session->name, name, sizeof(session->name));
> +	INIT_LIST_HEAD(&session->files_list);
> +	session->count = 0;

I'd move this after mutex_init(), a bit more readable IMHO.

> +	INIT_LIST_HEAD(&session->list);
> +	mutex_init(&session->mutex);
> +
> +	return session;
> +}
> +
> +static void luo_session_free(struct luo_session *session)
> +{
> +	WARN_ON(session->count);
> +	WARN_ON(!list_empty(&session->files_list));
> +	mutex_destroy(&session->mutex);
> +	kfree(session);
> +}
> +
> +static int luo_session_insert(struct luo_session_head *sh,
> +			      struct luo_session *session)
> +{
> +	struct luo_session *it;
> +
> +	guard(rwsem_write)(&sh->rwsem);
> +
> +	/*
> +	 * For outgoing we should make sure there is room in serialization array
> +	 * for new session.
> +	 */
> +	if (sh == &luo_session_global.outgoing) {
> +		if (sh->count == LUO_SESSION_MAX)
> +			return -ENOMEM;
> +	}

Not a big deal, but this could be outside the guard().

> +
> +	/*
> +	 * For small number of sessions this loop won't hurt performance
> +	 * but if we ever start using a lot of sessions, this might
> +	 * become a bottle neck during deserialization time, as it would
> +	 * cause O(n*n) complexity.
> +	 */

The loop is always O(n*n) in the worst case, no matter how many sessions
there are ;-)

> +	list_for_each_entry(it, &sh->list, list) {
> +		if (!strncmp(it->name, session->name, sizeof(it->name)))
> +			return -EEXIST;
> +	}
> +	list_add_tail(&session->list, &sh->list);
> +	sh->count++;
> +
> +	return 0;
> +}
> +
> +static void luo_session_remove(struct luo_session_head *sh,
> +			       struct luo_session *session)
> +{
> +	guard(rwsem_write)(&sh->rwsem);
> +	list_del(&session->list);
> +	sh->count--;
> +}
> +
> +static int luo_session_release(struct inode *inodep, struct file *filep)
> +{
> +	struct luo_session *session = filep->private_data;
> +	struct luo_session_head *sh;
> +
> +	/* If retrieved is set, it means this session is from incoming list */
> +	if (session->retrieved)
> +		sh = &luo_session_global.incoming;
> +	else
> +		sh = &luo_session_global.outgoing;

Maybe just add a backpointer to the list to struct luo_session?
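
A rough userspace sketch of the idea, assuming a new `owner` field (a
hypothetical name, not in the patch) that is set on insert, so release does
not have to infer the list from `retrieved`. The struct and function names
are stand-ins for illustration only:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; names are illustrative only. */
struct session_head {
	long count;
};

struct session {
	struct session_head *owner;	/* hypothetical back-pointer, NULL when unlinked */
	int retrieved;
};

/* Record the owning head when the session is linked in. */
static void session_insert(struct session_head *sh, struct session *s)
{
	s->owner = sh;
	sh->count++;
}

/* Release no longer needs to guess incoming vs. outgoing. */
static void session_release(struct session *s)
{
	assert(s->owner != NULL);
	s->owner->count--;
	s->owner = NULL;
}
```

The real code would still take the head's rwsem around the list update; the
sketch leaves locking out.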

> +
> +	luo_session_remove(sh, session);
> +	luo_session_free(session);
> +
> +	return 0;
> +}
> +
> +static const struct file_operations luo_session_fops = {
> +	.owner = THIS_MODULE,
> +	.release = luo_session_release,
> +};
> +
> +/* Create a "struct file" for session */
> +static int luo_session_getfile(struct luo_session *session, struct file **filep)
> +{
> +	char name_buf[128];
> +	struct file *file;
> +
> +	guard(mutex)(&session->mutex);
> +	snprintf(name_buf, sizeof(name_buf), "[luo_session] %s", session->name);
> +	file = anon_inode_getfile(name_buf, &luo_session_fops, session, O_RDWR);
> +	if (IS_ERR(file))
> +		return PTR_ERR(file);
> +
> +	*filep = file;
> +
> +	return 0;
> +}
> +
> +int luo_session_create(const char *name, struct file **filep)
> +{
> +	struct luo_session *session;
> +	int err;
> +
> +	session = luo_session_alloc(name);
> +	if (!session)
> +		return -ENOMEM;
> +
> +	err = luo_session_insert(&luo_session_global.outgoing, session);
> +	if (err) {
> +		luo_session_free(session);
> +		return err;

Please goto err_free

> +	}
> +
> +	err = luo_session_getfile(session, filep);
> +	if (err) {
> +		luo_session_remove(&luo_session_global.outgoing, session);
> +		luo_session_free(session);

and goto err_remove
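
For illustration, the unwinding could be shaped roughly like this. It is a
userspace sketch: the `stub_*` helpers and the `fail_*` switches are made up
to stand in for luo_session_alloc()/insert()/getfile() and friends, only the
label layout matters:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the luo_session_* helpers. */
static int fail_insert, fail_getfile;		/* knobs to force a failure */
static int live_sessions, listed_sessions;	/* bookkeeping for the sketch */

static void *stub_alloc(void)
{
	void *s = malloc(1);

	if (s)
		live_sessions++;
	return s;
}

static void stub_free(void *s)
{
	free(s);
	live_sessions--;
}

static int stub_insert(void *s)
{
	(void)s;
	if (fail_insert)
		return -EEXIST;
	listed_sessions++;
	return 0;
}

static void stub_remove(void *s)
{
	(void)s;
	assert(listed_sessions > 0);
	listed_sessions--;
}

static int stub_getfile(void *s)
{
	(void)s;
	return fail_getfile ? -ENFILE : 0;
}

static int session_create(void)
{
	void *session;
	int err;

	session = stub_alloc();
	if (!session)
		return -ENOMEM;

	err = stub_insert(session);
	if (err)
		goto err_free;

	err = stub_getfile(session);
	if (err)
		goto err_remove;

	/* On success the session stays alive, owned by its file. */
	return 0;

err_remove:
	stub_remove(session);
err_free:
	stub_free(session);
	return err;
}
```

Each label undoes exactly one earlier step, in reverse order, so adding a
step later means adding one label rather than touching every error branch.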

> +	}
> +
> +	return err;
> +}
> +
> +int luo_session_retrieve(const char *name, struct file **filep)
> +{
> +	struct luo_session_head *sh = &luo_session_global.incoming;
> +	struct luo_session *session = NULL;
> +	struct luo_session *it;
> +	int err;
> +
> +	scoped_guard(rwsem_read, &sh->rwsem) {
> +		list_for_each_entry(it, &sh->list, list) {
> +			if (!strncmp(it->name, name, sizeof(it->name))) {
> +				session = it;
> +				break;
> +			}
> +		}
> +	}
> +
> +	if (!session)
> +		return -ENOENT;
> +
> +	scoped_guard(mutex, &session->mutex) {
> +		if (session->retrieved)
> +			return -EINVAL;
> +	}
> +
> +	err = luo_session_getfile(session, filep);
> +	if (!err) {
> +		scoped_guard(mutex, &session->mutex)
> +			session->retrieved = true;
> +	}
> +
> +	return err;
> +}
> +
> +int __init luo_session_setup_outgoing(void *fdt_out)
> +{
> +	struct luo_session_head_ser *head_ser;
> +	u64 head_ser_pa;
> +	int err;
> +
> +	head_ser = luo_alloc_preserve(LUO_SESSION_PGCNT << PAGE_SHIFT);
> +	if (IS_ERR(head_ser))
> +		return PTR_ERR(head_ser);
> +	head_ser_pa = __pa(head_ser);

virt_to_phys please

> +
> +	err = fdt_begin_node(fdt_out, LUO_FDT_SESSION_NODE_NAME);
> +	err |= fdt_property_string(fdt_out, "compatible",
> +				   LUO_FDT_SESSION_COMPATIBLE);
> +	err |= fdt_property(fdt_out, LUO_FDT_SESSION_HEAD, &head_ser_pa,
> +			    sizeof(head_ser_pa));
> +	err |= fdt_end_node(fdt_out);
> +
> +	if (err)
> +		goto err_unpreserve;
> +
> +	head_ser->pgcnt = LUO_SESSION_PGCNT;
> +	INIT_LIST_HEAD(&luo_session_global.outgoing.list);
> +	init_rwsem(&luo_session_global.outgoing.rwsem);
> +	luo_session_global.outgoing.head_ser = head_ser;
> +	luo_session_global.outgoing.ser = (void *)(head_ser + 1);
> +	luo_session_global.outgoing.active = true;
> +
> +	return 0;
> +
> +err_unpreserve:
> +	luo_free_unpreserve(head_ser, LUO_SESSION_PGCNT << PAGE_SHIFT);
> +	return err;
> +}
> +
> +int __init luo_session_setup_incoming(void *fdt_in)
> +{
> +	struct luo_session_head_ser *head_ser;
> +	int err, head_size, offset;
> +	const void *ptr;
> +	u64 head_ser_pa;
> +
> +	offset = fdt_subnode_offset(fdt_in, 0, LUO_FDT_SESSION_NODE_NAME);
> +	if (offset < 0) {
> +		pr_err("Unable to get session node: [%s]\n",
> +		       LUO_FDT_SESSION_NODE_NAME);
> +		return -EINVAL;
> +	}
> +
> +	err = fdt_node_check_compatible(fdt_in, offset,
> +					LUO_FDT_SESSION_COMPATIBLE);
> +	if (err) {
> +		pr_err("Session node incompatible [%s]\n",
> +		       LUO_FDT_SESSION_COMPATIBLE);
> +		return -EINVAL;
> +	}
> +
> +	head_size = 0;
> +	ptr = fdt_getprop(fdt_in, offset, LUO_FDT_SESSION_HEAD, &head_size);
> +	if (!ptr || head_size != sizeof(u64)) {
> +		pr_err("Unable to get session head '%s' [%d]\n",
> +		       LUO_FDT_SESSION_HEAD, head_size);
> +		return -EINVAL;
> +	}
> +
> +	memcpy(&head_ser_pa, ptr, sizeof(u64));
> +	head_ser = __va(head_ser_pa);
> +
> +	luo_session_global.incoming.head_ser = head_ser;
> +	luo_session_global.incoming.ser = (void *)(head_ser + 1);
> +	INIT_LIST_HEAD(&luo_session_global.incoming.list);
> +	init_rwsem(&luo_session_global.incoming.rwsem);
> +	luo_session_global.incoming.active = true;
> +
> +	return 0;
> +}
> +
> +bool luo_session_is_deserialized(void)
> +{
> +	return luo_session_global.deserialized;
> +}
> +
> +int luo_session_deserialize(void)
> +{
> +	struct luo_session_head *sh = &luo_session_global.incoming;
> +
> +	if (luo_session_is_deserialized())
> +		return 0;
> +
> +	luo_session_global.deserialized = true;

Shouldn't this be set after deserialization succeeded?

> +	if (!sh->active) {
> +		INIT_LIST_HEAD(&sh->list);
> +		init_rwsem(&sh->rwsem);
> +		return 0;
> +	}
> +
> +	for (int i = 0; i < sh->head_ser->count; i++) {
> +		struct luo_session *session;
> +
> +		session = luo_session_alloc(sh->ser[i].name);
> +		if (!session) {
> +			pr_warn("Failed to allocate session [%s] during deserialization\n",
> +				sh->ser[i].name);
> +			return -ENOMEM;
> +		}
> +
> +		if (luo_session_insert(sh, session)) {
> +			pr_warn("Failed to insert session due to name conflict [%s]\n",
> +				session->name);
> +			return -EEXIST;

Need to free allocated sessions if an insert fails.
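
Something like the following, as a userspace sketch under stated
assumptions: a fixed-size table replaces the kernel list, and `table_insert`,
`deserialize` and `MAX_SESSIONS` are invented names. It also sets the
"deserialized" flag only after the whole loop succeeded. Whether a failed
attempt should be retried at all is a separate question that the sketch does
not try to answer:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SESSIONS 8

/* Userspace stand-ins; none of these names come from the patch. */
struct session {
	char name[16];
};

static struct session *table[MAX_SESSIONS];
static int table_count;
static bool deserialized;

static int table_insert(struct session *s)
{
	for (int i = 0; i < table_count; i++)
		if (!strcmp(table[i]->name, s->name))
			return -EEXIST;
	assert(table_count < MAX_SESSIONS);
	table[table_count++] = s;
	return 0;
}

static int deserialize(const char **names, int n)
{
	int err = 0;

	if (deserialized)
		return 0;

	for (int i = 0; i < n; i++) {
		struct session *s = calloc(1, sizeof(*s));

		if (!s) {
			err = -ENOMEM;
			goto err_undo;
		}
		snprintf(s->name, sizeof(s->name), "%s", names[i]);

		err = table_insert(s);
		if (err) {
			free(s);	/* never made it into the table */
			goto err_undo;
		}
	}

	deserialized = true;	/* only once everything succeeded */
	return 0;

err_undo:
	/* Drop whatever was inserted before the failure. */
	while (table_count)
		free(table[--table_count]);
	return err;
}
```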

> +		}
> +
> +		session->count = sh->ser[i].count;
> +		session->files = __va(sh->ser[i].files);
> +		session->pgcnt = sh->ser[i].pgcnt;
> +	}
> +
> +	luo_free_restore(sh->head_ser, sh->head_ser->pgcnt << PAGE_SHIFT);
> +	sh->head_ser = NULL;
> +	sh->ser = NULL;
> +
> +	return 0;
> +}
> +
> +int luo_session_serialize(void)
> +{
> +	struct luo_session_head *sh = &luo_session_global.outgoing;
> +	struct luo_session *session;
> +	int i = 0;
> +
> +	guard(rwsem_write)(&sh->rwsem);
> +	list_for_each_entry(session, &sh->list, list) {
> +		strscpy(sh->ser[i].name, session->name,
> +			sizeof(sh->ser[i].name));
> +		sh->ser[i].count = session->count;
> +		sh->ser[i].files = __pa(session->files);
> +		sh->ser[i].pgcnt = session->pgcnt;
> +		i++;
> +	}
> +	sh->head_ser->count = sh->count;
> +
> +	return 0;
> +}
> -- 
> 2.51.2.1041.gc1ab5b90ca-goog
> 
> 

-- 
Sincerely yours,
Mike.


* Re: [PATCH v5 03/22] reboot: call liveupdate_reboot() before kexec
From: Mike Rapoport @ 2025-11-14 11:30 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251107210526.257742-4-pasha.tatashin@soleen.com>

On Fri, Nov 07, 2025 at 04:03:01PM -0500, Pasha Tatashin wrote:
> Modify the reboot() syscall handler in kernel/reboot.c to call
> liveupdate_reboot() when processing the LINUX_REBOOT_CMD_KEXEC
> command.
> 
> This ensures that the Live Update Orchestrator is notified just
> before the kernel executes the kexec jump. The liveupdate_reboot()
> function triggers the final freeze event, allowing participating
> FDs to perform last-minute checks or state saving within the blackout
> window.
> 
> The call is placed immediately before kernel_kexec() to ensure LUO
> finalization happens at the latest possible moment before the kernel
> transition.
> 
> If liveupdate_reboot() returns an error (indicating a failure during
> LUO finalization), the kexec operation is aborted to prevent proceeding
> with an inconsistent state.
> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
>  kernel/reboot.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/reboot.c b/kernel/reboot.c
> index ec087827c85c..bdeb04a773db 100644
> --- a/kernel/reboot.c
> +++ b/kernel/reboot.c
> @@ -13,6 +13,7 @@
>  #include <linux/kexec.h>
>  #include <linux/kmod.h>
>  #include <linux/kmsg_dump.h>
> +#include <linux/liveupdate.h>
>  #include <linux/reboot.h>
>  #include <linux/suspend.h>
>  #include <linux/syscalls.h>
> @@ -797,6 +798,9 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
>  
>  #ifdef CONFIG_KEXEC_CORE
>  	case LINUX_REBOOT_CMD_KEXEC:
> +		ret = liveupdate_reboot();
> +		if (ret)
> +			break;

As we discussed elsewhere, let's move the call to liveupdate_reboot() to
kernel_kexec().

>  		ret = kernel_kexec();
>  		break;
>  #endif
> -- 
> 2.51.2.1041.gc1ab5b90ca-goog
> 

-- 
Sincerely yours,
Mike.


* Re: [PATCH v5 02/22] liveupdate: luo_core: integrate with KHO
From: Mike Rapoport @ 2025-11-14 11:29 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251107210526.257742-3-pasha.tatashin@soleen.com>

On Fri, Nov 07, 2025 at 04:03:00PM -0500, Pasha Tatashin wrote:
> Integrate the LUO with the KHO framework to enable passing LUO state
> across a kexec reboot.
> 
> When LUO is transitioned to a "prepared" state, it tells KHO to
> finalize, so all memory segments that were added to KHO preservation
> list are getting preserved. After "Prepared" state no new segments
> can be preserved. If LUO is canceled, it also tells KHO to cancel the
> serialization, and therefore, later LUO can go back into the prepared
> state.
> 
> This patch introduces the following changes:
> - During the KHO finalization phase allocate FDT blob.
> - Populate this FDT with a LUO compatibility string ("luo-v1").
> 
> LUO now depends on `CONFIG_KEXEC_HANDOVER`. The core state transition
> logic (`luo_do_*_calls`) remains unimplemented in this patch.
> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
>  include/linux/liveupdate.h         |   6 +
>  include/linux/liveupdate/abi/luo.h |  54 +++++++
>  kernel/liveupdate/luo_core.c       | 243 ++++++++++++++++++++++++++++-
>  kernel/liveupdate/luo_internal.h   |  17 ++
>  mm/mm_init.c                       |   4 +
>  5 files changed, 323 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/liveupdate/abi/luo.h
>  create mode 100644 kernel/liveupdate/luo_internal.h
> 
> diff --git a/include/linux/liveupdate.h b/include/linux/liveupdate.h
> index 730b76625fec..0be8804fc42a 100644
> --- a/include/linux/liveupdate.h
> +++ b/include/linux/liveupdate.h
> @@ -13,6 +13,8 @@
>  
>  #ifdef CONFIG_LIVEUPDATE
>  
> +void __init liveupdate_init(void);
> +
>  /* Return true if live update orchestrator is enabled */
>  bool liveupdate_enabled(void);
>  
> @@ -21,6 +23,10 @@ int liveupdate_reboot(void);
>  
>  #else /* CONFIG_LIVEUPDATE */
>  
> +static inline void liveupdate_init(void)
> +{
> +}

The common practice is to place the braces on the same line as the function
declaration.

...

> +static int __init luo_early_startup(void)
> +{
> +	phys_addr_t fdt_phys;
> +	int err, ln_size;
> +	const void *ptr;
> +
> +	if (!kho_is_enabled()) {
> +		if (liveupdate_enabled())
> +			pr_warn("Disabling liveupdate because KHO is disabled\n");
> +		luo_global.enabled = false;
> +		return 0;
> +	}
> +
> +	/* Retrieve LUO subtree, and verify its format. */
> +	err = kho_retrieve_subtree(LUO_FDT_KHO_ENTRY_NAME, &fdt_phys);
> +	if (err) {
> +		if (err != -ENOENT) {
> +			pr_err("failed to retrieve FDT '%s' from KHO: %pe\n",
> +			       LUO_FDT_KHO_ENTRY_NAME, ERR_PTR(err));
> +			return err;
> +		}
> +
> +		return 0;
> +	}
> +
> +	luo_global.fdt_in = __va(fdt_phys);

phys_to_virt is clearer, isn't it?

> +	err = fdt_node_check_compatible(luo_global.fdt_in, 0,
> +					LUO_FDT_COMPATIBLE);

...

> +void __init liveupdate_init(void)
> +{
> +	int err;
> +
> +	err = luo_early_startup();
> +	if (err) {
> +		pr_err("The incoming tree failed to initialize properly [%pe], disabling live update\n",
> +		       ERR_PTR(err));
> +		luo_global.enabled = false;
> +	}
> +}
> +
> +/* Called during boot to create LUO fdt tree */

			 ^ create outgoing

> +static int __init luo_late_startup(void)
> +{
> +	int err;
> +
> +	if (!liveupdate_enabled())
> +		return 0;
> +
> +	err = luo_fdt_setup();
> +	if (err)
> +		luo_global.enabled = false;
> +
> +	return err;
> +}
> +late_initcall(luo_late_startup);

It would be nice to have a comment explaining why late_initcall() is fine
and why there's no need to initialize the outgoing fdt earlier.

> +/**
> + * luo_alloc_preserve - Allocate, zero, and preserve memory.

I think this and the "free" counterparts would be useful for any KHO users,
even those that don't need LUO.

> + * @size: The number of bytes to allocate.
> + *
> + * Allocates a physically contiguous block of zeroed pages that is large
> + * enough to hold @size bytes. The allocated memory is then registered with
> + * KHO for preservation across a kexec.
> + *
> + * Note: The actual allocated size will be rounded up to the nearest
> + * power-of-two page boundary.
> + *
> + * @return A virtual pointer to the allocated and preserved memory on success,
> + * or an ERR_PTR() encoded error on failure.
> + */
> +void *luo_alloc_preserve(size_t size)
> +{
> +	struct folio *folio;
> +	int order, ret;
> +
> +	if (!size)
> +		return ERR_PTR(-EINVAL);
> +
> +	order = get_order(size);
> +	if (order > MAX_PAGE_ORDER)
> +		return ERR_PTR(-E2BIG);

High order allocations would likely fail or at least cause a heavy reclaim.
For now it seems that we won't be needing really large contiguous chunks so
maybe limiting this to PAGE_ALLOC_COSTLY_ORDER?

Later if we'd need higher order allocations we can try to allocate with
__GFP_NORETRY or __GFP_RETRY_MAYFAIL with a fallback to vmalloc.
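
To make the effect of such a cap concrete, here is a small userspace sketch
of the size check. It assumes 4K pages and uses 3 for the costly order (the
mainline value of PAGE_ALLOC_COSTLY_ORDER); the `sk_*` names are invented for
the example and `sk_get_order()` only mimics what get_order() computes:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SK_PAGE_SHIFT	12	/* assumes 4K pages */
#define SK_COSTLY_ORDER	3	/* PAGE_ALLOC_COSTLY_ORDER in mainline */

/* Smallest order such that (1 << order) pages cover @size bytes. */
static int sk_get_order(size_t size)
{
	size_t pages;
	int order = 0;

	assert(size > 0);
	pages = (size + ((size_t)1 << SK_PAGE_SHIFT) - 1) >> SK_PAGE_SHIFT;
	while (((size_t)1 << order) < pages)
		order++;
	return order;
}

/* Returns the order to allocate, or a negative errno. */
static int sk_check_size(size_t size)
{
	int order;

	if (!size)
		return -EINVAL;

	order = sk_get_order(size);
	if (order > SK_COSTLY_ORDER)
		return -E2BIG;
	return order;
}
```

Under these assumptions the cap is 8 pages (32 KiB); a 16-page (64 KiB)
request is order 4 and would be rejected.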

> +
> +	folio = folio_alloc(GFP_KERNEL | __GFP_ZERO, order);
> +	if (!folio)
> +		return ERR_PTR(-ENOMEM);
> +
> +	ret = kho_preserve_folio(folio);
> +	if (ret) {
> +		folio_put(folio);
> +		return ERR_PTR(ret);
> +	}
> +
> +	return folio_address(folio);
> +}
> +

-- 
Sincerely yours,
Mike.


* RE: RFC: Serial port DTR/RTS - O_<something>
From: Maarten Brock @ 2025-11-14 10:26 UTC (permalink / raw)
  To: H. Peter Anvin, Greg KH
  Cc: Theodore Ts'o, linux-serial@vger.kernel.org,
	linux-api@vger.kernel.org, LKML
In-Reply-To: <14b1bc5c-83ac-431f-a53b-14872024b969@zytor.com>

> > A new open flag "O_DO_NOT_TOUCH_ANYTHING" might be
> > the simplest?
> >
> 
> Okay, so I'm going to toss out a couple suggestions for naming:
> 
> 	O_(PRE|FOR|N|NO)?(INIT|CONFIG|START)(DEV|HW|IO)?
> 	O_(NO?RESET|PREPARE)(DEV|HW|IO)?
> 	O_NO?TOUCH
> 	O_NYET ("not yet")
> 
> I think my personal preference at the moment is either O_NYET or O_PRECONFIG;
> although it is perhaps a bit more "use case centric" than "what
> actual effect it has" I think it might be clearer.  A -DEV, -HW or -IO suffix
> would seem to needlessly preclude it being used for future similar use cases
> for files that are not device nodes.
> 
> O_NYET ("not yet") is kind of attractive because it has some geekish smirk
> value, doesn't have "obvious enough" meaning that if you don't know what it
> does you'll guess rather than looking it up, but once you know you are not
> going to forget it!  There is even precedent: USB 2 already has the NYET
> packet type meaning just "not yet".  The more I'm thinking about it the more
> I am starting to like it...

Personally, I don't much like O_NYET, as it seems to say that the device
should not be opened.

> Many of the other combinations have the problem of seeming to do the opposite
> of what the user wants in some use cases; it seems rather odd to open a device
> node that you are intending to configure with "O_NOCONFIG".

Don't like this one either.

> On the other
> hand, "O_CONFIG" might be a valid indication of the intent (like O_RDONLY or
> O_RDWR are indicator of intent), but also has the implication that it *will*
> cause the device to configure itself.  It also would seem to imply that the
> resulting file descriptor can *only* be used for that purpose.

I do like the O_CONFIG or O_FORCONFIG names.
I also like O_PREINIT or O_PRESTART.

Kind Regards,
Maarten



* Re: RFC: Serial port DTR/RTS - O_<something>
From: H. Peter Anvin @ 2025-11-13 22:24 UTC (permalink / raw)
  To: Greg KH
  Cc: Theodore Ts'o, Maarten Brock, linux-serial@vger.kernel.org,
	linux-api@vger.kernel.org, LKML
In-Reply-To: <2025111227-equipment-magnetism-1443@gregkh>

On 2025-11-12 11:39, Greg KH wrote:
>>
>> 1. Opening a device for configuration as opposed to data streaming; in the tty case that doesn't just improve the DTR# and RTS# issue but allows setserial, configuring line disciplines and so on.
>>
>> As I have said, this is application-specific intent, which is why I strongly believe that it needs to be part of the open system call. I furthermore believe that it would have use cases beyond ttys and serial ports, which is why I'm proposing a new open flag as opposed to a sysfs attribute, which actually was my initial approach (yes, I have already prototyped some of this, and as referenced before there is an existing patchset that was never merged.)
> 
> I think this is going to be the most difficult.  I don't remember why I
> rejected the old submission, but maybe it would have modified the
> existing behaviour?  A new open flag "O_DO_NOT_TOUCH_ANYTHING" might be
> the simplest?
> 

Okay, so I'm going to toss out a couple suggestions for naming:

	O_(PRE|FOR|N|NO)?(INIT|CONFIG|START)(DEV|HW|IO)?
	O_(NO?RESET|PREPARE)(DEV|HW|IO)?
	O_NO?TOUCH
	O_NYET ("not yet")
	
I think my personal preference at the moment is either O_NYET or O_PRECONFIG;
although it is perhaps a bit more "use case centric" than "what
actual effect it has" I think it might be clearer.  A -DEV, -HW or -IO suffix
would seem to needlessly preclude it being used for future similar use cases
for files that are not device nodes.

O_NYET ("not yet") is kind of attractive because it has some geekish smirk
value, doesn't have "obvious enough" meaning that if you don't know what it
does you'll guess rather than looking it up, but once you know you are not
going to forget it!  There is even precedent: USB 2 already has the NYET
packet type meaning just "not yet".  The more I'm thinking about it the more
I am starting to like it...

Many of the other combinations have the problem of seeming to do the opposite
of what the user wants in some use cases; it seems rather odd to open a device
node that you are intending to configure with "O_NOCONFIG".  On the other
hand, "O_CONFIG" might be a valid indication of the intent (like O_RDONLY or
O_RDWR are indicator of intent), but also has the implication that it *will*
cause the device to configure itself.  It also would seem to imply that the
resulting file descriptor can *only* be used for that purpose.

	-hpa



* Re: [PATCH v5 02/22] liveupdate: luo_core: integrate with KHO
From: Pasha Tatashin @ 2025-11-13 18:38 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRYH_Ugp1IiUQdlM@kernel.org>

On Thu, Nov 13, 2025 at 11:32 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Wed, Nov 12, 2025 at 09:58:27AM -0500, Pasha Tatashin wrote:
> > On Wed, Nov 12, 2025 at 8:25 AM Mike Rapoport <rppt@kernel.org> wrote:
> > >
> > > Hi Pasha,
> > >
> > > On Tue, Nov 11, 2025 at 03:57:39PM -0500, Pasha Tatashin wrote:
> > > > Hi Mike,
> > > >
> > > > Thank you for the review, my comments below:
> > > >
> > > > > > This is why this call is placed first in reboot(), before any
> > > > > > irreversible reboot notifiers or shutdown callbacks are performed. If
> > > > > > an allocation problem occurs in KHO, the error is simply reported back
> > > > > > to userspace, and the live update is safely aborted.
> > >
> > > The call to liveupdate_reboot() is just before kernel_kexec(). Why don't we
> > > move it there?
> >
> > Yes, I can move that call into kernel_kexec().
> >
> > > And all liveupdate_reboot() does if kho_finalize() fails is massage
> > > the error value before returning it to userspace. Why can't kernel_kexec()
> > > do the same?
> >
> > We could do that. It would look something like this:
> >
> > if (liveupdate_enabled())
> >    kho_finalize();
> >
> > Because we want to do kho_finalize() from kernel_kexec only when we do
> > live update.
> >
> > > > > This is fine. But what I don't like is that we can't use kho without
> > > > > liveupdate. We are making debugfs optional, we have a way to call
> >
> > This is exactly the fix I proposed:
> >
> > 1. When live-update is enabled, always disable "finalize" debugfs API.
> > 2. When live-update is disabled, always enable "finalize" debugfs API.
>
> I don't mind the concept, what I do mind is sprinkling liveupdate_enabled()
> in KHO.

Sure, let's just unconditionally do kho_fill_kimage().

> How about we kill debugfs/kho/out/abort and make kho_finalize() overwrite
> an existing FDT if there was any?
>
> Abort was required to allow rollback for subsystems that had kho notifiers,
> but now notifiers are gone and kho_abort() only frees the memory
> serialization data. I don't see an issue with kho_finalize() from debugfs
> being a tad slower because of a call to kho_abort() and the liveupdate path
> anyway won't incur that penalty.

Sounds good to me.

> > > KHO should not call into liveupdate. That's layering violation.
> > > And "stateless KHO" does not really make it stateless, it only removes the
> > > memory serialization from kho_finalize(), but it's still required to pack
> > > the FDT.
> >
> > This touches on a point I've raised in the KHO sync meetings: to be
> > effective, the "stateless KHO" work must also make subtree add/remove
> > stateless. There should not be a separate "finalize" state just to
> > finish the FDT. The KHO FDT is tiny (only one page), and there are
> > only a handful of subtrees. Adding and removing subtrees is cheap; we
> > should be able to open FDT, modify it, and finish FDT on every
> > operation. There's no need for a special finalization state at kexec
> > time. KHO should be totally stateless.
>
> And as the first step we can drop 'if (!kho_out.finalized)' from
> kho_fill_kimage(). We might need to massage the check for valid FDT in
> kho_populate() to avoid unnecessary noise, but largely there's no issue
> with always passing KHO data in kimage.

Sounds good, let me work on this patch.

Pasha


* Re: [PATCH v5 18/22] docs: add documentation for memfd preservation via LUO
From: Pratyush Yadav @ 2025-11-13 16:59 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, rppt, dmatlack, rientjes, corbet,
	rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm,
	tj, yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, lennart,
	brauner, linux-api, linux-fsdevel, saeedm, ajayachandra, jgg,
	parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <CA+CK2bBmSD_YftJ-9w1zidLz2=a4NynnLz_gLPsScF145bu5dQ@mail.gmail.com>

On Thu, Nov 13 2025, Pasha Tatashin wrote:

>> +Limitations
>> +===========
>> +
>> +The current implementation has the following limitations:
>> +
>> +Size
>> +  Currently the size of the file is limited by the size of the FDT. The FDT can
>> +  be of at most ``MAX_PAGE_ORDER`` order. By default this is 4 MiB with 4K
>> +  pages. Each page in the file is tracked using 16 bytes. This limits the
>> +  maximum size of the file to 1 GiB.
>
> The above should be removed, as we are using KHO vmalloc that resolves
> this limitation. Pratyush, I suggest that for v6 we move the memfd
> documentation right into the code: memfd_luo.c and
> liveupdate/abi/memfd.h, and source it from there.

ACK. I think the section on behavior in different phases is also out of
date now, and the serialization format too. The format is more
accurately defined in include/linux/liveupdate/abi/memfd.h. So this
documentation needs an overhaul.

I don't mind moving it to the code and including it in the HTML docs via
kernel-doc. Will do that for the next revision.

>
> Keeping documentation with the code helps reduce code/doc divergence.
>
> Pasha

-- 
Regards,
Pratyush Yadav


* Re: [PATCH v5 18/22] docs: add documentation for memfd preservation via LUO
From: Pasha Tatashin @ 2025-11-13 16:55 UTC (permalink / raw)
  To: pratyush, jasonmiu, graf, pasha.tatashin, rppt, dmatlack,
	rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, zhangguopeng, linux, linux-kernel, linux-doc, linux-mm,
	gregkh, tglx, mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
	linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu,
	hughd, skhawaja, chrisl
In-Reply-To: <20251107210526.257742-19-pasha.tatashin@soleen.com>

> +Limitations
> +===========
> +
> +The current implementation has the following limitations:
> +
> +Size
> +  Currently the size of the file is limited by the size of the FDT. The FDT can
> +  be of at most ``MAX_PAGE_ORDER`` order. By default this is 4 MiB with 4K
> +  pages. Each page in the file is tracked using 16 bytes. This limits the
> +  maximum size of the file to 1 GiB.
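
For reference, the 1 GiB figure in the quoted text follows from simple
arithmetic. A minimal sketch, assuming 4 KiB pages and a MAX_PAGE_ORDER
of 10 (the value implied by the 4 MiB default above); the SKETCH_* names
and helper functions are made up for illustration and are not kernel
symbols:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed defaults taken from the quoted text: 4 KiB pages, order 10. */
#define SKETCH_PAGE_SIZE      4096ULL
#define SKETCH_MAX_PAGE_ORDER 10
#define SKETCH_ENTRY_BYTES    16ULL	/* tracking cost per file page */

/* Largest FDT under the old scheme: PAGE_SIZE << MAX_PAGE_ORDER. */
uint64_t sketch_max_fdt_bytes(void)
{
	return SKETCH_PAGE_SIZE << SKETCH_MAX_PAGE_ORDER;
}

/* Entries that fit in the FDT, times the page size, caps the file size. */
uint64_t sketch_max_file_bytes(void)
{
	uint64_t pages = sketch_max_fdt_bytes() / SKETCH_ENTRY_BYTES;

	return pages * SKETCH_PAGE_SIZE;
}
```

With these assumptions the FDT holds 262144 entries, which is where the
1 GiB cap came from.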

The above should be removed, as we are using KHO vmalloc that resolves
this limitation. Pratyush, I suggest that for v6 we move the memfd
documentation right into the code: memfd_luo.c and
liveupdate/abi/memfd.h, and source it from there.

Keeping documentation with the code helps reduce code/doc divergence.

Pasha


* Re: [PATCH v5 02/22] liveupdate: luo_core: integrate with KHO
From: Mike Rapoport @ 2025-11-13 16:31 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <CA+CK2bAq-0Vz4jSRWnb_ut9AqG3RcH67JQj76GhoH0BaspWs2A@mail.gmail.com>

On Wed, Nov 12, 2025 at 09:58:27AM -0500, Pasha Tatashin wrote:
> On Wed, Nov 12, 2025 at 8:25 AM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > Hi Pasha,
> >
> > On Tue, Nov 11, 2025 at 03:57:39PM -0500, Pasha Tatashin wrote:
> > > Hi Mike,
> > >
> > > Thank you for the review; my comments are below:
> > >
> > > > > This is why this call is placed first in reboot(), before any
> > > > > irreversible reboot notifiers or shutdown callbacks are performed. If
> > > > > an allocation problem occurs in KHO, the error is simply reported back
> > > > > to userspace, and the live update is safely aborted.
> >
> > The call to liveupdate_reboot() is just before kernel_kexec(). Why don't we
> > move it there?
> 
> Yes, I can move that call into kernel_kexec().
> 
> > And all that liveupdate_reboot() does if kho_finalize() fails is massage
> > the error value before returning it to userspace. Why can't kernel_kexec()
> > do the same?
> 
> We could do that. It would look something like this:
> 
> if (liveupdate_enabled())
>    kho_finalize();
> 
> Because we want to do kho_finalize() from kernel_kexec only when we do
> live update.
> 
> > > > This is fine. But what I don't like is that we can't use kho without
> > > > liveupdate. We are making debugfs optional, we have a way to call
> 
> This is exactly the fix I proposed:
> 
> 1. When live-update is enabled, always disable "finalize" debugfs API.
> 2. When live-update is disabled, always enable "finalize" debugfs API.

I don't mind the concept; what I do mind is sprinkling liveupdate_enabled()
in KHO.

How about we kill debugfs/kho/out/abort and make kho_finalize() overwrite
an existing FDT if there was any? 

Abort was required to allow rollback for subsystems that had kho notifiers,
but now notifiers are gone and kho_abort() only frees the memory
serialization data. I don't see an issue with kho_finalize() from debugfs
being a tad slower because of a call to kho_abort() and the liveupdate path
anyway won't incur that penalty.
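
The overwrite-on-finalize idea can be sketched in a userspace model. This
is only an illustration under stated assumptions: struct toy_kho_out,
toy_kho_abort() and toy_kho_finalize() are hypothetical stand-ins, not
the kernel's kho_out, kho_abort() or kho_finalize(), and the malloc()
merely represents the memory serialization data:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the KHO output state; not the kernel struct. */
struct toy_kho_out {
	bool finalized;
	void *ser_data;		/* stands in for memory serialization data */
	int generation;		/* how many times the state was (re)built */
};

/* Free whatever a previous finalize left behind. */
void toy_kho_abort(struct toy_kho_out *out)
{
	free(out->ser_data);
	out->ser_data = NULL;
	out->finalized = false;
}

/* Finalize never fails on an already finalized state; it redoes the work. */
int toy_kho_finalize(struct toy_kho_out *out)
{
	if (out->finalized)
		toy_kho_abort(out);	/* the slightly slower debugfs path */

	out->ser_data = malloc(64);
	if (!out->ser_data)
		return -1;

	out->generation++;
	out->finalized = true;
	return 0;
}
```

In this model a second finalize simply replaces the first result, so a
separate user-visible abort step is not needed.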

> > KHO should not call into liveupdate. That's a layering violation.
> > And "stateless KHO" does not really make it stateless, it only removes the
> > memory serialization from kho_finalize(), but it's still required to pack
> > the FDT.
> 
> This touches on a point I've raised in the KHO sync meetings: to be
> effective, the "stateless KHO" work must also make subtree add/remove
> stateless. There should not be a separate "finalize" state just to
> finish the FDT. The KHO FDT is tiny (only one page), and there are
> only a handful of subtrees. Adding and removing subtrees is cheap; we
> should be able to open FDT, modify it, and finish FDT on every
> operation. There's no need for a special finalization state at kexec
> time. KHO should be totally stateless.

And as the first step we can drop 'if (!kho_out.finalized)' from
kho_fill_kimage(). We might need to massage the check for valid FDT in
kho_populate() to avoid unnecessary noise, but largely there's no issue
with always passing KHO data in kimage.
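
A rough userspace sketch of that direction, assuming a toy model in which
the serialized image is rebuilt on every subtree add/remove so that the
hand-over step needs no "finalized" check. All toy_* names and the
"name;" text format are invented for illustration; the real code would
operate on an FDT rather than a string:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_SUBTREES 8
#define TOY_BLOB_SIZE    256

struct toy_kho {
	const char *names[TOY_MAX_SUBTREES];
	size_t nr;
	char blob[TOY_BLOB_SIZE];	/* always a complete image */
};

/* Rebuild the whole image; cheap because there are only a few subtrees. */
static void toy_rebuild(struct toy_kho *k)
{
	size_t off = 0;

	k->blob[0] = '\0';
	for (size_t i = 0; i < k->nr && off < sizeof(k->blob); i++) {
		int n = snprintf(k->blob + off, sizeof(k->blob) - off,
				 "%s;", k->names[i]);
		if (n < 0)
			break;
		off += (size_t)n;
	}
}

int toy_add_subtree(struct toy_kho *k, const char *name)
{
	if (k->nr == TOY_MAX_SUBTREES)
		return -1;
	k->names[k->nr++] = name;
	toy_rebuild(k);
	return 0;
}

int toy_remove_subtree(struct toy_kho *k, const char *name)
{
	for (size_t i = 0; i < k->nr; i++) {
		if (strcmp(k->names[i], name) != 0)
			continue;
		/* Close the gap, then rebuild the image. */
		memmove(&k->names[i], &k->names[i + 1],
			(k->nr - i - 1) * sizeof(k->names[0]));
		k->nr--;
		toy_rebuild(k);
		return 0;
	}
	return -1;
}

/* No finalized check: the image is valid after every operation. */
const char *toy_fill_kimage(const struct toy_kho *k)
{
	return k->blob;
}
```

Because every operation leaves the image complete, an empty image is also
a valid thing to pass along, which mirrors the point about massaging the
validity check on the receiving side.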
 
> Thanks,
> Pasha

-- 
Sincerely yours,
Mike.


