Linux userland API discussions
 help / color / mirror / Atom feed
* Re: [PATCH v5 06/22] liveupdate: luo_session: add sessions support
From: Mike Rapoport @ 2025-11-14 12:49 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251107210526.257742-7-pasha.tatashin@soleen.com>

On Fri, Nov 07, 2025 at 04:03:04PM -0500, Pasha Tatashin wrote:
> Introduce concept of "Live Update Sessions" within the LUO framework.
> LUO sessions provide a mechanism to group and manage `struct file *`
> instances (representing file descriptors) that need to be preserved
> across a kexec-based live update.
> 
> Each session is identified by a unique name and acts as a container
> for file objects whose state is critical to a userspace workload, such
> as a virtual machine or a high-performance database, aiming to maintain
> their functionality across a kernel transition.
> 
> This groundwork establishes the framework for preserving file-backed
> state across kernel updates, with the actual file data preservation
> mechanisms to be implemented in subsequent patches.
> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
>  include/linux/liveupdate/abi/luo.h |  81 ++++++
>  include/uapi/linux/liveupdate.h    |   3 +
>  kernel/liveupdate/Makefile         |   3 +-
>  kernel/liveupdate/luo_core.c       |   9 +
>  kernel/liveupdate/luo_internal.h   |  39 +++
>  kernel/liveupdate/luo_session.c    | 405 +++++++++++++++++++++++++++++
>  6 files changed, 539 insertions(+), 1 deletion(-)
>  create mode 100644 kernel/liveupdate/luo_session.c
> 
> diff --git a/include/linux/liveupdate/abi/luo.h b/include/linux/liveupdate/abi/luo.h
> index 9483a294287f..37b9fecef3f7 100644
> --- a/include/linux/liveupdate/abi/luo.h
> +++ b/include/linux/liveupdate/abi/luo.h
> @@ -28,6 +28,11 @@
>   *     / {
>   *         compatible = "luo-v1";
>   *         liveupdate-number = <...>;
> + *
> + *         luo-session {
> + *             compatible = "luo-session-v1";
> + *             luo-session-head = <phys_addr_of_session_head_ser>;
> + *         };
>   *     };
>   *
>   * Main LUO Node (/):
> @@ -36,11 +41,37 @@
>   *     Identifies the overall LUO ABI version.
>   *   - liveupdate-number: u64
>   *     A counter tracking the number of successful live updates performed.
> + *
> + * Session Node (luo-session):
> + *   This node describes all preserved user-space sessions.
> + *
> + *   - compatible: "luo-session-v1"
> + *     Identifies the session ABI version.
> + *   - luo-session-head: u64
> + *     The physical address of a `struct luo_session_head_ser`. This structure is
> + *     the header for a contiguous block of memory containing an array of
> + *     `struct luo_session_ser`, one for each preserved session.
> + *
> + * Serialization Structures:
> + *   The FDT properties point to memory regions containing arrays of simple,
> + *   `__packed` structures. These structures contain the actual preserved state.
> + *
> + *   - struct luo_session_head_ser:
> + *     Header for the session array. Contains the total page count of the
> + *     preserved memory block and the number of `struct luo_session_ser`
> + *     entries that follow.
> + *
> + *   - struct luo_session_ser:
> + *     Metadata for a single session, including its name and a physical pointer
> + *     to another preserved memory block containing an array of
> + *     `struct luo_file_ser` for all files in that session.
>   */
>  
>  #ifndef _LINUX_LIVEUPDATE_ABI_LUO_H
>  #define _LINUX_LIVEUPDATE_ABI_LUO_H
>  
> +#include <uapi/linux/liveupdate.h>
> +
>  /*
>   * The LUO FDT hooks all LUO state for sessions, fds, etc.
>   * In the root it allso carries "liveupdate-number" 64-bit property that
> @@ -51,4 +82,54 @@
>  #define LUO_FDT_COMPATIBLE	"luo-v1"
>  #define LUO_FDT_LIVEUPDATE_NUM	"liveupdate-number"
>  
> +/*
> + * LUO FDT session node
> + * LUO_FDT_SESSION_HEAD:  is a u64 physical address of struct
> + *                        luo_session_head_ser
> + */
> +#define LUO_FDT_SESSION_NODE_NAME	"luo-session"
> +#define LUO_FDT_SESSION_COMPATIBLE	"luo-session-v1"
> +#define LUO_FDT_SESSION_HEAD		"luo-session-head"
> +
> +/**
> + * struct luo_session_head_ser - Header for the serialized session data block.
> + * @pgcnt: The total size, in pages, of the entire preserved memory block
> + *         that this header describes.
> + * @count: The number of 'struct luo_session_ser' entries that immediately
> + *         follow this header in the memory block.
> + *
> + * This structure is located at the beginning of a contiguous block of
> + * physical memory preserved across the kexec. It provides the necessary
> + * metadata to interpret the array of session entries that follow.
> + */
> +struct luo_session_head_ser {
> +	u64 pgcnt;
> +	u64 count;
> +} __packed;
> +
> +/**
> + * struct luo_session_ser - Represents the serialized metadata for a LUO session.
> + * @name:    The unique name of the session, copied from the `luo_session`
> + *           structure.
> + * @files:   The physical address of a contiguous memory block that holds
> + *           the serialized state of files.
> + * @pgcnt:   The number of pages occupied by the `files` memory block.
> + * @count:   The total number of files that were part of this session during
> + *           serialization. Used for iteration and validation during
> + *           restoration.
> + *
> + * This structure is used to package session-specific metadata for transfer
> + * between kernels via Kexec Handover. An array of these structures (one per
> + * session) is created and passed to the new kernel, allowing it to reconstruct
> + * the session context.
> + *
> + * If this structure is modified, LUO_SESSION_COMPATIBLE must be updated.
> + */
> +struct luo_session_ser {
> +	char name[LIVEUPDATE_SESSION_NAME_LENGTH];
> +	u64 files;
> +	u64 pgcnt;
> +	u64 count;
> +} __packed;
> +
>  #endif /* _LINUX_LIVEUPDATE_ABI_LUO_H */
> diff --git a/include/uapi/linux/liveupdate.h b/include/uapi/linux/liveupdate.h
> index df34c1642c4d..d2ef2f7e0dbd 100644
> --- a/include/uapi/linux/liveupdate.h
> +++ b/include/uapi/linux/liveupdate.h
> @@ -43,4 +43,7 @@
>  /* The ioctl type, documented in ioctl-number.rst */
>  #define LIVEUPDATE_IOCTL_TYPE		0xBA
>  
> +/* The maximum length of session name including null termination */
> +#define LIVEUPDATE_SESSION_NAME_LENGTH 56

Out of curiosity, why 56? :)

> +
>  #endif /* _UAPI_LIVEUPDATE_H */
> diff --git a/kernel/liveupdate/Makefile b/kernel/liveupdate/Makefile
> index 413722002b7a..83285e7ad726 100644
> --- a/kernel/liveupdate/Makefile
> +++ b/kernel/liveupdate/Makefile
> @@ -2,7 +2,8 @@
>  
>  luo-y :=								\
>  		luo_core.o						\
> -		luo_ioctl.o
> +		luo_ioctl.o						\
> +		luo_session.o
>  
>  obj-$(CONFIG_KEXEC_HANDOVER)		+= kexec_handover.o
>  obj-$(CONFIG_KEXEC_HANDOVER_DEBUG)	+= kexec_handover_debug.o
> diff --git a/kernel/liveupdate/luo_core.c b/kernel/liveupdate/luo_core.c
> index c1bd236bccb0..83257ab93ebb 100644
> --- a/kernel/liveupdate/luo_core.c
> +++ b/kernel/liveupdate/luo_core.c
> @@ -116,6 +116,10 @@ static int __init luo_early_startup(void)
>  	pr_info("Retrieved live update data, liveupdate number: %lld\n",
>  		luo_global.liveupdate_num);
>  
> +	err = luo_session_setup_incoming(luo_global.fdt_in);
> +	if (err)
> +		return err;
> +
>  	return 0;
>  }
>  
> @@ -149,6 +153,7 @@ static int __init luo_fdt_setup(void)
>  	err |= fdt_begin_node(fdt_out, "");
>  	err |= fdt_property_string(fdt_out, "compatible", LUO_FDT_COMPATIBLE);
>  	err |= fdt_property(fdt_out, LUO_FDT_LIVEUPDATE_NUM, &ln, sizeof(ln));
> +	err |= luo_session_setup_outgoing(fdt_out);
>  	err |= fdt_end_node(fdt_out);
>  	err |= fdt_finish(fdt_out);
>  	if (err)
> @@ -202,6 +207,10 @@ int liveupdate_reboot(void)
>  	if (!liveupdate_enabled())
>  		return 0;
>  
> +	err = luo_session_serialize();
> +	if (err)
> +		return err;
> +
>  	err = kho_finalize();
>  	if (err) {
>  		pr_err("kho_finalize failed %d\n", err);
> diff --git a/kernel/liveupdate/luo_internal.h b/kernel/liveupdate/luo_internal.h
> index 29f47a69be0b..b4f2d1443c76 100644
> --- a/kernel/liveupdate/luo_internal.h
> +++ b/kernel/liveupdate/luo_internal.h
> @@ -14,4 +14,43 @@ void *luo_alloc_preserve(size_t size);
>  void luo_free_unpreserve(void *mem, size_t size);
>  void luo_free_restore(void *mem, size_t size);
>  
> +/**
> + * struct luo_session - Represents an active or incoming Live Update session.
> + * @name:       A unique name for this session, used for identification and
> + *              retrieval.
> + * @files_list: An ordered list of files associated with this session, it is
> + *              ordered by preservation time.
> + * @ser:        Pointer to the serialized data for this session.
> + * @count:      A counter tracking the number of files currently stored in the
> + *              @files_xa for this session.

		   ^@files_list

> + * @list:       A list_head member used to link this session into a global list
> + *              of either outgoing (to be preserved) or incoming (restored from
> + *              previous kernel) sessions.
> + * @retrieved:  A boolean flag indicating whether this session has been
> + *              retrieved by a consumer in the new kernel.
> + * @mutex:      Session lock, protects files_list, and count.
> + * @files:      The physically contiguous memory block that holds the serialized
> + *              state of files.
> + * @pgcnt:      The number of pages files occupy.

                                      ^ @files

> + */
> +struct luo_session {
> +	char name[LIVEUPDATE_SESSION_NAME_LENGTH];
> +	struct list_head files_list;
> +	struct luo_session_ser *ser;
> +	long count;
> +	struct list_head list;
> +	bool retrieved;
> +	struct mutex mutex;
> +	struct luo_file_ser *files;
> +	u64 pgcnt;
> +};
> +
> +int luo_session_create(const char *name, struct file **filep);
> +int luo_session_retrieve(const char *name, struct file **filep);
> +int __init luo_session_setup_outgoing(void *fdt);
> +int __init luo_session_setup_incoming(void *fdt);
> +int luo_session_serialize(void);
> +int luo_session_deserialize(void);

The last four deal with all the sessions, maybe use plural in the function
names.

> +bool luo_session_is_deserialized(void);
> +
>  #endif /* _LINUX_LUO_INTERNAL_H */
> diff --git a/kernel/liveupdate/luo_session.c b/kernel/liveupdate/luo_session.c
> new file mode 100644
> index 000000000000..a3513118aa74
> --- /dev/null
> +++ b/kernel/liveupdate/luo_session.c
> @@ -0,0 +1,405 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * Copyright (c) 2025, Google LLC.
> + * Pasha Tatashin <pasha.tatashin@soleen.com>
> + */
> +
> +/**
> + * DOC: LUO Sessions
> + *
> + * LUO Sessions provide the core mechanism for grouping and managing `struct
> + * file *` instances that need to be preserved across a kexec-based live
> + * update. Each session acts as a named container for a set of file objects,
> + * allowing a userspace agent to manage the lifecycle of resources critical to a
> + * workload.
> + *
> + * Core Concepts:
> + *
> + * - Named Containers: Sessions are identified by a unique, user-provided name,
> + *   which is used for both creation in the current kernel and retrieval in the
> + *   next kernel.
> + *
> + * - Userspace Interface: Session management is driven from userspace via
> + *   ioctls on /dev/liveupdate.
> + *
> + * - Serialization: Session metadata is preserved using the KHO framework. When
> + *   a live update is triggered via kexec, an array of `struct luo_session_ser`
> + *   is populated and placed in a preserved memory region. An FDT node is also
> + *   created, containing the count of sessions and the physical address of this
> + *   array.
> + *
> + * Session Lifecycle:
> + *
> + * 1.  Creation: A userspace agent calls `luo_session_create()` to create a
> + *     new, empty session and receives a file descriptor for it.
> + *
> + * 2.  Serialization: When the `reboot(LINUX_REBOOT_CMD_KEXEC)` syscall is
> + *     made, `luo_session_serialize()` is called. It iterates through all
> + *     active sessions and writes their metadata into a memory area preserved
> + *     by KHO.
> + *
> + * 3.  Deserialization (in new kernel): After kexec, `luo_session_deserialize()`
> + *     runs, reading the serialized data and creating a list of `struct
> + *     luo_session` objects representing the preserved sessions.
> + *
> + * 4.  Retrieval: A userspace agent in the new kernel can then call
> + *     `luo_session_retrieve()` with a session name to get a new file
> + *     descriptor and access the preserved state.
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/anon_inodes.h>
> +#include <linux/errno.h>
> +#include <linux/file.h>
> +#include <linux/fs.h>
> +#include <linux/libfdt.h>
> +#include <linux/liveupdate.h>
> +#include <linux/liveupdate/abi/luo.h>
> +#include <uapi/linux/liveupdate.h>
> +#include "luo_internal.h"
> +
> +/* 16 4K pages, give space for 819 sessions */
> +#define LUO_SESSION_PGCNT	16ul
> +#define LUO_SESSION_MAX		(((LUO_SESSION_PGCNT << PAGE_SHIFT) -	\
> +		sizeof(struct luo_session_head_ser)) /			\
> +		sizeof(struct luo_session_ser))
> +
> +/**
> + * struct luo_session_head - Head struct for managing LUO sessions.

Head of what? ;-)
Maybe luo_session_list? Or even luo_sessions?

> + * @count:    The number of sessions currently tracked in the @list.
> + * @list:     The head of the linked list of `struct luo_session` instances.
> + * @rwsem:    A read-write semaphore providing synchronized access to the
> + *            session list and other fields in this structure.
> + * @head_ser: The head data of serialization array.

	            ^ header?

> + * @ser:      The serialized session data (an array of
> + *            `struct luo_session_ser`).
> + * @active:   Set to true when first initialized. If previous kernel did not
> + *            send session data, active stays false for incoming.
> + */
> +struct luo_session_head {
> +	long count;
> +	struct list_head list;
> +	struct rw_semaphore rwsem;
> +	struct luo_session_head_ser *head_ser;
> +	struct luo_session_ser *ser;
> +	bool active;
> +};
> +
> +/**
> + * struct luo_session_global - Global container for managing LUO sessions.
> + * @incoming:     The sessions passed from the previous kernel.
> + * @outgoing:     The sessions that are going to be passed to the next kernel.
> + * @deserialized: The sessions have been deserialized once /dev/liveupdate
> + *                has been opened.
> + */
> +struct luo_session_global {
> +	struct luo_session_head incoming;
> +	struct luo_session_head outgoing;
> +	bool deserialized;
> +} luo_session_global;

Should be static. And frankly, I don't think grouping two global variables
into a struct gains much.

static struct luo_sessions luo_sessions_incoming;
static struct luo_sessions luo_sessions_outgoing;

reads clearer to me.

> +
> +static struct luo_session *luo_session_alloc(const char *name)
> +{
> +	struct luo_session *session = kzalloc(sizeof(*session), GFP_KERNEL);
> +
> +	if (!session)
> +		return NULL;
> +
> +	strscpy(session->name, name, sizeof(session->name));
> +	INIT_LIST_HEAD(&session->files_list);
> +	session->count = 0;

I'd move this after mutex_init(), a bit more readable IMHO.

> +	INIT_LIST_HEAD(&session->list);
> +	mutex_init(&session->mutex);
> +
> +	return session;
> +}
> +
> +static void luo_session_free(struct luo_session *session)
> +{
> +	WARN_ON(session->count);
> +	WARN_ON(!list_empty(&session->files_list));
> +	mutex_destroy(&session->mutex);
> +	kfree(session);
> +}
> +
> +static int luo_session_insert(struct luo_session_head *sh,
> +			      struct luo_session *session)
> +{
> +	struct luo_session *it;
> +
> +	guard(rwsem_write)(&sh->rwsem);
> +
> +	/*
> +	 * For outgoing we should make sure there is room in serialization array
> +	 * for new session.
> +	 */
> +	if (sh == &luo_session_global.outgoing) {
> +		if (sh->count == LUO_SESSION_MAX)
> +			return -ENOMEM;
> +	}

Not a big deal, but this could be outside the guard().

> +
> +	/*
> +	 * For small number of sessions this loop won't hurt performance
> +	 * but if we ever start using a lot of sessions, this might
> +	 * become a bottle neck during deserialization time, as it would
> +	 * cause O(n*n) complexity.
> +	 */

The loop is always O(n*n) in the worst case, no matter how many sessions
there are ;-)

> +	list_for_each_entry(it, &sh->list, list) {
> +		if (!strncmp(it->name, session->name, sizeof(it->name)))
> +			return -EEXIST;
> +	}
> +	list_add_tail(&session->list, &sh->list);
> +	sh->count++;
> +
> +	return 0;
> +}
> +
> +static void luo_session_remove(struct luo_session_head *sh,
> +			       struct luo_session *session)
> +{
> +	guard(rwsem_write)(&sh->rwsem);
> +	list_del(&session->list);
> +	sh->count--;
> +}
> +
> +static int luo_session_release(struct inode *inodep, struct file *filep)
> +{
> +	struct luo_session *session = filep->private_data;
> +	struct luo_session_head *sh;
> +
> +	/* If retrieved is set, it means this session is from incoming list */
> +	if (session->retrieved)
> +		sh = &luo_session_global.incoming;
> +	else
> +		sh = &luo_session_global.outgoing;

Maybe just add a backpointer to the list to struct luo_session?

> +
> +	luo_session_remove(sh, session);
> +	luo_session_free(session);
> +
> +	return 0;
> +}
> +
> +static const struct file_operations luo_session_fops = {
> +	.owner = THIS_MODULE,
> +	.release = luo_session_release,
> +};
> +
> +/* Create a "struct file" for session */
> +static int luo_session_getfile(struct luo_session *session, struct file **filep)
> +{
> +	char name_buf[128];
> +	struct file *file;
> +
> +	guard(mutex)(&session->mutex);
> +	snprintf(name_buf, sizeof(name_buf), "[luo_session] %s", session->name);
> +	file = anon_inode_getfile(name_buf, &luo_session_fops, session, O_RDWR);
> +	if (IS_ERR(file))
> +		return PTR_ERR(file);
> +
> +	*filep = file;
> +
> +	return 0;
> +}
> +
> +int luo_session_create(const char *name, struct file **filep)
> +{
> +	struct luo_session *session;
> +	int err;
> +
> +	session = luo_session_alloc(name);
> +	if (!session)
> +		return -ENOMEM;
> +
> +	err = luo_session_insert(&luo_session_global.outgoing, session);
> +	if (err) {
> +		luo_session_free(session);
> +		return err;

Please goto err_free

> +	}
> +
> +	err = luo_session_getfile(session, filep);
> +	if (err) {
> +		luo_session_remove(&luo_session_global.outgoing, session);
> +		luo_session_free(session);

and goto err_remove

> +	}
> +
> +	return err;
> +}
> +
> +int luo_session_retrieve(const char *name, struct file **filep)
> +{
> +	struct luo_session_head *sh = &luo_session_global.incoming;
> +	struct luo_session *session = NULL;
> +	struct luo_session *it;
> +	int err;
> +
> +	scoped_guard(rwsem_read, &sh->rwsem) {
> +		list_for_each_entry(it, &sh->list, list) {
> +			if (!strncmp(it->name, name, sizeof(it->name))) {
> +				session = it;
> +				break;
> +			}
> +		}
> +	}
> +
> +	if (!session)
> +		return -ENOENT;
> +
> +	scoped_guard(mutex, &session->mutex) {
> +		if (session->retrieved)
> +			return -EINVAL;
> +	}
> +
> +	err = luo_session_getfile(session, filep);
> +	if (!err) {
> +		scoped_guard(mutex, &session->mutex)
> +			session->retrieved = true;
> +	}
> +
> +	return err;
> +}
> +
> +int __init luo_session_setup_outgoing(void *fdt_out)
> +{
> +	struct luo_session_head_ser *head_ser;
> +	u64 head_ser_pa;
> +	int err;
> +
> +	head_ser = luo_alloc_preserve(LUO_SESSION_PGCNT << PAGE_SHIFT);
> +	if (IS_ERR(head_ser))
> +		return PTR_ERR(head_ser);
> +	head_ser_pa = __pa(head_ser);

virt_to_phys please

> +
> +	err = fdt_begin_node(fdt_out, LUO_FDT_SESSION_NODE_NAME);
> +	err |= fdt_property_string(fdt_out, "compatible",
> +				   LUO_FDT_SESSION_COMPATIBLE);
> +	err |= fdt_property(fdt_out, LUO_FDT_SESSION_HEAD, &head_ser_pa,
> +			    sizeof(head_ser_pa));
> +	err |= fdt_end_node(fdt_out);
> +
> +	if (err)
> +		goto err_unpreserve;
> +
> +	head_ser->pgcnt = LUO_SESSION_PGCNT;
> +	INIT_LIST_HEAD(&luo_session_global.outgoing.list);
> +	init_rwsem(&luo_session_global.outgoing.rwsem);
> +	luo_session_global.outgoing.head_ser = head_ser;
> +	luo_session_global.outgoing.ser = (void *)(head_ser + 1);
> +	luo_session_global.outgoing.active = true;
> +
> +	return 0;
> +
> +err_unpreserve:
> +	luo_free_unpreserve(head_ser, LUO_SESSION_PGCNT << PAGE_SHIFT);
> +	return err;
> +}
> +
> +int __init luo_session_setup_incoming(void *fdt_in)
> +{
> +	struct luo_session_head_ser *head_ser;
> +	int err, head_size, offset;
> +	const void *ptr;
> +	u64 head_ser_pa;
> +
> +	offset = fdt_subnode_offset(fdt_in, 0, LUO_FDT_SESSION_NODE_NAME);
> +	if (offset < 0) {
> +		pr_err("Unable to get session node: [%s]\n",
> +		       LUO_FDT_SESSION_NODE_NAME);
> +		return -EINVAL;
> +	}
> +
> +	err = fdt_node_check_compatible(fdt_in, offset,
> +					LUO_FDT_SESSION_COMPATIBLE);
> +	if (err) {
> +		pr_err("Session node incompatibale [%s]\n",
> +		       LUO_FDT_SESSION_COMPATIBLE);
> +		return -EINVAL;
> +	}
> +
> +	head_size = 0;
> +	ptr = fdt_getprop(fdt_in, offset, LUO_FDT_SESSION_HEAD, &head_size);
> +	if (!ptr || head_size != sizeof(u64)) {
> +		pr_err("Unable to get session head '%s' [%d]\n",
> +		       LUO_FDT_SESSION_HEAD, head_size);
> +		return -EINVAL;
> +	}
> +
> +	memcpy(&head_ser_pa, ptr, sizeof(u64));
> +	head_ser = __va(head_ser_pa);
> +
> +	luo_session_global.incoming.head_ser = head_ser;
> +	luo_session_global.incoming.ser = (void *)(head_ser + 1);
> +	INIT_LIST_HEAD(&luo_session_global.incoming.list);
> +	init_rwsem(&luo_session_global.incoming.rwsem);
> +	luo_session_global.incoming.active = true;
> +
> +	return 0;
> +}
> +
> +bool luo_session_is_deserialized(void)
> +{
> +	return luo_session_global.deserialized;
> +}
> +
> +int luo_session_deserialize(void)
> +{
> +	struct luo_session_head *sh = &luo_session_global.incoming;
> +
> +	if (luo_session_is_deserialized())
> +		return 0;
> +
> +	luo_session_global.deserialized = true;

Shouldn't this be set after deserialization succeeded?

> +	if (!sh->active) {
> +		INIT_LIST_HEAD(&sh->list);
> +		init_rwsem(&sh->rwsem);
> +		return 0;
> +	}
> +
> +	for (int i = 0; i < sh->head_ser->count; i++) {
> +		struct luo_session *session;
> +
> +		session = luo_session_alloc(sh->ser[i].name);
> +		if (!session) {
> +			pr_warn("Failed to allocate session [%s] during deserialization\n",
> +				sh->ser[i].name);
> +			return -ENOMEM;
> +		}
> +
> +		if (luo_session_insert(sh, session)) {
> +			pr_warn("Failed to insert session due to name conflict [%s]\n",
> +				session->name);
> +			return -EEXIST;

Need to free allocated sessions if an insert fails.

> +		}
> +
> +		session->count = sh->ser[i].count;
> +		session->files = __va(sh->ser[i].files);
> +		session->pgcnt = sh->ser[i].pgcnt;
> +	}
> +
> +	luo_free_restore(sh->head_ser, sh->head_ser->pgcnt << PAGE_SHIFT);
> +	sh->head_ser = NULL;
> +	sh->ser = NULL;
> +
> +	return 0;
> +}
> +
> +int luo_session_serialize(void)
> +{
> +	struct luo_session_head *sh = &luo_session_global.outgoing;
> +	struct luo_session *session;
> +	int i = 0;
> +
> +	guard(rwsem_write)(&sh->rwsem);
> +	list_for_each_entry(session, &sh->list, list) {
> +		strscpy(sh->ser[i].name, session->name,
> +			sizeof(sh->ser[i].name));
> +		sh->ser[i].count = session->count;
> +		sh->ser[i].files = __pa(session->files);
> +		sh->ser[i].pgcnt = session->pgcnt;
> +		i++;
> +	}
> +	sh->head_ser->count = sh->count;
> +
> +	return 0;
> +}
> -- 
> 2.51.2.1041.gc1ab5b90ca-goog
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: [PATCH v5 03/22] reboot: call liveupdate_reboot() before kexec
From: Mike Rapoport @ 2025-11-14 11:30 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251107210526.257742-4-pasha.tatashin@soleen.com>

On Fri, Nov 07, 2025 at 04:03:01PM -0500, Pasha Tatashin wrote:
> Modify the reboot() syscall handler in kernel/reboot.c to call
> liveupdate_reboot() when processing the LINUX_REBOOT_CMD_KEXEC
> command.
> 
> This ensures that the Live Update Orchestrator is notified just
> before the kernel executes the kexec jump. The liveupdate_reboot()
> function triggers the final freeze event, allowing participating
> FDs perform last-minute check or state saving within the blackout
> window.
> 
> The call is placed immediately before kernel_kexec() to ensure LUO
> finalization happens at the latest possible moment before the kernel
> transition.
> 
> If liveupdate_reboot() returns an error (indicating a failure during
> LUO finalization), the kexec operation is aborted to prevent proceeding
> with an inconsistent state.
> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
>  kernel/reboot.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/reboot.c b/kernel/reboot.c
> index ec087827c85c..bdeb04a773db 100644
> --- a/kernel/reboot.c
> +++ b/kernel/reboot.c
> @@ -13,6 +13,7 @@
>  #include <linux/kexec.h>
>  #include <linux/kmod.h>
>  #include <linux/kmsg_dump.h>
> +#include <linux/liveupdate.h>
>  #include <linux/reboot.h>
>  #include <linux/suspend.h>
>  #include <linux/syscalls.h>
> @@ -797,6 +798,9 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
>  
>  #ifdef CONFIG_KEXEC_CORE
>  	case LINUX_REBOOT_CMD_KEXEC:
> +		ret = liveupdate_reboot();
> +		if (ret)
> +			break;

As we discussed elsewhere, let's move the call to liveupdate_reboot() to
kernel_kexec().

>  		ret = kernel_kexec();
>  		break;
>  #endif
> -- 
> 2.51.2.1041.gc1ab5b90ca-goog
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: [PATCH v5 02/22] liveupdate: luo_core: integrate with KHO
From: Mike Rapoport @ 2025-11-14 11:29 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251107210526.257742-3-pasha.tatashin@soleen.com>

On Fri, Nov 07, 2025 at 04:03:00PM -0500, Pasha Tatashin wrote:
> Integrate the LUO with the KHO framework to enable passing LUO state
> across a kexec reboot.
> 
> When LUO is transitioned to a "prepared" state, it tells KHO to
> finalize, so all memory segments that were added to KHO preservation
> list are getting preserved. After "Prepared" state no new segments
> can be preserved. If LUO is canceled, it also tells KHO to cancel the
> serialization, and therefore, later LUO can go back into the prepared
> state.
> 
> This patch introduces the following changes:
> - During the KHO finalization phase allocate FDT blob.
> - Populate this FDT with a LUO compatibility string ("luo-v1").
> 
> LUO now depends on `CONFIG_KEXEC_HANDOVER`. The core state transition
> logic (`luo_do_*_calls`) remains unimplemented in this patch.
> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
>  include/linux/liveupdate.h         |   6 +
>  include/linux/liveupdate/abi/luo.h |  54 +++++++
>  kernel/liveupdate/luo_core.c       | 243 ++++++++++++++++++++++++++++-
>  kernel/liveupdate/luo_internal.h   |  17 ++
>  mm/mm_init.c                       |   4 +
>  5 files changed, 323 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/liveupdate/abi/luo.h
>  create mode 100644 kernel/liveupdate/luo_internal.h
> 
> diff --git a/include/linux/liveupdate.h b/include/linux/liveupdate.h
> index 730b76625fec..0be8804fc42a 100644
> --- a/include/linux/liveupdate.h
> +++ b/include/linux/liveupdate.h
> @@ -13,6 +13,8 @@
>  
>  #ifdef CONFIG_LIVEUPDATE
>  
> +void __init liveupdate_init(void);
> +
>  /* Return true if live update orchestrator is enabled */
>  bool liveupdate_enabled(void);
>  
> @@ -21,6 +23,10 @@ int liveupdate_reboot(void);
>  
>  #else /* CONFIG_LIVEUPDATE */
>  
> +static inline void liveupdate_init(void)
> +{
> +}

The common practice is to place brackets at the same line with function
declaration.

...

> +static int __init luo_early_startup(void)
> +{
> +	phys_addr_t fdt_phys;
> +	int err, ln_size;
> +	const void *ptr;
> +
> +	if (!kho_is_enabled()) {
> +		if (liveupdate_enabled())
> +			pr_warn("Disabling liveupdate because KHO is disabled\n");
> +		luo_global.enabled = false;
> +		return 0;
> +	}
> +
> +	/* Retrieve LUO subtree, and verify its format. */
> +	err = kho_retrieve_subtree(LUO_FDT_KHO_ENTRY_NAME, &fdt_phys);
> +	if (err) {
> +		if (err != -ENOENT) {
> +			pr_err("failed to retrieve FDT '%s' from KHO: %pe\n",
> +			       LUO_FDT_KHO_ENTRY_NAME, ERR_PTR(err));
> +			return err;
> +		}
> +
> +		return 0;
> +	}
> +
> +	luo_global.fdt_in = __va(fdt_phys);

phys_to_virt is clearer, isn't it?

> +	err = fdt_node_check_compatible(luo_global.fdt_in, 0,
> +					LUO_FDT_COMPATIBLE);

...

> +void __init liveupdate_init(void)
> +{
> +	int err;
> +
> +	err = luo_early_startup();
> +	if (err) {
> +		pr_err("The incoming tree failed to initialize properly [%pe], disabling live update\n",
> +		       ERR_PTR(err));
> +		luo_global.enabled = false;
> +	}
> +}
> +
> +/* Called during boot to create LUO fdt tree */

			 ^ create outgoing

> +static int __init luo_late_startup(void)
> +{
> +	int err;
> +
> +	if (!liveupdate_enabled())
> +		return 0;
> +
> +	err = luo_fdt_setup();
> +	if (err)
> +		luo_global.enabled = false;
> +
> +	return err;
> +}
> +late_initcall(luo_late_startup);

It would be nice to have a comment explaining why late_initcall() is fine
and why there's no need to initialize the outgoing fdt earlier.

> +/**
> + * luo_alloc_preserve - Allocate, zero, and preserve memory.

I think this and the "free" counterparts would be useful for any KHO users,
even those that don't need LUO.

> + * @size: The number of bytes to allocate.
> + *
> + * Allocates a physically contiguous block of zeroed pages that is large
> + * enough to hold @size bytes. The allocated memory is then registered with
> + * KHO for preservation across a kexec.
> + *
> + * Note: The actual allocated size will be rounded up to the nearest
> + * power-of-two page boundary.
> + *
> + * @return A virtual pointer to the allocated and preserved memory on success,
> + * or an ERR_PTR() encoded error on failure.
> + */
> +void *luo_alloc_preserve(size_t size)
> +{
> +	struct folio *folio;
> +	int order, ret;
> +
> +	if (!size)
> +		return ERR_PTR(-EINVAL);
> +
> +	order = get_order(size);
> +	if (order > MAX_PAGE_ORDER)
> +		return ERR_PTR(-E2BIG);

High order allocations would likely fail or at least cause a heavy reclaim.
For now it seems that we won't be needing really large contiguous chunks so
maybe limiting this to PAGE_ALLOC_COSTLY_ORDER?

Later if we'd need higher order allocations we can try to allocate with
__GFP_NORETRY or __GFP_RETRY_MAYFAIL with a fallback to vmalloc.

> +
> +	folio = folio_alloc(GFP_KERNEL | __GFP_ZERO, order);
> +	if (!folio)
> +		return ERR_PTR(-ENOMEM);
> +
> +	ret = kho_preserve_folio(folio);
> +	if (ret) {
> +		folio_put(folio);
> +		return ERR_PTR(ret);
> +	}
> +
> +	return folio_address(folio);
> +}
> +

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* RE: RFC: Serial port DTR/RTS - O_<something>
From: Maarten Brock @ 2025-11-14 10:26 UTC (permalink / raw)
  To: H. Peter Anvin, Greg KH
  Cc: Theodore Ts'o, linux-serial@vger.kernel.org,
	linux-api@vger.kernel.org, LKML
In-Reply-To: <14b1bc5c-83ac-431f-a53b-14872024b969@zytor.com>

> > A new open flag "O_DO_NOT_TOUCH_ANYTHING" might be
> > the simplest?
> >
> 
> Okay, to I'm going to toss out a couple suggestions for naming:
> 
> 	O_(PRE|FOR|N|NO)?(INIT|CONFIG|START)(DEV|HW|IO)?
> 	O_(NO?RESET|PREPARE)(DEV|HW|IO)?
> 	O_NO?TOUCH
> 	O_NYET ("not yet")
> 
> I think my personal preference at the moment is either O_NYET or O_PRECONFIG
> or O_NYET; although it is perhaps a bit more "use case centric" than "what
> actual effect it has" I think it might be clearer.  A -DEV, -HW or -IO suffix
> would seem to needlessly preclude it being used for future similar use cases
> for files that are not device nodes.
> 
> O_NYET ("not yet") is kind of attractive because it has some geekish smirk
> value, doesn't have "obvious enough" meaning that if you don't know what it
> does you'll guess rather than looking it up, but once you know you are not
> going to forget it!  There is even precedent: USB 2 already has the NYET
> packet type meaning just "not yet".  The more I'm thinking about it the more
> am starting to like it...

Personally, I don't much like the O_NYET as it seems to describe not to open
the device.

> Many of the other combinations have the problem of seeming to do the opposite
> of what the used wants in some use cases; it seems rather odd to open a device
> node that you are intending to configure with "O_NOCONFIG".

Don't like this one either.

> On the other
> hand, "O_CONFIG" might be a valid indication of the intent (like O_RDONLY or
> O_RDWR are indicator of intent), but also has the implication that it *will*
> cause the device to configure itself.  It also would seem to imply that the
> resulting file descriptor can *only* be used for that purpose.

I do like the O_CONFIG or O_FORCONFIG names.
I also like O_PREINIT or O_PRESTART.

Kind Regards,
Maarten


^ permalink raw reply

* Re: RFC: Serial port DTR/RTS - O_<something>
From: H. Peter Anvin @ 2025-11-13 22:24 UTC (permalink / raw)
  To: Greg KH
  Cc: Theodore Ts'o, Maarten Brock, linux-serial@vger.kernel.org,
	linux-api@vger.kernel.org, LKML
In-Reply-To: <2025111227-equipment-magnetism-1443@gregkh>

On 2025-11-12 11:39, Greg KH wrote:
>>
>> 1. Opening a device for configuration as opposed to data streaming; in the tty case that doesn't just improve the DTR# and RTS# issue but allows setserial, configuring line disciplines and so on.
>>
>> As I have said, this is application-specific intent, which is why I strongly believe that it needs to be part of the open system call. I furthermore believe that it would have use cases beyond ttys and serial ports, which is why I'm proposing a new open flag as opposed to a sysfs attribute, which actually was my initial approach (yes, I have already prototyped some of this, and as referenced before there is an existing patchset that was never merged.)
> 
> I think this is going to be the most difficult.  I don't remember why I
> rejected the old submission, but maybe it would have modified the
> existing behaviour?  A new open flag "O_DO_NOT_TOUCH_ANYTHING" might be
> the simplest?
> 

Okay, to I'm going to toss out a couple suggestions for naming:

	O_(PRE|FOR|N|NO)?(INIT|CONFIG|START)(DEV|HW|IO)?
	O_(NO?RESET|PREPARE)(DEV|HW|IO)?
	O_NO?TOUCH
	O_NYET ("not yet")
	
I think my personal preference at the moment is either O_NYET or O_PRECONFIG
or O_NYET; although it is perhaps a bit more "use case centric" than "what
actual effect it has" I think it might be clearer.  A -DEV, -HW or -IO suffix
would seem to needlessly preclude it being used for future similar use cases
for files that are not device nodes.

O_NYET ("not yet") is kind of attractive because it has some geekish smirk
value, doesn't have "obvious enough" meaning that if you don't know what it
does you'll guess rather than looking it up, but once you know you are not
going to forget it!  There is even precedent: USB 2 already has the NYET
packet type meaning just "not yet".  The more I'm thinking about it the more
am starting to like it...

Many of the other combinations have the problem of seeming to do the opposite
of what the used wants in some use cases; it seems rather odd to open a device
node that you are intending to configure with "O_NOCONFIG".  On the other
hand, "O_CONFIG" might be a valid indication of the intent (like O_RDONLY or
O_RDWR are indicator of intent), but also has the implication that it *will*
cause the device to configure itself.  It also would seem to imply that the
resulting file descriptor can *only* be used for that purpose.

	-hpa


^ permalink raw reply

* Re: [PATCH v5 02/22] liveupdate: luo_core: integrate with KHO
From: Pasha Tatashin @ 2025-11-13 18:38 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRYH_Ugp1IiUQdlM@kernel.org>

On Thu, Nov 13, 2025 at 11:32 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Wed, Nov 12, 2025 at 09:58:27AM -0500, Pasha Tatashin wrote:
> > On Wed, Nov 12, 2025 at 8:25 AM Mike Rapoport <rppt@kernel.org> wrote:
> > >
> > > Hi Pasha,
> > >
> > > On Tue, Nov 11, 2025 at 03:57:39PM -0500, Pasha Tatashin wrote:
> > > > Hi Mike,
> > > >
> > > > Thank you for review, my comments below:
> > > >
> > > > > > This is why this call is placed first in reboot(), before any
> > > > > > irreversible reboot notifiers or shutdown callbacks are performed. If
> > > > > > an allocation problem occurs in KHO, the error is simply reported back
> > > > > > to userspace, and the live update update is safely aborted.
> > >
> > > The call to liveupdate_reboot() is just before kernel_kexec(). Why we don't
> > > move it there?
> >
> > Yes, I can move that call into kernel_kexec().
> >
> > > And all the liveupdate_reboot() does if kho_finalize() fails it's massaging
> > > the error value before returning it to userspace. Why kernel_kexec() can't
> > > do the same?
> >
> > We could do that. It would look something like this:
> >
> > if (liveupdate_enabled())
> >    kho_finalize();
> >
> > Because we want to do kho_finalize() from kernel_kexec only when we do
> > live update.
> >
> > > > > This is fine. But what I don't like is that we can't use kho without
> > > > > liveupdate. We are making debugfs optional, we have a way to call
> >
> > This is exactly the fix I proposed:
> >
> > 1. When live-update is enabled, always disable "finalize" debugfs API.
> > 2. When live-update is disabled, always enable "finalize" debugfs API.
>
> I don't mind the concept, what I do mind is sprinkling liveupdate_enabled()
> in KHO.

Sure, let's just unconditionally do kho_fill_kimage().

> How about we kill debugfs/kho/out/abort and make kho_finalize() overwrite
> an existing FDT if there was any?
>
> Abort was required to allow rollback for subsystems that had kho notifiers,
> but now notifiers are gone and kho_abort() only frees the memory
> serialization data. I don't see an issue with kho_finalize() from debugfs
> being a tad slower because of a call to kho_abort() and the liveupdate path
> anyway won't incur that penalty.

Sounds good to me.

> > > KHO should not call into liveupdate. That's layering violation.
> > > And "stateless KHO" does not really make it stateless, it only removes the
> > > memory serialization from kho_finalize(), but it's still required to pack
> > > the FDT.
> >
> > This touches on a point I've raised in the KHO sync meetings: to be
> > effective, the "stateless KHO" work must also make subtree add/remove
> > stateless. There should not be a separate "finalize" state just to
> > finish the FDT. The KHO FDT is tiny (only one page), and there are
> > only a handful of subtrees. Adding and removing subtrees is cheap; we
> > should be able to open FDT, modify it, and finish FDT on every
> > operation. There's no need for a special finalization state at kexec
> > time. KHO should be totally stateless.
>
> And as the first step we can drop 'if (!kho_out.finalized)' from
> kho_fill_kimage(). We might need to massage the check for valid FDT in
> kho_populate() to avoid unnecessary noise, but largely there's no issue
> with always passing KHO data in kimage.

Sounds good, let me work on this patch.

Pasha

^ permalink raw reply

* Re: [PATCH v5 18/22] docs: add documentation for memfd preservation via LUO
From: Pratyush Yadav @ 2025-11-13 16:59 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, rppt, dmatlack, rientjes, corbet,
	rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm,
	tj, yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, lennart,
	brauner, linux-api, linux-fsdevel, saeedm, ajayachandra, jgg,
	parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <CA+CK2bBmSD_YftJ-9w1zidLz2=a4NynnLz_gLPsScF145bu5dQ@mail.gmail.com>

On Thu, Nov 13 2025, Pasha Tatashin wrote:

>> +Limitations
>> +===========
>> +
>> +The current implementation has the following limitations:
>> +
>> +Size
>> +  Currently the size of the file is limited by the size of the FDT. The FDT can
>> +  be at of most ``MAX_PAGE_ORDER`` order. By default this is 4 MiB with 4K
>> +  pages. Each page in the file is tracked using 16 bytes. This limits the
>> +  maximum size of the file to 1 GiB.
>
> The above should be removed, as we are using KHO vmalloc that resolves
> this limitation. Pratyush, I suggest for v6 let's move memfd
> documnetation right into the code: memfd_luo.c and
> liveupdate/abi/memfd.h, and source it from there.

ACK. I think the section on behavior in different phases is also out of
date now, and the serialization format too. The format is more
accurately defined in include/linux/liveupdate/abi/memfd.h. So this
documentation needs an overhaul.

I don't mind moving it to the code and including it in the HTML docs via
kernel-doc. Will do that for the next revision.

>
> Keeping documentation with the code helps reduce code/doc divergence.
>
> Pasha

-- 
Regards,
Pratyush Yadav

^ permalink raw reply

* Re: [PATCH v5 18/22] docs: add documentation for memfd preservation via LUO
From: Pasha Tatashin @ 2025-11-13 16:55 UTC (permalink / raw)
  To: pratyush, jasonmiu, graf, pasha.tatashin, rppt, dmatlack,
	rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, zhangguopeng, linux, linux-kernel, linux-doc, linux-mm,
	gregkh, tglx, mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
	linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu,
	hughd, skhawaja, chrisl
In-Reply-To: <20251107210526.257742-19-pasha.tatashin@soleen.com>

> +Limitations
> +===========
> +
> +The current implementation has the following limitations:
> +
> +Size
> +  Currently the size of the file is limited by the size of the FDT. The FDT can
> +  be at of most ``MAX_PAGE_ORDER`` order. By default this is 4 MiB with 4K
> +  pages. Each page in the file is tracked using 16 bytes. This limits the
> +  maximum size of the file to 1 GiB.

The above should be removed, as we are using KHO vmalloc that resolves
this limitation. Pratyush, I suggest for v6 let's move memfd
documnetation right into the code: memfd_luo.c and
liveupdate/abi/memfd.h, and source it from there.

Keeping documentation with the code helps reduce code/doc divergence.

Pasha

^ permalink raw reply

* Re: [PATCH v5 02/22] liveupdate: luo_core: integrate with KHO
From: Mike Rapoport @ 2025-11-13 16:31 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <CA+CK2bAq-0Vz4jSRWnb_ut9AqG3RcH67JQj76GhoH0BaspWs2A@mail.gmail.com>

On Wed, Nov 12, 2025 at 09:58:27AM -0500, Pasha Tatashin wrote:
> On Wed, Nov 12, 2025 at 8:25 AM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > Hi Pasha,
> >
> > On Tue, Nov 11, 2025 at 03:57:39PM -0500, Pasha Tatashin wrote:
> > > Hi Mike,
> > >
> > > Thank you for review, my comments below:
> > >
> > > > > This is why this call is placed first in reboot(), before any
> > > > > irreversible reboot notifiers or shutdown callbacks are performed. If
> > > > > an allocation problem occurs in KHO, the error is simply reported back
> > > > > to userspace, and the live update update is safely aborted.
> >
> > The call to liveupdate_reboot() is just before kernel_kexec(). Why we don't
> > move it there?
> 
> Yes, I can move that call into kernel_kexec().
> 
> > And all the liveupdate_reboot() does if kho_finalize() fails it's massaging
> > the error value before returning it to userspace. Why kernel_kexec() can't
> > do the same?
> 
> We could do that. It would look something like this:
> 
> if (liveupdate_enabled())
>    kho_finalize();
> 
> Because we want to do kho_finalize() from kernel_kexec only when we do
> live update.
> 
> > > > This is fine. But what I don't like is that we can't use kho without
> > > > liveupdate. We are making debugfs optional, we have a way to call
> 
> This is exactly the fix I proposed:
> 
> 1. When live-update is enabled, always disable "finalize" debugfs API.
> 2. When live-update is disabled, always enable "finalize" debugfs API.

I don't mind the concept, what I do mind is sprinkling liveupdate_enabled()
in KHO.

How about we kill debugfs/kho/out/abort and make kho_finalize() overwrite
an existing FDT if there was any? 

Abort was required to allow rollback for subsystems that had kho notifiers,
but now notifiers are gone and kho_abort() only frees the memory
serialization data. I don't see an issue with kho_finalize() from debugfs
being a tad slower because of a call to kho_abort() and the liveupdate path
anyway won't incur that penalty.

> > KHO should not call into liveupdate. That's layering violation.
> > And "stateless KHO" does not really make it stateless, it only removes the
> > memory serialization from kho_finalize(), but it's still required to pack
> > the FDT.
> 
> This touches on a point I've raised in the KHO sync meetings: to be
> effective, the "stateless KHO" work must also make subtree add/remove
> stateless. There should not be a separate "finalize" state just to
> finish the FDT. The KHO FDT is tiny (only one page), and there are
> only a handful of subtrees. Adding and removing subtrees is cheap; we
> should be able to open FDT, modify it, and finish FDT on every
> operation. There's no need for a special finalization state at kexec
> time. KHO should be totally stateless.

And as the first step we can drop 'if (!kho_out.finalized)' from
kho_fill_kimage(). We might need to massage the check for valid FDT in
kho_populate() to avoid unnecessary noise, but largely there's no issue
with always passing KHO data in kimage.
 
> Thanks,
> Pasha

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: [PATCH RFC 4/4] arm64/io: Add {__raw_read|__raw_write}128 support
From: huangchenghai @ 2025-11-13 14:19 UTC (permalink / raw)
  To: Mark Rutland
  Cc: arnd, catalin.marinas, will, akpm, anshuman.khandual,
	ryan.roberts, andriy.shevchenko, herbert, linux-kernel,
	linux-arch, linux-arm-kernel, linux-crypto, linux-api, fanghao11,
	shenyang39, liulongfang, qianweili
In-Reply-To: <aRR9UesvUCFLdVoW@J2N7QTR9R3>


在 2025/11/12 20:28, Mark Rutland 写道:
> On Wed, Nov 12, 2025 at 09:58:46AM +0800, Chenghai Huang wrote:
>> From: Weili Qian <qianweili@huawei.com>
>>
>> Starting from ARMv8.4, stp and ldp instructions become atomic.
> That's not true for accesses to Device memory types.
>
> Per ARM DDI 0487, L.b, section B2.2.1.1 ("Changes to single-copy atomicity in
> Armv8.4"):
>
>    If FEAT_LSE2 is implemented, LDP, LDNP, and STP instructions that load
>    or store two 64-bit registers are single-copy atomic when all of the
>    following conditions are true:
>    • The overall memory access is aligned to 16 bytes.
>    • Accesses are to Inner Write-Back, Outer Write-Back Normal cacheable memory.
>
> IIUC when used for Device memory types, those can be split, and a part
> of the access could be replayed multiple times (e.g. due to an
> intetrupt).
>
> I don't think we can add this generally. It is not atomic, and not
> generally safe.
>
> Mark.
Thanks for your correction. I misunderstood the behavior of LDP and
STP instructions. So, regarding device memory types, LDP and STP
instructions do not guarantee single-copy atomicity.

For devices that require 128-bit atomic access, is it only possible
to implement this functionality in the driver?

Chenghai
>
>> Currently, device drivers depend on 128-bit atomic memory IO access,
>> but these are implemented within the drivers. Therefore, this introduces
>> generic {__raw_read|__raw_write}128 function for 128-bit memory access.
>>
>> Signed-off-by: Weili Qian <qianweili@huawei.com>
>> Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
>> ---
>>   arch/arm64/include/asm/io.h | 21 +++++++++++++++++++++
>>   1 file changed, 21 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
>> index 83e03abbb2ca..80430750a28c 100644
>> --- a/arch/arm64/include/asm/io.h
>> +++ b/arch/arm64/include/asm/io.h
>> @@ -50,6 +50,17 @@ static __always_inline void __raw_writeq(u64 val, volatile void __iomem *addr)
>>   	asm volatile("str %x0, %1" : : "rZ" (val), "Qo" (*ptr));
>>   }
>>   
>> +#define __raw_write128 __raw_write128
>> +static __always_inline void __raw_write128(u128 val, volatile void __iomem *addr)
>> +{
>> +	u64 low, high;
>> +
>> +	low = val;
>> +	high = (u64)(val >> 64);
>> +
>> +	asm volatile ("stp %x0, %x1, [%2]\n" :: "rZ"(low), "rZ"(high), "r"(addr));
>> +}
>> +
>>   #define __raw_readb __raw_readb
>>   static __always_inline u8 __raw_readb(const volatile void __iomem *addr)
>>   {
>> @@ -95,6 +106,16 @@ static __always_inline u64 __raw_readq(const volatile void __iomem *addr)
>>   	return val;
>>   }
>>   
>> +#define __raw_read128 __raw_read128
>> +static __always_inline u128 __raw_read128(const volatile void __iomem *addr)
>> +{
>> +	u64 high, low;
>> +
>> +	asm volatile("ldp %0, %1, [%2]" : "=r" (low), "=r" (high) : "r" (addr));
>> +
>> +	return (((u128)high << 64) | (u128)low);
>> +}
>> +
>>   /* IO barriers */
>>   #define __io_ar(v)							\
>>   ({									\
>> -- 
>> 2.33.0
>>
>>

^ permalink raw reply

* Re: [PATCH v5 01/22] liveupdate: luo_core: luo_ioctl: Live Update Orchestrator
From: Pasha Tatashin @ 2025-11-13 13:56 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRXfKPfoi96B68Ef@kernel.org>

> > +/**
> > + * DOC: General ioctl format
> > + *
>
> It seems it's not linked from Documentation/.../liveupdate.rst

It is linked:
Here is uAPI: https://docs.kernel.org/next/userspace-api/liveupdate.html

And also from the main Doc:
https://docs.kernel.org/next/core-api/liveupdate.html
There is a link in "Sea Also" section: Live Update uAPI

> > + * The ioctl interface follows a general format to allow for extensibility. Each
> > + * ioctl is passed in a structure pointer as the argument providing the size of
> > + * the structure in the first u32. The kernel checks that any structure space
> > + * beyond what it understands is 0. This allows userspace to use the backward
> > + * compatible portion while consistently using the newer, larger, structures.
> > + *
> > + * ioctls use a standard meaning for common errnos:
> > + *
> > + *  - ENOTTY: The IOCTL number itself is not supported at all
> > + *  - E2BIG: The IOCTL number is supported, but the provided structure has
> > + *    non-zero in a part the kernel does not understand.
> > + *  - EOPNOTSUPP: The IOCTL number is supported, and the structure is
> > + *    understood, however a known field has a value the kernel does not
> > + *    understand or support.
> > + *  - EINVAL: Everything about the IOCTL was understood, but a field is not
> > + *    correct.
> > + *  - ENOENT: A provided token does not exist.
> > + *  - ENOMEM: Out of memory.
> > + *  - EOVERFLOW: Mathematics overflowed.
> > + *
> > + * As well as additional errnos, within specific ioctls.
>
> ...
>
> > --- a/kernel/liveupdate/Kconfig
> > +++ b/kernel/liveupdate/Kconfig
> > @@ -1,7 +1,34 @@
> >  # SPDX-License-Identifier: GPL-2.0-only
> > +#
> > +# Copyright (c) 2025, Google LLC.
> > +# Pasha Tatashin <pasha.tatashin@soleen.com>
> > +#
> > +# Live Update Orchestrator
> > +#
> >
> >  menu "Live Update and Kexec HandOver"
> >
> > +config LIVEUPDATE
> > +     bool "Live Update Orchestrator"
> > +     depends on KEXEC_HANDOVER
> > +     help
> > +       Enable the Live Update Orchestrator. Live Update is a mechanism,
> > +       typically based on kexec, that allows the kernel to be updated
> > +       while keeping selected devices operational across the transition.
> > +       These devices are intended to be reclaimed by the new kernel and
> > +       re-attached to their original workload without requiring a device
> > +       reset.
> > +
> > +       Ability to handover a device from current to the next kernel depends
> > +       on specific support within device drivers and related kernel
> > +       subsystems.
> > +
> > +       This feature primarily targets virtual machine hosts to quickly update
> > +       the kernel hypervisor with minimal disruption to the running virtual
> > +       machines.
> > +
> > +       If unsure, say N.
> > +
>
> Not a big deal, but since LIVEUPDATE depends on KEXEC_HANDOVER, shouldn't
> it go after KEXEC_HANDOVER?

Sure, I'll move them to the end of the file.

Thanks,
Pasha

^ permalink raw reply

* Re: [PATCH v5 01/22] liveupdate: luo_core: luo_ioctl: Live Update Orchestrator
From: Mike Rapoport @ 2025-11-13 13:37 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251107210526.257742-2-pasha.tatashin@soleen.com>

On Fri, Nov 07, 2025 at 04:02:59PM -0500, Pasha Tatashin wrote:
> Introduce LUO, a mechanism intended to facilitate kernel updates while
> keeping designated devices operational across the transition (e.g., via
> kexec). The primary use case is updating hypervisors with minimal
> disruption to running virtual machines. For userspace side of hypervisor
> update we have copyless migration. LUO is for updating the kernel.
> 
> This initial patch lays the groundwork for the LUO subsystem.
> 
> Further functionality, including the implementation of state transition
> logic, integration with KHO, and hooks for subsystems and file
> descriptors, will be added in subsequent patches.
> 
> Create a character device at /dev/liveupdate.
> 
> A new uAPI header, <uapi/linux/liveupdate.h>, will define the necessary
> structures. The magic number for IOCTL is registered in
> Documentation/userspace-api/ioctl/ioctl-number.rst.
> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---

...

> @@ -0,0 +1,46 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +
> +/*
> + * Userspace interface for /dev/liveupdate
> + * Live Update Orchestrator
> + *
> + * Copyright (c) 2025, Google LLC.
> + * Pasha Tatashin <pasha.tatashin@soleen.com>
> + */
> +
> +#ifndef _UAPI_LIVEUPDATE_H
> +#define _UAPI_LIVEUPDATE_H
> +
> +#include <linux/ioctl.h>
> +#include <linux/types.h>
> +
> +/**
> + * DOC: General ioctl format
> + *

It seems it's not linked from Documentation/.../liveupdate.rst

> + * The ioctl interface follows a general format to allow for extensibility. Each
> + * ioctl is passed in a structure pointer as the argument providing the size of
> + * the structure in the first u32. The kernel checks that any structure space
> + * beyond what it understands is 0. This allows userspace to use the backward
> + * compatible portion while consistently using the newer, larger, structures.
> + *
> + * ioctls use a standard meaning for common errnos:
> + *
> + *  - ENOTTY: The IOCTL number itself is not supported at all
> + *  - E2BIG: The IOCTL number is supported, but the provided structure has
> + *    non-zero in a part the kernel does not understand.
> + *  - EOPNOTSUPP: The IOCTL number is supported, and the structure is
> + *    understood, however a known field has a value the kernel does not
> + *    understand or support.
> + *  - EINVAL: Everything about the IOCTL was understood, but a field is not
> + *    correct.
> + *  - ENOENT: A provided token does not exist.
> + *  - ENOMEM: Out of memory.
> + *  - EOVERFLOW: Mathematics overflowed.
> + *
> + * As well as additional errnos, within specific ioctls.

...

> --- a/kernel/liveupdate/Kconfig
> +++ b/kernel/liveupdate/Kconfig
> @@ -1,7 +1,34 @@
>  # SPDX-License-Identifier: GPL-2.0-only
> +#
> +# Copyright (c) 2025, Google LLC.
> +# Pasha Tatashin <pasha.tatashin@soleen.com>
> +#
> +# Live Update Orchestrator
> +#
>  
>  menu "Live Update and Kexec HandOver"
>  
> +config LIVEUPDATE
> +	bool "Live Update Orchestrator"
> +	depends on KEXEC_HANDOVER
> +	help
> +	  Enable the Live Update Orchestrator. Live Update is a mechanism,
> +	  typically based on kexec, that allows the kernel to be updated
> +	  while keeping selected devices operational across the transition.
> +	  These devices are intended to be reclaimed by the new kernel and
> +	  re-attached to their original workload without requiring a device
> +	  reset.
> +
> +	  Ability to handover a device from current to the next kernel depends
> +	  on specific support within device drivers and related kernel
> +	  subsystems.
> +
> +	  This feature primarily targets virtual machine hosts to quickly update
> +	  the kernel hypervisor with minimal disruption to the running virtual
> +	  machines.
> +
> +	  If unsure, say N.
> +

Not a big deal, but since LIVEUPDATE depends on KEXEC_HANDOVER, shouldn't
it go after KEXEC_HANDOVER?

>  config KEXEC_HANDOVER
>  	bool "kexec handover"
>  	depends on ARCH_SUPPORTS_KEXEC_HANDOVER && ARCH_SUPPORTS_KEXEC_FILE

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: [PATCH RFC 3/4] io-128-nonatomic: introduce io{read|write}128_{lo_hi|hi_lo}
From: huangchenghai @ 2025-11-13 11:10 UTC (permalink / raw)
  To: Ben Dooks, arnd, catalin.marinas, will, akpm, anshuman.khandual,
	ryan.roberts, andriy.shevchenko, herbert, linux-kernel,
	linux-arch, linux-arm-kernel, linux-crypto, linux-api
  Cc: fanghao11, shenyang39, liulongfang, qianweili
In-Reply-To: <59f8bc30-c1c6-4f07-87dd-cd2893ae87f7@codethink.co.uk>


在 2025/11/12 22:48, Ben Dooks 写道:
> On 12/11/2025 01:58, Chenghai Huang wrote:
>> From: Weili Qian <qianweili@huawei.com>
>>
>> In order to provide non-atomic functions for io{read|write}128.
>> We define a number of variants of these functions in the generic
>> iomap that will do non-atomic operations.
>>
>> These functions are only defined if io{read|write}128 are defined.
>> If they are not, then the wrappers that always use non-atomic operations
>> from include/linux/io-128-nonatomic*.h will be used.
>>
>> Signed-off-by: Weili Qian <qianweili@huawei.com>
>> Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
>> ---
>>   include/linux/io-128-nonatomic-hi-lo.h | 35 ++++++++++++++++++++++++++
>>   include/linux/io-128-nonatomic-lo-hi.h | 34 +++++++++++++++++++++++++
>>   2 files changed, 69 insertions(+)
>>   create mode 100644 include/linux/io-128-nonatomic-hi-lo.h
>>   create mode 100644 include/linux/io-128-nonatomic-lo-hi.h
>>
>> diff --git a/include/linux/io-128-nonatomic-hi-lo.h 
>> b/include/linux/io-128-nonatomic-hi-lo.h
>> new file mode 100644
>> index 000000000000..b5b083a9e81b
>> --- /dev/null
>> +++ b/include/linux/io-128-nonatomic-hi-lo.h
>> @@ -0,0 +1,35 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef _LINUX_IO_128_NONATOMIC_HI_LO_H_
>> +#define _LINUX_IO_128_NONATOMIC_HI_LO_H_
>> +
>> +#include <linux/io.h>
>> +#include <asm-generic/int-ll64.h>
>> +
>> +static inline u128 ioread128_hi_lo(const void __iomem *addr)
>> +{
>> +    u32 low, high;
>
> did you mean u64 here?
>
Thank you for your reminder, I made a rookie mistake.


Chenghai

>> +    high = ioread64(addr + sizeof(u64));
>> +    low = ioread64(addr);
>> +
>> +    return low + ((u128)high << 64);
>> +}
>> +
>> +static inline void iowrite128_hi_lo(u128 val, void __iomem *addr)
>> +{
>> +    iowrite64(val >> 64, addr + sizeof(u64));
>> +    iowrite64(val, addr);
>> +}
>> +
>

^ permalink raw reply

* Re: [PATCH v5 06/22] liveupdate: luo_session: add sessions support
From: Pasha Tatashin @ 2025-11-12 20:47 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRTwZNKFvDqb1NG5@kernel.org>

On Wed, Nov 12, 2025 at 3:39 PM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Fri, Nov 07, 2025 at 04:03:04PM -0500, Pasha Tatashin wrote:
> > Introduce concept of "Live Update Sessions" within the LUO framework.
> > LUO sessions provide a mechanism to group and manage `struct file *`
> > instances (representing file descriptors) that need to be preserved
> > across a kexec-based live update.
> >
> > Each session is identified by a unique name and acts as a container
> > for file objects whose state is critical to a userspace workload, such
> > as a virtual machine or a high-performance database, aiming to maintain
> > their functionality across a kernel transition.
> >
> > This groundwork establishes the framework for preserving file-backed
> > state across kernel updates, with the actual file data preservation
> > mechanisms to be implemented in subsequent patches.
> >
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> > ---
> >  include/linux/liveupdate/abi/luo.h |  81 ++++++
> >  include/uapi/linux/liveupdate.h    |   3 +
> >  kernel/liveupdate/Makefile         |   3 +-
> >  kernel/liveupdate/luo_core.c       |   9 +
> >  kernel/liveupdate/luo_internal.h   |  39 +++
> >  kernel/liveupdate/luo_session.c    | 405 +++++++++++++++++++++++++++++
> >  6 files changed, 539 insertions(+), 1 deletion(-)
> >  create mode 100644 kernel/liveupdate/luo_session.c
> >
> > diff --git a/include/linux/liveupdate/abi/luo.h b/include/linux/liveupdate/abi/luo.h
> > index 9483a294287f..37b9fecef3f7 100644
> > --- a/include/linux/liveupdate/abi/luo.h
> > +++ b/include/linux/liveupdate/abi/luo.h
> > @@ -28,6 +28,11 @@
> >   *     / {
> >   *         compatible = "luo-v1";
> >   *         liveupdate-number = <...>;
> > + *
> > + *         luo-session {
> > + *             compatible = "luo-session-v1";
> > + *             luo-session-head = <phys_addr_of_session_head_ser>;
>
> 'head' reads to me as list head rather than a header. I'd use 'hdr' for the
> latter.

Or just use the full name: "header" ? It is not too long as well.

Pasha

>
> --
> Sincerely yours,
> Mike.

^ permalink raw reply

* Re: [PATCH v5 22/22] tests/liveupdate: Add in-kernel liveupdate test
From: Pasha Tatashin @ 2025-11-12 20:40 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRTs3ZouoL1CGHst@kernel.org>

On Wed, Nov 12, 2025 at 3:24 PM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Fri, Nov 07, 2025 at 04:03:20PM -0500, Pasha Tatashin wrote:
> > Introduce an in-kernel test module to validate the core logic of the
> > Live Update Orchestrator's File-Lifecycle-Bound feature. This
> > provides a low-level, controlled environment to test FLB registration
> > and callback invocation without requiring userspace interaction or
> > actual kexec reboots.
> >
> > The test is enabled by the CONFIG_LIVEUPDATE_TEST Kconfig option.
> >
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> > ---
> >  kernel/liveupdate/luo_file.c     |   2 +
> >  kernel/liveupdate/luo_internal.h |   8 ++
> >  lib/Kconfig.debug                |  23 ++++++
> >  lib/tests/Makefile               |   1 +
> >  lib/tests/liveupdate.c           | 130 +++++++++++++++++++++++++++++++
> >  5 files changed, 164 insertions(+)
> >  create mode 100644 lib/tests/liveupdate.c
> >
> > diff --git a/kernel/liveupdate/luo_file.c b/kernel/liveupdate/luo_file.c
> > index 713069b96278..4c0a75918f3d 100644
> > --- a/kernel/liveupdate/luo_file.c
> > +++ b/kernel/liveupdate/luo_file.c
> > @@ -829,6 +829,8 @@ int liveupdate_register_file_handler(struct liveupdate_file_handler *fh)
> >       INIT_LIST_HEAD(&fh->flb_list);
> >       list_add_tail(&fh->list, &luo_file_handler_list);
> >
> > +     liveupdate_test_register(fh);
> > +
>
> Do it mean that every flb user will be added here?

No, FLB users will use:

liveupdate_register_flb() from various subsystems. This
liveupdate_test_register() is only to allow kernel test to register
test-FLBs to every single file-handler for in-kernel testing purpose
only.

Pasha

^ permalink raw reply

* Re: [PATCH v5 06/22] liveupdate: luo_session: add sessions support
From: Mike Rapoport @ 2025-11-12 20:39 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251107210526.257742-7-pasha.tatashin@soleen.com>

On Fri, Nov 07, 2025 at 04:03:04PM -0500, Pasha Tatashin wrote:
> Introduce concept of "Live Update Sessions" within the LUO framework.
> LUO sessions provide a mechanism to group and manage `struct file *`
> instances (representing file descriptors) that need to be preserved
> across a kexec-based live update.
> 
> Each session is identified by a unique name and acts as a container
> for file objects whose state is critical to a userspace workload, such
> as a virtual machine or a high-performance database, aiming to maintain
> their functionality across a kernel transition.
> 
> This groundwork establishes the framework for preserving file-backed
> state across kernel updates, with the actual file data preservation
> mechanisms to be implemented in subsequent patches.
> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
>  include/linux/liveupdate/abi/luo.h |  81 ++++++
>  include/uapi/linux/liveupdate.h    |   3 +
>  kernel/liveupdate/Makefile         |   3 +-
>  kernel/liveupdate/luo_core.c       |   9 +
>  kernel/liveupdate/luo_internal.h   |  39 +++
>  kernel/liveupdate/luo_session.c    | 405 +++++++++++++++++++++++++++++
>  6 files changed, 539 insertions(+), 1 deletion(-)
>  create mode 100644 kernel/liveupdate/luo_session.c
> 
> diff --git a/include/linux/liveupdate/abi/luo.h b/include/linux/liveupdate/abi/luo.h
> index 9483a294287f..37b9fecef3f7 100644
> --- a/include/linux/liveupdate/abi/luo.h
> +++ b/include/linux/liveupdate/abi/luo.h
> @@ -28,6 +28,11 @@
>   *     / {
>   *         compatible = "luo-v1";
>   *         liveupdate-number = <...>;
> + *
> + *         luo-session {
> + *             compatible = "luo-session-v1";
> + *             luo-session-head = <phys_addr_of_session_head_ser>;

'head' reads to me as list head rather than a header. I'd use 'hdr' for the
latter.

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: [PATCH v5 22/22] tests/liveupdate: Add in-kernel liveupdate test
From: Mike Rapoport @ 2025-11-12 20:23 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251107210526.257742-23-pasha.tatashin@soleen.com>

On Fri, Nov 07, 2025 at 04:03:20PM -0500, Pasha Tatashin wrote:
> Introduce an in-kernel test module to validate the core logic of the
> Live Update Orchestrator's File-Lifecycle-Bound feature. This
> provides a low-level, controlled environment to test FLB registration
> and callback invocation without requiring userspace interaction or
> actual kexec reboots.
> 
> The test is enabled by the CONFIG_LIVEUPDATE_TEST Kconfig option.
> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
>  kernel/liveupdate/luo_file.c     |   2 +
>  kernel/liveupdate/luo_internal.h |   8 ++
>  lib/Kconfig.debug                |  23 ++++++
>  lib/tests/Makefile               |   1 +
>  lib/tests/liveupdate.c           | 130 +++++++++++++++++++++++++++++++
>  5 files changed, 164 insertions(+)
>  create mode 100644 lib/tests/liveupdate.c
> 
> diff --git a/kernel/liveupdate/luo_file.c b/kernel/liveupdate/luo_file.c
> index 713069b96278..4c0a75918f3d 100644
> --- a/kernel/liveupdate/luo_file.c
> +++ b/kernel/liveupdate/luo_file.c
> @@ -829,6 +829,8 @@ int liveupdate_register_file_handler(struct liveupdate_file_handler *fh)
>  	INIT_LIST_HEAD(&fh->flb_list);
>  	list_add_tail(&fh->list, &luo_file_handler_list);
>  
> +	liveupdate_test_register(fh);
> +

Do it mean that every flb user will be added here?

>  	return 0;
>  }
>  

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: RFC: Serial port DTR/RTS - O_NRESETDEV
From: H. Peter Anvin @ 2025-11-12 19:55 UTC (permalink / raw)
  To: Greg KH
  Cc: Theodore Ts'o, Maarten Brock, linux-serial@vger.kernel.org,
	linux-api@vger.kernel.org, LKML
In-Reply-To: <2025111227-equipment-magnetism-1443@gregkh>

On 2025-11-12 11:39, Greg KH wrote:
> Trimming out stuff to get to the real questions:
> 
> On Wed, Nov 12, 2025 at 11:12:22AM -0800, H. Peter Anvin wrote:
>> Things that I have identified, at least in my opinion:
>>
>> 1. Opening a device for configuration as opposed to data streaming; in the tty case that doesn't just improve the DTR# and RTS# issue but allows setserial, configuring line disciplines and so on.
>>
>> As I have said, this is application-specific intent, which is why I strongly believe that it needs to be part of the open system call. I furthermore believe that it would have use cases beyond ttys and serial ports, which is why I'm proposing a new open flag as opposed to a sysfs attribute, which actually was my initial approach (yes, I have already prototyped some of this, and as referenced before there is an existing patchset that was never merged.)
> 
> I think this is going to be the most difficult.  I don't remember why I
> rejected the old submission, but maybe it would have modified the
> existing behaviour?  A new open flag "O_DO_NOT_TOUCH_ANYTHING" might be
> the simplest?

That was exactly my proposal - see the header of this thread :)

>> 3. The only way to determine the type of a tty driver is reading and parsing /proc/tty/drivers; that information is exported neither through ioctl nor sysfs. Exporting *that* through sysfs is probably the easiest of all the improvements.
> 
> The "type" is interesting.  We keep adding new "types" of serial ports
> to the uapi list, and they don't really show up very well to userspace,
> as you say.  Adding this export to sysfs is fine with me, but we should
> make it a string somehow, and not just a random number like the current
> types are listed as, to give people a chance to keep track of this.
> 
> So yes, this too should be done.

I meant to add this to the previous email -- the obvious choice (and what is
in my prototype) is to use the same string as is currently exposed in
/proc/tty/drivers.

	-hpa


^ permalink raw reply

* Re: RFC: Serial port DTR/RTS - O_NRESETDEV
From: H. Peter Anvin @ 2025-11-12 19:53 UTC (permalink / raw)
  To: Greg KH
  Cc: Theodore Ts'o, Maarten Brock, linux-serial@vger.kernel.org,
	linux-api@vger.kernel.org, LKML
In-Reply-To: <2025111227-equipment-magnetism-1443@gregkh>

On 2025-11-12 11:39, Greg KH wrote:
> 
>> 3. The only way to determine the type of a tty driver is reading and parsing /proc/tty/drivers; that information is exported neither through ioctl nor sysfs. Exporting *that* through sysfs is probably the easiest of all the improvements.
> 
> The "type" is interesting.  We keep adding new "types" of serial ports
> to the uapi list, and they don't really show up very well to userspace,
> as you say.  Adding this export to sysfs is fine with me, but we should
> make it a string somehow, and not just a random number like the current
> types are listed as, to give people a chance to keep track of this.
> 
> So yes, this too should be done.
> 

Yes, this one is pretty obvious:

>> 4. There isn't a device-independent way to determine if a device is "real" (configured for hardware) or not without opening it and executing one of the termios ioctls like TCGETS (returns -EIO if there isn't anything behind it.) For a UART port it is possible to come up with an educated guess based on the aforementioned sysfs properties (does it have any kind of address associated with it?), but seriously, should stty -a /dev/ttyS0 really glitch RTS# and DTR# even though there is no intent of using the port for communication? 
> 
> Determining "realness" is going to be hard I think (is a usb-serial
> device real or not?  Some are, some are not, but how do we even know?)
> Does a "real" uart mean that the device is real?  How do you define
> that?  What about virtual ones?  Modem chips that do have full line
> discipline support on USB connections?  There's a lot out there to deal
> with here and I think some "fake" ones do pass TCGETS calls just because
> they lie.)
> 

What I mean with "real" is that the device exists at all, unlike e.g.
/dev/ttyS* device nodes which are *only* available for the purpose of binding.

So "bound to a hardware device" is what I mean, not that it is a device with
RS232 drivers on it (which would be impossible to determine, as you very
correctly point out.)

> And addresses are only the "very old" method, many "real" PCI uarts
> don't have them, same for USB ones.
> 
> And changing 'stty -a' is going to be hard, unless you want to use the
> new flag?

That's exactly the idea: use the new open flag.

> But yes, making this more sane is always good, 2 of your things here
> should be pretty simple to knock up if someone wants to.  The others
> might be more difficult just due to backwards compatibility issues.


Indeed. Which is the whole reason for this RFC thread.

	-hpa


^ permalink raw reply

* Re: RFC: Serial port DTR/RTS - O_NRESETDEV
From: Greg KH @ 2025-11-12 19:39 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Theodore Ts'o, Maarten Brock, linux-serial@vger.kernel.org,
	linux-api@vger.kernel.org, LKML
In-Reply-To: <DD67C0CF-D330-4D40-B610-FD3EB7AA0218@zytor.com>

Trimming out stuff to get to the real questions:

On Wed, Nov 12, 2025 at 11:12:22AM -0800, H. Peter Anvin wrote:
> Things that I have identified, at least in my opinion:
> 
> 1. Opening a device for configuration as opposed to data streaming; in the tty case that doesn't just improve the DTR# and RTS# issue but allows setserial, configuring line disciplines and so on.
> 
> As I have said, this is application-specific intent, which is why I strongly believe that it needs to be part of the open system call. I furthermore believe that it would have use cases beyond ttys and serial ports, which is why I'm proposing a new open flag as opposed to a sysfs attribute, which actually was my initial approach (yes, I have already prototyped some of this, and as referenced before there is an existing patchset that was never merged.)

I think this is going to be the most difficult.  I don't remember why I
rejected the old submission, but maybe it would have modified the
existing behaviour?  A new open flag "O_DO_NOT_TOUCH_ANYTHING" might be
the simplest?

> 2. Currently the setserial configurables are available in sysfs, but *only* for UARTs, whereas TIOC[GS]SERIAL is at least available to all serial devices. That code should presumably be hoisted into a higher layer; this shouldn't be too difficult.

I agree, this shouldn't be hard, no reason to not do this.

> 3. The only way to determine the type of a tty driver is reading and parsing /proc/tty/drivers; that information is exported neither through ioctl nor sysfs. Exporting *that* through sysfs is probably the easiest of all the improvements.

The "type" is interesting.  We keep adding new "types" of serial ports
to the uapi list, and they don't really show up very well to userspace,
as you say.  Adding this export to sysfs is fine with me, but we should
make it a string somehow, and not just a random number like the current
types are listed as, to give people a chance to keep track of this.

So yes, this too should be done.

> 4. There isn't a device-independent way to determine if a device is "real" (configured for hardware) or not without opening it and executing one of the termios ioctls like TCGETS (returns -EIO if there isn't anything behind it.) For a UART port it is possible to come up with an educated guess based on the aforementioned sysfs properties (does it have any kind of address associated with it?), but seriously, should stty -a /dev/ttyS0 really glitch RTS# and DTR# even though there is no intent of using the port for communication? 

Determining "realness" is going to be hard I think (is a usb-serial
device real or not?  Some are, some are not, but how do we even know?)
Does a "real" uart mean that the device is real?  How do you define
that?  What about virtual ones?  Modem chips that do have full line
discipline support on USB connections?  There's a lot out there to deal
with here and I think some "fake" ones do pass TCGETS calls just because
they lie.)

And addresses are only the "very old" method, many "real" PCI uarts
don't have them, same for USB ones.

And changing 'stty -a' is going to be hard, unless you want to use the
new flag?

But yes, making this more sane is always good, 2 of your things here
should be pretty simple to knock up if someone wants to.  The others
might be more difficult just due to backwards compatibility issues.

thanks,

greg k-h

^ permalink raw reply

* Re: RFC: Serial port DTR/RTS - O_NRESETDEV
From: H. Peter Anvin @ 2025-11-12 19:12 UTC (permalink / raw)
  To: Greg KH
  Cc: Theodore Ts'o, Maarten Brock, linux-serial@vger.kernel.org,
	linux-api@vger.kernel.org, LKML
In-Reply-To: <2025111241-domestic-moonstone-f75f@gregkh>

On November 12, 2025 8:46:50 AM PST, Greg KH <gregkh@linuxfoundation.org> wrote:
>On Wed, Nov 12, 2025 at 08:09:45AM -0800, H. Peter Anvin wrote:
>> On November 12, 2025 3:22:56 AM PST, Greg KH <gregkh@linuxfoundation.org> wrote:
>> >On Mon, Nov 10, 2025 at 07:57:22PM -0800, H. Peter Anvin wrote:
>> >> Honestly, though, I'm far less interested in what 8250-based hardware does than e.g. USB.
>> >
>> >hahahahahahaha {snort}
>> >
>> >Hah.  that's a good one.
>> >
>> >Oh, you aren't kidding.
>> >
>> >Wow, good luck with this.  USB-serial adaptors are all over the place,
>> >some have real uarts in them (and so do buffering in the device, and
>> >line handling in odd ways when powered up), and some are almost just a
>> >straight pipe through to the USB host with control line handling ideas
>> >tacked on to the side as an afterthought, if at all.
>> >
>> >There is no standard here, they all work differently, and even work
>> >differently across the same device type with just barely enough hints
>> >for us to determine what is going on.
>> >
>> >So don't worry about USB, if you throw that into the mix, all bets are
>> >off and you should NEVER rely on that.
>> >
>> >Remeber USB->serial was explicitly rejected by the USB standard group,
>> >only to have it come back in the "side door" through the spec process
>> >when it turned out that Microsoft hated having to write a zillion
>> >different vendor-specific drivers because the vendor provided ones kept
>> >crashing user's machines.  So what we ended up with was "just enough" to
>> >make it through the spec process, and even then line signals are
>> >probably never tested so you can't rely on them.
>> >
>> >good luck!
>> >
>> >greg "this brought up too many bad memories" k-h
>> 
>> Ugh.
>> 
>> I have made it very clear that I am very aware that there is broken hardware. 
>
>I would posit that there is NO "non-broken" usb->serial devices out
>there.  The closest I have seen was the old IO-Edgeport devices, but
>they were expensive and got bought out by some other company and in the
>end didn't succeed due to all of the "cheap" devices/chips out there
>that just did dumb tx/rx transfers over a fake serial connection.
>
>> What I'm trying to do is to deal with the (occasional) case of
>> *non*-broken hardware. Right now Linux breaks the non-broken hardware
>> for it, and I don't think the existence of broken hardware is a good
>> justification for that.
>
>No, but we have to handle both somehow.
>
>And given that we still get brand-new UART drivers sent to use every few
>months, there is just more and more "broken" hardware out there overall.
>
>Anyway, good luck coming up with a scheme to handle your crazy
>connections, I would push back and say "any device that treats a serial
>control line as a power signal is broken to start with" :)
>
>greg k-h

Yeah, well, I will certainly *not* argue with that one! Quite the contrary... *shudder*.

There are enough facepalms to go around, both on the DTE and DCE sides, and I'm not in any shape, way, or form denying that.

Nor am I trying to boil the ocean here. I'm just trying to figure out how to make the situation a bit more flexible to try to at least reduce the amount of brokenness we throw back in the face of the user. 

The reason I brought up USB is that while RTS# briefly glitching on an actual RS232 line is unlikely to actually make it through the driver, which will have a cutoff of at most 10 MHz and usually much less, an ACM device receiving a SET_LINE_CONTROL message may react to it immediately, especially if it is an emulated port.(*)

The same thing is true, of course, for "uart" lines, a.k.a. TTL or CMOS level serial, which have much higher bandwidth than any physical RS232 or RS485 drivers.

Incidentally, I'm not looking at this because of a huge need on my own part; I'm doing it because it irked me to no end that glibc still hadn't implemented support for the almost two decades old support in the Linux kernel for arbitrary serial port speeds, and in the end I ended up Just Writing The Code; but there as well I ended up having several discussions with the glibc maintainers about how to deal with the unavoidable compromises involved in evolving one l the interfaces; the POSIX termios interface design from back in the '80s really caused some serious headaches. 

There was fallout, as it exposed bugs in a bunch of software which had implemented their own hacks during the 17 years that glibc didn't provide any support for handling this. It gave me a *lot* of new insights in how various applications in the field actually do things and what they have to do to work around limitations in both hardware and software at the moment, and so I'm feeling kind of motivated to try to make these real life use cases a little bit less obnoxious.

The thing with RTS# and DTR# came up when I started rooting around in gtkterm's serial code probing code, but I had myself run into it using ESP32 modules.

Had I personally designed the ESP32 interface I would have used BREAK to pulse reset instead of RTS#, but that would have required a capacitor and a diode, and omg that would have added *cost!*

(This is in fact exactly what I did when I implemented my own ACM device in an FPGA, but that's a different story.)

As far as actual USB-to-serial devices are concerned, my *personal* experience with quite a few of them is that the line control signals are generally quite reliable, and transmitting BREAK usually works OK; however, whether *receiving* BREAK works is a crapshoot at the very best.

Then there are of course the ones that just lock up randomly, or seriously glitch on the USB side, but that's an entirely different kettle of fish.

And you are most definitely right that not standardizing a USB to serial device class from the very beginning was a very bad mistake. At least now newer devices tend toward using ACM and advertise as "driverless". It also allows the rest of the USB descriptor to contain more useful information about the DCE, assuming it is a device that is physically bound to or integrated in the DCE (virtual.)

I'm not saying you and Ted are *wrong*; you are most certainly not. What I'm hoping for is a bit of pragmatism that would make at least some users' lives a little easier.

None of this will, of course, help when the hardware itself is buggered to the point that there is nothing we can do about it. I'm not trying to deal with anything like that.

Things that I have identified, at least in my opinion:

1. Opening a device for configuration as opposed to data streaming; in the tty case that doesn't just improve the DTR# and RTS# issue but allows setserial, configuring line disciplines and so on.

As I have said, this is application-specific intent, which is why I strongly believe that it needs to be part of the open system call. I furthermore believe that it would have use cases beyond ttys and serial ports, which is why I'm proposing a new open flag as opposed to a sysfs attribute, which actually was my initial approach (yes, I have already prototyped some of this, and as referenced before there is an existing patchset that was never merged.)

2. Currently the setserial configurables are available in sysfs, but *only* for UARTs, whereas TIOC[GS]SERIAL is at least available to all serial devices. That code should presumably be hoisted into a higher layer; this shouldn't be too difficult.

3. The only way to determine the type of a tty driver is reading and parsing /proc/tty/drivers; that information is exported neither through ioctl nor sysfs. Exporting *that* through sysfs is probably the easiest of all the improvements.

4. There isn't a device-independent way to determine if a device is "real" (configured for hardware) or not without opening it and executing one of the termios ioctls like TCGETS (returns -EIO if there isn't anything behind it.) For a UART port it is possible to come up with an educated guess based on the aforementioned sysfs properties (does it have any kind of address associated with it?), but seriously, should stty -a /dev/ttyS0 really glitch RTS# and DTR# even though there is no intent of using the port for communication? 

Let me make it very clear that I'm *not* criticizing neither you, Ted, Alan Cox nor anyone else who have been involved: on the contrary, you have done an absolutely fantastic job making Linux work with all these pieces of hardware, each with various "interesting" properties. Nor am I criticizing the tty interface as it is: it is designed to allow both interactive terminal use and use for other purposes under program control, which is really an astonishing level of flexibility. We have already shown that it can evolve to meet new needs, which sometimes requires interface extensions – like O_NOCTTY, O_NONBLOCK, CRTSCTS, termios2, and BOTHER. *And that is perfectly okay.* If anything it is a strength.

And please do recognize that I have stated from the beginning that I expect this to be a "best effort" on the part of kernel, not a guarantee. If the hardware is too broken, the user gets to keep both pieces – that's just reality.


^ permalink raw reply

* Re: [PATCH v5 02/22] liveupdate: luo_core: integrate with KHO
From: Pasha Tatashin @ 2025-11-12 17:39 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <CA+CK2bA6vCH=RkiZjAOsh5iR52BY567bJB3HNAGqDb307YxVdw@mail.gmail.com>

On Wed, Nov 12, 2025 at 10:14 AM Pasha Tatashin
<pasha.tatashin@soleen.com> wrote:
>
> > > FLB global objects act similarly to subsystem-wide data, except their
> > > data has a clear creation and destruction time tied to preserved
> > > files. When the first file of a particular type is added to LUO, this
> > > global data is created; when the last file of that type is removed
> > > (unpreserved or finished), this global data is destroyed, this is why
> > > its life is bound to file lifecycle. Crucially, this global data is
> > > accessible at any time while LUO owns the associated files spanning
> > > the early boot update boundary.
> >
> > But there are no files at mm_core_init(). I'm really confused here.
>
> This isn't about the files themselves, but about the subsystem global
> data. The files are only used to describe the lifetime of this global
> data.
>
> I think mm_core_init() is too late, and the call would need to be
> moved earlier to work correctly with subsystems. At the very least, we
> will have to add some early FDT parsing to retrieve data during early
> boot, but that would be part of the HugeTLB preservation work.
>
> I can move liveupdate_init() inside kho_memory_init(), so we don't
> need to modify mm_core_init(). Or, rename kho_memory_init to
> kho_and_liveupdate_memory_init() and combine the two calls into a
> single function in kexec_handover.c.
>
> > > > So I think for now we can move liveupdate_init() later in boot and we will
> > > > solve the problem of hugetlb reservations when we add support for hugetlb.
> > >
> > > HugeTLB reserves memory early in boot. If we already have preserved
> > > HugeTLB pages via LUO/KHO, we must ensure they are counted against the
> > > boot-time reservation. For example, if hugetlb_cma_reserve() needs to
> > > reserve ten 1G pages, but LUO has already preserved seven, we only
> > > need to reserve three new pages and the rest are going to be restored
> > > with the files.
> > >
> > > Since this count is contained in the FLB global object, that data
> > > needs to be available during the early reservation phase. (Pratyush is
> > > working on HugeTLB preservation and can explain further).
> >
> > Not sure I really follow the design here, but in my understanding the gist
> > here is that hugetlb reservations need to be aware of the preserved state.
> > If that's the case, we definitely can move liveupdate_init() to an initcall
> > and revisit this when hugetlb support for luo comes along.
>
> This will break the in-kernel tests that ensure FLB data is accessible
> and works correctly during early boot, as they use
> early_initcall(liveupdate_test_early_init);.

We had a chat, so we agreed to move liveupdate_init() into
early_initcall() and liveupdate_test_early_init into somewhere later
initcall. And when HugeTLB support is added we will introduce a
variant for read-only access to do it early in boot from setup_arch().

> We cannot rely on early_initcall() for liveupdate_init() because it
> would compete with the test. We also can't move the test to a later
> initcall, as that would break the verification of what FLB is
> promising: early access to global data by subsystems that need it
> (PCI, IOMMU Core, HugeTLB, etc.).
>
> Thanks,
> Pasha

^ permalink raw reply

* Re: RFC: Serial port DTR/RTS - O_NRESETDEV
From: Greg KH @ 2025-11-12 16:46 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Theodore Ts'o, Maarten Brock, linux-serial@vger.kernel.org,
	linux-api@vger.kernel.org, LKML
In-Reply-To: <6DBB5931-ACD4-4174-9FCE-96C45FFC4603@zytor.com>

On Wed, Nov 12, 2025 at 08:09:45AM -0800, H. Peter Anvin wrote:
> On November 12, 2025 3:22:56 AM PST, Greg KH <gregkh@linuxfoundation.org> wrote:
> >On Mon, Nov 10, 2025 at 07:57:22PM -0800, H. Peter Anvin wrote:
> >> Honestly, though, I'm far less interested in what 8250-based hardware does than e.g. USB.
> >
> >hahahahahahaha {snort}
> >
> >Hah.  that's a good one.
> >
> >Oh, you aren't kidding.
> >
> >Wow, good luck with this.  USB-serial adaptors are all over the place,
> >some have real uarts in them (and so do buffering in the device, and
> >line handling in odd ways when powered up), and some are almost just a
> >straight pipe through to the USB host with control line handling ideas
> >tacked on to the side as an afterthought, if at all.
> >
> >There is no standard here, they all work differently, and even work
> >differently across the same device type with just barely enough hints
> >for us to determine what is going on.
> >
> >So don't worry about USB, if you throw that into the mix, all bets are
> >off and you should NEVER rely on that.
> >
> >Remeber USB->serial was explicitly rejected by the USB standard group,
> >only to have it come back in the "side door" through the spec process
> >when it turned out that Microsoft hated having to write a zillion
> >different vendor-specific drivers because the vendor provided ones kept
> >crashing user's machines.  So what we ended up with was "just enough" to
> >make it through the spec process, and even then line signals are
> >probably never tested so you can't rely on them.
> >
> >good luck!
> >
> >greg "this brought up too many bad memories" k-h
> 
> Ugh.
> 
> I have made it very clear that I am very aware that there is broken hardware. 

I would posit that there is NO "non-broken" usb->serial devices out
there.  The closest I have seen was the old IO-Edgeport devices, but
they were expensive and got bought out by some other company and in the
end didn't succeed due to all of the "cheap" devices/chips out there
that just did dumb tx/rx transfers over a fake serial connection.

> What I'm trying to do is to deal with the (occasional) case of
> *non*-broken hardware. Right now Linux breaks the non-broken hardware
> for it, and I don't think the existence of broken hardware is a good
> justification for that.

No, but we have to handle both somehow.

And given that we still get brand-new UART drivers sent to use every few
months, there is just more and more "broken" hardware out there overall.

Anyway, good luck coming up with a scheme to handle your crazy
connections, I would push back and say "any device that treats a serial
control line as a power signal is broken to start with" :)

greg k-h

^ permalink raw reply

* Re: RFC: Serial port DTR/RTS - O_NRESETDEV
From: H. Peter Anvin @ 2025-11-12 16:09 UTC (permalink / raw)
  To: Greg KH
  Cc: Theodore Ts'o, Maarten Brock, linux-serial@vger.kernel.org,
	linux-api@vger.kernel.org, LKML
In-Reply-To: <2025111214-doily-anyway-b24b@gregkh>

On November 12, 2025 3:22:56 AM PST, Greg KH <gregkh@linuxfoundation.org> wrote:
>On Mon, Nov 10, 2025 at 07:57:22PM -0800, H. Peter Anvin wrote:
>> Honestly, though, I'm far less interested in what 8250-based hardware does than e.g. USB.
>
>hahahahahahaha {snort}
>
>Hah.  that's a good one.
>
>Oh, you aren't kidding.
>
>Wow, good luck with this.  USB-serial adaptors are all over the place,
>some have real uarts in them (and so do buffering in the device, and
>line handling in odd ways when powered up), and some are almost just a
>straight pipe through to the USB host with control line handling ideas
>tacked on to the side as an afterthought, if at all.
>
>There is no standard here, they all work differently, and even work
>differently across the same device type with just barely enough hints
>for us to determine what is going on.
>
>So don't worry about USB, if you throw that into the mix, all bets are
>off and you should NEVER rely on that.
>
>Remeber USB->serial was explicitly rejected by the USB standard group,
>only to have it come back in the "side door" through the spec process
>when it turned out that Microsoft hated having to write a zillion
>different vendor-specific drivers because the vendor provided ones kept
>crashing user's machines.  So what we ended up with was "just enough" to
>make it through the spec process, and even then line signals are
>probably never tested so you can't rely on them.
>
>good luck!
>
>greg "this brought up too many bad memories" k-h

Ugh.

I have made it very clear that I am very aware that there is broken hardware. 

What I'm trying to do is to deal with the (occasional) case of *non*-broken hardware. Right now Linux breaks the non-broken hardware for it, and I don't think the existence of broken hardware is a good justification for that.


^ permalink raw reply

* Re: [PATCH v5 02/22] liveupdate: luo_core: integrate with KHO
From: Pasha Tatashin @ 2025-11-12 15:14 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRSMsz4zy8QBbsIH@kernel.org>

> > FLB global objects act similarly to subsystem-wide data, except their
> > data has a clear creation and destruction time tied to preserved
> > files. When the first file of a particular type is added to LUO, this
> > global data is created; when the last file of that type is removed
> > (unpreserved or finished), this global data is destroyed, this is why
> > its life is bound to file lifecycle. Crucially, this global data is
> > accessible at any time while LUO owns the associated files spanning
> > the early boot update boundary.
>
> But there are no files at mm_core_init(). I'm really confused here.

This isn't about the files themselves, but about the subsystem global
data. The files are only used to describe the lifetime of this global
data.

I think mm_core_init() is too late, and the call would need to be
moved earlier to work correctly with subsystems. At the very least, we
will have to add some early FDT parsing to retrieve data during early
boot, but that would be part of the HugeTLB preservation work.

I can move liveupdate_init() inside kho_memory_init(), so we don't
need to modify mm_core_init(). Or, rename kho_memory_init to
kho_and_liveupdate_memory_init() and combine the two calls into a
single function in kexec_handover.c.

> > > So I think for now we can move liveupdate_init() later in boot and we will
> > > solve the problem of hugetlb reservations when we add support for hugetlb.
> >
> > HugeTLB reserves memory early in boot. If we already have preserved
> > HugeTLB pages via LUO/KHO, we must ensure they are counted against the
> > boot-time reservation. For example, if hugetlb_cma_reserve() needs to
> > reserve ten 1G pages, but LUO has already preserved seven, we only
> > need to reserve three new pages and the rest are going to be restored
> > with the files.
> >
> > Since this count is contained in the FLB global object, that data
> > needs to be available during the early reservation phase. (Pratyush is
> > working on HugeTLB preservation and can explain further).
>
> Not sure I really follow the design here, but in my understanding the gist
> here is that hugetlb reservations need to be aware of the preserved state.
> If that's the case, we definitely can move liveupdate_init() to an initcall
> and revisit this when hugetlb support for luo comes along.

This will break the in-kernel tests that ensure FLB data is accessible
and works correctly during early boot, as they use
early_initcall(liveupdate_test_early_init);.

We cannot rely on early_initcall() for liveupdate_init() because it
would compete with the test. We also can't move the test to a later
initcall, as that would break the verification of what FLB is
promising: early access to global data by subsystems that need it
(PCI, IOMMU Core, HugeTLB, etc.).

Thanks,
Pasha

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox