Linux userland API discussions

Linux userland API discussions
 help / color / mirror / Atom feed

* Re: [PATCH v6 18/20] selftests/liveupdate: Add kexec-based selftest for session lifecycle
From: Pasha Tatashin @ 2025-11-19 22:12 UTC (permalink / raw)
  To: David Matlack
  Cc: pratyush, jasonmiu, graf, rppt, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aR40oVOxZ-dezpy0@google.com>

On Wed, Nov 19, 2025 at 4:20 PM David Matlack <dmatlack@google.com> wrote:
>
> On 2025-11-15 06:34 PM, Pasha Tatashin wrote:
>
> > diff --git a/tools/testing/selftests/liveupdate/do_kexec.sh b/tools/testing/selftests/liveupdate/do_kexec.sh
> > new file mode 100755
> > index 000000000000..3c7c6cafbef8
> > --- /dev/null
> > +++ b/tools/testing/selftests/liveupdate/do_kexec.sh
> > @@ -0,0 +1,16 @@
> > +#!/bin/sh
> > +# SPDX-License-Identifier: GPL-2.0
> > +set -e
> > +
> > +# Use $KERNEL and $INITRAMFS to pass custom Kernel and optional initramfs
>
> It'd be nice to use proper command line options for KERNEL and INITRAMFS
> instead of relying on environment variables.

Now that tests and do_kexec are separate, I do not think we should
complicate do_kexec.sh to support every possible environment. On most
modern distros kexec is managed via systemd, and the load and reboot
commands are going to be handled through systemd. do_kexec.sh is meant
for a very simplistic environment such as with busybox rootfs to
perform selftests.

> e.g.
>
>   ./do_kexec.sh -k <kernel> -i <initramfs>
>
> > +
> > +KERNEL="${KERNEL:-/boot/bzImage}"
> > +set -- -l -s --reuse-cmdline "$KERNEL"
>
> I've observed --reuse-cmdline causing overload of the kernel command
> line when doing repeated kexecs, since it includes the built-in command
> line (CONFIG_CMDLINE) which then also gets added by the next kernel
> during boot.

There is a problem with CONFIG_CMDLINE + KEXEC, ideally, it should be
addressed in the kernel

>
> Should we have something like this instead?
>
> diff --git a/tools/testing/selftests/liveupdate/do_kexec.sh b/tools/testing/selftests/liveupdate/do_kexec.sh
> index 3c7c6cafbef8..2590a870993d 100755
> --- a/tools/testing/selftests/liveupdate/do_kexec.sh
> +++ b/tools/testing/selftests/liveupdate/do_kexec.sh
> @@ -4,8 +4,16 @@ set -e
>
>  # Use $KERNEL and $INITRAMFS to pass custom Kernel and optional initramfs
>
> +# Determine the boot command line we need to pass to the kexec kernel.  Note
> +# that the kernel will append to it its builtin command line, so make sure we
> +# subtract the builtin command to avoid accumulating kernel parameters and
> +# eventually overflowing the command line.
> +full_cmdline=$(cat /proc/cmdline)
> +builtin_cmdline=$(zcat /proc/config.gz|grep CONFIG_CMDLINE=|cut -f2 -d\")

This also implies we have /proc/config.gz or CONFIG_IKCONFIG_PROC ...

> +cmdline=${full_cmdline/$builtin_cmdline /}
> +
>  KERNEL="${KERNEL:-/boot/bzImage}"
> -set -- -l -s --reuse-cmdline "$KERNEL"
> +set -- -l -s --command-line="${cmdline}" "$KERNEL"
>
>  INITRAMFS="${INITRAMFS:-/boot/initramfs}"
>  if [ -f "$INITRAMFS" ]; then
>
> > +
> > +INITRAMFS="${INITRAMFS:-/boot/initramfs}"
> > +if [ -f "$INITRAMFS" ]; then
> > +    set -- "$@" --initrd="$INITRAMFS"
> > +fi
> > +
> > +kexec "$@"
> > +kexec -e
>
> Consider separating the kexec load into its own script, in case systems have
> their own ways of shutting down for kexec.

I think, if do_kexec.sh does not work (load + reboot), the user should
use whatever the standard way on a distro to do kexec.

>
> e.g. a kexec_load.sh script that does everything that do_kexec.sh does execpt
> the `kexec -e`. Then do_kexec.sh just calls kexec_load.sh and kexec -e.

^ permalink raw reply

* [PATCH v4 0/3] initrd: remove half of classic initrd support
From: Askar Safin @ 2025-11-19 22:24 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel
  Cc: Linus Torvalds, Greg Kroah-Hartman, Christian Brauner, Al Viro,
	Jan Kara, Christoph Hellwig, Jens Axboe, Andy Shevchenko,
	Aleksa Sarai, Thomas Weißschuh, Julian Stecklina, Gao Xiang,
	Art Nikpal, Andrew Morton, Alexander Graf, Rob Landley,
	Lennart Poettering, linux-arch, linux-block, initramfs, linux-api,
	linux-doc, Michal Simek, Luis Chamberlain, Kees Cook,
	Thorsten Blum, Heiko Carstens, Arnd Bergmann, Dave Young,
	Christophe Leroy, Krzysztof Kozlowski, Borislav Petkov,
	Jessica Clarke, Nicolas Schichan, David Disseldorp, patches

This patchset will not affect anyone, who showed up in these lists.
See [5] for details.

Intro
====
This patchset removes half of classic initrd (initial RAM disk) support,
i. e. linuxrc code path, which was deprecated in 2020.
Initramfs still stays, RAM disk itself (brd) still stays.
And other half of initrd stays, too.
init/do_mounts* are listed in VFS entry in
MAINTAINERS, so I think this patchset should go through VFS tree.
I tested the patchset on 8 (!!!) archs in Qemu (see details below).
If you still use initrd, see below for workaround.

In 2020 deprecation notice was put to linuxrc initrd code path.
In v1 I tried to remove initrd
fully, but Nicolas Schichan reported that he still uses
other code path (root=/dev/ram0 one) on million devices [4].
root=/dev/ram0 code path did not contain deprecation notice.

So, in this version of patchset I remove deprecated code path,
i. e. linuxrc one, while keeping other, i. e. root=/dev/ram0 one.

Also I put deprecation notice to remaining code path, i. e. to
root=/dev/ram0 one. I plan to send patches for full removal
of initrd after one year, i. e. in January 2027 (of course,
initramfs will still work).

Also, I tried to make this patchset small to make sure it
can be reverted easily. I plan to send cleanups later.

Details
====
Other user-visible changes:

- Removed kernel command line parameters "load_ramdisk" and
"prompt_ramdisk", which did nothing and were deprecated
- Removed /proc/sys/kernel/real-root-dev . It was used
for initrd only
- Command line parameters "noinitrd" and "ramdisk_start=" are deprecated

Testing
====
I tested my patchset on many architectures in Qemu using my Rust
program, heavily based on mkroot [1].

I used the following cross-compilers:

aarch64-linux-musleabi
armv4l-linux-musleabihf
armv5l-linux-musleabihf
armv7l-linux-musleabihf
i486-linux-musl
i686-linux-musl
mips-linux-musl
mips64-linux-musl
mipsel-linux-musl
powerpc-linux-musl
powerpc64-linux-musl
powerpc64le-linux-musl
riscv32-linux-musl
riscv64-linux-musl
s390x-linux-musl
sh4-linux-musl
sh4eb-linux-musl
x86_64-linux-musl

taken from this directory [2].

So, as you can see, there are 18 triplets, which correspond to 8 subdirs in arch/.

For every triplet I tested that:
- Initramfs still works (both builtin and external)
- Direct boot from disk still works
- Remaining initrd code path (root=/dev/ram0) still works

Workaround
====
If "retain_initrd" is passed to kernel, then initramfs/initrd,
passed by bootloader, is retained and becomes available after boot
as read-only magic file /sys/firmware/initrd [3].

No copies are involved. I. e. /sys/firmware/initrd is simply
a reference to original blob passed by bootloader.

This works even if initrd/initramfs is not recognized by kernel
in any way, i. e. even if it is not valid cpio archive, nor
a fs image supported by classic initrd.

This works both with my patchset and without it.

This means that you can emulate classic initrd so:
link builtin initramfs to kernel; in /init in this initramfs
copy /sys/firmware/initrd to some file in / and loop-mount it.

This is even better than classic initrd, because:
- You can use fs not supported by classic initrd, for example erofs
- One copy is involved (from /sys/firmware/initrd to some file in /)
as opposed to two when using classic initrd

Still, I don't recommend using this workaround, because
I want everyone to migrate to proper modern initramfs.
But still you can use this workaround if you want.

Also: it is not possible to directly loop-mount
/sys/firmware/initrd . Theoretically kernel can be changed
to allow this (and/or to make it writable), but I think nobody needs this.
And I don't want to implement this.

On Qemu's -initrd and GRUB's initrd
====
Don't panic, this patchset doesn't remove initramfs
(which is used by nearly all Linux distros). And I don't
have plans to remove it.

Qemu's -initrd option and GRUB's initrd command refer
to initrd bootloader mechanism, which is used to
load both initrd and (external) initramfs.

So, if you use Qemu's -initrd or GRUB's initrd,
then you likely use them to pass initramfs, and thus
you are safe.

v1: https://lore.kernel.org/lkml/20250913003842.41944-1-safinaskar@gmail.com/

v1 -> v2 changes:
- A lot. I removed most patches, see cover letter for details

v2: https://lore.kernel.org/lkml/20251010094047.3111495-1-safinaskar@gmail.com/

v2 -> v3 changes:
- Commit messages
- Expanded docs for "noinitrd"
- Added link to /sys/firmware/initrd workaround to pr_warn

v3: https://lore.kernel.org/lkml/20251017060956.1151347-1-safinaskar@gmail.com/

v3 -> v4 changes:
- Changed "September 2026" to "January 2027" (i. e. after 2026 LTS release)

[1] https://github.com/landley/toybox/tree/master/mkroot
[2] https://landley.net/toybox/downloads/binaries/toolchains/latest
[3] https://lore.kernel.org/all/20231207235654.16622-1-graf@amazon.com/
[4] https://lore.kernel.org/lkml/20250918152830.438554-1-nschichan@freebox.fr/
[5] https://lore.kernel.org/lkml/20251022082604.25437-1-safinaskar@gmail.com/

Askar Safin (3):
  init: remove deprecated "load_ramdisk" and "prompt_ramdisk" command
    line parameters
  initrd: remove deprecated code path (linuxrc)
  init: remove /proc/sys/kernel/real-root-dev

 .../admin-guide/kernel-parameters.txt         |  12 +-
 Documentation/admin-guide/sysctl/kernel.rst   |   6 -
 arch/arm/configs/neponset_defconfig           |   2 +-
 fs/init.c                                     |  14 ---
 include/linux/init_syscalls.h                 |   1 -
 include/linux/initrd.h                        |   2 -
 include/uapi/linux/sysctl.h                   |   1 -
 init/do_mounts.c                              |  11 +-
 init/do_mounts.h                              |  18 +--
 init/do_mounts_initrd.c                       | 107 ++----------------
 init/do_mounts_rd.c                           |  24 +---
 11 files changed, 23 insertions(+), 175 deletions(-)

base-commit: 6a23ae0a96a600d1d12557add110e0bb6e32730c (v6.18-rc6)
-- 
2.47.3

^ permalink raw reply

* [PATCH v4 1/3] init: remove deprecated "load_ramdisk" and "prompt_ramdisk" command line parameters
From: Askar Safin @ 2025-11-19 22:24 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel
  Cc: Linus Torvalds, Greg Kroah-Hartman, Christian Brauner, Al Viro,
	Jan Kara, Christoph Hellwig, Jens Axboe, Andy Shevchenko,
	Aleksa Sarai, Thomas Weißschuh, Julian Stecklina, Gao Xiang,
	Art Nikpal, Andrew Morton, Alexander Graf, Rob Landley,
	Lennart Poettering, linux-arch, linux-block, initramfs, linux-api,
	linux-doc, Michal Simek, Luis Chamberlain, Kees Cook,
	Thorsten Blum, Heiko Carstens, Arnd Bergmann, Dave Young,
	Christophe Leroy, Krzysztof Kozlowski, Borislav Petkov,
	Jessica Clarke, Nicolas Schichan, David Disseldorp, patches
In-Reply-To: <20251119222407.3333257-1-safinaskar@gmail.com>

...which do nothing. They were deprecated (in documentation) in
6b99e6e6aa62 ("Documentation/admin-guide: blockdev/ramdisk: remove use of
"rdev"") in 2020 and in kernel messages in c8376994c86c ("initrd: remove
support for multiple floppies") in 2020.

Signed-off-by: Askar Safin <safinaskar@gmail.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 4 ----
 arch/arm/configs/neponset_defconfig             | 2 +-
 init/do_mounts.c                                | 7 -------
 init/do_mounts_rd.c                             | 7 -------
 4 files changed, 1 insertion(+), 19 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6c42061ca20e..15af6933eab4 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3319,8 +3319,6 @@
 			If there are multiple matching configurations changing
 			the same attribute, the last one is used.
 
-	load_ramdisk=	[RAM] [Deprecated]
-
 	lockd.nlm_grace_period=P  [NFS] Assign grace period.
 			Format: <integer>
 
@@ -5284,8 +5282,6 @@
 			Param: <number> - step/bucket size as a power of 2 for
 				statistical time based profiling.
 
-	prompt_ramdisk=	[RAM] [Deprecated]
-
 	prot_virt=	[S390] enable hosting protected virtual machines
 			isolated from the hypervisor (if hardware supports
 			that). If enabled, the default kernel base address
diff --git a/arch/arm/configs/neponset_defconfig b/arch/arm/configs/neponset_defconfig
index 2227f86100ad..4d720001c12e 100644
--- a/arch/arm/configs/neponset_defconfig
+++ b/arch/arm/configs/neponset_defconfig
@@ -9,7 +9,7 @@ CONFIG_ASSABET_NEPONSET=y
 CONFIG_ZBOOT_ROM_TEXT=0x80000
 CONFIG_ZBOOT_ROM_BSS=0xc1000000
 CONFIG_ZBOOT_ROM=y
-CONFIG_CMDLINE="console=ttySA0,38400n8 cpufreq=221200 rw root=/dev/mtdblock2 mtdparts=sa1100:512K(boot),1M(kernel),2560K(initrd),4M(root) load_ramdisk=1 prompt_ramdisk=0 mem=32M noinitrd initrd=0xc0800000,3M"
+CONFIG_CMDLINE="console=ttySA0,38400n8 cpufreq=221200 rw root=/dev/mtdblock2 mtdparts=sa1100:512K(boot),1M(kernel),2560K(initrd),4M(root) mem=32M noinitrd initrd=0xc0800000,3M"
 CONFIG_FPE_NWFPE=y
 CONFIG_PM=y
 CONFIG_MODULES=y
diff --git a/init/do_mounts.c b/init/do_mounts.c
index 6af29da8889e..0f2f44e6250c 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -34,13 +34,6 @@ static int root_wait;
 
 dev_t ROOT_DEV;
 
-static int __init load_ramdisk(char *str)
-{
-	pr_warn("ignoring the deprecated load_ramdisk= option\n");
-	return 1;
-}
-__setup("load_ramdisk=", load_ramdisk);
-
 static int __init readonly(char *str)
 {
 	if (*str)
diff --git a/init/do_mounts_rd.c b/init/do_mounts_rd.c
index 19d9f33dcacf..5311f2d7edc8 100644
--- a/init/do_mounts_rd.c
+++ b/init/do_mounts_rd.c
@@ -18,13 +18,6 @@
 static struct file *in_file, *out_file;
 static loff_t in_pos, out_pos;
 
-static int __init prompt_ramdisk(char *str)
-{
-	pr_warn("ignoring the deprecated prompt_ramdisk= option\n");
-	return 1;
-}
-__setup("prompt_ramdisk=", prompt_ramdisk);
-
 int __initdata rd_image_start;		/* starting block # of image */
 
 static int __init ramdisk_start_setup(char *str)
-- 
2.47.3


^ permalink raw reply related

* [PATCH v4 2/3] initrd: remove deprecated code path (linuxrc)
From: Askar Safin @ 2025-11-19 22:24 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel
  Cc: Linus Torvalds, Greg Kroah-Hartman, Christian Brauner, Al Viro,
	Jan Kara, Christoph Hellwig, Jens Axboe, Andy Shevchenko,
	Aleksa Sarai, Thomas Weißschuh, Julian Stecklina, Gao Xiang,
	Art Nikpal, Andrew Morton, Alexander Graf, Rob Landley,
	Lennart Poettering, linux-arch, linux-block, initramfs, linux-api,
	linux-doc, Michal Simek, Luis Chamberlain, Kees Cook,
	Thorsten Blum, Heiko Carstens, Arnd Bergmann, Dave Young,
	Christophe Leroy, Krzysztof Kozlowski, Borislav Petkov,
	Jessica Clarke, Nicolas Schichan, David Disseldorp, patches
In-Reply-To: <20251119222407.3333257-1-safinaskar@gmail.com>

Remove linuxrc initrd code path, which was deprecated in 2020.

Initramfs and (non-initial) RAM disks (i. e. brd) still work.

Both built-in and bootloader-supplied initramfs still work.

Non-linuxrc initrd code path (i. e. using /dev/ram as final root
filesystem) still works, but I put deprecation message into it.

Also I deprecate command line parameters "noinitrd" and "ramdisk_start=".

Signed-off-by: Askar Safin <safinaskar@gmail.com>
---
 .../admin-guide/kernel-parameters.txt         |  8 +-
 fs/init.c                                     | 14 ---
 include/linux/init_syscalls.h                 |  1 -
 include/linux/initrd.h                        |  2 -
 init/do_mounts.c                              |  4 +-
 init/do_mounts.h                              | 18 +---
 init/do_mounts_initrd.c                       | 87 ++-----------------
 init/do_mounts_rd.c                           | 17 +---
 8 files changed, 22 insertions(+), 129 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 15af6933eab4..df441d1a9555 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4324,8 +4324,10 @@
 			Note that this argument takes precedence over
 			the CONFIG_RCU_NOCB_CPU_DEFAULT_ALL option.
 
-	noinitrd	[RAM] Tells the kernel not to load any configured
-			initial RAM disk.
+	noinitrd	[Deprecated,RAM] Tells the kernel not to load any configured
+			initial RAM disk. Currently this parameter applies to
+			initrd only, not to initramfs. But it applies to both
+			in EFI mode.
 
 	nointremap	[X86-64,Intel-IOMMU,EARLY] Do not enable interrupt
 			remapping.
@@ -5338,7 +5340,7 @@
 	ramdisk_size=	[RAM] Sizes of RAM disks in kilobytes
 			See Documentation/admin-guide/blockdev/ramdisk.rst.
 
-	ramdisk_start=	[RAM] RAM disk image start address
+	ramdisk_start=	[Deprecated,RAM] RAM disk image start address
 
 	random.trust_cpu=off
 			[KNL,EARLY] Disable trusting the use of the CPU's
diff --git a/fs/init.c b/fs/init.c
index 07f592ccdba8..60719494d9a0 100644
--- a/fs/init.c
+++ b/fs/init.c
@@ -27,20 +27,6 @@ int __init init_mount(const char *dev_name, const char *dir_name,
 	return ret;
 }
 
-int __init init_umount(const char *name, int flags)
-{
-	int lookup_flags = LOOKUP_MOUNTPOINT;
-	struct path path;
-	int ret;
-
-	if (!(flags & UMOUNT_NOFOLLOW))
-		lookup_flags |= LOOKUP_FOLLOW;
-	ret = kern_path(name, lookup_flags, &path);
-	if (ret)
-		return ret;
-	return path_umount(&path, flags);
-}
-
 int __init init_chdir(const char *filename)
 {
 	struct path path;
diff --git a/include/linux/init_syscalls.h b/include/linux/init_syscalls.h
index 92045d18cbfc..0bdbc458a881 100644
--- a/include/linux/init_syscalls.h
+++ b/include/linux/init_syscalls.h
@@ -2,7 +2,6 @@
 
 int __init init_mount(const char *dev_name, const char *dir_name,
 		const char *type_page, unsigned long flags, void *data_page);
-int __init init_umount(const char *name, int flags);
 int __init init_chdir(const char *filename);
 int __init init_chroot(const char *filename);
 int __init init_chown(const char *filename, uid_t user, gid_t group, int flags);
diff --git a/include/linux/initrd.h b/include/linux/initrd.h
index f1a1f4c92ded..7e5d26c8136f 100644
--- a/include/linux/initrd.h
+++ b/include/linux/initrd.h
@@ -3,8 +3,6 @@
 #ifndef __LINUX_INITRD_H
 #define __LINUX_INITRD_H
 
-#define INITRD_MINOR 250 /* shouldn't collide with /dev/ram* too soon ... */
-
 /* starting block # of image */
 extern int rd_image_start;
 
diff --git a/init/do_mounts.c b/init/do_mounts.c
index 0f2f44e6250c..1054ad3c905a 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -476,13 +476,11 @@ void __init prepare_namespace(void)
 	if (saved_root_name[0])
 		ROOT_DEV = parse_root_device(saved_root_name);
 
-	if (initrd_load(saved_root_name))
-		goto out;
+	initrd_load();
 
 	if (root_wait)
 		wait_for_root(saved_root_name);
 	mount_root(saved_root_name);
-out:
 	devtmpfs_mount();
 	init_mount(".", "/", NULL, MS_MOVE, NULL);
 	init_chroot(".");
diff --git a/init/do_mounts.h b/init/do_mounts.h
index 6069ea3eb80d..a386ee5314c9 100644
--- a/init/do_mounts.h
+++ b/init/do_mounts.h
@@ -23,25 +23,15 @@ static inline __init int create_dev(char *name, dev_t dev)
 }
 
 #ifdef CONFIG_BLK_DEV_RAM
-
-int __init rd_load_disk(int n);
-int __init rd_load_image(char *from);
-
+int __init rd_load_image(void);
 #else
-
-static inline int rd_load_disk(int n) { return 0; }
-static inline int rd_load_image(char *from) { return 0; }
-
+static inline int rd_load_image(void) { return 0; }
 #endif
 
 #ifdef CONFIG_BLK_DEV_INITRD
-bool __init initrd_load(char *root_device_name);
+void __init initrd_load(void);
 #else
-static inline bool initrd_load(char *root_device_name)
-{
-	return false;
-	}
-
+static inline void initrd_load(void) { }
 #endif
 
 /* Ensure that async file closing finished to prevent spurious errors. */
diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
index f6867bad0d78..fe335dbc95e0 100644
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -2,13 +2,7 @@
 #include <linux/unistd.h>
 #include <linux/kernel.h>
 #include <linux/fs.h>
-#include <linux/minix_fs.h>
-#include <linux/romfs_fs.h>
 #include <linux/initrd.h>
-#include <linux/sched.h>
-#include <linux/freezer.h>
-#include <linux/kmod.h>
-#include <uapi/linux/mount.h>
 
 #include "do_mounts.h"
 
@@ -41,6 +35,7 @@ late_initcall(kernel_do_mounts_initrd_sysctls_init);
 
 static int __init no_initrd(char *str)
 {
+	pr_warn("noinitrd option is deprecated and will be removed soon\n");
 	mount_initrd = 0;
 	return 1;
 }
@@ -70,85 +65,19 @@ static int __init early_initrd(char *p)
 }
 early_param("initrd", early_initrd);
 
-static int __init init_linuxrc(struct subprocess_info *info, struct cred *new)
-{
-	ksys_unshare(CLONE_FS | CLONE_FILES);
-	console_on_rootfs();
-	/* move initrd over / and chdir/chroot in initrd root */
-	init_chdir("/root");
-	init_mount(".", "/", NULL, MS_MOVE, NULL);
-	init_chroot(".");
-	ksys_setsid();
-	return 0;
-}
-
-static void __init handle_initrd(char *root_device_name)
-{
-	struct subprocess_info *info;
-	static char *argv[] = { "linuxrc", NULL, };
-	extern char *envp_init[];
-	int error;
-
-	pr_warn("using deprecated initrd support, will be removed soon.\n");
-
-	real_root_dev = new_encode_dev(ROOT_DEV);
-	create_dev("/dev/root.old", Root_RAM0);
-	/* mount initrd on rootfs' /root */
-	mount_root_generic("/dev/root.old", root_device_name,
-			   root_mountflags & ~MS_RDONLY);
-	init_mkdir("/old", 0700);
-	init_chdir("/old");
-
-	info = call_usermodehelper_setup("/linuxrc", argv, envp_init,
-					 GFP_KERNEL, init_linuxrc, NULL, NULL);
-	if (!info)
-		return;
-	call_usermodehelper_exec(info, UMH_WAIT_PROC|UMH_FREEZABLE);
-
-	/* move initrd to rootfs' /old */
-	init_mount("..", ".", NULL, MS_MOVE, NULL);
-	/* switch root and cwd back to / of rootfs */
-	init_chroot("..");
-
-	if (new_decode_dev(real_root_dev) == Root_RAM0) {
-		init_chdir("/old");
-		return;
-	}
-
-	init_chdir("/");
-	ROOT_DEV = new_decode_dev(real_root_dev);
-	mount_root(root_device_name);
-
-	printk(KERN_NOTICE "Trying to move old root to /initrd ... ");
-	error = init_mount("/old", "/root/initrd", NULL, MS_MOVE, NULL);
-	if (!error)
-		printk("okay\n");
-	else {
-		if (error == -ENOENT)
-			printk("/initrd does not exist. Ignored.\n");
-		else
-			printk("failed\n");
-		printk(KERN_NOTICE "Unmounting old root\n");
-		init_umount("/old", MNT_DETACH);
-	}
-}
-
-bool __init initrd_load(char *root_device_name)
+void __init initrd_load(void)
 {
 	if (mount_initrd) {
 		create_dev("/dev/ram", Root_RAM0);
 		/*
-		 * Load the initrd data into /dev/ram0. Execute it as initrd
-		 * unless /dev/ram0 is supposed to be our actual root device,
-		 * in that case the ram disk is just set up here, and gets
-		 * mounted in the normal path.
+		 * Load the initrd data into /dev/ram0.
 		 */
-		if (rd_load_image("/initrd.image") && ROOT_DEV != Root_RAM0) {
-			init_unlink("/initrd.image");
-			handle_initrd(root_device_name);
-			return true;
+		if (rd_load_image()) {
+			pr_warn("using deprecated initrd support, will be removed in January 2027; "
+				"use initramfs instead or (as a last resort) /sys/firmware/initrd; "
+				"see section \"Workaround\" in "
+				"https://lore.kernel.org/lkml/20251010094047.3111495-1-safinaskar@gmail.com\n");
 		}
 	}
 	init_unlink("/initrd.image");
-	return false;
 }
diff --git a/init/do_mounts_rd.c b/init/do_mounts_rd.c
index 5311f2d7edc8..0a021bbcd501 100644
--- a/init/do_mounts_rd.c
+++ b/init/do_mounts_rd.c
@@ -22,6 +22,7 @@ int __initdata rd_image_start;		/* starting block # of image */
 
 static int __init ramdisk_start_setup(char *str)
 {
+	pr_warn("ramdisk_start= option is deprecated and will be removed soon\n");
 	rd_image_start = simple_strtol(str,NULL,0);
 	return 1;
 }
@@ -177,7 +178,7 @@ static unsigned long nr_blocks(struct file *file)
 	return i_size_read(inode) >> 10;
 }
 
-int __init rd_load_image(char *from)
+int __init rd_load_image(void)
 {
 	int res = 0;
 	unsigned long rd_blocks, devblocks, nr_disks;
@@ -191,7 +192,7 @@ int __init rd_load_image(char *from)
 	if (IS_ERR(out_file))
 		goto out;
 
-	in_file = filp_open(from, O_RDONLY, 0);
+	in_file = filp_open("/initrd.image", O_RDONLY, 0);
 	if (IS_ERR(in_file))
 		goto noclose_input;
 
@@ -220,10 +221,7 @@ int __init rd_load_image(char *from)
 	/*
 	 * OK, time to copy in the data
 	 */
-	if (strcmp(from, "/initrd.image") == 0)
-		devblocks = nblocks;
-	else
-		devblocks = nr_blocks(in_file);
+	devblocks = nblocks;
 
 	if (devblocks == 0) {
 		printk(KERN_ERR "RAMDISK: could not determine device size\n");
@@ -267,13 +265,6 @@ int __init rd_load_image(char *from)
 	return res;
 }
 
-int __init rd_load_disk(int n)
-{
-	create_dev("/dev/root", ROOT_DEV);
-	create_dev("/dev/ram", MKDEV(RAMDISK_MAJOR, n));
-	return rd_load_image("/dev/root");
-}
-
 static int exit_code;
 static int decompress_error;
 
-- 
2.47.3


^ permalink raw reply related

* [PATCH v4 3/3] init: remove /proc/sys/kernel/real-root-dev
From: Askar Safin @ 2025-11-19 22:24 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel
  Cc: Linus Torvalds, Greg Kroah-Hartman, Christian Brauner, Al Viro,
	Jan Kara, Christoph Hellwig, Jens Axboe, Andy Shevchenko,
	Aleksa Sarai, Thomas Weißschuh, Julian Stecklina, Gao Xiang,
	Art Nikpal, Andrew Morton, Alexander Graf, Rob Landley,
	Lennart Poettering, linux-arch, linux-block, initramfs, linux-api,
	linux-doc, Michal Simek, Luis Chamberlain, Kees Cook,
	Thorsten Blum, Heiko Carstens, Arnd Bergmann, Dave Young,
	Christophe Leroy, Krzysztof Kozlowski, Borislav Petkov,
	Jessica Clarke, Nicolas Schichan, David Disseldorp, patches
In-Reply-To: <20251119222407.3333257-1-safinaskar@gmail.com>

It is not used anymore.

Signed-off-by: Askar Safin <safinaskar@gmail.com>
---
 Documentation/admin-guide/sysctl/kernel.rst |  6 ------
 include/uapi/linux/sysctl.h                 |  1 -
 init/do_mounts_initrd.c                     | 20 --------------------
 3 files changed, 27 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index f3ee807b5d8b..218265babaf9 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -1215,12 +1215,6 @@ that support this feature.
 ==  ===========================================================================
 
 
-real-root-dev
-=============
-
-See Documentation/admin-guide/initrd.rst.
-
-
 reboot-cmd (SPARC only)
 =======================
 
diff --git a/include/uapi/linux/sysctl.h b/include/uapi/linux/sysctl.h
index 63d1464cb71c..1c7fe0f4dca4 100644
--- a/include/uapi/linux/sysctl.h
+++ b/include/uapi/linux/sysctl.h
@@ -92,7 +92,6 @@ enum
 	KERN_DOMAINNAME=8,	/* string: domainname */
 
 	KERN_PANIC=15,		/* int: panic timeout */
-	KERN_REALROOTDEV=16,	/* real root device to mount after initrd */
 
 	KERN_SPARC_REBOOT=21,	/* reboot command on Sparc */
 	KERN_CTLALTDEL=22,	/* int: allow ctl-alt-del to reboot */
diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
index fe335dbc95e0..892e69ab41c4 100644
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -8,31 +8,11 @@
 
 unsigned long initrd_start, initrd_end;
 int initrd_below_start_ok;
-static unsigned int real_root_dev;	/* do_proc_dointvec cannot handle kdev_t */
 static int __initdata mount_initrd = 1;
 
 phys_addr_t phys_initrd_start __initdata;
 unsigned long phys_initrd_size __initdata;
 
-#ifdef CONFIG_SYSCTL
-static const struct ctl_table kern_do_mounts_initrd_table[] = {
-	{
-		.procname       = "real-root-dev",
-		.data           = &real_root_dev,
-		.maxlen         = sizeof(int),
-		.mode           = 0644,
-		.proc_handler   = proc_dointvec,
-	},
-};
-
-static __init int kernel_do_mounts_initrd_sysctls_init(void)
-{
-	register_sysctl_init("kernel", kern_do_mounts_initrd_table);
-	return 0;
-}
-late_initcall(kernel_do_mounts_initrd_sysctls_init);
-#endif /* CONFIG_SYSCTL */
-
 static int __init no_initrd(char *str)
 {
 	pr_warn("noinitrd option is deprecated and will be removed soon\n");
-- 
2.47.3


^ permalink raw reply related

* Re: Safety of resolving untrusted paths with detached mount dirfd
From: Aleksa Sarai @ 2025-11-20  2:18 UTC (permalink / raw)
  To: Alyssa Ross
  Cc: linux-fsdevel, Demi Marie Obenour, Jann Horn, Eric W. Biederman,
	jlayton, Bruce Fields, Al Viro, Arnd Bergmann, shuah,
	David Howells, Andy Lutomirski, Christian Brauner, Tycho Andersen,
	linux-kernel, linux-api
In-Reply-To: <87cy5eqgn8.fsf@alyssa.is>

[-- Attachment #1: Type: text/plain, Size: 1197 bytes --]

On 2025-11-19, Alyssa Ross <hi@alyssa.is> wrote:
> Hello,
> 
> As we know, it's not safe to use chroot() for resolving untrusted paths
> within some root, as a subdirectory could be moved outside of the
> process root while walking the path[1].  On the other hand,
> LOOKUP_BENEATH is supposed to be robust against this, and going by [2],
> it sounds like resolving with the mount namespace root as dirfd should
> also be.
> 
> My question is: would resolving an untrusted path against a detached
> mount root dirfd opened with OPEN_TREE_CLONE (not necessarily a
> filesystem root) also be expected to be robust against traversal issues?
> i.e. can I rely on an untrusted path never resolving to a path that
> isn't under the mount root?

No, if you hit an absolute symlink or use an absolute path it will
resolve to your current->fs->root (mount namespace root or chroot).
However, OPEN_TREE_CLONE will stop ".." from naively stepping out of the
detached bind-mount. If you are dealing with procfs then magic-links can
also jump out.

You can always use RESOLVE_BENEATH or RESOLVE_IN_ROOT in combination
with OPEN_TREE_CLONE.

-- 
Aleksa Sarai
https://www.cyphar.com/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 265 bytes --]

^ permalink raw reply

* Re: Safety of resolving untrusted paths with detached mount dirfd
From: Demi Marie Obenour @ 2025-11-20  2:39 UTC (permalink / raw)
  To: Aleksa Sarai, Alyssa Ross
  Cc: linux-fsdevel, Jann Horn, Eric W. Biederman, jlayton,
	Bruce Fields, Al Viro, Arnd Bergmann, shuah, David Howells,
	Andy Lutomirski, Christian Brauner, Tycho Andersen, linux-kernel,
	linux-api
In-Reply-To: <2025-11-20-limber-salted-luncheon-scads-7AT044@cyphar.com>


[-- Attachment #1.1.1: Type: text/plain, Size: 1730 bytes --]

On 11/19/25 21:18, Aleksa Sarai wrote:
> On 2025-11-19, Alyssa Ross <hi@alyssa.is> wrote:
>> Hello,
>>
>> As we know, it's not safe to use chroot() for resolving untrusted paths
>> within some root, as a subdirectory could be moved outside of the
>> process root while walking the path[1].  On the other hand,
>> LOOKUP_BENEATH is supposed to be robust against this, and going by [2],
>> it sounds like resolving with the mount namespace root as dirfd should
>> also be.
>>
>> My question is: would resolving an untrusted path against a detached
>> mount root dirfd opened with OPEN_TREE_CLONE (not necessarily a
>> filesystem root) also be expected to be robust against traversal issues?
>> i.e. can I rely on an untrusted path never resolving to a path that
>> isn't under the mount root?
> 
> No, if you hit an absolute symlink or use an absolute path it will
> resolve to your current->fs->root (mount namespace root or chroot).
> However, OPEN_TREE_CLONE will stop ".." from naively stepping out of the
> detached bind-mount. If you are dealing with procfs then magic-links can
> also jump out.

Is using open_tree_attr() with MOUNT_ATTR_NOSYMFOLLOW enough to prevent
these?  Will it still provide protection even if someone concurrently
renames one of the files out from under the root?  I know that can
escape a chroot, but I wonder if this provides more guarantees.
https://github.com/QubesOS/qubes-secpack/blob/main/QSBs/qsb-014-2015.txt
was the chroot breakout.

> You can always use RESOLVE_BENEATH or RESOLVE_IN_ROOT in combination
> with OPEN_TREE_CLONE.
Unfortunately not everything supports that.  For instance, mkdirat()
doesn't.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 7253 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v4 1/3] init: remove deprecated "load_ramdisk" and "prompt_ramdisk" command line parameters
From: Arnd Bergmann @ 2025-11-20  8:20 UTC (permalink / raw)
  To: Askar Safin, linux-fsdevel, linux-kernel
  Cc: Linus Torvalds, Greg Kroah-Hartman, Christian Brauner,
	Alexander Viro, Jan Kara, Christoph Hellwig, Jens Axboe,
	Andy Shevchenko, Aleksa Sarai, Thomas Weißschuh,
	Julian Stecklina, Gao Xiang, Art Nikpal, Andrew Morton,
	Alexander Graf, Rob Landley, Lennart Poettering, Linux-Arch,
	linux-block, initramfs, linux-api, linux-doc, Michal Simek,
	Luis Chamberlain, Kees Cook, Thorsten Blum, Heiko Carstens,
	Dave Young, Christophe Leroy, Krzysztof Kozlowski,
	Borislav Petkov, Jessica Clarke, Nicolas Schichan,
	David Disseldorp, patches
In-Reply-To: <20251119222407.3333257-2-safinaskar@gmail.com>

On Wed, Nov 19, 2025, at 23:24, Askar Safin wrote:
> ...which do nothing. They were deprecated (in documentation) in
> 6b99e6e6aa62 ("Documentation/admin-guide: blockdev/ramdisk: remove use of
> "rdev"") in 2020 and in kernel messages in c8376994c86c ("initrd: remove
> support for multiple floppies") in 2020.
>
> Signed-off-by: Askar Safin <safinaskar@gmail.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 4 ----
>  arch/arm/configs/neponset_defconfig             | 2 +-

For the arm defconfig:

Acked-by: Arnd Bergmann <arnd@arndb.de>

^ permalink raw reply

* Re: Safety of resolving untrusted paths with detached mount dirfd
From: Aleksa Sarai @ 2025-11-20  9:24 UTC (permalink / raw)
  To: Demi Marie Obenour
  Cc: Alyssa Ross, linux-fsdevel, Jann Horn, Eric W. Biederman, jlayton,
	Bruce Fields, Al Viro, Arnd Bergmann, shuah, David Howells,
	Andy Lutomirski, Christian Brauner, Tycho Andersen, linux-kernel,
	linux-api
In-Reply-To: <cdf9deb2-7a09-48c5-97e2-2ea6d5901882@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2982 bytes --]

On 2025-11-19, Demi Marie Obenour <demiobenour@gmail.com> wrote:
> On 11/19/25 21:18, Aleksa Sarai wrote:
> > On 2025-11-19, Alyssa Ross <hi@alyssa.is> wrote:
> >> Hello,
> >>
> >> As we know, it's not safe to use chroot() for resolving untrusted paths
> >> within some root, as a subdirectory could be moved outside of the
> >> process root while walking the path[1].  On the other hand,
> >> LOOKUP_BENEATH is supposed to be robust against this, and going by [2],
> >> it sounds like resolving with the mount namespace root as dirfd should
> >> also be.
> >>
> >> My question is: would resolving an untrusted path against a detached
> >> mount root dirfd opened with OPEN_TREE_CLONE (not necessarily a
> >> filesystem root) also be expected to be robust against traversal issues?
> >> i.e. can I rely on an untrusted path never resolving to a path that
> >> isn't under the mount root?
> > 
> > No, if you hit an absolute symlink or use an absolute path it will
> > resolve to your current->fs->root (mount namespace root or chroot).
> > However, OPEN_TREE_CLONE will stop ".." from naively stepping out of the
> > detached bind-mount. If you are dealing with procfs then magic-links can
> > also jump out.
> 
> Is using open_tree_attr() with MOUNT_ATTR_NOSYMFOLLOW enough to prevent
> these?  Will it still provide protection even if someone concurrently
> renames one of the files out from under the root?  I know that can
> escape a chroot, but I wonder if this provides more guarantees.

That will block symlinks (in a similar manner to RESOLVE_NO_SYMLINKS),
so those particular problems would not be an issue. Of course, a lot of
symlink usages are valid and so this will block those as well (back when
I wrote openat2 I did a cursory scan and something like 15% of system
paths contained symlinks on my system).

I think that ".." will not be a problem even with renames because the
detached mount is associated with the directory (just like how moving a
bind-mount source doesn't suddenly expose more information).

It also goes without saying that you need to make sure an absolute path
*never* gets passed to any of the helper functions you write to do this
-- in my view this is usually going to be quite a fragile setup. Who is
providing the paths to your program?

> https://github.com/QubesOS/qubes-secpack/blob/main/QSBs/qsb-014-2015.txt
> was the chroot breakout.
> 
> > You can always use RESOLVE_BENEATH or RESOLVE_IN_ROOT in combination
> > with OPEN_TREE_CLONE.
>
> Unfortunately not everything supports that.  For instance, mkdirat()
> doesn't.

You can openat2(RESOLVE_BENEATH) the parent directory and then mkdirat()
the final component (because mkdirat does not follow trailing symlinks).
This is what libpathrs[1] does, and it works for most *at() syscalls
(those that support AT_EMPTY_PATH are even easier).

[1]: https://github.com/cyphar/libpathrs

-- 
Aleksa Sarai
https://www.cyphar.com/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 265 bytes --]

^ permalink raw reply

* Re: RFC: Serial port DTR/RTS - O_<something>
From: Ned Ulbricht @ 2025-11-20 13:31 UTC (permalink / raw)
  To: H. Peter Anvin, Maciej W. Rozycki
  Cc: Greg KH, Theodore Ts'o, Maarten Brock,
	linux-serial@vger.kernel.org, linux-api@vger.kernel.org, LKML
In-Reply-To: <f643f1f6-7e69-4be6-ac8a-7b1a3a9c402d@zytor.com>

On 11/18/25 10:05, H. Peter Anvin wrote:

>> "(O_EXCL|O_NOFOLLOW)" provokes a thought...
>>
>> As essential context, fs/open.c build_open_flags() has:
>>
>> if (flags & O_CREAT) {
>>      op->intent |= LOOKUP_CREATE;
>>      if (flags & O_EXCL) {
>>          op->intent |= LOOKUP_EXCL;
>>          flags |= O_NOFOLLOW;
>>      }
>> }

[snip]

> I had missed the bit in the spec that says that O_CREAT|O_EXCL is required to
> imply O_NOFOLLOW (as Linux indeed does as seen above.)

Fwiw, earlier today I had an ultimately unsuccessful series of searches
using the Austin Group Issue tracker at:

https://austingroupbugs.net/view_all_bug_page.php

Searched (serially): "O_EXCL", "O_NOFOLLOW", "EEXIST", "ELOOP"; all with
no other filter refinements.  Then searched filtering by 'Section'
(multiple adjacent selections): 'open' .. 'openat'. In all results,
simply eyeball-scanned 'Summary' (w/o opening most).

Apparent upshot, unless I'm mistaken, is that the exact error return is
a trivial conflict with no appreciable impact on higher levels.

In roughly same vein, FreeBSD open(2) man page, specifically at
"[EMLINK]" and "STANDARDS", might possibly be stretched to read as
implicitly encouraging that assessment.

https://man.freebsd.org/cgi/man.cgi?query=open&manpath=FreeBSD+16.0-CURRENT

Alhough I don't have a FreeBSD box available to actually test
(O_CREAT|O_EXCL|O_NOFOLLOW) symlink behavior on that platform.  (Maybe
that combo's wired to detonate tnt nasal daemons? Dunno;-)

Unfortunately, this does prompt a close re-scrutinization of linux's
open(2) man page. Notwithstanding the damn spec, the linux man page
should precisely and accurately reflect the observed error return?

Ned

^ permalink raw reply

* Re: [PATCH v6 15/20] mm: memfd_luo: allow preserving memfd
From: Pratyush Yadav @ 2025-11-20 15:34 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: Mike Rapoport, pratyush, jasonmiu, graf, dmatlack, rientjes,
	corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
	mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, lennart, brauner, linux-api, linux-fsdevel,
	saeedm, ajayachandra, jgg, parav, leonro, witu, hughd, skhawaja,
	chrisl
In-Reply-To: <CA+CK2bADcVsRnovkwWftPCbubXoaFrPzSavMU+G9f3XAz3YMLQ@mail.gmail.com>

On Wed, Nov 19 2025, Pasha Tatashin wrote:

> On Mon, Nov 17, 2025 at 6:04 AM Mike Rapoport <rppt@kernel.org> wrote:
>>
>> On Sat, Nov 15, 2025 at 06:34:01PM -0500, Pasha Tatashin wrote:
>> > From: Pratyush Yadav <ptyadav@amazon.de>
>> >
>> > The ability to preserve a memfd allows userspace to use KHO and LUO to
>> > transfer its memory contents to the next kernel. This is useful in many
>> > ways. For one, it can be used with IOMMUFD as the backing store for
>> > IOMMU page tables. Preserving IOMMUFD is essential for performing a
>> > hypervisor live update with passthrough devices. memfd support provides
>> > the first building block for making that possible.
>> >
>> > For another, applications with a large amount of memory that takes time
>> > to reconstruct, reboots to consume kernel upgrades can be very
>> > expensive. memfd with LUO gives those applications reboot-persistent
>> > memory that they can use to quickly save and reconstruct that state.
>> >
>> > While memfd is backed by either hugetlbfs or shmem, currently only
>> > support on shmem is added. To be more precise, support for anonymous
>> > shmem files is added.
>> >
>> > The handover to the next kernel is not transparent. All the properties
>> > of the file are not preserved; only its memory contents, position, and
>> > size. The recreated file gets the UID and GID of the task doing the
>> > restore, and the task's cgroup gets charged with the memory.
>> >
>> > Once preserved, the file cannot grow or shrink, and all its pages are
>> > pinned to avoid migrations and swapping. The file can still be read from
>> > or written to.
>> >
>> > Use vmalloc to get the buffer to hold the folios, and preserve
>> > it using kho_preserve_vmalloc(). This doesn't have the size limit.
>> >
>> > Co-developed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
>> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
>> > Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
[...]
>> > +     struct inode *inode = file_inode(file);
>> > +     struct memfd_luo_folio_ser *pfolios;
>> > +     struct kho_vmalloc *kho_vmalloc;
>> > +     unsigned int max_folios;
>> > +     long i, size, nr_pinned;
>> > +     struct folio **folios;
>>
>> pfolios and folios read like the former is a pointer to latter.
>> I'd s/pfolios/folios_ser/

folios_ser is a tricky name, it is very close to folio_ser (which is
what you might use for one member of the array).

I was bit by this when hacking on some hugetlb preservation code. I
wrote folios_ser instead of folio_ser in a loop, and then had to spend
half an hour trying to figure out why the code wasn't working. It is
kinda hard to differentiate between the two visually.

Not that I have a better name off the top of my head. Just saying that
this naming causes weird readability problems.

>
> Done
>
[...]

-- 
Regards,
Pratyush Yadav

^ permalink raw reply

* Re: [PATCH v6 06/20] liveupdate: luo_file: implement file systems callbacks
From: Mike Rapoport @ 2025-11-20 17:20 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <CA+CK2bBFS754hdPfNAkMp_PqNpOB2nY02OkWbhRdoUiZ+ah=jw@mail.gmail.com>

On Mon, Nov 17, 2025 at 12:50:56PM -0500, Pasha Tatashin wrote:
> > > +struct liveupdate_file_handler;
> > > +struct liveupdate_session;
> >
> > Why struct liveupdate_session is a part of public LUO API?
> 
> It is an obscure version of private "struct luo_session", in order to
> give subsystem access to:
> liveupdate_get_file_incoming(s, token, filep)
> liveupdate_get_token_outgoing(s, file, tokenp)
> 
> For example, if your FD depends on another FD within a session, you
> can check if another FD is already preserved via
> liveupdate_get_token_outgoing(), and during retrieval time you can
> retrieve the "struct file" for your dependency.
 
And it's essentially unused right now.

> > > +     }
> > > +
> > > +     return 0;
> > > +
> > > +exit_err:
> > > +     fput(file);
> > > +     luo_session_free_files_mem(session);
> >
> > The error handling in this function is a mess. Pasha, please, please, use
> > goto consistently.
> 
> How is this a mess? There is a single exit_err destination, no
> exception, no early returns except at the very top of the function
> where we do early returns before fget() which makes total sense.
> 
> Do you want to add a separate destination for
> luo_session_free_files_mem() ? But that is not necessary, in many
> places it is considered totally reasonable for free(NULL) to work
> correctly...

You have a mix of releasing resources with goto or inside if (err).
And while basic free() primitives like kfree() and vfree() work correctly
with NULL as a parameter, luo_session_free_files_mem() is already not a
basic primitive and it may grow with a time. It already has two conditions
that essentially prevent anything from freeing and this will grow with the
time.

So yes, I want a separate goto destination for freeing each resource and a
goto for 

	err = fh->ops->preserve(&args);
	if (err)

case.

> > > +             luo_file = kzalloc(sizeof(*luo_file), GFP_KERNEL);
> > > +             if (!luo_file)
> > > +                     return -ENOMEM;
> >
> > Shouldn't we free files allocated on the previous iterations?
> 
> No, for the same reason explained in luo_session.c :-)

A comment here as well please :)

> > > +int liveupdate_get_file_incoming(struct liveupdate_session *s, u64 token,
> > > +                              struct file **filep)
> > > +{
> >
> > Ditto.
> 
> These two functions are part of the public API allowing dependency
> tracking for vfio->iommu->memfd during preservation.

So like with FLB, until we get actual users for them they are dead code. 
And until it's clear how exactly dependency tracking for vfio->iommu->memfd
will work, we won't know if this API is useful at all or we'll need
something else in the end.

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: [PATCH v6 05/20] liveupdate: luo_ioctl: add user interface
From: David Matlack @ 2025-11-20 18:37 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, rppt, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251115233409.768044-6-pasha.tatashin@soleen.com>

On Sat, Nov 15, 2025 at 3:34 PM Pasha Tatashin
<pasha.tatashin@soleen.com> wrote:
> The idea is that there is going to be a single userspace agent driving
> the live update, therefore, only a single process can ever hold this
> device opened at a time.
...
> +static int luo_open(struct inode *inodep, struct file *filep)
> +{
> +       struct luo_device_state *ldev = container_of(filep->private_data,
> +                                                    struct luo_device_state,
> +                                                    miscdev);
> +
> +       if (atomic_cmpxchg(&ldev->in_use, 0, 1))
> +               return -EBUSY;

Can you remind me why the kernel needs to enforce this? What would be
wrong or unsafe from the kernel perspective if there were multiple
userspace agents holding open files for /dev/liveupdate, each with
their own sessions?

^ permalink raw reply

* Re: [PATCH v6 08/20] liveupdate: luo_flb: Introduce File-Lifecycle-Bound global state
From: Mike Rapoport @ 2025-11-20 18:50 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <CA+CK2bASYtBndN24HZhkndDpsrU1rwjCokE=9eLZUq2Jhj6bag@mail.gmail.com>

On Tue, Nov 18, 2025 at 10:37:30AM -0500, Pasha Tatashin wrote:
> On Tue, Nov 18, 2025 at 6:28 AM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > On Mon, Nov 17, 2025 at 10:54:29PM -0500, Pasha Tatashin wrote:
> > > >
> > > > The concept makes sense to me, but it's hard to review the implementation
> > > > without an actual user.
> > >
> > > There are three users: we will have HugeTLB support that is going to
> > > be posted as RFC in a few weeks. Also, in two weeks we are going to
> > > have an updated VFIO and IOMMU series posted both using FLBs. In the
> > > mean time, this series provides an FLB in-kernel test that verifies
> > > that multiple FLBs can be attached to File-Handlers, and the basic
> > > interfaces are working.
> >
> > Which means that essentially there won't be a real kernel user for FLB for
> > a while.
> > We usually don't merge dead code because some future patchset depends on
> > it.
> 
> I understand the concern. I would prefer to merge FLB with the rest of
> the LUO series; I don't view it as completely dead code since I have
> added the in-kernel test that specifically exercises and validates
> this API.

The test exercises a simple happy flow, but it still does not validate that
this API is what we'll be using in the end.
It's quite probable that the first upstream user of FLB will use this exact
API, but chances are that it will require adjustments to "the real life".

It does look sane, but without an actual user (sorry, but the test does not
count) it's hard to anticipate the potential required changes and potential
corner cases.

Let's hold FLB until it can be actually consumed by HugeTLB or VFIO or
IOMMU.
 
> > I think it should stay in mm-nonmm-unstable if Andrew does not mind keeping
> > it there until the first user is going to land and then FLB will move
> > upstream along with that user.
> 
> My reasoning for pushing for inclusion now is that there are many
> developers who currently depend on the FLB functionality. Having it in
> a public tree, preferably upstream, or at least linux-next, would be
> highly beneficial for their development and testing.
> 
> However, to avoid blocking the entire series, I am going to move the
> FLB patch and the in-kernel test patch to be the last two patches in
> LUOv7.
> 
> This way, the rest of the LUO series can be merged without them if
> they are blocked, however, in this case it would be best if the two
> FLB patches stayed in mm tree to allow VFIO/IOMMU/PCI/HugeTLB
> preservation developers to use them, as they all depend on functional
> FLB.

That's pretty much what I'm suggesting just without "if they are blocked" :) 
 
> Pasha

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: [PATCH v6 08/20] liveupdate: luo_flb: Introduce File-Lifecycle-Bound global state
From: Pasha Tatashin @ 2025-11-20 19:10 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aR9i9SXGDQ6bi1mi@kernel.org>

On Thu, Nov 20, 2025 at 1:50 PM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Tue, Nov 18, 2025 at 10:37:30AM -0500, Pasha Tatashin wrote:
> > On Tue, Nov 18, 2025 at 6:28 AM Mike Rapoport <rppt@kernel.org> wrote:
> > >
> > > On Mon, Nov 17, 2025 at 10:54:29PM -0500, Pasha Tatashin wrote:
> > > > >
> > > > > The concept makes sense to me, but it's hard to review the implementation
> > > > > without an actual user.
> > > >
> > > > There are three users: we will have HugeTLB support that is going to
> > > > be posted as RFC in a few weeks. Also, in two weeks we are going to
> > > > have an updated VFIO and IOMMU series posted both using FLBs. In the
> > > > mean time, this series provides an FLB in-kernel test that verifies
> > > > that multiple FLBs can be attached to File-Handlers, and the basic
> > > > interfaces are working.
> > >
> > > Which means that essentially there won't be a real kernel user for FLB for
> > > a while.
> > > We usually don't merge dead code because some future patchset depends on
> > > it.
> >
> > I understand the concern. I would prefer to merge FLB with the rest of
> > the LUO series; I don't view it as completely dead code since I have
> > added the in-kernel test that specifically exercises and validates
> > this API.
>
> The test exercises a simple happy flow, but it still does not validate that
> this API is what we'll be using in the end.
> It's quite probable that the first upstream user of FLB will use this exact
> API, but chances are that it will require adjustments to "the real life".
>
> It does look sane, but without an actual user (sorry, but the test does not
> count) it's hard to anticipate the potential required changes and potential
> corner cases.
>
> Let's hold FLB until it can be actually consumed by HugeTLB or VFIO or
> IOMMU.

Ok

> > > I think it should stay in mm-nonmm-unstable if Andrew does not mind keeping
> > > it there until the first user is going to land and then FLB will move
> > > upstream along with that user.
> >
> > My reasoning for pushing for inclusion now is that there are many
> > developers who currently depend on the FLB functionality. Having it in
> > a public tree, preferably upstream, or at least linux-next, would be
> > highly beneficial for their development and testing.
> >
> > However, to avoid blocking the entire series, I am going to move the
> > FLB patch and the in-kernel test patch to be the last two patches in
> > LUOv7.
> >
> > This way, the rest of the LUO series can be merged without them if
> > they are blocked, however, in this case it would be best if the two
> > FLB patches stayed in mm tree to allow VFIO/IOMMU/PCI/HugeTLB
> > preservation developers to use them, as they all depend on functional
> > FLB.
>
> That's pretty much what I'm suggesting just without "if they are blocked" :)

SGTM

>
> > Pasha
>
> --
> Sincerely yours,
> Mike.
>

^ permalink raw reply

* Re: [PATCH v6 05/20] liveupdate: luo_ioctl: add user interface
From: Pasha Tatashin @ 2025-11-20 19:22 UTC (permalink / raw)
  To: David Matlack
  Cc: pratyush, jasonmiu, graf, rppt, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <CALzav=c-KJg8q8-4EaDC1M+GErTCiRKtn5qRbh1wa08zJ0N4ng@mail.gmail.com>

On Thu, Nov 20, 2025 at 1:38 PM David Matlack <dmatlack@google.com> wrote:
>
> On Sat, Nov 15, 2025 at 3:34 PM Pasha Tatashin
> <pasha.tatashin@soleen.com> wrote:
> > The idea is that there is going to be a single userspace agent driving
> > the live update, therefore, only a single process can ever hold this
> > device opened at a time.
> ...
> > +static int luo_open(struct inode *inodep, struct file *filep)
> > +{
> > +       struct luo_device_state *ldev = container_of(filep->private_data,
> > +                                                    struct luo_device_state,
> > +                                                    miscdev);
> > +
> > +       if (atomic_cmpxchg(&ldev->in_use, 0, 1))
> > +               return -EBUSY;
>
> Can you remind me why the kernel needs to enforce this? What would be
> wrong or unsafe from the kernel perspective if there were multiple
> userspace agents holding open files for /dev/liveupdate, each with
> their own sessions?

By enforcing a singleton, we will ensure a consistent view for tooling
like luoadm (which will track incoming/outgoing sessions, UUIDs, etc.)
and prevent conflicting commands regarding the transition state.

This is not a bottleneck because the vast majority of the work
(preserving devicse/memory) is handled via the individual Session FDs.
Also, since sessions persist even if /dev/liveupdate is closed, we
allow the agent upgrade, or crashing without requiring concurrent
access.

Pasha

^ permalink raw reply

* Re: [PATCH v6 05/20] liveupdate: luo_ioctl: add user interface
From: David Matlack @ 2025-11-20 19:42 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, rppt, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <CA+CK2bD4Y3CMHcTGKradmv-hAbdtA7zsw2CYeh7-8LNianYMZw@mail.gmail.com>

On Thu, Nov 20, 2025 at 11:23 AM Pasha Tatashin
<pasha.tatashin@soleen.com> wrote:
>
> On Thu, Nov 20, 2025 at 1:38 PM David Matlack <dmatlack@google.com> wrote:
> >
> > On Sat, Nov 15, 2025 at 3:34 PM Pasha Tatashin
> > <pasha.tatashin@soleen.com> wrote:
> > > The idea is that there is going to be a single userspace agent driving
> > > the live update, therefore, only a single process can ever hold this
> > > device opened at a time.
> > ...
> > > +static int luo_open(struct inode *inodep, struct file *filep)
> > > +{
> > > +       struct luo_device_state *ldev = container_of(filep->private_data,
> > > +                                                    struct luo_device_state,
> > > +                                                    miscdev);
> > > +
> > > +       if (atomic_cmpxchg(&ldev->in_use, 0, 1))
> > > +               return -EBUSY;
> >
> > Can you remind me why the kernel needs to enforce this? What would be
> > wrong or unsafe from the kernel perspective if there were multiple
> > userspace agents holding open files for /dev/liveupdate, each with
> > their own sessions?
>
> By enforcing a singleton, we will ensure a consistent view for tooling
> like luoadm (which will track incoming/outgoing sessions, UUIDs, etc.)
> and prevent conflicting commands regarding the transition state.
>
> This is not a bottleneck because the vast majority of the work
> (preserving devicse/memory) is handled via the individual Session FDs.
> Also, since sessions persist even if /dev/liveupdate is closed, we
> allow the agent upgrade, or crashing without requiring concurrent
> access.

Yeah, I'm not concerned about bottlenecking. It just seems like an
artificial constraint to impose on userspace at this point. The only
ioctls on /dev/liveupdate are to create a session and retreive a
session. Neither of those will conflict with having multiple open
files for /dev/liveupdate.

^ permalink raw reply

* Re: [PATCH v6 05/20] liveupdate: luo_ioctl: add user interface
From: Pasha Tatashin @ 2025-11-20 20:13 UTC (permalink / raw)
  To: David Matlack
  Cc: pratyush, jasonmiu, graf, rppt, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <CALzav=dmFQr+BrqzRDgio0q68MPRVnZPK4-wUXVj47o1FObgNg@mail.gmail.com>

On Thu, Nov 20, 2025 at 2:43 PM David Matlack <dmatlack@google.com> wrote:
>
> On Thu, Nov 20, 2025 at 11:23 AM Pasha Tatashin
> <pasha.tatashin@soleen.com> wrote:
> >
> > On Thu, Nov 20, 2025 at 1:38 PM David Matlack <dmatlack@google.com> wrote:
> > >
> > > On Sat, Nov 15, 2025 at 3:34 PM Pasha Tatashin
> > > <pasha.tatashin@soleen.com> wrote:
> > > > The idea is that there is going to be a single userspace agent driving
> > > > the live update, therefore, only a single process can ever hold this
> > > > device opened at a time.
> > > ...
> > > > +static int luo_open(struct inode *inodep, struct file *filep)
> > > > +{
> > > > +       struct luo_device_state *ldev = container_of(filep->private_data,
> > > > +                                                    struct luo_device_state,
> > > > +                                                    miscdev);
> > > > +
> > > > +       if (atomic_cmpxchg(&ldev->in_use, 0, 1))
> > > > +               return -EBUSY;
> > >
> > > Can you remind me why the kernel needs to enforce this? What would be
> > > wrong or unsafe from the kernel perspective if there were multiple
> > > userspace agents holding open files for /dev/liveupdate, each with
> > > their own sessions?
> >
> > By enforcing a singleton, we will ensure a consistent view for tooling
> > like luoadm (which will track incoming/outgoing sessions, UUIDs, etc.)
> > and prevent conflicting commands regarding the transition state.
> >
> > This is not a bottleneck because the vast majority of the work
> > (preserving devicse/memory) is handled via the individual Session FDs.
> > Also, since sessions persist even if /dev/liveupdate is closed, we
> > allow the agent upgrade, or crashing without requiring concurrent
> > access.
>
> Yeah, I'm not concerned about bottlenecking. It just seems like an
> artificial constraint to impose on userspace at this point. The only
> ioctls on /dev/liveupdate are to create a session and retreive a
> session. Neither of those will conflict with having multiple open
> files for /dev/liveupdate.

Enforcing tooling consistency, and improving security for global
state. Otherwise, it can be relaxed.

Pasha

^ permalink raw reply

* Re: [PATCH v6 06/20] liveupdate: luo_file: implement file systems callbacks
From: Pasha Tatashin @ 2025-11-20 20:25 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aR9N14KWaz6SdFcw@kernel.org>

On Thu, Nov 20, 2025 at 12:20 PM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Mon, Nov 17, 2025 at 12:50:56PM -0500, Pasha Tatashin wrote:
> > > > +struct liveupdate_file_handler;
> > > > +struct liveupdate_session;
> > >
> > > Why struct liveupdate_session is a part of public LUO API?
> >
> > It is an obscure version of private "struct luo_session", in order to
> > give subsystem access to:
> > liveupdate_get_file_incoming(s, token, filep)
> > liveupdate_get_token_outgoing(s, file, tokenp)
> >
> > For example, if your FD depends on another FD within a session, you
> > can check if another FD is already preserved via
> > liveupdate_get_token_outgoing(), and during retrieval time you can
> > retrieve the "struct file" for your dependency.
>
> And it's essentially unused right now.

I am going to move this API to the end of the series, next to FLB :-)

>
> > > > +     }
> > > > +
> > > > +     return 0;
> > > > +
> > > > +exit_err:
> > > > +     fput(file);
> > > > +     luo_session_free_files_mem(session);
> > >
> > > The error handling in this function is a mess. Pasha, please, please, use
> > > goto consistently.
> >
> > How is this a mess? There is a single exit_err destination, no
> > exception, no early returns except at the very top of the function
> > where we do early returns before fget() which makes total sense.
> >
> > Do you want to add a separate destination for
> > luo_session_free_files_mem() ? But that is not necessary, in many
> > places it is considered totally reasonable for free(NULL) to work
> > correctly...
>
> You have a mix of releasing resources with goto or inside if (err).
> And while basic free() primitives like kfree() and vfree() work correctly
> with NULL as a parameter, luo_session_free_files_mem() is already not a
> basic primitive and it may grow with a time. It already has two conditions
> that essentially prevent anything from freeing and this will grow with the
> time.
>
> So yes, I want a separate goto destination for freeing each resource and a
> goto for
>
>         err = fh->ops->preserve(&args);
>         if (err)

Thanks, I made the change.

>
> case.
>
> > > > +             luo_file = kzalloc(sizeof(*luo_file), GFP_KERNEL);
> > > > +             if (!luo_file)
> > > > +                     return -ENOMEM;
> > >
> > > Shouldn't we free files allocated on the previous iterations?
> >
> > No, for the same reason explained in luo_session.c :-)
>
> A comment here as well please :)

Done

>
> > > > +int liveupdate_get_file_incoming(struct liveupdate_session *s, u64 token,
> > > > +                              struct file **filep)
> > > > +{
> > >
> > > Ditto.
> >
> > These two functions are part of the public API allowing dependency
> > tracking for vfio->iommu->memfd during preservation.
>
> So like with FLB, until we get actual users for them they are dead code.
> And until it's clear how exactly dependency tracking for vfio->iommu->memfd
> will work, we won't know if this API is useful at all or we'll need
> something else in the end.

SGTM

>
> --
> Sincerely yours,
> Mike.

^ permalink raw reply

* Re: [PATCH v4 1/3] init: remove deprecated "load_ramdisk" and "prompt_ramdisk" command line parameters
From: Randy Dunlap @ 2025-11-20 23:44 UTC (permalink / raw)
  To: Askar Safin, linux-fsdevel, linux-kernel
  Cc: Linus Torvalds, Greg Kroah-Hartman, Christian Brauner, Al Viro,
	Jan Kara, Christoph Hellwig, Jens Axboe, Andy Shevchenko,
	Aleksa Sarai, Thomas Weißschuh, Julian Stecklina, Gao Xiang,
	Art Nikpal, Andrew Morton, Alexander Graf, Rob Landley,
	Lennart Poettering, linux-arch, linux-block, initramfs, linux-api,
	linux-doc, Michal Simek, Luis Chamberlain, Kees Cook,
	Thorsten Blum, Heiko Carstens, Arnd Bergmann, Dave Young,
	Christophe Leroy, Krzysztof Kozlowski, Borislav Petkov,
	Jessica Clarke, Nicolas Schichan, David Disseldorp, patches
In-Reply-To: <20251119222407.3333257-2-safinaskar@gmail.com>



On 11/19/25 2:24 PM, Askar Safin wrote:
> ...which do nothing. They were deprecated (in documentation) in
> 6b99e6e6aa62 ("Documentation/admin-guide: blockdev/ramdisk: remove use of
> "rdev"") in 2020 and in kernel messages in c8376994c86c ("initrd: remove
> support for multiple floppies") in 2020.
> 
> Signed-off-by: Askar Safin <safinaskar@gmail.com>

Reviewed-by: Randy Dunlap <rdunlap@infradead.org>

Thanks.

> ---
>  Documentation/admin-guide/kernel-parameters.txt | 4 ----
>  arch/arm/configs/neponset_defconfig             | 2 +-
>  init/do_mounts.c                                | 7 -------
>  init/do_mounts_rd.c                             | 7 -------
>  4 files changed, 1 insertion(+), 19 deletions(-)


-- 
~Randy

^ permalink raw reply

* Re: [PATCH v6 03/20] kexec: call liveupdate_reboot() before kexec
From: Pratyush Yadav @ 2025-11-21 15:55 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, rppt, dmatlack, rientjes, corbet,
	rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm,
	tj, yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, lennart,
	brauner, linux-api, linux-fsdevel, saeedm, ajayachandra, jgg,
	parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251115233409.768044-4-pasha.tatashin@soleen.com>

On Sat, Nov 15 2025, Pasha Tatashin wrote:

> Modify the kernel_kexec() to call liveupdate_reboot().
>
> This ensures that the Live Update Orchestrator is notified just
> before the kernel executes the kexec jump. The liveupdate_reboot()
> function triggers the final freeze event, allowing participating
> FDs perform last-minute check or state saving within the blackout
> window.
>
> If liveupdate_reboot() returns an error (indicating a failure during
> LUO finalization), the kexec operation is aborted to prevent proceeding
> with an inconsistent state. An error is returned to user.
>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>

Reviewed-by: Pratyush Yadav <pratyush@kernel.org>

[...]

-- 
Regards,
Pratyush Yadav

^ permalink raw reply

* Re: [PATCH v6 04/20] liveupdate: luo_session: add sessions support
From: Pratyush Yadav @ 2025-11-21 16:32 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, rppt, dmatlack, rientjes, corbet,
	rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm,
	tj, yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, lennart,
	brauner, linux-api, linux-fsdevel, saeedm, ajayachandra, jgg,
	parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251115233409.768044-5-pasha.tatashin@soleen.com>

On Sat, Nov 15 2025, Pasha Tatashin wrote:

> Introduce concept of "Live Update Sessions" within the LUO framework.
> LUO sessions provide a mechanism to group and manage `struct file *`
> instances (representing file descriptors) that need to be preserved
> across a kexec-based live update.
>
> Each session is identified by a unique name and acts as a container
> for file objects whose state is critical to a userspace workload, such
> as a virtual machine or a high-performance database, aiming to maintain
> their functionality across a kernel transition.
>
> This groundwork establishes the framework for preserving file-backed
> state across kernel updates, with the actual file data preservation
> mechanisms to be implemented in subsequent patches.
>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
[...]
>  
>  #ifndef _LINUX_LIVEUPDATE_ABI_LUO_H
>  #define _LINUX_LIVEUPDATE_ABI_LUO_H
>  
> +#include <uapi/linux/liveupdate.h>
> +
>  /*
>   * The LUO FDT hooks all LUO state for sessions, fds, etc.
> - * In the root it allso carries "liveupdate-number" 64-bit property that
> + * In the root it also carries "liveupdate-number" 64-bit property that

Nit: This needs a bit of patch massaging. Patch 2 added the typo, and
this patch fixes it. It would be better to just update patch 2.

>   * corresponds to the number of live-updates performed on this machine.
>   */
>  #define LUO_FDT_SIZE		PAGE_SIZE
> @@ -51,4 +82,54 @@
>  #define LUO_FDT_COMPATIBLE	"luo-v1"
>  #define LUO_FDT_LIVEUPDATE_NUM	"liveupdate-number"
>  
> +/*
> + * LUO FDT session node
> + * LUO_FDT_SESSION_HEADER:  is a u64 physical address of struct
> + *                          luo_session_header_ser
> + */
> +#define LUO_FDT_SESSION_NODE_NAME	"luo-session"
> +#define LUO_FDT_SESSION_COMPATIBLE	"luo-session-v1"
> +#define LUO_FDT_SESSION_HEADER		"luo-session-header"
> +
> +/**
> + * struct luo_session_header_ser - Header for the serialized session data block.
> + * @pgcnt: The total size, in pages, of the entire preserved memory block
> + *         that this header describes.
> + * @count: The number of 'struct luo_session_ser' entries that immediately
> + *         follow this header in the memory block.
> + *
> + * This structure is located at the beginning of a contiguous block of
> + * physical memory preserved across the kexec. It provides the necessary
> + * metadata to interpret the array of session entries that follow.
> + */
> +struct luo_session_header_ser {
> +	u64 pgcnt;

Why do you need pgcnt here? Can't the size be inferred from count? And
since you use contiguous memory block, the folio will know its page
count anyway, right? The less we have in the ABI the better IMO.

Same for other structures below.

> +	u64 count;
> +} __packed;
> +
> +/**
> + * struct luo_session_ser - Represents the serialized metadata for a LUO session.
> + * @name:    The unique name of the session, copied from the `luo_session`
> + *           structure.
> + * @files:   The physical address of a contiguous memory block that holds
> + *           the serialized state of files.
> + * @pgcnt:   The number of pages occupied by the `files` memory block.
> + * @count:   The total number of files that were part of this session during
> + *           serialization. Used for iteration and validation during
> + *           restoration.
> + *
> + * This structure is used to package session-specific metadata for transfer
> + * between kernels via Kexec Handover. An array of these structures (one per
> + * session) is created and passed to the new kernel, allowing it to reconstruct
> + * the session context.
> + *
> + * If this structure is modified, LUO_SESSION_COMPATIBLE must be updated.
> + */
> +struct luo_session_ser {
> +	char name[LIVEUPDATE_SESSION_NAME_LENGTH];
> +	u64 files;
> +	u64 pgcnt;
> +	u64 count;
> +} __packed;
> +
>  #endif /* _LINUX_LIVEUPDATE_ABI_LUO_H */
[...]
> +/* Create a "struct file" for session */
> +static int luo_session_getfile(struct luo_session *session, struct file **filep)
> +{
> +	char name_buf[128];
> +	struct file *file;
> +
> +	guard(mutex)(&session->mutex);
> +	snprintf(name_buf, sizeof(name_buf), "[luo_session] %s", session->name);
> +	file = anon_inode_getfile(name_buf, &luo_session_fops, session, O_RDWR);

Nit: You can return the file directly and get rid of filep.

> +	if (IS_ERR(file))
> +		return PTR_ERR(file);
> +
> +	*filep = file;
> +
> +	return 0;
> +}
[...]
> +int __init luo_session_setup_outgoing(void *fdt_out)
> +{
> +	struct luo_session_header_ser *header_ser;
> +	u64 header_ser_pa;
> +	int err;
> +
> +	header_ser = kho_alloc_preserve(LUO_SESSION_PGCNT << PAGE_SHIFT);

Nit: The naming is a bit confusing here. At first glance I thought this
was just allocating the header, but it allocates the whole session
serialization buffer.

> +	if (IS_ERR(header_ser))
> +		return PTR_ERR(header_ser);
> +	header_ser_pa = virt_to_phys(header_ser);
> +
> +	err = fdt_begin_node(fdt_out, LUO_FDT_SESSION_NODE_NAME);
> +	err |= fdt_property_string(fdt_out, "compatible",
> +				   LUO_FDT_SESSION_COMPATIBLE);
> +	err |= fdt_property(fdt_out, LUO_FDT_SESSION_HEADER, &header_ser_pa,
> +			    sizeof(header_ser_pa));
> +	err |= fdt_end_node(fdt_out);
> +
> +	if (err)
> +		goto err_unpreserve;
> +
> +	header_ser->pgcnt = LUO_SESSION_PGCNT;
> +	INIT_LIST_HEAD(&luo_session_global.outgoing.list);
> +	init_rwsem(&luo_session_global.outgoing.rwsem);
> +	luo_session_global.outgoing.header_ser = header_ser;
> +	luo_session_global.outgoing.ser = (void *)(header_ser + 1);
> +	luo_session_global.outgoing.active = true;
> +
> +	return 0;
> +
> +err_unpreserve:
> +	kho_unpreserve_free(header_ser);
> +	return err;
> +}
[...]
> +int luo_session_deserialize(void)
> +{
> +	struct luo_session_header *sh = &luo_session_global.incoming;
> +	int err;
> +
> +	if (luo_session_is_deserialized())
> +		return 0;
> +
> +	luo_session_global.deserialized = true;
> +	if (!sh->active) {
> +		INIT_LIST_HEAD(&sh->list);
> +		init_rwsem(&sh->rwsem);

Nit: it would be a bit simpler if LUO init always initialized this. And
then luo_session_setup_incoming() can fill the list if it has any data.
Slight reduction in code duplication and mental load.

> +		return 0;
> +	}
> +
> +	for (int i = 0; i < sh->header_ser->count; i++) {
> +		struct luo_session *session;
> +
> +		session = luo_session_alloc(sh->ser[i].name);
> +		if (IS_ERR(session)) {
> +			pr_warn("Failed to allocate session [%s] during deserialization %pe\n",
> +				sh->ser[i].name, session);
> +			return PTR_ERR(session);
> +		}
> +
> +		err = luo_session_insert(sh, session); 
> +		if (err) {
> +			luo_session_free(session);
> +			pr_warn("Failed to insert session [%s] %pe\n",
> +				session->name, ERR_PTR(err));
> +			return err;
> +		}
> +
> +		session->count = sh->ser[i].count;
> +		session->files = sh->ser[i].files ? phys_to_virt(sh->ser[i].files) : 0;
> +		session->pgcnt = sh->ser[i].pgcnt;
> +	}
> +
> +	kho_restore_free(sh->header_ser);
> +	sh->header_ser = NULL;
> +	sh->ser = NULL;
> +
> +	return 0;
> +}
[...]

-- 
Regards,
Pratyush Yadav

^ permalink raw reply

* Re: [PATCH v6 05/20] liveupdate: luo_ioctl: add user interface
From: Pratyush Yadav @ 2025-11-21 16:45 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, rppt, dmatlack, rientjes, corbet,
	rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm,
	tj, yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, lennart,
	brauner, linux-api, linux-fsdevel, saeedm, ajayachandra, jgg,
	parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251115233409.768044-6-pasha.tatashin@soleen.com>

On Sat, Nov 15 2025, Pasha Tatashin wrote:

> Introduce the user-space interface for the Live Update Orchestrator
> via ioctl commands, enabling external control over the live update
> process and management of preserved resources.
>
> The idea is that there is going to be a single userspace agent driving
> the live update, therefore, only a single process can ever hold this
> device opened at a time.
>
> The following ioctl commands are introduced:
>
> LIVEUPDATE_IOCTL_CREATE_SESSION
> Provides a way for userspace to create a named session for grouping file
> descriptors that need to be preserved. It returns a new file descriptor
> representing the session.
>
> LIVEUPDATE_IOCTL_RETRIEVE_SESSION
> Allows the userspace agent in the new kernel to reclaim a preserved
> session by its name, receiving a new file descriptor to manage the
> restored resources.
>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>

Reviewed-by: Pratyush Yadav <pratyush@kernel.org>

[...]

-- 
Regards,
Pratyush Yadav

^ permalink raw reply

* Re: [PATCH v6 06/20] liveupdate: luo_file: implement file systems callbacks
From: Pratyush Yadav @ 2025-11-21 17:24 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, rppt, dmatlack, rientjes, corbet,
	rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm,
	tj, yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, lennart,
	brauner, linux-api, linux-fsdevel, saeedm, ajayachandra, jgg,
	parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251115233409.768044-7-pasha.tatashin@soleen.com>

On Sat, Nov 15 2025, Pasha Tatashin wrote:

> This patch implements the core mechanism for managing preserved
> files throughout the live update lifecycle. It provides the logic to
> invoke the file handler callbacks (preserve, unpreserve, freeze,
> unfreeze, retrieve, and finish) at the appropriate stages.
>
> During the reboot phase, luo_file_freeze() serializes the final
> metadata for each file (handler compatible string, token, and data
> handle) into a memory region preserved by KHO. In the new kernel,
> luo_file_deserialize() reconstructs the in-memory file list from this
> data, preparing the session for retrieval.
>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
[...]
> +
> +static LIST_HEAD(luo_file_handler_list);
> +
> +/* 2 4K pages, give space for 128 files per session */
> +#define LUO_FILE_PGCNT		2ul
> +#define LUO_FILE_MAX							\
> +	((LUO_FILE_PGCNT << PAGE_SHIFT) / sizeof(struct luo_file_ser))
> +
> +/**
> + * struct luo_file - Represents a single preserved file instance.
> + * @fh:            Pointer to the &struct liveupdate_file_handler that manages
> + *                 this type of file.
> + * @file:          Pointer to the kernel's &struct file that is being preserved.
> + *                 This is NULL in the new kernel until the file is successfully
> + *                 retrieved.
> + * @serialized_data: The opaque u64 handle to the serialized state of the file.
> + *                 This handle is passed back to the handler's .freeze(),
> + *                 .retrieve(), and .finish() callbacks, allowing it to track
> + *                 and update its serialized state across phases.
> + * @retrieved:     A flag indicating whether a user/kernel in the new kernel has
> + *                 successfully called retrieve() on this file. This prevents
> + *                 multiple retrieval attempts.
> + * @mutex:         A mutex that protects the fields of this specific instance
> + *                 (e.g., @retrieved, @file), ensuring that operations like
> + *                 retrieving or finishing a file are atomic.
> + * @list:          The list_head linking this instance into its parent
> + *                 session's list of preserved files.
> + * @token:         The user-provided unique token used to identify this file.
> + *
> + * This structure is the core in-kernel representation of a single file being
> + * managed through a live update. An instance is created by luo_preserve_file()
> + * to link a 'struct file' to its corresponding handler, a user-provided token,
> + * and the serialized state handle returned by the handler's .preserve()
> + * operation.
> + *
> + * These instances are tracked in a per-session list. The @serialized_data
> + * field, which holds a handle to the file's serialized state, may be updated
> + * during the .freeze() callback before being serialized for the next kernel.
> + * After reboot, these structures are recreated by luo_file_deserialize() and
> + * are finally cleaned up by luo_file_finish().
> + */
> +struct luo_file {
> +	struct liveupdate_file_handler *fh;
> +	struct file *file;
> +	u64 serialized_data;
> +	bool retrieved;
> +	struct mutex mutex;
> +	struct list_head list;
> +	u64 token;
> +};
> +
> +static int luo_session_alloc_files_mem(struct luo_session *session)
> +{
> +	size_t size;
> +	void *mem;
> +
> +	if (session->files)
> +		return 0;
> +
> +	WARN_ON_ONCE(session->count);
> +
> +	size = LUO_FILE_PGCNT << PAGE_SHIFT;
> +	mem = kho_alloc_preserve(size);
> +	if (IS_ERR(mem))
> +		return PTR_ERR(mem);
> +
> +	session->files = mem;
> +	session->pgcnt = LUO_FILE_PGCNT;

I think this is a layering violation. luo_session should take care of
managing the session, including the memory it needs. luo_files should
take care of managing the file, including the memory it needs for _the
file_. I think proper layering will make the code a lot easier to grok
and modify later. When I want to see how sessions are handled, I can do
to luo_session.c. I won't have to poke into luo_files.c.

So I think luo_session_preserve_fd() should first make sure there is
memory available to store the file in the session, and only then call
luo_preserve_file().

> +
> +	return 0;
> +}
> +
> +static void luo_session_free_files_mem(struct luo_session *session)
> +{
> +	/* If session has files, no need to free preservation memory */
> +	if (session->count)
> +		return;
> +
> +	if (!session->files)
> +		return;
> +
> +	kho_unpreserve_free(session->files);
> +	session->files = NULL;
> +	session->pgcnt = 0;
> +}
> +
> +static bool luo_token_is_used(struct luo_session *session, u64 token)
> +{
> +	struct luo_file *iter;
> +
> +	list_for_each_entry(iter, &session->files_list, list) {
> +		if (iter->token == token)
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
> +/**
> + * luo_preserve_file - Initiate the preservation of a file descriptor.
> + * @session: The session to which the preserved file will be added.
> + * @token:   A unique, user-provided identifier for the file.
> + * @fd:      The file descriptor to be preserved.
> + *
> + * This function orchestrates the first phase of preserving a file. Upon entry,
> + * it takes a reference to the 'struct file' via fget(), effectively making LUO
> + * a co-owner of the file. This reference is held until the file is either
> + * unpreserved or successfully finished in the next kernel, preventing the file
> + * from being prematurely destroyed.
> + *
> + * This function orchestrates the first phase of preserving a file. It performs
> + * the following steps:
> + *
> + * 1. Validates that the @token is not already in use within the session.
> + * 2. Ensures the session's memory for files serialization is allocated
> + *    (allocates if needed).
> + * 3. Iterates through registered handlers, calling can_preserve() to find one
> + *    compatible with the given @fd.
> + * 4. Calls the handler's .preserve() operation, which saves the file's state
> + *    and returns an opaque private data handle.
> + * 5. Adds the new instance to the session's internal list.
> + *
> + * On success, LUO takes a reference to the 'struct file' and considers it
> + * under its management until it is unpreserved or finished.
> + *
> + * In case of any failure, all intermediate allocations (file reference, memory
> + * for the 'luo_file' struct, etc.) are cleaned up before returning an error.
> + *
> + * Context: Can be called from an ioctl handler during normal system operation.
> + * Return: 0 on success. Returns a negative errno on failure:
> + *         -EEXIST if the token is already used.
> + *         -EBADF if the file descriptor is invalid.
> + *         -ENOSPC if the session is full.
> + *         -ENOENT if no compatible handler is found.
> + *         -ENOMEM on memory allocation failure.
> + *         Other erros might be returned by .preserve().
> + */
> +int luo_preserve_file(struct luo_session *session, u64 token, int fd)
> +{
> +	struct liveupdate_file_op_args args = {0};
> +	struct liveupdate_file_handler *fh;
> +	struct luo_file *luo_file;
> +	struct file *file;
> +	int err;
> +
> +	lockdep_assert_held(&session->mutex);
> +
> +	if (luo_token_is_used(session, token))
> +		return -EEXIST;
> +
> +	file = fget(fd);
> +	if (!file)
> +		return -EBADF;
> +
> +	err = luo_session_alloc_files_mem(session);
> +	if (err)
> +		goto  exit_err;
> +
> +	if (session->count == LUO_FILE_MAX) {
> +		err = -ENOSPC;
> +		goto exit_err;
> +	}

Similarly, luo_file has no business knowing the size of a session.
Checking session->count should also be done in
luo_session_preserve_fd(). luo_preserve_file() should never be called if
there is no space _in the session_ to accommodate the file.

> +
> +	err = -ENOENT;
> +	list_for_each_entry(fh, &luo_file_handler_list, list) {
> +		if (fh->ops->can_preserve(fh, file)) {
> +			err = 0;
> +			break;
> +		}
> +	}
> +
> +	/* err is still -ENOENT if no handler was found */
> +	if (err)
> +		goto exit_err;
> +
> +	luo_file = kzalloc(sizeof(*luo_file), GFP_KERNEL);
> +	if (!luo_file) {
> +		err = -ENOMEM;
> +		goto exit_err;
> +	}
> +
> +	luo_file->file = file;
> +	luo_file->fh = fh;
> +	luo_file->token = token;
> +	luo_file->retrieved = false;
> +	mutex_init(&luo_file->mutex);
> +
> +	args.handler = fh;
> +	args.session = (struct liveupdate_session *)session;
> +	args.file = file;
> +	err = fh->ops->preserve(&args);
> +	if (err) {
> +		mutex_destroy(&luo_file->mutex);
> +		kfree(luo_file);
> +		goto exit_err;
> +	} else {
> +		luo_file->serialized_data = args.serialized_data;
> +		list_add_tail(&luo_file->list, &session->files_list);
> +		session->count++;
> +	}
> +
> +	return 0;
> +
> +exit_err:
> +	fput(file);
> +	luo_session_free_files_mem(session);
> +
> +	return err;
> +}
> +
> +/**
> + * luo_file_unpreserve_files - Unpreserves all files from a session.
> + * @session: The session to be cleaned up.
> + *
> + * This function serves as the primary cleanup path for a session. It is
> + * invoked when the userspace agent closes the session's file descriptor.
> + *
> + * For each file, it performs the following cleanup actions:
> + *   1. Calls the handler's .unpreserve() callback to allow the handler to
> + *      release any resources it allocated.
> + *   2. Removes the file from the session's internal tracking list.
> + *   3. Releases the reference to the 'struct file' that was taken by
> + *      luo_preserve_file() via fput(), returning ownership.
> + *   4. Frees the memory associated with the internal 'struct luo_file'.
> + *
> + * After all individual files are unpreserved, it frees the contiguous memory
> + * block that was allocated to hold their serialization data.
> + */
> +void luo_file_unpreserve_files(struct luo_session *session)
> +{
> +	struct luo_file *luo_file;
> +
> +	lockdep_assert_held(&session->mutex);
> +
> +	while (!list_empty(&session->files_list)) {

Continuing with the layering thing, the list belongs to the session, so
it should traverse it. luo_session_release() should traverse the list
and call luo_file_unpreserve() on each file in the list. The body of
this loop becomes luo_file_unpreserve().

> +		struct liveupdate_file_op_args args = {0};
> +
> +		luo_file = list_last_entry(&session->files_list,
> +					   struct luo_file, list);
> +
> +		args.handler = luo_file->fh;
> +		args.session = (struct liveupdate_session *)session;
> +		args.file = luo_file->file;
> +		args.serialized_data = luo_file->serialized_data;
> +		luo_file->fh->ops->unpreserve(&args);
> +
> +		list_del(&luo_file->list);
> +		session->count--;

... and these two go into luo_session_release().

> +
> +		fput(luo_file->file);
> +		mutex_destroy(&luo_file->mutex);
> +		kfree(luo_file);
> +	}
> +
> +	luo_session_free_files_mem(session);
> +}
> +
> +static int luo_file_freeze_one(struct luo_session *session,
> +			       struct luo_file *luo_file)
> +{
> +	int err = 0;
> +
> +	guard(mutex)(&luo_file->mutex);
> +
> +	if (luo_file->fh->ops->freeze) {

Nit: "if (!luo_file->fh->ops->freeze) return 0;" would make this tad bit
neater. You probably don't even need the mutex since ops are const.

> +		struct liveupdate_file_op_args args = {0};
> +
> +		args.handler = luo_file->fh;
> +		args.session = (struct liveupdate_session *)session;
> +		args.file = luo_file->file;
> +		args.serialized_data = luo_file->serialized_data;
> +
> +		err = luo_file->fh->ops->freeze(&args);
> +		if (!err)
> +			luo_file->serialized_data = args.serialized_data;

Then this can be:

	if (err)
		return err;

	luo_file->serialized_data = args.serialized_data;
	return 0;

> +	}
> +
> +	return err;
> +}
> +
> +static void luo_file_unfreeze_one(struct luo_session *session,
> +				  struct luo_file *luo_file)
> +{
> +	guard(mutex)(&luo_file->mutex);
> +
> +	if (luo_file->fh->ops->unfreeze) {

Same here.

> +		struct liveupdate_file_op_args args = {0};
> +
> +		args.handler = luo_file->fh;
> +		args.session = (struct liveupdate_session *)session;
> +		args.file = luo_file->file;
> +		args.serialized_data = luo_file->serialized_data;
> +
> +		luo_file->fh->ops->unfreeze(&args);
> +	}
> +
> +	luo_file->serialized_data = 0;

The file will also need to be unpreserved after unfreeze. Resetting the
data here is not the right thing, since unpreserve is responsible for
freeing things, and it won't have access to its data.

> +}
> +
> +static void __luo_file_unfreeze(struct luo_session *session,
> +				struct luo_file *failed_entry)
> +{
> +	struct list_head *files_list = &session->files_list;
> +	struct luo_file *luo_file;
> +
> +	list_for_each_entry(luo_file, files_list, list) {
> +		if (luo_file == failed_entry)
> +			break;
> +
> +		luo_file_unfreeze_one(session, luo_file);
> +	}
> +
> +	memset(session->files, 0, session->pgcnt << PAGE_SHIFT);
> +}
> +
> +/**
> + * luo_file_freeze - Freezes all preserved files and serializes their metadata.
> + * @session: The session whose files are to be frozen.
> + *
> + * This function is called from the reboot() syscall path, just before the
> + * kernel transitions to the new image via kexec. Its purpose is to perform the
> + * final preparation and serialization of all preserved files in the session.
> + *
> + * It iterates through each preserved file in FIFO order (the order of
> + * preservation) and performs two main actions:
> + *
> + * 1. Freezes the File: It calls the handler's .freeze() callback for each
> + *    file. This gives the handler a final opportunity to quiesce the device or
> + *    prepare its state for the upcoming reboot. The handler may update its
> + *    private data handle during this step.
> + *
> + * 2. Serializes Metadata: After a successful freeze, it copies the final file
> + *    metadata—the handler's compatible string, the user token, and the final
> + *    private data handle—into the pre-allocated contiguous memory buffer
> + *    (session->files) that will be handed over to the next kernel via KHO.
> + *
> + * Error Handling (Rollback):
> + * This function is atomic. If any handler's .freeze() operation fails, the
> + * entire live update is aborted. The __luo_file_unfreeze() helper is
> + * immediately called to invoke the .unfreeze() op on all files that were
> + * successfully frozen before the point of failure, rolling them back to a
> + * running state. The function then returns an error, causing the reboot()
> + * syscall to fail.
> + *
> + * Context: Called only from the liveupdate_reboot() path.
> + * Return: 0 on success, or a negative errno on failure.
> + */
> +int luo_file_freeze(struct luo_session *session)
> +{
> +	struct luo_file_ser *file_ser = session->files;
> +	struct luo_file *luo_file;
> +	int err;
> +	int i;
> +
> +	lockdep_assert_held(&session->mutex);
> +
> +	if (!session->count)
> +		return 0;

Same comment about layering here...

> +
> +	if (WARN_ON(!file_ser))
> +		return -EINVAL;
> +
> +	i = 0;
> +	list_for_each_entry(luo_file, &session->files_list, list) {
> +		err = luo_file_freeze_one(session, luo_file);
> +		if (err < 0) {
> +			pr_warn("Freeze failed for session[%s] token[%#0llx] handler[%s] err[%pe]\n",
> +				session->name, luo_file->token,
> +				luo_file->fh->compatible, ERR_PTR(err));
> +			goto exit_err;
> +		}
> +
> +		strscpy(file_ser[i].compatible, luo_file->fh->compatible,
> +			sizeof(file_ser[i].compatible));
> +		file_ser[i].data = luo_file->serialized_data;
> +		file_ser[i].token = luo_file->token;
> +		i++;
> +	}
> +
> +	return 0;
> +
> +exit_err:
> +	__luo_file_unfreeze(session, luo_file);
> +
> +	return err;
> +}
> +
> +/**
> + * luo_file_unfreeze - Unfreezes all files in a session.
> + * @session: The session whose files are to be unfrozen.
> + *
> + * This function rolls back the state of all files in a session after the freeze
> + * phase has begun but must be aborted. It is the counterpart to
> + * luo_file_freeze().
> + *
> + * It invokes the __luo_file_unfreeze() helper with a NULL argument, which
> + * signals the helper to iterate through all files in the session  and call
> + * their respective .unfreeze() handler callbacks.
> + *
> + * Context: This is called when the live update is aborted during
> + *          the reboot() syscall, after luo_file_freeze() has been called.
> + */
> +void luo_file_unfreeze(struct luo_session *session)
> +{
> +	lockdep_assert_held(&session->mutex);
> +
> +	if (!session->count)
> +		return;

... and here.

> +
> +	__luo_file_unfreeze(session, NULL);
> +}
> +
> +/**
> + * luo_retrieve_file - Restores a preserved file from a session by its token.
> + * @session: The session from which to retrieve the file.
> + * @token:   The unique token identifying the file to be restored.
> + * @filep:   Output parameter; on success, this is populated with a pointer
> + *           to the newly retrieved 'struct file'.
> + *
> + * This function is the primary mechanism for recreating a file in the new
> + * kernel after a live update. It searches the session's list of deserialized
> + * files for an entry matching the provided @token.
> + *
> + * The operation is idempotent: if a file has already been successfully
> + * retrieved, this function will simply return a pointer to the existing
> + * 'struct file' and report success without re-executing the retrieve
> + * operation. This is handled by checking the 'retrieved' flag under a lock.
> + *
> + * File retrieval can happen in any order; it is not bound by the order of
> + * preservation.
> + *
> + * Context: Can be called from an ioctl or other in-kernel code in the new
> + *          kernel.
> + * Return: 0 on success. Returns a negative errno on failure:
> + *         -ENOENT if no file with the matching token is found.
> + *         Any error code returned by the handler's .retrieve() op.
> + */
> +int luo_retrieve_file(struct luo_session *session, u64 token,
> +		      struct file **filep)
> +{
> +	struct liveupdate_file_op_args args = {0};
> +	struct luo_file *luo_file;
> +	int err;
> +
> +	lockdep_assert_held(&session->mutex);
> +
> +	if (list_empty(&session->files_list))
> +		return -ENOENT;

... and here.

> +
> +	list_for_each_entry(luo_file, &session->files_list, list) {
> +		if (luo_file->token == token)
> +			break;
> +	}
> +
> +	if (luo_file->token != token)
> +		return -ENOENT;
> +
> +	guard(mutex)(&luo_file->mutex);
> +	if (luo_file->retrieved) {
> +		/*
> +		 * Someone is asking for this file again, so get a reference
> +		 * for them.
> +		 */

Should we even allow this? Is there a use case?

> +		get_file(luo_file->file);
> +		*filep = luo_file->file;
> +		return 0;
> +	}
> +
> +	args.handler = luo_file->fh;
> +	args.session = (struct liveupdate_session *)session;
> +	args.serialized_data = luo_file->serialized_data;
> +	err = luo_file->fh->ops->retrieve(&args);
> +	if (!err) {
> +		luo_file->file = args.file;
> +
> +		/* Get reference so we can keep this file in LUO until finish */
> +		get_file(luo_file->file);
> +		*filep = luo_file->file;
> +		luo_file->retrieved = true;
> +	}
> +
> +	return err;
> +}
> +
> +static int luo_file_can_finish_one(struct luo_session *session,
> +				   struct luo_file *luo_file)
> +{
> +	bool can_finish = true;
> +
> +	guard(mutex)(&luo_file->mutex);
> +
> +	if (luo_file->fh->ops->can_finish) {

Same nitpick about doing "if (!luo_file->fh->ops->can_finish)".

> +		struct liveupdate_file_op_args args = {0};
> +
> +		args.handler = luo_file->fh;
> +		args.session = (struct liveupdate_session *)session;
> +		args.file = luo_file->file;
> +		args.serialized_data = luo_file->serialized_data;
> +		args.retrieved = luo_file->retrieved;
> +		can_finish = luo_file->fh->ops->can_finish(&args);
> +	}
> +
> +	return can_finish ? 0 : -EBUSY;
> +}
> +
> +static void luo_file_finish_one(struct luo_session *session,
> +				struct luo_file *luo_file)
> +{
> +	struct liveupdate_file_op_args args = {0};
> +
> +	guard(mutex)(&luo_file->mutex);
> +
> +	args.handler = luo_file->fh;
> +	args.session = (struct liveupdate_session *)session;
> +	args.file = luo_file->file;
> +	args.serialized_data = luo_file->serialized_data;
> +	args.retrieved = luo_file->retrieved;
> +
> +	luo_file->fh->ops->finish(&args);
> +}
> +
> +/**
> + * luo_file_finish - Completes the lifecycle for all files in a session.
> + * @session: The session to be finalized.
> + *
> + * This function orchestrates the final teardown of a live update session in the
> + * new kernel. It should be called after all necessary files have been
> + * retrieved and the userspace agent is ready to release the preserved state.
> + *
> + * The function iterates through all tracked files. For each file, it performs
> + * the following sequence of cleanup actions:
> + *
> + * 1. If file is not yet retrieved, retrieves it, and calls can_finish() on
> + *    every file in the session. If all can_finish return true, continue to
> + *    finish.
> + * 2. Calls the handler's .finish() callback (via luo_file_finish_one) to
> + *    allow for final resource cleanup within the handler.
> + * 3. Releases LUO's ownership reference on the 'struct file' via fput(). This
> + *    is the counterpart to the get_file() call in luo_retrieve_file().
> + * 4. Removes the 'struct luo_file' from the session's internal list.
> + * 5. Frees the memory for the 'struct luo_file' instance itself.
> + *
> + * After successfully finishing all individual files, it frees the
> + * contiguous memory block that was used to transfer the serialized metadata
> + * from the previous kernel.
> + *
> + * Error Handling (Atomic Failure):
> + * This operation is atomic. If any handler's .can_finish() op fails, the entire
> + * function aborts immediately and returns an error.
> + *
> + * Context: Can be called from an ioctl handler in the new kernel.
> + * Return: 0 on success, or a negative errno on failure.
> + */
> +int luo_file_finish(struct luo_session *session)
> +{
> +	struct list_head *files_list = &session->files_list;
> +	struct luo_file *luo_file;
> +	int err;
> +
> +	if (!session->count)
> +		return 0;

Layering comment again.

> +
> +	lockdep_assert_held(&session->mutex);
> +
> +	list_for_each_entry(luo_file, files_list, list) {
> +		err = luo_file_can_finish_one(session, luo_file);
> +		if (err)
> +			return err;
> +	}
> +
> +	while (!list_empty(&session->files_list)) {
> +		luo_file = list_last_entry(&session->files_list,
> +					   struct luo_file, list);
> +
> +		luo_file_finish_one(session, luo_file);
> +
> +		if (luo_file->file)
> +			fput(luo_file->file);
> +		list_del(&luo_file->list);
> +		session->count--;
> +		mutex_destroy(&luo_file->mutex);
> +		kfree(luo_file);
> +	}
> +
> +	if (session->files) {
> +		kho_restore_free(session->files);
> +		session->files = NULL;
> +		session->pgcnt = 0;
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * luo_file_deserialize - Reconstructs the list of preserved files in the new kernel.
> + * @session: The incoming session containing the serialized file data from KHO.
> + *
> + * This function is called during the early boot process of the new kernel. It
> + * takes the raw, contiguous memory block of 'struct luo_file_ser' entries,
> + * provided by the previous kernel, and transforms it back into a live,
> + * in-memory linked list of 'struct luo_file' instances.
> + *
> + * For each serialized entry, it performs the following steps:
> + *   1. Reads the 'compatible' string.
> + *   2. Searches the global list of registered file handlers for one that
> + *      matches the compatible string.
> + *   3. Allocates a new 'struct luo_file'.
> + *   4. Populates the new structure with the deserialized data (token, private
> + *      data handle) and links it to the found handler. The 'file' pointer is
> + *      initialized to NULL, as the file has not been retrieved yet.
> + *   5. Adds the new 'struct luo_file' to the session's files_list.
> + *
> + * This prepares the session for userspace, which can later call
> + * luo_retrieve_file() to restore the actual file descriptors.
> + *
> + * Context: Called from session deserialization.
> + */
> +int luo_file_deserialize(struct luo_session *session)
> +{
> +	struct luo_file_ser *file_ser;
> +	u64 i;
> +
> +	lockdep_assert_held(&session->mutex);
> +
> +	if (!session->files)
> +		return 0;

Layering again.

> +
> +	file_ser = session->files;
> +	for (i = 0; i < session->count; i++) {
> +		struct liveupdate_file_handler *fh;
> +		bool handler_found = false;
> +		struct luo_file *luo_file;
> +
> +		list_for_each_entry(fh, &luo_file_handler_list, list) {
> +			if (!strcmp(fh->compatible, file_ser[i].compatible)) {
> +				handler_found = true;
> +				break;
> +			}
> +		}
> +
> +		if (!handler_found) {
> +			pr_warn("No registered handler for compatible '%s'\n",
> +				file_ser[i].compatible);
> +			return -ENOENT;
> +		}
> +
> +		luo_file = kzalloc(sizeof(*luo_file), GFP_KERNEL);
> +		if (!luo_file)
> +			return -ENOMEM;
> +
> +		luo_file->fh = fh;
> +		luo_file->file = NULL;
> +		luo_file->serialized_data = file_ser[i].data;
> +		luo_file->token = file_ser[i].token;
> +		luo_file->retrieved = false;
> +		mutex_init(&luo_file->mutex);
> +		list_add_tail(&luo_file->list, &session->files_list);
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * liveupdate_register_file_handler - Register a file handler with LUO.
> + * @fh: Pointer to a caller-allocated &struct liveupdate_file_handler.
> + * The caller must initialize this structure, including a unique
> + * 'compatible' string and a valid 'fh' callbacks. This function adds the
> + * handler to the global list of supported file handlers.
> + *
> + * Context: Typically called during module initialization for file types that
> + * support live update preservation.
> + *
> + * Return: 0 on success. Negative errno on failure.
> + */
> +int liveupdate_register_file_handler(struct liveupdate_file_handler *fh)
> +{
> +	static DEFINE_MUTEX(register_file_handler_lock);
> +	struct liveupdate_file_handler *fh_iter;
> +
> +	if (!liveupdate_enabled())
> +		return -EOPNOTSUPP;
> +
> +	/*
> +	 * Once sessions have been deserialized, file handlers cannot be
> +	 * registered, it is too late.
> +	 */
> +	if (WARN_ON(luo_session_is_deserialized()))
> +		return -EBUSY;
> +
> +	/* Sanity check that all required callbacks are set */
> +	if (!fh->ops->preserve || !fh->ops->unpreserve ||
> +	    !fh->ops->retrieve || !fh->ops->finish) {

Should check can_preserve here, right?

> +		return -EINVAL;
> +	}
> +
> +	guard(mutex)(&register_file_handler_lock);
> +	list_for_each_entry(fh_iter, &luo_file_handler_list, list) {
> +		if (!strcmp(fh_iter->compatible, fh->compatible)) {
> +			pr_err("File handler registration failed: Compatible string '%s' already registered.\n",
> +			       fh->compatible);
> +			return -EEXIST;
> +		}
> +	}
> +
> +	if (!try_module_get(fh->ops->owner))
> +		return -EAGAIN;
> +
> +	INIT_LIST_HEAD(&fh->list);
> +	list_add_tail(&fh->list, &luo_file_handler_list);
> +
> +	return 0;
> +}
> +
> +/**
> + * liveupdate_get_token_outgoing - Get the token for a preserved file.
> + * @s:      The outgoing liveupdate session.
> + * @file:   The file object to search for.
> + * @tokenp: Output parameter for the found token.
> + *
> + * Searches the list of preserved files in an outgoing session for a matching
> + * file object. If found, the corresponding user-provided token is returned.
> + *
> + * This function is intended for in-kernel callers that need to correlate a
> + * file with its liveupdate token.
> + *
> + * Context: Can be called from any context that can acquire the session mutex.
> + * Return: 0 on success, -ENOENT if the file is not preserved in this session.
> + */
> +int liveupdate_get_token_outgoing(struct liveupdate_session *s,
> +				  struct file *file, u64 *tokenp)
> +{
> +	struct luo_session *session = (struct luo_session *)s;
> +	struct luo_file *luo_file;
> +	int err = -ENOENT;
> +
> +	list_for_each_entry(luo_file, &session->files_list, list) {
> +		if (luo_file->file == file) {
> +			if (tokenp)
> +				*tokenp = luo_file->token;
> +			err = 0;
> +			break;
> +		}
> +	}
> +
> +	return err;
> +}
> +
> +/**
> + * liveupdate_get_file_incoming - Retrieves a preserved file for in-kernel use.
> + * @s:      The incoming liveupdate session (restored from the previous kernel).
> + * @token:  The unique token identifying the file to retrieve.
> + * @filep:  On success, this will be populated with a pointer to the retrieved
> + *          'struct file'.
> + *
> + * Provides a kernel-internal API for other subsystems to retrieve their
> + * preserved files after a live update. This function is a simple wrapper
> + * around luo_retrieve_file(), allowing callers to find a file by its token.
> + *
> + * The operation is idempotent; subsequent calls for the same token will return
> + * a pointer to the same 'struct file' object.
> + *
> + * The caller receives a pointer to the file with a reference incremented. The
> + * file's lifetime is managed by LUO and any userspace file
> + * descriptors. If the caller needs to hold a reference to the file beyond the
> + * immediate scope, it must call get_file() itself.
> + *
> + * Context: Can be called from any context in the new kernel that has a handle
> + *          to a restored session.
> + * Return: 0 on success. Returns -ENOENT if no file with the matching token is
> + *         found, or any other negative errno on failure.
> + */
> +int liveupdate_get_file_incoming(struct liveupdate_session *s, u64 token,
> +				 struct file **filep)
> +{
> +	struct luo_session *session = (struct luo_session *)s;
> +
> +	return luo_retrieve_file(session, token, filep);
> +}
> diff --git a/kernel/liveupdate/luo_internal.h b/kernel/liveupdate/luo_internal.h
> index 5185ad37a8c1..1a36f2383123 100644
> --- a/kernel/liveupdate/luo_internal.h
> +++ b/kernel/liveupdate/luo_internal.h
> @@ -70,4 +70,13 @@ int luo_session_serialize(void);
>  int luo_session_deserialize(void);
>  bool luo_session_is_deserialized(void);
>  
> +int luo_preserve_file(struct luo_session *session, u64 token, int fd);
> +void luo_file_unpreserve_files(struct luo_session *session);
> +int luo_file_freeze(struct luo_session *session);
> +void luo_file_unfreeze(struct luo_session *session);
> +int luo_retrieve_file(struct luo_session *session, u64 token,
> +		      struct file **filep);
> +int luo_file_finish(struct luo_session *session);
> +int luo_file_deserialize(struct luo_session *session);
> +
>  #endif /* _LINUX_LUO_INTERNAL_H */

-- 
Regards,
Pratyush Yadav

^ permalink raw reply

* Re: [PATCH v6 04/20] liveupdate: luo_session: add sessions support
From: Pasha Tatashin @ 2025-11-21 21:30 UTC (permalink / raw)
  To: Pratyush Yadav
  Cc: jasonmiu, graf, rppt, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, lennart,
	brauner, linux-api, linux-fsdevel, saeedm, ajayachandra, jgg,
	parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <mafs08qfz1h3c.fsf@kernel.org>

> >  /*
> >   * The LUO FDT hooks all LUO state for sessions, fds, etc.
> > - * In the root it allso carries "liveupdate-number" 64-bit property that
> > + * In the root it also carries "liveupdate-number" 64-bit property that
>
> Nit: This needs a bit of patch massaging. Patch 2 added the typo, and
> this patch fixes it. It would be better to just update patch 2.

Yeap, this is fixed.


> > + * This structure is located at the beginning of a contiguous block of
> > + * physical memory preserved across the kexec. It provides the necessary
> > + * metadata to interpret the array of session entries that follow.
> > + */
> > +struct luo_session_header_ser {
> > +     u64 pgcnt;
>
> Why do you need pgcnt here? Can't the size be inferred from count? And
> since you use contiguous memory block, the folio will know its page
> count anyway, right? The less we have in the ABI the better IMO.

Right, I had pgnct because my allocators were using size as an
argument, but we removed that, so pgcnt can also be removed.

> Same for other structures below.
>
> > +     u64 count;
> > +} __packed;
> > +
> > +/**
> > + * struct luo_session_ser - Represents the serialized metadata for a LUO session.
> > + * @name:    The unique name of the session, copied from the `luo_session`
> > + *           structure.
> > + * @files:   The physical address of a contiguous memory block that holds
> > + *           the serialized state of files.
> > + * @pgcnt:   The number of pages occupied by the `files` memory block.
> > + * @count:   The total number of files that were part of this session during
> > + *           serialization. Used for iteration and validation during
> > + *           restoration.
> > + *
> > + * This structure is used to package session-specific metadata for transfer
> > + * between kernels via Kexec Handover. An array of these structures (one per
> > + * session) is created and passed to the new kernel, allowing it to reconstruct
> > + * the session context.
> > + *
> > + * If this structure is modified, LUO_SESSION_COMPATIBLE must be updated.
> > + */
> > +struct luo_session_ser {
> > +     char name[LIVEUPDATE_SESSION_NAME_LENGTH];
> > +     u64 files;
> > +     u64 pgcnt;
> > +     u64 count;
> > +} __packed;
> > +
> >  #endif /* _LINUX_LIVEUPDATE_ABI_LUO_H */
> [...]
> > +/* Create a "struct file" for session */
> > +static int luo_session_getfile(struct luo_session *session, struct file **filep)
> > +{
> > +     char name_buf[128];
> > +     struct file *file;
> > +
> > +     guard(mutex)(&session->mutex);
> > +     snprintf(name_buf, sizeof(name_buf), "[luo_session] %s", session->name);
> > +     file = anon_inode_getfile(name_buf, &luo_session_fops, session, O_RDWR);
>
> Nit: You can return the file directly and get rid of filep.

I prefer returning error here.

>
> > +     if (IS_ERR(file))
> > +             return PTR_ERR(file);
> > +
> > +     *filep = file;
> > +
> > +     return 0;
> > +}
> [...]
> > +int __init luo_session_setup_outgoing(void *fdt_out)
> > +{
> > +     struct luo_session_header_ser *header_ser;
> > +     u64 header_ser_pa;
> > +     int err;
> > +
> > +     header_ser = kho_alloc_preserve(LUO_SESSION_PGCNT << PAGE_SHIFT);
>
> Nit: The naming is a bit confusing here. At first glance I thought this
> was just allocating the header, but it allocates the whole session
> serialization buffer.

I made it a little clearer by adding "outgoing_buffer" local variable,
and then assigning head_ser to this local variable.

> > +     if (IS_ERR(header_ser))
> > +             return PTR_ERR(header_ser);
> > +     header_ser_pa = virt_to_phys(header_ser);
> > +
> > +     err = fdt_begin_node(fdt_out, LUO_FDT_SESSION_NODE_NAME);
> > +     err |= fdt_property_string(fdt_out, "compatible",
> > +                                LUO_FDT_SESSION_COMPATIBLE);
> > +     err |= fdt_property(fdt_out, LUO_FDT_SESSION_HEADER, &header_ser_pa,
> > +                         sizeof(header_ser_pa));
> > +     err |= fdt_end_node(fdt_out);
> > +
> > +     if (err)
> > +             goto err_unpreserve;
> > +
> > +     header_ser->pgcnt = LUO_SESSION_PGCNT;
> > +     INIT_LIST_HEAD(&luo_session_global.outgoing.list);
> > +     init_rwsem(&luo_session_global.outgoing.rwsem);
> > +     luo_session_global.outgoing.header_ser = header_ser;
> > +     luo_session_global.outgoing.ser = (void *)(header_ser + 1);
> > +     luo_session_global.outgoing.active = true;
> > +
> > +     return 0;
> > +
> > +err_unpreserve:
> > +     kho_unpreserve_free(header_ser);
> > +     return err;
> > +}
> [...]
> > +int luo_session_deserialize(void)
> > +{
> > +     struct luo_session_header *sh = &luo_session_global.incoming;
> > +     int err;
> > +
> > +     if (luo_session_is_deserialized())
> > +             return 0;
> > +
> > +     luo_session_global.deserialized = true;
> > +     if (!sh->active) {
> > +             INIT_LIST_HEAD(&sh->list);
> > +             init_rwsem(&sh->rwsem);
>
> Nit: it would be a bit simpler if LUO init always initialized this. And
> then luo_session_setup_incoming() can fill the list if it has any data.
> Slight reduction in code duplication and mental load.

These are now statically initialized.

>
> > +             return 0;
> > +     }
> > +
> > +     for (int i = 0; i < sh->header_ser->count; i++) {
> > +             struct luo_session *session;
> > +
> > +             session = luo_session_alloc(sh->ser[i].name);
> > +             if (IS_ERR(session)) {
> > +                     pr_warn("Failed to allocate session [%s] during deserialization %pe\n",
> > +                             sh->ser[i].name, session);
> > +                     return PTR_ERR(session);
> > +             }
> > +
> > +             err = luo_session_insert(sh, session);
> > +             if (err) {
> > +                     luo_session_free(session);
> > +                     pr_warn("Failed to insert session [%s] %pe\n",
> > +                             session->name, ERR_PTR(err));
> > +                     return err;
> > +             }
> > +
> > +             session->count = sh->ser[i].count;
> > +             session->files = sh->ser[i].files ? phys_to_virt(sh->ser[i].files) : 0;
> > +             session->pgcnt = sh->ser[i].pgcnt;
> > +     }
> > +
> > +     kho_restore_free(sh->header_ser);
> > +     sh->header_ser = NULL;
> > +     sh->ser = NULL;
> > +
> > +     return 0;
> > +}
> [...]
>
> --
> Regards,
> Pratyush Yadav

Thanks!

Pasha

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox