Linux userland API discussions
* Re: [PATCH v6 20/20] tests/liveupdate: Add in-kernel liveupdate test
From: Pasha Tatashin @ 2025-11-17 19:00 UTC
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRsDb-4bXFQ9Zmtu@kernel.org>

> >  #endif /* _LINUX_LIVEUPDATE_ABI_LUO_H */
> > diff --git a/kernel/liveupdate/luo_file.c b/kernel/liveupdate/luo_file.c
> > index df337c9c4f21..9a531096bdb5 100644
> > --- a/kernel/liveupdate/luo_file.c
> > +++ b/kernel/liveupdate/luo_file.c
> > @@ -834,6 +834,8 @@ int liveupdate_register_file_handler(struct liveupdate_file_handler *fh)
> >       INIT_LIST_HEAD(&fh->flb_list);
> >       list_add_tail(&fh->list, &luo_file_handler_list);
> >
> > +     liveupdate_test_register(fh);
> > +
>
> Why this cannot be called from the test?

Because the test does not have access to all of the file handlers that
are being registered with LUO.

Pasha


* Re: [PATCH v6 14/20] liveupdate: luo_file: add private argument to store runtime state
From: Pasha Tatashin @ 2025-11-17 18:45 UTC
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRr13Q1xk9eunilo@kernel.org>

> > Signed-off-by: Pratyush Yadav <pratyush@kernel.org>
> > Co-developed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
>
> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

Thank you!

Pasha


* Re: [PATCH v6 13/20] mm: shmem: export some functions to internal.h
From: Pasha Tatashin @ 2025-11-17 18:43 UTC
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRr1aw45EYSFTCw9@kernel.org>

On Mon, Nov 17, 2025 at 5:14 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Sat, Nov 15, 2025 at 06:33:59PM -0500, Pasha Tatashin wrote:
> > From: Pratyush Yadav <ptyadav@amazon.de>
> >
> > shmem_inode_acct_blocks(), shmem_recalc_inode(), and
> > shmem_add_to_page_cache() are used by shmem_alloc_and_add_folio(). This
> > functionality will also be used in the future by Live Update
> > Orchestrator (LUO) to recreate memfd files after a live update.
>
> I'd rephrase this a bit to say that it will be used by memfd integration
> into LUO to emphasize this stays inside mm.

Done

>
> Other than that


>
> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

Thank you.


* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
From: Pasha Tatashin @ 2025-11-17 18:29 UTC
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRoi-Pb8jnjaZp0X@kernel.org>

On Sun, Nov 16, 2025 at 2:16 PM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Sun, Nov 16, 2025 at 09:55:30AM -0500, Pasha Tatashin wrote:
> > On Sun, Nov 16, 2025 at 7:43 AM Mike Rapoport <rppt@kernel.org> wrote:
> > >
> > > > +static int __init liveupdate_early_init(void)
> > > > +{
> > > > +     int err;
> > > > +
> > > > +     err = luo_early_startup();
> > > > +     if (err) {
> > > > +             pr_err("The incoming tree failed to initialize properly [%pe], disabling live update\n",
> > > > +                    ERR_PTR(err));
> > >
> > > How do we report this to the userspace?
> > > I think the decision what to do in this case belongs there. Even if it's
> > > down to choosing between plain kexec and full reboot, it's still a policy
> > > that should be implemented in userspace.
> >
> > I agree that policy belongs in userspace, and that is how we designed
> > it. In this specific failure case (ABI mismatch or corrupt FDT), the
> > preserved state is unrecoverable by the kernel. We cannot parse the
> > incoming data, so we cannot offer it to userspace.
> >
> > We report this state by not registering the /dev/liveupdate device.
> > When the userspace agent attempts to initialize, it receives ENOENT.
> > At that point, the agent exercises its policy:
> >
> > - Check dmesg for the specific error and report the failure to the
> > fleet control plane.
>
> Hmm, this is not nice. I think we still should register /dev/liveupdate and
> let userspace discover this error via /dev/liveupdate ABIs.

Not registering the device is the correct approach here for two reasons:

1. This follows the standard Linux driver pattern. If a driver fails
to initialize its underlying resources (hardware, firmware, or in this
case, the incoming FDT), it does not register a character device.
2. Registering a "zombie" device that exists solely to return errors
adds significant complexity. We would need to introduce a specific
"broken" state to the state machine and add checks to IOCTLs to reject
commands with a specific error code.
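
As a rough illustration of the userspace side of this contract, the
sketch below probes the device node and maps a missing node (ENOENT) to
a "live update unavailable" outcome. The enum, the function name
lu_probe(), and passing the path as a parameter are assumptions made for
illustration; they are not part of the LUO uAPI.

```c
#include <errno.h>
#include <fcntl.h>

/* Hypothetical outcomes an agent might distinguish. */
enum lu_probe_result {
	LU_PROBE_AVAILABLE,	/* device opened; *fd_out holds the descriptor */
	LU_PROBE_UNAVAILABLE,	/* node missing (ENOENT): apply fallback policy */
	LU_PROBE_ERROR,		/* any other open() failure */
};

/* dev_path is a parameter so the logic can be tried without the real node. */
static enum lu_probe_result lu_probe(const char *dev_path, int *fd_out)
{
	int fd = open(dev_path, O_RDWR);

	if (fd >= 0) {
		*fd_out = fd;
		return LU_PROBE_AVAILABLE;
	}

	*fd_out = -1;
	return errno == ENOENT ? LU_PROBE_UNAVAILABLE : LU_PROBE_ERROR;
}
```

An agent would call lu_probe("/dev/liveupdate", &fd) and, on
LU_PROBE_UNAVAILABLE, fall back to whatever its policy dictates.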

Pasha


* Re: [PATCH v6 11/20] mm: shmem: use SHMEM_F_* flags instead of VM_* flags
From: Pasha Tatashin @ 2025-11-17 18:25 UTC
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRrvaHh-cP8jygAF@kernel.org>

On Mon, Nov 17, 2025 at 4:48 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Sat, Nov 15, 2025 at 06:33:57PM -0500, Pasha Tatashin wrote:
> > From: Pratyush Yadav <ptyadav@amazon.de>
> >
> > shmem_inode_info::flags can have the VM flags VM_NORESERVE and
> > VM_LOCKED. These are used to suppress pre-accounting or to lock the
> > pages in the inode respectively. Using the VM flags directly makes it
> > difficult to add shmem-specific flags that are unrelated to VM behavior
> > since one would need to find a VM flag not used by shmem and re-purpose
> > it.
> >
> > Introduce SHMEM_F_NORESERVE and SHMEM_F_LOCKED which represent the same
> > information, but their bits are independent of the VM flags. Callers can
> > still pass VM_NORESERVE to shmem_get_inode(), but it gets transformed to
> > the shmem-specific flag internally.
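
A minimal sketch of the translation described above, for readers
following along. All DEMO_* names and bit values are hypothetical
stand-ins; the patch's actual SHMEM_F_* values and the body of
shmem_get_inode() are not shown in this thread.

```c
/* Hypothetical bit values, chosen only so the two flag spaces differ. */
#define DEMO_VM_LOCKED		0x00002000ul
#define DEMO_VM_NORESERVE	0x00200000ul

#define DEMO_SHMEM_F_NORESERVE	(1ul << 0)
#define DEMO_SHMEM_F_LOCKED	(1ul << 1)

/*
 * Translate the caller-visible VM flag into the shmem-private flag space.
 * Only the no-reserve bit is carried over; every other VM bit is dropped.
 */
static unsigned long demo_vm_to_shmem_flags(unsigned long vm_flags)
{
	unsigned long flags = 0;

	if (vm_flags & DEMO_VM_NORESERVE)
		flags |= DEMO_SHMEM_F_NORESERVE;

	return flags;
}
```

Because the private bits no longer alias VM bits, a new shmem-only flag
can take the next free bit without auditing the VM flag space.
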
> >
> > No functional changes intended.
> >
> > Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
>
> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

Thank you.

Pasha


* Re: [PATCH v6 18/20] selftests/liveupdate: Add kexec-based selftest for session lifecycle
From: Pasha Tatashin @ 2025-11-17 18:23 UTC
  To: Zhu Yanjun
  Cc: pratyush, jasonmiu, graf, rppt, dmatlack, rientjes, corbet,
	rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm,
	tj, yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <c8b46600-d40f-41b4-a5a3-99300ef1a2eb@linux.dev>

> Thanks a lot. Just with kernel image, it is not enough to boot the host.
> Adding initramfs will avoid the crash when the host boots.
> I have made tests to verify this.
>
> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>

Thank you!


* Re: [PATCH v6 10/20] MAINTAINERS: add liveupdate entry
From: Pasha Tatashin @ 2025-11-17 18:20 UTC
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRrtmy--AWCEEbtg@kernel.org>

On Mon, Nov 17, 2025 at 4:41 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Sat, Nov 15, 2025 at 06:33:56PM -0500, Pasha Tatashin wrote:
> > Add a MAINTAINERS file entry for the new Live Update Orchestrator
> > introduced in previous patches.
> >
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> > ---
> >  MAINTAINERS | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 500789529359..bc9f5c6f0e80 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -14464,6 +14464,17 @@ F:   kernel/module/livepatch.c
> >  F:   samples/livepatch/
> >  F:   tools/testing/selftests/livepatch/
> >
> > +LIVE UPDATE
> > +M:   Pasha Tatashin <pasha.tatashin@soleen.com>
>
> Please count me in :)
>

Sure, added.

> > +L:   linux-kernel@vger.kernel.org
> > +S:   Maintained
> > +F:   Documentation/core-api/liveupdate.rst
> > +F:   Documentation/userspace-api/liveupdate.rst
> > +F:   include/linux/liveupdate.h
> > +F:   include/linux/liveupdate/
> > +F:   include/uapi/linux/liveupdate.h
> > +F:   kernel/liveupdate/
> > +
> >  LLC (802.2)
> >  L:   netdev@vger.kernel.org
> >  S:   Odd fixes
> > --
> > 2.52.0.rc1.455.g30608eb744-goog
> >
>
> --
> Sincerely yours,
> Mike.


* Re: [PATCH v6 06/20] liveupdate: luo_file: implement file systems callbacks
From: Pasha Tatashin @ 2025-11-17 17:50 UTC
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRoU1DSgVmplHr3E@kernel.org>

> > +struct liveupdate_file_handler;
> > +struct liveupdate_session;
>
> Why struct liveupdate_session is a part of public LUO API?

It is an opaque version of the private "struct luo_session", exposed in
order to give subsystems access to:
liveupdate_get_file_incoming(s, token, filep)
liveupdate_get_token_outgoing(s, file, tokenp)

For example, if your FD depends on another FD within a session, you
can check if another FD is already preserved via
liveupdate_get_token_outgoing(), and during retrieval time you can
retrieve the "struct file" for your dependency.
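
The opaque-handle arrangement can be sketched in plain C as follows.
The demo_* names are hypothetical and the lookup is reduced to a token
array; this only illustrates how a forward-declared public type can
stand in for a private structure, not how LUO implements its helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* "Public" side: forward declaration only, so users cannot see the layout. */
struct demo_public_session;

/* "Private" side: the real layout, known only to the implementation. */
struct demo_private_session {
	uint64_t tokens[4];
	size_t count;
};

/* The implementation hands out the private object under the opaque type. */
static struct demo_public_session *
demo_to_public(struct demo_private_session *s)
{
	return (struct demo_public_session *)s;
}

/* A public helper casts back before touching any field. */
static int demo_has_token(struct demo_public_session *ps, uint64_t token)
{
	struct demo_private_session *s = (struct demo_private_session *)ps;
	size_t i;

	for (i = 0; i < s->count; i++) {
		if (s->tokens[i] == token)
			return 1;
	}

	return 0;
}
```

A handler that depends on another FD would only ever hold the opaque
pointer and go through helpers like the one above.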

> > +struct file;
> > +
> > +/**
> > + * struct liveupdate_file_op_args - Arguments for file operation callbacks.
> > + * @handler:          The file handler being called.
> > + * @session:          The session this file belongs to.
> > + * @retrieved:        The retrieve status for the 'can_finish / finish'
> > + *                    operation.
> > + * @file:             The file object. For retrieve: [OUT] The callback sets
> > + *                    this to the new file. For other ops: [IN] The caller sets
> > + *                    this to the file being operated on.
> > + * @serialized_data:  The opaque u64 handle, preserve/prepare/freeze may update
> > + *                    this field.
> > + *
> > + * This structure bundles all parameters for the file operation callbacks.
> > + * The 'data' and 'file' fields are used for both input and output.
> > + */
> > +struct liveupdate_file_op_args {
> > +     struct liveupdate_file_handler *handler;
> > +     struct liveupdate_session *session;
> > +     bool retrieved;
> > +     struct file *file;
> > +     u64 serialized_data;
> > +};
> > +
> > +/**
> > + * struct liveupdate_file_ops - Callbacks for live-updatable files.
> > + * @can_preserve: Required. Lightweight check to see if this handler is
> > + *                compatible with the given file.
> > + * @preserve:     Required. Performs state-saving for the file.
> > + * @unpreserve:   Required. Cleans up any resources allocated by @preserve.
> > + * @freeze:       Optional. Final actions just before kernel transition.
> > + * @unfreeze:     Optional. Undo freeze operations.
> > + * @retrieve:     Required. Restores the file in the new kernel.
> > + * @can_finish:   Optional. Check if this FD can finish, i.e. all restoration
> > + *                pre-requirements for this FD are satisfied. Called prior to
> > + *                finish, in order to do successful finish calls for all
> > + *                resources in the session.
> > + * @finish:       Required. Final cleanup in the new kernel.
> > + * @owner:        Module reference
> > + *
> > + * All operations (except can_preserve) receive a pointer to a
> > + * 'struct liveupdate_file_op_args' containing the necessary context.
> > + */
> > +struct liveupdate_file_ops {
> > +     bool (*can_preserve)(struct liveupdate_file_handler *handler,
> > +                          struct file *file);
> > +     int (*preserve)(struct liveupdate_file_op_args *args);
> > +     void (*unpreserve)(struct liveupdate_file_op_args *args);
> > +     int (*freeze)(struct liveupdate_file_op_args *args);
> > +     void (*unfreeze)(struct liveupdate_file_op_args *args);
> > +     int (*retrieve)(struct liveupdate_file_op_args *args);
> > +     bool (*can_finish)(struct liveupdate_file_op_args *args);
> > +     void (*finish)(struct liveupdate_file_op_args *args);
> > +     struct module *owner;
> > +};
> > +
> > +/**
> > + * struct liveupdate_file_handler - Represents a handler for a live-updatable file type.
> > + * @ops:                Callback functions
> > + * @compatible:         The compatibility string (e.g., "memfd-v1", "vfiofd-v1")
> > + *                      that uniquely identifies the file type this handler
> > + *                      supports. This is matched against the compatible string
> > + *                      associated with individual &struct file instances.
> > + * @list:               Used for linking this handler instance into a global
> > + *                      list of registered file handlers.
> > + *
> > + * Modules that want to support live update for specific file types should
> > + * register an instance of this structure. LUO uses this registration to
> > + * determine if a given file can be preserved and to find the appropriate
> > + * operations to manage its state across the update.
> > + */
> > +struct liveupdate_file_handler {
> > +     const struct liveupdate_file_ops *ops;
> > +     const char compatible[LIVEUPDATE_HNDL_COMPAT_LENGTH];
> > +     struct list_head list;
>
> Did you consider using __private and ACCESS_PRIVATE() for the ->list
> member here and in other structures visible outside kernel/liveupdate?

I hadn't considered it, but that is a great suggestion. I will update
the headers to use __private/ACCESS_PRIVATE().
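
For readers unfamiliar with the pattern, here is a userspace sketch of
what the annotation looks like. DEMO_PRIVATE and DEMO_ACCESS_PRIVATE are
stand-ins that mirror, as far as I understand it, how the kernel macros
expand in a regular (non-sparse) build; under sparse the kernel marks
the member noderef so that direct accesses are flagged. The structures
are hypothetical.

```c
#include <stddef.h>

/* Stand-ins for __private / ACCESS_PRIVATE() in a non-sparse build. */
#define DEMO_PRIVATE
#define DEMO_ACCESS_PRIVATE(p, member) ((p)->member)

struct demo_node {
	struct demo_node *next;
};

struct demo_handler {
	const char *compatible;
	/* Internal linkage: outside code should not touch this directly. */
	struct demo_node DEMO_PRIVATE list;
};

/* Only the owning code reaches the member, and only through the accessor. */
static void demo_handler_link(struct demo_handler *h, struct demo_node *head)
{
	DEMO_ACCESS_PRIVATE(h, list).next = head->next;
	head->next = &DEMO_ACCESS_PRIVATE(h, list);
}
```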


> >
> > +/* The max size is set so it can be reliably used during in serialization */
>
> I failed to parse this comment.

Me too, I removed it. :-)

> > + *   - can_preserve(): A lightweight check to determine if the handler is
> > + *     compatible with a given 'struct file'.
> > + *   - preserve(): The heavyweight operation that saves the file's state and
> > + *     returns an opaque u64 handle, happens while vcpus are still running.
>
>                                                      ^ VCPUs

Done

>
> This narrows the description to VM-only usecase and in general ->preserve()
> may happen after VCPUs are suspended, although it's neither intended nor
> desirable. LUO does not control the sequencing so we can't claim here
> anything about VCPUs.

Agreed. While keeping VCPUs running is the target behavior for the
hypervisor use case to minimize downtime, the framework itself is
agnostic to the workload type and sequencing. Re-wrote:

 *   - preserve(): The heavyweight operation that saves the file's state and
 *     returns an opaque u64 handle. This is typically performed while the
 *     workload is still active to minimize the downtime during the
 *     actual reboot transition.

> > + *   - unpreserve(): Cleans up any resources allocated by .preserve(), called
> > + *     if the preservation process is aborted before the reboot (i.e. session is
> > + *     closed).
> > + *   - freeze(): A final pre-reboot opportunity to prepare the state for kexec.
> > + *     We are already in reboot syscall, and therefore userspace cannot mutate
> > + *     the file anymore.
> > + *   - unfreeze(): Undoes the actions of .freeze(), called if the live update
> > + *     is aborted after the freeze phase.
> > + *   - retrieve(): Reconstructs the file in the new kernel from the preserved
> > + *     handle.
> > + *   - finish(): Performs final check and cleanup in the new kernel. After
> > + *     successful finish call, LUO gives up ownership to this file.
> > + *
> > + * File Preservation Lifecycle happy path:
> > + *
> > + * 1. Preserve (Normal Operation): A userspace agent preserves files one by one
> > + *    via an ioctl. For each file, luo_preserve_file() finds a compatible
> > + *    handler, calls its .preserve() op, and creates an internal &struct
>
>                                       ^ method or operation

Done

>
> > + *    luo_file to track the live state.
> > + *
> > + * 2. Freeze (Pre-Reboot): Just before the kexec, luo_file_freeze() is called.
> > + *    It iterates through all preserved files, calls their respective .freeze()
> > + *    ops, and serializes their final metadata (compatible string, token, and
>
>         ^ method or operation
>
> > + *    data handle) into a contiguous memory block for KHO.
> > + *
> > + * 3. Deserialize (New Kernel - Early Boot): After kexec, luo_file_deserialize()
>
> From the code it seems that deserialization runs on the first open of
> /dev/liveupdate, what do I miss?

Updated:
 * 3. Deserialize: After kexec, luo_file_deserialize() runs when the session
 *    gets deserialized (which is when /dev/liveupdate is first opened). It reads the
 *    serialized data from the KHO memory region and reconstructs the in-memory
 *    list of &struct luo_file instances for the new kernel, linking them to
 *    their corresponding handlers.
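
The "deserialize on first open" behaviour can be pictured with the
small sketch below. Everything here is hypothetical (the demo_* names,
the counter, and the lack of locking); real code would need to
serialize concurrent opens.

```c
#include <stdbool.h>

struct demo_luo_state {
	bool deserialized;	/* set once the preserved data has been parsed */
	int deserialize_calls;	/* counts how often the parser actually ran */
};

/* Stand-in for the real deserialization work; always succeeds here. */
static int demo_deserialize(struct demo_luo_state *st)
{
	st->deserialize_calls++;
	return 0;
}

/* Called on every open; only the first successful call does the work. */
static int demo_dev_open(struct demo_luo_state *st)
{
	int err;

	if (st->deserialized)
		return 0;

	err = demo_deserialize(st);
	if (err)
		return err;

	st->deserialized = true;
	return 0;
}
```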

>
> > + *    runs. It reads the serialized data from the KHO memory region and
> > + *    reconstructs the in-memory list of &struct luo_file instances for the new
> > + *    kernel, linking them to their corresponding handlers.
> > + *
> > + * 4. Retrieve (New Kernel - Userspace Ready): The userspace agent can now
> > + *    restore file descriptors by providing a token. luo_retrieve_file()
> > + *    searches for the matching token, calls the handler's .retrieve() op to
> > + *    re-create the 'struct file', and returns a new FD. Files can be
> > + *    retrieved in ANY order.
> > + *
> > + * 5. Finish (New Kernel - Cleanup): Once a session retrieval is complete,
> > + *    luo_file_finish() is called. It iterates through all files,
> > + *    invokes their .finish() ops for final cleanup, and releases all
>
>                                 ^ method

Done

>
> > + *    associated kernel resources.
> > + *
> > + * File Preservation Lifecycle unhappy paths:
> > + *
> > + * 1. Abort Before Reboot: If the userspace agent aborts the live update
> > + *    process before calling reboot (e.g., by closing the session file
> > + *    descriptor), the session's release handler calls
> > + *    luo_file_unpreserve_files(). This invokes the .unpreserve() callback on
> > + *    all preserved files, ensuring all allocated resources are cleaned up and
> > + *    returning the system to a clean state.
> > + *
> > + * 2. Freeze Failure: During the reboot() syscall, if any handler's .freeze()
> > + *    op fails, the .unfreeze() op is invoked on all previously *successful*
> > + *    freezes to roll back their state. The reboot() syscall then returns an
> > + *    error to userspace, canceling the live update.
> > + *
> > + * 3. Finish Failure: In the new kernel, if a handler's .finish() op fails,
> > + *    the luo_file_finish() operation is aborted. LUO retains ownership of
> > + *    all files within that session, including those that were not yet
> > + *    processed. The userspace agent can attempt to call the finish operation
> > + *    again later. If the issue cannot be resolved, these resources will be held
> > + *    by LUO until the next live update cycle, at which point they will be
> > + *    discarded.
> > + */
> > +
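
The freeze-failure path in the comment above amounts to a classic
unwind loop. A minimal sketch follows, with hypothetical demo_* types
and callbacks; unwinding in reverse order is my choice here, since the
comment only says that previously successful freezes are undone.

```c
#include <errno.h>
#include <stddef.h>

struct demo_file {
	int (*freeze)(struct demo_file *f);
	void (*unfreeze)(struct demo_file *f);
	int frozen;
};

static int demo_freeze_ok(struct demo_file *f)
{
	f->frozen = 1;
	return 0;
}

static int demo_freeze_fail(struct demo_file *f)
{
	(void)f;
	return -EBUSY;
}

static void demo_unfreeze(struct demo_file *f)
{
	f->frozen = 0;
}

/* Freeze every file; on the first failure, undo the ones already frozen. */
static int demo_freeze_all(struct demo_file *files, size_t n)
{
	size_t i;
	int err = 0;

	for (i = 0; i < n; i++) {
		err = files[i].freeze(&files[i]);
		if (err)
			goto rollback;
	}

	return 0;

rollback:
	/* i indexes the file that failed; only files before it were frozen. */
	while (i-- > 0)
		files[i].unfreeze(&files[i]);

	return err;
}
```

The caller sees the error from the failing callback, which in the flow
described above is what reboot() would report back to userspace.
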
> > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > +
> > +#include <linux/cleanup.h>
> > +#include <linux/err.h>
> > +#include <linux/errno.h>
> > +#include <linux/file.h>
> > +#include <linux/fs.h>
> > +#include <linux/kexec_handover.h>
> > +#include <linux/liveupdate.h>
> > +#include <linux/liveupdate/abi/luo.h>
> > +#include <linux/module.h>
> > +#include <linux/sizes.h>
> > +#include <linux/slab.h>
> > +#include <linux/string.h>
> > +#include "luo_internal.h"
> > +
> > +static LIST_HEAD(luo_file_handler_list);
> > +
> > +/* 2 4K pages, give space for 128 files per session */
> > +#define LUO_FILE_PGCNT               2ul
> > +#define LUO_FILE_MAX                                                 \
> > +     ((LUO_FILE_PGCNT << PAGE_SHIFT) / sizeof(struct luo_file_ser))
> > +
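
The "128 files per session" figure follows from the arithmetic in the
macro if each serialized entry is 64 bytes and pages are 4K. The sketch
below checks that at compile time; struct demo_file_ser is a
hypothetical 64-byte layout, since the real struct luo_file_ser is not
shown in this part of the thread.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT		12	/* assumes 4K pages */
#define DEMO_FILE_PGCNT		2ul

/* Hypothetical entry: compatible string, token, and data handle. */
struct demo_file_ser {
	char compatible[48];
	uint64_t token;
	uint64_t data;
};

#define DEMO_FILE_MAX \
	((DEMO_FILE_PGCNT << DEMO_PAGE_SHIFT) / sizeof(struct demo_file_ser))

/* 2 pages * 4096 bytes = 8192 bytes; 8192 / 64 = 128 entries. */
_Static_assert(sizeof(struct demo_file_ser) == 64, "entry must be 64 bytes");
_Static_assert(DEMO_FILE_MAX == 128, "two 4K pages hold 128 entries");
```

With a larger page size the same macro would yield more entries, so the
limit is tied to PAGE_SHIFT rather than being a fixed 128.
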
> > +/**
> > + * struct luo_file - Represents a single preserved file instance.
> > + * @fh:            Pointer to the &struct liveupdate_file_handler that manages
> > + *                 this type of file.
> > + * @file:          Pointer to the kernel's &struct file that is being preserved.
> > + *                 This is NULL in the new kernel until the file is successfully
> > + *                 retrieved.
> > + * @serialized_data: The opaque u64 handle to the serialized state of the file.
> > + *                 This handle is passed back to the handler's .freeze(),
> > + *                 .retrieve(), and .finish() callbacks, allowing it to track
> > + *                 and update its serialized state across phases.
> > + * @retrieved:     A flag indicating whether a user/kernel in the new kernel has
> > + *                 successfully called retrieve() on this file. This prevents
> > + *                 multiple retrieval attempts.
> > + * @mutex:         A mutex that protects the fields of this specific instance
> > + *                 (e.g., @retrieved, @file), ensuring that operations like
> > + *                 retrieving or finishing a file are atomic.
> > + * @list:          The list_head linking this instance into its parent
> > + *                 session's list of preserved files.
> > + * @token:         The user-provided unique token used to identify this file.
> > + *
> > + * This structure is the core in-kernel representation of a single file being
> > + * managed through a live update. An instance is created by luo_preserve_file()
> > + * to link a 'struct file' to its corresponding handler, a user-provided token,
> > + * and the serialized state handle returned by the handler's .preserve()
> > + * operation.
> > + *
> > + * These instances are tracked in a per-session list. The @serialized_data
> > + * field, which holds a handle to the file's serialized state, may be updated
> > + * during the .freeze() callback before being serialized for the next kernel.
> > + * After reboot, these structures are recreated by luo_file_deserialize() and
> > + * are finally cleaned up by luo_file_finish().
> > + */
> > +struct luo_file {
> > +     struct liveupdate_file_handler *fh;
> > +     struct file *file;
> > +     u64 serialized_data;
> > +     bool retrieved;
> > +     struct mutex mutex;
> > +     struct list_head list;
> > +     u64 token;
> > +};
> > +
> > +static int luo_session_alloc_files_mem(struct luo_session *session)
>
> It seems like this belongs to luo_session.c

It belongs here, but the name is wrong, so I renamed the alloc/free functions.

> > +{
> > +     size_t size;
> > +     void *mem;
> > +
> > +     if (session->files)
> > +             return 0;
> > +
> > +     WARN_ON_ONCE(session->count);
> > +
> > +     size = LUO_FILE_PGCNT << PAGE_SHIFT;
> > +     mem = kho_alloc_preserve(size);
> > +     if (IS_ERR(mem))
> > +             return PTR_ERR(mem);
> > +
> > +     session->files = mem;
> > +     session->pgcnt = LUO_FILE_PGCNT;
> > +
> > +     return 0;
> > +}
> > +
> > +static void luo_session_free_files_mem(struct luo_session *session)
> > +{
>
> Ditto

done.


>
> > +     /* If session has files, no need to free preservation memory */
> > +     if (session->count)
> > +             return;
> > +
> > +     if (!session->files)
> > +             return;
> > +
> > +     kho_unpreserve_free(session->files);
> > +     session->files = NULL;
> > +     session->pgcnt = 0;
> > +}
> > +
> > +static bool luo_token_is_used(struct luo_session *session, u64 token)
> > +{
> > +     struct luo_file *iter;
> > +
> > +     list_for_each_entry(iter, &session->files_list, list) {
>
> And here again I'm not very fond of dereferencing session objects in
> luo_file.

luo_file only accesses the session->files_* fields, which are allocated,
freed, and iterated entirely inside luo_file.

>
> > +             if (iter->token == token)
> > +                     return true;
> > +     }
> > +
> > +     return false;
> > +}
> > +
> > +/**
> > + * luo_preserve_file - Initiate the preservation of a file descriptor.
> > + * @session: The session to which the preserved file will be added.
> > + * @token:   A unique, user-provided identifier for the file.
> > + * @fd:      The file descriptor to be preserved.
> > + *
> > + * This function orchestrates the first phase of preserving a file. Upon entry,
> > + * it takes a reference to the 'struct file' via fget(), effectively making LUO
> > + * a co-owner of the file. This reference is held until the file is either
> > + * unpreserved or successfully finished in the next kernel, preventing the file
> > + * from being prematurely destroyed.
> > + *
> > + * This function orchestrates the first phase of preserving a file. It performs
> > + * the following steps:
> > + *
> > + * 1. Validates that the @token is not already in use within the session.
> > + * 2. Ensures the session's memory for files serialization is allocated
> > + *    (allocates if needed).
> > + * 3. Iterates through registered handlers, calling can_preserve() to find one
> > + *    compatible with the given @fd.
> > + * 4. Calls the handler's .preserve() operation, which saves the file's state
> > + *    and returns an opaque private data handle.
> > + * 5. Adds the new instance to the session's internal list.
> > + *
> > + * On success, LUO takes a reference to the 'struct file' and considers it
> > + * under its management until it is unpreserved or finished.
> > + *
> > + * In case of any failure, all intermediate allocations (file reference, memory
> > + * for the 'luo_file' struct, etc.) are cleaned up before returning an error.
> > + *
> > + * Context: Can be called from an ioctl handler during normal system operation.
> > + * Return: 0 on success. Returns a negative errno on failure:
> > + *         -EEXIST if the token is already used.
> > + *         -EBADF if the file descriptor is invalid.
> > + *         -ENOSPC if the session is full.
> > + *         -ENOENT if no compatible handler is found.
> > + *         -ENOMEM on memory allocation failure.
> > + *         Other errors might be returned by .preserve().
> > + */
> > +int luo_preserve_file(struct luo_session *session, u64 token, int fd)
> > +{
> > +     struct liveupdate_file_op_args args = {0};
> > +     struct liveupdate_file_handler *fh;
> > +     struct luo_file *luo_file;
> > +     struct file *file;
> > +     int err;
> > +
> > +     lockdep_assert_held(&session->mutex);
> > +
> > +     if (luo_token_is_used(session, token))
> > +             return -EEXIST;
> > +
> > +     file = fget(fd);
> > +     if (!file)
> > +             return -EBADF;
> > +
> > +     err = luo_session_alloc_files_mem(session);
> > +     if (err)
> > +             goto  exit_err;
> > +
> > +     if (session->count == LUO_FILE_MAX) {
> > +             err = -ENOSPC;
> > +             goto exit_err;
> > +     }
>
> I believe session can be prepared and validated by the caller.

The size of luo_files and the other file-count limits all belong in
luo_file.c.

>
> > +
> > +     err = -ENOENT;
> > +     list_for_each_entry(fh, &luo_file_handler_list, list) {
> > +             if (fh->ops->can_preserve(fh, file)) {
> > +                     err = 0;
> > +                     break;
> > +             }
> > +     }
> > +
> > +     /* err is still -ENOENT if no handler was found */
> > +     if (err)
> > +             goto exit_err;
> > +
> > +     luo_file = kzalloc(sizeof(*luo_file), GFP_KERNEL);
> > +     if (!luo_file) {
> > +             err = -ENOMEM;
> > +             goto exit_err;
> > +     }
> > +
> > +     luo_file->file = file;
> > +     luo_file->fh = fh;
> > +     luo_file->token = token;
> > +     luo_file->retrieved = false;
> > +     mutex_init(&luo_file->mutex);
> > +
> > +     args.handler = fh;
> > +     args.session = (struct liveupdate_session *)session;
>
> Isn't args.session already struct liveupdate_session *?

This casts the internal (struct luo_session *) to the opaque public
(struct liveupdate_session *).
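
To illustrate the pattern, here is a minimal userspace sketch (not the LUO
code; every name in it is hypothetical). The public side only ever sees an
incomplete type, and the implementation casts the handle back to its private
structure before touching any field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Public view: an incomplete type, so callers cannot touch any field. */
struct pub_session;

/* Internal view: the real layout, private to the implementation. */
struct impl_session {
	uint64_t count;
};

/* Hand the internal object out as an opaque public handle. */
struct pub_session *to_public(struct impl_session *s)
{
	return (struct pub_session *)s;
}

/* Recover the internal object inside the implementation. */
struct impl_session *to_impl(struct pub_session *p)
{
	assert(p != NULL);
	return (struct impl_session *)p;
}

/* A public accessor: the only way for outside code to read 'count'. */
uint64_t pub_session_count(struct pub_session *p)
{
	return to_impl(p)->count;
}
```

The cast is only safe because the handle is never dereferenced through the
public type; it is converted back to the type it was created from.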

>
> > +     args.file = file;
> > +     err = fh->ops->preserve(&args);
> > +     if (err) {
> > +             mutex_destroy(&luo_file->mutex);
> > +             kfree(luo_file);
> > +             goto exit_err;
> > +     } else {
> > +             luo_file->serialized_data = args.serialized_data;
> > +             list_add_tail(&luo_file->list, &session->files_list);
> > +             session->count++;
>
> I'd use luo_session_add_file(struct luo_file *luo_file) or return luo_file
> by reference to the caller.
> Then the lockdep_assert_held() can go away as well.

Let's keep this. I do not think there is any architectural win from
disallowing luo_file from inserting itself directly into a session;
both are part of luo_*.
luo_session does not manage anything file-related: no
serialization/deserialization, no allocation/free, no
insertion/removal.

>
> > +     }
> > +
> > +     return 0;
> > +
> > +exit_err:
> > +     fput(file);
> > +     luo_session_free_files_mem(session);
>
> The error handling in this function is a mess. Pasha, please, please, use
> goto consistently.

How is this a mess? There is a single exit_err destination, no
exceptions, and no early returns except at the very top of the
function, before fget(), which makes total sense.

Do you want to add a separate destination for
luo_session_free_files_mem()? That is not necessary; in many
places it is considered totally reasonable for free(NULL) to work
correctly...
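
As a generic illustration of that style (a userspace sketch with hypothetical
names, not the LUO code): one early return before anything is acquired, then
a single error label whose cleanup is safe regardless of how far the function
got, because free(NULL) is defined to do nothing.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct record {
	char *name;
	char *payload;
};

int record_init(struct record *r, const char *name, size_t payload_len)
{
	int err;

	assert(r != NULL);
	r->name = NULL;
	r->payload = NULL;

	/* Early return: nothing has been acquired yet. */
	if (!name)
		return -EINVAL;

	r->name = malloc(strlen(name) + 1);
	if (!r->name) {
		err = -ENOMEM;
		goto exit_err;
	}
	strcpy(r->name, name);

	if (payload_len == 0) {
		err = -EINVAL;
		goto exit_err;
	}

	r->payload = calloc(1, payload_len);
	if (!r->payload) {
		err = -ENOMEM;
		goto exit_err;
	}

	return 0;

exit_err:
	/* Single destination: free(NULL) is a no-op, so no extra labels. */
	free(r->payload);
	free(r->name);
	r->payload = NULL;
	r->name = NULL;
	return err;
}
```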

> > +
> > +     return err;
> > +}
> > +
> > +/**
> > + * luo_file_unpreserve_files - Unpreserves all files from a session.
> > + * @session: The session to be cleaned up.
> > + *
> > + * This function serves as the primary cleanup path for a session. It is
> > + * invoked when the userspace agent closes the session's file descriptor.
> > + *
> > + * For each file, it performs the following cleanup actions:
> > + *   1. Calls the handler's .unpreserve() callback to allow the handler to
> > + *      release any resources it allocated.
> > + *   2. Removes the file from the session's internal tracking list.
> > + *   3. Releases the reference to the 'struct file' that was taken by
> > + *      luo_preserve_file() via fput(), returning ownership.
> > + *   4. Frees the memory associated with the internal 'struct luo_file'.
> > + *
> > + * After all individual files are unpreserved, it frees the contiguous memory
> > + * block that was allocated to hold their serialization data.
> > + */
> > +void luo_file_unpreserve_files(struct luo_session *session)
> > +{
> > +     struct luo_file *luo_file;
> > +
> > +     lockdep_assert_held(&session->mutex);
> > +
> > +     while (!list_empty(&session->files_list)) {
>
> I think the loop should be in luo_session.c and luo_files.c should
> implement luo_file_unpreserve(struct luo_file *luo_file)
>
> The same applies to other functions below that do something with all files
> in the session. In my view luo_session should iterate through
> luo_session.files_list and call luo_file methods for each luo_file object.

Let's not do that: operations on files within a session belong to
luo_file, and operations on sessions within LUO belong to luo_session.

> > +int luo_file_freeze(struct luo_session *session)
> > +{
> > +     struct luo_file_ser *file_ser = session->files;
> > +     struct luo_file *luo_file;
> > +     int err;
> > +     int i;
> > +
> > +     lockdep_assert_held(&session->mutex);
> > +
> > +     if (!session->count)
> > +             return 0;
> > +
> > +     if (WARN_ON(!file_ser))
> > +             return -EINVAL;
> > +
> > +     i = 0;
> > +     list_for_each_entry(luo_file, &session->files_list, list) {
> > +             err = luo_file_freeze_one(session, luo_file);
> > +             if (err < 0) {
> > +                     pr_warn("Freeze failed for session[%s] token[%#0llx] handler[%s] err[%pe]\n",
> > +                             session->name, luo_file->token,
> > +                             luo_file->fh->compatible, ERR_PTR(err));
> > +                     goto exit_err;
> > +             }
> > +
> > +             strscpy(file_ser[i].compatible, luo_file->fh->compatible,
> > +                     sizeof(file_ser[i].compatible));
> > +             file_ser[i].data = luo_file->serialized_data;
> > +             file_ser[i].token = luo_file->token;
> > +             i++;
> > +     }
> > +
> > +     return 0;
> > +
> > +exit_err:
> > +     __luo_file_unfreeze(session, luo_file);
>
> Maybe move frozen files to a local list, call __luo_file_unfreeze() with
> that list and then splice it back to session.files_list?

IMO, it would add unnecessary complications. The session is locked and
session->files_list is entirely under our control, so there is no need
for the extra complexity of a private list.
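
For reference, the in-place unwind can be modelled in userspace roughly as
follows. This is an illustrative sketch only: it uses an array instead of a
list, all names are hypothetical, and it assumes the unfreeze helper stops at
the entry that failed, which the quoted hunk does not show.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct item {
	bool frozen;
	bool fail;	/* test knob: make freeze_one() fail for this entry */
};

static int freeze_one(struct item *it)
{
	if (it->fail)
		return -EIO;
	it->frozen = true;
	return 0;
}

/* Undo freeze_one() for entries [0, end); 'end' is the entry that failed. */
static void unfreeze_until(struct item *items, size_t end)
{
	for (size_t i = 0; i < end; i++)
		items[i].frozen = false;
}

/* Freeze every entry, or none: on failure, roll back what was frozen. */
int freeze_all(struct item *items, size_t n)
{
	assert(items || n == 0);

	for (size_t i = 0; i < n; i++) {
		int err = freeze_one(&items[i]);

		if (err) {
			unfreeze_until(items, i);
			return err;
		}
	}
	return 0;
}
```

Since the caller owns the container for the whole operation, the index of the
failing entry is all the bookkeeping the rollback needs.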

> > +             luo_file = kzalloc(sizeof(*luo_file), GFP_KERNEL);
> > +             if (!luo_file)
> > +                     return -ENOMEM;
>
> Shouldn't we free files allocated on the previous iterations?

No, for the same reason explained in luo_session.c :-)

>
> > +
> > +             luo_file->fh = fh;
> > +             luo_file->file = NULL;
> > +             luo_file->serialized_data = file_ser[i].data;
> > +             luo_file->token = file_ser[i].token;
> > +             luo_file->retrieved = false;
> > +             mutex_init(&luo_file->mutex);
> > +             list_add_tail(&luo_file->list, &session->files_list);
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> > +/**
> > + * liveupdate_register_file_handler - Register a file handler with LUO.
> > + * @fh: Pointer to a caller-allocated &struct liveupdate_file_handler.
> > + * The caller must initialize this structure, including a unique
> > + * 'compatible' string and valid 'fh' callbacks. This function adds the
> > + * handler to the global list of supported file handlers.
> > + *
> > + * Context: Typically called during module initialization for file types that
> > + * support live update preservation.
> > + *
> > + * Return: 0 on success. Negative errno on failure.
> > + */
> > +int liveupdate_register_file_handler(struct liveupdate_file_handler *fh)
> > +{
> > +     static DEFINE_MUTEX(register_file_handler_lock);
> > +     struct liveupdate_file_handler *fh_iter;
> > +
> > +     if (!liveupdate_enabled())
> > +             return -EOPNOTSUPP;
> > +
> > +     /*
> > +      * Once sessions have been deserialized, file handlers cannot be
> > +      * registered, it is too late.
> > +      */
> > +     if (WARN_ON(luo_session_is_deserialized()))
> > +             return -EBUSY;
> > +
> > +     /* Sanity check that all required callbacks are set */
> > +     if (!fh->ops->preserve || !fh->ops->unpreserve ||
> > +         !fh->ops->retrieve || !fh->ops->finish) {
> > +             return -EINVAL;
> > +     }
> > +
> > +     guard(mutex)(&register_file_handler_lock);
> > +     list_for_each_entry(fh_iter, &luo_file_handler_list, list) {
> > +             if (!strcmp(fh_iter->compatible, fh->compatible)) {
> > +                     pr_err("File handler registration failed: Compatible string '%s' already registered.\n",
> > +                            fh->compatible);
> > +                     return -EEXIST;
> > +             }
> > +     }
> > +
> > +     if (!try_module_get(fh->ops->owner))
> > +             return -EAGAIN;
> > +
> > +     INIT_LIST_HEAD(&fh->list);
> > +     list_add_tail(&fh->list, &luo_file_handler_list);
> > +
> > +     return 0;
> > +}
> > +
> > +/**
> > + * liveupdate_get_token_outgoing - Get the token for a preserved file.
> > + * @s:      The outgoing liveupdate session.
> > + * @file:   The file object to search for.
> > + * @tokenp: Output parameter for the found token.
> > + *
> > + * Searches the list of preserved files in an outgoing session for a matching
> > + * file object. If found, the corresponding user-provided token is returned.
> > + *
> > + * This function is intended for in-kernel callers that need to correlate a
> > + * file with its liveupdate token.
> > + *
> > + * Context: Can be called from any context that can acquire the session mutex.
> > + * Return: 0 on success, -ENOENT if the file is not preserved in this session.
> > + */
> > +int liveupdate_get_token_outgoing(struct liveupdate_session *s,
> > +                               struct file *file, u64 *tokenp)
> > +{
>
> This function is apparently unused.
>
> > +     struct luo_session *session = (struct luo_session *)s;
> > +     struct luo_file *luo_file;
> > +     int err = -ENOENT;
> > +
> > +     list_for_each_entry(luo_file, &session->files_list, list) {
> > +             if (luo_file->file == file) {
> > +                     if (tokenp)
> > +                             *tokenp = luo_file->token;
> > +                     err = 0;
> > +                     break;
> > +             }
> > +     }
> > +
> > +     return err;
> > +}
> > +
> > +/**
> > + * liveupdate_get_file_incoming - Retrieves a preserved file for in-kernel use.
> > + * @s:      The incoming liveupdate session (restored from the previous kernel).
> > + * @token:  The unique token identifying the file to retrieve.
> > + * @filep:  On success, this will be populated with a pointer to the retrieved
> > + *          'struct file'.
> > + *
> > + * Provides a kernel-internal API for other subsystems to retrieve their
> > + * preserved files after a live update. This function is a simple wrapper
> > + * around luo_retrieve_file(), allowing callers to find a file by its token.
> > + *
> > + * The operation is idempotent; subsequent calls for the same token will return
> > + * a pointer to the same 'struct file' object.
> > + *
> > + * The caller receives a pointer to the file with a reference incremented. The
> > + * file's lifetime is managed by LUO and any userspace file
> > + * descriptors. If the caller needs to hold a reference to the file beyond the
> > + * immediate scope, it must call get_file() itself.
> > + *
> > + * Context: Can be called from any context in the new kernel that has a handle
> > + *          to a restored session.
> > + * Return: 0 on success. Returns -ENOENT if no file with the matching token is
> > + *         found, or any other negative errno on failure.
> > + */
> > +int liveupdate_get_file_incoming(struct liveupdate_session *s, u64 token,
> > +                              struct file **filep)
> > +{
>
> Ditto.

These two functions are part of the public API allowing dependency
tracking for vfio->iommu->memfd during preservation.

>
> > +     struct luo_session *session = (struct luo_session *)s;
> > +
> > +     return luo_retrieve_file(session, token, filep);
> > +}
> > diff --git a/kernel/liveupdate/luo_internal.h b/kernel/liveupdate/luo_internal.h
> > index 5185ad37a8c1..1a36f2383123 100644
> > --- a/kernel/liveupdate/luo_internal.h
> > +++ b/kernel/liveupdate/luo_internal.h
> > @@ -70,4 +70,13 @@ int luo_session_serialize(void);
> >  int luo_session_deserialize(void);
> >  bool luo_session_is_deserialized(void);
> >
> > +int luo_preserve_file(struct luo_session *session, u64 token, int fd);
> > +void luo_file_unpreserve_files(struct luo_session *session);
> > +int luo_file_freeze(struct luo_session *session);
> > +void luo_file_unfreeze(struct luo_session *session);
> > +int luo_retrieve_file(struct luo_session *session, u64 token,
> > +                   struct file **filep);
> > +int luo_file_finish(struct luo_session *session);
> > +int luo_file_deserialize(struct luo_session *session);
> > +
> >  #endif /* _LINUX_LUO_INTERNAL_H */
> > --
> > 2.52.0.rc1.455.g30608eb744-goog
> >
>
> --
> Sincerely yours,
> Mike.

^ permalink raw reply

* Re: [PATCH v6 04/20] liveupdate: luo_session: add sessions support
From: Pasha Tatashin @ 2025-11-17 15:09 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRoEduya5EO8Xc1b@kernel.org>

> > +/**
> > + * struct luo_session_ser - Represents the serialized metadata for a LUO session.
> > + * @name:    The unique name of the session, copied from the `luo_session`
> > + *           structure.
>
> I'd phrase it as
>
>                 The unique name of the session provided by the userspace at
>                 the time of session creation.

Done

>
> > + * @files:   The physical address of a contiguous memory block that holds
> > + *           the serialized state of files.
>
> Maybe add                                    ^ in this session?

Done

>
> > + * @pgcnt:   The number of pages occupied by the `files` memory block.
> > + * @count:   The total number of files that were part of this session during
> > + *           serialization. Used for iteration and validation during
> > + *           restoration.
> > + *
> > + * This structure is used to package session-specific metadata for transfer
> > + * between kernels via Kexec Handover. An array of these structures (one per
> > + * session) is created and passed to the new kernel, allowing it to reconstruct
> > + * the session context.
> > + *
> > + * If this structure is modified, LUO_SESSION_COMPATIBLE must be updated.
>
> This comment applies to the luo_session_header_ser description as well.

Done

>
> > + */
> > +struct luo_session_ser {
> > +     char name[LIVEUPDATE_SESSION_NAME_LENGTH];
> > +     u64 files;
> > +     u64 pgcnt;
> > +     u64 count;
> > +} __packed;
> > +
> >  #endif /* _LINUX_LIVEUPDATE_ABI_LUO_H */
> > diff --git a/include/uapi/linux/liveupdate.h b/include/uapi/linux/liveupdate.h
> > index df34c1642c4d..d2ef2f7e0dbd 100644
> > --- a/include/uapi/linux/liveupdate.h
> > +++ b/include/uapi/linux/liveupdate.h
> > @@ -43,4 +43,7 @@
> >  /* The ioctl type, documented in ioctl-number.rst */
> >  #define LIVEUPDATE_IOCTL_TYPE                0xBA
> >
> > +/* The maximum length of session name including null termination */
> > +#define LIVEUPDATE_SESSION_NAME_LENGTH 56
>
> You decided not to bump it to 64 in the end? ;-)

I bumped it to 64, but in the next patch; I will fix that in the next version.

>
> > +
> >  #endif /* _UAPI_LIVEUPDATE_H */
> > diff --git a/kernel/liveupdate/Makefile b/kernel/liveupdate/Makefile
> > index 413722002b7a..83285e7ad726 100644
> > --- a/kernel/liveupdate/Makefile
> > +++ b/kernel/liveupdate/Makefile
> > @@ -2,7 +2,8 @@
> >
> >  luo-y :=                                                             \
> >               luo_core.o                                              \
> > -             luo_ioctl.o
> > +             luo_ioctl.o                                             \
> > +             luo_session.o
> >
> >  obj-$(CONFIG_KEXEC_HANDOVER)         += kexec_handover.o
> >  obj-$(CONFIG_KEXEC_HANDOVER_DEBUG)   += kexec_handover_debug.o
>
> ...
>
> > +int luo_session_retrieve(const char *name, struct file **filep)
> > +{
> > +     struct luo_session_header *sh = &luo_session_global.incoming;
> > +     struct luo_session *session = NULL;
> > +     struct luo_session *it;
> > +     int err;
> > +
> > +     scoped_guard(rwsem_read, &sh->rwsem) {
> > +             list_for_each_entry(it, &sh->list, list) {
> > +                     if (!strncmp(it->name, name, sizeof(it->name))) {
> > +                             session = it;
> > +                             break;
> > +                     }
> > +             }
> > +     }
> > +
> > +     if (!session)
> > +             return -ENOENT;
> > +
> > +     scoped_guard(mutex, &session->mutex) {
> > +             if (session->retrieved)
> > +                     return -EINVAL;
> > +     }
> > +
> > +     err = luo_session_getfile(session, filep);
> > +     if (!err) {
> > +             scoped_guard(mutex, &session->mutex)
> > +                     session->retrieved = true;
>
> Retaking the mutex here seems a bit odd.
> Do we really have to lock session->mutex in luo_session_getfile()?

Moved the locking out of luo_session_getfile(), and added
lockdep_assert_held(&session->mutex) to luo_session_getfile().
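
The resulting shape, sketched in userspace with pthreads (hypothetical names;
the 'mutex_held' flag is only a crude stand-in for lockdep, which has no
userspace equivalent here): the caller takes the lock once, and the callee
merely asserts that it is held, so the "already retrieved" check and the
update happen in one critical section.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

struct session {
	pthread_mutex_t mutex;
	bool mutex_held;	/* debug-only stand-in for lockdep */
	bool retrieved;
	int getfile_calls;
};

static void session_lock(struct session *s)
{
	pthread_mutex_lock(&s->mutex);
	s->mutex_held = true;
}

static void session_unlock(struct session *s)
{
	s->mutex_held = false;
	pthread_mutex_unlock(&s->mutex);
}

/* Callee: takes no lock itself, only checks that the caller holds it. */
static int session_getfile(struct session *s)
{
	assert(s->mutex_held);
	s->getfile_calls++;
	return 0;
}

/* Caller: check, act, and mark as retrieved under a single lock hold. */
int session_retrieve(struct session *s)
{
	int err;

	session_lock(s);
	if (s->retrieved) {
		err = -EINVAL;
		goto out;
	}

	err = session_getfile(s);
	if (!err)
		s->retrieved = true;
out:
	session_unlock(s);
	return err;
}
```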


> > +int luo_session_deserialize(void)
> > +{
> > +     struct luo_session_header *sh = &luo_session_global.incoming;
> > +     int err;
> > +
> > +     if (luo_session_is_deserialized())
> > +             return 0;
> > +
> > +     luo_session_global.deserialized = true;
> > +     if (!sh->active) {
> > +             INIT_LIST_HEAD(&sh->list);
> > +             init_rwsem(&sh->rwsem);
> > +             return 0;
>
> How can this happen? luo_session_deserialize() is supposed to be called
> from ioctl and luo_session_global.incoming should be set up way earlier.

No LUO state was passed from the previous kernel, so
luo_session_global.incoming.active stays false: the incoming side is
not participating.

> And, why don't we initialize ->list and ->rwsem statically?

Good idea, done.

> > +     }
> > +
> > +     for (int i = 0; i < sh->header_ser->count; i++) {
> > +             struct luo_session *session;
> > +
> > +             session = luo_session_alloc(sh->ser[i].name);
> > +             if (IS_ERR(session)) {
> > +                     pr_warn("Failed to allocate session [%s] during deserialization %pe\n",
> > +                             sh->ser[i].name, session);
> > +                     return PTR_ERR(session);
> > +             }
>
> The allocated sessions still need to be freed if an insert fails ;-)

No. We have failed to deserialize, so the user will need to reboot the
machine anyway in order to release the preserved resources.

This is something that Jason Gunthorpe also mentioned regarding IOMMU:
if something is not correct (i.e., if a session cannot finish for some
reason), don't add complicated "undo" code that cleans up all
resources. Instead, treat them as a memory leak and allow a reboot to
perform the cleanup.

While the clean-up looks simple in this particular patch, later in the
series we add per-session file deserialization to this function. The
clean-up would then have to free the resources for each session we
deserialized, and also the resources for the files that were
deserialized for those sessions, only to still boot into a
"maintenance" mode where a bunch of resources are not accessible and
from which the machine would have to be rebooted to get back to a
normal state. This code would never be tested and never be used, so
let's use reboot to solve this problem: devices are going to be
properly reset, and memory is going to be properly freed.

^ permalink raw reply

* Re: [PATCH v6 01/20] liveupdate: luo_core: luo_ioctl: Live Update Orchestrator
From: Pasha Tatashin @ 2025-11-17 14:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: pratyush, jasonmiu, graf, rppt, dmatlack, rientjes, corbet,
	rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251116185406.0fb85a3c52c16c91af1a0c80@linux-foundation.org>

On Sun, Nov 16, 2025 at 9:54 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Sat, 15 Nov 2025 18:33:47 -0500 Pasha Tatashin <pasha.tatashin@soleen.com> wrote:
>
> > Introduce LUO, a mechanism intended to facilitate kernel updates while
> > keeping designated devices operational across the transition (e.g., via
> > kexec).
>
> Thanks, I updated mm.git's mm-unstable branch to this version.  I
> expect at least one more version as a result of feedback for this v6.

Thank you, Andrew! I plan to address all comments and send a v7 in
about a week. The comments/changes so far are minor, so I hope to land
this during the next merge window.

>
> I wasn't able to reproduce Stephen's build error
> (https://lkml.kernel.org/r/20251117093614.1490d048@canb.auug.org.au)
> with this series.

That build error was fixed with the KHO fix-up patch back on Friday.

>

^ permalink raw reply

* Re: [PATCH v6 05/20] liveupdate: luo_ioctl: add user interface
From: Pasha Tatashin @ 2025-11-17 14:22 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRoGw9gml3vozrbz@kernel.org>

> > --- a/include/uapi/linux/liveupdate.h
> > +++ b/include/uapi/linux/liveupdate.h
> > @@ -44,6 +44,70 @@
> >  #define LIVEUPDATE_IOCTL_TYPE                0xBA
> >
> >  /* The maximum length of session name including null termination */
> > -#define LIVEUPDATE_SESSION_NAME_LENGTH 56
> > +#define LIVEUPDATE_SESSION_NAME_LENGTH 64

Ah, here I updated the session name length :-) I will move this change
to the proper patch.

> > +/**
> > + * struct liveupdate_ioctl_create_session - ioctl(LIVEUPDATE_IOCTL_CREATE_SESSION)
> > + * @size:    Input; sizeof(struct liveupdate_ioctl_create_session)
> > + * @fd:              Output; The new file descriptor for the created session.
> > + * @name:    Input; A null-terminated string for the session name, max
> > + *           length %LIVEUPDATE_SESSION_NAME_LENGTH including termination
> > + *           char.
>
> Nit:          ^ character

Done.

> > +     if (atomic_cmpxchg(&ldev->in_use, 0, 1))
> > +             return -EBUSY;
> > +
> > +     luo_session_deserialize();
>
> Why luo_session_deserialize() is tied to the first open of the chardev?

Because at this point, when `/dev/liveupdate` is opened, we expect that
userspace has finished loading the modules that might register file
handlers and FLBs with LUO, and therefore we can deserialize the
sessions and find the rightful owners for all FDs. After this
point, we also forbid registering new FHs and FLBs.
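
A small userspace model of that ordering rule (a sketch with hypothetical
names; only the -EBUSY and -EEXIST return values mirror the quoted
registration function): handlers can register until the first open, after
which the set is sealed and lookups run against a fixed list.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_HANDLERS 8

struct handler {
	const char *compatible;
};

static const struct handler *handlers[MAX_HANDLERS];
static size_t nr_handlers;
static bool deserialized;	/* set once, on the first open */

int register_handler(const struct handler *h)
{
	assert(h && h->compatible);

	if (deserialized)
		return -EBUSY;	/* too late: owners were already resolved */

	for (size_t i = 0; i < nr_handlers; i++) {
		if (!strcmp(handlers[i]->compatible, h->compatible))
			return -EEXIST;
	}

	if (nr_handlers == MAX_HANDLERS)
		return -ENOSPC;

	handlers[nr_handlers++] = h;
	return 0;
}

/* Find the owner for a serialized entry by its compatible string. */
const struct handler *find_handler(const char *compatible)
{
	for (size_t i = 0; i < nr_handlers; i++) {
		if (!strcmp(handlers[i]->compatible, compatible))
			return handlers[i];
	}
	return NULL;
}

/* Models the first open of the device node: the handler set is sealed. */
void first_open(void)
{
	deserialized = true;
}
```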

Pasha

^ permalink raw reply

* Re: [PATCH v5 22/22] tests/liveupdate: Add in-kernel liveupdate test
From: Pasha Tatashin @ 2025-11-17 14:09 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aRoZq2bYYm5MGihy@kernel.org>

On Sun, Nov 16, 2025 at 1:36 PM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Wed, Nov 12, 2025 at 03:40:53PM -0500, Pasha Tatashin wrote:
> > On Wed, Nov 12, 2025 at 3:24 PM Mike Rapoport <rppt@kernel.org> wrote:
> > >
> > > On Fri, Nov 07, 2025 at 04:03:20PM -0500, Pasha Tatashin wrote:
> > > > Introduce an in-kernel test module to validate the core logic of the
> > > > Live Update Orchestrator's File-Lifecycle-Bound feature. This
> > > > provides a low-level, controlled environment to test FLB registration
> > > > and callback invocation without requiring userspace interaction or
> > > > actual kexec reboots.
> > > >
> > > > The test is enabled by the CONFIG_LIVEUPDATE_TEST Kconfig option.
> > > >
> > > > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> > > > ---
> > > >  kernel/liveupdate/luo_file.c     |   2 +
> > > >  kernel/liveupdate/luo_internal.h |   8 ++
> > > >  lib/Kconfig.debug                |  23 ++++++
> > > >  lib/tests/Makefile               |   1 +
> > > >  lib/tests/liveupdate.c           | 130 +++++++++++++++++++++++++++++++
> > > >  5 files changed, 164 insertions(+)
> > > >  create mode 100644 lib/tests/liveupdate.c
> > > >
> > > > diff --git a/kernel/liveupdate/luo_file.c b/kernel/liveupdate/luo_file.c
> > > > index 713069b96278..4c0a75918f3d 100644
> > > > --- a/kernel/liveupdate/luo_file.c
> > > > +++ b/kernel/liveupdate/luo_file.c
> > > > @@ -829,6 +829,8 @@ int liveupdate_register_file_handler(struct liveupdate_file_handler *fh)
> > > >       INIT_LIST_HEAD(&fh->flb_list);
> > > >       list_add_tail(&fh->list, &luo_file_handler_list);
> > > >
> > > > +     liveupdate_test_register(fh);
> > > > +
> > >
> > > > Does it mean that every flb user will be added here?
> >
> > No, FLB users will use:
> >
> > liveupdate_register_flb() from various subsystems. This
> > liveupdate_test_register() is only to allow the kernel test to register
> > test FLBs with every single file handler, for in-kernel testing purposes
> > only.
>
> Why can't the in-kernel test call liveupdate_register_flb()?

The kernel tests call liveupdate_register_flb() with every
file handler that registers with LUO. It is unreasonable to expect
that all file handlers from various subsystems are going to be
exported and accessible to the kernel test.
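
Roughly, the arrangement looks like the following userspace sketch. All names
are hypothetical, and the assumption that the hook compiles to an empty stub
when the test option is off is mine; the hunks quoted in this thread do not
show that part of luo_internal.h.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_TEST 1	/* hypothetical switch; remove to compile the hook out */
#define MAX_HANDLERS 8

struct handler {
	const char *compatible;
};

#ifdef DEMO_TEST
static const struct handler *seen_by_test[MAX_HANDLERS];
static size_t nr_seen_by_test;

/* Test side: called for every handler, so no handler has to be exported. */
static void demo_test_register(const struct handler *h)
{
	seen_by_test[nr_seen_by_test++] = h;
}
#else
static inline void demo_test_register(const struct handler *h) { (void)h; }
#endif

static const struct handler *handlers[MAX_HANDLERS];
static size_t nr_handlers;

/* Core side: the one place that sees every handler as it registers. */
int demo_register_handler(const struct handler *h)
{
	assert(h && h->compatible);

	if (nr_handlers == MAX_HANDLERS)
		return -ENOSPC;

	handlers[nr_handlers++] = h;
	demo_test_register(h);
	return 0;
}
```

The design choice is that the core calls out to the test, rather than the
test reaching into each subsystem for its handler.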

>
> > Pasha
>
> --
> Sincerely yours,
> Mike.

^ permalink raw reply

* Re: [PATCH v6 20/20] tests/liveupdate: Add in-kernel liveupdate test
From: Mike Rapoport @ 2025-11-17 11:13 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251115233409.768044-21-pasha.tatashin@soleen.com>

On Sat, Nov 15, 2025 at 06:34:06PM -0500, Pasha Tatashin wrote:
> Introduce an in-kernel test module to validate the core logic of the
> Live Update Orchestrator's File-Lifecycle-Bound feature. This
> provides a low-level, controlled environment to test FLB registration
> and callback invocation without requiring userspace interaction or
> actual kexec reboots.
> 
> The test is enabled by the CONFIG_LIVEUPDATE_TEST Kconfig option.
> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
>  include/linux/liveupdate/abi/luo.h |   5 +
>  kernel/liveupdate/luo_file.c       |   2 +
>  kernel/liveupdate/luo_internal.h   |   6 ++
>  lib/Kconfig.debug                  |  23 +++++
>  lib/tests/Makefile                 |   1 +
>  lib/tests/liveupdate.c             | 143 +++++++++++++++++++++++++++++
>  6 files changed, 180 insertions(+)
>  create mode 100644 lib/tests/liveupdate.c
> 
> diff --git a/include/linux/liveupdate/abi/luo.h b/include/linux/liveupdate/abi/luo.h
> index 85596ce68c16..cdcace9b48f5 100644
> --- a/include/linux/liveupdate/abi/luo.h
> +++ b/include/linux/liveupdate/abi/luo.h
> @@ -230,4 +230,9 @@ struct luo_flb_ser {
>  	u64 count;
>  } __packed;
>  
> +/* Kernel Live Update Test ABI */
> +#ifdef CONFIG_LIVEUPDATE_TEST
> +#define LIVEUPDATE_TEST_FLB_COMPATIBLE(i)	"liveupdate-test-flb-v" #i
> +#endif
> +
>  #endif /* _LINUX_LIVEUPDATE_ABI_LUO_H */
> diff --git a/kernel/liveupdate/luo_file.c b/kernel/liveupdate/luo_file.c
> index df337c9c4f21..9a531096bdb5 100644
> --- a/kernel/liveupdate/luo_file.c
> +++ b/kernel/liveupdate/luo_file.c
> @@ -834,6 +834,8 @@ int liveupdate_register_file_handler(struct liveupdate_file_handler *fh)
>  	INIT_LIST_HEAD(&fh->flb_list);
>  	list_add_tail(&fh->list, &luo_file_handler_list);
>  
> +	liveupdate_test_register(fh);
> +

Why can't this be called from the test?

>  	return 0;
>  }
>  

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: [PATCH v6 15/20] mm: memfd_luo: allow preserving memfd
From: Mike Rapoport @ 2025-11-17 11:03 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251115233409.768044-16-pasha.tatashin@soleen.com>

On Sat, Nov 15, 2025 at 06:34:01PM -0500, Pasha Tatashin wrote:
> From: Pratyush Yadav <ptyadav@amazon.de>
> 
> The ability to preserve a memfd allows userspace to use KHO and LUO to
> transfer its memory contents to the next kernel. This is useful in many
> ways. For one, it can be used with IOMMUFD as the backing store for
> IOMMU page tables. Preserving IOMMUFD is essential for performing a
> hypervisor live update with passthrough devices. memfd support provides
> the first building block for making that possible.
> 
> For another, for applications with a large amount of memory that takes
> time to reconstruct, reboots to consume kernel upgrades can be very
> expensive. memfd with LUO gives those applications reboot-persistent
> memory that they can use to quickly save and reconstruct that state.
> 
> While memfd is backed by either hugetlbfs or shmem, currently only
> support on shmem is added. To be more precise, support for anonymous
> shmem files is added.
> 
> The handover to the next kernel is not transparent. Not all properties
> of the file are preserved; only its memory contents, position, and
> size are. The recreated file gets the UID and GID of the task doing the
> restore, and the task's cgroup gets charged with the memory.
> 
> Once preserved, the file cannot grow or shrink, and all its pages are
> pinned to avoid migrations and swapping. The file can still be read from
> or written to.
> 
> Use vmalloc to get the buffer to hold the folios, and preserve
> it using kho_preserve_vmalloc(). This doesn't have the size limit.
> 
> Co-developed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>

The order of the Signed-off-by tags seems wrong; Pasha's should be the last one.

> ---

...

> +/**
> + * DOC: memfd Live Update ABI
> + *
> + * This header defines the ABI for preserving the state of a memfd across a
> + * kexec reboot using the LUO.
> + *
> + * The state is serialized into a Flattened Device Tree which is then handed
> + * over to the next kernel via the KHO mechanism. The FDT is passed as the
> + * opaque `data` handle in the file handler callbacks.
> + *
> + * This interface is a contract. Any modification to the FDT structure,
> + * node properties, compatible string, or the layout of the serialization
> + * structures defined here constitutes a breaking change. Such changes require
> + * incrementing the version number in the MEMFD_LUO_FH_COMPATIBLE string.

The same comment about contract as for the generic LUO documentation
applies here (https://lore.kernel.org/all/aRnG8wDSSAtkEI_z@kernel.org/)

> + *
> + * FDT Structure Overview:
> + *   The memfd state is contained within a single FDT with the following layout:

...

> +static struct memfd_luo_folio_ser *memfd_luo_preserve_folios(struct file *file, void *fdt,
> +							     u64 *nr_foliosp)
> +{

If we are already returning nr_folios by reference, we might do it for
memfd_luo_folio_ser as well and make the function return int.
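
For illustration only, the suggested shape could look like the userspace
sketch below. The names folio_ser and preserve_folios() and the placeholder
field values are hypothetical stand-ins, not the patch's code; the point is
just that both results come back through pointers while the return value
carries the error code.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct memfd_luo_folio_ser. */
struct folio_ser {
	uint64_t foliodesc;
	uint64_t index;
};

/*
 * Return 0 or a negative errno. The array and its length are handed back
 * through out-parameters, and they are written only on success.
 */
static int preserve_folios(uint64_t nr_wanted, struct folio_ser **serp,
			   uint64_t *nr_foliosp)
{
	struct folio_ser *ser;
	uint64_t i;

	if (!serp || !nr_foliosp)
		return -EINVAL;

	/* An empty file is not an error: report "no array, zero entries". */
	if (!nr_wanted) {
		*serp = NULL;
		*nr_foliosp = 0;
		return 0;
	}

	ser = calloc(nr_wanted, sizeof(*ser));
	if (!ser)
		return -ENOMEM;

	for (i = 0; i < nr_wanted; i++) {
		ser[i].foliodesc = i + 1;	/* placeholder value */
		ser[i].index = i;
	}

	*serp = ser;
	*nr_foliosp = nr_wanted;
	return 0;
}
```

With this shape the caller checks a plain int and never needs IS_ERR() on
the returned array.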

> +	struct inode *inode = file_inode(file);
> +	struct memfd_luo_folio_ser *pfolios;
> +	struct kho_vmalloc *kho_vmalloc;
> +	unsigned int max_folios;
> +	long i, size, nr_pinned;
> +	struct folio **folios;

pfolios and folios read as if the former were a pointer to the latter.
I'd s/pfolios/folios_ser/

> +	int err = -EINVAL;
> +	pgoff_t offset;
> +	u64 nr_folios;

...

> +	kvfree(folios);
> +	*nr_foliosp = nr_folios;
> +	return pfolios;
> +
> +err_unpreserve:
> +	i--;
> +	for (; i >= 0; i--)

Maybe a single line

	for (--i; i >= 0; --i)
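
A quick userspace check that the one-line form walks the same indices as
the two-statement form. undo_two_step() and undo_one_line() are made-up
names for this sketch, and the array only records which entries were
unwound.

```c
/* Original form: step back once, then walk down to index 0. */
static long undo_two_step(long i, int *undone)
{
	long n = 0;

	i--;
	for (; i >= 0; i--) {
		undone[i] = 1;
		n++;
	}
	return n;
}

/* Suggested form: the initial decrement folded into the for statement. */
static long undo_one_line(long i, int *undone)
{
	long n = 0;

	for (--i; i >= 0; --i) {
		undone[i] = 1;
		n++;
	}
	return n;
}
```

Both skip the entry at the failing index and unwind everything below it.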

> +		kho_unpreserve_folio(folios[i]);
> +	vfree(pfolios);
> +err_unpin:
> +	unpin_folios(folios, nr_folios);
> +err_free_folios:
> +	kvfree(folios);
> +	return ERR_PTR(err);
> +}
> +
> +static void memfd_luo_unpreserve_folios(void *fdt, struct memfd_luo_folio_ser *pfolios,
> +					u64 nr_folios)
> +{
> +	struct kho_vmalloc *kho_vmalloc;
> +	long i;
> +
> +	if (!nr_folios)
> +		return;
> +
> +	kho_vmalloc = (struct kho_vmalloc *)fdt_getprop(fdt, 0, MEMFD_FDT_FOLIOS, NULL);
> +	/* The FDT was created by this kernel so expect it to be sane. */
> +	WARN_ON_ONCE(!kho_vmalloc);

The FDT won't have FOLIOS property if size was zero, will it?
I think that if we add kho_vmalloc handle to struct memfd_luo_private and
pass that around it will make things easier and simpler.

> +	kho_unpreserve_vmalloc(kho_vmalloc);
> +
> +	for (i = 0; i < nr_folios; i++) {
> +		const struct memfd_luo_folio_ser *pfolio = &pfolios[i];
> +		struct folio *folio;
> +
> +		if (!pfolio->foliodesc)
> +			continue;

How can this happen? Can pfolios be a sparse array?

> +		folio = pfn_folio(PRESERVED_FOLIO_PFN(pfolio->foliodesc));
> +
> +		kho_unpreserve_folio(folio);
> +		unpin_folio(folio);
> +	}
> +
> +	vfree(pfolios);
> +}

...

> +static void memfd_luo_finish(struct liveupdate_file_op_args *args)
> +{
> +	const struct memfd_luo_folio_ser *pfolios;
> +	struct folio *fdt_folio;
> +	const void *fdt;
> +	u64 nr_folios;
> +
> +	if (args->retrieved)
> +		return;
> +
> +	fdt_folio = memfd_luo_get_fdt(args->serialized_data);
> +	if (!fdt_folio) {
> +		pr_err("failed to restore memfd FDT\n");
> +		return;
> +	}
> +
> +	fdt = folio_address(fdt_folio);
> +
> +	pfolios = memfd_luo_fdt_folios(fdt, &nr_folios);
> +	if (!pfolios)
> +		goto out;
> +
> +	memfd_luo_discard_folios(pfolios, nr_folios);

Doesn't this free the actual folios that were supposed to be preserved?

> +	vfree(pfolios);
> +
> +out:
> +	folio_put(fdt_folio);
> +}

...

> +static int memfd_luo_retrieve(struct liveupdate_file_op_args *args)
> +{
> +	struct folio *fdt_folio;
> +	const u64 *pos, *size;
> +	struct file *file;
> +	int len, ret = 0;
> +	const void *fdt;
> +
> +	fdt_folio = memfd_luo_get_fdt(args->serialized_data);

Why do we need to kho_restore_folio() twice? Here and in
memfd_luo_finish()?

> +	if (!fdt_folio)
> +		return -ENOENT;
> +
> +	fdt = page_to_virt(folio_page(fdt_folio, 0));

folio_address()


-- 
Sincerely yours,
Mike.

* Re: [PATCH v6 14/20] liveupdate: luo_file: add private argument to store runtime state
From: Mike Rapoport @ 2025-11-17 10:15 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251115233409.768044-15-pasha.tatashin@soleen.com>

On Sat, Nov 15, 2025 at 06:34:00PM -0500, Pasha Tatashin wrote:
> From: Pratyush Yadav <pratyush@kernel.org>
> 
> Currently file handlers only get the serialized_data field to store
> their state. This field has a pointer to the serialized state of the
> file, and it becomes a part of LUO file's serialized state.
> 
> File handlers can also need some runtime state to track information that
> shouldn't make it in the serialized data.
> 
> One such example is a vmalloc pointer. While kho_preserve_vmalloc()
> preserves the memory backing a vmalloc allocation, it does not store the
> original vmap pointer, since that has no use being passed to the next
> kernel. The pointer is needed to free the memory in case the file is
> unpreserved.
> 
> Provide a private field in struct luo_file and pass it to all the
> callbacks. The field can be set by preserve, and must be freed by
> unpreserve.
> 
> Signed-off-by: Pratyush Yadav <pratyush@kernel.org>
> Co-developed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
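
As a rough userspace model of the lifecycle described in the changelog
(set in .preserve(), freed in .unpreserve()), something along these lines.
struct op_args, struct handler_priv and the sketch_*() functions are
hypothetical and much smaller than the real structures; the placeholder
handle value means nothing.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, cut-down model of struct liveupdate_file_op_args. */
struct op_args {
	uint64_t serialized_data;	/* handed over to the next kernel */
	void *private_data;		/* runtime only, never serialized */
};

/* Hypothetical runtime state that is only useful before the reboot. */
struct handler_priv {
	void *buf;	/* e.g. a pointer needed to free memory later */
};

/* .preserve(): allocate the runtime state and publish it via private_data. */
static int sketch_preserve(struct op_args *args)
{
	struct handler_priv *priv = calloc(1, sizeof(*priv));

	if (!priv)
		return -1;

	priv->buf = malloc(64);
	if (!priv->buf) {
		free(priv);
		return -1;
	}

	args->serialized_data = 0x1000;	/* placeholder handle */
	args->private_data = priv;
	return 0;
}

/* .unpreserve(): the matching callback that frees private_data. */
static void sketch_unpreserve(struct op_args *args)
{
	struct handler_priv *priv = args->private_data;

	if (!priv)
		return;

	free(priv->buf);
	free(priv);
	args->private_data = NULL;
}
```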

> ---
>  include/linux/liveupdate.h   | 5 +++++
>  kernel/liveupdate/luo_file.c | 9 +++++++++
>  2 files changed, 14 insertions(+)
> 
> diff --git a/include/linux/liveupdate.h b/include/linux/liveupdate.h
> index 36a831ae3ead..defc69a1985d 100644
> --- a/include/linux/liveupdate.h
> +++ b/include/linux/liveupdate.h
> @@ -29,6 +29,10 @@ struct file;
>   *                    this to the file being operated on.
>   * @serialized_data:  The opaque u64 handle, preserve/prepare/freeze may update
>   *                    this field.
> + * @private_data:     Private data for the file used to hold runtime state that
> + *                    is not preserved. Set by the handler's .preserve()
> + *                    callback, and must be freed in the handler's
> + *                    .unpreserve() callback.
>   *
>   * This structure bundles all parameters for the file operation callbacks.
>   * The 'data' and 'file' fields are used for both input and output.
> @@ -39,6 +43,7 @@ struct liveupdate_file_op_args {
>  	bool retrieved;
>  	struct file *file;
>  	u64 serialized_data;
> +	void *private_data;
>  };
>  
>  /**
> diff --git a/kernel/liveupdate/luo_file.c b/kernel/liveupdate/luo_file.c
> index 3d3bd84cb281..df337c9c4f21 100644
> --- a/kernel/liveupdate/luo_file.c
> +++ b/kernel/liveupdate/luo_file.c
> @@ -126,6 +126,10 @@ static LIST_HEAD(luo_file_handler_list);
>   *                 This handle is passed back to the handler's .freeze(),
>   *                 .retrieve(), and .finish() callbacks, allowing it to track
>   *                 and update its serialized state across phases.
> + * @private_data:  Pointer to the private data for the file used to hold runtime
> + *                 state that is not preserved. Set by the handler's .preserve()
> + *                 callback, and must be freed in the handler's .unpreserve()
> + *                 callback.
>   * @retrieved:     A flag indicating whether a user/kernel in the new kernel has
>   *                 successfully called retrieve() on this file. This prevents
>   *                 multiple retrieval attempts.
> @@ -152,6 +156,7 @@ struct luo_file {
>  	struct liveupdate_file_handler *fh;
>  	struct file *file;
>  	u64 serialized_data;
> +	void *private_data;
>  	bool retrieved;
>  	struct mutex mutex;
>  	struct list_head list;
> @@ -309,6 +314,7 @@ int luo_preserve_file(struct luo_session *session, u64 token, int fd)
>  		goto exit_err;
>  	} else {
>  		luo_file->serialized_data = args.serialized_data;
> +		luo_file->private_data = args.private_data;
>  		list_add_tail(&luo_file->list, &session->files_list);
>  		session->count++;
>  	}
> @@ -356,6 +362,7 @@ void luo_file_unpreserve_files(struct luo_session *session)
>  		args.session = (struct liveupdate_session *)session;
>  		args.file = luo_file->file;
>  		args.serialized_data = luo_file->serialized_data;
> +		args.private_data = luo_file->private_data;
>  		luo_file->fh->ops->unpreserve(&args);
>  		luo_flb_file_unpreserve(luo_file->fh);
>  
> @@ -384,6 +391,7 @@ static int luo_file_freeze_one(struct luo_session *session,
>  		args.session = (struct liveupdate_session *)session;
>  		args.file = luo_file->file;
>  		args.serialized_data = luo_file->serialized_data;
> +		args.private_data = luo_file->private_data;
>  
>  		err = luo_file->fh->ops->freeze(&args);
>  		if (!err)
> @@ -405,6 +413,7 @@ static void luo_file_unfreeze_one(struct luo_session *session,
>  		args.session = (struct liveupdate_session *)session;
>  		args.file = luo_file->file;
>  		args.serialized_data = luo_file->serialized_data;
> +		args.private_data = luo_file->private_data;
>  
>  		luo_file->fh->ops->unfreeze(&args);
>  	}
> -- 
> 2.52.0.rc1.455.g30608eb744-goog
> 

-- 
Sincerely yours,
Mike.

* Re: [PATCH v6 13/20] mm: shmem: export some functions to internal.h
From: Mike Rapoport @ 2025-11-17 10:14 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251115233409.768044-14-pasha.tatashin@soleen.com>

On Sat, Nov 15, 2025 at 06:33:59PM -0500, Pasha Tatashin wrote:
> From: Pratyush Yadav <ptyadav@amazon.de>
> 
> shmem_inode_acct_blocks(), shmem_recalc_inode(), and
> shmem_add_to_page_cache() are used by shmem_alloc_and_add_folio(). This
> functionality will also be used in the future by Live Update
> Orchestrator (LUO) to recreate memfd files after a live update.

I'd rephrase this a bit to say that it will be used by memfd integration
into LUO to emphasize this stays inside mm.

Other than that

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

> 
> Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
>  mm/internal.h |  6 ++++++
>  mm/shmem.c    | 10 +++++-----
>  2 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index 1561fc2ff5b8..4ba155524f80 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1562,6 +1562,12 @@ void __meminit __init_page_from_nid(unsigned long pfn, int nid);
>  unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
>  			  int priority);
>  
> +int shmem_add_to_page_cache(struct folio *folio,
> +			    struct address_space *mapping,
> +			    pgoff_t index, void *expected, gfp_t gfp);
> +int shmem_inode_acct_blocks(struct inode *inode, long pages);
> +bool shmem_recalc_inode(struct inode *inode, long alloced, long swapped);
> +
>  #ifdef CONFIG_SHRINKER_DEBUG
>  static inline __printf(2, 0) int shrinker_debugfs_name_alloc(
>  			struct shrinker *shrinker, const char *fmt, va_list ap)
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 05c3db840257..c3dc4af59c14 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -219,7 +219,7 @@ static inline void shmem_unacct_blocks(unsigned long flags, long pages)
>  		vm_unacct_memory(pages * VM_ACCT(PAGE_SIZE));
>  }
>  
> -static int shmem_inode_acct_blocks(struct inode *inode, long pages)
> +int shmem_inode_acct_blocks(struct inode *inode, long pages)
>  {
>  	struct shmem_inode_info *info = SHMEM_I(inode);
>  	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
> @@ -435,7 +435,7 @@ static void shmem_free_inode(struct super_block *sb, size_t freed_ispace)
>   *
>   * Return: true if swapped was incremented from 0, for shmem_writeout().
>   */
> -static bool shmem_recalc_inode(struct inode *inode, long alloced, long swapped)
> +bool shmem_recalc_inode(struct inode *inode, long alloced, long swapped)
>  {
>  	struct shmem_inode_info *info = SHMEM_I(inode);
>  	bool first_swapped = false;
> @@ -861,9 +861,9 @@ static void shmem_update_stats(struct folio *folio, int nr_pages)
>  /*
>   * Somewhat like filemap_add_folio, but error if expected item has gone.
>   */
> -static int shmem_add_to_page_cache(struct folio *folio,
> -				   struct address_space *mapping,
> -				   pgoff_t index, void *expected, gfp_t gfp)
> +int shmem_add_to_page_cache(struct folio *folio,
> +			    struct address_space *mapping,
> +			    pgoff_t index, void *expected, gfp_t gfp)
>  {
>  	XA_STATE_ORDER(xas, &mapping->i_pages, index, folio_order(folio));
>  	unsigned long nr = folio_nr_pages(folio);
> -- 
> 2.52.0.rc1.455.g30608eb744-goog
> 

-- 
Sincerely yours,
Mike.

* Re: [PATCH v6 12/20] mm: shmem: allow freezing inode mapping
From: Mike Rapoport @ 2025-11-17 10:08 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251115233409.768044-13-pasha.tatashin@soleen.com>

On Sat, Nov 15, 2025 at 06:33:58PM -0500, Pasha Tatashin wrote:
> From: Pratyush Yadav <ptyadav@amazon.de>
> 
> To prepare a shmem inode for live update via the Live Update
> Orchestrator (LUO), its index -> folio mappings must be serialized. Once
> the mappings are serialized, they cannot change since it would cause the
> serialized data to become inconsistent. This can be done by pinning the
> folios to avoid migration, and by making sure no folios can be added to
> or removed from the inode.
> 
> While mechanisms to pin folios already exist, the only way to stop
> folios being added or removed is to use the grow and shrink file seals.
> But file seals come with their own semantics, one of which is that they
> can't be removed. This doesn't work with liveupdate since it can be
> cancelled or error out, which would need the seals to be removed and the
> file's normal functionality to be restored.
> 
> Introduce SHMEM_F_MAPPING_FROZEN to indicate this instead. It is
> internal to shmem and is not directly exposed to userspace. It functions
> similarly to F_SEAL_GROW | F_SEAL_SHRINK, but additionally disallows hole
> punching, and can be removed.
> 
> Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
>  include/linux/shmem_fs.h | 17 +++++++++++++++++
>  mm/shmem.c               | 12 +++++++++++-
>  2 files changed, 28 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> index 650874b400b5..a9f5db472a39 100644
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -24,6 +24,14 @@ struct swap_iocb;
>  #define SHMEM_F_NORESERVE	BIT(0)
>  /* Disallow swapping. */
>  #define SHMEM_F_LOCKED		BIT(1)
> +/*
> + * Disallow growing, shrinking, or hole punching in the inode. Combined with
> + * folio pinning, makes sure the inode's mapping stays fixed.
> + *
> + * In some ways similar to F_SEAL_GROW | F_SEAL_SHRINK, but can be removed and
> + * isn't directly visible to userspace.
> + */
> +#define SHMEM_F_MAPPING_FROZEN	BIT(2)
>  
>  struct shmem_inode_info {
>  	spinlock_t		lock;
> @@ -186,6 +194,15 @@ static inline bool shmem_file(struct file *file)
>  	return shmem_mapping(file->f_mapping);
>  }
>  
> +/* Must be called with inode lock taken exclusive. */
> +static inline void shmem_i_mapping_freeze(struct inode *inode, bool freeze)

_mapping usually refers to operations on struct address_space.
It seems that all shmem methods that take inode are just shmem_<operation>,
so shmem_freeze() looks more appropriate.

> +{
> +	if (freeze)
> +		SHMEM_I(inode)->flags |= SHMEM_F_MAPPING_FROZEN;
> +	else
> +		SHMEM_I(inode)->flags &= ~SHMEM_F_MAPPING_FROZEN;
> +}
> +
>  /*
>   * If fallocate(FALLOC_FL_KEEP_SIZE) has been used, there may be pages
>   * beyond i_size's notion of EOF, which fallocate has committed to reserving:
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 1d5036dec08a..05c3db840257 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1292,7 +1292,8 @@ static int shmem_setattr(struct mnt_idmap *idmap,
>  		loff_t newsize = attr->ia_size;
>  
>  		/* protected by i_rwsem */
> -		if ((newsize < oldsize && (info->seals & F_SEAL_SHRINK)) ||
> +		if ((info->flags & SHMEM_F_MAPPING_FROZEN) ||

A corner case: if newsize == oldsize this will be a false positive
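
One possible way to avoid it, shown as a standalone model rather than the
actual shmem_setattr() code: reject a frozen mapping only when the size
really changes. check_resize() and its boolean parameters are hypothetical
and stand in for info->flags and info->seals.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the size check. A frozen mapping rejects only a
 * real size change, so a no-op truncate to the current size still succeeds.
 */
static int check_resize(bool frozen, bool seal_shrink, bool seal_grow,
			long long oldsize, long long newsize)
{
	if ((frozen && newsize != oldsize) ||
	    (seal_shrink && newsize < oldsize) ||
	    (seal_grow && newsize > oldsize))
		return -EPERM;

	return 0;
}
```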

> +		    (newsize < oldsize && (info->seals & F_SEAL_SHRINK)) ||
>  		    (newsize > oldsize && (info->seals & F_SEAL_GROW)))
>  			return -EPERM;
>  
> @@ -3289,6 +3290,10 @@ shmem_write_begin(const struct kiocb *iocb, struct address_space *mapping,
>  			return -EPERM;
>  	}
>  
> +	if (unlikely((info->flags & SHMEM_F_MAPPING_FROZEN) &&
> +		     pos + len > inode->i_size))
> +		return -EPERM;
> +
>  	ret = shmem_get_folio(inode, index, pos + len, &folio, SGP_WRITE);
>  	if (ret)
>  		return ret;
> @@ -3662,6 +3667,11 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
>  
>  	inode_lock(inode);
>  
> +	if (info->flags & SHMEM_F_MAPPING_FROZEN) {
> +		error = -EPERM;
> +		goto out;
> +	}
> +
>  	if (mode & FALLOC_FL_PUNCH_HOLE) {
>  		struct address_space *mapping = file->f_mapping;
>  		loff_t unmap_start = round_up(offset, PAGE_SIZE);
> -- 
> 2.52.0.rc1.455.g30608eb744-goog
> 

-- 
Sincerely yours,
Mike.

* Re: [PATCH v6 11/20] mm: shmem: use SHMEM_F_* flags instead of VM_* flags
From: Mike Rapoport @ 2025-11-17  9:48 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251115233409.768044-12-pasha.tatashin@soleen.com>

On Sat, Nov 15, 2025 at 06:33:57PM -0500, Pasha Tatashin wrote:
> From: Pratyush Yadav <ptyadav@amazon.de>
> 
> shmem_inode_info::flags can have the VM flags VM_NORESERVE and
> VM_LOCKED. These are used to suppress pre-accounting or to lock the
> pages in the inode respectively. Using the VM flags directly makes it
> difficult to add shmem-specific flags that are unrelated to VM behavior
> since one would need to find a VM flag not used by shmem and re-purpose
> it.
> 
> Introduce SHMEM_F_NORESERVE and SHMEM_F_LOCKED which represent the same
> information, but their bits are independent of the VM flags. Callers can
> still pass VM_NORESERVE to shmem_get_inode(), but it gets transformed to
> the shmem-specific flag internally.
> 
> No functional changes intended.
> 
> Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
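
For readers skimming the thread, the translation described in the
changelog boils down to the following. This is a userspace sketch; the
SKETCH_* constants are illustrative values defined here, not the kernel's
definitions, and vm_to_shmem_flags() is a made-up helper name.

```c
/* Illustrative bit values only; the real ones live in the kernel headers. */
#define SKETCH_VM_LOCKED		(1UL << 13)
#define SKETCH_VM_NORESERVE		(1UL << 21)
#define SKETCH_SHMEM_F_NORESERVE	(1UL << 0)

/*
 * Callers keep passing VM flags; only the NORESERVE request is carried over,
 * and it is stored using the shmem-private bit instead of the VM bit.
 */
static unsigned long vm_to_shmem_flags(unsigned long vm_flags)
{
	return (vm_flags & SKETCH_VM_NORESERVE) ? SKETCH_SHMEM_F_NORESERVE : 0;
}
```

Because the stored bits no longer alias VM_* values, a new shmem-only flag
can take the next free bit without checking which VM flags shmem uses.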

> ---
>  include/linux/shmem_fs.h |  6 ++++++
>  mm/shmem.c               | 28 +++++++++++++++-------------
>  2 files changed, 21 insertions(+), 13 deletions(-)
> 
> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> index 0e47465ef0fd..650874b400b5 100644
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -10,6 +10,7 @@
>  #include <linux/xattr.h>
>  #include <linux/fs_parser.h>
>  #include <linux/userfaultfd_k.h>
> +#include <linux/bits.h>
>  
>  struct swap_iocb;
>  
> @@ -19,6 +20,11 @@ struct swap_iocb;
>  #define SHMEM_MAXQUOTAS 2
>  #endif
>  
> +/* Suppress pre-accounting of the entire object size. */
> +#define SHMEM_F_NORESERVE	BIT(0)
> +/* Disallow swapping. */
> +#define SHMEM_F_LOCKED		BIT(1)
> +
>  struct shmem_inode_info {
>  	spinlock_t		lock;
>  	unsigned int		seals;		/* shmem seals */
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 58701d14dd96..1d5036dec08a 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -175,20 +175,20 @@ static inline struct shmem_sb_info *SHMEM_SB(struct super_block *sb)
>   */
>  static inline int shmem_acct_size(unsigned long flags, loff_t size)
>  {
> -	return (flags & VM_NORESERVE) ?
> +	return (flags & SHMEM_F_NORESERVE) ?
>  		0 : security_vm_enough_memory_mm(current->mm, VM_ACCT(size));
>  }
>  
>  static inline void shmem_unacct_size(unsigned long flags, loff_t size)
>  {
> -	if (!(flags & VM_NORESERVE))
> +	if (!(flags & SHMEM_F_NORESERVE))
>  		vm_unacct_memory(VM_ACCT(size));
>  }
>  
>  static inline int shmem_reacct_size(unsigned long flags,
>  		loff_t oldsize, loff_t newsize)
>  {
> -	if (!(flags & VM_NORESERVE)) {
> +	if (!(flags & SHMEM_F_NORESERVE)) {
>  		if (VM_ACCT(newsize) > VM_ACCT(oldsize))
>  			return security_vm_enough_memory_mm(current->mm,
>  					VM_ACCT(newsize) - VM_ACCT(oldsize));
> @@ -206,7 +206,7 @@ static inline int shmem_reacct_size(unsigned long flags,
>   */
>  static inline int shmem_acct_blocks(unsigned long flags, long pages)
>  {
> -	if (!(flags & VM_NORESERVE))
> +	if (!(flags & SHMEM_F_NORESERVE))
>  		return 0;
>  
>  	return security_vm_enough_memory_mm(current->mm,
> @@ -215,7 +215,7 @@ static inline int shmem_acct_blocks(unsigned long flags, long pages)
>  
>  static inline void shmem_unacct_blocks(unsigned long flags, long pages)
>  {
> -	if (flags & VM_NORESERVE)
> +	if (flags & SHMEM_F_NORESERVE)
>  		vm_unacct_memory(pages * VM_ACCT(PAGE_SIZE));
>  }
>  
> @@ -1551,7 +1551,7 @@ int shmem_writeout(struct folio *folio, struct swap_iocb **plug,
>  	int nr_pages;
>  	bool split = false;
>  
> -	if ((info->flags & VM_LOCKED) || sbinfo->noswap)
> +	if ((info->flags & SHMEM_F_LOCKED) || sbinfo->noswap)
>  		goto redirty;
>  
>  	if (!total_swap_pages)
> @@ -2910,15 +2910,15 @@ int shmem_lock(struct file *file, int lock, struct ucounts *ucounts)
>  	 * ipc_lock_object() when called from shmctl_do_lock(),
>  	 * no serialization needed when called from shm_destroy().
>  	 */
> -	if (lock && !(info->flags & VM_LOCKED)) {
> +	if (lock && !(info->flags & SHMEM_F_LOCKED)) {
>  		if (!user_shm_lock(inode->i_size, ucounts))
>  			goto out_nomem;
> -		info->flags |= VM_LOCKED;
> +		info->flags |= SHMEM_F_LOCKED;
>  		mapping_set_unevictable(file->f_mapping);
>  	}
> -	if (!lock && (info->flags & VM_LOCKED) && ucounts) {
> +	if (!lock && (info->flags & SHMEM_F_LOCKED) && ucounts) {
>  		user_shm_unlock(inode->i_size, ucounts);
> -		info->flags &= ~VM_LOCKED;
> +		info->flags &= ~SHMEM_F_LOCKED;
>  		mapping_clear_unevictable(file->f_mapping);
>  	}
>  	retval = 0;
> @@ -3062,7 +3062,7 @@ static struct inode *__shmem_get_inode(struct mnt_idmap *idmap,
>  	spin_lock_init(&info->lock);
>  	atomic_set(&info->stop_eviction, 0);
>  	info->seals = F_SEAL_SEAL;
> -	info->flags = flags & VM_NORESERVE;
> +	info->flags = (flags & VM_NORESERVE) ? SHMEM_F_NORESERVE : 0;
>  	info->i_crtime = inode_get_mtime(inode);
>  	info->fsflags = (dir == NULL) ? 0 :
>  		SHMEM_I(dir)->fsflags & SHMEM_FL_INHERITED;
> @@ -5804,8 +5804,10 @@ static inline struct inode *shmem_get_inode(struct mnt_idmap *idmap,
>  /* common code */
>  
>  static struct file *__shmem_file_setup(struct vfsmount *mnt, const char *name,
> -			loff_t size, unsigned long flags, unsigned int i_flags)
> +				       loff_t size, unsigned long vm_flags,
> +				       unsigned int i_flags)
>  {
> +	unsigned long flags = (vm_flags & VM_NORESERVE) ? SHMEM_F_NORESERVE : 0;
>  	struct inode *inode;
>  	struct file *res;
>  
> @@ -5822,7 +5824,7 @@ static struct file *__shmem_file_setup(struct vfsmount *mnt, const char *name,
>  		return ERR_PTR(-ENOMEM);
>  
>  	inode = shmem_get_inode(&nop_mnt_idmap, mnt->mnt_sb, NULL,
> -				S_IFREG | S_IRWXUGO, 0, flags);
> +				S_IFREG | S_IRWXUGO, 0, vm_flags);
>  	if (IS_ERR(inode)) {
>  		shmem_unacct_size(flags, size);
>  		return ERR_CAST(inode);
> -- 
> 2.52.0.rc1.455.g30608eb744-goog
> 

-- 
Sincerely yours,
Mike.

* Re: [PATCH v6 10/20] MAINTAINERS: add liveupdate entry
From: Mike Rapoport @ 2025-11-17  9:40 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251115233409.768044-11-pasha.tatashin@soleen.com>

On Sat, Nov 15, 2025 at 06:33:56PM -0500, Pasha Tatashin wrote:
> Add a MAINTAINERS file entry for the new Live Update Orchestrator
> introduced in previous patches.
> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
>  MAINTAINERS | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 500789529359..bc9f5c6f0e80 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -14464,6 +14464,17 @@ F:	kernel/module/livepatch.c
>  F:	samples/livepatch/
>  F:	tools/testing/selftests/livepatch/
>  
> +LIVE UPDATE
> +M:	Pasha Tatashin <pasha.tatashin@soleen.com>

Please count me in :)

> +L:	linux-kernel@vger.kernel.org
> +S:	Maintained
> +F:	Documentation/core-api/liveupdate.rst
> +F:	Documentation/userspace-api/liveupdate.rst
> +F:	include/linux/liveupdate.h
> +F:	include/linux/liveupdate/
> +F:	include/uapi/linux/liveupdate.h
> +F:	kernel/liveupdate/
> +
>  LLC (802.2)
>  L:	netdev@vger.kernel.org
>  S:	Odd fixes
> -- 
> 2.52.0.rc1.455.g30608eb744-goog
> 

-- 
Sincerely yours,
Mike.

* Re: [PATCH v6 08/20] liveupdate: luo_flb: Introduce File-Lifecycle-Bound global state
From: Mike Rapoport @ 2025-11-17  9:39 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251115233409.768044-9-pasha.tatashin@soleen.com>

On Sat, Nov 15, 2025 at 06:33:54PM -0500, Pasha Tatashin wrote:
> Introduce a mechanism for managing global kernel state whose lifecycle
> is tied to the preservation of one or more files. This is necessary for
> subsystems where multiple preserved file descriptors depend on a single,
> shared underlying resource.
> 
> An example is HugeTLB, where multiple file descriptors such as memfd and
> guest_memfd may rely on the state of a single HugeTLB subsystem.
> Preserving this state for each individual file would be redundant and
> incorrect. The state should be preserved only once when the first file
> is preserved, and restored/finished only once the last file is handled.
> 
> This patch introduces File-Lifecycle-Bound (FLB) objects to solve this
> problem. An FLB is a global, reference-counted object with a defined set
> of operations:
> 
> - A file handler (struct liveupdate_file_handler) declares a dependency
>   on one or more FLBs via a new registration function,
>   liveupdate_register_flb().
> - When the first file depending on an FLB is preserved, the FLB's
>   .preserve() callback is invoked to save the shared global state. The
>   reference count is then incremented for each subsequent file.
> - Conversely, when the last file is unpreserved (before reboot) or
>   finished (after reboot), the FLB's .unpreserve() or .finish() callback
>   is invoked to clean up the global resource.
> 
> The implementation includes:
> 
> - A new set of ABI definitions (luo_flb_ser, luo_flb_head_ser) and a
>   corresponding FDT node (luo-flb) to serialize the state of all active
>   FLBs and pass them via Kexec Handover.
> - Core logic in luo_flb.c to manage FLB registration, reference
>   counting, and the invocation of lifecycle callbacks.
> - An API (liveupdate_flb_*_locked/*_unlock) for other kernel subsystems
>   to safely access the live object managed by an FLB, both before and
>   after the live update.
> 
> This framework provides the necessary infrastructure for more complex
> subsystems like IOMMU, VFIO, and KVM to integrate with the Live Update
> Orchestrator.

The concept makes sense to me, but it's hard to review the implementation
without an actual user.
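
To make the intended lifecycle concrete in the meantime, here is a minimal
userspace model of the first-file/last-file semantics described in the
commit message. It is only a sketch: struct flb_model and the
flb_model_*() and demo_*() helpers are hypothetical names, not the ones in
luo_flb.c, the count is bumped for every file including the first, and
locking is left out entirely.

```c
#include <assert.h>

/* Hypothetical stand-in for an FLB: two callbacks plus a file count. */
struct flb_model {
	int (*preserve)(struct flb_model *flb);
	void (*unpreserve)(struct flb_model *flb);
	long count;		/* dependent files currently preserved */
	int preserve_calls;	/* bookkeeping for the demo callbacks */
	int unpreserve_calls;
};

/* A dependent file was preserved: the first one saves the shared state. */
static int flb_model_file_preserved(struct flb_model *flb)
{
	if (flb->count == 0) {
		int err = flb->preserve(flb);

		if (err)
			return err;	/* count stays 0, nothing to undo */
	}
	flb->count++;
	return 0;
}

/* A dependent file was unpreserved: the last one drops the shared state. */
static void flb_model_file_unpreserved(struct flb_model *flb)
{
	assert(flb->count > 0);
	if (--flb->count == 0)
		flb->unpreserve(flb);
}

/* Demo callbacks that only count how often they run. */
static int demo_preserve(struct flb_model *flb)
{
	flb->preserve_calls++;
	return 0;
}

static void demo_unpreserve(struct flb_model *flb)
{
	flb->unpreserve_calls++;
}
```

With two dependent files, demo_preserve() runs once, on the first
flb_model_file_preserved() call, and demo_unpreserve() runs once, on the
second flb_model_file_unpreserved() call.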
 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
>  include/linux/liveupdate.h         | 116 +++++
>  include/linux/liveupdate/abi/luo.h |  76 ++++
>  kernel/liveupdate/Makefile         |   1 +
>  kernel/liveupdate/luo_core.c       |   7 +-
>  kernel/liveupdate/luo_file.c       |   8 +
>  kernel/liveupdate/luo_flb.c        | 658 +++++++++++++++++++++++++++++
>  kernel/liveupdate/luo_internal.h   |   7 +
>  7 files changed, 872 insertions(+), 1 deletion(-)
>  create mode 100644 kernel/liveupdate/luo_flb.c
> 
> diff --git a/include/linux/liveupdate.h b/include/linux/liveupdate.h
> index 4a5d4dd9905a..36a831ae3ead 100644
> --- a/include/linux/liveupdate.h
> +++ b/include/linux/liveupdate.h
> @@ -14,6 +14,7 @@
>  #include <uapi/linux/liveupdate.h>
>  
>  struct liveupdate_file_handler;
> +struct liveupdate_flb;
>  struct liveupdate_session;
>  struct file;
>  
> @@ -81,6 +82,7 @@ struct liveupdate_file_ops {
>   *                      associated with individual &struct file instances.
>   * @list:               Used for linking this handler instance into a global
>   *                      list of registered file handlers.
> + * @flb_list:           A list of FLB dependencies.
>   *
>   * Modules that want to support live update for specific file types should
>   * register an instance of this structure. LUO uses this registration to
> @@ -91,6 +93,80 @@ struct liveupdate_file_handler {
>  	const struct liveupdate_file_ops *ops;
>  	const char compatible[LIVEUPDATE_HNDL_COMPAT_LENGTH];
>  	struct list_head list;
> +	struct list_head flb_list;
> +};
> +
> +/**
> + * struct liveupdate_flb_op_args - Arguments for FLB operation callbacks.
> + * @flb:       The global FLB instance for which this call is performed.
> + * @data:      For .preserve():    [OUT] The callback sets this field.
> + *             For .unpreserve():  [IN]  The handle from .preserve().
> + *             For .retrieve():    [IN]  The handle from .preserve().
> + * @obj:       For .preserve():    [OUT] Sets this to the live object.
> + *             For .retrieve():    [OUT] Sets this to the live object.
> + *             For .finish():      [IN]  The live object from .retrieve().
> + *
> + * This structure bundles all parameters for the FLB operation callbacks.
> + */
> +struct liveupdate_flb_op_args {
> +	struct liveupdate_flb *flb;
> +	u64 data;
> +	void *obj;
> +};
> +
> +/**
> + * struct liveupdate_flb_ops - Callbacks for global File-Lifecycle-Bound data.
> + * @preserve:        Called when the first file using this FLB is preserved.
> + *                   The callback must save its state and return a single,
> + *                   self-contained u64 handle by setting the 'argp->data'
> + *                   field and 'argp->obj'.
> + * @unpreserve:      Called when the last file using this FLB is unpreserved
> + *                   (aborted before reboot). Receives the handle via
> + *                   'argp->data' and live object via 'argp->obj'.
> + * @retrieve:        Called on-demand in the new kernel, the first time a
> + *                   component requests access to the shared object. It receives
> + *                   the preserved handle via 'argp->data' and must reconstruct
> + *                   the live object, returning it by setting the 'argp->obj'
> + *                   field.
> + * @finish:          Called in the new kernel when the last file using this FLB
> + *                   is finished. Receives the live object via 'argp->obj' for
> + *                   cleanup.
> + * @owner:           Module reference
> + *
> + * Operations that manage global shared data with file bound lifecycle,
> + * triggered by the first file that uses it and concluded by the last file that
> + * uses it, across all sessions.
> + */
> +struct liveupdate_flb_ops {
> +	int (*preserve)(struct liveupdate_flb_op_args *argp);
> +	void (*unpreserve)(struct liveupdate_flb_op_args *argp);
> +	int (*retrieve)(struct liveupdate_flb_op_args *argp);
> +	void (*finish)(struct liveupdate_flb_op_args *argp);
> +	struct module *owner;
> +};
> +
> +/**
> + * struct liveupdate_flb - A global definition for a shared data object.
> + * @ops:         Callback functions
> + * @compatible:  The compatibility string (e.g., "iommu-core-v1")
> + *               that uniquely identifies the FLB type this handler
> + *               supports. This is matched against the compatible string
> + *               associated with individual &struct liveupdate_flb
> + *               instances.
> + * @list:        A global list of registered FLBs.
> + * @internal:    Internal state, set in liveupdate_init_flb().
> + *
> + * This struct is the "template" that a driver registers to define a shared,
> + * file-lifecycle-bound object. The actual runtime state (the live object,
> + * refcount, etc.) is managed internally by the LUO core.
> + * Use liveupdate_init_flb() to initialize this struct before using it in
> + * other functions.
> + */
> +struct liveupdate_flb {
> +	const struct liveupdate_flb_ops *ops;
> +	const char compatible[LIVEUPDATE_FLB_COMPAT_LENGTH];
> +	struct list_head list;
> +	void *internal;

Can't list be a part of internal?
And don't we usually call this .private rather than .internal?

>  };
>  
>  #ifdef CONFIG_LIVEUPDATE
> @@ -111,6 +187,17 @@ int liveupdate_get_file_incoming(struct liveupdate_session *s, u64 token,
>  int liveupdate_get_token_outgoing(struct liveupdate_session *s,
>  				  struct file *file, u64 *tokenp);
>  
> +/* Before using FLB for the first time it should be initialized */
> +int liveupdate_init_flb(struct liveupdate_flb *flb);
> +
> +int liveupdate_register_flb(struct liveupdate_file_handler *h,
> +			    struct liveupdate_flb *flb);

While these are obvious ...

> +
> +int liveupdate_flb_incoming_locked(struct liveupdate_flb *flb, void **objp);
> +void liveupdate_flb_incoming_unlock(struct liveupdate_flb *flb, void *obj);
> +int liveupdate_flb_outgoing_locked(struct liveupdate_flb *flb, void **objp);
> +void liveupdate_flb_outgoing_unlock(struct liveupdate_flb *flb, void *obj);
> +

... it's not very clear what these APIs are for and how they are going to be
used.

>  #else /* CONFIG_LIVEUPDATE */
  
...

> +int liveupdate_register_flb(struct liveupdate_file_handler *h,
> +			    struct liveupdate_flb *flb)
> +{
> +	struct luo_flb_internal *internal = flb->internal;
> +	struct luo_flb_link *link __free(kfree) = NULL;
> +	static DEFINE_MUTEX(register_flb_lock);
> +	struct liveupdate_flb *gflb;
> +	struct luo_flb_link *iter;
> +
> +	if (!liveupdate_enabled())
> +		return -EOPNOTSUPP;
> +
> +	if (WARN_ON(!h || !flb || !internal))
> +		return -EINVAL;
> +
> +	if (WARN_ON(!flb->ops->preserve || !flb->ops->unpreserve ||
> +		    !flb->ops->retrieve || !flb->ops->finish)) {
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * Once session/files have been deserialized, FLBs cannot be registered,
> +	 * it is too late. Deserialization uses file handlers, and FLB registers
> +	 * to file handlers.
> +	 */
> +	if (WARN_ON(luo_session_is_deserialized()))
> +		return -EBUSY;
> +
> +	/*
> +	 * File handler must already be registered, as it initializes the
> +	 * flb_list
> +	 */
> +	if (WARN_ON(list_empty(&h->list)))
> +		return -EINVAL;
> +
> +	link = kzalloc(sizeof(*link), GFP_KERNEL);
> +	if (!link)
> +		return -ENOMEM;
> +
> +	guard(mutex)(&register_flb_lock);
> +
> +	/* Check that this FLB is not already linked to this file handler */
> +	list_for_each_entry(iter, &h->flb_list, list) {
> +		if (iter->flb == flb)
> +			return -EEXIST;
> +	}
> +
> +	/* Is this FLB linked to global list ? */

Maybe:

	/*
	 * If this FLB is not yet linked to the global list, this is the
	 * first time it is being registered.
	 */

> +	if (list_empty(&flb->list)) {
> +		if (luo_flb_global.count == LUO_FLB_MAX)
> +			return -ENOSPC;
> +
> +		/* Check that compatible string is unique in global list */
> +		list_for_each_entry(gflb, &luo_flb_global.list, list) {
> +			if (!strcmp(gflb->compatible, flb->compatible))
> +				return -EEXIST;
> +		}
> +
> +		if (!try_module_get(flb->ops->owner))
> +			return -EAGAIN;
> +
> +		list_add_tail(&flb->list, &luo_flb_global.list);
> +		luo_flb_global.count++;
> +	}
> +
> +	/* Finally, link the FLB to the file handler */
> +	link->flb = flb;
> +	list_add_tail(&no_free_ptr(link)->list, &h->flb_list);
> +
> +	return 0;
> +}
> +
> +/**
> + * liveupdate_flb_incoming_locked - Lock and retrieve the incoming FLB object.
> + * @flb:  The FLB definition.
> + * @objp: Output parameter; will be populated with the live shared object.
> + *
> + * Acquires the FLB's internal lock and returns a pointer to its shared live
> + * object for the incoming (post-reboot) path.
> + *
> + * If this is the first time the object is requested in the new kernel, this
> + * function will trigger the FLB's .retrieve() callback to reconstruct the
> + * object from its preserved state. Subsequent calls will return the same
> + * cached object.
> + *
> + * The caller MUST call liveupdate_flb_incoming_unlock() to release the lock.
> + *
> + * Return: 0 on success, or a negative errno on failure. -ENODATA means no
> + * incoming FLB data, -ENOENT means specific flb not found in the incoming
> + * data, and -EOPNOTSUPP when live update is disabled or not configured.
> + */
> +int liveupdate_flb_incoming_locked(struct liveupdate_flb *flb, void **objp)
> +{
> +	struct luo_flb_internal *internal = flb->internal;
> +
> +	if (!liveupdate_enabled())
> +		return -EOPNOTSUPP;
> +
> +	if (WARN_ON(!internal))
> +		return -EINVAL;
> +
> +	if (!internal->incoming.obj) {
> +		int err = luo_flb_retrieve_one(flb);
> +
> +		if (err)
> +			return err;
> +	}
> +
> +	mutex_lock(&internal->incoming.lock);
> +	*objp = internal->incoming.obj;
> +
> +	return 0;
> +}
> +
> +/**
> + * liveupdate_flb_incoming_unlock - Unlock an incoming FLB object.
> + * @flb: The FLB definition.
> + * @obj: The object that was returned by the _locked call (used for validation).
> + *
> + * Releases the internal lock acquired by liveupdate_flb_incoming_locked().
> + */
> +void liveupdate_flb_incoming_unlock(struct liveupdate_flb *flb, void *obj)
> +{
> +	struct luo_flb_internal *internal = flb->internal;
> +
> +	lockdep_assert_held(&internal->incoming.lock);
> +	internal->incoming.obj = obj;

The kernel-doc says @obj is used for validation, but here it is assigned
to internal->incoming.obj. Something is off here :)

> +	mutex_unlock(&internal->incoming.lock);
> +}
> +
> +/**
> + * liveupdate_flb_outgoing_locked - Lock and retrieve the outgoing FLB object.
> + * @flb:  The FLB definition.
> + * @objp: Output parameter; will be populated with the live shared object.
> + *
> + * Acquires the FLB's internal lock and returns a pointer to its shared live
> + * object for the outgoing (pre-reboot) path.
> + *
> + * This function assumes the object has already been created by the FLB's
> + * .preserve() callback, which is triggered when the first dependent file
> + * is preserved.
> + *
> + * The caller MUST call liveupdate_flb_outgoing_unlock() to release the lock.
> + *
> + * Return: 0 on success, or a negative errno on failure.
> + */
> +int liveupdate_flb_outgoing_locked(struct liveupdate_flb *flb, void **objp)
> +{
> +	struct luo_flb_internal *internal = flb->internal;
> +
> +	if (!liveupdate_enabled())
> +		return -EOPNOTSUPP;
> +
> +	if (WARN_ON(!internal))
> +		return -EINVAL;
> +
> +	mutex_lock(&internal->outgoing.lock);
> +
> +	/* The object must exist if any file is being preserved */
> +	if (WARN_ON_ONCE(!internal->outgoing.obj)) {
> +		mutex_unlock(&internal->outgoing.lock);
> +		return -ENOENT;
> +	}

liveupdate_flb_incoming_locked() and liveupdate_flb_outgoing_locked() are
nearly identical; it seems the common part could live in a
static liveupdate_flb_locked(struct luo_flb_state *state).

liveupdate_flb_incoming_locked() would then be a one-line wrapper, and
liveupdate_flb_outgoing_locked() would keep this WARN_ON for a NULL obj.
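
Roughly what I have in mind, as a userspace sketch rather than a patch.
The flb_*() names and struct model_lock are made up for illustration, the
lock is modelled as a flag so it can be checked in a single thread (the
kernel code would use struct mutex), and the on-demand .retrieve() step of
the incoming path is left out:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Single-threaded stand-in for struct mutex so ownership can be checked. */
struct model_lock {
	bool held;
};

static void model_lock_acquire(struct model_lock *lock)
{
	assert(!lock->held);
	lock->held = true;
}

static void model_lock_release(struct model_lock *lock)
{
	assert(lock->held);
	lock->held = false;
}

/* Per-direction state; the name follows the suggestion above. */
struct luo_flb_state {
	struct model_lock lock;
	void *obj;
};

/* Common part: take the lock and return the object, which may be NULL. */
static void *flb_state_locked(struct luo_flb_state *state)
{
	model_lock_acquire(&state->lock);
	return state->obj;
}

/* Incoming side: a thin wrapper (the retrieve step is omitted here). */
static int flb_incoming_locked(struct luo_flb_state *incoming, void **objp)
{
	*objp = flb_state_locked(incoming);
	return 0;
}

/* Outgoing side: same helper, plus the "object must exist" check. */
static int flb_outgoing_locked(struct luo_flb_state *outgoing, void **objp)
{
	void *obj = flb_state_locked(outgoing);

	if (!obj) {
		model_lock_release(&outgoing->lock);
		return -ENOENT;
	}

	*objp = obj;
	return 0;
}
```

The outgoing wrapper is the only place that drops the lock on failure, so
a caller that sees -ENOENT must not call the unlock function.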

> +
> +	*objp = internal->outgoing.obj;
> +
> +	return 0;
> +}
> +
> +/**
> + * liveupdate_flb_outgoing_unlock - Unlock an outgoing FLB object.
> + * @flb: The FLB definition.
> + * @obj: The object that was returned by the _locked call (used for validation).
> + *
> + * Releases the internal lock acquired by liveupdate_flb_outgoing_locked().
> + */
> +void liveupdate_flb_outgoing_unlock(struct liveupdate_flb *flb, void *obj)
> +{
> +	struct luo_flb_internal *internal = flb->internal;
> +
> +	lockdep_assert_held(&internal->outgoing.lock);
> +	internal->outgoing.obj = obj;

So is it assignment or validation? ;-)

This one is also a copy of liveupdate_flb_incoming_unlock().
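
Assuming validation is what the kernel-doc intends, both unlock functions
could share something like the sketch below. It is a userspace model with
hypothetical names (flb_state_unlock() and struct model_lock are mine),
and it returns a bool only so the result can be checked; the real
functions return void and would presumably WARN_ON() a mismatch instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Single-threaded stand-in for struct mutex. */
struct model_lock {
	bool held;
};

struct luo_flb_state {
	struct model_lock lock;
	void *obj;
};

/*
 * Shared unlock helper. It only compares obj with the stored pointer and
 * never overwrites it, so "used for validation" is literally true here.
 */
static bool flb_state_unlock(struct luo_flb_state *state, void *obj)
{
	bool matches;

	assert(state->lock.held);	/* stands in for lockdep_assert_held() */
	matches = (obj == state->obj);
	state->lock.held = false;

	return matches;
}

/* Both directions become one-line wrappers around the helper. */
static bool flb_incoming_unlock(struct luo_flb_state *incoming, void *obj)
{
	return flb_state_unlock(incoming, obj);
}

static bool flb_outgoing_unlock(struct luo_flb_state *outgoing, void *obj)
{
	return flb_state_unlock(outgoing, obj);
}
```

If the assignment is intentional, i.e. callers are allowed to swap the
object under the lock, then the kernel-doc for @obj should say so instead.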

> +	mutex_unlock(&internal->outgoing.lock);
> +}
> +

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: [PATCH v6 01/20] liveupdate: luo_core: luo_ioctl: Live Update Orchestrator
From: Andrew Morton @ 2025-11-17  2:54 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, rppt, dmatlack, rientjes, corbet,
	rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251115233409.768044-2-pasha.tatashin@soleen.com>

On Sat, 15 Nov 2025 18:33:47 -0500 Pasha Tatashin <pasha.tatashin@soleen.com> wrote:

> Introduce LUO, a mechanism intended to facilitate kernel updates while
> keeping designated devices operational across the transition (e.g., via
> kexec). 

Thanks, I updated mm.git's mm-unstable branch to this version.  I
expect at least one more version as a result of feedback for this v6.

I wasn't able to reproduce Stephen's build error
(https://lkml.kernel.org/r/20251117093614.1490d048@canb.auug.org.au)
with this series.


^ permalink raw reply

* Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO
From: Mike Rapoport @ 2025-11-16 19:16 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <CA+CK2bDu2FdzyotSwBpGwQtiisv=3f6gC7DzOpebPCxmmpwMYw@mail.gmail.com>

On Sun, Nov 16, 2025 at 09:55:30AM -0500, Pasha Tatashin wrote:
> On Sun, Nov 16, 2025 at 7:43 AM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > > +static int __init liveupdate_early_init(void)
> > > +{
> > > +     int err;
> > > +
> > > +     err = luo_early_startup();
> > > +     if (err) {
> > > +             pr_err("The incoming tree failed to initialize properly [%pe], disabling live update\n",
> > > +                    ERR_PTR(err));
> >
> > How do we report this to the userspace?
> > I think the decision what to do in this case belongs there. Even if it's
> > down to choosing between plain kexec and full reboot, it's still a policy
> > that should be implemented in userspace.
> 
> I agree that policy belongs in userspace, and that is how we designed
> it. In this specific failure case (ABI mismatch or corrupt FDT), the
> preserved state is unrecoverable by the kernel. We cannot parse the
> incoming data, so we cannot offer it to userspace.
> 
> We report this state by not registering the /dev/liveupdate device.
> When the userspace agent attempts to initialize, it receives ENOENT.
> At that point, the agent exercises its policy:
> 
> - Check dmesg for the specific error and report the failure to the
> fleet control plane.

Hmm, this is not nice. I think we still should register /dev/liveupdate and
let userspace discover this error via /dev/liveupdate ABIs.
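
For reference, the agent-side policy described above boils down to
something like the sketch below. agent_policy(), agent_probe() and the
enum are hypothetical names for illustration, not part of the series.
Note that ENOENT by itself does not tell the agent why the device is
missing; presumably a kernel built without live update looks the same as
one where the incoming tree failed to initialize.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

enum agent_action {
	AGENT_RESTORE,		/* device present: go on to retrieve sessions */
	AGENT_FRESH_REBOOT,	/* no device: preserved state is unusable */
	AGENT_REPORT_ERROR,	/* anything else: report and stop */
};

/* Map the errno from opening the device (0 on success) to a decision. */
static enum agent_action agent_policy(int open_errno)
{
	switch (open_errno) {
	case 0:
		return AGENT_RESTORE;
	case ENOENT:
		return AGENT_FRESH_REBOOT;
	default:
		return AGENT_REPORT_ERROR;
	}
}

/* Try to open the device node at path and apply the policy above. */
static enum agent_action agent_probe(const char *path)
{
	int fd = open(path, O_RDWR);
	int err = fd < 0 ? errno : 0;

	if (fd >= 0)
		close(fd);

	return agent_policy(err);
}
```

An agent would call agent_probe("/dev/liveupdate") at startup; with an
error reported through the device itself, the ENOENT case could be split
into distinct outcomes without parsing dmesg.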

> - Trigger a fresh (kexec or cold) reboot to reset unreclaimable resources.
> 
> Pasha
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: [PATCH v6 18/20] selftests/liveupdate: Add kexec-based selftest for session lifecycle
From: Zhu Yanjun @ 2025-11-16 18:53 UTC (permalink / raw)
  To: Pasha Tatashin, pratyush, jasonmiu, graf, rppt, dmatlack,
	rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
	masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
	chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
	dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
	song, linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx,
	mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
	bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
	Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
	andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
	stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
	linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu,
	hughd, skhawaja, chrisl
In-Reply-To: <20251115233409.768044-19-pasha.tatashin@soleen.com>

On 2025/11/15 15:34, Pasha Tatashin wrote:
> Introduce a kexec-based selftest, luo_kexec_simple, to validate the
> end-to-end lifecycle of a Live Update Orchestrator (LUO) session across
> a reboot.
> 
> While existing tests verify the uAPI in a pre-reboot context, this test
> ensures that the core functionality—preserving state via Kexec Handover
> and restoring it in a new kernel—works as expected.
> 
> The test operates in two stages, managing its state across the reboot by
> preserving a dedicated "state session" containing a memfd. This
> mechanism dogfoods the LUO feature itself for state tracking, making the
> test self-contained.
> 
> The test validates the following sequence:
> 
> Stage 1 (Pre-kexec):
>   - Creates a test session (test-session).
>   - Creates and preserves a memfd with a known data pattern into the test
>     session.
>   - Creates the state-tracking session to signal progression to Stage 2.
>   - Executes a kexec reboot via a helper script.
> 
> Stage 2 (Post-kexec):
>   - Retrieves the state-tracking session to confirm it is in the
>     post-reboot stage.
>   - Retrieves the preserved test session.
>   - Restores the memfd from the test session and verifies its contents
>     match the original data pattern written in Stage 1.
>   - Finalizes both the test and state sessions to ensure a clean
>     teardown.
> 
> The test relies on a helper script (do_kexec.sh) to perform the reboot
> and a shared utility library (luo_test_utils.c) for common LUO
> operations, keeping the main test logic clean and focused.
> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
>   tools/testing/selftests/liveupdate/.gitignore |   1 +
>   tools/testing/selftests/liveupdate/Makefile   |  32 ++++
>   .../testing/selftests/liveupdate/do_kexec.sh  |  16 ++
>   .../selftests/liveupdate/luo_kexec_simple.c   | 114 ++++++++++++
>   .../selftests/liveupdate/luo_test_utils.c     | 168 ++++++++++++++++++
>   .../selftests/liveupdate/luo_test_utils.h     |  39 ++++
>   6 files changed, 370 insertions(+)
>   create mode 100755 tools/testing/selftests/liveupdate/do_kexec.sh
>   create mode 100644 tools/testing/selftests/liveupdate/luo_kexec_simple.c
>   create mode 100644 tools/testing/selftests/liveupdate/luo_test_utils.c
>   create mode 100644 tools/testing/selftests/liveupdate/luo_test_utils.h
> 
> diff --git a/tools/testing/selftests/liveupdate/.gitignore b/tools/testing/selftests/liveupdate/.gitignore
> index af6e773cf98f..daeef116174d 100644
> --- a/tools/testing/selftests/liveupdate/.gitignore
> +++ b/tools/testing/selftests/liveupdate/.gitignore
> @@ -1 +1,2 @@
>   /liveupdate
> +/luo_kexec_simple
> diff --git a/tools/testing/selftests/liveupdate/Makefile b/tools/testing/selftests/liveupdate/Makefile
> index 2a573c36016e..1563ac84006a 100644
> --- a/tools/testing/selftests/liveupdate/Makefile
> +++ b/tools/testing/selftests/liveupdate/Makefile
> @@ -1,7 +1,39 @@
>   # SPDX-License-Identifier: GPL-2.0-only
> +
> +KHDR_INCLUDES ?= -I../../../../usr/include
>   CFLAGS += -Wall -O2 -Wno-unused-function
>   CFLAGS += $(KHDR_INCLUDES)
> +LDFLAGS += -static
> +OUTPUT ?= .
> +
> +# --- Test Configuration (Edit this section when adding new tests) ---
> +LUO_SHARED_SRCS := luo_test_utils.c
> +LUO_SHARED_HDRS += luo_test_utils.h
> +
> +LUO_MANUAL_TESTS += luo_kexec_simple
> +
> +TEST_FILES += do_kexec.sh
>   
>   TEST_GEN_PROGS += liveupdate
>   
> +# --- Automatic Rule Generation (Do not edit below) ---
> +
> +TEST_GEN_PROGS_EXTENDED += $(LUO_MANUAL_TESTS)
> +
> +# Define the full list of sources for each manual test.
> +$(foreach test,$(LUO_MANUAL_TESTS), \
> +	$(eval $(test)_SOURCES := $(test).c $(LUO_SHARED_SRCS)))
> +
> +# This loop automatically generates an explicit build rule for each manual test.
> +# It includes dependencies on the shared headers and makes the output
> +# executable.
> +# Note the use of '$$' to escape automatic variables for the 'eval' command.
> +$(foreach test,$(LUO_MANUAL_TESTS), \
> +	$(eval $(OUTPUT)/$(test): $($(test)_SOURCES) $(LUO_SHARED_HDRS) \
> +		$(call msg,LINK,,$$@) ; \
> +		$(Q)$(LINK.c) $$^ $(LDLIBS) -o $$@ ; \
> +		$(Q)chmod +x $$@ \
> +	) \
> +)
> +
>   include ../lib.mk
> diff --git a/tools/testing/selftests/liveupdate/do_kexec.sh b/tools/testing/selftests/liveupdate/do_kexec.sh
> new file mode 100755
> index 000000000000..3c7c6cafbef8
> --- /dev/null
> +++ b/tools/testing/selftests/liveupdate/do_kexec.sh
> @@ -0,0 +1,16 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0
> +set -e
> +
> +# Use $KERNEL and $INITRAMFS to pass custom Kernel and optional initramfs
> +
> +KERNEL="${KERNEL:-/boot/bzImage}"
> +set -- -l -s --reuse-cmdline "$KERNEL"
> +
> +INITRAMFS="${INITRAMFS:-/boot/initramfs}"
> +if [ -f "$INITRAMFS" ]; then
> +    set -- "$@" --initrd="$INITRAMFS"
> +fi
> +
> +kexec "$@"
> +kexec -e

Thanks a lot. A kernel image alone is not enough to boot the host.
Adding the initramfs avoids the crash when the host boots.
I have run tests to verify this.

Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>

Zhu Yanjun

> diff --git a/tools/testing/selftests/liveupdate/luo_kexec_simple.c b/tools/testing/selftests/liveupdate/luo_kexec_simple.c
> new file mode 100644
> index 000000000000..67ab6ebf9eec
> --- /dev/null
> +++ b/tools/testing/selftests/liveupdate/luo_kexec_simple.c
> @@ -0,0 +1,114 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +/*
> + * Copyright (c) 2025, Google LLC.
> + * Pasha Tatashin <pasha.tatashin@soleen.com>
> + *
> + * A simple selftest to validate the end-to-end lifecycle of a LUO session
> + * across a single kexec reboot.
> + */
> +
> +#include "luo_test_utils.h"
> +
> +/* Test-specific constants are now defined locally */
> +#define KEXEC_SCRIPT "./do_kexec.sh"
> +#define TEST_SESSION_NAME "test-session"
> +#define TEST_MEMFD_TOKEN 0x1A
> +#define TEST_MEMFD_DATA "hello kexec world"
> +
> +/* Constants for the state-tracking mechanism, specific to this test file. */
> +#define STATE_SESSION_NAME "kexec_simple_state"
> +#define STATE_MEMFD_TOKEN 999
> +
> +/* Stage 1: Executed before the kexec reboot. */
> +static void run_stage_1(int luo_fd)
> +{
> +	int session_fd;
> +
> +	ksft_print_msg("[STAGE 1] Starting pre-kexec setup...\n");
> +
> +	ksft_print_msg("[STAGE 1] Creating state file for next stage (2)...\n");
> +	create_state_file(luo_fd, STATE_SESSION_NAME, STATE_MEMFD_TOKEN, 2);
> +
> +	ksft_print_msg("[STAGE 1] Creating session '%s' and preserving memfd...\n",
> +		       TEST_SESSION_NAME);
> +	session_fd = luo_create_session(luo_fd, TEST_SESSION_NAME);
> +	if (session_fd < 0)
> +		fail_exit("luo_create_session for '%s'", TEST_SESSION_NAME);
> +
> +	if (create_and_preserve_memfd(session_fd, TEST_MEMFD_TOKEN,
> +				      TEST_MEMFD_DATA) < 0) {
> +		fail_exit("create_and_preserve_memfd for token %#x",
> +			  TEST_MEMFD_TOKEN);
> +	}
> +
> +	ksft_print_msg("[STAGE 1] Executing kexec...\n");
> +	if (system(KEXEC_SCRIPT) != 0)
> +		fail_exit("kexec script failed");
> +	exit(EXIT_FAILURE);
> +}
> +
> +/* Stage 2: Executed after the kexec reboot. */
> +static void run_stage_2(int luo_fd, int state_session_fd)
> +{
> +	int session_fd, mfd, stage;
> +
> +	ksft_print_msg("[STAGE 2] Starting post-kexec verification...\n");
> +
> +	restore_and_read_stage(state_session_fd, STATE_MEMFD_TOKEN, &stage);
> +	if (stage != 2)
> +		fail_exit("Expected stage 2, but state file contains %d", stage);
> +
> +	ksft_print_msg("[STAGE 2] Retrieving session '%s'...\n", TEST_SESSION_NAME);
> +	session_fd = luo_retrieve_session(luo_fd, TEST_SESSION_NAME);
> +	if (session_fd < 0)
> +		fail_exit("luo_retrieve_session for '%s'", TEST_SESSION_NAME);
> +
> +	ksft_print_msg("[STAGE 2] Restoring and verifying memfd (token %#x)...\n",
> +		       TEST_MEMFD_TOKEN);
> +	mfd = restore_and_verify_memfd(session_fd, TEST_MEMFD_TOKEN,
> +				       TEST_MEMFD_DATA);
> +	if (mfd < 0)
> +		fail_exit("restore_and_verify_memfd for token %#x", TEST_MEMFD_TOKEN);
> +	close(mfd);
> +
> +	ksft_print_msg("[STAGE 2] Test data verified successfully.\n");
> +	ksft_print_msg("[STAGE 2] Finalizing test session...\n");
> +	if (luo_session_finish(session_fd) < 0)
> +		fail_exit("luo_session_finish for test session");
> +	close(session_fd);
> +
> +	ksft_print_msg("[STAGE 2] Finalizing state session...\n");
> +	if (luo_session_finish(state_session_fd) < 0)
> +		fail_exit("luo_session_finish for state session");
> +	close(state_session_fd);
> +
> +	ksft_print_msg("\n--- SIMPLE KEXEC TEST PASSED ---\n");
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	int luo_fd;
> +	int state_session_fd;
> +
> +	luo_fd = luo_open_device();
> +	if (luo_fd < 0)
> +		ksft_exit_skip("Failed to open %s. Is the luo module loaded?\n",
> +			       LUO_DEVICE);
> +
> +	/*
> +	 * Determine the stage by attempting to retrieve the state session.
> +	 * If it doesn't exist (ENOENT), we are in Stage 1 (pre-kexec).
> +	 */
> +	state_session_fd = luo_retrieve_session(luo_fd, STATE_SESSION_NAME);
> +	if (state_session_fd == -ENOENT) {
> +		run_stage_1(luo_fd);
> +	} else if (state_session_fd >= 0) {
> +		/* We got a valid handle, pass it directly to stage 2 */
> +		run_stage_2(luo_fd, state_session_fd);
> +	} else {
> +		fail_exit("Failed to check for state session");
> +	}
> +
> +	close(luo_fd);
> +}
> diff --git a/tools/testing/selftests/liveupdate/luo_test_utils.c b/tools/testing/selftests/liveupdate/luo_test_utils.c
> new file mode 100644
> index 000000000000..0a24105cbc54
> --- /dev/null
> +++ b/tools/testing/selftests/liveupdate/luo_test_utils.c
> @@ -0,0 +1,168 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +/*
> + * Copyright (c) 2025, Google LLC.
> + * Pasha Tatashin <pasha.tatashin@soleen.com>
> + */
> +
> +#define _GNU_SOURCE
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <fcntl.h>
> +#include <unistd.h>
> +#include <sys/ioctl.h>
> +#include <sys/syscall.h>
> +#include <sys/mman.h>
> +#include <errno.h>
> +#include <stdarg.h>
> +
> +#include "luo_test_utils.h"
> +
> +int luo_open_device(void)
> +{
> +	return open(LUO_DEVICE, O_RDWR);
> +}
> +
> +int luo_create_session(int luo_fd, const char *name)
> +{
> +	struct liveupdate_ioctl_create_session arg = { .size = sizeof(arg) };
> +
> +	snprintf((char *)arg.name, LIVEUPDATE_SESSION_NAME_LENGTH, "%.*s",
> +		 LIVEUPDATE_SESSION_NAME_LENGTH - 1, name);
> +
> +	if (ioctl(luo_fd, LIVEUPDATE_IOCTL_CREATE_SESSION, &arg) < 0)
> +		return -errno;
> +
> +	return arg.fd;
> +}
> +
> +int luo_retrieve_session(int luo_fd, const char *name)
> +{
> +	struct liveupdate_ioctl_retrieve_session arg = { .size = sizeof(arg) };
> +
> +	snprintf((char *)arg.name, LIVEUPDATE_SESSION_NAME_LENGTH, "%.*s",
> +		 LIVEUPDATE_SESSION_NAME_LENGTH - 1, name);
> +
> +	if (ioctl(luo_fd, LIVEUPDATE_IOCTL_RETRIEVE_SESSION, &arg) < 0)
> +		return -errno;
> +
> +	return arg.fd;
> +}
> +
> +int create_and_preserve_memfd(int session_fd, int token, const char *data)
> +{
> +	struct liveupdate_session_preserve_fd arg = { .size = sizeof(arg) };
> +	long page_size = sysconf(_SC_PAGE_SIZE);
> +	void *map = MAP_FAILED;
> +	int mfd = -1, ret = -1;
> +
> +	mfd = memfd_create("test_mfd", 0);
> +	if (mfd < 0)
> +		return -errno;
> +
> +	if (ftruncate(mfd, page_size) != 0)
> +		goto out;
> +
> +	map = mmap(NULL, page_size, PROT_WRITE, MAP_SHARED, mfd, 0);
> +	if (map == MAP_FAILED)
> +		goto out;
> +
> +	snprintf(map, page_size, "%s", data);
> +	munmap(map, page_size);
> +
> +	arg.fd = mfd;
> +	arg.token = token;
> +	if (ioctl(session_fd, LIVEUPDATE_SESSION_PRESERVE_FD, &arg) < 0)
> +		goto out;
> +
> +	ret = 0;
> +out:
> +	if (ret != 0 && errno != 0)
> +		ret = -errno;
> +	if (mfd >= 0)
> +		close(mfd);
> +	return ret;
> +}
> +
> +int restore_and_verify_memfd(int session_fd, int token,
> +			     const char *expected_data)
> +{
> +	struct liveupdate_session_retrieve_fd arg = { .size = sizeof(arg) };
> +	long page_size = sysconf(_SC_PAGE_SIZE);
> +	void *map = MAP_FAILED;
> +	int mfd = -1, ret = -1;
> +
> +	arg.token = token;
> +	if (ioctl(session_fd, LIVEUPDATE_SESSION_RETRIEVE_FD, &arg) < 0)
> +		return -errno;
> +	mfd = arg.fd;
> +
> +	map = mmap(NULL, page_size, PROT_READ, MAP_SHARED, mfd, 0);
> +	if (map == MAP_FAILED)
> +		goto out;
> +
> +	if (expected_data && strcmp(expected_data, map) != 0) {
> +		ksft_print_msg("Data mismatch! Expected '%s', Got '%s'\n",
> +			       expected_data, (char *)map);
> +		ret = -EINVAL;
> +		goto out_munmap;
> +	}
> +
> +	ret = mfd;
> +out_munmap:
> +	munmap(map, page_size);
> +out:
> +	if (ret < 0 && errno != 0)
> +		ret = -errno;
> +	if (ret < 0 && mfd >= 0)
> +		close(mfd);
> +	return ret;
> +}
> +
> +int luo_session_finish(int session_fd)
> +{
> +	struct liveupdate_session_finish arg = { .size = sizeof(arg) };
> +
> +	if (ioctl(session_fd, LIVEUPDATE_SESSION_FINISH, &arg) < 0)
> +		return -errno;
> +
> +	return 0;
> +}
> +
> +void create_state_file(int luo_fd, const char *session_name, int token,
> +		       int next_stage)
> +{
> +	char buf[32];
> +	int state_session_fd;
> +
> +	state_session_fd = luo_create_session(luo_fd, session_name);
> +	if (state_session_fd < 0)
> +		fail_exit("luo_create_session for state tracking");
> +
> +	snprintf(buf, sizeof(buf), "%d", next_stage);
> +	if (create_and_preserve_memfd(state_session_fd, token, buf) < 0)
> +		fail_exit("create_and_preserve_memfd for state tracking");
> +
> +	/*
> +	 * DO NOT close session FD, otherwise it is going to be unpreserved
> +	 */
> +}
> +
> +void restore_and_read_stage(int state_session_fd, int token, int *stage)
> +{
> +	char buf[32] = {0};
> +	int mfd;
> +
> +	mfd = restore_and_verify_memfd(state_session_fd, token, NULL);
> +	if (mfd < 0)
> +		fail_exit("failed to restore state memfd");
> +
> +	if (read(mfd, buf, sizeof(buf) - 1) < 0)
> +		fail_exit("failed to read state mfd");
> +
> +	*stage = atoi(buf);
> +
> +	close(mfd);
> +}
> diff --git a/tools/testing/selftests/liveupdate/luo_test_utils.h b/tools/testing/selftests/liveupdate/luo_test_utils.h
> new file mode 100644
> index 000000000000..093e787b9f4b
> --- /dev/null
> +++ b/tools/testing/selftests/liveupdate/luo_test_utils.h
> @@ -0,0 +1,39 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +/*
> + * Copyright (c) 2025, Google LLC.
> + * Pasha Tatashin <pasha.tatashin@soleen.com>
> + *
> + * Utility functions for LUO kselftests.
> + */
> +
> +#ifndef LUO_TEST_UTILS_H
> +#define LUO_TEST_UTILS_H
> +
> +#include <errno.h>
> +#include <string.h>
> +#include <linux/liveupdate.h>
> +#include "../kselftest.h"
> +
> +#define LUO_DEVICE "/dev/liveupdate"
> +
> +#define fail_exit(fmt, ...)						\
> +	ksft_exit_fail_msg("[%s:%d] " fmt " (errno: %s)\n",	\
> +			   __func__, __LINE__, ##__VA_ARGS__, strerror(errno))
> +
> +/* Generic LUO and session management helpers */
> +int luo_open_device(void);
> +int luo_create_session(int luo_fd, const char *name);
> +int luo_retrieve_session(int luo_fd, const char *name);
> +int luo_session_finish(int session_fd);
> +
> +/* Generic file preservation and restoration helpers */
> +int create_and_preserve_memfd(int session_fd, int token, const char *data);
> +int restore_and_verify_memfd(int session_fd, int token, const char *expected_data);
> +
> +/* Kexec state-tracking helpers */
> +void create_state_file(int luo_fd, const char *session_name, int token,
> +		       int next_stage);
> +void restore_and_read_stage(int state_session_fd, int token, int *stage);
> +
> +#endif /* LUO_TEST_UTILS_H */


* Re: [PATCH v5 22/22] tests/liveupdate: Add in-kernel liveupdate test
From: Mike Rapoport @ 2025-11-16 18:36 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
	linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <CA+CK2bBVRHwBu6a77gkvsbmWkQFDcTvNo+5aOT586mie13zqqA@mail.gmail.com>

On Wed, Nov 12, 2025 at 03:40:53PM -0500, Pasha Tatashin wrote:
> On Wed, Nov 12, 2025 at 3:24 PM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > On Fri, Nov 07, 2025 at 04:03:20PM -0500, Pasha Tatashin wrote:
> > > Introduce an in-kernel test module to validate the core logic of the
> > > Live Update Orchestrator's File-Lifecycle-Bound feature. This
> > > provides a low-level, controlled environment to test FLB registration
> > > and callback invocation without requiring userspace interaction or
> > > actual kexec reboots.
> > >
> > > The test is enabled by the CONFIG_LIVEUPDATE_TEST Kconfig option.
> > >
> > > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> > > ---
> > >  kernel/liveupdate/luo_file.c     |   2 +
> > >  kernel/liveupdate/luo_internal.h |   8 ++
> > >  lib/Kconfig.debug                |  23 ++++++
> > >  lib/tests/Makefile               |   1 +
> > >  lib/tests/liveupdate.c           | 130 +++++++++++++++++++++++++++++++
> > >  5 files changed, 164 insertions(+)
> > >  create mode 100644 lib/tests/liveupdate.c
> > >
> > > diff --git a/kernel/liveupdate/luo_file.c b/kernel/liveupdate/luo_file.c
> > > index 713069b96278..4c0a75918f3d 100644
> > > --- a/kernel/liveupdate/luo_file.c
> > > +++ b/kernel/liveupdate/luo_file.c
> > > @@ -829,6 +829,8 @@ int liveupdate_register_file_handler(struct liveupdate_file_handler *fh)
> > >       INIT_LIST_HEAD(&fh->flb_list);
> > >       list_add_tail(&fh->list, &luo_file_handler_list);
> > >
> > > +     liveupdate_test_register(fh);
> > > +
> >
> > > Does this mean that every FLB user will be added here?
> 
> No, FLB users in the various subsystems will call
> liveupdate_register_flb(). liveupdate_test_register() exists only so
> that the in-kernel test can register its test FLBs with every file
> handler; it is for in-kernel testing only.

Why can't the in-kernel test call liveupdate_register_flb() itself?

> Pasha

-- 
Sincerely yours,
Mike.

* Re: [PATCH v6 07/20] liveupdate: luo_session: Add ioctls for file preservation
From: Mike Rapoport @ 2025-11-16 18:25 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
	ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
	yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
	mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
	david, joel.granados, rostedt, anna.schumaker, song, linux,
	linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
	dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
	cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
	quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
	leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
	lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
	jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251115233409.768044-8-pasha.tatashin@soleen.com>

On Sat, Nov 15, 2025 at 06:33:53PM -0500, Pasha Tatashin wrote:
> Introduce the userspace interface and internal logic required to
> manage the lifecycle of file descriptors within a session. Previously, a
> session was merely a container; this change makes it a functional
> management unit.
> 
> The following capabilities are added:
> 
> A new set of ioctl commands is added. They operate on the file
> descriptor returned by CREATE_SESSION and allow userspace to:
> - LIVEUPDATE_SESSION_PRESERVE_FD: Add a file descriptor to a session
>   to be preserved across the live update.
> - LIVEUPDATE_SESSION_RETRIEVE_FD: Retrieve a preserved file in the
>   new kernel using its unique token.
> - LIVEUPDATE_SESSION_FINISH: Finish the session.
> 
> The session's .release handler is enhanced to be state-aware. When a
> session's file descriptor is closed, it correctly unpreserves
> the session based on its current state before freeing all
> associated file resources.
> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
>  include/uapi/linux/liveupdate.h | 103 ++++++++++++++++++
>  kernel/liveupdate/luo_session.c | 187 +++++++++++++++++++++++++++++++-
>  2 files changed, 286 insertions(+), 4 deletions(-)

...

>  static int luo_session_release(struct inode *inodep, struct file *filep)
>  {
>  	struct luo_session *session = filep->private_data;
>  	struct luo_session_header *sh;
> +	int err = 0;
>  
>  	/* If retrieved is set, it means this session is from incoming list */
> -	if (session->retrieved)
> +	if (session->retrieved) {
>  		sh = &luo_session_global.incoming;
> -	else
> +
> +		err = luo_session_finish_one(session);
> +		if (err) {
> +			pr_warn("Unable to finish session [%s] on release\n",
> +				session->name);

			return err;

and then the inner else can go away, and luo_session_remove() and
luo_session_free() can be moved out of the if (session->retrieved) block
so that both paths share them.

> +		} else {
> +			luo_session_remove(sh, session);
> +			luo_session_free(session);
> +		}
> +
> +	} else {
>  		sh = &luo_session_global.outgoing;
>  
> -	luo_session_remove(sh, session);
> -	luo_session_free(session);
> +		scoped_guard(mutex, &session->mutex)
> +			luo_file_unpreserve_files(session);
> +		luo_session_remove(sh, session);
> +		luo_session_free(session);
> +	}
> +
> +	return err;
> +}
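
Something along these lines, as a rough userspace mock-up rather than the
actual kernel code. The struct layouts, the incoming/outgoing globals and
the helper bodies below are stand-ins assumed for illustration; only the
control flow of session_release() is the point. In the real function the
unpreserve call would stay under scoped_guard(mutex, &session->mutex) as
in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct luo_session {
	const char *name;
	bool retrieved;
	int finish_err;		/* value the mocked finish step returns */
	bool files_unpreserved;
	bool removed;
	bool freed;
};

struct luo_session_header {
	int count;
};

static struct luo_session_header incoming, outgoing;

/* Mocked helpers: they only record that they were called. */
static int luo_session_finish_one(struct luo_session *s)
{
	return s->finish_err;
}

static void luo_file_unpreserve_files(struct luo_session *s)
{
	s->files_unpreserved = true;
}

static void luo_session_remove(struct luo_session_header *sh,
			       struct luo_session *s)
{
	sh->count--;
	s->removed = true;
}

static void luo_session_free(struct luo_session *s)
{
	s->freed = true;
}

/* Release path with an early return on error and a shared tail. */
static int session_release(struct luo_session *session)
{
	struct luo_session_header *sh;
	int err;

	assert(session);

	if (session->retrieved) {
		sh = &incoming;

		err = luo_session_finish_one(session);
		if (err) {
			fprintf(stderr,
				"Unable to finish session [%s] on release\n",
				session->name);
			return err;
		}
	} else {
		sh = &outgoing;

		/* The patch takes session->mutex around this call. */
		luo_file_unpreserve_files(session);
	}

	/* Common to both the incoming and the outgoing path. */
	luo_session_remove(sh, session);
	luo_session_free(session);

	return 0;
}
```

As in the posted version, a failure in the finish step leaves the session
on the incoming list and does not free it.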

-- 
Sincerely yours,
Mike.
