linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Pasha Tatashin <pasha.tatashin@soleen.com>
To: Pratyush Yadav <pratyush@kernel.org>
Cc: jasonmiu@google.com, graf@amazon.com, changyuanl@google.com,
	 rppt@kernel.org, dmatlack@google.com, rientjes@google.com,
	corbet@lwn.net,  rdunlap@infradead.org,
	ilpo.jarvinen@linux.intel.com, kanie@linux.alibaba.com,
	 ojeda@kernel.org, aliceryhl@google.com, masahiroy@kernel.org,
	 akpm@linux-foundation.org, tj@kernel.org, yoann.congal@smile.fr,
	 mmaurer@google.com, roman.gushchin@linux.dev,
	chenridong@huawei.com,  axboe@kernel.dk, mark.rutland@arm.com,
	jannh@google.com,  vincent.guittot@linaro.org,
	hannes@cmpxchg.org, dan.j.williams@intel.com,  david@redhat.com,
	joel.granados@kernel.org, rostedt@goodmis.org,
	 anna.schumaker@oracle.com, song@kernel.org,
	zhangguopeng@kylinos.cn,  linux@weissschuh.net,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	 linux-mm@kvack.org, gregkh@linuxfoundation.org,
	tglx@linutronix.de,  mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org,  hpa@zytor.com,
	rafael@kernel.org, dakr@kernel.org,
	 bartosz.golaszewski@linaro.org, cw00.choi@samsung.com,
	 myungjoo.ham@samsung.com, yesanishhere@gmail.com,
	Jonathan.Cameron@huawei.com,  quic_zijuhu@quicinc.com,
	aleksander.lobakin@intel.com, ira.weiny@intel.com,
	 andriy.shevchenko@linux.intel.com, leon@kernel.org,
	lukas@wunner.de,  bhelgaas@google.com, wagi@kernel.org,
	djeffery@redhat.com,  stuart.w.hayes@gmail.com,
	lennart@poettering.net, brauner@kernel.org,
	 linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	saeedm@nvidia.com,  ajayachandra@nvidia.com, jgg@nvidia.com,
	parav@nvidia.com, leonro@nvidia.com,  witu@nvidia.com,
	hughd@google.com, skhawaja@google.com, chrisl@kernel.org,
	 steven.sistare@oracle.com
Subject: Re: [PATCH v4 14/30] liveupdate: luo_session: Add ioctls for file preservation and state management
Date: Wed, 29 Oct 2025 16:13:14 -0400	[thread overview]
Message-ID: <CA+CK2bBVSX26TKwgLkXCDop5u3e9McH3sQMascT47ZwwrwraOw@mail.gmail.com> (raw)
In-Reply-To: <mafs0tszhcyrw.fsf@kernel.org>

On Wed, Oct 29, 2025 at 3:07 PM Pratyush Yadav <pratyush@kernel.org> wrote:
>
> Hi Pasha,
>
> On Mon, Sep 29 2025, Pasha Tatashin wrote:
>
> > Introducing the userspace interface and internal logic required to
> > manage the lifecycle of file descriptors within a session. Previously, a
> > session was merely a container; this change makes it a functional
> > management unit.
> >
> > The following capabilities are added:
> >
> > A new set of ioctl commands are added, which operate on the file
> > descriptor returned by CREATE_SESSION. This allows userspace to:
> > - LIVEUPDATE_SESSION_PRESERVE_FD: Add a file descriptor to a session
> >   to be preserved across the live update.
> > - LIVEUPDATE_SESSION_UNPRESERVE_FD: Remove a previously added file
> >   descriptor from the session.
> > - LIVEUPDATE_SESSION_RESTORE_FD: Retrieve a preserved file in the
> >   new kernel using its unique token.
> >
> > A state machine for each individual session, distinct from the global
> > LUO state. This enables more granular control, allowing userspace to
> > prepare or freeze specific sessions independently. This is managed via:
> > - LIVEUPDATE_SESSION_SET_EVENT: An ioctl to send PREPARE, FREEZE,
> >   CANCEL, or FINISH events to a single session.
> > - LIVEUPDATE_SESSION_GET_STATE: An ioctl to query the current state
> >   of a single session.
> >
> > The global subsystem callbacks (luo_session_prepare, luo_session_freeze)
> > are updated to iterate through all existing sessions. They now trigger
> > the appropriate per-session state transitions for any sessions that
> > haven't already been transitioned individually by userspace.
> >
> > The session's .release handler is enhanced to be state-aware. When a
> > session's file descriptor is closed, it now correctly cancels or
> > finishes the session based on its current state before freeing all
> > associated file resources, preventing resource leaks.
> >
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> [...]
> > +/**
> > + * struct liveupdate_session_get_state - ioctl(LIVEUPDATE_SESSION_GET_STATE)
> > + * @size:     Input; sizeof(struct liveupdate_session_get_state)
> > + * @incoming: Input; If 1, query the state of a restored file from the incoming
> > + *            (previous kernel's) set. If 0, query a file being prepared for
> > + *            preservation in the current set.
>
> Spotted this when working on updating my test suite for LUO. This seems
> to be a leftover from a previous version. I don't see it being used
> anywhere in the code.

thank you will remove this.

> Also, I think the model we should have is to only allow new sessions in
> normal state. Currently luo_session_create() allows creating a new
> session in updated state. This would end up mixing sessions from a
> previous boot and sessions from current boot. I don't really see a
> reason for that and I think the userspace should first call finish
> before starting new serialization. Keeps things simpler.

It does. However, yesterday Jason Gunthorpe suggested that we simplify
the uapi, at least for the initial landing, by removing the state
machine during boot and allowing new sessions to be created at any
time. This would also mean separating the incoming and outgoing
sessions and removing the ioctl() call used to bring the machine into
a normal state; instead, only individual sessions could be brought
into a 'normal' state.

Simplified uAPI Proposal
The simplest uAPI would look like this:
IOCTLs on /dev/liveupdate (to create and retrieve session FDs):
LIVEUPDATE_IOCTL_CREATE_SESSION
LIVEUPDATE_IOCTL_RETRIEVE_SESSION

IOCTLs on session FDs:
LIVEUPDATE_CMD_SESSION_PRESERVE_FD
LIVEUPDATE_CMD_SESSION_RETRIEVE_FD
LIVEUPDATE_CMD_SESSION_FINISH

Happy Path
The happy path would look like this:
- luod creates a session with a specific name and passes it to the vmm.
- The vmm preserves FDs in a specific order: memfd, iommufd, vfiofd.
(If the order is wrong, the preserve callbacks will fail.)
- A reboot(KEXEC) is performed.
- Each session receives a freeze() callback to notify it that
mutations are no longer possible.
- During boot, liveupdate_fh_global_state_get(&h, &obj) can be used to
retrieve the global state.
- Once the machine has booted, luod retrieves the incoming sessions
and passes them to the vmms.
- The vmm retrieves the FDs from the session and performs the
necessary IOCTLs on them.
- The vmm calls LIVEUPDATE_CMD_SESSION_FINISH on the session. Each FD
receives a finish() callback in LIFO order.
- If everything succeeds, the session becomes an empty "outgoing"
session. It can then be closed and discarded or reused for the next
live update by preserving new FDs into it.
- Once the last FD for a file-handler is finished,
h->ops->global_state_finish(h, h->global_state_obj) is called to
finish the incoming global state.

Unhappy Paths
- If an outgoing session FD is closed, each FD in that session
receives an unpreserve callback in LIFO order.
- If the last FD for a global state is unpreserved,
h->ops->global_state_unpreserve(h, h->global_state_obj) is called.
- If freeze() fails, a cancel() is performed on each FD that received
freeze() cb, and reboot(KEXEC) returns a failure.
- If an incoming session FD is closed, the resources are considered
"leaked." They are discarded only during the next live-update; this is
intended to prevent implementing rare and untested clean-up code.
- If a user tries to finish a session and it fails, it is considered
the user's problem. This might happen because some IOCTLs still need
to be run on the retrieved FDs to bring them to a state where finish
is possible.

This would also mean that subsystems would not be needed, leaving only
FLB (File-Lifecycle-Bound Global State) to use as a handle for global
state. The API I am proposing for FLB keeps the same global state for
a single file-handler type. However, HugeTLB might have multiple file
handlers, so the API would need to be extended slightly to support
this case. Multiple file handlers will share the same global resource
with the same callbacks.

Pasha

> > + * @reserved: Must be zero.
> > + * @state:    Output; The live update state of this FD.
> > + *
> > + * Query the current live update state of a specific preserved file descriptor.
> > + *
> > + * - %LIVEUPDATE_STATE_NORMAL:   Default state
> > + * - %LIVEUPDATE_STATE_PREPARED: Prepare callback has been performed on this FD.
> > + * - %LIVEUPDATE_STATE_FROZEN:   Freeze callback ahs been performed on this FD.
> > + * - %LIVEUPDATE_STATE_UPDATED:  The system has successfully rebooted into the
> > + *                               new kernel.
> > + *
> > + * See the definition of &enum liveupdate_state for more details on each state.
> > + *
> > + * Return: 0 on success, negative error code on failure.
> > + */
> > +struct liveupdate_session_get_state {
> > +     __u32           size;
> > +     __u8            incoming;
> > +     __u8            reserved[3];
> > +     __u32           state;
> > +};
> > +
> > +#define LIVEUPDATE_SESSION_GET_STATE                                 \
> > +     _IO(LIVEUPDATE_IOCTL_TYPE, LIVEUPDATE_CMD_SESSION_GET_STATE)
> [...]
>
> --
> Regards,
> Pratyush Yadav

  reply	other threads:[~2025-10-29 20:13 UTC|newest]

Thread overview: 86+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-29  1:02 [PATCH v4 00/30] Live Update Orchestrator Pasha Tatashin
2025-09-29  1:02 ` [PATCH v4 01/30] kho: allow to drive kho from within kernel Pasha Tatashin
2025-09-29  1:02 ` [PATCH v4 02/30] kho: make debugfs interface optional Pasha Tatashin
2025-10-06 16:30   ` Pratyush Yadav
2025-10-06 18:02     ` Pasha Tatashin
2025-10-06 16:55   ` Pratyush Yadav
2025-10-06 17:23     ` Pasha Tatashin
2025-09-29  1:02 ` [PATCH v4 03/30] kho: drop notifiers Pasha Tatashin
2025-10-06 14:30   ` Pratyush Yadav
2025-10-06 16:17     ` Pasha Tatashin
2025-10-06 16:38     ` Pratyush Yadav
2025-10-06 17:01   ` Pratyush Yadav
2025-10-06 17:21     ` Pasha Tatashin
2025-10-07 12:09       ` Pratyush Yadav
2025-10-07 13:16         ` Pasha Tatashin
2025-10-07 13:30           ` Pratyush Yadav
2025-09-29  1:02 ` [PATCH v4 04/30] kho: add interfaces to unpreserve folios and page ranes Pasha Tatashin
2025-09-29  1:02 ` [PATCH v4 05/30] kho: don't unpreserve memory during abort Pasha Tatashin
2025-09-29  1:02 ` [PATCH v4 06/30] liveupdate: kho: move to kernel/liveupdate Pasha Tatashin
2025-09-29  1:02 ` [PATCH v4 07/30] liveupdate: luo_core: luo_ioctl: Live Update Orchestrator Pasha Tatashin
2025-09-29  1:02 ` [PATCH v4 08/30] liveupdate: luo_core: integrate with KHO Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 09/30] liveupdate: luo_subsystems: add subsystem registration Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 10/30] liveupdate: luo_subsystems: implement subsystem callbacks Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 11/30] liveupdate: luo_session: Add sessions support Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 12/30] liveupdate: luo_ioctl: add user interface Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 13/30] liveupdate: luo_file: implement file systems callbacks Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 14/30] liveupdate: luo_session: Add ioctls for file preservation and state management Pasha Tatashin
2025-10-29 19:07   ` Pratyush Yadav
2025-10-29 20:13     ` Pasha Tatashin [this message]
2025-10-29 20:43       ` David Matlack
2025-10-29 20:57         ` Pasha Tatashin
2025-10-29 21:13       ` David Matlack
2025-10-29 21:17         ` Pasha Tatashin
2025-10-29 22:00       ` Samiullah Khawaja
2025-10-30 14:45         ` Pasha Tatashin
2025-10-29 20:37   ` Pratyush Yadav
2025-10-29 20:58     ` Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 15/30] reboot: call liveupdate_reboot() before kexec Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 16/30] kho: move kho debugfs directory to liveupdate Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 17/30] liveupdate: add selftests for subsystems un/registration Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 18/30] selftests/liveupdate: add subsystem/state tests Pasha Tatashin
2025-10-03 23:17   ` Vipin Sharma
2025-10-04  2:08     ` Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 19/30] docs: add luo documentation Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 20/30] MAINTAINERS: add liveupdate entry Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 21/30] mm: shmem: use SHMEM_F_* flags instead of VM_* flags Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 22/30] mm: shmem: allow freezing inode mapping Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 23/30] mm: shmem: export some functions to internal.h Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 24/30] luo: allow preserving memfd Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 25/30] docs: add documentation for memfd preservation via LUO Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 26/30] selftests/liveupdate: Add multi-kexec session lifecycle test Pasha Tatashin
2025-10-03 22:51   ` Vipin Sharma
2025-10-04  2:07     ` Pasha Tatashin
2025-10-04  2:37       ` Pasha Tatashin
2025-10-09 22:57         ` Vipin Sharma
2025-09-29  1:03 ` [PATCH v4 27/30] selftests/liveupdate: Add multi-file and unreclaimed file test Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 28/30] selftests/liveupdate: Add multi-session workflow and state interaction test Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 29/30] selftests/liveupdate: Add test for unreclaimed resource cleanup Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 30/30] selftests/liveupdate: Add tests for per-session state and cancel cycles Pasha Tatashin
2025-10-07 17:10 ` [PATCH v4 00/30] Live Update Orchestrator Pasha Tatashin
2025-10-07 17:50   ` Jason Gunthorpe
2025-10-08  3:18     ` Pasha Tatashin
2025-10-08  7:03   ` Samiullah Khawaja
2025-10-08 16:40     ` Pasha Tatashin
2025-10-08 19:35       ` Jason Gunthorpe
2025-10-08 20:26         ` Pasha Tatashin
2025-10-09 14:48           ` Jason Gunthorpe
2025-10-09 15:01             ` Pasha Tatashin
2025-10-09 15:03               ` Pasha Tatashin
2025-10-09 16:46               ` Samiullah Khawaja
2025-10-09 17:39               ` Jason Gunthorpe
2025-10-09 18:37                 ` Pasha Tatashin
2025-10-10 14:35                   ` Jason Gunthorpe
2025-10-09 21:58   ` Samiullah Khawaja
2025-10-09 22:42     ` Pasha Tatashin
2025-10-10 14:42       ` Jason Gunthorpe
2025-10-10 14:58         ` Pasha Tatashin
2025-10-10 15:02           ` Jason Gunthorpe
2025-10-09 22:57   ` Pratyush Yadav
2025-10-09 23:50     ` Pasha Tatashin
2025-10-10 15:01       ` Jason Gunthorpe
2025-10-14 13:29         ` Pratyush Yadav
2025-10-20 14:29           ` Jason Gunthorpe
2025-10-27 11:37             ` Pratyush Yadav
2025-10-13 15:23       ` Pratyush Yadav
2025-10-10 12:45     ` Pasha Tatashin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA+CK2bBVSX26TKwgLkXCDop5u3e9McH3sQMascT47ZwwrwraOw@mail.gmail.com \
    --to=pasha.tatashin@soleen.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=ajayachandra@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=aleksander.lobakin@intel.com \
    --cc=aliceryhl@google.com \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=anna.schumaker@oracle.com \
    --cc=axboe@kernel.dk \
    --cc=bartosz.golaszewski@linaro.org \
    --cc=bhelgaas@google.com \
    --cc=bp@alien8.de \
    --cc=brauner@kernel.org \
    --cc=changyuanl@google.com \
    --cc=chenridong@huawei.com \
    --cc=chrisl@kernel.org \
    --cc=corbet@lwn.net \
    --cc=cw00.choi@samsung.com \
    --cc=dakr@kernel.org \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=djeffery@redhat.com \
    --cc=dmatlack@google.com \
    --cc=graf@amazon.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=hpa@zytor.com \
    --cc=hughd@google.com \
    --cc=ilpo.jarvinen@linux.intel.com \
    --cc=ira.weiny@intel.com \
    --cc=jannh@google.com \
    --cc=jasonmiu@google.com \
    --cc=jgg@nvidia.com \
    --cc=joel.granados@kernel.org \
    --cc=kanie@linux.alibaba.com \
    --cc=lennart@poettering.net \
    --cc=leon@kernel.org \
    --cc=leonro@nvidia.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux@weissschuh.net \
    --cc=lukas@wunner.de \
    --cc=mark.rutland@arm.com \
    --cc=masahiroy@kernel.org \
    --cc=mingo@redhat.com \
    --cc=mmaurer@google.com \
    --cc=myungjoo.ham@samsung.com \
    --cc=ojeda@kernel.org \
    --cc=parav@nvidia.com \
    --cc=pratyush@kernel.org \
    --cc=quic_zijuhu@quicinc.com \
    --cc=rafael@kernel.org \
    --cc=rdunlap@infradead.org \
    --cc=rientjes@google.com \
    --cc=roman.gushchin@linux.dev \
    --cc=rostedt@goodmis.org \
    --cc=rppt@kernel.org \
    --cc=saeedm@nvidia.com \
    --cc=skhawaja@google.com \
    --cc=song@kernel.org \
    --cc=steven.sistare@oracle.com \
    --cc=stuart.w.hayes@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=vincent.guittot@linaro.org \
    --cc=wagi@kernel.org \
    --cc=witu@nvidia.com \
    --cc=x86@kernel.org \
    --cc=yesanishhere@gmail.com \
    --cc=yoann.congal@smile.fr \
    --cc=zhangguopeng@kylinos.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).