linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Pasha Tatashin <pasha.tatashin@soleen.com>
To: Mike Rapoport <rppt@kernel.org>
Cc: pratyush@kernel.org, jasonmiu@google.com, graf@amazon.com,
	 changyuanl@google.com, dmatlack@google.com, rientjes@google.com,
	 corbet@lwn.net, rdunlap@infradead.org,
	ilpo.jarvinen@linux.intel.com,  kanie@linux.alibaba.com,
	ojeda@kernel.org, aliceryhl@google.com,  masahiroy@kernel.org,
	akpm@linux-foundation.org, tj@kernel.org,  yoann.congal@smile.fr,
	mmaurer@google.com, roman.gushchin@linux.dev,
	 chenridong@huawei.com, axboe@kernel.dk, mark.rutland@arm.com,
	 jannh@google.com, vincent.guittot@linaro.org,
	hannes@cmpxchg.org,  dan.j.williams@intel.com, david@redhat.com,
	joel.granados@kernel.org,  rostedt@goodmis.org,
	anna.schumaker@oracle.com, song@kernel.org,
	 zhangguopeng@kylinos.cn, linux@weissschuh.net,
	linux-kernel@vger.kernel.org,  linux-doc@vger.kernel.org,
	linux-mm@kvack.org, gregkh@linuxfoundation.org,
	 tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	 dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
	rafael@kernel.org,  dakr@kernel.org,
	bartosz.golaszewski@linaro.org, cw00.choi@samsung.com,
	 myungjoo.ham@samsung.com, yesanishhere@gmail.com,
	Jonathan.Cameron@huawei.com,  quic_zijuhu@quicinc.com,
	aleksander.lobakin@intel.com, ira.weiny@intel.com,
	 andriy.shevchenko@linux.intel.com, leon@kernel.org,
	lukas@wunner.de,  bhelgaas@google.com, wagi@kernel.org,
	djeffery@redhat.com,  stuart.w.hayes@gmail.com,
	ptyadav@amazon.de
Subject: Re: [RFC v2 04/16] luo: luo_core: Live Update Orchestrator
Date: Fri, 30 May 2025 01:00:28 -0400	[thread overview]
Message-ID: <CA+CK2bDWVu137cPdbu7yOBNGm_ixoeJkQucZW_gPXV3FzTPMKQ@mail.gmail.com> (raw)
In-Reply-To: <aDQKrOjtHXbJqG9n@kernel.org>

> > +config LIVEUPDATE
> > +     bool "Live Update Orchestrator"
> > +     depends on KEXEC_HANDOVER
> > +     help
> > +       Enable the Live Update Orchestrator. Live Update is a mechanism,
> > +       typically based on kexec, that allows the kernel to be updated
> > +       while keeping selected devices operational across the transition.
> > +       These devices are intended to be reclaimed by the new kernel and
> > +       re-attached to their original workload without requiring a device
> > +       reset.
> > +
> > +       This functionality depends on specific support within device drivers
> > +       and related kernel subsystems.
>
> This is not clear if the ability to reattach a device to the new kernel or
> the entire live update functionality depends on specific support with
> drivers.
>
> Probably better phrase it as
>
>           Ability to handover a device from old to new kernel depends ...

Updated

>
> > +
> > +       This feature is primarily used in cloud environments to quickly
> > +       update the kernel hypervisor with minimal disruption to the
> > +       running virtual machines.
>
> I wouldn't put it into Kconfig. If anything I'd make it
>
>           This feature primarily targets virtual machine hosts to quickly ...

Ok

> > + * The core of LUO is a state machine that tracks the progress of a live update,
> > + * along with a callback API that allows other kernel subsystems to participate
> > + * in the process. Example subsystems that can hook into LUO include: kvm,
> > + * iommu, interrupts, vfio, participating filesystems, and mm.
>
> Please spell out memory management.

Done.

>
> > + * LUO uses KHO to transfer memory state from the current Kernel to the next
>
> A link to KHO docs would have been nice, but I'm not sure kernel-doc can do
> that nicely.

Added a link, a simple path to rst, is apparently correctly converted
to a link by sphinx.

>
> > + * Kernel.
>
> Why capital 'K'? :)

Fixed.

>
> > + * The LUO state machine ensures that operations are performed in the correct
> > + * sequence and provides a mechanism to track and recover from potential
> > + * failures, and select devices and subsystems that should participate in
> > + * live update sequence.
> > + */
> > +
> > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > +
> > +#include <linux/err.h>
> > +#include <linux/kobject.h>
> > +#include <linux/liveupdate.h>
> > +#include <linux/rwsem.h>
> > +#include <linux/string.h>
> > +#include "luo_internal.h"
> > +
> > +static DECLARE_RWSEM(luo_state_rwsem);
> > +
> > +enum liveupdate_state luo_state;
>
> static?

Fixed

> Hmm, luo_state is initialized to 0 (NORMAL) which means we always start
> from NORMAL, although the second kernel is not in the normal state until
> the handover is complete. Maybe we need an initial "unknown" state until
> some of luo code starts running and would set an actual known state?

Added: LIVEUPDATE_STATE_UNDEFINED that exists only before LUO is
initialized during boot.

> > +const char *const luo_state_str[] = {
> > +     [LIVEUPDATE_STATE_NORMAL]       = "normal",
> > +     [LIVEUPDATE_STATE_PREPARED]     = "prepared",
> > +     [LIVEUPDATE_STATE_FROZEN]       = "frozen",
> > +     [LIVEUPDATE_STATE_UPDATED]      = "updated",
> > +};
> > +
> > +bool luo_enabled;
>
> static?

Fixed.

>
> > +static int __init early_liveupdate_param(char *buf)
> > +{
> > +     return kstrtobool(buf, &luo_enabled);
> > +}
> > +early_param("liveupdate", early_liveupdate_param);
> > +
> > +/* Return true if the current state is equal to the provided state */
> > +static inline bool is_current_luo_state(enum liveupdate_state expected_state)
> > +{
> > +     return READ_ONCE(luo_state) == expected_state;
> > +}
> > +
> > +static void __luo_set_state(enum liveupdate_state state)
> > +{
> > +     WRITE_ONCE(luo_state, state);
> > +}
> > +
> > +static inline void luo_set_state(enum liveupdate_state state)
> > +{
> > +     pr_info("Switched from [%s] to [%s] state\n",
> > +             LUO_STATE_STR, luo_state_str[state]);
>
> Maybe LUO_CURRENT_STATE_STR?

Done

> > +     __luo_set_state(state);
> > +}
> > +
> > +static int luo_do_freeze_calls(void)
> > +{
> > +     return 0;
> > +}
> > +
> > +static void luo_do_finish_calls(void)
> > +{
> > +}
> > +
> > +int luo_prepare(void)
> > +{
> > +     return 0;
> > +}
> > +
> > +/**
> > + * luo_freeze() - Initiate the final freeze notification phase for live update.
> > + *
> > + * Attempts to transition the live update orchestrator state from
> > + * %LIVEUPDATE_STATE_PREPARED to %LIVEUPDATE_STATE_FROZEN. This function is
> > + * typically called just before the actual reboot system call (e.g., kexec)
> > + * is invoked, either directly by the orchestration tool or potentially from
> > + * within the reboot syscall path itself.
> > + *
> > + * Based on the outcome of the notification process:
> > + * - If luo_do_freeze_calls() returns 0 (all callbacks succeeded), the state
> > + * is set to %LIVEUPDATE_STATE_FROZEN using luo_set_state(), indicating
> > + * readiness for the imminent kexec.
> > + * - If luo_do_freeze_calls() returns a negative error code (a callback
> > + * failed), the state is reverted to %LIVEUPDATE_STATE_NORMAL using
> > + * luo_set_state() to cancel the live update attempt.
>
> The kernel-doc comments are mostly for users of a function and describe how
> it should be used rather how it is implemented.

SGTM, cleaned-up.

> I don't think it's important to mention return values of
> luo_do_freeze_calls() here. The important things are whether registered
> subsystems succeeded to freeze or not and the state changes.
> I'd also mention that if a subsystem fails to freeze, everything is
> canceled.

Added

> > +/**
> > + * luo_finish - Finalize the live update process in the new kernel.
> > + *
> > + * This function is called  after a successful live update reboot into a new
> > + * kernel, once the new kernel is ready to transition to the normal operational
> > + * state. It signals the completion of the live update sequence to subsystems.
> > + *
> > + * It first attempts to acquire the write lock for the orchestrator state.
> > + *
> > + * Then, it checks if the system is in the ``LIVEUPDATE_STATE_UPDATED`` state.
> > + * If not, it logs a warning and returns ``-EINVAL``.
> > + *
> > + * If the state is correct, it triggers the ``LIVEUPDATE_FINISH`` notifier
>
> Here too, you describe what the function does rather how it should be used

Fixed

>
> > + * chain. Note that the return value of the notifier is intentionally ignored as
> > + * finish callbacks must not fail. Finally, the orchestrator state is
>
> And what should happen if there was an error in a finish callback?

Scream, warn, panic, we cannot allow running a system past liveupdate,
if some state was not properly passed from the previous kernel to the
current kernel. This may result in catastrophic memory leaks.

> > +static int __init luo_startup(void)
> > +{
> > +     __luo_set_state(LIVEUPDATE_STATE_NORMAL);
> > +
> > +     return 0;
> > +}
> > +early_initcall(luo_startup);
>
> This means that the second kernel starts with luo_state == NORMAL, then
> at early_initcall transitions to NORMAL again and later is set to UPDATED,
> doesn't it?

In the next patch, in this function we transition to UPDATED. So,
technically, we go from NORMAL to UPDATED. However, I added UNDEFINED
state so, in this function we either go from UNDEFINED to UPDATED or
UNDEFINED to NORMAL.


> > + * @return true if the system is in the ``LIVEUPDATE_STATE_NORMAL`` state,
> > + * false otherwise.
> > + */
> > +bool liveupdate_state_normal(void)
> > +{
> > +     return is_current_luo_state(LIVEUPDATE_STATE_NORMAL);
> > +}
> > +EXPORT_SYMBOL_GPL(liveupdate_state_normal);
>
> Won't liveupdate_get_state() do?

Yeah, we can simply return state, and let caller to compare. However,
I think, caller is only interested if this is normal state or if live
update is in progress. I will keep them, and also added
liveupdate_get_state().

> > +
> > +/**
> > + * liveupdate_enabled - Check if the live update feature is enabled.
> > + *
> > + * This function returns the state of the live update feature flag, which
> > + * can be controlled via the ``liveupdate`` kernel command-line parameter.
> > + *
> > + * @return true if live update is enabled, false otherwise.
> > + */
> > +bool liveupdate_enabled(void)
> > +{
> > +     return luo_enabled;
> > +}
> > +EXPORT_SYMBOL_GPL(liveupdate_enabled);
> > diff --git a/drivers/misc/liveupdate/luo_internal.h b/drivers/misc/liveupdate/luo_internal.h
> > new file mode 100644
> > index 000000000000..34e73fb0318c
> > --- /dev/null
> > +++ b/drivers/misc/liveupdate/luo_internal.h
> > @@ -0,0 +1,26 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +/*
> > + * Copyright (c) 2025, Google LLC.
> > + * Pasha Tatashin <pasha.tatashin@soleen.com>
> > + */
> > +
> > +#ifndef _LINUX_LUO_INTERNAL_H
> > +#define _LINUX_LUO_INTERNAL_H
> > +
> > +int luo_cancel(void);
> > +int luo_prepare(void);
> > +int luo_freeze(void);
> > +int luo_finish(void);
> > +
> > +void luo_state_read_enter(void);
> > +void luo_state_read_exit(void);
> > +
> > +extern const char *const luo_state_str[];
> > +
> > +/* Get the current state as a string */
> > +#define LUO_STATE_STR luo_state_str[READ_ONCE(luo_state)]
>
> IIUC you need the macro to have LUO_STATE_STR available in all files in
> liveupdate/ but without exposing luo_state.
>
> I think that we can do a function call to get that string, will make things
> nicer IMHO.

Done.

>
> > +
> > +extern enum liveupdate_state luo_state;
> > +
> > +#endif /* _LINUX_LUO_INTERNAL_H */
> > diff --git a/include/linux/liveupdate.h b/include/linux/liveupdate.h
> > new file mode 100644
> > index 000000000000..c2740da70958
> > --- /dev/null
> > +++ b/include/linux/liveupdate.h
> > @@ -0,0 +1,131 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +/*
> > + * Copyright (c) 2025, Google LLC.
> > + * Pasha Tatashin <pasha.tatashin@soleen.com>
> > + */
> > +#ifndef _LINUX_LIVEUPDATE_H
> > +#define _LINUX_LIVEUPDATE_H
> > +
> > +#include <linux/bug.h>
> > +#include <linux/types.h>
> > +#include <linux/list.h>
> > +
> > +/**
> > + * enum liveupdate_event - Events that trigger live update callbacks.
> > + * @LIVEUPDATE_PREPARE: PREPARE should happens *before* the blackout window.
>
> should happen or happens ;-)

Done

>
> > + *                      Subsystems should prepare for an upcoming reboot by
> > + *                      serializing their states. However, it must be considered
>
> It's not only about state serialization, it's also about adjusting
> operational mode so that state that was serialized won't be changed or at
> least the changes from PREPARE to FREEZE would be accounted somehow.

By serialization, I mean is to save their state, but I agree, the
devices and resources are also should be in a limited state where the
serialized data should not be altered between prepare and freeze (i.e.
no memfd resizing, no new DMA mappings, etc).

Thank you for your comments.
Pasha


  reply	other threads:[~2025-05-30  5:01 UTC|newest]

Thread overview: 102+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-15 18:23 [RFC v2 00/16] Live Update Orchestrator Pasha Tatashin
2025-05-15 18:23 ` [RFC v2 01/16] kho: make debugfs interface optional Pasha Tatashin
2025-06-04 16:03   ` Pratyush Yadav
2025-06-06 16:12     ` Pasha Tatashin
2025-05-15 18:23 ` [RFC v2 02/16] kho: allow to drive kho from within kernel Pasha Tatashin
2025-05-15 18:23 ` [RFC v2 03/16] kho: add kho_unpreserve_folio/phys Pasha Tatashin
2025-06-04 15:00   ` Pratyush Yadav
2025-06-06 16:22     ` Pasha Tatashin
2025-05-15 18:23 ` [RFC v2 04/16] luo: luo_core: Live Update Orchestrator Pasha Tatashin
2025-05-26  6:31   ` Mike Rapoport
2025-05-30  5:00     ` Pasha Tatashin [this message]
2025-06-04 15:17   ` Pratyush Yadav
2025-06-07 17:11     ` Pasha Tatashin
2025-05-15 18:23 ` [RFC v2 05/16] luo: luo_core: integrate with KHO Pasha Tatashin
2025-05-26  7:18   ` Mike Rapoport
2025-06-07 17:50     ` Pasha Tatashin
2025-06-09  2:14       ` Pasha Tatashin
2025-06-04 16:00   ` Pratyush Yadav
2025-06-07 23:30     ` Pasha Tatashin
2025-06-13 14:58       ` Pratyush Yadav
2025-06-17 15:23         ` Jason Gunthorpe
2025-06-17 19:32           ` Pasha Tatashin
2025-06-18 13:11             ` Pratyush Yadav
2025-06-18 14:48               ` Pasha Tatashin
2025-06-18 16:40                 ` Mike Rapoport
2025-06-18 17:00                   ` Pasha Tatashin
2025-06-18 17:43                     ` Pasha Tatashin
2025-06-19 12:00                       ` Mike Rapoport
2025-06-19 14:22                         ` Pasha Tatashin
2025-06-20 15:28                           ` Pratyush Yadav
2025-06-20 16:03                             ` Pasha Tatashin
2025-06-24 16:12                               ` Pratyush Yadav
2025-06-24 16:55                                 ` Pasha Tatashin
2025-06-24 18:31                                 ` Jason Gunthorpe
2025-06-23  7:32                       ` Mike Rapoport
2025-06-23 11:29                         ` Pasha Tatashin
2025-06-25 13:46                           ` Mike Rapoport
2025-05-15 18:23 ` [RFC v2 06/16] luo: luo_subsystems: add subsystem registration Pasha Tatashin
2025-05-26  7:31   ` Mike Rapoport
2025-06-07 23:42     ` Pasha Tatashin
2025-05-28 19:12   ` David Matlack
2025-06-07 23:58     ` Pasha Tatashin
2025-06-04 16:30   ` Pratyush Yadav
2025-06-08  0:04     ` Pasha Tatashin
2025-05-15 18:23 ` [RFC v2 07/16] luo: luo_subsystems: implement subsystem callbacks Pasha Tatashin
2025-05-15 18:23 ` [RFC v2 08/16] luo: luo_files: add infrastructure for FDs Pasha Tatashin
2025-05-15 23:15   ` James Houghton
2025-05-23 18:09     ` Pasha Tatashin
2025-05-26  7:55   ` Mike Rapoport
2025-06-05 11:56     ` Pratyush Yadav
2025-06-08 13:13     ` Pasha Tatashin
2025-06-05 15:56   ` Pratyush Yadav
2025-06-08 13:37     ` Pasha Tatashin
2025-06-13 15:27       ` Pratyush Yadav
2025-06-15 18:02         ` Pasha Tatashin
2025-05-15 18:23 ` [RFC v2 09/16] luo: luo_files: implement file systems callbacks Pasha Tatashin
2025-06-05 16:03   ` Pratyush Yadav
2025-06-08 13:49     ` Pasha Tatashin
2025-06-13 15:18       ` Pratyush Yadav
2025-06-13 20:26         ` Pasha Tatashin
2025-06-16 10:43           ` Pratyush Yadav
2025-06-16 14:57             ` Pasha Tatashin
2025-06-18 13:16               ` Pratyush Yadav
2025-05-15 18:23 ` [RFC v2 10/16] luo: luo_ioctl: add ioctl interface Pasha Tatashin
2025-05-26  8:42   ` Mike Rapoport
2025-06-08 15:08     ` Pasha Tatashin
2025-05-28 20:29   ` David Matlack
2025-06-08 16:32     ` Pasha Tatashin
2025-06-05 16:15   ` Pratyush Yadav
2025-06-08 16:35     ` Pasha Tatashin
2025-06-24  9:50   ` Christian Brauner
2025-06-24 14:27     ` Pasha Tatashin
2025-06-25  9:36       ` Christian Brauner
2025-06-25 16:12         ` David Matlack
2025-06-26 15:42           ` Pratyush Yadav
2025-06-26 16:24             ` David Matlack
2025-07-14 14:56               ` Pratyush Yadav
2025-07-17 16:17                 ` David Matlack
2025-07-23 14:51                   ` Pratyush Yadav
2025-07-06 14:33             ` Mike Rapoport
2025-07-07 12:56               ` Jason Gunthorpe
2025-06-25 16:58         ` pasha.tatashin
2025-07-06 14:24     ` Mike Rapoport
2025-07-09 21:27       ` Pratyush Yadav
2025-07-10  7:26         ` Mike Rapoport
2025-07-14 14:34           ` Jason Gunthorpe
2025-07-16  9:43             ` Greg KH
2025-05-15 18:23 ` [RFC v2 11/16] luo: luo_sysfs: add sysfs state monitoring Pasha Tatashin
2025-06-05 16:20   ` Pratyush Yadav
2025-06-08 16:36     ` Pasha Tatashin
2025-06-13 15:13       ` Pratyush Yadav
2025-05-15 18:23 ` [RFC v2 12/16] reboot: call liveupdate_reboot() before kexec Pasha Tatashin
2025-05-15 18:23 ` [RFC v2 13/16] luo: add selftests for subsystems un/registration Pasha Tatashin
2025-05-26  8:52   ` Mike Rapoport
2025-06-08 16:47     ` Pasha Tatashin
2025-05-15 18:23 ` [RFC v2 14/16] selftests/liveupdate: add subsystem/state tests Pasha Tatashin
2025-05-15 18:23 ` [RFC v2 15/16] docs: add luo documentation Pasha Tatashin
2025-05-26  9:00   ` Mike Rapoport
2025-05-15 18:23 ` [RFC v2 16/16] MAINTAINERS: add liveupdate entry Pasha Tatashin
2025-05-20  7:25 ` [RFC v2 00/16] Live Update Orchestrator Mike Rapoport
2025-05-23 18:07   ` Pasha Tatashin
2025-05-26  6:32 ` Mike Rapoport

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA+CK2bDWVu137cPdbu7yOBNGm_ixoeJkQucZW_gPXV3FzTPMKQ@mail.gmail.com \
    --to=pasha.tatashin@soleen.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=aleksander.lobakin@intel.com \
    --cc=aliceryhl@google.com \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=anna.schumaker@oracle.com \
    --cc=axboe@kernel.dk \
    --cc=bartosz.golaszewski@linaro.org \
    --cc=bhelgaas@google.com \
    --cc=bp@alien8.de \
    --cc=changyuanl@google.com \
    --cc=chenridong@huawei.com \
    --cc=corbet@lwn.net \
    --cc=cw00.choi@samsung.com \
    --cc=dakr@kernel.org \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=djeffery@redhat.com \
    --cc=dmatlack@google.com \
    --cc=graf@amazon.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=hpa@zytor.com \
    --cc=ilpo.jarvinen@linux.intel.com \
    --cc=ira.weiny@intel.com \
    --cc=jannh@google.com \
    --cc=jasonmiu@google.com \
    --cc=joel.granados@kernel.org \
    --cc=kanie@linux.alibaba.com \
    --cc=leon@kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux@weissschuh.net \
    --cc=lukas@wunner.de \
    --cc=mark.rutland@arm.com \
    --cc=masahiroy@kernel.org \
    --cc=mingo@redhat.com \
    --cc=mmaurer@google.com \
    --cc=myungjoo.ham@samsung.com \
    --cc=ojeda@kernel.org \
    --cc=pratyush@kernel.org \
    --cc=ptyadav@amazon.de \
    --cc=quic_zijuhu@quicinc.com \
    --cc=rafael@kernel.org \
    --cc=rdunlap@infradead.org \
    --cc=rientjes@google.com \
    --cc=roman.gushchin@linux.dev \
    --cc=rostedt@goodmis.org \
    --cc=rppt@kernel.org \
    --cc=song@kernel.org \
    --cc=stuart.w.hayes@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=vincent.guittot@linaro.org \
    --cc=wagi@kernel.org \
    --cc=x86@kernel.org \
    --cc=yesanishhere@gmail.com \
    --cc=yoann.congal@smile.fr \
    --cc=zhangguopeng@kylinos.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).