All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
To: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: changyuanl@google.com, graf@amazon.com, rppt@kernel.org,
	rientjes@google.com, corbet@lwn.net, rdunlap@infradead.org,
	ilpo.jarvinen@linux.intel.com, kanie@linux.alibaba.com,
	ojeda@kernel.org, aliceryhl@google.com, masahiroy@kernel.org,
	akpm@linux-foundation.org, tj@kernel.org, yoann.congal@smile.fr,
	mmaurer@google.com, roman.gushchin@linux.dev,
	chenridong@huawei.com, axboe@kernel.dk, mark.rutland@arm.com,
	jannh@google.com, vincent.guittot@linaro.org, hannes@cmpxchg.org,
	dan.j.williams@intel.com, david@redhat.com,
	joel.granados@kernel.org, rostedt@goodmis.org,
	anna.schumaker@oracle.com, song@kernel.org,
	zhangguopeng@kylinos.cn, linux@weissschuh.net,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-mm@kvack.org, gregkh@linuxfoundation.org,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
	rafael@kernel.org, dakr@kernel.org,
	bartosz.golaszewski@linaro.org, cw00.choi@samsung.com,
	myungjoo.ham@samsung.com, yesanishhere@gmail.com,
	Jonathan.Cameron@huawei.com, quic_zijuhu@quicinc.com,
	aleksander.lobakin@intel.com, ira.weiny@intel.com,
	leon@kernel.org, lukas@wunner.de, bhelgaas@google.com,
	wagi@kernel.org, djeffery@redhat.com, stuart.w.hayes@gmail.com,
	jgowans@amazon.com, jgg@nvidia.com
Subject: Re: [RFC v1 1/3] luo: Live Update Orchestrator
Date: Thu, 20 Mar 2025 15:39:43 +0200	[thread overview]
Message-ID: <Z9wan08CpbvddHhc@smile.fi.intel.com> (raw)
In-Reply-To: <20250320024011.2995837-2-pasha.tatashin@soleen.com>

On Thu, Mar 20, 2025 at 02:40:09AM +0000, Pasha Tatashin wrote:
> Introduces the Live Update Orchestrator (LUO), a new kernel subsystem
> designed to facilitate live updates. Live update is a method to reboot
> the kernel while attempting to keep selected devices alive across the
> reboot boundary, minimizing downtime.
> 
> The primary use case is cloud environments, allowing hypervisor updates
> without fully disrupting running virtual machines. VMs can be suspended
> while the hypervisor kernel reboots, and devices attached to these VM
> are kept operational by the LUO.
> 
> Features introduced:
> 
> - Core orchestration logic for managing the live update process.
> - A state machine (NORMAL, PREPARED, UPDATED, *_FAILED) to track
>   the progress of live updates.
> - Notifier chains for subsystems (device layer, interrupts, KVM, IOMMU,
>   etc.) to register callbacks for different live update events:
>     - LIVEUPDATE_PREPARE: Prepare for reboot (before blackout).
>     - LIVEUPDATE_REBOOT: Final serialization before kexec (blackout).
>     - LIVEUPDATE_FINISH: Cleanup after update (after blackout).
>     - LIVEUPDATE_CANCEL: Rollback actions on failure or user request.
> - A sysfs interface (/sys/kernel/liveupdate/) for user-space control:
>     - `prepare`: Initiate preparation (write 1) or reset (write 0).
>     - `finish`: Finalize update in new kernel (write 1).
>     - `cancel`: Abort ongoing preparation or reboot (write 1).
>     - `reset`: Force state back to normal (write 1).
>     - `state`: Read-only view of the current LUO state.
>     - `enabled`: Read-only view of whether live update is enabled.
> - Integration with KHO to pass orchestrator state to the new kernel.
> - Version checking during startup of the new kernel to ensure
>   compatibility with the previous kernel's live update state.
> 
> This infrastructure allows various kernel subsystems to coordinate and
> participate in the live update process, serializing and restoring device
> state across a kernel reboot.

...

> +Date:		March 2025
> +KernelVersion:	6.14.0

This is way too optimistic, it even won't make v6.15.
And date can be chosen either v6.16-rc1 or v6.16 release
in accordance with prediction tool

...

> +#ifndef _LINUX_LIVEUPDATE_H
> +#define _LINUX_LIVEUPDATE_H

> +#include <linux/compiler.h>
> +#include <linux/notifier.h>

This is semi-random list of inclusions. Try to follow IWYU principle.
See below.

...

> +bool liveupdate_state_updated(void);

Where bool is defined?

...

> +/**
> + * LIVEUPDATE_DECLARE_NOTIFIER - Declare a live update notifier with default
> + * structure.
> + * @_name: A base name used to generate the names of the notifier block
> + * (e.g., ``_name##_liveupdate_notifier_block``) and the callback function
> + * (e.g., ``_name##_liveupdate``).
> + * @_priority: The priority of the notifier, specified using the
> + * ``enum liveupdate_cb_priority`` values
> + * (e.g., ``LIVEUPDATE_CB_PRIO_BEFORE_DEVICES``).
> + *
> + * This macro declares a static struct notifier_block and a corresponding
> + * notifier callback function for use with the live update orchestrator.
> + * It simplifies the process by automatically handling the dispatching of
> + * live update events to separate handler functions for prepare, reboot,
> + * finish, and cancel.
> + *
> + * This macro expects the following functions to be defined:
> + *
> + * ``_name##_liveupdate_prepare()``:  Called on LIVEUPDATE_PREPARE.
> + * ``_name##_liveupdate_reboot()``:   Called on LIVEUPDATE_REBOOT.
> + * ``_name##_liveupdate_finish()``:   Called on LIVEUPDATE_FINISH.
> + * ``_name##_liveupdate_cancel()``:   Called on LIVEUPDATE_CANCEL.
> + *
> + * The generated callback function handles the switch statement for the
> + * different live update events and calls the appropriate handler function.
> + * It also includes warnings if the finish or cancel handlers return an error.
> + *
> + * For example, declartion can look like this:
> + *
> + * ``static int foo_liveupdate_prepare(void) { ... }``
> + *
> + * ``static int foo_liveupdate_reboot(void) { ... }``
> + *
> + * ``static int foo_liveupdate_finish(void) { ... }``
> + *
> + * ``static int foo_liveupdate_cancel(void) { ... }``
> + *
> + * ``LIVEUPDATE_DECLARE_NOTIFIER(foo, LIVEUPDATE_CB_PRIO_WITH_DEVICES);``
> + *

Hmm... Have you run kernel-doc validator? There is missing Return section and
it will warn about that.

> + */

...

> +		WARN_ONCE(rv, "cancel failed[%d]\n", rv);		\

+ bug.h

...

> + #undef pr_fmt
> + #define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt

Leftover from the development?

> +#undef pr_fmt

Not needed as long as pr_fmt9) is at the top of the file.

> +#define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt

...

> +#include <linux/kernel.h>

What for? Can you follow IWYU, please? Here again semi-random list of
inclusions.

> +#include <linux/sysfs.h>
> +#include <linux/string.h>
> +#include <linux/rwsem.h>
> +#include <linux/err.h>
> +#include <linux/liveupdate.h>
> +#include <linux/cpufreq.h>
> +#include <linux/kexec_handover.h>

Can you keep them ordered which will be easier to read and maintain?

...

> +static int __init early_liveupdate_param(char *buf)
> +{
> +	return kstrtobool(buf, &luo_enabled);
> +}

> +

Redundant blank line.

> +early_param("liveupdate", early_liveupdate_param);

...

> +/* Show the current live update state */
> +static ssize_t state_show(struct kobject *kobj,
> +			  struct kobj_attribute *attr,

It is still enough room even for the strict 80 limit case.

> +			  char *buf)
> +{
> +	return sysfs_emit(buf, "%s\n", LUO_STATE_STR);
> +}

...

> +		ret = blocking_notifier_call_chain(&luo_notify_list,
> +						   event,
> +						   NULL);

There is room on the previous lines. Ditto for the similar cases all over
the code.

...

> +{
> +	int ret;
> +
> +	if (down_write_killable(&luo_state_rwsem)) {
> +		pr_warn(" %s, change state canceled by user\n", __func__);

Why __func__ is so important in _this_ message? And why leading space?
Ditto for the similar cases.

> +		return -EAGAIN;
> +	}
> +
> +	if (!IS_STATE(LIVEUPDATE_STATE_NORMAL)) {
> +		pr_warn("Can't switch to [%s] from [%s] state\n",
> +			luo_state_str[LIVEUPDATE_STATE_PREPARED],
> +			LUO_STATE_STR);
> +		up_write(&luo_state_rwsem);
> +
> +		return -EINVAL;
> +	}
> +
> +	ret = luo_notify(LIVEUPDATE_PREPARE);
> +	if (!ret)
> +		luo_set_state(LIVEUPDATE_STATE_PREPARED);
> +
> +	up_write(&luo_state_rwsem);
> +
> +	return ret;
> +}

...

> +static ssize_t prepare_store(struct kobject *kobj,
> +			     struct kobj_attribute *attr,
> +			     const char *buf,
> +			     size_t count)
> +{
> +	ssize_t ret;
> +	long val;

> +	if (kstrtol(buf, 0, &val) < 0)
> +		return -EINVAL;

Shadower error code.


> +	if (val != 1 && val != 0)
> +		return -EINVAL;

What's wrong with using kstrtobool() from the beginning?

> +
> +	if (val)
> +		ret = luo_prepare();
> +	else
> +		ret = luo_cancel();

> +	if (!ret)
> +		ret = count;
> +
> +	return ret;

Can we go with the usual pattern "check for errors first"?

	if (ret)
		return ret;

	...

> +}

...

> +static ssize_t finish_store(struct kobject *kobj,
> +			    struct kobj_attribute *attr,
> +			    const char *buf,
> +			    size_t count)

Same comments as per above.

...

> +static struct attribute *luo_attrs[] = {
> +	&state_attribute.attr,
> +	&prepare_attribute.attr,
> +	&finish_attribute.attr,

> +	NULL,

No comma for the terminator entry.

> +};

...

> +static int __init luo_init(void)
> +{
> +	int ret;
> +
> +	if (!luo_enabled || !kho_is_enabled()) {
> +		pr_info("disabled by user\n");
> +		luo_enabled = false;
> +
> +		return 0;
> +	}

Can be written like

	if (!kho_is_enabled())
		luo_enabled = false;
	if (!luo_enabled) {
		pr_info("disabled by user\n");
		return 0;
	}

> +	ret = sysfs_create_group(kernel_kobj, &luo_attr_group);
> +	if (ret)
> +		pr_err("Failed to create group\n");
> +
> +	luo_sysfs_initialized = true;
> +	pr_info("Initialized\n");
> +
> +	return ret;

Something is odd here between (non-)checking for errors and printed messages.

> +}

...

> +EXPORT_SYMBOL_GPL(liveupdate_state_normal);

No namespace?

...

> --- a/kernel/reboot.c
> +++ b/kernel/reboot.c
> @@ -18,6 +18,7 @@
>  #include <linux/syscalls.h>
>  #include <linux/syscore_ops.h>
>  #include <linux/uaccess.h>
> +#include <linux/liveupdate.h>

Can oyu preserve order (with given context at least)?

-- 
With Best Regards,
Andy Shevchenko



  reply	other threads:[~2025-03-20 13:40 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-20  2:40 [RFC v1 0/3] Live Update Orchestrator Pasha Tatashin
2025-03-20  2:40 ` [RFC v1 1/3] luo: " Pasha Tatashin
2025-03-20 13:39   ` Andy Shevchenko [this message]
2025-03-20 16:35     ` Pasha Tatashin
2025-03-20 17:50       ` Andy Shevchenko
2025-03-20 18:30         ` Pasha Tatashin
2025-03-21 13:19           ` Andy Shevchenko
2025-03-20 14:43   ` Jason Gunthorpe
2025-03-20 19:00     ` Pasha Tatashin
2025-03-20 19:26       ` Jason Gunthorpe
2025-03-27 19:29         ` Pasha Tatashin
2025-03-31 16:37           ` Jason Gunthorpe
2025-04-25 17:21           ` Lukas Wunner
2025-03-20  2:40 ` [RFC v1 2/3] luo: dev_liveupdate: Add device live update infrastructure Pasha Tatashin
2025-03-20 13:34   ` Greg KH
2025-03-20 18:03     ` Pasha Tatashin
2025-03-20 20:51       ` Greg KH
2025-03-21  9:41         ` Bartosz Golaszewski
2025-03-20  2:40 ` [RFC v1 3/3] luo: x86: Enable live update support Pasha Tatashin
2025-03-20 13:35 ` [RFC v1 0/3] Live Update Orchestrator Greg KH
2025-03-20 15:34   ` Pasha Tatashin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z9wan08CpbvddHhc@smile.fi.intel.com \
    --to=andriy.shevchenko@linux.intel.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=aleksander.lobakin@intel.com \
    --cc=aliceryhl@google.com \
    --cc=anna.schumaker@oracle.com \
    --cc=axboe@kernel.dk \
    --cc=bartosz.golaszewski@linaro.org \
    --cc=bhelgaas@google.com \
    --cc=bp@alien8.de \
    --cc=changyuanl@google.com \
    --cc=chenridong@huawei.com \
    --cc=corbet@lwn.net \
    --cc=cw00.choi@samsung.com \
    --cc=dakr@kernel.org \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=djeffery@redhat.com \
    --cc=graf@amazon.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=hpa@zytor.com \
    --cc=ilpo.jarvinen@linux.intel.com \
    --cc=ira.weiny@intel.com \
    --cc=jannh@google.com \
    --cc=jgg@nvidia.com \
    --cc=jgowans@amazon.com \
    --cc=joel.granados@kernel.org \
    --cc=kanie@linux.alibaba.com \
    --cc=leon@kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux@weissschuh.net \
    --cc=lukas@wunner.de \
    --cc=mark.rutland@arm.com \
    --cc=masahiroy@kernel.org \
    --cc=mingo@redhat.com \
    --cc=mmaurer@google.com \
    --cc=myungjoo.ham@samsung.com \
    --cc=ojeda@kernel.org \
    --cc=pasha.tatashin@soleen.com \
    --cc=quic_zijuhu@quicinc.com \
    --cc=rafael@kernel.org \
    --cc=rdunlap@infradead.org \
    --cc=rientjes@google.com \
    --cc=roman.gushchin@linux.dev \
    --cc=rostedt@goodmis.org \
    --cc=rppt@kernel.org \
    --cc=song@kernel.org \
    --cc=stuart.w.hayes@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=vincent.guittot@linaro.org \
    --cc=wagi@kernel.org \
    --cc=x86@kernel.org \
    --cc=yesanishhere@gmail.com \
    --cc=yoann.congal@smile.fr \
    --cc=zhangguopeng@kylinos.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.