From: Pasha Tatashin
Date: Fri, 30 May 2025 01:00:28 -0400
Subject: Re: [RFC v2 04/16] luo: luo_core: Live Update Orchestrator
To: Mike Rapoport
Cc: pratyush@kernel.org, jasonmiu@google.com, graf@amazon.com,
	changyuanl@google.com, dmatlack@google.com, rientjes@google.com,
	corbet@lwn.net, rdunlap@infradead.org, ilpo.jarvinen@linux.intel.com,
	kanie@linux.alibaba.com, ojeda@kernel.org, aliceryhl@google.com,
	masahiroy@kernel.org, akpm@linux-foundation.org, tj@kernel.org,
	yoann.congal@smile.fr, mmaurer@google.com, roman.gushchin@linux.dev,
	chenridong@huawei.com, axboe@kernel.dk, mark.rutland@arm.com,
	jannh@google.com, vincent.guittot@linaro.org, hannes@cmpxchg.org,
	dan.j.williams@intel.com, david@redhat.com, joel.granados@kernel.org,
	rostedt@goodmis.org, anna.schumaker@oracle.com, song@kernel.org,
	zhangguopeng@kylinos.cn, linux@weissschuh.net,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-mm@kvack.org, gregkh@linuxfoundation.org, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org, hpa@zytor.com, rafael@kernel.org, dakr@kernel.org,
	bartosz.golaszewski@linaro.org, cw00.choi@samsung.com,
	myungjoo.ham@samsung.com, yesanishhere@gmail.com,
	Jonathan.Cameron@huawei.com, quic_zijuhu@quicinc.com,
	aleksander.lobakin@intel.com, ira.weiny@intel.com,
	andriy.shevchenko@linux.intel.com, leon@kernel.org, lukas@wunner.de,
	bhelgaas@google.com, wagi@kernel.org, djeffery@redhat.com,
	stuart.w.hayes@gmail.com, ptyadav@amazon.de
References: <20250515182322.117840-1-pasha.tatashin@soleen.com>
	<20250515182322.117840-5-pasha.tatashin@soleen.com>

> > +config LIVEUPDATE
> > +	bool "Live Update Orchestrator"
> > +	depends on KEXEC_HANDOVER
> > +	help
> > +	  Enable the Live Update Orchestrator. Live Update is a mechanism,
> > +	  typically based on kexec, that allows the kernel to be updated
> > +	  while keeping selected devices operational across the transition.
> > +	  These devices are intended to be reclaimed by the new kernel and
> > +	  re-attached to their original workload without requiring a device
> > +	  reset.
> > +
> > +	  This functionality depends on specific support within device drivers
> > +	  and related kernel subsystems.
>
> This is not clear if the ability to reattach a device to the new kernel or
> the entire live update functionality depends on specific support with
> drivers.
>
> Probably better phrase it as
>
>	Ability to handover a device from old to new kernel depends ...

Updated.

>
> > +
> > +	  This feature is primarily used in cloud environments to quickly
> > +	  update the kernel hypervisor with minimal disruption to the
> > +	  running virtual machines.
>
> I wouldn't put it into Kconfig. If anything I'd make it
>
>	This feature primarily targets virtual machine hosts to quickly ...

Ok.

> > + * The core of LUO is a state machine that tracks the progress of a live update,
> > + * along with a callback API that allows other kernel subsystems to participate
> > + * in the process. Example subsystems that can hook into LUO include: kvm,
> > + * iommu, interrupts, vfio, participating filesystems, and mm.
>
> Please spell out memory management.

Done.

>
> > + * LUO uses KHO to transfer memory state from the current Kernel to the next
>
> A link to KHO docs would have been nice, but I'm not sure kernel-doc can do
> that nicely.

Added a link. A plain path to the .rst file is apparently converted to a
link correctly by Sphinx.

>
> > + * Kernel.
>
> Why capital 'K'? :)

Fixed.

>
> > + * The LUO state machine ensures that operations are performed in the correct
> > + * sequence and provides a mechanism to track and recover from potential
> > + * failures, and select devices and subsystems that should participate in
> > + * live update sequence.
> > + */
> > +
> > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > +
> > +#include
> > +#include
> > +#include
> > +#include
> > +#include
> > +#include "luo_internal.h"
> > +
> > +static DECLARE_RWSEM(luo_state_rwsem);
> > +
> > +enum liveupdate_state luo_state;
>
> static?

Fixed.

> Hmm, luo_state is initialized to 0 (NORMAL) which means we always start
> from NORMAL, although the second kernel is not in the normal state until
> the handover is complete.
> Maybe we need an initial "unknown" state until
> some of luo code starts running and would set an actual known state?

Added LIVEUPDATE_STATE_UNDEFINED, which exists only before LUO is
initialized during boot.

> > +const char *const luo_state_str[] = {
> > +	[LIVEUPDATE_STATE_NORMAL] = "normal",
> > +	[LIVEUPDATE_STATE_PREPARED] = "prepared",
> > +	[LIVEUPDATE_STATE_FROZEN] = "frozen",
> > +	[LIVEUPDATE_STATE_UPDATED] = "updated",
> > +};
> > +
> > +bool luo_enabled;
>
> static?

Fixed.

>
> > +static int __init early_liveupdate_param(char *buf)
> > +{
> > +	return kstrtobool(buf, &luo_enabled);
> > +}
> > +early_param("liveupdate", early_liveupdate_param);
> > +
> > +/* Return true if the current state is equal to the provided state */
> > +static inline bool is_current_luo_state(enum liveupdate_state expected_state)
> > +{
> > +	return READ_ONCE(luo_state) == expected_state;
> > +}
> > +
> > +static void __luo_set_state(enum liveupdate_state state)
> > +{
> > +	WRITE_ONCE(luo_state, state);
> > +}
> > +
> > +static inline void luo_set_state(enum liveupdate_state state)
> > +{
> > +	pr_info("Switched from [%s] to [%s] state\n",
> > +		LUO_STATE_STR, luo_state_str[state]);
>
> Maybe LUO_CURRENT_STATE_STR?

Done.

> > +	__luo_set_state(state);
> > +}
> > +
> > +static int luo_do_freeze_calls(void)
> > +{
> > +	return 0;
> > +}
> > +
> > +static void luo_do_finish_calls(void)
> > +{
> > +}
> > +
> > +int luo_prepare(void)
> > +{
> > +	return 0;
> > +}
> > +
> > +/**
> > + * luo_freeze() - Initiate the final freeze notification phase for live update.
> > + *
> > + * Attempts to transition the live update orchestrator state from
> > + * %LIVEUPDATE_STATE_PREPARED to %LIVEUPDATE_STATE_FROZEN. This function is
> > + * typically called just before the actual reboot system call (e.g., kexec)
> > + * is invoked, either directly by the orchestration tool or potentially from
> > + * within the reboot syscall path itself.
> > + *
> > + * Based on the outcome of the notification process:
> > + * - If luo_do_freeze_calls() returns 0 (all callbacks succeeded), the state
> > + *   is set to %LIVEUPDATE_STATE_FROZEN using luo_set_state(), indicating
> > + *   readiness for the imminent kexec.
> > + * - If luo_do_freeze_calls() returns a negative error code (a callback
> > + *   failed), the state is reverted to %LIVEUPDATE_STATE_NORMAL using
> > + *   luo_set_state() to cancel the live update attempt.
>
> The kernel-doc comments are mostly for users of a function and describe how
> it should be used rather how it is implemented.

SGTM, cleaned up.

> I don't think it's important to mention return values of
> luo_do_freeze_calls() here. The important things are whether registered
> subsystems succeeded to freeze or not and the state changes.
> I'd also mention that if a subsystem fails to freeze, everything is
> canceled.

Added.

> > +/**
> > + * luo_finish - Finalize the live update process in the new kernel.
> > + *
> > + * This function is called after a successful live update reboot into a new
> > + * kernel, once the new kernel is ready to transition to the normal operational
> > + * state. It signals the completion of the live update sequence to subsystems.
> > + *
> > + * It first attempts to acquire the write lock for the orchestrator state.
> > + *
> > + * Then, it checks if the system is in the ``LIVEUPDATE_STATE_UPDATED`` state.
> > + * If not, it logs a warning and returns ``-EINVAL``.
> > + *
> > + * If the state is correct, it triggers the ``LIVEUPDATE_FINISH`` notifier
>
> Here too, you describe what the function does rather how it should be used

Fixed.

>
> > + * chain. Note that the return value of the notifier is intentionally ignored as
> > + * finish callbacks must not fail. Finally, the orchestrator state is
>
> And what should happen if there was an error in a finish callback?
Scream, warn, panic: we cannot allow the system to keep running past a
live update if some state was not properly passed from the previous
kernel to the current one. That could result in catastrophic memory
leaks.

> > +static int __init luo_startup(void)
> > +{
> > +	__luo_set_state(LIVEUPDATE_STATE_NORMAL);
> > +
> > +	return 0;
> > +}
> > +early_initcall(luo_startup);
>
> This means that the second kernel starts with luo_state == NORMAL, then
> at early_initcall transitions to NORMAL again and later is set to UPDATED,
> doesn't it?

In the next patch, this function transitions to UPDATED, so technically
we go from NORMAL to UPDATED. However, I added the UNDEFINED state, so
in this function we now go either from UNDEFINED to UPDATED or from
UNDEFINED to NORMAL.

> > + * @return true if the system is in the ``LIVEUPDATE_STATE_NORMAL`` state,
> > + * false otherwise.
> > + */
> > +bool liveupdate_state_normal(void)
> > +{
> > +	return is_current_luo_state(LIVEUPDATE_STATE_NORMAL);
> > +}
> > +EXPORT_SYMBOL_GPL(liveupdate_state_normal);
>
> Won't liveupdate_get_state() do?

Yeah, we could simply return the state and let the caller compare.
However, I think the caller is only interested in whether this is the
normal state or a live update is in progress. I will keep these helpers,
and I have also added liveupdate_get_state().

> > +
> > +/**
> > + * liveupdate_enabled - Check if the live update feature is enabled.
> > + *
> > + * This function returns the state of the live update feature flag, which
> > + * can be controlled via the ``liveupdate`` kernel command-line parameter.
> > + *
> > + * @return true if live update is enabled, false otherwise.
> > + */
> > +bool liveupdate_enabled(void)
> > +{
> > +	return luo_enabled;
> > +}
> > +EXPORT_SYMBOL_GPL(liveupdate_enabled);
> > diff --git a/drivers/misc/liveupdate/luo_internal.h b/drivers/misc/liveupdate/luo_internal.h
> > new file mode 100644
> > index 000000000000..34e73fb0318c
> > --- /dev/null
> > +++ b/drivers/misc/liveupdate/luo_internal.h
> > @@ -0,0 +1,26 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +/*
> > + * Copyright (c) 2025, Google LLC.
> > + * Pasha Tatashin
> > + */
> > +
> > +#ifndef _LINUX_LUO_INTERNAL_H
> > +#define _LINUX_LUO_INTERNAL_H
> > +
> > +int luo_cancel(void);
> > +int luo_prepare(void);
> > +int luo_freeze(void);
> > +int luo_finish(void);
> > +
> > +void luo_state_read_enter(void);
> > +void luo_state_read_exit(void);
> > +
> > +extern const char *const luo_state_str[];
> > +
> > +/* Get the current state as a string */
> > +#define LUO_STATE_STR luo_state_str[READ_ONCE(luo_state)]
>
> IIUC you need the macro to have LUO_STATE_STR available in all files in
> liveupdate/ but without exposing luo_state.
>
> I think that we can do a function call to get that string, will make things
> nicer IMHO.

Done.

>
> > +
> > +extern enum liveupdate_state luo_state;
> > +
> > +#endif /* _LINUX_LUO_INTERNAL_H */
> > diff --git a/include/linux/liveupdate.h b/include/linux/liveupdate.h
> > new file mode 100644
> > index 000000000000..c2740da70958
> > --- /dev/null
> > +++ b/include/linux/liveupdate.h
> > @@ -0,0 +1,131 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +/*
> > + * Copyright (c) 2025, Google LLC.
> > + * Pasha Tatashin
> > + */
> > +#ifndef _LINUX_LIVEUPDATE_H
> > +#define _LINUX_LIVEUPDATE_H
> > +
> > +#include
> > +#include
> > +#include
> > +
> > +/**
> > + * enum liveupdate_event - Events that trigger live update callbacks.
> > + * @LIVEUPDATE_PREPARE: PREPARE should happens *before* the blackout window.
>
> should happen or happens ;-)

Done.

>
> > + *                      Subsystems should prepare for an upcoming reboot by
> > + *                      serializing their states. However, it must be considered
>
> It's not only about state serialization, it's also about adjusting
> operational mode so that state that was serialized won't be changed or at
> least the changes from PREPARE to FREEZE would be accounted somehow.

By serialization I mean saving their state, but I agree: devices and
resources should also be put into a limited mode in which the serialized
data is not altered between prepare and freeze (e.g. no memfd resizing,
no new DMA mappings).

Thank you for your comments.

Pasha
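
For reference, the state handling discussed in this reply can be modeled
in user space. The following is a minimal sketch under stated
assumptions, not the code from the next revision. It assumes
LIVEUPDATE_STATE_UNDEFINED is the zero value, and the "undefined"
string, luo_current_state_str(), the handed_over parameter of
luo_startup(), the freeze_calls_ret parameter of luo_freeze() and the
signature of liveupdate_get_state() are hypothetical stand-ins. Locking,
READ_ONCE()/WRITE_ONCE() and the notifier chains are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* States named in the thread; UNDEFINED = 0 is an assumption. */
enum liveupdate_state {
	LIVEUPDATE_STATE_UNDEFINED = 0,	/* only before LUO init during boot */
	LIVEUPDATE_STATE_NORMAL,
	LIVEUPDATE_STATE_PREPARED,
	LIVEUPDATE_STATE_FROZEN,
	LIVEUPDATE_STATE_UPDATED,
};

/* Zero-initialized, so the model starts in UNDEFINED rather than NORMAL. */
static enum liveupdate_state luo_state;

static const char *const luo_state_str[] = {
	[LIVEUPDATE_STATE_UNDEFINED]	= "undefined",	/* assumed string */
	[LIVEUPDATE_STATE_NORMAL]	= "normal",
	[LIVEUPDATE_STATE_PREPARED]	= "prepared",
	[LIVEUPDATE_STATE_FROZEN]	= "frozen",
	[LIVEUPDATE_STATE_UPDATED]	= "updated",
};

/* Hypothetical function replacing the LUO_STATE_STR macro. */
const char *luo_current_state_str(void)
{
	return luo_state_str[luo_state];
}

enum liveupdate_state liveupdate_get_state(void)
{
	return luo_state;
}

bool liveupdate_state_normal(void)
{
	return luo_state == LIVEUPDATE_STATE_NORMAL;
}

/*
 * Boot-time transition: UNDEFINED -> UPDATED when state was handed over
 * by the previous kernel, UNDEFINED -> NORMAL otherwise. handed_over is
 * a stand-in for however the real code detects a live update boot.
 */
void luo_startup(bool handed_over)
{
	assert(luo_state == LIVEUPDATE_STATE_UNDEFINED);
	luo_state = handed_over ? LIVEUPDATE_STATE_UPDATED
				: LIVEUPDATE_STATE_NORMAL;
}

/* NORMAL -> PREPARED; any other starting state is rejected. */
int luo_prepare(void)
{
	if (luo_state != LIVEUPDATE_STATE_NORMAL)
		return -EINVAL;
	luo_state = LIVEUPDATE_STATE_PREPARED;
	return 0;
}

/*
 * PREPARED -> FROZEN when the freeze callbacks succeeded. If any of them
 * failed, everything is canceled and the state goes back to NORMAL.
 * freeze_calls_ret stands in for the result of the freeze callbacks.
 */
int luo_freeze(int freeze_calls_ret)
{
	if (luo_state != LIVEUPDATE_STATE_PREPARED)
		return -EINVAL;
	luo_state = freeze_calls_ret ? LIVEUPDATE_STATE_NORMAL
				     : LIVEUPDATE_STATE_FROZEN;
	return freeze_calls_ret;
}
```

In this model a failed freeze cancels the update: calling
luo_freeze(-EBUSY) from the prepared state returns -EBUSY and leaves the
state at normal, while luo_freeze(0) moves it to frozen.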