From: Pasha Tatashin
Date: Fri, 20 Jun 2025 12:03:59 -0400
Subject: Re: [RFC v2 05/16] luo: luo_core: integrate with KHO
To: Pratyush Yadav
Cc: Mike Rapoport, Jason Gunthorpe, jasonmiu@google.com, graf@amazon.com,
 changyuanl@google.com, dmatlack@google.com, rientjes@google.com,
 corbet@lwn.net, rdunlap@infradead.org, ilpo.jarvinen@linux.intel.com,
 kanie@linux.alibaba.com, ojeda@kernel.org, aliceryhl@google.com,
 masahiroy@kernel.org, akpm@linux-foundation.org, tj@kernel.org,
 yoann.congal@smile.fr, mmaurer@google.com, roman.gushchin@linux.dev,
 chenridong@huawei.com, axboe@kernel.dk, mark.rutland@arm.com,
 jannh@google.com, vincent.guittot@linaro.org, hannes@cmpxchg.org,
 dan.j.williams@intel.com, david@redhat.com, joel.granados@kernel.org,
 rostedt@goodmis.org, anna.schumaker@oracle.com, song@kernel.org,
 zhangguopeng@kylinos.cn, linux@weissschuh.net,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, gregkh@linuxfoundation.org, tglx@linutronix.de,
 mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
 x86@kernel.org, hpa@zytor.com, rafael@kernel.org, dakr@kernel.org,
 bartosz.golaszewski@linaro.org, cw00.choi@samsung.com,
 myungjoo.ham@samsung.com, yesanishhere@gmail.com,
 Jonathan.Cameron@huawei.com, quic_zijuhu@quicinc.com,
 aleksander.lobakin@intel.com, ira.weiny@intel.com,
 andriy.shevchenko@linux.intel.com, leon@kernel.org, lukas@wunner.de,
 bhelgaas@google.com, wagi@kernel.org, djeffery@redhat.com,
 stuart.w.hayes@gmail.com
References: <20250617152357.GB1376515@ziepe.ca>


On Fri, Jun 20, 2025 at 11:28 AM Pratyush Yadav wrote:
>
> Hi Pasha,
>
> On Thu, Jun 19 2025, Pasha Tatashin wrote:
>
> [...]
> >> And it has to be done before kexec load, at least until we resolve this.
> >
> > The before-kexec-load constraint has been fixed. The only
> > "finalization" constraint we have is that it should happen before
> > reboot(LINUX_REBOOT_CMD_KEXEC), and only because memory allocations
> > during kernel shutdown are undesirable. Once KHO moves away from a
> > monolithic state machine, this constraint disappears. Kernel components
> > could preserve their resources at appropriate times, not necessarily
> > tied to shutdown time.
> > For live update scenarios, LUO already orchestrates this timing.
> >
> >> Currently this is triggered either by KHO debugfs or by LUO ioctls. If we
> >> completely drop KHO debugfs and notifiers, we still need something that
> >> would trigger the magic.
> >
> > An external "magic trigger" for KHO (like the current finalize
> > notifier or debugfs command) is necessary for scenarios like live
> > update, where userspace resources are being preserved in a coordinated
> > fashion just before kexec.
> >
> > For kernel-internal resources that are unrelated to such a
> > userspace-driven live update flow, the respective kernel components
> > should directly use KHO's primitive preservation APIs
> > (kho_preserve_folio, etc.) when they need to mark their resources for
> > handover. No separate state machine or external trigger should be
> > required for these individual, self-contained preservation acts.
>

Hi Pratyush,

> For kernel-internal components, I think this makes a lot of sense,
> especially now that we don't need to get everything done by kexec load
> time. I suppose the liveupdate_reboot() call at reboot time to prepare
> final things can be useful, but subsystems can just as well register
> reboot notifiers to get the same notification.

Correct. If subsystems unrelated to the userspace live update flow, such
as pstore, tracing, telemetry, debugging, or IMA, need to be notified
about a reboot, they can simply register their own reboot notifier.

> >> I'm not saying we should keep KHO debugfs and notifiers, I'm saying that if
> >> we make LUO the only thing driving KHO, liveupdate is not an appropriate
> >> name.
> >
> > LUO drives KHO specifically for the purpose of live updates. If a
> > different userspace use-case emerges that needs another distinct
> > purpose (e.g., not to preserve an FD or a device across kernel reboot
> > (i.e.
> > something for which LUO does not provide uAPI)), then that would
> > probably need a uAPI separate from LUO instead of extending the LUO
> > uAPI.
>
> Outside of hypervisor live update, I have a very clear use case in mind:
> userspace memory handover (on guest side). Say a guest running an
> in-memory cache like memcached with many gigabytes of cache wants to
> reboot. It can just shove the cache into a memfd, give it to LUO, and
> restore it after reboot. Some services that suffer from long reboots are
> looking into using this to reduce downtime. Since it pretty much
> overlaps with the hypervisor work for now, I haven't been talking about
> it as much.
>
> Would you also call this use case "live update"? Does it also fit with
> your vision of where LUO should go?

Yes, absolutely. The use case you described (preserving a memcached
instance via memfd) is a perfect fit for LUO's vision.

While the primary use case driving this work is supporting the
preservation of virtual machines on a hypervisor, the framework itself
is not restricted to that scenario. We define "live update" as the
process of updating the kernel from one version to another while
preserving FD-based resources and keeping selected devices operational.
The machine itself can be running storage, database, networking,
containers, or anything else.

A good parallel is Kernel Live Patching: we don't distinguish what
workload is running on a machine when applying a security patch; we
simply patch the running kernel. In the same way, Live Update is
designed to be workload-agnostic. Whether the system is running an
in-memory database, containers, or VMs, its primary goal is to enable a
full kernel update while preserving the userspace-requested state.

Thanks,
Pasha
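
For reference, the userspace memory handover case discussed above starts
with moving the cache into a memfd. Below is a minimal userspace sketch
of just that step, assuming Linux with glibc 2.27 or later for
memfd_create(). The helper name cache_to_memfd() and the
"cache-handover" label are made up for illustration, and the LUO
hand-off itself is left out because its uAPI is still at the RFC stage.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy `len` bytes (len must be non-zero) into a freshly created memfd.
 * Returns the memfd file descriptor, or -1 on failure.
 * Hypothetical helper for illustration; not part of any LUO or KHO API.
 */
int cache_to_memfd(const void *data, size_t len)
{
	void *map;
	int fd;

	/* The name is only a debugging label shown in /proc/<pid>/fd. */
	fd = memfd_create("cache-handover", MFD_CLOEXEC);
	if (fd < 0)
		return -1;

	/* Size the anonymous file so it can hold the cache contents. */
	if (ftruncate(fd, (off_t)len) < 0)
		goto err;

	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto err;

	memcpy(map, data, len);
	munmap(map, len);

	/*
	 * Assumption: at this point the fd would be registered with LUO
	 * for preservation. That call is deliberately not shown here.
	 */
	return fd;

err:
	close(fd);
	return -1;
}
```

A real cache would more likely allocate its memory in the memfd from the
start rather than copy into it before reboot; the copy here only keeps
the sketch short.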