From: Pasha Tatashin
Date: Thu, 19 Jun 2025 10:22:52 -0400
Subject: Re: [RFC v2 05/16] luo: luo_core: integrate with KHO
To: Mike Rapoport
Cc: Pratyush Yadav, Jason Gunthorpe, jasonmiu@google.com,
 graf@amazon.com, changyuanl@google.com, dmatlack@google.com,
 rientjes@google.com, corbet@lwn.net, rdunlap@infradead.org,
 ilpo.jarvinen@linux.intel.com, kanie@linux.alibaba.com, ojeda@kernel.org,
 aliceryhl@google.com, masahiroy@kernel.org, akpm@linux-foundation.org,
 tj@kernel.org, yoann.congal@smile.fr, mmaurer@google.com,
 roman.gushchin@linux.dev, chenridong@huawei.com, axboe@kernel.dk,
 mark.rutland@arm.com, jannh@google.com, vincent.guittot@linaro.org,
 hannes@cmpxchg.org, dan.j.williams@intel.com, david@redhat.com,
 joel.granados@kernel.org, rostedt@goodmis.org, anna.schumaker@oracle.com,
 song@kernel.org, zhangguopeng@kylinos.cn, linux@weissschuh.net,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, gregkh@linuxfoundation.org, tglx@linutronix.de,
 mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
 x86@kernel.org, hpa@zytor.com, rafael@kernel.org, dakr@kernel.org,
 bartosz.golaszewski@linaro.org, cw00.choi@samsung.com,
 myungjoo.ham@samsung.com, yesanishhere@gmail.com,
 Jonathan.Cameron@huawei.com, quic_zijuhu@quicinc.com,
 aleksander.lobakin@intel.com, ira.weiny@intel.com,
 andriy.shevchenko@linux.intel.com, leon@kernel.org, lukas@wunner.de,
 bhelgaas@google.com, wagi@kernel.org, djeffery@redhat.com,
 stuart.w.hayes@gmail.com
References: <20250617152357.GB1376515@ziepe.ca>

> > > I disagree, LUO is for liveupdate flows, and is designed specifically
> > > around the live update flows: brownout/blackout/post-liveupdate, it
> > > should not be generalized to anticipate some other random states, and
> > > it should only support participants that are related to live update:
> > > iommufd/vfiofd/kvmfd/memfd/eventfd and controled via "liveupdated" the
> > > userspace agent.
>
> But it's not how the things work. Once there's an API anyone can use it,
> right?
>
> How do you intend to restrict this API usage to subsystems that are related
> to the live update flow? Or userspace driving ioctls outside "liveupdated"
> user agent?

Hi Mike,

LUO provides both kernel and user APIs specifically for live update
scenarios. Live Update is the ability to reboot the kernel while keeping
some device operations and FDs intact. That is the only uAPI that LUO
provides. It enables users to preserve resources via FDs for memfd,
vfiofd, guestmemfd, kvmfd, eventfd, and any other supported FD. It also
provides a well-defined state machine for the user to add and retrieve
resources, and for the kernel to serialize these resources properly.
Since this is the only uAPI that LUO provides, I do not see how it can
be used for other scenarios.

> There are a lot of examples of kernel subsystems that were designed for a
> particular thing and later were extended to support additional use cases.

If that ever becomes necessary, either the core part would need to be
moved out into a separate component, or a separate state machine on top
of KHO targeting that use case would need to be developed.
Currently, I don't see an immediate need for this, especially if KHO
itself is updated so that the state machine is removed and finalization
is therefore no longer required.

> I'm not saying LUO should "anticipate some other random states", what I'm
> saying is that usecases other than liveupdate may appear and use the APIs
> LUO provides for something else.
>
> > > KHO is for preserving memory, LUO uses KHO as a backbone for Live Update.
>
> If we make LUO the only uABI to drive KHO it becomes misnamed from the
> start.
> As you mentioned yourself, reserve_mem and potentially IMA and kexec

Kernel-internal components like pstore/reserve_mem or IMA do not require
a uAPI to drive their KHO interactions. They can, and should, directly
use KHO's kernel-level APIs such as kho_preserve_folio() and
kho_restore_folio(). KHO itself must offer these preservation
primitives, rather than embedding a state machine that dictates a single
"finalize" point for all users.

> pstore can use reserve_mem already.

That's good to know; I'll investigate how pstore currently utilizes
reserve_mem. My current approach involves reserving the memmap for
pstore via kernel parameters.

> > So currently, KHO provides the following two types of internal API:
> >
> > Preserve memory and metadata
> > =========================
> > kho_preserve_folio() / kho_preserve_phys()
> > kho_unpreserve_folio() / kho_unpreserve_phys()
> > kho_restore_folio()
> >
> > kho_add_subtree() kho_retrieve_subtree()
> >
> > State machine
> > ===========
> > register_kho_notifier() / unregister_kho_notifier()
> >
> > kho_finalize() / kho_abort()
> >
> > We should remove the "State machine", and only keep the "Preserve
> > Memory" API functions. At the time these functions are called, KHO
> > should do the magic of making sure that the memory gets preserved
> > across the reboot.
> >
> > This way, reserve_mem_init() would call: kho_preserve_folio() and
> > kho_add_subtree() during boot, and be done with it.
>
> Right, but we still need something to drive kho_mem_serialize().

My view is that an explicit, global kho_mem_serialize() call driven
externally (for example by LUO or debugfs) is not necessary for KHO
operations. When kho_preserve_folio() or kho_add_subtree() is called,
KHO itself should perform the immediate actions required to ensure that
the specific folio or subtree metadata is staged for preservation across
a kexec. Similarly, kho_unpreserve_folio() or kho_remove_subtree()
(which is currently missing from the KHO API) should immediately update
KHO's state to reflect that the item is no longer preserved.

> And it has to be done before kexec load, at least until we resolve this.

The before-kexec-load constraint has been fixed. The only "finalization"
constraint we have is that it must happen before
reboot(LINUX_REBOOT_CMD_KEXEC), and only because memory allocations
during kernel shutdown are undesirable. Once KHO moves away from a
monolithic state machine, this constraint disappears. Kernel components
could preserve their resources at appropriate times, not necessarily
tied to shutdown time. For live update scenarios, LUO already
orchestrates this timing.

> Currently this is triggered either by KHO debugfs or by LUO ioctls. If we
> completely drop KHO debugfs and notifiers, we still need something that
> would trigger the magic.

An external "magic trigger" for KHO (like the current finalize notifier
or debugfs command) is necessary for scenarios like live update, where
userspace resources are being preserved in a coordinated fashion just
before kexec. For kernel-internal resources that are unrelated to such a
userspace-driven live update flow, the respective kernel components
should directly use KHO's primitive preservation APIs
(kho_preserve_folio(), etc.) when they need to mark their resources for
handover. No separate state machine or external trigger should be
required for these individual, self-contained preservation acts.
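
To make the "no global finalize" idea concrete, here is a small
userspace toy model in C. It is only a sketch under stated assumptions:
struct toy_kho and the toy_kho_*() helpers are hypothetical names
invented for illustration, the fixed-size table merely stands in for
whatever tracking structure KHO actually uses, and none of this is the
real KHO implementation or API. The point it tries to show is that each
preserve/unpreserve call updates the staged set immediately, so no later
serialization step is needed.

```c
/* Userspace toy model, NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PRESERVED 16

struct toy_kho {
	unsigned long pfn[TOY_MAX_PRESERVED];	/* staged page frame numbers */
	size_t count;				/* number of staged entries */
};

/* Stage a PFN right away; there is no later "finalize" step. */
static int toy_kho_preserve(struct toy_kho *kho, unsigned long pfn)
{
	assert(kho->count <= TOY_MAX_PRESERVED);

	for (size_t i = 0; i < kho->count; i++)
		if (kho->pfn[i] == pfn)
			return 0;	/* already staged, nothing to do */

	if (kho->count == TOY_MAX_PRESERVED)
		return -1;		/* table full */

	kho->pfn[kho->count++] = pfn;
	return 0;
}

/* Drop a PFN right away, so the staged set always matches reality. */
static int toy_kho_unpreserve(struct toy_kho *kho, unsigned long pfn)
{
	for (size_t i = 0; i < kho->count; i++) {
		if (kho->pfn[i] == pfn) {
			/* Overwrite the hole with the last entry. */
			kho->pfn[i] = kho->pfn[--kho->count];
			return 0;
		}
	}
	return -1;			/* was not preserved */
}

/* What would be handed over at any instant: simply the current table. */
static bool toy_kho_is_preserved(const struct toy_kho *kho, unsigned long pfn)
{
	for (size_t i = 0; i < kho->count; i++)
		if (kho->pfn[i] == pfn)
			return true;
	return false;
}
```

In this model a caller in the spirit of reserve_mem_init() would call
toy_kho_preserve() once during boot and be done, while a coordinated
flow such as live update would still decide for itself when its own
calls happen.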

> I'm not saying we should keep KHO debugfs and notifiers, I'm saying that if
> we make LUO the only thing driving KHO, liveupdate is not an appropriate
> name.

LUO drives KHO specifically for the purpose of live updates. If a
different userspace use case emerges that serves another distinct
purpose (e.g., not preserving an FD or a device across a kernel reboot,
i.e. something for which LUO does not provide a uAPI), then that would
probably need a uAPI separate from LUO rather than an extension of the
LUO uAPI.

Pasha