From: Pasha Tatashin
Date: Wed, 18 Jun 2025 13:43:18 -0400
Subject: Re: [RFC v2 05/16] luo: luo_core: integrate with KHO
To: Mike Rapoport
Cc: Pratyush Yadav, Jason Gunthorpe, jasonmiu@google.com, graf@amazon.com,
 changyuanl@google.com, dmatlack@google.com, rientjes@google.com,
 corbet@lwn.net, rdunlap@infradead.org, ilpo.jarvinen@linux.intel.com,
 kanie@linux.alibaba.com, ojeda@kernel.org, aliceryhl@google.com,
 masahiroy@kernel.org, akpm@linux-foundation.org, tj@kernel.org,
 yoann.congal@smile.fr, mmaurer@google.com, roman.gushchin@linux.dev,
 chenridong@huawei.com, axboe@kernel.dk, mark.rutland@arm.com,
 jannh@google.com, vincent.guittot@linaro.org, hannes@cmpxchg.org,
 dan.j.williams@intel.com, david@redhat.com, joel.granados@kernel.org,
 rostedt@goodmis.org, anna.schumaker@oracle.com, song@kernel.org,
 zhangguopeng@kylinos.cn, linux@weissschuh.net,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, gregkh@linuxfoundation.org, tglx@linutronix.de,
 mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
 x86@kernel.org, hpa@zytor.com, rafael@kernel.org, dakr@kernel.org,
 bartosz.golaszewski@linaro.org, cw00.choi@samsung.com,
 myungjoo.ham@samsung.com, yesanishhere@gmail.com,
 Jonathan.Cameron@huawei.com, quic_zijuhu@quicinc.com,
 aleksander.lobakin@intel.com, ira.weiny@intel.com,
 andriy.shevchenko@linux.intel.com, leon@kernel.org, lukas@wunner.de,
 bhelgaas@google.com, wagi@kernel.org, djeffery@redhat.com,
 stuart.w.hayes@gmail.com
References: <20250515182322.117840-1-pasha.tatashin@soleen.com>
 <20250515182322.117840-6-pasha.tatashin@soleen.com>
 <20250617152357.GB1376515@ziepe.ca>

On Wed, Jun 18, 2025 at 1:00 PM Pasha Tatashin wrote:
>
> On Wed, Jun 18, 2025 at 12:40 PM Mike Rapoport wrote:
> >
> > On Wed, Jun 18, 2025 at 10:48:09AM -0400, Pasha Tatashin wrote:
> > > On Wed, Jun 18, 2025 at 9:12 AM Pratyush Yadav wrote:
> > > >
> > > > On Tue, Jun 17 2025, Pasha Tatashin wrote:
> > > >
> > > > > On Tue, Jun 17, 2025 at 11:24 AM Jason Gunthorpe wrote:
> > > > >>
> > > > >> On Fri, Jun 13, 2025 at 04:58:27PM +0200, Pratyush Yadav wrote:
> > > > >> > On Sat, Jun 07 2025, Pasha Tatashin wrote:
> > > > >> > [...]
> > > > >> > >>
> > > > >> > >> This weirdness happens because luo_prepare() and luo_cancel() control
> > > > >> > >> the KHO state machine, but then also get controlled by it via the
> > > > >> > >> notifier callbacks. So the relationship between then is not clear.
> > > > >> > >> __luo_prepare() at least needs access to struct kho_serialization, so it
> > > > >> > >> needs to come from the callback. So I don't have a clear way to clean
> > > > >> > >> this all up off the top of my head.
> > > > >> > >
> > > > >> > > On production machine, without KHO_DEBUGFS, only LUO can control KHO
> > > > >> > > state, but if debugfs is enabled, KHO can be finalized manually, and
> > > > >> > > in this case LUO transitions to prepared state. In both cases, the
> > > > >> > > path is identical. The KHO debugfs path is only for
> > > > >> > > developers/debugging purposes.
> > > > >> >
> > > > >> > What I meant is that even without KHO_DEBUGFS, LUO drives KHO, but then
> > > > >> > KHO calls into LUO from the notifier, which makes the control flow
> > > > >> > somewhat convoluted. If LUO is supposed to be the only thing that
> > > > >> > interacts directly with KHO, maybe we should get rid of the notifier and
> > > > >> > only let LUO drive things.
> > > > >>
> > > > >> Yes, we should. I think we should consider the KHO notifiers and self
> > > > >> orchestration as obsoleted by LUO. That's why it was in debugfs
> > > > >> because we were not ready to commit to it.
> > > > >
> > > > > We could do that, however, there is one example KHO user
> > > > > `reserve_mem`, that is also not liveupdate related. So, it should
> > > > > either be removed or modified to be handled by LUO.
> > > >
> > > > It still depends on kho_finalize() being called, so it still needs
> > > > something to trigger its serialization. It is not automatic. And with
> > > > your proposed patch to make debugfs interface optional, it can't even be
> > > > used with the config disabled.
> > >
> > > At least for now, it can still be used via LUO going into prepare
> > > state, since LUO changes KHO into finalized state and reserve_mem is
> > > registered to be called back from KHO.
> > >
> > > > So if it must be explicitly triggered to be preserved, why not let the
> > > > trigger point be LUO instead of KHO? You can make reservemem a LUO
> > > > subsystem instead.
> > >
> > > Yes, LUO can do that, the only concern I raised is that `reserve_mem`
> > > is not really live update related.
> >
> > I only now realized what bothered me about "liveupdate". It's the name of
> > the driving usecase rather then the name of the technology it implements.
> > In the end what LUO does is a (more) sophisticated control for KHO.
> >
> > But essentially it's not that it actually implements live update, it
> > provides kexec handover control plane that enables live update.
> >
> > And since the same machinery can be used regardless of live update, and I'm
> > sure other usecases will appear as soon as the technology will become more
> > mature, it makes me think that we probably should just
> > s/liveupdate_/kho_control/g or something along those lines.
>
> I disagree, LUO is for liveupdate flows, and is designed specifically
> around the live update flows: brownout/blackout/post-liveupdate, it
> should not be generalized to anticipate some other random states, and
> it should only support participants that are related to live update:
> iommufd/vfiofd/kvmfd/memfd/eventfd and controled via "liveupdated" the
> userspace agent.
>
> KHO is for preserving memory, LUO uses KHO as a backbone for Live Update.
>
> > > > Although to be honest, things like reservemem (or IMA perhaps?) don't
> > > > really fit well with the explicit trigger mechanism. They can be carried
> > >
> > > Agreed. Another example I was thinking about is "kexec telemetry":
> > > precise time information about kexec, including shutdown, purgatory,
> > > boot. We are planning to propose kexec telemetry, and it could be LUO
> > > subsystem. On the other hand, it could be useful even without live
> > > update, just to measure precise kexec reboot time.
> > >
> > > > across kexec without needing userspace explicitly driving it. Maybe we
> > > > allow LUO subsystems to mark themselves as auto-preservable and LUO will
> > > > preserve them regardless of state being prepared? Something to think
> > > > about later down the line I suppose.
> > >
> > > We can start with adding `reserve_mem` as regular subsystem, and make
> > > this auto-preserve option a future expansion, when if needed.
> > > Presumably, `luoctl prepare` would work for whoever plans to use just
> > > `reserve_mem`.
> >
> > I think it would be nice to support auto-preserve sooner than later.
>
> Makes sense.
>
> > reserve_mem can already be useful for ftrace and pstore folks and if it
> > would survive a kexec without any userspace intervention it would be great.
>
> The pstore use case is only potential, correct? Or can it already use
> reserve_mem?

So currently, KHO provides the following two types of internal API:

Preserve memory and metadata
=========================
kho_preserve_folio() / kho_preserve_phys()
kho_unpreserve_folio() / kho_unpreserve_phys()
kho_restore_folio()
kho_add_subtree()
kho_retrieve_subtree()

State machine
===========
register_kho_notifier() / unregister_kho_notifier()
kho_finalize() / kho_abort()

We should remove the "State machine", and only keep the "Preserve
Memory" API functions.
At the time these functions are called, KHO should do the magic of
making sure that the memory gets preserved across the reboot. This
way, reserve_mem_init() would call: kho_preserve_folio() and
kho_add_subtree() during boot, and be done with it.

Pasha
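
To make the proposed flow concrete, here is a rough userspace sketch of
"preserve at call time, no notifier, no finalize step". It is only a model:
every model_* name and signature below is hypothetical and is not the
kernel's API. model_kho_preserve_range() stands in for
kho_preserve_folio() / kho_preserve_phys(), model_kho_add_subtree() for
kho_add_subtree(), and model_reserve_mem_init() for reserve_mem_init().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model only: none of these types or signatures exist in the kernel. */
#define MODEL_MAX_RANGES   16
#define MODEL_MAX_SUBTREES 8

struct model_range   { uint64_t phys; size_t size; };
struct model_subtree { const char *name; const void *blob; };

struct model_kho {
	struct model_range   ranges[MODEL_MAX_RANGES];
	size_t               nr_ranges;
	struct model_subtree subtrees[MODEL_MAX_SUBTREES];
	size_t               nr_subtrees;
};

/* The range is recorded as preserved right away; no later finalize step. */
static int model_kho_preserve_range(struct model_kho *kho, uint64_t phys, size_t size)
{
	if (size == 0 || kho->nr_ranges == MODEL_MAX_RANGES)
		return -1;
	kho->ranges[kho->nr_ranges].phys = phys;
	kho->ranges[kho->nr_ranges].size = size;
	kho->nr_ranges++;
	return 0;
}

/* Metadata is attached by name at call time; duplicate names are rejected. */
static int model_kho_add_subtree(struct model_kho *kho, const char *name, const void *blob)
{
	size_t i;

	if (!name || !blob || kho->nr_subtrees == MODEL_MAX_SUBTREES)
		return -1;
	for (i = 0; i < kho->nr_subtrees; i++)
		if (strcmp(kho->subtrees[i].name, name) == 0)
			return -1;
	kho->subtrees[kho->nr_subtrees].name = name;
	kho->subtrees[kho->nr_subtrees].blob = blob;
	kho->nr_subtrees++;
	return 0;
}

/* Boot-time user: two direct calls, no notifier registration, and done. */
static int model_reserve_mem_init(struct model_kho *kho, uint64_t phys, size_t size,
				  const void *metadata)
{
	int err = model_kho_preserve_range(kho, phys, size);

	if (err)
		return err;
	err = model_kho_add_subtree(kho, "reserve_mem", metadata);
	if (err)
		kho->nr_ranges--;	/* roll back the range recorded just above */
	return err;
}

/* Returns 1 if phys falls inside any recorded range, 0 otherwise. */
static int model_kho_is_preserved(const struct model_kho *kho, uint64_t phys)
{
	size_t i;

	assert(kho != NULL);
	for (i = 0; i < kho->nr_ranges; i++)
		if (phys >= kho->ranges[i].phys &&
		    phys - kho->ranges[i].phys < kho->ranges[i].size)
			return 1;
	return 0;
}
```

A caller would zero-initialize a struct model_kho, call
model_reserve_mem_init() once during "boot", and could then query
model_kho_is_preserved() at any point. The only thing the sketch is meant
to show is that nothing in it waits for a finalize or abort transition.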