All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pratyush Yadav <pratyush@kernel.org>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Pratyush Yadav <pratyush@kernel.org>,
	 Pasha Tatashin <pasha.tatashin@soleen.com>,
	 jasonmiu@google.com,  graf@amazon.com, changyuanl@google.com,
	 rppt@kernel.org,  dmatlack@google.com, rientjes@google.com,
	 corbet@lwn.net,  rdunlap@infradead.org,
	ilpo.jarvinen@linux.intel.com,  kanie@linux.alibaba.com,
	ojeda@kernel.org,  aliceryhl@google.com,  masahiroy@kernel.org,
	akpm@linux-foundation.org,  tj@kernel.org,
	 yoann.congal@smile.fr, mmaurer@google.com,
	 roman.gushchin@linux.dev,  chenridong@huawei.com,
	axboe@kernel.dk,  mark.rutland@arm.com,  jannh@google.com,
	vincent.guittot@linaro.org,  hannes@cmpxchg.org,
	dan.j.williams@intel.com,  david@redhat.com,
	 joel.granados@kernel.org, rostedt@goodmis.org,
	 anna.schumaker@oracle.com,  song@kernel.org,
	zhangguopeng@kylinos.cn,  linux@weissschuh.net,
	linux-kernel@vger.kernel.org,  linux-doc@vger.kernel.org,
	linux-mm@kvack.org,  gregkh@linuxfoundation.org,
	 tglx@linutronix.de, mingo@redhat.com,  bp@alien8.de,
	 dave.hansen@linux.intel.com, x86@kernel.org,  hpa@zytor.com,
	 rafael@kernel.org,  dakr@kernel.org,
	bartosz.golaszewski@linaro.org,  cw00.choi@samsung.com,
	myungjoo.ham@samsung.com,  yesanishhere@gmail.com,
	Jonathan.Cameron@huawei.com,  quic_zijuhu@quicinc.com,
	aleksander.lobakin@intel.com,  ira.weiny@intel.com,
	andriy.shevchenko@linux.intel.com,  leon@kernel.org,
	 lukas@wunner.de, bhelgaas@google.com,  wagi@kernel.org,
	 djeffery@redhat.com, stuart.w.hayes@gmail.com,
	 lennart@poettering.net,  brauner@kernel.org,
	linux-api@vger.kernel.org,  linux-fsdevel@vger.kernel.org,
	saeedm@nvidia.com,  ajayachandra@nvidia.com,  parav@nvidia.com,
	leonro@nvidia.com,  witu@nvidia.com,  hughd@google.com,
	skhawaja@google.com,  chrisl@kernel.org,
	 steven.sistare@oracle.com
Subject: Re: [PATCH v4 00/30] Live Update Orchestrator
Date: Mon, 27 Oct 2025 12:37:44 +0100	[thread overview]
Message-ID: <mafs0y0owd187.fsf@kernel.org> (raw)
In-Reply-To: <20251020142924.GS316284@nvidia.com> (Jason Gunthorpe's message of "Mon, 20 Oct 2025 11:29:24 -0300")

On Mon, Oct 20 2025, Jason Gunthorpe wrote:

> On Tue, Oct 14, 2025 at 03:29:59PM +0200, Pratyush Yadav wrote:
>> > 1) Use a vmalloc and store a list of the PFNs in the pool. Pool becomes
>> >    frozen, can't add/remove PFNs.
>> 
>> Doesn't that circumvent LUO's state machine? The idea with the state
>> machine was to have clear points in time when the system goes into the
>> "limited capacity"/"frozen" state, which is the LIVEUPDATE_PREPARE
>> event. 
>
> I wouldn't get too invested in the FSM, it is there but it doesn't
> mean every luo client has to be focused on it.

Having each subsystem have its own state machine sounds like a bad idea
to me. It can get tricky to manage both for us and our users.

>
>> With what you propose, the first FD being preserved implicitly
>> triggers the prepare event. Same thing for unprepare/cancel operations.
>
> Yes, this is easy to write and simple to manage.
>
>> I am wondering if it is better to do it the other way round: prepare all
>> files first, and then prepare the hugetlb subsystem at
>> LIVEUPDATE_PREPARE event. At that point it already knows which pages to
>> mark preserved so the serialization can be done in one go.
>
> I think this would be slower and more complex?
>
>> > 2) Require the users of hugetlb memory, like memfd, to
>> >    preserve/restore the folios they are using (using their hugetlb order)
>> > 3) Just before kexec run over the PFN list and mark a bit if the folio
>> >    was preserved by KHO or not. Make sure everything gets KHO
>> >    preserved.
>> 
>> "just before kexec" would need a callback from LUO. I suppose a
>> subsystem is the place for that callback. I wrote my email under the
>> (wrong) impression that we were replacing subsystems.
>
> The file descriptors path should have luo client ops that have all
> the required callbacks. This is probably an existing op.
>
>> That makes me wonder: how is the subsystem-level callback supposed to
>> access the global data? I suppose it can use the liveupdate_file_handler
>> directly, but it is kind of strange since technically the subsystem and
>> file handler are two different entities.
>
> If we need such things we would need a way to link these together, but
> I'm wonder if we really don't..
>
>> Also as Pasha mentioned, 1G pages for guest_memfd will use hugetlb, and
>> I'm not sure how that would map with this shared global data. memfd and
>> guest_memfd will likely have different liveupdate_file_handler but would
>> share data from the same subsystem. Maybe that's a problem to solve for
>> later...
>
> On preserve memfd should call into hugetlb to activate it as a hugetlb
> page provider and preserve it too.

From what I understand, the main problem you want to solve is that the
life cycle of the global data should be tied to the file descriptors.
And since everything should have a FD anyway, can't we directly tie the
subsystems to file handlers? The subsystem gets a "preserve" callback
when the first FD that uses it gets preserved. It gets a "unpreserve"
callback when the last FD goes away. And the rest of the state machine
like prepare, cancel, etc. stay the same.

I think this gives us a clean abstraction that has LUO-managed lifetime.

It also works with the guest_memfd and memfd case since both can have
hugetlb as their underlying subsystem. For example,

static const struct liveupdate_file_ops memfd_luo_file_ops = {
	.preserve = memfd_luo_preserve,
	.unpreserve = memfd_luo_unpreserve,
	[...]
	.subsystem = &luo_hugetlb_subsys,
};

And then luo_{un,}preserve_file() can keep a refcount for the subsystem
and preserve or unpreserve the subsystem as needed. LUO can manage the
locking for these callbacks too.

-- 
Regards,
Pratyush Yadav

  reply	other threads:[~2025-10-27 11:37 UTC|newest]

Thread overview: 86+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-29  1:02 [PATCH v4 00/30] Live Update Orchestrator Pasha Tatashin
2025-09-29  1:02 ` [PATCH v4 01/30] kho: allow to drive kho from within kernel Pasha Tatashin
2025-09-29  1:02 ` [PATCH v4 02/30] kho: make debugfs interface optional Pasha Tatashin
2025-10-06 16:30   ` Pratyush Yadav
2025-10-06 18:02     ` Pasha Tatashin
2025-10-06 16:55   ` Pratyush Yadav
2025-10-06 17:23     ` Pasha Tatashin
2025-09-29  1:02 ` [PATCH v4 03/30] kho: drop notifiers Pasha Tatashin
2025-10-06 14:30   ` Pratyush Yadav
2025-10-06 16:17     ` Pasha Tatashin
2025-10-06 16:38     ` Pratyush Yadav
2025-10-06 17:01   ` Pratyush Yadav
2025-10-06 17:21     ` Pasha Tatashin
2025-10-07 12:09       ` Pratyush Yadav
2025-10-07 13:16         ` Pasha Tatashin
2025-10-07 13:30           ` Pratyush Yadav
2025-09-29  1:02 ` [PATCH v4 04/30] kho: add interfaces to unpreserve folios and page ranes Pasha Tatashin
2025-09-29  1:02 ` [PATCH v4 05/30] kho: don't unpreserve memory during abort Pasha Tatashin
2025-09-29  1:02 ` [PATCH v4 06/30] liveupdate: kho: move to kernel/liveupdate Pasha Tatashin
2025-09-29  1:02 ` [PATCH v4 07/30] liveupdate: luo_core: luo_ioctl: Live Update Orchestrator Pasha Tatashin
2025-09-29  1:02 ` [PATCH v4 08/30] liveupdate: luo_core: integrate with KHO Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 09/30] liveupdate: luo_subsystems: add subsystem registration Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 10/30] liveupdate: luo_subsystems: implement subsystem callbacks Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 11/30] liveupdate: luo_session: Add sessions support Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 12/30] liveupdate: luo_ioctl: add user interface Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 13/30] liveupdate: luo_file: implement file systems callbacks Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 14/30] liveupdate: luo_session: Add ioctls for file preservation and state management Pasha Tatashin
2025-10-29 19:07   ` Pratyush Yadav
2025-10-29 20:13     ` Pasha Tatashin
2025-10-29 20:43       ` David Matlack
2025-10-29 20:57         ` Pasha Tatashin
2025-10-29 21:13       ` David Matlack
2025-10-29 21:17         ` Pasha Tatashin
2025-10-29 22:00       ` Samiullah Khawaja
2025-10-30 14:45         ` Pasha Tatashin
2025-10-29 20:37   ` Pratyush Yadav
2025-10-29 20:58     ` Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 15/30] reboot: call liveupdate_reboot() before kexec Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 16/30] kho: move kho debugfs directory to liveupdate Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 17/30] liveupdate: add selftests for subsystems un/registration Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 18/30] selftests/liveupdate: add subsystem/state tests Pasha Tatashin
2025-10-03 23:17   ` Vipin Sharma
2025-10-04  2:08     ` Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 19/30] docs: add luo documentation Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 20/30] MAINTAINERS: add liveupdate entry Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 21/30] mm: shmem: use SHMEM_F_* flags instead of VM_* flags Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 22/30] mm: shmem: allow freezing inode mapping Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 23/30] mm: shmem: export some functions to internal.h Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 24/30] luo: allow preserving memfd Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 25/30] docs: add documentation for memfd preservation via LUO Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 26/30] selftests/liveupdate: Add multi-kexec session lifecycle test Pasha Tatashin
2025-10-03 22:51   ` Vipin Sharma
2025-10-04  2:07     ` Pasha Tatashin
2025-10-04  2:37       ` Pasha Tatashin
2025-10-09 22:57         ` Vipin Sharma
2025-09-29  1:03 ` [PATCH v4 27/30] selftests/liveupdate: Add multi-file and unreclaimed file test Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 28/30] selftests/liveupdate: Add multi-session workflow and state interaction test Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 29/30] selftests/liveupdate: Add test for unreclaimed resource cleanup Pasha Tatashin
2025-09-29  1:03 ` [PATCH v4 30/30] selftests/liveupdate: Add tests for per-session state and cancel cycles Pasha Tatashin
2025-10-07 17:10 ` [PATCH v4 00/30] Live Update Orchestrator Pasha Tatashin
2025-10-07 17:50   ` Jason Gunthorpe
2025-10-08  3:18     ` Pasha Tatashin
2025-10-08  7:03   ` Samiullah Khawaja
2025-10-08 16:40     ` Pasha Tatashin
2025-10-08 19:35       ` Jason Gunthorpe
2025-10-08 20:26         ` Pasha Tatashin
2025-10-09 14:48           ` Jason Gunthorpe
2025-10-09 15:01             ` Pasha Tatashin
2025-10-09 15:03               ` Pasha Tatashin
2025-10-09 16:46               ` Samiullah Khawaja
2025-10-09 17:39               ` Jason Gunthorpe
2025-10-09 18:37                 ` Pasha Tatashin
2025-10-10 14:35                   ` Jason Gunthorpe
2025-10-09 21:58   ` Samiullah Khawaja
2025-10-09 22:42     ` Pasha Tatashin
2025-10-10 14:42       ` Jason Gunthorpe
2025-10-10 14:58         ` Pasha Tatashin
2025-10-10 15:02           ` Jason Gunthorpe
2025-10-09 22:57   ` Pratyush Yadav
2025-10-09 23:50     ` Pasha Tatashin
2025-10-10 15:01       ` Jason Gunthorpe
2025-10-14 13:29         ` Pratyush Yadav
2025-10-20 14:29           ` Jason Gunthorpe
2025-10-27 11:37             ` Pratyush Yadav [this message]
2025-10-13 15:23       ` Pratyush Yadav
2025-10-10 12:45     ` Pasha Tatashin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mafs0y0owd187.fsf@kernel.org \
    --to=pratyush@kernel.org \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=ajayachandra@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=aleksander.lobakin@intel.com \
    --cc=aliceryhl@google.com \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=anna.schumaker@oracle.com \
    --cc=axboe@kernel.dk \
    --cc=bartosz.golaszewski@linaro.org \
    --cc=bhelgaas@google.com \
    --cc=bp@alien8.de \
    --cc=brauner@kernel.org \
    --cc=changyuanl@google.com \
    --cc=chenridong@huawei.com \
    --cc=chrisl@kernel.org \
    --cc=corbet@lwn.net \
    --cc=cw00.choi@samsung.com \
    --cc=dakr@kernel.org \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=djeffery@redhat.com \
    --cc=dmatlack@google.com \
    --cc=graf@amazon.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=hpa@zytor.com \
    --cc=hughd@google.com \
    --cc=ilpo.jarvinen@linux.intel.com \
    --cc=ira.weiny@intel.com \
    --cc=jannh@google.com \
    --cc=jasonmiu@google.com \
    --cc=jgg@nvidia.com \
    --cc=joel.granados@kernel.org \
    --cc=kanie@linux.alibaba.com \
    --cc=lennart@poettering.net \
    --cc=leon@kernel.org \
    --cc=leonro@nvidia.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux@weissschuh.net \
    --cc=lukas@wunner.de \
    --cc=mark.rutland@arm.com \
    --cc=masahiroy@kernel.org \
    --cc=mingo@redhat.com \
    --cc=mmaurer@google.com \
    --cc=myungjoo.ham@samsung.com \
    --cc=ojeda@kernel.org \
    --cc=parav@nvidia.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=quic_zijuhu@quicinc.com \
    --cc=rafael@kernel.org \
    --cc=rdunlap@infradead.org \
    --cc=rientjes@google.com \
    --cc=roman.gushchin@linux.dev \
    --cc=rostedt@goodmis.org \
    --cc=rppt@kernel.org \
    --cc=saeedm@nvidia.com \
    --cc=skhawaja@google.com \
    --cc=song@kernel.org \
    --cc=steven.sistare@oracle.com \
    --cc=stuart.w.hayes@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=vincent.guittot@linaro.org \
    --cc=wagi@kernel.org \
    --cc=witu@nvidia.com \
    --cc=x86@kernel.org \
    --cc=yesanishhere@gmail.com \
    --cc=yoann.congal@smile.fr \
    --cc=zhangguopeng@kylinos.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.