From: Pratyush Yadav <pratyush@kernel.org>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Pratyush Yadav <pratyush@kernel.org>,
Pasha Tatashin <pasha.tatashin@soleen.com>,
jasonmiu@google.com, graf@amazon.com, changyuanl@google.com,
rppt@kernel.org, dmatlack@google.com, rientjes@google.com,
corbet@lwn.net, rdunlap@infradead.org,
ilpo.jarvinen@linux.intel.com, kanie@linux.alibaba.com,
ojeda@kernel.org, aliceryhl@google.com, masahiroy@kernel.org,
akpm@linux-foundation.org, tj@kernel.org,
yoann.congal@smile.fr, mmaurer@google.com,
roman.gushchin@linux.dev, chenridong@huawei.com,
axboe@kernel.dk, mark.rutland@arm.com, jannh@google.com,
vincent.guittot@linaro.org, hannes@cmpxchg.org,
dan.j.williams@intel.com, david@redhat.com,
joel.granados@kernel.org, rostedt@goodmis.org,
anna.schumaker@oracle.com, song@kernel.org,
zhangguopeng@kylinos.cn, linux@weissschuh.net,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
linux-mm@kvack.org, gregkh@linuxfoundation.org,
tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
rafael@kernel.org, dakr@kernel.org,
bartosz.golaszewski@linaro.org, cw00.choi@samsung.com,
myungjoo.ham@samsung.com, yesanishhere@gmail.com,
Jonathan.Cameron@huawei.com, quic_zijuhu@quicinc.com,
aleksander.lobakin@intel.com, ira.weiny@intel.com,
andriy.shevchenko@linux.intel.com, leon@kernel.org,
lukas@wunner.de, bhelgaas@google.com, wagi@kernel.org,
djeffery@redhat.com, stuart.w.hayes@gmail.com,
lennart@poettering.net, brauner@kernel.org,
linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org,
saeedm@nvidia.com, ajayachandra@nvidia.com, parav@nvidia.com,
leonro@nvidia.com, witu@nvidia.com
Subject: Re: [PATCH v3 29/30] luo: allow preserving memfd
Date: Wed, 03 Sep 2025 16:10:37 +0200 [thread overview]
Message-ID: <mafs0v7lzvd7m.fsf@kernel.org> (raw)
In-Reply-To: <20250902134846.GN186519@nvidia.com>
Hi Jason,
On Tue, Sep 02 2025, Jason Gunthorpe wrote:
> On Mon, Sep 01, 2025 at 07:10:53PM +0200, Pratyush Yadav wrote:
>> Building kvalloc on top of this becomes trivial.
>>
>> [0] https://git.kernel.org/pub/scm/linux/kernel/git/pratyush/linux.git/commit/?h=kho-array&id=cf4c04c1e9ac854e3297018ad6dada17c54a59af
>
> This isn't really an array, it is a non-seekable serialization of
> key/values with some optimization for consecutive keys. IMHO it is
Sure, an array is not the best name for the thing. Call it whatever,
maybe a "sparse collection of pointers". But I hope you get the idea.
> most useful if you don't know the size of the thing you want to
> serialize in advance since it has a nice dynamic append.
>
> But if you do know the size, I think it makes more sense just to do a
> preserving vmalloc and write out a linear array..
I think there are two separate parts here. One is the data format and
the other is the data builder.
The format itself is quite simple. It is a linked list of discontiguous
pages that holds a set of pointers. We use that idea already for the
preserved pages bitmap. Mike's vmalloc preservation patches also use the
same idea, just with a small variation.
The builder part (ka_iter in my patches) is an abstraction on top to
build the data structure. I designed it with the nice dynamic append
property since it seemed like a nice and convenient design, but we can
have it define the size statically as well. The underlying data format
won't change.
>
> So, it could be useful, but I wouldn't use it for memfd, the vmalloc
> approach is better and we shouldn't optimize for sparsness which
> should never happen.
I disagree. I think we are re-inventing the same data format with minor
variations. I think we should define extensible fundamental data formats
first, and then use those as the building blocks for the rest of our
serialization logic.
I think KHO array does exactly that. It provides the fundamental
serialization for a collection of pointers, and other serialization use
cases can then build on top of it. For example, the preservation bitmaps
can get rid of their linked list logic and just use KHO array to hold
and retrieve its bitmaps. It will make the serialization simpler.
Similar argument for vmalloc preservation.
I also don't get why you think sparseness "should never happen". For
memfd for example, you say in one of your other emails that "And again
in real systems we expect memfd to be fully populated too." Which
systems and use cases do you have in mind? Why do you think people won't
want a sparse memfd?
And finally, from a data format perspective, the sparseness only adds a
small bit of complexity (the startpos for each kho_array_page).
Everything else is practically the same as a continuous array.
All in all, I think KHO array is going to prove useful and will make
serialization for subsystems easier. I think sparseness will also prove
useful but it is not a hill I want to die on. I am fine with starting
with a non-sparse array if people really insist. But I do think we
should go with KHO array as a base instead of re-inventing the linked
list of pages again and again.
>
>> > The versioning should be first class, not hidden away as some emergent
>> > property of registering multiple serializers or something like that.
>>
>> That makes sense. How about some simple changes to the LUO interfaces to
>> make the version more prominent:
>>
>> int (*prepare)(struct liveupdate_file_handler *handler,
>> struct file *file, u64 *data, char **compatible);
>
> Yeah, something more integrated with the ops is better.
>
> You could list the supported versions in the ops itself
>
> const char **supported_deserialize_versions;
>
> And let the luo framework find the right versions.
>
> But for prepare I would expect an inbetween object:
>
> int (*prepare)(struct liveupdate_file_handler *handler,
> struct luo_object *obj, struct file *file);
>
> And then you'd do function calls on 'obj' to store 'data' per version.
What do you mean by "data per version"? I think there should be only one
version of the serialized object. Multiple versions of the same thing
will get ugly real quick.
Other than that, I think this could work well. I am guessing luo_object
stores the version and gives us a way to query it on the other side. I
think if we are letting LUO manage supported versions, it should be
richer than just a list of strings. I think it should include a ops
structure for deserializing each version. That would encapsulate the
versioning more cleanly.
--
Regards,
Pratyush Yadav
next prev parent reply other threads:[~2025-09-03 14:10 UTC|newest]
Thread overview: 118+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-07 1:44 [PATCH v3 00/30] Live Update Orchestrator Pasha Tatashin
2025-08-07 1:44 ` [PATCH v3 01/30] kho: init new_physxa->phys_bits to fix lockdep Pasha Tatashin
2025-08-08 11:42 ` Pratyush Yadav
2025-08-08 11:52 ` Pratyush Yadav
2025-08-08 14:00 ` Pasha Tatashin
2025-08-08 19:06 ` Andrew Morton
2025-08-08 19:51 ` Pasha Tatashin
2025-08-08 20:19 ` Pasha Tatashin
2025-08-14 13:11 ` Jason Gunthorpe
2025-08-14 14:57 ` Pasha Tatashin
2025-08-07 1:44 ` [PATCH v3 02/30] kho: mm: Don't allow deferred struct page with KHO Pasha Tatashin
2025-08-08 11:47 ` Pratyush Yadav
2025-08-08 14:01 ` Pasha Tatashin
2025-08-07 1:44 ` [PATCH v3 03/30] kho: warn if KHO is disabled due to an error Pasha Tatashin
2025-08-08 11:48 ` Pratyush Yadav
2025-08-07 1:44 ` [PATCH v3 04/30] kho: allow to drive kho from within kernel Pasha Tatashin
2025-08-07 1:44 ` [PATCH v3 05/30] kho: make debugfs interface optional Pasha Tatashin
2025-08-07 1:44 ` [PATCH v3 06/30] kho: drop notifiers Pasha Tatashin
2025-08-07 1:44 ` [PATCH v3 07/30] kho: add interfaces to unpreserve folios and physical memory ranges Pasha Tatashin
2025-08-14 13:22 ` Jason Gunthorpe
2025-08-14 15:05 ` Pasha Tatashin
2025-08-14 17:01 ` Jason Gunthorpe
2025-08-15 9:12 ` Mike Rapoport
2025-08-18 13:55 ` Jason Gunthorpe
2025-08-07 1:44 ` [PATCH v3 08/30] kho: don't unpreserve memory during abort Pasha Tatashin
2025-08-14 13:30 ` Jason Gunthorpe
2025-08-07 1:44 ` [PATCH v3 09/30] liveupdate: kho: move to kernel/liveupdate Pasha Tatashin
2025-08-30 8:35 ` Mike Rapoport
2025-08-07 1:44 ` [PATCH v3 10/30] liveupdate: luo_core: luo_ioctl: Live Update Orchestrator Pasha Tatashin
2025-08-14 13:31 ` Jason Gunthorpe
2025-08-07 1:44 ` [PATCH v3 11/30] liveupdate: luo_core: integrate with KHO Pasha Tatashin
2025-08-07 1:44 ` [PATCH v3 12/30] liveupdate: luo_subsystems: add subsystem registration Pasha Tatashin
2025-08-07 1:44 ` [PATCH v3 13/30] liveupdate: luo_subsystems: implement subsystem callbacks Pasha Tatashin
2025-08-07 1:44 ` [PATCH v3 14/30] liveupdate: luo_files: add infrastructure for FDs Pasha Tatashin
2025-08-07 1:44 ` [PATCH v3 15/30] liveupdate: luo_files: implement file systems callbacks Pasha Tatashin
2025-08-07 1:44 ` [PATCH v3 16/30] liveupdate: luo_ioctl: add userpsace interface Pasha Tatashin
2025-08-14 13:49 ` Jason Gunthorpe
2025-08-07 1:44 ` [PATCH v3 17/30] liveupdate: luo_files: luo_ioctl: Unregister all FDs on device close Pasha Tatashin
2025-08-27 15:34 ` Pratyush Yadav
2025-08-07 1:44 ` [PATCH v3 18/30] liveupdate: luo_files: luo_ioctl: Add ioctls for per-file state management Pasha Tatashin
2025-08-14 14:02 ` Jason Gunthorpe
2025-08-07 1:44 ` [PATCH v3 19/30] liveupdate: luo_sysfs: add sysfs state monitoring Pasha Tatashin
2025-08-26 16:03 ` Jason Gunthorpe
2025-08-26 18:58 ` Pasha Tatashin
2025-08-07 1:44 ` [PATCH v3 20/30] reboot: call liveupdate_reboot() before kexec Pasha Tatashin
2025-08-07 1:44 ` [PATCH v3 21/30] kho: move kho debugfs directory to liveupdate Pasha Tatashin
2025-08-07 1:44 ` [PATCH v3 22/30] liveupdate: add selftests for subsystems un/registration Pasha Tatashin
2025-08-07 1:44 ` [PATCH v3 23/30] selftests/liveupdate: add subsystem/state tests Pasha Tatashin
2025-08-07 1:44 ` [PATCH v3 24/30] docs: add luo documentation Pasha Tatashin
2025-08-07 1:44 ` [PATCH v3 25/30] MAINTAINERS: add liveupdate entry Pasha Tatashin
2025-08-07 1:44 ` [PATCH v3 26/30] mm: shmem: use SHMEM_F_* flags instead of VM_* flags Pasha Tatashin
2025-08-11 23:11 ` Vipin Sharma
2025-08-13 12:42 ` Pratyush Yadav
2025-08-07 1:44 ` [PATCH v3 27/30] mm: shmem: allow freezing inode mapping Pasha Tatashin
2025-08-07 1:44 ` [PATCH v3 28/30] mm: shmem: export some functions to internal.h Pasha Tatashin
2025-08-07 1:44 ` [PATCH v3 29/30] luo: allow preserving memfd Pasha Tatashin
2025-08-08 20:22 ` Pasha Tatashin
2025-08-13 12:44 ` Pratyush Yadav
2025-08-13 6:34 ` Vipin Sharma
2025-08-13 7:09 ` Greg KH
2025-08-13 12:02 ` Pratyush Yadav
2025-08-13 12:14 ` Greg KH
2025-08-13 12:41 ` Jason Gunthorpe
2025-08-13 13:00 ` Greg KH
2025-08-13 13:37 ` Pratyush Yadav
2025-08-13 13:41 ` Pasha Tatashin
2025-08-13 13:53 ` Greg KH
2025-08-13 13:53 ` Greg KH
2025-08-13 20:03 ` Jason Gunthorpe
2025-08-13 13:31 ` Pratyush Yadav
2025-08-13 12:29 ` Pratyush Yadav
2025-08-13 13:49 ` Pasha Tatashin
2025-08-13 13:55 ` Pratyush Yadav
2025-08-26 16:20 ` Jason Gunthorpe
2025-08-27 15:03 ` Pratyush Yadav
2025-08-28 12:43 ` Jason Gunthorpe
2025-08-28 23:00 ` Chris Li
2025-09-01 17:10 ` Pratyush Yadav
2025-09-02 13:48 ` Jason Gunthorpe
2025-09-03 14:10 ` Pratyush Yadav [this message]
2025-09-03 15:01 ` Jason Gunthorpe
2025-09-04 12:57 ` Pratyush Yadav
2025-09-04 14:42 ` Jason Gunthorpe
2025-08-28 7:14 ` Mike Rapoport
2025-08-29 18:47 ` Chris Li
2025-08-29 19:18 ` Chris Li
2025-09-02 13:41 ` Jason Gunthorpe
2025-09-03 12:01 ` Chris Li
2025-09-04 17:34 ` Jason Gunthorpe
2025-09-01 16:23 ` Mike Rapoport
2025-09-01 16:54 ` Pasha Tatashin
2025-09-01 17:21 ` Pratyush Yadav
2025-09-01 19:02 ` Pasha Tatashin
2025-09-02 11:38 ` Jason Gunthorpe
2025-09-03 15:59 ` Pasha Tatashin
2025-09-03 16:40 ` Jason Gunthorpe
2025-09-03 19:29 ` Mike Rapoport
2025-09-02 11:58 ` Mike Rapoport
2025-09-01 17:01 ` Pratyush Yadav
2025-09-02 11:44 ` Mike Rapoport
2025-09-03 14:17 ` Pratyush Yadav
2025-09-03 19:39 ` Mike Rapoport
2025-09-04 12:39 ` Pratyush Yadav
2025-08-07 1:44 ` [PATCH v3 30/30] docs: add documentation for memfd preservation via LUO Pasha Tatashin
2025-08-08 12:07 ` [PATCH v3 00/30] Live Update Orchestrator David Hildenbrand
2025-08-08 12:24 ` Pratyush Yadav
2025-08-08 13:53 ` Pasha Tatashin
2025-08-08 13:52 ` Pasha Tatashin
2025-08-26 13:16 ` Pratyush Yadav
2025-08-26 13:54 ` Pasha Tatashin
2025-08-26 14:24 ` Jason Gunthorpe
2025-08-26 15:02 ` Pasha Tatashin
2025-08-26 15:13 ` Jason Gunthorpe
2025-08-26 16:10 ` Pasha Tatashin
2025-08-26 16:22 ` Jason Gunthorpe
2025-08-26 17:03 ` Pasha Tatashin
2025-08-26 17:08 ` Jason Gunthorpe
2025-08-27 14:01 ` Pratyush Yadav
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=mafs0v7lzvd7m.fsf@kernel.org \
--to=pratyush@kernel.org \
--cc=Jonathan.Cameron@huawei.com \
--cc=ajayachandra@nvidia.com \
--cc=akpm@linux-foundation.org \
--cc=aleksander.lobakin@intel.com \
--cc=aliceryhl@google.com \
--cc=andriy.shevchenko@linux.intel.com \
--cc=anna.schumaker@oracle.com \
--cc=axboe@kernel.dk \
--cc=bartosz.golaszewski@linaro.org \
--cc=bhelgaas@google.com \
--cc=bp@alien8.de \
--cc=brauner@kernel.org \
--cc=changyuanl@google.com \
--cc=chenridong@huawei.com \
--cc=corbet@lwn.net \
--cc=cw00.choi@samsung.com \
--cc=dakr@kernel.org \
--cc=dan.j.williams@intel.com \
--cc=dave.hansen@linux.intel.com \
--cc=david@redhat.com \
--cc=djeffery@redhat.com \
--cc=dmatlack@google.com \
--cc=graf@amazon.com \
--cc=gregkh@linuxfoundation.org \
--cc=hannes@cmpxchg.org \
--cc=hpa@zytor.com \
--cc=ilpo.jarvinen@linux.intel.com \
--cc=ira.weiny@intel.com \
--cc=jannh@google.com \
--cc=jasonmiu@google.com \
--cc=jgg@nvidia.com \
--cc=joel.granados@kernel.org \
--cc=kanie@linux.alibaba.com \
--cc=lennart@poettering.net \
--cc=leon@kernel.org \
--cc=leonro@nvidia.com \
--cc=linux-api@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux@weissschuh.net \
--cc=lukas@wunner.de \
--cc=mark.rutland@arm.com \
--cc=masahiroy@kernel.org \
--cc=mingo@redhat.com \
--cc=mmaurer@google.com \
--cc=myungjoo.ham@samsung.com \
--cc=ojeda@kernel.org \
--cc=parav@nvidia.com \
--cc=pasha.tatashin@soleen.com \
--cc=quic_zijuhu@quicinc.com \
--cc=rafael@kernel.org \
--cc=rdunlap@infradead.org \
--cc=rientjes@google.com \
--cc=roman.gushchin@linux.dev \
--cc=rostedt@goodmis.org \
--cc=rppt@kernel.org \
--cc=saeedm@nvidia.com \
--cc=song@kernel.org \
--cc=stuart.w.hayes@gmail.com \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=vincent.guittot@linaro.org \
--cc=wagi@kernel.org \
--cc=witu@nvidia.com \
--cc=x86@kernel.org \
--cc=yesanishhere@gmail.com \
--cc=yoann.congal@smile.fr \
--cc=zhangguopeng@kylinos.cn \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).