From: Pasha Tatashin <pasha.tatashin@soleen.com>
To: Pratyush Yadav <pratyush@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>,
linux-kselftest@vger.kernel.org, rppt@kernel.org,
shuah@kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org,
skhan@linuxfoundation.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, corbet@lwn.net,
dmatlack@google.com, kexec@lists.infradead.org,
skhawaja@google.com, graf@amazon.com
Subject: Re: [PATCH v4 07/13] kho: add support for linked-block serialization
Date: Wed, 3 Jun 2026 02:44:25 +0000 [thread overview]
Message-ID: <ah-QtVHQpvqqlPT5@plex> (raw)
In-Reply-To: <2vxzcxy8evuo.fsf@kernel.org>
On 06-02 18:43, Pratyush Yadav wrote:
> On Mon, Jun 01 2026, Pasha Tatashin wrote:
>
> > On 06-01 15:38, Pratyush Yadav wrote:
> >> On Sat, May 30 2026, Pasha Tatashin wrote:
> >>
> >> > Introduce a linked-block serialization mechanism for state handover.
> >> >
> >> > Previously, LUO used contiguous memory blocks for serializing sessions
> >> > and files, which imposed limits on the total number of items that could
> >> > be preserved across a live update.
> >> >
> >> > This commit adds the infrastructure for a more flexible, block-based
> >> > approach where serialized data is stored in a chain of linked blocks.
> >> > This is a generic KHO serialization block infrastructure that can be
> >> > used by multiple subsystems.
> >> >
> >> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> [...]
> >> > +/**
> >> > + * DOC: KHO Serialization Blocks ABI
> >> > + *
> >> > + * Subsystems using the KHO Serialization Blocks framework rely on the stable
> >> > + * Application Binary Interface defined below to pass serialized state from a
> >> > + * pre-update kernel to a post-update kernel.
> >> > + *
> >> > + * This interface is a contract. Any modification to the structure fields,
> >> > + * compatible strings, or the layout of the `__packed` serialization
> >> > + * structures defined here constitutes a breaking change. Such changes require
> >> > + * incrementing the version number in the `KHO_BLOCK_ABI_COMPATIBLE` string to
> >> > + * prevent a new kernel from misinterpreting data from an old kernel.
> >> > + *
> >> > + * Changes are allowed provided the compatibility version is incremented;
> >> > + * however, backward/forward compatibility is only guaranteed for kernels
> >> > + * supporting the same ABI version.
> >> > + */
> >> > +
> >> > +#ifndef _LINUX_KHO_ABI_BLOCK_H
> >> > +#define _LINUX_KHO_ABI_BLOCK_H
> >> > +
> >> > +#include <asm/page.h>
> >> > +#include <linux/types.h>
> >> > +
> >> > +#define KHO_BLOCK_ABI_COMPATIBLE "kho-block-v1"
> >>
> >> During KHO radix development, I argued for a separate compatible for the
> >> radix tree, but at that time, we tied the radix tree to core KHO ABI.
> >> The argument being that all core KHO data structures belong to the KHO
> >> ABI set. I imagine this will be used by kho_vmalloc, so it will also be
> >> end up being used by a core KHO API.
> >>
> >> So, do we want separate ABI? I don't much have a preference myself, but
> >> I do think the compatible management will be a bit easier if this relied
> >> on KHO compatible, especially once kho_vmalloc starts using it.
> >
> > I prefer to make them fine-grained, now that we are adding more and more
> > features: kho vmalloc, kho radix, and kho block should all have their
> > own compatibility strings. Furthermore, any components that depend on
> > them should include these compatibility strings in their own
> > compatibility strings, in the same manner I have done in this series.
>
> Sure, sounds good.
>
> >
> >>
> >> > +
> >> > +/**
> >> > + * KHO_BLOCK_SIZE - The size of each serialization block.
> >> > + *
> >> > + * This is defined as PAGE_SIZE. PAGE_SIZE is ABI compliant because live
> >> > + * update between kernels with different page sizes is not supported by KHO.
> >> > + */
> >> > +#define KHO_BLOCK_SIZE PAGE_SIZE
> >> > +
> >> > +/**
> >> > + * struct kho_block_header_ser - Header for the serialized data block.
> >> > + * @next: Physical address of the next struct kho_block_header_ser.
> >> > + * @count: The number of entries that immediately follow this header in the
> >> > + * memory block.
> >> > + *
> >> > + * This structure is located at the beginning of a block of physical memory
> >> > + * preserved across a kexec. It provides the necessary metadata to interpret
> >> > + * the array of entries that follow.
> >> > + */
> >> > +struct kho_block_header_ser {
> >> > + u64 next;
> >> > + u64 count;
> >> > +} __packed;
> >> > +
> >> > +#endif /* _LINUX_KHO_ABI_BLOCK_H */
> >> > diff --git a/include/linux/kho_block.h b/include/linux/kho_block.h
> >> > new file mode 100644
> >> > index 000000000000..5e6b87b1befa
> >> > --- /dev/null
> >> > +++ b/include/linux/kho_block.h
> >> > @@ -0,0 +1,79 @@
> >> > +/* SPDX-License-Identifier: GPL-2.0 */
> >> > +/*
> >> > + * Copyright (c) 2026, Google LLC.
> >> > + * Pasha Tatashin <pasha.tatashin@soleen.com>
> >> > + */
> >> > +
> >> > +#ifndef _LINUX_KHO_BLOCK_H
> >> > +#define _LINUX_KHO_BLOCK_H
> >> > +
> >> > +#include <linux/list.h>
> >> > +#include <linux/types.h>
> >> > +#include <linux/kho/abi/block.h>
> >> > +
> >> > +/**
> >> > + * struct kho_block - Internal representation of a serialization block.
> >> > + * @list: List head for linking blocks in memory.
> >> > + * @ser: Pointer to the serialized header in preserved memory.
> >> > + */
> >> > +struct kho_block {
> >> > + struct list_head list;
> >> > + struct kho_block_header_ser *ser;
> >> > +};
> >> > +
> >> > +/**
> >> > + * struct kho_block_set - A set of blocks that belong to the same object.
> >> > + * @blocks: The list of serialization blocks (struct kho_block).
> >> > + * @nblocks: The number of allocated serialization blocks.
> >> > + * @head_pa: Physical address of the first block header.
> >> > + * @entry_size: The size of each entry in the blocks.
> >> > + * @count_per_block: The maximum number of entries each block can hold.
> >> > + * @incoming: True if this block set was restored from the previous kernel.
> >> > + */
> >> > +struct kho_block_set {
> >> > + struct list_head blocks;
> >> > + long nblocks;
> >> > + u64 head_pa;
> >> > + size_t entry_size;
> >>
> >> I think we should add the entry_size to kho_block_header_ser? I think it
> >> is a part of the ABI of the block set. If this changes, we cannot parse
> >> a block set with a different size. If a subsystem wants to change entry
> >> size, they create a new block set with different entry size, and then
> >> they bump their compatible version.
> >
> > I have considered that, and we can certainly do it; however, I do not
> > see how it would affect the current implementation. If luo_file or
> > luo_session change entry_size, they must change the LUO compatibility
> > version, which would prevent LU from one kernel to the next. However,
> > for flexibility and future extensibility, I believe it would be useful
> > to add entry_size and block_size (which is PAGE_SIZE, but could be
> > larger for some users) to the header. This is more of a feature request
> > than an issue with the current series.
>
> My suggestion was mainly for sanity checking. So if LUO or another user
> inadvertently changes entry size, it gets caught. But thinking about it
> more, there are a million other ways to break compatibility while
> keeping the entry size same so perhaps it doesn't matter as much...
>
> >
> >>
> >> > + u64 count_per_block;
> >> > + bool incoming;
> >> > +};
> >> > +
> >> > +/**
> >> > + * struct kho_block_it - Iterator for serializing entries into blocks.
> >> > + * @bs: The block set being iterated.
> >> > + * @block: The current block.
> >> > + * @i: The current entry index within @block.
> >> > + */
> >> > +struct kho_block_it {
> >> > + struct kho_block_set *bs;
> >> > + struct kho_block *block;
> >> > + u64 i;
> >> > +};
> >> > +
> >> > +/**
> >> > + * KHO_BLOCK_SET_INIT - Initialize a static kho_block_set.
> >> > + * @_name: Name of the kho_block_set variable.
> >> > + * @_entry_size: The size of each entry in the block set.
> >> > + */
> >> > +#define KHO_BLOCK_SET_INIT(_name, _entry_size) { \
> >> > + .blocks = LIST_HEAD_INIT((_name).blocks), \
> >> > + .entry_size = _entry_size, \
> >> > +}
> >> > +
> >> > +void kho_block_set_init(struct kho_block_set *bs, size_t entry_size);
> >> > +
> >> > +int kho_block_grow(struct kho_block_set *bs, u64 count);
> >> > +void kho_block_shrink(struct kho_block_set *bs, u64 count);
> >>
> >> These block management functions seem like internal details of the block
> >
> > This is not so. The confusion here is that they must be allocated and
> > preserved at runtime as resources are registered/unregistered, while
> > these blocks are only used serialization phase,
> >
> > These calls are more like notifiers that more files/sessions are created
> > removed, so we can adjust block count accordingly if necessary (allocate
> > preserver memory), and have them available durign
> > serialization/deserialization
>
> Yeah, I got that when reading the later patches that use these.
>
> Perhaps kho_block_prealloc() and kho_block_unalloc() is more clear,
> although it does not sound as nice. If not, then I suppose at least add
> a comment explaining the intended usage.
Done
>
> >
> >> set API. Do we need to export them? I think users should not have to
> >> worry about block management. They should read, set, or clear entries
> >> using the iterators, and internally the block management should take of
> >> allocation or freeing. So here for example, I th
> >
> > something is missing :-)
>
> I don't remember what I meant to say anymore :-/
>
> [...]
> >> > +/**
> >> > + * kho_block_set_init - Initialize a block set.
> >> > + * @bs: The block set to initialize.
> >> > + * @entry_size: The size of each entry in the blocks.
> >> > + */
> >> > +void kho_block_set_init(struct kho_block_set *bs, size_t entry_size)
> >> > +{
> >> > + *bs = (struct kho_block_set)KHO_BLOCK_SET_INIT(*bs, entry_size);
> >> > +}
> >> > +
> >> > +static inline u64 kho_block_count_per_block(struct kho_block_set *bs)
> >> > +{
> >> > + if (unlikely(!bs->count_per_block)) {
> >> > + bs->count_per_block = (KHO_BLOCK_SIZE -
> >> > + sizeof(struct kho_block_header_ser)) /
> >> > + bs->entry_size;
> >> > + WARN_ON(!bs->count_per_block);
> >> > + }
> >> > + return bs->count_per_block;
> >> > +}
> >>
> >> This looks odd. I don't see a reason to calculate this lazily. Why not
> >> just do it when initializing the block set, in kho_block_set_init() or
> >> kho_block_restore()? And then use bs->count_per_block directly.
> >
> > This allows for blocks to use static initilziation, I like static inits
> > :-)
>
> You can do this:
>
> #define KHO_BLOCK_SET_INIT(_name, _entry_size) { \
> .blocks = LIST_HEAD_INIT((_name).blocks), \
> .entry_size = _entry_size, \
> .count_per_block = (KHO_BLOCK_SIZE - sizeof(struct kho_block_header_ser)) / (_entry_size), \
> }
>
> Compiles for me.
You are correct, done.
>
> [...]
> >> > +void kho_block_destroy(struct kho_block_set *bs)
> >> > +{
> >> > + u64 head_pa = bs->head_pa;
> >> > + struct kho_block *block;
> >> > +
> >> > + while (!list_empty(&bs->blocks)) {
> >> > + block = list_first_entry(&bs->blocks, struct kho_block, list);
> >> > + list_del(&block->list);
> >> > + kfree(block);
> >> > + }
> >>
> >> Nit:
> >>
> >> list_for_each_entry_safe(block, tmp, &bs->blocks, list) {
> >> list_del(&block->list);
> >> kfree(block);
> >> }
> >>
> >> is a bit more idiomatic (and IMO easier to read).
> >
> > Sure
> >
> >>
> >> > + bs->nblocks = 0;
> >> > + bs->head_pa = 0;
> >> > +
> >> > + while (head_pa) {
> >> > + struct kho_block_header_ser *ser = phys_to_virt(head_pa);
> >> > +
> >> > + head_pa = ser->next;
> >> > + kho_block_free_ser(bs, ser);
> >>
> >> Nit: also, can't you put this also in the previous loop? Something like:
> >>
> >> list_for_each_entry_safe(block, tmp, &bs->blocks, list) {
> >> list_del(&block->list);
> >> kho_block_free_ser(block->ser);
> >> kfree(block);
> >> }
> >
> > We actually can't merge these into a single loop because of partial
> > restoration failures handling in kho_block_restore().
> >
> > If kho_block_restore fails halfway through restoring a chain of blocks
> > (for example, if kho_block_add fails on block 3 of 5), we jump to the
> > err_destroy cleanup path which calls kho_block_destroy().
> >
> > At this point:
> > - bs->blocks only contains the tracked blocks we successfully added
> > (blocks 1 and 2).
> > - bs->head_pa still points to the physical head of the entire 5-block
> > incoming chain.
> >
> > But, this is a good place to add a comment.
>
> IMO it would be cleaner for kho_block_destroy() to destroy the currently
> initialized block set, and then the error handling path in restore path
> can clean up the rest.
Sounds good, done.
>
> >
> >> > + }
> >> > +}
> [...]
> >> > +/**
> >> > + * kho_block_it_prev - Return the previous entry slot in the block set.
> >> > + * @it: The block iterator.
> >> > + *
> >> > + * If the current index is at the start of a block, it automatically moves to
> >> > + * the end of the previous block.
> >> > + *
> >> > + * Return: A pointer to the previous entry slot, or NULL if at the very
> >> > + * beginning of the block set.
> >> > + */
> >> > +void *kho_block_it_prev(struct kho_block_it *it)
> >> > +{
> >> > + if (!it->block)
> >> > + return NULL;
> >> > +
> >> > + if (it->i == 0) {
> >> > + if (list_is_first(&it->block->list, &it->bs->blocks))
> >> > + return NULL;
> >> > + it->block = list_prev_entry(it->block, list);
> >> > + it->i = kho_block_count_per_block(it->bs);
> >> > + }
> >> > +
> >> > + return (void *)(it->block->ser + 1) + (--it->i * it->bs->entry_size);
> >> > +}
> >> > +
> >> > +/**
> >> > + * kho_block_it_finalize - Finalize the current block by setting its entry count.
> >> > + * @it: The block iterator.
> >> > + */
> >> > +void kho_block_it_finalize(struct kho_block_it *it)
> >> > +{
> >> > + if (it->block)
> >> > + it->block->ser->count = it->i;
> >> > +}
> >>
> >> Doesn't kho_block_it_next() already do this when you add an entry? So
> >> this seems redundant.
> >
> > It is not redundant because of how the final partially-fille block is handled.
> >
> > kho_block_it_next() only writes the count into the block header when a block is completely full and it is advancing to the next one:
> >
> > if (it->i == kho_block_count_per_block(it->bs)) {
> > it->block->ser->count = it->i;
> > ...
> >
> > But for the very last block in the set, it is usually only partially
> > filled (e.g., we write 10 entries into a block with a capacity of 64).
> > Since it->i never reaches the maximum capacity, kho_block_it_next()
> > never commits its count.
> >
> > Pasha
>
> I think we can make kho_block_it_next() always write it. I think it
> makes sense from an API point of view, since I see this API as "adding
> an entry to the block set", so updating its internal counters makes
> sense.
>
> Requiring the finalize will be error prone, since it is easy to forget.
> Then you silently lose some entries on the next boot.
Good suggetion, cleaned-up.
Thank you!
Pasha
>
> --
> Regards,
> Pratyush Yadav
next prev parent reply other threads:[~2026-06-03 2:44 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-30 22:19 [PATCH v4 00/13] liveupdate: Remove limits on sessions and files Pasha Tatashin
2026-05-30 22:19 ` [PATCH v4 01/13] liveupdate: change file_set->count type to u64 for type safety Pasha Tatashin
2026-05-31 13:35 ` Pasha Tatashin
2026-06-01 12:08 ` Pratyush Yadav
2026-06-02 8:13 ` Mike Rapoport
2026-05-30 22:19 ` [PATCH v4 02/13] liveupdate: avoid mixing cleanup guards with goto in luo_session_retrieve_fd Pasha Tatashin
2026-05-31 12:52 ` Pasha Tatashin
2026-06-01 12:15 ` Pratyush Yadav
2026-06-02 8:13 ` Mike Rapoport
2026-06-03 3:10 ` Pasha Tatashin
2026-05-30 22:19 ` [PATCH v4 03/13] liveupdate: centralize state management into struct luo_ser Pasha Tatashin
2026-06-01 12:19 ` Pratyush Yadav
2026-06-02 8:13 ` Mike Rapoport
2026-06-03 2:57 ` Pasha Tatashin
2026-05-30 22:19 ` [PATCH v4 04/13] liveupdate: register luo_ser as KHO subtree Pasha Tatashin
2026-05-31 13:44 ` Pasha Tatashin
2026-06-01 12:39 ` Pratyush Yadav
2026-06-01 13:50 ` Pasha Tatashin
2026-06-01 14:27 ` Pratyush Yadav
2026-05-30 22:19 ` [PATCH v4 05/13] liveupdate: Extract luo_file_deserialize_one helper Pasha Tatashin
2026-05-30 22:19 ` [PATCH v4 06/13] liveupdate: Extract luo_session_deserialize_one helper Pasha Tatashin
2026-05-30 22:19 ` [PATCH v4 07/13] kho: add support for linked-block serialization Pasha Tatashin
2026-06-01 13:38 ` Pratyush Yadav
2026-06-01 14:37 ` Pasha Tatashin
2026-06-02 16:43 ` Pratyush Yadav
2026-06-03 2:44 ` Pasha Tatashin [this message]
2026-06-02 8:13 ` Mike Rapoport
2026-06-02 9:04 ` Mike Rapoport
2026-06-03 2:21 ` Pasha Tatashin
2026-05-30 22:19 ` [PATCH v4 08/13] liveupdate: defer session block allocation and PA setting Pasha Tatashin
2026-06-01 13:47 ` Pratyush Yadav
2026-06-02 8:13 ` Mike Rapoport
2026-06-03 2:50 ` Pasha Tatashin
2026-05-30 22:19 ` [PATCH v4 09/13] liveupdate: Remove limit on the number of sessions Pasha Tatashin
2026-06-01 14:03 ` Pratyush Yadav
2026-06-01 14:44 ` Pasha Tatashin
2026-05-30 22:19 ` [PATCH v4 10/13] liveupdate: Remove limit on the number of files per session Pasha Tatashin
2026-06-01 14:16 ` Pratyush Yadav
2026-06-01 14:40 ` Pasha Tatashin
2026-05-30 22:19 ` [PATCH v4 11/13] selftests/liveupdate: Test session and file limit removal Pasha Tatashin
2026-06-01 14:17 ` Pratyush Yadav
2026-05-30 22:19 ` [PATCH v4 12/13] selftests/liveupdate: Add stress-sessions kexec test Pasha Tatashin
2026-06-01 14:19 ` Pratyush Yadav
2026-05-30 22:19 ` [PATCH v4 13/13] selftests/liveupdate: Add stress-files " Pasha Tatashin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ah-QtVHQpvqqlPT5@plex \
--to=pasha.tatashin@soleen.com \
--cc=akpm@linux-foundation.org \
--cc=corbet@lwn.net \
--cc=dmatlack@google.com \
--cc=graf@amazon.com \
--cc=kexec@lists.infradead.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=pratyush@kernel.org \
--cc=rppt@kernel.org \
--cc=shuah@kernel.org \
--cc=skhan@linuxfoundation.org \
--cc=skhawaja@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox