From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 50958CD6E5D for ; Wed, 3 Jun 2026 02:44:32 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id BA2F26B008C; Tue, 2 Jun 2026 22:44:31 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B540A6B0092; Tue, 2 Jun 2026 22:44:31 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A6A416B0093; Tue, 2 Jun 2026 22:44:31 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 93AC46B008C for ; Tue, 2 Jun 2026 22:44:31 -0400 (EDT) Received: from smtpin01.hostedemail.com (lb01a-stub [10.200.18.249]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 5C64B16149F for ; Wed, 3 Jun 2026 02:44:31 +0000 (UTC) X-FDA: 84837057942.01.E093AED Received: from mail-pf1-f169.google.com (mail-pf1-f169.google.com [209.85.210.169]) by imf03.hostedemail.com (Postfix) with ESMTP id 62A7320004 for ; Wed, 3 Jun 2026 02:44:29 +0000 (UTC) Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=soleen.com header.s=google header.b=WH278a1p; spf=pass (imf03.hostedemail.com: domain of pasha.tatashin@soleen.com designates 209.85.210.169 as permitted sender) smtp.mailfrom=pasha.tatashin@soleen.com; dmarc=pass (policy=reject) header.from=soleen.com ARC-Seal: i=1; a=rsa-sha256; d=hostedemail.com; s=arc-20220608; cv=none; t=1780454669; b=7cebQ78MjYheQr5mvISXCIcrGHFUYsUFUFRENakdJVoDQ2QOBmjQNzafWTxLrtJrs2SLrC LMpTBR1Mg5ZZL04dbPiGdU5zmMj5ONYaqpm0tPaEP21rGEBzAJiCnSSgX+pYMXWwcnq1ng xy64KXWG9fbpxbTGADwNSkK2tLuG+W0= ARC-Authentication-Results: i=1; imf03.hostedemail.com; dkim=pass header.d=soleen.com header.s=google header.b=WH278a1p; spf=pass (imf03.hostedemail.com: domain of pasha.tatashin@soleen.com designates 209.85.210.169 as permitted sender) smtp.mailfrom=pasha.tatashin@soleen.com; dmarc=pass (policy=reject) header.from=soleen.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1780454669; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=tcJUWEM17xr95CuXHQAYaU7dL0/ztgzuxkWQEIuPEA8=; b=D8PIL2eUq9QTaAw7Q9GlKI+3IbCQ68dHOJuvPNNTR/U7ifEYFF+6QIVqBIDGKI0i1NElKS 9SCIsmL4q68P18Bo3jHHJDHf6Vy+4MCtBR/JhtnnMPZo1HF71si5KiiYbh4tmkhYRZSeHu 8mDvv416rJ2MAjN29HJVrGwUxlXaXj0= Received: by mail-pf1-f169.google.com with SMTP id d2e1a72fcca58-8424b6792efso859920b3a.3 for ; Tue, 02 Jun 2026 19:44:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=soleen.com; s=google; t=1780454668; x=1781059468; darn=kvack.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=tcJUWEM17xr95CuXHQAYaU7dL0/ztgzuxkWQEIuPEA8=; b=WH278a1pgJQ/7QtRvwwPh+NHgSqIIFv3YCN/0nMrC2hsAG6OOznLzWfsLSfGef9gae pWN3dtd1yDXqn+mbkkqSbZ4PDxb+znKJ0uRM9k8EIfuWNniX/o3JjF/XopAW8H7QVviA OjNUWKY0ZD6TtTrL2LO1ABg6CESiz2VGa1A24nmoUxxGMRcfZTfWXTsVkoXUk5mxYHzu KPMDZL6Ob6xZIkF+Yb3DXNHndOY1Bn0znH8uVAgXOTMDOv2J6DnwJmNBFJSDKIhnAv7r Ut3k6MyjSi/jWh82ufHngT4yfqhuDIlFldDNfI26Co1wsTFfdiyzUCzP51WXRew0bJA7 fXsw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1780454668; x=1781059468; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-gg:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=tcJUWEM17xr95CuXHQAYaU7dL0/ztgzuxkWQEIuPEA8=; b=CjqSynD/HTgV+hf0C9B840kRxZD9efq/22af55HS9kupTedRCXmUuVdBaZNkcyVxlA cZXAySLjzeZxV8+mj/LFXs9+4nxA7OonkyqD7vn38/4Mn7m3SeYJZR0IXeKYXtzbLraz hOSh7ulIz9+3hqvPo2bn8o/7Znt4pe+UpKlCGMcBYeUsjMw3O30cQD38IVy2IOTyXowg xE8vTv1BfGp7yuamgGBN+6s1T1qdNn/TKkL4op2K6D1BdfmNiVpv6jLHD9FYinqirOST OS9ExB1jZfyTzuscotMjBVMSrh/J0+431zS2aBXcf99YW8ZyI/2u+8dZoGxyVGFpNcLf NWBA== X-Forwarded-Encrypted: i=1; AFNElJ8rfzyXYmHjROVQn08fRsFXwzcS4ZkR3bvexpohSd1Vd2CC1mNPyuI1ihnDgJLmxnqbqkD4yKF0tw==@kvack.org X-Gm-Message-State: AOJu0Yxw+xfUtUDBiFquS587n18r2Q5uMEHJNjehitmmXT/aT0jdd9D+ e9eDIeIM1SdCq3IyzcHKe2tbHC9HMaejzC7l6Ft4e9sfZsXvnArlhMoFS8a5vhV++QjsXDGNex3 QVbwH X-Gm-Gg: Acq92OEpk6E1nZwOYGk5x/cJA53ROqEZIII329zSzPLY0IH1jLizDyLofquCVmvWtR4 8urZQOGGFDJsLA3K0uyJvQIxXc1nT9AiRMfM48ruZ/bwG39dRqFc8h1/11P7A8MI8UY6vPcDCf5 f2R38guJ9564drsKtWf/LHsT5ISEMp7NH7C5dWBNjIqLa6mc0oGlvGKD+2ftvURo5WfJav4LzrT WiuW/bICEGdFEOubSc3ndwEnJQRSUfczK7eWllNBj4JNkees7NyHlWWpXGNKX4b5lmaI2sw0lPl scBGbCrWspNGuJizDVuqNBZazG11McyiWRfZbL0W1HjHAHhLvnxMz6CrLsIf2p3V3xZpvyx3TU2 xHHheq8gFG3ydIEPRvxlgxCGSGeV+YwYHNpoh7Gj+U20sBSfKbOBDAqcJckGlZNuKvABcHzvtBH GyPIYqojZSRzh+DHOYkF+OBm9T/yMzXxmi4IOiTrfCjQOAuD7D64vKr2xYh05xeQ== X-Received: by 2002:aa7:88c1:0:b0:831:7f71:c810 with SMTP id d2e1a72fcca58-84284fbd441mr1411085b3a.35.1780454668067; Tue, 02 Jun 2026 19:44:28 -0700 (PDT) Received: from plex ([71.181.43.54]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-84282350f8csm1314806b3a.14.2026.06.02.19.44.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 02 Jun 2026 19:44:27 -0700 (PDT) Date: Wed, 3 Jun 2026 02:44:25 +0000 From: Pasha Tatashin To: Pratyush Yadav Cc: Pasha Tatashin , linux-kselftest@vger.kernel.org, rppt@kernel.org, shuah@kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, skhan@linuxfoundation.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, corbet@lwn.net, dmatlack@google.com, kexec@lists.infradead.org, skhawaja@google.com, graf@amazon.com Subject: Re: [PATCH v4 07/13] kho: add support for linked-block serialization Message-ID: References: <20260530221938.115978-1-pasha.tatashin@soleen.com> <20260530221938.115978-8-pasha.tatashin@soleen.com> <2vxzqzmqfkit.fsf@kernel.org> <2vxzcxy8evuo.fsf@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2vxzcxy8evuo.fsf@kernel.org> X-Rspamd-Queue-Id: 62A7320004 X-Stat-Signature: aj9zie6w5x94u9zo4qwenawh143jufjy X-Rspamd-Server: rspam03 X-Rspam-User: X-HE-Tag: 1780454669-675762 X-HE-Meta: U2FsdGVkX1+ODtIDTwbbymfR3MOR0traOgHhK8QGLuYtu6yvGb0VEhoUYAakRo8qqd/8yhQ31jAPYq7Nr5BHuLmc/ulR9/PtCC0VxyKZB147B8qMuDGc9rKKUTQKwJgw/T4JdZVL2L20j7ZecWxcQAWyTUtlFTzMjIA2bM3ZPJkVDKSpinZNWCbjSZ3gwIX57DANufFBgYZ+6Y4YOiwS+Wr/LFH6npireOXvAgEGgALUowqJ4cTcyIZ5C33dTvdP6/UdBTCiW6FUUBbipXwfSGraMMlctWPDDEXgykISWGr5gGpc0882aFhWx4W62wwLgErU08/LbKaeZMSLM19IiYDWPyXc9XSC2K8tOXfcN8FdN+R7p6bnisdmZoRKcLYMuSR4L3hlHwZcW613+GN8bKdI8svPEhX4AgcwlQLhsXhE3McxDZaG3vhI+qvKZyM0szg81/BpbGXQKp5tbgAiX2MZWGAJzHJBd7UoLP8uyJr1Aro1FdLE29E9VlBwBAbQKT6f4XcjyeQVZB9jHf5ob7tW3tNdgH7r53iWVUDPe44vrW/d+Wbgf1QIOrQosIl/UcddOJoKpeVnZ+/Xr3AWCFA5bbhjQpeNbS0ke/AFu3VlMsFqUiLkSVDcXqe+ZUizTML3O+znN7PrMBSdg2pcDeZuuOEiMd1XsSudTq0T3Jb5XITvNlPGFMcnTk+nC4lnq/HGykP+posyBCLv5zUtCS6AGaVOOqwhEzeCZral4MAjjIzRKva6mi1+wzfDn/dzC4l7uTTNarvlxa45HCeizIoSigK8BNKc5fXqXFSFZzKBxB6ypNxMvTqIXyQEBGCwPdIzXxAZo2WZefXxlIURT+59xz28kPOib/JHp4mIvTWyb67Tuqrixs6Jpr3EclRwZoFnbwHCmo9LZG+xleJKJne4rIt5ADM+Z0VPnrTBAS4N0LLUncJQStOMgL8Ol8oEeSP5+dPBU7E9dTKuJyw 5trPjwjH AUr1XJKcfgFppwZdhYXkh/5iCl+iHVNXWNty+SCTtoGO48+1wsCgQ6M0rB+PxbrKNu1jaEpKpjZY8blXBvhFX9X2Ea4v/G4lzRZmvRZKMEoyKPBmXNyg0/5h9c6M2+g97shpMVfeYdVUNOjNHVW4CtTvXNlY6nLsyKd2kqZSyaUnVj3mUDuXqQpljXuzTqPpy0mMM5qQvj8l7lDAi2PUwx3ON6YvJZlNuPfkh093oGuz1RDCYfsU4B69eHvngg6j2hCMiXUJygAjU1VmTwz198Gldwz0j9pMP0Q3EHBFL34zT97qlr0+E5KYDOKvW/FfNIF1DFtWIqWzHf/4mxS2TCCgvD+nyYTtEzUhbVXH9fif2N+P4gb9lQ5issg== Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 06-02 18:43, Pratyush Yadav wrote: > On Mon, Jun 01 2026, Pasha Tatashin wrote: > > > On 06-01 15:38, Pratyush Yadav wrote: > >> On Sat, May 30 2026, Pasha Tatashin wrote: > >> > >> > Introduce a linked-block serialization mechanism for state handover. > >> > > >> > Previously, LUO used contiguous memory blocks for serializing sessions > >> > and files, which imposed limits on the total number of items that could > >> > be preserved across a live update. > >> > > >> > This commit adds the infrastructure for a more flexible, block-based > >> > approach where serialized data is stored in a chain of linked blocks. > >> > This is a generic KHO serialization block infrastructure that can be > >> > used by multiple subsystems. > >> > > >> > Signed-off-by: Pasha Tatashin > [...] > >> > +/** > >> > + * DOC: KHO Serialization Blocks ABI > >> > + * > >> > + * Subsystems using the KHO Serialization Blocks framework rely on the stable > >> > + * Application Binary Interface defined below to pass serialized state from a > >> > + * pre-update kernel to a post-update kernel. > >> > + * > >> > + * This interface is a contract. Any modification to the structure fields, > >> > + * compatible strings, or the layout of the `__packed` serialization > >> > + * structures defined here constitutes a breaking change. Such changes require > >> > + * incrementing the version number in the `KHO_BLOCK_ABI_COMPATIBLE` string to > >> > + * prevent a new kernel from misinterpreting data from an old kernel. > >> > + * > >> > + * Changes are allowed provided the compatibility version is incremented; > >> > + * however, backward/forward compatibility is only guaranteed for kernels > >> > + * supporting the same ABI version. > >> > + */ > >> > + > >> > +#ifndef _LINUX_KHO_ABI_BLOCK_H > >> > +#define _LINUX_KHO_ABI_BLOCK_H > >> > + > >> > +#include > >> > +#include > >> > + > >> > +#define KHO_BLOCK_ABI_COMPATIBLE "kho-block-v1" > >> > >> During KHO radix development, I argued for a separate compatible for the > >> radix tree, but at that time, we tied the radix tree to core KHO ABI. > >> The argument being that all core KHO data structures belong to the KHO > >> ABI set. I imagine this will be used by kho_vmalloc, so it will also be > >> end up being used by a core KHO API. > >> > >> So, do we want separate ABI? I don't much have a preference myself, but > >> I do think the compatible management will be a bit easier if this relied > >> on KHO compatible, especially once kho_vmalloc starts using it. > > > > I prefer to make them fine-grained, now that we are adding more and more > > features: kho vmalloc, kho radix, and kho block should all have their > > own compatibility strings. Furthermore, any components that depend on > > them should include these compatibility strings in their own > > compatibility strings, in the same manner I have done in this series. > > Sure, sounds good. > > > > >> > >> > + > >> > +/** > >> > + * KHO_BLOCK_SIZE - The size of each serialization block. > >> > + * > >> > + * This is defined as PAGE_SIZE. PAGE_SIZE is ABI compliant because live > >> > + * update between kernels with different page sizes is not supported by KHO. > >> > + */ > >> > +#define KHO_BLOCK_SIZE PAGE_SIZE > >> > + > >> > +/** > >> > + * struct kho_block_header_ser - Header for the serialized data block. > >> > + * @next: Physical address of the next struct kho_block_header_ser. > >> > + * @count: The number of entries that immediately follow this header in the > >> > + * memory block. > >> > + * > >> > + * This structure is located at the beginning of a block of physical memory > >> > + * preserved across a kexec. It provides the necessary metadata to interpret > >> > + * the array of entries that follow. > >> > + */ > >> > +struct kho_block_header_ser { > >> > + u64 next; > >> > + u64 count; > >> > +} __packed; > >> > + > >> > +#endif /* _LINUX_KHO_ABI_BLOCK_H */ > >> > diff --git a/include/linux/kho_block.h b/include/linux/kho_block.h > >> > new file mode 100644 > >> > index 000000000000..5e6b87b1befa > >> > --- /dev/null > >> > +++ b/include/linux/kho_block.h > >> > @@ -0,0 +1,79 @@ > >> > +/* SPDX-License-Identifier: GPL-2.0 */ > >> > +/* > >> > + * Copyright (c) 2026, Google LLC. > >> > + * Pasha Tatashin > >> > + */ > >> > + > >> > +#ifndef _LINUX_KHO_BLOCK_H > >> > +#define _LINUX_KHO_BLOCK_H > >> > + > >> > +#include > >> > +#include > >> > +#include > >> > + > >> > +/** > >> > + * struct kho_block - Internal representation of a serialization block. > >> > + * @list: List head for linking blocks in memory. > >> > + * @ser: Pointer to the serialized header in preserved memory. > >> > + */ > >> > +struct kho_block { > >> > + struct list_head list; > >> > + struct kho_block_header_ser *ser; > >> > +}; > >> > + > >> > +/** > >> > + * struct kho_block_set - A set of blocks that belong to the same object. > >> > + * @blocks: The list of serialization blocks (struct kho_block). > >> > + * @nblocks: The number of allocated serialization blocks. > >> > + * @head_pa: Physical address of the first block header. > >> > + * @entry_size: The size of each entry in the blocks. > >> > + * @count_per_block: The maximum number of entries each block can hold. > >> > + * @incoming: True if this block set was restored from the previous kernel. > >> > + */ > >> > +struct kho_block_set { > >> > + struct list_head blocks; > >> > + long nblocks; > >> > + u64 head_pa; > >> > + size_t entry_size; > >> > >> I think we should add the entry_size to kho_block_header_ser? I think it > >> is a part of the ABI of the block set. If this changes, we cannot parse > >> a block set with a different size. If a subsystem wants to change entry > >> size, they create a new block set with different entry size, and then > >> they bump their compatible version. > > > > I have considered that, and we can certainly do it; however, I do not > > see how it would affect the current implementation. If luo_file or > > luo_session change entry_size, they must change the LUO compatibility > > version, which would prevent LU from one kernel to the next. However, > > for flexibility and future extensibility, I believe it would be useful > > to add entry_size and block_size (which is PAGE_SIZE, but could be > > larger for some users) to the header. This is more of a feature request > > than an issue with the current series. > > My suggestion was mainly for sanity checking. So if LUO or another user > inadvertently changes entry size, it gets caught. But thinking about it > more, there are a million other ways to break compatibility while > keeping the entry size same so perhaps it doesn't matter as much... > > > > >> > >> > + u64 count_per_block; > >> > + bool incoming; > >> > +}; > >> > + > >> > +/** > >> > + * struct kho_block_it - Iterator for serializing entries into blocks. > >> > + * @bs: The block set being iterated. > >> > + * @block: The current block. > >> > + * @i: The current entry index within @block. > >> > + */ > >> > +struct kho_block_it { > >> > + struct kho_block_set *bs; > >> > + struct kho_block *block; > >> > + u64 i; > >> > +}; > >> > + > >> > +/** > >> > + * KHO_BLOCK_SET_INIT - Initialize a static kho_block_set. > >> > + * @_name: Name of the kho_block_set variable. > >> > + * @_entry_size: The size of each entry in the block set. > >> > + */ > >> > +#define KHO_BLOCK_SET_INIT(_name, _entry_size) { \ > >> > + .blocks = LIST_HEAD_INIT((_name).blocks), \ > >> > + .entry_size = _entry_size, \ > >> > +} > >> > + > >> > +void kho_block_set_init(struct kho_block_set *bs, size_t entry_size); > >> > + > >> > +int kho_block_grow(struct kho_block_set *bs, u64 count); > >> > +void kho_block_shrink(struct kho_block_set *bs, u64 count); > >> > >> These block management functions seem like internal details of the block > > > > This is not so. The confusion here is that they must be allocated and > > preserved at runtime as resources are registered/unregistered, while > > these blocks are only used serialization phase, > > > > These calls are more like notifiers that more files/sessions are created > > removed, so we can adjust block count accordingly if necessary (allocate > > preserver memory), and have them available durign > > serialization/deserialization > > Yeah, I got that when reading the later patches that use these. > > Perhaps kho_block_prealloc() and kho_block_unalloc() is more clear, > although it does not sound as nice. If not, then I suppose at least add > a comment explaining the intended usage. Done > > > > >> set API. Do we need to export them? I think users should not have to > >> worry about block management. They should read, set, or clear entries > >> using the iterators, and internally the block management should take of > >> allocation or freeing. So here for example, I th > > > > something is missing :-) > > I don't remember what I meant to say anymore :-/ > > [...] > >> > +/** > >> > + * kho_block_set_init - Initialize a block set. > >> > + * @bs: The block set to initialize. > >> > + * @entry_size: The size of each entry in the blocks. > >> > + */ > >> > +void kho_block_set_init(struct kho_block_set *bs, size_t entry_size) > >> > +{ > >> > + *bs = (struct kho_block_set)KHO_BLOCK_SET_INIT(*bs, entry_size); > >> > +} > >> > + > >> > +static inline u64 kho_block_count_per_block(struct kho_block_set *bs) > >> > +{ > >> > + if (unlikely(!bs->count_per_block)) { > >> > + bs->count_per_block = (KHO_BLOCK_SIZE - > >> > + sizeof(struct kho_block_header_ser)) / > >> > + bs->entry_size; > >> > + WARN_ON(!bs->count_per_block); > >> > + } > >> > + return bs->count_per_block; > >> > +} > >> > >> This looks odd. I don't see a reason to calculate this lazily. Why not > >> just do it when initializing the block set, in kho_block_set_init() or > >> kho_block_restore()? And then use bs->count_per_block directly. > > > > This allows for blocks to use static initilziation, I like static inits > > :-) > > You can do this: > > #define KHO_BLOCK_SET_INIT(_name, _entry_size) { \ > .blocks = LIST_HEAD_INIT((_name).blocks), \ > .entry_size = _entry_size, \ > .count_per_block = (KHO_BLOCK_SIZE - sizeof(struct kho_block_header_ser)) / (_entry_size), \ > } > > Compiles for me. You are correct, done. > > [...] > >> > +void kho_block_destroy(struct kho_block_set *bs) > >> > +{ > >> > + u64 head_pa = bs->head_pa; > >> > + struct kho_block *block; > >> > + > >> > + while (!list_empty(&bs->blocks)) { > >> > + block = list_first_entry(&bs->blocks, struct kho_block, list); > >> > + list_del(&block->list); > >> > + kfree(block); > >> > + } > >> > >> Nit: > >> > >> list_for_each_entry_safe(block, tmp, &bs->blocks, list) { > >> list_del(&block->list); > >> kfree(block); > >> } > >> > >> is a bit more idiomatic (and IMO easier to read). > > > > Sure > > > >> > >> > + bs->nblocks = 0; > >> > + bs->head_pa = 0; > >> > + > >> > + while (head_pa) { > >> > + struct kho_block_header_ser *ser = phys_to_virt(head_pa); > >> > + > >> > + head_pa = ser->next; > >> > + kho_block_free_ser(bs, ser); > >> > >> Nit: also, can't you put this also in the previous loop? Something like: > >> > >> list_for_each_entry_safe(block, tmp, &bs->blocks, list) { > >> list_del(&block->list); > >> kho_block_free_ser(block->ser); > >> kfree(block); > >> } > > > > We actually can't merge these into a single loop because of partial > > restoration failures handling in kho_block_restore(). > > > > If kho_block_restore fails halfway through restoring a chain of blocks > > (for example, if kho_block_add fails on block 3 of 5), we jump to the > > err_destroy cleanup path which calls kho_block_destroy(). > > > > At this point: > > - bs->blocks only contains the tracked blocks we successfully added > > (blocks 1 and 2). > > - bs->head_pa still points to the physical head of the entire 5-block > > incoming chain. > > > > But, this is a good place to add a comment. > > IMO it would be cleaner for kho_block_destroy() to destroy the currently > initialized block set, and then the error handling path in restore path > can clean up the rest. Sounds good, done. > > > > >> > + } > >> > +} > [...] > >> > +/** > >> > + * kho_block_it_prev - Return the previous entry slot in the block set. > >> > + * @it: The block iterator. > >> > + * > >> > + * If the current index is at the start of a block, it automatically moves to > >> > + * the end of the previous block. > >> > + * > >> > + * Return: A pointer to the previous entry slot, or NULL if at the very > >> > + * beginning of the block set. > >> > + */ > >> > +void *kho_block_it_prev(struct kho_block_it *it) > >> > +{ > >> > + if (!it->block) > >> > + return NULL; > >> > + > >> > + if (it->i == 0) { > >> > + if (list_is_first(&it->block->list, &it->bs->blocks)) > >> > + return NULL; > >> > + it->block = list_prev_entry(it->block, list); > >> > + it->i = kho_block_count_per_block(it->bs); > >> > + } > >> > + > >> > + return (void *)(it->block->ser + 1) + (--it->i * it->bs->entry_size); > >> > +} > >> > + > >> > +/** > >> > + * kho_block_it_finalize - Finalize the current block by setting its entry count. > >> > + * @it: The block iterator. > >> > + */ > >> > +void kho_block_it_finalize(struct kho_block_it *it) > >> > +{ > >> > + if (it->block) > >> > + it->block->ser->count = it->i; > >> > +} > >> > >> Doesn't kho_block_it_next() already do this when you add an entry? So > >> this seems redundant. > > > > It is not redundant because of how the final partially-fille block is handled. > > > > kho_block_it_next() only writes the count into the block header when a block is completely full and it is advancing to the next one: > > > > if (it->i == kho_block_count_per_block(it->bs)) { > > it->block->ser->count = it->i; > > ... > > > > But for the very last block in the set, it is usually only partially > > filled (e.g., we write 10 entries into a block with a capacity of 64). > > Since it->i never reaches the maximum capacity, kho_block_it_next() > > never commits its count. > > > > Pasha > > I think we can make kho_block_it_next() always write it. I think it > makes sense from an API point of view, since I see this API as "adding > an entry to the block set", so updating its internal counters makes > sense. > > Requiring the finalize will be error prone, since it is easy to forget. > Then you silently lose some entries on the next boot. Good suggetion, cleaned-up. Thank you! Pasha > > -- > Regards, > Pratyush Yadav