Linux-mm Archive on lore.kernel.org
From: Pratyush Yadav <pratyush@kernel.org>
To: Jork Loeser <jloeser@linux.microsoft.com>
Cc: Pratyush Yadav <pratyush@kernel.org>,
	 Mike Rapoport <rppt@kernel.org>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	 Alexander Graf <graf@amazon.com>,
	 Muchun Song <muchun.song@linux.dev>,
	 Oscar Salvador <osalvador@suse.de>,
	 David Hildenbrand <david@kernel.org>,
	 Andrew Morton <akpm@linux-foundation.org>,
	 Jason Miu <jasonmiu@google.com>,
	kexec@lists.infradead.org,  linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 02/18] kho: disallow wide keys in radix tree
Date: Mon, 08 Jun 2026 11:10:59 +0200
Message-ID: <2vxzwlw9tn18.fsf@kernel.org>
In-Reply-To: <fb7ed559-557-b8c1-1f6-5f879feab0ec@linux.microsoft.com> (Jork Loeser's message of "Fri, 5 Jun 2026 15:06:05 -0700 (PDT)")

On Fri, Jun 05 2026, Jork Loeser wrote:

> On Fri, 5 Jun 2026, Pratyush Yadav wrote:
>
>> From: "Pratyush Yadav (Google)" <pratyush@kernel.org>
>>
>> The KHO radix tree was designed to track preserved pages. So it does not
>> provide the capability to track any 64-bit key. Instead, it limits the
>> key width to how much it needs for tracking PFNs and their orders.
>> Limiting the width reduces the number of levels in the tree.
>>
>> KHO is not expected to be the only user of the radix tree. With the API
>> generalized to allow other users, now it is possible to add any key to
>> the tree.
>>
>> Check the key width at kho_radix_add_key(), and error out if it exceeds
>> what the tree can handle. Do this instead of increasing the tree depth
>> since right now there are no users that need to use wider keys, so this
>> avoids memory overhead and ABI breakage.
>>
>> Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org>
>> ---
>> include/linux/kho/abi/kexec_handover.h |  8 ++++++++
>> kernel/liveupdate/kexec_handover.c     | 12 ++++++++++++
>> 2 files changed, 20 insertions(+)
>>
>> diff --git a/include/linux/kho/abi/kexec_handover.h b/include/linux/kho/abi/kexec_handover.h
>> index fb2d37417ad9..6dbb98bfb586 100644
>> --- a/include/linux/kho/abi/kexec_handover.h
>> +++ b/include/linux/kho/abi/kexec_handover.h
>> @@ -278,6 +278,14 @@ enum kho_radix_consts {
>> 			     KHO_TABLE_SIZE_LOG2) + 1,
>> };
>>
>> +/*
>> + * The maximum key width this radix tree can track.
>> + *
>> + * This value isn't ABI itself, but it is derived from values that are ABI.
>> + */
>> +#define KHO_RADIX_KEY_WIDTH (((KHO_TREE_MAX_DEPTH - 1) * KHO_TABLE_SIZE_LOG2) + \
>> +			     KHO_BITMAP_SIZE_LOG2)
>
> Love the auto-derivation of these values, this totally makes sense. That said,
> my lazy brain complained a bit when I asked it "so how many bits can a consumer
> actually use?". So I wonder:
>
> 1) Why is the value not "ABI itself"; it feels like it should as it
>    determines client behavior.

The main idea was that, if you delve into the details, the value is a
combination of other values and doesn't directly influence the binary
structure. KHO_ORDER_0_LOG2 (64 - PAGE_SHIFT), on the other hand, does
influence it directly: it decides the width of the keys that can be
supported.

But now that I think about this again, I think this patch is kind of
stupid. The equation for KHO_RADIX_KEY_WIDTH is exactly the inverse of
the equation for KHO_TREE_MAX_DEPTH. The max key width is
(KHO_ORDER_0_LOG2 + 1), and the equation for KHO_TREE_MAX_DEPTH uses
that to arrive at the tree depth.

All this is very obscure unfortunately. First of all, KHO_ORDER_0_LOG2
is a very undescriptive name. I have no idea what it is supposed to mean
or represent. The comment above doesn't help much either and I think is
misleading.

Second, the equation for KHO_TREE_MAX_DEPTH hides in itself the fact
that we need one extra bit on top of KHO_ORDER_0_LOG2. KHO_ORDER_0_LOG2
is essentially the width of PFN. And we need one more bit for the order.
That +1 is hidden in

    DIV_ROUND_UP(KHO_ORDER_0_LOG2 - KHO_BITMAP_SIZE_LOG2 + 1, ...),

I think we should do the following:

1. Rename KHO_ORDER_0_LOG2 to KHO_RADIX_KEY_WIDTH and make its equation
   (64 - PAGE_SHIFT + 1) with the comment above clearly explaining the
   reasoning.

2. Now that the +1 is in the key width itself, the equation for tree
   depth can be simplified to:

       ((KHO_RADIX_KEY_WIDTH - KHO_BITMAP_SIZE_LOG2) / KHO_TABLE_SIZE_LOG2) + 1

   ... which is an improvement I think.

I've been tripped up by this radix tree math before, so I think this
might help out a bit. Will fix that in the next version.
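
To sanity-check the rename, here is a minimal C sketch of the depth math.
It is not the kernel code. It assumes 8-byte table entries and a leaf
bitmap that fills one page, so KHO_TABLE_SIZE_LOG2 = PAGE_SHIFT - 3 and
KHO_BITMAP_SIZE_LOG2 = PAGE_SHIFT + 3; neither value is stated in this
mail. All helper names are hypothetical.

```c
#include <assert.h>

/* Round-up integer division, mirroring the kernel's DIV_ROUND_UP(). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* ASSUMPTION: table entries are 8 bytes, so a page holds 2^(shift - 3). */
static unsigned int table_size_log2(unsigned int page_shift)
{
	return page_shift - 3;
}

/* ASSUMPTION: a leaf bitmap fills one page, i.e. 2^(shift + 3) bits. */
static unsigned int bitmap_size_log2(unsigned int page_shift)
{
	return page_shift + 3;
}

/* Proposed KHO_RADIX_KEY_WIDTH: PFN width plus one bit for the order. */
static unsigned int key_width(unsigned int page_shift)
{
	/* Only 4K..64K pages are considered in this sketch. */
	assert(page_shift >= 12 && page_shift <= 16);
	return 64 - page_shift + 1;
}

/* Depth in the current shape: the +1 sits inside DIV_ROUND_UP(). */
static unsigned int depth_current(unsigned int page_shift)
{
	unsigned int order_0_log2 = 64 - page_shift;

	return DIV_ROUND_UP(order_0_log2 - bitmap_size_log2(page_shift) + 1,
			    table_size_log2(page_shift)) + 1;
}

/* Depth derived from the key width, still rounding up. */
static unsigned int depth_round_up(unsigned int page_shift)
{
	return DIV_ROUND_UP(key_width(page_shift) - bitmap_size_log2(page_shift),
			    table_size_log2(page_shift)) + 1;
}

/* Depth derived from the key width with plain, truncating division. */
static unsigned int depth_truncating(unsigned int page_shift)
{
	return (key_width(page_shift) - bitmap_size_log2(page_shift)) /
	       table_size_log2(page_shift) + 1;
}
```

Under those assumptions, 4K pages give a key width of 53 and a depth of 6
with either rounded-up form, while the truncating form gives 5. So whether
the simplified equation can use plain division depends on the real
constants dividing evenly; if they do not, DIV_ROUND_UP() probably needs
to stay.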

>
> 2) Would you consider expanding the actual values for the most relevant
>    architectures (x86-64 w/ 4kb pages, arm64 w/ 4k/16/64k page-sizes) and
>    put it in a block-comment?

Good idea. Will do.
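
One way to generate the rows for such a block comment is sketched below.
This is only an illustration: it assumes 8-byte table entries and a
one-page leaf bitmap (KHO_TABLE_SIZE_LOG2 = PAGE_SHIFT - 3,
KHO_BITMAP_SIZE_LOG2 = PAGE_SHIFT + 3), and the struct and function names
are hypothetical. The "capacity" field uses the equation from the patch,
((depth - 1) * table size log2) + bitmap size log2.

```c
#include <assert.h>
#include <stdio.h>

/* Round-up integer division, mirroring the kernel's DIV_ROUND_UP(). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical container for the derived values of one page size. */
struct kho_radix_geometry {
	unsigned int key_width;	/* 64 - PAGE_SHIFT + 1 */
	unsigned int depth;	/* table levels plus one bitmap level */
	unsigned int capacity;	/* key bits a tree of that depth can hold */
};

static struct kho_radix_geometry kho_radix_geometry_for(unsigned int page_shift)
{
	struct kho_radix_geometry g;
	unsigned int table_log2, bitmap_log2;

	/* Only 4K..64K pages are considered in this sketch. */
	assert(page_shift >= 12 && page_shift <= 16);

	/* ASSUMPTION: 8-byte table entries, one-page leaf bitmap. */
	table_log2 = page_shift - 3;
	bitmap_log2 = page_shift + 3;

	g.key_width = 64 - page_shift + 1;
	g.depth = DIV_ROUND_UP(g.key_width - bitmap_log2, table_log2) + 1;
	g.capacity = (g.depth - 1) * table_log2 + bitmap_log2;
	return g;
}

/* Format one comment row into buf; returns what snprintf() returns. */
static int kho_radix_format_row(char *buf, size_t len, const char *arch,
				unsigned int page_shift)
{
	struct kho_radix_geometry g = kho_radix_geometry_for(page_shift);

	return snprintf(buf, len,
			" * %-7s %2uK pages: key width %u, depth %u, capacity %u",
			arch, 1u << (page_shift - 10),
			g.key_width, g.depth, g.capacity);
}
```

Calling kho_radix_format_row() with page shifts 12, 14 and 16 would cover
x86-64 and the three arm64 page sizes. With the assumed constants the
capacity comes out larger than the key width for each of them, because the
depth is rounded up to a whole number of levels.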

[...]

-- 
Regards,
Pratyush Yadav

