From: Jason Gunthorpe <jgg@nvidia.com>
To: Jason Miu <jasonmiu@google.com>
Cc: Alexander Graf <graf@amazon.com>,
Andrew Morton <akpm@linux-foundation.org>,
Baoquan He <bhe@redhat.com>,
Changyuan Lyu <changyuanl@google.com>,
David Matlack <dmatlack@google.com>,
David Rientjes <rientjes@google.com>,
Mike Rapoport <rppt@kernel.org>,
Pasha Tatashin <pasha.tatashin@soleen.com>,
Pratyush Yadav <pratyush@kernel.org>,
kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [PATCH v1 1/3] kho: Adopt KHO radix tree data structures
Date: Thu, 9 Oct 2025 14:52:05 -0300 [thread overview]
Message-ID: <20251009175205.GB3899236@nvidia.com> (raw)
In-Reply-To: <CAHN2nPKjg6=0QTcSoptxvQW9MpP0YwGUTx_gQDBxgCtSzxY5Pg@mail.gmail.com>
> > > -#define PROP_PRESERVED_MEMORY_MAP "preserved-memory-map"
> > > +#define PROP_PRESERVED_PAGE_RADIX_TREE "preserved-page-radix-tree"
> > > #define PROP_SUB_FDT "fdt"
> >
> > I'de really like to see all of these sorts of definitions in some
> > structured ABI header not open coded all over the place..
>
> Do you think `include/linux/kexec_handover.h` is the appropriate
> place, or would you prefer a new, dedicated ABI header (e.g., in
> `include/uapi/linux/`) for all KHO-related FDT constants?
I would avoid uapi, but maybe Pasha has some
idea.
include/linux/live_update/abi/ ?
> Agreed. Will change `u64` according to Pasha's comment. And we use
> explicit casts like `(u64)virt_to_phys(new_tree)` and `(struct
> kho_radix_tree *)phys_to_virt(table_entry)` in the current series. I
> believe this, along with the `u64` type, makes it clear that the table
> stores physical addresses.
Well, the macros were intended to automate this and avoid mistakes
from opencoding.. Just keep using them?
> > > + */
> > > +static unsigned long kho_radix_encode(unsigned long pa, unsigned int order)
> >
> > pa is phys_addr_t in the kernel, never unsigned long.
> >
> > If you want to make it all dynamic then this should be phys_addr_t
>
> Should this also be `u64`, or we stay with `phys_addr_t` for all page
> addresses?
you should use phys_addr_t for everything that originates from a
phys_addr_t, and u64 for all the ABI
> > > +{
> > > + unsigned long h = 1UL << (BITS_PER_LONG - PAGE_SHIFT - order);
> >
> > And this BITS_PER_LONG is confused, it is BITS_PER_PHYS_ADDR_T which
> > may not exist.
> >
> > Use an enum ORDER_0_LG2 maybe
>
> I prefer `KHO_RADIX_ORDER_0_BIT_POS` (defined as `BITS_PER_LONG -
> PAGE_SHIFT`) over `ORDER_0_LG2`, as I think the latter is a bit hard
> to understand, what do you think? This constant, along with others,
> will be placed in the enum.
Sure, though I prefer LG2 to BIT_POS
BIT_POS to me implies it is being used as bit wise operation, while
log2 is a mathematical concept
X_lg2 = ilog2(X) && X == 1 << X_lg2
> > > + kho_radix_tree_walk_callback_t cb)
> > > +{
> > > + int level_shift = ilog2(PAGE_SIZE / sizeof(unsigned long));
> > > + struct kho_radix_tree *next_tree;
> > > + unsigned long encoded, i;
> > > + int err = 0;
> > >
> > > + if (level == 1) {
> > > + encoded = offset;
> > > + return kho_radix_walk_bitmaps((struct kho_bitmap_table *)root,
> > > + encoded, cb);
> >
> > Better to do this in the caller a few lines below
>
> But the caller is in a different tree level? Should we only walk the
> bitmaps at the lowest level?
I mean just have the caller do
if (level-1 ==0)
kho_radix_walk_bitmaps()
else
..
Avoids a function call
> > > + for (i = 0; i < PAGE_SIZE / sizeof(unsigned long); i++) {
> > > + if (root->table[i]) {
> > > + encoded = offset << level_shift | i;
> >
> > This doesn't seem right..
> >
> > The argument to the walker should be the starting encoded of the table
> > it is about to walk.
> >
> > Since everything always starts at 0 it should always be
> > start | (i << level_shift)
> >
> > ?
>
> You're right that this line might not be immediately intuitive. The
> var `level_shift` (which is constant value 9 here) is applied to the
> *accumulated* `offset` from the parent level. Let's consider an
> example of a preserved page at physical address `0x1000`, which
> encodes to `0x10000000000001` (bit 52 is set for order 0, bit 0 is set
> for page 1).
Oh, weird, too weird maybe. I'd just keep all the values as fully
shifted, level_shift should be adjusted to have the full shift for
this level. Easier to understand.
Also, I think the order bits might have become a bit confused, I think
I explained it wrong.
My idea was to try to share the radix levels to save space eg if we
have like this patch does:
Order phys
00001 abcd
00010 0bcd
00100 00cd
01000 000d
Then we don't get too much page level sharing, the middle ends up with
0 indexes in tables that cannot be shared.
What I was going for was to push all the shared pages to the left
00001 abcd
00000 1bcd
00000 01cd
00000 001d
Here the first radix level has index 0 or 1 and is fully shared. So eg
Order 4 and 5 will have all the same 0 index table levels. This also
reduces the max height of the tree because only +1 bit is needed to
store order.
Jason
next prev parent reply other threads:[~2025-10-09 17:52 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-01 1:19 [PATCH v1 0/3] Make KHO Stateless Jason Miu
2025-10-01 1:19 ` [PATCH v1 1/3] kho: Adopt KHO radix tree data structures Jason Miu
2025-10-02 4:29 ` kernel test robot
2025-10-06 14:14 ` Jason Gunthorpe
2025-10-06 17:26 ` Pasha Tatashin
2025-10-06 22:50 ` Jason Gunthorpe
2025-10-09 2:07 ` Jason Miu
2025-10-09 17:52 ` Jason Gunthorpe [this message]
2025-10-22 0:59 ` Jason Miu
2025-10-01 1:19 ` [PATCH v1 2/3] memblock: Remove KHO notifier usage Jason Miu
2025-10-01 16:35 ` kernel test robot
2025-10-01 1:19 ` [PATCH v1 3/3] kho: Remove notifier system infrastructure Jason Miu
2025-10-01 18:07 ` kernel test robot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251009175205.GB3899236@nvidia.com \
--to=jgg@nvidia.com \
--cc=akpm@linux-foundation.org \
--cc=bhe@redhat.com \
--cc=changyuanl@google.com \
--cc=dmatlack@google.com \
--cc=graf@amazon.com \
--cc=jasonmiu@google.com \
--cc=kexec@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=pasha.tatashin@soleen.com \
--cc=pratyush@kernel.org \
--cc=rientjes@google.com \
--cc=rppt@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.