linux-mm.kvack.org archive mirror
From: Jason Gunthorpe <jgg@nvidia.com>
To: Jason Miu <jasonmiu@google.com>
Cc: Alexander Graf <graf@amazon.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Baoquan He <bhe@redhat.com>,
	Changyuan Lyu <changyuanl@google.com>,
	David Matlack <dmatlack@google.com>,
	David Rientjes <rientjes@google.com>,
	Joel Granados <joel.granados@kernel.org>,
	Marcos Paulo de Souza <mpdesouza@suse.com>,
	Mario Limonciello <mario.limonciello@amd.com>,
	Mike Rapoport <rppt@kernel.org>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Petr Mladek <pmladek@suse.com>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Steven Chen <chenste@linux.microsoft.com>,
	Yan Zhao <yan.y.zhao@intel.com>,
	kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [RFC v1 1/4] kho: Introduce KHO page table data structures
Date: Wed, 17 Sep 2025 09:21:58 -0300
Message-ID: <20250917122158.GC1086830@nvidia.com>
In-Reply-To: <20250917025019.1585041-2-jasonmiu@google.com>

On Tue, Sep 16, 2025 at 07:50:16PM -0700, Jason Miu wrote:
> + * kho_order_table
> + * +-------------------------------+--------------------+
> + * | 0 order| 1 order| 2 order ... | HUGETLB_PAGE_ORDER |
> + * ++------------------------------+--------------------+
> + *  |
> + *  |
> + *  v
> + * ++------+
> + * |  Lv6  | kho_page_table
> + * ++------+

I seem to remember suggesting this could be simplified without the
special-case 7th-level table for order.

Encode the phys address as:

(order << 51) | (phys >> (PAGE_SHIFT + order))

Then you don't need another table for order, the 64 bits encode
everything consistently. Order can't be > 52 so it is
only 6 bits, meaning the result fits into at most 57 bits.
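
As a rough sketch of that encoding (not code from the patch: the
kho_* names are made up for illustration, 4 KiB pages are assumed,
and phys is assumed to be aligned to the order-sized block):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants, not taken from the patch. */
enum {
	KHO_PAGE_SHIFT  = 12,	/* assumption: 4 KiB pages */
	KHO_ORDER_SHIFT = 51,	/* order field starts at bit 51 */
	KHO_ORDER_BITS  = 6,	/* the order fits in 6 bits */
};

/* Pack the order and the order-aligned frame number into one 64-bit key. */
static inline uint64_t kho_encode_key(uint64_t phys, unsigned int order)
{
	unsigned int shift = KHO_PAGE_SHIFT + order;

	assert(order < (1u << KHO_ORDER_BITS));
	assert(shift < 64);			/* keep the shifts defined */
	assert((phys & ((1ULL << shift) - 1)) == 0);	/* aligned to order */
	assert((phys >> shift) < (1ULL << KHO_ORDER_SHIFT));

	return ((uint64_t)order << KHO_ORDER_SHIFT) | (phys >> shift);
}

/* Recover the order from the bits at and above KHO_ORDER_SHIFT. */
static inline unsigned int kho_key_order(uint64_t key)
{
	return (unsigned int)(key >> KHO_ORDER_SHIFT);
}

/* Recover the physical address from the low bits and the order. */
static inline uint64_t kho_key_phys(uint64_t key)
{
	uint64_t frame = key & ((1ULL << KHO_ORDER_SHIFT) - 1);

	return frame << (KHO_PAGE_SHIFT + kho_key_order(key));
}
```

With these assumptions a 2 MiB page at physical address 0x200000
(order 9) encodes as (9 << 51) | 1, and both fields can be read back
from the key alone.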

> + *      63      62:54    53:45    44:36    35:27        26:0
> + * +--------+--------+--------+--------+--------+-----------------+
> + * |  Lv 6  |  Lv 5  |  Lv 4  |  Lv 3  |  Lv 2  |  Lv 1 (bitmap)  |
> + * +--------+--------+--------+--------+--------+-----------------+

This isn't quite right, bits 11:0 must be zero and are not used to
index anything.

Adjusting to reflect the above math, it would be like this:

 63:60   59:51    50:42    41:33    32:24    23:15       14:0
+-----+--------+--------+--------+--------+--------+-----------------+
| 0   |  Lv 6  |  Lv 5  |  Lv 4  |  Lv 3  |  Lv 2  |  Lv 1 (bitmap)  |
+-----+--------+--------+--------+--------+--------+-----------------+

The order level is just folded into lv 6
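
To illustrate how an encoded key would be split according to the
layout above, here is a minimal sketch. The helper names are
hypothetical; it assumes a 4096-byte bitmap (15 index bits) and
4096-byte tables of 8-byte entries (9 index bits per level):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants matching the bit layout above. */
enum {
	KHO_BITMAP_BITS = 15,	/* 4096-byte bitmap = 32768 bits */
	KHO_TABLE_BITS  = 9,	/* 512 eight-byte entries per table */
	KHO_NUM_LEVELS  = 6,	/* levels 2..6 are tables, 1 is the bitmap */
};

/* Bit index into the level 1 bitmap: key bits 14:0. */
static inline unsigned int kho_bitmap_index(uint64_t key)
{
	return (unsigned int)(key & ((1u << KHO_BITMAP_BITS) - 1));
}

/* Slot index in the table at 'level' (2..6): 9 bits per level. */
static inline unsigned int kho_table_index(uint64_t key, unsigned int level)
{
	unsigned int shift;

	assert(level >= 2 && level <= KHO_NUM_LEVELS);
	shift = KHO_BITMAP_BITS + (level - 2) * KHO_TABLE_BITS;

	return (unsigned int)((key >> shift) & ((1u << KHO_TABLE_BITS) - 1));
}
```

Level 6 then covers key bits 59:51, which is exactly where the
(order << 51) encoding puts the order, so the order ends up selecting
the level 6 slot.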

> + * For higher order pages, the bit fields for each level shift to the left by
> + * the page order.

This is probably unnecessary complexity. The table levels cost only
64 bytes, it isn't so valuable to eliminate them. So with the above
math it shifts right not left. Level 1 is always the bitmap and it
doesn't move around. I'd label this 0 in the code.

If you also fix the sizes to be 64 bytes and 4096 bytes regardless of
PAGE_SIZE then everything is easy and fixed, while still efficient on
higher PAGE_SIZE architectures.

Further, changing the formula to this:

(1 << (63 - PAGE_SHIFT - order)) | (phys >> (PAGE_SHIFT + order))

Will shift the overhead levels to the top of the radix tree and share
them across all orders, higher PAGE_SIZE arches will just get a single
lvl 5 and an unnecessary lvl 6 - cost 64 extra bytes, who cares.
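
A minimal sketch of this second formula, again with hypothetical
names and assuming 4 KiB pages and phys below 2^63. The single set
marker bit sits just above the shifted address, so the order can be
recovered from the position of the highest set bit:

```c
#include <assert.h>
#include <stdint.h>

enum { KHO_PAGE_SHIFT = 12 };	/* assumption: 4 KiB pages */

/* Marker bit at (63 - PAGE_SHIFT - order), frame number below it. */
static inline uint64_t kho_encode_key_marker(uint64_t phys, unsigned int order)
{
	unsigned int shift = KHO_PAGE_SHIFT + order;

	assert(shift < 63);
	assert(phys < (1ULL << 63));
	assert((phys & ((1ULL << shift) - 1)) == 0);	/* aligned to order */

	return (1ULL << (63 - shift)) | (phys >> shift);
}

/* Position of the highest set bit; this is the marker bit. */
static inline unsigned int kho_key_marker_pos(uint64_t key)
{
	unsigned int pos = 0;

	assert(key != 0);
	while (key >>= 1)
		pos++;
	return pos;
}

/* The order follows from where the marker bit landed. */
static inline unsigned int kho_key_marker_order(uint64_t key)
{
	unsigned int pos = kho_key_marker_pos(key);

	assert(pos <= 63 - KHO_PAGE_SHIFT);
	return 63 - KHO_PAGE_SHIFT - pos;
}

/* Strip the marker bit and shift the frame number back up. */
static inline uint64_t kho_key_marker_phys(uint64_t key)
{
	unsigned int pos = kho_key_marker_pos(key);
	uint64_t frame = key & ~(1ULL << pos);

	return frame << (63 - pos);
}
```

Under these assumptions an order 0 page at 0x1000 becomes
(1 << 51) | 1 and an order 9 page at 0x200000 becomes (1 << 42) | 1,
so keys of different orders never collide and higher orders use
fewer low bits.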

> +#define BITMAP_TABLE_SHIFT(_order) (PAGE_SHIFT + PAGE_SHIFT + 3 + (_order))
> +#define BITMAP_TABLE_MASK(_order) ((1ULL << BITMAP_TABLE_SHIFT(_order)) - 1)
> +#define PRESERVED_PAGE_OFFSET_SHIFT(_order) (PAGE_SHIFT + (_order))
> +#define PAGE_TABLE_SHIFT_PER_LEVEL (ilog2(PAGE_SIZE / sizeof(unsigned long)))
> +#define PAGE_TABLE_LEVEL_MASK ((1ULL << PAGE_TABLE_SHIFT_PER_LEVEL) - 1)
> +#define PTR_PER_LEVEL (PAGE_SIZE / sizeof(unsigned long))

please use inlines and enums :(
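
For illustration only, the quoted macros could be expressed roughly
like this. This is a standalone sketch: the KHO_* and kho_* names are
placeholders, PAGE_SHIFT is hard-coded to 12, and uint64_t stands in
for unsigned long on a 64-bit build:

```c
#include <stdint.h>

enum {
	KHO_PAGE_SHIFT            = 12,	/* assumption: 4 KiB pages */
	KHO_PAGE_SIZE             = 1 << KHO_PAGE_SHIFT,
	/* Entries per table page, was PTR_PER_LEVEL. */
	KHO_PTRS_PER_LEVEL        = KHO_PAGE_SIZE / sizeof(uint64_t),
	/* log2 of the above, was PAGE_TABLE_SHIFT_PER_LEVEL. */
	KHO_TABLE_SHIFT_PER_LEVEL = KHO_PAGE_SHIFT - 3,
	/* Was PAGE_TABLE_LEVEL_MASK. */
	KHO_TABLE_LEVEL_MASK      = KHO_PTRS_PER_LEVEL - 1,
};

/* Address bits covered by one bitmap page, was BITMAP_TABLE_SHIFT(). */
static inline unsigned int kho_bitmap_table_shift(unsigned int order)
{
	return KHO_PAGE_SHIFT + KHO_PAGE_SHIFT + 3 + order;
}

/* Was BITMAP_TABLE_MASK(). */
static inline uint64_t kho_bitmap_table_mask(unsigned int order)
{
	return (1ULL << kho_bitmap_table_shift(order)) - 1;
}

/* Was PRESERVED_PAGE_OFFSET_SHIFT(). */
static inline unsigned int kho_preserved_page_offset_shift(unsigned int order)
{
	return KHO_PAGE_SHIFT + order;
}
```

The inline versions get type-checked arguments and return types, and
the enum constants show up by name in a debugger.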

It looks like if you make the above algorithm changes most of this
code is deleted.

Jason



