linux-mm.kvack.org archive mirror
From: Matthew Wilcox <willy@infradead.org>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	akpm@linux-foundation.org, songmuchun@bytedance.com,
	david@redhat.com
Subject: Re: [PATCH v2 0/1] change ->index to PAGE_SIZE for hugetlb pages
Date: Sat, 22 Jul 2023 05:05:40 +0100
Message-ID: <ZLtVlJA+V2+2yjxc@casper.infradead.org>
In-Reply-To: <20230720000011.GD3240@monkey>

On Wed, Jul 19, 2023 at 05:00:11PM -0700, Mike Kravetz wrote:
> On 07/10/23 16:04, Sidhartha Kumar wrote:
> > ========================== OVERVIEW ========================================
> > This patchset attempts to implement a listed filemap TODO which is
> > changing hugetlb folios to have ->index in PAGE_SIZE. This simplifies many
> > functions within filemap.c as they have to special case hugetlb pages.
> > From the RFC v1[1], Mike pointed out that hugetlb will still have to maintain
> > a huge page sized index as it is used for the reservation map and the hash
> > function for the hugetlb mutex table.
> > 
> > This patchset adds new wrappers for hugetlb code to interact with the
> > page cache. These wrappers calculate a linear page index as this is now
> > what the page cache expects for hugetlb pages.
> > 
> > From the discussion on HGM for hugetlb[3], there is a desire to remove hugetlb
> > special casing throughout the core mm code. This series accomplishes
> > a part of this by shifting complexity from filemap.c to hugetlb.c. There
> > are still checks for hugetlb within the filemap code as cgroup accounting
> > and hugetlb accounting are special cased as well. 
> > 
> > =========================== PERFORMANCE =====================================
> 
> Hi Sid,
> 
> Sorry for being dense but can you tell me what the below performance
> information means.  My concern with such a change would be any noticeable
> difference in populating a large (up to TB) hugetlb file.  My guess is
> that it is going to take longer unless xarray is optimized for this.
> 
> We do have users that create and pre-populate hugetlb files this big.
> Just want to make sure there are no surprises for them.

It's Going To Depend.  Annoyingly.

Let's say you're using 1GB pages on a 4kB PAGE_SIZE machine.  That's an
order-18 folio, so we end up skipping three layers of the tree, and if
you're going up to 1TB, it's structured:

root -> node (shift 30) -> node (shift 24) -> entry
                                           -> entry (...)
			-> node (shift 24) -> entry
					   (...)
			(...)

This is essentially no different from before where each 1GB page would
occupy a single entry.  It's just that it now occupies 2^18 entries,
and everything in the tree has a different label.
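
A rough Python model of that arithmetic, assuming 64-slot XArray nodes
(6 index bits per tree level) and a 4kB PAGE_SIZE.  The helper names are
illustrative only, not kernel functions:

```python
# Rough model of XArray geometry. Assumptions: 64-slot nodes, 4 KiB pages.
CHUNK_SHIFT = 6   # log2(slots per node): each tree level consumes 6 index bits
PAGE_SHIFT = 12   # 4 KiB PAGE_SIZE

def folio_order(folio_bytes: int) -> int:
    """Order of a folio, measured in PAGE_SIZE units."""
    pages = folio_bytes >> PAGE_SHIFT
    if pages == 0 or pages & (pages - 1):
        raise ValueError("folio must be a power-of-two number of pages")
    return pages.bit_length() - 1

def levels_skipped(order: int) -> int:
    """Whole tree levels that a single order-N entry stands in for."""
    return order // CHUNK_SHIFT

def index_counts(file_bytes: int, folio_bytes: int) -> tuple[int, int]:
    """Indices needed to span a file: (PAGE_SIZE indices, huge-page indices)."""
    return file_bytes >> PAGE_SHIFT, file_bytes // folio_bytes

GiB, TiB = 1 << 30, 1 << 40
order = folio_order(GiB)
print(order, levels_skipped(order))   # 18 3
print(index_counts(TiB, GiB))         # (268435456, 1024)
```

In this model the 2^28 PAGE_SIZE indices need ceil(28/6) = 5 levels, but
the bottom three are skipped, leaving the same two levels of nodes that
1024 huge-page-sized indices need (ceil(10/6) = 2).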

Where you will (may?) see a difference is with the 2MB entries.
An order-9 page doesn't quite fit with the order-6 nodes in the tree,
so it looks like this:

root -> node (s30) -> node (s24) -> node (s18) -> node (s12) -> entry 0
							     -> sibling
							     -> sibling
							     (...)
							     -> entry 8
							     -> sibling
							     -> sibling
							     (...)

so all of a sudden the bottom of the tree is 8x as big as it used to be:
each order-9 entry fills 2^(9-6) = 8 slots of an order-6 node instead of
one.  The upside is that we lose all the calculations from
filemap.c/pagemap.h.  It's a lot better than it was perhaps five years
ago, when each 2MB page would occupy 512 entries, but 8 entries is still
worse than 1.
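
Under the same 64-slot assumption, a short sketch can label the slots of
one node packed with order-9 entries, reproducing the entry/sibling
pattern in the diagram above (labels are slot offsets within the node;
helper names are hypothetical):

```python
# Sketch: how order-N entries pack into one 64-slot node (assumed geometry).
CHUNK_SHIFT = 6
CHUNK_SIZE = 1 << CHUNK_SHIFT   # 64 slots per node

def slots_per_entry(order: int) -> int:
    """Slots an order-N entry fills in its node: one canonical slot plus siblings."""
    return 1 << (order % CHUNK_SHIFT)

def node_layout(order: int) -> list[str]:
    """Label every slot of a node packed with order-N entries."""
    step = slots_per_entry(order)
    return [f"entry {i}" if i % step == 0 else "sibling" for i in range(CHUNK_SIZE)]

layout = node_layout(9)
print(layout[:10])

# Slots consumed per 2MB (order-9) folio under the three schemes discussed.
per_folio = {
    "huge-page-sized index": 1,
    "PAGE_SIZE index, multi-index entry": slots_per_entry(9),
    "PAGE_SIZE index, one entry per page": 1 << 9,
}
print(per_folio)
```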

Could we do better?  Undoubtedly.  We could have variable shifts & node
sizes in the tree so that we perhaps had an s18 node that was 8x as large
(4160 bytes), and then each order-9 entry in the tree would occupy one
entry in that special large node.  I've been reluctant to introduce such
a beast without strong evidence it would help.  Or we could introduce a
small s12 node which could only store 8 entries (again an order-9 entry
would occupy one entry in such a special node).
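
To see where a figure like 4160 bytes comes from, and roughly what each
node shape might cost for a 1TB file of 2MB pages, here is a
back-of-the-envelope model.  It assumes 8-byte slots plus 64 bytes of
fixed per-node overhead (576 bytes for a 64-slot node), counts leaf nodes
only, and is not the real struct xa_node layout:

```python
# Back-of-the-envelope node cost model (assumptions, not the kernel struct).
SLOT_BYTES = 8        # one pointer per slot on a 64-bit machine
OVERHEAD_BYTES = 64   # assumed fixed (non-slot) bytes per node

def node_bytes(slots: int) -> int:
    """Simplified node size: fixed overhead plus one pointer per slot."""
    return OVERHEAD_BYTES + slots * SLOT_BYTES

def leaf_bytes(n_folios: int, folios_per_node: int, slots: int) -> int:
    """Bytes spent on leaf nodes only (interior nodes ignored)."""
    nodes = -(-n_folios // folios_per_node)   # ceiling division
    return nodes * node_bytes(slots)

folios = (1 << 40) // (2 << 20)              # 1 TiB of 2 MiB folios
print(node_bytes(64), node_bytes(512))       # 576 4160
print(leaf_bytes(folios, 64, 64))            # huge-page index, 1 slot each: 4718592
print(leaf_bytes(folios, 64 // 8, 64))       # PAGE_SIZE index, 8 slots each: 37748736
print(leaf_bytes(folios, 512, 512))          # hypothetical 512-slot node: 4259840
print(leaf_bytes(folios, 8, 8))              # hypothetical 8-slot node: 8388608
```

In this model the PAGE_SIZE-indexed tree spends about 36MiB on leaf
nodes versus 4.5MiB before, and either special node shape recovers most
of that difference.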

These are things which would only benefit hugetlbfs, so there's a bit
of a chicken-and-egg problem; no demand for the feature until the work
is done, and the work maybe performs badly until the feature exists.

And then some architectures have other orders for their huge pages.
Order 11 is probably the worst case that could exist (or, in general,
order 6n - 1, where each entry fills 32 slots of an order-6 node), but I
haven't done a detailed survey to figure out if anyone supports such a
thing.
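
Under the same 64-slot assumption, an entry fills 2^(order mod 6) slots
of the node that holds it, so the worst orders can be enumerated
directly (a sketch only, not a survey of real architectures):

```python
# Which folio orders waste the most slots, assuming 64-slot nodes.
CHUNK_SHIFT = 6

def slots_per_entry(order: int) -> int:
    """Slots an order-N entry fills in its node (canonical slot + siblings)."""
    return 1 << (order % CHUNK_SHIFT)

orders = range(25)
worst = max(slots_per_entry(o) for o in orders)
worst_orders = [o for o in orders if slots_per_entry(o) == worst]
print(worst, worst_orders)   # 32 [5, 11, 17, 23]
```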



Thread overview: 6+ messages
2023-07-10 23:04 [PATCH v2 0/1] change ->index to PAGE_SIZE for hugetlb pages Sidhartha Kumar
2023-07-10 23:04 ` [PATCH v2 1/1] mm/filemap: remove hugetlb special casing in filemap.c Sidhartha Kumar
2023-07-11 19:32   ` Andrew Morton
2023-07-21 20:22   ` Mike Kravetz
2023-07-20  0:00 ` [PATCH v2 0/1] change ->index to PAGE_SIZE for hugetlb pages Mike Kravetz
2023-07-22  4:05   ` Matthew Wilcox [this message]
