linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: Zach O'Keefe <zokeefe@google.com>
Cc: David Rientjes <rientjes@google.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: mm/khugepaged: collapse file/shmem compound pages
Date: Thu, 26 May 2022 04:36:04 +0100	[thread overview]
Message-ID: <Yo71pPTkB7taSb9Y@casper.infradead.org> (raw)
In-Reply-To: <CAAa6QmQhqHrKa5M4vRCAPtOa4pTet6MfELprN2Wb0rv46PSjTA@mail.gmail.com>

On Wed, May 25, 2022 at 06:23:52PM -0700, Zach O'Keefe wrote:
> On Wed, May 25, 2022 at 12:07 PM Matthew Wilcox <willy@infradead.org> wrote:
> > The khugepaged code (like much of the mm used to) assumes that memory
> > comes in two sizes, PTE and PMD.  That's still true for anon and shmem
> > for now, but hopefully we'll start managing both anon & shmem memory in
> > larger chunks, without necessarily going as far as PMD.
> >
> > I think the purpose of khugepaged should continue to be to construct
> > PMD-size pages; I don't see the point of it wandering through process VMs
> > replacing order-2 pages with order-5 pages.  I may be wrong about that,
> > of course, so feel free to argue with me.
> 
> I'd agree here.
> 
> > Anyway, that meaning behind that comment is that the PageTransCompound()
> > test is going to be true on any compound page (TransCompound doesn't
> > check that the page is necessarily a THP).  So that particular test should
> > be folio_test_pmd_mappable(), but there are probably other things which
> > ought to be changed, including converting the entire file from dealing
> > in pages to dealing in folios.
> 
> Right, at this point, the page might be a pmd-mapped THP, or it could
> be a pte-mapped compound page (I'm unsure if we can encounter compound
> pages outside hugepages).

Today, there is a way.  We can find a folio with an order between 0 and
PMD_ORDER if the underlying filesystem supports large folios and the
file is executable and we've enabled CONFIG_READ_ONLY_THP_FOR_FS.
In this case, we'll simply skip over it because the code believes that
means it's already a PMD.

> If we could tell it's already pmd-mapped, we're done :) IIUC,
> folio_test_pmd_mappable() is a necessary but not sufficient condition
> to determine this.

It is necessary, but from khugepaged's point of view, it's sufficient
because khugepaged's job is to create PMD-sized folios -- it's not up to
khugepaged to ensure that PMD-sized folios are actually mapped using
a PMD.  There may be some other component of the system (eg DAMON?)
which has chosen to temporarily map the PMD-sized folio using PTEs
in order to track whether the memory is all being used.  It may also
be the case that (for file-based memory), the VMA is mis-aligned and
despite creating a PMD-sized folio, it can't be mapped with a PMD.

> Else, if it's not, is it safe to try and continue? Suppose we find a
> folio of 0 < order < HPAGE_PMD_ORDER. Are we safely able to try and
> extend it, or will we break some filesystems that expect a certain
> order folio?

We're not giving filesystems the opportunity to request that ;-)
Filesystems are expected to handle folios of arbitrary order (if they
claim the ability to support large folios at all).  In practice, I've
capped the folio creation size at PMD_ORDER (because I don't want to track
down all the places that assume pmd_page() is necessarily a head page),
but filesystems shouldn't rely on it.

shmem still expects folios to be of order either 0 or PMD_ORDER.
That assumption extends into the swap code and I haven't had the heart
to go and fix all those places yet.  Plus Neil was doing major surgery
to the swap code in the most recent deveopment cycle and I didn't want
to get in his way.

So I am absolutely fine with khugepaged allocating a PMD-size folio for
any inode that claims mapping_large_folio_support().  If any filesystems
break, we'll fix them.

> > I actually have one patch which starts in that direction, but I haven't
> > followed it up yet with all the other patches to that file which will
> > be needed:
> 
> Thanks for the head start! Not an expert here, but would you say
> converting this file to use folios is a necessary first step?

Not _necessary_, but I find it helps keep things clearer.  Plus it's
something that needs to happen anyway.


  reply	other threads:[~2022-05-26  3:36 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-24 22:42 mm/khugepaged: collapse file/shmem compound pages Zach O'Keefe
2022-05-25 19:07 ` Matthew Wilcox
2022-05-26  1:23   ` Zach O'Keefe
2022-05-26  3:36     ` Matthew Wilcox [this message]
2022-05-27  0:54       ` Zach O'Keefe
2022-05-27  3:47         ` Matthew Wilcox
2022-05-27 16:27           ` Zach O'Keefe
2022-05-28  3:48             ` Matthew Wilcox
2022-05-29 21:36               ` Zach O'Keefe
2022-05-30  1:25                 ` Rongwei Wang
2022-06-01  5:19                   ` Zach O'Keefe
2022-06-01 11:26                     ` Rongwei Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yo71pPTkB7taSb9Y@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=linux-mm@kvack.org \
    --cc=rientjes@google.com \
    --cc=zokeefe@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).