linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Yang Shi <shy828301@gmail.com>,
	vbabka@suse.cz, kirill.shutemov@linux.intel.com,
	linmiaohe@huawei.com, songliubraving@fb.com, riel@surriel.com,
	ziy@nvidia.com, akpm@linux-foundation.org, tytso@mit.edu,
	adilger.kernel@dilger.ca, darrick.wong@oracle.com,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [v2 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent
Date: Fri, 18 Mar 2022 03:38:11 +0000	[thread overview]
Message-ID: <YjP+oyoT9Y2SFt8L@casper.infradead.org> (raw)
In-Reply-To: <20220318012948.GE1544202@dread.disaster.area>

On Fri, Mar 18, 2022 at 12:29:48PM +1100, Dave Chinner wrote:
> On Thu, Mar 17, 2022 at 04:48:19PM -0700, Yang Shi wrote:
> > 
> > Changelog
> > v2: * Collected reviewed-by tags from Miaohe Lin.
> >     * Fixed build error for patch 4/8.
> > 
> > The readonly FS THP relies on khugepaged to collapse THP for suitable
> > vmas.  But it is kind of "random luck" for khugepaged to see the
> > readonly FS vmas (see report: https://lore.kernel.org/linux-mm/00f195d4-d039-3cf2-d3a1-a2c88de397a0@suse.cz/) since currently the vmas are registered to khugepaged when:
> >   - Anon huge pmd page fault
> >   - VMA merge
> >   - MADV_HUGEPAGE
> >   - Shmem mmap
> > 
> > If the above conditions are not met, even though khugepaged is enabled
> > it won't see readonly FS vmas at all.  MADV_HUGEPAGE could be specified
> > explicitly to tell khugepaged to collapse this area, but when khugepaged
> > mode is "always" it should scan suitable vmas as long as VM_NOHUGEPAGE
> > is not set.
> > 
> > So make sure readonly FS vmas are registered to khugepaged to make the
> > behavior more consistent.
> > 
> > Registering the vmas in mmap path seems more preferred from performance
> > point of view since page fault path is definitely hot path.
> > 
> > 
> > The patch 1 ~ 7 are minor bug fixes, clean up and preparation patches.
> > The patch 8 converts ext4 and xfs.  We may need convert more filesystems,
> > but I'd like to hear some comments before doing that.
> 
> After reading through the patchset, I have no idea what this is even
> doing or enabling. I can't comment on the last patch and it's effect
> on XFS because there's no high level explanation of the
> functionality or feature to provide me with the context in which I
> should be reviewing this patchset.
> 
> I understand this has something to do with hugepages, but there's no
> explaination of exactly where huge pages are going to be used in the
> filesystem, what the problems with khugepaged and filesystems are
> that this apparently solves, what constraints it places on
> filesystems to enable huge pages to be used, etc.
> 
> I'm guessing that the result is that we'll suddenly see huge pages
> in the page cache for some undefined set of files in some undefined
> set of workloads. But that doesn't help me understand any of the
> impacts it may have. e.g:
> 
> - how does this relate to the folio conversion and use of large
>   pages in the page cache?
> - why do we want two completely separate large page mechanisms in
>   the page cache?
> - why is this limited to "read only VMAs" and how does the
>   filesystem actually ensure that the VMAs are read only?
> - what happens if we have a file that huge pages mapped into the
>   page cache via read only VMAs then has write() called on it via a
>   different file descriptor and so we need to dirty the page cache
>   that has huge pages in it?
> 
> I've got a lot more questions, but to save me having to ask them,
> how about you explain what this new functionality actually does, why
> we need to support it, and why it is better than the fully writeable
> huge page support via folios that we already have in the works...

Back in Puerto Rico when we set up the THP Cabal, we had two competing
approaches for using larger pages in the page cache; mine (which turned
into folios after I realised that THPs were the wrong model) and Song
Liu's CONFIG_READ_ONLY_THP_FOR_FS.  Song's patches were ready earlier
(2019) and were helpful in unveiling some of the problems which needed
to be fixed.  The filesystem never sees the large pages because they're
only used for read-only files, and the pages are already Uptodate at
the point they're collapsed into a THP.  So there's no changes needed
to the filesystem.

This collection of patches I'm agnostic about.  As far as I can
tell, they're a way to improve how often the ROTHP feature gets used.
That doesn't really interest me since we're so close to having proper
support for large pages/folios in filesystems.  So I'm not particularly
interested in improving a feature that we're about to delete.  But I also
don't like it that the filesystem now has to do something; the ROTHP
feature is supposed to be completely transparent from the point of view
of the filesystem.


  reply	other threads:[~2022-03-18  3:38 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-17 23:48 [v2 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent Yang Shi
2022-03-17 23:48 ` [v2 PATCH 1/8] sched: coredump.h: clarify the use of MMF_VM_HUGEPAGE Yang Shi
2022-03-17 23:48 ` [v2 PATCH 2/8] mm: khugepaged: remove redundant check for VM_NO_KHUGEPAGED Yang Shi
2022-03-17 23:48 ` [v2 PATCH 3/8] mm: khugepaged: skip DAX vma Yang Shi
2022-03-21 12:04   ` Hyeonggon Yoo
2022-03-21 20:59     ` Yang Shi
2022-03-17 23:48 ` [v2 PATCH 4/8] mm: thp: only regular file could be THP eligible Yang Shi
2022-03-17 23:48 ` [v2 PATCH 5/8] mm: khugepaged: make khugepaged_enter() void function Yang Shi
2022-03-17 23:48 ` [v2 PATCH 6/8] mm: khugepaged: move some khugepaged_* functions to khugepaged.c Yang Shi
2022-03-17 23:48 ` [v2 PATCH 7/8] mm: khugepaged: introduce khugepaged_enter_file() helper Yang Shi
2022-03-17 23:48 ` [v2 PATCH 8/8] fs: register suitable readonly vmas for khugepaged Yang Shi
2022-03-18  1:10 ` [v2 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent Song Liu
2022-03-18  1:29 ` Dave Chinner
2022-03-18  3:38   ` Matthew Wilcox [this message]
2022-03-18 18:04     ` Yang Shi
2022-03-18 18:48       ` Matthew Wilcox
2022-03-18 20:19         ` Yang Shi
2022-03-18 17:31   ` Yang Shi
2022-03-24  1:47 ` Theodore Ts'o
2022-03-24  2:46   ` Yang Shi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YjP+oyoT9Y2SFt8L@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=adilger.kernel@dilger.ca \
    --cc=akpm@linux-foundation.org \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=riel@surriel.com \
    --cc=shy828301@gmail.com \
    --cc=songliubraving@fb.com \
    --cc=tytso@mit.edu \
    --cc=vbabka@suse.cz \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).