linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yang Shi <shy828301@gmail.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>,
	vbabka@suse.cz, kirill.shutemov@linux.intel.com,
	linmiaohe@huawei.com, songliubraving@fb.com, riel@surriel.com,
	ziy@nvidia.com, akpm@linux-foundation.org, tytso@mit.edu,
	adilger.kernel@dilger.ca, darrick.wong@oracle.com,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [v2 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent
Date: Fri, 18 Mar 2022 13:19:00 -0700	[thread overview]
Message-ID: <CAHbLzkrE1sF2bcv=eB8szx047pUgMZzo9JuyADTCBzZ9V-+7XA@mail.gmail.com> (raw)
In-Reply-To: <YjTT5Meqdn8fiuC2@casper.infradead.org>

On Fri, Mar 18, 2022 at 11:48 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Mar 18, 2022 at 11:04:29AM -0700, Yang Shi wrote:
> > I agree once page cache huge page is fully supported,
> > READ_ONLY_THP_FOR_FS could be deprecated. But actually this patchset
> > makes khugepaged collapse file THP more consistently. It guarantees
> > the THP could be collapsed as long as file THP is supported and
> > configured properly and there is suitable file vmas, it is not
> > guaranteed by the current code. So it should be useful even though
> > READ_ONLY_THP_FOR_FS is gone IMHO.
>
> I don't know if it's a good thing or not.  Experiments with 64k
> PAGE_SIZE on arm64 shows some benchmarks improving and others regressing.
> Just because we _can_ collapse a 2MB range of pages into a single 2MB
> page doesn't mean we _should_.  I suspect the right size folio for any
> given file will depend on the access pattern.  For example, dirtying a
> few bytes in a folio will result in the entire folio being written back.
> Is that what you want?  Maybe!  It may prompt the filesystem to defragment
> that range, which would be good.  On the other hand, if you're bandwidth
> limited, it may decrease your performance.  And if your media has limited
> write endurance, it may result in your drive wearing out more quickly.
>
> Changing the heuristics should come with data.  Preferably from a wide
> range of systems and use cases.  I know that's hard to do, but how else
> can we proceed?

TBH I don't think it belongs to "change the heuristics". Its users'
decision if their workloads could benefit from huge pages or not. They
could set THP to always/madivse/never per their workloads. The
patchset is aimed to fix the misbehavior. The user visible issue is
even though users enable READ_ONLY_THP_FOR_FS and configure THP to
"always" (khugepaged always runs) and do expect their huge text
section is backed by THP but THP may not be collapsed.

>
> And I think you ignored my point that READ_ONLY_THP_FOR_FS required
> no changes to filesystems.  It was completely invisible to them, by
> design.  Now this patchset requires each filesystem to do something.
> That's not a great step.

I don't mean to ignore your point. I do understand it is not perfect.
I was thinking about making it FS agnostic in the first place. But I
didn't think of a perfect way to do it at that time, so I followed
what tmpfs does.

However, by rethinking this we may be able to call
khugepaged_enter_file() in filemap_fault(). I was concerned about the
overhead in the page fault path. But it may be neglectable since
khugepaged_enter_file() does bail out in the first place if the mm is
already registered in khugepaged, just the first page fault needs to
go through all the check, but the first page fault is typically a
major fault so the overhead should be not noticeable comparing to the
overhead of I/O. Calling khugepaged_enter() in page fault path is the
approach used by anonymous THP too.

>
> P.S. khugepaged currently does nothing if a range contains a compound
> page.  It assumes that the page is compound because it's now a THP.
> Large folios break that assumption, so khugepaged will now never
> collapse a range which includes large folios.  Thanks to commit
>     mm/filemap: Support VM_HUGEPAGE for file mappings
> we'll always try to bring in PMD-sized pages for MADV_HUGEPAGE, so
> it _probably_ doesn't matter.  But it's something we should watch
> for as filesystems grow support for large folios.

Yeah, I agree, thanks for reminding this. In addition I think the
users of READ_ONLY_THP_FOR_FS should also expect the PMD-sized THP to
be collapsed for their usecase with full page cache THP support since
their benefits come from reduced TLB miss.

  reply	other threads:[~2022-03-18 20:19 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-17 23:48 [v2 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent Yang Shi
2022-03-17 23:48 ` [v2 PATCH 1/8] sched: coredump.h: clarify the use of MMF_VM_HUGEPAGE Yang Shi
2022-03-17 23:48 ` [v2 PATCH 2/8] mm: khugepaged: remove redundant check for VM_NO_KHUGEPAGED Yang Shi
2022-03-17 23:48 ` [v2 PATCH 3/8] mm: khugepaged: skip DAX vma Yang Shi
2022-03-21 12:04   ` Hyeonggon Yoo
2022-03-21 20:59     ` Yang Shi
2022-03-17 23:48 ` [v2 PATCH 4/8] mm: thp: only regular file could be THP eligible Yang Shi
2022-03-17 23:48 ` [v2 PATCH 5/8] mm: khugepaged: make khugepaged_enter() void function Yang Shi
2022-03-17 23:48 ` [v2 PATCH 6/8] mm: khugepaged: move some khugepaged_* functions to khugepaged.c Yang Shi
2022-03-17 23:48 ` [v2 PATCH 7/8] mm: khugepaged: introduce khugepaged_enter_file() helper Yang Shi
2022-03-17 23:48 ` [v2 PATCH 8/8] fs: register suitable readonly vmas for khugepaged Yang Shi
2022-03-18  1:10 ` [v2 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent Song Liu
2022-03-18  1:29 ` Dave Chinner
2022-03-18  3:38   ` Matthew Wilcox
2022-03-18 18:04     ` Yang Shi
2022-03-18 18:48       ` Matthew Wilcox
2022-03-18 20:19         ` Yang Shi [this message]
2022-03-18 17:31   ` Yang Shi
2022-03-24  1:47 ` Theodore Ts'o
2022-03-24  2:46   ` Yang Shi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHbLzkrE1sF2bcv=eB8szx047pUgMZzo9JuyADTCBzZ9V-+7XA@mail.gmail.com' \
    --to=shy828301@gmail.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=akpm@linux-foundation.org \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=riel@surriel.com \
    --cc=songliubraving@fb.com \
    --cc=tytso@mit.edu \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).