From: Barry Song <baohua@kernel.org>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Hugh Dickins <hughd@google.com>,
Jonathan Corbet <corbet@lwn.net>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
David Hildenbrand <david@redhat.com>,
Lance Yang <ioworker0@gmail.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Gavin Shan <gshan@redhat.com>,
Pankaj Raghav <kernel@pankajraghav.com>,
Daniel Gomez <da.gomez@samsung.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH v1 0/4] Control folio sizes used for page cache memory
Date: Thu, 19 Sep 2024 20:20:51 +1200 [thread overview]
Message-ID: <CAGsJ_4z8kh4Pn-TUrVq6FALR1J5j4fpvQkef2xPFYPWdWfXdxA@mail.gmail.com> (raw)
In-Reply-To: <480f34d0-a943-40da-9c69-2353fe311cf7@arm.com>
On Thu, Aug 8, 2024 at 10:27 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 17/07/2024 08:12, Ryan Roberts wrote:
> > Hi All,
> >
> > This series is an RFC that adds sysfs and kernel cmdline controls to configure
> > the set of allowed large folio sizes that can be used when allocating
> > file-memory for the page cache. As part of the control mechanism, it provides
> > for a special-case "preferred folio size for executable mappings" marker.
> >
> > I'm trying to solve 2 separate problems with this series:
> >
> > 1. Reduce pressure in iTLB and improve performance on arm64: This is a modified
> > approach for the change at [1]. Instead of hardcoding the preferred executable
> > folio size into the arch, user space can now select it. This decouples the arch
> > code and also makes the mechanism more generic; it can be bypassed (the default)
> > or any folio size can be set. For my use case, 64K is preferred, but I've also
> > heard from Willy of a use case where putting all text into 2M PMD-sized folios
> > is preferred. This approach avoids the need for synchronous MADV_COLLAPSE (and
> > therefore faulting in all text ahead of time) to achieve that.
>
> Just a polite bump on this; I'd really like to get something like this merged to
> help reduce iTLB pressure. We had a discussion at the THP Cabal meeting a few
> weeks back without reaching a solid conclusion. I haven't heard any concrete
> objections yet, but only a lukewarm reception. How can I move this forward?
Hi Ryan,
These requirements seem to apply, to some extent, to anon, swap, pagecache,
and shmem. While the swapin_enabled knob was rejected, the shmem_enabled
option is already in place.

I wonder if it's possible to use the existing 'enabled' setting across all
cases, since, from an architectural perspective with cont-pte, pagecache may
not differ from anon. The demand for reducing page faults, LRU overhead,
etc., also seems quite similar.

I imagine that once Android's file systems support mTHP, we’ll uniformly enable
64KB for anon, swap, shmem, and page cache. It should then be sufficient to
enable all of them using a single knob:
'/sys/kernel/mm/transparent_hugepage/hugepages-xxkB/enabled'.
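
As a rough sketch (assuming the existing per-size anon 'enabled' knob plus the
per-size shmem knob from Baolin's shmem mTHP work; the paths below are
illustrative, not something this series adds):

    # enable 64KB mTHP for anonymous memory
    echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled
    # enable 64KB mTHP for shmem/tmpfs
    echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/shmem_enabled

With this series, if I read it correctly, the page cache would get its own
per-size file_enabled knob on top of these, which is the kind of duplication
I'd like to avoid if possible.
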
Is there anything that makes pagecache and shmem significantly different
from anon? In my Android case, they all seem the same. However, I assume
there might be other use cases where differentiating them is necessary?
>
> Thanks,
> Ryan
>
>
> >
> > 2. Reduce memory fragmentation in systems under high memory pressure (e.g.
> > Android): The theory goes that if all folios are 64K, then failure to allocate a
> > 64K folio should become unlikely. But if the page cache is allocating lots of
> > different orders, with most allocations having an order below 64K (as is the
> > case today), then the ability to allocate 64K folios diminishes. By providing
> > control over the allowed set of folio sizes, we can tune to avoid failures of
> > the crucial 64K folio allocations. Additionally, I've heard (second-hand) of
> > the need to disable
> > large folios in the page cache entirely due to latency concerns in some
> > settings. These controls allow all of this without kernel changes.
> >
> > The value of (1) is clear and the performance improvements are documented in
> > patch 2. I don't yet have any data demonstrating the theory for (2) since I
> > can't reproduce the setup that Barry had at [2]. But my view is that by adding
> > these controls we will enable the community to explore further, in the same way
> > that the anon mTHP controls helped harden the understanding for anonymous
> > memory.
> >
> > ---
> > This series depends on the "mTHP allocation stats for file-backed memory" series
> > at [3], which itself applies on top of yesterday's mm-unstable (650b6752c8a3). All
> > mm selftests have been run; no regressions were observed.
> >
> > [1] https://lore.kernel.org/linux-mm/20240215154059.2863126-1-ryan.roberts@arm.com/
> > [2] https://www.youtube.com/watch?v=ht7eGWqwmNs&list=PLbzoR-pLrL6oj1rVTXLnV7cOuetvjKn9q&index=4
> > [3] https://lore.kernel.org/linux-mm/20240716135907.4047689-1-ryan.roberts@arm.com/
> >
> > Thanks,
> > Ryan
> >
> > Ryan Roberts (4):
> > mm: mTHP user controls to configure pagecache large folio sizes
> > mm: Introduce "always+exec" for mTHP file_enabled control
> > mm: Override mTHP "enabled" defaults at kernel cmdline
> > mm: Override mTHP "file_enabled" defaults at kernel cmdline
> >
> > .../admin-guide/kernel-parameters.txt | 16 ++
> > Documentation/admin-guide/mm/transhuge.rst | 66 +++++++-
> > include/linux/huge_mm.h | 61 ++++---
> > mm/filemap.c | 26 ++-
> > mm/huge_memory.c | 158 +++++++++++++++++-
> > mm/readahead.c | 43 ++++-
> > 6 files changed, 329 insertions(+), 41 deletions(-)
> >
> > --
> > 2.43.0
> >
>
Thanks
Barry
Thread overview: 23+ messages
2024-07-17 7:12 [RFC PATCH v1 0/4] Control folio sizes used for page cache memory Ryan Roberts
2024-07-17 7:12 ` [RFC PATCH v1 1/4] mm: mTHP user controls to configure pagecache large folio sizes Ryan Roberts
2024-07-17 7:12 ` [RFC PATCH v1 2/4] mm: Introduce "always+exec" for mTHP file_enabled control Ryan Roberts
2024-07-17 17:10 ` Ryan Roberts
2024-07-17 7:12 ` [RFC PATCH v1 3/4] mm: Override mTHP "enabled" defaults at kernel cmdline Ryan Roberts
2024-07-19 0:46 ` Barry Song
2024-07-19 7:47 ` Ryan Roberts
2024-07-19 7:52 ` Barry Song
2024-07-19 8:18 ` Ryan Roberts
2024-07-19 8:29 ` David Hildenbrand
2024-07-22 9:13 ` Daniel Gomez
2024-07-22 9:36 ` Ryan Roberts
2024-07-22 14:10 ` Ryan Roberts
2024-07-17 7:12 ` [RFC PATCH v1 4/4] mm: Override mTHP "file_enabled" " Ryan Roberts
2024-07-17 10:31 ` [RFC PATCH v1 0/4] Control folio sizes used for page cache memory David Hildenbrand
2024-07-17 10:45 ` Ryan Roberts
2024-07-17 14:25 ` David Hildenbrand
2024-07-22 9:35 ` Daniel Gomez
2024-07-22 9:43 ` Ryan Roberts
[not found] ` <480f34d0-a943-40da-9c69-2353fe311cf7@arm.com>
2024-09-19 8:20 ` Barry Song [this message]
2024-09-19 17:21 ` Ryan Roberts
2024-12-06 5:09 ` Barry Song
2024-12-06 5:29 ` Baolin Wang