From: David Hildenbrand <david@redhat.com>
To: John Hubbard <jhubbard@nvidia.com>,
Ryan Roberts <ryan.roberts@arm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Matthew Wilcox <willy@infradead.org>,
Yin Fengwei <fengwei.yin@intel.com>, Yu Zhao <yuzhao@google.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Anshuman Khandual <anshuman.khandual@arm.com>,
Yang Shi <shy828301@gmail.com>,
"Huang, Ying" <ying.huang@intel.com>, Zi Yan <ziy@nvidia.com>,
Luis Chamberlain <mcgrof@kernel.org>,
Itaru Kitayama <itaru.kitayama@gmail.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
David Rientjes <rientjes@google.com>,
Vlastimil Babka <vbabka@suse.cz>, Hugh Dickins <hughd@google.com>,
Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: [RESEND PATCH v7 03/10] mm: thp: Introduce per-size thp sysfs interface
Date: Wed, 29 Nov 2023 09:05:31 +0100
Message-ID: <e29bbb0d-8c61-49b8-9bf0-df7ddf2728e7@redhat.com>
In-Reply-To: <1a738e0a-ac11-4cd3-be2f-6b6e7cb4980a@nvidia.com>

On 29.11.23 04:42, John Hubbard wrote:
> On 11/22/23 08:29, Ryan Roberts wrote:
>> In preparation for adding support for anonymous small-sized THP,
>> introduce new sysfs structure that will be used to control the new
>> behaviours. A new directory is added under transparent_hugepage for each
>> supported THP size, and contains an `enabled` file, which can be set to
>> "global" (to inherrit the global setting), "always", "madvise" or
>> "never". For now, the kernel still only supports PMD-sized anonymous
>> THP, so only 1 directory is populated.
>>
>> The first half of the change converts transhuge_vma_suitable() and
>> hugepage_vma_check() so that they take a bitfield of orders for which
>> the user wants to determine support, and the functions filter out all
>> the orders that can't be supported, given the current sysfs
>> configuration and the VMA dimensions. If there is only 1 order set in
>> the input then the output can continue to be treated like a boolean;
>> this is the case for most call sites.
>>
>> The second half of the change implements the new sysfs interface. It has
>> been done so that each supported THP size has a `struct thpsize`, which
>> describes the relevant metadata and is itself a kobject. This is pretty
>> minimal for now, but should make it easy to add new per-thpsize files to
>> the interface if needed in future (e.g. per-size defrag). Rather than
>> keep the `enabled` state directly in the struct thpsize, I've elected to
>> directly encode it into huge_anon_orders_[always|madvise|global]
>> bitfields since this reduces the amount of work required in
>> transhuge_vma_suitable() which is called for every page fault.
>>
>> The remainder is copied from Documentation/admin-guide/mm/transhuge.rst,
>> as modified by this commit. See that file for further details.
>>
>> Transparent Hugepage Support for anonymous memory can be entirely
>> disabled (mostly for debugging purposes) or only enabled inside
>> MADV_HUGEPAGE regions (to avoid the risk of consuming more memory
>> resources) or enabled system wide. This can be achieved
>> per-supported-THP-size with one of::
>>
>> echo always >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>> echo madvise >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>> echo never >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>>
>> where <size> is the hugepage size being addressed, the available sizes
>> for which vary by system. Alternatively it is possible to specify that
>> a given hugepage size will inherrit the global enabled setting::
>>
>> echo global >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>>
>> The global (legacy) enabled setting can be set as follows::
>>
>> echo always >/sys/kernel/mm/transparent_hugepage/enabled
>> echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
>> echo never >/sys/kernel/mm/transparent_hugepage/enabled
>>
>> By default, PMD-sized hugepages have enabled="global" and all other
>> hugepage sizes have enabled="never". If enabling multiple hugepage
>> sizes, the kernel will select the most appropriate enabled size for a
>> given allocation.
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>> Documentation/admin-guide/mm/transhuge.rst | 74 ++++--
>> Documentation/filesystems/proc.rst | 6 +-
>> fs/proc/task_mmu.c | 3 +-
>> include/linux/huge_mm.h | 100 +++++---
>> mm/huge_memory.c | 263 +++++++++++++++++++--
>> mm/khugepaged.c | 16 +-
>> mm/memory.c | 6 +-
>> mm/page_vma_mapped.c | 3 +-
>> 8 files changed, 387 insertions(+), 84 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
>> index b0cc8243e093..52565e0bd074 100644
>> --- a/Documentation/admin-guide/mm/transhuge.rst
>> +++ b/Documentation/admin-guide/mm/transhuge.rst
>> @@ -45,10 +45,23 @@ components:
>> the two is using hugepages just because of the fact the TLB miss is
>> going to run faster.
>>
>> +As well as PMD-sized THP described above, it is also possible to
>> +configure the system to allocate "small-sized THP" to back anonymous
>
> Here's one of the places to change to the new name, which lately is
> "multi-size THP", or mTHP or m_thp for short. (I've typed "multi-size"
> instead of "multi-sized", because the 'd' doesn't add significantly to
> the meaning, and if in doubt, shorter is better.)
>
>
>> +memory (for example 16K, 32K, 64K, etc). These THPs continue to be
>> +PTE-mapped, but in many cases can still provide similar benefits to
>> +those outlined above: Page faults are significantly reduced (by a
>> +factor of e.g. 4, 8, 16, etc), but latency spikes are much less
>> +prominent because the size of each page isn't as huge as the PMD-sized
>> +variant and there is less memory to clear in each page fault. Some
>> +architectures also employ TLB compression mechanisms to squeeze more
>> +entries in when a set of PTEs are virtually and physically contiguous
>> +and approporiately aligned. In this case, TLB misses will occur less
>> +often.
>> +
>
> OK, all of the above still seems like it can remain the same.
>
>> THP can be enabled system wide or restricted to certain tasks or even
>> memory ranges inside task's address space. Unless THP is completely
>> disabled, there is ``khugepaged`` daemon that scans memory and
>> -collapses sequences of basic pages into huge pages.
>> +collapses sequences of basic pages into PMD-sized huge pages.
>>
>> The THP behaviour is controlled via :ref:`sysfs <thp_sysfs>`
>> interface and using madvise(2) and prctl(2) system calls.
>> @@ -95,12 +108,29 @@ Global THP controls
>> Transparent Hugepage Support for anonymous memory can be entirely disabled
>> (mostly for debugging purposes) or only enabled inside MADV_HUGEPAGE
>> regions (to avoid the risk of consuming more memory resources) or enabled
>> -system wide. This can be achieved with one of::
>> +system wide. This can be achieved per-supported-THP-size with one of::
>> +
>> + echo always >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>> + echo madvise >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>> + echo never >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>> +
>> +where <size> is the hugepage size being addressed, the available sizes
>> +for which vary by system. Alternatively it is possible to specify that
>> +a given hugepage size will inherrit the global enabled setting::
>
> typo: inherrit
>
>> +
>> + echo global >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>> +
>> +The global (legacy) enabled setting can be set as follows::
>>
>> echo always >/sys/kernel/mm/transparent_hugepage/enabled
>> echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
>> echo never >/sys/kernel/mm/transparent_hugepage/enabled
>>
>> +By default, PMD-sized hugepages have enabled="global" and all other
>> +hugepage sizes have enabled="never". If enabling multiple hugepage
>> +sizes, the kernel will select the most appropriate enabled size for a
>> +given allocation.
>> +
>
> This is slightly murky. I wonder if "inherited" is a little more directly
> informative than global; it certainly felt that way my first time running
> this and poking at it.
>
> And a few trivial examples would be a nice touch.
>
> And so overall with a few other minor tweaks, I'd suggest this:
>
> ...
> where <size> is the hugepage size being addressed, the available sizes
> for which vary by system.
>
> For example:
> echo always >/sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
>
> Alternatively it is possible to specify that a given hugepage size will inherit
> the top-level "enabled" value:
>
> echo inherited >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>
> For example:
> echo inherited >/sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
>
> The top-level setting (for use with "inherited") can be set by issuing one of the
> following commands::
>
> echo always >/sys/kernel/mm/transparent_hugepage/enabled
> echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
> echo never >/sys/kernel/mm/transparent_hugepage/enabled
>
> By default, PMD-sized hugepages have enabled="inherited" and all other
> hugepage sizes have enabled="never".

"inherited" works for me as well.
--
Cheers,
David / dhildenb
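
For anyone poking at the interface discussed above, here is a minimal shell sketch of how a per-size setting could be resolved against the top-level one. It is not taken from the patch. It assumes that each `enabled` file lists all choices with the active one in square brackets (as the existing top-level file does), and that the inherit keyword is spelled "global", "inherited" or "inherit", since the name is still under discussion in this thread. The function names `selected_value` and `thp_effective_enabled` are hypothetical.

```shell
#!/bin/sh
# Sketch only: resolve the effective "enabled" value for one THP size.
# Assumptions (not confirmed by this thread):
#   - each `enabled` file marks the active choice with square brackets,
#     e.g. "always [madvise] never";
#   - the inherit keyword is one of "global", "inherited" or "inherit".

# selected_value FILE
# Print the bracketed (currently selected) word found in FILE.
selected_value() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

# thp_effective_enabled ROOT SIZE_KB
# ROOT is the transparent_hugepage sysfs directory; SIZE_KB is e.g. 2048.
# Prints the per-size value, or the top-level value when the per-size
# file selects the inherit keyword.
thp_effective_enabled() {
    root=$1
    size_kb=$2
    value=$(selected_value "$root/hugepages-${size_kb}kB/enabled") || return 1
    case $value in
        global|inherited|inherit)
            selected_value "$root/enabled" ;;
        *)
            printf '%s\n' "$value" ;;
    esac
}
```

On a kernel carrying this series, it might be called as `thp_effective_enabled /sys/kernel/mm/transparent_hugepage 2048`. Passing the root directory as an argument keeps the helper testable against a scratch directory.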