From: David Hildenbrand <david@redhat.com>
To: John Hubbard <jhubbard@nvidia.com>,
	Ryan Roberts <ryan.roberts@arm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Matthew Wilcox <willy@infradead.org>,
	Yin Fengwei <fengwei.yin@intel.com>, Yu Zhao <yuzhao@google.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Yang Shi <shy828301@gmail.com>,
	"Huang, Ying" <ying.huang@intel.com>, Zi Yan <ziy@nvidia.com>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Itaru Kitayama <itaru.kitayama@gmail.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	David Rientjes <rientjes@google.com>,
	Vlastimil Babka <vbabka@suse.cz>, Hugh Dickins <hughd@google.com>,
	Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RESEND PATCH v7 03/10] mm: thp: Introduce per-size thp sysfs interface
Date: Wed, 29 Nov 2023 09:05:31 +0100
Message-ID: <e29bbb0d-8c61-49b8-9bf0-df7ddf2728e7@redhat.com>
In-Reply-To: <1a738e0a-ac11-4cd3-be2f-6b6e7cb4980a@nvidia.com>

On 29.11.23 04:42, John Hubbard wrote:
> On 11/22/23 08:29, Ryan Roberts wrote:
>> In preparation for adding support for anonymous small-sized THP,
>> introduce new sysfs structure that will be used to control the new
>> behaviours. A new directory is added under transparent_hugepage for each
>> supported THP size, and contains an `enabled` file, which can be set to
>> "global" (to inherrit the global setting), "always", "madvise" or
>> "never". For now, the kernel still only supports PMD-sized anonymous
>> THP, so only 1 directory is populated.
>>
>> The first half of the change converts transhuge_vma_suitable() and
>> hugepage_vma_check() so that they take a bitfield of orders for which
>> the user wants to determine support, and the functions filter out all
>> the orders that can't be supported, given the current sysfs
>> configuration and the VMA dimensions. If there is only 1 order set in
>> the input then the output can continue to be treated like a boolean;
>> this is the case for most call sites.
>>
>> The second half of the change implements the new sysfs interface. It has
>> been done so that each supported THP size has a `struct thpsize`, which
>> describes the relevant metadata and is itself a kobject. This is pretty
>> minimal for now, but should make it easy to add new per-thpsize files to
>> the interface if needed in future (e.g. per-size defrag). Rather than
>> keep the `enabled` state directly in the struct thpsize, I've elected to
>> directly encode it into huge_anon_orders_[always|madvise|global]
>> bitfields since this reduces the amount of work required in
>> transhuge_vma_suitable() which is called for every page fault.
>>
>> The remainder is copied from Documentation/admin-guide/mm/transhuge.rst,
>> as modified by this commit. See that file for further details.
>>
>> Transparent Hugepage Support for anonymous memory can be entirely
>> disabled (mostly for debugging purposes) or only enabled inside
>> MADV_HUGEPAGE regions (to avoid the risk of consuming more memory
>> resources) or enabled system wide. This can be achieved
>> per-supported-THP-size with one of::
>>
>> 	echo always >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>> 	echo madvise >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>> 	echo never >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>>
>> where <size> is the hugepage size being addressed, the available sizes
>> for which vary by system. Alternatively it is possible to specify that
>> a given hugepage size will inherrit the global enabled setting::
>>
>> 	echo global >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>>
>> The global (legacy) enabled setting can be set as follows::
>>
>> 	echo always >/sys/kernel/mm/transparent_hugepage/enabled
>> 	echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
>> 	echo never >/sys/kernel/mm/transparent_hugepage/enabled
>>
>> By default, PMD-sized hugepages have enabled="global" and all other
>> hugepage sizes have enabled="never". If enabling multiple hugepage
>> sizes, the kernel will select the most appropriate enabled size for a
>> given allocation.
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>    Documentation/admin-guide/mm/transhuge.rst |  74 ++++--
>>    Documentation/filesystems/proc.rst         |   6 +-
>>    fs/proc/task_mmu.c                         |   3 +-
>>    include/linux/huge_mm.h                    | 100 +++++---
>>    mm/huge_memory.c                           | 263 +++++++++++++++++++--
>>    mm/khugepaged.c                            |  16 +-
>>    mm/memory.c                                |   6 +-
>>    mm/page_vma_mapped.c                       |   3 +-
>>    8 files changed, 387 insertions(+), 84 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
>> index b0cc8243e093..52565e0bd074 100644
>> --- a/Documentation/admin-guide/mm/transhuge.rst
>> +++ b/Documentation/admin-guide/mm/transhuge.rst
>> @@ -45,10 +45,23 @@ components:
>>       the two is using hugepages just because of the fact the TLB miss is
>>       going to run faster.
>>
>> +As well as PMD-sized THP described above, it is also possible to
>> +configure the system to allocate "small-sized THP" to back anonymous
> 
> Here's one of the places to change to the new name, which lately is
> "multi-size THP", or mTHP or m_thp for short. (I've typed "multi-size"
> instead of "multi-sized", because the 'd' doesn't add significantly to
> the meaning, and if in doubt, shorter is better.)
> 
> 
>> +memory (for example 16K, 32K, 64K, etc). These THPs continue to be
>> +PTE-mapped, but in many cases can still provide similar benefits to
>> +those outlined above: Page faults are significantly reduced (by a
>> +factor of e.g. 4, 8, 16, etc), but latency spikes are much less
>> +prominent because the size of each page isn't as huge as the PMD-sized
>> +variant and there is less memory to clear in each page fault. Some
>> +architectures also employ TLB compression mechanisms to squeeze more
>> +entries in when a set of PTEs are virtually and physically contiguous
>> +and approporiately aligned. In this case, TLB misses will occur less
>> +often.
>> +
> 
> OK, all of the above still seems like it can remain the same.
> 
>>    THP can be enabled system wide or restricted to certain tasks or even
>>    memory ranges inside task's address space. Unless THP is completely
>>    disabled, there is ``khugepaged`` daemon that scans memory and
>> -collapses sequences of basic pages into huge pages.
>> +collapses sequences of basic pages into PMD-sized huge pages.
>>
>>    The THP behaviour is controlled via :ref:`sysfs <thp_sysfs>`
>>    interface and using madvise(2) and prctl(2) system calls.
>> @@ -95,12 +108,29 @@ Global THP controls
>>    Transparent Hugepage Support for anonymous memory can be entirely disabled
>>    (mostly for debugging purposes) or only enabled inside MADV_HUGEPAGE
>>    regions (to avoid the risk of consuming more memory resources) or enabled
>> -system wide. This can be achieved with one of::
>> +system wide. This can be achieved per-supported-THP-size with one of::
>> +
>> +	echo always >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>> +	echo madvise >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>> +	echo never >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>> +
>> +where <size> is the hugepage size being addressed, the available sizes
>> +for which vary by system. Alternatively it is possible to specify that
>> +a given hugepage size will inherrit the global enabled setting::
> 
> typo: inherrit
> 
>> +
>> +	echo global >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
>> +
>> +The global (legacy) enabled setting can be set as follows::
>>
>>    	echo always >/sys/kernel/mm/transparent_hugepage/enabled
>>    	echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
>>    	echo never >/sys/kernel/mm/transparent_hugepage/enabled
>>
>> +By default, PMD-sized hugepages have enabled="global" and all other
>> +hugepage sizes have enabled="never". If enabling multiple hugepage
>> +sizes, the kernel will select the most appropriate enabled size for a
>> +given allocation.
>> +
> 
> This is slightly murky. I wonder if "inherited" is a little more directly
> informative than global; it certainly felt that way my first time running
> this and poking at it.
> 
> And a few trivial examples would be a nice touch.
> 
> And so overall with a few other minor tweaks, I'd suggest this:
> 
> ...
> where <size> is the hugepage size being addressed, the available sizes
> for which vary by system.
> 
> For example:
> 	echo always >/sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
> 
> Alternatively it is possible to specify that a given hugepage size will inherit
> the top-level "enabled" value:
> 
> 	echo inherited >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
> 
> For example:
> 	echo inherited >/sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
> 
> The top-level setting (for use with "inherited") can be set by issuing one of the
> following commands::
> 
> 	echo always >/sys/kernel/mm/transparent_hugepage/enabled
> 	echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
> 	echo never >/sys/kernel/mm/transparent_hugepage/enabled
> 
> By default, PMD-sized hugepages have enabled="inherited" and all other
> hugepage sizes have enabled="never".

"inherited" works for me as well.
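
To make the intended semantics concrete, here is a minimal sketch of how a
per-size value would combine with the top-level value, going by the
description quoted above. It is an illustration, not the kernel's
implementation: the function names effective_thp_mode and thp_enabled_path
are hypothetical, and "inherited" is only the name proposed in this thread
(the patch as posted uses "global"), so the sketch accepts both. It also
assumes that an explicit per-size value applies regardless of the top-level
setting.

```shell
#!/bin/sh
# Illustration only. Function names are hypothetical; "inherited" is the
# value name proposed in this thread, "global" the one used by v7.

# thp_enabled_path SIZE_KB
# Prints the per-size sysfs path described in the quoted documentation.
thp_enabled_path() {
	echo "/sys/kernel/mm/transparent_hugepage/hugepages-${1}kB/enabled"
}

# effective_thp_mode PER_SIZE_VALUE TOP_LEVEL_VALUE
# Prints the mode that would apply to one hugepage size.
effective_thp_mode() {
	per_size=$1
	top_level=$2
	case "$per_size" in
	inherited|global)
		# Defer to the top-level transparent_hugepage/enabled value.
		echo "$top_level"
		;;
	always|madvise|never)
		# Assumption: an explicit per-size value is used as-is.
		echo "$per_size"
		;;
	*)
		echo "unknown value: $per_size" >&2
		return 1
		;;
	esac
}
```

With the defaults described above, a PMD-sized entry set to "inherited"
follows whatever is written to the top-level enabled file, while every other
size stays at "never" until it is changed explicitly.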

-- 
Cheers,

David / dhildenb



