* [PATCH 0/3] THP Shrinker
@ 2022-09-28 6:25 alexlzhu
0 siblings, 0 replies; 4+ messages in thread
From: alexlzhu @ 2022-09-28 6:25 UTC (permalink / raw)
To: linux-mm
Cc: willy, akpm, riel, hannes, linux-kernel, kernel-team,
Alexander Zhu
From: Alexander Zhu <alexlzhu@fb.com>
Transparent Hugepages (THPs) use a larger page size of 2MB, compared to
the normal page size of 4KB. A larger page size allows for fewer TLB
misses and thus more efficient use of the CPU, but it also results in
more memory waste, which can hurt performance in some use cases.
Applications currently enable THPs in the Linux kernel for limited
virtual address ranges via the madvise system call. The THP shrinker
tries to strike a balance between increased use of THPs and increased
use of memory. It reduces memory usage by removing underutilized THPs
identified by the thp_utilization scanner.
In our experiments we have observed that the least utilized THPs are
almost entirely unutilized.
Sample Output:
Utilized[0-50]: 1331 680884
Utilized[51-101]: 9 3983
Utilized[102-152]: 3 1187
Utilized[153-203]: 0 0
Utilized[204-255]: 2 539
Utilized[256-306]: 5 1135
Utilized[307-357]: 1 192
Utilized[358-408]: 0 0
Utilized[409-459]: 1 57
Utilized[460-512]: 400 13
Last Scan Time: 223.98s
Last Scan Duration: 70.65s
Above is a sample obtained from one of our test machines when THP is
always enabled. Of the 1331 THPs in this thp_utilization sample with
0-50 utilized subpages, we see that there are 680884 free pages. This
comes out to 680884 / (512 * 1331) = 99.91% zero pages in the least
utilized bucket. This represents 680884 * 4KB = 2.7GB of memory waste.
Also note that the vast majority of pages are either in the least
utilized [0-50] or most utilized [460-512] buckets. The least utilized
THPs are responsible for almost all of the memory waste when THP is
always enabled. Thus, by clearing out the THPs in the lowest
utilization bucket, we eliminate most of the memory waste while keeping
most of the CPU efficiency improvement. We have seen similar results
on our production hosts.
This patchset introduces the THP shrinker we have developed to identify
and split the least utilized THPs. It includes the thp_utilization
changes that group anonymous THPs into buckets, the split_huge_page()
changes that identify and zap zero-filled 4KB pages within THPs, and
the shrinker itself. Note that the split_huge_page() changes are based
on previous work done by Yu Zhao.
In the future, we intend to allow additional tuning of the shrinker
per workload, depending on CPU/IO/memory pressure and the amount of
anonymous memory. The long-term goal is to eventually run with THP
always enabled for all applications and deprecate madvise entirely.
In production we have so far observed a 2-3% reduction in overall CPU
usage on stateless web servers when THP is always enabled.
Alexander Zhu (3):
mm: add thp_utilization metrics to debugfs
mm: changes to split_huge_page() to free zero filled tail pages
mm: THP low utilization shrinker
Documentation/admin-guide/mm/transhuge.rst | 9 +
include/linux/huge_mm.h | 10 +
include/linux/list_lru.h | 24 ++
include/linux/mm_types.h | 5 +
include/linux/rmap.h | 2 +-
include/linux/vm_event_item.h | 3 +
mm/huge_memory.c | 306 +++++++++++++++++-
mm/list_lru.c | 49 +++
mm/migrate.c | 72 ++++-
mm/migrate_device.c | 4 +-
mm/page_alloc.c | 6 +
mm/vmstat.c | 3 +
.../selftests/vm/split_huge_page_test.c | 114 ++++++-
tools/testing/selftests/vm/vm_util.c | 23 ++
tools/testing/selftests/vm/vm_util.h | 1 +
15 files changed, 613 insertions(+), 18 deletions(-)
--
2.30.2
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH 0/3] THP Shrinker
@ 2022-09-28 6:44 alexlzhu
2022-09-28 14:21 ` David Hildenbrand
0 siblings, 1 reply; 4+ messages in thread
From: alexlzhu @ 2022-09-28 6:44 UTC (permalink / raw)
To: linux-mm
Cc: willy, akpm, riel, hannes, linux-kernel, kernel-team,
Alexander Zhu
From: Alexander Zhu <alexlzhu@fb.com>
Transparent Hugepages (THPs) use a larger page size of 2MB, compared to
the normal page size of 4KB. A larger page size allows for fewer TLB
misses and thus more efficient use of the CPU, but it also results in
more memory waste, which can hurt performance in some use cases.
Applications currently enable THPs in the Linux kernel for limited
virtual address ranges via the madvise system call. The THP shrinker
tries to strike a balance between increased use of THPs and increased
use of memory. It reduces memory usage by removing underutilized THPs
identified by the thp_utilization scanner.
In our experiments we have observed that the least utilized THPs are
almost entirely unutilized.
Sample Output:
Utilized[0-50]: 1331 680884
Utilized[51-101]: 9 3983
Utilized[102-152]: 3 1187
Utilized[153-203]: 0 0
Utilized[204-255]: 2 539
Utilized[256-306]: 5 1135
Utilized[307-357]: 1 192
Utilized[358-408]: 0 0
Utilized[409-459]: 1 57
Utilized[460-512]: 400 13
Last Scan Time: 223.98s
Last Scan Duration: 70.65s
Above is a sample obtained from one of our test machines when THP is
always enabled. Of the 1331 THPs in this thp_utilization sample with
0-50 utilized subpages, we see that there are 680884 free pages. This
comes out to 680884 / (512 * 1331) = 99.91% zero pages in the least
utilized bucket. This represents 680884 * 4KB = 2.7GB of memory waste.
Also note that the vast majority of pages are either in the least
utilized [0-50] or most utilized [460-512] buckets. The least utilized
THPs are responsible for almost all of the memory waste when THP is
always enabled. Thus, by clearing out the THPs in the lowest
utilization bucket, we eliminate most of the memory waste while keeping
most of the CPU efficiency improvement. We have seen similar results
on our production hosts.
This patchset introduces the THP shrinker we have developed to identify
and split the least utilized THPs. It includes the thp_utilization
changes that group anonymous THPs into buckets, the split_huge_page()
changes that identify and zap zero-filled 4KB pages within THPs, and
the shrinker itself. Note that the split_huge_page() changes are based
on previous work done by Yu Zhao.
In the future, we intend to allow additional tuning of the shrinker
per workload, depending on CPU/IO/memory pressure and the amount of
anonymous memory. The long-term goal is to eventually run with THP
always enabled for all applications and deprecate madvise entirely.
In production we have so far observed a 2-3% reduction in overall CPU
usage on stateless web servers when THP is always enabled.
Alexander Zhu (3):
mm: add thp_utilization metrics to sysfs
mm: changes to split_huge_page() to free zero filled tail pages
mm: THP low utilization shrinker
Documentation/admin-guide/mm/transhuge.rst | 9 +
include/linux/huge_mm.h | 10 +
include/linux/list_lru.h | 24 ++
include/linux/mm_types.h | 5 +
include/linux/rmap.h | 2 +-
include/linux/vm_event_item.h | 3 +
mm/huge_memory.c | 342 +++++++++++++++++-
mm/list_lru.c | 49 +++
mm/migrate.c | 72 +++-
mm/migrate_device.c | 4 +-
mm/page_alloc.c | 6 +
mm/vmstat.c | 3 +
.../selftests/vm/split_huge_page_test.c | 113 +++++-
tools/testing/selftests/vm/vm_util.c | 23 ++
tools/testing/selftests/vm/vm_util.h | 1 +
15 files changed, 648 insertions(+), 18 deletions(-)
--
2.30.2
* Re: [PATCH 0/3] THP Shrinker
2022-09-28 6:44 alexlzhu
@ 2022-09-28 14:21 ` David Hildenbrand
2022-09-28 16:22 ` Alex Zhu (Kernel)
0 siblings, 1 reply; 4+ messages in thread
From: David Hildenbrand @ 2022-09-28 14:21 UTC (permalink / raw)
To: alexlzhu, linux-mm; +Cc: willy, akpm, riel, hannes, linux-kernel, kernel-team
On 28.09.22 08:44, alexlzhu@fb.com wrote:
> From: Alexander Zhu <alexlzhu@fb.com>
>
> Transparent Hugepages (THPs) use a larger page size of 2MB, compared to
> the normal page size of 4KB. A larger page size allows for fewer TLB
> misses and thus more efficient use of the CPU, but it also results in
> more memory waste, which can hurt performance in some use cases.
> Applications currently enable THPs in the Linux kernel for limited
> virtual address ranges via the madvise system call. The THP shrinker
> tries to strike a balance between increased use of THPs and increased
> use of memory. It reduces memory usage by removing underutilized THPs
> identified by the thp_utilization scanner.
>
> In our experiments we have observed that the least utilized THPs are
> almost entirely unutilized.
>
> Sample Output:
>
> Utilized[0-50]: 1331 680884
> Utilized[51-101]: 9 3983
> Utilized[102-152]: 3 1187
> Utilized[153-203]: 0 0
> Utilized[204-255]: 2 539
> Utilized[256-306]: 5 1135
> Utilized[307-357]: 1 192
> Utilized[358-408]: 0 0
> Utilized[409-459]: 1 57
> Utilized[460-512]: 400 13
> Last Scan Time: 223.98s
> Last Scan Duration: 70.65s
>
> Above is a sample obtained from one of our test machines when THP is
> always enabled. Of the 1331 THPs in this thp_utilization sample with
> 0-50 utilized subpages, we see that there are 680884 free pages. This
> comes out to 680884 / (512 * 1331) = 99.91% zero pages in the least
> utilized bucket. This represents 680884 * 4KB = 2.7GB of memory waste.
>
> Also note that the vast majority of pages are either in the least
> utilized [0-50] or most utilized [460-512] buckets. The least utilized
> THPs are responsible for almost all of the memory waste when THP is
> always enabled. Thus, by clearing out the THPs in the lowest
> utilization bucket, we eliminate most of the memory waste while keeping
> most of the CPU efficiency improvement. We have seen similar results
> on our production hosts.
>
> This patchset introduces the THP shrinker we have developed to identify
> and split the least utilized THPs. It includes the thp_utilization
> changes that group anonymous THPs into buckets, the split_huge_page()
> changes that identify and zap zero-filled 4KB pages within THPs, and
> the shrinker itself. Note that the split_huge_page() changes are based
> on previous work done by Yu Zhao.
>
> In the future, we intend to allow additional tuning of the shrinker
> per workload, depending on CPU/IO/memory pressure and the amount of
> anonymous memory. The long-term goal is to eventually run with THP
> always enabled for all applications and deprecate madvise entirely.
>
> In production we have so far observed a 2-3% reduction in overall CPU
> usage on stateless web servers when THP is always enabled.
What's the diff to the RFC?
--
Thanks,
David / dhildenb
* Re: [PATCH 0/3] THP Shrinker
2022-09-28 14:21 ` David Hildenbrand
@ 2022-09-28 16:22 ` Alex Zhu (Kernel)
0 siblings, 0 replies; 4+ messages in thread
From: Alex Zhu (Kernel) @ 2022-09-28 16:22 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-mm@kvack.org, Matthew Wilcox, akpm@linux-foundation.org,
riel@surriel.com, hannes@cmpxchg.org,
linux-kernel@vger.kernel.org, Kernel Team
Sorry about that. The diff to the RFC:
-Remove all THPs that are not in the top utilization bucket. This is what we have found to perform best in production testing; there is an almost trivial number of THPs in the middle buckets, so the least utilized THPs account for most of the memory waste.
-Added a check of THP utilization prior to split_huge_page() in the THP shrinker. This accounts for THPs that have moved to the top bucket but were underutilized at the time they were added to the list_lru.
-Refactored out the code that obtains the thp_utilization bucket, as it now has to be used in multiple places.
-Multiply the shrink_count and scan_count by HPAGE_PMD_NR. A THP is 512 pages and should count as 512 objects in reclaim; this way reclaim is triggered at a more appropriate frequency than in the RFC.
-Added support for mapping to the read-only zero page when splitting a THP registered with userfaultfd, along with a self test to verify that this works.
-Only trigger the unmap_clean/zap in split_huge_page() on anonymous THPs; we cannot zap zero pages for file THPs.
Thanks,
Alex
end of thread, other threads:[~2022-09-28 16:22 UTC | newest]
Thread overview: 4+ messages
2022-09-28 6:25 [PATCH 0/3] THP Shrinker alexlzhu
-- strict thread matches above, loose matches on Subject: below --
2022-09-28 6:44 alexlzhu
2022-09-28 14:21 ` David Hildenbrand
2022-09-28 16:22 ` Alex Zhu (Kernel)