linux-mm.kvack.org archive mirror
* [RFC PATCH 0/4] mm, bpf: BPF based THP adjustment
@ 2025-04-29  2:41 Yafang Shao
From: Yafang Shao @ 2025-04-29  2:41 UTC
  To: akpm, ast, daniel, andrii; +Cc: bpf, linux-mm, Yafang Shao

In our container environment, we aim to enable THP selectively—allowing
specific services to use it while restricting others. This approach is
driven by the following considerations:

1. Memory Fragmentation
   THP can lead to increased memory fragmentation, so we want to limit its
   use across services.
2. Performance Impact
   Some services see no benefit from THP, making its usage unnecessary.
3. Performance Gains
   Certain workloads, such as machine learning services, experience
   significant performance improvements with THP, so we enable it for them
   specifically. 

Since multiple services run on a single host in a containerized environment,
enabling THP globally is not ideal. Previously, we set THP to madvise,
allowing selected services to opt in via MADV_HUGEPAGE. However, this
approach had a limitation:

- Some services inadvertently used madvise(MADV_HUGEPAGE) through
  third-party libraries, bypassing our restrictions.
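For reference, here is a minimal user-space sketch of that opt-in path. It maps a PMD-aligned anonymous region and calls madvise(MADV_HUGEPAGE) on it, which is the same call a third-party library can make on a service's behalf. The 2 MiB PMD size (x86-64 with 4 KiB pages) and the helper name map_thp_region() are assumptions for illustration only.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumption: 2 MiB PMD-sized THPs, as on x86-64 with 4 KiB base pages. */
#define THP_PMD_SIZE (2UL * 1024 * 1024)

/*
 * Hypothetical helper: map `len` bytes (a multiple of THP_PMD_SIZE) of
 * anonymous memory aligned to THP_PMD_SIZE and mark it MADV_HUGEPAGE.
 * Returns the aligned address, or NULL if mmap() fails. *madv_err is set
 * to 0 if madvise() succeeded, otherwise to its errno (for example EINVAL
 * on kernels built without CONFIG_TRANSPARENT_HUGEPAGE).
 */
static void *map_thp_region(size_t len, int *madv_err)
{
	/* Over-allocate by one PMD so an aligned start is guaranteed. */
	size_t span = len + THP_PMD_SIZE;
	char *base = mmap(NULL, span, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return NULL;

	/* Round up to the next PMD boundary and unmap the unused head/tail. */
	uintptr_t start = ((uintptr_t)base + THP_PMD_SIZE - 1) &
			  ~(uintptr_t)(THP_PMD_SIZE - 1);
	char *aligned = (char *)start;
	size_t head = (size_t)(aligned - base);
	size_t tail = span - head - len;
	if (head)
		munmap(base, head);
	if (tail)
		munmap(aligned + len, tail);

	/*
	 * Opt in. Under the "madvise" global mode this flag is what makes the
	 * range THP-eligible; under "never" the range stays ineligible.
	 */
	*madv_err = madvise(aligned, len, MADV_HUGEPAGE) == 0 ? 0 : errno;
	return aligned;
}
```

Whether huge pages were actually used can then be checked via the AnonHugePages field in /proc/self/smaps after touching the memory.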

To address this, we initially hooked the __x64_sys_madvise() syscall, which
is error-injectable (a BPF program can override its return value), to block
MADV_HUGEPAGE from unwanted services. While this worked, it was error-prone,
and it did not help services that need "always" mode, since modifying their
code to call madvise() was impractical.

To achieve finer-grained control, we introduced an fmod_ret-based solution.
Now, we dynamically adjust THP settings per service by hooking
hugepage_global_{enabled,always}() via BPF. This lets us enable or disable
THP on a per-service basis without changing the global setting.
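As an illustration only, the per-service decision such a hook makes can be modeled in plain C. The policy enum, the table, and thp_hook_value() are hypothetical and not taken from the patches. In a real BPF program the table would more plausibly be a BPF hash map keyed by cgroup ID (for example obtained with bpf_get_current_cgroup_id()), which is likewise an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-service THP policy; not an API from the patch set. */
enum thp_policy { THP_POLICY_DEFAULT, THP_POLICY_ENABLE, THP_POLICY_DISABLE };

struct service_policy {
	uint64_t cgroup_id;      /* identifies the service's container */
	enum thp_policy policy;
};

/*
 * Return the value the hooked hugepage_global_{enabled,always}() would
 * report for a task in `cgroup_id`. Services without an explicit
 * ENABLE/DISABLE entry fall back to `global_value`, i.e. the unmodified
 * result derived from the global sysfs setting.
 */
static bool thp_hook_value(const struct service_policy *tbl, size_t n,
			   uint64_t cgroup_id, bool global_value)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].cgroup_id != cgroup_id)
			continue;
		if (tbl[i].policy == THP_POLICY_ENABLE)
			return true;
		if (tbl[i].policy == THP_POLICY_DISABLE)
			return false;
		break; /* THP_POLICY_DEFAULT: keep the global behavior */
	}
	return global_value;
}
```

For example, with cgroup 100 mapped to ENABLE and cgroup 200 to DISABLE, a task in cgroup 100 gets THP even when the global setting is off, and every unlisted service keeps the global behavior.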

The hugepage_global_{enabled,always}() functions currently share the same
BPF hook, which limits THP configuration to either always or never. While
this suffices for our specific use cases, full support for all three modes
(always, madvise, and never) would require splitting them into separate
hooks.
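The always/never limitation follows from how the two predicates are combined. Below is a simplified user-space model (not kernel code) of the sysfs check for an anonymous VMA, roughly "always, or madvised and enabled". Per-size mTHP controls and the other kernel checks are omitted, and all model_* names are hypothetical.

```c
#include <stdbool.h>

/* Global mode as in /sys/kernel/mm/transparent_hugepage/enabled. */
enum thp_mode { THP_ALWAYS, THP_MADVISE, THP_NEVER };

/* Models hugepage_global_enabled(): true for "always" or "madvise". */
static bool model_global_enabled(enum thp_mode m)
{
	return m == THP_ALWAYS || m == THP_MADVISE;
}

/* Models hugepage_global_always(): true only for "always". */
static bool model_global_always(enum thp_mode m)
{
	return m == THP_ALWAYS;
}

/* Simplified eligibility: "always", or VM_HUGEPAGE set and "enabled". */
static bool model_thp_allowed(bool enabled, bool always, bool vma_madvised)
{
	return always || (vma_madvised && enabled);
}

/* A single shared hook feeds the same override value to both predicates. */
static bool model_allowed_shared_hook(bool override, bool vma_madvised)
{
	return model_thp_allowed(override, override, vma_madvised);
}
```

With a shared hook the result collapses to the override value itself: true behaves like "always" (madvise is irrelevant) and false like "never". Reproducing "madvise" requires enabled = true with always = false, which only separate hooks can express.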

This is the initial RFC series; feedback is welcome!

Yafang Shao (4):
  mm: move hugepage_global_{enabled,always}() to internal.h
  mm: pass VMA parameter to hugepage_global_{enabled,always}()
  mm: add BPF hook for THP adjustment
  selftests/bpf: Add selftest for THP adjustment

 include/linux/huge_mm.h                       |  54 +-----
 mm/Makefile                                   |   3 +
 mm/bpf.c                                      |  36 ++++
 mm/bpf.h                                      |  21 +++
 mm/huge_memory.c                              |  50 ++++-
 mm/internal.h                                 |  21 +++
 mm/khugepaged.c                               |  18 +-
 tools/testing/selftests/bpf/config            |   1 +
 .../selftests/bpf/prog_tests/thp_adjust.c     | 176 ++++++++++++++++++
 .../selftests/bpf/progs/test_thp_adjust.c     |  32 ++++
 10 files changed, 344 insertions(+), 68 deletions(-)
 create mode 100644 mm/bpf.c
 create mode 100644 mm/bpf.h
 create mode 100644 tools/testing/selftests/bpf/prog_tests/thp_adjust.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_thp_adjust.c

-- 
2.43.5




Thread overview: 41+ messages
2025-04-29  2:41 [RFC PATCH 0/4] mm, bpf: BPF based THP adjustment Yafang Shao
2025-04-29  2:41 ` [RFC PATCH 1/4] mm: move hugepage_global_{enabled,always}() to internal.h Yafang Shao
2025-04-29 15:13   ` Zi Yan
2025-04-30  2:40     ` Yafang Shao
2025-04-30 12:11       ` Zi Yan
2025-04-30 14:43         ` Yafang Shao
2025-04-29  2:41 ` [RFC PATCH 2/4] mm: pass VMA parameter to hugepage_global_{enabled,always}() Yafang Shao
2025-04-29 15:31   ` Zi Yan
2025-04-30  2:46     ` Yafang Shao
2025-04-29  2:41 ` [RFC PATCH 3/4] mm: add BPF hook for THP adjustment Yafang Shao
2025-04-29 15:19   ` Alexei Starovoitov
2025-04-30  2:48     ` Yafang Shao
2025-04-29  2:41 ` [RFC PATCH 4/4] selftests/bpf: Add selftest " Yafang Shao
2025-04-29  3:11 ` [RFC PATCH 0/4] mm, bpf: BPF based " Matthew Wilcox
2025-04-29  4:53   ` Yafang Shao
2025-04-29 15:09 ` Zi Yan
2025-04-30  2:33   ` Yafang Shao
2025-04-30 13:19     ` Zi Yan
2025-04-30 14:38       ` Yafang Shao
2025-04-30 15:00         ` Zi Yan
2025-04-30 15:16           ` Yafang Shao
2025-04-30 15:21           ` Liam R. Howlett
2025-04-30 15:37             ` Yafang Shao
2025-04-30 15:53               ` Liam R. Howlett
2025-04-30 16:06                 ` Yafang Shao
2025-04-30 17:45                   ` Johannes Weiner
2025-04-30 17:53                     ` Zi Yan
2025-05-01 19:36                       ` Gutierrez Asier
2025-05-02  5:48                         ` Yafang Shao
2025-05-02 12:00                           ` Zi Yan
2025-05-02 12:18                             ` Yafang Shao
2025-05-02 13:04                               ` David Hildenbrand
2025-05-02 13:06                                 ` Matthew Wilcox
2025-05-02 13:34                                 ` Zi Yan
2025-05-05  2:35                                 ` Yafang Shao
2025-05-05  9:11                           ` Gutierrez Asier
2025-05-05  9:38                             ` Yafang Shao
2025-04-30 17:59         ` Johannes Weiner
2025-05-01  0:40           ` Yafang Shao
2025-04-30 14:40     ` Liam R. Howlett
2025-04-30 14:49       ` Yafang Shao
