From: Lorenzo Stoakes <ljs@kernel.org>
To: Pedro Falcato <pfalcato@suse.de>
Cc: Vernon Yang <vernon2gm@gmail.com>,
akpm@linux-foundation.org, david@kernel.org,
roman.gushchin@linux.dev, inwardvessel@gmail.com,
shakeel.butt@linux.dev, ast@kernel.org, daniel@iogearbox.net,
surenb@google.com, tz2294@columbia.edu, baohua@kernel.org,
lance.yang@linux.dev, dev.jain@arm.com, laoar.shao@gmail.com,
gutierrez.asier@huawei-partners.com,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, Vernon Yang <yanglincheng@kylinos.cn>
Subject: Re: [PATCH v2 0/4] mm: introduce mthp_ext via cgroup-bpf to make mTHP more transparent
Date: Fri, 8 May 2026 17:15:30 +0100 [thread overview]
Message-ID: <af4KaeaCWUSfOS-Z@lucifer> (raw)
In-Reply-To: <af4HYivyP7LDG2-k@pedro-suse.lan>
On Fri, May 08, 2026 at 05:00:04PM +0100, Pedro Falcato wrote:
> On Fri, May 08, 2026 at 11:00:51PM +0800, Vernon Yang wrote:
> > From: Vernon Yang <yanglincheng@kylinos.cn>
> >
> > Hi all,
> >
> > Background
> > ==========
> >
> > As is well known, a system can simultaneously run multiple different
> > scenarios. However, THP is not beneficial in every scenario — it is only
> > most suitable for memory-intensive applications that are not sensitive
> > to tail latency. For example, Redis, which is sensitive to tail latency,
> > is not suitable for THP. But in practice, due to Redis issues, the
> > entire THP functionality is often turned off, preventing other scenarios
> > from benefiting from it.
> >
> > There are also some embedded scenarios (e.g. Android) that directly use
> > 2MB THP, where the granularity is too large. Therefore, we introduced
> > mTHP in v6.8, which supports multiple-size THP. In practice, however, we
> > still globally fix a single mTHP size and are unable to automatically
> > select different mTHP sizes based on different scenarios.
> >
> > After testing, it was found that
> >
> > - When the system has a lot of free memory, it is normal for Redis to
> > use mTHP. performance degradation in Redis only occurs when the system
> > is under high memory pressure.
> > - Additionally, when a large number of small-memory processes use mTHP,
> > memory waste is prone to occur, and performance degradation may also
> > happen during fast memory allocation/release.
> >
> > Previously, "Cgroup-based THP control"[1] was proposed, but it had the
> > following issues.
> >
> > - It breaks the cgroup hierarchy property.
> > - Add new THP knobs, making sysadmin's job more complex
> >
> > Previously, "mm, bpf: BPF-MM, BPF-THP"[2] was proposed, but it had the
> > following issues.
> >
> > - It didn't address the issue on the per-process mode.
> > - For global mode, the prctl(PR_SET_THP_DISABLE) has already achieved
> > the same objective, there is no need to add two mechanisms for the
> > same purpose.
> > - Attaching st_ops to mm_struct, the same issues that cgroup-bpf once
> > faced are likely to arise again, e.g. lifetime of cgroup vs bpf, dying
> > cgroups, wq deadlock, etc. It is recommended to use cgroup-bpf for
> > implementation.
> > - Unclear ABI stability guarantees.
> > - The test cases are too simplistic, lacking eBPF cases similar to real
> > workloads such as sched_ext.
> >
> > If I miss some thing, please let me know. Thanks!
> >
> <snip>
> > kernbench results
> > ~~~~~~~~~~~~~~~~~
> >
> > When cgroup memory.high=max, no memory pressure, seems only noise level
> > changes, mthp_ext no regression.
> >
> > always never always+mthp_ext
> > Amean user-32 19702.39 ( 0.00%) 18428.90 * 6.46%* 19706.73 ( -0.02%)
> > Amean syst-32 1159.55 ( 0.00%) 2252.43 * -94.25%* 1177.48 * -1.55%*
> > Amean elsp-32 703.28 ( 0.00%) 699.10 * 0.59%* 703.99 * -0.10%*
> > BAmean-95 user-32 19701.79 ( 0.00%) 18425.01 ( 6.48%) 19704.78 ( -0.02%)
> > BAmean-95 syst-32 1159.43 ( 0.00%) 2251.86 ( -94.22%) 1177.03 ( -1.52%)
> > BAmean-95 elsp-32 703.24 ( 0.00%) 698.99 ( 0.61%) 703.88 ( -0.09%)
> > BAmean-99 user-32 19701.79 ( 0.00%) 18425.01 ( 6.48%) 19704.78 ( -0.02%)
> > BAmean-99 syst-32 1159.43 ( 0.00%) 2251.86 ( -94.22%) 1177.03 ( -1.52%)
> > BAmean-99 elsp-32 703.24 ( 0.00%) 698.99 ( 0.61%) 703.88 ( -0.09%)
> >
> > When cgroup memory.high=2G, high memory pressure, mthp_ext improved by 26%.
> >
> > always never always+mthp_ext
> > Amean user-32 20250.65 ( 0.00%) 18368.91 * 9.29%* 18681.27 * 7.75%*
> > Amean syst-32 12778.56 ( 0.00%) 9636.99 * 24.58%* 9392.65 * 26.50%*
> > Amean elsp-32 1377.55 ( 0.00%) 1026.10 * 25.51%* 1019.40 * 26.00%*
> > BAmean-95 user-32 20233.75 ( 0.00%) 18353.57 ( 9.29%) 18678.01 ( 7.69%)
> > BAmean-95 syst-32 12543.21 ( 0.00%) 9612.28 ( 23.37%) 9386.83 ( 25.16%)
> > BAmean-95 elsp-32 1367.82 ( 0.00%) 1023.75 ( 25.15%) 1018.17 ( 25.56%)
> > BAmean-99 user-32 20233.75 ( 0.00%) 18353.57 ( 9.29%) 18678.01 ( 7.69%)
> > BAmean-99 syst-32 12543.21 ( 0.00%) 9612.28 ( 23.37%) 9386.83 ( 25.16%)
> > BAmean-99 elsp-32 1367.82 ( 0.00%) 1023.75 ( 25.15%) 1018.17 ( 25.56%)
> >
> > TODO
> > ====
> >
> > - mthp_ext handles different "enum tva_type" values. For example, for
> > small-memory processes, only 4KB is used in TVA_PAGEFAULT, while
> > TVA_KHUGEPAGED/TVA_FORCED_COLLAPSE continues to collapse all mthp
> > size. Under high memory pressure, only 4KB is used for
> > TVA_PAGEFAULT/TVA_KHUGEPAGED, while TVA_FORCED_COLLAPSE continues to
> > collapse all mthp size.
> > - selftest
> >
> > If there are additional scenarios, please let me know as well, so I can
> > conduct further prototype verification tests to make mTHP more
> > transparent and further clear/stabilize the BPF-THP ABI.
>
> How is it more transparent if you're essentially adding mTHP
> micro-programmability from the user's side? This series makes it
> _less_ transparent.
>
> If you actually want to make it more transparent, then I would suggest
> improving the heuristics such that (m)THP doesn't churn through memory
> on high memory pressure. Or such that it doesn't feel extremely compelled
> to place the largest THP it can based on vibes.
I agree but I also don't really want to see anything like that until mTHP is
actually stabilised and the code base is less appalling :)
We've deferred paying down technical debt far too long.
>
> --
> Pedro
Thanks, Lorenzo
prev parent reply other threads:[~2026-05-08 16:15 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-08 15:00 [PATCH v2 0/4] mm: introduce mthp_ext via cgroup-bpf to make mTHP more transparent Vernon Yang
2026-05-08 15:00 ` [PATCH v2 1/4] psi: add psi_group_flush_stats() function Vernon Yang
2026-05-08 15:19 ` Lorenzo Stoakes
2026-05-08 15:00 ` [PATCH v2 2/4] bpf: add bpf_cgroup_{flush_stats,stall} function Vernon Yang
2026-05-08 15:40 ` bot+bpf-ci
2026-05-08 15:00 ` [PATCH v2 3/4] mm: introduce bpf_mthp_ops struct ops Vernon Yang
2026-05-08 15:40 ` bot+bpf-ci
2026-05-08 15:57 ` Lorenzo Stoakes
2026-05-08 20:54 ` David Hildenbrand (Arm)
2026-05-11 11:25 ` Lorenzo Stoakes
2026-05-08 15:00 ` [PATCH v2 4/4] samples: bpf: add mthp_ext Vernon Yang
2026-05-08 15:40 ` bot+bpf-ci
2026-05-08 15:14 ` [PATCH v2 0/4] mm: introduce mthp_ext via cgroup-bpf to make mTHP more transparent Lorenzo Stoakes
2026-05-08 16:05 ` Lorenzo Stoakes
2026-05-08 16:53 ` Vernon Yang
2026-05-11 11:20 ` Lorenzo Stoakes
2026-05-08 16:00 ` Pedro Falcato
2026-05-08 16:15 ` Lorenzo Stoakes [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=af4KaeaCWUSfOS-Z@lucifer \
--to=ljs@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=ast@kernel.org \
--cc=baohua@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=david@kernel.org \
--cc=dev.jain@arm.com \
--cc=gutierrez.asier@huawei-partners.com \
--cc=inwardvessel@gmail.com \
--cc=lance.yang@linux.dev \
--cc=laoar.shao@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=pfalcato@suse.de \
--cc=roman.gushchin@linux.dev \
--cc=shakeel.butt@linux.dev \
--cc=surenb@google.com \
--cc=tz2294@columbia.edu \
--cc=vernon2gm@gmail.com \
--cc=yanglincheng@kylinos.cn \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox