From: Hugh Dickins <hughd@google.com>
To: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: akpm@linux-foundation.org, hughd@google.com, david@redhat.com,
ziy@nvidia.com, lorenzo.stoakes@oracle.com,
Liam.Howlett@oracle.com, npache@redhat.com,
ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org,
zokeefe@google.com, shy828301@gmail.com, usamaarif642@gmail.com,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 0/2] fix MADV_COLLAPSE issue if THP settings are disabled
Date: Tue, 24 Jun 2025 22:53:28 -0700 (PDT)
Message-ID: <75c02dbf-4189-958d-515e-fa80bb2187fc@google.com>
In-Reply-To: <cover.1750815384.git.baolin.wang@linux.alibaba.com>

On Wed, 25 Jun 2025, Baolin Wang wrote:
> When invoking thp_vma_allowable_orders(), if the TVA_ENFORCE_SYSFS flag is not
> specified, we will ignore the THP sysfs settings. Whilst it makes sense for the
> callers who do not specify this flag, it creates an odd and surprising situation
> where a sysadmin specifying 'never' for all THP sizes still observes THP pages
> being allocated and used on the system. MADV_COLLAPSE is an example of
> such a case: it does not set TVA_ENFORCE_SYSFS when calling
> thp_vma_allowable_orders().
>
> As we discussed in the previous thread [1], the MADV_COLLAPSE will ignore
> the system-wide anon/shmem THP sysfs settings, which means that even though
> we have disabled the anon/shmem THP configuration, MADV_COLLAPSE will still
> attempt to collapse into an anon/shmem THP. This violates the rule we have
> agreed upon: never means never.
>
> For example, system administrators who disabled THP everywhere must indeed very
> much not want THP to be used for whatever reason - having individual programs
> being able to quietly override this is very surprising and likely to cause headaches
> for those who desire this not to happen on their systems.
>
> This patch set will address the MADV_COLLAPSE issue.
>
> Test
> ====
> 1. Tested the mm selftests and found no regressions.
> 2. With toggling different Anon mTHP settings, the allocation and madvise collapse for
> anonymous pages work well.
> 3. With toggling different shmem mTHP settings, the allocation and madvise collapse for
> shmem work well.
> 4. Tested the large order allocation for tmpfs, and it works as expected.
>
> [1] https://lore.kernel.org/all/1f00fdc3-a3a3-464b-8565-4c1b23d34f8d@linux.alibaba.com/
>
> Changes from v3:
> - Collect reviewed tags. Thanks.
> - Update the commit message, per David.
>
> Changes from v2:
> - Update the commit message and cover letter, per Lorenzo. Thanks.
> - Simplify the logic in thp_vma_allowable_orders(), per Lorenzo and David. Thanks.
>
> Changes from v1:
> - Update the commit message, per Zi.
> - Add Zi's reviewed tag. Thanks.
> - Update the shmem logic.
>
> Baolin Wang (2):
> mm: huge_memory: disallow hugepages if the system-wide THP sysfs
> settings are disabled
> mm: shmem: disallow hugepages if the system-wide shmem THP sysfs
> settings are disabled
>
> include/linux/huge_mm.h | 51 ++++++++++++++++++-------
> mm/shmem.c | 6 +--
> tools/testing/selftests/mm/khugepaged.c | 8 +---
> 3 files changed, 43 insertions(+), 22 deletions(-)
>
> --
> 2.43.5

Sorry for chiming in so late, after so much effort: but I beg you,
please drop these.

I did not want to get into a fight, and had been hoping a voice of
reason would come from others, before I got around to responding.

And indeed Ryan understood correctly at the start; and he, Usama
and Barry, perhaps others I've missed, have raised appropriate
concerns but not prevailed.

If we're sloganeering, I much prefer "never break userspace" to
"never means never", attractive though that over-simplification is.

Seldom has a feature been so thoroughly documented as MADV_COLLAPSE,
in its 6.1 commits and in the "man 2 madvise" page: which are
explicit about MADV_COLLAPSE providing a way to get THPs where the
sysfs setting governing automatic behaviour does not insert them.
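
For illustration only, a minimal sketch of the userspace call being
discussed. It assumes 2 MiB PMD-sized huge pages (as on x86-64 with 4 KiB
base pages) and a 6.1+ kernel; the helper name try_madv_collapse() and the
ASSUMED_HPAGE_SIZE constant are hypothetical, and whether the call succeeds
depends on the kernel configuration and the THP settings in force.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25 /* value from the Linux uapi headers (6.1+) */
#endif

/* Assumption: 2 MiB PMD-sized huge pages; other architectures differ. */
#define ASSUMED_HPAGE_SIZE ((size_t)2 << 20)

/*
 * Hypothetical helper: map anonymous memory, fault in one huge-page-aligned
 * range, and ask the kernel to collapse it synchronously.
 * Returns 0 if madvise() succeeded, otherwise a positive errno value.
 */
static int try_madv_collapse(void)
{
	size_t len = 2 * ASSUMED_HPAGE_SIZE; /* over-map so we can align */
	char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (map == MAP_FAILED)
		return errno;

	/* Round the start up to the next assumed huge page boundary. */
	uintptr_t start = ((uintptr_t)map + ASSUMED_HPAGE_SIZE - 1) &
			  ~((uintptr_t)ASSUMED_HPAGE_SIZE - 1);
	char *aligned = (char *)start;
	assert(aligned + ASSUMED_HPAGE_SIZE <= map + len);

	/* Touch the range so base pages are present before collapsing. */
	memset(aligned, 1, ASSUMED_HPAGE_SIZE);

	int err = 0;
	if (madvise(aligned, ASSUMED_HPAGE_SIZE, MADV_COLLAPSE) != 0)
		err = errno; /* e.g. EINVAL if THP is unavailable */

	munmap(map, len);
	return err;
}
```

A caller would treat a non-zero return as "no collapse happened" and carry
on with base pages; the request is best-effort either way.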

We would all prefer a less messy world of THP tunables. I certainly
find plenty to dislike there too; and wish that a less assertive name
than "never" had been chosen originally for the default off position.

But please don't break the accepted and documented behaviour of
MADV_COLLAPSE now.

If you want to exclude all possibility of THPs, then please use the
prctl(PR_SET_THP_DISABLE); or shmem_enabled=deny (I think it was me
who insisted that be respected by MADV_COLLAPSE back then).
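
For illustration only, a minimal sketch of the prctl() route, using the
PR_SET_THP_DISABLE and PR_GET_THP_DISABLE options described in "man 2 prctl".
The helper name thp_disable_for_process() is hypothetical, and the fallback
constant definitions are only there for older userspace headers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>

#ifndef PR_SET_THP_DISABLE
#define PR_SET_THP_DISABLE 41 /* value from the Linux uapi headers */
#endif
#ifndef PR_GET_THP_DISABLE
#define PR_GET_THP_DISABLE 42 /* value from the Linux uapi headers */
#endif

/*
 * Hypothetical helper: set the "THP disable" flag for the calling process
 * and read it back. The flag is inherited across fork() and preserved
 * across execve(), so a small wrapper can set it before launching a
 * workload.
 * Returns the flag as reported by the kernel (1 when set), or -1 if
 * either prctl() call failed.
 */
static int thp_disable_for_process(void)
{
	if (prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0) != 0) {
		fprintf(stderr, "PR_SET_THP_DISABLE failed: %s\n",
			strerror(errno));
		return -1;
	}

	/* All unused arguments must be zero for PR_GET_THP_DISABLE. */
	return prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0);
}
```

The shmem_enabled=deny alternative is a system-wide sysfs setting rather
than a per-process call, so it is not shown here.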

Add a "deny" option to /sys/kernel/mm/transparent_hugepage/enabled
if you like. (But in these days of filesystem large folios, adding
new protections against them seems a few years late.)

If Andrew decides that these patches should go in, then I'll have to
scrutinize them more carefully than I've done so far: but currently
I'm hoping to avoid that.

Hugh