From: SeongJae Park <sj@kernel.org>
To: Usama Arif <usamaarif642@gmail.com>
Cc: SeongJae Park <sj@kernel.org>,
David Hildenbrand <david@redhat.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
Jonathan Corbet <corbet@lwn.net>,
Andrew Morton <akpm@linux-foundation.org>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Zi Yan <ziy@nvidia.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Nico Pache <npache@redhat.com>,
Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
Barry Song <baohua@kernel.org>, Vlastimil Babka <vbabka@suse.cz>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>, Jann Horn <jannh@google.com>,
Yafang Shao <laoar.shao@gmail.com>,
Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH POC] prctl: extend PR_SET_THP_DISABLE to optionally exclude VM_HUGEPAGE
Date: Mon, 21 Jul 2025 12:35:59 -0700
Message-ID: <20250721193559.11503-1-sj@kernel.org>
In-Reply-To: <4a8b70b1-7ba0-4d60-a3a0-04ac896a672d@gmail.com>
On Mon, 21 Jul 2025 18:27:38 +0100 Usama Arif <usamaarif642@gmail.com> wrote:
[...]
> >From ee9004e7d34511a79726ee1314aec0503e6351d4 Mon Sep 17 00:00:00 2001
> From: Usama Arif <usamaarif642@gmail.com>
> Date: Thu, 15 May 2025 14:33:33 +0100
> Subject: [PATCH] selftests: prctl: introduce tests for
> PR_THP_DISABLE_EXCEPT_ADVISED
>
> The test is limited to 2M PMD THPs. It does not modify the system
> settings in order to not disturb other process running in the system.
> It checks if the PMD size is 2M, if the 2M policy is set to inherit
> and if the system global THP policy is set to "always", so that
> the change in behaviour due to PR_THP_DISABLE_EXCEPT_ADVISED can
> be seen.
>
> This tests if:
> - the process can successfully set the policy
> - carry it over to the new process with fork
> - if no hugepage is gotten when the process doesn't MADV_HUGEPAGE
> - if hugepage is gotten when the process does MADV_HUGEPAGE
> - the process can successfully reset the policy to PR_THP_POLICY_SYSTEM
> - if hugepage is gotten after the policy reset
Nice! I added a few trivial comments below, though.
>
> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Acked-by: SeongJae Park <sj@kernel.org>
> ---
> tools/testing/selftests/prctl/Makefile | 2 +-
> tools/testing/selftests/prctl/thp_disable.c | 207 ++++++++++++++++++++
I once thought this might fit better in selftests/mm/, but I found we already
have selftests/prctl/set-anon-vma-name-test.c, so no strong opinion from my
side.
> 2 files changed, 208 insertions(+), 1 deletion(-)
> create mode 100644 tools/testing/selftests/prctl/thp_disable.c
>
> diff --git a/tools/testing/selftests/prctl/Makefile b/tools/testing/selftests/prctl/Makefile
> index 01dc90fbb509..a3cf76585c48 100644
> --- a/tools/testing/selftests/prctl/Makefile
> +++ b/tools/testing/selftests/prctl/Makefile
> @@ -5,7 +5,7 @@ ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/)
>
> ifeq ($(ARCH),x86)
> TEST_PROGS := disable-tsc-ctxt-sw-stress-test disable-tsc-on-off-stress-test \
> - disable-tsc-test set-anon-vma-name-test set-process-name
> + disable-tsc-test set-anon-vma-name-test set-process-name thp_disable
> all: $(TEST_PROGS)
>
> include ../lib.mk
> diff --git a/tools/testing/selftests/prctl/thp_disable.c b/tools/testing/selftests/prctl/thp_disable.c
> new file mode 100644
> index 000000000000..e524723b3313
> --- /dev/null
> +++ b/tools/testing/selftests/prctl/thp_disable.c
> @@ -0,0 +1,207 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * This test covers the PR_GET/SET_THP_DISABLE functionality of prctl calls
> + * for PR_THP_DISABLE_EXCEPT_ADVISED
> + */
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <sys/mman.h>
> +#include <sys/prctl.h>
> +#include <sys/wait.h>
> +
> +#ifndef PR_THP_DISABLE_EXCEPT_ADVISED
> +#define PR_THP_DISABLE_EXCEPT_ADVISED (1 << 1)
> +#endif
> +
> +#define CONTENT_SIZE 256
> +#define BUF_SIZE (12 * 2 * 1024 * 1024) // 12 x 2MB pages
> +
> +enum system_policy {
> + SYSTEM_POLICY_ALWAYS,
> + SYSTEM_POLICY_MADVISE,
> + SYSTEM_POLICY_NEVER,
> +};
> +
> +int system_thp_policy;
> +
> +/* check if the sysfs file contains the expected substring */
> +static int check_file_content(const char *file_path, const char *expected_substring)
> +{
> + FILE *file = fopen(file_path, "r");
> + char buffer[CONTENT_SIZE];
> +
> + if (!file) {
> + perror("Failed to open file");
> + return -1;
> + }
> + if (fgets(buffer, CONTENT_SIZE, file) == NULL) {
> + perror("Failed to read file");
> + fclose(file);
> + return -1;
> + }
> + fclose(file);
> + // Remove newline character from the buffer
Nit. I'd suggest using "/* */" consistently.
> + buffer[strcspn(buffer, "\n")] = '\0';
> + if (strstr(buffer, expected_substring))
> + return 0;
> + else
> + return 1;
> +}
> +
> +/*
> + * The test is designed for 2M hugepages only.
> + * Check if hugepage size is 2M, if 2M size inherits from global
> + * setting, and if the global setting is always.
> + */
> +static int sysfs_check(void)
> +{
> + int res = 0;
> +
> + res = check_file_content("/sys/kernel/mm/transparent_hugepage/hpage_pmd_size", "2097152");
> + if (res) {
> + printf("hpage_pmd_size is not set to 2MB. Skipping test.\n");
> + return -1;
Nit. Skipping is done by the caller, right? I think it is more natural to say
"Skipping test" from the caller.
> + }
> + res |= check_file_content("/sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled",
> + "[inherit]");
Nit. I think we can drop '|' and just do 'res = '.
> + if (res) {
> + printf("hugepages-2048kB does not inherit global setting. Skipping test.\n");
> + return -1;
> + }
> +
> + res = check_file_content("/sys/kernel/mm/transparent_hugepage/enabled", "[always]");
> + if (!res) {
Seems 'res' is used only for checking whether it is zero. Maybe doing
'if (check_file_content(...))' and removing 'res' can make the code simpler?
> + system_thp_policy = SYSTEM_POLICY_ALWAYS;
> + return 0;
> + }
Also, system_thp_policy is set only here, so we know 'system_thp_policy ==
SYSTEM_POLICY_ALWAYS' if sysfs_check() returned zero. Maybe system_thp_policy
is not really required?
> + printf("Global THP policy not set to always. Skipping test.\n");
> + return -1;
> +}
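For illustration only, a rough and untested sketch of how sysfs_check() might
look with the comments above applied: no 'res', no system_thp_policy, and the
"Skipping test" wording left to the caller. The shortened messages and the
dropped perror() calls are assumptions made for brevity, not something the
patch has to follow.

```c
#include <stdio.h>
#include <string.h>

#define CONTENT_SIZE 256

/*
 * Return 0 if the first line of file_path contains expected, 1 if it does
 * not, and -1 if the file cannot be opened or read.
 */
static int check_file_content(const char *file_path, const char *expected)
{
	char buffer[CONTENT_SIZE];
	FILE *file = fopen(file_path, "r");

	if (!file)
		return -1;
	if (!fgets(buffer, sizeof(buffer), file)) {
		fclose(file);
		return -1;
	}
	fclose(file);
	return strstr(buffer, expected) ? 0 : 1;
}

/*
 * Return 0 only if the PMD size is 2M, the 2M policy inherits the global
 * setting, and the global setting is "always". Any other state returns -1,
 * so the caller needs no separate policy variable.
 */
static int sysfs_check(void)
{
	if (check_file_content("/sys/kernel/mm/transparent_hugepage/hpage_pmd_size",
			       "2097152")) {
		printf("hpage_pmd_size is not set to 2MB.\n");
		return -1;
	}
	if (check_file_content("/sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled",
			       "[inherit]")) {
		printf("hugepages-2048kB does not inherit global setting.\n");
		return -1;
	}
	if (check_file_content("/sys/kernel/mm/transparent_hugepage/enabled",
			       "[always]")) {
		printf("Global THP policy not set to always.\n");
		return -1;
	}
	return 0;
}
```

With this shape, a zero return from sysfs_check() already implies the global
policy is "always", so main() would not need to check it again.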
> +
> +static int check_smaps_for_huge(void)
> +{
> + FILE *file = fopen("/proc/self/smaps", "r");
> + int is_anonhuge = 0;
> + char line[256];
> +
> + if (!file) {
> + perror("fopen");
> + return -1;
> + }
> +
> + while (fgets(line, sizeof(line), file)) {
> + if (strstr(line, "AnonHugePages:") && strstr(line, "24576 kB")) {
> + is_anonhuge = 1;
> + break;
> + }
> + }
> + fclose(file);
> + return is_anonhuge;
> +}
> +
> +static int test_mmap_thp(int madvise_buffer)
> +{
> + int is_anonhuge;
> +
> + char *buffer = (char *)mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
> + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> + if (buffer == MAP_FAILED) {
> + perror("mmap");
> + return -1;
> + }
> + if (madvise_buffer)
> + madvise(buffer, BUF_SIZE, MADV_HUGEPAGE);
> +
> + // set memory to ensure it's allocated
'/* */' for consistency?
> + memset(buffer, 0, BUF_SIZE);
> + is_anonhuge = check_smaps_for_huge();
> + munmap(buffer, BUF_SIZE);
> + return is_anonhuge;
> +}
> +
> +/* Global policy is always, process is changed to "madvise only" */
> +static int test_global_always_process_madvise(void)
> +{
> + int is_anonhuge = 0, res = 0, status = 0;
> + pid_t pid;
> +
> + if (prctl(PR_SET_THP_DISABLE, 1, PR_THP_DISABLE_EXCEPT_ADVISED, NULL, NULL) != 0) {
Nit. '!= 0' can be dropped.
> + perror("prctl failed to set policy to madvise");
> + return -1;
> + }
> +
> + /* Make sure prctl changes are carried across fork */
> + pid = fork();
> + if (pid < 0) {
> + perror("fork");
> + exit(EXIT_FAILURE);
> + }
> +
> + res = prctl(PR_GET_THP_DISABLE, NULL, NULL, NULL, NULL);
> + if (res != 3) {
> + printf("prctl PR_GET_THP_POLICY returned %d pid %d\n", res, pid);
> + goto err_out;
> + }
> +
> + /* global = always, process = madvise, we shouldn't get HPs without madvise */
> + is_anonhuge = test_mmap_thp(0);
> + if (is_anonhuge) {
> + printf(
> + "PR_THP_POLICY_DEFAULT_NOHUGE set but still got hugepages without MADV_HUGEPAGE\n");
> + goto err_out;
> + }
> +
> + is_anonhuge = test_mmap_thp(1);
> + if (!is_anonhuge) {
> + printf(
> + "PR_THP_POLICY_DEFAULT_NOHUGE set but did't get hugepages with MADV_HUGEPAGE\n");
> + goto err_out;
> + }
> +
> + /* Reset to system policy */
> + if (prctl(PR_SET_THP_DISABLE, 0, NULL, NULL, NULL) != 0) {
> + perror("prctl failed to set policy to system");
> + goto err_out;
> + }
> +
> + is_anonhuge = test_mmap_thp(0);
> + if (!is_anonhuge) {
> + printf("global policy is always but we still didn't get hugepages\n");
> + goto err_out;
> + }
> +
> + is_anonhuge = test_mmap_thp(1);
> + if (!is_anonhuge) {
> + printf("global policy is always but we still didn't get hugepages\n");
> + goto err_out;
> + }
Seems 'is_anonhuge' is used only to check whether it is zero or not, just after
being assigned from test_mmap_thp(). How about removing the variable?
> + printf("PASS\n");
> +
> + if (pid == 0) {
> + exit(EXIT_SUCCESS);
> + } else {
> + wait(&status);
> + if (WIFEXITED(status))
> + return 0;
> + else
> + return -1;
> + }
> +
> +err_out:
> + if (pid == 0)
> + exit(EXIT_FAILURE);
> + else
> + return -1;
> +}
> +
> +int main(void)
> +{
> + if (sysfs_check())
> + return 0;
Maybe better to return KSFT_SKIP instead of 0?
> +
> + if (system_thp_policy == SYSTEM_POLICY_ALWAYS)
This should be always true, since sysfs_check() returned zero, right? I think
we can remove this check.
> + return test_global_always_process_madvise();
> +
Nit. Unnecessary blank line.
> +}
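As a rough sketch of the main() comments above (again untested and only for
illustration): the caller prints the "Skipping test" message and returns
KSFT_SKIP. The local KSFT_SKIP fallback assumes the value 4 from
tools/testing/selftests/kselftest.h; a real test would include that header
instead. run_test() and the four stub functions are hypothetical stand-ins for
main(), sysfs_check() and test_global_always_process_madvise(), and mapping a
test failure to exit code 1 is also an assumption.

```c
#include <stdio.h>

/* Assumption: mirrors KSFT_SKIP in tools/testing/selftests/kselftest.h. */
#ifndef KSFT_SKIP
#define KSFT_SKIP 4
#endif

/* Hypothetical stand-ins for sysfs_check(). */
static int precheck_passes(void) { return 0; }
static int precheck_fails(void) { return -1; }

/* Hypothetical stand-ins for test_global_always_process_madvise(). */
static int test_passes(void) { return 0; }
static int test_fails(void) { return -1; }

/*
 * Body of a would-be main(): skip when the precheck fails, otherwise run
 * the test unconditionally and turn its result into an exit code.
 */
static int run_test(int (*precheck)(void), int (*test)(void))
{
	if (precheck()) {
		printf("Skipping test.\n");
		return KSFT_SKIP;
	}
	return test() ? 1 : 0;
}
```

In the real file this would collapse to 'if (sysfs_check()) return KSFT_SKIP;'
followed by a plain 'return test_global_always_process_madvise();', with no
policy check in between.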
> --
> 2.47.1
>
>
>
Thanks,
SJ