From: Luka Bai <lukafocus@icloud.com>
To: linux-mm@kvack.org
Cc: Jonathan Corbet <corbet@lwn.net>,
Shuah Khan <skhan@linuxfoundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <ljs@kernel.org>, Zi Yan <ziy@nvidia.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
"Liam R. Howlett" <liam@infradead.org>,
Nico Pache <npache@redhat.com>,
Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
Barry Song <baohua@kernel.org>,
Lance Yang <lance.yang@linux.dev>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>, Jann Horn <jannh@google.com>,
Arnd Bergmann <arnd@arndb.de>, Kairui Song <kasong@tencent.com>,
linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
linux-doc@vger.kernel.org, Luka Bai <lukabai@tencent.com>
Subject: [PATCH 2/5] mm: add pmd level THP COW parameter in sysfs
Date: Fri, 01 May 2026 13:55:43 +0800
Message-ID: <20260501-thp_cow-v1-2-005377483738@tencent.com>
In-Reply-To: <20260501-thp_cow-v1-0-005377483738@tencent.com>
From: Luka Bai <lukabai@tencent.com>
Reuse the policy model already used for huge anonymous pages and huge
shmem pages for THP COW, with three modes: always, never and madvise.
With "always", THP COW is performed for all existing THPs. With "never",
THP COW is never performed. With "madvise", the per-VMA setting
introduced in the previous commit decides whether THP COW is performed
for each individual VMA.

Add TRANSPARENT_HUGEPAGE_COW_FLAG and
TRANSPARENT_HUGEPAGE_REQ_MADV_COW_FLAG, which mirror
TRANSPARENT_HUGEPAGE_FLAG and TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, the
flags that decide whether an anonymous huge page fault is performed when
it is permitted. Also add the sysfs attribute thp_cow (thp_cow_attr) as
the interface for choosing among the three modes described above.
Signed-off-by: Luka Bai <lukabai@tencent.com>
---
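Note for reviewers: the three-mode policy from the commit message can be
modelled in user space as below. This is only a sketch under stated
assumptions: cow_flag_bit, cow_mode_store(), cow_mode_show() and
cow_mode_allows() are hypothetical names, the two bits merely stand in for
the flags added by this patch, and the per-VMA opt-in is reduced to a plain
boolean. It is not the kernel implementation.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical user-space model; none of these names are kernel API. */
enum cow_flag_bit {
	COW_ALWAYS_BIT,		/* stands in for TRANSPARENT_HUGEPAGE_COW_FLAG */
	COW_REQ_MADV_BIT,	/* stands in for TRANSPARENT_HUGEPAGE_REQ_MADV_COW_FLAG */
};

/* Apply a mode string to the flag word; 0 on success, -1 on bad input. */
static int cow_mode_store(unsigned long *flags, const char *mode)
{
	const unsigned long always = 1UL << COW_ALWAYS_BIT;
	const unsigned long madv = 1UL << COW_REQ_MADV_BIT;

	if (strcmp(mode, "always") == 0)
		*flags = (*flags & ~madv) | always;
	else if (strcmp(mode, "madvise") == 0)
		*flags = (*flags & ~always) | madv;
	else if (strcmp(mode, "never") == 0)
		*flags &= ~(always | madv);
	else
		return -1;

	return 0;
}

/* Render the flag word the way the sysfs show handler does. */
static const char *cow_mode_show(unsigned long flags)
{
	if (flags & (1UL << COW_ALWAYS_BIT))
		return "[always] madvise never";
	if (flags & (1UL << COW_REQ_MADV_BIT))
		return "always [madvise] never";
	return "always madvise [never]";
}

/*
 * Would a PMD-sized COW be attempted? vma_opted_in is a hypothetical
 * stand-in for the per-VMA madvise state from the previous commit.
 */
static bool cow_mode_allows(unsigned long flags, bool vma_opted_in)
{
	if (flags & (1UL << COW_ALWAYS_BIT))
		return true;
	if (flags & (1UL << COW_REQ_MADV_BIT))
		return vma_opted_in;
	return false;
}
```

Unlike sysfs_streq() in the patch, strcmp() does not ignore a trailing
newline, so a caller of this model would have to strip it first.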
.../testing/sysfs-kernel-mm-transparent-hugepage | 1 +
Documentation/admin-guide/mm/transhuge.rst | 27 +++++++++++++++
include/linux/huge_mm.h | 2 ++
mm/huge_memory.c | 39 ++++++++++++++++++++++
4 files changed, 69 insertions(+)
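Note for testers: thp_cow_show() reports the active mode as the bracketed
token, in the same style as the existing "enabled" file. A user-space
check could extract that token roughly as follows. This is a minimal
sketch; active_mode() is a hypothetical helper, not part of this series.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the bracketed token of a line such as "always [madvise] never\n"
 * into out. Returns 0 on success, -1 if there is no complete, non-empty
 * bracketed token or if it does not fit in out.
 */
static int active_mode(const char *line, char *out, size_t out_len)
{
	const char *lb = strchr(line, '[');
	const char *rb = lb ? strchr(lb, ']') : NULL;
	size_t len;

	if (!rb)
		return -1;

	len = (size_t)(rb - lb - 1);
	if (len == 0 || len >= out_len)
		return -1;

	memcpy(out, lb + 1, len);
	out[len] = '\0';
	return 0;
}
```

A test would typically read one line from
/sys/kernel/mm/transparent_hugepage/thp_cow with fopen() and fgets() and
pass it to this helper. That file only exists with this series applied.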
diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-transparent-hugepage b/Documentation/ABI/testing/sysfs-kernel-mm-transparent-hugepage
index 7bfbb9cc2c11..43a1af13efe0 100644
--- a/Documentation/ABI/testing/sysfs-kernel-mm-transparent-hugepage
+++ b/Documentation/ABI/testing/sysfs-kernel-mm-transparent-hugepage
@@ -11,6 +11,7 @@ Description:
- khugepaged
- shmem_enabled
- use_zero_page
+ - thp_cow
- subdirectories of the form hugepages-<size>kB, where <size>
is the page size of the hugepages supported by the kernel/CPU
combination.
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 0ef13c451ac8..0926651bad0d 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -226,6 +226,33 @@ to "always" or "madvise"), and it'll be automatically shutdown when
all THP sizes are disabled (when both the per-size anon control and the
top-level control are "never")
+Some workloads may want to do copy on write at the pmd size to keep the
+TLB benefit when they write to a shared anonymous pmd sized entry.
+They can do so by setting up the thp_cow control. The control is only enabled
+when the global THP controls are set to "always" or "madvise" for the
+specific memory region:
+
+::
+
+ echo always >/sys/kernel/mm/transparent_hugepage/thp_cow
+ echo madvise >/sys/kernel/mm/transparent_hugepage/thp_cow
+ echo never >/sys/kernel/mm/transparent_hugepage/thp_cow
+
+always
+ means that the writing process will always do copy on write at
+ the pmd size. If there is no pmd sized folio available, it will
+ fall back to the pte size.
+
+madvise
+ will do things like ``always`` but only for regions that have
+ used madvise(MADV_THP_COW).
+
+never
+ will not do copy on write at the pmd size no matter what setup
+ is done using madvise. When a process writes to a shared anonymous
+ pmd sized entry, it will just allocate a pte sized page and do copy
+ on write at the pte size.
+
process THP controls
--------------------
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index a0ce8c0b81f5..2a62f0f92f68 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -57,6 +57,8 @@ enum transparent_hugepage_flag {
TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG,
TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG,
TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG,
+ TRANSPARENT_HUGEPAGE_COW_FLAG,
+ TRANSPARENT_HUGEPAGE_REQ_MADV_COW_FLAG,
};
struct kobject;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1f0d0b780943..babca060feca 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -531,6 +531,44 @@ static ssize_t split_underused_thp_store(struct kobject *kobj,
static struct kobj_attribute split_underused_thp_attr = __ATTR(
shrink_underused, 0644, split_underused_thp_show, split_underused_thp_store);
+static ssize_t thp_cow_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ const char *output;
+
+ if (test_bit(TRANSPARENT_HUGEPAGE_COW_FLAG, &transparent_hugepage_flags))
+ output = "[always] madvise never";
+ else if (test_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_COW_FLAG,
+ &transparent_hugepage_flags))
+ output = "always [madvise] never";
+ else
+ output = "always madvise [never]";
+
+ return sysfs_emit(buf, "%s\n", output);
+}
+
+static ssize_t thp_cow_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ ssize_t ret = count;
+
+ if (sysfs_streq(buf, "always")) {
+ clear_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_COW_FLAG, &transparent_hugepage_flags);
+ set_bit(TRANSPARENT_HUGEPAGE_COW_FLAG, &transparent_hugepage_flags);
+ } else if (sysfs_streq(buf, "madvise")) {
+ clear_bit(TRANSPARENT_HUGEPAGE_COW_FLAG, &transparent_hugepage_flags);
+ set_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_COW_FLAG, &transparent_hugepage_flags);
+ } else if (sysfs_streq(buf, "never")) {
+ clear_bit(TRANSPARENT_HUGEPAGE_COW_FLAG, &transparent_hugepage_flags);
+ clear_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_COW_FLAG, &transparent_hugepage_flags);
+ } else
+ ret = -EINVAL;
+
+ return ret;
+}
+static struct kobj_attribute thp_cow_attr = __ATTR_RW(thp_cow);
+
static struct attribute *hugepage_attr[] = {
&enabled_attr.attr,
&defrag_attr.attr,
@@ -540,6 +578,7 @@ static struct attribute *hugepage_attr[] = {
&shmem_enabled_attr.attr,
#endif
&split_underused_thp_attr.attr,
+ &thp_cow_attr.attr,
NULL,
};
--
2.52.0