* [PATCH next v2 0/2] THP COW support for private executable file mmap
@ 2025-12-26 10:03 Zhang Qilong
2025-12-26 10:03 ` [PATCH next v2 1/2] mm/huge_memory: Implementation of THP COW for " Zhang Qilong
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Zhang Qilong @ 2025-12-26 10:03 UTC (permalink / raw)
To: akpm, david, lorenzo.stoakes, corbet
Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
baohua, lance.yang, vbabka, rppt, surenb, mhocko, willy,
wangkefeng.wang, sunnanyong, linux-mm, linux-doc, linux-kernel,
lianux.mm, zhangqilong3
This patch series implements THP COW for private executable file mappings.
It is mainly designed to improve the performance of hot-patched programs,
and it reuses 'vma->vm_flags' hints to decide whether to trigger the exec
THP COW.
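For illustration, the following is a minimal user-space sketch of the
scenario this series targets, assuming the executable segment is actually
PMD-mapped by the page cache; patch_text() and its arguments are
hypothetical placeholders, not part of this series:

/*
 * Hypothetical hot-patch helper: privately map an executable file, hint
 * the kernel to use huge pages for this VMA (see patch 2/2), then patch
 * a few instruction bytes in place. The first write to the private
 * executable mapping is the write-protect fault that this series handles
 * at PMD granularity instead of splitting the PMD to base-page COW.
 */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

int patch_text(const char *path, off_t off, const void *insn, size_t len)
{
        size_t map_len = 2UL << 20;     /* one PMD-sized region */
        void *text;
        int fd;

        fd = open(path, O_RDONLY);
        if (fd < 0)
                return -1;
        text = mmap(NULL, map_len, PROT_READ | PROT_EXEC, MAP_PRIVATE, fd, 0);
        close(fd);
        if (text == MAP_FAILED)
                return -1;

        /* Per-VMA hint used by patch 2/2: sets VM_HUGEPAGE on this VMA. */
        madvise(text, map_len, MADV_HUGEPAGE);

        /* Make the segment writable and apply the patch in place. */
        if (mprotect(text, map_len, PROT_READ | PROT_WRITE | PROT_EXEC))
                return -1;
        memcpy((char *)text + off, insn, len);  /* write-protect COW fault */
        return mprotect(text, map_len, PROT_READ | PROT_EXEC);
}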
The MySQL (Ver 8.0.25) test results on AMD are as follows:
-------------------------------------------------------------------
Mapping mode     | Exec mmap RSS (kB) | Measured tpmC (NewOrders) |
-----------------|--------------------|---------------------------|
base (page COW)  |       32868        |          339686           |
-----------------|--------------------|---------------------------|
exec THP COW     |       43516        |          371324           |
-------------------------------------------------------------------
With exec THP COW, MySQL consumes an additional 10648 kB of memory but
achieves a 9.3% performance improvement in the hot-patch scenario.
Additionally, another internal program of ours achieves approximately a
5% performance improvement as well.
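For reference, the deltas work out to an extra exec-mapping RSS of
43516 kB - 32868 kB = 10648 kB (roughly 32% of the base exec RSS) for a
tpmC gain of (371324 - 339686) / 339686 ~= 9.3%.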
In short, using exec THP COW consumes additional memory. This extra
consumption may be negligible on current systems, but it is necessary
to balance the memory cost against the performance gain.
v2:
- Add MySQL and internal program test results
Zhang Qilong (2):
mm/huge_memory: Implementation of THP COW for executable file mmap
mm/huge_memory: Use per-VMA hugepage flag hints for exec THP COW
include/linux/huge_mm.h | 1 +
mm/huge_memory.c | 91 +++++++++++++++++++++++++++++++++++++++++
mm/memory.c | 15 +++++++
3 files changed, 107 insertions(+)
--
2.43.0
* [PATCH next v2 1/2] mm/huge_memory: Implementation of THP COW for executable file mmap
2025-12-26 10:03 [PATCH next v2 0/2] THP COW support for private executable file mmap Zhang Qilong
@ 2025-12-26 10:03 ` Zhang Qilong
2025-12-26 10:03 ` [PATCH next v2 2/2] mm/huge_memory: Use per-VMA hugepage flag hints for exec THP COW Zhang Qilong
2025-12-28 3:42 ` [PATCH next v2 0/2] THP COW support for private executable file mmap Matthew Wilcox
2 siblings, 0 replies; 4+ messages in thread
From: Zhang Qilong @ 2025-12-26 10:03 UTC (permalink / raw)
To: akpm, david, lorenzo.stoakes, corbet
Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
baohua, lance.yang, vbabka, rppt, surenb, mhocko, willy,
wangkefeng.wang, sunnanyong, linux-mm, linux-doc, linux-kernel,
lianux.mm, zhangqilong3
During user-space hot patching, the affected executable file
segments of a private mapping are modified. If the modification
hits a THP mapping, the PMD entry is cleared first and the write
is then handled as a base-page COW fault.
Currently, khugepaged may attempt to collapse scattered file pages
into a THP. However, because of the single-page COW, the modified
executable segments cannot be mapped as THP again for the
hot-patched process, so they cannot benefit from khugepaged's
efforts. An executable segment mapped at base-page granularity may
reduce performance due to a lower iTLB hit rate compared with the
original THP mapping.
For user-space hot patching, we introduce THP COW support for
executable mappings. If the exec COW hits a THP mapping, an
anonymous THP is allocated and mapped so that the PMD mapping is
retained.
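As a rough way to observe the effect (an illustrative helper, not part of
this patch), the THP counters in /proc/self/smaps can be dumped before and
after the patching write; with exec THP COW the patched region should keep
reporting huge pages (as AnonHugePages) instead of being split into base
pages:

#include <stdio.h>
#include <string.h>

/* Print the THP-related smaps counters for every VMA of this process. */
static void dump_thp_counters(void)
{
        char line[256];
        FILE *f = fopen("/proc/self/smaps", "r");

        if (!f)
                return;
        while (fgets(line, sizeof(line), f)) {
                if (!strncmp(line, "AnonHugePages:", 14) ||
                    !strncmp(line, "FilePmdMapped:", 14))
                        fputs(line, stdout);
        }
        fclose(f);
}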
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Tested-by: wang lian <lianux.mm@gmail.com>
---
v2:
- Fix a linux-next build error (call to undeclared function
vma_is_special_huge()) by moving the check into do_huge_pmd_exec_cow()
- Add a 'vm_flags' variable in wp_huge_pmd()
---
include/linux/huge_mm.h | 1 +
mm/huge_memory.c | 91 +++++++++++++++++++++++++++++++++++++++++
mm/memory.c | 8 ++++
3 files changed, 100 insertions(+)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index a4d9f964dfde..8b710751d1e2 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -23,10 +23,11 @@ static inline void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud)
{
}
#endif
vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf);
+vm_fault_t do_huge_pmd_exec_cow(struct vm_fault *vmf);
bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
pmd_t *pmd, unsigned long addr, unsigned long next);
int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr);
int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, pud_t *pud,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 40cf59301c21..ae599431989d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2146,10 +2146,101 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf)
fallback:
__split_huge_pmd(vma, vmf->pmd, vmf->address, false);
return VM_FAULT_FALLBACK;
}
+vm_fault_t do_huge_pmd_exec_cow(struct vm_fault *vmf)
+{
+ vm_fault_t ret;
+ struct vm_area_struct *vma = vmf->vma;
+ struct folio *folio, *src_folio;
+ pmd_t orig_pmd = vmf->orig_pmd;
+ unsigned long haddr = vmf->address & PMD_MASK;
+ struct mmu_notifier_range range;
+ pgtable_t pgtable = NULL;
+
+ /* Skip special and shmem */
+ if (vma_is_special_huge(vma) || vma_is_shmem(vma))
+ return VM_FAULT_FALLBACK;
+
+ ret = vmf_anon_prepare(vmf);
+ if (ret)
+ return ret;
+
+ folio = vma_alloc_anon_folio_pmd(vma, haddr);
+ if (!folio)
+ return VM_FAULT_FALLBACK;
+
+ if (!arch_needs_pgtable_deposit()) {
+ pgtable = pte_alloc_one(vma->vm_mm);
+ if (!pgtable) {
+ ret = VM_FAULT_OOM;
+ goto release;
+ }
+ }
+
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm,
+ haddr, haddr + HPAGE_PMD_SIZE);
+ mmu_notifier_invalidate_range_start(&range);
+ vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
+ if (unlikely(!pmd_same(pmdp_get(vmf->pmd), orig_pmd)))
+ goto unlock_ptl;
+
+ ret = check_stable_address_space(vma->vm_mm);
+ if (ret)
+ goto unlock_ptl;
+
+ src_folio = pmd_folio(orig_pmd);
+ if (!folio_trylock(src_folio)) {
+ ret = VM_FAULT_FALLBACK;
+ goto unlock_ptl;
+ }
+
+ /*
+ * If the uptodate bit is not set, the source folio is stale
+ * or invalid and the data in it cannot be trusted. Avoid
+ * copying it and fall back.
+ */
+ if (!folio_test_uptodate(src_folio)) {
+ ret = VM_FAULT_FALLBACK;
+ goto unlock_folio;
+ }
+
+ if (copy_user_large_folio(folio, src_folio, haddr, vma)) {
+ ret = VM_FAULT_HWPOISON;
+ goto unlock_folio;
+ }
+ folio_mark_uptodate(folio);
+
+ folio_unlock(src_folio);
+ pmdp_huge_clear_flush(vma, haddr, vmf->pmd);
+ folio_remove_rmap_pmd(src_folio, folio_page(src_folio, 0), vma);
+ add_mm_counter(vma->vm_mm, mm_counter_file(src_folio), -HPAGE_PMD_NR);
+ folio_put(src_folio);
+
+ map_anon_folio_pmd_pf(folio, vmf->pmd, vma, haddr);
+ if (pgtable)
+ pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
+ mm_inc_nr_ptes(vma->vm_mm);
+ spin_unlock(vmf->ptl);
+ mmu_notifier_invalidate_range_end(&range);
+
+ return ret;
+
+unlock_folio:
+ folio_unlock(src_folio);
+unlock_ptl:
+ spin_unlock(vmf->ptl);
+ mmu_notifier_invalidate_range_end(&range);
+release:
+ if (pgtable)
+ pte_free(vma->vm_mm, pgtable);
+ folio_put(folio);
+
+ return ret;
+}
+
static inline bool can_change_pmd_writable(struct vm_area_struct *vma,
unsigned long addr, pmd_t pmd)
{
struct page *page;
diff --git a/mm/memory.c b/mm/memory.c
index ee15303c4041..691e3ca38cc6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6104,10 +6104,11 @@ static inline vm_fault_t create_huge_pmd(struct vm_fault *vmf)
/* `inline' is required to avoid gcc 4.1.2 build error */
static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
+ const vm_flags_t vm_flags = vma->vm_flags;
const bool unshare = vmf->flags & FAULT_FLAG_UNSHARE;
vm_fault_t ret;
if (vma_is_anonymous(vma)) {
if (likely(!unshare) &&
@@ -6125,10 +6126,17 @@ static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf)
if (!(ret & VM_FAULT_FALLBACK))
return ret;
}
}
+ if (is_exec_mapping(vm_flags) &&
+ is_cow_mapping(vm_flags)) {
+ ret = do_huge_pmd_exec_cow(vmf);
+ if (!(ret & VM_FAULT_FALLBACK))
+ return ret;
+ }
+
split:
/* COW or write-notify handled on pte level: split pmd. */
__split_huge_pmd(vma, vmf->pmd, vmf->address, false);
return VM_FAULT_FALLBACK;
--
2.43.0
* [PATCH next v2 2/2] mm/huge_memory: Use per-VMA hugepage flag hints for exec THP COW
2025-12-26 10:03 [PATCH next v2 0/2] THP COW support for private executable file mmap Zhang Qilong
2025-12-26 10:03 ` [PATCH next v2 1/2] mm/huge_memory: Implementation of THP COW for " Zhang Qilong
@ 2025-12-26 10:03 ` Zhang Qilong
2025-12-28 3:42 ` [PATCH next v2 0/2] THP COW support for private executable file mmap Matthew Wilcox
2 siblings, 0 replies; 4+ messages in thread
From: Zhang Qilong @ 2025-12-26 10:03 UTC (permalink / raw)
To: akpm, david, lorenzo.stoakes, corbet
Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
baohua, lance.yang, vbabka, rppt, surenb, mhocko, willy,
wangkefeng.wang, sunnanyong, linux-mm, linux-doc, linux-kernel,
lianux.mm, zhangqilong3
Use the per-VMA hugepage flag to avoid relying on the system-wide
default behavior: if 'vma->vm_flags' indicates a preference for huge
pages, the exec THP COW can be attempted.
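As a rough user-space usage sketch (hint_exec_thp() and its arguments are
placeholders for an already established private executable mapping), the
hint can be set with madvise(); MADV_HUGEPAGE sets VM_HUGEPAGE on the
covered VMAs:

#include <stddef.h>
#include <sys/mman.h>

/*
 * Opt an already established private executable mapping in to the
 * exec THP COW path; 'text' must be page aligned for madvise().
 */
static int hint_exec_thp(void *text, size_t len)
{
        /* MADV_HUGEPAGE sets VM_HUGEPAGE on the VMAs in [text, text + len). */
        return madvise(text, len, MADV_HUGEPAGE);
}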
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
---
v2:
- Use 'vma->vm_flags' as a hint for exec THP COW, as suggested by David
---
mm/memory.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/mm/memory.c b/mm/memory.c
index 691e3ca38cc6..eb2bb36e284c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6128,10 +6128,17 @@ static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf)
}
}
if (is_exec_mapping(vm_flags) &&
is_cow_mapping(vm_flags)) {
+ /*
+ * Reuse the per-VMA hint: only do the exec THP COW
+ * if VM_HUGEPAGE is set on this VMA.
+ */
+ if (!(vm_flags & VM_HUGEPAGE))
+ goto split;
+
ret = do_huge_pmd_exec_cow(vmf);
if (!(ret & VM_FAULT_FALLBACK))
return ret;
}
--
2.43.0
* Re: [PATCH next v2 0/2] THP COW support for private executable file mmap
2025-12-26 10:03 [PATCH next v2 0/2] THP COW support for private executable file mmap Zhang Qilong
2025-12-26 10:03 ` [PATCH next v2 1/2] mm/huge_memory: Implementation of THP COW for " Zhang Qilong
2025-12-26 10:03 ` [PATCH next v2 2/2] mm/huge_memory: Use per-VMA hugepage flag hints for exec THP COW Zhang Qilong
@ 2025-12-28 3:42 ` Matthew Wilcox
2 siblings, 0 replies; 4+ messages in thread
From: Matthew Wilcox @ 2025-12-28 3:42 UTC (permalink / raw)
To: Zhang Qilong
Cc: akpm, david, lorenzo.stoakes, corbet, ziy, baolin.wang,
Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, lance.yang,
vbabka, rppt, surenb, mhocko, wangkefeng.wang, sunnanyong,
linux-mm, linux-doc, linux-kernel, lianux.mm
On Fri, Dec 26, 2025 at 06:03:35PM +0800, Zhang Qilong wrote:
> The MySQL (Ver 8.0.25) test results on AMD are as follows:
>
> -------------------------------------------------------------------
> Mapping mode     | Exec mmap RSS (kB) | Measured tpmC (NewOrders) |
> -----------------|--------------------|---------------------------|
> base (page COW)  |       32868        |          339686           |
> -----------------|--------------------|---------------------------|
> exec THP COW     |       43516        |          371324           |
> -------------------------------------------------------------------
>
> With exec THP COW, MySQL consumes an additional 10648 kB of memory but
> achieves a 9.3% performance improvement in the hot-patch scenario.
> Additionally, another internal program of ours achieves approximately a
> 5% performance improvement as well.
>
> In short, using exec THP COW consumes additional memory. This extra
> consumption may be negligible on current systems, but it is necessary
> to balance the memory cost against the performance gain.
I mean ... you say "negligible", I say "32% extra". A 9% performance
gain is certainly nothing to sneer at (and is consistent with measured
performance gains from using large folios for, e.g., kernel compiles).
But wow, that's a lot of extra memory. My feeling is that we shouldn't
add this functionality, but I'd welcome other opinions.