+ revert-mm-ksm-convert-break_ksm-from-walk_page_range_vma-to-folio_walk.patch added to mm-new branch
From: Andrew Morton @ 2025-10-28 22:47 UTC
To: mm-commits, xu.xin16, david, chengming.zhou, pedrodemargomes,
akpm
The patch titled
Subject: Revert "mm/ksm: convert break_ksm() from walk_page_range_vma() to folio_walk"
has been added to the -mm mm-new branch. Its filename is
revert-mm-ksm-convert-break_ksm-from-walk_page_range_vma-to-folio_walk.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/revert-mm-ksm-convert-break_ksm-from-walk_page_range_vma-to-folio_walk.patch
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others to take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fix up patches in mm-new.
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
Subject: Revert "mm/ksm: convert break_ksm() from walk_page_range_vma() to folio_walk"
Date: Tue, 28 Oct 2025 10:19:43 -0300
Patch series "ksm: perform a range-walk to jump over holes in break_ksm".
When unmerging an address range, unmerge_ksm_pages function walks every
page address in the specified range to locate ksm pages. This becomes
highly inefficient when scanning large virtual memory areas that contain
mostly unmapped regions, causing the process to get blocked for several
minutes.
This patch makes break_ksm, function called by unmerge_ksm_pages for every
page in an address range, perform a range walk, allowing it to skip over
entire unmapped holes in a VMA, avoiding unnecessary lookups.
As pointed out by David Hildenbrand in [1], unmerge_ksm_pages() is called
from:
* ksm_madvise() through madvise(MADV_UNMERGEABLE). There are not a lot
of users of that function.
* __ksm_del_vma() through ksm_del_vmas(). Effectively called when
disabling KSM for a process either through the sysctl or from s390x gmap
code when enabling storage keys for a VM.
Consider the following test program which creates a 32 TiB mapping in
the virtual address space but only populates a single page:

#include <unistd.h>
#include <stdio.h>
#include <sys/mman.h>

/* 32 TiB */
const size_t size = 32ul * 1024 * 1024 * 1024 * 1024;

int main() {
	char *area = mmap(NULL, size, PROT_READ | PROT_WRITE,
			  MAP_NORESERVE | MAP_PRIVATE | MAP_ANON, -1, 0);
	if (area == MAP_FAILED) {
		perror("mmap() failed\n");
		return -1;
	}

	/* Populate a single page such that we get an anon_vma. */
	*area = 0;

	/* Enable KSM. */
	madvise(area, size, MADV_MERGEABLE);
	madvise(area, size, MADV_UNMERGEABLE);
	return 0;
}

Without this patch, this program takes 9 minutes to finish, while with
this patch it finishes in less than 5 seconds.
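
For anyone reproducing the numbers above, one way to time just the
unmerge step in isolation is to wrap the MADV_UNMERGEABLE call with
clock_gettime(). The helper below is hypothetical instrumentation of the
reproducer, not part of the patch:

#include <time.h>
#include <stdio.h>
#include <sys/mman.h>

/* Hypothetical helper: time a single madvise() call, in seconds. */
static double timed_madvise(void *addr, size_t len, int advice)
{
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	if (madvise(addr, len, advice))
		perror("madvise() failed");
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/*
 * In main() above, the last call would then become, e.g.:
 *	printf("MADV_UNMERGEABLE took %.2f s\n",
 *	       timed_madvise(area, size, MADV_UNMERGEABLE));
 */
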
This patch (of 3):

This reverts commit e317a8d8b4f600fc7ec9725e26417030ee594f52 and changes
PageKsm(page) to folio_test_ksm(page_folio(page)).

This reverts break_ksm() to use walk_page_range_vma() instead of
folio_walk_start(). This will make it easier to later modify break_ksm()
to perform a proper range walk.
Link: https://lkml.kernel.org/r/20251028131945.26445-1-pedrodemargomes@gmail.com
Link: https://lkml.kernel.org/r/20251028131945.26445-2-pedrodemargomes@gmail.com
Link: https://lore.kernel.org/linux-mm/e0886fdf-d198-4130-bd9a-be276c59da37@redhat.com/ [1]
Signed-off-by: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/ksm.c | 63 +++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 47 insertions(+), 16 deletions(-)
--- a/mm/ksm.c~revert-mm-ksm-convert-break_ksm-from-walk_page_range_vma-to-folio_walk
+++ a/mm/ksm.c
@@ -607,6 +607,47 @@ static inline bool ksm_test_exit(struct
 	return atomic_read(&mm->mm_users) == 0;
 }
 
+static int break_ksm_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long next,
+			struct mm_walk *walk)
+{
+	struct page *page = NULL;
+	spinlock_t *ptl;
+	pte_t *pte;
+	pte_t ptent;
+	int ret;
+
+	pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+	if (!pte)
+		return 0;
+	ptent = ptep_get(pte);
+	if (pte_present(ptent)) {
+		page = vm_normal_page(walk->vma, addr, ptent);
+	} else if (!pte_none(ptent)) {
+		swp_entry_t entry = pte_to_swp_entry(ptent);
+
+		/*
+		 * As KSM pages remain KSM pages until freed, no need to wait
+		 * here for migration to end.
+		 */
+		if (is_migration_entry(entry))
+			page = pfn_swap_entry_to_page(entry);
+	}
+	/* return 1 if the page is a normal ksm page or KSM-placed zero page */
+	ret = (page && folio_test_ksm(page_folio(page))) || is_ksm_zero_pte(ptent);
+	pte_unmap_unlock(pte, ptl);
+	return ret;
+}
+
+static const struct mm_walk_ops break_ksm_ops = {
+	.pmd_entry = break_ksm_pmd_entry,
+	.walk_lock = PGWALK_RDLOCK,
+};
+
+static const struct mm_walk_ops break_ksm_lock_vma_ops = {
+	.pmd_entry = break_ksm_pmd_entry,
+	.walk_lock = PGWALK_WRLOCK,
+};
+
 /*
  * We use break_ksm to break COW on a ksm page by triggering unsharing,
  * such that the ksm page will get replaced by an exclusive anonymous page.
@@ -623,26 +664,16 @@ static inline bool ksm_test_exit(struct
 static int break_ksm(struct vm_area_struct *vma, unsigned long addr, bool lock_vma)
 {
 	vm_fault_t ret = 0;
-
-	if (lock_vma)
-		vma_start_write(vma);
+	const struct mm_walk_ops *ops = lock_vma ?
+		&break_ksm_lock_vma_ops : &break_ksm_ops;
 
 	do {
-		bool ksm_page = false;
-		struct folio_walk fw;
-		struct folio *folio;
+		int ksm_page;
 
 		cond_resched();
-		folio = folio_walk_start(&fw, vma, addr,
-					 FW_MIGRATION | FW_ZEROPAGE);
-		if (folio) {
-			/* Small folio implies FW_LEVEL_PTE. */
-			if (!folio_test_large(folio) &&
-			    (folio_test_ksm(folio) || is_ksm_zero_pte(fw.pte)))
-				ksm_page = true;
-			folio_walk_end(&fw, vma);
-		}
-
+		ksm_page = walk_page_range_vma(vma, addr, addr + 1, ops, NULL);
+		if (WARN_ON_ONCE(ksm_page < 0))
+			return ksm_page;
 		if (!ksm_page)
 			return 0;
 		ret = handle_mm_fault(vma, addr,
_
Patches currently in -mm which might be from pedrodemargomes@gmail.com are
ksm-use-range-walk-function-to-jump-over-holes-in-scan_get_next_rmap_item.patch
revert-mm-ksm-convert-break_ksm-from-walk_page_range_vma-to-folio_walk.patch
ksm-perform-a-range-walk-in-break_ksm.patch
ksm-replace-function-unmerge_ksm_pages-with-break_ksm.patch
+ revert-mm-ksm-convert-break_ksm-from-walk_page_range_vma-to-folio_walk.patch added to mm-new branch
From: Andrew Morton @ 2025-10-31 20:56 UTC
To: mm-commits, xu.xin16, david, chengming.zhou, pedrodemargomes,
akpm
The patch titled
Subject: Revert "mm/ksm: convert break_ksm() from walk_page_range_vma() to folio_walk"
has been added to the -mm mm-new branch. Its filename is
revert-mm-ksm-convert-break_ksm-from-walk_page_range_vma-to-folio_walk.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/revert-mm-ksm-convert-break_ksm-from-walk_page_range_vma-to-folio_walk.patch
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others to take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fix up patches in mm-new.
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
Subject: Revert "mm/ksm: convert break_ksm() from walk_page_range_vma() to folio_walk"
Date: Fri, 31 Oct 2025 14:46:23 -0300
Patch series "ksm: perform a range-walk to jump over holes in break_ksm",
v2.
When unmerging an address range, unmerge_ksm_pages function walks every
page address in the specified range to locate ksm pages. This becomes
highly inefficient when scanning large virtual memory areas that contain
mostly unmapped regions, causing the process to get blocked for several
minutes.
This patch makes break_ksm, function called by unmerge_ksm_pages for every
page in an address range, perform a range walk, allowing it to skip over
entire unmapped holes in a VMA, avoiding unnecessary lookups.
As pointed by David Hildenbrand in [1], unmerge_ksm_pages() is called
from:
* ksm_madvise() through madvise(MADV_UNMERGEABLE). There are not a lot
of users of that function.
* __ksm_del_vma() through ksm_del_vmas(). Effectively called when
disabling KSM for a process either through the sysctl or from s390x gmap
code when enabling storage keys for a VM.
Consider the following test program which creates a 32 TiB mapping in the
virtual address space but only populates a single page:

#include <unistd.h>
#include <stdio.h>
#include <sys/mman.h>

/* 32 TiB */
const size_t size = 32ul * 1024 * 1024 * 1024 * 1024;

int main() {
	char *area = mmap(NULL, size, PROT_READ | PROT_WRITE,
			  MAP_NORESERVE | MAP_PRIVATE | MAP_ANON, -1, 0);
	if (area == MAP_FAILED) {
		perror("mmap() failed\n");
		return -1;
	}

	/* Populate a single page such that we get an anon_vma. */
	*area = 0;

	/* Enable KSM. */
	madvise(area, size, MADV_MERGEABLE);
	madvise(area, size, MADV_UNMERGEABLE);
	return 0;
}

Without this patch, this program takes 9 minutes to finish, while with
this patch it finishes in less than 5 seconds.
This patch (of 3):

This reverts commit e317a8d8b4f600fc7ec9725e26417030ee594f52 and changes
break_ksm_pmd_entry() to use folios.

This reverts break_ksm() to use walk_page_range_vma() instead of
folio_walk_start(). This will make it easier to later modify break_ksm()
to perform a proper range walk.
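
To give a rough idea of where the series is heading, a range-aware
pmd_entry callback could scan all PTEs of a PMD, report the first
KSM-mapped address through walk->private, and stop the walk by returning
a positive value. The sketch below is only an illustration of that idea;
break_ksm_arg and break_ksm_pmd_entry_range are hypothetical names, and
this is not the code of the follow-up patches:

/* Illustrative sketch only; not the follow-up patch in this series. */
struct break_ksm_arg {
	unsigned long addr;	/* first KSM-mapped address found */
};

static int break_ksm_pmd_entry_range(pmd_t *pmd, unsigned long addr,
				     unsigned long next, struct mm_walk *walk)
{
	struct break_ksm_arg *arg = walk->private;
	spinlock_t *ptl;
	pte_t *start_pte, *pte;

	start_pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
	if (!start_pte)
		return 0;
	for (pte = start_pte; addr < next; pte++, addr += PAGE_SIZE) {
		pte_t ptent = ptep_get(pte);
		struct folio *folio = NULL;

		if (pte_present(ptent)) {
			folio = vm_normal_folio(walk->vma, addr, ptent);
		} else if (!pte_none(ptent)) {
			swp_entry_t entry = pte_to_swp_entry(ptent);

			if (is_migration_entry(entry))
				folio = pfn_swap_entry_folio(entry);
		}
		if ((folio && folio_test_ksm(folio)) || is_ksm_zero_pte(ptent)) {
			arg->addr = addr;	/* report the hit to the caller */
			pte_unmap_unlock(start_pte, ptl);
			return 1;	/* a positive return stops the walk */
		}
	}
	pte_unmap_unlock(start_pte, ptl);
	return 0;
}

break_ksm() could then hand walk_page_range_vma() a whole [addr, end)
window once, instead of one page at a time, and fault in only the address
reported back.
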
Link: https://lkml.kernel.org/r/20251031174625.127417-2-pedrodemargomes@gmail.com
Link: https://lore.kernel.org/linux-mm/e0886fdf-d198-4130-bd9a-be276c59da37@redhat.com/ [1]
Signed-off-by: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/ksm.c | 63 +++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 47 insertions(+), 16 deletions(-)
--- a/mm/ksm.c~revert-mm-ksm-convert-break_ksm-from-walk_page_range_vma-to-folio_walk
+++ a/mm/ksm.c
@@ -607,6 +607,47 @@ static inline bool ksm_test_exit(struct
 	return atomic_read(&mm->mm_users) == 0;
 }
 
+static int break_ksm_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long next,
+			struct mm_walk *walk)
+{
+	struct folio *folio = NULL;
+	spinlock_t *ptl;
+	pte_t *pte;
+	pte_t ptent;
+	int ret;
+
+	pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+	if (!pte)
+		return 0;
+	ptent = ptep_get(pte);
+	if (pte_present(ptent)) {
+		folio = vm_normal_folio(walk->vma, addr, ptent);
+	} else if (!pte_none(ptent)) {
+		swp_entry_t entry = pte_to_swp_entry(ptent);
+
+		/*
+		 * As KSM pages remain KSM pages until freed, no need to wait
+		 * here for migration to end.
+		 */
+		if (is_migration_entry(entry))
+			folio = pfn_swap_entry_folio(entry);
+	}
+	/* return 1 if the page is a normal ksm page or KSM-placed zero page */
+	ret = (folio && folio_test_ksm(folio)) || is_ksm_zero_pte(ptent);
+	pte_unmap_unlock(pte, ptl);
+	return ret;
+}
+
+static const struct mm_walk_ops break_ksm_ops = {
+	.pmd_entry = break_ksm_pmd_entry,
+	.walk_lock = PGWALK_RDLOCK,
+};
+
+static const struct mm_walk_ops break_ksm_lock_vma_ops = {
+	.pmd_entry = break_ksm_pmd_entry,
+	.walk_lock = PGWALK_WRLOCK,
+};
+
 /*
  * We use break_ksm to break COW on a ksm page by triggering unsharing,
  * such that the ksm page will get replaced by an exclusive anonymous page.
@@ -623,26 +664,16 @@ static inline bool ksm_test_exit(struct
 static int break_ksm(struct vm_area_struct *vma, unsigned long addr, bool lock_vma)
 {
 	vm_fault_t ret = 0;
-
-	if (lock_vma)
-		vma_start_write(vma);
+	const struct mm_walk_ops *ops = lock_vma ?
+		&break_ksm_lock_vma_ops : &break_ksm_ops;
 
 	do {
-		bool ksm_page = false;
-		struct folio_walk fw;
-		struct folio *folio;
+		int ksm_page;
 
 		cond_resched();
-		folio = folio_walk_start(&fw, vma, addr,
-					 FW_MIGRATION | FW_ZEROPAGE);
-		if (folio) {
-			/* Small folio implies FW_LEVEL_PTE. */
-			if (!folio_test_large(folio) &&
-			    (folio_test_ksm(folio) || is_ksm_zero_pte(fw.pte)))
-				ksm_page = true;
-			folio_walk_end(&fw, vma);
-		}
-
+		ksm_page = walk_page_range_vma(vma, addr, addr + 1, ops, NULL);
+		if (WARN_ON_ONCE(ksm_page < 0))
+			return ksm_page;
 		if (!ksm_page)
 			return 0;
 		ret = handle_mm_fault(vma, addr,
_
Patches currently in -mm which might be from pedrodemargomes@gmail.com are
ksm-use-range-walk-function-to-jump-over-holes-in-scan_get_next_rmap_item.patch
revert-mm-ksm-convert-break_ksm-from-walk_page_range_vma-to-folio_walk.patch
ksm-perform-a-range-walk-in-break_ksm.patch
ksm-replace-function-unmerge_ksm_pages-with-break_ksm.patch