* [PATCH v2 0/3] ksm: perform a range-walk to jump over holes in break_ksm
@ 2025-10-31 17:46 Pedro Demarchi Gomes
2025-10-31 17:46 ` [PATCH v2 1/3] Revert "mm/ksm: convert break_ksm() from walk_page_range_vma() to folio_walk" Pedro Demarchi Gomes
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Pedro Demarchi Gomes @ 2025-10-31 17:46 UTC (permalink / raw)
To: David Hildenbrand, Andrew Morton
Cc: Xu Xin, Chengming Zhou, linux-mm, linux-kernel,
Pedro Demarchi Gomes
When unmerging an address range, the unmerge_ksm_pages() function walks
every page address in the specified range to locate KSM pages. This
becomes highly inefficient when scanning large virtual memory areas that
contain mostly unmapped regions, and can leave the process blocked for
several minutes.
This patch makes break_ksm(), the function unmerge_ksm_pages() calls for
every page in an address range, perform a range walk itself. This allows
it to skip over entire unmapped holes in a VMA, avoiding unnecessary
lookups.
As pointed out by David Hildenbrand in [1], unmerge_ksm_pages() is
called from:
* ksm_madvise() through madvise(MADV_UNMERGEABLE). There are not a lot
of users of that function.
* __ksm_del_vma() through ksm_del_vmas(). Effectively called when
disabling KSM for a process, either through the prctl(PR_SET_MEMORY_MERGE)
interface or from the s390x gmap code when enabling storage keys for a
VM; a minimal example exercising this path is sketched right below.
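As a rough sketch of that second path, the following userspace snippet
toggles process-wide merging via prctl() (assuming a kernel built with
CONFIG_KSM and PR_SET_MEMORY_MERGE support, i.e. Linux 6.4 or later;
this is only an illustration and not part of the series):

#include <stdio.h>
#include <sys/prctl.h>

#ifndef PR_SET_MEMORY_MERGE
#define PR_SET_MEMORY_MERGE 67	/* process-wide KSM control, Linux >= 6.4 */
#endif

int main(void)
{
	/* Opt the whole process into KSM merging ... */
	if (prctl(PR_SET_MEMORY_MERGE, 1, 0, 0, 0))
		perror("PR_SET_MEMORY_MERGE=1");

	/*
	 * ... and back out again. Disabling goes through ksm_del_vmas(),
	 * i.e. the unmerge path this series speeds up.
	 */
	if (prctl(PR_SET_MEMORY_MERGE, 0, 0, 0, 0))
		perror("PR_SET_MEMORY_MERGE=0");
	return 0;
}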
Consider the following test program which creates a 32 TiB mapping in
the virtual address space but only populates a single page:
#include <unistd.h>
#include <stdio.h>
#include <sys/mman.h>

/* 32 TiB */
const size_t size = 32ul * 1024 * 1024 * 1024 * 1024;

int main() {
	char *area = mmap(NULL, size, PROT_READ | PROT_WRITE,
			  MAP_NORESERVE | MAP_PRIVATE | MAP_ANON, -1, 0);

	if (area == MAP_FAILED) {
		perror("mmap() failed\n");
		return -1;
	}

	/* Populate a single page such that we get an anon_vma. */
	*area = 0;

	/* Enable KSM. */
	madvise(area, size, MADV_MERGEABLE);
	madvise(area, size, MADV_UNMERGEABLE);
	return 0;
}
Without this patch, this program takes 9 minutes to finish, while with
this patch it finishes in less than 5 seconds.
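To reproduce the comparison, a variant of the program above that times
only the unmerge step could look like the following (illustrative only,
not part of the series; absolute numbers will of course vary between
machines and kernel configurations):

#include <stdio.h>
#include <time.h>
#include <sys/mman.h>

/* 32 TiB, as in the test program above. */
const size_t size = 32ul * 1024 * 1024 * 1024 * 1024;

int main(void)
{
	struct timespec t0, t1;
	char *area = mmap(NULL, size, PROT_READ | PROT_WRITE,
			  MAP_NORESERVE | MAP_PRIVATE | MAP_ANON, -1, 0);

	if (area == MAP_FAILED) {
		perror("mmap() failed");
		return 1;
	}
	*area = 0;				/* populate a single page */
	madvise(area, size, MADV_MERGEABLE);	/* enable KSM */

	clock_gettime(CLOCK_MONOTONIC, &t0);
	madvise(area, size, MADV_UNMERGEABLE);	/* the path this series speeds up */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("MADV_UNMERGEABLE took %.3f s\n",
	       (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
	return 0;
}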
[1] https://lore.kernel.org/linux-mm/e0886fdf-d198-4130-bd9a-be276c59da37@redhat.com/
Changelog:
v2:
- Use folios in break_ksm_pmd_entry
- Pass address pointer to walk_page_range_vma private parameter
instead of struct break_ksm_arg
- Do not unnecessarily initialize start_ptep
- Order local variable declarations from longest to shortest line
- Improve patch 2 description (do not reference the cover letter)
- Quite a few styling fixes and improvements as suggested by the reviewer
v1: https://lore.kernel.org/all/20251028131945.26445-1-pedrodemargomes@gmail.com/
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
Pedro Demarchi Gomes (3):
Revert "mm/ksm: convert break_ksm() from walk_page_range_vma() to
folio_walk"
ksm: perform a range-walk in break_ksm
ksm: replace function unmerge_ksm_pages with break_ksm
mm/ksm.c | 141 +++++++++++++++++++++++++++++++++----------------------
1 file changed, 86 insertions(+), 55 deletions(-)
--
2.43.0
* [PATCH v2 1/3] Revert "mm/ksm: convert break_ksm() from walk_page_range_vma() to folio_walk"
2025-10-31 17:46 [PATCH v2 0/3] ksm: perform a range-walk to jump over holes in break_ksm Pedro Demarchi Gomes
@ 2025-10-31 17:46 ` Pedro Demarchi Gomes
2025-11-03 17:00 ` David Hildenbrand (Red Hat)
2025-10-31 17:46 ` [PATCH v2 2/3] ksm: perform a range-walk in break_ksm Pedro Demarchi Gomes
2025-10-31 17:46 ` [PATCH v2 3/3] ksm: replace function unmerge_ksm_pages with break_ksm Pedro Demarchi Gomes
2 siblings, 1 reply; 10+ messages in thread
From: Pedro Demarchi Gomes @ 2025-10-31 17:46 UTC (permalink / raw)
To: David Hildenbrand, Andrew Morton
Cc: Xu Xin, Chengming Zhou, linux-mm, linux-kernel,
Pedro Demarchi Gomes
This reverts commit e317a8d8b4f600fc7ec9725e26417030ee594f52 and changes
function break_ksm_pmd_entry() to use folios.
This reverts break_ksm() to use walk_page_range_vma() instead of
folio_walk_start().
This will make it easier to later modify break_ksm() to perform a proper
range walk.
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
---
mm/ksm.c | 63 ++++++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 47 insertions(+), 16 deletions(-)
diff --git a/mm/ksm.c b/mm/ksm.c
index 4f672f4f2140..922d2936e206 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -607,6 +607,47 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
return atomic_read(&mm->mm_users) == 0;
}
+static int break_ksm_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long next,
+ struct mm_walk *walk)
+{
+ struct folio *folio = NULL;
+ spinlock_t *ptl;
+ pte_t *pte;
+ pte_t ptent;
+ int ret;
+
+ pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+ if (!pte)
+ return 0;
+ ptent = ptep_get(pte);
+ if (pte_present(ptent)) {
+ folio = vm_normal_folio(walk->vma, addr, ptent);
+ } else if (!pte_none(ptent)) {
+ swp_entry_t entry = pte_to_swp_entry(ptent);
+
+ /*
+ * As KSM pages remain KSM pages until freed, no need to wait
+ * here for migration to end.
+ */
+ if (is_migration_entry(entry))
+ folio = pfn_swap_entry_folio(entry);
+ }
+ /* return 1 if the page is an normal ksm page or KSM-placed zero page */
+ ret = (folio && folio_test_ksm(folio)) || is_ksm_zero_pte(ptent);
+ pte_unmap_unlock(pte, ptl);
+ return ret;
+}
+
+static const struct mm_walk_ops break_ksm_ops = {
+ .pmd_entry = break_ksm_pmd_entry,
+ .walk_lock = PGWALK_RDLOCK,
+};
+
+static const struct mm_walk_ops break_ksm_lock_vma_ops = {
+ .pmd_entry = break_ksm_pmd_entry,
+ .walk_lock = PGWALK_WRLOCK,
+};
+
/*
* We use break_ksm to break COW on a ksm page by triggering unsharing,
* such that the ksm page will get replaced by an exclusive anonymous page.
@@ -623,26 +664,16 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
static int break_ksm(struct vm_area_struct *vma, unsigned long addr, bool lock_vma)
{
vm_fault_t ret = 0;
-
- if (lock_vma)
- vma_start_write(vma);
+ const struct mm_walk_ops *ops = lock_vma ?
+ &break_ksm_lock_vma_ops : &break_ksm_ops;
do {
- bool ksm_page = false;
- struct folio_walk fw;
- struct folio *folio;
+ int ksm_page;
cond_resched();
- folio = folio_walk_start(&fw, vma, addr,
- FW_MIGRATION | FW_ZEROPAGE);
- if (folio) {
- /* Small folio implies FW_LEVEL_PTE. */
- if (!folio_test_large(folio) &&
- (folio_test_ksm(folio) || is_ksm_zero_pte(fw.pte)))
- ksm_page = true;
- folio_walk_end(&fw, vma);
- }
-
+ ksm_page = walk_page_range_vma(vma, addr, addr + 1, ops, NULL);
+ if (WARN_ON_ONCE(ksm_page < 0))
+ return ksm_page;
if (!ksm_page)
return 0;
ret = handle_mm_fault(vma, addr,
--
2.43.0
* [PATCH v2 2/3] ksm: perform a range-walk in break_ksm
2025-10-31 17:46 [PATCH v2 0/3] ksm: perform a range-walk to jump over holes in break_ksm Pedro Demarchi Gomes
2025-10-31 17:46 ` [PATCH v2 1/3] Revert "mm/ksm: convert break_ksm() from walk_page_range_vma() to folio_walk" Pedro Demarchi Gomes
@ 2025-10-31 17:46 ` Pedro Demarchi Gomes
2025-11-03 17:06 ` David Hildenbrand (Red Hat)
2025-10-31 17:46 ` [PATCH v2 3/3] ksm: replace function unmerge_ksm_pages with break_ksm Pedro Demarchi Gomes
2 siblings, 1 reply; 10+ messages in thread
From: Pedro Demarchi Gomes @ 2025-10-31 17:46 UTC (permalink / raw)
To: David Hildenbrand, Andrew Morton
Cc: Xu Xin, Chengming Zhou, linux-mm, linux-kernel,
Pedro Demarchi Gomes
Make break_ksm() receive an address range and change
break_ksm_pmd_entry() to perform a range-walk and return the address of
the first ksm page found.
This change allows break_ksm() to skip unmapped regions instead of
iterating every page address. When unmerging large sparse VMAs, this
significantly reduces runtime.
In a benchmark unmerging a 32 TiB sparse virtual address space where
only one page was populated, the runtime dropped from 9 minutes to less
than 5 seconds.
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
---
mm/ksm.c | 88 ++++++++++++++++++++++++++++++--------------------------
1 file changed, 48 insertions(+), 40 deletions(-)
diff --git a/mm/ksm.c b/mm/ksm.c
index 922d2936e206..64d66699133d 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -607,35 +607,55 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
return atomic_read(&mm->mm_users) == 0;
}
-static int break_ksm_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long next,
+struct break_ksm_arg {
+ unsigned long addr;
+};
+
+static int break_ksm_pmd_entry(pmd_t *pmdp, unsigned long addr, unsigned long end,
struct mm_walk *walk)
{
- struct folio *folio = NULL;
+ unsigned long *found_addr = (unsigned long *) walk->private;
+ struct mm_struct *mm = walk->mm;
+ pte_t *start_ptep, *ptep;
spinlock_t *ptl;
- pte_t *pte;
- pte_t ptent;
- int ret;
+ int found = 0;
- pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
- if (!pte)
+ if (ksm_test_exit(walk->mm))
return 0;
- ptent = ptep_get(pte);
- if (pte_present(ptent)) {
- folio = vm_normal_folio(walk->vma, addr, ptent);
- } else if (!pte_none(ptent)) {
- swp_entry_t entry = pte_to_swp_entry(ptent);
- /*
- * As KSM pages remain KSM pages until freed, no need to wait
- * here for migration to end.
- */
- if (is_migration_entry(entry))
- folio = pfn_swap_entry_folio(entry);
+ if (signal_pending(current))
+ return -ERESTARTSYS;
+
+ start_ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
+ if (!start_ptep)
+ return 0;
+
+ for (ptep = start_ptep; addr < end; ptep++, addr += PAGE_SIZE) {
+ pte_t pte = ptep_get(ptep);
+ struct folio *folio = NULL;
+
+ if (pte_present(pte)) {
+ folio = vm_normal_folio(walk->vma, addr, pte);
+ } else if (!pte_none(pte)) {
+ swp_entry_t entry = pte_to_swp_entry(pte);
+
+ /*
+ * As KSM pages remain KSM pages until freed, no need to wait
+ * here for migration to end.
+ */
+ if (is_migration_entry(entry))
+ folio = pfn_swap_entry_folio(entry);
+ }
+ /* return 1 if the page is an normal ksm page or KSM-placed zero page */
+ found = (folio && folio_test_ksm(folio)) || is_ksm_zero_pte(pte);
+ if (found) {
+ *found_addr = addr;
+ goto out_unlock;
+ }
}
- /* return 1 if the page is an normal ksm page or KSM-placed zero page */
- ret = (folio && folio_test_ksm(folio)) || is_ksm_zero_pte(ptent);
- pte_unmap_unlock(pte, ptl);
- return ret;
+out_unlock:
+ pte_unmap_unlock(ptep, ptl);
+ return found;
}
static const struct mm_walk_ops break_ksm_ops = {
@@ -661,7 +681,8 @@ static const struct mm_walk_ops break_ksm_lock_vma_ops = {
* of the process that owns 'vma'. We also do not want to enforce
* protection keys here anyway.
*/
-static int break_ksm(struct vm_area_struct *vma, unsigned long addr, bool lock_vma)
+static int break_ksm(struct vm_area_struct *vma, unsigned long addr,
+ unsigned long end, bool lock_vma)
{
vm_fault_t ret = 0;
const struct mm_walk_ops *ops = lock_vma ?
@@ -671,11 +692,9 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr, bool lock_v
int ksm_page;
cond_resched();
- ksm_page = walk_page_range_vma(vma, addr, addr + 1, ops, NULL);
- if (WARN_ON_ONCE(ksm_page < 0))
+ ksm_page = walk_page_range_vma(vma, addr, end, ops, &addr);
+ if (ksm_page <= 0)
return ksm_page;
- if (!ksm_page)
- return 0;
ret = handle_mm_fault(vma, addr,
FAULT_FLAG_UNSHARE | FAULT_FLAG_REMOTE,
NULL);
@@ -761,7 +780,7 @@ static void break_cow(struct ksm_rmap_item *rmap_item)
mmap_read_lock(mm);
vma = find_mergeable_vma(mm, addr);
if (vma)
- break_ksm(vma, addr, false);
+ break_ksm(vma, addr, addr + PAGE_SIZE, false);
mmap_read_unlock(mm);
}
@@ -1072,18 +1091,7 @@ static void remove_trailing_rmap_items(struct ksm_rmap_item **rmap_list)
static int unmerge_ksm_pages(struct vm_area_struct *vma,
unsigned long start, unsigned long end, bool lock_vma)
{
- unsigned long addr;
- int err = 0;
-
- for (addr = start; addr < end && !err; addr += PAGE_SIZE) {
- if (ksm_test_exit(vma->vm_mm))
- break;
- if (signal_pending(current))
- err = -ERESTARTSYS;
- else
- err = break_ksm(vma, addr, lock_vma);
- }
- return err;
+ return break_ksm(vma, start, end, lock_vma);
}
static inline
--
2.43.0
* [PATCH v2 3/3] ksm: replace function unmerge_ksm_pages with break_ksm
2025-10-31 17:46 [PATCH v2 0/3] ksm: perform a range-walk to jump over holes in break_ksm Pedro Demarchi Gomes
2025-10-31 17:46 ` [PATCH v2 1/3] Revert "mm/ksm: convert break_ksm() from walk_page_range_vma() to folio_walk" Pedro Demarchi Gomes
2025-10-31 17:46 ` [PATCH v2 2/3] ksm: perform a range-walk in break_ksm Pedro Demarchi Gomes
@ 2025-10-31 17:46 ` Pedro Demarchi Gomes
2025-11-03 17:06 ` David Hildenbrand (Red Hat)
2 siblings, 1 reply; 10+ messages in thread
From: Pedro Demarchi Gomes @ 2025-10-31 17:46 UTC (permalink / raw)
To: David Hildenbrand, Andrew Morton
Cc: Xu Xin, Chengming Zhou, linux-mm, linux-kernel,
Pedro Demarchi Gomes
Now that break_ksm() walks an address range, unmerge_ksm_pages() is
unnecessary, so replace it with direct calls to break_ksm().
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
---
mm/ksm.c | 40 ++++++++++++++++------------------------
1 file changed, 16 insertions(+), 24 deletions(-)
diff --git a/mm/ksm.c b/mm/ksm.c
index 64d66699133d..7cd19a6ce45f 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -669,6 +669,18 @@ static const struct mm_walk_ops break_ksm_lock_vma_ops = {
};
/*
+ * Though it's very tempting to unmerge rmap_items from stable tree rather
+ * than check every pte of a given vma, the locking doesn't quite work for
+ * that - an rmap_item is assigned to the stable tree after inserting ksm
+ * page and upping mmap_lock. Nor does it fit with the way we skip dup'ing
+ * rmap_items from parent to child at fork time (so as not to waste time
+ * if exit comes before the next scan reaches it).
+ *
+ * Similarly, although we'd like to remove rmap_items (so updating counts
+ * and freeing memory) when unmerging an area, it's easier to leave that
+ * to the next pass of ksmd - consider, for example, how ksmd might be
+ * in cmp_and_merge_page on one of the rmap_items we would be removing.
+ *
* We use break_ksm to break COW on a ksm page by triggering unsharing,
* such that the ksm page will get replaced by an exclusive anonymous page.
*
@@ -1075,25 +1087,6 @@ static void remove_trailing_rmap_items(struct ksm_rmap_item **rmap_list)
}
}
-/*
- * Though it's very tempting to unmerge rmap_items from stable tree rather
- * than check every pte of a given vma, the locking doesn't quite work for
- * that - an rmap_item is assigned to the stable tree after inserting ksm
- * page and upping mmap_lock. Nor does it fit with the way we skip dup'ing
- * rmap_items from parent to child at fork time (so as not to waste time
- * if exit comes before the next scan reaches it).
- *
- * Similarly, although we'd like to remove rmap_items (so updating counts
- * and freeing memory) when unmerging an area, it's easier to leave that
- * to the next pass of ksmd - consider, for example, how ksmd might be
- * in cmp_and_merge_page on one of the rmap_items we would be removing.
- */
-static int unmerge_ksm_pages(struct vm_area_struct *vma,
- unsigned long start, unsigned long end, bool lock_vma)
-{
- return break_ksm(vma, start, end, lock_vma);
-}
-
static inline
struct ksm_stable_node *folio_stable_node(const struct folio *folio)
{
@@ -1231,8 +1224,7 @@ static int unmerge_and_remove_all_rmap_items(void)
for_each_vma(vmi, vma) {
if (!(vma->vm_flags & VM_MERGEABLE) || !vma->anon_vma)
continue;
- err = unmerge_ksm_pages(vma,
- vma->vm_start, vma->vm_end, false);
+ err = break_ksm(vma, vma->vm_start, vma->vm_end, false);
if (err)
goto error;
}
@@ -2859,7 +2851,7 @@ static int __ksm_del_vma(struct vm_area_struct *vma)
return 0;
if (vma->anon_vma) {
- err = unmerge_ksm_pages(vma, vma->vm_start, vma->vm_end, true);
+ err = break_ksm(vma, vma->vm_start, vma->vm_end, true);
if (err)
return err;
}
@@ -3011,7 +3003,7 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
return 0; /* just ignore the advice */
if (vma->anon_vma) {
- err = unmerge_ksm_pages(vma, start, end, true);
+ err = break_ksm(vma, start, end, true);
if (err)
return err;
}
@@ -3393,7 +3385,7 @@ static int ksm_memory_callback(struct notifier_block *self,
* Prevent ksm_do_scan(), unmerge_and_remove_all_rmap_items()
* and remove_all_stable_nodes() while memory is going offline:
* it is unsafe for them to touch the stable tree at this time.
- * But unmerge_ksm_pages(), rmap lookups and other entry points
+ * But break_ksm(), rmap lookups and other entry points
* which do not need the ksm_thread_mutex are all safe.
*/
mutex_lock(&ksm_thread_mutex);
--
2.43.0
* Re: [PATCH v2 1/3] Revert "mm/ksm: convert break_ksm() from walk_page_range_vma() to folio_walk"
2025-10-31 17:46 ` [PATCH v2 1/3] Revert "mm/ksm: convert break_ksm() from walk_page_range_vma() to folio_walk" Pedro Demarchi Gomes
@ 2025-11-03 17:00 ` David Hildenbrand (Red Hat)
2025-11-05 13:28 ` Pedro Demarchi Gomes
0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-03 17:00 UTC (permalink / raw)
To: Pedro Demarchi Gomes, David Hildenbrand, Andrew Morton
Cc: Xu Xin, Chengming Zhou, linux-mm, linux-kernel
On 31.10.25 18:46, Pedro Demarchi Gomes wrote:
> This reverts commit e317a8d8b4f600fc7ec9725e26417030ee594f52 and changes
> function break_ksm_pmd_entry() to use folios.
>
> This reverts break_ksm() to use walk_page_range_vma() instead of
> folio_walk_start().
> This will make it easier to later modify break_ksm() to perform a proper
> range walk.
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
> ---
> mm/ksm.c | 63 ++++++++++++++++++++++++++++++++++++++++++--------------
> 1 file changed, 47 insertions(+), 16 deletions(-)
>
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 4f672f4f2140..922d2936e206 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -607,6 +607,47 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
> return atomic_read(&mm->mm_users) == 0;
> }
>
> +static int break_ksm_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long next,
> + struct mm_walk *walk)
> +{
> + struct folio *folio = NULL;
> + spinlock_t *ptl;
> + pte_t *pte;
> + pte_t ptent;
> + int ret;
> +
> + pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
> + if (!pte)
> + return 0;
> + ptent = ptep_get(pte);
> + if (pte_present(ptent)) {
> + folio = vm_normal_folio(walk->vma, addr, ptent);
> + } else if (!pte_none(ptent)) {
> + swp_entry_t entry = pte_to_swp_entry(ptent);
> +
> + /*
> + * As KSM pages remain KSM pages until freed, no need to wait
> + * here for migration to end.
> + */
> + if (is_migration_entry(entry))
> + folio = pfn_swap_entry_folio(entry);
> + }
> + /* return 1 if the page is an normal ksm page or KSM-placed zero page */
> + ret = (folio && folio_test_ksm(folio)) || is_ksm_zero_pte(ptent);
Staring again, we should really call is_ksm_zero_pte() only if we know
the folio is present.
It's not super dangerous in the old code (because we would only look at
present and migration entries), but now you are making it possible to
call it on even more non-present ptes.
With that handled
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
--
Cheers
David
* Re: [PATCH v2 2/3] ksm: perform a range-walk in break_ksm
2025-10-31 17:46 ` [PATCH v2 2/3] ksm: perform a range-walk in break_ksm Pedro Demarchi Gomes
@ 2025-11-03 17:06 ` David Hildenbrand (Red Hat)
2025-11-05 13:32 ` Pedro Demarchi Gomes
0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-03 17:06 UTC (permalink / raw)
To: Pedro Demarchi Gomes, David Hildenbrand, Andrew Morton
Cc: Xu Xin, Chengming Zhou, linux-mm, linux-kernel
On 31.10.25 18:46, Pedro Demarchi Gomes wrote:
> Make break_ksm() receive an address range and change
> break_ksm_pmd_entry() to perform a range-walk and return the address of
> the first ksm page found.
>
> This change allows break_ksm() to skip unmapped regions instead of
> iterating every page address. When unmerging large sparse VMAs, this
> significantly reduces runtime.
>
> In a benchmark unmerging a 32 TiB sparse virtual address space where
> only one page was populated, the runtime dropped from 9 minutes to less
> then 5 seconds.
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
> ---
> mm/ksm.c | 88 ++++++++++++++++++++++++++++++--------------------------
> 1 file changed, 48 insertions(+), 40 deletions(-)
>
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 922d2936e206..64d66699133d 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -607,35 +607,55 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
> return atomic_read(&mm->mm_users) == 0;
> }
>
> -static int break_ksm_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long next,
> +struct break_ksm_arg {
> + unsigned long addr;
> +};
Leftover? :)
> +
> +static int break_ksm_pmd_entry(pmd_t *pmdp, unsigned long addr, unsigned long end,
> struct mm_walk *walk)
> {
> - struct folio *folio = NULL;
> + unsigned long *found_addr = (unsigned long *) walk->private;
> + struct mm_struct *mm = walk->mm;
> + pte_t *start_ptep, *ptep;
> spinlock_t *ptl;
> - pte_t *pte;
> - pte_t ptent;
> - int ret;
> + int found = 0;
Best to perform the ret -> found rename already in patch #1.
With both things
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
--
Cheers
David
* Re: [PATCH v2 3/3] ksm: replace function unmerge_ksm_pages with break_ksm
2025-10-31 17:46 ` [PATCH v2 3/3] ksm: replace function unmerge_ksm_pages with break_ksm Pedro Demarchi Gomes
@ 2025-11-03 17:06 ` David Hildenbrand (Red Hat)
0 siblings, 0 replies; 10+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-03 17:06 UTC (permalink / raw)
To: Pedro Demarchi Gomes, David Hildenbrand, Andrew Morton
Cc: Xu Xin, Chengming Zhou, linux-mm, linux-kernel
On 31.10.25 18:46, Pedro Demarchi Gomes wrote:
> Function unmerge_ksm_pages() is unnecessary since now break_ksm() walks
> an address range. So replace it with break_ksm().
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
> ---
Thanks!
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
--
Cheers
David
* Re: [PATCH v2 1/3] Revert "mm/ksm: convert break_ksm() from walk_page_range_vma() to folio_walk"
2025-11-03 17:00 ` David Hildenbrand (Red Hat)
@ 2025-11-05 13:28 ` Pedro Demarchi Gomes
2025-11-05 14:21 ` David Hildenbrand (Red Hat)
0 siblings, 1 reply; 10+ messages in thread
From: Pedro Demarchi Gomes @ 2025-11-05 13:28 UTC (permalink / raw)
To: David Hildenbrand (Red Hat)
Cc: David Hildenbrand, Andrew Morton, Xu Xin, Chengming Zhou,
linux-mm, linux-kernel
On Mon, Nov 03, 2025 at 06:00:08PM +0100, David Hildenbrand (Red Hat) wrote:
> On 31.10.25 18:46, Pedro Demarchi Gomes wrote:
> > This reverts commit e317a8d8b4f600fc7ec9725e26417030ee594f52 and changes
> > function break_ksm_pmd_entry() to use folios.
> >
> > This reverts break_ksm() to use walk_page_range_vma() instead of
> > folio_walk_start().
> > This will make it easier to later modify break_ksm() to perform a proper
> > range walk.
> >
> > Suggested-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
> > ---
> > mm/ksm.c | 63 ++++++++++++++++++++++++++++++++++++++++++--------------
> > 1 file changed, 47 insertions(+), 16 deletions(-)
> >
> > diff --git a/mm/ksm.c b/mm/ksm.c
> > index 4f672f4f2140..922d2936e206 100644
> > --- a/mm/ksm.c
> > +++ b/mm/ksm.c
> > @@ -607,6 +607,47 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
> > return atomic_read(&mm->mm_users) == 0;
> > }
> > +static int break_ksm_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long next,
> > + struct mm_walk *walk)
> > +{
> > + struct folio *folio = NULL;
> > + spinlock_t *ptl;
> > + pte_t *pte;
> > + pte_t ptent;
> > + int ret;
> > +
> > + pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
> > + if (!pte)
> > + return 0;
> > + ptent = ptep_get(pte);
> > + if (pte_present(ptent)) {
> > + folio = vm_normal_folio(walk->vma, addr, ptent);
> > + } else if (!pte_none(ptent)) {
> > + swp_entry_t entry = pte_to_swp_entry(ptent);
> > +
> > + /*
> > + * As KSM pages remain KSM pages until freed, no need to wait
> > + * here for migration to end.
> > + */
> > + if (is_migration_entry(entry))
> > + folio = pfn_swap_entry_folio(entry);
> > + }
> > + /* return 1 if the page is an normal ksm page or KSM-placed zero page */
> > + ret = (folio && folio_test_ksm(folio)) || is_ksm_zero_pte(ptent);
>
> Staring again, we should really call is_ksm_zero_pte() only if we know the
> folio is present.
>
> It's not super dangerous in the old code (because we would only look at
> present an migration entries), but now you are making it possible to call it
> on even more non-present ptes.
>
IIUC vm_normal_folio() will return NULL in the case of a KSM zero pte,
so we cannot do

	found = folio && (folio_test_ksm(folio) || is_ksm_zero_pte(ptent));

because it will always be false for a KSM zero pte.
So we should do

	found = (folio && folio_test_ksm(folio)) ||
		(pte_present(ptent) && is_ksm_zero_pte(ptent));

since the pte_present() check ensures is_ksm_zero_pte() is only called
on a present pte.
Sorry if I am missing something.
> --
> Cheers
>
> David
>
* Re: [PATCH v2 2/3] ksm: perform a range-walk in break_ksm
2025-11-03 17:06 ` David Hildenbrand (Red Hat)
@ 2025-11-05 13:32 ` Pedro Demarchi Gomes
0 siblings, 0 replies; 10+ messages in thread
From: Pedro Demarchi Gomes @ 2025-11-05 13:32 UTC (permalink / raw)
To: David Hildenbrand (Red Hat)
Cc: David Hildenbrand, Andrew Morton, Xu Xin, Chengming Zhou,
linux-mm, linux-kernel
On Mon, Nov 03, 2025 at 06:06:26PM +0100, David Hildenbrand (Red Hat) wrote:
> On 31.10.25 18:46, Pedro Demarchi Gomes wrote:
> > Make break_ksm() receive an address range and change
> > break_ksm_pmd_entry() to perform a range-walk and return the address of
> > the first ksm page found.
> >
> > This change allows break_ksm() to skip unmapped regions instead of
> > iterating every page address. When unmerging large sparse VMAs, this
> > significantly reduces runtime.
> >
> > In a benchmark unmerging a 32 TiB sparse virtual address space where
> > only one page was populated, the runtime dropped from 9 minutes to less
> > then 5 seconds.
> >
> > Suggested-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
> > ---
> > mm/ksm.c | 88 ++++++++++++++++++++++++++++++--------------------------
> > 1 file changed, 48 insertions(+), 40 deletions(-)
> >
> > diff --git a/mm/ksm.c b/mm/ksm.c
> > index 922d2936e206..64d66699133d 100644
> > --- a/mm/ksm.c
> > +++ b/mm/ksm.c
> > @@ -607,35 +607,55 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
> > return atomic_read(&mm->mm_users) == 0;
> > }
> > -static int break_ksm_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long next,
> > +struct break_ksm_arg {
> > + unsigned long addr;
> > +};
>
> Leftover? :)
>
Yes, I am sorry.
I will remove it in the v3.
> > +
> > +static int break_ksm_pmd_entry(pmd_t *pmdp, unsigned long addr, unsigned long end,
> > struct mm_walk *walk)
> > {
> > - struct folio *folio = NULL;
> > + unsigned long *found_addr = (unsigned long *) walk->private;
> > + struct mm_struct *mm = walk->mm;
> > + pte_t *start_ptep, *ptep;
> > spinlock_t *ptl;
> > - pte_t *pte;
> > - pte_t ptent;
> > - int ret;
> > + int found = 0;
>
> Best to perform the ret -> found rename already in patch #1.
>
Ok
> With both things
>
> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Thanks!
> --
> Cheers
>
> David
>
* Re: [PATCH v2 1/3] Revert "mm/ksm: convert break_ksm() from walk_page_range_vma() to folio_walk"
2025-11-05 13:28 ` Pedro Demarchi Gomes
@ 2025-11-05 14:21 ` David Hildenbrand (Red Hat)
0 siblings, 0 replies; 10+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-05 14:21 UTC (permalink / raw)
To: Pedro Demarchi Gomes
Cc: Andrew Morton, Xu Xin, Chengming Zhou, linux-mm, linux-kernel
On 05.11.25 14:28, Pedro Demarchi Gomes wrote:
> On Mon, Nov 03, 2025 at 06:00:08PM +0100, David Hildenbrand (Red Hat) wrote:
>> On 31.10.25 18:46, Pedro Demarchi Gomes wrote:
>>> This reverts commit e317a8d8b4f600fc7ec9725e26417030ee594f52 and changes
>>> function break_ksm_pmd_entry() to use folios.
>>>
>>> This reverts break_ksm() to use walk_page_range_vma() instead of
>>> folio_walk_start().
>>> This will make it easier to later modify break_ksm() to perform a proper
>>> range walk.
>>>
>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>> Signed-off-by: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
>>> ---
>>> mm/ksm.c | 63 ++++++++++++++++++++++++++++++++++++++++++--------------
>>> 1 file changed, 47 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/mm/ksm.c b/mm/ksm.c
>>> index 4f672f4f2140..922d2936e206 100644
>>> --- a/mm/ksm.c
>>> +++ b/mm/ksm.c
>>> @@ -607,6 +607,47 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
>>> return atomic_read(&mm->mm_users) == 0;
>>> }
>>> +static int break_ksm_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long next,
>>> + struct mm_walk *walk)
>>> +{
>>> + struct folio *folio = NULL;
>>> + spinlock_t *ptl;
>>> + pte_t *pte;
>>> + pte_t ptent;
>>> + int ret;
>>> +
>>> + pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
>>> + if (!pte)
>>> + return 0;
>>> + ptent = ptep_get(pte);
>>> + if (pte_present(ptent)) {
>>> + folio = vm_normal_folio(walk->vma, addr, ptent);
>>> + } else if (!pte_none(ptent)) {
>>> + swp_entry_t entry = pte_to_swp_entry(ptent);
>>> +
>>> + /*
>>> + * As KSM pages remain KSM pages until freed, no need to wait
>>> + * here for migration to end.
>>> + */
>>> + if (is_migration_entry(entry))
>>> + folio = pfn_swap_entry_folio(entry);
>>> + }
>>> + /* return 1 if the page is an normal ksm page or KSM-placed zero page */
>>> + ret = (folio && folio_test_ksm(folio)) || is_ksm_zero_pte(ptent);
>>
>> Staring again, we should really call is_ksm_zero_pte() only if we know the
>> folio is present.
>>
>> It's not super dangerous in the old code (because we would only look at
>> present an migration entries), but now you are making it possible to call it
>> on even more non-present ptes.
>>
>
> IIUC vm_normal_folio will return NULL in the case of a ksm zero pte, so
> we can not do
> found = folio && (folio_test_ksm(folio) || is_ksm_zero_pte(pte))
> because it will always be false for a ksm zero pte.
> So we should do
> found = (folio && folio_test_ksm(folio)) || (pte_present(ptent)
> && is_ksm_zero_pte(ptent));
> since if the pte is present and is a zero pte we can guarantee that
> the folio is present.
Yes exactly.
--
Cheers
David