Linux-mm Archive on lore.kernel.org
* Re: [PATCH v4 4/5] ksm: Optimize rmap_walk_ksm by passing a suitable address range
@ 2026-05-15  7:13 xu.xin16
  2026-05-15 12:28 ` Lorenzo Stoakes
  0 siblings, 1 reply; 5+ messages in thread
From: xu.xin16 @ 2026-05-15  7:13 UTC (permalink / raw)
  To: david; +Cc: akpm, ljs, hughd, linux-mm, linux-kernel, michel

> > diff --git a/mm/ksm.c b/mm/ksm.c
> > index 0299a53ba7c9..a13184d00759 100644
> > --- a/mm/ksm.c
> > +++ b/mm/ksm.c
> > @@ -3200,6 +3200,7 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
> >  	hlist_for_each_entry(rmap_item, &stable_node->hlist, hlist) {
> >  		/* Ignore the stable/unstable/sqnr flags */
> >  		const unsigned long addr = rmap_item->address & PAGE_MASK;
> > +		const unsigned long vm_pgoff = rmap_item->vm_pgoff;
> >  		struct anon_vma *anon_vma = rmap_item->anon_vma;
> >  		struct anon_vma_chain *vmac;
> >  		struct vm_area_struct *vma;
> > @@ -3213,8 +3214,12 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
> >  			anon_vma_lock_read(anon_vma);
> >  		}
> >
> > +		/*
> > +		 * Currently KSM folios are order-0 normal pages, so pgoff_end
> > +		 * should be the same as pgoff_start.
> > +		 */
> >  		anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
> > -					       0, ULONG_MAX) {
> > +					       vm_pgoff, vm_pgoff) {
>
> But vm_pgoff would just correspond to the start of the VMA, not where the page
> is actually mapped?
>
> I'd assume you really want the linear page index of the original page?

Right. I've reconsidered and realized that using vm_pgoff is indeed unreliable.

My initial idea was: as long as we can find the VMA that maps this page,
it's sufficient for anon_vma_interval_tree_foreach() to check whether
"vm_pgoff <= pgoff of the original page <= (vm_pgoff + vma_pages(v) - 1)".

However, the flaw is that the VMA may be split (e.g., due to madvise or
mprotect), causing vma_pages(v) to change and the condition above to no
longer hold.

Indeed, it&apos;s better to use the linear page index of the original page.

I&apos;ll send v5 to correct this.

>
> --
> Cheers,
>
> David
>


* [PATCH v4 0/5] KSM: Optimizations for rmap_walk_ksm
@ 2026-05-03 12:35 xu.xin16
  2026-05-03 12:50 ` [PATCH v4 4/5] ksm: Optimize rmap_walk_ksm by passing a suitable address range xu.xin16
  0 siblings, 1 reply; 5+ messages in thread
From: xu.xin16 @ 2026-05-03 12:35 UTC (permalink / raw)
  To: akpm, david, ljs, hughd; +Cc: linux-mm, linux-kernel, michel, xu.xin16

From: xu xin <xu.xin16@zte.com.cn>

Deep investigation revealed that 99.9% of the iterations inside
rmap_walk_ksm's anon_vma_interval_tree_foreach loop are skipped by the
first check "if (addr < vma->vm_start || addr >= vma->vm_end)", i.e. the
vast majority of loop iterations do no useful work. This is because the
pgoff_start and pgoff_end parameters passed to
anon_vma_interval_tree_foreach span the entire pgoff range from 0 to
ULONG_MAX, so every VMA on the anon_vma's interval tree is visited.

An initial, immature thought was to pass "rmap_item->address >> PAGE_SHIFT"
as the pgoff to anon_vma_interval_tree_foreach(). But this is flawed:
when a range has been moved by mremap, its anon folio indexes and anon_vma
pgoff correspond to the original user address, not to the current user
address, as was pointed out at:

  https://lore.kernel.org/all/02e1b8df-d568-8cbb-b8f6-46d5476d9d75@google.com/

Looking at the implementation of anon_vma_interval_tree_foreach, it
essentially iterates over the VMAs whose pgoff range
[vm_pgoff, vm_pgoff + vma_pages(vma) - 1] contains the provided pgoff.

So the solution is to add a vm_pgoff field to ksm_rmap_item and pass
vm_pgoff instead of address >> PAGE_SHIFT.
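
Roughly, the shape of the change (a sketch only, not the actual patch; the
exact field placement and the recording site are illustrative):

struct ksm_rmap_item {
	/* ... existing fields (rmap_list, anon_vma, mm, address, ...) ... */
	pgoff_t vm_pgoff;	/* vma->vm_pgoff, recorded while scanning */
};

rmap_walk_ksm() then passes rmap_item->vm_pgoff as both pgoff_start and
pgoff_end, as in patch 4/5.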


Changes in v4:
 - Add a tracepoint for rmap_walk
 - Provide a testbench for rmap_walk
 - Add vm_pgoff field in ksm_rmap_item
 - Use vm_pgoff instead of address >> PAGE_SHIFT (Suggested by David and Lorenzo)

Changes in v3:
- Fix some typos in commit description
- Replace "pgoff_start" and 'pgoff_end' by 'pgoff'.

Changes in v2:
 - Use const variables to initialize 'addr', 'pgoff_start' and 'pgoff_end'
- Let pgoff_end = pgoff_start, since KSM folios are always order-0 (Suggested by David)

xu xin (5):
  mm/rmap: add tracepoint for rmap_walk
  tools/testing: add rmap walk latency benchmark for KSM, anonymous and
    file pages
  ksm: add vm_pgoff into ksm_rmap_item
  ksm: Optimize rmap_walk_ksm by passing a suitable address range
  ksm: add mremap selftests for ksm_rmap_walk

 MAINTAINERS                          |   3 +
 include/trace/events/rmap.h          |  49 +++
 mm/ksm.c                             |  48 ++-
 mm/rmap.c                            |  14 +
 tools/testing/rmap/Makefile          |  11 +
 tools/testing/rmap/rmap_benchmark.c  | 488 +++++++++++++++++++++++++++
 tools/testing/selftests/mm/rmap.c    |  79 +++++
 tools/testing/selftests/mm/vm_util.c |  38 +++
 tools/testing/selftests/mm/vm_util.h |   2 +
 9 files changed, 724 insertions(+), 8 deletions(-)
 create mode 100644 include/trace/events/rmap.h
 create mode 100644 tools/testing/rmap/Makefile
 create mode 100644 tools/testing/rmap/rmap_benchmark.c

-- 
2.25.1



Thread overview: 5+ messages
2026-05-15  7:13 [PATCH v4 4/5] ksm: Optimize rmap_walk_ksm by passing a suitable address range xu.xin16
2026-05-15 12:28 ` Lorenzo Stoakes
  -- strict thread matches above, loose matches on Subject: below --
2026-05-03 12:35 [PATCH v4 0/5] KSM: Optimizations for rmap_walk_ksm xu.xin16
2026-05-03 12:50 ` [PATCH v4 4/5] ksm: Optimize rmap_walk_ksm by passing a suitable address range xu.xin16
2026-05-13 12:10   ` David Hildenbrand (Arm)
2026-05-15  7:15     ` xu.xin16
