Linux-mm Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: "David Hildenbrand (Arm)" <david@kernel.org>
To: xu.xin16@zte.com.cn, akpm@linux-foundation.org, ljs@kernel.org,
	hughd@google.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, michel@lespinasse.org
Subject: Re: [PATCH v4 3/5] ksm: add vm_pgoff into ksm_rmap_item
Date: Wed, 13 May 2026 13:59:21 +0200	[thread overview]
Message-ID: <0e2a90d7-8b0b-4d04-9c34-9a82294fee9f@kernel.org> (raw)
In-Reply-To: <20260503204843889ik1YHe8LX_5N0Neyn0ner@zte.com.cn>

On 5/3/26 14:48, xu.xin16@zte.com.cn wrote:
> From: xu xin <xu.xin16@zte.com.cn>
> The reason for adding vm_pgoff to ksm_rmap_item has been discussed in previous
> mailing list threads [1][2]. The main purpose is to allow the KSM reverse mapping
> to obtain the original VMA's vm_pgoff, so that during anon_vma_tree travering, it
> can conditionally locate the VMAs and avoid scanning the entire address space
> [0, ULONG_MAX].
> 
> To minimize the size impact of adding vm_pgoff to ksm_rmap_item as much as
> possible, a trick that David suggested is to use a UNION that groups the members
> related to the unstable tree together with the newly added vm_pgoff. The members
> that valids only when in unstable tree include oldchecksum and age information.
> However, the function should_skip_rmap_item() in the smart scanning needs slight
> modification, since this function still uses the age information even when the
> rmap_item is in a stable state (the page is not KSM), a situation that occurs
> during COW faults. After using union, the size is still 64 byte without increasing.
> 
> The setting and resetting of rmap_item->vm_pgoff are similar to rmap_item->anon_vma.
> 
> [1] https://lore.kernel.org/all/adTPQSb-qSSHviJN@lucifer/
> [2] https://lore.kernel.org/all/202604091806051535BJWZ_FTtdIm3Snk24ei_@zte.com.cn/
> 
> Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
> Signed-off-by: xu xin <xu.xin16@zte.com.cn>
> ---
>  mm/ksm.c | 41 ++++++++++++++++++++++++++++++++++-------
>  1 file changed, 34 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 7d5b76478f0b..0299a53ba7c9 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -195,22 +195,28 @@ struct ksm_stable_node {
>   * @node: rb node of this rmap_item in the unstable tree
>   * @head: pointer to stable_node heading this list in the stable tree
>   * @hlist: link into hlist of rmap_items hanging off that stable_node
> - * @age: number of scan iterations since creation
> - * @remaining_skips: how many scans to skip
> + * @age: number of scan iterations since creation (unstable node)
> + * @remaining_skips: how many scans to skip (unstable node)
> + * @vm_pgoff: vm_pgoff into the original VMA where the page is mapped (stable node)
>   */
>  struct ksm_rmap_item {
>  	struct ksm_rmap_item *rmap_list;
>  	union {
> -		struct anon_vma *anon_vma;	/* when stable */
> +		struct anon_vma *anon_vma;	/* for reverse mapping, when stable */
>  #ifdef CONFIG_NUMA
>  		int nid;		/* when node of unstable tree */
>  #endif
>  	};
>  	struct mm_struct *mm;
>  	unsigned long address;		/* + low bits used for flags below */
> -	unsigned int oldchecksum;	/* when unstable */
> -	rmap_age_t age;
> -	rmap_age_t remaining_skips;
> +	union {
> +		struct {
> +			unsigned int oldchecksum;
> +			rmap_age_t age;
> +			rmap_age_t remaining_skips;
> +		};			/* when unstable */
> +		unsigned long vm_pgoff;		/* for reverse mapping, when stable */
> +	};
>  	union {
>  		struct rb_node node;	/* when node of unstable tree */
>  		struct {		/* when listed from stable tree */
> @@ -776,6 +782,10 @@ static struct vm_area_struct *find_mergeable_vma(struct mm_struct *mm,
>  	return vma;
>  }
> 
> +/*
> + * break_cow: actively break the write-protect of the VMA. This is calld when

s/called/called/

But why are we changing the documentation as part of this patch?

> + * rmap_item has not yet become stable, but page has been merged.
> + */
>  static void break_cow(struct ksm_rmap_item *rmap_item)
>  {
>  	struct mm_struct *mm = rmap_item->mm;
> @@ -787,6 +797,8 @@ static void break_cow(struct ksm_rmap_item *rmap_item)
>  	 * to undo, we also need to drop a reference to the anon_vma.
>  	 */
>  	put_anon_vma(rmap_item->anon_vma);
> +	/* Reset pgoff that overlays age-related information. (still unstable) */
> +	rmap_item->vm_pgoff = 0;
> 
>  	mmap_read_lock(mm);
>  	vma = find_mergeable_vma(mm, addr);
> @@ -899,6 +911,8 @@ static void remove_node_from_stable_tree(struct ksm_stable_node *stable_node)
>  		VM_BUG_ON(stable_node->rmap_hlist_len <= 0);
>  		stable_node->rmap_hlist_len--;
>  		put_anon_vma(rmap_item->anon_vma);
> +		/* Reset pgoff that overlays age-related information. */

AI review says that this might not be the case on 32bit. So I guess the comment
should be

"might overlay"

> +		rmap_item->vm_pgoff = 0;
>  		rmap_item->address &= PAGE_MASK;
>  		cond_resched();
>  	}
> @@ -1052,6 +1066,8 @@ static void remove_rmap_item_from_tree(struct ksm_rmap_item *rmap_item)
>  		stable_node->rmap_hlist_len--;
> 
>  		put_anon_vma(rmap_item->anon_vma);
> +		/* Reset pgoff that overlays age-related information. */

Same here.

> +		rmap_item->vm_pgoff = 0;
>  		rmap_item->head = NULL;
>  		rmap_item->address &= PAGE_MASK;
> 
> @@ -1598,8 +1614,15 @@ static int try_to_merge_with_ksm_page(struct ksm_rmap_item *rmap_item,
>  	/* Unstable nid is in union with stable anon_vma: remove first */
>  	remove_rmap_item_from_tree(rmap_item);
> 
> -	/* Must get reference to anon_vma while still holding mmap_lock */
> +	/*
> +	 * Must get reference to anon_vma while still holding mmap_lock,
> +	 * We set these two members of stable node here instead of
> +	 * stable_tree_append(), maybe because we don't want to hold
> +	 * mmap_read_lock again? Here mmap_read_lock is already held to
> +	 * find_mergeable_vma before merging.
> +	 */
>  	rmap_item->anon_vma = vma->anon_vma;
> +	rmap_item->vm_pgoff = vma->vm_pgoff;

I suggested to use the actual linear page index instead at [1]. Storing the
vm_pgoff is wrong I think.

[1] https://lore.kernel.org/all/5401c1d2-5f42-4288-9dad-2b9768b579c7@kernel.org/


In another comment I said:

But it's all confusing. Because we might temporarily have rmap_item->anon_vma
set on an rmap_entry that does not yet have the STABLE_FLAG flag set through
stable_tree_append().

And then we magically call break_cow() which does a magical

	put_anon_vma(rmap_item->anon_vma);

(this doesn't look correct in once case) ... anyhow.

So we might want to reset the pgoff there as well, OR only store
the pgoff in stable_tree_append() where we actually set STABLE_FLAG.


So I guess we have to figure that out as well.

-- 
Cheers,

David


  reply	other threads:[~2026-05-13 11:59 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-03 12:35 [PATCH v4 0/5] KSM: Optimizations for rmap_walk_ksm xu.xin16
2026-05-03 12:39 ` [PATCH v4 1/5] mm/rmap: add tracepoint for rmap_walk xu.xin16
2026-05-13 12:03   ` David Hildenbrand (Arm)
2026-05-03 12:42 ` [PATCH v4 2/5] tools/testing: add rmap walk latency benchmark for KSM, anonymous and file pages xu.xin16
2026-05-03 12:48 ` [PATCH v4 3/5] ksm: add vm_pgoff into ksm_rmap_item xu.xin16
2026-05-13 11:59   ` David Hildenbrand (Arm) [this message]
2026-05-03 12:50 ` [PATCH v4 4/5] ksm: Optimize rmap_walk_ksm by passing a suitable address range xu.xin16
2026-05-13 12:10   ` David Hildenbrand (Arm)
2026-05-03 12:51 ` [PATCH v4 5/5] ksm: add mremap selftests for ksm_rmap_walk xu.xin16
2026-05-13 12:10   ` David Hildenbrand (Arm)
2026-05-03 14:59 ` [PATCH v4 0/5] KSM: Optimizations for rmap_walk_ksm Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0e2a90d7-8b0b-4d04-9c34-9a82294fee9f@kernel.org \
    --to=david@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ljs@kernel.org \
    --cc=michel@lespinasse.org \
    --cc=xu.xin16@zte.com.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox