linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/4] mm: ksm: prevent KSM from breaking merging of new VMAs
@ 2025-05-29 17:15 Lorenzo Stoakes
  2025-05-29 17:15 ` [PATCH v3 1/4] mm: ksm: have KSM VMA checks not require a VMA pointer Lorenzo Stoakes
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Lorenzo Stoakes @ 2025-05-29 17:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
	Vlastimil Babka, Jann Horn, Pedro Falcato, David Hildenbrand,
	Xu Xin, Chengming Zhou, linux-mm, linux-kernel, linux-fsdevel,
	Stefan Roesch

When KSM-by-default is established using prctl(PR_SET_MEMORY_MERGE), this
defaults all newly mapped VMAs to having VM_MERGEABLE set, and thus makes
them available to KSM for samepage merging. It also sets VM_MERGEABLE in
all existing VMAs.

However this causes an issue upon mapping of new VMAs - the initial flags
will never have VM_MERGEABLE set when attempting a merge with adjacent VMAs
(this is set later in the mmap() logic), and adjacent VMAs will ALWAYS have
VM_MERGEABLE set.

This renders all newly mapped VMAs unmergeable.

To avoid this, this series performs the check for PR_SET_MEMORY_MERGE far
earlier in the mmap() logic, prior to the merge being attempted.

However we run into complexity with the depreciated .mmap() callback - if a
driver hooks this, it might change flags which adjust KSM merge
eligibility.

We have to worry about this because, while KSM is only applicable to
private mappings, this includes both anonymous and MAP_PRIVATE-mapped
file-backed mappings.

This isn't a problem for brk(), where the VMA must be anonymous. However in
mmap() we must be conservative - if the VMA is anonymous then we can always
proceed, however if not, we permit only shmem mappings (whose .mmap hook
does not affect KSM eligibility) and drivers which implement
.mmap_prepare() (invoked prior to the KSM eligibility check).

If we can't be sure of the driver changing things, then we maintain the
same behaviour of performing the KSM check later in the mmap() logic (and
thus losing new VMA mergeability).

A great many use-cases for this logic will use anonymous mappings any rate,
so this change should already cover the majority of actual KSM use-cases.

v3:
* Propagated tags (thanks everyone!)
* Updated cover letter to be explicit that we're fixing merging for new
  VMAs only as per Vlastimil, and further cleaned it up for clarity.
* Corrected reference to shmem being permitted by KSM in 3/4 commit message
  as per Xu.
* Clarified reference to MAP_SHARED vs MAP_PRIVATE file-backed mappings in
  3/4 commit message as per Vlastimil.
* Updated comment around mmap_prepare check in can_set_ksm_flags_early() as
  per Xu.
* Reduced can_set_ksm_flags_early() comment to avoid confusion - the
  overloaded use of 'mergeable' is confusing (as raised by Vlastimil) and
  the first paragraph is sufficient.
* Rearranged comments and code ordering in can_set_ksm_flags_early() for
  clarity based on discussion with Vlastimil.
* Reduced and improved commit 3/4 commit message further for clarity.
* Added missing stubs to VMA userland tests.

v2:
* Removed unnecessary ret local variable in ksm_vma_flags() as per David.
* added Stefan Roesch in cc and added Fixes tag as per Andrew, David.
* Propagated tags (thanks everyone!)
* Removed unnecessary !CONFIG_KSM ksm_add_vma() stub from
  include/linux/ksm.h.
* Added missing !CONFIG_KSM ksm_vma_flags() stub in
  include/linux/ksm.h.
* After discussion with David, I've decided to defer removing the
  VM_SPECIAL case for KSM, we can address this in a follow-up series.
* Expanded 3/4 commit message to reference KSM eligibility vs. merging and
  referenced future plans to permit KSM for VM_SPECIAL VMAs.
https://lore.kernel.org/all/cover.1747844463.git.lorenzo.stoakes@oracle.com/

v1:
https://lore.kernel.org/all/cover.1747431920.git.lorenzo.stoakes@oracle.com/

Lorenzo Stoakes (4):
  mm: ksm: have KSM VMA checks not require a VMA pointer
  mm: ksm: refer to special VMAs via VM_SPECIAL in ksm_compatible()
  mm: prevent KSM from breaking VMA merging for new VMAs
  tools/testing/selftests: add VMA merge tests for KSM merge

 include/linux/fs.h                 |  7 ++-
 include/linux/ksm.h                |  8 +--
 mm/ksm.c                           | 49 ++++++++++++-------
 mm/vma.c                           | 44 ++++++++++++++++-
 tools/testing/selftests/mm/merge.c | 78 ++++++++++++++++++++++++++++++
 tools/testing/vma/vma_internal.h   | 11 +++++
 6 files changed, 173 insertions(+), 24 deletions(-)

--
2.49.0

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3 1/4] mm: ksm: have KSM VMA checks not require a VMA pointer
  2025-05-29 17:15 [PATCH v3 0/4] mm: ksm: prevent KSM from breaking merging of new VMAs Lorenzo Stoakes
@ 2025-05-29 17:15 ` Lorenzo Stoakes
  2025-05-29 17:15 ` [PATCH v3 2/4] mm: ksm: refer to special VMAs via VM_SPECIAL in ksm_compatible() Lorenzo Stoakes
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Lorenzo Stoakes @ 2025-05-29 17:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
	Vlastimil Babka, Jann Horn, Pedro Falcato, David Hildenbrand,
	Xu Xin, Chengming Zhou, linux-mm, linux-kernel, linux-fsdevel,
	Stefan Roesch

In subsequent commits we are going to determine KSM eligibility prior to a
VMA being constructed, at which point we will of course not yet have access
to a VMA pointer.

It is trivial to boil down the check logic to be parameterised on
mm_struct, file and VMA flags, so do so.

As a part of this change, additionally expose and use file_is_dax() to
determine whether a file is being mapped under a DAX inode.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Xu Xin <xu.xin16@zte.com.cn>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 include/linux/fs.h |  7 ++++++-
 mm/ksm.c           | 32 ++++++++++++++++++++------------
 2 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 09c8495dacdb..e1397e2b55ea 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3691,9 +3691,14 @@ void setattr_copy(struct mnt_idmap *, struct inode *inode,
 
 extern int file_update_time(struct file *file);
 
+static inline bool file_is_dax(const struct file *file)
+{
+	return file && IS_DAX(file->f_mapping->host);
+}
+
 static inline bool vma_is_dax(const struct vm_area_struct *vma)
 {
-	return vma->vm_file && IS_DAX(vma->vm_file->f_mapping->host);
+	return file_is_dax(vma->vm_file);
 }
 
 static inline bool vma_is_fsdax(struct vm_area_struct *vma)
diff --git a/mm/ksm.c b/mm/ksm.c
index 8583fb91ef13..08d486f188ff 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -677,28 +677,33 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr, bool lock_v
 	return (ret & VM_FAULT_OOM) ? -ENOMEM : 0;
 }
 
-static bool vma_ksm_compatible(struct vm_area_struct *vma)
+static bool ksm_compatible(const struct file *file, vm_flags_t vm_flags)
 {
-	if (vma->vm_flags & (VM_SHARED  | VM_MAYSHARE   | VM_PFNMAP  |
-			     VM_IO      | VM_DONTEXPAND | VM_HUGETLB |
-			     VM_MIXEDMAP| VM_DROPPABLE))
+	if (vm_flags & (VM_SHARED   | VM_MAYSHARE   | VM_PFNMAP  |
+			VM_IO       | VM_DONTEXPAND | VM_HUGETLB |
+			VM_MIXEDMAP | VM_DROPPABLE))
 		return false;		/* just ignore the advice */
 
-	if (vma_is_dax(vma))
+	if (file_is_dax(file))
 		return false;
 
 #ifdef VM_SAO
-	if (vma->vm_flags & VM_SAO)
+	if (vm_flags & VM_SAO)
 		return false;
 #endif
 #ifdef VM_SPARC_ADI
-	if (vma->vm_flags & VM_SPARC_ADI)
+	if (vm_flags & VM_SPARC_ADI)
 		return false;
 #endif
 
 	return true;
 }
 
+static bool vma_ksm_compatible(struct vm_area_struct *vma)
+{
+	return ksm_compatible(vma->vm_file, vma->vm_flags);
+}
+
 static struct vm_area_struct *find_mergeable_vma(struct mm_struct *mm,
 		unsigned long addr)
 {
@@ -2696,14 +2701,17 @@ static int ksm_scan_thread(void *nothing)
 	return 0;
 }
 
-static void __ksm_add_vma(struct vm_area_struct *vma)
+static bool __ksm_should_add_vma(const struct file *file, vm_flags_t vm_flags)
 {
-	unsigned long vm_flags = vma->vm_flags;
-
 	if (vm_flags & VM_MERGEABLE)
-		return;
+		return false;
+
+	return ksm_compatible(file, vm_flags);
+}
 
-	if (vma_ksm_compatible(vma))
+static void __ksm_add_vma(struct vm_area_struct *vma)
+{
+	if (__ksm_should_add_vma(vma->vm_file, vma->vm_flags))
 		vm_flags_set(vma, VM_MERGEABLE);
 }
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 2/4] mm: ksm: refer to special VMAs via VM_SPECIAL in ksm_compatible()
  2025-05-29 17:15 [PATCH v3 0/4] mm: ksm: prevent KSM from breaking merging of new VMAs Lorenzo Stoakes
  2025-05-29 17:15 ` [PATCH v3 1/4] mm: ksm: have KSM VMA checks not require a VMA pointer Lorenzo Stoakes
@ 2025-05-29 17:15 ` Lorenzo Stoakes
  2025-05-29 17:15 ` [PATCH v3 3/4] mm: prevent KSM from breaking VMA merging for new VMAs Lorenzo Stoakes
  2025-05-29 17:15 ` [PATCH v3 4/4] tools/testing/selftests: add VMA merge tests for KSM merge Lorenzo Stoakes
  3 siblings, 0 replies; 13+ messages in thread
From: Lorenzo Stoakes @ 2025-05-29 17:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
	Vlastimil Babka, Jann Horn, Pedro Falcato, David Hildenbrand,
	Xu Xin, Chengming Zhou, linux-mm, linux-kernel, linux-fsdevel,
	Stefan Roesch

There's no need to spell out all the special cases, also doing it this way
makes it absolutely clear that we preclude unmergeable VMAs in general, and
puts the other excluded flags in stark and clear contrast.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Xu Xin <xu.xin16@zte.com.cn>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 mm/ksm.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 08d486f188ff..d0c763abd499 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -679,9 +679,8 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr, bool lock_v
 
 static bool ksm_compatible(const struct file *file, vm_flags_t vm_flags)
 {
-	if (vm_flags & (VM_SHARED   | VM_MAYSHARE   | VM_PFNMAP  |
-			VM_IO       | VM_DONTEXPAND | VM_HUGETLB |
-			VM_MIXEDMAP | VM_DROPPABLE))
+	if (vm_flags & (VM_SHARED  | VM_MAYSHARE | VM_SPECIAL |
+			VM_HUGETLB | VM_DROPPABLE))
 		return false;		/* just ignore the advice */
 
 	if (file_is_dax(file))
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 3/4] mm: prevent KSM from breaking VMA merging for new VMAs
  2025-05-29 17:15 [PATCH v3 0/4] mm: ksm: prevent KSM from breaking merging of new VMAs Lorenzo Stoakes
  2025-05-29 17:15 ` [PATCH v3 1/4] mm: ksm: have KSM VMA checks not require a VMA pointer Lorenzo Stoakes
  2025-05-29 17:15 ` [PATCH v3 2/4] mm: ksm: refer to special VMAs via VM_SPECIAL in ksm_compatible() Lorenzo Stoakes
@ 2025-05-29 17:15 ` Lorenzo Stoakes
  2025-05-30  7:15   ` Vlastimil Babka
                     ` (2 more replies)
  2025-05-29 17:15 ` [PATCH v3 4/4] tools/testing/selftests: add VMA merge tests for KSM merge Lorenzo Stoakes
  3 siblings, 3 replies; 13+ messages in thread
From: Lorenzo Stoakes @ 2025-05-29 17:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
	Vlastimil Babka, Jann Horn, Pedro Falcato, David Hildenbrand,
	Xu Xin, Chengming Zhou, linux-mm, linux-kernel, linux-fsdevel,
	Stefan Roesch

If a user wishes to enable KSM mergeability for an entire process and all
fork/exec'd processes that come after it, they use the prctl()
PR_SET_MEMORY_MERGE operation.

This defaults all newly mapped VMAs to have the VM_MERGEABLE VMA flag set
(in order to indicate they are KSM mergeable), as well as setting this flag
for all existing VMAs and propagating this across fork/exec.

However it also breaks VMA merging for new VMAs, both in the process and
all forked (and fork/exec'd) child processes.

This is because when a new mapping is proposed, the flags specified will
never have VM_MERGEABLE set. However all adjacent VMAs will already have
VM_MERGEABLE set, rendering VMAs unmergeable by default.

To work around this, we try to set the VM_MERGEABLE flag prior to
attempting a merge. In the case of brk() this can always be done.

However on mmap() things are more complicated - while KSM is not supported
for MAP_SHARED file-backed mappings, it is supported for MAP_PRIVATE
file-backed mappings.

These mappings may have deprecated .mmap() callbacks specified which could,
in theory, adjust flags and thus KSM eligibility.

So we check to determine whether this is possible. If not, we set
VM_MERGEABLE prior to the merge attempt on mmap(), otherwise we retain the
previous behaviour.

This fixes VMA merging for all new anonymous mappings, which covers the
majority of real-world cases, so we should see a significant improvement in
VMA mergeability.

For MAP_PRIVATE file-backed mappings, those which implement the
.mmap_prepare() hook and shmem are both known to be safe, so we allow
these, disallowing all other cases.

Also add stubs for newly introduced function invocations to VMA userland
testing.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Fixes: d7597f59d1d3 ("mm: add new api to enable ksm per process") # please no backport!
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 include/linux/ksm.h              |  8 +++---
 mm/ksm.c                         | 18 ++++++++-----
 mm/vma.c                         | 44 ++++++++++++++++++++++++++++++--
 tools/testing/vma/vma_internal.h | 11 ++++++++
 4 files changed, 70 insertions(+), 11 deletions(-)

diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index d73095b5cd96..51787f0b0208 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -17,8 +17,8 @@
 #ifdef CONFIG_KSM
 int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
 		unsigned long end, int advice, unsigned long *vm_flags);
-
-void ksm_add_vma(struct vm_area_struct *vma);
+vm_flags_t ksm_vma_flags(const struct mm_struct *mm, const struct file *file,
+			 vm_flags_t vm_flags);
 int ksm_enable_merge_any(struct mm_struct *mm);
 int ksm_disable_merge_any(struct mm_struct *mm);
 int ksm_disable(struct mm_struct *mm);
@@ -97,8 +97,10 @@ bool ksm_process_mergeable(struct mm_struct *mm);
 
 #else  /* !CONFIG_KSM */
 
-static inline void ksm_add_vma(struct vm_area_struct *vma)
+static inline vm_flags_t ksm_vma_flags(const struct mm_struct *mm,
+		const struct file *file, vm_flags_t vm_flags)
 {
+	return vm_flags;
 }
 
 static inline int ksm_disable(struct mm_struct *mm)
diff --git a/mm/ksm.c b/mm/ksm.c
index d0c763abd499..18b3690bb69a 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2731,16 +2731,22 @@ static int __ksm_del_vma(struct vm_area_struct *vma)
 	return 0;
 }
 /**
- * ksm_add_vma - Mark vma as mergeable if compatible
+ * ksm_vma_flags - Update VMA flags to mark as mergeable if compatible
  *
- * @vma:  Pointer to vma
+ * @mm:       Proposed VMA's mm_struct
+ * @file:     Proposed VMA's file-backed mapping, if any.
+ * @vm_flags: Proposed VMA"s flags.
+ *
+ * Returns: @vm_flags possibly updated to mark mergeable.
  */
-void ksm_add_vma(struct vm_area_struct *vma)
+vm_flags_t ksm_vma_flags(const struct mm_struct *mm, const struct file *file,
+			 vm_flags_t vm_flags)
 {
-	struct mm_struct *mm = vma->vm_mm;
+	if (test_bit(MMF_VM_MERGE_ANY, &mm->flags) &&
+	    __ksm_should_add_vma(file, vm_flags))
+		vm_flags |= VM_MERGEABLE;
 
-	if (test_bit(MMF_VM_MERGE_ANY, &mm->flags))
-		__ksm_add_vma(vma);
+	return vm_flags;
 }
 
 static void ksm_add_vmas(struct mm_struct *mm)
diff --git a/mm/vma.c b/mm/vma.c
index 7ebc9eb608f4..3e351beb82ca 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -2490,7 +2490,6 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
 	 */
 	if (!vma_is_anonymous(vma))
 		khugepaged_enter_vma(vma, map->flags);
-	ksm_add_vma(vma);
 	*vmap = vma;
 	return 0;
 
@@ -2593,6 +2592,40 @@ static void set_vma_user_defined_fields(struct vm_area_struct *vma,
 	vma->vm_private_data = map->vm_private_data;
 }
 
+static void update_ksm_flags(struct mmap_state *map)
+{
+	map->flags = ksm_vma_flags(map->mm, map->file, map->flags);
+}
+
+/*
+ * Are we guaranteed no driver can change state such as to preclude KSM merging?
+ * If so, let's set the KSM mergeable flag early so we don't break VMA merging.
+ */
+static bool can_set_ksm_flags_early(struct mmap_state *map)
+{
+	struct file *file = map->file;
+
+	/* Anonymous mappings have no driver which can change them. */
+	if (!file)
+		return true;
+
+	/*
+	 * If .mmap_prepare() is specified, then the driver will have already
+	 * manipulated state prior to updating KSM flags. So no need to worry
+	 * about mmap callbacks modifying VMA flags after the KSM flag has been
+	 * updated here, which could otherwise affect KSM eligibility.
+	 */
+	if (file->f_op->mmap_prepare)
+		return true;
+
+	/* shmem is safe. */
+	if (shmem_file(file))
+		return true;
+
+	/* Any other .mmap callback is not safe. */
+	return false;
+}
+
 static unsigned long __mmap_region(struct file *file, unsigned long addr,
 		unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
 		struct list_head *uf)
@@ -2603,6 +2636,7 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
 	bool have_mmap_prepare = file && file->f_op->mmap_prepare;
 	VMA_ITERATOR(vmi, mm, addr);
 	MMAP_STATE(map, mm, &vmi, addr, len, pgoff, vm_flags, file);
+	bool check_ksm_early = can_set_ksm_flags_early(&map);
 
 	error = __mmap_prepare(&map, uf);
 	if (!error && have_mmap_prepare)
@@ -2610,6 +2644,9 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
 	if (error)
 		goto abort_munmap;
 
+	if (check_ksm_early)
+		update_ksm_flags(&map);
+
 	/* Attempt to merge with adjacent VMAs... */
 	if (map.prev || map.next) {
 		VMG_MMAP_STATE(vmg, &map, /* vma = */ NULL);
@@ -2619,6 +2656,9 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
 
 	/* ...but if we can't, allocate a new VMA. */
 	if (!vma) {
+		if (!check_ksm_early)
+			update_ksm_flags(&map);
+
 		error = __mmap_new_vma(&map, &vma);
 		if (error)
 			goto unacct_error;
@@ -2721,6 +2761,7 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	 * Note: This happens *after* clearing old mappings in some code paths.
 	 */
 	flags |= VM_DATA_DEFAULT_FLAGS | VM_ACCOUNT | mm->def_flags;
+	flags = ksm_vma_flags(mm, NULL, flags);
 	if (!may_expand_vm(mm, flags, len >> PAGE_SHIFT))
 		return -ENOMEM;
 
@@ -2764,7 +2805,6 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
 
 	mm->map_count++;
 	validate_mm(mm);
-	ksm_add_vma(vma);
 out:
 	perf_event_mmap(vma);
 	mm->total_vm += len >> PAGE_SHIFT;
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index 4505b1c31be1..77b2949d874a 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -1468,4 +1468,15 @@ static inline void fixup_hugetlb_reservations(struct vm_area_struct *vma)
 	(void)vma;
 }
 
+static inline bool shmem_file(struct file *)
+{
+	return false;
+}
+
+static inline vm_flags_t ksm_vma_flags(const struct mm_struct *, const struct file *,
+			 vm_flags_t vm_flags)
+{
+	return vm_flags;
+}
+
 #endif	/* __MM_VMA_INTERNAL_H */
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 4/4] tools/testing/selftests: add VMA merge tests for KSM merge
  2025-05-29 17:15 [PATCH v3 0/4] mm: ksm: prevent KSM from breaking merging of new VMAs Lorenzo Stoakes
                   ` (2 preceding siblings ...)
  2025-05-29 17:15 ` [PATCH v3 3/4] mm: prevent KSM from breaking VMA merging for new VMAs Lorenzo Stoakes
@ 2025-05-29 17:15 ` Lorenzo Stoakes
  2025-05-30  7:17   ` Vlastimil Babka
  3 siblings, 1 reply; 13+ messages in thread
From: Lorenzo Stoakes @ 2025-05-29 17:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
	Vlastimil Babka, Jann Horn, Pedro Falcato, David Hildenbrand,
	Xu Xin, Chengming Zhou, linux-mm, linux-kernel, linux-fsdevel,
	Stefan Roesch

Add test to assert that we have now allowed merging of VMAs when KSM
merging-by-default has been set by prctl(PR_SET_MEMORY_MERGE, ...).

We simply perform a trivial mapping of adjacent VMAs expecting a merge,
however prior to recent changes implementing this mode earlier than before,
these merges would not have succeeded.

Assert that we have fixed this!

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Tested-by: Chengming Zhou <chengming.zhou@linux.dev>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 tools/testing/selftests/mm/merge.c | 78 ++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/tools/testing/selftests/mm/merge.c b/tools/testing/selftests/mm/merge.c
index c76646cdf6e6..2380a5a6a529 100644
--- a/tools/testing/selftests/mm/merge.c
+++ b/tools/testing/selftests/mm/merge.c
@@ -2,10 +2,12 @@
 
 #define _GNU_SOURCE
 #include "../kselftest_harness.h"
+#include <linux/prctl.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
 #include <sys/mman.h>
+#include <sys/prctl.h>
 #include <sys/wait.h>
 #include "vm_util.h"
 
@@ -31,6 +33,11 @@ FIXTURE_TEARDOWN(merge)
 {
 	ASSERT_EQ(munmap(self->carveout, 12 * self->page_size), 0);
 	ASSERT_EQ(close_procmap(&self->procmap), 0);
+	/*
+	 * Clear unconditionally, as some tests set this. It is no issue if this
+	 * fails (KSM may be disabled for instance).
+	 */
+	prctl(PR_SET_MEMORY_MERGE, 0, 0, 0, 0);
 }
 
 TEST_F(merge, mprotect_unfaulted_left)
@@ -452,4 +459,75 @@ TEST_F(merge, forked_source_vma)
 	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr2 + 5 * page_size);
 }
 
+TEST_F(merge, ksm_merge)
+{
+	unsigned int page_size = self->page_size;
+	char *carveout = self->carveout;
+	struct procmap_fd *procmap = &self->procmap;
+	char *ptr, *ptr2;
+	int err;
+
+	/*
+	 * Map two R/W immediately adjacent to one another, they should
+	 * trivially merge:
+	 *
+	 * |-----------|-----------|
+	 * |    R/W    |    R/W    |
+	 * |-----------|-----------|
+	 *      ptr         ptr2
+	 */
+
+	ptr = mmap(&carveout[page_size], page_size, PROT_READ | PROT_WRITE,
+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED, -1, 0);
+	ASSERT_NE(ptr, MAP_FAILED);
+	ptr2 = mmap(&carveout[2 * page_size], page_size,
+		    PROT_READ | PROT_WRITE,
+		    MAP_ANON | MAP_PRIVATE | MAP_FIXED, -1, 0);
+	ASSERT_NE(ptr2, MAP_FAILED);
+	ASSERT_TRUE(find_vma_procmap(procmap, ptr));
+	ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
+	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 2 * page_size);
+
+	/* Unmap the second half of this merged VMA. */
+	ASSERT_EQ(munmap(ptr2, page_size), 0);
+
+	/* OK, now enable global KSM merge. We clear this on test teardown. */
+	err = prctl(PR_SET_MEMORY_MERGE, 1, 0, 0, 0);
+	if (err == -1) {
+		int errnum = errno;
+
+		/* Only non-failure case... */
+		ASSERT_EQ(errnum, EINVAL);
+		/* ...but indicates we should skip. */
+		SKIP(return, "KSM memory merging not supported, skipping.");
+	}
+
+	/*
+	 * Now map a VMA adjacent to the existing that was just made
+	 * VM_MERGEABLE, this should merge as well.
+	 */
+	ptr2 = mmap(&carveout[2 * page_size], page_size,
+		    PROT_READ | PROT_WRITE,
+		    MAP_ANON | MAP_PRIVATE | MAP_FIXED, -1, 0);
+	ASSERT_NE(ptr2, MAP_FAILED);
+	ASSERT_TRUE(find_vma_procmap(procmap, ptr));
+	ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
+	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 2 * page_size);
+
+	/* Now this VMA altogether. */
+	ASSERT_EQ(munmap(ptr, 2 * page_size), 0);
+
+	/* Try the same operation as before, asserting this also merges fine. */
+	ptr = mmap(&carveout[page_size], page_size, PROT_READ | PROT_WRITE,
+		   MAP_ANON | MAP_PRIVATE | MAP_FIXED, -1, 0);
+	ASSERT_NE(ptr, MAP_FAILED);
+	ptr2 = mmap(&carveout[2 * page_size], page_size,
+		    PROT_READ | PROT_WRITE,
+		    MAP_ANON | MAP_PRIVATE | MAP_FIXED, -1, 0);
+	ASSERT_NE(ptr2, MAP_FAILED);
+	ASSERT_TRUE(find_vma_procmap(procmap, ptr));
+	ASSERT_EQ(procmap->query.vma_start, (unsigned long)ptr);
+	ASSERT_EQ(procmap->query.vma_end, (unsigned long)ptr + 2 * page_size);
+}
+
 TEST_HARNESS_MAIN
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/4] mm: prevent KSM from breaking VMA merging for new VMAs
  2025-05-29 17:15 ` [PATCH v3 3/4] mm: prevent KSM from breaking VMA merging for new VMAs Lorenzo Stoakes
@ 2025-05-30  7:15   ` Vlastimil Babka
  2025-06-02  7:00   ` xu.xin16
  2025-06-20 12:48   ` Lorenzo Stoakes
  2 siblings, 0 replies; 13+ messages in thread
From: Vlastimil Babka @ 2025-05-30  7:15 UTC (permalink / raw)
  To: Lorenzo Stoakes, Andrew Morton
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
	Jann Horn, Pedro Falcato, David Hildenbrand, Xu Xin,
	Chengming Zhou, linux-mm, linux-kernel, linux-fsdevel,
	Stefan Roesch

On 5/29/25 19:15, Lorenzo Stoakes wrote:
> If a user wishes to enable KSM mergeability for an entire process and all
> fork/exec'd processes that come after it, they use the prctl()
> PR_SET_MEMORY_MERGE operation.
> 
> This defaults all newly mapped VMAs to have the VM_MERGEABLE VMA flag set
> (in order to indicate they are KSM mergeable), as well as setting this flag
> for all existing VMAs and propagating this across fork/exec.
> 
> However it also breaks VMA merging for new VMAs, both in the process and
> all forked (and fork/exec'd) child processes.
> 
> This is because when a new mapping is proposed, the flags specified will
> never have VM_MERGEABLE set. However all adjacent VMAs will already have
> VM_MERGEABLE set, rendering VMAs unmergeable by default.
> 
> To work around this, we try to set the VM_MERGEABLE flag prior to
> attempting a merge. In the case of brk() this can always be done.
> 
> However on mmap() things are more complicated - while KSM is not supported
> for MAP_SHARED file-backed mappings, it is supported for MAP_PRIVATE
> file-backed mappings.
> 
> These mappings may have deprecated .mmap() callbacks specified which could,
> in theory, adjust flags and thus KSM eligibility.
> 
> So we check to determine whether this is possible. If not, we set
> VM_MERGEABLE prior to the merge attempt on mmap(), otherwise we retain the
> previous behaviour.
> 
> This fixes VMA merging for all new anonymous mappings, which covers the
> majority of real-world cases, so we should see a significant improvement in
> VMA mergeability.
> 
> For MAP_PRIVATE file-backed mappings, those which implement the
> .mmap_prepare() hook and shmem are both known to be safe, so we allow
> these, disallowing all other cases.
> 
> Also add stubs for newly introduced function invocations to VMA userland
> testing.
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Fixes: d7597f59d1d3 ("mm: add new api to enable ksm per process") # please no backport!
> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
> Acked-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>

The commit log is much clearer to me now, thanks :)

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 4/4] tools/testing/selftests: add VMA merge tests for KSM merge
  2025-05-29 17:15 ` [PATCH v3 4/4] tools/testing/selftests: add VMA merge tests for KSM merge Lorenzo Stoakes
@ 2025-05-30  7:17   ` Vlastimil Babka
  0 siblings, 0 replies; 13+ messages in thread
From: Vlastimil Babka @ 2025-05-30  7:17 UTC (permalink / raw)
  To: Lorenzo Stoakes, Andrew Morton
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
	Jann Horn, Pedro Falcato, David Hildenbrand, Xu Xin,
	Chengming Zhou, linux-mm, linux-kernel, linux-fsdevel,
	Stefan Roesch

On 5/29/25 19:15, Lorenzo Stoakes wrote:
> Add test to assert that we have now allowed merging of VMAs when KSM
> merging-by-default has been set by prctl(PR_SET_MEMORY_MERGE, ...).
> 
> We simply perform a trivial mapping of adjacent VMAs expecting a merge,
> however prior to recent changes implementing this mode earlier than before,
> these merges would not have succeeded.
> 
> Assert that we have fixed this!
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
> Tested-by: Chengming Zhou <chengming.zhou@linux.dev>
> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/4] mm: prevent KSM from breaking VMA merging for new VMAs
  2025-05-29 17:15 ` [PATCH v3 3/4] mm: prevent KSM from breaking VMA merging for new VMAs Lorenzo Stoakes
  2025-05-30  7:15   ` Vlastimil Babka
@ 2025-06-02  7:00   ` xu.xin16
  2025-06-20 12:48   ` Lorenzo Stoakes
  2 siblings, 0 replies; 13+ messages in thread
From: xu.xin16 @ 2025-06-02  7:00 UTC (permalink / raw)
  To: lorenzo.stoakes
  Cc: akpm, viro, brauner, jack, Liam.Howlett, vbabka, jannh, pfalcato,
	david, chengming.zhou, linux-mm, linux-kernel, linux-fsdevel, shr

> If a user wishes to enable KSM mergeability for an entire process and all
> fork/exec'd processes that come after it, they use the prctl()
> PR_SET_MEMORY_MERGE operation.
> 
> This defaults all newly mapped VMAs to have the VM_MERGEABLE VMA flag set
> (in order to indicate they are KSM mergeable), as well as setting this flag
> for all existing VMAs and propagating this across fork/exec.
> 
> However it also breaks VMA merging for new VMAs, both in the process and
> all forked (and fork/exec'd) child processes.
> 
> This is because when a new mapping is proposed, the flags specified will
> never have VM_MERGEABLE set. However all adjacent VMAs will already have
> VM_MERGEABLE set, rendering VMAs unmergeable by default.
> 
> To work around this, we try to set the VM_MERGEABLE flag prior to
> attempting a merge. In the case of brk() this can always be done.
> 
> However on mmap() things are more complicated - while KSM is not supported
> for MAP_SHARED file-backed mappings, it is supported for MAP_PRIVATE
> file-backed mappings.
> 
> These mappings may have deprecated .mmap() callbacks specified which could,
> in theory, adjust flags and thus KSM eligibility.
> 
> So we check to determine whether this is possible. If not, we set
> VM_MERGEABLE prior to the merge attempt on mmap(), otherwise we retain the
> previous behaviour.
> 
> This fixes VMA merging for all new anonymous mappings, which covers the
> majority of real-world cases, so we should see a significant improvement in
> VMA mergeability.
> 
> For MAP_PRIVATE file-backed mappings, those which implement the
> .mmap_prepare() hook and shmem are both known to be safe, so we allow
> these, disallowing all other cases.
> 
> Also add stubs for newly introduced function invocations to VMA userland
> testing.
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Fixes: d7597f59d1d3 ("mm: add new api to enable ksm per process") # please no backport!
> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
> Acked-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>


Thanks, everything looks clearer to me.
Reviewed-by: Xu Xin <xu.xin16@zte.com.cn>





> ---
>  include/linux/ksm.h              |  8 +++---
>  mm/ksm.c                         | 18 ++++++++-----
>  mm/vma.c                         | 44 ++++++++++++++++++++++++++++++--
>  tools/testing/vma/vma_internal.h | 11 ++++++++
>  4 files changed, 70 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/ksm.h b/include/linux/ksm.h
> index d73095b5cd96..51787f0b0208 100644
> --- a/include/linux/ksm.h
> +++ b/include/linux/ksm.h
> @@ -17,8 +17,8 @@
>  #ifdef CONFIG_KSM
>  int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
>  		unsigned long end, int advice, unsigned long *vm_flags);
> -
> -void ksm_add_vma(struct vm_area_struct *vma);
> +vm_flags_t ksm_vma_flags(const struct mm_struct *mm, const struct file *file,
> +			 vm_flags_t vm_flags);
>  int ksm_enable_merge_any(struct mm_struct *mm);
>  int ksm_disable_merge_any(struct mm_struct *mm);
>  int ksm_disable(struct mm_struct *mm);
> @@ -97,8 +97,10 @@ bool ksm_process_mergeable(struct mm_struct *mm);
>  
>  #else  /* !CONFIG_KSM */
>  
> -static inline void ksm_add_vma(struct vm_area_struct *vma)
> +static inline vm_flags_t ksm_vma_flags(const struct mm_struct *mm,
> +		const struct file *file, vm_flags_t vm_flags)
>  {
> +	return vm_flags;
>  }
>  
>  static inline int ksm_disable(struct mm_struct *mm)
> diff --git a/mm/ksm.c b/mm/ksm.c
> index d0c763abd499..18b3690bb69a 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -2731,16 +2731,22 @@ static int __ksm_del_vma(struct vm_area_struct *vma)
>  	return 0;
>  }
>  /**
> - * ksm_add_vma - Mark vma as mergeable if compatible
> + * ksm_vma_flags - Update VMA flags to mark as mergeable if compatible
>   *
> - * @vma:  Pointer to vma
> + * @mm:       Proposed VMA's mm_struct
> + * @file:     Proposed VMA's file-backed mapping, if any.
> + * @vm_flags: Proposed VMA"s flags.
> + *
> + * Returns: @vm_flags possibly updated to mark mergeable.
>   */
> -void ksm_add_vma(struct vm_area_struct *vma)
> +vm_flags_t ksm_vma_flags(const struct mm_struct *mm, const struct file *file,
> +			 vm_flags_t vm_flags)
>  {
> -	struct mm_struct *mm = vma->vm_mm;
> +	if (test_bit(MMF_VM_MERGE_ANY, &mm->flags) &&
> +	    __ksm_should_add_vma(file, vm_flags))
> +		vm_flags |= VM_MERGEABLE;
>  
> -	if (test_bit(MMF_VM_MERGE_ANY, &mm->flags))
> -		__ksm_add_vma(vma);
> +	return vm_flags;
>  }
>  
>  static void ksm_add_vmas(struct mm_struct *mm)
> diff --git a/mm/vma.c b/mm/vma.c
> index 7ebc9eb608f4..3e351beb82ca 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -2490,7 +2490,6 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
>  	 */
>  	if (!vma_is_anonymous(vma))
>  		khugepaged_enter_vma(vma, map->flags);
> -	ksm_add_vma(vma);
>  	*vmap = vma;
>  	return 0;
>  
> @@ -2593,6 +2592,40 @@ static void set_vma_user_defined_fields(struct vm_area_struct *vma,
>  	vma->vm_private_data = map->vm_private_data;
>  }
>  
> +static void update_ksm_flags(struct mmap_state *map)
> +{
> +	map->flags = ksm_vma_flags(map->mm, map->file, map->flags);
> +}
> +
> +/*
> + * Are we guaranteed no driver can change state such as to preclude KSM merging?
> + * If so, let's set the KSM mergeable flag early so we don't break VMA merging.
> + */
> +static bool can_set_ksm_flags_early(struct mmap_state *map)
> +{
> +	struct file *file = map->file;
> +
> +	/* Anonymous mappings have no driver which can change them. */
> +	if (!file)
> +		return true;
> +
> +	/*
> +	 * If .mmap_prepare() is specified, then the driver will have already
> +	 * manipulated state prior to updating KSM flags. So no need to worry
> +	 * about mmap callbacks modifying VMA flags after the KSM flag has been
> +	 * updated here, which could otherwise affect KSM eligibility.
> +	 */
> +	if (file->f_op->mmap_prepare)
> +		return true;
> +
> +	/* shmem is safe. */
> +	if (shmem_file(file))
> +		return true;
> +
> +	/* Any other .mmap callback is not safe. */
> +	return false;
> +}
> +
>  static unsigned long __mmap_region(struct file *file, unsigned long addr,
>  		unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
>  		struct list_head *uf)
> @@ -2603,6 +2636,7 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
>  	bool have_mmap_prepare = file && file->f_op->mmap_prepare;
>  	VMA_ITERATOR(vmi, mm, addr);
>  	MMAP_STATE(map, mm, &vmi, addr, len, pgoff, vm_flags, file);
> +	bool check_ksm_early = can_set_ksm_flags_early(&map);
>  
>  	error = __mmap_prepare(&map, uf);
>  	if (!error && have_mmap_prepare)
> @@ -2610,6 +2644,9 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
>  	if (error)
>  		goto abort_munmap;
>  
> +	if (check_ksm_early)
> +		update_ksm_flags(&map);
> +
>  	/* Attempt to merge with adjacent VMAs... */
>  	if (map.prev || map.next) {
>  		VMG_MMAP_STATE(vmg, &map, /* vma = */ NULL);
> @@ -2619,6 +2656,9 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
>  
>  	/* ...but if we can't, allocate a new VMA. */
>  	if (!vma) {
> +		if (!check_ksm_early)
> +			update_ksm_flags(&map);
> +
>  		error = __mmap_new_vma(&map, &vma);
>  		if (error)
>  			goto unacct_error;
> @@ -2721,6 +2761,7 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
>  	 * Note: This happens *after* clearing old mappings in some code paths.
>  	 */
>  	flags |= VM_DATA_DEFAULT_FLAGS | VM_ACCOUNT | mm->def_flags;
> +	flags = ksm_vma_flags(mm, NULL, flags);
>  	if (!may_expand_vm(mm, flags, len >> PAGE_SHIFT))
>  		return -ENOMEM;
>  
> @@ -2764,7 +2805,6 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
>  
>  	mm->map_count++;
>  	validate_mm(mm);
> -	ksm_add_vma(vma);
>  out:
>  	perf_event_mmap(vma);
>  	mm->total_vm += len >> PAGE_SHIFT;
> diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
> index 4505b1c31be1..77b2949d874a 100644
> --- a/tools/testing/vma/vma_internal.h
> +++ b/tools/testing/vma/vma_internal.h
> @@ -1468,4 +1468,15 @@ static inline void fixup_hugetlb_reservations(struct vm_area_struct *vma)
>  	(void)vma;
>  }
>  
> +static inline bool shmem_file(struct file *)
> +{
> +	return false;
> +}
> +
> +static inline vm_flags_t ksm_vma_flags(const struct mm_struct *, const struct file *,
> +			 vm_flags_t vm_flags)
> +{
> +	return vm_flags;
> +}
> +
>  #endif	/* __MM_VMA_INTERNAL_H */
> -- 
> 2.49.0

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/4] mm: prevent KSM from breaking VMA merging for new VMAs
  2025-05-29 17:15 ` [PATCH v3 3/4] mm: prevent KSM from breaking VMA merging for new VMAs Lorenzo Stoakes
  2025-05-30  7:15   ` Vlastimil Babka
  2025-06-02  7:00   ` xu.xin16
@ 2025-06-20 12:48   ` Lorenzo Stoakes
  2025-06-22 19:39     ` Andrew Morton
  2025-06-23  8:37     ` Vlastimil Babka
  2 siblings, 2 replies; 13+ messages in thread
From: Lorenzo Stoakes @ 2025-06-20 12:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
	Vlastimil Babka, Jann Horn, Pedro Falcato, David Hildenbrand,
	Xu Xin, Chengming Zhou, linux-mm, linux-kernel, linux-fsdevel,
	Stefan Roesch

Hi Andrew,

Sending a fix-patch for this commit due to a reported syzbot issue which
highlighted a bug in the implementation.

I discuss the syzbot report at [0].

[0]: https://lore.kernel.org/all/a55beb72-4288-4356-9642-76ab35a2a07c@lucifer.local/

There's a very minor conflict around the map->vm_flags vs. map->flags change,
easily resolvable, but if you need a respin let me know.

I ran through all mm self tests included the newly introduced one in 4/4 and all
good.

Thanks, Lorenzo

----8<----
From 4d9dde3013837595d733b5059c2d6474261654d6 Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Fri, 20 Jun 2025 13:21:03 +0100
Subject: [PATCH] mm/vma: correctly invoke late KSM check after mmap hook

Previously we erroneously checked whether KSM was applicable prior to
invoking the f_op->mmap() hook in the case of not being able to perform
this check early.

This is problematic, as filesystems such as hugetlb, which use anonymous
memory and might otherwise get KSM'd, set VM_HUGETLB in the f_op->mmap()
hook.

Correct this by checking at the appropriate time.

Reported-by: syzbot+a74a028d848147bc5931@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6853fc57.a00a0220.137b3.0009.GAE@google.com/
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 mm/vma.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/mm/vma.c b/mm/vma.c
index 4abed296d882..eccc4e0b4d32 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -32,6 +32,9 @@ struct mmap_state {
 	struct vma_munmap_struct vms;
 	struct ma_state mas_detach;
 	struct maple_tree mt_detach;
+
+	/* Determine if we can check KSM flags early in mmap() logic. */
+	bool check_ksm_early;
 };

 #define MMAP_STATE(name, mm_, vmi_, addr_, len_, pgoff_, vm_flags_, file_) \
@@ -2334,6 +2337,11 @@ static void vms_abort_munmap_vmas(struct vma_munmap_struct *vms,
 	vms_complete_munmap_vmas(vms, mas_detach);
 }

+static void update_ksm_flags(struct mmap_state *map)
+{
+	map->vm_flags = ksm_vma_flags(map->mm, map->file, map->vm_flags);
+}
+
 /*
  * __mmap_prepare() - Prepare to gather any overlapping VMAs that need to be
  * unmapped once the map operation is completed, check limits, account mapping
@@ -2438,6 +2446,7 @@ static int __mmap_new_file_vma(struct mmap_state *map,
 			!(map->vm_flags & VM_MAYWRITE) &&
 			(vma->vm_flags & VM_MAYWRITE));

+	map->file = vma->vm_file;
 	map->vm_flags = vma->vm_flags;

 	return 0;
@@ -2487,6 +2496,11 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
 	if (error)
 		goto free_iter_vma;

+	if (!map->check_ksm_early) {
+		update_ksm_flags(map);
+		vm_flags_init(vma, map->vm_flags);
+	}
+
 #ifdef CONFIG_SPARC64
 	/* TODO: Fix SPARC ADI! */
 	WARN_ON_ONCE(!arch_validate_flags(map->vm_flags));
@@ -2606,11 +2620,6 @@ static void set_vma_user_defined_fields(struct vm_area_struct *vma,
 	vma->vm_private_data = map->vm_private_data;
 }

-static void update_ksm_flags(struct mmap_state *map)
-{
-	map->vm_flags = ksm_vma_flags(map->mm, map->file, map->vm_flags);
-}
-
 /*
  * Are we guaranteed no driver can change state such as to preclude KSM merging?
  * If so, let's set the KSM mergeable flag early so we don't break VMA merging.
@@ -2650,7 +2659,8 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
 	bool have_mmap_prepare = file && file->f_op->mmap_prepare;
 	VMA_ITERATOR(vmi, mm, addr);
 	MMAP_STATE(map, mm, &vmi, addr, len, pgoff, vm_flags, file);
-	bool check_ksm_early = can_set_ksm_flags_early(&map);
+
+	map.check_ksm_early = can_set_ksm_flags_early(&map);

 	error = __mmap_prepare(&map, uf);
 	if (!error && have_mmap_prepare)
@@ -2658,7 +2668,7 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
 	if (error)
 		goto abort_munmap;

-	if (check_ksm_early)
+	if (map.check_ksm_early)
 		update_ksm_flags(&map);

 	/* Attempt to merge with adjacent VMAs... */
@@ -2670,9 +2680,6 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,

 	/* ...but if we can't, allocate a new VMA. */
 	if (!vma) {
-		if (!check_ksm_early)
-			update_ksm_flags(&map);
-
 		error = __mmap_new_vma(&map, &vma);
 		if (error)
 			goto unacct_error;
--
2.49.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/4] mm: prevent KSM from breaking VMA merging for new VMAs
  2025-06-20 12:48   ` Lorenzo Stoakes
@ 2025-06-22 19:39     ` Andrew Morton
  2025-06-23  9:16       ` Lorenzo Stoakes
  2025-06-23  8:37     ` Vlastimil Babka
  1 sibling, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2025-06-22 19:39 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
	Vlastimil Babka, Jann Horn, Pedro Falcato, David Hildenbrand,
	Xu Xin, Chengming Zhou, linux-mm, linux-kernel, linux-fsdevel,
	Stefan Roesch

On Fri, 20 Jun 2025 13:48:09 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:

> Hi Andrew,
> 
> Sending a fix-patch for this commit due to a reported syzbot issue which
> highlighted a bug in the implementation.
> 
> I discuss the syzbot report at [0].
> 
> [0]: https://lore.kernel.org/all/a55beb72-4288-4356-9642-76ab35a2a07c@lucifer.local/
> 
> There's a very minor conflict around the map->vm_flags vs. map->flags change,
> easily resolvable, but if you need a respin let me know.

I actually saw 4 conflicts, fixed various things up and...

> @@ -2487,6 +2496,11 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
>  	if (error)
>  		goto free_iter_vma;
> 
> +	if (!map->check_ksm_early) {
> +		update_ksm_flags(map);
> +		vm_flags_init(vma, map->vm_flags);
> +	}
> +

Guessing map->flags was intended here, I made that change then unmade
it in the later mm-update-core-kernel-code-to-use-vm_flags_t-consistently.patch.

I'll do a full rebuild at a couple of bisection points, please check
that all landed OK.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/4] mm: prevent KSM from breaking VMA merging for new VMAs
  2025-06-20 12:48   ` Lorenzo Stoakes
  2025-06-22 19:39     ` Andrew Morton
@ 2025-06-23  8:37     ` Vlastimil Babka
  2025-06-23  9:18       ` Lorenzo Stoakes
  1 sibling, 1 reply; 13+ messages in thread
From: Vlastimil Babka @ 2025-06-23  8:37 UTC (permalink / raw)
  To: Lorenzo Stoakes, Andrew Morton
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
	Jann Horn, Pedro Falcato, David Hildenbrand, Xu Xin,
	Chengming Zhou, linux-mm, linux-kernel, linux-fsdevel,
	Stefan Roesch

On 6/20/25 14:48, Lorenzo Stoakes wrote:
> Hi Andrew,
> 
> Sending a fix-patch for this commit due to a reported syzbot issue which
> highlighted a bug in the implementation.
> 
> I discuss the syzbot report at [0].
> 
> [0]: https://lore.kernel.org/all/a55beb72-4288-4356-9642-76ab35a2a07c@lucifer.local/
> 
> There's a very minor conflict around the map->vm_flags vs. map->flags change,
> easily resolvable, but if you need a respin let me know.
> 
> I ran through all mm self tests included the newly introduced one in 4/4 and all
> good.
> 
> Thanks, Lorenzo
> 
> ----8<----
> From 4d9dde3013837595d733b5059c2d6474261654d6 Mon Sep 17 00:00:00 2001
> From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Date: Fri, 20 Jun 2025 13:21:03 +0100
> Subject: [PATCH] mm/vma: correctly invoke late KSM check after mmap hook
> 
> Previously we erroneously checked whether KSM was applicable prior to
> invoking the f_op->mmap() hook in the case of not being able to perform
> this check early.
> 
> This is problematic, as filesystems such as hugetlb, which use anonymous
> memory and might otherwise get KSM'd, set VM_HUGETLB in the f_op->mmap()
> hook.
> 
> Correct this by checking at the appropriate time.
> 
> Reported-by: syzbot+a74a028d848147bc5931@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/6853fc57.a00a0220.137b3.0009.GAE@google.com/
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

I've checked the version in mm tree, LGTM.

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/4] mm: prevent KSM from breaking VMA merging for new VMAs
  2025-06-22 19:39     ` Andrew Morton
@ 2025-06-23  9:16       ` Lorenzo Stoakes
  0 siblings, 0 replies; 13+ messages in thread
From: Lorenzo Stoakes @ 2025-06-23  9:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Liam R . Howlett,
	Vlastimil Babka, Jann Horn, Pedro Falcato, David Hildenbrand,
	Xu Xin, Chengming Zhou, linux-mm, linux-kernel, linux-fsdevel,
	Stefan Roesch

On Sun, Jun 22, 2025 at 12:39:31PM -0700, Andrew Morton wrote:
> On Fri, 20 Jun 2025 13:48:09 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
>
> > Hi Andrew,
> >
> > Sending a fix-patch for this commit due to a reported syzbot issue which
> > highlighted a bug in the implementation.
> >
> > I discuss the syzbot report at [0].
> >
> > [0]: https://lore.kernel.org/all/a55beb72-4288-4356-9642-76ab35a2a07c@lucifer.local/
> >
> > There's a very minor conflict around the map->vm_flags vs. map->flags change,
> > easily resolvable, but if you need a respin let me know.
>
> I actually saw 4 conflicts, fixed various things up and...
>
> > @@ -2487,6 +2496,11 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
> >  	if (error)
> >  		goto free_iter_vma;
> >
> > +	if (!map->check_ksm_early) {
> > +		update_ksm_flags(map);
> > +		vm_flags_init(vma, map->vm_flags);
> > +	}
> > +
>
> Guessing map->flags was intended here, I made that change then unmade
> it in the later mm-update-core-kernel-code-to-use-vm_flags_t-consistently.patch.
>
> I'll do a full rebuild at a couple of bisection points, please check
> that all landed OK.
>

Thanks, appreciate it, apologies for the inconveniece! It all looks good to me
from my side.

Cheers, Lorenzo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/4] mm: prevent KSM from breaking VMA merging for new VMAs
  2025-06-23  8:37     ` Vlastimil Babka
@ 2025-06-23  9:18       ` Lorenzo Stoakes
  0 siblings, 0 replies; 13+ messages in thread
From: Lorenzo Stoakes @ 2025-06-23  9:18 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Alexander Viro, Christian Brauner, Jan Kara,
	Liam R . Howlett, Jann Horn, Pedro Falcato, David Hildenbrand,
	Xu Xin, Chengming Zhou, linux-mm, linux-kernel, linux-fsdevel,
	Stefan Roesch

On Mon, Jun 23, 2025 at 10:37:53AM +0200, Vlastimil Babka wrote:
> On 6/20/25 14:48, Lorenzo Stoakes wrote:
> > Hi Andrew,
> >
> > Sending a fix-patch for this commit due to a reported syzbot issue which
> > highlighted a bug in the implementation.
> >
> > I discuss the syzbot report at [0].
> >
> > [0]: https://lore.kernel.org/all/a55beb72-4288-4356-9642-76ab35a2a07c@lucifer.local/
> >
> > There's a very minor conflict around the map->vm_flags vs. map->flags change,
> > easily resolvable, but if you need a respin let me know.
> >
> > I ran through all mm self tests included the newly introduced one in 4/4 and all
> > good.
> >
> > Thanks, Lorenzo
> >
> > ----8<----
> > From 4d9dde3013837595d733b5059c2d6474261654d6 Mon Sep 17 00:00:00 2001
> > From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > Date: Fri, 20 Jun 2025 13:21:03 +0100
> > Subject: [PATCH] mm/vma: correctly invoke late KSM check after mmap hook
> >
> > Previously we erroneously checked whether KSM was applicable prior to
> > invoking the f_op->mmap() hook in the case of not being able to perform
> > this check early.
> >
> > This is problematic, as filesystems such as hugetlb, which use anonymous
> > memory and might otherwise get KSM'd, set VM_HUGETLB in the f_op->mmap()
> > hook.
> >
> > Correct this by checking at the appropriate time.
> >
> > Reported-by: syzbot+a74a028d848147bc5931@syzkaller.appspotmail.com
> > Closes: https://lore.kernel.org/all/6853fc57.a00a0220.137b3.0009.GAE@google.com/
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> I've checked the version in mm tree, LGTM.
>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>

Cheers, appreciated!

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-06-23  9:18 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-29 17:15 [PATCH v3 0/4] mm: ksm: prevent KSM from breaking merging of new VMAs Lorenzo Stoakes
2025-05-29 17:15 ` [PATCH v3 1/4] mm: ksm: have KSM VMA checks not require a VMA pointer Lorenzo Stoakes
2025-05-29 17:15 ` [PATCH v3 2/4] mm: ksm: refer to special VMAs via VM_SPECIAL in ksm_compatible() Lorenzo Stoakes
2025-05-29 17:15 ` [PATCH v3 3/4] mm: prevent KSM from breaking VMA merging for new VMAs Lorenzo Stoakes
2025-05-30  7:15   ` Vlastimil Babka
2025-06-02  7:00   ` xu.xin16
2025-06-20 12:48   ` Lorenzo Stoakes
2025-06-22 19:39     ` Andrew Morton
2025-06-23  9:16       ` Lorenzo Stoakes
2025-06-23  8:37     ` Vlastimil Babka
2025-06-23  9:18       ` Lorenzo Stoakes
2025-05-29 17:15 ` [PATCH v3 4/4] tools/testing/selftests: add VMA merge tests for KSM merge Lorenzo Stoakes
2025-05-30  7:17   ` Vlastimil Babka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).