linux-mm.kvack.org archive mirror
* [PATCH v2] mm: madvise: use per_vma lock for MADV_FREE
@ 2025-06-11 10:47 Barry Song
  2025-06-11 17:59 ` SeongJae Park
  0 siblings, 1 reply; 2+ messages in thread
From: Barry Song @ 2025-06-11 10:47 UTC
  To: akpm, linux-mm
  Cc: linux-kernel, Barry Song, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Jann Horn, Suren Baghdasaryan,
	Lokesh Gidra, Mike Rapoport, Michal Hocko, Tangquan Zheng,
	Qi Zheng

From: Barry Song <v-songbaohua@oppo.com>

MADV_FREE is another option, besides MADV_DONTNEED, for dynamic memory
freeing in user-space native or Java heap memory management. For example,
jemalloc can be configured to use MADV_FREE, and recent versions of the
Android Java heap have also increasingly adopted MADV_FREE. Supporting
per-VMA locking for MADV_FREE thus appears increasingly necessary.
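
For readers less familiar with the call, the following is a minimal
user-space sketch of how an allocator might issue MADV_FREE on an anonymous
mapping. It is not part of this patch. The helper name lazy_free_demo() and
the fallback to MADV_DONTNEED on EINVAL are assumptions made purely for
illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not from the patch): map len bytes of anonymous
 * memory, dirty them, then mark them lazily freeable with MADV_FREE.
 * Returns 0 on success and -1 on failure.
 */
static int lazy_free_demo(size_t len)
{
	char *buf;
	int ret;

	assert(len > 0);

	/* MADV_FREE only applies to private anonymous memory. */
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Touch the pages so there is something to free. */
	memset(buf, 0xaa, len);

	/* The kernel may reclaim these pages later, under memory pressure. */
	ret = madvise(buf, len, MADV_FREE);
	if (ret == -1 && errno == EINVAL) {
		/* Assumed fallback for kernels without MADV_FREE. */
		ret = madvise(buf, len, MADV_DONTNEED);
	}

	/* Writing to a page again cancels the pending free for that page. */
	buf[0] = 1;

	if (munmap(buf, len) == -1)
		return -1;
	return ret;
}
```

Calling lazy_free_demo(1 << 20) from a test program is expected to return 0
on a Linux system with anonymous mmap available. The sketch deliberately
does not check page contents after MADV_FREE, since they may be either the
old data or zeroes.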

We have replaced walk_page_range() with walk_page_range_vma(). Along with
the madvise_lock_mode proposed by Lorenzo, the necessary infrastructure is
now in place to begin exploring per-VMA locking support for MADV_FREE and
potentially other madvise operations that use walk_page_range_vma().

This patch adds support for the PGWALK_VMA_RDLOCK_VERIFY walk_lock mode in
walk_page_range_vma(), and uses madvise_lock_mode from madv_behavior to
select the appropriate walk_lock (either mmap_lock or per-VMA lock) based
on the context.

Because the walk_lock field of the walk ops is now chosen at run time, the
update must be thread-safe. The global constant madvise_free_walk_ops is
therefore replaced by a stack variable local to each call.
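
The pattern can be sketched in plain user-space C as follows. This is an
illustration only, not the kernel code: the enums, struct walk_ops, and the
functions pick_walk_lock(), visit() and walk_one() are hypothetical
stand-ins that loosely mirror the names used in the diff below.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space mirrors of the kernel enums in the patch. */
enum lock_mode { MODE_MMAP_READ, MODE_MMAP_WRITE, MODE_VMA_READ };
enum walk_lock {
	WALK_RDLOCK,
	WALK_WRLOCK,
	WALK_WRLOCK_VERIFY,
	WALK_VMA_RDLOCK_VERIFY,
};

struct walk_ops {
	int (*entry)(unsigned long addr);
	enum walk_lock walk_lock;
};

/* Map the lock mode held by the caller to what the walker should verify. */
static enum walk_lock pick_walk_lock(enum lock_mode mode)
{
	switch (mode) {
	case MODE_VMA_READ:
		return WALK_VMA_RDLOCK_VERIFY;
	case MODE_MMAP_READ:
		return WALK_RDLOCK;
	default:
		/* Unexpected mode: fall back to the read-lock check. */
		return WALK_RDLOCK;
	}
}

/* Stand-in for a per-entry callback; does nothing in this sketch. */
static int visit(unsigned long addr)
{
	(void)addr;
	return 0;
}

/*
 * Each call builds its own ops on the stack, so concurrent callers with
 * different lock modes never write to shared state.
 */
static enum walk_lock walk_one(enum lock_mode mode)
{
	struct walk_ops ops = {
		.entry = visit,
	};

	ops.walk_lock = pick_walk_lock(mode);
	assert(ops.entry != NULL);
	ops.entry(0);
	return ops.walk_lock;
}
```

Had ops been a shared global, two threads calling walk_one() with
different modes could overwrite each other's walk_lock between the
assignment and the walk. Keeping it on the stack avoids that without any
extra synchronization.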

Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Tangquan Zheng <zhengtangquan@oppo.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
 -v2:
  * Collect David's acked-by and Lorenzo's reviewed-by;
  * Refine the changelog and clean up the code as suggested by David and
    Lorenzo

 include/linux/pagewalk.h |  2 ++
 mm/madvise.c             | 25 +++++++++++++++++++------
 mm/pagewalk.c            |  5 ++++-
 3 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
index 9700a29f8afb..a4afa64ef0ab 100644
--- a/include/linux/pagewalk.h
+++ b/include/linux/pagewalk.h
@@ -14,6 +14,8 @@ enum page_walk_lock {
 	PGWALK_WRLOCK = 1,
 	/* vma is expected to be already write-locked during the walk */
 	PGWALK_WRLOCK_VERIFY = 2,
+	/* vma is expected to be already read-locked during the walk */
+	PGWALK_VMA_RDLOCK_VERIFY = 3,
 };
 
 /**
diff --git a/mm/madvise.c b/mm/madvise.c
index 790c238d04d4..267d8e4adf31 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -777,10 +777,19 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 	return 0;
 }
 
-static const struct mm_walk_ops madvise_free_walk_ops = {
-	.pmd_entry		= madvise_free_pte_range,
-	.walk_lock		= PGWALK_RDLOCK,
-};
+static inline enum page_walk_lock get_walk_lock(enum madvise_lock_mode mode)
+{
+	switch (mode) {
+	case MADVISE_VMA_READ_LOCK:
+		return PGWALK_VMA_RDLOCK_VERIFY;
+	case MADVISE_MMAP_READ_LOCK:
+		return PGWALK_RDLOCK;
+	default:
+		/* Other modes don't require fixing up the walk_lock */
+		WARN_ON_ONCE(1);
+		return PGWALK_RDLOCK;
+	}
+}
 
 static int madvise_free_single_vma(struct madvise_behavior *madv_behavior,
 			struct vm_area_struct *vma,
@@ -789,6 +798,9 @@ static int madvise_free_single_vma(struct madvise_behavior *madv_behavior,
 	struct mm_struct *mm = vma->vm_mm;
 	struct mmu_notifier_range range;
 	struct mmu_gather *tlb = madv_behavior->tlb;
+	struct mm_walk_ops walk_ops = {
+		.pmd_entry		= madvise_free_pte_range,
+	};
 
 	/* MADV_FREE works for only anon vma at the moment */
 	if (!vma_is_anonymous(vma))
@@ -808,8 +820,9 @@ static int madvise_free_single_vma(struct madvise_behavior *madv_behavior,
 
 	mmu_notifier_invalidate_range_start(&range);
 	tlb_start_vma(tlb, vma);
+	walk_ops.walk_lock = get_walk_lock(madv_behavior->lock_mode);
 	walk_page_range_vma(vma, range.start, range.end,
-			&madvise_free_walk_ops, tlb);
+			&walk_ops, tlb);
 	tlb_end_vma(tlb, vma);
 	mmu_notifier_invalidate_range_end(&range);
 	return 0;
@@ -1655,7 +1668,6 @@ static enum madvise_lock_mode get_lock_mode(struct madvise_behavior *madv_behavi
 	case MADV_WILLNEED:
 	case MADV_COLD:
 	case MADV_PAGEOUT:
-	case MADV_FREE:
 	case MADV_POPULATE_READ:
 	case MADV_POPULATE_WRITE:
 	case MADV_COLLAPSE:
@@ -1664,6 +1676,7 @@ static enum madvise_lock_mode get_lock_mode(struct madvise_behavior *madv_behavi
 		return MADVISE_MMAP_READ_LOCK;
 	case MADV_DONTNEED:
 	case MADV_DONTNEED_LOCKED:
+	case MADV_FREE:
 		return MADVISE_VMA_READ_LOCK;
 	default:
 		return MADVISE_MMAP_WRITE_LOCK;
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index e478777c86e1..74f623159f7b 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -422,7 +422,7 @@ static inline void process_mm_walk_lock(struct mm_struct *mm,
 {
 	if (walk_lock == PGWALK_RDLOCK)
 		mmap_assert_locked(mm);
-	else
+	else if (walk_lock != PGWALK_VMA_RDLOCK_VERIFY)
 		mmap_assert_write_locked(mm);
 }
 
@@ -437,6 +437,9 @@ static inline void process_vma_walk_lock(struct vm_area_struct *vma,
 	case PGWALK_WRLOCK_VERIFY:
 		vma_assert_write_locked(vma);
 		break;
+	case PGWALK_VMA_RDLOCK_VERIFY:
+		vma_assert_locked(vma);
+		break;
 	case PGWALK_RDLOCK:
 		/* PGWALK_RDLOCK is handled by process_mm_walk_lock */
 		break;
-- 
2.39.3 (Apple Git-146)




* Re: [PATCH v2] mm: madvise: use per_vma lock for MADV_FREE
  2025-06-11 10:47 [PATCH v2] mm: madvise: use per_vma lock for MADV_FREE Barry Song
@ 2025-06-11 17:59 ` SeongJae Park
  0 siblings, 0 replies; 2+ messages in thread
From: SeongJae Park @ 2025-06-11 17:59 UTC
  To: Barry Song
  Cc: SeongJae Park, akpm, linux-mm, linux-kernel, Barry Song,
	David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
	Vlastimil Babka, Jann Horn, Suren Baghdasaryan, Lokesh Gidra,
	Mike Rapoport, Michal Hocko, Tangquan Zheng, Qi Zheng

On Wed, 11 Jun 2025 22:47:45 +1200 Barry Song <21cnbao@gmail.com> wrote:

> From: Barry Song <v-songbaohua@oppo.com>
> 
> MADV_FREE is another option, besides MADV_DONTNEED, for dynamic memory
> freeing in user-space native or Java heap memory management. For example,
> jemalloc can be configured to use MADV_FREE, and recent versions of the
> Android Java heap have also increasingly adopted MADV_FREE. Supporting
> per-VMA locking for MADV_FREE thus appears increasingly necessary.
> 
> We have replaced walk_page_range() with walk_page_range_vma(). Along with
> the proposed madvise_lock_mode by Lorenzo, the necessary infrastructure is
> now in place to begin exploring per-VMA locking support for MADV_FREE and
> potentially other madvise using walk_page_range_vma().
> 
> This patch adds support for the PGWALK_VMA_RDLOCK_VERIFY walk_lock mode in
> walk_page_range_vma(), and uses madvise_lock_mode from madv_behavior to
> select the appropriate walk_lock (either mmap_lock or per-VMA lock) based
> on the context.
> 
> Because we now dynamically update the walk_ops->walk_lock field, we
> must ensure this is thread-safe. The madvise_free_walk_ops is now
> defined as a stack variable instead of a global constant.
> 
> Acked-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Jann Horn <jannh@google.com>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Lokesh Gidra <lokeshgidra@google.com>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Tangquan Zheng <zhengtangquan@oppo.com>
> Cc: Qi Zheng <zhengqi.arch@bytedance.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>

Acked-by: SeongJae Park <sj@kernel.org>


Thanks,
SJ

[...]

