* [PATCH mm-new v8 0/4] Improve khugepaged scan logic
@ 2026-02-21 9:39 Vernon Yang
2026-02-21 9:39 ` [PATCH mm-new v8 1/4] mm: khugepaged: add trace_mm_khugepaged_scan event Vernon Yang
` (3 more replies)
0 siblings, 4 replies; 41+ messages in thread
From: Vernon Yang @ 2026-02-21 9:39 UTC (permalink / raw)
To: akpm, david
Cc: lorenzo.stoakes, ziy, dev.jain, baohua, lance.yang, linux-mm,
linux-kernel, Vernon Yang
From: Vernon Yang <yanglincheng@kylinos.cn>
Hi all,
This series improves the khugepaged scan logic, reduces CPU consumption,
and prioritizes scanning tasks that access memory frequently.
The following data was traced by bpftrace[1] on a desktop system. After
the system had been left idle for 10 minutes after booting, many
SCAN_PMD_MAPPED or SCAN_NO_PTE_TABLE results were observed during a full
scan by khugepaged.
@scan_pmd_status[1]: 1 ## SCAN_SUCCEED
@scan_pmd_status[6]: 2 ## SCAN_EXCEED_SHARED_PTE
@scan_pmd_status[3]: 142 ## SCAN_PMD_MAPPED
@scan_pmd_status[2]: 178 ## SCAN_NO_PTE_TABLE
total progress size: 674 MB
Total time : 419 seconds ## include khugepaged_scan_sleep_millisecs
khugepaged exhibits the following behavior: the khugepaged list is
scanned in a FIFO manner, so as long as a task is not destroyed,
1. even if a task no longer has memory that can be collapsed into a
hugepage, khugepaged keeps scanning it.
2. even if the tasks at the front of the khugepaged scan list are cold,
they are still scanned first.
3. each scan pass sleeps for khugepaged_scan_sleep_millisecs
(default 10s). If the two cases above are always scanned first, the
useful scans have to wait a long time.
For the first case, when the scan result is SCAN_PMD_MAPPED,
SCAN_NO_PTE_TABLE, or SCAN_PTE_MAPPED_HUGEPAGE [5], just skip the range.
For the second case, if the user has explicitly informed us via
MADV_FREE that these folios will be freed, just skip them.
The below is some performance test results.
kernbench results (testing on x86_64 machine):
baseline w/o patches test w/ patches
Amean user-32 18522.51 ( 0.00%) 18333.64 * 1.02%*
Amean syst-32 1137.96 ( 0.00%) 1113.79 * 2.12%*
Amean elsp-32 666.04 ( 0.00%) 659.44 * 0.99%*
BAmean-95 user-32 18520.01 ( 0.00%) 18323.57 ( 1.06%)
BAmean-95 syst-32 1137.68 ( 0.00%) 1110.50 ( 2.39%)
BAmean-95 elsp-32 665.92 ( 0.00%) 659.06 ( 1.03%)
BAmean-99 user-32 18520.01 ( 0.00%) 18323.57 ( 1.06%)
BAmean-99 syst-32 1137.68 ( 0.00%) 1110.50 ( 2.39%)
BAmean-99 elsp-32 665.92 ( 0.00%) 659.06 ( 1.03%)
Create three tasks[2]: hot1 -> cold -> hot2. After all three tasks are
created, each allocates 128MB of memory. The hot1/hot2 tasks
continuously access their 128MB of memory, while the cold task only
accesses its memory briefly and then calls madvise(MADV_FREE). Here are
the performance test results:
(Throughput bigger is better, other smaller is better)
Testing on x86_64 machine:
| task hot2 | without patch | with patch | delta |
|---------------------|---------------|---------------|---------|
| total accesses time | 3.14 sec | 2.93 sec | -6.69% |
| cycles per access | 4.96 | 2.21 | -55.44% |
| Throughput | 104.38 M/sec | 111.89 M/sec | +7.19% |
| dTLB-load-misses | 284814532 | 69597236 | -75.56% |
Testing on qemu-system-x86_64 -enable-kvm:
| task hot2 | without patch | with patch | delta |
|---------------------|---------------|---------------|---------|
| total accesses time | 3.35 sec | 2.96 sec | -11.64% |
| cycles per access | 7.29 | 2.07 | -71.60% |
| Throughput | 97.67 M/sec | 110.77 M/sec | +13.41% |
| dTLB-load-misses | 241600871 | 3216108 | -98.67% |
This series is based on mm-new.
Thank you very much for your comments and discussions.
V7 -> V8:
- Do not skip collapse for lazyfree folios in VM_DROPPABLE mappings.
- "cur_progress" equal to 1 when SCAN_PTE_MAPPED_HUGEPAGE in file case.
- V7 PATCH #5 has been merged into the mm-new branch.
- Some cleaning, more detail commit log, and pickup Reviewed-by.
V6 -> V7:
- Use "*cur_progress += 1" at the beginning of the loop in anon case.
- Always "cur_progress" equal to HPAGE_PMD_NR in file case.
- Some cleaning, and pickup Acked-by and Reviewed-by.
V5 -> V6:
- Simplify hpage_collapse_scan_file() [3] and hpage_collapse_scan_pmd().
- Skip lazy-free folios in the khugepaged only [4].
- pickup Reviewed-by.
V4 -> V5:
- Patch #3 is squashed into Patch #2
- File patch utilize "xas->xa_index" to fix issue.
- folio_is_lazyfree() to folio_test_lazyfree()
- Just skip lazyfree folio simply.
- Again test kernbench in the performance mode to improve stability.
- pickup Acked-by and Reviewed-by.
V3 -> V4:
- Rebase on mm-new.
- Make Patch #2 cleaner
- Fix lazyfree folios continuing to be collapsed after being skipped earlier.
V2 -> V3:
- Refine scan progress number, add folio_is_lazyfree helper
- Fix warnings at SCAN_PTE_MAPPED_HUGEPAGE.
- For MADV_FREE, we will skip the lazy-free folios instead.
- For MADV_COLD, remove it.
- Used hpage_collapse_test_exit_or_disable() instead of vma = NULL.
- pickup Reviewed-by.
V1 -> V2:
- Rename full to full_scan_finished, pickup Acked-by.
- Just skip SCAN_PMD_MAPPED/NO_PTE_TABLE memory, not remove mm.
- Set VM_NOHUGEPAGE flag when MADV_COLD/MADV_FREE to just skip, not move mm.
- Again test performance at the v6.19-rc2.
V7 : https://lore.kernel.org/linux-mm/20260207081613.588598-1-vernon2gm@gmail.com
V6 : https://lore.kernel.org/linux-mm/20260201122554.1470071-1-vernon2gm@gmail.com
V5 : https://lore.kernel.org/linux-mm/20260123082232.16413-1-vernon2gm@gmail.com
V4 : https://lore.kernel.org/linux-mm/20260111121909.8410-1-yanglincheng@kylinos.cn
V3 : https://lore.kernel.org/linux-mm/20260104054112.4541-1-yanglincheng@kylinos.cn
V2 : https://lore.kernel.org/linux-mm/20251229055151.54887-1-yanglincheng@kylinos.cn
V1 : https://lore.kernel.org/linux-mm/20251215090419.174418-1-yanglincheng@kylinos.cn
[1] https://github.com/vernon2gh/app_and_module/blob/main/khugepaged/khugepaged_mm.bt
[2] https://github.com/vernon2gh/app_and_module/blob/main/khugepaged/app.c
[3] https://lore.kernel.org/linux-mm/4c35391e-a944-4e62-9103-4a1c4961f62a@arm.com
[4] https://lore.kernel.org/linux-mm/CACZaFFNY8+UKLzBGnmB3ij9amzBdKJgytcSNtA8fLCake8Ua=A@mail.gmail.com
[5] https://lore.kernel.org/linux-mm/4qdu7owpmxfh3ugsue775fxarw5g2gcggbxdf5psj75nnu7z2u@cv2uu2yocaxq
Vernon Yang (4):
mm: khugepaged: add trace_mm_khugepaged_scan event
mm: khugepaged: refine scan progress number
mm: add folio_test_lazyfree helper
mm: khugepaged: skip lazy-free folios
include/linux/page-flags.h | 5 +++
include/trace/events/huge_memory.h | 26 ++++++++++++++
mm/khugepaged.c | 57 ++++++++++++++++++++++++------
mm/rmap.c | 2 +-
mm/vmscan.c | 5 ++-
5 files changed, 81 insertions(+), 14 deletions(-)
base-commit: a6fdc327de4678e54b5122441c970371014117b0
--
2.51.0
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH mm-new v8 1/4] mm: khugepaged: add trace_mm_khugepaged_scan event
2026-02-21 9:39 [PATCH mm-new v8 0/4] Improve khugepaged scan logic Vernon Yang
@ 2026-02-21 9:39 ` Vernon Yang
2026-03-25 14:06 ` Lorenzo Stoakes (Oracle)
2026-02-21 9:39 ` [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number Vernon Yang
` (2 subsequent siblings)
3 siblings, 1 reply; 41+ messages in thread
From: Vernon Yang @ 2026-02-21 9:39 UTC (permalink / raw)
To: akpm, david
Cc: lorenzo.stoakes, ziy, dev.jain, baohua, lance.yang, linux-mm,
linux-kernel, Vernon Yang
From: Vernon Yang <yanglincheng@kylinos.cn>
Add the mm_khugepaged_scan event to track the total time of a full scan
and the total number of pages scanned by khugepaged.
Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Barry Song <baohua@kernel.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Dev Jain <dev.jain@arm.com>
---
include/trace/events/huge_memory.h | 25 +++++++++++++++++++++++++
mm/khugepaged.c | 2 ++
2 files changed, 27 insertions(+)
diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index 4e41bff31888..384e29f6bef0 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -237,5 +237,30 @@ TRACE_EVENT(mm_khugepaged_collapse_file,
__print_symbolic(__entry->result, SCAN_STATUS))
);
+TRACE_EVENT(mm_khugepaged_scan,
+
+ TP_PROTO(struct mm_struct *mm, unsigned int progress,
+ bool full_scan_finished),
+
+ TP_ARGS(mm, progress, full_scan_finished),
+
+ TP_STRUCT__entry(
+ __field(struct mm_struct *, mm)
+ __field(unsigned int, progress)
+ __field(bool, full_scan_finished)
+ ),
+
+ TP_fast_assign(
+ __entry->mm = mm;
+ __entry->progress = progress;
+ __entry->full_scan_finished = full_scan_finished;
+ ),
+
+ TP_printk("mm=%p, progress=%u, full_scan_finished=%d",
+ __entry->mm,
+ __entry->progress,
+ __entry->full_scan_finished)
+);
+
#endif /* __HUGE_MEMORY_H */
#include <trace/define_trace.h>
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index c0f893bebcff..e2f6b68a0011 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2527,6 +2527,8 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
collect_mm_slot(slot);
}
+ trace_mm_khugepaged_scan(mm, progress, khugepaged_scan.mm_slot == NULL);
+
return progress;
}
--
2.51.0
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number
2026-02-21 9:39 [PATCH mm-new v8 0/4] Improve khugepaged scan logic Vernon Yang
2026-02-21 9:39 ` [PATCH mm-new v8 1/4] mm: khugepaged: add trace_mm_khugepaged_scan event Vernon Yang
@ 2026-02-21 9:39 ` Vernon Yang
2026-02-24 3:52 ` Wei Yang
2026-02-21 9:39 ` [PATCH mm-new v8 3/4] mm: add folio_test_lazyfree helper Vernon Yang
2026-02-21 9:39 ` [PATCH mm-new v8 4/4] mm: khugepaged: skip lazy-free folios Vernon Yang
3 siblings, 1 reply; 41+ messages in thread
From: Vernon Yang @ 2026-02-21 9:39 UTC (permalink / raw)
To: akpm, david
Cc: lorenzo.stoakes, ziy, dev.jain, baohua, lance.yang, linux-mm,
linux-kernel, Vernon Yang
From: Vernon Yang <yanglincheng@kylinos.cn>
Currently, each scan always increases "progress" by HPAGE_PMD_NR,
even if only a single PTE/PMD entry is scanned.
- When only a single PTE entry is scanned, here is a detailed
example:
static int hpage_collapse_scan_pmd()
{
for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
_pte++, addr += PAGE_SIZE) {
pte_t pteval = ptep_get(_pte);
...
if (pte_uffd_wp(pteval)) { <-- first scan hit
result = SCAN_PTE_UFFD_WP;
goto out_unmap;
}
}
}
During the first scan, if pte_uffd_wp(pteval) is true, the loop exits
directly; in practice, only one PTE is scanned before termination.
Here, "progress += 1" reflects the actual number of PTEs scanned,
whereas previously it was always "progress += HPAGE_PMD_NR".
- When the memory has already been collapsed to a PMD, here is a
detailed example:
The following data was traced by bpftrace on a desktop system. After
the system had been left idle for 10 minutes after booting, many
SCAN_PMD_MAPPED or SCAN_NO_PTE_TABLE results were observed during a
full scan by khugepaged.
From trace_mm_khugepaged_scan_pmd and trace_mm_khugepaged_scan_file, the
following statuses were observed, with frequency mentioned next to them:
SCAN_SUCCEED : 1
SCAN_EXCEED_SHARED_PTE: 2
SCAN_PMD_MAPPED : 142
SCAN_NO_PTE_TABLE : 178
total progress size : 674 MB
Total time : 419 seconds, include khugepaged_scan_sleep_millisecs
The khugepaged_scan list saves all tasks that support collapsing into
hugepages; as long as a task is not destroyed, khugepaged will not
remove it from the khugepaged_scan list. This leads to a situation
where a task has already collapsed all of its memory regions into
hugepages, yet khugepaged continues to scan it, which wastes CPU time
to no effect. Moreover, because of khugepaged_scan_sleep_millisecs
(default 10s), scanning a large number of such invalid tasks delays
the genuinely useful scans for a long time.
After applying this patch, memory that is either SCAN_PMD_MAPPED or
SCAN_NO_PTE_TABLE is simply skipped, as follows:
SCAN_EXCEED_SHARED_PTE: 2
SCAN_PMD_MAPPED : 147
SCAN_NO_PTE_TABLE : 173
total progress size : 45 MB
Total time : 20 seconds
SCAN_PTE_MAPPED_HUGEPAGE is handled the same way; for detailed data, refer to
https://lore.kernel.org/linux-mm/4qdu7owpmxfh3ugsue775fxarw5g2gcggbxdf5psj75nnu7z2u@cv2uu2yocaxq
Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
Reviewed-by: Dev Jain <dev.jain@arm.com>
---
mm/khugepaged.c | 42 ++++++++++++++++++++++++++++++++----------
1 file changed, 32 insertions(+), 10 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index e2f6b68a0011..61e25cf5424b 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -68,7 +68,10 @@ enum scan_result {
static struct task_struct *khugepaged_thread __read_mostly;
static DEFINE_MUTEX(khugepaged_mutex);
-/* default scan 8*HPAGE_PMD_NR ptes (or vmas) every 10 second */
+/*
+ * default scan 8*HPAGE_PMD_NR ptes, pmd_mapped, no_pte_table or vmas
+ * every 10 seconds.
+ */
static unsigned int khugepaged_pages_to_scan __read_mostly;
static unsigned int khugepaged_pages_collapsed;
static unsigned int khugepaged_full_scans;
@@ -1231,7 +1234,8 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
}
static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
- struct vm_area_struct *vma, unsigned long start_addr, bool *mmap_locked,
+ struct vm_area_struct *vma, unsigned long start_addr,
+ bool *mmap_locked, unsigned int *cur_progress,
struct collapse_control *cc)
{
pmd_t *pmd;
@@ -1247,19 +1251,27 @@ static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
VM_BUG_ON(start_addr & ~HPAGE_PMD_MASK);
result = find_pmd_or_thp_or_none(mm, start_addr, &pmd);
- if (result != SCAN_SUCCEED)
+ if (result != SCAN_SUCCEED) {
+ if (cur_progress)
+ *cur_progress = 1;
goto out;
+ }
memset(cc->node_load, 0, sizeof(cc->node_load));
nodes_clear(cc->alloc_nmask);
pte = pte_offset_map_lock(mm, pmd, start_addr, &ptl);
if (!pte) {
+ if (cur_progress)
+ *cur_progress = 1;
result = SCAN_NO_PTE_TABLE;
goto out;
}
for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
_pte++, addr += PAGE_SIZE) {
+ if (cur_progress)
+ *cur_progress += 1;
+
pte_t pteval = ptep_get(_pte);
if (pte_none_or_zero(pteval)) {
++none_or_zero;
@@ -2279,8 +2291,9 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
return result;
}
-static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
- struct file *file, pgoff_t start, struct collapse_control *cc)
+static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm,
+ unsigned long addr, struct file *file, pgoff_t start,
+ unsigned int *cur_progress, struct collapse_control *cc)
{
struct folio *folio = NULL;
struct address_space *mapping = file->f_mapping;
@@ -2370,6 +2383,12 @@ static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm, unsigned
}
}
rcu_read_unlock();
+ if (cur_progress) {
+ if (result == SCAN_PTE_MAPPED_HUGEPAGE)
+ *cur_progress = 1;
+ else
+ *cur_progress = HPAGE_PMD_NR;
+ }
if (result == SCAN_SUCCEED) {
if (cc->is_khugepaged &&
@@ -2448,6 +2467,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
while (khugepaged_scan.address < hend) {
bool mmap_locked = true;
+ unsigned int cur_progress = 0;
cond_resched();
if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
@@ -2464,7 +2484,8 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
mmap_read_unlock(mm);
mmap_locked = false;
*result = hpage_collapse_scan_file(mm,
- khugepaged_scan.address, file, pgoff, cc);
+ khugepaged_scan.address, file, pgoff,
+ &cur_progress, cc);
fput(file);
if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
mmap_read_lock(mm);
@@ -2478,7 +2499,8 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
}
} else {
*result = hpage_collapse_scan_pmd(mm, vma,
- khugepaged_scan.address, &mmap_locked, cc);
+ khugepaged_scan.address, &mmap_locked,
+ &cur_progress, cc);
}
if (*result == SCAN_SUCCEED)
@@ -2486,7 +2508,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
/* move to next address */
khugepaged_scan.address += HPAGE_PMD_SIZE;
- progress += HPAGE_PMD_NR;
+ progress += cur_progress;
if (!mmap_locked)
/*
* We released mmap_lock so break loop. Note
@@ -2809,7 +2831,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
mmap_locked = false;
*lock_dropped = true;
result = hpage_collapse_scan_file(mm, addr, file, pgoff,
- cc);
+ NULL, cc);
if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb &&
mapping_can_writeback(file->f_mapping)) {
@@ -2824,7 +2846,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
fput(file);
} else {
result = hpage_collapse_scan_pmd(mm, vma, addr,
- &mmap_locked, cc);
+ &mmap_locked, NULL, cc);
}
if (!mmap_locked)
*lock_dropped = true;
--
2.51.0
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH mm-new v8 3/4] mm: add folio_test_lazyfree helper
2026-02-21 9:39 [PATCH mm-new v8 0/4] Improve khugepaged scan logic Vernon Yang
2026-02-21 9:39 ` [PATCH mm-new v8 1/4] mm: khugepaged: add trace_mm_khugepaged_scan event Vernon Yang
2026-02-21 9:39 ` [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number Vernon Yang
@ 2026-02-21 9:39 ` Vernon Yang
2026-02-21 9:39 ` [PATCH mm-new v8 4/4] mm: khugepaged: skip lazy-free folios Vernon Yang
3 siblings, 0 replies; 41+ messages in thread
From: Vernon Yang @ 2026-02-21 9:39 UTC (permalink / raw)
To: akpm, david
Cc: lorenzo.stoakes, ziy, dev.jain, baohua, lance.yang, linux-mm,
linux-kernel, Vernon Yang
From: Vernon Yang <yanglincheng@kylinos.cn>
Add a folio_test_lazyfree() helper to identify lazy-free folios and
improve code readability.
Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Barry Song <baohua@kernel.org>
---
include/linux/page-flags.h | 5 +++++
mm/rmap.c | 2 +-
mm/vmscan.c | 5 ++---
3 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index fb6a83fe88b0..0426cac91c0b 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -724,6 +724,11 @@ static __always_inline bool folio_test_anon(const struct folio *folio)
return ((unsigned long)folio->mapping & FOLIO_MAPPING_ANON) != 0;
}
+static __always_inline bool folio_test_lazyfree(const struct folio *folio)
+{
+ return folio_test_anon(folio) && !folio_test_swapbacked(folio);
+}
+
static __always_inline bool PageAnonNotKsm(const struct page *page)
{
unsigned long flags = (unsigned long)page_folio(page)->mapping;
diff --git a/mm/rmap.c b/mm/rmap.c
index 0f00570d1b9e..bff8f222004e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2046,7 +2046,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
}
if (!pvmw.pte) {
- if (folio_test_anon(folio) && !folio_test_swapbacked(folio)) {
+ if (folio_test_lazyfree(folio)) {
if (unmap_huge_pmd_locked(vma, pvmw.address, pvmw.pmd, folio))
goto walk_done;
/*
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6a87ac7be43c..9ce3f54f43b8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -963,8 +963,7 @@ static void folio_check_dirty_writeback(struct folio *folio,
* They could be mistakenly treated as file lru. So further anon
* test is needed.
*/
- if (!folio_is_file_lru(folio) ||
- (folio_test_anon(folio) && !folio_test_swapbacked(folio))) {
+ if (!folio_is_file_lru(folio) || folio_test_lazyfree(folio)) {
*dirty = false;
*writeback = false;
return;
@@ -1508,7 +1507,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
}
}
- if (folio_test_anon(folio) && !folio_test_swapbacked(folio)) {
+ if (folio_test_lazyfree(folio)) {
/* follow __remove_mapping for reference */
if (!folio_ref_freeze(folio, 1))
goto keep_locked;
--
2.51.0
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH mm-new v8 4/4] mm: khugepaged: skip lazy-free folios
2026-02-21 9:39 [PATCH mm-new v8 0/4] Improve khugepaged scan logic Vernon Yang
` (2 preceding siblings ...)
2026-02-21 9:39 ` [PATCH mm-new v8 3/4] mm: add folio_test_lazyfree helper Vernon Yang
@ 2026-02-21 9:39 ` Vernon Yang
2026-02-21 10:27 ` Barry Song
3 siblings, 1 reply; 41+ messages in thread
From: Vernon Yang @ 2026-02-21 9:39 UTC (permalink / raw)
To: akpm, david
Cc: lorenzo.stoakes, ziy, dev.jain, baohua, lance.yang, linux-mm,
linux-kernel, Vernon Yang
From: Vernon Yang <yanglincheng@kylinos.cn>
For example, create three tasks: hot1 -> cold -> hot2. After all three
tasks are created, each allocates 128MB of memory. The hot1/hot2 tasks
continuously access their 128MB of memory, while the cold task only
accesses its memory briefly and then calls madvise(MADV_FREE). However,
khugepaged still prioritizes scanning the cold task and only scans the
hot2 task after completing the scan of the cold task.
And all folios in VM_DROPPABLE are lazyfree, and collapsing maintains
that property, so we can just collapse; memory pressure in the future
will free it up. In contrast, collapsing in !VM_DROPPABLE does not
maintain that property: the collapsed folio will not be lazyfree, and
memory pressure in the future will not be able to free it up.
So if the user has explicitly informed us via MADV_FREE that this memory
will be freed, and the vma does not have the VM_DROPPABLE flag, it is
appropriate for khugepaged to simply skip it, thereby avoiding
unnecessary scan and collapse operations and reducing CPU waste.
Here are the performance test results:
(Throughput bigger is better, other smaller is better)
Testing on x86_64 machine:
| task hot2 | without patch | with patch | delta |
|---------------------|---------------|---------------|---------|
| total accesses time | 3.14 sec | 2.93 sec | -6.69% |
| cycles per access | 4.96 | 2.21 | -55.44% |
| Throughput | 104.38 M/sec | 111.89 M/sec | +7.19% |
| dTLB-load-misses | 284814532 | 69597236 | -75.56% |
Testing on qemu-system-x86_64 -enable-kvm:
| task hot2 | without patch | with patch | delta |
|---------------------|---------------|---------------|---------|
| total accesses time | 3.35 sec | 2.96 sec | -11.64% |
| cycles per access | 7.29 | 2.07 | -71.60% |
| Throughput | 97.67 M/sec | 110.77 M/sec | +13.41% |
| dTLB-load-misses | 241600871 | 3216108 | -98.67% |
Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
Acked-by: David Hildenbrand (arm) <david@kernel.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
---
include/trace/events/huge_memory.h | 1 +
mm/khugepaged.c | 13 +++++++++++++
2 files changed, 14 insertions(+)
diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index 384e29f6bef0..bcdc57eea270 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -25,6 +25,7 @@
EM( SCAN_PAGE_LRU, "page_not_in_lru") \
EM( SCAN_PAGE_LOCK, "page_locked") \
EM( SCAN_PAGE_ANON, "page_not_anon") \
+ EM( SCAN_PAGE_LAZYFREE, "page_lazyfree") \
EM( SCAN_PAGE_COMPOUND, "page_compound") \
EM( SCAN_ANY_PROCESS, "no_process_for_page") \
EM( SCAN_VMA_NULL, "vma_null") \
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 61e25cf5424b..e792e9074b48 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -46,6 +46,7 @@ enum scan_result {
SCAN_PAGE_LRU,
SCAN_PAGE_LOCK,
SCAN_PAGE_ANON,
+ SCAN_PAGE_LAZYFREE,
SCAN_PAGE_COMPOUND,
SCAN_ANY_PROCESS,
SCAN_VMA_NULL,
@@ -574,6 +575,12 @@ static enum scan_result __collapse_huge_page_isolate(struct vm_area_struct *vma,
folio = page_folio(page);
VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
+ if (cc->is_khugepaged && !(vma->vm_flags & VM_DROPPABLE) &&
+ folio_test_lazyfree(folio) && !pte_dirty(pteval)) {
+ result = SCAN_PAGE_LAZYFREE;
+ goto out;
+ }
+
/* See hpage_collapse_scan_pmd(). */
if (folio_maybe_mapped_shared(folio)) {
++shared;
@@ -1326,6 +1333,12 @@ static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
}
folio = page_folio(page);
+ if (cc->is_khugepaged && !(vma->vm_flags & VM_DROPPABLE) &&
+ folio_test_lazyfree(folio) && !pte_dirty(pteval)) {
+ result = SCAN_PAGE_LAZYFREE;
+ goto out_unmap;
+ }
+
if (!folio_test_anon(folio)) {
result = SCAN_PAGE_ANON;
goto out_unmap;
--
2.51.0
^ permalink raw reply related [flat|nested] 41+ messages in thread
* Re: [PATCH mm-new v8 4/4] mm: khugepaged: skip lazy-free folios
2026-02-21 9:39 ` [PATCH mm-new v8 4/4] mm: khugepaged: skip lazy-free folios Vernon Yang
@ 2026-02-21 10:27 ` Barry Song
2026-02-21 13:38 ` Vernon Yang
0 siblings, 1 reply; 41+ messages in thread
From: Barry Song @ 2026-02-21 10:27 UTC (permalink / raw)
To: Vernon Yang
Cc: akpm, david, lorenzo.stoakes, ziy, dev.jain, lance.yang, linux-mm,
linux-kernel, Vernon Yang
On Sat, Feb 21, 2026 at 5:40 PM Vernon Yang <vernon2gm@gmail.com> wrote:
>
> From: Vernon Yang <yanglincheng@kylinos.cn>
>
> For example, create three task: hot1 -> cold -> hot2. After all three
> task are created, each allocate memory 128MB. the hot1/hot2 task
> continuously access 128 MB memory, while the cold task only accesses
> its memory briefly and then call madvise(MADV_FREE). However, khugepaged
> still prioritizes scanning the cold task and only scans the hot2 task
> after completing the scan of the cold task.
>
> And if all folios in VM_DROPPABLE are lazyfree, Collapsing maintains
> that property, so we can just collapse and memory pressure in the future
I don’t think this is accurate. A VMA without VM_DROPPABLE
can still have all folios marked as lazyfree. Therefore, having
all folios lazyfree is not the reason why collapsing preserves
the property.
This raises a question: if a VMA without VM_DROPPABLE has
many contiguous lazyfree folios that can be collapsed, and
none of those folios are non-lazyfree, should we collapse
them and pass the lazyfree state to the new folio?
Currently, our approach skips the collapse, which also feels
a bit inconsistent.
> will free it up. In contrast, collapsing in !VM_DROPPABLE does not
> maintain that property, the collapsed folio will not be lazyfree and
> memory pressure in the future will not be able to free it up.
>
> So if the user has explicitly informed us via MADV_FREE that this memory
> will be freed, and this vma does not have VM_DROPPABLE flags, it is
> appropriate for khugepaged to skip it only, thereby avoiding unnecessary
> scan and collapse operations to reducing CPU wastage.
>
> Here are the performance test results:
> (Throughput bigger is better, other smaller is better)
>
> Testing on x86_64 machine:
>
> | task hot2 | without patch | with patch | delta |
> |---------------------|---------------|---------------|---------|
> | total accesses time | 3.14 sec | 2.93 sec | -6.69% |
> | cycles per access | 4.96 | 2.21 | -55.44% |
> | Throughput | 104.38 M/sec | 111.89 M/sec | +7.19% |
> | dTLB-load-misses | 284814532 | 69597236 | -75.56% |
>
> Testing on qemu-system-x86_64 -enable-kvm:
>
> | task hot2 | without patch | with patch | delta |
> |---------------------|---------------|---------------|---------|
> | total accesses time | 3.35 sec | 2.96 sec | -11.64% |
> | cycles per access | 7.29 | 2.07 | -71.60% |
> | Throughput | 97.67 M/sec | 110.77 M/sec | +13.41% |
> | dTLB-load-misses | 241600871 | 3216108 | -98.67% |
>
> Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
> Acked-by: David Hildenbrand (arm) <david@kernel.org>
> Reviewed-by: Lance Yang <lance.yang@linux.dev>
> ---
Overall, LGTM,
Reviewed-by: Barry Song <baohua@kernel.org>
> include/trace/events/huge_memory.h | 1 +
> mm/khugepaged.c | 13 +++++++++++++
> 2 files changed, 14 insertions(+)
>
> diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
> index 384e29f6bef0..bcdc57eea270 100644
> --- a/include/trace/events/huge_memory.h
> +++ b/include/trace/events/huge_memory.h
> @@ -25,6 +25,7 @@
> EM( SCAN_PAGE_LRU, "page_not_in_lru") \
> EM( SCAN_PAGE_LOCK, "page_locked") \
> EM( SCAN_PAGE_ANON, "page_not_anon") \
> + EM( SCAN_PAGE_LAZYFREE, "page_lazyfree") \
> EM( SCAN_PAGE_COMPOUND, "page_compound") \
> EM( SCAN_ANY_PROCESS, "no_process_for_page") \
> EM( SCAN_VMA_NULL, "vma_null") \
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 61e25cf5424b..e792e9074b48 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -46,6 +46,7 @@ enum scan_result {
> SCAN_PAGE_LRU,
> SCAN_PAGE_LOCK,
> SCAN_PAGE_ANON,
> + SCAN_PAGE_LAZYFREE,
> SCAN_PAGE_COMPOUND,
> SCAN_ANY_PROCESS,
> SCAN_VMA_NULL,
> @@ -574,6 +575,12 @@ static enum scan_result __collapse_huge_page_isolate(struct vm_area_struct *vma,
> folio = page_folio(page);
> VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
>
> + if (cc->is_khugepaged && !(vma->vm_flags & VM_DROPPABLE) &&
> + folio_test_lazyfree(folio) && !pte_dirty(pteval)) {
I would prefer to add a comment about VM_DROPPABLE here
rather than only mentioning it in the changelog.
> + result = SCAN_PAGE_LAZYFREE;
> + goto out;
> + }
> +
> /* See hpage_collapse_scan_pmd(). */
> if (folio_maybe_mapped_shared(folio)) {
> ++shared;
> @@ -1326,6 +1333,12 @@ static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
> }
> folio = page_folio(page);
>
> + if (cc->is_khugepaged && !(vma->vm_flags & VM_DROPPABLE) &&
> + folio_test_lazyfree(folio) && !pte_dirty(pteval)) {
> + result = SCAN_PAGE_LAZYFREE;
> + goto out_unmap;
> + }
As above.
> +
> if (!folio_test_anon(folio)) {
> result = SCAN_PAGE_ANON;
> goto out_unmap;
> --
> 2.51.0
>
Thanks
Barry
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH mm-new v8 4/4] mm: khugepaged: skip lazy-free folios
2026-02-21 10:27 ` Barry Song
@ 2026-02-21 13:38 ` Vernon Yang
2026-02-23 13:16 ` David Hildenbrand (Arm)
2026-02-23 20:10 ` Barry Song
0 siblings, 2 replies; 41+ messages in thread
From: Vernon Yang @ 2026-02-21 13:38 UTC (permalink / raw)
To: Barry Song
Cc: akpm, david, lorenzo.stoakes, ziy, dev.jain, lance.yang, linux-mm,
linux-kernel, Vernon Yang
On Sat, Feb 21, 2026 at 06:27:36PM +0800, Barry Song wrote:
> On Sat, Feb 21, 2026 at 5:40 PM Vernon Yang <vernon2gm@gmail.com> wrote:
> >
> > From: Vernon Yang <yanglincheng@kylinos.cn>
> >
> > For example, create three task: hot1 -> cold -> hot2. After all three
> > task are created, each allocate memory 128MB. the hot1/hot2 task
> > continuously access 128 MB memory, while the cold task only accesses
> > its memory briefly and then call madvise(MADV_FREE). However, khugepaged
> > still prioritizes scanning the cold task and only scans the hot2 task
> > after completing the scan of the cold task.
> >
> > And if all folios in VM_DROPPABLE are lazyfree, Collapsing maintains
> > that property, so we can just collapse and memory pressure in the future
>
> I don’t think this is accurate. A VMA without VM_DROPPABLE
> can still have all folios marked as lazyfree. Therefore, having
> all folios lazyfree is not the reason why collapsing preserves
> the property.
In folio_add_new_anon_rmap(), we know that the vma has the VM_DROPPABLE
attribute, which is the root reason why Collapsing maintains that property.
The above commit log clearly states "all folios in VM_DROPPABLE are lazyfree"
^^^^^^^^^^^^^^^
(the "if" is redundant and should be removed), not "all folios are lazyfree".
> This raises a question: if a VMA without VM_DROPPABLE has
> many contiguous lazyfree folios that can be collapsed, and
> none of those folios are non-lazyfree, should we collapse
> them and pass the lazyfree state to the new folio?
>
> Currently, our approach skips the collapse, which also feels
> a bit inconsistent.
Yes, they are inconsistent, because answering this question requires
scanning all folios to make a decision, and it cannot solve the
hot1->cold->hot2 scenario.
> > will free it up. In contrast, collapsing in !VM_DROPPABLE does not
> > maintain that property, the collapsed folio will not be lazyfree and
> > memory pressure in the future will not be able to free it up.
> >
> > So if the user has explicitly informed us via MADV_FREE that this memory
> > will be freed, and this vma does not have VM_DROPPABLE flags, it is
> > appropriate for khugepaged to skip it only, thereby avoiding unnecessary
> > scan and collapse operations to reducing CPU wastage.
> >
> > Here are the performance test results:
> > (Throughput bigger is better, other smaller is better)
> >
> > Testing on x86_64 machine:
> >
> > | task hot2 | without patch | with patch | delta |
> > |---------------------|---------------|---------------|---------|
> > | total accesses time | 3.14 sec | 2.93 sec | -6.69% |
> > | cycles per access | 4.96 | 2.21 | -55.44% |
> > | Throughput | 104.38 M/sec | 111.89 M/sec | +7.19% |
> > | dTLB-load-misses | 284814532 | 69597236 | -75.56% |
> >
> > Testing on qemu-system-x86_64 -enable-kvm:
> >
> > | task hot2 | without patch | with patch | delta |
> > |---------------------|---------------|---------------|---------|
> > | total accesses time | 3.35 sec | 2.96 sec | -11.64% |
> > | cycles per access | 7.29 | 2.07 | -71.60% |
> > | Throughput | 97.67 M/sec | 110.77 M/sec | +13.41% |
> > | dTLB-load-misses | 241600871 | 3216108 | -98.67% |
> >
> > Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
> > Acked-by: David Hildenbrand (arm) <david@kernel.org>
> > Reviewed-by: Lance Yang <lance.yang@linux.dev>
> > ---
>
> Overall, LGTM,
>
> Reviewed-by: Barry Song <baohua@kernel.org>
Thank you for the review.
> > include/trace/events/huge_memory.h | 1 +
> > mm/khugepaged.c | 13 +++++++++++++
> > 2 files changed, 14 insertions(+)
> >
> > diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
> > index 384e29f6bef0..bcdc57eea270 100644
> > --- a/include/trace/events/huge_memory.h
> > +++ b/include/trace/events/huge_memory.h
> > @@ -25,6 +25,7 @@
> > EM( SCAN_PAGE_LRU, "page_not_in_lru") \
> > EM( SCAN_PAGE_LOCK, "page_locked") \
> > EM( SCAN_PAGE_ANON, "page_not_anon") \
> > + EM( SCAN_PAGE_LAZYFREE, "page_lazyfree") \
> > EM( SCAN_PAGE_COMPOUND, "page_compound") \
> > EM( SCAN_ANY_PROCESS, "no_process_for_page") \
> > EM( SCAN_VMA_NULL, "vma_null") \
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 61e25cf5424b..e792e9074b48 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -46,6 +46,7 @@ enum scan_result {
> > SCAN_PAGE_LRU,
> > SCAN_PAGE_LOCK,
> > SCAN_PAGE_ANON,
> > + SCAN_PAGE_LAZYFREE,
> > SCAN_PAGE_COMPOUND,
> > SCAN_ANY_PROCESS,
> > SCAN_VMA_NULL,
> > @@ -574,6 +575,12 @@ static enum scan_result __collapse_huge_page_isolate(struct vm_area_struct *vma,
> > folio = page_folio(page);
> > VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
> >
> > + if (cc->is_khugepaged && !(vma->vm_flags & VM_DROPPABLE) &&
> > + folio_test_lazyfree(folio) && !pte_dirty(pteval)) {
>
> I would prefer to add a comment about VM_DROPPABLE here
> rather than only mentioning it in the changelog.
Is the following comment clear?
/*
* If the vma has the VM_DROPPABLE flag, the collapse will
* preserve the lazyfree property without needing to skip.
*/
> > + result = SCAN_PAGE_LAZYFREE;
> > + goto out;
> > + }
> > +
> > /* See hpage_collapse_scan_pmd(). */
> > if (folio_maybe_mapped_shared(folio)) {
> > ++shared;
> > @@ -1326,6 +1333,12 @@ static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
> > }
> > folio = page_folio(page);
> >
> > + if (cc->is_khugepaged && !(vma->vm_flags & VM_DROPPABLE) &&
> > + folio_test_lazyfree(folio) && !pte_dirty(pteval)) {
> > + result = SCAN_PAGE_LAZYFREE;
> > + goto out_unmap;
> > + }
>
> As above.
>
> > +
> > if (!folio_test_anon(folio)) {
> > result = SCAN_PAGE_ANON;
> > goto out_unmap;
> > --
> > 2.51.0
> >
>
> Thanks
> Barry
>
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH mm-new v8 4/4] mm: khugepaged: skip lazy-free folios
2026-02-21 13:38 ` Vernon Yang
@ 2026-02-23 13:16 ` David Hildenbrand (Arm)
2026-02-23 20:08 ` Barry Song
2026-02-23 20:10 ` Barry Song
1 sibling, 1 reply; 41+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-23 13:16 UTC (permalink / raw)
To: Vernon Yang, Barry Song
Cc: akpm, lorenzo.stoakes, ziy, dev.jain, lance.yang, linux-mm,
linux-kernel, Vernon Yang
On 2/21/26 14:38, Vernon Yang wrote:
> On Sat, Feb 21, 2026 at 06:27:36PM +0800, Barry Song wrote:
>> On Sat, Feb 21, 2026 at 5:40 PM Vernon Yang <vernon2gm@gmail.com> wrote:
>>>
>>> From: Vernon Yang <yanglincheng@kylinos.cn>
>>>
>>> For example, create three task: hot1 -> cold -> hot2. After all three
>>> task are created, each allocate memory 128MB. the hot1/hot2 task
>>> continuously access 128 MB memory, while the cold task only accesses
>>> its memory briefly and then call madvise(MADV_FREE). However, khugepaged
>>> still prioritizes scanning the cold task and only scans the hot2 task
>>> after completing the scan of the cold task.
>>>
>>> And if all folios in VM_DROPPABLE are lazyfree, Collapsing maintains
>>> that property, so we can just collapse and memory pressure in the future
>>
>> I don’t think this is accurate. A VMA without VM_DROPPABLE
>> can still have all folios marked as lazyfree. Therefore, having
>> all folios lazyfree is not the reason why collapsing preserves
>> the property.
>
> In folio_add_new_anon_rmap(), we know that the vma has the VM_DROPPABLE
> attribute, which is the root reason why Collapsing maintains that property.
> The above commit log clearly states "all folios in VM_DROPPABLE are lazyfree"
> ^^^^^^^^^^^^^^^
> (the "if" is redundant and should be removed), not "all folios are lazyfree".
Exactly. folio_add_new_anon_rmap() makes sure that all folios (except
the shared zero folios ;) ) in VM_DROPPABLE are lazyfree.
In fact, MADV_FREE should be a NOP on VM_DROPPABLE, as
folio_mark_lazyfree() doesn't do anything.
>
>> This raises a question: if a VMA without VM_DROPPABLE has
>> many contiguous lazyfree folios that can be collapsed, and
>> none of those folios are non-lazyfree, should we collapse
>> them and pass the lazyfree state to the new folio?
I'd assume we'd only want to add support for that when there are actual
known use cases that can trigger that + benefit from it.
Adds complexity.
--
Cheers,
David
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH mm-new v8 4/4] mm: khugepaged: skip lazy-free folios
2026-02-23 13:16 ` David Hildenbrand (Arm)
@ 2026-02-23 20:08 ` Barry Song
2026-02-24 10:10 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 41+ messages in thread
From: Barry Song @ 2026-02-23 20:08 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Vernon Yang, akpm, lorenzo.stoakes, ziy, dev.jain, lance.yang,
linux-mm, linux-kernel, Vernon Yang
On Mon, Feb 23, 2026 at 9:16 PM David Hildenbrand (Arm)
<david@kernel.org> wrote:
>
> On 2/21/26 14:38, Vernon Yang wrote:
> > On Sat, Feb 21, 2026 at 06:27:36PM +0800, Barry Song wrote:
> >> On Sat, Feb 21, 2026 at 5:40 PM Vernon Yang <vernon2gm@gmail.com> wrote:
> >>>
> >>> From: Vernon Yang <yanglincheng@kylinos.cn>
> >>>
> >>> For example, create three task: hot1 -> cold -> hot2. After all three
> >>> task are created, each allocate memory 128MB. the hot1/hot2 task
> >>> continuously access 128 MB memory, while the cold task only accesses
> >>> its memory briefly and then call madvise(MADV_FREE). However, khugepaged
> >>> still prioritizes scanning the cold task and only scans the hot2 task
> >>> after completing the scan of the cold task.
> >>>
> >>> And if all folios in VM_DROPPABLE are lazyfree, Collapsing maintains
> >>> that property, so we can just collapse and memory pressure in the future
> >>
> >> I don’t think this is accurate. A VMA without VM_DROPPABLE
> >> can still have all folios marked as lazyfree. Therefore, having
> >> all folios lazyfree is not the reason why collapsing preserves
> >> the property.
> >
> > In folio_add_new_anon_rmap(), we know that the vma has the VM_DROPPABLE
> > attribute, which is the root reason why Collapsing maintains that property.
> > The above commit log clearly states "all folios in VM_DROPPABLE are lazyfree"
> > ^^^^^^^^^^^^^^^
> > (the "if" is redundant and should be removed), not "all folios are lazyfree".
>
>
> Exactly. folio_add_new_anon_rmap() makes sure that all folios (except
> the shared zero folios ;) ) in VM_DROPPABLE are lazyfree.
>
> In fact, MADV_FREE should be a NOP on VM_DROPPABLE, as
> folio_mark_lazyfree() doesn't do anything.
>
Maybe we could do something like the following?
diff --git a/mm/madvise.c b/mm/madvise.c
index c0370d9b4e23..173b0e5308b5 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -817,6 +817,11 @@ static int madvise_free_single_vma(struct
madvise_behavior *madv_behavior)
range.end = min(vma->vm_end, end_addr);
if (range.end <= vma->vm_start)
return -EINVAL;
+
+ /* All folios in the VM_DROPPABLE VMA are already lazyfree */
+ if (vma->vm_flags & VM_DROPPABLE)
+ return 0;
+
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm,
range.start, range.end);
Thanks
Barry
^ permalink raw reply related [flat|nested] 41+ messages in thread
* Re: [PATCH mm-new v8 4/4] mm: khugepaged: skip lazy-free folios
2026-02-21 13:38 ` Vernon Yang
2026-02-23 13:16 ` David Hildenbrand (Arm)
@ 2026-02-23 20:10 ` Barry Song
2026-02-26 7:55 ` Vernon Yang
1 sibling, 1 reply; 41+ messages in thread
From: Barry Song @ 2026-02-23 20:10 UTC (permalink / raw)
To: Vernon Yang
Cc: akpm, david, lorenzo.stoakes, ziy, dev.jain, lance.yang, linux-mm,
linux-kernel, Vernon Yang
On Sat, Feb 21, 2026 at 9:39 PM Vernon Yang <vernon2gm@gmail.com> wrote:
>
> On Sat, Feb 21, 2026 at 06:27:36PM +0800, Barry Song wrote:
> > On Sat, Feb 21, 2026 at 5:40 PM Vernon Yang <vernon2gm@gmail.com> wrote:
> > >
> > > From: Vernon Yang <yanglincheng@kylinos.cn>
> > >
> > > For example, create three task: hot1 -> cold -> hot2. After all three
> > > task are created, each allocate memory 128MB. the hot1/hot2 task
> > > continuously access 128 MB memory, while the cold task only accesses
> > > its memory briefly and then call madvise(MADV_FREE). However, khugepaged
> > > still prioritizes scanning the cold task and only scans the hot2 task
> > > after completing the scan of the cold task.
> > >
> > > And if all folios in VM_DROPPABLE are lazyfree, Collapsing maintains
> > > that property, so we can just collapse and memory pressure in the future
> >
> > I don’t think this is accurate. A VMA without VM_DROPPABLE
> > can still have all folios marked as lazyfree. Therefore, having
> > all folios lazyfree is not the reason why collapsing preserves
> > the property.
>
> In folio_add_new_anon_rmap(), we know that the vma has the VM_DROPPABLE
> attribute, which is the root reason why Collapsing maintains that property.
> The above commit log clearly states "all folios in VM_DROPPABLE are lazyfree"
> ^^^^^^^^^^^^^^^
> (the "if" is redundant and should be removed), not "all folios are lazyfree".
Yes, we should remove the if; otherwise, it’s misleading.
[...]
> >
> > I would prefer to add a comment about VM_DROPPABLE here
> > rather than only mentioning it in the changelog.
>
> Is the following comment clear?
>
> /*
> * If the vma has the VM_DROPPABLE flag, the collapse will
> * preserve the lazyfree property without needing to skip.
> */
Looks good to me.
Best Regards
Barry
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number
2026-02-21 9:39 ` [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number Vernon Yang
@ 2026-02-24 3:52 ` Wei Yang
2026-02-25 14:25 ` Vernon Yang
0 siblings, 1 reply; 41+ messages in thread
From: Wei Yang @ 2026-02-24 3:52 UTC (permalink / raw)
To: Vernon Yang
Cc: akpm, david, lorenzo.stoakes, ziy, dev.jain, baohua, lance.yang,
linux-mm, linux-kernel, Vernon Yang
On Sat, Feb 21, 2026 at 05:39:16PM +0800, Vernon Yang wrote:
>From: Vernon Yang <yanglincheng@kylinos.cn>
>
>Currently, each scan always increases "progress" by HPAGE_PMD_NR,
>even if only scanning a single PTE/PMD entry.
>
>- When only scanning a single PTE entry, let me provide a detailed
> example:
>
>static int hpage_collapse_scan_pmd()
>{
> for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
> _pte++, addr += PAGE_SIZE) {
> pte_t pteval = ptep_get(_pte);
> ...
> if (pte_uffd_wp(pteval)) { <-- first scan hit
> result = SCAN_PTE_UFFD_WP;
> goto out_unmap;
> }
> }
>}
>
>During the first scan, if pte_uffd_wp(pteval) is true, the loop exits
>directly. In practice, only one PTE is scanned before termination.
>Here, "progress += 1" reflects the actual number of PTEs scanned, but
>previously "progress += HPAGE_PMD_NR" always.
>
>- When the memory has been collapsed to PMD, let me provide a detailed
> example:
>
>The following data is traced by bpftrace on a desktop system. After
>the system has been left idle for 10 minutes upon booting, a lot of
>SCAN_PMD_MAPPED or SCAN_NO_PTE_TABLE are observed during a full scan
>by khugepaged.
>
>From trace_mm_khugepaged_scan_pmd and trace_mm_khugepaged_scan_file, the
>following statuses were observed, with frequency mentioned next to them:
>
>SCAN_SUCCEED : 1
>SCAN_EXCEED_SHARED_PTE: 2
>SCAN_PMD_MAPPED : 142
>SCAN_NO_PTE_TABLE : 178
>total progress size : 674 MB
>Total time : 419 seconds, include khugepaged_scan_sleep_millisecs
>
>The khugepaged_scan list holds all tasks that are eligible for collapsing
>into hugepages; as long as a task is not destroyed, khugepaged will not
>remove it from the khugepaged_scan list. This leads to a situation where
>a task has already collapsed all of its memory regions into hugepages,
>yet khugepaged keeps scanning it, wasting CPU time for no benefit. And
>because of khugepaged_scan_sleep_millisecs (default 10s), scanning a
>large number of such invalid tasks makes the really valid tasks wait a
>long time before they are scanned.
>
>After applying this patch, when the memory is either SCAN_PMD_MAPPED or
>SCAN_NO_PTE_TABLE, just skip it, as follow:
>
>SCAN_EXCEED_SHARED_PTE: 2
>SCAN_PMD_MAPPED : 147
>SCAN_NO_PTE_TABLE : 173
>total progress size : 45 MB
>Total time : 20 seconds
>
>SCAN_PTE_MAPPED_HUGEPAGE is the same, for detailed data, refer to
>https://lore.kernel.org/linux-mm/4qdu7owpmxfh3ugsue775fxarw5g2gcggbxdf5psj75nnu7z2u@cv2uu2yocaxq
>
>Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
>Reviewed-by: Dev Jain <dev.jain@arm.com>
>---
> mm/khugepaged.c | 42 ++++++++++++++++++++++++++++++++----------
> 1 file changed, 32 insertions(+), 10 deletions(-)
>
>diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>index e2f6b68a0011..61e25cf5424b 100644
>--- a/mm/khugepaged.c
>+++ b/mm/khugepaged.c
>@@ -68,7 +68,10 @@ enum scan_result {
> static struct task_struct *khugepaged_thread __read_mostly;
> static DEFINE_MUTEX(khugepaged_mutex);
>
>-/* default scan 8*HPAGE_PMD_NR ptes (or vmas) every 10 second */
>+/*
>+ * default scan 8*HPAGE_PMD_NR ptes, pmd_mapped, no_pte_table or vmas
>+ * every 10 second.
>+ */
> static unsigned int khugepaged_pages_to_scan __read_mostly;
> static unsigned int khugepaged_pages_collapsed;
> static unsigned int khugepaged_full_scans;
>@@ -1231,7 +1234,8 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
> }
>
> static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
>- struct vm_area_struct *vma, unsigned long start_addr, bool *mmap_locked,
>+ struct vm_area_struct *vma, unsigned long start_addr,
>+ bool *mmap_locked, unsigned int *cur_progress,
> struct collapse_control *cc)
> {
> pmd_t *pmd;
>@@ -1247,19 +1251,27 @@ static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
> VM_BUG_ON(start_addr & ~HPAGE_PMD_MASK);
>
> result = find_pmd_or_thp_or_none(mm, start_addr, &pmd);
>- if (result != SCAN_SUCCEED)
>+ if (result != SCAN_SUCCEED) {
>+ if (cur_progress)
>+ *cur_progress = 1;
> goto out;
>+ }
How about putting cur_progress in struct collapse_control?
Then we wouldn't need to check cur_progress every time before modifying it.
--
Wei Yang
Help you, Help me
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH mm-new v8 4/4] mm: khugepaged: skip lazy-free folios
2026-02-23 20:08 ` Barry Song
@ 2026-02-24 10:10 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 41+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-24 10:10 UTC (permalink / raw)
To: Barry Song
Cc: Vernon Yang, akpm, lorenzo.stoakes, ziy, dev.jain, lance.yang,
linux-mm, linux-kernel, Vernon Yang
On 2/23/26 21:08, Barry Song wrote:
> On Mon, Feb 23, 2026 at 9:16 PM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
>>
>> On 2/21/26 14:38, Vernon Yang wrote:
>>>
>>> In folio_add_new_anon_rmap(), we know that the vma has the VM_DROPPABLE
>>> attribute, which is the root reason why Collapsing maintains that property.
>>> The above commit log clearly states "all folios in VM_DROPPABLE are lazyfree"
>>> ^^^^^^^^^^^^^^^
>>> (the "if" is redundant and should be removed), not "all folios are lazyfree".
>>
>>
>> Exactly. folio_add_new_anon_rmap() makes sure that all folios (except
>> the shared zero folios ;) ) in VM_DROPPABLE are lazyfree.
>>
>> In fact, MADV_FREE should be a NOP on VM_DROPPABLE, as
>> folio_mark_lazyfree() doesn't do anything.
>>
>
> Maybe we could do something like the following?
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index c0370d9b4e23..173b0e5308b5 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -817,6 +817,11 @@ static int madvise_free_single_vma(struct
> madvise_behavior *madv_behavior)
> range.end = min(vma->vm_end, end_addr);
> if (range.end <= vma->vm_start)
> return -EINVAL;
> +
> + /* All folios in the VM_DROPPABLE VMA are already lazyfree */
> + if (vma->vm_flags & VM_DROPPABLE)
> + return 0;
We could, but it feels like optimizing for a case that likely nobody
triggers :)
--
Cheers,
David
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number
2026-02-24 3:52 ` Wei Yang
@ 2026-02-25 14:25 ` Vernon Yang
2026-02-25 14:29 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 41+ messages in thread
From: Vernon Yang @ 2026-02-25 14:25 UTC (permalink / raw)
To: Wei Yang, david
Cc: akpm, lorenzo.stoakes, ziy, dev.jain, baohua, lance.yang,
linux-mm, linux-kernel, Vernon Yang
On Tue, Feb 24, 2026 at 03:52:47AM +0000, Wei Yang wrote:
> On Sat, Feb 21, 2026 at 05:39:16PM +0800, Vernon Yang wrote:
> >From: Vernon Yang <yanglincheng@kylinos.cn>
> >
> >Currently, each scan always increases "progress" by HPAGE_PMD_NR,
> >even if only scanning a single PTE/PMD entry.
> >
> >- When only scanning a single PTE entry, let me provide a detailed
> > example:
> >
> >static int hpage_collapse_scan_pmd()
> >{
> > for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
> > _pte++, addr += PAGE_SIZE) {
> > pte_t pteval = ptep_get(_pte);
> > ...
> > if (pte_uffd_wp(pteval)) { <-- first scan hit
> > result = SCAN_PTE_UFFD_WP;
> > goto out_unmap;
> > }
> > }
> >}
> >
> >During the first scan, if pte_uffd_wp(pteval) is true, the loop exits
> >directly. In practice, only one PTE is scanned before termination.
> >Here, "progress += 1" reflects the actual number of PTEs scanned, but
> >previously "progress += HPAGE_PMD_NR" always.
> >
> >- When the memory has been collapsed to PMD, let me provide a detailed
> > example:
> >
> >The following data is traced by bpftrace on a desktop system. After
> >the system has been left idle for 10 minutes upon booting, a lot of
> >SCAN_PMD_MAPPED or SCAN_NO_PTE_TABLE are observed during a full scan
> >by khugepaged.
> >
> >From trace_mm_khugepaged_scan_pmd and trace_mm_khugepaged_scan_file, the
> >following statuses were observed, with frequency mentioned next to them:
> >
> >SCAN_SUCCEED : 1
> >SCAN_EXCEED_SHARED_PTE: 2
> >SCAN_PMD_MAPPED : 142
> >SCAN_NO_PTE_TABLE : 178
> >total progress size : 674 MB
> >Total time : 419 seconds, include khugepaged_scan_sleep_millisecs
> >
> >The khugepaged_scan list holds all tasks that are eligible for collapsing
> >into hugepages; as long as a task is not destroyed, khugepaged will not
> >remove it from the khugepaged_scan list. This leads to a situation where
> >a task has already collapsed all of its memory regions into hugepages,
> >yet khugepaged keeps scanning it, wasting CPU time for no benefit. And
> >because of khugepaged_scan_sleep_millisecs (default 10s), scanning a
> >large number of such invalid tasks makes the really valid tasks wait a
> >long time before they are scanned.
> >
> >After applying this patch, when the memory is either SCAN_PMD_MAPPED or
> >SCAN_NO_PTE_TABLE, just skip it, as follow:
> >
> >SCAN_EXCEED_SHARED_PTE: 2
> >SCAN_PMD_MAPPED : 147
> >SCAN_NO_PTE_TABLE : 173
> >total progress size : 45 MB
> >Total time : 20 seconds
> >
> >SCAN_PTE_MAPPED_HUGEPAGE is the same, for detailed data, refer to
> >https://lore.kernel.org/linux-mm/4qdu7owpmxfh3ugsue775fxarw5g2gcggbxdf5psj75nnu7z2u@cv2uu2yocaxq
> >
> >Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
> >Reviewed-by: Dev Jain <dev.jain@arm.com>
> >---
> > mm/khugepaged.c | 42 ++++++++++++++++++++++++++++++++----------
> > 1 file changed, 32 insertions(+), 10 deletions(-)
> >
> >diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> >index e2f6b68a0011..61e25cf5424b 100644
> >--- a/mm/khugepaged.c
> >+++ b/mm/khugepaged.c
> >@@ -68,7 +68,10 @@ enum scan_result {
> > static struct task_struct *khugepaged_thread __read_mostly;
> > static DEFINE_MUTEX(khugepaged_mutex);
> >
> >-/* default scan 8*HPAGE_PMD_NR ptes (or vmas) every 10 second */
> >+/*
> >+ * default scan 8*HPAGE_PMD_NR ptes, pmd_mapped, no_pte_table or vmas
> >+ * every 10 second.
> >+ */
> > static unsigned int khugepaged_pages_to_scan __read_mostly;
> > static unsigned int khugepaged_pages_collapsed;
> > static unsigned int khugepaged_full_scans;
> >@@ -1231,7 +1234,8 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
> > }
> >
> > static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
> >- struct vm_area_struct *vma, unsigned long start_addr, bool *mmap_locked,
> >+ struct vm_area_struct *vma, unsigned long start_addr,
> >+ bool *mmap_locked, unsigned int *cur_progress,
> > struct collapse_control *cc)
> > {
> > pmd_t *pmd;
> >@@ -1247,19 +1251,27 @@ static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
> > VM_BUG_ON(start_addr & ~HPAGE_PMD_MASK);
> >
> > result = find_pmd_or_thp_or_none(mm, start_addr, &pmd);
> >- if (result != SCAN_SUCCEED)
> >+ if (result != SCAN_SUCCEED) {
> >+ if (cur_progress)
> >+ *cur_progress = 1;
> > goto out;
> >+ }
>
> How about put cur_progress in struct collapse_control?
>
> Then we don't need to check cur_progress every time before modification.
Thank you for the suggestion.
Placing it inside "struct collapse_control" makes the overall code
simpler; there also happens to be a suitable 4-byte hole, as shown below:
struct collapse_control {
bool is_khugepaged; /* 0 1 */
/* XXX 3 bytes hole, try to pack */
u32 node_load[64]; /* 4 256 */
/* XXX 4 bytes hole, try to pack */
/* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
nodemask_t alloc_nmask; /* 264 8 */
/* size: 272, cachelines: 5, members: 3 */
/* sum members: 265, holes: 2, sum holes: 7 */
/* last cacheline: 16 bytes */
};
But then "cur_progress" would be counted for both khugepaged and
madvise(MADV_COLLAPSE), even though madvise(MADV_COLLAPSE) does not
actually need the count.
David, do we want to place "cur_progress" inside the "struct collapse_control"?
If yes, it would be better to rename "cur_progress" to "pmd_progress",
as shown below:
struct collapse_control {
bool is_khugepaged;
/* Num pages scanned per node */
u32 node_load[MAX_NUMNODES];
/*
* Num pages scanned per pmd, including ptes,
* pte_mapped_hugepage, pmd_mapped or no_pte_table.
*/
unsigned int pmd_progress;
/* nodemask for allocation fallback */
nodemask_t alloc_nmask;
};
--
Cheers,
Vernon
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number
2026-02-25 14:25 ` Vernon Yang
@ 2026-02-25 14:29 ` David Hildenbrand (Arm)
2026-02-26 14:31 ` Vernon Yang
0 siblings, 1 reply; 41+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-25 14:29 UTC (permalink / raw)
To: Vernon Yang, Wei Yang
Cc: akpm, lorenzo.stoakes, ziy, dev.jain, baohua, lance.yang,
linux-mm, linux-kernel, Vernon Yang
On 2/25/26 15:25, Vernon Yang wrote:
> On Tue, Feb 24, 2026 at 03:52:47AM +0000, Wei Yang wrote:
>> On Sat, Feb 21, 2026 at 05:39:16PM +0800, Vernon Yang wrote:
>>> From: Vernon Yang <yanglincheng@kylinos.cn>
>>>
>>> Currently, each scan always increases "progress" by HPAGE_PMD_NR,
>>> even if only scanning a single PTE/PMD entry.
>>>
>>> - When only scanning a single PTE entry, let me provide a detailed
>>> example:
>>>
>>> static int hpage_collapse_scan_pmd()
>>> {
>>> for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>>> _pte++, addr += PAGE_SIZE) {
>>> pte_t pteval = ptep_get(_pte);
>>> ...
>>> if (pte_uffd_wp(pteval)) { <-- first scan hit
>>> result = SCAN_PTE_UFFD_WP;
>>> goto out_unmap;
>>> }
>>> }
>>> }
>>>
>>> During the first scan, if pte_uffd_wp(pteval) is true, the loop exits
>>> directly. In practice, only one PTE is scanned before termination.
>>> Here, "progress += 1" reflects the actual number of PTEs scanned, but
>>> previously "progress += HPAGE_PMD_NR" always.
>>>
>>> - When the memory has been collapsed to PMD, let me provide a detailed
>>> example:
>>>
>>> The following data is traced by bpftrace on a desktop system. After
>>> the system has been left idle for 10 minutes upon booting, a lot of
>>> SCAN_PMD_MAPPED or SCAN_NO_PTE_TABLE are observed during a full scan
>>> by khugepaged.
>>>
>>> From trace_mm_khugepaged_scan_pmd and trace_mm_khugepaged_scan_file, the
>>> following statuses were observed, with frequency mentioned next to them:
>>>
>>> SCAN_SUCCEED : 1
>>> SCAN_EXCEED_SHARED_PTE: 2
>>> SCAN_PMD_MAPPED : 142
>>> SCAN_NO_PTE_TABLE : 178
>>> total progress size : 674 MB
>>> Total time : 419 seconds, include khugepaged_scan_sleep_millisecs
>>>
>>> The khugepaged_scan list holds all tasks that are eligible for collapsing
>>> into hugepages; as long as a task is not destroyed, khugepaged will not
>>> remove it from the khugepaged_scan list. This leads to a situation where
>>> a task has already collapsed all of its memory regions into hugepages,
>>> yet khugepaged keeps scanning it, wasting CPU time for no benefit. And
>>> because of khugepaged_scan_sleep_millisecs (default 10s), scanning a
>>> large number of such invalid tasks makes the really valid tasks wait a
>>> long time before they are scanned.
>>>
>>> After applying this patch, when the memory is either SCAN_PMD_MAPPED or
>>> SCAN_NO_PTE_TABLE, just skip it, as follow:
>>>
>>> SCAN_EXCEED_SHARED_PTE: 2
>>> SCAN_PMD_MAPPED : 147
>>> SCAN_NO_PTE_TABLE : 173
>>> total progress size : 45 MB
>>> Total time : 20 seconds
>>>
>>> SCAN_PTE_MAPPED_HUGEPAGE is the same, for detailed data, refer to
>>> https://lore.kernel.org/linux-mm/4qdu7owpmxfh3ugsue775fxarw5g2gcggbxdf5psj75nnu7z2u@cv2uu2yocaxq
>>>
>>> Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
>>> Reviewed-by: Dev Jain <dev.jain@arm.com>
>>> ---
>>> mm/khugepaged.c | 42 ++++++++++++++++++++++++++++++++----------
>>> 1 file changed, 32 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>> index e2f6b68a0011..61e25cf5424b 100644
>>> --- a/mm/khugepaged.c
>>> +++ b/mm/khugepaged.c
>>> @@ -68,7 +68,10 @@ enum scan_result {
>>> static struct task_struct *khugepaged_thread __read_mostly;
>>> static DEFINE_MUTEX(khugepaged_mutex);
>>>
>>> -/* default scan 8*HPAGE_PMD_NR ptes (or vmas) every 10 second */
>>> +/*
>>> + * default scan 8*HPAGE_PMD_NR ptes, pmd_mapped, no_pte_table or vmas
>>> + * every 10 second.
>>> + */
>>> static unsigned int khugepaged_pages_to_scan __read_mostly;
>>> static unsigned int khugepaged_pages_collapsed;
>>> static unsigned int khugepaged_full_scans;
>>> @@ -1231,7 +1234,8 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
>>> }
>>>
>>> static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
>>> - struct vm_area_struct *vma, unsigned long start_addr, bool *mmap_locked,
>>> + struct vm_area_struct *vma, unsigned long start_addr,
>>> + bool *mmap_locked, unsigned int *cur_progress,
>>> struct collapse_control *cc)
>>> {
>>> pmd_t *pmd;
>>> @@ -1247,19 +1251,27 @@ static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
>>> VM_BUG_ON(start_addr & ~HPAGE_PMD_MASK);
>>>
>>> result = find_pmd_or_thp_or_none(mm, start_addr, &pmd);
>>> - if (result != SCAN_SUCCEED)
>>> + if (result != SCAN_SUCCEED) {
>>> + if (cur_progress)
>>> + *cur_progress = 1;
>>> goto out;
>>> + }
>>
>> How about put cur_progress in struct collapse_control?
>>
>> Then we don't need to check cur_progress every time before modification.
>
> Thank you for suggestion.
>
> Placing it inside "struct collapse_control" makes the overall code
> simpler, there also coincidentally has a 4-bytes hole, as shown below:
>
> struct collapse_control {
> bool is_khugepaged; /* 0 1 */
>
> /* XXX 3 bytes hole, try to pack */
>
> u32 node_load[64]; /* 4 256 */
>
> /* XXX 4 bytes hole, try to pack */
>
> /* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
> nodemask_t alloc_nmask; /* 264 8 */
>
> /* size: 272, cachelines: 5, members: 3 */
> /* sum members: 265, holes: 2, sum holes: 7 */
> /* last cacheline: 16 bytes */
> };
>
> But regardless of khugepaged or madvise(MADV_COLLAPSE), "cur_progress"
> will be counted, while madvise(MADV_COLLAPSE) actually does not need to
> be counted.
>
> David, do we want to place "cur_progress" inside the "struct collapse_control"?
Might end up looking nicer code-wise. But the reset semantics (within a
pmd) are a bit weird.
> If Yes, it would be better to rename "cur_progress" to "pmd_progress",
> as show below:
>
"pmd_progress" is misleading. "progress_in_pmd" might be clearer.
Play with it to see if it looks better :)
--
Cheers,
David
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH mm-new v8 4/4] mm: khugepaged: skip lazy-free folios
2026-02-23 20:10 ` Barry Song
@ 2026-02-26 7:55 ` Vernon Yang
2026-03-16 19:41 ` Andrew Morton
0 siblings, 1 reply; 41+ messages in thread
From: Vernon Yang @ 2026-02-26 7:55 UTC (permalink / raw)
To: Barry Song, akpm
Cc: david, lorenzo.stoakes, ziy, dev.jain, lance.yang, linux-mm,
linux-kernel, Vernon Yang
On Tue, Feb 24, 2026 at 04:10:47AM +0800, Barry Song wrote:
> On Sat, Feb 21, 2026 at 9:39 PM Vernon Yang <vernon2gm@gmail.com> wrote:
> >
> > On Sat, Feb 21, 2026 at 06:27:36PM +0800, Barry Song wrote:
> > > On Sat, Feb 21, 2026 at 5:40 PM Vernon Yang <vernon2gm@gmail.com> wrote:
> > > >
> > > > From: Vernon Yang <yanglincheng@kylinos.cn>
> > > >
> > > > For example, create three task: hot1 -> cold -> hot2. After all three
> > > > task are created, each allocate memory 128MB. the hot1/hot2 task
> > > > continuously access 128 MB memory, while the cold task only accesses
> > > > its memory briefly and then call madvise(MADV_FREE). However, khugepaged
> > > > still prioritizes scanning the cold task and only scans the hot2 task
> > > > after completing the scan of the cold task.
> > > >
> > > > And if all folios in VM_DROPPABLE are lazyfree, Collapsing maintains
^^
here
> > > > that property, so we can just collapse and memory pressure in the future
> > >
> > > I don’t think this is accurate. A VMA without VM_DROPPABLE
> > > can still have all folios marked as lazyfree. Therefore, having
> > > all folios lazyfree is not the reason why collapsing preserves
> > > the property.
> >
> > In folio_add_new_anon_rmap(), we know that the vma has the VM_DROPPABLE
> > attribute, which is the root reason why Collapsing maintains that property.
> > The above commit log clearly states "all folios in VM_DROPPABLE are lazyfree"
> > ^^^^^^^^^^^^^^^
> > (the "if" is redundant and should be removed), not "all folios are lazyfree".
>
> Yes, we should remove the if; otherwise, it’s misleading.
>
> [...]
>
> > >
> > > I would prefer to add a comment about VM_DROPPABLE here
> > > rather than only mentioning it in the changelog.
> >
> > Is the following comment clear?
> >
> > /*
> > * If the vma has the VM_DROPPABLE flag, the collapse will
> > * preserve the lazyfree property without needing to skip.
> > */
>
> Looks good to me.
Hi Andrew, could you please squash the following fix into this patch?
Also, please remove the "if" in the changelog above.
---
From ab5060c7be655dd00bf3a9abc779915922b2f969 Mon Sep 17 00:00:00 2001
From: Vernon Yang <yanglincheng@kylinos.cn>
Date: Thu, 26 Feb 2026 13:18:39 +0800
Subject: [PATCH] fixup! mm: khugepaged: skip lazy-free folios
Add a comment about VM_DROPPABLE in the code to make it clearer.
Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
---
mm/khugepaged.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index c85d7381adb5..7c1642fbe394 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -575,6 +575,10 @@ static enum scan_result __collapse_huge_page_isolate(struct vm_area_struct *vma,
folio = page_folio(page);
VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
+ /*
+ * If the vma has the VM_DROPPABLE flag, the collapse will
+ * preserve the lazyfree property without needing to skip.
+ */
if (cc->is_khugepaged && !(vma->vm_flags & VM_DROPPABLE) &&
folio_test_lazyfree(folio) && !pte_dirty(pteval)) {
result = SCAN_PAGE_LAZYFREE;
@@ -1333,6 +1337,10 @@ static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
}
folio = page_folio(page);
+ /*
+ * If the vma has the VM_DROPPABLE flag, the collapse will
+ * preserve the lazyfree property without needing to skip.
+ */
if (cc->is_khugepaged && !(vma->vm_flags & VM_DROPPABLE) &&
folio_test_lazyfree(folio) && !pte_dirty(pteval)) {
result = SCAN_PAGE_LAZYFREE;
--
2.51.0
^ permalink raw reply related [flat|nested] 41+ messages in thread
* Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number
2026-02-25 14:29 ` David Hildenbrand (Arm)
@ 2026-02-26 14:31 ` Vernon Yang
2026-02-26 15:45 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 41+ messages in thread
From: Vernon Yang @ 2026-02-26 14:31 UTC (permalink / raw)
To: David Hildenbrand (Arm), akpm
Cc: Wei Yang, lorenzo.stoakes, ziy, dev.jain, baohua, lance.yang,
linux-mm, linux-kernel, Vernon Yang
On Wed, Feb 25, 2026 at 03:29:05PM +0100, David Hildenbrand (Arm) wrote:
> On 2/25/26 15:25, Vernon Yang wrote:
> > On Tue, Feb 24, 2026 at 03:52:47AM +0000, Wei Yang wrote:
> >> On Sat, Feb 21, 2026 at 05:39:16PM +0800, Vernon Yang wrote:
> >>> From: Vernon Yang <yanglincheng@kylinos.cn>
> >>>
> >>> Currently, each scan always increases "progress" by HPAGE_PMD_NR,
> >>> even if only scanning a single PTE/PMD entry.
> >>>
> >>> - When only scanning a single PTE entry, let me provide a detailed
> >>> example:
> >>>
> >>> static int hpage_collapse_scan_pmd()
> >>> {
> >>> for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
> >>> _pte++, addr += PAGE_SIZE) {
> >>> pte_t pteval = ptep_get(_pte);
> >>> ...
> >>> if (pte_uffd_wp(pteval)) { <-- first scan hit
> >>> result = SCAN_PTE_UFFD_WP;
> >>> goto out_unmap;
> >>> }
> >>> }
> >>> }
> >>>
> >>> During the first scan, if pte_uffd_wp(pteval) is true, the loop exits
> >>> directly. In practice, only one PTE is scanned before termination.
> >>> Here, "progress += 1" reflects the actual number of PTEs scanned, whereas
> >>> previously it was always "progress += HPAGE_PMD_NR".
> >>>
> >>> - When the memory has been collapsed to PMD, let me provide a detailed
> >>> example:
> >>>
> >>> The following data is traced by bpftrace on a desktop system. After
> >>> the system has been left idle for 10 minutes upon booting, a lot of
> >>> SCAN_PMD_MAPPED or SCAN_NO_PTE_TABLE are observed during a full scan
> >>> by khugepaged.
> >>>
> >>> From trace_mm_khugepaged_scan_pmd and trace_mm_khugepaged_scan_file, the
> >>> following statuses were observed, with frequency mentioned next to them:
> >>>
> >>> SCAN_SUCCEED : 1
> >>> SCAN_EXCEED_SHARED_PTE: 2
> >>> SCAN_PMD_MAPPED : 142
> >>> SCAN_NO_PTE_TABLE : 178
> >>> total progress size : 674 MB
> >>> Total time : 419 seconds, include khugepaged_scan_sleep_millisecs
> >>>
> >>> The khugepaged_scan list holds all tasks eligible for collapsing into
> >>> hugepages; as long as a task is not destroyed, khugepaged will not remove
> >>> it from the khugepaged_scan list. This leads to a situation where a task
> >>> has already collapsed all of its memory regions into hugepages, yet
> >>> khugepaged keeps scanning it, wasting CPU time to no effect; and because
> >>> of khugepaged_scan_sleep_millisecs (default 10s), scanning a large number
> >>> of such invalid tasks makes the genuinely valid tasks wait a long time
> >>> before they are scanned.
> >>>
> >>> After applying this patch, when the memory is either SCAN_PMD_MAPPED or
> >>> SCAN_NO_PTE_TABLE, just skip it, as follow:
> >>>
> >>> SCAN_EXCEED_SHARED_PTE: 2
> >>> SCAN_PMD_MAPPED : 147
> >>> SCAN_NO_PTE_TABLE : 173
> >>> total progress size : 45 MB
> >>> Total time : 20 seconds
> >>>
> >>> SCAN_PTE_MAPPED_HUGEPAGE is the same, for detailed data, refer to
> >>> https://lore.kernel.org/linux-mm/4qdu7owpmxfh3ugsue775fxarw5g2gcggbxdf5psj75nnu7z2u@cv2uu2yocaxq
> >>>
> >>> Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
> >>> Reviewed-by: Dev Jain <dev.jain@arm.com>
> >>> ---
> >>> mm/khugepaged.c | 42 ++++++++++++++++++++++++++++++++----------
> >>> 1 file changed, 32 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> >>> index e2f6b68a0011..61e25cf5424b 100644
> >>> --- a/mm/khugepaged.c
> >>> +++ b/mm/khugepaged.c
> >>> @@ -68,7 +68,10 @@ enum scan_result {
> >>> static struct task_struct *khugepaged_thread __read_mostly;
> >>> static DEFINE_MUTEX(khugepaged_mutex);
> >>>
> >>> -/* default scan 8*HPAGE_PMD_NR ptes (or vmas) every 10 second */
> >>> +/*
> >>> + * default scan 8*HPAGE_PMD_NR ptes, pmd_mapped, no_pte_table or vmas
> >>> + * every 10 second.
> >>> + */
> >>> static unsigned int khugepaged_pages_to_scan __read_mostly;
> >>> static unsigned int khugepaged_pages_collapsed;
> >>> static unsigned int khugepaged_full_scans;
> >>> @@ -1231,7 +1234,8 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
> >>> }
> >>>
> >>> static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
> >>> - struct vm_area_struct *vma, unsigned long start_addr, bool *mmap_locked,
> >>> + struct vm_area_struct *vma, unsigned long start_addr,
> >>> + bool *mmap_locked, unsigned int *cur_progress,
> >>> struct collapse_control *cc)
> >>> {
> >>> pmd_t *pmd;
> >>> @@ -1247,19 +1251,27 @@ static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
> >>> VM_BUG_ON(start_addr & ~HPAGE_PMD_MASK);
> >>>
> >>> result = find_pmd_or_thp_or_none(mm, start_addr, &pmd);
> >>> - if (result != SCAN_SUCCEED)
> >>> + if (result != SCAN_SUCCEED) {
> >>> + if (cur_progress)
> >>> + *cur_progress = 1;
> >>> goto out;
> >>> + }
> >>
> >> How about put cur_progress in struct collapse_control?
> >>
> >> Then we don't need to check cur_progress every time before modification.
> >
> > Thank you for suggestion.
> >
> > Placing it inside "struct collapse_control" makes the overall code
> > simpler, there also coincidentally has a 4-bytes hole, as shown below:
> >
> > struct collapse_control {
> > bool is_khugepaged; /* 0 1 */
> >
> > /* XXX 3 bytes hole, try to pack */
> >
> > u32 node_load[64]; /* 4 256 */
> >
> > /* XXX 4 bytes hole, try to pack */
> >
> > /* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
> > nodemask_t alloc_nmask; /* 264 8 */
> >
> > /* size: 272, cachelines: 5, members: 3 */
> > /* sum members: 265, holes: 2, sum holes: 7 */
> > /* last cacheline: 16 bytes */
> > };
> >
> > But regardless of khugepaged or madvise(MADV_COLLAPSE), "cur_progress"
> > will be counted, while madvise(MADV_COLLAPSE) actually does not need to
> > be counted.
> >
> > David, do we want to place "cur_progress" inside the "struct collapse_control"?
>
> Might end up looking nicer code-wise. But the reset semantics (within a
> pmd) are a bit weird.
>
> > If Yes, it would be better to rename "cur_progress" to "pmd_progress",
> > as shown below:
> >
>
> "pmd_progress" is misleading. "progress_in_pmd" might be clearer.
>
> Play with it to see if it looks better :)
Hi Andrew, David,
Based on previous discussions [1], v2 is as follows, and testing shows the
same performance benefits. It just makes the code cleaner; no functional
changes.
If David has no further revisions, Andrew, could you please squash the
following cleanup into this patch? If you prefer a new version, please let
me know. Thanks.
[1] https://lore.kernel.org/linux-mm/zdvzmoop5xswqcyiwmvvrdfianm4ccs3gryfecwbm4bhuh7ebo@7an4huwgbuwo
---
From 73e6aa8ffcd5ac1ee510938ff4bdbd24edc86680 Mon Sep 17 00:00:00 2001
From: Vernon Yang <yanglincheng@kylinos.cn>
Date: Thu, 26 Feb 2026 18:24:21 +0800
Subject: [PATCH] mm: khugepaged: simplify scanning progress
Placing "progress" inside "struct collapse_control" makes the overall
code simpler; coincidentally, the struct also has a 4-byte hole that it
can fill, as shown below:
struct collapse_control {
bool is_khugepaged; /* 0 1 */
/* XXX 3 bytes hole, try to pack */
u32 node_load[64]; /* 4 256 */
/* XXX 4 bytes hole, try to pack */
/* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
nodemask_t alloc_nmask; /* 264 8 */
/* size: 272, cachelines: 5, members: 3 */
/* sum members: 265, holes: 2, sum holes: 7 */
/* last cacheline: 16 bytes */
};
No functional changes.
Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
---
mm/khugepaged.c | 78 ++++++++++++++++++++++---------------------------
1 file changed, 35 insertions(+), 43 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 7c1642fbe394..13b0fe50dfc5 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -70,8 +70,8 @@ static struct task_struct *khugepaged_thread __read_mostly;
static DEFINE_MUTEX(khugepaged_mutex);
/*
- * default scan 8*HPAGE_PMD_NR ptes, pmd_mapped, no_pte_table or vmas
- * every 10 second.
+ * default scan 8*HPAGE_PMD_NR ptes, pte_mapped_hugepage, pmd_mapped,
+ * no_pte_table or vmas every 10 second.
*/
static unsigned int khugepaged_pages_to_scan __read_mostly;
static unsigned int khugepaged_pages_collapsed;
@@ -104,6 +104,9 @@ struct collapse_control {
/* Num pages scanned per node */
u32 node_load[MAX_NUMNODES];
+ /* Num pages scanned (see khugepaged_pages_to_scan) */
+ unsigned int progress;
+
/* nodemask for allocation fallback */
nodemask_t alloc_nmask;
};
@@ -1246,8 +1249,7 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
struct vm_area_struct *vma, unsigned long start_addr,
- bool *mmap_locked, unsigned int *cur_progress,
- struct collapse_control *cc)
+ bool *mmap_locked, struct collapse_control *cc)
{
pmd_t *pmd;
pte_t *pte, *_pte;
@@ -1263,8 +1265,7 @@ static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
result = find_pmd_or_thp_or_none(mm, start_addr, &pmd);
if (result != SCAN_SUCCEED) {
- if (cur_progress)
- *cur_progress = 1;
+ cc->progress++;
goto out;
}
@@ -1272,16 +1273,14 @@ static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
nodes_clear(cc->alloc_nmask);
pte = pte_offset_map_lock(mm, pmd, start_addr, &ptl);
if (!pte) {
- if (cur_progress)
- *cur_progress = 1;
+ cc->progress++;
result = SCAN_NO_PTE_TABLE;
goto out;
}
for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
_pte++, addr += PAGE_SIZE) {
- if (cur_progress)
- *cur_progress += 1;
+ cc->progress++;
pte_t pteval = ptep_get(_pte);
if (pte_none_or_zero(pteval)) {
@@ -2314,7 +2313,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm,
unsigned long addr, struct file *file, pgoff_t start,
- unsigned int *cur_progress, struct collapse_control *cc)
+ struct collapse_control *cc)
{
struct folio *folio = NULL;
struct address_space *mapping = file->f_mapping;
@@ -2404,12 +2403,10 @@ static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm,
}
}
rcu_read_unlock();
- if (cur_progress) {
- if (result == SCAN_PTE_MAPPED_HUGEPAGE)
- *cur_progress = 1;
- else
- *cur_progress = HPAGE_PMD_NR;
- }
+ if (result == SCAN_PTE_MAPPED_HUGEPAGE)
+ cc->progress++;
+ else
+ cc->progress += HPAGE_PMD_NR;
if (result == SCAN_SUCCEED) {
if (cc->is_khugepaged &&
@@ -2425,8 +2422,8 @@ static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm,
return result;
}
-static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result *result,
- struct collapse_control *cc)
+static void khugepaged_scan_mm_slot(unsigned int progress_max,
+ enum scan_result *result, struct collapse_control *cc)
__releases(&khugepaged_mm_lock)
__acquires(&khugepaged_mm_lock)
{
@@ -2434,9 +2431,8 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
struct mm_slot *slot;
struct mm_struct *mm;
struct vm_area_struct *vma;
- int progress = 0;
+ unsigned int progress_prev = cc->progress;
- VM_BUG_ON(!pages);
lockdep_assert_held(&khugepaged_mm_lock);
*result = SCAN_FAIL;
@@ -2459,7 +2455,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
if (unlikely(!mmap_read_trylock(mm)))
goto breakouterloop_mmap_lock;
- progress++;
+ cc->progress++;
if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
goto breakouterloop;
@@ -2469,17 +2465,17 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
cond_resched();
if (unlikely(hpage_collapse_test_exit_or_disable(mm))) {
- progress++;
+ cc->progress++;
break;
}
if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_KHUGEPAGED, PMD_ORDER)) {
- progress++;
+ cc->progress++;
continue;
}
hstart = round_up(vma->vm_start, HPAGE_PMD_SIZE);
hend = round_down(vma->vm_end, HPAGE_PMD_SIZE);
if (khugepaged_scan.address > hend) {
- progress++;
+ cc->progress++;
continue;
}
if (khugepaged_scan.address < hstart)
@@ -2488,7 +2484,6 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
while (khugepaged_scan.address < hend) {
bool mmap_locked = true;
- unsigned int cur_progress = 0;
cond_resched();
if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
@@ -2505,8 +2500,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
mmap_read_unlock(mm);
mmap_locked = false;
*result = hpage_collapse_scan_file(mm,
- khugepaged_scan.address, file, pgoff,
- &cur_progress, cc);
+ khugepaged_scan.address, file, pgoff, cc);
fput(file);
if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
mmap_read_lock(mm);
@@ -2520,8 +2514,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
}
} else {
*result = hpage_collapse_scan_pmd(mm, vma,
- khugepaged_scan.address, &mmap_locked,
- &cur_progress, cc);
+ khugepaged_scan.address, &mmap_locked, cc);
}
if (*result == SCAN_SUCCEED)
@@ -2529,7 +2522,6 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
/* move to next address */
khugepaged_scan.address += HPAGE_PMD_SIZE;
- progress += cur_progress;
if (!mmap_locked)
/*
* We released mmap_lock so break loop. Note
@@ -2539,7 +2531,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
* correct result back to caller.
*/
goto breakouterloop_mmap_lock;
- if (progress >= pages)
+ if (cc->progress >= progress_max)
goto breakouterloop;
}
}
@@ -2570,9 +2562,8 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
collect_mm_slot(slot);
}
- trace_mm_khugepaged_scan(mm, progress, khugepaged_scan.mm_slot == NULL);
-
- return progress;
+ trace_mm_khugepaged_scan(mm, cc->progress - progress_prev,
+ khugepaged_scan.mm_slot == NULL);
}
static int khugepaged_has_work(void)
@@ -2588,13 +2579,14 @@ static int khugepaged_wait_event(void)
static void khugepaged_do_scan(struct collapse_control *cc)
{
- unsigned int progress = 0, pass_through_head = 0;
- unsigned int pages = READ_ONCE(khugepaged_pages_to_scan);
+ const unsigned int progress_max = READ_ONCE(khugepaged_pages_to_scan);
+ unsigned int pass_through_head = 0;
bool wait = true;
enum scan_result result = SCAN_SUCCEED;
lru_add_drain_all();
+ cc->progress = 0;
while (true) {
cond_resched();
@@ -2606,13 +2598,12 @@ static void khugepaged_do_scan(struct collapse_control *cc)
pass_through_head++;
if (khugepaged_has_work() &&
pass_through_head < 2)
- progress += khugepaged_scan_mm_slot(pages - progress,
- &result, cc);
+ khugepaged_scan_mm_slot(progress_max, &result, cc);
else
- progress = pages;
+ cc->progress = progress_max;
spin_unlock(&khugepaged_mm_lock);
- if (progress >= pages)
+ if (cc->progress >= progress_max)
break;
if (result == SCAN_ALLOC_HUGE_PAGE_FAIL) {
@@ -2818,6 +2809,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
if (!cc)
return -ENOMEM;
cc->is_khugepaged = false;
+ cc->progress = 0;
mmgrab(mm);
lru_add_drain_all();
@@ -2852,7 +2844,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
mmap_locked = false;
*lock_dropped = true;
result = hpage_collapse_scan_file(mm, addr, file, pgoff,
- NULL, cc);
+ cc);
if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb &&
mapping_can_writeback(file->f_mapping)) {
@@ -2867,7 +2859,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
fput(file);
} else {
result = hpage_collapse_scan_pmd(mm, vma, addr,
- &mmap_locked, NULL, cc);
+ &mmap_locked, cc);
}
if (!mmap_locked)
*lock_dropped = true;
--
2.51.0
* Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number
2026-02-26 14:31 ` Vernon Yang
@ 2026-02-26 15:45 ` David Hildenbrand (Arm)
2026-02-26 17:15 ` Vernon Yang
0 siblings, 1 reply; 41+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-26 15:45 UTC (permalink / raw)
To: Vernon Yang, akpm
Cc: Wei Yang, lorenzo.stoakes, ziy, dev.jain, baohua, lance.yang,
linux-mm, linux-kernel, Vernon Yang
On 2/26/26 15:31, Vernon Yang wrote:
> On Wed, Feb 25, 2026 at 03:29:05PM +0100, David Hildenbrand (Arm) wrote:
>> On 2/25/26 15:25, Vernon Yang wrote:
>>>
>>> Thank you for suggestion.
>>>
>>> Placing it inside "struct collapse_control" makes the overall code
>>> simpler, there also coincidentally has a 4-bytes hole, as shown below:
>>>
>>> struct collapse_control {
>>> bool is_khugepaged; /* 0 1 */
>>>
>>> /* XXX 3 bytes hole, try to pack */
>>>
>>> u32 node_load[64]; /* 4 256 */
>>>
>>> /* XXX 4 bytes hole, try to pack */
>>>
>>> /* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
>>> nodemask_t alloc_nmask; /* 264 8 */
>>>
>>> /* size: 272, cachelines: 5, members: 3 */
>>> /* sum members: 265, holes: 2, sum holes: 7 */
>>> /* last cacheline: 16 bytes */
>>> };
>>>
>>> But regardless of khugepaged or madvise(MADV_COLLAPSE), "cur_progress"
>>> will be counted, while madvise(MADV_COLLAPSE) actually does not need to
>>> be counted.
>>>
>>> David, do we want to place "cur_progress" inside the "struct collapse_control"?
>>
>> Might end up looking nicer code-wise. But the reset semantics (within a
>> pmd) are a bit weird.
>>
>>> If Yes, it would be better to rename "cur_progress" to "pmd_progress",
>>> as show below:
>>>
>>
>> "pmd_progress" is misleading. "progress_in_pmd" might be clearer.
>>
>> Play with it to see if it looks better :)
>
> Hi Andrew, David,
>
> Based on previous discussions [1], v2 as follow, and testing shows the
> same performance benefits. Just make code cleaner, no function changes.
>
> If David has no further revisions, Andrew, could you please squash the
> following clean into this patch? If you prefer a new version, please let
> me know. Thanks.
Do we also have to update the resulting patch description? Patch itself
LGTM.
--
Cheers,
David
* Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number
2026-02-26 15:45 ` David Hildenbrand (Arm)
@ 2026-02-26 17:15 ` Vernon Yang
2026-03-25 14:10 ` Lorenzo Stoakes (Oracle)
0 siblings, 1 reply; 41+ messages in thread
From: Vernon Yang @ 2026-02-26 17:15 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: akpm, Wei Yang, lorenzo.stoakes, ziy, dev.jain, baohua,
lance.yang, linux-mm, linux-kernel, Vernon Yang
On Thu, Feb 26, 2026 at 11:45 PM David Hildenbrand (Arm)
<david@kernel.org> wrote:
>
> On 2/26/26 15:31, Vernon Yang wrote:
> > On Wed, Feb 25, 2026 at 03:29:05PM +0100, David Hildenbrand (Arm) wrote:
> >> On 2/25/26 15:25, Vernon Yang wrote:
> >>>
> >>> Thank you for suggestion.
> >>>
> >>> Placing it inside "struct collapse_control" makes the overall code
> >>> simpler, there also coincidentally has a 4-bytes hole, as shown below:
> >>>
> >>> struct collapse_control {
> >>> bool is_khugepaged; /* 0 1 */
> >>>
> >>> /* XXX 3 bytes hole, try to pack */
> >>>
> >>> u32 node_load[64]; /* 4 256 */
> >>>
> >>> /* XXX 4 bytes hole, try to pack */
> >>>
> >>> /* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
> >>> nodemask_t alloc_nmask; /* 264 8 */
> >>>
> >>> /* size: 272, cachelines: 5, members: 3 */
> >>> /* sum members: 265, holes: 2, sum holes: 7 */
> >>> /* last cacheline: 16 bytes */
> >>> };
> >>>
> >>> But regardless of khugepaged or madvise(MADV_COLLAPSE), "cur_progress"
> >>> will be counted, while madvise(MADV_COLLAPSE) actually does not need to
> >>> be counted.
> >>>
> >>> David, do we want to place "cur_progress" inside the "struct collapse_control"?
> >>
> >> Might end up looking nicer code-wise. But the reset semantics (within a
> >> pmd) are a bit weird.
> >>
> >>> If Yes, it would be better to rename "cur_progress" to "pmd_progress",
> >>> as show below:
> >>>
> >>
> >> "pmd_progress" is misleading. "progress_in_pmd" might be clearer.
> >>
> >> Play with it to see if it looks better :)
> >
> > Hi Andrew, David,
> >
> > Based on previous discussions [1], v2 as follow, and testing shows the
> > same performance benefits. Just make code cleaner, no function changes.
> >
> > If David has no further revisions, Andrew, could you please squash the
> > following clean into this patch? If you prefer a new version, please let
> > me know. Thanks.
>
> Do we also have to update the resulting patch description? Patch itself
> LGTM.
No need to update the patch description.
--
Cheers,
Vernon
* Re: [PATCH mm-new v8 4/4] mm: khugepaged: skip lazy-free folios
2026-02-26 7:55 ` Vernon Yang
@ 2026-03-16 19:41 ` Andrew Morton
2026-03-17 2:16 ` Vernon Yang
0 siblings, 1 reply; 41+ messages in thread
From: Andrew Morton @ 2026-03-16 19:41 UTC (permalink / raw)
To: Vernon Yang
Cc: Barry Song, david, lorenzo.stoakes, ziy, dev.jain, lance.yang,
linux-mm, linux-kernel, Vernon Yang
On Thu, 26 Feb 2026 15:55:45 +0800 Vernon Yang <vernon2gm@gmail.com> wrote:
> Hi Andrew, could you please squash the following fix into this patch?
yup.
> also remove "if" in the changelog above.
So you want it like this?
: All folios in VM_DROPPABLE are lazyfree, Collapsing maintains that
: property, so we can just collapse and memory pressure in the future will
: free it up. In contrast, collapsing in !VM_DROPPABLE does not maintain
: that property, the collapsed folio will not be lazyfree and memory
: pressure in the future will not be able to free it up.
* Re: [PATCH mm-new v8 4/4] mm: khugepaged: skip lazy-free folios
2026-03-16 19:41 ` Andrew Morton
@ 2026-03-17 2:16 ` Vernon Yang
0 siblings, 0 replies; 41+ messages in thread
From: Vernon Yang @ 2026-03-17 2:16 UTC (permalink / raw)
To: Andrew Morton
Cc: Barry Song, david, lorenzo.stoakes, ziy, dev.jain, lance.yang,
linux-mm, linux-kernel, Vernon Yang
On Mon, Mar 16, 2026 at 12:41:57PM -0700, Andrew Morton wrote:
> On Thu, 26 Feb 2026 15:55:45 +0800 Vernon Yang <vernon2gm@gmail.com> wrote:
>
> > Hi Andrew, could you please squash the following fix into this patch?
>
> yup.
>
> > also remove "if" in the changelog above.
>
> So you want it like this?
Yes, we should remove the "if"; otherwise, it’s misleading.
> : All folios in VM_DROPPABLE are lazyfree, Collapsing maintains that
> : property, so we can just collapse and memory pressure in the future will
> : free it up. In contrast, collapsing in !VM_DROPPABLE does not maintain
> : that property, the collapsed folio will not be lazyfree and memory
> : pressure in the future will not be able to free it up.
>
LGTM, Thanks!
--
Cheers,
Vernon
* Re: [PATCH mm-new v8 1/4] mm: khugepaged: add trace_mm_khugepaged_scan event
2026-02-21 9:39 ` [PATCH mm-new v8 1/4] mm: khugepaged: add trace_mm_khugepaged_scan event Vernon Yang
@ 2026-03-25 14:06 ` Lorenzo Stoakes (Oracle)
0 siblings, 0 replies; 41+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-25 14:06 UTC (permalink / raw)
To: Vernon Yang
Cc: akpm, david, lorenzo.stoakes, ziy, dev.jain, baohua, lance.yang,
linux-mm, linux-kernel, Vernon Yang
On Sat, Feb 21, 2026 at 05:39:15PM +0800, Vernon Yang wrote:
> From: Vernon Yang <yanglincheng@kylinos.cn>
>
> Add mm_khugepaged_scan event to track the total time for full scan
> and the total number of pages scanned of khugepaged.
>
> Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
> Reviewed-by: Barry Song <baohua@kernel.org>
> Reviewed-by: Lance Yang <lance.yang@linux.dev>
> Reviewed-by: Dev Jain <dev.jain@arm.com>
LGTM, so:
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> ---
> include/trace/events/huge_memory.h | 25 +++++++++++++++++++++++++
> mm/khugepaged.c | 2 ++
> 2 files changed, 27 insertions(+)
>
> diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
> index 4e41bff31888..384e29f6bef0 100644
> --- a/include/trace/events/huge_memory.h
> +++ b/include/trace/events/huge_memory.h
> @@ -237,5 +237,30 @@ TRACE_EVENT(mm_khugepaged_collapse_file,
> __print_symbolic(__entry->result, SCAN_STATUS))
> );
>
> +TRACE_EVENT(mm_khugepaged_scan,
> +
> + TP_PROTO(struct mm_struct *mm, unsigned int progress,
> + bool full_scan_finished),
> +
> + TP_ARGS(mm, progress, full_scan_finished),
> +
> + TP_STRUCT__entry(
> + __field(struct mm_struct *, mm)
> + __field(unsigned int, progress)
> + __field(bool, full_scan_finished)
> + ),
> +
> + TP_fast_assign(
> + __entry->mm = mm;
> + __entry->progress = progress;
> + __entry->full_scan_finished = full_scan_finished;
> + ),
> +
> + TP_printk("mm=%p, progress=%u, full_scan_finished=%d",
> + __entry->mm,
> + __entry->progress,
> + __entry->full_scan_finished)
> +);
> +
> #endif /* __HUGE_MEMORY_H */
> #include <trace/define_trace.h>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index c0f893bebcff..e2f6b68a0011 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2527,6 +2527,8 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
> collect_mm_slot(slot);
> }
>
> + trace_mm_khugepaged_scan(mm, progress, khugepaged_scan.mm_slot == NULL);
> +
> return progress;
> }
>
> --
> 2.51.0
>
* Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number
2026-02-26 17:15 ` Vernon Yang
@ 2026-03-25 14:10 ` Lorenzo Stoakes (Oracle)
2026-03-25 14:22 ` Lorenzo Stoakes (Oracle)
2026-03-25 15:09 ` Andrew Morton
0 siblings, 2 replies; 41+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-25 14:10 UTC (permalink / raw)
To: Vernon Yang
Cc: David Hildenbrand (Arm), akpm, Wei Yang, lorenzo.stoakes, ziy,
dev.jain, baohua, lance.yang, linux-mm, linux-kernel, Vernon Yang
On Fri, Feb 27, 2026 at 01:15:24AM +0800, Vernon Yang wrote:
> On Thu, Feb 26, 2026 at 11:45 PM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
> >
> > On 2/26/26 15:31, Vernon Yang wrote:
> > > On Wed, Feb 25, 2026 at 03:29:05PM +0100, David Hildenbrand (Arm) wrote:
> > >> On 2/25/26 15:25, Vernon Yang wrote:
> > >>>
> > >>> Thank you for suggestion.
> > >>>
> > >>> Placing it inside "struct collapse_control" makes the overall code
> > >>> simpler, there also coincidentally has a 4-bytes hole, as shown below:
> > >>>
> > >>> struct collapse_control {
> > >>> bool is_khugepaged; /* 0 1 */
> > >>>
> > >>> /* XXX 3 bytes hole, try to pack */
> > >>>
> > >>> u32 node_load[64]; /* 4 256 */
> > >>>
> > >>> /* XXX 4 bytes hole, try to pack */
> > >>>
> > >>> /* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
> > >>> nodemask_t alloc_nmask; /* 264 8 */
> > >>>
> > >>> /* size: 272, cachelines: 5, members: 3 */
> > >>> /* sum members: 265, holes: 2, sum holes: 7 */
> > >>> /* last cacheline: 16 bytes */
> > >>> };
> > >>>
> > >>> But regardless of khugepaged or madvise(MADV_COLLAPSE), "cur_progress"
> > >>> will be counted, while madvise(MADV_COLLAPSE) actually does not need to
> > >>> be counted.
> > >>>
> > >>> David, do we want to place "cur_progress" inside the "struct collapse_control"?
> > >>
> > >> Might end up looking nicer code-wise. But the reset semantics (within a
> > >> pmd) are a bit weird.
> > >>
> > >>> If Yes, it would be better to rename "cur_progress" to "pmd_progress",
> > >>> as show below:
> > >>>
> > >>
> > >> "pmd_progress" is misleading. "progress_in_pmd" might be clearer.
> > >>
> > >> Play with it to see if it looks better :)
> > >
> > > Hi Andrew, David,
> > >
> > > Based on previous discussions [1], v2 as follow, and testing shows the
> > > same performance benefits. Just make code cleaner, no function changes.
> > >
> > > If David has no further revisions, Andrew, could you please squash the
> > > following clean into this patch? If you prefer a new version, please let
> > > me know. Thanks.
> >
> > Do we also have to update the resulting patch description? Patch itself
> > LGTM.
>
> No need to update the patch description.
I will take a look at this (sorry for delay) but general point - while
fix-patches are convenient, they're incredibly anti-reviewer.
I hope at some point in the future we can move away from that so you can look at
a series on list and know that what's shown there is the actual patch.
As it stands, I can't go line-by-line correctly here without quite a bit of
additional effort.
(Not a criticism of you Vernon just a general point about mm process).
>
> --
> Cheers,
> Vernon
Thanks, Lorenzo
* Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number
2026-03-25 14:10 ` Lorenzo Stoakes (Oracle)
@ 2026-03-25 14:22 ` Lorenzo Stoakes (Oracle)
2026-03-25 15:09 ` Andrew Morton
1 sibling, 0 replies; 41+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-25 14:22 UTC (permalink / raw)
To: Vernon Yang
Cc: David Hildenbrand (Arm), akpm, Wei Yang, lorenzo.stoakes, ziy,
dev.jain, baohua, lance.yang, linux-mm, linux-kernel, Vernon Yang
On Wed, Mar 25, 2026 at 02:10:23PM +0000, Lorenzo Stoakes (Oracle) wrote:
>
> I will take a look at this (sorry for delay) but general point - while
OK well on second thoughts this is in mm-stable with 2 weeks to go (I still
have no understanding of how any of the process works) and thus is
immutable, and so I won't be reviewing this then I suppose.
Thanks, Lorenzo
* Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number
2026-03-25 14:10 ` Lorenzo Stoakes (Oracle)
2026-03-25 14:22 ` Lorenzo Stoakes (Oracle)
@ 2026-03-25 15:09 ` Andrew Morton
2026-03-25 15:17 ` David Hildenbrand (Arm)
1 sibling, 1 reply; 41+ messages in thread
From: Andrew Morton @ 2026-03-25 15:09 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle)
Cc: Vernon Yang, David Hildenbrand (Arm), Wei Yang, lorenzo.stoakes,
ziy, dev.jain, baohua, lance.yang, linux-mm, linux-kernel,
Vernon Yang
On Wed, 25 Mar 2026 14:10:23 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
> > No need to update the patch description.
>
> I will take a look at this (sorry for delay) but general point - while
> fix-patches are convenient, they're incredibly anti-reviewer.
>
> I hope at some point in the future we can move away from that so you can look at
> a series on list and know that what's shown there is the actual patch.
Oh. I've never really received that message, at least not at all
clearly.
I've been hoping that the -fix patches are actually pro-reviewer, for
those reviewers who have looked at the previous version. A full resend
of something you've already looked at is quite annoying!
I try to mitigate that by sending the
heres-what-you-changed-since-last-time replies. It's a little more
work at this end, but that's not at all a problem.
I see a couple of options here
a) I can fold the -fix into the base patch then send out the
resulting diff as a reply-to-all.
b) We can just deprecate the -fix things and ask people for full
resends.
It depends on what people prefer. How do we determine that?
* Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number
2026-03-25 15:09 ` Andrew Morton
@ 2026-03-25 15:17 ` David Hildenbrand (Arm)
2026-03-25 15:20 ` Lorenzo Stoakes (Oracle)
0 siblings, 1 reply; 41+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-25 15:17 UTC (permalink / raw)
To: Andrew Morton, Lorenzo Stoakes (Oracle)
Cc: Vernon Yang, Wei Yang, lorenzo.stoakes, ziy, dev.jain, baohua,
lance.yang, linux-mm, linux-kernel, Vernon Yang
On 3/25/26 16:09, Andrew Morton wrote:
> On Wed, 25 Mar 2026 14:10:23 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
>
>>> No need to update the patch description.
>>
>> I will take a look at this (sorry for delay) but general point - while
>> fix-patches are convenient, they're incredibly anti-reviewer.
+1, I could have sworn we brought that up before. :)
>>
>> I hope at some point in the future we can move away from that so you can look at
>> a series on list and know that what's shown there is the actual patch.
>
> Oh. I've never really received that message, at least not at all
> clearly.
>
> I've been hoping that the -fix patches are actually pro-reviewer, for
> those reviewers who have looked at the previous version. A full resend
> of something you've already looked at is quite annoying!
>
> I try to mitigate that by sending the
> heres-what-you-changed-since-last-time replies. It's a little more
> work at this end, but that's not at all a problem.
>
> I see a couple of options here
>
> a) I can fold the -fix into the base patch then send out the
> resulting diff as a reply-to-all.
>
> b) We can just deprecate the -fix things and ask people for full
> resends.
>
> It depends on what people prefer. How do we determine that?
I like "fix" for smaller "obvious" stuff where a resend is really just
noise.
But for bigger stuff I prefer a full resend (we can still have these
temporary fixups, but for reviewers a follow-up resend is better).
--
Cheers,
David
* Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number
2026-03-25 15:17 ` David Hildenbrand (Arm)
@ 2026-03-25 15:20 ` Lorenzo Stoakes (Oracle)
2026-03-25 15:22 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 41+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-25 15:20 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Andrew Morton, Vernon Yang, Wei Yang, lorenzo.stoakes, ziy,
dev.jain, baohua, lance.yang, linux-mm, linux-kernel, Vernon Yang
On Wed, Mar 25, 2026 at 04:17:21PM +0100, David Hildenbrand (Arm) wrote:
> On 3/25/26 16:09, Andrew Morton wrote:
> > On Wed, 25 Mar 2026 14:10:23 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
> >
> >>> No need to update the patch description.
> >>
> >> I will take a look at this (sorry for delay) but general point - while
> >> fix-patches are convenient, they're incredibly anti-reviewer.
>
> +1, I could have sworn we brought that up before. :)
>
> >>
> >> I hope at some point in the future we can move away from that so you can look at
> >> a series on list and know that what's shown there is the actual patch.
> >
> > Oh. I've never really received that message, at least not at all
> > clearly.
> >
> > I've been hoping that the -fix patches are actually pro-reviewer, for
> > those reviewers who have looked at the previous version. A full resend
> > of something you've already looked at is quite annoying!
> >
> > I try to mitigate that by sending the
> > heres-what-you-changed-since-last-time replies. It's a little more
> > work at this end, but that's not at all a problem.
> >
> > I see a couple of options here
> >
> > a) I can fold the -fix into the base patch then send out the
> > resulting diff as a reply-to-all.
> >
> > b) We can just deprecate the -fix things and ask people for full
> > resends.
> >
> > It depends on what people prefer. How do we determine that?
>
> I like "fix" for smaller "obvious" stuff where a resend is really just
> noise.
>
> But for bigger stuff I prefer a full resend (we can still have these
> temporary fixups, but for reviewers a follow-up resend is better).
Yeah, it's really about being able to come to a series later and be able to
comment line-by-line.
Really larger stuff should be resent I think, esp. if there's multiple
fixes in the series.
A reply with the same-patch-but-with-fix-applied would definitely be
useful!
>
> --
> Cheers,
>
> David
Thanks, Lorenzo
* Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number
2026-03-25 15:20 ` Lorenzo Stoakes (Oracle)
@ 2026-03-25 15:22 ` David Hildenbrand (Arm)
2026-03-25 16:17 ` Andrew Morton
2026-03-25 17:00 ` Vernon Yang
0 siblings, 2 replies; 41+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-25 15:22 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle)
Cc: Andrew Morton, Vernon Yang, Wei Yang, lorenzo.stoakes, ziy,
dev.jain, baohua, lance.yang, linux-mm, linux-kernel, Vernon Yang
On 3/25/26 16:20, Lorenzo Stoakes (Oracle) wrote:
> On Wed, Mar 25, 2026 at 04:17:21PM +0100, David Hildenbrand (Arm) wrote:
>> On 3/25/26 16:09, Andrew Morton wrote:
>>>
>>
>> +1, I could have sworn we brought that up before. :)
>>
>>>
>>> Oh. I've never really received that message, at least not at all
>>> clearly.
>>>
>>> I've been hoping that the -fix patches are actually pro-reviewer, for
>>> those reviewers who have looked at the previous version. A full resend
>>> of something you've already looked at is quite annoying!
>>>
>>> I try to mitigate that by sending the
>>> heres-what-you-changed-since-last-time replies. It's a little more
>>> work at this end, but that's not at all a problem.
>>>
>>> I see a couple of options here
>>>
>>> a) I can fold the -fix into the base patch then send out the
>>> resulting diff as a reply-to-all.
>>>
>>> b) We can just deprecate the -fix things and ask people for full
>>> resends.
>>>
>>> It depends on what people prefer. How do we determine that?
>>
>> I like "fix" for smaller "obvious" stuff where a resend is really just
>> noise.
>>
>> But for bigger stuff I prefer a full resend (we can still have these
>> temporary fixups, but for reviewers a follow-up resend is better).
>
> Yeah, it's really about being able to come to a series later and be able to
> comment line-by-line.
>
> Really larger stuff should be resent I think, esp. if there's multiple
> fixes in the series.
>
> A reply with the same-patch-but-with-fix-applied would definitely be
> useful!
Right, for completeness, this is what we had in an off-list thread:
"
Not sure if that's a problem for others, but I got the feeling that this
escalated a bit lately.
I know, that we prefer fixups to sort out smaller stuff. So far so good.
In the last time there were some series where I was seriously completely
lost which state of the patches would go upstream, or what I should even
review, because there were just fixups over fixups.
Fixups are nice, but for someone reviewing a series, too many fixups
(either as inline patch or even worse, as independent patches) just
causes a mess.
It also gives the impression of "this is mostly done, so don't waste
your time reviewing it anymore." --- "just the finishing touches" ---
"don't jump in late and cause trouble".
"
--
Cheers,
David
* Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number
2026-03-25 15:22 ` David Hildenbrand (Arm)
@ 2026-03-25 16:17 ` Andrew Morton
2026-03-25 16:26 ` Lorenzo Stoakes (Oracle)
2026-03-25 17:00 ` Vernon Yang
1 sibling, 1 reply; 41+ messages in thread
From: Andrew Morton @ 2026-03-25 16:17 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Lorenzo Stoakes (Oracle), Vernon Yang, Wei Yang, lorenzo.stoakes,
ziy, dev.jain, baohua, lance.yang, linux-mm, linux-kernel,
Vernon Yang
On Wed, 25 Mar 2026 16:22:55 +0100 "David Hildenbrand (Arm)" <david@kernel.org> wrote:
> >
> > Really larger stuff should be resent I think, esp. if there's multiple
> > fixes in the series.
> >
> > A reply with the same-patch-but-with-fix-applied would definitely be
> > useful!
>
> Right, for completeness, this is what we had in an off-list thread:
>
> "
> Not sure if that's a problem for others, but I got the feeling that this
> escalated a bit lately.
>
> I know, that we prefer fixups to sort out smaller stuff. So far so good.
> In the last time there were some series where I was seriously completely
> lost which state of the patches would go upstream, or what I should even
> review, because there were just fixups over fixups.
>
> Fixups are nice, but for someone reviewing a series, too many fixups
> (either as inline patch or even worse, as independent patches) just
> causes a mess.
>
> It also gives the impression of "this is mostly done, so don't waste
> your time reviewing it anymore." --- "just the finishing touches" ---
> "don't jump in late and cause trouble".
> "
hm OK, so what to do. We're OK with teeny -fixes but anything more
substantial we ask for a full resend and I do the heres-what-changed
reply?
I presently don't fold the -fixes until the very last moment. Could do
that much earlier if it helps anything? Possibly useful to people who
are looking at the series in the mm.git tree.
* Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number
2026-03-25 16:17 ` Andrew Morton
@ 2026-03-25 16:26 ` Lorenzo Stoakes (Oracle)
2026-03-25 18:36 ` Andrew Morton
0 siblings, 1 reply; 41+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-25 16:26 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand (Arm), Vernon Yang, Wei Yang, lorenzo.stoakes,
ziy, dev.jain, baohua, lance.yang, linux-mm, linux-kernel,
Vernon Yang
On Wed, Mar 25, 2026 at 09:17:35AM -0700, Andrew Morton wrote:
> On Wed, 25 Mar 2026 16:22:55 +0100 "David Hildenbrand (Arm)" <david@kernel.org> wrote:
>
> > >
> > > Really larger stuff should be resent I think, esp. if there's multiple
> > > fixes in the series.
> > >
> > > A reply with the same-patch-but-with-fix-applied would definitely be
> > > useful!
> >
> > Right, for completeness, this is what we had in an off-list thread:
> >
> > "
> > Not sure if that's a problem for others, but I got the feeling that this
> > escalated a bit lately.
> >
> > I know, that we prefer fixups to sort out smaller stuff. So far so good.
> > In the last time there were some series where I was seriously completely
> > lost which state of the patches would go upstream, or what I should even
> > review, because there were just fixups over fixups.
> >
> > Fixups are nice, but for someone reviewing a series, too many fixups
> > (either as inline patch or even worse, as independent patches) just
> > causes a mess.
> >
> > It also gives the impression of "this is mostly done, so don't waste
> > your time reviewing it anymore." --- "just the finishing touches" ---
> > "don't jump in late and cause trouble".
> > "
>
> hm OK, so what to do. We're OK with teeny -fixes but anything more
> substantial we ask for a full resend and I do the heres-what-changed
> reply?
>
Yeah that works for me.
> I presently don't fold the -fixes until the very last moment. Could do
> that much earlier if it helps anything? Possibly useful to people who
> are looking at the series in the mm.git tree.
It'd generally be easier imo to have those changes folded, but with something
added to the commit message to indicate this so I can know whether or not that
was folded in.
Maybe just directly squash the commits?
Thanks, Lorenzo
* Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number
2026-03-25 15:22 ` David Hildenbrand (Arm)
2026-03-25 16:17 ` Andrew Morton
@ 2026-03-25 17:00 ` Vernon Yang
2026-03-25 17:08 ` Lorenzo Stoakes (Oracle)
2026-03-25 18:59 ` Andrew Morton
1 sibling, 2 replies; 41+ messages in thread
From: Vernon Yang @ 2026-03-25 17:00 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand (Arm), Lorenzo Stoakes (Oracle)
Cc: Wei Yang, lorenzo.stoakes, ziy, dev.jain, baohua, lance.yang,
linux-mm, linux-kernel, Vernon Yang
On Wed, Mar 25, 2026 at 11:23 PM David Hildenbrand (Arm)
<david@kernel.org> wrote:
>
> On 3/25/26 16:20, Lorenzo Stoakes (Oracle) wrote:
> > On Wed, Mar 25, 2026 at 04:17:21PM +0100, David Hildenbrand (Arm) wrote:
> >> On 3/25/26 16:09, Andrew Morton wrote:
> >>>
> >>
> >> +1, I could have sworn we brought that up before. :)
> >>
> >>>
> >>> Oh. I've never really received that message, at least not at all
> >>> clearly.
> >>>
> >>> I've been hoping that the -fix patches are actually pro-reviewer, for
> >>> those reviewers who have looked at the previous version. A full resend
> >>> of something you've already looked at is quite annoying!
> >>>
> >>> I try to mitigate that by sending the
> >>> heres-what-you-changed-since-last-time replies. It's a little more
> >>> work at this end, but that's not at all a problem.
> >>>
> >>> I see a couple of options here
> >>>
> >>> a) I can fold the -fix into the base patch then send out the
> >>> resulting diff as a reply-to-all.
> >>>
> >>> b) We can just deprecate the -fix things and ask people for full
> >>> resends.
> >>>
> >>> It depends on what people prefer. How do we determine that?
> >>
> >> I like "fix" for smaller "obvious" stuff where a resend is really just
> >> noise.
> >>
> >> But for bigger stuff I prefer a full resend (we can still have these
> >> temporary fixups, but for reviewers a follow-up resend is better).
Yeah, I completely agree with this policy.
Initially, I thought it was just cleanup without functional changes, but
I didn't consider new reviewers (who haven't seen the previous version).
Sorry.
Although this patchset is already in the mm-stable branch, if we want to
resend a v9, I'd be happy to do so. Please let me know. Thanks!
> > Yeah, it's really about being able to come to a series later and be able to
> > comment line-by-line.
> >
> > Really larger stuff should be resent I think, esp. if there's multiple
> > fixes in the series.
> >
> > A reply with the same-patch-but-with-fix-applied would definitely be
> > useful!
>
> Right, for completeness, this is what we had in an off-list thread:
>
> "
> Not sure if that's a problem for others, but I got the feeling that this
> escalated a bit lately.
>
> I know, that we prefer fixups to sort out smaller stuff. So far so good.
> In the last time there were some series where I was seriously completely
> lost which state of the patches would go upstream, or what I should even
> review, because there were just fixups over fixups.
>
> Fixups are nice, but for someone reviewing a series, too many fixups
> (either as inline patch or even worse, as independent patches) just
> causes a mess.
>
> It also gives the impression of "this is mostly done, so don't waste
> your time reviewing it anymore." --- "just the finishing touches" ---
> "don't jump in late and cause trouble".
> "
--
Cheers,
Vernon
* Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number
2026-03-25 17:00 ` Vernon Yang
@ 2026-03-25 17:08 ` Lorenzo Stoakes (Oracle)
2026-03-25 18:59 ` Andrew Morton
1 sibling, 0 replies; 41+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-25 17:08 UTC (permalink / raw)
To: Vernon Yang
Cc: Andrew Morton, David Hildenbrand (Arm), Wei Yang, lorenzo.stoakes,
ziy, dev.jain, baohua, lance.yang, linux-mm, linux-kernel,
Vernon Yang
On Thu, Mar 26, 2026 at 01:00:23AM +0800, Vernon Yang wrote:
> On Wed, Mar 25, 2026 at 11:23 PM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
> >
> > On 3/25/26 16:20, Lorenzo Stoakes (Oracle) wrote:
> > > On Wed, Mar 25, 2026 at 04:17:21PM +0100, David Hildenbrand (Arm) wrote:
> > >> On 3/25/26 16:09, Andrew Morton wrote:
> > >>>
> > >>
> > >> +1, I could have sworn we brought that up before. :)
> > >>
> > >>>
> > >>> Oh. I've never really received that message, at least not at all
> > >>> clearly.
> > >>>
> > >>> I've been hoping that the -fix patches are actually pro-reviewer, for
> > >>> those reviewers who have looked at the previous version. A full resend
> > >>> of something you've already looked at is quite annoying!
> > >>>
> > >>> I try to mitigate that by sending the
> > >>> heres-what-you-changed-since-last-time replies. It's a little more
> > >>> work at this end, but that's not at all a problem.
> > >>>
> > >>> I see a couple of options here
> > >>>
> > >>> a) I can fold the -fix into the base patch then send out the
> > >>> resulting diff as a reply-to-all.
> > >>>
> > >>> b) We can just deprecate the -fix things and ask people for full
> > >>> resends.
> > >>>
> > >>> It depends on what people prefer. How do we determine that?
> > >>
> > >> I like "fix" for smaller "obvious" stuff where a resend is really just
> > >> noise.
> > >>
> > >> But for bigger stuff I prefer a full resend (we can still have these
> > >> temporary fixups, but for reviewers a follow-up resend is better).
>
> Yeah, I completely agree with this policy.
>
> Initially, I thought it was just cleanup without functional changes, but
> I didn't consider the new reviewers (haven't seen the previous version).
> Sorry.
>
> Although this patchset is already in the mm-stable branch, if we want to
> resend a v9, I'd be happy to do so. Please let me know. Thanks!
There's no need, don't worry :) This isn't about your series, just a general
point.
Thanks, Lorenzo
* Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number
2026-03-25 16:26 ` Lorenzo Stoakes (Oracle)
@ 2026-03-25 18:36 ` Andrew Morton
2026-03-25 18:53 ` Lorenzo Stoakes (Oracle)
0 siblings, 1 reply; 41+ messages in thread
From: Andrew Morton @ 2026-03-25 18:36 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle)
Cc: David Hildenbrand (Arm), Vernon Yang, Wei Yang, lorenzo.stoakes,
ziy, dev.jain, baohua, lance.yang, linux-mm, linux-kernel,
Vernon Yang
On Wed, 25 Mar 2026 16:26:12 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
> > hm OK, so what to do. We're OK with teeny -fixes but anything more
> > substantial we ask for a full resend and I do the heres-what-changed
> > reply?
> >
>
> Yeah that works for me.
>
> > I presently don't fold the -fixes until the very last moment. Could do
> > that much earlier if it helps anything? Possibly useful to people who
> > are looking at the series in the mm.git tree.
>
> It'd generally be easier imo to have those changes folded, but with something
> added to the commit message to indicate this so I can know whether or not that
> was folded in.
I always add a [footer] when folding -fixes, eg:
[sj@kernel.org: verify found biggest system ram]
Link: https://lkml.kernel.org/r/20260317144725.88524-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20260311052927.93921-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Yang yingliang <yangyingliang@huawei.com>
So one can simply chase the link.
An unfortunate exception is when the -fix is from myself - I don't take
the patch from a mailing list so I have no Link: to include, eg
[akpm@linux-foundation.org: fix spello, add comment]
Link: https://lkml.kernel.org/r/20260220151500.13585-1-rioo.tsukatsukii@gmail.com
Signed-off-by: Rio <rioo.tsukatsukii@gmail.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Wang Jinchao <wangjinchao600@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
But these things are usually quite minor and the precipitating
discussion can be found by reading the main Link:.
> Maybe just directly squash the commits?
Not understanding this proposal?
* Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number
2026-03-25 18:36 ` Andrew Morton
@ 2026-03-25 18:53 ` Lorenzo Stoakes (Oracle)
2026-03-25 19:15 ` Andrew Morton
0 siblings, 1 reply; 41+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-25 18:53 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand (Arm), Vernon Yang, Wei Yang, lorenzo.stoakes,
ziy, dev.jain, baohua, lance.yang, linux-mm, linux-kernel,
Vernon Yang
On Wed, Mar 25, 2026 at 11:36:50AM -0700, Andrew Morton wrote:
> On Wed, 25 Mar 2026 16:26:12 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
>
> > > hm OK, so what to do. We're OK with teeny -fixes but anything more
> > > substantial we ask for a full resend and I do the heres-what-changed
> > > reply?
> > >
> >
> > Yeah that works for me.
> >
> > > I presently don't fold the -fixes until the very last moment. Could do
> > > that much earlier if it helps anything? Possibly useful to people who
> > > are looking at the series in the mm.git tree.
> >
> > It'd generally be easier imo to have those changes folded, but with something
> > added to the commit message to indicate this so I can know whether or not that
> > was folded in.
>
> I always add a [footer] when folding -fixes, eg:
>
> [sj@kernel.org: verify found biggest system ram]
> Link: https://lkml.kernel.org/r/20260317144725.88524-1-sj@kernel.org
> Link: https://lkml.kernel.org/r/20260311052927.93921-3-sj@kernel.org
> Signed-off-by: SeongJae Park <sj@kernel.org>
> Cc: Yang yingliang <yangyingliang@huawei.com>
>
> So one can simply chase the link.
>
>
> An unfortunate exception is when the -fix is from myself - I don't take
> the patch from a mailing list so I have no Link: to include, eg
>
> [akpm@linux-foundation.org: fix spello, add comment]
> Link: https://lkml.kernel.org/r/20260220151500.13585-1-rioo.tsukatsukii@gmail.com
> Signed-off-by: Rio <rioo.tsukatsukii@gmail.com>
> Cc: Joel Granados <joel.granados@kernel.org>
> Cc: Petr Mladek <pmladek@suse.com>
> Cc: Wang Jinchao <wangjinchao600@gmail.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>
> But these things are usually quite minor and the precipitating
> discussion can be found by reading the main Link:.
>
> > Maybe just directly squash the commits?
>
> Not understanding this proposal?
I mean instead of having a separate commit for the fix, put that fix into the
patch before it and denote it with a footer as you put above.
I guess that translates to what you do when you rebase and fold the fixes into
commits as you do now anyway.
I don't see any reason not to do that right away, as really it's good to see the
combined change in one go for all practical purposes (if I resend, I'll be
combining work, if I can grab it from the tree and avoid a git rebase -i all the
better).
Thanks, Lorenzo
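[The fold-the-fix-into-the-base workflow being discussed can be sketched with git's `fixup!`/`--autosquash` machinery; this is a minimal illustration, and the repo layout and commit subjects are invented for the example, not taken from the thread.]

```shell
# Sketch: a -fix lands as a separate "fixup!" commit, then is folded
# (squashed) into the base patch, mirroring what happens at fold time.
set -e
git init -q demo
cd demo
git config user.email editor@example.com
git config user.name editor

echo base > khugepaged.c
git add khugepaged.c
git commit -qm "mm: khugepaged: refine scan progress number"

# The follow-up fix is initially a standalone "fixup!" commit...
echo fixed > khugepaged.c
git add khugepaged.c
git commit -qm "fixup! mm: khugepaged: refine scan progress number"

# ...and is later folded into the base patch non-interactively:
# --autosquash reorders and squashes it, GIT_SEQUENCE_EDITOR=true
# accepts the generated todo list as-is.
GIT_SEQUENCE_EDITOR=true git rebase -qi --autosquash --root

git log --oneline    # a single commit now carries the combined change
```

After the rebase, the history contains one commit whose tree already includes the fix, which is the state a reviewer would see if the series had been resent with the fixup folded in.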
* Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number
2026-03-25 17:00 ` Vernon Yang
2026-03-25 17:08 ` Lorenzo Stoakes (Oracle)
@ 2026-03-25 18:59 ` Andrew Morton
2026-03-25 19:04 ` Lorenzo Stoakes (Oracle)
1 sibling, 1 reply; 41+ messages in thread
From: Andrew Morton @ 2026-03-25 18:59 UTC (permalink / raw)
To: Vernon Yang
Cc: David Hildenbrand (Arm), Lorenzo Stoakes (Oracle), Wei Yang,
lorenzo.stoakes, ziy, dev.jain, baohua, lance.yang, linux-mm,
linux-kernel, Vernon Yang
On Thu, 26 Mar 2026 01:00:23 +0800 Vernon Yang <vernon2gm@gmail.com> wrote:
> Although this patchset is already in the mm-stable branch, if we want to
> resend a v9, I'd be happy to do so. Please let me know. Thanks!
Depends on what changed in v9. If it's a major change then I expect
I'd drop this series from mm-stable and we restart the clock on the
integration and review of this work.
If it's a minor touchup then a standalone patch against mm-stable would
be fine, or just leave things as-is and prepare that change after
7.1-rc1.
* Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number
2026-03-25 18:59 ` Andrew Morton
@ 2026-03-25 19:04 ` Lorenzo Stoakes (Oracle)
2026-03-26 1:59 ` Vernon Yang
0 siblings, 1 reply; 41+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-25 19:04 UTC (permalink / raw)
To: Andrew Morton
Cc: Vernon Yang, David Hildenbrand (Arm), Wei Yang, lorenzo.stoakes,
ziy, dev.jain, baohua, lance.yang, linux-mm, linux-kernel,
Vernon Yang
On Wed, Mar 25, 2026 at 11:59:32AM -0700, Andrew Morton wrote:
> On Thu, 26 Mar 2026 01:00:23 +0800 Vernon Yang <vernon2gm@gmail.com> wrote:
>
> > Although this patchset is already in the mm-stable branch, if we want to
> > resend a v9, I'd be happy to do so. Please let me know. Thanks!
>
> Depends on what changed in v9. If it's a major change then I expect
> I'd drop this series from mm-stable and we restart the clock on the
> integration and review of this work.
>
> If it's a minor touchup then a standalone patch against mm-stable would
> be fine, or just leave things as-is and prepare that change after
> 7.1-rc1.
I think he means just resending this for the sake of reviewability or
whatnot? Anyway, if it's for my sake, it's fine, there's no need: it's
already in mm-stable and I don't want to make a fuss. I'll just try to
structure review more efficiently in future so I don't end up with BOTH a
backlog AND a random-walk of what I actually review :)
Thanks, Lorenzo
* Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number
2026-03-25 18:53 ` Lorenzo Stoakes (Oracle)
@ 2026-03-25 19:15 ` Andrew Morton
2026-03-25 20:03 ` Lorenzo Stoakes (Oracle)
0 siblings, 1 reply; 41+ messages in thread
From: Andrew Morton @ 2026-03-25 19:15 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle)
Cc: David Hildenbrand (Arm), Vernon Yang, Wei Yang, lorenzo.stoakes,
ziy, dev.jain, baohua, lance.yang, linux-mm, linux-kernel,
Vernon Yang
On Wed, 25 Mar 2026 18:53:40 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
> > [akpm@linux-foundation.org: fix spello, add comment]
> > Link: https://lkml.kernel.org/r/20260220151500.13585-1-rioo.tsukatsukii@gmail.com
> > Signed-off-by: Rio <rioo.tsukatsukii@gmail.com>
> > Cc: Joel Granados <joel.granados@kernel.org>
> > Cc: Petr Mladek <pmladek@suse.com>
> > Cc: Wang Jinchao <wangjinchao600@gmail.com>
> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> >
> > But these things are usually quite minor and the precipitating
> > discussion can be found by reading the main Link:.
> >
> > > Maybe just directly squash the commits?
> >
> > Not understanding this proposal?
>
> I mean instead of having a separate commit for the fix, put that fix into the
> patch before it and denote it with a footer as you put above.
>
> I guess that translates to what you do when you rebase and fold the fixes into
> commits as you do now anyway.
>
> I don't see any reason not to do that right away, as really it's good to see the
> combined change in one go for all practical purposes (if I resend, I'll be
> combining work, if I can grab it from the tree and avoid a git rebase -i all the
> better).
OK. So what have we concluded here?
Is it: if I get a -fix, I add that in the usual way, then temporarily
fold it into the base patch and mail the result out for fyi. Then
after <period> I permanently fold the fix into the base and add the
footer?
If so, what's <period>?
* Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number
2026-03-25 19:15 ` Andrew Morton
@ 2026-03-25 20:03 ` Lorenzo Stoakes (Oracle)
2026-03-25 20:16 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 41+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-25 20:03 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand (Arm), Vernon Yang, Wei Yang, lorenzo.stoakes,
ziy, dev.jain, baohua, lance.yang, linux-mm, linux-kernel,
Vernon Yang
On Wed, Mar 25, 2026 at 12:15:49PM -0700, Andrew Morton wrote:
> On Wed, 25 Mar 2026 18:53:40 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
>
> > > [akpm@linux-foundation.org: fix spello, add comment]
> > > Link: https://lkml.kernel.org/r/20260220151500.13585-1-rioo.tsukatsukii@gmail.com
> > > Signed-off-by: Rio <rioo.tsukatsukii@gmail.com>
> > > Cc: Joel Granados <joel.granados@kernel.org>
> > > Cc: Petr Mladek <pmladek@suse.com>
> > > Cc: Wang Jinchao <wangjinchao600@gmail.com>
> > > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > >
> > > But these things are usually quite minor and the precipitating
> > > discussion can be found by reading the main Link:.
> > >
> > > > Maybe just directly squash the commits?
> > >
> > > Not understanding this proposal?
> >
> > I mean instead of having a separate commit for the fix, put that fix into the
> > patch before it and denote it with a footer as you put above.
> >
> > I guess that translates to what you do when you rebase and fold the fixes into
> > commits as you do now anyway.
> >
> > I don't see any reason not to do that right away, as really it's good to see the
> > combined change in one go for all practical purposes (if I resend, I'll be
> > combining work, if I can grab it from the tree and avoid a git rebase -i all the
> > better).
>
> OK. So what have we concluded here?
>
> Is it: if I get a -fix, I add that in the usual way, then temporarily
> fold it into the base patch and mail the result out for fyi. Then
> after <period> I permanently fold the fix into the base and add the
> footer?
>
> If so, what's <period>?
To me it feels like that should be 0, just squash it in right away, since the
trees are being rebased constantly right?
And that means the tree contains exactly what it would if the series were
re-sent.
David, what do you think?
Thanks, Lorenzo
* Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number
2026-03-25 20:03 ` Lorenzo Stoakes (Oracle)
@ 2026-03-25 20:16 ` David Hildenbrand (Arm)
2026-03-25 20:49 ` Andrew Morton
0 siblings, 1 reply; 41+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-25 20:16 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle), Andrew Morton
Cc: Vernon Yang, Wei Yang, lorenzo.stoakes, ziy, dev.jain, baohua,
lance.yang, linux-mm, linux-kernel, Vernon Yang
On 3/25/26 21:03, Lorenzo Stoakes (Oracle) wrote:
> On Wed, Mar 25, 2026 at 12:15:49PM -0700, Andrew Morton wrote:
>> On Wed, 25 Mar 2026 18:53:40 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
>>
>>>
>>> I mean instead of having a separate commit for the fix, put that fix into the
>>> patch before it and denote it with a footer as you put above.
>>>
>>> I guess that translates to what you do when you rebase and fold the fixes into
>>> commits as you do now anyway.
>>>
>>> I don't see any reason not to do that right away, as really it's good to see the
>>> combined change in one go for all practical purposes (if I resend, I'll be
>>> combining work, if I can grab it from the tree and avoid a git rebase -i all the
>>> better).
>>
>> OK. So what have we concluded here?
>>
>> Is it: if I get a -fix, I add that in the usual way, then temporarily
>> fold it into the base patch and mail the result out for fyi. Then
>> after <period> I permanently fold the fix into the base and add the
>> footer?
>>
>> If so, what's <period>?
>
> To me it feels like that should be 0, just squash it in right away, since the
> trees are being rebased constantly right?
>
> And that means the tree contains exactly what it would if the series were
> re-sent.
>
> David, what do you think?
How often was it helpful that a fixup patch would stay separate? I would
assume "not often". :)
--
Cheers,
David
* Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number
2026-03-25 20:16 ` David Hildenbrand (Arm)
@ 2026-03-25 20:49 ` Andrew Morton
0 siblings, 0 replies; 41+ messages in thread
From: Andrew Morton @ 2026-03-25 20:49 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Lorenzo Stoakes (Oracle), Vernon Yang, Wei Yang, lorenzo.stoakes,
ziy, dev.jain, baohua, lance.yang, linux-mm, linux-kernel,
Vernon Yang
On Wed, 25 Mar 2026 21:16:11 +0100 "David Hildenbrand (Arm)" <david@kernel.org> wrote:
> On 3/25/26 21:03, Lorenzo Stoakes (Oracle) wrote:
> > On Wed, Mar 25, 2026 at 12:15:49PM -0700, Andrew Morton wrote:
> >> On Wed, 25 Mar 2026 18:53:40 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
> >>
> >>>
> >>> I mean instead of having a separate commit for the fix, put that fix into the
> >>> patch before it and denote it with a footer as you put above.
> >>>
> >>> I guess that translates to what you do when you rebase and fold the fixes into
> >>> commits as you do now anyway.
> >>>
> >>> I don't see any reason not to do that right away, as really it's good to see the
> >>> combined change in one go for all practical purposes (if I resend, I'll be
> >>> combining work, if I can grab it from the tree and avoid a git rebase -i all the
> >>> better).
> >>
> >> OK. So what have we concluded here?
> >>
> >> Is it: if I get a -fix, I add that in the usual way, then temporarily
> >> fold it into the base patch and mail the result out for fyi. Then
> >> after <period> I permanently fold the fix into the base and add the
> >> footer?
> >>
> >> If so, what's <period>?
> >
> > To me it feels like that should be 0, just squash it in right away,
OK...
> since the trees are being rebased constantly right?
yup, the mm-*_unstable branches and mm-new are blown away and rebuilt
from quilt each time. And the quilt patches are rediffed and refreshed
during this.
> > And that means the tree contains exactly what it would if the series were
> > re-sent.
> >
> > David, what do you think?
>
> How often was it helpful that a fixup patch would stay separate? I would
> assume "not often". :)
Not often. Sometimes a -fix is messed up and we grow a -fix-fix. The
record is something like -fix-fix-fix-fix-fix. I suppose there's
slight value in tracking this for a while. Not much though.
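The squash-it-in-immediately approach being discussed maps, in plain git terms, onto the autosquash workflow. A minimal sketch (hypothetical file and commit names; note the real mm trees are quilt-based, so this only approximates what happens when the quilt patches are refreshed):

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

# The base patch, as originally merged into the tree.
echo "base" > khugepaged.c
git add khugepaged.c
git commit -qm "mm: khugepaged: refine scan progress number"

# A -fix arrives; record it as a fixup! commit against the base.
echo "fix" >> khugepaged.c
git commit -qa --fixup HEAD

# Fold the fix into its base patch right away: autosquash reorders
# and squashes the fixup! commit during the interactive rebase.
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash --root

# One commit remains: the base patch with the fix folded in.
git log --oneline
```

With `<period>` effectively 0, every rebuild of the tree already carries the combined change, which is what "the tree contains exactly what it would if the series were re-sent" amounts to.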
* Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number
2026-03-25 19:04 ` Lorenzo Stoakes (Oracle)
@ 2026-03-26 1:59 ` Vernon Yang
2026-03-26 8:03 ` Lorenzo Stoakes (Oracle)
0 siblings, 1 reply; 41+ messages in thread
From: Vernon Yang @ 2026-03-26 1:59 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle)
Cc: Andrew Morton, David Hildenbrand (Arm), Wei Yang, lorenzo.stoakes,
ziy, dev.jain, baohua, lance.yang, linux-mm, linux-kernel,
Vernon Yang
On Thu, Mar 26, 2026 at 3:04 AM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
> On Wed, Mar 25, 2026 at 11:59:32AM -0700, Andrew Morton wrote:
> > On Thu, 26 Mar 2026 01:00:23 +0800 Vernon Yang <vernon2gm@gmail.com> wrote:
> >
> > > Although this patchset is already in the mm-stable branch, if we want to
> > > resend V9 version, I'd be happy to do so. Please let me know. Thanks!
> >
> > Depends on what changed in v9. If it's a major change then I expect
> > I'd drop this series from mm-stable and we restart the clock on the
> > integration and review of this work.
> >
> > If it's a minor touchup then a standalone patch against mm-stable would
> > be fine, or just leave things as-is and prepare that change after
> > 7.1-rc1.
>
> I think he means just resending this for the sake of reviewability or
Yes, just resending this for the sake of reviewability.
> whatnot? Anyway if it's for my sake it's fine there's no need, it's
> already in mm-stable and I don't want to make a fuss, I'll just try to
> structure review more efficiently in future so I don't end up with BOTH a
> backlog AND a random-walk of what I actually review :)
>
> Thanks, Lorenzo
* Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number
2026-03-26 1:59 ` Vernon Yang
@ 2026-03-26 8:03 ` Lorenzo Stoakes (Oracle)
0 siblings, 0 replies; 41+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-26 8:03 UTC (permalink / raw)
To: Vernon Yang
Cc: Andrew Morton, David Hildenbrand (Arm), Wei Yang, lorenzo.stoakes,
ziy, dev.jain, baohua, lance.yang, linux-mm, linux-kernel,
Vernon Yang
On Thu, Mar 26, 2026 at 09:59:46AM +0800, Vernon Yang wrote:
> On Thu, Mar 26, 2026 at 3:04 AM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
> > On Wed, Mar 25, 2026 at 11:59:32AM -0700, Andrew Morton wrote:
> > > On Thu, 26 Mar 2026 01:00:23 +0800 Vernon Yang <vernon2gm@gmail.com> wrote:
> > >
> > > > Although this patchset is already in the mm-stable branch, if we want to
> > > > resend V9 version, I'd be happy to do so. Please let me know. Thanks!
> > >
> > > Depends on what changed in v9. If it's a major change then I expect
> > > I'd drop this series from mm-stable and we restart the clock on the
> > > integration and review of this work.
> > >
> > > If it's a minor touchup then a standalone patch against mm-stable would
> > > be fine, or just leave things as-is and prepare that change after
> > > 7.1-rc1.
> >
> > I think he means just resending this for the sake of reviewability or
>
> Yes, just resending this for the sake of reviewability.
Yeah then there's no need, let's leave this as it is.
Separately - thanks very much for doing this Vernon :) paying down technical
debt in THP is _very_ much appreciated!
Cheers, Lorenzo
end of thread, other threads:[~2026-03-26 8:03 UTC | newest]
Thread overview: 41+ messages
2026-02-21 9:39 [PATCH mm-new v8 0/4] Improve khugepaged scan logic Vernon Yang
2026-02-21 9:39 ` [PATCH mm-new v8 1/4] mm: khugepaged: add trace_mm_khugepaged_scan event Vernon Yang
2026-03-25 14:06 ` Lorenzo Stoakes (Oracle)
2026-02-21 9:39 ` [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number Vernon Yang
2026-02-24 3:52 ` Wei Yang
2026-02-25 14:25 ` Vernon Yang
2026-02-25 14:29 ` David Hildenbrand (Arm)
2026-02-26 14:31 ` Vernon Yang
2026-02-26 15:45 ` David Hildenbrand (Arm)
2026-02-26 17:15 ` Vernon Yang
2026-03-25 14:10 ` Lorenzo Stoakes (Oracle)
2026-03-25 14:22 ` Lorenzo Stoakes (Oracle)
2026-03-25 15:09 ` Andrew Morton
2026-03-25 15:17 ` David Hildenbrand (Arm)
2026-03-25 15:20 ` Lorenzo Stoakes (Oracle)
2026-03-25 15:22 ` David Hildenbrand (Arm)
2026-03-25 16:17 ` Andrew Morton
2026-03-25 16:26 ` Lorenzo Stoakes (Oracle)
2026-03-25 18:36 ` Andrew Morton
2026-03-25 18:53 ` Lorenzo Stoakes (Oracle)
2026-03-25 19:15 ` Andrew Morton
2026-03-25 20:03 ` Lorenzo Stoakes (Oracle)
2026-03-25 20:16 ` David Hildenbrand (Arm)
2026-03-25 20:49 ` Andrew Morton
2026-03-25 17:00 ` Vernon Yang
2026-03-25 17:08 ` Lorenzo Stoakes (Oracle)
2026-03-25 18:59 ` Andrew Morton
2026-03-25 19:04 ` Lorenzo Stoakes (Oracle)
2026-03-26 1:59 ` Vernon Yang
2026-03-26 8:03 ` Lorenzo Stoakes (Oracle)
2026-02-21 9:39 ` [PATCH mm-new v8 3/4] mm: add folio_test_lazyfree helper Vernon Yang
2026-02-21 9:39 ` [PATCH mm-new v8 4/4] mm: khugepaged: skip lazy-free folios Vernon Yang
2026-02-21 10:27 ` Barry Song
2026-02-21 13:38 ` Vernon Yang
2026-02-23 13:16 ` David Hildenbrand (Arm)
2026-02-23 20:08 ` Barry Song
2026-02-24 10:10 ` David Hildenbrand (Arm)
2026-02-23 20:10 ` Barry Song
2026-02-26 7:55 ` Vernon Yang
2026-03-16 19:41 ` Andrew Morton
2026-03-17 2:16 ` Vernon Yang