Linux Kernel Selftest development
 help / color / mirror / Atom feed
* [PATCH 0/2] mm/khugepaged: fix sub-PMD MADV_COLLAPSE range handling
@ 2026-05-07  7:05 Chen Wandun
  2026-05-07  7:05 ` [PATCH 1/2] mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range Chen Wandun
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Chen Wandun @ 2026-05-07  7:05 UTC (permalink / raw)
  To: akpm, david, ljs, shuah, zokeefe; +Cc: linux-kernel, linux-mm, linux-kselftest

madvise_collapse() computes a THP-aligned window from the caller's range:

  hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK  /* round up  */
  hend   =  end   &  HPAGE_PMD_MASK                    /* round down */

When the caller's range is smaller than one PMD (2 MiB) and/or not
PMD-aligned, hstart can end up greater than hend.  In that case the
collapsing loop is correctly skipped, but the return value was computed
as ((hend - hstart) >> HPAGE_PMD_SHIFT): with hstart > hend the
subtraction wraps unsigned, producing a huge value, the comparison
"thps != 0" fires, and -EINVAL is returned instead of 0.

A concrete example:

  /* both cover less than one THP; both should return 0 */
  madvise(aligned, PAGE_SIZE, MADV_COLLAPSE);             /* OK, returns 0 */
  madvise(aligned + PAGE_SIZE, PAGE_SIZE, MADV_COLLAPSE); /* returns -EINVAL */

The fix moves the hstart/hend calculation before kmalloc_obj() and
returns 0 early when hstart >= hend.  This also avoids the kmalloc,
mmgrab(), and lru_add_drain_all() calls for ranges that trivially
contain no PMD window.  The same effect could be achieved by only
guarding the final return expression, but early-return keeps the
no-op path free of the allocator and drain overhead.

Patch 1 fixes the kernel bug.
Patch 2 adds a selftest with two cases covering the hstart == hend
(aligned, was already correct) and hstart > hend (unaligned, was
broken) scenarios.

Chen Wandun (2):
  mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range
  selftests/mm: add MADV_COLLAPSE sub-PMD range tests

 mm/khugepaged.c                               |   9 +-
 tools/testing/selftests/mm/.gitignore         |   1 +
 tools/testing/selftests/mm/Makefile           |   2 +
 .../selftests/mm/ksft_madv_collapse.sh        |   4 +
 .../selftests/mm/madv_collapse_range.c        | 141 ++++++++++++++++++
 tools/testing/selftests/mm/run_vmtests.sh     |   5 +
 6 files changed, 159 insertions(+), 3 deletions(-)
 create mode 100755 tools/testing/selftests/mm/ksft_madv_collapse.sh
 create mode 100644 tools/testing/selftests/mm/madv_collapse_range.c

-- 
2.43.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range
  2026-05-07  7:05 [PATCH 0/2] mm/khugepaged: fix sub-PMD MADV_COLLAPSE range handling Chen Wandun
@ 2026-05-07  7:05 ` Chen Wandun
  2026-05-08 12:27   ` David Hildenbrand (Arm)
  2026-05-07  7:05 ` [PATCH 2/2] selftests/mm: add MADV_COLLAPSE sub-PMD range tests Chen Wandun
  2026-05-09  9:47 ` [PATCH 0/2] mm/khugepaged: fix sub-PMD MADV_COLLAPSE range handling Lance Yang
  2 siblings, 1 reply; 14+ messages in thread
From: Chen Wandun @ 2026-05-07  7:05 UTC (permalink / raw)
  To: akpm, david, ljs, shuah, zokeefe; +Cc: linux-kernel, linux-mm, linux-kselftest

madvise_collapse() computes the THP-aligned window:

  hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK  /* round up  */
  hend   =  end   &  HPAGE_PMD_MASK                    /* round down */

Previously this was done after kmalloc_obj(), so problem arose when
the range contained no complete PMD-aligned window (hstart >= hend).

When hstart > hend, (hend - hstart) wraps unsigned to a huge value, the
final comparison fails and -EINVAL is returned instead of 0.  Consider
two single-page calls on a 2 MiB-aligned address:

    /* hstart == hend == aligned  ->  0 == 0  ->  returns 0 */
    madvise(aligned, PAGE_SIZE, MADV_COLLAPSE);

    /* hstart = aligned + 2MiB, hend = aligned
     * (hend - hstart) wraps unsigned  ->  returns -EINVAL */
    madvise(aligned + PAGE_SIZE, PAGE_SIZE, MADV_COLLAPSE);

Both calls cover less than one THP and collapse nothing; both should
return 0.

In addition, kmalloc_obj(), mmgrab() and lru_add_drain_all() were all
called before discovering there was nothing to do, only for the code
to kfree() and return immediately after.

Fix both by computing hstart/hend after thp_vma_allowable_order() but
before kmalloc_obj(), and returning 0 early when hstart >= hend.

Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
Signed-off-by: Chen Wandun <chenwandun@lixiang.com>
---
 mm/khugepaged.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b8452dbdb043..92473d93e837 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2836,6 +2836,12 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 	if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_FORCED_COLLAPSE, PMD_ORDER))
 		return -EINVAL;
 
+	hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
+	hend = end & HPAGE_PMD_MASK;
+
+	if (hstart >= hend)
+		return 0;
+
 	cc = kmalloc_obj(*cc);
 	if (!cc)
 		return -ENOMEM;
@@ -2845,9 +2851,6 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 	mmgrab(mm);
 	lru_add_drain_all();
 
-	hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
-	hend = end & HPAGE_PMD_MASK;
-
 	for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
 		enum scan_result result = SCAN_FAIL;
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2] selftests/mm: add MADV_COLLAPSE sub-PMD range tests
  2026-05-07  7:05 [PATCH 0/2] mm/khugepaged: fix sub-PMD MADV_COLLAPSE range handling Chen Wandun
  2026-05-07  7:05 ` [PATCH 1/2] mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range Chen Wandun
@ 2026-05-07  7:05 ` Chen Wandun
  2026-05-08 12:23   ` David Hildenbrand (Arm)
  2026-05-09  9:47 ` [PATCH 0/2] mm/khugepaged: fix sub-PMD MADV_COLLAPSE range handling Lance Yang
  2 siblings, 1 reply; 14+ messages in thread
From: Chen Wandun @ 2026-05-07  7:05 UTC (permalink / raw)
  To: akpm, david, ljs, shuah, zokeefe; +Cc: linux-kernel, linux-mm, linux-kselftest

Add madv_collapse_range to verify the fix for the spurious -EINVAL
returned by madvise_collapse() when the madvised range contains no
complete PMD-aligned window.

madvise_collapse() rounds the caller range inward to PMD boundaries:

  hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK  // round up
  hend   =  end   &  HPAGE_PMD_MASK                    // round down

When hstart >= hend there is no PMD window to collapse.  Previously
the final expression computed (hend - hstart) without guarding against
hstart > hend, causing unsigned wrap-around and a spurious -EINVAL.
Both tests expect 0: "no PMD window to collapse" is a successful no-op.

  Test 1 - aligned start (hstart == hend):
    start = 2MiB-aligned, len = PAGE_SIZE
    Both hstart and hend round to the same 2MiB boundary.
    Was already correct; included as a regression reference.

  Test 2 - unaligned start (hstart > hend):
    start = aligned + PAGE_SIZE, len = PAGE_SIZE
    hstart rounds up past the next 2MiB boundary while hend rounds
    down below start.  (hend - hstart) wrapped unsigned, causing the
    final comparison to fail and return -EINVAL instead of 0.

Signed-off-by: Chen Wandun <chenwandun@lixiang.com>
---
 tools/testing/selftests/mm/.gitignore         |   1 +
 tools/testing/selftests/mm/Makefile           |   2 +
 .../selftests/mm/ksft_madv_collapse.sh        |   4 +
 .../selftests/mm/madv_collapse_range.c        | 141 ++++++++++++++++++
 tools/testing/selftests/mm/run_vmtests.sh     |   5 +
 5 files changed, 153 insertions(+)
 create mode 100755 tools/testing/selftests/mm/ksft_madv_collapse.sh
 create mode 100644 tools/testing/selftests/mm/madv_collapse_range.c

diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore
index b0c30c5ee9e3..a24f8c3cf3dc 100644
--- a/tools/testing/selftests/mm/.gitignore
+++ b/tools/testing/selftests/mm/.gitignore
@@ -28,6 +28,7 @@ protection_keys
 protection_keys_32
 protection_keys_64
 madv_populate
+madv_collapse_range
 uffd-stress
 uffd-unit-tests
 uffd-wp-mremap
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index cd24596cdd27..758639f8ae8e 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -68,6 +68,7 @@ TEST_GEN_FILES += hugepage-mremap
 TEST_GEN_FILES += hugepage-shm
 TEST_GEN_FILES += hugepage-vmemmap
 TEST_GEN_FILES += khugepaged
+TEST_GEN_FILES += madv_collapse_range
 TEST_GEN_FILES += madv_populate
 TEST_GEN_FILES += map_fixed_noreplace
 TEST_GEN_FILES += map_hugetlb
@@ -153,6 +154,7 @@ TEST_PROGS += ksft_hugetlb.sh
 TEST_PROGS += ksft_hugevm.sh
 TEST_PROGS += ksft_ksm.sh
 TEST_PROGS += ksft_ksm_numa.sh
+TEST_PROGS += ksft_madv_collapse.sh
 TEST_PROGS += ksft_madv_guard.sh
 TEST_PROGS += ksft_madv_populate.sh
 TEST_PROGS += ksft_memfd_secret.sh
diff --git a/tools/testing/selftests/mm/ksft_madv_collapse.sh b/tools/testing/selftests/mm/ksft_madv_collapse.sh
new file mode 100755
index 000000000000..0d0b0356cbd0
--- /dev/null
+++ b/tools/testing/selftests/mm/ksft_madv_collapse.sh
@@ -0,0 +1,4 @@
+#!/bin/sh -e
+# SPDX-License-Identifier: GPL-2.0
+
+./run_vmtests.sh -t madv_collapse
diff --git a/tools/testing/selftests/mm/madv_collapse_range.c b/tools/testing/selftests/mm/madv_collapse_range.c
new file mode 100644
index 000000000000..11850be80dd8
--- /dev/null
+++ b/tools/testing/selftests/mm/madv_collapse_range.c
@@ -0,0 +1,141 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Tests for MADV_COLLAPSE behavior when the madvise range contains no
+ * complete PMD-aligned window (range smaller than 2 MiB).
+ *
+ * madvise_collapse() rounds the caller range inward to PMD boundaries:
+ *
+ *   hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK  // round up
+ *   hend   =  end   &  HPAGE_PMD_MASK                    // round down
+ *
+ * When hstart >= hend the collapsing loop is not entered.  Previously,
+ * the final return expression computed (hend - hstart) without guarding
+ * against hstart > hend, causing unsigned wrap-around and a spurious
+ * -EINVAL.  Both tests expect 0: "no PMD window to collapse" is a
+ * successful no-op, not an error.
+ *
+ * Test 1: aligned start (hstart == hend):
+ *   start = 2MiB-aligned, len = PAGE_SIZE
+ *   hstart = aligned, hend = aligned  ->  0 == 0  ->  0  (was already correct)
+ *
+ * Test 2: unaligned start (hstart > hend):
+ *   start = aligned + PAGE_SIZE, len = PAGE_SIZE
+ *   hstart = aligned + 2MiB, hend = aligned
+ *   (hend - hstart) wraps unsigned  ->  was -EINVAL, fixed to 0
+ */
+#define _GNU_SOURCE
+#include <errno.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#include <linux/mman.h>
+
+#include "kselftest.h"
+#include "vm_util.h"
+
+#ifndef MADV_COLLAPSE
+#define MADV_COLLAPSE	25
+#endif
+
+static unsigned long	page_size;
+static unsigned long	hpage_size;
+
+/*
+ * Test 1: 2MiB-aligned start, len = PAGE_SIZE.
+ *   hstart == hend  ->  0
+ */
+static void test_subpmd_aligned(char *aligned)
+{
+	int ret;
+
+	ksft_print_msg("[RUN] sub-PMD: 2MiB-aligned start, len=PAGE_SIZE\n");
+	ret = madvise(aligned, page_size, MADV_COLLAPSE);
+	ksft_test_result(ret == 0,
+			 "sub-PMD aligned start returns 0 (ret=%d errno=%d)\n",
+			 ret, ret ? errno : 0);
+}
+
+/*
+ * Test 2: start = aligned + PAGE_SIZE, len = PAGE_SIZE.
+ *   hstart = aligned + hpage_size  >  hend = aligned
+ *   unsigned wrap was -EINVAL; correct answer is 0.
+ */
+static void test_subpmd_unaligned(char *aligned)
+{
+	int ret;
+
+	ksft_print_msg("[RUN] sub-PMD: unaligned start (aligned+PAGE), len=PAGE_SIZE\n");
+	ksft_print_msg("      hstart=%p > hend=%p\n",
+		       (void *)(aligned + hpage_size), (void *)aligned);
+
+	ret = madvise(aligned + page_size, page_size, MADV_COLLAPSE);
+	if (ret && errno == EINVAL)
+		ksft_print_msg("      got -EINVAL: unsigned-wrap bug not fixed\n");
+	ksft_test_result(ret == 0,
+			 "sub-PMD unaligned start returns 0 (ret=%d errno=%d)\n",
+			 ret, ret ? errno : 0);
+}
+
+int main(void)
+{
+	char *base, *aligned;
+	unsigned long map_size;
+	int probe_ret;
+
+	ksft_print_header();
+	ksft_set_plan(2);
+
+	page_size  = (unsigned long)getpagesize();
+	hpage_size = (unsigned long)read_pmd_pagesize();
+	if (!hpage_size)
+		ksft_exit_skip("transparent hugepages not available\n");
+
+	/*
+	 * Probe: map one hpage-sized region, touch all pages, and attempt a
+	 * real collapse to confirm MADV_COLLAPSE is supported.  EAGAIN is a
+	 * transient resource failure and still counts as "available".
+	 */
+	map_size = 2 * hpage_size;
+	base = mmap(NULL, map_size, PROT_READ | PROT_WRITE,
+		    MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+	if (base == MAP_FAILED)
+		ksft_exit_fail_msg("probe mmap failed: %s\n", strerror(errno));
+
+	aligned = (char *)(((unsigned long)base + hpage_size - 1)
+			   & ~(hpage_size - 1));
+
+	for (unsigned long i = 0; i < hpage_size; i += page_size)
+		aligned[i] = 0;
+
+	probe_ret = madvise(aligned, hpage_size, MADV_COLLAPSE);
+	munmap(base, map_size);
+	if (probe_ret && errno != EAGAIN)
+		ksft_exit_skip("MADV_COLLAPSE not available: %s\n",
+			       strerror(errno));
+
+	/*
+	 * Both sub-PMD tests share a single 2 * hpage mapping so that
+	 * every test range falls within the same VMA.
+	 */
+	map_size = 2 * hpage_size;
+	base = mmap(NULL, map_size, PROT_READ | PROT_WRITE,
+		    MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+	if (base == MAP_FAILED)
+		ksft_exit_fail_msg("mmap failed: %s\n", strerror(errno));
+
+	for (unsigned long i = 0; i < map_size; i += page_size)
+		base[i] = 0;
+
+	aligned = (char *)(((unsigned long)base + hpage_size - 1)
+			   & ~(hpage_size - 1));
+
+	test_subpmd_aligned(aligned);
+	test_subpmd_unaligned(aligned);
+
+	munmap(base, map_size);
+
+	if (ksft_get_fail_cnt())
+		ksft_exit_fail_msg("%d out of %d tests failed\n",
+				   ksft_get_fail_cnt(), ksft_test_num());
+	ksft_exit_pass();
+}
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index d8468451b3a3..58402f8261e0 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -53,6 +53,8 @@ separated by spaces:
 	test madvise(2) MADV_GUARD_INSTALL and MADV_GUARD_REMOVE options
 - madv_populate
 	test memadvise(2) MADV_POPULATE_{READ,WRITE} options
+- madv_collapse
+	test madvise(2) MADV_COLLAPSE sub-PMD range handling
 - memfd_secret
 	test memfd_secret(2)
 - process_mrelease
@@ -422,6 +424,9 @@ CATEGORY="madv_guard" run_test ./guard-regions
 # MADV_POPULATE_READ and MADV_POPULATE_WRITE tests
 CATEGORY="madv_populate" run_test ./madv_populate
 
+# MADV_COLLAPSE sub-PMD range tests
+CATEGORY="madv_collapse" run_test ./madv_collapse_range
+
 # PROCESS_MADV test
 CATEGORY="process_madv" run_test ./process_madv
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] selftests/mm: add MADV_COLLAPSE sub-PMD range tests
  2026-05-07  7:05 ` [PATCH 2/2] selftests/mm: add MADV_COLLAPSE sub-PMD range tests Chen Wandun
@ 2026-05-08 12:23   ` David Hildenbrand (Arm)
  2026-05-08 15:03     ` Lorenzo Stoakes
  0 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand (Arm) @ 2026-05-08 12:23 UTC (permalink / raw)
  To: Chen Wandun, akpm, ljs, shuah, zokeefe
  Cc: linux-kernel, linux-mm, linux-kselftest

On 5/7/26 09:05, Chen Wandun wrote:
> Add madv_collapse_range to verify the fix for the spurious -EINVAL
> returned by madvise_collapse() when the madvised range contains no
> complete PMD-aligned window.

Do we really need 141 LOC of test to verify a "userspace is being stupid" scenario?

Nothing was really broken, it's just a suboptimal return value, or what am I
missing?

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range
  2026-05-07  7:05 ` [PATCH 1/2] mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range Chen Wandun
@ 2026-05-08 12:27   ` David Hildenbrand (Arm)
  2026-05-08 15:02     ` Lorenzo Stoakes
  2026-05-09  5:56     ` Wandun
  0 siblings, 2 replies; 14+ messages in thread
From: David Hildenbrand (Arm) @ 2026-05-08 12:27 UTC (permalink / raw)
  To: Chen Wandun, akpm, ljs, shuah, zokeefe
  Cc: linux-kernel, linux-mm, linux-kselftest

On 5/7/26 09:05, Chen Wandun wrote:
> madvise_collapse() computes the THP-aligned window:
> 
>   hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK  /* round up  */
>   hend   =  end   &  HPAGE_PMD_MASK                    /* round down */
> 
> Previously this was done after kmalloc_obj(), so problem arose when
> the range contained no complete PMD-aligned window (hstart >= hend).
> 
> When hstart > hend, (hend - hstart) wraps unsigned to a huge value, the
> final comparison fails and -EINVAL is returned instead of 0.  Consider
> two single-page calls on a 2 MiB-aligned address:
> 
>     /* hstart == hend == aligned  ->  0 == 0  ->  returns 0 */
>     madvise(aligned, PAGE_SIZE, MADV_COLLAPSE);
> 
>     /* hstart = aligned + 2MiB, hend = aligned
>      * (hend - hstart) wraps unsigned  ->  returns -EINVAL */
>     madvise(aligned + PAGE_SIZE, PAGE_SIZE, MADV_COLLAPSE);
> 
> Both calls cover less than one THP and collapse nothing; both should
> return 0.

Okay, so we talk about a "userspace is being stupid" scenario.

> 
> In addition, kmalloc_obj(), mmgrab() and lru_add_drain_all() were all
> called before discovering there was nothing to do, only for the code
> to kfree() and return immediately after.

Just a comment as you motivate here why this is suboptimal: we do not care about
a "userspace is being stupid" scenario being fast.

> 
> Fix both by computing hstart/hend after thp_vma_allowable_order() but
> before kmalloc_obj(), and returning 0 early when hstart >= hend.
> 
> Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")

Fixes: is likely ok, but I don't think we want to treat this as a hotfix or CC
stable.

> Signed-off-by: Chen Wandun <chenwandun@lixiang.com>
> ---
>  mm/khugepaged.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index b8452dbdb043..92473d93e837 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2836,6 +2836,12 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>  	if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_FORCED_COLLAPSE, PMD_ORDER))
>  		return -EINVAL;
>  
> +	hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
> +	hend = end & HPAGE_PMD_MASK;
> +
> +	if (hstart >= hend)
> +		return 0;
> +
>  	cc = kmalloc_obj(*cc);
>  	if (!cc)
>  		return -ENOMEM;
> @@ -2845,9 +2851,6 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>  	mmgrab(mm);
>  	lru_add_drain_all();
>  
> -	hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
> -	hend = end & HPAGE_PMD_MASK;
> -
>  	for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
>  		enum scan_result result = SCAN_FAIL;
>  

In general, LGTM, but see for conflict:
https://lore.kernel.org/all/20260409014323.2385982-1-ye.liu@linux.dev/


-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range
  2026-05-08 12:27   ` David Hildenbrand (Arm)
@ 2026-05-08 15:02     ` Lorenzo Stoakes
  2026-05-08 15:04       ` Lorenzo Stoakes
                         ` (2 more replies)
  2026-05-09  5:56     ` Wandun
  1 sibling, 3 replies; 14+ messages in thread
From: Lorenzo Stoakes @ 2026-05-08 15:02 UTC (permalink / raw)
  To: Chen Wandun
  Cc: David Hildenbrand (Arm), akpm, shuah, zokeefe, linux-kernel,
	linux-mm, linux-kselftest

On Fri, May 08, 2026 at 02:27:37PM +0200, David Hildenbrand (Arm) wrote:
> On 5/7/26 09:05, Chen Wandun wrote:
> > madvise_collapse() computes the THP-aligned window:
> >
> >   hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK  /* round up  */
> >   hend   =  end   &  HPAGE_PMD_MASK                    /* round down */
> >
> > Previously this was done after kmalloc_obj(), so problem arose when
> > the range contained no complete PMD-aligned window (hstart >= hend).
> >
> > When hstart > hend, (hend - hstart) wraps unsigned to a huge value, the
> > final comparison fails and -EINVAL is returned instead of 0.  Consider

I think both should return -EINVAL.

> > two single-page calls on a 2 MiB-aligned address:
> >
> >     /* hstart == hend == aligned  ->  0 == 0  ->  returns 0 */
> >     madvise(aligned, PAGE_SIZE, MADV_COLLAPSE);

What's aligned? You're putting a random variable name in there? Presumably a PMD-aligned address?

> >
> >     /* hstart = aligned + 2MiB, hend = aligned
> >      * (hend - hstart) wraps unsigned  ->  returns -EINVAL */
> >     madvise(aligned + PAGE_SIZE, PAGE_SIZE, MADV_COLLAPSE);
> >
> > Both calls cover less than one THP and collapse nothing; both should
> > return 0.

Disagree.

>
> Okay, so we talk about a "userspace is being stupid" scenario.

Yes!

I feel that -EINVAL is correct for hend > hstart, and I think it might even be a
userland A[BP]I break to change it (maybe somebody, somewhere is being foolish
enough to use this to also validate input ranges).

The weirdness is when hstart == hend being 0 but that's sort of established
behaviour I guess.

>
> >
> > In addition, kmalloc_obj(), mmgrab() and lru_add_drain_all() were all
> > called before discovering there was nothing to do, only for the code
> > to kfree() and return immediately after.
>
> Just a comment as you motivate here why this is suboptimal: we do not care about
> a "userspace is being stupid" scenario being fast.

Yes, in general - so what? The user is doing stupid things, so the user wins
stupid prizes?

>
> >
> > Fix both by computing hstart/hend after thp_vma_allowable_order() but
> > before kmalloc_obj(), and returning 0 early when hstart >= hend.
> >
> > Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
>
> Fixes: is likely ok, but I don't think we want to treat this as a hotfix or CC
> stable.

I'm not sure I want a fixes here, this isn't really fixing anything. This isn't
a bug afaik, it's just us not handling this brilliantly, but (possibly by
mistake) getting the right output.

>
> > Signed-off-by: Chen Wandun <chenwandun@lixiang.com>

I put this patch through AI detection and it's telling me there's an 80% chance
this whole thing is LLM-generated, which is making me grumpy.

Can you confirm that this is, in fact, your own work? Plagiarism is not a nice
thing to do, and THP doesn't need more traffic, we're overloaded as it is.

> > ---
> >  mm/khugepaged.c | 9 ++++++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index b8452dbdb043..92473d93e837 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -2836,6 +2836,12 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
> >  	if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_FORCED_COLLAPSE, PMD_ORDER))
> >  		return -EINVAL;
> >
> > +	hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
> > +	hend = end & HPAGE_PMD_MASK;

See below re: conflict.

> > +
> > +	if (hstart >= hend)
> > +		return 0;

	if (hstart > hend)
		return -EINVAL;
	/* For compatibility, users may rely on this. */
	if (hstart == hend)
		return 0;

Is probably better.

But I'm not sure what the point is if we're already doing this behaviour?

> > +
> >  	cc = kmalloc_obj(*cc);
> >  	if (!cc)
> >  		return -ENOMEM;
> > @@ -2845,9 +2851,6 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
> >  	mmgrab(mm);
> >  	lru_add_drain_all();
> >
> > -	hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
> > -	hend = end & HPAGE_PMD_MASK;
> > -
> >  	for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
> >  		enum scan_result result = SCAN_FAIL;
> >
>
> In general, LGTM, but see for conflict:
> https://lore.kernel.org/all/20260409014323.2385982-1-ye.liu@linux.dev/

Please use mm-unstable as a basis for your mm work Chen, this is something you
need to fix, the patch above has been around for a while and is in
mm-unstable.

You have patches in mm already so you should know better by now.

But I'm really not sure I'm in favour of this anyway. I'll defer to David but
this feels useless to me.

>
>
> --
> Cheers,
>
> David

Thanks, Lorenzo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] selftests/mm: add MADV_COLLAPSE sub-PMD range tests
  2026-05-08 12:23   ` David Hildenbrand (Arm)
@ 2026-05-08 15:03     ` Lorenzo Stoakes
  0 siblings, 0 replies; 14+ messages in thread
From: Lorenzo Stoakes @ 2026-05-08 15:03 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Chen Wandun, akpm, shuah, zokeefe, linux-kernel, linux-mm,
	linux-kselftest

On Fri, May 08, 2026 at 02:23:41PM +0200, David Hildenbrand (Arm) wrote:
> On 5/7/26 09:05, Chen Wandun wrote:
> > Add madv_collapse_range to verify the fix for the spurious -EINVAL
> > returned by madvise_collapse() when the madvised range contains no
> > complete PMD-aligned window.
>
> Do we really need 141 LOC of test to verify a "userspace is being stupid" scenario?
>
> Nothing was really broken, it's just a suboptimal return value, or what am I
> missing?

Agreed, drop this please.

>
> --
> Cheers,
>
> David

Thanks, Lorenzo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range
  2026-05-08 15:02     ` Lorenzo Stoakes
@ 2026-05-08 15:04       ` Lorenzo Stoakes
  2026-05-09  7:53         ` Wandun
  2026-05-08 19:29       ` David Hildenbrand (Arm)
  2026-05-09  7:04       ` Wandun
  2 siblings, 1 reply; 14+ messages in thread
From: Lorenzo Stoakes @ 2026-05-08 15:04 UTC (permalink / raw)
  To: Chen Wandun
  Cc: David Hildenbrand (Arm), akpm, shuah, zokeefe, linux-kernel,
	linux-mm, linux-kselftest

On Fri, May 08, 2026 at 04:02:37PM +0100, Lorenzo Stoakes wrote:
> On Fri, May 08, 2026 at 02:27:37PM +0200, David Hildenbrand (Arm) wrote:
> > On 5/7/26 09:05, Chen Wandun wrote:
> > > madvise_collapse() computes the THP-aligned window:
> > >
> > >   hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK  /* round up  */
> > >   hend   =  end   &  HPAGE_PMD_MASK                    /* round down */
> > >
> > > Previously this was done after kmalloc_obj(), so problem arose when
> > > the range contained no complete PMD-aligned window (hstart >= hend).
> > >
> > > When hstart > hend, (hend - hstart) wraps unsigned to a huge value, the
> > > final comparison fails and -EINVAL is returned instead of 0.  Consider
>
> I think both should return -EINVAL.

Correction: I changed my mind (see below), and think == should return 0 simply
for compatibility reasons. Though honestly both really should have been -EINVAL
from the start...

>
> > > two single-page calls on a 2 MiB-aligned address:
> > >
> > >     /* hstart == hend == aligned  ->  0 == 0  ->  returns 0 */
> > >     madvise(aligned, PAGE_SIZE, MADV_COLLAPSE);
>
> What's aligned? You're putting a random variable name in there? Presumably a PMD-aligned address?
>
> > >
> > >     /* hstart = aligned + 2MiB, hend = aligned
> > >      * (hend - hstart) wraps unsigned  ->  returns -EINVAL */
> > >     madvise(aligned + PAGE_SIZE, PAGE_SIZE, MADV_COLLAPSE);
> > >
> > > Both calls cover less than one THP and collapse nothing; both should
> > > return 0.
>
> Disagree.
>
> >
> > Okay, so we talk about a "userspace is being stupid" scenario.
>
> Yes!
>
> I feel that -EINVAL is correct for hend > hstart, and I think it might even be a
> userland A[BP]I break to change it (maybe somebody, somewhere is being foolish
> enough to use this to also validate input ranges).
>
> The weirdness is when hstart == hend being 0 but that's sort of established
> behaviour I guess.
>
> >
> > >
> > > In addition, kmalloc_obj(), mmgrab() and lru_add_drain_all() were all
> > > called before discovering there was nothing to do, only for the code
> > > to kfree() and return immediately after.
> >
> > Just a comment as you motivate here why this is suboptimal: we do not care about
> > a "userspace is being stupid" scenario being fast.
>
> Yes, in general - so what? The user is doing stupid things, so the user wins
> stupid prizes?
>
> >
> > >
> > > Fix both by computing hstart/hend after thp_vma_allowable_order() but
> > > before kmalloc_obj(), and returning 0 early when hstart >= hend.
> > >
> > > Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
> >
> > Fixes: is likely ok, but I don't think we want to treat this as a hotfix or CC
> > stable.
>
> I'm not sure I want a fixes here, this isn't really fixing anything. This isn't
> a bug afaik, it's just us not handling this brilliantly, but (possibly by
> mistake) getting the right output.
>
> >
> > > Signed-off-by: Chen Wandun <chenwandun@lixiang.com>
>
> I put this patch through AI detection and it's telling me there's an 80% chance
> this whole thing is LLM-generated, which is making me grumpy.
>
> Can you confirm that this is, in fact, your own work? Plagiarism is not a nice
> thing to do, and THP doesn't need more traffic, we're overloaded as it is.
>
> > > ---
> > >  mm/khugepaged.c | 9 ++++++---
> > >  1 file changed, 6 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > index b8452dbdb043..92473d93e837 100644
> > > --- a/mm/khugepaged.c
> > > +++ b/mm/khugepaged.c
> > > @@ -2836,6 +2836,12 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
> > >  	if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_FORCED_COLLAPSE, PMD_ORDER))
> > >  		return -EINVAL;
> > >
> > > +	hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
> > > +	hend = end & HPAGE_PMD_MASK;
>
> See below re: conflict.
>
> > > +
> > > +	if (hstart >= hend)
> > > +		return 0;
>
> 	if (hstart > hend)
> 		return -EINVAL;
> 	/* For compatibility, users may rely on this. */
> 	if (hstart == hend)
> 		return 0;
>
> Is probably better.
>
> But I'm not sure what the point is if we're already doing this behaviour?
>
> > > +
> > >  	cc = kmalloc_obj(*cc);
> > >  	if (!cc)
> > >  		return -ENOMEM;
> > > @@ -2845,9 +2851,6 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
> > >  	mmgrab(mm);
> > >  	lru_add_drain_all();
> > >
> > > -	hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
> > > -	hend = end & HPAGE_PMD_MASK;
> > > -
> > >  	for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
> > >  		enum scan_result result = SCAN_FAIL;
> > >
> >
> > In general, LGTM, but see for conflict:
> > https://lore.kernel.org/all/20260409014323.2385982-1-ye.liu@linux.dev/
>
> Please use mm-unstable as a basis for your mm work Chen, this is something you
> need to fix, the patch above has been around for a while and is in
> mm-unstable.
>
> You have patches in mm already so you should know better by now.
>
> But I'm really not sure I'm in favour of this anyway. I'll defer to David but
> this feels useless to me.
>
> >
> >
> > --
> > Cheers,
> >
> > David
>
> Thanks, Lorenzo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range
  2026-05-08 15:02     ` Lorenzo Stoakes
  2026-05-08 15:04       ` Lorenzo Stoakes
@ 2026-05-08 19:29       ` David Hildenbrand (Arm)
  2026-05-09  7:04       ` Wandun
  2 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand (Arm) @ 2026-05-08 19:29 UTC (permalink / raw)
  To: Lorenzo Stoakes, Chen Wandun
  Cc: akpm, shuah, zokeefe, linux-kernel, linux-mm, linux-kselftest

>>
>> In general, LGTM, but see for conflict:
>> https://lore.kernel.org/all/20260409014323.2385982-1-ye.liu@linux.dev/
> 
> Please use mm-unstable as a basis for your mm work Chen, this is something you
> need to fix, the patch above has been around for a while and is in
> mm-unstable.
> 
> You have patches in mm already so you should know better by now.
> 
> But I'm really not sure I'm in favour of this anyway. I'll defer to David but
> this feels useless to me.
I don't enjoy the underflow in the return statement, so I think we should just
clean that up.

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range
  2026-05-08 12:27   ` David Hildenbrand (Arm)
  2026-05-08 15:02     ` Lorenzo Stoakes
@ 2026-05-09  5:56     ` Wandun
  1 sibling, 0 replies; 14+ messages in thread
From: Wandun @ 2026-05-09  5:56 UTC (permalink / raw)
  To: David Hildenbrand (Arm), akpm, ljs, shuah, zokeefe
  Cc: linux-kernel, linux-mm, linux-kselftest



On 5/8/26 20:27, David Hildenbrand (Arm) wrote:
> On 5/7/26 09:05, Chen Wandun wrote:
>> madvise_collapse() computes the THP-aligned window:
>>
>>    hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK  /* round up  */
>>    hend   =  end   &  HPAGE_PMD_MASK                    /* round down */
>>
>> Previously this was done after kmalloc_obj(), so problem arose when
>> the range contained no complete PMD-aligned window (hstart >= hend).
>>
>> When hstart > hend, (hend - hstart) wraps unsigned to a huge value, the
>> final comparison fails and -EINVAL is returned instead of 0.  Consider
>> two single-page calls on a 2 MiB-aligned address:
>>
>>      /* hstart == hend == aligned  ->  0 == 0  ->  returns 0 */
>>      madvise(aligned, PAGE_SIZE, MADV_COLLAPSE);
>>
>>      /* hstart = aligned + 2MiB, hend = aligned
>>       * (hend - hstart) wraps unsigned  ->  returns -EINVAL */
>>      madvise(aligned + PAGE_SIZE, PAGE_SIZE, MADV_COLLAPSE);
>>
>> Both calls cover less than one THP and collapse nothing; both should
>> return 0.
> Okay, so we talk about a "userspace is being stupid" scenario.
>
>> In addition, kmalloc_obj(), mmgrab() and lru_add_drain_all() were all
>> called before discovering there was nothing to do, only for the code
>> to kfree() and return immediately after.
> Just a comment as you motivate here why this is suboptimal: we do not care about
> a "userspace is being stupid" scenario being fast.
>
>> Fix both by computing hstart/hend after thp_vma_allowable_order() but
>> before kmalloc_obj(), and returning 0 early when hstart >= hend.
>>
>> Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
> Fixes: is likely ok, but I don't think we want to treat this as a hotfix or CC
> stable.
Yes, agree, I would drop this Fixes tag in v2 to avoid any confusion.
>
>> Signed-off-by: Chen Wandun <chenwandun@lixiang.com>
>> ---
>>   mm/khugepaged.c | 9 ++++++---
>>   1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index b8452dbdb043..92473d93e837 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -2836,6 +2836,12 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>>   	if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_FORCED_COLLAPSE, PMD_ORDER))
>>   		return -EINVAL;
>>   
>> +	hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
>> +	hend = end & HPAGE_PMD_MASK;
>> +
>> +	if (hstart >= hend)
>> +		return 0;
>> +
>>   	cc = kmalloc_obj(*cc);
>>   	if (!cc)
>>   		return -ENOMEM;
>> @@ -2845,9 +2851,6 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>>   	mmgrab(mm);
>>   	lru_add_drain_all();
>>   
>> -	hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
>> -	hend = end & HPAGE_PMD_MASK;
>> -
>>   	for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
>>   		enum scan_result result = SCAN_FAIL;
>>   
> In general, LGTM, but see for conflict:
> https://lore.kernel.org/all/20260409014323.2385982-1-ye.liu@linux.dev/
Thanks for your review, I will fix the conflict and send v2 version.

Best regards,
Wandun
>
>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range
  2026-05-08 15:02     ` Lorenzo Stoakes
  2026-05-08 15:04       ` Lorenzo Stoakes
  2026-05-08 19:29       ` David Hildenbrand (Arm)
@ 2026-05-09  7:04       ` Wandun
  2 siblings, 0 replies; 14+ messages in thread
From: Wandun @ 2026-05-09  7:04 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: David Hildenbrand (Arm), akpm, shuah, zokeefe, linux-kernel,
	linux-mm, linux-kselftest



On 5/8/26 23:02, Lorenzo Stoakes wrote:
> On Fri, May 08, 2026 at 02:27:37PM +0200, David Hildenbrand (Arm) wrote:
>> On 5/7/26 09:05, Chen Wandun wrote:
>>> madvise_collapse() computes the THP-aligned window:
>>>
>>>    hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK  /* round up  */
>>>    hend   =  end   &  HPAGE_PMD_MASK                    /* round down */
>>>
>>> Previously this was done after kmalloc_obj(), so problem arose when
>>> the range contained no complete PMD-aligned window (hstart >= hend).
>>>
>>> When hstart > hend, (hend - hstart) wraps unsigned to a huge value, the
>>> final comparison fails and -EINVAL is returned instead of 0.  Consider
> I think both should return -EINVAL.
>
>>> two single-page calls on a 2 MiB-aligned address:
>>>
>>>      /* hstart == hend == aligned  ->  0 == 0  ->  returns 0 */
>>>      madvise(aligned, PAGE_SIZE, MADV_COLLAPSE);
> What's aligned? You're putting a random variable name in there? Presumably a PMD-aligned address?
Yes, PMD-aligned address.
>
>>>      /* hstart = aligned + 2MiB, hend = aligned
>>>       * (hend - hstart) wraps unsigned  ->  returns -EINVAL */
>>>      madvise(aligned + PAGE_SIZE, PAGE_SIZE, MADV_COLLAPSE);
>>>
>>> Both calls cover less than one THP and collapse nothing; both should
>>> return 0.
> Disagree.
>
>> Okay, so we talk about a "userspace is being stupid" scenario.
> Yes!
>
> I feel that -EINVAL is correct for hend > hstart, and I think it might even be a
> userland A[BP]I break to change it (maybe somebody, somewhere is being foolish
> enough to use this to also validate input ranges).
>
> The weirdness is when hstart == hend being 0 but that's sort of established
> behaviour I guess.
>
>>> In addition, kmalloc_obj(), mmgrab() and lru_add_drain_all() were all
>>> called before discovering there was nothing to do, only for the code
>>> to kfree() and return immediately after.
>> Just a comment as you motivate here why this is suboptimal: we do not care about
>> a "userspace is being stupid" scenario being fast.
> Yes, in general - so what? The user is doing stupid things, so the user wins
> stupid prizes?
>
>>> Fix both by computing hstart/hend after thp_vma_allowable_order() but
>>> before kmalloc_obj(), and returning 0 early when hstart >= hend.
>>>
>>> Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
>> Fixes: is likely ok, but I don't think we want to treat this as a hotfix or CC
>> stable.
> I'm not sure I want a fixes here, this isn't really fixing anything. This isn't
> a bug afaik, it's just us not handling this brilliantly, but (possibly by
> mistake) getting the right output.
Yes, I also thinks this patch only fixes minor issue or cosidered a 
clean-up.
I would drop this Fixes tag in v2 to avoid any confusion.
>
>>> Signed-off-by: Chen Wandun <chenwandun@lixiang.com>
> I put this patch through AI detection and it's telling me there's an 80% chance
> this whole thing is LLM-generated, which is making me grumpy.
>
> Can you confirm that this is, in fact, your own work? Plagiarism is not a nice
> thing to do, and THP doesn't need more traffic, we're overloaded as it is.
I can confirm this patch is my own work,I found the issue and wrote this 
patch myself.

The issue was found when I noticed THP pages were still being generated 
even after
adding "transparent_hugepage=never" to the cmdline, after debugging, and 
finally found
this was due to madvise + collapse path, while reviewing code I found 
this minor
issue and wrote this patch.

I did use an LLM, but only to check the commit message to find 
spelling/grammar
errors and improve readability.

I fully understand your concern about the traffic, I will be more 
careful about
what I send to the list.
>
>>> ---
>>>   mm/khugepaged.c | 9 ++++++---
>>>   1 file changed, 6 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>> index b8452dbdb043..92473d93e837 100644
>>> --- a/mm/khugepaged.c
>>> +++ b/mm/khugepaged.c
>>> @@ -2836,6 +2836,12 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>>>   	if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_FORCED_COLLAPSE, PMD_ORDER))
>>>   		return -EINVAL;
>>>
>>> +	hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
>>> +	hend = end & HPAGE_PMD_MASK;
> See below re: conflict.
>
>>> +
>>> +	if (hstart >= hend)
>>> +		return 0;
> 	if (hstart > hend)
> 		return -EINVAL;
> 	/* For compatibility, users may rely on this. */
> 	if (hstart == hend)
> 		return 0;
>
> Is probably better.
>
> But I'm not sure what the point is if we're already doing this behaviour?
>
>>> +
>>>   	cc = kmalloc_obj(*cc);
>>>   	if (!cc)
>>>   		return -ENOMEM;
>>> @@ -2845,9 +2851,6 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>>>   	mmgrab(mm);
>>>   	lru_add_drain_all();
>>>
>>> -	hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
>>> -	hend = end & HPAGE_PMD_MASK;
>>> -
>>>   	for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
>>>   		enum scan_result result = SCAN_FAIL;
>>>
>> In general, LGTM, but see for conflict:
>> https://lore.kernel.org/all/20260409014323.2385982-1-ye.liu@linux.dev/
> Please use mm-unstable as a basis for your mm work Chen, this is something you
> need to fix, the patch above has been around for a while and is in
> mm-unstable.
>
> You have patches in mm already so you should know better by now.
Apologies for not basing this on mm-unstable, I'll fix in v2.

Thanks for your review.

Best regards,
Wandun
>
> But I'm really not sure I'm in favour of this anyway. I'll defer to David but
> this feels useless to me.
>
>>
>> --
>> Cheers,
>>
>> David
> Thanks, Lorenzo


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range
  2026-05-08 15:04       ` Lorenzo Stoakes
@ 2026-05-09  7:53         ` Wandun
  0 siblings, 0 replies; 14+ messages in thread
From: Wandun @ 2026-05-09  7:53 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: David Hildenbrand (Arm), akpm, shuah, zokeefe, linux-kernel,
	linux-mm, linux-kselftest



On 5/8/26 23:04, Lorenzo Stoakes wrote:
> On Fri, May 08, 2026 at 04:02:37PM +0100, Lorenzo Stoakes wrote:
>> On Fri, May 08, 2026 at 02:27:37PM +0200, David Hildenbrand (Arm) wrote:
>>> On 5/7/26 09:05, Chen Wandun wrote:
>>>> madvise_collapse() computes the THP-aligned window:
>>>>
>>>>    hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK  /* round up  */
>>>>    hend   =  end   &  HPAGE_PMD_MASK                    /* round down */
>>>>
>>>> Previously this was done after kmalloc_obj(), so problem arose when
>>>> the range contained no complete PMD-aligned window (hstart >= hend).
>>>>
>>>> When hstart > hend, (hend - hstart) wraps unsigned to a huge value, the
>>>> final comparison fails and -EINVAL is returned instead of 0.  Consider
>> I think both should return -EINVAL.
> Correction: I changed my mind (see below), and think == should return 0 simply
> for compatibility reasons. Though honestly both really should have been -EINVAL
> from the start...
I also hesitated before return -EINVAL or 0, for the two cases mentioned 
above, it
didn't collapse into any THP, so I wanted to return -EINVAL.

Later, I saw the function comment for do_madvise and madvise manual, so 
I thought
the cases mentioned above didn't meet the conditions for returning 
-EINVAL, so
I followed madvise manual and do_madvise comment.

Best regards,
Wandun

>
>>>> two single-page calls on a 2 MiB-aligned address:
>>>>
>>>>      /* hstart == hend == aligned  ->  0 == 0  ->  returns 0 */
>>>>      madvise(aligned, PAGE_SIZE, MADV_COLLAPSE);
>> What's aligned? You're putting a random variable name in there? Presumably a PMD-aligned address?
>>
>>>>      /* hstart = aligned + 2MiB, hend = aligned
>>>>       * (hend - hstart) wraps unsigned  ->  returns -EINVAL */
>>>>      madvise(aligned + PAGE_SIZE, PAGE_SIZE, MADV_COLLAPSE);
>>>>
>>>> Both calls cover less than one THP and collapse nothing; both should
>>>> return 0.
>> Disagree.
>>
>>> Okay, so we talk about a "userspace is being stupid" scenario.
>> Yes!
>>
>> I feel that -EINVAL is correct for hend > hstart, and I think it might even be a
>> userland A[BP]I break to change it (maybe somebody, somewhere is being foolish
>> enough to use this to also validate input ranges).
>>
>> The weirdness is when hstart == hend being 0 but that's sort of established
>> behaviour I guess.
>>
>>>> In addition, kmalloc_obj(), mmgrab() and lru_add_drain_all() were all
>>>> called before discovering there was nothing to do, only for the code
>>>> to kfree() and return immediately after.
>>> Just a comment as you motivate here why this is suboptimal: we do not care about
>>> a "userspace is being stupid" scenario being fast.
>> Yes, in general - so what? The user is doing stupid things, so the user wins
>> stupid prizes?
>>
>>>> Fix both by computing hstart/hend after thp_vma_allowable_order() but
>>>> before kmalloc_obj(), and returning 0 early when hstart >= hend.
>>>>
>>>> Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
>>> Fixes: is likely ok, but I don't think we want to treat this as a hotfix or CC
>>> stable.
>> I'm not sure I want a fixes here, this isn't really fixing anything. This isn't
>> a bug afaik, it's just us not handling this brilliantly, but (possibly by
>> mistake) getting the right output.
>>
>>>> Signed-off-by: Chen Wandun <chenwandun@lixiang.com>
>> I put this patch through AI detection and it's telling me there's an 80% chance
>> this whole thing is LLM-generated, which is making me grumpy.
>>
>> Can you confirm that this is, in fact, your own work? Plagiarism is not a nice
>> thing to do, and THP doesn't need more traffic, we're overloaded as it is.
>>
>>>> ---
>>>>   mm/khugepaged.c | 9 ++++++---
>>>>   1 file changed, 6 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>> index b8452dbdb043..92473d93e837 100644
>>>> --- a/mm/khugepaged.c
>>>> +++ b/mm/khugepaged.c
>>>> @@ -2836,6 +2836,12 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>>>>   	if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_FORCED_COLLAPSE, PMD_ORDER))
>>>>   		return -EINVAL;
>>>>
>>>> +	hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
>>>> +	hend = end & HPAGE_PMD_MASK;
>> See below re: conflict.
>>
>>>> +
>>>> +	if (hstart >= hend)
>>>> +		return 0;
>> 	if (hstart > hend)
>> 		return -EINVAL;
>> 	/* For compatibility, users may rely on this. */
>> 	if (hstart == hend)
>> 		return 0;
>>
>> Is probably better.
>>
>> But I'm not sure what the point is if we're already doing this behaviour?
>>
>>>> +
>>>>   	cc = kmalloc_obj(*cc);
>>>>   	if (!cc)
>>>>   		return -ENOMEM;
>>>> @@ -2845,9 +2851,6 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>>>>   	mmgrab(mm);
>>>>   	lru_add_drain_all();
>>>>
>>>> -	hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
>>>> -	hend = end & HPAGE_PMD_MASK;
>>>> -
>>>>   	for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
>>>>   		enum scan_result result = SCAN_FAIL;
>>>>
>>> In general, LGTM, but see for conflict:
>>> https://lore.kernel.org/all/20260409014323.2385982-1-ye.liu@linux.dev/
>> Please use mm-unstable as a basis for your mm work Chen, this is something you
>> need to fix, the patch above has been around for a while and is in
>> mm-unstable.
>>
>> You have patches in mm already so you should know better by now.
>>
>> But I'm really not sure I'm in favour of this anyway. I'll defer to David but
>> this feels useless to me.
>>
>>>
>>> --
>>> Cheers,
>>>
>>> David
>> Thanks, Lorenzo


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] mm/khugepaged: fix sub-PMD MADV_COLLAPSE range handling
  2026-05-07  7:05 [PATCH 0/2] mm/khugepaged: fix sub-PMD MADV_COLLAPSE range handling Chen Wandun
  2026-05-07  7:05 ` [PATCH 1/2] mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range Chen Wandun
  2026-05-07  7:05 ` [PATCH 2/2] selftests/mm: add MADV_COLLAPSE sub-PMD range tests Chen Wandun
@ 2026-05-09  9:47 ` Lance Yang
  2026-05-11  2:06   ` Wandun
  2 siblings, 1 reply; 14+ messages in thread
From: Lance Yang @ 2026-05-09  9:47 UTC (permalink / raw)
  To: chenwandun1
  Cc: akpm, david, ljs, shuah, zokeefe, linux-kernel, linux-mm,
	linux-kselftest, Lance Yang

Hi,

scripts/get_maintainer.pl is your friend :)
Please use it to Cc the relevant maintainers and reviewers next time.

Cheers, Lance

On Thu, May 07, 2026 at 03:05:56PM +0800, Chen Wandun wrote:
>madvise_collapse() computes a THP-aligned window from the caller's range:
>
>  hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK  /* round up  */
>  hend   =  end   &  HPAGE_PMD_MASK                    /* round down */
>
>When the caller's range is smaller than one PMD (2 MiB) and/or not
>PMD-aligned, hstart can end up greater than hend.  In that case the
>collapsing loop is correctly skipped, but the return value was computed
>as ((hend - hstart) >> HPAGE_PMD_SHIFT): with hstart > hend the
>subtraction wraps unsigned, producing a huge value, the comparison
>"thps != 0" fires, and -EINVAL is returned instead of 0.
>
>A concrete example:
>
>  /* both cover less than one THP; both should return 0 */
>  madvise(aligned, PAGE_SIZE, MADV_COLLAPSE);             /* OK, returns 0 */
>  madvise(aligned + PAGE_SIZE, PAGE_SIZE, MADV_COLLAPSE); /* returns -EINVAL */
>
>The fix moves the hstart/hend calculation before kmalloc_obj() and
>returns 0 early when hstart >= hend.  This also avoids the kmalloc,
>mmgrab(), and lru_add_drain_all() calls for ranges that trivially
>contain no PMD window.  The same effect could be achieved by only
>guarding the final return expression, but early-return keeps the
>no-op path free of the allocator and drain overhead.
>
>Patch 1 fixes the kernel bug.
>Patch 2 adds a selftest with two cases covering the hstart == hend
>(aligned, was already correct) and hstart > hend (unaligned, was
>broken) scenarios.
>
>Chen Wandun (2):
>  mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range
>  selftests/mm: add MADV_COLLAPSE sub-PMD range tests
>
> mm/khugepaged.c                               |   9 +-
> tools/testing/selftests/mm/.gitignore         |   1 +
> tools/testing/selftests/mm/Makefile           |   2 +
> .../selftests/mm/ksft_madv_collapse.sh        |   4 +
> .../selftests/mm/madv_collapse_range.c        | 141 ++++++++++++++++++
> tools/testing/selftests/mm/run_vmtests.sh     |   5 +
> 6 files changed, 159 insertions(+), 3 deletions(-)
> create mode 100755 tools/testing/selftests/mm/ksft_madv_collapse.sh
> create mode 100644 tools/testing/selftests/mm/madv_collapse_range.c
>
>-- 
>2.43.0
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] mm/khugepaged: fix sub-PMD MADV_COLLAPSE range handling
  2026-05-09  9:47 ` [PATCH 0/2] mm/khugepaged: fix sub-PMD MADV_COLLAPSE range handling Lance Yang
@ 2026-05-11  2:06   ` Wandun
  0 siblings, 0 replies; 14+ messages in thread
From: Wandun @ 2026-05-11  2:06 UTC (permalink / raw)
  To: Lance Yang
  Cc: akpm, david, ljs, shuah, zokeefe, linux-kernel, linux-mm,
	linux-kselftest



On 5/9/26 17:47, Lance Yang wrote:
> Hi,
>
> scripts/get_maintainer.pl is your friend :)
> Please use it to Cc the relevant maintainers and reviewers next time.
Many thanks for your kind reminder :)
I will do it next time.

Best regards,
Wandun
>
> Cheers, Lance
>
> On Thu, May 07, 2026 at 03:05:56PM +0800, Chen Wandun wrote:
>> madvise_collapse() computes a THP-aligned window from the caller's range:
>>
>>   hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK  /* round up  */
>>   hend   =  end   &  HPAGE_PMD_MASK                    /* round down */
>>
>> When the caller's range is smaller than one PMD (2 MiB) and/or not
>> PMD-aligned, hstart can end up greater than hend.  In that case the
>> collapsing loop is correctly skipped, but the return value was computed
>> as ((hend - hstart) >> HPAGE_PMD_SHIFT): with hstart > hend the
>> subtraction wraps unsigned, producing a huge value, the comparison
>> "thps != 0" fires, and -EINVAL is returned instead of 0.
>>
>> A concrete example:
>>
>>   /* both cover less than one THP; both should return 0 */
>>   madvise(aligned, PAGE_SIZE, MADV_COLLAPSE);             /* OK, returns 0 */
>>   madvise(aligned + PAGE_SIZE, PAGE_SIZE, MADV_COLLAPSE); /* returns -EINVAL */
>>
>> The fix moves the hstart/hend calculation before kmalloc_obj() and
>> returns 0 early when hstart >= hend.  This also avoids the kmalloc,
>> mmgrab(), and lru_add_drain_all() calls for ranges that trivially
>> contain no PMD window.  The same effect could be achieved by only
>> guarding the final return expression, but early-return keeps the
>> no-op path free of the allocator and drain overhead.
>>
>> Patch 1 fixes the kernel bug.
>> Patch 2 adds a selftest with two cases covering the hstart == hend
>> (aligned, was already correct) and hstart > hend (unaligned, was
>> broken) scenarios.
>>
>> Chen Wandun (2):
>>   mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range
>>   selftests/mm: add MADV_COLLAPSE sub-PMD range tests
>>
>> mm/khugepaged.c                               |   9 +-
>> tools/testing/selftests/mm/.gitignore         |   1 +
>> tools/testing/selftests/mm/Makefile           |   2 +
>> .../selftests/mm/ksft_madv_collapse.sh        |   4 +
>> .../selftests/mm/madv_collapse_range.c        | 141 ++++++++++++++++++
>> tools/testing/selftests/mm/run_vmtests.sh     |   5 +
>> 6 files changed, 159 insertions(+), 3 deletions(-)
>> create mode 100755 tools/testing/selftests/mm/ksft_madv_collapse.sh
>> create mode 100644 tools/testing/selftests/mm/madv_collapse_range.c
>>
>> -- 
>> 2.43.0
>>
>>


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2026-05-11  2:06 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-07  7:05 [PATCH 0/2] mm/khugepaged: fix sub-PMD MADV_COLLAPSE range handling Chen Wandun
2026-05-07  7:05 ` [PATCH 1/2] mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range Chen Wandun
2026-05-08 12:27   ` David Hildenbrand (Arm)
2026-05-08 15:02     ` Lorenzo Stoakes
2026-05-08 15:04       ` Lorenzo Stoakes
2026-05-09  7:53         ` Wandun
2026-05-08 19:29       ` David Hildenbrand (Arm)
2026-05-09  7:04       ` Wandun
2026-05-09  5:56     ` Wandun
2026-05-07  7:05 ` [PATCH 2/2] selftests/mm: add MADV_COLLAPSE sub-PMD range tests Chen Wandun
2026-05-08 12:23   ` David Hildenbrand (Arm)
2026-05-08 15:03     ` Lorenzo Stoakes
2026-05-09  9:47 ` [PATCH 0/2] mm/khugepaged: fix sub-PMD MADV_COLLAPSE range handling Lance Yang
2026-05-11  2:06   ` Wandun

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox