* [PATCH 0/2] mm/khugepaged: fix sub-PMD MADV_COLLAPSE range handling
@ 2026-05-07 7:05 Chen Wandun
2026-05-07 7:05 ` [PATCH 1/2] mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range Chen Wandun
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Chen Wandun @ 2026-05-07 7:05 UTC (permalink / raw)
To: akpm, david, ljs, shuah, zokeefe; +Cc: linux-kernel, linux-mm, linux-kselftest
madvise_collapse() computes a THP-aligned window from the caller's range:
hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK /* round up */
hend = end & HPAGE_PMD_MASK /* round down */
When the caller's range is smaller than one PMD (2 MiB) and/or not
PMD-aligned, hstart can end up greater than hend. In that case the
collapsing loop is correctly skipped, but the return value was computed
as ((hend - hstart) >> HPAGE_PMD_SHIFT): with hstart > hend the
subtraction wraps unsigned, producing a huge value, the comparison
"thps != 0" fires, and -EINVAL is returned instead of 0.
A concrete example:
/* both cover less than one THP; both should return 0 */
madvise(aligned, PAGE_SIZE, MADV_COLLAPSE); /* OK, returns 0 */
madvise(aligned + PAGE_SIZE, PAGE_SIZE, MADV_COLLAPSE); /* returns -EINVAL */
The fix moves the hstart/hend calculation before kmalloc_obj() and
returns 0 early when hstart >= hend. This also avoids the kmalloc,
mmgrab(), and lru_add_drain_all() calls for ranges that trivially
contain no PMD window. The same effect could be achieved by only
guarding the final return expression, but early-return keeps the
no-op path free of the allocator and drain overhead.
Patch 1 fixes the kernel bug.
Patch 2 adds a selftest with two cases covering the hstart == hend
(aligned, was already correct) and hstart > hend (unaligned, was
broken) scenarios.
Chen Wandun (2):
mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range
selftests/mm: add MADV_COLLAPSE sub-PMD range tests
mm/khugepaged.c | 9 +-
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 2 +
.../selftests/mm/ksft_madv_collapse.sh | 4 +
.../selftests/mm/madv_collapse_range.c | 141 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 5 +
6 files changed, 159 insertions(+), 3 deletions(-)
create mode 100755 tools/testing/selftests/mm/ksft_madv_collapse.sh
create mode 100644 tools/testing/selftests/mm/madv_collapse_range.c
--
2.43.0
^ permalink raw reply [flat|nested] 14+ messages in thread* [PATCH 1/2] mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range 2026-05-07 7:05 [PATCH 0/2] mm/khugepaged: fix sub-PMD MADV_COLLAPSE range handling Chen Wandun @ 2026-05-07 7:05 ` Chen Wandun 2026-05-08 12:27 ` David Hildenbrand (Arm) 2026-05-07 7:05 ` [PATCH 2/2] selftests/mm: add MADV_COLLAPSE sub-PMD range tests Chen Wandun 2026-05-09 9:47 ` [PATCH 0/2] mm/khugepaged: fix sub-PMD MADV_COLLAPSE range handling Lance Yang 2 siblings, 1 reply; 14+ messages in thread From: Chen Wandun @ 2026-05-07 7:05 UTC (permalink / raw) To: akpm, david, ljs, shuah, zokeefe; +Cc: linux-kernel, linux-mm, linux-kselftest madvise_collapse() computes the THP-aligned window: hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK /* round up */ hend = end & HPAGE_PMD_MASK /* round down */ Previously this was done after kmalloc_obj(), so problem arose when the range contained no complete PMD-aligned window (hstart >= hend). When hstart > hend, (hend - hstart) wraps unsigned to a huge value, the final comparison fails and -EINVAL is returned instead of 0. Consider two single-page calls on a 2 MiB-aligned address: /* hstart == hend == aligned -> 0 == 0 -> returns 0 */ madvise(aligned, PAGE_SIZE, MADV_COLLAPSE); /* hstart = aligned + 2MiB, hend = aligned * (hend - hstart) wraps unsigned -> returns -EINVAL */ madvise(aligned + PAGE_SIZE, PAGE_SIZE, MADV_COLLAPSE); Both calls cover less than one THP and collapse nothing; both should return 0. In addition, kmalloc_obj(), mmgrab() and lru_add_drain_all() were all called before discovering there was nothing to do, only for the code to kfree() and return immediately after. Fix both by computing hstart/hend after thp_vma_allowable_order() but before kmalloc_obj(), and returning 0 early when hstart >= hend. Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") Signed-off-by: Chen Wandun <chenwandun@lixiang.com> --- mm/khugepaged.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index b8452dbdb043..92473d93e837 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -2836,6 +2836,12 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start, if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_FORCED_COLLAPSE, PMD_ORDER)) return -EINVAL; + hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK; + hend = end & HPAGE_PMD_MASK; + + if (hstart >= hend) + return 0; + cc = kmalloc_obj(*cc); if (!cc) return -ENOMEM; @@ -2845,9 +2851,6 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start, mmgrab(mm); lru_add_drain_all(); - hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK; - hend = end & HPAGE_PMD_MASK; - for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) { enum scan_result result = SCAN_FAIL; -- 2.43.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range 2026-05-07 7:05 ` [PATCH 1/2] mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range Chen Wandun @ 2026-05-08 12:27 ` David Hildenbrand (Arm) 2026-05-08 15:02 ` Lorenzo Stoakes 2026-05-09 5:56 ` Wandun 0 siblings, 2 replies; 14+ messages in thread From: David Hildenbrand (Arm) @ 2026-05-08 12:27 UTC (permalink / raw) To: Chen Wandun, akpm, ljs, shuah, zokeefe Cc: linux-kernel, linux-mm, linux-kselftest On 5/7/26 09:05, Chen Wandun wrote: > madvise_collapse() computes the THP-aligned window: > > hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK /* round up */ > hend = end & HPAGE_PMD_MASK /* round down */ > > Previously this was done after kmalloc_obj(), so problem arose when > the range contained no complete PMD-aligned window (hstart >= hend). > > When hstart > hend, (hend - hstart) wraps unsigned to a huge value, the > final comparison fails and -EINVAL is returned instead of 0. Consider > two single-page calls on a 2 MiB-aligned address: > > /* hstart == hend == aligned -> 0 == 0 -> returns 0 */ > madvise(aligned, PAGE_SIZE, MADV_COLLAPSE); > > /* hstart = aligned + 2MiB, hend = aligned > * (hend - hstart) wraps unsigned -> returns -EINVAL */ > madvise(aligned + PAGE_SIZE, PAGE_SIZE, MADV_COLLAPSE); > > Both calls cover less than one THP and collapse nothing; both should > return 0. Okay, so we talk about a "userspace is being stupid" scenario. > > In addition, kmalloc_obj(), mmgrab() and lru_add_drain_all() were all > called before discovering there was nothing to do, only for the code > to kfree() and return immediately after. Just a comment as you motivate here why this is suboptimal: we do not care about a "userspace is being stupid" scenario being fast. > > Fix both by computing hstart/hend after thp_vma_allowable_order() but > before kmalloc_obj(), and returning 0 early when hstart >= hend. > > Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") Fixes: is likely ok, but I don't think we want to treat this as a hotfix or CC stable. > Signed-off-by: Chen Wandun <chenwandun@lixiang.com> > --- > mm/khugepaged.c | 9 ++++++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > index b8452dbdb043..92473d93e837 100644 > --- a/mm/khugepaged.c > +++ b/mm/khugepaged.c > @@ -2836,6 +2836,12 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start, > if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_FORCED_COLLAPSE, PMD_ORDER)) > return -EINVAL; > > + hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK; > + hend = end & HPAGE_PMD_MASK; > + > + if (hstart >= hend) > + return 0; > + > cc = kmalloc_obj(*cc); > if (!cc) > return -ENOMEM; > @@ -2845,9 +2851,6 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start, > mmgrab(mm); > lru_add_drain_all(); > > - hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK; > - hend = end & HPAGE_PMD_MASK; > - > for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) { > enum scan_result result = SCAN_FAIL; > In general, LGTM, but see for conflict: https://lore.kernel.org/all/20260409014323.2385982-1-ye.liu@linux.dev/ -- Cheers, David ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range 2026-05-08 12:27 ` David Hildenbrand (Arm) @ 2026-05-08 15:02 ` Lorenzo Stoakes 2026-05-08 15:04 ` Lorenzo Stoakes ` (2 more replies) 2026-05-09 5:56 ` Wandun 1 sibling, 3 replies; 14+ messages in thread From: Lorenzo Stoakes @ 2026-05-08 15:02 UTC (permalink / raw) To: Chen Wandun Cc: David Hildenbrand (Arm), akpm, shuah, zokeefe, linux-kernel, linux-mm, linux-kselftest On Fri, May 08, 2026 at 02:27:37PM +0200, David Hildenbrand (Arm) wrote: > On 5/7/26 09:05, Chen Wandun wrote: > > madvise_collapse() computes the THP-aligned window: > > > > hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK /* round up */ > > hend = end & HPAGE_PMD_MASK /* round down */ > > > > Previously this was done after kmalloc_obj(), so problem arose when > > the range contained no complete PMD-aligned window (hstart >= hend). > > > > When hstart > hend, (hend - hstart) wraps unsigned to a huge value, the > > final comparison fails and -EINVAL is returned instead of 0. Consider I think both should return -EINVAL. > > two single-page calls on a 2 MiB-aligned address: > > > > /* hstart == hend == aligned -> 0 == 0 -> returns 0 */ > > madvise(aligned, PAGE_SIZE, MADV_COLLAPSE); What's aligned? You're putting a random variable name in there? Presumably a PMD-aligned address? > > > > /* hstart = aligned + 2MiB, hend = aligned > > * (hend - hstart) wraps unsigned -> returns -EINVAL */ > > madvise(aligned + PAGE_SIZE, PAGE_SIZE, MADV_COLLAPSE); > > > > Both calls cover less than one THP and collapse nothing; both should > > return 0. Disagree. > > Okay, so we talk about a "userspace is being stupid" scenario. Yes! I feel that -EINVAL is correct for hend > hstart, and I think it might even be a userland A[BP]I break to change it (maybe somebody, somewhere is being foolish enough to use this to also validate input ranges). The weirdness is when hstart == hend being 0 but that's sort of established behaviour I guess. > > > > > In addition, kmalloc_obj(), mmgrab() and lru_add_drain_all() were all > > called before discovering there was nothing to do, only for the code > > to kfree() and return immediately after. > > Just a comment as you motivate here why this is suboptimal: we do not care about > a "userspace is being stupid" scenario being fast. Yes, in general - so what? The user is doing stupid things, so the user wins stupid prizes? > > > > > Fix both by computing hstart/hend after thp_vma_allowable_order() but > > before kmalloc_obj(), and returning 0 early when hstart >= hend. > > > > Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") > > Fixes: is likely ok, but I don't think we want to treat this as a hotfix or CC > stable. I'm not sure I want a fixes here, this isn't really fixing anything. This isn't a bug afaik, it's just us not handling this brilliantly, but (possibly by mistake) getting the right output. > > > Signed-off-by: Chen Wandun <chenwandun@lixiang.com> I put this patch through AI detection and it's telling me there's an 80% chance this whole thing is LLM-generated, which is making me grumpy. Can you confirm that this is, in fact, your own work? Plagiarism is not a nice thing to do, and THP doesn't need more traffic, we're overloaded as it is. > > --- > > mm/khugepaged.c | 9 ++++++--- > > 1 file changed, 6 insertions(+), 3 deletions(-) > > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > > index b8452dbdb043..92473d93e837 100644 > > --- a/mm/khugepaged.c > > +++ b/mm/khugepaged.c > > @@ -2836,6 +2836,12 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start, > > if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_FORCED_COLLAPSE, PMD_ORDER)) > > return -EINVAL; > > > > + hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK; > > + hend = end & HPAGE_PMD_MASK; See below re: conflict. > > + > > + if (hstart >= hend) > > + return 0; if (hstart > hend) return -EINVAL; /* For compatibility, users may rely on this. */ if (hstart == hend) return 0; Is probably better. But I'm not sure what the point is if we're already doing this behaviour? > > + > > cc = kmalloc_obj(*cc); > > if (!cc) > > return -ENOMEM; > > @@ -2845,9 +2851,6 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start, > > mmgrab(mm); > > lru_add_drain_all(); > > > > - hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK; > > - hend = end & HPAGE_PMD_MASK; > > - > > for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) { > > enum scan_result result = SCAN_FAIL; > > > > In general, LGTM, but see for conflict: > https://lore.kernel.org/all/20260409014323.2385982-1-ye.liu@linux.dev/ Please use mm-unstable as a basis for your mm work Chen, this is something you need to fix, the patch above has been around for a while and is in mm-unstable. You have patches in mm already so you should know better by now. But I'm really not sure I'm in favour of this anyway. I'll defer to David but this feels useless to me. > > > -- > Cheers, > > David Thanks, Lorenzo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range 2026-05-08 15:02 ` Lorenzo Stoakes @ 2026-05-08 15:04 ` Lorenzo Stoakes 2026-05-09 7:53 ` Wandun 2026-05-08 19:29 ` David Hildenbrand (Arm) 2026-05-09 7:04 ` Wandun 2 siblings, 1 reply; 14+ messages in thread From: Lorenzo Stoakes @ 2026-05-08 15:04 UTC (permalink / raw) To: Chen Wandun Cc: David Hildenbrand (Arm), akpm, shuah, zokeefe, linux-kernel, linux-mm, linux-kselftest On Fri, May 08, 2026 at 04:02:37PM +0100, Lorenzo Stoakes wrote: > On Fri, May 08, 2026 at 02:27:37PM +0200, David Hildenbrand (Arm) wrote: > > On 5/7/26 09:05, Chen Wandun wrote: > > > madvise_collapse() computes the THP-aligned window: > > > > > > hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK /* round up */ > > > hend = end & HPAGE_PMD_MASK /* round down */ > > > > > > Previously this was done after kmalloc_obj(), so problem arose when > > > the range contained no complete PMD-aligned window (hstart >= hend). > > > > > > When hstart > hend, (hend - hstart) wraps unsigned to a huge value, the > > > final comparison fails and -EINVAL is returned instead of 0. Consider > > I think both should return -EINVAL. Correction: I changed my mind (see below), and think == should return 0 simply for compatibility reasons. Though honestly both really should have been -EINVAL from the start... > > > > two single-page calls on a 2 MiB-aligned address: > > > > > > /* hstart == hend == aligned -> 0 == 0 -> returns 0 */ > > > madvise(aligned, PAGE_SIZE, MADV_COLLAPSE); > > What's aligned? You're putting a random variable name in there? Presumably a PMD-aligned address? > > > > > > > /* hstart = aligned + 2MiB, hend = aligned > > > * (hend - hstart) wraps unsigned -> returns -EINVAL */ > > > madvise(aligned + PAGE_SIZE, PAGE_SIZE, MADV_COLLAPSE); > > > > > > Both calls cover less than one THP and collapse nothing; both should > > > return 0. > > Disagree. > > > > > Okay, so we talk about a "userspace is being stupid" scenario. > > Yes! > > I feel that -EINVAL is correct for hend > hstart, and I think it might even be a > userland A[BP]I break to change it (maybe somebody, somewhere is being foolish > enough to use this to also validate input ranges). > > The weirdness is when hstart == hend being 0 but that's sort of established > behaviour I guess. > > > > > > > > > In addition, kmalloc_obj(), mmgrab() and lru_add_drain_all() were all > > > called before discovering there was nothing to do, only for the code > > > to kfree() and return immediately after. > > > > Just a comment as you motivate here why this is suboptimal: we do not care about > > a "userspace is being stupid" scenario being fast. > > Yes, in general - so what? The user is doing stupid things, so the user wins > stupid prizes? > > > > > > > > > Fix both by computing hstart/hend after thp_vma_allowable_order() but > > > before kmalloc_obj(), and returning 0 early when hstart >= hend. > > > > > > Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") > > > > Fixes: is likely ok, but I don't think we want to treat this as a hotfix or CC > > stable. > > I'm not sure I want a fixes here, this isn't really fixing anything. This isn't > a bug afaik, it's just us not handling this brilliantly, but (possibly by > mistake) getting the right output. > > > > > > Signed-off-by: Chen Wandun <chenwandun@lixiang.com> > > I put this patch through AI detection and it's telling me there's an 80% chance > this whole thing is LLM-generated, which is making me grumpy. > > Can you confirm that this is, in fact, your own work? Plagiarism is not a nice > thing to do, and THP doesn't need more traffic, we're overloaded as it is. > > > > --- > > > mm/khugepaged.c | 9 ++++++--- > > > 1 file changed, 6 insertions(+), 3 deletions(-) > > > > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > > > index b8452dbdb043..92473d93e837 100644 > > > --- a/mm/khugepaged.c > > > +++ b/mm/khugepaged.c > > > @@ -2836,6 +2836,12 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start, > > > if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_FORCED_COLLAPSE, PMD_ORDER)) > > > return -EINVAL; > > > > > > + hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK; > > > + hend = end & HPAGE_PMD_MASK; > > See below re: conflict. > > > > + > > > + if (hstart >= hend) > > > + return 0; > > if (hstart > hend) > return -EINVAL; > /* For compatibility, users may rely on this. */ > if (hstart == hend) > return 0; > > Is probably better. > > But I'm not sure what the point is if we're already doing this behaviour? > > > > + > > > cc = kmalloc_obj(*cc); > > > if (!cc) > > > return -ENOMEM; > > > @@ -2845,9 +2851,6 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start, > > > mmgrab(mm); > > > lru_add_drain_all(); > > > > > > - hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK; > > > - hend = end & HPAGE_PMD_MASK; > > > - > > > for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) { > > > enum scan_result result = SCAN_FAIL; > > > > > > > In general, LGTM, but see for conflict: > > https://lore.kernel.org/all/20260409014323.2385982-1-ye.liu@linux.dev/ > > Please use mm-unstable as a basis for your mm work Chen, this is something you > need to fix, the patch above has been around for a while and is in > mm-unstable. > > You have patches in mm already so you should know better by now. > > But I'm really not sure I'm in favour of this anyway. I'll defer to David but > this feels useless to me. > > > > > > > -- > > Cheers, > > > > David > > Thanks, Lorenzo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range 2026-05-08 15:04 ` Lorenzo Stoakes @ 2026-05-09 7:53 ` Wandun 0 siblings, 0 replies; 14+ messages in thread From: Wandun @ 2026-05-09 7:53 UTC (permalink / raw) To: Lorenzo Stoakes Cc: David Hildenbrand (Arm), akpm, shuah, zokeefe, linux-kernel, linux-mm, linux-kselftest On 5/8/26 23:04, Lorenzo Stoakes wrote: > On Fri, May 08, 2026 at 04:02:37PM +0100, Lorenzo Stoakes wrote: >> On Fri, May 08, 2026 at 02:27:37PM +0200, David Hildenbrand (Arm) wrote: >>> On 5/7/26 09:05, Chen Wandun wrote: >>>> madvise_collapse() computes the THP-aligned window: >>>> >>>> hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK /* round up */ >>>> hend = end & HPAGE_PMD_MASK /* round down */ >>>> >>>> Previously this was done after kmalloc_obj(), so problem arose when >>>> the range contained no complete PMD-aligned window (hstart >= hend). >>>> >>>> When hstart > hend, (hend - hstart) wraps unsigned to a huge value, the >>>> final comparison fails and -EINVAL is returned instead of 0. Consider >> I think both should return -EINVAL. > Correction: I changed my mind (see below), and think == should return 0 simply > for compatibility reasons. Though honestly both really should have been -EINVAL > from the start... I also hesitated before return -EINVAL or 0, for the two cases mentioned above, it didn't collapse into any THP, so I wanted to return -EINVAL. Later, I saw the function comment for do_madvise and madvise manual, so I thought the cases mentioned above didn't meet the conditions for returning -EINVAL, so I followed madvise manual and do_madvise comment. Best regards, Wandun > >>>> two single-page calls on a 2 MiB-aligned address: >>>> >>>> /* hstart == hend == aligned -> 0 == 0 -> returns 0 */ >>>> madvise(aligned, PAGE_SIZE, MADV_COLLAPSE); >> What's aligned? You're putting a random variable name in there? Presumably a PMD-aligned address? >> >>>> /* hstart = aligned + 2MiB, hend = aligned >>>> * (hend - hstart) wraps unsigned -> returns -EINVAL */ >>>> madvise(aligned + PAGE_SIZE, PAGE_SIZE, MADV_COLLAPSE); >>>> >>>> Both calls cover less than one THP and collapse nothing; both should >>>> return 0. >> Disagree. >> >>> Okay, so we talk about a "userspace is being stupid" scenario. >> Yes! >> >> I feel that -EINVAL is correct for hend > hstart, and I think it might even be a >> userland A[BP]I break to change it (maybe somebody, somewhere is being foolish >> enough to use this to also validate input ranges). >> >> The weirdness is when hstart == hend being 0 but that's sort of established >> behaviour I guess. >> >>>> In addition, kmalloc_obj(), mmgrab() and lru_add_drain_all() were all >>>> called before discovering there was nothing to do, only for the code >>>> to kfree() and return immediately after. >>> Just a comment as you motivate here why this is suboptimal: we do not care about >>> a "userspace is being stupid" scenario being fast. >> Yes, in general - so what? The user is doing stupid things, so the user wins >> stupid prizes? >> >>>> Fix both by computing hstart/hend after thp_vma_allowable_order() but >>>> before kmalloc_obj(), and returning 0 early when hstart >= hend. >>>> >>>> Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") >>> Fixes: is likely ok, but I don't think we want to treat this as a hotfix or CC >>> stable. >> I'm not sure I want a fixes here, this isn't really fixing anything. This isn't >> a bug afaik, it's just us not handling this brilliantly, but (possibly by >> mistake) getting the right output. >> >>>> Signed-off-by: Chen Wandun <chenwandun@lixiang.com> >> I put this patch through AI detection and it's telling me there's an 80% chance >> this whole thing is LLM-generated, which is making me grumpy. >> >> Can you confirm that this is, in fact, your own work? Plagiarism is not a nice >> thing to do, and THP doesn't need more traffic, we're overloaded as it is. >> >>>> --- >>>> mm/khugepaged.c | 9 ++++++--- >>>> 1 file changed, 6 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c >>>> index b8452dbdb043..92473d93e837 100644 >>>> --- a/mm/khugepaged.c >>>> +++ b/mm/khugepaged.c >>>> @@ -2836,6 +2836,12 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start, >>>> if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_FORCED_COLLAPSE, PMD_ORDER)) >>>> return -EINVAL; >>>> >>>> + hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK; >>>> + hend = end & HPAGE_PMD_MASK; >> See below re: conflict. >> >>>> + >>>> + if (hstart >= hend) >>>> + return 0; >> if (hstart > hend) >> return -EINVAL; >> /* For compatibility, users may rely on this. */ >> if (hstart == hend) >> return 0; >> >> Is probably better. >> >> But I'm not sure what the point is if we're already doing this behaviour? >> >>>> + >>>> cc = kmalloc_obj(*cc); >>>> if (!cc) >>>> return -ENOMEM; >>>> @@ -2845,9 +2851,6 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start, >>>> mmgrab(mm); >>>> lru_add_drain_all(); >>>> >>>> - hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK; >>>> - hend = end & HPAGE_PMD_MASK; >>>> - >>>> for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) { >>>> enum scan_result result = SCAN_FAIL; >>>> >>> In general, LGTM, but see for conflict: >>> https://lore.kernel.org/all/20260409014323.2385982-1-ye.liu@linux.dev/ >> Please use mm-unstable as a basis for your mm work Chen, this is something you >> need to fix, the patch above has been around for a while and is in >> mm-unstable. >> >> You have patches in mm already so you should know better by now. >> >> But I'm really not sure I'm in favour of this anyway. I'll defer to David but >> this feels useless to me. >> >>> >>> -- >>> Cheers, >>> >>> David >> Thanks, Lorenzo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range 2026-05-08 15:02 ` Lorenzo Stoakes 2026-05-08 15:04 ` Lorenzo Stoakes @ 2026-05-08 19:29 ` David Hildenbrand (Arm) 2026-05-09 7:04 ` Wandun 2 siblings, 0 replies; 14+ messages in thread From: David Hildenbrand (Arm) @ 2026-05-08 19:29 UTC (permalink / raw) To: Lorenzo Stoakes, Chen Wandun Cc: akpm, shuah, zokeefe, linux-kernel, linux-mm, linux-kselftest >> >> In general, LGTM, but see for conflict: >> https://lore.kernel.org/all/20260409014323.2385982-1-ye.liu@linux.dev/ > > Please use mm-unstable as a basis for your mm work Chen, this is something you > need to fix, the patch above has been around for a while and is in > mm-unstable. > > You have patches in mm already so you should know better by now. > > But I'm really not sure I'm in favour of this anyway. I'll defer to David but > this feels useless to me. I don't enjoy the underflow in the return statement, so I think we should just clean that up. -- Cheers, David ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range 2026-05-08 15:02 ` Lorenzo Stoakes 2026-05-08 15:04 ` Lorenzo Stoakes 2026-05-08 19:29 ` David Hildenbrand (Arm) @ 2026-05-09 7:04 ` Wandun 2 siblings, 0 replies; 14+ messages in thread From: Wandun @ 2026-05-09 7:04 UTC (permalink / raw) To: Lorenzo Stoakes Cc: David Hildenbrand (Arm), akpm, shuah, zokeefe, linux-kernel, linux-mm, linux-kselftest On 5/8/26 23:02, Lorenzo Stoakes wrote: > On Fri, May 08, 2026 at 02:27:37PM +0200, David Hildenbrand (Arm) wrote: >> On 5/7/26 09:05, Chen Wandun wrote: >>> madvise_collapse() computes the THP-aligned window: >>> >>> hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK /* round up */ >>> hend = end & HPAGE_PMD_MASK /* round down */ >>> >>> Previously this was done after kmalloc_obj(), so problem arose when >>> the range contained no complete PMD-aligned window (hstart >= hend). >>> >>> When hstart > hend, (hend - hstart) wraps unsigned to a huge value, the >>> final comparison fails and -EINVAL is returned instead of 0. Consider > I think both should return -EINVAL. > >>> two single-page calls on a 2 MiB-aligned address: >>> >>> /* hstart == hend == aligned -> 0 == 0 -> returns 0 */ >>> madvise(aligned, PAGE_SIZE, MADV_COLLAPSE); > What's aligned? You're putting a random variable name in there? Presumably a PMD-aligned address? Yes, PMD-aligned address. > >>> /* hstart = aligned + 2MiB, hend = aligned >>> * (hend - hstart) wraps unsigned -> returns -EINVAL */ >>> madvise(aligned + PAGE_SIZE, PAGE_SIZE, MADV_COLLAPSE); >>> >>> Both calls cover less than one THP and collapse nothing; both should >>> return 0. > Disagree. > >> Okay, so we talk about a "userspace is being stupid" scenario. > Yes! > > I feel that -EINVAL is correct for hend > hstart, and I think it might even be a > userland A[BP]I break to change it (maybe somebody, somewhere is being foolish > enough to use this to also validate input ranges). > > The weirdness is when hstart == hend being 0 but that's sort of established > behaviour I guess. > >>> In addition, kmalloc_obj(), mmgrab() and lru_add_drain_all() were all >>> called before discovering there was nothing to do, only for the code >>> to kfree() and return immediately after. >> Just a comment as you motivate here why this is suboptimal: we do not care about >> a "userspace is being stupid" scenario being fast. > Yes, in general - so what? The user is doing stupid things, so the user wins > stupid prizes? > >>> Fix both by computing hstart/hend after thp_vma_allowable_order() but >>> before kmalloc_obj(), and returning 0 early when hstart >= hend. >>> >>> Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") >> Fixes: is likely ok, but I don't think we want to treat this as a hotfix or CC >> stable. > I'm not sure I want a fixes here, this isn't really fixing anything. This isn't > a bug afaik, it's just us not handling this brilliantly, but (possibly by > mistake) getting the right output. Yes, I also thinks this patch only fixes minor issue or cosidered a clean-up. I would drop this Fixes tag in v2 to avoid any confusion. > >>> Signed-off-by: Chen Wandun <chenwandun@lixiang.com> > I put this patch through AI detection and it's telling me there's an 80% chance > this whole thing is LLM-generated, which is making me grumpy. > > Can you confirm that this is, in fact, your own work? Plagiarism is not a nice > thing to do, and THP doesn't need more traffic, we're overloaded as it is. I can confirm this patch is my own work,I found the issue and wrote this patch myself. The issue was found when I noticed THP pages were still being generated even after adding "transparent_hugepage=never" to the cmdline, after debugging, and finally found this was due to madvise + collapse path, while reviewing code I found this minor issue and wrote this patch. I did use an LLM, but only to check the commit message to find spelling/grammar errors and improve readability. I fully understand your concern about the traffic, I will be more careful about what I send to the list. > >>> --- >>> mm/khugepaged.c | 9 ++++++--- >>> 1 file changed, 6 insertions(+), 3 deletions(-) >>> >>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c >>> index b8452dbdb043..92473d93e837 100644 >>> --- a/mm/khugepaged.c >>> +++ b/mm/khugepaged.c >>> @@ -2836,6 +2836,12 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start, >>> if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_FORCED_COLLAPSE, PMD_ORDER)) >>> return -EINVAL; >>> >>> + hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK; >>> + hend = end & HPAGE_PMD_MASK; > See below re: conflict. > >>> + >>> + if (hstart >= hend) >>> + return 0; > if (hstart > hend) > return -EINVAL; > /* For compatibility, users may rely on this. */ > if (hstart == hend) > return 0; > > Is probably better. > > But I'm not sure what the point is if we're already doing this behaviour? > >>> + >>> cc = kmalloc_obj(*cc); >>> if (!cc) >>> return -ENOMEM; >>> @@ -2845,9 +2851,6 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start, >>> mmgrab(mm); >>> lru_add_drain_all(); >>> >>> - hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK; >>> - hend = end & HPAGE_PMD_MASK; >>> - >>> for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) { >>> enum scan_result result = SCAN_FAIL; >>> >> In general, LGTM, but see for conflict: >> https://lore.kernel.org/all/20260409014323.2385982-1-ye.liu@linux.dev/ > Please use mm-unstable as a basis for your mm work Chen, this is something you > need to fix, the patch above has been around for a while and is in > mm-unstable. > > You have patches in mm already so you should know better by now. Apologies for not basing this on mm-unstable, I'll fix in v2. Thanks for your review. Best regards, Wandun > > But I'm really not sure I'm in favour of this anyway. I'll defer to David but > this feels useless to me. > >> >> -- >> Cheers, >> >> David > Thanks, Lorenzo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range 2026-05-08 12:27 ` David Hildenbrand (Arm) 2026-05-08 15:02 ` Lorenzo Stoakes @ 2026-05-09 5:56 ` Wandun 1 sibling, 0 replies; 14+ messages in thread From: Wandun @ 2026-05-09 5:56 UTC (permalink / raw) To: David Hildenbrand (Arm), akpm, ljs, shuah, zokeefe Cc: linux-kernel, linux-mm, linux-kselftest On 5/8/26 20:27, David Hildenbrand (Arm) wrote: > On 5/7/26 09:05, Chen Wandun wrote: >> madvise_collapse() computes the THP-aligned window: >> >> hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK /* round up */ >> hend = end & HPAGE_PMD_MASK /* round down */ >> >> Previously this was done after kmalloc_obj(), so problem arose when >> the range contained no complete PMD-aligned window (hstart >= hend). >> >> When hstart > hend, (hend - hstart) wraps unsigned to a huge value, the >> final comparison fails and -EINVAL is returned instead of 0. Consider >> two single-page calls on a 2 MiB-aligned address: >> >> /* hstart == hend == aligned -> 0 == 0 -> returns 0 */ >> madvise(aligned, PAGE_SIZE, MADV_COLLAPSE); >> >> /* hstart = aligned + 2MiB, hend = aligned >> * (hend - hstart) wraps unsigned -> returns -EINVAL */ >> madvise(aligned + PAGE_SIZE, PAGE_SIZE, MADV_COLLAPSE); >> >> Both calls cover less than one THP and collapse nothing; both should >> return 0. > Okay, so we talk about a "userspace is being stupid" scenario. > >> In addition, kmalloc_obj(), mmgrab() and lru_add_drain_all() were all >> called before discovering there was nothing to do, only for the code >> to kfree() and return immediately after. > Just a comment as you motivate here why this is suboptimal: we do not care about > a "userspace is being stupid" scenario being fast. > >> Fix both by computing hstart/hend after thp_vma_allowable_order() but >> before kmalloc_obj(), and returning 0 early when hstart >= hend. >> >> Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") > Fixes: is likely ok, but I don't think we want to treat this as a hotfix or CC > stable. Yes, agree, I would drop this Fixes tag in v2 to avoid any confusion. > >> Signed-off-by: Chen Wandun <chenwandun@lixiang.com> >> --- >> mm/khugepaged.c | 9 ++++++--- >> 1 file changed, 6 insertions(+), 3 deletions(-) >> >> diff --git a/mm/khugepaged.c b/mm/khugepaged.c >> index b8452dbdb043..92473d93e837 100644 >> --- a/mm/khugepaged.c >> +++ b/mm/khugepaged.c >> @@ -2836,6 +2836,12 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start, >> if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_FORCED_COLLAPSE, PMD_ORDER)) >> return -EINVAL; >> >> + hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK; >> + hend = end & HPAGE_PMD_MASK; >> + >> + if (hstart >= hend) >> + return 0; >> + >> cc = kmalloc_obj(*cc); >> if (!cc) >> return -ENOMEM; >> @@ -2845,9 +2851,6 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start, >> mmgrab(mm); >> lru_add_drain_all(); >> >> - hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK; >> - hend = end & HPAGE_PMD_MASK; >> - >> for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) { >> enum scan_result result = SCAN_FAIL; >> > In general, LGTM, but see for conflict: > https://lore.kernel.org/all/20260409014323.2385982-1-ye.liu@linux.dev/ Thanks for your review, I will fix the conflict and send v2 version. Best regards, Wandun > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 2/2] selftests/mm: add MADV_COLLAPSE sub-PMD range tests 2026-05-07 7:05 [PATCH 0/2] mm/khugepaged: fix sub-PMD MADV_COLLAPSE range handling Chen Wandun 2026-05-07 7:05 ` [PATCH 1/2] mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range Chen Wandun @ 2026-05-07 7:05 ` Chen Wandun 2026-05-08 12:23 ` David Hildenbrand (Arm) 2026-05-09 9:47 ` [PATCH 0/2] mm/khugepaged: fix sub-PMD MADV_COLLAPSE range handling Lance Yang 2 siblings, 1 reply; 14+ messages in thread From: Chen Wandun @ 2026-05-07 7:05 UTC (permalink / raw) To: akpm, david, ljs, shuah, zokeefe; +Cc: linux-kernel, linux-mm, linux-kselftest Add madv_collapse_range to verify the fix for the spurious -EINVAL returned by madvise_collapse() when the madvised range contains no complete PMD-aligned window. madvise_collapse() rounds the caller range inward to PMD boundaries: hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK // round up hend = end & HPAGE_PMD_MASK // round down When hstart >= hend there is no PMD window to collapse. Previously the final expression computed (hend - hstart) without guarding against hstart > hend, causing unsigned wrap-around and a spurious -EINVAL. Both tests expect 0: "no PMD window to collapse" is a successful no-op. Test 1 - aligned start (hstart == hend): start = 2MiB-aligned, len = PAGE_SIZE Both hstart and hend round to the same 2MiB boundary. Was already correct; included as a regression reference. Test 2 - unaligned start (hstart > hend): start = aligned + PAGE_SIZE, len = PAGE_SIZE hstart rounds up past the next 2MiB boundary while hend rounds down below start. (hend - hstart) wrapped unsigned, causing the final comparison to fail and return -EINVAL instead of 0. Signed-off-by: Chen Wandun <chenwandun@lixiang.com> --- tools/testing/selftests/mm/.gitignore | 1 + tools/testing/selftests/mm/Makefile | 2 + .../selftests/mm/ksft_madv_collapse.sh | 4 + .../selftests/mm/madv_collapse_range.c | 141 ++++++++++++++++++ tools/testing/selftests/mm/run_vmtests.sh | 5 + 5 files changed, 153 insertions(+) create mode 100755 tools/testing/selftests/mm/ksft_madv_collapse.sh create mode 100644 tools/testing/selftests/mm/madv_collapse_range.c diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore index b0c30c5ee9e3..a24f8c3cf3dc 100644 --- a/tools/testing/selftests/mm/.gitignore +++ b/tools/testing/selftests/mm/.gitignore @@ -28,6 +28,7 @@ protection_keys protection_keys_32 protection_keys_64 madv_populate +madv_collapse_range uffd-stress uffd-unit-tests uffd-wp-mremap diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile index cd24596cdd27..758639f8ae8e 100644 --- a/tools/testing/selftests/mm/Makefile +++ b/tools/testing/selftests/mm/Makefile @@ -68,6 +68,7 @@ TEST_GEN_FILES += hugepage-mremap TEST_GEN_FILES += hugepage-shm TEST_GEN_FILES += hugepage-vmemmap TEST_GEN_FILES += khugepaged +TEST_GEN_FILES += madv_collapse_range TEST_GEN_FILES += madv_populate TEST_GEN_FILES += map_fixed_noreplace TEST_GEN_FILES += map_hugetlb @@ -153,6 +154,7 @@ TEST_PROGS += ksft_hugetlb.sh TEST_PROGS += ksft_hugevm.sh TEST_PROGS += ksft_ksm.sh TEST_PROGS += ksft_ksm_numa.sh +TEST_PROGS += ksft_madv_collapse.sh TEST_PROGS += ksft_madv_guard.sh TEST_PROGS += ksft_madv_populate.sh TEST_PROGS += ksft_memfd_secret.sh diff --git a/tools/testing/selftests/mm/ksft_madv_collapse.sh b/tools/testing/selftests/mm/ksft_madv_collapse.sh new file mode 100755 index 000000000000..0d0b0356cbd0 --- /dev/null +++ b/tools/testing/selftests/mm/ksft_madv_collapse.sh @@ -0,0 +1,4 @@ +#!/bin/sh -e +# SPDX-License-Identifier: GPL-2.0 + +./run_vmtests.sh -t madv_collapse diff --git a/tools/testing/selftests/mm/madv_collapse_range.c b/tools/testing/selftests/mm/madv_collapse_range.c new file mode 100644 index 000000000000..11850be80dd8 --- /dev/null +++ b/tools/testing/selftests/mm/madv_collapse_range.c @@ -0,0 +1,141 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Tests for MADV_COLLAPSE behavior when the madvise range contains no + * complete PMD-aligned window (range smaller than 2 MiB). + * + * madvise_collapse() rounds the caller range inward to PMD boundaries: + * + * hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK // round up + * hend = end & HPAGE_PMD_MASK // round down + * + * When hstart >= hend the collapsing loop is not entered. Previously, + * the final return expression computed (hend - hstart) without guarding + * against hstart > hend, causing unsigned wrap-around and a spurious + * -EINVAL. Both tests expect 0: "no PMD window to collapse" is a + * successful no-op, not an error. + * + * Test 1: aligned start (hstart == hend): + * start = 2MiB-aligned, len = PAGE_SIZE + * hstart = aligned, hend = aligned -> 0 == 0 -> 0 (was already correct) + * + * Test 2: unaligned start (hstart > hend): + * start = aligned + PAGE_SIZE, len = PAGE_SIZE + * hstart = aligned + 2MiB, hend = aligned + * (hend - hstart) wraps unsigned -> was -EINVAL, fixed to 0 + */ +#define _GNU_SOURCE +#include <errno.h> +#include <string.h> +#include <unistd.h> +#include <sys/mman.h> +#include <linux/mman.h> + +#include "kselftest.h" +#include "vm_util.h" + +#ifndef MADV_COLLAPSE +#define MADV_COLLAPSE 25 +#endif + +static unsigned long page_size; +static unsigned long hpage_size; + +/* + * Test 1: 2MiB-aligned start, len = PAGE_SIZE. + * hstart == hend -> 0 + */ +static void test_subpmd_aligned(char *aligned) +{ + int ret; + + ksft_print_msg("[RUN] sub-PMD: 2MiB-aligned start, len=PAGE_SIZE\n"); + ret = madvise(aligned, page_size, MADV_COLLAPSE); + ksft_test_result(ret == 0, + "sub-PMD aligned start returns 0 (ret=%d errno=%d)\n", + ret, ret ? errno : 0); +} + +/* + * Test 2: start = aligned + PAGE_SIZE, len = PAGE_SIZE. + * hstart = aligned + hpage_size > hend = aligned + * unsigned wrap was -EINVAL; correct answer is 0. + */ +static void test_subpmd_unaligned(char *aligned) +{ + int ret; + + ksft_print_msg("[RUN] sub-PMD: unaligned start (aligned+PAGE), len=PAGE_SIZE\n"); + ksft_print_msg(" hstart=%p > hend=%p\n", + (void *)(aligned + hpage_size), (void *)aligned); + + ret = madvise(aligned + page_size, page_size, MADV_COLLAPSE); + if (ret && errno == EINVAL) + ksft_print_msg(" got -EINVAL: unsigned-wrap bug not fixed\n"); + ksft_test_result(ret == 0, + "sub-PMD unaligned start returns 0 (ret=%d errno=%d)\n", + ret, ret ? errno : 0); +} + +int main(void) +{ + char *base, *aligned; + unsigned long map_size; + int probe_ret; + + ksft_print_header(); + ksft_set_plan(2); + + page_size = (unsigned long)getpagesize(); + hpage_size = (unsigned long)read_pmd_pagesize(); + if (!hpage_size) + ksft_exit_skip("transparent hugepages not available\n"); + + /* + * Probe: map one hpage-sized region, touch all pages, and attempt a + * real collapse to confirm MADV_COLLAPSE is supported. EAGAIN is a + * transient resource failure and still counts as "available". + */ + map_size = 2 * hpage_size; + base = mmap(NULL, map_size, PROT_READ | PROT_WRITE, + MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); + if (base == MAP_FAILED) + ksft_exit_fail_msg("probe mmap failed: %s\n", strerror(errno)); + + aligned = (char *)(((unsigned long)base + hpage_size - 1) + & ~(hpage_size - 1)); + + for (unsigned long i = 0; i < hpage_size; i += page_size) + aligned[i] = 0; + + probe_ret = madvise(aligned, hpage_size, MADV_COLLAPSE); + munmap(base, map_size); + if (probe_ret && errno != EAGAIN) + ksft_exit_skip("MADV_COLLAPSE not available: %s\n", + strerror(errno)); + + /* + * Both sub-PMD tests share a single 2 * hpage mapping so that + * every test range falls within the same VMA. + */ + map_size = 2 * hpage_size; + base = mmap(NULL, map_size, PROT_READ | PROT_WRITE, + MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); + if (base == MAP_FAILED) + ksft_exit_fail_msg("mmap failed: %s\n", strerror(errno)); + + for (unsigned long i = 0; i < map_size; i += page_size) + base[i] = 0; + + aligned = (char *)(((unsigned long)base + hpage_size - 1) + & ~(hpage_size - 1)); + + test_subpmd_aligned(aligned); + test_subpmd_unaligned(aligned); + + munmap(base, map_size); + + if (ksft_get_fail_cnt()) + ksft_exit_fail_msg("%d out of %d tests failed\n", + ksft_get_fail_cnt(), ksft_test_num()); + ksft_exit_pass(); +} diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh index d8468451b3a3..58402f8261e0 100755 --- a/tools/testing/selftests/mm/run_vmtests.sh +++ b/tools/testing/selftests/mm/run_vmtests.sh @@ -53,6 +53,8 @@ separated by spaces: test madvise(2) MADV_GUARD_INSTALL and MADV_GUARD_REMOVE options - madv_populate test memadvise(2) MADV_POPULATE_{READ,WRITE} options +- madv_collapse + test madvise(2) MADV_COLLAPSE sub-PMD range handling - memfd_secret test memfd_secret(2) - process_mrelease @@ -422,6 +424,9 @@ CATEGORY="madv_guard" run_test ./guard-regions # MADV_POPULATE_READ and MADV_POPULATE_WRITE tests CATEGORY="madv_populate" run_test ./madv_populate +# MADV_COLLAPSE sub-PMD range tests +CATEGORY="madv_collapse" run_test ./madv_collapse_range + # PROCESS_MADV test CATEGORY="process_madv" run_test ./process_madv -- 2.43.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] selftests/mm: add MADV_COLLAPSE sub-PMD range tests 2026-05-07 7:05 ` [PATCH 2/2] selftests/mm: add MADV_COLLAPSE sub-PMD range tests Chen Wandun @ 2026-05-08 12:23 ` David Hildenbrand (Arm) 2026-05-08 15:03 ` Lorenzo Stoakes 0 siblings, 1 reply; 14+ messages in thread From: David Hildenbrand (Arm) @ 2026-05-08 12:23 UTC (permalink / raw) To: Chen Wandun, akpm, ljs, shuah, zokeefe Cc: linux-kernel, linux-mm, linux-kselftest On 5/7/26 09:05, Chen Wandun wrote: > Add madv_collapse_range to verify the fix for the spurious -EINVAL > returned by madvise_collapse() when the madvised range contains no > complete PMD-aligned window. Do we really need 141 LOC of test to verify a "userspace is being stupid" scenario? Nothing was really broken, it's just a suboptimal return value, or what am I missing? -- Cheers, David ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] selftests/mm: add MADV_COLLAPSE sub-PMD range tests 2026-05-08 12:23 ` David Hildenbrand (Arm) @ 2026-05-08 15:03 ` Lorenzo Stoakes 0 siblings, 0 replies; 14+ messages in thread From: Lorenzo Stoakes @ 2026-05-08 15:03 UTC (permalink / raw) To: David Hildenbrand (Arm) Cc: Chen Wandun, akpm, shuah, zokeefe, linux-kernel, linux-mm, linux-kselftest On Fri, May 08, 2026 at 02:23:41PM +0200, David Hildenbrand (Arm) wrote: > On 5/7/26 09:05, Chen Wandun wrote: > > Add madv_collapse_range to verify the fix for the spurious -EINVAL > > returned by madvise_collapse() when the madvised range contains no > > complete PMD-aligned window. > > Do we really need 141 LOC of test to verify a "userspace is being stupid" scenario? > > Nothing was really broken, it's just a suboptimal return value, or what am I > missing? Agreed, drop this please. > > -- > Cheers, > > David Thanks, Lorenzo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/2] mm/khugepaged: fix sub-PMD MADV_COLLAPSE range handling 2026-05-07 7:05 [PATCH 0/2] mm/khugepaged: fix sub-PMD MADV_COLLAPSE range handling Chen Wandun 2026-05-07 7:05 ` [PATCH 1/2] mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range Chen Wandun 2026-05-07 7:05 ` [PATCH 2/2] selftests/mm: add MADV_COLLAPSE sub-PMD range tests Chen Wandun @ 2026-05-09 9:47 ` Lance Yang 2026-05-11 2:06 ` Wandun 2 siblings, 1 reply; 14+ messages in thread From: Lance Yang @ 2026-05-09 9:47 UTC (permalink / raw) To: chenwandun1 Cc: akpm, david, ljs, shuah, zokeefe, linux-kernel, linux-mm, linux-kselftest, Lance Yang Hi, scripts/get_maintainer.pl is your friend :) Please use it to Cc the relevant maintainers and reviewers next time. Cheers, Lance On Thu, May 07, 2026 at 03:05:56PM +0800, Chen Wandun wrote: >madvise_collapse() computes a THP-aligned window from the caller's range: > > hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK /* round up */ > hend = end & HPAGE_PMD_MASK /* round down */ > >When the caller's range is smaller than one PMD (2 MiB) and/or not >PMD-aligned, hstart can end up greater than hend. In that case the >collapsing loop is correctly skipped, but the return value was computed >as ((hend - hstart) >> HPAGE_PMD_SHIFT): with hstart > hend the >subtraction wraps unsigned, producing a huge value, the comparison >"thps != 0" fires, and -EINVAL is returned instead of 0. > >A concrete example: > > /* both cover less than one THP; both should return 0 */ > madvise(aligned, PAGE_SIZE, MADV_COLLAPSE); /* OK, returns 0 */ > madvise(aligned + PAGE_SIZE, PAGE_SIZE, MADV_COLLAPSE); /* returns -EINVAL */ > >The fix moves the hstart/hend calculation before kmalloc_obj() and >returns 0 early when hstart >= hend. This also avoids the kmalloc, >mmgrab(), and lru_add_drain_all() calls for ranges that trivially >contain no PMD window. The same effect could be achieved by only >guarding the final return expression, but early-return keeps the >no-op path free of the allocator and drain overhead. > >Patch 1 fixes the kernel bug. >Patch 2 adds a selftest with two cases covering the hstart == hend >(aligned, was already correct) and hstart > hend (unaligned, was >broken) scenarios. > >Chen Wandun (2): > mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range > selftests/mm: add MADV_COLLAPSE sub-PMD range tests > > mm/khugepaged.c | 9 +- > tools/testing/selftests/mm/.gitignore | 1 + > tools/testing/selftests/mm/Makefile | 2 + > .../selftests/mm/ksft_madv_collapse.sh | 4 + > .../selftests/mm/madv_collapse_range.c | 141 ++++++++++++++++++ > tools/testing/selftests/mm/run_vmtests.sh | 5 + > 6 files changed, 159 insertions(+), 3 deletions(-) > create mode 100755 tools/testing/selftests/mm/ksft_madv_collapse.sh > create mode 100644 tools/testing/selftests/mm/madv_collapse_range.c > >-- >2.43.0 > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/2] mm/khugepaged: fix sub-PMD MADV_COLLAPSE range handling 2026-05-09 9:47 ` [PATCH 0/2] mm/khugepaged: fix sub-PMD MADV_COLLAPSE range handling Lance Yang @ 2026-05-11 2:06 ` Wandun 0 siblings, 0 replies; 14+ messages in thread From: Wandun @ 2026-05-11 2:06 UTC (permalink / raw) To: Lance Yang Cc: akpm, david, ljs, shuah, zokeefe, linux-kernel, linux-mm, linux-kselftest On 5/9/26 17:47, Lance Yang wrote: > Hi, > > scripts/get_maintainer.pl is your friend :) > Please use it to Cc the relevant maintainers and reviewers next time. Many thanks for your kind reminder :) I will do it next time. Best regards, Wandun > > Cheers, Lance > > On Thu, May 07, 2026 at 03:05:56PM +0800, Chen Wandun wrote: >> madvise_collapse() computes a THP-aligned window from the caller's range: >> >> hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK /* round up */ >> hend = end & HPAGE_PMD_MASK /* round down */ >> >> When the caller's range is smaller than one PMD (2 MiB) and/or not >> PMD-aligned, hstart can end up greater than hend. In that case the >> collapsing loop is correctly skipped, but the return value was computed >> as ((hend - hstart) >> HPAGE_PMD_SHIFT): with hstart > hend the >> subtraction wraps unsigned, producing a huge value, the comparison >> "thps != 0" fires, and -EINVAL is returned instead of 0. >> >> A concrete example: >> >> /* both cover less than one THP; both should return 0 */ >> madvise(aligned, PAGE_SIZE, MADV_COLLAPSE); /* OK, returns 0 */ >> madvise(aligned + PAGE_SIZE, PAGE_SIZE, MADV_COLLAPSE); /* returns -EINVAL */ >> >> The fix moves the hstart/hend calculation before kmalloc_obj() and >> returns 0 early when hstart >= hend. This also avoids the kmalloc, >> mmgrab(), and lru_add_drain_all() calls for ranges that trivially >> contain no PMD window. The same effect could be achieved by only >> guarding the final return expression, but early-return keeps the >> no-op path free of the allocator and drain overhead. >> >> Patch 1 fixes the kernel bug. >> Patch 2 adds a selftest with two cases covering the hstart == hend >> (aligned, was already correct) and hstart > hend (unaligned, was >> broken) scenarios. >> >> Chen Wandun (2): >> mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range >> selftests/mm: add MADV_COLLAPSE sub-PMD range tests >> >> mm/khugepaged.c | 9 +- >> tools/testing/selftests/mm/.gitignore | 1 + >> tools/testing/selftests/mm/Makefile | 2 + >> .../selftests/mm/ksft_madv_collapse.sh | 4 + >> .../selftests/mm/madv_collapse_range.c | 141 ++++++++++++++++++ >> tools/testing/selftests/mm/run_vmtests.sh | 5 + >> 6 files changed, 159 insertions(+), 3 deletions(-) >> create mode 100755 tools/testing/selftests/mm/ksft_madv_collapse.sh >> create mode 100644 tools/testing/selftests/mm/madv_collapse_range.c >> >> -- >> 2.43.0 >> >> ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2026-05-11 2:06 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-05-07 7:05 [PATCH 0/2] mm/khugepaged: fix sub-PMD MADV_COLLAPSE range handling Chen Wandun 2026-05-07 7:05 ` [PATCH 1/2] mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range Chen Wandun 2026-05-08 12:27 ` David Hildenbrand (Arm) 2026-05-08 15:02 ` Lorenzo Stoakes 2026-05-08 15:04 ` Lorenzo Stoakes 2026-05-09 7:53 ` Wandun 2026-05-08 19:29 ` David Hildenbrand (Arm) 2026-05-09 7:04 ` Wandun 2026-05-09 5:56 ` Wandun 2026-05-07 7:05 ` [PATCH 2/2] selftests/mm: add MADV_COLLAPSE sub-PMD range tests Chen Wandun 2026-05-08 12:23 ` David Hildenbrand (Arm) 2026-05-08 15:03 ` Lorenzo Stoakes 2026-05-09 9:47 ` [PATCH 0/2] mm/khugepaged: fix sub-PMD MADV_COLLAPSE range handling Lance Yang 2026-05-11 2:06 ` Wandun
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox