* [PATCH v5 1/3] mm/vmalloc: extract vm_area_free_pages() helper from vfree()
2026-03-17 8:17 [PATCH v5 0/3] mm/vmalloc: free unused pages on vrealloc() shrink Shivam Kalra via B4 Relay
@ 2026-03-17 8:17 ` Shivam Kalra via B4 Relay
2026-03-17 14:16 ` Alice Ryhl
2026-03-18 17:53 ` Uladzislau Rezki
2026-03-17 8:17 ` [PATCH v5 2/3] mm/vmalloc: free unused pages on vrealloc() shrink Shivam Kalra via B4 Relay
` (3 subsequent siblings)
4 siblings, 2 replies; 17+ messages in thread
From: Shivam Kalra via B4 Relay @ 2026-03-17 8:17 UTC (permalink / raw)
To: Andrew Morton, Uladzislau Rezki
Cc: linux-mm, linux-kernel, Alice Ryhl, Danilo Krummrich,
Shivam Kalra
From: Shivam Kalra <shivamkalra98@zohomail.in>
Extract the page-freeing loop and NR_VMALLOC stat accounting from
vfree() into a reusable vm_area_free_pages() helper. The helper operates
on a range [start, end) of pages from a vm_struct, making it suitable
for both full free (vfree) and partial free (upcoming vrealloc shrink).
Freed page pointers in vm->pages[] are set to NULL to prevent stale
references when the vm_struct outlives the free (as in vrealloc shrink).
Signed-off-by: Shivam Kalra <shivamkalra98@zohomail.in>
---
mm/vmalloc.c | 47 +++++++++++++++++++++++++++++++++--------------
1 file changed, 33 insertions(+), 14 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index c607307c657a..b29bf58c0e3f 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3416,6 +3416,38 @@ void vfree_atomic(const void *addr)
schedule_work(&p->wq);
}
+/*
+ * vm_area_free_pages - free a range of pages from a vmalloc allocation
+ * @vm: the vm_struct containing the pages
+ * @start: first page index to free (inclusive)
+ * @end: index one past the last page to free (exclusive)
+ *
+ * Free pages [start, end), updating NR_VMALLOC stat accounting.
+ * Freed vm->pages[] entries are set to NULL.
+ * Caller is responsible for unmapping (vunmap_range) and KASAN
+ * poisoning before calling this.
+ */
+static void vm_area_free_pages(struct vm_struct *vm, unsigned int start,
+ unsigned int end)
+{
+ unsigned int i;
+
+ for (i = start; i < end; i++) {
+ struct page *page = vm->pages[i];
+
+ BUG_ON(!page);
+ /*
+ * High-order allocs for huge vmallocs are split, so
+ * can be freed as an array of order-0 allocations
+ */
+ if (!(vm->flags & VM_MAP_PUT_PAGES))
+ mod_lruvec_page_state(page, NR_VMALLOC, -1);
+ __free_page(page);
+ vm->pages[i] = NULL;
+ cond_resched();
+ }
+}
+
/**
* vfree - Release memory allocated by vmalloc()
* @addr: Memory base address
@@ -3436,7 +3468,6 @@ void vfree_atomic(const void *addr)
void vfree(const void *addr)
{
struct vm_struct *vm;
- int i;
if (unlikely(in_interrupt())) {
vfree_atomic(addr);
@@ -3459,19 +3490,7 @@ void vfree(const void *addr)
if (unlikely(vm->flags & VM_FLUSH_RESET_PERMS))
vm_reset_perms(vm);
- for (i = 0; i < vm->nr_pages; i++) {
- struct page *page = vm->pages[i];
-
- BUG_ON(!page);
- /*
- * High-order allocs for huge vmallocs are split, so
- * can be freed as an array of order-0 allocations
- */
- if (!(vm->flags & VM_MAP_PUT_PAGES))
- mod_lruvec_page_state(page, NR_VMALLOC, -1);
- __free_page(page);
- cond_resched();
- }
+ vm_area_free_pages(vm, 0, vm->nr_pages);
kvfree(vm->pages);
kfree(vm);
}
--
2.43.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v5 1/3] mm/vmalloc: extract vm_area_free_pages() helper from vfree()
2026-03-17 8:17 ` [PATCH v5 1/3] mm/vmalloc: extract vm_area_free_pages() helper from vfree() Shivam Kalra via B4 Relay
@ 2026-03-17 14:16 ` Alice Ryhl
2026-03-18 17:53 ` Uladzislau Rezki
1 sibling, 0 replies; 17+ messages in thread
From: Alice Ryhl @ 2026-03-17 14:16 UTC (permalink / raw)
To: Shivam Kalra
Cc: Andrew Morton, Uladzislau Rezki, linux-mm, linux-kernel,
Danilo Krummrich
On Tue, Mar 17, 2026 at 01:47:33PM +0530, Shivam Kalra wrote:
> Extract the page-freeing loop and NR_VMALLOC stat accounting from
> vfree() into a reusable vm_area_free_pages() helper. The helper operates
> on a range [start, end) of pages from a vm_struct, making it suitable
> for both full free (vfree) and partial free (upcoming vrealloc shrink).
>
> Freed page pointers in vm->pages[] are set to NULL to prevent stale
> references when the vm_struct outlives the free (as in vrealloc shrink).
>
> Signed-off-by: Shivam Kalra <shivamkalra98@zohomail.in>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v5 1/3] mm/vmalloc: extract vm_area_free_pages() helper from vfree()
2026-03-17 8:17 ` [PATCH v5 1/3] mm/vmalloc: extract vm_area_free_pages() helper from vfree() Shivam Kalra via B4 Relay
2026-03-17 14:16 ` Alice Ryhl
@ 2026-03-18 17:53 ` Uladzislau Rezki
2026-03-20 9:42 ` Shivam Kalra
2026-03-21 8:02 ` Shivam Kalra
1 sibling, 2 replies; 17+ messages in thread
From: Uladzislau Rezki @ 2026-03-18 17:53 UTC (permalink / raw)
To: shivamkalra98
Cc: Andrew Morton, Uladzislau Rezki, linux-mm, linux-kernel,
Alice Ryhl, Danilo Krummrich
On Tue, Mar 17, 2026 at 01:47:33PM +0530, Shivam Kalra via B4 Relay wrote:
> From: Shivam Kalra <shivamkalra98@zohomail.in>
>
> Extract the page-freeing loop and NR_VMALLOC stat accounting from
> vfree() into a reusable vm_area_free_pages() helper. The helper operates
> on a range [start, end) of pages from a vm_struct, making it suitable
> for both full free (vfree) and partial free (upcoming vrealloc shrink).
>
> Freed page pointers in vm->pages[] are set to NULL to prevent stale
> references when the vm_struct outlives the free (as in vrealloc shrink).
>
> Signed-off-by: Shivam Kalra <shivamkalra98@zohomail.in>
> ---
> mm/vmalloc.c | 47 +++++++++++++++++++++++++++++++++--------------
> 1 file changed, 33 insertions(+), 14 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index c607307c657a..b29bf58c0e3f 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3416,6 +3416,38 @@ void vfree_atomic(const void *addr)
> schedule_work(&p->wq);
> }
>
> +/*
> + * vm_area_free_pages - free a range of pages from a vmalloc allocation
> + * @vm: the vm_struct containing the pages
> + * @start: first page index to free (inclusive)
> + * @end: index one past the last page to free (exclusive)
> + *
> + * Free pages [start, end), updating NR_VMALLOC stat accounting.
> + * Freed vm->pages[] entries are set to NULL.
> + * Caller is responsible for unmapping (vunmap_range) and KASAN
> + * poisoning before calling this.
> + */
> +static void vm_area_free_pages(struct vm_struct *vm, unsigned int start,
> + unsigned int end)
> +{
> + unsigned int i;
> +
> + for (i = start; i < end; i++) {
> + struct page *page = vm->pages[i];
> +
> + BUG_ON(!page);
> + /*
> + * High-order allocs for huge vmallocs are split, so
> + * can be freed as an array of order-0 allocations
> + */
> + if (!(vm->flags & VM_MAP_PUT_PAGES))
> + mod_lruvec_page_state(page, NR_VMALLOC, -1);
> + __free_page(page);
> + vm->pages[i] = NULL;
> + cond_resched();
> + }
> +}
> +
>
Since you will update the second patch, you can probably also improve
this one. To me, the start/end variables sound like a VA range, whereas
they are indices into the array.
Any thoughts?
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v5 1/3] mm/vmalloc: extract vm_area_free_pages() helper from vfree()
2026-03-18 17:53 ` Uladzislau Rezki
@ 2026-03-20 9:42 ` Shivam Kalra
2026-03-21 8:02 ` Shivam Kalra
1 sibling, 0 replies; 17+ messages in thread
From: Shivam Kalra @ 2026-03-20 9:42 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Andrew Morton, linux-mm, linux-kernel, Alice Ryhl,
Danilo Krummrich
On 18/03/26 23:23, Uladzislau Rezki wrote:
> On Tue, Mar 17, 2026 at 01:47:33PM +0530, Shivam Kalra via B4 Relay wrote:
>> From: Shivam Kalra <shivamkalra98@zohomail.in>
>>
>> Extract the page-freeing loop and NR_VMALLOC stat accounting from
>> vfree() into a reusable vm_area_free_pages() helper. The helper operates
>> on a range [start, end) of pages from a vm_struct, making it suitable
>> for both full free (vfree) and partial free (upcoming vrealloc shrink).
>>
>> Freed page pointers in vm->pages[] are set to NULL to prevent stale
>> references when the vm_struct outlives the free (as in vrealloc shrink).
>>
>> Signed-off-by: Shivam Kalra <shivamkalra98@zohomail.in>
>> ---
>> mm/vmalloc.c | 47 +++++++++++++++++++++++++++++++++--------------
>> 1 file changed, 33 insertions(+), 14 deletions(-)
>>
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index c607307c657a..b29bf58c0e3f 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -3416,6 +3416,38 @@ void vfree_atomic(const void *addr)
>> schedule_work(&p->wq);
>> }
>>
>> +/*
>> + * vm_area_free_pages - free a range of pages from a vmalloc allocation
>> + * @vm: the vm_struct containing the pages
>> + * @start: first page index to free (inclusive)
>> + * @end: index one past the last page to free (exclusive)
>> + *
>> + * Free pages [start, end), updating NR_VMALLOC stat accounting.
>> + * Freed vm->pages[] entries are set to NULL.
>> + * Caller is responsible for unmapping (vunmap_range) and KASAN
>> + * poisoning before calling this.
>> + */
>> +static void vm_area_free_pages(struct vm_struct *vm, unsigned int start,
>> + unsigned int end)
>> +{
>> + unsigned int i;
>> +
>> + for (i = start; i < end; i++) {
>> + struct page *page = vm->pages[i];
>> +
>> + BUG_ON(!page);
>> + /*
>> + * High-order allocs for huge vmallocs are split, so
>> + * can be freed as an array of order-0 allocations
>> + */
>> + if (!(vm->flags & VM_MAP_PUT_PAGES))
>> + mod_lruvec_page_state(page, NR_VMALLOC, -1);
>> + __free_page(page);
>> + vm->pages[i] = NULL;
>> + cond_resched();
>> + }
>> +}
>> +
>>
> Since you will update the second patch, you can probably also improve
> this one. To me, the start/end variables sound like a VA range, whereas
> they are indices into the array.
>
> Any thoughts?
>
> --
> Uladzislau Rezki
I see; that's where the confusion began. I now have a better picture of
the situation and will post my questions in the RFC soon.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v5 1/3] mm/vmalloc: extract vm_area_free_pages() helper from vfree()
2026-03-18 17:53 ` Uladzislau Rezki
2026-03-20 9:42 ` Shivam Kalra
@ 2026-03-21 8:02 ` Shivam Kalra
1 sibling, 0 replies; 17+ messages in thread
From: Shivam Kalra @ 2026-03-21 8:02 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Andrew Morton, linux-mm, linux-kernel, Alice Ryhl,
Danilo Krummrich
On 18/03/26 23:23, Uladzislau Rezki wrote:
> On Tue, Mar 17, 2026 at 01:47:33PM +0530, Shivam Kalra via B4 Relay wrote:
>> From: Shivam Kalra <shivamkalra98@zohomail.in>
>>
>> Extract the page-freeing loop and NR_VMALLOC stat accounting from
>> vfree() into a reusable vm_area_free_pages() helper. The helper operates
>> on a range [start, end) of pages from a vm_struct, making it suitable
>> for both full free (vfree) and partial free (upcoming vrealloc shrink).
>>
>> Freed page pointers in vm->pages[] are set to NULL to prevent stale
>> references when the vm_struct outlives the free (as in vrealloc shrink).
>>
>> Signed-off-by: Shivam Kalra <shivamkalra98@zohomail.in>
>> ---
>> mm/vmalloc.c | 47 +++++++++++++++++++++++++++++++++--------------
>> 1 file changed, 33 insertions(+), 14 deletions(-)
>>
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index c607307c657a..b29bf58c0e3f 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -3416,6 +3416,38 @@ void vfree_atomic(const void *addr)
>> schedule_work(&p->wq);
>> }
>>
>> +/*
>> + * vm_area_free_pages - free a range of pages from a vmalloc allocation
>> + * @vm: the vm_struct containing the pages
>> + * @start: first page index to free (inclusive)
>> + * @end: index one past the last page to free (exclusive)
>> + *
>> + * Free pages [start, end), updating NR_VMALLOC stat accounting.
>> + * Freed vm->pages[] entries are set to NULL.
>> + * Caller is responsible for unmapping (vunmap_range) and KASAN
>> + * poisoning before calling this.
>> + */
>> +static void vm_area_free_pages(struct vm_struct *vm, unsigned int start,
>> + unsigned int end)
>> +{
>> + unsigned int i;
>> +
>> + for (i = start; i < end; i++) {
>> + struct page *page = vm->pages[i];
>> +
>> + BUG_ON(!page);
>> + /*
>> + * High-order allocs for huge vmallocs are split, so
>> + * can be freed as an array of order-0 allocations
>> + */
>> + if (!(vm->flags & VM_MAP_PUT_PAGES))
>> + mod_lruvec_page_state(page, NR_VMALLOC, -1);
>> + __free_page(page);
>> + vm->pages[i] = NULL;
>> + cond_resched();
>> + }
>> +}
>> +
>>
> Since you will update the second patch, you can probably also improve
> this one. To me, the start/end variables sound like a VA range, whereas
> they are indices into the array.
>
> Any thoughts?
>
> --
> Uladzislau Rezki
Oops, replied to the wrong thread! But regarding start/end, yes, you're
absolutely right. In the next version I will rename them to start_idx and
end_idx to make it clear they are array indices.
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v5 2/3] mm/vmalloc: free unused pages on vrealloc() shrink
2026-03-17 8:17 [PATCH v5 0/3] mm/vmalloc: free unused pages on vrealloc() shrink Shivam Kalra via B4 Relay
2026-03-17 8:17 ` [PATCH v5 1/3] mm/vmalloc: extract vm_area_free_pages() helper from vfree() Shivam Kalra via B4 Relay
@ 2026-03-17 8:17 ` Shivam Kalra via B4 Relay
2026-03-17 14:39 ` Alice Ryhl
2026-03-17 8:17 ` [PATCH v5 3/3] lib/test_vmalloc: add vrealloc test case Shivam Kalra via B4 Relay
` (2 subsequent siblings)
4 siblings, 1 reply; 17+ messages in thread
From: Shivam Kalra via B4 Relay @ 2026-03-17 8:17 UTC (permalink / raw)
To: Andrew Morton, Uladzislau Rezki
Cc: linux-mm, linux-kernel, Alice Ryhl, Danilo Krummrich,
Shivam Kalra
From: Shivam Kalra <shivamkalra98@zohomail.in>
When vrealloc() shrinks an allocation and the new size crosses a page
boundary, unmap and free the tail pages that are no longer needed. This
reclaims physical memory that was previously wasted for the lifetime
of the allocation.
The heuristic is simple: always free when at least one full page becomes
unused. Huge page allocations (page_order > 0) are skipped, as partial
freeing would require splitting. Allocations with VM_FLUSH_RESET_PERMS
are also skipped, as their direct-map permissions must be reset before
pages are returned to the page allocator, which is handled by
vm_reset_perms() during vfree().
The virtual address reservation (vm->size / vmap_area) is intentionally
kept unchanged, preserving the address for potential future grow-in-place
support.
Fix the grow-in-place check to compare against vm->nr_pages rather than
get_vm_area_size(), since the latter reflects the virtual reservation
which does not shrink. Without this fix, a grow after shrink would
access freed pages.
Signed-off-by: Shivam Kalra <shivamkalra98@zohomail.in>
---
mm/vmalloc.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index b29bf58c0e3f..f3820c6712c1 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -4345,14 +4345,24 @@ void *vrealloc_node_align_noprof(const void *p, size_t size, unsigned long align
goto need_realloc;
}
- /*
- * TODO: Shrink the vm_area, i.e. unmap and free unused pages. What
- * would be a good heuristic for when to shrink the vm_area?
- */
if (size <= old_size) {
+ unsigned int new_nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
/* Zero out "freed" memory, potentially for future realloc. */
if (want_init_on_free() || want_init_on_alloc(flags))
memset((void *)p + size, 0, old_size - size);
+
+ /* Free tail pages when shrink crosses a page boundary. */
+ if (new_nr_pages < vm->nr_pages && !vm_area_page_order(vm) &&
+ !(vm->flags & VM_FLUSH_RESET_PERMS)) {
+ unsigned long addr = (unsigned long)p;
+
+ vunmap_range(addr + (new_nr_pages << PAGE_SHIFT),
+ addr + (vm->nr_pages << PAGE_SHIFT));
+
+ vm_area_free_pages(vm, new_nr_pages, vm->nr_pages);
+ vm->nr_pages = new_nr_pages;
+ }
vm->requested_size = size;
kasan_vrealloc(p, old_size, size);
return (void *)p;
@@ -4361,7 +4371,7 @@ void *vrealloc_node_align_noprof(const void *p, size_t size, unsigned long align
/*
* We already have the bytes available in the allocation; use them.
*/
- if (size <= alloced_size) {
+ if (size <= (size_t)vm->nr_pages << PAGE_SHIFT) {
/*
* No need to zero memory here, as unused memory will have
* already been zeroed at initial allocation time or during
--
2.43.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v5 2/3] mm/vmalloc: free unused pages on vrealloc() shrink
2026-03-17 8:17 ` [PATCH v5 2/3] mm/vmalloc: free unused pages on vrealloc() shrink Shivam Kalra via B4 Relay
@ 2026-03-17 14:39 ` Alice Ryhl
2026-03-17 14:45 ` Danilo Krummrich
0 siblings, 1 reply; 17+ messages in thread
From: Alice Ryhl @ 2026-03-17 14:39 UTC (permalink / raw)
To: Shivam Kalra
Cc: Andrew Morton, Uladzislau Rezki, linux-mm, linux-kernel,
Danilo Krummrich
On Tue, Mar 17, 2026 at 01:47:34PM +0530, Shivam Kalra wrote:
> When vrealloc() shrinks an allocation and the new size crosses a page
> boundary, unmap and free the tail pages that are no longer needed. This
> reclaims physical memory that was previously wasted for the lifetime
> of the allocation.
>
> The heuristic is simple: always free when at least one full page becomes
> unused. Huge page allocations (page_order > 0) are skipped, as partial
> freeing would require splitting. Allocations with VM_FLUSH_RESET_PERMS
> are also skipped, as their direct-map permissions must be reset before
> pages are returned to the page allocator, which is handled by
> vm_reset_perms() during vfree().
>
> The virtual address reservation (vm->size / vmap_area) is intentionally
> kept unchanged, preserving the address for potential future grow-in-place
> support.
>
> Fix the grow-in-place check to compare against vm->nr_pages rather than
> get_vm_area_size(), since the latter reflects the virtual reservation
> which does not shrink. Without this fix, a grow after shrink would
> access freed pages.
>
> Signed-off-by: Shivam Kalra <shivamkalra98@zohomail.in>
> ---
> mm/vmalloc.c | 20 +++++++++++++++-----
> 1 file changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index b29bf58c0e3f..f3820c6712c1 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -4345,14 +4345,24 @@ void *vrealloc_node_align_noprof(const void *p, size_t size, unsigned long align
> goto need_realloc;
> }
>
> - /*
> - * TODO: Shrink the vm_area, i.e. unmap and free unused pages. What
> - * would be a good heuristic for when to shrink the vm_area?
> - */
> if (size <= old_size) {
> + unsigned int new_nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
> +
> /* Zero out "freed" memory, potentially for future realloc. */
> if (want_init_on_free() || want_init_on_alloc(flags))
> memset((void *)p + size, 0, old_size - size);
> +
> + /* Free tail pages when shrink crosses a page boundary. */
> + if (new_nr_pages < vm->nr_pages && !vm_area_page_order(vm) &&
> + !(vm->flags & VM_FLUSH_RESET_PERMS)) {
> + unsigned long addr = (unsigned long)p;
> +
> + vunmap_range(addr + (new_nr_pages << PAGE_SHIFT),
> + addr + (vm->nr_pages << PAGE_SHIFT));
> +
> + vm_area_free_pages(vm, new_nr_pages, vm->nr_pages);
> + vm->nr_pages = new_nr_pages;
> + }
> vm->requested_size = size;
> kasan_vrealloc(p, old_size, size);
> return (void *)p;
> @@ -4361,7 +4371,7 @@ void *vrealloc_node_align_noprof(const void *p, size_t size, unsigned long align
> /*
> * We already have the bytes available in the allocation; use them.
> */
> - if (size <= alloced_size) {
> + if (size <= (size_t)vm->nr_pages << PAGE_SHIFT) {
> /*
> * No need to zero memory here, as unused memory will have
> * already been zeroed at initial allocation time or during
Hmm. So what happened here is that it has previously always been the
case that get_vm_area_size(area) == vm->nr_pages << PAGE_SHIFT, so these
constants were interchangeable. But now that is no longer the case.
For example, 'remap_vmalloc_range_partial' compares the vm area size
with the range being mapped, and then proceeds to look up the pages and
map them. But now those pages may be missing.
I can't really tell if there are other places in this file that need to
be updated too.
Alice
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v5 2/3] mm/vmalloc: free unused pages on vrealloc() shrink
2026-03-17 14:39 ` Alice Ryhl
@ 2026-03-17 14:45 ` Danilo Krummrich
2026-03-17 16:01 ` Shivam Kalra
0 siblings, 1 reply; 17+ messages in thread
From: Danilo Krummrich @ 2026-03-17 14:45 UTC (permalink / raw)
To: Shivam Kalra, Alice Ryhl
Cc: Andrew Morton, Uladzislau Rezki, linux-mm, linux-kernel
On Tue Mar 17, 2026 at 3:39 PM CET, Alice Ryhl wrote:
> On Tue, Mar 17, 2026 at 01:47:34PM +0530, Shivam Kalra wrote:
>> When vrealloc() shrinks an allocation and the new size crosses a page
>> boundary, unmap and free the tail pages that are no longer needed. This
>> reclaims physical memory that was previously wasted for the lifetime
>> of the allocation.
>>
>> The heuristic is simple: always free when at least one full page becomes
>> unused. Huge page allocations (page_order > 0) are skipped, as partial
>> freeing would require splitting. Allocations with VM_FLUSH_RESET_PERMS
>> are also skipped, as their direct-map permissions must be reset before
>> pages are returned to the page allocator, which is handled by
>> vm_reset_perms() during vfree().
>>
>> The virtual address reservation (vm->size / vmap_area) is intentionally
>> kept unchanged, preserving the address for potential future grow-in-place
>> support.
>>
>> Fix the grow-in-place check to compare against vm->nr_pages rather than
>> get_vm_area_size(), since the latter reflects the virtual reservation
>> which does not shrink. Without this fix, a grow after shrink would
>> access freed pages.
>>
>> Signed-off-by: Shivam Kalra <shivamkalra98@zohomail.in>
Feel free to add
Suggested-by: Danilo Krummrich <dakr@kernel.org>
>> ---
>> mm/vmalloc.c | 20 +++++++++++++++-----
>> 1 file changed, 15 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index b29bf58c0e3f..f3820c6712c1 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -4345,14 +4345,24 @@ void *vrealloc_node_align_noprof(const void *p, size_t size, unsigned long align
>> goto need_realloc;
>> }
>>
>> - /*
>> - * TODO: Shrink the vm_area, i.e. unmap and free unused pages. What
>> - * would be a good heuristic for when to shrink the vm_area?
>> - */
>> if (size <= old_size) {
>> + unsigned int new_nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
>> +
>> /* Zero out "freed" memory, potentially for future realloc. */
>> if (want_init_on_free() || want_init_on_alloc(flags))
>> memset((void *)p + size, 0, old_size - size);
>> +
>> + /* Free tail pages when shrink crosses a page boundary. */
>> + if (new_nr_pages < vm->nr_pages && !vm_area_page_order(vm) &&
>> + !(vm->flags & VM_FLUSH_RESET_PERMS)) {
>> + unsigned long addr = (unsigned long)p;
>> +
>> + vunmap_range(addr + (new_nr_pages << PAGE_SHIFT),
>> + addr + (vm->nr_pages << PAGE_SHIFT));
>> +
>> + vm_area_free_pages(vm, new_nr_pages, vm->nr_pages);
>> + vm->nr_pages = new_nr_pages;
>> + }
>> vm->requested_size = size;
>> kasan_vrealloc(p, old_size, size);
>> return (void *)p;
>> @@ -4361,7 +4371,7 @@ void *vrealloc_node_align_noprof(const void *p, size_t size, unsigned long align
>> /*
>> * We already have the bytes available in the allocation; use them.
>> */
>> - if (size <= alloced_size) {
>> + if (size <= (size_t)vm->nr_pages << PAGE_SHIFT) {
>> /*
>> * No need to zero memory here, as unused memory will have
>> * already been zeroed at initial allocation time or during
>
> Hmm. So what happened here is that it has previously always been the
> case that get_vm_area_size(area) == vm->nr_pages << PAGE_SHIFT, so these
> constants were interchangeable. But now that is no longer the case.
>
> For example, 'remap_vmalloc_range_partial' compares the vm area size
> with the range being mapped, and then proceeds to look up the pages and
> map them. But now those pages may be missing.
>
> I can't really tell if there are other places in this file that need to
> be updated too.
This may well be possible. I remember that when I added vrealloc() and looked
into growing and shrinking, I concluded that it might need a bit of rework in
terms of tracking the sizes of the different layers. Unfortunately, I don't
remember the details anymore, but I'm quite sure there were some subtleties
along the lines of what Alice points out, so I recommend double-checking.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v5 2/3] mm/vmalloc: free unused pages on vrealloc() shrink
2026-03-17 14:45 ` Danilo Krummrich
@ 2026-03-17 16:01 ` Shivam Kalra
0 siblings, 0 replies; 17+ messages in thread
From: Shivam Kalra @ 2026-03-17 16:01 UTC (permalink / raw)
To: Danilo Krummrich, Alice Ryhl
Cc: Andrew Morton, Uladzislau Rezki, linux-mm, linux-kernel
On 17/03/26 20:15, Danilo Krummrich wrote:
> On Tue Mar 17, 2026 at 3:39 PM CET, Alice Ryhl wrote:
>> On Tue, Mar 17, 2026 at 01:47:34PM +0530, Shivam Kalra wrote:
>>> When vrealloc() shrinks an allocation and the new size crosses a page
>>> boundary, unmap and free the tail pages that are no longer needed. This
>>> reclaims physical memory that was previously wasted for the lifetime
>>> of the allocation.
>>>
>>> The heuristic is simple: always free when at least one full page becomes
>>> unused. Huge page allocations (page_order > 0) are skipped, as partial
>>> freeing would require splitting. Allocations with VM_FLUSH_RESET_PERMS
>>> are also skipped, as their direct-map permissions must be reset before
>>> pages are returned to the page allocator, which is handled by
>>> vm_reset_perms() during vfree().
>>>
>>> The virtual address reservation (vm->size / vmap_area) is intentionally
>>> kept unchanged, preserving the address for potential future grow-in-place
>>> support.
>>>
>>> Fix the grow-in-place check to compare against vm->nr_pages rather than
>>> get_vm_area_size(), since the latter reflects the virtual reservation
>>> which does not shrink. Without this fix, a grow after shrink would
>>> access freed pages.
>>>
>>> Signed-off-by: Shivam Kalra <shivamkalra98@zohomail.in>
>
> Feel free to add
>
> Suggested-by: Danilo Krummrich <dakr@kernel.org>
>
>>> ---
>>> mm/vmalloc.c | 20 +++++++++++++++-----
>>> 1 file changed, 15 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>>> index b29bf58c0e3f..f3820c6712c1 100644
>>> --- a/mm/vmalloc.c
>>> +++ b/mm/vmalloc.c
>>> @@ -4345,14 +4345,24 @@ void *vrealloc_node_align_noprof(const void *p, size_t size, unsigned long align
>>> goto need_realloc;
>>> }
>>>
>>> - /*
>>> - * TODO: Shrink the vm_area, i.e. unmap and free unused pages. What
>>> - * would be a good heuristic for when to shrink the vm_area?
>>> - */
>>> if (size <= old_size) {
>>> + unsigned int new_nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
>>> +
>>> /* Zero out "freed" memory, potentially for future realloc. */
>>> if (want_init_on_free() || want_init_on_alloc(flags))
>>> memset((void *)p + size, 0, old_size - size);
>>> +
>>> + /* Free tail pages when shrink crosses a page boundary. */
>>> + if (new_nr_pages < vm->nr_pages && !vm_area_page_order(vm) &&
>>> + !(vm->flags & VM_FLUSH_RESET_PERMS)) {
>>> + unsigned long addr = (unsigned long)p;
>>> +
>>> + vunmap_range(addr + (new_nr_pages << PAGE_SHIFT),
>>> + addr + (vm->nr_pages << PAGE_SHIFT));
>>> +
>>> + vm_area_free_pages(vm, new_nr_pages, vm->nr_pages);
>>> + vm->nr_pages = new_nr_pages;
>>> + }
>>> vm->requested_size = size;
>>> kasan_vrealloc(p, old_size, size);
>>> return (void *)p;
>>> @@ -4361,7 +4371,7 @@ void *vrealloc_node_align_noprof(const void *p, size_t size, unsigned long align
>>> /*
>>> * We already have the bytes available in the allocation; use them.
>>> */
>>> - if (size <= alloced_size) {
>>> + if (size <= (size_t)vm->nr_pages << PAGE_SHIFT) {
>>> /*
>>> * No need to zero memory here, as unused memory will have
>>> * already been zeroed at initial allocation time or during
>>
>> Hmm. So what happened here is that it has previously always been the
>> case that get_vm_area_size(area) == vm->nr_pages << PAGE_SHIFT, so these
>> constants were interchangable. But now that is no longer the case.
>>
>> For example, 'remap_vmalloc_range_partial' compares the vm area size
>> with the range being mapped, and then proceeds to look up the pages and
>> map them. But now those pages may be missing.
>>
>> I can't really tell if there are other places in this file that need to
>> be updated too.
>
> This may well be possible. I remember that when I added vrealloc() and looked
> into growing and shrinking, I concluded that it might need a bit of rework in
> terms of tracking the sizes of the different layers. Unfortunately, I don't
> remember the details anymore, but I'm quite sure there were some subtleties
> along the lines of what Alice points out, so I recommend to double check.
I will post an update if I find any issues.
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v5 3/3] lib/test_vmalloc: add vrealloc test case
2026-03-17 8:17 [PATCH v5 0/3] mm/vmalloc: free unused pages on vrealloc() shrink Shivam Kalra via B4 Relay
2026-03-17 8:17 ` [PATCH v5 1/3] mm/vmalloc: extract vm_area_free_pages() helper from vfree() Shivam Kalra via B4 Relay
2026-03-17 8:17 ` [PATCH v5 2/3] mm/vmalloc: free unused pages on vrealloc() shrink Shivam Kalra via B4 Relay
@ 2026-03-17 8:17 ` Shivam Kalra via B4 Relay
2026-03-17 21:11 ` [PATCH v5 0/3] mm/vmalloc: free unused pages on vrealloc() shrink Andrew Morton
2026-03-21 8:15 ` Shivam Kalra
4 siblings, 0 replies; 17+ messages in thread
From: Shivam Kalra via B4 Relay @ 2026-03-17 8:17 UTC (permalink / raw)
To: Andrew Morton, Uladzislau Rezki
Cc: linux-mm, linux-kernel, Alice Ryhl, Danilo Krummrich,
Shivam Kalra
From: Shivam Kalra <shivamkalra98@zohomail.in>
Introduce a new test case "vrealloc_test" that exercises the vrealloc()
shrink and in-place grow paths:
- Grow beyond allocated pages (triggers full reallocation).
- Shrink crossing a page boundary (frees tail pages).
- Shrink within the same page (no page freeing).
- Grow within the already allocated page count (in-place).
Data integrity is validated after each realloc step by checking that
the first byte of the original allocation is preserved.
The test is gated behind run_test_mask bit 12 (id 4096).
Signed-off-by: Shivam Kalra <shivamkalra98@zohomail.in>
---
lib/test_vmalloc.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 52 insertions(+)
diff --git a/lib/test_vmalloc.c b/lib/test_vmalloc.c
index 876c72c18a0c..ce2b2777a785 100644
--- a/lib/test_vmalloc.c
+++ b/lib/test_vmalloc.c
@@ -55,6 +55,7 @@ __param(int, run_test_mask, 7,
"\t\tid: 512, name: kvfree_rcu_2_arg_vmalloc_test\n"
"\t\tid: 1024, name: vm_map_ram_test\n"
"\t\tid: 2048, name: no_block_alloc_test\n"
+ "\t\tid: 4096, name: vrealloc_test\n"
/* Add a new test case description here. */
);
@@ -421,6 +422,56 @@ vm_map_ram_test(void)
return nr_allocated != map_nr_pages;
}
+static int vrealloc_test(void)
+{
+ void *ptr;
+ int i;
+
+ for (i = 0; i < test_loop_count; i++) {
+ ptr = vmalloc(PAGE_SIZE);
+ if (!ptr)
+ return -1;
+
+ *((__u8 *)ptr) = 'a';
+
+ /* Grow: beyond allocated pages, triggers full realloc. */
+ ptr = vrealloc(ptr, 4 * PAGE_SIZE, GFP_KERNEL);
+ if (!ptr)
+ return -1;
+
+ if (*((__u8 *)ptr) != 'a')
+ return -1;
+
+ /* Shrink: crosses page boundary, frees tail pages. */
+ ptr = vrealloc(ptr, PAGE_SIZE, GFP_KERNEL);
+ if (!ptr)
+ return -1;
+
+ if (*((__u8 *)ptr) != 'a')
+ return -1;
+
+ /* Shrink: within same page, no page freeing. */
+ ptr = vrealloc(ptr, PAGE_SIZE / 2, GFP_KERNEL);
+ if (!ptr)
+ return -1;
+
+ if (*((__u8 *)ptr) != 'a')
+ return -1;
+
+ /* Grow: within allocated page, in-place, no realloc. */
+ ptr = vrealloc(ptr, PAGE_SIZE, GFP_KERNEL);
+ if (!ptr)
+ return -1;
+
+ if (*((__u8 *)ptr) != 'a')
+ return -1;
+
+ vfree(ptr);
+ }
+
+ return 0;
+}
+
struct test_case_desc {
const char *test_name;
int (*test_func)(void);
@@ -440,6 +491,7 @@ static struct test_case_desc test_case_array[] = {
{ "kvfree_rcu_2_arg_vmalloc_test", kvfree_rcu_2_arg_vmalloc_test, },
{ "vm_map_ram_test", vm_map_ram_test, },
{ "no_block_alloc_test", no_block_alloc_test, true },
+ { "vrealloc_test", vrealloc_test, },
/* Add a new test case here. */
};
--
2.43.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v5 0/3] mm/vmalloc: free unused pages on vrealloc() shrink
2026-03-17 8:17 [PATCH v5 0/3] mm/vmalloc: free unused pages on vrealloc() shrink Shivam Kalra via B4 Relay
` (2 preceding siblings ...)
2026-03-17 8:17 ` [PATCH v5 3/3] lib/test_vmalloc: add vrealloc test case Shivam Kalra via B4 Relay
@ 2026-03-17 21:11 ` Andrew Morton
2026-03-18 8:00 ` Shivam Kalra
2026-03-21 8:15 ` Shivam Kalra
4 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2026-03-17 21:11 UTC (permalink / raw)
To: shivamkalra98
Cc: Shivam Kalra via B4 Relay, Uladzislau Rezki, linux-mm,
linux-kernel, Alice Ryhl, Danilo Krummrich
On Tue, 17 Mar 2026 13:47:32 +0530 Shivam Kalra via B4 Relay <devnull+shivamkalra98.zohomail.in@kernel.org> wrote:
> This series implements the TODO in vrealloc() to unmap and free unused
> pages when shrinking across a page boundary.
Lots of questions have been posed by the AI review:
https://sashiko.dev/#/patchset/20260317-vmalloc-shrink-v5-0-bbfbf54c5265@zohomail.in
* Re: [PATCH v5 0/3] mm/vmalloc: free unused pages on vrealloc() shrink
2026-03-17 8:17 [PATCH v5 0/3] mm/vmalloc: free unused pages on vrealloc() shrink Shivam Kalra via B4 Relay
` (3 preceding siblings ...)
2026-03-17 21:11 ` [PATCH v5 0/3] mm/vmalloc: free unused pages on vrealloc() shrink Andrew Morton
@ 2026-03-21 8:15 ` Shivam Kalra
2026-03-21 18:04 ` Andrew Morton
2026-03-22 12:48 ` Alice Ryhl
4 siblings, 2 replies; 17+ messages in thread
From: Shivam Kalra @ 2026-03-21 8:15 UTC (permalink / raw)
To: Andrew Morton, Uladzislau Rezki
Cc: linux-mm, linux-kernel, Alice Ryhl, Danilo Krummrich
On 17/03/26 13:47, Shivam Kalra via B4 Relay wrote:
> This series implements the TODO in vrealloc() to unmap and free unused
> pages when shrinking across a page boundary.
>
> Problem:
> When vrealloc() shrinks an allocation, it updates bookkeeping
> (requested_size, KASAN shadow) but does not free the underlying physical
> pages. This wastes memory for the lifetime of the allocation.
>
> Solution:
> - Patch 1: Extracts a vm_area_free_pages(vm, start, end) helper from
> vfree() that frees a range of pages with memcg and nr_vmalloc_pages
> accounting. Freed page pointers are set to NULL to prevent stale
> references.
> - Patch 2: Uses the helper to free tail pages when vrealloc() shrinks
> across a page boundary. Skips huge page allocations (page_order > 0)
> since compound pages cannot be partially freed. Allocations with
> VM_FLUSH_RESET_PERMS are also skipped. Also fixes the grow-in-place
> path to check vm->nr_pages instead of get_vm_area_size(), which
> reflects the virtual reservation and does not change on shrink.
> - Patch 3: Adds a vrealloc test case to lib/test_vmalloc that exercises
> grow-realloc, shrink-across-boundary, shrink-within-page, and
> grow-in-place paths with data integrity validation.
>
> The virtual address reservation is kept intact to preserve the range
> for potential future grow-in-place support.
> A concrete user is the Rust binder driver's KVVec::shrink_to [1], which
> performs explicit vrealloc() shrinks for memory reclamation.
>
> Tested:
> - KASAN KUnit (vmalloc_oob passes)
> - lib/test_vmalloc stress tests (3/3, 1M iterations each)
> - checkpatch, sparse, W=1, allmodconfig, coccicheck clean
>
> [1] https://lore.kernel.org/all/20260216-binder-shrink-vec-v3-v6-0-ece8e8593e53@zohomail.in/
>
> Signed-off-by: Shivam Kalra <shivamkalra98@zohomail.in>
> ---
> Changes in v5:
> - Skip vrealloc shrink for VM_FLUSH_RESET_PERMS (Uladzislau Rezki)
> - Link to v4: https://lore.kernel.org/r/20260314-vmalloc-shrink-v4-0-c1e2e0bb5455@zohomail.in
>
> Changes in v4:
> - Rename vmalloc_free_pages() to vm_area_free_pages() to align with
> vm_area_alloc_pages() (Uladzislau Rezki)
> - NULL out freed vm->pages[] entries to prevent stale pointers (Alice Ryhl)
> - Remove redundant if (vm->nr_pages) guard in vfree() (Uladzislau Rezki)
> - Add vrealloc test case to lib/test_vmalloc (new patch 3/3)
> - Link to v3: https://lore.kernel.org/r/20260309-vmalloc-shrink-v3-0-5590fd8de2eb@zohomail.in
>
> Changes in v3:
> - Restore the comment.
> - Rebase to the latest mm-new
> - Link to v2: https://lore.kernel.org/r/20260304-vmalloc-shrink-v2-0-28c291d60100@zohomail.in
>
> Changes in v2:
> - Updated the base-commit to mm-new
> - Fix conflicts after rebase
> - Ran `clang-format` on the changes made
> - Use a single `kasan_vrealloc` (Alice Ryhl)
> - Link to v1: https://lore.kernel.org/r/20260302-vmalloc-shrink-v1-0-46deff465b7e@zohomail.in
>
> ---
> Shivam Kalra (3):
> mm/vmalloc: extract vm_area_free_pages() helper from vfree()
> mm/vmalloc: free unused pages on vrealloc() shrink
> lib/test_vmalloc: add vrealloc test case
>
> lib/test_vmalloc.c | 52 ++++++++++++++++++++++++++++++++++++++++++
> mm/vmalloc.c | 67 ++++++++++++++++++++++++++++++++++++++----------------
> 2 files changed, 100 insertions(+), 19 deletions(-)
> ---
> base-commit: 7d47a508dfdc335c107fb00b4d9ef46488281a52
> change-id: 20260302-vmalloc-shrink-04b2fa688a14
>
> Best regards,
Hi everyone,
Following up on the concerns raised regarding `get_vm_area_size()` versus
`vm->nr_pages << PAGE_SHIFT`, Andrew kindly ran the patchset through an
AI review which flagged several concrete issues.
I've used those results to audit the code and figure out exactly what
breaks when we shrink allocations while preserving the virtual area size.
Based on that research, here is what I am planning to include in the v6
series to address these edge cases:
1. Fixing the VM_USERMAP crash
Alice correctly pointed out that `remap_vmalloc_range_partial()` relies
on `get_vm_area_size()` to validate the mapping size. If we free tail
pages but keep `vm->size` unchanged, mapping the full original size
would cause a NULL pointer dereference in `vm_insert_page()`.
Plan: I'll update the shrink path to explicitly bail out if `VM_USERMAP`
is set, ensuring safety for these mappings.
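A minimal userspace sketch of the predicate I have in mind (the flag values mirror include/linux/vmalloc.h, but the helper name and its exact placement in the shrink path are assumptions, not the final v6 code):

```c
#include <assert.h>

/* Flag values copied from include/linux/vmalloc.h for illustration. */
#define VM_ALLOC		0x00000002UL
#define VM_USERMAP		0x00000008UL
#define VM_FLUSH_RESET_PERMS	0x00000100UL

/*
 * Hypothetical predicate for the vrealloc() shrink path: tail pages
 * are only freed for ordinary order-0 allocations that are neither
 * user-mapped nor subject to permission resets.
 */
static int can_free_tail_pages(unsigned long flags, unsigned int page_order)
{
	if (flags & (VM_USERMAP | VM_FLUSH_RESET_PERMS))
		return 0;
	return page_order == 0;
}
```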
2. Fixing the Kmemleak scanner panic
Kmemleak tracks the original allocation size and scans it periodically.
If we unmap and free tail pages without notifying kmemleak, its scanner
will fault on the unmapped virtual addresses.
Plan: I'll add a call to `kmemleak_free_part()` during the shrink to
keep its tracked object size updated.
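The region handed to `kmemleak_free_part()` would be the page-aligned tail of the allocation; a small model of that arithmetic (the function names are mine, and PAGE_SIZE is fixed at 4 KiB purely for illustration):

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL

/* Offset of the first byte past the pages that survive the shrink:
 * the new size rounded up to a page boundary. */
static size_t tail_offset(size_t new_size)
{
	return (new_size + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
}

/* Length of the freed tail, i.e. the range kmemleak must stop
 * scanning once the pages behind it are gone. */
static size_t tail_len(size_t old_nr_pages, size_t new_size)
{
	return old_nr_pages * PAGE_SIZE - tail_offset(new_size);
}
```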
3. Fixing a /proc/vmallocinfo race condition
`show_numa_info()` iterates over `v->nr_pages`. During a shrink,
modifying `nr_pages` and NULL-ing out the page pointers concurrently
could cause a reader to dereference a NULL page pointer.
Plan: I'll update the reader to use `READ_ONCE(v->nr_pages)`, and have
the shrink path do a `WRITE_ONCE(vm->nr_pages, new_nr_pages)` before
freeing the pages. This guarantees that concurrent readers either see
the old count with valid pages or the new, smaller count.
4. Fixing a stale data leak on grow
A vrealloc() grow requested with `__GFP_ZERO` could expose previously
discarded data if an intermediate shrink left the freed tail unzeroed
(the shrink path does not clear those bytes).
Plan: I will add mandatory zeroing in the grow-in-place path, honouring
`want_init_on_alloc()`, to clear any newly exposed bytes.
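A userspace sketch of that intent (memset stands in for the kernel-side clearing, and `want_zero` models the `want_init_on_alloc()`/`__GFP_ZERO` decision; this is an illustration, not the v6 diff):

```c
#include <assert.h>
#include <string.h>

/*
 * Model of the grow-in-place fix: bytes between the old requested
 * size and the new one may hold stale data from an earlier unzeroed
 * shrink, so clear them before handing the buffer back.
 */
static void *grow_in_place(void *p, size_t old_size, size_t new_size,
			   int want_zero)
{
	if (want_zero && new_size > old_size)
		memset((char *)p + old_size, 0, new_size - old_size);
	return p;
}
```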
Thanks again to Alice and Danilo for prompting the closer look, and to
Andrew for providing the review. I should have v6 ready for review soon.
Best regards,
Shivam
* Re: [PATCH v5 0/3] mm/vmalloc: free unused pages on vrealloc() shrink
2026-03-21 8:15 ` Shivam Kalra
@ 2026-03-21 18:04 ` Andrew Morton
2026-03-22 12:48 ` Alice Ryhl
1 sibling, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2026-03-21 18:04 UTC (permalink / raw)
To: Shivam Kalra
Cc: Uladzislau Rezki, linux-mm, linux-kernel, Alice Ryhl,
Danilo Krummrich
On Sat, 21 Mar 2026 13:45:35 +0530 Shivam Kalra <shivamkalra98@zohomail.in> wrote:
> Thanks again to Alice and Danilo for prompting the closer look, and to
> Andrew for providing the review.
Well, thanks to those who developed and provided the reviewbot!
Reviews seem to take 12+ hours at present. Go to https://sashiko.dev/,
select linux-mm list, paste in the subject. Or go directly to
id=<Message-ID>
https://sashiko.dev/#/patchset/$id
* Re: [PATCH v5 0/3] mm/vmalloc: free unused pages on vrealloc() shrink
2026-03-21 8:15 ` Shivam Kalra
2026-03-21 18:04 ` Andrew Morton
@ 2026-03-22 12:48 ` Alice Ryhl
2026-03-22 14:32 ` Uladzislau Rezki
1 sibling, 1 reply; 17+ messages in thread
From: Alice Ryhl @ 2026-03-22 12:48 UTC (permalink / raw)
To: Shivam Kalra
Cc: Andrew Morton, Uladzislau Rezki, linux-mm, linux-kernel,
Danilo Krummrich
On Sat, Mar 21, 2026 at 01:45:35PM +0530, Shivam Kalra wrote:
> On 17/03/26 13:47, Shivam Kalra via B4 Relay wrote:
> 3. Fixing a /proc/vmallocinfo race condition
> `show_numa_info()` iterates over `v->nr_pages`. During a shrink,
> modifying `nr_pages` and NULL-ing out the page pointers concurrently
> could cause a reader to dereference a NULL page pointer.
> Plan: I'll update the reader to use `READ_ONCE(v->nr_pages)`, and have
> the shrink path do a `WRITE_ONCE(vm->nr_pages, new_nr_pages)` before
> freeing the pages. This guarantees that concurrent readers either see
> the old count with valid pages or the new, smaller count.
This doesn't fix the race. Consider this:
nr < vm->nr_pages == true
vm->nr_pages = nr
free vm->pages[nr]
page_to_nid(v->pages[nr]) // UAF
Perhaps changing vm->nr_pages should happen under the vn->busy.lock
spinlock? show_numa_info() is called under that lock too.
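In userspace terms the suggestion amounts to something like this (a pthread mutex standing in for vn->busy.lock; the struct and names are illustrative only):

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Toy stand-in for vm_struct; the mutex models vn->busy.lock. */
struct vm_model {
	pthread_mutex_t lock;
	unsigned int nr_pages;
	void *pages[8];
};

/* Writer: free tail pages and update the count under the lock, so a
 * reader can never observe a count covering an already-freed entry. */
static void shrink_locked(struct vm_model *vm, unsigned int nr)
{
	unsigned int i;

	pthread_mutex_lock(&vm->lock);
	for (i = nr; i < vm->nr_pages; i++) {
		free(vm->pages[i]);
		vm->pages[i] = NULL;
	}
	vm->nr_pages = nr;
	pthread_mutex_unlock(&vm->lock);
}

/* Reader: in the kernel, show_numa_info() walks pages[0..nr_pages)
 * under the same lock; here we just check the invariant holds. */
static int pages_valid(struct vm_model *vm)
{
	unsigned int i;
	int ok = 1;

	pthread_mutex_lock(&vm->lock);
	for (i = 0; i < vm->nr_pages; i++)
		ok &= (vm->pages[i] != NULL);
	pthread_mutex_unlock(&vm->lock);
	return ok;
}
```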
Alice
* Re: [PATCH v5 0/3] mm/vmalloc: free unused pages on vrealloc() shrink
2026-03-22 12:48 ` Alice Ryhl
@ 2026-03-22 14:32 ` Uladzislau Rezki
0 siblings, 0 replies; 17+ messages in thread
From: Uladzislau Rezki @ 2026-03-22 14:32 UTC (permalink / raw)
To: Alice Ryhl
Cc: Shivam Kalra, Andrew Morton, Uladzislau Rezki, linux-mm,
linux-kernel, Danilo Krummrich
On Sun, Mar 22, 2026 at 12:48:28PM +0000, Alice Ryhl wrote:
> On Sat, Mar 21, 2026 at 01:45:35PM +0530, Shivam Kalra wrote:
> > On 17/03/26 13:47, Shivam Kalra via B4 Relay wrote:
> > 3. Fixing a /proc/vmallocinfo race condition
> > `show_numa_info()` iterates over `v->nr_pages`. During a shrink,
> > modifying `nr_pages` and NULL-ing out the page pointers concurrently
> > could cause a reader to dereference a NULL page pointer.
> > Plan: I'll update the reader to use `READ_ONCE(v->nr_pages)`, and have
> > the shrink path do a `WRITE_ONCE(vm->nr_pages, new_nr_pages)` before
> > freeing the pages. This guarantees that concurrent readers either see
> > the old count with valid pages or the new, smaller count.
>
> This doesn't fix the race. Consider this:
>
> nr < vm->nr_pages == true
> vm->nr_pages = nr
> free vm->pages[nr]
> page_to_nid(v->pages[nr]) // UAF
>
> perhaps changing vm->nr_pages should happen under the vn->busy.lock
> spinlock? show_numa_info() is called under that lock too.
>
vn->busy.lock protects a VA in the busy tree. So if you update the nr_pages
of a given VA, you should hold the lock of the node that VA belongs to.
--
Uladzislau Rezki