* [PATCH v9 0/4] mm/vmalloc: free unused pages on vrealloc() shrink
@ 2026-04-01 17:16 Shivam Kalra via B4 Relay
2026-04-01 17:16 ` [PATCH v9 1/4] mm/vmalloc: extract vm_area_free_pages() helper from vfree() Shivam Kalra via B4 Relay
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Shivam Kalra via B4 Relay @ 2026-04-01 17:16 UTC (permalink / raw)
To: Andrew Morton, Uladzislau Rezki
Cc: linux-mm, linux-kernel, Alice Ryhl, Danilo Krummrich,
Shivam Kalra
This series implements the TODO in vrealloc() to unmap and free unused
pages when shrinking across a page boundary.
Problem:
When vrealloc() shrinks an allocation, it updates bookkeeping
(requested_size, KASAN shadow) but does not free the underlying physical
pages. This wastes memory for the lifetime of the allocation.
Solution:
- Patch 1: Extracts a vm_area_free_pages(vm, start_idx, end_idx) helper
from vfree() that frees a range of pages with memcg and nr_vmalloc_pages
accounting. Freed page pointers are set to NULL to prevent stale
references.
- Patch 2: Updates the grow-in-place check in vrealloc() to compare the
requested size against the actual physical page count (vm->nr_pages)
rather than the virtual area sizes. This is a prerequisite for shrinking.
- Patch 3: Uses the helper to free tail pages when vrealloc() shrinks
across a page boundary.
- Patch 4: Adds a vrealloc test case to lib/test_vmalloc that exercises
grow-realloc, shrink-across-boundary, shrink-within-page, and
grow-in-place paths.
The virtual address reservation is kept intact to preserve the range
for potential future grow-in-place support.
A concrete user is the Rust binder driver's KVVec::shrink_to [1], which
performs explicit vrealloc() shrinks for memory reclamation.
Tested:
- KASAN KUnit (vmalloc_oob passes)
- lib/test_vmalloc stress tests (3/3, 1M iterations each)
- checkpatch, sparse, W=1, allmodconfig, coccicheck clean
[1] https://lore.kernel.org/all/20260216-binder-shrink-vec-v3-v6-0-ece8e8593e53@zohomail.in/
Suggested-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Shivam Kalra <shivamkalra98@zohomail.in>
---
Changes in v9:
- Remove READ_ONCE, WRITE_ONCE and drop commit
about show_numa_info. (Uladzislau Rezki)
- Update the commit message in Patch 2. (Alice Ryhl)
- Drop the commit that zeroed newly exposed memory.
- Link to v8: https://lore.kernel.org/r/20260327-vmalloc-shrink-v8-0-cc6b57059ed7@zohomail.in
Changes in v8:
- Strip the KASAN tag from the pointer before addr_to_node()
to avoid acquiring the wrong node lock (Sashiko).
- Rebase to latest mm-new.
- Link to v7: https://lore.kernel.org/r/20260324-vmalloc-shrink-v7-0-c0e62b8e5d83@zohomail.in
Changes in v7:
- Fix NULL pointer dereference in shrink path (Sashiko)
- Acquire vn->busy.lock when updating vm->nr_pages to synchronize
with concurrent readers (Uladzislau Rezki)
- Use READ_ONCE in vmalloc_dump_obj (Sashiko)
- Skip shrink path on GFP_NOIO or GFP_NOFS. (Sashiko)
- Fix overflow issue for large allocations. (Sashiko)
- Use vrealloc instead of vmalloc in vrealloc test.
- Link to v6: https://lore.kernel.org/r/20260321-vmalloc-shrink-v6-0-062ca7b7ceb2@zohomail.in
Changes in v6:
- Fix VM_USERMAP crash by explicitly bypassing early in the shrink path if the flag is set. (Sashiko)
- Fix kmemleak scanner panic by calling kmemleak_free_part() to update tracking on shrink. (Sashiko)
- Fix /proc/vmallocinfo race condition by protecting vm->nr_pages access with
READ_ONCE()/WRITE_ONCE() for concurrent readers. (Sashiko)
- Fix stale data leak on grow-after-shrink by enforcing mandatory zeroing of the newly exposed memory. (Sashiko)
- Fix memory leaks in vrealloc_test() by using a temporary pointer to preserve and
free the original allocation upon failure. (Sashiko)
- Rename vmalloc_free_pages parameters from start/end to start_idx/end_idx for better clarity. (Uladzislau Rezki)
- Link to v5: https://lore.kernel.org/r/20260317-vmalloc-shrink-v5-0-bbfbf54c5265@zohomail.in
- Link to Sashiko: https://sashiko.dev/#/patchset/20260317-vmalloc-shrink-v5-0-bbfbf54c5265%40zohomail.in
Changes in v5:
- Skip vrealloc shrink for VM_FLUSH_RESET_PERMS (Uladzislau Rezki)
- Link to v4: https://lore.kernel.org/r/20260314-vmalloc-shrink-v4-0-c1e2e0bb5455@zohomail.in
Changes in v4:
- Rename vmalloc_free_pages() to vm_area_free_pages() to align with
vm_area_alloc_pages() (Uladzislau Rezki)
- NULL out freed vm->pages[] entries to prevent stale pointers (Alice Ryhl)
- Remove redundant if (vm->nr_pages) guard in vfree() (Uladzislau Rezki)
- Add vrealloc test case to lib/test_vmalloc (new patch 3/3)
- Link to v3: https://lore.kernel.org/r/20260309-vmalloc-shrink-v3-0-5590fd8de2eb@zohomail.in
Changes in v3:
- Restore the comment.
- Rebase to the latest mm-new
- Link to v2: https://lore.kernel.org/r/20260304-vmalloc-shrink-v2-0-28c291d60100@zohomail.in
Changes in v2:
- Updated the base-commit to mm-new
- Fix conflicts after rebase
- Ran `clang-format` on the changes made
- Use a single `kasan_vrealloc` (Alice Ryhl)
- Link to v1: https://lore.kernel.org/r/20260302-vmalloc-shrink-v1-0-46deff465b7e@zohomail.in
---
Shivam Kalra (4):
mm/vmalloc: extract vm_area_free_pages() helper from vfree()
mm/vmalloc: use physical page count for vrealloc() grow-in-place check
mm/vmalloc: free unused pages on vrealloc() shrink
lib/test_vmalloc: add vrealloc test case
lib/test_vmalloc.c | 62 ++++++++++++++++++++++++++++++
mm/vmalloc.c | 111 ++++++++++++++++++++++++++++++++++++++++++++---------
2 files changed, 154 insertions(+), 19 deletions(-)
---
base-commit: 54c9d0359b180b34070aa7ff8d9428fa3db8acbb
change-id: 20260302-vmalloc-shrink-04b2fa688a14
Best regards,
--
Shivam Kalra <shivamkalra98@zohomail.in>
* [PATCH v9 1/4] mm/vmalloc: extract vm_area_free_pages() helper from vfree()
2026-04-01 17:16 [PATCH v9 0/4] mm/vmalloc: free unused pages on vrealloc() shrink Shivam Kalra via B4 Relay
@ 2026-04-01 17:16 ` Shivam Kalra via B4 Relay
2026-04-01 17:16 ` [PATCH v9 2/4] mm/vmalloc: use physical page count for vrealloc() grow-in-place check Shivam Kalra via B4 Relay
` (2 subsequent siblings)
3 siblings, 0 replies; 10+ messages in thread
From: Shivam Kalra via B4 Relay @ 2026-04-01 17:16 UTC (permalink / raw)
To: Andrew Morton, Uladzislau Rezki
Cc: linux-mm, linux-kernel, Alice Ryhl, Danilo Krummrich,
Shivam Kalra
From: Shivam Kalra <shivamkalra98@zohomail.in>
Extract the page-freeing loop and NR_VMALLOC stat accounting from
vfree() into a reusable vm_area_free_pages() helper. The helper operates
on a range [start_idx, end_idx) of pages from a vm_struct, making it
suitable for both full free (vfree) and partial free (upcoming vrealloc
shrink).
Freed page pointers in vm->pages[] are set to NULL to prevent stale
references when the vm_struct outlives the free (as in vrealloc shrink).
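As a rough user-space illustration of the range-free contract described above (free the half-open range [start_idx, end_idx) and NULL each slot), consider the following sketch. It is not the kernel code: struct sketch_area, sketch_area_init() and sketch_free_pages() are hypothetical names, malloc()/free() stand in for the page allocator, and the NR_VMALLOC accounting and cond_resched() are left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vm_struct; only what the sketch needs. */
struct sketch_area {
	void **pages;          /* one entry per backing "page" */
	unsigned int nr_pages; /* number of entries in pages[] */
};

/* Populate an area with nr_pages dummy pages; returns 0 on success. */
static int sketch_area_init(struct sketch_area *area, unsigned int nr_pages)
{
	unsigned int i;

	area->pages = calloc(nr_pages, sizeof(*area->pages));
	if (!area->pages)
		return -1;
	area->nr_pages = nr_pages;

	for (i = 0; i < nr_pages; i++) {
		area->pages[i] = malloc(1);
		if (!area->pages[i]) {
			/* Undo the partial allocation. */
			while (i--)
				free(area->pages[i]);
			free(area->pages);
			area->pages = NULL;
			area->nr_pages = 0;
			return -1;
		}
	}
	return 0;
}

/* Free pages [start_idx, end_idx) and clear each slot afterwards. */
static void sketch_free_pages(struct sketch_area *area,
			      unsigned int start_idx, unsigned int end_idx)
{
	unsigned int i;

	for (i = start_idx; i < end_idx; i++) {
		/* Plays the role of BUG_ON(!page) in the real helper. */
		assert(area->pages[i] != NULL);
		free(area->pages[i]);
		area->pages[i] = NULL; /* no stale pointer left behind */
	}
}
```

A full free corresponds to sketch_free_pages(&area, 0, area.nr_pages); a partial free of the tail passes the new page count as start_idx, and the area itself remains usable afterwards.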
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Shivam Kalra <shivamkalra98@zohomail.in>
---
mm/vmalloc.c | 47 +++++++++++++++++++++++++++++++++--------------
1 file changed, 33 insertions(+), 14 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 57eae99d9909..fe8700270139 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3424,6 +3424,38 @@ void vfree_atomic(const void *addr)
schedule_work(&p->wq);
}
+/*
+ * vm_area_free_pages - free a range of pages from a vmalloc allocation
+ * @vm: the vm_struct containing the pages
+ * @start_idx: first page index to free (inclusive)
+ * @end_idx: last page index to free (exclusive)
+ *
+ * Free pages [start_idx, end_idx) updating NR_VMALLOC stat accounting.
+ * Freed vm->pages[] entries are set to NULL.
+ * Caller is responsible for unmapping (vunmap_range) and KASAN
+ * poisoning before calling this.
+ */
+static void vm_area_free_pages(struct vm_struct *vm, unsigned int start_idx,
+ unsigned int end_idx)
+{
+ unsigned int i;
+
+ for (i = start_idx; i < end_idx; i++) {
+ struct page *page = vm->pages[i];
+
+ BUG_ON(!page);
+ /*
+ * High-order allocs for huge vmallocs are split, so
+ * can be freed as an array of order-0 allocations
+ */
+ if (!(vm->flags & VM_MAP_PUT_PAGES))
+ mod_lruvec_page_state(page, NR_VMALLOC, -1);
+ __free_page(page);
+ vm->pages[i] = NULL;
+ cond_resched();
+ }
+}
+
/**
* vfree - Release memory allocated by vmalloc()
* @addr: Memory base address
@@ -3444,7 +3476,6 @@ void vfree_atomic(const void *addr)
void vfree(const void *addr)
{
struct vm_struct *vm;
- int i;
if (unlikely(in_interrupt())) {
vfree_atomic(addr);
@@ -3467,19 +3498,7 @@ void vfree(const void *addr)
if (unlikely(vm->flags & VM_FLUSH_RESET_PERMS))
vm_reset_perms(vm);
- for (i = 0; i < vm->nr_pages; i++) {
- struct page *page = vm->pages[i];
-
- BUG_ON(!page);
- /*
- * High-order allocs for huge vmallocs are split, so
- * can be freed as an array of order-0 allocations
- */
- if (!(vm->flags & VM_MAP_PUT_PAGES))
- mod_lruvec_page_state(page, NR_VMALLOC, -1);
- __free_page(page);
- cond_resched();
- }
+ vm_area_free_pages(vm, 0, vm->nr_pages);
kvfree(vm->pages);
kfree(vm);
}
--
2.43.0
* [PATCH v9 2/4] mm/vmalloc: use physical page count for vrealloc() grow-in-place check
2026-04-01 17:16 [PATCH v9 0/4] mm/vmalloc: free unused pages on vrealloc() shrink Shivam Kalra via B4 Relay
2026-04-01 17:16 ` [PATCH v9 1/4] mm/vmalloc: extract vm_area_free_pages() helper from vfree() Shivam Kalra via B4 Relay
@ 2026-04-01 17:16 ` Shivam Kalra via B4 Relay
2026-04-01 17:16 ` [PATCH v9 3/4] mm/vmalloc: free unused pages on vrealloc() shrink Shivam Kalra via B4 Relay
2026-04-01 17:16 ` [PATCH v9 4/4] lib/test_vmalloc: add vrealloc test case Shivam Kalra via B4 Relay
3 siblings, 0 replies; 10+ messages in thread
From: Shivam Kalra via B4 Relay @ 2026-04-01 17:16 UTC (permalink / raw)
To: Andrew Morton, Uladzislau Rezki
Cc: linux-mm, linux-kernel, Alice Ryhl, Danilo Krummrich,
Shivam Kalra
From: Shivam Kalra <shivamkalra98@zohomail.in>
Update the grow-in-place check in vrealloc() to compare the requested size
against the actual physical page count (vm->nr_pages) rather than the
virtual area size (alloced_size, derived from get_vm_area_size()).
Currently both values are equivalent, but the upcoming vrealloc() shrink
functionality will free pages without reducing the virtual reservation
size. After such a shrink, the old alloced_size-based comparison would
incorrectly allow a grow-in-place operation to succeed and attempt to
access freed pages. Switch to vm->nr_pages now so the check remains
correct once shrink support is added.
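To make the difference between the two checks concrete, here is a small user-space sketch. The function names and SKETCH_PAGE_SHIFT (4 KiB pages) are assumptions for illustration only; the second predicate mirrors the shape of the new comparison in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Old check: compare against the size of the virtual reservation. */
static bool fits_by_reservation(size_t size, size_t alloced_size)
{
	return size <= alloced_size;
}

/* New check: compare against the bytes actually backed by pages. */
static bool fits_by_backing(size_t size, unsigned int nr_pages)
{
	return size <= ((size_t)nr_pages << SKETCH_PAGE_SHIFT);
}
```

With a four-page reservation and four backing pages both predicates agree. Once a shrink has left only one backing page, a request for two pages still passes the reservation-based check but fails the page-count check, which is the case the message above is guarding against.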
Signed-off-by: Shivam Kalra <shivamkalra98@zohomail.in>
---
mm/vmalloc.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index fe8700270139..1c6d747220ce 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -4351,6 +4351,12 @@ void *vrealloc_node_align_noprof(const void *p, size_t size, unsigned long align
if (unlikely(flags & __GFP_THISNODE) && nid != NUMA_NO_NODE &&
nid != page_to_nid(vmalloc_to_page(p)))
goto need_realloc;
+ } else {
+ /*
+ * If p is NULL, vrealloc behaves exactly like vmalloc.
+ * Skip the shrink and in-place grow paths.
+ */
+ goto need_realloc;
}
/*
@@ -4369,7 +4375,7 @@ void *vrealloc_node_align_noprof(const void *p, size_t size, unsigned long align
/*
* We already have the bytes available in the allocation; use them.
*/
- if (size <= alloced_size) {
+ if (size <= (size_t)vm->nr_pages << PAGE_SHIFT) {
/*
* No need to zero memory here, as unused memory will have
* already been zeroed at initial allocation time or during
--
2.43.0
* [PATCH v9 3/4] mm/vmalloc: free unused pages on vrealloc() shrink
2026-04-01 17:16 [PATCH v9 0/4] mm/vmalloc: free unused pages on vrealloc() shrink Shivam Kalra via B4 Relay
2026-04-01 17:16 ` [PATCH v9 1/4] mm/vmalloc: extract vm_area_free_pages() helper from vfree() Shivam Kalra via B4 Relay
2026-04-01 17:16 ` [PATCH v9 2/4] mm/vmalloc: use physical page count for vrealloc() grow-in-place check Shivam Kalra via B4 Relay
@ 2026-04-01 17:16 ` Shivam Kalra via B4 Relay
2026-04-01 21:19 ` Alice Ryhl
2026-04-01 17:16 ` [PATCH v9 4/4] lib/test_vmalloc: add vrealloc test case Shivam Kalra via B4 Relay
3 siblings, 1 reply; 10+ messages in thread
From: Shivam Kalra via B4 Relay @ 2026-04-01 17:16 UTC (permalink / raw)
To: Andrew Morton, Uladzislau Rezki
Cc: linux-mm, linux-kernel, Alice Ryhl, Danilo Krummrich,
Shivam Kalra
From: Shivam Kalra <shivamkalra98@zohomail.in>
When vrealloc() shrinks an allocation and the new size crosses a page
boundary, unmap and free the tail pages that are no longer needed. This
reclaims physical memory that was previously wasted for the lifetime
of the allocation.
The heuristic is simple: always free when at least one full page becomes
unused. Huge page allocations (page_order > 0) are skipped, as partial
freeing would require splitting. Allocations with VM_FLUSH_RESET_PERMS
are also skipped, as their direct-map permissions must be reset before
pages are returned to the page allocator, which is handled by
vm_reset_perms() during vfree().
Additionally, allocations with VM_USERMAP are skipped because
remap_vmalloc_range_partial() validates mapping requests against the
unchanged vm->size; freeing tail pages would cause vmalloc_to_page()
to return NULL for the unmapped range.
To synchronize with concurrent readers, the shrink path updates
vm->nr_pages under the vmap node lock (vn->busy.lock) before freeing
the pages.
Finally, we notify kmemleak of the reduced allocation size using
kmemleak_free_part() to prevent the kmemleak scanner from faulting on
the newly unmapped virtual addresses.
The virtual address reservation (vm->size / vmap_area) is intentionally
kept unchanged, preserving the address for potential future grow-in-place
support.
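The "at least one full page becomes unused" rule boils down to page-count arithmetic, sketched here in user space. The names (pages_for_size, tail_to_free, struct tail_range) and the 4 KiB page size are assumptions for illustration; the skip conditions for huge pages, VM_FLUSH_RESET_PERMS, VM_USERMAP and the GFP flags are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE ((size_t)1 << SKETCH_PAGE_SHIFT)

/* Tail pages that a shrink would unmap and free. */
struct tail_range {
	unsigned int first_idx; /* first page index to free (inclusive) */
	unsigned int end_idx;   /* last page index to free (exclusive) */
	size_t start_off;       /* byte offset where unmapping starts */
	size_t end_off;         /* byte offset where unmapping ends */
};

/* Pages needed to back size bytes, i.e. PAGE_ALIGN(size) >> PAGE_SHIFT. */
static unsigned int pages_for_size(size_t size)
{
	assert(size <= SIZE_MAX - (SKETCH_PAGE_SIZE - 1));
	return (unsigned int)((size + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT);
}

/*
 * Returns true and fills *out when shrinking to new_size leaves at least
 * one whole page unused; returns false when the shrink stays within the
 * pages that are still needed.
 */
static bool tail_to_free(size_t new_size, unsigned int old_nr_pages,
			 struct tail_range *out)
{
	unsigned int new_nr_pages = pages_for_size(new_size);

	if (new_nr_pages >= old_nr_pages)
		return false;

	out->first_idx = new_nr_pages;
	out->end_idx = old_nr_pages;
	out->start_off = (size_t)new_nr_pages << SKETCH_PAGE_SHIFT;
	out->end_off = (size_t)old_nr_pages << SKETCH_PAGE_SHIFT;
	return true;
}
```

For example, shrinking a four-page allocation to exactly one page yields the index range [1, 4), while shrinking from one page to half a page yields nothing to free.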
Suggested-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Shivam Kalra <shivamkalra98@zohomail.in>
---
mm/vmalloc.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 52 insertions(+), 4 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 1c6d747220ce..a7731e54560b 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -4359,14 +4359,62 @@ void *vrealloc_node_align_noprof(const void *p, size_t size, unsigned long align
goto need_realloc;
}
- /*
- * TODO: Shrink the vm_area, i.e. unmap and free unused pages. What
- * would be a good heuristic for when to shrink the vm_area?
- */
if (size <= old_size) {
+ unsigned int new_nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
/* Zero out "freed" memory, potentially for future realloc. */
if (want_init_on_free() || want_init_on_alloc(flags))
memset((void *)p + size, 0, old_size - size);
+
+ /*
+ * Free tail pages when shrink crosses a page boundary.
+ *
+ * Skip huge page allocations (page_order > 0) as partial
+ * freeing would require splitting.
+ *
+ * Skip VM_FLUSH_RESET_PERMS, as direct-map permissions must
+ * be reset before pages are returned to the allocator.
+ *
+ * Skip VM_USERMAP, as remap_vmalloc_range_partial() validates
+ * mapping requests against the unchanged vm->size; freeing
+ * tail pages would cause vmalloc_to_page() to return NULL for
+ * the unmapped range.
+ *
+ * Skip if either GFP_NOFS or GFP_NOIO are used.
+ * kmemleak_free_part() internally allocates with
+ * GFP_KERNEL, which could trigger a recursive deadlock
+ * if we are under filesystem or I/O reclaim.
+ */
+ if (new_nr_pages < vm->nr_pages && !vm_area_page_order(vm) &&
+ !(vm->flags & (VM_FLUSH_RESET_PERMS | VM_USERMAP)) &&
+ gfp_has_io_fs(flags)) {
+ unsigned long addr = (unsigned long)kasan_reset_tag(p);
+ unsigned int old_nr_pages = vm->nr_pages;
+
+ /* Notify kmemleak of the reduced allocation size before unmapping. */
+ kmemleak_free_part(
+ (void *)addr + ((unsigned long)new_nr_pages
+ << PAGE_SHIFT),
+ (unsigned long)(old_nr_pages - new_nr_pages)
+ << PAGE_SHIFT);
+
+ vunmap_range(addr + ((unsigned long)new_nr_pages
+ << PAGE_SHIFT),
+ addr + ((unsigned long)old_nr_pages
+ << PAGE_SHIFT));
+
+ /*
+ * Use the node lock to synchronize with concurrent
+ * readers (vmalloc_info_show).
+ */
+ struct vmap_node *vn = addr_to_node(addr);
+
+ spin_lock(&vn->busy.lock);
+ vm->nr_pages = new_nr_pages;
+ spin_unlock(&vn->busy.lock);
+
+ vm_area_free_pages(vm, new_nr_pages, old_nr_pages);
+ }
vm->requested_size = size;
kasan_vrealloc(p, old_size, size);
return (void *)p;
--
2.43.0
* [PATCH v9 4/4] lib/test_vmalloc: add vrealloc test case
2026-04-01 17:16 [PATCH v9 0/4] mm/vmalloc: free unused pages on vrealloc() shrink Shivam Kalra via B4 Relay
` (2 preceding siblings ...)
2026-04-01 17:16 ` [PATCH v9 3/4] mm/vmalloc: free unused pages on vrealloc() shrink Shivam Kalra via B4 Relay
@ 2026-04-01 17:16 ` Shivam Kalra via B4 Relay
3 siblings, 0 replies; 10+ messages in thread
From: Shivam Kalra via B4 Relay @ 2026-04-01 17:16 UTC (permalink / raw)
To: Andrew Morton, Uladzislau Rezki
Cc: linux-mm, linux-kernel, Alice Ryhl, Danilo Krummrich,
Shivam Kalra
From: Shivam Kalra <shivamkalra98@zohomail.in>
Introduce a new test case "vrealloc_test" that exercises the vrealloc()
shrink and in-place grow paths:
- Grow beyond allocated pages (triggers full reallocation).
- Shrink crossing a page boundary (frees tail pages).
- Shrink within the same page (no page freeing).
- Grow within the already allocated page count (in-place).
Data integrity is validated after each realloc step by checking that
the first byte of the original allocation is preserved.
The test is gated behind run_test_mask bit 12 (id 4096).
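As a small illustration of the gating, the sketch below assumes that bit i of run_test_mask selects the i-th test case; test_selected() is a hypothetical helper, not a function from lib/test_vmalloc.c.

```c
#include <assert.h>
#include <stdbool.h>

/* True when bit index of run_test_mask is set (assumed selection rule). */
static bool test_selected(unsigned int run_test_mask, unsigned int index)
{
	assert(index < 32);
	return (run_test_mask >> index) & 1u;
}
```

Under that assumption, a mask of 4096 (1 << 12) selects only the new test, and the default mask of 7 shown in the __param() line of the diff does not select it.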
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Shivam Kalra <shivamkalra98@zohomail.in>
---
lib/test_vmalloc.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 62 insertions(+)
diff --git a/lib/test_vmalloc.c b/lib/test_vmalloc.c
index 876c72c18a0c..b23f85e8f8ca 100644
--- a/lib/test_vmalloc.c
+++ b/lib/test_vmalloc.c
@@ -55,6 +55,7 @@ __param(int, run_test_mask, 7,
"\t\tid: 512, name: kvfree_rcu_2_arg_vmalloc_test\n"
"\t\tid: 1024, name: vm_map_ram_test\n"
"\t\tid: 2048, name: no_block_alloc_test\n"
+ "\t\tid: 4096, name: vrealloc_test\n"
/* Add a new test case description here. */
);
@@ -421,6 +422,66 @@ vm_map_ram_test(void)
return nr_allocated != map_nr_pages;
}
+static int vrealloc_test(void)
+{
+ void *ptr, *tmp;
+ int i;
+
+ for (i = 0; i < test_loop_count; i++) {
+ int err = -1;
+
+ ptr = vrealloc(NULL, PAGE_SIZE, GFP_KERNEL);
+ if (!ptr)
+ return -1;
+
+ *((__u8 *)ptr) = 'a';
+
+ /* Grow: beyond allocated pages, triggers full realloc. */
+ tmp = vrealloc(ptr, 4 * PAGE_SIZE, GFP_KERNEL);
+ if (!tmp)
+ goto error;
+ ptr = tmp;
+
+ if (*((__u8 *)ptr) != 'a')
+ goto error;
+
+ /* Shrink: crosses page boundary, frees tail pages. */
+ tmp = vrealloc(ptr, PAGE_SIZE, GFP_KERNEL);
+ if (!tmp)
+ goto error;
+ ptr = tmp;
+
+ if (*((__u8 *)ptr) != 'a')
+ goto error;
+
+ /* Shrink: within same page, no page freeing. */
+ tmp = vrealloc(ptr, PAGE_SIZE / 2, GFP_KERNEL);
+ if (!tmp)
+ goto error;
+ ptr = tmp;
+
+ if (*((__u8 *)ptr) != 'a')
+ goto error;
+
+ /* Grow: within allocated page, in-place, no realloc. */
+ tmp = vrealloc(ptr, PAGE_SIZE, GFP_KERNEL);
+ if (!tmp)
+ goto error;
+ ptr = tmp;
+
+ if (*((__u8 *)ptr) != 'a')
+ goto error;
+
+ err = 0;
+error:
+ vfree(ptr);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
struct test_case_desc {
const char *test_name;
int (*test_func)(void);
@@ -440,6 +501,7 @@ static struct test_case_desc test_case_array[] = {
{ "kvfree_rcu_2_arg_vmalloc_test", kvfree_rcu_2_arg_vmalloc_test, },
{ "vm_map_ram_test", vm_map_ram_test, },
{ "no_block_alloc_test", no_block_alloc_test, true },
+ { "vrealloc_test", vrealloc_test, },
/* Add a new test case here. */
};
--
2.43.0
* Re: [PATCH v9 3/4] mm/vmalloc: free unused pages on vrealloc() shrink
2026-04-01 17:16 ` [PATCH v9 3/4] mm/vmalloc: free unused pages on vrealloc() shrink Shivam Kalra via B4 Relay
@ 2026-04-01 21:19 ` Alice Ryhl
2026-04-02 2:01 ` Shivam Kalra
0 siblings, 1 reply; 10+ messages in thread
From: Alice Ryhl @ 2026-04-01 21:19 UTC (permalink / raw)
To: shivamkalra98
Cc: Andrew Morton, Uladzislau Rezki, linux-mm, linux-kernel,
Danilo Krummrich
On Wed, Apr 01, 2026 at 10:46:35PM +0530, Shivam Kalra via B4 Relay wrote:
> From: Shivam Kalra <shivamkalra98@zohomail.in>
>
> When vrealloc() shrinks an allocation and the new size crosses a page
> boundary, unmap and free the tail pages that are no longer needed. This
> reclaims physical memory that was previously wasted for the lifetime
> of the allocation.
>
> The heuristic is simple: always free when at least one full page becomes
> unused. Huge page allocations (page_order > 0) are skipped, as partial
> freeing would require splitting. Allocations with VM_FLUSH_RESET_PERMS
> are also skipped, as their direct-map permissions must be reset before
> pages are returned to the page allocator, which is handled by
> vm_reset_perms() during vfree().
>
> Additionally, allocations with VM_USERMAP are skipped because
> remap_vmalloc_range_partial() validates mapping requests against the
> unchanged vm->size; freeing tail pages would cause vmalloc_to_page()
> to return NULL for the unmapped range.
>
> To protect concurrent readers, the shrink path uses Node lock to
> synchronize before freeing the pages.
>
> Finally, we notify kmemleak of the reduced allocation size using
> kmemleak_free_part() to prevent the kmemleak scanner from faulting on
> the newly unmapped virtual addresses.
>
> The virtual address reservation (vm->size / vmap_area) is intentionally
> kept unchanged, preserving the address for potential future grow-in-place
> support.
>
> Suggested-by: Danilo Krummrich <dakr@kernel.org>
> Signed-off-by: Shivam Kalra <shivamkalra98@zohomail.in>
> ---
> mm/vmalloc.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 52 insertions(+), 4 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 1c6d747220ce..a7731e54560b 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -4359,14 +4359,62 @@ void *vrealloc_node_align_noprof(const void *p, size_t size, unsigned long align
> goto need_realloc;
> }
>
> - /*
> - * TODO: Shrink the vm_area, i.e. unmap and free unused pages. What
> - * would be a good heuristic for when to shrink the vm_area?
> - */
> if (size <= old_size) {
> + unsigned int new_nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
> +
> /* Zero out "freed" memory, potentially for future realloc. */
> if (want_init_on_free() || want_init_on_alloc(flags))
> memset((void *)p + size, 0, old_size - size);
> +
> + /*
> + * Free tail pages when shrink crosses a page boundary.
> + *
> + * Skip huge page allocations (page_order > 0) as partial
> + * freeing would require splitting.
> + *
> + * Skip VM_FLUSH_RESET_PERMS, as direct-map permissions must
> + * be reset before pages are returned to the allocator.
> + *
> + * Skip VM_USERMAP, as remap_vmalloc_range_partial() validates
> + * mapping requests against the unchanged vm->size; freeing
> + * tail pages would cause vmalloc_to_page() to return NULL for
> + * the unmapped range.
> + *
> + * Skip if either GFP_NOFS or GFP_NOIO are used.
> + * kmemleak_free_part() internally allocates with
> + * GFP_KERNEL, which could trigger a recursive deadlock
> + * if we are under filesystem or I/O reclaim.
> + */
> + if (new_nr_pages < vm->nr_pages && !vm_area_page_order(vm) &&
> + !(vm->flags & (VM_FLUSH_RESET_PERMS | VM_USERMAP)) &&
> + gfp_has_io_fs(flags)) {
> + unsigned long addr = (unsigned long)kasan_reset_tag(p);
> + unsigned int old_nr_pages = vm->nr_pages;
> +
> + /* Notify kmemleak of the reduced allocation size before unmapping. */
> + kmemleak_free_part(
> + (void *)addr + ((unsigned long)new_nr_pages
> + << PAGE_SHIFT),
> + (unsigned long)(old_nr_pages - new_nr_pages)
> + << PAGE_SHIFT);
> +
> + vunmap_range(addr + ((unsigned long)new_nr_pages
> + << PAGE_SHIFT),
> + addr + ((unsigned long)old_nr_pages
> + << PAGE_SHIFT));
> +
> + /*
> + * Use the node lock to synchronize with concurrent
> + * readers (vmalloc_info_show).
> + */
> + struct vmap_node *vn = addr_to_node(addr);
> +
> + spin_lock(&vn->busy.lock);
> + vm->nr_pages = new_nr_pages;
> + spin_unlock(&vn->busy.lock);
Should we set nr_pages first? Right now, another thread may observe the
range being unmapped but still see the old nr_pages value.
> + vm_area_free_pages(vm, new_nr_pages, old_nr_pages);
> + }
> vm->requested_size = size;
> kasan_vrealloc(p, old_size, size);
> return (void *)p;
>
> --
> 2.43.0
>
>
* Re: [PATCH v9 3/4] mm/vmalloc: free unused pages on vrealloc() shrink
2026-04-01 21:19 ` Alice Ryhl
@ 2026-04-02 2:01 ` Shivam Kalra
2026-04-02 2:53 ` Shivam Kalra
0 siblings, 1 reply; 10+ messages in thread
From: Shivam Kalra @ 2026-04-02 2:01 UTC (permalink / raw)
To: Alice Ryhl
Cc: Andrew Morton, Uladzislau Rezki, linux-mm, linux-kernel,
Danilo Krummrich
On 02/04/26 02:49, Alice Ryhl wrote:
> Should we set nr_pages first? Right now, another thread may observe the
> range being unmapped but still see the old nr_pages value.
Isn't this exactly what the spinlock is for? The observer is supposed to
release the lock only after they are done observing the value.
Can you point out the code where this might not be the case? Or am I
missing something?
* Re: [PATCH v9 3/4] mm/vmalloc: free unused pages on vrealloc() shrink
2026-04-02 2:01 ` Shivam Kalra
@ 2026-04-02 2:53 ` Shivam Kalra
2026-04-07 8:06 ` Alice Ryhl
0 siblings, 1 reply; 10+ messages in thread
From: Shivam Kalra @ 2026-04-02 2:53 UTC (permalink / raw)
To: Alice Ryhl
Cc: Andrew Morton, Uladzislau Rezki, linux-mm, linux-kernel,
Danilo Krummrich
On 02/04/26 07:31, Shivam Kalra wrote:
> On 02/04/26 02:49, Alice Ryhl wrote:
> > Should we set nr_pages first? Right now, another thread may observe the
> > range being unmapped but still see the old nr_pages value.
Or is this what you mean?
<snip>
struct vmap_node *vn = addr_to_node(addr);
/* Notify kmemleak of the reduced allocation size before unmapping. */
kmemleak_free_part(...);
spin_lock(&vn->busy.lock);
vm->nr_pages = new_nr_pages;
spin_unlock(&vn->busy.lock);
vunmap_range(...);
vm_area_free_pages(vm, new_nr_pages, old_nr_pages);
<snip>
If this is the case, then I agree this will be much cleaner.
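The effect of the two orderings can be sketched with a single-threaded user-space model. Everything here is hypothetical (struct sketch_vm, the function names, the fixed-size mapped[] array), and locking is not modeled at all; the sketch only checks, after each step, the property a reader would rely on: every index below nr_pages is still mapped.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_PAGES 8

/* Hypothetical model: a page count plus a per-page "mapped" flag. */
struct sketch_vm {
	unsigned int nr_pages;
	bool mapped[SKETCH_MAX_PAGES];
};

static void sketch_vm_init(struct sketch_vm *vm, unsigned int nr_pages)
{
	unsigned int i;

	assert(nr_pages <= SKETCH_MAX_PAGES);
	vm->nr_pages = nr_pages;
	for (i = 0; i < SKETCH_MAX_PAGES; i++)
		vm->mapped[i] = i < nr_pages;
}

/* What a reader relies on: pages [0, nr_pages) are all mapped. */
static bool reader_view_ok(const struct sketch_vm *vm)
{
	unsigned int i;

	for (i = 0; i < vm->nr_pages; i++)
		if (!vm->mapped[i])
			return false;
	return true;
}

static void sketch_unmap(struct sketch_vm *vm, unsigned int from,
			 unsigned int to)
{
	unsigned int i;

	for (i = from; i < to; i++)
		vm->mapped[i] = false;
}

/* v9 order: unmap first, then publish the smaller count. */
static bool shrink_unmap_first(struct sketch_vm *vm, unsigned int new_nr)
{
	unsigned int old_nr = vm->nr_pages;
	bool ok;

	sketch_unmap(vm, new_nr, old_nr);
	ok = reader_view_ok(vm); /* window: old count, pages already gone */
	vm->nr_pages = new_nr;
	return ok && reader_view_ok(vm);
}

/* Proposed order: publish the smaller count first, then unmap. */
static bool shrink_count_first(struct sketch_vm *vm, unsigned int new_nr)
{
	unsigned int old_nr = vm->nr_pages;
	bool ok;

	vm->nr_pages = new_nr;
	ok = reader_view_ok(vm);
	sketch_unmap(vm, new_nr, old_nr);
	return ok && reader_view_ok(vm);
}
```

Both functions return whether the reader's property held after every step. In this model the unmap-first variant reports a violation in the intermediate state, while the count-first variant does not.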
* Re: [PATCH v9 3/4] mm/vmalloc: free unused pages on vrealloc() shrink
2026-04-02 2:53 ` Shivam Kalra
@ 2026-04-07 8:06 ` Alice Ryhl
2026-04-07 11:05 ` Shivam Kalra
0 siblings, 1 reply; 10+ messages in thread
From: Alice Ryhl @ 2026-04-07 8:06 UTC (permalink / raw)
To: Shivam Kalra
Cc: Andrew Morton, Uladzislau Rezki, linux-mm, linux-kernel,
Danilo Krummrich
On Thu, Apr 02, 2026 at 08:23:42AM +0530, Shivam Kalra wrote:
> On 02/04/26 07:31, Shivam Kalra wrote:
> > On 02/04/26 02:49, Alice Ryhl wrote:
> > Should we set nr_pages first? Right now, another thread may observe the
> > range being unmapped but still see the old nr_pages value.
>
> Or is this what you mean?
> <snip>
> struct vmap_node *vn = addr_to_node(addr);
> /* Notify kmemleak of the reduced allocation size before unmapping. */
> kmemleak_free_part(...);
>
> spin_lock(&vn->busy.lock);
> vm->nr_pages = new_nr_pages;
> spin_unlock(&vn->busy.lock);
>
> vunmap_range(...);
> vm_area_free_pages(vm, new_nr_pages, old_nr_pages);
>
> <snip>
> If this is the case, then I agree this will be much cleaner.
Yes, I mean that you should change nr_pages first, before calling
vunmap_range().
Alice
* Re: [PATCH v9 3/4] mm/vmalloc: free unused pages on vrealloc() shrink
2026-04-07 8:06 ` Alice Ryhl
@ 2026-04-07 11:05 ` Shivam Kalra
0 siblings, 0 replies; 10+ messages in thread
From: Shivam Kalra @ 2026-04-07 11:05 UTC (permalink / raw)
To: Alice Ryhl
Cc: Andrew Morton, Uladzislau Rezki, linux-mm, linux-kernel,
Danilo Krummrich
On 07/04/26 13:36, Alice Ryhl wrote:
> Yes, I mean that you should change nr_pages first, before calling
> vunmap_range().
>
> Alice
Addressed this concern in v10.