Re: [PATCH v9 3/4] mm/vmalloc: free unused pages on vrealloc() shrink

From: Alice Ryhl <aliceryhl@google.com>
To: shivamkalra98@zohomail.in
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	linux-mm@kvack.org,  linux-kernel@vger.kernel.org,
	Danilo Krummrich <dakr@kernel.org>
Subject: Re: [PATCH v9 3/4] mm/vmalloc: free unused pages on vrealloc() shrink
Date: Wed, 1 Apr 2026 21:19:21 +0000
Message-ID: <ac2L2VNExpp8t7oF@google.com>
In-Reply-To: <20260401-vmalloc-shrink-v9-3-bf58dfb997d8@zohomail.in>

On Wed, Apr 01, 2026 at 10:46:35PM +0530, Shivam Kalra via B4 Relay wrote:
> From: Shivam Kalra <shivamkalra98@zohomail.in>
> 
> When vrealloc() shrinks an allocation and the new size crosses a page
> boundary, unmap and free the tail pages that are no longer needed. This
> reclaims physical memory that was previously wasted for the lifetime
> of the allocation.
> 
> The heuristic is simple: always free when at least one full page becomes
> unused. Huge page allocations (page_order > 0) are skipped, as partial
> freeing would require splitting. Allocations with VM_FLUSH_RESET_PERMS
> are also skipped, as their direct-map permissions must be reset before
> pages are returned to the page allocator, which is handled by
> vm_reset_perms() during vfree().
> 
> Additionally, allocations with VM_USERMAP are skipped because
> remap_vmalloc_range_partial() validates mapping requests against the
> unchanged vm->size; freeing tail pages would cause vmalloc_to_page()
> to return NULL for the unmapped range.
> 
> To protect concurrent readers, the shrink path uses Node lock to
> synchronize before freeing the pages.
> 
> Finally, we notify kmemleak of the reduced allocation size using
> kmemleak_free_part() to prevent the kmemleak scanner from faulting on
> the newly unmapped virtual addresses.
> 
> The virtual address reservation (vm->size / vmap_area) is intentionally
> kept unchanged, preserving the address for potential future grow-in-place
> support.
> 
> Suggested-by: Danilo Krummrich <dakr@kernel.org>
> Signed-off-by: Shivam Kalra <shivamkalra98@zohomail.in>
> ---
>  mm/vmalloc.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 52 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 1c6d747220ce..a7731e54560b 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -4359,14 +4359,62 @@ void *vrealloc_node_align_noprof(const void *p, size_t size, unsigned long align
>  		goto need_realloc;
>  	}
>  
> -	/*
> -	 * TODO: Shrink the vm_area, i.e. unmap and free unused pages. What
> -	 * would be a good heuristic for when to shrink the vm_area?
> -	 */
>  	if (size <= old_size) {
> +		unsigned int new_nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
> +
>  		/* Zero out "freed" memory, potentially for future realloc. */
>  		if (want_init_on_free() || want_init_on_alloc(flags))
>  			memset((void *)p + size, 0, old_size - size);
> +
> +		/*
> +		 * Free tail pages when shrink crosses a page boundary.
> +		 *
> +		 * Skip huge page allocations (page_order > 0) as partial
> +		 * freeing would require splitting.
> +		 *
> +		 * Skip VM_FLUSH_RESET_PERMS, as direct-map permissions must
> +		 * be reset before pages are returned to the allocator.
> +		 *
> +		 * Skip VM_USERMAP, as remap_vmalloc_range_partial() validates
> +		 * mapping requests against the unchanged vm->size; freeing
> +		 * tail pages would cause vmalloc_to_page() to return NULL for
> +		 * the unmapped range.
> +		 *
> +		 * Skip if either GFP_NOFS or GFP_NOIO are used.
> +		 * kmemleak_free_part() internally allocates with
> +		 * GFP_KERNEL, which could trigger a recursive deadlock
> +		 * if we are under filesystem or I/O reclaim.
> +		 */
> +		if (new_nr_pages < vm->nr_pages && !vm_area_page_order(vm) &&
> +		    !(vm->flags & (VM_FLUSH_RESET_PERMS | VM_USERMAP)) &&
> +		    gfp_has_io_fs(flags)) {
> +			unsigned long addr = (unsigned long)kasan_reset_tag(p);
> +			unsigned int old_nr_pages = vm->nr_pages;
> +
> +			/* Notify kmemleak of the reduced allocation size before unmapping. */
> +			kmemleak_free_part(
> +				(void *)addr + ((unsigned long)new_nr_pages
> +						<< PAGE_SHIFT),
> +				(unsigned long)(old_nr_pages - new_nr_pages)
> +					<< PAGE_SHIFT);
> +
> +			vunmap_range(addr + ((unsigned long)new_nr_pages
> +					     << PAGE_SHIFT),
> +				     addr + ((unsigned long)old_nr_pages
> +					     << PAGE_SHIFT));
> +
> +			/*
> +			 * Use the node lock to synchronize with concurrent
> +			 * readers (vmalloc_info_show).
> +			 */
> +			struct vmap_node *vn = addr_to_node(addr);
> +
> +			spin_lock(&vn->busy.lock);
> +			vm->nr_pages = new_nr_pages;
> +			spin_unlock(&vn->busy.lock);

Should we update vm->nr_pages before calling vunmap_range()? As written,
another thread may observe the tail range already unmapped while
vm->nr_pages still holds the old value.
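
To make the window concrete, here is a rough userspace model of the two
orderings. It is not kernel code and is untested against the real paths:
struct area_model, nr_mapped, the window hook, and the helper names are
all hypothetical stand-ins, and the "reader" runs on the same thread at
the point where a concurrent one could land.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the vm_struct state involved. */
struct area_model {
	pthread_mutex_t lock;	/* stands in for vn->busy.lock */
	unsigned int nr_pages;	/* stands in for vm->nr_pages */
	unsigned int nr_mapped;	/* pages still mapped; models the page tables */
};

/* Hook run between the two steps, where a concurrent reader could land. */
typedef void (*window_fn)(struct area_model *a, void *arg);

/* A reader is safe only if it never counts more pages than are mapped. */
static bool reader_view_ok(struct area_model *a)
{
	bool ok;

	pthread_mutex_lock(&a->lock);
	ok = a->nr_pages <= a->nr_mapped;
	pthread_mutex_unlock(&a->lock);
	return ok;
}

/* Window hook that records what a reader would have seen. */
static void record_view(struct area_model *a, void *arg)
{
	*(bool *)arg = reader_view_ok(a);
}

/* Order in the patch as posted: unmap, then publish the new count. */
static void shrink_unmap_first(struct area_model *a, unsigned int new_nr,
			       window_fn window, void *arg)
{
	assert(new_nr <= a->nr_pages);
	a->nr_mapped = new_nr;		/* models vunmap_range() */
	if (window)
		window(a, arg);
	pthread_mutex_lock(&a->lock);
	a->nr_pages = new_nr;
	pthread_mutex_unlock(&a->lock);
}

/* Order being asked about: publish the new count, then unmap. */
static void shrink_publish_first(struct area_model *a, unsigned int new_nr,
				 window_fn window, void *arg)
{
	assert(new_nr <= a->nr_pages);
	pthread_mutex_lock(&a->lock);
	a->nr_pages = new_nr;
	pthread_mutex_unlock(&a->lock);
	if (window)
		window(a, arg);
	a->nr_mapped = new_nr;		/* models vunmap_range() */
}
```

Passing record_view as the hook shows the difference: with
shrink_unmap_first() the recorded view has nr_pages larger than the
mapped count, while with shrink_publish_first() it does not. Whether any
real reader actually depends on that invariant is a separate question.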

> +			vm_area_free_pages(vm, new_nr_pages, old_nr_pages);
> +		}
>  		vm->requested_size = size;
>  		kasan_vrealloc(p, old_size, size);
>  		return (void *)p;
> 
> -- 
> 2.43.0
> 
>
