From: Alice Ryhl <aliceryhl@google.com>
To: shivamkalra98@zohomail.in
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	linux-mm@kvack.org,  linux-kernel@vger.kernel.org,
	Danilo Krummrich <dakr@kernel.org>
Subject: Re: [PATCH v9 3/4] mm/vmalloc: free unused pages on vrealloc() shrink
Date: Wed, 1 Apr 2026 21:19:21 +0000
Message-ID: <ac2L2VNExpp8t7oF@google.com>
In-Reply-To: <20260401-vmalloc-shrink-v9-3-bf58dfb997d8@zohomail.in>

On Wed, Apr 01, 2026 at 10:46:35PM +0530, Shivam Kalra via B4 Relay wrote:
> From: Shivam Kalra <shivamkalra98@zohomail.in>
> 
> When vrealloc() shrinks an allocation and the new size crosses a page
> boundary, unmap and free the tail pages that are no longer needed. This
> reclaims physical memory that was previously wasted for the lifetime
> of the allocation.
> 
> The heuristic is simple: always free when at least one full page becomes
> unused. Huge page allocations (page_order > 0) are skipped, as partial
> freeing would require splitting. Allocations with VM_FLUSH_RESET_PERMS
> are also skipped, as their direct-map permissions must be reset before
> pages are returned to the page allocator, which is handled by
> vm_reset_perms() during vfree().
> 
> Additionally, allocations with VM_USERMAP are skipped because
> remap_vmalloc_range_partial() validates mapping requests against the
> unchanged vm->size; freeing tail pages would cause vmalloc_to_page()
> to return NULL for the unmapped range.
> 
> To protect concurrent readers, the shrink path uses Node lock to
> synchronize before freeing the pages.
> 
> Finally, we notify kmemleak of the reduced allocation size using
> kmemleak_free_part() to prevent the kmemleak scanner from faulting on
> the newly unmapped virtual addresses.
> 
> The virtual address reservation (vm->size / vmap_area) is intentionally
> kept unchanged, preserving the address for potential future grow-in-place
> support.
> 
> Suggested-by: Danilo Krummrich <dakr@kernel.org>
> Signed-off-by: Shivam Kalra <shivamkalra98@zohomail.in>
> ---
>  mm/vmalloc.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 52 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 1c6d747220ce..a7731e54560b 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -4359,14 +4359,62 @@ void *vrealloc_node_align_noprof(const void *p, size_t size, unsigned long align
>  		goto need_realloc;
>  	}
>  
> -	/*
> -	 * TODO: Shrink the vm_area, i.e. unmap and free unused pages. What
> -	 * would be a good heuristic for when to shrink the vm_area?
> -	 */
>  	if (size <= old_size) {
> +		unsigned int new_nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
> +
>  		/* Zero out "freed" memory, potentially for future realloc. */
>  		if (want_init_on_free() || want_init_on_alloc(flags))
>  			memset((void *)p + size, 0, old_size - size);
> +
> +		/*
> +		 * Free tail pages when shrink crosses a page boundary.
> +		 *
> +		 * Skip huge page allocations (page_order > 0) as partial
> +		 * freeing would require splitting.
> +		 *
> +		 * Skip VM_FLUSH_RESET_PERMS, as direct-map permissions must
> +		 * be reset before pages are returned to the allocator.
> +		 *
> +		 * Skip VM_USERMAP, as remap_vmalloc_range_partial() validates
> +		 * mapping requests against the unchanged vm->size; freeing
> +		 * tail pages would cause vmalloc_to_page() to return NULL for
> +		 * the unmapped range.
> +		 *
> +		 * Skip if either GFP_NOFS or GFP_NOIO are used.
> +		 * kmemleak_free_part() internally allocates with
> +		 * GFP_KERNEL, which could trigger a recursive deadlock
> +		 * if we are under filesystem or I/O reclaim.
> +		 */
> +		if (new_nr_pages < vm->nr_pages && !vm_area_page_order(vm) &&
> +		    !(vm->flags & (VM_FLUSH_RESET_PERMS | VM_USERMAP)) &&
> +		    gfp_has_io_fs(flags)) {
> +			unsigned long addr = (unsigned long)kasan_reset_tag(p);
> +			unsigned int old_nr_pages = vm->nr_pages;
> +
> +			/* Notify kmemleak of the reduced allocation size before unmapping. */
> +			kmemleak_free_part(
> +				(void *)addr + ((unsigned long)new_nr_pages
> +						<< PAGE_SHIFT),
> +				(unsigned long)(old_nr_pages - new_nr_pages)
> +					<< PAGE_SHIFT);
> +
> +			vunmap_range(addr + ((unsigned long)new_nr_pages
> +					     << PAGE_SHIFT),
> +				     addr + ((unsigned long)old_nr_pages
> +					     << PAGE_SHIFT));
> +
> +			/*
> +			 * Use the node lock to synchronize with concurrent
> +			 * readers (vmalloc_info_show).
> +			 */
> +			struct vmap_node *vn = addr_to_node(addr);
> +
> +			spin_lock(&vn->busy.lock);
> +			vm->nr_pages = new_nr_pages;
> +			spin_unlock(&vn->busy.lock);

Should we update vm->nr_pages before calling vunmap_range()? As written,
another thread may observe the tail range already unmapped while still
seeing the old nr_pages value.
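
To make the ordering concrete, here is a minimal userspace sketch, not
kernel code. Everything in it is a hypothetical stand-in: struct area_model
models the vm_struct, the atomic_flag models vn->busy.lock, and the mapped[]
array models which pages are still mapped. It assumes a reader takes the
lock and only trusts pages below nr_pages.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_MAX_PAGES 8

/* Hypothetical stand-in for a vm_struct and the node lock guarding it. */
struct area_model {
	atomic_flag lock;              /* models vn->busy.lock */
	unsigned int nr_pages;         /* models vm->nr_pages */
	bool mapped[MODEL_MAX_PAGES];  /* models which pages are still mapped */
};

static void model_lock(atomic_flag *l)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;
}

static void model_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Shrink: publish the smaller page count first, then unmap the tail. */
static void shrink_model(struct area_model *a, unsigned int new_nr)
{
	unsigned int old_nr;

	model_lock(&a->lock);
	old_nr = a->nr_pages;
	a->nr_pages = new_nr;
	model_unlock(&a->lock);

	/* Only pages at or above the published count are touched here. */
	for (unsigned int i = new_nr; i < old_nr; i++)
		a->mapped[i] = false;
}

/* Reader: under the lock, every page below nr_pages must be mapped. */
static bool reader_sees_consistent(struct area_model *a)
{
	bool ok = true;

	model_lock(&a->lock);
	for (unsigned int i = 0; i < a->nr_pages; i++)
		ok = ok && a->mapped[i];
	model_unlock(&a->lock);

	return ok;
}
```

With this ordering, a reader that takes the lock sees either the old count
with all of those pages still mapped, or the new count, and never a count
that covers an already unmapped page. Whether the real readers rely on
nr_pages this way is something I have not checked.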

> +			vm_area_free_pages(vm, new_nr_pages, old_nr_pages);
> +		}
>  		vm->requested_size = size;
>  		kasan_vrealloc(p, old_size, size);
>  		return (void *)p;
> 
> -- 
> 2.43.0
> 
> 
