Date: Wed, 1 Apr 2026 21:19:21 +0000
In-Reply-To: <20260401-vmalloc-shrink-v9-3-bf58dfb997d8@zohomail.in>
Mime-Version: 1.0
References: <20260401-vmalloc-shrink-v9-0-bf58dfb997d8@zohomail.in> <20260401-vmalloc-shrink-v9-3-bf58dfb997d8@zohomail.in>
Message-ID: 
Subject: Re: [PATCH v9 3/4] mm/vmalloc: free unused pages on vrealloc() shrink
From: Alice Ryhl 
To: shivamkalra98@zohomail.in
Cc: Andrew Morton , Uladzislau Rezki , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Danilo Krummrich 
Content-Type: text/plain; charset="utf-8"

On Wed, Apr 01, 2026 at 10:46:35PM +0530, Shivam Kalra via B4 Relay wrote:
> From: Shivam Kalra 
> 
> When vrealloc() shrinks an allocation and the new size crosses a page
> boundary, unmap and free the tail pages that are no longer needed. This
> reclaims physical memory that was previously wasted for the lifetime
> of the allocation.
> 
> The heuristic is simple: always free when at least one full page becomes
> unused. Huge page allocations (page_order > 0) are skipped, as partial
> freeing would require splitting. Allocations with VM_FLUSH_RESET_PERMS
> are also skipped, as their direct-map permissions must be reset before
> pages are returned to the page allocator, which is handled by
> vm_reset_perms() during vfree().
> 
> Additionally, allocations with VM_USERMAP are skipped because
> remap_vmalloc_range_partial() validates mapping requests against the
> unchanged vm->size; freeing tail pages would cause vmalloc_to_page()
> to return NULL for the unmapped range.
> 
> To protect concurrent readers, the shrink path takes the node lock
> before freeing the pages.
> 
> Finally, we notify kmemleak of the reduced allocation size using
> kmemleak_free_part() to prevent the kmemleak scanner from faulting on
> the newly unmapped virtual addresses.
> 
> The virtual address reservation (vm->size / vmap_area) is intentionally
> kept unchanged, preserving the address for potential future grow-in-place
> support.
> 
> Suggested-by: Danilo Krummrich 
> Signed-off-by: Shivam Kalra 
> ---
>  mm/vmalloc.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 52 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 1c6d747220ce..a7731e54560b 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -4359,14 +4359,62 @@ void *vrealloc_node_align_noprof(const void *p, size_t size, unsigned long align
>  		goto need_realloc;
>  	}
>  
> -	/*
> -	 * TODO: Shrink the vm_area, i.e. unmap and free unused pages. What
> -	 * would be a good heuristic for when to shrink the vm_area?
> -	 */
>  	if (size <= old_size) {
> +		unsigned int new_nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
> +
>  		/* Zero out "freed" memory, potentially for future realloc. */
>  		if (want_init_on_free() || want_init_on_alloc(flags))
>  			memset((void *)p + size, 0, old_size - size);
> +
> +		/*
> +		 * Free tail pages when shrink crosses a page boundary.
> +		 *
> +		 * Skip huge page allocations (page_order > 0) as partial
> +		 * freeing would require splitting.
> +		 *
> +		 * Skip VM_FLUSH_RESET_PERMS, as direct-map permissions must
> +		 * be reset before pages are returned to the allocator.
> +		 *
> +		 * Skip VM_USERMAP, as remap_vmalloc_range_partial() validates
> +		 * mapping requests against the unchanged vm->size; freeing
> +		 * tail pages would cause vmalloc_to_page() to return NULL for
> +		 * the unmapped range.
> +		 *
> +		 * Skip if either GFP_NOFS or GFP_NOIO are used.
> +		 * kmemleak_free_part() internally allocates with
> +		 * GFP_KERNEL, which could trigger a recursive deadlock
> +		 * if we are under filesystem or I/O reclaim.
> +		 */
> +		if (new_nr_pages < vm->nr_pages && !vm_area_page_order(vm) &&
> +		    !(vm->flags & (VM_FLUSH_RESET_PERMS | VM_USERMAP)) &&
> +		    gfp_has_io_fs(flags)) {
> +			unsigned long addr = (unsigned long)kasan_reset_tag(p);
> +			unsigned int old_nr_pages = vm->nr_pages;
> +
> +			/*
> +			 * Notify kmemleak of the reduced allocation size
> +			 * before unmapping.
> +			 */
> +			kmemleak_free_part(
> +				(void *)addr + ((unsigned long)new_nr_pages
> +						<< PAGE_SHIFT),
> +				(unsigned long)(old_nr_pages - new_nr_pages)
> +					<< PAGE_SHIFT);
> +
> +			vunmap_range(addr + ((unsigned long)new_nr_pages
> +					     << PAGE_SHIFT),
> +				     addr + ((unsigned long)old_nr_pages
> +					     << PAGE_SHIFT));
> +
> +			/*
> +			 * Use the node lock to synchronize with concurrent
> +			 * readers (vmalloc_info_show).
> +			 */
> +			struct vmap_node *vn = addr_to_node(addr);
> +
> +			spin_lock(&vn->busy.lock);
> +			vm->nr_pages = new_nr_pages;
> +			spin_unlock(&vn->busy.lock);

Should we set nr_pages first? Right now, another thread may observe the
range being unmapped but still see the old nr_pages value.

> +			vm_area_free_pages(vm, new_nr_pages, old_nr_pages);
> +		}
>  		vm->requested_size = size;
>  		kasan_vrealloc(p, old_size, size);
>  		return (void *)p;
> 
> -- 
> 2.43.0
> 
> 