Date: Wed, 1 Apr 2026 21:19:21 +0000
In-Reply-To: <20260401-vmalloc-shrink-v9-3-bf58dfb997d8@zohomail.in>
Mime-Version: 1.0
References: <20260401-vmalloc-shrink-v9-0-bf58dfb997d8@zohomail.in> <20260401-vmalloc-shrink-v9-3-bf58dfb997d8@zohomail.in>
Message-ID: 
Subject: Re: [PATCH v9 3/4] mm/vmalloc: free unused pages on vrealloc() shrink
From: Alice Ryhl 
To: shivamkalra98@zohomail.in
Cc: Andrew Morton , Uladzislau Rezki , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Danilo Krummrich 
Content-Type: text/plain; charset="utf-8"

On Wed, Apr 01, 2026 at 10:46:35PM +0530, Shivam Kalra via B4 Relay wrote:
> From: Shivam Kalra 
> 
> When vrealloc() shrinks an allocation and the new size crosses a page
> boundary, unmap and free the tail pages that are no longer needed. This
> reclaims physical memory that was previously wasted for the lifetime
> of the allocation.
> 
> The heuristic is simple: always free when at least one full page becomes
> unused. Huge page allocations (page_order > 0) are skipped, as partial
> freeing would require splitting. Allocations with VM_FLUSH_RESET_PERMS
> are also skipped, as their direct-map permissions must be reset before
> pages are returned to the page allocator, which is handled by
> vm_reset_perms() during vfree().
> 
> Additionally, allocations with VM_USERMAP are skipped because
> remap_vmalloc_range_partial() validates mapping requests against the
> unchanged vm->size; freeing tail pages would cause vmalloc_to_page()
> to return NULL for the unmapped range.
> 
> To protect concurrent readers, the shrink path takes the node lock
> before freeing the pages.
> 
> Finally, we notify kmemleak of the reduced allocation size using
> kmemleak_free_part() to prevent the kmemleak scanner from faulting on
> the newly unmapped virtual addresses.
> 
> The virtual address reservation (vm->size / vmap_area) is intentionally
> kept unchanged, preserving the address for potential future grow-in-place
> support.
> 
> Suggested-by: Danilo Krummrich 
> Signed-off-by: Shivam Kalra 
> ---
>  mm/vmalloc.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 52 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 1c6d747220ce..a7731e54560b 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -4359,14 +4359,62 @@ void *vrealloc_node_align_noprof(const void *p, size_t size, unsigned long align
>  		goto need_realloc;
>  	}
>  
> -	/*
> -	 * TODO: Shrink the vm_area, i.e. unmap and free unused pages. What
> -	 * would be a good heuristic for when to shrink the vm_area?
> -	 */
>  	if (size <= old_size) {
> +		unsigned int new_nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
> +
>  		/* Zero out "freed" memory, potentially for future realloc. */
>  		if (want_init_on_free() || want_init_on_alloc(flags))
>  			memset((void *)p + size, 0, old_size - size);
> +
> +		/*
> +		 * Free tail pages when shrink crosses a page boundary.
> +		 *
> +		 * Skip huge page allocations (page_order > 0) as partial
> +		 * freeing would require splitting.
> +		 *
> +		 * Skip VM_FLUSH_RESET_PERMS, as direct-map permissions must
> +		 * be reset before pages are returned to the allocator.
> +		 *
> +		 * Skip VM_USERMAP, as remap_vmalloc_range_partial() validates
> +		 * mapping requests against the unchanged vm->size; freeing
> +		 * tail pages would cause vmalloc_to_page() to return NULL for
> +		 * the unmapped range.
> +		 *
> +		 * Skip if either GFP_NOFS or GFP_NOIO are used.
> +		 * kmemleak_free_part() internally allocates with
> +		 * GFP_KERNEL, which could trigger a recursive deadlock
> +		 * if we are under filesystem or I/O reclaim.
> +		 */
> +		if (new_nr_pages < vm->nr_pages && !vm_area_page_order(vm) &&
> +		    !(vm->flags & (VM_FLUSH_RESET_PERMS | VM_USERMAP)) &&
> +		    gfp_has_io_fs(flags)) {
> +			unsigned long addr = (unsigned long)kasan_reset_tag(p);
> +			unsigned int old_nr_pages = vm->nr_pages;
> +
> +			/*
> +			 * Notify kmemleak of the reduced allocation size
> +			 * before unmapping.
> +			 */
> +			kmemleak_free_part(
> +				(void *)addr + ((unsigned long)new_nr_pages
> +						<< PAGE_SHIFT),
> +				(unsigned long)(old_nr_pages - new_nr_pages)
> +					<< PAGE_SHIFT);
> +
> +			vunmap_range(addr + ((unsigned long)new_nr_pages
> +					     << PAGE_SHIFT),
> +				     addr + ((unsigned long)old_nr_pages
> +					     << PAGE_SHIFT));
> +
> +			/*
> +			 * Use the node lock to synchronize with concurrent
> +			 * readers (vmalloc_info_show).
> +			 */
> +			struct vmap_node *vn = addr_to_node(addr);
> +
> +			spin_lock(&vn->busy.lock);
> +			vm->nr_pages = new_nr_pages;
> +			spin_unlock(&vn->busy.lock);

Should we set nr_pages first? Right now, another thread may observe the
range being unmapped but still see the old nr_pages value.

> +			vm_area_free_pages(vm, new_nr_pages, old_nr_pages);
> +		}
>  		vm->requested_size = size;
>  		kasan_vrealloc(p, old_size, size);
>  		return (void *)p;
> 
> -- 
> 2.43.0
> 
> 