Message-ID: <0a8ba718-7ff2-4876-9154-9dd5d404df3e@arm.com>
Date: Wed, 22 Apr 2026 14:42:51 +0100
X-Mailing-List: bpf@vger.kernel.org
Subject: Re: [PATCH v6 0/3] mm: Free contiguous order-0 pages efficiently
To: Muhammad Usama Anjum, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
 "Liam R . Howlett", Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
 Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan, Uladzislau Rezki,
 Nick Terrell, David Sterba, Vishal Moola, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, bpf@vger.kernel.org, david.hildenbrand@arm.com
References: <20260401101634.2868165-1-usama.anjum@arm.com>
From: Ryan Roberts
In-Reply-To: <20260401101634.2868165-1-usama.anjum@arm.com>

Hi Andrew,

Small nudge on this series; I believe it has all appropriate A-b/R-b tags and
no outstanding actions. Is it ready to go into mm-new?

Thanks,
Ryan

On 01/04/2026 11:16, Muhammad Usama Anjum wrote:
> Hi All,
>
> A recent change to vmalloc caused some performance benchmark regressions (see
> [1]). I'm attempting to fix that (and at the same time significantly improve
> beyond the baseline) by freeing a contiguous set of order-0 pages as a batch.
>
> At the same time I observed that free_contig_range() was essentially doing the
> same thing as vfree() so I've fixed it there too. While at it, optimize the
> __free_contig_frozen_range() as well.
>
> Check that the contiguous range falls in the same section.
> If sections aren't enabled, the if conditions get optimized out by the
> compiler as memdesc_section() returns 0. See num_pages_contiguous() for more
> details about it.
>
> [1] https://lore.kernel.org/all/66919a28-bc81-49c9-b68f-dd7c73395a0d@arm.com
>
> v6.18 - Before the patch causing regression was added
> mm-new - current latest code
> this series - v2 series of these patches
>
> (>0 is faster, <0 is slower, (R)/(I) = statistically significant
> Regression/Improvement)
>
> v6.18 vs mm-new
> +-----------------+----------------------------------------------------------+-------------------+-------------+
> | Benchmark       | Result Class                                             | v6.18 (base)      | mm-new      |
> +=================+==========================================================+===================+=============+
> | micromm/vmalloc | fix_align_alloc_test: p:1, h:0, l:500000 (usec)          |         653643.33 | (R) -50.92% |
> |                 | fix_size_alloc_test: p:1, h:0, l:500000 (usec)           |         366167.33 | (R) -11.96% |
> |                 | fix_size_alloc_test: p:4, h:0, l:500000 (usec)           |         489484.00 | (R) -35.21% |
> |                 | fix_size_alloc_test: p:16, h:0, l:500000 (usec)          |        1011250.33 | (R) -36.45% |
> |                 | fix_size_alloc_test: p:16, h:1, l:500000 (usec)          |        1086812.33 | (R) -31.83% |
> |                 | fix_size_alloc_test: p:64, h:0, l:100000 (usec)          |         657940.00 | (R) -38.62% |
> |                 | fix_size_alloc_test: p:64, h:1, l:100000 (usec)          |         765422.00 | (R) -24.84% |
> |                 | fix_size_alloc_test: p:256, h:0, l:100000 (usec)         |        2468585.00 | (R) -37.83% |
> |                 | fix_size_alloc_test: p:256, h:1, l:100000 (usec)         |        2815758.33 | (R) -26.32% |
> |                 | fix_size_alloc_test: p:512, h:0, l:100000 (usec)         |        4851969.00 | (R) -37.76% |
> |                 | fix_size_alloc_test: p:512, h:1, l:100000 (usec)         |        4496257.33 | (R) -31.15% |
> |                 | full_fit_alloc_test: p:1, h:0, l:500000 (usec)           |         570605.00 |      -8.97% |
> |                 | kvfree_rcu_1_arg_vmalloc_test: p:1, h:0, l:500000 (usec) |         500866.00 |      -5.88% |
> |                 | kvfree_rcu_2_arg_vmalloc_test: p:1, h:0, l:500000 (usec) |         499733.00 |      -6.95% |
> |                 | long_busy_list_alloc_test: p:1, h:0, l:500000 (usec)     |        5266237.67 | (R) -40.19% |
> |                 | pcpu_alloc_test: p:1, h:0, l:500000 (usec)               |         490284.00 |      -2.10% |
> |                 | random_size_align_alloc_test: p:1, h:0, l:500000 (usec)  |         850986.33 | (R) -48.03% |
> |                 | random_size_alloc_test: p:1, h:0, l:500000 (usec)        |        2712106.00 | (R) -40.48% |
> |                 | vm_map_ram_test: p:1, h:0, l:500000 (usec)               |         111151.33 |       3.52% |
> +-----------------+----------------------------------------------------------+-------------------+-------------+
>
> v6.18 vs mm-new with patches
> +-----------------+----------------------------------------------------------+-------------------+--------------+
> | Benchmark       | Result Class                                             | v6.18 (base)      | this series  |
> +=================+==========================================================+===================+==============+
> | micromm/vmalloc | fix_align_alloc_test: p:1, h:0, l:500000 (usec)          |         653643.33 |      -14.02% |
> |                 | fix_size_alloc_test: p:1, h:0, l:500000 (usec)           |         366167.33 |       -7.23% |
> |                 | fix_size_alloc_test: p:4, h:0, l:500000 (usec)           |         489484.00 |       -1.57% |
> |                 | fix_size_alloc_test: p:16, h:0, l:500000 (usec)          |        1011250.33 |        1.57% |
> |                 | fix_size_alloc_test: p:16, h:1, l:500000 (usec)          |        1086812.33 |   (I) 15.75% |
> |                 | fix_size_alloc_test: p:64, h:0, l:100000 (usec)          |         657940.00 |    (I) 9.05% |
> |                 | fix_size_alloc_test: p:64, h:1, l:100000 (usec)          |         765422.00 |   (I) 38.45% |
> |                 | fix_size_alloc_test: p:256, h:0, l:100000 (usec)         |        2468585.00 |   (I) 12.56% |
> |                 | fix_size_alloc_test: p:256, h:1, l:100000 (usec)         |        2815758.33 |   (I) 38.61% |
> |                 | fix_size_alloc_test: p:512, h:0, l:100000 (usec)         |        4851969.00 |   (I) 13.43% |
> |                 | fix_size_alloc_test: p:512, h:1, l:100000 (usec)         |        4496257.33 |   (I) 49.21% |
> |                 | full_fit_alloc_test: p:1, h:0, l:500000 (usec)           |         570605.00 |       -8.47% |
> |                 | kvfree_rcu_1_arg_vmalloc_test: p:1, h:0, l:500000 (usec) |         500866.00 |       -8.17% |
> |                 | kvfree_rcu_2_arg_vmalloc_test: p:1, h:0, l:500000 (usec) |         499733.00 |       -5.54% |
> |                 | long_busy_list_alloc_test: p:1, h:0, l:500000 (usec)     |        5266237.67 |    (I) 4.63% |
> |                 | pcpu_alloc_test: p:1, h:0, l:500000 (usec)               |         490284.00 |        1.53% |
> |                 | random_size_align_alloc_test: p:1, h:0, l:500000 (usec)  |         850986.33 |       -0.00% |
> |                 | random_size_alloc_test: p:1, h:0, l:500000 (usec)        |        2712106.00 |        1.22% |
> |                 | vm_map_ram_test: p:1, h:0, l:500000 (usec)               |         111151.33 |    (I) 4.98% |
> +-----------------+----------------------------------------------------------+-------------------+--------------+
>
> mm-new vs vmalloc_2 results are in patch 2/3.
>
> So on average this series mitigates the regression: relative to v6.18, the
> results range from -14% to 49%.
>
> Thanks,
> Muhammad Usama Anjum
>
> ---
> Changes since v5:
> - Patch 1: Move page_to_pfn() outside the loop in free_prepared_contig_range()
> - Patch 2: Change subject of the patch
>
> Changes since v4: (summary)
> - Patch 1: move can_free initialization inside the loop
> - Patch 1: Use pfn_to_page() for each pfn instead of page++
> - Patch 2: Use num_pages_contiguous() instead of raw loop
>
> Changes since v3: (summary)
> - Introduce __free_contig_range_common() in first patch and use it in
>   3rd patch as well
> - Cosmetic changes related to comments and kerneldoc
>
> Changes since v2: (summary)
> - Patch 1 and 3: Rework the loop to check for memory sections
> - Patch 2: Rework by removing the BUG on and add helper free_pages_bulk()
>
> Changes since v1:
> - Update description
> - Rebase on mm-new and rerun benchmarks/tests
> - Patch 1: move FPI_PREPARED check and add todo
> - Patch 2: Rework catering newer changes in vfree()
> - New Patch 3: optimizes __free_contig_frozen_range()
>
> Muhammad Usama Anjum (1):
>   mm/page_alloc: Optimize __free_contig_frozen_range()
>
> Ryan Roberts (2):
>   mm/page_alloc: Optimize free_contig_range()
>   vmalloc: Optimize vfree with free_pages_bulk()
>
>  include/linux/gfp.h |   4 ++
>  mm/page_alloc.c     | 143 ++++++++++++++++++++++++++++++++++++++++++--
>  mm/vmalloc.c        |  16 ++---
>  3 files changed, 146 insertions(+), 17 deletions(-)
>
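
For anyone skimming the thread, the batching idea in the quoted cover letter
(free a contiguous run of order-0 pages in one go, and stop a run at a section
boundary) can be illustrated with a small userspace sketch. This is not the
code from the series: the sketch_* helpers and SKETCH_PAGES_PER_SECTION are
hypothetical names and values chosen only for illustration, and plain pfn
arithmetic stands in for the kernel's struct page and section lookups.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical value for illustration only; the real number of pages per
 * section depends on the architecture and kernel configuration. */
#define SKETCH_PAGES_PER_SECTION 32768UL

/* Hypothetical stand-in for a section lookup: section index of a pfn. */
static unsigned long sketch_pfn_section(unsigned long pfn)
{
	return pfn / SKETCH_PAGES_PER_SECTION;
}

/*
 * Length of the run starting at pfns[0] in which every pfn directly follows
 * the previous one and stays in the same section as pfns[0].
 * Returns at least 1 for n >= 1.
 */
static size_t sketch_contig_run(const unsigned long *pfns, size_t n)
{
	size_t i;

	assert(pfns && n > 0);
	for (i = 1; i < n; i++) {
		if (pfns[i] != pfns[i - 1] + 1)
			break;	/* not physically contiguous */
		if (sketch_pfn_section(pfns[i]) != sketch_pfn_section(pfns[0]))
			break;	/* run would cross a section boundary */
	}
	return i;
}

/*
 * Walk the array run by run and count how many batched free operations
 * would be issued, instead of one operation per page.
 */
static size_t sketch_count_batches(const unsigned long *pfns, size_t n)
{
	size_t batches = 0;

	while (n > 0) {
		size_t run = sketch_contig_run(pfns, n);

		/* A real implementation would free pfns[0..run) as one batch here. */
		pfns += run;
		n -= run;
		batches++;
	}
	return batches;
}
```

With the pfn array {100, 101, 102, 200, 201, 32767, 32768}, the sketch issues
four batches rather than seven per-page frees: the last two pfns are adjacent
but fall into different (hypothetical) sections, so they are not merged.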