From: Muhammad Usama Anjum <usama.anjum@arm.com>
To: "David Hildenbrand (Arm)" <david@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Lorenzo Stoakes <ljs@kernel.org>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	Brendan Jackman <jackmanb@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>, Zi Yan <ziy@nvidia.com>,
	Uladzislau Rezki <urezki@gmail.com>,
	Nick Terrell <terrelln@fb.com>, David Sterba <dsterba@suse.com>,
	Vishal Moola <vishal.moola@gmail.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	bpf@vger.kernel.org, Ryan.Roberts@arm.com,
	david.hildenbrand@arm.com
Cc: usama.anjum@arm.com
Subject: Re: [PATCH v4 2/3] vmalloc: Optimize vfree
Date: Mon, 30 Mar 2026 17:15:27 +0100
Message-ID: <7eb26b81-ec64-470c-9fd2-52f9b9692b48@arm.com>
In-Reply-To: <82e4055c-2343-49ee-b772-d3f4d134f8d0@kernel.org>

On 30/03/2026 3:38 pm, David Hildenbrand (Arm) wrote:
> On 3/27/26 13:57, Muhammad Usama Anjum wrote:
>> From: Ryan Roberts <ryan.roberts@arm.com>
>>
>> Whenever vmalloc allocates high order pages (e.g. for a huge mapping) it
>> must immediately split_page() to order-0 so that it remains compatible
>> with users that want to access the underlying struct page.
>> Commit a06157804399 ("mm/vmalloc: request large order pages from buddy
>> allocator") recently made it much more likely for vmalloc to allocate
>> high order pages which are subsequently split to order-0.
>>
>> Unfortunately this had the side effect of causing performance
>> regressions for tight vmalloc/vfree loops (e.g. test_vmalloc.ko
>> benchmarks). See Closes: tag. This happens because the high order pages
>> must be gotten from the buddy but then because they are split to
>> order-0, when they are freed they are freed to the order-0 pcp.
>> Previously allocation was for order-0 pages so they were recycled from
>> the pcp.
>>
>> It would be preferable if when vmalloc allocates an (e.g.) order-3 page
>> that it also frees that order-3 page to the order-3 pcp, then the
>> regression could be removed.
>>
>> So let's do exactly that; update stats separately first as coalescing is
>> hard to do correctly without complexity. Use free_pages_bulk() which uses
>> the new __free_contig_range() API to batch-free contiguous ranges of pfns.
>> This not only removes the regression, but significantly improves
>> performance of vfree beyond the baseline.
>>
>> A selection of test_vmalloc benchmarks running on arm64 server class
>> system. mm-new is the baseline. Commit a06157804399 ("mm/vmalloc: request
>> large order pages from buddy allocator") was added in v6.19-rc1 where we
>> see regressions. Then with this change performance is much better. (>0
>> is faster, <0 is slower, (R)/(I) = statistically significant
>> Regression/Improvement):
>>
>> +-----------------+----------------------------------------------------------+-------------------+--------------------+
>> | Benchmark       | Result Class                                             |   mm-new          |  this series       |
>> +=================+==========================================================+===================+====================+
>> | micromm/vmalloc | fix_align_alloc_test: p:1, h:0, l:500000 (usec)          |        1331843.33 |         (I) 67.17% |
>> |                 | fix_size_alloc_test: p:1, h:0, l:500000 (usec)           |         415907.33 |             -5.14% |
>> |                 | fix_size_alloc_test: p:4, h:0, l:500000 (usec)           |         755448.00 |         (I) 53.55% |
>> |                 | fix_size_alloc_test: p:16, h:0, l:500000 (usec)          |        1591331.33 |         (I) 57.26% |
>> |                 | fix_size_alloc_test: p:16, h:1, l:500000 (usec)          |        1594345.67 |         (I) 68.46% |
>> |                 | fix_size_alloc_test: p:64, h:0, l:100000 (usec)          |        1071826.00 |         (I) 79.27% |
>> |                 | fix_size_alloc_test: p:64, h:1, l:100000 (usec)          |        1018385.00 |         (I) 84.17% |
>> |                 | fix_size_alloc_test: p:256, h:0, l:100000 (usec)         |        3970899.67 |         (I) 77.01% |
>> |                 | fix_size_alloc_test: p:256, h:1, l:100000 (usec)         |        3821788.67 |         (I) 89.44% |
>> |                 | fix_size_alloc_test: p:512, h:0, l:100000 (usec)         |        7795968.00 |         (I) 82.67% |
>> |                 | fix_size_alloc_test: p:512, h:1, l:100000 (usec)         |        6530169.67 |        (I) 118.09% |
>> |                 | full_fit_alloc_test: p:1, h:0, l:500000 (usec)           |         626808.33 |             -0.98% |
>> |                 | kvfree_rcu_1_arg_vmalloc_test: p:1, h:0, l:500000 (usec) |         532145.67 |             -1.68% |
>> |                 | kvfree_rcu_2_arg_vmalloc_test: p:1, h:0, l:500000 (usec) |         537032.67 |             -0.96% |
>> |                 | long_busy_list_alloc_test: p:1, h:0, l:500000 (usec)     |        8805069.00 |         (I) 74.58% |
>> |                 | pcpu_alloc_test: p:1, h:0, l:500000 (usec)               |         500824.67 |              4.35% |
>> |                 | random_size_align_alloc_test: p:1, h:0, l:500000 (usec)  |        1637554.67 |         (I) 76.99% |
>> |                 | random_size_alloc_test: p:1, h:0, l:500000 (usec)        |        4556288.67 |         (I) 72.23% |
>> |                 | vm_map_ram_test: p:1, h:0, l:500000 (usec)               |         107371.00 |             -0.70% |
>> +-----------------+----------------------------------------------------------+-------------------+--------------------+
>>
>> Fixes: a06157804399 ("mm/vmalloc: request large order pages from buddy allocator")
>> Closes: https://lore.kernel.org/all/66919a28-bc81-49c9-b68f-dd7c73395a0d@arm.com/
>> Acked-by: Zi Yan <ziy@nvidia.com>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> Co-developed-by: Muhammad Usama Anjum <usama.anjum@arm.com>
>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com>
>> ---
>> Changes since v3:
>> - Add kerneldoc comment and update description
>> - Add tag
>>
>> Changes since v2:
>> - Remove BUG_ON in favour of simple implementation as this has never
>>   been seen to output any bug in the past as well
>> - Move the free loop to separate function, free_pages_bulk()
>> - Update stats, lruvec_stat in separate loop
>>
>> Changes since v1:
>> - Rebase on mm-new
>> - Rerun benchmarks
>> ---
>>  include/linux/gfp.h |  2 ++
>>  mm/page_alloc.c     | 38 ++++++++++++++++++++++++++++++++++++++
>>  mm/vmalloc.c        | 16 +++++-----------
>>  3 files changed, 45 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
>> index 7c1f9da7c8e56..71f9097ab99a0 100644
>> --- a/include/linux/gfp.h
>> +++ b/include/linux/gfp.h
>> @@ -239,6 +239,8 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
>>  				struct page **page_array);
>>  #define __alloc_pages_bulk(...)			alloc_hooks(alloc_pages_bulk_noprof(__VA_ARGS__))
>>  
>> +void free_pages_bulk(struct page **page_array, unsigned long nr_pages);
>> +
>>  unsigned long alloc_pages_bulk_mempolicy_noprof(gfp_t gfp,
>>  				unsigned long nr_pages,
>>  				struct page **page_array);
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 18a96b51aa0be..64be8a9019dca 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5175,6 +5175,44 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
>>  }
>>  EXPORT_SYMBOL_GPL(alloc_pages_bulk_noprof);
>>  
>> +/*
>> + * free_pages_bulk - Free an array of order-0 pages
>> + * @page_array: Array of pages to free
>> + * @nr_pages: The number of pages in the array
>> + *
>> + * Free the order-0 pages. Adjacent entries whose PFNs form a contiguous
>> + * run are released with a single __free_contig_range() call.
>> + *
>> + * This assumes page_array is sorted in ascending PFN order. Without that,
>> + * the function still frees all pages, but contiguous runs may not be
>> + * detected and the freeing pattern can degrade to freeing one page at a
>> + * time.
>> + *
>> + * Context: Sleepable process context only; calls cond_resched()
>> + */
>> +void free_pages_bulk(struct page **page_array, unsigned long nr_pages)
>> +{
>> +	unsigned long start_pfn = 0, pfn;
>> +	unsigned long i, nr_contig = 0;
>> +
>> +	for (i = 0; i < nr_pages; i++) {
>> +		pfn = page_to_pfn(page_array[i]);
>> +		if (!nr_contig) {
>> +			start_pfn = pfn;
>> +			nr_contig = 1;
>> +		} else if (start_pfn + nr_contig != pfn) {
>> +			__free_contig_range(start_pfn, nr_contig);
>> +			start_pfn = pfn;
>> +			nr_contig = 1;
>> +			cond_resched();
>> +		} else {
>> +			nr_contig++;
>> +		}
>> +	}
> 
> What happened to the idea of using num_pages_contiguous()? I think that
> should generate more efficient code (all we're doing is comparing
> pointers really on SPARSEMEM_VMEMMAP) and the end result looks more
> readable?
Sorry, I misunderstood you earlier: I was concerned that calling
num_pages_contiguous() here would duplicate the section check. I'll
update the implementation to use it.

Copying the relevant exchange from the previous thread:

>>> Could we use num_pages_contiguous() here?
>>>
>>> while (nr_pages) {
>>> 	unsigned long nr_contig = num_pages_contiguous(page_array, nr_pages);
>>>
>>> 	__free_contig_range(page_to_pfn(*page_array), nr_contig);
>>>
>>> 	nr_pages -= nr_contig;
>>> 	page_array += nr_contig;
>>> 	cond_resched();
>>> }
>>>
>>> Something like that?
>> __free_contig_range() is already checking for the sections. If
>> num_pages_contiguous() is called here, it'll cause the duplication
>> of the section check.
> No problem. For configs we care about it's optimized out entirely either
> way.
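
To illustrate the loop shape being discussed, here is a minimal userspace
sketch, not kernel code. It models pages as plain PFN values and counts how
many batched free calls the loop would issue. count_contig() and
count_free_calls() are hypothetical names invented for this illustration;
count_contig() only stands in for num_pages_contiguous() and deliberately
ignores the memory section boundaries the real helper has to respect.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for num_pages_contiguous(): return the length of
 * the leading run of consecutive PFNs in pfns[0..nr). Section boundaries
 * are ignored here (assumption made for this sketch only).
 */
static size_t count_contig(const unsigned long *pfns, size_t nr)
{
	size_t i;

	for (i = 1; i < nr; i++)
		if (pfns[i] != pfns[0] + i)
			break;

	return nr ? i : 0;
}

/*
 * Walk the array one contiguous run at a time and return the number of
 * runs found. Each run corresponds to one batched free in the loop quoted
 * above; no memory is actually freed in this model.
 */
static size_t count_free_calls(const unsigned long *pfns, size_t nr)
{
	size_t calls = 0;

	while (nr) {
		size_t run = count_contig(pfns, nr);

		/* A non-empty array always yields a run of at least one. */
		assert(run > 0 && run <= nr);

		calls++;	/* stands in for one __free_contig_range() call */
		pfns += run;
		nr -= run;
	}

	return calls;
}
```

With the PFN array { 10, 11, 12, 20, 21, 7 } the runs are {10, 11, 12},
{20, 21} and {7}, so three batched calls replace six single-page frees. An
array that is not sorted by PFN is still fully consumed, it just produces
shorter runs.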

Thanks,
Usama
