Re: [PATCH 2/2] mm/vmalloc: Add attempt_larger_order_alloc parameter


From: Uladzislau Rezki <urezki@gmail.com>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Uladzislau Rezki <urezki@gmail.com>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Vishal Moola <vishal.moola@gmail.com>,
	Dev Jain <dev.jain@arm.com>, Baoquan He <bhe@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/2] mm/vmalloc: Add attempt_larger_order_alloc parameter
Date: Thu, 18 Dec 2025 12:33:55 +0100
Message-ID: <aUPmo686XKsD1uQY@milan>
In-Reply-To: <37efa0a9-99bc-4099-ba64-2474f3f09aa2@arm.com>

On Thu, Dec 18, 2025 at 11:12:15AM +0000, Ryan Roberts wrote:
> On 17/12/2025 19:22, Uladzislau Rezki wrote:
> > On Wed, Dec 17, 2025 at 05:01:19PM +0000, Ryan Roberts wrote:
> >> On 17/12/2025 15:20, Ryan Roberts wrote:
> >>> On 17/12/2025 12:02, Uladzislau Rezki wrote:
> >>>>> On 16/12/2025 21:19, Uladzislau Rezki (Sony) wrote:
> >>>>>> Introduce a module parameter to enable or disable the large-order
> >>>>>> allocation path in vmalloc. High-order allocations are disabled by
> >>>>>> default so far, but users may explicitly enable them at runtime if
> >>>>>> desired.
> >>>>>>
> >>>>>> High-order pages allocated for vmalloc are immediately split into
> >>>>>> order-0 pages and later freed as order-0, which means they do not
> >>>>>> feed the per-CPU page caches. As a result, high-order attempts tend
> > >>>>>> to bypass the PCP fastpath and fall back to the buddy allocator, which
> > >>>>>> can affect performance.
> >>>>>>
> >>>>>> However, when the PCP caches are empty, high-order allocations may
> > >>>>>> show better performance characteristics, especially for larger
> >>>>>> allocation requests.
> >>>>>
> >>>>> I wonder if a better solution would be "allocate order-0 if available in pcp,
> >>>>> else try large order, else fallback to order-0" Could that provide the best of
> >>>>> all worlds without needing a configuration knob?
> >>>>>
> > >>>> I am not sure; to me it looks a bit odd.
> >>>
> >>> Perhaps it would feel better if it was generalized to "first try allocation from
> >>> PCP list, highest to lowest order, then try allocation from the buddy, highest
> >>> to lowest order"?
> >>>
> > >>>> Ideally it would be
> > >>>> good to just free it as a high-order page and not as order-0 pieces.
> >>>
> >>> Yeah perhaps that's better. How about something like this (very lightly tested
> >>> and no performance results yet):
> >>>
> >>> (And I should admit I'm not 100% sure it is safe to call free_frozen_pages()
> >>> with a contiguous run of order-0 pages, but I'm not seeing any warnings or
> >>> memory leaks when running mm selftests...)
> >>>
> >>> ---8<---
> >>> commit caa3e5eb5bfade81a32fa62d1a8924df1eb0f619
> >>> Author: Ryan Roberts <ryan.roberts@arm.com>
> >>> Date:   Wed Dec 17 15:11:08 2025 +0000
> >>>
> >>>     WIP
> >>>
> >>>     Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> >>>
> >>> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> >>> index b155929af5b1..d25f5b867e6b 100644
> >>> --- a/include/linux/gfp.h
> >>> +++ b/include/linux/gfp.h
> >>> @@ -383,6 +383,8 @@ extern void __free_pages(struct page *page, unsigned int order);
> >>>  extern void free_pages_nolock(struct page *page, unsigned int order);
> >>>  extern void free_pages(unsigned long addr, unsigned int order);
> >>>
> >>> +void free_pages_bulk(struct page *page, int nr_pages);
> >>> +
> >>>  #define __free_page(page) __free_pages((page), 0)
> >>>  #define free_page(addr) free_pages((addr), 0)
> >>>
> >>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >>> index 822e05f1a964..5f11224cf353 100644
> >>> --- a/mm/page_alloc.c
> >>> +++ b/mm/page_alloc.c
> >>> @@ -5304,6 +5304,48 @@ static void ___free_pages(struct page *page, unsigned int
> >>> order,
> >>>  	}
> >>>  }
> >>>
> >>> +static void free_frozen_pages_bulk(struct page *page, int nr_pages)
> >>> +{
> >>> +	while (nr_pages) {
> >>> +		unsigned int fit_order, align_order, order;
> >>> +		unsigned long pfn;
> >>> +
> >>> +		pfn = page_to_pfn(page);
> >>> +		fit_order = ilog2(nr_pages);
> >>> +		align_order = pfn ? __ffs(pfn) : fit_order;
> >>> +		order = min3(fit_order, align_order, MAX_PAGE_ORDER);
> >>> +
> >>> +		free_frozen_pages(page, order);
> >>> +
> >>> +		page += 1U << order;
> >>> +		nr_pages -= 1U << order;
> >>> +	}
> >>> +}
> >>> +
> >>> +void free_pages_bulk(struct page *page, int nr_pages)
> >>> +{
> >>> +	struct page *start = NULL;
> >>> +	bool can_free;
> >>> +	int i;
> >>> +
> >>> +	for (i = 0; i < nr_pages; i++, page++) {
> >>> +		VM_BUG_ON_PAGE(PageHead(page), page);
> >>> +		VM_BUG_ON_PAGE(PageTail(page), page);
> >>> +
> >>> +		can_free = put_page_testzero(page);
> >>> +
> >>> +		if (!can_free && start) {
> >>> +			free_frozen_pages_bulk(start, page - start);
> >>> +			start = NULL;
> >>> +		} else if (can_free && !start) {
> >>> +			start = page;
> >>> +		}
> >>> +	}
> >>> +
> >>> +	if (start)
> >>> +		free_frozen_pages_bulk(start, page - start);
> >>> +}
> >>> +
> >>>  /**
> >>>   * __free_pages - Free pages allocated with alloc_pages().
> >>>   * @page: The page pointer returned from alloc_pages().
> >>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> >>> index ecbac900c35f..8f782bac1ece 100644
> >>> --- a/mm/vmalloc.c
> >>> +++ b/mm/vmalloc.c
> >>> @@ -3429,7 +3429,8 @@ void vfree_atomic(const void *addr)
> >>>  void vfree(const void *addr)
> >>>  {
> >>>  	struct vm_struct *vm;
> >>> -	int i;
> >>> +	struct page *start;
> >>> +	int i, nr;
> >>>
> >>>  	if (unlikely(in_interrupt())) {
> >>>  		vfree_atomic(addr);
> >>> @@ -3455,17 +3456,26 @@ void vfree(const void *addr)
> >>>  	/* All pages of vm should be charged to same memcg, so use first one. */
> >>>  	if (vm->nr_pages && !(vm->flags & VM_MAP_PUT_PAGES))
> >>>  		mod_memcg_page_state(vm->pages[0], MEMCG_VMALLOC, -vm->nr_pages);
> >>> -	for (i = 0; i < vm->nr_pages; i++) {
> >>> +
> >>> +	start = vm->pages[0];
> >>> +	BUG_ON(!start);
> >>> +	nr = 1;
> >>> +	for (i = 1; i < vm->nr_pages; i++) {
> >>>  		struct page *page = vm->pages[i];
> >>>
> >>>  		BUG_ON(!page);
> >>> -		/*
> >>> -		 * High-order allocs for huge vmallocs are split, so
> >>> -		 * can be freed as an array of order-0 allocations
> >>> -		 */
> >>> -		__free_page(page);
> >>> -		cond_resched();
> >>> +
> >>> +		if (start + nr != page) {
> >>> +			free_pages_bulk(start, nr);
> >>> +			start = page;
> >>> +			nr = 1;
> >>> +			cond_resched();
> >>> +		} else {
> >>> +			nr++;
> >>> +		}
> >>>  	}
> >>> +	free_pages_bulk(start, nr);
> >>> +
> >>>  	if (!(vm->flags & VM_MAP_PUT_PAGES))
> >>>  		atomic_long_sub(vm->nr_pages, &nr_vmalloc_pages);
> >>>  	kvfree(vm->pages);
> >>> ---8<---
> >>
> >> I tested this on a performance monitoring system and see a huge improvement for 
> >> the test_vmalloc tests.
> >>
> >> Both columns are compared to v6.18. 6-19-0-rc1 has Vishal's change to allocate 
> >> large orders, for which I previously reported regressions. vfree-high-order
> >> adds the above patch to free contiguous order-0 pages in bulk.
> >>
> >> (R)/(I) means statistically significant regression/improvement. Results are 
> >> normalized so that less than zero is regression and greater than zero is 
> >> improvement.
> >>
> >> +-----------------+----------------------------------------------------------+--------------+------------------+
> >> | Benchmark       | Result Class                                             |   6-19-0-rc1 | vfree-high-order |
> >> +=================+==========================================================+==============+==================+
> >> | micromm/vmalloc | fix_align_alloc_test: p:1, h:0, l:500000 (usec)          |  (R) -40.69% |        (I) 3.98% |
> >> |                 | fix_size_alloc_test: p:1, h:0, l:500000 (usec)           |        0.10% |           -1.47% |
> >> |                 | fix_size_alloc_test: p:4, h:0, l:500000 (usec)           |  (R) -22.74% |       (I) 11.57% |
> >> |                 | fix_size_alloc_test: p:16, h:0, l:500000 (usec)          |  (R) -23.63% |       (I) 47.42% |
> >> |                 | fix_size_alloc_test: p:16, h:1, l:500000 (usec)          |       -1.58% |      (I) 106.01% |
> >> |                 | fix_size_alloc_test: p:64, h:0, l:100000 (usec)          |  (R) -24.39% |       (I) 99.12% |
> >> |                 | fix_size_alloc_test: p:64, h:1, l:100000 (usec)          |    (I) 2.34% |      (I) 196.87% |
> >> |                 | fix_size_alloc_test: p:256, h:0, l:100000 (usec)         |  (R) -23.29% |      (I) 125.42% |
> >> |                 | fix_size_alloc_test: p:256, h:1, l:100000 (usec)         |    (I) 3.74% |      (I) 238.59% |
> >> |                 | fix_size_alloc_test: p:512, h:0, l:100000 (usec)         |  (R) -23.80% |      (I) 132.38% |
> >> |                 | fix_size_alloc_test: p:512, h:1, l:100000 (usec)         |   (R) -2.84% |      (I) 514.75% |
> >> |                 | full_fit_alloc_test: p:1, h:0, l:500000 (usec)           |        2.74% |            0.33% |
> >> |                 | kvfree_rcu_1_arg_vmalloc_test: p:1, h:0, l:500000 (usec) |        0.58% |            1.36% |
> >> |                 | kvfree_rcu_2_arg_vmalloc_test: p:1, h:0, l:500000 (usec) |       -0.66% |            1.48% |
> >> |                 | long_busy_list_alloc_test: p:1, h:0, l:500000 (usec)     |  (R) -25.24% |       (I) 77.95% |
> >> |                 | pcpu_alloc_test: p:1, h:0, l:500000 (usec)               |       -0.58% |            0.60% |
> >> |                 | random_size_align_alloc_test: p:1, h:0, l:500000 (usec)  |  (R) -45.75% |        (I) 8.51% |
> >> |                 | random_size_alloc_test: p:1, h:0, l:500000 (usec)        |  (R) -28.16% |       (I) 65.34% |
> >> |                 | vm_map_ram_test: p:1, h:0, l:500000 (usec)               |       -0.54% |           -0.33% |
> >> +-----------------+----------------------------------------------------------+--------------+------------------+
> >>
> >> What do you think?
> >>
> > You were first :)
> > 
> > Some figures from me:
> > 
> > # Default(3 pages)
> 
> What is Default? I'm guessing it's the state prior to Vishal's patch?
> 
Right.

> > fix_size_alloc_test passed: 1 failed: 0 xfailed: 0 repeat: 1 loops: 1000000 avg: 541868 usec
> > fix_size_alloc_test passed: 1 failed: 0 xfailed: 0 repeat: 1 loops: 1000000 avg: 542515 usec
> > fix_size_alloc_test passed: 1 failed: 0 xfailed: 0 repeat: 1 loops: 1000000 avg: 541561 usec
> > fix_size_alloc_test passed: 1 failed: 0 xfailed: 0 repeat: 1 loops: 1000000 avg: 542951 usec
> > 
> > # Patch(3 pages)
> 
> What is Patch? I'm guessing state after applying both Vishal's and my patches?
> 
Right.

--
Uladzislau Rezki
