Re: [RFC PATCH] mm/vmalloc: request large order pages from buddy allocator

From: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Uladzislau Rezki <urezki@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RFC PATCH] mm/vmalloc: request large order pages from buddy allocator
Date: Wed, 15 Oct 2025 02:28:49 -0700
Message-ID: <aO9pUS3zLHsap81f@fedora>
In-Reply-To: <aO8behuGn5jVo28K@casper.infradead.org>

On Wed, Oct 15, 2025 at 04:56:42AM +0100, Matthew Wilcox wrote:
> On Tue, Oct 14, 2025 at 11:27:54AM -0700, Vishal Moola (Oracle) wrote:
> > Running 1000 iterations of allocations on a small 4GB system finds:
> > 
> > 1000 2mb allocations:
> > 	[Baseline]			[This patch]
> > 	real    46.310s			real    34.380s
> > 	user    0.001s			user    0.008s
> > 	sys     46.058s			sys     34.152s
> > 
> > 10000 200kb allocations:
> > 	[Baseline]			[This patch]
> > 	real    56.104s			real    43.946s
> > 	user    0.001s			user    0.003s
> > 	sys     55.375s			sys     43.259s
> > 
> > 10000 20kb allocations:
> > 	[Baseline]			[This patch]
> > 	real    0m8.438s		real    0m9.160s
> > 	user    0m0.001s		user    0m0.002s
> > 	sys     0m7.936s		sys     0m8.671s
> 
> I'd be more confident in the 20kB numbers if you'd done 10x more
> iterations.

I actually ran my tests a number of times to mitigate the effects of a
possibly too-small sample size, so I do have that number for you too:

[Baseline]			[This patch]
real    1m28.119s		real    1m32.630s
user    0m0.012s		user    0m0.011s
sys     1m23.270s		sys     1m28.529s

> Also, I think 20kB is probably an _interesting_ number, but it's not
> going to display your change to its best advantage.  A 32kB allocation
> will look much better, for example.

I provided those particular numbers to showcase the beneficial cases as
well as the regression case.

I ended up finding that allocation sizes <= 20k had noticeable
regressions, sizes between 20k and 90k were approximately the same, and
sizes >= 90k had improvements (increasingly noticeable as the size
grows).
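
To put the wall-clock numbers in this thread on a common scale, here is a
minimal sketch that computes the relative change of each "real" time. The
helper name pct_change() and the sample table are illustrative assumptions,
not part of the patch; the timings are copied from the results quoted above,
with 1m28.119s and 1m32.630s converted to seconds.

```c
#include <assert.h>

/*
 * Relative change of `patched` versus `baseline`, in percent.
 * A negative result means the patched kernel was faster.
 * (Hypothetical helper, for illustration only.)
 */
double pct_change(double baseline, double patched)
{
	assert(baseline > 0.0);
	return (patched - baseline) / baseline * 100.0;
}

/* "real" times in seconds, as reported in this thread. */
struct sample {
	const char *label;
	double baseline;
	double patched;
};

const struct sample samples[] = {
	{ "1000 x 2MB",        46.310, 34.380 },
	{ "10000 x 200kB",     56.104, 43.946 },
	{ "10000 x 20kB",       8.438,  9.160 },
	{ "20kB, longer run",  88.119, 92.630 },
};
```

Feeding each row to pct_change() gives roughly -25.8% and -21.7% for the 2MB
and 200kB cases, and roughly +8.6% and +5.1% for the two 20kB runs.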

> Also, can you go into more detail of the test?  Based on our off-list
> conversation, we were talking about allocating something like 100MB
> of memory (in these various sizes) then freeing it, just to be sure
> that we're measuring the performance of the buddy allocator and
> not the PCP list.

Yup.

What I did to get the numbers above was: call vmalloc() n times with
that particular size, then free all of those allocations. Then, I did
1000 iterations of that to get a better average.

So none of these allocations were freed until all the allocations were
done, every single time.
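
The shape of that test can be sketched in userspace as follows. This is only
an illustration under stated assumptions: malloc()/free() stand in for
vmalloc()/vfree(), and the function names are hypothetical. It is not the
actual test module, which was not posted in this message.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * One iteration: perform all `count` allocations of `size` bytes first,
 * and only then free them. Returns 0 on success, -1 on allocation failure.
 */
int alloc_then_free_all(size_t count, size_t size)
{
	void **bufs;
	size_t i, done = 0;
	int ret = 0;

	assert(count > 0 && size > 0);

	bufs = calloc(count, sizeof(*bufs));
	if (!bufs)
		return -1;

	for (i = 0; i < count; i++) {
		bufs[i] = malloc(size);	/* stand-in for vmalloc(size) */
		if (!bufs[i]) {
			ret = -1;
			break;
		}
		done++;
	}

	/* Nothing is freed until every allocation has been attempted. */
	for (i = 0; i < done; i++)
		free(bufs[i]);		/* stand-in for vfree() */

	free(bufs);
	return ret;
}

/* Repeat the allocate-all-then-free-all cycle `iterations` times. */
int run_benchmark(unsigned int iterations, size_t count, size_t size)
{
	unsigned int it;

	for (it = 0; it < iterations; it++) {
		if (alloc_then_free_all(count, size))
			return -1;
	}
	return 0;
}
```

Under these assumptions, run_benchmark(1000, 10000, 20 * 1024) would
correspond to the "10000 20kb allocations" case, timed externally.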

> > This is an RFC, comments and thoughts are welcomed. There is a
> > clear benefit to be had for large allocations, but there is
> > some regression for smaller allocations.
> 
> Also we think that there's probably a later win to be had by
> not splitting the page we allocated.
> 
> At some point, we should also start allocating frozen pages
> for vmalloc.  That's going to be interesting for the users which
> map vmalloc pages to userspace.
> 
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 97cef2cc14d3..0a25e5cf841c 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -3621,6 +3621,38 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
> >  	unsigned int nr_allocated = 0;
> >  	struct page *page;
> >  	int i;
> > +	gfp_t large_gfp = (gfp & ~__GFP_DIRECT_RECLAIM) | __GFP_NOWARN;
> > +	unsigned int large_order = ilog2(nr_pages - nr_allocated);
> > +
> > +	/*
> > +	 * Initially, attempt to have the page allocator give us large order
> > +	 * pages. Do not attempt allocating smaller than order chunks since
> > +	 * __vmap_pages_range() expects physically contiguous pages of exactly
> > +	 * order long chunks.
> > +	 */
> > +	while (large_order > order && nr_allocated < nr_pages) {
> > +		/*
> > +		 * High-order nofail allocations are really expensive and
> > +		 * potentially dangerous (pre-mature OOM, disruptive reclaim
> > +		 * and compaction etc.)
> > +		 */
> > +		if (gfp & __GFP_NOFAIL)
> > +			break;
> 
> sure, but we could just clear NOFAIL from the large_gfp flags instead
> of giving up on this path so quickly?

Yeah I'll do that.
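
A minimal userspace sketch of that suggestion: strip the NOFAIL bit from the
mask used for the large-order attempt instead of bailing out of the loop. The
type and SKETCH_GFP_* bit values below are hypothetical stand-ins so the
snippet compiles outside the kernel; the real flags are defined in the
kernel's gfp headers and have different values.

```c
#include <assert.h>

/* Hypothetical stand-ins for gfp_t and the kernel's __GFP_* flags. */
typedef unsigned int sketch_gfp_t;

#define SKETCH_GFP_DIRECT_RECLAIM	0x1u
#define SKETCH_GFP_NOFAIL		0x2u
#define SKETCH_GFP_NOWARN		0x4u

/*
 * Derive the mask for the opportunistic large-order attempt:
 * no direct reclaim, no NOFAIL semantics, and no failure warnings.
 * The caller's original mask is left untouched for the fallback path.
 */
sketch_gfp_t make_large_gfp(sketch_gfp_t gfp)
{
	sketch_gfp_t drop = SKETCH_GFP_DIRECT_RECLAIM | SKETCH_GFP_NOFAIL;

	return (gfp & ~drop) | SKETCH_GFP_NOWARN;
}
```

With the bit cleared up front, a NOFAIL caller still gets a cheap large-order
attempt, and the NOFAIL guarantee is honoured later by the existing path that
uses the original gfp mask.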

> > +		if (nid == NUMA_NO_NODE)
> > +			page = alloc_pages_noprof(large_gfp, large_order);
> > +		else
> > +			page = alloc_pages_node_noprof(nid, large_gfp, large_order);
> > +
> > +		if (unlikely(!page))
> > +			break;
> 
> I'm not entirely convinced here.  We might want to fall back to the next
> larger size.  eg if we try to allocate an order-6 page, and there's not
> one readily available, perhaps we should try to allocate an order-5 page
> instead of falling back to the bulk allocator?

I'll try that out and see how that affects the numbers.
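
One way the step-down could look, sketched in userspace with a mock
allocator. Everything here is an assumption for illustration: sketch_ilog2()
stands in for the kernel's ilog2(), the callback replaces the real
alloc_pages calls, and the loop is not the code from the patch.

```c
#include <assert.h>

/* floor(log2(n)) for n > 0; userspace stand-in for the kernel's ilog2(). */
unsigned int sketch_ilog2(unsigned int n)
{
	unsigned int order = 0;

	assert(n > 0);
	while ((n >>= 1) != 0)
		order++;
	return order;
}

/* Hypothetical callback: nonzero if a block of 2^order pages was obtained. */
typedef int (*try_alloc_fn)(unsigned int order);

/* Mock allocator for experiments: nothing above order-4 is available. */
int sketch_fail_above_order4(unsigned int order)
{
	return order <= 4;
}

/*
 * Cover as much of nr_pages as possible with blocks larger than min_order.
 * On failure, retry with the next smaller order instead of giving up.
 * Returns the number of pages obtained; the rest is left to the existing
 * bulk/fallback path.
 */
unsigned int alloc_large_with_fallback(unsigned int nr_pages,
				       unsigned int min_order,
				       try_alloc_fn try_alloc)
{
	unsigned int nr_allocated = 0;
	unsigned int order;

	if (nr_pages == 0)
		return 0;

	order = sketch_ilog2(nr_pages);
	while (order > min_order && nr_allocated < nr_pages) {
		if (!try_alloc(order)) {
			order--;	/* e.g. no order-6 block: try order-5 */
			continue;
		}
		nr_allocated += 1u << order;

		/* Never request more than what is still missing. */
		if (nr_allocated < nr_pages) {
			unsigned int fit = sketch_ilog2(nr_pages - nr_allocated);

			if (fit < order)
				order = fit;
		}
	}
	return nr_allocated;
}
```

With the mock above, a 100-page request and min_order 0 ends up as six
order-4 blocks plus one order-2 block rather than dropping straight to the
bulk allocator after the first order-6 failure.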

> > @@ -3665,7 +3697,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
> >  		}
> >  	}
> >  
> > -	/* High-order pages or fallback path if "bulk" fails. */
> > +	/* High-order arch pages or fallback path if "bulk" fails. */
> 
> I'm not quite clear what this comment change is meant to convey?

Ah, that was a comment I had inserted to remind myself that the
passed-in order is tied to the HAVE_ARCH_HUGE_VMALLOC config. I meant
to leave that out.
