Re: [RFC PATCH] mm/vmalloc: request large order pages from buddy allocator

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Uladzislau Rezki <urezki@gmail.com>
To: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Uladzislau Rezki <urezki@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RFC PATCH] mm/vmalloc: request large order pages from buddy allocator
Date: Wed, 15 Oct 2025 10:23:19 +0200	[thread overview]
Message-ID: <aO9Z90vphRcyFv2n@milan> (raw)
In-Reply-To: <20251014182754.4329-1-vishal.moola@gmail.com>

On Tue, Oct 14, 2025 at 11:27:54AM -0700, Vishal Moola (Oracle) wrote:
> Sometimes, vm_area_alloc_pages() will want many pages from the buddy
> allocator. Rather than making requests to the buddy allocator for at
> most 100 pages at a time, we can eagerly request large order pages a
> smaller number of times.
> 
> We still split the large order pages down to order-0 as the rest of the
> vmalloc code (and some callers) depend on it. We still defer to the bulk
> allocator and fallback path in case of order-0 pages or failure.
> 
> Running 1000 iterations of allocations on a small 4GB system finds:
> 
> 1000 2mb allocations:
> 	[Baseline]			[This patch]
> 	real    46.310s			real    34.380s
> 	user    0.001s			user    0.008s
> 	sys     46.058s			sys     34.152s
> 
> 10000 200kb allocations:
> 	[Baseline]			[This patch]
> 	real    56.104s			real    43.946s
> 	user    0.001s			user    0.003s
> 	sys     55.375s			sys     43.259s
> 
> 10000 20kb allocations:
> 	[Baseline]			[This patch]
> 	real    0m8.438s		real    0m9.160s
> 	user    0m0.001s		user    0m0.002s
> 	sys     0m7.936s		sys     0m8.671s
> 
> This is an RFC, comments and thoughts are welcomed. There is a
> clear benefit to be had for large allocations, but there is
> some regression for smaller allocations.
> 
> Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
> ---
>  mm/vmalloc.c | 34 +++++++++++++++++++++++++++++++++-
>  1 file changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 97cef2cc14d3..0a25e5cf841c 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3621,6 +3621,38 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>  	unsigned int nr_allocated = 0;
>  	struct page *page;
>  	int i;
> +	gfp_t large_gfp = (gfp & ~__GFP_DIRECT_RECLAIM) | __GFP_NOWARN;
> +	unsigned int large_order = ilog2(nr_pages - nr_allocated);
>
If large_order is > MAX_ORDER - 1 then there is no need even try
larger_order attempt.

>> unsigned int large_order = ilog2(nr_pages - nr_allocated);
I think, it is better to introduce "remaining" variable which
is nr_pages - nr_allocated. And on entry "remaining" can be set
to just nr_pages because "nr_allocated" is zero.

Maybe it is worth to drop/warn if __GFP_COMP is set also?

> +
> +	/*
> +	 * Initially, attempt to have the page allocator give us large order
> +	 * pages. Do not attempt allocating smaller than order chunks since
> +	 * __vmap_pages_range() expects physically contigous pages of exactly
> +	 * order long chunks.
> +	 */
> +	while (large_order > order && nr_allocated < nr_pages) {
> +		/*
> +		 * High-order nofail allocations are really expensive and
> +		 * potentially dangerous (pre-mature OOM, disruptive reclaim
> +		 * and compaction etc.
> +		 */
> +		if (gfp & __GFP_NOFAIL)
> +			break;
> +		if (nid == NUMA_NO_NODE)
> +			page = alloc_pages_noprof(large_gfp, large_order);
> +		else
> +			page = alloc_pages_node_noprof(nid, large_gfp, large_order);
> +
> +		if (unlikely(!page))
> +			break;
> +
> +		split_page(page, large_order);
> +		for (i = 0; i < (1U << large_order); i++)
> +			pages[nr_allocated + i] = page + i;
> +
> +		nr_allocated += 1U << large_order;
> +		large_order = ilog2(nr_pages - nr_allocated);
> +	}
>  
So this is a third path for page allocation. The question is should we
try all orders? Like already noted by Matthew, if there is no 5-order
page but there is 4-order page? Try until we check all orders. For
example we can get different order pages to fulfill the request.

The concern is then if it is a waste of high-order pages. Because we can
easily go with a single page allocator. Whereas someone in a system can not.

Apart of that, maybe we can drop the bulk_path instead of having three paths?

--
Uladzislau Rezki

next prev parent reply	other threads:[~2025-10-15  8:23 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-14 18:27 [RFC PATCH] mm/vmalloc: request large order pages from buddy allocator Vishal Moola (Oracle)
2025-10-15  3:56 ` Matthew Wilcox
2025-10-15  9:28   ` Vishal Moola (Oracle)
2025-10-16 16:12     ` Uladzislau Rezki
2025-10-16 17:42       ` Vishal Moola (Oracle)
2025-10-16 19:02         ` Vishal Moola (Oracle)
2025-10-17 16:15           ` Uladzislau Rezki
2025-10-17 17:19             ` Uladzislau Rezki
2025-10-20 18:23               ` Vishal Moola (Oracle)
2025-10-15  8:23 ` Uladzislau Rezki [this message]
2025-10-15 10:44   ` Vishal Moola (Oracle)
2025-10-15 12:42     ` Matthew Wilcox
2025-10-15 13:42       ` Uladzislau Rezki
2025-10-16  6:57         ` Christoph Hellwig
2025-10-16 11:53           ` Uladzislau Rezki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aO9Z90vphRcyFv2n@milan \
    --to=urezki@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=vishal.moola@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.