[PATCH] mm/vmalloc: map contiguous pages in batches for vmap() whenever possible

* [PATCH] mm/vmalloc: map contiguous pages in batches for vmap() whenever possible
@ 2025-12-15  5:30 Barry Song
  2025-12-18 13:01 ` David Hildenbrand (Red Hat)
  2025-12-18 14:00 ` Uladzislau Rezki
  0 siblings, 2 replies; 12+ messages in thread
From: Barry Song @ 2025-12-15  5:30 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: dri-devel, jstultz, linaro-mm-sig, linux-kernel, linux-media,
	Barry Song, David Hildenbrand, Uladzislau Rezki, Sumit Semwal,
	Maxime Ripard, Tangquan Zheng

From: Barry Song <v-songbaohua@oppo.com>

In many cases, the pages passed to vmap() include high-order pages
allocated with the __GFP_COMP flag. For example, the dma-buf system heap
often allocates pages in descending order: order 8, then 4, then 0.
Currently, vmap() iterates over every page individually, so even the
pages inside a high-order block are mapped one by one.

This patch detects high-order pages and maps them as a single
contiguous block whenever possible.

An alternative would be to implement a new API, vmap_sg(), but that
would be a much larger change.

When vmapping a 128MB dma-buf from the system heap, this patch
makes system_heap_do_vmap() roughly 17× faster.

W/ patch:
[   10.404769] system_heap_do_vmap took 2494000 ns
[   12.525921] system_heap_do_vmap took 2467008 ns
[   14.517348] system_heap_do_vmap took 2471008 ns
[   16.593406] system_heap_do_vmap took 2444000 ns
[   19.501341] system_heap_do_vmap took 2489008 ns

W/o patch:
[    7.413756] system_heap_do_vmap took 42626000 ns
[    9.425610] system_heap_do_vmap took 42500992 ns
[   11.810898] system_heap_do_vmap took 42215008 ns
[   14.336790] system_heap_do_vmap took 42134992 ns
[   16.373890] system_heap_do_vmap took 42750000 ns

Cc: David Hildenbrand <david@kernel.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: John Stultz <jstultz@google.com>
Cc: Maxime Ripard <mripard@kernel.org>
Tested-by: Tangquan Zheng <zhengtangquan@oppo.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
 * diff with rfc:
 Many code refinements based on David's suggestions, thanks!
 Refine comment and changelog according to Uladzislau, thanks!
 rfc link:
 https://lore.kernel.org/linux-mm/20251122090343.81243-1-21cnbao@gmail.com/

 mm/vmalloc.c | 45 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 39 insertions(+), 6 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 41dd01e8430c..8d577767a9e5 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -642,6 +642,29 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
 	return err;
 }
 
+static inline int get_vmap_batch_order(struct page **pages,
+		unsigned int stride, unsigned int max_steps, unsigned int idx)
+{
+	int nr_pages = 1;
+
+	/*
+	 * Currently, batching is only supported in vmap_pages_range
+	 * when page_shift == PAGE_SHIFT.
+	 */
+	if (stride != 1)
+		return 0;
+
+	nr_pages = compound_nr(pages[idx]);
+	if (nr_pages == 1)
+		return 0;
+	if (max_steps < nr_pages)
+		return 0;
+
+	if (num_pages_contiguous(&pages[idx], nr_pages) == nr_pages)
+		return compound_order(pages[idx]);
+	return 0;
+}
+
 /*
  * vmap_pages_range_noflush is similar to vmap_pages_range, but does not
  * flush caches.
@@ -655,23 +678,33 @@ int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
 		pgprot_t prot, struct page **pages, unsigned int page_shift)
 {
 	unsigned int i, nr = (end - addr) >> PAGE_SHIFT;
+	unsigned int stride;
 
 	WARN_ON(page_shift < PAGE_SHIFT);
 
+	/*
+	 * For vmap(), users may allocate pages from high orders down to
+	 * order 0, while always using PAGE_SHIFT as the page_shift.
+	 * We first check whether the initial page is a compound page. If so,
+	 * there may be an opportunity to batch multiple pages together.
+	 */
 	if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
-			page_shift == PAGE_SHIFT)
+			(page_shift == PAGE_SHIFT && !PageCompound(pages[0])))
 		return vmap_small_pages_range_noflush(addr, end, prot, pages);
 
-	for (i = 0; i < nr; i += 1U << (page_shift - PAGE_SHIFT)) {
-		int err;
+	stride = 1U << (page_shift - PAGE_SHIFT);
+	for (i = 0; i < nr; ) {
+		int err, order;
 
-		err = vmap_range_noflush(addr, addr + (1UL << page_shift),
+		order = get_vmap_batch_order(pages, stride, nr - i, i);
+		err = vmap_range_noflush(addr, addr + (1UL << (page_shift + order)),
 					page_to_phys(pages[i]), prot,
-					page_shift);
+					page_shift + order);
 		if (err)
 			return err;
 
-		addr += 1UL << page_shift;
+		addr += 1UL << (page_shift + order);
+		i += 1U << (order + page_shift - PAGE_SHIFT);
 	}
 
 	return 0;
-- 
2.39.3 (Apple Git-146)
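
The batching rule in get_vmap_batch_order() can be modeled outside the kernel. The sketch below is a user-space illustration only, not kernel code: struct model_page, model_batch_order() and model_count_map_calls() are hypothetical names, and each page is reduced to a frame number plus the order of the compound page it heads (0 for a plain page or a tail page). It assumes a block is batched only when it fits in the remaining range and its frame numbers are consecutive.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page (not the kernel type). */
struct model_page {
	unsigned long pfn;	/* page frame number */
	unsigned int order;	/* order of the compound page headed here, else 0 */
};

/*
 * Return the order of the block starting at pages[idx] if it can be
 * mapped in one step, or 0 to fall back to mapping a single page.
 */
unsigned int model_batch_order(const struct model_page *pages,
			       size_t remaining, size_t idx)
{
	size_t nr = (size_t)1 << pages[idx].order;

	assert(remaining > 0);
	if (nr == 1 || remaining < nr)
		return 0;

	/* Every following entry must be the next physical frame. */
	for (size_t k = 1; k < nr; k++)
		if (pages[idx + k].pfn != pages[idx].pfn + k)
			return 0;
	return pages[idx].order;
}

/* Count how many mapping steps a walk over the whole array takes. */
size_t model_count_map_calls(const struct model_page *pages, size_t count)
{
	size_t calls = 0;

	for (size_t i = 0; i < count; calls++)
		i += (size_t)1 << model_batch_order(pages, count - i, i);
	return calls;
}
```

With one contiguous order-4 block followed by a single page, this model walks 17 array entries in two steps instead of 17.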




* Re: [PATCH] mm/vmalloc: map contiguous pages in batches for vmap() whenever possible
  2025-12-15  5:30 [PATCH] mm/vmalloc: map contiguous pages in batches for vmap() whenever possible Barry Song
@ 2025-12-18 13:01 ` David Hildenbrand (Red Hat)
  2025-12-18 13:54   ` Uladzislau Rezki
  2025-12-18 14:00 ` Uladzislau Rezki
  1 sibling, 1 reply; 12+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-12-18 13:01 UTC (permalink / raw)
  To: Barry Song, akpm, linux-mm
  Cc: dri-devel, jstultz, linaro-mm-sig, linux-kernel, linux-media,
	Barry Song, Uladzislau Rezki, Sumit Semwal, Maxime Ripard,
	Tangquan Zheng

On 12/15/25 06:30, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
> 
> In many cases, the pages passed to vmap() may include high-order
> pages allocated with __GFP_COMP flags. For example, the systemheap
> often allocates pages in descending order: order 8, then 4, then 0.
> Currently, vmap() iterates over every page individually—even pages
> inside a high-order block are handled one by one.
> 
> This patch detects high-order pages and maps them as a single
> contiguous block whenever possible.
> 
> An alternative would be to implement a new API, vmap_sg(), but that
> change seems to be large in scope.
> 
> When vmapping a 128MB dma-buf using the systemheap, this patch
> makes system_heap_do_vmap() roughly 17× faster.
> 
> W/ patch:
> [   10.404769] system_heap_do_vmap took 2494000 ns
> [   12.525921] system_heap_do_vmap took 2467008 ns
> [   14.517348] system_heap_do_vmap took 2471008 ns
> [   16.593406] system_heap_do_vmap took 2444000 ns
> [   19.501341] system_heap_do_vmap took 2489008 ns
> 
> W/o patch:
> [    7.413756] system_heap_do_vmap took 42626000 ns
> [    9.425610] system_heap_do_vmap took 42500992 ns
> [   11.810898] system_heap_do_vmap took 42215008 ns
> [   14.336790] system_heap_do_vmap took 42134992 ns
> [   16.373890] system_heap_do_vmap took 42750000 ns
> 

That's quite a speedup.

> Cc: David Hildenbrand <david@kernel.org>
> Cc: Uladzislau Rezki <urezki@gmail.com>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: John Stultz <jstultz@google.com>
> Cc: Maxime Ripard <mripard@kernel.org>
> Tested-by: Tangquan Zheng <zhengtangquan@oppo.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> ---
>   * diff with rfc:
>   Many code refinements based on David's suggestions, thanks!
>   Refine comment and changelog according to Uladzislau, thanks!
>   rfc link:
>   https://lore.kernel.org/linux-mm/20251122090343.81243-1-21cnbao@gmail.com/
> 
>   mm/vmalloc.c | 45 +++++++++++++++++++++++++++++++++++++++------
>   1 file changed, 39 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 41dd01e8430c..8d577767a9e5 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -642,6 +642,29 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
>   	return err;
>   }
>   
> +static inline int get_vmap_batch_order(struct page **pages,
> +		unsigned int stride, unsigned int max_steps, unsigned int idx)
> +{
> +	int nr_pages = 1;

unsigned int, maybe

Why are you initializing nr_pages when you overwrite it below?

> +
> +	/*
> +	 * Currently, batching is only supported in vmap_pages_range
> +	 * when page_shift == PAGE_SHIFT.

I don't know the code, so realizing how we go from page_shift to stride
took me a second. Maybe only talk about stride here?

OTOH, is "stride" really the right terminology?

we calculate it as

	stride = 1U << (page_shift - PAGE_SHIFT);

page_shift - PAGE_SHIFT should give us an "order". So is this a 
"granularity" in nr_pages?

Again, I don't know this code, so sorry for the question.

> +	 */
> +	if (stride != 1)
> +		return 0;
> +
> +	nr_pages = compound_nr(pages[idx]);
> +	if (nr_pages == 1)
> +		return 0;
> +	if (max_steps < nr_pages)
> +		return 0;

Might combine these simple checks

if (nr_pages == 1 || max_steps < nr_pages)
	return 0;


-- 
Cheers

David
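
The relationship discussed above, where page_shift - PAGE_SHIFT is an order and the "stride" is the number of base pages covered per mapping step, can be illustrated with a small user-space sketch. MODEL_PAGE_SHIFT and model_pages_per_step() are hypothetical names, and 4 KiB base pages are an assumption, not something the patch states.

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT 12u	/* assumption: 4 KiB base pages */

/*
 * Number of base pages covered by one mapping step of size
 * (1 << page_shift). The difference of the two shifts is an order.
 */
unsigned int model_pages_per_step(unsigned int page_shift)
{
	/* Mirrors the WARN_ON(page_shift < PAGE_SHIFT) precondition. */
	assert(page_shift >= MODEL_PAGE_SHIFT);
	return 1u << (page_shift - MODEL_PAGE_SHIFT);
}
```

Under these assumptions, a page_shift of 12 gives one page per step, and a page_shift of 21 (a 2 MiB mapping) gives 512 pages per step.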



* Re: [PATCH] mm/vmalloc: map contiguous pages in batches for vmap() whenever possible
  2025-12-18 13:01 ` David Hildenbrand (Red Hat)
@ 2025-12-18 13:54   ` Uladzislau Rezki
  2025-12-18 21:24     ` Barry Song
  0 siblings, 1 reply; 12+ messages in thread
From: Uladzislau Rezki @ 2025-12-18 13:54 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat), Barry Song
  Cc: Barry Song, akpm, linux-mm, dri-devel, jstultz, linaro-mm-sig,
	linux-kernel, linux-media, Barry Song, Uladzislau Rezki,
	Sumit Semwal, Maxime Ripard, Tangquan Zheng

On Thu, Dec 18, 2025 at 02:01:56PM +0100, David Hildenbrand (Red Hat) wrote:
> On 12/15/25 06:30, Barry Song wrote:
> > From: Barry Song <v-songbaohua@oppo.com>
> > 
> > In many cases, the pages passed to vmap() may include high-order
> > pages allocated with __GFP_COMP flags. For example, the systemheap
> > often allocates pages in descending order: order 8, then 4, then 0.
> > Currently, vmap() iterates over every page individually—even pages
> > inside a high-order block are handled one by one.
> > 
> > This patch detects high-order pages and maps them as a single
> > contiguous block whenever possible.
> > 
> > An alternative would be to implement a new API, vmap_sg(), but that
> > change seems to be large in scope.
> > 
> > When vmapping a 128MB dma-buf using the systemheap, this patch
> > makes system_heap_do_vmap() roughly 17× faster.
> > 
> > W/ patch:
> > [   10.404769] system_heap_do_vmap took 2494000 ns
> > [   12.525921] system_heap_do_vmap took 2467008 ns
> > [   14.517348] system_heap_do_vmap took 2471008 ns
> > [   16.593406] system_heap_do_vmap took 2444000 ns
> > [   19.501341] system_heap_do_vmap took 2489008 ns
> > 
> > W/o patch:
> > [    7.413756] system_heap_do_vmap took 42626000 ns
> > [    9.425610] system_heap_do_vmap took 42500992 ns
> > [   11.810898] system_heap_do_vmap took 42215008 ns
> > [   14.336790] system_heap_do_vmap took 42134992 ns
> > [   16.373890] system_heap_do_vmap took 42750000 ns
> > 
> 
> That's quite a speedup.
> 
> > Cc: David Hildenbrand <david@kernel.org>
> > Cc: Uladzislau Rezki <urezki@gmail.com>
> > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > Cc: John Stultz <jstultz@google.com>
> > Cc: Maxime Ripard <mripard@kernel.org>
> > Tested-by: Tangquan Zheng <zhengtangquan@oppo.com>
> > Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > ---
> >   * diff with rfc:
> >   Many code refinements based on David's suggestions, thanks!
> >   Refine comment and changelog according to Uladzislau, thanks!
> >   rfc link:
> >   https://lore.kernel.org/linux-mm/20251122090343.81243-1-21cnbao@gmail.com/
> > 
> >   mm/vmalloc.c | 45 +++++++++++++++++++++++++++++++++++++++------
> >   1 file changed, 39 insertions(+), 6 deletions(-)
> > 
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 41dd01e8430c..8d577767a9e5 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -642,6 +642,29 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
> >   	return err;
> >   }
> > +static inline int get_vmap_batch_order(struct page **pages,
> > +		unsigned int stride, unsigned int max_steps, unsigned int idx)
> > +{
> > +	int nr_pages = 1;
> 
> unsigned int, maybe
> 
> Why are you initializing nr_pages when you overwrite it below?
> 
> > +
> > +	/*
> > +	 * Currently, batching is only supported in vmap_pages_range
> > +	 * when page_shift == PAGE_SHIFT.
> 
> I don't know the code, so realizing how we go from page_shift to stride took
> me a second. Maybe only talk about stride here?
> 
> OTOH, is "stride" really the right terminology?
> 
> we calculate it as
> 
> 	stride = 1U << (page_shift - PAGE_SHIFT);
> 
> page_shift - PAGE_SHIFT should give us an "order". So is this a
> "granularity" in nr_pages?
> 
> Again, I don't know this code, so sorry for the question.
> 
To me "stride" also sounds unclear.

--
Uladzislau Rezki



* Re: [PATCH] mm/vmalloc: map contiguous pages in batches for vmap() whenever possible
  2025-12-18 13:54   ` Uladzislau Rezki
@ 2025-12-18 21:24     ` Barry Song
  2025-12-22 13:08       ` Uladzislau Rezki
  0 siblings, 1 reply; 12+ messages in thread
From: Barry Song @ 2025-12-18 21:24 UTC (permalink / raw)
  To: urezki
  Cc: 21cnbao, akpm, david, dri-devel, jstultz, linaro-mm-sig,
	linux-kernel, linux-media, linux-mm, mripard, sumit.semwal,
	v-songbaohua, zhengtangquan

On Thu, Dec 18, 2025 at 9:55 PM Uladzislau Rezki <urezki@gmail.com> wrote:
>
> On Thu, Dec 18, 2025 at 02:01:56PM +0100, David Hildenbrand (Red Hat) wrote:
> > On 12/15/25 06:30, Barry Song wrote:
> > > From: Barry Song <v-songbaohua@oppo.com>
> > >
> > > In many cases, the pages passed to vmap() may include high-order
> > > pages allocated with __GFP_COMP flags. For example, the systemheap
> > > often allocates pages in descending order: order 8, then 4, then 0.
> > > Currently, vmap() iterates over every page individually—even pages
> > > inside a high-order block are handled one by one.
> > >
> > > This patch detects high-order pages and maps them as a single
> > > contiguous block whenever possible.
> > >
> > > An alternative would be to implement a new API, vmap_sg(), but that
> > > change seems to be large in scope.
> > >
> > > When vmapping a 128MB dma-buf using the systemheap, this patch
> > > makes system_heap_do_vmap() roughly 17× faster.
> > >
> > > W/ patch:
> > > [   10.404769] system_heap_do_vmap took 2494000 ns
> > > [   12.525921] system_heap_do_vmap took 2467008 ns
> > > [   14.517348] system_heap_do_vmap took 2471008 ns
> > > [   16.593406] system_heap_do_vmap took 2444000 ns
> > > [   19.501341] system_heap_do_vmap took 2489008 ns
> > >
> > > W/o patch:
> > > [    7.413756] system_heap_do_vmap took 42626000 ns
> > > [    9.425610] system_heap_do_vmap took 42500992 ns
> > > [   11.810898] system_heap_do_vmap took 42215008 ns
> > > [   14.336790] system_heap_do_vmap took 42134992 ns
> > > [   16.373890] system_heap_do_vmap took 42750000 ns
> > >
> >
> > That's quite a speedup.
> >
> > > Cc: David Hildenbrand <david@kernel.org>
> > > Cc: Uladzislau Rezki <urezki@gmail.com>
> > > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > > Cc: John Stultz <jstultz@google.com>
> > > Cc: Maxime Ripard <mripard@kernel.org>
> > > Tested-by: Tangquan Zheng <zhengtangquan@oppo.com>
> > > Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > > ---
> > >   * diff with rfc:
> > >   Many code refinements based on David's suggestions, thanks!
> > >   Refine comment and changelog according to Uladzislau, thanks!
> > >   rfc link:
> > >   https://lore.kernel.org/linux-mm/20251122090343.81243-1-21cnbao@gmail.com/
> > >
> > >   mm/vmalloc.c | 45 +++++++++++++++++++++++++++++++++++++++------
> > >   1 file changed, 39 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index 41dd01e8430c..8d577767a9e5 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -642,6 +642,29 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
> > >     return err;
> > >   }
> > > +static inline int get_vmap_batch_order(struct page **pages,
> > > +           unsigned int stride, unsigned int max_steps, unsigned int idx)
> > > +{
> > > +   int nr_pages = 1;
> >
> > unsigned int, maybe

Right

> >
> > Why are you initializing nr_pages when you overwrite it below?

Right, initializing nr_pages can be dropped.

> >
> > > +
> > > +   /*
> > > +    * Currently, batching is only supported in vmap_pages_range
> > > +    * when page_shift == PAGE_SHIFT.
> >
> > I don't know the code, so realizing how we go from page_shift to stride took
> > me a second. Maybe only talk about stride here?
> >
> > OTOH, is "stride" really the right terminology?
> >
> > we calculate it as
> >
> >       stride = 1U << (page_shift - PAGE_SHIFT);
> >
> > page_shift - PAGE_SHIFT should give us an "order". So is this a
> > "granularity" in nr_pages?

The page_shift > PAGE_SHIFT case comes from vmalloc(): when it knows
it has allocated high-order pages, it calls
vmap_pages_range_noflush() with a page_shift larger than
PAGE_SHIFT. vmap(), in contrast, takes a pages array, so its
page_shift is always PAGE_SHIFT.

> >
> > Again, I don't know this code, so sorry for the question.
> >
> To me "stride" also sounds unclear.

Thanks, David and Uladzislau. On second thought, this stride may be
redundant, and it should be possible to drop it entirely. This results
in the code below:

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 41dd01e8430c..3962bdcb43e5 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -642,6 +642,20 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
 	return err;
 }
 
+static inline int get_vmap_batch_order(struct page **pages,
+		unsigned int max_steps, unsigned int idx)
+{
+	unsigned int nr_pages	 = compound_nr(pages[idx]);
+
+	if (nr_pages == 1 || max_steps < nr_pages)
+		return 0;
+
+	if (num_pages_contiguous(&pages[idx], nr_pages) == nr_pages)
+		return compound_order(pages[idx]);
+	return 0;
+}
+
 /*
  * vmap_pages_range_noflush is similar to vmap_pages_range, but does not
  * flush caches.
@@ -658,20 +672,35 @@ int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
 
 	WARN_ON(page_shift < PAGE_SHIFT);
 
+	/*
+	 * For vmap(), users may allocate pages from high orders down to
+	 * order 0, while always using PAGE_SHIFT as the page_shift.
+	 * We first check whether the initial page is a compound page. If so,
+	 * there may be an opportunity to batch multiple pages together.
+	 */
 	if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
-			page_shift == PAGE_SHIFT)
+			(page_shift == PAGE_SHIFT && !PageCompound(pages[0])))
 		return vmap_small_pages_range_noflush(addr, end, prot, pages);
 
-	for (i = 0; i < nr; i += 1U << (page_shift - PAGE_SHIFT)) {
+	for (i = 0; i < nr; ) {
+		unsigned int shift = page_shift;
 		int err;
 
-		err = vmap_range_noflush(addr, addr + (1UL << page_shift),
+		/*
+		 * For vmap() cases, page_shift is always PAGE_SHIFT. Even
+		 * so, pages that are physically contiguous may still be
+		 * mapped in a batch.
+		 */
+		if (page_shift == PAGE_SHIFT)
+			shift += get_vmap_batch_order(pages, nr - i, i);
+		err = vmap_range_noflush(addr, addr + (1UL << shift),
 					page_to_phys(pages[i]), prot,
-					page_shift);
+					shift);
 		if (err)
 			return err;
 
-		addr += 1UL << page_shift;
+		addr += 1UL << shift;
+		i += 1U << (shift - PAGE_SHIFT);
 	}
 
 	return 0;

Does this look clearer?

Thanks
Barry
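
In a loop of this shape, addr advances in bytes while i indexes the pages array, so the two increments differ by PAGE_SHIFT. The user-space sketch below spells out that bookkeeping. struct model_step and model_advance() are hypothetical names, and 4 KiB base pages are an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12u	/* assumption: 4 KiB base pages */

/* How far one mapping step moves the array index and the address. */
struct model_step {
	size_t pages;		/* entries consumed from the pages array */
	unsigned long bytes;	/* virtual address range covered */
};

/* shift is the base page shift plus the batch order of this step. */
struct model_step model_advance(unsigned int shift)
{
	struct model_step s;

	assert(shift >= MODEL_PAGE_SHIFT);
	s.pages = (size_t)1 << (shift - MODEL_PAGE_SHIFT);
	s.bytes = 1UL << shift;
	return s;
}
```

For an order-4 batch (shift 16) this yields 16 array entries and 65536 bytes, which is the pairing a page-indexed loop needs to stay in step with the address.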



* Re: [PATCH] mm/vmalloc: map contiguous pages in batches for vmap() whenever possible
  2025-12-18 21:24     ` Barry Song
@ 2025-12-22 13:08       ` Uladzislau Rezki
  2025-12-23 21:23         ` Barry Song
  0 siblings, 1 reply; 12+ messages in thread
From: Uladzislau Rezki @ 2025-12-22 13:08 UTC (permalink / raw)
  To: Barry Song
  Cc: urezki, akpm, david, dri-devel, jstultz, linaro-mm-sig,
	linux-kernel, linux-media, linux-mm, mripard, sumit.semwal,
	v-songbaohua, zhengtangquan

On Fri, Dec 19, 2025 at 05:24:36AM +0800, Barry Song wrote:
> On Thu, Dec 18, 2025 at 9:55 PM Uladzislau Rezki <urezki@gmail.com> wrote:
> >
> > On Thu, Dec 18, 2025 at 02:01:56PM +0100, David Hildenbrand (Red Hat) wrote:
> > > On 12/15/25 06:30, Barry Song wrote:
> > > > From: Barry Song <v-songbaohua@oppo.com>
> > > >
> > > > In many cases, the pages passed to vmap() may include high-order
> > > > pages allocated with __GFP_COMP flags. For example, the systemheap
> > > > often allocates pages in descending order: order 8, then 4, then 0.
> > > > Currently, vmap() iterates over every page individually—even pages
> > > > inside a high-order block are handled one by one.
> > > >
> > > > This patch detects high-order pages and maps them as a single
> > > > contiguous block whenever possible.
> > > >
> > > > An alternative would be to implement a new API, vmap_sg(), but that
> > > > change seems to be large in scope.
> > > >
> > > > When vmapping a 128MB dma-buf using the systemheap, this patch
> > > > makes system_heap_do_vmap() roughly 17× faster.
> > > >
> > > > W/ patch:
> > > > [   10.404769] system_heap_do_vmap took 2494000 ns
> > > > [   12.525921] system_heap_do_vmap took 2467008 ns
> > > > [   14.517348] system_heap_do_vmap took 2471008 ns
> > > > [   16.593406] system_heap_do_vmap took 2444000 ns
> > > > [   19.501341] system_heap_do_vmap took 2489008 ns
> > > >
> > > > W/o patch:
> > > > [    7.413756] system_heap_do_vmap took 42626000 ns
> > > > [    9.425610] system_heap_do_vmap took 42500992 ns
> > > > [   11.810898] system_heap_do_vmap took 42215008 ns
> > > > [   14.336790] system_heap_do_vmap took 42134992 ns
> > > > [   16.373890] system_heap_do_vmap took 42750000 ns
> > > >
> > >
> > > That's quite a speedup.
> > >
> > > > Cc: David Hildenbrand <david@kernel.org>
> > > > Cc: Uladzislau Rezki <urezki@gmail.com>
> > > > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > > > Cc: John Stultz <jstultz@google.com>
> > > > Cc: Maxime Ripard <mripard@kernel.org>
> > > > Tested-by: Tangquan Zheng <zhengtangquan@oppo.com>
> > > > Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > > > ---
> > > >   * diff with rfc:
> > > >   Many code refinements based on David's suggestions, thanks!
> > > >   Refine comment and changelog according to Uladzislau, thanks!
> > > >   rfc link:
> > > >   https://lore.kernel.org/linux-mm/20251122090343.81243-1-21cnbao@gmail.com/
> > > >
> > > >   mm/vmalloc.c | 45 +++++++++++++++++++++++++++++++++++++++------
> > > >   1 file changed, 39 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > index 41dd01e8430c..8d577767a9e5 100644
> > > > --- a/mm/vmalloc.c
> > > > +++ b/mm/vmalloc.c
> > > > @@ -642,6 +642,29 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
> > > >     return err;
> > > >   }
> > > > +static inline int get_vmap_batch_order(struct page **pages,
> > > > +           unsigned int stride, unsigned int max_steps, unsigned int idx)
> > > > +{
> > > > +   int nr_pages = 1;
> > >
> > > unsigned int, maybe
> 
> Right
> 
> > >
> > > Why are you initializing nr_pages when you overwrite it below?
> 
> Right, initializing nr_pages can be dropped.
> 
> > >
> > > > +
> > > > +   /*
> > > > +    * Currently, batching is only supported in vmap_pages_range
> > > > +    * when page_shift == PAGE_SHIFT.
> > >
> > > I don't know the code, so realizing how we go from page_shift to stride took
> > > me a second. Maybe only talk about stride here?
> > >
> > > OTOH, is "stride" really the right terminology?
> > >
> > > we calculate it as
> > >
> > >       stride = 1U << (page_shift - PAGE_SHIFT);
> > >
> > > page_shift - PAGE_SHIFT should give us an "order". So is this a
> > > "granularity" in nr_pages?
> 
> This is the case where vmalloc() may realize that it has
> high-order pages and therefore calls
> vmap_pages_range_noflush() with a page_shift larger than
> PAGE_SHIFT. For vmap(), we take a pages array, so
> page_shift is always PAGE_SHIFT.
> 
> > >
> > > Again, I don't know this code, so sorry for the question.
> > >
> > To me "stride" also sounds unclear.
> 
> Thanks, David and Uladzislau. On second thought, this stride may be
> redundant, and it should be possible to drop it entirely. This results
> in the code below:
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 41dd01e8430c..3962bdcb43e5 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -642,6 +642,20 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
>  	return err;
>  }
>  
> +static inline int get_vmap_batch_order(struct page **pages,
> +		unsigned int max_steps, unsigned int idx)
> +{
> +	unsigned int nr_pages	 = compound_nr(pages[idx]);
> +
> +	if (nr_pages == 1 || max_steps < nr_pages)
> +		return 0;
> +
> +	if (num_pages_contiguous(&pages[idx], nr_pages) == nr_pages)
> +		return compound_order(pages[idx]);
> +	return 0;
> +}
> +
>



>  /*
>   * vmap_pages_range_noflush is similar to vmap_pages_range, but does not
>   * flush caches.
> @@ -658,20 +672,35 @@ int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
>  
>  	WARN_ON(page_shift < PAGE_SHIFT);
>  
> +	/*
> +	 * For vmap(), users may allocate pages from high orders down to
> +	 * order 0, while always using PAGE_SHIFT as the page_shift.
> +	 * We first check whether the initial page is a compound page. If so,
> +	 * there may be an opportunity to batch multiple pages together.
> +	 */
>  	if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
> -			page_shift == PAGE_SHIFT)
> +			(page_shift == PAGE_SHIFT && !PageCompound(pages[0])))
>  		return vmap_small_pages_range_noflush(addr, end, prot, pages);
Hm.. If the first few pages are order-0 and the rest are compound,
then we do nothing.

>  
> -	for (i = 0; i < nr; i += 1U << (page_shift - PAGE_SHIFT)) {
> +	for (i = 0; i < nr; ) {
> +		unsigned int shift = page_shift;
>  		int err;
>  
> -		err = vmap_range_noflush(addr, addr + (1UL << page_shift),
> +		/*
> +		 * For vmap() cases, page_shift is always PAGE_SHIFT, even
> +		 * if the pages are physically contiguous, they may still
> +		 * be mapped in a batch.
> +		 */
> +		if (page_shift == PAGE_SHIFT)
> +			shift += get_vmap_batch_order(pages, nr - i, i);
> +		err = vmap_range_noflush(addr, addr + (1UL << shift),
>  					page_to_phys(pages[i]), prot,
> -					page_shift);
> +					shift);
>  		if (err)
>  			return err;
>  
> -		addr += 1UL << page_shift;
> +		addr += 1UL << shift;
> +		i += 1U << (shift - PAGE_SHIFT);
>  	}
>  
>  	return 0;
> 
> Does this look clearer?
> 
The concern is that this mixes batching with the huge-page mapping path. If we
want to batch the mapping for the page_shift == PAGE_SHIFT case, where the
"pages" array may contain compound pages (folios), which is a corner case to
me, I think we should split it out.

--
Uladzislau Rezki
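
The remark about leading order-0 pages can be made concrete with a small user-space sketch. It compares a gate that inspects only the first page with one that scans the whole array. Both function names are hypothetical, and order[i] is an assumed per-entry value: the order of the compound page headed at index i, or 0 for a plain or tail page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Gate that looks only at the first entry, as in the posted check. */
bool model_gate_first_page(const unsigned int *order, size_t count)
{
	assert(order != NULL || count == 0);
	return count > 0 && order[0] > 0;
}

/* Alternative gate: true if any entry heads a compound page. */
bool model_gate_any_page(const unsigned int *order, size_t count)
{
	assert(order != NULL || count == 0);
	for (size_t i = 0; i < count; i++)
		if (order[i] > 0)
			return true;
	return false;
}
```

For an array of two order-0 pages followed by an order-4 block, the first gate reports no batching opportunity while the second finds one. The scan costs a pass over the array, so whether it is worth doing is a design trade-off rather than something this sketch settles.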



* Re: [PATCH] mm/vmalloc: map contiguous pages in batches for vmap() whenever possible
  2025-12-22 13:08       ` Uladzislau Rezki
@ 2025-12-23 21:23         ` Barry Song
  2026-01-05 16:28           ` Uladzislau Rezki
  0 siblings, 1 reply; 12+ messages in thread
From: Barry Song @ 2025-12-23 21:23 UTC (permalink / raw)
  To: urezki
  Cc: 21cnbao, akpm, david, dri-devel, jstultz, linaro-mm-sig,
	linux-kernel, linux-media, linux-mm, mripard, sumit.semwal,
	v-songbaohua, zhengtangquan

> >  /*
> >   * vmap_pages_range_noflush is similar to vmap_pages_range, but does not
> >   * flush caches.
> > @@ -658,20 +672,35 @@ int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
> >
> >       WARN_ON(page_shift < PAGE_SHIFT);
> >
> > +     /*
> > +      * For vmap(), users may allocate pages from high orders down to
> > +      * order 0, while always using PAGE_SHIFT as the page_shift.
> > +      * We first check whether the initial page is a compound page. If so,
> > +      * there may be an opportunity to batch multiple pages together.
> > +      */
> >       if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
> > -                     page_shift == PAGE_SHIFT)
> > +                     (page_shift == PAGE_SHIFT && !PageCompound(pages[0])))
> >               return vmap_small_pages_range_noflush(addr, end, prot, pages);
> Hm.. If first few pages are order-0 and the rest are compound
> then we do nothing.

Currently the dma-buf system heap allocates in descending order, so
if page 0 is not part of a high-order allocation, page 1 will not be
either. However, I agree that we could extend support to this case.

>
> >
> > -     for (i = 0; i < nr; i += 1U << (page_shift - PAGE_SHIFT)) {
> > +     for (i = 0; i < nr; ) {
> > +             unsigned int shift = page_shift;
> >               int err;
> >
> > -             err = vmap_range_noflush(addr, addr + (1UL << page_shift),
> > +             /*
> > +              * For vmap() cases, page_shift is always PAGE_SHIFT, even
> > +              * if the pages are physically contiguous, they may still
> > +              * be mapped in a batch.
> > +              */
> > +             if (page_shift == PAGE_SHIFT)
> > +                     shift += get_vmap_batch_order(pages, nr - i, i);
> > +             err = vmap_range_noflush(addr, addr + (1UL << shift),
> >                                       page_to_phys(pages[i]), prot,
> > -                                     page_shift);
> > +                                     shift);
> >               if (err)
> >                       return err;
> >
> > -             addr += 1UL << page_shift;
> > +             addr += 1UL << shift;
> > +             i += 1U << (shift - PAGE_SHIFT);
> >       }
> >
> >       return 0;
> >
> > Does this look clearer?
> >
> The concern is we mix it with a huge page mapping path. If we want to batch
> v-mapping for page_shift == PAGE_SHIFT case, where "pages" array may contain
> compound pages(folio)(corner case to me), i think we should split it.

I agree this might not be common when the vmap buffer is only
used by the CPU. However, for GPUs, NPUs, and similar devices,
benefiting from larger mappings may be quite common.

Does the code below, which moves batched mapping to vmap(),
address both of your concerns?

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ecbac900c35f..782f2eac8a63 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3501,6 +3501,20 @@ void vunmap(const void *addr)
 }
 EXPORT_SYMBOL(vunmap);
 
+static inline int get_vmap_batch_order(struct page **pages,
+		unsigned int max_steps, unsigned int idx)
+{
+	unsigned int nr_pages;
+
+	nr_pages = compound_nr(pages[idx]);
+	if (nr_pages == 1 || max_steps < nr_pages)
+		return 0;
+
+	if (num_pages_contiguous(&pages[idx], nr_pages) == nr_pages)
+		return compound_order(pages[idx]);
+	return 0;
+}
+
 /**
  * vmap - map an array of pages into virtually contiguous space
  * @pages: array of page pointers
@@ -3544,10 +3558,21 @@ void *vmap(struct page **pages, unsigned int count,
 		return NULL;
 
 	addr = (unsigned long)area->addr;
-	if (vmap_pages_range(addr, addr + size, pgprot_nx(prot),
-				pages, PAGE_SHIFT) < 0) {
-		vunmap(area->addr);
-		return NULL;
+	for (unsigned int i = 0; i < count; ) {
+		unsigned int shift = PAGE_SHIFT;
+		int err;
+
+		shift += get_vmap_batch_order(pages, count - i, i);
+		err = vmap_range_noflush(addr, addr + (1UL << shift),
+				page_to_phys(pages[i]), pgprot_nx(prot),
+				shift);
+		if (err) {
+			vunmap(area->addr);
+			return NULL;
+		}
+
+		addr += 1UL << shift;
+		i += 1U << (shift - PAGE_SHIFT);
 	}
 
 	if (flags & VM_MAP_PUT_PAGES) {
-- 
2.48.1

Thanks
Barry
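
The descending-order allocation mentioned above (order 8, then 4, then 0) can be sketched in user space to show why a pages array from such an allocator never places an order-0 page ahead of a high-order block. model_orders and model_alloc_orders() are hypothetical names, and the sketch assumes every allocation attempt succeeds, which a real allocator does not guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Orders tried for each chunk, largest first (from the changelog). */
static const unsigned int model_orders[] = { 8, 4, 0 };

/*
 * Split nr_pages into chunks, always taking the largest order that
 * still fits. Writes each chunk's order to out[] and returns the
 * number of chunks written (at most max_out).
 */
size_t model_alloc_orders(size_t nr_pages, unsigned int *out, size_t max_out)
{
	size_t n = 0;

	assert(out != NULL);
	while (nr_pages > 0 && n < max_out) {
		for (size_t k = 0; k < sizeof(model_orders) / sizeof(model_orders[0]); k++) {
			size_t chunk = (size_t)1 << model_orders[k];

			/* Order 0 always fits, so this loop always breaks. */
			if (nr_pages >= chunk) {
				out[n++] = model_orders[k];
				nr_pages -= chunk;
				break;
			}
		}
	}
	return n;
}
```

Under this model, 289 pages split into orders 8, 4, 4, 0. Assuming 4 KiB pages, a 128MB buffer is 32768 pages and splits into 128 order-8 chunks, so a batched walk would take 128 mapping steps where a per-page walk takes 32768.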



* Re: [PATCH] mm/vmalloc: map contiguous pages in batches for vmap() whenever possible
  2025-12-23 21:23         ` Barry Song
@ 2026-01-05 16:28           ` Uladzislau Rezki
  2026-04-03  9:20             ` Barry Song
  0 siblings, 1 reply; 12+ messages in thread
From: Uladzislau Rezki @ 2026-01-05 16:28 UTC (permalink / raw)
  To: Barry Song
  Cc: urezki, akpm, david, dri-devel, jstultz, linaro-mm-sig,
	linux-kernel, linux-media, linux-mm, mripard, sumit.semwal,
	v-songbaohua, zhengtangquan

On Wed, Dec 24, 2025 at 10:23:34AM +1300, Barry Song wrote:
> > >  /*
> > >   * vmap_pages_range_noflush is similar to vmap_pages_range, but does not
> > >   * flush caches.
> > > @@ -658,20 +672,35 @@ int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
> > >
> > >       WARN_ON(page_shift < PAGE_SHIFT);
> > >
> > > +     /*
> > > +      * For vmap(), users may allocate pages from high orders down to
> > > +      * order 0, while always using PAGE_SHIFT as the page_shift.
> > > +      * We first check whether the initial page is a compound page. If so,
> > > +      * there may be an opportunity to batch multiple pages together.
> > > +      */
> > >       if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
> > > -                     page_shift == PAGE_SHIFT)
> > > +                     (page_shift == PAGE_SHIFT && !PageCompound(pages[0])))
> > >               return vmap_small_pages_range_noflush(addr, end, prot, pages);
> > Hm.. If first few pages are order-0 and the rest are compound
> > then we do nothing.
> 
> Now the dma-buf is allocated in descending order. If page0
> is not huge, page1 will not be either. However, I agree that
> we may extend support for this case.
> 
> >
> > >
> > > -     for (i = 0; i < nr; i += 1U << (page_shift - PAGE_SHIFT)) {
> > > +     for (i = 0; i < nr; ) {
> > > +             unsigned int shift = page_shift;
> > >               int err;
> > >
> > > -             err = vmap_range_noflush(addr, addr + (1UL << page_shift),
> > > +             /*
> > > +              * For vmap() cases, page_shift is always PAGE_SHIFT, even
> > > +              * if the pages are physically contiguous, they may still
> > > +              * be mapped in a batch.
> > > +              */
> > > +             if (page_shift == PAGE_SHIFT)
> > > +                     shift += get_vmap_batch_order(pages, nr - i, i);
> > > +             err = vmap_range_noflush(addr, addr + (1UL << shift),
> > >                                       page_to_phys(pages[i]), prot,
> > > -                                     page_shift);
> > > +                                     shift);
> > >               if (err)
> > >                       return err;
> > >
> > > -             addr += 1UL << page_shift;
> > > +             addr += 1UL  << shift;
> > > +             i += 1U << shift;
> > >       }
> > >
> > >       return 0;
> > >
> > > Does this look clearer?
> > >
I think so, at least the place:

<snip>
[    2.959030] Oops: Oops: 0000 [#66] SMP NOPTI
[    2.960004] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.18.0+ #220 PREEMPT(none)
[    2.961781] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[    2.963870] BUG: unable to handle page fault for address: ffffffff3fd68118
[    2.965383] #PF: supervisor read access in kernel mode
[    2.966532] #PF: error_code(0x0000) - not-present page
[    2.967682] BAD
<snip>

but it is broken for sure:

"i += 1U << shift" is wrong: "i" is an index into the page array, not
a byte offset. For example, for an order-0 page you jump 4096 indices
ahead (assuming 4K pages) instead of one.

Should be: i += 1U << (shift - PAGE_SHIFT)
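
To make the difference concrete, here is a small userspace sketch (not
kernel code). PAGE_SHIFT is assumed to be 12, and the three helper names
are made up for illustration only:

```c
#include <assert.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB base pages */

/* Index increment as posted: treats the byte shift as a page count. */
static inline unsigned int index_step_buggy(unsigned int shift)
{
	return 1U << shift;
}

/* Index increment as suggested: number of base pages covered by the mapping. */
static inline unsigned int index_step_fixed(unsigned int shift)
{
	assert(shift >= PAGE_SHIFT);
	return 1U << (shift - PAGE_SHIFT);
}

/* The virtual address, by contrast, does advance by the full byte size. */
static inline unsigned long addr_step(unsigned int shift)
{
	return 1UL << shift;
}
```

With shift == PAGE_SHIFT the posted increment skips 4096 array slots while
the suggested one skips exactly one; for an order-8 block
(shift == PAGE_SHIFT + 8) the suggested increment is 256 slots and the
address advances by 1 MiB.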

vmap_page_range() does flushing and has KMSAN instrumentation inside.
We should follow the same semantics. It also uses ioremap_max_page_shift
as its maximum page shift policy.

--
Uladzislau Rezki



* Re: [PATCH] mm/vmalloc: map contiguous pages in batches for vmap() whenever possible
  2026-01-05 16:28           ` Uladzislau Rezki
@ 2026-04-03  9:20             ` Barry Song
  2026-04-13 20:34               ` Barry Song (Xiaomi)
  0 siblings, 1 reply; 12+ messages in thread
From: Barry Song @ 2026-04-03  9:20 UTC (permalink / raw)
  To: urezki
  Cc: 21cnbao, akpm, david, dri-devel, jstultz, linaro-mm-sig,
	linux-kernel, linux-media, linux-mm, mripard, sumit.semwal,
	xueyuan.chen21


> I think so, at least the place:
> 
> <snip>
> [    2.959030] Oops: Oops: 0000 [#66] SMP NOPTI
> [    2.960004] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.18.0+ #220 PREEMPT(none)
> [    2.961781] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> [    2.963870] BUG: unable to handle page fault for address: ffffffff3fd68118
> [    2.965383] #PF: supervisor read access in kernel mode
> [    2.966532] #PF: error_code(0x0000) - not-present page
> [    2.967682] BAD
> <snip>
> 
> but it is broken for sure:

> i += 1U << shift - "i" is an index in the page array.
> For example if order-0 you jump 4096 indices ahead.

> Should be: i += 1U << (shift - PAGE_SHIFT)

You’re right! And sorry for the slow response—it’s been
three months since the last discussion.

> vmap_page_range() does flushing and it has instrumented KMSAN inside.
> We should follow same semantic. Also it uses ioremap_max_page_shift as
> maximum page shift policy.

Not quite sure if vmap() should follow ioremap()’s
ioremap_max_page_shift. If needed, it shouldn’t be
difficult to do so.

I have a version queued for testing (Xueyuan is working
hard on it). Meanwhile, if you have any comments, please
feel free to share.

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 57eae99d9909..8d449e78a07a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3513,6 +3513,60 @@ void vunmap(const void *addr)
 }
 EXPORT_SYMBOL(vunmap);
 
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+static inline int get_vmap_batch_order(struct page **pages,
+		unsigned int max_steps, unsigned int idx)
+{
+	unsigned int nr_pages;
+
+	if (ioremap_max_page_shift == PAGE_SHIFT)
+		return 0;
+
+	nr_pages = compound_nr(pages[idx]);
+	if (nr_pages == 1 || max_steps < nr_pages)
+		return 0;
+
+	if (num_pages_contiguous(&pages[idx], nr_pages) == nr_pages)
+		return compound_order(pages[idx]);
+	return 0;
+}
+#else
+static inline int get_vmap_batch_order(struct page **pages,
+		unsigned int max_steps, unsigned int idx)
+{
+	return 0;
+}
+#endif
+
+static int vmap_contig_pages_range(unsigned long addr, unsigned long end,
+		pgprot_t prot, struct page **pages)
+{
+	unsigned int count = (end - addr) >> PAGE_SHIFT;
+	int err;
+
+	err = kmsan_vmap_pages_range_noflush(addr, end, prot, pages,
+						PAGE_SHIFT, GFP_KERNEL);
+	if (err)
+		goto out;
+
+	for (unsigned int i = 0; i < count; ) {
+		unsigned int shift = PAGE_SHIFT;
+
+		shift += get_vmap_batch_order(pages, count - i, i);
+		err = vmap_range_noflush(addr, addr + (1UL << shift),
+				page_to_phys(pages[i]), prot, shift);
+		if (err)
+			goto out;
+
+		addr += 1UL  << shift;
+		i += 1U << (shift - PAGE_SHIFT);
+	}
+
+out:
+	flush_cache_vmap(addr, end);
+	return err;
+}
+
 /**
  * vmap - map an array of pages into virtually contiguous space
  * @pages: array of page pointers
@@ -3556,8 +3610,8 @@ void *vmap(struct page **pages, unsigned int count,
 		return NULL;
 
 	addr = (unsigned long)area->addr;
-	if (vmap_pages_range(addr, addr + size, pgprot_nx(prot),
-				pages, PAGE_SHIFT) < 0) {
+	if (vmap_contig_pages_range(addr, addr + size, pgprot_nx(prot),
+				pages) < 0) {
 		vunmap(area->addr);
 		return NULL;
 	}
-- 
2.39.3 (Apple Git-146)
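
A note on the "out:" path in vmap_contig_pages_range() above, offered as
an observation rather than a confirmed bug: the loop advances "addr", so
on the success path flush_cache_vmap(addr, end) appears to receive an
empty range. The userspace sketch below models only the address
arithmetic. The struct and both helper names are hypothetical, and
PAGE_SHIFT is assumed to be 12:

```c
#include <assert.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB base pages */

struct range {
	unsigned long start;
	unsigned long end;
};

/* Same shape as the posted loop: addr is advanced, then reused as the flush start. */
static inline struct range flush_range_as_posted(unsigned long addr, unsigned long end)
{
	while (addr < end)
		addr += 1UL << PAGE_SHIFT;
	return (struct range){ addr, end };
}

/* Possible alternative: remember the original start before the loop runs. */
static inline struct range flush_range_saved_start(unsigned long addr, unsigned long end)
{
	const unsigned long start = addr;

	while (addr < end)
		addr += 1UL << PAGE_SHIFT;
	return (struct range){ start, end };
}
```

If this reading is right, keeping a copy of the start address and passing
it to the flush would match what vmap_pages_range() does, since that
function never modifies its "addr" argument before flushing.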


* Re: [PATCH] mm/vmalloc: map contiguous pages in batches for vmap() whenever possible
  2026-04-03  9:20             ` Barry Song
@ 2026-04-13 20:34               ` Barry Song (Xiaomi)
  0 siblings, 0 replies; 12+ messages in thread
From: Barry Song (Xiaomi) @ 2026-04-13 20:34 UTC (permalink / raw)
  To: david, urezki
  Cc: baohua, 21cnbao, akpm, dri-devel, jstultz, linaro-mm-sig,
	linux-kernel, linux-media, linux-mm, mripard, sumit.semwal,
	xueyuan.chen21

>> vmap_page_range() does flushing and it has instrumented KMSAN inside.
>> We should follow same semantic. Also it uses ioremap_max_page_shift as
>> maximum page shift policy.
> 
> Not quite sure if vmap() should follow ioremap()’s
> ioremap_max_page_shift. If needed, it shouldn’t be
> difficult to do so.
> 
> I have a version queued for testing (Xueyuan is working
> hard on it). Meanwhile, if you have any comments, please
> feel free to share.
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 57eae99d9909..8d449e78a07a 100644

Hi Uladzislau, David,

As explained in [1], this standalone patch is withdrawn,
as I have moved to a series that addresses a broader set of
issues.
Sorry for any confusion this may have caused.

[1] https://lore.kernel.org/all/CAGsJ_4wCBeVfyFraj_dRdsUrSNqDG5a8SO9C3=PFRSt04dRvGw@mail.gmail.com/

Best Regards
Barry


* Re: [PATCH] mm/vmalloc: map contiguous pages in batches for vmap() whenever possible
  2025-12-15  5:30 [PATCH] mm/vmalloc: map contiguous pages in batches for vmap() whenever possible Barry Song
  2025-12-18 13:01 ` David Hildenbrand (Red Hat)
@ 2025-12-18 14:00 ` Uladzislau Rezki
  2025-12-18 20:05   ` Barry Song
  1 sibling, 1 reply; 12+ messages in thread
From: Uladzislau Rezki @ 2025-12-18 14:00 UTC (permalink / raw)
  To: Barry Song
  Cc: akpm, linux-mm, dri-devel, jstultz, linaro-mm-sig, linux-kernel,
	linux-media, Barry Song, David Hildenbrand, Uladzislau Rezki,
	Sumit Semwal, Maxime Ripard, Tangquan Zheng

On Mon, Dec 15, 2025 at 01:30:50PM +0800, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
> 
> In many cases, the pages passed to vmap() may include high-order
> pages allocated with __GFP_COMP flags. For example, the systemheap
> often allocates pages in descending order: order 8, then 4, then 0.
> Currently, vmap() iterates over every page individually—even pages
> inside a high-order block are handled one by one.
> 
> This patch detects high-order pages and maps them as a single
> contiguous block whenever possible.
> 
> An alternative would be to implement a new API, vmap_sg(), but that
> change seems to be large in scope.
> 
> When vmapping a 128MB dma-buf using the systemheap, this patch
> makes system_heap_do_vmap() roughly 17× faster.
> 
> W/ patch:
> [   10.404769] system_heap_do_vmap took 2494000 ns
> [   12.525921] system_heap_do_vmap took 2467008 ns
> [   14.517348] system_heap_do_vmap took 2471008 ns
> [   16.593406] system_heap_do_vmap took 2444000 ns
> [   19.501341] system_heap_do_vmap took 2489008 ns
> 
> W/o patch:
> [    7.413756] system_heap_do_vmap took 42626000 ns
> [    9.425610] system_heap_do_vmap took 42500992 ns
> [   11.810898] system_heap_do_vmap took 42215008 ns
> [   14.336790] system_heap_do_vmap took 42134992 ns
> [   16.373890] system_heap_do_vmap took 42750000 ns
> 
> Cc: David Hildenbrand <david@kernel.org>
> Cc: Uladzislau Rezki <urezki@gmail.com>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: John Stultz <jstultz@google.com>
> Cc: Maxime Ripard <mripard@kernel.org>
> Tested-by: Tangquan Zheng <zhengtangquan@oppo.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> ---
>  * diff with rfc:
>  Many code refinements based on David's suggestions, thanks!
>  Refine comment and changelog according to Uladzislau, thanks!
>  rfc link:
>  https://lore.kernel.org/linux-mm/20251122090343.81243-1-21cnbao@gmail.com/
> 
>  mm/vmalloc.c | 45 +++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 39 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 41dd01e8430c..8d577767a9e5 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -642,6 +642,29 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
>  	return err;
>  }
>  
> +static inline int get_vmap_batch_order(struct page **pages,
> +		unsigned int stride, unsigned int max_steps, unsigned int idx)
> +{
> +	int nr_pages = 1;
> +
> +	/*
> +	 * Currently, batching is only supported in vmap_pages_range
> +	 * when page_shift == PAGE_SHIFT.
> +	 */
> +	if (stride != 1)
> +		return 0;
> +
> +	nr_pages = compound_nr(pages[idx]);
> +	if (nr_pages == 1)
> +		return 0;
> +	if (max_steps < nr_pages)
> +		return 0;
> +
> +	if (num_pages_contiguous(&pages[idx], nr_pages) == nr_pages)
> +		return compound_order(pages[idx]);
> +	return 0;
> +}
> +
Can we instead look at this as: we may have a contiguous set of
pages, so let's find out? I mean, we do not have to stick just to
compound pages.

--
Uladzislau Rezki



* Re: [PATCH] mm/vmalloc: map contiguous pages in batches for vmap() whenever possible
  2025-12-18 14:00 ` Uladzislau Rezki
@ 2025-12-18 20:05   ` Barry Song
  2026-01-14 12:59     ` David Hildenbrand (Red Hat)
  0 siblings, 1 reply; 12+ messages in thread
From: Barry Song @ 2025-12-18 20:05 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: akpm, linux-mm, dri-devel, jstultz, linaro-mm-sig, linux-kernel,
	linux-media, Barry Song, David Hildenbrand, Sumit Semwal,
	Maxime Ripard, Tangquan Zheng

[...]
> >
> > +static inline int get_vmap_batch_order(struct page **pages,
> > +             unsigned int stride, unsigned int max_steps, unsigned int idx)
> > +{
> > +     int nr_pages = 1;
> > +
> > +     /*
> > +      * Currently, batching is only supported in vmap_pages_range
> > +      * when page_shift == PAGE_SHIFT.
> > +      */
> > +     if (stride != 1)
> > +             return 0;
> > +
> > +     nr_pages = compound_nr(pages[idx]);
> > +     if (nr_pages == 1)
> > +             return 0;
> > +     if (max_steps < nr_pages)
> > +             return 0;
> > +
> > +     if (num_pages_contiguous(&pages[idx], nr_pages) == nr_pages)
> > +             return compound_order(pages[idx]);
> > +     return 0;
> > +}
> > +
> Can we instead look at this as: we may have a contiguous set of
> pages, so let's find out? I mean, we do not have to stick just to
> compound pages.

We use PageCompound(pages[0]) and compound_nr() as quick
filters to skip checking the contiguous count, and this is
now the intended use case. Always checking contiguity might
cause a slight regression, I guess.
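
As a rough userspace model of that filter order (cheap compound check
first, contiguity scan only afterwards), something like the sketch below.
struct toy_page and the toy_* helpers are stand-ins invented for this
example and are not kernel APIs:

```c
#include <assert.h>

/* Toy stand-in for struct page: only the fields the filter needs. */
struct toy_page {
	unsigned long pfn;
	unsigned int order; /* non-zero only on the head of a compound page */
};

static inline unsigned int toy_compound_nr(const struct toy_page *page)
{
	return 1U << page->order;
}

/* Returns the order to batch at pages[idx], or 0 to map a single page. */
static inline unsigned int toy_batch_order(struct toy_page **pages,
		unsigned int max_steps, unsigned int idx)
{
	unsigned int nr_pages = toy_compound_nr(pages[idx]);
	unsigned int i;

	/* Quick filters: not a compound head, or not enough entries left. */
	if (nr_pages == 1 || max_steps < nr_pages)
		return 0;

	/* Only now pay for the scan: the array must list the whole block in order. */
	for (i = 1; i < nr_pages; i++)
		if (pages[idx + i]->pfn != pages[idx]->pfn + i)
			return 0;

	return pages[idx]->order;
}
```

For order-0 entries the function returns after a single comparison, which
is the property the quick filter is meant to preserve.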

BTW, do we have a strong use case where GFP_COMP or folio is
not used, yet the pages are physically contiguous?

Thanks
Barry



* Re: [PATCH] mm/vmalloc: map contiguous pages in batches for vmap() whenever possible
  2025-12-18 20:05   ` Barry Song
@ 2026-01-14 12:59     ` David Hildenbrand (Red Hat)
  0 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand (Red Hat) @ 2026-01-14 12:59 UTC (permalink / raw)
  To: Barry Song, Uladzislau Rezki
  Cc: akpm, linux-mm, dri-devel, jstultz, linaro-mm-sig, linux-kernel,
	linux-media, Barry Song, Sumit Semwal, Maxime Ripard,
	Tangquan Zheng

On 12/18/25 21:05, Barry Song wrote:
> [...]
>>>
>>> +static inline int get_vmap_batch_order(struct page **pages,
>>> +             unsigned int stride, unsigned int max_steps, unsigned int idx)
>>> +{
>>> +     int nr_pages = 1;
>>> +
>>> +     /*
>>> +      * Currently, batching is only supported in vmap_pages_range
>>> +      * when page_shift == PAGE_SHIFT.
>>> +      */
>>> +     if (stride != 1)
>>> +             return 0;
>>> +
>>> +     nr_pages = compound_nr(pages[idx]);
>>> +     if (nr_pages == 1)
>>> +             return 0;
>>> +     if (max_steps < nr_pages)
>>> +             return 0;
>>> +
>>> +     if (num_pages_contiguous(&pages[idx], nr_pages) == nr_pages)
>>> +             return compound_order(pages[idx]);
>>> +     return 0;
>>> +}
>>> +
>> Can we instead look at this as: we may have a contiguous set of
>> pages, so let's find out? I mean, we do not have to stick just to
>> compound pages.
> 
> We use PageCompound(pages[0]) and compound_nr() as quick
> filters to skip checking the contiguous count, and this is
> now the intended use case. Always checking contiguity might
> cause a slight regression, I guess.
> 
> BTW, do we have a strong use case where GFP_COMP or folio is
> not used, yet the pages are physically contiguous?

It usually happens by accident :)

E.g., allocate 2 pages and because we had to split an order-1 page into 
two order-0 pages, we get both of them.

Using num_pages_contiguous() only might indeed be nicer, but then we 
have to add some handling for getting aligned ranges (start and size 
aligned to order) ... so not sure if that is worth it.
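
To give a feel for the extra handling, a userspace sketch that works on
plain PFN arrays: first measure the contiguous run, then pick the largest
order for which the run is long enough and both the virtual page number
and the PFN are aligned. Both helper names and the max_order parameter
are assumptions made for this example, not existing kernel interfaces:

```c
#include <assert.h>
#include <stddef.h>

/* Number of entries, starting at pfns[0], whose PFNs increase by exactly one. */
static inline size_t contig_run(const unsigned long *pfns, size_t n)
{
	size_t i;

	if (n == 0)
		return 0;
	for (i = 1; i < n; i++)
		if (pfns[i] != pfns[0] + i)
			break;
	return i;
}

/*
 * Largest order (capped at max_order) such that 2^order pages fit in the
 * run and both vpn and pfn are aligned to 2^order pages.
 */
static inline unsigned int aligned_batch_order(unsigned long vpn,
		unsigned long pfn, size_t run, unsigned int max_order)
{
	unsigned int order = 0;

	while (order < max_order) {
		unsigned long next = 1UL << (order + 1);

		if (next > run || (pfn & (next - 1)) || (vpn & (next - 1)))
			break;
		order++;
	}
	return order;
}
```

A run of four pages starting at an aligned PFN yields order 2, while the
same run entered one page later yields order 0, so a caller would need to
map the unaligned head page by page before it can batch again.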

-- 
Cheers

David



end of thread, other threads:[~2026-04-13 20:34 UTC | newest]

Thread overview: 12+ messages
2025-12-15  5:30 [PATCH] mm/vmalloc: map contiguous pages in batches for vmap() whenever possible Barry Song
2025-12-18 13:01 ` David Hildenbrand (Red Hat)
2025-12-18 13:54   ` Uladzislau Rezki
2025-12-18 21:24     ` Barry Song
2025-12-22 13:08       ` Uladzislau Rezki
2025-12-23 21:23         ` Barry Song
2026-01-05 16:28           ` Uladzislau Rezki
2026-04-03  9:20             ` Barry Song
2026-04-13 20:34               ` Barry Song (Xiaomi)
2025-12-18 14:00 ` Uladzislau Rezki
2025-12-18 20:05   ` Barry Song
2026-01-14 12:59     ` David Hildenbrand (Red Hat)
