* [RFC PATCH 1/8] arm64/hugetlb: Extend batching of multiple CONT_PTE in a single PTE setup
2026-04-08 2:51 [RFC PATCH 0/8] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory Barry Song (Xiaomi)
@ 2026-04-08 2:51 ` Barry Song (Xiaomi)
2026-04-08 2:51 ` [RFC PATCH 2/8] arm64/vmalloc: Allow arch_vmap_pte_range_map_size to batch multiple CONT_PTE Barry Song (Xiaomi)
` (7 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Barry Song (Xiaomi) @ 2026-04-08 2:51 UTC (permalink / raw)
To: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki
Cc: linux-kernel, anshuman.khandual, ryan.roberts, ajd, rppt, david,
Xueyuan.chen21, Barry Song (Xiaomi)
For sizes aligned to CONT_PTE_SIZE and smaller than PMD_SIZE,
batch the CONT_PTE setup instead of handling each PTE individually.
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
arch/arm64/mm/hugetlbpage.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index a42c05cf5640..bf31c11ebd3b 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -110,6 +110,12 @@ static inline int num_contig_ptes(unsigned long size, size_t *pgsize)
contig_ptes = CONT_PTES;
break;
default:
+ if (size < CONT_PMD_SIZE && size > 0 &&
+ IS_ALIGNED(size, CONT_PTE_SIZE)) {
+ contig_ptes = size >> PAGE_SHIFT;
+ *pgsize = PAGE_SIZE;
+ break;
+ }
WARN_ON(!__hugetlb_valid_size(size));
}
@@ -359,6 +365,10 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
case CONT_PTE_SIZE:
return pte_mkcont(entry);
default:
+ if (pagesize < CONT_PMD_SIZE && pagesize > 0 &&
+ IS_ALIGNED(pagesize, CONT_PTE_SIZE))
+ return pte_mkcont(entry);
+
break;
}
pr_warn("%s: unrecognized huge page size 0x%lx\n",
--
2.39.3 (Apple Git-146)
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [RFC PATCH 2/8] arm64/vmalloc: Allow arch_vmap_pte_range_map_size to batch multiple CONT_PTE
2026-04-08 2:51 [RFC PATCH 0/8] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory Barry Song (Xiaomi)
2026-04-08 2:51 ` [RFC PATCH 1/8] arm64/hugetlb: Extend batching of multiple CONT_PTE in a single PTE setup Barry Song (Xiaomi)
@ 2026-04-08 2:51 ` Barry Song (Xiaomi)
2026-04-08 2:51 ` [RFC PATCH 3/8] mm/vmalloc: Extend vmap_small_pages_range_noflush() to support larger page_shift sizes Barry Song (Xiaomi)
` (6 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Barry Song (Xiaomi) @ 2026-04-08 2:51 UTC (permalink / raw)
To: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki
Cc: linux-kernel, anshuman.khandual, ryan.roberts, ajd, rppt, david,
Xueyuan.chen21, Barry Song (Xiaomi)
Allow arch_vmap_pte_range_map_size to batch multiple CONT_PTE hugepages,
reducing both PTE setup and TLB flush iterations.
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
arch/arm64/include/asm/vmalloc.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/vmalloc.h b/arch/arm64/include/asm/vmalloc.h
index 4ec1acd3c1b3..9eea06d0f75d 100644
--- a/arch/arm64/include/asm/vmalloc.h
+++ b/arch/arm64/include/asm/vmalloc.h
@@ -23,6 +23,8 @@ static inline unsigned long arch_vmap_pte_range_map_size(unsigned long addr,
unsigned long end, u64 pfn,
unsigned int max_page_shift)
{
+ unsigned long size;
+
/*
* If the block is at least CONT_PTE_SIZE in size, and is naturally
* aligned in both virtual and physical space, then we can pte-map the
@@ -40,7 +42,9 @@ static inline unsigned long arch_vmap_pte_range_map_size(unsigned long addr,
if (!IS_ALIGNED(PFN_PHYS(pfn), CONT_PTE_SIZE))
return PAGE_SIZE;
- return CONT_PTE_SIZE;
+ size = min3(end - addr, 1UL << max_page_shift, PMD_SIZE >> 1);
+ size = 1UL << (fls(size) - 1);
+ return size;
}
#define arch_vmap_pte_range_unmap_size arch_vmap_pte_range_unmap_size
--
2.39.3 (Apple Git-146)
* [RFC PATCH 3/8] mm/vmalloc: Extend vmap_small_pages_range_noflush() to support larger page_shift sizes
2026-04-08 2:51 [RFC PATCH 0/8] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory Barry Song (Xiaomi)
2026-04-08 2:51 ` [RFC PATCH 1/8] arm64/hugetlb: Extend batching of multiple CONT_PTE in a single PTE setup Barry Song (Xiaomi)
2026-04-08 2:51 ` [RFC PATCH 2/8] arm64/vmalloc: Allow arch_vmap_pte_range_map_size to batch multiple CONT_PTE Barry Song (Xiaomi)
@ 2026-04-08 2:51 ` Barry Song (Xiaomi)
2026-04-08 2:51 ` [RFC PATCH 4/8] mm/vmalloc: Eliminate page table zigzag for huge vmalloc mappings Barry Song (Xiaomi)
` (5 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Barry Song (Xiaomi) @ 2026-04-08 2:51 UTC (permalink / raw)
To: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki
Cc: linux-kernel, anshuman.khandual, ryan.roberts, ajd, rppt, david,
Xueyuan.chen21, Barry Song (Xiaomi)
vmap_small_pages_range_noflush() provides a clean interface by taking
struct page **pages and mapping them via direct PTE iteration. This
avoids the page table zigzag seen when using
vmap_range_noflush() for page_shift values other than PAGE_SHIFT.
Extend it to support larger page_shift values, and add PMD- and
contiguous-PTE mappings as well.
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
mm/vmalloc.c | 54 ++++++++++++++++++++++++++++++++++++++++------------
1 file changed, 42 insertions(+), 12 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 57eae99d9909..5bf072297536 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -524,8 +524,9 @@ void vunmap_range(unsigned long addr, unsigned long end)
static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
- pgtbl_mod_mask *mask)
+ pgtbl_mod_mask *mask, unsigned int shift)
{
+ unsigned int steps = 1;
int err = 0;
pte_t *pte;
@@ -543,6 +544,7 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
do {
struct page *page = pages[*nr];
+ steps = 1;
if (WARN_ON(!pte_none(ptep_get(pte)))) {
err = -EBUSY;
break;
@@ -556,9 +558,24 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
break;
}
+#ifdef CONFIG_HUGETLB_PAGE
+ if (shift != PAGE_SHIFT) {
+ unsigned long pfn = page_to_pfn(page), size;
+
+ size = arch_vmap_pte_range_map_size(addr, end, pfn, shift);
+ if (size != PAGE_SIZE) {
+ steps = size >> PAGE_SHIFT;
+ pte_t entry = pfn_pte(pfn, prot);
+
+ entry = arch_make_huge_pte(entry, ilog2(size), 0);
+ set_huge_pte_at(&init_mm, addr, pte, entry, size);
+ continue;
+ }
+ }
+#endif
+
set_pte_at(&init_mm, addr, pte, mk_pte(page, prot));
- (*nr)++;
- } while (pte++, addr += PAGE_SIZE, addr != end);
+ } while (pte += steps, *nr += steps, addr += PAGE_SIZE * steps, addr != end);
lazy_mmu_mode_disable();
*mask |= PGTBL_PTE_MODIFIED;
@@ -568,7 +585,7 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
static int vmap_pages_pmd_range(pud_t *pud, unsigned long addr,
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
- pgtbl_mod_mask *mask)
+ pgtbl_mod_mask *mask, unsigned int shift)
{
pmd_t *pmd;
unsigned long next;
@@ -578,7 +595,20 @@ static int vmap_pages_pmd_range(pud_t *pud, unsigned long addr,
return -ENOMEM;
do {
next = pmd_addr_end(addr, end);
- if (vmap_pages_pte_range(pmd, addr, next, prot, pages, nr, mask))
+
+ if (shift == PMD_SHIFT) {
+ struct page *page = pages[*nr];
+ phys_addr_t phys_addr = page_to_phys(page);
+
+ if (vmap_try_huge_pmd(pmd, addr, next, phys_addr, prot,
+ shift)) {
+ *mask |= PGTBL_PMD_MODIFIED;
+ *nr += 1 << (shift - PAGE_SHIFT);
+ continue;
+ }
+ }
+
+ if (vmap_pages_pte_range(pmd, addr, next, prot, pages, nr, mask, shift))
return -ENOMEM;
} while (pmd++, addr = next, addr != end);
return 0;
@@ -586,7 +616,7 @@ static int vmap_pages_pmd_range(pud_t *pud, unsigned long addr,
static int vmap_pages_pud_range(p4d_t *p4d, unsigned long addr,
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
- pgtbl_mod_mask *mask)
+ pgtbl_mod_mask *mask, unsigned int shift)
{
pud_t *pud;
unsigned long next;
@@ -596,7 +626,7 @@ static int vmap_pages_pud_range(p4d_t *p4d, unsigned long addr,
return -ENOMEM;
do {
next = pud_addr_end(addr, end);
- if (vmap_pages_pmd_range(pud, addr, next, prot, pages, nr, mask))
+ if (vmap_pages_pmd_range(pud, addr, next, prot, pages, nr, mask, shift))
return -ENOMEM;
} while (pud++, addr = next, addr != end);
return 0;
@@ -604,7 +634,7 @@ static int vmap_pages_pud_range(p4d_t *p4d, unsigned long addr,
static int vmap_pages_p4d_range(pgd_t *pgd, unsigned long addr,
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
- pgtbl_mod_mask *mask)
+ pgtbl_mod_mask *mask, unsigned int shift)
{
p4d_t *p4d;
unsigned long next;
@@ -614,14 +644,14 @@ static int vmap_pages_p4d_range(pgd_t *pgd, unsigned long addr,
return -ENOMEM;
do {
next = p4d_addr_end(addr, end);
- if (vmap_pages_pud_range(p4d, addr, next, prot, pages, nr, mask))
+ if (vmap_pages_pud_range(p4d, addr, next, prot, pages, nr, mask, shift))
return -ENOMEM;
} while (p4d++, addr = next, addr != end);
return 0;
}
static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
- pgprot_t prot, struct page **pages)
+ pgprot_t prot, struct page **pages, unsigned int shift)
{
unsigned long start = addr;
pgd_t *pgd;
@@ -636,7 +666,7 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
next = pgd_addr_end(addr, end);
if (pgd_bad(*pgd))
mask |= PGTBL_PGD_MODIFIED;
- err = vmap_pages_p4d_range(pgd, addr, next, prot, pages, &nr, &mask);
+ err = vmap_pages_p4d_range(pgd, addr, next, prot, pages, &nr, &mask, shift);
if (err)
break;
} while (pgd++, addr = next, addr != end);
@@ -665,7 +695,7 @@ int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
page_shift == PAGE_SHIFT)
- return vmap_small_pages_range_noflush(addr, end, prot, pages);
+ return vmap_small_pages_range_noflush(addr, end, prot, pages, PAGE_SHIFT);
for (i = 0; i < nr; i += 1U << (page_shift - PAGE_SHIFT)) {
int err;
--
2.39.3 (Apple Git-146)
* [RFC PATCH 4/8] mm/vmalloc: Eliminate page table zigzag for huge vmalloc mappings
2026-04-08 2:51 [RFC PATCH 0/8] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory Barry Song (Xiaomi)
` (2 preceding siblings ...)
2026-04-08 2:51 ` [RFC PATCH 3/8] mm/vmalloc: Extend vmap_small_pages_range_noflush() to support larger page_shift sizes Barry Song (Xiaomi)
@ 2026-04-08 2:51 ` Barry Song (Xiaomi)
2026-04-08 2:51 ` [RFC PATCH 5/8] mm/vmalloc: map contiguous pages in batches for vmap() if possible Barry Song (Xiaomi)
` (4 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Barry Song (Xiaomi) @ 2026-04-08 2:51 UTC (permalink / raw)
To: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki
Cc: linux-kernel, anshuman.khandual, ryan.roberts, ajd, rppt, david,
Xueyuan.chen21, Barry Song (Xiaomi)
For vmalloc() allocations with VM_ALLOW_HUGE_VMAP, we no longer
need to iterate over pages one by one, which would otherwise lead to
zigzag page table mappings.
The code is now unified with the PAGE_SHIFT case by simply
calling vmap_small_pages_range_noflush().
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
mm/vmalloc.c | 22 ++++------------------
1 file changed, 4 insertions(+), 18 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 5bf072297536..eba436386929 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -689,27 +689,13 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
pgprot_t prot, struct page **pages, unsigned int page_shift)
{
- unsigned int i, nr = (end - addr) >> PAGE_SHIFT;
-
WARN_ON(page_shift < PAGE_SHIFT);
- if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
- page_shift == PAGE_SHIFT)
- return vmap_small_pages_range_noflush(addr, end, prot, pages, PAGE_SHIFT);
-
- for (i = 0; i < nr; i += 1U << (page_shift - PAGE_SHIFT)) {
- int err;
-
- err = vmap_range_noflush(addr, addr + (1UL << page_shift),
- page_to_phys(pages[i]), prot,
- page_shift);
- if (err)
- return err;
+ if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC))
+ page_shift = PAGE_SHIFT;
- addr += 1UL << page_shift;
- }
-
- return 0;
+ return vmap_small_pages_range_noflush(addr, end, prot, pages,
+ min(page_shift, PMD_SHIFT));
}
int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
--
2.39.3 (Apple Git-146)
* [RFC PATCH 5/8] mm/vmalloc: map contiguous pages in batches for vmap() if possible
2026-04-08 2:51 [RFC PATCH 0/8] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory Barry Song (Xiaomi)
` (3 preceding siblings ...)
2026-04-08 2:51 ` [RFC PATCH 4/8] mm/vmalloc: Eliminate page table zigzag for huge vmalloc mappings Barry Song (Xiaomi)
@ 2026-04-08 2:51 ` Barry Song (Xiaomi)
2026-04-08 4:19 ` Dev Jain
2026-04-08 2:51 ` [RFC PATCH 6/8] mm/vmalloc: align vm_area so vmap() can batch mappings Barry Song (Xiaomi)
` (3 subsequent siblings)
8 siblings, 1 reply; 12+ messages in thread
From: Barry Song (Xiaomi) @ 2026-04-08 2:51 UTC (permalink / raw)
To: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki
Cc: linux-kernel, anshuman.khandual, ryan.roberts, ajd, rppt, david,
Xueyuan.chen21, Barry Song (Xiaomi)
In many cases, the pages passed to vmap() may include high-order
pages allocated with __GFP_COMP. For example, the system heap
often allocates pages in descending order: order 8, then 4, then 0.
Currently, vmap() iterates over every page individually; even pages
inside a high-order block are handled one by one.
This patch detects high-order pages and maps them as a single
contiguous block whenever possible.
An alternative would be to implement a new API, vmap_sg(), but that
change would be larger in scope.
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
mm/vmalloc.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 49 insertions(+), 2 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index eba436386929..e8dbfada42bc 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3529,6 +3529,53 @@ void vunmap(const void *addr)
}
EXPORT_SYMBOL(vunmap);
+static inline int get_vmap_batch_order(struct page **pages,
+ unsigned int max_steps, unsigned int idx)
+{
+ unsigned int nr_pages;
+
+ if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP) ||
+ ioremap_max_page_shift == PAGE_SHIFT)
+ return 0;
+
+ nr_pages = compound_nr(pages[idx]);
+ if (nr_pages == 1 || max_steps < nr_pages)
+ return 0;
+
+ if (num_pages_contiguous(&pages[idx], nr_pages) == nr_pages)
+ return compound_order(pages[idx]);
+ return 0;
+}
+
+static int vmap_contig_pages_range(unsigned long addr, unsigned long end,
+ pgprot_t prot, struct page **pages)
+{
+ unsigned int count = (end - addr) >> PAGE_SHIFT;
+ int err;
+
+ err = kmsan_vmap_pages_range_noflush(addr, end, prot, pages,
+ PAGE_SHIFT, GFP_KERNEL);
+ if (err)
+ goto out;
+
+ for (unsigned int i = 0; i < count; ) {
+ unsigned int shift = PAGE_SHIFT +
+ get_vmap_batch_order(pages, count - i, i);
+
+ err = vmap_range_noflush(addr, addr + (1UL << shift),
+ page_to_phys(pages[i]), prot, shift);
+ if (err)
+ goto out;
+
+ addr += 1UL << shift;
+ i += 1U << (shift - PAGE_SHIFT);
+ }
+
+out:
+ flush_cache_vmap(addr, end);
+ return err;
+}
+
/**
* vmap - map an array of pages into virtually contiguous space
* @pages: array of page pointers
@@ -3572,8 +3619,8 @@ void *vmap(struct page **pages, unsigned int count,
return NULL;
addr = (unsigned long)area->addr;
- if (vmap_pages_range(addr, addr + size, pgprot_nx(prot),
- pages, PAGE_SHIFT) < 0) {
+ if (vmap_contig_pages_range(addr, addr + size, pgprot_nx(prot),
+ pages) < 0) {
vunmap(area->addr);
return NULL;
}
--
2.39.3 (Apple Git-146)
* Re: [RFC PATCH 5/8] mm/vmalloc: map contiguous pages in batches for vmap() if possible
2026-04-08 2:51 ` [RFC PATCH 5/8] mm/vmalloc: map contiguous pages in batches for vmap() if possible Barry Song (Xiaomi)
@ 2026-04-08 4:19 ` Dev Jain
2026-04-08 5:12 ` Barry Song
0 siblings, 1 reply; 12+ messages in thread
From: Dev Jain @ 2026-04-08 4:19 UTC (permalink / raw)
To: Barry Song (Xiaomi), linux-mm, linux-arm-kernel, catalin.marinas,
will, akpm, urezki
Cc: linux-kernel, anshuman.khandual, ryan.roberts, ajd, rppt, david,
Xueyuan.chen21
On 08/04/26 8:21 am, Barry Song (Xiaomi) wrote:
> In many cases, the pages passed to vmap() may include high-order
> pages allocated with __GFP_COMP flags. For example, the systemheap
> often allocates pages in descending order: order 8, then 4, then 0.
> Currently, vmap() iterates over every page individually—even pages
> inside a high-order block are handled one by one.
>
> This patch detects high-order pages and maps them as a single
> contiguous block whenever possible.
>
> An alternative would be to implement a new API, vmap_sg(), but that
> change seems to be large in scope.
>
> Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> ---
Coincidentally, I was working on the same thing :)
We have a usecase regarding Arm TRBE and SPE aux buffers.
I'll take a look at your patches later, but my implementation is the
following, if you have any comments. I have squashed the patches into
a single diff.
From ccb9670a52b7f50b1f1e07b579a1316f76b84811 Mon Sep 17 00:00:00 2001
From: Dev Jain <dev.jain@arm.com>
Date: Thu, 26 Feb 2026 16:21:29 +0530
Subject: [PATCH] arm64/perf: map AUX buffer with large pages
Signed-off-by: Dev Jain <dev.jain@arm.com>
---
.../hwtracing/coresight/coresight-etm-perf.c | 3 +-
drivers/hwtracing/coresight/coresight-trbe.c | 3 +-
drivers/perf/arm_spe_pmu.c | 5 +-
mm/vmalloc.c | 86 ++++++++++++++++---
4 files changed, 79 insertions(+), 18 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index 72017dcc3b7f1..e90a430af86bb 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -984,7 +984,8 @@ int __init etm_perf_init(void)
etm_pmu.capabilities = (PERF_PMU_CAP_EXCLUSIVE |
PERF_PMU_CAP_ITRACE |
- PERF_PMU_CAP_AUX_PAUSE);
+ PERF_PMU_CAP_AUX_PAUSE |
+ PERF_PMU_CAP_AUX_PREFER_LARGE);
etm_pmu.attr_groups = etm_pmu_attr_groups;
etm_pmu.task_ctx_nr = perf_sw_context;
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 1511f8eb95afb..74e6ad891e236 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -760,7 +760,8 @@ static void *arm_trbe_alloc_buffer(struct coresight_device *csdev,
for (i = 0; i < nr_pages; i++)
pglist[i] = virt_to_page(pages[i]);
- buf->trbe_base = (unsigned long)vmap(pglist, nr_pages, VM_MAP, PAGE_KERNEL);
+ buf->trbe_base = (unsigned long)vmap(pglist, nr_pages,
+ VM_MAP | VM_ALLOW_HUGE_VMAP, PAGE_KERNEL);
if (!buf->trbe_base) {
kfree(pglist);
kfree(buf);
diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index dbd0da1116390..90c349fd66b2c 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -1027,7 +1027,7 @@ static void *arm_spe_pmu_setup_aux(struct perf_event *event, void **pages,
for (i = 0; i < nr_pages; ++i)
pglist[i] = virt_to_page(pages[i]);
- buf->base = vmap(pglist, nr_pages, VM_MAP, PAGE_KERNEL);
+ buf->base = vmap(pglist, nr_pages, VM_MAP | VM_ALLOW_HUGE_VMAP, PAGE_KERNEL);
if (!buf->base)
goto out_free_pglist;
@@ -1064,7 +1064,8 @@ static int arm_spe_pmu_perf_init(struct arm_spe_pmu *spe_pmu)
spe_pmu->pmu = (struct pmu) {
.module = THIS_MODULE,
.parent = &spe_pmu->pdev->dev,
- .capabilities = PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_ITRACE,
+ .capabilities = PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_ITRACE |
+ PERF_PMU_CAP_AUX_PREFER_LARGE,
.attr_groups = arm_spe_pmu_attr_groups,
/*
* We hitch a ride on the software context here, so that
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 61caa55a44027..8482463d41203 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -660,14 +660,14 @@ int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
pgprot_t prot, struct page **pages, unsigned int page_shift)
{
unsigned int i, nr = (end - addr) >> PAGE_SHIFT;
-
+ unsigned long step = 1UL << (page_shift - PAGE_SHIFT);
WARN_ON(page_shift < PAGE_SHIFT);
if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
page_shift == PAGE_SHIFT)
return vmap_small_pages_range_noflush(addr, end, prot, pages);
- for (i = 0; i < nr; i += 1U << (page_shift - PAGE_SHIFT)) {
+ for (i = 0; i < ALIGN_DOWN(nr, step); i += step) {
int err;
err = vmap_range_noflush(addr, addr + (1UL << page_shift),
@@ -678,8 +678,9 @@ int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
addr += 1UL << page_shift;
}
-
- return 0;
+ if (IS_ALIGNED(nr, step))
+ return 0;
+ return vmap_small_pages_range_noflush(addr, end, prot, pages + i);
}
int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
@@ -3514,6 +3515,50 @@ void vunmap(const void *addr)
}
EXPORT_SYMBOL(vunmap);
+static inline unsigned int vm_shift(pgprot_t prot, unsigned long size)
+{
+ if (arch_vmap_pmd_supported(prot) && size >= PMD_SIZE)
+ return PMD_SHIFT;
+
+ return arch_vmap_pte_supported_shift(size);
+}
+
+static inline int __vmap_huge(struct page **pages, pgprot_t prot,
+ unsigned long addr, unsigned int count)
+{
+ unsigned int i = 0;
+ unsigned int shift;
+ unsigned long nr;
+
+ while (i < count) {
+ nr = num_pages_contiguous(pages + i, count - i);
+ shift = vm_shift(prot, nr << PAGE_SHIFT);
+ if (vmap_pages_range(addr, addr + (nr << PAGE_SHIFT),
+ pgprot_nx(prot), pages + i, shift) < 0) {
+ return 1;
+ }
+ i += nr;
+ addr += (nr << PAGE_SHIFT);
+ }
+ return 0;
+}
+
+static unsigned long max_contiguous_stride_order(struct page **pages,
+ pgprot_t prot, unsigned int count)
+{
+ unsigned long max_shift = PAGE_SHIFT;
+ unsigned int i = 0;
+
+ while (i < count) {
+ unsigned long nr = num_pages_contiguous(pages + i, count - i);
+ unsigned long shift = vm_shift(prot, nr << PAGE_SHIFT);
+
+ max_shift = max(max_shift, shift);
+ i += nr;
+ }
+ return max_shift;
+}
+
/**
* vmap - map an array of pages into virtually contiguous space
* @pages: array of page pointers
@@ -3552,15 +3597,32 @@ void *vmap(struct page **pages, unsigned int count,
return NULL;
size = (unsigned long)count << PAGE_SHIFT;
- area = get_vm_area_caller(size, flags, __builtin_return_address(0));
+ if (flags & VM_ALLOW_HUGE_VMAP) {
+ /* determine from page array, the max alignment */
+ unsigned long max_shift = max_contiguous_stride_order(pages, prot, count);
+
+ area = __get_vm_area_node(size, 1 << max_shift, max_shift, flags,
+ VMALLOC_START, VMALLOC_END, NUMA_NO_NODE,
+ GFP_KERNEL, __builtin_return_address(0));
+ } else {
+ area = get_vm_area_caller(size, flags, __builtin_return_address(0));
+ }
if (!area)
return NULL;
addr = (unsigned long)area->addr;
- if (vmap_pages_range(addr, addr + size, pgprot_nx(prot),
- pages, PAGE_SHIFT) < 0) {
- vunmap(area->addr);
- return NULL;
+
+ if (flags & VM_ALLOW_HUGE_VMAP) {
+ if (__vmap_huge(pages, prot, addr, count)) {
+ vunmap(area->addr);
+ return NULL;
+ }
+ } else {
+ if (vmap_pages_range(addr, addr + size, pgprot_nx(prot),
+ pages, PAGE_SHIFT) < 0) {
+ vunmap(area->addr);
+ return NULL;
+ }
}
if (flags & VM_MAP_PUT_PAGES) {
@@ -4011,11 +4073,7 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
* their allocations due to apply_to_page_range not
* supporting them.
*/
-
- if (arch_vmap_pmd_supported(prot) && size >= PMD_SIZE)
- shift = PMD_SHIFT;
- else
- shift = arch_vmap_pte_supported_shift(size);
+ shift = vm_shift(prot, size);
align = max(original_align, 1UL << shift);
}
--
2.34.1
> mm/vmalloc.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 49 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index eba436386929..e8dbfada42bc 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3529,6 +3529,53 @@ void vunmap(const void *addr)
> }
> EXPORT_SYMBOL(vunmap);
>
> +static inline int get_vmap_batch_order(struct page **pages,
> + unsigned int max_steps, unsigned int idx)
> +{
> + unsigned int nr_pages;
> +
> + if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP) ||
> + ioremap_max_page_shift == PAGE_SHIFT)
> + return 0;
> +
> + nr_pages = compound_nr(pages[idx]);
> + if (nr_pages == 1 || max_steps < nr_pages)
> + return 0;
> +
> + if (num_pages_contiguous(&pages[idx], nr_pages) == nr_pages)
> + return compound_order(pages[idx]);
> + return 0;
> +}
> +
> +static int vmap_contig_pages_range(unsigned long addr, unsigned long end,
> + pgprot_t prot, struct page **pages)
> +{
> + unsigned int count = (end - addr) >> PAGE_SHIFT;
> + int err;
> +
> + err = kmsan_vmap_pages_range_noflush(addr, end, prot, pages,
> + PAGE_SHIFT, GFP_KERNEL);
> + if (err)
> + goto out;
> +
> + for (unsigned int i = 0; i < count; ) {
> + unsigned int shift = PAGE_SHIFT +
> + get_vmap_batch_order(pages, count - i, i);
> +
> + err = vmap_range_noflush(addr, addr + (1UL << shift),
> + page_to_phys(pages[i]), prot, shift);
> + if (err)
> + goto out;
> +
> + addr += 1UL << shift;
> + i += 1U << (shift - PAGE_SHIFT);
> + }
> +
> +out:
> + flush_cache_vmap(addr, end);
> + return err;
> +}
> +
> /**
> * vmap - map an array of pages into virtually contiguous space
> * @pages: array of page pointers
> @@ -3572,8 +3619,8 @@ void *vmap(struct page **pages, unsigned int count,
> return NULL;
>
> addr = (unsigned long)area->addr;
> - if (vmap_pages_range(addr, addr + size, pgprot_nx(prot),
> - pages, PAGE_SHIFT) < 0) {
> + if (vmap_contig_pages_range(addr, addr + size, pgprot_nx(prot),
> + pages) < 0) {
> vunmap(area->addr);
> return NULL;
> }
* Re: [RFC PATCH 5/8] mm/vmalloc: map contiguous pages in batches for vmap() if possible
2026-04-08 4:19 ` Dev Jain
@ 2026-04-08 5:12 ` Barry Song
0 siblings, 0 replies; 12+ messages in thread
From: Barry Song @ 2026-04-08 5:12 UTC (permalink / raw)
To: Dev Jain
Cc: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki,
linux-kernel, anshuman.khandual, ryan.roberts, ajd, rppt, david,
Xueyuan.chen21
On Wed, Apr 8, 2026 at 12:20 PM Dev Jain <dev.jain@arm.com> wrote:
>
>
>
> On 08/04/26 8:21 am, Barry Song (Xiaomi) wrote:
> > In many cases, the pages passed to vmap() may include high-order
> > pages allocated with __GFP_COMP flags. For example, the systemheap
> > often allocates pages in descending order: order 8, then 4, then 0.
> > Currently, vmap() iterates over every page individually—even pages
> > inside a high-order block are handled one by one.
> >
> > This patch detects high-order pages and maps them as a single
> > contiguous block whenever possible.
> >
> > An alternative would be to implement a new API, vmap_sg(), but that
> > change seems to be large in scope.
> >
> > Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> > ---
>
> Coincidentally, I was working on the same thing :)
Interesting, thanks — at least I’ve got one good reviewer :-)
>
> We have a usecase regarding Arm TRBE and SPE aux buffers.
>
> I'll take a look at your patches later, but my implementation is the
Yes. Please.
> following, if you have any comments. I have squashed the patches into
> a single diff.
Thanks very much, Dev. What you’ve done is quite similar to
patches 5/8 and 6/8, although the code differs somewhat.
>
>
>
> From ccb9670a52b7f50b1f1e07b579a1316f76b84811 Mon Sep 17 00:00:00 2001
> From: Dev Jain <dev.jain@arm.com>
> Date: Thu, 26 Feb 2026 16:21:29 +0530
> Subject: [PATCH] arm64/perf: map AUX buffer with large pages
>
> Signed-off-by: Dev Jain <dev.jain@arm.com>
> ---
> .../hwtracing/coresight/coresight-etm-perf.c | 3 +-
> drivers/hwtracing/coresight/coresight-trbe.c | 3 +-
> drivers/perf/arm_spe_pmu.c | 5 +-
> mm/vmalloc.c | 86 ++++++++++++++++---
> 4 files changed, 79 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> index 72017dcc3b7f1..e90a430af86bb 100644
> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> @@ -984,7 +984,8 @@ int __init etm_perf_init(void)
>
> etm_pmu.capabilities = (PERF_PMU_CAP_EXCLUSIVE |
> PERF_PMU_CAP_ITRACE |
> - PERF_PMU_CAP_AUX_PAUSE);
> + PERF_PMU_CAP_AUX_PAUSE |
> + PERF_PMU_CAP_AUX_PREFER_LARGE);
>
> etm_pmu.attr_groups = etm_pmu_attr_groups;
> etm_pmu.task_ctx_nr = perf_sw_context;
> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> index 1511f8eb95afb..74e6ad891e236 100644
> --- a/drivers/hwtracing/coresight/coresight-trbe.c
> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> @@ -760,7 +760,8 @@ static void *arm_trbe_alloc_buffer(struct coresight_device *csdev,
> for (i = 0; i < nr_pages; i++)
> pglist[i] = virt_to_page(pages[i]);
>
> - buf->trbe_base = (unsigned long)vmap(pglist, nr_pages, VM_MAP, PAGE_KERNEL);
> + buf->trbe_base = (unsigned long)vmap(pglist, nr_pages,
> + VM_MAP | VM_ALLOW_HUGE_VMAP, PAGE_KERNEL);
> if (!buf->trbe_base) {
> kfree(pglist);
> kfree(buf);
> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> index dbd0da1116390..90c349fd66b2c 100644
> --- a/drivers/perf/arm_spe_pmu.c
> +++ b/drivers/perf/arm_spe_pmu.c
> @@ -1027,7 +1027,7 @@ static void *arm_spe_pmu_setup_aux(struct perf_event *event, void **pages,
> for (i = 0; i < nr_pages; ++i)
> pglist[i] = virt_to_page(pages[i]);
>
> - buf->base = vmap(pglist, nr_pages, VM_MAP, PAGE_KERNEL);
> + buf->base = vmap(pglist, nr_pages, VM_MAP | VM_ALLOW_HUGE_VMAP, PAGE_KERNEL);
> if (!buf->base)
> goto out_free_pglist;
>
> @@ -1064,7 +1064,8 @@ static int arm_spe_pmu_perf_init(struct arm_spe_pmu *spe_pmu)
> spe_pmu->pmu = (struct pmu) {
> .module = THIS_MODULE,
> .parent = &spe_pmu->pdev->dev,
> - .capabilities = PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_ITRACE,
> + .capabilities = PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_ITRACE |
> + PERF_PMU_CAP_AUX_PREFER_LARGE,
> .attr_groups = arm_spe_pmu_attr_groups,
> /*
> * We hitch a ride on the software context here, so that
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 61caa55a44027..8482463d41203 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -660,14 +660,14 @@ int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
> pgprot_t prot, struct page **pages, unsigned int page_shift)
> {
> unsigned int i, nr = (end - addr) >> PAGE_SHIFT;
> -
> + unsigned long step = 1UL << (page_shift - PAGE_SHIFT);
> WARN_ON(page_shift < PAGE_SHIFT);
>
> if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
> page_shift == PAGE_SHIFT)
> return vmap_small_pages_range_noflush(addr, end, prot, pages);
>
> - for (i = 0; i < nr; i += 1U << (page_shift - PAGE_SHIFT)) {
> + for (i = 0; i < ALIGN_DOWN(nr, step); i += step) {
> int err;
>
> err = vmap_range_noflush(addr, addr + (1UL << page_shift),
> @@ -678,8 +678,9 @@ int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
>
> addr += 1UL << page_shift;
> }
> -
> - return 0;
> + if (IS_ALIGNED(nr, step))
> + return 0;
> + return vmap_small_pages_range_noflush(addr, end, prot, pages + i);
> }
>
> int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
> @@ -3514,6 +3515,50 @@ void vunmap(const void *addr)
> }
> EXPORT_SYMBOL(vunmap);
>
> +static inline unsigned int vm_shift(pgprot_t prot, unsigned long size)
> +{
> + if (arch_vmap_pmd_supported(prot) && size >= PMD_SIZE)
> + return PMD_SHIFT;
> +
> + return arch_vmap_pte_supported_shift(size);
> +}
> +
> +static inline int __vmap_huge(struct page **pages, pgprot_t prot,
> + unsigned long addr, unsigned int count)
> +{
> + unsigned int i = 0;
> + unsigned int shift;
> + unsigned long nr;
> +
> + while (i < count) {
> + nr = num_pages_contiguous(pages + i, count - i);
> + shift = vm_shift(prot, nr << PAGE_SHIFT);
> + if (vmap_pages_range(addr, addr + (nr << PAGE_SHIFT),
> + pgprot_nx(prot), pages + i, shift) < 0) {
> + return 1;
> + }
One observation on my side is that the performance gain is somewhat
offset by the page table zigzagging caused by what you are doing here:
iterating over each memory segment with vmap_pages_range().
In patch 3/8, I enhanced vmap_small_pages_range_noflush() to
avoid repeated pgd → p4d → pud → pmd → pte traversals for page
shifts other than PAGE_SHIFT. This improves performance for
vmalloc as well as vmap(). Then, in patch 7/8, I adopt the new
vmap_small_pages_range_noflush() and eliminate the iteration.
> + i += nr;
> + addr += (nr << PAGE_SHIFT);
> + }
> + return 0;
> +}
> +
> +static unsigned long max_contiguous_stride_order(struct page **pages,
> + pgprot_t prot, unsigned int count)
> +{
> + unsigned long max_shift = PAGE_SHIFT;
> + unsigned int i = 0;
> +
> + while (i < count) {
> + unsigned long nr = num_pages_contiguous(pages + i, count - i);
> + unsigned long shift = vm_shift(prot, nr << PAGE_SHIFT);
> +
> + max_shift = max(max_shift, shift);
> + i += nr;
> + }
> + return max_shift;
> +}
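[Editor's note] Both __vmap_huge() and max_contiguous_stride_order() above follow the same pattern: scan the page array for physically contiguous runs, then pick a mapping shift per run. A minimal user-space sketch of that pattern, using plain pfns instead of struct page pointers and assumed arm64 4K-granule constants (num_pages_contiguous() and vm_shift() in the kernel are more involved):

```c
#include <assert.h>
#include <stddef.h>

/* Assumed arm64 4K-granule constants; the kernel derives these per config. */
#define PAGE_SHIFT     12
#define CONT_PTE_SHIFT 16   /* 16 contiguous 4K PTEs = 64KB */
#define PMD_SHIFT      21   /* 2MB */

/* Length of the leading physically contiguous run, in the spirit of
 * num_pages_contiguous() (which walks struct page pointers; plain pfns
 * stand in here). Callers must pass count > 0. */
static size_t contig_run(const unsigned long *pfn, size_t count)
{
	size_t n = 1;

	while (n < count && pfn[n] == pfn[0] + n)
		n++;
	return n;
}

/* Crude stand-in for vm_shift(): PMD mapping if the run covers one,
 * else CONT_PTE, else a plain PTE. */
static unsigned int run_shift(size_t nr_pages)
{
	unsigned long size = (unsigned long)nr_pages << PAGE_SHIFT;

	if (size >= (1UL << PMD_SHIFT))
		return PMD_SHIFT;
	if (size >= (1UL << CONT_PTE_SHIFT))
		return CONT_PTE_SHIFT;
	return PAGE_SHIFT;
}
```

With this, a run of 512 contiguous 4K pages selects a 2MB PMD mapping, a run of 16 selects a CONT_PTE mapping, and anything shorter falls back to single PTEs.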
> +
> /**
> * vmap - map an array of pages into virtually contiguous space
> * @pages: array of page pointers
> @@ -3552,15 +3597,32 @@ void *vmap(struct page **pages, unsigned int count,
> return NULL;
>
> size = (unsigned long)count << PAGE_SHIFT;
> - area = get_vm_area_caller(size, flags, __builtin_return_address(0));
> + if (flags & VM_ALLOW_HUGE_VMAP) {
> + /* determine from page array, the max alignment */
> + unsigned long max_shift = max_contiguous_stride_order(pages, prot, count);
> +
> + area = __get_vm_area_node(size, 1 << max_shift, max_shift, flags,
> + VMALLOC_START, VMALLOC_END, NUMA_NO_NODE,
> + GFP_KERNEL, __builtin_return_address(0));
> + } else {
> + area = get_vm_area_caller(size, flags, __builtin_return_address(0));
> + }
> if (!area)
> return NULL;
>
> addr = (unsigned long)area->addr;
> - if (vmap_pages_range(addr, addr + size, pgprot_nx(prot),
> - pages, PAGE_SHIFT) < 0) {
> - vunmap(area->addr);
> - return NULL;
> +
> + if (flags & VM_ALLOW_HUGE_VMAP) {
> + if (__vmap_huge(pages, prot, addr, count)) {
> + vunmap(area->addr);
> + return NULL;
> + }
> + } else {
> + if (vmap_pages_range(addr, addr + size, pgprot_nx(prot),
> + pages, PAGE_SHIFT) < 0) {
> + vunmap(area->addr);
> + return NULL;
> + }
> }
>
> if (flags & VM_MAP_PUT_PAGES) {
> @@ -4011,11 +4073,7 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
> * their allocations due to apply_to_page_range not
> * supporting them.
> */
> -
> - if (arch_vmap_pmd_supported(prot) && size >= PMD_SIZE)
> - shift = PMD_SHIFT;
> - else
> - shift = arch_vmap_pte_supported_shift(size);
> + shift = vm_shift(prot, size);
What I actually did is different. In patches 1/8 and 2/8, I
extended the arm64 levels to support N * CONT_PTE, and let the
final PTE mapping use the maximum possible batch after avoiding
zigzag. This further improves all orders greater than CONT_PTE.
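[Editor's note] The N * CONT_PTE extension in patch 1/8 boils down to arithmetic: any size that is a multiple of CONT_PTE_SIZE and below CONT_PMD_SIZE batches as size >> PAGE_SHIFT page-sized PTEs set in one go. A user-space sketch with assumed arm64 4K-granule constants (the kernel's num_contig_ptes() also handles the PUD/CONT_PMD/PMD cases omitted here):

```c
#include <assert.h>

/* Assumed arm64 4K-granule constants. */
#define PAGE_SHIFT    12
#define PAGE_SIZE     (1UL << PAGE_SHIFT)
#define CONT_PTES     16
#define CONT_PTE_SIZE (CONT_PTES * PAGE_SIZE)  /* 64KB */
#define PMD_SIZE      (1UL << 21)              /* 2MB  */
#define CONT_PMD_SIZE (16 * PMD_SIZE)          /* 32MB */

/* Mirrors the extended default: branch of num_contig_ptes(); returns the
 * number of page-sized PTEs to set in one batch, or 0 if not batchable. */
static unsigned long contig_ptes_for(unsigned long size)
{
	if (size == CONT_PTE_SIZE)
		return CONT_PTES;
	if (size > 0 && size < CONT_PMD_SIZE &&
	    (size & (CONT_PTE_SIZE - 1)) == 0)
		return size >> PAGE_SHIFT;
	return 0;
}
```

So a 192KB (3 * CONT_PTE_SIZE) region batches 48 PTEs at once, where the unextended code would have handled it 16 PTEs at a time.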
Thanks
Barry
^ permalink raw reply [flat|nested] 12+ messages in thread
* [RFC PATCH 6/8] mm/vmalloc: align vm_area so vmap() can batch mappings
2026-04-08 2:51 [RFC PATCH 0/8] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory Barry Song (Xiaomi)
` (4 preceding siblings ...)
2026-04-08 2:51 ` [RFC PATCH 5/8] mm/vmalloc: map contiguous pages in batches for vmap() if possible Barry Song (Xiaomi)
@ 2026-04-08 2:51 ` Barry Song (Xiaomi)
2026-04-08 2:51 ` [RFC PATCH 7/8] mm/vmalloc: Coalesce same page_shift mappings in vmap to avoid pgtable zigzag Barry Song (Xiaomi)
` (2 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Barry Song (Xiaomi) @ 2026-04-08 2:51 UTC (permalink / raw)
To: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki
Cc: linux-kernel, anshuman.khandual, ryan.roberts, ajd, rppt, david,
Xueyuan.chen21, Barry Song (Xiaomi)
Try to align the vmap virtual address to PMD_SHIFT or a
larger PTE mapping size hinted by the architecture, so
contiguous pages can be batch-mapped when setting PMD or
PTE entries.
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
mm/vmalloc.c | 31 ++++++++++++++++++++++++++++++-
1 file changed, 30 insertions(+), 1 deletion(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e8dbfada42bc..6643ec0288cd 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3576,6 +3576,35 @@ static int vmap_contig_pages_range(unsigned long addr, unsigned long end,
return err;
}
+static struct vm_struct *get_aligned_vm_area(unsigned long size, unsigned long flags)
+{
+ unsigned int shift = (size >= PMD_SIZE) ? PMD_SHIFT :
+ arch_vmap_pte_supported_shift(size);
+ struct vm_struct *vm_area = NULL;
+
+ /*
+ * Try to allocate an aligned vm_area so contiguous pages can be
+ * mapped in batches.
+ */
+ while (1) {
+ unsigned long align = 1UL << shift;
+
+ vm_area = __get_vm_area_node(size, align, PAGE_SHIFT, flags,
+ VMALLOC_START, VMALLOC_END,
+ NUMA_NO_NODE, GFP_KERNEL,
+ __builtin_return_address(0));
+ if (vm_area || shift <= PAGE_SHIFT)
+ goto out;
+ if (shift == PMD_SHIFT)
+ shift = arch_vmap_pte_supported_shift(size);
+ else if (shift > PAGE_SHIFT)
+ shift = PAGE_SHIFT;
+ }
+
+out:
+ return vm_area;
+}
+
/**
* vmap - map an array of pages into virtually contiguous space
* @pages: array of page pointers
@@ -3614,7 +3643,7 @@ void *vmap(struct page **pages, unsigned int count,
return NULL;
size = (unsigned long)count << PAGE_SHIFT;
- area = get_vm_area_caller(size, flags, __builtin_return_address(0));
+ area = get_aligned_vm_area(size, flags);
if (!area)
return NULL;
--
2.39.3 (Apple Git-146)
* [RFC PATCH 7/8] mm/vmalloc: Coalesce same page_shift mappings in vmap to avoid pgtable zigzag
2026-04-08 2:51 [RFC PATCH 0/8] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory Barry Song (Xiaomi)
` (5 preceding siblings ...)
2026-04-08 2:51 ` [RFC PATCH 6/8] mm/vmalloc: align vm_area so vmap() can batch mappings Barry Song (Xiaomi)
@ 2026-04-08 2:51 ` Barry Song (Xiaomi)
2026-04-08 2:51 ` [RFC PATCH 8/8] mm/vmalloc: Stop scanning for compound pages after encountering small pages in vmap Barry Song (Xiaomi)
2026-04-08 9:14 ` [RFC PATCH 0/8] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory Dev Jain
8 siblings, 0 replies; 12+ messages in thread
From: Barry Song (Xiaomi) @ 2026-04-08 2:51 UTC (permalink / raw)
To: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki
Cc: linux-kernel, anshuman.khandual, ryan.roberts, ajd, rppt, david,
Xueyuan.chen21, Barry Song (Xiaomi)
For vmap(), detect pages with the same page_shift and map them in
batches, avoiding the pgtable zigzag caused by per-page mapping.
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
mm/vmalloc.c | 24 ++++++++++++++++++++----
1 file changed, 20 insertions(+), 4 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6643ec0288cd..3c3b7217693a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3551,6 +3551,8 @@ static int vmap_contig_pages_range(unsigned long addr, unsigned long end,
pgprot_t prot, struct page **pages)
{
unsigned int count = (end - addr) >> PAGE_SHIFT;
+ unsigned int prev_shift = 0, idx = 0;
+ unsigned long map_addr = addr;
int err;
err = kmsan_vmap_pages_range_noflush(addr, end, prot, pages,
@@ -3562,15 +3564,29 @@ static int vmap_contig_pages_range(unsigned long addr, unsigned long end,
unsigned int shift = PAGE_SHIFT +
get_vmap_batch_order(pages, count - i, i);
- err = vmap_range_noflush(addr, addr + (1UL << shift),
- page_to_phys(pages[i]), prot, shift);
- if (err)
- goto out;
+ if (!i)
+ prev_shift = shift;
+
+ if (shift != prev_shift) {
+ err = vmap_small_pages_range_noflush(map_addr, addr,
+ prot, pages + idx,
+ min(prev_shift, PMD_SHIFT));
+ if (err)
+ goto out;
+ prev_shift = shift;
+ map_addr = addr;
+ idx = i;
+ }
addr += 1UL << shift;
i += 1U << (shift - PAGE_SHIFT);
}
+ /* Remaining */
+ if (map_addr < end)
+ err = vmap_small_pages_range_noflush(map_addr, end,
+ prot, pages + idx, min(prev_shift, PMD_SHIFT));
+
out:
flush_cache_vmap(addr, end);
return err;
--
2.39.3 (Apple Git-146)
* [RFC PATCH 8/8] mm/vmalloc: Stop scanning for compound pages after encountering small pages in vmap
2026-04-08 2:51 [RFC PATCH 0/8] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory Barry Song (Xiaomi)
` (6 preceding siblings ...)
2026-04-08 2:51 ` [RFC PATCH 7/8] mm/vmalloc: Coalesce same page_shift mappings in vmap to avoid pgtable zigzag Barry Song (Xiaomi)
@ 2026-04-08 2:51 ` Barry Song (Xiaomi)
2026-04-08 9:14 ` [RFC PATCH 0/8] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory Dev Jain
8 siblings, 0 replies; 12+ messages in thread
From: Barry Song (Xiaomi) @ 2026-04-08 2:51 UTC (permalink / raw)
To: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki
Cc: linux-kernel, anshuman.khandual, ryan.roberts, ajd, rppt, david,
Xueyuan.chen21, Barry Song (Xiaomi)
Users typically allocate memory in descending orders, e.g.
8 → 4 → 0. Once an order-0 page is encountered, subsequent
pages are likely to also be order-0, so we stop scanning
for compound pages at that point.
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
mm/vmalloc.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 3c3b7217693a..242f4bc1379c 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3577,6 +3577,12 @@ static int vmap_contig_pages_range(unsigned long addr, unsigned long end,
map_addr = addr;
idx = i;
}
+ /*
+ * Once small pages are encountered, the remaining pages
+ * are likely small as well
+ */
+ if (shift == PAGE_SHIFT)
+ break;
addr += 1UL << shift;
i += 1U << (shift - PAGE_SHIFT);
--
2.39.3 (Apple Git-146)
* Re: [RFC PATCH 0/8] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory
2026-04-08 2:51 [RFC PATCH 0/8] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory Barry Song (Xiaomi)
` (7 preceding siblings ...)
2026-04-08 2:51 ` [RFC PATCH 8/8] mm/vmalloc: Stop scanning for compound pages after encountering small pages in vmap Barry Song (Xiaomi)
@ 2026-04-08 9:14 ` Dev Jain
8 siblings, 0 replies; 12+ messages in thread
From: Dev Jain @ 2026-04-08 9:14 UTC (permalink / raw)
To: Barry Song (Xiaomi), linux-mm, linux-arm-kernel, catalin.marinas,
will, akpm, urezki
Cc: linux-kernel, anshuman.khandual, ryan.roberts, ajd, rppt, david,
Xueyuan.chen21
On 08/04/26 8:21 am, Barry Song (Xiaomi) wrote:
> This patchset accelerates ioremap, vmalloc, and vmap when the memory
> is physically fully or partially contiguous. Two techniques are used:
>
> 1. Avoid page table zigzag when setting PTEs/PMDs for multiple memory
> segments
> 2. Use batched mappings wherever possible in both vmalloc and ARM64
> layers
>
> Patches 1–2 extend ARM64 vmalloc CONT-PTE mapping to support multiple
> CONT-PTE regions instead of just one.
>
> Patches 3–4 extend vmap_small_pages_range_noflush() to support page
> shifts other than PAGE_SHIFT. This allows mapping multiple memory
> segments for vmalloc() without zigzagging page tables.
>
> Patches 5–8 add huge vmap support for contiguous pages. This not only
> improves performance but also enables PMD or CONT-PTE mapping for the
> vmapped area, reducing TLB pressure.
>
> Many thanks to Xueyuan Chen for his substantial testing efforts
> on RK3588 boards.
>
> On the RK3588 8-core ARM64 SoC, with tasks pinned to CPU2 and
> the performance CPUfreq policy enabled, Xueyuan’s tests report:
>
> * ioremap(1 MB): 1.2× faster
> * vmalloc(1 MB) mapping time (excluding allocation) with
> VM_ALLOW_HUGE_VMAP: 1.5× faster
> * vmap(): 5.6× faster when memory includes some order-8 pages,
> with no regression observed for order-0 pages
>
> Barry Song (Xiaomi) (8):
> arm64/hugetlb: Extend batching of multiple CONT_PTE in a single PTE
> setup
> arm64/vmalloc: Allow arch_vmap_pte_range_map_size to batch multiple
> CONT_PTE
> mm/vmalloc: Extend vmap_small_pages_range_noflush() to support larger
> page_shift sizes
> mm/vmalloc: Eliminate page table zigzag for huge vmalloc mappings
> mm/vmalloc: map contiguous pages in batches for vmap() if possible
> mm/vmalloc: align vm_area so vmap() can batch mappings
> mm/vmalloc: Coalesce same page_shift mappings in vmap to avoid pgtable
> zigzag
> mm/vmalloc: Stop scanning for compound pages after encountering small
> pages in vmap
>
> arch/arm64/include/asm/vmalloc.h | 6 +-
> arch/arm64/mm/hugetlbpage.c | 10 ++
> mm/vmalloc.c | 178 +++++++++++++++++++++++++------
> 3 files changed, 161 insertions(+), 33 deletions(-)
>
On Linux VM on Apple M3, running mm-selftests:
./run_vmtests.sh -t "hugetlb"
TAP version 13
# -----------------------
# running ./hugepage-mmap
# -----------------------
# TAP version 13
# 1..1
# # Returned address is 0xffffe7c00000
[ 30.884630] kernel BUG at mm/page_table_check.c:86!
[ 30.884701] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
[ 30.886803] Modules linked in:
[ 30.887217] CPU: 3 UID: 0 PID: 1869 Comm: hugepage-mmap Not tainted 7.0.0-rc5+ #86 PREEMPT
[ 30.888218] Hardware name: linux,dummy-virt (DT)
[ 30.889413] pstate: a1400005 (NzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 30.889901] pc : page_table_check_clear.part.0+0x128/0x1a0
[ 30.890337] lr : page_table_check_clear.part.0+0x7c/0x1a0
[ 30.890714] sp : ffff800084da3ad0
[ 30.890946] x29: ffff800084da3ad0 x28: 0000000000000001 x27: 0010000000000001
[ 30.891434] x26: 0040000000000040 x25: ffffa06bb8fb9000 x24: 00000000ffffffff
[ 30.891932] x23: 0000000000000001 x22: 0000000000000000 x21: ffffa06bb8997810
[ 30.892514] x20: 0000000000113e39 x19: 0000000000113e38 x18: 0000000000000000
[ 30.893007] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 30.893500] x14: ffffa06bb7013780 x13: 0000fffff7f90fff x12: 0000000000000000
[ 30.893990] x11: 1fffe0001a1282c1 x10: ffff0000d094160c x9 : ffffa06bb568a858
[ 30.894479] x8 : ffff5f95c8474000 x7 : 0000000000000000 x6 : ffff00017fffc500
[ 30.894973] x5 : ffff000191208fc0 x4 : 0000000000000000 x3 : 0000000000004000
[ 30.895449] x2 : 0000000000000000 x1 : 00000000ffffffff x0 : ffff0000c071f1b8
[ 30.895875] Call trace:
[ 30.896027] page_table_check_clear.part.0+0x128/0x1a0 (P)
[ 30.896369] page_table_check_clear+0xc8/0x138
[ 30.896776] __page_table_check_ptes_set+0xe4/0x1e8
[ 30.897073] __set_ptes_anysz+0x2e4/0x308
[ 30.897327] set_huge_pte_at+0xec/0x210
[ 30.897561] hugetlb_no_page+0x1ec/0x8e0
[ 30.897807] hugetlb_fault+0x188/0x740
[ 30.898036] handle_mm_fault+0x294/0x2c0
[ 30.898283] do_page_fault+0x120/0x748
[ 30.898539] do_translation_fault+0x68/0x90
[ 30.898796] do_mem_abort+0x4c/0xa8
[ 30.899011] el0_da+0x2c/0x90
[ 30.899205] el0t_64_sync_handler+0xd0/0xe8
[ 30.899461] el0t_64_sync+0x198/0x1a0
[ 30.899688] Code: 91001021 b8f80022 51000441 36fffd41 (d4210000)
[ 30.900053] ---[ end trace 0000000000000000 ]---
The bug is at
BUG_ON(atomic_dec_return(&ptc->file_map_count) < 0);
My tree is mm-unstable, commit 3fa44141e0bb.