* [PATCH v3 1/6] arm64/hugetlb: Extend batching of multiple CONT_PTE in a single PTE setup
2026-05-22 5:31 [PATCH v3 0/6] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory Wen Jiang
@ 2026-05-22 5:31 ` Wen Jiang
2026-05-26 7:56 ` Dev Jain
2026-05-22 5:31 ` [PATCH v3 2/6] arm64/vmalloc: Allow arch_vmap_pte_range_map_size to batch multiple CONT_PTE Wen Jiang
` (5 subsequent siblings)
6 siblings, 1 reply; 21+ messages in thread
From: Wen Jiang @ 2026-05-22 5:31 UTC (permalink / raw)
To: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki
Cc: baohua, Xueyuan.chen21, dev.jain, rppt, david, ryan.roberts,
anshuman.khandual, ajd, linux-kernel, jiangwen6
From: "Barry Song (Xiaomi)" <baohua@kernel.org>
For sizes aligned to CONT_PTE_SIZE and smaller than PMD_SIZE,
we can batch CONT_PTE settings instead of handling them individually.
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
Signed-off-by: Wen Jiang <jiangwen6@xiaomi.com>
Tested-by: Xueyuan Chen <xueyuan.chen21@gmail.com>
---
arch/arm64/mm/hugetlbpage.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index a42c05cf56408..c4d8b226126cb 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -110,6 +110,12 @@ static inline int num_contig_ptes(unsigned long size, size_t *pgsize)
contig_ptes = CONT_PTES;
break;
default:
+ if (size > 0 && size < PMD_SIZE &&
+ IS_ALIGNED(size, CONT_PTE_SIZE)) {
+ contig_ptes = size >> PAGE_SHIFT;
+ *pgsize = PAGE_SIZE;
+ break;
+ }
WARN_ON(!__hugetlb_valid_size(size));
}
@@ -359,6 +365,10 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
case CONT_PTE_SIZE:
return pte_mkcont(entry);
default:
+ if (pagesize > 0 && pagesize < PMD_SIZE &&
+ IS_ALIGNED(pagesize, CONT_PTE_SIZE))
+ return pte_mkcont(entry);
+
break;
}
pr_warn("%s: unrecognized huge page size 0x%lx\n",
--
2.34.1
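Editorial sketch (not part of the patch): the new default-case branch in
num_contig_ptes() can be modelled in user space roughly as below. The sk_
names and the geometry (4K base pages, 16 PTEs per contiguous block, 2M PMD)
are assumptions made for illustration only; the kernel derives these values
from its page-table configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed geometry: 4K base pages, 16 PTEs per contiguous block, 2M PMD. */
#define SK_PAGE_SHIFT    12
#define SK_PAGE_SIZE     (1UL << SK_PAGE_SHIFT)
#define SK_CONT_PTES     16UL
#define SK_CONT_PTE_SIZE (SK_CONT_PTES * SK_PAGE_SIZE)
#define SK_PMD_SIZE      (1UL << 21)

/* True when size is a whole number of CONT_PTE blocks and below PMD_SIZE. */
static bool sk_is_multi_cont_pte(unsigned long size)
{
	return size > 0 && size < SK_PMD_SIZE &&
	       (size % SK_CONT_PTE_SIZE) == 0;
}

/*
 * Models only the branch added by the patch. Returns the number of
 * base-page PTEs covering size and stores the per-entry page size in
 * *pgsize. Returns 0 when the branch does not apply; the kernel code
 * falls through to its WARN_ON() in that case.
 */
static unsigned long sk_num_contig_ptes(unsigned long size, size_t *pgsize)
{
	assert(pgsize != NULL);

	if (!sk_is_multi_cont_pte(size))
		return 0;

	*pgsize = SK_PAGE_SIZE;
	return size >> SK_PAGE_SHIFT;
}
```

Under these assumptions a 128K size yields 32 base-page PTEs, while 2M
(PMD_SIZE itself) and 68K (not a multiple of 64K) are not handled by the new
branch.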
* Re: [PATCH v3 1/6] arm64/hugetlb: Extend batching of multiple CONT_PTE in a single PTE setup
2026-05-22 5:31 ` [PATCH v3 1/6] arm64/hugetlb: Extend batching of multiple CONT_PTE in a single PTE setup Wen Jiang
@ 2026-05-26 7:56 ` Dev Jain
0 siblings, 0 replies; 21+ messages in thread
From: Dev Jain @ 2026-05-26 7:56 UTC (permalink / raw)
To: Wen Jiang, linux-mm, linux-arm-kernel, catalin.marinas, will,
akpm, urezki
Cc: baohua, Xueyuan.chen21, rppt, david, ryan.roberts,
anshuman.khandual, ajd, linux-kernel, jiangwen6
On 22/05/26 11:01 am, Wen Jiang wrote:
> From: "Barry Song (Xiaomi)" <baohua@kernel.org>
>
> For sizes aligned to CONT_PTE_SIZE and smaller than PMD_SIZE,
> we can batch CONT_PTE settings instead of handling them individually.
Better wording: "we can handle CONT_PTE_SIZE groups together"
>
> Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> Signed-off-by: Wen Jiang <jiangwen6@xiaomi.com>
> Tested-by: Xueyuan Chen <xueyuan.chen21@gmail.com>
> ---
> arch/arm64/mm/hugetlbpage.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index a42c05cf56408..c4d8b226126cb 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -110,6 +110,12 @@ static inline int num_contig_ptes(unsigned long size, size_t *pgsize)
> contig_ptes = CONT_PTES;
> break;
> default:
> + if (size > 0 && size < PMD_SIZE &&
> + IS_ALIGNED(size, CONT_PTE_SIZE)) {
> + contig_ptes = size >> PAGE_SHIFT;
> + *pgsize = PAGE_SIZE;
> + break;
> + }
> WARN_ON(!__hugetlb_valid_size(size));
> }
>
> @@ -359,6 +365,10 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
> case CONT_PTE_SIZE:
> return pte_mkcont(entry);
> default:
> + if (pagesize > 0 && pagesize < PMD_SIZE &&
> + IS_ALIGNED(pagesize, CONT_PTE_SIZE))
> + return pte_mkcont(entry);
> +
> break;
> }
> pr_warn("%s: unrecognized huge page size 0x%lx\n",
* [PATCH v3 2/6] arm64/vmalloc: Allow arch_vmap_pte_range_map_size to batch multiple CONT_PTE
2026-05-22 5:31 [PATCH v3 0/6] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory Wen Jiang
2026-05-22 5:31 ` [PATCH v3 1/6] arm64/hugetlb: Extend batching of multiple CONT_PTE in a single PTE setup Wen Jiang
@ 2026-05-22 5:31 ` Wen Jiang
2026-05-27 5:43 ` Dev Jain
2026-05-22 5:31 ` [PATCH v3 3/6] mm/vmalloc: Extract vmap_set_ptes() to consolidate PTE mapping logic Wen Jiang
` (4 subsequent siblings)
6 siblings, 1 reply; 21+ messages in thread
From: Wen Jiang @ 2026-05-22 5:31 UTC (permalink / raw)
To: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki
Cc: baohua, Xueyuan.chen21, dev.jain, rppt, david, ryan.roberts,
anshuman.khandual, ajd, linux-kernel, jiangwen6
From: "Barry Song (Xiaomi)" <baohua@kernel.org>
Allow arch_vmap_pte_range_map_size to batch multiple CONT_PTE hugepages,
reducing both PTE setup and TLB flush iterations.
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
Signed-off-by: Wen Jiang <jiangwen6@xiaomi.com>
Tested-by: Xueyuan Chen <xueyuan.chen21@gmail.com>
---
arch/arm64/include/asm/vmalloc.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/vmalloc.h b/arch/arm64/include/asm/vmalloc.h
index 4ec1acd3c1b34..787fd17b48e2c 100644
--- a/arch/arm64/include/asm/vmalloc.h
+++ b/arch/arm64/include/asm/vmalloc.h
@@ -23,6 +23,8 @@ static inline unsigned long arch_vmap_pte_range_map_size(unsigned long addr,
unsigned long end, u64 pfn,
unsigned int max_page_shift)
{
+ unsigned long size;
+
/*
* If the block is at least CONT_PTE_SIZE in size, and is naturally
* aligned in both virtual and physical space, then we can pte-map the
@@ -40,7 +42,9 @@ static inline unsigned long arch_vmap_pte_range_map_size(unsigned long addr,
if (!IS_ALIGNED(PFN_PHYS(pfn), CONT_PTE_SIZE))
return PAGE_SIZE;
- return CONT_PTE_SIZE;
+ size = min3(end - addr, 1UL << max_page_shift, PMD_SIZE >> 1);
+ size = 1UL << __fls(size);
+ return size;
}
#define arch_vmap_pte_range_unmap_size arch_vmap_pte_range_unmap_size
--
2.34.1
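Editorial sketch (not part of the patch): the final size computation added to
arch_vmap_pte_range_map_size() can be tried out in user space as below. It
models only the last three lines of the function and assumes the earlier
size and alignment checks have already passed. The sk_ helpers are
hypothetical stand-ins for min3() and __fls(), and the 4K page / 2M PMD
geometry is an assumption.

```c
#include <assert.h>

/* Assumed geometry: 4K base pages, 2M PMD. */
#define SK_PAGE_SHIFT 12
#define SK_PMD_SIZE   (1UL << 21)

/* Stand-in for the kernel's min3(). */
static unsigned long sk_min3(unsigned long a, unsigned long b, unsigned long c)
{
	unsigned long m = a < b ? a : b;

	return m < c ? m : c;
}

/* Stand-in for __fls(): index of the most significant set bit. */
static unsigned int sk_fls(unsigned long word)
{
	unsigned int bit = 0;

	assert(word != 0);	/* like __fls(), undefined for 0 */
	while (word >>= 1)
		bit++;
	return bit;
}

/*
 * Cap the batch at the remaining range, at the caller's max_page_shift and
 * at half a PMD, then round down to a power of two.
 */
static unsigned long sk_batch_size(unsigned long addr, unsigned long end,
				   unsigned int max_page_shift)
{
	unsigned long size;

	size = sk_min3(end - addr, 1UL << max_page_shift, SK_PMD_SIZE >> 1);
	return 1UL << sk_fls(size);
}
```

With these assumptions a 192K range is mapped as one 128K batch, leaving 64K
for the next loop iteration, and a range of 1M or more is capped at 1M per
call.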
* Re: [PATCH v3 2/6] arm64/vmalloc: Allow arch_vmap_pte_range_map_size to batch multiple CONT_PTE
2026-05-22 5:31 ` [PATCH v3 2/6] arm64/vmalloc: Allow arch_vmap_pte_range_map_size to batch multiple CONT_PTE Wen Jiang
@ 2026-05-27 5:43 ` Dev Jain
0 siblings, 0 replies; 21+ messages in thread
From: Dev Jain @ 2026-05-27 5:43 UTC (permalink / raw)
To: Wen Jiang, linux-mm, linux-arm-kernel, catalin.marinas, will,
akpm, urezki
Cc: baohua, Xueyuan.chen21, rppt, david, ryan.roberts,
anshuman.khandual, ajd, linux-kernel, jiangwen6
On 22/05/26 11:01 am, Wen Jiang wrote:
> From: "Barry Song (Xiaomi)" <baohua@kernel.org>
>
> Allow arch_vmap_pte_range_map_size to batch multiple CONT_PTE hugepages,
..."to batch across multiple CONT_PTE blocks"
> reducing both PTE setup and TLB flush iterations.
>
> Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> Signed-off-by: Wen Jiang <jiangwen6@xiaomi.com>
> Tested-by: Xueyuan Chen <xueyuan.chen21@gmail.com>
> ---
> arch/arm64/include/asm/vmalloc.h | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/vmalloc.h b/arch/arm64/include/asm/vmalloc.h
> index 4ec1acd3c1b34..787fd17b48e2c 100644
> --- a/arch/arm64/include/asm/vmalloc.h
> +++ b/arch/arm64/include/asm/vmalloc.h
> @@ -23,6 +23,8 @@ static inline unsigned long arch_vmap_pte_range_map_size(unsigned long addr,
> unsigned long end, u64 pfn,
> unsigned int max_page_shift)
> {
> + unsigned long size;
> +
> /*
> * If the block is at least CONT_PTE_SIZE in size, and is naturally
> * aligned in both virtual and physical space, then we can pte-map the
> @@ -40,7 +42,9 @@ static inline unsigned long arch_vmap_pte_range_map_size(unsigned long addr,
> if (!IS_ALIGNED(PFN_PHYS(pfn), CONT_PTE_SIZE))
> return PAGE_SIZE;
>
> - return CONT_PTE_SIZE;
> + size = min3(end - addr, 1UL << max_page_shift, PMD_SIZE >> 1);
> + size = 1UL << __fls(size);
> + return size;
> }
>
> #define arch_vmap_pte_range_unmap_size arch_vmap_pte_range_unmap_size
* [PATCH v3 3/6] mm/vmalloc: Extract vmap_set_ptes() to consolidate PTE mapping logic
2026-05-22 5:31 [PATCH v3 0/6] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory Wen Jiang
2026-05-22 5:31 ` [PATCH v3 1/6] arm64/hugetlb: Extend batching of multiple CONT_PTE in a single PTE setup Wen Jiang
2026-05-22 5:31 ` [PATCH v3 2/6] arm64/vmalloc: Allow arch_vmap_pte_range_map_size to batch multiple CONT_PTE Wen Jiang
@ 2026-05-22 5:31 ` Wen Jiang
2026-06-01 17:34 ` Uladzislau Rezki
2026-05-22 5:31 ` [PATCH v3 4/6] mm/vmalloc: Extend page table walk to support larger page_shift sizes and eliminate page table rewalk Wen Jiang
` (3 subsequent siblings)
6 siblings, 1 reply; 21+ messages in thread
From: Wen Jiang @ 2026-05-22 5:31 UTC (permalink / raw)
To: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki
Cc: baohua, Xueyuan.chen21, dev.jain, rppt, david, ryan.roberts,
anshuman.khandual, ajd, linux-kernel, jiangwen6
Extract the common PTE mapping logic from vmap_pte_range() into a
shared helper vmap_set_ptes(). This handles both CONT_PTE and regular
PTE mappings in a single function, preparing for the next patch which
will extend vmap_pages_pte_range() to also use this helper.
The #ifdef CONFIG_HUGETLB_PAGE guard is moved inside vmap_set_ptes(),
so callers no longer need to handle the conditional compilation.
Signed-off-by: Wen Jiang <jiangwen6@xiaomi.com>
Tested-by: Xueyuan Chen <xueyuan.chen21@gmail.com>
---
mm/vmalloc.c | 49 ++++++++++++++++++++++++++++++++++---------------
1 file changed, 34 insertions(+), 15 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 2c2f74a07f396..53fd4ee460ea4 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -91,6 +91,35 @@ struct vfree_deferred {
static DEFINE_PER_CPU(struct vfree_deferred, vfree_deferred);
/*** Page table manipulation functions ***/
+
+/*
+ * Set PTE mappings for the given PFN. Try CONT_PTE mappings first when
+ * supported, otherwise fall back to PAGE_SIZE mappings.
+ *
+ * Return: mapping size.
+ */
+static __always_inline unsigned long vmap_set_ptes(pte_t *pte,
+ unsigned long addr, unsigned long end, u64 pfn,
+ pgprot_t prot, unsigned int max_page_shift)
+{
+#ifdef CONFIG_HUGETLB_PAGE
+ if (max_page_shift > PAGE_SHIFT) {
+ unsigned long size;
+
+ size = arch_vmap_pte_range_map_size(addr, end, pfn, max_page_shift);
+ if (size != PAGE_SIZE) {
+ pte_t entry = pfn_pte(pfn, prot);
+
+ entry = arch_make_huge_pte(entry, ilog2(size), 0);
+ set_huge_pte_at(&init_mm, addr, pte, entry, size);
+ return size;
+ }
+ }
+#endif
+ set_pte_at(&init_mm, addr, pte, pfn_pte(pfn, prot));
+ return PAGE_SIZE;
+}
+
static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
phys_addr_t phys_addr, pgprot_t prot,
unsigned int max_page_shift, pgtbl_mod_mask *mask)
@@ -98,7 +127,8 @@ static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
pte_t *pte;
u64 pfn;
struct page *page;
- unsigned long size = PAGE_SIZE;
+ unsigned long size;
+ unsigned int steps;
if (WARN_ON_ONCE(!PAGE_ALIGNED(end - addr)))
return -EINVAL;
@@ -119,20 +149,9 @@ static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
BUG();
}
-#ifdef CONFIG_HUGETLB_PAGE
- size = arch_vmap_pte_range_map_size(addr, end, pfn, max_page_shift);
- if (size != PAGE_SIZE) {
- pte_t entry = pfn_pte(pfn, prot);
-
- entry = arch_make_huge_pte(entry, ilog2(size), 0);
- set_huge_pte_at(&init_mm, addr, pte, entry, size);
- pfn += PFN_DOWN(size);
- continue;
- }
-#endif
- set_pte_at(&init_mm, addr, pte, pfn_pte(pfn, prot));
- pfn++;
- } while (pte += PFN_DOWN(size), addr += size, addr != end);
+ size = vmap_set_ptes(pte, addr, end, pfn, prot, max_page_shift);
+ steps = PFN_DOWN(size);
+ } while (pte += steps, pfn += steps, addr += size, addr != end);
lazy_mmu_mode_disable();
*mask |= PGTBL_PTE_MODIFIED;
--
2.34.1
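Editorial sketch (not part of the patch): the reworked loop advances pte, pfn
and addr by whatever size the helper reports. The user-space model below
shows only that stepping pattern. sk_set_ptes() is a hypothetical stand-in
for vmap_set_ptes() that picks a 64K block when the remaining range, the
address and the PFN allow it; the real decision is made by
arch_vmap_pte_range_map_size(). The 4K page size is an assumption.

```c
#include <assert.h>

/* Assumed geometry: 4K base pages, 16 PTEs per contiguous block. */
#define SK_PAGE_SHIFT  12
#define SK_PAGE_SIZE   (1UL << SK_PAGE_SHIFT)
#define SK_PFN_DOWN(x) ((x) >> SK_PAGE_SHIFT)

/* Hypothetical stand-in for vmap_set_ptes(): returns the size it "mapped". */
static unsigned long sk_set_ptes(unsigned long addr, unsigned long end,
				 unsigned long pfn)
{
	const unsigned long cont = 16 * SK_PAGE_SIZE;

	if (end - addr >= cont && addr % cont == 0 && pfn % 16 == 0)
		return cont;
	return SK_PAGE_SIZE;
}

/*
 * Walk [addr, end) the way the reworked do/while loop does. Returns the
 * number of iterations and stores the final PFN in *pfn_out.
 */
static unsigned int sk_walk(unsigned long addr, unsigned long end,
			    unsigned long pfn, unsigned long *pfn_out)
{
	unsigned int iterations = 0;
	unsigned long size, steps;

	assert(addr < end && (end - addr) % SK_PAGE_SIZE == 0);

	do {
		size = sk_set_ptes(addr, end, pfn);
		steps = SK_PFN_DOWN(size);
		iterations++;
	} while (pfn += steps, addr += size, addr != end);

	*pfn_out = pfn;
	return iterations;
}
```

For a 72K range starting at a 64K-aligned address and PFN, this model takes
three iterations (one 64K block, then two single pages) instead of eighteen.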
* Re: [PATCH v3 3/6] mm/vmalloc: Extract vmap_set_ptes() to consolidate PTE mapping logic
2026-05-22 5:31 ` [PATCH v3 3/6] mm/vmalloc: Extract vmap_set_ptes() to consolidate PTE mapping logic Wen Jiang
@ 2026-06-01 17:34 ` Uladzislau Rezki
0 siblings, 0 replies; 21+ messages in thread
From: Uladzislau Rezki @ 2026-06-01 17:34 UTC (permalink / raw)
To: Wen Jiang
Cc: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki,
baohua, Xueyuan.chen21, dev.jain, rppt, david, ryan.roberts,
anshuman.khandual, ajd, linux-kernel, jiangwen6
On Fri, May 22, 2026 at 01:31:43PM +0800, Wen Jiang wrote:
> Extract the common PTE mapping logic from vmap_pte_range() into a
> shared helper vmap_set_ptes(). This handles both CONT_PTE and regular
> PTE mappings in a single function, preparing for the next patch which
> will extend vmap_pages_pte_range() to also use this helper.
>
> The #ifdef CONFIG_HUGETLB_PAGE guard is moved inside vmap_set_ptes(),
> so callers no longer need to handle the conditional compilation.
>
> Signed-off-by: Wen Jiang <jiangwen6@xiaomi.com>
> Tested-by: Xueyuan Chen <xueyuan.chen21@gmail.com>
> ---
> mm/vmalloc.c | 49 ++++++++++++++++++++++++++++++++++---------------
> 1 file changed, 34 insertions(+), 15 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 2c2f74a07f396..53fd4ee460ea4 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -91,6 +91,35 @@ struct vfree_deferred {
> static DEFINE_PER_CPU(struct vfree_deferred, vfree_deferred);
>
> /*** Page table manipulation functions ***/
> +
> +/*
> + * Set PTE mappings for the given PFN. Try CONT_PTE mappings first when
> + * supported, otherwise fall back to PAGE_SIZE mappings.
> + *
> + * Return: mapping size.
> + */
> +static __always_inline unsigned long vmap_set_ptes(pte_t *pte,
> + unsigned long addr, unsigned long end, u64 pfn,
> + pgprot_t prot, unsigned int max_page_shift)
> +{
> +#ifdef CONFIG_HUGETLB_PAGE
> + if (max_page_shift > PAGE_SHIFT) {
> + unsigned long size;
> +
> + size = arch_vmap_pte_range_map_size(addr, end, pfn, max_page_shift);
> + if (size != PAGE_SIZE) {
> + pte_t entry = pfn_pte(pfn, prot);
> +
> + entry = arch_make_huge_pte(entry, ilog2(size), 0);
> + set_huge_pte_at(&init_mm, addr, pte, entry, size);
> + return size;
> + }
> + }
> +#endif
> + set_pte_at(&init_mm, addr, pte, pfn_pte(pfn, prot));
> + return PAGE_SIZE;
> +}
> +
> static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
> phys_addr_t phys_addr, pgprot_t prot,
> unsigned int max_page_shift, pgtbl_mod_mask *mask)
> @@ -98,7 +127,8 @@ static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
> pte_t *pte;
> u64 pfn;
> struct page *page;
> - unsigned long size = PAGE_SIZE;
> + unsigned long size;
> + unsigned int steps;
>
> if (WARN_ON_ONCE(!PAGE_ALIGNED(end - addr)))
> return -EINVAL;
> @@ -119,20 +149,9 @@ static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
> BUG();
> }
>
> -#ifdef CONFIG_HUGETLB_PAGE
> - size = arch_vmap_pte_range_map_size(addr, end, pfn, max_page_shift);
> - if (size != PAGE_SIZE) {
> - pte_t entry = pfn_pte(pfn, prot);
> -
> - entry = arch_make_huge_pte(entry, ilog2(size), 0);
> - set_huge_pte_at(&init_mm, addr, pte, entry, size);
> - pfn += PFN_DOWN(size);
> - continue;
> - }
> -#endif
> - set_pte_at(&init_mm, addr, pte, pfn_pte(pfn, prot));
> - pfn++;
> - } while (pte += PFN_DOWN(size), addr += size, addr != end);
> + size = vmap_set_ptes(pte, addr, end, pfn, prot, max_page_shift);
> + steps = PFN_DOWN(size);
> + } while (pte += steps, pfn += steps, addr += size, addr != end);
>
> lazy_mmu_mode_disable();
> *mask |= PGTBL_PTE_MODIFIED;
> --
> 2.34.1
>
IMO, this patch should only add the helper, with no functional change, and a
second patch should then extend it.
As posted, the helper is introduced in an already slightly modified form.
--
Uladzislau Rezki
* [PATCH v3 4/6] mm/vmalloc: Extend page table walk to support larger page_shift sizes and eliminate page table rewalk
2026-05-22 5:31 [PATCH v3 0/6] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory Wen Jiang
` (2 preceding siblings ...)
2026-05-22 5:31 ` [PATCH v3 3/6] mm/vmalloc: Extract vmap_set_ptes() to consolidate PTE mapping logic Wen Jiang
@ 2026-05-22 5:31 ` Wen Jiang
2026-05-27 5:58 ` Dev Jain
2026-05-22 5:31 ` [PATCH v3 5/6] mm/vmalloc: map contiguous pages in batches for vmap() if possible Wen Jiang
` (2 subsequent siblings)
6 siblings, 1 reply; 21+ messages in thread
From: Wen Jiang @ 2026-05-22 5:31 UTC (permalink / raw)
To: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki
Cc: baohua, Xueyuan.chen21, dev.jain, rppt, david, ryan.roberts,
anshuman.khandual, ajd, linux-kernel, jiangwen6
From: "Barry Song (Xiaomi)" <baohua@kernel.org>
vmap_pages_range_noflush_walk() (formerly vmap_small_pages_range_noflush())
provides a clean interface by taking struct page **pages and mapping them
via direct PTE iteration. This avoids the page table rewalk seen when
using vmap_range_noflush() for page_shift values other than PAGE_SHIFT.
Extend it to support larger page_shift values, and add PMD- and
contiguous-PTE mappings as well. Rename it to vmap_pages_range_noflush_walk()
since it now handles more than just small pages.
For vmalloc() allocations with VM_ALLOW_HUGE_VMAP, we no longer need to
iterate over pages one by one via vmap_range_noflush(), which would
otherwise lead to page table rewalk. The code is now unified with the
PAGE_SHIFT case by simply calling vmap_pages_range_noflush_walk().
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
Signed-off-by: Wen Jiang <jiangwen6@xiaomi.com>
Tested-by: Xueyuan Chen <xueyuan.chen21@gmail.com>
---
mm/vmalloc.c | 71 +++++++++++++++++++++++++++++-----------------------
1 file changed, 40 insertions(+), 31 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 53fd4ee460ea4..deb764abc0571 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -543,8 +543,10 @@ void vunmap_range(unsigned long addr, unsigned long end)
static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
- pgtbl_mod_mask *mask)
+ pgtbl_mod_mask *mask, unsigned int shift)
{
+ unsigned long pfn, size;
+ unsigned int steps;
int err = 0;
pte_t *pte;
@@ -575,9 +577,10 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
break;
}
- set_pte_at(&init_mm, addr, pte, mk_pte(page, prot));
- (*nr)++;
- } while (pte++, addr += PAGE_SIZE, addr != end);
+ pfn = page_to_pfn(page);
+ size = vmap_set_ptes(pte, addr, end, pfn, prot, shift);
+ steps = PFN_DOWN(size);
+ } while (pte += steps, *nr += steps, addr += size, addr != end);
lazy_mmu_mode_disable();
*mask |= PGTBL_PTE_MODIFIED;
@@ -587,7 +590,7 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
static int vmap_pages_pmd_range(pud_t *pud, unsigned long addr,
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
- pgtbl_mod_mask *mask)
+ pgtbl_mod_mask *mask, unsigned int shift)
{
pmd_t *pmd;
unsigned long next;
@@ -597,7 +600,27 @@ static int vmap_pages_pmd_range(pud_t *pud, unsigned long addr,
return -ENOMEM;
do {
next = pmd_addr_end(addr, end);
- if (vmap_pages_pte_range(pmd, addr, next, prot, pages, nr, mask))
+
+ if (shift == PMD_SHIFT) {
+ struct page *page = pages[*nr];
+ phys_addr_t phys_addr;
+
+ if (WARN_ON(!page))
+ return -ENOMEM;
+ if (WARN_ON(!pfn_valid(page_to_pfn(page))))
+ return -EINVAL;
+
+ phys_addr = page_to_phys(page);
+
+ if (vmap_try_huge_pmd(pmd, addr, next, phys_addr, prot,
+ shift)) {
+ *mask |= PGTBL_PMD_MODIFIED;
+ *nr += 1 << (shift - PAGE_SHIFT);
+ continue;
+ }
+ }
+
+ if (vmap_pages_pte_range(pmd, addr, next, prot, pages, nr, mask, shift))
return -ENOMEM;
} while (pmd++, addr = next, addr != end);
return 0;
@@ -605,7 +628,7 @@ static int vmap_pages_pmd_range(pud_t *pud, unsigned long addr,
static int vmap_pages_pud_range(p4d_t *p4d, unsigned long addr,
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
- pgtbl_mod_mask *mask)
+ pgtbl_mod_mask *mask, unsigned int shift)
{
pud_t *pud;
unsigned long next;
@@ -615,7 +638,7 @@ static int vmap_pages_pud_range(p4d_t *p4d, unsigned long addr,
return -ENOMEM;
do {
next = pud_addr_end(addr, end);
- if (vmap_pages_pmd_range(pud, addr, next, prot, pages, nr, mask))
+ if (vmap_pages_pmd_range(pud, addr, next, prot, pages, nr, mask, shift))
return -ENOMEM;
} while (pud++, addr = next, addr != end);
return 0;
@@ -623,7 +646,7 @@ static int vmap_pages_pud_range(p4d_t *p4d, unsigned long addr,
static int vmap_pages_p4d_range(pgd_t *pgd, unsigned long addr,
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
- pgtbl_mod_mask *mask)
+ pgtbl_mod_mask *mask, unsigned int shift)
{
p4d_t *p4d;
unsigned long next;
@@ -633,14 +656,14 @@ static int vmap_pages_p4d_range(pgd_t *pgd, unsigned long addr,
return -ENOMEM;
do {
next = p4d_addr_end(addr, end);
- if (vmap_pages_pud_range(p4d, addr, next, prot, pages, nr, mask))
+ if (vmap_pages_pud_range(p4d, addr, next, prot, pages, nr, mask, shift))
return -ENOMEM;
} while (p4d++, addr = next, addr != end);
return 0;
}
-static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
- pgprot_t prot, struct page **pages)
+static int vmap_pages_range_noflush_walk(unsigned long addr, unsigned long end,
+ pgprot_t prot, struct page **pages, unsigned int shift)
{
unsigned long start = addr;
pgd_t *pgd;
@@ -655,7 +678,7 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
next = pgd_addr_end(addr, end);
if (pgd_bad(*pgd))
mask |= PGTBL_PGD_MODIFIED;
- err = vmap_pages_p4d_range(pgd, addr, next, prot, pages, &nr, &mask);
+ err = vmap_pages_p4d_range(pgd, addr, next, prot, pages, &nr, &mask, shift);
if (err)
break;
} while (pgd++, addr = next, addr != end);
@@ -678,27 +701,13 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
pgprot_t prot, struct page **pages, unsigned int page_shift)
{
- unsigned int i, nr = (end - addr) >> PAGE_SHIFT;
-
WARN_ON(page_shift < PAGE_SHIFT);
- if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
- page_shift == PAGE_SHIFT)
- return vmap_small_pages_range_noflush(addr, end, prot, pages);
+ if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC))
+ page_shift = PAGE_SHIFT;
- for (i = 0; i < nr; i += 1U << (page_shift - PAGE_SHIFT)) {
- int err;
-
- err = vmap_range_noflush(addr, addr + (1UL << page_shift),
- page_to_phys(pages[i]), prot,
- page_shift);
- if (err)
- return err;
-
- addr += 1UL << page_shift;
- }
-
- return 0;
+ return vmap_pages_range_noflush_walk(addr, end, prot, pages,
+ min(page_shift, PMD_SHIFT));
}
int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
--
2.34.1
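Editorial sketch (not part of the patch): the removed loop in
__vmap_pages_range_noflush() called vmap_range_noflush() once per huge page,
and the commit message describes each such call as a page table rewalk. The
user-space model below only counts those calls and shows how far the page
array index advances per PMD mapping in the new code. The sk_ names and the
4K page size are assumptions.

```c
#include <assert.h>

/* Assumed geometry: 4K base pages. */
#define SK_PAGE_SHIFT 12

/*
 * Number of vmap_range_noflush() calls the removed loop would make for a
 * range of the given length, mirroring its index arithmetic.
 */
static unsigned int sk_old_walks(unsigned long bytes, unsigned int page_shift)
{
	unsigned int i, nr = (unsigned int)(bytes >> SK_PAGE_SHIFT);
	unsigned int walks = 0;

	assert(page_shift >= SK_PAGE_SHIFT);

	for (i = 0; i < nr; i += 1U << (page_shift - SK_PAGE_SHIFT))
		walks++;
	return walks;
}

/*
 * Entries of the pages[] array consumed by one mapping of the given shift,
 * as in "*nr += 1 << (shift - PAGE_SHIFT)".
 */
static unsigned int sk_pages_per_mapping(unsigned int shift)
{
	assert(shift >= SK_PAGE_SHIFT);
	return 1U << (shift - SK_PAGE_SHIFT);
}
```

Under these assumptions an 8M range mapped with a 2M page_shift took four
separate calls in the old loop, whereas the new code performs a single
vmap_pages_range_noflush_walk() and advances the page index by 512 entries
for each PMD mapping.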
* Re: [PATCH v3 4/6] mm/vmalloc: Extend page table walk to support larger page_shift sizes and eliminate page table rewalk
2026-05-22 5:31 ` [PATCH v3 4/6] mm/vmalloc: Extend page table walk to support larger page_shift sizes and eliminate page table rewalk Wen Jiang
@ 2026-05-27 5:58 ` Dev Jain
2026-05-28 3:39 ` Wen Jiang
0 siblings, 1 reply; 21+ messages in thread
From: Dev Jain @ 2026-05-27 5:58 UTC (permalink / raw)
To: Wen Jiang, linux-mm, linux-arm-kernel, catalin.marinas, will,
akpm, urezki
Cc: baohua, Xueyuan.chen21, rppt, david, ryan.roberts,
anshuman.khandual, ajd, linux-kernel, jiangwen6
On 22/05/26 11:01 am, Wen Jiang wrote:
> From: "Barry Song (Xiaomi)" <baohua@kernel.org>
>
> vmap_pages_range_noflush_walk() (formerly vmap_small_pages_range_noflush())
> provides a clean interface by taking struct page **pages and mapping them
> via direct PTE iteration. This avoids the page table rewalk seen when
> using vmap_range_noflush() for page_shift values other than PAGE_SHIFT.
>
> Extend it to support larger page_shift values, and add PMD- and
> contiguous-PTE mappings as well. Rename it to vmap_pages_range_noflush_walk()
> since it now handles more than just small pages.
>
> For vmalloc() allocations with VM_ALLOW_HUGE_VMAP, we no longer need to
> iterate over pages one by one via vmap_range_noflush(), which would
> otherwise lead to page table rewalk. The code is now unified with the
> PAGE_SHIFT case by simply calling vmap_pages_range_noflush_walk().
>
> Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> Signed-off-by: Wen Jiang <jiangwen6@xiaomi.com>
> Tested-by: Xueyuan Chen <xueyuan.chen21@gmail.com>
> ---
> mm/vmalloc.c | 71 +++++++++++++++++++++++++++++-----------------------
> 1 file changed, 40 insertions(+), 31 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 53fd4ee460ea4..deb764abc0571 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -543,8 +543,10 @@ void vunmap_range(unsigned long addr, unsigned long end)
>
> static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
> unsigned long end, pgprot_t prot, struct page **pages, int *nr,
> - pgtbl_mod_mask *mask)
> + pgtbl_mod_mask *mask, unsigned int shift)
> {
> + unsigned long pfn, size;
> + unsigned int steps;
> int err = 0;
> pte_t *pte;
>
> @@ -575,9 +577,10 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
> break;
> }
>
> - set_pte_at(&init_mm, addr, pte, mk_pte(page, prot));
> - (*nr)++;
> - } while (pte++, addr += PAGE_SIZE, addr != end);
> + pfn = page_to_pfn(page);
> + size = vmap_set_ptes(pte, addr, end, pfn, prot, shift);
> + steps = PFN_DOWN(size);
> + } while (pte += steps, *nr += steps, addr += size, addr != end);
>
> lazy_mmu_mode_disable();
> *mask |= PGTBL_PTE_MODIFIED;
> @@ -587,7 +590,7 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
>
> static int vmap_pages_pmd_range(pud_t *pud, unsigned long addr,
> unsigned long end, pgprot_t prot, struct page **pages, int *nr,
> - pgtbl_mod_mask *mask)
> + pgtbl_mod_mask *mask, unsigned int shift)
> {
> pmd_t *pmd;
> unsigned long next;
> @@ -597,7 +600,27 @@ static int vmap_pages_pmd_range(pud_t *pud, unsigned long addr,
> return -ENOMEM;
> do {
> next = pmd_addr_end(addr, end);
> - if (vmap_pages_pte_range(pmd, addr, next, prot, pages, nr, mask))
> +
> + if (shift == PMD_SHIFT) {
> + struct page *page = pages[*nr];
> + phys_addr_t phys_addr;
> +
> + if (WARN_ON(!page))
> + return -ENOMEM;
> + if (WARN_ON(!pfn_valid(page_to_pfn(page))))
> + return -EINVAL;
So I know these !page and !pfn_valid checks have been copied from vmap_pages_pte_range(),
but do they mean anything here?
I think pfn_valid() makes sense, since someone may take a random VA/PA, convert it into a
struct page and pass it to the vmap layer. But I don't see how anyone would pass page == NULL.
At the very least, returning -ENOMEM does not make sense, because the pages are not being
allocated by vmap(); they have already been allocated.
> +
> + phys_addr = page_to_phys(page);
> +
> + if (vmap_try_huge_pmd(pmd, addr, next, phys_addr, prot,
> + shift)) {
> + *mask |= PGTBL_PMD_MODIFIED;
> + *nr += 1 << (shift - PAGE_SHIFT);
> + continue;
> + }
> + }
> +
> + if (vmap_pages_pte_range(pmd, addr, next, prot, pages, nr, mask, shift))
> return -ENOMEM;
> } while (pmd++, addr = next, addr != end);
> return 0;
> @@ -605,7 +628,7 @@ static int vmap_pages_pmd_range(pud_t *pud, unsigned long addr,
>
> static int vmap_pages_pud_range(p4d_t *p4d, unsigned long addr,
> unsigned long end, pgprot_t prot, struct page **pages, int *nr,
> - pgtbl_mod_mask *mask)
> + pgtbl_mod_mask *mask, unsigned int shift)
> {
> pud_t *pud;
> unsigned long next;
> @@ -615,7 +638,7 @@ static int vmap_pages_pud_range(p4d_t *p4d, unsigned long addr,
> return -ENOMEM;
> do {
> next = pud_addr_end(addr, end);
> - if (vmap_pages_pmd_range(pud, addr, next, prot, pages, nr, mask))
> + if (vmap_pages_pmd_range(pud, addr, next, prot, pages, nr, mask, shift))
> return -ENOMEM;
> } while (pud++, addr = next, addr != end);
> return 0;
> @@ -623,7 +646,7 @@ static int vmap_pages_pud_range(p4d_t *p4d, unsigned long addr,
>
> static int vmap_pages_p4d_range(pgd_t *pgd, unsigned long addr,
> unsigned long end, pgprot_t prot, struct page **pages, int *nr,
> - pgtbl_mod_mask *mask)
> + pgtbl_mod_mask *mask, unsigned int shift)
> {
> p4d_t *p4d;
> unsigned long next;
> @@ -633,14 +656,14 @@ static int vmap_pages_p4d_range(pgd_t *pgd, unsigned long addr,
> return -ENOMEM;
> do {
> next = p4d_addr_end(addr, end);
> - if (vmap_pages_pud_range(p4d, addr, next, prot, pages, nr, mask))
> + if (vmap_pages_pud_range(p4d, addr, next, prot, pages, nr, mask, shift))
> return -ENOMEM;
> } while (p4d++, addr = next, addr != end);
> return 0;
> }
>
> -static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
> - pgprot_t prot, struct page **pages)
> +static int vmap_pages_range_noflush_walk(unsigned long addr, unsigned long end,
> + pgprot_t prot, struct page **pages, unsigned int shift)
> {
> unsigned long start = addr;
> pgd_t *pgd;
> @@ -655,7 +678,7 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
> next = pgd_addr_end(addr, end);
> if (pgd_bad(*pgd))
> mask |= PGTBL_PGD_MODIFIED;
> - err = vmap_pages_p4d_range(pgd, addr, next, prot, pages, &nr, &mask);
> + err = vmap_pages_p4d_range(pgd, addr, next, prot, pages, &nr, &mask, shift);
> if (err)
> break;
> } while (pgd++, addr = next, addr != end);
> @@ -678,27 +701,13 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
> int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
> pgprot_t prot, struct page **pages, unsigned int page_shift)
> {
> - unsigned int i, nr = (end - addr) >> PAGE_SHIFT;
> -
> WARN_ON(page_shift < PAGE_SHIFT);
>
> - if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
> - page_shift == PAGE_SHIFT)
> - return vmap_small_pages_range_noflush(addr, end, prot, pages);
> + if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC))
> + page_shift = PAGE_SHIFT;
>
> - for (i = 0; i < nr; i += 1U << (page_shift - PAGE_SHIFT)) {
> - int err;
> -
> - err = vmap_range_noflush(addr, addr + (1UL << page_shift),
> - page_to_phys(pages[i]), prot,
> - page_shift);
> - if (err)
> - return err;
> -
> - addr += 1UL << page_shift;
> - }
> -
> - return 0;
> + return vmap_pages_range_noflush_walk(addr, end, prot, pages,
> + min(page_shift, PMD_SHIFT));
We can easily extend this to PUD huge mappings, right? I am not sure whether we
should keep everything symmetric with how vmap_range_noflush() operates
right now, since P4D mappings don't exist, but PUD looks worthwhile.
> }
>
> int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
* Re: [PATCH v3 4/6] mm/vmalloc: Extend page table walk to support larger page_shift sizes and eliminate page table rewalk
2026-05-27 5:58 ` Dev Jain
@ 2026-05-28 3:39 ` Wen Jiang
2026-05-29 5:28 ` Dev Jain
0 siblings, 1 reply; 21+ messages in thread
From: Wen Jiang @ 2026-05-28 3:39 UTC (permalink / raw)
To: Dev Jain
Cc: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki,
baohua, Xueyuan.chen21, rppt, david, ryan.roberts,
anshuman.khandual, ajd, linux-kernel, jiangwen6
On Wed, 27 May 2026 at 13:59, Dev Jain <dev.jain@arm.com> wrote:
>
>
>
> On 22/05/26 11:01 am, Wen Jiang wrote:
> > From: "Barry Song (Xiaomi)" <baohua@kernel.org>
> >
> > vmap_pages_range_noflush_walk() (formerly vmap_small_pages_range_noflush())
> > provides a clean interface by taking struct page **pages and mapping them
> > via direct PTE iteration. This avoids the page table rewalk seen when
> > using vmap_range_noflush() for page_shift values other than PAGE_SHIFT.
> >
> > Extend it to support larger page_shift values, and add PMD- and
> > contiguous-PTE mappings as well. Rename it to vmap_pages_range_noflush_walk()
> > since it now handles more than just small pages.
> >
> > For vmalloc() allocations with VM_ALLOW_HUGE_VMAP, we no longer need to
> > iterate over pages one by one via vmap_range_noflush(), which would
> > otherwise lead to page table rewalk. The code is now unified with the
> > PAGE_SHIFT case by simply calling vmap_pages_range_noflush_walk().
> >
> > Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> > Signed-off-by: Wen Jiang <jiangwen6@xiaomi.com>
> > Tested-by: Xueyuan Chen <xueyuan.chen21@gmail.com>
> > ---
> > mm/vmalloc.c | 71 +++++++++++++++++++++++++++++-----------------------
> > 1 file changed, 40 insertions(+), 31 deletions(-)
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 53fd4ee460ea4..deb764abc0571 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -543,8 +543,10 @@ void vunmap_range(unsigned long addr, unsigned long end)
> >
> > static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
> > unsigned long end, pgprot_t prot, struct page **pages, int *nr,
> > - pgtbl_mod_mask *mask)
> > + pgtbl_mod_mask *mask, unsigned int shift)
> > {
> > + unsigned long pfn, size;
> > + unsigned int steps;
> > int err = 0;
> > pte_t *pte;
> >
> > @@ -575,9 +577,10 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
> > break;
> > }
> >
> > - set_pte_at(&init_mm, addr, pte, mk_pte(page, prot));
> > - (*nr)++;
> > - } while (pte++, addr += PAGE_SIZE, addr != end);
> > + pfn = page_to_pfn(page);
> > + size = vmap_set_ptes(pte, addr, end, pfn, prot, shift);
> > + steps = PFN_DOWN(size);
> > + } while (pte += steps, *nr += steps, addr += size, addr != end);
> >
> > lazy_mmu_mode_disable();
> > *mask |= PGTBL_PTE_MODIFIED;
> > @@ -587,7 +590,7 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
> >
> > static int vmap_pages_pmd_range(pud_t *pud, unsigned long addr,
> > unsigned long end, pgprot_t prot, struct page **pages, int *nr,
> > - pgtbl_mod_mask *mask)
> > + pgtbl_mod_mask *mask, unsigned int shift)
> > {
> > pmd_t *pmd;
> > unsigned long next;
> > @@ -597,7 +600,27 @@ static int vmap_pages_pmd_range(pud_t *pud, unsigned long addr,
> > return -ENOMEM;
> > do {
> > next = pmd_addr_end(addr, end);
> > - if (vmap_pages_pte_range(pmd, addr, next, prot, pages, nr, mask))
> > +
> > + if (shift == PMD_SHIFT) {
> > + struct page *page = pages[*nr];
> > + phys_addr_t phys_addr;
> > +
> > + if (WARN_ON(!page))
> > + return -ENOMEM;
> > + if (WARN_ON(!pfn_valid(page_to_pfn(page))))
> > + return -EINVAL;
>
>
> So I know these !page and !pfn_valid checks have been copied from vmap_pages_pte_range,
> but do they mean anything?
>
> I think pfn_valid() makes sense in that someone may take a random VA/PA, convert it into a struct
> page and pass to vmap layer. But I don't see how anyone would pass page == NULL? At the
> very least, returning ENOMEM does not make sense because the pages are not being
> allocated by vmap() but have already been allocated.
Hi Dev,

vmap() is EXPORT_SYMBOL with many callers across drivers, each
constructing the pages array differently. The !page check guards
against malformed arrays at this API boundary.

The same -ENOMEM issue also exists in vmap_pages_pte_range().
Should I fix both in this patchset or leave it as a separate cleanup?
>
> > +
> > + phys_addr = page_to_phys(page);
> > +
> > + if (vmap_try_huge_pmd(pmd, addr, next, phys_addr, prot,
> > + shift)) {
> > + *mask |= PGTBL_PMD_MODIFIED;
> > + *nr += 1 << (shift - PAGE_SHIFT);
> > + continue;
> > + }
> > + }
> > +
> > + if (vmap_pages_pte_range(pmd, addr, next, prot, pages, nr, mask, shift))
> > return -ENOMEM;
> > } while (pmd++, addr = next, addr != end);
> > return 0;
> > @@ -605,7 +628,7 @@ static int vmap_pages_pmd_range(pud_t *pud, unsigned long addr,
> >
> > static int vmap_pages_pud_range(p4d_t *p4d, unsigned long addr,
> > unsigned long end, pgprot_t prot, struct page **pages, int *nr,
> > - pgtbl_mod_mask *mask)
> > + pgtbl_mod_mask *mask, unsigned int shift)
> > {
> > pud_t *pud;
> > unsigned long next;
> > @@ -615,7 +638,7 @@ static int vmap_pages_pud_range(p4d_t *p4d, unsigned long addr,
> > return -ENOMEM;
> > do {
> > next = pud_addr_end(addr, end);
> > - if (vmap_pages_pmd_range(pud, addr, next, prot, pages, nr, mask))
> > + if (vmap_pages_pmd_range(pud, addr, next, prot, pages, nr, mask, shift))
> > return -ENOMEM;
> > } while (pud++, addr = next, addr != end);
> > return 0;
> > @@ -623,7 +646,7 @@ static int vmap_pages_pud_range(p4d_t *p4d, unsigned long addr,
> >
> > static int vmap_pages_p4d_range(pgd_t *pgd, unsigned long addr,
> > unsigned long end, pgprot_t prot, struct page **pages, int *nr,
> > - pgtbl_mod_mask *mask)
> > + pgtbl_mod_mask *mask, unsigned int shift)
> > {
> > p4d_t *p4d;
> > unsigned long next;
> > @@ -633,14 +656,14 @@ static int vmap_pages_p4d_range(pgd_t *pgd, unsigned long addr,
> > return -ENOMEM;
> > do {
> > next = p4d_addr_end(addr, end);
> > - if (vmap_pages_pud_range(p4d, addr, next, prot, pages, nr, mask))
> > + if (vmap_pages_pud_range(p4d, addr, next, prot, pages, nr, mask, shift))
> > return -ENOMEM;
> > } while (p4d++, addr = next, addr != end);
> > return 0;
> > }
> >
> > -static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
> > - pgprot_t prot, struct page **pages)
> > +static int vmap_pages_range_noflush_walk(unsigned long addr, unsigned long end,
> > + pgprot_t prot, struct page **pages, unsigned int shift)
> > {
> > unsigned long start = addr;
> > pgd_t *pgd;
> > @@ -655,7 +678,7 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
> > next = pgd_addr_end(addr, end);
> > if (pgd_bad(*pgd))
> > mask |= PGTBL_PGD_MODIFIED;
> > - err = vmap_pages_p4d_range(pgd, addr, next, prot, pages, &nr, &mask);
> > + err = vmap_pages_p4d_range(pgd, addr, next, prot, pages, &nr, &mask, shift);
> > if (err)
> > break;
> > } while (pgd++, addr = next, addr != end);
> > @@ -678,27 +701,13 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
> > int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
> > pgprot_t prot, struct page **pages, unsigned int page_shift)
> > {
> > - unsigned int i, nr = (end - addr) >> PAGE_SHIFT;
> > -
> > WARN_ON(page_shift < PAGE_SHIFT);
> >
> > - if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
> > - page_shift == PAGE_SHIFT)
> > - return vmap_small_pages_range_noflush(addr, end, prot, pages);
> > + if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC))
> > + page_shift = PAGE_SHIFT;
> >
> > - for (i = 0; i < nr; i += 1U << (page_shift - PAGE_SHIFT)) {
> > - int err;
> > -
> > - err = vmap_range_noflush(addr, addr + (1UL << page_shift),
> > - page_to_phys(pages[i]), prot,
> > - page_shift);
> > - if (err)
> > - return err;
> > -
> > - addr += 1UL << page_shift;
> > - }
> > -
> > - return 0;
> > + return vmap_pages_range_noflush_walk(addr, end, prot, pages,
> > + min(page_shift, PMD_SHIFT));
>
>
> We can easily extend to PUD huge mappings right? Not sure whether we
> should keep everything symmetric to how vmap_range_noflush() operates
> right now, since P4D mappings don't exist, but PUD looks worthwhile.
>
PUD mapping requires 1GB of contiguous physical memory, but the buddy
allocator's MAX_PAGE_ORDER is 10 (4MB on 4K pages). So page_shift
passed to vmap_pages_range_noflush_walk() never exceeds PMD_SHIFT.
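
A quick sanity check of these numbers, as a sketch only: the constants
below are hard-coded assumptions for a 4K-page configuration
(PAGE_SHIFT 12, PMD_SHIFT 21, PUD_SHIFT 30, MAX_PAGE_ORDER 10) rather
than values read from kernel headers, and max_buddy_shift() and
buddy_can_back() are hypothetical helper names, not kernel functions.

```c
#include <assert.h>

/* Assumed values for a 4K-page configuration; hard-coded for illustration. */
#define PAGE_SHIFT      12
#define PMD_SHIFT       21
#define PUD_SHIFT       30
#define MAX_PAGE_ORDER  10

/* Largest shift that a single buddy allocation can back. */
static inline unsigned int max_buddy_shift(void)
{
	return PAGE_SHIFT + MAX_PAGE_ORDER;
}

/* Returns 1 if one buddy block can fully cover a mapping of this shift. */
static inline int buddy_can_back(unsigned int shift)
{
	assert(shift >= PAGE_SHIFT);
	return max_buddy_shift() >= shift;
}
```

Under these assumptions the largest buddy block has shift 22 (4MB), so
it can back a PMD-sized (2MB) mapping but not a PUD-sized (1GB) one.
Other page sizes change all four constants, so the conclusion would
need to be rechecked there.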
Thanks,
Wen
> > }
> >
> > int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 4/6] mm/vmalloc: Extend page table walk to support larger page_shift sizes and eliminate page table rewalk
2026-05-28 3:39 ` Wen Jiang
@ 2026-05-29 5:28 ` Dev Jain
0 siblings, 0 replies; 21+ messages in thread
From: Dev Jain @ 2026-05-29 5:28 UTC (permalink / raw)
To: Wen Jiang
Cc: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki,
baohua, Xueyuan.chen21, rppt, david, ryan.roberts,
anshuman.khandual, ajd, linux-kernel, jiangwen6
On 28/05/26 9:09 am, Wen Jiang wrote:
> On Wed, 27 May 2026 at 13:59, Dev Jain <dev.jain@arm.com> wrote:
>>
>>
>>
>> On 22/05/26 11:01 am, Wen Jiang wrote:
>> From: "Barry Song (Xiaomi)" <baohua@kernel.org>
>>
>> vmap_pages_range_noflush_walk() (formerly vmap_small_pages_range_noflush())
>> provides a clean interface by taking struct page **pages and mapping them
>> via direct PTE iteration. This avoids the page table rewalk seen when
>> using vmap_range_noflush() for page_shift values other than PAGE_SHIFT.
>>
>> Extend it to support larger page_shift values, and add PMD- and
>> contiguous-PTE mappings as well. Rename it to vmap_pages_range_noflush_walk()
>> since it now handles more than just small pages.
>>
>> For vmalloc() allocations with VM_ALLOW_HUGE_VMAP, we no longer need to
>> iterate over pages one by one via vmap_range_noflush(), which would
>> otherwise lead to page table rewalk. The code is now unified with the
>> PAGE_SHIFT case by simply calling vmap_pages_range_noflush_walk().
>>
>> Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
>> Signed-off-by: Wen Jiang <jiangwen6@xiaomi.com>
>> Tested-by: Xueyuan Chen <xueyuan.chen21@gmail.com>
>> ---
>> mm/vmalloc.c | 71 +++++++++++++++++++++++++++++-----------------------
>> 1 file changed, 40 insertions(+), 31 deletions(-)
>>
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index 53fd4ee460ea4..deb764abc0571 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -543,8 +543,10 @@ void vunmap_range(unsigned long addr, unsigned long end)
>>
>> static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
>> unsigned long end, pgprot_t prot, struct page **pages, int *nr,
>> - pgtbl_mod_mask *mask)
>> + pgtbl_mod_mask *mask, unsigned int shift)
>> {
>> + unsigned long pfn, size;
>> + unsigned int steps;
>> int err = 0;
>> pte_t *pte;
>>
>> @@ -575,9 +577,10 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
>> break;
>> }
>>
>> - set_pte_at(&init_mm, addr, pte, mk_pte(page, prot));
>> - (*nr)++;
>> - } while (pte++, addr += PAGE_SIZE, addr != end);
>> + pfn = page_to_pfn(page);
>> + size = vmap_set_ptes(pte, addr, end, pfn, prot, shift);
>> + steps = PFN_DOWN(size);
>> + } while (pte += steps, *nr += steps, addr += size, addr != end);
>>
>> lazy_mmu_mode_disable();
>> *mask |= PGTBL_PTE_MODIFIED;
>> @@ -587,7 +590,7 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
>>
>> static int vmap_pages_pmd_range(pud_t *pud, unsigned long addr,
>> unsigned long end, pgprot_t prot, struct page **pages, int *nr,
>> - pgtbl_mod_mask *mask)
>> + pgtbl_mod_mask *mask, unsigned int shift)
>> {
>> pmd_t *pmd;
>> unsigned long next;
>>> @@ -597,7 +600,27 @@ static int vmap_pages_pmd_range(pud_t *pud, unsigned long addr,
>>> return -ENOMEM;
>>> do {
>>> next = pmd_addr_end(addr, end);
>>> - if (vmap_pages_pte_range(pmd, addr, next, prot, pages, nr, mask))
>>> +
>>> + if (shift == PMD_SHIFT) {
>>> + struct page *page = pages[*nr];
>>> + phys_addr_t phys_addr;
>>> +
>>> + if (WARN_ON(!page))
>>> + return -ENOMEM;
>>> + if (WARN_ON(!pfn_valid(page_to_pfn(page))))
>>> + return -EINVAL;
>>
>>
>> So I know these !page and !pfn_valid checks have been copied from vmap_pages_pte_range,
>> but do they mean anything?
>>
>> I think pfn_valid() makes sense in that someone may take a random VA/PA, convert it into a struct
>> page and pass to vmap layer. But I don't see how anyone would pass page == NULL? At the
>> very least, returning ENOMEM does not make sense because the pages are not being
>> allocated by vmap() but have already been allocated.
>
> Hi Dev,
>
> vmap() is EXPORT_SYMBOL with many callers across drivers, each
> constructing the pages array differently. The !page check guards
> against malformed arrays at this API boundary.
>
> The same -ENOMEM issue also exists in vmap_pages_pte_range().
> Should I fix both in this patchset or leave it as a separate cleanup?
Hmm - I think this is the issue of who validates what. I think the change
should be done but this is not an urgent issue, so let us leave this
for now.
>
>>
>>> +
>>> + phys_addr = page_to_phys(page);
>>> +
>>> + if (vmap_try_huge_pmd(pmd, addr, next, phys_addr, prot,
>>> + shift)) {
>>> + *mask |= PGTBL_PMD_MODIFIED;
>>> + *nr += 1 << (shift - PAGE_SHIFT);
>>> + continue;
>>> + }
>>> + }
>>> +
>>> + if (vmap_pages_pte_range(pmd, addr, next, prot, pages, nr, mask, shift))
>>> return -ENOMEM;
>>> } while (pmd++, addr = next, addr != end);
>>> return 0;
>>> @@ -605,7 +628,7 @@ static int vmap_pages_pmd_range(pud_t *pud, unsigned long addr,
>>>
>>> static int vmap_pages_pud_range(p4d_t *p4d, unsigned long addr,
>>> unsigned long end, pgprot_t prot, struct page **pages, int *nr,
>>> - pgtbl_mod_mask *mask)
>>> + pgtbl_mod_mask *mask, unsigned int shift)
>>> {
>>> pud_t *pud;
>>> unsigned long next;
>>> @@ -615,7 +638,7 @@ static int vmap_pages_pud_range(p4d_t *p4d, unsigned long addr,
>>> return -ENOMEM;
>>> do {
>>> next = pud_addr_end(addr, end);
>>> - if (vmap_pages_pmd_range(pud, addr, next, prot, pages, nr, mask))
>>> + if (vmap_pages_pmd_range(pud, addr, next, prot, pages, nr, mask, shift))
>>> return -ENOMEM;
>>> } while (pud++, addr = next, addr != end);
>>> return 0;
>>> @@ -623,7 +646,7 @@ static int vmap_pages_pud_range(p4d_t *p4d, unsigned long addr,
>>>
>>> static int vmap_pages_p4d_range(pgd_t *pgd, unsigned long addr,
>>> unsigned long end, pgprot_t prot, struct page **pages, int *nr,
>>> - pgtbl_mod_mask *mask)
>>> + pgtbl_mod_mask *mask, unsigned int shift)
>>> {
>>> p4d_t *p4d;
>>> unsigned long next;
>>> @@ -633,14 +656,14 @@ static int vmap_pages_p4d_range(pgd_t *pgd, unsigned long addr,
>>> return -ENOMEM;
>>> do {
>>> next = p4d_addr_end(addr, end);
>>> - if (vmap_pages_pud_range(p4d, addr, next, prot, pages, nr, mask))
>>> + if (vmap_pages_pud_range(p4d, addr, next, prot, pages, nr, mask, shift))
>>> return -ENOMEM;
>>> } while (p4d++, addr = next, addr != end);
>>> return 0;
>>> }
>>>
>>> -static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
>>> - pgprot_t prot, struct page **pages)
>>> +static int vmap_pages_range_noflush_walk(unsigned long addr, unsigned long end,
>>> + pgprot_t prot, struct page **pages, unsigned int shift)
>>> {
>>> unsigned long start = addr;
>>> pgd_t *pgd;
>>> @@ -655,7 +678,7 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
>>> next = pgd_addr_end(addr, end);
>>> if (pgd_bad(*pgd))
>>> mask |= PGTBL_PGD_MODIFIED;
>>> - err = vmap_pages_p4d_range(pgd, addr, next, prot, pages, &nr, &mask);
>>> + err = vmap_pages_p4d_range(pgd, addr, next, prot, pages, &nr, &mask, shift);
>>> if (err)
>>> break;
>>> } while (pgd++, addr = next, addr != end);
>>> @@ -678,27 +701,13 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
>>> int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
>>> pgprot_t prot, struct page **pages, unsigned int page_shift)
>>> {
>>> - unsigned int i, nr = (end - addr) >> PAGE_SHIFT;
>>> -
>>> WARN_ON(page_shift < PAGE_SHIFT);
>>>
>>> - if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
>>> - page_shift == PAGE_SHIFT)
>>> - return vmap_small_pages_range_noflush(addr, end, prot, pages);
>>> + if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC))
>>> + page_shift = PAGE_SHIFT;
>>>
>>> - for (i = 0; i < nr; i += 1U << (page_shift - PAGE_SHIFT)) {
>>> - int err;
>>> -
>>> - err = vmap_range_noflush(addr, addr + (1UL << page_shift),
>>> - page_to_phys(pages[i]), prot,
>>> - page_shift);
>>> - if (err)
>>> - return err;
>>> -
>>> - addr += 1UL << page_shift;
>>> - }
>>> -
>>> - return 0;
>>> + return vmap_pages_range_noflush_walk(addr, end, prot, pages,
>>> + min(page_shift, PMD_SHIFT));
>>
>>
>> We can easily extend to PUD huge mappings right? Not sure whether we
>> should keep everything symmetric to how vmap_range_noflush() operates
>> right now, since P4D mappings don't exist, but PUD looks worthwhile.
>>
>
> PUD mapping requires 1GB of contiguous physical memory, but the buddy
> allocator's MAX_PAGE_ORDER is 10 (4MB on 4K pages). So page_shift
> passed to vmap_pages_range_noflush_walk() never exceeds PMD_SHIFT.
Ah okay. It is really confusing to see PUD helpers everywhere when
they are basically dead code ...
>
> Thanks,
> Wen
>>> }
>>>
>>> int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
>>
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v3 5/6] mm/vmalloc: map contiguous pages in batches for vmap() if possible
2026-05-22 5:31 [PATCH v3 0/6] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory Wen Jiang
` (3 preceding siblings ...)
2026-05-22 5:31 ` [PATCH v3 4/6] mm/vmalloc: Extend page table walk to support larger page_shift sizes and eliminate page table rewalk Wen Jiang
@ 2026-05-22 5:31 ` Wen Jiang
2026-05-27 8:27 ` Dev Jain
2026-05-22 5:31 ` [PATCH v3 6/6] mm/vmalloc: align vm_area so vmap() can batch mappings Wen Jiang
2026-05-22 18:07 ` [PATCH v3 0/6] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory Andrew Morton
6 siblings, 1 reply; 21+ messages in thread
From: Wen Jiang @ 2026-05-22 5:31 UTC (permalink / raw)
To: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki
Cc: baohua, Xueyuan.chen21, dev.jain, rppt, david, ryan.roberts,
anshuman.khandual, ajd, linux-kernel, jiangwen6
From: "Barry Song (Xiaomi)" <baohua@kernel.org>

In many cases, the pages passed to vmap() may include high-order
pages. For example, the systemheap often allocates pages in descending
order: order 8, then 4, then 0. Currently, vmap() iterates over every
page individually—even pages inside a high-order block are handled
one by one.

This patch detects physically contiguous pages (regardless of whether
they are compound or non-compound) by scanning with
num_pages_contiguous(), and maps them as a single contiguous block
whenever possible. The first page's pfn must be aligned to the
mapping order for the batched mapping to be used.

Pages with the same page_shift are coalesced and mapped via
vmap_pages_range_noflush_walk() to avoid page table rewalk.

As users typically allocate memory in descending orders (e.g.
8 → 4 → 0), once an order-0 page is encountered, we stop scanning
for contiguous pages since subsequent pages are likely order-0 as well.

Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
Co-developed-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Wen Jiang <jiangwen6@xiaomi.com>
Tested-by: Xueyuan Chen <xueyuan.chen21@gmail.com>
---
mm/vmalloc.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 80 insertions(+), 2 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index deb764abc0571..50642246f4d40 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3542,6 +3542,84 @@ void vunmap(const void *addr)
}
EXPORT_SYMBOL(vunmap);
+static inline int get_vmap_batch_order(struct page **pages,
+ unsigned int max_steps, unsigned int idx)
+{
+ unsigned int nr_contig;
+ int order;
+
+ if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP) ||
+ ioremap_max_page_shift == PAGE_SHIFT)
+ return 0;
+
+ nr_contig = num_pages_contiguous(&pages[idx], max_steps);
+ if (nr_contig < 2)
+ return 0;
+
+ order = fls(nr_contig) - 1;
+
+ if (arch_vmap_pte_supported_shift(PAGE_SIZE << order) == PAGE_SHIFT)
+ return 0;
+
+ /* Ensure the first page's pfn is aligned to the order */
+ if (!IS_ALIGNED(page_to_pfn(pages[idx]), 1 << order))
+ return 0;
+
+ return order;
+}
+
+static int vmap_batched(unsigned long addr, unsigned long end,
+ pgprot_t prot, struct page **pages)
+{
+ unsigned int count = (end - addr) >> PAGE_SHIFT;
+ unsigned int prev_shift = 0, idx = 0;
+ unsigned long start = addr, map_addr = addr;
+ int err;
+
+ err = kmsan_vmap_pages_range_noflush(addr, end, prot, pages,
+ PAGE_SHIFT, GFP_KERNEL);
+ if (err)
+ goto out;
+
+ for (unsigned int i = 0; i < count; ) {
+ unsigned int shift = PAGE_SHIFT +
+ get_vmap_batch_order(pages, count - i, i);
+
+ if (!i)
+ prev_shift = shift;
+
+ if (shift != prev_shift) {
+ err = vmap_pages_range_noflush_walk(map_addr, addr,
+ prot, pages + idx,
+ min(prev_shift, PMD_SHIFT));
+ if (err)
+ goto out;
+ prev_shift = shift;
+ map_addr = addr;
+ idx = i;
+ }
+
+ /*
+ * Once small pages are encountered, the remaining pages
+ * are likely small as well.
+ */
+ if (shift == PAGE_SHIFT)
+ break;
+
+ addr += 1UL << shift;
+ i += 1U << (shift - PAGE_SHIFT);
+ }
+
+ /* Remaining */
+ if (map_addr < end)
+ err = vmap_pages_range_noflush_walk(map_addr, end,
+ prot, pages + idx, min(prev_shift, PMD_SHIFT));
+
+out:
+ flush_cache_vmap(start, end);
+ return err;
+}
+
/**
* vmap - map an array of pages into virtually contiguous space
* @pages: array of page pointers
@@ -3585,8 +3663,8 @@ void *vmap(struct page **pages, unsigned int count,
return NULL;
addr = (unsigned long)area->addr;
- if (vmap_pages_range(addr, addr + size, pgprot_nx(prot),
- pages, PAGE_SHIFT) < 0) {
+ if (vmap_batched(addr, addr + size, pgprot_nx(prot),
+ pages) < 0) {
vunmap(area->addr);
return NULL;
}
--
2.34.1
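
The order selection in get_vmap_batch_order() above can be sketched in
userspace C. This is a simplified illustration, not the kernel code: it
works on an array of pfns instead of struct page pointers,
count_contig() is a hypothetical stand-in for num_pages_contiguous(),
floor_log2() stands in for fls() - 1, and the CONFIG_HAVE_ARCH_HUGE_VMAP,
ioremap_max_page_shift and arch_vmap_pte_supported_shift() checks are
left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for num_pages_contiguous(): number of leading
 * entries in pfns[] that are consecutive page frame numbers. */
static inline size_t count_contig(const unsigned long *pfns, size_t n)
{
	size_t i;

	if (n == 0)
		return 0;
	for (i = 1; i < n; i++)
		if (pfns[i] != pfns[0] + i)
			break;
	return i;
}

/* floor(log2(x)) for x > 0; plays the role of fls(x) - 1. */
static inline int floor_log2(size_t x)
{
	int order = -1;

	assert(x > 0);
	while (x) {
		x >>= 1;
		order++;
	}
	return order;
}

/* Order of the batch starting at pfns[0], or 0 for a single page. */
static inline int batch_order(const unsigned long *pfns, size_t n)
{
	size_t nr = count_contig(pfns, n);
	int order;

	if (nr < 2)
		return 0;

	/* Round the contiguous run down to a power of two. */
	order = floor_log2(nr);

	/* The first pfn must be aligned to the order, as in the patch. */
	if (pfns[0] & ((1UL << order) - 1))
		return 0;

	return order;
}
```

As in the patch, a misaligned first pfn falls back to order 0 instead
of retrying with a smaller order, so a run such as pfns 17..20 is
mapped page by page even though 18..19 would qualify for order 1.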
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v3 5/6] mm/vmalloc: map contiguous pages in batches for vmap() if possible
2026-05-22 5:31 ` [PATCH v3 5/6] mm/vmalloc: map contiguous pages in batches for vmap() if possible Wen Jiang
@ 2026-05-27 8:27 ` Dev Jain
2026-05-28 3:42 ` Wen Jiang
0 siblings, 1 reply; 21+ messages in thread
From: Dev Jain @ 2026-05-27 8:27 UTC (permalink / raw)
To: Wen Jiang, linux-mm, linux-arm-kernel, catalin.marinas, will,
akpm, urezki
Cc: baohua, Xueyuan.chen21, rppt, david, ryan.roberts,
anshuman.khandual, ajd, linux-kernel, jiangwen6
On 22/05/26 11:01 am, Wen Jiang wrote:
> From: "Barry Song (Xiaomi)" <baohua@kernel.org>
>
> In many cases, the pages passed to vmap() may include high-order
> pages. For example, the systemheap often allocates pages in descending
> order: order 8, then 4, then 0. Currently, vmap() iterates over every
> page individually—even pages inside a high-order block are handled
> one by one.
>
> This patch detects physically contiguous pages (regardless of whether
> they are compound or non-compound) by scanning with
> num_pages_contiguous(), and maps them as a single contiguous block
> whenever possible. The first page's pfn must be aligned to the
> mapping order for the batched mapping to be used.
>
> Pages with the same page_shift are coalesced and mapped via
> vmap_pages_range_noflush_walk() to avoid page table rewalk.
>
> As users typically allocate memory in descending orders (e.g.
> 8 → 4 → 0), once an order-0 page is encountered, we stop scanning
> for contiguous pages since subsequent pages are likely order-0 as well.
>
> Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> Co-developed-by: Dev Jain <dev.jain@arm.com>
> Signed-off-by: Dev Jain <dev.jain@arm.com>
> Signed-off-by: Wen Jiang <jiangwen6@xiaomi.com>
> Tested-by: Xueyuan Chen <xueyuan.chen21@gmail.com>
> ---
> mm/vmalloc.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 80 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index deb764abc0571..50642246f4d40 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3542,6 +3542,84 @@ void vunmap(const void *addr)
> }
> EXPORT_SYMBOL(vunmap);
>
> +static inline int get_vmap_batch_order(struct page **pages,
> + unsigned int max_steps, unsigned int idx)
> +{
> + unsigned int nr_contig;
> + int order;
> +
> + if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP) ||
> + ioremap_max_page_shift == PAGE_SHIFT)
Why bail out on ioremap_max_page_shift == PAGE_SHIFT? The code
path for ioremap is different from vmap right?
> + return 0;
> +
> + nr_contig = num_pages_contiguous(&pages[idx], max_steps);
> + if (nr_contig < 2)
> + return 0;
> +
> + order = fls(nr_contig) - 1;
> +
> + if (arch_vmap_pte_supported_shift(PAGE_SIZE << order) == PAGE_SHIFT)
> + return 0;
> +
> + /* Ensure the first page's pfn is aligned to the order */
> + if (!IS_ALIGNED(page_to_pfn(pages[idx]), 1 << order))
> + return 0;
> +
> + return order;
> +}
> +
> +static int vmap_batched(unsigned long addr, unsigned long end,
> + pgprot_t prot, struct page **pages)
> +{
> + unsigned int count = (end - addr) >> PAGE_SHIFT;
> + unsigned int prev_shift = 0, idx = 0;
> + unsigned long start = addr, map_addr = addr;
> + int err;
> +
> + err = kmsan_vmap_pages_range_noflush(addr, end, prot, pages,
> + PAGE_SHIFT, GFP_KERNEL);
> + if (err)
> + goto out;
> +
> + for (unsigned int i = 0; i < count; ) {
> + unsigned int shift = PAGE_SHIFT +
> + get_vmap_batch_order(pages, count - i, i);
> +
> + if (!i)
> + prev_shift = shift;
> +
> + if (shift != prev_shift) {
> + err = vmap_pages_range_noflush_walk(map_addr, addr,
It would be worth documenting vmap_pages_range_noflush_walk() that
it can take an array of pages which are not all contiguous, but it
may have contiguous chunks, as hinted by page_shift.

Otherwise this looks good.
> + prot, pages + idx,
> + min(prev_shift, PMD_SHIFT));
> + if (err)
> + goto out;
> + prev_shift = shift;
> + map_addr = addr;
> + idx = i;
> + }
> +
> + /*
> + * Once small pages are encountered, the remaining pages
> + * are likely small as well.
> + */
> + if (shift == PAGE_SHIFT)
> + break;
> +
> + addr += 1UL << shift;
> + i += 1U << (shift - PAGE_SHIFT);
> + }
> +
> + /* Remaining */
> + if (map_addr < end)
> + err = vmap_pages_range_noflush_walk(map_addr, end,
> + prot, pages + idx, min(prev_shift, PMD_SHIFT));
> +
> +out:
> + flush_cache_vmap(start, end);
> + return err;
> +}
> +
> /**
> * vmap - map an array of pages into virtually contiguous space
> * @pages: array of page pointers
> @@ -3585,8 +3663,8 @@ void *vmap(struct page **pages, unsigned int count,
> return NULL;
>
> addr = (unsigned long)area->addr;
> - if (vmap_pages_range(addr, addr + size, pgprot_nx(prot),
> - pages, PAGE_SHIFT) < 0) {
> + if (vmap_batched(addr, addr + size, pgprot_nx(prot),
> + pages) < 0) {
> vunmap(area->addr);
> return NULL;
> }
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 5/6] mm/vmalloc: map contiguous pages in batches for vmap() if possible
2026-05-27 8:27 ` Dev Jain
@ 2026-05-28 3:42 ` Wen Jiang
2026-05-29 5:57 ` Dev Jain
0 siblings, 1 reply; 21+ messages in thread
From: Wen Jiang @ 2026-05-28 3:42 UTC (permalink / raw)
To: Dev Jain
Cc: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki,
baohua, Xueyuan.chen21, rppt, david, ryan.roberts,
anshuman.khandual, ajd, linux-kernel, jiangwen6
On Wed, 27 May 2026 at 16:28, Dev Jain <dev.jain@arm.com> wrote:
>
>
>
> On 22/05/26 11:01 am, Wen Jiang wrote:
> > From: "Barry Song (Xiaomi)" <baohua@kernel.org>
> >
> > In many cases, the pages passed to vmap() may include high-order
> > pages. For example, the systemheap often allocates pages in descending
> > order: order 8, then 4, then 0. Currently, vmap() iterates over every
> > page individually—even pages inside a high-order block are handled
> > one by one.
> >
> > This patch detects physically contiguous pages (regardless of whether
> > they are compound or non-compound) by scanning with
> > num_pages_contiguous(), and maps them as a single contiguous block
> > whenever possible. The first page's pfn must be aligned to the
> > mapping order for the batched mapping to be used.
> >
> > Pages with the same page_shift are coalesced and mapped via
> > vmap_pages_range_noflush_walk() to avoid page table rewalk.
> >
> > As users typically allocate memory in descending orders (e.g.
> > 8 → 4 → 0), once an order-0 page is encountered, we stop scanning
> > for contiguous pages since subsequent pages are likely order-0 as well.
> >
> > Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> > Co-developed-by: Dev Jain <dev.jain@arm.com>
> > Signed-off-by: Dev Jain <dev.jain@arm.com>
> > Signed-off-by: Wen Jiang <jiangwen6@xiaomi.com>
> > Tested-by: Xueyuan Chen <xueyuan.chen21@gmail.com>
> > ---
> > mm/vmalloc.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++--
> > 1 file changed, 80 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index deb764abc0571..50642246f4d40 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -3542,6 +3542,84 @@ void vunmap(const void *addr)
> > }
> > EXPORT_SYMBOL(vunmap);
> >
> > +static inline int get_vmap_batch_order(struct page **pages,
> > + unsigned int max_steps, unsigned int idx)
> > +{
> > + unsigned int nr_contig;
> > + int order;
> > +
> > + if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP) ||
> > + ioremap_max_page_shift == PAGE_SHIFT)
>
>
> Why bail out on ioremap_max_page_shift == PAGE_SHIFT? The code
> path for ioremap is different from vmap right?
>
>
ioremap_max_page_shift is under CONFIG_HAVE_ARCH_HUGE_VMAP which
controls both ioremap and vmap huge mappings.
> > + return 0;
> > +
> > + nr_contig = num_pages_contiguous(&pages[idx], max_steps);
> > + if (nr_contig < 2)
> > + return 0;
> > +
> > + order = fls(nr_contig) - 1;
> > +
> > + if (arch_vmap_pte_supported_shift(PAGE_SIZE << order) == PAGE_SHIFT)
> > + return 0;
> > +
> > + /* Ensure the first page's pfn is aligned to the order */
> > + if (!IS_ALIGNED(page_to_pfn(pages[idx]), 1 << order))
> > + return 0;
> > +
> > + return order;
> > +}
> > +
> > +static int vmap_batched(unsigned long addr, unsigned long end,
> > + pgprot_t prot, struct page **pages)
> > +{
> > + unsigned int count = (end - addr) >> PAGE_SHIFT;
> > + unsigned int prev_shift = 0, idx = 0;
> > + unsigned long start = addr, map_addr = addr;
> > + int err;
> > +
> > + err = kmsan_vmap_pages_range_noflush(addr, end, prot, pages,
> > + PAGE_SHIFT, GFP_KERNEL);
> > + if (err)
> > + goto out;
> > +
> > + for (unsigned int i = 0; i < count; ) {
> > + unsigned int shift = PAGE_SHIFT +
> > + get_vmap_batch_order(pages, count - i, i);
> > +
> > + if (!i)
> > + prev_shift = shift;
> > +
> > + if (shift != prev_shift) {
> > + err = vmap_pages_range_noflush_walk(map_addr, addr,
>
> It would be worth documenting vmap_pages_range_noflush_walk() that
> it can take an array of pages which are not all contiguous, but it
> may have contiguous chunks, as hinted by page_shift.
>
> Otherwise this looks good.
>
> > + prot, pages + idx,
> > + min(prev_shift, PMD_SHIFT));
> > + if (err)
> > + goto out;
> > + prev_shift = shift;
> > + map_addr = addr;
> > + idx = i;
> > + }
> > +
> > + /*
> > + * Once small pages are encountered, the remaining pages
> > + * are likely small as well.
> > + */
> > + if (shift == PAGE_SHIFT)
> > + break;
> > +
> > + addr += 1UL << shift;
> > + i += 1U << (shift - PAGE_SHIFT);
> > + }
> > +
> > + /* Remaining */
> > + if (map_addr < end)
> > + err = vmap_pages_range_noflush_walk(map_addr, end,
> > + prot, pages + idx, min(prev_shift, PMD_SHIFT));
> > +
> > +out:
> > + flush_cache_vmap(start, end);
> > + return err;
> > +}
> > +
> > /**
> > * vmap - map an array of pages into virtually contiguous space
> > * @pages: array of page pointers
> > @@ -3585,8 +3663,8 @@ void *vmap(struct page **pages, unsigned int count,
> > return NULL;
> >
> > addr = (unsigned long)area->addr;
> > - if (vmap_pages_range(addr, addr + size, pgprot_nx(prot),
> > - pages, PAGE_SHIFT) < 0) {
> > + if (vmap_batched(addr, addr + size, pgprot_nx(prot),
> > + pages) < 0) {
> > vunmap(area->addr);
> > return NULL;
> > }
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 5/6] mm/vmalloc: map contiguous pages in batches for vmap() if possible
2026-05-28 3:42 ` Wen Jiang
@ 2026-05-29 5:57 ` Dev Jain
0 siblings, 0 replies; 21+ messages in thread
From: Dev Jain @ 2026-05-29 5:57 UTC (permalink / raw)
To: Wen Jiang
Cc: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki,
baohua, Xueyuan.chen21, rppt, david, ryan.roberts,
anshuman.khandual, ajd, linux-kernel, jiangwen6
On 28/05/26 9:12 am, Wen Jiang wrote:
> On Wed, 27 May 2026 at 16:28, Dev Jain <dev.jain@arm.com> wrote:
>>
>>
>>
>> On 22/05/26 11:01 am, Wen Jiang wrote:
>>> From: "Barry Song (Xiaomi)" <baohua@kernel.org>
>>>
>>> In many cases, the pages passed to vmap() may include high-order
>>> pages. For example, the systemheap often allocates pages in descending
>>> order: order 8, then 4, then 0. Currently, vmap() iterates over every
>>> page individually—even pages inside a high-order block are handled
>>> one by one.
>>>
>>> This patch detects physically contiguous pages (regardless of whether
>>> they are compound or non-compound) by scanning with
>>> num_pages_contiguous(), and maps them as a single contiguous block
>>> whenever possible. The first page's pfn must be aligned to the
>>> mapping order for the batched mapping to be used.
>>>
>>> Pages with the same page_shift are coalesced and mapped via
>>> vmap_pages_range_noflush_walk() to avoid page table rewalk.
>>>
>>> As users typically allocate memory in descending orders (e.g.
>>> 8 → 4 → 0), once an order-0 page is encountered, we stop scanning
>>> for contiguous pages since subsequent pages are likely order-0 as well.
>>>
>>> Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
>>> Co-developed-by: Dev Jain <dev.jain@arm.com>
>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>> Signed-off-by: Wen Jiang <jiangwen6@xiaomi.com>
>>> Tested-by: Xueyuan Chen <xueyuan.chen21@gmail.com>
>>> ---
>>> mm/vmalloc.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++--
>>> 1 file changed, 80 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>>> index deb764abc0571..50642246f4d40 100644
>>> --- a/mm/vmalloc.c
>>> +++ b/mm/vmalloc.c
>>> @@ -3542,6 +3542,84 @@ void vunmap(const void *addr)
>>> }
>>> EXPORT_SYMBOL(vunmap);
>>>
>>> +static inline int get_vmap_batch_order(struct page **pages,
>>> + unsigned int max_steps, unsigned int idx)
>>> +{
>>> + unsigned int nr_contig;
>>> + int order;
>>> +
>>> + if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP) ||
>>> + ioremap_max_page_shift == PAGE_SHIFT)
>>
>>
>> Why bail out on ioremap_max_page_shift == PAGE_SHIFT? The code
>> path for ioremap is different from vmap right?
>>
>>
>
> ioremap_max_page_shift is under CONFIG_HAVE_ARCH_HUGE_VMAP which
> controls both ioremap and vmap huge mappings.
I don't get it. With this patch, passing nohugeiomap on the kernel
cmdline also disables vmap-huge. That does not sound correct.
Currently ioremap_max_page_shift plays no part in the normal
vmap code path. It is only involved in ioremap_page_range().
>
>>> + return 0;
>>> +
>>> + nr_contig = num_pages_contiguous(&pages[idx], max_steps);
>>> + if (nr_contig < 2)
>>> + return 0;
>>> +
>>> + order = fls(nr_contig) - 1;
>>> +
>>> + if (arch_vmap_pte_supported_shift(PAGE_SIZE << order) == PAGE_SHIFT)
>>> + return 0;
Also, on arches where this function does nothing special
(i.e. it just returns PAGE_SHIFT), we will effectively not do any
huge mappings.
>>> +
>>> + /* Ensure the first page's pfn is aligned to the order */
>>> + if (!IS_ALIGNED(page_to_pfn(pages[idx]), 1 << order))
>>> + return 0;
This condition is a bit fragile. It may happen that we have, say, 2^8
contiguous pages, but they are aligned to only 2^4. We are operating
on a page array and have no idea whether the caller has passed some
random subrange of the array.
I think the purpose of these checks is this - to do an early bailout
if arch does not support huge mappings, or the alignment is not correct,
instead of finding this out very deep into vmap_pages_range_noflush_walk.
So you could do something like (completely untested and may miss some edge cases):
order = ilog2(nr_contig);
order = min(order, __ffs(page_to_pfn(pages[idx])));
order = vm_shift(PAGE_SIZE << order) - PAGE_SHIFT;
Where vm_shift() is the helper I had used in my patch.
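To make the clamping step concrete, here is a minimal userspace sketch of the first two lines of the suggestion above. It is only an illustration: the sketch_* names are hypothetical stand-ins for ilog2() and __ffs(), and the final vm_shift() step is left out because it depends on the arch.

```c
#include <assert.h>

/* Stand-in for the kernel's ilog2(): index of the highest set bit. */
static unsigned int sketch_ilog2(unsigned long n)
{
	unsigned int r = 0;

	assert(n != 0);
	while (n >>= 1)
		r++;
	return r;
}

/* Stand-in for the kernel's __ffs(): index of the lowest set bit. */
static unsigned int sketch_ffs(unsigned long n)
{
	unsigned int r = 0;

	assert(n != 0);
	while (!(n & 1UL)) {
		n >>= 1;
		r++;
	}
	return r;
}

/*
 * Largest order such that 2^order pages fit inside the contiguous run
 * and the first pfn is aligned to 2^order. The arch-specific rounding
 * (vm_shift() in the suggestion) is deliberately not modelled here.
 */
static unsigned int sketch_batch_order(unsigned long pfn,
				       unsigned long nr_contig)
{
	unsigned int order, align_order;

	if (nr_contig < 2)
		return 0;

	order = sketch_ilog2(nr_contig);

	/* pfn 0 is aligned to every order, and __ffs(0) is undefined. */
	if (pfn) {
		align_order = sketch_ffs(pfn);
		if (align_order < order)
			order = align_order;
	}
	return order;
}
```

With 256 contiguous pages starting at pfn 0x1010 (aligned to 2^4 only), this yields order 4, whereas the IS_ALIGNED() check in the patch would return 0 for the same input.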
>>> +
>>> + return order;
>>> +}
>>> +
>>> +static int vmap_batched(unsigned long addr, unsigned long end,
>>> + pgprot_t prot, struct page **pages)
>>> +{
>>> + unsigned int count = (end - addr) >> PAGE_SHIFT;
>>> + unsigned int prev_shift = 0, idx = 0;
>>> + unsigned long start = addr, map_addr = addr;
>>> + int err;
>>> +
>>> + err = kmsan_vmap_pages_range_noflush(addr, end, prot, pages,
>>> + PAGE_SHIFT, GFP_KERNEL);
>>> + if (err)
>>> + goto out;
>>> +
>>> + for (unsigned int i = 0; i < count; ) {
>>> + unsigned int shift = PAGE_SHIFT +
>>> + get_vmap_batch_order(pages, count - i, i);
>>> +
>>> + if (!i)
>>> + prev_shift = shift;
>>> +
>>> + if (shift != prev_shift) {
>>> + err = vmap_pages_range_noflush_walk(map_addr, addr,
>>
>> It would be worth documenting vmap_pages_range_noflush_walk() that
>> it can take an array of pages which are not all contiguous, but it
>> may have contiguous chunks, as hinted by page_shift.
>>
>> Otherwise this looks good.
>>
>>> + prot, pages + idx,
>>> + min(prev_shift, PMD_SHIFT));
>>> + if (err)
>>> + goto out;
>>> + prev_shift = shift;
>>> + map_addr = addr;
>>> + idx = i;
>>> + }
>>> +
>>> + /*
>>> + * Once small pages are encountered, the remaining pages
>>> + * are likely small as well.
>>> + */
>>> + if (shift == PAGE_SHIFT)
>>> + break;
>>> +
>>> + addr += 1UL << shift;
>>> + i += 1U << (shift - PAGE_SHIFT);
>>> + }
>>> +
>>> + /* Remaining */
>>> + if (map_addr < end)
>>> + err = vmap_pages_range_noflush_walk(map_addr, end,
>>> + prot, pages + idx, min(prev_shift, PMD_SHIFT));
>>> +
>>> +out:
>>> + flush_cache_vmap(start, end);
>>> + return err;
>>> +}
>>> +
>>> /**
>>> * vmap - map an array of pages into virtually contiguous space
>>> * @pages: array of page pointers
>>> @@ -3585,8 +3663,8 @@ void *vmap(struct page **pages, unsigned int count,
>>> return NULL;
>>>
>>> addr = (unsigned long)area->addr;
>>> - if (vmap_pages_range(addr, addr + size, pgprot_nx(prot),
>>> - pages, PAGE_SHIFT) < 0) {
>>> + if (vmap_batched(addr, addr + size, pgprot_nx(prot),
>>> + pages) < 0) {
>>> vunmap(area->addr);
>>> return NULL;
>>> }
>>
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v3 6/6] mm/vmalloc: align vm_area so vmap() can batch mappings
2026-05-22 5:31 [PATCH v3 0/6] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory Wen Jiang
` (4 preceding siblings ...)
2026-05-22 5:31 ` [PATCH v3 5/6] mm/vmalloc: map contiguous pages in batches for vmap() if possible Wen Jiang
@ 2026-05-22 5:31 ` Wen Jiang
2026-05-23 7:53 ` Uladzislau Rezki
2026-05-27 6:25 ` Dev Jain
2026-05-22 18:07 ` [PATCH v3 0/6] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory Andrew Morton
6 siblings, 2 replies; 21+ messages in thread
From: Wen Jiang @ 2026-05-22 5:31 UTC (permalink / raw)
To: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki
Cc: baohua, Xueyuan.chen21, dev.jain, rppt, david, ryan.roberts,
anshuman.khandual, ajd, linux-kernel, jiangwen6
From: "Barry Song (Xiaomi)" <baohua@kernel.org>
Try to align the vmap virtual address to PMD_SHIFT or a
larger PTE mapping size hinted by the architecture, so
contiguous pages can be batch-mapped when setting PMD or
PTE entries.
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
Signed-off-by: Wen Jiang <jiangwen6@xiaomi.com>
Tested-by: Xueyuan Chen <xueyuan.chen21@gmail.com>
---
mm/vmalloc.c | 33 ++++++++++++++++++++++++++++++++-
1 file changed, 32 insertions(+), 1 deletion(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 50642246f4d40..040d400928aab 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3620,6 +3620,37 @@ static int vmap_batched(unsigned long addr, unsigned long end,
return err;
}
+static struct vm_struct *get_aligned_vm_area(unsigned long size,
+ unsigned long flags, const void *caller)
+{
+ struct vm_struct *vm_area;
+ unsigned int shift;
+
+ /* Try PMD alignment for large sizes */
+ if (size >= PMD_SIZE) {
+ vm_area = __get_vm_area_node(size, PMD_SIZE, PAGE_SHIFT, flags,
+ VMALLOC_START, VMALLOC_END,
+ NUMA_NO_NODE, GFP_KERNEL, caller);
+ if (vm_area)
+ return vm_area;
+ }
+
+ /* Try CONT_PTE alignment */
+ shift = arch_vmap_pte_supported_shift(size);
+ if (shift > PAGE_SHIFT) {
+ vm_area = __get_vm_area_node(size, 1UL << shift, PAGE_SHIFT, flags,
+ VMALLOC_START, VMALLOC_END,
+ NUMA_NO_NODE, GFP_KERNEL, caller);
+ if (vm_area)
+ return vm_area;
+ }
+
+ /* Fall back to page alignment */
+ return __get_vm_area_node(size, PAGE_SIZE, PAGE_SHIFT, flags,
+ VMALLOC_START, VMALLOC_END,
+ NUMA_NO_NODE, GFP_KERNEL, caller);
+}
+
/**
* vmap - map an array of pages into virtually contiguous space
* @pages: array of page pointers
@@ -3658,7 +3689,7 @@ void *vmap(struct page **pages, unsigned int count,
return NULL;
size = (unsigned long)count << PAGE_SHIFT;
- area = get_vm_area_caller(size, flags, __builtin_return_address(0));
+ area = get_aligned_vm_area(size, flags, __builtin_return_address(0));
if (!area)
return NULL;
--
2.34.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v3 6/6] mm/vmalloc: align vm_area so vmap() can batch mappings
2026-05-22 5:31 ` [PATCH v3 6/6] mm/vmalloc: align vm_area so vmap() can batch mappings Wen Jiang
@ 2026-05-23 7:53 ` Uladzislau Rezki
2026-05-27 6:25 ` Dev Jain
1 sibling, 0 replies; 21+ messages in thread
From: Uladzislau Rezki @ 2026-05-23 7:53 UTC (permalink / raw)
To: Wen Jiang
Cc: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki,
baohua, Xueyuan.chen21, dev.jain, rppt, david, ryan.roberts,
anshuman.khandual, ajd, linux-kernel, jiangwen6
On Fri, May 22, 2026 at 01:31:46PM +0800, Wen Jiang wrote:
> From: "Barry Song (Xiaomi)" <baohua@kernel.org>
>
> Try to align the vmap virtual address to PMD_SHIFT or a
> larger PTE mapping size hinted by the architecture, so
> contiguous pages can be batch-mapped when setting PMD or
> PTE entries.
>
> Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> Signed-off-by: Wen Jiang <jiangwen6@xiaomi.com>
> Tested-by: Xueyuan Chen <xueyuan.chen21@gmail.com>
> ---
> mm/vmalloc.c | 33 ++++++++++++++++++++++++++++++++-
> 1 file changed, 32 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 50642246f4d40..040d400928aab 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3620,6 +3620,37 @@ static int vmap_batched(unsigned long addr, unsigned long end,
> return err;
> }
>
> +static struct vm_struct *get_aligned_vm_area(unsigned long size,
> + unsigned long flags, const void *caller)
> +{
> + struct vm_struct *vm_area;
> + unsigned int shift;
> +
> + /* Try PMD alignment for large sizes */
> + if (size >= PMD_SIZE) {
> + vm_area = __get_vm_area_node(size, PMD_SIZE, PAGE_SHIFT, flags,
> + VMALLOC_START, VMALLOC_END,
> + NUMA_NO_NODE, GFP_KERNEL, caller);
> + if (vm_area)
> + return vm_area;
> + }
> +
> + /* Try CONT_PTE alignment */
> + shift = arch_vmap_pte_supported_shift(size);
> + if (shift > PAGE_SHIFT) {
> + vm_area = __get_vm_area_node(size, 1UL << shift, PAGE_SHIFT, flags,
> + VMALLOC_START, VMALLOC_END,
> + NUMA_NO_NODE, GFP_KERNEL, caller);
> + if (vm_area)
> + return vm_area;
> + }
> +
> + /* Fall back to page alignment */
> + return __get_vm_area_node(size, PAGE_SIZE, PAGE_SHIFT, flags,
> + VMALLOC_START, VMALLOC_END,
> + NUMA_NO_NODE, GFP_KERNEL, caller);
> +}
> +
> /**
> * vmap - map an array of pages into virtually contiguous space
> * @pages: array of page pointers
> @@ -3658,7 +3689,7 @@ void *vmap(struct page **pages, unsigned int count,
> return NULL;
>
> size = (unsigned long)count << PAGE_SHIFT;
> - area = get_vm_area_caller(size, flags, __builtin_return_address(0));
> + area = get_aligned_vm_area(size, flags, __builtin_return_address(0));
> if (!area)
> return NULL;
>
> --
> 2.34.1
>
This one LGTM:
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 6/6] mm/vmalloc: align vm_area so vmap() can batch mappings
2026-05-22 5:31 ` [PATCH v3 6/6] mm/vmalloc: align vm_area so vmap() can batch mappings Wen Jiang
2026-05-23 7:53 ` Uladzislau Rezki
@ 2026-05-27 6:25 ` Dev Jain
1 sibling, 0 replies; 21+ messages in thread
From: Dev Jain @ 2026-05-27 6:25 UTC (permalink / raw)
To: Wen Jiang, linux-mm, linux-arm-kernel, catalin.marinas, will,
akpm, urezki
Cc: baohua, Xueyuan.chen21, rppt, david, ryan.roberts,
anshuman.khandual, ajd, linux-kernel, jiangwen6
On 22/05/26 11:01 am, Wen Jiang wrote:
> From: "Barry Song (Xiaomi)" <baohua@kernel.org>
>
> Try to align the vmap virtual address to PMD_SHIFT or a
> larger PTE mapping size hinted by the architecture, so
> contiguous pages can be batch-mapped when setting PMD or
> PTE entries.
>
> Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> Signed-off-by: Wen Jiang <jiangwen6@xiaomi.com>
> Tested-by: Xueyuan Chen <xueyuan.chen21@gmail.com>
Hmm okay, I would have preferred to squash this into the previous patch,
but the correctness of the previous patch does not rely on this, so it's fine.
> ---
> mm/vmalloc.c | 33 ++++++++++++++++++++++++++++++++-
> 1 file changed, 32 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 50642246f4d40..040d400928aab 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3620,6 +3620,37 @@ static int vmap_batched(unsigned long addr, unsigned long end,
> return err;
> }
>
This is screaming for a helper :)
> +static struct vm_struct *get_aligned_vm_area(unsigned long size,
> + unsigned long flags, const void *caller)
Call this vmap_get_aligned_vm_area, then ...
> +{
> + struct vm_struct *vm_area;
> + unsigned int shift;
> +
> + /* Try PMD alignment for large sizes */
> + if (size >= PMD_SIZE) {
> + vm_area = __get_vm_area_node(size, PMD_SIZE, PAGE_SHIFT, flags,
> + VMALLOC_START, VMALLOC_END,
> + NUMA_NO_NODE, GFP_KERNEL, caller);
Add a wrapper over this called __get_vm_area_node_aligned_caller, which can
call __get_vm_area_node() with all other arguments fixed, except "align".
> + if (vm_area)
> + return vm_area;
> + }
> +
> + /* Try CONT_PTE alignment */
> + shift = arch_vmap_pte_supported_shift(size);
> + if (shift > PAGE_SHIFT) {
> + vm_area = __get_vm_area_node(size, 1UL << shift, PAGE_SHIFT, flags,
> + VMALLOC_START, VMALLOC_END,
> + NUMA_NO_NODE, GFP_KERNEL, caller);
> + if (vm_area)
> + return vm_area;
> + }
> +
> + /* Fall back to page alignment */
> + return __get_vm_area_node(size, PAGE_SIZE, PAGE_SHIFT, flags,
> + VMALLOC_START, VMALLOC_END,
> + NUMA_NO_NODE, GFP_KERNEL, caller);
> +}
> +
> /**
> * vmap - map an array of pages into virtually contiguous space
> * @pages: array of page pointers
> @@ -3658,7 +3689,7 @@ void *vmap(struct page **pages, unsigned int count,
> return NULL;
>
> size = (unsigned long)count << PAGE_SHIFT;
> - area = get_vm_area_caller(size, flags, __builtin_return_address(0));
> + area = get_aligned_vm_area(size, flags, __builtin_return_address(0));
> if (!area)
> return NULL;
>
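As a rough illustration of the wrapper idea suggested above, the fallback chain can be modelled in userspace with a single helper in which only the alignment varies. Everything here is hypothetical: struct sk_space is a toy bump allocator standing in for the vmalloc address space, the SK_* sizes assume arm64 with 4K pages, and the CONT_PTE step is a simplification of arch_vmap_pte_supported_shift().

```c
#include <assert.h>

#define SK_PAGE_SIZE     (1UL << 12)
#define SK_CONT_PTE_SIZE (1UL << 16)	/* assumed: arm64, 4K pages */
#define SK_PMD_SIZE      (1UL << 21)

/* Toy address space: free range is [next, end). */
struct sk_space {
	unsigned long next;
	unsigned long end;
};

/*
 * Plays the role of the suggested wrapper: every argument except
 * "align" is fixed. Returns the start address, or 0 on failure.
 */
static unsigned long sk_reserve_aligned(struct sk_space *s,
					unsigned long size,
					unsigned long align)
{
	unsigned long start;

	assert(align && !(align & (align - 1)));	/* power of two */

	start = (s->next + align - 1) & ~(align - 1);
	if (start < s->next || size > s->end || start > s->end - size)
		return 0;

	s->next = start + size;
	return start;
}

/* Try PMD alignment, then CONT_PTE alignment, then page alignment. */
static unsigned long sk_get_aligned_area(struct sk_space *s,
					 unsigned long size)
{
	const unsigned long aligns[] = {
		SK_PMD_SIZE, SK_CONT_PTE_SIZE, SK_PAGE_SIZE,
	};
	unsigned long start;
	unsigned int i;

	for (i = 0; i < sizeof(aligns) / sizeof(aligns[0]); i++) {
		/* Skip alignments larger than the request itself. */
		if (size < aligns[i] && aligns[i] != SK_PAGE_SIZE)
			continue;

		start = sk_reserve_aligned(s, size, aligns[i]);
		if (start)
			return start;
	}
	return 0;
}
```

The point of the shape is that the three near-identical __get_vm_area_node() calls in the patch collapse into one call site, with the list of candidate alignments as the only thing that differs.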
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 0/6] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory
2026-05-22 5:31 [PATCH v3 0/6] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory Wen Jiang
` (5 preceding siblings ...)
2026-05-22 5:31 ` [PATCH v3 6/6] mm/vmalloc: align vm_area so vmap() can batch mappings Wen Jiang
@ 2026-05-22 18:07 ` Andrew Morton
2026-05-23 8:26 ` Wen Jiang
6 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2026-05-22 18:07 UTC (permalink / raw)
To: Wen Jiang
Cc: linux-mm, linux-arm-kernel, catalin.marinas, will, urezki, baohua,
Xueyuan.chen21, dev.jain, rppt, david, ryan.roberts,
anshuman.khandual, ajd, linux-kernel, jiangwen6
On Fri, 22 May 2026 13:31:40 +0800 Wen Jiang <jiangwenxiaomi@gmail.com> wrote:
> This patchset accelerates ioremap, vmalloc, and vmap when the memory
> is physically fully or partially contiguous.
Thanks. AI review asked a few things and might have found an existing
32-bit bug in vmap():
https://sashiko.dev/#/patchset/20260522053146.83209-1-jiangwenxiaomi@gmail.com
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 0/6] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory
2026-05-22 18:07 ` [PATCH v3 0/6] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory Andrew Morton
@ 2026-05-23 8:26 ` Wen Jiang
2026-05-23 21:40 ` Andrew Morton
0 siblings, 1 reply; 21+ messages in thread
From: Wen Jiang @ 2026-05-23 8:26 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, linux-arm-kernel, catalin.marinas, will, urezki, baohua,
Xueyuan.chen21, dev.jain, rppt, david, ryan.roberts,
anshuman.khandual, ajd, linux-kernel, jiangwen6
On Sat, 23 May 2026 at 02:07, Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Fri, 22 May 2026 13:31:40 +0800 Wen Jiang <jiangwenxiaomi@gmail.com> wrote:
>
> > This patchset accelerates ioremap, vmalloc, and vmap when the memory
> > is physically fully or partially contiguous.
>
> Thanks. AI review asked a few things and might have found an existing
> 32-bit bug in vmap():
>
> https://sashiko.dev/#/patchset/20260522053146.83209-1-jiangwenxiaomi@gmail.com
Hi Andrew,
I've gone through the Sashiko findings:
- Patch 5 (arch_vmap_pte_supported_shift on x86): Over-interpretation.
This targets ARM64 CONT_PTE. x86 falls through with PAGE_SHIFT
same as before.
- Patch 5 (1 << order overflow at order=31): Over-interpretation.
Reaching order=31 requires 8TB of contiguous memory in a single
vmap(), which is not a realistic usage pattern.
- Patch 6 (GFP_KERNEL triggering purge): The purge only triggers
when vmalloc space is already under pressure, and it benefits the
subsequent PAGE_SIZE fallback as well, so it is not wasted work.
- Patch 6 (32-bit count << PAGE_SHIFT overflow): Pre-existing.
Will send a separate fix.
- Patch 6 (unconditional alignment without checking contiguity):
The main vmap() users typically pass contiguous pages
(e.g. system_heap order 8 -> 4 -> 0).
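The two numeric points in the list above can be checked with a little arithmetic. The following is a small userspace sketch, assuming 4K pages (PAGE_SHIFT = 12) and using uint32_t to emulate a 32-bit unsigned long; the sk_* names are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SHIFT 12	/* assumed: 4K pages */

/* Bytes covered by 2^order pages, computed in 64 bits. */
static uint64_t sk_order_bytes(unsigned int order)
{
	assert(order + SK_PAGE_SHIFT < 64);
	return (uint64_t)1 << (order + SK_PAGE_SHIFT);
}

/*
 * Emulates "size = (unsigned long)count << PAGE_SHIFT" on a target
 * where unsigned long is 32 bits wide: the result wraps modulo 2^32.
 */
static uint32_t sk_size_32bit(uint32_t count)
{
	return count << SK_PAGE_SHIFT;
}
```

Here sk_order_bytes(31) is 2^43 bytes, i.e. 8TB, which matches the figure quoted for order=31. On the 32-bit side, sk_size_32bit() wraps to 0 once count reaches 2^20 pages (4GB), which is the kind of truncation the pre-existing count << PAGE_SHIFT finding refers to.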
Thanks,
Wen
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 0/6] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory
2026-05-23 8:26 ` Wen Jiang
@ 2026-05-23 21:40 ` Andrew Morton
0 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2026-05-23 21:40 UTC (permalink / raw)
To: Wen Jiang
Cc: linux-mm, linux-arm-kernel, catalin.marinas, will, urezki, baohua,
Xueyuan.chen21, dev.jain, rppt, david, ryan.roberts,
anshuman.khandual, ajd, linux-kernel, jiangwen6
On Sat, 23 May 2026 16:26:36 +0800 Wen Jiang <jiangwenxiaomi@gmail.com> wrote:
> On Sat, 23 May 2026 at 02:07, Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Fri, 22 May 2026 13:31:40 +0800 Wen Jiang <jiangwenxiaomi@gmail.com> wrote:
> >
> > > This patchset accelerates ioremap, vmalloc, and vmap when the memory
> > > is physically fully or partially contiguous.
> >
> > Thanks. AI review asked a few things and might have found an existing
> > 32-bit bug in vmap():
> >
> > https://sashiko.dev/#/patchset/20260522053146.83209-1-jiangwenxiaomi@gmail.com
>
> Hi Andrew,
>
> I've gone through the Sashiko findings:
Great, thanks. I won't take any action at this time - let's see what
reviewers have to say.
^ permalink raw reply [flat|nested] 21+ messages in thread