From: Wen Jiang <jiangwenxiaomi@gmail.com>
To: linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
catalin.marinas@arm.com, will@kernel.org,
akpm@linux-foundation.org, urezki@gmail.com
Cc: baohua@kernel.org, Xueyuan.chen21@gmail.com, dev.jain@arm.com,
rppt@kernel.org, david@kernel.org, ryan.roberts@arm.com,
anshuman.khandual@arm.com, ajd@linux.ibm.com,
linux-kernel@vger.kernel.org, Wen Jiang <jiangwen6@xiaomi.com>
Subject: [PATCH v2 0/7] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory
Date: Thu, 14 May 2026 17:41:01 +0800
Message-ID: <20260514094108.2016201-1-jiangwen6@xiaomi.com>

This patchset accelerates ioremap, vmalloc, and vmap when the underlying
memory is fully or partially physically contiguous. Two techniques are
used:
1. Avoid the page table rewalk when setting PTEs/PMDs for multiple
memory segments
2. Use batched mappings wherever possible in both the vmalloc and ARM64
layers
Besides speeding up the mapping path, this also enables large mappings
(PMD and cont-PTE) for vmap(), which are not currently supported.
Patches 1-2 extend the ARM64 vmalloc CONT-PTE mapping to support
multiple CONT-PTE regions instead of just one.
Patch 3 extracts a common helper vmap_set_ptes() that consolidates PTE
mapping logic between the ioremap and vmalloc/vmap paths, handling both
CONT_PTE and regular PTE mappings. This prepares for the next patch.
Patch 4 extends the page table walk path to support page shifts other
than PAGE_SHIFT and eliminates the page table rewalk for huge vmalloc
mappings. The function is renamed from vmap_small_pages_range_noflush()
to vmap_pages_range_noflush_walk().
Patches 5-7 add huge vmap support for contiguous pages, including
support for non-compound pages with pfn alignment verification.
On an RK3588 8-core ARM64 SoC, with the test task pinned to a little
core and the performance cpufreq governor enabled, the benchmarks show:
* ioremap(1 MB): 1.35× faster (3407 ns -> 2526 ns)
* vmalloc(1 MB) mapping time (excluding allocation) with
  VM_ALLOW_HUGE_VMAP: 1.42× faster (5.00 us -> 3.53 us)
* vmap(100 MB) with order-8 pages: 8.3× faster (1235 us -> 149 us)
Many thanks to Xueyuan Chen for his testing efforts on RK3588 boards.
Changes since v1:
- Fix condition order and use PMD_SIZE instead of CONT_PMD_SIZE in
patch 1 (Dev Jain)
- Squash patch 3+4 and patch 5+7 (Dev Jain)
- Replace "zigzag" with "page table rewalk" in commit messages
(Dev Jain)
- Rename vmap_small_pages_range_noflush() to
vmap_pages_range_noflush_walk() (Dev Jain)
- Extract vmap_set_ptes() as a new patch to consolidate PTE mapping
logic between vmap_pte_range() and vmap_pages_pte_range(), handling
both CONT_PTE and regular mappings (Mike Rapoport)
- Support non-compound pages in get_vmap_batch_order() by falling
back to physical contiguity scanning with pfn alignment check
(Dev Jain, Uladzislau Rezki)
- In get_vmap_batch_order(), filter out orders that the architecture
cannot batch by checking arch_vmap_pte_supported_shift() directly.
This avoids overhead for orders 1-3 on ARM64 CONT_PTE with 4K
pages. (patch 5)
Barry Song (Xiaomi) (6):
arm64/hugetlb: Extend batching of multiple CONT_PTE in a single PTE
setup
arm64/vmalloc: Allow arch_vmap_pte_range_map_size to batch multiple
CONT_PTE
mm/vmalloc: Extend page table walk to support larger page_shift sizes
and eliminate page table rewalk
mm/vmalloc: map contiguous pages in batches for vmap() if possible
mm/vmalloc: align vm_area so vmap() can batch mappings
mm/vmalloc: Stop scanning for compound pages after encountering small
pages in vmap
Wen Jiang (1):
mm/vmalloc: Extract vmap_set_ptes() to consolidate PTE mapping logic
arch/arm64/include/asm/vmalloc.h | 6 +-
arch/arm64/mm/hugetlbpage.c | 10 ++
mm/vmalloc.c | 221 ++++++++++++++++++++++++-------
3 files changed, 189 insertions(+), 48 deletions(-)
--
2.34.1