linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v2 0/8] dma-mapping: arm64: support batched cache sync
@ 2025-12-26 22:52 Barry Song
From: Barry Song @ 2025-12-26 22:52 UTC
  To: catalin.marinas, m.szyprowski, robin.murphy, will, iommu,
	linux-arm-kernel
  Cc: Juergen Gross, Barry Song, Stefano Stabellini, Ryan Roberts,
	Leon Romanovsky, Anshuman Khandual, Marc Zyngier, Joerg Roedel,
	linux-kernel, Tangquan Zheng, Oleksandr Tyshchenko, xen-devel,
	Suren Baghdasaryan, Ard Biesheuvel, Huacai Zhou

From: Barry Song <baohua@kernel.org>

Many embedded ARM64 SoCs still lack hardware cache coherency for DMA, so
DMA mapping operations show up as hotspots in on-CPU flame graphs.

For an SG list with *nents* entries, the current dma_map_sg()/dma_unmap_sg()
and DMA sync APIs perform cache maintenance one entry at a time: after each
entry, the implementation waits synchronously for that region's D-cache
operations to complete. On architectures such as arm64, efficiency can be
improved by first issuing the operations for all entries and then
performing a single batched wait for completion.

Tangquan's results show that batched synchronization can reduce
dma_map_sg() time by 64.61% and dma_unmap_sg() time by 66.60% on an MTK
phone platform (MediaTek Dimensity 9500). The tests pinned the task to
CPU7 with the CPU frequency fixed at 2.6 GHz, ran dma_map_sg() and
dma_unmap_sg() on 10 MB buffers (10 MB / 4 KB SG entries per buffer) for
200 iterations, and averaged the results.
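The benchmark arithmetic can be checked with two small helpers, assuming
MB and KB mean 2^20 and 2^10 bytes: each 10 MB buffer then splits into
2560 SG entries, and a 64.61% (66.60%) reduction in time corresponds to
roughly a 2.8x (3.0x) speedup. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of SG entries when a buffer is split into fixed-size chunks. */
static size_t sg_entries(size_t buf_bytes, size_t entry_bytes)
{
	assert(entry_bytes != 0);
	return (buf_bytes + entry_bytes - 1) / entry_bytes;
}

/* A run-time reduction of pct percent is a speedup of 1 / (1 - pct / 100). */
static double speedup_from_reduction(double pct)
{
	assert(pct >= 0.0 && pct < 100.0);
	return 1.0 / (1.0 - pct / 100.0);
}
```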

I also ran this patch set on an RK3588 Rock5B+ board and
observed that millions of DMA sync operations were batched.

v2:
 * Substantially refine the arm64 asm code based on feedback from
   Robin, thanks!
 * Drop the batch_add APIs and always use arch_sync_dma_for_* + flush,
   even for a single buffer, based on Leon’s suggestion, thanks!
 * Refine much of the remaining code based on feedback from Leon, thanks!
 * Also add batch support for iommu_dma_sync_sg_for_{cpu,device}
v1 link:
 https://lore.kernel.org/lkml/20251219053658.84978-1-21cnbao@gmail.com/
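A toy model of the v2 calling convention, assuming every mapping path
(single buffer or SG list) issues arch_sync_dma_for_*()-style maintenance
and then calls a separate flush before returning. model_sync_for_device(),
model_sync_flush(), model_map_single() and model_map_sg() are hypothetical
stand-ins, not the functions added by this series.

```c
#include <assert.h>
#include <stddef.h>

/* Tracks maintenance that has been issued but not yet waited for. */
struct dma_model {
	size_t pending; /* issued, not yet completed */
	size_t flushes; /* completion waits performed */
};

/* Stand-in for an arch_sync_dma_for_device()-style call: issue only. */
static void model_sync_for_device(struct dma_model *m, size_t len)
{
	(void)len; /* the length is not needed in this model */
	m->pending++;
}

/* Stand-in for the flush: wait for everything issued so far. */
static void model_sync_flush(struct dma_model *m)
{
	/* The model skips the wait when nothing was issued. */
	if (m->pending) {
		m->pending = 0;
		m->flushes++;
	}
}

/* Even a single buffer uses issue + explicit flush. */
static void model_map_single(struct dma_model *m, size_t len)
{
	model_sync_for_device(m, len);
	model_sync_flush(m);
	assert(m->pending == 0); /* caller never sees unflushed work */
}

/* An SG list issues all entries, then flushes once. */
static void model_map_sg(struct dma_model *m, const size_t *lens,
			 size_t nents)
{
	for (size_t i = 0; i < nents; i++)
		model_sync_for_device(m, lens[i]);
	model_sync_flush(m);
	assert(m->pending == 0);
}
```

Using one convention for both paths means there is no separate
single-buffer API to keep in sync with the batched one.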

v1, changes since RFC:
 * Drop a large number of #ifdef/#else/#endif blocks based on feedback
   from Catalin and Marek, thanks!
 * Also add batched IOVA link/unlink support, marked as RFC since I lack
   the required hardware. This was suggested by Marek, thanks!
RFC link:
 https://lore.kernel.org/lkml/20251029023115.22809-1-21cnbao@gmail.com/

Barry Song (8):
  arm64: Provide dcache_by_myline_op_nosync helper
  arm64: Provide dcache_clean_poc_nosync helper
  arm64: Provide dcache_inval_poc_nosync helper
  dma-mapping: Separate DMA sync issuing and completion waiting
  dma-mapping: Support batch mode for dma_direct_sync_sg_for_*
  dma-mapping: Support batch mode for dma_direct_{map,unmap}_sg
  dma-iommu: Support DMA sync batch mode for IOVA link and unlink
  dma-iommu: Support DMA sync batch mode for iommu_dma_sync_sg_for_{cpu,
    device}
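As a rough C illustration of what a "nosync" by-line cache helper does,
assuming it behaves like the existing arm64 by-line maintenance loop
minus the trailing barrier: the start address is rounded down to a cache
line, one maintenance operation is issued per line, and the completion
wait is left to the caller. dcache_range_op_nosync_model() and its
callback are hypothetical; this is not the series' assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stands in for one per-line maintenance instruction (e.g. DC CVAC). */
typedef void (*line_op_fn)(uintptr_t line_addr, void *ctx);

/*
 * Issue op() for every cache line overlapping [start, end) and return
 * the number of lines touched. No completion barrier is issued here.
 * op may be NULL to only count lines. Wrap-around at the top of the
 * address space is ignored in this model.
 */
static size_t dcache_range_op_nosync_model(uintptr_t start, uintptr_t end,
					   size_t line_size, line_op_fn op,
					   void *ctx)
{
	size_t n = 0;

	/* The mask below requires a power-of-two line size. */
	assert(line_size != 0 && (line_size & (line_size - 1)) == 0);

	/* Round start down to the containing cache line. */
	start &= ~((uintptr_t)line_size - 1);
	for (uintptr_t addr = start; addr < end; addr += line_size) {
		if (op)
			op(addr, ctx);
		n++;
	}
	return n;
}
```

A synchronous variant would be this loop followed by one barrier; the
batched paths instead run the loop for many ranges and pay for a single
barrier at the end.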

 arch/arm64/include/asm/assembler.h  | 24 +++++++++---
 arch/arm64/include/asm/cache.h      |  6 +++
 arch/arm64/include/asm/cacheflush.h |  2 +
 arch/arm64/kernel/relocate_kernel.S |  3 +-
 arch/arm64/mm/cache.S               | 57 +++++++++++++++++++++++------
 arch/arm64/mm/dma-mapping.c         |  4 +-
 drivers/iommu/dma-iommu.c           | 35 ++++++++++++++----
 drivers/xen/swiotlb-xen.c           | 24 ++++++++----
 include/linux/dma-map-ops.h         |  6 +++
 kernel/dma/direct.c                 | 23 +++++++++---
 kernel/dma/direct.h                 | 21 ++++++++---
 kernel/dma/mapping.c                |  6 +--
 kernel/dma/swiotlb.c                |  4 +-
 13 files changed, 165 insertions(+), 50 deletions(-)

Cc: Leon Romanovsky <leon@kernel.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Tangquan Zheng <zhengtangquan@oppo.com>
Cc: Huacai Zhou <zhouhuacai@oppo.com>
--
2.43.0



Thread overview: 26+ messages
2025-12-26 22:52 [PATCH v2 0/8] dma-mapping: arm64: support batched cache sync Barry Song
2025-12-26 22:52 ` [PATCH v2 1/8] arm64: Provide dcache_by_myline_op_nosync helper Barry Song
2025-12-26 22:52 ` [PATCH v2 2/8] arm64: Provide dcache_clean_poc_nosync helper Barry Song
2025-12-26 22:52 ` [PATCH v2 3/8] arm64: Provide dcache_inval_poc_nosync helper Barry Song
2025-12-26 22:52 ` [PATCH v2 4/8] dma-mapping: Separate DMA sync issuing and completion waiting Barry Song
2025-12-27 20:07   ` Leon Romanovsky
2025-12-27 21:45     ` Barry Song
2025-12-28 14:49       ` Leon Romanovsky
2025-12-28 21:38         ` Barry Song
2025-12-29 14:40           ` Leon Romanovsky
2025-12-31 14:43           ` Marek Szyprowski
2026-01-05 12:28   ` Jürgen Groß
2025-12-26 22:52 ` [PATCH v2 5/8] dma-mapping: Support batch mode for dma_direct_sync_sg_for_* Barry Song
2025-12-27 20:09   ` Leon Romanovsky
2025-12-27 20:52     ` Barry Song
2025-12-28 14:50       ` Leon Romanovsky
2026-01-06 18:41         ` Barry Song
2026-01-06 19:12           ` Robin Murphy
2026-01-06 19:47             ` Barry Song
2025-12-26 22:52 ` [PATCH v2 6/8] dma-mapping: Support batch mode for dma_direct_{map,unmap}_sg Barry Song
2025-12-27 20:14   ` Leon Romanovsky
2025-12-26 22:52 ` [PATCH RFC v2 7/8] dma-iommu: Support DMA sync batch mode for IOVA link and unlink Barry Song
2025-12-26 22:52 ` [PATCH RFC v2 8/8] dma-iommu: Support DMA sync batch mode for iommu_dma_sync_sg_for_{cpu, device} Barry Song
2025-12-27 20:16   ` Leon Romanovsky
2025-12-27 20:59     ` Barry Song
2026-01-06 19:42       ` Robin Murphy
