linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH 0/6] dma-mapping: arm64: support batched cache sync
From: Barry Song @ 2025-12-19  5:36 UTC
  To: catalin.marinas, m.szyprowski, robin.murphy, will
  Cc: v-songbaohua, zhengtangquan, ryan.roberts, anshuman.khandual, maz,
	linux-kernel, iommu, surenb, ardb, linux-arm-kernel

From: Barry Song <v-songbaohua@oppo.com>

Many embedded ARM64 SoCs still lack hardware cache coherency support, which
causes DMA mapping operations to appear as hotspots in on-CPU flame graphs.

For an SG list with *nents* entries, the current dma_map/unmap_sg() and DMA
sync APIs perform cache maintenance one entry at a time. After each entry,
the implementation synchronously waits for the corresponding region’s
D-cache operations to complete. On architectures like arm64, efficiency can
be improved by issuing all entries’ operations first and then performing a
single batched wait for completion.
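
The difference between the two strategies can be sketched with a small
userspace model. This is only an illustration under stated assumptions: every
name below (sync_stats, model_clean_nosync, model_barrier, MODEL_LINE_SIZE and
the two model_sync_* functions) is hypothetical and does not come from the
patches, and the "barrier" is just a counter standing in for the wait for
completion (a DSB on arm64). It counts how many waits each strategy needs for
the same amount of by-line cache maintenance.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names come from the patches. */
struct sync_stats {
	size_t line_ops;	/* per-cache-line maintenance operations issued */
	size_t barriers;	/* waits for completion (a DSB on arm64) */
};

#define MODEL_LINE_SIZE 64u	/* assumed cache line size, for illustration */

/* Model issuing by-line maintenance for one region without waiting. */
static void model_clean_nosync(struct sync_stats *st, size_t len)
{
	st->line_ops += (len + MODEL_LINE_SIZE - 1) / MODEL_LINE_SIZE;
}

/* Model the wait for all previously issued operations to complete. */
static void model_barrier(struct sync_stats *st)
{
	st->barriers++;
}

/* Per-entry strategy: issue one entry's operations, then wait. */
static struct sync_stats model_sync_per_entry(const size_t *lens, size_t nents)
{
	struct sync_stats st = { 0, 0 };

	assert(lens != NULL || nents == 0);
	for (size_t i = 0; i < nents; i++) {
		model_clean_nosync(&st, lens[i]);
		model_barrier(&st);
	}
	return st;
}

/* Batched strategy: issue all entries' operations, then wait once. */
static struct sync_stats model_sync_batched(const size_t *lens, size_t nents)
{
	struct sync_stats st = { 0, 0 };

	assert(lens != NULL || nents == 0);
	for (size_t i = 0; i < nents; i++)
		model_clean_nosync(&st, lens[i]);
	if (nents)
		model_barrier(&st);
	return st;
}
```

In this model both strategies issue the same number of by-line operations;
only the number of waits changes, from nents to one. The model says nothing
about real timings, which depend on the hardware.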

Tangquan's results show that batched synchronization can reduce
dma_map_sg() time by 64.61% and dma_unmap_sg() time by 66.60% on an MTK
phone platform (MediaTek Dimensity 9500). The tests were performed by
pinning the task to CPU7 and fixing the CPU frequency at 2.6 GHz,
running dma_map_sg() and dma_unmap_sg() on 10 MB buffers (10 MB / 4 KB =
2560 sg entries per buffer) for 200 iterations and then averaging the
results.

I also ran this patch set on an RK3588 Rock5B+ board and
observed that millions of DMA sync operations were batched.

Changes since RFC:
 * Dropped many #ifdef/#else/#endif blocks, as suggested by Catalin and
   Marek, thanks!
 * Added batching for IOVA link/unlink, which is marked as RFC because I
   lack the hardware. This was suggested by Marek, thanks!

RFC link:
 https://lore.kernel.org/lkml/20251029023115.22809-1-21cnbao@gmail.com/

Barry Song (6):
  arm64: Provide dcache_by_myline_op_nosync helper
  arm64: Provide dcache_clean_poc_nosync helper
  arm64: Provide dcache_inval_poc_nosync helper
  arm64: Provide arch_sync_dma_ batched helpers
  dma-mapping: Allow batched DMA sync operations if supported by the
    arch
  dma-iommu: Allow DMA sync batching for IOVA link/unlink

 arch/arm64/Kconfig                  |  1 +
 arch/arm64/include/asm/assembler.h  | 79 +++++++++++++++++++-------
 arch/arm64/include/asm/cacheflush.h |  2 +
 arch/arm64/mm/cache.S               | 58 +++++++++++++++----
 arch/arm64/mm/dma-mapping.c         | 24 ++++++++
 drivers/iommu/dma-iommu.c           | 12 +++-
 include/linux/dma-map-ops.h         | 22 ++++++++
 kernel/dma/Kconfig                  |  3 +
 kernel/dma/direct.c                 | 28 +++++++---
 kernel/dma/direct.h                 | 86 +++++++++++++++++++++++++----
 10 files changed, 262 insertions(+), 53 deletions(-)

-- 
2.39.3 (Apple Git-146)




Thread overview: 30+ messages
2025-12-19  5:36 [PATCH 0/6] dma-mapping: arm64: support batched cache sync Barry Song
2025-12-19  5:36 ` [PATCH 1/6] arm64: Provide dcache_by_myline_op_nosync helper Barry Song
2025-12-19 12:20   ` Robin Murphy
2025-12-21  7:22     ` Barry Song
2025-12-19  5:36 ` [PATCH 2/6] arm64: Provide dcache_clean_poc_nosync helper Barry Song
2025-12-19  5:36 ` [PATCH 3/6] arm64: Provide dcache_inval_poc_nosync helper Barry Song
2025-12-19 12:34   ` Robin Murphy
2025-12-21  7:59     ` Barry Song
2025-12-19  5:36 ` [PATCH 4/6] arm64: Provide arch_sync_dma_ batched helpers Barry Song
2025-12-19  5:36 ` [PATCH 5/6] dma-mapping: Allow batched DMA sync operations if supported by the arch Barry Song
2025-12-20 17:37   ` kernel test robot
2025-12-21  5:15     ` Barry Song
2025-12-21 11:55   ` Leon Romanovsky
2025-12-21 19:24     ` Barry Song
2025-12-22  8:49       ` Leon Romanovsky
2025-12-23  0:02         ` Barry Song
2025-12-23  2:36           ` Barry Song
2025-12-23 14:14           ` Leon Romanovsky
2025-12-24  1:29             ` Barry Song
2025-12-24  8:51               ` Leon Romanovsky
2025-12-25  5:45                 ` Barry Song
2025-12-25 12:36                   ` Leon Romanovsky
2025-12-25 13:31                     ` Barry Song
2025-12-25 13:40                       ` Leon Romanovsky
2025-12-21 12:36   ` kernel test robot
2025-12-22 12:43   ` kernel test robot
2025-12-22 14:00   ` kernel test robot
2025-12-19  5:36 ` [PATCH RFC 6/6] dma-iommu: Allow DMA sync batching for IOVA link/unlink Barry Song
2025-12-19  6:04 ` [PATCH 0/6] dma-mapping: arm64: support batched cache sync Barry Song
2025-12-19  6:12 ` Barry Song
