All of lore.kernel.org
 help / color / mirror / Atom feed
From: Barry Song <21cnbao@gmail.com>
To: Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Marek Szyprowski <m.szyprowski@samsung.com>,
	Robin Murphy <robin.murphy@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>,
	iommu@lists.linux.dev,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Marc Zyngier <maz@kernel.org>,
	Tangquan Zheng <zhengtangquan@oppo.com>,
	linux-kernel@vger.kernel.org, Barry Song <v-songbaohua@oppo.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Ard Biesheuvel <ardb@kernel.org>,
	linux-arm-kernel@lists.infradead.org
Subject: [RFC PATCH 0/5] dma-mapping: arm64: support batched cache sync
Date: Wed, 29 Oct 2025 10:31:10 +0800	[thread overview]
Message-ID: <20251029023115.22809-1-21cnbao@gmail.com> (raw)

From: Barry Song <v-songbaohua@oppo.com>

Many embedded ARM64 SoCs still lack hardware cache coherency support, which
causes DMA mapping operations to appear as hotspots in on-CPU flame graphs.

For an SG list with *nents* entries, the current dma_map/unmap_sg() and DMA
sync APIs perform cache maintenance one entry at a time. After each entry,
the implementation synchronously waits for the corresponding region’s
D-cache operations to complete. On architectures like arm64, efficiency can
be improved by issuing all entries’ operations first and then performing a
single batched wait for completion.

Tangquan's initial results show that batched synchronization can reduce
dma_map_sg() time by 64.61% and dma_unmap_sg() time by 66.60% on an MTK
phone platform (MediaTek Dimensity 9500). The tests were performed by
pinning the task to CPU7 and fixing the CPU frequency at 2.6 GHz,
running dma_map_sg() and dma_unmap_sg() on 10 MB buffers (10 MB / 4 KB
sg entries per buffer) for 200 iterations and then averaging the
results.

Barry Song (5):
  arm64: Provide dcache_by_myline_op_nosync helper
  arm64: Provide dcache_clean_poc_nosync helper
  arm64: Provide dcache_inval_poc_nosync helper
  arm64: Provide arch_sync_dma_ batched helpers
  dma-mapping: Allow batched DMA sync operations if supported by the
    arch

 arch/arm64/Kconfig                  |  1 +
 arch/arm64/include/asm/assembler.h  | 79 +++++++++++++++++++-------
 arch/arm64/include/asm/cacheflush.h |  2 +
 arch/arm64/mm/cache.S               | 58 +++++++++++++++----
 arch/arm64/mm/dma-mapping.c         | 24 ++++++++
 include/linux/dma-map-ops.h         |  8 +++
 kernel/dma/Kconfig                  |  3 +
 kernel/dma/direct.c                 | 53 ++++++++++++++++--
 kernel/dma/direct.h                 | 86 +++++++++++++++++++++++++----
 9 files changed, 267 insertions(+), 47 deletions(-)

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tangquan Zheng <zhengtangquan@oppo.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: iommu@lists.linux.dev

-- 
2.39.3 (Apple Git-146)



WARNING: multiple messages have this Message-ID (diff)
From: Barry Song <21cnbao@gmail.com>
To: Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Marek Szyprowski <m.szyprowski@samsung.com>,
	Robin Murphy <robin.murphy@arm.com>
Cc: Barry Song <v-songbaohua@oppo.com>,
	Ada Couprie Diaz <ada.coupriediaz@arm.com>,
	Ard Biesheuvel <ardb@kernel.org>, Marc Zyngier <maz@kernel.org>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Ryan Roberts <ryan.roberts@arm.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Tangquan Zheng <zhengtangquan@oppo.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, iommu@lists.linux.dev
Subject: [RFC PATCH 0/5] dma-mapping: arm64: support batched cache sync
Date: Wed, 29 Oct 2025 10:31:10 +0800	[thread overview]
Message-ID: <20251029023115.22809-1-21cnbao@gmail.com> (raw)

From: Barry Song <v-songbaohua@oppo.com>

Many embedded ARM64 SoCs still lack hardware cache coherency support, which
causes DMA mapping operations to appear as hotspots in on-CPU flame graphs.

For an SG list with *nents* entries, the current dma_map/unmap_sg() and DMA
sync APIs perform cache maintenance one entry at a time. After each entry,
the implementation synchronously waits for the corresponding region’s
D-cache operations to complete. On architectures like arm64, efficiency can
be improved by issuing all entries’ operations first and then performing a
single batched wait for completion.

Tangquan's initial results show that batched synchronization can reduce
dma_map_sg() time by 64.61% and dma_unmap_sg() time by 66.60% on an MTK
phone platform (MediaTek Dimensity 9500). The tests were performed by
pinning the task to CPU7 and fixing the CPU frequency at 2.6 GHz,
running dma_map_sg() and dma_unmap_sg() on 10 MB buffers (10 MB / 4 KB
sg entries per buffer) for 200 iterations and then averaging the
results.

Barry Song (5):
  arm64: Provide dcache_by_myline_op_nosync helper
  arm64: Provide dcache_clean_poc_nosync helper
  arm64: Provide dcache_inval_poc_nosync helper
  arm64: Provide arch_sync_dma_ batched helpers
  dma-mapping: Allow batched DMA sync operations if supported by the
    arch

 arch/arm64/Kconfig                  |  1 +
 arch/arm64/include/asm/assembler.h  | 79 +++++++++++++++++++-------
 arch/arm64/include/asm/cacheflush.h |  2 +
 arch/arm64/mm/cache.S               | 58 +++++++++++++++----
 arch/arm64/mm/dma-mapping.c         | 24 ++++++++
 include/linux/dma-map-ops.h         |  8 +++
 kernel/dma/Kconfig                  |  3 +
 kernel/dma/direct.c                 | 53 ++++++++++++++++--
 kernel/dma/direct.h                 | 86 +++++++++++++++++++++++++----
 9 files changed, 267 insertions(+), 47 deletions(-)

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tangquan Zheng <zhengtangquan@oppo.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: iommu@lists.linux.dev

-- 
2.39.3 (Apple Git-146)


             reply	other threads:[~2025-10-29  2:32 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-29  2:31 Barry Song [this message]
2025-10-29  2:31 ` [RFC PATCH 0/5] dma-mapping: arm64: support batched cache sync Barry Song
2025-10-29  2:31 ` [RFC PATCH 1/5] arm64: Provide dcache_by_myline_op_nosync helper Barry Song
2025-10-29  2:31   ` Barry Song
2025-10-29  2:31 ` [RFC PATCH 2/5] arm64: Provide dcache_clean_poc_nosync helper Barry Song
2025-10-29  2:31   ` Barry Song
2025-10-29  2:31 ` [RFC PATCH 3/5] arm64: Provide dcache_inval_poc_nosync helper Barry Song
2025-10-29  2:31   ` Barry Song
2025-10-29  2:31 ` [RFC PATCH 4/5] arm64: Provide arch_sync_dma_ batched helpers Barry Song
2025-10-29  2:31   ` Barry Song
2025-10-29  2:31 ` [RFC PATCH 5/5] dma-mapping: Allow batched DMA sync operations if supported by the arch Barry Song
2025-10-29  2:31   ` Barry Song
2025-11-13 18:19   ` Catalin Marinas
2025-11-13 18:19     ` Catalin Marinas
2025-11-17 21:12     ` Barry Song
2025-11-17 21:12       ` Barry Song
2025-11-21 16:09       ` Marek Szyprowski
2025-11-21 16:09         ` Marek Szyprowski
2025-11-21 23:28         ` Barry Song
2025-11-21 23:28           ` Barry Song
2025-11-24 18:11           ` Marek Szyprowski
2025-11-24 18:11             ` Marek Szyprowski
2025-11-06 20:44 ` [RFC PATCH 0/5] dma-mapping: arm64: support batched cache sync Barry Song
2025-11-06 20:44   ` Barry Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251029023115.22809-1-21cnbao@gmail.com \
    --to=21cnbao@gmail.com \
    --cc=anshuman.khandual@arm.com \
    --cc=ardb@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=iommu@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=m.szyprowski@samsung.com \
    --cc=maz@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=ryan.roberts@arm.com \
    --cc=surenb@google.com \
    --cc=v-songbaohua@oppo.com \
    --cc=will@kernel.org \
    --cc=zhengtangquan@oppo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.