From: Leon Romanovsky <leon@kernel.org>
To: Barry Song <21cnbao@gmail.com>
Cc: v-songbaohua@oppo.com, zhengtangquan@oppo.com,
	ryan.roberts@arm.com, will@kernel.org, anshuman.khandual@arm.com,
	catalin.marinas@arm.com, linux-kernel@vger.kernel.org,
	surenb@google.com, iommu@lists.linux.dev, maz@kernel.org,
	robin.murphy@arm.com, ardb@kernel.org,
	linux-arm-kernel@lists.infradead.org, m.szyprowski@samsung.com
Subject: Re: [PATCH 5/6] dma-mapping: Allow batched DMA sync operations if supported by the arch
Date: Sun, 21 Dec 2025 13:55:23 +0200
Message-ID: <20251221115523.GI13030@unreal>
In-Reply-To: <20251219053658.84978-6-21cnbao@gmail.com>

On Fri, Dec 19, 2025 at 01:36:57PM +0800, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
> 
> This enables dma_direct_sync_sg_for_device, dma_direct_sync_sg_for_cpu,
> dma_direct_map_sg, and dma_direct_unmap_sg to use batched DMA sync
> operations when possible. This significantly improves performance on
> devices without hardware cache coherence.
> 
> Tangquan's initial results show that batched synchronization can reduce
> dma_map_sg() time by 64.61% and dma_unmap_sg() time by 66.60% on an MTK
> phone platform (MediaTek Dimensity 9500). The tests were performed by
> pinning the task to CPU7 and fixing the CPU frequency at 2.6 GHz,
> running dma_map_sg() and dma_unmap_sg() on 10 MB buffers (10 MB / 4 KB
> sg entries per buffer) for 200 iterations and then averaging the
> results.
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Ada Couprie Diaz <ada.coupriediaz@arm.com>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Tangquan Zheng <zhengtangquan@oppo.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> ---
>  kernel/dma/direct.c | 28 ++++++++++-----
>  kernel/dma/direct.h | 86 +++++++++++++++++++++++++++++++++++++++------
>  2 files changed, 95 insertions(+), 19 deletions(-)

<...>

>  		if (!dev_is_dma_coherent(dev))
> -			arch_sync_dma_for_device(paddr, sg->length,
> -					dir);
> +			arch_sync_dma_for_device_batch_add(paddr, sg->length, dir);

<...>

> -static inline dma_addr_t dma_direct_map_phys(struct device *dev,
> +#ifdef CONFIG_ARCH_WANT_BATCHED_DMA_SYNC
> +static inline void dma_direct_sync_single_for_cpu_batch_add(struct device *dev,
> +		dma_addr_t addr, size_t size, enum dma_data_direction dir)
> +{
> +	phys_addr_t paddr = dma_to_phys(dev, addr);
> +
> +	if (!dev_is_dma_coherent(dev))
> +		arch_sync_dma_for_cpu_batch_add(paddr, size, dir);
> +
> +	__dma_direct_sync_single_for_cpu(dev, paddr, size, dir);
> +}
> +#endif
> +
> +static inline void dma_direct_sync_single_for_cpu(struct device *dev,
> +		dma_addr_t addr, size_t size, enum dma_data_direction dir)
> +{
> +	phys_addr_t paddr = dma_to_phys(dev, addr);
> +
> +	if (!dev_is_dma_coherent(dev))
> +		arch_sync_dma_for_cpu(paddr, size, dir);
> +
> +	__dma_direct_sync_single_for_cpu(dev, paddr, size, dir);
> +}
> +

I'm wondering why you don't implement this batch-sync support inside the
arch_sync_dma_*() functions. Doing so would minimize changes to the generic
kernel/dma/* code and reduce the amount of #ifdef-based spaghetti.

Thanks.



