Re: [PATCH 5/6] dma-mapping: Allow batched DMA sync operations if supported by the arch

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Leon Romanovsky <leon@kernel.org>
To: Barry Song <21cnbao@gmail.com>
Cc: v-songbaohua@oppo.com, zhengtangquan@oppo.com,
	ryan.roberts@arm.com, will@kernel.org, anshuman.khandual@arm.com,
	catalin.marinas@arm.com, linux-kernel@vger.kernel.org,
	surenb@google.com, iommu@lists.linux.dev, maz@kernel.org,
	robin.murphy@arm.com, ardb@kernel.org,
	linux-arm-kernel@lists.infradead.org, m.szyprowski@samsung.com
Subject: Re: [PATCH 5/6] dma-mapping: Allow batched DMA sync operations if supported by the arch
Date: Wed, 24 Dec 2025 10:51:45 +0200	[thread overview]
Message-ID: <20251224085145.GF11869@unreal> (raw)
In-Reply-To: <CAGsJ_4x9YOT2kG0iWz_5NzRVt2iDN9YeUTknZ2MKg7SwmOGOqg@mail.gmail.com>

On Wed, Dec 24, 2025 at 02:29:13PM +1300, Barry Song wrote:
> On Wed, Dec 24, 2025 at 3:14 AM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Tue, Dec 23, 2025 at 01:02:55PM +1300, Barry Song wrote:
> > > On Mon, Dec 22, 2025 at 9:49 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > >
> > > > On Mon, Dec 22, 2025 at 03:24:58AM +0800, Barry Song wrote:
> > > > > On Sun, Dec 21, 2025 at 7:55 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > > > [...]
> > > > > > > +
> > > > > >
> > > > > > I'm wondering why you don't implement this batch‑sync support inside the
> > > > > > arch_sync_dma_*() functions. Doing so would minimize changes to the generic
> > > > > > kernel/dma/* code and reduce the amount of #ifdef‑based spaghetti.
> > > > > >
> > > > >
> > > > > There are two cases: mapping an sg list and mapping a single
> > > > > buffer. The former can be batched with
> > > > > arch_sync_dma_*_batch_add() and flushed via
> > > > > arch_sync_dma_batch_flush(), while the latter requires all work to
> > > > > be done inside arch_sync_dma_*(). Therefore,
> > > > > arch_sync_dma_*() cannot always batch and flush.
> > > >
> > > > Probably in all cases you can call the _batch_ variant, followed by _flush_,
> > > > even when handling a single page. This keeps the code consistent across all
> > > > paths. On platforms that do not support _batch_, the _flush_ operation will be
> > > > a NOP anyway.
> > >
> > > We have a lot of code outside kernel/dma that also calls
> > > arch_sync_dma_for_* such as arch/arm, arch/mips, drivers/xen,
> > > I guess we don’t want to modify so many things?
> >
> > Aren't they using internal, arch specific, arch_sync_dma_for_* implementations?
> 
> for arch/arm, arch/mips, they are arch-specific implementations.
> xen is an exception:

Right, and this is the only location outside of kernel/dma where you need to
invoke arch_sync_dma_flush().

> 
> static void xen_swiotlb_unmap_phys(struct device *hwdev, dma_addr_t dev_addr,
>                 size_t size, enum dma_data_direction dir, unsigned long attrs)
> {
>         phys_addr_t paddr = xen_dma_to_phys(hwdev, dev_addr);
>         struct io_tlb_pool *pool;
> 
>         BUG_ON(dir == DMA_NONE);
> 
>         if (!dev_is_dma_coherent(hwdev) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) {
>                 if (pfn_valid(PFN_DOWN(dma_to_phys(hwdev, dev_addr))))
>                         arch_sync_dma_for_cpu(paddr, size, dir);
>                 else
>                         xen_dma_sync_for_cpu(hwdev, dev_addr, size, dir);
>         }
> 
>         /* NOTE: We use dev_addr here, not paddr! */
>         pool = xen_swiotlb_find_pool(hwdev, dev_addr);
>         if (pool)
>                 __swiotlb_tbl_unmap_single(hwdev, paddr, size, dir,
>                                            attrs, pool);
> }
> 
> >
> > >
> > > for kernel/dma, we have two "single" callers only:
> > > kernel/dma/direct.h, kernel/dma/swiotlb.c.  and they looks quite
> > > straightforward:
> > >
> > > static inline void dma_direct_sync_single_for_device(struct device *dev,
> > >                 dma_addr_t addr, size_t size, enum dma_data_direction dir)
> > > {
> > >         phys_addr_t paddr = dma_to_phys(dev, addr);
> > >
> > >         swiotlb_sync_single_for_device(dev, paddr, size, dir);
> > >
> > >         if (!dev_is_dma_coherent(dev))
> > >                 arch_sync_dma_for_device(paddr, size, dir);
> > > }
> > >
> > > I guess moving to arch_sync_dma_for_device_batch + flush
> > > doesn’t really look much better, does it?
> > >
> > > >
> > > > I would also rename arch_sync_dma_batch_flush() to arch_sync_dma_flush().
> > >
> > > Sure.
> > >
> > > >
> > > > You can also minimize changes in dma_direct_map_phys() too, by extending
> > > > it's signature to provide if flush is needed or not.
> > >
> > > Yes. I have
> > >
> > > static inline dma_addr_t __dma_direct_map_phys(struct device *dev,
> > >                 phys_addr_t phys, size_t size, enum dma_data_direction dir,
> > >                 unsigned long attrs, bool flush)
> >
> > My suggestion is to use it directly, without wrappers.
> >
> > >
> > > and two wrappers:
> > > static inline dma_addr_t dma_direct_map_phys(struct device *dev,
> > >                 phys_addr_t phys, size_t size, enum dma_data_direction dir,
> > >                 unsigned long attrs)
> > > {
> > >         return __dma_direct_map_phys(dev, phys, size, dir, attrs, true);
> > > }
> > >
> > > static inline dma_addr_t dma_direct_map_phys_batch_add(struct device *dev,
> > >                 phys_addr_t phys, size_t size, enum dma_data_direction dir,
> > >                 unsigned long attrs)
> > > {
> > >         return __dma_direct_map_phys(dev, phys, size, dir, attrs, false);
> > > }
> > >
> > > If you prefer exposing "flush" directly in dma_direct_map_phys()
> > > and updating its callers with flush=true, I think that’s fine.
> >
> > Yes
> >
> 
> OK. Could you take a look at [1] and see if any further
> improvements are needed before I send v2?

Everything looks ok, except these renames:
-			arch_sync_dma_for_cpu(paddr, sg->length, dir);
+			arch_sync_dma_for_cpu_batch_add(paddr, sg->length, dir);

Thanks

> 
> [1] https://lore.kernel.org/lkml/20251223023648.31614-1-21cnbao@gmail.com/
> 
> Thanks
> Barry
>

WARNING: multiple messages have this Message-ID (diff)

From: Leon Romanovsky <leon@kernel.org>
To: Barry Song <21cnbao@gmail.com>
Cc: ada.coupriediaz@arm.com, anshuman.khandual@arm.com,
	ardb@kernel.org, catalin.marinas@arm.com, iommu@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, m.szyprowski@samsung.com,
	maz@kernel.org, robin.murphy@arm.com, ryan.roberts@arm.com,
	surenb@google.com, v-songbaohua@oppo.com, will@kernel.org,
	zhengtangquan@oppo.com
Subject: Re: [PATCH 5/6] dma-mapping: Allow batched DMA sync operations if supported by the arch
Date: Wed, 24 Dec 2025 10:51:45 +0200	[thread overview]
Message-ID: <20251224085145.GF11869@unreal> (raw)
In-Reply-To: <CAGsJ_4x9YOT2kG0iWz_5NzRVt2iDN9YeUTknZ2MKg7SwmOGOqg@mail.gmail.com>

On Wed, Dec 24, 2025 at 02:29:13PM +1300, Barry Song wrote:
> On Wed, Dec 24, 2025 at 3:14 AM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Tue, Dec 23, 2025 at 01:02:55PM +1300, Barry Song wrote:
> > > On Mon, Dec 22, 2025 at 9:49 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > >
> > > > On Mon, Dec 22, 2025 at 03:24:58AM +0800, Barry Song wrote:
> > > > > On Sun, Dec 21, 2025 at 7:55 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > > > [...]
> > > > > > > +
> > > > > >
> > > > > > I'm wondering why you don't implement this batch‑sync support inside the
> > > > > > arch_sync_dma_*() functions. Doing so would minimize changes to the generic
> > > > > > kernel/dma/* code and reduce the amount of #ifdef‑based spaghetti.
> > > > > >
> > > > >
> > > > > There are two cases: mapping an sg list and mapping a single
> > > > > buffer. The former can be batched with
> > > > > arch_sync_dma_*_batch_add() and flushed via
> > > > > arch_sync_dma_batch_flush(), while the latter requires all work to
> > > > > be done inside arch_sync_dma_*(). Therefore,
> > > > > arch_sync_dma_*() cannot always batch and flush.
> > > >
> > > > Probably in all cases you can call the _batch_ variant, followed by _flush_,
> > > > even when handling a single page. This keeps the code consistent across all
> > > > paths. On platforms that do not support _batch_, the _flush_ operation will be
> > > > a NOP anyway.
> > >
> > > We have a lot of code outside kernel/dma that also calls
> > > arch_sync_dma_for_* such as arch/arm, arch/mips, drivers/xen,
> > > I guess we don’t want to modify so many things?
> >
> > Aren't they using internal, arch specific, arch_sync_dma_for_* implementations?
> 
> for arch/arm, arch/mips, they are arch-specific implementations.
> xen is an exception:

Right, and this is the only location outside of kernel/dma where you need to
invoke arch_sync_dma_flush().

> 
> static void xen_swiotlb_unmap_phys(struct device *hwdev, dma_addr_t dev_addr,
>                 size_t size, enum dma_data_direction dir, unsigned long attrs)
> {
>         phys_addr_t paddr = xen_dma_to_phys(hwdev, dev_addr);
>         struct io_tlb_pool *pool;
> 
>         BUG_ON(dir == DMA_NONE);
> 
>         if (!dev_is_dma_coherent(hwdev) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) {
>                 if (pfn_valid(PFN_DOWN(dma_to_phys(hwdev, dev_addr))))
>                         arch_sync_dma_for_cpu(paddr, size, dir);
>                 else
>                         xen_dma_sync_for_cpu(hwdev, dev_addr, size, dir);
>         }
> 
>         /* NOTE: We use dev_addr here, not paddr! */
>         pool = xen_swiotlb_find_pool(hwdev, dev_addr);
>         if (pool)
>                 __swiotlb_tbl_unmap_single(hwdev, paddr, size, dir,
>                                            attrs, pool);
> }
> 
> >
> > >
> > > for kernel/dma, we have two "single" callers only:
> > > kernel/dma/direct.h, kernel/dma/swiotlb.c.  and they looks quite
> > > straightforward:
> > >
> > > static inline void dma_direct_sync_single_for_device(struct device *dev,
> > >                 dma_addr_t addr, size_t size, enum dma_data_direction dir)
> > > {
> > >         phys_addr_t paddr = dma_to_phys(dev, addr);
> > >
> > >         swiotlb_sync_single_for_device(dev, paddr, size, dir);
> > >
> > >         if (!dev_is_dma_coherent(dev))
> > >                 arch_sync_dma_for_device(paddr, size, dir);
> > > }
> > >
> > > I guess moving to arch_sync_dma_for_device_batch + flush
> > > doesn’t really look much better, does it?
> > >
> > > >
> > > > I would also rename arch_sync_dma_batch_flush() to arch_sync_dma_flush().
> > >
> > > Sure.
> > >
> > > >
> > > > You can also minimize changes in dma_direct_map_phys() too, by extending
> > > > it's signature to provide if flush is needed or not.
> > >
> > > Yes. I have
> > >
> > > static inline dma_addr_t __dma_direct_map_phys(struct device *dev,
> > >                 phys_addr_t phys, size_t size, enum dma_data_direction dir,
> > >                 unsigned long attrs, bool flush)
> >
> > My suggestion is to use it directly, without wrappers.
> >
> > >
> > > and two wrappers:
> > > static inline dma_addr_t dma_direct_map_phys(struct device *dev,
> > >                 phys_addr_t phys, size_t size, enum dma_data_direction dir,
> > >                 unsigned long attrs)
> > > {
> > >         return __dma_direct_map_phys(dev, phys, size, dir, attrs, true);
> > > }
> > >
> > > static inline dma_addr_t dma_direct_map_phys_batch_add(struct device *dev,
> > >                 phys_addr_t phys, size_t size, enum dma_data_direction dir,
> > >                 unsigned long attrs)
> > > {
> > >         return __dma_direct_map_phys(dev, phys, size, dir, attrs, false);
> > > }
> > >
> > > If you prefer exposing "flush" directly in dma_direct_map_phys()
> > > and updating its callers with flush=true, I think that’s fine.
> >
> > Yes
> >
> 
> OK. Could you take a look at [1] and see if any further
> improvements are needed before I send v2?

Everything looks ok, except these renames:
-			arch_sync_dma_for_cpu(paddr, sg->length, dir);
+			arch_sync_dma_for_cpu_batch_add(paddr, sg->length, dir);

Thanks

> 
> [1] https://lore.kernel.org/lkml/20251223023648.31614-1-21cnbao@gmail.com/
> 
> Thanks
> Barry
>

next prev parent reply	other threads:[~2025-12-24  8:52 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-19  5:36 [PATCH 0/6] dma-mapping: arm64: support batched cache sync Barry Song
2025-12-19  5:36 ` Barry Song
2025-12-19  5:36 ` [PATCH 1/6] arm64: Provide dcache_by_myline_op_nosync helper Barry Song
2025-12-19  5:36   ` Barry Song
2025-12-19 12:20   ` Robin Murphy
2025-12-19 12:20     ` Robin Murphy
2025-12-21  7:22     ` Barry Song
2025-12-21  7:22       ` Barry Song
2025-12-19  5:36 ` [PATCH 2/6] arm64: Provide dcache_clean_poc_nosync helper Barry Song
2025-12-19  5:36   ` Barry Song
2025-12-19  5:36 ` [PATCH 3/6] arm64: Provide dcache_inval_poc_nosync helper Barry Song
2025-12-19  5:36   ` Barry Song
2025-12-19 12:34   ` Robin Murphy
2025-12-19 12:34     ` Robin Murphy
2025-12-21  7:59     ` Barry Song
2025-12-21  7:59       ` Barry Song
2025-12-19  5:36 ` [PATCH 4/6] arm64: Provide arch_sync_dma_ batched helpers Barry Song
2025-12-19  5:36   ` Barry Song
2025-12-19  5:36 ` [PATCH 5/6] dma-mapping: Allow batched DMA sync operations if supported by the arch Barry Song
2025-12-19  5:36   ` Barry Song
2025-12-20 17:37   ` kernel test robot
2025-12-20 17:37     ` kernel test robot
2025-12-21  5:15     ` Barry Song
2025-12-21  5:15       ` Barry Song
2025-12-21 11:55   ` Leon Romanovsky
2025-12-21 11:55     ` Leon Romanovsky
2025-12-21 19:24     ` Barry Song
2025-12-21 19:24       ` Barry Song
2025-12-22  8:49       ` Leon Romanovsky
2025-12-22  8:49         ` Leon Romanovsky
2025-12-23  0:02         ` Barry Song
2025-12-23  0:02           ` Barry Song
2025-12-23  2:36           ` Barry Song
2025-12-23  2:36             ` Barry Song
2025-12-23 14:14           ` Leon Romanovsky
2025-12-23 14:14             ` Leon Romanovsky
2025-12-24  1:29             ` Barry Song
2025-12-24  1:29               ` Barry Song
2025-12-24  8:51               ` Leon Romanovsky [this message]
2025-12-24  8:51                 ` Leon Romanovsky
2025-12-25  5:45                 ` Barry Song
2025-12-25  5:45                   ` Barry Song
2025-12-25 12:36                   ` Leon Romanovsky
2025-12-25 12:36                     ` Leon Romanovsky
2025-12-25 13:31                     ` Barry Song
2025-12-25 13:31                       ` Barry Song
2025-12-25 13:40                       ` Leon Romanovsky
2025-12-25 13:40                         ` Leon Romanovsky
2025-12-21 12:36   ` kernel test robot
2025-12-21 12:36     ` kernel test robot
2025-12-22 12:43   ` kernel test robot
2025-12-22 12:43     ` kernel test robot
2025-12-22 14:00   ` kernel test robot
2025-12-22 14:00     ` kernel test robot
2025-12-19  5:36 ` [PATCH RFC 6/6] dma-iommu: Allow DMA sync batching for IOVA link/unlink Barry Song
2025-12-19  5:36   ` Barry Song
2025-12-19  6:04 ` [PATCH 0/6] dma-mapping: arm64: support batched cache sync Barry Song
2025-12-19  6:04   ` Barry Song
2025-12-19  6:12 ` Barry Song
2025-12-19  6:12   ` Barry Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251224085145.GF11869@unreal \
    --to=leon@kernel.org \
    --cc=21cnbao@gmail.com \
    --cc=anshuman.khandual@arm.com \
    --cc=ardb@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=iommu@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=m.szyprowski@samsung.com \
    --cc=maz@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=ryan.roberts@arm.com \
    --cc=surenb@google.com \
    --cc=v-songbaohua@oppo.com \
    --cc=will@kernel.org \
    --cc=zhengtangquan@oppo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.