DMA Engine development
[PATCH 00/17] dmaengine: dw-edma: Support dynamic LL appends
From: Koichiro Den @ 2026-06-15 15:40 UTC
  To: Manivannan Sadhasivam, Vinod Koul, Frank Li, Gustavo Pimentel,
	Kees Cook, Krzysztof Wilczyński, Kishon Vijay Abraham I,
	Bjorn Helgaas, Christoph Hellwig, Serge Semin, Cai Huoqing,
	Niklas Cassel
  Cc: Devendra K Verma, dmaengine, linux-kernel

Hi,

This series is a reworked version of Frank's earlier RFT series:

  https://lore.kernel.org/dmaengine/20260109-edma_dymatic-v1-0-9a98c9c98536@nxp.com/

After discussing the HDMA test results with Frank, I am sending this as a
standalone series that keeps the main dynamic-append direction, while adding the
fixes and HDMA handling needed to make it work reliably on both eDMA and HDMA.

Several patches are kept from, or based on, Frank's RFT series; the individual
patches carry the corresponding attribution.

The series has been tested on both eDMA and HDMA systems. Both completed the fio
test set reliably; performance results are shown below.


Dependencies
============

1). [PATCH v7 0/9] dmaengine: Add new API to combine configuration and descriptor preparation
    https://lore.kernel.org/dmaengine/20260521-dma_prep_config-v7-0-1f73f4899883@nxp.com/

2). [PATCH v2 00/11] dmaengine: dw-edma: flatten desc structions and simple code
    https://lore.kernel.org/dmaengine/20260109-edma_ll-v2-0-5c0b27b2c664@nxp.com/


Performance measurements
========================

"Before" means the dependency series applied, without this series.
"After" means the same tree plus this series.

The fio test cases follow the set used in Frank's original RFT series.
Each full fio test set was run three times per configuration, alternating
between Before and After (B-A-B-A-B-A), with runtime=30s and ramp_time=5s. The
tables below report mean bandwidth over those runs; the detailed per-test rows
also include the standard deviation.

Note:
- These results are from one eDMA platform and one HDMA platform, so the exact
  deltas should NOT be read as generic numbers for all dw-edma integrations.
- Both endpoint setups used nvmet_pci_epf with a namespace backed by a
  null_blk device.
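
For reference, the Delta values in the per-test tables below are consistent with
a plain relative change of the After mean against the Before mean. A minimal
sketch of that arithmetic follows; the helper names bw_mean() and
bw_delta_pct() are hypothetical and are not part of fio or of this series.

```c
#include <assert.h>
#include <stddef.h>

/* Mean bandwidth (MiB/s) over n runs of one test case. */
static double bw_mean(const double *runs, size_t n)
{
	double sum = 0.0;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		sum += runs[i];
	return sum / (double)n;
}

/* Relative change in percent: (after - before) / before * 100. */
static double bw_delta_pct(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

With the rounded means from the tables, bw_delta_pct(22.7, 48.3) comes out near
+112.8 (eDMA, Rnd read 4KB q1 1j) and bw_delta_pct(803.3, 681.0) near -15.2
(HDMA, Seq read 128KB q1 1j). Because the tables print means rounded to one
decimal, recomputing a delta from them can differ from the printed value in the
last digit.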

Summary by group (mean BW delta %)

              all    read   write    qd32      q1  small 4K  large >=128K
  eDMA      +54.6   +46.5   +66.3   +56.1   +53.5     +82.0         +46.3
  HDMA       +9.0    +5.5   +14.1   +14.9    -0.7     +24.5          +4.3

The eDMA setup shows broad improvement across the test set. On HDMA, the main
gains are in high queue-depth and small-block write cases; low queue-depth cases
are mostly neutral, with some run-to-run noise. HDMA needs watermark interrupts
to obtain a reliable running HDMA_LLP_* progress point. Those interrupts are
mostly overhead for low queue-depth workloads, where the current descriptor fits
in the LL ring and there is no later descriptor to append.
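
To illustrate the "fits in the LL ring" and "later descriptor to append" cases,
here is a toy model of a circular ring with append and reclaim. It is only a
sketch of the general technique and not the driver's code: struct ll_ring and
the ll_ring_*() helpers are hypothetical names, and the only value taken from
this letter is the ring size of 170 (the ll_max reported for both testbeds).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LL_MAX 170	/* ring size; matches ll_max on both testbeds */

/* Hypothetical bookkeeping for a circular linked-list ring. */
struct ll_ring {
	size_t head;	/* next slot software will fill */
	size_t tail;	/* oldest slot not yet reclaimed */
	size_t used;	/* slots currently handed to the hardware */
};

/* Number of free slots available for new entries. */
static size_t ll_ring_space(const struct ll_ring *r)
{
	return LL_MAX - r->used;
}

/*
 * Append nents entries while the channel keeps running. Returns false and
 * changes nothing when the request does not fit, so the caller can hold it
 * back until progress frees slots.
 */
static bool ll_ring_append(struct ll_ring *r, size_t nents)
{
	if (nents > ll_ring_space(r))
		return false;
	r->head = (r->head + nents) % LL_MAX;
	r->used += nents;
	return true;
}

/*
 * Reclaim the slots the hardware has finished, given the index of the entry
 * it is currently processing. Returns the number of slots freed. In this
 * model hw_idx == tail means "nothing finished yet".
 */
static size_t ll_ring_reclaim(struct ll_ring *r, size_t hw_idx)
{
	size_t done;

	assert(hw_idx < LL_MAX);
	done = (hw_idx + LL_MAX - r->tail) % LL_MAX;
	if (done > r->used)
		done = r->used;
	r->tail = (r->tail + done) % LL_MAX;
	r->used -= done;
	return done;
}
```

In this model a low queue-depth workload appends one request that fits, and
nothing else arrives before it completes, so intermediate progress events free
slots that no one is waiting for. At high queue depth, ll_ring_append() starts
to fail once the ring is full, and each ll_ring_reclaim() call driven by a
progress event is what lets the next queued request in.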


eDMA:
  - Testbed:
    * Endpoint: RK3588 (Rock 5B)
      controller IP version: v5.60a
      ll_max: 170

  - Summary by group (BW delta %)

    all          n=26 mean= +54.6 median= +38.4 min= +16.3 max=+119.0
    read         n=14 mean= +46.5 median= +37.5 min= +18.7 max=+119.0
    write        n=11 mean= +66.3 median= +68.1 min= +16.3 max=+117.2
    qd32         n=16 mean= +56.1 median= +46.8 min= +18.7 max=+117.2
    q1           n= 9 mean= +53.5 median= +36.8 min= +16.3 max=+119.0
    small 4K     n= 6 mean= +82.0 median= +93.6 min= +18.7 max=+117.2
    large >=128K n=20 mean= +46.3 median= +37.6 min= +16.3 max=+119.0

  - Before mean -> After mean (MiB/s)

    Case                         Before             After              Delta
    ---------------------------  -----------------  -----------------  ------
    Rnd read     4KB q1  1j          22.7 (sd 7.7)     48.3 (sd 11.3)  +112.8%
    Rnd read     4KB q32 1j        206.3 (sd 23.8)    245.0 (sd 21.7)   +18.7%
    Rnd read     4KB q32 4j        213.3 (sd 28.0)    332.7 (sd 45.6)   +55.9%
    Rnd read   128KB q1  1j       512.7 (sd 193.6)   644.0 (sd 152.8)   +25.6%
    Rnd read   128KB q32 1j       2285.7 (sd 15.5)    3071.7 (sd 4.2)   +34.4%
    Rnd read   128KB q32 4j        2392.0 (sd 6.1)    3290.0 (sd 1.0)   +37.5%
    Rnd read   512KB q1  1j         634.3 (sd 7.8)    788.7 (sd 15.2)   +24.3%
    Rnd read   512KB q32 1j        2388.7 (sd 5.5)    3282.0 (sd 2.6)   +37.4%
    Rnd read   512KB q32 4j        2391.7 (sd 5.5)    3293.0 (sd 0.0)   +37.7%
    Rnd write    4KB q1  1j         24.4 (sd 10.2)     42.8 (sd 13.2)   +75.8%
    Rnd write    4KB q32 1j        109.0 (sd 13.0)    230.3 (sd 27.1)  +111.3%
    Rnd write    4KB q32 4j        110.3 (sd 14.4)    239.7 (sd 34.4)  +117.2%
    Rnd write  128KB q1  1j        339.0 (sd 41.1)   498.7 (sd 102.9)   +47.1%
    Rnd write  128KB q32 1j       1027.3 (sd 33.5)   1617.0 (sd 14.8)   +57.4%
    Rnd write  128KB q32 4j        951.3 (sd 72.6)    1599.0 (sd 3.6)   +68.1%
    Seq read   128KB q1  1j       379.7 (sd 120.1)    831.3 (sd 89.9)  +119.0%
    Seq read   128KB q32 1j        2291.7 (sd 6.1)   3091.3 (sd 22.8)   +34.9%
    Seq read   512KB q1  1j        644.7 (sd 34.4)    882.0 (sd 28.5)   +36.8%
    Seq read   512KB q32 1j        2387.7 (sd 5.7)    3284.0 (sd 2.6)   +37.5%
    Seq read     1MB q32 1j        2390.0 (sd 5.3)    3292.3 (sd 2.1)   +37.8%
    Seq write  128KB q1  1j        354.0 (sd 88.4)    438.0 (sd 65.1)   +23.7%
    Seq write  128KB q32 1j        934.3 (sd 46.0)   1620.0 (sd 15.6)   +73.4%
    Seq write  512KB q1  1j        552.7 (sd 14.6)    642.7 (sd 38.1)   +16.3%
    Seq write  512KB q32 1j       1041.0 (sd 39.5)    1621.3 (sd 1.5)   +55.7%
    Seq write    1MB q32 1j        808.3 (sd 22.7)    1479.7 (sd 3.5)   +83.1%
    Rnd rdwr  4K..1MB q8  4j       846.7 (sd 18.8)   1177.7 (sd 23.1)   +39.1%

HDMA:
  - Testbed:
    * Endpoint: SpacemiT K3
      controller IP version: v6.30a
      ll_max: 170

  - Summary by group (BW delta %)

    all          n=26 mean=  +9.0 median=  +6.9 min= -15.2 max= +50.2
    read         n=14 mean=  +5.5 median=  +6.4 min= -15.2 max= +24.0
    write        n=11 mean= +14.1 median=  +9.0 min=  -0.2 max= +50.2
    qd32         n=16 mean= +14.9 median=  +9.1 min=  +5.7 max= +50.2
    q1           n= 9 mean=  -0.7 median=  +0.2 min= -15.2 max=  +5.2
    small 4K     n= 6 mean= +24.5 median= +21.5 min=  -0.2 max= +50.2
    large >=128K n=20 mean=  +4.3 median=  +6.4 min= -15.2 max=  +9.8

  - Before mean -> After mean (MiB/s)

    Case                         Before             After              Delta
    ---------------------------  -----------------  -----------------  ------
    Rnd read     4KB q1  1j          68.5 (sd 5.7)      72.0 (sd 6.8)    +5.1%
    Rnd read     4KB q32 1j        310.7 (sd 38.0)    385.3 (sd 43.6)   +24.0%
    Rnd read     4KB q32 4j        324.0 (sd 45.1)     385.7 (sd 9.5)   +19.0%
    Rnd read   128KB q1  1j        737.7 (sd 63.3)    746.0 (sd 47.1)    +1.1%
    Rnd read   128KB q32 1j       1513.0 (sd 24.0)    1617.0 (sd 2.0)    +6.9%
    Rnd read   128KB q32 4j        1552.7 (sd 7.0)   1641.0 (sd 29.9)    +5.7%
    Rnd read   512KB q1  1j        828.3 (sd 16.9)    815.7 (sd 14.0)    -1.5%
    Rnd read   512KB q32 1j        1550.0 (sd 8.5)   1661.7 (sd 14.3)    +7.2%
    Rnd read   512KB q32 4j       1547.3 (sd 20.4)   1670.0 (sd 27.0)    +7.9%
    Rnd write    4KB q1  1j          67.2 (sd 5.1)      67.1 (sd 5.5)    -0.2%
    Rnd write    4KB q32 1j         207.7 (sd 6.8)     309.7 (sd 3.8)   +49.1%
    Rnd write    4KB q32 4j         208.0 (sd 5.6)     312.3 (sd 4.0)   +50.2%
    Rnd write  128KB q1  1j        545.0 (sd 42.5)    573.3 (sd 45.7)    +5.2%
    Rnd write  128KB q32 1j       1251.3 (sd 16.0)    1363.3 (sd 6.7)    +9.0%
    Rnd write  128KB q32 4j       1251.0 (sd 17.1)    1365.3 (sd 4.9)    +9.1%
    Seq read   128KB q1  1j        803.3 (sd 78.2)   681.0 (sd 110.1)   -15.2%
    Seq read   128KB q32 1j       1513.3 (sd 23.5)    1618.3 (sd 4.0)    +6.9%
    Seq read   512KB q1  1j        846.7 (sd 26.9)    797.7 (sd 73.9)    -5.8%
    Seq read   512KB q32 1j       1522.0 (sd 36.2)    1671.0 (sd 1.7)    +9.8%
    Seq read     1MB q32 1j       1544.0 (sd 21.8)   1636.3 (sd 25.1)    +6.0%
    Seq write  128KB q1  1j        544.3 (sd 13.3)    572.3 (sd 28.4)    +5.1%
    Seq write  128KB q32 1j       1251.3 (sd 15.5)    1364.3 (sd 4.9)    +9.0%
    Seq write  512KB q1  1j        772.7 (sd 23.0)    774.3 (sd 64.1)    +0.2%
    Seq write  512KB q32 1j       1251.3 (sd 17.0)    1365.0 (sd 5.2)    +9.1%
    Seq write    1MB q32 1j       1250.3 (sd 16.5)    1366.0 (sd 5.3)    +9.3%
    Rnd rdwr  4K..1MB q8  4j        875.0 (sd 9.0)     884.3 (sd 4.5)    +1.1%



Best regards,
Koichiro


Frank Li (5):
  dmaengine: dw-edma: Add dw_edma_core_ll_cur_idx() to get current LL
    entry index
  dmaengine: dw-edma: Move dw_hdma_set_callback_result() up
  dmaengine: dw-edma: Make DMA link list work as a circular buffer
  dmaengine: dw-edma: Dynamically append requests while running
  dmaengine: dw-edma: Add trace support

Koichiro Den (12):
  dmaengine: dw-edma: Fix residue burst index in tx_status()
  dmaengine: dw-edma: Fix HDMA channel status register access
  dmaengine: dw-edma: Terminate STOP requests without callbacks
  dmaengine: dw-edma: Clean up vchan descriptors on termination
  dmaengine: dw-edma: Serialize channel state checks
  dmaengine: dw-edma: Add LL interrupt placement policy
  dmaengine: dw-edma: Reclaim issued descriptors from LL progress
  dmaengine: dw-edma: Use HDMA watermarks as progress events
  dmaengine: dw-edma: Clear LL data entries on reset
  dmaengine: dw-edma: Dispatch DONE interrupts by channel request
  dmaengine: dw-edma: Reset LL state after terminate and abort
  dmaengine: dw-edma: Recover stopped HDMA from tx_status

 drivers/dma/dw-edma/Makefile          |   3 +
 drivers/dma/dw-edma/dw-edma-core.c    | 577 +++++++++++++++++++++-----
 drivers/dma/dw-edma/dw-edma-core.h    |  63 ++-
 drivers/dma/dw-edma/dw-edma-trace.c   |   4 +
 drivers/dma/dw-edma/dw-edma-trace.h   | 150 +++++++
 drivers/dma/dw-edma/dw-edma-v0-core.c |  50 ++-
 drivers/dma/dw-edma/dw-hdma-v0-core.c | 125 +++++-
 drivers/dma/dw-edma/dw-hdma-v0-regs.h |   1 +
 8 files changed, 847 insertions(+), 126 deletions(-)
 create mode 100644 drivers/dma/dw-edma/dw-edma-trace.c
 create mode 100644 drivers/dma/dw-edma/dw-edma-trace.h

-- 
2.51.0



Thread overview: 23+ messages
2026-06-15 15:40 [PATCH 00/17] dmaengine: dw-edma: Support dynamic LL appends Koichiro Den
2026-06-15 15:40 ` [PATCH 01/17] dmaengine: dw-edma: Fix residue burst index in tx_status() Koichiro Den
2026-06-15 18:29   ` Frank Li
2026-06-15 15:40 ` [PATCH 02/17] dmaengine: dw-edma: Fix HDMA channel status register access Koichiro Den
2026-06-15 18:31   ` Frank Li
2026-06-15 15:40 ` [PATCH 03/17] dmaengine: dw-edma: Terminate STOP requests without callbacks Koichiro Den
2026-06-15 18:37   ` Frank Li
2026-06-15 15:40 ` [PATCH 04/17] dmaengine: dw-edma: Clean up vchan descriptors on termination Koichiro Den
2026-06-15 18:43   ` Frank Li
2026-06-15 15:40 ` [PATCH 05/17] dmaengine: dw-edma: Serialize channel state checks Koichiro Den
2026-06-15 18:47   ` Frank Li
2026-06-15 15:41 ` [PATCH 06/17] dmaengine: dw-edma: Add dw_edma_core_ll_cur_idx() to get current LL entry index Koichiro Den
2026-06-15 15:41 ` [PATCH 07/17] dmaengine: dw-edma: Move dw_hdma_set_callback_result() up Koichiro Den
2026-06-15 15:41 ` [PATCH 08/17] dmaengine: dw-edma: Make DMA link list work as a circular buffer Koichiro Den
2026-06-15 15:41 ` [PATCH 09/17] dmaengine: dw-edma: Add LL interrupt placement policy Koichiro Den
2026-06-15 15:41 ` [PATCH 10/17] dmaengine: dw-edma: Reclaim issued descriptors from LL progress Koichiro Den
2026-06-15 15:41 ` [PATCH 11/17] dmaengine: dw-edma: Use HDMA watermarks as progress events Koichiro Den
2026-06-15 15:41 ` [PATCH 12/17] dmaengine: dw-edma: Clear LL data entries on reset Koichiro Den
2026-06-15 15:41 ` [PATCH 13/17] dmaengine: dw-edma: Dispatch DONE interrupts by channel request Koichiro Den
2026-06-15 15:41 ` [PATCH 14/17] dmaengine: dw-edma: Reset LL state after terminate and abort Koichiro Den
2026-06-15 15:41 ` [PATCH 15/17] dmaengine: dw-edma: Dynamically append requests while running Koichiro Den
2026-06-15 15:41 ` [PATCH 16/17] dmaengine: dw-edma: Recover stopped HDMA from tx_status Koichiro Den
2026-06-15 15:41 ` [PATCH 17/17] dmaengine: dw-edma: Add trace support Koichiro Den
