public inbox for netdev@vger.kernel.org
* [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA
@ 2026-01-18 13:54 Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 01/38] dmaengine: dw-edma: Export helper to get integrated register window Koichiro Den
                   ` (38 more replies)
  0 siblings, 39 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

Hi,

This is RFC v4 of the NTB/PCI/dmaengine series that introduces an
optional NTB transport variant where payload data is moved by a PCI
embedded-DMA engine (eDMA) residing on the endpoint side.

The primary target is Synopsys DesignWare PCIe endpoint controllers that
integrate a DesignWare eDMA instance (dw-edma). In the remote
embedded-DMA mode, payload is transferred by DMA directly between the
two systems' memory, and NTB Memory Windows are used primarily for
control/metadata and for exposing the endpoint eDMA resources (register
window + linked-list rings) to the host.

Compared to the existing CPU/DMA memcpy-based implementation, this
approach avoids window-backed payload rings and the associated extra
copies, and it is less sensitive to scarce MW space. It also enables
scaling out to multiple queue pairs, which is particularly beneficial
for ntb_netdev. On R-Car S4, preliminary iperf3 results show a 10-20x
throughput improvement; latency improvements are also observed.

RFC history:
  RFC v3: https://lore.kernel.org/all/20251217151609.3162665-1-den@valinux.co.jp/
  RFC v2: https://lore.kernel.org/all/20251129160405.2568284-1-den@valinux.co.jp/
  RFC v1: https://lore.kernel.org/all/20251023071916.901355-1-den@valinux.co.jp/

Parts of the RFC v3 series have already been split out and posted
separately (see the "Kernel base / dependencies" section below). However,
feedback on the remaining parts led to substantial restructuring and code
changes, so I am sending RFC v4 as a refreshed version of the full series.

RFC v4 is still a large, cross-subsystem series. At this RFC stage,
I am sending the full picture as a single set to make it easier to
review the overall direction and architecture. Once the direction is
agreed upon and no further large restructuring appears necessary, I will
stop posting new RFC-tagged revisions and continue development in
separate threads, split by sub-topic.

Many thanks for all the reviews and feedback from multiple perspectives.


Software architecture overview (RFC v4)
=======================================

A major change in RFC v4 is the software layering and module split.

The existing memcpy-based transport and the new remote embedded-DMA
transport are implemented as two independent NTB client drivers on top
of a shared core library:

                       +--------------------+
                       | ntb_transport_core |
                       +--------------------+
                           ^            ^
                           |            |
        ntb_transport -----+            +----- ntb_transport_edma
       (cpu/dma memcpy)                   (remote embedded DMA transfer)
                                                       |
                                                       v
                                                 +-----------+
                                                 |  ntb_edma |
                                                 +-----------+
                                                       ^
                                                       |
                                               +----------------+
                                               |                |
                                          ntb_dw_edma         [...]

Key points:
  * ntb_transport_core provides the queue-pair abstraction used by upper
    layer clients (e.g. ntb_netdev).
  * ntb_transport is the legacy shared-memory transport client (CPU/DMA
    memcpy).
  * ntb_transport_edma is the remote embedded-DMA transport client.
  * ntb_transport_edma relies on an ntb_edma backend registry.
    This RFC provides an initial DesignWare backend (ntb_dw_edma).
  * Transport selection is per-NTB device via the standard
    driver_override mechanism. To enable that, this RFC adds
    driver_override support to ntb_bus. This allows mixing transports
    across multiple NTB ports and provides an explicit fallback path to
    the legacy transport.

So, if ntb_transport / ntb_transport_edma are built as loadable modules,
running modprobe ntb_transport as before activates the original CPU/DMA
memcpy-based implementation. If they are built in, whether ntb_transport
or ntb_transport_edma is bound by default depends on initcall order. For
details on how to switch drivers, see Patch 34
("Documentation: driver-api: ntb: Document remote embedded-DMA transport").
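
For illustration, the driver_override switch amounts to writing the
desired driver name into the per-device sysfs attribute that Patch 12
adds. The sketch below models that in userspace C; the sysfs path layout
and the device name "0000:00:01.0" are assumptions for the example, not
taken from this series:

```c
/* Minimal sketch: write a driver name into a driver_override-style
 * attribute file. The /sys/bus/ntb/devices/<dev>/driver_override path
 * and the device name below are hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int set_driver_override(const char *attr_path, const char *driver)
{
	int fd = open(attr_path, O_WRONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = write(fd, driver, strlen(driver));
	close(fd);
	return n == (ssize_t)strlen(driver) ? 0 : -1;
}

/* Usage (hypothetical device name):
 *   set_driver_override("/sys/bus/ntb/devices/0000:00:01.0/driver_override",
 *                       "ntb_transport_edma");
 * followed by an unbind/bind or device probe to pick up the override.
 */
```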


Data flow overview (remote embedded-DMA transport)
==================================================

At a high level:
  * One MW is reserved as an "eDMA window". The endpoint exposes the
    eDMA register block plus LL descriptor rings through that window, so
    the peer can ioremap it and drive DMA reads remotely.
  * Remaining MWs carry only small control-plane rings used to exchange
    buffer addresses and completion information.
  * For RC->EP traffic, the RC drives endpoint DMA read channels through
    the peer-visible eDMA window.
  * For EP->RC traffic, the endpoint uses its local DMA write channels.

The following figures illustrate the data flow when ntb_netdev sits on
top of the transport:

     Figure 1. RC->EP traffic via ntb_netdev + ntb_transport_edma
                   backed by ntb_edma/ntb_dw_edma

             EP                                   RC
          phys addr                            phys addr
            space                                space
             +-+                                  +-+
             | |                                  | |
             | |                ||                | |
             +-+-----.          ||                | |
    EDMA REG | |      \     [A] ||                | |
             +-+----.  '---+-+  ||                | |
             | |     \     | |<---------[0-a]----------
             +-+-----------| |<----------[2]----------.
     EDMA LL | |           | |  ||                | | :
             | |           | |  ||                | | :
             +-+-----------+-+  ||  [B]           | | :
             | |                ||  ++            | | :
          ---------[0-b]----------->||----------------'
             | |            ++  ||  ||            | |
             | |            ||  ||  ++            | |
             | |            ||<----------[4]-----------
             | |            ++  ||                | |
             | |           [C]  ||                | |
          .--|#|<------------------------[3]------|#|<-.
          :  |#|                ||                |#|  :
         [5] | |                ||                | | [1]
          :  | |                ||                | |  :
          '->|#|                                  |#|--'
             |#|                                  |#|
             | |                                  | |

     Figure 2. EP->RC traffic via ntb_netdev + ntb_transport_edma
                  backed by ntb_edma/ntb_dw_edma

             EP                                   RC
          phys addr                            phys addr
            space                                space
             +-+                                  +-+
             | |                                  | |
             | |                ||                | |
             +-+                ||                | |
    EDMA REG | |                ||                | |
             +-+                ||                | |
    ^        | |                ||                | |
    :        +-+                ||                | |
    : EDMA LL| |                ||                | |
    :        | |                ||                | |
    :        +-+                ||  [C]           | |
    :        | |                ||  ++            | |
    :     -----------[4]----------->||            | |
    :        | |            ++  ||  ||            | |
    :        | |            ||  ||  ++            | |
    '----------------[2]-----||<--------[0-b]-----------
             | |            ++  ||                | |
             | |           [B]  ||                | |
          .->|#|--------[3]---------------------->|#|--.
          :  |#|                ||                |#|  :
         [1] | |                ||                | | [5]
          :  | |                ||                | |  :
          '--|#|                                  |#|<-'
             |#|                                  |#|
             | |                                  | |

    0-a. configure remote embedded DMA (program endpoint DMA registers)
    0-b. DMA-map and publish destination address (DAR)
    1.   network stack builds skb (copy from application/user memory)
    2.   consume DAR, DMA-map source address (SAR) and kick DMA transfer
    3.   DMA transfer (payload moves between RC/EP memory)
    4.   consume completion (commit)
    5.   network stack delivers data to application/user memory

    [A]: Dedicated MW that aggregates DMA regs and LL (peer ioremaps it)
    [B]: Control-plane ring buffer for "produce"
    [C]: Control-plane ring buffer for "consume"
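
The control-plane rings [B]/[C] behave like single-producer/
single-consumer index rings: the receive side publishes DMA-mapped
destination addresses (step 0-b), the send side consumes one and kicks
the transfer (step 2), and completions are committed back the same way
(step 4). The following is a minimal userspace model of that index
discipline only; names, layout, and the absence of memory barriers are
simplifications, not the kernel implementation:

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_ORDER 4
#define RING_SIZE  (1u << RING_ORDER)	/* entries, power of two */

/* One published buffer: a DMA-mapped destination address (DAR). */
struct ring_entry {
	uint64_t dar;
	uint32_t len;
};

struct ctrl_ring {
	struct ring_entry e[RING_SIZE];
	uint32_t head;	/* producer index (publish side) */
	uint32_t tail;	/* consumer index (consume side) */
};

/* Step 0-b: receiver publishes a DMA-mapped destination address. */
static bool ring_produce(struct ctrl_ring *r, uint64_t dar, uint32_t len)
{
	if (r->head - r->tail == RING_SIZE)
		return false;	/* ring full */
	r->e[r->head & (RING_SIZE - 1)] = (struct ring_entry){ dar, len };
	r->head++;
	return true;
}

/* Step 2: sender consumes one DAR (and would then kick the DMA). */
static bool ring_consume(struct ctrl_ring *r, struct ring_entry *out)
{
	if (r->head == r->tail)
		return false;	/* nothing published yet */
	*out = r->e[r->tail & (RING_SIZE - 1)];
	r->tail++;
	return true;
}
```

In the real transport the indices live in peer-visible memory windows,
so ordering and visibility need explicit barriers; that is deliberately
omitted above.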


Kernel base / dependencies
==========================

This series is based on:

  - next-20260114 (commit b775e489bec7)

plus the following seven unmerged patch series or standalone patches:

  - [PATCH v4 0/7] PCI: endpoint/NTB: Harden vNTB resource management
    https://lore.kernel.org/all/20251202072348.2752371-1-den@valinux.co.jp/

  - [PATCH v2 0/2] NTB: ntb_transport: debugfs cleanups
    https://lore.kernel.org/all/20260107042458.1987818-1-den@valinux.co.jp/

  - [PATCH v3 0/9] dmaengine: Add new API to combine configuration and descriptor preparation
    https://lore.kernel.org/all/20260105-dma_prep_config-v3-0-a8480362fd42@nxp.com/

  - [PATCH v8 0/5] PCI: endpoint: BAR subrange mapping support
    https://lore.kernel.org/all/20260115084928.55701-1-den@valinux.co.jp/

  - [PATCH] PCI: endpoint: pci-epf-vntb: Use array_index_nospec() on mws_size[] access
    https://lore.kernel.org/all/20260105075606.1253697-1-den@valinux.co.jp/

  - [PATCH] dmaengine: dw-edma: Fix MSI data values for multi-vector IMWr interrupts
    https://lore.kernel.org/all/20260105075904.1254012-1-den@valinux.co.jp/

  - [PATCH v2 01/11] dmaengine: dw-edma: Add spinlock to protect DONE_INT_MASK and ABORT_INT_MASK
    https://lore.kernel.org/imx/20260109-edma_ll-v2-1-5c0b27b2c664@nxp.com/
    (only this single commit is cherry-picked from the series)


Patch layout
============

  1. dw-edma / DesignWare EP helpers needed for remote embedded-DMA (export
     register/LL windows, IRQ routing control, etc.)

     Patch 01 : dmaengine: dw-edma: Export helper to get integrated register window
     Patch 02 : dmaengine: dw-edma: Add per-channel interrupt routing control
     Patch 03 : dmaengine: dw-edma: Poll completion when local IRQ handling is disabled
     Patch 04 : dmaengine: dw-edma: Add notify-only channels support
     Patch 05 : dmaengine: dw-edma: Add a helper to query linked-list region

  2. NTB EPF/core + vNTB prep (mwN_offset + versioning, MSI vector
     management, new ntb_dev_ops helpers, driver_override, vntb glue)

     Patch 06 : NTB: epf: Add mwN_offset support and config region versioning
     Patch 07 : NTB: epf: Reserve a subset of MSI vectors for non-NTB users
     Patch 08 : NTB: epf: Provide db_vector_count/db_vector_mask callbacks
     Patch 09 : NTB: core: Add mw_set_trans_ranges() for subrange programming
     Patch 10 : NTB: core: Add .get_private_data() to ntb_dev_ops
     Patch 11 : NTB: core: Add .get_dma_dev() to ntb_dev_ops
     Patch 12 : NTB: core: Add driver_override support for NTB devices
     Patch 13 : PCI: endpoint: pci-epf-vntb: Support BAR subrange mappings for MWs
     Patch 14 : PCI: endpoint: pci-epf-vntb: Implement .get_private_data() callback
     Patch 15 : PCI: endpoint: pci-epf-vntb: Implement .get_dma_dev()

  3. ntb_transport refactor/modularization and backend infrastructure

     Patch 16 : NTB: ntb_transport: Move TX memory window setup into setup_qp_mw()
     Patch 17 : NTB: ntb_transport: Dynamically determine qp count
     Patch 18 : NTB: ntb_transport: Use ntb_get_dma_dev()
     Patch 19 : NTB: ntb_transport: Rename ntb_transport.c to ntb_transport_core.c
     Patch 20 : NTB: ntb_transport: Move internal types to ntb_transport_internal.h
     Patch 21 : NTB: ntb_transport: Export common helpers for modularization
     Patch 22 : NTB: ntb_transport: Split core library and default NTB client
     Patch 23 : NTB: ntb_transport: Add transport backend infrastructure
     Patch 24 : NTB: ntb_transport: Run ntb_set_mw() before link-up negotiation

  4. ntb_edma backend registry + DesignWare backend + transport client

     Patch 25 : NTB: hw: Add remote eDMA backend registry and DesignWare backend
     Patch 26 : NTB: ntb_transport: Add remote embedded-DMA transport client

  5. ntb_netdev multi-queue support

     Patch 27 : ntb_netdev: Multi-queue support

  6. Renesas R-Car S4 enablement (IOMMU, DTs, quirks)

     Patch 28 : iommu: ipmmu-vmsa: Add PCIe ch0 to devices_allowlist
     Patch 29 : iommu: ipmmu-vmsa: Add support for reserved regions
     Patch 30 : arm64: dts: renesas: Add Spider RC/EP DTs for NTB with remote DW PCIe eDMA
     Patch 31 : NTB: epf: Add per-SoC quirk to cap MRRS for DWC eDMA (128B for R-Car)
     Patch 32 : NTB: epf: Add an additional memory window (MW2) barno mapping on Renesas R-Car

  7. Documentation updates

     Patch 33 : Documentation: PCI: endpoint: pci-epf-vntb: Update and add mwN_offset usage
     Patch 34 : Documentation: driver-api: ntb: Document remote embedded-DMA transport

  8. pci-epf-test / pci_endpoint_test / kselftest coverage for remote eDMA

     Patch 35 : PCI: endpoint: pci-epf-test: Add pci_epf_test_next_free_bar() helper
     Patch 36 : PCI: endpoint: pci-epf-test: Add remote eDMA-backed mode
     Patch 37 : misc: pci_endpoint_test: Add remote eDMA transfer test mode
     Patch 38 : selftests: pci_endpoint: Add remote eDMA transfer coverage


Tested on
=========

* 2x Renesas R-Car S4 Spider (RC<->EP connected with OCuLink cable)
* Kernel base as described above


Performance notes
=================

The primary motivation remains improving throughput/latency for ntb_transport
users (typically ntb_netdev). On R-Car S4, the earlier prototype (RFC v3)
showed roughly 10-20x throughput improvement in preliminary iperf3 tests and
lower ping RTT. I have not yet re-measured after the v4 refactor and
module split.


Changelog
=========

RFCv3->RFCv4 changes:
  - Major refactor of the transport layering:
    - Introduce ntb_transport_core as a shared library module.
    - Split the legacy shared-memory transport client (ntb_transport) and the
      remote embedded-DMA transport client (ntb_transport_edma).
    - Add driver_override support for ntb_bus and use it for per-port transport
      selection.
  - Introduce a vendor-agnostic remote embedded-DMA backend registry (ntb_edma)
    and add the initial DesignWare backend (ntb_dw_edma).
  - Rebase to next-20260114 and move several prerequisite/fixup patchsets into
    separate threads (listed above), including BAR subrange mapping support and
    dw-edma fixes.
  - Add PCI endpoint test coverage for the remote embedded-DMA path:
    - extend pci-epf-test / pci_endpoint_test
    - add a kselftest variant to exercise remote-eDMA transfers
    Note: to keep the changes as small as possible, I added a few #ifdefs
    in the main test code. Feedback on whether, how, and to what extent
    this should be split into separate modules would be appreciated.
  - Expand documentation (Documentation/driver-api/ntb.rst) to describe transport
    variants, the new module structure, and the remote embedded-DMA data flow.
  - Addressed other feedback from the RFC v3 thread.

RFCv2->RFCv3 changes:
  - Architecture
    - Have the EP side use its local write channels, while the RC side
      uses remote read channels.
    - Improved abstraction and encapsulation of HW-specific details.
  - Added control/config region versioning for the vNTB/EPF control region
    so that mismatched RC/EP kernels fail early instead of silently using an
    incompatible layout.
  - Reworked BAR subrange / multi-region mapping support:
    - Dropped the v2 approach that added new inbound mapping ops in the EPC
      core.
    - Introduced `struct pci_epf_bar.submap` and extended DesignWare EP to
      support BAR subrange inbound mapping via Address Match Mode IB iATU.
    - pci-epf-vntb now provides a subrange mapping hint to the EPC driver
      when offsets are used.
  - Changed .get_pci_epc() to .get_private_data()
  - Dropped two commits from RFC v2 that should be submitted separately:
    (1) ntb_transport debugfs seq_file conversion
    (2) DWC EP outbound iATU MSI mapping/cache fix (will be re-posted separately)
  - Added documentation updates.
  - Addressed assorted review nits from the RFC v2 thread (naming/structure).

RFCv1->RFCv2 changes:
  - Architecture
    - Drop the generic interrupt backend + DW eDMA test-interrupt backend
      approach and instead adopt the remote eDMA-backed ntb_transport mode
      proposed by Frank Li. The BAR-sharing / mwN_offset / inbound
      mapping (Address Match Mode) infrastructure from RFC v1 is largely
      kept, with only minor refinements and code motion where necessary
      to fit the new transport-mode design.
  - For Patch 01
    - Rework the array_index_nospec() conversion to address review
      comments on "[RFC PATCH 01/25]".


Thank you for reviewing,


Koichiro Den (38):
  dmaengine: dw-edma: Export helper to get integrated register window
  dmaengine: dw-edma: Add per-channel interrupt routing control
  dmaengine: dw-edma: Poll completion when local IRQ handling is
    disabled
  dmaengine: dw-edma: Add notify-only channels support
  dmaengine: dw-edma: Add a helper to query linked-list region
  NTB: epf: Add mwN_offset support and config region versioning
  NTB: epf: Reserve a subset of MSI vectors for non-NTB users
  NTB: epf: Provide db_vector_count/db_vector_mask callbacks
  NTB: core: Add mw_set_trans_ranges() for subrange programming
  NTB: core: Add .get_private_data() to ntb_dev_ops
  NTB: core: Add .get_dma_dev() to ntb_dev_ops
  NTB: core: Add driver_override support for NTB devices
  PCI: endpoint: pci-epf-vntb: Support BAR subrange mappings for MWs
  PCI: endpoint: pci-epf-vntb: Implement .get_private_data() callback
  PCI: endpoint: pci-epf-vntb: Implement .get_dma_dev()
  NTB: ntb_transport: Move TX memory window setup into setup_qp_mw()
  NTB: ntb_transport: Dynamically determine qp count
  NTB: ntb_transport: Use ntb_get_dma_dev()
  NTB: ntb_transport: Rename ntb_transport.c to ntb_transport_core.c
  NTB: ntb_transport: Move internal types to ntb_transport_internal.h
  NTB: ntb_transport: Export common helpers for modularization
  NTB: ntb_transport: Split core library and default NTB client
  NTB: ntb_transport: Add transport backend infrastructure
  NTB: ntb_transport: Run ntb_set_mw() before link-up negotiation
  NTB: hw: Add remote eDMA backend registry and DesignWare backend
  NTB: ntb_transport: Add remote embedded-DMA transport client
  ntb_netdev: Multi-queue support
  iommu: ipmmu-vmsa: Add PCIe ch0 to devices_allowlist
  iommu: ipmmu-vmsa: Add support for reserved regions
  arm64: dts: renesas: Add Spider RC/EP DTs for NTB with remote DW PCIe
    eDMA
  NTB: epf: Add per-SoC quirk to cap MRRS for DWC eDMA (128B for R-Car)
  NTB: epf: Add an additional memory window (MW2) barno mapping on
    Renesas R-Car
  Documentation: PCI: endpoint: pci-epf-vntb: Update and add mwN_offset
    usage
  Documentation: driver-api: ntb: Document remote embedded-DMA transport
  PCI: endpoint: pci-epf-test: Add pci_epf_test_next_free_bar() helper
  PCI: endpoint: pci-epf-test: Add remote eDMA-backed mode
  misc: pci_endpoint_test: Add remote eDMA transfer test mode
  selftests: pci_endpoint: Add remote eDMA transfer coverage

 Documentation/PCI/endpoint/pci-vntb-howto.rst |   19 +-
 Documentation/driver-api/ntb.rst              |  193 ++
 arch/arm64/boot/dts/renesas/Makefile          |    2 +
 .../boot/dts/renesas/r8a779f0-spider-ep.dts   |   37 +
 .../boot/dts/renesas/r8a779f0-spider-rc.dts   |   52 +
 drivers/dma/dw-edma/dw-edma-core.c            |  207 +-
 drivers/dma/dw-edma/dw-edma-core.h            |   10 +
 drivers/dma/dw-edma/dw-edma-v0-core.c         |   26 +-
 drivers/iommu/ipmmu-vmsa.c                    |    7 +-
 drivers/misc/pci_endpoint_test.c              |  633 +++++
 drivers/net/ntb_netdev.c                      |  341 ++-
 drivers/ntb/Kconfig                           |   13 +
 drivers/ntb/Makefile                          |    2 +
 drivers/ntb/core.c                            |   68 +
 drivers/ntb/hw/Kconfig                        |    1 +
 drivers/ntb/hw/Makefile                       |    1 +
 drivers/ntb/hw/edma/Kconfig                   |   28 +
 drivers/ntb/hw/edma/Makefile                  |    5 +
 drivers/ntb/hw/edma/backend.c                 |   87 +
 drivers/ntb/hw/edma/backend.h                 |  102 +
 drivers/ntb/hw/edma/ntb_dw_edma.c             |  977 +++++++
 drivers/ntb/hw/epf/ntb_hw_epf.c               |  199 +-
 drivers/ntb/ntb_transport.c                   | 2458 +---------------
 drivers/ntb/ntb_transport_core.c              | 2523 +++++++++++++++++
 drivers/ntb/ntb_transport_edma.c              | 1110 ++++++++
 drivers/ntb/ntb_transport_internal.h          |  261 ++
 drivers/pci/controller/dwc/pcie-designware.c  |   26 +
 drivers/pci/endpoint/functions/pci-epf-test.c |  497 +++-
 drivers/pci/endpoint/functions/pci-epf-vntb.c |  380 ++-
 include/linux/dma/edma.h                      |  106 +
 include/linux/ntb.h                           |   88 +
 include/uapi/linux/pcitest.h                  |    3 +-
 .../pci_endpoint/pci_endpoint_test.c          |   17 +
 33 files changed, 7855 insertions(+), 2624 deletions(-)
 create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-ep.dts
 create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-rc.dts
 create mode 100644 drivers/ntb/hw/edma/Kconfig
 create mode 100644 drivers/ntb/hw/edma/Makefile
 create mode 100644 drivers/ntb/hw/edma/backend.c
 create mode 100644 drivers/ntb/hw/edma/backend.h
 create mode 100644 drivers/ntb/hw/edma/ntb_dw_edma.c
 create mode 100644 drivers/ntb/ntb_transport_core.c
 create mode 100644 drivers/ntb/ntb_transport_edma.c
 create mode 100644 drivers/ntb/ntb_transport_internal.h

-- 
2.51.0



* [RFC PATCH v4 01/38] dmaengine: dw-edma: Export helper to get integrated register window
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 02/38] dmaengine: dw-edma: Add per-channel interrupt routing control Koichiro Den
                   ` (37 subsequent siblings)
  38 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

Some DesignWare PCIe endpoint controllers integrate a DesignWare eDMA
instance. Remote-eDMA providers (e.g. vNTB) need to expose the eDMA
register block to the host through a memory window so the host can
ioremap it and run dw_edma_probe() against the remote view.

Record the physical base and size of the eDMA register aperture and
export dw_edma_get_reg_window() so higher-level code can query the
register window associated with a given PCI EPC device.
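
The helper's contract (record the aperture once at probe time, query it
later, tolerate NULL out-parameters, fail with -ENODEV when nothing was
recorded) can be modelled in isolation. A userspace sketch with
hypothetical struct and function names, mirroring the shape of
dw_edma_get_reg_window():

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the recorded eDMA register aperture. */
struct edma_reg_window {
	uint64_t phys;	/* physical base of the register block */
	uint64_t size;	/* size of the register block, 0 = not recorded */
};

/* Query the recorded window; either out-parameter may be NULL. */
static int edma_get_reg_window(const struct edma_reg_window *w,
			       uint64_t *phys, uint64_t *sz)
{
	if (!w || !w->size)
		return -ENODEV;	/* nothing recorded at probe time */
	if (phys)
		*phys = w->phys;
	if (sz)
		*sz = w->size;
	return 0;
}
```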

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/pci/controller/dwc/pcie-designware.c | 26 ++++++++++++++++++++
 include/linux/dma/edma.h                     | 25 +++++++++++++++++++
 2 files changed, 51 insertions(+)

diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
index 345365ea97c7..ad18b84c9f71 100644
--- a/drivers/pci/controller/dwc/pcie-designware.c
+++ b/drivers/pci/controller/dwc/pcie-designware.c
@@ -162,8 +162,12 @@ int dw_pcie_get_resources(struct dw_pcie *pci)
 			pci->edma.reg_base = devm_ioremap_resource(pci->dev, res);
 			if (IS_ERR(pci->edma.reg_base))
 				return PTR_ERR(pci->edma.reg_base);
+			pci->edma.reg_phys = res->start;
+			pci->edma.reg_size = resource_size(res);
 		} else if (pci->atu_size >= 2 * DEFAULT_DBI_DMA_OFFSET) {
 			pci->edma.reg_base = pci->atu_base + DEFAULT_DBI_DMA_OFFSET;
+			pci->edma.reg_phys = pci->atu_phys_addr + DEFAULT_DBI_DMA_OFFSET;
+			pci->edma.reg_size = pci->atu_size - DEFAULT_DBI_DMA_OFFSET;
 		}
 	}
 
@@ -1257,3 +1261,25 @@ resource_size_t dw_pcie_parent_bus_offset(struct dw_pcie *pci,
 
 	return cpu_phys_addr - reg_addr;
 }
+
+int dw_edma_get_reg_window(struct pci_epc *epc, phys_addr_t *phys,
+			   resource_size_t *sz)
+{
+	struct dw_pcie_ep *ep = epc_get_drvdata(epc);
+	struct dw_pcie *pci;
+
+	if (!ep)
+		return -ENODEV;
+
+	pci = to_dw_pcie_from_ep(ep);
+	if (!pci->edma.reg_size)
+		return -ENODEV;
+
+	if (phys)
+		*phys = pci->edma.reg_phys;
+	if (sz)
+		*sz = pci->edma.reg_size;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(dw_edma_get_reg_window);
diff --git a/include/linux/dma/edma.h b/include/linux/dma/edma.h
index 270b5458aecf..ffad10ff2cd6 100644
--- a/include/linux/dma/edma.h
+++ b/include/linux/dma/edma.h
@@ -83,6 +83,8 @@ struct dw_edma_chip {
 	u32			flags;
 
 	void __iomem		*reg_base;
+	phys_addr_t		reg_phys;
+	resource_size_t		reg_size;
 
 	u16			ll_wr_cnt;
 	u16			ll_rd_cnt;
@@ -115,4 +117,27 @@ static inline int dw_edma_remove(struct dw_edma_chip *chip)
 }
 #endif /* CONFIG_DW_EDMA */
 
+struct pci_epc;
+
+#if IS_REACHABLE(CONFIG_PCIE_DW)
+/**
+ * dw_edma_get_reg_window - get eDMA register base and size
+ * @epc:  the EPC device with which the eDMA instance is integrated
+ * @phys: the output parameter that returns the register base address
+ * @sz:   the output parameter that returns the register space size
+ *
+ * Remote eDMA users (e.g. NTB) may need to expose the integrated DW eDMA
+ * register block through a memory window. This helper returns the physical
+ * base and size for a given DesignWare EP controller.
+ */
+int dw_edma_get_reg_window(struct pci_epc *epc, phys_addr_t *phys,
+			   resource_size_t *sz);
+#else
+static inline int dw_edma_get_reg_window(struct pci_epc *epc, phys_addr_t *phys,
+					 resource_size_t *sz)
+{
+	return -ENODEV;
+}
+#endif /* CONFIG_PCIE_DW */
+
 #endif /* _DW_EDMA_H */
-- 
2.51.0



* [RFC PATCH v4 02/38] dmaengine: dw-edma: Add per-channel interrupt routing control
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 01/38] dmaengine: dw-edma: Export helper to get integrated register window Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-18 17:03   ` Frank Li
  2026-01-18 13:54 ` [RFC PATCH v4 03/38] dmaengine: dw-edma: Poll completion when local IRQ handling is disabled Koichiro Den
                   ` (36 subsequent siblings)
  38 siblings, 1 reply; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

DesignWare EP eDMA can generate interrupts both locally and remotely
(LIE/RIE). Remote eDMA users need to decide, per channel, whether
completions should be handled locally, remotely, or both. Unless
carefully configured, the endpoint and host would race to ack the
interrupt.

Introduce a per-channel interrupt routing mode and export small APIs to
configure and query it. Update v0 programming so that RIE and local
done/abort interrupt masking follow the selected mode. The default mode
keeps the original behavior, so unless the new APIs are explicitly used,
no functional changes.
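
The routing decision reduces to a small pure function: each side ignores
a channel's interrupt when the selected mode routes completion handling
to the other side. A userspace model of the dw_edma_chan_ignore_irq()
logic in this patch (types simplified; the real code takes a struct
dma_chan and checks DW_EDMA_CHIP_LOCAL on the chip):

```c
#include <stdbool.h>

enum ch_irq_mode { IRQ_MODE_DEFAULT, IRQ_MODE_LOCAL, IRQ_MODE_REMOTE };

/* local_chip: true on the side where the eDMA is integrated (EP side,
 * i.e. DW_EDMA_CHIP_LOCAL set); false for the remote (host) view. */
static bool ignore_irq(bool local_chip, enum ch_irq_mode mode)
{
	if (local_chip)
		return mode == IRQ_MODE_REMOTE;	/* peer acks/handles it */
	return mode == IRQ_MODE_LOCAL;		/* EP acks/handles it   */
}
```

In the default mode neither side ignores the interrupt, which matches
the "no functional change" claim above.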

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/dma/dw-edma/dw-edma-core.c    | 52 +++++++++++++++++++++++++++
 drivers/dma/dw-edma/dw-edma-core.h    |  2 ++
 drivers/dma/dw-edma/dw-edma-v0-core.c | 26 +++++++++-----
 include/linux/dma/edma.h              | 44 +++++++++++++++++++++++
 4 files changed, 116 insertions(+), 8 deletions(-)

diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
index b9d59c3c0cb4..059b3996d383 100644
--- a/drivers/dma/dw-edma/dw-edma-core.c
+++ b/drivers/dma/dw-edma/dw-edma-core.c
@@ -768,6 +768,7 @@ static int dw_edma_channel_setup(struct dw_edma *dw, u32 wr_alloc, u32 rd_alloc)
 		chan->configured = false;
 		chan->request = EDMA_REQ_NONE;
 		chan->status = EDMA_ST_IDLE;
+		chan->irq_mode = DW_EDMA_CH_IRQ_DEFAULT;
 
 		if (chan->dir == EDMA_DIR_WRITE)
 			chan->ll_max = (chip->ll_region_wr[chan->id].sz / EDMA_LL_SZ);
@@ -1062,6 +1063,57 @@ int dw_edma_remove(struct dw_edma_chip *chip)
 }
 EXPORT_SYMBOL_GPL(dw_edma_remove);
 
+int dw_edma_chan_irq_config(struct dma_chan *dchan,
+			    enum dw_edma_ch_irq_mode mode)
+{
+	struct dw_edma_chan *chan;
+
+	switch (mode) {
+	case DW_EDMA_CH_IRQ_DEFAULT:
+	case DW_EDMA_CH_IRQ_LOCAL:
+	case DW_EDMA_CH_IRQ_REMOTE:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (!dchan || !dchan->device)
+		return -ENODEV;
+
+	chan = dchan2dw_edma_chan(dchan);
+	if (!chan)
+		return -ENODEV;
+
+	chan->irq_mode = mode;
+
+	dev_vdbg(chan->dw->chip->dev, "Channel: %s[%u] set irq_mode=%u\n",
+		 str_write_read(chan->dir == EDMA_DIR_WRITE),
+		 chan->id, mode);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(dw_edma_chan_irq_config);
+
+bool dw_edma_chan_ignore_irq(struct dma_chan *dchan)
+{
+	struct dw_edma_chan *chan;
+	struct dw_edma *dw;
+
+	if (!dchan || !dchan->device)
+		return false;
+
+	chan = dchan2dw_edma_chan(dchan);
+	if (!chan)
+		return false;
+
+	dw = chan->dw;
+	if (dw->chip->flags & DW_EDMA_CHIP_LOCAL)
+		return chan->irq_mode == DW_EDMA_CH_IRQ_REMOTE;
+	else
+		return chan->irq_mode == DW_EDMA_CH_IRQ_LOCAL;
+}
+EXPORT_SYMBOL_GPL(dw_edma_chan_ignore_irq);
+
 MODULE_LICENSE("GPL v2");
 MODULE_DESCRIPTION("Synopsys DesignWare eDMA controller core driver");
 MODULE_AUTHOR("Gustavo Pimentel <gustavo.pimentel@synopsys.com>");
diff --git a/drivers/dma/dw-edma/dw-edma-core.h b/drivers/dma/dw-edma/dw-edma-core.h
index 71894b9e0b15..8458d676551a 100644
--- a/drivers/dma/dw-edma/dw-edma-core.h
+++ b/drivers/dma/dw-edma/dw-edma-core.h
@@ -81,6 +81,8 @@ struct dw_edma_chan {
 
 	struct msi_msg			msi;
 
+	enum dw_edma_ch_irq_mode	irq_mode;
+
 	enum dw_edma_request		request;
 	enum dw_edma_status		status;
 	u8				configured;
diff --git a/drivers/dma/dw-edma/dw-edma-v0-core.c b/drivers/dma/dw-edma/dw-edma-v0-core.c
index 2850a9df80f5..80472148c335 100644
--- a/drivers/dma/dw-edma/dw-edma-v0-core.c
+++ b/drivers/dma/dw-edma/dw-edma-v0-core.c
@@ -256,8 +256,10 @@ dw_edma_v0_core_handle_int(struct dw_edma_irq *dw_irq, enum dw_edma_dir dir,
 	for_each_set_bit(pos, &val, total) {
 		chan = &dw->chan[pos + off];
 
-		dw_edma_v0_core_clear_done_int(chan);
-		done(chan);
+		if (!dw_edma_chan_ignore_irq(&chan->vc.chan)) {
+			dw_edma_v0_core_clear_done_int(chan);
+			done(chan);
+		}
 
 		ret = IRQ_HANDLED;
 	}
@@ -267,8 +269,10 @@ dw_edma_v0_core_handle_int(struct dw_edma_irq *dw_irq, enum dw_edma_dir dir,
 	for_each_set_bit(pos, &val, total) {
 		chan = &dw->chan[pos + off];
 
-		dw_edma_v0_core_clear_abort_int(chan);
-		abort(chan);
+		if (!dw_edma_chan_ignore_irq(&chan->vc.chan)) {
+			dw_edma_v0_core_clear_abort_int(chan);
+			abort(chan);
+		}
 
 		ret = IRQ_HANDLED;
 	}
@@ -331,7 +335,8 @@ static void dw_edma_v0_core_write_chunk(struct dw_edma_chunk *chunk)
 		j--;
 		if (!j) {
 			control |= DW_EDMA_V0_LIE;
-			if (!(chan->dw->chip->flags & DW_EDMA_CHIP_LOCAL))
+			if (!(chan->dw->chip->flags & DW_EDMA_CHIP_LOCAL) &&
+			    chan->irq_mode != DW_EDMA_CH_IRQ_LOCAL)
 				control |= DW_EDMA_V0_RIE;
 		}
 
@@ -408,12 +413,17 @@ static void dw_edma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
 				break;
 			}
 		}
-		/* Interrupt unmask - done, abort */
+		/* Interrupt mask/unmask - done, abort */
 		raw_spin_lock_irqsave(&dw->lock, flags);
 
 		tmp = GET_RW_32(dw, chan->dir, int_mask);
-		tmp &= ~FIELD_PREP(EDMA_V0_DONE_INT_MASK, BIT(chan->id));
-		tmp &= ~FIELD_PREP(EDMA_V0_ABORT_INT_MASK, BIT(chan->id));
+		if (chan->irq_mode == DW_EDMA_CH_IRQ_REMOTE) {
+			tmp |= FIELD_PREP(EDMA_V0_DONE_INT_MASK, BIT(chan->id));
+			tmp |= FIELD_PREP(EDMA_V0_ABORT_INT_MASK, BIT(chan->id));
+		} else {
+			tmp &= ~FIELD_PREP(EDMA_V0_DONE_INT_MASK, BIT(chan->id));
+			tmp &= ~FIELD_PREP(EDMA_V0_ABORT_INT_MASK, BIT(chan->id));
+		}
 		SET_RW_32(dw, chan->dir, int_mask, tmp);
 		/* Linked list error */
 		tmp = GET_RW_32(dw, chan->dir, linked_list_err_en);
diff --git a/include/linux/dma/edma.h b/include/linux/dma/edma.h
index ffad10ff2cd6..6f50165ac084 100644
--- a/include/linux/dma/edma.h
+++ b/include/linux/dma/edma.h
@@ -60,6 +60,23 @@ enum dw_edma_chip_flags {
 	DW_EDMA_CHIP_LOCAL	= BIT(0),
 };
 
+/*
+ * enum dw_edma_ch_irq_mode - per-channel interrupt routing control
+ * @DW_EDMA_CH_IRQ_DEFAULT:   LIE=1/RIE=1, local interrupt unmasked
+ * @DW_EDMA_CH_IRQ_LOCAL:     LIE=1/RIE=0
+ * @DW_EDMA_CH_IRQ_REMOTE:    LIE=1/RIE=1, local interrupt masked
+ *
+ * Some implementations require using LIE=1/RIE=1 with the local interrupt
+ * masked to generate a remote-only interrupt (rather than LIE=0/RIE=1).
+ * See the DesignWare endpoint databook 5.40, "Hint" below "Figure 8-22
+ * Write Interrupt Generation".
+ */
+enum dw_edma_ch_irq_mode {
+	DW_EDMA_CH_IRQ_DEFAULT	= 0,
+	DW_EDMA_CH_IRQ_LOCAL,
+	DW_EDMA_CH_IRQ_REMOTE,
+};
+
 /**
  * struct dw_edma_chip - representation of DesignWare eDMA controller hardware
  * @dev:		 struct device of the eDMA controller
@@ -105,6 +122,22 @@ struct dw_edma_chip {
 #if IS_REACHABLE(CONFIG_DW_EDMA)
 int dw_edma_probe(struct dw_edma_chip *chip);
 int dw_edma_remove(struct dw_edma_chip *chip);
+/**
+ * dw_edma_chan_irq_config - configure per-channel interrupt routing
+ * @chan: DMA channel obtained from dma_request_channel()
+ * @mode: interrupt routing mode
+ *
+ * Returns 0 on success, -EINVAL for invalid @mode, or -ENODEV if @chan does
+ * not belong to the DesignWare eDMA driver.
+ */
+int dw_edma_chan_irq_config(struct dma_chan *chan,
+			    enum dw_edma_ch_irq_mode mode);
+
+/**
+ * dw_edma_chan_ignore_irq - tell whether local IRQ handling should be ignored
+ * @chan: DMA channel obtained from dma_request_channel()
+ */
+bool dw_edma_chan_ignore_irq(struct dma_chan *chan);
 #else
 static inline int dw_edma_probe(struct dw_edma_chip *chip)
 {
@@ -115,6 +148,17 @@ static inline int dw_edma_remove(struct dw_edma_chip *chip)
 {
 	return 0;
 }
+
+static inline int dw_edma_chan_irq_config(struct dma_chan *chan,
+					  enum dw_edma_ch_irq_mode mode)
+{
+	return -ENODEV;
+}
+
+static inline bool dw_edma_chan_ignore_irq(struct dma_chan *chan)
+{
+	return false;
+}
 #endif /* CONFIG_DW_EDMA */
 
 struct pci_epc;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v4 03/38] dmaengine: dw-edma: Poll completion when local IRQ handling is disabled
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 01/38] dmaengine: dw-edma: Export helper to get integrated register window Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 02/38] dmaengine: dw-edma: Add per-channel interrupt routing control Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 04/38] dmaengine: dw-edma: Add notify-only channels support Koichiro Den
                   ` (35 subsequent siblings)
  38 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

When a channel is configured to suppress host-side interrupts (RIE=0),
the host-side driver cannot rely on IRQ-driven progress. Add an optional
polling path for such channels; polling is enabled only for channels
where dw_edma_chan_ignore_irq() returns true.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/dma/dw-edma/dw-edma-core.c | 98 ++++++++++++++++++++++++------
 drivers/dma/dw-edma/dw-edma-core.h |  4 ++
 2 files changed, 85 insertions(+), 17 deletions(-)

diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
index 059b3996d383..696b9f3ea378 100644
--- a/drivers/dma/dw-edma/dw-edma-core.c
+++ b/drivers/dma/dw-edma/dw-edma-core.c
@@ -308,23 +308,6 @@ static int dw_edma_device_terminate_all(struct dma_chan *dchan)
 	return err;
 }
 
-static void dw_edma_device_issue_pending(struct dma_chan *dchan)
-{
-	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
-	unsigned long flags;
-
-	if (!chan->configured)
-		return;
-
-	spin_lock_irqsave(&chan->vc.lock, flags);
-	if (vchan_issue_pending(&chan->vc) && chan->request == EDMA_REQ_NONE &&
-	    chan->status == EDMA_ST_IDLE) {
-		chan->status = EDMA_ST_BUSY;
-		dw_edma_start_transfer(chan);
-	}
-	spin_unlock_irqrestore(&chan->vc.lock, flags);
-}
-
 static enum dma_status
 dw_edma_device_tx_status(struct dma_chan *dchan, dma_cookie_t cookie,
 			 struct dma_tx_state *txstate)
@@ -710,6 +693,69 @@ static irqreturn_t dw_edma_interrupt_common(int irq, void *data)
 	return ret;
 }
 
+static void dw_edma_done_arm(struct dw_edma_chan *chan)
+{
+	if (!dw_edma_chan_ignore_irq(&chan->vc.chan))
+		/* no need to arm since it's not to be ignored */
+		return;
+
+	queue_delayed_work(system_wq, &chan->poll_work, 1);
+}
+
+static void dw_edma_chan_poll_done(struct dma_chan *dchan)
+{
+	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
+	enum dma_status st;
+
+	if (!dw_edma_chan_ignore_irq(dchan))
+		/* no need to poll since it's not to be ignored */
+		return;
+
+	guard(spinlock_irqsave)(&chan->poll_lock);
+
+	if (chan->status != EDMA_ST_BUSY)
+		return;
+
+	st = dw_edma_core_ch_status(chan);
+
+	switch (st) {
+	case DMA_COMPLETE:
+		dw_edma_done_interrupt(chan);
+		if (chan->status == EDMA_ST_BUSY)
+			dw_edma_done_arm(chan);
+		break;
+	case DMA_IN_PROGRESS:
+		dw_edma_done_arm(chan);
+		break;
+	case DMA_ERROR:
+		dw_edma_abort_interrupt(chan);
+		break;
+	default:
+		break;
+	}
+}
+
+static void dw_edma_device_issue_pending(struct dma_chan *dchan)
+{
+	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
+	unsigned long flags;
+
+	if (!chan->configured)
+		return;
+
+	dw_edma_chan_poll_done(dchan);
+
+	spin_lock_irqsave(&chan->vc.lock, flags);
+	if (vchan_issue_pending(&chan->vc) && chan->request == EDMA_REQ_NONE &&
+	    chan->status == EDMA_ST_IDLE) {
+		chan->status = EDMA_ST_BUSY;
+		dw_edma_start_transfer(chan);
+	} else {
+		dw_edma_done_arm(chan);
+	}
+	spin_unlock_irqrestore(&chan->vc.lock, flags);
+}
+
 static int dw_edma_alloc_chan_resources(struct dma_chan *dchan)
 {
 	struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
@@ -1063,6 +1109,19 @@ int dw_edma_remove(struct dw_edma_chip *chip)
 }
 EXPORT_SYMBOL_GPL(dw_edma_remove);
 
+static void dw_edma_poll_work(struct work_struct *work)
+{
+	struct delayed_work *dwork = to_delayed_work(work);
+	struct dw_edma_chan *chan =
+		container_of(dwork, struct dw_edma_chan, poll_work);
+	struct dma_chan *dchan = &chan->vc.chan;
+
+	if (!chan->configured)
+		return;
+
+	dw_edma_chan_poll_done(dchan);
+}
+
 int dw_edma_chan_irq_config(struct dma_chan *dchan,
 			    enum dw_edma_ch_irq_mode mode)
 {
@@ -1090,6 +1149,11 @@ int dw_edma_chan_irq_config(struct dma_chan *dchan,
 		 str_write_read(chan->dir == EDMA_DIR_WRITE),
 		 chan->id, mode);
 
+	if (dw_edma_chan_ignore_irq(&chan->vc.chan)) {
+		spin_lock_init(&chan->poll_lock);
+		INIT_DELAYED_WORK(&chan->poll_work, dw_edma_poll_work);
+	}
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(dw_edma_chan_irq_config);
diff --git a/drivers/dma/dw-edma/dw-edma-core.h b/drivers/dma/dw-edma/dw-edma-core.h
index 8458d676551a..11fe4532f0bf 100644
--- a/drivers/dma/dw-edma/dw-edma-core.h
+++ b/drivers/dma/dw-edma/dw-edma-core.h
@@ -11,6 +11,7 @@
 
 #include <linux/msi.h>
 #include <linux/dma/edma.h>
+#include <linux/workqueue.h>
 
 #include "../virt-dma.h"
 
@@ -83,6 +84,9 @@ struct dw_edma_chan {
 
 	enum dw_edma_ch_irq_mode	irq_mode;
 
+	struct delayed_work		poll_work;
+	spinlock_t			poll_lock;
+
 	enum dw_edma_request		request;
 	enum dw_edma_status		status;
 	u8				configured;
-- 
2.51.0



* [RFC PATCH v4 04/38] dmaengine: dw-edma: Add notify-only channels support
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (2 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 03/38] dmaengine: dw-edma: Poll completion when local IRQ handling is disabled Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 05/38] dmaengine: dw-edma: Add a helper to query linked-list region Koichiro Den
                   ` (34 subsequent siblings)
  38 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

Remote eDMA users may want to prepare descriptors on the remote side while
the local side only needs completion notifications (no cookie-based
accounting).

Provide a lightweight per-channel notification callback infrastructure.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/dma/dw-edma/dw-edma-core.c | 31 ++++++++++++++++++++++++++++++
 drivers/dma/dw-edma/dw-edma-core.h |  4 ++++
 include/linux/dma/edma.h           | 23 ++++++++++++++++++++++
 3 files changed, 58 insertions(+)

diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
index 696b9f3ea378..0eb8fc1dcc34 100644
--- a/drivers/dma/dw-edma/dw-edma-core.c
+++ b/drivers/dma/dw-edma/dw-edma-core.c
@@ -611,6 +611,13 @@ static void dw_edma_done_interrupt(struct dw_edma_chan *chan)
 	struct virt_dma_desc *vd;
 	unsigned long flags;
 
+	if (chan->notify_only) {
+		if (chan->notify_cb)
+			chan->notify_cb(&chan->vc.chan, chan->notify_cb_param);
+		/* no cookie on this side, just return */
+		return;
+	}
+
 	spin_lock_irqsave(&chan->vc.lock, flags);
 	vd = vchan_next_desc(&chan->vc);
 	if (vd) {
@@ -815,6 +822,9 @@ static int dw_edma_channel_setup(struct dw_edma *dw, u32 wr_alloc, u32 rd_alloc)
 		chan->request = EDMA_REQ_NONE;
 		chan->status = EDMA_ST_IDLE;
 		chan->irq_mode = DW_EDMA_CH_IRQ_DEFAULT;
+		chan->notify_cb = NULL;
+		chan->notify_cb_param = NULL;
+		chan->notify_only = false;
 
 		if (chan->dir == EDMA_DIR_WRITE)
 			chan->ll_max = (chip->ll_region_wr[chan->id].sz / EDMA_LL_SZ);
@@ -1178,6 +1188,27 @@ bool dw_edma_chan_ignore_irq(struct dma_chan *dchan)
 }
 EXPORT_SYMBOL_GPL(dw_edma_chan_ignore_irq);
 
+int dw_edma_chan_register_notify(struct dma_chan *dchan,
+				 void (*cb)(struct dma_chan *chan, void *user),
+				 void *user)
+{
+	struct dw_edma_chan *chan;
+
+	if (!dchan || !dchan->device)
+		return -ENODEV;
+
+	chan = dchan2dw_edma_chan(dchan);
+	if (!chan)
+		return -ENODEV;
+
+	chan->notify_cb = cb;
+	chan->notify_cb_param = user;
+	chan->notify_only = !!cb;
+
+	return dw_edma_chan_irq_config(dchan, DW_EDMA_CH_IRQ_LOCAL);
+}
+EXPORT_SYMBOL_GPL(dw_edma_chan_register_notify);
+
 MODULE_LICENSE("GPL v2");
 MODULE_DESCRIPTION("Synopsys DesignWare eDMA controller core driver");
 MODULE_AUTHOR("Gustavo Pimentel <gustavo.pimentel@synopsys.com>");
diff --git a/drivers/dma/dw-edma/dw-edma-core.h b/drivers/dma/dw-edma/dw-edma-core.h
index 11fe4532f0bf..f652d2e38843 100644
--- a/drivers/dma/dw-edma/dw-edma-core.h
+++ b/drivers/dma/dw-edma/dw-edma-core.h
@@ -84,6 +84,10 @@ struct dw_edma_chan {
 
 	enum dw_edma_ch_irq_mode	irq_mode;
 
+	void (*notify_cb)(struct dma_chan *chan, void *user);
+	void *notify_cb_param;
+	bool notify_only;
+
 	struct delayed_work		poll_work;
 	spinlock_t			poll_lock;
 
diff --git a/include/linux/dma/edma.h b/include/linux/dma/edma.h
index 6f50165ac084..3c538246de07 100644
--- a/include/linux/dma/edma.h
+++ b/include/linux/dma/edma.h
@@ -138,6 +138,21 @@ int dw_edma_chan_irq_config(struct dma_chan *chan,
  * @chan: DMA channel obtained from dma_request_channel()
  */
 bool dw_edma_chan_ignore_irq(struct dma_chan *chan);
+
+/**
+ * dw_edma_chan_register_notify - register local completion callback for a
+ *                                notification-only channel
+ * @chan: DMA channel obtained from dma_request_channel()
+ * @cb:   callback invoked in hardirq context when LIE interrupt is raised
+ * @user: opaque pointer passed back to @cb
+ *
+ * Intended for channels where descriptors are prepared on the remote side and
+ * the local side only wants completion notifications. This forces LOCAL mode
+ * so that the local side receives LIE interrupts.
+ */
+int dw_edma_chan_register_notify(struct dma_chan *chan,
+				 void (*cb)(struct dma_chan *chan, void *user),
+				 void *user);
 #else
 static inline int dw_edma_probe(struct dw_edma_chip *chip)
 {
@@ -159,6 +174,14 @@ static inline bool dw_edma_chan_ignore_irq(struct dma_chan *chan)
 {
 	return false;
 }
+
+static inline int dw_edma_chan_register_notify(struct dma_chan *chan,
+					       void (*cb)(struct dma_chan *chan,
+							  void *user),
+					       void *user)
+{
+	return -ENODEV;
+}
 #endif /* CONFIG_DW_EDMA */
 
 struct pci_epc;
-- 
2.51.0



* [RFC PATCH v4 05/38] dmaengine: dw-edma: Add a helper to query linked-list region
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (3 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 04/38] dmaengine: dw-edma: Add notify-only channels support Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-18 17:05   ` Frank Li
  2026-01-18 13:54 ` [RFC PATCH v4 06/38] NTB: epf: Add mwN_offset support and config region versioning Koichiro Den
                   ` (33 subsequent siblings)
  38 siblings, 1 reply; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

A remote eDMA provider may need to expose the linked-list (LL) memory
region that was configured by platform glue (typically at boot), so the
peer (host) can map it and operate the remote view of the controller.

Export dw_edma_chan_get_ll_region() to return the LL region associated
with a given dma_chan.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/dma/dw-edma/dw-edma-core.c | 26 ++++++++++++++++++++++++++
 include/linux/dma/edma.h           | 14 ++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
index 0eb8fc1dcc34..c4fb66a9b5f5 100644
--- a/drivers/dma/dw-edma/dw-edma-core.c
+++ b/drivers/dma/dw-edma/dw-edma-core.c
@@ -1209,6 +1209,32 @@ int dw_edma_chan_register_notify(struct dma_chan *dchan,
 }
 EXPORT_SYMBOL_GPL(dw_edma_chan_register_notify);
 
+int dw_edma_chan_get_ll_region(struct dma_chan *dchan,
+			       struct dw_edma_region *region)
+{
+	struct dw_edma_chip *chip;
+	struct dw_edma_chan *chan;
+
+	if (!dchan || !region || !dchan->device)
+		return -ENODEV;
+
+	chan = dchan2dw_edma_chan(dchan);
+	if (!chan)
+		return -ENODEV;
+
+	chip = chan->dw->chip;
+	if (!(chip->flags & DW_EDMA_CHIP_LOCAL))
+		return -EINVAL;
+
+	if (chan->dir == EDMA_DIR_WRITE)
+		*region = chip->ll_region_wr[chan->id];
+	else
+		*region = chip->ll_region_rd[chan->id];
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(dw_edma_chan_get_ll_region);
+
 MODULE_LICENSE("GPL v2");
 MODULE_DESCRIPTION("Synopsys DesignWare eDMA controller core driver");
 MODULE_AUTHOR("Gustavo Pimentel <gustavo.pimentel@synopsys.com>");
diff --git a/include/linux/dma/edma.h b/include/linux/dma/edma.h
index 3c538246de07..c9ec426e27ec 100644
--- a/include/linux/dma/edma.h
+++ b/include/linux/dma/edma.h
@@ -153,6 +153,14 @@ bool dw_edma_chan_ignore_irq(struct dma_chan *chan);
 int dw_edma_chan_register_notify(struct dma_chan *chan,
 				 void (*cb)(struct dma_chan *chan, void *user),
 				 void *user);
+
+/**
+ * dw_edma_chan_get_ll_region - get linked list (LL) memory for a dma_chan
+ * @chan: the target DMA channel
+ * @region: output parameter returning the corresponding LL region
+ */
+int dw_edma_chan_get_ll_region(struct dma_chan *chan,
+			       struct dw_edma_region *region);
 #else
 static inline int dw_edma_probe(struct dw_edma_chip *chip)
 {
@@ -182,6 +190,12 @@ static inline int dw_edma_chan_register_notify(struct dma_chan *chan,
 {
 	return -ENODEV;
 }
+
+static inline int dw_edma_chan_get_ll_region(struct dma_chan *chan,
+					     struct dw_edma_region *region)
+{
+	return -EINVAL;
+}
 #endif /* CONFIG_DW_EDMA */
 
 struct pci_epc;
-- 
2.51.0



* [RFC PATCH v4 06/38] NTB: epf: Add mwN_offset support and config region versioning
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (4 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 05/38] dmaengine: dw-edma: Add a helper to query linked-list region Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 07/38] NTB: epf: Reserve a subset of MSI vectors for non-NTB users Koichiro Den
                   ` (32 subsequent siblings)
  38 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

Introduce new mwN_offset configfs attributes to specify memory window
offsets. This enables mapping multiple windows into a single BAR at
arbitrary offsets, improving layout flexibility.

Extend the control register region with a 32-bit config version field,
reusing the hitherto-unused NTB_EPF_TOPOLOGY offset (0x0C). The endpoint
function driver writes 1 (NTB_EPF_CTRL_VERSION_V1), and ntb_hw_epf reads
it at probe time, refusing to bind to unknown versions.

Compatibility matrix:

            | EP v1 | EP legacy
  ----------+-------+----------
  RC v1     | v1    | legacy
  RC legacy | ? (*) | legacy

(*) An unpatched (legacy) RC may misinterpret the paired EP's intention
    and program MW layout incorrectly when offsets are used.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/ntb/hw/epf/ntb_hw_epf.c               |  60 +++++++-
 drivers/pci/endpoint/functions/pci-epf-vntb.c | 136 ++++++++++++++++--
 2 files changed, 176 insertions(+), 20 deletions(-)

diff --git a/drivers/ntb/hw/epf/ntb_hw_epf.c b/drivers/ntb/hw/epf/ntb_hw_epf.c
index 9935da48a52e..4b3fa996219a 100644
--- a/drivers/ntb/hw/epf/ntb_hw_epf.c
+++ b/drivers/ntb/hw/epf/ntb_hw_epf.c
@@ -30,18 +30,27 @@
 #define NTB_EPF_LINK_STATUS	0x0A
 #define LINK_STATUS_UP		BIT(0)
 
-#define NTB_EPF_TOPOLOGY	0x0C
+/*
+ * Legacy unused NTB_EPF_TOPOLOGY (0x0c) is repurposed as a control version
+ * field
+ */
+#define NTB_EPF_CTRL_VERSION	0x0C
 #define NTB_EPF_LOWER_ADDR	0x10
 #define NTB_EPF_UPPER_ADDR	0x14
 #define NTB_EPF_LOWER_SIZE	0x18
 #define NTB_EPF_UPPER_SIZE	0x1C
 #define NTB_EPF_MW_COUNT	0x20
-#define NTB_EPF_MW1_OFFSET	0x24
+#define NTB_EPF_MW1_OFFSET	0x24 /* used by legacy (version=0) only */
 #define NTB_EPF_SPAD_OFFSET	0x28
 #define NTB_EPF_SPAD_COUNT	0x2C
 #define NTB_EPF_DB_ENTRY_SIZE	0x30
 #define NTB_EPF_DB_DATA(n)	(0x34 + (n) * 4)
 #define NTB_EPF_DB_OFFSET(n)	(0xB4 + (n) * 4)
+#define NTB_EPF_MW_OFFSET(n)	(0x134 + (n) * 4)
+#define NTB_EPF_MW_SIZE(n)	(0x144 + (n) * 4)
+
+#define NTB_EPF_CTRL_VERSION_LEGACY	0
+#define NTB_EPF_CTRL_VERSION_V1		1
 
 #define NTB_EPF_MIN_DB_COUNT	3
 #define NTB_EPF_MAX_DB_COUNT	31
@@ -451,11 +460,22 @@ static int ntb_epf_peer_mw_get_addr(struct ntb_dev *ntb, int idx,
 				    phys_addr_t *base, resource_size_t *size)
 {
 	struct ntb_epf_dev *ndev = ntb_ndev(ntb);
-	u32 offset = 0;
+	resource_size_t bar_sz;
+	u32 offset, sz, ver;
 	int bar;
 
-	if (idx == 0)
-		offset = readl(ndev->ctrl_reg + NTB_EPF_MW1_OFFSET);
+	ver = readl(ndev->ctrl_reg + NTB_EPF_CTRL_VERSION);
+	if (ver == NTB_EPF_CTRL_VERSION_LEGACY) {
+		/* Legacy layout: only MW1 offset exists, and there is no MW_SIZE[] */
+		if (idx == 0)
+			offset = readl(ndev->ctrl_reg + NTB_EPF_MW1_OFFSET);
+		else
+			offset = 0;
+		sz = 0;
+	} else {
+		offset = readl(ndev->ctrl_reg + NTB_EPF_MW_OFFSET(idx));
+		sz = readl(ndev->ctrl_reg + NTB_EPF_MW_SIZE(idx));
+	}
 
 	bar = ntb_epf_mw_to_bar(ndev, idx);
 	if (bar < 0)
@@ -464,8 +484,11 @@ static int ntb_epf_peer_mw_get_addr(struct ntb_dev *ntb, int idx,
 	if (base)
 		*base = pci_resource_start(ndev->ntb.pdev, bar) + offset;
 
-	if (size)
-		*size = pci_resource_len(ndev->ntb.pdev, bar) - offset;
+	if (size) {
+		bar_sz = pci_resource_len(ndev->ntb.pdev, bar);
+		*size = sz ? min_t(resource_size_t, sz, bar_sz - offset)
+			   : (bar_sz > offset ? bar_sz - offset : 0);
+	}
 
 	return 0;
 }
@@ -547,6 +570,25 @@ static inline void ntb_epf_init_struct(struct ntb_epf_dev *ndev,
 	ndev->ntb.ops = &ntb_epf_ops;
 }
 
+static int ntb_epf_check_version(struct ntb_epf_dev *ndev)
+{
+	struct device *dev = ndev->dev;
+	u32 ver;
+
+	ver = readl(ndev->ctrl_reg + NTB_EPF_CTRL_VERSION);
+
+	switch (ver) {
+	case NTB_EPF_CTRL_VERSION_LEGACY:
+	case NTB_EPF_CTRL_VERSION_V1:
+		break;
+	default:
+		dev_err(dev, "Unsupported NTB EPF version %u\n", ver);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int ntb_epf_init_dev(struct ntb_epf_dev *ndev)
 {
 	struct device *dev = ndev->dev;
@@ -696,6 +738,10 @@ static int ntb_epf_pci_probe(struct pci_dev *pdev,
 		return ret;
 	}
 
+	ret = ntb_epf_check_version(ndev);
+	if (ret)
+		return ret;
+
 	ret = ntb_epf_init_dev(ndev);
 	if (ret) {
 		dev_err(dev, "Failed to init device\n");
diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
index c08d349db350..4927faa28255 100644
--- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
+++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
@@ -39,6 +39,7 @@
 #include <linux/atomic.h>
 #include <linux/delay.h>
 #include <linux/io.h>
+#include <linux/log2.h>
 #include <linux/module.h>
 #include <linux/slab.h>
 
@@ -61,6 +62,7 @@ static struct workqueue_struct *kpcintb_workqueue;
 
 #define LINK_STATUS_UP			BIT(0)
 
+#define CTRL_VERSION			1
 #define SPAD_COUNT			64
 #define DB_COUNT			4
 #define NTB_MW_OFFSET			2
@@ -107,7 +109,7 @@ struct epf_ntb_ctrl {
 	u32 argument;
 	u16 command_status;
 	u16 link_status;
-	u32 topology;
+	u32 version;
 	u64 addr;
 	u64 size;
 	u32 num_mws;
@@ -117,6 +119,8 @@ struct epf_ntb_ctrl {
 	u32 db_entry_size;
 	u32 db_data[MAX_DB_COUNT];
 	u32 db_offset[MAX_DB_COUNT];
+	u32 mw_offset[MAX_MW];
+	u32 mw_size[MAX_MW];
 } __packed;
 
 struct epf_ntb {
@@ -128,6 +132,7 @@ struct epf_ntb {
 	u32 db_count;
 	u32 spad_count;
 	u64 mws_size[MAX_MW];
+	u64 mws_offset[MAX_MW];
 	atomic64_t db;
 	u32 vbus_number;
 	u16 vntb_pid;
@@ -460,10 +465,13 @@ static int epf_ntb_config_spad_bar_alloc(struct epf_ntb *ntb)
 	ntb->reg = base;
 
 	ctrl = ntb->reg;
+	ctrl->version = CTRL_VERSION;
 	ctrl->spad_offset = ctrl_size;
 
 	ctrl->spad_count = spad_count;
 	ctrl->num_mws = ntb->num_mws;
+	memset(ctrl->mw_offset, 0, sizeof(ctrl->mw_offset));
+	memset(ctrl->mw_size, 0, sizeof(ctrl->mw_size));
 	ntb->spad_size = spad_size;
 
 	ctrl->db_entry_size = sizeof(u32);
@@ -695,15 +703,31 @@ static void epf_ntb_db_bar_clear(struct epf_ntb *ntb)
  */
 static int epf_ntb_mw_bar_init(struct epf_ntb *ntb)
 {
+	struct device *dev = &ntb->epf->dev;
+	u64 bar_ends[BAR_5 + 1] = { 0 };
+	unsigned long bars_used = 0;
+	enum pci_barno barno;
+	u64 off, size, end;
 	int ret = 0;
 	int i;
-	u64 size;
-	enum pci_barno barno;
-	struct device *dev = &ntb->epf->dev;
 
 	for (i = 0; i < ntb->num_mws; i++) {
-		size = ntb->mws_size[i];
 		barno = ntb->epf_ntb_bar[BAR_MW1 + i];
+		off = ntb->mws_offset[i];
+		size = ntb->mws_size[i];
+		end = off + size;
+		if (end > bar_ends[barno])
+			bar_ends[barno] = end;
+		bars_used |= BIT(barno);
+	}
+
+	for (barno = BAR_0; barno <= BAR_5; barno++) {
+		if (!(bars_used & BIT(barno)))
+			continue;
+		if (bar_ends[barno] < SZ_4K)
+			size = SZ_4K;
+		else
+			size = roundup_pow_of_two(bar_ends[barno]);
 
 		ntb->epf->bar[barno].barno = barno;
 		ntb->epf->bar[barno].size = size;
@@ -719,8 +743,12 @@ static int epf_ntb_mw_bar_init(struct epf_ntb *ntb)
 				      &ntb->epf->bar[barno]);
 		if (ret) {
 			dev_err(dev, "MW set failed\n");
-			goto err_alloc_mem;
+			goto err_set_bar;
 		}
+	}
+
+	for (i = 0; i < ntb->num_mws; i++) {
+		size = ntb->mws_size[i];
 
 		/* Allocate EPC outbound memory windows to vpci vntb device */
 		ntb->vpci_mw_addr[i] = pci_epc_mem_alloc_addr(ntb->epf->epc,
@@ -729,19 +757,31 @@ static int epf_ntb_mw_bar_init(struct epf_ntb *ntb)
 		if (!ntb->vpci_mw_addr[i]) {
 			ret = -ENOMEM;
 			dev_err(dev, "Failed to allocate source address\n");
-			goto err_set_bar;
+			goto err_alloc_mem;
 		}
 	}
 
+	for (i = 0; i < ntb->num_mws; i++) {
+		ntb->reg->mw_offset[i] = (u32)ntb->mws_offset[i];
+		ntb->reg->mw_size[i] = (u32)ntb->mws_size[i];
+	}
+
 	return ret;
 
-err_set_bar:
-	pci_epc_clear_bar(ntb->epf->epc,
-			  ntb->epf->func_no,
-			  ntb->epf->vfunc_no,
-			  &ntb->epf->bar[barno]);
 err_alloc_mem:
-	epf_ntb_mw_bar_clear(ntb, i);
+	while (--i >= 0)
+		pci_epc_mem_free_addr(ntb->epf->epc,
+				      ntb->vpci_mw_phy[i],
+				      ntb->vpci_mw_addr[i],
+				      ntb->mws_size[i]);
+err_set_bar:
+	while (--barno >= BAR_0)
+		if (bars_used & BIT(barno))
+			pci_epc_clear_bar(ntb->epf->epc,
+					  ntb->epf->func_no,
+					  ntb->epf->vfunc_no,
+					  &ntb->epf->bar[barno]);
+
 	return ret;
 }
 
@@ -1034,6 +1074,60 @@ static ssize_t epf_ntb_##_name##_store(struct config_item *item,	\
 	return len;							\
 }
 
+#define EPF_NTB_MW_OFF_R(_name)						\
+static ssize_t epf_ntb_##_name##_show(struct config_item *item,		\
+				      char *page)			\
+{									\
+	struct config_group *group = to_config_group(item);		\
+	struct epf_ntb *ntb = to_epf_ntb(group);			\
+	struct device *dev = &ntb->epf->dev;				\
+	int win_no, idx;						\
+									\
+	if (sscanf(#_name, "mw%d_offset", &win_no) != 1)		\
+		return -EINVAL;						\
+									\
+	idx = win_no - 1;						\
+	if (idx < 0 || idx >= ntb->num_mws) {				\
+		dev_err(dev, "MW%d out of range (num_mws=%d)\n",	\
+			win_no, ntb->num_mws);				\
+		return -EINVAL;						\
+	}								\
+									\
+	idx = array_index_nospec(idx, ntb->num_mws);			\
+	return sprintf(page, "%llu\n", ntb->mws_offset[idx]);		\
+}
+
+#define EPF_NTB_MW_OFF_W(_name)						\
+static ssize_t epf_ntb_##_name##_store(struct config_item *item,	\
+				       const char *page, size_t len)	\
+{									\
+	struct config_group *group = to_config_group(item);		\
+	struct epf_ntb *ntb = to_epf_ntb(group);			\
+	struct device *dev = &ntb->epf->dev;				\
+	int win_no, idx;						\
+	u64 val;							\
+	int ret;							\
+									\
+	ret = kstrtou64(page, 0, &val);					\
+	if (ret)							\
+		return ret;						\
+									\
+	if (sscanf(#_name, "mw%d_offset", &win_no) != 1)		\
+		return -EINVAL;						\
+									\
+	idx = win_no - 1;						\
+	if (idx < 0 || idx >= ntb->num_mws) {				\
+		dev_err(dev, "MW%d out of range (num_mws=%d)\n",	\
+			win_no, ntb->num_mws);				\
+		return -EINVAL;						\
+	}								\
+									\
+	idx = array_index_nospec(idx, ntb->num_mws);			\
+	ntb->mws_offset[idx] = val;					\
+									\
+	return len;							\
+}
+
 #define EPF_NTB_BAR_R(_name, _id)					\
 	static ssize_t epf_ntb_##_name##_show(struct config_item *item,	\
 					      char *page)		\
@@ -1104,6 +1198,14 @@ EPF_NTB_MW_R(mw3)
 EPF_NTB_MW_W(mw3)
 EPF_NTB_MW_R(mw4)
 EPF_NTB_MW_W(mw4)
+EPF_NTB_MW_OFF_R(mw1_offset)
+EPF_NTB_MW_OFF_W(mw1_offset)
+EPF_NTB_MW_OFF_R(mw2_offset)
+EPF_NTB_MW_OFF_W(mw2_offset)
+EPF_NTB_MW_OFF_R(mw3_offset)
+EPF_NTB_MW_OFF_W(mw3_offset)
+EPF_NTB_MW_OFF_R(mw4_offset)
+EPF_NTB_MW_OFF_W(mw4_offset)
 EPF_NTB_BAR_R(ctrl_bar, BAR_CONFIG)
 EPF_NTB_BAR_W(ctrl_bar, BAR_CONFIG)
 EPF_NTB_BAR_R(db_bar, BAR_DB)
@@ -1124,6 +1226,10 @@ CONFIGFS_ATTR(epf_ntb_, mw1);
 CONFIGFS_ATTR(epf_ntb_, mw2);
 CONFIGFS_ATTR(epf_ntb_, mw3);
 CONFIGFS_ATTR(epf_ntb_, mw4);
+CONFIGFS_ATTR(epf_ntb_, mw1_offset);
+CONFIGFS_ATTR(epf_ntb_, mw2_offset);
+CONFIGFS_ATTR(epf_ntb_, mw3_offset);
+CONFIGFS_ATTR(epf_ntb_, mw4_offset);
 CONFIGFS_ATTR(epf_ntb_, vbus_number);
 CONFIGFS_ATTR(epf_ntb_, vntb_pid);
 CONFIGFS_ATTR(epf_ntb_, vntb_vid);
@@ -1142,6 +1248,10 @@ static struct configfs_attribute *epf_ntb_attrs[] = {
 	&epf_ntb_attr_mw2,
 	&epf_ntb_attr_mw3,
 	&epf_ntb_attr_mw4,
+	&epf_ntb_attr_mw1_offset,
+	&epf_ntb_attr_mw2_offset,
+	&epf_ntb_attr_mw3_offset,
+	&epf_ntb_attr_mw4_offset,
 	&epf_ntb_attr_vbus_number,
 	&epf_ntb_attr_vntb_pid,
 	&epf_ntb_attr_vntb_vid,
-- 
2.51.0



* [RFC PATCH v4 07/38] NTB: epf: Reserve a subset of MSI vectors for non-NTB users
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (5 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 06/38] NTB: epf: Add mwN_offset support and config region versioning Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 08/38] NTB: epf: Provide db_vector_count/db_vector_mask callbacks Koichiro Den
                   ` (31 subsequent siblings)
  38 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

The ntb_hw_epf driver currently uses all MSI/MSI-X vectors allocated for
the endpoint as doorbell interrupts. On SoCs that also run other
functions on the same PCIe controller (e.g. DesignWare eDMA), we need to
reserve some vectors for those other consumers.

Introduce NTB_EPF_IRQ_RESERVE and track the total number of allocated
vectors in ntb_epf_dev's 'num_irqs' field. Use only (num_irqs -
NTB_EPF_IRQ_RESERVE) vectors for NTB doorbells and free all num_irqs
vectors in the teardown path, so that the remaining vectors can be used
by other endpoint functions such as the integrated DesignWare eDMA.

This makes it possible to share the PCIe controller MSI space between
NTB and other on-chip IP blocks.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/ntb/hw/epf/ntb_hw_epf.c | 36 ++++++++++++++++++++++-----------
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/drivers/ntb/hw/epf/ntb_hw_epf.c b/drivers/ntb/hw/epf/ntb_hw_epf.c
index 4b3fa996219a..dbb5bebe63a5 100644
--- a/drivers/ntb/hw/epf/ntb_hw_epf.c
+++ b/drivers/ntb/hw/epf/ntb_hw_epf.c
@@ -54,6 +54,7 @@
 
 #define NTB_EPF_MIN_DB_COUNT	3
 #define NTB_EPF_MAX_DB_COUNT	31
+#define NTB_EPF_IRQ_RESERVE	8
 
 #define NTB_EPF_COMMAND_TIMEOUT	1000 /* 1 Sec */
 
@@ -92,6 +93,9 @@ struct ntb_epf_dev {
 	unsigned int spad_count;
 	unsigned int db_count;
 
+	unsigned int num_irqs;
+	unsigned int num_ntb_irqs;
+
 	void __iomem *ctrl_reg;
 	void __iomem *db_reg;
 	void __iomem *peer_spad_reg;
@@ -345,7 +349,7 @@ static int ntb_epf_init_isr(struct ntb_epf_dev *ndev, int msi_min, int msi_max)
 	u32 argument = MSIX_ENABLE;
 	int irq;
 	int ret;
-	int i;
+	int i = 0;
 
 	irq = pci_alloc_irq_vectors(pdev, msi_min, msi_max, PCI_IRQ_MSIX);
 	if (irq < 0) {
@@ -359,33 +363,40 @@ static int ntb_epf_init_isr(struct ntb_epf_dev *ndev, int msi_min, int msi_max)
 		argument &= ~MSIX_ENABLE;
 	}
 
+	ndev->num_irqs = irq;
+	irq -= NTB_EPF_IRQ_RESERVE;
+	if (irq <= 0) {
+		dev_err(dev, "Not enough irqs allocated\n");
+		ret = -ENOSPC;
+		goto err_out;
+	}
+	ndev->num_ntb_irqs = irq;
+
 	for (i = 0; i < irq; i++) {
 		ret = request_irq(pci_irq_vector(pdev, i), ntb_epf_vec_isr,
 				  0, "ntb_epf", ndev);
 		if (ret) {
 			dev_err(dev, "Failed to request irq\n");
-			goto err_request_irq;
+			goto err_out;
 		}
 	}
 
-	ndev->db_count = irq - 1;
+	ndev->db_count = irq;
 
 	ret = ntb_epf_send_command(ndev, CMD_CONFIGURE_DOORBELL,
 				   argument | irq);
 	if (ret) {
 		dev_err(dev, "Failed to configure doorbell\n");
-		goto err_configure_db;
+		goto err_out;
 	}
 
 	return 0;
 
-err_configure_db:
-	for (i = 0; i < ndev->db_count + 1; i++)
+err_out:
+	while (i-- > 0)
 		free_irq(pci_irq_vector(pdev, i), ndev);
 
-err_request_irq:
 	pci_free_irq_vectors(pdev);
-
 	return ret;
 }
 
@@ -502,7 +513,7 @@ static int ntb_epf_peer_db_set(struct ntb_dev *ntb, u64 db_bits)
 	u32 db_offset;
 	u32 db_data;
 
-	if (interrupt_num > ndev->db_count) {
+	if (interrupt_num >= ndev->db_count) {
 		dev_err(dev, "DB interrupt %d greater than Max Supported %d\n",
 			interrupt_num, ndev->db_count);
 		return -EINVAL;
@@ -512,6 +523,7 @@ static int ntb_epf_peer_db_set(struct ntb_dev *ntb, u64 db_bits)
 
 	db_data = readl(ndev->ctrl_reg + NTB_EPF_DB_DATA(interrupt_num));
 	db_offset = readl(ndev->ctrl_reg + NTB_EPF_DB_OFFSET(interrupt_num));
+
 	writel(db_data, ndev->db_reg + (db_entry_size * interrupt_num) +
 	       db_offset);
 
@@ -595,8 +607,8 @@ static int ntb_epf_init_dev(struct ntb_epf_dev *ndev)
 	int ret;
 
 	/* One Link interrupt and rest doorbell interrupt */
-	ret = ntb_epf_init_isr(ndev, NTB_EPF_MIN_DB_COUNT + 1,
-			       NTB_EPF_MAX_DB_COUNT + 1);
+	ret = ntb_epf_init_isr(ndev, NTB_EPF_MIN_DB_COUNT + NTB_EPF_IRQ_RESERVE,
+			       NTB_EPF_MAX_DB_COUNT + NTB_EPF_IRQ_RESERVE);
 	if (ret) {
 		dev_err(dev, "Failed to init ISR\n");
 		return ret;
@@ -704,7 +716,7 @@ static void ntb_epf_cleanup_isr(struct ntb_epf_dev *ndev)
 
 	ntb_epf_send_command(ndev, CMD_TEARDOWN_DOORBELL, ndev->db_count + 1);
 
-	for (i = 0; i < ndev->db_count + 1; i++)
+	for (i = 0; i < ndev->num_ntb_irqs; i++)
 		free_irq(pci_irq_vector(pdev, i), ndev);
 	pci_free_irq_vectors(pdev);
 }
-- 
2.51.0



* [RFC PATCH v4 08/38] NTB: epf: Provide db_vector_count/db_vector_mask callbacks
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (6 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 07/38] NTB: epf: Reserve a subset of MSI vectors for non-NTB users Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-19 20:03   ` Frank Li
  2026-01-18 13:54 ` [RFC PATCH v4 09/38] NTB: core: Add mw_set_trans_ranges() for subrange programming Koichiro Den
                   ` (30 subsequent siblings)
  38 siblings, 1 reply; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)

Provide db_vector_count() and db_vector_mask() implementations for both
ntb_hw_epf and pci-epf-vntb so that ntb_transport can map MSI vectors to
doorbell bits. Without them, the upper layer cannot identify which
doorbell vector fired and ends up scheduling rxc_db_work() for every
queue pair (QP), causing a thundering-herd effect when multiple QPs are
enabled.

With this change, .peer_db_set() must honor the db_bits mask and raise
all requested doorbell interrupts, so update those implementations
accordingly.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/ntb/hw/epf/ntb_hw_epf.c               | 47 ++++++++++++-------
 drivers/pci/endpoint/functions/pci-epf-vntb.c | 41 +++++++++++++---
 2 files changed, 64 insertions(+), 24 deletions(-)

diff --git a/drivers/ntb/hw/epf/ntb_hw_epf.c b/drivers/ntb/hw/epf/ntb_hw_epf.c
index dbb5bebe63a5..c37ede4063dc 100644
--- a/drivers/ntb/hw/epf/ntb_hw_epf.c
+++ b/drivers/ntb/hw/epf/ntb_hw_epf.c
@@ -381,7 +381,7 @@ static int ntb_epf_init_isr(struct ntb_epf_dev *ndev, int msi_min, int msi_max)
 		}
 	}
 
-	ndev->db_count = irq;
+	ndev->db_count = irq - 1;
 
 	ret = ntb_epf_send_command(ndev, CMD_CONFIGURE_DOORBELL,
 				   argument | irq);
@@ -415,6 +415,22 @@ static u64 ntb_epf_db_valid_mask(struct ntb_dev *ntb)
 	return ntb_ndev(ntb)->db_valid_mask;
 }
 
+static int ntb_epf_db_vector_count(struct ntb_dev *ntb)
+{
+	return ntb_ndev(ntb)->db_count;
+}
+
+static u64 ntb_epf_db_vector_mask(struct ntb_dev *ntb, int db_vector)
+{
+	struct ntb_epf_dev *ndev = ntb_ndev(ntb);
+
+	db_vector--; /* vector 0 is reserved for link events */
+	if (db_vector < 0 || db_vector >= ndev->db_count)
+		return 0;
+
+	return ndev->db_valid_mask & BIT_ULL(db_vector);
+}
+
 static int ntb_epf_db_set_mask(struct ntb_dev *ntb, u64 db_bits)
 {
 	return 0;
@@ -507,26 +523,21 @@ static int ntb_epf_peer_mw_get_addr(struct ntb_dev *ntb, int idx,
 static int ntb_epf_peer_db_set(struct ntb_dev *ntb, u64 db_bits)
 {
 	struct ntb_epf_dev *ndev = ntb_ndev(ntb);
-	u32 interrupt_num = ffs(db_bits) + 1;
-	struct device *dev = ndev->dev;
+	u32 interrupt_num;
 	u32 db_entry_size;
 	u32 db_offset;
 	u32 db_data;
-
-	if (interrupt_num >= ndev->db_count) {
-		dev_err(dev, "DB interrupt %d greater than Max Supported %d\n",
-			interrupt_num, ndev->db_count);
-		return -EINVAL;
-	}
+	unsigned long i;
 
 	db_entry_size = readl(ndev->ctrl_reg + NTB_EPF_DB_ENTRY_SIZE);
 
-	db_data = readl(ndev->ctrl_reg + NTB_EPF_DB_DATA(interrupt_num));
-	db_offset = readl(ndev->ctrl_reg + NTB_EPF_DB_OFFSET(interrupt_num));
-
-	writel(db_data, ndev->db_reg + (db_entry_size * interrupt_num) +
-	       db_offset);
-
+	for_each_set_bit(i, (unsigned long *)&db_bits, ndev->db_count) {
+		interrupt_num = i + 1;
+		db_data = readl(ndev->ctrl_reg + NTB_EPF_DB_DATA(interrupt_num));
+		db_offset = readl(ndev->ctrl_reg + NTB_EPF_DB_OFFSET(interrupt_num));
+		writel(db_data, ndev->db_reg + (db_entry_size * interrupt_num) +
+		       db_offset);
+	}
 	return 0;
 }
 
@@ -556,6 +567,8 @@ static const struct ntb_dev_ops ntb_epf_ops = {
 	.spad_count		= ntb_epf_spad_count,
 	.peer_mw_count		= ntb_epf_peer_mw_count,
 	.db_valid_mask		= ntb_epf_db_valid_mask,
+	.db_vector_count	= ntb_epf_db_vector_count,
+	.db_vector_mask		= ntb_epf_db_vector_mask,
 	.db_set_mask		= ntb_epf_db_set_mask,
 	.mw_set_trans		= ntb_epf_mw_set_trans,
 	.mw_clear_trans		= ntb_epf_mw_clear_trans,
@@ -607,8 +620,8 @@ static int ntb_epf_init_dev(struct ntb_epf_dev *ndev)
 	int ret;
 
 	/* One Link interrupt and rest doorbell interrupt */
-	ret = ntb_epf_init_isr(ndev, NTB_EPF_MIN_DB_COUNT + NTB_EPF_IRQ_RESERVE,
-			       NTB_EPF_MAX_DB_COUNT + NTB_EPF_IRQ_RESERVE);
+	ret = ntb_epf_init_isr(ndev, NTB_EPF_MIN_DB_COUNT + 1 + NTB_EPF_IRQ_RESERVE,
+			       NTB_EPF_MAX_DB_COUNT + 1 + NTB_EPF_IRQ_RESERVE);
 	if (ret) {
 		dev_err(dev, "Failed to init ISR\n");
 		return ret;
diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
index 4927faa28255..39e784e21236 100644
--- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
+++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
@@ -1384,6 +1384,22 @@ static u64 vntb_epf_db_valid_mask(struct ntb_dev *ntb)
 	return BIT_ULL(ntb_ndev(ntb)->db_count) - 1;
 }
 
+static int vntb_epf_db_vector_count(struct ntb_dev *ntb)
+{
+	return ntb_ndev(ntb)->db_count;
+}
+
+static u64 vntb_epf_db_vector_mask(struct ntb_dev *ntb, int db_vector)
+{
+	struct epf_ntb *ndev = ntb_ndev(ntb);
+
+	db_vector--; /* vector 0 is reserved for link events */
+	if (db_vector < 0 || db_vector >= ndev->db_count)
+		return 0;
+
+	return BIT_ULL(db_vector);
+}
+
 static int vntb_epf_db_set_mask(struct ntb_dev *ntb, u64 db_bits)
 {
 	return 0;
@@ -1487,20 +1503,29 @@ static int vntb_epf_peer_spad_write(struct ntb_dev *ndev, int pidx, int idx, u32
 
 static int vntb_epf_peer_db_set(struct ntb_dev *ndev, u64 db_bits)
 {
-	u32 interrupt_num = ffs(db_bits) + 1;
 	struct epf_ntb *ntb = ntb_ndev(ndev);
 	u8 func_no, vfunc_no;
-	int ret;
+	u64 failed = 0;
+	unsigned long i;
 
 	func_no = ntb->epf->func_no;
 	vfunc_no = ntb->epf->vfunc_no;
 
-	ret = pci_epc_raise_irq(ntb->epf->epc, func_no, vfunc_no,
-				PCI_IRQ_MSI, interrupt_num + 1);
-	if (ret)
-		dev_err(&ntb->ntb->dev, "Failed to raise IRQ\n");
+	for_each_set_bit(i, (unsigned long *)&db_bits, ntb->db_count) {
+		/*
+		 * DB bit i is MSI interrupt (i + 2).
+		 * Vector 0 is used for link events and MSI vectors are
+		 * 1-based for pci_epc_raise_irq().
+		 */
+		if (pci_epc_raise_irq(ntb->epf->epc, func_no, vfunc_no,
+				      PCI_IRQ_MSI, i + 2))
+			failed |= BIT_ULL(i);
+	}
+	if (failed)
+		dev_err(&ntb->ntb->dev, "Failed to raise IRQ (%#llx)\n",
+			failed);
 
-	return ret;
+	return failed ? -EIO : 0;
 }
 
 static u64 vntb_epf_db_read(struct ntb_dev *ndev)
@@ -1561,6 +1586,8 @@ static const struct ntb_dev_ops vntb_epf_ops = {
 	.spad_count		= vntb_epf_spad_count,
 	.peer_mw_count		= vntb_epf_peer_mw_count,
 	.db_valid_mask		= vntb_epf_db_valid_mask,
+	.db_vector_count	= vntb_epf_db_vector_count,
+	.db_vector_mask		= vntb_epf_db_vector_mask,
 	.db_set_mask		= vntb_epf_db_set_mask,
 	.mw_set_trans		= vntb_epf_mw_set_trans,
 	.mw_clear_trans		= vntb_epf_mw_clear_trans,
-- 
2.51.0



* [RFC PATCH v4 09/38] NTB: core: Add mw_set_trans_ranges() for subrange programming
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (7 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 08/38] NTB: epf: Provide db_vector_count/db_vector_mask callbacks Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-19 20:07   ` Frank Li
  2026-01-18 13:54 ` [RFC PATCH v4 10/38] NTB: core: Add .get_private_data() to ntb_dev_ops Koichiro Den
                   ` (29 subsequent siblings)
  38 siblings, 1 reply; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)

At the BAR level, multiple MWs may be packed into a single BAR. In
addition, a single MW may itself be subdivided into multiple address
subranges, each of which can be mapped independently by the underlying
NTB hardware.

Introduce an optional ntb_dev_ops callback, .mw_set_trans_ranges(), to
describe and program such layouts explicitly. The accompanying helper
lets an NTB client provide, for each MW, a list of contiguous subranges
that together cover the MW address space.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 include/linux/ntb.h | 46 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/include/linux/ntb.h b/include/linux/ntb.h
index 8ff9d663096b..84908753f446 100644
--- a/include/linux/ntb.h
+++ b/include/linux/ntb.h
@@ -206,6 +206,11 @@ static inline int ntb_ctx_ops_is_valid(const struct ntb_ctx_ops *ops)
 		1;
 }
 
+struct ntb_mw_subrange {
+	dma_addr_t	addr;
+	resource_size_t	size;
+};
+
 /**
  * struct ntb_dev_ops - ntb device operations
  * @port_number:	See ntb_port_number().
@@ -218,6 +223,7 @@ static inline int ntb_ctx_ops_is_valid(const struct ntb_ctx_ops *ops)
  * @mw_count:		See ntb_mw_count().
  * @mw_get_align:	See ntb_mw_get_align().
  * @mw_set_trans:	See ntb_mw_set_trans().
+ * @mw_set_trans_ranges:See ntb_mw_set_trans_ranges().
  * @mw_clear_trans:	See ntb_mw_clear_trans().
  * @peer_mw_count:	See ntb_peer_mw_count().
  * @peer_mw_get_addr:	See ntb_peer_mw_get_addr().
@@ -276,6 +282,9 @@ struct ntb_dev_ops {
 			    resource_size_t *size_max);
 	int (*mw_set_trans)(struct ntb_dev *ntb, int pidx, int widx,
 			    dma_addr_t addr, resource_size_t size);
+	int (*mw_set_trans_ranges)(struct ntb_dev *ntb, int pidx, int widx,
+				   unsigned int num_ranges,
+				   const struct ntb_mw_subrange *ranges);
 	int (*mw_clear_trans)(struct ntb_dev *ntb, int pidx, int widx);
 	int (*peer_mw_count)(struct ntb_dev *ntb);
 	int (*peer_mw_get_addr)(struct ntb_dev *ntb, int widx,
@@ -350,6 +359,7 @@ static inline int ntb_dev_ops_is_valid(const struct ntb_dev_ops *ops)
 		ops->mw_get_align			&&
 		(ops->mw_set_trans			||
 		 ops->peer_mw_set_trans)		&&
+		/* ops->mw_set_trans_ranges		&& */
 		/* ops->mw_clear_trans			&& */
 		ops->peer_mw_count			&&
 		ops->peer_mw_get_addr			&&
@@ -860,6 +870,42 @@ static inline int ntb_mw_set_trans(struct ntb_dev *ntb, int pidx, int widx,
 	return ntb->ops->mw_set_trans(ntb, pidx, widx, addr, size);
 }
 
+/**
+ * ntb_mw_set_trans_ranges() - set the translations of an inbound memory
+ *                             window, composed of multiple subranges.
+ * @ntb:	NTB device context.
+ * @pidx:	Port index of peer device.
+ * @widx:	Memory window index.
+ * @num_ranges:	The number of ranges described by @ranges array.
+ * @ranges:	Array of subranges. The subranges are interpreted in ascending
+ *		window offset order (i.e. ranges[0] maps the first part of the MW,
+ *		ranges[1] the next part, ...).
+ *
+ * Return: Zero on success, otherwise an error number. If the driver does
+ *         not implement the callback, return -EOPNOTSUPP.
+ */
+static inline int ntb_mw_set_trans_ranges(struct ntb_dev *ntb, int pidx, int widx,
+					  unsigned int num_ranges,
+					  const struct ntb_mw_subrange *ranges)
+{
+	if (!num_ranges || !ranges)
+		return -EINVAL;
+
+	if (ntb->ops->mw_set_trans_ranges)
+		return ntb->ops->mw_set_trans_ranges(ntb, pidx, widx,
+						     num_ranges, ranges);
+
+	/*
+	 * Fallback for drivers that only support the legacy single-range
+	 * translation API.
+	 */
+	if (num_ranges == 1)
+		return ntb_mw_set_trans(ntb, pidx, widx,
+					ranges[0].addr, ranges[0].size);
+
+	return -EOPNOTSUPP;
+}
+
 /**
  * ntb_mw_clear_trans() - clear the translation address of an inbound memory
  *                        window
-- 
2.51.0



* [RFC PATCH v4 10/38] NTB: core: Add .get_private_data() to ntb_dev_ops
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (8 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 09/38] NTB: core: Add mw_set_trans_ranges() for subrange programming Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 11/38] NTB: core: Add .get_dma_dev() " Koichiro Den
                   ` (28 subsequent siblings)
  38 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)

Add an optional get_private_data() callback to retrieve private data
specific to the underlying hardware driver, e.g. the pci_epc device
associated with the NTB implementation.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 include/linux/ntb.h | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/include/linux/ntb.h b/include/linux/ntb.h
index 84908753f446..aa888219732a 100644
--- a/include/linux/ntb.h
+++ b/include/linux/ntb.h
@@ -262,6 +262,7 @@ struct ntb_mw_subrange {
  * @msg_clear_mask:	See ntb_msg_clear_mask().
  * @msg_read:		See ntb_msg_read().
  * @peer_msg_write:	See ntb_peer_msg_write().
+ * @get_private_data:	See ntb_get_private_data().
  */
 struct ntb_dev_ops {
 	int (*port_number)(struct ntb_dev *ntb);
@@ -338,6 +339,7 @@ struct ntb_dev_ops {
 	int (*msg_clear_mask)(struct ntb_dev *ntb, u64 mask_bits);
 	u32 (*msg_read)(struct ntb_dev *ntb, int *pidx, int midx);
 	int (*peer_msg_write)(struct ntb_dev *ntb, int pidx, int midx, u32 msg);
+	void *(*get_private_data)(struct ntb_dev *ntb);
 };
 
 static inline int ntb_dev_ops_is_valid(const struct ntb_dev_ops *ops)
@@ -401,6 +403,9 @@ static inline int ntb_dev_ops_is_valid(const struct ntb_dev_ops *ops)
 		/* !ops->msg_clear_mask == !ops->msg_count	&& */
 		!ops->msg_read == !ops->msg_count		&&
 		!ops->peer_msg_write == !ops->msg_count		&&
+
+		/* Miscellaneous optional callbacks */
+		/* ops->get_private_data			&& */
 		1;
 }
 
@@ -1609,6 +1614,21 @@ static inline int ntb_peer_msg_write(struct ntb_dev *ntb, int pidx, int midx,
 	return ntb->ops->peer_msg_write(ntb, pidx, midx, msg);
 }
 
+/**
+ * ntb_get_private_data() - get private data specific to the hardware driver
+ * @ntb:	NTB device context.
+ *
+ * Retrieve private data specific to the hardware driver.
+ *
+ * Return: Pointer to the private data if available, or %NULL otherwise.
+ */
+static inline void *ntb_get_private_data(struct ntb_dev *ntb)
+{
+	if (!ntb->ops->get_private_data)
+		return NULL;
+	return ntb->ops->get_private_data(ntb);
+}
+
 /**
  * ntb_peer_resource_idx() - get a resource index for a given peer idx
  * @ntb:	NTB device context.
-- 
2.51.0



* [RFC PATCH v4 11/38] NTB: core: Add .get_dma_dev() to ntb_dev_ops
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (9 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 10/38] NTB: core: Add .get_private_data() to ntb_dev_ops Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-19 20:09   ` Frank Li
  2026-01-18 13:54 ` [RFC PATCH v4 12/38] NTB: core: Add driver_override support for NTB devices Koichiro Den
                   ` (27 subsequent siblings)
  38 siblings, 1 reply; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)

Not all NTB implementations can naturally perform DMA mapping through
the NTB PCI device itself (e.g. due to IOMMU topology or non-PCI backing
devices).

Add an optional .get_dma_dev() callback and helper so clients can use
the appropriate struct device for DMA API allocations and mappings.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 include/linux/ntb.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/include/linux/ntb.h b/include/linux/ntb.h
index aa888219732a..7ac8cb13e90d 100644
--- a/include/linux/ntb.h
+++ b/include/linux/ntb.h
@@ -262,6 +262,7 @@ struct ntb_mw_subrange {
  * @msg_clear_mask:	See ntb_msg_clear_mask().
  * @msg_read:		See ntb_msg_read().
  * @peer_msg_write:	See ntb_peer_msg_write().
+ * @get_dma_dev:	See ntb_get_dma_dev().
  * @get_private_data:	See ntb_get_private_data().
  */
 struct ntb_dev_ops {
@@ -339,6 +340,7 @@ struct ntb_dev_ops {
 	int (*msg_clear_mask)(struct ntb_dev *ntb, u64 mask_bits);
 	u32 (*msg_read)(struct ntb_dev *ntb, int *pidx, int midx);
 	int (*peer_msg_write)(struct ntb_dev *ntb, int pidx, int midx, u32 msg);
+	struct device *(*get_dma_dev)(struct ntb_dev *ntb);
 	void *(*get_private_data)(struct ntb_dev *ntb);
 };
 
@@ -405,6 +407,7 @@ static inline int ntb_dev_ops_is_valid(const struct ntb_dev_ops *ops)
 		!ops->peer_msg_write == !ops->msg_count		&&
 
 		/* Miscellaneous optional callbacks */
+		/* ops->get_dma_dev				&& */
 		/* ops->get_private_data			&& */
 		1;
 }
@@ -1614,6 +1617,21 @@ static inline int ntb_peer_msg_write(struct ntb_dev *ntb, int pidx, int midx,
 	return ntb->ops->peer_msg_write(ntb, pidx, midx, msg);
 }
 
+/**
+ * ntb_get_dma_dev() - get the device suitable for DMA mapping
+ * @ntb:	NTB device context.
+ *
+ * Retrieve a struct device which is suitable for DMA mapping.
+ *
+ * Return: Pointer to struct device.
+ */
+static inline struct device __maybe_unused *ntb_get_dma_dev(struct ntb_dev *ntb)
+{
+	if (!ntb->ops->get_dma_dev)
+		return ntb->dev.parent;
+	return ntb->ops->get_dma_dev(ntb);
+}
+
 /**
  * ntb_get_private_data() - get private data specific to the hardware driver
  * @ntb:	NTB device context.
-- 
2.51.0



* [RFC PATCH v4 12/38] NTB: core: Add driver_override support for NTB devices
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (10 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 11/38] NTB: core: Add .get_dma_dev() " Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 13/38] PCI: endpoint: pci-epf-vntb: Support BAR subrange mappings for MWs Koichiro Den
                   ` (26 subsequent siblings)
  38 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)

When multiple NTB client drivers are available, selecting which driver
should bind to a given NTB device becomes necessary. The NTB bus match
logic currently has no way to force a specific driver.

Add a standard driver_override sysfs attribute to NTB devices and honor
it in the bus match callback. Also export ntb_bus_reprobe() so newly
loaded drivers can trigger probing of currently unbound NTB devices.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/ntb/core.c  | 68 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/ntb.h |  4 +++
 2 files changed, 72 insertions(+)

diff --git a/drivers/ntb/core.c b/drivers/ntb/core.c
index ed6f4adc6130..404fa1433fab 100644
--- a/drivers/ntb/core.c
+++ b/drivers/ntb/core.c
@@ -56,6 +56,7 @@
 #include <linux/device.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
+#include <linux/sysfs.h>
 
 #include <linux/ntb.h>
 #include <linux/pci.h>
@@ -298,10 +299,77 @@ static void ntb_dev_release(struct device *dev)
 	complete(&ntb->released);
 }
 
+static int ntb_bus_reprobe_one(struct device *dev, void *data)
+{
+	if (!dev->driver)
+		return device_attach(dev);
+	return 0;
+}
+
+void ntb_bus_reprobe(void)
+{
+	bus_for_each_dev(&ntb_bus, NULL, NULL, ntb_bus_reprobe_one);
+}
+EXPORT_SYMBOL_GPL(ntb_bus_reprobe);
+
+static ssize_t driver_override_show(struct device *dev,
+				    struct device_attribute *attr, char *buf)
+{
+	struct ntb_dev *ntb = dev_ntb(dev);
+	ssize_t len;
+
+	device_lock(dev);
+	len = sysfs_emit(buf, "%s\n", ntb->driver_override);
+	device_unlock(dev);
+
+	return len;
+}
+
+static ssize_t driver_override_store(struct device *dev,
+				     struct device_attribute *attr,
+				     const char *buf, size_t count)
+{
+	struct ntb_dev *ntb = dev_ntb(dev);
+	int ret;
+
+	ret = driver_set_override(dev, &ntb->driver_override, buf, count);
+	if (ret)
+		return ret;
+
+	return count;
+}
+static DEVICE_ATTR_RW(driver_override);
+
+static struct attribute *ntb_attrs[] = {
+	&dev_attr_driver_override.attr,
+	NULL,
+};
+
+static const struct attribute_group ntb_group = {
+	.attrs = ntb_attrs,
+};
+__ATTRIBUTE_GROUPS(ntb);
+
+static int ntb_match(struct device *dev, const struct device_driver *drv)
+{
+	struct ntb_dev *ntb = dev_ntb(dev);
+
+	/*
+	 * If driver_override is set, only allow binding to the named driver.
+	 * Otherwise keep the historical behavior (match all clients).
+	 */
+	if (ntb->driver_override)
+		return sysfs_streq(ntb->driver_override, drv->name);
+
+	return 1;
+}
+
 static const struct bus_type ntb_bus = {
 	.name = "ntb",
+	.match = ntb_match,
 	.probe = ntb_probe,
 	.remove = ntb_remove,
+	.dev_groups = ntb_groups,
 };
 
 static int __init ntb_driver_init(void)
diff --git a/include/linux/ntb.h b/include/linux/ntb.h
index 7ac8cb13e90d..d0115b0bb14b 100644
--- a/include/linux/ntb.h
+++ b/include/linux/ntb.h
@@ -431,6 +431,7 @@ struct ntb_client {
  * @ops:		See &ntb_dev_ops.
  * @ctx:		See &ntb_ctx_ops.
  * @ctx_ops:		See &ntb_ctx_ops.
+ * @driver_override:	Driver name to force a match.
  */
 struct ntb_dev {
 	struct device			dev;
@@ -439,6 +440,7 @@ struct ntb_dev {
 	const struct ntb_dev_ops	*ops;
 	void				*ctx;
 	const struct ntb_ctx_ops	*ctx_ops;
+	const char			*driver_override;
 
 	/* private: */
 
@@ -1770,4 +1772,6 @@ static inline int ntbm_msi_request_irq(struct ntb_dev *ntb,
 					     dev_id, msi_desc);
 }
 
+void ntb_bus_reprobe(void);
+
 #endif
-- 
2.51.0



* [RFC PATCH v4 13/38] PCI: endpoint: pci-epf-vntb: Support BAR subrange mappings for MWs
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (11 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 12/38] NTB: core: Add driver_override support for NTB devices Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-19 20:26   ` Frank Li
  2026-01-18 13:54 ` [RFC PATCH v4 14/38] PCI: endpoint: pci-epf-vntb: Implement .get_private_data() callback Koichiro Den
                   ` (25 subsequent siblings)
  38 siblings, 1 reply; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)

pci-epf-vntb can pack multiple memory windows into a single BAR using
mwN_offset. With the NTB core gaining support for programming multiple
translation ranges for a window, the EPF needs to provide the per-BAR
subrange layout to the endpoint controller (EPC).

Implement .mw_set_trans_ranges() for pci-epf-vntb. Track subranges for
each BAR and pass them to pci_epc_set_bar() so EPC drivers can select an
appropriate inbound mapping mode (e.g. Address Match mode on DesignWare
controllers) when subrange mappings are required.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/pci/endpoint/functions/pci-epf-vntb.c | 183 +++++++++++++++++-
 1 file changed, 175 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
index 39e784e21236..98128c2c5079 100644
--- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
+++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
@@ -42,6 +42,7 @@
 #include <linux/log2.h>
 #include <linux/module.h>
 #include <linux/slab.h>
+#include <linux/sort.h>
 
 #include <linux/pci-ep-msi.h>
 #include <linux/pci-epc.h>
@@ -144,6 +145,10 @@ struct epf_ntb {
 
 	enum pci_barno epf_ntb_bar[VNTB_BAR_NUM];
 
+	/* Cache for subrange mapping */
+	struct ntb_mw_subrange *mw_subrange[MAX_MW];
+	unsigned int num_subrange[MAX_MW];
+
 	struct epf_ntb_ctrl *reg;
 
 	u32 *epf_db;
@@ -736,6 +741,7 @@ static int epf_ntb_mw_bar_init(struct epf_ntb *ntb)
 		ntb->epf->bar[barno].flags |= upper_32_bits(size) ?
 				PCI_BASE_ADDRESS_MEM_TYPE_64 :
 				PCI_BASE_ADDRESS_MEM_TYPE_32;
+		ntb->epf->bar[barno].num_submap = 0;
 
 		ret = pci_epc_set_bar(ntb->epf->epc,
 				      ntb->epf->func_no,
@@ -1405,28 +1411,188 @@ static int vntb_epf_db_set_mask(struct ntb_dev *ntb, u64 db_bits)
 	return 0;
 }
 
-static int vntb_epf_mw_set_trans(struct ntb_dev *ndev, int pidx, int idx,
-		dma_addr_t addr, resource_size_t size)
+struct vntb_mw_order {
+	u64 off;
+	unsigned int mw;
+};
+
+static int vntb_cmp_mw_order(const void *a, const void *b)
+{
+	const struct vntb_mw_order *ma = a;
+	const struct vntb_mw_order *mb = b;
+
+	if (ma->off < mb->off)
+		return -1;
+	if (ma->off > mb->off)
+		return 1;
+	return 0;
+}
+
+static int vntb_epf_mw_set_trans_ranges(struct ntb_dev *ndev, int pidx, int idx,
+					unsigned int num_ranges,
+					const struct ntb_mw_subrange *ranges)
 {
 	struct epf_ntb *ntb = ntb_ndev(ndev);
+	struct pci_epf_bar_submap *submap;
+	struct vntb_mw_order mws[MAX_MW];
 	struct pci_epf_bar *epf_bar;
+	struct ntb_mw_subrange *r;
 	enum pci_barno barno;
+	struct device *dev, *epf_dev;
+	unsigned int total_ranges = 0;
+	unsigned int mw_cnt = 0;
+	unsigned int cur = 0;
+	u64 expected_off = 0;
+	unsigned int i, j;
 	int ret;
+
+	dev = &ntb->ntb->dev;
+	epf_dev = &ntb->epf->dev;
+	barno = ntb->epf_ntb_bar[BAR_MW1 + idx];
+	epf_bar = &ntb->epf->bar[barno];
+	epf_bar->barno = barno;
+
+	r = devm_kmemdup(epf_dev, ranges, num_ranges * sizeof(*ranges), GFP_KERNEL);
+	if (!r)
+		return -ENOMEM;
+
+	if (ntb->mw_subrange[idx])
+		devm_kfree(epf_dev, ntb->mw_subrange[idx]);
+
+	ntb->mw_subrange[idx] = r;
+	ntb->num_subrange[idx] = num_ranges;
+
+	/* Defer pci_epc_set_bar() until all MWs in this BAR have range info. */
+	for (i = 0; i < MAX_MW; i++) {
+		enum pci_barno bar = ntb->epf_ntb_bar[BAR_MW1 + i];
+
+		if (bar != barno)
+			continue;
+		if (!ntb->num_subrange[i])
+			return 0;
+
+		mws[mw_cnt].mw = i;
+		mws[mw_cnt].off = ntb->mws_offset[i];
+		mw_cnt++;
+	}
+
+	sort(mws, mw_cnt, sizeof(mws[0]), vntb_cmp_mw_order, NULL);
+
+	/* BAR submap must cover the whole BAR with no holes. */
+	for (i = 0; i < mw_cnt; i++) {
+		unsigned int mw = mws[i].mw;
+		u64 sum = 0;
+
+		if (mws[i].off != expected_off) {
+			dev_err(dev,
+				"BAR%d: hole/overlap at %#llx (MW%d@%#llx)\n",
+				barno, expected_off, mw + 1, mws[i].off);
+			return -EINVAL;
+		}
+
+		total_ranges += ntb->num_subrange[mw];
+		for (j = 0; j < ntb->num_subrange[mw]; j++)
+			sum += ntb->mw_subrange[mw][j].size;
+
+		if (sum != ntb->mws_size[mw]) {
+			dev_err(dev,
+				"MW%d: ranges size %#llx != window size %#llx\n",
+				mw + 1, sum, ntb->mws_size[mw]);
+			return -EINVAL;
+		}
+		expected_off += ntb->mws_size[mw];
+	}
+
+	submap = devm_krealloc_array(epf_dev, epf_bar->submap, total_ranges,
+				     sizeof(*submap), GFP_KERNEL);
+	if (!submap)
+		return -ENOMEM;
+
+	epf_bar->submap = submap;
+	epf_bar->num_submap = total_ranges;
+	dev_dbg(dev, "Requesting BAR%d layout (#. of subranges is %u):\n",
+		barno, total_ranges);
+
+	for (i = 0; i < mw_cnt; i++) {
+		unsigned int mw = mws[i].mw;
+
+		dev_dbg(dev, "- MW%d\n", 1 + mw);
+		for (j = 0; j < ntb->num_subrange[mw]; j++) {
+			dev_dbg(dev, "  - addr/size = %#llx/%#llx\n",
+				ntb->mw_subrange[mw][j].addr,
+				ntb->mw_subrange[mw][j].size);
+			submap[cur].phys_addr = ntb->mw_subrange[mw][j].addr;
+			submap[cur].size = ntb->mw_subrange[mw][j].size;
+			cur++;
+		}
+	}
+
+	ret = pci_epc_set_bar(ntb->epf->epc, ntb->epf->func_no,
+			      ntb->epf->vfunc_no, epf_bar);
+	if (ret)
+		dev_err(dev, "BAR%d: failed to program mappings for MW%d: %d\n",
+			barno, idx + 1, ret);
+
+	return ret;
+}
+
+static int vntb_epf_mw_set_trans(struct ntb_dev *ndev, int pidx, int idx,
+				 dma_addr_t addr, resource_size_t size)
+{
+	struct epf_ntb *ntb = ntb_ndev(ndev);
+	struct pci_epf_bar *epf_bar;
+	resource_size_t bar_size;
+	enum pci_barno barno;
 	struct device *dev;
+	unsigned int i;
+	int ret;
 
 	dev = &ntb->ntb->dev;
 	barno = ntb->epf_ntb_bar[BAR_MW1 + idx];
 	epf_bar = &ntb->epf->bar[barno];
 	epf_bar->phys_addr = addr;
 	epf_bar->barno = barno;
-	epf_bar->size = size;
 
-	ret = pci_epc_set_bar(ntb->epf->epc, 0, 0, epf_bar);
-	if (ret) {
-		dev_err(dev, "failure set mw trans\n");
-		return ret;
+	bar_size = epf_bar->size;
+	if (!bar_size || !size)
+		return -EINVAL;
+
+	if (size != ntb->mws_size[idx])
+		return -EINVAL;
+
+	/*
+	 * Even if the caller intends to map the entire MW, the MW might
+	 * actually be just a part of the BAR. In that case, redirect the
+	 * handling to vntb_epf_mw_set_trans_ranges().
+	 */
+	if (size < bar_size) {
+		struct ntb_mw_subrange r = {
+			.addr = addr,
+			.size = size,
+		};
+		return vntb_epf_mw_set_trans_ranges(ndev, pidx, idx, 1, &r);
 	}
-	return 0;
+
+	/* Drop any stale cache for the BAR. */
+	for (i = 0; i < MAX_MW; i++) {
+		if (ntb->epf_ntb_bar[BAR_MW1 + i] != barno)
+			continue;
+		devm_kfree(&ntb->epf->dev, ntb->mw_subrange[i]);
+		ntb->mw_subrange[i] = NULL;
+		ntb->num_subrange[i] = 0;
+	}
+
+	/* Subrange mapping is not used; clear any submap left over from the past. */
+	devm_kfree(&ntb->epf->dev, epf_bar->submap);
+	epf_bar->submap = NULL;
+	epf_bar->num_submap = 0;
+
+	ret = pci_epc_set_bar(ntb->epf->epc, ntb->epf->func_no,
+			      ntb->epf->vfunc_no, epf_bar);
+	if (ret)
+		dev_err(dev, "failed to set MW translation\n");
+
+	return ret;
 }
 
 static int vntb_epf_mw_clear_trans(struct ntb_dev *ntb, int pidx, int idx)
@@ -1590,6 +1756,7 @@ static const struct ntb_dev_ops vntb_epf_ops = {
 	.db_vector_mask		= vntb_epf_db_vector_mask,
 	.db_set_mask		= vntb_epf_db_set_mask,
 	.mw_set_trans		= vntb_epf_mw_set_trans,
+	.mw_set_trans_ranges	= vntb_epf_mw_set_trans_ranges,
 	.mw_clear_trans		= vntb_epf_mw_clear_trans,
 	.peer_mw_get_addr	= vntb_epf_peer_mw_get_addr,
 	.link_enable		= vntb_epf_link_enable,
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v4 14/38] PCI: endpoint: pci-epf-vntb: Implement .get_private_data() callback
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (12 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 13/38] PCI: endpoint: pci-epf-vntb: Support BAR subrange mappings for MWs Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-19 20:27   ` Frank Li
  2026-01-18 13:54 ` [RFC PATCH v4 15/38] PCI: endpoint: pci-epf-vntb: Implement .get_dma_dev() Koichiro Den
                   ` (24 subsequent siblings)
  38 siblings, 1 reply; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

Implement the new get_private_data() operation for the EPF vNTB driver
to expose its associated EPC device to the NTB core and its clients.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/pci/endpoint/functions/pci-epf-vntb.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
index 98128c2c5079..9fbc27000f77 100644
--- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
+++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
@@ -1747,6 +1747,15 @@ static int vntb_epf_link_disable(struct ntb_dev *ntb)
 	return 0;
 }
 
+static void *vntb_epf_get_private_data(struct ntb_dev *ndev)
+{
+	struct epf_ntb *ntb = ntb_ndev(ndev);
+
+	if (!ntb || !ntb->epf)
+		return NULL;
+	return ntb->epf->epc;
+}
+
 static const struct ntb_dev_ops vntb_epf_ops = {
 	.mw_count		= vntb_epf_mw_count,
 	.spad_count		= vntb_epf_spad_count,
@@ -1771,6 +1780,7 @@ static const struct ntb_dev_ops vntb_epf_ops = {
 	.db_clear_mask		= vntb_epf_db_clear_mask,
 	.db_clear		= vntb_epf_db_clear,
 	.link_disable		= vntb_epf_link_disable,
+	.get_private_data	= vntb_epf_get_private_data,
 };
 
 static int pci_vntb_probe(struct pci_dev *pdev, const struct pci_device_id *id)
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v4 15/38] PCI: endpoint: pci-epf-vntb: Implement .get_dma_dev()
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (13 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 14/38] PCI: endpoint: pci-epf-vntb: Implement .get_private_data() callback Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-19 20:30   ` Frank Li
  2026-01-18 13:54 ` [RFC PATCH v4 16/38] NTB: ntb_transport: Move TX memory window setup into setup_qp_mw() Koichiro Den
                   ` (23 subsequent siblings)
  38 siblings, 1 reply; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

For DMA API allocations and mappings, pci-epf-vntb should provide an
appropriate struct device for the NTB core/clients.

Implement .get_dma_dev() and return the EPC parent device.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/pci/endpoint/functions/pci-epf-vntb.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
index 9fbc27000f77..7cd976757d15 100644
--- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
+++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
@@ -1747,6 +1747,15 @@ static int vntb_epf_link_disable(struct ntb_dev *ntb)
 	return 0;
 }
 
+static struct device *vntb_epf_get_dma_dev(struct ntb_dev *ndev)
+{
+	struct epf_ntb *ntb = ntb_ndev(ndev);
+
+	if (!ntb || !ntb->epf)
+		return NULL;
+	return ntb->epf->epc->dev.parent;
+}
+
 static void *vntb_epf_get_private_data(struct ntb_dev *ndev)
 {
 	struct epf_ntb *ntb = ntb_ndev(ndev);
@@ -1780,6 +1789,7 @@ static const struct ntb_dev_ops vntb_epf_ops = {
 	.db_clear_mask		= vntb_epf_db_clear_mask,
 	.db_clear		= vntb_epf_db_clear,
 	.link_disable		= vntb_epf_link_disable,
+	.get_dma_dev		= vntb_epf_get_dma_dev,
 	.get_private_data	= vntb_epf_get_private_data,
 };
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v4 16/38] NTB: ntb_transport: Move TX memory window setup into setup_qp_mw()
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (14 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 15/38] PCI: endpoint: pci-epf-vntb: Implement .get_dma_dev() Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-19 20:36   ` Frank Li
  2026-01-18 13:54 ` [RFC PATCH v4 17/38] NTB: ntb_transport: Dynamically determine qp count Koichiro Den
                   ` (22 subsequent siblings)
  38 siblings, 1 reply; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

Historically, both TX and RX have assumed the same per-QP MW slice
(tx_max_entry == remote rx_max_entry), even though the two are
calculated separately in different places (before and after the link-up
negotiation point). This has been safe because nt->link_is_up is never
set to true unless the pre-determined qp_count values match on both
sides, and qp_count is typically limited to nt->mw_count, which must be
carefully configured by the admin.

However, setup_qp_mw can already split an MW and properly host multiple
QPs in one MW, so qp_count need not be limited by nt->mw_count. Once
that limitation is relaxed, the pre-determined qp_count can differ
between the host and endpoint sides, and link-up negotiation can easily
fail.

Move the TX MW configuration (per-QP offset and size) into
ntb_transport_setup_qp_mw() so that both RX and TX layout decisions are
centralized in a single helper. ntb_transport_init_queue() now deals
only with per-QP software state, not with MW layout.

This preserves the previous behavior, while preparing to relax the
qp_count limitation and improving readability.

No functional change is intended.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/ntb/ntb_transport.c | 76 ++++++++++++++++---------------------
 1 file changed, 32 insertions(+), 44 deletions(-)

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index d5a544bf8fd6..57a21f2daac6 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -569,7 +569,10 @@ static int ntb_transport_setup_qp_mw(struct ntb_transport_ctx *nt,
 	struct ntb_transport_mw *mw;
 	struct ntb_dev *ndev = nt->ndev;
 	struct ntb_queue_entry *entry;
-	unsigned int rx_size, num_qps_mw;
+	phys_addr_t mw_base;
+	resource_size_t mw_size;
+	unsigned int rx_size, tx_size, num_qps_mw;
+	u64 qp_offset;
 	unsigned int mw_num, mw_count, qp_count;
 	unsigned int i;
 	int node;
@@ -588,13 +591,38 @@ static int ntb_transport_setup_qp_mw(struct ntb_transport_ctx *nt,
 	else
 		num_qps_mw = qp_count / mw_count;
 
-	rx_size = (unsigned int)mw->xlat_size / num_qps_mw;
-	qp->rx_buff = mw->virt_addr + rx_size * (qp_num / mw_count);
-	rx_size -= sizeof(struct ntb_rx_info);
+	mw_base = nt->mw_vec[mw_num].phys_addr;
+	mw_size = nt->mw_vec[mw_num].phys_size;
+
+	if (mw_size > mw->xlat_size)
+		mw_size = mw->xlat_size;
+	if (max_mw_size && mw_size > max_mw_size)
+		mw_size = max_mw_size;
+
+	tx_size = (unsigned int)mw_size / num_qps_mw;
+	qp_offset = tx_size * (qp_num / mw_count);
+
+	qp->rx_buff = mw->virt_addr + qp_offset;
+
+	qp->tx_mw_size = tx_size;
+	qp->tx_mw = nt->mw_vec[mw_num].vbase + qp_offset;
+	if (!qp->tx_mw)
+		return -EINVAL;
+
+	qp->tx_mw_phys = mw_base + qp_offset;
+	if (!qp->tx_mw_phys)
+		return -EINVAL;
 
+	rx_size = tx_size;
+	rx_size -= sizeof(struct ntb_rx_info);
 	qp->remote_rx_info = qp->rx_buff + rx_size;
 
+	tx_size -= sizeof(struct ntb_rx_info);
+	qp->rx_info = qp->tx_mw + tx_size;
+
 	/* Due to housekeeping, there must be atleast 2 buffs */
+	qp->tx_max_frame = min(transport_mtu, tx_size / 2);
+	qp->tx_max_entry = tx_size / qp->tx_max_frame;
 	qp->rx_max_frame = min(transport_mtu, rx_size / 2);
 	qp->rx_max_entry = rx_size / qp->rx_max_frame;
 	qp->rx_index = 0;
@@ -1132,16 +1160,6 @@ static int ntb_transport_init_queue(struct ntb_transport_ctx *nt,
 				    unsigned int qp_num)
 {
 	struct ntb_transport_qp *qp;
-	phys_addr_t mw_base;
-	resource_size_t mw_size;
-	unsigned int num_qps_mw, tx_size;
-	unsigned int mw_num, mw_count, qp_count;
-	u64 qp_offset;
-
-	mw_count = nt->mw_count;
-	qp_count = nt->qp_count;
-
-	mw_num = QP_TO_MW(nt, qp_num);
 
 	qp = &nt->qp_vec[qp_num];
 	qp->qp_num = qp_num;
@@ -1151,36 +1169,6 @@ static int ntb_transport_init_queue(struct ntb_transport_ctx *nt,
 	qp->event_handler = NULL;
 	ntb_qp_link_context_reset(qp);
 
-	if (mw_num < qp_count % mw_count)
-		num_qps_mw = qp_count / mw_count + 1;
-	else
-		num_qps_mw = qp_count / mw_count;
-
-	mw_base = nt->mw_vec[mw_num].phys_addr;
-	mw_size = nt->mw_vec[mw_num].phys_size;
-
-	if (max_mw_size && mw_size > max_mw_size)
-		mw_size = max_mw_size;
-
-	tx_size = (unsigned int)mw_size / num_qps_mw;
-	qp_offset = tx_size * (qp_num / mw_count);
-
-	qp->tx_mw_size = tx_size;
-	qp->tx_mw = nt->mw_vec[mw_num].vbase + qp_offset;
-	if (!qp->tx_mw)
-		return -EINVAL;
-
-	qp->tx_mw_phys = mw_base + qp_offset;
-	if (!qp->tx_mw_phys)
-		return -EINVAL;
-
-	tx_size -= sizeof(struct ntb_rx_info);
-	qp->rx_info = qp->tx_mw + tx_size;
-
-	/* Due to housekeeping, there must be atleast 2 buffs */
-	qp->tx_max_frame = min(transport_mtu, tx_size / 2);
-	qp->tx_max_entry = tx_size / qp->tx_max_frame;
-
 	if (nt->debugfs_node_dir) {
 		char debugfs_name[8];
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v4 17/38] NTB: ntb_transport: Dynamically determine qp count
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (15 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 16/38] NTB: ntb_transport: Move TX memory window setup into setup_qp_mw() Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 18/38] NTB: ntb_transport: Use ntb_get_dma_dev() Koichiro Den
                   ` (21 subsequent siblings)
  38 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

One MW can host multiple queue pairs, so stop limiting qp_count to the
number of MWs.

Now that both TX and RX MW sizing is done in the same place, the MW
layout is derived from a single code path on both the host and the
endpoint, so the layout cannot diverge between the two sides.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/ntb/ntb_transport.c | 26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index 57a21f2daac6..6ed680d0470f 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -1022,7 +1022,9 @@ static void ntb_transport_link_work(struct work_struct *work)
 		container_of(work, struct ntb_transport_ctx, link_work.work);
 	struct ntb_dev *ndev = nt->ndev;
 	struct pci_dev *pdev = ndev->pdev;
+	struct ntb_transport_qp *qp;
 	resource_size_t size;
+	u64 qp_bitmap_free;
 	u32 val;
 	int rc = 0, i, spad;
 
@@ -1070,8 +1072,28 @@ static void ntb_transport_link_work(struct work_struct *work)
 
 	val = ntb_spad_read(ndev, NUM_QPS);
 	dev_dbg(&pdev->dev, "Remote max number of qps = %d\n", val);
-	if (val != nt->qp_count)
+	if (val == 0) {
 		goto out;
+	} else if (val < nt->qp_count) {
+		/*
+		 * Clamp local qp_count to peer-advertised NUM_QPS to avoid
+		 * mismatched queues.
+		 */
+		qp_bitmap_free = nt->qp_bitmap_free;
+		for (i = val; i < nt->qp_count; i++) {
+			qp = &nt->qp_vec[i];
+			ntb_transport_free_queue(qp);
+			debugfs_remove_recursive(qp->debugfs_dir);
+
+			/* Do not expose the queue any longer */
+			nt->qp_bitmap &= ~BIT_ULL(i);
+			nt->qp_bitmap_free &= ~BIT_ULL(i);
+		}
+		dev_warn(&pdev->dev,
+			 "Local number of qps is reduced: %d->%d (%#llx->%#llx)\n",
+			 nt->qp_count, val, qp_bitmap_free, nt->qp_bitmap_free);
+		nt->qp_count = val;
+	}
 
 	val = ntb_spad_read(ndev, NUM_MWS);
 	dev_dbg(&pdev->dev, "Remote number of mws = %d\n", val);
@@ -1300,8 +1322,6 @@ static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
 
 	if (max_num_clients && max_num_clients < qp_count)
 		qp_count = max_num_clients;
-	else if (nt->mw_count < qp_count)
-		qp_count = nt->mw_count;
 
 	qp_bitmap &= BIT_ULL(qp_count) - 1;
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v4 18/38] NTB: ntb_transport: Use ntb_get_dma_dev()
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (16 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 17/38] NTB: ntb_transport: Dynamically determine qp count Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-19 20:38   ` Frank Li
  2026-01-18 13:54 ` [RFC PATCH v4 19/38] NTB: ntb_transport: Rename ntb_transport.c to ntb_transport_core.c Koichiro Den
                   ` (20 subsequent siblings)
  38 siblings, 1 reply; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

Replace direct use of ntb->pdev with ntb_get_dma_dev() for DMA-safe
allocations and frees. This allows ntb_transport to operate on NTB
implementations that are not backed by a PCI device from an IOMMU
perspective.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/ntb/ntb_transport.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index 6ed680d0470f..7b320249629c 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -771,13 +771,13 @@ static void ntb_transport_msi_desc_changed(void *data)
 static void ntb_free_mw(struct ntb_transport_ctx *nt, int num_mw)
 {
 	struct ntb_transport_mw *mw = &nt->mw_vec[num_mw];
-	struct pci_dev *pdev = nt->ndev->pdev;
+	struct device *dev = ntb_get_dma_dev(nt->ndev);
 
-	if (!mw->virt_addr)
+	if (!dev || !mw->virt_addr)
 		return;
 
 	ntb_mw_clear_trans(nt->ndev, PIDX, num_mw);
-	dma_free_coherent(&pdev->dev, mw->alloc_size,
+	dma_free_coherent(dev, mw->alloc_size,
 			  mw->alloc_addr, mw->dma_addr);
 	mw->xlat_size = 0;
 	mw->buff_size = 0;
@@ -847,13 +847,13 @@ static int ntb_set_mw(struct ntb_transport_ctx *nt, int num_mw,
 		      resource_size_t size)
 {
 	struct ntb_transport_mw *mw = &nt->mw_vec[num_mw];
-	struct pci_dev *pdev = nt->ndev->pdev;
+	struct device *dev = ntb_get_dma_dev(nt->ndev);
 	size_t xlat_size, buff_size;
 	resource_size_t xlat_align;
 	resource_size_t xlat_align_size;
 	int rc;
 
-	if (!size)
+	if (!dev || !size)
 		return -EINVAL;
 
 	rc = ntb_mw_get_align(nt->ndev, PIDX, num_mw, &xlat_align,
@@ -876,12 +876,12 @@ static int ntb_set_mw(struct ntb_transport_ctx *nt, int num_mw,
 	mw->buff_size = buff_size;
 	mw->alloc_size = buff_size;
 
-	rc = ntb_alloc_mw_buffer(mw, &pdev->dev, xlat_align);
+	rc = ntb_alloc_mw_buffer(mw, dev, xlat_align);
 	if (rc) {
 		mw->alloc_size *= 2;
-		rc = ntb_alloc_mw_buffer(mw, &pdev->dev, xlat_align);
+		rc = ntb_alloc_mw_buffer(mw, dev, xlat_align);
 		if (rc) {
-			dev_err(&pdev->dev,
+			dev_err(dev,
 				"Unable to alloc aligned MW buff\n");
 			mw->xlat_size = 0;
 			mw->buff_size = 0;
@@ -894,7 +894,7 @@ static int ntb_set_mw(struct ntb_transport_ctx *nt, int num_mw,
 	rc = ntb_mw_set_trans(nt->ndev, PIDX, num_mw, mw->dma_addr,
 			      mw->xlat_size);
 	if (rc) {
-		dev_err(&pdev->dev, "Unable to set mw%d translation", num_mw);
+		dev_err(dev, "Unable to set mw%d translation", num_mw);
 		ntb_free_mw(nt, num_mw);
 		return -EIO;
 	}
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v4 19/38] NTB: ntb_transport: Rename ntb_transport.c to ntb_transport_core.c
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (17 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 18/38] NTB: ntb_transport: Use ntb_get_dma_dev() Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 20/38] NTB: ntb_transport: Move internal types to ntb_transport_internal.h Koichiro Den
                   ` (19 subsequent siblings)
  38 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

Prepare for splitting the transport code into a reusable core library
and separate client modules. Rename the current implementation file to
ntb_transport_core.c to reflect its role and to keep follow-up diffs
reviewable.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/ntb/Makefile                                  | 2 ++
 drivers/ntb/{ntb_transport.c => ntb_transport_core.c} | 0
 2 files changed, 2 insertions(+)
 rename drivers/ntb/{ntb_transport.c => ntb_transport_core.c} (100%)

diff --git a/drivers/ntb/Makefile b/drivers/ntb/Makefile
index 3a6fa181ff99..9b66e5fafbc0 100644
--- a/drivers/ntb/Makefile
+++ b/drivers/ntb/Makefile
@@ -4,3 +4,5 @@ obj-$(CONFIG_NTB_TRANSPORT) += ntb_transport.o
 
 ntb-y			:= core.o
 ntb-$(CONFIG_NTB_MSI)	+= msi.o
+
+ntb_transport-y		:= ntb_transport_core.o
diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport_core.c
similarity index 100%
rename from drivers/ntb/ntb_transport.c
rename to drivers/ntb/ntb_transport_core.c
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v4 20/38] NTB: ntb_transport: Move internal types to ntb_transport_internal.h
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (18 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 19/38] NTB: ntb_transport: Rename ntb_transport.c to ntb_transport_core.c Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 21/38] NTB: ntb_transport: Export common helpers for modularization Koichiro Den
                   ` (18 subsequent siblings)
  38 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

Move internal structs and definitions from ntb_transport_core.c into a
new internal header so they can be shared by upcoming split-out modules.

No functional change.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/ntb/ntb_transport_core.c     | 144 +-----------------------
 drivers/ntb/ntb_transport_internal.h | 159 +++++++++++++++++++++++++++
 2 files changed, 161 insertions(+), 142 deletions(-)
 create mode 100644 drivers/ntb/ntb_transport_internal.h

diff --git a/drivers/ntb/ntb_transport_core.c b/drivers/ntb/ntb_transport_core.c
index 7b320249629c..71f01fa0ff05 100644
--- a/drivers/ntb/ntb_transport_core.c
+++ b/drivers/ntb/ntb_transport_core.c
@@ -64,6 +64,8 @@
 #include "linux/ntb.h"
 #include "linux/ntb_transport.h"
 
+#include "ntb_transport_internal.h"
+
 #define NTB_TRANSPORT_VERSION	4
 #define NTB_TRANSPORT_VER	"4"
 #define NTB_TRANSPORT_NAME	"ntb_transport"
@@ -106,153 +108,12 @@ static struct dentry *nt_debugfs_dir;
 /* Only two-ports NTB devices are supported */
 #define PIDX		NTB_DEF_PEER_IDX
 
-struct ntb_queue_entry {
-	/* ntb_queue list reference */
-	struct list_head entry;
-	/* pointers to data to be transferred */
-	void *cb_data;
-	void *buf;
-	unsigned int len;
-	unsigned int flags;
-	int retries;
-	int errors;
-	unsigned int tx_index;
-	unsigned int rx_index;
-
-	struct ntb_transport_qp *qp;
-	union {
-		struct ntb_payload_header __iomem *tx_hdr;
-		struct ntb_payload_header *rx_hdr;
-	};
-};
-
-struct ntb_rx_info {
-	unsigned int entry;
-};
-
-struct ntb_transport_qp {
-	struct ntb_transport_ctx *transport;
-	struct ntb_dev *ndev;
-	void *cb_data;
-	struct dma_chan *tx_dma_chan;
-	struct dma_chan *rx_dma_chan;
-
-	bool client_ready;
-	bool link_is_up;
-	bool active;
-
-	u8 qp_num;	/* Only 64 QP's are allowed.  0-63 */
-	u64 qp_bit;
-
-	struct ntb_rx_info __iomem *rx_info;
-	struct ntb_rx_info *remote_rx_info;
-
-	void (*tx_handler)(struct ntb_transport_qp *qp, void *qp_data,
-			   void *data, int len);
-	struct list_head tx_free_q;
-	spinlock_t ntb_tx_free_q_lock;
-	void __iomem *tx_mw;
-	phys_addr_t tx_mw_phys;
-	size_t tx_mw_size;
-	dma_addr_t tx_mw_dma_addr;
-	unsigned int tx_index;
-	unsigned int tx_max_entry;
-	unsigned int tx_max_frame;
-
-	void (*rx_handler)(struct ntb_transport_qp *qp, void *qp_data,
-			   void *data, int len);
-	struct list_head rx_post_q;
-	struct list_head rx_pend_q;
-	struct list_head rx_free_q;
-	/* ntb_rx_q_lock: synchronize access to rx_XXXX_q */
-	spinlock_t ntb_rx_q_lock;
-	void *rx_buff;
-	unsigned int rx_index;
-	unsigned int rx_max_entry;
-	unsigned int rx_max_frame;
-	unsigned int rx_alloc_entry;
-	dma_cookie_t last_cookie;
-	struct tasklet_struct rxc_db_work;
-
-	void (*event_handler)(void *data, int status);
-	struct delayed_work link_work;
-	struct work_struct link_cleanup;
-
-	struct dentry *debugfs_dir;
-	struct dentry *debugfs_stats;
-
-	/* Stats */
-	u64 rx_bytes;
-	u64 rx_pkts;
-	u64 rx_ring_empty;
-	u64 rx_err_no_buf;
-	u64 rx_err_oflow;
-	u64 rx_err_ver;
-	u64 rx_memcpy;
-	u64 rx_async;
-	u64 tx_bytes;
-	u64 tx_pkts;
-	u64 tx_ring_full;
-	u64 tx_err_no_buf;
-	u64 tx_memcpy;
-	u64 tx_async;
-
-	bool use_msi;
-	int msi_irq;
-	struct ntb_msi_desc msi_desc;
-	struct ntb_msi_desc peer_msi_desc;
-};
-
-struct ntb_transport_mw {
-	phys_addr_t phys_addr;
-	resource_size_t phys_size;
-	void __iomem *vbase;
-	size_t xlat_size;
-	size_t buff_size;
-	size_t alloc_size;
-	void *alloc_addr;
-	void *virt_addr;
-	dma_addr_t dma_addr;
-};
-
 struct ntb_transport_client_dev {
 	struct list_head entry;
 	struct ntb_transport_ctx *nt;
 	struct device dev;
 };
 
-struct ntb_transport_ctx {
-	struct list_head entry;
-	struct list_head client_devs;
-
-	struct ntb_dev *ndev;
-
-	struct ntb_transport_mw *mw_vec;
-	struct ntb_transport_qp *qp_vec;
-	unsigned int mw_count;
-	unsigned int qp_count;
-	u64 qp_bitmap;
-	u64 qp_bitmap_free;
-
-	bool use_msi;
-	unsigned int msi_spad_offset;
-	u64 msi_db_mask;
-
-	bool link_is_up;
-	struct delayed_work link_work;
-	struct work_struct link_cleanup;
-
-	struct dentry *debugfs_node_dir;
-
-	/* Make sure workq of link event be executed serially */
-	struct mutex link_event_lock;
-};
-
-enum {
-	DESC_DONE_FLAG = BIT(0),
-	LINK_DOWN_FLAG = BIT(1),
-};
-
 struct ntb_payload_header {
 	unsigned int ver;
 	unsigned int len;
@@ -274,7 +135,6 @@ enum {
 #define drv_client(__drv) \
 	container_of((__drv), struct ntb_transport_client, driver)
 
-#define QP_TO_MW(nt, qp)	((qp) % nt->mw_count)
 #define NTB_QP_DEF_NUM_ENTRIES	100
 #define NTB_LINK_DOWN_TIMEOUT	10
 
diff --git a/drivers/ntb/ntb_transport_internal.h b/drivers/ntb/ntb_transport_internal.h
new file mode 100644
index 000000000000..aff9b70671c6
--- /dev/null
+++ b/drivers/ntb/ntb_transport_internal.h
@@ -0,0 +1,159 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
+/*
+ * Copyright(c) 2012 Intel Corporation. All rights reserved.
+ * Copyright (C) 2015 EMC Corporation. All Rights Reserved.
+ */
+#ifndef _NTB_TRANSPORT_INTERNAL_H_
+#define _NTB_TRANSPORT_INTERNAL_H_
+
+#include <linux/dcache.h>
+#include <linux/dmaengine.h>
+#include <linux/mutex.h>
+#include <linux/ntb.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+
+#define QP_TO_MW(nt, qp)	((qp) % nt->mw_count)
+
+struct ntb_queue_entry {
+	/* ntb_queue list reference */
+	struct list_head entry;
+	/* pointers to data to be transferred */
+	void *cb_data;
+	void *buf;
+	unsigned int len;
+	unsigned int flags;
+	int retries;
+	int errors;
+	unsigned int tx_index;
+	unsigned int rx_index;
+
+	struct ntb_transport_qp *qp;
+	union {
+		struct ntb_payload_header __iomem *tx_hdr;
+		struct ntb_payload_header *rx_hdr;
+	};
+};
+
+struct ntb_rx_info {
+	unsigned int entry;
+};
+
+struct ntb_transport_qp {
+	struct ntb_transport_ctx *transport;
+	struct ntb_dev *ndev;
+	void *cb_data;
+	struct dma_chan *tx_dma_chan;
+	struct dma_chan *rx_dma_chan;
+
+	bool client_ready;
+	bool link_is_up;
+	bool active;
+
+	u8 qp_num;	/* Only 64 QP's are allowed.  0-63 */
+	u64 qp_bit;
+
+	struct ntb_rx_info __iomem *rx_info;
+	struct ntb_rx_info *remote_rx_info;
+
+	void (*tx_handler)(struct ntb_transport_qp *qp, void *qp_data,
+			   void *data, int len);
+	struct list_head tx_free_q;
+	spinlock_t ntb_tx_free_q_lock;
+	void __iomem *tx_mw;
+	phys_addr_t tx_mw_phys;
+	size_t tx_mw_size;
+	dma_addr_t tx_mw_dma_addr;
+	unsigned int tx_index;
+	unsigned int tx_max_entry;
+	unsigned int tx_max_frame;
+
+	void (*rx_handler)(struct ntb_transport_qp *qp, void *qp_data,
+			   void *data, int len);
+	struct list_head rx_post_q;
+	struct list_head rx_pend_q;
+	struct list_head rx_free_q;
+	/* ntb_rx_q_lock: synchronize access to rx_XXXX_q */
+	spinlock_t ntb_rx_q_lock;
+	void *rx_buff;
+	unsigned int rx_index;
+	unsigned int rx_max_entry;
+	unsigned int rx_max_frame;
+	unsigned int rx_alloc_entry;
+	dma_cookie_t last_cookie;
+	struct tasklet_struct rxc_db_work;
+
+	void (*event_handler)(void *data, int status);
+	struct delayed_work link_work;
+	struct work_struct link_cleanup;
+
+	struct dentry *debugfs_dir;
+	struct dentry *debugfs_stats;
+
+	/* Stats */
+	u64 rx_bytes;
+	u64 rx_pkts;
+	u64 rx_ring_empty;
+	u64 rx_err_no_buf;
+	u64 rx_err_oflow;
+	u64 rx_err_ver;
+	u64 rx_memcpy;
+	u64 rx_async;
+	u64 tx_bytes;
+	u64 tx_pkts;
+	u64 tx_ring_full;
+	u64 tx_err_no_buf;
+	u64 tx_memcpy;
+	u64 tx_async;
+
+	bool use_msi;
+	int msi_irq;
+	struct ntb_msi_desc msi_desc;
+	struct ntb_msi_desc peer_msi_desc;
+};
+
+struct ntb_transport_mw {
+	phys_addr_t phys_addr;
+	resource_size_t phys_size;
+	void __iomem *vbase;
+	size_t xlat_size;
+	size_t buff_size;
+	size_t alloc_size;
+	void *alloc_addr;
+	void *virt_addr;
+	dma_addr_t dma_addr;
+};
+
+struct ntb_transport_ctx {
+	struct list_head entry;
+	struct list_head client_devs;
+
+	struct ntb_dev *ndev;
+
+	struct ntb_transport_mw *mw_vec;
+	struct ntb_transport_qp *qp_vec;
+	unsigned int mw_count;
+	unsigned int qp_count;
+	u64 qp_bitmap;
+	u64 qp_bitmap_free;
+
+	bool use_msi;
+	unsigned int msi_spad_offset;
+	u64 msi_db_mask;
+
+	bool link_is_up;
+	struct delayed_work link_work;
+	struct work_struct link_cleanup;
+
+	struct dentry *debugfs_node_dir;
+
+	/* Ensure link-event work items are executed serially */
+	struct mutex link_event_lock;
+};
+
+enum {
+	DESC_DONE_FLAG = BIT(0),
+	LINK_DOWN_FLAG = BIT(1),
+};
+
+#endif /* _NTB_TRANSPORT_INTERNAL_H_ */
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v4 21/38] NTB: ntb_transport: Export common helpers for modularization
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (19 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 20/38] NTB: ntb_transport: Move internal types to ntb_transport_internal.h Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 22/38] NTB: ntb_transport: Split core library and default NTB client Koichiro Den
                   ` (17 subsequent siblings)
  38 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

Upcoming changes introduce multiple ntb_client drivers and transport
backends that share common list/queue helpers.

Export the shared helper functions and declare them in the internal
header so they can be reused by split-out code.

No functional change.
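
The exported helpers implement a plain lock-protected FIFO list. A rough
userspace sketch of the same idea (illustrative names only, not the kernel
API; a pthread mutex stands in for the kernel spinlock and a hand-rolled
tail queue for list_head) might look like:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct qentry {
	struct qentry *next;
	int payload;
};

struct qlist {
	pthread_mutex_t lock;
	struct qentry *head, *tail;
};

/* Append an entry under the lock, as ntb_list_add() does. */
static void qlist_add(struct qlist *l, struct qentry *e)
{
	pthread_mutex_lock(&l->lock);
	e->next = NULL;
	if (l->tail)
		l->tail->next = e;
	else
		l->head = e;
	l->tail = e;
	pthread_mutex_unlock(&l->lock);
}

/* Pop the oldest entry, or NULL if the list is empty (ntb_list_rm()). */
static struct qentry *qlist_rm(struct qlist *l)
{
	struct qentry *e;

	pthread_mutex_lock(&l->lock);
	e = l->head;
	if (e) {
		l->head = e->next;
		if (!l->head)
			l->tail = NULL;
	}
	pthread_mutex_unlock(&l->lock);
	return e;
}

/* Move the oldest entry from one list to another, mirroring ntb_list_mv()
 * shuffling entries between the rx queues. */
static struct qentry *qlist_mv(struct qlist *from, struct qlist *to)
{
	struct qentry *e = qlist_rm(from);

	if (e)
		qlist_add(to, e);
	return e;
}
```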

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/ntb/ntb_transport_core.c     | 17 +++++++++--------
 drivers/ntb/ntb_transport_internal.h |  7 +++++++
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/ntb/ntb_transport_core.c b/drivers/ntb/ntb_transport_core.c
index 71f01fa0ff05..04a13fdce71c 100644
--- a/drivers/ntb/ntb_transport_core.c
+++ b/drivers/ntb/ntb_transport_core.c
@@ -371,8 +371,7 @@ static int ntb_qp_debugfs_stats_show(struct seq_file *s, void *v)
 }
 DEFINE_SHOW_ATTRIBUTE(ntb_qp_debugfs_stats);
 
-static void ntb_list_add(spinlock_t *lock, struct list_head *entry,
-			 struct list_head *list)
+void ntb_list_add(spinlock_t *lock, struct list_head *entry, struct list_head *list)
 {
 	unsigned long flags;
 
@@ -380,9 +379,9 @@ static void ntb_list_add(spinlock_t *lock, struct list_head *entry,
 	list_add_tail(entry, list);
 	spin_unlock_irqrestore(lock, flags);
 }
+EXPORT_SYMBOL_GPL(ntb_list_add);
 
-static struct ntb_queue_entry *ntb_list_rm(spinlock_t *lock,
-					   struct list_head *list)
+struct ntb_queue_entry *ntb_list_rm(spinlock_t *lock, struct list_head *list)
 {
 	struct ntb_queue_entry *entry;
 	unsigned long flags;
@@ -400,10 +399,10 @@ static struct ntb_queue_entry *ntb_list_rm(spinlock_t *lock,
 
 	return entry;
 }
+EXPORT_SYMBOL_GPL(ntb_list_rm);
 
-static struct ntb_queue_entry *ntb_list_mv(spinlock_t *lock,
-					   struct list_head *list,
-					   struct list_head *to_list)
+struct ntb_queue_entry *ntb_list_mv(spinlock_t *lock, struct list_head *list,
+				    struct list_head *to_list)
 {
 	struct ntb_queue_entry *entry;
 	unsigned long flags;
@@ -421,6 +420,7 @@ static struct ntb_queue_entry *ntb_list_mv(spinlock_t *lock,
 
 	return entry;
 }
+EXPORT_SYMBOL_GPL(ntb_list_mv);
 
 static int ntb_transport_setup_qp_mw(struct ntb_transport_ctx *nt,
 				     unsigned int qp_num)
@@ -820,10 +820,11 @@ static void ntb_qp_link_cleanup_work(struct work_struct *work)
 				      msecs_to_jiffies(NTB_LINK_DOWN_TIMEOUT));
 }
 
-static void ntb_qp_link_down(struct ntb_transport_qp *qp)
+void ntb_qp_link_down(struct ntb_transport_qp *qp)
 {
 	schedule_work(&qp->link_cleanup);
 }
+EXPORT_SYMBOL_GPL(ntb_qp_link_down);
 
 static void ntb_transport_link_cleanup(struct ntb_transport_ctx *nt)
 {
diff --git a/drivers/ntb/ntb_transport_internal.h b/drivers/ntb/ntb_transport_internal.h
index aff9b70671c6..6b45790cc88e 100644
--- a/drivers/ntb/ntb_transport_internal.h
+++ b/drivers/ntb/ntb_transport_internal.h
@@ -156,4 +156,11 @@ enum {
 	LINK_DOWN_FLAG = BIT(1),
 };
 
+void ntb_list_add(spinlock_t *lock, struct list_head *entry,
+		  struct list_head *list);
+struct ntb_queue_entry *ntb_list_rm(spinlock_t *lock, struct list_head *list);
+struct ntb_queue_entry *ntb_list_mv(spinlock_t *lock, struct list_head *list,
+				    struct list_head *to_list);
+void ntb_qp_link_down(struct ntb_transport_qp *qp);
+
 #endif /* _NTB_TRANSPORT_INTERNAL_H_ */
-- 
2.51.0



* [RFC PATCH v4 22/38] NTB: ntb_transport: Split core library and default NTB client
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (20 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 21/38] NTB: ntb_transport: Export common helpers for modularization Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 23/38] NTB: ntb_transport: Add transport backend infrastructure Koichiro Den
                   ` (16 subsequent siblings)
  38 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

The transport code is being refactored to support multiple clients and
transport backends. As a first step, split the module into a core
library and a thin default client that binds to NTB ports.

Move module parameters and the legacy ntb_client glue into a new
ntb_transport.c. Export ntb_transport_attach()/ntb_transport_detach()
from the core so other clients can reuse the common transport
infrastructure.

No functional change intended for the default transport.
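
The shape of the split can be sketched in userspace C (illustrative names
only, not the kernel API): the core owns the per-instance state and the
attach/detach entry points, while the thin client merely forwards its
module parameters, much as ntb_transport_host_probe() forwards use_msi,
max_mw_size, etc. to ntb_transport_attach():

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Core library side: parameters become per-context state, not globals. */
struct transport_ctx {
	char backend[16];
	unsigned long max_mw_size;
	unsigned int mtu;
	bool attached;
};

static int core_attach(struct transport_ctx *nt, const char *backend,
		       unsigned long max_mw_size, unsigned int mtu)
{
	strncpy(nt->backend, backend, sizeof(nt->backend) - 1);
	nt->max_mw_size = max_mw_size;
	nt->mtu = mtu;
	nt->attached = true;
	return 0;
}

static void core_detach(struct transport_ctx *nt)
{
	nt->attached = false;
}

/* Thin client side: the only policy it carries is its parameter values. */
static unsigned long param_max_mw_size; /* module_param stand-in */
static unsigned int param_mtu = 0x10000;

static int client_probe(struct transport_ctx *nt)
{
	return core_attach(nt, "default", param_max_mw_size, param_mtu);
}
```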

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/ntb/Makefile                 |  3 +-
 drivers/ntb/ntb_transport.c          | 81 ++++++++++++++++++++++++
 drivers/ntb/ntb_transport_core.c     | 93 ++++++++++------------------
 drivers/ntb/ntb_transport_internal.h | 15 +++++
 4 files changed, 128 insertions(+), 64 deletions(-)
 create mode 100644 drivers/ntb/ntb_transport.c

diff --git a/drivers/ntb/Makefile b/drivers/ntb/Makefile
index 9b66e5fafbc0..47e6b95ef7ce 100644
--- a/drivers/ntb/Makefile
+++ b/drivers/ntb/Makefile
@@ -1,8 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 obj-$(CONFIG_NTB) += ntb.o hw/ test/
 obj-$(CONFIG_NTB_TRANSPORT) += ntb_transport.o
+obj-$(CONFIG_NTB_TRANSPORT) += ntb_transport_core.o
 
 ntb-y			:= core.o
 ntb-$(CONFIG_NTB_MSI)	+= msi.o
-
-ntb_transport-y		:= ntb_transport_core.o
diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
new file mode 100644
index 000000000000..dafb97e38883
--- /dev/null
+++ b/drivers/ntb/ntb_transport.c
@@ -0,0 +1,81 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+/*
+ * Copyright(c) 2012 Intel Corporation. All rights reserved.
+ * Copyright (C) 2015 EMC Corporation. All Rights Reserved.
+ *
+ * Default NTB transport client module.
+ *
+ * The transport core library and backend infrastructure are implemented in
+ * ntb_transport_core.c. This module provides the default client that binds
+ * to NTB ports and instantiates the default transport for consumers such
+ * as ntb_netdev.
+ */
+
+#include <linux/module.h>
+#include <linux/ntb.h>
+
+#include "ntb_transport_internal.h"
+
+static unsigned long max_mw_size;
+module_param(max_mw_size, ulong, 0644);
+MODULE_PARM_DESC(max_mw_size, "Limit size of large memory windows");
+
+static unsigned int transport_mtu = 0x10000;
+module_param(transport_mtu, uint, 0644);
+MODULE_PARM_DESC(transport_mtu, "Maximum size of NTB transport packets");
+
+static unsigned char max_num_clients;
+module_param(max_num_clients, byte, 0644);
+MODULE_PARM_DESC(max_num_clients, "Maximum number of NTB transport clients");
+
+static unsigned int copy_bytes = 1024;
+module_param(copy_bytes, uint, 0644);
+MODULE_PARM_DESC(copy_bytes, "Threshold under which NTB will use the CPU to copy instead of DMA");
+
+static bool use_dma;
+module_param(use_dma, bool, 0644);
+MODULE_PARM_DESC(use_dma, "Use DMA engine to perform large data copy");
+
+static bool use_msi;
+#ifdef CONFIG_NTB_MSI
+module_param(use_msi, bool, 0644);
+MODULE_PARM_DESC(use_msi, "Use MSI interrupts instead of doorbells");
+#endif
+
+#define NTB_QP_DEF_NUM_ENTRIES	100
+
+static int ntb_transport_host_probe(struct ntb_client *self,
+				    struct ntb_dev *ndev)
+{
+	return ntb_transport_attach(ndev, "default", use_msi, max_mw_size,
+				    transport_mtu, max_num_clients, copy_bytes,
+				    use_dma, NTB_QP_DEF_NUM_ENTRIES);
+}
+
+static void ntb_transport_host_remove(struct ntb_client *self, struct ntb_dev *ndev)
+{
+	ntb_transport_detach(ndev);
+}
+
+static struct ntb_client ntb_transport_host_client = {
+	.ops = {
+		.probe = ntb_transport_host_probe,
+		.remove = ntb_transport_host_remove,
+	},
+};
+
+static int __init ntb_transport_host_init(void)
+{
+	return ntb_register_client(&ntb_transport_host_client);
+}
+module_init(ntb_transport_host_init);
+
+static void __exit ntb_transport_host_exit(void)
+{
+	ntb_unregister_client(&ntb_transport_host_client);
+}
+module_exit(ntb_transport_host_exit);
+
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_AUTHOR("Intel Corporation");
+MODULE_DESCRIPTION("Software Queue-Pair Transport over NTB");
diff --git a/drivers/ntb/ntb_transport_core.c b/drivers/ntb/ntb_transport_core.c
index 04a13fdce71c..86181fe1eadd 100644
--- a/drivers/ntb/ntb_transport_core.c
+++ b/drivers/ntb/ntb_transport_core.c
@@ -69,7 +69,7 @@
 #define NTB_TRANSPORT_VERSION	4
 #define NTB_TRANSPORT_VER	"4"
 #define NTB_TRANSPORT_NAME	"ntb_transport"
-#define NTB_TRANSPORT_DESC	"Software Queue-Pair Transport over NTB"
+#define NTB_TRANSPORT_DESC	"NTB transport core library"
 #define NTB_TRANSPORT_MIN_SPADS (MW0_SZ_HIGH + 2)
 
 MODULE_DESCRIPTION(NTB_TRANSPORT_DESC);
@@ -77,31 +77,6 @@ MODULE_VERSION(NTB_TRANSPORT_VER);
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_AUTHOR("Intel Corporation");
 
-static unsigned long max_mw_size;
-module_param(max_mw_size, ulong, 0644);
-MODULE_PARM_DESC(max_mw_size, "Limit size of large memory windows");
-
-static unsigned int transport_mtu = 0x10000;
-module_param(transport_mtu, uint, 0644);
-MODULE_PARM_DESC(transport_mtu, "Maximum size of NTB transport packets");
-
-static unsigned char max_num_clients;
-module_param(max_num_clients, byte, 0644);
-MODULE_PARM_DESC(max_num_clients, "Maximum number of NTB transport clients");
-
-static unsigned int copy_bytes = 1024;
-module_param(copy_bytes, uint, 0644);
-MODULE_PARM_DESC(copy_bytes, "Threshold under which NTB will use the CPU to copy instead of DMA");
-
-static bool use_dma;
-module_param(use_dma, bool, 0644);
-MODULE_PARM_DESC(use_dma, "Use DMA engine to perform large data copy");
-
-static bool use_msi;
-#ifdef CONFIG_NTB_MSI
-module_param(use_msi, bool, 0644);
-MODULE_PARM_DESC(use_msi, "Use MSI interrupts instead of doorbells");
-#endif
 
 static struct dentry *nt_debugfs_dir;
 
@@ -135,12 +110,10 @@ enum {
 #define drv_client(__drv) \
 	container_of((__drv), struct ntb_transport_client, driver)
 
-#define NTB_QP_DEF_NUM_ENTRIES	100
 #define NTB_LINK_DOWN_TIMEOUT	10
 
 static void ntb_transport_rxc_db(unsigned long data);
 static const struct ntb_ctx_ops ntb_transport_ops;
-static struct ntb_client ntb_transport_client;
 static int ntb_async_tx_submit(struct ntb_transport_qp *qp,
 			       struct ntb_queue_entry *entry);
 static void ntb_memcpy_tx(struct ntb_queue_entry *entry, void __iomem *offset);
@@ -456,8 +429,8 @@ static int ntb_transport_setup_qp_mw(struct ntb_transport_ctx *nt,
 
 	if (mw_size > mw->xlat_size)
 		mw_size = mw->xlat_size;
-	if (max_mw_size && mw_size > max_mw_size)
-		mw_size = max_mw_size;
+	if (nt->max_mw_size && mw_size > nt->max_mw_size)
+		mw_size = nt->max_mw_size;
 
 	tx_size = (unsigned int)mw_size / num_qps_mw;
 	qp_offset = tx_size * (qp_num / mw_count);
@@ -481,9 +454,9 @@ static int ntb_transport_setup_qp_mw(struct ntb_transport_ctx *nt,
 	qp->rx_info = qp->tx_mw + tx_size;
 
 	/* Due to housekeeping, there must be atleast 2 buffs */
-	qp->tx_max_frame = min(transport_mtu, tx_size / 2);
+	qp->tx_max_frame = min(nt->transport_mtu, tx_size / 2);
 	qp->tx_max_entry = tx_size / qp->tx_max_frame;
-	qp->rx_max_frame = min(transport_mtu, rx_size / 2);
+	qp->rx_max_frame = min(nt->transport_mtu, rx_size / 2);
 	qp->rx_max_entry = rx_size / qp->rx_max_frame;
 	qp->rx_index = 0;
 
@@ -909,8 +882,8 @@ static void ntb_transport_link_work(struct work_struct *work)
 	for (i = 0; i < nt->mw_count; i++) {
 		size = nt->mw_vec[i].phys_size;
 
-		if (max_mw_size && size > max_mw_size)
-			size = max_mw_size;
+		if (nt->max_mw_size && size > nt->max_mw_size)
+			size = nt->max_mw_size;
 
 		spad = MW0_SZ_HIGH + (i * 2);
 		ntb_peer_spad_write(ndev, PIDX, spad, upper_32_bits(size));
@@ -1084,7 +1057,12 @@ static int ntb_transport_init_queue(struct ntb_transport_ctx *nt,
 	return 0;
 }
 
-static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
+int ntb_transport_attach(struct ntb_dev *ndev, const char *backend_name,
+			 bool use_msi, unsigned long max_mw_size,
+			 unsigned int transport_mtu,
+			 unsigned char max_num_clients,
+			 unsigned int copy_bytes, bool use_dma,
+			 unsigned int num_rx_entries)
 {
 	struct ntb_transport_ctx *nt;
 	struct ntb_transport_mw *mw;
@@ -1117,6 +1095,11 @@ static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
 		return -ENOMEM;
 
 	nt->ndev = ndev;
+	nt->max_mw_size = max_mw_size;
+	nt->transport_mtu = transport_mtu;
+	nt->copy_bytes = copy_bytes;
+	nt->use_dma = use_dma;
+	nt->num_rx_entries = num_rx_entries;
 
 	/*
 	 * If we are using MSI, and have at least one extra memory window,
@@ -1241,8 +1224,9 @@ static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
 	kfree(nt);
 	return rc;
 }
+EXPORT_SYMBOL_GPL(ntb_transport_attach);
 
-static void ntb_transport_free(struct ntb_client *self, struct ntb_dev *ndev)
+void ntb_transport_detach(struct ntb_dev *ndev)
 {
 	struct ntb_transport_ctx *nt = ndev->ctx;
 	struct ntb_transport_qp *qp;
@@ -1262,6 +1246,7 @@ static void ntb_transport_free(struct ntb_client *self, struct ntb_dev *ndev)
 			ntb_transport_free_queue(qp);
 		debugfs_remove_recursive(qp->debugfs_dir);
 	}
+	debugfs_remove(nt->debugfs_node_dir);
 
 	ntb_link_disable(ndev);
 	ntb_clear_ctx(ndev);
@@ -1277,6 +1262,7 @@ static void ntb_transport_free(struct ntb_client *self, struct ntb_dev *ndev)
 	kfree(nt->mw_vec);
 	kfree(nt);
 }
+EXPORT_SYMBOL_GPL(ntb_transport_detach);
 
 static void ntb_complete_rxc(struct ntb_transport_qp *qp)
 {
@@ -1438,7 +1424,7 @@ static void ntb_async_rx(struct ntb_queue_entry *entry, void *offset)
 	if (!chan)
 		goto err;
 
-	if (entry->len < copy_bytes)
+	if (entry->len < qp->transport->copy_bytes)
 		goto err;
 
 	res = ntb_async_rx_submit(entry, offset);
@@ -1718,7 +1704,7 @@ static void ntb_async_tx(struct ntb_transport_qp *qp,
 	if (!chan)
 		goto err;
 
-	if (entry->len < copy_bytes)
+	if (entry->len < qp->transport->copy_bytes)
 		goto err;
 
 	res = ntb_async_tx_submit(qp, entry);
@@ -1856,7 +1842,7 @@ ntb_transport_create_queue(void *data, struct device *client_dev,
 	dma_cap_zero(dma_mask);
 	dma_cap_set(DMA_MEMCPY, dma_mask);
 
-	if (use_dma) {
+	if (nt->use_dma) {
 		qp->tx_dma_chan =
 			dma_request_channel(dma_mask, ntb_dma_filter_fn,
 					    (void *)(unsigned long)node);
@@ -1892,7 +1878,7 @@ ntb_transport_create_queue(void *data, struct device *client_dev,
 	dev_dbg(&pdev->dev, "Using %s memcpy for RX\n",
 		qp->rx_dma_chan ? "DMA" : "CPU");
 
-	for (i = 0; i < NTB_QP_DEF_NUM_ENTRIES; i++) {
+	for (i = 0; i < nt->num_rx_entries; i++) {
 		entry = kzalloc_node(sizeof(*entry), GFP_KERNEL, node);
 		if (!entry)
 			goto err1;
@@ -1901,7 +1887,7 @@ ntb_transport_create_queue(void *data, struct device *client_dev,
 		ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry,
 			     &qp->rx_free_q);
 	}
-	qp->rx_alloc_entry = NTB_QP_DEF_NUM_ENTRIES;
+	qp->rx_alloc_entry = nt->num_rx_entries;
 
 	for (i = 0; i < qp->tx_max_entry; i++) {
 		entry = kzalloc_node(sizeof(*entry), GFP_KERNEL, node);
@@ -2301,13 +2287,6 @@ static const struct ntb_ctx_ops ntb_transport_ops = {
 	.db_event = ntb_transport_doorbell_callback,
 };
 
-static struct ntb_client ntb_transport_client = {
-	.ops = {
-		.probe = ntb_transport_probe,
-		.remove = ntb_transport_free,
-	},
-};
-
 static int __init ntb_transport_init(void)
 {
 	int rc;
@@ -2318,26 +2297,16 @@ static int __init ntb_transport_init(void)
 		nt_debugfs_dir = debugfs_create_dir(KBUILD_MODNAME, NULL);
 
 	rc = bus_register(&ntb_transport_bus);
-	if (rc)
-		goto err_bus;
-
-	rc = ntb_register_client(&ntb_transport_client);
-	if (rc)
-		goto err_client;
-
-	return 0;
-
-err_client:
-	bus_unregister(&ntb_transport_bus);
-err_bus:
-	debugfs_remove_recursive(nt_debugfs_dir);
+	if (rc) {
+		/* bus_register() failed; only tear down debugfs */
+		debugfs_remove_recursive(nt_debugfs_dir);
+	}
 	return rc;
 }
 module_init(ntb_transport_init);
 
 static void __exit ntb_transport_exit(void)
 {
-	ntb_unregister_client(&ntb_transport_client);
 	bus_unregister(&ntb_transport_bus);
 	debugfs_remove_recursive(nt_debugfs_dir);
 }
diff --git a/drivers/ntb/ntb_transport_internal.h b/drivers/ntb/ntb_transport_internal.h
index 6b45790cc88e..406033dbddb7 100644
--- a/drivers/ntb/ntb_transport_internal.h
+++ b/drivers/ntb/ntb_transport_internal.h
@@ -134,9 +134,17 @@ struct ntb_transport_ctx {
 	struct ntb_transport_qp *qp_vec;
 	unsigned int mw_count;
 	unsigned int qp_count;
+	unsigned int max_qp_count;
 	u64 qp_bitmap;
 	u64 qp_bitmap_free;
 
+	/* Parameters */
+	unsigned int num_rx_entries;
+	unsigned int transport_mtu;
+	unsigned long max_mw_size;
+	unsigned int copy_bytes;
+	bool use_dma;
+
 	bool use_msi;
 	unsigned int msi_spad_offset;
 	u64 msi_db_mask;
@@ -162,5 +170,12 @@ struct ntb_queue_entry *ntb_list_rm(spinlock_t *lock, struct list_head *list);
 struct ntb_queue_entry *ntb_list_mv(spinlock_t *lock, struct list_head *list,
 				    struct list_head *to_list);
 void ntb_qp_link_down(struct ntb_transport_qp *qp);
+int ntb_transport_attach(struct ntb_dev *ndev, const char *backend_name,
+			 bool use_msi, unsigned long max_mw_size,
+			 unsigned int transport_mtu,
+			 unsigned char max_num_clients,
+			 unsigned int copy_bytes, bool use_dma,
+			 unsigned int num_rx_entries);
+void ntb_transport_detach(struct ntb_dev *ndev);
 
 #endif /* _NTB_TRANSPORT_INTERNAL_H_ */
-- 
2.51.0



* [RFC PATCH v4 23/38] NTB: ntb_transport: Add transport backend infrastructure
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (21 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 22/38] NTB: ntb_transport: Split core library and default NTB client Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 24/38] NTB: ntb_transport: Run ntb_set_mw() before link-up negotiation Koichiro Den
                   ` (15 subsequent siblings)
  38 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

Introduce a backend abstraction layer so that ntb_transport can support
multiple data-plane implementations with minimal code duplication.

Add backend registration APIs, store the selected backend in the
transport context, and route key operations through backend hooks. Also
add per-entry/per-QP private pointers and move backend-specific debugfs
stats behind the backend ops callback.

Register the existing implementation as the default backend.
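
The registration scheme is a mutex-protected, name-keyed list. A minimal
userspace sketch of the same pattern (illustrative names, a pthread mutex
in place of the kernel mutex) might look like:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

struct backend {
	const char *name;
	struct backend *next;
};

static struct backend *backends;
static pthread_mutex_t backend_lock = PTHREAD_MUTEX_INITIALIZER;

/* Register a backend, rejecting duplicate names with -EEXIST, as
 * ntb_transport_backend_register() does. */
static int backend_register(struct backend *b)
{
	struct backend *tmp;

	if (!b || !b->name)
		return -EINVAL;

	pthread_mutex_lock(&backend_lock);
	for (tmp = backends; tmp; tmp = tmp->next) {
		if (!strcmp(tmp->name, b->name)) {
			pthread_mutex_unlock(&backend_lock);
			return -EEXIST;
		}
	}
	b->next = backends;
	backends = b;
	pthread_mutex_unlock(&backend_lock);
	return 0;
}

/* Look a backend up by name, falling back to the built-in default when
 * no name is given, mirroring ntb_transport_backend_get(). */
static struct backend *backend_find(const char *name)
{
	struct backend *b;

	if (!name || !name[0])
		name = "default";

	pthread_mutex_lock(&backend_lock);
	for (b = backends; b; b = b->next)
		if (!strcmp(b->name, name))
			break;
	pthread_mutex_unlock(&backend_lock);
	return b;
}
```

Under this scheme an eDMA backend would register itself under its own name
and be selected per device through the backend_name argument of
ntb_transport_attach().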

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/ntb/ntb_transport_core.c     | 329 ++++++++++++++++++++++-----
 drivers/ntb/ntb_transport_internal.h |  80 +++++++
 2 files changed, 347 insertions(+), 62 deletions(-)

diff --git a/drivers/ntb/ntb_transport_core.c b/drivers/ntb/ntb_transport_core.c
index 86181fe1eadd..2129fa7a22d8 100644
--- a/drivers/ntb/ntb_transport_core.c
+++ b/drivers/ntb/ntb_transport_core.c
@@ -77,6 +77,8 @@ MODULE_VERSION(NTB_TRANSPORT_VER);
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_AUTHOR("Intel Corporation");
 
+static LIST_HEAD(ntb_transport_backends);
+static DEFINE_MUTEX(ntb_transport_backend_lock);
 
 static struct dentry *nt_debugfs_dir;
 
@@ -300,15 +302,51 @@ void ntb_transport_unregister_client(struct ntb_transport_client *drv)
 }
 EXPORT_SYMBOL_GPL(ntb_transport_unregister_client);
 
-static int ntb_qp_debugfs_stats_show(struct seq_file *s, void *v)
+int ntb_transport_backend_register(struct ntb_transport_backend *b)
 {
-	struct ntb_transport_qp *qp = s->private;
+	struct ntb_transport_backend *tmp;
 
-	if (!qp || !qp->link_is_up)
-		return 0;
+	if (!b || !b->name || !b->ops)
+		return -EINVAL;
 
-	seq_puts(s, "\nNTB QP stats:\n\n");
+	mutex_lock(&ntb_transport_backend_lock);
+	list_for_each_entry(tmp, &ntb_transport_backends, node) {
+		if (!strcmp(tmp->name, b->name)) {
+			mutex_unlock(&ntb_transport_backend_lock);
+			return -EEXIST;
+		}
+	}
+	list_add_tail(&b->node, &ntb_transport_backends);
+	mutex_unlock(&ntb_transport_backend_lock);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ntb_transport_backend_register);
+
+void ntb_transport_backend_unregister(struct ntb_transport_backend *b)
+{
+	if (!b)
+		return;
+	mutex_lock(&ntb_transport_backend_lock);
+	list_del_init(&b->node);
+	mutex_unlock(&ntb_transport_backend_lock);
+}
+EXPORT_SYMBOL_GPL(ntb_transport_backend_unregister);
+
+static struct ntb_transport_backend *ntb_transport_backend_find(const char *name)
+{
+	struct ntb_transport_backend *b;
+
+	list_for_each_entry(b, &ntb_transport_backends, node) {
+		if (!strcmp(b->name, name))
+			return b;
+	}
+
+	return NULL;
+}
 
+static void ntb_transport_default_debugfs_stats_show(struct seq_file *s,
+						     struct ntb_transport_qp *qp)
+{
 	seq_printf(s, "rx_bytes - \t%llu\n", qp->rx_bytes);
 	seq_printf(s, "rx_pkts - \t%llu\n", qp->rx_pkts);
 	seq_printf(s, "rx_memcpy - \t%llu\n", qp->rx_memcpy);
@@ -338,6 +376,17 @@ static int ntb_qp_debugfs_stats_show(struct seq_file *s, void *v)
 	seq_printf(s, "Using TX DMA - \t%s\n", qp->tx_dma_chan ? "Yes" : "No");
 	seq_printf(s, "Using RX DMA - \t%s\n", qp->rx_dma_chan ? "Yes" : "No");
 	seq_printf(s, "QP Link - \t%s\n", qp->link_is_up ? "Up" : "Down");
+}
+
+static int ntb_qp_debugfs_stats_show(struct seq_file *s, void *v)
+{
+	struct ntb_transport_qp *qp = s->private;
+
+	if (!qp || !qp->link_is_up)
+		return 0;
+
+	seq_puts(s, "\nNTB QP stats:\n\n");
+	qp->transport->backend->ops->debugfs_stats_show(s, qp);
 	seq_putc(s, '\n');
 
 	return 0;
@@ -395,8 +444,37 @@ struct ntb_queue_entry *ntb_list_mv(spinlock_t *lock, struct list_head *list,
 }
 EXPORT_SYMBOL_GPL(ntb_list_mv);
 
-static int ntb_transport_setup_qp_mw(struct ntb_transport_ctx *nt,
-				     unsigned int qp_num)
+struct ntb_queue_entry *
+ntb_queue_entry_alloc(struct ntb_transport_ctx *nt, struct ntb_transport_qp *qp, int node)
+{
+	struct ntb_queue_entry *entry;
+
+	entry = kzalloc_node(sizeof(*entry), GFP_KERNEL, node);
+	if (!entry)
+		return NULL;
+
+	if (nt->backend->ops->entry_priv_alloc) {
+		entry->priv = nt->backend->ops->entry_priv_alloc();
+		if (!entry->priv) {
+			kfree(entry);
+			return NULL;
+		}
+	}
+	return entry;
+}
+EXPORT_SYMBOL_GPL(ntb_queue_entry_alloc);
+
+static void
+ntb_queue_entry_free(struct ntb_transport_ctx *nt, struct ntb_queue_entry *entry)
+{
+	if (nt->backend->ops->entry_priv_free)
+		nt->backend->ops->entry_priv_free(entry->priv);
+
+	kfree(entry);
+}
+
+static int ntb_transport_default_setup_qp_mw(struct ntb_transport_ctx *nt,
+					     unsigned int qp_num)
 {
 	struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
 	struct ntb_transport_mw *mw;
@@ -467,7 +545,7 @@ static int ntb_transport_setup_qp_mw(struct ntb_transport_ctx *nt,
 	 */
 	node = dev_to_node(&ndev->dev);
 	for (i = qp->rx_alloc_entry; i < qp->rx_max_entry; i++) {
-		entry = kzalloc_node(sizeof(*entry), GFP_KERNEL, node);
+		entry = ntb_queue_entry_alloc(nt, qp, node);
 		if (!entry)
 			return -ENOMEM;
 
@@ -805,6 +883,9 @@ static void ntb_transport_link_cleanup(struct ntb_transport_ctx *nt)
 	u64 qp_bitmap_alloc;
 	unsigned int i, count;
 
+	if (nt->backend->ops->link_down)
+		nt->backend->ops->link_down(nt);
+
 	qp_bitmap_alloc = nt->qp_bitmap & ~nt->qp_bitmap_free;
 
 	/* Pass along the info to any clients */
@@ -866,6 +947,12 @@ static void ntb_transport_link_work(struct work_struct *work)
 
 	/* send the local info, in the opposite order of the way we read it */
 
+	if (nt->backend->ops->link_up_pre) {
+		rc = nt->backend->ops->link_up_pre(nt);
+		if (rc)
+			return;
+	}
+
 	if (nt->use_msi) {
 		rc = ntb_msi_setup_mws(ndev);
 		if (rc) {
@@ -952,10 +1039,16 @@ static void ntb_transport_link_work(struct work_struct *work)
 
 	nt->link_is_up = true;
 
+	if (nt->backend->ops->link_up_post) {
+		rc = nt->backend->ops->link_up_post(nt);
+		if (rc)
+			return;
+	}
+
 	for (i = 0; i < nt->qp_count; i++) {
 		struct ntb_transport_qp *qp = &nt->qp_vec[i];
 
-		ntb_transport_setup_qp_mw(nt, i);
+		nt->backend->ops->setup_qp_mw(nt, i);
 		ntb_transport_setup_qp_peer_msi(nt, i);
 
 		if (qp->client_ready)
@@ -1012,8 +1105,7 @@ static void ntb_qp_link_work(struct work_struct *work)
 				      msecs_to_jiffies(NTB_LINK_DOWN_TIMEOUT));
 }
 
-static int ntb_transport_init_queue(struct ntb_transport_ctx *nt,
-				    unsigned int qp_num)
+int ntb_transport_init_queue(struct ntb_transport_ctx *nt, unsigned int qp_num)
 {
 	struct ntb_transport_qp *qp;
 
@@ -1057,6 +1149,69 @@ static int ntb_transport_init_queue(struct ntb_transport_ctx *nt,
 	return 0;
 }
 
+static unsigned int ntb_transport_default_tx_free_entry(struct ntb_transport_qp *qp)
+{
+	unsigned int head = qp->tx_index;
+	unsigned int tail = qp->remote_rx_info->entry;
+
+	return tail >= head ? tail - head : qp->tx_max_entry + tail - head;
+}
+
+static int ntb_transport_default_rx_enqueue(struct ntb_transport_qp *qp,
+					    struct ntb_queue_entry *entry)
+{
+	ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_pend_q);
+
+	if (qp->active)
+		tasklet_schedule(&qp->rxc_db_work);
+
+	return 0;
+}
+
+static void ntb_transport_default_rx_poll(struct ntb_transport_qp *qp);
+static int ntb_transport_default_tx_enqueue(struct ntb_transport_qp *qp,
+					    struct ntb_queue_entry *entry,
+					    void *cb, void *data, unsigned int len,
+					    unsigned int flags);
+
+static const struct ntb_transport_backend_ops default_backend_ops = {
+	.setup_qp_mw = ntb_transport_default_setup_qp_mw,
+	.tx_free_entry = ntb_transport_default_tx_free_entry,
+	.tx_enqueue = ntb_transport_default_tx_enqueue,
+	.rx_enqueue = ntb_transport_default_rx_enqueue,
+	.rx_poll = ntb_transport_default_rx_poll,
+	.debugfs_stats_show = ntb_transport_default_debugfs_stats_show,
+};
+
+static struct ntb_transport_backend default_transport_backend = {
+	.name = "default",
+	.ops = &default_backend_ops,
+	.owner = THIS_MODULE,
+};
+
+static struct ntb_transport_backend *
+ntb_transport_backend_get(const char *name)
+{
+	struct ntb_transport_backend *b;
+
+	if (!name || !name[0])
+		name = "default";
+
+	mutex_lock(&ntb_transport_backend_lock);
+	b = ntb_transport_backend_find(name);
+	if (b && !try_module_get(b->owner))
+		b = NULL;
+	mutex_unlock(&ntb_transport_backend_lock);
+
+	return b;
+}
+
+static void
+ntb_transport_backend_put(struct ntb_transport_backend *b)
+{
+	module_put(b->owner);
+}
+
 int ntb_transport_attach(struct ntb_dev *ndev, const char *backend_name,
 			 bool use_msi, unsigned long max_mw_size,
 			 unsigned int transport_mtu,
@@ -1064,6 +1219,7 @@ int ntb_transport_attach(struct ntb_dev *ndev, const char *backend_name,
 			 unsigned int copy_bytes, bool use_dma,
 			 unsigned int num_rx_entries)
 {
+	struct ntb_transport_backend *b;
 	struct ntb_transport_ctx *nt;
 	struct ntb_transport_mw *mw;
 	unsigned int mw_count, qp_count, spad_count, max_mw_count_for_spads;
@@ -1101,6 +1257,20 @@ int ntb_transport_attach(struct ntb_dev *ndev, const char *backend_name,
 	nt->use_dma = use_dma;
 	nt->num_rx_entries = num_rx_entries;
 
+	b = ntb_transport_backend_get(backend_name);
+	if (!b) {
+		rc = -EPROBE_DEFER;
+		goto err_free_ctx;
+	}
+
+	nt->backend = b;
+
+	if (b->ops->enable) {
+		rc = b->ops->enable(nt, &mw_count);
+		if (rc)
+			goto err_put_backend;
+	}
+
 	/*
 	 * If we are using MSI, and have at least one extra memory window,
 	 * we will reserve the last MW for the MSI window.
@@ -1120,7 +1290,7 @@ int ntb_transport_attach(struct ntb_dev *ndev, const char *backend_name,
 	if (spad_count < NTB_TRANSPORT_MIN_SPADS) {
 		nt->mw_count = 0;
 		rc = -EINVAL;
-		goto err;
+		goto err_disable_backend;
 	}
 
 	max_mw_count_for_spads = (spad_count - MW0_SZ_HIGH) / 2;
@@ -1132,7 +1302,7 @@ int ntb_transport_attach(struct ntb_dev *ndev, const char *backend_name,
 				  GFP_KERNEL, node);
 	if (!nt->mw_vec) {
 		rc = -ENOMEM;
-		goto err;
+		goto err_disable_backend;
 	}
 
 	for (i = 0; i < mw_count; i++) {
@@ -1141,12 +1311,12 @@ int ntb_transport_attach(struct ntb_dev *ndev, const char *backend_name,
 		rc = ntb_peer_mw_get_addr(ndev, i, &mw->phys_addr,
 					  &mw->phys_size);
 		if (rc)
-			goto err1;
+			goto err_free_mw_vec;
 
 		mw->vbase = ioremap_wc(mw->phys_addr, mw->phys_size);
 		if (!mw->vbase) {
 			rc = -ENOMEM;
-			goto err1;
+			goto err_free_mw_vec;
 		}
 
 		mw->buff_size = 0;
@@ -1177,7 +1347,7 @@ int ntb_transport_attach(struct ntb_dev *ndev, const char *backend_name,
 				  GFP_KERNEL, node);
 	if (!nt->qp_vec) {
 		rc = -ENOMEM;
-		goto err1;
+		goto err_free_mw_vec;
 	}
 
 	if (nt_debugfs_dir) {
@@ -1189,7 +1359,13 @@ int ntb_transport_attach(struct ntb_dev *ndev, const char *backend_name,
 	for (i = 0; i < qp_count; i++) {
 		rc = ntb_transport_init_queue(nt, i);
 		if (rc)
-			goto err2;
+			goto err_free_qp_vec;
+
+		if (b->ops->qp_init) {
+			rc = b->ops->qp_init(nt, i);
+			if (rc)
+				goto err_free_qp_vec;
+		}
 	}
 
 	INIT_DELAYED_WORK(&nt->link_work, ntb_transport_link_work);
@@ -1197,12 +1373,12 @@ int ntb_transport_attach(struct ntb_dev *ndev, const char *backend_name,
 
 	rc = ntb_set_ctx(ndev, nt, &ntb_transport_ops);
 	if (rc)
-		goto err2;
+		goto err_free_qp_vec;
 
 	INIT_LIST_HEAD(&nt->client_devs);
 	rc = ntb_bus_init(nt);
 	if (rc)
-		goto err3;
+		goto err_clear_ctx;
 
 	nt->link_is_up = false;
 	ntb_link_enable(ndev, NTB_SPEED_AUTO, NTB_WIDTH_AUTO);
@@ -1210,17 +1386,22 @@ int ntb_transport_attach(struct ntb_dev *ndev, const char *backend_name,
 
 	return 0;
 
-err3:
+err_clear_ctx:
 	ntb_clear_ctx(ndev);
-err2:
+err_free_qp_vec:
 	kfree(nt->qp_vec);
-err1:
+err_free_mw_vec:
 	while (i--) {
 		mw = &nt->mw_vec[i];
 		iounmap(mw->vbase);
 	}
 	kfree(nt->mw_vec);
-err:
+err_disable_backend:
+	if (b->ops->disable)
+		b->ops->disable(nt);
+err_put_backend:
+	ntb_transport_backend_put(nt->backend);
+err_free_ctx:
 	kfree(nt);
 	return rc;
 }
@@ -1229,10 +1410,13 @@ EXPORT_SYMBOL_GPL(ntb_transport_attach);
 void ntb_transport_detach(struct ntb_dev *ndev)
 {
 	struct ntb_transport_ctx *nt = ndev->ctx;
+	struct ntb_transport_backend *b;
 	struct ntb_transport_qp *qp;
 	u64 qp_bitmap_alloc;
 	int i;
 
+	WARN_ON_ONCE(!nt);
+
 	ntb_transport_link_cleanup(nt);
 	cancel_work_sync(&nt->link_cleanup);
 	cancel_delayed_work_sync(&nt->link_work);
@@ -1258,6 +1442,11 @@ void ntb_transport_detach(struct ntb_dev *ndev)
 		iounmap(nt->mw_vec[i].vbase);
 	}
 
+	b = nt->backend;
+	if (b && b->ops->disable)
+		b->ops->disable(nt);
+	ntb_transport_backend_put(b);
+
 	kfree(nt->qp_vec);
 	kfree(nt->mw_vec);
 	kfree(nt);
@@ -1513,14 +1702,10 @@ static int ntb_process_rxc(struct ntb_transport_qp *qp)
 	return 0;
 }
 
-static void ntb_transport_rxc_db(unsigned long data)
+static void ntb_transport_default_rx_poll(struct ntb_transport_qp *qp)
 {
-	struct ntb_transport_qp *qp = (void *)data;
 	int rc, i;
 
-	dev_dbg(&qp->ndev->pdev->dev, "%s: doorbell %d received\n",
-		__func__, qp->qp_num);
-
 	/* Limit the number of packets processed in a single interrupt to
 	 * provide fairness to others
 	 */
@@ -1552,6 +1737,17 @@ static void ntb_transport_rxc_db(unsigned long data)
 	}
 }
 
+static void ntb_transport_rxc_db(unsigned long data)
+{
+	struct ntb_transport_qp *qp = (void *)data;
+	struct ntb_transport_ctx *nt = qp->transport;
+
+	dev_dbg(&qp->ndev->pdev->dev, "%s: doorbell %d received\n",
+		__func__, qp->qp_num);
+
+	nt->backend->ops->rx_poll(qp);
+}
+
 static void ntb_tx_copy_callback(void *data,
 				 const struct dmaengine_result *res)
 {
@@ -1721,9 +1917,18 @@ static void ntb_async_tx(struct ntb_transport_qp *qp,
 	qp->tx_memcpy++;
 }
 
-static int ntb_process_tx(struct ntb_transport_qp *qp,
-			  struct ntb_queue_entry *entry)
+static int ntb_transport_default_tx_enqueue(struct ntb_transport_qp *qp,
+					    struct ntb_queue_entry *entry,
+					    void *cb, void *data, unsigned int len,
+					    unsigned int flags)
 {
+	entry->cb_data = cb;
+	entry->buf = data;
+	entry->len = len;
+	entry->flags = flags;
+	entry->errors = 0;
+	entry->tx_index = 0;
+
 	if (!ntb_transport_tx_free_entry(qp)) {
 		qp->tx_ring_full++;
 		return -EAGAIN;
@@ -1750,6 +1955,7 @@ static int ntb_process_tx(struct ntb_transport_qp *qp,
 
 static void ntb_send_link_down(struct ntb_transport_qp *qp)
 {
+	struct ntb_transport_ctx *nt = qp->transport;
 	struct pci_dev *pdev = qp->ndev->pdev;
 	struct ntb_queue_entry *entry;
 	int i, rc;
@@ -1769,12 +1975,7 @@ static void ntb_send_link_down(struct ntb_transport_qp *qp)
 	if (!entry)
 		return;
 
-	entry->cb_data = NULL;
-	entry->buf = NULL;
-	entry->len = 0;
-	entry->flags = LINK_DOWN_FLAG;
-
-	rc = ntb_process_tx(qp, entry);
+	rc = nt->backend->ops->tx_enqueue(qp, entry, NULL, NULL, 0, LINK_DOWN_FLAG);
 	if (rc)
 		dev_err(&pdev->dev, "ntb: QP%d unable to send linkdown msg\n",
 			qp->qp_num);
@@ -1834,6 +2035,7 @@ ntb_transport_create_queue(void *data, struct device *client_dev,
 
 	nt->qp_bitmap_free &= ~qp_bit;
 
+	qp->qp_bit = qp_bit;
 	qp->cb_data = data;
 	qp->rx_handler = handlers->rx_handler;
 	qp->tx_handler = handlers->tx_handler;
@@ -1879,7 +2081,7 @@ ntb_transport_create_queue(void *data, struct device *client_dev,
 		qp->rx_dma_chan ? "DMA" : "CPU");
 
 	for (i = 0; i < nt->num_rx_entries; i++) {
-		entry = kzalloc_node(sizeof(*entry), GFP_KERNEL, node);
+		entry = ntb_queue_entry_alloc(nt, qp, node);
 		if (!entry)
 			goto err1;
 
@@ -1890,7 +2092,7 @@ ntb_transport_create_queue(void *data, struct device *client_dev,
 	qp->rx_alloc_entry = nt->num_rx_entries;
 
 	for (i = 0; i < qp->tx_max_entry; i++) {
-		entry = kzalloc_node(sizeof(*entry), GFP_KERNEL, node);
+		entry = ntb_queue_entry_alloc(nt, qp, node);
 		if (!entry)
 			goto err2;
 
@@ -1908,11 +2110,11 @@ ntb_transport_create_queue(void *data, struct device *client_dev,
 
 err2:
 	while ((entry = ntb_list_rm(&qp->ntb_tx_free_q_lock, &qp->tx_free_q)))
-		kfree(entry);
+		ntb_queue_entry_free(nt, entry);
 err1:
 	qp->rx_alloc_entry = 0;
 	while ((entry = ntb_list_rm(&qp->ntb_rx_q_lock, &qp->rx_free_q)))
-		kfree(entry);
+		ntb_queue_entry_free(nt, entry);
 	if (qp->tx_mw_dma_addr)
 		dma_unmap_resource(qp->tx_dma_chan->device->dev,
 				   qp->tx_mw_dma_addr, qp->tx_mw_size,
@@ -1935,6 +2137,7 @@ EXPORT_SYMBOL_GPL(ntb_transport_create_queue);
  */
 void ntb_transport_free_queue(struct ntb_transport_qp *qp)
 {
+	struct ntb_transport_ctx *nt;
 	struct pci_dev *pdev;
 	struct ntb_queue_entry *entry;
 	u64 qp_bit;
@@ -1942,6 +2145,7 @@ void ntb_transport_free_queue(struct ntb_transport_qp *qp)
 	if (!qp)
 		return;
 
+	nt = qp->transport;
 	pdev = qp->ndev->pdev;
 
 	qp->active = false;
@@ -1988,26 +2192,29 @@ void ntb_transport_free_queue(struct ntb_transport_qp *qp)
 
 	cancel_delayed_work_sync(&qp->link_work);
 
+	if (nt->backend->ops->qp_free)
+		nt->backend->ops->qp_free(qp);
+
 	qp->cb_data = NULL;
 	qp->rx_handler = NULL;
 	qp->tx_handler = NULL;
 	qp->event_handler = NULL;
 
 	while ((entry = ntb_list_rm(&qp->ntb_rx_q_lock, &qp->rx_free_q)))
-		kfree(entry);
+		ntb_queue_entry_free(nt, entry);
 
 	while ((entry = ntb_list_rm(&qp->ntb_rx_q_lock, &qp->rx_pend_q))) {
 		dev_warn(&pdev->dev, "Freeing item from non-empty rx_pend_q\n");
-		kfree(entry);
+		ntb_queue_entry_free(nt, entry);
 	}
 
 	while ((entry = ntb_list_rm(&qp->ntb_rx_q_lock, &qp->rx_post_q))) {
 		dev_warn(&pdev->dev, "Freeing item from non-empty rx_post_q\n");
-		kfree(entry);
+		ntb_queue_entry_free(nt, entry);
 	}
 
 	while ((entry = ntb_list_rm(&qp->ntb_tx_free_q_lock, &qp->tx_free_q)))
-		kfree(entry);
+		ntb_queue_entry_free(nt, entry);
 
 	qp->transport->qp_bitmap_free |= qp_bit;
 
@@ -2061,11 +2268,13 @@ EXPORT_SYMBOL_GPL(ntb_transport_rx_remove);
 int ntb_transport_rx_enqueue(struct ntb_transport_qp *qp, void *cb, void *data,
 			     unsigned int len)
 {
+	struct ntb_transport_ctx *nt;
 	struct ntb_queue_entry *entry;
 
 	if (!qp)
 		return -EINVAL;
 
+	nt = qp->transport;
 	entry = ntb_list_rm(&qp->ntb_rx_q_lock, &qp->rx_free_q);
 	if (!entry)
 		return -ENOMEM;
@@ -2078,12 +2287,7 @@ int ntb_transport_rx_enqueue(struct ntb_transport_qp *qp, void *cb, void *data,
 	entry->errors = 0;
 	entry->rx_index = 0;
 
-	ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_pend_q);
-
-	if (qp->active)
-		tasklet_schedule(&qp->rxc_db_work);
-
-	return 0;
+	return nt->backend->ops->rx_enqueue(qp, entry);
 }
 EXPORT_SYMBOL_GPL(ntb_transport_rx_enqueue);
 
@@ -2103,6 +2307,7 @@ EXPORT_SYMBOL_GPL(ntb_transport_rx_enqueue);
 int ntb_transport_tx_enqueue(struct ntb_transport_qp *qp, void *cb, void *data,
 			     unsigned int len)
 {
+	struct ntb_transport_ctx *nt = qp->transport;
 	struct ntb_queue_entry *entry;
 	int rc;
 
@@ -2119,15 +2324,7 @@ int ntb_transport_tx_enqueue(struct ntb_transport_qp *qp, void *cb, void *data,
 		return -EBUSY;
 	}
 
-	entry->cb_data = cb;
-	entry->buf = data;
-	entry->len = len;
-	entry->flags = 0;
-	entry->errors = 0;
-	entry->retries = 0;
-	entry->tx_index = 0;
-
-	rc = ntb_process_tx(qp, entry);
+	rc = nt->backend->ops->tx_enqueue(qp, entry, cb, data, len, 0);
 	if (rc)
 		ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry,
 			     &qp->tx_free_q);
@@ -2249,10 +2446,9 @@ EXPORT_SYMBOL_GPL(ntb_transport_max_size);
 
 unsigned int ntb_transport_tx_free_entry(struct ntb_transport_qp *qp)
 {
-	unsigned int head = qp->tx_index;
-	unsigned int tail = qp->remote_rx_info->entry;
+	struct ntb_transport_ctx *nt = qp->transport;
 
-	return tail >= head ? tail - head : qp->tx_max_entry + tail - head;
+	return nt->backend->ops->tx_free_entry(qp);
 }
 EXPORT_SYMBOL_GPL(ntb_transport_tx_free_entry);
 
@@ -2293,6 +2489,13 @@ static int __init ntb_transport_init(void)
 
 	pr_info("%s, version %s\n", NTB_TRANSPORT_DESC, NTB_TRANSPORT_VER);
 
+	rc = ntb_transport_backend_register(&default_transport_backend);
+	if (rc) {
+		pr_err("%s: failed to register default transport backend\n",
+		       NTB_TRANSPORT_NAME);
+		return rc;
+	}
+
 	if (debugfs_initialized())
 		nt_debugfs_dir = debugfs_create_dir(KBUILD_MODNAME, NULL);
 
@@ -2300,6 +2503,7 @@ static int __init ntb_transport_init(void)
 	if (rc) {
 		bus_unregister(&ntb_transport_bus);
 		debugfs_remove_recursive(nt_debugfs_dir);
+		ntb_transport_backend_unregister(&default_transport_backend);
 	}
 	return rc;
 }
@@ -2309,5 +2513,6 @@ static void __exit ntb_transport_exit(void)
 {
 	bus_unregister(&ntb_transport_bus);
 	debugfs_remove_recursive(nt_debugfs_dir);
+	ntb_transport_backend_unregister(&default_transport_backend);
 }
 module_exit(ntb_transport_exit);
diff --git a/drivers/ntb/ntb_transport_internal.h b/drivers/ntb/ntb_transport_internal.h
index 406033dbddb7..a7cc44c466ee 100644
--- a/drivers/ntb/ntb_transport_internal.h
+++ b/drivers/ntb/ntb_transport_internal.h
@@ -33,6 +33,9 @@ struct ntb_queue_entry {
 		struct ntb_payload_header __iomem *tx_hdr;
 		struct ntb_payload_header *rx_hdr;
 	};
+
+	/* Backend-specific */
+	void *priv;
 };
 
 struct ntb_rx_info {
@@ -110,6 +113,9 @@ struct ntb_transport_qp {
 	int msi_irq;
 	struct ntb_msi_desc msi_desc;
 	struct ntb_msi_desc peer_msi_desc;
+
+	/* Backend-specific */
+	void *priv;
 };
 
 struct ntb_transport_mw {
@@ -124,12 +130,74 @@ struct ntb_transport_mw {
 	dma_addr_t dma_addr;
 };
 
+/**
+ * struct ntb_transport_backend_ops - ntb_transport backend operations
+ * @enable:             Optional. Initialize backend-specific state for the
+ *                      passed @nt context on ntb_transport_attach().
+ * @disable:            Optional. Tear down backend-specific state initialized
+ *                      by @enable. Called from ntb_transport_detach() and
+ *                      attach error paths.
+ * @qp_init:            Optional. Initialize per-QP backend-specific state for
+ *                      @qp_num.
+ * @qp_free:            Optional. Tear down per-QP backend-specific state
+ *                      initialized by @qp_init.
+ * @link_up_pre:        Optional. Called before the link-up handshake.
+ * @link_up_post:       Optional. Called after the link-up handshake.
+ * @link_down:          Optional. Called when tearing down an established link.
+ * @setup_qp_mw:        Required. Program MW layout and initialize QP mappings
+ *                      for @qp_num.
+ * @entry_priv_alloc:   Optional. Allocate backend-private per-entry data.
+ *                      The returned pointer is stored in entry->priv.
+ * @entry_priv_free:    Optional. Free per-entry private data allocated by
+ *                      @entry_priv_alloc.
+ * @tx_free_entry:      Required. Return the number of free TX entries available
+ *                      for enqueue on @qp.
+ * @tx_enqueue:         Required. Backend-specific implementation of
+ *                      ntb_transport_tx_enqueue().
+ * @rx_enqueue:         Required. Backend-specific implementation of
+ *                      ntb_transport_rx_enqueue().
+ * @rx_poll:            Required. Poll RX completions and/or push newly posted
+ *                      RX buffers.
+ * @debugfs_stats_show: Required. Emit backend-specific per-QP stats into @s.
+ */
+struct ntb_transport_backend_ops {
+	int (*enable)(struct ntb_transport_ctx *nt, unsigned int *mw_count);
+	void (*disable)(struct ntb_transport_ctx *nt);
+	int (*qp_init)(struct ntb_transport_ctx *nt, unsigned int qp_num);
+	void (*qp_free)(struct ntb_transport_qp *qp);
+	int (*link_up_pre)(struct ntb_transport_ctx *nt);
+	int (*link_up_post)(struct ntb_transport_ctx *nt);
+	void (*link_down)(struct ntb_transport_ctx *nt);
+	int (*setup_qp_mw)(struct ntb_transport_ctx *nt, unsigned int qp_num);
+	void *(*entry_priv_alloc)(void);
+	void (*entry_priv_free)(void *priv);
+	unsigned int (*tx_free_entry)(struct ntb_transport_qp *qp);
+	int (*tx_enqueue)(struct ntb_transport_qp *qp,
+			  struct ntb_queue_entry *entry,
+			  void *cb, void *data, unsigned int len,
+			  unsigned int flags);
+	int (*rx_enqueue)(struct ntb_transport_qp *qp,
+			  struct ntb_queue_entry *entry);
+	void (*rx_poll)(struct ntb_transport_qp *qp);
+	void (*debugfs_stats_show)(struct seq_file *s,
+				   struct ntb_transport_qp *qp);
+};
+
+struct ntb_transport_backend {
+	const char *name;
+	const struct ntb_transport_backend_ops *ops;
+	struct module *owner;
+	struct list_head node;
+};
+
 struct ntb_transport_ctx {
 	struct list_head entry;
 	struct list_head client_devs;
 
 	struct ntb_dev *ndev;
 
+	struct ntb_transport_backend *backend;
+
 	struct ntb_transport_mw *mw_vec;
 	struct ntb_transport_qp *qp_vec;
 	unsigned int mw_count;
@@ -157,6 +225,9 @@ struct ntb_transport_ctx {
 
 	/* Make sure workq of link event be executed serially */
 	struct mutex link_event_lock;
+
+	/* Backend-specific context */
+	void *priv;
 };
 
 enum {
@@ -169,7 +240,16 @@ void ntb_list_add(spinlock_t *lock, struct list_head *entry,
 struct ntb_queue_entry *ntb_list_rm(spinlock_t *lock, struct list_head *list);
 struct ntb_queue_entry *ntb_list_mv(spinlock_t *lock, struct list_head *list,
 				    struct list_head *to_list);
+struct ntb_queue_entry *ntb_queue_entry_alloc(struct ntb_transport_ctx *nt,
+					      struct ntb_transport_qp *qp,
+					      int node);
 void ntb_qp_link_down(struct ntb_transport_qp *qp);
+int ntb_transport_init_queue(struct ntb_transport_ctx *nt,
+			     unsigned int qp_num);
+
+int ntb_transport_backend_register(struct ntb_transport_backend *b);
+void ntb_transport_backend_unregister(struct ntb_transport_backend *b);
+
 int ntb_transport_attach(struct ntb_dev *ndev, const char *backend_name,
 			 bool use_msi, unsigned long max_mw_size,
 			 unsigned int transport_mtu,
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v4 24/38] NTB: ntb_transport: Run ntb_set_mw() before link-up negotiation
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (22 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 23/38] NTB: ntb_transport: Add transport backend infrastructure Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 25/38] NTB: hw: Add remote eDMA backend registry and DesignWare backend Koichiro Den
                   ` (14 subsequent siblings)
  38 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)

Some backends may need to program BAR subrange mappings, and the
pci_epc_set_bar() submap API requires the entire BAR layout to be
provided in a single call. Since the MW programmed by ntb_set_mw() can
be the last piece needed before pci_epc_set_bar() can be called for the
BAR, deferring the call until after link-up can race with the host's
post-link-up setup.

Invoke ntb_set_mw() before the link-up handshake so the MW translation
is established early and the post-link-up setup does not depend on late
MW programming. This is safe because ntb_set_mw() reconfigures the MW
if the negotiated size turns out to differ.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/ntb/ntb_transport_core.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/ntb/ntb_transport_core.c b/drivers/ntb/ntb_transport_core.c
index 2129fa7a22d8..185d73f8ea93 100644
--- a/drivers/ntb/ntb_transport_core.c
+++ b/drivers/ntb/ntb_transport_core.c
@@ -977,6 +977,10 @@ static void ntb_transport_link_work(struct work_struct *work)
 
 		spad = MW0_SZ_LOW + (i * 2);
 		ntb_peer_spad_write(ndev, PIDX, spad, lower_32_bits(size));
+
+		rc = ntb_set_mw(nt, i, size);
+		if (rc)
+			goto out;
 	}
 
 	ntb_peer_spad_write(ndev, PIDX, NUM_MWS, nt->mw_count);
@@ -1032,6 +1036,7 @@ static void ntb_transport_link_work(struct work_struct *work)
 
 		dev_dbg(&pdev->dev, "Remote MW%d size = %#llx\n", i, val64);
 
+		/* If it turns out that the size differs, reconfigure it */
 		rc = ntb_set_mw(nt, i, val64);
 		if (rc)
 			goto out1;
-- 
2.51.0



* [RFC PATCH v4 25/38] NTB: hw: Add remote eDMA backend registry and DesignWare backend
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (23 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 24/38] NTB: ntb_transport: Run ntb_set_mw() before link-up negotiation Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 26/38] NTB: ntb_transport: Add remote embedded-DMA transport client Koichiro Den
                   ` (13 subsequent siblings)
  38 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)

Introduce a common registry for NTB remote embedded-DMA (eDMA) backends.
Vendor-specific backend drivers register with it, and the transport
backend selects an implementation based on its match() score.

Add an initial backend for Synopsys DesignWare eDMA. The backend exposes
the peer-visible eDMA register window and linked-list (LL) rings, and
provides the plumbing needed by the remote-eDMA transport backend.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/ntb/hw/Kconfig            |   1 +
 drivers/ntb/hw/Makefile           |   1 +
 drivers/ntb/hw/edma/Kconfig       |  28 +
 drivers/ntb/hw/edma/Makefile      |   5 +
 drivers/ntb/hw/edma/backend.c     |  87 +++
 drivers/ntb/hw/edma/backend.h     | 102 ++++
 drivers/ntb/hw/edma/ntb_dw_edma.c | 977 ++++++++++++++++++++++++++++++
 7 files changed, 1201 insertions(+)
 create mode 100644 drivers/ntb/hw/edma/Kconfig
 create mode 100644 drivers/ntb/hw/edma/Makefile
 create mode 100644 drivers/ntb/hw/edma/backend.c
 create mode 100644 drivers/ntb/hw/edma/backend.h
 create mode 100644 drivers/ntb/hw/edma/ntb_dw_edma.c

diff --git a/drivers/ntb/hw/Kconfig b/drivers/ntb/hw/Kconfig
index c325be526b80..4d281f258643 100644
--- a/drivers/ntb/hw/Kconfig
+++ b/drivers/ntb/hw/Kconfig
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 source "drivers/ntb/hw/amd/Kconfig"
+source "drivers/ntb/hw/edma/Kconfig"
 source "drivers/ntb/hw/idt/Kconfig"
 source "drivers/ntb/hw/intel/Kconfig"
 source "drivers/ntb/hw/epf/Kconfig"
diff --git a/drivers/ntb/hw/Makefile b/drivers/ntb/hw/Makefile
index 223ca592b5f9..05fcdd7d56b7 100644
--- a/drivers/ntb/hw/Makefile
+++ b/drivers/ntb/hw/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 obj-$(CONFIG_NTB_AMD)	+= amd/
+obj-$(CONFIG_NTB_EDMA)	+= edma/
 obj-$(CONFIG_NTB_IDT)	+= idt/
 obj-$(CONFIG_NTB_INTEL)	+= intel/
 obj-$(CONFIG_NTB_EPF)	+= epf/
diff --git a/drivers/ntb/hw/edma/Kconfig b/drivers/ntb/hw/edma/Kconfig
new file mode 100644
index 000000000000..e1e82570c8ac
--- /dev/null
+++ b/drivers/ntb/hw/edma/Kconfig
@@ -0,0 +1,28 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+config NTB_EDMA
+	tristate "NTB PCI EP embedded DMA backend registry"
+	help
+	  Common registry for NTB remote embedded-DMA (eDMA) backends.
+	  Vendor-specific backend drivers register themselves here, and the
+	  remote-eDMA transport backend (NTB_TRANSPORT_EDMA) selects a backend
+	  based on match() score.
+
+	  To compile this as a module, choose M here: the module will be called
+	  ntb_edma.
+
+	  If unsure, say N.
+
+config NTB_DW_EDMA
+	tristate "DesignWare eDMA backend for NTB PCI EP embedded DMA"
+	depends on DW_EDMA
+	select NTB_EDMA
+	select DMA_ENGINE
+	help
+	  Backend implementation for Synopsys DesignWare PCIe embedded DMA (eDMA)
+	  used with the NTB remote-eDMA transport backend.
+
+	  To compile this driver as a module, choose M here: the module will be
+	  called ntb_dw_edma.
+
+	  If unsure, say N.
diff --git a/drivers/ntb/hw/edma/Makefile b/drivers/ntb/hw/edma/Makefile
new file mode 100644
index 000000000000..993a5efd64f8
--- /dev/null
+++ b/drivers/ntb/hw/edma/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_NTB_EDMA)		+= ntb_edma.o
+ntb_edma-y			:= backend.o
+
+obj-$(CONFIG_NTB_DW_EDMA)	+= ntb_dw_edma.o
diff --git a/drivers/ntb/hw/edma/backend.c b/drivers/ntb/hw/edma/backend.c
new file mode 100644
index 000000000000..b59100c07908
--- /dev/null
+++ b/drivers/ntb/hw/edma/backend.c
@@ -0,0 +1,87 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause)
+/*
+ * Generic NTB remote PCI embedded DMA backend registry.
+ *
+ * The registry provides a vendor-agnostic rendezvous point for transport
+ * backends that want to use a peer-exposed embedded DMA engine.
+ */
+
+#include <linux/cleanup.h>
+#include <linux/errno.h>
+#include <linux/module.h>
+#include <linux/ntb.h>
+
+#include "backend.h"
+
+static LIST_HEAD(ntb_edma_backends);
+static DEFINE_MUTEX(ntb_edma_backends_lock);
+
+int ntb_edma_backend_register(struct ntb_edma_backend *be)
+{
+	struct ntb_edma_backend *tmp;
+
+	if (!be || !be->name || !be->ops)
+		return -EINVAL;
+
+	scoped_guard(mutex, &ntb_edma_backends_lock) {
+		list_for_each_entry(tmp, &ntb_edma_backends, node) {
+			if (!strcmp(tmp->name, be->name))
+				return -EEXIST;
+		}
+		list_add_tail(&be->node, &ntb_edma_backends);
+	}
+
+	ntb_bus_reprobe();
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ntb_edma_backend_register);
+
+void ntb_edma_backend_unregister(struct ntb_edma_backend *be)
+{
+	if (!be)
+		return;
+
+	guard(mutex)(&ntb_edma_backends_lock);
+	list_del_init(&be->node);
+}
+EXPORT_SYMBOL_GPL(ntb_edma_backend_unregister);
+
+const struct ntb_edma_backend *
+ntb_edma_backend_get(struct ntb_dev *ndev)
+{
+	const struct ntb_edma_backend *best = NULL, *be;
+	int best_score = INT_MIN, score;
+
+	guard(mutex)(&ntb_edma_backends_lock);
+	list_for_each_entry(be, &ntb_edma_backends, node) {
+		score = be->ops->match ? be->ops->match(ndev) : -ENODEV;
+		if (score >= 0 && score > best_score) {
+			best = be;
+			best_score = score;
+		}
+	}
+	if (best && !try_module_get(best->owner))
+		best = NULL;
+	return best;
+}
+EXPORT_SYMBOL_GPL(ntb_edma_backend_get);
+
+void ntb_edma_backend_put(const struct ntb_edma_backend *be)
+{
+	module_put(be->owner);
+}
+EXPORT_SYMBOL_GPL(ntb_edma_backend_put);
+
+static int __init ntb_edma_init(void)
+{
+	return 0;
+}
+module_init(ntb_edma_init);
+
+static void __exit ntb_edma_exit(void)
+{
+}
+module_exit(ntb_edma_exit);
+
+MODULE_DESCRIPTION("NTB remote embedded DMA backend registry");
+MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/ntb/hw/edma/backend.h b/drivers/ntb/hw/edma/backend.h
new file mode 100644
index 000000000000..c15a78fd4063
--- /dev/null
+++ b/drivers/ntb/hw/edma/backend.h
@@ -0,0 +1,102 @@
+/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) */
+
+#ifndef _NTB_HW_EDMA_BACKEND_H_
+#define _NTB_HW_EDMA_BACKEND_H_
+
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/ntb.h>
+
+#define NTB_EDMA_CH_NUM		4
+
+/*
+ * REMOTE_EDMA_EP:
+ *   Endpoint owns the eDMA engine and pushes descriptors into a shared MW.
+ *
+ * REMOTE_EDMA_RC:
+ *   Root Complex controls the endpoint eDMA through the shared MW and
+ *   drives reads/writes on behalf of the host.
+ */
+typedef enum {
+	REMOTE_EDMA_UNKNOWN,
+	REMOTE_EDMA_EP,
+	REMOTE_EDMA_RC,
+} remote_edma_mode_t;
+
+struct ntb_edma_chans {
+	struct device *dev;
+
+	struct dma_chan *chan[NTB_EDMA_CH_NUM];
+	struct dma_chan *intr_chan;
+
+	unsigned int num_chans;
+	atomic_t cur_chan;
+
+	struct mutex lock;
+};
+
+/**
+ * struct ntb_edma_backend_ops - operations for a remote embedded-DMA backend
+ *
+ * A backend provides the hardware-specific plumbing required by the
+ * ntb_transport remote-eDMA backend, such as exposing peer-mappable resources
+ * via an NTB MW, setting up DMA channels, and delivering peer notifications.
+ *
+ * @match:           Optional. Return a non-negative score if this backend
+ *                   supports @ndev. Higher score wins. Return a negative
+ *                   errno otherwise.
+ * @alloc:           Allocate backend-private per-device state and store
+ *                   it in *@priv. Called once during transport backend
+ *                   initialization.
+ * @free:            Free backend-private state allocated by @alloc.
+ * @ep_publish:      EP-side control plane. Publish peer-accessible resources
+ *                   via MW @mw_index for @qp_count queue pairs, and arm
+ *                   the notification path. When a peer notification is
+ *                   received, invoke @cb(@cb_data, qp_num).
+ * @ep_unpublish:    Undo @ep_publish.
+ * @rc_connect:      RC-side control plane. Connect to peer-published resources
+ *                   via MW @mw_index for @qp_count queue pairs.
+ * @rc_disconnect:   Undo @rc_connect.
+ * @tx_chans_init:   Initialize DMA channels used for data transfer into @chans.
+ * @tx_chans_deinit: Tear down DMA channels initialized by @tx_chans_init.
+ * @notify_peer:     Try to notify the peer about updated shared state for
+ *                   @qp_num. Return 0 if the peer has been notified (no
+ *                   doorbell fallback needed). Return a non-zero value to
+ *                   request a doorbell-based fallback.
+ */
+struct ntb_edma_backend_ops {
+	int (*match)(struct ntb_dev *ndev);
+	int (*alloc)(struct ntb_dev *ndev, void **priv);
+	void (*free)(struct ntb_dev *ndev, void **priv);
+
+	/* Control plane: EP publishes and RC connects */
+	int (*ep_publish)(struct ntb_dev *ndev, void *priv, int mw_index,
+			  unsigned int qp_count,
+			  void (*cb)(void *data, int qp_num), void *cb_data);
+	void (*ep_unpublish)(struct ntb_dev *ndev, void *priv);
+	int (*rc_connect)(struct ntb_dev *ndev, void *priv, int mw_index,
+			  unsigned int qp_count);
+	void (*rc_disconnect)(struct ntb_dev *ndev, void *priv);
+
+	/* Data plane: TX channels */
+	int (*tx_chans_init)(struct ntb_dev *ndev, void *priv,
+			     struct ntb_edma_chans *chans, bool remote);
+	void (*tx_chans_deinit)(struct ntb_edma_chans *chans);
+	int (*notify_peer)(struct ntb_edma_chans *chans, void *priv,
+			   int qp_num);
+};
+
+struct ntb_edma_backend {
+	const char *name;
+	const struct ntb_edma_backend_ops *ops;
+	struct module *owner;
+	struct list_head node;
+};
+
+int ntb_edma_backend_register(struct ntb_edma_backend *be);
+void ntb_edma_backend_unregister(struct ntb_edma_backend *be);
+const struct ntb_edma_backend *ntb_edma_backend_get(struct ntb_dev *ndev);
+void ntb_edma_backend_put(const struct ntb_edma_backend *be);
+
+#endif /* _NTB_HW_EDMA_BACKEND_H_ */
diff --git a/drivers/ntb/hw/edma/ntb_dw_edma.c b/drivers/ntb/hw/edma/ntb_dw_edma.c
new file mode 100644
index 000000000000..f4c8985889eb
--- /dev/null
+++ b/drivers/ntb/hw/edma/ntb_dw_edma.c
@@ -0,0 +1,977 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause)
+/*
+ * NTB remote DesignWare eDMA helpers
+ *
+ * This file is a helper library used by the NTB transport remote-eDMA backend,
+ * not a standalone NTB hardware driver. It contains the DesignWare eDMA
+ * specific plumbing needed to expose/map peer-accessible resources via an NTB
+ * memory window and to manage DMA channels and peer notifications.
+ */
+
+#include <linux/device.h>
+#include <linux/dma/edma.h>
+#include <linux/dmaengine.h>
+#include <linux/errno.h>
+#include <linux/interrupt.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/ntb.h>
+#include <linux/pci.h>
+#include <linux/pci-epc.h>
+#include <linux/spinlock.h>
+#include <linux/xarray.h>
+
+#include "backend.h"
+
+/* One extra channel is reserved for notification (RC to EP interrupt kick). */
+#define NTB_DW_EDMA_TOTAL_CH_NUM	(NTB_EDMA_CH_NUM + 1)
+
+#define NTB_DW_EDMA_INFO_MAGIC		0x45444D41 /* "EDMA" */
+#define NTB_DW_EDMA_NOTIFY_MAX_QP	64
+#define NTB_DW_EDMA_NR_IRQS		4
+#define NTB_DW_EDMA_MW_IDX_INVALID	(-1)
+
+/* Default eDMA LLP memory size */
+#define DMA_LLP_MEM_SIZE		PAGE_SIZE
+
+typedef void (*ntb_edma_interrupt_cb_t)(void *data, int qp_num);
+
+struct ntb_edma_ctx {
+	bool initialized;
+
+	/* Fields for the notification handling */
+	u32 qp_count;
+	u32 *notify_src_virt;
+	dma_addr_t notify_src_phys;
+	struct scatterlist sgl;
+
+	/* Host-to-EP scratch buffer used to convey event information */
+	union {
+		struct ntb_dw_edma_db *db_virt;
+		struct ntb_dw_edma_db __iomem *db_io;
+	};
+	dma_addr_t db_phys;
+
+	/* Deterministic mapping for dw-edma .irq_vector callback */
+	unsigned int peer_irq_count;
+	int peer_irq_vec[NTB_DW_EDMA_NR_IRQS];
+
+	/* For interrupts */
+	ntb_edma_interrupt_cb_t cb;
+	void *cb_data;
+
+	/* Below are the records for teardown path */
+
+	int mw_index;
+	bool mw_trans_set;
+
+	/* For ntb_dw_edma_info to be unmapped on teardown */
+	struct ntb_dw_edma_info *info_virt;
+	dma_addr_t info_phys;
+	size_t info_bytes;
+
+	/* Scratchpad backing for the unused tail of the inbound MW */
+	void *mw_pad_virt;
+	dma_addr_t mw_pad_phys;
+	size_t mw_pad_bytes;
+
+	/* eDMA register window IOMMU mapping (EP side) */
+	bool reg_mapped;
+	struct iommu_domain *iommu_dom;
+	unsigned long reg_iova;
+	size_t reg_iova_size;
+
+	/* Read channels delegated to the host side (EP side) */
+	struct dma_chan *dchan[NTB_DW_EDMA_TOTAL_CH_NUM];
+
+	/* RC-side state */
+	bool peer_initialized;
+	bool peer_probed;
+	struct dw_edma_chip *peer_chip;
+	void __iomem *peer_virt;
+	resource_size_t peer_virt_size;
+};
+
+struct ntb_dw_edma_info {
+	u32 magic;
+	u32 reg_size;
+	u16 ch_cnt;
+	u64 db_base;
+	u64 ll_rd_phys[NTB_DW_EDMA_TOTAL_CH_NUM];
+};
+
+struct ntb_dw_edma_db {
+	u32 target;
+	u32 db[NTB_DW_EDMA_NOTIFY_MAX_QP];
+};
+
+struct ntb_edma_filter {
+	struct device *dma_dev;
+	u32 direction;
+};
+
+static DEFINE_XARRAY(ntb_dw_edma_ctx_xa);
+static DEFINE_SPINLOCK(ntb_dw_edma_notify_lock);
+
+static void ntb_dw_edma_ep_unpublish(struct ntb_dev *ndev, void *priv);
+
+static int ntb_dw_edma_ctx_register(struct device *dev, struct ntb_edma_ctx *ctx)
+{
+	return xa_insert(&ntb_dw_edma_ctx_xa, (unsigned long)dev, ctx, GFP_KERNEL);
+}
+
+static void ntb_dw_edma_ctx_unregister(struct device *dev)
+{
+	xa_erase(&ntb_dw_edma_ctx_xa, (unsigned long)dev);
+}
+
+static struct ntb_edma_ctx *ntb_dw_edma_ctx_lookup(struct device *dev)
+{
+	return xa_load(&ntb_dw_edma_ctx_xa, (unsigned long)dev);
+}
+
+static bool ntb_dw_edma_filter_fn(struct dma_chan *chan, void *arg)
+{
+	struct ntb_edma_filter *filter = arg;
+	u32 dir = filter->direction;
+	struct dma_slave_caps caps;
+	int ret;
+
+	if (chan->device->dev != filter->dma_dev)
+		return false;
+
+	ret = dma_get_slave_caps(chan, &caps);
+	if (ret < 0)
+		return false;
+
+	return !!(caps.directions & dir);
+}
+
+static void ntb_dw_edma_notify_cb(struct dma_chan *dchan, void *data)
+{
+	struct ntb_edma_ctx *ctx = data;
+	ntb_edma_interrupt_cb_t cb;
+	struct ntb_dw_edma_db *db;
+	void *cb_data;
+	u32 qp_count;
+	u32 i, val;
+
+	guard(spinlock_irqsave)(&ntb_dw_edma_notify_lock);
+
+	cb = ctx->cb;
+	cb_data = ctx->cb_data;
+	qp_count = ctx->qp_count;
+	db = ctx->db_virt;
+	if (!cb || !db)
+		return;
+
+	for (i = 0; i < qp_count; i++) {
+		val = READ_ONCE(db->db[i]);
+		if (!val)
+			continue;
+
+		WRITE_ONCE(db->db[i], 0);
+		cb(cb_data, i);
+	}
+}
+
+static void ntb_dw_edma_undelegate_chans(struct ntb_edma_ctx *ctx)
+{
+	unsigned int i;
+
+	if (!ctx)
+		return;
+
+	scoped_guard(spinlock_irqsave, &ntb_dw_edma_notify_lock) {
+		ctx->cb = NULL;
+		ctx->cb_data = NULL;
+	}
+
+	for (i = 0; i < NTB_DW_EDMA_TOTAL_CH_NUM; i++) {
+		if (!ctx->dchan[i])
+			continue;
+
+		if (i == NTB_EDMA_CH_NUM)
+			dw_edma_chan_register_notify(ctx->dchan[i], NULL, NULL);
+
+		dma_release_channel(ctx->dchan[i]);
+		ctx->dchan[i] = NULL;
+	}
+}
+
+static int ntb_dw_edma_delegate_chans(struct device *dev,
+				      struct ntb_edma_ctx *ctx,
+				      struct ntb_dw_edma_info *info,
+				      ntb_edma_interrupt_cb_t cb, void *data)
+{
+	struct ntb_edma_filter filter;
+	struct dw_edma_region region;
+	dma_cap_mask_t dma_mask;
+	struct dma_chan *chan;
+	unsigned int i;
+	int rc;
+
+	dma_cap_zero(dma_mask);
+	dma_cap_set(DMA_SLAVE, dma_mask);
+
+	filter.dma_dev = dev;
+
+	/* Configure read channels, which will be driven by the host side */
+	for (i = 0; i < NTB_DW_EDMA_TOTAL_CH_NUM; i++) {
+		filter.direction = BIT(DMA_DEV_TO_MEM);
+		chan = dma_request_channel(dma_mask, ntb_dw_edma_filter_fn,
+					   &filter);
+		if (!chan) {
+			rc = -ENODEV;
+			goto err;
+		}
+		ctx->dchan[i] = chan;
+
+		if (i == NTB_EDMA_CH_NUM) {
+			scoped_guard(spinlock_irqsave, &ntb_dw_edma_notify_lock) {
+				ctx->cb = cb;
+				ctx->cb_data = data;
+			}
+			rc = dw_edma_chan_register_notify(chan,
+							  ntb_dw_edma_notify_cb,
+							  ctx);
+			if (rc)
+				goto err;
+		} else {
+			rc = dw_edma_chan_irq_config(chan, DW_EDMA_CH_IRQ_REMOTE);
+			if (rc)
+				dev_warn(dev, "irq config failed (i=%u %d)\n",
+					 i, rc);
+		}
+
+		rc = dw_edma_chan_get_ll_region(chan, &region);
+		if (rc)
+			goto err;
+
+		info->ll_rd_phys[i] = region.paddr;
+	}
+
+	return 0;
+
+err:
+	ntb_dw_edma_undelegate_chans(ctx);
+	return rc;
+}
+
+static void ntb_dw_edma_ctx_reset(struct ntb_edma_ctx *ctx)
+{
+	ctx->initialized = false;
+	ctx->mw_index = NTB_DW_EDMA_MW_IDX_INVALID;
+	ctx->mw_trans_set = false;
+	ctx->reg_mapped = false;
+	ctx->iommu_dom = NULL;
+	ctx->reg_iova = 0;
+	ctx->reg_iova_size = 0;
+	ctx->db_phys = 0;
+	ctx->qp_count = 0;
+	ctx->info_virt = NULL;
+	ctx->info_phys = 0;
+	ctx->info_bytes = 0;
+	ctx->mw_pad_virt = NULL;
+	ctx->mw_pad_phys = 0;
+	ctx->mw_pad_bytes = 0;
+	ctx->db_virt = NULL;
+	memset(ctx->dchan, 0, sizeof(ctx->dchan));
+}
+
+static int ntb_dw_edma_match(struct ntb_dev *ndev)
+{
+	struct pci_epc *epc;
+	phys_addr_t reg_phys;
+	resource_size_t reg_size;
+
+	/* The EP can verify local DesignWare eDMA presence via the EPC hook. */
+	epc = ntb_get_private_data(ndev);
+	if (epc) {
+		if (dw_edma_get_reg_window(epc, &reg_phys, &reg_size))
+			return -ENODEV;
+		return 100;
+	}
+
+	/* Host cannot validate peer eDMA until link/peer mapping is done. */
+	return 50;
+}
+
+static int ntb_dw_edma_alloc(struct ntb_dev *ndev, void **priv)
+{
+	struct ntb_edma_ctx *ctx;
+
+	ctx = devm_kzalloc(&ndev->dev, sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	*priv = ctx;
+	return 0;
+}
+
+static void ntb_dw_edma_free(struct ntb_dev *ndev, void **priv)
+{
+	devm_kfree(&ndev->dev, *priv);
+	*priv = NULL;
+}
+
+static int ntb_dw_edma_ep_publish(struct ntb_dev *ndev, void *priv,
+				  int mw_index, unsigned int qp_count,
+				  ntb_edma_interrupt_cb_t cb, void *data)
+{
+	struct ntb_edma_ctx *ctx = priv;
+	struct ntb_dw_edma_info *info;
+	struct ntb_dw_edma_db *db;
+	struct iommu_domain *dom;
+	struct pci_epc *epc;
+	struct device *dev;
+	unsigned int num_subrange = NTB_DW_EDMA_TOTAL_CH_NUM + 3;
+	resource_size_t reg_size, reg_size_mw;
+	const size_t info_bytes = PAGE_SIZE;
+	dma_addr_t db_phys, info_phys;
+	phys_addr_t edma_reg_phys;
+	resource_size_t size_max;
+	size_t ll_bytes, size;
+	unsigned int cur = 0;
+	u64 need;
+	int rc;
+	u32 i;
+
+	if (ctx->initialized)
+		return 0;
+
+	/* Clean up stale state from a previous failed attempt. */
+	ntb_dw_edma_ep_unpublish(ndev, ctx);
+
+	epc = ntb_get_private_data(ndev);
+	if (!epc)
+		return -ENODEV;
+	dev = epc->dev.parent;
+
+	ntb_dw_edma_ctx_reset(ctx);
+
+	ctx->mw_index = mw_index;
+	ctx->qp_count = qp_count;
+
+	info = dma_alloc_coherent(dev, info_bytes, &info_phys, GFP_KERNEL);
+	if (!info)
+		return -ENOMEM;
+
+	ctx->info_virt = info;
+	ctx->info_phys = info_phys;
+	ctx->info_bytes = info_bytes;
+
+	/* Get eDMA reg base and size, IOMMU map it if necessary */
+	rc = dw_edma_get_reg_window(epc, &edma_reg_phys, &reg_size);
+	if (rc) {
+		dev_err(&ndev->pdev->dev,
+			"failed to get eDMA register window: %d\n", rc);
+		goto err;
+	}
+	dom = iommu_get_domain_for_dev(dev);
+	if (dom) {
+		phys_addr_t phys;
+		unsigned long iova;
+
+		phys = edma_reg_phys & PAGE_MASK;
+		size = PAGE_ALIGN(reg_size + edma_reg_phys - phys);
+		iova = phys;
+
+		rc = iommu_map(dom, iova, phys, size,
+			       IOMMU_READ | IOMMU_WRITE | IOMMU_MMIO,
+			       GFP_KERNEL);
+		if (rc) {
+			dev_err(&ndev->dev,
+				"failed to direct map eDMA reg: %d\n", rc);
+			goto err;
+		}
+
+		ctx->reg_mapped = true;
+		ctx->iommu_dom = dom;
+		ctx->reg_iova = iova;
+		ctx->reg_iova_size = size;
+	}
+
+	/* Read channels are driven by the peer (host side) */
+	rc = ntb_dw_edma_delegate_chans(dev, ctx, info, cb, data);
+	if (rc) {
+		dev_err(&ndev->pdev->dev,
+			"failed to prepare channels to delegate: %d\n", rc);
+		goto err;
+	}
+
+	/* Scratch buffer for notification */
+	db = dma_alloc_coherent(dev, sizeof(*db), &db_phys, GFP_KERNEL);
+	if (!db) {
+		rc = -ENOMEM;
+		goto err;
+	}
+
+	ctx->db_virt = db;
+	ctx->db_phys = db_phys;
+
+	/* Prepare the inbound iATU mappings */
+	ll_bytes = NTB_DW_EDMA_TOTAL_CH_NUM * DMA_LLP_MEM_SIZE;
+	reg_size_mw = roundup_pow_of_two(reg_size);
+	need = info_bytes + PAGE_SIZE + reg_size_mw + ll_bytes;
+
+	rc = ntb_mw_get_align(ndev, 0, mw_index, NULL, NULL, &size_max);
+	if (rc)
+		goto err;
+
+	if (size_max < need) {
+		rc = -ENOSPC;
+		goto err;
+	}
+
+	if (need < size_max)
+		num_subrange++;
+
+	struct ntb_mw_subrange *r __free(kfree) =
+				kcalloc(num_subrange, sizeof(*r), GFP_KERNEL);
+	if (!r) {
+		rc = -ENOMEM;
+		goto err;
+	}
+
+	ctx->mw_trans_set = true;
+
+	/* iATU map ntb_dw_edma_info */
+	r[cur].addr = info_phys;
+	r[cur++].size = info_bytes;
+
+	/* iATU map ntb_dw_edma_db */
+	r[cur].addr = db_phys;
+	r[cur++].size = PAGE_SIZE;
+
+	/* iATU map eDMA reg */
+	r[cur].addr = edma_reg_phys;
+	r[cur++].size = reg_size_mw;
+
+	/* iATU map LL location */
+	for (i = 0; i < NTB_DW_EDMA_TOTAL_CH_NUM; i++) {
+		r[cur].addr = info->ll_rd_phys[i];
+		r[cur++].size = DMA_LLP_MEM_SIZE;
+	}
+
+	/* Padding if needed */
+	if (size_max > need) {
+		resource_size_t pad_bytes = size_max - need;
+		dma_addr_t pad_phys;
+		void *pad;
+
+		pad = dma_alloc_coherent(dev, pad_bytes, &pad_phys, GFP_KERNEL);
+		if (!pad) {
+			rc = -ENOMEM;
+			goto err;
+		}
+
+		ctx->mw_pad_virt = pad;
+		ctx->mw_pad_phys = pad_phys;
+		ctx->mw_pad_bytes = pad_bytes;
+
+		r[cur].addr = pad_phys;
+		r[cur++].size = pad_bytes;
+	}
+
+	rc = ntb_mw_set_trans_ranges(ndev, 0, mw_index, num_subrange, r);
+	if (rc)
+		goto err;
+
+	/* Fill in info */
+	info->magic = NTB_DW_EDMA_INFO_MAGIC;
+	info->reg_size = reg_size_mw;
+	info->ch_cnt = NTB_DW_EDMA_TOTAL_CH_NUM;
+	info->db_base = db_phys;
+
+	ctx->initialized = true;
+	return 0;
+
+err:
+	ntb_dw_edma_ep_unpublish(ndev, ctx);
+	return rc;
+}
+
+static void ntb_dw_edma_peer_irq_reset(struct ntb_edma_ctx *ctx)
+{
+	ctx->peer_irq_count = 0;
+	memset(ctx->peer_irq_vec, 0xff, sizeof(ctx->peer_irq_vec));
+}
+
+static int ntb_dw_edma_reserve_peer_irq_vectors(struct pci_dev *pdev,
+						struct ntb_edma_ctx *ctx,
+						unsigned int nreq)
+{
+	int i, found = 0;
+	int irq;
+
+	if (nreq > NTB_DW_EDMA_NR_IRQS)
+		return -EINVAL;
+
+	ntb_dw_edma_peer_irq_reset(ctx);
+
+	/* The NTB driver should have reserved a sufficient number of vectors */
+	for (i = 0; found < nreq; i++) {
+		irq = pci_irq_vector(pdev, i);
+		if (irq < 0)
+			break;
+		if (!irq_has_action(irq))
+			ctx->peer_irq_vec[found++] = i;
+	}
+	if (found < nreq)
+		return -ENOSPC;
+
+	ctx->peer_irq_count = found;
+	return 0;
+}
+
+static int ntb_dw_edma_irq_vector(struct device *dev, unsigned int nr)
+{
+	struct ntb_edma_ctx *ctx = ntb_dw_edma_ctx_lookup(dev);
+	struct pci_dev *pdev = to_pci_dev(dev);
+	int vec;
+
+	if (!ctx)
+		return -EINVAL;
+
+	if (nr >= ctx->peer_irq_count)
+		return -EINVAL;
+
+	vec = ctx->peer_irq_vec[nr];
+	if (vec < 0)
+		return -EINVAL;
+
+	return pci_irq_vector(pdev, vec);
+}
+
+static const struct dw_edma_plat_ops ntb_dw_edma_ops = {
+	.irq_vector     = ntb_dw_edma_irq_vector,
+};
+
+static void ntb_dw_edma_rc_disconnect(struct ntb_dev *ndev, void *priv)
+{
+	struct ntb_edma_ctx *ctx = priv;
+	void __iomem *peer_virt = ctx->peer_virt;
+	struct dw_edma_chip *chip = ctx->peer_chip;
+	u32 *notify_src = ctx->notify_src_virt;
+	dma_addr_t notify_src_phys = ctx->notify_src_phys;
+
+	/* Stop using peer MMIO early. */
+	ctx->db_io = NULL;
+	ctx->db_phys = 0;
+	ctx->qp_count = 0;
+
+	if (ctx->peer_probed && chip)
+		dw_edma_remove(chip);
+
+	ntb_dw_edma_ctx_unregister(&ndev->pdev->dev);
+
+	ntb_dw_edma_peer_irq_reset(ctx);
+
+	ctx->peer_initialized = false;
+	ctx->peer_probed = false;
+	ctx->peer_chip = NULL;
+
+	if (notify_src)
+		dma_free_coherent(&ndev->pdev->dev, sizeof(*notify_src),
+				  notify_src, notify_src_phys);
+
+	ctx->notify_src_virt = NULL;
+	ctx->notify_src_phys = 0;
+	memset(&ctx->sgl, 0, sizeof(ctx->sgl));
+
+	if (peer_virt)
+		iounmap(peer_virt);
+
+	ctx->peer_virt = NULL;
+	ctx->peer_virt_size = 0;
+}
+
+static int ntb_dw_edma_rc_connect(struct ntb_dev *ndev, void *priv, int mw_index,
+				  unsigned int qp_count)
+{
+	struct ntb_edma_ctx *ctx = priv;
+	struct ntb_dw_edma_info __iomem *info;
+	struct dw_edma_chip *chip;
+	void __iomem *edma_virt;
+	resource_size_t mw_size;
+	phys_addr_t edma_phys;
+	unsigned int ch_cnt;
+	unsigned int i;
+	int ret;
+	u64 off;
+
+	if (ctx->peer_initialized)
+		return 0;
+
+	/* Clean up stale state from a previous failed attempt. */
+	ntb_dw_edma_rc_disconnect(ndev, priv);
+
+	ret = ntb_peer_mw_get_addr(ndev, mw_index, &edma_phys, &mw_size);
+	if (ret)
+		return ret;
+
+	edma_virt = ioremap(edma_phys, mw_size);
+	if (!edma_virt)
+		return -ENOMEM;
+
+	ctx->peer_virt = edma_virt;
+	ctx->peer_virt_size = mw_size;
+
+	info = edma_virt;
+	if (readl(&info->magic) != NTB_DW_EDMA_INFO_MAGIC) {
+		ret = -EINVAL;
+		goto err;
+	}
+
+	ch_cnt = readw(&info->ch_cnt);
+	if (ch_cnt != NTB_DW_EDMA_TOTAL_CH_NUM) {
+		ret = -EINVAL;
+		goto err;
+	}
+
+	chip = devm_kzalloc(&ndev->dev, sizeof(*chip), GFP_KERNEL);
+	if (!chip) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	ret = ntb_dw_edma_ctx_register(&ndev->pdev->dev, ctx);
+	if (ret)
+		goto err;
+
+	off = 2 * PAGE_SIZE;
+	chip->dev = &ndev->pdev->dev;
+	chip->nr_irqs = NTB_DW_EDMA_NR_IRQS;
+	chip->ops = &ntb_dw_edma_ops;
+	chip->flags = 0;
+	chip->reg_base = edma_virt + off;
+	chip->mf = EDMA_MF_EDMA_UNROLL;
+	chip->ll_wr_cnt = 0;
+	chip->ll_rd_cnt = ch_cnt;
+
+	ctx->db_io = edma_virt + PAGE_SIZE;
+	ctx->qp_count = qp_count;
+	ctx->db_phys = readq(&info->db_base);
+
+	ctx->notify_src_virt = dma_alloc_coherent(&ndev->pdev->dev,
+						  sizeof(*ctx->notify_src_virt),
+						  &ctx->notify_src_phys,
+						  GFP_KERNEL);
+	if (!ctx->notify_src_virt) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	off += readl(&info->reg_size);
+
+	for (i = 0; i < ch_cnt; i++) {
+		chip->ll_region_rd[i].vaddr.io = edma_virt + off;
+		chip->ll_region_rd[i].paddr = readq(&info->ll_rd_phys[i]);
+		chip->ll_region_rd[i].sz = DMA_LLP_MEM_SIZE;
+		off += DMA_LLP_MEM_SIZE;
+	}
+
+	if (!pci_dev_msi_enabled(ndev->pdev)) {
+		ret = -ENXIO;
+		goto err;
+	}
+	ret = ntb_dw_edma_reserve_peer_irq_vectors(ndev->pdev, ctx, chip->nr_irqs);
+	if (ret) {
+		dev_err(&ndev->dev, "no free MSI vectors for remote eDMA: %d\n",
+			ret);
+		goto err;
+	}
+
+	ret = dw_edma_probe(chip);
+	if (ret) {
+		dev_err(&ndev->dev, "dw_edma_probe failed: %d\n", ret);
+		ntb_dw_edma_ctx_unregister(&ndev->pdev->dev);
+		goto err;
+	}
+
+	ctx->peer_chip = chip;
+	ctx->peer_probed = true;
+	ctx->peer_initialized = true;
+	return 0;
+
+err:
+	ntb_dw_edma_rc_disconnect(ndev, ctx);
+	return ret;
+}
+
+static void ntb_dw_edma_ep_unpublish(struct ntb_dev *ndev, void *priv)
+{
+	struct ntb_edma_ctx *ctx = priv;
+	struct ntb_dw_edma_info *info;
+	struct ntb_dw_edma_db *db;
+	struct device *dev = NULL;
+	struct pci_epc *epc;
+	dma_addr_t db_phys, info_phys, mw_pad_phys;
+	size_t info_bytes, mw_pad_bytes;
+	void *mw_pad;
+
+	epc = ntb_get_private_data(ndev);
+	WARN_ON(!epc);
+	if (epc)
+		dev = epc->dev.parent;
+
+	scoped_guard(spinlock_irqsave, &ntb_dw_edma_notify_lock) {
+		db = ctx->db_virt;
+		db_phys = ctx->db_phys;
+
+		/* Make callbacks no-op first. */
+		ctx->cb = NULL;
+		ctx->cb_data = NULL;
+		ctx->db_virt = NULL;
+		ctx->qp_count = 0;
+	}
+
+	info = ctx->info_virt;
+	info_phys = ctx->info_phys;
+	info_bytes = ctx->info_bytes;
+
+	mw_pad = ctx->mw_pad_virt;
+	mw_pad_phys = ctx->mw_pad_phys;
+	mw_pad_bytes = ctx->mw_pad_bytes;
+	ctx->mw_pad_virt = NULL;
+	ctx->mw_pad_phys = 0;
+	ctx->mw_pad_bytes = 0;
+
+	/* Disconnect the MW before freeing its backing memory */
+	if (ctx->mw_trans_set && ctx->mw_index != NTB_DW_EDMA_MW_IDX_INVALID)
+		ntb_mw_clear_trans(ndev, 0, ctx->mw_index);
+
+	ntb_dw_edma_undelegate_chans(ctx);
+
+	if (ctx->reg_mapped)
+		iommu_unmap(ctx->iommu_dom, ctx->reg_iova, ctx->reg_iova_size);
+
+	if (db && dev)
+		dma_free_coherent(dev, sizeof(*db), db, db_phys);
+
+	if (info && dev && info_bytes)
+		dma_free_coherent(dev, info_bytes, info, info_phys);
+
+	if (mw_pad && dev && mw_pad_bytes)
+		dma_free_coherent(dev, mw_pad_bytes, mw_pad, mw_pad_phys);
+
+	ntb_dw_edma_ctx_reset(ctx);
+}
+
+static void ntb_dw_edma_tx_chans_deinit(struct ntb_edma_chans *edma)
+{
+	unsigned int i;
+
+	if (!edma)
+		return;
+
+	for (i = 0; i < NTB_EDMA_CH_NUM; i++) {
+		if (!edma->chan[i])
+			continue;
+		dmaengine_terminate_sync(edma->chan[i]);
+		dma_release_channel(edma->chan[i]);
+		edma->chan[i] = NULL;
+	}
+	edma->num_chans = 0;
+
+	if (edma->intr_chan) {
+		dmaengine_terminate_sync(edma->intr_chan);
+		dma_release_channel(edma->intr_chan);
+		edma->intr_chan = NULL;
+	}
+
+	atomic_set(&edma->cur_chan, 0);
+}
+
+static int ntb_dw_edma_setup_intr_chan(struct device *dev,
+				       struct ntb_edma_chans *edma, void *priv)
+{
+	struct ntb_edma_ctx *ctx = priv;
+	struct ntb_edma_filter filter;
+	dma_cap_mask_t dma_mask;
+	struct dma_slave_config cfg;
+	struct scatterlist *sgl = &ctx->sgl;
+	int rc;
+
+	if (edma->intr_chan)
+		return 0;
+
+	if (!ctx->notify_src_virt || !ctx->db_phys)
+		return -EINVAL;
+
+	dma_cap_zero(dma_mask);
+	dma_cap_set(DMA_SLAVE, dma_mask);
+
+	filter.dma_dev = dev;
+	filter.direction = BIT(DMA_MEM_TO_DEV);
+
+	edma->intr_chan = dma_request_channel(dma_mask, ntb_dw_edma_filter_fn,
+					      &filter);
+	if (!edma->intr_chan) {
+		dev_warn(dev,
+			 "Remote eDMA notify channel could not be allocated\n");
+		return -ENODEV;
+	}
+
+	rc = dw_edma_chan_irq_config(edma->intr_chan, DW_EDMA_CH_IRQ_LOCAL);
+	if (rc)
+		goto err_release;
+
+	sg_init_table(sgl, 1);
+	sg_dma_address(sgl) = ctx->notify_src_phys;
+	sg_dma_len(sgl) = sizeof(u32);
+
+	memset(&cfg, 0, sizeof(cfg));
+	cfg.dst_addr = ctx->db_phys; /* the first 32 bits hold 'target' */
+	cfg.src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
+	cfg.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
+	cfg.direction = DMA_MEM_TO_DEV;
+
+	rc = dmaengine_slave_config(edma->intr_chan, &cfg);
+	if (rc)
+		goto err_release;
+
+	return 0;
+
+err_release:
+	dma_release_channel(edma->intr_chan);
+	edma->intr_chan = NULL;
+	return rc;
+}
+
+static int ntb_dw_edma_tx_chans_init(struct ntb_dev *ndev, void *priv,
+				     struct ntb_edma_chans *edma, bool remote)
+{
+	struct device *dev = ntb_get_dma_dev(ndev);
+	struct ntb_edma_filter filter;
+	dma_cap_mask_t dma_mask;
+	unsigned int i;
+	int rc;
+
+	dma_cap_zero(dma_mask);
+	dma_cap_set(DMA_SLAVE, dma_mask);
+
+	memset(edma, 0, sizeof(*edma));
+	edma->dev = dev;
+
+	mutex_init(&edma->lock);
+
+	filter.dma_dev = dev;
+	filter.direction = BIT(DMA_MEM_TO_DEV);
+	for (i = 0; i < NTB_EDMA_CH_NUM; i++) {
+		edma->chan[i] = dma_request_channel(dma_mask,
+						    ntb_dw_edma_filter_fn,
+						    &filter);
+		if (!edma->chan[i])
+			break;
+		edma->num_chans++;
+
+		if (remote)
+			rc = dw_edma_chan_irq_config(edma->chan[i],
+						     DW_EDMA_CH_IRQ_REMOTE);
+		else
+			rc = dw_edma_chan_irq_config(edma->chan[i],
+						     DW_EDMA_CH_IRQ_LOCAL);
+
+		if (rc) {
+			dev_err(dev, "irq config failed on ch%u: %d\n", i, rc);
+			goto err;
+		}
+	}
+
+	if (!edma->num_chans) {
+		dev_warn(dev, "No remote eDMA channels could be allocated\n");
+		ntb_dw_edma_tx_chans_deinit(edma);
+		return -ENODEV;
+	}
+
+	if (remote) {
+		rc = ntb_dw_edma_setup_intr_chan(dev, edma, priv);
+		if (rc)
+			goto err;
+	}
+	return 0;
+err:
+	ntb_dw_edma_tx_chans_deinit(edma);
+	return rc;
+}
+
+static int ntb_dw_edma_notify_peer(struct ntb_edma_chans *edma, void *priv,
+				   int qp_num)
+{
+	struct ntb_edma_ctx *ctx = priv;
+	struct dma_async_tx_descriptor *txd;
+	dma_cookie_t cookie;
+
+	if (!edma || !edma->intr_chan)
+		return -ENXIO;
+
+	if (qp_num < 0 || qp_num >= ctx->qp_count)
+		return -EINVAL;
+
+	if (!ctx->db_io)
+		return -EINVAL;
+
+	guard(mutex)(&edma->lock);
+
+	writel(1, &ctx->db_io->db[qp_num]);
+
+	/* Ensure store is visible before kicking the DMA transfer */
+	wmb();
+
+	txd = dmaengine_prep_slave_sg(edma->intr_chan, &ctx->sgl, 1,
+				      DMA_MEM_TO_DEV,
+				      DMA_CTRL_ACK | DMA_PREP_INTERRUPT);
+	if (!txd)
+		return -ENOSPC;
+
+	cookie = dmaengine_submit(txd);
+	if (dma_submit_error(cookie))
+		return -ENOSPC;
+
+	dma_async_issue_pending(edma->intr_chan);
+	return 0;
+}
+
+static const struct ntb_edma_backend_ops ntb_dw_edma_backend_ops = {
+	.match = ntb_dw_edma_match,
+	.alloc = ntb_dw_edma_alloc,
+	.free = ntb_dw_edma_free,
+
+	.ep_publish = ntb_dw_edma_ep_publish,
+	.ep_unpublish = ntb_dw_edma_ep_unpublish,
+	.rc_connect = ntb_dw_edma_rc_connect,
+	.rc_disconnect = ntb_dw_edma_rc_disconnect,
+
+	.tx_chans_init = ntb_dw_edma_tx_chans_init,
+	.tx_chans_deinit = ntb_dw_edma_tx_chans_deinit,
+	.notify_peer = ntb_dw_edma_notify_peer,
+};
+
+static struct ntb_edma_backend ntb_dw_edma_backend = {
+	.name = "dw-edma",
+	.ops  = &ntb_dw_edma_backend_ops,
+	.owner = THIS_MODULE,
+};
+
+static int __init ntb_dw_edma_init(void)
+{
+	return ntb_edma_backend_register(&ntb_dw_edma_backend);
+}
+module_init(ntb_dw_edma_init);
+
+static void __exit ntb_dw_edma_exit(void)
+{
+	ntb_edma_backend_unregister(&ntb_dw_edma_backend);
+}
+module_exit(ntb_dw_edma_exit);
+
+MODULE_DESCRIPTION("NTB DW EPC eDMA backend");
+MODULE_LICENSE("Dual BSD/GPL");
-- 
2.51.0



* [RFC PATCH v4 26/38] NTB: ntb_transport: Add remote embedded-DMA transport client
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (24 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 25/38] NTB: hw: Add remote eDMA backend registry and DesignWare backend Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 27/38] ntb_netdev: Multi-queue support Koichiro Den
                   ` (12 subsequent siblings)
  38 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

Introduce a new NTB transport client (ntb_client) that uses a PCI
endpoint embedded DMA engine to move data between the endpoint and the
host.

Unlike the existing cpu/dma memcpy-based transport, this transport
offloads the data plane to an embedded DMA engine located on the
endpoint and driven by the remote host. Control and queue management
remain on the peer-exposed memory window, while bulk data movement is
performed by the remote embedded DMA engine.

This transport requires a different memory window layout from the
traditional NTB transport. A key benefit of this client implementation
is that the memory window no longer needs to carry data buffers.  This
makes the design less sensitive to limited memory window space and
allows scaling to multiple queue pairs.

The transport itself is generic and does not assume a specific vendor's
DMA implementation. Support for concrete embedded DMA engines is
provided via the ntb_edma backend registry. The initial backend
implementation is ntb_dw_edma, which integrates with the DesignWare eDMA
driver.

This separation allows additional embedded DMA backends to be added in
the future without changing the NTB transport core or client logic.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/ntb/Kconfig              |   13 +
 drivers/ntb/Makefile             |    1 +
 drivers/ntb/ntb_transport_edma.c | 1110 ++++++++++++++++++++++++++++++
 3 files changed, 1124 insertions(+)
 create mode 100644 drivers/ntb/ntb_transport_edma.c

diff --git a/drivers/ntb/Kconfig b/drivers/ntb/Kconfig
index df16c755b4da..0dfb89ec290c 100644
--- a/drivers/ntb/Kconfig
+++ b/drivers/ntb/Kconfig
@@ -37,4 +37,17 @@ config NTB_TRANSPORT
 
 	 If unsure, say N.
 
+config NTB_TRANSPORT_EDMA
+	tristate "NTB Transport Client on PCI EP embedded DMA"
+	depends on NTB_TRANSPORT
+	select NTB_EDMA
+	help
+	 Enable a transport backend that uses a peer-exposed PCI embedded DMA
+	 engine through a dedicated NTB memory window.
+
+	 NOTE: You also need at least one eDMA backend driver enabled/loaded
+	 (e.g. NTB_DW_EDMA) so the transport can find a matching backend.
+
+	 If unsure, say N.
+
 endif # NTB
diff --git a/drivers/ntb/Makefile b/drivers/ntb/Makefile
index 47e6b95ef7ce..7bb952a1cf8f 100644
--- a/drivers/ntb/Makefile
+++ b/drivers/ntb/Makefile
@@ -2,6 +2,7 @@
 obj-$(CONFIG_NTB) += ntb.o hw/ test/
 obj-$(CONFIG_NTB_TRANSPORT) += ntb_transport.o
 obj-$(CONFIG_NTB_TRANSPORT) += ntb_transport_core.o
+obj-$(CONFIG_NTB_TRANSPORT_EDMA) += ntb_transport_edma.o
 
 ntb-y			:= core.o
 ntb-$(CONFIG_NTB_MSI)	+= msi.o
diff --git a/drivers/ntb/ntb_transport_edma.c b/drivers/ntb/ntb_transport_edma.c
new file mode 100644
index 000000000000..778143a15930
--- /dev/null
+++ b/drivers/ntb/ntb_transport_edma.c
@@ -0,0 +1,1110 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+/*
+ * NTB transport backend for remote embedded DMA (eDMA).
+ *
+ * The backend uses an endpoint-exposed embedded DMA engine via an NTB
+ * memory window. Hardware-specific details are provided by an ntb_edma
+ * backend driver.
+ */
+
+#include <linux/bug.h>
+#include <linux/compiler.h>
+#include <linux/debugfs.h>
+#include <linux/dmaengine.h>
+#include <linux/dma-mapping.h>
+#include <linux/errno.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
+#include <linux/module.h>
+#include <linux/ntb.h>
+#include <linux/ntb_transport.h>
+#include <linux/pci.h>
+#include <linux/pci-epc.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+
+#include "ntb_transport_internal.h"
+#include "hw/edma/backend.h"
+
+static unsigned long max_mw_size;
+module_param(max_mw_size, ulong, 0644);
+MODULE_PARM_DESC(max_mw_size, "Limit size of large memory windows");
+
+static unsigned char max_num_clients;
+module_param(max_num_clients, byte, 0644);
+MODULE_PARM_DESC(max_num_clients, "Maximum number of NTB transport clients");
+
+#define NTB_EDMA_RING_ORDER	7
+#define NTB_EDMA_RING_ENTRIES	BIT(NTB_EDMA_RING_ORDER)
+#define NTB_EDMA_RING_MASK	(NTB_EDMA_RING_ENTRIES - 1)
+
+#define NTB_EDMA_MAX_POLL	32
+
+/*
+ * Remote eDMA mode implementation
+ */
+struct ntb_queue_entry_edma {
+	dma_addr_t addr;
+	struct scatterlist sgl;
+};
+
+struct ntb_transport_ctx_edma {
+	remote_edma_mode_t remote_edma_mode;
+	struct device *dma_dev;
+	struct workqueue_struct *wq;
+	struct ntb_edma_chans chans;
+
+	const struct ntb_edma_backend *be;
+	void *be_priv;
+};
+
+struct ntb_transport_qp_edma {
+	struct ntb_transport_qp *qp;
+
+	/*
+	 * Schedule peer notification from a sleepable context.
+	 * ntb_peer_db_set() may sleep.
+	 */
+	struct work_struct db_work;
+
+	u32 rx_prod;
+	u32 rx_cons;
+	u32 tx_cons;
+	u32 tx_issue;
+
+	spinlock_t rx_lock;
+	spinlock_t tx_lock;
+
+	struct work_struct rx_work;
+	struct work_struct tx_work;
+};
+
+struct ntb_edma_desc {
+	u32 len;
+	u32 flags;
+	u64 addr; /* DMA address */
+	u64 data;
+};
+
+struct ntb_edma_ring {
+	struct ntb_edma_desc desc[NTB_EDMA_RING_ENTRIES];
+	u32 head;
+	u32 tail;
+};
+
+static inline bool ntb_qp_edma_is_rc(struct ntb_transport_qp *qp)
+{
+	struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
+
+	return ctx->remote_edma_mode == REMOTE_EDMA_RC;
+}
+
+static inline bool ntb_qp_edma_is_ep(struct ntb_transport_qp *qp)
+{
+	struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
+
+	return ctx->remote_edma_mode == REMOTE_EDMA_EP;
+}
+
+static inline bool ntb_qp_edma_enabled(struct ntb_transport_qp *qp)
+{
+	return ntb_qp_edma_is_rc(qp) || ntb_qp_edma_is_ep(qp);
+}
+
+static inline unsigned int ntb_edma_ring_sel(struct ntb_transport_qp *qp,
+					     unsigned int n)
+{
+	return n ^ !!ntb_qp_edma_is_ep(qp);
+}
+
+static inline struct ntb_edma_ring *
+ntb_edma_ring_local(struct ntb_transport_qp *qp, unsigned int n)
+{
+	unsigned int r = ntb_edma_ring_sel(qp, n);
+
+	return &((struct ntb_edma_ring *)qp->rx_buff)[r];
+}
+
+static inline struct ntb_edma_ring __iomem *
+ntb_edma_ring_remote(struct ntb_transport_qp *qp, unsigned int n)
+{
+	unsigned int r = ntb_edma_ring_sel(qp, n);
+
+	return &((struct ntb_edma_ring __iomem *)qp->tx_mw)[r];
+}
+
+static inline struct ntb_edma_desc *
+ntb_edma_desc_local(struct ntb_transport_qp *qp, unsigned int n, unsigned int i)
+{
+	return &ntb_edma_ring_local(qp, n)->desc[i];
+}
+
+static inline struct ntb_edma_desc __iomem *
+ntb_edma_desc_remote(struct ntb_transport_qp *qp, unsigned int n,
+		     unsigned int i)
+{
+	return &ntb_edma_ring_remote(qp, n)->desc[i];
+}
+
+static inline u32 *ntb_edma_head_local(struct ntb_transport_qp *qp,
+				       unsigned int n)
+{
+	return &ntb_edma_ring_local(qp, n)->head;
+}
+
+static inline u32 __iomem *ntb_edma_head_remote(struct ntb_transport_qp *qp,
+						unsigned int n)
+{
+	return &ntb_edma_ring_remote(qp, n)->head;
+}
+
+static inline u32 *ntb_edma_tail_local(struct ntb_transport_qp *qp,
+				       unsigned int n)
+{
+	return &ntb_edma_ring_local(qp, n)->tail;
+}
+
+static inline u32 __iomem *ntb_edma_tail_remote(struct ntb_transport_qp *qp,
+						unsigned int n)
+{
+	return &ntb_edma_ring_remote(qp, n)->tail;
+}
+
+/* The 'i' must be generated by ntb_edma_ring_idx() */
+#define NTB_DESC_TX_O(qp, i)	ntb_edma_desc_remote(qp, 0, i)
+#define NTB_DESC_TX_I(qp, i)	ntb_edma_desc_local(qp, 0, i)
+#define NTB_DESC_RX_O(qp, i)	ntb_edma_desc_remote(qp, 1, i)
+#define NTB_DESC_RX_I(qp, i)	ntb_edma_desc_local(qp, 1, i)
+
+#define NTB_HEAD_TX_I(qp)	ntb_edma_head_local(qp, 0)
+#define NTB_HEAD_RX_O(qp)	ntb_edma_head_remote(qp, 1)
+
+#define NTB_TAIL_TX_O(qp)	ntb_edma_tail_remote(qp, 0)
+#define NTB_TAIL_RX_I(qp)	ntb_edma_tail_local(qp, 1)
+
+/* ntb_edma_ring helpers */
+static __always_inline u32 ntb_edma_ring_idx(u32 v)
+{
+	return v & NTB_EDMA_RING_MASK;
+}
+
+static __always_inline u32 ntb_edma_ring_used_entry(u32 head, u32 tail)
+{
+	if (head >= tail) {
+		WARN_ON_ONCE((head - tail) > (NTB_EDMA_RING_ENTRIES - 1));
+		return head - tail;
+	}
+
+	WARN_ON_ONCE((U32_MAX - tail + head + 1) > (NTB_EDMA_RING_ENTRIES - 1));
+	return U32_MAX - tail + head + 1;
+}
+
+static __always_inline u32 ntb_edma_ring_free_entry(u32 head, u32 tail)
+{
+	return NTB_EDMA_RING_ENTRIES - ntb_edma_ring_used_entry(head, tail) - 1;
+}
+
+static __always_inline bool ntb_edma_ring_full(u32 head, u32 tail)
+{
+	return ntb_edma_ring_free_entry(head, tail) == 0;
+}
+
+static void *ntb_transport_edma_entry_priv_alloc(void)
+{
+	return kzalloc(sizeof(struct ntb_queue_entry_edma), GFP_KERNEL);
+}
+
+static void ntb_transport_edma_entry_priv_free(void *priv)
+{
+	kfree(priv);
+}
+
+static unsigned int ntb_transport_edma_tx_free_entry(struct ntb_transport_qp *qp)
+{
+	struct ntb_transport_qp_edma *edma = qp->priv;
+	unsigned int head, tail;
+
+	scoped_guard(spinlock_irqsave, &edma->tx_lock) {
+		/* In this scope, only 'head' might proceed */
+		tail = READ_ONCE(edma->tx_issue);
+		head = READ_ONCE(*NTB_HEAD_TX_I(qp));
+	}
+	/*
+	 * The 'used' amount indicates how much the other end has refilled,
+	 * i.e. how many entries are available for us to use for TX.
+	 */
+	return ntb_edma_ring_used_entry(head, tail);
+}
+
+static void ntb_transport_edma_debugfs_stats_show(struct seq_file *s,
+						  struct ntb_transport_qp *qp)
+{
+	seq_printf(s, "rx_bytes - \t%llu\n", qp->rx_bytes);
+	seq_printf(s, "rx_pkts - \t%llu\n", qp->rx_pkts);
+	seq_printf(s, "rx_err_no_buf - %llu\n", qp->rx_err_no_buf);
+	seq_printf(s, "rx_buff - \t0x%p\n", qp->rx_buff);
+	seq_printf(s, "rx_max_entry - \t%u\n", qp->rx_max_entry);
+	seq_printf(s, "rx_alloc_entry - \t%u\n\n", qp->rx_alloc_entry);
+
+	seq_printf(s, "tx_bytes - \t%llu\n", qp->tx_bytes);
+	seq_printf(s, "tx_pkts - \t%llu\n", qp->tx_pkts);
+	seq_printf(s, "tx_ring_full - \t%llu\n", qp->tx_ring_full);
+	seq_printf(s, "tx_err_no_buf - %llu\n", qp->tx_err_no_buf);
+	seq_printf(s, "tx_mw - \t0x%p\n", qp->tx_mw);
+	seq_printf(s, "tx_max_entry - \t%u\n", qp->tx_max_entry);
+	seq_printf(s, "free tx - \t%u\n", ntb_transport_tx_free_entry(qp));
+	seq_putc(s, '\n');
+
+	seq_puts(s, "Using Remote eDMA - Yes\n");
+	seq_printf(s, "QP Link - \t%s\n", qp->link_is_up ? "Up" : "Down");
+}
+
+static void ntb_transport_edma_db_work(struct work_struct *work)
+{
+	struct ntb_transport_qp_edma *edma =
+		container_of(work, struct ntb_transport_qp_edma, db_work);
+	struct ntb_transport_qp *qp = edma->qp;
+
+	ntb_peer_db_set(qp->ndev, qp->qp_bit);
+}
+
+static void ntb_transport_edma_notify_peer(struct ntb_transport_qp_edma *edma)
+{
+	struct ntb_transport_qp *qp = edma->qp;
+	struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
+
+	if (!ctx->be->ops->notify_peer(&ctx->chans, ctx->be_priv, qp->qp_num))
+		return;
+
+	/*
+	 * Called from contexts that may be atomic. Since ntb_peer_db_set()
+	 * may sleep, delegate the actual doorbell write to a workqueue.
+	 */
+	queue_work(system_highpri_wq, &edma->db_work);
+}
+
+static void ntb_transport_edma_isr(void *data, int qp_num)
+{
+	struct ntb_transport_ctx *nt = data;
+	struct ntb_transport_qp_edma *edma;
+	struct ntb_transport_ctx_edma *ctx;
+	struct ntb_transport_qp *qp;
+
+	if (qp_num < 0 || qp_num >= nt->qp_count)
+		return;
+
+	qp = &nt->qp_vec[qp_num];
+
+	ctx = qp->transport->priv;
+	edma = qp->priv;
+	if (!edma || !ctx)
+		return;
+
+	queue_work(ctx->wq, &edma->rx_work);
+	queue_work(ctx->wq, &edma->tx_work);
+}
+
+static int ntb_transport_edma_rc_init(struct ntb_transport_ctx *nt)
+{
+	struct ntb_transport_ctx_edma *ctx = nt->priv;
+	struct ntb_dev *ndev = nt->ndev;
+	struct pci_dev *pdev = ndev->pdev;
+	int peer_mw;
+	int rc;
+
+	if (ctx->remote_edma_mode != REMOTE_EDMA_UNKNOWN)
+		return 0;
+
+	peer_mw = ntb_peer_mw_count(ndev);
+	if (peer_mw <= 0)
+		return -ENODEV;
+
+	rc = ctx->be->ops->rc_connect(ndev, ctx->be_priv, peer_mw - 1, nt->qp_count);
+	if (rc) {
+		dev_err(&pdev->dev, "Failed to enable remote eDMA: %d\n", rc);
+		return rc;
+	}
+
+	rc = ctx->be->ops->tx_chans_init(ndev, ctx->be_priv, &ctx->chans, true);
+	if (rc) {
+		dev_err(&pdev->dev, "Failed to setup eDMA channels: %d\n", rc);
+		goto err_rc_disconnect;
+	}
+
+	ctx->remote_edma_mode = REMOTE_EDMA_RC;
+	return 0;
+
+err_rc_disconnect:
+	ctx->be->ops->rc_disconnect(ndev, ctx->be_priv);
+	return rc;
+}
+
+static void ntb_transport_edma_rc_deinit(struct ntb_transport_ctx *nt)
+{
+	struct ntb_transport_ctx_edma *ctx = nt->priv;
+	struct ntb_dev *ndev = nt->ndev;
+
+	if (ctx->remote_edma_mode != REMOTE_EDMA_RC)
+		return;
+
+	ctx->be->ops->tx_chans_deinit(&ctx->chans);
+	ctx->be->ops->rc_disconnect(ndev, ctx->be_priv);
+
+	ctx->remote_edma_mode = REMOTE_EDMA_UNKNOWN;
+}
+
+static int ntb_transport_edma_ep_init(struct ntb_transport_ctx *nt)
+{
+	struct ntb_transport_ctx_edma *ctx = nt->priv;
+	struct ntb_dev *ndev = nt->ndev;
+	struct pci_dev *pdev = ndev->pdev;
+	int peer_mw;
+	int rc;
+
+	if (ctx->remote_edma_mode != REMOTE_EDMA_UNKNOWN)
+		return 0;
+
+	/*
+	 * This check assumes that the endpoint (pci-epf-vntb.c)
+	 * ntb_dev_ops implements .get_private_data() while the host side
+	 * (ntb_hw_epf.c) does not.
+	 */
+	if (!ntb_get_private_data(ndev))
+		return 0;
+
+	peer_mw = ntb_peer_mw_count(ndev);
+	if (peer_mw <= 0)
+		return -ENODEV;
+
+	rc = ctx->be->ops->ep_publish(ndev, ctx->be_priv, peer_mw - 1, nt->qp_count,
+				      ntb_transport_edma_isr, nt);
+	if (rc) {
+		dev_err(&pdev->dev,
+			"Failed to set up memory window for eDMA: %d\n", rc);
+		return rc;
+	}
+
+	rc = ctx->be->ops->tx_chans_init(ndev, ctx->be_priv, &ctx->chans, false);
+	if (rc) {
+		dev_err(&pdev->dev, "Failed to setup eDMA channels: %d\n", rc);
+		ctx->be->ops->ep_unpublish(ndev, ctx->be_priv);
+		return rc;
+	}
+
+	ctx->remote_edma_mode = REMOTE_EDMA_EP;
+	return 0;
+}
+
+static void ntb_transport_edma_ep_deinit(struct ntb_transport_ctx *nt)
+{
+	struct ntb_transport_ctx_edma *ctx = nt->priv;
+	struct ntb_dev *ndev = nt->ndev;
+
+	if (ctx->remote_edma_mode != REMOTE_EDMA_EP)
+		return;
+
+	ctx->be->ops->tx_chans_deinit(&ctx->chans);
+	ctx->be->ops->ep_unpublish(ndev, ctx->be_priv);
+
+	ctx->remote_edma_mode = REMOTE_EDMA_UNKNOWN;
+}
+
+static int ntb_transport_edma_setup_qp_mw(struct ntb_transport_ctx *nt,
+					  unsigned int qp_num)
+{
+	struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
+	struct ntb_dev *ndev = nt->ndev;
+	struct ntb_queue_entry *entry;
+	struct ntb_transport_mw *mw;
+	unsigned int mw_num, mw_count, qp_count;
+	unsigned int qp_offset, rx_info_offset;
+	unsigned int mw_size, mw_size_per_qp;
+	unsigned int num_qps_mw;
+	size_t edma_total;
+	unsigned int i;
+	int node;
+
+	mw_count = nt->mw_count;
+	qp_count = nt->qp_count;
+
+	mw_num = QP_TO_MW(nt, qp_num);
+	mw = &nt->mw_vec[mw_num];
+
+	if (!mw->virt_addr)
+		return -ENOMEM;
+
+	if (mw_num < qp_count % mw_count)
+		num_qps_mw = qp_count / mw_count + 1;
+	else
+		num_qps_mw = qp_count / mw_count;
+
+	mw_size = min(nt->mw_vec[mw_num].phys_size, mw->xlat_size);
+	if (max_mw_size && mw_size > max_mw_size)
+		mw_size = max_mw_size;
+
+	mw_size_per_qp = round_down((unsigned int)mw_size / num_qps_mw, SZ_64);
+	qp_offset = mw_size_per_qp * (qp_num / mw_count);
+	rx_info_offset = mw_size_per_qp - sizeof(struct ntb_rx_info);
+
+	qp->tx_mw_size = mw_size_per_qp;
+	qp->tx_mw = nt->mw_vec[mw_num].vbase + qp_offset;
+	if (!qp->tx_mw)
+		return -EINVAL;
+	qp->tx_mw_phys = nt->mw_vec[mw_num].phys_addr + qp_offset;
+	if (!qp->tx_mw_phys)
+		return -EINVAL;
+	qp->rx_info = qp->tx_mw + rx_info_offset;
+	qp->rx_buff = mw->virt_addr + qp_offset;
+	qp->remote_rx_info = qp->rx_buff + rx_info_offset;
+
+	/* Due to housekeeping, there must be at least 2 buffs */
+	qp->tx_max_frame = min(nt->transport_mtu, mw_size_per_qp / 2);
+	qp->rx_max_frame = min(nt->transport_mtu, mw_size_per_qp / 2);
+
+	/* In eDMA mode, decouple from MW sizing and force ring-sized entries */
+	edma_total = 2 * sizeof(struct ntb_edma_ring);
+	if (rx_info_offset < edma_total) {
+		dev_err(&ndev->dev,
+			"Ring space too small: need %zuB, have %uB\n",
+			edma_total, rx_info_offset);
+		return -EINVAL;
+	}
+	qp->tx_max_entry = NTB_EDMA_RING_ENTRIES;
+	qp->rx_max_entry = NTB_EDMA_RING_ENTRIES;
+
+	/*
+	 * If the ring needs more entries than are currently allocated,
+	 * add more so the allocation stays in sync with the number of
+	 * transport frames.
+	 */
+	node = dev_to_node(&ndev->dev);
+	for (i = qp->rx_alloc_entry; i < qp->rx_max_entry; i++) {
+		entry = ntb_queue_entry_alloc(nt, qp, node);
+		if (!entry)
+			return -ENOMEM;
+
+		entry->qp = qp;
+		ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry,
+			     &qp->rx_free_q);
+		qp->rx_alloc_entry++;
+	}
+
+	memset(qp->rx_buff, 0, edma_total);
+
+	qp->rx_pkts = 0;
+	qp->tx_pkts = 0;
+
+	return 0;
+}
+
+static int ntb_transport_edma_rx_complete(struct ntb_transport_qp *qp)
+{
+	struct device *dma_dev = ntb_get_dma_dev(qp->ndev);
+	struct ntb_transport_qp_edma *edma = qp->priv;
+	struct ntb_queue_entry_edma *e;
+	struct ntb_queue_entry *entry;
+	struct ntb_edma_desc *in;
+	unsigned int len;
+	bool link_down;
+	u32 idx;
+
+	if (ntb_edma_ring_used_entry(READ_ONCE(*NTB_TAIL_RX_I(qp)),
+				     edma->rx_cons) == 0)
+		return 0;
+
+	idx = ntb_edma_ring_idx(edma->rx_cons);
+	in = NTB_DESC_RX_I(qp, idx);
+	if (!(in->flags & DESC_DONE_FLAG))
+		return 0;
+
+	link_down = in->flags & LINK_DOWN_FLAG;
+	in->flags = 0;
+	len = in->len; /* might be smaller than entry->len */
+
+	entry = (struct ntb_queue_entry *)(uintptr_t)in->data;
+	if (WARN_ON(!entry))
+		return 0;
+
+	e = entry->priv;
+	dma_unmap_single(dma_dev, e->addr, entry->len, DMA_FROM_DEVICE);
+
+	if (link_down) {
+		ntb_qp_link_down(qp);
+		edma->rx_cons++;
+		ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_free_q);
+		return 1;
+	}
+
+	qp->rx_bytes += len;
+	qp->rx_pkts++;
+	edma->rx_cons++;
+
+	if (qp->rx_handler && qp->client_ready)
+		qp->rx_handler(qp, qp->cb_data, entry->cb_data, len);
+
+	ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_free_q);
+	return 1;
+}
+
+static void ntb_transport_edma_rx_work(struct work_struct *work)
+{
+	struct ntb_transport_qp_edma *edma =
+		container_of(work, struct ntb_transport_qp_edma, rx_work);
+	struct ntb_transport_qp *qp = edma->qp;
+	struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
+	unsigned int i;
+
+	for (i = 0; i < NTB_EDMA_MAX_POLL; i++) {
+		if (!ntb_transport_edma_rx_complete(qp))
+			return;
+	}
+
+	/* Poll budget exhausted; reschedule to drain the remainder */
+	queue_work(ctx->wq, &edma->rx_work);
+}
+
+static void ntb_transport_edma_tx_work(struct work_struct *work)
+{
+	struct ntb_transport_qp_edma *edma =
+		container_of(work, struct ntb_transport_qp_edma, tx_work);
+	struct ntb_transport_qp *qp = edma->qp;
+	struct ntb_edma_desc *in, __iomem *out;
+	struct ntb_queue_entry *entry;
+	void *cb_data;
+	int len;
+	u32 idx;
+
+	while (ntb_edma_ring_used_entry(READ_ONCE(edma->tx_issue),
+					edma->tx_cons) != 0) {
+		/* Paired with smp_wmb() in ntb_transport_edma_tx_enqueue_inner() */
+		smp_rmb();
+
+		idx = ntb_edma_ring_idx(edma->tx_cons);
+		in = NTB_DESC_TX_I(qp, idx);
+		entry = (struct ntb_queue_entry *)(uintptr_t)in->data;
+		if (!entry || !(entry->flags & DESC_DONE_FLAG))
+			break;
+
+		in->data = 0;
+
+		cb_data = entry->cb_data;
+		len = entry->len;
+
+		out = NTB_DESC_TX_O(qp, idx);
+
+		WRITE_ONCE(edma->tx_cons, edma->tx_cons + 1);
+
+		iowrite32(entry->flags, &out->flags);
+		iowrite32(edma->tx_cons, NTB_TAIL_TX_O(qp));
+
+		ntb_transport_edma_notify_peer(edma);
+
+		ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry,
+			     &qp->tx_free_q);
+
+		if (qp->tx_handler)
+			qp->tx_handler(qp, qp->cb_data, cb_data, len);
+
+		if (len < 0)
+			continue;
+
+		/* stat updates */
+		qp->tx_bytes += len;
+		qp->tx_pkts++;
+	}
+}
+
+static void ntb_transport_edma_tx_cb(void *data,
+				     const struct dmaengine_result *res)
+{
+	struct ntb_queue_entry *entry = data;
+	struct ntb_transport_qp *qp = entry->qp;
+	struct ntb_queue_entry_edma *e = entry->priv;
+	struct ntb_transport_ctx *nt = qp->transport;
+	struct device *dma_dev = ntb_get_dma_dev(qp->ndev);
+	enum dmaengine_tx_result dma_err = res->result;
+	struct ntb_transport_ctx_edma *ctx = nt->priv;
+	struct ntb_transport_qp_edma *edma = qp->priv;
+
+	switch (dma_err) {
+	case DMA_TRANS_READ_FAILED:
+	case DMA_TRANS_WRITE_FAILED:
+	case DMA_TRANS_ABORTED:
+		entry->errors++;
+		entry->len = -EIO;
+		break;
+	case DMA_TRANS_NOERROR:
+	default:
+		break;
+	}
+	dma_unmap_sg(dma_dev, &e->sgl, 1, DMA_TO_DEVICE);
+	sg_dma_address(&e->sgl) = 0;
+
+	entry->flags |= DESC_DONE_FLAG;
+
+	queue_work(ctx->wq, &edma->tx_work);
+}
+
+static int ntb_transport_edma_submit(struct device *d, struct dma_chan *chan,
+				     size_t len, void *rc_src, dma_addr_t dst,
+				     struct ntb_queue_entry *entry)
+{
+	struct ntb_queue_entry_edma *e = entry->priv;
+	struct dma_async_tx_descriptor *txd;
+	struct scatterlist *sgl = &e->sgl;
+	struct dma_slave_config cfg;
+	dma_cookie_t cookie;
+	int nents, rc;
+
+	if (!d)
+		return -ENODEV;
+
+	if (!chan)
+		return -ENXIO;
+
+	if (WARN_ON(!rc_src || !dst))
+		return -EINVAL;
+
+	if (WARN_ON(sg_dma_address(sgl)))
+		return -EINVAL;
+
+	sg_init_one(sgl, rc_src, len);
+	nents = dma_map_sg(d, sgl, 1, DMA_TO_DEVICE);
+	if (nents <= 0)
+		return -EIO;
+
+	memset(&cfg, 0, sizeof(cfg));
+	cfg.dst_addr       = dst;
+	cfg.src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
+	cfg.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
+	cfg.direction      = DMA_MEM_TO_DEV;
+
+	txd = dmaengine_prep_config_sg(chan, sgl, 1, DMA_MEM_TO_DEV,
+				       DMA_CTRL_ACK | DMA_PREP_INTERRUPT, &cfg);
+	if (!txd) {
+		rc = -EIO;
+		goto out_unmap;
+	}
+
+	txd->callback_result = ntb_transport_edma_tx_cb;
+	txd->callback_param = entry;
+
+	cookie = dmaengine_submit(txd);
+	if (dma_submit_error(cookie)) {
+		rc = -EIO;
+		goto out_unmap;
+	}
+	dma_async_issue_pending(chan);
+	return 0;
+out_unmap:
+	dma_unmap_sg(d, sgl, 1, DMA_TO_DEVICE);
+	return rc;
+}
+
+static struct dma_chan *ntb_transport_edma_pick_chan(struct ntb_edma_chans *chans,
+						     unsigned int idx)
+{
+	return chans->chan[idx % chans->num_chans];
+}
+
+static int ntb_transport_edma_tx_enqueue_inner(struct ntb_transport_qp *qp,
+					       struct ntb_queue_entry *entry)
+{
+	struct device *dma_dev = ntb_get_dma_dev(qp->ndev);
+	struct ntb_transport_qp_edma *edma = qp->priv;
+	struct ntb_transport_ctx *nt = qp->transport;
+	struct ntb_edma_desc *in, __iomem *out;
+	struct ntb_transport_ctx_edma *ctx = nt->priv;
+	unsigned int len = entry->len;
+	struct dma_chan *chan;
+	u32 issue, idx, head;
+	dma_addr_t dst;
+	int rc;
+
+	WARN_ON_ONCE(entry->flags & DESC_DONE_FLAG);
+
+	scoped_guard(spinlock_irqsave, &edma->tx_lock) {
+		head = READ_ONCE(*NTB_HEAD_TX_I(qp));
+		issue = edma->tx_issue;
+		if (ntb_edma_ring_used_entry(head, issue) == 0) {
+			qp->tx_ring_full++;
+			return -ENOSPC;
+		}
+
+		/*
+		 * ntb_transport_edma_tx_work() checks entry->flags
+		 * so it needs to be set before tx_issue++.
+		 */
+		idx = ntb_edma_ring_idx(issue);
+		in = NTB_DESC_TX_I(qp, idx);
+		in->data = (uintptr_t)entry;
+
+		/* Make in->data visible before tx_issue++ */
+		smp_wmb();
+
+		WRITE_ONCE(edma->tx_issue, edma->tx_issue + 1);
+	}
+
+	/* Publish the final transfer length to the other end */
+	out = NTB_DESC_TX_O(qp, idx);
+	iowrite32(len, &out->len);
+	ioread32(&out->len);	/* flush the posted write */
+
+	if (unlikely(!len)) {
+		entry->flags |= DESC_DONE_FLAG;
+		queue_work(ctx->wq, &edma->tx_work);
+		return 0;
+	}
+
+	/* Paired with dma_wmb() in ntb_transport_edma_rx_enqueue_inner() */
+	dma_rmb();
+
+	/* kick remote eDMA read transfer */
+	dst = (dma_addr_t)in->addr;
+	chan = ntb_transport_edma_pick_chan(&ctx->chans, qp->qp_num);
+	rc = ntb_transport_edma_submit(dma_dev, chan, len, entry->buf, dst,
+				       entry);
+	if (rc) {
+		entry->errors++;
+		entry->len = -EIO;
+		entry->flags |= DESC_DONE_FLAG;
+		queue_work(ctx->wq, &edma->tx_work);
+	}
+	return 0;
+}
+
+static int ntb_transport_edma_tx_enqueue(struct ntb_transport_qp *qp,
+					 struct ntb_queue_entry *entry,
+					 void *cb, void *data, unsigned int len,
+					 unsigned int flags)
+{
+	struct ntb_queue_entry_edma *e = entry->priv;
+	struct device *dma_dev;
+
+	if (e->addr) {
+		/* Deferred unmap */
+		dma_dev = ntb_get_dma_dev(qp->ndev);
+		dma_unmap_single(dma_dev, e->addr, entry->len,
+				 DMA_TO_DEVICE);
+	}
+
+	entry->cb_data = cb;
+	entry->buf = data;
+	entry->len = len;
+	entry->flags = flags;
+	entry->errors = 0;
+
+	e->addr = 0;
+
+	WARN_ON_ONCE(!ntb_qp_edma_enabled(qp));
+
+	return ntb_transport_edma_tx_enqueue_inner(qp, entry);
+}
+
+static int ntb_transport_edma_rx_enqueue_inner(struct ntb_transport_qp *qp,
+					       struct ntb_queue_entry *entry)
+{
+	struct device *dma_dev = ntb_get_dma_dev(qp->ndev);
+	struct ntb_transport_qp_edma *edma = qp->priv;
+	struct ntb_queue_entry_edma *e = entry->priv;
+	struct ntb_edma_desc *in, __iomem *out;
+	unsigned int len = entry->len;
+	void *data = entry->buf;
+	dma_addr_t dst;
+	u32 idx;
+	int rc;
+
+	dst = dma_map_single(dma_dev, data, len, DMA_FROM_DEVICE);
+	rc = dma_mapping_error(dma_dev, dst);
+	if (rc)
+		return rc;
+
+	guard(spinlock_bh)(&edma->rx_lock);
+
+	if (ntb_edma_ring_full(READ_ONCE(edma->rx_prod),
+			       READ_ONCE(edma->rx_cons))) {
+		rc = -ENOSPC;
+		goto out_unmap;
+	}
+
+	idx = ntb_edma_ring_idx(edma->rx_prod);
+	in = NTB_DESC_RX_I(qp, idx);
+	out = NTB_DESC_RX_O(qp, idx);
+
+	iowrite32(len, &out->len);
+	iowrite64(dst, &out->addr);
+
+	WARN_ON(in->flags & DESC_DONE_FLAG);
+	in->data = (uintptr_t)entry;
+	e->addr = dst;
+
+	/* Ensure len/addr are visible before the head update */
+	dma_wmb();
+
+	WRITE_ONCE(edma->rx_prod, edma->rx_prod + 1);
+	iowrite32(edma->rx_prod, NTB_HEAD_RX_O(qp));
+
+	return 0;
+out_unmap:
+	dma_unmap_single(dma_dev, dst, len, DMA_FROM_DEVICE);
+	return rc;
+}
+
+static int ntb_transport_edma_rx_enqueue(struct ntb_transport_qp *qp,
+					 struct ntb_queue_entry *entry)
+{
+	int rc;
+
+	rc = ntb_transport_edma_rx_enqueue_inner(qp, entry);
+	if (rc) {
+		ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry,
+			     &qp->rx_free_q);
+		return rc;
+	}
+
+	if (qp->active)
+		tasklet_schedule(&qp->rxc_db_work);
+
+	return 0;
+}
+
+static void ntb_transport_edma_rx_poll(struct ntb_transport_qp *qp)
+{
+	struct ntb_transport_ctx *nt = qp->transport;
+	struct ntb_transport_ctx_edma *ctx = nt->priv;
+	struct ntb_transport_qp_edma *edma = qp->priv;
+
+	queue_work(ctx->wq, &edma->rx_work);
+	queue_work(ctx->wq, &edma->tx_work);
+}
+
+static int ntb_transport_edma_qp_init(struct ntb_transport_ctx *nt,
+				      unsigned int qp_num)
+{
+	struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
+	struct ntb_transport_qp_edma *edma;
+	struct ntb_dev *ndev = nt->ndev;
+	int node;
+
+	node = dev_to_node(&ndev->dev);
+
+	edma = kzalloc_node(sizeof(*edma), GFP_KERNEL, node);
+	if (!edma)
+		return -ENOMEM;
+
+	qp->priv = edma;
+	edma->qp = qp;
+	edma->rx_prod = 0;
+	edma->rx_cons = 0;
+	edma->tx_cons = 0;
+	edma->tx_issue = 0;
+
+	spin_lock_init(&edma->rx_lock);
+	spin_lock_init(&edma->tx_lock);
+
+	INIT_WORK(&edma->db_work, ntb_transport_edma_db_work);
+	INIT_WORK(&edma->rx_work, ntb_transport_edma_rx_work);
+	INIT_WORK(&edma->tx_work, ntb_transport_edma_tx_work);
+
+	return 0;
+}
+
+static void ntb_transport_edma_qp_free(struct ntb_transport_qp *qp)
+{
+	struct ntb_transport_qp_edma *edma = qp->priv;
+
+	disable_work_sync(&edma->db_work);
+	disable_work_sync(&edma->rx_work);
+	disable_work_sync(&edma->tx_work);
+
+	kfree(qp->priv);
+	qp->priv = NULL;
+}
+
+static int ntb_transport_edma_link_up_pre(struct ntb_transport_ctx *nt)
+{
+	struct ntb_dev *ndev = nt->ndev;
+	struct pci_dev *pdev = ndev->pdev;
+	int rc;
+
+	rc = ntb_transport_edma_ep_init(nt);
+	if (rc)
+		dev_err(&pdev->dev, "Failed to init EP: %d\n", rc);
+
+	return rc;
+}
+
+static int ntb_transport_edma_link_up_post(struct ntb_transport_ctx *nt)
+{
+	struct ntb_dev *ndev = nt->ndev;
+	struct pci_dev *pdev = ndev->pdev;
+	int rc;
+
+	rc = ntb_transport_edma_rc_init(nt);
+	if (rc)
+		dev_err(&pdev->dev, "Failed to init RC: %d\n", rc);
+
+	return rc;
+}
+
+static void ntb_transport_edma_link_down(struct ntb_transport_ctx *nt)
+{
+	struct ntb_transport_ctx_edma *ctx = nt->priv;
+
+	WARN_ON_ONCE(!ctx);
+	switch (ctx->remote_edma_mode) {
+	case REMOTE_EDMA_EP:
+		ntb_transport_edma_ep_deinit(nt);
+		break;
+	case REMOTE_EDMA_RC:
+		ntb_transport_edma_rc_deinit(nt);
+		break;
+	default:
+		break;
+	}
+}
+
+static void ntb_transport_edma_disable(struct ntb_transport_ctx *nt)
+{
+	struct ntb_transport_ctx_edma *ctx = nt->priv;
+	struct ntb_dev *ndev = nt->ndev;
+
+	if (!ctx)
+		return;
+
+	if (ctx->wq)
+		destroy_workqueue(ctx->wq);
+	if (ctx->be_priv)
+		ctx->be->ops->free(ndev, &ctx->be_priv);
+	if (ctx->be)
+		ntb_edma_backend_put(ctx->be);
+
+	kfree(ctx);
+	nt->priv = NULL;
+}
+
+static int ntb_transport_edma_enable(struct ntb_transport_ctx *nt,
+				     unsigned int *mw_count)
+{
+	struct ntb_transport_ctx_edma *ctx;
+	struct ntb_dev *ndev = nt->ndev;
+	int node;
+	int ret;
+
+	node = dev_to_node(&ndev->dev);
+	ctx = kzalloc_node(sizeof(*ctx), GFP_KERNEL, node);
+	if (!ctx)
+		return -ENOMEM;
+
+	nt->priv = ctx;
+	ctx->be = ntb_edma_backend_get(ndev);
+	if (!ctx->be) {
+		dev_err(&ndev->dev, "No suitable eDMA backend found\n");
+		ret = -ENODEV;
+		goto err;
+	}
+	dev_info(&ndev->dev, "Selected eDMA backend: %s\n", ctx->be->name);
+
+	ret = ctx->be->ops->alloc(ndev, &ctx->be_priv);
+	if (ret)
+		goto err;
+
+	/*
+	 * We need at least one MW for the transport plus one MW reserved
+	 * for the remote eDMA window (see ntb_edma_setup_mws/peer).
+	 */
+	if (*mw_count <= 1) {
+		dev_err(&ndev->dev,
+			"remote eDMA requires at least two MWs (have %u)\n",
+			*mw_count);
+		ret = -ENODEV;
+		goto err;
+	}
+
+	ctx->wq = alloc_workqueue("ntb-edma-wq", WQ_UNBOUND | WQ_SYSFS, 0);
+	if (!ctx->wq) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	/* Reserve the last peer MW exclusively for the eDMA window. */
+	*mw_count -= 1;
+
+	return 0;
+err:
+	ntb_transport_edma_disable(nt);
+	return ret;
+}
+
+static const struct ntb_transport_backend_ops edma_transport_ops = {
+	.enable = ntb_transport_edma_enable,
+	.disable = ntb_transport_edma_disable,
+	.qp_init = ntb_transport_edma_qp_init,
+	.qp_free = ntb_transport_edma_qp_free,
+	.link_up_pre = ntb_transport_edma_link_up_pre,
+	.link_up_post = ntb_transport_edma_link_up_post,
+	.link_down = ntb_transport_edma_link_down,
+	.setup_qp_mw = ntb_transport_edma_setup_qp_mw,
+	.entry_priv_alloc = ntb_transport_edma_entry_priv_alloc,
+	.entry_priv_free  = ntb_transport_edma_entry_priv_free,
+	.tx_free_entry = ntb_transport_edma_tx_free_entry,
+	.tx_enqueue = ntb_transport_edma_tx_enqueue,
+	.rx_enqueue = ntb_transport_edma_rx_enqueue,
+	.rx_poll = ntb_transport_edma_rx_poll,
+	.debugfs_stats_show = ntb_transport_edma_debugfs_stats_show,
+};
+
+static struct ntb_transport_backend ntb_edma_transport_backend = {
+	.name = "edma",
+	.ops = &edma_transport_ops,
+	.owner = THIS_MODULE,
+};
+
+static int ntb_transport_edma_client_probe(struct ntb_client *self,
+					   struct ntb_dev *ndev)
+{
+	return ntb_transport_attach(ndev, "edma", false, max_mw_size, 0xffff,
+				    max_num_clients, 0, false,
+				    NTB_EDMA_RING_ENTRIES);
+}
+
+static void ntb_transport_edma_client_remove(struct ntb_client *self,
+					     struct ntb_dev *ndev)
+{
+	ntb_transport_detach(ndev);
+}
+
+static struct ntb_client ntb_transport_edma_client = {
+	.ops = {
+		.probe = ntb_transport_edma_client_probe,
+		.remove = ntb_transport_edma_client_remove,
+	},
+};
+
+static int __init ntb_transport_edma_init(void)
+{
+	int rc;
+
+	rc = ntb_transport_backend_register(&ntb_edma_transport_backend);
+	if (rc)
+		return rc;
+
+	rc = ntb_register_client(&ntb_transport_edma_client);
+	if (rc)
+		ntb_transport_backend_unregister(&ntb_edma_transport_backend);
+
+	return rc;
+}
+module_init(ntb_transport_edma_init);
+
+static void ntb_transport_edma_exit(void)
+{
+	ntb_unregister_client(&ntb_transport_edma_client);
+	ntb_transport_backend_unregister(&ntb_edma_transport_backend);
+}
+module_exit(ntb_transport_edma_exit);
+
+MODULE_DESCRIPTION("NTB transport backend for remote PCI embedded DMA");
+MODULE_LICENSE("Dual BSD/GPL");
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v4 27/38] ntb_netdev: Multi-queue support
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (25 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 26/38] NTB: ntb_transport: Add remote embedded-DMA transport client Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 28/38] iommu: ipmmu-vmsa: Add PCIe ch0 to devices_allowlist Koichiro Den
                   ` (11 subsequent siblings)
  38 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

In eDMA-backed mode (when using ntb_transport_edma), NTB transport can
scale throughput across multiple queue pairs without being constrained
by scarce BAR/memory window space used for data-plane buffers. This
contrasts with the default ntb_transport, where, even with a single
queue pair, only up to 15 in-flight descriptors fit in a 1 MiB MW.

Teach ntb_netdev to allocate multiple ntb_transport queue pairs and
expose them as a multi-queue net_device.

With this patch, up to N queue pairs are created, where N is chosen as
follows:

  - By default, N is num_online_cpus(), to give each CPU its own queue.
  - If the ntb_num_queues module parameter is non-zero, it overrides the
    default and requests that many queues.
  - In both cases the requested value is capped at a fixed upper bound
    to avoid unbounded allocations, and by the number of queue pairs
    actually available from ntb_transport.

If only one queue pair can be created (or ntb_num_queues=1 is set), the
driver effectively falls back to the previous single-queue behavior.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/net/ntb_netdev.c | 341 ++++++++++++++++++++++++++++-----------
 1 file changed, 243 insertions(+), 98 deletions(-)

diff --git a/drivers/net/ntb_netdev.c b/drivers/net/ntb_netdev.c
index fbeae05817e9..fc300db66ef7 100644
--- a/drivers/net/ntb_netdev.c
+++ b/drivers/net/ntb_netdev.c
@@ -53,6 +53,8 @@
 #include <linux/pci.h>
 #include <linux/ntb.h>
 #include <linux/ntb_transport.h>
+#include <linux/cpumask.h>
+#include <linux/slab.h>
 
 #define NTB_NETDEV_VER	"0.7"
 
@@ -70,26 +72,84 @@ static unsigned int tx_start = 10;
 /* Number of descriptors still available before stop upper layer tx */
 static unsigned int tx_stop = 5;
 
+/*
+ * Upper bound on how many queue pairs we will try to create even if
+ * ntb_num_queues or num_online_cpus() is very large. This is an
+ * arbitrary safety cap to avoid unbounded allocations.
+ */
+#define NTB_NETDEV_MAX_QUEUES  64
+
+/*
+ * ntb_num_queues == 0 (default) means:
+ *   - use num_online_cpus() as the desired queue count, capped by
+ *     NTB_NETDEV_MAX_QUEUES.
+ * ntb_num_queues > 0:
+ *   - try to create exactly ntb_num_queues queue pairs (again capped
+ *     by NTB_NETDEV_MAX_QUEUES), but fall back to the number of queue
+ *     pairs actually available from ntb_transport.
+ */
+static unsigned int ntb_num_queues;
+module_param(ntb_num_queues, uint, 0644);
+MODULE_PARM_DESC(ntb_num_queues,
+		 "Number of NTB netdev queue pairs to use (0 = per-CPU)");
+
+struct ntb_netdev;
+
+struct ntb_netdev_queue {
+	struct ntb_netdev *ntdev;
+	struct ntb_transport_qp *qp;
+	struct timer_list tx_timer;
+	u16 qid;
+};
+
 struct ntb_netdev {
 	struct pci_dev *pdev;
 	struct net_device *ndev;
-	struct ntb_transport_qp *qp;
-	struct timer_list tx_timer;
+	unsigned int num_queues;
+	struct ntb_netdev_queue *queues;
 };
 
 #define	NTB_TX_TIMEOUT_MS	1000
 #define	NTB_RXQ_SIZE		100
 
+static unsigned int ntb_netdev_default_queues(void)
+{
+	unsigned int n;
+
+	if (ntb_num_queues)
+		n = ntb_num_queues;
+	else
+		n = num_online_cpus();
+
+	if (!n)
+		n = 1;
+
+	if (n > NTB_NETDEV_MAX_QUEUES)
+		n = NTB_NETDEV_MAX_QUEUES;
+
+	return n;
+}
+
 static void ntb_netdev_event_handler(void *data, int link_is_up)
 {
-	struct net_device *ndev = data;
-	struct ntb_netdev *dev = netdev_priv(ndev);
+	struct ntb_netdev_queue *q = data;
+	struct ntb_netdev *dev = q->ntdev;
+	struct net_device *ndev = dev->ndev;
+	bool any_up = false;
+	unsigned int i;
 
-	netdev_dbg(ndev, "Event %x, Link %x\n", link_is_up,
-		   ntb_transport_link_query(dev->qp));
+	netdev_dbg(ndev, "Event %x, Link %x, qp %u\n", link_is_up,
+		   ntb_transport_link_query(q->qp), q->qid);
 
 	if (link_is_up) {
-		if (ntb_transport_link_query(dev->qp))
+		for (i = 0; i < dev->num_queues; i++) {
+			if (ntb_transport_link_query(dev->queues[i].qp)) {
+				any_up = true;
+				break;
+			}
+		}
+
+		if (any_up)
 			netif_carrier_on(ndev);
 	} else {
 		netif_carrier_off(ndev);
@@ -99,7 +159,9 @@ static void ntb_netdev_event_handler(void *data, int link_is_up)
 static void ntb_netdev_rx_handler(struct ntb_transport_qp *qp, void *qp_data,
 				  void *data, int len)
 {
-	struct net_device *ndev = qp_data;
+	struct ntb_netdev_queue *q = qp_data;
+	struct ntb_netdev *dev = q->ntdev;
+	struct net_device *ndev = dev->ndev;
 	struct sk_buff *skb;
 	int rc;
 
@@ -135,7 +197,8 @@ static void ntb_netdev_rx_handler(struct ntb_transport_qp *qp, void *qp_data,
 	}
 
 enqueue_again:
-	rc = ntb_transport_rx_enqueue(qp, skb, skb->data, ndev->mtu + ETH_HLEN);
+	rc = ntb_transport_rx_enqueue(q->qp, skb, skb->data,
+				      ndev->mtu + ETH_HLEN);
 	if (rc) {
 		dev_kfree_skb_any(skb);
 		ndev->stats.rx_errors++;
@@ -143,42 +206,37 @@ static void ntb_netdev_rx_handler(struct ntb_transport_qp *qp, void *qp_data,
 	}
 }
 
-static int __ntb_netdev_maybe_stop_tx(struct net_device *netdev,
-				      struct ntb_transport_qp *qp, int size)
+static int ntb_netdev_maybe_stop_tx(struct ntb_netdev_queue *q, int size)
 {
-	struct ntb_netdev *dev = netdev_priv(netdev);
+	struct net_device *ndev = q->ntdev->ndev;
+
+	if (ntb_transport_tx_free_entry(q->qp) >= size)
+		return 0;
+
+	netif_stop_subqueue(ndev, q->qid);
 
-	netif_stop_queue(netdev);
 	/* Make sure to see the latest value of ntb_transport_tx_free_entry()
 	 * since the queue was last started.
 	 */
 	smp_mb();
 
-	if (likely(ntb_transport_tx_free_entry(qp) < size)) {
-		mod_timer(&dev->tx_timer, jiffies + usecs_to_jiffies(tx_time));
+	if (likely(ntb_transport_tx_free_entry(q->qp) < size)) {
+		mod_timer(&q->tx_timer, jiffies + usecs_to_jiffies(tx_time));
 		return -EBUSY;
 	}
 
-	netif_start_queue(netdev);
-	return 0;
-}
-
-static int ntb_netdev_maybe_stop_tx(struct net_device *ndev,
-				    struct ntb_transport_qp *qp, int size)
-{
-	if (netif_queue_stopped(ndev) ||
-	    (ntb_transport_tx_free_entry(qp) >= size))
-		return 0;
+	netif_wake_subqueue(ndev, q->qid);
 
-	return __ntb_netdev_maybe_stop_tx(ndev, qp, size);
+	return 0;
 }
 
 static void ntb_netdev_tx_handler(struct ntb_transport_qp *qp, void *qp_data,
 				  void *data, int len)
 {
-	struct net_device *ndev = qp_data;
+	struct ntb_netdev_queue *q = qp_data;
+	struct ntb_netdev *dev = q->ntdev;
+	struct net_device *ndev = dev->ndev;
 	struct sk_buff *skb;
-	struct ntb_netdev *dev = netdev_priv(ndev);
 
 	skb = data;
 	if (!skb || !ndev)
@@ -194,13 +252,12 @@ static void ntb_netdev_tx_handler(struct ntb_transport_qp *qp, void *qp_data,
 
 	dev_kfree_skb_any(skb);
 
-	if (ntb_transport_tx_free_entry(dev->qp) >= tx_start) {
+	if (ntb_transport_tx_free_entry(qp) >= tx_start) {
 		/* Make sure anybody stopping the queue after this sees the new
 		 * value of ntb_transport_tx_free_entry()
 		 */
 		smp_mb();
-		if (netif_queue_stopped(ndev))
-			netif_wake_queue(ndev);
+		netif_wake_subqueue(ndev, q->qid);
 	}
 }
 
@@ -208,16 +265,26 @@ static netdev_tx_t ntb_netdev_start_xmit(struct sk_buff *skb,
 					 struct net_device *ndev)
 {
 	struct ntb_netdev *dev = netdev_priv(ndev);
+	u16 qid = skb_get_queue_mapping(skb);
+	struct ntb_netdev_queue *q;
 	int rc;
 
-	ntb_netdev_maybe_stop_tx(ndev, dev->qp, tx_stop);
+	if (unlikely(!dev->num_queues))
+		goto err;
+
+	if (unlikely(qid >= dev->num_queues))
+		qid = qid % dev->num_queues;
 
-	rc = ntb_transport_tx_enqueue(dev->qp, skb, skb->data, skb->len);
+	q = &dev->queues[qid];
+
+	ntb_netdev_maybe_stop_tx(q, tx_stop);
+
+	rc = ntb_transport_tx_enqueue(q->qp, skb, skb->data, skb->len);
 	if (rc)
 		goto err;
 
 	/* check for next submit */
-	ntb_netdev_maybe_stop_tx(ndev, dev->qp, tx_stop);
+	ntb_netdev_maybe_stop_tx(q, tx_stop);
 
 	return NETDEV_TX_OK;
 
@@ -229,80 +296,103 @@ static netdev_tx_t ntb_netdev_start_xmit(struct sk_buff *skb,
 
 static void ntb_netdev_tx_timer(struct timer_list *t)
 {
-	struct ntb_netdev *dev = timer_container_of(dev, t, tx_timer);
+	struct ntb_netdev_queue *q = timer_container_of(q, t, tx_timer);
+	struct ntb_netdev *dev = q->ntdev;
 	struct net_device *ndev = dev->ndev;
 
-	if (ntb_transport_tx_free_entry(dev->qp) < tx_stop) {
-		mod_timer(&dev->tx_timer, jiffies + usecs_to_jiffies(tx_time));
+	if (ntb_transport_tx_free_entry(q->qp) < tx_stop) {
+		mod_timer(&q->tx_timer, jiffies + usecs_to_jiffies(tx_time));
 	} else {
-		/* Make sure anybody stopping the queue after this sees the new
+		/*
+		 * Make sure anybody stopping the queue after this sees the new
 		 * value of ntb_transport_tx_free_entry()
 		 */
 		smp_mb();
-		if (netif_queue_stopped(ndev))
-			netif_wake_queue(ndev);
+		netif_wake_subqueue(ndev, q->qid);
 	}
 }
 
 static int ntb_netdev_open(struct net_device *ndev)
 {
 	struct ntb_netdev *dev = netdev_priv(ndev);
+	struct ntb_netdev_queue *queue;
 	struct sk_buff *skb;
-	int rc, i, len;
-
-	/* Add some empty rx bufs */
-	for (i = 0; i < NTB_RXQ_SIZE; i++) {
-		skb = netdev_alloc_skb(ndev, ndev->mtu + ETH_HLEN);
-		if (!skb) {
-			rc = -ENOMEM;
-			goto err;
-		}
+	int rc = 0, i, len;
+	unsigned int q;
 
-		rc = ntb_transport_rx_enqueue(dev->qp, skb, skb->data,
-					      ndev->mtu + ETH_HLEN);
-		if (rc) {
-			dev_kfree_skb(skb);
-			goto err;
+	/* Add some empty rx bufs for each queue */
+	for (q = 0; q < dev->num_queues; q++) {
+		queue = &dev->queues[q];
+
+		for (i = 0; i < NTB_RXQ_SIZE; i++) {
+			skb = netdev_alloc_skb(ndev, ndev->mtu + ETH_HLEN);
+			if (!skb) {
+				rc = -ENOMEM;
+				goto err;
+			}
+
+			rc = ntb_transport_rx_enqueue(queue->qp, skb, skb->data,
+						      ndev->mtu + ETH_HLEN);
+			if (rc) {
+				dev_kfree_skb(skb);
+				goto err;
+			}
 		}
-	}
 
-	timer_setup(&dev->tx_timer, ntb_netdev_tx_timer, 0);
+		timer_setup(&queue->tx_timer, ntb_netdev_tx_timer, 0);
+	}
 
 	netif_carrier_off(ndev);
-	ntb_transport_link_up(dev->qp);
-	netif_start_queue(ndev);
+
+	for (q = 0; q < dev->num_queues; q++)
+		ntb_transport_link_up(dev->queues[q].qp);
+
+	netif_tx_start_all_queues(ndev);
 
 	return 0;
 
 err:
-	while ((skb = ntb_transport_rx_remove(dev->qp, &len)))
-		dev_kfree_skb(skb);
+	for (q = 0; q < dev->num_queues; q++) {
+		queue = &dev->queues[q];
+
+		while ((skb = ntb_transport_rx_remove(queue->qp, &len)))
+			dev_kfree_skb(skb);
+	}
 	return rc;
 }
 
 static int ntb_netdev_close(struct net_device *ndev)
 {
 	struct ntb_netdev *dev = netdev_priv(ndev);
+	struct ntb_netdev_queue *queue;
 	struct sk_buff *skb;
+	unsigned int q;
 	int len;
 
-	ntb_transport_link_down(dev->qp);
+	netif_tx_stop_all_queues(ndev);
+
+	for (q = 0; q < dev->num_queues; q++) {
+		queue = &dev->queues[q];
 
-	while ((skb = ntb_transport_rx_remove(dev->qp, &len)))
-		dev_kfree_skb(skb);
+		ntb_transport_link_down(queue->qp);
 
-	timer_delete_sync(&dev->tx_timer);
+		while ((skb = ntb_transport_rx_remove(queue->qp, &len)))
+			dev_kfree_skb(skb);
 
+		timer_delete_sync(&queue->tx_timer);
+	}
 	return 0;
 }
 
 static int ntb_netdev_change_mtu(struct net_device *ndev, int new_mtu)
 {
 	struct ntb_netdev *dev = netdev_priv(ndev);
+	struct ntb_netdev_queue *queue;
 	struct sk_buff *skb;
-	int len, rc;
+	unsigned int q, i;
+	int len, rc = 0;
 
-	if (new_mtu > ntb_transport_max_size(dev->qp) - ETH_HLEN)
+	if (new_mtu > ntb_transport_max_size(dev->queues[0].qp) - ETH_HLEN)
 		return -EINVAL;
 
 	if (!netif_running(ndev)) {
@@ -311,41 +401,54 @@ static int ntb_netdev_change_mtu(struct net_device *ndev, int new_mtu)
 	}
 
 	/* Bring down the link and dispose of posted rx entries */
-	ntb_transport_link_down(dev->qp);
+	for (q = 0; q < dev->num_queues; q++)
+		ntb_transport_link_down(dev->queues[q].qp);
 
 	if (ndev->mtu < new_mtu) {
-		int i;
-
-		for (i = 0; (skb = ntb_transport_rx_remove(dev->qp, &len)); i++)
-			dev_kfree_skb(skb);
+		for (q = 0; q < dev->num_queues; q++) {
+			queue = &dev->queues[q];
 
-		for (; i; i--) {
-			skb = netdev_alloc_skb(ndev, new_mtu + ETH_HLEN);
-			if (!skb) {
-				rc = -ENOMEM;
-				goto err;
-			}
-
-			rc = ntb_transport_rx_enqueue(dev->qp, skb, skb->data,
-						      new_mtu + ETH_HLEN);
-			if (rc) {
+			for (i = 0;
+			     (skb = ntb_transport_rx_remove(queue->qp, &len));
+			     i++)
 				dev_kfree_skb(skb);
-				goto err;
+
+			for (; i; i--) {
+				skb = netdev_alloc_skb(ndev,
+						       new_mtu + ETH_HLEN);
+				if (!skb) {
+					rc = -ENOMEM;
+					goto err;
+				}
+
+				rc = ntb_transport_rx_enqueue(queue->qp, skb,
+							      skb->data,
+							      new_mtu +
+							      ETH_HLEN);
+				if (rc) {
+					dev_kfree_skb(skb);
+					goto err;
+				}
 			}
 		}
 	}
 
 	WRITE_ONCE(ndev->mtu, new_mtu);
 
-	ntb_transport_link_up(dev->qp);
+	for (q = 0; q < dev->num_queues; q++)
+		ntb_transport_link_up(dev->queues[q].qp);
 
 	return 0;
 
 err:
-	ntb_transport_link_down(dev->qp);
+	for (q = 0; q < dev->num_queues; q++) {
+		struct ntb_netdev_queue *queue = &dev->queues[q];
+
+		ntb_transport_link_down(queue->qp);
 
-	while ((skb = ntb_transport_rx_remove(dev->qp, &len)))
-		dev_kfree_skb(skb);
+		while ((skb = ntb_transport_rx_remove(queue->qp, &len)))
+			dev_kfree_skb(skb);
+	}
 
 	netdev_err(ndev, "Error changing MTU, device inoperable\n");
 	return rc;
@@ -404,6 +507,7 @@ static int ntb_netdev_probe(struct device *client_dev)
 	struct net_device *ndev;
 	struct pci_dev *pdev;
 	struct ntb_netdev *dev;
+	unsigned int q, desired_queues;
 	int rc;
 
 	ntb = dev_ntb(client_dev->parent);
@@ -411,7 +515,9 @@ static int ntb_netdev_probe(struct device *client_dev)
 	if (!pdev)
 		return -ENODEV;
 
-	ndev = alloc_etherdev(sizeof(*dev));
+	desired_queues = ntb_netdev_default_queues();
+
+	ndev = alloc_etherdev_mq(sizeof(*dev), desired_queues);
 	if (!ndev)
 		return -ENOMEM;
 
@@ -420,6 +526,15 @@ static int ntb_netdev_probe(struct device *client_dev)
 	dev = netdev_priv(ndev);
 	dev->ndev = ndev;
 	dev->pdev = pdev;
+	dev->num_queues = 0;
+
+	dev->queues = kcalloc(desired_queues, sizeof(*dev->queues),
+			      GFP_KERNEL);
+	if (!dev->queues) {
+		rc = -ENOMEM;
+		goto err_free_netdev;
+	}
+
 	ndev->features = NETIF_F_HIGHDMA;
 
 	ndev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
@@ -436,26 +551,51 @@ static int ntb_netdev_probe(struct device *client_dev)
 	ndev->min_mtu = 0;
 	ndev->max_mtu = ETH_MAX_MTU;
 
-	dev->qp = ntb_transport_create_queue(ndev, client_dev,
-					     &ntb_netdev_handlers);
-	if (!dev->qp) {
+	for (q = 0; q < desired_queues; q++) {
+		struct ntb_netdev_queue *queue = &dev->queues[q];
+
+		queue->ntdev = dev;
+		queue->qid = q;
+		queue->qp = ntb_transport_create_queue(queue, client_dev,
+						       &ntb_netdev_handlers);
+		if (!queue->qp)
+			break;
+
+		dev->num_queues++;
+	}
+
+	if (!dev->num_queues) {
 		rc = -EIO;
-		goto err;
+		goto err_free_queues;
 	}
 
-	ndev->mtu = ntb_transport_max_size(dev->qp) - ETH_HLEN;
+	rc = netif_set_real_num_tx_queues(ndev, dev->num_queues);
+	if (rc)
+		goto err_free_qps;
+
+	rc = netif_set_real_num_rx_queues(ndev, dev->num_queues);
+	if (rc)
+		goto err_free_qps;
+
+	ndev->mtu = ntb_transport_max_size(dev->queues[0].qp) - ETH_HLEN;
 
 	rc = register_netdev(ndev);
 	if (rc)
-		goto err1;
+		goto err_free_qps;
 
 	dev_set_drvdata(client_dev, ndev);
-	dev_info(&pdev->dev, "%s created\n", ndev->name);
+	dev_info(&pdev->dev, "%s created with %u queue pairs\n",
+		 ndev->name, dev->num_queues);
 	return 0;
 
-err1:
-	ntb_transport_free_queue(dev->qp);
-err:
+err_free_qps:
+	for (q = 0; q < dev->num_queues; q++)
+		ntb_transport_free_queue(dev->queues[q].qp);
+
+err_free_queues:
+	kfree(dev->queues);
+
+err_free_netdev:
 	free_netdev(ndev);
 	return rc;
 }
@@ -464,9 +604,14 @@ static void ntb_netdev_remove(struct device *client_dev)
 {
 	struct net_device *ndev = dev_get_drvdata(client_dev);
 	struct ntb_netdev *dev = netdev_priv(ndev);
+	unsigned int q;
+
 
 	unregister_netdev(ndev);
-	ntb_transport_free_queue(dev->qp);
+	for (q = 0; q < dev->num_queues; q++)
+		ntb_transport_free_queue(dev->queues[q].qp);
+
+	kfree(dev->queues);
 	free_netdev(ndev);
 }
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v4 28/38] iommu: ipmmu-vmsa: Add PCIe ch0 to devices_allowlist
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (26 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 27/38] ntb_netdev: Multi-queue support Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 29/38] iommu: ipmmu-vmsa: Add support for reserved regions Koichiro Den
                   ` (10 subsequent siblings)
  38 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

Add the PCIe ch0 to the ipmmu-vmsa devices_allowlist so that traffic
routed through this PCIe instance can be translated by the IOMMU.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/iommu/ipmmu-vmsa.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index ca848288dbf2..724d67ad5ef2 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -743,7 +743,9 @@ static const char * const devices_allowlist[] = {
 	"ee100000.mmc",
 	"ee120000.mmc",
 	"ee140000.mmc",
-	"ee160000.mmc"
+	"ee160000.mmc",
+	"e65d0000.pcie",
+	"e65d0000.pcie-ep",
 };
 
 static bool ipmmu_device_is_allowed(struct device *dev)
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v4 29/38] iommu: ipmmu-vmsa: Add support for reserved regions
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (27 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 28/38] iommu: ipmmu-vmsa: Add PCIe ch0 to devices_allowlist Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 30/38] arm64: dts: renesas: Add Spider RC/EP DTs for NTB with remote DW PCIe eDMA Koichiro Den
                   ` (9 subsequent siblings)
  38 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

Add support for reserved regions using iommu_dma_get_resv_regions().

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/iommu/ipmmu-vmsa.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 724d67ad5ef2..4a89d95db0f8 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -25,6 +25,8 @@
 #include <linux/slab.h>
 #include <linux/sys_soc.h>
 
+#include "dma-iommu.h"
+
 #if defined(CONFIG_ARM) && !defined(CONFIG_IOMMU_DMA)
 #include <asm/dma-iommu.h>
 #else
@@ -888,6 +890,7 @@ static const struct iommu_ops ipmmu_ops = {
 	.device_group = IS_ENABLED(CONFIG_ARM) && !IS_ENABLED(CONFIG_IOMMU_DMA)
 			? generic_device_group : generic_single_device_group,
 	.of_xlate = ipmmu_of_xlate,
+	.get_resv_regions = iommu_dma_get_resv_regions,
 	.default_domain_ops = &(const struct iommu_domain_ops) {
 		.attach_dev	= ipmmu_attach_device,
 		.map_pages	= ipmmu_map,
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v4 30/38] arm64: dts: renesas: Add Spider RC/EP DTs for NTB with remote DW PCIe eDMA
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (28 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 29/38] iommu: ipmmu-vmsa: Add support for reserved regions Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 31/38] NTB: epf: Add per-SoC quirk to cap MRRS for DWC eDMA (128B for R-Car) Koichiro Den
                   ` (8 subsequent siblings)
  38 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

Add dedicated DTs for the Spider CPU+BreakOut boards when used in PCIe
RC/EP mode with DW PCIe eDMA based NTB transport.

 * r8a779f0-spider-rc.dts describes the board in RC mode.

   It reserves 4 MiB of IOVA starting at 0xfe000000, which on this SoC
   is the ECAM/Config aperture of the PCIe host bridge. In stress
   testing with the remote eDMA, allowing generic DMA mappings to occupy
   this range led to immediate instability. The exact mechanism is under
   investigation, but reserving the range avoids the issue in practice.

 * r8a779f0-spider-ep.dts describes the board in EP mode.

   The RC interface is disabled and the EP interface is enabled. IPMMU
   usage matches the RC case.

The base r8a779f0-spider.dts is intentionally left unchanged and
continues to describe the default RC-only board configuration.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 arch/arm64/boot/dts/renesas/Makefile          |  2 +
 .../boot/dts/renesas/r8a779f0-spider-ep.dts   | 37 +++++++++++++
 .../boot/dts/renesas/r8a779f0-spider-rc.dts   | 52 +++++++++++++++++++
 3 files changed, 91 insertions(+)
 create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-ep.dts
 create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-rc.dts

diff --git a/arch/arm64/boot/dts/renesas/Makefile b/arch/arm64/boot/dts/renesas/Makefile
index 1fab1b50f20e..e8d312be515b 100644
--- a/arch/arm64/boot/dts/renesas/Makefile
+++ b/arch/arm64/boot/dts/renesas/Makefile
@@ -82,6 +82,8 @@ dtb-$(CONFIG_ARCH_R8A77995) += r8a77995-draak-panel-aa104xd12.dtb
 dtb-$(CONFIG_ARCH_R8A779A0) += r8a779a0-falcon.dtb
 
 dtb-$(CONFIG_ARCH_R8A779F0) += r8a779f0-spider.dtb
+dtb-$(CONFIG_ARCH_R8A779F0) += r8a779f0-spider-ep.dtb
+dtb-$(CONFIG_ARCH_R8A779F0) += r8a779f0-spider-rc.dtb
 dtb-$(CONFIG_ARCH_R8A779F0) += r8a779f4-s4sk.dtb
 
 dtb-$(CONFIG_ARCH_R8A779G0) += r8a779g0-white-hawk.dtb
diff --git a/arch/arm64/boot/dts/renesas/r8a779f0-spider-ep.dts b/arch/arm64/boot/dts/renesas/r8a779f0-spider-ep.dts
new file mode 100644
index 000000000000..6753f8497d0d
--- /dev/null
+++ b/arch/arm64/boot/dts/renesas/r8a779f0-spider-ep.dts
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+/*
+ * Device Tree Source for the Spider CPU and BreakOut boards
+ * (PCIe EP mode with DW PCIe eDMA used for NTB transport)
+ *
+ * Based on the base r8a779f0-spider.dts.
+ *
+ * Copyright (C) 2025 Renesas Electronics Corp.
+ */
+
+/dts-v1/;
+#include "r8a779f0-spider-cpu.dtsi"
+#include "r8a779f0-spider-ethernet.dtsi"
+
+/ {
+	model = "Renesas Spider CPU and Breakout boards based on r8a779f0";
+	compatible = "renesas,spider-breakout", "renesas,spider-cpu",
+		     "renesas,r8a779f0";
+};
+
+&i2c4 {
+	eeprom@51 {
+		compatible = "rohm,br24g01", "atmel,24c01";
+		label = "breakout-board";
+		reg = <0x51>;
+		pagesize = <8>;
+	};
+};
+
+&pciec0 {
+	status = "disabled";
+};
+
+&pciec0_ep {
+	iommus = <&ipmmu_hc 32>;
+	status = "okay";
+};
diff --git a/arch/arm64/boot/dts/renesas/r8a779f0-spider-rc.dts b/arch/arm64/boot/dts/renesas/r8a779f0-spider-rc.dts
new file mode 100644
index 000000000000..c7112862e1e1
--- /dev/null
+++ b/arch/arm64/boot/dts/renesas/r8a779f0-spider-rc.dts
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+/*
+ * Device Tree Source for the Spider CPU and BreakOut boards
+ * (PCIe RC mode with remote DW PCIe eDMA used for NTB transport)
+ *
+ * Based on the base r8a779f0-spider.dts.
+ *
+ * Copyright (C) 2025 Renesas Electronics Corp.
+ */
+
+/dts-v1/;
+#include "r8a779f0-spider-cpu.dtsi"
+#include "r8a779f0-spider-ethernet.dtsi"
+
+/ {
+	model = "Renesas Spider CPU and Breakout boards based on r8a779f0";
+	compatible = "renesas,spider-breakout", "renesas,spider-cpu",
+		     "renesas,r8a779f0";
+
+	reserved-memory {
+		#address-cells = <2>;
+		#size-cells = <2>;
+		ranges;
+
+		/*
+		 * Reserve 4 MiB of IOVA starting at 0xfe000000. Allowing DMA
+		 * writes whose DAR (destination IOVA) falls numerically inside
+		 * the ECAM/config window has been observed to trigger
+		 * controller misbehavior.
+		 */
+		pciec0_iova_resv: pcie-iova-resv {
+			iommu-addresses = <&pciec0 0x0 0xfe000000 0x0 0x00400000>;
+		};
+	};
+};
+
+&i2c4 {
+	eeprom@51 {
+		compatible = "rohm,br24g01", "atmel,24c01";
+		label = "breakout-board";
+		reg = <0x51>;
+		pagesize = <8>;
+	};
+};
+
+&pciec0 {
+	iommus = <&ipmmu_hc 32>;
+	iommu-map = <0 &ipmmu_hc 32 1>;
+	iommu-map-mask = <0>;
+
+	memory-region = <&pciec0_iova_resv>;
+};
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v4 31/38] NTB: epf: Add per-SoC quirk to cap MRRS for DWC eDMA (128B for R-Car)
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (29 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 30/38] arm64: dts: renesas: Add Spider RC/EP DTs for NTB with remote DW PCIe eDMA Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 32/38] NTB: epf: Add an additional memory window (MW2) barno mapping on Renesas R-Car Koichiro Den
                   ` (7 subsequent siblings)
  38 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

Some R-Car platforms using Synopsys DesignWare PCIe with the integrated
eDMA exhibit reproducible payload corruption in RC->EP remote DMA read
traffic whenever the endpoint issues 256-byte Memory Read (MRd) TLPs.

The eDMA issues multiple MRd requests, each no larger than
min(MRRS, MPS), so constraining the endpoint's MRd request size removes
256-byte MRd TLPs and avoids the issue. Add a per-SoC knob to the
ntb_hw_epf driver and set MRRS=128 on R-Car.

We intentionally do not change the endpoint's MPS. Per PCIe Base
Specification, MPS limits the payload size of TLPs with data transmitted
by the Function, while Max_Read_Request_Size limits the size of read
requests produced by the Function as a Requester. Limiting MRRS is
sufficient to constrain MRd Byte Count, while lowering MPS would also
throttle unrelated traffic (e.g. endpoint-originated Posted Writes and
Completions with Data) without being necessary for this fix.
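
As an aside (not part of the patch), the segmentation behaviour described
above can be sketched in Python. The helper below is illustrative only;
real request splitting also depends on address alignment and other
controller details:

```python
# Illustrative model (not the kernel implementation): an endpoint DMA
# splits a read of 'length' bytes into MRd requests bounded by
# min(MRRS, MPS).
def mrd_request_sizes(length, mrrs, mps):
    max_req = min(mrrs, mps)
    sizes = []
    while length > 0:
        chunk = min(length, max_req)
        sizes.append(chunk)
        length -= chunk
    return sizes

# Default MRRS=256: 256-byte MRd TLPs are generated (the corrupting case).
print(mrd_request_sizes(512, 256, 256))   # [256, 256]
# MRRS capped to 128, as this quirk does on R-Car: no 256B MRd TLPs.
print(mrd_request_sizes(512, 128, 256))   # [128, 128, 128, 128]
```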

This quirk is scoped to the affected endpoint only and can be removed
once the underlying issue is resolved in the controller/IP.

Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/ntb/hw/epf/ntb_hw_epf.c | 66 +++++++++++++++++++++++++++++----
 1 file changed, 58 insertions(+), 8 deletions(-)

diff --git a/drivers/ntb/hw/epf/ntb_hw_epf.c b/drivers/ntb/hw/epf/ntb_hw_epf.c
index c37ede4063dc..2cefe46d2520 100644
--- a/drivers/ntb/hw/epf/ntb_hw_epf.c
+++ b/drivers/ntb/hw/epf/ntb_hw_epf.c
@@ -79,6 +79,12 @@ enum epf_ntb_bar {
 	NTB_BAR_NUM,
 };
 
+struct ntb_epf_soc_data {
+	const enum pci_barno *barno_map;
+	/* non-zero to override MRRS for this SoC */
+	int force_mrrs;
+};
+
 #define NTB_EPF_MAX_MW_COUNT	(NTB_BAR_NUM - BAR_MW1)
 
 struct ntb_epf_dev {
@@ -640,11 +646,12 @@ static int ntb_epf_init_dev(struct ntb_epf_dev *ndev)
 }
 
 static int ntb_epf_init_pci(struct ntb_epf_dev *ndev,
-			    struct pci_dev *pdev)
+			    struct pci_dev *pdev,
+			    const struct ntb_epf_soc_data *soc)
 {
 	struct device *dev = ndev->dev;
 	size_t spad_sz, spad_off;
-	int ret;
+	int ret, cur;
 
 	pci_set_drvdata(pdev, ndev);
 
@@ -662,6 +669,17 @@ static int ntb_epf_init_pci(struct ntb_epf_dev *ndev,
 
 	pci_set_master(pdev);
 
+	if (soc && pci_is_pcie(pdev) && soc->force_mrrs) {
+		cur = pcie_get_readrq(pdev);
+		ret = pcie_set_readrq(pdev, soc->force_mrrs);
+		if (ret)
+			dev_warn(&pdev->dev, "failed to set MRRS=%d: %d\n",
+				 soc->force_mrrs, ret);
+		else
+			dev_info(&pdev->dev, "capped MRRS: %d->%d for ntb-epf\n",
+				 cur, soc->force_mrrs);
+	}
+
 	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
 	if (ret) {
 		ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
@@ -737,6 +755,7 @@ static void ntb_epf_cleanup_isr(struct ntb_epf_dev *ndev)
 static int ntb_epf_pci_probe(struct pci_dev *pdev,
 			     const struct pci_device_id *id)
 {
+	const struct ntb_epf_soc_data *soc = (const void *)id->driver_data;
 	struct device *dev = &pdev->dev;
 	struct ntb_epf_dev *ndev;
 	int ret;
@@ -748,16 +767,16 @@ static int ntb_epf_pci_probe(struct pci_dev *pdev,
 	if (!ndev)
 		return -ENOMEM;
 
-	ndev->barno_map = (const enum pci_barno *)id->driver_data;
-	if (!ndev->barno_map)
+	if (!soc || !soc->barno_map)
 		return -EINVAL;
 
+	ndev->barno_map = soc->barno_map;
 	ndev->dev = dev;
 
 	ntb_epf_init_struct(ndev, pdev);
 	mutex_init(&ndev->cmd_lock);
 
-	ret = ntb_epf_init_pci(ndev, pdev);
+	ret = ntb_epf_init_pci(ndev, pdev, soc);
 	if (ret) {
 		dev_err(dev, "Failed to init PCI\n");
 		return ret;
@@ -829,21 +848,52 @@ static const enum pci_barno rcar_barno[NTB_BAR_NUM] = {
 	[BAR_MW4]	= NO_BAR,
 };
 
+static const struct ntb_epf_soc_data j721e_soc = {
+	.barno_map = j721e_map,
+};
+
+static const struct ntb_epf_soc_data mx8_soc = {
+	.barno_map = mx8_map,
+};
+
+static const struct ntb_epf_soc_data rcar_soc = {
+	.barno_map = rcar_barno,
+	/*
+	 * On some R-Car platforms using the Synopsys DWC PCIe + eDMA we
+	 * observe data corruption on RC->EP Remote DMA Read paths whenever
+	 * the EP issues large MRd requests. The corruption consistently
+	 * hits the tail of each 256-byte segment (e.g. offsets
+	 * 0x00E0..0x00FF within a 256B block, and again at 0x01E0..0x01FF
+	 * for larger transfers).
+	 *
+	 * The DMA injects multiple MRd requests of size less than or equal
+	 * to the min(MRRS, MPS) into the outbound request path. By
+	 * lowering MRRS to 128 we prevent 256B MRd TLPs from being
+	 * generated and avoid the issue on the affected hardware. We
+	 * intentionally keep MPS unchanged and scope this quirk to this
+	 * endpoint to avoid impacting unrelated devices.
+	 *
+	 * Remove this once the issue is resolved (maybe controller/IP
+	 * level) or a more preferable workaround becomes available.
+	 */
+	.force_mrrs = 128,
+};
+
 static const struct pci_device_id ntb_epf_pci_tbl[] = {
 	{
 		PCI_DEVICE(PCI_VENDOR_ID_TI, PCI_DEVICE_ID_TI_J721E),
 		.class = PCI_CLASS_MEMORY_RAM << 8, .class_mask = 0xffff00,
-		.driver_data = (kernel_ulong_t)j721e_map,
+		.driver_data = (kernel_ulong_t)&j721e_soc,
 	},
 	{
 		PCI_DEVICE(PCI_VENDOR_ID_FREESCALE, 0x0809),
 		.class = PCI_CLASS_MEMORY_RAM << 8, .class_mask = 0xffff00,
-		.driver_data = (kernel_ulong_t)mx8_map,
+		.driver_data = (kernel_ulong_t)&mx8_soc,
 	},
 	{
 		PCI_DEVICE(PCI_VENDOR_ID_RENESAS, 0x0030),
 		.class = PCI_CLASS_MEMORY_RAM << 8, .class_mask = 0xffff00,
-		.driver_data = (kernel_ulong_t)rcar_barno,
+		.driver_data = (kernel_ulong_t)&rcar_soc,
 	},
 	{ },
 };
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v4 32/38] NTB: epf: Add an additional memory window (MW2) barno mapping on Renesas R-Car
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (30 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 31/38] NTB: epf: Add per-SoC quirk to cap MRRS for DWC eDMA (128B for R-Car) Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 33/38] Documentation: PCI: endpoint: pci-epf-vntb: Update and add mwN_offset usage Koichiro Den
                   ` (6 subsequent siblings)
  38 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

To enable remote eDMA mode in the NTB transport, one additional memory
window is required. Since a single BAR can now be split into multiple
memory windows, map MW2 onto BAR2 on R-Car.

For pci_epf_vntb configfs settings, users who want to use MW2 (e.g. to
enable remote eDMA mode for NTB transport as mentioned above) may
configure as follows:

  $ echo 2       > functions/pci_epf_vntb/func1/pci_epf_vntb.0/num_mws
  $ echo 0xE0000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1
  $ echo 0x20000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2
  $ echo 0xE0000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_offset
  $ echo 2       > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1_bar
  $ echo 2       > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_bar
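
As a side note (not part of the patch), the packing constraint that makes
the configuration above valid can be sketched as follows; the helper name
is hypothetical, and a 1 MiB BAR2 is assumed:

```python
# Hypothetical checker for memory windows sharing one BAR: windows must
# not overlap and must fit within the BAR size (values mirror the
# configfs example above).
def mws_fit_in_bar(mws, bar_size):
    """mws: iterable of (offset, size) pairs within a single BAR."""
    end = 0
    for offset, size in sorted(mws):
        if offset < end:       # overlaps the previous window
            return False
        end = offset + size
    return end <= bar_size

# MW1 (0xE0000 bytes at offset 0) plus MW2 (0x20000 bytes at 0xE0000)
# exactly fill an assumed 1 MiB BAR2.
print(mws_fit_in_bar([(0x0, 0xE0000), (0xE0000, 0x20000)], 0x100000))  # True
```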

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/ntb/hw/epf/ntb_hw_epf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/ntb/hw/epf/ntb_hw_epf.c b/drivers/ntb/hw/epf/ntb_hw_epf.c
index 2cefe46d2520..007c93e34398 100644
--- a/drivers/ntb/hw/epf/ntb_hw_epf.c
+++ b/drivers/ntb/hw/epf/ntb_hw_epf.c
@@ -843,7 +843,7 @@ static const enum pci_barno rcar_barno[NTB_BAR_NUM] = {
 	[BAR_PEER_SPAD]	= BAR_0,
 	[BAR_DB]	= BAR_4,
 	[BAR_MW1]	= BAR_2,
-	[BAR_MW2]	= NO_BAR,
+	[BAR_MW2]	= BAR_2,
 	[BAR_MW3]	= NO_BAR,
 	[BAR_MW4]	= NO_BAR,
 };
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v4 33/38] Documentation: PCI: endpoint: pci-epf-vntb: Update and add mwN_offset usage
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (31 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 32/38] NTB: epf: Add an additional memory window (MW2) barno mapping on Renesas R-Car Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 34/38] Documentation: driver-api: ntb: Document remote embedded-DMA transport Koichiro Den
                   ` (5 subsequent siblings)
  38 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

Add a concrete example showing how to pack multiple memory windows into
a single BAR by using 'mwN_bar' and 'mwN_offset'.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 Documentation/PCI/endpoint/pci-vntb-howto.rst | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/Documentation/PCI/endpoint/pci-vntb-howto.rst b/Documentation/PCI/endpoint/pci-vntb-howto.rst
index 3679f5c30254..097826f946a9 100644
--- a/Documentation/PCI/endpoint/pci-vntb-howto.rst
+++ b/Documentation/PCI/endpoint/pci-vntb-howto.rst
@@ -90,9 +90,9 @@ of the function device and is populated with the following NTB specific
 attributes that can be configured by the user::
 
 	# ls functions/pci_epf_vntb/func1/pci_epf_vntb.0/
-	ctrl_bar  db_count  mw1_bar  mw2_bar  mw3_bar  mw4_bar	spad_count
-	db_bar	  mw1	    mw2      mw3      mw4      num_mws	vbus_number
-	vntb_vid  vntb_pid
+	ctrl_bar  mw1         mw2         mw3         mw4         num_mws      vntb_pid
+	db_bar    mw1_bar     mw2_bar     mw3_bar     mw4_bar     spad_count   vntb_vid
+	db_count  mw1_offset  mw2_offset  mw3_offset  mw4_offset  vbus_number
 
 A sample configuration for NTB function is given below::
 
@@ -111,6 +111,19 @@ A sample configuration for virtual NTB driver for virtual PCI bus::
 	# echo 0x080A > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_pid
 	# echo 0x10 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vbus_number
 
+When BAR resources are tight but you still need to create many memory
+windows, you can pack multiple windows into a single BAR by setting
+``mwN_bar`` to the same BAR number and using ``mwN_offset`` to place each
+MW within that BAR. Offsets are in bytes and the resulting regions must not
+overlap and must exactly fit within the BAR size. This may fail depending
+on the underlying EPC capabilities::
+
+	# echo 0xE0000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1
+	# echo 0x20000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2
+	# echo 0xE0000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_offset
+	# echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1_bar
+	# echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_bar
+
 Binding pci-epf-vntb Device to EP Controller
 --------------------------------------------
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v4 34/38] Documentation: driver-api: ntb: Document remote embedded-DMA transport
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (32 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 33/38] Documentation: PCI: endpoint: pci-epf-vntb: Update and add mwN_offset usage Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 35/38] PCI: endpoint: pci-epf-test: Add pci_epf_test_next_free_bar() helper Koichiro Den
                   ` (4 subsequent siblings)
  38 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

The NTB transport code is split into a common library
(ntb_transport_core) and NTB client modules.

Document the two transport variants:
- ntb_transport: legacy shared-memory rings (CPU/local DMA memcpy)
- ntb_transport_edma: remote embedded-DMA data plane

Also describe how to select the desired driver (module load order or
driver_override binding) and add data-flow diagrams for both directions.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 Documentation/driver-api/ntb.rst | 193 +++++++++++++++++++++++++++++++
 1 file changed, 193 insertions(+)

diff --git a/Documentation/driver-api/ntb.rst b/Documentation/driver-api/ntb.rst
index a49c41383779..75f96726c373 100644
--- a/Documentation/driver-api/ntb.rst
+++ b/Documentation/driver-api/ntb.rst
@@ -132,6 +132,199 @@ Transport queue pair.  Network data is copied between socket buffers and the
 Transport queue pair buffer.  The Transport client may be used for other things
 besides Netdev, however no other applications have yet been written.
 
+Transport variants
+~~~~~~~~~~~~~~~~~~
+
+The ``ntb_transport`` module is a thin NTB client driver. Most of its
+functionality is implemented in the ``ntb_transport_core`` library module,
+which provides a "queue pair" abstraction to transport clients such as
+``ntb_netdev``. Another transport variant, ``ntb_transport_edma``, relies
+on an endpoint embedded DMA engine for the data plane. When
+``ntb_transport_edma`` is loaded before ``ntb_transport``, or when an NTB
+device is explicitly bound to ``ntb_transport_edma`` via sysfs, it will be
+selected. Only one transport driver can bind to a given NTB device, and the
+upper layer does not need to care which variant is active::
+
+                       +--------------------+
+                       | ntb_transport_core |
+                       +--------------------+
+                         ^                ^
+                         |                |
+      ntb_transport -----+                +----- ntb_transport_edma
+     (cpu/dma memcpy)                       (remote embedded DMA transfer)
+                                                         |
+                                                         v
+                                                   +-----------+
+                                                   |  ntb_edma |
+                                                   +-----------+
+                                                         ^
+                                                         |
+                                                 +----------------+
+                                                 |                |
+                                            ntb_dw_edma         [...]
+
+
+Legacy shared-memory backend (``ntb_transport``)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The default backend uses the NTB memory windows as the data plane. For a TX,
+the payload is copied into a window-backed ring buffer and the receiver copies
+it back out. Copying is performed by the CPU or by a local DMA engine when the
+``use_dma`` module parameter is set.
+
+This mode is widely applicable but is sensitive to memory window size, as
+each ring entry must hold an entire MTU-sized payload. It also requires one
+extra memcpy on each end, unlike the remote embedded DMA backend described
+below.
+
+Remote embedded DMA backend (``ntb_transport_edma``)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The remote embedded DMA backend moves payload data directly between the two
+systems' memories, and uses the NTB memory windows only for control
+structures and for exposing the endpoint PCI embedded DMA engine to the
+host. It is provided by the ``ntb_transport_edma`` module.
+
+The current implementation supports Synopsys DesignWare PCIe embedded DMA
+(eDMA) via the ``ntb_dw_edma`` module, currently the only remote embedded
+DMA backend option, together with the ``dw-edma`` DMA-engine driver
+(``drivers/dma/dw-edma``). The transport is not inherently tied to
+DesignWare: additional vendor-specific backends can be added by registering
+an ``ntb_edma_backend`` implementation (see ``[...]`` in the figure above).
+
+At a high level:
+
+* One memory window is reserved as the "eDMA window". The endpoint maps its DMA
+  register block and linked-list descriptor memory into that window so the
+  host can ioremap it.
+
+* The remaining memory windows contain small per-QP control rings used to
+  exchange receive-buffer addresses and completion information.
+
+* For RC->EP traffic the RC controls the endpoint DMA read channels through the
+  eDMA window and the DMA engine pulls from RC memory into an EP RX buffer.
+
+* For EP->RC traffic the endpoint uses its local DMA write channels to push into
+  an RC RX buffer.
+
+Because the data plane no longer uses window-backed payload rings, this mode
+scales better when window space is scarce (for example, when using many queue
+pairs).
+
+The following figures illustrate the data flow when ``ntb_netdev`` sits on top
+of the transport:
+
+::
+
+       Figure 1. RC->EP traffic via ntb_netdev + ntb_transport_edma
+                         backed by ntb_dw_edma
+
+             EP                                   RC
+          phys addr                            phys addr
+            space                                space
+             +-+                                  +-+
+             | |                                  | |
+             | |                ||                | |
+             +-+-----.          ||                | |
+    EDMA REG | |      \\    [A] ||                | |
+             +-+----.  '---+-+  ||                | |
+             | |     \\    | |<---------[0-a]----------
+             +-+-----------| |<----------[2]----------.
+     EDMA LL | |           | |  ||                | | :
+             | |           | |  ||                | | :
+             +-+-----------+-+  ||  [B]           | | :
+             | |                ||  ++            | | :
+          ---------[0-b]----------->||----------------'
+             | |            ++  ||  ||            | |
+             | |            ||  ||  ++            | |
+             | |            ||<----------[4]-----------
+             | |            ++  ||                | |
+             | |           [C]  ||                | |
+          .--|#|<------------------------[3]------|#|<-.
+          :  |#|                ||                |#|  :
+         [5] | |                ||                | | [1]
+          :  | |                ||                | |  :
+          '->|#|                                  |#|--'
+             |#|                                  |#|
+             | |                                  | |
+
+       Figure 2. EP->RC traffic via ntb_netdev + ntb_transport_edma
+                        backed by ntb_dw_edma
+
+             EP                                   RC
+          phys addr                            phys addr
+            space                                space
+             +-+                                  +-+
+             | |                                  | |
+             | |                ||                | |
+             +-+                ||                | |
+    EDMA REG | |                ||                | |
+             +-+                ||                | |
+    ^        | |                ||                | |
+    :        +-+                ||                | |
+    : EDMA LL| |                ||                | |
+    :        | |                ||                | |
+    :        +-+                ||  [C]           | |
+    :        | |                ||  ++            | |
+    :     -----------[4]----------->||            | |
+    :        | |            ++  ||  ||            | |
+    :        | |            ||  ||  ++            | |
+    '----------------[2]-----||<--------[0-b]-----------
+             | |            ++  ||                | |
+             | |           [B]  ||                | |
+          .->|#|--------[3]---------------------->|#|--.
+          :  |#|                ||                |#|  :
+         [1] | |                ||                | | [5]
+          :  | |                ||                | |  :
+          '--|#|                                  |#|<-'
+             |#|                                  |#|
+             | |                                  | |
+
+       0-a. configure remote embedded DMA (e.g. program endpoint DMA registers)
+       0-b. DMA-map and publish destination address (DAR)
+       1.   network stack builds skb (copy from application/user memory)
+       2.   consume DAR, DMA-map source address (SAR) and start the DMA transfer
+       3.   DMA transfer (payload moves between RC/EP memory)
+       4.   consume completion (commit)
+       5.   network stack delivers data to application/user memory
+
+       [A]: Dedicated MW that aggregates DMA regs and LL (peer ioremaps it)
+       [B]: Control-plane ring buffer for "produce"
+       [C]: Control-plane ring buffer for "consume"
+
+Enabling the remote embedded DMA transport requires:
+
+* ``CONFIG_NTB_TRANSPORT`` and ``CONFIG_NTB_TRANSPORT_EDMA``,
+
+* a matching embedded-DMA backend enabled and/or loaded (e.g.
+  ``CONFIG_NTB_DW_EDMA``),
+
+* an endpoint configuration exposing an extra Memory Window which, in the
+  ``ntb_dw_edma`` case, carries the eDMA registers and LL region; this
+  means at least two Memory Windows (MW1 and MW2) must be present,
+
+* loading ``ntb_transport_edma`` (instead of ``ntb_transport``) on both
+  sides, or explicitly binding to ``ntb_transport_edma`` when both are
+  loaded, for example::
+
+      dev=<ntb device>  # pick one from: /sys/bus/ntb/devices/
+
+      # switch from ntb_transport -> ntb_transport_edma
+      echo $dev > /sys/bus/ntb/drivers/ntb_transport/unbind
+      echo ntb_transport_edma > /sys/bus/ntb/devices/$dev/driver_override
+      echo $dev > /sys/bus/ntb/drivers/ntb_transport_edma/bind
+
+      # switch back (optional)
+      echo $dev > /sys/bus/ntb/drivers/ntb_transport_edma/unbind
+      echo ntb_transport > /sys/bus/ntb/devices/$dev/driver_override
+      echo $dev > /sys/bus/ntb/drivers/ntb_transport/bind
+
+The remote embedded DMA mode uses a different memory window layout from the
+legacy shared-memory transport. There is no automatic fallback at runtime:
+if the endpoint does not expose a compatible eDMA window,
+``ntb_transport_edma`` will fail to attach. In that case, users need to
+manually switch back to ``ntb_transport``.
+
 NTB Ping Pong Test Client (ntb\_pingpong)
 -----------------------------------------
 
-- 
2.51.0



* [RFC PATCH v4 35/38] PCI: endpoint: pci-epf-test: Add pci_epf_test_next_free_bar() helper
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (33 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 34/38] Documentation: driver-api: ntb: Document remote embedded-DMA transport Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 36/38] PCI: endpoint: pci-epf-test: Add remote eDMA-backed mode Koichiro Den
                   ` (3 subsequent siblings)
  38 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

Introduce pci_epf_test_next_free_bar(), a small helper that wraps
pci_epc_get_next_free_bar() and tracks the next starting BAR number.
Use it for selecting the test register BAR and the doorbell BAR.

An upcoming extension needs to reserve an additional BAR; this helper
keeps the code compact by centralizing the BAR selection logic.

No functional change intended.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/pci/endpoint/functions/pci-epf-test.c | 20 ++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/endpoint/functions/pci-epf-test.c b/drivers/pci/endpoint/functions/pci-epf-test.c
index 6ecbc2c2ff36..e560c3becebb 100644
--- a/drivers/pci/endpoint/functions/pci-epf-test.c
+++ b/drivers/pci/endpoint/functions/pci-epf-test.c
@@ -64,6 +64,7 @@ struct pci_epf_test {
 	void			*reg[PCI_STD_NUM_BARS];
 	struct pci_epf		*epf;
 	enum pci_barno		test_reg_bar;
+	enum pci_barno		next_free_bar;
 	size_t			msix_table_offset;
 	struct delayed_work	cmd_handler;
 	struct dma_chan		*dma_chan_tx;
@@ -104,6 +105,18 @@ static struct pci_epf_header test_header = {
 
 static size_t bar_size[] = { 512, 512, 1024, 16384, 131072, 1048576 };
 
+static enum pci_barno pci_epf_test_next_free_bar(struct pci_epf_test *epf_test)
+{
+	enum pci_barno bar;
+
+	bar = pci_epc_get_next_free_bar(epf_test->epc_features,
+					epf_test->next_free_bar);
+	if (bar != NO_BAR)
+		epf_test->next_free_bar = bar + 1;
+
+	return bar;
+}
+
 static void pci_epf_test_dma_callback(void *param)
 {
 	struct pci_epf_test *epf_test = param;
@@ -721,7 +734,7 @@ static void pci_epf_test_enable_doorbell(struct pci_epf_test *epf_test,
 		goto set_status_err;
 
 	msg = &epf->db_msg[0].msg;
-	bar = pci_epc_get_next_free_bar(epf_test->epc_features, epf_test->test_reg_bar + 1);
+	bar = pci_epf_test_next_free_bar(epf_test);
 	if (bar < BAR_0)
 		goto err_doorbell_cleanup;
 
@@ -1110,13 +1123,14 @@ static int pci_epf_test_bind(struct pci_epf *epf)
 		dev_err(&epf->dev, "epc_features not implemented\n");
 		return -EOPNOTSUPP;
 	}
+	epf_test->epc_features = epc_features;
+	epf_test->next_free_bar = BAR_0;
 
-	test_reg_bar = pci_epc_get_first_free_bar(epc_features);
+	test_reg_bar = pci_epf_test_next_free_bar(epf_test);
 	if (test_reg_bar < 0)
 		return -EINVAL;
 
 	epf_test->test_reg_bar = test_reg_bar;
-	epf_test->epc_features = epc_features;
 
 	ret = pci_epf_test_alloc_space(epf);
 	if (ret)
-- 
2.51.0



* [RFC PATCH v4 36/38] PCI: endpoint: pci-epf-test: Add remote eDMA-backed mode
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (34 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 35/38] PCI: endpoint: pci-epf-test: Add pci_epf_test_next_free_bar() helper Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-19 20:47   ` Frank Li
  2026-01-18 13:54 ` [RFC PATCH v4 37/38] misc: pci_endpoint_test: Add remote eDMA transfer test mode Koichiro Den
                   ` (2 subsequent siblings)
  38 siblings, 1 reply; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

Some DesignWare-based endpoints integrate an eDMA engine that can be
programmed by the host via MMIO. The upcoming NTB transport remote-eDMA
backend relies on this capability, but there is currently no upstream
test coverage for the end-to-end control and data path.

Extend pci-epf-test with an optional remote eDMA test backend (built when
CONFIG_DW_EDMA is enabled).

- Reserve a spare BAR and expose a small 'pcitest_edma_info' header at
  BAR offset 0. The header carries a magic/version and describes the
  endpoint eDMA register window, per-direction linked-list (LL)
  locations and an endpoint test buffer.
- Map the eDMA registers and LL locations into that BAR using BAR
  subrange mappings (address-match inbound iATU).

To run this extra testing, two new endpoint commands are added:
  * COMMAND_REMOTE_EDMA_SETUP
  * COMMAND_REMOTE_EDMA_CHECKSUM

When the SETUP command is received, the endpoint prepares for the remote
eDMA transfer. The CHECKSUM command is useful for host-to-EP transfer
testing, since the endpoint side is not expected to receive the DMA
completion interrupt directly. Instead, the host asks the endpoint to
compute a CRC32 over the transferred data.

This backend is exercised by the host-side pci_endpoint_test driver via a
new UAPI flag.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/pci/endpoint/functions/pci-epf-test.c | 477 ++++++++++++++++++
 1 file changed, 477 insertions(+)

diff --git a/drivers/pci/endpoint/functions/pci-epf-test.c b/drivers/pci/endpoint/functions/pci-epf-test.c
index e560c3becebb..eea10bddcd2a 100644
--- a/drivers/pci/endpoint/functions/pci-epf-test.c
+++ b/drivers/pci/endpoint/functions/pci-epf-test.c
@@ -10,6 +10,7 @@
 #include <linux/delay.h>
 #include <linux/dmaengine.h>
 #include <linux/io.h>
+#include <linux/iommu.h>
 #include <linux/module.h>
 #include <linux/msi.h>
 #include <linux/slab.h>
@@ -33,6 +34,8 @@
 #define COMMAND_COPY			BIT(5)
 #define COMMAND_ENABLE_DOORBELL		BIT(6)
 #define COMMAND_DISABLE_DOORBELL	BIT(7)
+#define COMMAND_REMOTE_EDMA_SETUP	BIT(8)
+#define COMMAND_REMOTE_EDMA_CHECKSUM	BIT(9)
 
 #define STATUS_READ_SUCCESS		BIT(0)
 #define STATUS_READ_FAIL		BIT(1)
@@ -48,6 +51,10 @@
 #define STATUS_DOORBELL_ENABLE_FAIL	BIT(11)
 #define STATUS_DOORBELL_DISABLE_SUCCESS BIT(12)
 #define STATUS_DOORBELL_DISABLE_FAIL	BIT(13)
+#define STATUS_REMOTE_EDMA_SETUP_SUCCESS	BIT(14)
+#define STATUS_REMOTE_EDMA_SETUP_FAIL		BIT(15)
+#define STATUS_REMOTE_EDMA_CHECKSUM_SUCCESS	BIT(16)
+#define STATUS_REMOTE_EDMA_CHECKSUM_FAIL	BIT(17)
 
 #define FLAG_USE_DMA			BIT(0)
 
@@ -77,6 +84,9 @@ struct pci_epf_test {
 	bool			dma_private;
 	const struct pci_epc_features *epc_features;
 	struct pci_epf_bar	db_bar;
+
+	/* For extended tests that rely on vendor-specific features */
+	void *data;
 };
 
 struct pci_epf_test_reg {
@@ -117,6 +127,454 @@ static enum pci_barno pci_epf_test_next_free_bar(struct pci_epf_test *epf_test)
 	return bar;
 }
 
+#if IS_REACHABLE(CONFIG_DW_EDMA)
+#include <linux/dma/edma.h>
+
+#define PCITEST_EDMA_INFO_MAGIC		0x414d4445U /* 'EDMA' */
+#define PCITEST_EDMA_INFO_VERSION	0x00010000U
+#define PCITEST_EDMA_TEST_BUF_SIZE	(1024 * 1024)
+
+struct pci_epf_test_edma {
+	/* Remote eDMA test resources */
+	bool			enabled;
+	enum pci_barno		bar;
+	void			*info;
+	size_t			total_size;
+	void			*test_buf;
+	dma_addr_t		test_buf_phys;
+	size_t			test_buf_size;
+
+	/* DW eDMA specifics */
+	phys_addr_t		reg_phys;
+	size_t			reg_submap_sz;
+	unsigned long		reg_iova;
+	size_t			reg_iova_sz;
+	phys_addr_t		ll_rd_phys;
+	size_t			ll_rd_sz_aligned;
+	phys_addr_t		ll_wr_phys;
+	size_t			ll_wr_sz_aligned;
+};
+
+struct pcitest_edma_info {
+	__le32 magic;
+	__le32 version;
+
+	__le32 reg_off;
+	__le32 reg_size;
+
+	__le64 ll_rd_phys;
+	__le32 ll_rd_off;
+	__le32 ll_rd_size;
+
+	__le64 ll_wr_phys;
+	__le32 ll_wr_off;
+	__le32 ll_wr_size;
+
+	__le64 test_buf_phys;
+	__le32 test_buf_size;
+};
+
+static bool pci_epf_test_bar_is_reserved(struct pci_epf_test *test,
+					 enum pci_barno barno)
+{
+	struct pci_epf_test_edma *edma = test->data;
+
+	if (!edma)
+		return false;
+
+	return barno == edma->bar;
+}
+
+static void pci_epf_test_clear_submaps(struct pci_epf_bar *bar)
+{
+	kfree(bar->submap);
+	bar->submap = NULL;
+	bar->num_submap = 0;
+}
+
+static int pci_epf_test_add_submap(struct pci_epf_bar *bar, phys_addr_t phys,
+				   size_t size)
+{
+	struct pci_epf_bar_submap *submap, *new;
+
+	new = krealloc_array(bar->submap, bar->num_submap + 1, sizeof(*new),
+			     GFP_KERNEL);
+	if (!new)
+		return -ENOMEM;
+
+	bar->submap = new;
+	submap = &bar->submap[bar->num_submap];
+	submap->phys_addr = phys;
+	submap->size = size;
+	bar->num_submap++;
+
+	return 0;
+}
+
+static void pci_epf_test_clean_remote_edma(struct pci_epf_test *test)
+{
+	struct pci_epf_test_edma *edma = test->data;
+	struct pci_epf *epf = test->epf;
+	struct pci_epc *epc = epf->epc;
+	struct device *dev = epc->dev.parent;
+	struct iommu_domain *dom;
+	struct pci_epf_bar *bar;
+	enum pci_barno barno;
+
+	if (!edma)
+		return;
+
+	barno = edma->bar;
+	if (barno == NO_BAR)
+		return;
+
+	bar = &epf->bar[barno];
+
+	dom = iommu_get_domain_for_dev(dev);
+	if (dom && edma->reg_iova_sz) {
+		iommu_unmap(dom, edma->reg_iova, edma->reg_iova_sz);
+		edma->reg_iova = 0;
+		edma->reg_iova_sz = 0;
+	}
+
+	if (edma->test_buf) {
+		dma_free_coherent(dev, edma->test_buf_size,
+				  edma->test_buf,
+				  edma->test_buf_phys);
+		edma->test_buf = NULL;
+		edma->test_buf_phys = 0;
+		edma->test_buf_size = 0;
+	}
+
+	if (edma->info) {
+		pci_epf_free_space(epf, edma->info, barno, PRIMARY_INTERFACE);
+		edma->info = NULL;
+	}
+
+	pci_epf_test_clear_submaps(bar);
+	pci_epc_clear_bar(epc, epf->func_no, epf->vfunc_no, bar);
+
+	edma->bar = NO_BAR;
+	edma->enabled = false;
+}
+
+static int pci_epf_test_init_remote_edma(struct pci_epf_test *test)
+{
+	const struct pci_epc_features *epc_features = test->epc_features;
+	struct pci_epf_test_edma *edma;
+	struct pci_epf *epf = test->epf;
+	struct pci_epc *epc = epf->epc;
+	struct pcitest_edma_info *info;
+	struct device *dev = epc->dev.parent;
+	struct dw_edma_region region;
+	struct iommu_domain *dom;
+	size_t reg_sz_aligned, ll_rd_sz_aligned, ll_wr_sz_aligned;
+	phys_addr_t phys, ll_rd_phys, ll_wr_phys;
+	size_t ll_rd_size, ll_wr_size;
+	resource_size_t reg_size;
+	unsigned long iova;
+	size_t off, size;
+	int ret;
+
+	if (!test->dma_chan_tx || !test->dma_chan_rx)
+		return -ENODEV;
+
+	edma = devm_kzalloc(&epf->dev, sizeof(*edma), GFP_KERNEL);
+	if (!edma)
+		return -ENOMEM;
+	test->data = edma;
+
+	edma->bar = pci_epf_test_next_free_bar(test);
+	if (edma->bar == NO_BAR) {
+		dev_err(&epf->dev, "No spare BAR for remote eDMA (remote eDMA disabled)\n");
+		ret = -ENOSPC;
+		goto err;
+	}
+
+	ret = dw_edma_get_reg_window(epc, &edma->reg_phys, &reg_size);
+	if (ret) {
+		dev_err(dev, "failed to get edma reg window: %d\n", ret);
+		goto err;
+	}
+	dom = iommu_get_domain_for_dev(dev);
+	if (dom) {
+		phys = edma->reg_phys & PAGE_MASK;
+		size = PAGE_ALIGN(reg_size + edma->reg_phys - phys);
+		iova = phys;
+
+		ret = iommu_map(dom, iova, phys, size,
+				IOMMU_READ | IOMMU_WRITE | IOMMU_MMIO,
+				GFP_KERNEL);
+		if (ret) {
+			dev_err(dev, "failed to direct map eDMA reg: %d\n", ret);
+			goto err;
+		}
+		edma->reg_iova = iova;
+		edma->reg_iova_sz = size;
+	}
+
+	/* Get LL location addresses and sizes */
+	ret = dw_edma_chan_get_ll_region(test->dma_chan_rx, &region);
+	if (ret) {
+		dev_err(dev, "failed to get edma ll region for rx: %d\n", ret);
+		goto err;
+	}
+	ll_rd_phys = region.paddr;
+	ll_rd_size = region.sz;
+
+	ret = dw_edma_chan_get_ll_region(test->dma_chan_tx, &region);
+	if (ret) {
+		dev_err(dev, "failed to get edma ll region for tx: %d\n", ret);
+		goto err;
+	}
+	ll_wr_phys = region.paddr;
+	ll_wr_size = region.sz;
+
+	edma->test_buf_size = PCITEST_EDMA_TEST_BUF_SIZE;
+	edma->test_buf = dma_alloc_coherent(dev, edma->test_buf_size,
+					    &edma->test_buf_phys, GFP_KERNEL);
+	if (!edma->test_buf) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	reg_sz_aligned = PAGE_ALIGN(reg_size);
+	ll_rd_sz_aligned = PAGE_ALIGN(ll_rd_size);
+	ll_wr_sz_aligned = PAGE_ALIGN(ll_wr_size);
+	edma->total_size = PAGE_SIZE + reg_sz_aligned + ll_rd_sz_aligned +
+			   ll_wr_sz_aligned;
+	size = roundup_pow_of_two(edma->total_size);
+
+	info = pci_epf_alloc_space(epf, size, edma->bar,
+				   epc_features, PRIMARY_INTERFACE);
+	if (!info) {
+		ret = -ENOMEM;
+		goto err;
+	}
+	memset(info, 0, size);
+
+	off = PAGE_SIZE;
+	info->magic = cpu_to_le32(PCITEST_EDMA_INFO_MAGIC);
+	info->version = cpu_to_le32(PCITEST_EDMA_INFO_VERSION);
+
+	info->reg_off = cpu_to_le32(off);
+	info->reg_size = cpu_to_le32(reg_size);
+	off += reg_sz_aligned;
+
+	info->ll_rd_phys = cpu_to_le64(ll_rd_phys);
+	info->ll_rd_off = cpu_to_le32(off);
+	info->ll_rd_size = cpu_to_le32(ll_rd_size);
+	off += ll_rd_sz_aligned;
+
+	info->ll_wr_phys = cpu_to_le64(ll_wr_phys);
+	info->ll_wr_off = cpu_to_le32(off);
+	info->ll_wr_size = cpu_to_le32(ll_wr_size);
+	off += ll_wr_sz_aligned;
+
+	info->test_buf_phys = cpu_to_le64(edma->test_buf_phys);
+	info->test_buf_size = cpu_to_le32(edma->test_buf_size);
+
+	edma->info = info;
+	edma->reg_submap_sz = reg_sz_aligned;
+	edma->ll_rd_phys = ll_rd_phys;
+	edma->ll_wr_phys = ll_wr_phys;
+	edma->ll_rd_sz_aligned = ll_rd_sz_aligned;
+	edma->ll_wr_sz_aligned = ll_wr_sz_aligned;
+
+	ret = pci_epc_set_bar(epc, epf->func_no, epf->vfunc_no,
+			      &epf->bar[edma->bar]);
+	if (ret) {
+		dev_err(dev,
+			"failed to init BAR%d for remote eDMA: %d\n",
+			edma->bar, ret);
+		goto err;
+	}
+	dev_info(dev, "BAR%d initialized for remote eDMA\n", edma->bar);
+
+	return 0;
+
+err:
+	pci_epf_test_clean_remote_edma(test);
+	devm_kfree(&epf->dev, edma);
+	test->data = NULL;
+	return ret;
+}
+
+static int pci_epf_test_map_remote_edma(struct pci_epf_test *test)
+{
+	struct pci_epf_test_edma *edma = test->data;
+	struct pcitest_edma_info *info;
+	struct pci_epf *epf = test->epf;
+	struct pci_epc *epc = epf->epc;
+	struct pci_epf_bar *bar;
+	enum pci_barno barno;
+	struct device *dev = epc->dev.parent;
+	int ret;
+
+	if (!edma)
+		return -ENODEV;
+
+	info = edma->info;
+	barno = edma->bar;
+
+	if (barno == NO_BAR)
+		return -ENOSPC;
+	if (!info || !edma->test_buf)
+		return -ENODEV;
+
+	bar = &epf->bar[barno];
+	pci_epf_test_clear_submaps(bar);
+
+	ret = pci_epf_test_add_submap(bar, bar->phys_addr, PAGE_SIZE);
+	if (ret)
+		return ret;
+
+	ret = pci_epf_test_add_submap(bar, edma->reg_phys, edma->reg_submap_sz);
+	if (ret)
+		goto err_submap;
+
+	ret = pci_epf_test_add_submap(bar, edma->ll_rd_phys,
+				      edma->ll_rd_sz_aligned);
+	if (ret)
+		goto err_submap;
+
+	ret = pci_epf_test_add_submap(bar, edma->ll_wr_phys,
+				      edma->ll_wr_sz_aligned);
+	if (ret)
+		goto err_submap;
+
+	if (bar->size > edma->total_size) {
+		ret = pci_epf_test_add_submap(bar, 0,
+					      bar->size - edma->total_size);
+		if (ret)
+			goto err_submap;
+	}
+
+	ret = pci_epc_set_bar(epc, epf->func_no, epf->vfunc_no, bar);
+	if (ret) {
+		dev_err(dev, "failed to map BAR%d: %d\n", barno, ret);
+		goto err_submap;
+	}
+
+	/*
+	 * Endpoint-local interrupts must be ignored even if the host fails to
+	 * mask them.
+	 */
+	ret = dw_edma_chan_irq_config(test->dma_chan_tx, DW_EDMA_CH_IRQ_REMOTE);
+	if (ret) {
+		dev_err(dev, "failed to set irq mode for tx channel: %d\n",
+			ret);
+		goto err_bar;
+	}
+	ret = dw_edma_chan_irq_config(test->dma_chan_rx, DW_EDMA_CH_IRQ_REMOTE);
+	if (ret) {
+		dev_err(dev, "failed to set irq mode for rx channel: %d\n",
+			ret);
+		goto err_bar;
+	}
+
+	return 0;
+err_bar:
+	pci_epc_clear_bar(epc, epf->func_no, epf->vfunc_no, &epf->bar[barno]);
+err_submap:
+	pci_epf_test_clear_submaps(bar);
+	return ret;
+}
+
+static void pci_epf_test_remote_edma_setup(struct pci_epf_test *epf_test,
+					   struct pci_epf_test_reg *reg)
+{
+	struct pci_epf_test_edma *edma = epf_test->data;
+	size_t size = le32_to_cpu(reg->size);
+	void *buf;
+	int ret;
+
+	if (!edma || !edma->test_buf || size > edma->test_buf_size) {
+		reg->status = cpu_to_le32(STATUS_REMOTE_EDMA_SETUP_FAIL);
+		return;
+	}
+
+	buf = edma->test_buf;
+
+	if (!edma->enabled) {
+		/* NB. Currently DW eDMA is the only supported backend */
+		ret = pci_epf_test_map_remote_edma(epf_test);
+		if (ret) {
+			WRITE_ONCE(reg->status,
+				   cpu_to_le32(STATUS_REMOTE_EDMA_SETUP_FAIL));
+			return;
+		}
+		edma->enabled = true;
+	}
+
+	/* Populate the test buffer with random data */
+	get_random_bytes(buf, size);
+	reg->checksum = cpu_to_le32(crc32_le(~0, buf, size));
+
+	WRITE_ONCE(reg->status, cpu_to_le32(STATUS_REMOTE_EDMA_SETUP_SUCCESS));
+}
+
+static void pci_epf_test_remote_edma_checksum(struct pci_epf_test *epf_test,
+					      struct pci_epf_test_reg *reg)
+{
+	struct pci_epf_test_edma *edma = epf_test->data;
+	u32 status = le32_to_cpu(reg->status);
+	size_t size;
+	void *addr;
+	u32 crc32;
+
+	size = le32_to_cpu(reg->size);
+	if (!edma || !edma->test_buf || size > edma->test_buf_size) {
+		status |= STATUS_REMOTE_EDMA_CHECKSUM_FAIL;
+		reg->status = cpu_to_le32(status);
+		return;
+	}
+
+	addr = edma->test_buf;
+	crc32 = crc32_le(~0, addr, size);
+	status |= STATUS_REMOTE_EDMA_CHECKSUM_SUCCESS;
+
+	reg->checksum = cpu_to_le32(crc32);
+	reg->status = cpu_to_le32(status);
+}
+
+static void pci_epf_test_reset_dma_chan(struct dma_chan *chan)
+{
+	dw_edma_chan_irq_config(chan, DW_EDMA_CH_IRQ_DEFAULT);
+}
+#else
+static bool pci_epf_test_bar_is_reserved(struct pci_epf_test *test,
+					 enum pci_barno barno)
+{
+	return false;
+}
+
+static void pci_epf_test_clean_remote_edma(struct pci_epf_test *test)
+{
+}
+
+static int pci_epf_test_init_remote_edma(struct pci_epf_test *test)
+{
+	return -EOPNOTSUPP;
+}
+
+static void pci_epf_test_remote_edma_setup(struct pci_epf_test *epf_test,
+					   struct pci_epf_test_reg *reg)
+{
+	reg->status = cpu_to_le32(STATUS_REMOTE_EDMA_SETUP_FAIL);
+}
+
+static void pci_epf_test_remote_edma_checksum(struct pci_epf_test *epf_test,
+					      struct pci_epf_test_reg *reg)
+{
+	reg->status = cpu_to_le32(STATUS_REMOTE_EDMA_CHECKSUM_FAIL);
+}
+
+static void pci_epf_test_reset_dma_chan(struct dma_chan *chan)
+{
+}
+#endif
+
 static void pci_epf_test_dma_callback(void *param)
 {
 	struct pci_epf_test *epf_test = param;
@@ -168,6 +626,8 @@ static int pci_epf_test_data_transfer(struct pci_epf_test *epf_test,
 		return -EINVAL;
 	}
 
+	pci_epf_test_reset_dma_chan(chan);
+
 	if (epf_test->dma_private) {
 		sconf.direction = dir;
 		if (dir == DMA_MEM_TO_DEV)
@@ -870,6 +1330,14 @@ static void pci_epf_test_cmd_handler(struct work_struct *work)
 		pci_epf_test_disable_doorbell(epf_test, reg);
 		pci_epf_test_raise_irq(epf_test, reg);
 		break;
+	case COMMAND_REMOTE_EDMA_SETUP:
+		pci_epf_test_remote_edma_setup(epf_test, reg);
+		pci_epf_test_raise_irq(epf_test, reg);
+		break;
+	case COMMAND_REMOTE_EDMA_CHECKSUM:
+		pci_epf_test_remote_edma_checksum(epf_test, reg);
+		pci_epf_test_raise_irq(epf_test, reg);
+		break;
 	default:
 		dev_err(dev, "Invalid command 0x%x\n", command);
 		break;
@@ -961,6 +1429,10 @@ static int pci_epf_test_epc_init(struct pci_epf *epf)
 	if (ret)
 		epf_test->dma_supported = false;
 
+	ret = pci_epf_test_init_remote_edma(epf_test);
+	if (ret && ret != -EOPNOTSUPP)
+		dev_warn(dev, "Remote eDMA setup failed\n");
+
 	if (epf->vfunc_no <= 1) {
 		ret = pci_epc_write_header(epc, epf->func_no, epf->vfunc_no, header);
 		if (ret) {
@@ -1007,6 +1479,7 @@ static void pci_epf_test_epc_deinit(struct pci_epf *epf)
 	struct pci_epf_test *epf_test = epf_get_drvdata(epf);
 
 	cancel_delayed_work_sync(&epf_test->cmd_handler);
+	pci_epf_test_clean_remote_edma(epf_test);
 	pci_epf_test_clean_dma_chan(epf_test);
 	pci_epf_test_clear_bar(epf);
 }
@@ -1076,6 +1549,9 @@ static int pci_epf_test_alloc_space(struct pci_epf *epf)
 		if (bar == test_reg_bar)
 			continue;
 
+		if (pci_epf_test_bar_is_reserved(epf_test, bar))
+			continue;
+
 		if (epc_features->bar[bar].type == BAR_FIXED)
 			test_reg_size = epc_features->bar[bar].fixed_size;
 		else
@@ -1146,6 +1622,7 @@ static void pci_epf_test_unbind(struct pci_epf *epf)
 
 	cancel_delayed_work_sync(&epf_test->cmd_handler);
 	if (epc->init_complete) {
+		pci_epf_test_clean_remote_edma(epf_test);
 		pci_epf_test_clean_dma_chan(epf_test);
 		pci_epf_test_clear_bar(epf);
 	}
-- 
2.51.0



* [RFC PATCH v4 37/38] misc: pci_endpoint_test: Add remote eDMA transfer test mode
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (35 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 36/38] PCI: endpoint: pci-epf-test: Add remote eDMA-backed mode Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-18 13:54 ` [RFC PATCH v4 38/38] selftests: pci_endpoint: Add remote eDMA transfer coverage Koichiro Den
  2026-01-20 18:30 ` [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Dave Jiang
  38 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

Add a new test mode controlled by a flag, PCITEST_FLAGS_USE_REMOTE_EDMA.
When requested, the driver:
- Issues COMMAND_REMOTE_EDMA_SETUP to the endpoint and locates the BAR
  containing a pcitest_edma_info header (magic/version).
- Creates a remote dw-edma instance by ioremapping the endpoint's
  exposed eDMA registers and linked-list regions and probing dw-edma on
  top of it.
- Requests one DMA_SLAVE channel per direction and performs the
  transfer.
- Uses COMMAND_REMOTE_EDMA_CHECKSUM to validate the result when the
  transfer direction is host-to-endpoint. For the opposite direction,
  the endpoint provides the expected checksum up front.

One MSI/MSI-X vector is reserved for the remote dw-edma instance by
freeing the last test IRQ vector. This keeps existing MSI/MSI-X tests
unchanged unless the remote-eDMA mode is invoked.

BAR read/write tests skip the BAR reserved for remote-eDMA metadata to
avoid corrupting the eDMA window.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 drivers/misc/pci_endpoint_test.c | 633 +++++++++++++++++++++++++++++++
 include/uapi/linux/pcitest.h     |   3 +-
 2 files changed, 635 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
index 1c0fd185114f..52d700374ac6 100644
--- a/drivers/misc/pci_endpoint_test.c
+++ b/drivers/misc/pci_endpoint_test.c
@@ -8,7 +8,10 @@
 
 #include <linux/crc32.h>
 #include <linux/cleanup.h>
+#include <linux/completion.h>
 #include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/dmaengine.h>
 #include <linux/fs.h>
 #include <linux/io.h>
 #include <linux/interrupt.h>
@@ -17,6 +20,7 @@
 #include <linux/module.h>
 #include <linux/mutex.h>
 #include <linux/random.h>
+#include <linux/scatterlist.h>
 #include <linux/slab.h>
 #include <linux/uaccess.h>
 #include <linux/pci.h>
@@ -39,6 +43,8 @@
 #define COMMAND_COPY				BIT(5)
 #define COMMAND_ENABLE_DOORBELL			BIT(6)
 #define COMMAND_DISABLE_DOORBELL		BIT(7)
+#define COMMAND_REMOTE_EDMA_SETUP		BIT(8)
+#define COMMAND_REMOTE_EDMA_CHECKSUM		BIT(9)
 
 #define PCI_ENDPOINT_TEST_STATUS		0x8
 #define STATUS_READ_SUCCESS			BIT(0)
@@ -55,6 +61,10 @@
 #define STATUS_DOORBELL_ENABLE_FAIL		BIT(11)
 #define STATUS_DOORBELL_DISABLE_SUCCESS		BIT(12)
 #define STATUS_DOORBELL_DISABLE_FAIL		BIT(13)
+#define STATUS_REMOTE_EDMA_SETUP_SUCCESS	BIT(14)
+#define STATUS_REMOTE_EDMA_SETUP_FAIL		BIT(15)
+#define STATUS_REMOTE_EDMA_CHECKSUM_SUCCESS	BIT(16)
+#define STATUS_REMOTE_EDMA_CHECKSUM_FAIL	BIT(17)
 
 #define PCI_ENDPOINT_TEST_LOWER_SRC_ADDR	0x0c
 #define PCI_ENDPOINT_TEST_UPPER_SRC_ADDR	0x10
@@ -130,6 +140,9 @@ struct pci_endpoint_test {
 	size_t alignment;
 	u32 ep_caps;
 	const char *name;
+
+	/* For extended tests that rely on vendor-specific features */
+	void *data;
 };
 
 struct pci_endpoint_test_data {
@@ -149,6 +162,610 @@ static inline void pci_endpoint_test_writel(struct pci_endpoint_test *test,
 	writel(value, test->base + offset);
 }
 
+static irqreturn_t pci_endpoint_test_irqhandler(int irq, void *dev_id);
+
+#if IS_REACHABLE(CONFIG_DW_EDMA)
+#include <linux/dma/edma.h>
+
+#define PCITEST_EDMA_INFO_MAGIC		0x414d4445U /* 'EDMA' */
+#define PCITEST_EDMA_INFO_VERSION	0x00010000U
+
+struct pci_endpoint_test_edma {
+	bool			probed;
+	void __iomem		*bar_base;
+	int			irq;
+
+	/* Remote dw-edma instance */
+	struct dw_edma_chip	chip;
+
+	/* One channel per direction */
+	struct dma_chan		*m2d;
+	struct dma_chan		*d2m;
+};
+
+struct pcitest_edma_info {
+	__le32 magic;
+	__le32 version;
+
+	__le32 reg_off;
+	__le32 reg_size;
+
+	__le64 ll_rd_phys;
+	__le32 ll_rd_off;
+	__le32 ll_rd_size;
+
+	__le64 ll_wr_phys;
+	__le32 ll_wr_off;
+	__le32 ll_wr_size;
+
+	__le64 test_buf_phys;
+	__le32 test_buf_size;
+};
+
+struct pci_endpoint_test_edma_filter {
+	struct device *dma_dev;
+	unsigned long direction;
+};
+
+static bool test_edma_filter_fn(struct dma_chan *chan, void *param)
+{
+	struct pci_endpoint_test_edma_filter *filter = param;
+	u32 dir = filter->direction;
+	struct dma_slave_caps caps;
+	int ret;
+
+	if (chan->device->dev != filter->dma_dev)
+		return false;
+
+	ret = dma_get_slave_caps(chan, &caps);
+	if (ret < 0)
+		return false;
+
+	return !!(caps.directions & dir);
+}
+
+static int pci_endpoint_test_edma_irq_vector(struct device *dev, unsigned int nr)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct pci_endpoint_test *test = pci_get_drvdata(pdev);
+	struct pci_endpoint_test_edma *edma;
+
+	if (!test)
+		return -EINVAL;
+
+	edma = test->data;
+	if (!edma)
+		return -EINVAL;
+
+	/*
+	 * Only one vector is reserved for remote eDMA use, thus 'nr' is
+	 * ignored. See pci_endpoint_test_edma_reserve_irq().
+	 */
+	return pci_irq_vector(pdev, edma->irq);
+}
+
+static enum pci_barno pci_endpoint_test_edma_bar(struct pci_dev *pdev)
+{
+	int bar;
+
+	for (bar = 0; bar < PCI_STD_NUM_BARS; bar++) {
+		void __iomem *base;
+		u32 magic;
+
+		if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
+			continue;
+		if (!pci_resource_len(pdev, bar))
+			continue;
+
+		base = pci_iomap_range(pdev, bar, 0, sizeof(u32));
+		if (!base)
+			continue;
+
+		magic = ioread32(base);
+		pci_iounmap(pdev, base);
+
+		if (magic == PCITEST_EDMA_INFO_MAGIC)
+			return bar;
+	}
+	return NO_BAR;
+}
+
+static bool pci_endpoint_test_bar_is_reserved(struct pci_endpoint_test *test,
+					      enum pci_barno barno)
+{
+	struct pci_dev *pdev = test->pdev;
+	enum pci_barno edma_bar = pci_endpoint_test_edma_bar(pdev);
+
+	return barno == NO_BAR || barno == edma_bar;
+}
+
+static void pci_endpoint_test_dw_edma_cleanup(struct pci_endpoint_test *test,
+					      struct pci_endpoint_test_edma *edma)
+{
+	if (!edma)
+		return;
+
+	if (edma->m2d) {
+		dmaengine_terminate_sync(edma->m2d);
+		dma_release_channel(edma->m2d);
+		edma->m2d = NULL;
+	}
+
+	if (edma->d2m) {
+		dmaengine_terminate_sync(edma->d2m);
+		dma_release_channel(edma->d2m);
+		edma->d2m = NULL;
+	}
+
+	if (edma->probed) {
+		dw_edma_remove(&edma->chip);
+		edma->probed = false;
+	}
+
+	if (edma->bar_base) {
+		pci_iounmap(test->pdev, edma->bar_base);
+		edma->bar_base = NULL;
+	}
+}
+
+static void pci_endpoint_test_remote_edma_teardown(struct pci_endpoint_test *test)
+{
+	struct pci_endpoint_test_edma *edma = test->data;
+
+	pci_endpoint_test_dw_edma_cleanup(test, edma);
+	kfree(edma);
+	test->data = NULL;
+}
+
+/*
+ * Reserve exactly one IRQ vector for dw-edma by freeing the last handler.
+ * This avoids changing existing MSI/MSI-X tests unless remote eDMA is used.
+ */
+static int pci_endpoint_test_edma_reserve_irq(struct pci_endpoint_test *test)
+{
+	struct pci_dev *pdev = test->pdev;
+
+	if (test->irq_type != PCITEST_IRQ_TYPE_MSI &&
+	    test->irq_type != PCITEST_IRQ_TYPE_MSIX)
+		return -EOPNOTSUPP;
+
+	if (test->num_irqs < 2)
+		return -ENOSPC;
+
+	/* Hand the last vector over to remote eDMA */
+	free_irq(pci_irq_vector(pdev, test->num_irqs - 1), test);
+	return test->num_irqs - 1;
+}
+
+static void pci_endpoint_test_edma_restore_irq(struct pci_endpoint_test *test)
+{
+	struct pci_dev *pdev = test->pdev;
+	int ret;
+
+	ret = request_irq(pci_irq_vector(pdev, test->num_irqs - 1),
+			  pci_endpoint_test_irqhandler, IRQF_SHARED, test->name,
+			  test);
+	if (ret)
+		dev_warn(&pdev->dev,
+			 "failed to restore IRQ vector %d after remote eDMA: %d\n",
+			 test->num_irqs - 1, ret);
+}
+
+static const struct dw_edma_plat_ops test_edma_ops = {
+	.irq_vector     = pci_endpoint_test_edma_irq_vector,
+};
+
+static int pci_endpoint_test_dw_edma_setup(struct pci_endpoint_test *test)
+{
+	struct pci_endpoint_test_edma *edma = test->data;
+	struct pci_endpoint_test_edma_filter f;
+	struct pci_endpoint_test_edma *new;
+	struct pci_dev *pdev = test->pdev;
+	struct device *dev = &pdev->dev;
+	struct pcitest_edma_info info;
+	resource_size_t bar_size;
+	u32 ll_rd_off, ll_rd_size;
+	u32 ll_wr_off, ll_wr_size;
+	u32 reg_off, reg_size;
+	dma_cap_mask_t mask;
+	enum pci_barno bar;
+	int ret;
+
+	if (edma && edma->probed)
+		return 0;
+
+	new = kzalloc_obj(*new, GFP_KERNEL);
+	if (!new)
+		return -ENOMEM;
+
+	ret = pci_endpoint_test_edma_reserve_irq(test);
+	if (ret < 0)
+		goto err_free;
+	new->irq = ret;
+
+	bar = pci_endpoint_test_edma_bar(pdev);
+	if (bar == NO_BAR) {
+		ret = -EOPNOTSUPP;
+		goto err_restore_irq;
+	}
+
+	new->bar_base = pci_iomap(pdev, bar, 0);
+	if (!new->bar_base) {
+		ret = -ENOMEM;
+		goto err_restore_irq;
+	}
+	bar_size = pci_resource_len(pdev, bar);
+
+	/* Snapshot the info (avoid repeated __iomem reads). */
+	memcpy_fromio(&info, new->bar_base, sizeof(info));
+	if (le32_to_cpu(info.magic) != PCITEST_EDMA_INFO_MAGIC ||
+	    le32_to_cpu(info.version) != PCITEST_EDMA_INFO_VERSION) {
+		dev_err(&pdev->dev, "Invalid eDMA info\n");
+		ret = -EINVAL;
+		goto err_cleanup;
+	}
+
+	reg_off = le32_to_cpu(info.reg_off);
+	reg_size = le32_to_cpu(info.reg_size);
+	ll_rd_off = le32_to_cpu(info.ll_rd_off);
+	ll_rd_size = le32_to_cpu(info.ll_rd_size);
+	ll_wr_off = le32_to_cpu(info.ll_wr_off);
+	ll_wr_size = le32_to_cpu(info.ll_wr_size);
+
+	if (reg_off > bar_size || reg_size > bar_size - reg_off ||
+	    ll_rd_off > bar_size || ll_rd_size > bar_size - ll_rd_off ||
+	    ll_wr_off > bar_size || ll_wr_size > bar_size - ll_wr_off) {
+		dev_err(&pdev->dev, "eDMA info offsets out of BAR range\n");
+		ret = -EINVAL;
+		goto err_cleanup;
+	}
+
+	memset(&new->chip, 0, sizeof(new->chip));
+	new->chip.dev = &pdev->dev;
+	new->chip.mf = EDMA_MF_EDMA_UNROLL;
+	new->chip.nr_irqs = 1;
+	new->chip.ops = &test_edma_ops;
+	new->chip.reg_base = new->bar_base + reg_off;
+	new->chip.ll_rd_cnt = 1;
+	new->chip.ll_region_rd[0].paddr = le64_to_cpu(info.ll_rd_phys);
+	new->chip.ll_region_rd[0].vaddr.io = new->bar_base + ll_rd_off;
+	new->chip.ll_region_rd[0].sz = ll_rd_size;
+	new->chip.ll_wr_cnt = 1;
+	new->chip.ll_region_wr[0].paddr = le64_to_cpu(info.ll_wr_phys);
+	new->chip.ll_region_wr[0].vaddr.io = new->bar_base + ll_wr_off;
+	new->chip.ll_region_wr[0].sz = ll_wr_size;
+
+	test->data = new;
+	ret = dw_edma_probe(&new->chip);
+	if (ret) {
+		dev_err(&pdev->dev, "Failed to probe eDMA: %d\n", ret);
+		goto err_cleanup;
+	}
+	new->probed = true;
+
+	/* Request one channel per direction. */
+	dma_cap_zero(mask);
+	dma_cap_set(DMA_SLAVE, mask);
+	f.dma_dev = dev;
+	f.direction = BIT(DMA_MEM_TO_DEV);
+	new->m2d = dma_request_channel(mask, test_edma_filter_fn, &f);
+	f.direction = BIT(DMA_DEV_TO_MEM);
+	new->d2m = dma_request_channel(mask, test_edma_filter_fn, &f);
+	if (!new->m2d || !new->d2m) {
+		ret = -ENODEV;
+		goto err_cleanup;
+	}
+
+	/*
+	 * Best-effort attempt, i.e. even if this fails for some reason, the
+	 * endpoint will ignore endpoint-local interrupts (edma_int bus).
+	 */
+	dw_edma_chan_irq_config(new->m2d, DW_EDMA_CH_IRQ_REMOTE);
+	dw_edma_chan_irq_config(new->d2m, DW_EDMA_CH_IRQ_REMOTE);
+
+	return 0;
+err_cleanup:
+	pci_endpoint_test_dw_edma_cleanup(test, new);
+err_restore_irq:
+	pci_endpoint_test_edma_restore_irq(test);
+err_free:
+	kfree(new);
+	test->data = NULL;
+	return ret;
+}
+
+static int pci_endpoint_test_remote_edma_setup(struct pci_endpoint_test *test,
+					       size_t size)
+{
+	struct pci_dev *pdev = test->pdev;
+	struct device *dev = &pdev->dev;
+	unsigned long left;
+	u32 status;
+
+	/* Same rule as existing tests: IRQ type must be configured first */
+	if (test->irq_type != PCITEST_IRQ_TYPE_MSI &&
+	    test->irq_type != PCITEST_IRQ_TYPE_MSIX) {
+		dev_err(dev, "Invalid IRQ type for remote eDMA\n");
+		return -EINVAL;
+	}
+
+	/* Need one spare vector for dw-edma */
+	if (test->num_irqs < 2)
+		return -ENOSPC;
+
+	/*
+	 * Ensure EP command handler won't reject us due to stale flags.
+	 * (remote-eDMA setup itself is not "FLAG_USE_DMA")
+	 */
+	pci_endpoint_test_writel(test, PCI_ENDPOINT_TEST_FLAGS, 0);
+	pci_endpoint_test_writel(test, PCI_ENDPOINT_TEST_IRQ_TYPE,
+				 test->irq_type);
+	pci_endpoint_test_writel(test, PCI_ENDPOINT_TEST_IRQ_NUMBER, 1);
+
+	reinit_completion(&test->irq_raised);
+	test->last_irq = -ENODATA;
+
+	pci_endpoint_test_writel(test, PCI_ENDPOINT_TEST_SIZE, size);
+	pci_endpoint_test_writel(test, PCI_ENDPOINT_TEST_COMMAND,
+				 COMMAND_REMOTE_EDMA_SETUP);
+
+	left = wait_for_completion_timeout(&test->irq_raised,
+					   msecs_to_jiffies(1000));
+	if (!left)
+		return -ETIMEDOUT;
+
+	status = pci_endpoint_test_readl(test, PCI_ENDPOINT_TEST_STATUS);
+	if (status & STATUS_REMOTE_EDMA_SETUP_FAIL) {
+		dev_err(dev, "Endpoint failed to setup remote eDMA window\n");
+		return -EIO;
+	}
+	if (!(status & STATUS_REMOTE_EDMA_SETUP_SUCCESS)) {
+		dev_err(dev,
+			"Endpoint did not report remote eDMA setup success\n");
+		return -EIO;
+	}
+
+	return pci_endpoint_test_dw_edma_setup(test);
+}
+
+static int pci_endpoint_test_edma_xfer(struct pci_dev *pdev,
+				       struct pci_endpoint_test_edma *edma,
+				       void *buf, size_t len,
+				       dma_addr_t dev_addr,
+				       enum dma_transfer_direction dir)
+{
+	struct dma_async_tx_descriptor *tx;
+	enum dma_data_direction map_dir;
+	struct device *dev = &pdev->dev;
+	struct dma_slave_config cfg;
+	struct completion done;
+	struct dma_chan *chan;
+	struct scatterlist sg;
+	dma_cookie_t cookie;
+	int ret;
+
+	memset(&cfg, 0, sizeof(cfg));
+	if (dir == DMA_MEM_TO_DEV) {
+		chan = edma->m2d;
+		map_dir = DMA_TO_DEVICE;
+		cfg.direction = DMA_MEM_TO_DEV;
+		cfg.dst_addr = dev_addr;
+	} else if (dir == DMA_DEV_TO_MEM) {
+		chan = edma->d2m;
+		map_dir = DMA_FROM_DEVICE;
+		cfg.direction = DMA_DEV_TO_MEM;
+		cfg.src_addr = dev_addr;
+	} else {
+		return -EINVAL;
+	}
+
+	ret = dmaengine_slave_config(chan, &cfg);
+	if (ret)
+		return ret;
+
+	sg_init_one(&sg, buf, len);
+	if (!dma_map_sg(dev, &sg, 1, map_dir)) {
+		dev_err(dev, "unable to map local address\n");
+		return -EIO;
+	}
+
+	tx = dmaengine_prep_slave_sg(chan, &sg, 1, dir,
+				     DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
+	if (!tx) {
+		dev_err(dev, "failed to prepare slave for sg\n");
+		ret = -EIO;
+		goto unmap;
+	}
+
+	init_completion(&done);
+	tx->callback = (dma_async_tx_callback)complete;
+	tx->callback_param = &done;
+
+	cookie = dmaengine_submit(tx);
+	ret = dma_submit_error(cookie);
+	if (ret) {
+		dev_err(dev, "remote eDMA submission error: %d\n", ret);
+		goto unmap;
+	}
+
+	dma_async_issue_pending(chan);
+
+	if (!wait_for_completion_timeout(&done, msecs_to_jiffies(5000))) {
+		dev_err(dev, "remote eDMA transfer timeout\n");
+		dmaengine_terminate_sync(chan);
+		ret = -ETIMEDOUT;
+		goto unmap;
+	}
+
+	ret = 0;
+unmap:
+	dma_unmap_sg(dev, &sg, 1, map_dir);
+	return ret;
+}
+
+static int pci_endpoint_test_edma_write(struct pci_endpoint_test *test,
+					size_t size)
+{
+	struct pci_endpoint_test_edma *edma;
+	struct pci_dev *pdev = test->pdev;
+	struct device *dev = &pdev->dev;
+	struct pcitest_edma_info info;
+	u32 reg, crc32, peer_crc32;
+	unsigned long left;
+	int ret;
+
+	/*
+	 * Note that test->alignment does not apply here. If a vendor
+	 * dmaengine used for remote transfers imposes an alignment
+	 * restriction, a separate field such as
+	 * test->remote_dma_alignment could be introduced.
+	 */
+	void *orig_addr __free(kfree) = kzalloc(size, GFP_KERNEL);
+	if (!orig_addr)
+		return -ENOMEM;
+
+	ret = pci_endpoint_test_remote_edma_setup(test, size);
+	if (ret)
+		return ret;
+
+	edma = test->data;
+	if (!edma) {
+		ret = -ENODEV;
+		goto err;
+	}
+
+	get_random_bytes(orig_addr, size);
+
+	pci_endpoint_test_writel(test, PCI_ENDPOINT_TEST_STATUS, 0);
+
+	memcpy_fromio(&info, edma->bar_base, sizeof(info));
+	if (le32_to_cpu(info.test_buf_size) < size) {
+		ret = -EINVAL;
+		goto err;
+	}
+
+	ret = pci_endpoint_test_edma_xfer(test->pdev, edma, orig_addr, size,
+					  le64_to_cpu(info.test_buf_phys),
+					  DMA_MEM_TO_DEV);
+	if (ret) {
+		dev_err(dev, "pci_endpoint_test_edma_xfer error: %d\n", ret);
+		goto err;
+	}
+
+	reinit_completion(&test->irq_raised);
+
+	pci_endpoint_test_writel(test, PCI_ENDPOINT_TEST_STATUS, 0);
+	pci_endpoint_test_writel(test, PCI_ENDPOINT_TEST_SIZE, size);
+	pci_endpoint_test_writel(test, PCI_ENDPOINT_TEST_COMMAND,
+				 COMMAND_REMOTE_EDMA_CHECKSUM);
+
+	left = wait_for_completion_timeout(&test->irq_raised,
+					   msecs_to_jiffies(1000));
+
+	reg = pci_endpoint_test_readl(test, PCI_ENDPOINT_TEST_STATUS);
+
+	if (!left || !(reg & STATUS_REMOTE_EDMA_CHECKSUM_SUCCESS)) {
+		dev_err(dev, "Failed to get checksum\n");
+		ret = -EINVAL;
+		goto err;
+	}
+
+	crc32 = crc32_le(~0, orig_addr, size);
+	peer_crc32 = pci_endpoint_test_readl(test, PCI_ENDPOINT_TEST_CHECKSUM);
+	if (crc32 != peer_crc32) {
+		dev_err(dev,
+			"Checksum mismatch: %#x vs %#x\n", crc32, peer_crc32);
+		ret = -EINVAL;
+	}
+err:
+	pci_endpoint_test_remote_edma_teardown(test);
+	pci_endpoint_test_edma_restore_irq(test);
+	return ret;
+}
+
+static int pci_endpoint_test_edma_read(struct pci_endpoint_test *test,
+				       size_t size)
+{
+	struct pci_endpoint_test_edma *edma;
+	struct pci_dev *pdev = test->pdev;
+	struct device *dev = &pdev->dev;
+	struct pcitest_edma_info info;
+	u32 crc32, peer_crc32;
+	int ret;
+
+	/*
+	 * Note that test->alignment does not apply here. If a vendor
+	 * dmaengine used for remote transfers imposes an alignment
+	 * restriction, a separate field such as
+	 * test->remote_dma_alignment could be introduced.
+	 */
+	void *orig_addr __free(kfree) = kzalloc(size, GFP_KERNEL);
+	if (!orig_addr)
+		return -ENOMEM;
+
+	ret = pci_endpoint_test_remote_edma_setup(test, size);
+	if (ret)
+		return ret;
+
+	peer_crc32 = pci_endpoint_test_readl(test, PCI_ENDPOINT_TEST_CHECKSUM);
+
+	edma = test->data;
+	if (!edma) {
+		ret = -ENODEV;
+		goto err;
+	}
+
+	pci_endpoint_test_writel(test, PCI_ENDPOINT_TEST_STATUS, 0);
+
+	memcpy_fromio(&info, edma->bar_base, sizeof(info));
+	if (le32_to_cpu(info.test_buf_size) < size) {
+		ret = -EINVAL;
+		goto err;
+	}
+
+	ret = pci_endpoint_test_edma_xfer(test->pdev, edma, orig_addr, size,
+					  le64_to_cpu(info.test_buf_phys),
+					  DMA_DEV_TO_MEM);
+	if (ret) {
+		dev_err(dev, "pci_endpoint_test_edma_xfer error: %d\n", ret);
+		goto err;
+	}
+
+	crc32 = crc32_le(~0, orig_addr, size);
+	if (crc32 != peer_crc32) {
+		dev_err(dev,
+			"Checksum mismatch: %#x vs %#x\n", crc32, peer_crc32);
+		ret = -EINVAL;
+	}
+err:
+	pci_endpoint_test_remote_edma_teardown(test);
+	pci_endpoint_test_edma_restore_irq(test);
+	return ret;
+}
+#else
+static bool pci_endpoint_test_bar_is_reserved(struct pci_endpoint_test *test,
+					      enum pci_barno barno)
+{
+	return false;
+}
+
+static void pci_endpoint_test_remote_edma_teardown(struct pci_endpoint_test *test)
+{
+}
+
+static int pci_endpoint_test_edma_write(struct pci_endpoint_test *test,
+					size_t size)
+{
+	return -EOPNOTSUPP;
+}
+
+static int pci_endpoint_test_edma_read(struct pci_endpoint_test *test,
+				       size_t size)
+{
+	return -EOPNOTSUPP;
+}
+#endif
+
 static irqreturn_t pci_endpoint_test_irqhandler(int irq, void *dev_id)
 {
 	struct pci_endpoint_test *test = dev_id;
@@ -307,6 +924,9 @@ static int pci_endpoint_test_bar(struct pci_endpoint_test *test,
 	if (barno == test->test_reg_bar)
 		bar_size = 0x4;
 
+	if (pci_endpoint_test_bar_is_reserved(test, barno))
+		return -EOPNOTSUPP;
+
 	/*
 	 * Allocate a buffer of max size 1MB, and reuse that buffer while
 	 * iterating over the whole BAR size (which might be much larger).
@@ -354,6 +974,9 @@ static void pci_endpoint_test_bars_write_bar(struct pci_endpoint_test *test,
 	if (barno == test->test_reg_bar)
 		size = 0x4;
 
+	if (pci_endpoint_test_bar_is_reserved(test, barno))
+		return;
+
 	for (j = 0; j < size; j += 4)
 		writel_relaxed(bar_test_pattern_with_offset(barno, j),
 			       test->bar[barno] + j);
@@ -372,6 +995,9 @@ static int pci_endpoint_test_bars_read_bar(struct pci_endpoint_test *test,
 	if (barno == test->test_reg_bar)
 		size = 0x4;
 
+	if (pci_endpoint_test_bar_is_reserved(test, barno))
+		return 0;
+
 	for (j = 0; j < size; j += 4) {
 		u32 expected = bar_test_pattern_with_offset(barno, j);
 
@@ -645,6 +1271,9 @@ static int pci_endpoint_test_write(struct pci_endpoint_test *test,
 
 	size = param.size;
 
+	if (param.flags & PCITEST_FLAGS_USE_REMOTE_EDMA)
+		return pci_endpoint_test_edma_write(test, size);
+
 	use_dma = !!(param.flags & PCITEST_FLAGS_USE_DMA);
 	if (use_dma)
 		flags |= FLAG_USE_DMA;
@@ -742,6 +1371,9 @@ static int pci_endpoint_test_read(struct pci_endpoint_test *test,
 
 	size = param.size;
 
+	if (param.flags & PCITEST_FLAGS_USE_REMOTE_EDMA)
+		return pci_endpoint_test_edma_read(test, size);
+
 	use_dma = !!(param.flags & PCITEST_FLAGS_USE_DMA);
 	if (use_dma)
 		flags |= FLAG_USE_DMA;
@@ -1139,6 +1771,7 @@ static void pci_endpoint_test_remove(struct pci_dev *pdev)
 	if (id < 0)
 		return;
 
+	pci_endpoint_test_remote_edma_teardown(test);
 	pci_endpoint_test_release_irq(test);
 	pci_endpoint_test_free_irq_vectors(test);
 
diff --git a/include/uapi/linux/pcitest.h b/include/uapi/linux/pcitest.h
index d6023a45a9d0..c72d999cecf7 100644
--- a/include/uapi/linux/pcitest.h
+++ b/include/uapi/linux/pcitest.h
@@ -30,7 +30,8 @@
 #define PCITEST_IRQ_TYPE_MSIX		2
 #define PCITEST_IRQ_TYPE_AUTO		3
 
-#define PCITEST_FLAGS_USE_DMA	0x00000001
+#define PCITEST_FLAGS_USE_DMA		0x00000001
+#define PCITEST_FLAGS_USE_REMOTE_EDMA	0x00000002
 
 struct pci_endpoint_test_xfer_param {
 	unsigned long size;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [RFC PATCH v4 38/38] selftests: pci_endpoint: Add remote eDMA transfer coverage
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (36 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 37/38] misc: pci_endpoint_test: Add remote eDMA transfer test mode Koichiro Den
@ 2026-01-18 13:54 ` Koichiro Den
  2026-01-20 18:30 ` [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Dave Jiang
  38 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-18 13:54 UTC (permalink / raw)
  To: Frank.Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

Extend the pci_endpoint_test kselftest with a 'remote_edma' variant for
READ/WRITE tests. The variant sets PCITEST_FLAGS_USE_REMOTE_EDMA and
skips the test when the feature is not supported.

Also treat -EOPNOTSUPP from BAR tests as "BAR is reserved" and skip,
since the host driver may reserve a BAR for remote-eDMA metadata.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
 .../selftests/pci_endpoint/pci_endpoint_test.c  | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/tools/testing/selftests/pci_endpoint/pci_endpoint_test.c b/tools/testing/selftests/pci_endpoint/pci_endpoint_test.c
index 23aac6f97061..39593da3b40d 100644
--- a/tools/testing/selftests/pci_endpoint/pci_endpoint_test.c
+++ b/tools/testing/selftests/pci_endpoint/pci_endpoint_test.c
@@ -67,6 +67,8 @@ TEST_F(pci_ep_bar, BAR_TEST)
 	pci_ep_ioctl(PCITEST_BAR, variant->barno);
 	if (ret == -ENODATA)
 		SKIP(return, "BAR is disabled");
+	if (ret == -EOPNOTSUPP)
+		SKIP(return, "BAR is reserved");
 	EXPECT_FALSE(ret) TH_LOG("Test failed for BAR%d", variant->barno);
 }
 
@@ -165,16 +167,25 @@ FIXTURE_TEARDOWN(pci_ep_data_transfer)
 FIXTURE_VARIANT(pci_ep_data_transfer)
 {
 	bool use_dma;
+	bool use_remote_edma;
 };
 
 FIXTURE_VARIANT_ADD(pci_ep_data_transfer, memcpy)
 {
 	.use_dma = false,
+	.use_remote_edma = false,
 };
 
 FIXTURE_VARIANT_ADD(pci_ep_data_transfer, dma)
 {
 	.use_dma = true,
+	.use_remote_edma = false,
+};
+
+FIXTURE_VARIANT_ADD(pci_ep_data_transfer, remote_edma)
+{
+	.use_dma = false,
+	.use_remote_edma = true,
 };
 
 TEST_F(pci_ep_data_transfer, READ_TEST)
@@ -184,6 +195,8 @@ TEST_F(pci_ep_data_transfer, READ_TEST)
 
 	if (variant->use_dma)
 		param.flags = PCITEST_FLAGS_USE_DMA;
+	if (variant->use_remote_edma)
+		param.flags = PCITEST_FLAGS_USE_REMOTE_EDMA;
 
 	pci_ep_ioctl(PCITEST_SET_IRQTYPE, PCITEST_IRQ_TYPE_AUTO);
 	ASSERT_EQ(0, ret) TH_LOG("Can't set AUTO IRQ type");
@@ -203,6 +216,8 @@ TEST_F(pci_ep_data_transfer, WRITE_TEST)
 
 	if (variant->use_dma)
 		param.flags = PCITEST_FLAGS_USE_DMA;
+	if (variant->use_remote_edma)
+		param.flags = PCITEST_FLAGS_USE_REMOTE_EDMA;
 
 	pci_ep_ioctl(PCITEST_SET_IRQTYPE, PCITEST_IRQ_TYPE_AUTO);
 	ASSERT_EQ(0, ret) TH_LOG("Can't set AUTO IRQ type");
@@ -222,6 +237,8 @@ TEST_F(pci_ep_data_transfer, COPY_TEST)
 
 	if (variant->use_dma)
 		param.flags = PCITEST_FLAGS_USE_DMA;
+	if (variant->use_remote_edma)
+		SKIP(return, "Remote eDMA is not supported");
 
 	pci_ep_ioctl(PCITEST_SET_IRQTYPE, PCITEST_IRQ_TYPE_AUTO);
 	ASSERT_EQ(0, ret) TH_LOG("Can't set AUTO IRQ type");
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v4 02/38] dmaengine: dw-edma: Add per-channel interrupt routing control
  2026-01-18 13:54 ` [RFC PATCH v4 02/38] dmaengine: dw-edma: Add per-channel interrupt routing control Koichiro Den
@ 2026-01-18 17:03   ` Frank Li
  2026-01-19 14:26     ` Koichiro Den
  0 siblings, 1 reply; 68+ messages in thread
From: Frank Li @ 2026-01-18 17:03 UTC (permalink / raw)
  To: Koichiro Den
  Cc: dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi, linux-pci, linux-doc, linux-kernel, linux-renesas-soc,
	devicetree, dmaengine, iommu, ntb, netdev, linux-kselftest, arnd,
	gregkh, joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt,
	corbet, skhan, andriy.shevchenko, jbrunet, utkarsh02t

On Sun, Jan 18, 2026 at 10:54:04PM +0900, Koichiro Den wrote:
> DesignWare EP eDMA can generate interrupts both locally and remotely
> (LIE/RIE). Remote eDMA users need to decide, per channel, whether
> completions should be handled locally, remotely, or both. Unless
> carefully configured, the endpoint and host would race to ack the
> interrupt.
>
> Introduce a per-channel interrupt routing mode and export small APIs to
> configure and query it. Update v0 programming so that RIE and local
> done/abort interrupt masking follow the selected mode. The default mode
> keeps the original behavior, so unless the new APIs are explicitly used,
> no functional changes.
>
> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> ---
>  drivers/dma/dw-edma/dw-edma-core.c    | 52 +++++++++++++++++++++++++++
>  drivers/dma/dw-edma/dw-edma-core.h    |  2 ++
>  drivers/dma/dw-edma/dw-edma-v0-core.c | 26 +++++++++-----
>  include/linux/dma/edma.h              | 44 +++++++++++++++++++++++
>  4 files changed, 116 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
> index b9d59c3c0cb4..059b3996d383 100644
> --- a/drivers/dma/dw-edma/dw-edma-core.c
> +++ b/drivers/dma/dw-edma/dw-edma-core.c
> @@ -768,6 +768,7 @@ static int dw_edma_channel_setup(struct dw_edma *dw, u32 wr_alloc, u32 rd_alloc)
>  		chan->configured = false;
>  		chan->request = EDMA_REQ_NONE;
>  		chan->status = EDMA_ST_IDLE;
> +		chan->irq_mode = DW_EDMA_CH_IRQ_DEFAULT;
>
>  		if (chan->dir == EDMA_DIR_WRITE)
>  			chan->ll_max = (chip->ll_region_wr[chan->id].sz / EDMA_LL_SZ);
> @@ -1062,6 +1063,57 @@ int dw_edma_remove(struct dw_edma_chip *chip)
>  }
>  EXPORT_SYMBOL_GPL(dw_edma_remove);
>
> +int dw_edma_chan_irq_config(struct dma_chan *dchan,
> +			    enum dw_edma_ch_irq_mode mode)
> +{
> +	struct dw_edma_chan *chan;
> +
> +	switch (mode) {
> +	case DW_EDMA_CH_IRQ_DEFAULT:
> +	case DW_EDMA_CH_IRQ_LOCAL:
> +	case DW_EDMA_CH_IRQ_REMOTE:
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	if (!dchan || !dchan->device)
> +		return -ENODEV;
> +
> +	chan = dchan2dw_edma_chan(dchan);
> +	if (!chan)
> +		return -ENODEV;
> +
> +	chan->irq_mode = mode;
> +
> +	dev_vdbg(chan->dw->chip->dev, "Channel: %s[%u] set irq_mode=%u\n",
> +		 str_write_read(chan->dir == EDMA_DIR_WRITE),
> +		 chan->id, mode);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(dw_edma_chan_irq_config);
> +
> +bool dw_edma_chan_ignore_irq(struct dma_chan *dchan)
> +{
> +	struct dw_edma_chan *chan;
> +	struct dw_edma *dw;
> +
> +	if (!dchan || !dchan->device)
> +		return false;
> +
> +	chan = dchan2dw_edma_chan(dchan);
> +	if (!chan)
> +		return false;
> +
> +	dw = chan->dw;
> +	if (dw->chip->flags & DW_EDMA_CHIP_LOCAL)
> +		return chan->irq_mode == DW_EDMA_CH_IRQ_REMOTE;
> +	else
> +		return chan->irq_mode == DW_EDMA_CH_IRQ_LOCAL;
> +}
> +EXPORT_SYMBOL_GPL(dw_edma_chan_ignore_irq);
> +
>  MODULE_LICENSE("GPL v2");
>  MODULE_DESCRIPTION("Synopsys DesignWare eDMA controller core driver");
>  MODULE_AUTHOR("Gustavo Pimentel <gustavo.pimentel@synopsys.com>");
> diff --git a/drivers/dma/dw-edma/dw-edma-core.h b/drivers/dma/dw-edma/dw-edma-core.h
> index 71894b9e0b15..8458d676551a 100644
> --- a/drivers/dma/dw-edma/dw-edma-core.h
> +++ b/drivers/dma/dw-edma/dw-edma-core.h
> @@ -81,6 +81,8 @@ struct dw_edma_chan {
>
>  	struct msi_msg			msi;
>
> +	enum dw_edma_ch_irq_mode	irq_mode;
> +
>  	enum dw_edma_request		request;
>  	enum dw_edma_status		status;
>  	u8				configured;
> diff --git a/drivers/dma/dw-edma/dw-edma-v0-core.c b/drivers/dma/dw-edma/dw-edma-v0-core.c
> index 2850a9df80f5..80472148c335 100644
> --- a/drivers/dma/dw-edma/dw-edma-v0-core.c
> +++ b/drivers/dma/dw-edma/dw-edma-v0-core.c
> @@ -256,8 +256,10 @@ dw_edma_v0_core_handle_int(struct dw_edma_irq *dw_irq, enum dw_edma_dir dir,
>  	for_each_set_bit(pos, &val, total) {
>  		chan = &dw->chan[pos + off];
>
> -		dw_edma_v0_core_clear_done_int(chan);
> -		done(chan);
> +		if (!dw_edma_chan_ignore_irq(&chan->vc.chan)) {
> +			dw_edma_v0_core_clear_done_int(chan);
> +			done(chan);
> +		}
>
>  		ret = IRQ_HANDLED;
>  	}
> @@ -267,8 +269,10 @@ dw_edma_v0_core_handle_int(struct dw_edma_irq *dw_irq, enum dw_edma_dir dir,
>  	for_each_set_bit(pos, &val, total) {
>  		chan = &dw->chan[pos + off];
>
> -		dw_edma_v0_core_clear_abort_int(chan);
> -		abort(chan);
> +		if (!dw_edma_chan_ignore_irq(&chan->vc.chan)) {
> +			dw_edma_v0_core_clear_abort_int(chan);
> +			abort(chan);
> +		}
>
>  		ret = IRQ_HANDLED;
>  	}
> @@ -331,7 +335,8 @@ static void dw_edma_v0_core_write_chunk(struct dw_edma_chunk *chunk)
>  		j--;
>  		if (!j) {
>  			control |= DW_EDMA_V0_LIE;
> -			if (!(chan->dw->chip->flags & DW_EDMA_CHIP_LOCAL))
> +			if (!(chan->dw->chip->flags & DW_EDMA_CHIP_LOCAL) &&
> +			    chan->irq_mode != DW_EDMA_CH_IRQ_LOCAL)
>  				control |= DW_EDMA_V0_RIE;
>  		}
>
> @@ -408,12 +413,17 @@ static void dw_edma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
>  				break;
>  			}
>  		}
> -		/* Interrupt unmask - done, abort */
> +		/* Interrupt mask/unmask - done, abort */
>  		raw_spin_lock_irqsave(&dw->lock, flags);
>
>  		tmp = GET_RW_32(dw, chan->dir, int_mask);
> -		tmp &= ~FIELD_PREP(EDMA_V0_DONE_INT_MASK, BIT(chan->id));
> -		tmp &= ~FIELD_PREP(EDMA_V0_ABORT_INT_MASK, BIT(chan->id));
> +		if (chan->irq_mode == DW_EDMA_CH_IRQ_REMOTE) {
> +			tmp |= FIELD_PREP(EDMA_V0_DONE_INT_MASK, BIT(chan->id));
> +			tmp |= FIELD_PREP(EDMA_V0_ABORT_INT_MASK, BIT(chan->id));
> +		} else {
> +			tmp &= ~FIELD_PREP(EDMA_V0_DONE_INT_MASK, BIT(chan->id));
> +			tmp &= ~FIELD_PREP(EDMA_V0_ABORT_INT_MASK, BIT(chan->id));
> +		}
>  		SET_RW_32(dw, chan->dir, int_mask, tmp);
>  		/* Linked list error */
>  		tmp = GET_RW_32(dw, chan->dir, linked_list_err_en);
> diff --git a/include/linux/dma/edma.h b/include/linux/dma/edma.h
> index ffad10ff2cd6..6f50165ac084 100644
> --- a/include/linux/dma/edma.h
> +++ b/include/linux/dma/edma.h
> @@ -60,6 +60,23 @@ enum dw_edma_chip_flags {
>  	DW_EDMA_CHIP_LOCAL	= BIT(0),
>  };
>
> +/*
> + * enum dw_edma_ch_irq_mode - per-channel interrupt routing control
> + * @DW_EDMA_CH_IRQ_DEFAULT:   LIE=1/RIE=1, local interrupt unmasked
> + * @DW_EDMA_CH_IRQ_LOCAL:     LIE=1/RIE=0
> + * @DW_EDMA_CH_IRQ_REMOTE:    LIE=1/RIE=1, local interrupt masked
> + *
> + * Some implementations require using LIE=1/RIE=1 with the local interrupt
> + * masked to generate a remote-only interrupt (rather than LIE=0/RIE=1).
> + * See the DesignWare endpoint databook 5.40, "Hint" below "Figure 8-22
> + * Write Interrupt Generation".
> + */
> +enum dw_edma_ch_irq_mode {
> +	DW_EDMA_CH_IRQ_DEFAULT	= 0,
> +	DW_EDMA_CH_IRQ_LOCAL,
> +	DW_EDMA_CH_IRQ_REMOTE,
> +};
> +
>  /**
>   * struct dw_edma_chip - representation of DesignWare eDMA controller hardware
>   * @dev:		 struct device of the eDMA controller
> @@ -105,6 +122,22 @@ struct dw_edma_chip {
>  #if IS_REACHABLE(CONFIG_DW_EDMA)
>  int dw_edma_probe(struct dw_edma_chip *chip);
>  int dw_edma_remove(struct dw_edma_chip *chip);
> +/**
> + * dw_edma_chan_irq_config - configure per-channel interrupt routing
> + * @chan: DMA channel obtained from dma_request_channel()
> + * @mode: interrupt routing mode
> + *
> + * Returns 0 on success, -EINVAL for invalid @mode, or -ENODEV if @chan does
> + * not belong to the DesignWare eDMA driver.
> + */
> +int dw_edma_chan_irq_config(struct dma_chan *chan,
> +			    enum dw_edma_ch_irq_mode mode);
> +
> +/**
> + * dw_edma_chan_ignore_irq - tell whether local IRQ handling should be ignored
> + * @chan: DMA channel obtained from dma_request_channel()
> + */
> +bool dw_edma_chan_ignore_irq(struct dma_chan *chan);
>  #else
>  static inline int dw_edma_probe(struct dw_edma_chip *chip)
>  {
> @@ -115,6 +148,17 @@ static inline int dw_edma_remove(struct dw_edma_chip *chip)
>  {
>  	return 0;
>  }
> +
> +static inline int dw_edma_chan_irq_config(struct dma_chan *chan,
> +					  enum dw_edma_ch_irq_mode mode)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline bool dw_edma_chan_ignore_irq(struct dma_chan *chan)
> +{
> +	return false;
> +}

I think it'd be better to go through

struct dma_slave_config {
	...
        void *peripheral_config;
	size_t peripheral_size;

};

So the DMA consumer can use the standard DMAengine API, dmaengine_slave_config().

Frank
>  #endif /* CONFIG_DW_EDMA */
>
>  struct pci_epc;
> --
> 2.51.0
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v4 05/38] dmaengine: dw-edma: Add a helper to query linked-list region
  2026-01-18 13:54 ` [RFC PATCH v4 05/38] dmaengine: dw-edma: Add a helper to query linked-list region Koichiro Den
@ 2026-01-18 17:05   ` Frank Li
  2026-01-21  1:38     ` Koichiro Den
  0 siblings, 1 reply; 68+ messages in thread
From: Frank Li @ 2026-01-18 17:05 UTC (permalink / raw)
  To: Koichiro Den
  Cc: dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi, linux-pci, linux-doc, linux-kernel, linux-renesas-soc,
	devicetree, dmaengine, iommu, ntb, netdev, linux-kselftest, arnd,
	gregkh, joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt,
	corbet, skhan, andriy.shevchenko, jbrunet, utkarsh02t

On Sun, Jan 18, 2026 at 10:54:07PM +0900, Koichiro Den wrote:
> A remote eDMA provider may need to expose the linked-list (LL) memory
> region that was configured by platform glue (typically at boot), so the
> peer (host) can map it and operate the remote view of the controller.
>
> Export dw_edma_chan_get_ll_region() to return the LL region associated
> with a given dma_chan.

This information is passed from the DWC EPC driver. Is it possible to get it
from the EPC driver directly?

Frank
>
> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> ---
>  drivers/dma/dw-edma/dw-edma-core.c | 26 ++++++++++++++++++++++++++
>  include/linux/dma/edma.h           | 14 ++++++++++++++
>  2 files changed, 40 insertions(+)
>
> diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
> index 0eb8fc1dcc34..c4fb66a9b5f5 100644
> --- a/drivers/dma/dw-edma/dw-edma-core.c
> +++ b/drivers/dma/dw-edma/dw-edma-core.c
> @@ -1209,6 +1209,32 @@ int dw_edma_chan_register_notify(struct dma_chan *dchan,
>  }
>  EXPORT_SYMBOL_GPL(dw_edma_chan_register_notify);
>
> +int dw_edma_chan_get_ll_region(struct dma_chan *dchan,
> +			       struct dw_edma_region *region)
> +{
> +	struct dw_edma_chip *chip;
> +	struct dw_edma_chan *chan;
> +
> +	if (!dchan || !region || !dchan->device)
> +		return -ENODEV;
> +
> +	chan = dchan2dw_edma_chan(dchan);
> +	if (!chan)
> +		return -ENODEV;
> +
> +	chip = chan->dw->chip;
> +	if (!(chip->flags & DW_EDMA_CHIP_LOCAL))
> +		return -EINVAL;
> +
> +	if (chan->dir == EDMA_DIR_WRITE)
> +		*region = chip->ll_region_wr[chan->id];
> +	else
> +		*region = chip->ll_region_rd[chan->id];
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(dw_edma_chan_get_ll_region);
> +
>  MODULE_LICENSE("GPL v2");
>  MODULE_DESCRIPTION("Synopsys DesignWare eDMA controller core driver");
>  MODULE_AUTHOR("Gustavo Pimentel <gustavo.pimentel@synopsys.com>");
> diff --git a/include/linux/dma/edma.h b/include/linux/dma/edma.h
> index 3c538246de07..c9ec426e27ec 100644
> --- a/include/linux/dma/edma.h
> +++ b/include/linux/dma/edma.h
> @@ -153,6 +153,14 @@ bool dw_edma_chan_ignore_irq(struct dma_chan *chan);
>  int dw_edma_chan_register_notify(struct dma_chan *chan,
>  				 void (*cb)(struct dma_chan *chan, void *user),
>  				 void *user);
> +
> +/**
> + * dw_edma_chan_get_ll_region - get linked list (LL) memory for a dma_chan
> + * @chan: the target DMA channel
> + * @region: output parameter returning the corresponding LL region
> + */
> +int dw_edma_chan_get_ll_region(struct dma_chan *chan,
> +			       struct dw_edma_region *region);
>  #else
>  static inline int dw_edma_probe(struct dw_edma_chip *chip)
>  {
> @@ -182,6 +190,12 @@ static inline int dw_edma_chan_register_notify(struct dma_chan *chan,
>  {
>  	return -ENODEV;
>  }
> +
> +static inline int dw_edma_chan_get_ll_region(struct dma_chan *chan,
> +					     struct dw_edma_region *region)
> +{
> +	return -EINVAL;
> +}
>  #endif /* CONFIG_DW_EDMA */
>
>  struct pci_epc;
> --
> 2.51.0
>


* Re: [RFC PATCH v4 02/38] dmaengine: dw-edma: Add per-channel interrupt routing control
  2026-01-18 17:03   ` Frank Li
@ 2026-01-19 14:26     ` Koichiro Den
  2026-01-21 16:02       ` Vinod Koul
  0 siblings, 1 reply; 68+ messages in thread
From: Koichiro Den @ 2026-01-19 14:26 UTC (permalink / raw)
  To: Frank Li
  Cc: dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi, linux-pci, linux-doc, linux-kernel, linux-renesas-soc,
	devicetree, dmaengine, iommu, ntb, netdev, linux-kselftest, arnd,
	gregkh, joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt,
	corbet, skhan, andriy.shevchenko, jbrunet, utkarsh02t

On Sun, Jan 18, 2026 at 12:03:19PM -0500, Frank Li wrote:
> On Sun, Jan 18, 2026 at 10:54:04PM +0900, Koichiro Den wrote:
> > DesignWare EP eDMA can generate interrupts both locally and remotely
> > (LIE/RIE). Remote eDMA users need to decide, per channel, whether
> > completions should be handled locally, remotely, or both. Unless
> > carefully configured, the endpoint and host would race to ack the
> > interrupt.
> >
> > Introduce a per-channel interrupt routing mode and export small APIs to
> > configure and query it. Update v0 programming so that RIE and local
> > done/abort interrupt masking follow the selected mode. The default mode
> > keeps the original behavior, so unless the new APIs are explicitly used,
> > no functional changes.
> >
> > Signed-off-by: Koichiro Den <den@valinux.co.jp>
> > ---
> >  drivers/dma/dw-edma/dw-edma-core.c    | 52 +++++++++++++++++++++++++++
> >  drivers/dma/dw-edma/dw-edma-core.h    |  2 ++
> >  drivers/dma/dw-edma/dw-edma-v0-core.c | 26 +++++++++-----
> >  include/linux/dma/edma.h              | 44 +++++++++++++++++++++++
> >  4 files changed, 116 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
> > index b9d59c3c0cb4..059b3996d383 100644
> > --- a/drivers/dma/dw-edma/dw-edma-core.c
> > +++ b/drivers/dma/dw-edma/dw-edma-core.c
> > @@ -768,6 +768,7 @@ static int dw_edma_channel_setup(struct dw_edma *dw, u32 wr_alloc, u32 rd_alloc)
> >  		chan->configured = false;
> >  		chan->request = EDMA_REQ_NONE;
> >  		chan->status = EDMA_ST_IDLE;
> > +		chan->irq_mode = DW_EDMA_CH_IRQ_DEFAULT;
> >
> >  		if (chan->dir == EDMA_DIR_WRITE)
> >  			chan->ll_max = (chip->ll_region_wr[chan->id].sz / EDMA_LL_SZ);
> > @@ -1062,6 +1063,57 @@ int dw_edma_remove(struct dw_edma_chip *chip)
> >  }
> >  EXPORT_SYMBOL_GPL(dw_edma_remove);
> >
> > +int dw_edma_chan_irq_config(struct dma_chan *dchan,
> > +			    enum dw_edma_ch_irq_mode mode)
> > +{
> > +	struct dw_edma_chan *chan;
> > +
> > +	switch (mode) {
> > +	case DW_EDMA_CH_IRQ_DEFAULT:
> > +	case DW_EDMA_CH_IRQ_LOCAL:
> > +	case DW_EDMA_CH_IRQ_REMOTE:
> > +		break;
> > +	default:
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (!dchan || !dchan->device)
> > +		return -ENODEV;
> > +
> > +	chan = dchan2dw_edma_chan(dchan);
> > +	if (!chan)
> > +		return -ENODEV;
> > +
> > +	chan->irq_mode = mode;
> > +
> > +	dev_vdbg(chan->dw->chip->dev, "Channel: %s[%u] set irq_mode=%u\n",
> > +		 str_write_read(chan->dir == EDMA_DIR_WRITE),
> > +		 chan->id, mode);
> > +
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(dw_edma_chan_irq_config);
> > +
> > +bool dw_edma_chan_ignore_irq(struct dma_chan *dchan)
> > +{
> > +	struct dw_edma_chan *chan;
> > +	struct dw_edma *dw;
> > +
> > +	if (!dchan || !dchan->device)
> > +		return false;
> > +
> > +	chan = dchan2dw_edma_chan(dchan);
> > +	if (!chan)
> > +		return false;
> > +
> > +	dw = chan->dw;
> > +	if (dw->chip->flags & DW_EDMA_CHIP_LOCAL)
> > +		return chan->irq_mode == DW_EDMA_CH_IRQ_REMOTE;
> > +	else
> > +		return chan->irq_mode == DW_EDMA_CH_IRQ_LOCAL;
> > +}
> > +EXPORT_SYMBOL_GPL(dw_edma_chan_ignore_irq);
> > +
> >  MODULE_LICENSE("GPL v2");
> >  MODULE_DESCRIPTION("Synopsys DesignWare eDMA controller core driver");
> >  MODULE_AUTHOR("Gustavo Pimentel <gustavo.pimentel@synopsys.com>");
> > diff --git a/drivers/dma/dw-edma/dw-edma-core.h b/drivers/dma/dw-edma/dw-edma-core.h
> > index 71894b9e0b15..8458d676551a 100644
> > --- a/drivers/dma/dw-edma/dw-edma-core.h
> > +++ b/drivers/dma/dw-edma/dw-edma-core.h
> > @@ -81,6 +81,8 @@ struct dw_edma_chan {
> >
> >  	struct msi_msg			msi;
> >
> > +	enum dw_edma_ch_irq_mode	irq_mode;
> > +
> >  	enum dw_edma_request		request;
> >  	enum dw_edma_status		status;
> >  	u8				configured;
> > diff --git a/drivers/dma/dw-edma/dw-edma-v0-core.c b/drivers/dma/dw-edma/dw-edma-v0-core.c
> > index 2850a9df80f5..80472148c335 100644
> > --- a/drivers/dma/dw-edma/dw-edma-v0-core.c
> > +++ b/drivers/dma/dw-edma/dw-edma-v0-core.c
> > @@ -256,8 +256,10 @@ dw_edma_v0_core_handle_int(struct dw_edma_irq *dw_irq, enum dw_edma_dir dir,
> >  	for_each_set_bit(pos, &val, total) {
> >  		chan = &dw->chan[pos + off];
> >
> > -		dw_edma_v0_core_clear_done_int(chan);
> > -		done(chan);
> > +		if (!dw_edma_chan_ignore_irq(&chan->vc.chan)) {
> > +			dw_edma_v0_core_clear_done_int(chan);
> > +			done(chan);
> > +		}
> >
> >  		ret = IRQ_HANDLED;
> >  	}
> > @@ -267,8 +269,10 @@ dw_edma_v0_core_handle_int(struct dw_edma_irq *dw_irq, enum dw_edma_dir dir,
> >  	for_each_set_bit(pos, &val, total) {
> >  		chan = &dw->chan[pos + off];
> >
> > -		dw_edma_v0_core_clear_abort_int(chan);
> > -		abort(chan);
> > +		if (!dw_edma_chan_ignore_irq(&chan->vc.chan)) {
> > +			dw_edma_v0_core_clear_abort_int(chan);
> > +			abort(chan);
> > +		}
> >
> >  		ret = IRQ_HANDLED;
> >  	}
> > @@ -331,7 +335,8 @@ static void dw_edma_v0_core_write_chunk(struct dw_edma_chunk *chunk)
> >  		j--;
> >  		if (!j) {
> >  			control |= DW_EDMA_V0_LIE;
> > -			if (!(chan->dw->chip->flags & DW_EDMA_CHIP_LOCAL))
> > +			if (!(chan->dw->chip->flags & DW_EDMA_CHIP_LOCAL) &&
> > +			    chan->irq_mode != DW_EDMA_CH_IRQ_LOCAL)
> >  				control |= DW_EDMA_V0_RIE;
> >  		}
> >
> > @@ -408,12 +413,17 @@ static void dw_edma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
> >  				break;
> >  			}
> >  		}
> > -		/* Interrupt unmask - done, abort */
> > +		/* Interrupt mask/unmask - done, abort */
> >  		raw_spin_lock_irqsave(&dw->lock, flags);
> >
> >  		tmp = GET_RW_32(dw, chan->dir, int_mask);
> > -		tmp &= ~FIELD_PREP(EDMA_V0_DONE_INT_MASK, BIT(chan->id));
> > -		tmp &= ~FIELD_PREP(EDMA_V0_ABORT_INT_MASK, BIT(chan->id));
> > +		if (chan->irq_mode == DW_EDMA_CH_IRQ_REMOTE) {
> > +			tmp |= FIELD_PREP(EDMA_V0_DONE_INT_MASK, BIT(chan->id));
> > +			tmp |= FIELD_PREP(EDMA_V0_ABORT_INT_MASK, BIT(chan->id));
> > +		} else {
> > +			tmp &= ~FIELD_PREP(EDMA_V0_DONE_INT_MASK, BIT(chan->id));
> > +			tmp &= ~FIELD_PREP(EDMA_V0_ABORT_INT_MASK, BIT(chan->id));
> > +		}
> >  		SET_RW_32(dw, chan->dir, int_mask, tmp);
> >  		/* Linked list error */
> >  		tmp = GET_RW_32(dw, chan->dir, linked_list_err_en);
> > diff --git a/include/linux/dma/edma.h b/include/linux/dma/edma.h
> > index ffad10ff2cd6..6f50165ac084 100644
> > --- a/include/linux/dma/edma.h
> > +++ b/include/linux/dma/edma.h
> > @@ -60,6 +60,23 @@ enum dw_edma_chip_flags {
> >  	DW_EDMA_CHIP_LOCAL	= BIT(0),
> >  };
> >
> > +/*
> > + * enum dw_edma_ch_irq_mode - per-channel interrupt routing control
> > + * @DW_EDMA_CH_IRQ_DEFAULT:   LIE=1/RIE=1, local interrupt unmasked
> > + * @DW_EDMA_CH_IRQ_LOCAL:     LIE=1/RIE=0
> > + * @DW_EDMA_CH_IRQ_REMOTE:    LIE=1/RIE=1, local interrupt masked
> > + *
> > + * Some implementations require using LIE=1/RIE=1 with the local interrupt
> > + * masked to generate a remote-only interrupt (rather than LIE=0/RIE=1).
> > + * See the DesignWare endpoint databook 5.40, "Hint" below "Figure 8-22
> > + * Write Interrupt Generation".
> > + */
> > +enum dw_edma_ch_irq_mode {
> > +	DW_EDMA_CH_IRQ_DEFAULT	= 0,
> > +	DW_EDMA_CH_IRQ_LOCAL,
> > +	DW_EDMA_CH_IRQ_REMOTE,
> > +};
> > +
> >  /**
> >   * struct dw_edma_chip - representation of DesignWare eDMA controller hardware
> >   * @dev:		 struct device of the eDMA controller
> > @@ -105,6 +122,22 @@ struct dw_edma_chip {
> >  #if IS_REACHABLE(CONFIG_DW_EDMA)
> >  int dw_edma_probe(struct dw_edma_chip *chip);
> >  int dw_edma_remove(struct dw_edma_chip *chip);
> > +/**
> > + * dw_edma_chan_irq_config - configure per-channel interrupt routing
> > + * @chan: DMA channel obtained from dma_request_channel()
> > + * @mode: interrupt routing mode
> > + *
> > + * Returns 0 on success, -EINVAL for invalid @mode, or -ENODEV if @chan does
> > + * not belong to the DesignWare eDMA driver.
> > + */
> > +int dw_edma_chan_irq_config(struct dma_chan *chan,
> > +			    enum dw_edma_ch_irq_mode mode);
> > +
> > +/**
> > + * dw_edma_chan_ignore_irq - tell whether local IRQ handling should be ignored
> > + * @chan: DMA channel obtained from dma_request_channel()
> > + */
> > +bool dw_edma_chan_ignore_irq(struct dma_chan *chan);
> >  #else
> >  static inline int dw_edma_probe(struct dw_edma_chip *chip)
> >  {
> > @@ -115,6 +148,17 @@ static inline int dw_edma_remove(struct dw_edma_chip *chip)
> >  {
> >  	return 0;
> >  }
> > +
> > +static inline int dw_edma_chan_irq_config(struct dma_chan *chan,
> > +					  enum dw_edma_ch_irq_mode mode)
> > +{
> > +	return -ENODEV;
> > +}
> > +
> > +static inline bool dw_edma_chan_ignore_irq(struct dma_chan *chan)
> > +{
> > +	return false;
> > +}
> 
> I think it'd be better to go through
> 
> struct dma_slave_config {
> 	...
>         void *peripheral_config;
> 	size_t peripheral_size;
> 
> };
> 
> So the DMA consumer can use the standard DMAengine API, dmaengine_slave_config().

Using .peripheral_config wasn't something I had initially considered, but I
agree that it is preferable in that it avoids introducing additional
exported APIs. I'm not entirely sure it is clean to use it for
non-peripheral settings in the strict sense, but there are precedents such
as stm32_mdma_dma_config, so it seems acceptable.
If I'm missing something, please correct me.

I'll rework this part as you suggested. Thanks for the guidance.

Koichiro

> 
> Frank
> >  #endif /* CONFIG_DW_EDMA */
> >
> >  struct pci_epc;
> > --
> > 2.51.0
> >


* Re: [RFC PATCH v4 08/38] NTB: epf: Provide db_vector_count/db_vector_mask callbacks
  2026-01-18 13:54 ` [RFC PATCH v4 08/38] NTB: epf: Provide db_vector_count/db_vector_mask callbacks Koichiro Den
@ 2026-01-19 20:03   ` Frank Li
  2026-01-21  1:41     ` Koichiro Den
  0 siblings, 1 reply; 68+ messages in thread
From: Frank Li @ 2026-01-19 20:03 UTC (permalink / raw)
  To: Koichiro Den
  Cc: dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi, linux-pci, linux-doc, linux-kernel, linux-renesas-soc,
	devicetree, dmaengine, iommu, ntb, netdev, linux-kselftest, arnd,
	gregkh, joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt,
	corbet, skhan, andriy.shevchenko, jbrunet, utkarsh02t

On Sun, Jan 18, 2026 at 10:54:10PM +0900, Koichiro Den wrote:
> Provide db_vector_count() and db_vector_mask() implementations for both
> ntb_hw_epf and pci-epf-vntb so that ntb_transport can map MSI vectors to
> doorbell bits. Without them, the upper layer cannot identify which
> doorbell vector fired and ends up scheduling rxc_db_work() for all queue
> pairs, resulting in a thundering-herd effect when multiple queue pairs
> (QPs) are enabled.
>
> With this change, .peer_db_set() must honor the db_bits mask and raise
> all requested doorbell interrupts, so update those implementations
> accordingly.
>
> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> ---

Patches 6/7/8 can be posted separately. Basically they look good.

Frank

>  drivers/ntb/hw/epf/ntb_hw_epf.c               | 47 ++++++++++++-------
>  drivers/pci/endpoint/functions/pci-epf-vntb.c | 41 +++++++++++++---
>  2 files changed, 64 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/ntb/hw/epf/ntb_hw_epf.c b/drivers/ntb/hw/epf/ntb_hw_epf.c
> index dbb5bebe63a5..c37ede4063dc 100644
> --- a/drivers/ntb/hw/epf/ntb_hw_epf.c
> +++ b/drivers/ntb/hw/epf/ntb_hw_epf.c
> @@ -381,7 +381,7 @@ static int ntb_epf_init_isr(struct ntb_epf_dev *ndev, int msi_min, int msi_max)
>  		}
>  	}
>
> -	ndev->db_count = irq;
> +	ndev->db_count = irq - 1;
>
>  	ret = ntb_epf_send_command(ndev, CMD_CONFIGURE_DOORBELL,
>  				   argument | irq);
> @@ -415,6 +415,22 @@ static u64 ntb_epf_db_valid_mask(struct ntb_dev *ntb)
>  	return ntb_ndev(ntb)->db_valid_mask;
>  }
>
> +static int ntb_epf_db_vector_count(struct ntb_dev *ntb)
> +{
> +	return ntb_ndev(ntb)->db_count;
> +}
> +
> +static u64 ntb_epf_db_vector_mask(struct ntb_dev *ntb, int db_vector)
> +{
> +	struct ntb_epf_dev *ndev = ntb_ndev(ntb);
> +
> +	db_vector--; /* vector 0 is reserved for link events */
> +	if (db_vector < 0 || db_vector >= ndev->db_count)
> +		return 0;
> +
> +	return ndev->db_valid_mask & BIT_ULL(db_vector);
> +}
> +
>  static int ntb_epf_db_set_mask(struct ntb_dev *ntb, u64 db_bits)
>  {
>  	return 0;
> @@ -507,26 +523,21 @@ static int ntb_epf_peer_mw_get_addr(struct ntb_dev *ntb, int idx,
>  static int ntb_epf_peer_db_set(struct ntb_dev *ntb, u64 db_bits)
>  {
>  	struct ntb_epf_dev *ndev = ntb_ndev(ntb);
> -	u32 interrupt_num = ffs(db_bits) + 1;
> -	struct device *dev = ndev->dev;
> +	u32 interrupt_num;
>  	u32 db_entry_size;
>  	u32 db_offset;
>  	u32 db_data;
> -
> -	if (interrupt_num >= ndev->db_count) {
> -		dev_err(dev, "DB interrupt %d greater than Max Supported %d\n",
> -			interrupt_num, ndev->db_count);
> -		return -EINVAL;
> -	}
> +	unsigned long i;
>
>  	db_entry_size = readl(ndev->ctrl_reg + NTB_EPF_DB_ENTRY_SIZE);
>
> -	db_data = readl(ndev->ctrl_reg + NTB_EPF_DB_DATA(interrupt_num));
> -	db_offset = readl(ndev->ctrl_reg + NTB_EPF_DB_OFFSET(interrupt_num));
> -
> -	writel(db_data, ndev->db_reg + (db_entry_size * interrupt_num) +
> -	       db_offset);
> -
> +	for_each_set_bit(i, (unsigned long *)&db_bits, ndev->db_count) {
> +		interrupt_num = i + 1;
> +		db_data = readl(ndev->ctrl_reg + NTB_EPF_DB_DATA(interrupt_num));
> +		db_offset = readl(ndev->ctrl_reg + NTB_EPF_DB_OFFSET(interrupt_num));
> +		writel(db_data, ndev->db_reg + (db_entry_size * interrupt_num) +
> +		       db_offset);
> +	}
>  	return 0;
>  }
>
> @@ -556,6 +567,8 @@ static const struct ntb_dev_ops ntb_epf_ops = {
>  	.spad_count		= ntb_epf_spad_count,
>  	.peer_mw_count		= ntb_epf_peer_mw_count,
>  	.db_valid_mask		= ntb_epf_db_valid_mask,
> +	.db_vector_count	= ntb_epf_db_vector_count,
> +	.db_vector_mask		= ntb_epf_db_vector_mask,
>  	.db_set_mask		= ntb_epf_db_set_mask,
>  	.mw_set_trans		= ntb_epf_mw_set_trans,
>  	.mw_clear_trans		= ntb_epf_mw_clear_trans,
> @@ -607,8 +620,8 @@ static int ntb_epf_init_dev(struct ntb_epf_dev *ndev)
>  	int ret;
>
>  	/* One Link interrupt and rest doorbell interrupt */
> -	ret = ntb_epf_init_isr(ndev, NTB_EPF_MIN_DB_COUNT + NTB_EPF_IRQ_RESERVE,
> -			       NTB_EPF_MAX_DB_COUNT + NTB_EPF_IRQ_RESERVE);
> +	ret = ntb_epf_init_isr(ndev, NTB_EPF_MIN_DB_COUNT + 1 + NTB_EPF_IRQ_RESERVE,
> +			       NTB_EPF_MAX_DB_COUNT + 1 + NTB_EPF_IRQ_RESERVE);
>  	if (ret) {
>  		dev_err(dev, "Failed to init ISR\n");
>  		return ret;
> diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> index 4927faa28255..39e784e21236 100644
> --- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
> +++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> @@ -1384,6 +1384,22 @@ static u64 vntb_epf_db_valid_mask(struct ntb_dev *ntb)
>  	return BIT_ULL(ntb_ndev(ntb)->db_count) - 1;
>  }
>
> +static int vntb_epf_db_vector_count(struct ntb_dev *ntb)
> +{
> +	return ntb_ndev(ntb)->db_count;
> +}
> +
> +static u64 vntb_epf_db_vector_mask(struct ntb_dev *ntb, int db_vector)
> +{
> +	struct epf_ntb *ndev = ntb_ndev(ntb);
> +
> +	db_vector--; /* vector 0 is reserved for link events */
> +	if (db_vector < 0 || db_vector >= ndev->db_count)
> +		return 0;
> +
> +	return BIT_ULL(db_vector);
> +}
> +
>  static int vntb_epf_db_set_mask(struct ntb_dev *ntb, u64 db_bits)
>  {
>  	return 0;
> @@ -1487,20 +1503,29 @@ static int vntb_epf_peer_spad_write(struct ntb_dev *ndev, int pidx, int idx, u32
>
>  static int vntb_epf_peer_db_set(struct ntb_dev *ndev, u64 db_bits)
>  {
> -	u32 interrupt_num = ffs(db_bits) + 1;
>  	struct epf_ntb *ntb = ntb_ndev(ndev);
>  	u8 func_no, vfunc_no;
> -	int ret;
> +	u64 failed = 0;
> +	unsigned long i;
>
>  	func_no = ntb->epf->func_no;
>  	vfunc_no = ntb->epf->vfunc_no;
>
> -	ret = pci_epc_raise_irq(ntb->epf->epc, func_no, vfunc_no,
> -				PCI_IRQ_MSI, interrupt_num + 1);
> -	if (ret)
> -		dev_err(&ntb->ntb->dev, "Failed to raise IRQ\n");
> +	for_each_set_bit(i, (unsigned long *)&db_bits, ntb->db_count) {
> +		/*
> +		 * DB bit i is MSI interrupt (i + 2).
> +		 * Vector 0 is used for link events and MSI vectors are
> +		 * 1-based for pci_epc_raise_irq().
> +		 */
> +		if (pci_epc_raise_irq(ntb->epf->epc, func_no, vfunc_no,
> +				      PCI_IRQ_MSI, i + 2))
> +			failed |= BIT_ULL(i);
> +	}
> +	if (failed)
> +		dev_err(&ntb->ntb->dev, "Failed to raise IRQ (%#llx)\n",
> +			failed);
>
> -	return ret;
> +	return failed ? -EIO : 0;
>  }
>
>  static u64 vntb_epf_db_read(struct ntb_dev *ndev)
> @@ -1561,6 +1586,8 @@ static const struct ntb_dev_ops vntb_epf_ops = {
>  	.spad_count		= vntb_epf_spad_count,
>  	.peer_mw_count		= vntb_epf_peer_mw_count,
>  	.db_valid_mask		= vntb_epf_db_valid_mask,
> +	.db_vector_count	= vntb_epf_db_vector_count,
> +	.db_vector_mask		= vntb_epf_db_vector_mask,
>  	.db_set_mask		= vntb_epf_db_set_mask,
>  	.mw_set_trans		= vntb_epf_mw_set_trans,
>  	.mw_clear_trans		= vntb_epf_mw_clear_trans,
> --
> 2.51.0
>


* Re: [RFC PATCH v4 09/38] NTB: core: Add mw_set_trans_ranges() for subrange programming
  2026-01-18 13:54 ` [RFC PATCH v4 09/38] NTB: core: Add mw_set_trans_ranges() for subrange programming Koichiro Den
@ 2026-01-19 20:07   ` Frank Li
  0 siblings, 0 replies; 68+ messages in thread
From: Frank Li @ 2026-01-19 20:07 UTC (permalink / raw)
  To: Koichiro Den
  Cc: dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi, linux-pci, linux-doc, linux-kernel, linux-renesas-soc,
	devicetree, dmaengine, iommu, ntb, netdev, linux-kselftest, arnd,
	gregkh, joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt,
	corbet, skhan, andriy.shevchenko, jbrunet, utkarsh02t

On Sun, Jan 18, 2026 at 10:54:11PM +0900, Koichiro Den wrote:
> At the BAR level, multiple MWs may be packed into a single BAR. In
> addition, a single MW may itself be subdivided into multiple address
> subranges, each of which can be mapped independently by the underlying
> NTB hardware.
>
> Introduce an optional ntb_dev_ops callback, .mw_set_trans_ranges(), to
> describe and program such layouts explicitly. The helper allows an NTB
> driver to provide, for each MW, a list of contiguous subranges that
> together cover the MW address space.
>
> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> ---

Reviewed-by: Frank Li <Frank.Li@nxp.com>
>  include/linux/ntb.h | 46 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)
>
> diff --git a/include/linux/ntb.h b/include/linux/ntb.h
> index 8ff9d663096b..84908753f446 100644
> --- a/include/linux/ntb.h
> +++ b/include/linux/ntb.h
> @@ -206,6 +206,11 @@ static inline int ntb_ctx_ops_is_valid(const struct ntb_ctx_ops *ops)
>  		1;
>  }
>
> +struct ntb_mw_subrange {
> +	dma_addr_t	addr;
> +	resource_size_t	size;
> +};
> +
>  /**
>   * struct ntb_dev_ops - ntb device operations
>   * @port_number:	See ntb_port_number().
> @@ -218,6 +223,7 @@ static inline int ntb_ctx_ops_is_valid(const struct ntb_ctx_ops *ops)
>   * @mw_count:		See ntb_mw_count().
>   * @mw_get_align:	See ntb_mw_get_align().
>   * @mw_set_trans:	See ntb_mw_set_trans().
> + * @mw_set_trans_ranges:See ntb_mw_set_trans_ranges().
>   * @mw_clear_trans:	See ntb_mw_clear_trans().
>   * @peer_mw_count:	See ntb_peer_mw_count().
>   * @peer_mw_get_addr:	See ntb_peer_mw_get_addr().
> @@ -276,6 +282,9 @@ struct ntb_dev_ops {
>  			    resource_size_t *size_max);
>  	int (*mw_set_trans)(struct ntb_dev *ntb, int pidx, int widx,
>  			    dma_addr_t addr, resource_size_t size);
> +	int (*mw_set_trans_ranges)(struct ntb_dev *ntb, int pidx, int widx,
> +				   unsigned int num_ranges,
> +				   const struct ntb_mw_subrange *ranges);
>  	int (*mw_clear_trans)(struct ntb_dev *ntb, int pidx, int widx);
>  	int (*peer_mw_count)(struct ntb_dev *ntb);
>  	int (*peer_mw_get_addr)(struct ntb_dev *ntb, int widx,
> @@ -350,6 +359,7 @@ static inline int ntb_dev_ops_is_valid(const struct ntb_dev_ops *ops)
>  		ops->mw_get_align			&&
>  		(ops->mw_set_trans			||
>  		 ops->peer_mw_set_trans)		&&
> +		/* ops->mw_set_trans_ranges		&& */
>  		/* ops->mw_clear_trans			&& */
>  		ops->peer_mw_count			&&
>  		ops->peer_mw_get_addr			&&
> @@ -860,6 +870,42 @@ static inline int ntb_mw_set_trans(struct ntb_dev *ntb, int pidx, int widx,
>  	return ntb->ops->mw_set_trans(ntb, pidx, widx, addr, size);
>  }
>
> +/**
> + * ntb_mw_set_trans_ranges() - set the translations of an inbound memory
> + *                             window, composed of multiple subranges.
> + * @ntb:	NTB device context.
> + * @pidx:	Port index of peer device.
> + * @widx:	Memory window index.
> + * @num_ranges:	The number of ranges described by @ranges array.
> + * @ranges:	Array of subranges. The subranges are interpreted in ascending
> + *		window offset order (i.e. ranges[0] maps the first part of the MW,
> + *		ranges[1] the next part, ...).
> + *
> + * Return: Zero on success, otherwise an error number. If the driver does
> + *         not implement the callback, return -EOPNOTSUPP.
> + */
> +static inline int ntb_mw_set_trans_ranges(struct ntb_dev *ntb, int pidx, int widx,
> +					  unsigned int num_ranges,
> +					  const struct ntb_mw_subrange *ranges)
> +{
> +	if (!num_ranges || !ranges)
> +		return -EINVAL;
> +
> +	if (ntb->ops->mw_set_trans_ranges)
> +		return ntb->ops->mw_set_trans_ranges(ntb, pidx, widx,
> +						     num_ranges, ranges);
> +
> +	/*
> +	 * Fallback for drivers that only support the legacy single-range
> +	 * translation API.
> +	 */
> +	if (num_ranges == 1)
> +		return ntb_mw_set_trans(ntb, pidx, widx,
> +					ranges[0].addr, ranges[0].size);
> +
> +	return -EOPNOTSUPP;
> +}
> +
>  /**
>   * ntb_mw_clear_trans() - clear the translation address of an inbound memory
>   *                        window
> --
> 2.51.0
>


* Re: [RFC PATCH v4 11/38] NTB: core: Add .get_dma_dev() to ntb_dev_ops
  2026-01-18 13:54 ` [RFC PATCH v4 11/38] NTB: core: Add .get_dma_dev() " Koichiro Den
@ 2026-01-19 20:09   ` Frank Li
  2026-01-21  1:44     ` Koichiro Den
  0 siblings, 1 reply; 68+ messages in thread
From: Frank Li @ 2026-01-19 20:09 UTC (permalink / raw)
  To: Koichiro Den
  Cc: dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi, linux-pci, linux-doc, linux-kernel, linux-renesas-soc,
	devicetree, dmaengine, iommu, ntb, netdev, linux-kselftest, arnd,
	gregkh, joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt,
	corbet, skhan, andriy.shevchenko, jbrunet, utkarsh02t

On Sun, Jan 18, 2026 at 10:54:13PM +0900, Koichiro Den wrote:
> Not all NTB implementations are able to naturally do DMA mapping through
> the NTB PCI device itself (e.g. due to IOMMU topology or non-PCI backing
> devices).
>
> Add an optional .get_dma_dev() callback and helper so clients can use
> the appropriate struct device for DMA API allocations and mappings.
>
> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> ---
>  include/linux/ntb.h | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
>
> diff --git a/include/linux/ntb.h b/include/linux/ntb.h
> index aa888219732a..7ac8cb13e90d 100644
> --- a/include/linux/ntb.h
> +++ b/include/linux/ntb.h
> @@ -262,6 +262,7 @@ struct ntb_mw_subrange {
>   * @msg_clear_mask:	See ntb_msg_clear_mask().
>   * @msg_read:		See ntb_msg_read().
>   * @peer_msg_write:	See ntb_peer_msg_write().
> + * @get_dma_dev:	See ntb_get_dma_dev().
>   * @get_private_data:	See ntb_get_private_data().
>   */
>  struct ntb_dev_ops {
> @@ -339,6 +340,7 @@ struct ntb_dev_ops {
>  	int (*msg_clear_mask)(struct ntb_dev *ntb, u64 mask_bits);
>  	u32 (*msg_read)(struct ntb_dev *ntb, int *pidx, int midx);
>  	int (*peer_msg_write)(struct ntb_dev *ntb, int pidx, int midx, u32 msg);
> +	struct device *(*get_dma_dev)(struct ntb_dev *ntb);
>  	void *(*get_private_data)(struct ntb_dev *ntb);
>  };
>
> @@ -405,6 +407,7 @@ static inline int ntb_dev_ops_is_valid(const struct ntb_dev_ops *ops)
>  		!ops->peer_msg_write == !ops->msg_count		&&
>
>  		/* Miscellaneous optional callbacks */
> +		/* ops->get_dma_dev				&& */
>  		/* ops->get_private_data			&& */
>  		1;
>  }
> @@ -1614,6 +1617,21 @@ static inline int ntb_peer_msg_write(struct ntb_dev *ntb, int pidx, int midx,
>  	return ntb->ops->peer_msg_write(ntb, pidx, midx, msg);
>  }
>
> +/**
> + * ntb_get_dma_dev() - get the device suitable for DMA mapping
> + * @ntb:	NTB device context.
> + *
> + * Retrieve a struct device which is suitable for DMA mapping.
> + *
> + * Return: Pointer to struct device.
> + */
> +static inline struct device __maybe_unused *ntb_get_dma_dev(struct ntb_dev *ntb)

I recall that inline functions don't need __maybe_unused.

Reviewed-by: Frank Li <Frank.Li@nxp.com>
> +{
> +	if (!ntb->ops->get_dma_dev)
> +		return ntb->dev.parent;
> +	return ntb->ops->get_dma_dev(ntb);
> +}
> +
>  /**
>   * ntb_get_private_data() - get private data specific to the hardware driver
>   * @ntb:	NTB device context.
> --
> 2.51.0
>


* Re: [RFC PATCH v4 13/38] PCI: endpoint: pci-epf-vntb: Support BAR subrange mappings for MWs
  2026-01-18 13:54 ` [RFC PATCH v4 13/38] PCI: endpoint: pci-epf-vntb: Support BAR subrange mappings for MWs Koichiro Den
@ 2026-01-19 20:26   ` Frank Li
  2026-01-21  2:08     ` Koichiro Den
  0 siblings, 1 reply; 68+ messages in thread
From: Frank Li @ 2026-01-19 20:26 UTC (permalink / raw)
  To: Koichiro Den
  Cc: dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi, linux-pci, linux-doc, linux-kernel, linux-renesas-soc,
	devicetree, dmaengine, iommu, ntb, netdev, linux-kselftest, arnd,
	gregkh, joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt,
	corbet, skhan, andriy.shevchenko, jbrunet, utkarsh02t

On Sun, Jan 18, 2026 at 10:54:15PM +0900, Koichiro Den wrote:
> pci-epf-vntb can pack multiple memory windows into a single BAR using
> mwN_offset. With the NTB core gaining support for programming multiple
> translation ranges for a window, the EPF needs to provide the per-BAR
> subrange layout to the endpoint controller (EPC).
>
> Implement .mw_set_trans_ranges() for pci-epf-vntb. Track subranges for
> each BAR and pass them to pci_epc_set_bar() so EPC drivers can select an
> appropriate inbound mapping mode (e.g. Address Match mode on DesignWare
> controllers) when subrange mappings are required.
>
> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> ---
>  drivers/pci/endpoint/functions/pci-epf-vntb.c | 183 +++++++++++++++++-
>  1 file changed, 175 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> index 39e784e21236..98128c2c5079 100644
> --- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
> +++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> @@ -42,6 +42,7 @@
>  #include <linux/log2.h>
>  #include <linux/module.h>
>  #include <linux/slab.h>
> +#include <linux/sort.h>
>
>  #include <linux/pci-ep-msi.h>
>  #include <linux/pci-epc.h>
> @@ -144,6 +145,10 @@ struct epf_ntb {
>
>  	enum pci_barno epf_ntb_bar[VNTB_BAR_NUM];
>
> +	/* Cache for subrange mapping */
> +	struct ntb_mw_subrange *mw_subrange[MAX_MW];
> +	unsigned int num_subrange[MAX_MW];
> +
>  	struct epf_ntb_ctrl *reg;
>
>  	u32 *epf_db;
> @@ -736,6 +741,7 @@ static int epf_ntb_mw_bar_init(struct epf_ntb *ntb)
>  		ntb->epf->bar[barno].flags |= upper_32_bits(size) ?
>  				PCI_BASE_ADDRESS_MEM_TYPE_64 :
>  				PCI_BASE_ADDRESS_MEM_TYPE_32;
> +		ntb->epf->bar[barno].num_submap = 0;
>
>  		ret = pci_epc_set_bar(ntb->epf->epc,
>  				      ntb->epf->func_no,
> @@ -1405,28 +1411,188 @@ static int vntb_epf_db_set_mask(struct ntb_dev *ntb, u64 db_bits)
>  	return 0;
>  }
>
> -static int vntb_epf_mw_set_trans(struct ntb_dev *ndev, int pidx, int idx,
> -		dma_addr_t addr, resource_size_t size)
> +struct vntb_mw_order {
> +	u64 off;
> +	unsigned int mw;
> +};
> +
> +static int vntb_cmp_mw_order(const void *a, const void *b)
> +{
> +	const struct vntb_mw_order *ma = a;
> +	const struct vntb_mw_order *mb = b;
> +
> +	if (ma->off < mb->off)
> +		return -1;
> +	if (ma->off > mb->off)
> +		return 1;
> +	return 0;
> +}
> +
> +static int vntb_epf_mw_set_trans_ranges(struct ntb_dev *ndev, int pidx, int idx,
> +					unsigned int num_ranges,
> +					const struct ntb_mw_subrange *ranges)
>  {
>  	struct epf_ntb *ntb = ntb_ndev(ndev);
> +	struct pci_epf_bar_submap *submap;
> +	struct vntb_mw_order mws[MAX_MW];
>  	struct pci_epf_bar *epf_bar;
> +	struct ntb_mw_subrange *r;
>  	enum pci_barno barno;
> +	struct device *dev, *epf_dev;
> +	unsigned int total_ranges = 0;
> +	unsigned int mw_cnt = 0;
> +	unsigned int cur = 0;
> +	u64 expected_off = 0;
> +	unsigned int i, j;
>  	int ret;
> +
> +	dev = &ntb->ntb->dev;
> +	epf_dev = &ntb->epf->dev;
> +	barno = ntb->epf_ntb_bar[BAR_MW1 + idx];
> +	epf_bar = &ntb->epf->bar[barno];
> +	epf_bar->barno = barno;
> +
> +	r = devm_kmemdup(epf_dev, ranges, num_ranges * sizeof(*ranges), GFP_KERNEL);

size_mul(sizeof(*ranges), num_ranges)

> +	if (!r)
> +		return -ENOMEM;
> +
> +	if (ntb->mw_subrange[idx])
> +		devm_kfree(epf_dev, ntb->mw_subrange[idx]);
> +
> +	ntb->mw_subrange[idx] = r;
> +	ntb->num_subrange[idx] = num_ranges;
> +
> +	/* Defer pci_epc_set_bar() until all MWs in this BAR have range info. */
> +	for (i = 0; i < MAX_MW; i++) {
> +		enum pci_barno bar = ntb->epf_ntb_bar[BAR_MW1 + i];
> +
> +		if (bar != barno)
> +			continue;
> +		if (!ntb->num_subrange[i])
> +			return 0;
> +
> +		mws[mw_cnt].mw = i;
> +		mws[mw_cnt].off = ntb->mws_offset[i];
> +		mw_cnt++;
> +	}
> +
> +	sort(mws, mw_cnt, sizeof(mws[0]), vntb_cmp_mw_order, NULL);

Can we require that mws_offset be ordered? Then we wouldn't need to sort here.

> +
> +	/* BAR submap must cover the whole BAR with no holes. */
> +	for (i = 0; i < mw_cnt; i++) {
> +		unsigned int mw = mws[i].mw;
> +		u64 sum = 0;
> +
> +		if (mws[i].off != expected_off) {

Can we use size everywhere instead of 'off', to keep it aligned with submap?

Frank
> +			dev_err(dev,
> +				"BAR%d: hole/overlap at %#llx (MW%d@%#llx)\n",
> +				barno, expected_off, mw + 1, mws[i].off);
> +			return -EINVAL;
> +		}
> +
> +		total_ranges += ntb->num_subrange[mw];
> +		for (j = 0; j < ntb->num_subrange[mw]; j++)
> +			sum += ntb->mw_subrange[mw][j].size;
> +
> +		if (sum != ntb->mws_size[mw]) {
> +			dev_err(dev,
> +				"MW%d: ranges size %#llx != window size %#llx\n",
> +				mw + 1, sum, ntb->mws_size[mw]);
> +			return -EINVAL;
> +		}
> +		expected_off += ntb->mws_size[mw];
> +	}
> +
> +	submap = devm_krealloc_array(epf_dev, epf_bar->submap, total_ranges,
> +				     sizeof(*submap), GFP_KERNEL);
> +	if (!submap)
> +		return -ENOMEM;
> +
> +	epf_bar->submap = submap;
> +	epf_bar->num_submap = total_ranges;
> +	dev_dbg(dev, "Requesting BAR%d layout (#. of subranges is %u):\n",
> +		barno, total_ranges);
> +
> +	for (i = 0; i < mw_cnt; i++) {
> +		unsigned int mw = mws[i].mw;
> +
> +		dev_dbg(dev, "- MW%d\n", 1 + mw);
> +		for (j = 0; j < ntb->num_subrange[mw]; j++) {
> +			dev_dbg(dev, "  - addr/size = %#llx/%#llx\n",
> +				ntb->mw_subrange[mw][j].addr,
> +				ntb->mw_subrange[mw][j].size);
> +			submap[cur].phys_addr = ntb->mw_subrange[mw][j].addr;
> +			submap[cur].size = ntb->mw_subrange[mw][j].size;
> +			cur++;
> +		}
> +	}
> +
> +	ret = pci_epc_set_bar(ntb->epf->epc, ntb->epf->func_no,
> +			      ntb->epf->vfunc_no, epf_bar);
> +	if (ret)
> +		dev_err(dev, "BAR%d: failed to program mappings for MW%d: %d\n",
> +			barno, idx + 1, ret);
> +
> +	return ret;
> +}
> +
> +static int vntb_epf_mw_set_trans(struct ntb_dev *ndev, int pidx, int idx,
> +				 dma_addr_t addr, resource_size_t size)
> +{
> +	struct epf_ntb *ntb = ntb_ndev(ndev);
> +	struct pci_epf_bar *epf_bar;
> +	resource_size_t bar_size;
> +	enum pci_barno barno;
>  	struct device *dev;
> +	unsigned int i;
> +	int ret;
>
>  	dev = &ntb->ntb->dev;
>  	barno = ntb->epf_ntb_bar[BAR_MW1 + idx];
>  	epf_bar = &ntb->epf->bar[barno];
>  	epf_bar->phys_addr = addr;
>  	epf_bar->barno = barno;
> -	epf_bar->size = size;
>
> -	ret = pci_epc_set_bar(ntb->epf->epc, 0, 0, epf_bar);
> -	if (ret) {
> -		dev_err(dev, "failure set mw trans\n");
> -		return ret;
> +	bar_size = epf_bar->size;
> +	if (!bar_size || !size)
> +		return -EINVAL;
> +
> +	if (size != ntb->mws_size[idx])
> +		return -EINVAL;
> +
> +	/*
> +	 * Even if the caller intends to map the entire MW, the MW might
> +	 * actually be just a part of the BAR. In that case, redirect the
> +	 * handling to vntb_epf_mw_set_trans_ranges().
> +	 */
> +	if (size < bar_size) {
> +		struct ntb_mw_subrange r = {
> +			.addr = addr,
> +			.size = size,
> +		};
> +		return vntb_epf_mw_set_trans_ranges(ndev, pidx, idx, 1, &r);
>  	}
> -	return 0;
> +
> +	/* Drop any stale cache for the BAR. */
> +	for (i = 0; i < MAX_MW; i++) {
> +		if (ntb->epf_ntb_bar[BAR_MW1 + i] != barno)
> +			continue;
> +		devm_kfree(&ntb->epf->dev, ntb->mw_subrange[i]);
> +		ntb->mw_subrange[i] = NULL;
> +		ntb->num_subrange[i] = 0;
> +	}
> +
> +	/* Subrange mapping is not used. If it was used in the past, clear it. */
> +	devm_kfree(&ntb->epf->dev, epf_bar->submap);
> +	epf_bar->submap = NULL;
> +	epf_bar->num_submap = 0;
> +
> +	ret = pci_epc_set_bar(ntb->epf->epc, ntb->epf->func_no,
> +			      ntb->epf->vfunc_no, epf_bar);
> +	if (ret)
> +		dev_err(dev, "failure set mw trans\n");
> +
> +	return ret;
>  }
>
>  static int vntb_epf_mw_clear_trans(struct ntb_dev *ntb, int pidx, int idx)
> @@ -1590,6 +1756,7 @@ static const struct ntb_dev_ops vntb_epf_ops = {
>  	.db_vector_mask		= vntb_epf_db_vector_mask,
>  	.db_set_mask		= vntb_epf_db_set_mask,
>  	.mw_set_trans		= vntb_epf_mw_set_trans,
> +	.mw_set_trans_ranges	= vntb_epf_mw_set_trans_ranges,
>  	.mw_clear_trans		= vntb_epf_mw_clear_trans,
>  	.peer_mw_get_addr	= vntb_epf_peer_mw_get_addr,
>  	.link_enable		= vntb_epf_link_enable,
> --
> 2.51.0
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v4 14/38] PCI: endpoint: pci-epf-vntb: Implement .get_private_data() callback
  2026-01-18 13:54 ` [RFC PATCH v4 14/38] PCI: endpoint: pci-epf-vntb: Implement .get_private_data() callback Koichiro Den
@ 2026-01-19 20:27   ` Frank Li
  0 siblings, 0 replies; 68+ messages in thread
From: Frank Li @ 2026-01-19 20:27 UTC (permalink / raw)
  To: Koichiro Den
  Cc: dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi, linux-pci, linux-doc, linux-kernel, linux-renesas-soc,
	devicetree, dmaengine, iommu, ntb, netdev, linux-kselftest, arnd,
	gregkh, joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt,
	corbet, skhan, andriy.shevchenko, jbrunet, utkarsh02t

On Sun, Jan 18, 2026 at 10:54:16PM +0900, Koichiro Den wrote:
> Implement the new get_private_data() operation for the EPF vNTB driver
> to expose its associated EPC device to NTB subsystems.
>
> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> ---

Reviewed-by: Frank Li <Frank.Li@nxp.com>

>  drivers/pci/endpoint/functions/pci-epf-vntb.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> index 98128c2c5079..9fbc27000f77 100644
> --- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
> +++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> @@ -1747,6 +1747,15 @@ static int vntb_epf_link_disable(struct ntb_dev *ntb)
>  	return 0;
>  }
>
> +static void *vntb_epf_get_private_data(struct ntb_dev *ndev)
> +{
> +	struct epf_ntb *ntb = ntb_ndev(ndev);
> +
> +	if (!ntb || !ntb->epf)
> +		return NULL;
> +	return ntb->epf->epc;
> +}
> +
>  static const struct ntb_dev_ops vntb_epf_ops = {
>  	.mw_count		= vntb_epf_mw_count,
>  	.spad_count		= vntb_epf_spad_count,
> @@ -1771,6 +1780,7 @@ static const struct ntb_dev_ops vntb_epf_ops = {
>  	.db_clear_mask		= vntb_epf_db_clear_mask,
>  	.db_clear		= vntb_epf_db_clear,
>  	.link_disable		= vntb_epf_link_disable,
> +	.get_private_data	= vntb_epf_get_private_data,
>  };
>
>  static int pci_vntb_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> --
> 2.51.0
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v4 15/38] PCI: endpoint: pci-epf-vntb: Implement .get_dma_dev()
  2026-01-18 13:54 ` [RFC PATCH v4 15/38] PCI: endpoint: pci-epf-vntb: Implement .get_dma_dev() Koichiro Den
@ 2026-01-19 20:30   ` Frank Li
  2026-01-22 14:58     ` Koichiro Den
  0 siblings, 1 reply; 68+ messages in thread
From: Frank Li @ 2026-01-19 20:30 UTC (permalink / raw)
  To: Koichiro Den
  Cc: dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi, linux-pci, linux-doc, linux-kernel, linux-renesas-soc,
	devicetree, dmaengine, iommu, ntb, netdev, linux-kselftest, arnd,
	gregkh, joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt,
	corbet, skhan, andriy.shevchenko, jbrunet, utkarsh02t

On Sun, Jan 18, 2026 at 10:54:17PM +0900, Koichiro Den wrote:
> For DMA API allocations and mappings, pci-epf-vntb should provide an
> appropriate struct device for the NTB core/clients.
>
> Implement .get_dma_dev() and return the EPC parent device.

Simply put:

Implement .get_dma_dev() and return the EPC parent device for the NTB
core/client DMA allocation and mapping APIs.

Reviewed-by: Frank Li <Frank.Li@nxp.com>
>
> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> ---
>  drivers/pci/endpoint/functions/pci-epf-vntb.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> index 9fbc27000f77..7cd976757d15 100644
> --- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
> +++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> @@ -1747,6 +1747,15 @@ static int vntb_epf_link_disable(struct ntb_dev *ntb)
>  	return 0;
>  }
>
> +static struct device *vntb_epf_get_dma_dev(struct ntb_dev *ndev)
> +{
> +	struct epf_ntb *ntb = ntb_ndev(ndev);
> +
> +	if (!ntb || !ntb->epf)
> +		return NULL;
> +	return ntb->epf->epc->dev.parent;
> +}
> +
>  static void *vntb_epf_get_private_data(struct ntb_dev *ndev)
>  {
>  	struct epf_ntb *ntb = ntb_ndev(ndev);
> @@ -1780,6 +1789,7 @@ static const struct ntb_dev_ops vntb_epf_ops = {
>  	.db_clear_mask		= vntb_epf_db_clear_mask,
>  	.db_clear		= vntb_epf_db_clear,
>  	.link_disable		= vntb_epf_link_disable,
> +	.get_dma_dev		= vntb_epf_get_dma_dev,
>  	.get_private_data	= vntb_epf_get_private_data,
>  };
>
> --
> 2.51.0
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v4 16/38] NTB: ntb_transport: Move TX memory window setup into setup_qp_mw()
  2026-01-18 13:54 ` [RFC PATCH v4 16/38] NTB: ntb_transport: Move TX memory window setup into setup_qp_mw() Koichiro Den
@ 2026-01-19 20:36   ` Frank Li
  2026-01-21  2:15     ` Koichiro Den
  0 siblings, 1 reply; 68+ messages in thread
From: Frank Li @ 2026-01-19 20:36 UTC (permalink / raw)
  To: Koichiro Den
  Cc: dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi, linux-pci, linux-doc, linux-kernel, linux-renesas-soc,
	devicetree, dmaengine, iommu, ntb, netdev, linux-kselftest, arnd,
	gregkh, joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt,
	corbet, skhan, andriy.shevchenko, jbrunet, utkarsh02t

On Sun, Jan 18, 2026 at 10:54:18PM +0900, Koichiro Den wrote:
> Historically both TX and RX have assumed the same per-QP MW slice
> (tx_max_entry == remote rx_max_entry), while those are calculated
> separately in different places (pre and post the link-up negotiation
> point). This has been safe because nt->link_is_up is never set to true
> unless the pre-determined qp_count are the same among them, and qp_count
> is typically limited to nt->mw_count, which should be carefully
> configured by admin.
>
> However, setup_qp_mw can actually split mw and handle multi-qps in one
> MW properly, so qp_count needs not to be limited by nt->mw_count. Once
> we relax the limitation, pre-determined qp_count can differ among host
> side and endpoint, and link-up negotiation can easily fail.
>
> Move the TX MW configuration (per-QP offset and size) into
> ntb_transport_setup_qp_mw() so that both RX and TX layout decisions are
> centralized in a single helper. ntb_transport_init_queue() now deals
> only with per-QP software state, not with MW layout.
>
> This keeps the previous behavior, while preparing for relaxing the
> qp_count limitation and improving readability.
>
> No functional change is intended.
>
> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> ---
>  drivers/ntb/ntb_transport.c | 76 ++++++++++++++++---------------------
>  1 file changed, 32 insertions(+), 44 deletions(-)
>
> diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
> index d5a544bf8fd6..57a21f2daac6 100644
> --- a/drivers/ntb/ntb_transport.c
> +++ b/drivers/ntb/ntb_transport.c
> @@ -569,7 +569,10 @@ static int ntb_transport_setup_qp_mw(struct ntb_transport_ctx *nt,
>  	struct ntb_transport_mw *mw;
>  	struct ntb_dev *ndev = nt->ndev;
>  	struct ntb_queue_entry *entry;
> -	unsigned int rx_size, num_qps_mw;
> +	phys_addr_t mw_base;
> +	resource_size_t mw_size;
> +	unsigned int rx_size, tx_size, num_qps_mw;
> +	u64 qp_offset;
>  	unsigned int mw_num, mw_count, qp_count;
>  	unsigned int i;
>  	int node;
> @@ -588,13 +591,38 @@ static int ntb_transport_setup_qp_mw(struct ntb_transport_ctx *nt,
>  	else
>  		num_qps_mw = qp_count / mw_count;
>
> -	rx_size = (unsigned int)mw->xlat_size / num_qps_mw;
> -	qp->rx_buff = mw->virt_addr + rx_size * (qp_num / mw_count);
> -	rx_size -= sizeof(struct ntb_rx_info);
> +	mw_base = nt->mw_vec[mw_num].phys_addr;
> +	mw_size = nt->mw_vec[mw_num].phys_size;
> +
> +	if (mw_size > mw->xlat_size)
> +		mw_size = mw->xlat_size;

The old code did not have this check.

Frank
> +	if (max_mw_size && mw_size > max_mw_size)
> +		mw_size = max_mw_size;
> +
> +	tx_size = (unsigned int)mw_size / num_qps_mw;
> +	qp_offset = tx_size * (qp_num / mw_count);
> +
> +	qp->rx_buff = mw->virt_addr + qp_offset;
> +
> +	qp->tx_mw_size = tx_size;
> +	qp->tx_mw = nt->mw_vec[mw_num].vbase + qp_offset;
> +	if (!qp->tx_mw)
> +		return -EINVAL;
> +
> +	qp->tx_mw_phys = mw_base + qp_offset;
> +	if (!qp->tx_mw_phys)
> +		return -EINVAL;
>
> +	rx_size = tx_size;
> +	rx_size -= sizeof(struct ntb_rx_info);
>  	qp->remote_rx_info = qp->rx_buff + rx_size;
>
> +	tx_size -= sizeof(struct ntb_rx_info);
> +	qp->rx_info = qp->tx_mw + tx_size;
> +
>  	/* Due to housekeeping, there must be atleast 2 buffs */
> +	qp->tx_max_frame = min(transport_mtu, tx_size / 2);
> +	qp->tx_max_entry = tx_size / qp->tx_max_frame;
>  	qp->rx_max_frame = min(transport_mtu, rx_size / 2);
>  	qp->rx_max_entry = rx_size / qp->rx_max_frame;
>  	qp->rx_index = 0;
> @@ -1132,16 +1160,6 @@ static int ntb_transport_init_queue(struct ntb_transport_ctx *nt,
>  				    unsigned int qp_num)
>  {
>  	struct ntb_transport_qp *qp;
> -	phys_addr_t mw_base;
> -	resource_size_t mw_size;
> -	unsigned int num_qps_mw, tx_size;
> -	unsigned int mw_num, mw_count, qp_count;
> -	u64 qp_offset;
> -
> -	mw_count = nt->mw_count;
> -	qp_count = nt->qp_count;
> -
> -	mw_num = QP_TO_MW(nt, qp_num);
>
>  	qp = &nt->qp_vec[qp_num];
>  	qp->qp_num = qp_num;
> @@ -1151,36 +1169,6 @@ static int ntb_transport_init_queue(struct ntb_transport_ctx *nt,
>  	qp->event_handler = NULL;
>  	ntb_qp_link_context_reset(qp);
>
> -	if (mw_num < qp_count % mw_count)
> -		num_qps_mw = qp_count / mw_count + 1;
> -	else
> -		num_qps_mw = qp_count / mw_count;
> -
> -	mw_base = nt->mw_vec[mw_num].phys_addr;
> -	mw_size = nt->mw_vec[mw_num].phys_size;
> -
> -	if (max_mw_size && mw_size > max_mw_size)
> -		mw_size = max_mw_size;
> -
> -	tx_size = (unsigned int)mw_size / num_qps_mw;
> -	qp_offset = tx_size * (qp_num / mw_count);
> -
> -	qp->tx_mw_size = tx_size;
> -	qp->tx_mw = nt->mw_vec[mw_num].vbase + qp_offset;
> -	if (!qp->tx_mw)
> -		return -EINVAL;
> -
> -	qp->tx_mw_phys = mw_base + qp_offset;
> -	if (!qp->tx_mw_phys)
> -		return -EINVAL;
> -
> -	tx_size -= sizeof(struct ntb_rx_info);
> -	qp->rx_info = qp->tx_mw + tx_size;
> -
> -	/* Due to housekeeping, there must be atleast 2 buffs */
> -	qp->tx_max_frame = min(transport_mtu, tx_size / 2);
> -	qp->tx_max_entry = tx_size / qp->tx_max_frame;
> -
>  	if (nt->debugfs_node_dir) {
>  		char debugfs_name[8];
>
> --
> 2.51.0
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v4 18/38] NTB: ntb_transport: Use ntb_get_dma_dev()
  2026-01-18 13:54 ` [RFC PATCH v4 18/38] NTB: ntb_transport: Use ntb_get_dma_dev() Koichiro Den
@ 2026-01-19 20:38   ` Frank Li
  0 siblings, 0 replies; 68+ messages in thread
From: Frank Li @ 2026-01-19 20:38 UTC (permalink / raw)
  To: Koichiro Den
  Cc: dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi, linux-pci, linux-doc, linux-kernel, linux-renesas-soc,
	devicetree, dmaengine, iommu, ntb, netdev, linux-kselftest, arnd,
	gregkh, joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt,
	corbet, skhan, andriy.shevchenko, jbrunet, utkarsh02t

On Sun, Jan 18, 2026 at 10:54:20PM +0900, Koichiro Den wrote:
> Replace direct use of ntb->pdev with ntb_get_dma_dev() for DMA-safe
> allocations and frees. This allows ntb_transport to operate on NTB
> implementations that are not backed by a PCI device from IOMMU
> perspective.
>
> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> ---
Reviewed-by: Frank Li <Frank.Li@nxp.com>
>  drivers/ntb/ntb_transport.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
> index 6ed680d0470f..7b320249629c 100644
> --- a/drivers/ntb/ntb_transport.c
> +++ b/drivers/ntb/ntb_transport.c
> @@ -771,13 +771,13 @@ static void ntb_transport_msi_desc_changed(void *data)
>  static void ntb_free_mw(struct ntb_transport_ctx *nt, int num_mw)
>  {
>  	struct ntb_transport_mw *mw = &nt->mw_vec[num_mw];
> -	struct pci_dev *pdev = nt->ndev->pdev;
> +	struct device *dev = ntb_get_dma_dev(nt->ndev);
>
> -	if (!mw->virt_addr)
> +	if (!dev || !mw->virt_addr)
>  		return;
>
>  	ntb_mw_clear_trans(nt->ndev, PIDX, num_mw);
> -	dma_free_coherent(&pdev->dev, mw->alloc_size,
> +	dma_free_coherent(dev, mw->alloc_size,
>  			  mw->alloc_addr, mw->dma_addr);
>  	mw->xlat_size = 0;
>  	mw->buff_size = 0;
> @@ -847,13 +847,13 @@ static int ntb_set_mw(struct ntb_transport_ctx *nt, int num_mw,
>  		      resource_size_t size)
>  {
>  	struct ntb_transport_mw *mw = &nt->mw_vec[num_mw];
> -	struct pci_dev *pdev = nt->ndev->pdev;
> +	struct device *dev = ntb_get_dma_dev(nt->ndev);
>  	size_t xlat_size, buff_size;
>  	resource_size_t xlat_align;
>  	resource_size_t xlat_align_size;
>  	int rc;
>
> -	if (!size)
> +	if (!dev || !size)
>  		return -EINVAL;
>
>  	rc = ntb_mw_get_align(nt->ndev, PIDX, num_mw, &xlat_align,
> @@ -876,12 +876,12 @@ static int ntb_set_mw(struct ntb_transport_ctx *nt, int num_mw,
>  	mw->buff_size = buff_size;
>  	mw->alloc_size = buff_size;
>
> -	rc = ntb_alloc_mw_buffer(mw, &pdev->dev, xlat_align);
> +	rc = ntb_alloc_mw_buffer(mw, dev, xlat_align);
>  	if (rc) {
>  		mw->alloc_size *= 2;
> -		rc = ntb_alloc_mw_buffer(mw, &pdev->dev, xlat_align);
> +		rc = ntb_alloc_mw_buffer(mw, dev, xlat_align);
>  		if (rc) {
> -			dev_err(&pdev->dev,
> +			dev_err(dev,
>  				"Unable to alloc aligned MW buff\n");
>  			mw->xlat_size = 0;
>  			mw->buff_size = 0;
> @@ -894,7 +894,7 @@ static int ntb_set_mw(struct ntb_transport_ctx *nt, int num_mw,
>  	rc = ntb_mw_set_trans(nt->ndev, PIDX, num_mw, mw->dma_addr,
>  			      mw->xlat_size);
>  	if (rc) {
> -		dev_err(&pdev->dev, "Unable to set mw%d translation", num_mw);
> +		dev_err(dev, "Unable to set mw%d translation", num_mw);
>  		ntb_free_mw(nt, num_mw);
>  		return -EIO;
>  	}
> --
> 2.51.0
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v4 36/38] PCI: endpoint: pci-epf-test: Add remote eDMA-backed mode
  2026-01-18 13:54 ` [RFC PATCH v4 36/38] PCI: endpoint: pci-epf-test: Add remote eDMA-backed mode Koichiro Den
@ 2026-01-19 20:47   ` Frank Li
  2026-01-22 14:54     ` Koichiro Den
  0 siblings, 1 reply; 68+ messages in thread
From: Frank Li @ 2026-01-19 20:47 UTC (permalink / raw)
  To: Koichiro Den
  Cc: dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi, linux-pci, linux-doc, linux-kernel, linux-renesas-soc,
	devicetree, dmaengine, iommu, ntb, netdev, linux-kselftest, arnd,
	gregkh, joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt,
	corbet, skhan, andriy.shevchenko, jbrunet, utkarsh02t

On Sun, Jan 18, 2026 at 10:54:38PM +0900, Koichiro Den wrote:
> Some DesignWare-based endpoints integrate an eDMA engine that can be
> programmed by the host via MMIO. The upcoming NTB transport remote-eDMA
> backend relies on this capability, but there is currently no upstream
> test coverage for the end-to-end control and data path.
>
> Extend pci-epf-test with an optional remote eDMA test backend (built when
> CONFIG_DW_EDMA is enabled).
>
> - Reserve a spare BAR and expose a small 'pcitest_edma_info' header at
>   BAR offset 0. The header carries a magic/version and describes the
>   endpoint eDMA register window, per-direction linked-list (LL)
>   locations and an endpoint test buffer.
> - Map the eDMA registers and LL locations into that BAR using BAR
>   subrange mappings (address-match inbound iATU).
>
> To run this extra testing, two new endpoint commands are added:
>   * COMMAND_REMOTE_EDMA_SETUP
>   * COMMAND_REMOTE_EDMA_CHECKSUM
>
> When the former command is received, the endpoint prepares for the
> remote eDMA transfer. The CHECKSUM command is useful for Host-to-EP
> transfer testing, as the endpoint side is not expected to receive the
> DMA completion interrupt directly. Instead, the host asks the endpoint
> to compute a CRC32 over the transferred data.
>
> This backend is exercised by the host-side pci_endpoint_test driver via a
> new UAPI flag.
>
> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> ---
>  drivers/pci/endpoint/functions/pci-epf-test.c | 477 ++++++++++++++++++

This patch should be combined with your submap patches, since it is a user
of submap.

Frank

>  1 file changed, 477 insertions(+)
>
> diff --git a/drivers/pci/endpoint/functions/pci-epf-test.c b/drivers/pci/endpoint/functions/pci-epf-test.c
> index e560c3becebb..eea10bddcd2a 100644
> --- a/drivers/pci/endpoint/functions/pci-epf-test.c
> +++ b/drivers/pci/endpoint/functions/pci-epf-test.c
> @@ -10,6 +10,7 @@
>  #include <linux/delay.h>
>  #include <linux/dmaengine.h>
>  #include <linux/io.h>
> +#include <linux/iommu.h>
>  #include <linux/module.h>
>  #include <linux/msi.h>
>  #include <linux/slab.h>
> @@ -33,6 +34,8 @@
>  #define COMMAND_COPY			BIT(5)
>  #define COMMAND_ENABLE_DOORBELL		BIT(6)
>  #define COMMAND_DISABLE_DOORBELL	BIT(7)
> +#define COMMAND_REMOTE_EDMA_SETUP	BIT(8)
> +#define COMMAND_REMOTE_EDMA_CHECKSUM	BIT(9)
>
>  #define STATUS_READ_SUCCESS		BIT(0)
>  #define STATUS_READ_FAIL		BIT(1)
> @@ -48,6 +51,10 @@
>  #define STATUS_DOORBELL_ENABLE_FAIL	BIT(11)
>  #define STATUS_DOORBELL_DISABLE_SUCCESS BIT(12)
>  #define STATUS_DOORBELL_DISABLE_FAIL	BIT(13)
> +#define STATUS_REMOTE_EDMA_SETUP_SUCCESS	BIT(14)
> +#define STATUS_REMOTE_EDMA_SETUP_FAIL		BIT(15)
> +#define STATUS_REMOTE_EDMA_CHECKSUM_SUCCESS	BIT(16)
> +#define STATUS_REMOTE_EDMA_CHECKSUM_FAIL	BIT(17)
>
>  #define FLAG_USE_DMA			BIT(0)
>
> @@ -77,6 +84,9 @@ struct pci_epf_test {
>  	bool			dma_private;
>  	const struct pci_epc_features *epc_features;
>  	struct pci_epf_bar	db_bar;
> +
> +	/* For extended tests that rely on vendor-specific features */
> +	void *data;
>  };
>
>  struct pci_epf_test_reg {
> @@ -117,6 +127,454 @@ static enum pci_barno pci_epf_test_next_free_bar(struct pci_epf_test *epf_test)
>  	return bar;
>  }
>
> +#if IS_REACHABLE(CONFIG_DW_EDMA)
> +#include <linux/dma/edma.h>
> +
> +#define PCITEST_EDMA_INFO_MAGIC		0x414d4445U /* 'EDMA' */
> +#define PCITEST_EDMA_INFO_VERSION	0x00010000U
> +#define PCITEST_EDMA_TEST_BUF_SIZE	(1024 * 1024)
> +
> +struct pci_epf_test_edma {
> +	/* Remote eDMA test resources */
> +	bool			enabled;
> +	enum pci_barno		bar;
> +	void			*info;
> +	size_t			total_size;
> +	void			*test_buf;
> +	dma_addr_t		test_buf_phys;
> +	size_t			test_buf_size;
> +
> +	/* DW eDMA specifics */
> +	phys_addr_t		reg_phys;
> +	size_t			reg_submap_sz;
> +	unsigned long		reg_iova;
> +	size_t			reg_iova_sz;
> +	phys_addr_t		ll_rd_phys;
> +	size_t			ll_rd_sz_aligned;
> +	phys_addr_t		ll_wr_phys;
> +	size_t			ll_wr_sz_aligned;
> +};
> +
> +struct pcitest_edma_info {
> +	__le32 magic;
> +	__le32 version;
> +
> +	__le32 reg_off;
> +	__le32 reg_size;
> +
> +	__le64 ll_rd_phys;
> +	__le32 ll_rd_off;
> +	__le32 ll_rd_size;
> +
> +	__le64 ll_wr_phys;
> +	__le32 ll_wr_off;
> +	__le32 ll_wr_size;
> +
> +	__le64 test_buf_phys;
> +	__le32 test_buf_size;
> +};
> +
> +static bool pci_epf_test_bar_is_reserved(struct pci_epf_test *test,
> +					 enum pci_barno barno)
> +{
> +	struct pci_epf_test_edma *edma = test->data;
> +
> +	if (!edma)
> +		return false;
> +
> +	return barno == edma->bar;
> +}
> +
> +static void pci_epf_test_clear_submaps(struct pci_epf_bar *bar)
> +{
> +	kfree(bar->submap);
> +	bar->submap = NULL;
> +	bar->num_submap = 0;
> +}
> +
> +static int pci_epf_test_add_submap(struct pci_epf_bar *bar, phys_addr_t phys,
> +				   size_t size)
> +{
> +	struct pci_epf_bar_submap *submap, *new;
> +
> +	new = krealloc_array(bar->submap, bar->num_submap + 1, sizeof(*new),
> +			     GFP_KERNEL);
> +	if (!new)
> +		return -ENOMEM;
> +
> +	bar->submap = new;
> +	submap = &bar->submap[bar->num_submap];
> +	submap->phys_addr = phys;
> +	submap->size = size;
> +	bar->num_submap++;
> +
> +	return 0;
> +}
> +
> +static void pci_epf_test_clean_remote_edma(struct pci_epf_test *test)
> +{
> +	struct pci_epf_test_edma *edma = test->data;
> +	struct pci_epf *epf = test->epf;
> +	struct pci_epc *epc = epf->epc;
> +	struct device *dev = epc->dev.parent;
> +	struct iommu_domain *dom;
> +	struct pci_epf_bar *bar;
> +	enum pci_barno barno;
> +
> +	if (!edma)
> +		return;
> +
> +	barno = edma->bar;
> +	if (barno == NO_BAR)
> +		return;
> +
> +	bar = &epf->bar[barno];
> +
> +	dom = iommu_get_domain_for_dev(dev);
> +	if (dom && edma->reg_iova_sz) {
> +		iommu_unmap(dom, edma->reg_iova, edma->reg_iova_sz);
> +		edma->reg_iova = 0;
> +		edma->reg_iova_sz = 0;
> +	}
> +
> +	if (edma->test_buf) {
> +		dma_free_coherent(dev, edma->test_buf_size,
> +				  edma->test_buf,
> +				  edma->test_buf_phys);
> +		edma->test_buf = NULL;
> +		edma->test_buf_phys = 0;
> +		edma->test_buf_size = 0;
> +	}
> +
> +	if (edma->info) {
> +		pci_epf_free_space(epf, edma->info, barno, PRIMARY_INTERFACE);
> +		edma->info = NULL;
> +	}
> +
> +	pci_epf_test_clear_submaps(bar);
> +	pci_epc_clear_bar(epc, epf->func_no, epf->vfunc_no, bar);
> +
> +	edma->bar = NO_BAR;
> +	edma->enabled = false;
> +}
> +
> +static int pci_epf_test_init_remote_edma(struct pci_epf_test *test)
> +{
> +	const struct pci_epc_features *epc_features = test->epc_features;
> +	struct pci_epf_test_edma *edma;
> +	struct pci_epf *epf = test->epf;
> +	struct pci_epc *epc = epf->epc;
> +	struct pcitest_edma_info *info;
> +	struct device *dev = epc->dev.parent;
> +	struct dw_edma_region region;
> +	struct iommu_domain *dom;
> +	size_t reg_sz_aligned, ll_rd_sz_aligned, ll_wr_sz_aligned;
> +	phys_addr_t phys, ll_rd_phys, ll_wr_phys;
> +	size_t ll_rd_size, ll_wr_size;
> +	resource_size_t reg_size;
> +	unsigned long iova;
> +	size_t off, size;
> +	int ret;
> +
> +	if (!test->dma_chan_tx || !test->dma_chan_rx)
> +		return -ENODEV;
> +
> +	edma = devm_kzalloc(&epf->dev, sizeof(*edma), GFP_KERNEL);
> +	if (!edma)
> +		return -ENOMEM;
> +	test->data = edma;
> +
> +	edma->bar = pci_epf_test_next_free_bar(test);
> +	if (edma->bar == NO_BAR) {
> +		dev_err(&epf->dev, "No spare BAR for remote eDMA (remote eDMA disabled)\n");
> +		ret = -ENOSPC;
> +		goto err;
> +	}
> +
> +	ret = dw_edma_get_reg_window(epc, &edma->reg_phys, &reg_size);
> +	if (ret) {
> +		dev_err(dev, "failed to get edma reg window: %d\n", ret);
> +		goto err;
> +	}
> +	dom = iommu_get_domain_for_dev(dev);
> +	if (dom) {
> +		phys = edma->reg_phys & PAGE_MASK;
> +		size = PAGE_ALIGN(reg_size + edma->reg_phys - phys);
> +		iova = phys;
> +
> +		ret = iommu_map(dom, iova, phys, size,
> +				IOMMU_READ | IOMMU_WRITE | IOMMU_MMIO,
> +				GFP_KERNEL);
> +		if (ret) {
> +			dev_err(dev, "failed to direct map eDMA reg: %d\n", ret);
> +			goto err;
> +		}
> +		edma->reg_iova = iova;
> +		edma->reg_iova_sz = size;
> +	}
> +
> +	/* Get LL location addresses and sizes */
> +	ret = dw_edma_chan_get_ll_region(test->dma_chan_rx, &region);
> +	if (ret) {
> +		dev_err(dev, "failed to get edma ll region for rx: %d\n", ret);
> +		goto err;
> +	}
> +	ll_rd_phys = region.paddr;
> +	ll_rd_size = region.sz;
> +
> +	ret = dw_edma_chan_get_ll_region(test->dma_chan_tx, &region);
> +	if (ret) {
> +		dev_err(dev, "failed to get edma ll region for tx: %d\n", ret);
> +		goto err;
> +	}
> +	ll_wr_phys = region.paddr;
> +	ll_wr_size = region.sz;
> +
> +	edma->test_buf_size = PCITEST_EDMA_TEST_BUF_SIZE;
> +	edma->test_buf = dma_alloc_coherent(dev, edma->test_buf_size,
> +					    &edma->test_buf_phys, GFP_KERNEL);
> +	if (!edma->test_buf) {
> +		ret = -ENOMEM;
> +		goto err;
> +	}
> +
> +	reg_sz_aligned = PAGE_ALIGN(reg_size);
> +	ll_rd_sz_aligned = PAGE_ALIGN(ll_rd_size);
> +	ll_wr_sz_aligned = PAGE_ALIGN(ll_wr_size);
> +	edma->total_size = PAGE_SIZE + reg_sz_aligned + ll_rd_sz_aligned +
> +			   ll_wr_sz_aligned;
> +	size = roundup_pow_of_two(edma->total_size);
> +
> +	info = pci_epf_alloc_space(epf, size, edma->bar,
> +				   epc_features, PRIMARY_INTERFACE);
> +	if (!info) {
> +		ret = -ENOMEM;
> +		goto err;
> +	}
> +	memset(info, 0, size);
> +
> +	off = PAGE_SIZE;
> +	info->magic = cpu_to_le32(PCITEST_EDMA_INFO_MAGIC);
> +	info->version = cpu_to_le32(PCITEST_EDMA_INFO_VERSION);
> +
> +	info->reg_off = cpu_to_le32(off);
> +	info->reg_size = cpu_to_le32(reg_size);
> +	off += reg_sz_aligned;
> +
> +	info->ll_rd_phys = cpu_to_le64(ll_rd_phys);
> +	info->ll_rd_off = cpu_to_le32(off);
> +	info->ll_rd_size = cpu_to_le32(ll_rd_size);
> +	off += ll_rd_sz_aligned;
> +
> +	info->ll_wr_phys = cpu_to_le64(ll_wr_phys);
> +	info->ll_wr_off = cpu_to_le32(off);
> +	info->ll_wr_size = cpu_to_le32(ll_wr_size);
> +	off += ll_wr_sz_aligned;
> +
> +	info->test_buf_phys = cpu_to_le64(edma->test_buf_phys);
> +	info->test_buf_size = cpu_to_le32(edma->test_buf_size);
> +
> +	edma->info = info;
> +	edma->reg_submap_sz = reg_sz_aligned;
> +	edma->ll_rd_phys = ll_rd_phys;
> +	edma->ll_wr_phys = ll_wr_phys;
> +	edma->ll_rd_sz_aligned = ll_rd_sz_aligned;
> +	edma->ll_wr_sz_aligned = ll_wr_sz_aligned;
> +
> +	ret = pci_epc_set_bar(epc, epf->func_no, epf->vfunc_no,
> +			      &epf->bar[edma->bar]);
> +	if (ret) {
> +		dev_err(dev,
> +			"failed to init BAR%d for remote eDMA: %d\n",
> +			edma->bar, ret);
> +		goto err;
> +	}
> +	dev_info(dev, "BAR%d initialized for remote eDMA\n", edma->bar);
> +
> +	return 0;
> +
> +err:
> +	pci_epf_test_clean_remote_edma(test);
> +	devm_kfree(&epf->dev, edma);
> +	test->data = NULL;
> +	return ret;
> +}
> +
> +static int pci_epf_test_map_remote_edma(struct pci_epf_test *test)
> +{
> +	struct pci_epf_test_edma *edma = test->data;
> +	struct pcitest_edma_info *info;
> +	struct pci_epf *epf = test->epf;
> +	struct pci_epc *epc = epf->epc;
> +	struct pci_epf_bar *bar;
> +	enum pci_barno barno;
> +	struct device *dev = epc->dev.parent;
> +	int ret;
> +
> +	if (!edma)
> +		return -ENODEV;
> +
> +	info = edma->info;
> +	barno = edma->bar;
> +
> +	if (barno == NO_BAR)
> +		return -ENOSPC;
> +	if (!info || !edma->test_buf)
> +		return -ENODEV;
> +
> +	bar = &epf->bar[barno];
> +	pci_epf_test_clear_submaps(bar);
> +
> +	ret = pci_epf_test_add_submap(bar, bar->phys_addr, PAGE_SIZE);
> +	if (ret)
> +		return ret;
> +
> +	ret = pci_epf_test_add_submap(bar, edma->reg_phys, edma->reg_submap_sz);
> +	if (ret)
> +		goto err_submap;
> +
> +	ret = pci_epf_test_add_submap(bar, edma->ll_rd_phys,
> +				      edma->ll_rd_sz_aligned);
> +	if (ret)
> +		goto err_submap;
> +
> +	ret = pci_epf_test_add_submap(bar, edma->ll_wr_phys,
> +				      edma->ll_wr_sz_aligned);
> +	if (ret)
> +		goto err_submap;
> +
> +	if (bar->size > edma->total_size) {
> +		ret = pci_epf_test_add_submap(bar, 0,
> +					      bar->size - edma->total_size);
> +		if (ret)
> +			goto err_submap;
> +	}
> +
> +	ret = pci_epc_set_bar(epc, epf->func_no, epf->vfunc_no, bar);
> +	if (ret) {
> +		dev_err(dev, "failed to map BAR%d: %d\n", barno, ret);
> +		goto err_submap;
> +	}
> +
> +	/*
> +	 * Endpoint-local interrupts must be ignored even if the host fails to
> +	 * mask them.
> +	 */
> +	ret = dw_edma_chan_irq_config(test->dma_chan_tx, DW_EDMA_CH_IRQ_REMOTE);
> +	if (ret) {
> +		dev_err(dev, "failed to set irq mode for tx channel: %d\n",
> +			ret);
> +		goto err_bar;
> +	}
> +	ret = dw_edma_chan_irq_config(test->dma_chan_rx, DW_EDMA_CH_IRQ_REMOTE);
> +	if (ret) {
> +		dev_err(dev, "failed to set irq mode for rx channel: %d\n",
> +			ret);
> +		goto err_bar;
> +	}
> +
> +	return 0;
> +err_bar:
> +	pci_epc_clear_bar(epc, epf->func_no, epf->vfunc_no, &epf->bar[barno]);
> +err_submap:
> +	pci_epf_test_clear_submaps(bar);
> +	return ret;
> +}
> +
> +static void pci_epf_test_remote_edma_setup(struct pci_epf_test *epf_test,
> +					   struct pci_epf_test_reg *reg)
> +{
> +	struct pci_epf_test_edma *edma = epf_test->data;
> +	size_t size = le32_to_cpu(reg->size);
> +	void *buf;
> +	int ret;
> +
> +	if (!edma || !edma->test_buf || size > edma->test_buf_size) {
> +		reg->status = cpu_to_le32(STATUS_REMOTE_EDMA_SETUP_FAIL);
> +		return;
> +	}
> +
> +	buf = edma->test_buf;
> +
> +	if (!edma->enabled) {
> +		/* NB. Currently DW eDMA is the only supported backend */
> +		ret = pci_epf_test_map_remote_edma(epf_test);
> +		if (ret) {
> +			WRITE_ONCE(reg->status,
> +				   cpu_to_le32(STATUS_REMOTE_EDMA_SETUP_FAIL));
> +			return;
> +		}
> +		edma->enabled = true;
> +	}
> +
> +	/* Populate the test buffer with random data */
> +	get_random_bytes(buf, size);
> +	reg->checksum = cpu_to_le32(crc32_le(~0, buf, size));
> +
> +	WRITE_ONCE(reg->status, cpu_to_le32(STATUS_REMOTE_EDMA_SETUP_SUCCESS));
> +}
> +
> +static void pci_epf_test_remote_edma_checksum(struct pci_epf_test *epf_test,
> +					      struct pci_epf_test_reg *reg)
> +{
> +	struct pci_epf_test_edma *edma = epf_test->data;
> +	u32 status = le32_to_cpu(reg->status);
> +	size_t size;
> +	void *addr;
> +	u32 crc32;
> +
> +	size = le32_to_cpu(reg->size);
> +	if (!edma || !edma->test_buf || size > edma->test_buf_size) {
> +		status |= STATUS_REMOTE_EDMA_CHECKSUM_FAIL;
> +		reg->status = cpu_to_le32(status);
> +		return;
> +	}
> +
> +	addr = edma->test_buf;
> +	crc32 = crc32_le(~0, addr, size);
> +	status |= STATUS_REMOTE_EDMA_CHECKSUM_SUCCESS;
> +
> +	reg->checksum = cpu_to_le32(crc32);
> +	reg->status = cpu_to_le32(status);
> +}
> +
> +static void pci_epf_test_reset_dma_chan(struct dma_chan *chan)
> +{
> +	dw_edma_chan_irq_config(chan, DW_EDMA_CH_IRQ_DEFAULT);
> +}
> +#else
> +static bool pci_epf_test_bar_is_reserved(struct pci_epf_test *test,
> +					 enum pci_barno barno)
> +{
> +	return false;
> +}
> +
> +static void pci_epf_test_clean_remote_edma(struct pci_epf_test *test)
> +{
> +}
> +
> +static int pci_epf_test_init_remote_edma(struct pci_epf_test *test)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
> +static void pci_epf_test_remote_edma_setup(struct pci_epf_test *epf_test,
> +					   struct pci_epf_test_reg *reg)
> +{
> +	reg->status = cpu_to_le32(STATUS_REMOTE_EDMA_SETUP_FAIL);
> +}
> +
> +static void pci_epf_test_remote_edma_checksum(struct pci_epf_test *epf_test,
> +					      struct pci_epf_test_reg *reg)
> +{
> +	reg->status = cpu_to_le32(STATUS_REMOTE_EDMA_CHECKSUM_FAIL);
> +}
> +
> +static void pci_epf_test_reset_dma_chan(struct dma_chan *chan)
> +{
> +}
> +#endif
> +
>  static void pci_epf_test_dma_callback(void *param)
>  {
>  	struct pci_epf_test *epf_test = param;
> @@ -168,6 +626,8 @@ static int pci_epf_test_data_transfer(struct pci_epf_test *epf_test,
>  		return -EINVAL;
>  	}
>
> +	pci_epf_test_reset_dma_chan(chan);
> +
>  	if (epf_test->dma_private) {
>  		sconf.direction = dir;
>  		if (dir == DMA_MEM_TO_DEV)
> @@ -870,6 +1330,14 @@ static void pci_epf_test_cmd_handler(struct work_struct *work)
>  		pci_epf_test_disable_doorbell(epf_test, reg);
>  		pci_epf_test_raise_irq(epf_test, reg);
>  		break;
> +	case COMMAND_REMOTE_EDMA_SETUP:
> +		pci_epf_test_remote_edma_setup(epf_test, reg);
> +		pci_epf_test_raise_irq(epf_test, reg);
> +		break;
> +	case COMMAND_REMOTE_EDMA_CHECKSUM:
> +		pci_epf_test_remote_edma_checksum(epf_test, reg);
> +		pci_epf_test_raise_irq(epf_test, reg);
> +		break;
>  	default:
>  		dev_err(dev, "Invalid command 0x%x\n", command);
>  		break;
> @@ -961,6 +1429,10 @@ static int pci_epf_test_epc_init(struct pci_epf *epf)
>  	if (ret)
>  		epf_test->dma_supported = false;
>
> +	ret = pci_epf_test_init_remote_edma(epf_test);
> +	if (ret && ret != -EOPNOTSUPP)
> +		dev_warn(dev, "Remote eDMA setup failed\n");
> +
>  	if (epf->vfunc_no <= 1) {
>  		ret = pci_epc_write_header(epc, epf->func_no, epf->vfunc_no, header);
>  		if (ret) {
> @@ -1007,6 +1479,7 @@ static void pci_epf_test_epc_deinit(struct pci_epf *epf)
>  	struct pci_epf_test *epf_test = epf_get_drvdata(epf);
>
>  	cancel_delayed_work_sync(&epf_test->cmd_handler);
> +	pci_epf_test_clean_remote_edma(epf_test);
>  	pci_epf_test_clean_dma_chan(epf_test);
>  	pci_epf_test_clear_bar(epf);
>  }
> @@ -1076,6 +1549,9 @@ static int pci_epf_test_alloc_space(struct pci_epf *epf)
>  		if (bar == test_reg_bar)
>  			continue;
>
> +		if (pci_epf_test_bar_is_reserved(epf_test, bar))
> +			continue;
> +
>  		if (epc_features->bar[bar].type == BAR_FIXED)
>  			test_reg_size = epc_features->bar[bar].fixed_size;
>  		else
> @@ -1146,6 +1622,7 @@ static void pci_epf_test_unbind(struct pci_epf *epf)
>
>  	cancel_delayed_work_sync(&epf_test->cmd_handler);
>  	if (epc->init_complete) {
> +		pci_epf_test_clean_remote_edma(epf_test);
>  		pci_epf_test_clean_dma_chan(epf_test);
>  		pci_epf_test_clear_bar(epf);
>  	}
> --
> 2.51.0
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA
  2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
                   ` (37 preceding siblings ...)
  2026-01-18 13:54 ` [RFC PATCH v4 38/38] selftests: pci_endpoint: Add remote eDMA transfer coverage Koichiro Den
@ 2026-01-20 18:30 ` Dave Jiang
  2026-01-20 18:47   ` Dave Jiang
  38 siblings, 1 reply; 68+ messages in thread
From: Dave Jiang @ 2026-01-20 18:30 UTC (permalink / raw)
  To: Koichiro Den, Frank.Li, cassel, mani, kwilczynski, kishon,
	bhelgaas, geert+renesas, robh, vkoul, jdmason, allenbh,
	jingoohan1, lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t



On 1/18/26 6:54 AM, Koichiro Den wrote:
> Hi,
> 
> This is RFC v4 of the NTB/PCI/dmaengine series that introduces an
> optional NTB transport variant where payload data is moved by a PCI
> embedded-DMA engine (eDMA) residing on the endpoint side.

Just a fly-by comment. This series is huge. I suggest breaking it down into something more manageable to prevent review fatigue among patch reviewers. For example, the Linux networking subsystem has a rule restricting patch series to no more than 15 patches. The NTB subsystem does not have that rule, but maybe split out the dmaengine changes and the hardware-specific dw-edma bits from the ntb core changes.

DJ
 
> 
> The primary target is Synopsys DesignWare PCIe endpoint controllers that
> integrate a DesignWare eDMA instance (dw-edma). In the remote
> embedded-DMA mode, payload is transferred by DMA directly between the
> two systems' memory, and NTB Memory Windows are used primarily for
> control/metadata and for exposing the endpoint eDMA resources (register
> window + linked-list rings) to the host.
> 
> Compared to the existing cpu/dma memcpy-based implementation, this
> approach avoids window-backed payload rings and the associated extra
> copies, and it is less sensitive to scarce MW space. This also enables
> scaling out to multiple queue pairs, which is particularly beneficial
> for ntb_netdev. On R-Car S4, preliminary iperf3 results show 10-20x
> throughput improvement. Latency improvements are also observed.
> 
> RFC history:
>   RFC v3: https://lore.kernel.org/all/20251217151609.3162665-1-den@valinux.co.jp/
>   RFC v2: https://lore.kernel.org/all/20251129160405.2568284-1-den@valinux.co.jp/
>   RFC v1: https://lore.kernel.org/all/20251023071916.901355-1-den@valinux.co.jp/
> 
> Parts of RFC v3 series have already been split out and posted separately
> (see "Kernel base / dependencies" section below). However, feedback on
> the remaining parts led to substantial restructuring and code changes,
> so I am sending an RFC v4 as a refreshed version of the full series.
> 
> RFC v4 is still a large, cross-subsystem series. At this RFC stage,
> I am sending the full picture in a single set to make it easier to
> review the overall direction and architecture. Once the direction is
> agreed upon and no further large restructuring appears necessary, I will stop
> posting new RFC-tagged revisions and continue development in separate
> threads, split by sub-topic.
> 
> Many thanks for all the reviews and feedback from multiple perspectives.
> 
> 
> Software architecture overview (RFC v4)
> =======================================
> 
> A major change in RFC v4 is the software layering and module split.
> 
> The existing memcpy-based transport and the new remote embedded-DMA
> transport are implemented as two independent NTB client drivers on top
> of a shared core library:
> 
>                        +--------------------+
>                        | ntb_transport_core |
>                        +--------------------+
>                            ^            ^
>                            |            |
>         ntb_transport -----+            +----- ntb_transport_edma
>        (cpu/dma memcpy)                   (remote embedded DMA transfer)
>                                                        |
>                                                        v
>                                                  +-----------+
>                                                  |  ntb_edma |
>                                                  +-----------+
>                                                        ^
>                                                        |
>                                                +----------------+
>                                                |                |
>                                           ntb_dw_edma         [...]
> 
> Key points:
>   * ntb_transport_core provides the queue-pair abstraction used by upper
>     layer clients (e.g. ntb_netdev).
>   * ntb_transport is the legacy shared-memory transport client (CPU/DMA
>     memcpy).
>   * ntb_transport_edma is the remote embedded-DMA transport client.
>   * ntb_transport_edma relies on an ntb_edma backend registry.
>     This RFC provides an initial DesignWare backend (ntb_dw_edma).
>   * Transport selection is per-NTB device via the standard
>     driver_override mechanism. To enable that, this RFC adds
>     driver_override support to ntb_bus. This allows mixing transports
>     across multiple NTB ports and provides an explicit fallback path to
>     the legacy transport.
> 
> So, if ntb_transport / ntb_transport_edma are built as loadable modules,
> you can simply run modprobe ntb_transport as before and the original cpu/dma
> memcpy-based implementation will be active. If they are built-in, whether
> ntb_transport or ntb_transport_edma is bound by default depends on
> initcall order. For how to switch the driver, please see Patch 34
> ("Documentation: driver-api: ntb: Document remote embedded-DMA transport")
> for details.
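> 
> As a rough sketch of that switching flow: driver_override can be driven
> from sysfs, analogous to the PCI and platform buses. The device name and
> exact paths below are assumptions for illustration only; the documentation
> added in Patch 34 describes the authoritative procedure.

```shell
# Sketch only: the NTB device name and sysfs paths below are assumptions;
# adjust for your platform. This mirrors the standard driver_override
# flow used by the PCI and platform buses.

dev=0000:01:00.0                  # hypothetical NTB device name

# Detach the currently bound transport client, if any
echo "$dev" > /sys/bus/ntb/drivers/ntb_transport/unbind

# Pin this device to the remote embedded-DMA transport
echo ntb_transport_edma > /sys/bus/ntb/devices/$dev/driver_override

# Re-run driver matching so the override takes effect
echo "$dev" > /sys/bus/ntb/drivers_probe
```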
> 
> 
> Data flow overview (remote embedded-DMA transport)
> ==================================================
> 
> At a high level:
>   * One MW is reserved as an "eDMA window". The endpoint exposes the
>     eDMA register block plus LL descriptor rings through that window, so
>     the peer can ioremap it and drive DMA reads remotely.
>   * Remaining MWs carry only small control-plane rings used to exchange
>     buffer addresses and completion information.
>   * For RC->EP traffic, the RC drives endpoint DMA read channels through
>     the peer-visible eDMA window.
>   * For EP->RC traffic, the endpoint uses its local DMA write channels.
> 
> The following figures illustrate the data flow when ntb_netdev sits on
> top of the transport:
> 
>      Figure 1. RC->EP traffic via ntb_netdev + ntb_transport_edma
>                    backed by ntb_edma/ntb_dw_edma
> 
>              EP                                   RC
>           phys addr                            phys addr
>             space                                space
>              +-+                                  +-+
>              | |                                  | |
>              | |                ||                | |
>              +-+-----.          ||                | |
>     EDMA REG | |      \     [A] ||                | |
>              +-+----.  '---+-+  ||                | |
>              | |     \     | |<---------[0-a]----------
>              +-+-----------| |<----------[2]----------.
>      EDMA LL | |           | |  ||                | | :
>              | |           | |  ||                | | :
>              +-+-----------+-+  ||  [B]           | | :
>              | |                ||  ++            | | :
>           ---------[0-b]----------->||----------------'
>              | |            ++  ||  ||            | |
>              | |            ||  ||  ++            | |
>              | |            ||<----------[4]-----------
>              | |            ++  ||                | |
>              | |           [C]  ||                | |
>           .--|#|<------------------------[3]------|#|<-.
>           :  |#|                ||                |#|  :
>          [5] | |                ||                | | [1]
>           :  | |                ||                | |  :
>           '->|#|                                  |#|--'
>              |#|                                  |#|
>              | |                                  | |
> 
>      Figure 2. EP->RC traffic via ntb_netdev + ntb_transport_edma
>                   backed by ntb_edma/ntb_dw_edma
> 
>              EP                                   RC
>           phys addr                            phys addr
>             space                                space
>              +-+                                  +-+
>              | |                                  | |
>              | |                ||                | |
>              +-+                ||                | |
>     EDMA REG | |                ||                | |
>              +-+                ||                | |
>     ^        | |                ||                | |
>     :        +-+                ||                | |
>     : EDMA LL| |                ||                | |
>     :        | |                ||                | |
>     :        +-+                ||  [C]           | |
>     :        | |                ||  ++            | |
>     :     -----------[4]----------->||            | |
>     :        | |            ++  ||  ||            | |
>     :        | |            ||  ||  ++            | |
>     '----------------[2]-----||<--------[0-b]-----------
>              | |            ++  ||                | |
>              | |           [B]  ||                | |
>           .->|#|--------[3]---------------------->|#|--.
>           :  |#|                ||                |#|  :
>          [1] | |                ||                | | [5]
>           :  | |                ||                | |  :
>           '--|#|                                  |#|<-'
>              |#|                                  |#|
>              | |                                  | |
> 
>     0-a. configure remote embedded DMA (program endpoint DMA registers)
>     0-b. DMA-map and publish destination address (DAR)
>     1.   network stack builds skb (copy from application/user memory)
>     2.   consume DAR, DMA-map source address (SAR) and kick DMA transfer
>     3.   DMA transfer (payload moves between RC/EP memory)
>     4.   consume completion (commit)
>     5.   network stack delivers data to application/user memory
> 
>     [A]: Dedicated MW that aggregates DMA regs and LL (peer ioremaps it)
>     [B]: Control-plane ring buffer for "produce"
>     [C]: Control-plane ring buffer for "consume"
> 
> 
> Kernel base / dependencies
> ==========================
> 
> This series is based on:
> 
>   - next-20260114 (commit b775e489bec7)
> 
> plus the following seven unmerged patch series or standalone patches:
> 
>   - [PATCH v4 0/7] PCI: endpoint/NTB: Harden vNTB resource management
>     https://lore.kernel.org/all/20251202072348.2752371-1-den@valinux.co.jp/
> 
>   - [PATCH v2 0/2] NTB: ntb_transport: debugfs cleanups
>     https://lore.kernel.org/all/20260107042458.1987818-1-den@valinux.co.jp/
> 
>   - [PATCH v3 0/9] dmaengine: Add new API to combine configuration and descriptor preparation
>     https://lore.kernel.org/all/20260105-dma_prep_config-v3-0-a8480362fd42@nxp.com/
> 
>   - [PATCH v8 0/5] PCI: endpoint: BAR subrange mapping support
>     https://lore.kernel.org/all/20260115084928.55701-1-den@valinux.co.jp/
> 
>   - [PATCH] PCI: endpoint: pci-epf-vntb: Use array_index_nospec() on mws_size[] access
>     https://lore.kernel.org/all/20260105075606.1253697-1-den@valinux.co.jp/
> 
>   - [PATCH] dmaengine: dw-edma: Fix MSI data values for multi-vector IMWr interrupts
>     https://lore.kernel.org/all/20260105075904.1254012-1-den@valinux.co.jp/
> 
>   - [PATCH v2 01/11] dmaengine: dw-edma: Add spinlock to protect DONE_INT_MASK and ABORT_INT_MASK
>     https://lore.kernel.org/imx/20260109-edma_ll-v2-1-5c0b27b2c664@nxp.com/
>     (only this single commit is cherry-picked from the series)
> 
> 
> Patch layout
> ============
> 
>   1. dw-edma / DesignWare EP helpers needed for remote embedded-DMA (export
>      register/LL windows, IRQ routing control, etc.)
> 
>      Patch 01 : dmaengine: dw-edma: Export helper to get integrated register window
>      Patch 02 : dmaengine: dw-edma: Add per-channel interrupt routing control
>      Patch 03 : dmaengine: dw-edma: Poll completion when local IRQ handling is disabled
>      Patch 04 : dmaengine: dw-edma: Add notify-only channels support
>      Patch 05 : dmaengine: dw-edma: Add a helper to query linked-list region
> 
>   2. NTB EPF/core + vNTB prep (mwN_offset + versioning, MSI vector
>      management, new ntb_dev_ops helpers, driver_override, vntb glue)
> 
>      Patch 06 : NTB: epf: Add mwN_offset support and config region versioning
>      Patch 07 : NTB: epf: Reserve a subset of MSI vectors for non-NTB users
>      Patch 08 : NTB: epf: Provide db_vector_count/db_vector_mask callbacks
>      Patch 09 : NTB: core: Add mw_set_trans_ranges() for subrange programming
>      Patch 10 : NTB: core: Add .get_private_data() to ntb_dev_ops
>      Patch 11 : NTB: core: Add .get_dma_dev() to ntb_dev_ops
>      Patch 12 : NTB: core: Add driver_override support for NTB devices
>      Patch 13 : PCI: endpoint: pci-epf-vntb: Support BAR subrange mappings for MWs
>      Patch 14 : PCI: endpoint: pci-epf-vntb: Implement .get_private_data() callback
>      Patch 15 : PCI: endpoint: pci-epf-vntb: Implement .get_dma_dev()
> 
>   3. ntb_transport refactor/modularization and backend infrastructure
> 
>      Patch 16 : NTB: ntb_transport: Move TX memory window setup into setup_qp_mw()
>      Patch 17 : NTB: ntb_transport: Dynamically determine qp count
>      Patch 18 : NTB: ntb_transport: Use ntb_get_dma_dev()
>      Patch 19 : NTB: ntb_transport: Rename ntb_transport.c to ntb_transport_core.c
>      Patch 20 : NTB: ntb_transport: Move internal types to ntb_transport_internal.h
>      Patch 21 : NTB: ntb_transport: Export common helpers for modularization
>      Patch 22 : NTB: ntb_transport: Split core library and default NTB client
>      Patch 23 : NTB: ntb_transport: Add transport backend infrastructure
>      Patch 24 : NTB: ntb_transport: Run ntb_set_mw() before link-up negotiation
> 
>   4. ntb_edma backend registry + DesignWare backend + transport client
> 
>      Patch 25 : NTB: hw: Add remote eDMA backend registry and DesignWare backend
>      Patch 26 : NTB: ntb_transport: Add remote embedded-DMA transport client
> 
>   5. ntb_netdev multi-queue support
> 
>      Patch 27 : ntb_netdev: Multi-queue support
> 
>   6. Renesas R-Car S4 enablement (IOMMU, DTs, quirks)
> 
>      Patch 28 : iommu: ipmmu-vmsa: Add PCIe ch0 to devices_allowlist
>      Patch 29 : iommu: ipmmu-vmsa: Add support for reserved regions
>      Patch 30 : arm64: dts: renesas: Add Spider RC/EP DTs for NTB with remote DW PCIe eDMA
>      Patch 31 : NTB: epf: Add per-SoC quirk to cap MRRS for DWC eDMA (128B for R-Car)
>      Patch 32 : NTB: epf: Add an additional memory window (MW2) barno mapping on Renesas R-Car
> 
>   7. Documentation updates
> 
>      Patch 33 : Documentation: PCI: endpoint: pci-epf-vntb: Update and add mwN_offset usage
>      Patch 34 : Documentation: driver-api: ntb: Document remote embedded-DMA transport
> 
>   8. pci-epf-test / pci_endpoint_test / kselftest coverage for remote eDMA
> 
>      Patch 35 : PCI: endpoint: pci-epf-test: Add pci_epf_test_next_free_bar() helper
>      Patch 36 : PCI: endpoint: pci-epf-test: Add remote eDMA-backed mode
>      Patch 37 : misc: pci_endpoint_test: Add remote eDMA transfer test mode
>      Patch 38 : selftests: pci_endpoint: Add remote eDMA transfer coverage
> 
> 
> Tested on
> =========
> 
> * 2x Renesas R-Car S4 Spider (RC<->EP connected with OCuLink cable)
> * Kernel base as described above
> 
> 
> Performance notes
> =================
> 
> The primary motivation remains improving throughput/latency for ntb_transport
> users (typically ntb_netdev). On R-Car S4, the earlier prototype (RFC v3)
> showed roughly 10-20x throughput improvement in preliminary iperf3 tests and
> lower ping RTT. I have not yet re-measured after the v4 refactor and
> module split.
> 
> 
> Changelog
> =========
> 
> RFCv3->RFCv4 changes:
>   - Major refactor of the transport layering:
>     - Introduce ntb_transport_core as a shared library module.
>     - Split the legacy shared-memory transport client (ntb_transport) and the
>       remote embedded-DMA transport client (ntb_transport_edma).
>     - Add driver_override support for ntb_bus and use it for per-port transport
>       selection.
>   - Introduce a vendor-agnostic remote embedded-DMA backend registry (ntb_edma)
>     and add the initial DesignWare backend (ntb_dw_edma).
>   - Rebase to next-20260114 and move several prerequisite/fixup patchsets into
>     separate threads (listed above), including BAR subrange mapping support and
>     dw-edma fixes.
>   - Add PCI endpoint test coverage for the remote embedded-DMA path:
>     - extend pci-epf-test / pci_endpoint_test
>     - add a kselftest variant to exercise remote-eDMA transfers
>     Note: to keep the changes as small as possible, I added a few #ifdefs
>     in the main test code. Feedback on whether/how/to what extent this
>     should be split into separate modules would be appreciated.
>   - Expand documentation (Documentation/driver-api/ntb.rst) to describe transport
>     variants, the new module structure, and the remote embedded-DMA data flow.
>   - Addressed other feedback from the RFC v3 thread.
> 
> RFCv2->RFCv3 changes:
>   - Architecture
>     - Have EP side use its local write channels, while leaving RC side to
>       use remote read channels.
>     - Abstraction/HW-specific stuff encapsulation improved.
>   - Added control/config region versioning for the vNTB/EPF control region
>     so that mismatched RC/EP kernels fail early instead of silently using an
>     incompatible layout.
>   - Reworked BAR subrange / multi-region mapping support:
>     - Dropped the v2 approach that added new inbound mapping ops in the EPC
>       core.
>     - Introduced `struct pci_epf_bar.submap` and extended DesignWare EP to
>       support BAR subrange inbound mapping via Address Match Mode IB iATU.
>     - pci-epf-vntb now provides a subrange mapping hint to the EPC driver
>       when offsets are used.
>   - Changed .get_pci_epc() to .get_private_data()
>   - Dropped two commits from RFC v2 that should be submitted separately:
>     (1) ntb_transport debugfs seq_file conversion
>     (2) DWC EP outbound iATU MSI mapping/cache fix (will be re-posted separately)
>   - Added documentation updates.
>   - Addressed assorted review nits from the RFC v2 thread (naming/structure).
> 
> RFCv1->RFCv2 changes:
>   - Architecture
>     - Drop the generic interrupt backend + DW eDMA test-interrupt backend
>       approach and instead adopt the remote eDMA-backed ntb_transport mode
>       proposed by Frank Li. The BAR-sharing / mwN_offset / inbound
>       mapping (Address Match Mode) infrastructure from RFC v1 is largely
>       kept, with only minor refinements and code motion where necessary
>       to fit the new transport-mode design.
>   - For Patch 01
>     - Rework the array_index_nospec() conversion to address review
>       comments on "[RFC PATCH 01/25]".
> 
> RFCv3: https://lore.kernel.org/all/20251217151609.3162665-1-den@valinux.co.jp/
> RFCv2: https://lore.kernel.org/all/20251129160405.2568284-1-den@valinux.co.jp/
> RFCv1: https://lore.kernel.org/all/20251023071916.901355-1-den@valinux.co.jp/
> 
> Thank you for reviewing,
> 
> 
> Koichiro Den (38):
>   dmaengine: dw-edma: Export helper to get integrated register window
>   dmaengine: dw-edma: Add per-channel interrupt routing control
>   dmaengine: dw-edma: Poll completion when local IRQ handling is
>     disabled
>   dmaengine: dw-edma: Add notify-only channels support
>   dmaengine: dw-edma: Add a helper to query linked-list region
>   NTB: epf: Add mwN_offset support and config region versioning
>   NTB: epf: Reserve a subset of MSI vectors for non-NTB users
>   NTB: epf: Provide db_vector_count/db_vector_mask callbacks
>   NTB: core: Add mw_set_trans_ranges() for subrange programming
>   NTB: core: Add .get_private_data() to ntb_dev_ops
>   NTB: core: Add .get_dma_dev() to ntb_dev_ops
>   NTB: core: Add driver_override support for NTB devices
>   PCI: endpoint: pci-epf-vntb: Support BAR subrange mappings for MWs
>   PCI: endpoint: pci-epf-vntb: Implement .get_private_data() callback
>   PCI: endpoint: pci-epf-vntb: Implement .get_dma_dev()
>   NTB: ntb_transport: Move TX memory window setup into setup_qp_mw()
>   NTB: ntb_transport: Dynamically determine qp count
>   NTB: ntb_transport: Use ntb_get_dma_dev()
>   NTB: ntb_transport: Rename ntb_transport.c to ntb_transport_core.c
>   NTB: ntb_transport: Move internal types to ntb_transport_internal.h
>   NTB: ntb_transport: Export common helpers for modularization
>   NTB: ntb_transport: Split core library and default NTB client
>   NTB: ntb_transport: Add transport backend infrastructure
>   NTB: ntb_transport: Run ntb_set_mw() before link-up negotiation
>   NTB: hw: Add remote eDMA backend registry and DesignWare backend
>   NTB: ntb_transport: Add remote embedded-DMA transport client
>   ntb_netdev: Multi-queue support
>   iommu: ipmmu-vmsa: Add PCIe ch0 to devices_allowlist
>   iommu: ipmmu-vmsa: Add support for reserved regions
>   arm64: dts: renesas: Add Spider RC/EP DTs for NTB with remote DW PCIe
>     eDMA
>   NTB: epf: Add per-SoC quirk to cap MRRS for DWC eDMA (128B for R-Car)
>   NTB: epf: Add an additional memory window (MW2) barno mapping on
>     Renesas R-Car
>   Documentation: PCI: endpoint: pci-epf-vntb: Update and add mwN_offset
>     usage
>   Documentation: driver-api: ntb: Document remote embedded-DMA transport
>   PCI: endpoint: pci-epf-test: Add pci_epf_test_next_free_bar() helper
>   PCI: endpoint: pci-epf-test: Add remote eDMA-backed mode
>   misc: pci_endpoint_test: Add remote eDMA transfer test mode
>   selftests: pci_endpoint: Add remote eDMA transfer coverage
> 
>  Documentation/PCI/endpoint/pci-vntb-howto.rst |   19 +-
>  Documentation/driver-api/ntb.rst              |  193 ++
>  arch/arm64/boot/dts/renesas/Makefile          |    2 +
>  .../boot/dts/renesas/r8a779f0-spider-ep.dts   |   37 +
>  .../boot/dts/renesas/r8a779f0-spider-rc.dts   |   52 +
>  drivers/dma/dw-edma/dw-edma-core.c            |  207 +-
>  drivers/dma/dw-edma/dw-edma-core.h            |   10 +
>  drivers/dma/dw-edma/dw-edma-v0-core.c         |   26 +-
>  drivers/iommu/ipmmu-vmsa.c                    |    7 +-
>  drivers/misc/pci_endpoint_test.c              |  633 +++++
>  drivers/net/ntb_netdev.c                      |  341 ++-
>  drivers/ntb/Kconfig                           |   13 +
>  drivers/ntb/Makefile                          |    2 +
>  drivers/ntb/core.c                            |   68 +
>  drivers/ntb/hw/Kconfig                        |    1 +
>  drivers/ntb/hw/Makefile                       |    1 +
>  drivers/ntb/hw/edma/Kconfig                   |   28 +
>  drivers/ntb/hw/edma/Makefile                  |    5 +
>  drivers/ntb/hw/edma/backend.c                 |   87 +
>  drivers/ntb/hw/edma/backend.h                 |  102 +
>  drivers/ntb/hw/edma/ntb_dw_edma.c             |  977 +++++++
>  drivers/ntb/hw/epf/ntb_hw_epf.c               |  199 +-
>  drivers/ntb/ntb_transport.c                   | 2458 +---------------
>  drivers/ntb/ntb_transport_core.c              | 2523 +++++++++++++++++
>  drivers/ntb/ntb_transport_edma.c              | 1110 ++++++++
>  drivers/ntb/ntb_transport_internal.h          |  261 ++
>  drivers/pci/controller/dwc/pcie-designware.c  |   26 +
>  drivers/pci/endpoint/functions/pci-epf-test.c |  497 +++-
>  drivers/pci/endpoint/functions/pci-epf-vntb.c |  380 ++-
>  include/linux/dma/edma.h                      |  106 +
>  include/linux/ntb.h                           |   88 +
>  include/uapi/linux/pcitest.h                  |    3 +-
>  .../pci_endpoint/pci_endpoint_test.c          |   17 +
>  33 files changed, 7855 insertions(+), 2624 deletions(-)
>  create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-ep.dts
>  create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-rc.dts
>  create mode 100644 drivers/ntb/hw/edma/Kconfig
>  create mode 100644 drivers/ntb/hw/edma/Makefile
>  create mode 100644 drivers/ntb/hw/edma/backend.c
>  create mode 100644 drivers/ntb/hw/edma/backend.h
>  create mode 100644 drivers/ntb/hw/edma/ntb_dw_edma.c
>  create mode 100644 drivers/ntb/ntb_transport_core.c
>  create mode 100644 drivers/ntb/ntb_transport_edma.c
>  create mode 100644 drivers/ntb/ntb_transport_internal.h
> 


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA
  2026-01-20 18:30 ` [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Dave Jiang
@ 2026-01-20 18:47   ` Dave Jiang
  2026-01-21  2:40     ` Koichiro Den
  0 siblings, 1 reply; 68+ messages in thread
From: Dave Jiang @ 2026-01-20 18:47 UTC (permalink / raw)
  To: Koichiro Den, Frank.Li, cassel, mani, kwilczynski, kishon,
	bhelgaas, geert+renesas, robh, vkoul, jdmason, allenbh,
	jingoohan1, lpieralisi
  Cc: linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t



On 1/20/26 11:30 AM, Dave Jiang wrote:
> 
> 
> On 1/18/26 6:54 AM, Koichiro Den wrote:
>> Hi,
>>
>> This is RFC v4 of the NTB/PCI/dmaengine series that introduces an
>> optional NTB transport variant where payload data is moved by a PCI
>> embedded-DMA engine (eDMA) residing on the endpoint side.
> 
> Just a fly-by comment: this series is huge. I suggest breaking it down into something more manageable to prevent review fatigue among patch reviewers. For example, the Linux networking subsystem has a rule restricting patch series to no more than 15 patches. The NTB subsystem does not have that rule, but maybe split out the dmaengine changes and the hardware-specific dw-edma bits from the NTB core changes.
> 
> DJ

Ah, I do now see your comment below that you will split the series once it is out of RFC.

DJ
>  
>>
>> The primary target is Synopsys DesignWare PCIe endpoint controllers that
>> integrate a DesignWare eDMA instance (dw-edma). In the remote
>> embedded-DMA mode, payload is transferred by DMA directly between the
>> two systems' memory, and NTB Memory Windows are used primarily for
>> control/metadata and for exposing the endpoint eDMA resources (register
>> window + linked-list rings) to the host.
>>
>> Compared to the existing cpu/dma memcpy-based implementation, this
>> approach avoids window-backed payload rings and the associated extra
>> copies, and it is less sensitive to scarce MW space. This also enables
>> scaling out to multiple queue pairs, which is particularly beneficial
>> for ntb_netdev. On R-Car S4, preliminary iperf3 results show 10~20x
>> throughput improvement. Latency improvements are also observed.
>>
>> RFC history:
>>   RFC v3: https://lore.kernel.org/all/20251217151609.3162665-1-den@valinux.co.jp/
>>   RFC v2: https://lore.kernel.org/all/20251129160405.2568284-1-den@valinux.co.jp/
>>   RFC v1: https://lore.kernel.org/all/20251023071916.901355-1-den@valinux.co.jp/
>>
>> Parts of RFC v3 series have already been split out and posted separately
>> (see "Kernel base / dependencies" section below). However, feedback on
>> the remaining parts led to substantial restructuring and code changes,
>> so I am sending an RFC v4 as a refreshed version of the full series.
>>
>> RFC v4 is still a large, cross-subsystem series. At this RFC stage,
>> I am sending the full picture in a single set to make it easier to
>> review the overall direction and architecture. Once the direction is
>> agreed upon and no further large restructuring appears necessary, I will
>> stop posting RFC-tagged revisions and continue development on separate
>> threads, split by sub-topic.
>>
>> Many thanks for all the reviews and feedback from multiple perspectives.
>>
>>
>> Software architecture overview (RFC v4)
>> =======================================
>>
>> A major change in RFC v4 is the software layering and module split.
>>
>> The existing memcpy-based transport and the new remote embedded-DMA
>> transport are implemented as two independent NTB client drivers on top
>> of a shared core library:
>>
>>                        +--------------------+
>>                        | ntb_transport_core |
>>                        +--------------------+
>>                            ^            ^
>>                            |            |
>>         ntb_transport -----+            +----- ntb_transport_edma
>>        (cpu/dma memcpy)                   (remote embedded DMA transfer)
>>                                                        |
>>                                                        v
>>                                                  +-----------+
>>                                                  |  ntb_edma |
>>                                                  +-----------+
>>                                                        ^
>>                                                        |
>>                                                +----------------+
>>                                                |                |
>>                                           ntb_dw_edma         [...]
>>
>> Key points:
>>   * ntb_transport_core provides the queue-pair abstraction used by upper
>>     layer clients (e.g. ntb_netdev).
>>   * ntb_transport is the legacy shared-memory transport client (CPU/DMA
>>     memcpy).
>>   * ntb_transport_edma is the remote embedded-DMA transport client.
>>   * ntb_transport_edma relies on an ntb_edma backend registry.
>>     This RFC provides an initial DesignWare backend (ntb_dw_edma).
>>   * Transport selection is per-NTB device via the standard
>>     driver_override mechanism. To enable that, this RFC adds
>>     driver_override support to ntb_bus. This allows mixing transports
>>     across multiple NTB ports and provides an explicit fallback path to
>>     the legacy transport.
>>
>> So, if ntb_transport / ntb_transport_edma are built as loadable modules,
>> you can just run modprobe ntb_transport as before and the original cpu/dma
>> memcpy-based implementation will be active. If they are built in, which of
>> ntb_transport and ntb_transport_edma is bound by default depends on
>> initcall order. For how to switch drivers, please see Patch 34
>> ("Documentation: driver-api: ntb: Document remote embedded-DMA transport")
>> for details.
>>
>>
>> Data flow overview (remote embedded-DMA transport)
>> ==================================================
>>
>> At a high level:
>>   * One MW is reserved as an "eDMA window". The endpoint exposes the
>>     eDMA register block plus LL descriptor rings through that window, so
>>     the peer can ioremap it and drive DMA reads remotely.
>>   * Remaining MWs carry only small control-plane rings used to exchange
>>     buffer addresses and completion information.
>>   * For RC->EP traffic, the RC drives endpoint DMA read channels through
>>     the peer-visible eDMA window.
>>   * For EP->RC traffic, the endpoint uses its local DMA write channels.
>>
>> The following figures illustrate the data flow when ntb_netdev sits on
>> top of the transport:
>>
>>      Figure 1. RC->EP traffic via ntb_netdev + ntb_transport_edma
>>                    backed by ntb_edma/ntb_dw_edma
>>
>>              EP                                   RC
>>           phys addr                            phys addr
>>             space                                space
>>              +-+                                  +-+
>>              | |                                  | |
>>              | |                ||                | |
>>              +-+-----.          ||                | |
>>     EDMA REG | |      \     [A] ||                | |
>>              +-+----.  '---+-+  ||                | |
>>              | |     \     | |<---------[0-a]----------
>>              +-+-----------| |<----------[2]----------.
>>      EDMA LL | |           | |  ||                | | :
>>              | |           | |  ||                | | :
>>              +-+-----------+-+  ||  [B]           | | :
>>              | |                ||  ++            | | :
>>           ---------[0-b]----------->||----------------'
>>              | |            ++  ||  ||            | |
>>              | |            ||  ||  ++            | |
>>              | |            ||<----------[4]-----------
>>              | |            ++  ||                | |
>>              | |           [C]  ||                | |
>>           .--|#|<------------------------[3]------|#|<-.
>>           :  |#|                ||                |#|  :
>>          [5] | |                ||                | | [1]
>>           :  | |                ||                | |  :
>>           '->|#|                                  |#|--'
>>              |#|                                  |#|
>>              | |                                  | |
>>
>>      Figure 2. EP->RC traffic via ntb_netdev + ntb_transport_edma
>>                   backed by ntb_edma/ntb_dw_edma
>>
>>              EP                                   RC
>>           phys addr                            phys addr
>>             space                                space
>>              +-+                                  +-+
>>              | |                                  | |
>>              | |                ||                | |
>>              +-+                ||                | |
>>     EDMA REG | |                ||                | |
>>              +-+                ||                | |
>>     ^        | |                ||                | |
>>     :        +-+                ||                | |
>>     : EDMA LL| |                ||                | |
>>     :        | |                ||                | |
>>     :        +-+                ||  [C]           | |
>>     :        | |                ||  ++            | |
>>     :     -----------[4]----------->||            | |
>>     :        | |            ++  ||  ||            | |
>>     :        | |            ||  ||  ++            | |
>>     '----------------[2]-----||<--------[0-b]-----------
>>              | |            ++  ||                | |
>>              | |           [B]  ||                | |
>>           .->|#|--------[3]---------------------->|#|--.
>>           :  |#|                ||                |#|  :
>>          [1] | |                ||                | | [5]
>>           :  | |                ||                | |  :
>>           '--|#|                                  |#|<-'
>>              |#|                                  |#|
>>              | |                                  | |
>>
>>     0-a. configure remote embedded DMA (program endpoint DMA registers)
>>     0-b. DMA-map and publish destination address (DAR)
>>     1.   network stack builds skb (copy from application/user memory)
>>     2.   consume DAR, DMA-map source address (SAR) and kick DMA transfer
>>     3.   DMA transfer (payload moves between RC/EP memory)
>>     4.   consume completion (commit)
>>     5.   network stack delivers data to application/user memory
>>
>>     [A]: Dedicated MW that aggregates DMA regs and LL (peer ioremaps it)
>>     [B]: Control-plane ring buffer for "produce"
>>     [C]: Control-plane ring buffer for "consume"
>>
>>
>> Kernel base / dependencies
>> ==========================
>>
>> This series is based on:
>>
>>   - next-20260114 (commit b775e489bec7)
>>
>> plus the following seven unmerged patch series or standalone patches:
>>
>>   - [PATCH v4 0/7] PCI: endpoint/NTB: Harden vNTB resource management
>>     https://lore.kernel.org/all/20251202072348.2752371-1-den@valinux.co.jp/
>>
>>   - [PATCH v2 0/2] NTB: ntb_transport: debugfs cleanups
>>     https://lore.kernel.org/all/20260107042458.1987818-1-den@valinux.co.jp/
>>
>>   - [PATCH v3 0/9] dmaengine: Add new API to combine configuration and descriptor preparation
>>     https://lore.kernel.org/all/20260105-dma_prep_config-v3-0-a8480362fd42@nxp.com/
>>
>>   - [PATCH v8 0/5] PCI: endpoint: BAR subrange mapping support
>>     https://lore.kernel.org/all/20260115084928.55701-1-den@valinux.co.jp/
>>
>>   - [PATCH] PCI: endpoint: pci-epf-vntb: Use array_index_nospec() on mws_size[] access
>>     https://lore.kernel.org/all/20260105075606.1253697-1-den@valinux.co.jp/
>>
>>   - [PATCH] dmaengine: dw-edma: Fix MSI data values for multi-vector IMWr interrupts
>>     https://lore.kernel.org/all/20260105075904.1254012-1-den@valinux.co.jp/
>>
>>   - [PATCH v2 01/11] dmaengine: dw-edma: Add spinlock to protect DONE_INT_MASK and ABORT_INT_MASK
>>     https://lore.kernel.org/imx/20260109-edma_ll-v2-1-5c0b27b2c664@nxp.com/
>>     (only this single commit is cherry-picked from the series)
>>
>>
>> Patch layout
>> ============
>>
>>   1. dw-edma / DesignWare EP helpers needed for remote embedded-DMA (export
>>      register/LL windows, IRQ routing control, etc.)
>>
>>      Patch 01 : dmaengine: dw-edma: Export helper to get integrated register window
>>      Patch 02 : dmaengine: dw-edma: Add per-channel interrupt routing control
>>      Patch 03 : dmaengine: dw-edma: Poll completion when local IRQ handling is disabled
>>      Patch 04 : dmaengine: dw-edma: Add notify-only channels support
>>      Patch 05 : dmaengine: dw-edma: Add a helper to query linked-list region
>>
>>   2. NTB EPF/core + vNTB prep (mwN_offset + versioning, MSI vector
>>      management, new ntb_dev_ops helpers, driver_override, vntb glue)
>>
>>      Patch 06 : NTB: epf: Add mwN_offset support and config region versioning
>>      Patch 07 : NTB: epf: Reserve a subset of MSI vectors for non-NTB users
>>      Patch 08 : NTB: epf: Provide db_vector_count/db_vector_mask callbacks
>>      Patch 09 : NTB: core: Add mw_set_trans_ranges() for subrange programming
>>      Patch 10 : NTB: core: Add .get_private_data() to ntb_dev_ops
>>      Patch 11 : NTB: core: Add .get_dma_dev() to ntb_dev_ops
>>      Patch 12 : NTB: core: Add driver_override support for NTB devices
>>      Patch 13 : PCI: endpoint: pci-epf-vntb: Support BAR subrange mappings for MWs
>>      Patch 14 : PCI: endpoint: pci-epf-vntb: Implement .get_private_data() callback
>>      Patch 15 : PCI: endpoint: pci-epf-vntb: Implement .get_dma_dev()
>>
>>   3. ntb_transport refactor/modularization and backend infrastructure
>>
>>      Patch 16 : NTB: ntb_transport: Move TX memory window setup into setup_qp_mw()
>>      Patch 17 : NTB: ntb_transport: Dynamically determine qp count
>>      Patch 18 : NTB: ntb_transport: Use ntb_get_dma_dev()
>>      Patch 19 : NTB: ntb_transport: Rename ntb_transport.c to ntb_transport_core.c
>>      Patch 20 : NTB: ntb_transport: Move internal types to ntb_transport_internal.h
>>      Patch 21 : NTB: ntb_transport: Export common helpers for modularization
>>      Patch 22 : NTB: ntb_transport: Split core library and default NTB client
>>      Patch 23 : NTB: ntb_transport: Add transport backend infrastructure
>>      Patch 24 : NTB: ntb_transport: Run ntb_set_mw() before link-up negotiation
>>
>>   4. ntb_edma backend registry + DesignWare backend + transport client
>>
>>      Patch 25 : NTB: hw: Add remote eDMA backend registry and DesignWare backend
>>      Patch 26 : NTB: ntb_transport: Add remote embedded-DMA transport client
>>
>>   5. ntb_netdev multi-queue support
>>
>>      Patch 27 : ntb_netdev: Multi-queue support
>>
>>   6. Renesas R-Car S4 enablement (IOMMU, DTs, quirks)
>>
>>      Patch 28 : iommu: ipmmu-vmsa: Add PCIe ch0 to devices_allowlist
>>      Patch 29 : iommu: ipmmu-vmsa: Add support for reserved regions
>>      Patch 30 : arm64: dts: renesas: Add Spider RC/EP DTs for NTB with remote DW PCIe eDMA
>>      Patch 31 : NTB: epf: Add per-SoC quirk to cap MRRS for DWC eDMA (128B for R-Car)
>>      Patch 32 : NTB: epf: Add an additional memory window (MW2) barno mapping on Renesas R-Car
>>
>>   7. Documentation updates
>>
>>      Patch 33 : Documentation: PCI: endpoint: pci-epf-vntb: Update and add mwN_offset usage
>>      Patch 34 : Documentation: driver-api: ntb: Document remote embedded-DMA transport
>>
>>   8. pci-epf-test / pci_endpoint_test / kselftest coverage for remote eDMA
>>
>>      Patch 35 : PCI: endpoint: pci-epf-test: Add pci_epf_test_next_free_bar() helper
>>      Patch 36 : PCI: endpoint: pci-epf-test: Add remote eDMA-backed mode
>>      Patch 37 : misc: pci_endpoint_test: Add remote eDMA transfer test mode
>>      Patch 38 : selftests: pci_endpoint: Add remote eDMA transfer coverage
>>
>>
>> Tested on
>> =========
>>
>> * 2x Renesas R-Car S4 Spider (RC<->EP connected with OCuLink cable)
>> * Kernel base as described above
>>
>>
>> Performance notes
>> =================
>>
>> The primary motivation remains improving throughput/latency for ntb_transport
>> users (typically ntb_netdev). On R-Car S4, the earlier prototype (RFC v3)
>> showed roughly 10-20x throughput improvement in preliminary iperf3 tests and
>> lower ping RTT. I have not yet re-measured after the v4 refactor and
>> module split.
>>
>>
>> Changelog
>> =========
>>
>> RFCv3->RFCv4 changes:
>>   - Major refactor of the transport layering:
>>     - Introduce ntb_transport_core as a shared library module.
>>     - Split the legacy shared-memory transport client (ntb_transport) and the
>>       remote embedded-DMA transport client (ntb_transport_edma).
>>     - Add driver_override support for ntb_bus and use it for per-port transport
>>       selection.
>>   - Introduce a vendor-agnostic remote embedded-DMA backend registry (ntb_edma)
>>     and add the initial DesignWare backend (ntb_dw_edma).
>>   - Rebase to next-20260114 and move several prerequisite/fixup patchsets into
>>     separate threads (listed above), including BAR subrange mapping support and
>>     dw-edma fixes.
>>   - Add PCI endpoint test coverage for the remote embedded-DMA path:
>>     - extend pci-epf-test / pci_endpoint_test
>>     - add a kselftest variant to exercise remote-eDMA transfers
>>     Note: to keep the changes as small as possible, I added a few #ifdefs
>>     in the main test code. Feedback on whether/how/to what extent this
>>     should be split into separate modules would be appreciated.
>>   - Expand documentation (Documentation/driver-api/ntb.rst) to describe transport
>>     variants, the new module structure, and the remote embedded-DMA data flow.
>>   - Addressed other feedback from the RFC v3 thread.
>>
>> RFCv2->RFCv3 changes:
>>   - Architecture
>>     - Have EP side use its local write channels, while leaving RC side to
>>       use remote read channels.
>>     - Improved abstraction and encapsulation of HW-specific details.
>>   - Added control/config region versioning for the vNTB/EPF control region
>>     so that mismatched RC/EP kernels fail early instead of silently using an
>>     incompatible layout.
>>   - Reworked BAR subrange / multi-region mapping support:
>>     - Dropped the v2 approach that added new inbound mapping ops in the EPC
>>       core.
>>     - Introduced `struct pci_epf_bar.submap` and extended DesignWare EP to
>>       support BAR subrange inbound mapping via Address Match Mode IB iATU.
>>     - pci-epf-vntb now provides a subrange mapping hint to the EPC driver
>>       when offsets are used.
>>   - Changed .get_pci_epc() to .get_private_data()
>>   - Dropped two commits from RFC v2 that should be submitted separately:
>>     (1) ntb_transport debugfs seq_file conversion
>>     (2) DWC EP outbound iATU MSI mapping/cache fix (will be re-posted separately)
>>   - Added documentation updates.
>>   - Addressed assorted review nits from the RFC v2 thread (naming/structure).
>>
>> RFCv1->RFCv2 changes:
>>   - Architecture
>>     - Drop the generic interrupt backend + DW eDMA test-interrupt backend
>>       approach and instead adopt the remote eDMA-backed ntb_transport mode
>>       proposed by Frank Li. The BAR-sharing / mwN_offset / inbound
>>       mapping (Address Match Mode) infrastructure from RFC v1 is largely
>>       kept, with only minor refinements and code motion where necessary
>>       to fit the new transport-mode design.
>>   - For Patch 01
>>     - Rework the array_index_nospec() conversion to address review
>>       comments on "[RFC PATCH 01/25]".
>>
>> RFCv3: https://lore.kernel.org/all/20251217151609.3162665-1-den@valinux.co.jp/
>> RFCv2: https://lore.kernel.org/all/20251129160405.2568284-1-den@valinux.co.jp/
>> RFCv1: https://lore.kernel.org/all/20251023071916.901355-1-den@valinux.co.jp/
>>
>> Thank you for reviewing,
>>
>>
>> Koichiro Den (38):
>>   dmaengine: dw-edma: Export helper to get integrated register window
>>   dmaengine: dw-edma: Add per-channel interrupt routing control
>>   dmaengine: dw-edma: Poll completion when local IRQ handling is
>>     disabled
>>   dmaengine: dw-edma: Add notify-only channels support
>>   dmaengine: dw-edma: Add a helper to query linked-list region
>>   NTB: epf: Add mwN_offset support and config region versioning
>>   NTB: epf: Reserve a subset of MSI vectors for non-NTB users
>>   NTB: epf: Provide db_vector_count/db_vector_mask callbacks
>>   NTB: core: Add mw_set_trans_ranges() for subrange programming
>>   NTB: core: Add .get_private_data() to ntb_dev_ops
>>   NTB: core: Add .get_dma_dev() to ntb_dev_ops
>>   NTB: core: Add driver_override support for NTB devices
>>   PCI: endpoint: pci-epf-vntb: Support BAR subrange mappings for MWs
>>   PCI: endpoint: pci-epf-vntb: Implement .get_private_data() callback
>>   PCI: endpoint: pci-epf-vntb: Implement .get_dma_dev()
>>   NTB: ntb_transport: Move TX memory window setup into setup_qp_mw()
>>   NTB: ntb_transport: Dynamically determine qp count
>>   NTB: ntb_transport: Use ntb_get_dma_dev()
>>   NTB: ntb_transport: Rename ntb_transport.c to ntb_transport_core.c
>>   NTB: ntb_transport: Move internal types to ntb_transport_internal.h
>>   NTB: ntb_transport: Export common helpers for modularization
>>   NTB: ntb_transport: Split core library and default NTB client
>>   NTB: ntb_transport: Add transport backend infrastructure
>>   NTB: ntb_transport: Run ntb_set_mw() before link-up negotiation
>>   NTB: hw: Add remote eDMA backend registry and DesignWare backend
>>   NTB: ntb_transport: Add remote embedded-DMA transport client
>>   ntb_netdev: Multi-queue support
>>   iommu: ipmmu-vmsa: Add PCIe ch0 to devices_allowlist
>>   iommu: ipmmu-vmsa: Add support for reserved regions
>>   arm64: dts: renesas: Add Spider RC/EP DTs for NTB with remote DW PCIe
>>     eDMA
>>   NTB: epf: Add per-SoC quirk to cap MRRS for DWC eDMA (128B for R-Car)
>>   NTB: epf: Add an additional memory window (MW2) barno mapping on
>>     Renesas R-Car
>>   Documentation: PCI: endpoint: pci-epf-vntb: Update and add mwN_offset
>>     usage
>>   Documentation: driver-api: ntb: Document remote embedded-DMA transport
>>   PCI: endpoint: pci-epf-test: Add pci_epf_test_next_free_bar() helper
>>   PCI: endpoint: pci-epf-test: Add remote eDMA-backed mode
>>   misc: pci_endpoint_test: Add remote eDMA transfer test mode
>>   selftests: pci_endpoint: Add remote eDMA transfer coverage
>>
>>  Documentation/PCI/endpoint/pci-vntb-howto.rst |   19 +-
>>  Documentation/driver-api/ntb.rst              |  193 ++
>>  arch/arm64/boot/dts/renesas/Makefile          |    2 +
>>  .../boot/dts/renesas/r8a779f0-spider-ep.dts   |   37 +
>>  .../boot/dts/renesas/r8a779f0-spider-rc.dts   |   52 +
>>  drivers/dma/dw-edma/dw-edma-core.c            |  207 +-
>>  drivers/dma/dw-edma/dw-edma-core.h            |   10 +
>>  drivers/dma/dw-edma/dw-edma-v0-core.c         |   26 +-
>>  drivers/iommu/ipmmu-vmsa.c                    |    7 +-
>>  drivers/misc/pci_endpoint_test.c              |  633 +++++
>>  drivers/net/ntb_netdev.c                      |  341 ++-
>>  drivers/ntb/Kconfig                           |   13 +
>>  drivers/ntb/Makefile                          |    2 +
>>  drivers/ntb/core.c                            |   68 +
>>  drivers/ntb/hw/Kconfig                        |    1 +
>>  drivers/ntb/hw/Makefile                       |    1 +
>>  drivers/ntb/hw/edma/Kconfig                   |   28 +
>>  drivers/ntb/hw/edma/Makefile                  |    5 +
>>  drivers/ntb/hw/edma/backend.c                 |   87 +
>>  drivers/ntb/hw/edma/backend.h                 |  102 +
>>  drivers/ntb/hw/edma/ntb_dw_edma.c             |  977 +++++++
>>  drivers/ntb/hw/epf/ntb_hw_epf.c               |  199 +-
>>  drivers/ntb/ntb_transport.c                   | 2458 +---------------
>>  drivers/ntb/ntb_transport_core.c              | 2523 +++++++++++++++++
>>  drivers/ntb/ntb_transport_edma.c              | 1110 ++++++++
>>  drivers/ntb/ntb_transport_internal.h          |  261 ++
>>  drivers/pci/controller/dwc/pcie-designware.c  |   26 +
>>  drivers/pci/endpoint/functions/pci-epf-test.c |  497 +++-
>>  drivers/pci/endpoint/functions/pci-epf-vntb.c |  380 ++-
>>  include/linux/dma/edma.h                      |  106 +
>>  include/linux/ntb.h                           |   88 +
>>  include/uapi/linux/pcitest.h                  |    3 +-
>>  .../pci_endpoint/pci_endpoint_test.c          |   17 +
>>  33 files changed, 7855 insertions(+), 2624 deletions(-)
>>  create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-ep.dts
>>  create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-rc.dts
>>  create mode 100644 drivers/ntb/hw/edma/Kconfig
>>  create mode 100644 drivers/ntb/hw/edma/Makefile
>>  create mode 100644 drivers/ntb/hw/edma/backend.c
>>  create mode 100644 drivers/ntb/hw/edma/backend.h
>>  create mode 100644 drivers/ntb/hw/edma/ntb_dw_edma.c
>>  create mode 100644 drivers/ntb/ntb_transport_core.c
>>  create mode 100644 drivers/ntb/ntb_transport_edma.c
>>  create mode 100644 drivers/ntb/ntb_transport_internal.h
>>
> 



* Re: [RFC PATCH v4 05/38] dmaengine: dw-edma: Add a helper to query linked-list region
  2026-01-18 17:05   ` Frank Li
@ 2026-01-21  1:38     ` Koichiro Den
  2026-01-21  8:41       ` Koichiro Den
  0 siblings, 1 reply; 68+ messages in thread
From: Koichiro Den @ 2026-01-21  1:38 UTC (permalink / raw)
  To: Frank Li
  Cc: dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi, linux-pci, linux-doc, linux-kernel, linux-renesas-soc,
	devicetree, dmaengine, iommu, ntb, netdev, linux-kselftest, arnd,
	gregkh, joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt,
	corbet, skhan, andriy.shevchenko, jbrunet, utkarsh02t

On Sun, Jan 18, 2026 at 12:05:47PM -0500, Frank Li wrote:
> On Sun, Jan 18, 2026 at 10:54:07PM +0900, Koichiro Den wrote:
> > A remote eDMA provider may need to expose the linked-list (LL) memory
> > region that was configured by platform glue (typically at boot), so the
> > peer (host) can map it and operate the remote view of the controller.
> >
> > Export dw_edma_chan_get_ll_region() to return the LL region associated
> > with a given dma_chan.
> 
> This information is passed from the DWC EPC driver. Is it possible to get it
> from the EPC driver?

That makes sense from an API-cleanliness perspective, thanks.
I'll add a helper function dw_pcie_edma_get_ll_region() in
drivers/pci/controller/dwc/pcie-designware.c, instead of the current
dw_edma_chan_get_ll_region() in dw-edma-core.c.

Thanks for the review,
Koichiro

> 
> Frank
> >
> > Signed-off-by: Koichiro Den <den@valinux.co.jp>
> > ---
> >  drivers/dma/dw-edma/dw-edma-core.c | 26 ++++++++++++++++++++++++++
> >  include/linux/dma/edma.h           | 14 ++++++++++++++
> >  2 files changed, 40 insertions(+)
> >
> > diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
> > index 0eb8fc1dcc34..c4fb66a9b5f5 100644
> > --- a/drivers/dma/dw-edma/dw-edma-core.c
> > +++ b/drivers/dma/dw-edma/dw-edma-core.c
> > @@ -1209,6 +1209,32 @@ int dw_edma_chan_register_notify(struct dma_chan *dchan,
> >  }
> >  EXPORT_SYMBOL_GPL(dw_edma_chan_register_notify);
> >
> > +int dw_edma_chan_get_ll_region(struct dma_chan *dchan,
> > +			       struct dw_edma_region *region)
> > +{
> > +	struct dw_edma_chip *chip;
> > +	struct dw_edma_chan *chan;
> > +
> > +	if (!dchan || !region || !dchan->device)
> > +		return -ENODEV;
> > +
> > +	chan = dchan2dw_edma_chan(dchan);
> > +	if (!chan)
> > +		return -ENODEV;
> > +
> > +	chip = chan->dw->chip;
> > +	if (!(chip->flags & DW_EDMA_CHIP_LOCAL))
> > +		return -EINVAL;
> > +
> > +	if (chan->dir == EDMA_DIR_WRITE)
> > +		*region = chip->ll_region_wr[chan->id];
> > +	else
> > +		*region = chip->ll_region_rd[chan->id];
> > +
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(dw_edma_chan_get_ll_region);
> > +
> >  MODULE_LICENSE("GPL v2");
> >  MODULE_DESCRIPTION("Synopsys DesignWare eDMA controller core driver");
> >  MODULE_AUTHOR("Gustavo Pimentel <gustavo.pimentel@synopsys.com>");
> > diff --git a/include/linux/dma/edma.h b/include/linux/dma/edma.h
> > index 3c538246de07..c9ec426e27ec 100644
> > --- a/include/linux/dma/edma.h
> > +++ b/include/linux/dma/edma.h
> > @@ -153,6 +153,14 @@ bool dw_edma_chan_ignore_irq(struct dma_chan *chan);
> >  int dw_edma_chan_register_notify(struct dma_chan *chan,
> >  				 void (*cb)(struct dma_chan *chan, void *user),
> >  				 void *user);
> > +
> > +/**
> > + * dw_edma_chan_get_ll_region - get linked list (LL) memory for a dma_chan
> > + * @chan: the target DMA channel
> > + * @region: output parameter returning the corresponding LL region
> > + */
> > +int dw_edma_chan_get_ll_region(struct dma_chan *chan,
> > +			       struct dw_edma_region *region);
> >  #else
> >  static inline int dw_edma_probe(struct dw_edma_chip *chip)
> >  {
> > @@ -182,6 +190,12 @@ static inline int dw_edma_chan_register_notify(struct dma_chan *chan,
> >  {
> >  	return -ENODEV;
> >  }
> > +
> > +static inline int dw_edma_chan_get_ll_region(struct dma_chan *chan,
> > +					     struct dw_edma_region *region)
> > +{
> > +	return -EINVAL;
> > +}
> >  #endif /* CONFIG_DW_EDMA */
> >
> >  struct pci_epc;
> > --
> > 2.51.0
> >


* Re: [RFC PATCH v4 08/38] NTB: epf: Provide db_vector_count/db_vector_mask callbacks
  2026-01-19 20:03   ` Frank Li
@ 2026-01-21  1:41     ` Koichiro Den
  0 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-21  1:41 UTC (permalink / raw)
  To: Frank Li
  Cc: dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi, linux-pci, linux-doc, linux-kernel, linux-renesas-soc,
	devicetree, dmaengine, iommu, ntb, netdev, linux-kselftest, arnd,
	gregkh, joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt,
	corbet, skhan, andriy.shevchenko, jbrunet, utkarsh02t

On Mon, Jan 19, 2026 at 03:03:07PM -0500, Frank Li wrote:
> On Sun, Jan 18, 2026 at 10:54:10PM +0900, Koichiro Den wrote:
> > Provide db_vector_count() and db_vector_mask() implementations for both
> > ntb_hw_epf and pci-epf-vntb so that ntb_transport can map MSI vectors to
> > doorbell bits. Without them, the upper layer cannot identify which
> > doorbell vector fired and ends up scheduling rxc_db_work() for all queue
> > pairs, resulting in a thundering-herd effect when multiple queue pairs
> > (QPs) are enabled.
> >
> > With this change, .peer_db_set() must honor the db_bits mask and raise
> > all requested doorbell interrupts, so update those implementations
> > accordingly.
> >
> > Signed-off-by: Koichiro Den <den@valinux.co.jp>
> > ---
> 
> Patches 6/7/8 can be posted separately. Basically looks good.

Will do so. Thank you for the suggestion.

Koichiro

> 
> Frank
> 
> >  drivers/ntb/hw/epf/ntb_hw_epf.c               | 47 ++++++++++++-------
> >  drivers/pci/endpoint/functions/pci-epf-vntb.c | 41 +++++++++++++---
> >  2 files changed, 64 insertions(+), 24 deletions(-)
> >
> > diff --git a/drivers/ntb/hw/epf/ntb_hw_epf.c b/drivers/ntb/hw/epf/ntb_hw_epf.c
> > index dbb5bebe63a5..c37ede4063dc 100644
> > --- a/drivers/ntb/hw/epf/ntb_hw_epf.c
> > +++ b/drivers/ntb/hw/epf/ntb_hw_epf.c
> > @@ -381,7 +381,7 @@ static int ntb_epf_init_isr(struct ntb_epf_dev *ndev, int msi_min, int msi_max)
> >  		}
> >  	}
> >
> > -	ndev->db_count = irq;
> > +	ndev->db_count = irq - 1;
> >
> >  	ret = ntb_epf_send_command(ndev, CMD_CONFIGURE_DOORBELL,
> >  				   argument | irq);
> > @@ -415,6 +415,22 @@ static u64 ntb_epf_db_valid_mask(struct ntb_dev *ntb)
> >  	return ntb_ndev(ntb)->db_valid_mask;
> >  }
> >
> > +static int ntb_epf_db_vector_count(struct ntb_dev *ntb)
> > +{
> > +	return ntb_ndev(ntb)->db_count;
> > +}
> > +
> > +static u64 ntb_epf_db_vector_mask(struct ntb_dev *ntb, int db_vector)
> > +{
> > +	struct ntb_epf_dev *ndev = ntb_ndev(ntb);
> > +
> > +	db_vector--; /* vector 0 is reserved for link events */
> > +	if (db_vector < 0 || db_vector >= ndev->db_count)
> > +		return 0;
> > +
> > +	return ndev->db_valid_mask & BIT_ULL(db_vector);
> > +}
> > +
> >  static int ntb_epf_db_set_mask(struct ntb_dev *ntb, u64 db_bits)
> >  {
> >  	return 0;
> > @@ -507,26 +523,21 @@ static int ntb_epf_peer_mw_get_addr(struct ntb_dev *ntb, int idx,
> >  static int ntb_epf_peer_db_set(struct ntb_dev *ntb, u64 db_bits)
> >  {
> >  	struct ntb_epf_dev *ndev = ntb_ndev(ntb);
> > -	u32 interrupt_num = ffs(db_bits) + 1;
> > -	struct device *dev = ndev->dev;
> > +	u32 interrupt_num;
> >  	u32 db_entry_size;
> >  	u32 db_offset;
> >  	u32 db_data;
> > -
> > -	if (interrupt_num >= ndev->db_count) {
> > -		dev_err(dev, "DB interrupt %d greater than Max Supported %d\n",
> > -			interrupt_num, ndev->db_count);
> > -		return -EINVAL;
> > -	}
> > +	unsigned long i;
> >
> >  	db_entry_size = readl(ndev->ctrl_reg + NTB_EPF_DB_ENTRY_SIZE);
> >
> > -	db_data = readl(ndev->ctrl_reg + NTB_EPF_DB_DATA(interrupt_num));
> > -	db_offset = readl(ndev->ctrl_reg + NTB_EPF_DB_OFFSET(interrupt_num));
> > -
> > -	writel(db_data, ndev->db_reg + (db_entry_size * interrupt_num) +
> > -	       db_offset);
> > -
> > +	for_each_set_bit(i, (unsigned long *)&db_bits, ndev->db_count) {
> > +		interrupt_num = i + 1;
> > +		db_data = readl(ndev->ctrl_reg + NTB_EPF_DB_DATA(interrupt_num));
> > +		db_offset = readl(ndev->ctrl_reg + NTB_EPF_DB_OFFSET(interrupt_num));
> > +		writel(db_data, ndev->db_reg + (db_entry_size * interrupt_num) +
> > +		       db_offset);
> > +	}
> >  	return 0;
> >  }
> >
> > @@ -556,6 +567,8 @@ static const struct ntb_dev_ops ntb_epf_ops = {
> >  	.spad_count		= ntb_epf_spad_count,
> >  	.peer_mw_count		= ntb_epf_peer_mw_count,
> >  	.db_valid_mask		= ntb_epf_db_valid_mask,
> > +	.db_vector_count	= ntb_epf_db_vector_count,
> > +	.db_vector_mask		= ntb_epf_db_vector_mask,
> >  	.db_set_mask		= ntb_epf_db_set_mask,
> >  	.mw_set_trans		= ntb_epf_mw_set_trans,
> >  	.mw_clear_trans		= ntb_epf_mw_clear_trans,
> > @@ -607,8 +620,8 @@ static int ntb_epf_init_dev(struct ntb_epf_dev *ndev)
> >  	int ret;
> >
> >  	/* One Link interrupt and rest doorbell interrupt */
> > -	ret = ntb_epf_init_isr(ndev, NTB_EPF_MIN_DB_COUNT + NTB_EPF_IRQ_RESERVE,
> > -			       NTB_EPF_MAX_DB_COUNT + NTB_EPF_IRQ_RESERVE);
> > +	ret = ntb_epf_init_isr(ndev, NTB_EPF_MIN_DB_COUNT + 1 + NTB_EPF_IRQ_RESERVE,
> > +			       NTB_EPF_MAX_DB_COUNT + 1 + NTB_EPF_IRQ_RESERVE);
> >  	if (ret) {
> >  		dev_err(dev, "Failed to init ISR\n");
> >  		return ret;
> > diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> > index 4927faa28255..39e784e21236 100644
> > --- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
> > +++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> > @@ -1384,6 +1384,22 @@ static u64 vntb_epf_db_valid_mask(struct ntb_dev *ntb)
> >  	return BIT_ULL(ntb_ndev(ntb)->db_count) - 1;
> >  }
> >
> > +static int vntb_epf_db_vector_count(struct ntb_dev *ntb)
> > +{
> > +	return ntb_ndev(ntb)->db_count;
> > +}
> > +
> > +static u64 vntb_epf_db_vector_mask(struct ntb_dev *ntb, int db_vector)
> > +{
> > +	struct epf_ntb *ndev = ntb_ndev(ntb);
> > +
> > +	db_vector--; /* vector 0 is reserved for link events */
> > +	if (db_vector < 0 || db_vector >= ndev->db_count)
> > +		return 0;
> > +
> > +	return BIT_ULL(db_vector);
> > +}
> > +
> >  static int vntb_epf_db_set_mask(struct ntb_dev *ntb, u64 db_bits)
> >  {
> >  	return 0;
> > @@ -1487,20 +1503,29 @@ static int vntb_epf_peer_spad_write(struct ntb_dev *ndev, int pidx, int idx, u32
> >
> >  static int vntb_epf_peer_db_set(struct ntb_dev *ndev, u64 db_bits)
> >  {
> > -	u32 interrupt_num = ffs(db_bits) + 1;
> >  	struct epf_ntb *ntb = ntb_ndev(ndev);
> >  	u8 func_no, vfunc_no;
> > -	int ret;
> > +	u64 failed = 0;
> > +	unsigned long i;
> >
> >  	func_no = ntb->epf->func_no;
> >  	vfunc_no = ntb->epf->vfunc_no;
> >
> > -	ret = pci_epc_raise_irq(ntb->epf->epc, func_no, vfunc_no,
> > -				PCI_IRQ_MSI, interrupt_num + 1);
> > -	if (ret)
> > -		dev_err(&ntb->ntb->dev, "Failed to raise IRQ\n");
> > +	for_each_set_bit(i, (unsigned long *)&db_bits, ntb->db_count) {
> > +		/*
> > +		 * DB bit i is MSI interrupt (i + 2).
> > +		 * Vector 0 is used for link events and MSI vectors are
> > +		 * 1-based for pci_epc_raise_irq().
> > +		 */
> > +		if (pci_epc_raise_irq(ntb->epf->epc, func_no, vfunc_no,
> > +				      PCI_IRQ_MSI, i + 2))
> > +			failed |= BIT_ULL(i);
> > +	}
> > +	if (failed)
> > +		dev_err(&ntb->ntb->dev, "Failed to raise IRQ (%#llx)\n",
> > +			failed);
> >
> > -	return ret;
> > +	return failed ? -EIO : 0;
> >  }
> >
> >  static u64 vntb_epf_db_read(struct ntb_dev *ndev)
> > @@ -1561,6 +1586,8 @@ static const struct ntb_dev_ops vntb_epf_ops = {
> >  	.spad_count		= vntb_epf_spad_count,
> >  	.peer_mw_count		= vntb_epf_peer_mw_count,
> >  	.db_valid_mask		= vntb_epf_db_valid_mask,
> > +	.db_vector_count	= vntb_epf_db_vector_count,
> > +	.db_vector_mask		= vntb_epf_db_vector_mask,
> >  	.db_set_mask		= vntb_epf_db_set_mask,
> >  	.mw_set_trans		= vntb_epf_mw_set_trans,
> >  	.mw_clear_trans		= vntb_epf_mw_clear_trans,
> > --
> > 2.51.0
> >


* Re: [RFC PATCH v4 11/38] NTB: core: Add .get_dma_dev() to ntb_dev_ops
  2026-01-19 20:09   ` Frank Li
@ 2026-01-21  1:44     ` Koichiro Den
  0 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-21  1:44 UTC (permalink / raw)
  To: Frank Li
  Cc: dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi, linux-pci, linux-doc, linux-kernel, linux-renesas-soc,
	devicetree, dmaengine, iommu, ntb, netdev, linux-kselftest, arnd,
	gregkh, joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt,
	corbet, skhan, andriy.shevchenko, jbrunet, utkarsh02t

On Mon, Jan 19, 2026 at 03:09:18PM -0500, Frank Li wrote:
> On Sun, Jan 18, 2026 at 10:54:13PM +0900, Koichiro Den wrote:
> > Not all NTB implementations are able to naturally do DMA mapping through
> > the NTB PCI device itself (e.g. due to IOMMU topology or non-PCI backing
> > devices).
> >
> > Add an optional .get_dma_dev() callback and helper so clients can use
> > the appropriate struct device for DMA API allocations and mappings.
> >
> > Signed-off-by: Koichiro Den <den@valinux.co.jp>
> > ---
> >  include/linux/ntb.h | 18 ++++++++++++++++++
> >  1 file changed, 18 insertions(+)
> >
> > diff --git a/include/linux/ntb.h b/include/linux/ntb.h
> > index aa888219732a..7ac8cb13e90d 100644
> > --- a/include/linux/ntb.h
> > +++ b/include/linux/ntb.h
> > @@ -262,6 +262,7 @@ struct ntb_mw_subrange {
> >   * @msg_clear_mask:	See ntb_msg_clear_mask().
> >   * @msg_read:		See ntb_msg_read().
> >   * @peer_msg_write:	See ntb_peer_msg_write().
> > + * @get_dma_dev:	See ntb_get_dma_dev().
> >   * @get_private_data:	See ntb_get_private_data().
> >   */
> >  struct ntb_dev_ops {
> > @@ -339,6 +340,7 @@ struct ntb_dev_ops {
> >  	int (*msg_clear_mask)(struct ntb_dev *ntb, u64 mask_bits);
> >  	u32 (*msg_read)(struct ntb_dev *ntb, int *pidx, int midx);
> >  	int (*peer_msg_write)(struct ntb_dev *ntb, int pidx, int midx, u32 msg);
> > +	struct device *(*get_dma_dev)(struct ntb_dev *ntb);
> >  	void *(*get_private_data)(struct ntb_dev *ntb);
> >  };
> >
> > @@ -405,6 +407,7 @@ static inline int ntb_dev_ops_is_valid(const struct ntb_dev_ops *ops)
> >  		!ops->peer_msg_write == !ops->msg_count		&&
> >
> >  		/* Miscellaneous optional callbacks */
> > +		/* ops->get_dma_dev				&& */
> >  		/* ops->get_private_data			&& */
> >  		1;
> >  }
> > @@ -1614,6 +1617,21 @@ static inline int ntb_peer_msg_write(struct ntb_dev *ntb, int pidx, int midx,
> >  	return ntb->ops->peer_msg_write(ntb, pidx, midx, msg);
> >  }
> >
> > +/**
> > + * ntb_get_dma_dev() - get the device suitable for DMA mapping
> > + * @ntb:	NTB device context.
> > + *
> > + * Retrieve a struct device which is suitable for DMA mapping.
> > + *
> > + * Return: Pointer to struct device.
> > + */
> > +static inline struct device __maybe_unused *ntb_get_dma_dev(struct ntb_dev *ntb)
> 
> I remember that with inline, __maybe_unused isn't needed.

My bad, I'll drop it.

Thanks,
Koichiro
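
As a side note, the fallback behaviour of the new helper is easy to
demonstrate outside the kernel; the sketch below mirrors the patch's
logic with stubbed-out types (all names besides ntb_get_dma_dev() are
illustrative):

```c
#include <stddef.h>
#include <string.h>

struct device { const char *name; };
struct ntb_dev;
struct ntb_dev_ops { struct device *(*get_dma_dev)(struct ntb_dev *); };
struct ntb_dev {
	struct { struct device *parent; } dev;
	const struct ntb_dev_ops *ops;
};

/* Optional callback with a fallback: when the hardware driver provides
 * no get_dma_dev(), DMA mapping goes through the NTB device's parent. */
static struct device *ntb_get_dma_dev(struct ntb_dev *ntb)
{
	if (!ntb->ops->get_dma_dev)
		return ntb->dev.parent;
	return ntb->ops->get_dma_dev(ntb);
}

static struct device edma_dev = { "edma" };
static struct device *pick_edma(struct ntb_dev *ntb)
{
	(void)ntb;
	return &edma_dev;
}

/* Demo: returns the name of the device the helper selects. */
static const char *dma_dev_name_demo(int has_cb)
{
	static struct device parent = { "parent" };
	static const struct ntb_dev_ops with_cb = { .get_dma_dev = pick_edma };
	static const struct ntb_dev_ops without_cb = { NULL };
	struct ntb_dev ntb = { { &parent }, has_cb ? &with_cb : &without_cb };

	return ntb_get_dma_dev(&ntb)->name;
}
```

The same pattern keeps existing hardware drivers working unchanged while
letting e.g. a vNTB expose a different backing device for DMA.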

> 
> Reviewed-by: Frank Li <Frank.Li@nxp.com>
> > +{
> > +	if (!ntb->ops->get_dma_dev)
> > +		return ntb->dev.parent;
> > +	return ntb->ops->get_dma_dev(ntb);
> > +}
> > +
> >  /**
> >   * ntb_get_private_data() - get private data specific to the hardware driver
> >   * @ntb:	NTB device context.
> > --
> > 2.51.0
> >


* Re: [RFC PATCH v4 13/38] PCI: endpoint: pci-epf-vntb: Support BAR subrange mappings for MWs
  2026-01-19 20:26   ` Frank Li
@ 2026-01-21  2:08     ` Koichiro Den
  0 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-21  2:08 UTC (permalink / raw)
  To: Frank Li
  Cc: dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi, linux-pci, linux-doc, linux-kernel, linux-renesas-soc,
	devicetree, dmaengine, iommu, ntb, netdev, linux-kselftest, arnd,
	gregkh, joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt,
	corbet, skhan, andriy.shevchenko, jbrunet, utkarsh02t

On Mon, Jan 19, 2026 at 03:26:11PM -0500, Frank Li wrote:
> On Sun, Jan 18, 2026 at 10:54:15PM +0900, Koichiro Den wrote:
> > pci-epf-vntb can pack multiple memory windows into a single BAR using
> > mwN_offset. With the NTB core gaining support for programming multiple
> > translation ranges for a window, the EPF needs to provide the per-BAR
> > subrange layout to the endpoint controller (EPC).
> >
> > Implement .mw_set_trans_ranges() for pci-epf-vntb. Track subranges for
> > each BAR and pass them to pci_epc_set_bar() so EPC drivers can select an
> > appropriate inbound mapping mode (e.g. Address Match mode on DesignWare
> > controllers) when subrange mappings are required.
> >
> > Signed-off-by: Koichiro Den <den@valinux.co.jp>
> > ---
> >  drivers/pci/endpoint/functions/pci-epf-vntb.c | 183 +++++++++++++++++-
> >  1 file changed, 175 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> > index 39e784e21236..98128c2c5079 100644
> > --- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
> > +++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> > @@ -42,6 +42,7 @@
> >  #include <linux/log2.h>
> >  #include <linux/module.h>
> >  #include <linux/slab.h>
> > +#include <linux/sort.h>
> >
> >  #include <linux/pci-ep-msi.h>
> >  #include <linux/pci-epc.h>
> > @@ -144,6 +145,10 @@ struct epf_ntb {
> >
> >  	enum pci_barno epf_ntb_bar[VNTB_BAR_NUM];
> >
> > +	/* Cache for subrange mapping */
> > +	struct ntb_mw_subrange *mw_subrange[MAX_MW];
> > +	unsigned int num_subrange[MAX_MW];
> > +
> >  	struct epf_ntb_ctrl *reg;
> >
> >  	u32 *epf_db;
> > @@ -736,6 +741,7 @@ static int epf_ntb_mw_bar_init(struct epf_ntb *ntb)
> >  		ntb->epf->bar[barno].flags |= upper_32_bits(size) ?
> >  				PCI_BASE_ADDRESS_MEM_TYPE_64 :
> >  				PCI_BASE_ADDRESS_MEM_TYPE_32;
> > +		ntb->epf->bar[barno].num_submap = 0;
> >
> >  		ret = pci_epc_set_bar(ntb->epf->epc,
> >  				      ntb->epf->func_no,
> > @@ -1405,28 +1411,188 @@ static int vntb_epf_db_set_mask(struct ntb_dev *ntb, u64 db_bits)
> >  	return 0;
> >  }
> >
> > -static int vntb_epf_mw_set_trans(struct ntb_dev *ndev, int pidx, int idx,
> > -		dma_addr_t addr, resource_size_t size)
> > +struct vntb_mw_order {
> > +	u64 off;
> > +	unsigned int mw;
> > +};
> > +
> > +static int vntb_cmp_mw_order(const void *a, const void *b)
> > +{
> > +	const struct vntb_mw_order *ma = a;
> > +	const struct vntb_mw_order *mb = b;
> > +
> > +	if (ma->off < mb->off)
> > +		return -1;
> > +	if (ma->off > mb->off)
> > +		return 1;
> > +	return 0;
> > +}
> > +
> > +static int vntb_epf_mw_set_trans_ranges(struct ntb_dev *ndev, int pidx, int idx,
> > +					unsigned int num_ranges,
> > +					const struct ntb_mw_subrange *ranges)
> >  {
> >  	struct epf_ntb *ntb = ntb_ndev(ndev);
> > +	struct pci_epf_bar_submap *submap;
> > +	struct vntb_mw_order mws[MAX_MW];
> >  	struct pci_epf_bar *epf_bar;
> > +	struct ntb_mw_subrange *r;
> >  	enum pci_barno barno;
> > +	struct device *dev, *epf_dev;
> > +	unsigned int total_ranges = 0;
> > +	unsigned int mw_cnt = 0;
> > +	unsigned int cur = 0;
> > +	u64 expected_off = 0;
> > +	unsigned int i, j;
> >  	int ret;
> > +
> > +	dev = &ntb->ntb->dev;
> > +	epf_dev = &ntb->epf->dev;
> > +	barno = ntb->epf_ntb_bar[BAR_MW1 + idx];
> > +	epf_bar = &ntb->epf->bar[barno];
> > +	epf_bar->barno = barno;
> > +
> > +	r = devm_kmemdup(epf_dev, ranges, num_ranges * sizeof(*ranges), GFP_KERNEL);
> 
> size_mul(sizeof(*ranges), num_ranges)

Will update.

> 
> > +	if (!r)
> > +		return -ENOMEM;
> > +
> > +	if (ntb->mw_subrange[idx])
> > +		devm_kfree(epf_dev, ntb->mw_subrange[idx]);
> > +
> > +	ntb->mw_subrange[idx] = r;
> > +	ntb->num_subrange[idx] = num_ranges;
> > +
> > +	/* Defer pci_epc_set_bar() until all MWs in this BAR have range info. */
> > +	for (i = 0; i < MAX_MW; i++) {
> > +		enum pci_barno bar = ntb->epf_ntb_bar[BAR_MW1 + i];
> > +
> > +		if (bar != barno)
> > +			continue;
> > +		if (!ntb->num_subrange[i])
> > +			return 0;
> > +
> > +		mws[mw_cnt].mw = i;
> > +		mws[mw_cnt].off = ntb->mws_offset[i];
> > +		mw_cnt++;
> > +	}
> > +
> > +	sort(mws, mw_cnt, sizeof(mws[0]), vntb_cmp_mw_order, NULL);
> 
> Can we require that mws_offset be ordered? Then there'd be no need to sort here.

Yes, I'll add an ordering validation at epf_ntb_bind() time and document
the constraint in Documentation/PCI/endpoint/pci-vntb-howto.rst.
The sorting was added only for convenience; there is no strong technical
reason to support arbitrary MW ordering.

Thanks for the review.
Koichiro
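
For what it's worth, the no-holes/no-overlaps invariant that
vntb_epf_mw_set_trans_ranges() enforces on the per-BAR layout can be
sketched standalone (names are illustrative, not the driver's):

```c
#include <stdint.h>

struct mw_slice { uint64_t off, size; };

/*
 * Sketch of the BAR-layout invariant from the patch: when several
 * memory windows are packed into one BAR, each window must start
 * exactly where the previous one ended, so the windows tile the BAR
 * with no holes and no overlaps.
 */
static int bar_layout_ok(const struct mw_slice *mws, int n)
{
	uint64_t expected_off = 0;
	int i;

	for (i = 0; i < n; i++) {
		if (mws[i].off != expected_off)
			return 0;	/* hole or overlap */
		expected_off += mws[i].size;
	}
	return 1;
}

/* Two windows tiling a BAR back to back: valid. */
static int demo_contiguous(void)
{
	const struct mw_slice m[] = { { 0, 0x1000 }, { 0x1000, 0x2000 } };
	return bar_layout_ok(m, 2);
}

/* A gap between the windows: rejected. */
static int demo_hole(void)
{
	const struct mw_slice m[] = { { 0, 0x1000 }, { 0x2000, 0x1000 } };
	return bar_layout_ok(m, 2);
}
```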

> 
> > +
> > +	/* BAR submap must cover the whole BAR with no holes. */
> > +	for (i = 0; i < mw_cnt; i++) {
> > +		unsigned int mw = mws[i].mw;
> > +		u64 sum = 0;
> > +
> > +		if (mws[i].off != expected_off) {
> 
> Can we use 'size' instead of 'off' throughout, to keep it aligned with submap?

Yes, will rename it.

Thanks for the review,
Koichiro

> 
> Frank
> > +			dev_err(dev,
> > +				"BAR%d: hole/overlap at %#llx (MW%d@%#llx)\n",
> > +				barno, expected_off, mw + 1, mws[i].off);
> > +			return -EINVAL;
> > +		}
> > +
> > +		total_ranges += ntb->num_subrange[mw];
> > +		for (j = 0; j < ntb->num_subrange[mw]; j++)
> > +			sum += ntb->mw_subrange[mw][j].size;
> > +
> > +		if (sum != ntb->mws_size[mw]) {
> > +			dev_err(dev,
> > +				"MW%d: ranges size %#llx != window size %#llx\n",
> > +				mw + 1, sum, ntb->mws_size[mw]);
> > +			return -EINVAL;
> > +		}
> > +		expected_off += ntb->mws_size[mw];
> > +	}
> > +
> > +	submap = devm_krealloc_array(epf_dev, epf_bar->submap, total_ranges,
> > +				     sizeof(*submap), GFP_KERNEL);
> > +	if (!submap)
> > +		return -ENOMEM;
> > +
> > +	epf_bar->submap = submap;
> > +	epf_bar->num_submap = total_ranges;
> > +	dev_dbg(dev, "Requesting BAR%d layout (#. of subranges is %u):\n",
> > +		barno, total_ranges);
> > +
> > +	for (i = 0; i < mw_cnt; i++) {
> > +		unsigned int mw = mws[i].mw;
> > +
> > +		dev_dbg(dev, "- MW%d\n", 1 + mw);
> > +		for (j = 0; j < ntb->num_subrange[mw]; j++) {
> > +			dev_dbg(dev, "  - addr/size = %#llx/%#llx\n",
> > +				ntb->mw_subrange[mw][j].addr,
> > +				ntb->mw_subrange[mw][j].size);
> > +			submap[cur].phys_addr = ntb->mw_subrange[mw][j].addr;
> > +			submap[cur].size = ntb->mw_subrange[mw][j].size;
> > +			cur++;
> > +		}
> > +	}
> > +
> > +	ret = pci_epc_set_bar(ntb->epf->epc, ntb->epf->func_no,
> > +			      ntb->epf->vfunc_no, epf_bar);
> > +	if (ret)
> > +		dev_err(dev, "BAR%d: failed to program mappings for MW%d: %d\n",
> > +			barno, idx + 1, ret);
> > +
> > +	return ret;
> > +}
> > +
> > +static int vntb_epf_mw_set_trans(struct ntb_dev *ndev, int pidx, int idx,
> > +				 dma_addr_t addr, resource_size_t size)
> > +{
> > +	struct epf_ntb *ntb = ntb_ndev(ndev);
> > +	struct pci_epf_bar *epf_bar;
> > +	resource_size_t bar_size;
> > +	enum pci_barno barno;
> >  	struct device *dev;
> > +	unsigned int i;
> > +	int ret;
> >
> >  	dev = &ntb->ntb->dev;
> >  	barno = ntb->epf_ntb_bar[BAR_MW1 + idx];
> >  	epf_bar = &ntb->epf->bar[barno];
> >  	epf_bar->phys_addr = addr;
> >  	epf_bar->barno = barno;
> > -	epf_bar->size = size;
> >
> > -	ret = pci_epc_set_bar(ntb->epf->epc, 0, 0, epf_bar);
> > -	if (ret) {
> > -		dev_err(dev, "failure set mw trans\n");
> > -		return ret;
> > +	bar_size = epf_bar->size;
> > +	if (!bar_size || !size)
> > +		return -EINVAL;
> > +
> > +	if (size != ntb->mws_size[idx])
> > +		return -EINVAL;
> > +
> > +	/*
> > +	 * Even if the caller intends to map the entire MW, the MW might
> > +	 * actually be just a part of the BAR. In that case, redirect the
> > +	 * handling to vntb_epf_mw_set_trans_ranges().
> > +	 */
> > +	if (size < bar_size) {
> > +		struct ntb_mw_subrange r = {
> > +			.addr = addr,
> > +			.size = size,
> > +		};
> > +		return vntb_epf_mw_set_trans_ranges(ndev, pidx, idx, 1, &r);
> >  	}
> > -	return 0;
> > +
> > +	/* Drop any stale cache for the BAR. */
> > +	for (i = 0; i < MAX_MW; i++) {
> > +		if (ntb->epf_ntb_bar[BAR_MW1 + i] != barno)
> > +			continue;
> > +		devm_kfree(&ntb->epf->dev, ntb->mw_subrange[i]);
> > +		ntb->mw_subrange[i] = NULL;
> > +		ntb->num_subrange[i] = 0;
> > +	}
> > +
> > +	/* Not using subrange mapping. If it was used in the past, clear it. */
> > +	devm_kfree(&ntb->epf->dev, epf_bar->submap);
> > +	epf_bar->submap = NULL;
> > +	epf_bar->num_submap = 0;
> > +
> > +	ret = pci_epc_set_bar(ntb->epf->epc, ntb->epf->func_no,
> > +			      ntb->epf->vfunc_no, epf_bar);
> > +	if (ret)
> > +		dev_err(dev, "failure set mw trans\n");
> > +
> > +	return ret;
> >  }
> >
> >  static int vntb_epf_mw_clear_trans(struct ntb_dev *ntb, int pidx, int idx)
> > @@ -1590,6 +1756,7 @@ static const struct ntb_dev_ops vntb_epf_ops = {
> >  	.db_vector_mask		= vntb_epf_db_vector_mask,
> >  	.db_set_mask		= vntb_epf_db_set_mask,
> >  	.mw_set_trans		= vntb_epf_mw_set_trans,
> > +	.mw_set_trans_ranges	= vntb_epf_mw_set_trans_ranges,
> >  	.mw_clear_trans		= vntb_epf_mw_clear_trans,
> >  	.peer_mw_get_addr	= vntb_epf_peer_mw_get_addr,
> >  	.link_enable		= vntb_epf_link_enable,
> > --
> > 2.51.0
> >


* Re: [RFC PATCH v4 16/38] NTB: ntb_transport: Move TX memory window setup into setup_qp_mw()
  2026-01-19 20:36   ` Frank Li
@ 2026-01-21  2:15     ` Koichiro Den
  0 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-21  2:15 UTC (permalink / raw)
  To: Frank Li
  Cc: dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi, linux-pci, linux-doc, linux-kernel, linux-renesas-soc,
	devicetree, dmaengine, iommu, ntb, netdev, linux-kselftest, arnd,
	gregkh, joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt,
	corbet, skhan, andriy.shevchenko, jbrunet, utkarsh02t

On Mon, Jan 19, 2026 at 03:36:47PM -0500, Frank Li wrote:
> On Sun, Jan 18, 2026 at 10:54:18PM +0900, Koichiro Den wrote:
> > Historically both TX and RX have assumed the same per-QP MW slice
> > (tx_max_entry == remote rx_max_entry), while those are calculated
> > separately in different places (pre and post the link-up negotiation
> > point). This has been safe because nt->link_is_up is never set to true
> > unless the pre-determined qp_count are the same among them, and qp_count
> > is typically limited to nt->mw_count, which should be carefully
> > configured by admin.
> >
> > However, setup_qp_mw() can actually split an MW and properly handle
> > multiple QPs in one MW, so qp_count need not be limited by nt->mw_count.
> > Once we relax that limitation, the pre-determined qp_count can differ
> > between the host side and the endpoint, and link-up negotiation can
> > easily fail.
> >
> > Move the TX MW configuration (per-QP offset and size) into
> > ntb_transport_setup_qp_mw() so that both RX and TX layout decisions are
> > centralized in a single helper. ntb_transport_init_queue() now deals
> > only with per-QP software state, not with MW layout.
> >
> > This keeps the previous behavior, while preparing for relaxing the
> > qp_count limitation and improving readability.
> >
> > No functional change is intended.
> >
> > Signed-off-by: Koichiro Den <den@valinux.co.jp>
> > ---
> >  drivers/ntb/ntb_transport.c | 76 ++++++++++++++++---------------------
> >  1 file changed, 32 insertions(+), 44 deletions(-)
> >
> > diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
> > index d5a544bf8fd6..57a21f2daac6 100644
> > --- a/drivers/ntb/ntb_transport.c
> > +++ b/drivers/ntb/ntb_transport.c
> > @@ -569,7 +569,10 @@ static int ntb_transport_setup_qp_mw(struct ntb_transport_ctx *nt,
> >  	struct ntb_transport_mw *mw;
> >  	struct ntb_dev *ndev = nt->ndev;
> >  	struct ntb_queue_entry *entry;
> > -	unsigned int rx_size, num_qps_mw;
> > +	phys_addr_t mw_base;
> > +	resource_size_t mw_size;
> > +	unsigned int rx_size, tx_size, num_qps_mw;
> > +	u64 qp_offset;
> >  	unsigned int mw_num, mw_count, qp_count;
> >  	unsigned int i;
> >  	int node;
> > @@ -588,13 +591,38 @@ static int ntb_transport_setup_qp_mw(struct ntb_transport_ctx *nt,
> >  	else
> >  		num_qps_mw = qp_count / mw_count;
> >
> > -	rx_size = (unsigned int)mw->xlat_size / num_qps_mw;
> > -	qp->rx_buff = mw->virt_addr + rx_size * (qp_num / mw_count);
> > -	rx_size -= sizeof(struct ntb_rx_info);
> > +	mw_base = nt->mw_vec[mw_num].phys_addr;
> > +	mw_size = nt->mw_vec[mw_num].phys_size;
> > +
> > +	if (mw_size > mw->xlat_size)
> > +		mw_size = mw->xlat_size;
> 
> The old code did not have this check.

Thanks for pointing it out. I'll drop it from this commit so that the
existing behaviour remains unchanged, as the commit message states.

Thanks,
Koichiro
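
As background for the consolidation, the MW slicing arithmetic that
setup_qp_mw() centralizes can be sketched as follows (helper names are
illustrative): QPs are assigned to MWs round-robin, each MW is divided
evenly among the QPs it hosts, and a QP's offset within its MW is its
per-MW index times the slice size.

```c
#include <stdint.h>

/* Number of QPs that land in a given MW when qp_count QPs are spread
 * round-robin over mw_count MWs (the first qp_count % mw_count MWs
 * host one extra QP). */
static unsigned int qps_in_mw(unsigned int mw_num, unsigned int mw_count,
			      unsigned int qp_count)
{
	return qp_count / mw_count + (mw_num < qp_count % mw_count ? 1 : 0);
}

/* Byte offset of a QP's slice inside its MW: qp_num / mw_count is the
 * QP's index among the QPs sharing that MW. */
static uint64_t qp_slice_offset(unsigned int qp_num, unsigned int mw_count,
				uint64_t slice_size)
{
	return slice_size * (qp_num / mw_count);
}
```

For example, with 5 QPs over 2 MWs, MW0 hosts QPs 0/2/4 and MW1 hosts
QPs 1/3, so MW0 is split three ways and MW1 two ways.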

> 
> Frank
> > +	if (max_mw_size && mw_size > max_mw_size)
> > +		mw_size = max_mw_size;
> > +
> > +	tx_size = (unsigned int)mw_size / num_qps_mw;
> > +	qp_offset = tx_size * (qp_num / mw_count);
> > +
> > +	qp->rx_buff = mw->virt_addr + qp_offset;
> > +
> > +	qp->tx_mw_size = tx_size;
> > +	qp->tx_mw = nt->mw_vec[mw_num].vbase + qp_offset;
> > +	if (!qp->tx_mw)
> > +		return -EINVAL;
> > +
> > +	qp->tx_mw_phys = mw_base + qp_offset;
> > +	if (!qp->tx_mw_phys)
> > +		return -EINVAL;
> >
> > +	rx_size = tx_size;
> > +	rx_size -= sizeof(struct ntb_rx_info);
> >  	qp->remote_rx_info = qp->rx_buff + rx_size;
> >
> > +	tx_size -= sizeof(struct ntb_rx_info);
> > +	qp->rx_info = qp->tx_mw + tx_size;
> > +
> >  	/* Due to housekeeping, there must be atleast 2 buffs */
> > +	qp->tx_max_frame = min(transport_mtu, tx_size / 2);
> > +	qp->tx_max_entry = tx_size / qp->tx_max_frame;
> >  	qp->rx_max_frame = min(transport_mtu, rx_size / 2);
> >  	qp->rx_max_entry = rx_size / qp->rx_max_frame;
> >  	qp->rx_index = 0;
> > @@ -1132,16 +1160,6 @@ static int ntb_transport_init_queue(struct ntb_transport_ctx *nt,
> >  				    unsigned int qp_num)
> >  {
> >  	struct ntb_transport_qp *qp;
> > -	phys_addr_t mw_base;
> > -	resource_size_t mw_size;
> > -	unsigned int num_qps_mw, tx_size;
> > -	unsigned int mw_num, mw_count, qp_count;
> > -	u64 qp_offset;
> > -
> > -	mw_count = nt->mw_count;
> > -	qp_count = nt->qp_count;
> > -
> > -	mw_num = QP_TO_MW(nt, qp_num);
> >
> >  	qp = &nt->qp_vec[qp_num];
> >  	qp->qp_num = qp_num;
> > @@ -1151,36 +1169,6 @@ static int ntb_transport_init_queue(struct ntb_transport_ctx *nt,
> >  	qp->event_handler = NULL;
> >  	ntb_qp_link_context_reset(qp);
> >
> > -	if (mw_num < qp_count % mw_count)
> > -		num_qps_mw = qp_count / mw_count + 1;
> > -	else
> > -		num_qps_mw = qp_count / mw_count;
> > -
> > -	mw_base = nt->mw_vec[mw_num].phys_addr;
> > -	mw_size = nt->mw_vec[mw_num].phys_size;
> > -
> > -	if (max_mw_size && mw_size > max_mw_size)
> > -		mw_size = max_mw_size;
> > -
> > -	tx_size = (unsigned int)mw_size / num_qps_mw;
> > -	qp_offset = tx_size * (qp_num / mw_count);
> > -
> > -	qp->tx_mw_size = tx_size;
> > -	qp->tx_mw = nt->mw_vec[mw_num].vbase + qp_offset;
> > -	if (!qp->tx_mw)
> > -		return -EINVAL;
> > -
> > -	qp->tx_mw_phys = mw_base + qp_offset;
> > -	if (!qp->tx_mw_phys)
> > -		return -EINVAL;
> > -
> > -	tx_size -= sizeof(struct ntb_rx_info);
> > -	qp->rx_info = qp->tx_mw + tx_size;
> > -
> > -	/* Due to housekeeping, there must be atleast 2 buffs */
> > -	qp->tx_max_frame = min(transport_mtu, tx_size / 2);
> > -	qp->tx_max_entry = tx_size / qp->tx_max_frame;
> > -
> >  	if (nt->debugfs_node_dir) {
> >  		char debugfs_name[8];
> >
> > --
> > 2.51.0
> >


* Re: [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA
  2026-01-20 18:47   ` Dave Jiang
@ 2026-01-21  2:40     ` Koichiro Den
  0 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-21  2:40 UTC (permalink / raw)
  To: Dave Jiang
  Cc: Frank.Li, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi, linux-pci, linux-doc, linux-kernel, linux-renesas-soc,
	devicetree, dmaengine, iommu, ntb, netdev, linux-kselftest, arnd,
	gregkh, joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt,
	corbet, skhan, andriy.shevchenko, jbrunet, utkarsh02t

On Tue, Jan 20, 2026 at 11:47:32AM -0700, Dave Jiang wrote:
> 
> 
> On 1/20/26 11:30 AM, Dave Jiang wrote:
> > 
> > 
> > On 1/18/26 6:54 AM, Koichiro Den wrote:
> >> Hi,
> >>
> >> This is RFC v4 of the NTB/PCI/dmaengine series that introduces an
> >> optional NTB transport variant where payload data is moved by a PCI
> >> embedded-DMA engine (eDMA) residing on the endpoint side.
> > 
> > Just a fly by comment. This series is huge. I do suggest break it down to something more manageable to prevent review fatigue from patch reviewers. For example, linux network sub-system has a rule to restrict patch series to no more than 15 patches. NTB sub-system does not have that rule. But maybe split out the dmaengine changes and the hardware specific dw-edma bits from the ntb core changes.
> > 
> > DJ
> 
> Ah I do see your comment that you will split when out of RFC below now.

Thanks for the comment. You're right that the series is huge.
Should another RFC iteration turn out to be necessary, I'll make sure to
keep netdev out and will consider splitting it up further. (Though I don't
think I'll need to send another huge RFC series, as the use-case scenario
and the overall picture seem to have already been conveyed.)

Koichiro

> 
> DJ
> >  
> >>
> >> The primary target is Synopsys DesignWare PCIe endpoint controllers that
> >> integrate a DesignWare eDMA instance (dw-edma). In the remote
> >> embedded-DMA mode, payload is transferred by DMA directly between the
> >> two systems' memory, and NTB Memory Windows are used primarily for
> >> control/metadata and for exposing the endpoint eDMA resources (register
> >> window + linked-list rings) to the host.
> >>
> >> Compared to the existing cpu/dma memcpy-based implementation, this
> >> approach avoids window-backed payload rings and the associated extra
> >> copies, and it is less sensitive to scarce MW space. This also enables
> >> scaling out to multiple queue pairs, which is particularly beneficial
> >> for ntb_netdev. On R-Car S4, preliminary iperf3 results show 10~20x
> >> throughput improvement. Latency improvements are also observed.
> >>
> >> RFC history:
> >>   RFC v3: https://lore.kernel.org/all/20251217151609.3162665-1-den@valinux.co.jp/
> >>   RFC v2: https://lore.kernel.org/all/20251129160405.2568284-1-den@valinux.co.jp/
> >>   RFC v1: https://lore.kernel.org/all/20251023071916.901355-1-den@valinux.co.jp/
> >>
> >> Parts of RFC v3 series have already been split out and posted separately
> >> (see "Kernel base / dependencies" section below). However, feedback on
> >> the remaining parts led to substantial restructuring and code changes,
> >> so I am sending an RFC v4 as a refreshed version of the full series.
> >>
> >> RFC v4 is still a large, cross-subsystem series. At this RFC stage,
> >> I am sending the full picture in a single set to make it easier to
> >> review the overall direction and architecture. Once the direction is
> >> agreed upon and no further large restructuring appears necessary, I will stop
> >> posting the new RFC-tagged revisions and continue development on
> >> separate threads, split by sub-topic.
> >>
> >> Many thanks for all the reviews and feedback from multiple perspectives.
> >>
> >>
> >> Software architecture overview (RFC v4)
> >> =======================================
> >>
> >> A major change in RFC v4 is the software layering and module split.
> >>
> >> The existing memcpy-based transport and the new remote embedded-DMA
> >> transport are implemented as two independent NTB client drivers on top
> >> of a shared core library:
> >>
> >>                        +--------------------+
> >>                        | ntb_transport_core |
> >>                        +--------------------+
> >>                            ^            ^
> >>                            |            |
> >>         ntb_transport -----+            +----- ntb_transport_edma
> >>        (cpu/dma memcpy)                   (remote embedded DMA transfer)
> >>                                                        |
> >>                                                        v
> >>                                                  +-----------+
> >>                                                  |  ntb_edma |
> >>                                                  +-----------+
> >>                                                        ^
> >>                                                        |
> >>                                                +----------------+
> >>                                                |                |
> >>                                           ntb_dw_edma         [...]
> >>
> >> Key points:
> >>   * ntb_transport_core provides the queue-pair abstraction used by upper
> >>     layer clients (e.g. ntb_netdev).
> >>   * ntb_transport is the legacy shared-memory transport client (CPU/DMA
> >>     memcpy).
> >>   * ntb_transport_edma is the remote embedded-DMA transport client.
> >>   * ntb_transport_edma relies on an ntb_edma backend registry.
> >>     This RFC provides an initial DesignWare backend (ntb_dw_edma).
> >>   * Transport selection is per-NTB device via the standard
> >>     driver_override mechanism. To enable that, this RFC adds
> >>     driver_override support to ntb_bus. This allows mixing transports
> >>     across multiple NTB ports and provides an explicit fallback path to
> >>     the legacy transport.
> >>
> >> So, if ntb_transport / ntb_transport_edma are built as loadable modules,
> >> you can just run modprobe ntb_transport as before and the original cpu/dma
> >> memcpy-based implementation will be active. If they are built-in, whether
> >> ntb_transport or ntb_transport_edma are bound by default depends on
> >> initcall order. Regarding how to switch the driver, please see Patch 34
> >> ("Documentation: driver-api: ntb: Document remote embedded-DMA transport")
> >> for details.
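To make the per-device selection concrete, here is a minimal sketch of the standard driver_override sysfs flow. On a real system the base directory would be /sys/bus/ntb and an actual NTB device is needed; below, a scratch directory stands in for sysfs so the command sequence can be exercised anywhere, and the device name is purely hypothetical.

```shell
base=$(mktemp -d)               # stand-in for /sys/bus/ntb on a real system
dev="0000:01:00.0"              # hypothetical NTB device name
mkdir -p "$base/devices/$dev"

# Pin this device to the remote embedded-DMA transport before (re)binding;
# on real hardware you would then unbind/bind via the usual bus attributes.
printf ntb_transport_edma > "$base/devices/$dev/driver_override"
cat "$base/devices/$dev/driver_override"
```

Since the override is per device, different NTB ports on the same system can run different transports, with the legacy ntb_transport remaining the fallback.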
> >>
> >>
> >> Data flow overview (remote embedded-DMA transport)
> >> ==================================================
> >>
> >> At a high level:
> >>   * One MW is reserved as an "eDMA window". The endpoint exposes the
> >>     eDMA register block plus LL descriptor rings through that window, so
> >>     the peer can ioremap it and drive DMA reads remotely.
> >>   * Remaining MWs carry only small control-plane rings used to exchange
> >>     buffer addresses and completion information.
> >>   * For RC->EP traffic, the RC drives endpoint DMA read channels through
> >>     the peer-visible eDMA window.
> >>   * For EP->RC traffic, the endpoint uses its local DMA write channels.
> >>
> >> The following figures illustrate the data flow when ntb_netdev sits on
> >> top of the transport:
> >>
> >>      Figure 1. RC->EP traffic via ntb_netdev + ntb_transport_edma
> >>                    backed by ntb_edma/ntb_dw_edma
> >>
> >>              EP                                   RC
> >>           phys addr                            phys addr
> >>             space                                space
> >>              +-+                                  +-+
> >>              | |                                  | |
> >>              | |                ||                | |
> >>              +-+-----.          ||                | |
> >>     EDMA REG | |      \     [A] ||                | |
> >>              +-+----.  '---+-+  ||                | |
> >>              | |     \     | |<---------[0-a]----------
> >>              +-+-----------| |<----------[2]----------.
> >>      EDMA LL | |           | |  ||                | | :
> >>              | |           | |  ||                | | :
> >>              +-+-----------+-+  ||  [B]           | | :
> >>              | |                ||  ++            | | :
> >>           ---------[0-b]----------->||----------------'
> >>              | |            ++  ||  ||            | |
> >>              | |            ||  ||  ++            | |
> >>              | |            ||<----------[4]-----------
> >>              | |            ++  ||                | |
> >>              | |           [C]  ||                | |
> >>           .--|#|<------------------------[3]------|#|<-.
> >>           :  |#|                ||                |#|  :
> >>          [5] | |                ||                | | [1]
> >>           :  | |                ||                | |  :
> >>           '->|#|                                  |#|--'
> >>              |#|                                  |#|
> >>              | |                                  | |
> >>
> >>      Figure 2. EP->RC traffic via ntb_netdev + ntb_transport_edma
> >>                   backed by ntb_edma/ntb_dw_edma
> >>
> >>              EP                                   RC
> >>           phys addr                            phys addr
> >>             space                                space
> >>              +-+                                  +-+
> >>              | |                                  | |
> >>              | |                ||                | |
> >>              +-+                ||                | |
> >>     EDMA REG | |                ||                | |
> >>              +-+                ||                | |
> >>     ^        | |                ||                | |
> >>     :        +-+                ||                | |
> >>     : EDMA LL| |                ||                | |
> >>     :        | |                ||                | |
> >>     :        +-+                ||  [C]           | |
> >>     :        | |                ||  ++            | |
> >>     :     -----------[4]----------->||            | |
> >>     :        | |            ++  ||  ||            | |
> >>     :        | |            ||  ||  ++            | |
> >>     '----------------[2]-----||<--------[0-b]-----------
> >>              | |            ++  ||                | |
> >>              | |           [B]  ||                | |
> >>           .->|#|--------[3]---------------------->|#|--.
> >>           :  |#|                ||                |#|  :
> >>          [1] | |                ||                | | [5]
> >>           :  | |                ||                | |  :
> >>           '--|#|                                  |#|<-'
> >>              |#|                                  |#|
> >>              | |                                  | |
> >>
> >>     0-a. configure remote embedded DMA (program endpoint DMA registers)
> >>     0-b. DMA-map and publish destination address (DAR)
> >>     1.   network stack builds skb (copy from application/user memory)
> >>     2.   consume DAR, DMA-map source address (SAR) and kick DMA transfer
> >>     3.   DMA transfer (payload moves between RC/EP memory)
> >>     4.   consume completion (commit)
> >>     5.   network stack delivers data to application/user memory
> >>
> >>     [A]: Dedicated MW that aggregates DMA regs and LL (peer ioremaps it)
> >>     [B]: Control-plane ring buffer for "produce"
> >>     [C]: Control-plane ring buffer for "consume"
> >>
> >>
> >> Kernel base / dependencies
> >> ==========================
> >>
> >> This series is based on:
> >>
> >>   - next-20260114 (commit b775e489bec7)
> >>
> >> plus the following seven unmerged patch series or standalone patches:
> >>
> >>   - [PATCH v4 0/7] PCI: endpoint/NTB: Harden vNTB resource management
> >>     https://lore.kernel.org/all/20251202072348.2752371-1-den@valinux.co.jp/
> >>
> >>   - [PATCH v2 0/2] NTB: ntb_transport: debugfs cleanups
> >>     https://lore.kernel.org/all/20260107042458.1987818-1-den@valinux.co.jp/
> >>
> >>   - [PATCH v3 0/9] dmaengine: Add new API to combine configuration and descriptor preparation
> >>     https://lore.kernel.org/all/20260105-dma_prep_config-v3-0-a8480362fd42@nxp.com/
> >>
> >>   - [PATCH v8 0/5] PCI: endpoint: BAR subrange mapping support
> >>     https://lore.kernel.org/all/20260115084928.55701-1-den@valinux.co.jp/
> >>
> >>   - [PATCH] PCI: endpoint: pci-epf-vntb: Use array_index_nospec() on mws_size[] access
> >>     https://lore.kernel.org/all/20260105075606.1253697-1-den@valinux.co.jp/
> >>
> >>   - [PATCH] dmaengine: dw-edma: Fix MSI data values for multi-vector IMWr interrupts
> >>     https://lore.kernel.org/all/20260105075904.1254012-1-den@valinux.co.jp/
> >>
> >>   - [PATCH v2 01/11] dmaengine: dw-edma: Add spinlock to protect DONE_INT_MASK and ABORT_INT_MASK
> >>     https://lore.kernel.org/imx/20260109-edma_ll-v2-1-5c0b27b2c664@nxp.com/
> >>     (only this single commit is cherry-picked from the series)
> >>
> >>
> >> Patch layout
> >> ============
> >>
> >>   1. dw-edma / DesignWare EP helpers needed for remote embedded-DMA (export
> >>      register/LL windows, IRQ routing control, etc.)
> >>
> >>      Patch 01 : dmaengine: dw-edma: Export helper to get integrated register window
> >>      Patch 02 : dmaengine: dw-edma: Add per-channel interrupt routing control
> >>      Patch 03 : dmaengine: dw-edma: Poll completion when local IRQ handling is disabled
> >>      Patch 04 : dmaengine: dw-edma: Add notify-only channels support
> >>      Patch 05 : dmaengine: dw-edma: Add a helper to query linked-list region
> >>
> >>   2. NTB EPF/core + vNTB prep (mwN_offset + versioning, MSI vector
> >>      management, new ntb_dev_ops helpers, driver_override, vntb glue)
> >>
> >>      Patch 06 : NTB: epf: Add mwN_offset support and config region versioning
> >>      Patch 07 : NTB: epf: Reserve a subset of MSI vectors for non-NTB users
> >>      Patch 08 : NTB: epf: Provide db_vector_count/db_vector_mask callbacks
> >>      Patch 09 : NTB: core: Add mw_set_trans_ranges() for subrange programming
> >>      Patch 10 : NTB: core: Add .get_private_data() to ntb_dev_ops
> >>      Patch 11 : NTB: core: Add .get_dma_dev() to ntb_dev_ops
> >>      Patch 12 : NTB: core: Add driver_override support for NTB devices
> >>      Patch 13 : PCI: endpoint: pci-epf-vntb: Support BAR subrange mappings for MWs
> >>      Patch 14 : PCI: endpoint: pci-epf-vntb: Implement .get_private_data() callback
> >>      Patch 15 : PCI: endpoint: pci-epf-vntb: Implement .get_dma_dev()
> >>
> >>   3. ntb_transport refactor/modularization and backend infrastructure
> >>
> >>      Patch 16 : NTB: ntb_transport: Move TX memory window setup into setup_qp_mw()
> >>      Patch 17 : NTB: ntb_transport: Dynamically determine qp count
> >>      Patch 18 : NTB: ntb_transport: Use ntb_get_dma_dev()
> >>      Patch 19 : NTB: ntb_transport: Rename ntb_transport.c to ntb_transport_core.c
> >>      Patch 20 : NTB: ntb_transport: Move internal types to ntb_transport_internal.h
> >>      Patch 21 : NTB: ntb_transport: Export common helpers for modularization
> >>      Patch 22 : NTB: ntb_transport: Split core library and default NTB client
> >>      Patch 23 : NTB: ntb_transport: Add transport backend infrastructure
> >>      Patch 24 : NTB: ntb_transport: Run ntb_set_mw() before link-up negotiation
> >>
> >>   4. ntb_edma backend registry + DesignWare backend + transport client
> >>
> >>      Patch 25 : NTB: hw: Add remote eDMA backend registry and DesignWare backend
> >>      Patch 26 : NTB: ntb_transport: Add remote embedded-DMA transport client
> >>
> >>   5. ntb_netdev multi-queue support
> >>
> >>      Patch 27 : ntb_netdev: Multi-queue support
> >>
> >>   6. Renesas R-Car S4 enablement (IOMMU, DTs, quirks)
> >>
> >>      Patch 28 : iommu: ipmmu-vmsa: Add PCIe ch0 to devices_allowlist
> >>      Patch 29 : iommu: ipmmu-vmsa: Add support for reserved regions
> >>      Patch 30 : arm64: dts: renesas: Add Spider RC/EP DTs for NTB with remote DW PCIe eDMA
> >>      Patch 31 : NTB: epf: Add per-SoC quirk to cap MRRS for DWC eDMA (128B for R-Car)
> >>      Patch 32 : NTB: epf: Add an additional memory window (MW2) barno mapping on Renesas R-Car
> >>
> >>   7. Documentation updates
> >>
> >>      Patch 33 : Documentation: PCI: endpoint: pci-epf-vntb: Update and add mwN_offset usage
> >>      Patch 34 : Documentation: driver-api: ntb: Document remote embedded-DMA transport
> >>
> >>   8. pci-epf-test / pci_endpoint_test / kselftest coverage for remote eDMA
> >>
> >>      Patch 35 : PCI: endpoint: pci-epf-test: Add pci_epf_test_next_free_bar() helper
> >>      Patch 36 : PCI: endpoint: pci-epf-test: Add remote eDMA-backed mode
> >>      Patch 37 : misc: pci_endpoint_test: Add remote eDMA transfer test mode
> >>      Patch 38 : selftests: pci_endpoint: Add remote eDMA transfer coverage
> >>
> >>
> >> Tested on
> >> =========
> >>
> >> * 2x Renesas R-Car S4 Spider (RC<->EP connected with OCuLink cable)
> >> * Kernel base as described above
> >>
> >>
> >> Performance notes
> >> =================
> >>
> >> The primary motivation remains improving throughput/latency for ntb_transport
> >> users (typically ntb_netdev). On R-Car S4, the earlier prototype (RFC v3)
> >> showed roughly 10-20x throughput improvement in preliminary iperf3 tests and
> >> lower ping RTT. I have not yet re-measured after the v4 refactor and
> >> module split.
> >>
> >>
> >> Changelog
> >> =========
> >>
> >> RFCv3->RFCv4 changes:
> >>   - Major refactor of the transport layering:
> >>     - Introduce ntb_transport_core as a shared library module.
> >>     - Split the legacy shared-memory transport client (ntb_transport) and the
> >>       remote embedded-DMA transport client (ntb_transport_edma).
> >>     - Add driver_override support for ntb_bus and use it for per-port transport
> >>       selection.
> >>   - Introduce a vendor-agnostic remote embedded-DMA backend registry (ntb_edma)
> >>     and add the initial DesignWare backend (ntb_dw_edma).
> >>   - Rebase to next-20260114 and move several prerequisite/fixup patchsets into
> >>     separate threads (listed above), including BAR subrange mapping support and
> >>     dw-edma fixes.
> >>   - Add PCI endpoint test coverage for the remote embedded-DMA path:
> >>     - extend pci-epf-test / pci_endpoint_test
> >>     - add a kselftest variant to exercise remote-eDMA transfers
> >>     Note: to keep the changes as small as possible, I added a few #ifdefs
> >>     in the main test code. Feedback on whether/how/to what extent this
> >>     should be split into separate modules would be appreciated.
> >>   - Expand documentation (Documentation/driver-api/ntb.rst) to describe transport
> >>     variants, the new module structure, and the remote embedded-DMA data flow.
> >>   - Addressed other feedback from the RFC v3 thread.
> >>
> >> RFCv2->RFCv3 changes:
> >>   - Architecture
> >>     - Have EP side use its local write channels, while leaving RC side to
> >>       use remote read channels.
> >>     - Improved abstraction and encapsulation of HW-specific details.
> >>   - Added control/config region versioning for the vNTB/EPF control region
> >>     so that mismatched RC/EP kernels fail early instead of silently using an
> >>     incompatible layout.
> >>   - Reworked BAR subrange / multi-region mapping support:
> >>     - Dropped the v2 approach that added new inbound mapping ops in the EPC
> >>       core.
> >>     - Introduced `struct pci_epf_bar.submap` and extended DesignWare EP to
> >>       support BAR subrange inbound mapping via Address Match Mode IB iATU.
> >>     - pci-epf-vntb now provides a subrange mapping hint to the EPC driver
> >>       when offsets are used.
> >>   - Changed .get_pci_epc() to .get_private_data()
> >>   - Dropped two commits from RFC v2 that should be submitted separately:
> >>     (1) ntb_transport debugfs seq_file conversion
> >>     (2) DWC EP outbound iATU MSI mapping/cache fix (will be re-posted separately)
> >>   - Added documentation updates.
> >>   - Addressed assorted review nits from the RFC v2 thread (naming/structure).
> >>
> >> RFCv1->RFCv2 changes:
> >>   - Architecture
> >>     - Drop the generic interrupt backend + DW eDMA test-interrupt backend
> >>       approach and instead adopt the remote eDMA-backed ntb_transport mode
> >>       proposed by Frank Li. The BAR-sharing / mwN_offset / inbound
> >>       mapping (Address Match Mode) infrastructure from RFC v1 is largely
> >>       kept, with only minor refinements and code motion where necessary
> >>       to fit the new transport-mode design.
> >>   - For Patch 01
> >>     - Rework the array_index_nospec() conversion to address review
> >>       comments on "[RFC PATCH 01/25]".
> >>
> >> RFCv3: https://lore.kernel.org/all/20251217151609.3162665-1-den@valinux.co.jp/
> >> RFCv2: https://lore.kernel.org/all/20251129160405.2568284-1-den@valinux.co.jp/
> >> RFCv1: https://lore.kernel.org/all/20251023071916.901355-1-den@valinux.co.jp/
> >>
> >> Thank you for reviewing,
> >>
> >>
> >> Koichiro Den (38):
> >>   dmaengine: dw-edma: Export helper to get integrated register window
> >>   dmaengine: dw-edma: Add per-channel interrupt routing control
> >>   dmaengine: dw-edma: Poll completion when local IRQ handling is
> >>     disabled
> >>   dmaengine: dw-edma: Add notify-only channels support
> >>   dmaengine: dw-edma: Add a helper to query linked-list region
> >>   NTB: epf: Add mwN_offset support and config region versioning
> >>   NTB: epf: Reserve a subset of MSI vectors for non-NTB users
> >>   NTB: epf: Provide db_vector_count/db_vector_mask callbacks
> >>   NTB: core: Add mw_set_trans_ranges() for subrange programming
> >>   NTB: core: Add .get_private_data() to ntb_dev_ops
> >>   NTB: core: Add .get_dma_dev() to ntb_dev_ops
> >>   NTB: core: Add driver_override support for NTB devices
> >>   PCI: endpoint: pci-epf-vntb: Support BAR subrange mappings for MWs
> >>   PCI: endpoint: pci-epf-vntb: Implement .get_private_data() callback
> >>   PCI: endpoint: pci-epf-vntb: Implement .get_dma_dev()
> >>   NTB: ntb_transport: Move TX memory window setup into setup_qp_mw()
> >>   NTB: ntb_transport: Dynamically determine qp count
> >>   NTB: ntb_transport: Use ntb_get_dma_dev()
> >>   NTB: ntb_transport: Rename ntb_transport.c to ntb_transport_core.c
> >>   NTB: ntb_transport: Move internal types to ntb_transport_internal.h
> >>   NTB: ntb_transport: Export common helpers for modularization
> >>   NTB: ntb_transport: Split core library and default NTB client
> >>   NTB: ntb_transport: Add transport backend infrastructure
> >>   NTB: ntb_transport: Run ntb_set_mw() before link-up negotiation
> >>   NTB: hw: Add remote eDMA backend registry and DesignWare backend
> >>   NTB: ntb_transport: Add remote embedded-DMA transport client
> >>   ntb_netdev: Multi-queue support
> >>   iommu: ipmmu-vmsa: Add PCIe ch0 to devices_allowlist
> >>   iommu: ipmmu-vmsa: Add support for reserved regions
> >>   arm64: dts: renesas: Add Spider RC/EP DTs for NTB with remote DW PCIe
> >>     eDMA
> >>   NTB: epf: Add per-SoC quirk to cap MRRS for DWC eDMA (128B for R-Car)
> >>   NTB: epf: Add an additional memory window (MW2) barno mapping on
> >>     Renesas R-Car
> >>   Documentation: PCI: endpoint: pci-epf-vntb: Update and add mwN_offset
> >>     usage
> >>   Documentation: driver-api: ntb: Document remote embedded-DMA transport
> >>   PCI: endpoint: pci-epf-test: Add pci_epf_test_next_free_bar() helper
> >>   PCI: endpoint: pci-epf-test: Add remote eDMA-backed mode
> >>   misc: pci_endpoint_test: Add remote eDMA transfer test mode
> >>   selftests: pci_endpoint: Add remote eDMA transfer coverage
> >>
> >>  Documentation/PCI/endpoint/pci-vntb-howto.rst |   19 +-
> >>  Documentation/driver-api/ntb.rst              |  193 ++
> >>  arch/arm64/boot/dts/renesas/Makefile          |    2 +
> >>  .../boot/dts/renesas/r8a779f0-spider-ep.dts   |   37 +
> >>  .../boot/dts/renesas/r8a779f0-spider-rc.dts   |   52 +
> >>  drivers/dma/dw-edma/dw-edma-core.c            |  207 +-
> >>  drivers/dma/dw-edma/dw-edma-core.h            |   10 +
> >>  drivers/dma/dw-edma/dw-edma-v0-core.c         |   26 +-
> >>  drivers/iommu/ipmmu-vmsa.c                    |    7 +-
> >>  drivers/misc/pci_endpoint_test.c              |  633 +++++
> >>  drivers/net/ntb_netdev.c                      |  341 ++-
> >>  drivers/ntb/Kconfig                           |   13 +
> >>  drivers/ntb/Makefile                          |    2 +
> >>  drivers/ntb/core.c                            |   68 +
> >>  drivers/ntb/hw/Kconfig                        |    1 +
> >>  drivers/ntb/hw/Makefile                       |    1 +
> >>  drivers/ntb/hw/edma/Kconfig                   |   28 +
> >>  drivers/ntb/hw/edma/Makefile                  |    5 +
> >>  drivers/ntb/hw/edma/backend.c                 |   87 +
> >>  drivers/ntb/hw/edma/backend.h                 |  102 +
> >>  drivers/ntb/hw/edma/ntb_dw_edma.c             |  977 +++++++
> >>  drivers/ntb/hw/epf/ntb_hw_epf.c               |  199 +-
> >>  drivers/ntb/ntb_transport.c                   | 2458 +---------------
> >>  drivers/ntb/ntb_transport_core.c              | 2523 +++++++++++++++++
> >>  drivers/ntb/ntb_transport_edma.c              | 1110 ++++++++
> >>  drivers/ntb/ntb_transport_internal.h          |  261 ++
> >>  drivers/pci/controller/dwc/pcie-designware.c  |   26 +
> >>  drivers/pci/endpoint/functions/pci-epf-test.c |  497 +++-
> >>  drivers/pci/endpoint/functions/pci-epf-vntb.c |  380 ++-
> >>  include/linux/dma/edma.h                      |  106 +
> >>  include/linux/ntb.h                           |   88 +
> >>  include/uapi/linux/pcitest.h                  |    3 +-
> >>  .../pci_endpoint/pci_endpoint_test.c          |   17 +
> >>  33 files changed, 7855 insertions(+), 2624 deletions(-)
> >>  create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-ep.dts
> >>  create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-rc.dts
> >>  create mode 100644 drivers/ntb/hw/edma/Kconfig
> >>  create mode 100644 drivers/ntb/hw/edma/Makefile
> >>  create mode 100644 drivers/ntb/hw/edma/backend.c
> >>  create mode 100644 drivers/ntb/hw/edma/backend.h
> >>  create mode 100644 drivers/ntb/hw/edma/ntb_dw_edma.c
> >>  create mode 100644 drivers/ntb/ntb_transport_core.c
> >>  create mode 100644 drivers/ntb/ntb_transport_edma.c
> >>  create mode 100644 drivers/ntb/ntb_transport_internal.h
> >>
> > 
> 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v4 05/38] dmaengine: dw-edma: Add a helper to query linked-list region
  2026-01-21  1:38     ` Koichiro Den
@ 2026-01-21  8:41       ` Koichiro Den
  2026-01-21 15:24         ` Frank Li
  0 siblings, 1 reply; 68+ messages in thread
From: Koichiro Den @ 2026-01-21  8:41 UTC (permalink / raw)
  To: Frank Li
  Cc: dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi, linux-pci, linux-doc, linux-kernel, linux-renesas-soc,
	devicetree, dmaengine, iommu, ntb, netdev, linux-kselftest, arnd,
	gregkh, joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt,
	corbet, skhan, andriy.shevchenko, jbrunet, utkarsh02t

On Wed, Jan 21, 2026 at 10:38:53AM +0900, Koichiro Den wrote:
> On Sun, Jan 18, 2026 at 12:05:47PM -0500, Frank Li wrote:
> > On Sun, Jan 18, 2026 at 10:54:07PM +0900, Koichiro Den wrote:
> > > A remote eDMA provider may need to expose the linked-list (LL) memory
> > > region that was configured by platform glue (typically at boot), so the
> > > peer (host) can map it and operate the remote view of the controller.
> > >
> > > Export dw_edma_chan_get_ll_region() to return the LL region associated
> > > with a given dma_chan.
> > 
> > This information is passed from the DWC EPC driver. Is it possible to get
> > it from the EPC driver?
> 
> That makes sense, from an API-cleanliness perspective, thanks.
> I'll add a helper function dw_pcie_edma_get_ll_region() in
> drivers/pci/controller/dwc/pcie-designware.c, instead of the current
> dw_edma_chan_get_ll_region() in dw-edma-core.c.

Hi Frank,

I looked into exposing LL regions from the EPC driver side, but the key
issue is channel identification under possibly concurrent dmaengine users.
In practice, the only stable handle a consumer has is a pointer to struct
dma_chan, and the only reliable way to map that to the eDMA hardware
channel is via dw_edma_chan->id. I think an EPC-facing API would still need
that mapping in any case, so keeping the helper in dw-edma seems simpler
and more robust.
If you have another idea, I'd appreciate your insights.

Regards,
Koichiro

> 
> Thanks for the review,
> Koichiro
> 
> > 
> > Frank
> > >
> > > Signed-off-by: Koichiro Den <den@valinux.co.jp>
> > > ---
> > >  drivers/dma/dw-edma/dw-edma-core.c | 26 ++++++++++++++++++++++++++
> > >  include/linux/dma/edma.h           | 14 ++++++++++++++
> > >  2 files changed, 40 insertions(+)
> > >
> > > diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
> > > index 0eb8fc1dcc34..c4fb66a9b5f5 100644
> > > --- a/drivers/dma/dw-edma/dw-edma-core.c
> > > +++ b/drivers/dma/dw-edma/dw-edma-core.c
> > > @@ -1209,6 +1209,32 @@ int dw_edma_chan_register_notify(struct dma_chan *dchan,
> > >  }
> > >  EXPORT_SYMBOL_GPL(dw_edma_chan_register_notify);
> > >
> > > +int dw_edma_chan_get_ll_region(struct dma_chan *dchan,
> > > +			       struct dw_edma_region *region)
> > > +{
> > > +	struct dw_edma_chip *chip;
> > > +	struct dw_edma_chan *chan;
> > > +
> > > +	if (!dchan || !region || !dchan->device)
> > > +		return -ENODEV;
> > > +
> > > +	chan = dchan2dw_edma_chan(dchan);
> > > +	if (!chan)
> > > +		return -ENODEV;
> > > +
> > > +	chip = chan->dw->chip;
> > > +	if (!(chip->flags & DW_EDMA_CHIP_LOCAL))
> > > +		return -EINVAL;
> > > +
> > > +	if (chan->dir == EDMA_DIR_WRITE)
> > > +		*region = chip->ll_region_wr[chan->id];
> > > +	else
> > > +		*region = chip->ll_region_rd[chan->id];
> > > +
> > > +	return 0;
> > > +}
> > > +EXPORT_SYMBOL_GPL(dw_edma_chan_get_ll_region);
> > > +
> > >  MODULE_LICENSE("GPL v2");
> > >  MODULE_DESCRIPTION("Synopsys DesignWare eDMA controller core driver");
> > >  MODULE_AUTHOR("Gustavo Pimentel <gustavo.pimentel@synopsys.com>");
> > > diff --git a/include/linux/dma/edma.h b/include/linux/dma/edma.h
> > > index 3c538246de07..c9ec426e27ec 100644
> > > --- a/include/linux/dma/edma.h
> > > +++ b/include/linux/dma/edma.h
> > > @@ -153,6 +153,14 @@ bool dw_edma_chan_ignore_irq(struct dma_chan *chan);
> > >  int dw_edma_chan_register_notify(struct dma_chan *chan,
> > >  				 void (*cb)(struct dma_chan *chan, void *user),
> > >  				 void *user);
> > > +
> > > +/**
> > > + * dw_edma_chan_get_ll_region - get linked list (LL) memory for a dma_chan
> > > + * @chan: the target DMA channel
> > > + * @region: output parameter returning the corresponding LL region
> > > + */
> > > +int dw_edma_chan_get_ll_region(struct dma_chan *chan,
> > > +			       struct dw_edma_region *region);
> > >  #else
> > >  static inline int dw_edma_probe(struct dw_edma_chip *chip)
> > >  {
> > > @@ -182,6 +190,12 @@ static inline int dw_edma_chan_register_notify(struct dma_chan *chan,
> > >  {
> > >  	return -ENODEV;
> > >  }
> > > +
> > > +static inline int dw_edma_chan_get_ll_region(struct dma_chan *chan,
> > > +					     struct dw_edma_region *region)
> > > +{
> > > +	return -EINVAL;
> > > +}
> > >  #endif /* CONFIG_DW_EDMA */
> > >
> > >  struct pci_epc;
> > > --
> > > 2.51.0
> > >


* Re: [RFC PATCH v4 05/38] dmaengine: dw-edma: Add a helper to query linked-list region
  2026-01-21  8:41       ` Koichiro Den
@ 2026-01-21 15:24         ` Frank Li
  2026-01-22  1:19           ` Koichiro Den
  0 siblings, 1 reply; 68+ messages in thread
From: Frank Li @ 2026-01-21 15:24 UTC (permalink / raw)
  To: Koichiro Den
  Cc: dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi, linux-pci, linux-doc, linux-kernel, linux-renesas-soc,
	devicetree, dmaengine, iommu, ntb, netdev, linux-kselftest, arnd,
	gregkh, joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt,
	corbet, skhan, andriy.shevchenko, jbrunet, utkarsh02t

On Wed, Jan 21, 2026 at 05:41:11PM +0900, Koichiro Den wrote:
> On Wed, Jan 21, 2026 at 10:38:53AM +0900, Koichiro Den wrote:
> > On Sun, Jan 18, 2026 at 12:05:47PM -0500, Frank Li wrote:
> > > On Sun, Jan 18, 2026 at 10:54:07PM +0900, Koichiro Den wrote:
> > > > A remote eDMA provider may need to expose the linked-list (LL) memory
> > > > region that was configured by platform glue (typically at boot), so the
> > > > peer (host) can map it and operate the remote view of the controller.
> > > >
> > > > Export dw_edma_chan_get_ll_region() to return the LL region associated
> > > > with a given dma_chan.
> > >
> > > This information is passed from the DWC EPC driver. Is it possible to
> > > get it from the EPC driver?
> >
> > That makes sense, from an API-cleanliness perspective, thanks.
> > I'll add a helper function dw_pcie_edma_get_ll_region() in
> > drivers/pci/controller/dwc/pcie-designware.c, instead of the current
> > dw_edma_chan_get_ll_region() in dw-edma-core.c.
>
> Hi Frank,
>
> I looked into exposing LL regions from the EPC driver side, but the key
> issue is channel identification under possibly concurrent dmaengine users.
> In practice, the only stable handle a consumer has is a pointer to struct
> dma_chan, and the only reliable way to map that to the eDMA hardware
> channel is via dw_edma_chan->id.

If possible, I suggest changing to one page per channel, so there is a fixed
LL mapping.

> I think an EPC-facing API would still need
> that mapping in any case, so keeping the helper in dw-edma seems simpler
> and more robust.
> If you have another idea, I'd appreciate your insights.

I suggest adding a generic DMA engine API to get such a property, something
like an ioctl-style dma_get_config().

Frank

>
> Regards,
> Koichiro
>
> >
> > Thanks for the review,
> > Koichiro
> >
> > >
> > > Frank
> > > >
> > > > Signed-off-by: Koichiro Den <den@valinux.co.jp>
> > > > ---
> > > >  drivers/dma/dw-edma/dw-edma-core.c | 26 ++++++++++++++++++++++++++
> > > >  include/linux/dma/edma.h           | 14 ++++++++++++++
> > > >  2 files changed, 40 insertions(+)
> > > >
> > > > diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
> > > > index 0eb8fc1dcc34..c4fb66a9b5f5 100644
> > > > --- a/drivers/dma/dw-edma/dw-edma-core.c
> > > > +++ b/drivers/dma/dw-edma/dw-edma-core.c
> > > > @@ -1209,6 +1209,32 @@ int dw_edma_chan_register_notify(struct dma_chan *dchan,
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(dw_edma_chan_register_notify);
> > > >
> > > > +int dw_edma_chan_get_ll_region(struct dma_chan *dchan,
> > > > +			       struct dw_edma_region *region)
> > > > +{
> > > > +	struct dw_edma_chip *chip;
> > > > +	struct dw_edma_chan *chan;
> > > > +
> > > > +	if (!dchan || !region || !dchan->device)
> > > > +		return -ENODEV;
> > > > +
> > > > +	chan = dchan2dw_edma_chan(dchan);
> > > > +	if (!chan)
> > > > +		return -ENODEV;
> > > > +
> > > > +	chip = chan->dw->chip;
> > > > +	if (!(chip->flags & DW_EDMA_CHIP_LOCAL))
> > > > +		return -EINVAL;
> > > > +
> > > > +	if (chan->dir == EDMA_DIR_WRITE)
> > > > +		*region = chip->ll_region_wr[chan->id];
> > > > +	else
> > > > +		*region = chip->ll_region_rd[chan->id];
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(dw_edma_chan_get_ll_region);
> > > > +
> > > >  MODULE_LICENSE("GPL v2");
> > > >  MODULE_DESCRIPTION("Synopsys DesignWare eDMA controller core driver");
> > > >  MODULE_AUTHOR("Gustavo Pimentel <gustavo.pimentel@synopsys.com>");
> > > > diff --git a/include/linux/dma/edma.h b/include/linux/dma/edma.h
> > > > index 3c538246de07..c9ec426e27ec 100644
> > > > --- a/include/linux/dma/edma.h
> > > > +++ b/include/linux/dma/edma.h
> > > > @@ -153,6 +153,14 @@ bool dw_edma_chan_ignore_irq(struct dma_chan *chan);
> > > >  int dw_edma_chan_register_notify(struct dma_chan *chan,
> > > >  				 void (*cb)(struct dma_chan *chan, void *user),
> > > >  				 void *user);
> > > > +
> > > > +/**
> > > > + * dw_edma_chan_get_ll_region - get linked list (LL) memory for a dma_chan
> > > > + * @chan: the target DMA channel
> > > > + * @region: output parameter returning the corresponding LL region
> > > > + */
> > > > +int dw_edma_chan_get_ll_region(struct dma_chan *chan,
> > > > +			       struct dw_edma_region *region);
> > > >  #else
> > > >  static inline int dw_edma_probe(struct dw_edma_chip *chip)
> > > >  {
> > > > @@ -182,6 +190,12 @@ static inline int dw_edma_chan_register_notify(struct dma_chan *chan,
> > > >  {
> > > >  	return -ENODEV;
> > > >  }
> > > > +
> > > > +static inline int dw_edma_chan_get_ll_region(struct dma_chan *chan,
> > > > +					     struct dw_edma_region *region)
> > > > +{
> > > > +	return -EINVAL;
> > > > +}
> > > >  #endif /* CONFIG_DW_EDMA */
> > > >
> > > >  struct pci_epc;
> > > > --
> > > > 2.51.0
> > > >


* Re: [RFC PATCH v4 02/38] dmaengine: dw-edma: Add per-channel interrupt routing control
  2026-01-19 14:26     ` Koichiro Den
@ 2026-01-21 16:02       ` Vinod Koul
  2026-01-22  2:44         ` Koichiro Den
  2026-01-23 15:44         ` Frank Li
  0 siblings, 2 replies; 68+ messages in thread
From: Vinod Koul @ 2026-01-21 16:02 UTC (permalink / raw)
  To: Koichiro Den
  Cc: Frank Li, dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, jdmason, allenbh, jingoohan1, lpieralisi,
	linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

On 19-01-26, 23:26, Koichiro Den wrote:
> On Sun, Jan 18, 2026 at 12:03:19PM -0500, Frank Li wrote:
> > On Sun, Jan 18, 2026 at 10:54:04PM +0900, Koichiro Den wrote:
> > > DesignWare EP eDMA can generate interrupts both locally and remotely
> > > (LIE/RIE). Remote eDMA users need to decide, per channel, whether
> > > completions should be handled locally, remotely, or both. Unless
> > > carefully configured, the endpoint and host would race to ack the
> > > interrupt.
> > >
> > > Introduce a per-channel interrupt routing mode and export small APIs to
> > > configure and query it. Update v0 programming so that RIE and local
> > > done/abort interrupt masking follow the selected mode. The default mode
> > > keeps the original behavior, so unless the new APIs are explicitly used,
> > > there are no functional changes.
> > >
> > > Signed-off-by: Koichiro Den <den@valinux.co.jp>
> > > ---
> > >  drivers/dma/dw-edma/dw-edma-core.c    | 52 +++++++++++++++++++++++++++
> > >  drivers/dma/dw-edma/dw-edma-core.h    |  2 ++
> > >  drivers/dma/dw-edma/dw-edma-v0-core.c | 26 +++++++++-----
> > >  include/linux/dma/edma.h              | 44 +++++++++++++++++++++++
> > >  4 files changed, 116 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
> > > index b9d59c3c0cb4..059b3996d383 100644
> > > --- a/drivers/dma/dw-edma/dw-edma-core.c
> > > +++ b/drivers/dma/dw-edma/dw-edma-core.c
> > > @@ -768,6 +768,7 @@ static int dw_edma_channel_setup(struct dw_edma *dw, u32 wr_alloc, u32 rd_alloc)
> > >  		chan->configured = false;
> > >  		chan->request = EDMA_REQ_NONE;
> > >  		chan->status = EDMA_ST_IDLE;
> > > +		chan->irq_mode = DW_EDMA_CH_IRQ_DEFAULT;
> > >
> > >  		if (chan->dir == EDMA_DIR_WRITE)
> > >  			chan->ll_max = (chip->ll_region_wr[chan->id].sz / EDMA_LL_SZ);
> > > @@ -1062,6 +1063,57 @@ int dw_edma_remove(struct dw_edma_chip *chip)
> > >  }
> > >  EXPORT_SYMBOL_GPL(dw_edma_remove);
> > >
> > > +int dw_edma_chan_irq_config(struct dma_chan *dchan,
> > > +			    enum dw_edma_ch_irq_mode mode)
> > > +{
> > > +	struct dw_edma_chan *chan;
> > > +
> > > +	switch (mode) {
> > > +	case DW_EDMA_CH_IRQ_DEFAULT:
> > > +	case DW_EDMA_CH_IRQ_LOCAL:
> > > +	case DW_EDMA_CH_IRQ_REMOTE:
> > > +		break;
> > > +	default:
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	if (!dchan || !dchan->device)
> > > +		return -ENODEV;
> > > +
> > > +	chan = dchan2dw_edma_chan(dchan);
> > > +	if (!chan)
> > > +		return -ENODEV;
> > > +
> > > +	chan->irq_mode = mode;
> > > +
> > > +	dev_vdbg(chan->dw->chip->dev, "Channel: %s[%u] set irq_mode=%u\n",
> > > +		 str_write_read(chan->dir == EDMA_DIR_WRITE),
> > > +		 chan->id, mode);
> > > +
> > > +	return 0;
> > > +}
> > > +EXPORT_SYMBOL_GPL(dw_edma_chan_irq_config);
> > > +
> > > +bool dw_edma_chan_ignore_irq(struct dma_chan *dchan)
> > > +{
> > > +	struct dw_edma_chan *chan;
> > > +	struct dw_edma *dw;
> > > +
> > > +	if (!dchan || !dchan->device)
> > > +		return false;
> > > +
> > > +	chan = dchan2dw_edma_chan(dchan);
> > > +	if (!chan)
> > > +		return false;
> > > +
> > > +	dw = chan->dw;
> > > +	if (dw->chip->flags & DW_EDMA_CHIP_LOCAL)
> > > +		return chan->irq_mode == DW_EDMA_CH_IRQ_REMOTE;
> > > +	else
> > > +		return chan->irq_mode == DW_EDMA_CH_IRQ_LOCAL;
> > > +}
> > > +EXPORT_SYMBOL_GPL(dw_edma_chan_ignore_irq);
> > > +
> > >  MODULE_LICENSE("GPL v2");
> > >  MODULE_DESCRIPTION("Synopsys DesignWare eDMA controller core driver");
> > >  MODULE_AUTHOR("Gustavo Pimentel <gustavo.pimentel@synopsys.com>");
> > > diff --git a/drivers/dma/dw-edma/dw-edma-core.h b/drivers/dma/dw-edma/dw-edma-core.h
> > > index 71894b9e0b15..8458d676551a 100644
> > > --- a/drivers/dma/dw-edma/dw-edma-core.h
> > > +++ b/drivers/dma/dw-edma/dw-edma-core.h
> > > @@ -81,6 +81,8 @@ struct dw_edma_chan {
> > >
> > >  	struct msi_msg			msi;
> > >
> > > +	enum dw_edma_ch_irq_mode	irq_mode;
> > > +
> > >  	enum dw_edma_request		request;
> > >  	enum dw_edma_status		status;
> > >  	u8				configured;
> > > diff --git a/drivers/dma/dw-edma/dw-edma-v0-core.c b/drivers/dma/dw-edma/dw-edma-v0-core.c
> > > index 2850a9df80f5..80472148c335 100644
> > > --- a/drivers/dma/dw-edma/dw-edma-v0-core.c
> > > +++ b/drivers/dma/dw-edma/dw-edma-v0-core.c
> > > @@ -256,8 +256,10 @@ dw_edma_v0_core_handle_int(struct dw_edma_irq *dw_irq, enum dw_edma_dir dir,
> > >  	for_each_set_bit(pos, &val, total) {
> > >  		chan = &dw->chan[pos + off];
> > >
> > > -		dw_edma_v0_core_clear_done_int(chan);
> > > -		done(chan);
> > > +		if (!dw_edma_chan_ignore_irq(&chan->vc.chan)) {
> > > +			dw_edma_v0_core_clear_done_int(chan);
> > > +			done(chan);
> > > +		}
> > >
> > >  		ret = IRQ_HANDLED;
> > >  	}
> > > @@ -267,8 +269,10 @@ dw_edma_v0_core_handle_int(struct dw_edma_irq *dw_irq, enum dw_edma_dir dir,
> > >  	for_each_set_bit(pos, &val, total) {
> > >  		chan = &dw->chan[pos + off];
> > >
> > > -		dw_edma_v0_core_clear_abort_int(chan);
> > > -		abort(chan);
> > > +		if (!dw_edma_chan_ignore_irq(&chan->vc.chan)) {
> > > +			dw_edma_v0_core_clear_abort_int(chan);
> > > +			abort(chan);
> > > +		}
> > >
> > >  		ret = IRQ_HANDLED;
> > >  	}
> > > @@ -331,7 +335,8 @@ static void dw_edma_v0_core_write_chunk(struct dw_edma_chunk *chunk)
> > >  		j--;
> > >  		if (!j) {
> > >  			control |= DW_EDMA_V0_LIE;
> > > -			if (!(chan->dw->chip->flags & DW_EDMA_CHIP_LOCAL))
> > > +			if (!(chan->dw->chip->flags & DW_EDMA_CHIP_LOCAL) &&
> > > +			    chan->irq_mode != DW_EDMA_CH_IRQ_LOCAL)
> > >  				control |= DW_EDMA_V0_RIE;
> > >  		}
> > >
> > > @@ -408,12 +413,17 @@ static void dw_edma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
> > >  				break;
> > >  			}
> > >  		}
> > > -		/* Interrupt unmask - done, abort */
> > > +		/* Interrupt mask/unmask - done, abort */
> > >  		raw_spin_lock_irqsave(&dw->lock, flags);
> > >
> > >  		tmp = GET_RW_32(dw, chan->dir, int_mask);
> > > -		tmp &= ~FIELD_PREP(EDMA_V0_DONE_INT_MASK, BIT(chan->id));
> > > -		tmp &= ~FIELD_PREP(EDMA_V0_ABORT_INT_MASK, BIT(chan->id));
> > > +		if (chan->irq_mode == DW_EDMA_CH_IRQ_REMOTE) {
> > > +			tmp |= FIELD_PREP(EDMA_V0_DONE_INT_MASK, BIT(chan->id));
> > > +			tmp |= FIELD_PREP(EDMA_V0_ABORT_INT_MASK, BIT(chan->id));
> > > +		} else {
> > > +			tmp &= ~FIELD_PREP(EDMA_V0_DONE_INT_MASK, BIT(chan->id));
> > > +			tmp &= ~FIELD_PREP(EDMA_V0_ABORT_INT_MASK, BIT(chan->id));
> > > +		}
> > >  		SET_RW_32(dw, chan->dir, int_mask, tmp);
> > >  		/* Linked list error */
> > >  		tmp = GET_RW_32(dw, chan->dir, linked_list_err_en);
> > > diff --git a/include/linux/dma/edma.h b/include/linux/dma/edma.h
> > > index ffad10ff2cd6..6f50165ac084 100644
> > > --- a/include/linux/dma/edma.h
> > > +++ b/include/linux/dma/edma.h
> > > @@ -60,6 +60,23 @@ enum dw_edma_chip_flags {
> > >  	DW_EDMA_CHIP_LOCAL	= BIT(0),
> > >  };
> > >
> > > +/*
> > > + * enum dw_edma_ch_irq_mode - per-channel interrupt routing control
> > > + * @DW_EDMA_CH_IRQ_DEFAULT:   LIE=1/RIE=1, local interrupt unmasked
> > > + * @DW_EDMA_CH_IRQ_LOCAL:     LIE=1/RIE=0
> > > + * @DW_EDMA_CH_IRQ_REMOTE:    LIE=1/RIE=1, local interrupt masked
> > > + *
> > > + * Some implementations require using LIE=1/RIE=1 with the local interrupt
> > > + * masked to generate a remote-only interrupt (rather than LIE=0/RIE=1).
> > > + * See the DesignWare endpoint databook 5.40, "Hint" below "Figure 8-22
> > > + * Write Interrupt Generation".
> > > + */
> > > +enum dw_edma_ch_irq_mode {
> > > +	DW_EDMA_CH_IRQ_DEFAULT	= 0,
> > > +	DW_EDMA_CH_IRQ_LOCAL,
> > > +	DW_EDMA_CH_IRQ_REMOTE,
> > > +};
> > > +
> > >  /**
> > >   * struct dw_edma_chip - representation of DesignWare eDMA controller hardware
> > >   * @dev:		 struct device of the eDMA controller
> > > @@ -105,6 +122,22 @@ struct dw_edma_chip {
> > >  #if IS_REACHABLE(CONFIG_DW_EDMA)
> > >  int dw_edma_probe(struct dw_edma_chip *chip);
> > >  int dw_edma_remove(struct dw_edma_chip *chip);
> > > +/**
> > > + * dw_edma_chan_irq_config - configure per-channel interrupt routing
> > > + * @chan: DMA channel obtained from dma_request_channel()
> > > + * @mode: interrupt routing mode
> > > + *
> > > + * Returns 0 on success, -EINVAL for invalid @mode, or -ENODEV if @chan does
> > > + * not belong to the DesignWare eDMA driver.
> > > + */
> > > +int dw_edma_chan_irq_config(struct dma_chan *chan,
> > > +			    enum dw_edma_ch_irq_mode mode);
> > > +
> > > +/**
> > > + * dw_edma_chan_ignore_irq - tell whether local IRQ handling should be ignored
> > > + * @chan: DMA channel obtained from dma_request_channel()
> > > + */
> > > +bool dw_edma_chan_ignore_irq(struct dma_chan *chan);
> > >  #else
> > >  static inline int dw_edma_probe(struct dw_edma_chip *chip)
> > >  {
> > > @@ -115,6 +148,17 @@ static inline int dw_edma_remove(struct dw_edma_chip *chip)
> > >  {
> > >  	return 0;
> > >  }
> > > +
> > > +static inline int dw_edma_chan_irq_config(struct dma_chan *chan,
> > > +					  enum dw_edma_ch_irq_mode mode)
> > > +{
> > > +	return -ENODEV;
> > > +}
> > > +
> > > +static inline bool dw_edma_chan_ignore_irq(struct dma_chan *chan)
> > > +{
> > > +	return false;
> > > +}
> > 
> > I think it'd better go through
> > 
> > struct dma_slave_config {
> > 	...
> >         void *peripheral_config;
> > 	size_t peripheral_size;
> > 
> > };
> > 
> > So the DMA consumer can use the standard DMAengine API, dmaengine_slave_config().
> 
> Using .peripheral_config wasn't something I had initially considered, but I
> agree that this is preferable in the sense that it avoids introducing the
> additional exported APIs. I'm not entirely sure whether it's clean to use
> it for non-peripheral settings in the strict sense, but there seem to be
> precedents such as stm32_mdma_dma_config, so it seems acceptable.
> If I'm missing something, please correct me.

Strictly speaking, slave config should be used for peripheral transfers.
For memcpy users (this seems more like that), I would argue slave config
does not make much sense.
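
For illustration, a consumer-side sketch of the peripheral_config route
suggested above might look like the following. This is a simplified
user-space mock, not kernel code: only the .peripheral_config and
.peripheral_size fields mirror the real struct dma_slave_config, and the
struct dw_edma_peripheral_config name and the provider-side helper are
hypothetical stand-ins for what a dw-edma device_config hook could do.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct dma_slave_config; only
 * the two fields under discussion are modeled here. */
struct dma_slave_config {
	void *peripheral_config;
	size_t peripheral_size;
};

/* Mirrors the enum proposed in the patch. */
enum dw_edma_ch_irq_mode {
	DW_EDMA_CH_IRQ_DEFAULT = 0,
	DW_EDMA_CH_IRQ_LOCAL,
	DW_EDMA_CH_IRQ_REMOTE,
};

/* Hypothetical driver-private config the consumer would attach. */
struct dw_edma_peripheral_config {
	enum dw_edma_ch_irq_mode irq_mode;
};

/* Provider side: validate size and mode, then latch the per-channel
 * irq_mode (what the dw-edma .device_config hook might do). */
static int dw_edma_apply_peripheral_config(const struct dma_slave_config *cfg,
					   enum dw_edma_ch_irq_mode *irq_mode)
{
	const struct dw_edma_peripheral_config *pcfg = cfg->peripheral_config;

	if (!pcfg || cfg->peripheral_size != sizeof(*pcfg))
		return -1;	/* -EINVAL in the kernel */
	if (pcfg->irq_mode > DW_EDMA_CH_IRQ_REMOTE)
		return -1;
	*irq_mode = pcfg->irq_mode;
	return 0;
}
```

The consumer would then call dmaengine_slave_config() with the private
struct attached, instead of a driver-specific exported API.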

-- 
~Vinod

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v4 05/38] dmaengine: dw-edma: Add a helper to query linked-list region
  2026-01-21 15:24         ` Frank Li
@ 2026-01-22  1:19           ` Koichiro Den
  2026-01-22  1:54             ` Frank Li
  0 siblings, 1 reply; 68+ messages in thread
From: Koichiro Den @ 2026-01-22  1:19 UTC (permalink / raw)
  To: Frank Li
  Cc: dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi, linux-pci, linux-doc, linux-kernel, linux-renesas-soc,
	devicetree, dmaengine, iommu, ntb, netdev, linux-kselftest, arnd,
	gregkh, joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt,
	corbet, skhan, andriy.shevchenko, jbrunet, utkarsh02t

On Wed, Jan 21, 2026 at 10:24:01AM -0500, Frank Li wrote:
> On Wed, Jan 21, 2026 at 05:41:11PM +0900, Koichiro Den wrote:
> > On Wed, Jan 21, 2026 at 10:38:53AM +0900, Koichiro Den wrote:
> > > On Sun, Jan 18, 2026 at 12:05:47PM -0500, Frank Li wrote:
> > > > On Sun, Jan 18, 2026 at 10:54:07PM +0900, Koichiro Den wrote:
> > > > > A remote eDMA provider may need to expose the linked-list (LL) memory
> > > > > region that was configured by platform glue (typically at boot), so the
> > > > > peer (host) can map it and operate the remote view of the controller.
> > > > >
> > > > > Export dw_edma_chan_get_ll_region() to return the LL region associated
> > > > > with a given dma_chan.
> > > >
> > > > This information is passed from the DWC EPC driver. Is it possible to get it
> > > > from the EPC driver?
> > >
> > > That makes sense, from an API cleanness perspective, thanks.
> > > I'll add a helper function dw_pcie_edma_get_ll_region() in
> > > drivers/pci/controller/dwc/pcie-designware.c, instead of the current
> > > dw_edma_chan_get_ll_region() in dw-edma-core.c.
> >
> > Hi Frank,
> >
> > I looked into exposing LL regions from the EPC driver side, but the key
> > issue is channel identification under possibly concurrent dmaengine users.
> > In practice, the only stable handle a consumer has is a pointer to struct
> > dma_chan, and the only reliable way to map that to the eDMA hardware
> > channel is via dw_edma_chan->id.
> 
> If possible, I suggest changing to one page per channel, so there is a fixed
> LL mapping.

I agree that this would make the LL layout more deterministic and would
indeed simplify locating the region for a given dw_edma_chan ID. That said,
my concern was that even with a fixed per-channel layout, we still need a
reliable way to map a struct dma_chan obtained by a consumer to the
corresponding dw_edma_chan ID, especially in the presence of potentially
concurrent dmaengine users.

> 
> > I think an EPC-facing API would still need
> > that mapping in any case, so keeping the helper in dw-edma seems simpler
> > and more robust.
> > If you have another idea, I'd appreciate your insights.
> 
> I suggest adding a general DMA engine API to get such a property, something
> like a kind of ioctl, e.g. dma_get_config().

I think such a helper, combined with your one page per-channel idea, would
resolve the issue cleanly. For example, a helper like dma_get_hw_info()
returning struct dma_hw_info, whose first field is a hw_id, could work
well. Consumers could then use this helper, and if they know they are
dealing with a dw-edma channel, they can derive the LL location
straightforwardly as {hw_id * fixed_stride (e.g. PAGE_SIZE)}. Adding hw_id
to struct dma_slave_caps would make the necessary diff smaller, but I think
it would not semantically fit in the structure.
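
For illustration, the lookup that combination would enable might look like
this sketch. Everything here is hypothetical: dma_get_hw_info() and
struct dma_hw_info are the proposed (not yet existing) helper, and the
fixed 4 KiB stride stands in for the one-page-per-channel layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed per-channel LL slot size (e.g. PAGE_SIZE), per the
 * one-page-per-channel layout suggested above. */
#define LL_STRIDE 4096u

/* Hypothetical result of the proposed dma_get_hw_info() helper; hw_id
 * is the eDMA hardware channel id, the first field as proposed. */
struct dma_hw_info {
	uint32_t hw_id;
};

/* With a fixed layout, a consumer that knows it holds a dw-edma channel
 * can derive the channel's LL offset within the exposed LL window. */
static uint64_t ll_offset(const struct dma_hw_info *info)
{
	return (uint64_t)info->hw_id * LL_STRIDE;
}
```

The per-channel LL region would then be the stride-sized slot starting at
that offset, with no need for a dw-edma-specific exported API.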

Thanks,
Koichiro

> 
> Frank
> 
> >
> > Regards,
> > Koichiro
> >
> > >
> > > Thanks for the review,
> > > Koichiro
> > >
> > > >
> > > > Frank
> > > > >
> > > > > Signed-off-by: Koichiro Den <den@valinux.co.jp>
> > > > > ---
> > > > >  drivers/dma/dw-edma/dw-edma-core.c | 26 ++++++++++++++++++++++++++
> > > > >  include/linux/dma/edma.h           | 14 ++++++++++++++
> > > > >  2 files changed, 40 insertions(+)
> > > > >
> > > > > diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
> > > > > index 0eb8fc1dcc34..c4fb66a9b5f5 100644
> > > > > --- a/drivers/dma/dw-edma/dw-edma-core.c
> > > > > +++ b/drivers/dma/dw-edma/dw-edma-core.c
> > > > > @@ -1209,6 +1209,32 @@ int dw_edma_chan_register_notify(struct dma_chan *dchan,
> > > > >  }
> > > > >  EXPORT_SYMBOL_GPL(dw_edma_chan_register_notify);
> > > > >
> > > > > +int dw_edma_chan_get_ll_region(struct dma_chan *dchan,
> > > > > +			       struct dw_edma_region *region)
> > > > > +{
> > > > > +	struct dw_edma_chip *chip;
> > > > > +	struct dw_edma_chan *chan;
> > > > > +
> > > > > +	if (!dchan || !region || !dchan->device)
> > > > > +		return -ENODEV;
> > > > > +
> > > > > +	chan = dchan2dw_edma_chan(dchan);
> > > > > +	if (!chan)
> > > > > +		return -ENODEV;
> > > > > +
> > > > > +	chip = chan->dw->chip;
> > > > > +	if (!(chip->flags & DW_EDMA_CHIP_LOCAL))
> > > > > +		return -EINVAL;
> > > > > +
> > > > > +	if (chan->dir == EDMA_DIR_WRITE)
> > > > > +		*region = chip->ll_region_wr[chan->id];
> > > > > +	else
> > > > > +		*region = chip->ll_region_rd[chan->id];
> > > > > +
> > > > > +	return 0;
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(dw_edma_chan_get_ll_region);
> > > > > +
> > > > >  MODULE_LICENSE("GPL v2");
> > > > >  MODULE_DESCRIPTION("Synopsys DesignWare eDMA controller core driver");
> > > > >  MODULE_AUTHOR("Gustavo Pimentel <gustavo.pimentel@synopsys.com>");
> > > > > diff --git a/include/linux/dma/edma.h b/include/linux/dma/edma.h
> > > > > index 3c538246de07..c9ec426e27ec 100644
> > > > > --- a/include/linux/dma/edma.h
> > > > > +++ b/include/linux/dma/edma.h
> > > > > @@ -153,6 +153,14 @@ bool dw_edma_chan_ignore_irq(struct dma_chan *chan);
> > > > >  int dw_edma_chan_register_notify(struct dma_chan *chan,
> > > > >  				 void (*cb)(struct dma_chan *chan, void *user),
> > > > >  				 void *user);
> > > > > +
> > > > > +/**
> > > > > + * dw_edma_chan_get_ll_region - get linked list (LL) memory for a dma_chan
> > > > > + * @chan: the target DMA channel
> > > > > + * @region: output parameter returning the corresponding LL region
> > > > > + */
> > > > > +int dw_edma_chan_get_ll_region(struct dma_chan *chan,
> > > > > +			       struct dw_edma_region *region);
> > > > >  #else
> > > > >  static inline int dw_edma_probe(struct dw_edma_chip *chip)
> > > > >  {
> > > > > @@ -182,6 +190,12 @@ static inline int dw_edma_chan_register_notify(struct dma_chan *chan,
> > > > >  {
> > > > >  	return -ENODEV;
> > > > >  }
> > > > > +
> > > > > +static inline int dw_edma_chan_get_ll_region(struct dma_chan *chan,
> > > > > +					     struct dw_edma_region *region)
> > > > > +{
> > > > > +	return -EINVAL;
> > > > > +}
> > > > >  #endif /* CONFIG_DW_EDMA */
> > > > >
> > > > >  struct pci_epc;
> > > > > --
> > > > > 2.51.0
> > > > >

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v4 05/38] dmaengine: dw-edma: Add a helper to query linked-list region
  2026-01-22  1:19           ` Koichiro Den
@ 2026-01-22  1:54             ` Frank Li
  0 siblings, 0 replies; 68+ messages in thread
From: Frank Li @ 2026-01-22  1:54 UTC (permalink / raw)
  To: Koichiro Den
  Cc: dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi, linux-pci, linux-doc, linux-kernel, linux-renesas-soc,
	devicetree, dmaengine, iommu, ntb, netdev, linux-kselftest, arnd,
	gregkh, joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt,
	corbet, skhan, andriy.shevchenko, jbrunet, utkarsh02t

On Thu, Jan 22, 2026 at 10:19:24AM +0900, Koichiro Den wrote:
> On Wed, Jan 21, 2026 at 10:24:01AM -0500, Frank Li wrote:
> > On Wed, Jan 21, 2026 at 05:41:11PM +0900, Koichiro Den wrote:
> > > On Wed, Jan 21, 2026 at 10:38:53AM +0900, Koichiro Den wrote:
> > > > On Sun, Jan 18, 2026 at 12:05:47PM -0500, Frank Li wrote:
> > > > > On Sun, Jan 18, 2026 at 10:54:07PM +0900, Koichiro Den wrote:
> > > > > > A remote eDMA provider may need to expose the linked-list (LL) memory
> > > > > > region that was configured by platform glue (typically at boot), so the
> > > > > > peer (host) can map it and operate the remote view of the controller.
> > > > > >
> > > > > > Export dw_edma_chan_get_ll_region() to return the LL region associated
> > > > > > with a given dma_chan.
> > > > >
> > > > > This information is passed from the DWC EPC driver. Is it possible to get it
> > > > > from the EPC driver?
> > > >
> > > > That makes sense, from an API cleanness perspective, thanks.
> > > > I'll add a helper function dw_pcie_edma_get_ll_region() in
> > > > drivers/pci/controller/dwc/pcie-designware.c, instead of the current
> > > > dw_edma_chan_get_ll_region() in dw-edma-core.c.
> > >
> > > Hi Frank,
> > >
> > > I looked into exposing LL regions from the EPC driver side, but the key
> > > issue is channel identification under possibly concurrent dmaengine users.
> > > In practice, the only stable handle a consumer has is a pointer to struct
> > > dma_chan, and the only reliable way to map that to the eDMA hardware
> > > channel is via dw_edma_chan->id.
> >
> > If possible, I suggest changing to one page per channel, so there is a fixed
> > LL mapping.
>
> I agree that this would make the LL layout more deterministic and would
> indeed simplify locating the region for a given dw_edma_chan ID. That said,
> my concern was that even with a fixed per-channel layout, we still need a
> reliable way to map a struct dma_chan obtained by a consumer to the
> corresponding dw_edma_chan ID, especially in the presence of potentially
> concurrent dmaengine users.
>
> >
> > > I think an EPC-facing API would still need
> > > that mapping in any case, so keeping the helper in dw-edma seems simpler
> > > and more robust.
> > > If you have another idea, I'd appreciate your insights.
> >
> > I suggest adding a general DMA engine API to get such a property, something
> > like a kind of ioctl, e.g. dma_get_config().
>
> I think such a helper, combined with your one page per-channel idea, would
> resolve the issue cleanly. For example, a helper like dma_get_hw_info()
> returning struct dma_hw_info, whose first field is a hw_id, could work
> well. Consumers could then use this helper, and if they know they are
> dealing with a dw-edma channel, they can derive the LL location
> straightforwardly as {hw_id * fixed_stride (e.g. PAGE_SIZE)}. Adding hw_id
> to struct dma_slave_caps would make the necessary diff smaller, but I think
> it would not semantically fit in the structure.

It is worth a try.

Frank
>
> Thanks,
> Koichiro
>
> >
> > Frank
> >
> > >
> > > Regards,
> > > Koichiro
> > >
> > > >
> > > > Thanks for the review,
> > > > Koichiro
> > > >
> > > > >
> > > > > Frank
> > > > > >
> > > > > > Signed-off-by: Koichiro Den <den@valinux.co.jp>
> > > > > > ---
> > > > > >  drivers/dma/dw-edma/dw-edma-core.c | 26 ++++++++++++++++++++++++++
> > > > > >  include/linux/dma/edma.h           | 14 ++++++++++++++
> > > > > >  2 files changed, 40 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
> > > > > > index 0eb8fc1dcc34..c4fb66a9b5f5 100644
> > > > > > --- a/drivers/dma/dw-edma/dw-edma-core.c
> > > > > > +++ b/drivers/dma/dw-edma/dw-edma-core.c
> > > > > > @@ -1209,6 +1209,32 @@ int dw_edma_chan_register_notify(struct dma_chan *dchan,
> > > > > >  }
> > > > > >  EXPORT_SYMBOL_GPL(dw_edma_chan_register_notify);
> > > > > >
> > > > > > +int dw_edma_chan_get_ll_region(struct dma_chan *dchan,
> > > > > > +			       struct dw_edma_region *region)
> > > > > > +{
> > > > > > +	struct dw_edma_chip *chip;
> > > > > > +	struct dw_edma_chan *chan;
> > > > > > +
> > > > > > +	if (!dchan || !region || !dchan->device)
> > > > > > +		return -ENODEV;
> > > > > > +
> > > > > > +	chan = dchan2dw_edma_chan(dchan);
> > > > > > +	if (!chan)
> > > > > > +		return -ENODEV;
> > > > > > +
> > > > > > +	chip = chan->dw->chip;
> > > > > > +	if (!(chip->flags & DW_EDMA_CHIP_LOCAL))
> > > > > > +		return -EINVAL;
> > > > > > +
> > > > > > +	if (chan->dir == EDMA_DIR_WRITE)
> > > > > > +		*region = chip->ll_region_wr[chan->id];
> > > > > > +	else
> > > > > > +		*region = chip->ll_region_rd[chan->id];
> > > > > > +
> > > > > > +	return 0;
> > > > > > +}
> > > > > > +EXPORT_SYMBOL_GPL(dw_edma_chan_get_ll_region);
> > > > > > +
> > > > > >  MODULE_LICENSE("GPL v2");
> > > > > >  MODULE_DESCRIPTION("Synopsys DesignWare eDMA controller core driver");
> > > > > >  MODULE_AUTHOR("Gustavo Pimentel <gustavo.pimentel@synopsys.com>");
> > > > > > diff --git a/include/linux/dma/edma.h b/include/linux/dma/edma.h
> > > > > > index 3c538246de07..c9ec426e27ec 100644
> > > > > > --- a/include/linux/dma/edma.h
> > > > > > +++ b/include/linux/dma/edma.h
> > > > > > @@ -153,6 +153,14 @@ bool dw_edma_chan_ignore_irq(struct dma_chan *chan);
> > > > > >  int dw_edma_chan_register_notify(struct dma_chan *chan,
> > > > > >  				 void (*cb)(struct dma_chan *chan, void *user),
> > > > > >  				 void *user);
> > > > > > +
> > > > > > +/**
> > > > > > + * dw_edma_chan_get_ll_region - get linked list (LL) memory for a dma_chan
> > > > > > + * @chan: the target DMA channel
> > > > > > + * @region: output parameter returning the corresponding LL region
> > > > > > + */
> > > > > > +int dw_edma_chan_get_ll_region(struct dma_chan *chan,
> > > > > > +			       struct dw_edma_region *region);
> > > > > >  #else
> > > > > >  static inline int dw_edma_probe(struct dw_edma_chip *chip)
> > > > > >  {
> > > > > > @@ -182,6 +190,12 @@ static inline int dw_edma_chan_register_notify(struct dma_chan *chan,
> > > > > >  {
> > > > > >  	return -ENODEV;
> > > > > >  }
> > > > > > +
> > > > > > +static inline int dw_edma_chan_get_ll_region(struct dma_chan *chan,
> > > > > > +					     struct dw_edma_region *region)
> > > > > > +{
> > > > > > +	return -EINVAL;
> > > > > > +}
> > > > > >  #endif /* CONFIG_DW_EDMA */
> > > > > >
> > > > > >  struct pci_epc;
> > > > > > --
> > > > > > 2.51.0
> > > > > >

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v4 02/38] dmaengine: dw-edma: Add per-channel interrupt routing control
  2026-01-21 16:02       ` Vinod Koul
@ 2026-01-22  2:44         ` Koichiro Den
  2026-01-23 15:44         ` Frank Li
  1 sibling, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-22  2:44 UTC (permalink / raw)
  To: Vinod Koul, mani
  Cc: Frank Li, dave.jiang, cassel, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, jdmason, allenbh, jingoohan1, lpieralisi,
	linux-pci, linux-doc, linux-kernel, linux-renesas-soc, devicetree,
	dmaengine, iommu, ntb, netdev, linux-kselftest, arnd, gregkh,
	joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt, corbet,
	skhan, andriy.shevchenko, jbrunet, utkarsh02t

On Wed, Jan 21, 2026 at 09:32:37PM +0530, Vinod Koul wrote:
> On 19-01-26, 23:26, Koichiro Den wrote:
> > On Sun, Jan 18, 2026 at 12:03:19PM -0500, Frank Li wrote:
> > > On Sun, Jan 18, 2026 at 10:54:04PM +0900, Koichiro Den wrote:
> > > > DesignWare EP eDMA can generate interrupts both locally and remotely
> > > > (LIE/RIE). Remote eDMA users need to decide, per channel, whether
> > > > completions should be handled locally, remotely, or both. Unless
> > > > carefully configured, the endpoint and host would race to ack the
> > > > interrupt.
> > > >
> > > > Introduce a per-channel interrupt routing mode and export small APIs to
> > > > configure and query it. Update v0 programming so that RIE and local
> > > > done/abort interrupt masking follow the selected mode. The default mode
> > > > keeps the original behavior, so unless the new APIs are explicitly used,
> > > > there are no functional changes.
> > > >
> > > > Signed-off-by: Koichiro Den <den@valinux.co.jp>
> > > > ---
> > > >  drivers/dma/dw-edma/dw-edma-core.c    | 52 +++++++++++++++++++++++++++
> > > >  drivers/dma/dw-edma/dw-edma-core.h    |  2 ++
> > > >  drivers/dma/dw-edma/dw-edma-v0-core.c | 26 +++++++++-----
> > > >  include/linux/dma/edma.h              | 44 +++++++++++++++++++++++
> > > >  4 files changed, 116 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
> > > > index b9d59c3c0cb4..059b3996d383 100644
> > > > --- a/drivers/dma/dw-edma/dw-edma-core.c
> > > > +++ b/drivers/dma/dw-edma/dw-edma-core.c
> > > > @@ -768,6 +768,7 @@ static int dw_edma_channel_setup(struct dw_edma *dw, u32 wr_alloc, u32 rd_alloc)
> > > >  		chan->configured = false;
> > > >  		chan->request = EDMA_REQ_NONE;
> > > >  		chan->status = EDMA_ST_IDLE;
> > > > +		chan->irq_mode = DW_EDMA_CH_IRQ_DEFAULT;
> > > >
> > > >  		if (chan->dir == EDMA_DIR_WRITE)
> > > >  			chan->ll_max = (chip->ll_region_wr[chan->id].sz / EDMA_LL_SZ);
> > > > @@ -1062,6 +1063,57 @@ int dw_edma_remove(struct dw_edma_chip *chip)
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(dw_edma_remove);
> > > >
> > > > +int dw_edma_chan_irq_config(struct dma_chan *dchan,
> > > > +			    enum dw_edma_ch_irq_mode mode)
> > > > +{
> > > > +	struct dw_edma_chan *chan;
> > > > +
> > > > +	switch (mode) {
> > > > +	case DW_EDMA_CH_IRQ_DEFAULT:
> > > > +	case DW_EDMA_CH_IRQ_LOCAL:
> > > > +	case DW_EDMA_CH_IRQ_REMOTE:
> > > > +		break;
> > > > +	default:
> > > > +		return -EINVAL;
> > > > +	}
> > > > +
> > > > +	if (!dchan || !dchan->device)
> > > > +		return -ENODEV;
> > > > +
> > > > +	chan = dchan2dw_edma_chan(dchan);
> > > > +	if (!chan)
> > > > +		return -ENODEV;
> > > > +
> > > > +	chan->irq_mode = mode;
> > > > +
> > > > +	dev_vdbg(chan->dw->chip->dev, "Channel: %s[%u] set irq_mode=%u\n",
> > > > +		 str_write_read(chan->dir == EDMA_DIR_WRITE),
> > > > +		 chan->id, mode);
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(dw_edma_chan_irq_config);
> > > > +
> > > > +bool dw_edma_chan_ignore_irq(struct dma_chan *dchan)
> > > > +{
> > > > +	struct dw_edma_chan *chan;
> > > > +	struct dw_edma *dw;
> > > > +
> > > > +	if (!dchan || !dchan->device)
> > > > +		return false;
> > > > +
> > > > +	chan = dchan2dw_edma_chan(dchan);
> > > > +	if (!chan)
> > > > +		return false;
> > > > +
> > > > +	dw = chan->dw;
> > > > +	if (dw->chip->flags & DW_EDMA_CHIP_LOCAL)
> > > > +		return chan->irq_mode == DW_EDMA_CH_IRQ_REMOTE;
> > > > +	else
> > > > +		return chan->irq_mode == DW_EDMA_CH_IRQ_LOCAL;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(dw_edma_chan_ignore_irq);
> > > > +
> > > >  MODULE_LICENSE("GPL v2");
> > > >  MODULE_DESCRIPTION("Synopsys DesignWare eDMA controller core driver");
> > > >  MODULE_AUTHOR("Gustavo Pimentel <gustavo.pimentel@synopsys.com>");
> > > > diff --git a/drivers/dma/dw-edma/dw-edma-core.h b/drivers/dma/dw-edma/dw-edma-core.h
> > > > index 71894b9e0b15..8458d676551a 100644
> > > > --- a/drivers/dma/dw-edma/dw-edma-core.h
> > > > +++ b/drivers/dma/dw-edma/dw-edma-core.h
> > > > @@ -81,6 +81,8 @@ struct dw_edma_chan {
> > > >
> > > >  	struct msi_msg			msi;
> > > >
> > > > +	enum dw_edma_ch_irq_mode	irq_mode;
> > > > +
> > > >  	enum dw_edma_request		request;
> > > >  	enum dw_edma_status		status;
> > > >  	u8				configured;
> > > > diff --git a/drivers/dma/dw-edma/dw-edma-v0-core.c b/drivers/dma/dw-edma/dw-edma-v0-core.c
> > > > index 2850a9df80f5..80472148c335 100644
> > > > --- a/drivers/dma/dw-edma/dw-edma-v0-core.c
> > > > +++ b/drivers/dma/dw-edma/dw-edma-v0-core.c
> > > > @@ -256,8 +256,10 @@ dw_edma_v0_core_handle_int(struct dw_edma_irq *dw_irq, enum dw_edma_dir dir,
> > > >  	for_each_set_bit(pos, &val, total) {
> > > >  		chan = &dw->chan[pos + off];
> > > >
> > > > -		dw_edma_v0_core_clear_done_int(chan);
> > > > -		done(chan);
> > > > +		if (!dw_edma_chan_ignore_irq(&chan->vc.chan)) {
> > > > +			dw_edma_v0_core_clear_done_int(chan);
> > > > +			done(chan);
> > > > +		}
> > > >
> > > >  		ret = IRQ_HANDLED;
> > > >  	}
> > > > @@ -267,8 +269,10 @@ dw_edma_v0_core_handle_int(struct dw_edma_irq *dw_irq, enum dw_edma_dir dir,
> > > >  	for_each_set_bit(pos, &val, total) {
> > > >  		chan = &dw->chan[pos + off];
> > > >
> > > > -		dw_edma_v0_core_clear_abort_int(chan);
> > > > -		abort(chan);
> > > > +		if (!dw_edma_chan_ignore_irq(&chan->vc.chan)) {
> > > > +			dw_edma_v0_core_clear_abort_int(chan);
> > > > +			abort(chan);
> > > > +		}
> > > >
> > > >  		ret = IRQ_HANDLED;
> > > >  	}
> > > > @@ -331,7 +335,8 @@ static void dw_edma_v0_core_write_chunk(struct dw_edma_chunk *chunk)
> > > >  		j--;
> > > >  		if (!j) {
> > > >  			control |= DW_EDMA_V0_LIE;
> > > > -			if (!(chan->dw->chip->flags & DW_EDMA_CHIP_LOCAL))
> > > > +			if (!(chan->dw->chip->flags & DW_EDMA_CHIP_LOCAL) &&
> > > > +			    chan->irq_mode != DW_EDMA_CH_IRQ_LOCAL)
> > > >  				control |= DW_EDMA_V0_RIE;
> > > >  		}
> > > >
> > > > @@ -408,12 +413,17 @@ static void dw_edma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
> > > >  				break;
> > > >  			}
> > > >  		}
> > > > -		/* Interrupt unmask - done, abort */
> > > > +		/* Interrupt mask/unmask - done, abort */
> > > >  		raw_spin_lock_irqsave(&dw->lock, flags);
> > > >
> > > >  		tmp = GET_RW_32(dw, chan->dir, int_mask);
> > > > -		tmp &= ~FIELD_PREP(EDMA_V0_DONE_INT_MASK, BIT(chan->id));
> > > > -		tmp &= ~FIELD_PREP(EDMA_V0_ABORT_INT_MASK, BIT(chan->id));
> > > > +		if (chan->irq_mode == DW_EDMA_CH_IRQ_REMOTE) {
> > > > +			tmp |= FIELD_PREP(EDMA_V0_DONE_INT_MASK, BIT(chan->id));
> > > > +			tmp |= FIELD_PREP(EDMA_V0_ABORT_INT_MASK, BIT(chan->id));
> > > > +		} else {
> > > > +			tmp &= ~FIELD_PREP(EDMA_V0_DONE_INT_MASK, BIT(chan->id));
> > > > +			tmp &= ~FIELD_PREP(EDMA_V0_ABORT_INT_MASK, BIT(chan->id));
> > > > +		}
> > > >  		SET_RW_32(dw, chan->dir, int_mask, tmp);
> > > >  		/* Linked list error */
> > > >  		tmp = GET_RW_32(dw, chan->dir, linked_list_err_en);
> > > > diff --git a/include/linux/dma/edma.h b/include/linux/dma/edma.h
> > > > index ffad10ff2cd6..6f50165ac084 100644
> > > > --- a/include/linux/dma/edma.h
> > > > +++ b/include/linux/dma/edma.h
> > > > @@ -60,6 +60,23 @@ enum dw_edma_chip_flags {
> > > >  	DW_EDMA_CHIP_LOCAL	= BIT(0),
> > > >  };
> > > >
> > > > +/**
> > > > + * enum dw_edma_ch_irq_mode - per-channel interrupt routing control
> > > > + * @DW_EDMA_CH_IRQ_DEFAULT:   LIE=1/RIE=1, local interrupt unmasked
> > > > + * @DW_EDMA_CH_IRQ_LOCAL:     LIE=1/RIE=0
> > > > + * @DW_EDMA_CH_IRQ_REMOTE:    LIE=1/RIE=1, local interrupt masked
> > > > + *
> > > > + * Some implementations require using LIE=1/RIE=1 with the local interrupt
> > > > + * masked to generate a remote-only interrupt (rather than LIE=0/RIE=1).
> > > > + * See the DesignWare endpoint databook 5.40, "Hint" below "Figure 8-22
> > > > + * Write Interrupt Generation".
> > > > + */
> > > > +enum dw_edma_ch_irq_mode {
> > > > +	DW_EDMA_CH_IRQ_DEFAULT	= 0,
> > > > +	DW_EDMA_CH_IRQ_LOCAL,
> > > > +	DW_EDMA_CH_IRQ_REMOTE,
> > > > +};
> > > > +
> > > >  /**
> > > >   * struct dw_edma_chip - representation of DesignWare eDMA controller hardware
> > > >   * @dev:		 struct device of the eDMA controller
> > > > @@ -105,6 +122,22 @@ struct dw_edma_chip {
> > > >  #if IS_REACHABLE(CONFIG_DW_EDMA)
> > > >  int dw_edma_probe(struct dw_edma_chip *chip);
> > > >  int dw_edma_remove(struct dw_edma_chip *chip);
> > > > +/**
> > > > + * dw_edma_chan_irq_config - configure per-channel interrupt routing
> > > > + * @chan: DMA channel obtained from dma_request_channel()
> > > > + * @mode: interrupt routing mode
> > > > + *
> > > > + * Returns 0 on success, -EINVAL for invalid @mode, or -ENODEV if @chan does
> > > > + * not belong to the DesignWare eDMA driver.
> > > > + */
> > > > +int dw_edma_chan_irq_config(struct dma_chan *chan,
> > > > +			    enum dw_edma_ch_irq_mode mode);
> > > > +
> > > > +/**
> > > > + * dw_edma_chan_ignore_irq - tell whether local IRQ handling should be ignored
> > > > + * @chan: DMA channel obtained from dma_request_channel()
> > > > + */
> > > > +bool dw_edma_chan_ignore_irq(struct dma_chan *chan);
> > > >  #else
> > > >  static inline int dw_edma_probe(struct dw_edma_chip *chip)
> > > >  {
> > > > @@ -115,6 +148,17 @@ static inline int dw_edma_remove(struct dw_edma_chip *chip)
> > > >  {
> > > >  	return 0;
> > > >  }
> > > > +
> > > > +static inline int dw_edma_chan_irq_config(struct dma_chan *chan,
> > > > +					  enum dw_edma_ch_irq_mode mode)
> > > > +{
> > > > +	return -ENODEV;
> > > > +}
> > > > +
> > > > +static inline bool dw_edma_chan_ignore_irq(struct dma_chan *chan)
> > > > +{
> > > > +	return false;
> > > > +}
> > > 
> > > I think it'd be better to go through
> > > 
> > > struct dma_slave_config {
> > > 	...
> > >         void *peripheral_config;
> > > 	size_t peripheral_size;
> > > 
> > > };
> > > 
> > > So DMA consumer can use standard DMAengine API, dmaengine_slave_config().
> > 
> > Using .peripheral_config wasn't something I had initially considered, but I
> > agree that this is preferable in the sense that it avoids introducing the
> > additional exported APIs. I'm not entirely sure whether it's clean to use
> > it for non-peripheral settings in the strict sense, but there seem to be
> > precedents such as stm32_mdma_dma_config, so I guess it seems acceptable.
> > If I'm missing something, please correct me.
> 
> Strictly speaking slave config should be used for peripheral transfers.
> For memcpy users (this seems more like that), I would argue slave config
> does not make much sense.

Thank you for the comment. Understood, so it seems outside the intended
semantics of .peripheral_config.

Now I see two possible directions:

1. Keep my original approach (i.e. add a dw-edma specific exported helper in
   dw-edma-core, like dw_edma_chan_irq_config()).

2. Introduce a more generic mechanism than .peripheral_config/size (e.g.
   .hw_config/size), and use that instead.

If you see a better approach, I'd be glad to hear it. Also, Mani's input on
whether or not (1) is acceptable in the overall picture would be helpful
(from a dw-edma-core maintainer perspective).

Regards,
Koichiro

> 
> -- 
> ~Vinod

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC PATCH v4 36/38] PCI: endpoint: pci-epf-test: Add remote eDMA-backed mode
  2026-01-19 20:47   ` Frank Li
@ 2026-01-22 14:54     ` Koichiro Den
  0 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-22 14:54 UTC (permalink / raw)
  To: Frank Li
  Cc: dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi, linux-pci, linux-doc, linux-kernel, linux-renesas-soc,
	devicetree, dmaengine, iommu, ntb, netdev, linux-kselftest, arnd,
	gregkh, joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt,
	corbet, skhan, andriy.shevchenko, jbrunet, utkarsh02t

On Mon, Jan 19, 2026 at 03:47:27PM -0500, Frank Li wrote:
> On Sun, Jan 18, 2026 at 10:54:38PM +0900, Koichiro Den wrote:
> > Some DesignWare-based endpoints integrate an eDMA engine that can be
> > programmed by the host via MMIO. The upcoming NTB transport remote-eDMA
> > backend relies on this capability, but there is currently no upstream
> > test coverage for the end-to-end control and data path.
> >
> > Extend pci-epf-test with an optional remote eDMA test backend (built when
> > CONFIG_DW_EDMA is enabled).
> >
> > - Reserve a spare BAR and expose a small 'pcitest_edma_info' header at
> >   BAR offset 0. The header carries a magic/version and describes the
> >   endpoint eDMA register window, per-direction linked-list (LL)
> >   locations and an endpoint test buffer.
> > - Map the eDMA registers and LL locations into that BAR using BAR
> >   subrange mappings (address-match inbound iATU).
> >
> > To run this extra testing, two new endpoint commands are added:
> >   * COMMAND_REMOTE_EDMA_SETUP
> >   * COMMAND_REMOTE_EDMA_CHECKSUM
> >
> > When the former command is received, the endpoint prepares for the
> > remote eDMA transfer. The CHECKSUM command is useful for Host-to-EP
> > transfer testing, as the endpoint side is not expected to receive the
> > DMA completion interrupt directly. Instead, the host asks the endpoint
> > to compute a CRC32 over the transferred data.
> >
> > This backend is exercised by the host-side pci_endpoint_test driver via a
> > new UAPI flag.
> >
> > Signed-off-by: Koichiro Den <den@valinux.co.jp>
> > ---
> >  drivers/pci/endpoint/functions/pci-epf-test.c | 477 ++++++++++++++++++
> 
> This patch should be combined into your submap patches, which is one user
> of submap.

Thanks for the comment, and my apologies for the delayed response to this.

The PCI endpoint test case additions depend on both of the following prerequisites:

1) [PATCH v9 0/5] PCI: endpoint: BAR subrange mapping support
   https://lore.kernel.org/all/20260122084909.2390865-1-den@valinux.co.jp/

2) A not-yet-submitted series for Patch 01-05, as described in the "Patch
   layout" section of the cover letter:
   https://lore.kernel.org/all/20260118135440.1958279-1-den@valinux.co.jp/

   [...]
   1. dw-edma / DesignWare EP helpers needed for remote embedded-DMA (export
      register/LL windows, IRQ routing control, etc.)

      Patch 01 : dmaengine: dw-edma: Export helper to get integrated register window
      Patch 02 : dmaengine: dw-edma: Add per-channel interrupt routing control
      Patch 03 : dmaengine: dw-edma: Poll completion when local IRQ handling is disabled
      Patch 04 : dmaengine: dw-edma: Add notify-only channels support
      Patch 05 : dmaengine: dw-edma: Add a helper to query linked-list region
   [...]

   I plan to submit these patches shortly, perhaps as a single series, once the
   design discussion in the following thread is resolved:
   https://lore.kernel.org/all/2bcksnyuxj33bjctjombrstfvjrcdtap6i3v6xhfxtqjmbdkwm@jcaoy2iuh5pr/
   It would be great if you could take a look at that discussion as well.

Given that (1) precedes (2), it should be reasonable to include the PCI
endpoint test case additions (Patches 35-38) as part of the series in (2).

Kind regards,
Koichiro

> 
> Frank
> 
> >  1 file changed, 477 insertions(+)
> >
> > diff --git a/drivers/pci/endpoint/functions/pci-epf-test.c b/drivers/pci/endpoint/functions/pci-epf-test.c
> > index e560c3becebb..eea10bddcd2a 100644
> > --- a/drivers/pci/endpoint/functions/pci-epf-test.c
> > +++ b/drivers/pci/endpoint/functions/pci-epf-test.c
> > @@ -10,6 +10,7 @@
> >  #include <linux/delay.h>
> >  #include <linux/dmaengine.h>
> >  #include <linux/io.h>
> > +#include <linux/iommu.h>
> >  #include <linux/module.h>
> >  #include <linux/msi.h>
> >  #include <linux/slab.h>
> > @@ -33,6 +34,8 @@
> >  #define COMMAND_COPY			BIT(5)
> >  #define COMMAND_ENABLE_DOORBELL		BIT(6)
> >  #define COMMAND_DISABLE_DOORBELL	BIT(7)
> > +#define COMMAND_REMOTE_EDMA_SETUP	BIT(8)
> > +#define COMMAND_REMOTE_EDMA_CHECKSUM	BIT(9)
> >
> >  #define STATUS_READ_SUCCESS		BIT(0)
> >  #define STATUS_READ_FAIL		BIT(1)
> > @@ -48,6 +51,10 @@
> >  #define STATUS_DOORBELL_ENABLE_FAIL	BIT(11)
> >  #define STATUS_DOORBELL_DISABLE_SUCCESS BIT(12)
> >  #define STATUS_DOORBELL_DISABLE_FAIL	BIT(13)
> > +#define STATUS_REMOTE_EDMA_SETUP_SUCCESS	BIT(14)
> > +#define STATUS_REMOTE_EDMA_SETUP_FAIL		BIT(15)
> > +#define STATUS_REMOTE_EDMA_CHECKSUM_SUCCESS	BIT(16)
> > +#define STATUS_REMOTE_EDMA_CHECKSUM_FAIL	BIT(17)
> >
> >  #define FLAG_USE_DMA			BIT(0)
> >
> > @@ -77,6 +84,9 @@ struct pci_epf_test {
> >  	bool			dma_private;
> >  	const struct pci_epc_features *epc_features;
> >  	struct pci_epf_bar	db_bar;
> > +
> > +	/* For extended tests that rely on vendor-specific features */
> > +	void *data;
> >  };
> >
> >  struct pci_epf_test_reg {
> > @@ -117,6 +127,454 @@ static enum pci_barno pci_epf_test_next_free_bar(struct pci_epf_test *epf_test)
> >  	return bar;
> >  }
> >
> > +#if IS_REACHABLE(CONFIG_DW_EDMA)
> > +#include <linux/dma/edma.h>
> > +
> > +#define PCITEST_EDMA_INFO_MAGIC		0x414d4445U /* 'EDMA' */
> > +#define PCITEST_EDMA_INFO_VERSION	0x00010000U
> > +#define PCITEST_EDMA_TEST_BUF_SIZE	(1024 * 1024)
> > +
> > +struct pci_epf_test_edma {
> > +	/* Remote eDMA test resources */
> > +	bool			enabled;
> > +	enum pci_barno		bar;
> > +	void			*info;
> > +	size_t			total_size;
> > +	void			*test_buf;
> > +	dma_addr_t		test_buf_phys;
> > +	size_t			test_buf_size;
> > +
> > +	/* DW eDMA specifics */
> > +	phys_addr_t		reg_phys;
> > +	size_t			reg_submap_sz;
> > +	unsigned long		reg_iova;
> > +	size_t			reg_iova_sz;
> > +	phys_addr_t		ll_rd_phys;
> > +	size_t			ll_rd_sz_aligned;
> > +	phys_addr_t		ll_wr_phys;
> > +	size_t			ll_wr_sz_aligned;
> > +};
> > +
> > +struct pcitest_edma_info {
> > +	__le32 magic;
> > +	__le32 version;
> > +
> > +	__le32 reg_off;
> > +	__le32 reg_size;
> > +
> > +	__le64 ll_rd_phys;
> > +	__le32 ll_rd_off;
> > +	__le32 ll_rd_size;
> > +
> > +	__le64 ll_wr_phys;
> > +	__le32 ll_wr_off;
> > +	__le32 ll_wr_size;
> > +
> > +	__le64 test_buf_phys;
> > +	__le32 test_buf_size;
> > +};
> > +
> > +static bool pci_epf_test_bar_is_reserved(struct pci_epf_test *test,
> > +					 enum pci_barno barno)
> > +{
> > +	struct pci_epf_test_edma *edma = test->data;
> > +
> > +	if (!edma)
> > +		return false;
> > +
> > +	return barno == edma->bar;
> > +}
> > +
> > +static void pci_epf_test_clear_submaps(struct pci_epf_bar *bar)
> > +{
> > +	kfree(bar->submap);
> > +	bar->submap = NULL;
> > +	bar->num_submap = 0;
> > +}
> > +
> > +static int pci_epf_test_add_submap(struct pci_epf_bar *bar, phys_addr_t phys,
> > +				   size_t size)
> > +{
> > +	struct pci_epf_bar_submap *submap, *new;
> > +
> > +	new = krealloc_array(bar->submap, bar->num_submap + 1, sizeof(*new),
> > +			     GFP_KERNEL);
> > +	if (!new)
> > +		return -ENOMEM;
> > +
> > +	bar->submap = new;
> > +	submap = &bar->submap[bar->num_submap];
> > +	submap->phys_addr = phys;
> > +	submap->size = size;
> > +	bar->num_submap++;
> > +
> > +	return 0;
> > +}
> > +
> > +static void pci_epf_test_clean_remote_edma(struct pci_epf_test *test)
> > +{
> > +	struct pci_epf_test_edma *edma = test->data;
> > +	struct pci_epf *epf = test->epf;
> > +	struct pci_epc *epc = epf->epc;
> > +	struct device *dev = epc->dev.parent;
> > +	struct iommu_domain *dom;
> > +	struct pci_epf_bar *bar;
> > +	enum pci_barno barno;
> > +
> > +	if (!edma)
> > +		return;
> > +
> > +	barno = edma->bar;
> > +	if (barno == NO_BAR)
> > +		return;
> > +
> > +	bar = &epf->bar[barno];
> > +
> > +	dom = iommu_get_domain_for_dev(dev);
> > +	if (dom && edma->reg_iova_sz) {
> > +		iommu_unmap(dom, edma->reg_iova, edma->reg_iova_sz);
> > +		edma->reg_iova = 0;
> > +		edma->reg_iova_sz = 0;
> > +	}
> > +
> > +	if (edma->test_buf) {
> > +		dma_free_coherent(dev, edma->test_buf_size,
> > +				  edma->test_buf,
> > +				  edma->test_buf_phys);
> > +		edma->test_buf = NULL;
> > +		edma->test_buf_phys = 0;
> > +		edma->test_buf_size = 0;
> > +	}
> > +
> > +	if (edma->info) {
> > +		pci_epf_free_space(epf, edma->info, barno, PRIMARY_INTERFACE);
> > +		edma->info = NULL;
> > +	}
> > +
> > +	pci_epf_test_clear_submaps(bar);
> > +	pci_epc_clear_bar(epc, epf->func_no, epf->vfunc_no, bar);
> > +
> > +	edma->bar = NO_BAR;
> > +	edma->enabled = false;
> > +}
> > +
> > +static int pci_epf_test_init_remote_edma(struct pci_epf_test *test)
> > +{
> > +	const struct pci_epc_features *epc_features = test->epc_features;
> > +	struct pci_epf_test_edma *edma;
> > +	struct pci_epf *epf = test->epf;
> > +	struct pci_epc *epc = epf->epc;
> > +	struct pcitest_edma_info *info;
> > +	struct device *dev = epc->dev.parent;
> > +	struct dw_edma_region region;
> > +	struct iommu_domain *dom;
> > +	size_t reg_sz_aligned, ll_rd_sz_aligned, ll_wr_sz_aligned;
> > +	phys_addr_t phys, ll_rd_phys, ll_wr_phys;
> > +	size_t ll_rd_size, ll_wr_size;
> > +	resource_size_t reg_size;
> > +	unsigned long iova;
> > +	size_t off, size;
> > +	int ret;
> > +
> > +	if (!test->dma_chan_tx || !test->dma_chan_rx)
> > +		return -ENODEV;
> > +
> > +	edma = devm_kzalloc(&epf->dev, sizeof(*edma), GFP_KERNEL);
> > +	if (!edma)
> > +		return -ENOMEM;
> > +	test->data = edma;
> > +
> > +	edma->bar = pci_epf_test_next_free_bar(test);
> > +	if (edma->bar == NO_BAR) {
> > +		dev_err(&epf->dev, "No spare BAR for remote eDMA (remote eDMA disabled)\n");
> > +		ret = -ENOSPC;
> > +		goto err;
> > +	}
> > +
> > +	ret = dw_edma_get_reg_window(epc, &edma->reg_phys, &reg_size);
> > +	if (ret) {
> > +		dev_err(dev, "failed to get edma reg window: %d\n", ret);
> > +		goto err;
> > +	}
> > +	dom = iommu_get_domain_for_dev(dev);
> > +	if (dom) {
> > +		phys = edma->reg_phys & PAGE_MASK;
> > +		size = PAGE_ALIGN(reg_size + edma->reg_phys - phys);
> > +		iova = phys;
> > +
> > +		ret = iommu_map(dom, iova, phys, size,
> > +				IOMMU_READ | IOMMU_WRITE | IOMMU_MMIO,
> > +				GFP_KERNEL);
> > +		if (ret) {
> > +			dev_err(dev, "failed to direct map eDMA reg: %d\n", ret);
> > +			goto err;
> > +		}
> > +		edma->reg_iova = iova;
> > +		edma->reg_iova_sz = size;
> > +	}
> > +
> > +	/* Get LL location addresses and sizes */
> > +	ret = dw_edma_chan_get_ll_region(test->dma_chan_rx, &region);
> > +	if (ret) {
> > +		dev_err(dev, "failed to get edma ll region for rx: %d\n", ret);
> > +		goto err;
> > +	}
> > +	ll_rd_phys = region.paddr;
> > +	ll_rd_size = region.sz;
> > +
> > +	ret = dw_edma_chan_get_ll_region(test->dma_chan_tx, &region);
> > +	if (ret) {
> > +		dev_err(dev, "failed to get edma ll region for tx: %d\n", ret);
> > +		goto err;
> > +	}
> > +	ll_wr_phys = region.paddr;
> > +	ll_wr_size = region.sz;
> > +
> > +	edma->test_buf_size = PCITEST_EDMA_TEST_BUF_SIZE;
> > +	edma->test_buf = dma_alloc_coherent(dev, edma->test_buf_size,
> > +					    &edma->test_buf_phys, GFP_KERNEL);
> > +	if (!edma->test_buf) {
> > +		ret = -ENOMEM;
> > +		goto err;
> > +	}
> > +
> > +	reg_sz_aligned = PAGE_ALIGN(reg_size);
> > +	ll_rd_sz_aligned = PAGE_ALIGN(ll_rd_size);
> > +	ll_wr_sz_aligned = PAGE_ALIGN(ll_wr_size);
> > +	edma->total_size = PAGE_SIZE + reg_sz_aligned + ll_rd_sz_aligned +
> > +			   ll_wr_sz_aligned;
> > +	size = roundup_pow_of_two(edma->total_size);
> > +
> > +	info = pci_epf_alloc_space(epf, size, edma->bar,
> > +				   epc_features, PRIMARY_INTERFACE);
> > +	if (!info) {
> > +		ret = -ENOMEM;
> > +		goto err;
> > +	}
> > +	memset(info, 0, size);
> > +
> > +	off = PAGE_SIZE;
> > +	info->magic = cpu_to_le32(PCITEST_EDMA_INFO_MAGIC);
> > +	info->version = cpu_to_le32(PCITEST_EDMA_INFO_VERSION);
> > +
> > +	info->reg_off = cpu_to_le32(off);
> > +	info->reg_size = cpu_to_le32(reg_size);
> > +	off += reg_sz_aligned;
> > +
> > +	info->ll_rd_phys = cpu_to_le64(ll_rd_phys);
> > +	info->ll_rd_off = cpu_to_le32(off);
> > +	info->ll_rd_size = cpu_to_le32(ll_rd_size);
> > +	off += ll_rd_sz_aligned;
> > +
> > +	info->ll_wr_phys = cpu_to_le64(ll_wr_phys);
> > +	info->ll_wr_off = cpu_to_le32(off);
> > +	info->ll_wr_size = cpu_to_le32(ll_wr_size);
> > +	off += ll_wr_sz_aligned;
> > +
> > +	info->test_buf_phys = cpu_to_le64(edma->test_buf_phys);
> > +	info->test_buf_size = cpu_to_le32(edma->test_buf_size);
> > +
> > +	edma->info = info;
> > +	edma->reg_submap_sz = reg_sz_aligned;
> > +	edma->ll_rd_phys = ll_rd_phys;
> > +	edma->ll_wr_phys = ll_wr_phys;
> > +	edma->ll_rd_sz_aligned = ll_rd_sz_aligned;
> > +	edma->ll_wr_sz_aligned = ll_wr_sz_aligned;
> > +
> > +	ret = pci_epc_set_bar(epc, epf->func_no, epf->vfunc_no,
> > +			      &epf->bar[edma->bar]);
> > +	if (ret) {
> > +		dev_err(dev,
> > +			"failed to init BAR%d for remote eDMA: %d\n",
> > +			edma->bar, ret);
> > +		goto err;
> > +	}
> > +	dev_info(dev, "BAR%d initialized for remote eDMA\n", edma->bar);
> > +
> > +	return 0;
> > +
> > +err:
> > +	pci_epf_test_clean_remote_edma(test);
> > +	devm_kfree(&epf->dev, edma);
> > +	test->data = NULL;
> > +	return ret;
> > +}
> > +
> > +static int pci_epf_test_map_remote_edma(struct pci_epf_test *test)
> > +{
> > +	struct pci_epf_test_edma *edma = test->data;
> > +	struct pcitest_edma_info *info;
> > +	struct pci_epf *epf = test->epf;
> > +	struct pci_epc *epc = epf->epc;
> > +	struct pci_epf_bar *bar;
> > +	enum pci_barno barno;
> > +	struct device *dev = epc->dev.parent;
> > +	int ret;
> > +
> > +	if (!edma)
> > +		return -ENODEV;
> > +
> > +	info = edma->info;
> > +	barno = edma->bar;
> > +
> > +	if (barno == NO_BAR)
> > +		return -ENOSPC;
> > +	if (!info || !edma->test_buf)
> > +		return -ENODEV;
> > +
> > +	bar = &epf->bar[barno];
> > +	pci_epf_test_clear_submaps(bar);
> > +
> > +	ret = pci_epf_test_add_submap(bar, bar->phys_addr, PAGE_SIZE);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = pci_epf_test_add_submap(bar, edma->reg_phys, edma->reg_submap_sz);
> > +	if (ret)
> > +		goto err_submap;
> > +
> > +	ret = pci_epf_test_add_submap(bar, edma->ll_rd_phys,
> > +				      edma->ll_rd_sz_aligned);
> > +	if (ret)
> > +		goto err_submap;
> > +
> > +	ret = pci_epf_test_add_submap(bar, edma->ll_wr_phys,
> > +				      edma->ll_wr_sz_aligned);
> > +	if (ret)
> > +		goto err_submap;
> > +
> > +	if (bar->size > edma->total_size) {
> > +		ret = pci_epf_test_add_submap(bar, 0,
> > +					      bar->size - edma->total_size);
> > +		if (ret)
> > +			goto err_submap;
> > +	}
> > +
> > +	ret = pci_epc_set_bar(epc, epf->func_no, epf->vfunc_no, bar);
> > +	if (ret) {
> > +		dev_err(dev, "failed to map BAR%d: %d\n", barno, ret);
> > +		goto err_submap;
> > +	}
> > +
> > +	/*
> > +	 * Endpoint-local interrupts must be ignored even if the host fails to
> > +	 * mask them.
> > +	 */
> > +	ret = dw_edma_chan_irq_config(test->dma_chan_tx, DW_EDMA_CH_IRQ_REMOTE);
> > +	if (ret) {
> > +		dev_err(dev, "failed to set irq mode for tx channel: %d\n",
> > +			ret);
> > +		goto err_bar;
> > +	}
> > +	ret = dw_edma_chan_irq_config(test->dma_chan_rx, DW_EDMA_CH_IRQ_REMOTE);
> > +	if (ret) {
> > +		dev_err(dev, "failed to set irq mode for rx channel: %d\n",
> > +			ret);
> > +		goto err_bar;
> > +	}
> > +
> > +	return 0;
> > +err_bar:
> > +	pci_epc_clear_bar(epc, epf->func_no, epf->vfunc_no, &epf->bar[barno]);
> > +err_submap:
> > +	pci_epf_test_clear_submaps(bar);
> > +	return ret;
> > +}
> > +
> > +static void pci_epf_test_remote_edma_setup(struct pci_epf_test *epf_test,
> > +					   struct pci_epf_test_reg *reg)
> > +{
> > +	struct pci_epf_test_edma *edma = epf_test->data;
> > +	size_t size = le32_to_cpu(reg->size);
> > +	void *buf;
> > +	int ret;
> > +
> > +	if (!edma || !edma->test_buf || size > edma->test_buf_size) {
> > +		reg->status = cpu_to_le32(STATUS_REMOTE_EDMA_SETUP_FAIL);
> > +		return;
> > +	}
> > +
> > +	buf = edma->test_buf;
> > +
> > +	if (!edma->enabled) {
> > +		/* NB. Currently DW eDMA is the only supported backend */
> > +		ret = pci_epf_test_map_remote_edma(epf_test);
> > +		if (ret) {
> > +			WRITE_ONCE(reg->status,
> > +				   cpu_to_le32(STATUS_REMOTE_EDMA_SETUP_FAIL));
> > +			return;
> > +		}
> > +		edma->enabled = true;
> > +	}
> > +
> > +	/* Populate the test buffer with random data */
> > +	get_random_bytes(buf, size);
> > +	reg->checksum = cpu_to_le32(crc32_le(~0, buf, size));
> > +
> > +	WRITE_ONCE(reg->status, cpu_to_le32(STATUS_REMOTE_EDMA_SETUP_SUCCESS));
> > +}
> > +
> > +static void pci_epf_test_remote_edma_checksum(struct pci_epf_test *epf_test,
> > +					      struct pci_epf_test_reg *reg)
> > +{
> > +	struct pci_epf_test_edma *edma = epf_test->data;
> > +	u32 status = le32_to_cpu(reg->status);
> > +	size_t size;
> > +	void *addr;
> > +	u32 crc32;
> > +
> > +	size = le32_to_cpu(reg->size);
> > +	if (!edma || !edma->test_buf || size > edma->test_buf_size) {
> > +		status |= STATUS_REMOTE_EDMA_CHECKSUM_FAIL;
> > +		reg->status = cpu_to_le32(status);
> > +		return;
> > +	}
> > +
> > +	addr = edma->test_buf;
> > +	crc32 = crc32_le(~0, addr, size);
> > +	status |= STATUS_REMOTE_EDMA_CHECKSUM_SUCCESS;
> > +
> > +	reg->checksum = cpu_to_le32(crc32);
> > +	reg->status = cpu_to_le32(status);
> > +}
> > +
> > +static void pci_epf_test_reset_dma_chan(struct dma_chan *chan)
> > +{
> > +	dw_edma_chan_irq_config(chan, DW_EDMA_CH_IRQ_DEFAULT);
> > +}
> > +#else
> > +static bool pci_epf_test_bar_is_reserved(struct pci_epf_test *test,
> > +					 enum pci_barno barno)
> > +{
> > +	return false;
> > +}
> > +
> > +static void pci_epf_test_clean_remote_edma(struct pci_epf_test *test)
> > +{
> > +}
> > +
> > +static int pci_epf_test_init_remote_edma(struct pci_epf_test *test)
> > +{
> > +	return -EOPNOTSUPP;
> > +}
> > +
> > +static void pci_epf_test_remote_edma_setup(struct pci_epf_test *epf_test,
> > +					   struct pci_epf_test_reg *reg)
> > +{
> > +	reg->status = cpu_to_le32(STATUS_REMOTE_EDMA_SETUP_FAIL);
> > +}
> > +
> > +static void pci_epf_test_remote_edma_checksum(struct pci_epf_test *epf_test,
> > +					      struct pci_epf_test_reg *reg)
> > +{
> > +	reg->status = cpu_to_le32(STATUS_REMOTE_EDMA_CHECKSUM_FAIL);
> > +}
> > +
> > +static void pci_epf_test_reset_dma_chan(struct dma_chan *chan)
> > +{
> > +}
> > +#endif
> > +
> >  static void pci_epf_test_dma_callback(void *param)
> >  {
> >  	struct pci_epf_test *epf_test = param;
> > @@ -168,6 +626,8 @@ static int pci_epf_test_data_transfer(struct pci_epf_test *epf_test,
> >  		return -EINVAL;
> >  	}
> >
> > +	pci_epf_test_reset_dma_chan(chan);
> > +
> >  	if (epf_test->dma_private) {
> >  		sconf.direction = dir;
> >  		if (dir == DMA_MEM_TO_DEV)
> > @@ -870,6 +1330,14 @@ static void pci_epf_test_cmd_handler(struct work_struct *work)
> >  		pci_epf_test_disable_doorbell(epf_test, reg);
> >  		pci_epf_test_raise_irq(epf_test, reg);
> >  		break;
> > +	case COMMAND_REMOTE_EDMA_SETUP:
> > +		pci_epf_test_remote_edma_setup(epf_test, reg);
> > +		pci_epf_test_raise_irq(epf_test, reg);
> > +		break;
> > +	case COMMAND_REMOTE_EDMA_CHECKSUM:
> > +		pci_epf_test_remote_edma_checksum(epf_test, reg);
> > +		pci_epf_test_raise_irq(epf_test, reg);
> > +		break;
> >  	default:
> >  		dev_err(dev, "Invalid command 0x%x\n", command);
> >  		break;
> > @@ -961,6 +1429,10 @@ static int pci_epf_test_epc_init(struct pci_epf *epf)
> >  	if (ret)
> >  		epf_test->dma_supported = false;
> >
> > +	ret = pci_epf_test_init_remote_edma(epf_test);
> > +	if (ret && ret != -EOPNOTSUPP)
> > +		dev_warn(dev, "Remote eDMA setup failed\n");
> > +
> >  	if (epf->vfunc_no <= 1) {
> >  		ret = pci_epc_write_header(epc, epf->func_no, epf->vfunc_no, header);
> >  		if (ret) {
> > @@ -1007,6 +1479,7 @@ static void pci_epf_test_epc_deinit(struct pci_epf *epf)
> >  	struct pci_epf_test *epf_test = epf_get_drvdata(epf);
> >
> >  	cancel_delayed_work_sync(&epf_test->cmd_handler);
> > +	pci_epf_test_clean_remote_edma(epf_test);
> >  	pci_epf_test_clean_dma_chan(epf_test);
> >  	pci_epf_test_clear_bar(epf);
> >  }
> > @@ -1076,6 +1549,9 @@ static int pci_epf_test_alloc_space(struct pci_epf *epf)
> >  		if (bar == test_reg_bar)
> >  			continue;
> >
> > +		if (pci_epf_test_bar_is_reserved(epf_test, bar))
> > +			continue;
> > +
> >  		if (epc_features->bar[bar].type == BAR_FIXED)
> >  			test_reg_size = epc_features->bar[bar].fixed_size;
> >  		else
> > @@ -1146,6 +1622,7 @@ static void pci_epf_test_unbind(struct pci_epf *epf)
> >
> >  	cancel_delayed_work_sync(&epf_test->cmd_handler);
> >  	if (epc->init_complete) {
> > +		pci_epf_test_clean_remote_edma(epf_test);
> >  		pci_epf_test_clean_dma_chan(epf_test);
> >  		pci_epf_test_clear_bar(epf);
> >  	}
> > --
> > 2.51.0
> >


* Re: [RFC PATCH v4 15/38] PCI: endpoint: pci-epf-vntb: Implement .get_dma_dev()
  2026-01-19 20:30   ` Frank Li
@ 2026-01-22 14:58     ` Koichiro Den
  0 siblings, 0 replies; 68+ messages in thread
From: Koichiro Den @ 2026-01-22 14:58 UTC (permalink / raw)
  To: Frank Li
  Cc: dave.jiang, cassel, mani, kwilczynski, kishon, bhelgaas,
	geert+renesas, robh, vkoul, jdmason, allenbh, jingoohan1,
	lpieralisi, linux-pci, linux-doc, linux-kernel, linux-renesas-soc,
	devicetree, dmaengine, iommu, ntb, netdev, linux-kselftest, arnd,
	gregkh, joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt,
	corbet, skhan, andriy.shevchenko, jbrunet, utkarsh02t

On Mon, Jan 19, 2026 at 03:30:16PM -0500, Frank Li wrote:
> On Sun, Jan 18, 2026 at 10:54:17PM +0900, Koichiro Den wrote:
> > For DMA API allocations and mappings, pci-epf-vntb should provide an
> > appropriate struct device for the NTB core/clients.
> >
> > Implement .get_dma_dev() and return the EPC parent device.
> 
> Simply put:
> 
> Implement .get_dma_dev() and return the EPC parent device for the NTB
> core/clients' DMA allocation and mapping APIs.

Thanks for pointing this out. That makes sense; I'll update the text as you
suggested.

Koichiro

> 
> Reviewed-by: Frank Li <Frank.Li@nxp.com>
> >
> > Signed-off-by: Koichiro Den <den@valinux.co.jp>
> > ---
> >  drivers/pci/endpoint/functions/pci-epf-vntb.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> > index 9fbc27000f77..7cd976757d15 100644
> > --- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
> > +++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> > @@ -1747,6 +1747,15 @@ static int vntb_epf_link_disable(struct ntb_dev *ntb)
> >  	return 0;
> >  }
> >
> > +static struct device *vntb_epf_get_dma_dev(struct ntb_dev *ndev)
> > +{
> > +	struct epf_ntb *ntb = ntb_ndev(ndev);
> > +
> > +	if (!ntb || !ntb->epf)
> > +		return NULL;
> > +	return ntb->epf->epc->dev.parent;
> > +}
> > +
> >  static void *vntb_epf_get_private_data(struct ntb_dev *ndev)
> >  {
> >  	struct epf_ntb *ntb = ntb_ndev(ndev);
> > @@ -1780,6 +1789,7 @@ static const struct ntb_dev_ops vntb_epf_ops = {
> >  	.db_clear_mask		= vntb_epf_db_clear_mask,
> >  	.db_clear		= vntb_epf_db_clear,
> >  	.link_disable		= vntb_epf_link_disable,
> > +	.get_dma_dev		= vntb_epf_get_dma_dev,
> >  	.get_private_data	= vntb_epf_get_private_data,
> >  };
> >
> > --
> > 2.51.0
> >


* Re: [RFC PATCH v4 02/38] dmaengine: dw-edma: Add per-channel interrupt routing control
  2026-01-21 16:02       ` Vinod Koul
  2026-01-22  2:44         ` Koichiro Den
@ 2026-01-23 15:44         ` Frank Li
  1 sibling, 0 replies; 68+ messages in thread
From: Frank Li @ 2026-01-23 15:44 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Koichiro Den, dave.jiang, cassel, mani, kwilczynski, kishon,
	bhelgaas, geert+renesas, robh, jdmason, allenbh, jingoohan1,
	lpieralisi, linux-pci, linux-doc, linux-kernel, linux-renesas-soc,
	devicetree, dmaengine, iommu, ntb, netdev, linux-kselftest, arnd,
	gregkh, joro, will, robin.murphy, magnus.damm, krzk+dt, conor+dt,
	corbet, skhan, andriy.shevchenko, jbrunet, utkarsh02t

On Wed, Jan 21, 2026 at 09:32:37PM +0530, Vinod Koul wrote:
> On 19-01-26, 23:26, Koichiro Den wrote:
> > On Sun, Jan 18, 2026 at 12:03:19PM -0500, Frank Li wrote:
> > > On Sun, Jan 18, 2026 at 10:54:04PM +0900, Koichiro Den wrote:
> > > > DesignWare EP eDMA can generate interrupts both locally and remotely
> > > > (LIE/RIE). Remote eDMA users need to decide, per channel, whether
> > > > completions should be handled locally, remotely, or both. Unless
> > > > carefully configured, the endpoint and host would race to ack the
> > > > interrupt.
> > > >
> > > > Introduce a per-channel interrupt routing mode and export small APIs to
> > > > configure and query it. Update v0 programming so that RIE and local
> > > > done/abort interrupt masking follow the selected mode. The default mode
> > > > keeps the original behavior, so unless the new APIs are explicitly used,
> > > > there is no functional change.
> > > >
> > > > Signed-off-by: Koichiro Den <den@valinux.co.jp>
> > > > ---
> > > >  drivers/dma/dw-edma/dw-edma-core.c    | 52 +++++++++++++++++++++++++++
> > > >  drivers/dma/dw-edma/dw-edma-core.h    |  2 ++
> > > >  drivers/dma/dw-edma/dw-edma-v0-core.c | 26 +++++++++-----
> > > >  include/linux/dma/edma.h              | 44 +++++++++++++++++++++++
> > > >  4 files changed, 116 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
> > > > index b9d59c3c0cb4..059b3996d383 100644
> > > > --- a/drivers/dma/dw-edma/dw-edma-core.c
> > > > +++ b/drivers/dma/dw-edma/dw-edma-core.c
> > > > @@ -768,6 +768,7 @@ static int dw_edma_channel_setup(struct dw_edma *dw, u32 wr_alloc, u32 rd_alloc)
> > > >  		chan->configured = false;
> > > >  		chan->request = EDMA_REQ_NONE;
> > > >  		chan->status = EDMA_ST_IDLE;
> > > > +		chan->irq_mode = DW_EDMA_CH_IRQ_DEFAULT;
> > > >
> > > >  		if (chan->dir == EDMA_DIR_WRITE)
> > > >  			chan->ll_max = (chip->ll_region_wr[chan->id].sz / EDMA_LL_SZ);
> > > > @@ -1062,6 +1063,57 @@ int dw_edma_remove(struct dw_edma_chip *chip)
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(dw_edma_remove);
> > > >
> > > > +int dw_edma_chan_irq_config(struct dma_chan *dchan,
> > > > +			    enum dw_edma_ch_irq_mode mode)
> > > > +{
> > > > +	struct dw_edma_chan *chan;
> > > > +
> > > > +	switch (mode) {
> > > > +	case DW_EDMA_CH_IRQ_DEFAULT:
> > > > +	case DW_EDMA_CH_IRQ_LOCAL:
> > > > +	case DW_EDMA_CH_IRQ_REMOTE:
> > > > +		break;
> > > > +	default:
> > > > +		return -EINVAL;
> > > > +	}
> > > > +
> > > > +	if (!dchan || !dchan->device)
> > > > +		return -ENODEV;
> > > > +
> > > > +	chan = dchan2dw_edma_chan(dchan);
> > > > +	if (!chan)
> > > > +		return -ENODEV;
> > > > +
> > > > +	chan->irq_mode = mode;
> > > > +
> > > > +	dev_vdbg(chan->dw->chip->dev, "Channel: %s[%u] set irq_mode=%u\n",
> > > > +		 str_write_read(chan->dir == EDMA_DIR_WRITE),
> > > > +		 chan->id, mode);
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(dw_edma_chan_irq_config);
> > > > +
> > > > +bool dw_edma_chan_ignore_irq(struct dma_chan *dchan)
> > > > +{
> > > > +	struct dw_edma_chan *chan;
> > > > +	struct dw_edma *dw;
> > > > +
> > > > +	if (!dchan || !dchan->device)
> > > > +		return false;
> > > > +
> > > > +	chan = dchan2dw_edma_chan(dchan);
> > > > +	if (!chan)
> > > > +		return false;
> > > > +
> > > > +	dw = chan->dw;
> > > > +	if (dw->chip->flags & DW_EDMA_CHIP_LOCAL)
> > > > +		return chan->irq_mode == DW_EDMA_CH_IRQ_REMOTE;
> > > > +	else
> > > > +		return chan->irq_mode == DW_EDMA_CH_IRQ_LOCAL;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(dw_edma_chan_ignore_irq);
> > > > +
> > > >  MODULE_LICENSE("GPL v2");
> > > >  MODULE_DESCRIPTION("Synopsys DesignWare eDMA controller core driver");
> > > >  MODULE_AUTHOR("Gustavo Pimentel <gustavo.pimentel@synopsys.com>");
> > > > diff --git a/drivers/dma/dw-edma/dw-edma-core.h b/drivers/dma/dw-edma/dw-edma-core.h
> > > > index 71894b9e0b15..8458d676551a 100644
> > > > --- a/drivers/dma/dw-edma/dw-edma-core.h
> > > > +++ b/drivers/dma/dw-edma/dw-edma-core.h
> > > > @@ -81,6 +81,8 @@ struct dw_edma_chan {
> > > >
> > > >  	struct msi_msg			msi;
> > > >
> > > > +	enum dw_edma_ch_irq_mode	irq_mode;
> > > > +
> > > >  	enum dw_edma_request		request;
> > > >  	enum dw_edma_status		status;
> > > >  	u8				configured;
> > > > diff --git a/drivers/dma/dw-edma/dw-edma-v0-core.c b/drivers/dma/dw-edma/dw-edma-v0-core.c
> > > > index 2850a9df80f5..80472148c335 100644
> > > > --- a/drivers/dma/dw-edma/dw-edma-v0-core.c
> > > > +++ b/drivers/dma/dw-edma/dw-edma-v0-core.c
> > > > @@ -256,8 +256,10 @@ dw_edma_v0_core_handle_int(struct dw_edma_irq *dw_irq, enum dw_edma_dir dir,
> > > >  	for_each_set_bit(pos, &val, total) {
> > > >  		chan = &dw->chan[pos + off];
> > > >
> > > > -		dw_edma_v0_core_clear_done_int(chan);
> > > > -		done(chan);
> > > > +		if (!dw_edma_chan_ignore_irq(&chan->vc.chan)) {
> > > > +			dw_edma_v0_core_clear_done_int(chan);
> > > > +			done(chan);
> > > > +		}
> > > >
> > > >  		ret = IRQ_HANDLED;
> > > >  	}
> > > > @@ -267,8 +269,10 @@ dw_edma_v0_core_handle_int(struct dw_edma_irq *dw_irq, enum dw_edma_dir dir,
> > > >  	for_each_set_bit(pos, &val, total) {
> > > >  		chan = &dw->chan[pos + off];
> > > >
> > > > -		dw_edma_v0_core_clear_abort_int(chan);
> > > > -		abort(chan);
> > > > +		if (!dw_edma_chan_ignore_irq(&chan->vc.chan)) {
> > > > +			dw_edma_v0_core_clear_abort_int(chan);
> > > > +			abort(chan);
> > > > +		}
> > > >
> > > >  		ret = IRQ_HANDLED;
> > > >  	}
> > > > @@ -331,7 +335,8 @@ static void dw_edma_v0_core_write_chunk(struct dw_edma_chunk *chunk)
> > > >  		j--;
> > > >  		if (!j) {
> > > >  			control |= DW_EDMA_V0_LIE;
> > > > -			if (!(chan->dw->chip->flags & DW_EDMA_CHIP_LOCAL))
> > > > +			if (!(chan->dw->chip->flags & DW_EDMA_CHIP_LOCAL) &&
> > > > +			    chan->irq_mode != DW_EDMA_CH_IRQ_LOCAL)
> > > >  				control |= DW_EDMA_V0_RIE;
> > > >  		}
> > > >
> > > > @@ -408,12 +413,17 @@ static void dw_edma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
> > > >  				break;
> > > >  			}
> > > >  		}
> > > > -		/* Interrupt unmask - done, abort */
> > > > +		/* Interrupt mask/unmask - done, abort */
> > > >  		raw_spin_lock_irqsave(&dw->lock, flags);
> > > >
> > > >  		tmp = GET_RW_32(dw, chan->dir, int_mask);
> > > > -		tmp &= ~FIELD_PREP(EDMA_V0_DONE_INT_MASK, BIT(chan->id));
> > > > -		tmp &= ~FIELD_PREP(EDMA_V0_ABORT_INT_MASK, BIT(chan->id));
> > > > +		if (chan->irq_mode == DW_EDMA_CH_IRQ_REMOTE) {
> > > > +			tmp |= FIELD_PREP(EDMA_V0_DONE_INT_MASK, BIT(chan->id));
> > > > +			tmp |= FIELD_PREP(EDMA_V0_ABORT_INT_MASK, BIT(chan->id));
> > > > +		} else {
> > > > +			tmp &= ~FIELD_PREP(EDMA_V0_DONE_INT_MASK, BIT(chan->id));
> > > > +			tmp &= ~FIELD_PREP(EDMA_V0_ABORT_INT_MASK, BIT(chan->id));
> > > > +		}
> > > >  		SET_RW_32(dw, chan->dir, int_mask, tmp);
> > > >  		/* Linked list error */
> > > >  		tmp = GET_RW_32(dw, chan->dir, linked_list_err_en);
> > > > diff --git a/include/linux/dma/edma.h b/include/linux/dma/edma.h
> > > > index ffad10ff2cd6..6f50165ac084 100644
> > > > --- a/include/linux/dma/edma.h
> > > > +++ b/include/linux/dma/edma.h
> > > > @@ -60,6 +60,23 @@ enum dw_edma_chip_flags {
> > > >  	DW_EDMA_CHIP_LOCAL	= BIT(0),
> > > >  };
> > > >
> > > > +/*
> > > > + * enum dw_edma_ch_irq_mode - per-channel interrupt routing control
> > > > + * @DW_EDMA_CH_IRQ_DEFAULT:   LIE=1/RIE=1, local interrupt unmasked
> > > > + * @DW_EDMA_CH_IRQ_LOCAL:     LIE=1/RIE=0
> > > > + * @DW_EDMA_CH_IRQ_REMOTE:    LIE=1/RIE=1, local interrupt masked
> > > > + *
> > > > + * Some implementations require using LIE=1/RIE=1 with the local interrupt
> > > > + * masked to generate a remote-only interrupt (rather than LIE=0/RIE=1).
> > > > + * See the DesignWare endpoint databook 5.40, "Hint" below "Figure 8-22
> > > > + * Write Interrupt Generation".
> > > > + */
> > > > +enum dw_edma_ch_irq_mode {
> > > > +	DW_EDMA_CH_IRQ_DEFAULT	= 0,
> > > > +	DW_EDMA_CH_IRQ_LOCAL,
> > > > +	DW_EDMA_CH_IRQ_REMOTE,
> > > > +};
> > > > +
> > > >  /**
> > > >   * struct dw_edma_chip - representation of DesignWare eDMA controller hardware
> > > >   * @dev:		 struct device of the eDMA controller
> > > > @@ -105,6 +122,22 @@ struct dw_edma_chip {
> > > >  #if IS_REACHABLE(CONFIG_DW_EDMA)
> > > >  int dw_edma_probe(struct dw_edma_chip *chip);
> > > >  int dw_edma_remove(struct dw_edma_chip *chip);
> > > > +/**
> > > > + * dw_edma_chan_irq_config - configure per-channel interrupt routing
> > > > + * @chan: DMA channel obtained from dma_request_channel()
> > > > + * @mode: interrupt routing mode
> > > > + *
> > > > + * Returns 0 on success, -EINVAL for invalid @mode, or -ENODEV if @chan does
> > > > + * not belong to the DesignWare eDMA driver.
> > > > + */
> > > > +int dw_edma_chan_irq_config(struct dma_chan *chan,
> > > > +			    enum dw_edma_ch_irq_mode mode);
> > > > +
> > > > +/**
> > > > + * dw_edma_chan_ignore_irq - tell whether local IRQ handling should be ignored
> > > > + * @chan: DMA channel obtained from dma_request_channel()
> > > > + */
> > > > +bool dw_edma_chan_ignore_irq(struct dma_chan *chan);
> > > >  #else
> > > >  static inline int dw_edma_probe(struct dw_edma_chip *chip)
> > > >  {
> > > > @@ -115,6 +148,17 @@ static inline int dw_edma_remove(struct dw_edma_chip *chip)
> > > >  {
> > > >  	return 0;
> > > >  }
> > > > +
> > > > +static inline int dw_edma_chan_irq_config(struct dma_chan *chan,
> > > > +					  enum dw_edma_ch_irq_mode mode)
> > > > +{
> > > > +	return -ENODEV;
> > > > +}
> > > > +
> > > > +static inline bool dw_edma_chan_ignore_irq(struct dma_chan *chan)
> > > > +{
> > > > +	return false;
> > > > +}
> > >
> > > I think it'd be better to go through
> > >
> > > struct dma_slave_config {
> > > 	...
> > >         void *peripheral_config;
> > > 	size_t peripheral_size;
> > >
> > > };
> > >
> > > So DMA consumer can use standard DMAengine API, dmaengine_slave_config().
> >
> > Using .peripheral_config wasn't something I had initially considered, but I
> > agree that this is preferable in the sense that it avoids introducing the
> > additional exported APIs. I'm not entirely sure whether it's clean to use
> > it for non-peripheral settings in the strict sense, but there seem to be
> > precedents such as stm32_mdma_dma_config, so I guess it seems acceptable.
> > If I'm missing something, please correct me.
>
> Strictly speaking slave config should be used for peripheral transfers.
> For memcpy users (this seems more like that), I would argue slave config
> does not make much sense.

It is not memcpy because the address on one side is not visible. It is
really hard to define the difference between memcpy and slave transfer
because:
- some slave transfers use MMIO where the address increments on both
sides, unlike a traditional FIFO; that makes them look more like memcpy.
- although it looks like memcpy, some slaves have limitations, such as
requiring 4-byte alignment or limiting the burst length of each transfer.
Generally, memcpy has no such limitations (except those from the
dmaengine itself).
- the slave address may not be visible to the system on one side.

So dw-edma does not use prep_memcpy; it uses prep_sg to do the data
transfer.

Frank

Frank
>
> --
> ~Vinod


end of thread, other threads:[~2026-01-23 15:44 UTC | newest]

Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-01-18 13:54 [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 01/38] dmaengine: dw-edma: Export helper to get integrated register window Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 02/38] dmaengine: dw-edma: Add per-channel interrupt routing control Koichiro Den
2026-01-18 17:03   ` Frank Li
2026-01-19 14:26     ` Koichiro Den
2026-01-21 16:02       ` Vinod Koul
2026-01-22  2:44         ` Koichiro Den
2026-01-23 15:44         ` Frank Li
2026-01-18 13:54 ` [RFC PATCH v4 03/38] dmaengine: dw-edma: Poll completion when local IRQ handling is disabled Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 04/38] dmaengine: dw-edma: Add notify-only channels support Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 05/38] dmaengine: dw-edma: Add a helper to query linked-list region Koichiro Den
2026-01-18 17:05   ` Frank Li
2026-01-21  1:38     ` Koichiro Den
2026-01-21  8:41       ` Koichiro Den
2026-01-21 15:24         ` Frank Li
2026-01-22  1:19           ` Koichiro Den
2026-01-22  1:54             ` Frank Li
2026-01-18 13:54 ` [RFC PATCH v4 06/38] NTB: epf: Add mwN_offset support and config region versioning Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 07/38] NTB: epf: Reserve a subset of MSI vectors for non-NTB users Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 08/38] NTB: epf: Provide db_vector_count/db_vector_mask callbacks Koichiro Den
2026-01-19 20:03   ` Frank Li
2026-01-21  1:41     ` Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 09/38] NTB: core: Add mw_set_trans_ranges() for subrange programming Koichiro Den
2026-01-19 20:07   ` Frank Li
2026-01-18 13:54 ` [RFC PATCH v4 10/38] NTB: core: Add .get_private_data() to ntb_dev_ops Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 11/38] NTB: core: Add .get_dma_dev() " Koichiro Den
2026-01-19 20:09   ` Frank Li
2026-01-21  1:44     ` Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 12/38] NTB: core: Add driver_override support for NTB devices Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 13/38] PCI: endpoint: pci-epf-vntb: Support BAR subrange mappings for MWs Koichiro Den
2026-01-19 20:26   ` Frank Li
2026-01-21  2:08     ` Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 14/38] PCI: endpoint: pci-epf-vntb: Implement .get_private_data() callback Koichiro Den
2026-01-19 20:27   ` Frank Li
2026-01-18 13:54 ` [RFC PATCH v4 15/38] PCI: endpoint: pci-epf-vntb: Implement .get_dma_dev() Koichiro Den
2026-01-19 20:30   ` Frank Li
2026-01-22 14:58     ` Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 16/38] NTB: ntb_transport: Move TX memory window setup into setup_qp_mw() Koichiro Den
2026-01-19 20:36   ` Frank Li
2026-01-21  2:15     ` Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 17/38] NTB: ntb_transport: Dynamically determine qp count Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 18/38] NTB: ntb_transport: Use ntb_get_dma_dev() Koichiro Den
2026-01-19 20:38   ` Frank Li
2026-01-18 13:54 ` [RFC PATCH v4 19/38] NTB: ntb_transport: Rename ntb_transport.c to ntb_transport_core.c Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 20/38] NTB: ntb_transport: Move internal types to ntb_transport_internal.h Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 21/38] NTB: ntb_transport: Export common helpers for modularization Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 22/38] NTB: ntb_transport: Split core library and default NTB client Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 23/38] NTB: ntb_transport: Add transport backend infrastructure Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 24/38] NTB: ntb_transport: Run ntb_set_mw() before link-up negotiation Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 25/38] NTB: hw: Add remote eDMA backend registry and DesignWare backend Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 26/38] NTB: ntb_transport: Add remote embedded-DMA transport client Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 27/38] ntb_netdev: Multi-queue support Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 28/38] iommu: ipmmu-vmsa: Add PCIe ch0 to devices_allowlist Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 29/38] iommu: ipmmu-vmsa: Add support for reserved regions Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 30/38] arm64: dts: renesas: Add Spider RC/EP DTs for NTB with remote DW PCIe eDMA Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 31/38] NTB: epf: Add per-SoC quirk to cap MRRS for DWC eDMA (128B for R-Car) Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 32/38] NTB: epf: Add an additional memory window (MW2) barno mapping on Renesas R-Car Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 33/38] Documentation: PCI: endpoint: pci-epf-vntb: Update and add mwN_offset usage Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 34/38] Documentation: driver-api: ntb: Document remote embedded-DMA transport Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 35/38] PCI: endpoint: pci-epf-test: Add pci_epf_test_next_free_bar() helper Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 36/38] PCI: endpoint: pci-epf-test: Add remote eDMA-backed mode Koichiro Den
2026-01-19 20:47   ` Frank Li
2026-01-22 14:54     ` Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 37/38] misc: pci_endpoint_test: Add remote eDMA transfer test mode Koichiro Den
2026-01-18 13:54 ` [RFC PATCH v4 38/38] selftests: pci_endpoint: Add remote eDMA transfer coverage Koichiro Den
2026-01-20 18:30 ` [RFC PATCH v4 00/38] NTB transport backed by PCI EP embedded DMA Dave Jiang
2026-01-20 18:47   ` Dave Jiang
2026-01-21  2:40     ` Koichiro Den
