All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] dma/ae4dma: add AMD AE4DMA DMA PMD
@ 2026-05-18 18:18 Raghavendra Ningoji
  2026-05-21 14:28 ` David Marchand
  2026-05-25 18:42 ` [PATCH v2 0/3] " Raghavendra Ningoji
  0 siblings, 2 replies; 24+ messages in thread
From: Raghavendra Ningoji @ 2026-05-18 18:18 UTC (permalink / raw)
  To: dev; +Cc: thomas, rjarry, Bhagyada.Modali, Selwin.Sebastian,
	Raghavendra Ningoji

Add a new dmadev poll-mode driver for the AMD AE4DMA hardware DMA
engine.  An AE4DMA engine exposes 16 hardware command queues, each
with a 32-entry descriptor ring; the PMD maps each hardware channel
to its own dmadev with a single virtual channel, so a PCI function
appears as 16 dmadevs named "<pci-bdf>-ch0" .. "<pci-bdf>-ch15".

Driver characteristics:

 - Memory-to-memory copy operations only (RTE_DMA_CAPA_MEM_TO_MEM).
 - Completion is detected via the hardware's per-queue read_idx
   register, which the engine advances as it processes descriptors.
   The descriptor status / err_code bytes are read only to classify
   each drained slot as success or failure.
 - vchan_status reports IDLE/ACTIVE based on HW read_idx vs write_idx
   and HALTED_ERROR when the queue is not enabled.
 - depends on bus_pci and dmadev.

Signed-off-by: Raghavendra Ningoji <raghavendra.ningoji@amd.com>
---
 MAINTAINERS                            |   5 +
 doc/guides/dmadevs/ae4dma.rst          |  75 +++
 doc/guides/dmadevs/index.rst           |   1 +
 doc/guides/rel_notes/release_26_07.rst |   7 +
 drivers/dma/ae4dma/ae4dma_dmadev.c     | 742 +++++++++++++++++++++++++
 drivers/dma/ae4dma/ae4dma_hw_defs.h    | 164 ++++++
 drivers/dma/ae4dma/ae4dma_internal.h   | 117 ++++
 drivers/dma/ae4dma/meson.build         |   7 +
 drivers/dma/meson.build                |   1 +
 usertools/dpdk-devbind.py              |   5 +-
 10 files changed, 1123 insertions(+), 1 deletion(-)
 create mode 100644 doc/guides/dmadevs/ae4dma.rst
 create mode 100644 drivers/dma/ae4dma/ae4dma_dmadev.c
 create mode 100644 drivers/dma/ae4dma/ae4dma_hw_defs.h
 create mode 100644 drivers/dma/ae4dma/ae4dma_internal.h
 create mode 100644 drivers/dma/ae4dma/meson.build

diff --git a/MAINTAINERS b/MAINTAINERS
index 9143d028bc..0b5a6e08d8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1361,6 +1361,11 @@ F: doc/guides/compressdevs/features/zsda.ini
 DMAdev Drivers
 --------------
 
+AMD AE4DMA
+M: Bhagyada Modali <Bhagyada.Modali@amd.com>
+F: drivers/dma/ae4dma/
+F: doc/guides/dmadevs/ae4dma.rst
+
 Intel IDXD - EXPERIMENTAL
 M: Bruce Richardson <bruce.richardson@intel.com>
 M: Kevin Laatz <kevin.laatz@intel.com>
diff --git a/doc/guides/dmadevs/ae4dma.rst b/doc/guides/dmadevs/ae4dma.rst
new file mode 100644
index 0000000000..37a2096ccf
--- /dev/null
+++ b/doc/guides/dmadevs/ae4dma.rst
@@ -0,0 +1,75 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2025 Advanced Micro Devices, Inc.
+
+.. include:: <isonum.txt>
+
+AMD AE4DMA DMA Device Driver
+============================
+
+The ``ae4dma`` dmadev driver is a poll-mode driver (PMD) for the
+AMD AE4DMA hardware DMA engine. The engine exposes 16 independent
+hardware command queues, each with a ring of 32 descriptors. The PMD
+maps each hardware command queue to a separate DPDK dmadev with a
+single virtual channel, so a single PCI function appears as 16 dmadevs
+named ``<pci-bdf>-ch0`` through ``<pci-bdf>-ch15``.
+
+The driver supports memory-to-memory copy operations only.
+
+Hardware Requirements
+---------------------
+
+The ``dpdk-devbind.py`` script can be used to list AE4DMA devices on
+the system::
+
+   dpdk-devbind.py --status-dev dma
+
+AE4DMA devices appear with vendor ID ``0x1022`` and device ID
+``0x149b``.
+
+Compilation
+-----------
+
+The driver is built as part of the standard DPDK build on x86 platforms
+using ``meson`` and ``ninja``; no extra configuration is required.
+
+Device Setup
+------------
+
+The AE4DMA device must be bound to a DPDK-compatible kernel module such
+as ``vfio-pci`` before it can be used::
+
+   dpdk-devbind.py -b vfio-pci <pci-bdf>
+
+Initialization
+~~~~~~~~~~~~~~
+
+On probe the PMD performs the following steps for each PCI function:
+
+* Reads BAR0 and programs the common configuration register with the
+  number of hardware queues to enable (16).
+* For each hardware queue it allocates a 32-entry descriptor ring in
+  IOVA-contiguous memory, programs the queue base address and ring
+  depth into the per-queue registers, and enables the queue.
+* Interrupts are masked; completion is polled by the application.
+
+Usage
+-----
+
+Once a dmadev has been started, copies are submitted with
+``rte_dma_copy()`` and completions are reaped with ``rte_dma_completed()``
+or ``rte_dma_completed_status()``. See the
+:ref:`Enqueue / Dequeue API <dmadev_enqueue_dequeue>` section of the
+dmadev library documentation for details.
+
+Limitations
+-----------
+
+* Only memory-to-memory copies are supported. Fill, scatter-gather and
+  any other operation types are not advertised in
+  ``rte_dma_info::dev_capa``.
+* The maximum number of descriptors per virtual channel is fixed by
+  hardware at 32. The PMD rounds the requested ring size up to a
+  power of two and clamps it to 32.
+* Only a single virtual channel per dmadev is supported; use the 16
+  per-PCI-function dmadevs to obtain channel-level parallelism.
+* Interrupt-driven completion is not supported.
diff --git a/doc/guides/dmadevs/index.rst b/doc/guides/dmadevs/index.rst
index 56beb1733f..97399590f6 100644
--- a/doc/guides/dmadevs/index.rst
+++ b/doc/guides/dmadevs/index.rst
@@ -11,6 +11,7 @@ an application through DMA API.
    :maxdepth: 1
    :numbered:
 
+   ae4dma
    cnxk
    dpaa
    dpaa2
diff --git a/doc/guides/rel_notes/release_26_07.rst b/doc/guides/rel_notes/release_26_07.rst
index f012d47a4b..9a78a7ef62 100644
--- a/doc/guides/rel_notes/release_26_07.rst
+++ b/doc/guides/rel_notes/release_26_07.rst
@@ -63,6 +63,13 @@ New Features
     ``rte_eal_init`` and the application is responsible for probing each device,
   * ``--auto-probing`` enables the initial bus probing, which is the current default behavior.
 
+* **Added AMD AE4DMA DMA PMD.**
+
+  Added a new ``dma/ae4dma`` driver for the AMD AE4DMA hardware DMA engine.
+  Each PCI function exposes 16 hardware command queues; the PMD registers one
+  dmadev per channel with a single virtual channel and supports
+  memory-to-memory copy operations.
+
 
 Removed Items
 -------------
diff --git a/drivers/dma/ae4dma/ae4dma_dmadev.c b/drivers/dma/ae4dma/ae4dma_dmadev.c
new file mode 100644
index 0000000000..eb6ea88f55
--- /dev/null
+++ b/drivers/dma/ae4dma/ae4dma_dmadev.c
@@ -0,0 +1,742 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021-2025 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#include <errno.h>
+#include <inttypes.h>
+#include <stdio.h>
+#include <string.h>
+
+#include <rte_bus_pci.h>
+#include <bus_pci_driver.h>
+#include <rte_dmadev_pmd.h>
+#include <rte_malloc.h>
+
+#include "ae4dma_internal.h"
+
+/*
+ * One dmadev per AE4DMA hardware channel; each dmadev has exactly one
+ * virtual channel. The HW's per-queue register block must be densely
+ * packed right after the engine-common config register at BAR0+0; the
+ * build-time check below catches an accidental layout change.
+ */
+static_assert(sizeof(struct ae4dma_hwq_regs) == 32,
+		"ae4dma_hwq_regs stride changed; per-queue offset math will break");
+
+RTE_LOG_REGISTER_DEFAULT(ae4dma_pmd_logtype, INFO);
+
+#define AE4DMA_PMD_NAME dmadev_ae4dma
+#define AE4DMA_PMD_NAME_STR RTE_STR(AE4DMA_PMD_NAME)
+
+static const struct rte_memzone *
+ae4dma_queue_dma_zone_reserve(const char *queue_name,
+		uint32_t queue_size, int socket_id)
+{
+	const struct rte_memzone *mz;
+
+	mz = rte_memzone_lookup(queue_name);
+	if (mz != 0) {
+		if (((size_t)queue_size <= mz->len) &&
+				((socket_id == SOCKET_ID_ANY) ||
+				 (socket_id == mz->socket_id))) {
+			AE4DMA_PMD_INFO("re-use memzone already "
+					"allocated for %s", queue_name);
+			return mz;
+		}
+		AE4DMA_PMD_ERR("Incompatible memzone already "
+				"allocated %s, size %u, socket %d. "
+				"Requested size %u, socket %u",
+				queue_name, (uint32_t)mz->len,
+				mz->socket_id, queue_size, socket_id);
+		return NULL;
+	}
+	return rte_memzone_reserve_aligned(queue_name, queue_size,
+			socket_id, RTE_MEMZONE_IOVA_CONTIG, queue_size);
+}
+
+/* Configure a device. */
+static int
+ae4dma_dev_configure(struct rte_dma_dev *dev __rte_unused,
+		const struct rte_dma_conf *dev_conf,
+		uint32_t conf_sz)
+{
+	if (sizeof(struct rte_dma_conf) != conf_sz)
+		return -EINVAL;
+
+	if (dev_conf->nb_vchans != 1)
+		return -EINVAL;
+
+	return 0;
+}
+
+/* Setup a virtual channel for AE4DMA, only 1 vchan is supported per dmadev. */
+static int
+ae4dma_vchan_setup(struct rte_dma_dev *dev, uint16_t vchan __rte_unused,
+		const struct rte_dma_vchan_conf *qconf, uint32_t qconf_sz)
+{
+	struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+	uint16_t max_desc = qconf->nb_desc;
+
+	if (sizeof(struct rte_dma_vchan_conf) != qconf_sz)
+		return -EINVAL;
+
+	if (max_desc < 2)
+		return -EINVAL;
+
+	if (!rte_is_power_of_2(max_desc))
+		max_desc = rte_align32pow2(max_desc);
+
+	if (max_desc > AE4DMA_DESCRIPTORS_PER_CMDQ) {
+		AE4DMA_PMD_DEBUG("DMA dev %u nb_desc clamped to %u",
+				dev->data->dev_id, AE4DMA_DESCRIPTORS_PER_CMDQ);
+		max_desc = AE4DMA_DESCRIPTORS_PER_CMDQ;
+	}
+
+	cmd_q->qcfg = *qconf;
+	cmd_q->qcfg.nb_desc = max_desc;
+
+	/* Ensure all counters are reset, if reconfiguring/restarting device. */
+	memset(&cmd_q->stats, 0, sizeof(cmd_q->stats));
+	return 0;
+}
+
+/* Start a configured device. */
+static int
+ae4dma_dev_start(struct rte_dma_dev *dev)
+{
+	struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+	uint16_t nb = cmd_q->qcfg.nb_desc;
+
+	if (nb == 0)
+		return -EBUSY;
+
+	/* Program ring depth expected by hardware. */
+	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->max_idx, nb);
+	return 0;
+}
+
+/* Stop a configured device. */
+static int
+ae4dma_dev_stop(struct rte_dma_dev *dev)
+{
+	struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+
+	if (cmd_q->hwq_regs != NULL)
+		AE4DMA_WRITE_REG(&cmd_q->hwq_regs->control_reg.control_raw,
+				AE4DMA_CMD_QUEUE_DISABLE);
+	return 0;
+}
+
+/* Get device information of a device. */
+static int
+ae4dma_dev_info_get(const struct rte_dma_dev *dev, struct rte_dma_info *info,
+		uint32_t size)
+{
+	if (size < sizeof(*info))
+		return -EINVAL;
+	info->dev_name = dev->device->name;
+	info->dev_capa = RTE_DMA_CAPA_MEM_TO_MEM;
+	info->max_vchans = 1;
+	info->min_desc = 2;
+	info->max_desc = AE4DMA_DESCRIPTORS_PER_CMDQ;
+	info->nb_vchans = 1;
+	return 0;
+}
+
+/* Close a configured device. */
+static int
+ae4dma_dev_close(struct rte_dma_dev *dev)
+{
+	struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+
+	if (cmd_q->hwq_regs != NULL)
+		AE4DMA_WRITE_REG(&cmd_q->hwq_regs->control_reg.control_raw,
+				AE4DMA_CMD_QUEUE_DISABLE);
+
+	if (cmd_q->memz_name[0] != '\0') {
+		const struct rte_memzone *mz = rte_memzone_lookup(cmd_q->memz_name);
+
+		if (mz != NULL)
+			rte_memzone_free(mz);
+	}
+	cmd_q->qbase_desc = NULL;
+	cmd_q->qbase_addr = NULL;
+	cmd_q->qbase_phys_addr = 0;
+	return 0;
+}
+
+/* trigger h/w to process enqued desc:doorbell - by next_write */
+static inline void
+__submit(struct ae4dma_dmadev *ae4dma)
+{
+	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+	uint16_t write_idx = cmd_q->next_write;
+	uint16_t nb = cmd_q->qcfg.nb_desc;
+
+	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->write_idx, write_idx);
+	if (nb != 0)
+		cmd_q->stats.submitted += (uint16_t)((cmd_q->next_write - cmd_q->last_write +
+				nb) % nb);
+	cmd_q->last_write = cmd_q->next_write;
+}
+
+static int
+ae4dma_submit(void *dev_private, uint16_t vchan __rte_unused)
+{
+	struct ae4dma_dmadev *ae4dma = dev_private;
+
+	__submit(ae4dma);
+	return 0;
+}
+
+/* Write descriptor for enqueue (copy only). */
+static inline int
+__write_desc_copy(void *dev_private, rte_iova_t src, phys_addr_t dst,
+		uint32_t len, uint64_t flags)
+{
+	struct ae4dma_dmadev *ae4dma = dev_private;
+	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+	struct ae4dma_desc *dma_desc;
+	uint16_t ret;
+	uint16_t nb = cmd_q->qcfg.nb_desc;
+	uint16_t write = cmd_q->next_write;
+
+	if (nb == 0)
+		return -EINVAL;
+
+	/* Reserve one slot to distinguish full from empty (power-of-two ring). */
+	if ((uint32_t)cmd_q->ring_buff_count >= (uint32_t)(nb - 1))
+		return -ENOSPC;
+
+	dma_desc = &cmd_q->qbase_desc[write];
+	memset(dma_desc, 0, sizeof(*dma_desc));
+	dma_desc->length = len;
+	dma_desc->src_hi = upper_32_bits(src);
+	dma_desc->src_lo = lower_32_bits(src);
+	dma_desc->dst_hi = upper_32_bits(dst);
+	dma_desc->dst_lo = lower_32_bits(dst);
+	cmd_q->ring_buff_count++;
+	cmd_q->next_write = (uint16_t)((write + 1) % nb);
+	ret = write;
+	if (flags & RTE_DMA_OP_FLAG_SUBMIT)
+		__submit(ae4dma);
+	return ret;
+}
+
+/* Enqueue a copy operation onto the ae4dma device. */
+static int
+ae4dma_enqueue_copy(void *dev_private, uint16_t vchan __rte_unused,
+		rte_iova_t src, rte_iova_t dst, uint32_t length, uint64_t flags)
+{
+	return __write_desc_copy(dev_private, src, dst, length, flags);
+}
+
+/* Dump DMA device info. */
+static int
+ae4dma_dev_dump(const struct rte_dma_dev *dev, FILE *f)
+{
+	struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+	struct ae4dma_cmd_queue *cmd_q;
+	void *ae4dma_mmio_base_addr = (uint8_t *)ae4dma->io_regs;
+
+	cmd_q = &ae4dma->cmd_q;
+	fprintf(f, "cmd_q->id              = %" PRIx64 "\n", cmd_q->id);
+	fprintf(f, "cmd_q->qidx            = %" PRIx64 "\n", cmd_q->qidx);
+	fprintf(f, "cmd_q->qsize           = %" PRIx64 "\n", cmd_q->qsize);
+	fprintf(f, "mmio_base_addr	= %p\n", ae4dma_mmio_base_addr);
+	fprintf(f, "queues per ae4dma engine     = %d\n", AE4DMA_READ_REG_OFFSET(
+				ae4dma_mmio_base_addr, AE4DMA_COMMON_CONFIG_OFFSET));
+	fprintf(f, "== Private Data ==\n");
+	fprintf(f, "  Config: { ring_size: %u }\n", cmd_q->qcfg.nb_desc);
+	fprintf(f, "  Ring virt: %p\tphys: %#" PRIx64 "\n",
+			(void *)cmd_q->qbase_desc,
+			(uint64_t)cmd_q->qbase_phys_addr);
+	fprintf(f, "  Next write: %u\n", cmd_q->next_write);
+	fprintf(f, "  Next read: %u\n", cmd_q->next_read);
+	fprintf(f, "  current queue depth: %u\n", cmd_q->ring_buff_count);
+	fprintf(f, "  }\n");
+	fprintf(f, "  Key Stats { submitted: %" PRIu64 ", comp: %" PRIu64 ", failed: %" PRIu64 " }\n",
+		cmd_q->stats.submitted,
+		cmd_q->stats.completed,
+		cmd_q->stats.errors);
+	return 0;
+}
+
+/* Translates AE4DMA ChanERRs to DMA error codes. */
+static inline enum rte_dma_status_code
+__translate_status_ae4dma_to_dma(enum ae4dma_dma_err status)
+{
+	AE4DMA_PMD_DEBUG("ae4dma desc status = %d", status);
+
+	switch (status) {
+	case AE4DMA_DMA_ERR_NO_ERR:
+		return RTE_DMA_STATUS_SUCCESSFUL;
+	case AE4DMA_DMA_ERR_INV_LEN:
+		return RTE_DMA_STATUS_INVALID_LENGTH;
+	case AE4DMA_DMA_ERR_INV_SRC:
+		return RTE_DMA_STATUS_INVALID_SRC_ADDR;
+	case AE4DMA_DMA_ERR_INV_DST:
+		return RTE_DMA_STATUS_INVALID_DST_ADDR;
+	case AE4DMA_DMA_ERR_INV_ALIGN:
+		/* Name matches DPDK public enum spelling. */
+		return RTE_DMA_STATUS_DATA_POISION;
+	case AE4DMA_DMA_ERR_INV_HEADER:
+	case AE4DMA_DMA_ERR_INV_STATUS:
+		return RTE_DMA_STATUS_ERROR_UNKNOWN;
+	default:
+		return RTE_DMA_STATUS_ERROR_UNKNOWN;
+	}
+}
+
+/*
+ * Scan HW queue for completed descriptors (non-blocking).
+ *
+ * The AE4DMA engine signals completion by advancing the per-queue
+ * `read_idx` register; it does not (reliably) write a status value
+ * back into the descriptor. We therefore use the HW `read_idx`
+ * register as the source of truth and only inspect the descriptor's
+ * `dw1.err_code` byte to classify each completion as success or
+ * failure.
+ *
+ * @param cmd_q
+ *   The AE4DMA command queue.
+ * @param max_ops
+ *   Maximum descriptors to process this call.
+ * @param[out] failed_count
+ *   Number of completed descriptors that did not report success.
+ * @return
+ *   Number of descriptors completed (success + failure), <= max_ops.
+ */
+static inline uint16_t
+ae4dma_scan_hwq(struct ae4dma_cmd_queue *cmd_q, uint16_t max_ops,
+		uint16_t *failed_count)
+{
+	volatile struct ae4dma_desc *hw_desc;
+	uint16_t events_count = 0, fails = 0;
+	uint16_t tail;
+	uint16_t nb = cmd_q->qcfg.nb_desc;
+	uint16_t mask;
+	uint16_t hw_read_idx;
+	uint16_t in_flight;
+	uint16_t scan_cap;
+
+	if (nb == 0 || cmd_q->ring_buff_count == 0) {
+		*failed_count = 0;
+		return 0;
+	}
+	mask = nb - 1;
+
+	hw_read_idx = (uint16_t)(AE4DMA_READ_REG(&cmd_q->hwq_regs->read_idx) & mask);
+	tail = cmd_q->next_read;
+
+	/*
+	 * Descriptors completed since our last visit live in the
+	 * half-open ring range [tail, hw_read_idx). If HW hasn't
+	 * moved we have nothing to do.
+	 */
+	in_flight = (uint16_t)((hw_read_idx - tail) & mask);
+	if (in_flight == 0) {
+		*failed_count = 0;
+		return 0;
+	}
+
+	scan_cap = max_ops;
+	if (scan_cap > AE4DMA_DESCRIPTORS_PER_CMDQ)
+		scan_cap = AE4DMA_DESCRIPTORS_PER_CMDQ;
+	if (scan_cap > in_flight)
+		scan_cap = in_flight;
+	if (scan_cap > cmd_q->ring_buff_count)
+		scan_cap = (uint16_t)cmd_q->ring_buff_count;
+
+	while (events_count < scan_cap) {
+		uint8_t hw_status;
+		uint8_t hw_err;
+
+		hw_desc = &cmd_q->qbase_desc[tail];
+		hw_status = hw_desc->dw1.status;
+		hw_err = hw_desc->dw1.err_code;
+
+		/*
+		 * read_idx advancing is the definitive completion
+		 * signal. The per-descriptor status byte is informational
+		 * and may not yet be written when we observe it:
+		 *
+		 *   AE4DMA_DMA_DESC_ERROR (4)
+		 *     Hard failure - err_code names the precise cause.
+		 *   AE4DMA_DMA_DESC_COMPLETED (3) or 0
+		 *     Success.
+		 *   AE4DMA_DMA_DESC_VALIDATED (1) / _PROCESSED (2)
+		 *     Benign race: HW had not finished updating the
+		 *     status byte at the instant we read it. Since
+		 *     read_idx has moved past this slot, treat it as
+		 *     success unless err_code says otherwise.
+		 *
+		 * A non-zero err_code is treated as a failure regardless
+		 * of the observed status value.
+		 */
+		if (hw_status == AE4DMA_DMA_DESC_ERROR ||
+				hw_err != AE4DMA_DMA_ERR_NO_ERR) {
+			fails++;
+			AE4DMA_PMD_WARN("Desc failed: status=%u err=%u",
+					hw_status, hw_err);
+		}
+		cmd_q->status[events_count] = (enum ae4dma_dma_err)hw_err;
+		cmd_q->ring_buff_count--;
+		events_count++;
+		tail = (tail + 1) & mask;
+	}
+
+	cmd_q->stats.completed += events_count;
+	cmd_q->stats.errors += fails;
+	cmd_q->next_read = tail;
+	*failed_count = fails;
+	return events_count;
+}
+
+/* Returns successful operations count and sets error flag if any errors. */
+static uint16_t
+ae4dma_completed(void *dev_private, uint16_t vchan __rte_unused,
+		const uint16_t max_ops, uint16_t *last_idx, bool *has_error)
+{
+	struct ae4dma_dmadev *ae4dma = dev_private;
+	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+	uint16_t cpl_count, sl_count;
+	uint16_t err_count = 0;
+	uint16_t nb = cmd_q->qcfg.nb_desc;
+
+	*has_error = false;
+
+	cpl_count = ae4dma_scan_hwq(cmd_q, max_ops, &err_count);
+
+	if (cpl_count > max_ops)
+		cpl_count = max_ops;
+
+	if (cpl_count > 0 && last_idx != NULL)
+		*last_idx = (uint16_t)((cmd_q->next_read - 1 + nb) % nb);
+
+	sl_count = cpl_count - err_count;
+	if (err_count)
+		*has_error = true;
+
+	return sl_count;
+}
+
+static uint16_t
+ae4dma_completed_status(void *dev_private, uint16_t vchan __rte_unused,
+		uint16_t max_ops, uint16_t *last_idx,
+		enum rte_dma_status_code *status)
+{
+	struct ae4dma_dmadev *ae4dma = dev_private;
+	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+	uint16_t cpl_count;
+	uint16_t i;
+	uint16_t err_count = 0;
+	uint16_t nb = cmd_q->qcfg.nb_desc;
+
+	cpl_count = ae4dma_scan_hwq(cmd_q, max_ops, &err_count);
+
+	if (cpl_count > max_ops)
+		cpl_count = max_ops;
+
+	if (cpl_count > 0 && last_idx != NULL)
+		*last_idx = (uint16_t)((cmd_q->next_read - 1 + nb) % nb);
+
+	if (likely(err_count == 0)) {
+		for (i = 0; i < cpl_count; i++)
+			status[i] = RTE_DMA_STATUS_SUCCESSFUL;
+	} else {
+		for (i = 0; i < cpl_count; i++)
+			status[i] = __translate_status_ae4dma_to_dma(cmd_q->status[i]);
+	}
+
+	return cpl_count;
+}
+
+/* Get the remaining capacity of the ring. */
+static uint16_t
+ae4dma_burst_capacity(const void *dev_private, uint16_t vchan __rte_unused)
+{
+	const struct ae4dma_dmadev *ae4dma = dev_private;
+	const struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+	uint16_t nb = cmd_q->qcfg.nb_desc;
+	uint16_t mask;
+	uint16_t read_idx = cmd_q->next_read;
+	uint16_t write_idx = cmd_q->next_write;
+	uint16_t used;
+
+	if (nb < 2 || !rte_is_power_of_2(nb))
+		return 0;
+
+	mask = nb - 1;
+	used = (uint16_t)((write_idx - read_idx) & mask);
+	/* One slot reserved (same rule as enqueue). */
+	if (used >= nb - 1)
+		return 0;
+	return (uint16_t)(nb - 1 - used);
+}
+
+/* Retrieve the generic stats of a DMA device. */
+static int
+ae4dma_stats_get(const struct rte_dma_dev *dev, uint16_t vchan __rte_unused,
+		struct rte_dma_stats *rte_stats, uint32_t size)
+{
+	const struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+	const struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+	const struct rte_dma_stats *stats = &cmd_q->stats;
+
+	if (size < sizeof(*rte_stats))
+		return -EINVAL;
+	if (rte_stats == NULL)
+		return -EINVAL;
+
+	*rte_stats = *stats;
+	return 0;
+}
+
+/* Reset the generic stat counters for the DMA device. */
+static int
+ae4dma_stats_reset(struct rte_dma_dev *dev, uint16_t vchan __rte_unused)
+{
+	struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+
+	memset(&cmd_q->stats, 0, sizeof(cmd_q->stats));
+	return 0;
+}
+
+/*
+ * Report channel state to the dmadev framework.
+ *
+ *   RTE_DMA_VCHAN_HALTED_ERROR - HW queue is disabled (never started, or
+ *                                stopped via dev_stop()).
+ *   RTE_DMA_VCHAN_IDLE         - HW has caught up: read_idx == write_idx,
+ *                                no descriptors in flight.
+ *   RTE_DMA_VCHAN_ACTIVE       - HW still has descriptors to process.
+ */
+static int
+ae4dma_vchan_status(const struct rte_dma_dev *dev, uint16_t vchan __rte_unused,
+		enum rte_dma_vchan_status *status)
+{
+	const struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+	const struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+	uint32_t ctrl, hw_read, hw_write;
+
+	if (cmd_q->hwq_regs == NULL) {
+		*status = RTE_DMA_VCHAN_HALTED_ERROR;
+		return 0;
+	}
+
+	ctrl = AE4DMA_READ_REG(&cmd_q->hwq_regs->control_reg.control_raw);
+	if ((ctrl & AE4DMA_CMD_QUEUE_ENABLE) == 0) {
+		*status = RTE_DMA_VCHAN_HALTED_ERROR;
+		return 0;
+	}
+
+	hw_read  = AE4DMA_READ_REG(&cmd_q->hwq_regs->read_idx);
+	hw_write = AE4DMA_READ_REG(&cmd_q->hwq_regs->write_idx);
+
+	*status = (hw_read == hw_write) ? RTE_DMA_VCHAN_IDLE
+					: RTE_DMA_VCHAN_ACTIVE;
+	return 0;
+}
+
+static int
+ae4dma_add_queue(struct ae4dma_dmadev *dev, uint8_t qn, const char *pci_name)
+{
+	uint32_t dma_addr_lo, dma_addr_hi;
+	struct ae4dma_cmd_queue *cmd_q;
+	const struct rte_memzone *q_mz;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	dev->io_regs = dev->pci->mem_resource[AE4DMA_PCIE_BAR].addr;
+
+	cmd_q = &dev->cmd_q;
+	cmd_q->id = qn;
+	cmd_q->qidx = 0;
+	cmd_q->qsize = AE4DMA_QUEUE_SIZE(AE4DMA_QUEUE_DESC_SIZE);
+	cmd_q->hwq_regs = (volatile struct ae4dma_hwq_regs *)dev->io_regs + (qn + 1);
+
+	/*
+	 * Memzone name must be globally unique. Embed PCI BDF so multiple
+	 * PCI functions probed concurrently don't collide.
+	 */
+	snprintf(cmd_q->memz_name, sizeof(cmd_q->memz_name),
+			"ae4dma_%s_q%u", pci_name, (unsigned int)qn);
+
+	q_mz = ae4dma_queue_dma_zone_reserve(cmd_q->memz_name,
+			cmd_q->qsize, rte_socket_id());
+	if (q_mz == NULL) {
+		AE4DMA_PMD_ERR("memzone reserve failed for %s", cmd_q->memz_name);
+		return -ENOMEM;
+	}
+
+	cmd_q->qbase_addr = (void *)q_mz->addr;
+	cmd_q->qbase_desc = (struct ae4dma_desc *)q_mz->addr;
+	cmd_q->qbase_phys_addr = q_mz->iova;
+
+	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->max_idx, AE4DMA_DESCRIPTORS_PER_CMDQ);
+	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->control_reg.control_raw,
+			AE4DMA_CMD_QUEUE_ENABLE);
+	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->intr_status_reg.intr_status_raw,
+			AE4DMA_DISABLE_INTR);
+	cmd_q->next_write = (uint16_t)AE4DMA_READ_REG(&cmd_q->hwq_regs->write_idx);
+	cmd_q->next_read = (uint16_t)AE4DMA_READ_REG(&cmd_q->hwq_regs->read_idx);
+	cmd_q->ring_buff_count = 0;
+
+	dma_addr_lo = low32_value(cmd_q->qbase_phys_addr);
+	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->qbase_lo, dma_addr_lo);
+	dma_addr_hi = high32_value(cmd_q->qbase_phys_addr);
+	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->qbase_hi, dma_addr_hi);
+
+	return 0;
+}
+
+static void
+ae4dma_channel_dev_name(char *out, size_t outlen, const char *pci_name,
+		unsigned int ch)
+{
+	snprintf(out, outlen, "%s-ch%u", pci_name, ch);
+}
+
+/* Create a dmadev(dpdk DMA device) */
+static int
+ae4dma_dmadev_create(const char *name, struct rte_pci_device *dev, uint8_t qn)
+{
+	static const struct rte_dma_dev_ops ae4dma_dmadev_ops = {
+		.dev_close = ae4dma_dev_close,
+		.dev_configure = ae4dma_dev_configure,
+		.dev_dump = ae4dma_dev_dump,
+		.dev_info_get = ae4dma_dev_info_get,
+		.dev_start = ae4dma_dev_start,
+		.dev_stop = ae4dma_dev_stop,
+		.stats_get = ae4dma_stats_get,
+		.stats_reset = ae4dma_stats_reset,
+		.vchan_status = ae4dma_vchan_status,
+		.vchan_setup = ae4dma_vchan_setup,
+	};
+
+	struct rte_dma_dev *dmadev = NULL;
+	struct ae4dma_dmadev *ae4dma = NULL;
+	char hwq_dev_name[RTE_DEV_NAME_MAX_LEN];
+
+	if (!name) {
+		AE4DMA_PMD_ERR("Invalid name of the device!");
+		return -EINVAL;
+	}
+	memset(hwq_dev_name, 0, sizeof(hwq_dev_name));
+	ae4dma_channel_dev_name(hwq_dev_name, sizeof(hwq_dev_name), name, qn);
+
+	dmadev = rte_dma_pmd_allocate(hwq_dev_name, dev->device.numa_node,
+			sizeof(struct ae4dma_dmadev));
+	if (dmadev == NULL) {
+		AE4DMA_PMD_ERR("Unable to allocate dma device");
+		return -ENOMEM;
+	}
+	dmadev->device = &dev->device;
+	dmadev->fp_obj->dev_private = dmadev->data->dev_private;
+	dmadev->dev_ops = &ae4dma_dmadev_ops;
+
+	dmadev->fp_obj->burst_capacity = ae4dma_burst_capacity;
+	dmadev->fp_obj->completed = ae4dma_completed;
+	dmadev->fp_obj->completed_status = ae4dma_completed_status;
+	dmadev->fp_obj->copy = ae4dma_enqueue_copy;
+	dmadev->fp_obj->submit = ae4dma_submit;
+	/* fill capability not advertised: leave fp_obj->fill as zero-initialised. */
+
+	ae4dma = dmadev->data->dev_private;
+	ae4dma->dmadev = dmadev;
+	ae4dma->pci = dev;
+
+	if (ae4dma_add_queue(ae4dma, qn, name) != 0)
+		goto init_error;
+	return 0;
+
+init_error:
+	AE4DMA_PMD_ERR("driver %s(): failed", __func__);
+	rte_dma_pmd_release(hwq_dev_name);
+	return -EFAULT;
+}
+
+/* Probe DMA device. */
+static int
+ae4dma_dmadev_probe(struct rte_pci_driver *drv, struct rte_pci_device *dev)
+{
+	char name[32];
+	char chname[RTE_DEV_NAME_MAX_LEN];
+	void *mmio_base;
+	uint32_t q_per_eng;
+	int ret = 0;
+	uint8_t i;
+
+	rte_pci_device_name(&dev->addr, name, sizeof(name));
+	AE4DMA_PMD_INFO("Init %s on NUMA node %d", name, dev->device.numa_node);
+	dev->device.driver = &drv->driver;
+
+	mmio_base = dev->mem_resource[AE4DMA_PCIE_BAR].addr;
+	if (mmio_base == NULL) {
+		AE4DMA_PMD_ERR("%s: BAR%d not mapped", name, AE4DMA_PCIE_BAR);
+		return -ENODEV;
+	}
+
+	/* Program the per-engine HW queue count once. */
+	AE4DMA_WRITE_REG_OFFSET(mmio_base, AE4DMA_COMMON_CONFIG_OFFSET,
+			AE4DMA_MAX_HW_QUEUES);
+	q_per_eng = AE4DMA_READ_REG_OFFSET(mmio_base, AE4DMA_COMMON_CONFIG_OFFSET);
+	AE4DMA_PMD_INFO("%s: AE4DMA queues per engine = %u", name, q_per_eng);
+
+	for (i = 0; i < AE4DMA_MAX_HW_QUEUES; i++) {
+		ret = ae4dma_dmadev_create(name, dev, i);
+		if (ret != 0) {
+			AE4DMA_PMD_ERR("%s create dmadev %u failed!", name, i);
+			while (i > 0) {
+				i--;
+				ae4dma_channel_dev_name(chname, sizeof(chname), name, i);
+				rte_dma_pmd_release(chname);
+			}
+			break;
+		}
+	}
+	return ret;
+}
+
+/* Remove DMA device. */
+static int
+ae4dma_dmadev_remove(struct rte_pci_device *dev)
+{
+	char name[32];
+	char chname[RTE_DEV_NAME_MAX_LEN];
+	unsigned int i;
+
+	rte_pci_device_name(&dev->addr, name, sizeof(name));
+
+	AE4DMA_PMD_INFO("Closing %s on NUMA node %d",
+			name, dev->device.numa_node);
+
+	for (i = 0; i < AE4DMA_MAX_HW_QUEUES; i++) {
+		ae4dma_channel_dev_name(chname, sizeof(chname), name, i);
+		rte_dma_pmd_release(chname);
+	}
+	return 0;
+}
+
+static const struct rte_pci_id pci_id_ae4dma_map[] = {
+	{ RTE_PCI_DEVICE(AMD_VENDOR_ID, AE4DMA_DEVICE_ID) },
+	{ .vendor_id = 0, /* sentinel */ },
+};
+
+static struct rte_pci_driver ae4dma_pmd_drv = {
+	.id_table = pci_id_ae4dma_map,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+	.probe = ae4dma_dmadev_probe,
+	.remove = ae4dma_dmadev_remove,
+};
+
+RTE_PMD_REGISTER_PCI(AE4DMA_PMD_NAME, ae4dma_pmd_drv);
+RTE_PMD_REGISTER_PCI_TABLE(AE4DMA_PMD_NAME, pci_id_ae4dma_map);
+RTE_PMD_REGISTER_KMOD_DEP(AE4DMA_PMD_NAME, "* igb_uio | uio_pci_generic | vfio-pci");
diff --git a/drivers/dma/ae4dma/ae4dma_hw_defs.h b/drivers/dma/ae4dma/ae4dma_hw_defs.h
new file mode 100644
index 0000000000..235819778e
--- /dev/null
+++ b/drivers/dma/ae4dma/ae4dma_hw_defs.h
@@ -0,0 +1,164 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#ifndef __AE4DMA_HW_DEFS_H__
+#define __AE4DMA_HW_DEFS_H__
+
+#include <rte_bus_pci.h>
+#include <rte_byteorder.h>
+#include <rte_io.h>
+#include <rte_pci.h>
+#include <rte_memzone.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define AE4DMA_BIT(nr)			(1UL << (nr))
+
+#define AE4DMA_BITS_PER_LONG	(__SIZEOF_LONG__ * 8)
+#define AE4DMA_GENMASK(h, l) \
+	(((~0UL) << (l)) & (~0UL >> (AE4DMA_BITS_PER_LONG - 1 - (h))))
+
+/* ae4dma device details */
+#define AMD_VENDOR_ID	0x1022
+#define AE4DMA_DEVICE_ID	0x149b
+#define AE4DMA_PCIE_BAR 0
+
+/*
+ * An AE4DMA engine has 16 DMA queues. Each queue supports 32 descriptors.
+ */
+#define AE4DMA_MAX_HW_QUEUES        16
+#define AE4DMA_QUEUE_START_INDEX    0
+#define AE4DMA_CMD_QUEUE_ENABLE		0x1
+#define AE4DMA_CMD_QUEUE_DISABLE	0x0
+
+/* Common to all queues */
+#define AE4DMA_COMMON_CONFIG_OFFSET 0x00
+
+#define AE4DMA_DISABLE_INTR 0x01
+
+/* Descriptor status */
+enum ae4dma_dma_status {
+	AE4DMA_DMA_DESC_SUBMITTED = 0,
+	AE4DMA_DMA_DESC_VALIDATED = 1,
+	AE4DMA_DMA_DESC_PROCESSED = 2,
+	AE4DMA_DMA_DESC_COMPLETED = 3,
+	AE4DMA_DMA_DESC_ERROR = 4,
+};
+
+/* Descriptor error-code */
+enum ae4dma_dma_err {
+	AE4DMA_DMA_ERR_NO_ERR = 0,
+	AE4DMA_DMA_ERR_INV_HEADER = 1,
+	AE4DMA_DMA_ERR_INV_STATUS = 2,
+	AE4DMA_DMA_ERR_INV_LEN = 3,
+	AE4DMA_DMA_ERR_INV_SRC = 4,
+	AE4DMA_DMA_ERR_INV_DST = 5,
+	AE4DMA_DMA_ERR_INV_ALIGN = 6,
+	AE4DMA_DMA_ERR_UNKNOWN = 7,
+};
+
+/* HW Queue status */
+enum ae4dma_hwqueue_status {
+	AE4DMA_HWQUEUE_EMPTY = 0,
+	AE4DMA_HWQUEUE_FULL = 1,
+	AE4DMA_HWQUEUE_NOT_EMPTY = 4
+};
+/*
+ * descriptor for AE4DMA commands
+ * 8 32-bit words:
+ * word 0: source memory type; destination memory type ; control bits
+ * word 1: desc_id; error code; status
+ * word 2: length
+ * word 3: reserved
+ * word 4: upper 32 bits of source pointer
+ * word 5: low 32 bits of source pointer
+ * word 6: upper 32 bits of destination pointer
+ * word 7: low 32 bits of destination pointer
+ */
+
+/* AE4DMA Descriptor - DWORD0 - Controls bits: Reserved for future use */
+#define AE4DMA_DWORD0_STOP_ON_COMPLETION	AE4DMA_BIT(0)
+#define AE4DMA_DWORD0_INTERRUPT_ON_COMPLETION	AE4DMA_BIT(1)
+#define AE4DMA_DWORD0_START_OF_MESSAGE		AE4DMA_BIT(3)
+#define AE4DMA_DWORD0_END_OF_MESSAGE		AE4DMA_BIT(4)
+#define AE4DMA_DWORD0_DESTINATION_MEMORY_TYPE	AE4DMA_GENMASK(5, 4)
+#define AE4DMA_DWORD0_SOURCE_MEMEORY_TYPE	AE4DMA_GENMASK(7, 6)
+
+#define AE4DMA_DWORD0_DESTINATION_MEMORY_TYPE_MEMORY    (0x0)
+#define AE4DMA_DWORD0_DESTINATION_MEMORY_TYPE_IOMEMORY  (1<<4)
+#define AE4DMA_DWORD0_SOURCE_MEMEORY_TYPE_MEMORY    (0x0)
+#define AE4DMA_DWORD0_SOURCE_MEMEORY_TYPE_IOMEMORY  (1<<6)
+
+struct ae4dma_desc_dword0 {
+	uint8_t byte0;
+	uint8_t byte1;
+	uint16_t timestamp;
+};
+
+struct ae4dma_desc_dword1 {
+	uint8_t status;
+	uint8_t err_code;
+	uint16_t desc_id;
+};
+
+struct ae4dma_desc {
+	struct ae4dma_desc_dword0 dw0;
+	struct ae4dma_desc_dword1 dw1;
+	uint32_t length;
+	uint32_t reserved;
+	uint32_t src_lo;
+	uint32_t src_hi;
+	uint32_t dst_lo;
+	uint32_t dst_hi;
+};
+
+/*
+ * Registers for each queue :4 bytes length
+ * Effective address : offset + reg
+ */
+struct ae4dma_hwq_regs {
+	union {
+		uint32_t control_raw;
+		struct {
+			uint32_t queue_enable: 1;
+			uint32_t reserved_internal: 31;
+		} control;
+	} control_reg;
+
+	union {
+		uint32_t status_raw;
+		struct {
+			uint32_t reserved0: 1;
+			/* 0–empty, 1–full, 2–stopped, 3–error , 4–Not Empty */
+			uint32_t queue_status: 2;
+			uint32_t reserved1: 21;
+			uint32_t interrupt_type: 4;
+			uint32_t reserved2: 4;
+		} status;
+	} status_reg;
+
+	uint32_t max_idx;
+	uint32_t read_idx;
+	uint32_t write_idx;
+
+	union {
+		uint32_t intr_status_raw;
+		struct {
+			uint32_t intr_status: 1;
+			uint32_t reserved: 31;
+		} intr_status;
+	} intr_status_reg;
+
+	uint32_t qbase_lo;
+	uint32_t qbase_hi;
+
+};
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* AE4DMA_HW_DEFS_H */
diff --git a/drivers/dma/ae4dma/ae4dma_internal.h b/drivers/dma/ae4dma/ae4dma_internal.h
new file mode 100644
index 0000000000..d55cfbe3b8
--- /dev/null
+++ b/drivers/dma/ae4dma/ae4dma_internal.h
@@ -0,0 +1,117 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#ifndef _AE4DMA_INTERNAL_H_
+#define _AE4DMA_INTERNAL_H_
+
+#include <stdint.h>
+
+#include "ae4dma_hw_defs.h"
+
+/**
+ * upper_32_bits - return bits 32-63 of a number
+ * @n: the number we're accessing
+ */
+#define upper_32_bits(n) ((uint32_t)(((n) >> 16) >> 16))
+
+/**
+ * lower_32_bits - return bits 0-31 of a number
+ * @n: the number we're accessing
+ */
+#define lower_32_bits(n) ((uint32_t)((n) & 0xffffffff))
+
+/** Hardware ring depth (slots per queue); must be power of two. */
+#define AE4DMA_DESCRIPTORS_PER_CMDQ	32
+#define AE4DMA_QUEUE_DESC_SIZE		sizeof(struct ae4dma_desc)
+#define AE4DMA_QUEUE_SIZE(n)		(AE4DMA_DESCRIPTORS_PER_CMDQ * (n))
+
+
+/** AE4DMA registers Write/Read */
+static inline void ae4dma_pci_reg_write(void *base, int offset,
+		uint32_t value)
+{
+	volatile void *reg_addr = ((uint8_t *)base + offset);
+
+	rte_write32((rte_cpu_to_le_32(value)), reg_addr);
+}
+
+static inline uint32_t ae4dma_pci_reg_read(void *base, int offset)
+{
+	volatile void *reg_addr = ((uint8_t *)base + offset);
+
+	return rte_le_to_cpu_32(rte_read32(reg_addr));
+}
+
+#define AE4DMA_READ_REG_OFFSET(hw_addr, reg_offset) \
+	ae4dma_pci_reg_read(hw_addr, reg_offset)
+
+#define AE4DMA_WRITE_REG_OFFSET(hw_addr, reg_offset, value) \
+	ae4dma_pci_reg_write(hw_addr, reg_offset, value)
+
+
+#define AE4DMA_READ_REG(hw_addr) \
+	ae4dma_pci_reg_read((void *)(uintptr_t)(hw_addr), 0)
+
+#define AE4DMA_WRITE_REG(hw_addr, value) \
+	ae4dma_pci_reg_write((void *)(uintptr_t)(hw_addr), 0, value)
+
+static inline uint32_t
+low32_value(unsigned long addr)
+{
+	return ((uint64_t)addr) & 0xffffffffUL;
+}
+
+static inline uint32_t
+high32_value(unsigned long addr)
+{
+	return (uint32_t)(((uint64_t)addr) >> 32);
+}
+
+/**
+ * A structure describing a AE4DMA command queue.
+ */
+struct ae4dma_cmd_queue {
+	char memz_name[RTE_MEMZONE_NAMESIZE];
+	volatile struct ae4dma_hwq_regs *hwq_regs;
+
+	struct rte_dma_vchan_conf qcfg;
+	struct rte_dma_stats stats;
+	/* Queue address */
+	struct ae4dma_desc *qbase_desc;
+	void *qbase_addr;
+	phys_addr_t qbase_phys_addr;
+	enum ae4dma_dma_err status[AE4DMA_DESCRIPTORS_PER_CMDQ];
+	/* Queue identifier */
+	uint64_t id;    /**< queue id */
+	uint64_t qidx;  /**< queue index */
+	uint64_t qsize; /**< queue size */
+	uint32_t ring_buff_count;
+	unsigned short next_read;
+	unsigned short next_write;
+	unsigned short last_write; /* Used to compute submitted count. */
+} __rte_cache_aligned;
+
+/*
+ * One dmadev per AE4DMA hardware channel: probe creates AE4DMA_MAX_HW_QUEUES
+ * dmadevs per PCI function, each owning a single HW command queue.
+ */
+struct ae4dma_dmadev {
+	struct rte_dma_dev *dmadev;
+	void *io_regs;
+	struct ae4dma_cmd_queue cmd_q; /**< single HW queue owned by this dmadev */
+	struct rte_pci_device *pci;    /**< owning PCI device (not owned) */
+};
+
+
+extern int ae4dma_pmd_logtype;
+
+#define AE4DMA_PMD_LOG(level, fmt, args...) rte_log(RTE_LOG_ ## level, \
+		ae4dma_pmd_logtype, "AE4DMA: %s(): " fmt "\n", __func__, ##args)
+
+#define AE4DMA_PMD_DEBUG(fmt, args...)  AE4DMA_PMD_LOG(DEBUG, fmt, ## args)
+#define AE4DMA_PMD_INFO(fmt, args...)   AE4DMA_PMD_LOG(INFO, fmt, ## args)
+#define AE4DMA_PMD_ERR(fmt, args...)    AE4DMA_PMD_LOG(ERR, fmt, ## args)
+#define AE4DMA_PMD_WARN(fmt, args...)   AE4DMA_PMD_LOG(WARNING, fmt, ## args)
+
+#endif /* _AE4DMA_INTERNAL_H_ */
diff --git a/drivers/dma/ae4dma/meson.build b/drivers/dma/ae4dma/meson.build
new file mode 100644
index 0000000000..e48ab0d561
--- /dev/null
+++ b/drivers/dma/ae4dma/meson.build
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Advanced Micro Devices, Inc. All rights reserved.
+
+build = dpdk_conf.has('RTE_ARCH_X86')
+reason = 'only supported on x86'
+sources = files('ae4dma_dmadev.c')
+deps += ['bus_pci', 'dmadev']
diff --git a/drivers/dma/meson.build b/drivers/dma/meson.build
index e0d94db967..c230ac5a06 100644
--- a/drivers/dma/meson.build
+++ b/drivers/dma/meson.build
@@ -2,6 +2,7 @@
 # Copyright 2021 HiSilicon Limited
 
 drivers = [
+        'ae4dma',
         'cnxk',
         'dpaa',
         'dpaa2',
diff --git a/usertools/dpdk-devbind.py b/usertools/dpdk-devbind.py
index 93f2383dff..ec6d6713b4 100755
--- a/usertools/dpdk-devbind.py
+++ b/usertools/dpdk-devbind.py
@@ -86,6 +86,9 @@
 cn9k_ree = {'Class': '08', 'Vendor': '177d', 'Device': 'a0f4',
             'SVendor': None, 'SDevice': None}
 
+amd_ae4dma = {'Class': '08', 'Vendor': '1022', 'Device': '149b',
+                 'SVendor': None, 'SDevice': None}
+
 virtio_blk = {'Class': '01', 'Vendor': "1af4", 'Device': '1001,1042',
               'SVendor': None, 'SDevice': None}
 
@@ -95,7 +98,7 @@
 network_devices = [network_class, cavium_pkx, avp_vnic, ifpga_class]
 baseband_devices = [acceleration_class]
 crypto_devices = [encryption_class, intel_processor_class]
-dma_devices = [cnxk_dma, hisilicon_dma,
+dma_devices = [amd_ae4dma, cnxk_dma, hisilicon_dma,
                intel_idxd_gnrd, intel_idxd_dmr, intel_idxd_spr,
                intel_ioat_bdw, intel_ioat_icx, intel_ioat_skx,
                odm_dma]
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH] dma/ae4dma: add AMD AE4DMA DMA PMD
  2026-05-18 18:18 [PATCH] dma/ae4dma: add AMD AE4DMA DMA PMD Raghavendra Ningoji
@ 2026-05-21 14:28 ` David Marchand
  2026-05-25 18:42 ` [PATCH v2 0/3] " Raghavendra Ningoji
  1 sibling, 0 replies; 24+ messages in thread
From: David Marchand @ 2026-05-21 14:28 UTC (permalink / raw)
  To: Raghavendra Ningoji
  Cc: dev, thomas, rjarry, Bhagyada.Modali, Selwin.Sebastian,
	Chengwen Feng

Hello,

On Mon, 18 May 2026 at 20:19, Raghavendra Ningoji
<raghavendra.ningoji@amd.com> wrote:
>
> Add a new dmadev poll-mode driver for the AMD AE4DMA hardware DMA
> engine.  An AE4DMA engine exposes 16 hardware command queues, each
> with a 32-entry descriptor ring; the PMD maps each hardware channel
> to its own dmadev with a single virtual channel, so a PCI function
> appears as 16 dmadevs named "<pci-bdf>-ch0" .. "<pci-bdf>-ch15".
>
> Driver characteristics:
>
>  - Memory-to-memory copy operations only (RTE_DMA_CAPA_MEM_TO_MEM).
>  - Completion is detected via the hardware's per-queue read_idx
>    register, which the engine advances as it processes descriptors.
>    The descriptor status / err_code bytes are read only to classify
>    each drained slot as success or failure.
>  - vchan_status reports IDLE/ACTIVE based on HW read_idx vs write_idx
>    and HALTED_ERROR when the queue is not enabled.
>  - depends on bus_pci and dmadev.
>
> Signed-off-by: Raghavendra Ningoji <raghavendra.ningoji@amd.com>

Cc: Chengwen Feng (DMAdev maintainer)

- The patch is big, splitting it into logical patches introducing one
feature at a time would help.
See for example how the latest DMA driver was submitted:

5a9c32a89c - dma/hisi_acc: introduce HiSilicon SoC accelerator driver
(7 months ago) <Chengwen Feng>
2557ad8f8a - dma/hisi_acc: add control path operations (7 months ago)
<Chengwen Feng>
b58c4435ea - dma/hisi_acc: add data path operations (7 months ago)
<Chengwen Feng>


- Please fix the below warnings raised by checkpatches.sh, and run
this script before submitting a new revision.

### [PATCH] dma/ae4dma: add AMD AE4DMA DMA PMD

WARNING:TYPO_SPELLING: 're-use' may be misspelled - perhaps 'reuse'?
#199: FILE: drivers/dma/ae4dma/ae4dma_dmadev.c:42:
+            AE4DMA_PMD_INFO("re-use memzone already "
                              ^^^^^^

total: 0 errors, 1 warnings, 1160 lines checked

Warning in drivers/dma/ae4dma/ae4dma_internal.h:
Prefer RTE_LOG_LINE/RTE_LOG_DP_LINE

Warning in drivers/dma/ae4dma/ae4dma_internal.h:
Do not use variadic argument pack in macros

Please use __rte_cache_aligned only for struct or union types alignment.


- Please also checks the copyright years.
For new code (from upstream pov), this should be 2026.


- Globally in those changes, rte_iova_t should probably be used
instead of phys_addr_t.


> ---
>  MAINTAINERS                            |   5 +
>  doc/guides/dmadevs/ae4dma.rst          |  75 +++
>  doc/guides/dmadevs/index.rst           |   1 +
>  doc/guides/rel_notes/release_26_07.rst |   7 +
>  drivers/dma/ae4dma/ae4dma_dmadev.c     | 742 +++++++++++++++++++++++++
>  drivers/dma/ae4dma/ae4dma_hw_defs.h    | 164 ++++++
>  drivers/dma/ae4dma/ae4dma_internal.h   | 117 ++++
>  drivers/dma/ae4dma/meson.build         |   7 +
>  drivers/dma/meson.build                |   1 +
>  usertools/dpdk-devbind.py              |   5 +-
>  10 files changed, 1123 insertions(+), 1 deletion(-)
>  create mode 100644 doc/guides/dmadevs/ae4dma.rst
>  create mode 100644 drivers/dma/ae4dma/ae4dma_dmadev.c
>  create mode 100644 drivers/dma/ae4dma/ae4dma_hw_defs.h
>  create mode 100644 drivers/dma/ae4dma/ae4dma_internal.h
>  create mode 100644 drivers/dma/ae4dma/meson.build
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 9143d028bc..0b5a6e08d8 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1361,6 +1361,11 @@ F: doc/guides/compressdevs/features/zsda.ini
>  DMAdev Drivers
>  --------------
>
> +AMD AE4DMA
> +M: Bhagyada Modali <Bhagyada.Modali@amd.com>

No capital letter in the mail address section.


> +F: drivers/dma/ae4dma/
> +F: doc/guides/dmadevs/ae4dma.rst
> +
>  Intel IDXD - EXPERIMENTAL
>  M: Bruce Richardson <bruce.richardson@intel.com>
>  M: Kevin Laatz <kevin.laatz@intel.com>

Please also add an entry in the .mailmap file as it is your first contribution.


> diff --git a/drivers/dma/ae4dma/ae4dma_dmadev.c b/drivers/dma/ae4dma/ae4dma_dmadev.c
> new file mode 100644
> index 0000000000..eb6ea88f55
> --- /dev/null
> +++ b/drivers/dma/ae4dma/ae4dma_dmadev.c
> @@ -0,0 +1,742 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2021-2025 Advanced Micro Devices, Inc. All rights reserved.
> + */
> +
> +#include <errno.h>
> +#include <inttypes.h>
> +#include <stdio.h>
> +#include <string.h>
> +
> +#include <rte_bus_pci.h>
> +#include <bus_pci_driver.h>
> +#include <rte_dmadev_pmd.h>
> +#include <rte_malloc.h>
> +
> +#include "ae4dma_internal.h"
> +
> +/*
> + * One dmadev per AE4DMA hardware channel; each dmadev has exactly one
> + * virtual channel. The HW's per-queue register block must be densely
> + * packed right after the engine-common config register at BAR0+0; the
> + * build-time check below catches an accidental layout change.
> + */
> +static_assert(sizeof(struct ae4dma_hwq_regs) == 32,
> +               "ae4dma_hwq_regs stride changed; per-queue offset math will break");
> +
> +RTE_LOG_REGISTER_DEFAULT(ae4dma_pmd_logtype, INFO);
> +
> +#define AE4DMA_PMD_NAME dmadev_ae4dma
> +#define AE4DMA_PMD_NAME_STR RTE_STR(AE4DMA_PMD_NAME)

AE4DMA_PMD_NAME_STR is not used at all.
Avoid introducing the macro AE4DMA_PMD_NAME, the value is not supposed
to change.


> +
> +static const struct rte_memzone *
> +ae4dma_queue_dma_zone_reserve(const char *queue_name,
> +               uint32_t queue_size, int socket_id)
> +{
> +       const struct rte_memzone *mz;
> +
> +       mz = rte_memzone_lookup(queue_name);
> +       if (mz != 0) {

mz != NULL

> +               if (((size_t)queue_size <= mz->len) &&
> +                               ((socket_id == SOCKET_ID_ANY) ||
> +                                (socket_id == mz->socket_id))) {
> +                       AE4DMA_PMD_INFO("re-use memzone already "
> +                                       "allocated for %s", queue_name);
> +                       return mz;
> +               }
> +               AE4DMA_PMD_ERR("Incompatible memzone already "
> +                               "allocated %s, size %u, socket %d. "
> +                               "Requested size %u, socket %u",
> +                               queue_name, (uint32_t)mz->len,
> +                               mz->socket_id, queue_size, socket_id);
> +               return NULL;
> +       }
> +       return rte_memzone_reserve_aligned(queue_name, queue_size,
> +                       socket_id, RTE_MEMZONE_IOVA_CONTIG, queue_size);
> +}
> +
> +/* Configure a device. */
> +static int
> +ae4dma_dev_configure(struct rte_dma_dev *dev __rte_unused,
> +               const struct rte_dma_conf *dev_conf,
> +               uint32_t conf_sz)
> +{
> +       if (sizeof(struct rte_dma_conf) != conf_sz)
> +               return -EINVAL;
> +
> +       if (dev_conf->nb_vchans != 1)
> +               return -EINVAL;
> +
> +       return 0;
> +}
> +
> +/* Setup a virtual channel for AE4DMA, only 1 vchan is supported per dmadev. */
> +static int
> +ae4dma_vchan_setup(struct rte_dma_dev *dev, uint16_t vchan __rte_unused,
> +               const struct rte_dma_vchan_conf *qconf, uint32_t qconf_sz)
> +{
> +       struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
> +       struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
> +       uint16_t max_desc = qconf->nb_desc;
> +
> +       if (sizeof(struct rte_dma_vchan_conf) != qconf_sz)
> +               return -EINVAL;
> +
> +       if (max_desc < 2)
> +               return -EINVAL;
> +
> +       if (!rte_is_power_of_2(max_desc))
> +               max_desc = rte_align32pow2(max_desc);
> +
> +       if (max_desc > AE4DMA_DESCRIPTORS_PER_CMDQ) {
> +               AE4DMA_PMD_DEBUG("DMA dev %u nb_desc clamped to %u",
> +                               dev->data->dev_id, AE4DMA_DESCRIPTORS_PER_CMDQ);
> +               max_desc = AE4DMA_DESCRIPTORS_PER_CMDQ;
> +       }
> +
> +       cmd_q->qcfg = *qconf;
> +       cmd_q->qcfg.nb_desc = max_desc;
> +
> +       /* Ensure all counters are reset, if reconfiguring/restarting device. */
> +       memset(&cmd_q->stats, 0, sizeof(cmd_q->stats));
> +       return 0;
> +}
> +
> +/* Start a configured device. */
> +static int
> +ae4dma_dev_start(struct rte_dma_dev *dev)
> +{
> +       struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
> +       struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
> +       uint16_t nb = cmd_q->qcfg.nb_desc;
> +
> +       if (nb == 0)
> +               return -EBUSY;
> +
> +       /* Program ring depth expected by hardware. */
> +       AE4DMA_WRITE_REG(&cmd_q->hwq_regs->max_idx, nb);
> +       return 0;
> +}
> +
> +/* Stop a configured device. */
> +static int
> +ae4dma_dev_stop(struct rte_dma_dev *dev)
> +{
> +       struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
> +       struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
> +
> +       if (cmd_q->hwq_regs != NULL)
> +               AE4DMA_WRITE_REG(&cmd_q->hwq_regs->control_reg.control_raw,
> +                               AE4DMA_CMD_QUEUE_DISABLE);
> +       return 0;
> +}
> +
> +/* Get device information of a device. */
> +static int
> +ae4dma_dev_info_get(const struct rte_dma_dev *dev, struct rte_dma_info *info,
> +               uint32_t size)
> +{
> +       if (size < sizeof(*info))
> +               return -EINVAL;
> +       info->dev_name = dev->device->name;
> +       info->dev_capa = RTE_DMA_CAPA_MEM_TO_MEM;
> +       info->max_vchans = 1;
> +       info->min_desc = 2;
> +       info->max_desc = AE4DMA_DESCRIPTORS_PER_CMDQ;
> +       info->nb_vchans = 1;
> +       return 0;
> +}
> +
> +/* Close a configured device. */
> +static int
> +ae4dma_dev_close(struct rte_dma_dev *dev)
> +{
> +       struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
> +       struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
> +
> +       if (cmd_q->hwq_regs != NULL)
> +               AE4DMA_WRITE_REG(&cmd_q->hwq_regs->control_reg.control_raw,
> +                               AE4DMA_CMD_QUEUE_DISABLE);
> +
> +       if (cmd_q->memz_name[0] != '\0') {
> +               const struct rte_memzone *mz = rte_memzone_lookup(cmd_q->memz_name);
> +
> +               if (mz != NULL)
> +                       rte_memzone_free(mz);
> +       }
> +       cmd_q->qbase_desc = NULL;
> +       cmd_q->qbase_addr = NULL;
> +       cmd_q->qbase_phys_addr = 0;
> +       return 0;
> +}
> +
> +/* trigger h/w to process enqued desc:doorbell - by next_write */
> +static inline void
> +__submit(struct ae4dma_dmadev *ae4dma)
> +{
> +       struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
> +       uint16_t write_idx = cmd_q->next_write;
> +       uint16_t nb = cmd_q->qcfg.nb_desc;
> +
> +       AE4DMA_WRITE_REG(&cmd_q->hwq_regs->write_idx, write_idx);
> +       if (nb != 0)
> +               cmd_q->stats.submitted += (uint16_t)((cmd_q->next_write - cmd_q->last_write +
> +                               nb) % nb);
> +       cmd_q->last_write = cmd_q->next_write;
> +}
> +
> +static int
> +ae4dma_submit(void *dev_private, uint16_t vchan __rte_unused)
> +{
> +       struct ae4dma_dmadev *ae4dma = dev_private;
> +
> +       __submit(ae4dma);
> +       return 0;
> +}
> +
> +/* Write descriptor for enqueue (copy only). */
> +static inline int
> +__write_desc_copy(void *dev_private, rte_iova_t src, phys_addr_t dst,
> +               uint32_t len, uint64_t flags)
> +{
> +       struct ae4dma_dmadev *ae4dma = dev_private;
> +       struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
> +       struct ae4dma_desc *dma_desc;
> +       uint16_t ret;
> +       uint16_t nb = cmd_q->qcfg.nb_desc;
> +       uint16_t write = cmd_q->next_write;
> +
> +       if (nb == 0)
> +               return -EINVAL;
> +
> +       /* Reserve one slot to distinguish full from empty (power-of-two ring). */
> +       if ((uint32_t)cmd_q->ring_buff_count >= (uint32_t)(nb - 1))
> +               return -ENOSPC;
> +
> +       dma_desc = &cmd_q->qbase_desc[write];
> +       memset(dma_desc, 0, sizeof(*dma_desc));
> +       dma_desc->length = len;
> +       dma_desc->src_hi = upper_32_bits(src);
> +       dma_desc->src_lo = lower_32_bits(src);
> +       dma_desc->dst_hi = upper_32_bits(dst);
> +       dma_desc->dst_lo = lower_32_bits(dst);
> +       cmd_q->ring_buff_count++;
> +       cmd_q->next_write = (uint16_t)((write + 1) % nb);
> +       ret = write;
> +       if (flags & RTE_DMA_OP_FLAG_SUBMIT)
> +               __submit(ae4dma);
> +       return ret;
> +}
> +
> +/* Enqueue a copy operation onto the ae4dma device. */
> +static int
> +ae4dma_enqueue_copy(void *dev_private, uint16_t vchan __rte_unused,
> +               rte_iova_t src, rte_iova_t dst, uint32_t length, uint64_t flags)
> +{
> +       return __write_desc_copy(dev_private, src, dst, length, flags);
> +}
> +
> +/* Dump DMA device info. */
> +static int
> +ae4dma_dev_dump(const struct rte_dma_dev *dev, FILE *f)
> +{
> +       struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
> +       struct ae4dma_cmd_queue *cmd_q;
> +       void *ae4dma_mmio_base_addr = (uint8_t *)ae4dma->io_regs;
> +
> +       cmd_q = &ae4dma->cmd_q;
> +       fprintf(f, "cmd_q->id              = %" PRIx64 "\n", cmd_q->id);
> +       fprintf(f, "cmd_q->qidx            = %" PRIx64 "\n", cmd_q->qidx);
> +       fprintf(f, "cmd_q->qsize           = %" PRIx64 "\n", cmd_q->qsize);
> +       fprintf(f, "mmio_base_addr      = %p\n", ae4dma_mmio_base_addr);
> +       fprintf(f, "queues per ae4dma engine     = %d\n", AE4DMA_READ_REG_OFFSET(
> +                               ae4dma_mmio_base_addr, AE4DMA_COMMON_CONFIG_OFFSET));
> +       fprintf(f, "== Private Data ==\n");
> +       fprintf(f, "  Config: { ring_size: %u }\n", cmd_q->qcfg.nb_desc);
> +       fprintf(f, "  Ring virt: %p\tphys: %#" PRIx64 "\n",
> +                       (void *)cmd_q->qbase_desc,
> +                       (uint64_t)cmd_q->qbase_phys_addr);
> +       fprintf(f, "  Next write: %u\n", cmd_q->next_write);
> +       fprintf(f, "  Next read: %u\n", cmd_q->next_read);
> +       fprintf(f, "  current queue depth: %u\n", cmd_q->ring_buff_count);
> +       fprintf(f, "  }\n");
> +       fprintf(f, "  Key Stats { submitted: %" PRIu64 ", comp: %" PRIu64 ", failed: %" PRIu64 " }\n",
> +               cmd_q->stats.submitted,
> +               cmd_q->stats.completed,
> +               cmd_q->stats.errors);
> +       return 0;
> +}
> +
> +/* Translates AE4DMA ChanERRs to DMA error codes. */
> +static inline enum rte_dma_status_code
> +__translate_status_ae4dma_to_dma(enum ae4dma_dma_err status)
> +{
> +       AE4DMA_PMD_DEBUG("ae4dma desc status = %d", status);
> +
> +       switch (status) {
> +       case AE4DMA_DMA_ERR_NO_ERR:
> +               return RTE_DMA_STATUS_SUCCESSFUL;
> +       case AE4DMA_DMA_ERR_INV_LEN:
> +               return RTE_DMA_STATUS_INVALID_LENGTH;
> +       case AE4DMA_DMA_ERR_INV_SRC:
> +               return RTE_DMA_STATUS_INVALID_SRC_ADDR;
> +       case AE4DMA_DMA_ERR_INV_DST:
> +               return RTE_DMA_STATUS_INVALID_DST_ADDR;
> +       case AE4DMA_DMA_ERR_INV_ALIGN:
> +               /* Name matches DPDK public enum spelling. */
> +               return RTE_DMA_STATUS_DATA_POISION;
> +       case AE4DMA_DMA_ERR_INV_HEADER:
> +       case AE4DMA_DMA_ERR_INV_STATUS:
> +               return RTE_DMA_STATUS_ERROR_UNKNOWN;
> +       default:
> +               return RTE_DMA_STATUS_ERROR_UNKNOWN;
> +       }
> +}
> +
> +/*
> + * Scan HW queue for completed descriptors (non-blocking).
> + *
> + * The AE4DMA engine signals completion by advancing the per-queue
> + * `read_idx` register; it does not (reliably) write a status value
> + * back into the descriptor. We therefore use the HW `read_idx`
> + * register as the source of truth and only inspect the descriptor's
> + * `dw1.err_code` byte to classify each completion as success or
> + * failure.
> + *
> + * @param cmd_q
> + *   The AE4DMA command queue.
> + * @param max_ops
> + *   Maximum descriptors to process this call.
> + * @param[out] failed_count
> + *   Number of completed descriptors that did not report success.
> + * @return
> + *   Number of descriptors completed (success + failure), <= max_ops.
> + */
> +static inline uint16_t
> +ae4dma_scan_hwq(struct ae4dma_cmd_queue *cmd_q, uint16_t max_ops,
> +               uint16_t *failed_count)
> +{
> +       volatile struct ae4dma_desc *hw_desc;
> +       uint16_t events_count = 0, fails = 0;
> +       uint16_t tail;
> +       uint16_t nb = cmd_q->qcfg.nb_desc;
> +       uint16_t mask;
> +       uint16_t hw_read_idx;
> +       uint16_t in_flight;
> +       uint16_t scan_cap;
> +
> +       if (nb == 0 || cmd_q->ring_buff_count == 0) {
> +               *failed_count = 0;
> +               return 0;
> +       }
> +       mask = nb - 1;
> +
> +       hw_read_idx = (uint16_t)(AE4DMA_READ_REG(&cmd_q->hwq_regs->read_idx) & mask);
> +       tail = cmd_q->next_read;
> +
> +       /*
> +        * Descriptors completed since our last visit live in the
> +        * half-open ring range [tail, hw_read_idx). If HW hasn't
> +        * moved we have nothing to do.
> +        */
> +       in_flight = (uint16_t)((hw_read_idx - tail) & mask);
> +       if (in_flight == 0) {
> +               *failed_count = 0;
> +               return 0;
> +       }
> +
> +       scan_cap = max_ops;
> +       if (scan_cap > AE4DMA_DESCRIPTORS_PER_CMDQ)
> +               scan_cap = AE4DMA_DESCRIPTORS_PER_CMDQ;
> +       if (scan_cap > in_flight)
> +               scan_cap = in_flight;
> +       if (scan_cap > cmd_q->ring_buff_count)
> +               scan_cap = (uint16_t)cmd_q->ring_buff_count;
> +
> +       while (events_count < scan_cap) {
> +               uint8_t hw_status;
> +               uint8_t hw_err;
> +
> +               hw_desc = &cmd_q->qbase_desc[tail];
> +               hw_status = hw_desc->dw1.status;
> +               hw_err = hw_desc->dw1.err_code;
> +
> +               /*
> +                * read_idx advancing is the definitive completion
> +                * signal. The per-descriptor status byte is informational
> +                * and may not yet be written when we observe it:
> +                *
> +                *   AE4DMA_DMA_DESC_ERROR (4)
> +                *     Hard failure - err_code names the precise cause.
> +                *   AE4DMA_DMA_DESC_COMPLETED (3) or 0
> +                *     Success.
> +                *   AE4DMA_DMA_DESC_VALIDATED (1) / _PROCESSED (2)
> +                *     Benign race: HW had not finished updating the
> +                *     status byte at the instant we read it. Since
> +                *     read_idx has moved past this slot, treat it as
> +                *     success unless err_code says otherwise.
> +                *
> +                * A non-zero err_code is treated as a failure regardless
> +                * of the observed status value.
> +                */
> +               if (hw_status == AE4DMA_DMA_DESC_ERROR ||
> +                               hw_err != AE4DMA_DMA_ERR_NO_ERR) {
> +                       fails++;
> +                       AE4DMA_PMD_WARN("Desc failed: status=%u err=%u",
> +                                       hw_status, hw_err);
> +               }
> +               cmd_q->status[events_count] = (enum ae4dma_dma_err)hw_err;
> +               cmd_q->ring_buff_count--;
> +               events_count++;
> +               tail = (tail + 1) & mask;
> +       }
> +
> +       cmd_q->stats.completed += events_count;
> +       cmd_q->stats.errors += fails;
> +       cmd_q->next_read = tail;
> +       *failed_count = fails;
> +       return events_count;
> +}
> +
> +/* Returns successful operations count and sets error flag if any errors. */
> +static uint16_t
> +ae4dma_completed(void *dev_private, uint16_t vchan __rte_unused,
> +               const uint16_t max_ops, uint16_t *last_idx, bool *has_error)
> +{
> +       struct ae4dma_dmadev *ae4dma = dev_private;
> +       struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
> +       uint16_t cpl_count, sl_count;
> +       uint16_t err_count = 0;
> +       uint16_t nb = cmd_q->qcfg.nb_desc;
> +
> +       *has_error = false;
> +
> +       cpl_count = ae4dma_scan_hwq(cmd_q, max_ops, &err_count);
> +
> +       if (cpl_count > max_ops)
> +               cpl_count = max_ops;
> +
> +       if (cpl_count > 0 && last_idx != NULL)
> +               *last_idx = (uint16_t)((cmd_q->next_read - 1 + nb) % nb);
> +
> +       sl_count = cpl_count - err_count;
> +       if (err_count)
> +               *has_error = true;
> +
> +       return sl_count;
> +}
> +
> +static uint16_t
> +ae4dma_completed_status(void *dev_private, uint16_t vchan __rte_unused,
> +               uint16_t max_ops, uint16_t *last_idx,
> +               enum rte_dma_status_code *status)
> +{
> +       struct ae4dma_dmadev *ae4dma = dev_private;
> +       struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
> +       uint16_t cpl_count;
> +       uint16_t i;
> +       uint16_t err_count = 0;
> +       uint16_t nb = cmd_q->qcfg.nb_desc;
> +
> +       cpl_count = ae4dma_scan_hwq(cmd_q, max_ops, &err_count);
> +
> +       if (cpl_count > max_ops)
> +               cpl_count = max_ops;
> +
> +       if (cpl_count > 0 && last_idx != NULL)
> +               *last_idx = (uint16_t)((cmd_q->next_read - 1 + nb) % nb);
> +
> +       if (likely(err_count == 0)) {
> +               for (i = 0; i < cpl_count; i++)
> +                       status[i] = RTE_DMA_STATUS_SUCCESSFUL;
> +       } else {
> +               for (i = 0; i < cpl_count; i++)
> +                       status[i] = __translate_status_ae4dma_to_dma(cmd_q->status[i]);
> +       }
> +
> +       return cpl_count;
> +}
> +
> +/* Get the remaining capacity of the ring. */
> +static uint16_t
> +ae4dma_burst_capacity(const void *dev_private, uint16_t vchan __rte_unused)
> +{
> +       const struct ae4dma_dmadev *ae4dma = dev_private;
> +       const struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
> +       uint16_t nb = cmd_q->qcfg.nb_desc;
> +       uint16_t mask;
> +       uint16_t read_idx = cmd_q->next_read;
> +       uint16_t write_idx = cmd_q->next_write;
> +       uint16_t used;
> +
> +       if (nb < 2 || !rte_is_power_of_2(nb))
> +               return 0;
> +
> +       mask = nb - 1;
> +       used = (uint16_t)((write_idx - read_idx) & mask);
> +       /* One slot reserved (same rule as enqueue). */
> +       if (used >= nb - 1)
> +               return 0;
> +       return (uint16_t)(nb - 1 - used);
> +}
> +
> +/* Retrieve the generic stats of a DMA device. */
> +static int
> +ae4dma_stats_get(const struct rte_dma_dev *dev, uint16_t vchan __rte_unused,
> +               struct rte_dma_stats *rte_stats, uint32_t size)
> +{
> +       const struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
> +       const struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
> +       const struct rte_dma_stats *stats = &cmd_q->stats;
> +
> +       if (size < sizeof(*rte_stats))
> +               return -EINVAL;
> +       if (rte_stats == NULL)
> +               return -EINVAL;
> +
> +       *rte_stats = *stats;
> +       return 0;
> +}
> +
> +/* Reset the generic stat counters for the DMA device. */
> +static int
> +ae4dma_stats_reset(struct rte_dma_dev *dev, uint16_t vchan __rte_unused)
> +{
> +       struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
> +       struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
> +
> +       memset(&cmd_q->stats, 0, sizeof(cmd_q->stats));
> +       return 0;
> +}
> +
> +/*
> + * Report channel state to the dmadev framework.
> + *
> + *   RTE_DMA_VCHAN_HALTED_ERROR - HW queue is disabled (never started, or
> + *                                stopped via dev_stop()).
> + *   RTE_DMA_VCHAN_IDLE         - HW has caught up: read_idx == write_idx,
> + *                                no descriptors in flight.
> + *   RTE_DMA_VCHAN_ACTIVE       - HW still has descriptors to process.
> + */
> +static int
> +ae4dma_vchan_status(const struct rte_dma_dev *dev, uint16_t vchan __rte_unused,
> +               enum rte_dma_vchan_status *status)
> +{
> +       const struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
> +       const struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
> +       uint32_t ctrl, hw_read, hw_write;
> +
> +       if (cmd_q->hwq_regs == NULL) {
> +               *status = RTE_DMA_VCHAN_HALTED_ERROR;
> +               return 0;
> +       }
> +
> +       ctrl = AE4DMA_READ_REG(&cmd_q->hwq_regs->control_reg.control_raw);
> +       if ((ctrl & AE4DMA_CMD_QUEUE_ENABLE) == 0) {
> +               *status = RTE_DMA_VCHAN_HALTED_ERROR;
> +               return 0;
> +       }
> +
> +       hw_read  = AE4DMA_READ_REG(&cmd_q->hwq_regs->read_idx);
> +       hw_write = AE4DMA_READ_REG(&cmd_q->hwq_regs->write_idx);
> +
> +       *status = (hw_read == hw_write) ? RTE_DMA_VCHAN_IDLE
> +                                       : RTE_DMA_VCHAN_ACTIVE;
> +       return 0;
> +}
> +
> +static int
> +ae4dma_add_queue(struct ae4dma_dmadev *dev, uint8_t qn, const char *pci_name)
> +{
> +       uint32_t dma_addr_lo, dma_addr_hi;
> +       struct ae4dma_cmd_queue *cmd_q;
> +       const struct rte_memzone *q_mz;
> +
> +       if (dev == NULL)
> +               return -EINVAL;

dev can't be NULL.
The only caller passes a pointer that was already dereferenced.

> +
> +       dev->io_regs = dev->pci->mem_resource[AE4DMA_PCIE_BAR].addr;
> +
> +       cmd_q = &dev->cmd_q;
> +       cmd_q->id = qn;
> +       cmd_q->qidx = 0;
> +       cmd_q->qsize = AE4DMA_QUEUE_SIZE(AE4DMA_QUEUE_DESC_SIZE);
> +       cmd_q->hwq_regs = (volatile struct ae4dma_hwq_regs *)dev->io_regs + (qn + 1);
> +
> +       /*
> +        * Memzone name must be globally unique. Embed PCI BDF so multiple
> +        * PCI functions probed concurrently don't collide.
> +        */
> +       snprintf(cmd_q->memz_name, sizeof(cmd_q->memz_name),
> +                       "ae4dma_%s_q%u", pci_name, (unsigned int)qn);
> +
> +       q_mz = ae4dma_queue_dma_zone_reserve(cmd_q->memz_name,
> +                       cmd_q->qsize, rte_socket_id());
> +       if (q_mz == NULL) {
> +               AE4DMA_PMD_ERR("memzone reserve failed for %s", cmd_q->memz_name);
> +               return -ENOMEM;
> +       }
> +
> +       cmd_q->qbase_addr = (void *)q_mz->addr;
> +       cmd_q->qbase_desc = (struct ae4dma_desc *)q_mz->addr;
> +       cmd_q->qbase_phys_addr = q_mz->iova;
> +
> +       AE4DMA_WRITE_REG(&cmd_q->hwq_regs->max_idx, AE4DMA_DESCRIPTORS_PER_CMDQ);
> +       AE4DMA_WRITE_REG(&cmd_q->hwq_regs->control_reg.control_raw,
> +                       AE4DMA_CMD_QUEUE_ENABLE);
> +       AE4DMA_WRITE_REG(&cmd_q->hwq_regs->intr_status_reg.intr_status_raw,
> +                       AE4DMA_DISABLE_INTR);
> +       cmd_q->next_write = (uint16_t)AE4DMA_READ_REG(&cmd_q->hwq_regs->write_idx);
> +       cmd_q->next_read = (uint16_t)AE4DMA_READ_REG(&cmd_q->hwq_regs->read_idx);
> +       cmd_q->ring_buff_count = 0;
> +
> +       dma_addr_lo = low32_value(cmd_q->qbase_phys_addr);
> +       AE4DMA_WRITE_REG(&cmd_q->hwq_regs->qbase_lo, dma_addr_lo);
> +       dma_addr_hi = high32_value(cmd_q->qbase_phys_addr);
> +       AE4DMA_WRITE_REG(&cmd_q->hwq_regs->qbase_hi, dma_addr_hi);
> +
> +       return 0;
> +}
> +
> +static void
> +ae4dma_channel_dev_name(char *out, size_t outlen, const char *pci_name,
> +               unsigned int ch)
> +{
> +       snprintf(out, outlen, "%s-ch%u", pci_name, ch);
> +}
> +
> +/* Create a dmadev(dpdk DMA device) */
> +static int
> +ae4dma_dmadev_create(const char *name, struct rte_pci_device *dev, uint8_t qn)
> +{
> +       static const struct rte_dma_dev_ops ae4dma_dmadev_ops = {
> +               .dev_close = ae4dma_dev_close,
> +               .dev_configure = ae4dma_dev_configure,
> +               .dev_dump = ae4dma_dev_dump,
> +               .dev_info_get = ae4dma_dev_info_get,
> +               .dev_start = ae4dma_dev_start,
> +               .dev_stop = ae4dma_dev_stop,
> +               .stats_get = ae4dma_stats_get,
> +               .stats_reset = ae4dma_stats_reset,
> +               .vchan_status = ae4dma_vchan_status,
> +               .vchan_setup = ae4dma_vchan_setup,
> +       };
> +
> +       struct rte_dma_dev *dmadev = NULL;
> +       struct ae4dma_dmadev *ae4dma = NULL;
> +       char hwq_dev_name[RTE_DEV_NAME_MAX_LEN];
> +
> +       if (!name) {
> +               AE4DMA_PMD_ERR("Invalid name of the device!");
> +               return -EINVAL;
> +       }
> +       memset(hwq_dev_name, 0, sizeof(hwq_dev_name));
> +       ae4dma_channel_dev_name(hwq_dev_name, sizeof(hwq_dev_name), name, qn);
> +
> +       dmadev = rte_dma_pmd_allocate(hwq_dev_name, dev->device.numa_node,
> +                       sizeof(struct ae4dma_dmadev));
> +       if (dmadev == NULL) {
> +               AE4DMA_PMD_ERR("Unable to allocate dma device");
> +               return -ENOMEM;
> +       }
> +       dmadev->device = &dev->device;
> +       dmadev->fp_obj->dev_private = dmadev->data->dev_private;
> +       dmadev->dev_ops = &ae4dma_dmadev_ops;
> +
> +       dmadev->fp_obj->burst_capacity = ae4dma_burst_capacity;
> +       dmadev->fp_obj->completed = ae4dma_completed;
> +       dmadev->fp_obj->completed_status = ae4dma_completed_status;
> +       dmadev->fp_obj->copy = ae4dma_enqueue_copy;
> +       dmadev->fp_obj->submit = ae4dma_submit;
> +       /* fill capability not advertised: leave fp_obj->fill as zero-initialised. */
> +
> +       ae4dma = dmadev->data->dev_private;
> +       ae4dma->dmadev = dmadev;
> +       ae4dma->pci = dev;
> +
> +       if (ae4dma_add_queue(ae4dma, qn, name) != 0)
> +               goto init_error;
> +       return 0;
> +
> +init_error:
> +       AE4DMA_PMD_ERR("driver %s(): failed", __func__);
> +       rte_dma_pmd_release(hwq_dev_name);
> +       return -EFAULT;

ENOMEM or the value returned from ae4dma_add_queue.


> +}
> +

[snip]

> diff --git a/drivers/dma/ae4dma/ae4dma_hw_defs.h b/drivers/dma/ae4dma/ae4dma_hw_defs.h
> new file mode 100644
> index 0000000000..235819778e
> --- /dev/null
> +++ b/drivers/dma/ae4dma/ae4dma_hw_defs.h
> @@ -0,0 +1,164 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2024 Advanced Micro Devices, Inc. All rights reserved.
> + */
> +
> +#ifndef __AE4DMA_HW_DEFS_H__
> +#define __AE4DMA_HW_DEFS_H__
> +
> +#include <rte_bus_pci.h>
> +#include <rte_byteorder.h>
> +#include <rte_io.h>
> +#include <rte_pci.h>
> +#include <rte_memzone.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#define AE4DMA_BIT(nr)                 (1UL << (nr))
> +
> +#define AE4DMA_BITS_PER_LONG   (__SIZEOF_LONG__ * 8)
> +#define AE4DMA_GENMASK(h, l) \
> +       (((~0UL) << (l)) & (~0UL >> (AE4DMA_BITS_PER_LONG - 1 - (h))))

We have rte_bitops.h macros for bit manipulations, please reuse.


> +
> +/* ae4dma device details */
> +#define AMD_VENDOR_ID  0x1022
> +#define AE4DMA_DEVICE_ID       0x149b
> +#define AE4DMA_PCIE_BAR 0
> +

[snip]


> diff --git a/usertools/dpdk-devbind.py b/usertools/dpdk-devbind.py
> index 93f2383dff..ec6d6713b4 100755
> --- a/usertools/dpdk-devbind.py
> +++ b/usertools/dpdk-devbind.py
> @@ -86,6 +86,9 @@
>  cn9k_ree = {'Class': '08', 'Vendor': '177d', 'Device': 'a0f4',
>              'SVendor': None, 'SDevice': None}
>
> +amd_ae4dma = {'Class': '08', 'Vendor': '1022', 'Device': '149b',
> +                 'SVendor': None, 'SDevice': None}
> +

Indent looks odd.


>  virtio_blk = {'Class': '01', 'Vendor': "1af4", 'Device': '1001,1042',
>                'SVendor': None, 'SDevice': None}
>
> @@ -95,7 +98,7 @@
>  network_devices = [network_class, cavium_pkx, avp_vnic, ifpga_class]
>  baseband_devices = [acceleration_class]
>  crypto_devices = [encryption_class, intel_processor_class]
> -dma_devices = [cnxk_dma, hisilicon_dma,
> +dma_devices = [amd_ae4dma, cnxk_dma, hisilicon_dma,
>                 intel_idxd_gnrd, intel_idxd_dmr, intel_idxd_spr,
>                 intel_ioat_bdw, intel_ioat_icx, intel_ioat_skx,
>                 odm_dma]


-- 
David Marchand


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 0/3] dma/ae4dma: add AMD AE4DMA DMA PMD
  2026-05-18 18:18 [PATCH] dma/ae4dma: add AMD AE4DMA DMA PMD Raghavendra Ningoji
  2026-05-21 14:28 ` David Marchand
@ 2026-05-25 18:42 ` Raghavendra Ningoji
  2026-05-25 18:42   ` [PATCH v2 1/3] dma/ae4dma: introduce " Raghavendra Ningoji
                     ` (4 more replies)
  1 sibling, 5 replies; 24+ messages in thread
From: Raghavendra Ningoji @ 2026-05-25 18:42 UTC (permalink / raw)
  To: dev
  Cc: Thomas Monjalon, Bhagyada Modali, Robin Jarry, Selwin.Sebastian,
	david.marchand, Raghavendra Ningoji

This series adds a new dmadev poll-mode driver for the AMD AE4DMA
hardware DMA engine. An AE4DMA engine exposes 16 hardware command
queues, each with a 32-entry descriptor ring; the PMD maps each
hardware channel to its own dmadev with a single virtual channel,
so a PCI function appears as 16 dmadevs named "<pci-bdf>-ch0" ..
"<pci-bdf>-ch15".

Driver characteristics:

 - Memory-to-memory copy operations only (RTE_DMA_CAPA_MEM_TO_MEM).
 - Completion is detected via the hardware's per-queue read_idx
   register, which the engine advances as it processes descriptors.
   The descriptor status / err_code bytes are read only to classify
   each drained slot as success or failure.
 - vchan_status reports IDLE/ACTIVE based on HW read_idx vs write_idx
   and HALTED_ERROR when the queue is not enabled.
 - depends on bus_pci and dmadev.

The v1 was submitted as a single patch.  Per review feedback the
driver is now introduced in three logical patches, following the
pattern of the recent hisi_acc dmadev driver:

  1/3 - introduce driver (probe, remove, per-queue HW init)
  2/3 - add control path operations (dev_ops)
  3/3 - add data path operations (copy, submit, completion)
---
Changes in v2:
 - Split the monolithic v1 patch into three logical patches
   (introduce / control path / data path), mirroring the
   structure used by drivers/dma/hisi_acc.
 - Fix checkpatches.sh warnings in drivers/dma/ae4dma/ae4dma_internal.h:
     * Use RTE_LOG_LINE_PREFIX (with RTE_LOGTYPE_AE4DMA_PMD) instead
       of the deprecated rte_log() call form.
     * Replace the GCC variadic argument-pack extension ("args...")
       with C99 __VA_ARGS__ in the AE4DMA_PMD_{LOG,DEBUG,INFO,ERR,
       WARN} macros.
 - Move __rte_cache_aligned to the "struct" keyword position on
   struct ae4dma_cmd_queue, as required by checkpatches.sh.

v1:https://patches.dpdk.org/project/dpdk/patch/20260518181856.1228373-1-raghavendra.ningoji@amd.com/

Raghavendra Ningoji (3):
  dma/ae4dma: introduce AMD AE4DMA DMA PMD
  dma/ae4dma: add control path operations
  dma/ae4dma: add data path operations

 .mailmap                               |   1 +
 MAINTAINERS                            |   5 +
 doc/guides/dmadevs/ae4dma.rst          |  75 +++
 doc/guides/dmadevs/index.rst           |   1 +
 doc/guides/rel_notes/release_26_07.rst |   7 +
 drivers/dma/ae4dma/ae4dma_dmadev.c     | 738 +++++++++++++++++++++++++
 drivers/dma/ae4dma/ae4dma_hw_defs.h    | 160 ++++++
 drivers/dma/ae4dma/ae4dma_internal.h   | 118 ++++
 drivers/dma/ae4dma/meson.build         |   7 +
 drivers/dma/meson.build                |   1 +
 usertools/dpdk-devbind.py              |   5 +-
 11 files changed, 1117 insertions(+), 1 deletion(-)
 create mode 100644 doc/guides/dmadevs/ae4dma.rst
 create mode 100644 drivers/dma/ae4dma/ae4dma_dmadev.c
 create mode 100644 drivers/dma/ae4dma/ae4dma_hw_defs.h
 create mode 100644 drivers/dma/ae4dma/ae4dma_internal.h
 create mode 100644 drivers/dma/ae4dma/meson.build


base-commit: f724d1c0d1c1636b9c171c34db3f17c3defaa2f3
-- 
2.34.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 1/3] dma/ae4dma: introduce AMD AE4DMA DMA PMD
  2026-05-25 18:42 ` [PATCH v2 0/3] " Raghavendra Ningoji
@ 2026-05-25 18:42   ` Raghavendra Ningoji
  2026-06-22 12:06     ` David Marchand
  2026-06-22 12:26     ` David Marchand
  2026-05-25 18:42   ` [PATCH v2 2/3] dma/ae4dma: add control path operations Raghavendra Ningoji
                     ` (3 subsequent siblings)
  4 siblings, 2 replies; 24+ messages in thread
From: Raghavendra Ningoji @ 2026-05-25 18:42 UTC (permalink / raw)
  To: dev
  Cc: Thomas Monjalon, Bhagyada Modali, Robin Jarry, Selwin.Sebastian,
	david.marchand, Raghavendra Ningoji

Add the skeleton of a new dmadev poll-mode driver for the AMD AE4DMA
hardware DMA engine, providing only PCI probe/remove and per-queue
hardware initialisation. An AE4DMA engine exposes 16 hardware command
queues, each with a 32-entry descriptor ring; the PMD maps each
hardware channel to its own dmadev with a single virtual channel,
so a PCI function appears as 16 dmadevs named "<pci-bdf>-ch0" ..
"<pci-bdf>-ch15".

This patch only registers the PCI driver, allocates the dmadev
objects, reserves the per-queue descriptor rings and programs the
hardware queue base addresses. Control and data path operations are
added in subsequent patches.

Signed-off-by: Raghavendra Ningoji <raghavendra.ningoji@amd.com>
---
 .mailmap                               |   1 +
 MAINTAINERS                            |   5 +
 doc/guides/dmadevs/ae4dma.rst          |  53 ++++++
 doc/guides/dmadevs/index.rst           |   1 +
 doc/guides/rel_notes/release_26_07.rst |   7 +
 drivers/dma/ae4dma/ae4dma_dmadev.c     | 227 +++++++++++++++++++++++++
 drivers/dma/ae4dma/ae4dma_hw_defs.h    | 160 +++++++++++++++++
 drivers/dma/ae4dma/ae4dma_internal.h   | 118 +++++++++++++
 drivers/dma/ae4dma/meson.build         |   7 +
 drivers/dma/meson.build                |   1 +
 usertools/dpdk-devbind.py              |   5 +-
 11 files changed, 584 insertions(+), 1 deletion(-)
 create mode 100644 doc/guides/dmadevs/ae4dma.rst
 create mode 100644 drivers/dma/ae4dma/ae4dma_dmadev.c
 create mode 100644 drivers/dma/ae4dma/ae4dma_hw_defs.h
 create mode 100644 drivers/dma/ae4dma/ae4dma_internal.h
 create mode 100644 drivers/dma/ae4dma/meson.build

diff --git a/.mailmap b/.mailmap
index 89ba6ffccc..60180818f9 100644
--- a/.mailmap
+++ b/.mailmap
@@ -203,6 +203,7 @@ Benoît Ganne <bganne@cisco.com>
 Bernard Iremonger <bernard.iremonger@intel.com>
 Bert van Leeuwen <bert.vanleeuwen@netronome.com>
 Bhagyada Modali <bhagyada.modali@amd.com>
+Raghavendra Ningoji <raghavendra.ningoji@amd.com>
 Bharat Mota <bharat.mota@broadcom.com> <bmota@vmware.com>
 Bhuvan Mital <bhuvan.mital@amd.com>
 Bibo Mao <maobibo@loongson.cn>
diff --git a/MAINTAINERS b/MAINTAINERS
index 9143d028bc..2e27af49f4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1361,6 +1361,11 @@ F: doc/guides/compressdevs/features/zsda.ini
 DMAdev Drivers
 --------------
 
+AMD AE4DMA
+M: Bhagyada Modali <bhagyada.modali@amd.com>
+F: drivers/dma/ae4dma/
+F: doc/guides/dmadevs/ae4dma.rst
+
 Intel IDXD - EXPERIMENTAL
 M: Bruce Richardson <bruce.richardson@intel.com>
 M: Kevin Laatz <kevin.laatz@intel.com>
diff --git a/doc/guides/dmadevs/ae4dma.rst b/doc/guides/dmadevs/ae4dma.rst
new file mode 100644
index 0000000000..a85c1d92ca
--- /dev/null
+++ b/doc/guides/dmadevs/ae4dma.rst
@@ -0,0 +1,53 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2025 Advanced Micro Devices, Inc.
+
+.. include:: <isonum.txt>
+
+AMD AE4DMA DMA Device Driver
+============================
+
+The ``ae4dma`` dmadev driver is a poll-mode driver (PMD) for the
+AMD AE4DMA hardware DMA engine. The engine exposes 16 independent
+hardware command queues, each with a ring of 32 descriptors. The PMD
+maps each hardware command queue to a separate DPDK dmadev with a
+single virtual channel, so a single PCI function appears as 16 dmadevs
+named ``<pci-bdf>-ch0`` through ``<pci-bdf>-ch15``.
+
+The driver supports memory-to-memory copy operations only.
+
+Hardware Requirements
+---------------------
+
+The ``dpdk-devbind.py`` script can be used to list AE4DMA devices on
+the system::
+
+   dpdk-devbind.py --status-dev dma
+
+AE4DMA devices appear with vendor ID ``0x1022`` and device ID
+``0x149b``.
+
+Compilation
+-----------
+
+The driver is built as part of the standard DPDK build on x86 platforms
+using ``meson`` and ``ninja``; no extra configuration is required.
+
+Device Setup
+------------
+
+The AE4DMA device must be bound to a DPDK-compatible kernel module such
+as ``vfio-pci`` before it can be used::
+
+   dpdk-devbind.py -b vfio-pci <pci-bdf>
+
+Initialization
+~~~~~~~~~~~~~~
+
+On probe the PMD performs the following steps for each PCI function:
+
+* Reads BAR0 and programs the common configuration register with the
+  number of hardware queues to enable (16).
+* For each hardware queue it allocates a 32-entry descriptor ring in
+  IOVA-contiguous memory, programs the queue base address and ring
+  depth into the per-queue registers, and enables the queue.
+* Interrupts are masked; completion is polled by the application.
diff --git a/doc/guides/dmadevs/index.rst b/doc/guides/dmadevs/index.rst
index 56beb1733f..97399590f6 100644
--- a/doc/guides/dmadevs/index.rst
+++ b/doc/guides/dmadevs/index.rst
@@ -11,6 +11,7 @@ an application through DMA API.
    :maxdepth: 1
    :numbered:
 
+   ae4dma
    cnxk
    dpaa
    dpaa2
diff --git a/doc/guides/rel_notes/release_26_07.rst b/doc/guides/rel_notes/release_26_07.rst
index f012d47a4b..9a78a7ef62 100644
--- a/doc/guides/rel_notes/release_26_07.rst
+++ b/doc/guides/rel_notes/release_26_07.rst
@@ -63,6 +63,13 @@ New Features
     ``rte_eal_init`` and the application is responsible for probing each device,
   * ``--auto-probing`` enables the initial bus probing, which is the current default behavior.
 
+* **Added AMD AE4DMA DMA PMD.**
+
+  Added a new ``dma/ae4dma`` driver for the AMD AE4DMA hardware DMA engine.
+  Each PCI function exposes 16 hardware command queues; the PMD registers one
+  dmadev per channel with a single virtual channel and supports
+  memory-to-memory copy operations.
+
 
 Removed Items
 -------------
diff --git a/drivers/dma/ae4dma/ae4dma_dmadev.c b/drivers/dma/ae4dma/ae4dma_dmadev.c
new file mode 100644
index 0000000000..76de2cde45
--- /dev/null
+++ b/drivers/dma/ae4dma/ae4dma_dmadev.c
@@ -0,0 +1,227 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2026 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#include <errno.h>
+#include <inttypes.h>
+#include <stdio.h>
+#include <string.h>
+
+#include <rte_bus_pci.h>
+#include <bus_pci_driver.h>
+#include <rte_dmadev_pmd.h>
+#include <rte_malloc.h>
+
+#include "ae4dma_internal.h"
+
+/*
+ * One dmadev per AE4DMA hardware channel; each dmadev has exactly one
+ * virtual channel. The HW's per-queue register block must be densely
+ * packed right after the engine-common config register at BAR0+0; the
+ * build-time check below catches an accidental layout change.
+ */
+static_assert(sizeof(struct ae4dma_hwq_regs) == 32,
+		"ae4dma_hwq_regs stride changed; per-queue offset math will break");
+
+RTE_LOG_REGISTER_DEFAULT(ae4dma_pmd_logtype, INFO);
+
+#define AE4DMA_PMD_NAME dmadev_ae4dma
+
+static const struct rte_memzone *
+ae4dma_queue_dma_zone_reserve(const char *queue_name,
+		uint32_t queue_size, int socket_id)
+{
+	const struct rte_memzone *mz;
+
+	mz = rte_memzone_lookup(queue_name);
+	if (mz != NULL) {
+		if (((size_t)queue_size <= mz->len) &&
+				((socket_id == SOCKET_ID_ANY) ||
+				 (socket_id == mz->socket_id))) {
+			AE4DMA_PMD_INFO("reuse memzone already "
+					"allocated for %s", queue_name);
+			return mz;
+		}
+		AE4DMA_PMD_ERR("Incompatible memzone already "
+				"allocated %s, size %u, socket %d. "
+				"Requested size %u, socket %u",
+				queue_name, (uint32_t)mz->len,
+				mz->socket_id, queue_size, socket_id);
+		return NULL;
+	}
+	return rte_memzone_reserve_aligned(queue_name, queue_size,
+			socket_id, RTE_MEMZONE_IOVA_CONTIG, queue_size);
+}
+
+static int
+ae4dma_add_queue(struct ae4dma_dmadev *dev, uint8_t qn, const char *pci_name)
+{
+	uint32_t dma_addr_lo, dma_addr_hi;
+	struct ae4dma_cmd_queue *cmd_q;
+	const struct rte_memzone *q_mz;
+
+	dev->io_regs = dev->pci->mem_resource[AE4DMA_PCIE_BAR].addr;
+
+	cmd_q = &dev->cmd_q;
+	cmd_q->id = qn;
+	cmd_q->qidx = 0;
+	cmd_q->qsize = AE4DMA_QUEUE_SIZE(AE4DMA_QUEUE_DESC_SIZE);
+	cmd_q->hwq_regs = (volatile struct ae4dma_hwq_regs *)dev->io_regs + (qn + 1);
+
+	/*
+	 * Memzone name must be globally unique. Embed PCI BDF so multiple
+	 * PCI functions probed concurrently don't collide.
+	 */
+	snprintf(cmd_q->memz_name, sizeof(cmd_q->memz_name),
+			"ae4dma_%s_q%u", pci_name, (unsigned int)qn);
+
+	q_mz = ae4dma_queue_dma_zone_reserve(cmd_q->memz_name,
+			cmd_q->qsize, rte_socket_id());
+	if (q_mz == NULL) {
+		AE4DMA_PMD_ERR("memzone reserve failed for %s", cmd_q->memz_name);
+		return -ENOMEM;
+	}
+
+	cmd_q->qbase_addr = (void *)q_mz->addr;
+	cmd_q->qbase_desc = (struct ae4dma_desc *)q_mz->addr;
+	cmd_q->qbase_phys_addr = q_mz->iova;
+
+	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->max_idx, AE4DMA_DESCRIPTORS_PER_CMDQ);
+	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->control_reg.control_raw,
+			AE4DMA_CMD_QUEUE_ENABLE);
+	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->intr_status_reg.intr_status_raw,
+			AE4DMA_DISABLE_INTR);
+	cmd_q->next_write = (uint16_t)AE4DMA_READ_REG(&cmd_q->hwq_regs->write_idx);
+	cmd_q->next_read = (uint16_t)AE4DMA_READ_REG(&cmd_q->hwq_regs->read_idx);
+	cmd_q->ring_buff_count = 0;
+
+	dma_addr_lo = low32_value(cmd_q->qbase_phys_addr);
+	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->qbase_lo, dma_addr_lo);
+	dma_addr_hi = high32_value(cmd_q->qbase_phys_addr);
+	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->qbase_hi, dma_addr_hi);
+
+	return 0;
+}
+
+static void
+ae4dma_channel_dev_name(char *out, size_t outlen, const char *pci_name,
+		unsigned int ch)
+{
+	snprintf(out, outlen, "%s-ch%u", pci_name, ch);
+}
+
+/* Create a dmadev(dpdk DMA device) */
+static int
+ae4dma_dmadev_create(const char *name, struct rte_pci_device *dev, uint8_t qn)
+{
+	struct rte_dma_dev *dmadev = NULL;
+	struct ae4dma_dmadev *ae4dma = NULL;
+	char hwq_dev_name[RTE_DEV_NAME_MAX_LEN];
+
+	if (!name) {
+		AE4DMA_PMD_ERR("Invalid name of the device!");
+		return -EINVAL;
+	}
+	memset(hwq_dev_name, 0, sizeof(hwq_dev_name));
+	ae4dma_channel_dev_name(hwq_dev_name, sizeof(hwq_dev_name), name, qn);
+
+	dmadev = rte_dma_pmd_allocate(hwq_dev_name, dev->device.numa_node,
+			sizeof(struct ae4dma_dmadev));
+	if (dmadev == NULL) {
+		AE4DMA_PMD_ERR("Unable to allocate dma device");
+		return -ENOMEM;
+	}
+	dmadev->device = &dev->device;
+	dmadev->fp_obj->dev_private = dmadev->data->dev_private;
+
+	ae4dma = dmadev->data->dev_private;
+	ae4dma->dmadev = dmadev;
+	ae4dma->pci = dev;
+
+	if (ae4dma_add_queue(ae4dma, qn, name) != 0)
+		goto init_error;
+	return 0;
+
+init_error:
+	AE4DMA_PMD_ERR("driver %s(): failed", __func__);
+	rte_dma_pmd_release(hwq_dev_name);
+	return -ENOMEM;
+}
+
+/* Probe DMA device. */
+static int
+ae4dma_dmadev_probe(struct rte_pci_driver *drv, struct rte_pci_device *dev)
+{
+	char name[32];
+	char chname[RTE_DEV_NAME_MAX_LEN];
+	void *mmio_base;
+	uint32_t q_per_eng;
+	int ret = 0;
+	uint8_t i;
+
+	rte_pci_device_name(&dev->addr, name, sizeof(name));
+	AE4DMA_PMD_INFO("Init %s on NUMA node %d", name, dev->device.numa_node);
+	dev->device.driver = &drv->driver;
+
+	mmio_base = dev->mem_resource[AE4DMA_PCIE_BAR].addr;
+	if (mmio_base == NULL) {
+		AE4DMA_PMD_ERR("%s: BAR%d not mapped", name, AE4DMA_PCIE_BAR);
+		return -ENODEV;
+	}
+
+	/* Program the per-engine HW queue count once. */
+	AE4DMA_WRITE_REG_OFFSET(mmio_base, AE4DMA_COMMON_CONFIG_OFFSET,
+			AE4DMA_MAX_HW_QUEUES);
+	q_per_eng = AE4DMA_READ_REG_OFFSET(mmio_base, AE4DMA_COMMON_CONFIG_OFFSET);
+	AE4DMA_PMD_INFO("%s: AE4DMA queues per engine = %u", name, q_per_eng);
+
+	for (i = 0; i < AE4DMA_MAX_HW_QUEUES; i++) {
+		ret = ae4dma_dmadev_create(name, dev, i);
+		if (ret != 0) {
+			AE4DMA_PMD_ERR("%s create dmadev %u failed!", name, i);
+			while (i > 0) {
+				i--;
+				ae4dma_channel_dev_name(chname, sizeof(chname), name, i);
+				rte_dma_pmd_release(chname);
+			}
+			break;
+		}
+	}
+	return ret;
+}
+
+/* Remove DMA device. */
+static int
+ae4dma_dmadev_remove(struct rte_pci_device *dev)
+{
+	char name[32];
+	char chname[RTE_DEV_NAME_MAX_LEN];
+	unsigned int i;
+
+	rte_pci_device_name(&dev->addr, name, sizeof(name));
+
+	AE4DMA_PMD_INFO("Closing %s on NUMA node %d",
+			name, dev->device.numa_node);
+
+	for (i = 0; i < AE4DMA_MAX_HW_QUEUES; i++) {
+		ae4dma_channel_dev_name(chname, sizeof(chname), name, i);
+		rte_dma_pmd_release(chname);
+	}
+	return 0;
+}
+
+static const struct rte_pci_id pci_id_ae4dma_map[] = {
+	{ RTE_PCI_DEVICE(AMD_VENDOR_ID, AE4DMA_DEVICE_ID) },
+	{ .vendor_id = 0, /* sentinel */ },
+};
+
+static struct rte_pci_driver ae4dma_pmd_drv = {
+	.id_table = pci_id_ae4dma_map,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+	.probe = ae4dma_dmadev_probe,
+	.remove = ae4dma_dmadev_remove,
+};
+
+RTE_PMD_REGISTER_PCI(AE4DMA_PMD_NAME, ae4dma_pmd_drv);
+RTE_PMD_REGISTER_PCI_TABLE(AE4DMA_PMD_NAME, pci_id_ae4dma_map);
+RTE_PMD_REGISTER_KMOD_DEP(AE4DMA_PMD_NAME, "* igb_uio | uio_pci_generic | vfio-pci");
diff --git a/drivers/dma/ae4dma/ae4dma_hw_defs.h b/drivers/dma/ae4dma/ae4dma_hw_defs.h
new file mode 100644
index 0000000000..62b6a1b30b
--- /dev/null
+++ b/drivers/dma/ae4dma/ae4dma_hw_defs.h
@@ -0,0 +1,160 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2026 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#ifndef __AE4DMA_HW_DEFS_H__
+#define __AE4DMA_HW_DEFS_H__
+
+#include <rte_bus_pci.h>
+#include <rte_byteorder.h>
+#include <rte_io.h>
+#include <rte_pci.h>
+#include <rte_memzone.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define AE4DMA_BIT(nr)			(1UL << (nr))
+
+/* ae4dma device details */
+#define AMD_VENDOR_ID	0x1022
+#define AE4DMA_DEVICE_ID	0x149b
+#define AE4DMA_PCIE_BAR 0
+
+/*
+ * An AE4DMA engine has 16 DMA queues. Each queue supports 32 descriptors.
+ */
+#define AE4DMA_MAX_HW_QUEUES        16
+#define AE4DMA_QUEUE_START_INDEX    0
+#define AE4DMA_CMD_QUEUE_ENABLE		0x1
+#define AE4DMA_CMD_QUEUE_DISABLE	0x0
+
+/* Common to all queues */
+#define AE4DMA_COMMON_CONFIG_OFFSET 0x00
+
+#define AE4DMA_DISABLE_INTR 0x01
+
+/* Descriptor status */
+enum ae4dma_dma_status {
+	AE4DMA_DMA_DESC_SUBMITTED = 0,
+	AE4DMA_DMA_DESC_VALIDATED = 1,
+	AE4DMA_DMA_DESC_PROCESSED = 2,
+	AE4DMA_DMA_DESC_COMPLETED = 3,
+	AE4DMA_DMA_DESC_ERROR = 4,
+};
+
+/* Descriptor error-code */
+enum ae4dma_dma_err {
+	AE4DMA_DMA_ERR_NO_ERR = 0,
+	AE4DMA_DMA_ERR_INV_HEADER = 1,
+	AE4DMA_DMA_ERR_INV_STATUS = 2,
+	AE4DMA_DMA_ERR_INV_LEN = 3,
+	AE4DMA_DMA_ERR_INV_SRC = 4,
+	AE4DMA_DMA_ERR_INV_DST = 5,
+	AE4DMA_DMA_ERR_INV_ALIGN = 6,
+	AE4DMA_DMA_ERR_UNKNOWN = 7,
+};
+
+/* HW Queue status */
+enum ae4dma_hwqueue_status {
+	AE4DMA_HWQUEUE_EMPTY = 0,
+	AE4DMA_HWQUEUE_FULL = 1,
+	AE4DMA_HWQUEUE_NOT_EMPTY = 4
+};
+/*
+ * descriptor for AE4DMA commands
+ * 8 32-bit words:
+ * word 0: source memory type; destination memory type ; control bits
+ * word 1: desc_id; error code; status
+ * word 2: length
+ * word 3: reserved
+ * word 4: upper 32 bits of source pointer
+ * word 5: low 32 bits of source pointer
+ * word 6: upper 32 bits of destination pointer
+ * word 7: low 32 bits of destination pointer
+ */
+
+/* AE4DMA Descriptor - DWORD0 - Controls bits: Reserved for future use */
+#define AE4DMA_DWORD0_STOP_ON_COMPLETION	AE4DMA_BIT(0)
+#define AE4DMA_DWORD0_INTERRUPT_ON_COMPLETION	AE4DMA_BIT(1)
+#define AE4DMA_DWORD0_START_OF_MESSAGE		AE4DMA_BIT(3)
+#define AE4DMA_DWORD0_END_OF_MESSAGE		AE4DMA_BIT(4)
+#define AE4DMA_DWORD0_DESTINATION_MEMORY_TYPE	RTE_GENMASK64(5, 4)
+#define AE4DMA_DWORD0_SOURCE_MEMEORY_TYPE	RTE_GENMASK64(7, 6)
+
+#define AE4DMA_DWORD0_DESTINATION_MEMORY_TYPE_MEMORY    (0x0)
+#define AE4DMA_DWORD0_DESTINATION_MEMORY_TYPE_IOMEMORY  (1<<4)
+#define AE4DMA_DWORD0_SOURCE_MEMEORY_TYPE_MEMORY    (0x0)
+#define AE4DMA_DWORD0_SOURCE_MEMEORY_TYPE_IOMEMORY  (1<<6)
+
+struct ae4dma_desc_dword0 {
+	uint8_t byte0;
+	uint8_t byte1;
+	uint16_t timestamp;
+};
+
+struct ae4dma_desc_dword1 {
+	uint8_t status;
+	uint8_t err_code;
+	uint16_t desc_id;
+};
+
+struct ae4dma_desc {
+	struct ae4dma_desc_dword0 dw0;
+	struct ae4dma_desc_dword1 dw1;
+	uint32_t length;
+	uint32_t reserved;
+	uint32_t src_lo;
+	uint32_t src_hi;
+	uint32_t dst_lo;
+	uint32_t dst_hi;
+};
+
+/*
+ * Registers for each queue :4 bytes length
+ * Effective address : offset + reg
+ */
+struct ae4dma_hwq_regs {
+	union {
+		uint32_t control_raw;
+		struct {
+			uint32_t queue_enable: 1;
+			uint32_t reserved_internal: 31;
+		} control;
+	} control_reg;
+
+	union {
+		uint32_t status_raw;
+		struct {
+			uint32_t reserved0: 1;
+			/* 0–empty, 1–full, 2–stopped, 3–error , 4–Not Empty */
+			uint32_t queue_status: 2;
+			uint32_t reserved1: 21;
+			uint32_t interrupt_type: 4;
+			uint32_t reserved2: 4;
+		} status;
+	} status_reg;
+
+	uint32_t max_idx;
+	uint32_t read_idx;
+	uint32_t write_idx;
+
+	union {
+		uint32_t intr_status_raw;
+		struct {
+			uint32_t intr_status: 1;
+			uint32_t reserved: 31;
+		} intr_status;
+	} intr_status_reg;
+
+	uint32_t qbase_lo;
+	uint32_t qbase_hi;
+
+};
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* AE4DMA_HW_DEFS_H */
diff --git a/drivers/dma/ae4dma/ae4dma_internal.h b/drivers/dma/ae4dma/ae4dma_internal.h
new file mode 100644
index 0000000000..9892d6697f
--- /dev/null
+++ b/drivers/dma/ae4dma/ae4dma_internal.h
@@ -0,0 +1,118 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2026 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#ifndef _AE4DMA_INTERNAL_H_
+#define _AE4DMA_INTERNAL_H_
+
+#include <stdint.h>
+
+#include "ae4dma_hw_defs.h"
+
+/**
+ * upper_32_bits - return bits 32-63 of a number
+ * @n: the number we're accessing
+ */
+#define upper_32_bits(n) ((uint32_t)(((n) >> 16) >> 16))
+
+/**
+ * lower_32_bits - return bits 0-31 of a number
+ * @n: the number we're accessing
+ */
+#define lower_32_bits(n) ((uint32_t)((n) & 0xffffffff))
+
+/** Hardware ring depth (slots per queue); must be power of two. */
+#define AE4DMA_DESCRIPTORS_PER_CMDQ	32
+#define AE4DMA_QUEUE_DESC_SIZE		sizeof(struct ae4dma_desc)
+#define AE4DMA_QUEUE_SIZE(n)		(AE4DMA_DESCRIPTORS_PER_CMDQ * (n))
+
+
+/** AE4DMA registers Write/Read */
+static inline void ae4dma_pci_reg_write(void *base, int offset,
+		uint32_t value)
+{
+	volatile void *reg_addr = ((uint8_t *)base + offset);
+
+	rte_write32((rte_cpu_to_le_32(value)), reg_addr);
+}
+
+static inline uint32_t ae4dma_pci_reg_read(void *base, int offset)
+{
+	volatile void *reg_addr = ((uint8_t *)base + offset);
+
+	return rte_le_to_cpu_32(rte_read32(reg_addr));
+}
+
+#define AE4DMA_READ_REG_OFFSET(hw_addr, reg_offset) \
+	ae4dma_pci_reg_read(hw_addr, reg_offset)
+
+#define AE4DMA_WRITE_REG_OFFSET(hw_addr, reg_offset, value) \
+	ae4dma_pci_reg_write(hw_addr, reg_offset, value)
+
+
+#define AE4DMA_READ_REG(hw_addr) \
+	ae4dma_pci_reg_read((void *)(uintptr_t)(hw_addr), 0)
+
+#define AE4DMA_WRITE_REG(hw_addr, value) \
+	ae4dma_pci_reg_write((void *)(uintptr_t)(hw_addr), 0, value)
+
+static inline uint32_t
+low32_value(unsigned long addr)
+{
+	return ((uint64_t)addr) & 0xffffffffUL;
+}
+
+static inline uint32_t
+high32_value(unsigned long addr)
+{
+	return (uint32_t)(((uint64_t)addr) >> 32);
+}
+
+/**
+ * A structure describing a AE4DMA command queue.
+ */
+struct __rte_cache_aligned ae4dma_cmd_queue {
+	char memz_name[RTE_MEMZONE_NAMESIZE];
+	volatile struct ae4dma_hwq_regs *hwq_regs;
+
+	struct rte_dma_vchan_conf qcfg;
+	struct rte_dma_stats stats;
+	/* Queue address */
+	struct ae4dma_desc *qbase_desc;
+	void *qbase_addr;
+	rte_iova_t qbase_phys_addr;
+	enum ae4dma_dma_err status[AE4DMA_DESCRIPTORS_PER_CMDQ];
+	/* Queue identifier */
+	uint64_t id;    /**< queue id */
+	uint64_t qidx;  /**< queue index */
+	uint64_t qsize; /**< queue size */
+	uint32_t ring_buff_count;
+	unsigned short next_read;
+	unsigned short next_write;
+	unsigned short last_write; /* Used to compute submitted count. */
+};
+
+/*
+ * One dmadev per AE4DMA hardware channel: probe creates AE4DMA_MAX_HW_QUEUES
+ * dmadevs per PCI function, each owning a single HW command queue.
+ */
+struct ae4dma_dmadev {
+	struct rte_dma_dev *dmadev;
+	void *io_regs;
+	struct ae4dma_cmd_queue cmd_q; /**< single HW queue owned by this dmadev */
+	struct rte_pci_device *pci;    /**< owning PCI device (not owned) */
+};
+
+
+extern int ae4dma_pmd_logtype;
+#define RTE_LOGTYPE_AE4DMA_PMD ae4dma_pmd_logtype
+
+#define AE4DMA_PMD_LOG(level, ...) \
+	RTE_LOG_LINE_PREFIX(level, AE4DMA_PMD, "%s(): ", __func__, __VA_ARGS__)
+
+#define AE4DMA_PMD_DEBUG(...)  AE4DMA_PMD_LOG(DEBUG, __VA_ARGS__)
+#define AE4DMA_PMD_INFO(...)   AE4DMA_PMD_LOG(INFO, __VA_ARGS__)
+#define AE4DMA_PMD_ERR(...)    AE4DMA_PMD_LOG(ERR, __VA_ARGS__)
+#define AE4DMA_PMD_WARN(...)   AE4DMA_PMD_LOG(WARNING, __VA_ARGS__)
+
+#endif /* _AE4DMA_INTERNAL_H_ */
diff --git a/drivers/dma/ae4dma/meson.build b/drivers/dma/ae4dma/meson.build
new file mode 100644
index 0000000000..e48ab0d561
--- /dev/null
+++ b/drivers/dma/ae4dma/meson.build
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Advanced Micro Devices, Inc. All rights reserved.
+
+build = dpdk_conf.has('RTE_ARCH_X86')
+reason = 'only supported on x86'
+sources = files('ae4dma_dmadev.c')
+deps += ['bus_pci', 'dmadev']
diff --git a/drivers/dma/meson.build b/drivers/dma/meson.build
index e0d94db967..c230ac5a06 100644
--- a/drivers/dma/meson.build
+++ b/drivers/dma/meson.build
@@ -2,6 +2,7 @@
 # Copyright 2021 HiSilicon Limited
 
 drivers = [
+        'ae4dma',
         'cnxk',
         'dpaa',
         'dpaa2',
diff --git a/usertools/dpdk-devbind.py b/usertools/dpdk-devbind.py
index 93f2383dff..7d09f155de 100755
--- a/usertools/dpdk-devbind.py
+++ b/usertools/dpdk-devbind.py
@@ -86,6 +86,9 @@
 cn9k_ree = {'Class': '08', 'Vendor': '177d', 'Device': 'a0f4',
             'SVendor': None, 'SDevice': None}
 
+amd_ae4dma = {'Class': '08', 'Vendor': '1022', 'Device': '149b',
+              'SVendor': None, 'SDevice': None}
+
 virtio_blk = {'Class': '01', 'Vendor': "1af4", 'Device': '1001,1042',
               'SVendor': None, 'SDevice': None}
 
@@ -95,7 +98,7 @@
 network_devices = [network_class, cavium_pkx, avp_vnic, ifpga_class]
 baseband_devices = [acceleration_class]
 crypto_devices = [encryption_class, intel_processor_class]
-dma_devices = [cnxk_dma, hisilicon_dma,
+dma_devices = [amd_ae4dma, cnxk_dma, hisilicon_dma,
                intel_idxd_gnrd, intel_idxd_dmr, intel_idxd_spr,
                intel_ioat_bdw, intel_ioat_icx, intel_ioat_skx,
                odm_dma]
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 2/3] dma/ae4dma: add control path operations
  2026-05-25 18:42 ` [PATCH v2 0/3] " Raghavendra Ningoji
  2026-05-25 18:42   ` [PATCH v2 1/3] dma/ae4dma: introduce " Raghavendra Ningoji
@ 2026-05-25 18:42   ` Raghavendra Ningoji
  2026-06-22 12:15     ` David Marchand
  2026-05-25 18:42   ` [PATCH v2 3/3] dma/ae4dma: add data " Raghavendra Ningoji
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 24+ messages in thread
From: Raghavendra Ningoji @ 2026-05-25 18:42 UTC (permalink / raw)
  To: dev
  Cc: Thomas Monjalon, Bhagyada Modali, Robin Jarry, Selwin.Sebastian,
	david.marchand, Raghavendra Ningoji

Implement the dmadev control path for the AMD AE4DMA PMD.

This commit adds:
 - dev_configure / vchan_setup: accept a single virtual channel per
   dmadev and clamp the requested ring size to the hardware maximum
   of 32 descriptors (rounded up to a power of two).
 - dev_start / dev_stop / dev_close: program the per-queue control
   register to enable/disable the hardware queue and release the
   descriptor ring memzone on close.
 - dev_info_get: advertise RTE_DMA_CAPA_MEM_TO_MEM and the fixed
   ring depth.
 - dev_dump: print the queue identifiers, ring layout and software
   completion counters.
 - stats_get / stats_reset: expose submitted / completed / errors
   counters maintained by the driver.
 - vchan_status: report IDLE / ACTIVE based on hardware read_idx vs
   write_idx, and HALTED_ERROR when the queue is not enabled.

The dmadev framework is wired through dev_ops in ae4dma_dmadev_create().

Signed-off-by: Raghavendra Ningoji <raghavendra.ningoji@amd.com>
---
 drivers/dma/ae4dma/ae4dma_dmadev.c | 223 +++++++++++++++++++++++++++++
 1 file changed, 223 insertions(+)

diff --git a/drivers/dma/ae4dma/ae4dma_dmadev.c b/drivers/dma/ae4dma/ae4dma_dmadev.c
index 76de2cde45..dfda723c13 100644
--- a/drivers/dma/ae4dma/ae4dma_dmadev.c
+++ b/drivers/dma/ae4dma/ae4dma_dmadev.c
@@ -53,6 +53,215 @@ ae4dma_queue_dma_zone_reserve(const char *queue_name,
 			socket_id, RTE_MEMZONE_IOVA_CONTIG, queue_size);
 }
 
+/* Configure a device. */
+static int
+ae4dma_dev_configure(struct rte_dma_dev *dev __rte_unused,
+		const struct rte_dma_conf *dev_conf,
+		uint32_t conf_sz)
+{
+	if (sizeof(struct rte_dma_conf) != conf_sz)
+		return -EINVAL;
+
+	if (dev_conf->nb_vchans != 1)
+		return -EINVAL;
+
+	return 0;
+}
+
+/* Setup a virtual channel for AE4DMA, only 1 vchan is supported per dmadev. */
+static int
+ae4dma_vchan_setup(struct rte_dma_dev *dev, uint16_t vchan __rte_unused,
+		const struct rte_dma_vchan_conf *qconf, uint32_t qconf_sz)
+{
+	struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+	uint16_t max_desc = qconf->nb_desc;
+
+	if (sizeof(struct rte_dma_vchan_conf) != qconf_sz)
+		return -EINVAL;
+
+	if (max_desc < 2)
+		return -EINVAL;
+
+	if (!rte_is_power_of_2(max_desc))
+		max_desc = rte_align32pow2(max_desc);
+
+	if (max_desc > AE4DMA_DESCRIPTORS_PER_CMDQ) {
+		AE4DMA_PMD_DEBUG("DMA dev %u nb_desc clamped to %u",
+				dev->data->dev_id, AE4DMA_DESCRIPTORS_PER_CMDQ);
+		max_desc = AE4DMA_DESCRIPTORS_PER_CMDQ;
+	}
+
+	cmd_q->qcfg = *qconf;
+	cmd_q->qcfg.nb_desc = max_desc;
+
+	/* Ensure all counters are reset, if reconfiguring/restarting device. */
+	memset(&cmd_q->stats, 0, sizeof(cmd_q->stats));
+	return 0;
+}
+
+/* Start a configured device. */
+static int
+ae4dma_dev_start(struct rte_dma_dev *dev)
+{
+	struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+	uint16_t nb = cmd_q->qcfg.nb_desc;
+
+	if (nb == 0)
+		return -EBUSY;
+
+	/* Program ring depth expected by hardware. */
+	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->max_idx, nb);
+	return 0;
+}
+
+/* Stop a configured device. */
+static int
+ae4dma_dev_stop(struct rte_dma_dev *dev)
+{
+	struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+
+	if (cmd_q->hwq_regs != NULL)
+		AE4DMA_WRITE_REG(&cmd_q->hwq_regs->control_reg.control_raw,
+				AE4DMA_CMD_QUEUE_DISABLE);
+	return 0;
+}
+
+/* Get device information of a device. */
+static int
+ae4dma_dev_info_get(const struct rte_dma_dev *dev, struct rte_dma_info *info,
+		uint32_t size)
+{
+	if (size < sizeof(*info))
+		return -EINVAL;
+	info->dev_name = dev->device->name;
+	info->dev_capa = RTE_DMA_CAPA_MEM_TO_MEM;
+	info->max_vchans = 1;
+	info->min_desc = 2;
+	info->max_desc = AE4DMA_DESCRIPTORS_PER_CMDQ;
+	info->nb_vchans = 1;
+	return 0;
+}
+
+/* Close a configured device. */
+static int
+ae4dma_dev_close(struct rte_dma_dev *dev)
+{
+	struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+
+	if (cmd_q->hwq_regs != NULL)
+		AE4DMA_WRITE_REG(&cmd_q->hwq_regs->control_reg.control_raw,
+				AE4DMA_CMD_QUEUE_DISABLE);
+
+	if (cmd_q->memz_name[0] != '\0') {
+		const struct rte_memzone *mz = rte_memzone_lookup(cmd_q->memz_name);
+
+		if (mz != NULL)
+			rte_memzone_free(mz);
+	}
+	cmd_q->qbase_desc = NULL;
+	cmd_q->qbase_addr = NULL;
+	cmd_q->qbase_phys_addr = 0;
+	return 0;
+}
+/* Dump DMA device info. */
+static int
+ae4dma_dev_dump(const struct rte_dma_dev *dev, FILE *f)
+{
+	struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+	struct ae4dma_cmd_queue *cmd_q;
+	void *ae4dma_mmio_base_addr = (uint8_t *)ae4dma->io_regs;
+
+	cmd_q = &ae4dma->cmd_q;
+	fprintf(f, "cmd_q->id              = %" PRIx64 "\n", cmd_q->id);
+	fprintf(f, "cmd_q->qidx            = %" PRIx64 "\n", cmd_q->qidx);
+	fprintf(f, "cmd_q->qsize           = %" PRIx64 "\n", cmd_q->qsize);
+	fprintf(f, "mmio_base_addr	= %p\n", ae4dma_mmio_base_addr);
+	fprintf(f, "queues per ae4dma engine     = %d\n", AE4DMA_READ_REG_OFFSET(
+				ae4dma_mmio_base_addr, AE4DMA_COMMON_CONFIG_OFFSET));
+	fprintf(f, "== Private Data ==\n");
+	fprintf(f, "  Config: { ring_size: %u }\n", cmd_q->qcfg.nb_desc);
+	fprintf(f, "  Ring virt: %p\tphys: %#" PRIx64 "\n",
+			(void *)cmd_q->qbase_desc,
+			(uint64_t)cmd_q->qbase_phys_addr);
+	fprintf(f, "  Next write: %u\n", cmd_q->next_write);
+	fprintf(f, "  Next read: %u\n", cmd_q->next_read);
+	fprintf(f, "  current queue depth: %u\n", cmd_q->ring_buff_count);
+	fprintf(f, "  }\n");
+	fprintf(f, "  Key Stats { submitted: %" PRIu64 ", comp: %" PRIu64 ", failed: %" PRIu64 " }\n",
+		cmd_q->stats.submitted,
+		cmd_q->stats.completed,
+		cmd_q->stats.errors);
+	return 0;
+}
+/* Retrieve the generic stats of a DMA device. */
+static int
+ae4dma_stats_get(const struct rte_dma_dev *dev, uint16_t vchan __rte_unused,
+		struct rte_dma_stats *rte_stats, uint32_t size)
+{
+	const struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+	const struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+	const struct rte_dma_stats *stats = &cmd_q->stats;
+
+	if (size < sizeof(*rte_stats))
+		return -EINVAL;
+	if (rte_stats == NULL)
+		return -EINVAL;
+
+	*rte_stats = *stats;
+	return 0;
+}
+
+/* Reset the generic stat counters for the DMA device. */
+static int
+ae4dma_stats_reset(struct rte_dma_dev *dev, uint16_t vchan __rte_unused)
+{
+	struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+
+	memset(&cmd_q->stats, 0, sizeof(cmd_q->stats));
+	return 0;
+}
+
+/*
+ * Report channel state to the dmadev framework.
+ *
+ *   RTE_DMA_VCHAN_HALTED_ERROR - HW queue is disabled (never started, or
+ *                                stopped via dev_stop()).
+ *   RTE_DMA_VCHAN_IDLE         - HW has caught up: read_idx == write_idx,
+ *                                no descriptors in flight.
+ *   RTE_DMA_VCHAN_ACTIVE       - HW still has descriptors to process.
+ */
+static int
+ae4dma_vchan_status(const struct rte_dma_dev *dev, uint16_t vchan __rte_unused,
+		enum rte_dma_vchan_status *status)
+{
+	const struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+	const struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+	uint32_t ctrl, hw_read, hw_write;
+
+	if (cmd_q->hwq_regs == NULL) {
+		*status = RTE_DMA_VCHAN_HALTED_ERROR;
+		return 0;
+	}
+
+	ctrl = AE4DMA_READ_REG(&cmd_q->hwq_regs->control_reg.control_raw);
+	if ((ctrl & AE4DMA_CMD_QUEUE_ENABLE) == 0) {
+		*status = RTE_DMA_VCHAN_HALTED_ERROR;
+		return 0;
+	}
+
+	hw_read  = AE4DMA_READ_REG(&cmd_q->hwq_regs->read_idx);
+	hw_write = AE4DMA_READ_REG(&cmd_q->hwq_regs->write_idx);
+
+	*status = (hw_read == hw_write) ? RTE_DMA_VCHAN_IDLE
+					: RTE_DMA_VCHAN_ACTIVE;
+	return 0;
+}
+
 static int
 ae4dma_add_queue(struct ae4dma_dmadev *dev, uint8_t qn, const char *pci_name)
 {
@@ -114,6 +323,19 @@ ae4dma_channel_dev_name(char *out, size_t outlen, const char *pci_name,
 static int
 ae4dma_dmadev_create(const char *name, struct rte_pci_device *dev, uint8_t qn)
 {
+	static const struct rte_dma_dev_ops ae4dma_dmadev_ops = {
+		.dev_close = ae4dma_dev_close,
+		.dev_configure = ae4dma_dev_configure,
+		.dev_dump = ae4dma_dev_dump,
+		.dev_info_get = ae4dma_dev_info_get,
+		.dev_start = ae4dma_dev_start,
+		.dev_stop = ae4dma_dev_stop,
+		.stats_get = ae4dma_stats_get,
+		.stats_reset = ae4dma_stats_reset,
+		.vchan_status = ae4dma_vchan_status,
+		.vchan_setup = ae4dma_vchan_setup,
+	};
+
 	struct rte_dma_dev *dmadev = NULL;
 	struct ae4dma_dmadev *ae4dma = NULL;
 	char hwq_dev_name[RTE_DEV_NAME_MAX_LEN];
@@ -133,6 +355,7 @@ ae4dma_dmadev_create(const char *name, struct rte_pci_device *dev, uint8_t qn)
 	}
 	dmadev->device = &dev->device;
 	dmadev->fp_obj->dev_private = dmadev->data->dev_private;
+	dmadev->dev_ops = &ae4dma_dmadev_ops;
 
 	ae4dma = dmadev->data->dev_private;
 	ae4dma->dmadev = dmadev;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 3/3] dma/ae4dma: add data path operations
  2026-05-25 18:42 ` [PATCH v2 0/3] " Raghavendra Ningoji
  2026-05-25 18:42   ` [PATCH v2 1/3] dma/ae4dma: introduce " Raghavendra Ningoji
  2026-05-25 18:42   ` [PATCH v2 2/3] dma/ae4dma: add control path operations Raghavendra Ningoji
@ 2026-05-25 18:42   ` Raghavendra Ningoji
  2026-06-22 12:25   ` [PATCH v2 0/3] dma/ae4dma: add AMD AE4DMA DMA PMD David Marchand
  2026-06-25 18:47   ` [PATCH v3 " Raghavendra Ningoji
  4 siblings, 0 replies; 24+ messages in thread
From: Raghavendra Ningoji @ 2026-05-25 18:42 UTC (permalink / raw)
  To: dev
  Cc: Thomas Monjalon, Bhagyada Modali, Robin Jarry, Selwin.Sebastian,
	david.marchand, Raghavendra Ningoji

Implement the dmadev fast path for the AMD AE4DMA PMD.

This commit adds:
 - copy enqueue (rte_dma_copy): write an AE4DMA descriptor for a
   memory-to-memory transfer; on RTE_DMA_OP_FLAG_SUBMIT the doorbell
   is rung immediately.
 - submit (rte_dma_submit): advance the per-queue write_idx
   register to expose pending descriptors to the hardware.
 - completion (rte_dma_completed / rte_dma_completed_status):
   completion is detected via the hardware's per-queue read_idx
   register, which the engine advances as it processes descriptors.
   The descriptor status / err_code bytes are read only to classify
   each drained slot as success or failure, and HW error codes are
   translated to the dmadev RTE_DMA_STATUS_* enumeration.
 - burst capacity (rte_dma_burst_capacity): report the number of
   free descriptor slots, taking into account the one slot reserved
   to distinguish full from empty on the power-of-two ring.

The fast path entry points are wired through fp_obj in
ae4dma_dmadev_create(). The fill capability is not advertised;
fp_obj->fill is left zero-initialised.

Signed-off-by: Raghavendra Ningoji <raghavendra.ningoji@amd.com>
---
 doc/guides/dmadevs/ae4dma.rst      |  22 +++
 drivers/dma/ae4dma/ae4dma_dmadev.c | 288 +++++++++++++++++++++++++++++
 2 files changed, 310 insertions(+)

diff --git a/doc/guides/dmadevs/ae4dma.rst b/doc/guides/dmadevs/ae4dma.rst
index a85c1d92ca..37a2096ccf 100644
--- a/doc/guides/dmadevs/ae4dma.rst
+++ b/doc/guides/dmadevs/ae4dma.rst
@@ -51,3 +51,25 @@ On probe the PMD performs the following steps for each PCI function:
   IOVA-contiguous memory, programs the queue base address and ring
   depth into the per-queue registers, and enables the queue.
 * Interrupts are masked; completion is polled by the application.
+
+Usage
+-----
+
+Once a dmadev has been started, copies are submitted with
+``rte_dma_copy()`` and completions are reaped with ``rte_dma_completed()``
+or ``rte_dma_completed_status()``. See the
+:ref:`Enqueue / Dequeue API <dmadev_enqueue_dequeue>` section of the
+dmadev library documentation for details.
+
+Limitations
+-----------
+
+* Only memory-to-memory copies are supported. Fill, scatter-gather and
+  any other operation types are not advertised in
+  ``rte_dma_info::dev_capa``.
+* The maximum number of descriptors per virtual channel is fixed by
+  hardware at 32. The PMD rounds the requested ring size up to a
+  power of two and clamps it to 32.
+* Only a single virtual channel per dmadev is supported; use the 16
+  per-PCI-function dmadevs to obtain channel-level parallelism.
+* Interrupt-driven completion is not supported.
diff --git a/drivers/dma/ae4dma/ae4dma_dmadev.c b/drivers/dma/ae4dma/ae4dma_dmadev.c
index dfda723c13..0f223fc40c 100644
--- a/drivers/dma/ae4dma/ae4dma_dmadev.c
+++ b/drivers/dma/ae4dma/ae4dma_dmadev.c
@@ -167,6 +167,73 @@ ae4dma_dev_close(struct rte_dma_dev *dev)
 	cmd_q->qbase_phys_addr = 0;
 	return 0;
 }
+
+/* trigger h/w to process enqued desc:doorbell - by next_write */
+static inline void
+__submit(struct ae4dma_dmadev *ae4dma)
+{
+	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+	uint16_t write_idx = cmd_q->next_write;
+	uint16_t nb = cmd_q->qcfg.nb_desc;
+
+	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->write_idx, write_idx);
+	if (nb != 0)
+		cmd_q->stats.submitted += (uint16_t)((cmd_q->next_write - cmd_q->last_write +
+				nb) % nb);
+	cmd_q->last_write = cmd_q->next_write;
+}
+
+static int
+ae4dma_submit(void *dev_private, uint16_t vchan __rte_unused)
+{
+	struct ae4dma_dmadev *ae4dma = dev_private;
+
+	__submit(ae4dma);
+	return 0;
+}
+
+/* Write descriptor for enqueue (copy only). */
+static inline int
+__write_desc_copy(void *dev_private, rte_iova_t src, rte_iova_t dst,
+		uint32_t len, uint64_t flags)
+{
+	struct ae4dma_dmadev *ae4dma = dev_private;
+	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+	struct ae4dma_desc *dma_desc;
+	uint16_t ret;
+	uint16_t nb = cmd_q->qcfg.nb_desc;
+	uint16_t write = cmd_q->next_write;
+
+	if (nb == 0)
+		return -EINVAL;
+
+	/* Reserve one slot to distinguish full from empty (power-of-two ring). */
+	if ((uint32_t)cmd_q->ring_buff_count >= (uint32_t)(nb - 1))
+		return -ENOSPC;
+
+	dma_desc = &cmd_q->qbase_desc[write];
+	memset(dma_desc, 0, sizeof(*dma_desc));
+	dma_desc->length = len;
+	dma_desc->src_hi = upper_32_bits(src);
+	dma_desc->src_lo = lower_32_bits(src);
+	dma_desc->dst_hi = upper_32_bits(dst);
+	dma_desc->dst_lo = lower_32_bits(dst);
+	cmd_q->ring_buff_count++;
+	cmd_q->next_write = (uint16_t)((write + 1) % nb);
+	ret = write;
+	if (flags & RTE_DMA_OP_FLAG_SUBMIT)
+		__submit(ae4dma);
+	return ret;
+}
+
+/* Enqueue a copy operation onto the ae4dma device. */
+static int
+ae4dma_enqueue_copy(void *dev_private, uint16_t vchan __rte_unused,
+		rte_iova_t src, rte_iova_t dst, uint32_t length, uint64_t flags)
+{
+	return __write_desc_copy(dev_private, src, dst, length, flags);
+}
+
 /* Dump DMA device info. */
 static int
 ae4dma_dev_dump(const struct rte_dma_dev *dev, FILE *f)
@@ -197,6 +264,220 @@ ae4dma_dev_dump(const struct rte_dma_dev *dev, FILE *f)
 		cmd_q->stats.errors);
 	return 0;
 }
+
+/* Translates AE4DMA ChanERRs to DMA error codes. */
+static inline enum rte_dma_status_code
+__translate_status_ae4dma_to_dma(enum ae4dma_dma_err status)
+{
+	AE4DMA_PMD_DEBUG("ae4dma desc status = %d", status);
+
+	switch (status) {
+	case AE4DMA_DMA_ERR_NO_ERR:
+		return RTE_DMA_STATUS_SUCCESSFUL;
+	case AE4DMA_DMA_ERR_INV_LEN:
+		return RTE_DMA_STATUS_INVALID_LENGTH;
+	case AE4DMA_DMA_ERR_INV_SRC:
+		return RTE_DMA_STATUS_INVALID_SRC_ADDR;
+	case AE4DMA_DMA_ERR_INV_DST:
+		return RTE_DMA_STATUS_INVALID_DST_ADDR;
+	case AE4DMA_DMA_ERR_INV_ALIGN:
+		/* Name matches DPDK public enum spelling. */
+		return RTE_DMA_STATUS_DATA_POISION;
+	case AE4DMA_DMA_ERR_INV_HEADER:
+	case AE4DMA_DMA_ERR_INV_STATUS:
+		return RTE_DMA_STATUS_ERROR_UNKNOWN;
+	default:
+		return RTE_DMA_STATUS_ERROR_UNKNOWN;
+	}
+}
+
+/*
+ * Scan HW queue for completed descriptors (non-blocking).
+ *
+ * The AE4DMA engine signals completion by advancing the per-queue
+ * `read_idx` register; it does not (reliably) write a status value
+ * back into the descriptor. We therefore use the HW `read_idx`
+ * register as the source of truth and only inspect the descriptor's
+ * `dw1.err_code` byte to classify each completion as success or
+ * failure.
+ *
+ * @param cmd_q
+ *   The AE4DMA command queue.
+ * @param max_ops
+ *   Maximum descriptors to process this call.
+ * @param[out] failed_count
+ *   Number of completed descriptors that did not report success.
+ * @return
+ *   Number of descriptors completed (success + failure), <= max_ops.
+ */
+static inline uint16_t
+ae4dma_scan_hwq(struct ae4dma_cmd_queue *cmd_q, uint16_t max_ops,
+		uint16_t *failed_count)
+{
+	volatile struct ae4dma_desc *hw_desc;
+	uint16_t events_count = 0, fails = 0;
+	uint16_t tail;
+	uint16_t nb = cmd_q->qcfg.nb_desc;
+	uint16_t mask;
+	uint16_t hw_read_idx;
+	uint16_t in_flight;
+	uint16_t scan_cap;
+
+	if (nb == 0 || cmd_q->ring_buff_count == 0) {
+		*failed_count = 0;
+		return 0;
+	}
+	mask = nb - 1;
+
+	hw_read_idx = (uint16_t)(AE4DMA_READ_REG(&cmd_q->hwq_regs->read_idx) & mask);
+	tail = cmd_q->next_read;
+
+	/*
+	 * Descriptors completed since our last visit live in the
+	 * half-open ring range [tail, hw_read_idx). If HW hasn't
+	 * moved we have nothing to do.
+	 */
+	in_flight = (uint16_t)((hw_read_idx - tail) & mask);
+	if (in_flight == 0) {
+		*failed_count = 0;
+		return 0;
+	}
+
+	scan_cap = max_ops;
+	if (scan_cap > AE4DMA_DESCRIPTORS_PER_CMDQ)
+		scan_cap = AE4DMA_DESCRIPTORS_PER_CMDQ;
+	if (scan_cap > in_flight)
+		scan_cap = in_flight;
+	if (scan_cap > cmd_q->ring_buff_count)
+		scan_cap = (uint16_t)cmd_q->ring_buff_count;
+
+	while (events_count < scan_cap) {
+		uint8_t hw_status;
+		uint8_t hw_err;
+
+		hw_desc = &cmd_q->qbase_desc[tail];
+		hw_status = hw_desc->dw1.status;
+		hw_err = hw_desc->dw1.err_code;
+
+		/*
+		 * read_idx advancing is the definitive completion
+		 * signal. The per-descriptor status byte is informational
+		 * and may not yet be written when we observe it:
+		 *
+		 *   AE4DMA_DMA_DESC_ERROR (4)
+		 *     Hard failure - err_code names the precise cause.
+		 *   AE4DMA_DMA_DESC_COMPLETED (3) or 0
+		 *     Success.
+		 *   AE4DMA_DMA_DESC_VALIDATED (1) / _PROCESSED (2)
+		 *     Benign race: HW had not finished updating the
+		 *     status byte at the instant we read it. Since
+		 *     read_idx has moved past this slot, treat it as
+		 *     success unless err_code says otherwise.
+		 *
+		 * A non-zero err_code is treated as a failure regardless
+		 * of the observed status value.
+		 */
+		if (hw_status == AE4DMA_DMA_DESC_ERROR ||
+				hw_err != AE4DMA_DMA_ERR_NO_ERR) {
+			fails++;
+			AE4DMA_PMD_WARN("Desc failed: status=%u err=%u",
+					hw_status, hw_err);
+		}
+		cmd_q->status[events_count] = (enum ae4dma_dma_err)hw_err;
+		cmd_q->ring_buff_count--;
+		events_count++;
+		tail = (tail + 1) & mask;
+	}
+
+	cmd_q->stats.completed += events_count;
+	cmd_q->stats.errors += fails;
+	cmd_q->next_read = tail;
+	*failed_count = fails;
+	return events_count;
+}
+
+/* Returns successful operations count and sets error flag if any errors. */
+static uint16_t
+ae4dma_completed(void *dev_private, uint16_t vchan __rte_unused,
+		const uint16_t max_ops, uint16_t *last_idx, bool *has_error)
+{
+	struct ae4dma_dmadev *ae4dma = dev_private;
+	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+	uint16_t cpl_count, sl_count;
+	uint16_t err_count = 0;
+	uint16_t nb = cmd_q->qcfg.nb_desc;
+
+	*has_error = false;
+
+	cpl_count = ae4dma_scan_hwq(cmd_q, max_ops, &err_count);
+
+	if (cpl_count > max_ops)
+		cpl_count = max_ops;
+
+	if (cpl_count > 0 && last_idx != NULL)
+		*last_idx = (uint16_t)((cmd_q->next_read - 1 + nb) % nb);
+
+	sl_count = cpl_count - err_count;
+	if (err_count)
+		*has_error = true;
+
+	return sl_count;
+}
+
+static uint16_t
+ae4dma_completed_status(void *dev_private, uint16_t vchan __rte_unused,
+		uint16_t max_ops, uint16_t *last_idx,
+		enum rte_dma_status_code *status)
+{
+	struct ae4dma_dmadev *ae4dma = dev_private;
+	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+	uint16_t cpl_count;
+	uint16_t i;
+	uint16_t err_count = 0;
+	uint16_t nb = cmd_q->qcfg.nb_desc;
+
+	cpl_count = ae4dma_scan_hwq(cmd_q, max_ops, &err_count);
+
+	if (cpl_count > max_ops)
+		cpl_count = max_ops;
+
+	if (cpl_count > 0 && last_idx != NULL)
+		*last_idx = (uint16_t)((cmd_q->next_read - 1 + nb) % nb);
+
+	if (likely(err_count == 0)) {
+		for (i = 0; i < cpl_count; i++)
+			status[i] = RTE_DMA_STATUS_SUCCESSFUL;
+	} else {
+		for (i = 0; i < cpl_count; i++)
+			status[i] = __translate_status_ae4dma_to_dma(cmd_q->status[i]);
+	}
+
+	return cpl_count;
+}
+
+/* Get the remaining capacity of the ring. */
+static uint16_t
+ae4dma_burst_capacity(const void *dev_private, uint16_t vchan __rte_unused)
+{
+	const struct ae4dma_dmadev *ae4dma = dev_private;
+	const struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+	uint16_t nb = cmd_q->qcfg.nb_desc;
+	uint16_t mask;
+	uint16_t read_idx = cmd_q->next_read;
+	uint16_t write_idx = cmd_q->next_write;
+	uint16_t used;
+
+	if (nb < 2 || !rte_is_power_of_2(nb))
+		return 0;
+
+	mask = nb - 1;
+	used = (uint16_t)((write_idx - read_idx) & mask);
+	/* One slot reserved (same rule as enqueue). */
+	if (used >= nb - 1)
+		return 0;
+	return (uint16_t)(nb - 1 - used);
+}
+
 /* Retrieve the generic stats of a DMA device. */
 static int
 ae4dma_stats_get(const struct rte_dma_dev *dev, uint16_t vchan __rte_unused,
@@ -357,6 +638,13 @@ ae4dma_dmadev_create(const char *name, struct rte_pci_device *dev, uint8_t qn)
 	dmadev->fp_obj->dev_private = dmadev->data->dev_private;
 	dmadev->dev_ops = &ae4dma_dmadev_ops;
 
+	dmadev->fp_obj->burst_capacity = ae4dma_burst_capacity;
+	dmadev->fp_obj->completed = ae4dma_completed;
+	dmadev->fp_obj->completed_status = ae4dma_completed_status;
+	dmadev->fp_obj->copy = ae4dma_enqueue_copy;
+	dmadev->fp_obj->submit = ae4dma_submit;
+	/* fill capability not advertised: leave fp_obj->fill as zero-initialised. */
+
 	ae4dma = dmadev->data->dev_private;
 	ae4dma->dmadev = dmadev;
 	ae4dma->pci = dev;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/3] dma/ae4dma: introduce AMD AE4DMA DMA PMD
  2026-05-25 18:42   ` [PATCH v2 1/3] dma/ae4dma: introduce " Raghavendra Ningoji
@ 2026-06-22 12:06     ` David Marchand
  2026-06-22 12:16       ` Bruce Richardson
                         ` (2 more replies)
  2026-06-22 12:26     ` David Marchand
  1 sibling, 3 replies; 24+ messages in thread
From: David Marchand @ 2026-06-22 12:06 UTC (permalink / raw)
  To: Raghavendra Ningoji
  Cc: dev, Thomas Monjalon, Bhagyada Modali, Robin Jarry,
	Selwin.Sebastian, Chengwen Feng

On Mon, 25 May 2026 at 20:43, Raghavendra Ningoji
<raghavendra.ningoji@amd.com> wrote:
>
> Add the skeleton of a new dmadev poll-mode driver for the AMD AE4DMA
> hardware DMA engine, providing only PCI probe/remove and per-queue
> hardware initialisation. An AE4DMA engine exposes 16 hardware command
> queues, each with a 32-entry descriptor ring; the PMD maps each
> hardware channel to its own dmadev with a single virtual channel,
> so a PCI function appears as 16 dmadevs named "<pci-bdf>-ch0" ..
> "<pci-bdf>-ch15".

I am not familiar with DMA drivers, I am not sure it is something acceptable.
@Chengwen for info.


>
> This patch only registers the PCI driver, allocates the dmadev
> objects, reserves the per-queue descriptor rings and programs the
> hardware queue base addresses. Control and data path operations are
> added in subsequent patches.
>
> Signed-off-by: Raghavendra Ningoji <raghavendra.ningoji@amd.com>

Here is a superficial review.

Many places are fishy when it comes to integer/pointer casts: I only
raised a few comments on this topic.


[snip]

> diff --git a/drivers/dma/ae4dma/ae4dma_dmadev.c b/drivers/dma/ae4dma/ae4dma_dmadev.c
> new file mode 100644
> index 0000000000..76de2cde45
> --- /dev/null
> +++ b/drivers/dma/ae4dma/ae4dma_dmadev.c
> @@ -0,0 +1,227 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2026 Advanced Micro Devices, Inc. All rights reserved.
> + */
> +
> +#include <errno.h>
> +#include <inttypes.h>
> +#include <stdio.h>
> +#include <string.h>
> +
> +#include <rte_bus_pci.h>
> +#include <bus_pci_driver.h>
> +#include <rte_dmadev_pmd.h>
> +#include <rte_malloc.h>
> +
> +#include "ae4dma_internal.h"
> +
> +/*
> + * One dmadev per AE4DMA hardware channel; each dmadev has exactly one
> + * virtual channel. The HW's per-queue register block must be densely
> + * packed right after the engine-common config register at BAR0+0; the
> + * build-time check below catches an accidental layout change.
> + */
> +static_assert(sizeof(struct ae4dma_hwq_regs) == 32,
> +               "ae4dma_hwq_regs stride changed; per-queue offset math will break");
> +
> +RTE_LOG_REGISTER_DEFAULT(ae4dma_pmd_logtype, INFO);
> +
> +#define AE4DMA_PMD_NAME dmadev_ae4dma
> +
> +static const struct rte_memzone *
> +ae4dma_queue_dma_zone_reserve(const char *queue_name,
> +               uint32_t queue_size, int socket_id)
> +{
> +       const struct rte_memzone *mz;
> +
> +       mz = rte_memzone_lookup(queue_name);
> +       if (mz != NULL) {
> +               if (((size_t)queue_size <= mz->len) &&
> +                               ((socket_id == SOCKET_ID_ANY) ||
> +                                (socket_id == mz->socket_id))) {
> +                       AE4DMA_PMD_INFO("reuse memzone already "
> +                                       "allocated for %s", queue_name);
> +                       return mz;
> +               }
> +               AE4DMA_PMD_ERR("Incompatible memzone already "
> +                               "allocated %s, size %u, socket %d. "
> +                               "Requested size %u, socket %u",
> +                               queue_name, (uint32_t)mz->len,
> +                               mz->socket_id, queue_size, socket_id);
> +               return NULL;
> +       }
> +       return rte_memzone_reserve_aligned(queue_name, queue_size,
> +                       socket_id, RTE_MEMZONE_IOVA_CONTIG, queue_size);
> +}
> +
> +static int
> +ae4dma_add_queue(struct ae4dma_dmadev *dev, uint8_t qn, const char *pci_name)
> +{
> +       uint32_t dma_addr_lo, dma_addr_hi;
> +       struct ae4dma_cmd_queue *cmd_q;
> +       const struct rte_memzone *q_mz;
> +
> +       dev->io_regs = dev->pci->mem_resource[AE4DMA_PCIE_BAR].addr;
> +
> +       cmd_q = &dev->cmd_q;
> +       cmd_q->id = qn;
> +       cmd_q->qidx = 0;
> +       cmd_q->qsize = AE4DMA_QUEUE_SIZE(AE4DMA_QUEUE_DESC_SIZE);
> +       cmd_q->hwq_regs = (volatile struct ae4dma_hwq_regs *)dev->io_regs + (qn + 1);
> +
> +       /*
> +        * Memzone name must be globally unique. Embed PCI BDF so multiple
> +        * PCI functions probed concurrently don't collide.
> +        */
> +       snprintf(cmd_q->memz_name, sizeof(cmd_q->memz_name),
> +                       "ae4dma_%s_q%u", pci_name, (unsigned int)qn);
> +
> +       q_mz = ae4dma_queue_dma_zone_reserve(cmd_q->memz_name,
> +                       cmd_q->qsize, rte_socket_id());
> +       if (q_mz == NULL) {
> +               AE4DMA_PMD_ERR("memzone reserve failed for %s", cmd_q->memz_name);
> +               return -ENOMEM;
> +       }

I see no tracking of q_mz, so I suspect this memzone is leaked on
device probing failure, and/or unplugging.


> +
> +       cmd_q->qbase_addr = (void *)q_mz->addr;
> +       cmd_q->qbase_desc = (struct ae4dma_desc *)q_mz->addr;
> +       cmd_q->qbase_phys_addr = q_mz->iova;
> +
> +       AE4DMA_WRITE_REG(&cmd_q->hwq_regs->max_idx, AE4DMA_DESCRIPTORS_PER_CMDQ);
> +       AE4DMA_WRITE_REG(&cmd_q->hwq_regs->control_reg.control_raw,
> +                       AE4DMA_CMD_QUEUE_ENABLE);
> +       AE4DMA_WRITE_REG(&cmd_q->hwq_regs->intr_status_reg.intr_status_raw,
> +                       AE4DMA_DISABLE_INTR);
> +       cmd_q->next_write = (uint16_t)AE4DMA_READ_REG(&cmd_q->hwq_regs->write_idx);
> +       cmd_q->next_read = (uint16_t)AE4DMA_READ_REG(&cmd_q->hwq_regs->read_idx);

Strange that you need to cast.


> +       cmd_q->ring_buff_count = 0;
> +
> +       dma_addr_lo = low32_value(cmd_q->qbase_phys_addr);
> +       AE4DMA_WRITE_REG(&cmd_q->hwq_regs->qbase_lo, dma_addr_lo);
> +       dma_addr_hi = high32_value(cmd_q->qbase_phys_addr);
> +       AE4DMA_WRITE_REG(&cmd_q->hwq_regs->qbase_hi, dma_addr_hi);
> +
> +       return 0;
> +}
> +
> +static void
> +ae4dma_channel_dev_name(char *out, size_t outlen, const char *pci_name,
> +               unsigned int ch)
> +{
> +       snprintf(out, outlen, "%s-ch%u", pci_name, ch);
> +}
> +
> +/* Create a dmadev(dpdk DMA device) */

This is a general comment for the patch: let's avoid Lapalissade /
trivial comments that adds nothing.
The function name is self explanatory.


> +static int
> +ae4dma_dmadev_create(const char *name, struct rte_pci_device *dev, uint8_t qn)
> +{
> +       struct rte_dma_dev *dmadev = NULL;
> +       struct ae4dma_dmadev *ae4dma = NULL;

Those variables do not need any explicit setting to NULL, since there
are set at their first use.


> +       char hwq_dev_name[RTE_DEV_NAME_MAX_LEN];
> +
> +       if (!name) {

Such check will only confuse AI tools or other static code analysers,
as those tools will assume the function *may* be called with a NULL
pointer.
This is a static helper called internally from a single location,
remove the check.


> +               AE4DMA_PMD_ERR("Invalid name of the device!");
> +               return -EINVAL;
> +       }
> +       memset(hwq_dev_name, 0, sizeof(hwq_dev_name));
> +       ae4dma_channel_dev_name(hwq_dev_name, sizeof(hwq_dev_name), name, qn);
> +
> +       dmadev = rte_dma_pmd_allocate(hwq_dev_name, dev->device.numa_node,
> +                       sizeof(struct ae4dma_dmadev));
> +       if (dmadev == NULL) {
> +               AE4DMA_PMD_ERR("Unable to allocate dma device");
> +               return -ENOMEM;
> +       }
> +       dmadev->device = &dev->device;
> +       dmadev->fp_obj->dev_private = dmadev->data->dev_private;
> +
> +       ae4dma = dmadev->data->dev_private;
> +       ae4dma->dmadev = dmadev;

Such a back reference looks odd to me (how could you end with only a
reference to the priv pointer, which is in general deduced from the
dmadev pointer?).
And, in the end, this field is never used in the series.

Please remove.


> +       ae4dma->pci = dev;

dev is already a rte_pci_device pointer, and you only need to pass it
to ae4dma_add_queue as an argument.
By doing this change, there is no user of this field in the series,
please remove.


One note on this topic, you have a reference to the rte_device in the
dmadev object.
On the principle, the pci device can be resolved via
RTE_BUS_DEVICE(dmadev->device, struct rte_pci_device), or
RTE_BUS_DEVICE(dmadev->device, *pci_dev).
See other drivers for examples.


> +
> +       if (ae4dma_add_queue(ae4dma, qn, name) != 0)
> +               goto init_error;
> +       return 0;
> +
> +init_error:
> +       AE4DMA_PMD_ERR("driver %s(): failed", __func__);

__func__ is already part of AE4DMA_PMD_LOG.


> +       rte_dma_pmd_release(hwq_dev_name);
> +       return -ENOMEM;
> +}
> +
> +/* Probe DMA device. */
> +static int
> +ae4dma_dmadev_probe(struct rte_pci_driver *drv, struct rte_pci_device *dev)
> +{
> +       char name[32];
> +       char chname[RTE_DEV_NAME_MAX_LEN];
> +       void *mmio_base;
> +       uint32_t q_per_eng;
> +       int ret = 0;
> +       uint8_t i;
> +
> +       rte_pci_device_name(&dev->addr, name, sizeof(name));
> +       AE4DMA_PMD_INFO("Init %s on NUMA node %d", name, dev->device.numa_node);
> +       dev->device.driver = &drv->driver;

Setting the driver pointer in the device object is not the driver
responsibility anymore with commit f282771a04ef ("bus: factorize
driver reference").
EAL will set this field on probe() success.


> +
> +       mmio_base = dev->mem_resource[AE4DMA_PCIE_BAR].addr;
> +       if (mmio_base == NULL) {
> +               AE4DMA_PMD_ERR("%s: BAR%d not mapped", name, AE4DMA_PCIE_BAR);
> +               return -ENODEV;
> +       }
> +
> +       /* Program the per-engine HW queue count once. */
> +       AE4DMA_WRITE_REG_OFFSET(mmio_base, AE4DMA_COMMON_CONFIG_OFFSET,
> +                       AE4DMA_MAX_HW_QUEUES);
> +       q_per_eng = AE4DMA_READ_REG_OFFSET(mmio_base, AE4DMA_COMMON_CONFIG_OFFSET);
> +       AE4DMA_PMD_INFO("%s: AE4DMA queues per engine = %u", name, q_per_eng);
> +
> +       for (i = 0; i < AE4DMA_MAX_HW_QUEUES; i++) {
> +               ret = ae4dma_dmadev_create(name, dev, i);
> +               if (ret != 0) {
> +                       AE4DMA_PMD_ERR("%s create dmadev %u failed!", name, i);
> +                       while (i > 0) {
> +                               i--;
> +                               ae4dma_channel_dev_name(chname, sizeof(chname), name, i);
> +                               rte_dma_pmd_release(chname);
> +                       }
> +                       break;
> +               }
> +       }
> +       return ret;
> +}
> +
> +/* Remove DMA device. */
> +static int
> +ae4dma_dmadev_remove(struct rte_pci_device *dev)
> +{
> +       char name[32];
> +       char chname[RTE_DEV_NAME_MAX_LEN];
> +       unsigned int i;
> +
> +       rte_pci_device_name(&dev->addr, name, sizeof(name));
> +
> +       AE4DMA_PMD_INFO("Closing %s on NUMA node %d",
> +                       name, dev->device.numa_node);
> +
> +       for (i = 0; i < AE4DMA_MAX_HW_QUEUES; i++) {
> +               ae4dma_channel_dev_name(chname, sizeof(chname), name, i);
> +               rte_dma_pmd_release(chname);
> +       }
> +       return 0;
> +}
> +
> +static const struct rte_pci_id pci_id_ae4dma_map[] = {
> +       { RTE_PCI_DEVICE(AMD_VENDOR_ID, AE4DMA_DEVICE_ID) },
> +       { .vendor_id = 0, /* sentinel */ },
> +};
> +
> +static struct rte_pci_driver ae4dma_pmd_drv = {
> +       .id_table = pci_id_ae4dma_map,
> +       .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
> +       .probe = ae4dma_dmadev_probe,
> +       .remove = ae4dma_dmadev_remove,
> +};
> +
> +RTE_PMD_REGISTER_PCI(AE4DMA_PMD_NAME, ae4dma_pmd_drv);
> +RTE_PMD_REGISTER_PCI_TABLE(AE4DMA_PMD_NAME, pci_id_ae4dma_map);
> +RTE_PMD_REGISTER_KMOD_DEP(AE4DMA_PMD_NAME, "* igb_uio | uio_pci_generic | vfio-pci");
> diff --git a/drivers/dma/ae4dma/ae4dma_hw_defs.h b/drivers/dma/ae4dma/ae4dma_hw_defs.h
> new file mode 100644
> index 0000000000..62b6a1b30b
> --- /dev/null
> +++ b/drivers/dma/ae4dma/ae4dma_hw_defs.h
> @@ -0,0 +1,160 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2026 Advanced Micro Devices, Inc. All rights reserved.
> + */
> +
> +#ifndef __AE4DMA_HW_DEFS_H__
> +#define __AE4DMA_HW_DEFS_H__
> +

Is this header autosufficient ?
I see references to uint32_t below, so this header probably depends on stdint.h.


> +#include <rte_bus_pci.h>
> +#include <rte_byteorder.h>
> +#include <rte_io.h>
> +#include <rte_pci.h>
> +#include <rte_memzone.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif

Do we really need C++ guards?

> +
> +#define AE4DMA_BIT(nr)                 (1UL << (nr))
> +
> +/* ae4dma device details */
> +#define AMD_VENDOR_ID  0x1022
> +#define AE4DMA_DEVICE_ID       0x149b
> +#define AE4DMA_PCIE_BAR 0
> +
> +/*
> + * An AE4DMA engine has 16 DMA queues. Each queue supports 32 descriptors.
> + */
> +#define AE4DMA_MAX_HW_QUEUES        16
> +#define AE4DMA_QUEUE_START_INDEX    0
> +#define AE4DMA_CMD_QUEUE_ENABLE                0x1
> +#define AE4DMA_CMD_QUEUE_DISABLE       0x0
> +
> +/* Common to all queues */
> +#define AE4DMA_COMMON_CONFIG_OFFSET 0x00
> +
> +#define AE4DMA_DISABLE_INTR 0x01
> +
> +/* Descriptor status */
> +enum ae4dma_dma_status {
> +       AE4DMA_DMA_DESC_SUBMITTED = 0,
> +       AE4DMA_DMA_DESC_VALIDATED = 1,
> +       AE4DMA_DMA_DESC_PROCESSED = 2,
> +       AE4DMA_DMA_DESC_COMPLETED = 3,
> +       AE4DMA_DMA_DESC_ERROR = 4,
> +};
> +
> +/* Descriptor error-code */
> +enum ae4dma_dma_err {
> +       AE4DMA_DMA_ERR_NO_ERR = 0,
> +       AE4DMA_DMA_ERR_INV_HEADER = 1,
> +       AE4DMA_DMA_ERR_INV_STATUS = 2,
> +       AE4DMA_DMA_ERR_INV_LEN = 3,
> +       AE4DMA_DMA_ERR_INV_SRC = 4,
> +       AE4DMA_DMA_ERR_INV_DST = 5,
> +       AE4DMA_DMA_ERR_INV_ALIGN = 6,
> +       AE4DMA_DMA_ERR_UNKNOWN = 7,
> +};
> +
> +/* HW Queue status */
> +enum ae4dma_hwqueue_status {
> +       AE4DMA_HWQUEUE_EMPTY = 0,
> +       AE4DMA_HWQUEUE_FULL = 1,
> +       AE4DMA_HWQUEUE_NOT_EMPTY = 4

For consistency with other enums, add a comma.


> +};
> +/*
> + * descriptor for AE4DMA commands
> + * 8 32-bit words:
> + * word 0: source memory type; destination memory type ; control bits
> + * word 1: desc_id; error code; status
> + * word 2: length
> + * word 3: reserved
> + * word 4: upper 32 bits of source pointer
> + * word 5: low 32 bits of source pointer
> + * word 6: upper 32 bits of destination pointer
> + * word 7: low 32 bits of destination pointer
> + */
> +
> +/* AE4DMA Descriptor - DWORD0 - Controls bits: Reserved for future use */
> +#define AE4DMA_DWORD0_STOP_ON_COMPLETION       AE4DMA_BIT(0)
> +#define AE4DMA_DWORD0_INTERRUPT_ON_COMPLETION  AE4DMA_BIT(1)
> +#define AE4DMA_DWORD0_START_OF_MESSAGE         AE4DMA_BIT(3)
> +#define AE4DMA_DWORD0_END_OF_MESSAGE           AE4DMA_BIT(4)
> +#define AE4DMA_DWORD0_DESTINATION_MEMORY_TYPE  RTE_GENMASK64(5, 4)
> +#define AE4DMA_DWORD0_SOURCE_MEMEORY_TYPE      RTE_GENMASK64(7, 6)
> +
> +#define AE4DMA_DWORD0_DESTINATION_MEMORY_TYPE_MEMORY    (0x0)
> +#define AE4DMA_DWORD0_DESTINATION_MEMORY_TYPE_IOMEMORY  (1<<4)
> +#define AE4DMA_DWORD0_SOURCE_MEMEORY_TYPE_MEMORY    (0x0)
> +#define AE4DMA_DWORD0_SOURCE_MEMEORY_TYPE_IOMEMORY  (1<<6)
> +
> +struct ae4dma_desc_dword0 {
> +       uint8_t byte0;
> +       uint8_t byte1;
> +       uint16_t timestamp;
> +};
> +
> +struct ae4dma_desc_dword1 {
> +       uint8_t status;
> +       uint8_t err_code;
> +       uint16_t desc_id;
> +};
> +
> +struct ae4dma_desc {
> +       struct ae4dma_desc_dword0 dw0;
> +       struct ae4dma_desc_dword1 dw1;
> +       uint32_t length;
> +       uint32_t reserved;
> +       uint32_t src_lo;
> +       uint32_t src_hi;
> +       uint32_t dst_lo;
> +       uint32_t dst_hi;
> +};
> +
> +/*
> + * Registers for each queue :4 bytes length
> + * Effective address : offset + reg
> + */
> +struct ae4dma_hwq_regs {
> +       union {
> +               uint32_t control_raw;
> +               struct {
> +                       uint32_t queue_enable: 1;
> +                       uint32_t reserved_internal: 31;
> +               } control;
> +       } control_reg;
> +
> +       union {
> +               uint32_t status_raw;
> +               struct {
> +                       uint32_t reserved0: 1;
> +                       /* 0–empty, 1–full, 2–stopped, 3–error , 4–Not Empty */
> +                       uint32_t queue_status: 2;
> +                       uint32_t reserved1: 21;
> +                       uint32_t interrupt_type: 4;
> +                       uint32_t reserved2: 4;
> +               } status;
> +       } status_reg;
> +
> +       uint32_t max_idx;
> +       uint32_t read_idx;
> +       uint32_t write_idx;
> +
> +       union {
> +               uint32_t intr_status_raw;
> +               struct {
> +                       uint32_t intr_status: 1;
> +                       uint32_t reserved: 31;
> +               } intr_status;
> +       } intr_status_reg;
> +
> +       uint32_t qbase_lo;
> +       uint32_t qbase_hi;
> +
> +};
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* AE4DMA_HW_DEFS_H */
> diff --git a/drivers/dma/ae4dma/ae4dma_internal.h b/drivers/dma/ae4dma/ae4dma_internal.h
> new file mode 100644
> index 0000000000..9892d6697f
> --- /dev/null
> +++ b/drivers/dma/ae4dma/ae4dma_internal.h
> @@ -0,0 +1,118 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2026 Advanced Micro Devices, Inc. All rights reserved.
> + */
> +
> +#ifndef _AE4DMA_INTERNAL_H_
> +#define _AE4DMA_INTERNAL_H_
> +
> +#include <stdint.h>
> +
> +#include "ae4dma_hw_defs.h"
> +
> +/**

This is an internal header, we don't need doxygen style comments,
simple comments are enough.


> + * upper_32_bits - return bits 32-63 of a number
> + * @n: the number we're accessing
> + */
> +#define upper_32_bits(n) ((uint32_t)(((n) >> 16) >> 16))
> +
> +/**
> + * lower_32_bits - return bits 0-31 of a number
> + * @n: the number we're accessing
> + */
> +#define lower_32_bits(n) ((uint32_t)((n) & 0xffffffff))
> +
> +/** Hardware ring depth (slots per queue); must be power of two. */
> +#define AE4DMA_DESCRIPTORS_PER_CMDQ    32
> +#define AE4DMA_QUEUE_DESC_SIZE         sizeof(struct ae4dma_desc)
> +#define AE4DMA_QUEUE_SIZE(n)           (AE4DMA_DESCRIPTORS_PER_CMDQ * (n))
> +
> +
> +/** AE4DMA registers Write/Read */
> +static inline void ae4dma_pci_reg_write(void *base, int offset,
> +               uint32_t value)
> +{
> +       volatile void *reg_addr = ((uint8_t *)base + offset);
> +
> +       rte_write32((rte_cpu_to_le_32(value)), reg_addr);
> +}
> +
> +static inline uint32_t ae4dma_pci_reg_read(void *base, int offset)
> +{
> +       volatile void *reg_addr = ((uint8_t *)base + offset);
> +
> +       return rte_le_to_cpu_32(rte_read32(reg_addr));
> +}
> +
> +#define AE4DMA_READ_REG_OFFSET(hw_addr, reg_offset) \
> +       ae4dma_pci_reg_read(hw_addr, reg_offset)
> +
> +#define AE4DMA_WRITE_REG_OFFSET(hw_addr, reg_offset, value) \
> +       ae4dma_pci_reg_write(hw_addr, reg_offset, value)
> +
> +
> +#define AE4DMA_READ_REG(hw_addr) \
> +       ae4dma_pci_reg_read((void *)(uintptr_t)(hw_addr), 0)
> +
> +#define AE4DMA_WRITE_REG(hw_addr, value) \
> +       ae4dma_pci_reg_write((void *)(uintptr_t)(hw_addr), 0, value)
> +
> +static inline uint32_t
> +low32_value(unsigned long addr)
> +{
> +       return ((uint64_t)addr) & 0xffffffffUL;
> +}
> +
> +static inline uint32_t
> +high32_value(unsigned long addr)
> +{
> +       return (uint32_t)(((uint64_t)addr) >> 32);
> +}
> +
> +/**
> + * A structure describing a AE4DMA command queue.
> + */
> +struct __rte_cache_aligned ae4dma_cmd_queue {
> +       char memz_name[RTE_MEMZONE_NAMESIZE];
> +       volatile struct ae4dma_hwq_regs *hwq_regs;
> +
> +       struct rte_dma_vchan_conf qcfg;
> +       struct rte_dma_stats stats;
> +       /* Queue address */
> +       struct ae4dma_desc *qbase_desc;
> +       void *qbase_addr;
> +       rte_iova_t qbase_phys_addr;
> +       enum ae4dma_dma_err status[AE4DMA_DESCRIPTORS_PER_CMDQ];
> +       /* Queue identifier */
> +       uint64_t id;    /**< queue id */
> +       uint64_t qidx;  /**< queue index */
> +       uint64_t qsize; /**< queue size */
> +       uint32_t ring_buff_count;
> +       unsigned short next_read;
> +       unsigned short next_write;
> +       unsigned short last_write; /* Used to compute submitted count. */
> +};
> +
> +/*
> + * One dmadev per AE4DMA hardware channel: probe creates AE4DMA_MAX_HW_QUEUES
> + * dmadevs per PCI function, each owning a single HW command queue.
> + */
> +struct ae4dma_dmadev {
> +       struct rte_dma_dev *dmadev;
> +       void *io_regs;
> +       struct ae4dma_cmd_queue cmd_q; /**< single HW queue owned by this dmadev */
> +       struct rte_pci_device *pci;    /**< owning PCI device (not owned) */
> +};
> +
> +
> +extern int ae4dma_pmd_logtype;
> +#define RTE_LOGTYPE_AE4DMA_PMD ae4dma_pmd_logtype
> +
> +#define AE4DMA_PMD_LOG(level, ...) \
> +       RTE_LOG_LINE_PREFIX(level, AE4DMA_PMD, "%s(): ", __func__, __VA_ARGS__)
> +
> +#define AE4DMA_PMD_DEBUG(...)  AE4DMA_PMD_LOG(DEBUG, __VA_ARGS__)
> +#define AE4DMA_PMD_INFO(...)   AE4DMA_PMD_LOG(INFO, __VA_ARGS__)
> +#define AE4DMA_PMD_ERR(...)    AE4DMA_PMD_LOG(ERR, __VA_ARGS__)
> +#define AE4DMA_PMD_WARN(...)   AE4DMA_PMD_LOG(WARNING, __VA_ARGS__)
> +
> +#endif /* _AE4DMA_INTERNAL_H_ */


-- 
David Marchand


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/3] dma/ae4dma: add control path operations
  2026-05-25 18:42   ` [PATCH v2 2/3] dma/ae4dma: add control path operations Raghavendra Ningoji
@ 2026-06-22 12:15     ` David Marchand
  2026-06-25 18:42       ` Raghavendra Ningoji
  0 siblings, 1 reply; 24+ messages in thread
From: David Marchand @ 2026-06-22 12:15 UTC (permalink / raw)
  To: Raghavendra Ningoji
  Cc: dev, Thomas Monjalon, Bhagyada Modali, Robin Jarry,
	Selwin.Sebastian

On Mon, 25 May 2026 at 20:43, Raghavendra Ningoji
<raghavendra.ningoji@amd.com> wrote:
>
> Implement the dmadev control path for the AMD AE4DMA PMD.
>
> This commit adds:
>  - dev_configure / vchan_setup: accept a single virtual channel per
>    dmadev and clamp the requested ring size to the hardware maximum
>    of 32 descriptors (rounded up to a power of two).
>  - dev_start / dev_stop / dev_close: program the per-queue control
>    register to enable/disable the hardware queue and release the
>    descriptor ring memzone on close.
>  - dev_info_get: advertise RTE_DMA_CAPA_MEM_TO_MEM and the fixed
>    ring depth.
>  - dev_dump: print the queue identifiers, ring layout and software
>    completion counters.
>  - stats_get / stats_reset: expose submitted / completed / errors
>    counters maintained by the driver.
>  - vchan_status: report IDLE / ACTIVE based on hardware read_idx vs
>    write_idx, and HALTED_ERROR when the queue is not enabled.
>
> The dmadev framework is wired through dev_ops in ae4dma_dmadev_create().
>
> Signed-off-by: Raghavendra Ningoji <raghavendra.ningoji@amd.com>
> ---
>  drivers/dma/ae4dma/ae4dma_dmadev.c | 223 +++++++++++++++++++++++++++++
>  1 file changed, 223 insertions(+)
>
> diff --git a/drivers/dma/ae4dma/ae4dma_dmadev.c b/drivers/dma/ae4dma/ae4dma_dmadev.c
> index 76de2cde45..dfda723c13 100644
> --- a/drivers/dma/ae4dma/ae4dma_dmadev.c
> +++ b/drivers/dma/ae4dma/ae4dma_dmadev.c
> @@ -53,6 +53,215 @@ ae4dma_queue_dma_zone_reserve(const char *queue_name,
>                         socket_id, RTE_MEMZONE_IOVA_CONTIG, queue_size);
>  }
>
> +/* Configure a device. */
> +static int
> +ae4dma_dev_configure(struct rte_dma_dev *dev __rte_unused,
> +               const struct rte_dma_conf *dev_conf,
> +               uint32_t conf_sz)
> +{
> +       if (sizeof(struct rte_dma_conf) != conf_sz)
> +               return -EINVAL;
> +
> +       if (dev_conf->nb_vchans != 1)
> +               return -EINVAL;
> +
> +       return 0;
> +}
> +
> +/* Setup a virtual channel for AE4DMA, only 1 vchan is supported per dmadev. */
> +static int
> +ae4dma_vchan_setup(struct rte_dma_dev *dev, uint16_t vchan __rte_unused,
> +               const struct rte_dma_vchan_conf *qconf, uint32_t qconf_sz)
> +{
> +       struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
> +       struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
> +       uint16_t max_desc = qconf->nb_desc;
> +
> +       if (sizeof(struct rte_dma_vchan_conf) != qconf_sz)
> +               return -EINVAL;
> +
> +       if (max_desc < 2)
> +               return -EINVAL;
> +
> +       if (!rte_is_power_of_2(max_desc))
> +               max_desc = rte_align32pow2(max_desc);
> +
> +       if (max_desc > AE4DMA_DESCRIPTORS_PER_CMDQ) {
> +               AE4DMA_PMD_DEBUG("DMA dev %u nb_desc clamped to %u",
> +                               dev->data->dev_id, AE4DMA_DESCRIPTORS_PER_CMDQ);
> +               max_desc = AE4DMA_DESCRIPTORS_PER_CMDQ;
> +       }
> +
> +       cmd_q->qcfg = *qconf;
> +       cmd_q->qcfg.nb_desc = max_desc;
> +
> +       /* Ensure all counters are reset, if reconfiguring/restarting device. */
> +       memset(&cmd_q->stats, 0, sizeof(cmd_q->stats));
> +       return 0;
> +}
> +
> +/* Start a configured device. */
> +static int
> +ae4dma_dev_start(struct rte_dma_dev *dev)
> +{
> +       struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
> +       struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
> +       uint16_t nb = cmd_q->qcfg.nb_desc;
> +
> +       if (nb == 0)
> +               return -EBUSY;
> +
> +       /* Program ring depth expected by hardware. */
> +       AE4DMA_WRITE_REG(&cmd_q->hwq_regs->max_idx, nb);
> +       return 0;
> +}
> +
> +/* Stop a configured device. */
> +static int
> +ae4dma_dev_stop(struct rte_dma_dev *dev)
> +{
> +       struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
> +       struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
> +
> +       if (cmd_q->hwq_regs != NULL)
> +               AE4DMA_WRITE_REG(&cmd_q->hwq_regs->control_reg.control_raw,
> +                               AE4DMA_CMD_QUEUE_DISABLE);
> +       return 0;
> +}
> +
> +/* Get device information of a device. */
> +static int
> +ae4dma_dev_info_get(const struct rte_dma_dev *dev, struct rte_dma_info *info,
> +               uint32_t size)
> +{
> +       if (size < sizeof(*info))
> +               return -EINVAL;
> +       info->dev_name = dev->device->name;

The dmadev library sets this field in rte_dma_info_get().
Please remove.


> +       info->dev_capa = RTE_DMA_CAPA_MEM_TO_MEM;
> +       info->max_vchans = 1;
> +       info->min_desc = 2;
> +       info->max_desc = AE4DMA_DESCRIPTORS_PER_CMDQ;
> +       info->nb_vchans = 1;
> +       return 0;
> +}
> +
> +/* Close a configured device. */
> +static int
> +ae4dma_dev_close(struct rte_dma_dev *dev)
> +{
> +       struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
> +       struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
> +
> +       if (cmd_q->hwq_regs != NULL)
> +               AE4DMA_WRITE_REG(&cmd_q->hwq_regs->control_reg.control_raw,
> +                               AE4DMA_CMD_QUEUE_DISABLE);
> +
> +       if (cmd_q->memz_name[0] != '\0') {
> +               const struct rte_memzone *mz = rte_memzone_lookup(cmd_q->memz_name);

Rather than resolve again, can't you store the reference to the
memzone in the priv pointer at probe time?


> +
> +               if (mz != NULL)
> +                       rte_memzone_free(mz);

No need to test for NULL.


> +       }
> +       cmd_q->qbase_desc = NULL;
> +       cmd_q->qbase_addr = NULL;
> +       cmd_q->qbase_phys_addr = 0;
> +       return 0;
> +}

[snip]


-- 
David Marchand


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/3] dma/ae4dma: introduce AMD AE4DMA DMA PMD
  2026-06-22 12:06     ` David Marchand
@ 2026-06-22 12:16       ` Bruce Richardson
  2026-06-24  0:38       ` fengchengwen
  2026-06-25 18:41       ` Raghavendra Ningoji
  2 siblings, 0 replies; 24+ messages in thread
From: Bruce Richardson @ 2026-06-22 12:16 UTC (permalink / raw)
  To: David Marchand
  Cc: Raghavendra Ningoji, dev, Thomas Monjalon, Bhagyada Modali,
	Robin Jarry, Selwin.Sebastian, Chengwen Feng

On Mon, Jun 22, 2026 at 02:06:55PM +0200, David Marchand wrote:
> On Mon, 25 May 2026 at 20:43, Raghavendra Ningoji
> <raghavendra.ningoji@amd.com> wrote:
> >
> > Add the skeleton of a new dmadev poll-mode driver for the AMD AE4DMA
> > hardware DMA engine, providing only PCI probe/remove and per-queue
> > hardware initialisation. An AE4DMA engine exposes 16 hardware command
> > queues, each with a 32-entry descriptor ring; the PMD maps each
> > hardware channel to its own dmadev with a single virtual channel,
> > so a PCI function appears as 16 dmadevs named "<pci-bdf>-ch0" ..
> > "<pci-bdf>-ch15".
> 
> I am not familiar with DMA drivers, I am not sure it is something acceptable.
> @Chengwen for info.
> 
This is similar with what is done by idxd driver when used as a PCI device
bound to vfio. We make the number of channels to configure a devarg, and
each channel becomes its own dmadev instance, since each channel is
independent from a user viewpoint. Only difference is that we use "q"
rather than "ch" in the naming. See [1] for what idxd does.

/Bruce

[1] https://github.com/DPDK/dpdk/blob/main/drivers/dma/idxd/idxd_pci.c#L326

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/3] dma/ae4dma: add AMD AE4DMA DMA PMD
  2026-05-25 18:42 ` [PATCH v2 0/3] " Raghavendra Ningoji
                     ` (2 preceding siblings ...)
  2026-05-25 18:42   ` [PATCH v2 3/3] dma/ae4dma: add data " Raghavendra Ningoji
@ 2026-06-22 12:25   ` David Marchand
  2026-06-25 18:47   ` [PATCH v3 " Raghavendra Ningoji
  4 siblings, 0 replies; 24+ messages in thread
From: David Marchand @ 2026-06-22 12:25 UTC (permalink / raw)
  To: Raghavendra Ningoji
  Cc: dev, Thomas Monjalon, Bhagyada Modali, Robin Jarry,
	Selwin.Sebastian, Chengwen Feng, Bruce Richardson

Hello,

On Mon, 25 May 2026 at 20:43, Raghavendra Ningoji
<raghavendra.ningoji@amd.com> wrote:
>
> This series adds a new dmadev poll-mode driver for the AMD AE4DMA
> hardware DMA engine. An AE4DMA engine exposes 16 hardware command
> queues, each with a 32-entry descriptor ring; the PMD maps each
> hardware channel to its own dmadev with a single virtual channel,
> so a PCI function appears as 16 dmadevs named "<pci-bdf>-ch0" ..
> "<pci-bdf>-ch15".
>
> Driver characteristics:
>
>  - Memory-to-memory copy operations only (RTE_DMA_CAPA_MEM_TO_MEM).
>  - Completion is detected via the hardware's per-queue read_idx
>    register, which the engine advances as it processes descriptors.
>    The descriptor status / err_code bytes are read only to classify
>    each drained slot as success or failure.
>  - vchan_status reports IDLE/ACTIVE based on HW read_idx vs write_idx
>    and HALTED_ERROR when the queue is not enabled.
>  - depends on bus_pci and dmadev.
>
> The v1 was submitted as a single patch.  Per review feedback the
> driver is now introduced in three logical patches, following the
> pattern of the recent hisi_acc dmadev driver:
>
>   1/3 - introduce driver (probe, remove, per-queue HW init)
>   2/3 - add control path operations (dev_ops)
>   3/3 - add data path operations (copy, submit, completion)
> ---
> Changes in v2:
>  - Split the monolithic v1 patch into three logical patches
>    (introduce / control path / data path), mirroring the
>    structure used by drivers/dma/hisi_acc.
>  - Fix checkpatches.sh warnings in drivers/dma/ae4dma/ae4dma_internal.h:
>      * Use RTE_LOG_LINE_PREFIX (with RTE_LOGTYPE_AE4DMA_PMD) instead
>        of the deprecated rte_log() call form.
>      * Replace the GCC variadic argument-pack extension ("args...")
>        with C99 __VA_ARGS__ in the AE4DMA_PMD_{LOG,DEBUG,INFO,ERR,
>        WARN} macros.
>  - Move __rte_cache_aligned to the "struct" keyword position on
>    struct ae4dma_cmd_queue, as required by checkpatches.sh.
>
> v1:https://patches.dpdk.org/project/dpdk/patch/20260518181856.1228373-1-raghavendra.ningoji@amd.com/
>
> Raghavendra Ningoji (3):
>   dma/ae4dma: introduce AMD AE4DMA DMA PMD
>   dma/ae4dma: add control path operations
>   dma/ae4dma: add data path operations
>
>  .mailmap                               |   1 +
>  MAINTAINERS                            |   5 +
>  doc/guides/dmadevs/ae4dma.rst          |  75 +++
>  doc/guides/dmadevs/index.rst           |   1 +
>  doc/guides/rel_notes/release_26_07.rst |   7 +
>  drivers/dma/ae4dma/ae4dma_dmadev.c     | 738 +++++++++++++++++++++++++
>  drivers/dma/ae4dma/ae4dma_hw_defs.h    | 160 ++++++
>  drivers/dma/ae4dma/ae4dma_internal.h   | 118 ++++
>  drivers/dma/ae4dma/meson.build         |   7 +
>  drivers/dma/meson.build                |   1 +
>  usertools/dpdk-devbind.py              |   5 +-
>  11 files changed, 1117 insertions(+), 1 deletion(-)
>  create mode 100644 doc/guides/dmadevs/ae4dma.rst
>  create mode 100644 drivers/dma/ae4dma/ae4dma_dmadev.c
>  create mode 100644 drivers/dma/ae4dma/ae4dma_hw_defs.h
>  create mode 100644 drivers/dma/ae4dma/ae4dma_internal.h
>  create mode 100644 drivers/dma/ae4dma/meson.build
>
>
> base-commit: f724d1c0d1c1636b9c171c34db3f17c3defaa2f3

I did a pass on this series and sent comments, nothing blocking but please fix.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/3] dma/ae4dma: introduce AMD AE4DMA DMA PMD
  2026-05-25 18:42   ` [PATCH v2 1/3] dma/ae4dma: introduce " Raghavendra Ningoji
  2026-06-22 12:06     ` David Marchand
@ 2026-06-22 12:26     ` David Marchand
  2026-06-22 12:37       ` Bruce Richardson
  2026-06-25 18:43       ` Raghavendra Ningoji
  1 sibling, 2 replies; 24+ messages in thread
From: David Marchand @ 2026-06-22 12:26 UTC (permalink / raw)
  To: Raghavendra Ningoji
  Cc: dev, Thomas Monjalon, Bhagyada Modali, Robin Jarry,
	Selwin.Sebastian

On Mon, 25 May 2026 at 20:43, Raghavendra Ningoji
<raghavendra.ningoji@amd.com> wrote:
> diff --git a/.mailmap b/.mailmap
> index 89ba6ffccc..60180818f9 100644
> --- a/.mailmap
> +++ b/.mailmap
> @@ -203,6 +203,7 @@ Benoît Ganne <bganne@cisco.com>
>  Bernard Iremonger <bernard.iremonger@intel.com>
>  Bert van Leeuwen <bert.vanleeuwen@netronome.com>
>  Bhagyada Modali <bhagyada.modali@amd.com>
> +Raghavendra Ningoji <raghavendra.ningoji@amd.com>
>  Bharat Mota <bharat.mota@broadcom.com> <bmota@vmware.com>
>  Bhuvan Mital <bhuvan.mital@amd.com>
>  Bibo Mao <maobibo@loongson.cn>

Almost missed this.
Alphabetical order please.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/3] dma/ae4dma: introduce AMD AE4DMA DMA PMD
  2026-06-22 12:26     ` David Marchand
@ 2026-06-22 12:37       ` Bruce Richardson
  2026-06-25 18:43       ` Raghavendra Ningoji
  1 sibling, 0 replies; 24+ messages in thread
From: Bruce Richardson @ 2026-06-22 12:37 UTC (permalink / raw)
  To: David Marchand
  Cc: Raghavendra Ningoji, dev, Thomas Monjalon, Bhagyada Modali,
	Robin Jarry, Selwin.Sebastian

On Mon, Jun 22, 2026 at 02:26:33PM +0200, David Marchand wrote:
> On Mon, 25 May 2026 at 20:43, Raghavendra Ningoji
> <raghavendra.ningoji@amd.com> wrote:
> > diff --git a/.mailmap b/.mailmap
> > index 89ba6ffccc..60180818f9 100644
> > --- a/.mailmap
> > +++ b/.mailmap
> > @@ -203,6 +203,7 @@ Benoît Ganne <bganne@cisco.com>
> >  Bernard Iremonger <bernard.iremonger@intel.com>
> >  Bert van Leeuwen <bert.vanleeuwen@netronome.com>
> >  Bhagyada Modali <bhagyada.modali@amd.com>
> > +Raghavendra Ningoji <raghavendra.ningoji@amd.com>
> >  Bharat Mota <bharat.mota@broadcom.com> <bmota@vmware.com>
> >  Bhuvan Mital <bhuvan.mital@amd.com>
> >  Bibo Mao <maobibo@loongson.cn>
> 
> Almost missed this.
> Alphabetical order please.
> 
To make it a little easier, you can use devtools/mailmap_ctl.py:

	mailmap_ctl.py add "name1 name2 <email@domain>"

And that will automatically insert the name in the correct location in the
file for you.

/Bruce

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/3] dma/ae4dma: introduce AMD AE4DMA DMA PMD
  2026-06-22 12:06     ` David Marchand
  2026-06-22 12:16       ` Bruce Richardson
@ 2026-06-24  0:38       ` fengchengwen
  2026-06-25 18:41       ` Raghavendra Ningoji
  2 siblings, 0 replies; 24+ messages in thread
From: fengchengwen @ 2026-06-24  0:38 UTC (permalink / raw)
  To: David Marchand, Raghavendra Ningoji
  Cc: dev, Thomas Monjalon, Bhagyada Modali, Robin Jarry,
	Selwin.Sebastian

On 6/22/2026 8:06 PM, David Marchand wrote:
> On Mon, 25 May 2026 at 20:43, Raghavendra Ningoji
> <raghavendra.ningoji@amd.com> wrote:
>> Add the skeleton of a new dmadev poll-mode driver for the AMD AE4DMA
>> hardware DMA engine, providing only PCI probe/remove and per-queue
>> hardware initialisation. An AE4DMA engine exposes 16 hardware command
>> queues, each with a 32-entry descriptor ring; the PMD maps each
>> hardware channel to its own dmadev with a single virtual channel,
>> so a PCI function appears as 16 dmadevs named "<pci-bdf>-ch0" ..
>> "<pci-bdf>-ch15".
> I am not familiar with DMA drivers, I am not sure it is something acceptable.
> @Chengwen for info.

This is acceptable. For a DMA controller (which may be a PCI device), there
may be multiple hardware channels, and each hardware channel is presented as
a dmadev device. The device name can be in the format of BDF-chX.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/3] dma/ae4dma: introduce AMD AE4DMA DMA PMD
  2026-06-22 12:06     ` David Marchand
  2026-06-22 12:16       ` Bruce Richardson
  2026-06-24  0:38       ` fengchengwen
@ 2026-06-25 18:41       ` Raghavendra Ningoji
  2 siblings, 0 replies; 24+ messages in thread
From: Raghavendra Ningoji @ 2026-06-25 18:41 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, Thomas Monjalon, Bhagyada Modali, Robin Jarry,
	Selwin.Sebastian, Chengwen Feng, Bruce Richardson

On Mon, 22 Jun 2026 at 14:06, David Marchand <david.marchand@redhat.com> wrote:
>
> Here is a superficial review.
>
> Many places are fishy when it comes to integer/pointer casts: I only
> raised a few comments on this topic.

Thanks for the review. I went through the cast usage as well; the
low32_value()/high32_value() helpers (which took an unsigned long and
were therefore broken on LLP64) are gone in v3, replaced by
lower_32_bits()/upper_32_bits() on the rte_iova_t value, and the
redundant index casts are removed. Replies inline.

> > +       q_mz = ae4dma_queue_dma_zone_reserve(cmd_q->memz_name,
> > +                       cmd_q->qsize, rte_socket_id());
>
> I see no tracking of q_mz, so I suspect this memzone is leaked on
> device probing failure, and/or unplugging.

The memzone is now stored in cmd_q->mz at probe time and freed directly
in dev_close(). dev_close() is reached on the unplug path too
(remove() -> rte_dma_pmd_release() -> rte_dma_close()), so the ring is
no longer leaked.

> > +       cmd_q->next_write = (uint16_t)AE4DMA_READ_REG(...);
>
> Strange that you need to cast.

Removed; next_read/next_write/last_write are uint16_t and the registers
are read into them without an explicit cast in v3.

> > +/* Create a dmadev(dpdk DMA device) */
>
> This is a general comment for the patch: let's avoid Lapalissade /
> trivial comments that adds nothing.

Removed the trivial "what" comments across the series.

> > +       struct rte_dma_dev *dmadev = NULL;
> > +       struct ae4dma_dmadev *ae4dma = NULL;
>
> Those variables do not need any explicit setting to NULL [...]

Done.

> > +       if (!name) {
>
> [...] This is a static helper called internally from a single
> location, remove the check.

Removed.

> > +       ae4dma->dmadev = dmadev;
>
> [...] this field is never used in the series. Please remove.

Removed the field and the assignment.

> > +       ae4dma->pci = dev;
>
> [...] no user of this field in the series, please remove.

Removed. ae4dma_add_queue() now takes the rte_pci_device pointer as an
argument instead.

> > +init_error:
> > +       AE4DMA_PMD_ERR("driver %s(): failed", __func__);
>
> __func__ is already part of AE4DMA_PMD_LOG.

Dropped __func__ from the message.

> > +       dev->device.driver = &drv->driver;
>
> Setting the driver pointer in the device object is not the driver
> responsibility anymore [...]. EAL will set this field on probe()
> success.

Removed; the drv argument is now __rte_unused.

> > +#ifndef __AE4DMA_HW_DEFS_H__
>
> Is this header autosufficient ? I see references to uint32_t below,
> so this header probably depends on stdint.h.

Added #include <stdint.h>.

> > +#ifdef __cplusplus
> > +extern "C" {
>
> Do we really need C++ guards?

Removed (internal header).

> > +       AE4DMA_HWQUEUE_NOT_EMPTY = 4
>
> For consistency with other enums, add a comma.

Done.

> > +/**
>
> This is an internal header, we don't need doxygen style comments,
> simple comments are enough.

Converted the doxygen comments to plain comments.

Sent as v3.

Thanks,
Raghavendra

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/3] dma/ae4dma: add control path operations
  2026-06-22 12:15     ` David Marchand
@ 2026-06-25 18:42       ` Raghavendra Ningoji
  0 siblings, 0 replies; 24+ messages in thread
From: Raghavendra Ningoji @ 2026-06-25 18:42 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, Thomas Monjalon, Bhagyada Modali, Robin Jarry,
	Selwin.Sebastian

On Mon, 22 Jun 2026 at 14:15, David Marchand <david.marchand@redhat.com> wrote:
>
> > +       info->dev_name = dev->device->name;
>
> The dmadev library sets this field in rte_dma_info_get().
> Please remove.

Removed.

> > +               const struct rte_memzone *mz = rte_memzone_lookup(cmd_q->memz_name);
>
> Rather than resolve again, can't you store the reference to the
> memzone in the priv pointer at probe time?

Done. The memzone reference is stored in cmd_q->mz at probe time (in
patch 1/3) and dev_close() now frees cmd_q->mz directly without a
lookup.

> > +               if (mz != NULL)
> > +                       rte_memzone_free(mz);
>
> No need to test for NULL.

Removed; rte_memzone_free(cmd_q->mz) is called unconditionally.

Sent as v3.

Thanks,
Raghavendra

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/3] dma/ae4dma: introduce AMD AE4DMA DMA PMD
  2026-06-22 12:26     ` David Marchand
  2026-06-22 12:37       ` Bruce Richardson
@ 2026-06-25 18:43       ` Raghavendra Ningoji
  1 sibling, 0 replies; 24+ messages in thread
From: Raghavendra Ningoji @ 2026-06-25 18:43 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, Thomas Monjalon, Bhagyada Modali, Robin Jarry,
	Selwin.Sebastian, Bruce Richardson

On Mon, 22 Jun 2026 at 14:26, David Marchand <david.marchand@redhat.com> wrote:
>
> > +Raghavendra Ningoji <raghavendra.ningoji@amd.com>
>
> Almost missed this.
> Alphabetical order please.

Fixed in v3 using devtools/mailmap-ctl.py (thanks Bruce for the
pointer); the entry now sits between "Rafal Kozik" and "Ragothaman
Jayaraman".

Thanks,
Raghavendra

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 0/3] dma/ae4dma: add AMD AE4DMA DMA PMD
  2026-05-25 18:42 ` [PATCH v2 0/3] " Raghavendra Ningoji
                     ` (3 preceding siblings ...)
  2026-06-22 12:25   ` [PATCH v2 0/3] dma/ae4dma: add AMD AE4DMA DMA PMD David Marchand
@ 2026-06-25 18:47   ` Raghavendra Ningoji
  2026-06-25 18:47     ` [PATCH v3 1/3] dma/ae4dma: introduce " Raghavendra Ningoji
                       ` (2 more replies)
  4 siblings, 3 replies; 24+ messages in thread
From: Raghavendra Ningoji @ 2026-06-25 18:47 UTC (permalink / raw)
  To: dev
  Cc: david.marchand, bruce.richardson, fengchengwen, Selwin.Sebastian,
	bhagyada.modali, rjarry, thomas, Raghavendra Ningoji

This series adds a new dmadev poll-mode driver for the AMD AE4DMA
hardware DMA engine. An AE4DMA engine exposes 16 hardware command
queues, each with a 32-entry descriptor ring; the PMD maps each
hardware channel to its own dmadev with a single virtual channel,
so a PCI function appears as 16 dmadevs named "<pci-bdf>-ch0" ..
"<pci-bdf>-ch15".

Driver characteristics:

 - Memory-to-memory copy operations only (RTE_DMA_CAPA_MEM_TO_MEM).
 - Completion is detected via the hardware's per-queue read_idx
   register, which the engine advances as it processes descriptors.
   The descriptor status / err_code bytes are read only to classify
   each drained slot as success or failure.
 - vchan_status reports IDLE/ACTIVE based on HW read_idx vs write_idx
   and HALTED_ERROR when the queue is not enabled.
 - depends on bus_pci and dmadev.

The driver is introduced in three logical patches, following the
pattern of the recent hisi_acc dmadev driver:

  1/3 - introduce driver (probe, remove, per-queue HW init)
  2/3 - add control path operations (dev_ops)
  3/3 - add data path operations (copy, submit, completion)

Changes in v3:
 - Address review comments from David Marchand on patch 1/3 and 2/3:
     * Track the descriptor-ring memzone in the queue structure and
       free it directly in dev_close() instead of re-resolving it by
       name (also fixes the potential leak noted on probe/unplug).
     * Drop the unused back-references (ae4dma->dmadev, ae4dma->pci);
       pass the rte_pci_device to ae4dma_add_queue() instead.
     * Stop setting dev->device.driver in probe(); EAL sets it on
       probe() success since commit f282771a04ef.
     * Remove the redundant NULL name check in the single-caller
       helper, the needless NULL initialisers and the __func__ in the
       error log (already added by the log macro), and the NULL test
       before rte_memzone_free().
     * Remove the info->dev_name assignment (set by rte_dma_info_get()).
     * Replace the unsigned-long low32_value()/high32_value() helpers
       with lower_32_bits()/upper_32_bits() and drop the redundant
       index casts.
     * ae4dma_hw_defs.h: include <stdint.h>, drop the C++ guards and
       add the missing trailing enum comma; ae4dma_internal.h: convert
       doxygen comments to plain comments.
     * Remove trivial "what" comments throughout.
 - Reorder the .mailmap entry into alphabetical position.
 - Naming/architecture (16 dmadevs per PCI function, "<bdf>-chX")
   acknowledged as acceptable by Chengwen Feng and Bruce Richardson;
   kept unchanged.

Changes in v2:
 - Split the monolithic v1 patch into three logical patches
   (introduce / control path / data path), mirroring the
   structure used by drivers/dma/hisi_acc.
 - Fix checkpatches.sh warnings in ae4dma_internal.h (RTE_LOG_LINE_PREFIX,
   C99 __VA_ARGS__, __rte_cache_aligned placement).

v1: https://patches.dpdk.org/project/dpdk/patch/20260518181856.1228373-1-raghavendra.ningoji@amd.com/
v2: https://patches.dpdk.org/project/dpdk/patch/20260525184244.1758825-1-raghavendra.ningoji@amd.com/

Raghavendra Ningoji (3):
  dma/ae4dma: introduce AMD AE4DMA DMA PMD
  dma/ae4dma: add control path operations
  dma/ae4dma: add data path operations

 .mailmap                               |   1 +
 MAINTAINERS                            |   5 +
 doc/guides/dmadevs/ae4dma.rst          |  75 +++
 doc/guides/dmadevs/index.rst           |   1 +
 doc/guides/rel_notes/release_26_07.rst |   7 +
 drivers/dma/ae4dma/ae4dma_dmadev.c     | 718 +++++++++++++++++++++++++
 drivers/dma/ae4dma/ae4dma_hw_defs.h    | 154 ++++++
 drivers/dma/ae4dma/ae4dma_internal.h   |  97 ++++
 drivers/dma/ae4dma/meson.build         |   7 +
 drivers/dma/meson.build                |   1 +
 usertools/dpdk-devbind.py              |   5 +-
 11 files changed, 1070 insertions(+), 1 deletion(-)
 create mode 100644 doc/guides/dmadevs/ae4dma.rst
 create mode 100644 drivers/dma/ae4dma/ae4dma_dmadev.c
 create mode 100644 drivers/dma/ae4dma/ae4dma_hw_defs.h
 create mode 100644 drivers/dma/ae4dma/ae4dma_internal.h
 create mode 100644 drivers/dma/ae4dma/meson.build


base-commit: f724d1c0d1c1636b9c171c34db3f17c3defaa2f3
-- 
2.34.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 1/3] dma/ae4dma: introduce AMD AE4DMA DMA PMD
  2026-06-25 18:47   ` [PATCH v3 " Raghavendra Ningoji
@ 2026-06-25 18:47     ` Raghavendra Ningoji
  2026-06-27  0:01       ` fengchengwen
  2026-06-25 18:47     ` [PATCH v3 2/3] dma/ae4dma: add control path operations Raghavendra Ningoji
  2026-06-25 18:47     ` [PATCH v3 3/3] dma/ae4dma: add data " Raghavendra Ningoji
  2 siblings, 1 reply; 24+ messages in thread
From: Raghavendra Ningoji @ 2026-06-25 18:47 UTC (permalink / raw)
  To: dev
  Cc: david.marchand, bruce.richardson, fengchengwen, Selwin.Sebastian,
	bhagyada.modali, rjarry, thomas, Raghavendra Ningoji

Add the skeleton of a new dmadev poll-mode driver for the AMD AE4DMA
hardware DMA engine, providing only PCI probe/remove and per-queue
hardware initialisation. An AE4DMA engine exposes 16 hardware command
queues, each with a 32-entry descriptor ring; the PMD maps each
hardware channel to its own dmadev with a single virtual channel,
so a PCI function appears as 16 dmadevs named "<pci-bdf>-ch0" ..
"<pci-bdf>-ch15".

This patch only registers the PCI driver, allocates the dmadev
objects, reserves the per-queue descriptor rings and programs the
hardware queue base addresses. Control and data path operations are
added in subsequent patches.

Signed-off-by: Raghavendra Ningoji <raghavendra.ningoji@amd.com>
---
 .mailmap                               |   1 +
 MAINTAINERS                            |   5 +
 doc/guides/dmadevs/ae4dma.rst          |  53 ++++++
 doc/guides/dmadevs/index.rst           |   1 +
 doc/guides/rel_notes/release_26_07.rst |   7 +
 drivers/dma/ae4dma/ae4dma_dmadev.c     | 220 +++++++++++++++++++++++++
 drivers/dma/ae4dma/ae4dma_hw_defs.h    | 154 +++++++++++++++++
 drivers/dma/ae4dma/ae4dma_internal.h   |  97 +++++++++++
 drivers/dma/ae4dma/meson.build         |   7 +
 drivers/dma/meson.build                |   1 +
 usertools/dpdk-devbind.py              |   5 +-
 11 files changed, 550 insertions(+), 1 deletion(-)
 create mode 100644 doc/guides/dmadevs/ae4dma.rst
 create mode 100644 drivers/dma/ae4dma/ae4dma_dmadev.c
 create mode 100644 drivers/dma/ae4dma/ae4dma_hw_defs.h
 create mode 100644 drivers/dma/ae4dma/ae4dma_internal.h
 create mode 100644 drivers/dma/ae4dma/meson.build

diff --git a/.mailmap b/.mailmap
index 89ba6ffccc..71a62564fa 100644
--- a/.mailmap
+++ b/.mailmap
@@ -1329,6 +1329,7 @@ Radu Bulie <radu-andrei.bulie@nxp.com>
 Radu Nicolau <radu.nicolau@intel.com>
 Rafael Ávila de Espíndola <espindola@scylladb.com>
 Rafal Kozik <rk@semihalf.com>
+Raghavendra Ningoji <raghavendra.ningoji@amd.com>
 Ragothaman Jayaraman <rjayaraman@caviumnetworks.com>
 Rahul Bhansali <rbhansali@marvell.com>
 Rahul Gupta <rahul.gupta@broadcom.com>
diff --git a/MAINTAINERS b/MAINTAINERS
index 9143d028bc..2e27af49f4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1361,6 +1361,11 @@ F: doc/guides/compressdevs/features/zsda.ini
 DMAdev Drivers
 --------------
 
+AMD AE4DMA
+M: Bhagyada Modali <bhagyada.modali@amd.com>
+F: drivers/dma/ae4dma/
+F: doc/guides/dmadevs/ae4dma.rst
+
 Intel IDXD - EXPERIMENTAL
 M: Bruce Richardson <bruce.richardson@intel.com>
 M: Kevin Laatz <kevin.laatz@intel.com>
diff --git a/doc/guides/dmadevs/ae4dma.rst b/doc/guides/dmadevs/ae4dma.rst
new file mode 100644
index 0000000000..a85c1d92ca
--- /dev/null
+++ b/doc/guides/dmadevs/ae4dma.rst
@@ -0,0 +1,53 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2025 Advanced Micro Devices, Inc.
+
+.. include:: <isonum.txt>
+
+AMD AE4DMA DMA Device Driver
+============================
+
+The ``ae4dma`` dmadev driver is a poll-mode driver (PMD) for the
+AMD AE4DMA hardware DMA engine. The engine exposes 16 independent
+hardware command queues, each with a ring of 32 descriptors. The PMD
+maps each hardware command queue to a separate DPDK dmadev with a
+single virtual channel, so a single PCI function appears as 16 dmadevs
+named ``<pci-bdf>-ch0`` through ``<pci-bdf>-ch15``.
+
+The driver supports memory-to-memory copy operations only.
+
+Hardware Requirements
+---------------------
+
+The ``dpdk-devbind.py`` script can be used to list AE4DMA devices on
+the system::
+
+   dpdk-devbind.py --status-dev dma
+
+AE4DMA devices appear with vendor ID ``0x1022`` and device ID
+``0x149b``.
+
+Compilation
+-----------
+
+The driver is built as part of the standard DPDK build on x86 platforms
+using ``meson`` and ``ninja``; no extra configuration is required.
+
+Device Setup
+------------
+
+The AE4DMA device must be bound to a DPDK-compatible kernel module such
+as ``vfio-pci`` before it can be used::
+
+   dpdk-devbind.py -b vfio-pci <pci-bdf>
+
+Initialization
+~~~~~~~~~~~~~~
+
+On probe the PMD performs the following steps for each PCI function:
+
+* Reads BAR0 and programs the common configuration register with the
+  number of hardware queues to enable (16).
+* For each hardware queue it allocates a 32-entry descriptor ring in
+  IOVA-contiguous memory, programs the queue base address and ring
+  depth into the per-queue registers, and enables the queue.
+* Interrupts are masked; completion is polled by the application.
diff --git a/doc/guides/dmadevs/index.rst b/doc/guides/dmadevs/index.rst
index 56beb1733f..97399590f6 100644
--- a/doc/guides/dmadevs/index.rst
+++ b/doc/guides/dmadevs/index.rst
@@ -11,6 +11,7 @@ an application through DMA API.
    :maxdepth: 1
    :numbered:
 
+   ae4dma
    cnxk
    dpaa
    dpaa2
diff --git a/doc/guides/rel_notes/release_26_07.rst b/doc/guides/rel_notes/release_26_07.rst
index f012d47a4b..9a78a7ef62 100644
--- a/doc/guides/rel_notes/release_26_07.rst
+++ b/doc/guides/rel_notes/release_26_07.rst
@@ -63,6 +63,13 @@ New Features
     ``rte_eal_init`` and the application is responsible for probing each device,
   * ``--auto-probing`` enables the initial bus probing, which is the current default behavior.
 
+* **Added AMD AE4DMA DMA PMD.**
+
+  Added a new ``dma/ae4dma`` driver for the AMD AE4DMA hardware DMA engine.
+  Each PCI function exposes 16 hardware command queues; the PMD registers one
+  dmadev per channel with a single virtual channel and supports
+  memory-to-memory copy operations.
+
 
 Removed Items
 -------------
diff --git a/drivers/dma/ae4dma/ae4dma_dmadev.c b/drivers/dma/ae4dma/ae4dma_dmadev.c
new file mode 100644
index 0000000000..3d82f86906
--- /dev/null
+++ b/drivers/dma/ae4dma/ae4dma_dmadev.c
@@ -0,0 +1,220 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2026 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#include <errno.h>
+#include <inttypes.h>
+#include <stdio.h>
+#include <string.h>
+
+#include <rte_bus_pci.h>
+#include <bus_pci_driver.h>
+#include <rte_dmadev_pmd.h>
+#include <rte_malloc.h>
+
+#include "ae4dma_internal.h"
+
+/*
+ * One dmadev per AE4DMA hardware channel; each dmadev has exactly one
+ * virtual channel. The HW's per-queue register block must be densely
+ * packed right after the engine-common config register at BAR0+0; the
+ * build-time check below catches an accidental layout change.
+ */
+static_assert(sizeof(struct ae4dma_hwq_regs) == 32,
+		"ae4dma_hwq_regs stride changed; per-queue offset math will break");
+
+RTE_LOG_REGISTER_DEFAULT(ae4dma_pmd_logtype, INFO);
+
+#define AE4DMA_PMD_NAME dmadev_ae4dma
+
+static const struct rte_memzone *
+ae4dma_queue_dma_zone_reserve(const char *queue_name,
+		uint32_t queue_size, int socket_id)
+{
+	const struct rte_memzone *mz;
+
+	mz = rte_memzone_lookup(queue_name);
+	if (mz != NULL) {
+		if (((size_t)queue_size <= mz->len) &&
+				((socket_id == SOCKET_ID_ANY) ||
+				 (socket_id == mz->socket_id))) {
+			AE4DMA_PMD_INFO("reuse memzone already "
+					"allocated for %s", queue_name);
+			return mz;
+		}
+		AE4DMA_PMD_ERR("Incompatible memzone already "
+				"allocated %s, size %u, socket %d. "
+				"Requested size %u, socket %u",
+				queue_name, (uint32_t)mz->len,
+				mz->socket_id, queue_size, socket_id);
+		return NULL;
+	}
+	return rte_memzone_reserve_aligned(queue_name, queue_size,
+			socket_id, RTE_MEMZONE_IOVA_CONTIG, queue_size);
+}
+
+static int
+ae4dma_add_queue(struct ae4dma_dmadev *dev, struct rte_pci_device *pci,
+		uint8_t qn, const char *pci_name)
+{
+	uint32_t dma_addr_lo, dma_addr_hi;
+	struct ae4dma_cmd_queue *cmd_q;
+	const struct rte_memzone *q_mz;
+
+	dev->io_regs = pci->mem_resource[AE4DMA_PCIE_BAR].addr;
+
+	cmd_q = &dev->cmd_q;
+	cmd_q->id = qn;
+	cmd_q->qidx = 0;
+	cmd_q->qsize = AE4DMA_QUEUE_SIZE(AE4DMA_QUEUE_DESC_SIZE);
+	cmd_q->hwq_regs = (volatile struct ae4dma_hwq_regs *)dev->io_regs + (qn + 1);
+
+	/*
+	 * Memzone name must be globally unique. Embed PCI BDF so multiple
+	 * PCI functions probed concurrently don't collide.
+	 */
+	snprintf(cmd_q->memz_name, sizeof(cmd_q->memz_name),
+			"ae4dma_%s_q%u", pci_name, (unsigned int)qn);
+
+	q_mz = ae4dma_queue_dma_zone_reserve(cmd_q->memz_name,
+			cmd_q->qsize, rte_socket_id());
+	if (q_mz == NULL) {
+		AE4DMA_PMD_ERR("memzone reserve failed for %s", cmd_q->memz_name);
+		return -ENOMEM;
+	}
+
+	cmd_q->mz = q_mz;
+	cmd_q->qbase_addr = q_mz->addr;
+	cmd_q->qbase_desc = q_mz->addr;
+	cmd_q->qbase_phys_addr = q_mz->iova;
+
+	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->max_idx, AE4DMA_DESCRIPTORS_PER_CMDQ);
+	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->control_reg.control_raw,
+			AE4DMA_CMD_QUEUE_ENABLE);
+	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->intr_status_reg.intr_status_raw,
+			AE4DMA_DISABLE_INTR);
+	cmd_q->next_write = AE4DMA_READ_REG(&cmd_q->hwq_regs->write_idx);
+	cmd_q->next_read = AE4DMA_READ_REG(&cmd_q->hwq_regs->read_idx);
+	cmd_q->ring_buff_count = 0;
+
+	dma_addr_lo = lower_32_bits(cmd_q->qbase_phys_addr);
+	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->qbase_lo, dma_addr_lo);
+	dma_addr_hi = upper_32_bits(cmd_q->qbase_phys_addr);
+	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->qbase_hi, dma_addr_hi);
+
+	return 0;
+}
+
+static void
+ae4dma_channel_dev_name(char *out, size_t outlen, const char *pci_name,
+		unsigned int ch)
+{
+	snprintf(out, outlen, "%s-ch%u", pci_name, ch);
+}
+
+static int
+ae4dma_dmadev_create(const char *name, struct rte_pci_device *dev, uint8_t qn)
+{
+	struct rte_dma_dev *dmadev;
+	struct ae4dma_dmadev *ae4dma;
+	char hwq_dev_name[RTE_DEV_NAME_MAX_LEN];
+
+	memset(hwq_dev_name, 0, sizeof(hwq_dev_name));
+	ae4dma_channel_dev_name(hwq_dev_name, sizeof(hwq_dev_name), name, qn);
+
+	dmadev = rte_dma_pmd_allocate(hwq_dev_name, dev->device.numa_node,
+			sizeof(struct ae4dma_dmadev));
+	if (dmadev == NULL) {
+		AE4DMA_PMD_ERR("Unable to allocate dma device");
+		return -ENOMEM;
+	}
+	dmadev->device = &dev->device;
+	dmadev->fp_obj->dev_private = dmadev->data->dev_private;
+
+	ae4dma = dmadev->data->dev_private;
+
+	if (ae4dma_add_queue(ae4dma, dev, qn, name) != 0)
+		goto init_error;
+	return 0;
+
+init_error:
+	AE4DMA_PMD_ERR("failed");
+	rte_dma_pmd_release(hwq_dev_name);
+	return -ENOMEM;
+}
+
+static int
+ae4dma_dmadev_probe(struct rte_pci_driver *drv __rte_unused,
+		struct rte_pci_device *dev)
+{
+	char name[32];
+	char chname[RTE_DEV_NAME_MAX_LEN];
+	void *mmio_base;
+	uint32_t q_per_eng;
+	int ret = 0;
+	uint8_t i;
+
+	rte_pci_device_name(&dev->addr, name, sizeof(name));
+	AE4DMA_PMD_INFO("Init %s on NUMA node %d", name, dev->device.numa_node);
+
+	mmio_base = dev->mem_resource[AE4DMA_PCIE_BAR].addr;
+	if (mmio_base == NULL) {
+		AE4DMA_PMD_ERR("%s: BAR%d not mapped", name, AE4DMA_PCIE_BAR);
+		return -ENODEV;
+	}
+
+	/* Program the per-engine HW queue count once. */
+	AE4DMA_WRITE_REG_OFFSET(mmio_base, AE4DMA_COMMON_CONFIG_OFFSET,
+			AE4DMA_MAX_HW_QUEUES);
+	q_per_eng = AE4DMA_READ_REG_OFFSET(mmio_base, AE4DMA_COMMON_CONFIG_OFFSET);
+	AE4DMA_PMD_INFO("%s: AE4DMA queues per engine = %u", name, q_per_eng);
+
+	for (i = 0; i < AE4DMA_MAX_HW_QUEUES; i++) {
+		ret = ae4dma_dmadev_create(name, dev, i);
+		if (ret != 0) {
+			AE4DMA_PMD_ERR("%s create dmadev %u failed!", name, i);
+			while (i > 0) {
+				i--;
+				ae4dma_channel_dev_name(chname, sizeof(chname), name, i);
+				rte_dma_pmd_release(chname);
+			}
+			break;
+		}
+	}
+	return ret;
+}
+
+static int
+ae4dma_dmadev_remove(struct rte_pci_device *dev)
+{
+	char name[32];
+	char chname[RTE_DEV_NAME_MAX_LEN];
+	unsigned int i;
+
+	rte_pci_device_name(&dev->addr, name, sizeof(name));
+
+	AE4DMA_PMD_INFO("Closing %s on NUMA node %d",
+			name, dev->device.numa_node);
+
+	for (i = 0; i < AE4DMA_MAX_HW_QUEUES; i++) {
+		ae4dma_channel_dev_name(chname, sizeof(chname), name, i);
+		rte_dma_pmd_release(chname);
+	}
+	return 0;
+}
+
+static const struct rte_pci_id pci_id_ae4dma_map[] = {
+	{ RTE_PCI_DEVICE(AMD_VENDOR_ID, AE4DMA_DEVICE_ID) },
+	{ .vendor_id = 0, /* sentinel */ },
+};
+
+static struct rte_pci_driver ae4dma_pmd_drv = {
+	.id_table = pci_id_ae4dma_map,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+	.probe = ae4dma_dmadev_probe,
+	.remove = ae4dma_dmadev_remove,
+};
+
+RTE_PMD_REGISTER_PCI(AE4DMA_PMD_NAME, ae4dma_pmd_drv);
+RTE_PMD_REGISTER_PCI_TABLE(AE4DMA_PMD_NAME, pci_id_ae4dma_map);
+RTE_PMD_REGISTER_KMOD_DEP(AE4DMA_PMD_NAME, "* igb_uio | uio_pci_generic | vfio-pci");
diff --git a/drivers/dma/ae4dma/ae4dma_hw_defs.h b/drivers/dma/ae4dma/ae4dma_hw_defs.h
new file mode 100644
index 0000000000..e7798be09b
--- /dev/null
+++ b/drivers/dma/ae4dma/ae4dma_hw_defs.h
@@ -0,0 +1,154 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2026 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#ifndef __AE4DMA_HW_DEFS_H__
+#define __AE4DMA_HW_DEFS_H__
+
+#include <stdint.h>
+
+#include <rte_bus_pci.h>
+#include <rte_byteorder.h>
+#include <rte_io.h>
+#include <rte_pci.h>
+#include <rte_memzone.h>
+
+#define AE4DMA_BIT(nr)			(1UL << (nr))
+
+/* ae4dma device details */
+#define AMD_VENDOR_ID	0x1022
+#define AE4DMA_DEVICE_ID	0x149b
+#define AE4DMA_PCIE_BAR 0
+
+/*
+ * An AE4DMA engine has 16 DMA queues. Each queue supports 32 descriptors.
+ */
+#define AE4DMA_MAX_HW_QUEUES        16
+#define AE4DMA_QUEUE_START_INDEX    0
+#define AE4DMA_CMD_QUEUE_ENABLE		0x1
+#define AE4DMA_CMD_QUEUE_DISABLE	0x0
+
+/* Common to all queues */
+#define AE4DMA_COMMON_CONFIG_OFFSET 0x00
+
+#define AE4DMA_DISABLE_INTR 0x01
+
+/* Descriptor status */
+enum ae4dma_dma_status {
+	AE4DMA_DMA_DESC_SUBMITTED = 0,
+	AE4DMA_DMA_DESC_VALIDATED = 1,
+	AE4DMA_DMA_DESC_PROCESSED = 2,
+	AE4DMA_DMA_DESC_COMPLETED = 3,
+	AE4DMA_DMA_DESC_ERROR = 4,
+};
+
+/* Descriptor error-code */
+enum ae4dma_dma_err {
+	AE4DMA_DMA_ERR_NO_ERR = 0,
+	AE4DMA_DMA_ERR_INV_HEADER = 1,
+	AE4DMA_DMA_ERR_INV_STATUS = 2,
+	AE4DMA_DMA_ERR_INV_LEN = 3,
+	AE4DMA_DMA_ERR_INV_SRC = 4,
+	AE4DMA_DMA_ERR_INV_DST = 5,
+	AE4DMA_DMA_ERR_INV_ALIGN = 6,
+	AE4DMA_DMA_ERR_UNKNOWN = 7,
+};
+
+/* HW Queue status */
+enum ae4dma_hwqueue_status {
+	AE4DMA_HWQUEUE_EMPTY = 0,
+	AE4DMA_HWQUEUE_FULL = 1,
+	AE4DMA_HWQUEUE_NOT_EMPTY = 4,
+};
+/*
+ * descriptor for AE4DMA commands
+ * 8 32-bit words:
+ * word 0: source memory type; destination memory type ; control bits
+ * word 1: desc_id; error code; status
+ * word 2: length
+ * word 3: reserved
+ * word 4: upper 32 bits of source pointer
+ * word 5: low 32 bits of source pointer
+ * word 6: upper 32 bits of destination pointer
+ * word 7: low 32 bits of destination pointer
+ */
+
+/* AE4DMA Descriptor - DWORD0 - Controls bits: Reserved for future use */
+#define AE4DMA_DWORD0_STOP_ON_COMPLETION	AE4DMA_BIT(0)
+#define AE4DMA_DWORD0_INTERRUPT_ON_COMPLETION	AE4DMA_BIT(1)
+#define AE4DMA_DWORD0_START_OF_MESSAGE		AE4DMA_BIT(3)
+#define AE4DMA_DWORD0_END_OF_MESSAGE		AE4DMA_BIT(4)
+#define AE4DMA_DWORD0_DESTINATION_MEMORY_TYPE	RTE_GENMASK64(5, 4)
+#define AE4DMA_DWORD0_SOURCE_MEMEORY_TYPE	RTE_GENMASK64(7, 6)
+
+#define AE4DMA_DWORD0_DESTINATION_MEMORY_TYPE_MEMORY    (0x0)
+#define AE4DMA_DWORD0_DESTINATION_MEMORY_TYPE_IOMEMORY  (1<<4)
+#define AE4DMA_DWORD0_SOURCE_MEMEORY_TYPE_MEMORY    (0x0)
+#define AE4DMA_DWORD0_SOURCE_MEMEORY_TYPE_IOMEMORY  (1<<6)
+
+struct ae4dma_desc_dword0 {
+	uint8_t byte0;
+	uint8_t byte1;
+	uint16_t timestamp;
+};
+
+struct ae4dma_desc_dword1 {
+	uint8_t status;
+	uint8_t err_code;
+	uint16_t desc_id;
+};
+
+struct ae4dma_desc {
+	struct ae4dma_desc_dword0 dw0;
+	struct ae4dma_desc_dword1 dw1;
+	uint32_t length;
+	uint32_t reserved;
+	uint32_t src_lo;
+	uint32_t src_hi;
+	uint32_t dst_lo;
+	uint32_t dst_hi;
+};
+
+/*
+ * Registers for each queue :4 bytes length
+ * Effective address : offset + reg
+ */
+struct ae4dma_hwq_regs {
+	union {
+		uint32_t control_raw;
+		struct {
+			uint32_t queue_enable: 1;
+			uint32_t reserved_internal: 31;
+		} control;
+	} control_reg;
+
+	union {
+		uint32_t status_raw;
+		struct {
+			uint32_t reserved0: 1;
+			/* 0–empty, 1–full, 2–stopped, 3–error , 4–Not Empty */
+			uint32_t queue_status: 2;
+			uint32_t reserved1: 21;
+			uint32_t interrupt_type: 4;
+			uint32_t reserved2: 4;
+		} status;
+	} status_reg;
+
+	uint32_t max_idx;
+	uint32_t read_idx;
+	uint32_t write_idx;
+
+	union {
+		uint32_t intr_status_raw;
+		struct {
+			uint32_t intr_status: 1;
+			uint32_t reserved: 31;
+		} intr_status;
+	} intr_status_reg;
+
+	uint32_t qbase_lo;
+	uint32_t qbase_hi;
+
+};
+
+#endif /* AE4DMA_HW_DEFS_H */
diff --git a/drivers/dma/ae4dma/ae4dma_internal.h b/drivers/dma/ae4dma/ae4dma_internal.h
new file mode 100644
index 0000000000..7f149c97b5
--- /dev/null
+++ b/drivers/dma/ae4dma/ae4dma_internal.h
@@ -0,0 +1,97 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2026 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#ifndef _AE4DMA_INTERNAL_H_
+#define _AE4DMA_INTERNAL_H_
+
+#include <stdint.h>
+
+#include "ae4dma_hw_defs.h"
+
+/* Return bits 32-63 of a 64-bit number. */
+#define upper_32_bits(n) ((uint32_t)(((n) >> 16) >> 16))
+
+/* Return bits 0-31 of a 64-bit number. */
+#define lower_32_bits(n) ((uint32_t)((n) & 0xffffffff))
+
+/* Hardware ring depth (slots per queue); must be power of two. */
+#define AE4DMA_DESCRIPTORS_PER_CMDQ	32
+#define AE4DMA_QUEUE_DESC_SIZE		sizeof(struct ae4dma_desc)
+#define AE4DMA_QUEUE_SIZE(n)		(AE4DMA_DESCRIPTORS_PER_CMDQ * (n))
+
+
+/* AE4DMA registers Write/Read */
+static inline void ae4dma_pci_reg_write(void *base, int offset,
+		uint32_t value)
+{
+	volatile void *reg_addr = ((uint8_t *)base + offset);
+
+	rte_write32((rte_cpu_to_le_32(value)), reg_addr);
+}
+
+static inline uint32_t ae4dma_pci_reg_read(void *base, int offset)
+{
+	volatile void *reg_addr = ((uint8_t *)base + offset);
+
+	return rte_le_to_cpu_32(rte_read32(reg_addr));
+}
+
+#define AE4DMA_READ_REG_OFFSET(hw_addr, reg_offset) \
+	ae4dma_pci_reg_read(hw_addr, reg_offset)
+
+#define AE4DMA_WRITE_REG_OFFSET(hw_addr, reg_offset, value) \
+	ae4dma_pci_reg_write(hw_addr, reg_offset, value)
+
+
+#define AE4DMA_READ_REG(hw_addr) \
+	ae4dma_pci_reg_read((void *)(uintptr_t)(hw_addr), 0)
+
+#define AE4DMA_WRITE_REG(hw_addr, value) \
+	ae4dma_pci_reg_write((void *)(uintptr_t)(hw_addr), 0, value)
+
+/* A structure describing an AE4DMA command queue. */
+struct __rte_cache_aligned ae4dma_cmd_queue {
+	char memz_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+	volatile struct ae4dma_hwq_regs *hwq_regs;
+
+	struct rte_dma_vchan_conf qcfg;
+	struct rte_dma_stats stats;
+	/* Queue address */
+	struct ae4dma_desc *qbase_desc;
+	void *qbase_addr;
+	rte_iova_t qbase_phys_addr;
+	enum ae4dma_dma_err status[AE4DMA_DESCRIPTORS_PER_CMDQ];
+	/* Queue identifier */
+	uint64_t id;    /* queue id */
+	uint64_t qidx;  /* queue index */
+	uint64_t qsize; /* queue size */
+	uint32_t ring_buff_count;
+	uint16_t next_read;
+	uint16_t next_write;
+	uint16_t last_write; /* Used to compute submitted count. */
+};
+
+/*
+ * One dmadev per AE4DMA hardware channel: probe creates AE4DMA_MAX_HW_QUEUES
+ * dmadevs per PCI function, each owning a single HW command queue.
+ */
+struct ae4dma_dmadev {
+	void *io_regs;
+	struct ae4dma_cmd_queue cmd_q; /* single HW queue owned by this dmadev */
+};
+
+
+extern int ae4dma_pmd_logtype;
+#define RTE_LOGTYPE_AE4DMA_PMD ae4dma_pmd_logtype
+
+#define AE4DMA_PMD_LOG(level, ...) \
+	RTE_LOG_LINE_PREFIX(level, AE4DMA_PMD, "%s(): ", __func__, __VA_ARGS__)
+
+#define AE4DMA_PMD_DEBUG(...)  AE4DMA_PMD_LOG(DEBUG, __VA_ARGS__)
+#define AE4DMA_PMD_INFO(...)   AE4DMA_PMD_LOG(INFO, __VA_ARGS__)
+#define AE4DMA_PMD_ERR(...)    AE4DMA_PMD_LOG(ERR, __VA_ARGS__)
+#define AE4DMA_PMD_WARN(...)   AE4DMA_PMD_LOG(WARNING, __VA_ARGS__)
+
+#endif /* _AE4DMA_INTERNAL_H_ */
diff --git a/drivers/dma/ae4dma/meson.build b/drivers/dma/ae4dma/meson.build
new file mode 100644
index 0000000000..e48ab0d561
--- /dev/null
+++ b/drivers/dma/ae4dma/meson.build
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Advanced Micro Devices, Inc. All rights reserved.
+
+build = dpdk_conf.has('RTE_ARCH_X86')
+reason = 'only supported on x86'
+sources = files('ae4dma_dmadev.c')
+deps += ['bus_pci', 'dmadev']
diff --git a/drivers/dma/meson.build b/drivers/dma/meson.build
index e0d94db967..c230ac5a06 100644
--- a/drivers/dma/meson.build
+++ b/drivers/dma/meson.build
@@ -2,6 +2,7 @@
 # Copyright 2021 HiSilicon Limited
 
 drivers = [
+        'ae4dma',
         'cnxk',
         'dpaa',
         'dpaa2',
diff --git a/usertools/dpdk-devbind.py b/usertools/dpdk-devbind.py
index 93f2383dff..7d09f155de 100755
--- a/usertools/dpdk-devbind.py
+++ b/usertools/dpdk-devbind.py
@@ -86,6 +86,9 @@
 cn9k_ree = {'Class': '08', 'Vendor': '177d', 'Device': 'a0f4',
             'SVendor': None, 'SDevice': None}
 
+amd_ae4dma = {'Class': '08', 'Vendor': '1022', 'Device': '149b',
+              'SVendor': None, 'SDevice': None}
+
 virtio_blk = {'Class': '01', 'Vendor': "1af4", 'Device': '1001,1042',
               'SVendor': None, 'SDevice': None}
 
@@ -95,7 +98,7 @@
 network_devices = [network_class, cavium_pkx, avp_vnic, ifpga_class]
 baseband_devices = [acceleration_class]
 crypto_devices = [encryption_class, intel_processor_class]
-dma_devices = [cnxk_dma, hisilicon_dma,
+dma_devices = [amd_ae4dma, cnxk_dma, hisilicon_dma,
                intel_idxd_gnrd, intel_idxd_dmr, intel_idxd_spr,
                intel_ioat_bdw, intel_ioat_icx, intel_ioat_skx,
                odm_dma]
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 2/3] dma/ae4dma: add control path operations
  2026-06-25 18:47   ` [PATCH v3 " Raghavendra Ningoji
  2026-06-25 18:47     ` [PATCH v3 1/3] dma/ae4dma: introduce " Raghavendra Ningoji
@ 2026-06-25 18:47     ` Raghavendra Ningoji
  2026-06-27  0:09       ` fengchengwen
  2026-06-25 18:47     ` [PATCH v3 3/3] dma/ae4dma: add data " Raghavendra Ningoji
  2 siblings, 1 reply; 24+ messages in thread
From: Raghavendra Ningoji @ 2026-06-25 18:47 UTC (permalink / raw)
  To: dev
  Cc: david.marchand, bruce.richardson, fengchengwen, Selwin.Sebastian,
	bhagyada.modali, rjarry, thomas, Raghavendra Ningoji

Implement the dmadev control path for the AMD AE4DMA PMD.

This commit adds:
 - dev_configure / vchan_setup: accept a single virtual channel per
   dmadev and clamp the requested ring size to the hardware maximum
   of 32 descriptors (rounded up to a power of two).
 - dev_start / dev_stop / dev_close: program the per-queue control
   register to enable/disable the hardware queue and release the
   descriptor ring memzone on close.
 - dev_info_get: advertise RTE_DMA_CAPA_MEM_TO_MEM and the fixed
   ring depth.
 - dev_dump: print the queue identifiers, ring layout and software
   completion counters.
 - stats_get / stats_reset: expose submitted / completed / errors
   counters maintained by the driver.
 - vchan_status: report IDLE / ACTIVE based on hardware read_idx vs
   write_idx, and HALTED_ERROR when the queue is not enabled.

The dmadev framework is wired through dev_ops in ae4dma_dmadev_create().

Signed-off-by: Raghavendra Ningoji <raghavendra.ningoji@amd.com>
---
 drivers/dma/ae4dma/ae4dma_dmadev.c | 211 +++++++++++++++++++++++++++++
 1 file changed, 211 insertions(+)

diff --git a/drivers/dma/ae4dma/ae4dma_dmadev.c b/drivers/dma/ae4dma/ae4dma_dmadev.c
index 3d82f86906..607f288623 100644
--- a/drivers/dma/ae4dma/ae4dma_dmadev.c
+++ b/drivers/dma/ae4dma/ae4dma_dmadev.c
@@ -53,6 +53,203 @@ ae4dma_queue_dma_zone_reserve(const char *queue_name,
 			socket_id, RTE_MEMZONE_IOVA_CONTIG, queue_size);
 }
 
+static int
+ae4dma_dev_configure(struct rte_dma_dev *dev __rte_unused,
+		const struct rte_dma_conf *dev_conf,
+		uint32_t conf_sz)
+{
+	if (sizeof(struct rte_dma_conf) != conf_sz)
+		return -EINVAL;
+
+	if (dev_conf->nb_vchans != 1)
+		return -EINVAL;
+
+	return 0;
+}
+
+/* Setup a virtual channel for AE4DMA, only 1 vchan is supported per dmadev. */
+static int
+ae4dma_vchan_setup(struct rte_dma_dev *dev, uint16_t vchan __rte_unused,
+		const struct rte_dma_vchan_conf *qconf, uint32_t qconf_sz)
+{
+	struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+	uint16_t max_desc = qconf->nb_desc;
+
+	if (sizeof(struct rte_dma_vchan_conf) != qconf_sz)
+		return -EINVAL;
+
+	if (max_desc < 2)
+		return -EINVAL;
+
+	if (!rte_is_power_of_2(max_desc))
+		max_desc = rte_align32pow2(max_desc);
+
+	if (max_desc > AE4DMA_DESCRIPTORS_PER_CMDQ) {
+		AE4DMA_PMD_DEBUG("DMA dev %u nb_desc clamped to %u",
+				dev->data->dev_id, AE4DMA_DESCRIPTORS_PER_CMDQ);
+		max_desc = AE4DMA_DESCRIPTORS_PER_CMDQ;
+	}
+
+	cmd_q->qcfg = *qconf;
+	cmd_q->qcfg.nb_desc = max_desc;
+
+	/* Ensure all counters are reset, if reconfiguring/restarting device. */
+	memset(&cmd_q->stats, 0, sizeof(cmd_q->stats));
+	return 0;
+}
+
+static int
+ae4dma_dev_start(struct rte_dma_dev *dev)
+{
+	struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+	uint16_t nb = cmd_q->qcfg.nb_desc;
+
+	if (nb == 0)
+		return -EBUSY;
+
+	/* Program ring depth expected by hardware. */
+	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->max_idx, nb);
+	return 0;
+}
+
+static int
+ae4dma_dev_stop(struct rte_dma_dev *dev)
+{
+	struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+
+	if (cmd_q->hwq_regs != NULL)
+		AE4DMA_WRITE_REG(&cmd_q->hwq_regs->control_reg.control_raw,
+				AE4DMA_CMD_QUEUE_DISABLE);
+	return 0;
+}
+
+static int
+ae4dma_dev_info_get(const struct rte_dma_dev *dev __rte_unused,
+		struct rte_dma_info *info, uint32_t size)
+{
+	if (size < sizeof(*info))
+		return -EINVAL;
+	info->dev_capa = RTE_DMA_CAPA_MEM_TO_MEM;
+	info->max_vchans = 1;
+	info->min_desc = 2;
+	info->max_desc = AE4DMA_DESCRIPTORS_PER_CMDQ;
+	info->nb_vchans = 1;
+	return 0;
+}
+
+static int
+ae4dma_dev_close(struct rte_dma_dev *dev)
+{
+	struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+
+	if (cmd_q->hwq_regs != NULL)
+		AE4DMA_WRITE_REG(&cmd_q->hwq_regs->control_reg.control_raw,
+				AE4DMA_CMD_QUEUE_DISABLE);
+
+	rte_memzone_free(cmd_q->mz);
+	cmd_q->mz = NULL;
+	cmd_q->qbase_desc = NULL;
+	cmd_q->qbase_addr = NULL;
+	cmd_q->qbase_phys_addr = 0;
+	return 0;
+}
+
+static int
+ae4dma_dev_dump(const struct rte_dma_dev *dev, FILE *f)
+{
+	struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+	struct ae4dma_cmd_queue *cmd_q;
+	void *ae4dma_mmio_base_addr = (uint8_t *)ae4dma->io_regs;
+
+	cmd_q = &ae4dma->cmd_q;
+	fprintf(f, "cmd_q->id              = %" PRIx64 "\n", cmd_q->id);
+	fprintf(f, "cmd_q->qidx            = %" PRIx64 "\n", cmd_q->qidx);
+	fprintf(f, "cmd_q->qsize           = %" PRIx64 "\n", cmd_q->qsize);
+	fprintf(f, "mmio_base_addr	= %p\n", ae4dma_mmio_base_addr);
+	fprintf(f, "queues per ae4dma engine     = %d\n", AE4DMA_READ_REG_OFFSET(
+				ae4dma_mmio_base_addr, AE4DMA_COMMON_CONFIG_OFFSET));
+	fprintf(f, "== Private Data ==\n");
+	fprintf(f, "  Config: { ring_size: %u }\n", cmd_q->qcfg.nb_desc);
+	fprintf(f, "  Ring virt: %p\tphys: %#" PRIx64 "\n",
+			(void *)cmd_q->qbase_desc,
+			(uint64_t)cmd_q->qbase_phys_addr);
+	fprintf(f, "  Next write: %u\n", cmd_q->next_write);
+	fprintf(f, "  Next read: %u\n", cmd_q->next_read);
+	fprintf(f, "  current queue depth: %u\n", cmd_q->ring_buff_count);
+	fprintf(f, "  }\n");
+	fprintf(f, "  Key Stats { submitted: %" PRIu64 ", comp: %" PRIu64 ", failed: %" PRIu64 " }\n",
+		cmd_q->stats.submitted,
+		cmd_q->stats.completed,
+		cmd_q->stats.errors);
+	return 0;
+}
+static int
+ae4dma_stats_get(const struct rte_dma_dev *dev, uint16_t vchan __rte_unused,
+		struct rte_dma_stats *rte_stats, uint32_t size)
+{
+	const struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+	const struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+	const struct rte_dma_stats *stats = &cmd_q->stats;
+
+	if (size < sizeof(*rte_stats))
+		return -EINVAL;
+	if (rte_stats == NULL)
+		return -EINVAL;
+
+	*rte_stats = *stats;
+	return 0;
+}
+
+static int
+ae4dma_stats_reset(struct rte_dma_dev *dev, uint16_t vchan __rte_unused)
+{
+	struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+
+	memset(&cmd_q->stats, 0, sizeof(cmd_q->stats));
+	return 0;
+}
+
+/*
+ * Report channel state to the dmadev framework.
+ *
+ *   RTE_DMA_VCHAN_HALTED_ERROR - HW queue is disabled (never started, or
+ *                                stopped via dev_stop()).
+ *   RTE_DMA_VCHAN_IDLE         - HW has caught up: read_idx == write_idx,
+ *                                no descriptors in flight.
+ *   RTE_DMA_VCHAN_ACTIVE       - HW still has descriptors to process.
+ */
+static int
+ae4dma_vchan_status(const struct rte_dma_dev *dev, uint16_t vchan __rte_unused,
+		enum rte_dma_vchan_status *status)
+{
+	const struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
+	const struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+	uint32_t ctrl, hw_read, hw_write;
+
+	if (cmd_q->hwq_regs == NULL) {
+		*status = RTE_DMA_VCHAN_HALTED_ERROR;
+		return 0;
+	}
+
+	ctrl = AE4DMA_READ_REG(&cmd_q->hwq_regs->control_reg.control_raw);
+	if ((ctrl & AE4DMA_CMD_QUEUE_ENABLE) == 0) {
+		*status = RTE_DMA_VCHAN_HALTED_ERROR;
+		return 0;
+	}
+
+	hw_read  = AE4DMA_READ_REG(&cmd_q->hwq_regs->read_idx);
+	hw_write = AE4DMA_READ_REG(&cmd_q->hwq_regs->write_idx);
+
+	*status = (hw_read == hw_write) ? RTE_DMA_VCHAN_IDLE
+					: RTE_DMA_VCHAN_ACTIVE;
+	return 0;
+}
+
 static int
 ae4dma_add_queue(struct ae4dma_dmadev *dev, struct rte_pci_device *pci,
 		uint8_t qn, const char *pci_name)
@@ -115,6 +312,19 @@ ae4dma_channel_dev_name(char *out, size_t outlen, const char *pci_name,
 static int
 ae4dma_dmadev_create(const char *name, struct rte_pci_device *dev, uint8_t qn)
 {
+	static const struct rte_dma_dev_ops ae4dma_dmadev_ops = {
+		.dev_close = ae4dma_dev_close,
+		.dev_configure = ae4dma_dev_configure,
+		.dev_dump = ae4dma_dev_dump,
+		.dev_info_get = ae4dma_dev_info_get,
+		.dev_start = ae4dma_dev_start,
+		.dev_stop = ae4dma_dev_stop,
+		.stats_get = ae4dma_stats_get,
+		.stats_reset = ae4dma_stats_reset,
+		.vchan_status = ae4dma_vchan_status,
+		.vchan_setup = ae4dma_vchan_setup,
+	};
+
 	struct rte_dma_dev *dmadev;
 	struct ae4dma_dmadev *ae4dma;
 	char hwq_dev_name[RTE_DEV_NAME_MAX_LEN];
@@ -130,6 +340,7 @@ ae4dma_dmadev_create(const char *name, struct rte_pci_device *dev, uint8_t qn)
 	}
 	dmadev->device = &dev->device;
 	dmadev->fp_obj->dev_private = dmadev->data->dev_private;
+	dmadev->dev_ops = &ae4dma_dmadev_ops;
 
 	ae4dma = dmadev->data->dev_private;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 3/3] dma/ae4dma: add data path operations
  2026-06-25 18:47   ` [PATCH v3 " Raghavendra Ningoji
  2026-06-25 18:47     ` [PATCH v3 1/3] dma/ae4dma: introduce " Raghavendra Ningoji
  2026-06-25 18:47     ` [PATCH v3 2/3] dma/ae4dma: add control path operations Raghavendra Ningoji
@ 2026-06-25 18:47     ` Raghavendra Ningoji
  2026-06-27  0:23       ` fengchengwen
  2 siblings, 1 reply; 24+ messages in thread
From: Raghavendra Ningoji @ 2026-06-25 18:47 UTC (permalink / raw)
  To: dev
  Cc: david.marchand, bruce.richardson, fengchengwen, Selwin.Sebastian,
	bhagyada.modali, rjarry, thomas, Raghavendra Ningoji

Implement the dmadev fast path for the AMD AE4DMA PMD.

This commit adds:
 - copy enqueue (rte_dma_copy): write an AE4DMA descriptor for a
   memory-to-memory transfer; on RTE_DMA_OP_FLAG_SUBMIT the doorbell
   is rung immediately.
 - submit (rte_dma_submit): advance the per-queue write_idx
   register to expose pending descriptors to the hardware.
 - completion (rte_dma_completed / rte_dma_completed_status):
   completion is detected via the hardware's per-queue read_idx
   register, which the engine advances as it processes descriptors.
   The descriptor status / err_code bytes are read only to classify
   each drained slot as success or failure, and HW error codes are
   translated to the dmadev RTE_DMA_STATUS_* enumeration.
 - burst capacity (rte_dma_burst_capacity): report the number of
   free descriptor slots, taking into account the one slot reserved
   to distinguish full from empty on the power-of-two ring.

The fast path entry points are wired through fp_obj in
ae4dma_dmadev_create(). The fill capability is not advertised;
fp_obj->fill is left zero-initialised.

Signed-off-by: Raghavendra Ningoji <raghavendra.ningoji@amd.com>
---
 doc/guides/dmadevs/ae4dma.rst      |  22 +++
 drivers/dma/ae4dma/ae4dma_dmadev.c | 287 +++++++++++++++++++++++++++++
 2 files changed, 309 insertions(+)

diff --git a/doc/guides/dmadevs/ae4dma.rst b/doc/guides/dmadevs/ae4dma.rst
index a85c1d92ca..37a2096ccf 100644
--- a/doc/guides/dmadevs/ae4dma.rst
+++ b/doc/guides/dmadevs/ae4dma.rst
@@ -51,3 +51,25 @@ On probe the PMD performs the following steps for each PCI function:
   IOVA-contiguous memory, programs the queue base address and ring
   depth into the per-queue registers, and enables the queue.
 * Interrupts are masked; completion is polled by the application.
+
+Usage
+-----
+
+Once a dmadev has been started, copies are submitted with
+``rte_dma_copy()`` and completions are reaped with ``rte_dma_completed()``
+or ``rte_dma_completed_status()``. See the
+:ref:`Enqueue / Dequeue API <dmadev_enqueue_dequeue>` section of the
+dmadev library documentation for details.
+
+Limitations
+-----------
+
+* Only memory-to-memory copies are supported. Fill, scatter-gather and
+  any other operation types are not advertised in
+  ``rte_dma_info::dev_capa``.
+* The maximum number of descriptors per virtual channel is fixed by
+  hardware at 32. The PMD rounds the requested ring size up to a
+  power of two and clamps it to 32.
+* Only a single virtual channel per dmadev is supported; use the 16
+  per-PCI-function dmadevs to obtain channel-level parallelism.
+* Interrupt-driven completion is not supported.
diff --git a/drivers/dma/ae4dma/ae4dma_dmadev.c b/drivers/dma/ae4dma/ae4dma_dmadev.c
index 607f288623..da3ec42233 100644
--- a/drivers/dma/ae4dma/ae4dma_dmadev.c
+++ b/drivers/dma/ae4dma/ae4dma_dmadev.c
@@ -158,6 +158,72 @@ ae4dma_dev_close(struct rte_dma_dev *dev)
 	return 0;
 }
 
+/* trigger h/w to process enqued desc:doorbell - by next_write */
+static inline void
+__submit(struct ae4dma_dmadev *ae4dma)
+{
+	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+	uint16_t write_idx = cmd_q->next_write;
+	uint16_t nb = cmd_q->qcfg.nb_desc;
+
+	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->write_idx, write_idx);
+	if (nb != 0)
+		cmd_q->stats.submitted += (uint16_t)((cmd_q->next_write - cmd_q->last_write +
+				nb) % nb);
+	cmd_q->last_write = cmd_q->next_write;
+}
+
+static int
+ae4dma_submit(void *dev_private, uint16_t vchan __rte_unused)
+{
+	struct ae4dma_dmadev *ae4dma = dev_private;
+
+	__submit(ae4dma);
+	return 0;
+}
+
+/* Write descriptor for enqueue (copy only). */
+static inline int
+__write_desc_copy(void *dev_private, rte_iova_t src, rte_iova_t dst,
+		uint32_t len, uint64_t flags)
+{
+	struct ae4dma_dmadev *ae4dma = dev_private;
+	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+	struct ae4dma_desc *dma_desc;
+	uint16_t ret;
+	uint16_t nb = cmd_q->qcfg.nb_desc;
+	uint16_t write = cmd_q->next_write;
+
+	if (nb == 0)
+		return -EINVAL;
+
+	/* Reserve one slot to distinguish full from empty (power-of-two ring). */
+	if ((uint32_t)cmd_q->ring_buff_count >= (uint32_t)(nb - 1))
+		return -ENOSPC;
+
+	dma_desc = &cmd_q->qbase_desc[write];
+	memset(dma_desc, 0, sizeof(*dma_desc));
+	dma_desc->length = len;
+	dma_desc->src_hi = upper_32_bits(src);
+	dma_desc->src_lo = lower_32_bits(src);
+	dma_desc->dst_hi = upper_32_bits(dst);
+	dma_desc->dst_lo = lower_32_bits(dst);
+	cmd_q->ring_buff_count++;
+	cmd_q->next_write = (uint16_t)((write + 1) % nb);
+	ret = write;
+	if (flags & RTE_DMA_OP_FLAG_SUBMIT)
+		__submit(ae4dma);
+	return ret;
+}
+
+/* Enqueue a copy operation onto the ae4dma device. */
+static int
+ae4dma_enqueue_copy(void *dev_private, uint16_t vchan __rte_unused,
+		rte_iova_t src, rte_iova_t dst, uint32_t length, uint64_t flags)
+{
+	return __write_desc_copy(dev_private, src, dst, length, flags);
+}
+
 static int
 ae4dma_dev_dump(const struct rte_dma_dev *dev, FILE *f)
 {
@@ -187,6 +253,220 @@ ae4dma_dev_dump(const struct rte_dma_dev *dev, FILE *f)
 		cmd_q->stats.errors);
 	return 0;
 }
+
+/* Translates AE4DMA ChanERRs to DMA error codes. */
+static inline enum rte_dma_status_code
+__translate_status_ae4dma_to_dma(enum ae4dma_dma_err status)
+{
+	AE4DMA_PMD_DEBUG("ae4dma desc status = %d", status);
+
+	switch (status) {
+	case AE4DMA_DMA_ERR_NO_ERR:
+		return RTE_DMA_STATUS_SUCCESSFUL;
+	case AE4DMA_DMA_ERR_INV_LEN:
+		return RTE_DMA_STATUS_INVALID_LENGTH;
+	case AE4DMA_DMA_ERR_INV_SRC:
+		return RTE_DMA_STATUS_INVALID_SRC_ADDR;
+	case AE4DMA_DMA_ERR_INV_DST:
+		return RTE_DMA_STATUS_INVALID_DST_ADDR;
+	case AE4DMA_DMA_ERR_INV_ALIGN:
+		/* Name matches DPDK public enum spelling. */
+		return RTE_DMA_STATUS_DATA_POISION;
+	case AE4DMA_DMA_ERR_INV_HEADER:
+	case AE4DMA_DMA_ERR_INV_STATUS:
+		return RTE_DMA_STATUS_ERROR_UNKNOWN;
+	default:
+		return RTE_DMA_STATUS_ERROR_UNKNOWN;
+	}
+}
+
+/*
+ * Scan HW queue for completed descriptors (non-blocking).
+ *
+ * The AE4DMA engine signals completion by advancing the per-queue
+ * `read_idx` register; it does not (reliably) write a status value
+ * back into the descriptor. We therefore use the HW `read_idx`
+ * register as the source of truth and only inspect the descriptor's
+ * `dw1.err_code` byte to classify each completion as success or
+ * failure.
+ *
+ * @param cmd_q
+ *   The AE4DMA command queue.
+ * @param max_ops
+ *   Maximum descriptors to process this call.
+ * @param[out] failed_count
+ *   Number of completed descriptors that did not report success.
+ * @return
+ *   Number of descriptors completed (success + failure), <= max_ops.
+ */
+static inline uint16_t
+ae4dma_scan_hwq(struct ae4dma_cmd_queue *cmd_q, uint16_t max_ops,
+		uint16_t *failed_count)
+{
+	volatile struct ae4dma_desc *hw_desc;
+	uint16_t events_count = 0, fails = 0;
+	uint16_t tail;
+	uint16_t nb = cmd_q->qcfg.nb_desc;
+	uint16_t mask;
+	uint16_t hw_read_idx;
+	uint16_t in_flight;
+	uint16_t scan_cap;
+
+	if (nb == 0 || cmd_q->ring_buff_count == 0) {
+		*failed_count = 0;
+		return 0;
+	}
+	mask = nb - 1;
+
+	hw_read_idx = (uint16_t)(AE4DMA_READ_REG(&cmd_q->hwq_regs->read_idx) & mask);
+	tail = cmd_q->next_read;
+
+	/*
+	 * Descriptors completed since our last visit live in the
+	 * half-open ring range [tail, hw_read_idx). If HW hasn't
+	 * moved we have nothing to do.
+	 */
+	in_flight = (uint16_t)((hw_read_idx - tail) & mask);
+	if (in_flight == 0) {
+		*failed_count = 0;
+		return 0;
+	}
+
+	scan_cap = max_ops;
+	if (scan_cap > AE4DMA_DESCRIPTORS_PER_CMDQ)
+		scan_cap = AE4DMA_DESCRIPTORS_PER_CMDQ;
+	if (scan_cap > in_flight)
+		scan_cap = in_flight;
+	if (scan_cap > cmd_q->ring_buff_count)
+		scan_cap = (uint16_t)cmd_q->ring_buff_count;
+
+	while (events_count < scan_cap) {
+		uint8_t hw_status;
+		uint8_t hw_err;
+
+		hw_desc = &cmd_q->qbase_desc[tail];
+		hw_status = hw_desc->dw1.status;
+		hw_err = hw_desc->dw1.err_code;
+
+		/*
+		 * read_idx advancing is the definitive completion
+		 * signal. The per-descriptor status byte is informational
+		 * and may not yet be written when we observe it:
+		 *
+		 *   AE4DMA_DMA_DESC_ERROR (4)
+		 *     Hard failure - err_code names the precise cause.
+		 *   AE4DMA_DMA_DESC_COMPLETED (3) or 0
+		 *     Success.
+		 *   AE4DMA_DMA_DESC_VALIDATED (1) / _PROCESSED (2)
+		 *     Benign race: HW had not finished updating the
+		 *     status byte at the instant we read it. Since
+		 *     read_idx has moved past this slot, treat it as
+		 *     success unless err_code says otherwise.
+		 *
+		 * A non-zero err_code is treated as a failure regardless
+		 * of the observed status value.
+		 */
+		if (hw_status == AE4DMA_DMA_DESC_ERROR ||
+				hw_err != AE4DMA_DMA_ERR_NO_ERR) {
+			fails++;
+			AE4DMA_PMD_WARN("Desc failed: status=%u err=%u",
+					hw_status, hw_err);
+		}
+		cmd_q->status[events_count] = (enum ae4dma_dma_err)hw_err;
+		cmd_q->ring_buff_count--;
+		events_count++;
+		tail = (tail + 1) & mask;
+	}
+
+	cmd_q->stats.completed += events_count;
+	cmd_q->stats.errors += fails;
+	cmd_q->next_read = tail;
+	*failed_count = fails;
+	return events_count;
+}
+
+/* Returns successful operations count and sets error flag if any errors. */
+static uint16_t
+ae4dma_completed(void *dev_private, uint16_t vchan __rte_unused,
+		const uint16_t max_ops, uint16_t *last_idx, bool *has_error)
+{
+	struct ae4dma_dmadev *ae4dma = dev_private;
+	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+	uint16_t cpl_count, sl_count;
+	uint16_t err_count = 0;
+	uint16_t nb = cmd_q->qcfg.nb_desc;
+
+	*has_error = false;
+
+	cpl_count = ae4dma_scan_hwq(cmd_q, max_ops, &err_count);
+
+	if (cpl_count > max_ops)
+		cpl_count = max_ops;
+
+	if (cpl_count > 0 && last_idx != NULL)
+		*last_idx = (uint16_t)((cmd_q->next_read - 1 + nb) % nb);
+
+	sl_count = cpl_count - err_count;
+	if (err_count)
+		*has_error = true;
+
+	return sl_count;
+}
+
+static uint16_t
+ae4dma_completed_status(void *dev_private, uint16_t vchan __rte_unused,
+		uint16_t max_ops, uint16_t *last_idx,
+		enum rte_dma_status_code *status)
+{
+	struct ae4dma_dmadev *ae4dma = dev_private;
+	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+	uint16_t cpl_count;
+	uint16_t i;
+	uint16_t err_count = 0;
+	uint16_t nb = cmd_q->qcfg.nb_desc;
+
+	cpl_count = ae4dma_scan_hwq(cmd_q, max_ops, &err_count);
+
+	if (cpl_count > max_ops)
+		cpl_count = max_ops;
+
+	if (cpl_count > 0 && last_idx != NULL)
+		*last_idx = (uint16_t)((cmd_q->next_read - 1 + nb) % nb);
+
+	if (likely(err_count == 0)) {
+		for (i = 0; i < cpl_count; i++)
+			status[i] = RTE_DMA_STATUS_SUCCESSFUL;
+	} else {
+		for (i = 0; i < cpl_count; i++)
+			status[i] = __translate_status_ae4dma_to_dma(cmd_q->status[i]);
+	}
+
+	return cpl_count;
+}
+
+/* Get the remaining capacity of the ring. */
+static uint16_t
+ae4dma_burst_capacity(const void *dev_private, uint16_t vchan __rte_unused)
+{
+	const struct ae4dma_dmadev *ae4dma = dev_private;
+	const struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
+	uint16_t nb = cmd_q->qcfg.nb_desc;
+	uint16_t mask;
+	uint16_t read_idx = cmd_q->next_read;
+	uint16_t write_idx = cmd_q->next_write;
+	uint16_t used;
+
+	if (nb < 2 || !rte_is_power_of_2(nb))
+		return 0;
+
+	mask = nb - 1;
+	used = (uint16_t)((write_idx - read_idx) & mask);
+	/* One slot reserved (same rule as enqueue). */
+	if (used >= nb - 1)
+		return 0;
+	return (uint16_t)(nb - 1 - used);
+}
+
 static int
 ae4dma_stats_get(const struct rte_dma_dev *dev, uint16_t vchan __rte_unused,
 		struct rte_dma_stats *rte_stats, uint32_t size)
@@ -342,6 +622,13 @@ ae4dma_dmadev_create(const char *name, struct rte_pci_device *dev, uint8_t qn)
 	dmadev->fp_obj->dev_private = dmadev->data->dev_private;
 	dmadev->dev_ops = &ae4dma_dmadev_ops;
 
+	dmadev->fp_obj->burst_capacity = ae4dma_burst_capacity;
+	dmadev->fp_obj->completed = ae4dma_completed;
+	dmadev->fp_obj->completed_status = ae4dma_completed_status;
+	dmadev->fp_obj->copy = ae4dma_enqueue_copy;
+	dmadev->fp_obj->submit = ae4dma_submit;
+	/* fill capability not advertised: leave fp_obj->fill as zero-initialised. */
+
 	ae4dma = dmadev->data->dev_private;
 
 	if (ae4dma_add_queue(ae4dma, dev, qn, name) != 0)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/3] dma/ae4dma: introduce AMD AE4DMA DMA PMD
  2026-06-25 18:47     ` [PATCH v3 1/3] dma/ae4dma: introduce " Raghavendra Ningoji
@ 2026-06-27  0:01       ` fengchengwen
  0 siblings, 0 replies; 24+ messages in thread
From: fengchengwen @ 2026-06-27  0:01 UTC (permalink / raw)
  To: Raghavendra Ningoji, dev
  Cc: david.marchand, bruce.richardson, Selwin.Sebastian,
	bhagyada.modali, rjarry, thomas

On 6/26/2026 2:47 AM, Raghavendra Ningoji wrote:
> Add the skeleton of a new dmadev poll-mode driver for the AMD AE4DMA
> hardware DMA engine, providing only PCI probe/remove and per-queue
> hardware initialisation. An AE4DMA engine exposes 16 hardware command
> queues, each with a 32-entry descriptor ring; the PMD maps each
> hardware channel to its own dmadev with a single virtual channel,
> so a PCI function appears as 16 dmadevs named "<pci-bdf>-ch0" ..
> "<pci-bdf>-ch15".
> 
> This patch only registers the PCI driver, allocates the dmadev
> objects, reserves the per-queue descriptor rings and programs the
> hardware queue base addresses. Control and data path operations are
> added in subsequent patches.
> 
> Signed-off-by: Raghavendra Ningoji <raghavendra.ningoji@amd.com>
> ---
>  .mailmap                               |   1 +
>  MAINTAINERS                            |   5 +
>  doc/guides/dmadevs/ae4dma.rst          |  53 ++++++
>  doc/guides/dmadevs/index.rst           |   1 +
>  doc/guides/rel_notes/release_26_07.rst |   7 +
>  drivers/dma/ae4dma/ae4dma_dmadev.c     | 220 +++++++++++++++++++++++++
>  drivers/dma/ae4dma/ae4dma_hw_defs.h    | 154 +++++++++++++++++
>  drivers/dma/ae4dma/ae4dma_internal.h   |  97 +++++++++++
>  drivers/dma/ae4dma/meson.build         |   7 +
>  drivers/dma/meson.build                |   1 +
>  usertools/dpdk-devbind.py              |   5 +-
>  11 files changed, 550 insertions(+), 1 deletion(-)
>  create mode 100644 doc/guides/dmadevs/ae4dma.rst
>  create mode 100644 drivers/dma/ae4dma/ae4dma_dmadev.c
>  create mode 100644 drivers/dma/ae4dma/ae4dma_hw_defs.h
>  create mode 100644 drivers/dma/ae4dma/ae4dma_internal.h
>  create mode 100644 drivers/dma/ae4dma/meson.build
> 
> diff --git a/.mailmap b/.mailmap
> index 89ba6ffccc..71a62564fa 100644
> --- a/.mailmap
> +++ b/.mailmap
> @@ -1329,6 +1329,7 @@ Radu Bulie <radu-andrei.bulie@nxp.com>
>  Radu Nicolau <radu.nicolau@intel.com>
>  Rafael Ávila de Espíndola <espindola@scylladb.com>
>  Rafal Kozik <rk@semihalf.com>
> +Raghavendra Ningoji <raghavendra.ningoji@amd.com>
>  Ragothaman Jayaraman <rjayaraman@caviumnetworks.com>
>  Rahul Bhansali <rbhansali@marvell.com>
>  Rahul Gupta <rahul.gupta@broadcom.com>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 9143d028bc..2e27af49f4 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1361,6 +1361,11 @@ F: doc/guides/compressdevs/features/zsda.ini
>  DMAdev Drivers
>  --------------
>  
> +AMD AE4DMA
> +M: Bhagyada Modali <bhagyada.modali@amd.com>
> +F: drivers/dma/ae4dma/
> +F: doc/guides/dmadevs/ae4dma.rst
> +
>  Intel IDXD - EXPERIMENTAL
>  M: Bruce Richardson <bruce.richardson@intel.com>
>  M: Kevin Laatz <kevin.laatz@intel.com>
> diff --git a/doc/guides/dmadevs/ae4dma.rst b/doc/guides/dmadevs/ae4dma.rst
> new file mode 100644
> index 0000000000..a85c1d92ca
> --- /dev/null
> +++ b/doc/guides/dmadevs/ae4dma.rst
> @@ -0,0 +1,53 @@
> +..  SPDX-License-Identifier: BSD-3-Clause
> +    Copyright(c) 2025 Advanced Micro Devices, Inc.

2025 -> 2026?

> +
> +.. include:: <isonum.txt>
> +
> +AMD AE4DMA DMA Device Driver
> +============================
> +
> +The ``ae4dma`` dmadev driver is a poll-mode driver (PMD) for the
> +AMD AE4DMA hardware DMA engine. The engine exposes 16 independent
> +hardware command queues, each with a ring of 32 descriptors. The PMD
> +maps each hardware command queue to a separate DPDK dmadev with a
> +single virtual channel, so a single PCI function appears as 16 dmadevs
> +named ``<pci-bdf>-ch0`` through ``<pci-bdf>-ch15``.
> +
> +The driver supports memory-to-memory copy operations only.
> +
> +Hardware Requirements
> +---------------------
> +
> +The ``dpdk-devbind.py`` script can be used to list AE4DMA devices on
> +the system::
> +
> +   dpdk-devbind.py --status-dev dma
> +
> +AE4DMA devices appear with vendor ID ``0x1022`` and device ID
> +``0x149b``.
> +
> +Compilation
> +-----------
> +
> +The driver is built as part of the standard DPDK build on x86 platforms
> +using ``meson`` and ``ninja``; no extra configuration is required.
> +
> +Device Setup
> +------------
> +
> +The AE4DMA device must be bound to a DPDK-compatible kernel module such
> +as ``vfio-pci`` before it can be used::
> +
> +   dpdk-devbind.py -b vfio-pci <pci-bdf>
> +
> +Initialization
> +~~~~~~~~~~~~~~
> +
> +On probe the PMD performs the following steps for each PCI function:
> +
> +* Reads BAR0 and programs the common configuration register with the
> +  number of hardware queues to enable (16).
> +* For each hardware queue it allocates a 32-entry descriptor ring in
> +  IOVA-contiguous memory, programs the queue base address and ring
> +  depth into the per-queue registers, and enables the queue.
> +* Interrupts are masked; completion is polled by the application.
> diff --git a/doc/guides/dmadevs/index.rst b/doc/guides/dmadevs/index.rst
> index 56beb1733f..97399590f6 100644
> --- a/doc/guides/dmadevs/index.rst
> +++ b/doc/guides/dmadevs/index.rst
> @@ -11,6 +11,7 @@ an application through DMA API.
>     :maxdepth: 1
>     :numbered:
>  
> +   ae4dma
>     cnxk
>     dpaa
>     dpaa2
> diff --git a/doc/guides/rel_notes/release_26_07.rst b/doc/guides/rel_notes/release_26_07.rst
> index f012d47a4b..9a78a7ef62 100644
> --- a/doc/guides/rel_notes/release_26_07.rst
> +++ b/doc/guides/rel_notes/release_26_07.rst
> @@ -63,6 +63,13 @@ New Features
>      ``rte_eal_init`` and the application is responsible for probing each device,
>    * ``--auto-probing`` enables the initial bus probing, which is the current default behavior.
>  
> +* **Added AMD AE4DMA DMA PMD.**
> +
> +  Added a new ``dma/ae4dma`` driver for the AMD AE4DMA hardware DMA engine.
> +  Each PCI function exposes 16 hardware command queues; the PMD registers one
> +  dmadev per channel with a single virtual channel and supports
> +  memory-to-memory copy operations.
> +
>  
>  Removed Items
>  -------------
> diff --git a/drivers/dma/ae4dma/ae4dma_dmadev.c b/drivers/dma/ae4dma/ae4dma_dmadev.c
> new file mode 100644
> index 0000000000..3d82f86906
> --- /dev/null
> +++ b/drivers/dma/ae4dma/ae4dma_dmadev.c
> @@ -0,0 +1,220 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2026 Advanced Micro Devices, Inc. All rights reserved.
> + */
> +
> +#include <errno.h>
> +#include <inttypes.h>
> +#include <stdio.h>
> +#include <string.h>
> +
> +#include <rte_bus_pci.h>
> +#include <bus_pci_driver.h>
> +#include <rte_dmadev_pmd.h>
> +#include <rte_malloc.h>
> +
> +#include "ae4dma_internal.h"
> +
> +/*
> + * One dmadev per AE4DMA hardware channel; each dmadev has exactly one
> + * virtual channel. The HW's per-queue register block must be densely
> + * packed right after the engine-common config register at BAR0+0; the
> + * build-time check below catches an accidental layout change.
> + */
> +static_assert(sizeof(struct ae4dma_hwq_regs) == 32,
> +		"ae4dma_hwq_regs stride changed; per-queue offset math will break");
> +
> +RTE_LOG_REGISTER_DEFAULT(ae4dma_pmd_logtype, INFO);
> +
> +#define AE4DMA_PMD_NAME dmadev_ae4dma
> +
> +static const struct rte_memzone *
> +ae4dma_queue_dma_zone_reserve(const char *queue_name,
> +		uint32_t queue_size, int socket_id)
> +{
> +	const struct rte_memzone *mz;
> +
> +	mz = rte_memzone_lookup(queue_name);
> +	if (mz != NULL) {
> +		if (((size_t)queue_size <= mz->len) &&
> +				((socket_id == SOCKET_ID_ANY) ||
> +				 (socket_id == mz->socket_id))) {
> +			AE4DMA_PMD_INFO("reuse memzone already "
> +					"allocated for %s", queue_name);
> +			return mz;
> +		}
> +		AE4DMA_PMD_ERR("Incompatible memzone already "
> +				"allocated %s, size %u, socket %d. "
> +				"Requested size %u, socket %u",
> +				queue_name, (uint32_t)mz->len,
> +				mz->socket_id, queue_size, socket_id);
> +		return NULL;
> +	}
> +	return rte_memzone_reserve_aligned(queue_name, queue_size,
> +			socket_id, RTE_MEMZONE_IOVA_CONTIG, queue_size);

No need to do such reuse, and this resource could setup in vchan_setup ops,
but your dmadev has max 32 descriptors and only 1 vchan per-dmadev, so I think
it's ok to setup in the probe.

> +}
> +
> +static int
> +ae4dma_add_queue(struct ae4dma_dmadev *dev, struct rte_pci_device *pci,
> +		uint8_t qn, const char *pci_name)
> +{
> +	uint32_t dma_addr_lo, dma_addr_hi;
> +	struct ae4dma_cmd_queue *cmd_q;
> +	const struct rte_memzone *q_mz;
> +
> +	dev->io_regs = pci->mem_resource[AE4DMA_PCIE_BAR].addr;
> +
> +	cmd_q = &dev->cmd_q;
> +	cmd_q->id = qn;
> +	cmd_q->qidx = 0;
> +	cmd_q->qsize = AE4DMA_QUEUE_SIZE(AE4DMA_QUEUE_DESC_SIZE);
> +	cmd_q->hwq_regs = (volatile struct ae4dma_hwq_regs *)dev->io_regs + (qn + 1);
> +
> +	/*
> +	 * Memzone name must be globally unique. Embed PCI BDF so multiple
> +	 * PCI functions probed concurrently don't collide.
> +	 */
> +	snprintf(cmd_q->memz_name, sizeof(cmd_q->memz_name),
> +			"ae4dma_%s_q%u", pci_name, (unsigned int)qn);
> +
> +	q_mz = ae4dma_queue_dma_zone_reserve(cmd_q->memz_name,
> +			cmd_q->qsize, rte_socket_id());
> +	if (q_mz == NULL) {
> +		AE4DMA_PMD_ERR("memzone reserve failed for %s", cmd_q->memz_name);
> +		return -ENOMEM;
> +	}
> +
> +	cmd_q->mz = q_mz;
> +	cmd_q->qbase_addr = q_mz->addr;
> +	cmd_q->qbase_desc = q_mz->addr;
> +	cmd_q->qbase_phys_addr = q_mz->iova;
> +
> +	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->max_idx, AE4DMA_DESCRIPTORS_PER_CMDQ);
> +	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->control_reg.control_raw,
> +			AE4DMA_CMD_QUEUE_ENABLE);
> +	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->intr_status_reg.intr_status_raw,
> +			AE4DMA_DISABLE_INTR);
> +	cmd_q->next_write = AE4DMA_READ_REG(&cmd_q->hwq_regs->write_idx);
> +	cmd_q->next_read = AE4DMA_READ_REG(&cmd_q->hwq_regs->read_idx);
> +	cmd_q->ring_buff_count = 0;
> +
> +	dma_addr_lo = lower_32_bits(cmd_q->qbase_phys_addr);
> +	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->qbase_lo, dma_addr_lo);
> +	dma_addr_hi = upper_32_bits(cmd_q->qbase_phys_addr);
> +	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->qbase_hi, dma_addr_hi);
> +
> +	return 0;
> +}
> +
> +static void
> +ae4dma_channel_dev_name(char *out, size_t outlen, const char *pci_name,
> +		unsigned int ch)
> +{
> +	snprintf(out, outlen, "%s-ch%u", pci_name, ch);
> +}
> +
> +static int
> +ae4dma_dmadev_create(const char *name, struct rte_pci_device *dev, uint8_t qn)
> +{
> +	struct rte_dma_dev *dmadev;
> +	struct ae4dma_dmadev *ae4dma;
> +	char hwq_dev_name[RTE_DEV_NAME_MAX_LEN];

Please define local variables in a descending order, with longer ones
placed at the front. It is recommended to modify the entire driver in
this way.

> +
> +	memset(hwq_dev_name, 0, sizeof(hwq_dev_name));

why not char hwq_dev_name[RTE_DEV_NAME_MAX_LEN] = {0};

> +	ae4dma_channel_dev_name(hwq_dev_name, sizeof(hwq_dev_name), name, qn);
> +
> +	dmadev = rte_dma_pmd_allocate(hwq_dev_name, dev->device.numa_node,
> +			sizeof(struct ae4dma_dmadev));
> +	if (dmadev == NULL) {
> +		AE4DMA_PMD_ERR("Unable to allocate dma device");
> +		return -ENOMEM;
> +	}
> +	dmadev->device = &dev->device;
> +	dmadev->fp_obj->dev_private = dmadev->data->dev_private;
> +
> +	ae4dma = dmadev->data->dev_private;
> +
> +	if (ae4dma_add_queue(ae4dma, dev, qn, name) != 0)
> +		goto init_error;
> +	return 0;
> +
> +init_error:
> +	AE4DMA_PMD_ERR("failed");

why not add more info, e.g. Probe failed!

> +	rte_dma_pmd_release(hwq_dev_name);
> +	return -ENOMEM;
> +}
> +
> +static int
> +ae4dma_dmadev_probe(struct rte_pci_driver *drv __rte_unused,
> +		struct rte_pci_device *dev)
> +{
> +	char name[32];
> +	char chname[RTE_DEV_NAME_MAX_LEN];
> +	void *mmio_base;
> +	uint32_t q_per_eng;
> +	int ret = 0;
> +	uint8_t i;
> +
> +	rte_pci_device_name(&dev->addr, name, sizeof(name));
> +	AE4DMA_PMD_INFO("Init %s on NUMA node %d", name, dev->device.numa_node);
> +
> +	mmio_base = dev->mem_resource[AE4DMA_PCIE_BAR].addr;
> +	if (mmio_base == NULL) {
> +		AE4DMA_PMD_ERR("%s: BAR%d not mapped", name, AE4DMA_PCIE_BAR);
> +		return -ENODEV;
> +	}
> +
> +	/* Program the per-engine HW queue count once. */
> +	AE4DMA_WRITE_REG_OFFSET(mmio_base, AE4DMA_COMMON_CONFIG_OFFSET,
> +			AE4DMA_MAX_HW_QUEUES);
> +	q_per_eng = AE4DMA_READ_REG_OFFSET(mmio_base, AE4DMA_COMMON_CONFIG_OFFSET);
> +	AE4DMA_PMD_INFO("%s: AE4DMA queues per engine = %u", name, q_per_eng);
> +
> +	for (i = 0; i < AE4DMA_MAX_HW_QUEUES; i++) {
> +		ret = ae4dma_dmadev_create(name, dev, i);
> +		if (ret != 0) {
> +			AE4DMA_PMD_ERR("%s create dmadev %u failed!", name, i);
> +			while (i > 0) {
> +				i--;
> +				ae4dma_channel_dev_name(chname, sizeof(chname), name, i);
> +				rte_dma_pmd_release(chname);
> +			}
> +			break;
> +		}
> +	}
> +	return ret;
> +}
> +
> +static int
> +ae4dma_dmadev_remove(struct rte_pci_device *dev)
> +{
> +	char name[32];
> +	char chname[RTE_DEV_NAME_MAX_LEN];
> +	unsigned int i;
> +
> +	rte_pci_device_name(&dev->addr, name, sizeof(name));
> +
> +	AE4DMA_PMD_INFO("Closing %s on NUMA node %d",
> +			name, dev->device.numa_node);
> +
> +	for (i = 0; i < AE4DMA_MAX_HW_QUEUES; i++) {
> +		ae4dma_channel_dev_name(chname, sizeof(chname), name, i);
> +		rte_dma_pmd_release(chname);
> +	}
> +	return 0;
> +}
> +
> +static const struct rte_pci_id pci_id_ae4dma_map[] = {
> +	{ RTE_PCI_DEVICE(AMD_VENDOR_ID, AE4DMA_DEVICE_ID) },
> +	{ .vendor_id = 0, /* sentinel */ },
> +};
> +
> +static struct rte_pci_driver ae4dma_pmd_drv = {
> +	.id_table = pci_id_ae4dma_map,
> +	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
> +	.probe = ae4dma_dmadev_probe,
> +	.remove = ae4dma_dmadev_remove,
> +};
> +
> +RTE_PMD_REGISTER_PCI(AE4DMA_PMD_NAME, ae4dma_pmd_drv);
> +RTE_PMD_REGISTER_PCI_TABLE(AE4DMA_PMD_NAME, pci_id_ae4dma_map);
> +RTE_PMD_REGISTER_KMOD_DEP(AE4DMA_PMD_NAME, "* igb_uio | uio_pci_generic | vfio-pci");
> diff --git a/drivers/dma/ae4dma/ae4dma_hw_defs.h b/drivers/dma/ae4dma/ae4dma_hw_defs.h
> new file mode 100644
> index 0000000000..e7798be09b
> --- /dev/null
> +++ b/drivers/dma/ae4dma/ae4dma_hw_defs.h
> @@ -0,0 +1,154 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2026 Advanced Micro Devices, Inc. All rights reserved.
> + */
> +
> +#ifndef __AE4DMA_HW_DEFS_H__
> +#define __AE4DMA_HW_DEFS_H__
> +
> +#include <stdint.h>
> +
> +#include <rte_bus_pci.h>
> +#include <rte_byteorder.h>
> +#include <rte_io.h>
> +#include <rte_pci.h>
> +#include <rte_memzone.h>

Some of the include file are not need for this head-file.

> +
> +#define AE4DMA_BIT(nr)			(1UL << (nr))
> +
> +/* ae4dma device details */
> +#define AMD_VENDOR_ID	0x1022
> +#define AE4DMA_DEVICE_ID	0x149b
> +#define AE4DMA_PCIE_BAR 0
> +
> +/*
> + * An AE4DMA engine has 16 DMA queues. Each queue supports 32 descriptors.
> + */
> +#define AE4DMA_MAX_HW_QUEUES        16
> +#define AE4DMA_QUEUE_START_INDEX    0
> +#define AE4DMA_CMD_QUEUE_ENABLE		0x1
> +#define AE4DMA_CMD_QUEUE_DISABLE	0x0
> +
> +/* Common to all queues */
> +#define AE4DMA_COMMON_CONFIG_OFFSET 0x00
> +
> +#define AE4DMA_DISABLE_INTR 0x01
> +
> +/* Descriptor status */
> +enum ae4dma_dma_status {
> +	AE4DMA_DMA_DESC_SUBMITTED = 0,
> +	AE4DMA_DMA_DESC_VALIDATED = 1,
> +	AE4DMA_DMA_DESC_PROCESSED = 2,
> +	AE4DMA_DMA_DESC_COMPLETED = 3,
> +	AE4DMA_DMA_DESC_ERROR = 4,
> +};
> +
> +/* Descriptor error-code */
> +enum ae4dma_dma_err {
> +	AE4DMA_DMA_ERR_NO_ERR = 0,
> +	AE4DMA_DMA_ERR_INV_HEADER = 1,
> +	AE4DMA_DMA_ERR_INV_STATUS = 2,
> +	AE4DMA_DMA_ERR_INV_LEN = 3,
> +	AE4DMA_DMA_ERR_INV_SRC = 4,
> +	AE4DMA_DMA_ERR_INV_DST = 5,
> +	AE4DMA_DMA_ERR_INV_ALIGN = 6,
> +	AE4DMA_DMA_ERR_UNKNOWN = 7,
> +};
> +
> +/* HW Queue status */
> +enum ae4dma_hwqueue_status {
> +	AE4DMA_HWQUEUE_EMPTY = 0,
> +	AE4DMA_HWQUEUE_FULL = 1,
> +	AE4DMA_HWQUEUE_NOT_EMPTY = 4,
> +};
> +/*
> + * descriptor for AE4DMA commands
> + * 8 32-bit words:
> + * word 0: source memory type; destination memory type ; control bits
> + * word 1: desc_id; error code; status
> + * word 2: length
> + * word 3: reserved
> + * word 4: upper 32 bits of source pointer
> + * word 5: low 32 bits of source pointer
> + * word 6: upper 32 bits of destination pointer
> + * word 7: low 32 bits of destination pointer
> + */
> +
> +/* AE4DMA Descriptor - DWORD0 - Controls bits: Reserved for future use */
> +#define AE4DMA_DWORD0_STOP_ON_COMPLETION	AE4DMA_BIT(0)
> +#define AE4DMA_DWORD0_INTERRUPT_ON_COMPLETION	AE4DMA_BIT(1)
> +#define AE4DMA_DWORD0_START_OF_MESSAGE		AE4DMA_BIT(3)
> +#define AE4DMA_DWORD0_END_OF_MESSAGE		AE4DMA_BIT(4)
> +#define AE4DMA_DWORD0_DESTINATION_MEMORY_TYPE	RTE_GENMASK64(5, 4)
> +#define AE4DMA_DWORD0_SOURCE_MEMEORY_TYPE	RTE_GENMASK64(7, 6)
> +
> +#define AE4DMA_DWORD0_DESTINATION_MEMORY_TYPE_MEMORY    (0x0)
> +#define AE4DMA_DWORD0_DESTINATION_MEMORY_TYPE_IOMEMORY  (1<<4)
> +#define AE4DMA_DWORD0_SOURCE_MEMEORY_TYPE_MEMORY    (0x0)
> +#define AE4DMA_DWORD0_SOURCE_MEMEORY_TYPE_IOMEMORY  (1<<6)
> +
> +struct ae4dma_desc_dword0 {
> +	uint8_t byte0;
> +	uint8_t byte1;
> +	uint16_t timestamp;
> +};
> +
> +struct ae4dma_desc_dword1 {
> +	uint8_t status;
> +	uint8_t err_code;
> +	uint16_t desc_id;
> +};
> +
> +struct ae4dma_desc {
> +	struct ae4dma_desc_dword0 dw0;
> +	struct ae4dma_desc_dword1 dw1;
> +	uint32_t length;
> +	uint32_t reserved;
> +	uint32_t src_lo;
> +	uint32_t src_hi;
> +	uint32_t dst_lo;
> +	uint32_t dst_hi;
> +};
> +
> +/*
> + * Registers for each queue :4 bytes length
> + * Effective address : offset + reg
> + */
> +struct ae4dma_hwq_regs {
> +	union {
> +		uint32_t control_raw;
> +		struct {
> +			uint32_t queue_enable: 1;
> +			uint32_t reserved_internal: 31;
> +		} control;
> +	} control_reg;
> +
> +	union {
> +		uint32_t status_raw;
> +		struct {
> +			uint32_t reserved0: 1;
> +			/* 0–empty, 1–full, 2–stopped, 3–error , 4–Not Empty */
> +			uint32_t queue_status: 2;
> +			uint32_t reserved1: 21;
> +			uint32_t interrupt_type: 4;
> +			uint32_t reserved2: 4;
> +		} status;
> +	} status_reg;
> +
> +	uint32_t max_idx;
> +	uint32_t read_idx;
> +	uint32_t write_idx;
> +
> +	union {
> +		uint32_t intr_status_raw;
> +		struct {
> +			uint32_t intr_status: 1;
> +			uint32_t reserved: 31;
> +		} intr_status;
> +	} intr_status_reg;
> +
> +	uint32_t qbase_lo;
> +	uint32_t qbase_hi;
> +
> +};
> +
> +#endif /* AE4DMA_HW_DEFS_H */
> diff --git a/drivers/dma/ae4dma/ae4dma_internal.h b/drivers/dma/ae4dma/ae4dma_internal.h
> new file mode 100644
> index 0000000000..7f149c97b5
> --- /dev/null
> +++ b/drivers/dma/ae4dma/ae4dma_internal.h
> @@ -0,0 +1,97 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2026 Advanced Micro Devices, Inc. All rights reserved.
> + */
> +
> +#ifndef _AE4DMA_INTERNAL_H_
> +#define _AE4DMA_INTERNAL_H_
> +
> +#include <stdint.h>
> +
> +#include "ae4dma_hw_defs.h"
> +
> +/* Return bits 32-63 of a 64-bit number. */
> +#define upper_32_bits(n) ((uint32_t)(((n) >> 16) >> 16))
> +
> +/* Return bits 0-31 of a 64-bit number. */
> +#define lower_32_bits(n) ((uint32_t)((n) & 0xffffffff))
> +
> +/* Hardware ring depth (slots per queue); must be power of two. */
> +#define AE4DMA_DESCRIPTORS_PER_CMDQ	32
> +#define AE4DMA_QUEUE_DESC_SIZE		sizeof(struct ae4dma_desc)
> +#define AE4DMA_QUEUE_SIZE(n)		(AE4DMA_DESCRIPTORS_PER_CMDQ * (n))
> +

two blank lines

> +
> +/* AE4DMA registers Write/Read */
> +static inline void ae4dma_pci_reg_write(void *base, int offset,
> +		uint32_t value)
> +{
> +	volatile void *reg_addr = ((uint8_t *)base + offset);
> +
> +	rte_write32((rte_cpu_to_le_32(value)), reg_addr);
> +}
> +
> +static inline uint32_t ae4dma_pci_reg_read(void *base, int offset)
> +{
> +	volatile void *reg_addr = ((uint8_t *)base + offset);
> +
> +	return rte_le_to_cpu_32(rte_read32(reg_addr));
> +}
> +
> +#define AE4DMA_READ_REG_OFFSET(hw_addr, reg_offset) \
> +	ae4dma_pci_reg_read(hw_addr, reg_offset)
> +
> +#define AE4DMA_WRITE_REG_OFFSET(hw_addr, reg_offset, value) \
> +	ae4dma_pci_reg_write(hw_addr, reg_offset, value)
> +
> +

two blank lines

> +#define AE4DMA_READ_REG(hw_addr) \
> +	ae4dma_pci_reg_read((void *)(uintptr_t)(hw_addr), 0)
> +
> +#define AE4DMA_WRITE_REG(hw_addr, value) \
> +	ae4dma_pci_reg_write((void *)(uintptr_t)(hw_addr), 0, value)
> +
> +/* A structure describing an AE4DMA command queue. */
> +struct __rte_cache_aligned ae4dma_cmd_queue {
> +	char memz_name[RTE_MEMZONE_NAMESIZE];
> +	const struct rte_memzone *mz;
> +	volatile struct ae4dma_hwq_regs *hwq_regs;
> +
> +	struct rte_dma_vchan_conf qcfg;
> +	struct rte_dma_stats stats;
> +	/* Queue address */
> +	struct ae4dma_desc *qbase_desc;
> +	void *qbase_addr;
> +	rte_iova_t qbase_phys_addr;
> +	enum ae4dma_dma_err status[AE4DMA_DESCRIPTORS_PER_CMDQ];
> +	/* Queue identifier */
> +	uint64_t id;    /* queue id */
> +	uint64_t qidx;  /* queue index */
> +	uint64_t qsize; /* queue size */
> +	uint32_t ring_buff_count;
> +	uint16_t next_read;
> +	uint16_t next_write;
> +	uint16_t last_write; /* Used to compute submitted count. */
> +};
> +
> +/*
> + * One dmadev per AE4DMA hardware channel: probe creates AE4DMA_MAX_HW_QUEUES
> + * dmadevs per PCI function, each owning a single HW command queue.
> + */
> +struct ae4dma_dmadev {
> +	void *io_regs;
> +	struct ae4dma_cmd_queue cmd_q; /* single HW queue owned by this dmadev */
> +};
> +
> +

two blank line

> +extern int ae4dma_pmd_logtype;
> +#define RTE_LOGTYPE_AE4DMA_PMD ae4dma_pmd_logtype
> +
> +#define AE4DMA_PMD_LOG(level, ...) \
> +	RTE_LOG_LINE_PREFIX(level, AE4DMA_PMD, "%s(): ", __func__, __VA_ARGS__)
> +
> +#define AE4DMA_PMD_DEBUG(...)  AE4DMA_PMD_LOG(DEBUG, __VA_ARGS__)
> +#define AE4DMA_PMD_INFO(...)   AE4DMA_PMD_LOG(INFO, __VA_ARGS__)
> +#define AE4DMA_PMD_ERR(...)    AE4DMA_PMD_LOG(ERR, __VA_ARGS__)
> +#define AE4DMA_PMD_WARN(...)   AE4DMA_PMD_LOG(WARNING, __VA_ARGS__)
> +
> +#endif /* _AE4DMA_INTERNAL_H_ */
> diff --git a/drivers/dma/ae4dma/meson.build b/drivers/dma/ae4dma/meson.build
> new file mode 100644
> index 0000000000..e48ab0d561
> --- /dev/null
> +++ b/drivers/dma/ae4dma/meson.build
> @@ -0,0 +1,7 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright 2024 Advanced Micro Devices, Inc. All rights reserved.

2024 -> 2026

Does this also support run BSD or Windows, if not please add following instruments:
if not is_linux
    build = false
    reason = 'only supported on Linux'
    subdir_done()
endif

> +
> +build = dpdk_conf.has('RTE_ARCH_X86')
> +reason = 'only supported on x86'
> +sources = files('ae4dma_dmadev.c')
> +deps += ['bus_pci', 'dmadev']
> diff --git a/drivers/dma/meson.build b/drivers/dma/meson.build
> index e0d94db967..c230ac5a06 100644
> --- a/drivers/dma/meson.build
> +++ b/drivers/dma/meson.build
> @@ -2,6 +2,7 @@
>  # Copyright 2021 HiSilicon Limited
>  
>  drivers = [
> +        'ae4dma',
>          'cnxk',
>          'dpaa',
>          'dpaa2',
> diff --git a/usertools/dpdk-devbind.py b/usertools/dpdk-devbind.py
> index 93f2383dff..7d09f155de 100755
> --- a/usertools/dpdk-devbind.py
> +++ b/usertools/dpdk-devbind.py
> @@ -86,6 +86,9 @@
>  cn9k_ree = {'Class': '08', 'Vendor': '177d', 'Device': 'a0f4',
>              'SVendor': None, 'SDevice': None}
>  
> +amd_ae4dma = {'Class': '08', 'Vendor': '1022', 'Device': '149b',
> +              'SVendor': None, 'SDevice': None}
> +
>  virtio_blk = {'Class': '01', 'Vendor': "1af4", 'Device': '1001,1042',
>                'SVendor': None, 'SDevice': None}
>  
> @@ -95,7 +98,7 @@
>  network_devices = [network_class, cavium_pkx, avp_vnic, ifpga_class]
>  baseband_devices = [acceleration_class]
>  crypto_devices = [encryption_class, intel_processor_class]
> -dma_devices = [cnxk_dma, hisilicon_dma,
> +dma_devices = [amd_ae4dma, cnxk_dma, hisilicon_dma,
>                 intel_idxd_gnrd, intel_idxd_dmr, intel_idxd_spr,
>                 intel_ioat_bdw, intel_ioat_icx, intel_ioat_skx,
>                 odm_dma]


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/3] dma/ae4dma: add control path operations
  2026-06-25 18:47     ` [PATCH v3 2/3] dma/ae4dma: add control path operations Raghavendra Ningoji
@ 2026-06-27  0:09       ` fengchengwen
  2026-06-28 16:04         ` Stephen Hemminger
  0 siblings, 1 reply; 24+ messages in thread
From: fengchengwen @ 2026-06-27  0:09 UTC (permalink / raw)
  To: Raghavendra Ningoji, dev
  Cc: david.marchand, bruce.richardson, Selwin.Sebastian,
	bhagyada.modali, rjarry, thomas

On 6/26/2026 2:47 AM, Raghavendra Ningoji wrote:
> Implement the dmadev control path for the AMD AE4DMA PMD.
> 
> This commit adds:
>  - dev_configure / vchan_setup: accept a single virtual channel per
>    dmadev and clamp the requested ring size to the hardware maximum
>    of 32 descriptors (rounded up to a power of two).
>  - dev_start / dev_stop / dev_close: program the per-queue control
>    register to enable/disable the hardware queue and release the
>    descriptor ring memzone on close.
>  - dev_info_get: advertise RTE_DMA_CAPA_MEM_TO_MEM and the fixed
>    ring depth.

It seemed declare support 2~32 depth, not fixed

>  - dev_dump: print the queue identifiers, ring layout and software
>    completion counters.
>  - stats_get / stats_reset: expose submitted / completed / errors
>    counters maintained by the driver.
>  - vchan_status: report IDLE / ACTIVE based on hardware read_idx vs
>    write_idx, and HALTED_ERROR when the queue is not enabled.
> 
> The dmadev framework is wired through dev_ops in ae4dma_dmadev_create().
> 
> Signed-off-by: Raghavendra Ningoji <raghavendra.ningoji@amd.com>
> ---
>  drivers/dma/ae4dma/ae4dma_dmadev.c | 211 +++++++++++++++++++++++++++++
>  1 file changed, 211 insertions(+)
> 
> diff --git a/drivers/dma/ae4dma/ae4dma_dmadev.c b/drivers/dma/ae4dma/ae4dma_dmadev.c
> index 3d82f86906..607f288623 100644
> --- a/drivers/dma/ae4dma/ae4dma_dmadev.c
> +++ b/drivers/dma/ae4dma/ae4dma_dmadev.c
> @@ -53,6 +53,203 @@ ae4dma_queue_dma_zone_reserve(const char *queue_name,
>  			socket_id, RTE_MEMZONE_IOVA_CONTIG, queue_size);
>  }
>  
> +static int
> +ae4dma_dev_configure(struct rte_dma_dev *dev __rte_unused,
> +		const struct rte_dma_conf *dev_conf,
> +		uint32_t conf_sz)
> +{
> +	if (sizeof(struct rte_dma_conf) != conf_sz)
> +		return -EINVAL;

This may break ABI compatible

> +
> +	if (dev_conf->nb_vchans != 1)
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
> +/* Setup a virtual channel for AE4DMA, only 1 vchan is supported per dmadev. */
> +static int
> +ae4dma_vchan_setup(struct rte_dma_dev *dev, uint16_t vchan __rte_unused,
> +		const struct rte_dma_vchan_conf *qconf, uint32_t qconf_sz)
> +{
> +	struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
> +	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
> +	uint16_t max_desc = qconf->nb_desc;
> +
> +	if (sizeof(struct rte_dma_vchan_conf) != qconf_sz)
> +		return -EINVAL;

This may break ABI compatible

> +
> +	if (max_desc < 2)
> +		return -EINVAL;

No need to do this because rte_dma_vchan_setup already do it.

> +
> +	if (!rte_is_power_of_2(max_desc))
> +		max_desc = rte_align32pow2(max_desc);
> +
> +	if (max_desc > AE4DMA_DESCRIPTORS_PER_CMDQ) {
> +		AE4DMA_PMD_DEBUG("DMA dev %u nb_desc clamped to %u",
> +				dev->data->dev_id, AE4DMA_DESCRIPTORS_PER_CMDQ);
> +		max_desc = AE4DMA_DESCRIPTORS_PER_CMDQ;
> +	}

No need to do this because rte_dma_vchan_setup already do it.

> +
> +	cmd_q->qcfg = *qconf;
> +	cmd_q->qcfg.nb_desc = max_desc;
> +
> +	/* Ensure all counters are reset, if reconfiguring/restarting device. */
> +	memset(&cmd_q->stats, 0, sizeof(cmd_q->stats));
> +	return 0;
> +}
> +
> +static int
> +ae4dma_dev_start(struct rte_dma_dev *dev)
> +{
> +	struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
> +	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
> +	uint16_t nb = cmd_q->qcfg.nb_desc;
> +
> +	if (nb == 0)
> +		return -EBUSY;
> +
> +	/* Program ring depth expected by hardware. */
> +	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->max_idx, nb);
> +	return 0;
> +}
> +
> +static int
> +ae4dma_dev_stop(struct rte_dma_dev *dev)
> +{
> +	struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
> +	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
> +
> +	if (cmd_q->hwq_regs != NULL)
> +		AE4DMA_WRITE_REG(&cmd_q->hwq_regs->control_reg.control_raw,
> +				AE4DMA_CMD_QUEUE_DISABLE);
> +	return 0;
> +}
> +
> +static int
> +ae4dma_dev_info_get(const struct rte_dma_dev *dev __rte_unused,
> +		struct rte_dma_info *info, uint32_t size)
> +{
> +	if (size < sizeof(*info))
> +		return -EINVAL;
> +	info->dev_capa = RTE_DMA_CAPA_MEM_TO_MEM;

You need also decalre support RTE_DMA_CAP_OPS_COPY, please use dpdk-test dmadev_autotest
to test it.

The dpdk-dma-perf could also test dmadev.

> +	info->max_vchans = 1;
> +	info->min_desc = 2;
> +	info->max_desc = AE4DMA_DESCRIPTORS_PER_CMDQ;
> +	info->nb_vchans = 1;
> +	return 0;
> +}
> +
> +static int
> +ae4dma_dev_close(struct rte_dma_dev *dev)
> +{
> +	struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
> +	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
> +
> +	if (cmd_q->hwq_regs != NULL)
> +		AE4DMA_WRITE_REG(&cmd_q->hwq_regs->control_reg.control_raw,
> +				AE4DMA_CMD_QUEUE_DISABLE);
> +
> +	rte_memzone_free(cmd_q->mz);
> +	cmd_q->mz = NULL;
> +	cmd_q->qbase_desc = NULL;
> +	cmd_q->qbase_addr = NULL;
> +	cmd_q->qbase_phys_addr = 0;
> +	return 0;
> +}
> +
> +static int
> +ae4dma_dev_dump(const struct rte_dma_dev *dev, FILE *f)
> +{
> +	struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
> +	struct ae4dma_cmd_queue *cmd_q;
> +	void *ae4dma_mmio_base_addr = (uint8_t *)ae4dma->io_regs;
> +
> +	cmd_q = &ae4dma->cmd_q;
> +	fprintf(f, "cmd_q->id              = %" PRIx64 "\n", cmd_q->id);
> +	fprintf(f, "cmd_q->qidx            = %" PRIx64 "\n", cmd_q->qidx);
> +	fprintf(f, "cmd_q->qsize           = %" PRIx64 "\n", cmd_q->qsize);
> +	fprintf(f, "mmio_base_addr	= %p\n", ae4dma_mmio_base_addr);
> +	fprintf(f, "queues per ae4dma engine     = %d\n", AE4DMA_READ_REG_OFFSET(
> +				ae4dma_mmio_base_addr, AE4DMA_COMMON_CONFIG_OFFSET));
> +	fprintf(f, "== Private Data ==\n");
> +	fprintf(f, "  Config: { ring_size: %u }\n", cmd_q->qcfg.nb_desc);
> +	fprintf(f, "  Ring virt: %p\tphys: %#" PRIx64 "\n",
> +			(void *)cmd_q->qbase_desc,
> +			(uint64_t)cmd_q->qbase_phys_addr);
> +	fprintf(f, "  Next write: %u\n", cmd_q->next_write);
> +	fprintf(f, "  Next read: %u\n", cmd_q->next_read);
> +	fprintf(f, "  current queue depth: %u\n", cmd_q->ring_buff_count);
> +	fprintf(f, "  }\n");
> +	fprintf(f, "  Key Stats { submitted: %" PRIu64 ", comp: %" PRIu64 ", failed: %" PRIu64 " }\n",
> +		cmd_q->stats.submitted,
> +		cmd_q->stats.completed,
> +		cmd_q->stats.errors);
> +	return 0;
> +}
> +static int
> +ae4dma_stats_get(const struct rte_dma_dev *dev, uint16_t vchan __rte_unused,
> +		struct rte_dma_stats *rte_stats, uint32_t size)
> +{
> +	const struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
> +	const struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
> +	const struct rte_dma_stats *stats = &cmd_q->stats;
> +
> +	if (size < sizeof(*rte_stats))
> +		return -EINVAL;
> +	if (rte_stats == NULL)
> +		return -EINVAL;

No need to do this check because rte_dma_stats_get already check it
Please make such check on other ops.

> +
> +	*rte_stats = *stats;
> +	return 0;
> +}
> +
> +static int
> +ae4dma_stats_reset(struct rte_dma_dev *dev, uint16_t vchan __rte_unused)
> +{
> +	struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
> +	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
> +
> +	memset(&cmd_q->stats, 0, sizeof(cmd_q->stats));
> +	return 0;
> +}
> +
> +/*
> + * Report channel state to the dmadev framework.
> + *
> + *   RTE_DMA_VCHAN_HALTED_ERROR - HW queue is disabled (never started, or
> + *                                stopped via dev_stop()).
> + *   RTE_DMA_VCHAN_IDLE         - HW has caught up: read_idx == write_idx,
> + *                                no descriptors in flight.
> + *   RTE_DMA_VCHAN_ACTIVE       - HW still has descriptors to process.
> + */
> +static int
> +ae4dma_vchan_status(const struct rte_dma_dev *dev, uint16_t vchan __rte_unused,
> +		enum rte_dma_vchan_status *status)
> +{
> +	const struct ae4dma_dmadev *ae4dma = dev->fp_obj->dev_private;
> +	const struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
> +	uint32_t ctrl, hw_read, hw_write;
> +
> +	if (cmd_q->hwq_regs == NULL) {
> +		*status = RTE_DMA_VCHAN_HALTED_ERROR;
> +		return 0;
> +	}
> +
> +	ctrl = AE4DMA_READ_REG(&cmd_q->hwq_regs->control_reg.control_raw);
> +	if ((ctrl & AE4DMA_CMD_QUEUE_ENABLE) == 0) {
> +		*status = RTE_DMA_VCHAN_HALTED_ERROR;
> +		return 0;
> +	}
> +
> +	hw_read  = AE4DMA_READ_REG(&cmd_q->hwq_regs->read_idx);
> +	hw_write = AE4DMA_READ_REG(&cmd_q->hwq_regs->write_idx);
> +
> +	*status = (hw_read == hw_write) ? RTE_DMA_VCHAN_IDLE
> +					: RTE_DMA_VCHAN_ACTIVE;
> +	return 0;
> +}
> +
>  static int
>  ae4dma_add_queue(struct ae4dma_dmadev *dev, struct rte_pci_device *pci,
>  		uint8_t qn, const char *pci_name)
> @@ -115,6 +312,19 @@ ae4dma_channel_dev_name(char *out, size_t outlen, const char *pci_name,
>  static int
>  ae4dma_dmadev_create(const char *name, struct rte_pci_device *dev, uint8_t qn)
>  {
> +	static const struct rte_dma_dev_ops ae4dma_dmadev_ops = {
> +		.dev_close = ae4dma_dev_close,
> +		.dev_configure = ae4dma_dev_configure,
> +		.dev_dump = ae4dma_dev_dump,
> +		.dev_info_get = ae4dma_dev_info_get,
> +		.dev_start = ae4dma_dev_start,
> +		.dev_stop = ae4dma_dev_stop,
> +		.stats_get = ae4dma_stats_get,
> +		.stats_reset = ae4dma_stats_reset,
> +		.vchan_status = ae4dma_vchan_status,
> +		.vchan_setup = ae4dma_vchan_setup,
> +	};
> +
>  	struct rte_dma_dev *dmadev;
>  	struct ae4dma_dmadev *ae4dma;
>  	char hwq_dev_name[RTE_DEV_NAME_MAX_LEN];
> @@ -130,6 +340,7 @@ ae4dma_dmadev_create(const char *name, struct rte_pci_device *dev, uint8_t qn)
>  	}
>  	dmadev->device = &dev->device;
>  	dmadev->fp_obj->dev_private = dmadev->data->dev_private;
> +	dmadev->dev_ops = &ae4dma_dmadev_ops;
>  
>  	ae4dma = dmadev->data->dev_private;
>  


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/3] dma/ae4dma: add data path operations
  2026-06-25 18:47     ` [PATCH v3 3/3] dma/ae4dma: add data " Raghavendra Ningoji
@ 2026-06-27  0:23       ` fengchengwen
  0 siblings, 0 replies; 24+ messages in thread
From: fengchengwen @ 2026-06-27  0:23 UTC (permalink / raw)
  To: Raghavendra Ningoji, dev
  Cc: david.marchand, bruce.richardson, Selwin.Sebastian,
	bhagyada.modali, rjarry, thomas

On 6/26/2026 2:47 AM, Raghavendra Ningoji wrote:
> Implement the dmadev fast path for the AMD AE4DMA PMD.
> 
> This commit adds:
>  - copy enqueue (rte_dma_copy): write an AE4DMA descriptor for a
>    memory-to-memory transfer; on RTE_DMA_OP_FLAG_SUBMIT the doorbell
>    is rung immediately.
>  - submit (rte_dma_submit): advance the per-queue write_idx
>    register to expose pending descriptors to the hardware.
>  - completion (rte_dma_completed / rte_dma_completed_status):
>    completion is detected via the hardware's per-queue read_idx
>    register, which the engine advances as it processes descriptors.
>    The descriptor status / err_code bytes are read only to classify
>    each drained slot as success or failure, and HW error codes are
>    translated to the dmadev RTE_DMA_STATUS_* enumeration.
>  - burst capacity (rte_dma_burst_capacity): report the number of
>    free descriptor slots, taking into account the one slot reserved
>    to distinguish full from empty on the power-of-two ring.

I don't think it's necessary to write in such detail because the ops
implemented are defined by the framework. If needed, you can supplement
by explaining what special features this driver has.

> 
> The fast path entry points are wired through fp_obj in
> ae4dma_dmadev_create(). The fill capability is not advertised;
> fp_obj->fill is left zero-initialised.
> 
> Signed-off-by: Raghavendra Ningoji <raghavendra.ningoji@amd.com>
> ---
>  doc/guides/dmadevs/ae4dma.rst      |  22 +++
>  drivers/dma/ae4dma/ae4dma_dmadev.c | 287 +++++++++++++++++++++++++++++
>  2 files changed, 309 insertions(+)
> 
> diff --git a/doc/guides/dmadevs/ae4dma.rst b/doc/guides/dmadevs/ae4dma.rst
> index a85c1d92ca..37a2096ccf 100644
> --- a/doc/guides/dmadevs/ae4dma.rst
> +++ b/doc/guides/dmadevs/ae4dma.rst
> @@ -51,3 +51,25 @@ On probe the PMD performs the following steps for each PCI function:
>    IOVA-contiguous memory, programs the queue base address and ring
>    depth into the per-queue registers, and enables the queue.
>  * Interrupts are masked; completion is polled by the application.
> +
> +Usage
> +-----
> +
> +Once a dmadev has been started, copies are submitted with
> +``rte_dma_copy()`` and completions are reaped with ``rte_dma_completed()``
> +or ``rte_dma_completed_status()``. See the
> +:ref:`Enqueue / Dequeue API <dmadev_enqueue_dequeue>` section of the
> +dmadev library documentation for details.
> +
> +Limitations
> +-----------
> +
> +* Only memory-to-memory copies are supported. Fill, scatter-gather and
> +  any other operation types are not advertised in
> +  ``rte_dma_info::dev_capa``.
> +* The maximum number of descriptors per virtual channel is fixed by
> +  hardware at 32. The PMD rounds the requested ring size up to a
> +  power of two and clamps it to 32.
> +* Only a single virtual channel per dmadev is supported; use the 16
> +  per-PCI-function dmadevs to obtain channel-level parallelism.
> +* Interrupt-driven completion is not supported.
> diff --git a/drivers/dma/ae4dma/ae4dma_dmadev.c b/drivers/dma/ae4dma/ae4dma_dmadev.c
> index 607f288623..da3ec42233 100644
> --- a/drivers/dma/ae4dma/ae4dma_dmadev.c
> +++ b/drivers/dma/ae4dma/ae4dma_dmadev.c
> @@ -158,6 +158,72 @@ ae4dma_dev_close(struct rte_dma_dev *dev)
>  	return 0;
>  }
>  
> +/* trigger h/w to process enqued desc:doorbell - by next_write */
> +static inline void
> +__submit(struct ae4dma_dmadev *ae4dma)
> +{
> +	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
> +	uint16_t write_idx = cmd_q->next_write;
> +	uint16_t nb = cmd_q->qcfg.nb_desc;
> +
> +	AE4DMA_WRITE_REG(&cmd_q->hwq_regs->write_idx, write_idx);
> +	if (nb != 0)
> +		cmd_q->stats.submitted += (uint16_t)((cmd_q->next_write - cmd_q->last_write +
> +				nb) % nb);
> +	cmd_q->last_write = cmd_q->next_write;
> +}
> +
> +static int
> +ae4dma_submit(void *dev_private, uint16_t vchan __rte_unused)
> +{
> +	struct ae4dma_dmadev *ae4dma = dev_private;
> +
> +	__submit(ae4dma);
> +	return 0;
> +}
> +
> +/* Write descriptor for enqueue (copy only). */
> +static inline int
> +__write_desc_copy(void *dev_private, rte_iova_t src, rte_iova_t dst,
> +		uint32_t len, uint64_t flags)
> +{
> +	struct ae4dma_dmadev *ae4dma = dev_private;
> +	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
> +	struct ae4dma_desc *dma_desc;
> +	uint16_t ret;
> +	uint16_t nb = cmd_q->qcfg.nb_desc;
> +	uint16_t write = cmd_q->next_write;
> +
> +	if (nb == 0)
> +		return -EINVAL;
> +
> +	/* Reserve one slot to distinguish full from empty (power-of-two ring). */
> +	if ((uint32_t)cmd_q->ring_buff_count >= (uint32_t)(nb - 1))
> +		return -ENOSPC;
> +
> +	dma_desc = &cmd_q->qbase_desc[write];
> +	memset(dma_desc, 0, sizeof(*dma_desc));
> +	dma_desc->length = len;
> +	dma_desc->src_hi = upper_32_bits(src);
> +	dma_desc->src_lo = lower_32_bits(src);
> +	dma_desc->dst_hi = upper_32_bits(dst);
> +	dma_desc->dst_lo = lower_32_bits(dst);
> +	cmd_q->ring_buff_count++;
> +	cmd_q->next_write = (uint16_t)((write + 1) % nb);

the next_write is [0, nb_desc-1], and it will as return value as copy,
but the dmadev framework expect as [0, 0xFFFF], I doubt your drvier was
not passed in any DMA test (e.g. dpdk-test, dpdk-dma-perf or examples/dma)

> +	ret = write;
> +	if (flags & RTE_DMA_OP_FLAG_SUBMIT)
> +		__submit(ae4dma);
> +	return ret;
> +}
> +
> +/* Enqueue a copy operation onto the ae4dma device. */
> +static int
> +ae4dma_enqueue_copy(void *dev_private, uint16_t vchan __rte_unused,
> +		rte_iova_t src, rte_iova_t dst, uint32_t length, uint64_t flags)
> +{
> +	return __write_desc_copy(dev_private, src, dst, length, flags);
> +}
> +
>  static int
>  ae4dma_dev_dump(const struct rte_dma_dev *dev, FILE *f)
>  {
> @@ -187,6 +253,220 @@ ae4dma_dev_dump(const struct rte_dma_dev *dev, FILE *f)
>  		cmd_q->stats.errors);
>  	return 0;
>  }
> +
> +/* Translates AE4DMA ChanERRs to DMA error codes. */
> +static inline enum rte_dma_status_code
> +__translate_status_ae4dma_to_dma(enum ae4dma_dma_err status)
> +{
> +	AE4DMA_PMD_DEBUG("ae4dma desc status = %d", status);
> +
> +	switch (status) {
> +	case AE4DMA_DMA_ERR_NO_ERR:
> +		return RTE_DMA_STATUS_SUCCESSFUL;
> +	case AE4DMA_DMA_ERR_INV_LEN:
> +		return RTE_DMA_STATUS_INVALID_LENGTH;
> +	case AE4DMA_DMA_ERR_INV_SRC:
> +		return RTE_DMA_STATUS_INVALID_SRC_ADDR;
> +	case AE4DMA_DMA_ERR_INV_DST:
> +		return RTE_DMA_STATUS_INVALID_DST_ADDR;
> +	case AE4DMA_DMA_ERR_INV_ALIGN:
> +		/* Name matches DPDK public enum spelling. */
> +		return RTE_DMA_STATUS_DATA_POISION;

Suggest add RTE_DMA_STATUS_INVALID_ALIGN enum in rte_dmadev.h

> +	case AE4DMA_DMA_ERR_INV_HEADER:
> +	case AE4DMA_DMA_ERR_INV_STATUS:
> +		return RTE_DMA_STATUS_ERROR_UNKNOWN;
> +	default:
> +		return RTE_DMA_STATUS_ERROR_UNKNOWN;
> +	}
> +}
> +
> +/*
> + * Scan HW queue for completed descriptors (non-blocking).
> + *
> + * The AE4DMA engine signals completion by advancing the per-queue
> + * `read_idx` register; it does not (reliably) write a status value
> + * back into the descriptor. We therefore use the HW `read_idx`
> + * register as the source of truth and only inspect the descriptor's
> + * `dw1.err_code` byte to classify each completion as success or
> + * failure.
> + *
> + * @param cmd_q
> + *   The AE4DMA command queue.
> + * @param max_ops
> + *   Maximum descriptors to process this call.
> + * @param[out] failed_count
> + *   Number of completed descriptors that did not report success.
> + * @return
> + *   Number of descriptors completed (success + failure), <= max_ops.
> + */
> +static inline uint16_t
> +ae4dma_scan_hwq(struct ae4dma_cmd_queue *cmd_q, uint16_t max_ops,
> +		uint16_t *failed_count)
> +{
> +	volatile struct ae4dma_desc *hw_desc;
> +	uint16_t events_count = 0, fails = 0;
> +	uint16_t tail;
> +	uint16_t nb = cmd_q->qcfg.nb_desc;
> +	uint16_t mask;
> +	uint16_t hw_read_idx;
> +	uint16_t in_flight;
> +	uint16_t scan_cap;
> +
> +	if (nb == 0 || cmd_q->ring_buff_count == 0) {
> +		*failed_count = 0;
> +		return 0;
> +	}
> +	mask = nb - 1;
> +
> +	hw_read_idx = (uint16_t)(AE4DMA_READ_REG(&cmd_q->hwq_regs->read_idx) & mask);
> +	tail = cmd_q->next_read;
> +
> +	/*
> +	 * Descriptors completed since our last visit live in the
> +	 * half-open ring range [tail, hw_read_idx). If HW hasn't
> +	 * moved we have nothing to do.
> +	 */
> +	in_flight = (uint16_t)((hw_read_idx - tail) & mask);
> +	if (in_flight == 0) {
> +		*failed_count = 0;
> +		return 0;
> +	}
> +
> +	scan_cap = max_ops;
> +	if (scan_cap > AE4DMA_DESCRIPTORS_PER_CMDQ)
> +		scan_cap = AE4DMA_DESCRIPTORS_PER_CMDQ;
> +	if (scan_cap > in_flight)
> +		scan_cap = in_flight;
> +	if (scan_cap > cmd_q->ring_buff_count)
> +		scan_cap = (uint16_t)cmd_q->ring_buff_count;
> +
> +	while (events_count < scan_cap) {
> +		uint8_t hw_status;
> +		uint8_t hw_err;
> +
> +		hw_desc = &cmd_q->qbase_desc[tail];
> +		hw_status = hw_desc->dw1.status;
> +		hw_err = hw_desc->dw1.err_code;
> +
> +		/*
> +		 * read_idx advancing is the definitive completion
> +		 * signal. The per-descriptor status byte is informational
> +		 * and may not yet be written when we observe it:
> +		 *
> +		 *   AE4DMA_DMA_DESC_ERROR (4)
> +		 *     Hard failure - err_code names the precise cause.
> +		 *   AE4DMA_DMA_DESC_COMPLETED (3) or 0
> +		 *     Success.
> +		 *   AE4DMA_DMA_DESC_VALIDATED (1) / _PROCESSED (2)
> +		 *     Benign race: HW had not finished updating the
> +		 *     status byte at the instant we read it. Since
> +		 *     read_idx has moved past this slot, treat it as
> +		 *     success unless err_code says otherwise.
> +		 *
> +		 * A non-zero err_code is treated as a failure regardless
> +		 * of the observed status value.
> +		 */
> +		if (hw_status == AE4DMA_DMA_DESC_ERROR ||
> +				hw_err != AE4DMA_DMA_ERR_NO_ERR) {
> +			fails++;
> +			AE4DMA_PMD_WARN("Desc failed: status=%u err=%u",
> +					hw_status, hw_err);
> +		}
> +		cmd_q->status[events_count] = (enum ae4dma_dma_err)hw_err;
> +		cmd_q->ring_buff_count--;
> +		events_count++;
> +		tail = (tail + 1) & mask;
> +	}
> +
> +	cmd_q->stats.completed += events_count;
> +	cmd_q->stats.errors += fails;
> +	cmd_q->next_read = tail;
> +	*failed_count = fails;
> +	return events_count;
> +}
> +
> +/* Returns successful operations count and sets error flag if any errors. */
> +static uint16_t
> +ae4dma_completed(void *dev_private, uint16_t vchan __rte_unused,
> +		const uint16_t max_ops, uint16_t *last_idx, bool *has_error)
> +{
> +	struct ae4dma_dmadev *ae4dma = dev_private;
> +	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
> +	uint16_t cpl_count, sl_count;
> +	uint16_t err_count = 0;
> +	uint16_t nb = cmd_q->qcfg.nb_desc;
> +
> +	*has_error = false;
> +
> +	cpl_count = ae4dma_scan_hwq(cmd_q, max_ops, &err_count);
> +
> +	if (cpl_count > max_ops)
> +		cpl_count = max_ops;
> +
> +	if (cpl_count > 0 && last_idx != NULL)
> +		*last_idx = (uint16_t)((cmd_q->next_read - 1 + nb) % nb);

the last_idx should be in range of [0, 0xFFFF]

> +
> +	sl_count = cpl_count - err_count;
> +	if (err_count)
> +		*has_error = true;
> +
> +	return sl_count;
> +}
> +
> +static uint16_t
> +ae4dma_completed_status(void *dev_private, uint16_t vchan __rte_unused,
> +		uint16_t max_ops, uint16_t *last_idx,
> +		enum rte_dma_status_code *status)
> +{
> +	struct ae4dma_dmadev *ae4dma = dev_private;
> +	struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
> +	uint16_t cpl_count;
> +	uint16_t i;
> +	uint16_t err_count = 0;
> +	uint16_t nb = cmd_q->qcfg.nb_desc;
> +
> +	cpl_count = ae4dma_scan_hwq(cmd_q, max_ops, &err_count);
> +
> +	if (cpl_count > max_ops)
> +		cpl_count = max_ops;
> +
> +	if (cpl_count > 0 && last_idx != NULL)
> +		*last_idx = (uint16_t)((cmd_q->next_read - 1 + nb) % nb);
> +
> +	if (likely(err_count == 0)) {
> +		for (i = 0; i < cpl_count; i++)
> +			status[i] = RTE_DMA_STATUS_SUCCESSFUL;
> +	} else {
> +		for (i = 0; i < cpl_count; i++)
> +			status[i] = __translate_status_ae4dma_to_dma(cmd_q->status[i]);
> +	}
> +
> +	return cpl_count;
> +}
> +
> +/* Get the remaining capacity of the ring. */
> +static uint16_t
> +ae4dma_burst_capacity(const void *dev_private, uint16_t vchan __rte_unused)
> +{
> +	const struct ae4dma_dmadev *ae4dma = dev_private;
> +	const struct ae4dma_cmd_queue *cmd_q = &ae4dma->cmd_q;
> +	uint16_t nb = cmd_q->qcfg.nb_desc;
> +	uint16_t mask;
> +	uint16_t read_idx = cmd_q->next_read;
> +	uint16_t write_idx = cmd_q->next_write;
> +	uint16_t used;
> +
> +	if (nb < 2 || !rte_is_power_of_2(nb))
> +		return 0;

No need to check this

> +
> +	mask = nb - 1;
> +	used = (uint16_t)((write_idx - read_idx) & mask);
> +	/* One slot reserved (same rule as enqueue). */
> +	if (used >= nb - 1)
> +		return 0;
> +	return (uint16_t)(nb - 1 - used);
> +}
> +
>  static int
>  ae4dma_stats_get(const struct rte_dma_dev *dev, uint16_t vchan __rte_unused,
>  		struct rte_dma_stats *rte_stats, uint32_t size)
> @@ -342,6 +622,13 @@ ae4dma_dmadev_create(const char *name, struct rte_pci_device *dev, uint8_t qn)
>  	dmadev->fp_obj->dev_private = dmadev->data->dev_private;
>  	dmadev->dev_ops = &ae4dma_dmadev_ops;
>  
> +	dmadev->fp_obj->burst_capacity = ae4dma_burst_capacity;
> +	dmadev->fp_obj->completed = ae4dma_completed;
> +	dmadev->fp_obj->completed_status = ae4dma_completed_status;
> +	dmadev->fp_obj->copy = ae4dma_enqueue_copy;
> +	dmadev->fp_obj->submit = ae4dma_submit;
> +	/* fill capability not advertised: leave fp_obj->fill as zero-initialised. */
> +
>  	ae4dma = dmadev->data->dev_private;
>  
>  	if (ae4dma_add_queue(ae4dma, dev, qn, name) != 0)


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/3] dma/ae4dma: add control path operations
  2026-06-27  0:09       ` fengchengwen
@ 2026-06-28 16:04         ` Stephen Hemminger
  0 siblings, 0 replies; 24+ messages in thread
From: Stephen Hemminger @ 2026-06-28 16:04 UTC (permalink / raw)
  To: fengchengwen
  Cc: Raghavendra Ningoji, dev, david.marchand, bruce.richardson,
	Selwin.Sebastian, bhagyada.modali, rjarry, thomas

On Sat, 27 Jun 2026 08:09:09 +0800
fengchengwen <fengchengwen@huawei.com> wrote:

> >  
> > +static int
> > +ae4dma_dev_configure(struct rte_dma_dev *dev __rte_unused,
> > +		const struct rte_dma_conf *dev_conf,
> > +		uint32_t conf_sz)
> > +{
> > +	if (sizeof(struct rte_dma_conf) != conf_sz)
> > +		return -EINVAL;  
> 
> This may break ABI compatible

Ignore that suggestion. This is a reasonable way to handle new configuration
functions. You need/want a minimal set of values. If rte_dma_conf grows in size
then the code can add compatability; by requiring a minimum set of values
and then setting the rest to zero.

Something like

static int
ae4dma_dev_configure(struct rte_dma_dev *dev __rte_unused,
		const struct rte_dma_conf *dev_conf,
		size_t conf_sz)
{
	if (conf_sz < sizeof(struct orig_rte_dma_conf))
		return -EINVAL;

	struct rte_dma_conf conf;
	memcpy(&conf, dev_conf, RTE_MIN(conf_sz, sizeof(conf)));
        dev_conf = &conf;

Looking at rte_dma_conf the structure has holes and dmadev lib
doesn't validate undefined flags, so it already has future ABI problems.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2026-06-28 16:04 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-18 18:18 [PATCH] dma/ae4dma: add AMD AE4DMA DMA PMD Raghavendra Ningoji
2026-05-21 14:28 ` David Marchand
2026-05-25 18:42 ` [PATCH v2 0/3] " Raghavendra Ningoji
2026-05-25 18:42   ` [PATCH v2 1/3] dma/ae4dma: introduce " Raghavendra Ningoji
2026-06-22 12:06     ` David Marchand
2026-06-22 12:16       ` Bruce Richardson
2026-06-24  0:38       ` fengchengwen
2026-06-25 18:41       ` Raghavendra Ningoji
2026-06-22 12:26     ` David Marchand
2026-06-22 12:37       ` Bruce Richardson
2026-06-25 18:43       ` Raghavendra Ningoji
2026-05-25 18:42   ` [PATCH v2 2/3] dma/ae4dma: add control path operations Raghavendra Ningoji
2026-06-22 12:15     ` David Marchand
2026-06-25 18:42       ` Raghavendra Ningoji
2026-05-25 18:42   ` [PATCH v2 3/3] dma/ae4dma: add data " Raghavendra Ningoji
2026-06-22 12:25   ` [PATCH v2 0/3] dma/ae4dma: add AMD AE4DMA DMA PMD David Marchand
2026-06-25 18:47   ` [PATCH v3 " Raghavendra Ningoji
2026-06-25 18:47     ` [PATCH v3 1/3] dma/ae4dma: introduce " Raghavendra Ningoji
2026-06-27  0:01       ` fengchengwen
2026-06-25 18:47     ` [PATCH v3 2/3] dma/ae4dma: add control path operations Raghavendra Ningoji
2026-06-27  0:09       ` fengchengwen
2026-06-28 16:04         ` Stephen Hemminger
2026-06-25 18:47     ` [PATCH v3 3/3] dma/ae4dma: add data " Raghavendra Ningoji
2026-06-27  0:23       ` fengchengwen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.