dmaengine.vger.kernel.org archive mirror
* [PATCH 0/6] dma: fake-dma and IOVA tests
@ 2025-05-20 22:39 Luis Chamberlain
  2025-05-20 22:39 ` [PATCH 1/6] fake-dma: add fake dma engine driver Luis Chamberlain
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Luis Chamberlain @ 2025-05-20 22:39 UTC (permalink / raw)
  To: vkoul, chenxiang66, m.szyprowski, robin.murphy, leon, jgg,
	alex.williamson, joel.granados
  Cc: iommu, dmaengine, linux-block, gost.dev, mcgrof

We don't seem to have unit tests for the DMA IOVA API, so I figured we
should add some to ensure we don't regress moving forward and to let us
extend them later. It's best to just extend existing tests though, so
I've found two and extended them as part of this patchset:

  - drivers/dma/dmatest.c
  - kernel/dma/map_benchmark.c

However, running dmatest requires either some old x86 emulation or
non-upstream qemu patches for Intel IOAT on a q35 system. This series
makes that easier by providing a simple in-kernel fake-dma controller
so you can run all the dmatest tests on most systems. The only issue I
found with that was not being able to put the platform device behind an
IOMMU for DMA. If folks have an idea of how to make it easy for a
platform device to get an IOMMU for DMA, it would let us leverage the
existing dmatest for IOVA as well. I only tried briefly with virtio and
vfio_iommu_type1, but gave up fast. I'm not sure how easy it would be
to later let a platform device like this one use an IOMMU to simplify
testing.

The kernel/dma/map_benchmark.c test is extended as well; for that I was
able to follow the instructions in that test's first commit, unbinding a
device and attaching it to the map benchmark driver.

I tried to twiddle a mocked IOMMU with iommufd on a q35 guest, but
alas, that just didn't work as I'd hoped, i.e. nothing, so this is the
best I have for now to help test the IOVA DMA API on a virtualized
setup.

Let me know if others have other recommendations.

The hope is to eventually get a CI going to ensure these don't regress.

Luis Chamberlain (6):
  fake-dma: add fake dma engine driver
  dmatest: split dmatest_func() into helpers
  dmatest: move printing to its own routine
  dmatest: add IOVA tests
  dma-mapping: benchmark: move validation parameters into a helper
  dma-mapping: benchmark: add IOVA support

 drivers/dma/Kconfig                           |  11 +
 drivers/dma/Makefile                          |   1 +
 drivers/dma/dmatest.c                         | 795 ++++++++++++------
 drivers/dma/fake-dma.c                        | 718 ++++++++++++++++
 include/linux/map_benchmark.h                 |  11 +
 kernel/dma/Kconfig                            |   4 +-
 kernel/dma/map_benchmark.c                    | 512 +++++++++--
 .../testing/selftests/dma/dma_map_benchmark.c | 145 +++-
 8 files changed, 1864 insertions(+), 333 deletions(-)
 create mode 100644 drivers/dma/fake-dma.c

-- 
2.47.2



* [PATCH 1/6] fake-dma: add fake dma engine driver
  2025-05-20 22:39 [PATCH 0/6] dma: fake-dma and IOVA tests Luis Chamberlain
@ 2025-05-20 22:39 ` Luis Chamberlain
  2025-05-21 14:20   ` Robin Murphy
  2025-05-21 23:40   ` kernel test robot
  2025-05-20 22:39 ` [PATCH 2/6] dmatest: split dmatest_func() into helpers Luis Chamberlain
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 20+ messages in thread
From: Luis Chamberlain @ 2025-05-20 22:39 UTC (permalink / raw)
  To: vkoul, chenxiang66, m.szyprowski, robin.murphy, leon, jgg,
	alex.williamson, joel.granados
  Cc: iommu, dmaengine, linux-block, gost.dev, mcgrof

Today on x86_64 q35 guests we can't easily test some of the DMA API
with dmatest out of the box because we lack a DMA engine, as the
current qemu Intel IOAT patches are out of tree. Implement a basic fake
DMA engine so we can use dmatest on q35 guests and expand on it from
there.
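
For reference, a client drives this engine through the regular
dmaengine API; what dmatest does boils down to roughly the sketch
below. This is illustrative only: fake_dma_smoke_test() is not part of
this patch, and error handling is trimmed to the minimum.

  #include <linux/dmaengine.h>
  #include <linux/err.h>

  /* dst and src must already be DMA-mapped for the channel's device */
  static int fake_dma_smoke_test(dma_addr_t dst, dma_addr_t src, size_t len)
  {
  	struct dma_async_tx_descriptor *tx;
  	struct dma_chan *chan;
  	dma_cookie_t cookie;
  	dma_cap_mask_t mask;
  	int ret = 0;

  	dma_cap_zero(mask);
  	dma_cap_set(DMA_MEMCPY, mask);

  	/* grabs one of the fake channels once this driver is loaded */
  	chan = dma_request_chan_by_mask(&mask);
  	if (IS_ERR(chan))
  		return PTR_ERR(chan);

  	tx = dmaengine_prep_dma_memcpy(chan, dst, src, len, DMA_CTRL_ACK);
  	if (!tx) {
  		ret = -EIO;
  		goto out;
  	}

  	cookie = dmaengine_submit(tx);
  	dma_async_issue_pending(chan);

  	/* the fake engine completes from a workqueue, so polling is fine */
  	if (dma_sync_wait(chan, cookie) != DMA_COMPLETE)
  		ret = -EIO;
  out:
  	dma_release_channel(chan);
  	return ret;
  }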

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 drivers/dma/Kconfig    |  11 +
 drivers/dma/Makefile   |   1 +
 drivers/dma/fake-dma.c | 718 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 730 insertions(+)
 create mode 100644 drivers/dma/fake-dma.c

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index df2d2dc00a05..716531f2c7e2 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -140,6 +140,17 @@ config DMA_BCM2835
 	select DMA_ENGINE
 	select DMA_VIRTUAL_CHANNELS
 
+config DMA_FAKE
+	tristate "Fake DMA Engine"
+	select DMA_ENGINE
+	select DMA_VIRTUAL_CHANNELS
+	help
+	  This implements a fake DMA engine. It is useful for testing the
+	  DMA API without any hardware requirements, on any architecture
+	  which supports the DMA engine framework. Enable this if you want
+	  to easily run custom tests against the DMA API without a real DMA
+	  engine or the need for something like qemu to virtualize one for
+	  you.
+
 config DMA_JZ4780
 	tristate "JZ4780 DMA support"
 	depends on MIPS || COMPILE_TEST
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index 19ba465011a6..c75e4b7ad9f2 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_AT_XDMAC) += at_xdmac.o
 obj-$(CONFIG_AXI_DMAC) += dma-axi-dmac.o
 obj-$(CONFIG_BCM_SBA_RAID) += bcm-sba-raid.o
 obj-$(CONFIG_DMA_BCM2835) += bcm2835-dma.o
+obj-$(CONFIG_DMA_FAKE) += fake-dma.o
 obj-$(CONFIG_DMA_JZ4780) += dma-jz4780.o
 obj-$(CONFIG_DMA_SA11X0) += sa11x0-dma.o
 obj-$(CONFIG_DMA_SUN4I) += sun4i-dma.o
diff --git a/drivers/dma/fake-dma.c b/drivers/dma/fake-dma.c
new file mode 100644
index 000000000000..ee1d788a2b83
--- /dev/null
+++ b/drivers/dma/fake-dma.c
@@ -0,0 +1,718 @@
+// SPDX-License-Identifier: GPL-2.0-or-later OR copyleft-next-0.3.1
+/*
+ * Fake DMA engine test module. This allows us to test DMA engines
+ * without leveraging virtualization.
+ *
+ * Copyright (C) 2025 Luis Chamberlain <mcgrof@kernel.org>
+ *
+ * This driver registers a fake DMA engine with a configurable number
+ * of channels so that DMA engine clients such as dmatest can be
+ * exercised without any real hardware behind them.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/err.h>
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/dmaengine.h>
+#include <linux/freezer.h>
+#include <linux/init.h>
+#include <linux/kthread.h>
+#include <linux/sched/task.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/random.h>
+#include <linux/slab.h>
+#include <linux/wait.h>
+#include <linux/debugfs.h>
+#include <linux/platform_device.h>
+#include "dmaengine.h"
+
+#define FAKE_MAX_DMA_CHANNELS 20
+
+static unsigned int num_channels = FAKE_MAX_DMA_CHANNELS;
+module_param(num_channels, uint, 0644);
+MODULE_PARM_DESC(num_channels, "Number of channels to support (default: 20)");
+
+struct fake_dma_desc {
+	struct dma_async_tx_descriptor txd;
+	dma_addr_t src;
+	dma_addr_t dst;
+	size_t len;
+	enum dma_transaction_type type;
+	int memset_value;
+	/* For XOR/PQ operations */
+	dma_addr_t *src_list; /* Array of source addresses */
+	unsigned int src_cnt; /* Number of sources */
+	dma_addr_t *dst_list; /* Array of destination addresses (for PQ) */
+	unsigned char *pq_coef; /* P+Q coefficients */
+	struct list_head node;
+};
+
+struct fake_dma_chan {
+	struct dma_chan chan;
+	struct list_head active_list;
+	struct list_head queue;
+	struct work_struct work;
+	spinlock_t lock;
+	bool running;
+};
+
+struct fake_dma_device {
+	struct platform_device *pdev;
+	struct dma_device dma_dev;
+	struct fake_dma_chan *channels;
+};
+
+struct fake_dma_device *single_fake_dma;
+
+static struct platform_driver fake_dma_engine_driver = {
+	.driver = {
+		.name = KBUILD_MODNAME,
+		.owner = THIS_MODULE,
+        },
+};
+
+static int fake_dma_create_platform_device(struct fake_dma_device *fake_dma)
+{
+	fake_dma->pdev = platform_device_register_simple("fake-dma-engine", -1, NULL, 0);
+	if (IS_ERR(fake_dma->pdev))
+		return -ENODEV;
+
+	pr_info("Fake DMA platform device created: %s\n",
+		dev_name(&fake_dma->pdev->dev));
+
+	return 0;
+}
+
+static void fake_dma_destroy_platform_device(struct fake_dma_device  *fake_dma)
+{
+	if (!fake_dma->pdev)
+		return;
+
+	pr_info("Destroying fake DMA platform device: %s ...\n",
+		dev_name(&fake_dma->pdev->dev));
+	platform_device_unregister(fake_dma->pdev);
+}
+
+static inline struct fake_dma_chan *to_fake_dma_chan(struct dma_chan *c)
+{
+	return container_of(c, struct fake_dma_chan, chan);
+}
+
+static inline struct fake_dma_desc *to_fake_dma_desc(struct dma_async_tx_descriptor *txd)
+{
+	return container_of(txd, struct fake_dma_desc, txd);
+}
+
+/* Galois Field multiplication for P+Q operations */
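+/*
+ * Plain shift-and-add multiply in GF(2^8): whenever a shift carries out
+ * of bit 7, reduce with 0x1b (i.e. modulo x^8 + x^4 + x^3 + x + 1).
+ * For example, gf_mul(0x02, 0x87) == 0x15.
+ */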
+static unsigned char gf_mul(unsigned char a, unsigned char b)
+{
+	unsigned char result = 0;
+	unsigned char high_bit_set;
+	int i;
+
+	for (i = 0; i < 8; i++) {
+		if (b & 1)
+			result ^= a;
+		high_bit_set = a & 0x80;
+		a <<= 1;
+		if (high_bit_set)
+			a ^= 0x1b; /* x^8 + x^4 + x^3 + x + 1 */
+		b >>= 1;
+	}
+
+	return result;
+}
+
+/* Processes pending transfers */
+static void fake_dma_work_func(struct work_struct *work)
+{
+	struct fake_dma_chan *vchan = container_of(work, struct fake_dma_chan, work);
+	struct fake_dma_desc *vdesc;
+	struct dmaengine_desc_callback cb;
+	unsigned long flags;
+
+	spin_lock_irqsave(&vchan->lock, flags);
+
+	if (list_empty(&vchan->queue)) {
+		vchan->running = false;
+		spin_unlock_irqrestore(&vchan->lock, flags);
+		return;
+	}
+
+	vdesc = list_first_entry(&vchan->queue, struct fake_dma_desc, node);
+	list_del(&vdesc->node);
+	list_add_tail(&vdesc->node, &vchan->active_list);
+
+	spin_unlock_irqrestore(&vchan->lock, flags);
+
+	/* Actually perform the DMA transfer for memcpy operations */
+	if (vdesc->len) {
+		void *src_virt, *dst_virt;
+		void *p_virt, *q_virt;
+		unsigned char *p_bytes, *q_bytes;
+		unsigned int i, j;
+		unsigned char *dst_bytes;
+
+		switch (vdesc->type) {
+		case DMA_MEMCPY:
+			/* Convert DMA addresses to virtual addresses and perform the copy */
+			src_virt = phys_to_virt(vdesc->src);
+			dst_virt = phys_to_virt(vdesc->dst);
+
+			memcpy(dst_virt, src_virt, vdesc->len);
+			break;
+		case DMA_MEMSET:
+			dst_virt = phys_to_virt(vdesc->dst);
+			memset(dst_virt, vdesc->memset_value, vdesc->len);
+			break;
+		case DMA_XOR:
+			dst_virt = phys_to_virt(vdesc->dst);
+			dst_bytes = (unsigned char *)dst_virt;
+
+			memset(dst_virt, 0, vdesc->len);
+
+			/* XOR all sources into destination */
+			for (i = 0; i < vdesc->src_cnt; i++) {
+				void *src_virt = phys_to_virt(vdesc->src_list[i]);
+				unsigned char *src_bytes = (unsigned char *)src_virt;
+
+				for (j = 0; j < vdesc->len; j++)
+					dst_bytes[j] ^= src_bytes[j];
+			}
+			break;
+		case DMA_PQ:
+			p_virt = phys_to_virt(vdesc->dst_list[0]);
+			q_virt = phys_to_virt(vdesc->dst_list[1]);
+			p_bytes = (unsigned char *)p_virt;
+			q_bytes = (unsigned char *)q_virt;
+
+			/* Initialize P and Q destinations to zero */
+			memset(p_virt, 0, vdesc->len);
+			memset(q_virt, 0, vdesc->len);
+
+			/* Calculate P (XOR of all sources) and Q (weighted XOR) */
+			for (i = 0; i < vdesc->src_cnt; i++) {
+				void *src_virt = phys_to_virt(vdesc->src_list[i]);
+				unsigned char *src_bytes = (unsigned char *)src_virt;
+				unsigned char coef = vdesc->pq_coef[i];
+
+				for (j = 0; j < vdesc->len; j++) {
+					/* P calculation: simple XOR */
+					p_bytes[j] ^= src_bytes[j];
+
+					/* Q calculation: multiply in GF(2^8) and XOR */
+					q_bytes[j] ^= gf_mul(src_bytes[j], coef);
+				}
+			}
+			break;
+		default:
+			pr_warn("Unknown DMA operation type %d\n", vdesc->type);
+			break;
+		}
+	}
+
+	/* Mark descriptor as complete */
+	dma_cookie_complete(&vdesc->txd);
+
+	/* Call completion callback if set */
+	dmaengine_desc_get_callback(&vdesc->txd, &cb);
+	if (cb.callback)
+		cb.callback(cb.callback_param);
+
+	/* Process next transfer if available */
+	spin_lock_irqsave(&vchan->lock, flags);
+	list_del(&vdesc->node);
+
+	/* Free allocated memory for XOR/PQ operations */
+	if (vdesc->type == DMA_XOR || vdesc->type == DMA_PQ) {
+		kfree(vdesc->src_list);
+		if (vdesc->type == DMA_PQ) {
+			kfree(vdesc->dst_list);
+			kfree(vdesc->pq_coef);
+		}
+	}
+
+	kfree(vdesc);
+
+	if (!list_empty(&vchan->queue)) {
+		spin_unlock_irqrestore(&vchan->lock, flags);
+		schedule_work(&vchan->work);
+	} else {
+		vchan->running = false;
+		spin_unlock_irqrestore(&vchan->lock, flags);
+	}
+}
+
+/* Submit descriptor to the DMA engine */
+static dma_cookie_t fake_dma_tx_submit(struct dma_async_tx_descriptor *txd)
+{
+	struct fake_dma_chan *vchan = to_fake_dma_chan(txd->chan);
+	struct fake_dma_desc *vdesc = to_fake_dma_desc(txd);
+	unsigned long flags;
+	dma_cookie_t cookie;
+
+	spin_lock_irqsave(&vchan->lock, flags);
+
+	cookie = dma_cookie_assign(txd);
+	list_add_tail(&vdesc->node, &vchan->queue);
+
+	/* Schedule processing if not already running */
+	if (!vchan->running) {
+		vchan->running = true;
+		schedule_work(&vchan->work);
+	}
+
+	spin_unlock_irqrestore(&vchan->lock, flags);
+
+	return cookie;
+}
+
+static
+struct dma_async_tx_descriptor *fake_dma_prep_memcpy(struct dma_chan *chan,
+						     dma_addr_t dest,
+						     dma_addr_t src,
+						     size_t len,
+						     unsigned long flags)
+{
+	struct fake_dma_desc *vdesc;
+
+	vdesc = kzalloc(sizeof(*vdesc), GFP_NOWAIT);
+	if (!vdesc)
+		return NULL;
+
+	dma_async_tx_descriptor_init(&vdesc->txd, chan);
+	vdesc->type = DMA_MEMCPY;
+	vdesc->txd.tx_submit = fake_dma_tx_submit;
+	vdesc->txd.flags = flags;
+	vdesc->src = src;
+	vdesc->dst = dest;
+	vdesc->len = len;
+	INIT_LIST_HEAD(&vdesc->node);
+
+	return &vdesc->txd;
+}
+
+static
+struct dma_async_tx_descriptor * fake_dma_prep_memset(struct dma_chan *chan,
+						      dma_addr_t dest,
+						      int value,
+						      size_t len,
+						      unsigned long flags)
+{
+	struct fake_dma_desc *vdesc;
+
+	vdesc = kzalloc(sizeof(*vdesc), GFP_NOWAIT);
+	if (!vdesc)
+		return NULL;
+
+	dma_async_tx_descriptor_init(&vdesc->txd, chan);
+	vdesc->type = DMA_MEMSET;
+	vdesc->txd.tx_submit = fake_dma_tx_submit;
+	vdesc->txd.flags = flags;
+	vdesc->dst = dest;
+	vdesc->len = len;
+	vdesc->memset_value = value & 0xFF; /* Ensure it's a single byte */
+
+	INIT_LIST_HEAD(&vdesc->node);
+
+	return &vdesc->txd;
+}
+
+static struct dma_async_tx_descriptor *
+fake_dma_prep_xor(struct dma_chan *chan, dma_addr_t dest, dma_addr_t *src,
+		  unsigned int src_cnt, size_t len, unsigned long flags)
+{
+	struct fake_dma_desc *vdesc;
+
+	vdesc = kzalloc(sizeof(*vdesc), GFP_NOWAIT);
+	if (!vdesc)
+		return NULL;
+
+	/* Allocate memory for source list */
+	vdesc->src_list = kmalloc(src_cnt * sizeof(dma_addr_t), GFP_NOWAIT);
+	if (!vdesc->src_list) {
+		kfree(vdesc);
+		return NULL;
+	}
+
+	dma_async_tx_descriptor_init(&vdesc->txd, chan);
+	vdesc->type = DMA_XOR;
+	vdesc->txd.tx_submit = fake_dma_tx_submit;
+	vdesc->txd.flags = flags;
+	vdesc->dst = dest;
+	vdesc->len = len;
+	vdesc->src_cnt = src_cnt;
+
+	memcpy(vdesc->src_list, src, src_cnt * sizeof(dma_addr_t));
+
+	INIT_LIST_HEAD(&vdesc->node);
+
+	return &vdesc->txd;
+}
+
+static struct dma_async_tx_descriptor *
+fake_dma_prep_pq(struct dma_chan *chan, dma_addr_t *dst, dma_addr_t *src,
+		 unsigned int src_cnt, const unsigned char *scf, size_t len,
+		 unsigned long flags)
+{
+	struct fake_dma_desc *vdesc;
+
+	vdesc = kzalloc(sizeof(*vdesc), GFP_NOWAIT);
+	if (!vdesc)
+		return NULL;
+
+	vdesc->src_list = kmalloc(src_cnt * sizeof(dma_addr_t), GFP_NOWAIT);
+	if (!vdesc->src_list) {
+		kfree(vdesc);
+		return NULL;
+	}
+
+	/* Allocate memory for destination list (P and Q) */
+	vdesc->dst_list = kmalloc(2 * sizeof(dma_addr_t), GFP_NOWAIT);
+	if (!vdesc->dst_list) {
+		kfree(vdesc->src_list);
+		kfree(vdesc);
+		return NULL;
+	}
+
+	/* Allocate memory for coefficients */
+	vdesc->pq_coef = kmalloc(src_cnt * sizeof(unsigned char), GFP_NOWAIT);
+	if (!vdesc->pq_coef) {
+		kfree(vdesc->dst_list);
+		kfree(vdesc->src_list);
+		kfree(vdesc);
+		return NULL;
+	}
+
+	dma_async_tx_descriptor_init(&vdesc->txd, chan);
+	vdesc->type = DMA_PQ;
+	vdesc->txd.tx_submit = fake_dma_tx_submit;
+	vdesc->txd.flags = flags;
+	vdesc->len = len;
+	vdesc->src_cnt = src_cnt;
+
+	/* Copy source addresses */
+	memcpy(vdesc->src_list, src, src_cnt * sizeof(dma_addr_t));
+	/* Copy destination addresses (P and Q) */
+	memcpy(vdesc->dst_list, dst, 2 * sizeof(dma_addr_t));
+	/* Copy coefficients */
+	memcpy(vdesc->pq_coef, scf, src_cnt * sizeof(unsigned char));
+
+	INIT_LIST_HEAD(&vdesc->node);
+
+	return &vdesc->txd;
+}
+
+static void fake_dma_issue_pending(struct dma_chan *chan)
+{
+	struct fake_dma_chan *vchan = to_fake_dma_chan(chan);
+	unsigned long flags;
+
+	spin_lock_irqsave(&vchan->lock, flags);
+
+	/* Start processing if not already running and queue not empty */
+	if (!vchan->running && !list_empty(&vchan->queue)) {
+		vchan->running = true;
+		schedule_work(&vchan->work);
+	}
+
+	spin_unlock_irqrestore(&vchan->lock, flags);
+}
+
+static int fake_dma_alloc_chan_resources(struct dma_chan *chan)
+{
+	struct fake_dma_chan *vchan = to_fake_dma_chan(chan);
+
+	INIT_LIST_HEAD(&vchan->active_list);
+	INIT_LIST_HEAD(&vchan->queue);
+	vchan->running = false;
+
+	return 1; /* Number of descriptors allocated */
+}
+
+static void fake_dma_free_chan_resources(struct dma_chan *chan)
+{
+	struct fake_dma_chan *vchan = to_fake_dma_chan(chan);
+	struct fake_dma_desc *vdesc, *_vdesc;
+	unsigned long flags;
+
+	cancel_work_sync(&vchan->work);
+
+	spin_lock_irqsave(&vchan->lock, flags);
+
+	/* Free all descriptors in queue */
+	list_for_each_entry_safe(vdesc, _vdesc, &vchan->queue, node) {
+		list_del(&vdesc->node);
+
+		/* Free allocated memory for XOR/PQ operations */
+		if (vdesc->type == DMA_XOR || vdesc->type == DMA_PQ) {
+			kfree(vdesc->src_list);
+			if (vdesc->type == DMA_PQ) {
+				kfree(vdesc->dst_list);
+				kfree(vdesc->pq_coef);
+			}
+		}
+		kfree(vdesc);
+	}
+
+	/* Free all descriptors in active list */
+	list_for_each_entry_safe(vdesc, _vdesc, &vchan->active_list, node) {
+		list_del(&vdesc->node);
+		/* Free allocated memory for XOR/PQ operations */
+		if (vdesc->type == DMA_XOR || vdesc->type == DMA_PQ) {
+			kfree(vdesc->src_list);
+			if (vdesc->type == DMA_PQ) {
+				kfree(vdesc->dst_list);
+				kfree(vdesc->pq_coef);
+			}
+		}
+		kfree(vdesc);
+	}
+
+	spin_unlock_irqrestore(&vchan->lock, flags);
+}
+
+static void fake_dma_release(struct dma_device *dma_dev)
+{
+	unsigned int i;
+	struct fake_dma_device *fake_dma =
+		container_of(dma_dev, struct fake_dma_device, dma_dev);
+
+	pr_info("refcount for dma device %s hit 0, quiescing...\n",
+		dev_name(&fake_dma->pdev->dev));
+
+	for (i = 0; i < num_channels; i++) {
+		struct fake_dma_chan *vchan = &fake_dma->channels[i];
+		cancel_work_sync(&vchan->work);
+	}
+
+	put_device(dma_dev->dev);
+}
+
+static void fake_dma_setup_config(struct fake_dma_device *fake_dma)
+{
+	unsigned int i;
+	struct dma_device *dma =  &fake_dma->dma_dev;
+
+	dma->dev = get_device(&fake_dma->pdev->dev);
+
+	/* Set multiple capabilities for dmatest compatibility */
+	dma_cap_set(DMA_MEMCPY, dma->cap_mask);
+	dma_cap_set(DMA_MEMSET, dma->cap_mask);
+	dma_cap_set(DMA_XOR, dma->cap_mask);
+	dma_cap_set(DMA_PQ, dma->cap_mask);
+	dma_cap_set(DMA_PRIVATE, dma->cap_mask);
+
+	dma->device_alloc_chan_resources = fake_dma_alloc_chan_resources;
+	dma->device_free_chan_resources = fake_dma_free_chan_resources;
+	dma->device_prep_dma_memcpy = fake_dma_prep_memcpy;
+	dma->device_prep_dma_memset = fake_dma_prep_memset;
+	dma->device_prep_dma_xor = fake_dma_prep_xor;
+	dma->device_prep_dma_pq = fake_dma_prep_pq;
+	dma->device_issue_pending = fake_dma_issue_pending;
+	dma->device_tx_status = dma_cookie_status;
+	dma->device_release = fake_dma_release;
+
+	dma->copy_align = 4; /* 4-byte alignment for memcpy */
+	dma->fill_align = 4; /* 4-byte alignment for memset */
+	dma->xor_align = 4;  /* 4-byte alignment for xor */
+	dma->pq_align = 4;   /* 4-byte alignment for pq */
+
+	dma->max_xor = 16;   /* Support up to 16 XOR sources */
+	dma->max_pq = 16;    /* Support up to 16 P+Q sources */
+
+	dma->src_addr_widths = BIT(DMA_SLAVE_BUSWIDTH_1_BYTE) |
+			       BIT(DMA_SLAVE_BUSWIDTH_2_BYTES) |
+			       BIT(DMA_SLAVE_BUSWIDTH_4_BYTES) |
+			       BIT(DMA_SLAVE_BUSWIDTH_8_BYTES);
+	dma->dst_addr_widths = dma->src_addr_widths;
+	dma->directions = BIT(DMA_MEM_TO_MEM);
+	dma->residue_granularity = DMA_RESIDUE_GRANULARITY_DESCRIPTOR;
+
+	INIT_LIST_HEAD(&dma->channels);
+
+	for (i = 0; i < num_channels; i++) {
+		struct fake_dma_chan *vchan = &fake_dma->channels[i];
+
+		vchan->chan.device = dma;
+		dma_cookie_init(&vchan->chan);
+
+		spin_lock_init(&vchan->lock);
+		INIT_LIST_HEAD(&vchan->active_list);
+		INIT_LIST_HEAD(&vchan->queue);
+
+		INIT_WORK(&vchan->work, fake_dma_work_func);
+
+		list_add_tail(&vchan->chan.device_node, &dma->channels);
+	}
+}
+
+static int fake_dma_load(void)
+{
+	unsigned int i;
+	int ret;
+	struct fake_dma_device *fake_dma;
+	struct dma_device *dma;
+
+	if (single_fake_dma) {
+		pr_err("Fake DMA device already loaded, skipping...\n");
+		return -EALREADY;
+	}
+
+	if (num_channels > FAKE_MAX_DMA_CHANNELS)
+		num_channels = FAKE_MAX_DMA_CHANNELS;
+
+	ret = platform_driver_register(&fake_dma_engine_driver);
+	if (ret)
+		return ret;
+
+	fake_dma = kzalloc(sizeof(*fake_dma), GFP_KERNEL);
+	if (!fake_dma) {
+		ret = -ENOMEM;
+		goto out_unregister_driver;
+	}
+
+	fake_dma->channels = kzalloc(sizeof(struct fake_dma_chan) * num_channels,
+				     GFP_KERNEL);
+	if (!fake_dma->channels) {
+		ret = -ENOMEM;
+		goto out_free_dma;
+	}
+
+	ret = fake_dma_create_platform_device(fake_dma);
+	if (ret)
+		goto out_free_chans;
+
+	fake_dma->pdev->dev.driver = &fake_dma_engine_driver.driver;
+	ret = device_bind_driver(&fake_dma->pdev->dev);
+	if (ret)
+		goto out_unregister_device;
+
+	fake_dma_setup_config(fake_dma);
+	dma = &fake_dma->dma_dev;
+
+	/* Register with the DMA Engine */
+	ret = dma_async_device_register(dma);
+	if (ret) {
+		ret = -EINVAL;
+		goto out_release_driver;
+	}
+
+	for (i = 0; i < num_channels; i++) {
+		struct fake_dma_chan *vchan = &fake_dma->channels[i];
+		pr_info("Registered fake DMA channel %d (%s)\n",
+			i, dma_chan_name(&vchan->chan));
+	}
+
+	single_fake_dma = fake_dma;
+
+	pr_info("Fake DMA engine: %s registered with %d channels\n",
+		dev_name(&fake_dma->pdev->dev), num_channels);
+
+	pr_info("Fake DMA device name for dmatest: '%s'\n", dev_name(dma->dev));
+	pr_info("Fake DMA device path: '%s'\n", dev_name(&fake_dma->pdev->dev));
+
+	return 0;
+
+out_release_driver:
+	device_release_driver(&fake_dma->pdev->dev);
+out_unregister_device:
+	fake_dma_destroy_platform_device(fake_dma);
+out_free_chans:
+	kfree(fake_dma->channels);
+out_free_dma:
+	kfree(fake_dma);
+	fake_dma = NULL;
+out_unregister_driver:
+	platform_driver_unregister(&fake_dma_engine_driver);
+	return ret;
+}
+
+static void fake_dma_unload(void)
+{
+	struct fake_dma_device *fake_dma = single_fake_dma;
+
+	if (!fake_dma) {
+		pr_info("No fake DMA engines registered yet.\n");
+		return;
+	}
+
+	pr_info("Fake DMA engine: %s unregistering with %d channels ...\n",
+		dev_name(&fake_dma->pdev->dev), num_channels);
+
+	dma_async_device_unregister(&fake_dma->dma_dev);
+
+	/*
+	 * dma_async_device_unregister() will call device_release() only
+	 * if a channel ever gets busy, so we need to tidy up ourselves
+	 * here in case no channels are ever used.
+	 */
+	device_release_driver(&fake_dma->pdev->dev);
+	fake_dma_destroy_platform_device(fake_dma);
+
+	kfree(fake_dma->channels);
+	kfree(fake_dma);
+
+	platform_driver_unregister(&fake_dma_engine_driver);
+	single_fake_dma = NULL;
+}
+
+static ssize_t write_file_load(struct file *file, const char __user *user_buf,
+			       size_t count, loff_t *ppos)
+{
+	fake_dma_load();
+
+	return count;
+}
+
+static const struct file_operations fops_load = {
+	.write = write_file_load,
+	.open = simple_open,
+	.owner = THIS_MODULE,
+	.llseek = default_llseek,
+};
+
+static ssize_t write_file_unload(struct file *file, const char __user *user_buf,
+				 size_t count, loff_t *ppos)
+{
+	fake_dma_unload();
+
+	return count;
+}
+
+static const struct file_operations fops_unload = {
+	.write = write_file_unload,
+	.open = simple_open,
+	.owner = THIS_MODULE,
+	.llseek = default_llseek,
+};
+
+static int __init fake_dma_init(void)
+{
+	struct dentry *fake_dir;
+
+	fake_dir = debugfs_create_dir("fake-dma", NULL);
+	debugfs_create_file("load", 0600, fake_dir, NULL, &fops_load);
+	debugfs_create_file("unload", 0600, fake_dir, NULL, &fops_unload);
+
+	return fake_dma_load();
+}
+late_initcall(fake_dma_init);
+
+static void __exit fake_dma_exit(void)
+{
+	fake_dma_unload();
+}
+module_exit(fake_dma_exit);
+
+MODULE_DESCRIPTION("Fake DMA Engine test module");
+MODULE_AUTHOR("Luis Chamberlain");
+MODULE_LICENSE("GPL v2");
-- 
2.47.2



* [PATCH 2/6] dmatest: split dmatest_func() into helpers
  2025-05-20 22:39 [PATCH 0/6] dma: fake-dma and IOVA tests Luis Chamberlain
  2025-05-20 22:39 ` [PATCH 1/6] fake-dma: add fake dma engine driver Luis Chamberlain
@ 2025-05-20 22:39 ` Luis Chamberlain
  2025-05-20 22:39 ` [PATCH 3/6] dmatest: move printing to its own routine Luis Chamberlain
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Luis Chamberlain @ 2025-05-20 22:39 UTC (permalink / raw)
  To: vkoul, chenxiang66, m.szyprowski, robin.murphy, leon, jgg,
	alex.williamson, joel.granados
  Cc: iommu, dmaengine, linux-block, gost.dev, mcgrof

Split the core logic of dmatest_func() into helpers so that the routine
dmatest_func() becomes thin, letting us later add alternative routines
which use a different preamble setup.

This introduces no functional changes.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 drivers/dma/dmatest.c | 607 ++++++++++++++++++++++--------------------
 1 file changed, 320 insertions(+), 287 deletions(-)

diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
index 91b2fbc0b864..921d89b4d2ed 100644
--- a/drivers/dma/dmatest.c
+++ b/drivers/dma/dmatest.c
@@ -238,6 +238,7 @@ struct dmatest_thread {
 	struct dmatest_done test_done;
 	bool			done;
 	bool			pending;
+	u8			*pq_coefs;
 };
 
 struct dmatest_chan {
@@ -557,377 +558,409 @@ static int dmatest_alloc_test_data(struct dmatest_data *d,
 	return -ENOMEM;
 }
 
-/*
- * This function repeatedly tests DMA transfers of various lengths and
- * offsets for a given operation type until it is told to exit by
- * kthread_stop(). There may be multiple threads running this function
- * in parallel for a single channel, and there may be multiple channels
- * being tested in parallel.
- *
- * Before each test, the source and destination buffer is initialized
- * with a known pattern. This pattern is different depending on
- * whether it's in an area which is supposed to be copied or
- * overwritten, and different in the source and destination buffers.
- * So if the DMA engine doesn't copy exactly what we tell it to copy,
- * we'll notice.
- */
-static int dmatest_func(void *data)
+static int dmatest_setup_test(struct dmatest_thread *thread,
+			      unsigned int *buf_size,
+			      u8 *align,
+			      bool *is_memset)
 {
-	struct dmatest_thread	*thread = data;
-	struct dmatest_done	*done = &thread->test_done;
-	struct dmatest_info	*info;
-	struct dmatest_params	*params;
-	struct dma_chan		*chan;
-	struct dma_device	*dev;
-	struct device		*dma_dev;
-	unsigned int		error_count;
-	unsigned int		failed_tests = 0;
-	unsigned int		total_tests = 0;
-	dma_cookie_t		cookie;
-	enum dma_status		status;
-	enum dma_ctrl_flags	flags;
-	u8			*pq_coefs = NULL;
-	int			ret;
-	unsigned int		buf_size;
-	struct dmatest_data	*src;
-	struct dmatest_data	*dst;
-	int			i;
-	ktime_t			ktime, start, diff;
-	ktime_t			filltime = 0;
-	ktime_t			comparetime = 0;
-	s64			runtime = 0;
-	unsigned long long	total_len = 0;
-	unsigned long long	iops = 0;
-	u8			align = 0;
-	bool			is_memset = false;
-	dma_addr_t		*srcs;
-	dma_addr_t		*dma_pq;
-
-	set_freezable();
-
-	ret = -ENOMEM;
+	struct dmatest_info *info = thread->info;
+	struct dmatest_params *params = &info->params;
+	struct dma_chan *chan = thread->chan;
+	struct dma_device *dev = chan->device;
+	struct dmatest_data *src = &thread->src;
+	struct dmatest_data *dst = &thread->dst;
+	int ret;
 
-	smp_rmb();
-	thread->pending = false;
-	info = thread->info;
-	params = &info->params;
-	chan = thread->chan;
-	dev = chan->device;
-	dma_dev = dmaengine_get_dma_device(chan);
-
-	src = &thread->src;
-	dst = &thread->dst;
 	if (thread->type == DMA_MEMCPY) {
-		align = params->alignment < 0 ? dev->copy_align :
-						params->alignment;
+		*align = params->alignment < 0 ? dev->copy_align : params->alignment;
 		src->cnt = dst->cnt = 1;
+		*is_memset = false;
 	} else if (thread->type == DMA_MEMSET) {
-		align = params->alignment < 0 ? dev->fill_align :
-						params->alignment;
+		*align = params->alignment < 0 ? dev->fill_align : params->alignment;
 		src->cnt = dst->cnt = 1;
-		is_memset = true;
+		*is_memset = true;
 	} else if (thread->type == DMA_XOR) {
-		/* force odd to ensure dst = src */
 		src->cnt = min_odd(params->xor_sources | 1, dev->max_xor);
 		dst->cnt = 1;
-		align = params->alignment < 0 ? dev->xor_align :
-						params->alignment;
+		*align = params->alignment < 0 ? dev->xor_align : params->alignment;
+		*is_memset = false;
 	} else if (thread->type == DMA_PQ) {
-		/* force odd to ensure dst = src */
 		src->cnt = min_odd(params->pq_sources | 1, dma_maxpq(dev, 0));
 		dst->cnt = 2;
-		align = params->alignment < 0 ? dev->pq_align :
-						params->alignment;
+		*align = params->alignment < 0 ? dev->pq_align : params->alignment;
+		*is_memset = false;
 
-		pq_coefs = kmalloc(params->pq_sources + 1, GFP_KERNEL);
-		if (!pq_coefs)
-			goto err_thread_type;
+		thread->pq_coefs = kmalloc(params->pq_sources + 1, GFP_KERNEL);
+		if (!thread->pq_coefs)
+			return -ENOMEM;
 
-		for (i = 0; i < src->cnt; i++)
-			pq_coefs[i] = 1;
+		for (int i = 0; i < src->cnt; i++)
+			thread->pq_coefs[i] = 1;
 	} else
-		goto err_thread_type;
+		return -EINVAL;
 
-	/* Check if buffer count fits into map count variable (u8) */
+	/* Buffer count check */
 	if ((src->cnt + dst->cnt) >= 255) {
 		pr_err("too many buffers (%d of 255 supported)\n",
-		       src->cnt + dst->cnt);
+		       src->cnt + dst->cnt);
+		ret = -EINVAL;
 		goto err_free_coefs;
 	}
 
-	buf_size = params->buf_size;
-	if (1 << align > buf_size) {
+	*buf_size = params->buf_size;
+
+	if (1 << *align > *buf_size) {
 		pr_err("%u-byte buffer too small for %d-byte alignment\n",
-		       buf_size, 1 << align);
+		       *buf_size, 1 << *align);
+		ret = -EINVAL;
 		goto err_free_coefs;
 	}
 
+	/* Set GFP flags */
 	src->gfp_flags = GFP_KERNEL;
 	dst->gfp_flags = GFP_KERNEL;
+
 	if (params->nobounce) {
 		src->gfp_flags = GFP_DMA;
 		dst->gfp_flags = GFP_DMA;
 	}
 
-	if (dmatest_alloc_test_data(src, buf_size, align) < 0)
+	/* Allocate test data */
+	if (dmatest_alloc_test_data(src, *buf_size, *align) < 0) {
+		ret = -ENOMEM;
 		goto err_free_coefs;
+	}
 
-	if (dmatest_alloc_test_data(dst, buf_size, align) < 0)
+	if (dmatest_alloc_test_data(dst, *buf_size, *align) < 0) {
+		ret = -ENOMEM;
 		goto err_src;
+	}
 
-	set_user_nice(current, 10);
+	return 0;
 
-	srcs = kcalloc(src->cnt, sizeof(dma_addr_t), GFP_KERNEL);
-	if (!srcs)
-		goto err_dst;
+err_src:
+	dmatest_free_test_data(src);
+err_free_coefs:
+	kfree(thread->pq_coefs);
+	return ret;
+}
 
-	dma_pq = kcalloc(dst->cnt, sizeof(dma_addr_t), GFP_KERNEL);
-	if (!dma_pq)
-		goto err_srcs_array;
+static void dmatest_cleanup_test(struct dmatest_thread *thread)
+{
+	dmatest_free_test_data(&thread->src);
+	dmatest_free_test_data(&thread->dst);
+	kfree(thread->pq_coefs);
+	thread->pq_coefs = NULL;
+}
 
-	/*
-	 * src and dst buffers are freed by ourselves below
-	 */
-	if (params->polled)
-		flags = DMA_CTRL_ACK;
-	else
-		flags = DMA_CTRL_ACK | DMA_PREP_INTERRUPT;
+static int dmatest_do_dma_test(struct dmatest_thread *thread,
+			       unsigned int buf_size, u8 align, bool is_memset,
+			       unsigned int *total_tests,
+			       unsigned int *failed_tests,
+			       unsigned long long *total_len,
+			       ktime_t *filltime,
+			       ktime_t *comparetime)
+{
+	struct dmatest_info *info = thread->info;
+	struct dmatest_params *params = &info->params;
+	struct dma_chan *chan = thread->chan;
+	struct dma_device *dev = chan->device;
+	struct device *dma_dev = dmaengine_get_dma_device(chan);
+	struct dmatest_data *src = &thread->src;
+	struct dmatest_data *dst = &thread->dst;
+	struct dmatest_done *done = &thread->test_done;
+	dma_addr_t *srcs;
+	dma_addr_t *dma_pq;
+	struct dma_async_tx_descriptor *tx = NULL;
+	struct dmaengine_unmap_data *um;
+	dma_addr_t *dsts;
+	unsigned int len;
+	unsigned int error_count;
+	enum dma_ctrl_flags flags;
+	dma_cookie_t cookie;
+	enum dma_status status;
+	ktime_t start, diff;
+	int ret;
 
-	ktime = ktime_get();
-	while (!(kthread_should_stop() ||
-	       (params->iterations && total_tests >= params->iterations))) {
-		struct dma_async_tx_descriptor *tx = NULL;
-		struct dmaengine_unmap_data *um;
-		dma_addr_t *dsts;
-		unsigned int len;
-
-		total_tests++;
-
-		if (params->transfer_size) {
-			if (params->transfer_size >= buf_size) {
-				pr_err("%u-byte transfer size must be lower than %u-buffer size\n",
-				       params->transfer_size, buf_size);
-				break;
-			}
-			len = params->transfer_size;
-		} else if (params->norandom) {
-			len = buf_size;
-		} else {
-			len = dmatest_random() % buf_size + 1;
-		}
+	(*total_tests)++;
 
-		/* Do not alter transfer size explicitly defined by user */
-		if (!params->transfer_size) {
-			len = (len >> align) << align;
-			if (!len)
-				len = 1 << align;
+	/* Calculate transfer length */
+	if (params->transfer_size) {
+		if (params->transfer_size >= buf_size) {
+			pr_err("%u-byte transfer size must be lower than %u-buffer size\n",
+			       params->transfer_size, buf_size);
+			return -EINVAL;
 		}
-		total_len += len;
+		len = params->transfer_size;
+	} else if (params->norandom)
+		len = buf_size;
+	else
+		len = dmatest_random() % buf_size + 1;
 
-		if (params->norandom) {
-			src->off = 0;
-			dst->off = 0;
-		} else {
-			src->off = dmatest_random() % (buf_size - len + 1);
-			dst->off = dmatest_random() % (buf_size - len + 1);
+	/* Align length */
+	if (!params->transfer_size) {
+		len = (len >> align) << align;
+		if (!len)
+			len = 1 << align;
+	}
 
-			src->off = (src->off >> align) << align;
-			dst->off = (dst->off >> align) << align;
-		}
+	*total_len += len;
 
-		if (!params->noverify) {
-			start = ktime_get();
-			dmatest_init_srcs(src->aligned, src->off, len,
-					  buf_size, is_memset);
-			dmatest_init_dsts(dst->aligned, dst->off, len,
-					  buf_size, is_memset);
+	/* Calculate offsets */
+	if (params->norandom) {
+		src->off = 0;
+		dst->off = 0;
+	} else {
+		src->off = dmatest_random() % (buf_size - len + 1);
+		dst->off = dmatest_random() % (buf_size - len + 1);
+		src->off = (src->off >> align) << align;
+		dst->off = (dst->off >> align) << align;
+	}
 
-			diff = ktime_sub(ktime_get(), start);
-			filltime = ktime_add(filltime, diff);
-		}
+	/* Initialize buffers */
+	if (!params->noverify) {
+		start = ktime_get();
+		dmatest_init_srcs(src->aligned, src->off, len, buf_size, is_memset);
+		dmatest_init_dsts(dst->aligned, dst->off, len, buf_size, is_memset);
+		diff = ktime_sub(ktime_get(), start);
+		*filltime = ktime_add(*filltime, diff);
+	}
 
-		um = dmaengine_get_unmap_data(dma_dev, src->cnt + dst->cnt,
-					      GFP_KERNEL);
-		if (!um) {
-			failed_tests++;
-			result("unmap data NULL", total_tests,
-			       src->off, dst->off, len, ret);
-			continue;
-		}
+	/* Map buffers */
+	um = dmaengine_get_unmap_data(dma_dev, src->cnt + dst->cnt, GFP_KERNEL);
+	if (!um) {
+		(*failed_tests)++;
+		result("unmap data NULL", *total_tests, src->off, dst->off, len, -ENOMEM);
+		return -ENOMEM;
+	}
 
-		um->len = buf_size;
-		for (i = 0; i < src->cnt; i++) {
-			void *buf = src->aligned[i];
-			struct page *pg = virt_to_page(buf);
-			unsigned long pg_off = offset_in_page(buf);
+	um->len = buf_size;
+	srcs = kcalloc(src->cnt, sizeof(dma_addr_t), GFP_KERNEL);
+	if (!srcs) {
+		dmaengine_unmap_put(um);
+		return -ENOMEM;
+	}
 
-			um->addr[i] = dma_map_page(dma_dev, pg, pg_off,
-						   um->len, DMA_TO_DEVICE);
-			srcs[i] = um->addr[i] + src->off;
-			ret = dma_mapping_error(dma_dev, um->addr[i]);
-			if (ret) {
-				result("src mapping error", total_tests,
-				       src->off, dst->off, len, ret);
-				goto error_unmap_continue;
-			}
-			um->to_cnt++;
-		}
-		/* map with DMA_BIDIRECTIONAL to force writeback/invalidate */
-		dsts = &um->addr[src->cnt];
-		for (i = 0; i < dst->cnt; i++) {
-			void *buf = dst->aligned[i];
-			struct page *pg = virt_to_page(buf);
-			unsigned long pg_off = offset_in_page(buf);
-
-			dsts[i] = dma_map_page(dma_dev, pg, pg_off, um->len,
-					       DMA_BIDIRECTIONAL);
-			ret = dma_mapping_error(dma_dev, dsts[i]);
-			if (ret) {
-				result("dst mapping error", total_tests,
-				       src->off, dst->off, len, ret);
-				goto error_unmap_continue;
-			}
-			um->bidi_cnt++;
+	/* Map source buffers */
+	for (int i = 0; i < src->cnt; i++) {
+		void *buf = src->aligned[i];
+		struct page *pg = virt_to_page(buf);
+		unsigned long pg_off = offset_in_page(buf);
+
+		um->addr[i] = dma_map_page(dma_dev, pg, pg_off, um->len, DMA_TO_DEVICE);
+		srcs[i] = um->addr[i] + src->off;
+		ret = dma_mapping_error(dma_dev, um->addr[i]);
+		if (ret) {
+			result("src mapping error", *total_tests, src->off, dst->off, len, ret);
+			goto error_unmap;
 		}
+		um->to_cnt++;
+	}
 
-		if (thread->type == DMA_MEMCPY)
-			tx = dev->device_prep_dma_memcpy(chan,
-							 dsts[0] + dst->off,
-							 srcs[0], len, flags);
-		else if (thread->type == DMA_MEMSET)
-			tx = dev->device_prep_dma_memset(chan,
-						dsts[0] + dst->off,
-						*(src->aligned[0] + src->off),
-						len, flags);
-		else if (thread->type == DMA_XOR)
-			tx = dev->device_prep_dma_xor(chan,
-						      dsts[0] + dst->off,
-						      srcs, src->cnt,
-						      len, flags);
-		else if (thread->type == DMA_PQ) {
-			for (i = 0; i < dst->cnt; i++)
-				dma_pq[i] = dsts[i] + dst->off;
-			tx = dev->device_prep_dma_pq(chan, dma_pq, srcs,
-						     src->cnt, pq_coefs,
-						     len, flags);
+	/* Map destination buffers */
+	dsts = &um->addr[src->cnt];
+	for (int i = 0; i < dst->cnt; i++) {
+		void *buf = dst->aligned[i];
+		struct page *pg = virt_to_page(buf);
+		unsigned long pg_off = offset_in_page(buf);
+
+		dsts[i] = dma_map_page(dma_dev, pg, pg_off, um->len, DMA_BIDIRECTIONAL);
+		ret = dma_mapping_error(dma_dev, dsts[i]);
+		if (ret) {
+			result("dst mapping error", *total_tests, src->off, dst->off, len, ret);
+			goto error_unmap;
 		}
+		um->bidi_cnt++;
+	}
 
-		if (!tx) {
-			result("prep error", total_tests, src->off,
-			       dst->off, len, ret);
-			msleep(100);
-			goto error_unmap_continue;
-		}
+	/* Prepare DMA transaction */
+	if (params->polled)
+		flags = DMA_CTRL_ACK;
+	else
+		flags = DMA_CTRL_ACK | DMA_PREP_INTERRUPT;
 
-		done->done = false;
-		if (!params->polled) {
-			tx->callback = dmatest_callback;
-			tx->callback_param = done;
+	if (thread->type == DMA_MEMCPY) {
+		tx = dev->device_prep_dma_memcpy(chan, dsts[0] + dst->off,
+						srcs[0], len, flags);
+	} else if (thread->type == DMA_MEMSET) {
+		tx = dev->device_prep_dma_memset(chan, dsts[0] + dst->off,
+						*(src->aligned[0] + src->off), len, flags);
+	} else if (thread->type == DMA_XOR) {
+		tx = dev->device_prep_dma_xor(chan, dsts[0] + dst->off, srcs,
+					     src->cnt, len, flags);
+	} else if (thread->type == DMA_PQ) {
+		dma_pq = kcalloc(dst->cnt, sizeof(dma_addr_t), GFP_KERNEL);
+		if (!dma_pq) {
+			ret = -ENOMEM;
+			goto error_unmap;
 		}
-		cookie = tx->tx_submit(tx);
+		for (int i = 0; i < dst->cnt; i++)
+			dma_pq[i] = dsts[i] + dst->off;
+		tx = dev->device_prep_dma_pq(chan, dma_pq, srcs,
+					    src->cnt, thread->pq_coefs,
+					    len, flags);
+		kfree(dma_pq);
+	}
 
-		if (dma_submit_error(cookie)) {
-			result("submit error", total_tests, src->off,
-			       dst->off, len, ret);
-			msleep(100);
-			goto error_unmap_continue;
-		}
+	if (!tx) {
+		result("prep error", *total_tests, src->off, dst->off, len, ret);
+		ret = -EIO;
+		goto error_unmap;
+	}
 
-		if (params->polled) {
-			status = dma_sync_wait(chan, cookie);
-			dmaengine_terminate_sync(chan);
-			if (status == DMA_COMPLETE)
-				done->done = true;
-		} else {
-			dma_async_issue_pending(chan);
+	/* Submit transaction */
+	done->done = false;
+	if (!params->polled) {
+		tx->callback = dmatest_callback;
+		tx->callback_param = done;
+	}
 
-			wait_event_freezable_timeout(thread->done_wait,
-					done->done,
-					msecs_to_jiffies(params->timeout));
+	cookie = tx->tx_submit(tx);
 
-			status = dma_async_is_tx_complete(chan, cookie, NULL,
-							  NULL);
-		}
+	if (dma_submit_error(cookie)) {
+		result("submit error", *total_tests, src->off, dst->off, len, ret);
+		ret = -EIO;
+		goto error_unmap;
+	}
 
-		if (!done->done) {
-			result("test timed out", total_tests, src->off, dst->off,
-			       len, 0);
-			goto error_unmap_continue;
-		} else if (status != DMA_COMPLETE &&
-			   !(dma_has_cap(DMA_COMPLETION_NO_ORDER,
-					 dev->cap_mask) &&
-			     status == DMA_OUT_OF_ORDER)) {
-			result(status == DMA_ERROR ?
-			       "completion error status" :
-			       "completion busy status", total_tests, src->off,
-			       dst->off, len, ret);
-			goto error_unmap_continue;
-		}
+	/* Wait for completion */
+	if (params->polled) {
+		status = dma_sync_wait(chan, cookie);
+		dmaengine_terminate_sync(chan);
+		if (status == DMA_COMPLETE)
+			done->done = true;
+	} else {
+		dma_async_issue_pending(chan);
+		wait_event_freezable_timeout(thread->done_wait, done->done,
+				   msecs_to_jiffies(params->timeout));
+		status = dma_async_is_tx_complete(chan, cookie, NULL, NULL);
+	}
 
-		dmaengine_unmap_put(um);
+	if (!done->done) {
+		result("test timed out", *total_tests, src->off, dst->off, len, 0);
+		ret = -ETIMEDOUT;
+		goto error_unmap;
+	} else if (status != DMA_COMPLETE &&
+		   !(dma_has_cap(DMA_COMPLETION_NO_ORDER, dev->cap_mask) &&
+		     status == DMA_OUT_OF_ORDER)) {
+		result(status == DMA_ERROR ? "completion error status" :
+		       "completion busy status",
+		       *total_tests, src->off, dst->off, len, ret);
+		ret = -EIO;
+		goto error_unmap;
+	}
 
-		if (params->noverify) {
-			verbose_result("test passed", total_tests, src->off,
-				       dst->off, len, 0);
-			continue;
-		}
+	dmaengine_unmap_put(um);
+	kfree(srcs);
 
+	/* Verify results */
+	if (!params->noverify) {
 		start = ktime_get();
-		pr_debug("%s: verifying source buffer...\n", current->comm);
-		error_count = dmatest_verify(src->aligned, 0, src->off,
-				0, PATTERN_SRC, true, is_memset);
+		error_count = dmatest_verify(src->aligned, 0, src->off, 0,
+					    PATTERN_SRC, true, is_memset);
 		error_count += dmatest_verify(src->aligned, src->off,
-				src->off + len, src->off,
-				PATTERN_SRC | PATTERN_COPY, true, is_memset);
+					     src->off + len, src->off,
+					     PATTERN_SRC | PATTERN_COPY, true, is_memset);
 		error_count += dmatest_verify(src->aligned, src->off + len,
-				buf_size, src->off + len,
-				PATTERN_SRC, true, is_memset);
-
-		pr_debug("%s: verifying dest buffer...\n", current->comm);
-		error_count += dmatest_verify(dst->aligned, 0, dst->off,
-				0, PATTERN_DST, false, is_memset);
-
+					     buf_size, src->off + len, PATTERN_SRC, true, is_memset);
+		error_count += dmatest_verify(dst->aligned, 0, dst->off, 0,
+					     PATTERN_DST, false, is_memset);
 		error_count += dmatest_verify(dst->aligned, dst->off,
-				dst->off + len, src->off,
-				PATTERN_SRC | PATTERN_COPY, false, is_memset);
-
+					     dst->off + len, src->off,
+					     PATTERN_SRC | PATTERN_COPY, false, is_memset);
 		error_count += dmatest_verify(dst->aligned, dst->off + len,
-				buf_size, dst->off + len,
-				PATTERN_DST, false, is_memset);
+					     buf_size, dst->off + len, PATTERN_DST, false, is_memset);
 
 		diff = ktime_sub(ktime_get(), start);
-		comparetime = ktime_add(comparetime, diff);
+		*comparetime = ktime_add(*comparetime, diff);
 
 		if (error_count) {
-			result("data error", total_tests, src->off, dst->off,
-			       len, error_count);
-			failed_tests++;
+			result("data error", *total_tests, src->off,
+			       dst->off, len, error_count);
+			(*failed_tests)++;
+			ret = -EIO;
 		} else {
-			verbose_result("test passed", total_tests, src->off,
+			verbose_result("test passed", *total_tests, src->off,
 				       dst->off, len, 0);
+			ret = 0;
 		}
+	} else {
+		verbose_result("test passed", *total_tests, src->off, dst->off, len, 0);
+		ret = 0;
+	}
 
-		continue;
+	return ret;
 
-error_unmap_continue:
-		dmaengine_unmap_put(um);
-		failed_tests++;
+error_unmap:
+	dmaengine_unmap_put(um);
+	kfree(srcs);
+	(*failed_tests)++;
+	return ret;
+}
+
+/*
+ * This function repeatedly tests DMA transfers of various lengths and
+ * offsets for a given operation type until it is told to exit by
+ * kthread_stop(). There may be multiple threads running this function
+ * in parallel for a single channel, and there may be multiple channels
+ * being tested in parallel.
+ *
+ * Before each test, the source and destination buffer is initialized
+ * with a known pattern. This pattern is different depending on
+ * whether it's in an area which is supposed to be copied or
+ * overwritten, and different in the source and destination buffers.
+ * So if the DMA engine doesn't copy exactly what we tell it to copy,
+ * we'll notice.
+ */
+static int dmatest_func(void *data)
+{
+	struct dmatest_thread *thread = data;
+	struct dmatest_info *info = thread->info;
+	struct dmatest_params *params = &info->params;
+	struct dma_chan *chan = thread->chan;
+	unsigned int buf_size;
+	u8 align;
+	bool is_memset;
+	unsigned int failed_tests = 0;
+	unsigned int total_tests = 0;
+	ktime_t ktime, start;
+	ktime_t filltime = 0;
+	ktime_t comparetime = 0;
+	s64 runtime = 0;
+	unsigned long long total_len = 0;
+	unsigned long long iops = 0;
+	int ret;
+
+	set_freezable();
+	smp_rmb();
+	thread->pending = false;
+
+	/* Setup test parameters and allocate buffers */
+	ret = dmatest_setup_test(thread, &buf_size, &align, &is_memset);
+	if (ret)
+		goto err_thread_type;
+
+	set_user_nice(current, 10);
+
+	ktime = start = ktime_get();
+	while (!(kthread_should_stop() ||
+		(params->iterations && total_tests >= params->iterations))) {
+
+		ret = dmatest_do_dma_test(thread, buf_size, align, is_memset,
+					  &total_tests, &failed_tests, &total_len,
+					  &filltime, &comparetime);
+		if (ret < 0)
+			break;
 	}
+
 	ktime = ktime_sub(ktime_get(), ktime);
 	ktime = ktime_sub(ktime, comparetime);
 	ktime = ktime_sub(ktime, filltime);
 	runtime = ktime_to_us(ktime);
 
 	ret = 0;
-	kfree(dma_pq);
-err_srcs_array:
-	kfree(srcs);
-err_dst:
-	dmatest_free_test_data(dst);
-err_src:
-	dmatest_free_test_data(src);
-err_free_coefs:
-	kfree(pq_coefs);
+	dmatest_cleanup_test(thread);
+
 err_thread_type:
 	iops = dmatest_persec(runtime, total_tests);
 	pr_info("%s: summary %u tests, %u failures %llu.%02llu iops %llu KB/s (%d)\n",
-- 
2.47.2



* [PATCH 3/6] dmatest: move printing to its own routine
  2025-05-20 22:39 [PATCH 0/6] dma: fake-dma and IOVA tests Luis Chamberlain
  2025-05-20 22:39 ` [PATCH 1/6] fake-dma: add fake dma engine driver Luis Chamberlain
  2025-05-20 22:39 ` [PATCH 2/6] dmatest: split dmatest_func() into helpers Luis Chamberlain
@ 2025-05-20 22:39 ` Luis Chamberlain
  2025-05-21 14:41   ` Robin Murphy
  2025-05-21 22:26   ` kernel test robot
  2025-05-20 22:39 ` [PATCH 4/6] dmatest: add IOVA tests Luis Chamberlain
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 20+ messages in thread
From: Luis Chamberlain @ 2025-05-20 22:39 UTC (permalink / raw)
  To: vkoul, chenxiang66, m.szyprowski, robin.murphy, leon, jgg,
	alex.williamson, joel.granados
  Cc: iommu, dmaengine, linux-block, gost.dev, mcgrof

Move statistics printing to its own routine, and while at it, move the
test counters for the streaming DMA API into struct dmatest_thread so
that we can later add IOVA DMA API support and be able to differentiate
between the two.

While at it, use a mutex to serialize output so we don't get garbled
messages between different threads.

This makes no functional changes other than serializing the output
and prepping us for IOVA DMA API support.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 drivers/dma/dmatest.c | 77 ++++++++++++++++++++++++++++++++-----------
 1 file changed, 58 insertions(+), 19 deletions(-)

diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
index 921d89b4d2ed..b4c129e688e3 100644
--- a/drivers/dma/dmatest.c
+++ b/drivers/dma/dmatest.c
@@ -92,6 +92,8 @@ static bool polled;
 module_param(polled, bool, 0644);
 MODULE_PARM_DESC(polled, "Use polling for completion instead of interrupts");
 
+static DEFINE_MUTEX(stats_mutex);
+
 /**
  * struct dmatest_params - test parameters.
  * @nobounce:		prevent using swiotlb buffer
@@ -239,6 +241,12 @@ struct dmatest_thread {
 	bool			done;
 	bool			pending;
 	u8			*pq_coefs;
+
+	/* Streaming DMA statistics */
+	unsigned int streaming_tests;
+	unsigned int streaming_failures;
+	unsigned long long streaming_total_len;
+	ktime_t streaming_runtime;
 };
 
 struct dmatest_chan {
@@ -898,6 +906,30 @@ static int dmatest_do_dma_test(struct dmatest_thread *thread,
 	return ret;
 }
 
+static void dmatest_print_detailed_stats(struct dmatest_thread *thread)
+{
+	unsigned long long streaming_iops, streaming_kbs;
+	s64 streaming_runtime_us;
+
+	mutex_lock(&stats_mutex);
+
+	streaming_runtime_us = ktime_to_us(thread->streaming_runtime);
+	streaming_iops = dmatest_persec(streaming_runtime_us, thread->streaming_tests);
+	streaming_kbs = dmatest_KBs(streaming_runtime_us, thread->streaming_total_len);
+
+	pr_info("=== %s: DMA Test Results ===\n", current->comm);
+
+	/* Streaming DMA statistics */
+	pr_info("%s: STREAMING DMA: %u tests, %u failures\n",
+		current->comm, thread->streaming_tests, thread->streaming_failures);
+	pr_info("%s: STREAMING DMA: %llu.%02llu iops, %llu KB/s, %lld us total\n",
+		current->comm, FIXPT_TO_INT(streaming_iops), FIXPT_GET_FRAC(streaming_iops),
+		streaming_kbs, streaming_runtime_us);
+
+	pr_info("=== %s: End Results ===\n", current->comm);
+	mutex_unlock(&stats_mutex);
+}
+
 /*
  * This function repeatedly tests DMA transfers of various lengths and
  * offsets for a given operation type until it is told to exit by
@@ -921,20 +953,22 @@ static int dmatest_func(void *data)
 	unsigned int buf_size;
 	u8 align;
 	bool is_memset;
-	unsigned int failed_tests = 0;
-	unsigned int total_tests = 0;
-	ktime_t ktime, start;
+	unsigned int total_iterations = 0;
+	ktime_t start_time, streaming_start;
 	ktime_t filltime = 0;
 	ktime_t comparetime = 0;
-	s64 runtime = 0;
-	unsigned long long total_len = 0;
-	unsigned long long iops = 0;
 	int ret;
 
 	set_freezable();
 	smp_rmb();
 	thread->pending = false;
 
+	/* Initialize statistics */
+	thread->streaming_tests = 0;
+	thread->streaming_failures = 0;
+	thread->streaming_total_len = 0;
+	thread->streaming_runtime = 0;
+
 	/* Setup test parameters and allocate buffers */
 	ret = dmatest_setup_test(thread, &buf_size, &align, &is_memset);
 	if (ret)
@@ -942,34 +976,39 @@ static int dmatest_func(void *data)
 
 	set_user_nice(current, 10);
 
-	ktime = start = ktime_get();
+	start_time = ktime_get();
 	while (!(kthread_should_stop() ||
-		(params->iterations && total_tests >= params->iterations))) {
+		(params->iterations && total_iterations >= params->iterations))) {
 
+		/* Test streaming DMA path */
+		streaming_start = ktime_get();
 		ret = dmatest_do_dma_test(thread, buf_size, align, is_memset,
-					  &total_tests, &failed_tests, &total_len,
+					  &thread->streaming_tests, &thread->streaming_failures,
+					  &thread->streaming_total_len,
 					  &filltime, &comparetime);
+		thread->streaming_runtime = ktime_add(thread->streaming_runtime,
+						    ktime_sub(ktime_get(), streaming_start));
 		if (ret < 0)
 			break;
+
+		total_iterations++;
 	}
 
-	ktime = ktime_sub(ktime_get(), ktime);
-	ktime = ktime_sub(ktime, comparetime);
-	ktime = ktime_sub(ktime, filltime);
-	runtime = ktime_to_us(ktime);
+	/* Subtract fill and compare time from both paths */
+	thread->streaming_runtime = ktime_sub(thread->streaming_runtime,
+					   ktime_divns(filltime, 2));
+	thread->streaming_runtime = ktime_sub(thread->streaming_runtime,
+					   ktime_divns(comparetime, 2));
 
 	ret = 0;
 	dmatest_cleanup_test(thread);
 
 err_thread_type:
-	iops = dmatest_persec(runtime, total_tests);
-	pr_info("%s: summary %u tests, %u failures %llu.%02llu iops %llu KB/s (%d)\n",
-		current->comm, total_tests, failed_tests,
-		FIXPT_TO_INT(iops), FIXPT_GET_FRAC(iops),
-		dmatest_KBs(runtime, total_len), ret);
+	/* Print detailed statistics */
+	dmatest_print_detailed_stats(thread);
 
 	/* terminate all transfers on specified channels */
-	if (ret || failed_tests)
+	if (ret || (thread->streaming_failures))
 		dmaengine_terminate_sync(chan);
 
 	thread->done = true;
-- 
2.47.2



* [PATCH 4/6] dmatest: add IOVA tests
  2025-05-20 22:39 [PATCH 0/6] dma: fake-dma and IOVA tests Luis Chamberlain
                   ` (2 preceding siblings ...)
  2025-05-20 22:39 ` [PATCH 3/6] dmatest: move printing to its own routine Luis Chamberlain
@ 2025-05-20 22:39 ` Luis Chamberlain
  2025-05-20 22:39 ` [PATCH 5/6] dma-mapping: benchmark: move validation parameters into a helper Luis Chamberlain
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Luis Chamberlain @ 2025-05-20 22:39 UTC (permalink / raw)
  To: vkoul, chenxiang66, m.szyprowski, robin.murphy, leon, jgg,
	alex.williamson, joel.granados
  Cc: iommu, dmaengine, linux-block, gost.dev, mcgrof

The IOVA DMA API was added for very efficient mapping when using an
IOMMU, but we lack quick and easy tests for it. Just leverage the
existing dmatest driver. We skip the IOVA tests if use_dma_iommu() is
false, since dma_iova_try_alloc() would otherwise fail because an IOMMU
is needed.

This also lets you compare and contrast performance on both APIs.
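
For context, the IOVA path being added boils down to roughly the
sketch below. iova_roundtrip() is illustrative only and not part of
this patch; it assumes the device sits behind an IOMMU, which is why we
skip the tests when use_dma_iommu() is false.

  #include <linux/dma-mapping.h>

  static int iova_roundtrip(struct device *dev, phys_addr_t phys, size_t len)
  {
  	struct dma_iova_state state = {};
  	int ret;

  	/* only succeeds for devices using dma-iommu, hence the skip above */
  	if (!dma_iova_try_alloc(dev, &state, phys, len))
  		return -EOPNOTSUPP;

  	ret = dma_iova_link(dev, &state, phys, 0, len, DMA_BIDIRECTIONAL, 0);
  	if (ret)
  		goto out_free;

  	ret = dma_iova_sync(dev, &state, 0, len);
  	if (ret)
  		goto out_unlink;

  	/* state.addr is the bus address handed to the DMA engine */

  	dma_iova_unlink(dev, &state, 0, len, DMA_BIDIRECTIONAL, 0);
  	dma_iova_free(dev, &state);
  	return 0;

  out_unlink:
  	dma_iova_unlink(dev, &state, 0, len, DMA_BIDIRECTIONAL, 0);
  out_free:
  	dma_iova_free(dev, &state);
  	return ret;
  }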

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 drivers/dma/dmatest.c | 285 +++++++++++++++++++++++++++++++++---------
 1 file changed, 229 insertions(+), 56 deletions(-)

diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
index b4c129e688e3..deec99d43742 100644
--- a/drivers/dma/dmatest.c
+++ b/drivers/dma/dmatest.c
@@ -20,6 +20,7 @@
 #include <linux/random.h>
 #include <linux/slab.h>
 #include <linux/wait.h>
+#include <linux/iommu-dma.h>
 
 static bool nobounce;
 module_param(nobounce, bool, 0644);
@@ -247,6 +248,19 @@ struct dmatest_thread {
 	unsigned int streaming_failures;
 	unsigned long long streaming_total_len;
 	ktime_t streaming_runtime;
+
+	bool iova_support;
+	/* IOVA DMA statistics */
+	unsigned int iova_tests;
+	unsigned int iova_failures;
+	unsigned long long iova_total_len;
+	ktime_t iova_runtime;
+
+	/* IOVA-specific timings */
+	ktime_t iova_alloc_time;
+	ktime_t iova_link_time;
+	ktime_t iova_sync_time;
+	ktime_t iova_destroy_time;
 };
 
 struct dmatest_chan {
@@ -667,7 +681,8 @@ static int dmatest_do_dma_test(struct dmatest_thread *thread,
 			       unsigned int *failed_tests,
 			       unsigned long long *total_len,
 			       ktime_t *filltime,
-			       ktime_t *comparetime)
+			       ktime_t *comparetime,
+			       bool use_iova)
 {
 	struct dmatest_info *info = thread->info;
 	struct dmatest_params *params = &info->params;
@@ -677,10 +692,12 @@ static int dmatest_do_dma_test(struct dmatest_thread *thread,
 	struct dmatest_data *src = &thread->src;
 	struct dmatest_data *dst = &thread->dst;
 	struct dmatest_done *done = &thread->test_done;
+	struct dma_iova_state iova_state = {};
+	bool iova_used = false;
 	dma_addr_t *srcs;
 	dma_addr_t *dma_pq;
 	struct dma_async_tx_descriptor *tx = NULL;
-	struct dmaengine_unmap_data *um;
+	struct dmaengine_unmap_data *um = NULL;
 	dma_addr_t *dsts;
 	unsigned int len;
 	unsigned int error_count;
@@ -689,6 +706,7 @@ static int dmatest_do_dma_test(struct dmatest_thread *thread,
 	enum dma_status status;
 	ktime_t start, diff;
 	int ret;
+	enum dma_data_direction dir = DMA_BIDIRECTIONAL;
 
 	(*total_tests)++;
 
@@ -734,51 +752,123 @@ static int dmatest_do_dma_test(struct dmatest_thread *thread,
 		*filltime = ktime_add(*filltime, diff);
 	}
 
-	/* Map buffers */
-	um = dmaengine_get_unmap_data(dma_dev, src->cnt + dst->cnt, GFP_KERNEL);
-	if (!um) {
-		(*failed_tests)++;
-		result("unmap data NULL", *total_tests, src->off, dst->off, len, ret);
-		return -ENOMEM;
-	}
+	/* Try IOVA path if requested */
+	if (use_iova) {
+		phys_addr_t src_phys = virt_to_phys(src->aligned[0] + src->off);
+		ktime_t iova_start;
 
-	um->len = buf_size;
-	srcs = kcalloc(src->cnt, sizeof(dma_addr_t), GFP_KERNEL);
-	if (!srcs) {
-		dmaengine_unmap_put(um);
-		return -ENOMEM;
-	}
+		/* Track IOVA allocation time */
+		iova_start = ktime_get();
+		if (dma_iova_try_alloc(dma_dev, &iova_state, src_phys, len)) {
+			thread->iova_alloc_time = ktime_add(thread->iova_alloc_time,
+							   ktime_sub(ktime_get(), iova_start));
 
-	/* Map source buffers */
-	for (int i = 0; i < src->cnt; i++) {
-		void *buf = src->aligned[i];
-		struct page *pg = virt_to_page(buf);
-		unsigned long pg_off = offset_in_page(buf);
-
-		um->addr[i] = dma_map_page(dma_dev, pg, pg_off, um->len, DMA_TO_DEVICE);
-		srcs[i] = um->addr[i] + src->off;
-		ret = dma_mapping_error(dma_dev, um->addr[i]);
-		if (ret) {
-			result("src mapping error", *total_tests, src->off, dst->off, len, ret);
-			goto error_unmap;
+			/* Track IOVA link time */
+			iova_start = ktime_get();
+			ret = dma_iova_link(dma_dev, &iova_state, src_phys, 0, len, dir, 0);
+			thread->iova_link_time = ktime_add(thread->iova_link_time,
+							  ktime_sub(ktime_get(), iova_start));
+
+			if (ret) {
+				verbose_result("IOVA link failed",
+					      *total_tests, src->off, dst->off, len, ret);
+				dma_iova_free(dma_dev, &iova_state);
+				return ret;
+			}
+
+			/* Track IOVA sync time */
+			iova_start = ktime_get();
+			ret = dma_iova_sync(dma_dev, &iova_state, 0, len);
+			thread->iova_sync_time = ktime_add(thread->iova_sync_time,
+							  ktime_sub(ktime_get(), iova_start));
+
+			if (ret) {
+				verbose_result("IOVA sync failed",
+					      *total_tests, src->off, dst->off, len, ret);
+				dma_iova_unlink(dma_dev, &iova_state, 0, len, dir, 0);
+				dma_iova_free(dma_dev, &iova_state);
+				return ret;
+			}
+
+			iova_used = true;
+			verbose_result("IOVA path used", *total_tests, src->off, dst->off, len,
+				      (unsigned long)iova_state.addr);
+		} else {
+			thread->iova_alloc_time = ktime_add(thread->iova_alloc_time,
+							   ktime_sub(ktime_get(), iova_start));
+			verbose_result("IOVA allocation failed",
+				      *total_tests, src->off, dst->off, len, 0);
+			return -EINVAL;
 		}
-		um->to_cnt++;
 	}
 
-	/* Map destination buffers */
-	dsts = &um->addr[src->cnt];
-	for (int i = 0; i < dst->cnt; i++) {
-		void *buf = dst->aligned[i];
-		struct page *pg = virt_to_page(buf);
-		unsigned long pg_off = offset_in_page(buf);
-
-		dsts[i] = dma_map_page(dma_dev, pg, pg_off, um->len, DMA_BIDIRECTIONAL);
-		ret = dma_mapping_error(dma_dev, dsts[i]);
-		if (ret) {
-			result("dst mapping error", *total_tests, src->off, dst->off, len, ret);
-			goto error_unmap;
+	if (!iova_used) {
+		/* Regular DMA mapping path */
+		um = dmaengine_get_unmap_data(dma_dev, src->cnt + dst->cnt, GFP_KERNEL);
+		if (!um) {
+			(*failed_tests)++;
+			result("unmap data NULL", *total_tests, src->off, dst->off, len, ret);
+			return -ENOMEM;
 		}
-		um->bidi_cnt++;
+
+		um->len = buf_size;
+		srcs = kcalloc(src->cnt, sizeof(dma_addr_t), GFP_KERNEL);
+		if (!srcs) {
+			dmaengine_unmap_put(um);
+			return -ENOMEM;
+		}
+
+		/* Map source buffers */
+		for (int i = 0; i < src->cnt; i++) {
+			void *buf = src->aligned[i];
+			struct page *pg = virt_to_page(buf);
+			unsigned long pg_off = offset_in_page(buf);
+
+			um->addr[i] = dma_map_page(dma_dev, pg, pg_off, um->len, DMA_TO_DEVICE);
+			srcs[i] = um->addr[i] + src->off;
+			ret = dma_mapping_error(dma_dev, um->addr[i]);
+			if (ret) {
+				result("src mapping error", *total_tests, src->off, dst->off, len, ret);
+				goto error_unmap;
+			}
+			um->to_cnt++;
+		}
+
+		/* Map destination buffers */
+		dsts = &um->addr[src->cnt];
+		for (int i = 0; i < dst->cnt; i++) {
+			void *buf = dst->aligned[i];
+			struct page *pg = virt_to_page(buf);
+			unsigned long pg_off = offset_in_page(buf);
+
+			dsts[i] = dma_map_page(dma_dev, pg, pg_off, um->len, DMA_BIDIRECTIONAL);
+			ret = dma_mapping_error(dma_dev, dsts[i]);
+			if (ret) {
+				result("dst mapping error", *total_tests, src->off, dst->off, len, ret);
+				goto error_unmap;
+			}
+			um->bidi_cnt++;
+		}
+	} else {
+		/* For IOVA path, create simple arrays pointing to the IOVA */
+		srcs = kcalloc(src->cnt, sizeof(dma_addr_t), GFP_KERNEL);
+		if (!srcs) {
+			ret = -ENOMEM;
+			goto error_iova_cleanup;
+		}
+
+		dsts = kcalloc(dst->cnt, sizeof(dma_addr_t), GFP_KERNEL);
+		if (!dsts) {
+			ret = -ENOMEM;
+			kfree(srcs);
+			goto error_iova_cleanup;
+		}
+
+		/* For simplicity, use the same IOVA for src and dst in test */
+		for (int i = 0; i < src->cnt; i++)
+			srcs[i] = iova_state.addr;
+		for (int i = 0; i < dst->cnt; i++)
+			dsts[i] = iova_state.addr;
 	}
 
 	/* Prepare DMA transaction */
@@ -858,8 +948,18 @@ static int dmatest_do_dma_test(struct dmatest_thread *thread,
 		goto error_unmap;
 	}
 
-	dmaengine_unmap_put(um);
-	kfree(srcs);
+	/* Cleanup mappings */
+	if (iova_used) {
+		ktime_t destroy_start = ktime_get();
+		dma_iova_destroy(dma_dev, &iova_state, len, dir, 0);
+		thread->iova_destroy_time = ktime_add(thread->iova_destroy_time,
+						     ktime_sub(ktime_get(), destroy_start));
+		kfree(srcs);
+		kfree(dsts);
+	} else {
+		dmaengine_unmap_put(um);
+		kfree(srcs);
+	}
 
 	/* Verify results */
 	if (!params->noverify) {
@@ -883,49 +983,88 @@ static int dmatest_do_dma_test(struct dmatest_thread *thread,
 		*comparetime = ktime_add(*comparetime, diff);
 
 		if (error_count) {
-			result("data error", *total_tests, src->off,
-			       dst->off, len, error_count);
+			result(iova_used ? "IOVA data error" : "data error", *total_tests,
+			       src->off, dst->off, len, error_count);
 			(*failed_tests)++;
 			ret = -EIO;
 		} else {
-			verbose_result("test passed", *total_tests, src->off,
-				       dst->off, len, 0);
+			verbose_result(iova_used ? "IOVA test passed" : "test passed",
+				      *total_tests, src->off, dst->off, len, 0);
 			ret = 0;
 		}
 	} else {
-		verbose_result("test passed", *total_tests, src->off, dst->off, len, 0);
+		verbose_result(iova_used ? "IOVA test passed" : "test passed",
+			      *total_tests, src->off, dst->off, len, 0);
 		ret = 0;
 	}
 
 	return ret;
 
 error_unmap:
-	dmaengine_unmap_put(um);
-	kfree(srcs);
+	if (iova_used) {
+		kfree(srcs);
+		kfree(dsts);
+		goto error_iova_cleanup;
+	} else {
+		dmaengine_unmap_put(um);
+		kfree(srcs);
+	}
+	(*failed_tests)++;
+	return ret;
+
+error_iova_cleanup:
+	dma_iova_destroy(dma_dev, &iova_state, len, dir, 0);
 	(*failed_tests)++;
 	return ret;
 }
 
 static void dmatest_print_detailed_stats(struct dmatest_thread *thread)
 {
-	unsigned long long streaming_iops, streaming_kbs;
-	s64 streaming_runtime_us;
+	unsigned long long streaming_iops, streaming_kbs, iova_iops, iova_kbs;
+	s64 streaming_runtime_us, iova_runtime_us;
 
 	mutex_lock(&stats_mutex);
 
 	streaming_runtime_us = ktime_to_us(thread->streaming_runtime);
+	iova_runtime_us = ktime_to_us(thread->iova_runtime);
+
 	streaming_iops = dmatest_persec(streaming_runtime_us, thread->streaming_tests);
+	iova_iops = dmatest_persec(iova_runtime_us, thread->iova_tests);
+
 	streaming_kbs = dmatest_KBs(streaming_runtime_us, thread->streaming_total_len);
+	iova_kbs = dmatest_KBs(iova_runtime_us, thread->iova_total_len);
 
 	pr_info("=== %s: DMA Test Results ===\n", current->comm);
 
-	/* Streaming DMA statistics */
 	pr_info("%s: STREAMINMG DMA: %u tests, %u failures\n",
 		current->comm, thread->streaming_tests, thread->streaming_failures);
 	pr_info("%s: STREAMING DMA: %llu.%02llu iops, %llu KB/s, %lld us total\n",
 		current->comm, FIXPT_TO_INT(streaming_iops), FIXPT_GET_FRAC(streaming_iops),
 		streaming_kbs, streaming_runtime_us);
 
+	if (!thread->iova_support)
+		goto out;
+
+	pr_info("%s: IOVA DMA: %u tests, %u failures\n",
+		current->comm, thread->iova_tests, thread->iova_failures);
+	pr_info("%s: IOVA DMA: %llu.%02llu iops, %llu KB/s, %lld us total\n",
+		current->comm, FIXPT_TO_INT(iova_iops), FIXPT_GET_FRAC(iova_iops),
+		iova_kbs, iova_runtime_us);
+
+	pr_info("%s: IOVA timings: alloc %lld us, link %lld us, sync %lld us, destroy %lld us\n",
+		current->comm,
+		ktime_to_us(thread->iova_alloc_time),
+		ktime_to_us(thread->iova_link_time),
+		ktime_to_us(thread->iova_sync_time),
+		ktime_to_us(thread->iova_destroy_time));
+
+	if (streaming_runtime_us > 0 && iova_runtime_us > 0) {
+		long long speedup_pct = ((long long)streaming_runtime_us - iova_runtime_us) * 100 / streaming_runtime_us;
+		pr_info("%s: PERFORMANCE: IOVA is %lld%% %s than STREAMING DMA\n",
+			current->comm, abs(speedup_pct),
+			speedup_pct > 0 ? "faster" : "slower");
+	}
+out:
 	pr_info("=== %s: End Results ===\n", current->comm);
 	mutex_unlock(&stats_mutex);
 }
@@ -937,6 +1076,8 @@ static void dmatest_print_detailed_stats(struct dmatest_thread *thread)
  * in parallel for a single channel, and there may be multiple channels
  * being tested in parallel.
  *
+ * We test both the regular (streaming) DMA path and the IOVA path.
+ *
  * Before each test, the source and destination buffer is initialized
  * with a known pattern. This pattern is different depending on
  * whether it's in an area which is supposed to be copied or
@@ -950,11 +1091,12 @@ static int dmatest_func(void *data)
 	struct dmatest_info *info = thread->info;
 	struct dmatest_params *params = &info->params;
 	struct dma_chan *chan = thread->chan;
+	struct device *dev = dmaengine_get_dma_device(chan);
 	unsigned int buf_size;
 	u8 align;
 	bool is_memset;
 	unsigned int total_iterations = 0;
-	ktime_t start_time, streaming_start;
+	ktime_t start_time, streaming_start, iova_start;
 	ktime_t filltime = 0;
 	ktime_t comparetime = 0;
 	int ret;
@@ -968,6 +1110,15 @@ static int dmatest_func(void *data)
 	thread->streaming_failures = 0;
 	thread->streaming_total_len = 0;
 	thread->streaming_runtime = 0;
+	thread->iova_support = use_dma_iommu(dev);
+	thread->iova_tests = 0;
+	thread->iova_failures = 0;
+	thread->iova_total_len = 0;
+	thread->iova_runtime = 0;
+	thread->iova_alloc_time = 0;
+	thread->iova_link_time = 0;
+	thread->iova_sync_time = 0;
+	thread->iova_destroy_time = 0;
 
 	/* Setup test parameters and allocate buffers */
 	ret = dmatest_setup_test(thread, &buf_size, &align, &is_memset);
@@ -985,12 +1136,28 @@ static int dmatest_func(void *data)
 		ret = dmatest_do_dma_test(thread, buf_size, align, is_memset,
 					  &thread->streaming_tests, &thread->streaming_failures,
 					  &thread->streaming_total_len,
-					  &filltime, &comparetime);
+					  &filltime, &comparetime, false);
 		thread->streaming_runtime = ktime_add(thread->streaming_runtime,
 						    ktime_sub(ktime_get(), streaming_start));
 		if (ret < 0)
 			break;
 
+		/* Test IOVA path */
+		if (thread->iova_support) {
+			iova_start = ktime_get();
+			ret = dmatest_do_dma_test(thread, buf_size,
+						  align, is_memset,
+						  &thread->iova_tests,
+						  &thread->iova_failures,
+						  &thread->iova_total_len,
+						  &filltime, &comparetime, true);
+			thread->iova_runtime = ktime_add(thread->iova_runtime,
+							ktime_sub(ktime_get(),
+							iova_start));
+			if (ret < 0)
+				break;
+		}
+
 		total_iterations++;
 	}
 
@@ -999,6 +1166,12 @@ static int dmatest_func(void *data)
 					   ktime_divns(filltime, 2));
 	thread->streaming_runtime = ktime_sub(thread->streaming_runtime,
 					   ktime_divns(comparetime, 2));
+	if (thread->iova_support) {
+		thread->iova_runtime = ktime_sub(thread->iova_runtime,
+						ktime_divns(filltime, 2));
+		thread->iova_runtime = ktime_sub(thread->iova_runtime,
+						ktime_divns(comparetime, 2));
+	}
 
 	ret = 0;
 	dmatest_cleanup_test(thread);
@@ -1008,7 +1181,7 @@ static int dmatest_func(void *data)
 	dmatest_print_detailed_stats(thread);
 
 	/* terminate all transfers on specified channels */
-	if (ret || (thread->streaming_failures))
+	if (ret || (thread->streaming_failures + thread->iova_failures))
 		dmaengine_terminate_sync(chan);
 
 	thread->done = true;
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/6] dma-mapping: benchmark: move validation parameters into a helper
  2025-05-20 22:39 [PATCH 0/6] dma: fake-dma and IOVA tests Luis Chamberlain
                   ` (3 preceding siblings ...)
  2025-05-20 22:39 ` [PATCH 4/6] dmatest: add IOVA tests Luis Chamberlain
@ 2025-05-20 22:39 ` Luis Chamberlain
  2025-05-20 22:39 ` [PATCH 6/6] dma-mapping: benchmark: add IOVA support Luis Chamberlain
  2025-05-21 11:17 ` [PATCH 0/6] dma: fake-dma and IOVA tests Leon Romanovsky
  6 siblings, 0 replies; 20+ messages in thread
From: Luis Chamberlain @ 2025-05-20 22:39 UTC (permalink / raw)
  To: vkoul, chenxiang66, m.szyprowski, robin.murphy, leon, jgg,
	alex.williamson, joel.granados
  Cc: iommu, dmaengine, linux-block, gost.dev, mcgrof

Before we run the benchmark we validate the input parameters. Move
this validation into a helper so it can be reused for other types of
DMA benchmark tests. A subsequent patch will use it for another type
of DMA benchmark.
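
With the helper in place, the ioctl handler reduces to roughly the
following shape (a sketch of the resulting code, matching the hunk
below):

	if (copy_from_user(&map->bparam, argp, sizeof(map->bparam)))
		return -EFAULT;

	ret = validate_benchmark_params(map);
	if (ret)
		return ret;

	switch (cmd) {
	case DMA_MAP_BENCHMARK:
		/* set the DMA mask and run the streaming benchmark as before */
		break;
	default:
		return -EINVAL;
	}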

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 kernel/dma/map_benchmark.c | 99 +++++++++++++++++++++-----------------
 1 file changed, 54 insertions(+), 45 deletions(-)

diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c
index cc19a3efea89..b54345a757cb 100644
--- a/kernel/dma/map_benchmark.c
+++ b/kernel/dma/map_benchmark.c
@@ -196,6 +196,55 @@ static int do_map_benchmark(struct map_benchmark_data *map)
 	return ret;
 }
 
+static int validate_benchmark_params(struct map_benchmark_data *map)
+{
+	if (map->bparam.threads == 0 ||
+	    map->bparam.threads > DMA_MAP_MAX_THREADS) {
+		pr_err("invalid thread number\n");
+		return -EINVAL;
+	}
+
+	if (map->bparam.seconds == 0 ||
+	    map->bparam.seconds > DMA_MAP_MAX_SECONDS) {
+		pr_err("invalid duration seconds\n");
+		return -EINVAL;
+	}
+
+	if (map->bparam.dma_trans_ns > DMA_MAP_MAX_TRANS_DELAY) {
+		pr_err("invalid transmission delay\n");
+		return -EINVAL;
+	}
+
+	if (map->bparam.node != NUMA_NO_NODE &&
+	    (map->bparam.node < 0 || map->bparam.node >= MAX_NUMNODES ||
+	     !node_possible(map->bparam.node))) {
+		pr_err("invalid numa node\n");
+		return -EINVAL;
+	}
+
+	if (map->bparam.granule < 1 || map->bparam.granule > 1024) {
+		pr_err("invalid granule size\n");
+		return -EINVAL;
+	}
+
+	switch (map->bparam.dma_dir) {
+	case DMA_MAP_BIDIRECTIONAL:
+		map->dir = DMA_BIDIRECTIONAL;
+		break;
+	case DMA_MAP_FROM_DEVICE:
+		map->dir = DMA_FROM_DEVICE;
+		break;
+	case DMA_MAP_TO_DEVICE:
+		map->dir = DMA_TO_DEVICE;
+		break;
+	default:
+		pr_err("invalid DMA direction\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static long map_benchmark_ioctl(struct file *file, unsigned int cmd,
 		unsigned long arg)
 {
@@ -207,54 +256,13 @@ static long map_benchmark_ioctl(struct file *file, unsigned int cmd,
 	if (copy_from_user(&map->bparam, argp, sizeof(map->bparam)))
 		return -EFAULT;
 
+	ret = validate_benchmark_params(map);
+	if (ret)
+		return ret;
+
 	switch (cmd) {
 	case DMA_MAP_BENCHMARK:
-		if (map->bparam.threads == 0 ||
-		    map->bparam.threads > DMA_MAP_MAX_THREADS) {
-			pr_err("invalid thread number\n");
-			return -EINVAL;
-		}
-
-		if (map->bparam.seconds == 0 ||
-		    map->bparam.seconds > DMA_MAP_MAX_SECONDS) {
-			pr_err("invalid duration seconds\n");
-			return -EINVAL;
-		}
-
-		if (map->bparam.dma_trans_ns > DMA_MAP_MAX_TRANS_DELAY) {
-			pr_err("invalid transmission delay\n");
-			return -EINVAL;
-		}
-
-		if (map->bparam.node != NUMA_NO_NODE &&
-		    (map->bparam.node < 0 || map->bparam.node >= MAX_NUMNODES ||
-		     !node_possible(map->bparam.node))) {
-			pr_err("invalid numa node\n");
-			return -EINVAL;
-		}
-
-		if (map->bparam.granule < 1 || map->bparam.granule > 1024) {
-			pr_err("invalid granule size\n");
-			return -EINVAL;
-		}
-
-		switch (map->bparam.dma_dir) {
-		case DMA_MAP_BIDIRECTIONAL:
-			map->dir = DMA_BIDIRECTIONAL;
-			break;
-		case DMA_MAP_FROM_DEVICE:
-			map->dir = DMA_FROM_DEVICE;
-			break;
-		case DMA_MAP_TO_DEVICE:
-			map->dir = DMA_TO_DEVICE;
-			break;
-		default:
-			pr_err("invalid DMA direction\n");
-			return -EINVAL;
-		}
-
 		old_dma_mask = dma_get_mask(map->dev);
-
 		ret = dma_set_mask(map->dev,
 				   DMA_BIT_MASK(map->bparam.dma_bits));
 		if (ret) {
@@ -263,6 +271,7 @@ static long map_benchmark_ioctl(struct file *file, unsigned int cmd,
 			return -EINVAL;
 		}
 
+		/* Run streaming DMA benchmark */
 		ret = do_map_benchmark(map);
 
 		/*
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 6/6] dma-mapping: benchmark: add IOVA support
  2025-05-20 22:39 [PATCH 0/6] dma: fake-dma and IOVA tests Luis Chamberlain
                   ` (4 preceding siblings ...)
  2025-05-20 22:39 ` [PATCH 5/6] dma-mapping: benchmark: move validation parameters into a helper Luis Chamberlain
@ 2025-05-20 22:39 ` Luis Chamberlain
  2025-05-21 11:58   ` kernel test robot
  2025-05-21 16:08   ` Robin Murphy
  2025-05-21 11:17 ` [PATCH 0/6] dma: fake-dma and IOVA tests Leon Romanovsky
  6 siblings, 2 replies; 20+ messages in thread
From: Luis Chamberlain @ 2025-05-20 22:39 UTC (permalink / raw)
  To: vkoul, chenxiang66, m.szyprowski, robin.murphy, leon, jgg,
	alex.williamson, joel.granados
  Cc: iommu, dmaengine, linux-block, gost.dev, mcgrof

Add support for the IOVA DMA API, and allow comparing and contrasting
it against the streaming DMA API. Since the IOVA API is intended as an
enhancement for devices behind an IOMMU that supports DMA through it,
only allow the IOVA path for devices which have this support, that is,
when use_dma_iommu() is true. We don't try a fallback, as the goal is
only to use the IOVA API when intended.
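
For reference, one loop iteration of the IOVA benchmark thread measures
the four phases roughly as sketched below (condensed from
benchmark_thread_iova() in this patch; statistics accounting and error
unwinding are omitted, and dev, dir, buf, size and dma_trans_ns stand
in for the corresponding map_benchmark_data fields):

	struct dma_iova_state state = {};
	phys_addr_t phys = virt_to_phys(buf);

	if (!dma_iova_try_alloc(dev, &state, phys, size))	/* alloc */
		return -EOPNOTSUPP;	/* the real thread just skips the iteration */
	ret = dma_iova_link(dev, &state, phys, 0, size, dir, 0);  /* link */
	ret = ret ?: dma_iova_sync(dev, &state, 0, size);	/* sync */

	ndelay(dma_trans_ns);		/* pretend the device is transferring */

	dma_iova_destroy(dev, &state, size, dir, 0);		/* destroy */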

Example output, using intel-iommu on qemu against a random number
generator device. This output is completely artificial as it's a VM
using more threads than the guest even has cores; the goal was simply
to visualize some numerical output on both paths:

./tools/testing/selftests/dma/dma_map_benchmark -t 24 -i 2
=== DMA Mapping Benchmark Results ===
Configuration: threads:24 seconds:20 node:-1 dir:BIDIRECTIONAL granule:1 iova:2
Buffer size: 1 pages (4 KB)

STREAMING DMA RESULTS:
  Map   latency:    12.3 μs (σ=257.9 μs)
  Unmap latency:     3.7 μs (σ=142.5 μs)
  Total latency:    16.0 μs

IOVA DMA RESULTS:
  Alloc   latency:     0.1 μs (σ= 31.1 μs)
  Link    latency:     2.5 μs (σ=116.9 μs)
  Sync    latency:     9.6 μs (σ=227.8 μs)
  Destroy latency:     3.6 μs (σ=141.2 μs)
  Total latency:    15.8 μs

PERFORMANCE COMPARISON:
  Streaming DMA total:    16.0 μs
  IOVA DMA total:         15.8 μs
  Performance ratio:      0.99x (IOVA is 1.3% faster)
  Streaming throughput:    62500 ops/sec
  IOVA throughput:         63291 ops/sec
  Streaming bandwidth:     244.1 MB/s
  IOVA bandwidth:          247.2 MB/s

IOVA OPERATION BREAKDOWN:
  Alloc:     0.6% (   0.1 μs)
  Link:     15.8% (   2.5 μs)
  Sync:     60.8% (   9.6 μs)
  Destroy:  22.8% (   3.6 μs)

RECOMMENDATIONS:
  ~ IOVA and Streaming APIs show similar performance
=== End of Benchmark ===
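
For completeness, the selftest's new -i flag picks the mode (0 = streaming
only, 1 = IOVA only, 2 = both, as encoded in map_benchmark.use_iova); for
example, an IOVA-only run with 8 threads for 10 seconds on one page would
be something like:

	./tools/testing/selftests/dma/dma_map_benchmark -t 8 -s 10 -g 1 -i 1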

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 include/linux/map_benchmark.h                 |  11 +
 kernel/dma/Kconfig                            |   4 +-
 kernel/dma/map_benchmark.c                    | 417 +++++++++++++++++-
 .../testing/selftests/dma/dma_map_benchmark.c | 145 +++++-
 4 files changed, 562 insertions(+), 15 deletions(-)

diff --git a/include/linux/map_benchmark.h b/include/linux/map_benchmark.h
index 62674c83bde4..da7c9e3ddf21 100644
--- a/include/linux/map_benchmark.h
+++ b/include/linux/map_benchmark.h
@@ -7,6 +7,7 @@
 #define _KERNEL_DMA_BENCHMARK_H
 
 #define DMA_MAP_BENCHMARK       _IOWR('d', 1, struct map_benchmark)
+#define DMA_MAP_BENCHMARK_IOVA	_IOWR('d', 2, struct map_benchmark)
 #define DMA_MAP_MAX_THREADS     1024
 #define DMA_MAP_MAX_SECONDS     300
 #define DMA_MAP_MAX_TRANS_DELAY (10 * NSEC_PER_MSEC)
@@ -27,5 +28,15 @@ struct map_benchmark {
 	__u32 dma_dir; /* DMA data direction */
 	__u32 dma_trans_ns; /* time for DMA transmission in ns */
 	__u32 granule;  /* how many PAGE_SIZE will do map/unmap once a time */
+	__u32 has_iommu_dma;
+	__u64 avg_iova_alloc_100ns;
+	__u64 avg_iova_link_100ns;
+	__u64 avg_iova_sync_100ns;
+	__u64 avg_iova_destroy_100ns;
+	__u64 iova_alloc_stddev;
+	__u64 iova_link_stddev;
+	__u64 iova_sync_stddev;
+	__u64 iova_destroy_stddev;
+	__u32 use_iova; /* 0=regular, 1=IOVA, 2=both */
 };
 #endif /* _KERNEL_DMA_BENCHMARK_H */
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 31cfdb6b4bc3..e2d5784f46eb 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -261,10 +261,10 @@ config DMA_API_DEBUG
 	  If unsure, say N.
 
 config DMA_MAP_BENCHMARK
-	bool "Enable benchmarking of streaming DMA mapping"
+	bool "Enable benchmarking of streaming and IOVA DMA mapping"
 	depends on DEBUG_FS
 	help
 	  Provides /sys/kernel/debug/dma_map_benchmark that helps with testing
-	  performance of dma_(un)map_page.
+	  performance of the streaming DMA dma_(un)map_page and IOVA API.
 
 	  See tools/testing/selftests/dma/dma_map_benchmark.c
diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c
index b54345a757cb..3ae34433420b 100644
--- a/kernel/dma/map_benchmark.c
+++ b/kernel/dma/map_benchmark.c
@@ -18,6 +18,7 @@
 #include <linux/platform_device.h>
 #include <linux/slab.h>
 #include <linux/timekeeping.h>
+#include <linux/iommu-dma.h>
 
 struct map_benchmark_data {
 	struct map_benchmark bparam;
@@ -29,9 +30,127 @@ struct map_benchmark_data {
 	atomic64_t sum_sq_map;
 	atomic64_t sum_sq_unmap;
 	atomic64_t loops;
+
+	/* IOVA-specific counters */
+	atomic64_t sum_iova_alloc_100ns;
+	atomic64_t sum_iova_link_100ns;
+	atomic64_t sum_iova_sync_100ns;
+	atomic64_t sum_iova_destroy_100ns;
+	atomic64_t sum_sq_iova_alloc;
+	atomic64_t sum_sq_iova_link;
+	atomic64_t sum_sq_iova_sync;
+	atomic64_t sum_sq_iova_destroy;
+	atomic64_t iova_loops;
 };
 
-static int map_benchmark_thread(void *data)
+static int benchmark_thread_iova(void *data)
+{
+	void *buf;
+	struct map_benchmark_data *map = data;
+	int npages = map->bparam.granule;
+	u64 size = npages * PAGE_SIZE;
+	int ret = 0;
+	enum dma_data_direction dir = map->dir;
+
+	buf = alloc_pages_exact(size, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	while (!kthread_should_stop()) {
+		struct dma_iova_state iova_state;
+		phys_addr_t phys;
+		ktime_t alloc_stime, alloc_etime, link_stime, link_etime;
+		ktime_t sync_stime, sync_etime, destroy_stime, destroy_etime;
+		ktime_t alloc_delta, link_delta, sync_delta, destroy_delta;
+		u64 alloc_100ns, link_100ns, sync_100ns, destroy_100ns;
+		u64 alloc_sq, link_sq, sync_sq, destroy_sq;
+
+		/* Stain cache if needed */
+		if (map->dir != DMA_FROM_DEVICE)
+			memset(buf, 0x66, size);
+
+		phys = virt_to_phys(buf);
+
+		/* IOVA allocation */
+		alloc_stime = ktime_get();
+		if (!dma_iova_try_alloc(map->dev, &iova_state, phys, size)) {
+			pr_warn_once("IOVA allocation not supported on device %s\n",
+				     dev_name(map->dev));
+			/* IOVA not supported, skip this iteration */
+			cond_resched();
+			continue;
+		}
+		alloc_etime = ktime_get();
+		alloc_delta = ktime_sub(alloc_etime, alloc_stime);
+
+		/* IOVA linking */
+		link_stime = ktime_get();
+		ret = dma_iova_link(map->dev, &iova_state, phys, 0, size, dir, 0);
+		link_etime = ktime_get();
+		link_delta = ktime_sub(link_etime, link_stime);
+
+		if (ret) {
+			pr_err("dma_iova_link failed on %s\n", dev_name(map->dev));
+			dma_iova_free(map->dev, &iova_state);
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		/* IOVA sync */
+		sync_stime = ktime_get();
+		ret = dma_iova_sync(map->dev, &iova_state, 0, size);
+		sync_etime = ktime_get();
+		sync_delta = ktime_sub(sync_etime, sync_stime);
+
+		if (ret) {
+			pr_err("dma_iova_sync failed on %s\n", dev_name(map->dev));
+			dma_iova_unlink(map->dev, &iova_state, 0, size, dir, 0);
+			dma_iova_free(map->dev, &iova_state);
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		/* Pretend DMA is transmitting */
+		ndelay(map->bparam.dma_trans_ns);
+
+		/* IOVA destroy */
+		destroy_stime = ktime_get();
+		dma_iova_destroy(map->dev, &iova_state, size, dir, 0);
+		destroy_etime = ktime_get();
+		destroy_delta = ktime_sub(destroy_etime, destroy_stime);
+
+		/* Calculate sum and sum of squares */
+		alloc_100ns = div64_ul(alloc_delta, 100);
+		link_100ns = div64_ul(link_delta, 100);
+		sync_100ns = div64_ul(sync_delta, 100);
+		destroy_100ns = div64_ul(destroy_delta, 100);
+
+		alloc_sq = alloc_100ns * alloc_100ns;
+		link_sq = link_100ns * link_100ns;
+		sync_sq = sync_100ns * sync_100ns;
+		destroy_sq = destroy_100ns * destroy_100ns;
+
+		atomic64_add(alloc_100ns, &map->sum_iova_alloc_100ns);
+		atomic64_add(link_100ns, &map->sum_iova_link_100ns);
+		atomic64_add(sync_100ns, &map->sum_iova_sync_100ns);
+		atomic64_add(destroy_100ns, &map->sum_iova_destroy_100ns);
+
+		atomic64_add(alloc_sq, &map->sum_sq_iova_alloc);
+		atomic64_add(link_sq, &map->sum_sq_iova_link);
+		atomic64_add(sync_sq, &map->sum_sq_iova_sync);
+		atomic64_add(destroy_sq, &map->sum_sq_iova_destroy);
+
+		atomic64_inc(&map->iova_loops);
+
+		cond_resched();
+	}
+
+out:
+	free_pages_exact(buf, size);
+	return ret;
+}
+
+static int benchmark_thread_streaming(void *data)
 {
 	void *buf;
 	dma_addr_t dma_addr;
@@ -96,7 +215,7 @@ static int map_benchmark_thread(void *data)
 		 * we may hangup the kernel in a non-preemptible kernel when
 		 * the test kthreads number >= CPU number, the test kthreads
 		 * will run endless on every CPU since the thread resposible
-		 * for notifying the kthread stop (in do_map_benchmark())
+		 * for notifying the kthread stop (in do_streaming_benchmark())
 		 * could not be scheduled.
 		 *
 		 * Note this may degrade the test concurrency since the test
@@ -112,7 +231,250 @@ static int map_benchmark_thread(void *data)
 	return ret;
 }
 
-static int do_map_benchmark(struct map_benchmark_data *map)
+static int do_iova_benchmark(struct map_benchmark_data *map)
+{
+	struct task_struct **tsk;
+	int threads = map->bparam.threads;
+	int node = map->bparam.node;
+	u64 iova_loops;
+	int ret = 0;
+	int i;
+
+	tsk = kmalloc_array(threads, sizeof(*tsk), GFP_KERNEL);
+	if (!tsk)
+		return -ENOMEM;
+
+	get_device(map->dev);
+
+	/* Create IOVA threads only */
+	for (i = 0; i < threads; i++) {
+		tsk[i] = kthread_create_on_node(benchmark_thread_iova, map,
+				node, "dma-iova-benchmark/%d", i);
+		if (IS_ERR(tsk[i])) {
+			pr_err("create dma_iova thread failed\n");
+			ret = PTR_ERR(tsk[i]);
+			while (--i >= 0)
+				kthread_stop(tsk[i]);
+			goto out;
+		}
+
+		if (node != NUMA_NO_NODE)
+			kthread_bind_mask(tsk[i], cpumask_of_node(node));
+	}
+
+	/* Clear previous IOVA benchmark values */
+	atomic64_set(&map->sum_iova_alloc_100ns, 0);
+	atomic64_set(&map->sum_iova_link_100ns, 0);
+	atomic64_set(&map->sum_iova_sync_100ns, 0);
+	atomic64_set(&map->sum_iova_destroy_100ns, 0);
+	atomic64_set(&map->sum_sq_iova_alloc, 0);
+	atomic64_set(&map->sum_sq_iova_link, 0);
+	atomic64_set(&map->sum_sq_iova_sync, 0);
+	atomic64_set(&map->sum_sq_iova_destroy, 0);
+	atomic64_set(&map->iova_loops, 0);
+
+	/* Start all threads */
+	for (i = 0; i < threads; i++) {
+		get_task_struct(tsk[i]);
+		wake_up_process(tsk[i]);
+	}
+
+	msleep_interruptible(map->bparam.seconds * 1000);
+
+	/* Stop all threads */
+	for (i = 0; i < threads; i++) {
+		int kthread_ret = kthread_stop_put(tsk[i]);
+		if (kthread_ret)
+			ret = kthread_ret;
+	}
+
+	if (ret)
+		goto out;
+
+	/* Calculate IOVA statistics */
+	iova_loops = atomic64_read(&map->iova_loops);
+	if (likely(iova_loops > 0)) {
+		u64 alloc_variance, link_variance, sync_variance, destroy_variance;
+		u64 sum_alloc = atomic64_read(&map->sum_iova_alloc_100ns);
+		u64 sum_link = atomic64_read(&map->sum_iova_link_100ns);
+		u64 sum_sync = atomic64_read(&map->sum_iova_sync_100ns);
+		u64 sum_destroy = atomic64_read(&map->sum_iova_destroy_100ns);
+		u64 sum_sq_alloc = atomic64_read(&map->sum_sq_iova_alloc);
+		u64 sum_sq_link = atomic64_read(&map->sum_sq_iova_link);
+		u64 sum_sq_sync = atomic64_read(&map->sum_sq_iova_sync);
+		u64 sum_sq_destroy = atomic64_read(&map->sum_sq_iova_destroy);
+
+		/* Average latencies */
+		map->bparam.avg_iova_alloc_100ns = div64_u64(sum_alloc, iova_loops);
+		map->bparam.avg_iova_link_100ns = div64_u64(sum_link, iova_loops);
+		map->bparam.avg_iova_sync_100ns = div64_u64(sum_sync, iova_loops);
+		map->bparam.avg_iova_destroy_100ns = div64_u64(sum_destroy, iova_loops);
+
+		/* Standard deviations */
+		alloc_variance = div64_u64(sum_sq_alloc, iova_loops) -
+				map->bparam.avg_iova_alloc_100ns * map->bparam.avg_iova_alloc_100ns;
+		link_variance = div64_u64(sum_sq_link, iova_loops) -
+				map->bparam.avg_iova_link_100ns * map->bparam.avg_iova_link_100ns;
+		sync_variance = div64_u64(sum_sq_sync, iova_loops) -
+				map->bparam.avg_iova_sync_100ns * map->bparam.avg_iova_sync_100ns;
+		destroy_variance = div64_u64(sum_sq_destroy, iova_loops) -
+				map->bparam.avg_iova_destroy_100ns * map->bparam.avg_iova_destroy_100ns;
+
+		map->bparam.iova_alloc_stddev = int_sqrt64(alloc_variance);
+		map->bparam.iova_link_stddev = int_sqrt64(link_variance);
+		map->bparam.iova_sync_stddev = int_sqrt64(sync_variance);
+		map->bparam.iova_destroy_stddev = int_sqrt64(destroy_variance);
+	}
+
+out:
+	put_device(map->dev);
+	kfree(tsk);
+	return ret;
+}
+
+static int do_streaming_iova_benchmark(struct map_benchmark_data *map)
+{
+	struct task_struct **tsk;
+	int threads = map->bparam.threads;
+	int node = map->bparam.node;
+	int regular_threads, iova_threads;
+	u64 loops, iova_loops;
+	int ret = 0;
+	int i;
+
+	tsk = kmalloc_array(threads * 2, sizeof(*tsk), GFP_KERNEL);
+	if (!tsk)
+		return -ENOMEM;
+
+	get_device(map->dev);
+
+	/* Split threads between regular and IOVA testing */
+	regular_threads = threads / 2;
+	iova_threads = threads - regular_threads;
+
+	/* Create streaming DMA threads */
+	for (i = 0; i < regular_threads; i++) {
+		tsk[i] = kthread_create_on_node(benchmark_thread_streaming, map,
+				node, "dma-streaming-benchmark/%d", i);
+		if (IS_ERR(tsk[i])) {
+			pr_err("create dma_map thread failed\n");
+			ret = PTR_ERR(tsk[i]);
+			while (--i >= 0)
+				kthread_stop(tsk[i]);
+			goto out;
+		}
+
+		if (node != NUMA_NO_NODE)
+			kthread_bind_mask(tsk[i], cpumask_of_node(node));
+	}
+
+	/* Create IOVA DMA threads */
+	for (i = regular_threads; i < threads; i++) {
+		tsk[i] = kthread_create_on_node(benchmark_thread_iova, map,
+				node, "dma-iova-benchmark/%d", i - regular_threads);
+		if (IS_ERR(tsk[i])) {
+			pr_err("create dma_iova thread failed\n");
+			ret = PTR_ERR(tsk[i]);
+			while (--i >= 0)
+				kthread_stop(tsk[i]);
+			goto out;
+		}
+
+		if (node != NUMA_NO_NODE)
+			kthread_bind_mask(tsk[i], cpumask_of_node(node));
+	}
+
+	/* Clear previous benchmark values */
+	atomic64_set(&map->sum_map_100ns, 0);
+	atomic64_set(&map->sum_unmap_100ns, 0);
+	atomic64_set(&map->sum_sq_map, 0);
+	atomic64_set(&map->sum_sq_unmap, 0);
+	atomic64_set(&map->loops, 0);
+
+	atomic64_set(&map->sum_iova_alloc_100ns, 0);
+	atomic64_set(&map->sum_iova_link_100ns, 0);
+	atomic64_set(&map->sum_iova_sync_100ns, 0);
+	atomic64_set(&map->sum_iova_destroy_100ns, 0);
+	atomic64_set(&map->sum_sq_iova_alloc, 0);
+	atomic64_set(&map->sum_sq_iova_link, 0);
+	atomic64_set(&map->sum_sq_iova_sync, 0);
+	atomic64_set(&map->sum_sq_iova_destroy, 0);
+	atomic64_set(&map->iova_loops, 0);
+
+	/* Start all threads */
+	for (i = 0; i < threads; i++) {
+		get_task_struct(tsk[i]);
+		wake_up_process(tsk[i]);
+	}
+
+	msleep_interruptible(map->bparam.seconds * 1000);
+
+	/* Stop all threads */
+	for (i = 0; i < threads; i++) {
+		int kthread_ret = kthread_stop_put(tsk[i]);
+		if (kthread_ret)
+			ret = kthread_ret;
+	}
+
+	if (ret)
+		goto out;
+
+	/* Calculate streaming DMA statistics */
+	loops = atomic64_read(&map->loops);
+	if (loops > 0) {
+		u64 map_variance, unmap_variance;
+		u64 sum_map = atomic64_read(&map->sum_map_100ns);
+		u64 sum_unmap = atomic64_read(&map->sum_unmap_100ns);
+		u64 sum_sq_map = atomic64_read(&map->sum_sq_map);
+		u64 sum_sq_unmap = atomic64_read(&map->sum_sq_unmap);
+
+		map->bparam.avg_map_100ns = div64_u64(sum_map, loops);
+		map->bparam.avg_unmap_100ns = div64_u64(sum_unmap, loops);
+
+		map_variance = div64_u64(sum_sq_map, loops) -
+				map->bparam.avg_map_100ns * map->bparam.avg_map_100ns;
+		unmap_variance = div64_u64(sum_sq_unmap, loops) -
+				map->bparam.avg_unmap_100ns * map->bparam.avg_unmap_100ns;
+		map->bparam.map_stddev = int_sqrt64(map_variance);
+		map->bparam.unmap_stddev = int_sqrt64(unmap_variance);
+	}
+
+	/* Calculate IOVA statistics */
+	iova_loops = atomic64_read(&map->iova_loops);
+	if (iova_loops > 0) {
+		u64 alloc_variance, link_variance, sync_variance, destroy_variance;
+		u64 sum_alloc = atomic64_read(&map->sum_iova_alloc_100ns);
+		u64 sum_link = atomic64_read(&map->sum_iova_link_100ns);
+		u64 sum_sync = atomic64_read(&map->sum_iova_sync_100ns);
+		u64 sum_destroy = atomic64_read(&map->sum_iova_destroy_100ns);
+
+		map->bparam.avg_iova_alloc_100ns = div64_u64(sum_alloc, iova_loops);
+		map->bparam.avg_iova_link_100ns = div64_u64(sum_link, iova_loops);
+		map->bparam.avg_iova_sync_100ns = div64_u64(sum_sync, iova_loops);
+		map->bparam.avg_iova_destroy_100ns = div64_u64(sum_destroy, iova_loops);
+
+		alloc_variance = div64_u64(atomic64_read(&map->sum_sq_iova_alloc), iova_loops) -
+				map->bparam.avg_iova_alloc_100ns * map->bparam.avg_iova_alloc_100ns;
+		link_variance = div64_u64(atomic64_read(&map->sum_sq_iova_link), iova_loops) -
+				map->bparam.avg_iova_link_100ns * map->bparam.avg_iova_link_100ns;
+		sync_variance = div64_u64(atomic64_read(&map->sum_sq_iova_sync), iova_loops) -
+				map->bparam.avg_iova_sync_100ns * map->bparam.avg_iova_sync_100ns;
+		destroy_variance = div64_u64(atomic64_read(&map->sum_sq_iova_destroy), iova_loops) -
+				map->bparam.avg_iova_destroy_100ns * map->bparam.avg_iova_destroy_100ns;
+
+		map->bparam.iova_alloc_stddev = int_sqrt64(alloc_variance);
+		map->bparam.iova_link_stddev = int_sqrt64(link_variance);
+		map->bparam.iova_sync_stddev = int_sqrt64(sync_variance);
+		map->bparam.iova_destroy_stddev = int_sqrt64(destroy_variance);
+	}
+
+out:
+	put_device(map->dev);
+	kfree(tsk);
+	return ret;
+}
+
+static int do_streaming_benchmark(struct map_benchmark_data *map)
 {
 	struct task_struct **tsk;
 	int threads = map->bparam.threads;
@@ -128,8 +490,8 @@ static int do_map_benchmark(struct map_benchmark_data *map)
 	get_device(map->dev);
 
 	for (i = 0; i < threads; i++) {
-		tsk[i] = kthread_create_on_node(map_benchmark_thread, map,
-				map->bparam.node, "dma-map-benchmark/%d", i);
+		tsk[i] = kthread_create_on_node(benchmark_thread_streaming, map,
+				map->bparam.node, "dma-streaming-benchmark/%d", i);
 		if (IS_ERR(tsk[i])) {
 			pr_err("create dma_map thread failed\n");
 			ret = PTR_ERR(tsk[i]);
@@ -260,6 +622,11 @@ static long map_benchmark_ioctl(struct file *file, unsigned int cmd,
 	if (ret)
 		return ret;
 
+	if (!use_dma_iommu(map->dev))
+		map->bparam.has_iommu_dma = 0;
+	else
+		map->bparam.has_iommu_dma = 1;
+
 	switch (cmd) {
 	case DMA_MAP_BENCHMARK:
 		old_dma_mask = dma_get_mask(map->dev);
@@ -272,7 +639,7 @@ static long map_benchmark_ioctl(struct file *file, unsigned int cmd,
 		}
 
 		/* Run streaming DMA benchmark */
-		ret = do_map_benchmark(map);
+		ret = do_streaming_benchmark(map);
 
 		/*
 		 * restore the original dma_mask as many devices' dma_mask are
@@ -285,6 +652,44 @@ static long map_benchmark_ioctl(struct file *file, unsigned int cmd,
 		if (ret)
 			return ret;
 		break;
+
+	case DMA_MAP_BENCHMARK_IOVA:
+		if (!use_dma_iommu(map->dev)) {
+			pr_info("IOVA API is not supported on this device, lacks IOMMU DMA%s\n",
+				dev_name(map->dev));
+			return -EOPNOTSUPP;
+		}
+		/* Validate IOVA-specific parameters */
+		if (map->bparam.use_iova > 2) {
+			pr_err("invalid IOVA mode, must be 0-2\n");
+			return -EINVAL;
+		}
+
+		/* Save and set DMA mask */
+		old_dma_mask = dma_get_mask(map->dev);
+		ret = dma_set_mask(map->dev, DMA_BIT_MASK(map->bparam.dma_bits));
+		if (ret) {
+			pr_err("failed to set dma_mask on device %s\n",
+				dev_name(map->dev));
+			return -EINVAL;
+		}
+
+		/* Choose benchmark type based on use_iova field */
+		if (map->bparam.use_iova == 2) {
+			/* Both regular and IOVA */
+			ret = do_streaming_iova_benchmark(map);
+		} else if (map->bparam.use_iova == 1) {
+			/* IOVA only */
+			ret = do_iova_benchmark(map);
+		}
+
+		/* Restore original DMA mask */
+		dma_set_mask(map->dev, old_dma_mask);
+
+		if (ret)
+			return ret;
+		break;
+
 	default:
 		return -EINVAL;
 	}
diff --git a/tools/testing/selftests/dma/dma_map_benchmark.c b/tools/testing/selftests/dma/dma_map_benchmark.c
index b12f1f9babf8..4a2158aa56b6 100644
--- a/tools/testing/selftests/dma/dma_map_benchmark.c
+++ b/tools/testing/selftests/dma/dma_map_benchmark.c
@@ -31,10 +31,11 @@ int main(int argc, char **argv)
 	int bits = 32, xdelay = 0, dir = DMA_MAP_BIDIRECTIONAL;
 	/* default granule 1 PAGESIZE */
 	int granule = 1;
+	int use_iova = 0;
 
 	int cmd = DMA_MAP_BENCHMARK;
 
-	while ((opt = getopt(argc, argv, "t:s:n:b:d:x:g:")) != -1) {
+	while ((opt = getopt(argc, argv, "t:s:n:b:d:x:g:i:")) != -1) {
 		switch (opt) {
 		case 't':
 			threads = atoi(optarg);
@@ -51,6 +52,11 @@ int main(int argc, char **argv)
 		case 'd':
 			dir = atoi(optarg);
 			break;
+		case 'i':
+			use_iova = atoi(optarg);
+			if (use_iova)
+				cmd = DMA_MAP_BENCHMARK_IOVA;
+			break;
 		case 'x':
 			xdelay = atoi(optarg);
 			break;
@@ -111,18 +117,143 @@ int main(int argc, char **argv)
 	map.dma_dir = dir;
 	map.dma_trans_ns = xdelay;
 	map.granule = granule;
+	map.use_iova = use_iova;
 
 	if (ioctl(fd, cmd, &map)) {
 		perror("ioctl");
 		exit(1);
 	}
 
-	printf("dma mapping benchmark: threads:%d seconds:%d node:%d dir:%s granule: %d\n",
-			threads, seconds, node, dir[directions], granule);
-	printf("average map latency(us):%.1f standard deviation:%.1f\n",
-			map.avg_map_100ns/10.0, map.map_stddev/10.0);
-	printf("average unmap latency(us):%.1f standard deviation:%.1f\n",
-			map.avg_unmap_100ns/10.0, map.unmap_stddev/10.0);
+	printf("=== DMA Mapping Benchmark Results ===\n");
+	printf("Configuration: threads:%d seconds:%d node:%d dir:%s granule:%d iova:%d, has_iommu_dma:%d\n",
+	       threads, seconds, node, directions[dir], granule, use_iova, map.has_iommu_dma);
+	printf("Buffer size: %d pages (%d KB)\n", granule, granule * 4);
+	printf("\n");
+
+	if (use_iova == 0 || use_iova == 2) {
+	    printf("STREAMING DMA RESULTS:\n");
+	    printf("  Map   latency: %7.1f μs (σ=%5.1f μs)\n",
+		   map.avg_map_100ns/10.0, map.map_stddev/10.0);
+	    printf("  Unmap latency: %7.1f μs (σ=%5.1f μs)\n",
+		   map.avg_unmap_100ns/10.0, map.unmap_stddev/10.0);
+
+	    double streaming_total = map.avg_map_100ns/10.0 + map.avg_unmap_100ns/10.0;
+	    printf("  Total latency: %7.1f μs\n", streaming_total);
+	    printf("\n");
+	}
+
+	if (map.has_iommu_dma && (use_iova == 1 || use_iova == 2)) {
+	    printf("IOVA DMA RESULTS:\n");
+	    printf("  Alloc   latency: %7.1f μs (σ=%5.1f μs)\n",
+		   map.avg_iova_alloc_100ns/10.0, map.iova_alloc_stddev/10.0);
+	    printf("  Link    latency: %7.1f μs (σ=%5.1f μs)\n",
+		   map.avg_iova_link_100ns/10.0, map.iova_link_stddev/10.0);
+	    printf("  Sync    latency: %7.1f μs (σ=%5.1f μs)\n",
+		   map.avg_iova_sync_100ns/10.0, map.iova_sync_stddev/10.0);
+	    printf("  Destroy latency: %7.1f μs (σ=%5.1f μs)\n",
+		   map.avg_iova_destroy_100ns/10.0, map.iova_destroy_stddev/10.0);
+
+	    double iova_total = map.avg_iova_alloc_100ns/10.0 + map.avg_iova_link_100ns/10.0 +
+				map.avg_iova_sync_100ns/10.0 + map.avg_iova_destroy_100ns/10.0;
+	    printf("  Total latency: %7.1f μs\n", iova_total);
+	    printf("\n");
+	}
+
+	/* Performance comparison for both modes */
+	if (map.has_iommu_dma && use_iova == 2) {
+	    double streaming_total = map.avg_map_100ns/10.0 + map.avg_unmap_100ns/10.0;
+	    double iova_total = map.avg_iova_alloc_100ns/10.0 + map.avg_iova_link_100ns/10.0 +
+				map.avg_iova_sync_100ns/10.0 + map.avg_iova_destroy_100ns/10.0;
+
+	    printf("PERFORMANCE COMPARISON:\n");
+	    printf("  Streaming DMA total: %7.1f μs\n", streaming_total);
+	    printf("  IOVA DMA total:      %7.1f μs\n", iova_total);
+
+	    if (streaming_total > 0) {
+		double performance_ratio = iova_total / streaming_total;
+		double performance_diff = ((iova_total - streaming_total) / streaming_total) * 100.0;
+
+		printf("  Performance ratio:   %7.2fx", performance_ratio);
+		if (performance_ratio < 1.0) {
+		    printf(" (IOVA is %.1f%% faster)\n", -performance_diff);
+		} else {
+		    printf(" (IOVA is %.1f%% slower)\n", performance_diff);
+		}
+
+		// Throughput analysis (operations per second)
+		double streaming_ops_per_sec = 1000000.0 / streaming_total;
+		double iova_ops_per_sec = 1000000.0 / iova_total;
+
+		printf("  Streaming throughput: %8.0f ops/sec\n", streaming_ops_per_sec);
+		printf("  IOVA throughput:      %8.0f ops/sec\n", iova_ops_per_sec);
+
+		/* Memory bandwidth estimate (if applicable) */
+		double buffer_kb = granule * 4.0;
+		double streaming_bw = (streaming_ops_per_sec * buffer_kb) / 1024.0; // MB/s
+		double iova_bw = (iova_ops_per_sec * buffer_kb) / 1024.0; // MB/s
+
+		printf("  Streaming bandwidth:  %8.1f MB/s\n", streaming_bw);
+		printf("  IOVA bandwidth:       %8.1f MB/s\n", iova_bw);
+	    }
+	    printf("\n");
+	}
+
+	/* IOVA breakdown analysis (for IOVA modes) */
+	if (map.has_iommu_dma && (use_iova == 1 || use_iova == 2)) {
+	    double iova_total = map.avg_iova_alloc_100ns/10.0 + map.avg_iova_link_100ns/10.0 +
+				map.avg_iova_sync_100ns/10.0 + map.avg_iova_destroy_100ns/10.0;
+
+	    if (iova_total > 0) {
+		printf("IOVA OPERATION BREAKDOWN:\n");
+		printf("  Alloc:   %5.1f%% (%6.1f μs)\n",
+		       (map.avg_iova_alloc_100ns/10.0 / iova_total) * 100.0, map.avg_iova_alloc_100ns/10.0);
+		printf("  Link:    %5.1f%% (%6.1f μs)\n",
+		       (map.avg_iova_link_100ns/10.0 / iova_total) * 100.0, map.avg_iova_link_100ns/10.0);
+		printf("  Sync:    %5.1f%% (%6.1f μs)\n",
+		       (map.avg_iova_sync_100ns/10.0 / iova_total) * 100.0, map.avg_iova_sync_100ns/10.0);
+		printf("  Destroy: %5.1f%% (%6.1f μs)\n",
+		       (map.avg_iova_destroy_100ns/10.0 / iova_total) * 100.0, map.avg_iova_destroy_100ns/10.0);
+		printf("\n");
+	    }
+	}
+
+	/* Recommendations based on results */
+	if (map.has_iommu_dma && use_iova == 2) {
+	    double streaming_total = map.avg_map_100ns/10.0 + map.avg_unmap_100ns/10.0;
+	    double iova_total = map.avg_iova_alloc_100ns/10.0 + map.avg_iova_link_100ns/10.0 +
+				map.avg_iova_sync_100ns/10.0 + map.avg_iova_destroy_100ns/10.0;
+
+	    printf("RECOMMENDATIONS:\n");
+	    if (iova_total < streaming_total * 0.9) {
+		printf("  ✓ IOVA API shows significant performance benefits\n");
+		printf("  ✓ Consider using IOVA API for this workload\n");
+	    } else if (iova_total < streaming_total * 1.1) {
+		printf("  ~ IOVA and Streaming APIs show similar performance\n");
+	    } else {
+		printf("  ⚠ Streaming API outperforms IOVA API for this benchmark\n");
+
+		/* Identify bottlenecks */
+		double max_iova_op = map.avg_iova_alloc_100ns/10.0;
+		const char* bottleneck = "alloc";
+
+		if (map.avg_iova_link_100ns/10.0 > max_iova_op) {
+		    max_iova_op = map.avg_iova_link_100ns/10.0;
+		    bottleneck = "link";
+		}
+		if (map.avg_iova_sync_100ns/10.0 > max_iova_op) {
+		    max_iova_op = map.avg_iova_sync_100ns/10.0;
+		    bottleneck = "sync";
+		}
+		if (map.avg_iova_destroy_100ns/10.0 > max_iova_op) {
+		    max_iova_op = map.avg_iova_destroy_100ns/10.0;
+		    bottleneck = "destroy";
+		}
+
+		printf("  ➤ Primary bottleneck appears to be IOVA %s operation\n", bottleneck);
+	    }
+	}
+
+	printf("=== End of Benchmark ===\n");
 
 	return 0;
 }
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/6] dma: fake-dma and IOVA tests
  2025-05-20 22:39 [PATCH 0/6] dma: fake-dma and IOVA tests Luis Chamberlain
                   ` (5 preceding siblings ...)
  2025-05-20 22:39 ` [PATCH 6/6] dma-mapping: benchmark: add IOVA support Luis Chamberlain
@ 2025-05-21 11:17 ` Leon Romanovsky
  6 siblings, 0 replies; 20+ messages in thread
From: Leon Romanovsky @ 2025-05-21 11:17 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: vkoul, chenxiang66, m.szyprowski, robin.murphy, jgg,
	alex.williamson, joel.granados, iommu, dmaengine, linux-block,
	gost.dev

On Tue, May 20, 2025 at 03:39:07PM -0700, Luis Chamberlain wrote:
> We don't seem to have unit tests for the DMA IOVA API, so I figured
> we should add some so to ensure we don't regress moving forward, and it allows
> us to extend these later. Its best to just extend existing tests though. I've
> found two tests so I've extended them as part of this patchset:
> 
>   - drivers/dma/dmatest.c
>   - kernel/dma/map_benchmark.c
> 
> However running the dmatest requires some old x86 emulation or some
> non-upstream qemu patches for intel IOAT a q35 system. This make this
> easier by providing a simple in-kernel fake-dma controller to let you test
> run all dmatests on most systems. The only issue I found with that was not
> being able to get the platform device through an IOMMU for DMA. If folks have
> an idea of how to make it easy for a platform device to get an IOMMU for DMA
> it would make it easier to allow us to leverage the existing dmatest for
> IOVA as well. I only tried briefly with virtio and vfio_iommu_type1, but gave
> up fast. Not sure if its easy to later allow a platform device like this
> one to leverage it to make it easier for testing.

I'm not sure if this is what you meant, but I'm configuring QEMU in nested VM
mode. It gives me an emulated hypervisor with a working IOMMU path, which I
tested with NVMe and RDMA.

My QEMU command line is:
/usr/bin/qemu-system-x86_64 -append root=/dev/root rw \
	ignore_loglevel rootfstype=9p rootflags="cache=loose,trans=virtio" \
	earlyprintk=serial,ttyS0,115200 console=hvc0 panic_on_warn=1 intel_iommu=on \
	iommu=nopt iommu.forcedac=1 vfio_iommu_type1.allow_unsafe_interrupts=1 \
	systemd.hostname=mtl-leonro-d-vm \
	-chardev stdio,id=stdio,mux=on,signal=off \
	-cpu host \
	-device virtio-rng-pci \
	-device virtio-balloon-pci \
	-device isa-serial,chardev=stdio \
	-device virtio-serial-pci \
	-device virtconsole,chardev=stdio \
	-device virtio-9p-pci,fsdev=host_fs,mount_tag=/dev/root \
	-device virtio-9p-pci,fsdev=host_bind_fs0,mount_tag=bind0 \
	-device virtio-9p-pci,fsdev=host_bind_fs1,mount_tag=bind1 \
	-device virtio-9p-pci,fsdev=host_bind_fs2,mount_tag=bind2 \
	-device intel-iommu,intremap=on \
	-device nvme-subsys,id=bar \
	-device nvme,id=baz,subsys=bar,serial=qux \
	-device nvme-ns,drive=foo,bus=baz,logical_block_size=4096,physical_block_size=4096,ms=16 \
	-drive file=/home/leonro/.cache/mellanox/mkt/nvme-1g.raw,format=raw,if=none,id=foo \
	-enable-kvm \
	-fsdev local,id=host_bind_fs2,security_model=none,path=/home/leonro \
	-fsdev local,id=host_bind_fs0,security_model=none,path=/plugins \
	-fsdev local,id=host_fs,security_model=none,path=/mnt/self \
	-fsdev local,id=host_bind_fs1,security_model=none,path=/logs \
	-fw_cfg etc/sercon-port,string=2 \
	-kernel /home/leonro/src/kernel/arch/x86/boot/bzImage \
	-m 4G \
	-machine q35,kernel-irqchip=split \
	-mon chardev=stdio \
	-net nic,model=virtio,macaddr=52:54:9a:c5:60:66 \
	-net user,hostfwd=tcp:127.0.0.1:54409-:22 \
	-no-reboot \
	-nodefaults \
	-nographic \
	-smp 64 \
	-vga none

> 
> The kernel/dma/map_benchmark.c test is extended as well, for that I was
> able to add follow the instructions on the first commit from that test,
> by unbinding a device and attaching it to the map benchmark.
> 
> I tried twiddle a mocked IOMMU with iommufd on a q35 guest, but alas,
> that just didn't work as I'd hope, ie, nothing, and so this is the best
> I have for now to help test IOVA DMA API on a virtualized setup.
> 
> Let me know if others have other recomendations.
> 
> The hope is to get a CI eventually going to ensure these don't regress.
> 
> Luis Chamberlain (6):
>   fake-dma: add fake dma engine driver
>   dmatest: split dmatest_func() into helpers
>   dmatest: move printing to its own routine
>   dmatest: add IOVA tests
>   dma-mapping: benchmark: move validation parameters into a helper
>   dma-mapping: benchmark: add IOVA support
> 
>  drivers/dma/Kconfig                           |  11 +
>  drivers/dma/Makefile                          |   1 +
>  drivers/dma/dmatest.c                         | 795 ++++++++++++------
>  drivers/dma/fake-dma.c                        | 718 ++++++++++++++++
>  include/linux/map_benchmark.h                 |  11 +
>  kernel/dma/Kconfig                            |   4 +-
>  kernel/dma/map_benchmark.c                    | 512 +++++++++--
>  .../testing/selftests/dma/dma_map_benchmark.c | 145 +++-
>  8 files changed, 1864 insertions(+), 333 deletions(-)
>  create mode 100644 drivers/dma/fake-dma.c
> 
> -- 
> 2.47.2
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/6] dma-mapping: benchmark: add IOVA support
  2025-05-20 22:39 ` [PATCH 6/6] dma-mapping: benchmark: add IOVA support Luis Chamberlain
@ 2025-05-21 11:58   ` kernel test robot
  2025-05-21 16:08   ` Robin Murphy
  1 sibling, 0 replies; 20+ messages in thread
From: kernel test robot @ 2025-05-21 11:58 UTC (permalink / raw)
  To: Luis Chamberlain, vkoul, chenxiang66, m.szyprowski, robin.murphy,
	leon, jgg, alex.williamson, joel.granados
  Cc: llvm, oe-kbuild-all, iommu, dmaengine, linux-block, gost.dev,
	mcgrof

Hi Luis,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on v6.15-rc7 next-20250516]
[cannot apply to vkoul-dmaengine/next shuah-kselftest/next shuah-kselftest/fixes sysctl/sysctl-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Luis-Chamberlain/fake-dma-add-fake-dma-engine-driver/20250521-064035
base:   linus/master
patch link:    https://lore.kernel.org/r/20250520223913.3407136-7-mcgrof%40kernel.org
patch subject: [PATCH 6/6] dma-mapping: benchmark: add IOVA support
config: hexagon-randconfig-002-20250521 (https://download.01.org/0day-ci/archive/20250521/202505211909.CzQtqtu8-lkp@intel.com/config)
compiler: clang version 21.0.0git (https://github.com/llvm/llvm-project f819f46284f2a79790038e1f6649172789734ae8)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250521/202505211909.CzQtqtu8-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505211909.CzQtqtu8-lkp@intel.com/

All warnings (new ones prefixed by >>):

   kernel/dma/map_benchmark.c:60:25: error: variable has incomplete type 'struct dma_iova_state'
      60 |                 struct dma_iova_state iova_state;
         |                                       ^
   kernel/dma/map_benchmark.c:60:10: note: forward declaration of 'struct dma_iova_state'
      60 |                 struct dma_iova_state iova_state;
         |                        ^
   kernel/dma/map_benchmark.c:76:8: error: call to undeclared function 'dma_iova_try_alloc'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
      76 |                 if (!dma_iova_try_alloc(map->dev, &iova_state, phys, size)) {
         |                      ^
   kernel/dma/map_benchmark.c:88:9: error: call to undeclared function 'dma_iova_link'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
      88 |                 ret = dma_iova_link(map->dev, &iova_state, phys, 0, size, dir, 0);
         |                       ^
   kernel/dma/map_benchmark.c:94:4: error: call to undeclared function 'dma_iova_free'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
      94 |                         dma_iova_free(map->dev, &iova_state);
         |                         ^
   kernel/dma/map_benchmark.c:101:9: error: call to undeclared function 'dma_iova_sync'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     101 |                 ret = dma_iova_sync(map->dev, &iova_state, 0, size);
         |                       ^
   kernel/dma/map_benchmark.c:107:4: error: call to undeclared function 'dma_iova_unlink'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     107 |                         dma_iova_unlink(map->dev, &iova_state, 0, size, dir, 0);
         |                         ^
   kernel/dma/map_benchmark.c:108:4: error: call to undeclared function 'dma_iova_free'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     108 |                         dma_iova_free(map->dev, &iova_state);
         |                         ^
   kernel/dma/map_benchmark.c:118:3: error: call to undeclared function 'dma_iova_destroy'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     118 |                 dma_iova_destroy(map->dev, &iova_state, size, dir, 0);
         |                 ^
>> kernel/dma/map_benchmark.c:340:23: warning: variable 'iova_threads' set but not used [-Wunused-but-set-variable]
     340 |         int regular_threads, iova_threads;
         |                              ^
   1 warning and 8 errors generated.


vim +/iova_threads +340 kernel/dma/map_benchmark.c

   334	
   335	static int do_streaming_iova_benchmark(struct map_benchmark_data *map)
   336	{
   337		struct task_struct **tsk;
   338		int threads = map->bparam.threads;
   339		int node = map->bparam.node;
 > 340		int regular_threads, iova_threads;
   341		u64 loops, iova_loops;
   342		int ret = 0;
   343		int i;
   344	
   345		tsk = kmalloc_array(threads * 2, sizeof(*tsk), GFP_KERNEL);
   346		if (!tsk)
   347			return -ENOMEM;
   348	
   349		get_device(map->dev);
   350	
   351		/* Split threads between regular and IOVA testing */
   352		regular_threads = threads / 2;
   353		iova_threads = threads - regular_threads;
   354	
   355		/* Create streaming DMA threads */
   356		for (i = 0; i < regular_threads; i++) {
   357			tsk[i] = kthread_create_on_node(benchmark_thread_streaming, map,
   358					node, "dma-streaming-benchmark/%d", i);
   359			if (IS_ERR(tsk[i])) {
   360				pr_err("create dma_map thread failed\n");
   361				ret = PTR_ERR(tsk[i]);
   362				while (--i >= 0)
   363					kthread_stop(tsk[i]);
   364				goto out;
   365			}
   366	
   367			if (node != NUMA_NO_NODE)
   368				kthread_bind_mask(tsk[i], cpumask_of_node(node));
   369		}
   370	
   371		/* Create IOVA DMA threads */
   372		for (i = regular_threads; i < threads; i++) {
   373			tsk[i] = kthread_create_on_node(benchmark_thread_iova, map,
   374					node, "dma-iova-benchmark/%d", i - regular_threads);
   375			if (IS_ERR(tsk[i])) {
   376				pr_err("create dma_iova thread failed\n");
   377				ret = PTR_ERR(tsk[i]);
   378				while (--i >= 0)
   379					kthread_stop(tsk[i]);
   380				goto out;
   381			}
   382	
   383			if (node != NUMA_NO_NODE)
   384				kthread_bind_mask(tsk[i], cpumask_of_node(node));
   385		}
   386	
   387		/* Clear previous benchmark values */
   388		atomic64_set(&map->sum_map_100ns, 0);
   389		atomic64_set(&map->sum_unmap_100ns, 0);
   390		atomic64_set(&map->sum_sq_map, 0);
   391		atomic64_set(&map->sum_sq_unmap, 0);
   392		atomic64_set(&map->loops, 0);
   393	
   394		atomic64_set(&map->sum_iova_alloc_100ns, 0);
   395		atomic64_set(&map->sum_iova_link_100ns, 0);
   396		atomic64_set(&map->sum_iova_sync_100ns, 0);
   397		atomic64_set(&map->sum_iova_destroy_100ns, 0);
   398		atomic64_set(&map->sum_sq_iova_alloc, 0);
   399		atomic64_set(&map->sum_sq_iova_link, 0);
   400		atomic64_set(&map->sum_sq_iova_sync, 0);
   401		atomic64_set(&map->sum_sq_iova_destroy, 0);
   402		atomic64_set(&map->iova_loops, 0);
   403	
   404		/* Start all threads */
   405		for (i = 0; i < threads; i++) {
   406			get_task_struct(tsk[i]);
   407			wake_up_process(tsk[i]);
   408		}
   409	
   410		msleep_interruptible(map->bparam.seconds * 1000);
   411	
   412		/* Stop all threads */
   413		for (i = 0; i < threads; i++) {
   414			int kthread_ret = kthread_stop_put(tsk[i]);
   415			if (kthread_ret)
   416				ret = kthread_ret;
   417		}
   418	
   419		if (ret)
   420			goto out;
   421	
   422		/* Calculate streaming DMA statistics */
   423		loops = atomic64_read(&map->loops);
   424		if (loops > 0) {
   425			u64 map_variance, unmap_variance;
   426			u64 sum_map = atomic64_read(&map->sum_map_100ns);
   427			u64 sum_unmap = atomic64_read(&map->sum_unmap_100ns);
   428			u64 sum_sq_map = atomic64_read(&map->sum_sq_map);
   429			u64 sum_sq_unmap = atomic64_read(&map->sum_sq_unmap);
   430	
   431			map->bparam.avg_map_100ns = div64_u64(sum_map, loops);
   432			map->bparam.avg_unmap_100ns = div64_u64(sum_unmap, loops);
   433	
   434			map_variance = div64_u64(sum_sq_map, loops) -
   435					map->bparam.avg_map_100ns * map->bparam.avg_map_100ns;
   436			unmap_variance = div64_u64(sum_sq_unmap, loops) -
   437					map->bparam.avg_unmap_100ns * map->bparam.avg_unmap_100ns;
   438			map->bparam.map_stddev = int_sqrt64(map_variance);
   439			map->bparam.unmap_stddev = int_sqrt64(unmap_variance);
   440		}
   441	
   442		/* Calculate IOVA statistics */
   443		iova_loops = atomic64_read(&map->iova_loops);
   444		if (iova_loops > 0) {
   445			u64 alloc_variance, link_variance, sync_variance, destroy_variance;
   446			u64 sum_alloc = atomic64_read(&map->sum_iova_alloc_100ns);
   447			u64 sum_link = atomic64_read(&map->sum_iova_link_100ns);
   448			u64 sum_sync = atomic64_read(&map->sum_iova_sync_100ns);
   449			u64 sum_destroy = atomic64_read(&map->sum_iova_destroy_100ns);
   450	
   451			map->bparam.avg_iova_alloc_100ns = div64_u64(sum_alloc, iova_loops);
   452			map->bparam.avg_iova_link_100ns = div64_u64(sum_link, iova_loops);
   453			map->bparam.avg_iova_sync_100ns = div64_u64(sum_sync, iova_loops);
   454			map->bparam.avg_iova_destroy_100ns = div64_u64(sum_destroy, iova_loops);
   455	
   456			alloc_variance = div64_u64(atomic64_read(&map->sum_sq_iova_alloc), iova_loops) -
   457					map->bparam.avg_iova_alloc_100ns * map->bparam.avg_iova_alloc_100ns;
   458			link_variance = div64_u64(atomic64_read(&map->sum_sq_iova_link), iova_loops) -
   459					map->bparam.avg_iova_link_100ns * map->bparam.avg_iova_link_100ns;
   460			sync_variance = div64_u64(atomic64_read(&map->sum_sq_iova_sync), iova_loops) -
   461					map->bparam.avg_iova_sync_100ns * map->bparam.avg_iova_sync_100ns;
   462			destroy_variance = div64_u64(atomic64_read(&map->sum_sq_iova_destroy), iova_loops) -
   463					map->bparam.avg_iova_destroy_100ns * map->bparam.avg_iova_destroy_100ns;
   464	
   465			map->bparam.iova_alloc_stddev = int_sqrt64(alloc_variance);
   466			map->bparam.iova_link_stddev = int_sqrt64(link_variance);
   467			map->bparam.iova_sync_stddev = int_sqrt64(sync_variance);
   468			map->bparam.iova_destroy_stddev = int_sqrt64(destroy_variance);
   469		}
   470	
   471	out:
   472		put_device(map->dev);
   473		kfree(tsk);
   474		return ret;
   475	}
   476	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/6] fake-dma: add fake dma engine driver
  2025-05-20 22:39 ` [PATCH 1/6] fake-dma: add fake dma engine driver Luis Chamberlain
@ 2025-05-21 14:20   ` Robin Murphy
  2025-05-21 17:07     ` Luis Chamberlain
  2025-05-21 23:40   ` kernel test robot
  1 sibling, 1 reply; 20+ messages in thread
From: Robin Murphy @ 2025-05-21 14:20 UTC (permalink / raw)
  To: Luis Chamberlain, vkoul, chenxiang66, m.szyprowski, leon, jgg,
	alex.williamson, joel.granados
  Cc: iommu, dmaengine, linux-block, gost.dev

On 2025-05-20 11:39 pm, Luis Chamberlain wrote:
> Today on x86_64 q35 guests we can't easily test some of the DMA API
> with the dmatest out of the box because we lack a DMA engine as the
> current qemu intel IOT patches are out of tree. This implements a basic
> dma engine to let us use the dmatest API to expand on it and leverage
> it on q35 guests.

What does doing so ultimately achieve though? It's clearly not very 
meaningful to test the performance or functionality of this "dmaengine" 
itself, which is the main purpose of dmatest. Nor would it be useful for 
using dmatest as a memory stressor *alongside* a CPU workload, since 
nobody needs a kernel driver to make another CPU thrash memory. All this 
fake implementation can really provide is the side-effect of dmatest 
exercising calls to the DMA mapping API the same way that 
dma_map_benchmark already does, and, well, dma_map_benchmark is 
dedicated to exactly that purpose so why not just use it? If there are 
certain interesting mapping patterns that dmatest happens to do that 
dma_map_benchmark doesn't, let's improve dma_map_benchmark; if there's a 
real need for a fake device because some systems have no spare physical 
devices that are practical to unbind, there's no reason 
dma_map_benchmark couldn't trivially provide its own either...
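
To illustrate that last point only -- this is a rough sketch, not a worked-out
patch, and the function name plus the DMA mask choice are invented for the
example -- a self-registered fallback device in map_benchmark could be as
small as:

/* Sketch only: needs <linux/platform_device.h> and <linux/dma-mapping.h>. */
static struct platform_device *map_benchmark_fake_pdev;

static struct device *map_benchmark_fallback_device(void)
{
	map_benchmark_fake_pdev =
		platform_device_register_simple("dma_map_benchmark", -1, NULL, 0);
	if (IS_ERR(map_benchmark_fake_pdev))
		return NULL;

	/* Benchmark-only device, so just pick a permissive DMA mask. */
	if (dma_coerce_mask_and_coherent(&map_benchmark_fake_pdev->dev,
					 DMA_BIT_MASK(64))) {
		platform_device_unregister(map_benchmark_fake_pdev);
		return NULL;
	}

	return &map_benchmark_fake_pdev->dev;
}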

We clearly cannot validate much actual DMA API functionality without a 
real DMA device reading/writing data through the mapping itself. All we 
can do in software at the kernel level is make the calls and maybe check 
that they return expected values based on assumptions about the backend 
- at best this could cover SWIOTLB bouncing of data as well (at least in 
the case of a non-remapped SWIOTLB buffer such that phys_to_virt() on 
the DMA address doesn't go horribly wrong), but that's effectively still 
in the CPU domain anyway. Beyond that, there is obviously no value in 
spending effort on new ways to confirm that a CPU can read data that a 
CPU has written - we already have a comprehensive test suite for that, 
it's called "Linux". The fact that an API call returns a DMA address 
does not and cannot prove that cache coherency management, IOMMU 
configuration, etc. has all been done correctly, unless it's done for a 
real device which is then capable of actually observing all those 
effects by accessing that DMA address itself. A CPU access fundamentally 
cannot do that.

Thanks,
Robin.

> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>   drivers/dma/Kconfig    |  11 +
>   drivers/dma/Makefile   |   1 +
>   drivers/dma/fake-dma.c | 718 +++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 730 insertions(+)
>   create mode 100644 drivers/dma/fake-dma.c
> 
> diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
> index df2d2dc00a05..716531f2c7e2 100644
> --- a/drivers/dma/Kconfig
> +++ b/drivers/dma/Kconfig
> @@ -140,6 +140,17 @@ config DMA_BCM2835
>   	select DMA_ENGINE
>   	select DMA_VIRTUAL_CHANNELS
>   
> +config DMA_FAKE
> +	tristate "Fake DMA Engine"
> +	select DMA_ENGINE
> +	select DMA_VIRTUAL_CHANNELS
> +	help
> +	  This implements a fake DMA engine. Useful for testing the DMA API
> +	  without any hardware requirements, on any architecture which just
> +	  supports the DMA engine. Enable this if you want to easily run custom
> +	  tests on the DMA API without a real DMA engine or the requirement for
> +	  things like qemu to virtualize it for you.
> +
>   config DMA_JZ4780
>   	tristate "JZ4780 DMA support"
>   	depends on MIPS || COMPILE_TEST
> diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
> index 19ba465011a6..c75e4b7ad9f2 100644
> --- a/drivers/dma/Makefile
> +++ b/drivers/dma/Makefile
> @@ -22,6 +22,7 @@ obj-$(CONFIG_AT_XDMAC) += at_xdmac.o
>   obj-$(CONFIG_AXI_DMAC) += dma-axi-dmac.o
>   obj-$(CONFIG_BCM_SBA_RAID) += bcm-sba-raid.o
>   obj-$(CONFIG_DMA_BCM2835) += bcm2835-dma.o
> +obj-$(CONFIG_DMA_FAKE) += fake-dma.o
>   obj-$(CONFIG_DMA_JZ4780) += dma-jz4780.o
>   obj-$(CONFIG_DMA_SA11X0) += sa11x0-dma.o
>   obj-$(CONFIG_DMA_SUN4I) += sun4i-dma.o
> diff --git a/drivers/dma/fake-dma.c b/drivers/dma/fake-dma.c
> new file mode 100644
> index 000000000000..ee1d788a2b83
> --- /dev/null
> +++ b/drivers/dma/fake-dma.c
> @@ -0,0 +1,718 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later OR copyleft-next-0.3.1
> +/*
> + * Fake DMA engine test module. This allows us to test DMA engines
> + * without leveraging virtualization.
> + *
> + * Copyright (C) 2025 Luis Chamberlain <mcgrof@kernel.org>
> + *
> + * This driver registers a fake DMA engine so that dmatest and other
> + * DMA API tests can be run without real DMA hardware or virtualization.
> + * To test this driver use the following script as root:
> + *
> + * tools/testing/selftests/dma/fake.sh --help
> + */
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/err.h>
> +#include <linux/delay.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/dmaengine.h>
> +#include <linux/freezer.h>
> +#include <linux/init.h>
> +#include <linux/kthread.h>
> +#include <linux/sched/task.h>
> +#include <linux/module.h>
> +#include <linux/moduleparam.h>
> +#include <linux/random.h>
> +#include <linux/slab.h>
> +#include <linux/wait.h>
> +#include <linux/debugfs.h>
> +#include <linux/platform_device.h>
> +#include "dmaengine.h"
> +
> +#define FAKE_MAX_DMA_CHANNELS 20
> +
> +static unsigned int num_channels = FAKE_MAX_DMA_CHANNELS;
> +module_param(num_channels, uint, 0644);
> +MODULE_PARM_DESC(num_channels, "Number of channels to support (default: 20)");
> +
> +struct fake_dma_desc {
> +	struct dma_async_tx_descriptor txd;
> +	dma_addr_t src;
> +	dma_addr_t dst;
> +	size_t len;
> +	enum dma_transaction_type type;
> +	int memset_value;
> +	/* For XOR/PQ operations */
> +	dma_addr_t *src_list; /* Array of source addresses */
> +	unsigned int src_cnt; /* Number of sources */
> +	dma_addr_t *dst_list; /* Array of destination addresses (for PQ) */
> +	unsigned char *pq_coef; /* P+Q coefficients */
> +	struct list_head node;
> +};
> +
> +struct fake_dma_chan {
> +	struct dma_chan chan;
> +	struct list_head active_list;
> +	struct list_head queue;
> +	struct work_struct work;
> +	spinlock_t lock;
> +	bool running;
> +};
> +
> +struct fake_dma_device {
> +	struct platform_device *pdev;
> +	struct dma_device dma_dev;
> +	struct fake_dma_chan *channels;
> +};
> +
> +struct fake_dma_device *single_fake_dma;
> +
> +static struct platform_driver fake_dma_engine_driver = {
> +	.driver = {
> +		.name = KBUILD_MODNAME,
> +		.owner = THIS_MODULE,
> +        },
> +};
> +
> +static int fake_dma_create_platform_device(struct fake_dma_device *fake_dma)
> +{
> +	fake_dma->pdev = platform_device_register_simple("fake-dma-engine", -1, NULL, 0);
> +	if (IS_ERR(fake_dma->pdev))
> +		return -ENODEV;
> +
> +	pr_info("Fake DMA platform device created: %s\n",
> +		dev_name(&fake_dma->pdev->dev));
> +
> +	return 0;
> +}
> +
> +static void fake_dma_destroy_platform_device(struct fake_dma_device  *fake_dma)
> +{
> +	if (!fake_dma->pdev)
> +		return;
> +
> +	pr_info("Destroying fake DMA platform device: %s ...\n",
> +		dev_name(&fake_dma->pdev->dev));
> +	platform_device_unregister(fake_dma->pdev);
> +}
> +
> +static inline struct fake_dma_chan *to_fake_dma_chan(struct dma_chan *c)
> +{
> +	return container_of(c, struct fake_dma_chan, chan);
> +}
> +
> +static inline struct fake_dma_desc *to_fake_dma_desc(struct dma_async_tx_descriptor *txd)
> +{
> +	return container_of(txd, struct fake_dma_desc, txd);
> +}
> +
> +/* Galois Field multiplication for P+Q operations */
> +static unsigned char gf_mul(unsigned char a, unsigned char b)
> +{
> +	unsigned char result = 0;
> +	unsigned char high_bit_set;
> +	int i;
> +
> +	for (i = 0; i < 8; i++) {
> +		if (b & 1)
> +			result ^= a;
> +		high_bit_set = a & 0x80;
> +		a <<= 1;
> +		if (high_bit_set)
> +			a ^= 0x1b; /* x^8 + x^4 + x^3 + x + 1 */
> +		b >>= 1;
> +	}
> +
> +	return result;
> +}
> +
> +/* Processes pending transfers */
> +static void fake_dma_work_func(struct work_struct *work)
> +{
> +	struct fake_dma_chan *vchan = container_of(work, struct fake_dma_chan, work);
> +	struct fake_dma_desc *vdesc;
> +	struct dmaengine_desc_callback cb;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&vchan->lock, flags);
> +
> +	if (list_empty(&vchan->queue)) {
> +		vchan->running = false;
> +		spin_unlock_irqrestore(&vchan->lock, flags);
> +		return;
> +	}
> +
> +	vdesc = list_first_entry(&vchan->queue, struct fake_dma_desc, node);
> +	list_del(&vdesc->node);
> +	list_add_tail(&vdesc->node, &vchan->active_list);
> +
> +	spin_unlock_irqrestore(&vchan->lock, flags);
> +
> +	/* Actually perform the DMA transfer for memcpy operations */
> +	if (vdesc->len) {
> +		void *src_virt, *dst_virt;
> +		void *p_virt, *q_virt;
> +		unsigned char *p_bytes, *q_bytes;
> +		unsigned int i, j;
> +		unsigned char *dst_bytes;
> +
> +		switch (vdesc->type) {
> +		case DMA_MEMCPY:
> +			/* Convert DMA addresses to virtual addresses and perform the copy */
> +			src_virt = phys_to_virt(vdesc->src);
> +			dst_virt = phys_to_virt(vdesc->dst);
> +
> +			memcpy(dst_virt, src_virt, vdesc->len);
> +			break;
> +		case DMA_MEMSET:
> +			dst_virt = phys_to_virt(vdesc->dst);
> +			memset(dst_virt, vdesc->memset_value, vdesc->len);
> +			break;
> +		case DMA_XOR:
> +			dst_virt = phys_to_virt(vdesc->dst);
> +			dst_bytes = (unsigned char *)dst_virt;
> +
> +			memset(dst_virt, 0, vdesc->len);
> +
> +			/* XOR all sources into destination */
> +			for (i = 0; i < vdesc->src_cnt; i++) {
> +				void *src_virt = phys_to_virt(vdesc->src_list[i]);
> +				unsigned char *src_bytes = (unsigned char *)src_virt;
> +
> +				for (j = 0; j < vdesc->len; j++)
> +					dst_bytes[j] ^= src_bytes[j];
> +			}
> +			break;
> +		case DMA_PQ:
> +			p_virt = phys_to_virt(vdesc->dst_list[0]);
> +			q_virt = phys_to_virt(vdesc->dst_list[1]);
> +			p_bytes = (unsigned char *)p_virt;
> +			q_bytes = (unsigned char *)q_virt;
> +
> +			/* Initialize P and Q destinations to zero */
> +			memset(p_virt, 0, vdesc->len);
> +			memset(q_virt, 0, vdesc->len);
> +
> +			/* Calculate P (XOR of all sources) and Q (weighted XOR) */
> +			for (i = 0; i < vdesc->src_cnt; i++) {
> +				void *src_virt = phys_to_virt(vdesc->src_list[i]);
> +				unsigned char *src_bytes = (unsigned char *)src_virt;
> +				unsigned char coef = vdesc->pq_coef[i];
> +
> +				for (j = 0; j < vdesc->len; j++) {
> +					/* P calculation: simple XOR */
> +					p_bytes[j] ^= src_bytes[j];
> +
> +					/* Q calculation: multiply in GF(2^8) and XOR */
> +					q_bytes[j] ^= gf_mul(src_bytes[j], coef);
> +				}
> +			}
> +			break;
> +		default:
> +			pr_warn("fake-dma: Unknown DMA operation type %d\n", vdesc->type);
> +			break;
> +		}
> +	}
> +
> +	/* Mark descriptor as complete */
> +	dma_cookie_complete(&vdesc->txd);
> +
> +	/* Call completion callback if set */
> +	dmaengine_desc_get_callback(&vdesc->txd, &cb);
> +	if (cb.callback)
> +		cb.callback(cb.callback_param);
> +
> +	/* Process next transfer if available */
> +	spin_lock_irqsave(&vchan->lock, flags);
> +	list_del(&vdesc->node);
> +
> +	/* Free allocated memory for XOR/PQ operations */
> +	if (vdesc->type == DMA_XOR || vdesc->type == DMA_PQ) {
> +		kfree(vdesc->src_list);
> +		if (vdesc->type == DMA_PQ) {
> +			kfree(vdesc->dst_list);
> +			kfree(vdesc->pq_coef);
> +		}
> +	}
> +
> +	kfree(vdesc);
> +
> +	if (!list_empty(&vchan->queue)) {
> +		spin_unlock_irqrestore(&vchan->lock, flags);
> +		schedule_work(&vchan->work);
> +	} else {
> +		vchan->running = false;
> +		spin_unlock_irqrestore(&vchan->lock, flags);
> +	}
> +}
> +
> +/* Submit descriptor to the DMA engine */
> +static dma_cookie_t fake_dma_tx_submit(struct dma_async_tx_descriptor *txd)
> +{
> +	struct fake_dma_chan *vchan = to_fake_dma_chan(txd->chan);
> +	struct fake_dma_desc *vdesc = to_fake_dma_desc(txd);
> +	unsigned long flags;
> +	dma_cookie_t cookie;
> +
> +	spin_lock_irqsave(&vchan->lock, flags);
> +
> +	cookie = dma_cookie_assign(txd);
> +	list_add_tail(&vdesc->node, &vchan->queue);
> +
> +	/* Schedule processing if not already running */
> +	if (!vchan->running) {
> +		vchan->running = true;
> +		schedule_work(&vchan->work);
> +	}
> +
> +	spin_unlock_irqrestore(&vchan->lock, flags);
> +
> +	return cookie;
> +}
> +
> +static
> +struct dma_async_tx_descriptor *fake_dma_prep_memcpy(struct dma_chan *chan,
> +						     dma_addr_t dest,
> +						     dma_addr_t src,
> +						     size_t len,
> +						     unsigned long flags)
> +{
> +	struct fake_dma_chan *vchan = to_fake_dma_chan(chan);
> +	struct fake_dma_desc *vdesc;
> +
> +	if (!vchan)
> +		return NULL;
> +
> +	vdesc = kzalloc(sizeof(*vdesc), GFP_NOWAIT);
> +	if (!vdesc)
> +		return NULL;
> +
> +	dma_async_tx_descriptor_init(&vdesc->txd, chan);
> +	vdesc->type = DMA_MEMCPY;
> +	vdesc->txd.tx_submit = fake_dma_tx_submit;
> +	vdesc->txd.flags = flags;
> +	vdesc->src = src;
> +	vdesc->dst = dest;
> +	vdesc->len = len;
> +	INIT_LIST_HEAD(&vdesc->node);
> +
> +	return &vdesc->txd;
> +}
> +
> +static
> +struct dma_async_tx_descriptor * fake_dma_prep_memset(struct dma_chan *chan,
> +						      dma_addr_t dest,
> +						      int value,
> +						      size_t len,
> +						      unsigned long flags)
> +{
> +	struct fake_dma_desc *vdesc;
> +
> +	vdesc = kzalloc(sizeof(*vdesc), GFP_NOWAIT);
> +	if (!vdesc)
> +		return NULL;
> +
> +	dma_async_tx_descriptor_init(&vdesc->txd, chan);
> +	vdesc->type = DMA_MEMSET;
> +	vdesc->txd.tx_submit = fake_dma_tx_submit;
> +	vdesc->txd.flags = flags;
> +	vdesc->dst = dest;
> +	vdesc->len = len;
> +	vdesc->memset_value = value & 0xFF; /* Ensure it's a single byte */
> +
> +	INIT_LIST_HEAD(&vdesc->node);
> +
> +	return &vdesc->txd;
> +}
> +
> +static struct dma_async_tx_descriptor *
> +fake_dma_prep_xor(struct dma_chan *chan, dma_addr_t dest, dma_addr_t *src,
> +		  unsigned int src_cnt, size_t len, unsigned long flags)
> +{
> +	struct fake_dma_desc *vdesc;
> +
> +	vdesc = kzalloc(sizeof(*vdesc), GFP_NOWAIT);
> +	if (!vdesc)
> +		return NULL;
> +
> +	/* Allocate memory for source list */
> +	vdesc->src_list = kmalloc(src_cnt * sizeof(dma_addr_t), GFP_NOWAIT);
> +	if (!vdesc->src_list) {
> +		kfree(vdesc);
> +		return NULL;
> +	}
> +
> +	dma_async_tx_descriptor_init(&vdesc->txd, chan);
> +	vdesc->type = DMA_XOR;
> +	vdesc->txd.tx_submit = fake_dma_tx_submit;
> +	vdesc->txd.flags = flags;
> +	vdesc->dst = dest;
> +	vdesc->len = len;
> +	vdesc->src_cnt = src_cnt;
> +
> +	memcpy(vdesc->src_list, src, src_cnt * sizeof(dma_addr_t));
> +
> +	INIT_LIST_HEAD(&vdesc->node);
> +
> +	return &vdesc->txd;
> +}
> +
> +static struct dma_async_tx_descriptor *
> +fake_dma_prep_pq(struct dma_chan *chan, dma_addr_t *dst, dma_addr_t *src,
> +		 unsigned int src_cnt, const unsigned char *scf, size_t len,
> +		 unsigned long flags)
> +{
> +	struct fake_dma_desc *vdesc;
> +
> +	vdesc = kzalloc(sizeof(*vdesc), GFP_NOWAIT);
> +	if (!vdesc)
> +		return NULL;
> +
> +	vdesc->src_list = kmalloc(src_cnt * sizeof(dma_addr_t), GFP_NOWAIT);
> +	if (!vdesc->src_list) {
> +		kfree(vdesc);
> +		return NULL;
> +	}
> +
> +	/* Allocate memory for destination list (P and Q) */
> +	vdesc->dst_list = kmalloc(2 * sizeof(dma_addr_t), GFP_NOWAIT);
> +	if (!vdesc->dst_list) {
> +		kfree(vdesc->src_list);
> +		kfree(vdesc);
> +		return NULL;
> +	}
> +
> +	/* Allocate memory for coefficients */
> +	vdesc->pq_coef = kmalloc(src_cnt * sizeof(unsigned char), GFP_NOWAIT);
> +	if (!vdesc->pq_coef) {
> +		kfree(vdesc->dst_list);
> +		kfree(vdesc->src_list);
> +		kfree(vdesc);
> +		return NULL;
> +	}
> +
> +	dma_async_tx_descriptor_init(&vdesc->txd, chan);
> +	vdesc->type = DMA_PQ;
> +	vdesc->txd.tx_submit = fake_dma_tx_submit;
> +	vdesc->txd.flags = flags;
> +	vdesc->len = len;
> +	vdesc->src_cnt = src_cnt;
> +
> +	/* Copy source addresses */
> +	memcpy(vdesc->src_list, src, src_cnt * sizeof(dma_addr_t));
> +	/* Copy destination addresses (P and Q) */
> +	memcpy(vdesc->dst_list, dst, 2 * sizeof(dma_addr_t));
> +	/* Copy coefficients */
> +	memcpy(vdesc->pq_coef, scf, src_cnt * sizeof(unsigned char));
> +
> +	INIT_LIST_HEAD(&vdesc->node);
> +
> +	return &vdesc->txd;
> +}
> +
> +static void fake_dma_issue_pending(struct dma_chan *chan)
> +{
> +	struct fake_dma_chan *vchan = to_fake_dma_chan(chan);
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&vchan->lock, flags);
> +
> +	/* Start processing if not already running and queue not empty */
> +	if (!vchan->running && !list_empty(&vchan->queue)) {
> +		vchan->running = true;
> +		schedule_work(&vchan->work);
> +	}
> +
> +	spin_unlock_irqrestore(&vchan->lock, flags);
> +}
> +
> +static int fake_dma_alloc_chan_resources(struct dma_chan *chan)
> +{
> +	struct fake_dma_chan *vchan = to_fake_dma_chan(chan);
> +
> +	INIT_LIST_HEAD(&vchan->active_list);
> +	INIT_LIST_HEAD(&vchan->queue);
> +	vchan->running = false;
> +
> +	return 1; /* Number of descriptors allocated */
> +}
> +
> +static void fake_dma_free_chan_resources(struct dma_chan *chan)
> +{
> +	struct fake_dma_chan *vchan = to_fake_dma_chan(chan);
> +	struct fake_dma_desc *vdesc, *_vdesc;
> +	unsigned long flags;
> +
> +	cancel_work_sync(&vchan->work);
> +
> +	spin_lock_irqsave(&vchan->lock, flags);
> +
> +	/* Free all descriptors in queue */
> +	list_for_each_entry_safe(vdesc, _vdesc, &vchan->queue, node) {
> +		list_del(&vdesc->node);
> +
> +		/* Free allocated memory for XOR/PQ operations */
> +		if (vdesc->type == DMA_XOR || vdesc->type == DMA_PQ) {
> +			kfree(vdesc->src_list);
> +			if (vdesc->type == DMA_PQ) {
> +				kfree(vdesc->dst_list);
> +				kfree(vdesc->pq_coef);
> +			}
> +		}
> +		kfree(vdesc);
> +	}
> +
> +	/* Free all descriptors in active list */
> +	list_for_each_entry_safe(vdesc, _vdesc, &vchan->active_list, node) {
> +		list_del(&vdesc->node);
> +		/* Free allocated memory for XOR/PQ operations */
> +		if (vdesc->type == DMA_XOR || vdesc->type == DMA_PQ) {
> +			kfree(vdesc->src_list);
> +			if (vdesc->type == DMA_PQ) {
> +				kfree(vdesc->dst_list);
> +				kfree(vdesc->pq_coef);
> +			}
> +		}
> +		kfree(vdesc);
> +	}
> +
> +	spin_unlock_irqrestore(&vchan->lock, flags);
> +}
> +
> +static void fake_dma_release(struct dma_device *dma_dev)
> +{
> +	unsigned int i;
> +        struct fake_dma_device *fake_dma =
> +                container_of(dma_dev, struct fake_dma_device, dma_dev);
> +
> +	pr_info("refcount for dma device %s hit 0, quiescing...\n",
> +		dev_name(&fake_dma->pdev->dev));
> +
> +	for (i = 0; i < num_channels; i++) {
> +		struct fake_dma_chan *vchan = &fake_dma->channels[i];
> +		cancel_work_sync(&vchan->work);
> +	}
> +
> +        put_device(dma_dev->dev);
> +}
> +
> +static void fake_dma_setup_config(struct fake_dma_device *fake_dma)
> +{
> +	unsigned int i;
> +	struct dma_device *dma =  &fake_dma->dma_dev;
> +
> +	dma->dev = get_device(&fake_dma->pdev->dev);
> +
> +	/* Set multiple capabilities for dmatest compatibility */
> +	dma_cap_set(DMA_MEMCPY, dma->cap_mask);
> +	dma_cap_set(DMA_MEMSET, dma->cap_mask);
> +	dma_cap_set(DMA_XOR, dma->cap_mask);
> +	dma_cap_set(DMA_PQ, dma->cap_mask);
> +	dma_cap_set(DMA_PRIVATE, dma->cap_mask);
> +
> +	dma->device_alloc_chan_resources = fake_dma_alloc_chan_resources;
> +	dma->device_free_chan_resources = fake_dma_free_chan_resources;
> +	dma->device_prep_dma_memcpy = fake_dma_prep_memcpy;
> +	dma->device_prep_dma_memset = fake_dma_prep_memset;
> +	dma->device_prep_dma_xor = fake_dma_prep_xor;
> +	dma->device_prep_dma_pq = fake_dma_prep_pq;
> +	dma->device_issue_pending = fake_dma_issue_pending;
> +	dma->device_tx_status = dma_cookie_status;
> +	dma->device_release = fake_dma_release;
> +
> +	dma->copy_align = 4; /* 4-byte alignment for memcpy */
> +	dma->fill_align = 4; /* 4-byte alignment for memset */
> +	dma->xor_align = 4;  /* 4-byte alignment for xor */
> +	dma->pq_align = 4;   /* 4-byte alignment for pq */
> +
> +	dma->max_xor = 16;   /* Support up to 16 XOR sources */
> +	dma->max_pq = 16;    /* Support up to 16 P+Q sources */
> +
> +	dma->src_addr_widths = BIT(DMA_SLAVE_BUSWIDTH_1_BYTE) |
> +			       BIT(DMA_SLAVE_BUSWIDTH_2_BYTES) |
> +			       BIT(DMA_SLAVE_BUSWIDTH_4_BYTES) |
> +			       BIT(DMA_SLAVE_BUSWIDTH_8_BYTES);
> +	dma->dst_addr_widths = dma->src_addr_widths;
> +	dma->directions = BIT(DMA_MEM_TO_MEM);
> +	dma->residue_granularity = DMA_RESIDUE_GRANULARITY_DESCRIPTOR;
> +
> +	INIT_LIST_HEAD(&dma->channels);
> +
> +	for (i = 0; i < num_channels; i++) {
> +		struct fake_dma_chan *vchan = &fake_dma->channels[i];
> +
> +		vchan->chan.device = dma;
> +		dma_cookie_init(&vchan->chan);
> +
> +		spin_lock_init(&vchan->lock);
> +		INIT_LIST_HEAD(&vchan->active_list);
> +		INIT_LIST_HEAD(&vchan->queue);
> +
> +		INIT_WORK(&vchan->work, fake_dma_work_func);
> +
> +		list_add_tail(&vchan->chan.device_node, &dma->channels);
> +	}
> +}
> +
> +static int fake_dma_load(void)
> +{
> +	unsigned int i;
> +	int ret;
> +	struct fake_dma_device *fake_dma;
> +	struct dma_device *dma;
> +
> +	if (single_fake_dma) {
> +		pr_err("Fake DMA device already loaded, skipping...\n");
> +		return -EALREADY;
> +	}
> +
> +	if (num_channels > FAKE_MAX_DMA_CHANNELS)
> +		num_channels = FAKE_MAX_DMA_CHANNELS;
> +
> +	ret = platform_driver_register(&fake_dma_engine_driver);
> +	if (ret)
> +		return ret;
> +
> +	fake_dma = kzalloc(sizeof(*fake_dma), GFP_KERNEL);
> +	if (!fake_dma) {
> +		ret = -ENOMEM;
> +		goto out_unregister_driver;
> +	}
> +
> +	fake_dma->channels = kzalloc(sizeof(struct fake_dma_chan) * num_channels,
> +				     GFP_KERNEL);
> +	if (!fake_dma->channels) {
> +		ret = -ENOMEM;
> +		goto out_free_dma;
> +	}
> +
> +	ret = fake_dma_create_platform_device(fake_dma);
> +	if (ret)
> +		goto out_free_chans;
> +
> +	fake_dma->pdev->dev.driver = &fake_dma_engine_driver.driver;
> +	ret = device_bind_driver(&fake_dma->pdev->dev);
> +	if (ret)
> +		goto out_unregister_device;
> +
> +	fake_dma_setup_config(fake_dma);
> +	dma = &fake_dma->dma_dev;
> +
> +	/* Register with the DMA Engine */
> +	ret = dma_async_device_register(dma);
> +	if (ret) {
> +		ret = -EINVAL;
> +		goto out_release_driver;
> +	}
> +
> +	for (i = 0; i < num_channels; i++) {
> +		struct fake_dma_chan *vchan = &fake_dma->channels[i];
> +		pr_info("Registered fake DMA channel %d (%s)\n",
> +			i, dma_chan_name(&vchan->chan));
> +	}
> +
> +	single_fake_dma = fake_dma;
> +
> +	pr_info("Fake DMA engine: %s registered with %d channels\n",
> +		dev_name(&fake_dma->pdev->dev), num_channels);
> +
> +	pr_info("Fake DMA device name for dmatest: '%s'\n", dev_name(dma->dev));
> +	pr_info("Fake DMA device path: '%s'\n", dev_name(&fake_dma->pdev->dev));
> +
> +	return 0;
> +
> +out_release_driver:
> +	device_release_driver(&fake_dma->pdev->dev);
> +out_unregister_device:
> +	fake_dma_destroy_platform_device(fake_dma);
> +out_free_chans:
> +	kfree(fake_dma->channels);
> +out_free_dma:
> +	kfree(fake_dma);
> +	fake_dma = NULL;
> +out_unregister_driver:
> +	platform_driver_unregister(&fake_dma_engine_driver);
> +	return ret;
> +}
> +
> +static void fake_dma_unload(void)
> +{
> +	struct fake_dma_device *fake_dma = single_fake_dma;
> +
> +	if (!fake_dma) {
> +		pr_info("No fake DMA engines registered yet.\n");
> +		return;
> +	}
> +
> +	pr_info("Fake DMA engine: %s unregistering with %d channels ...\n",
> +		dev_name(&fake_dma->pdev->dev), num_channels);
> +
> +	dma_async_device_unregister(&fake_dma->dma_dev);
> +
> +	/*
> +	 * dma_async_device_unregister() will call device_release() only
> +	 * if a channel ever gets busy, so we need to tidy up ourselves
> +	 * here in case no channels are ever used.
> +	 */
> +	device_release_driver(&fake_dma->pdev->dev);
> +	fake_dma_destroy_platform_device(fake_dma);
> +
> +	kfree(fake_dma->channels);
> +	kfree(fake_dma);
> +
> +	platform_driver_unregister(&fake_dma_engine_driver);
> +	single_fake_dma = NULL;
> +}
> +
> +static ssize_t write_file_load(struct file *file, const char __user *user_buf,
> +			       size_t count, loff_t *ppos)
> +{
> +	fake_dma_load();
> +
> +	return count;
> +}
> +
> +static const struct file_operations fops_load = {
> +	.write = write_file_load,
> +	.open = simple_open,
> +	.owner = THIS_MODULE,
> +	.llseek = default_llseek,
> +};
> +
> +static ssize_t write_file_unload(struct file *file, const char __user *user_buf,
> +				 size_t count, loff_t *ppos)
> +{
> +	fake_dma_unload();
> +
> +	return count;
> +}
> +
> +static const struct file_operations fops_unload = {
> +	.write = write_file_unload,
> +	.open = simple_open,
> +	.owner = THIS_MODULE,
> +	.llseek = default_llseek,
> +};
> +
> +static int __init fake_dma_init(void)
> +{
> +	struct dentry *fake_dir;
> +
> +	fake_dir = debugfs_create_dir("fake-dma", NULL);
> +	debugfs_create_file("load", 0600, fake_dir, NULL, &fops_load);
> +	debugfs_create_file("unload", 0600, fake_dir, NULL, &fops_unload);
> +
> +	return fake_dma_load();
> +}
> +late_initcall(fake_dma_init);
> +
> +static void __exit fake_dma_exit(void)
> +{
> +	fake_dma_unload();
> +}
> +module_exit(fake_dma_exit);
> +
> +MODULE_DESCRIPTION("Fake DMA Engine test module");
> +MODULE_AUTHOR("Luis Chamberlain");
> +MODULE_LICENSE("GPL v2");


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/6] dmatest: move printing to its own routine
  2025-05-20 22:39 ` [PATCH 3/6] dmatest: move printing to its own routine Luis Chamberlain
@ 2025-05-21 14:41   ` Robin Murphy
  2025-05-21 17:10     ` Luis Chamberlain
  2025-05-21 22:26   ` kernel test robot
  1 sibling, 1 reply; 20+ messages in thread
From: Robin Murphy @ 2025-05-21 14:41 UTC (permalink / raw)
  To: Luis Chamberlain, vkoul, chenxiang66, m.szyprowski, leon, jgg,
	alex.williamson, joel.granados
  Cc: iommu, dmaengine, linux-block, gost.dev

On 2025-05-20 11:39 pm, Luis Chamberlain wrote:
> Move statistics printing to its own routine, and while at it, put
> the test counters into the struct dmatest_thread for the streaming DMA
> API to allow us to later add IOVA DMA API support and be able to
> differentiate.
> 
> While at it, use a mutex to serialize output so we don't get garbled
> messages between different threads.
> 
> This makes no functional changes other than serializing the output
> and prepping us for IOVA DMA API support.

Um, what about subtly changing the test timing and total runtime 
calculation, and significantly changing the output format enough to 
almost certainly break any scripts parsing it? What definition of 
"functional" are we using here, exactly? :/

Yes I know the kernel log is not strictly ABI and parsing it is not 
advised in general, but per 
Documentation/driver-api/dmaengine/dmatest.rst this is still the 
officially documented way to gather dmatest results. Also the mutex 
doesn't prevent *other* kernel messages from being interspersed, so 
multi-line output still isn't really stable.

Thanks,
Robin.

> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>   drivers/dma/dmatest.c | 77 ++++++++++++++++++++++++++++++++-----------
>   1 file changed, 58 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
> index 921d89b4d2ed..b4c129e688e3 100644
> --- a/drivers/dma/dmatest.c
> +++ b/drivers/dma/dmatest.c
> @@ -92,6 +92,8 @@ static bool polled;
>   module_param(polled, bool, 0644);
>   MODULE_PARM_DESC(polled, "Use polling for completion instead of interrupts");
>   
> +static DEFINE_MUTEX(stats_mutex);
> +
>   /**
>    * struct dmatest_params - test parameters.
>    * @nobounce:		prevent using swiotlb buffer
> @@ -239,6 +241,12 @@ struct dmatest_thread {
>   	bool			done;
>   	bool			pending;
>   	u8			*pq_coefs;
> +
> +	/* Streaming DMA statistics */
> +	unsigned int streaming_tests;
> +	unsigned int streaming_failures;
> +	unsigned long long streaming_total_len;
> +	ktime_t streaming_runtime;
>   };
>   
>   struct dmatest_chan {
> @@ -898,6 +906,30 @@ static int dmatest_do_dma_test(struct dmatest_thread *thread,
>   	return ret;
>   }
>   
> +static void dmatest_print_detailed_stats(struct dmatest_thread *thread)
> +{
> +	unsigned long long streaming_iops, streaming_kbs;
> +	s64 streaming_runtime_us;
> +
> +	mutex_lock(&stats_mutex);
> +
> +	streaming_runtime_us = ktime_to_us(thread->streaming_runtime);
> +	streaming_iops = dmatest_persec(streaming_runtime_us, thread->streaming_tests);
> +	streaming_kbs = dmatest_KBs(streaming_runtime_us, thread->streaming_total_len);
> +
> +	pr_info("=== %s: DMA Test Results ===\n", current->comm);
> +
> +	/* Streaming DMA statistics */
> +	pr_info("%s: STREAMING DMA: %u tests, %u failures\n",
> +		current->comm, thread->streaming_tests, thread->streaming_failures);
> +	pr_info("%s: STREAMING DMA: %llu.%02llu iops, %llu KB/s, %lld us total\n",
> +		current->comm, FIXPT_TO_INT(streaming_iops), FIXPT_GET_FRAC(streaming_iops),
> +		streaming_kbs, streaming_runtime_us);
> +
> +	pr_info("=== %s: End Results ===\n", current->comm);
> +	mutex_unlock(&stats_mutex);
> +}
> +
>   /*
>    * This function repeatedly tests DMA transfers of various lengths and
>    * offsets for a given operation type until it is told to exit by
> @@ -921,20 +953,22 @@ static int dmatest_func(void *data)
>   	unsigned int buf_size;
>   	u8 align;
>   	bool is_memset;
> -	unsigned int failed_tests = 0;
> -	unsigned int total_tests = 0;
> -	ktime_t ktime, start;
> +	unsigned int total_iterations = 0;
> +	ktime_t start_time, streaming_start;
>   	ktime_t filltime = 0;
>   	ktime_t comparetime = 0;
> -	s64 runtime = 0;
> -	unsigned long long total_len = 0;
> -	unsigned long long iops = 0;
>   	int ret;
>   
>   	set_freezable();
>   	smp_rmb();
>   	thread->pending = false;
>   
> +	/* Initialize statistics */
> +	thread->streaming_tests = 0;
> +	thread->streaming_failures = 0;
> +	thread->streaming_total_len = 0;
> +	thread->streaming_runtime = 0;
> +
>   	/* Setup test parameters and allocate buffers */
>   	ret = dmatest_setup_test(thread, &buf_size, &align, &is_memset);
>   	if (ret)
> @@ -942,34 +976,39 @@ static int dmatest_func(void *data)
>   
>   	set_user_nice(current, 10);
>   
> -	ktime = start = ktime_get();
> +	start_time = ktime_get();
>   	while (!(kthread_should_stop() ||
> -		(params->iterations && total_tests >= params->iterations))) {
> +		(params->iterations && total_iterations >= params->iterations))) {
>   
> +		/* Test streaming DMA path */
> +		streaming_start = ktime_get();
>   		ret = dmatest_do_dma_test(thread, buf_size, align, is_memset,
> -					  &total_tests, &failed_tests, &total_len,
> +					  &thread->streaming_tests, &thread->streaming_failures,
> +					  &thread->streaming_total_len,
>   					  &filltime, &comparetime);
> +		thread->streaming_runtime = ktime_add(thread->streaming_runtime,
> +						    ktime_sub(ktime_get(), streaming_start));
>   		if (ret < 0)
>   			break;
> +
> +		total_iterations++;
>   	}
>   
> -	ktime = ktime_sub(ktime_get(), ktime);
> -	ktime = ktime_sub(ktime, comparetime);
> -	ktime = ktime_sub(ktime, filltime);
> -	runtime = ktime_to_us(ktime);
> +	/* Subtract fill and compare time from both paths */
> +	thread->streaming_runtime = ktime_sub(thread->streaming_runtime,
> +					   ktime_divns(filltime, 2));
> +	thread->streaming_runtime = ktime_sub(thread->streaming_runtime,
> +					   ktime_divns(comparetime, 2));
>   
>   	ret = 0;
>   	dmatest_cleanup_test(thread);
>   
>   err_thread_type:
> -	iops = dmatest_persec(runtime, total_tests);
> -	pr_info("%s: summary %u tests, %u failures %llu.%02llu iops %llu KB/s (%d)\n",
> -		current->comm, total_tests, failed_tests,
> -		FIXPT_TO_INT(iops), FIXPT_GET_FRAC(iops),
> -		dmatest_KBs(runtime, total_len), ret);
> +	/* Print detailed statistics */
> +	dmatest_print_detailed_stats(thread);
>   
>   	/* terminate all transfers on specified channels */
> -	if (ret || failed_tests)
> +	if (ret || (thread->streaming_failures))
>   		dmaengine_terminate_sync(chan);
>   
>   	thread->done = true;


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/6] dma-mapping: benchmark: add IOVA support
  2025-05-20 22:39 ` [PATCH 6/6] dma-mapping: benchmark: add IOVA support Luis Chamberlain
  2025-05-21 11:58   ` kernel test robot
@ 2025-05-21 16:08   ` Robin Murphy
  2025-05-21 17:17     ` Luis Chamberlain
  1 sibling, 1 reply; 20+ messages in thread
From: Robin Murphy @ 2025-05-21 16:08 UTC (permalink / raw)
  To: Luis Chamberlain, vkoul, chenxiang66, m.szyprowski, leon, jgg,
	alex.williamson, joel.granados
  Cc: iommu, dmaengine, linux-block, gost.dev

On 2025-05-20 11:39 pm, Luis Chamberlain wrote:
> Add support to use the IOVA DMA API, and allow comparing and contrasting
> to the streaming DMA API. Since the IOVA is intended to be an enhancement
> when using an IOMMU which supports DMA over it only allow the IOVA to be
> used proactively for devices which have this support, that is when
> use_dma_iommu() is true. We don't try a fallback as the goal is clear,
> only to use the IOVA when intended.
> 
> Example output, using intel-iommu on qemu against a random number
> generator device, this output is completely artificial as it's a VM
> and it's using more threads than the guest even has cores, the goal was
> to at least visualize some numerical output on both paths:
> 
> ./tools/testing/selftests/dma/dma_map_benchmark -t 24 -i 2
> === DMA Mapping Benchmark Results ===
> Configuration: threads:24 seconds:20 node:-1 dir:BIDIRECTIONAL granule:1 iova:2
> Buffer size: 1 pages (4 KB)
> 
> STREAMING DMA RESULTS:
>    Map   latency:    12.3 μs (σ=257.9 μs)
>    Unmap latency:     3.7 μs (σ=142.5 μs)
>    Total latency:    16.0 μs
> 
> IOVA DMA RESULTS:
>    Alloc   latency:     0.1 μs (σ= 31.1 μs)
>    Link    latency:     2.5 μs (σ=116.9 μs)
>    Sync    latency:     9.6 μs (σ=227.8 μs)
>    Destroy latency:     3.6 μs (σ=141.2 μs)
>    Total latency:    15.8 μs
> 
> PERFORMANCE COMPARISON:
>    Streaming DMA total:    16.0 μs
>    IOVA DMA total:         15.8 μs
>    Performance ratio:      0.99x (IOVA is 1.3% faster)
>    Streaming throughput:    62500 ops/sec
>    IOVA throughput:         63291 ops/sec
>    Streaming bandwidth:     244.1 MB/s
>    IOVA bandwidth:          247.2 MB/s
> 
> IOVA OPERATION BREAKDOWN:
>    Alloc:     0.6% (   0.1 μs)
>    Link:     15.8% (   2.5 μs)
>    Sync:     60.8% (   9.6 μs)
>    Destroy:  22.8% (   3.6 μs)
> 
> RECOMMENDATIONS:
>    ~ IOVA and Streaming APIs show similar performance
> === End of Benchmark ===
> 
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>   include/linux/map_benchmark.h                 |  11 +
>   kernel/dma/Kconfig                            |   4 +-
>   kernel/dma/map_benchmark.c                    | 417 +++++++++++++++++-
>   .../testing/selftests/dma/dma_map_benchmark.c | 145 +++++-
>   4 files changed, 562 insertions(+), 15 deletions(-)
> 
> diff --git a/include/linux/map_benchmark.h b/include/linux/map_benchmark.h
> index 62674c83bde4..da7c9e3ddf21 100644
> --- a/include/linux/map_benchmark.h
> +++ b/include/linux/map_benchmark.h
> @@ -7,6 +7,7 @@
>   #define _KERNEL_DMA_BENCHMARK_H
>   
>   #define DMA_MAP_BENCHMARK       _IOWR('d', 1, struct map_benchmark)
> +#define DMA_MAP_BENCHMARK_IOVA	_IOWR('d', 2, struct map_benchmark)
>   #define DMA_MAP_MAX_THREADS     1024
>   #define DMA_MAP_MAX_SECONDS     300
>   #define DMA_MAP_MAX_TRANS_DELAY (10 * NSEC_PER_MSEC)
> @@ -27,5 +28,15 @@ struct map_benchmark {
>   	__u32 dma_dir; /* DMA data direction */
>   	__u32 dma_trans_ns; /* time for DMA transmission in ns */
>   	__u32 granule;  /* how many PAGE_SIZE will do map/unmap once a time */
> +	__u32 has_iommu_dma;

Why would userspace care about this? Either they asked for a streaming 
benchmark and it's irrelevant, or they asked for an IOVA benchmark, 
which either succeeded, or failed and this is ignored anyway.

> +	__u64 avg_iova_alloc_100ns;
> +	__u64 avg_iova_link_100ns;
> +	__u64 avg_iova_sync_100ns;
> +	__u64 avg_iova_destroy_100ns;
> +	__u64 iova_alloc_stddev;
> +	__u64 iova_link_stddev;
> +	__u64 iova_sync_stddev;
> +	__u64 iova_destroy_stddev;
> +	__u32 use_iova; /* 0=regular, 1=IOVA, 2=both */

Conversely, why should the kernel have to care about this? If userspace 
wants both benchmarks, they can just run both benchmarks, with whatever 
number of threads for each they fancy. No need to have all that 
complexity kernel-side. If there's a valid desire for running multiple 
different benchmarks *simultaneously* then we should support that in 
general (I can imagine it being potentially interesting to thrash the 
IOVA allocator with several different sizes at once, for example.)

That way, I'd also be inclined to give the new ioctl its own separate 
structure for IOVA results, and avoid impacting the existing ABI.
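
Roughly what I have in mind, purely as a sketch -- the field names are just
lifted from the patch above, and the ioctl number is a placeholder:

struct map_benchmark_iova {
	__u64 avg_iova_alloc_100ns;
	__u64 avg_iova_link_100ns;
	__u64 avg_iova_sync_100ns;
	__u64 avg_iova_destroy_100ns;
	__u64 iova_alloc_stddev;
	__u64 iova_link_stddev;
	__u64 iova_sync_stddev;
	__u64 iova_destroy_stddev;
};

#define DMA_MAP_BENCHMARK_IOVA	_IOWR('d', 2, struct map_benchmark_iova)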

>   };
>   #endif /* _KERNEL_DMA_BENCHMARK_H */
> diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
> index 31cfdb6b4bc3..e2d5784f46eb 100644
> --- a/kernel/dma/Kconfig
> +++ b/kernel/dma/Kconfig
> @@ -261,10 +261,10 @@ config DMA_API_DEBUG
>   	  If unsure, say N.
>   
>   config DMA_MAP_BENCHMARK
> -	bool "Enable benchmarking of streaming DMA mapping"
> +	bool "Enable benchmarking of streaming and IOVA DMA mapping"
>   	depends on DEBUG_FS
>   	help
>   	  Provides /sys/kernel/debug/dma_map_benchmark that helps with testing
> -	  performance of dma_(un)map_page.
> +	  performance of the streaming DMA dma_(un)map_page and IOVA API.
>   
>   	  See tools/testing/selftests/dma/dma_map_benchmark.c
> diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c
> index b54345a757cb..3ae34433420b 100644
> --- a/kernel/dma/map_benchmark.c
> +++ b/kernel/dma/map_benchmark.c
> @@ -18,6 +18,7 @@
>   #include <linux/platform_device.h>
>   #include <linux/slab.h>
>   #include <linux/timekeeping.h>
> +#include <linux/iommu-dma.h>

Nit: these are currently in nice alphabetical order.

>   struct map_benchmark_data {
>   	struct map_benchmark bparam;
[...]
> @@ -112,7 +231,250 @@ static int map_benchmark_thread(void *data)
>   	return ret;
>   }
>   
> -static int do_map_benchmark(struct map_benchmark_data *map)
> +static int do_iova_benchmark(struct map_benchmark_data *map)
> +{
> +	struct task_struct **tsk;
> +	int threads = map->bparam.threads;
> +	int node = map->bparam.node;
> +	u64 iova_loops;
> +	int ret = 0;
> +	int i;
> +
> +	tsk = kmalloc_array(threads, sizeof(*tsk), GFP_KERNEL);
> +	if (!tsk)
> +		return -ENOMEM;
> +
> +	get_device(map->dev);
> +
> +	/* Create IOVA threads only */
> +	for (i = 0; i < threads; i++) {
> +		tsk[i] = kthread_create_on_node(benchmark_thread_iova, map,
> +				node, "dma-iova-benchmark/%d", i);
> +		if (IS_ERR(tsk[i])) {
> +			pr_err("create dma_iova thread failed\n");
> +			ret = PTR_ERR(tsk[i]);
> +			while (--i >= 0)
> +				kthread_stop(tsk[i]);
> +			goto out;
> +		}
> +
> +		if (node != NUMA_NO_NODE)
> +			kthread_bind_mask(tsk[i], cpumask_of_node(node));
> +	}

Duplicating all the thread-wrangling code seems needlessly horrible - 
surely it's easy enough to factor out the stats initialisation and final 
calculation, along with the thread function itself. Perhaps as callbacks 
in the map_benchmark_data?

Similarly, each "thread function" itself only actually needs to
consist of the respective "while (!kthread_should_stop())" loop - the 
rest of map_benchmark_thread() could still be used as a common harness 
to avoid duplicating the buffer management code as well.
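
Something along these lines is all I mean -- completely untested, and the
hook names are invented on the spot:

/* One harness, per-mode hooks; struct map_benchmark_data stays as it is. */
struct map_benchmark_ops {
	int (*thread_fn)(void *data);
	void (*reset_stats)(struct map_benchmark_data *map);
	void (*compute_stats)(struct map_benchmark_data *map, u64 loops);
};

static int do_benchmark(struct map_benchmark_data *map,
			const struct map_benchmark_ops *ops)
{
	/*
	 * The existing kthread create/bind/start/stop code from
	 * do_map_benchmark() would live here unchanged, calling
	 * ops->reset_stats() before waking the ops->thread_fn threads
	 * and ops->compute_stats() once they have all been stopped.
	 */
	return 0;
}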

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/6] fake-dma: add fake dma engine driver
  2025-05-21 14:20   ` Robin Murphy
@ 2025-05-21 17:07     ` Luis Chamberlain
  2025-05-22 11:18       ` Marek Szyprowski
  0 siblings, 1 reply; 20+ messages in thread
From: Luis Chamberlain @ 2025-05-21 17:07 UTC (permalink / raw)
  To: Robin Murphy
  Cc: vkoul, chenxiang66, m.szyprowski, leon, jgg, alex.williamson,
	joel.granados, iommu, dmaengine, linux-block, gost.dev

On Wed, May 21, 2025 at 03:20:11PM +0100, Robin Murphy wrote:
> On 2025-05-20 11:39 pm, Luis Chamberlain wrote:
> > Today on x86_64 q35 guests we can't easily test some of the DMA API
> > with the dmatest out of the box because we lack a DMA engine as the
> > current qemu intel IOAT patches are out of tree. This implements a basic
> > dma engine to let us use the dmatest API to expand on it and leverage
> > it on q35 guests.
> 
> What does doing so ultimately achieve though?

What do you do to test for regressions automatically today for the DMA API?

This patch series didn't just add a fake-dma engine, but let's first
address that since it's what you raised the question about:

Although I didn't add them, with this we can easily enable kernel
selftests to now allow any q35 guest to easily run basic API tests for the
DMA API. It's actually how I found the dma benchmark code, as it's the only
selftest we have for DMA. However that benchmark test is not easy to
configure or enable. With kernel selftests you can test for things
outside of the scope of performance.  You can test for expected
correctness of the APIs and to ensure no regressions exist with expected
behavior, otherwise you learn about possible regressions reactively. We
have many selftests that do just that without a focus on performance for
many things, xarray, maple tree, sysctl, firmware loader, module
loading, etc. And yes, they find bugs proactively.

With this then, we should be able to easily add a CI to run these tests
based on linux-next or linus' tags, even if it's virtual. Who would run
these? We can get this going daily on kdevops easily if we want them; we
already have a series of tests automated for different subsystems.

Benchmarking can be done separately with real hardware -- agreed.
But it does not negate the need for simple virtual kernel selftests.

  Luis

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/6] dmatest: move printing to its own routine
  2025-05-21 14:41   ` Robin Murphy
@ 2025-05-21 17:10     ` Luis Chamberlain
  0 siblings, 0 replies; 20+ messages in thread
From: Luis Chamberlain @ 2025-05-21 17:10 UTC (permalink / raw)
  To: Robin Murphy
  Cc: vkoul, chenxiang66, m.szyprowski, leon, jgg, alex.williamson,
	joel.granados, iommu, dmaengine, linux-block, gost.dev

On Wed, May 21, 2025 at 03:41:38PM +0100, Robin Murphy wrote:
> On 2025-05-20 11:39 pm, Luis Chamberlain wrote:
> > Move statistics printing to its own routine, and while at it, put
> > the test counters into the struct dmatest_thread for the streaming DMA
> > API to allow us to later add IOVA DMA API support and be able to
> > differentiate.
> > 
> > While at it, use a mutex to serialize output so we don't get garbled
> > messages between different threads.
> > 
> > This makes no functional changes other than serializing the output
> > and prepping us for IOVA DMA API support.
> 
> Um, what about subtly changing the test timing and total runtime
> calculation, and significantly changing the output format enough to almost
> certainly break any scripts parsing it? What definition of "functional" are
> we using here, exactly? :/

Sure, we can keep the old format if that is the preference.

> Also the
> mutex doesn't prevent *other* kernel messages from being interspersed, so
> multi-line output still isn't really stable.

*other* sure -- but for this test it makes things legible, otherwise it's
quite re-ordered.

  Luis

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/6] dma-mapping: benchmark: add IOVA support
  2025-05-21 16:08   ` Robin Murphy
@ 2025-05-21 17:17     ` Luis Chamberlain
  0 siblings, 0 replies; 20+ messages in thread
From: Luis Chamberlain @ 2025-05-21 17:17 UTC (permalink / raw)
  To: Robin Murphy
  Cc: vkoul, chenxiang66, m.szyprowski, leon, jgg, alex.williamson,
	joel.granados, iommu, dmaengine, linux-block, gost.dev

On Wed, May 21, 2025 at 05:08:08PM +0100, Robin Murphy wrote:
> On 2025-05-20 11:39 pm, Luis Chamberlain wrote:
> > diff --git a/include/linux/map_benchmark.h b/include/linux/map_benchmark.h
> > index 62674c83bde4..da7c9e3ddf21 100644
> > --- a/include/linux/map_benchmark.h
> > +++ b/include/linux/map_benchmark.h
> > @@ -27,5 +28,15 @@ struct map_benchmark {
> >   	__u32 dma_dir; /* DMA data direction */
> >   	__u32 dma_trans_ns; /* time for DMA transmission in ns */
> >   	__u32 granule;  /* how many PAGE_SIZE will do map/unmap once a time */
> > +	__u32 has_iommu_dma;
> 
> Why would userspace care about this? Either they asked for a streaming
> benchmark and it's irrelevant, or they asked for an IOVA benchmark, which
> either succeeded, or failed and this is ignored anyway.

It's so we inform userspace that it's not possible to run the IOVA tests
for a good reason, instead of just saying they failed.

> Conversely, why should the kernel have to care about this? If userspace
> wants both benchmarks, they can just run both benchmarks, with whatever
> number of threads for each they fancy. No need to have all that complexity
> kernel-side. 

I'm not following which complexity you are referring to here. The
point of has_iommu_dma is simply to avoid running the IOVA tests so
that userspace doesn't get incorrect results for a feature it can't
possibly support.
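
To be concrete, what I have in mind on the kernel side is only something
like the sketch below -- the helper name is made up, use_dma_iommu() is what
the series already keys off of, and the error value is just an example:

static int map_benchmark_prep_iova(struct map_benchmark_data *map)
{
	/* Tell userspace why the IOVA run is skipped, then bail out. */
	map->bparam.has_iommu_dma = use_dma_iommu(map->dev);
	if (!map->bparam.has_iommu_dma) {
		pr_info("%s does not use dma-iommu, skipping IOVA tests\n",
			dev_name(map->dev));
		return -EOPNOTSUPP;
	}

	return 0;
}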

> If there's a valid desire for running multiple different
> benchmarks *simultaneously* then we should support that in general (I can
> imagine it being potentially interesting to thrash the IOVA allocator with
> several different sizes at once, for example.)

Sure, both are supported. However, no point in running IOVA tests if
you can't possibly run them.

> That way, I'd also be inclined to give the new ioctl its own separate
> structure for IOVA results, and avoid impacting the existing ABI.

Sure.

> > -static int do_map_benchmark(struct map_benchmark_data *map)
> > +static int do_iova_benchmark(struct map_benchmark_data *map)
> > +{
> > +	struct task_struct **tsk;
> > +	int threads = map->bparam.threads;
> > +	int node = map->bparam.node;
> > +	u64 iova_loops;
> > +	int ret = 0;
> > +	int i;
> > +
> > +	tsk = kmalloc_array(threads, sizeof(*tsk), GFP_KERNEL);
> > +	if (!tsk)
> > +		return -ENOMEM;
> > +
> > +	get_device(map->dev);
> > +
> > +	/* Create IOVA threads only */
> > +	for (i = 0; i < threads; i++) {
> > +		tsk[i] = kthread_create_on_node(benchmark_thread_iova, map,
> > +				node, "dma-iova-benchmark/%d", i);
> > +		if (IS_ERR(tsk[i])) {
> > +			pr_err("create dma_iova thread failed\n");
> > +			ret = PTR_ERR(tsk[i]);
> > +			while (--i >= 0)
> > +				kthread_stop(tsk[i]);
> > +			goto out;
> > +		}
> > +
> > +		if (node != NUMA_NO_NODE)
> > +			kthread_bind_mask(tsk[i], cpumask_of_node(node));
> > +	}
> 
> Duplicating all the thread-wrangling code seems needlessly horrible - surely
> it's easy enough to factor out the stats initialisation and final
> calculation, along with the thread function itself. Perhaps as callbacks in
> the map_benchmark_data?

Could try that.

> Similarly, each "thread function" itself only actually needs to consist
> of the respective "while (!kthread_should_stop())" loop - the rest of
> map_benchmark_thread() could still be used as a common harness to avoid
> duplicating the buffer management code as well.

If we want to have a separate data structure for IOVA tests there's more
reason to keep the threads separated as each would be touching different
data structures, otherwise we end up with a large branch.

  Luis

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/6] dmatest: move printing to its own routine
  2025-05-20 22:39 ` [PATCH 3/6] dmatest: move printing to its own routine Luis Chamberlain
  2025-05-21 14:41   ` Robin Murphy
@ 2025-05-21 22:26   ` kernel test robot
  1 sibling, 0 replies; 20+ messages in thread
From: kernel test robot @ 2025-05-21 22:26 UTC (permalink / raw)
  To: Luis Chamberlain, vkoul, chenxiang66, m.szyprowski, robin.murphy,
	leon, jgg, alex.williamson, joel.granados
  Cc: oe-kbuild-all, iommu, dmaengine, linux-block, gost.dev, mcgrof

Hi Luis,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on v6.15-rc7 next-20250521]
[cannot apply to vkoul-dmaengine/next shuah-kselftest/next shuah-kselftest/fixes sysctl/sysctl-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Luis-Chamberlain/fake-dma-add-fake-dma-engine-driver/20250521-064035
base:   linus/master
patch link:    https://lore.kernel.org/r/20250520223913.3407136-4-mcgrof%40kernel.org
patch subject: [PATCH 3/6] dmatest: move printing to its own routine
config: sparc-allmodconfig (https://download.01.org/0day-ci/archive/20250522/202505220605.kiB8N7DJ-lkp@intel.com/config)
compiler: sparc64-linux-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250522/202505220605.kiB8N7DJ-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505220605.kiB8N7DJ-lkp@intel.com/

All warnings (new ones prefixed by >>):

   drivers/dma/dmatest.c: In function 'dmatest_func':
>> drivers/dma/dmatest.c:957:17: warning: variable 'start_time' set but not used [-Wunused-but-set-variable]
     957 |         ktime_t start_time, streaming_start;
         |                 ^~~~~~~~~~


vim +/start_time +957 drivers/dma/dmatest.c

   932	
   933	/*
   934	 * This function repeatedly tests DMA transfers of various lengths and
   935	 * offsets for a given operation type until it is told to exit by
   936	 * kthread_stop(). There may be multiple threads running this function
   937	 * in parallel for a single channel, and there may be multiple channels
   938	 * being tested in parallel.
   939	 *
   940	 * Before each test, the source and destination buffer is initialized
   941	 * with a known pattern. This pattern is different depending on
   942	 * whether it's in an area which is supposed to be copied or
   943	 * overwritten, and different in the source and destination buffers.
   944	 * So if the DMA engine doesn't copy exactly what we tell it to copy,
   945	 * we'll notice.
   946	 */
   947	static int dmatest_func(void *data)
   948	{
   949		struct dmatest_thread *thread = data;
   950		struct dmatest_info *info = thread->info;
   951		struct dmatest_params *params = &info->params;
   952		struct dma_chan *chan = thread->chan;
   953		unsigned int buf_size;
   954		u8 align;
   955		bool is_memset;
   956		unsigned int total_iterations = 0;
 > 957		ktime_t start_time, streaming_start;
   958		ktime_t filltime = 0;
   959		ktime_t comparetime = 0;
   960		int ret;
   961	
   962		set_freezable();
   963		smp_rmb();
   964		thread->pending = false;
   965	
   966		/* Initialize statistics */
   967		thread->streaming_tests = 0;
   968		thread->streaming_failures = 0;
   969		thread->streaming_total_len = 0;
   970		thread->streaming_runtime = 0;
   971	
   972		/* Setup test parameters and allocate buffers */
   973		ret = dmatest_setup_test(thread, &buf_size, &align, &is_memset);
   974		if (ret)
   975			goto err_thread_type;
   976	
   977		set_user_nice(current, 10);
   978	
   979		start_time = ktime_get();
   980		while (!(kthread_should_stop() ||
   981			(params->iterations && total_iterations >= params->iterations))) {
   982	
   983			/* Test streaming DMA path */
   984			streaming_start = ktime_get();
   985			ret = dmatest_do_dma_test(thread, buf_size, align, is_memset,
   986						  &thread->streaming_tests, &thread->streaming_failures,
   987						  &thread->streaming_total_len,
   988						  &filltime, &comparetime);
   989			thread->streaming_runtime = ktime_add(thread->streaming_runtime,
   990							    ktime_sub(ktime_get(), streaming_start));
   991			if (ret < 0)
   992				break;
   993	
   994			total_iterations++;
   995		}
   996	
   997		/* Subtract fill and compare time from both paths */
   998		thread->streaming_runtime = ktime_sub(thread->streaming_runtime,
   999						   ktime_divns(filltime, 2));
  1000		thread->streaming_runtime = ktime_sub(thread->streaming_runtime,
  1001						   ktime_divns(comparetime, 2));
  1002	
  1003		ret = 0;
  1004		dmatest_cleanup_test(thread);
  1005	
  1006	err_thread_type:
  1007		/* Print detailed statistics */
  1008		dmatest_print_detailed_stats(thread);
  1009	
  1010		/* terminate all transfers on specified channels */
  1011		if (ret || (thread->streaming_failures))
  1012			dmaengine_terminate_sync(chan);
  1013	
  1014		thread->done = true;
  1015		wake_up(&thread_wait);
  1016	
  1017		return ret;
  1018	}
  1019	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/6] fake-dma: add fake dma engine driver
  2025-05-20 22:39 ` [PATCH 1/6] fake-dma: add fake dma engine driver Luis Chamberlain
  2025-05-21 14:20   ` Robin Murphy
@ 2025-05-21 23:40   ` kernel test robot
  1 sibling, 0 replies; 20+ messages in thread
From: kernel test robot @ 2025-05-21 23:40 UTC (permalink / raw)
  To: Luis Chamberlain, vkoul, chenxiang66, m.szyprowski, robin.murphy,
	leon, jgg, alex.williamson, joel.granados
  Cc: oe-kbuild-all, iommu, dmaengine, linux-block, gost.dev, mcgrof

Hi Luis,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on v6.15-rc7 next-20250521]
[cannot apply to vkoul-dmaengine/next shuah-kselftest/next shuah-kselftest/fixes sysctl/sysctl-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Luis-Chamberlain/fake-dma-add-fake-dma-engine-driver/20250521-064035
base:   linus/master
patch link:    https://lore.kernel.org/r/20250520223913.3407136-2-mcgrof%40kernel.org
patch subject: [PATCH 1/6] fake-dma: add fake dma engine driver
config: alpha-randconfig-r113-20250522 (https://download.01.org/0day-ci/archive/20250522/202505220711.3GeHexsR-lkp@intel.com/config)
compiler: alpha-linux-gcc (GCC) 12.4.0
reproduce: (https://download.01.org/0day-ci/archive/20250522/202505220711.3GeHexsR-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505220711.3GeHexsR-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
>> drivers/dma/fake-dma.c:69:24: sparse: sparse: symbol 'single_fake_dma' was not declared. Should it be static?

vim +/single_fake_dma +69 drivers/dma/fake-dma.c

    68	
  > 69	struct fake_dma_device *single_fake_dma;
    70	
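
Assuming single_fake_dma is only referenced from within fake-dma.c, the
minimal fix sparse is suggesting would simply be to give the symbol internal
linkage (struct fake_dma_device being the driver-private type from the patch):

	static struct fake_dma_device *single_fake_dma;

If the pointer is meant to be shared across files, the alternative would be
an extern declaration in a header instead.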

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/6] fake-dma: add fake dma engine driver
  2025-05-21 17:07     ` Luis Chamberlain
@ 2025-05-22 11:18       ` Marek Szyprowski
  2025-05-22 16:59         ` Luis Chamberlain
  0 siblings, 1 reply; 20+ messages in thread
From: Marek Szyprowski @ 2025-05-22 11:18 UTC (permalink / raw)
  To: Luis Chamberlain, Robin Murphy
  Cc: vkoul, chenxiang66, leon, jgg, alex.williamson, joel.granados,
	iommu, dmaengine, linux-block, gost.dev

On 21.05.2025 19:07, Luis Chamberlain wrote:
> On Wed, May 21, 2025 at 03:20:11PM +0100, Robin Murphy wrote:
>> On 2025-05-20 11:39 pm, Luis Chamberlain wrote:
>>> Today on x86_64 q35 guests we can't easily test some of the DMA API
>>> with the dmatest out of the box because we lack a DMA engine as the
>>> current qemu intel IOAT patches are out of tree. This implements a basic
>>> dma engine to let us use the dmatest API to expand on it and leverage
>>> it on q35 guests.
>> What does doing so ultimately achieve though?
> What do you do to test for regressions automatically today for the DMA API?
>
> This patch series didn't just add a fake-dma engine though but let's
> first address that as it's what you raised a question for:
>
> Although I didn't add them, with this we can easily enable kernel
> selftests to now allow any q35 guest to easily run basic API tests for the
> DMA API. It's actually how I found the dma benchmark code, as it's the only
> selftest we have for DMA. However that benchmark test is not easy to
> configure or enable. With kernel selftests you can test for things
> outside of the scope of performance.

IMHO adding a fake driver just to use some of its side effects that are 
related to dma-mapping, without the obvious information about what would 
actually be tested, is not the right approach. Maybe the dma benchmark 
code can be extended with functionality similar to the dma-engine 
selftests; I didn't check yet. It would be better to have such a 
self-test in the proper layer. If adding the needed functionality to the 
dma benchmark is not possible, then maybe create another self-test, which 
will make similar calls to the dma-mapping API as those dma-engine 
self-tests do, but without the whole dma-engine related part.
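
For illustration, a dma-mapping-only check along those lines might look
roughly like the sketch below. The function name and the way a test would
obtain its struct device are made up here; only dma_map_single(),
dma_mapping_error() and dma_unmap_single() are the real dma-mapping API:

	#include <linux/device.h>
	#include <linux/dma-mapping.h>
	#include <linux/slab.h>

	/* Map a buffer for a device, check the mapping, then unmap it again. */
	static int dma_mapping_selftest_once(struct device *dev, size_t size)
	{
		void *buf;
		dma_addr_t dma;
		int ret = 0;

		buf = kmalloc(size, GFP_KERNEL);
		if (!buf)
			return -ENOMEM;

		dma = dma_map_single(dev, buf, size, DMA_TO_DEVICE);
		if (dma_mapping_error(dev, dma)) {
			ret = -EIO;
			goto free;
		}

		dma_unmap_single(dev, dma, size, DMA_TO_DEVICE);
	free:
		kfree(buf);
		return ret;
	}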


> You can test for expected
> correctness of the APIs and to ensure no regressions exist with expected
> behavior, otherwise you learn about possible regressions reactively. We
> have many selftests that do just that without a focus on performance for
> many things, xarray, maple tree, sysctl, firmware loader, module
> loading, etc. And yes, they find bugs proactively.
>
> With this then, we should be able to easily add a CI to run these tests
> based on linux-next or linus' tags, even if its virtual. Who would run
> these? We can get this going daily on kdevops easily, if we want them, we
> already have a series of tests automated for different subsystems.
>
> Benchmarking can be done separately with real hardware -- agreed.
> But it does not negate the need for simple virtual kernel selftests.
>
>    Luis
>
Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/6] fake-dma: add fake dma engine driver
  2025-05-22 11:18       ` Marek Szyprowski
@ 2025-05-22 16:59         ` Luis Chamberlain
  2025-05-22 19:38           ` Luis Chamberlain
  0 siblings, 1 reply; 20+ messages in thread
From: Luis Chamberlain @ 2025-05-22 16:59 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Robin Murphy, vkoul, chenxiang66, leon, jgg, alex.williamson,
	joel.granados, iommu, dmaengine, linux-block, gost.dev,
	Haavard.Skinnemoen, Dan Williams

On Thu, May 22, 2025 at 01:18:45PM +0200, Marek Szyprowski wrote:
> On 21.05.2025 19:07, Luis Chamberlain wrote:
> > On Wed, May 21, 2025 at 03:20:11PM +0100, Robin Murphy wrote:
> >> On 2025-05-20 11:39 pm, Luis Chamberlain wrote:
> >>> Today on x86_64 q35 guests we can't easily test some of the DMA API
> >>> with the dmatest out of the box because we lack a DMA engine as the
> >>> current qemu intel IOAT patches are out of tree. This implements a basic
> >>> dma engine to let us use the dmatest API to expand on it and leverage
> >>> it on q35 guests.
> >> What does doing so ultimately achieve though?
> > What do you do to test for regressions automatically today for the DMA API?
> >
> > This patch series didn't just add a fake-dma engine though but let's
> > first address that as it's what you raised a question for:
> >
> > Although I didn't add them, with this we can easily enable kernel
> > selftests to now allow any q35 guest to easily run basic API tests for the
> > DMA API. It's actually how I found the dma benchmark code, as it's the only
> > selftest we have for DMA. However that benchmark test is not easy to
> > configure or enable. With kernel selftests you can test for things
> > outside of the scope of performance.
> 
> IMHO adding a fake driver just to use some of its side effects that are 
> related to dma-mapping, without the obvious information about what would 
> actually be tested, is not the right approach. Maybe the dma benchmark 
> code can be extended with functionality similar to the dma-engine 
> selftests; I didn't check yet.

I like the idea; however, I can save you some time. The dmatest was
added with the requirement for a DMA controller exposing DMA channels
through the DMA engine. The dma benchmark has no such requirement: it
tests an existing device.

At a cursory glance, if we want to do something like this, I think it
would mean evaluating merging both. Merging both is possible if we're
willing to make the DMA channel requests optional, as I don't think every
device we'd bind to the DMA benchmark would or could use the DMA
channels.
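
For context, the split is visible in how each test gets something to drive:
dmatest asks the dmaengine core for a channel, roughly along these lines (a
sketch, not the actual dmatest code):

	#include <linux/dmaengine.h>

	/* Ask the dmaengine core for any channel that can do memcpy. */
	static struct dma_chan *grab_memcpy_channel(void)
	{
		dma_cap_mask_t mask;

		dma_cap_zero(mask);
		dma_cap_set(DMA_MEMCPY, mask);

		/* NULL unless some controller registered a matching channel. */
		return dma_request_channel(mask, NULL, NULL);
	}

The map benchmark, by contrast, operates directly on whichever struct device
it was bound to; no channel request is involved.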

The point of the fake-dma driver was to write a DMA controller which
simulates fake DMA channels and registers with the DMA engine; it exposes
these channels so we can leverage the dmatest. The DMA controller advertises
a full set of capabilities to ensure we can test all the APIs used for them
through DMA channels. At first I was under the impression DMA controllers
always need to register DMA channels, however I'm now under the impression
DMA controllers don't need to expose DMA channels. Is that right?
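
For reference, the registration path being described is roughly the
following. This is only a sketch; the helper and the fake_dma_* callback
names are placeholders, not the actual fake-dma code:

	#include <linux/dmaengine.h>

	/* Sketch: register a memcpy-capable controller with one channel. */
	static int fake_dma_register(struct device *parent,
				     struct dma_device *dd,
				     struct dma_chan *chan)
	{
		dd->dev = parent;

		dma_cap_set(DMA_MEMCPY, dd->cap_mask);
		dd->device_prep_dma_memcpy = fake_dma_prep_memcpy;	/* placeholder */
		dd->device_issue_pending   = fake_dma_issue_pending;	/* placeholder */
		dd->device_tx_status       = fake_dma_tx_status;	/* placeholder */

		/* Expose one (fake) channel on the controller's channel list. */
		INIT_LIST_HEAD(&dd->channels);
		chan->device = dd;
		list_add_tail(&chan->device_node, &dd->channels);

		/* dmatest can request and exercise the channel after this. */
		return dma_async_device_register(dd);
	}

As far as I recall, the dmaengine core rejects controllers that advertise
DMA_MEMCPY without a prep callback, or that lack tx_status/issue_pending,
which is why even a fake controller has to provide a reasonably complete
set of ops.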

If DMA channels are optional and a specialized feature only leveraged
by certain devices, then bundling this into dma-benchmark just doesn't
make sense architecturally.

> It would be better to have such a self-test in the proper layer.

Agreed. Perhaps the dmatest reflects an earlier era in the evolution of the
DMA engine, with some specialized DMA controllers exposing DMA channels,
either private or public, for very specialized offload operations. And
that's optional?

Regardless, it's a feature of the DMA engine, and if we want to keep
test coverage for it, my point that it's not feasible today to run
dmatest on q35 guests stands, which still validates the need for something
like the fake-dma driver.

Then we'd need to stick with two selftests:

 - dmatest - perhaps should be renamed for the emphasis on channels
 - dma-benchmark - for regular DMA API

If this is correct, this patchset still seems to be going in the
right direction.

> If adding the needed functionality to the 
> dma benchmark is not possible, then maybe create another self-test, which 
> will make similar calls to the dma-mapping API as those dma-engine 
> self-tests do, but without the whole dma-engine related part.

That's what this patchset does already. What I wish was easier is
not having to *require* an unbind/bind of a real device. That
would also allow us to enable CI testing through virtualization. I'll
try the few knobs suggested by Leon to see if that enables it.

  Luis

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/6] fake-dma: add fake dma engine driver
  2025-05-22 16:59         ` Luis Chamberlain
@ 2025-05-22 19:38           ` Luis Chamberlain
  0 siblings, 0 replies; 20+ messages in thread
From: Luis Chamberlain @ 2025-05-22 19:38 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Robin Murphy, vkoul, chenxiang66, leon, jgg, alex.williamson,
	joel.granados, iommu, dmaengine, linux-block, gost.dev,
	Haavard.Skinnemoen, Dan Williams

On Thu, May 22, 2025 at 09:59:03AM -0700, Luis Chamberlain wrote:
> I'll try the few knobs suggested by Leon to see if that enables it.

That didn't help, but I'm more intrigued by giving the iommufd mocking a
shot, so I'll try that next, as I think that will ultimately be cleaner
if possible.

  Luis

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2025-05-22 19:38 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-20 22:39 [PATCH 0/6] dma: fake-dma and IOVA tests Luis Chamberlain
2025-05-20 22:39 ` [PATCH 1/6] fake-dma: add fake dma engine driver Luis Chamberlain
2025-05-21 14:20   ` Robin Murphy
2025-05-21 17:07     ` Luis Chamberlain
2025-05-22 11:18       ` Marek Szyprowski
2025-05-22 16:59         ` Luis Chamberlain
2025-05-22 19:38           ` Luis Chamberlain
2025-05-21 23:40   ` kernel test robot
2025-05-20 22:39 ` [PATCH 2/6] dmatest: split dmatest_func() into helpers Luis Chamberlain
2025-05-20 22:39 ` [PATCH 3/6] dmatest: move printing to its own routine Luis Chamberlain
2025-05-21 14:41   ` Robin Murphy
2025-05-21 17:10     ` Luis Chamberlain
2025-05-21 22:26   ` kernel test robot
2025-05-20 22:39 ` [PATCH 4/6] dmatest: add IOVA tests Luis Chamberlain
2025-05-20 22:39 ` [PATCH 5/6] dma-mapping: benchmark: move validation parameters into a helper Luis Chamberlain
2025-05-20 22:39 ` [PATCH 6/6] dma-mapping: benchmark: add IOVA support Luis Chamberlain
2025-05-21 11:58   ` kernel test robot
2025-05-21 16:08   ` Robin Murphy
2025-05-21 17:17     ` Luis Chamberlain
2025-05-21 11:17 ` [PATCH 0/6] dma: fake-dma and IOVA tests Leon Romanovsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).