* [PATCH v3 00/23] dmaengine: Smart Data Accelerator Interface (SDXI) basic support
From: Nathan Lynch via B4 Relay @ 2026-06-06 0:02 UTC
To: Vinod Koul, Frank Li
Cc: Bjorn Helgaas, David Rientjes, John.Kariuki, Jonathan Cameron,
Kinsey Ho, Mario Limonciello, PradeepVineshReddy.Kodamati,
Shivank Garg, Stephen Bates, Tycho Andersen, Wei Huang, Wei Xu,
dmaengine, linux-kernel, linux-pci, Nathan Lynch, Frank Li
The Smart Data Accelerator Interface (SDXI) is a vendor-neutral
architecture for memory-to-memory data movement offload designed for
kernel bypass and virtualization.
General information on SDXI may be found at:
https://www.snia.org/sdxi
This submission adds a driver with basic support for PCIe-hosted SDXI
1.0 implementations and includes a DMA engine provider with memcpy
capability.
Planned future SDXI work (out of scope for this series):
* Character device for exposing SDXI contexts to user space.
* Support for operation types to be added in future SDXI revisions.
* Greater configurability for control structures, e.g. descriptor ring
size.
The latest released version of the SDXI specification is 1.0:
https://www.snia.org/sites/default/files/technical-work/sdxi/release/SNIA-SDXI-Specification-v1.0a.pdf
Draft versions of future SDXI specifications in development may be found at:
https://www.snia.org/tech_activities/publicreview#sdxi
The DMA engine provider included here survives dmatest runs with both
polled and interrupt-signaled completion modes, with the following
debug options and sanitizers enabled:
CONFIG_DEBUG_KMEMLEAK=y
CONFIG_KASAN=y
CONFIG_PROVE_LOCKING=y
CONFIG_SLUB_DEBUG_ON=y
CONFIG_UBSAN=y
Example test:
$ qemu-system-x86_64 -m 4G -smp 4 -kernel ~/bzImage -nographic \
-append 'console=ttyS0 debug sdxi_core.dma_channels=2 \
dmatest.polled=0 dmatest.iterations=10000 dmatest.run=1 \
dmatest.threads_per_chan=2 sdxi_core.dyndbg=+p \
sdxi_pci.dyndbg=+p' -device vfio-pci,host=0000:01:02.1 \
-initrd ~/rootfs.cpio -M q35 -accel kvm
[...]
# dmesg | grep -i -e sdxi -e dmatest
dmatest: No channels configured, continue with any
sdxi 0000:00:03.0: allocated 64 vectors
sdxi 0000:00:03.0: attempting stop, current state: stopped
sdxi 0000:00:03.0: SDXI 1.0 device found
sdxi 0000:00:03.0: activated
dmatest: Added 2 threads using dma0chan0
dmatest: Added 2 threads using dma0chan1
dmatest: Started 2 threads using dma0chan0
dmatest: Started 2 threads using dma0chan1
dmatest: dma0chan0-copy0: summary 10000 tests, 0 failures
dmatest: dma0chan0-copy1: summary 10000 tests, 0 failures
dmatest: dma0chan1-copy1: summary 10000 tests, 0 failures
dmatest: dma0chan1-copy0: summary 10000 tests, 0 failures
---
Changes in v3:
(I'm continuing to work through the Sashiko-reported issues/comments
from the v2 submission, but IMO there's enough of a delta here to
respin.)
- Fix akey allocation error path in dma.c to return a proper
error value. (Tycho Andersen)
- Disable SR-IOV in PCI removal. (TA)
- Update the Rust list of PCI class codes simultaneously with the C
header. (Sashiko)
- Properly build the bus-agnostic core support as a separate
module (sdxi-core) from the PCI driver (sdxi-pci). (Sashiko)
- Add dependency on CONFIG_64BIT to simplify assumptions around MMIO
and control structure accesses. (Sashiko)
- Use readq/writeq instead of ioread64/iowrite64 since we don't need
to handle port space. (Sashiko)
- Correct vector allocation range to ensure the error IRQ index (0) is
reserved. (Sashiko)
- Fix context control block dma pool allocation failure
check. (Sashiko)
- Ensure device is in stopped state before clearing MMIO_CTL0
configuration during init. (Sashiko)
- Add explicit alignment attributes to packed control structure
types. (Sashiko)
- Rename prep_memcpy_polled() to prep_memcpy_nointr(). (Frank Li)
- Link to v2: https://patch.msgid.link/20260511-sdxi-base-v2-0-889cfed17e3f@amd.com
Changes in v2:
- Drop unneeded dma_set_mask_and_coherent() result check. (Frank Li)
- Inline SDXI_DRV_DESC directly into MODULE_DESCRIPTION(). (FL)
- Drop unneeded braces from simple conditionals. (FL)
- Drop sdxi logging wrapper macros; use dev_dbg, dev_info etc
directly. (FL)
- Reorder commit message (patch 04, "Feature
discovery..."). (FL)
- Use read_poll_timeout() for function start and stop routines. (FL)
- Align multi-line FIELD_PREP() uses. (FL)
- Drop sdxi_create_dma_pool() helper. (FL)
- Remove unneeded dma_wmb() before iowrite64() to doorbell. (FL)
- Use WRITE_ONCE() to update descriptor ring write index. (FL)
- Make sdxi_completion_poll() eventually time out and adjust call
sites. (FL)
- Remove vestigial sdxi_dma_unregister() declaration. (FL)
- Reserve context ID before allocating context data structures instead
of after.
- Update context ID class to transfer ownership of ID to context
object; sdxi_free_cxt() now responsible for releasing ID once
assigned.
- Align small frequently-updated DMA pool objects to cacheline
boundaries.
- Drop redundant dma_set_mask_and_coherent() from DMA provider.
- Log unarchitected function status values in sdxi_dev_gsv().
- Remove sdxi_to_dev(); the abstraction is unnecessary and sdxi->dev
is shorter.
- Link to v1: https://patch.msgid.link/20260410-sdxi-base-v1-0-1d184cb5c60a@amd.com
Changes in v1:
- Reorder series and introduce functionality incrementally while
remaining buildable and functional at each step. (Jonathan Cameron)
- Use devres APIs where possible for device resources. (JC)
- Use cleanup APIs to significantly reduce use of goto-oriented error
unwinding. (JC)
- Drop SDXI_DEBUG config option. (JC)
- Cite SDXI spec version and section number consistently throughout. (JC)
- Combine local variable declarations of same type. (JC)
- Mark descriptor structs __packed. (JC)
- Use designated initializers in descriptor encoding functions. (JC)
- Prefer dev_err_probe() over sdxi_err() in sdxi_pci_init(). (JC)
- Prune unnecessary includes throughout source files. (JC)
- Remove unnecessary/unhelpful comments in several places. (JC)
- Remove SDXI spec material from "Add SNIA SDXI accelerator sub-class"
commit message and reword the remainder. (Bjorn Helgaas)
- Remove unnecessary local for DMA_BIT_MASK() argument in
sdxi_pci_init(). (BH)
- Use "{ }" for final null entry in id table, not "{ 0, }". (BH)
- Replace sample descriptor submission code from the SDXI spec with an
improved API that has unit tests, eliminates a copy step for
callers, and can block until ring space becomes available if
desired.
- Omit the error log facility for now; it can be reintroduced later.
- Use a per-device xarray to allocate context IDs and map them to
context objects.
- Implement interrupt-based completion signaling for memcpy operations
in the DMA engine provider; the DMA provider code is mostly rewritten.
Non-changes in v1:
- Mario suggested that pci_clear_master() is needed in
sdxi_pci_init()'s error path and in sdxi_pci_exit() (now
sdxi_pci_remove()). However, sdxi uses pcim_enable_device(), which
appears to ensure that master is cleared for the device. Happy to
revisit this if I'm mistaken.
- Link to RFC: https://lore.kernel.org/r/20250905-sdxi-base-v1-0-d0341a1292ba@amd.com
---
Nathan Lynch (23):
PCI: Add SNIA SDXI accelerator sub-class
MAINTAINERS: Add entry for SDXI driver
dmaengine: sdxi: Add PCI initialization
dmaengine: sdxi: Feature discovery and initial configuration
dmaengine: sdxi: Configure context tables
dmaengine: sdxi: Allocate DMA pools
dmaengine: sdxi: Allocate administrative context
dmaengine: sdxi: Install administrative context
dmaengine: sdxi: Start functions on probe, stop on remove
dmaengine: sdxi: Complete administrative context jump start
dmaengine: sdxi: Add client context alloc and release APIs
dmaengine: sdxi: Add descriptor ring management
dmaengine: sdxi: Add unit tests for descriptor ring reservations
dmaengine: sdxi: Attach descriptor ring state to contexts
dmaengine: sdxi: Per-context access key (AKey) table entry allocator
dmaengine: sdxi: Generic descriptor manipulation helpers
dmaengine: sdxi: Add completion status block API
dmaengine: sdxi: Encode context start, stop, and sync descriptors
dmaengine: sdxi: Provide context start and stop APIs
dmaengine: sdxi: Encode nop, copy, and interrupt descriptors
dmaengine: sdxi: Add unit tests for descriptor encoding
dmaengine: sdxi: MSI/MSI-X vector allocation and mapping
dmaengine: sdxi: Add DMA engine provider
MAINTAINERS | 7 +
drivers/dma/Kconfig | 2 +
drivers/dma/Makefile | 1 +
drivers/dma/sdxi/.kunitconfig | 4 +
drivers/dma/sdxi/Kconfig | 40 +++
drivers/dma/sdxi/Makefile | 16 ++
drivers/dma/sdxi/completion.c | 87 +++++++
drivers/dma/sdxi/completion.h | 25 ++
drivers/dma/sdxi/context.c | 507 ++++++++++++++++++++++++++++++++++++
drivers/dma/sdxi/context.h | 109 ++++++++
drivers/dma/sdxi/descriptor.c | 198 ++++++++++++++
drivers/dma/sdxi/descriptor.h | 135 ++++++++++
drivers/dma/sdxi/descriptor_kunit.c | 484 ++++++++++++++++++++++++++++++++++
drivers/dma/sdxi/device.c | 371 ++++++++++++++++++++++++++
drivers/dma/sdxi/dma.c | 501 +++++++++++++++++++++++++++++++++++
drivers/dma/sdxi/dma.h | 11 +
drivers/dma/sdxi/hw.h | 254 ++++++++++++++++++
drivers/dma/sdxi/mmio.h | 60 +++++
drivers/dma/sdxi/pci.c | 117 +++++++++
drivers/dma/sdxi/ring.c | 159 +++++++++++
drivers/dma/sdxi/ring.h | 84 ++++++
drivers/dma/sdxi/ring_kunit.c | 105 ++++++++
drivers/dma/sdxi/sdxi.h | 138 ++++++++++
include/linux/pci_ids.h | 1 +
rust/kernel/pci/id.rs | 1 +
25 files changed, 3417 insertions(+)
---
base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
change-id: 20250813-sdxi-base-73d7c9fdce57
Best regards,
--
Nathan Lynch <nathan.lynch@amd.com>
* [PATCH v3 01/23] PCI: Add SNIA SDXI accelerator sub-class
From: Nathan Lynch via B4 Relay @ 2026-06-06 0:02 UTC
To: Vinod Koul, Frank Li
Cc: Bjorn Helgaas, David Rientjes, John.Kariuki, Jonathan Cameron,
Kinsey Ho, Mario Limonciello, PradeepVineshReddy.Kodamati,
Shivank Garg, Stephen Bates, Tycho Andersen, Wei Huang, Wei Xu,
dmaengine, linux-kernel, linux-pci, Nathan Lynch
From: Nathan Lynch <nathan.lynch@amd.com>
Make the class code for SNIA Smart Data Accelerator Interface (SDXI)
functions available to both C and Rust code.
See PCI Code and ID Assignment spec r1.14, sec 1.19.
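
For readers less familiar with PCI class codes, here is a minimal
userspace sketch of how the 24-bit value added here splits into base
class, sub-class, and programming interface, and of the masked
comparison an ID-table entry with a class mask implies. The helper
names (class_base(), class_sub(), class_prog_if(), class_matches())
are hypothetical and exist only for illustration; they are not kernel
APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value added by this patch (include/linux/pci_ids.h). */
#define PCI_CLASS_ACCELERATOR_SDXI 0x120100u

/* Hypothetical helpers for illustration; not kernel APIs. */
static inline uint8_t class_base(uint32_t class_code)
{
	return (class_code >> 16) & 0xff;	/* bits 23:16 */
}

static inline uint8_t class_sub(uint32_t class_code)
{
	return (class_code >> 8) & 0xff;	/* bits 15:8 */
}

static inline uint8_t class_prog_if(uint32_t class_code)
{
	return class_code & 0xff;		/* bits 7:0 */
}

/*
 * Masked comparison of a device's class code against an ID-table
 * entry. A mask of 0xffffff requires base class, sub-class, and
 * programming interface to all match.
 */
static inline bool class_matches(uint32_t dev_class, uint32_t id_class,
				 uint32_t id_mask)
{
	return ((dev_class ^ id_class) & id_mask) == 0;
}
```

With a full 0xffffff mask, a function reporting 0x120100 matches while
one reporting 0x120000 (the generic processing accelerator class) does
not.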
Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Nathan Lynch <nathan.lynch@amd.com>
---
include/linux/pci_ids.h | 1 +
rust/kernel/pci/id.rs | 1 +
2 files changed, 2 insertions(+)
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 24cb42f66e4b..83ab3f27eb5a 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -154,6 +154,7 @@
#define PCI_BASE_CLASS_ACCELERATOR 0x12
#define PCI_CLASS_ACCELERATOR_PROCESSING 0x1200
+#define PCI_CLASS_ACCELERATOR_SDXI 0x120100
#define PCI_CLASS_OTHERS 0xff
diff --git a/rust/kernel/pci/id.rs b/rust/kernel/pci/id.rs
index 50005d176561..d5e006cd6641 100644
--- a/rust/kernel/pci/id.rs
+++ b/rust/kernel/pci/id.rs
@@ -292,6 +292,7 @@ fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
SP_OTHER = bindings::PCI_CLASS_SP_OTHER, // 0x118000
ACCELERATOR_PROCESSING = bindings::PCI_CLASS_ACCELERATOR_PROCESSING, // 0x120000
+ ACCELERATOR_SDXI = bindings::PCI_CLASS_ACCELERATOR_SDXI, // 0x120100
OTHERS = bindings::PCI_CLASS_OTHERS, // 0xff0000
}
--
2.54.0
* [PATCH v3 02/23] MAINTAINERS: Add entry for SDXI driver
From: Nathan Lynch via B4 Relay @ 2026-06-06 0:02 UTC
To: Vinod Koul, Frank Li
Cc: Bjorn Helgaas, David Rientjes, John.Kariuki, Jonathan Cameron,
Kinsey Ho, Mario Limonciello, PradeepVineshReddy.Kodamati,
Shivank Garg, Stephen Bates, Tycho Andersen, Wei Huang, Wei Xu,
dmaengine, linux-kernel, linux-pci, Nathan Lynch
From: Nathan Lynch <nathan.lynch@amd.com>
Add an entry for the SDXI driver to MAINTAINERS. Wei and I will
maintain the driver.
The SDXI specification and other materials may be found at:
https://www.snia.org/sdxi
Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Nathan Lynch <nathan.lynch@amd.com>
---
MAINTAINERS | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 2fb1c75afd16..5c6d175a3f42 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -24036,6 +24036,13 @@ L: sdricohcs-devel@lists.sourceforge.net (subscribers-only)
S: Maintained
F: drivers/mmc/host/sdricoh_cs.c
+SDXI (Smart Data Accelerator Interface) DRIVER
+M: Nathan Lynch <nathan.lynch@amd.com>
+M: Wei Huang <wei.huang2@amd.com>
+L: dmaengine@vger.kernel.org
+S: Supported
+F: drivers/dma/sdxi/
+
SECO BOARDS CEC DRIVER
M: Ettore Chimenti <ek5.chimenti@gmail.com>
S: Maintained
--
2.54.0
* [PATCH v3 03/23] dmaengine: sdxi: Add PCI initialization
From: Nathan Lynch via B4 Relay @ 2026-06-06 0:02 UTC
To: Vinod Koul, Frank Li
Cc: Bjorn Helgaas, David Rientjes, John.Kariuki, Jonathan Cameron,
Kinsey Ho, Mario Limonciello, PradeepVineshReddy.Kodamati,
Shivank Garg, Stephen Bates, Tycho Andersen, Wei Huang, Wei Xu,
dmaengine, linux-kernel, linux-pci, Nathan Lynch
From: Nathan Lynch <nathan.lynch@amd.com>
Add enough code to bind an SDXI device via the class code and map its
control registers and doorbell region. All device resources are
managed with devres at this point, so there is no explicit teardown
path.
While the SDXI specification includes a PCIe binding, the standard is
intended to be independent of the underlying I/O interconnect. So the
driver confines PCI-specific code to pci.c, and the rest (such as
device.c, introduced here) is bus-agnostic. Hence there is some
indirection: during probe, the bus code registers any matched device
with the generic SDXI core, supplying the device and a sdxi_bus_ops
vector. After the core associates a new sdxi_dev with the device,
bus-specific initialization proceeds via the sdxi_bus_ops->init()
callback.
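
The registration flow described above can be modeled in plain
userspace C roughly as follows. This is only a sketch of the
indirection, not the driver code: the model_* names are hypothetical,
calloc()/free() stand in for the devres-managed allocation used in the
patch, and the regs_mapped flag stands in for the BAR mappings
performed by the PCI init callback.

```c
#include <errno.h>
#include <stdlib.h>

struct model_sdxi;

/* Bus-specific methods, analogous to sdxi_bus_ops in this patch. */
struct model_bus_ops {
	int (*init)(struct model_sdxi *sdxi);
};

/* Core state, analogous to sdxi_dev; the fields are illustrative. */
struct model_sdxi {
	void *bus_dev;			/* stand-in for struct device * */
	const struct model_bus_ops *ops;
	int regs_mapped;		/* set by the bus init callback */
};

/* Bus-agnostic core: allocate state, record the ops, run bus init. */
static int model_register(void *bus_dev, const struct model_bus_ops *ops,
			  struct model_sdxi **out)
{
	struct model_sdxi *sdxi = calloc(1, sizeof(*sdxi));
	int err;

	if (!sdxi)
		return -ENOMEM;

	sdxi->bus_dev = bus_dev;
	sdxi->ops = ops;

	err = ops->init(sdxi);
	if (err) {
		free(sdxi);	/* the patch relies on devres instead */
		return err;
	}

	*out = sdxi;
	return 0;
}

/* Two example bus back ends: one succeeds, one fails. */
static int model_pci_init(struct model_sdxi *sdxi)
{
	sdxi->regs_mapped = 1;
	return 0;
}

static int model_failing_init(struct model_sdxi *sdxi)
{
	(void)sdxi;
	return -ENODEV;
}

static const struct model_bus_ops model_pci_ops = {
	.init = model_pci_init,
};

static const struct model_bus_ops model_failing_ops = {
	.init = model_failing_init,
};
```

The point of the split is that model_register() never needs to know
which bus it is running on; a different interconnect would supply its
own ops vector and reuse the core unchanged.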
Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Nathan Lynch <nathan.lynch@amd.com>
---
drivers/dma/Kconfig | 2 ++
drivers/dma/Makefile | 1 +
drivers/dma/sdxi/Kconfig | 28 ++++++++++++++++
drivers/dma/sdxi/Makefile | 6 ++++
drivers/dma/sdxi/device.c | 33 +++++++++++++++++++
drivers/dma/sdxi/pci.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++
drivers/dma/sdxi/sdxi.h | 38 +++++++++++++++++++++
7 files changed, 192 insertions(+)
diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index ae6a682c9f76..3d89284e7cf8 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -762,6 +762,8 @@ source "drivers/dma/lgm/Kconfig"
source "drivers/dma/loongson/Kconfig"
+source "drivers/dma/sdxi/Kconfig"
+
source "drivers/dma/stm32/Kconfig"
# clients
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index 14aa086629d5..069bba1d26c7 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -84,6 +84,7 @@ obj-$(CONFIG_XGENE_DMA) += xgene-dma.o
obj-$(CONFIG_ST_FDMA) += st_fdma.o
obj-$(CONFIG_FSL_DPAA2_QDMA) += fsl-dpaa2-qdma/
obj-$(CONFIG_INTEL_LDMA) += lgm/
+obj-$(CONFIG_SDXI_CORE) += sdxi/
obj-y += amd/
obj-y += loongson/
diff --git a/drivers/dma/sdxi/Kconfig b/drivers/dma/sdxi/Kconfig
new file mode 100644
index 000000000000..b91b44231a04
--- /dev/null
+++ b/drivers/dma/sdxi/Kconfig
@@ -0,0 +1,28 @@
+config SDXI_CORE
+ tristate "SDXI (Smart Data Accelerator Interface) support"
+ # Doorbell updates require naturally-aligned 64-bit writes, per
+ # SDXI 1.0 9 MMIO Control Registers.
+ depends on 64BIT
+ select DMA_ENGINE
+ help
+ Enable support for Smart Data Accelerator Interface (SDXI)
+ Platform Data Mover devices. SDXI is a vendor-neutral
+ standard for a memory-to-memory data mover and acceleration
+ interface.
+
+ This option is automatically selected by the SDXI PCI driver.
+ Enable it manually if you want to build the SDXI core for unit
+ testing without enabling the PCI driver.
+
+ To compile this driver as a module, choose M here: the module
+ will be named "sdxi-core".
+
+config SDXI_PCI
+ tristate "SDXI (Smart Data Accelerator Interface) PCI driver"
+ depends on PCI_MSI && 64BIT
+ select SDXI_CORE
+ help
+ Enable support for PCI-hosted SDXI devices.
+
+ To compile this driver as a module, choose M here: the module
+ will be named "sdxi-pci".
diff --git a/drivers/dma/sdxi/Makefile b/drivers/dma/sdxi/Makefile
new file mode 100644
index 000000000000..0006edf74d86
--- /dev/null
+++ b/drivers/dma/sdxi/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_SDXI_CORE) += sdxi-core.o
+sdxi-core-y := device.o
+
+obj-$(CONFIG_SDXI_PCI) += sdxi-pci.o
+sdxi-pci-y := pci.o
diff --git a/drivers/dma/sdxi/device.c b/drivers/dma/sdxi/device.c
new file mode 100644
index 000000000000..0974a83bb45c
--- /dev/null
+++ b/drivers/dma/sdxi/device.c
@@ -0,0 +1,33 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * SDXI hardware device driver
+ *
+ * Copyright Advanced Micro Devices, Inc.
+ */
+
+#include <linux/device.h>
+#include <linux/export.h>
+#include <linux/slab.h>
+
+#include "sdxi.h"
+
+int sdxi_register(struct device *dev, const struct sdxi_bus_ops *ops)
+{
+ struct sdxi_dev *sdxi;
+
+ sdxi = devm_kzalloc(dev, sizeof(*sdxi), GFP_KERNEL);
+ if (!sdxi)
+ return -ENOMEM;
+
+ sdxi->dev = dev;
+ sdxi->bus_ops = ops;
+ dev_set_drvdata(dev, sdxi);
+
+ return sdxi->bus_ops->init(sdxi);
+}
+EXPORT_SYMBOL_NS_GPL(sdxi_register, "SDXI");
+
+MODULE_AUTHOR("Wei Huang");
+MODULE_AUTHOR("Nathan Lynch");
+MODULE_DESCRIPTION("SDXI core");
+MODULE_LICENSE("GPL");
diff --git a/drivers/dma/sdxi/pci.c b/drivers/dma/sdxi/pci.c
new file mode 100644
index 000000000000..42e8af008b10
--- /dev/null
+++ b/drivers/dma/sdxi/pci.c
@@ -0,0 +1,84 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * SDXI PCI device code
+ *
+ * Copyright Advanced Micro Devices, Inc.
+ */
+
+#include <linux/dev_printk.h>
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/io.h>
+#include <linux/iomap.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+
+#include "sdxi.h"
+
+enum sdxi_mmio_bars {
+ SDXI_PCI_BAR_CTL_REGS = 0,
+ SDXI_PCI_BAR_DOORBELL = 2,
+};
+
+static struct pci_dev *sdxi_to_pci_dev(const struct sdxi_dev *sdxi)
+{
+ return to_pci_dev(sdxi->dev);
+}
+
+static int sdxi_pci_init(struct sdxi_dev *sdxi)
+{
+ struct pci_dev *pdev = sdxi_to_pci_dev(sdxi);
+ struct device *dev = &pdev->dev;
+ int ret;
+
+ ret = pcim_enable_device(pdev);
+ if (ret)
+ return dev_err_probe(dev, ret, "failed to enable device\n");
+
+ dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
+
+ sdxi->ctrl_regs = pcim_iomap_region(pdev, SDXI_PCI_BAR_CTL_REGS,
+ KBUILD_MODNAME);
+ if (IS_ERR(sdxi->ctrl_regs))
+ return dev_err_probe(dev, PTR_ERR(sdxi->ctrl_regs),
+ "failed to map control registers\n");
+
+ sdxi->dbs = pcim_iomap_region(pdev, SDXI_PCI_BAR_DOORBELL,
+ KBUILD_MODNAME);
+ if (IS_ERR(sdxi->dbs))
+ return dev_err_probe(dev, PTR_ERR(sdxi->dbs),
+ "failed to map doorbell region\n");
+
+ pci_set_master(pdev);
+ return 0;
+}
+
+static const struct sdxi_bus_ops sdxi_pci_ops = {
+ .init = sdxi_pci_init,
+};
+
+static int sdxi_pci_probe(struct pci_dev *pdev,
+ const struct pci_device_id *id)
+{
+ return sdxi_register(&pdev->dev, &sdxi_pci_ops);
+}
+
+static const struct pci_device_id sdxi_id_table[] = {
+ { PCI_DEVICE_CLASS(PCI_CLASS_ACCELERATOR_SDXI, 0xffffff) },
+ { }
+};
+MODULE_DEVICE_TABLE(pci, sdxi_id_table);
+
+static struct pci_driver sdxi_driver = {
+ .name = "sdxi",
+ .id_table = sdxi_id_table,
+ .probe = sdxi_pci_probe,
+ .sriov_configure = pci_sriov_configure_simple,
+};
+
+MODULE_IMPORT_NS("SDXI");
+MODULE_AUTHOR("Wei Huang");
+MODULE_AUTHOR("Nathan Lynch");
+MODULE_DESCRIPTION("SDXI PCIe interface driver");
+MODULE_LICENSE("GPL");
+module_pci_driver(sdxi_driver);
diff --git a/drivers/dma/sdxi/sdxi.h b/drivers/dma/sdxi/sdxi.h
new file mode 100644
index 000000000000..d4c61ca2f875
--- /dev/null
+++ b/drivers/dma/sdxi/sdxi.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * SDXI device driver header
+ *
+ * Copyright Advanced Micro Devices, Inc.
+ */
+
+#ifndef DMA_SDXI_H
+#define DMA_SDXI_H
+
+#include <linux/compiler_types.h>
+#include <linux/types.h>
+
+struct sdxi_dev;
+
+/**
+ * struct sdxi_bus_ops - Bus-specific methods for SDXI devices.
+ */
+struct sdxi_bus_ops {
+ /**
+ * @init: Map control registers and doorbell region, allocate
+ * IRQ ranges. Invoked before bus-agnostic SDXI
+ * function initialization.
+ */
+ int (*init)(struct sdxi_dev *sdxi);
+};
+
+struct sdxi_dev {
+ struct device *dev;
+ void __iomem *ctrl_regs; /* virt addr of ctrl registers */
+ void __iomem *dbs; /* virt addr of doorbells */
+
+ const struct sdxi_bus_ops *bus_ops;
+};
+
+int sdxi_register(struct device *dev, const struct sdxi_bus_ops *ops);
+
+#endif /* DMA_SDXI_H */
--
2.54.0
* [PATCH v3 04/23] dmaengine: sdxi: Feature discovery and initial configuration
From: Nathan Lynch via B4 Relay @ 2026-06-06 0:02 UTC
To: Vinod Koul, Frank Li
Cc: Bjorn Helgaas, David Rientjes, John.Kariuki, Jonathan Cameron,
Kinsey Ho, Mario Limonciello, PradeepVineshReddy.Kodamati,
Shivank Garg, Stephen Bates, Tycho Andersen, Wei Huang, Wei Xu,
dmaengine, linux-kernel, linux-pci, Nathan Lynch
From: Nathan Lynch <nathan.lynch@amd.com>
Discover via the capability registers the doorbell region stride, the
maximum supported context ID, the operation groups implemented, and
limits on buffer and control structure sizes. The driver has the
option of writing more conservative limits to the ctl2 register, but
it uses those supplied by the implementation for now.
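
As a rough illustration of the capability decoding, the following
userspace sketch extracts fields using the bit positions that mmio.h
defines in this patch and derives the doorbell stride the same way
sdxi_fn_activate() does (4 KiB scaled by a power of two). The
MODEL_GENMASK_ULL() macro and the model_* helpers are hypothetical
stand-ins for the kernel's GENMASK_ULL() and FIELD_GET();
__builtin_ctzll() assumes GCC or Clang.

```c
#include <stdint.h>

/* Build a mask covering bits hi..lo, like the kernel's GENMASK_ULL(). */
#define MODEL_GENMASK_ULL(hi, lo) \
	((~0ULL >> (63 - (hi))) & (~0ULL << (lo)))

/* Field positions copied from mmio.h in this patch. */
#define CAP0_DB_STRIDE		MODEL_GENMASK_ULL(22, 20)
#define CAP1_MAX_CXT		MODEL_GENMASK_ULL(31, 16)
#define CAP1_OPB_000_CAP	MODEL_GENMASK_ULL(63, 32)

/* Extract a field: mask, then shift down by the mask's lowest set bit. */
static inline uint64_t model_field_get(uint64_t mask, uint64_t reg)
{
	return (reg & mask) >> __builtin_ctzll(mask);
}

/* Doorbell stride in bytes: 4 KiB multiplied by 2^DB_STRIDE. */
static inline uint32_t model_db_stride(uint64_t cap0)
{
	return 4096u << model_field_get(CAP0_DB_STRIDE, cap0);
}
```

Since DB_STRIDE is a three-bit field, the stride computed this way
ranges from 4 KiB to 512 KiB.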
Introduce device register definitions and associated masks via mmio.h.
Add convenience wrappers which are first used here:
- sdxi_read64()
- sdxi_write64()
Report the version of the standard to which the device conforms, e.g.
sdxi 0000:00:03.0: SDXI 1.0 device found
After bus-specific initialization, force the SDXI function to the
stopped state. This is the expected state from reset, but kexec or
driver bugs
can leave a function in other states from which the initialization
code must be able to recover.
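
The recovery logic can be summarized as a small decision table. The
sketch below mirrors the switch statement in sdxi_dev_stop() from this
patch as a pure function, using the state and command encodings from
device.c; the model_* names and the GSR_NONE marker are hypothetical
and only serve the illustration. Note that, as written in the patch,
the stopped state has no explicit case and therefore takes the default
branch.

```c
/* State and command encodings copied from device.c in this patch. */
enum model_gsv {
	GSV_STOP = 0,
	GSV_INIT = 1,
	GSV_ACTIVE = 2,
	GSV_STOPG_SF = 3,
	GSV_STOPG_HD = 4,
	GSV_ERROR = 5,
};

enum model_gsr {
	GSR_NONE = -1,		/* hypothetical marker: no register write */
	GSR_RESET = 0,
	GSR_STOP_SF = 1,
	GSR_STOP_HD = 2,
	GSR_ACTIVE = 3,
};

/* Which command is written before waiting, given the current state. */
static enum model_gsr model_stop_request(unsigned int state)
{
	switch (state) {
	case GSV_INIT:
	case GSV_ACTIVE:
		return GSR_STOP_SF;	/* ask for a soft stop first */
	case GSV_STOPG_SF:
		return GSR_STOP_HD;	/* escalate to a hard stop */
	case GSV_STOPG_HD:
	case GSV_ERROR:
		return GSR_NONE;	/* just wait; error is handled later */
	default:
		return GSR_RESET;	/* everything else: try a reset */
	}
}

/* After the first wait, an error state needs a reset and a second wait. */
static int model_needs_reset_after_wait(unsigned int state)
{
	return state == GSV_ERROR;
}
```

In the driver, each wait is bounded by read_poll_timeout() with a 1 ms
poll interval and a 1 s timeout, so a function that never reaches the
stopped state causes probe to fail rather than hang.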
Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Nathan Lynch <nathan.lynch@amd.com>
---
drivers/dma/sdxi/device.c | 178 +++++++++++++++++++++++++++++++++++++++++++++-
drivers/dma/sdxi/mmio.h | 54 ++++++++++++++
drivers/dma/sdxi/sdxi.h | 19 +++++
3 files changed, 250 insertions(+), 1 deletion(-)
diff --git a/drivers/dma/sdxi/device.c b/drivers/dma/sdxi/device.c
index 0974a83bb45c..7c6652f9c3c0 100644
--- a/drivers/dma/sdxi/device.c
+++ b/drivers/dma/sdxi/device.c
@@ -5,15 +5,187 @@
* Copyright Advanced Micro Devices, Inc.
*/
+#include <linux/bitfield.h>
+#include <linux/delay.h>
#include <linux/device.h>
#include <linux/export.h>
+#include <linux/iopoll.h>
+#include <linux/jiffies.h>
#include <linux/slab.h>
+#include <linux/time.h>
+#include "mmio.h"
#include "sdxi.h"
+enum sdxi_fn_gsv {
+ SDXI_GSV_STOP = 0,
+ SDXI_GSV_INIT = 1,
+ SDXI_GSV_ACTIVE = 2,
+ SDXI_GSV_STOPG_SF = 3,
+ SDXI_GSV_STOPG_HD = 4,
+ SDXI_GSV_ERROR = 5,
+};
+
+static const char *const gsv_strings[] = {
+ [SDXI_GSV_STOP] = "stopped",
+ [SDXI_GSV_INIT] = "initializing",
+ [SDXI_GSV_ACTIVE] = "active",
+ [SDXI_GSV_STOPG_SF] = "soft stopping",
+ [SDXI_GSV_STOPG_HD] = "hard stopping",
+ [SDXI_GSV_ERROR] = "error",
+};
+
+static const char *gsv_str(enum sdxi_fn_gsv gsv)
+{
+ if ((size_t)gsv < ARRAY_SIZE(gsv_strings))
+ return gsv_strings[(size_t)gsv];
+
+ WARN_ONCE(1, "unexpected gsv %u\n", gsv);
+
+ return "unknown";
+}
+
+enum sdxi_fn_gsr {
+ SDXI_GSRV_RESET = 0,
+ SDXI_GSRV_STOP_SF = 1,
+ SDXI_GSRV_STOP_HD = 2,
+ SDXI_GSRV_ACTIVE = 3,
+};
+
+static enum sdxi_fn_gsv sdxi_dev_gsv(const struct sdxi_dev *sdxi)
+{
+ u64 sts0 = sdxi_read64(sdxi, SDXI_MMIO_STS0);
+ enum sdxi_fn_gsv gsv = FIELD_GET(SDXI_MMIO_STS0_FN_GSV, sts0);
+
+ switch (gsv) {
+ case SDXI_GSV_STOP ... SDXI_GSV_ERROR:
+ break;
+ default:
+ dev_warn_ratelimited(sdxi->dev, "unknown gsv %u\n", gsv);
+ break;
+ }
+
+ return gsv;
+}
+
+static const unsigned long gsv_poll_interval_us = USEC_PER_MSEC;
+static const unsigned long gsv_transition_timeout_us = USEC_PER_SEC;
+
+#define sdxi_dev_gsv_poll(sdxi, val, cond) \
+ read_poll_timeout(sdxi_dev_gsv, val, cond, gsv_poll_interval_us, \
+ gsv_transition_timeout_us, false, sdxi)
+
+static void sdxi_write_fn_gsr(struct sdxi_dev *sdxi, enum sdxi_fn_gsr cmd)
+{
+ u64 ctl0 = sdxi_read64(sdxi, SDXI_MMIO_CTL0);
+
+ FIELD_MODIFY(SDXI_MMIO_CTL0_FN_GSR, &ctl0, cmd);
+ sdxi_write64(sdxi, SDXI_MMIO_CTL0, ctl0);
+}
+
+/* Get the device to the GSV_STOP state. */
+static int sdxi_dev_stop(struct sdxi_dev *sdxi)
+{
+ enum sdxi_fn_gsv status = sdxi_dev_gsv(sdxi);
+ int ret;
+
+ dev_dbg(sdxi->dev, "attempting stop, current state: %s\n",
+ gsv_str(status));
+
+ switch (status) {
+ case SDXI_GSV_INIT:
+ case SDXI_GSV_ACTIVE:
+ sdxi_write_fn_gsr(sdxi, SDXI_GSRV_STOP_SF);
+ break;
+ case SDXI_GSV_STOPG_SF:
+ sdxi_write_fn_gsr(sdxi, SDXI_GSRV_STOP_HD);
+ break;
+ case SDXI_GSV_STOPG_HD:
+ case SDXI_GSV_ERROR:
+ /*
+ * If hard-stopping, there's nothing to do but wait.
+ * If in error state, the reset is issued below.
+ */
+ break;
+ default:
+ /* Unrecognized state; try a reset. */
+ sdxi_write_fn_gsr(sdxi, SDXI_GSRV_RESET);
+ break;
+ }
+
+ /* Wait for transition to either stop or error state. */
+ ret = sdxi_dev_gsv_poll(sdxi, status,
+ status == SDXI_GSV_STOP ||
+ status == SDXI_GSV_ERROR);
+
+ if (ret == 0 && status == SDXI_GSV_ERROR) {
+ sdxi_write_fn_gsr(sdxi, SDXI_GSRV_RESET);
+ ret = sdxi_dev_gsv_poll(sdxi, status, status == SDXI_GSV_STOP);
+ }
+
+ if (ret) {
+ dev_err(sdxi->dev, "stop timed out, current state: %s\n",
+ gsv_str(status));
+ return ret;
+ }
+
+ return 0;
+}
+
+/*
+ * See SDXI 1.0 4.1.8 Activation of the SDXI Function by Software.
+ */
+static int sdxi_fn_activate(struct sdxi_dev *sdxi)
+{
+ u64 version, cap0, cap1, ctl0, ctl2;
+ int err;
+
+ /*
+ * Ensure the function is in GSV_STOP state, then clear ctl0's
+ * pasid and error interrupt configuration while preserving
+ * any assigned group ID (fn_grp_id).
+ */
+ err = sdxi_dev_stop(sdxi);
+ if (err)
+ return err;
+
+ ctl0 = sdxi_read64(sdxi, SDXI_MMIO_CTL0);
+ FIELD_MODIFY(SDXI_MMIO_CTL0_FN_ERR_INTR_EN, &ctl0, 0);
+ FIELD_MODIFY(SDXI_MMIO_CTL0_FN_PASID_VL, &ctl0, 0);
+ FIELD_MODIFY(SDXI_MMIO_CTL0_FN_PASID, &ctl0, 0);
+ sdxi_write64(sdxi, SDXI_MMIO_CTL0, ctl0);
+
+ version = sdxi_read64(sdxi, SDXI_MMIO_VERSION);
+ dev_info(sdxi->dev, "SDXI %llu.%llu device found\n",
+ FIELD_GET(SDXI_MMIO_VERSION_MAJOR, version),
+ FIELD_GET(SDXI_MMIO_VERSION_MINOR, version));
+
+ /* Read capabilities and features. */
+ cap0 = sdxi_read64(sdxi, SDXI_MMIO_CAP0);
+ sdxi->db_stride = SZ_4K;
+ sdxi->db_stride *= 1U << FIELD_GET(SDXI_MMIO_CAP0_DB_STRIDE, cap0);
+
+ cap1 = sdxi_read64(sdxi, SDXI_MMIO_CAP1);
+ sdxi->op_grp_cap = FIELD_GET(SDXI_MMIO_CAP1_OPB_000_CAP, cap1);
+ sdxi->max_cxtid = FIELD_GET(SDXI_MMIO_CAP1_MAX_CXT, cap1);
+
+ /* Apply our configuration. */
+ ctl2 = FIELD_PREP(SDXI_MMIO_CTL2_MAX_CXT, sdxi->max_cxtid);
+ ctl2 |= FIELD_PREP(SDXI_MMIO_CTL2_MAX_BUFFER,
+ FIELD_GET(SDXI_MMIO_CAP1_MAX_BUFFER, cap1));
+ ctl2 |= FIELD_PREP(SDXI_MMIO_CTL2_MAX_AKEY_SZ,
+ FIELD_GET(SDXI_MMIO_CAP1_MAX_AKEY_SZ, cap1));
+ ctl2 |= FIELD_PREP(SDXI_MMIO_CTL2_OPB_000_AVL,
+ FIELD_GET(SDXI_MMIO_CAP1_OPB_000_CAP, cap1));
+ sdxi_write64(sdxi, SDXI_MMIO_CTL2, ctl2);
+
+ return 0;
+}
+
int sdxi_register(struct device *dev, const struct sdxi_bus_ops *ops)
{
struct sdxi_dev *sdxi;
+ int err;
sdxi = devm_kzalloc(dev, sizeof(*sdxi), GFP_KERNEL);
if (!sdxi)
@@ -23,7 +195,11 @@ int sdxi_register(struct device *dev, const struct sdxi_bus_ops *ops)
sdxi->bus_ops = ops;
dev_set_drvdata(dev, sdxi);
- return sdxi->bus_ops->init(sdxi);
+ err = sdxi->bus_ops->init(sdxi);
+ if (err)
+ return err;
+
+ return sdxi_fn_activate(sdxi);
}
EXPORT_SYMBOL_NS_GPL(sdxi_register, "SDXI");
diff --git a/drivers/dma/sdxi/mmio.h b/drivers/dma/sdxi/mmio.h
new file mode 100644
index 000000000000..f07e857691b9
--- /dev/null
+++ b/drivers/dma/sdxi/mmio.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+/*
+ * SDXI MMIO register offsets and layouts.
+ *
+ * Copyright Advanced Micro Devices, Inc.
+ */
+
+#ifndef DMA_SDXI_MMIO_H
+#define DMA_SDXI_MMIO_H
+
+#include <linux/bits.h>
+
+enum sdxi_reg {
+ /* SDXI 1.0 9.1 General Control and Status Registers */
+ SDXI_MMIO_CTL0 = 0x00000,
+ SDXI_MMIO_CTL2 = 0x00010,
+ SDXI_MMIO_STS0 = 0x00100,
+ SDXI_MMIO_CAP0 = 0x00200,
+ SDXI_MMIO_CAP1 = 0x00208,
+ SDXI_MMIO_VERSION = 0x00210,
+};
+
+/* SDXI 1.0 Table 9-2: MMIO_CTL0 */
+#define SDXI_MMIO_CTL0_FN_GSR GENMASK_ULL(1, 0)
+#define SDXI_MMIO_CTL0_FN_PASID_VL BIT_ULL(2)
+#define SDXI_MMIO_CTL0_FN_ERR_INTR_EN BIT_ULL(4)
+#define SDXI_MMIO_CTL0_FN_PASID GENMASK_ULL(27, 8)
+
+/* SDXI 1.0 Table 9-4: MMIO_CTL2 */
+#define SDXI_MMIO_CTL2_MAX_BUFFER GENMASK_ULL(3, 0)
+#define SDXI_MMIO_CTL2_MAX_AKEY_SZ GENMASK_ULL(15, 12)
+#define SDXI_MMIO_CTL2_MAX_CXT GENMASK_ULL(31, 16)
+#define SDXI_MMIO_CTL2_OPB_000_AVL GENMASK_ULL(63, 32)
+
+/* SDXI 1.0 Table 9-5: MMIO_STS0 */
+#define SDXI_MMIO_STS0_FN_GSV GENMASK_ULL(2, 0)
+
+/* SDXI 1.0 Table 9-6: MMIO_CAP0 */
+#define SDXI_MMIO_CAP0_SFUNC GENMASK_ULL(15, 0)
+#define SDXI_MMIO_CAP0_DB_STRIDE GENMASK_ULL(22, 20)
+#define SDXI_MMIO_CAP0_MAX_DS_RING_SZ GENMASK_ULL(28, 24)
+
+/* SDXI 1.0 Table 9-7: MMIO_CAP1 */
+#define SDXI_MMIO_CAP1_MAX_BUFFER GENMASK_ULL(3, 0)
+#define SDXI_MMIO_CAP1_MAX_AKEY_SZ GENMASK_ULL(15, 12)
+#define SDXI_MMIO_CAP1_MAX_CXT GENMASK_ULL(31, 16)
+#define SDXI_MMIO_CAP1_OPB_000_CAP GENMASK_ULL(63, 32)
+
+/* SDXI 1.0 Table 9-8: MMIO_VERSION */
+#define SDXI_MMIO_VERSION_MINOR GENMASK_ULL(7, 0)
+#define SDXI_MMIO_VERSION_MAJOR GENMASK_ULL(23, 16)
+
+#endif /* DMA_SDXI_MMIO_H */
diff --git a/drivers/dma/sdxi/sdxi.h b/drivers/dma/sdxi/sdxi.h
index d4c61ca2f875..721abf7556d1 100644
--- a/drivers/dma/sdxi/sdxi.h
+++ b/drivers/dma/sdxi/sdxi.h
@@ -9,8 +9,12 @@
#define DMA_SDXI_H
#include <linux/compiler_types.h>
+#include <linux/dev_printk.h>
+#include <linux/io.h>
#include <linux/types.h>
+#include "mmio.h"
+
struct sdxi_dev;
/**
@@ -30,9 +34,24 @@ struct sdxi_dev {
void __iomem *ctrl_regs; /* virt addr of ctrl registers */
void __iomem *dbs; /* virt addr of doorbells */
+ /* hardware capabilities (from cap0 & cap1) */
+ u32 db_stride; /* doorbell stride in bytes */
+ u16 max_cxtid; /* Maximum context ID allowed. */
+ u32 op_grp_cap; /* supported operation group cap */
+
const struct sdxi_bus_ops *bus_ops;
};
int sdxi_register(struct device *dev, const struct sdxi_bus_ops *ops);
+static inline u64 sdxi_read64(const struct sdxi_dev *sdxi, enum sdxi_reg reg)
+{
+ return readq(sdxi->ctrl_regs + reg);
+}
+
+static inline void sdxi_write64(struct sdxi_dev *sdxi, enum sdxi_reg reg, u64 val)
+{
+ writeq(val, sdxi->ctrl_regs + reg);
+}
+
#endif /* DMA_SDXI_H */
--
2.54.0
* [PATCH v3 05/23] dmaengine: sdxi: Configure context tables
2026-06-06 0:02 [PATCH v3 00/23] dmaengine: Smart Data Accelerator Interface (SDXI) basic support Nathan Lynch via B4 Relay
` (3 preceding siblings ...)
2026-06-06 0:02 ` [PATCH v3 04/23] dmaengine: sdxi: Feature discovery and initial configuration Nathan Lynch via B4 Relay
@ 2026-06-06 0:02 ` Nathan Lynch via B4 Relay
2026-06-06 0:02 ` [PATCH v3 06/23] dmaengine: sdxi: Allocate DMA pools Nathan Lynch via B4 Relay
` (17 subsequent siblings)
22 siblings, 0 replies; 40+ messages in thread
From: Nathan Lynch via B4 Relay @ 2026-06-06 0:02 UTC (permalink / raw)
To: Vinod Koul, Frank Li
Cc: Bjorn Helgaas, David Rientjes, John.Kariuki, Jonathan Cameron,
Kinsey Ho, Mario Limonciello, PradeepVineshReddy.Kodamati,
Shivank Garg, Stephen Bates, Tycho Andersen, Wei Huang, Wei Xu,
dmaengine, linux-kernel, linux-pci, Nathan Lynch
From: Nathan Lynch <nathan.lynch@amd.com>
SDXI uses a two-level table hierarchy to track contexts. There is a
single level 2 table per function which enumerates up to 512 level 1
tables. Each level 1 table enumerates up to 128 contexts.
Allocate and install the L2 table and a single L1 table, enough for
context IDs 0-127 (i.e. the admin context with reserved id 0, plus 127
client contexts). For now, to avoid dynamic management of additional
L1 tables, cap ctl2.max_cxt to 127.
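For illustration only, the lookup arithmetic implied by the description
above can be sketched in plain userspace C. The helper names are
hypothetical and not part of the driver, and the linear split of the
context ID into an L2 index and an L1 index is an assumption based on
the 512 x 128 layout. The entry encoding mirrors the VL (bit 0) and
LV01_PTR (bits 63:12) subfields defined in hw.h.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_L1_TABLE_ENTRIES 128u	/* contexts per L1 table */
#define DEMO_L2_TABLE_ENTRIES 512u	/* L1 tables per L2 table */

/* 512 * 128 covers the whole 16-bit context ID space. */
static_assert(DEMO_L1_TABLE_ENTRIES * DEMO_L2_TABLE_ENTRIES == 65536u,
	      "two-level table should span all 16-bit context IDs");

/* Hypothetical helper: which L2 slot (i.e. which L1 table) holds @id. */
static inline unsigned int demo_cxt_id_to_l2_index(uint16_t id)
{
	return id / DEMO_L1_TABLE_ENTRIES;
}

/* Hypothetical helper: which slot within that L1 table holds @id. */
static inline unsigned int demo_cxt_id_to_l1_index(uint16_t id)
{
	return id % DEMO_L1_TABLE_ENTRIES;
}

/*
 * Encode an L2 entry for a 4K-aligned L1 table address:
 * bit 0 is the valid bit, bits 63:12 carry the page frame.
 */
static inline uint64_t demo_l2_ent_encode(uint64_t l1_dma)
{
	return (l1_dma & ~UINT64_C(0xfff)) | UINT64_C(1);
}
```

With a single L1 table installed in L2 slot 0, only IDs whose L2 index
is 0 (that is, 0-127) are reachable, which is why max_cxt is capped.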
Since the table allocations are devres-managed, no corresponding
cleanup code is required.
Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Nathan Lynch <nathan.lynch@amd.com>
---
drivers/dma/sdxi/device.c | 40 +++++++++++++++++++++++++++++--
drivers/dma/sdxi/hw.h | 61 +++++++++++++++++++++++++++++++++++++++++++++++
drivers/dma/sdxi/mmio.h | 6 +++++
drivers/dma/sdxi/sdxi.h | 5 ++++
4 files changed, 110 insertions(+), 2 deletions(-)
diff --git a/drivers/dma/sdxi/device.c b/drivers/dma/sdxi/device.c
index 7c6652f9c3c0..fa5e27a4190e 100644
--- a/drivers/dma/sdxi/device.c
+++ b/drivers/dma/sdxi/device.c
@@ -8,12 +8,15 @@
#include <linux/bitfield.h>
#include <linux/delay.h>
#include <linux/device.h>
+#include <linux/dma-mapping.h>
#include <linux/export.h>
#include <linux/iopoll.h>
#include <linux/jiffies.h>
+#include <linux/log2.h>
#include <linux/slab.h>
#include <linux/time.h>
+#include "hw.h"
#include "mmio.h"
#include "sdxi.h"
@@ -137,7 +140,8 @@ static int sdxi_dev_stop(struct sdxi_dev *sdxi)
*/
static int sdxi_fn_activate(struct sdxi_dev *sdxi)
{
- u64 version, cap0, cap1, ctl0, ctl2;
+ u64 version, cap0, cap1, ctl0, ctl2, cxt_l2, lv01_ptr;
+ struct sdxi_cxt_L2_ent *L2_ent;
int err;
/*
@@ -167,7 +171,13 @@ static int sdxi_fn_activate(struct sdxi_dev *sdxi)
cap1 = sdxi_read64(sdxi, SDXI_MMIO_CAP1);
sdxi->op_grp_cap = FIELD_GET(SDXI_MMIO_CAP1_OPB_000_CAP, cap1);
- sdxi->max_cxtid = FIELD_GET(SDXI_MMIO_CAP1_MAX_CXT, cap1);
+
+ /*
+ * Constrain the number of client contexts supported by the
+ * driver to what fits in a single L1 table.
+ */
+ sdxi->max_cxtid = min(SDXI_L1_TABLE_ENTRIES - 1,
+ FIELD_GET(SDXI_MMIO_CAP1_MAX_CXT, cap1));
/* Apply our configuration. */
ctl2 = FIELD_PREP(SDXI_MMIO_CTL2_MAX_CXT, sdxi->max_cxtid);
@@ -179,6 +189,32 @@ static int sdxi_fn_activate(struct sdxi_dev *sdxi)
FIELD_GET(SDXI_MMIO_CAP1_OPB_000_CAP, cap1));
sdxi_write64(sdxi, SDXI_MMIO_CTL2, ctl2);
+ /* SDXI 1.0 4.1.8.2 Context Level 2 Table Setup */
+ sdxi->L2_table = dmam_alloc_coherent(sdxi->dev,
+ sizeof(*sdxi->L2_table),
+ &sdxi->L2_dma, GFP_KERNEL);
+ if (!sdxi->L2_table)
+ return -ENOMEM;
+
+ cxt_l2 = FIELD_PREP(SDXI_MMIO_CXT_L2_PTR, sdxi->L2_dma >> ilog2(SZ_4K));
+ sdxi_write64(sdxi, SDXI_MMIO_CXT_L2, cxt_l2);
+
+ /* SDXI 1.0 4.1.8.3 Context Level 1 Table Setup */
+ sdxi->L1_table = dmam_alloc_coherent(sdxi->dev,
+ sizeof(*sdxi->L1_table),
+ &sdxi->L1_dma, GFP_KERNEL);
+ if (!sdxi->L1_table)
+ return -ENOMEM;
+ /*
+ * SDXI 1.0 4.1.8.3.c: Initialize the Context level 2 table to
+ * point to the Context Level 1 [table].
+ */
+ L2_ent = &sdxi->L2_table->entry[0];
+ lv01_ptr = FIELD_PREP(SDXI_CXT_L2_ENT_VL, 1) |
+ FIELD_PREP(SDXI_CXT_L2_ENT_LV01_PTR,
+ sdxi->L1_dma >> ilog2(SZ_4K));
+ L2_ent->lv01_ptr = cpu_to_le64(lv01_ptr);
+
return 0;
}
diff --git a/drivers/dma/sdxi/hw.h b/drivers/dma/sdxi/hw.h
new file mode 100644
index 000000000000..00324f45b729
--- /dev/null
+++ b/drivers/dma/sdxi/hw.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright Advanced Micro Devices, Inc. */
+
+/*
+ * Control structures and constants defined in the SDXI specification,
+ * with low-level accessors. The ordering of the structures here
+ * follows the order of their definitions in the SDXI spec.
+ *
+ * Names of structures, members, and subfields (bit ranges within
+ * members) are written to match the spec, generally. E.g. struct
+ * sdxi_cxt_L2_ent corresponds to CXT_L2_ENT in the spec.
+ *
+ * Note: a member can have a subfield whose name is identical to the
+ * member's name. E.g. CXT_L2_ENT's lv01_ptr.
+ *
+ * All reserved fields and bits (usually named "rsvd" or some
+ * variation) must be set to zero by the driver unless otherwise
+ * specified.
+ */
+
+#ifndef DMA_SDXI_HW_H
+#define DMA_SDXI_HW_H
+
+#include <linux/bits.h>
+#include <linux/build_bug.h>
+#include <linux/types.h>
+#include <asm/byteorder.h>
+
+/* SDXI 1.0 Table 3-2: Context Level 2 Table Entry (CXT_L2_ENT) */
+struct sdxi_cxt_L2_ent {
+ __le64 lv01_ptr;
+#define SDXI_CXT_L2_ENT_VL BIT_ULL(0)
+#define SDXI_CXT_L2_ENT_LV01_PTR GENMASK_ULL(63, 12)
+} __packed __aligned(8);
+static_assert(sizeof(struct sdxi_cxt_L2_ent) == 8);
+
+/* SDXI 1.0 3.2.1 Context Level 2 Table */
+#define SDXI_L2_TABLE_ENTRIES 512
+struct sdxi_cxt_L2_table {
+ struct sdxi_cxt_L2_ent entry[SDXI_L2_TABLE_ENTRIES];
+};
+static_assert(sizeof(struct sdxi_cxt_L2_table) == 4096);
+
+/* SDXI 1.0 Table 3-3: Context Level 1 Table Entry (CXT_L1_ENT) */
+struct sdxi_cxt_L1_ent {
+ __le64 cxt_ctl_ptr;
+ __le64 akey_ptr;
+ __le32 misc0;
+ __le32 opb_000_enb;
+ __u8 rsvd_0[8];
+} __packed __aligned(32);
+static_assert(sizeof(struct sdxi_cxt_L1_ent) == 32);
+
+/* SDXI 1.0 3.2.2 Context Level 1 Table */
+#define SDXI_L1_TABLE_ENTRIES 128
+struct sdxi_cxt_L1_table {
+ struct sdxi_cxt_L1_ent entry[SDXI_L1_TABLE_ENTRIES];
+};
+static_assert(sizeof(struct sdxi_cxt_L1_table) == 4096);
+
+#endif /* DMA_SDXI_HW_H */
diff --git a/drivers/dma/sdxi/mmio.h b/drivers/dma/sdxi/mmio.h
index f07e857691b9..d1ea82b706ee 100644
--- a/drivers/dma/sdxi/mmio.h
+++ b/drivers/dma/sdxi/mmio.h
@@ -19,6 +19,9 @@ enum sdxi_reg {
SDXI_MMIO_CAP0 = 0x00200,
SDXI_MMIO_CAP1 = 0x00208,
SDXI_MMIO_VERSION = 0x00210,
+
+ /* SDXI 1.0 9.2 Context and RKey Table Registers */
+ SDXI_MMIO_CXT_L2 = 0x10000,
};
/* SDXI 1.0 Table 9-2: MMIO_CTL0 */
@@ -51,4 +54,7 @@ enum sdxi_reg {
#define SDXI_MMIO_VERSION_MINOR GENMASK_ULL(7, 0)
#define SDXI_MMIO_VERSION_MAJOR GENMASK_ULL(23, 16)
+/* SDXI 1.0 Table 9-9: MMIO_CXT_L2 */
+#define SDXI_MMIO_CXT_L2_PTR GENMASK_ULL(63, 12)
+
#endif /* DMA_SDXI_MMIO_H */
diff --git a/drivers/dma/sdxi/sdxi.h b/drivers/dma/sdxi/sdxi.h
index 721abf7556d1..913292463eee 100644
--- a/drivers/dma/sdxi/sdxi.h
+++ b/drivers/dma/sdxi/sdxi.h
@@ -39,6 +39,11 @@ struct sdxi_dev {
u16 max_cxtid; /* Maximum context ID allowed. */
u32 op_grp_cap; /* supported operation group cap */
+ struct sdxi_cxt_L2_table *L2_table;
+ dma_addr_t L2_dma;
+ struct sdxi_cxt_L1_table *L1_table;
+ dma_addr_t L1_dma;
+
const struct sdxi_bus_ops *bus_ops;
};
--
2.54.0
* [PATCH v3 06/23] dmaengine: sdxi: Allocate DMA pools
2026-06-06 0:02 [PATCH v3 00/23] dmaengine: Smart Data Accelerator Interface (SDXI) basic support Nathan Lynch via B4 Relay
` (4 preceding siblings ...)
2026-06-06 0:02 ` [PATCH v3 05/23] dmaengine: sdxi: Configure context tables Nathan Lynch via B4 Relay
@ 2026-06-06 0:02 ` Nathan Lynch via B4 Relay
2026-06-06 0:15 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 07/23] dmaengine: sdxi: Allocate administrative context Nathan Lynch via B4 Relay
` (16 subsequent siblings)
22 siblings, 1 reply; 40+ messages in thread
From: Nathan Lynch via B4 Relay @ 2026-06-06 0:02 UTC (permalink / raw)
To: Vinod Koul, Frank Li
Cc: Bjorn Helgaas, David Rientjes, John.Kariuki, Jonathan Cameron,
Kinsey Ho, Mario Limonciello, PradeepVineshReddy.Kodamati,
Shivank Garg, Stephen Bates, Tycho Andersen, Wei Huang, Wei Xu,
dmaengine, linux-kernel, linux-pci, Nathan Lynch
From: Nathan Lynch <nathan.lynch@amd.com>
Each SDXI context consists of several control structures in system
memory:
* Descriptor ring
* Access key (AKey) table
* Context control block (CXT_CTL)
* Context status block (CXT_STS)
* Write index
Of these, the write index, context control and context status blocks
are small enough to justify DMA pools.
SDXI descriptors may also have associated 32-byte completion status
blocks (CST_BLK), which software can poll to detect completion.
Introduce the C structures for context control, context status, and
completion status blocks. Create a DMA pool for each of these objects
as well as write indexes during SDXI function initialization, ensuring
that potentially frequently-updated objects are aligned to avoid
cacheline sharing.
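As a rough userspace illustration of the alignment choice described
above (not driver code): the helpers below are hypothetical, the
64-byte cache line is an assumption that varies by architecture, and
the rounding of each pool object up to its alignment reflects my
understanding of the DMA pool allocator rather than anything stated in
this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: stand-in for SMP_CACHE_BYTES; the real value is per-arch. */
#define DEMO_CACHE_LINE_BYTES 64u

/* Alignment used for frequently-updated objects: max(size, cache line). */
static inline size_t demo_pool_align(size_t obj_size)
{
	return obj_size > DEMO_CACHE_LINE_BYTES ? obj_size
						: DEMO_CACHE_LINE_BYTES;
}

/* Space one object occupies once its size is rounded up to @align. */
static inline size_t demo_pool_slot_size(size_t obj_size, size_t align)
{
	return (obj_size + align - 1) / align * align;
}
```

Under these assumptions an 8-byte write index, a 16-byte CXT_STS and a
32-byte CST_BLK each occupy a full 64-byte slot, so no two of them
share a cache line. CXT_CTL is already 64 bytes and is simply aligned
to its own size.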
Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Nathan Lynch <nathan.lynch@amd.com>
---
drivers/dma/sdxi/device.c | 42 +++++++++++++++++++++++++++++++++++++++++-
drivers/dma/sdxi/hw.h | 28 ++++++++++++++++++++++++++++
drivers/dma/sdxi/sdxi.h | 5 +++++
3 files changed, 74 insertions(+), 1 deletion(-)
diff --git a/drivers/dma/sdxi/device.c b/drivers/dma/sdxi/device.c
index fa5e27a4190e..7aa62a989bac 100644
--- a/drivers/dma/sdxi/device.c
+++ b/drivers/dma/sdxi/device.c
@@ -6,13 +6,16 @@
*/
#include <linux/bitfield.h>
+#include <linux/cache.h>
#include <linux/delay.h>
#include <linux/device.h>
#include <linux/dma-mapping.h>
+#include <linux/dmapool.h>
#include <linux/export.h>
#include <linux/iopoll.h>
#include <linux/jiffies.h>
#include <linux/log2.h>
+#include <linux/minmax.h>
#include <linux/slab.h>
#include <linux/time.h>
@@ -218,6 +221,43 @@ static int sdxi_fn_activate(struct sdxi_dev *sdxi)
return 0;
}
+static int sdxi_device_init(struct sdxi_dev *sdxi)
+{
+ struct device *dev = sdxi->dev;
+ size_t size, align;
+ int err;
+
+ size = sizeof(__le64);
+ align = max(size, SMP_CACHE_BYTES);
+ sdxi->write_index_pool = dmam_pool_create("Write_Index", dev, size,
+ align, 0);
+ if (!sdxi->write_index_pool)
+ return -ENOMEM;
+
+ size = sizeof(struct sdxi_cxt_sts);
+ align = max(size, SMP_CACHE_BYTES);
+ sdxi->cxt_sts_pool = dmam_pool_create("CXT_STS", dev, size, align, 0);
+ if (!sdxi->cxt_sts_pool)
+ return -ENOMEM;
+
+ size = align = sizeof(struct sdxi_cxt_ctl);
+ sdxi->cxt_ctl_pool = dmam_pool_create("CXT_CTL", dev, size, align, 0);
+ if (!sdxi->cxt_ctl_pool)
+ return -ENOMEM;
+
+ size = sizeof(struct sdxi_cst_blk);
+ align = max(size, SMP_CACHE_BYTES);
+ sdxi->cst_blk_pool = dmam_pool_create("CST_BLK", dev, size, align, 0);
+ if (!sdxi->cst_blk_pool)
+ return -ENOMEM;
+
+ err = sdxi_fn_activate(sdxi);
+ if (err)
+ return err;
+
+ return 0;
+}
+
int sdxi_register(struct device *dev, const struct sdxi_bus_ops *ops)
{
struct sdxi_dev *sdxi;
@@ -235,7 +275,7 @@ int sdxi_register(struct device *dev, const struct sdxi_bus_ops *ops)
if (err)
return err;
- return sdxi_fn_activate(sdxi);
+ return sdxi_device_init(sdxi);
}
EXPORT_SYMBOL_NS_GPL(sdxi_register, "SDXI");
diff --git a/drivers/dma/sdxi/hw.h b/drivers/dma/sdxi/hw.h
index 00324f45b729..b3fd3587ccf8 100644
--- a/drivers/dma/sdxi/hw.h
+++ b/drivers/dma/sdxi/hw.h
@@ -58,4 +58,32 @@ struct sdxi_cxt_L1_table {
};
static_assert(sizeof(struct sdxi_cxt_L1_table) == 4096);
+/* SDXI 1.0 Table 3-4: Context Control (CXT_CTL) */
+struct sdxi_cxt_ctl {
+ __le64 ds_ring_ptr;
+ __le32 ds_ring_sz;
+ __u8 rsvd_0[4];
+ __le64 cxt_sts_ptr;
+ __le64 write_index_ptr;
+ __u8 rsvd_1[32];
+} __packed __aligned(64);
+static_assert(sizeof(struct sdxi_cxt_ctl) == 64);
+
+/* SDXI 1.0 Table 3-5: Context Status (CXT_STS) */
+struct sdxi_cxt_sts {
+ __u8 state;
+ __u8 misc0;
+ __u8 rsvd_0[6];
+ __le64 read_index;
+} __packed __aligned(16);
+static_assert(sizeof(struct sdxi_cxt_sts) == 16);
+
+/* SDXI 1.0 Table 6-4: CST_BLK (Completion Status Block) */
+struct sdxi_cst_blk {
+ __le64 signal;
+ __le32 flags;
+ __u8 rsvd_0[20];
+} __packed __aligned(32);
+static_assert(sizeof(struct sdxi_cst_blk) == 32);
+
#endif /* DMA_SDXI_HW_H */
diff --git a/drivers/dma/sdxi/sdxi.h b/drivers/dma/sdxi/sdxi.h
index 913292463eee..ade702b6bec5 100644
--- a/drivers/dma/sdxi/sdxi.h
+++ b/drivers/dma/sdxi/sdxi.h
@@ -44,6 +44,11 @@ struct sdxi_dev {
struct sdxi_cxt_L1_table *L1_table;
dma_addr_t L1_dma;
+ struct dma_pool *write_index_pool;
+ struct dma_pool *cxt_sts_pool;
+ struct dma_pool *cxt_ctl_pool;
+ struct dma_pool *cst_blk_pool;
+
const struct sdxi_bus_ops *bus_ops;
};
--
2.54.0
* [PATCH v3 07/23] dmaengine: sdxi: Allocate administrative context
2026-06-06 0:02 [PATCH v3 00/23] dmaengine: Smart Data Accelerator Interface (SDXI) basic support Nathan Lynch via B4 Relay
` (5 preceding siblings ...)
2026-06-06 0:02 ` [PATCH v3 06/23] dmaengine: sdxi: Allocate DMA pools Nathan Lynch via B4 Relay
@ 2026-06-06 0:02 ` Nathan Lynch via B4 Relay
2026-06-06 0:02 ` [PATCH v3 08/23] dmaengine: sdxi: Install " Nathan Lynch via B4 Relay
` (15 subsequent siblings)
22 siblings, 0 replies; 40+ messages in thread
From: Nathan Lynch via B4 Relay @ 2026-06-06 0:02 UTC (permalink / raw)
To: Vinod Koul, Frank Li
Cc: Bjorn Helgaas, David Rientjes, John.Kariuki, Jonathan Cameron,
Kinsey Ho, Mario Limonciello, PradeepVineshReddy.Kodamati,
Shivank Garg, Stephen Bates, Tycho Andersen, Wei Huang, Wei Xu,
dmaengine, linux-kernel, linux-pci, Frank Li, Nathan Lynch
From: Nathan Lynch <nathan.lynch@amd.com>
Create the control structure hierarchy in memory for the per-function
administrative context. Use devres to queue the corresponding cleanup
since the admin context is a device-scope resource. The context is
inert for now; changes to follow will make it functional.
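The allocation path in this patch relies on scope-based cleanup
(__free() / return_ptr()) so that a partially built context is torn
down on any early return. A minimal userspace analogue, using the
GCC/Clang cleanup attribute directly, might look like the following.
All demo_* names and the fail_step parameter are hypothetical and
exist only to exercise the error paths.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for a context with two owned sub-allocations. */
struct demo_cxt {
	void *sq;
	void *akey_table;
};

/* Tolerates partially allocated objects: free(NULL) is a no-op. */
static void demo_free_cxt(struct demo_cxt *cxt)
{
	free(cxt->akey_table);
	free(cxt->sq);
	free(cxt);
}

/* Cleanup handler: receives a pointer to the annotated variable. */
static void demo_cxt_cleanup(struct demo_cxt **cxtp)
{
	if (*cxtp)
		demo_free_cxt(*cxtp);
}

/* @fail_step: 0 for success, 1 or 2 to simulate a failed allocation. */
static struct demo_cxt *demo_alloc_cxt(int fail_step)
{
	struct demo_cxt *cxt __attribute__((cleanup(demo_cxt_cleanup))) =
		calloc(1, sizeof(*cxt));
	struct demo_cxt *ret;

	if (!cxt)
		return NULL;

	cxt->sq = (fail_step == 1) ? NULL : calloc(1, 64);
	if (!cxt->sq)
		return NULL;	/* cleanup handler frees cxt */

	cxt->akey_table = (fail_step == 2) ? NULL : calloc(1, 4096);
	if (!cxt->akey_table)
		return NULL;	/* cleanup handler frees cxt and sq */

	/* Success: disarm the cleanup and hand ownership to the caller. */
	ret = cxt;
	cxt = NULL;
	return ret;
}
```

Once ownership has been handed over, the caller is responsible for the
eventual free; in the driver that role is played by the devres action.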
Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Nathan Lynch <nathan.lynch@amd.com>
---
drivers/dma/sdxi/Makefile | 4 +-
drivers/dma/sdxi/context.c | 128 +++++++++++++++++++++++++++++++++++++++++++++
drivers/dma/sdxi/context.h | 54 +++++++++++++++++++
drivers/dma/sdxi/device.c | 11 ++++
drivers/dma/sdxi/hw.h | 43 +++++++++++++++
drivers/dma/sdxi/sdxi.h | 2 +
6 files changed, 241 insertions(+), 1 deletion(-)
diff --git a/drivers/dma/sdxi/Makefile b/drivers/dma/sdxi/Makefile
index 0006edf74d86..cdf8a455077b 100644
--- a/drivers/dma/sdxi/Makefile
+++ b/drivers/dma/sdxi/Makefile
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_SDXI_CORE) += sdxi-core.o
-sdxi-core-y := device.o
+sdxi-core-y := \
+ context.o \
+ device.o
obj-$(CONFIG_SDXI_PCI) += sdxi-pci.o
sdxi-pci-y := pci.o
diff --git a/drivers/dma/sdxi/context.c b/drivers/dma/sdxi/context.c
new file mode 100644
index 000000000000..443c231303af
--- /dev/null
+++ b/drivers/dma/sdxi/context.c
@@ -0,0 +1,128 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * SDXI context management
+ *
+ * Copyright Advanced Micro Devices, Inc.
+ */
+
+#define pr_fmt(fmt) "SDXI: " fmt
+
+#include <linux/bug.h>
+#include <linux/cleanup.h>
+#include <linux/device/devres.h>
+#include <linux/dma-mapping.h>
+#include <linux/dmapool.h>
+#include <linux/errno.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+
+#include "context.h"
+#include "sdxi.h"
+
+#define DEFAULT_DESC_RING_ENTRIES 1024
+
+enum {
+ /*
+ * The admin context always has ID 0. See SDXI 1.0 3.5
+ * Administrative Context (Context 0).
+ */
+ SDXI_ADMIN_CXT_ID = 0,
+};
+
+/*
+ * Free context and its resources. @cxt may be partially allocated but
+ * must have ->sdxi set.
+ */
+static void sdxi_free_cxt(struct sdxi_cxt *cxt)
+{
+ struct sdxi_dev *sdxi = cxt->sdxi;
+ struct sdxi_sq *sq = cxt->sq;
+
+ if (cxt->cxt_ctl)
+ dma_pool_free(sdxi->cxt_ctl_pool, cxt->cxt_ctl,
+ cxt->cxt_ctl_dma);
+ if (cxt->akey_table)
+ dma_free_coherent(sdxi->dev, sizeof(*cxt->akey_table),
+ cxt->akey_table, cxt->akey_table_dma);
+ if (sq && sq->write_index)
+ dma_pool_free(sdxi->write_index_pool, sq->write_index,
+ sq->write_index_dma);
+ if (sq && sq->cxt_sts)
+ dma_pool_free(sdxi->cxt_sts_pool, sq->cxt_sts, sq->cxt_sts_dma);
+ if (sq && sq->desc_ring)
+ dma_free_coherent(sdxi->dev, sq->ring_size,
+ sq->desc_ring, sq->ring_dma);
+ kfree(cxt->sq);
+ kfree(cxt);
+}
+
+DEFINE_FREE(sdxi_cxt, struct sdxi_cxt *, if (_T) sdxi_free_cxt(_T))
+
+/* Allocate a context and its control structure hierarchy in memory. */
+static struct sdxi_cxt *sdxi_alloc_cxt(struct sdxi_dev *sdxi)
+{
+ struct device *dev = sdxi->dev;
+ struct sdxi_sq *sq;
+ struct sdxi_cxt *cxt __free(sdxi_cxt) = kzalloc(sizeof(*cxt), GFP_KERNEL);
+
+ if (!cxt)
+ return NULL;
+
+ cxt->sdxi = sdxi;
+
+ cxt->sq = kzalloc_obj(*cxt->sq, GFP_KERNEL);
+ if (!cxt->sq)
+ return NULL;
+
+ cxt->akey_table = dma_alloc_coherent(dev, sizeof(*cxt->akey_table),
+ &cxt->akey_table_dma, GFP_KERNEL);
+ if (!cxt->akey_table)
+ return NULL;
+
+ cxt->cxt_ctl = dma_pool_zalloc(sdxi->cxt_ctl_pool, GFP_KERNEL,
+ &cxt->cxt_ctl_dma);
+ if (!cxt->cxt_ctl)
+ return NULL;
+
+ sq = cxt->sq;
+
+ sq->ring_entries = DEFAULT_DESC_RING_ENTRIES;
+ sq->ring_size = sq->ring_entries * sizeof(sq->desc_ring[0]);
+ sq->desc_ring = dma_alloc_coherent(dev, sq->ring_size, &sq->ring_dma,
+ GFP_KERNEL);
+ if (!sq->desc_ring)
+ return NULL;
+
+ sq->cxt_sts = dma_pool_zalloc(sdxi->cxt_sts_pool, GFP_KERNEL,
+ &sq->cxt_sts_dma);
+ if (!sq->cxt_sts)
+ return NULL;
+
+ sq->write_index = dma_pool_zalloc(sdxi->write_index_pool, GFP_KERNEL,
+ &sq->write_index_dma);
+ if (!sq->write_index)
+ return NULL;
+
+ return_ptr(cxt);
+}
+
+static void free_admin_cxt(void *ptr)
+{
+ struct sdxi_dev *sdxi = ptr;
+
+ sdxi_free_cxt(sdxi->admin_cxt);
+}
+
+int sdxi_admin_cxt_init(struct sdxi_dev *sdxi)
+{
+ struct sdxi_cxt *cxt __free(sdxi_cxt) = sdxi_alloc_cxt(sdxi);
+ if (!cxt)
+ return -ENOMEM;
+
+ cxt->id = SDXI_ADMIN_CXT_ID;
+ cxt->db = sdxi->dbs + cxt->id * sdxi->db_stride;
+
+ sdxi->admin_cxt = no_free_ptr(cxt);
+
+ return devm_add_action_or_reset(sdxi->dev, free_admin_cxt, sdxi);
+}
diff --git a/drivers/dma/sdxi/context.h b/drivers/dma/sdxi/context.h
new file mode 100644
index 000000000000..a29387900df7
--- /dev/null
+++ b/drivers/dma/sdxi/context.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright Advanced Micro Devices, Inc.
+ */
+
+#ifndef DMA_SDXI_CONTEXT_H
+#define DMA_SDXI_CONTEXT_H
+
+#include <linux/dma-mapping.h>
+#include <linux/types.h>
+
+#include "hw.h"
+#include "sdxi.h"
+
+/*
+ * The size of the AKey table is flexible, from 4KB to 1MB. Always use
+ * the minimum size for now.
+ */
+struct sdxi_akey_table {
+ struct sdxi_akey_ent entry[SZ_4K / sizeof(struct sdxi_akey_ent)];
+};
+
+/* Submission Queue */
+struct sdxi_sq {
+ u32 ring_entries;
+ u32 ring_size;
+ struct sdxi_desc *desc_ring;
+ dma_addr_t ring_dma;
+
+ __le64 *write_index;
+ dma_addr_t write_index_dma;
+
+ struct sdxi_cxt_sts *cxt_sts;
+ dma_addr_t cxt_sts_dma;
+};
+
+struct sdxi_cxt {
+ struct sdxi_dev *sdxi;
+ u16 id;
+
+ __le64 __iomem *db;
+
+ struct sdxi_cxt_ctl *cxt_ctl;
+ dma_addr_t cxt_ctl_dma;
+
+ struct sdxi_akey_table *akey_table;
+ dma_addr_t akey_table_dma;
+
+ struct sdxi_sq *sq;
+};
+
+int sdxi_admin_cxt_init(struct sdxi_dev *sdxi);
+
+#endif /* DMA_SDXI_CONTEXT_H */
diff --git a/drivers/dma/sdxi/device.c b/drivers/dma/sdxi/device.c
index 7aa62a989bac..4d595e79b8ce 100644
--- a/drivers/dma/sdxi/device.c
+++ b/drivers/dma/sdxi/device.c
@@ -19,6 +19,7 @@
#include <linux/slab.h>
#include <linux/time.h>
+#include "context.h"
#include "hw.h"
#include "mmio.h"
#include "sdxi.h"
@@ -218,6 +219,16 @@ static int sdxi_fn_activate(struct sdxi_dev *sdxi)
sdxi->L1_dma >> ilog2(SZ_4K));
L2_ent->lv01_ptr = cpu_to_le64(lv01_ptr);
+ /*
+ * SDXI 1.0 4.1.8.4 Administrative Context
+ *
+ * The admin context will not consume descriptors until we
+ * write its doorbell later.
+ */
+ err = sdxi_admin_cxt_init(sdxi);
+ if (err)
+ return err;
+
return 0;
}
diff --git a/drivers/dma/sdxi/hw.h b/drivers/dma/sdxi/hw.h
index b3fd3587ccf8..55d63d50a01b 100644
--- a/drivers/dma/sdxi/hw.h
+++ b/drivers/dma/sdxi/hw.h
@@ -23,6 +23,7 @@
#include <linux/bits.h>
#include <linux/build_bug.h>
+#include <linux/stddef.h>
#include <linux/types.h>
#include <asm/byteorder.h>
@@ -72,12 +73,39 @@ static_assert(sizeof(struct sdxi_cxt_ctl) == 64);
/* SDXI 1.0 Table 3-5: Context Status (CXT_STS) */
struct sdxi_cxt_sts {
__u8 state;
+#define SDXI_CXT_STS_STATE GENMASK(3, 0)
__u8 misc0;
__u8 rsvd_0[6];
__le64 read_index;
} __packed __aligned(16);
static_assert(sizeof(struct sdxi_cxt_sts) == 16);
+/* SDXI 1.0 Table 3-6: CXT_STS.state Encoding */
+/* Valid values for FIELD_GET(SDXI_CXT_STS_STATE, sdxi_cxt_sts.state). */
+enum cxt_sts_state {
+ CXTV_STOP_SW = 0x0,
+ CXTV_RUN = 0x1,
+ CXTV_STOPG_SW = 0x2,
+ CXTV_STOP_FN = 0x4,
+ CXTV_STOPG_FN = 0x6,
+ CXTV_ERR_FN = 0xf,
+};
+
+/* SDXI 1.0 Table 3-7: AKey Table Entry (AKEY_ENT) */
+struct sdxi_akey_ent {
+ __le16 intr_num;
+#define SDXI_AKEY_ENT_VL BIT(0)
+#define SDXI_AKEY_ENT_IV BIT(1)
+#define SDXI_AKEY_ENT_INTR_NUM GENMASK(14, 4)
+ __le16 tgt_sfunc;
+ __le32 pasid;
+ __le16 stag;
+ __u8 rsvd_0[2];
+ __le16 rkey;
+ __u8 rsvd_1[2];
+} __packed __aligned(16);
+static_assert(sizeof(struct sdxi_akey_ent) == 16);
+
/* SDXI 1.0 Table 6-4: CST_BLK (Completion Status Block) */
struct sdxi_cst_blk {
__le64 signal;
@@ -86,4 +114,19 @@ struct sdxi_cst_blk {
} __packed __aligned(32);
static_assert(sizeof(struct sdxi_cst_blk) == 32);
+struct sdxi_desc {
+ union {
+ /*
+ * SDXI 1.0 Table 6-3: DSC_GENERIC SDXI Descriptor
+ * Common Header and Footer Format
+ */
+ struct_group_tagged(sdxi_dsc_generic, generic,
+ __le32 opcode;
+ __u8 operation[52];
+ __le64 csb_ptr;
+ );
+ };
+} __packed __aligned(64);
+static_assert(sizeof(struct sdxi_desc) == 64);
+
#endif /* DMA_SDXI_HW_H */
diff --git a/drivers/dma/sdxi/sdxi.h b/drivers/dma/sdxi/sdxi.h
index ade702b6bec5..f5e0cd986b9e 100644
--- a/drivers/dma/sdxi/sdxi.h
+++ b/drivers/dma/sdxi/sdxi.h
@@ -49,6 +49,8 @@ struct sdxi_dev {
struct dma_pool *cxt_ctl_pool;
struct dma_pool *cst_blk_pool;
+ struct sdxi_cxt *admin_cxt;
+
const struct sdxi_bus_ops *bus_ops;
};
--
2.54.0
* [PATCH v3 08/23] dmaengine: sdxi: Install administrative context
2026-06-06 0:02 [PATCH v3 00/23] dmaengine: Smart Data Accelerator Interface (SDXI) basic support Nathan Lynch via B4 Relay
` (6 preceding siblings ...)
2026-06-06 0:02 ` [PATCH v3 07/23] dmaengine: sdxi: Allocate administrative context Nathan Lynch via B4 Relay
@ 2026-06-06 0:02 ` Nathan Lynch via B4 Relay
2026-06-06 0:26 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 09/23] dmaengine: sdxi: Start functions on probe, stop on remove Nathan Lynch via B4 Relay
` (14 subsequent siblings)
22 siblings, 1 reply; 40+ messages in thread
From: Nathan Lynch via B4 Relay @ 2026-06-06 0:02 UTC (permalink / raw)
To: Vinod Koul, Frank Li
Cc: Bjorn Helgaas, David Rientjes, John.Kariuki, Jonathan Cameron,
Kinsey Ho, Mario Limonciello, PradeepVineshReddy.Kodamati,
Shivank Garg, Stephen Bates, Tycho Andersen, Wei Huang, Wei Xu,
dmaengine, linux-kernel, linux-pci, Nathan Lynch
From: Nathan Lynch <nathan.lynch@amd.com>
Serialize the context control block, akey table, and L1 entry for the
admin context, making its descriptor ring, write index, and context
status block visible to the SDXI implementation once it is activated.
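The publication order used here (fill in every other field, issue a
barrier, then store the word carrying the valid bit) can be sketched in
userspace C11. This is only an analogue: a C11 release fence orders CPU
stores and is not a substitute for dma_wmb() against a device. The
demo_* names are hypothetical, while the bit positions follow the
SDXI_CXT_CTL_* definitions added to hw.h by this patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for CXT_CTL (reserved fields omitted). */
struct demo_cxt_ctl {
	_Atomic uint64_t ds_ring_ptr;	/* carries VL; written last */
	uint32_t ds_ring_sz;
	uint64_t cxt_sts_ptr;
	uint64_t write_index_ptr;
};

/* VL = bit 0, QOS = bits 3:2, SE = bit 4, CSA = bit 5, pointer = 63:6. */
static inline uint64_t demo_encode_ds_ring_ptr(uint64_t ring_dma,
					       unsigned int qos,
					       bool se, bool csa)
{
	return UINT64_C(1) |
	       ((uint64_t)(qos & 0x3) << 2) |
	       ((uint64_t)se << 4) |
	       ((uint64_t)csa << 5) |
	       (ring_dma & ~UINT64_C(0x3f));
}

static inline void demo_publish_cxt_ctl(struct demo_cxt_ctl *ctl,
					uint64_t ring_dma,
					uint32_t ring_bytes,
					uint64_t sts_dma, uint64_t widx_dma)
{
	/* Plain stores to everything except the valid-bit word. */
	ctl->ds_ring_sz = ring_bytes >> 6;	/* same shift as the patch */
	ctl->cxt_sts_ptr = sts_dma & ~UINT64_C(0xf);
	ctl->write_index_ptr = widx_dma & ~UINT64_C(0x7);

	/* Order the stores above before the store that sets VL. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&ctl->ds_ring_ptr,
			      demo_encode_ds_ring_ptr(ring_dma, 0, true, true),
			      memory_order_relaxed);
}
```

An observer that sees VL set and orders its subsequent reads
accordingly is then guaranteed to see the ring size and the two
pointers as well.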
Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Nathan Lynch <nathan.lynch@amd.com>
---
drivers/dma/sdxi/context.c | 162 +++++++++++++++++++++++++++++++++++++++++++++
drivers/dma/sdxi/context.h | 7 ++
drivers/dma/sdxi/hw.h | 15 +++++
drivers/dma/sdxi/sdxi.h | 9 +++
4 files changed, 193 insertions(+)
diff --git a/drivers/dma/sdxi/context.c b/drivers/dma/sdxi/context.c
index 443c231303af..cc99fe9ecf92 100644
--- a/drivers/dma/sdxi/context.c
+++ b/drivers/dma/sdxi/context.c
@@ -7,16 +7,22 @@
#define pr_fmt(fmt) "SDXI: " fmt
+#include <linux/align.h>
+#include <linux/bitfield.h>
#include <linux/bug.h>
#include <linux/cleanup.h>
#include <linux/device/devres.h>
#include <linux/dma-mapping.h>
#include <linux/dmapool.h>
#include <linux/errno.h>
+#include <linux/iommu.h>
#include <linux/slab.h>
#include <linux/types.h>
+#include <asm/barrier.h>
+#include <asm/rwonce.h>
#include "context.h"
+#include "hw.h"
#include "sdxi.h"
#define DEFAULT_DESC_RING_ENTRIES 1024
@@ -106,6 +112,152 @@ static struct sdxi_cxt *sdxi_alloc_cxt(struct sdxi_dev *sdxi)
return_ptr(cxt);
}
+struct sdxi_cxt_ctl_cfg {
+ dma_addr_t ds_ring_ptr;
+ dma_addr_t cxt_sts_ptr;
+ dma_addr_t write_index_ptr;
+ u32 ds_ring_sz;
+ u8 qos;
+ u8 csa;
+ bool se;
+};
+
+static int configure_cxt_ctl(struct sdxi_cxt_ctl *ctl, const struct sdxi_cxt_ctl_cfg *cfg)
+{
+ u64 ds_ring_ptr, cxt_sts_ptr, write_index_ptr;
+
+ write_index_ptr = FIELD_PREP(SDXI_CXT_CTL_WRITE_INDEX_PTR,
+ cfg->write_index_ptr >> WRT_INDEX_PTR_SHIFT);
+ cxt_sts_ptr = FIELD_PREP(SDXI_CXT_CTL_CXT_STS_PTR,
+ cfg->cxt_sts_ptr >> CXT_STATUS_PTR_SHIFT);
+
+ *ctl = (typeof(*ctl)) {
+ /*
+ * ds_ring_ptr contains the validity bit and is updated
+ * after a barrier is issued.
+ */
+ .ds_ring_sz = cpu_to_le32(cfg->ds_ring_sz),
+ .cxt_sts_ptr = cpu_to_le64(cxt_sts_ptr),
+ .write_index_ptr = cpu_to_le64(write_index_ptr),
+ };
+
+ ds_ring_ptr = FIELD_PREP(SDXI_CXT_CTL_VL, 1) |
+ FIELD_PREP(SDXI_CXT_CTL_QOS, cfg->qos) |
+ FIELD_PREP(SDXI_CXT_CTL_SE, cfg->se) |
+ FIELD_PREP(SDXI_CXT_CTL_CSA, cfg->csa) |
+ FIELD_PREP(SDXI_CXT_CTL_DS_RING_PTR,
+ cfg->ds_ring_ptr >> DESC_RING_BASE_PTR_SHIFT);
+ /* Ensure other fields are visible before hw sees vl=1. */
+ dma_wmb();
+ WRITE_ONCE(ctl->ds_ring_ptr, cpu_to_le64(ds_ring_ptr));
+
+ return 0;
+}
+
+/*
+ * Logical representation of CXT_L1_ENT subfields.
+ */
+struct sdxi_cxt_L1_cfg {
+ dma_addr_t cxt_ctl_ptr;
+ dma_addr_t akey_ptr;
+ u32 cxt_pasid;
+ u32 opb_000_enb;
+ u16 max_buffer;
+ u8 akey_sz;
+ bool ka;
+ bool pv;
+};
+
+static int configure_L1_entry(struct sdxi_cxt_L1_ent *ent,
+ const struct sdxi_cxt_L1_cfg *cfg)
+{
+ u64 cxt_ctl_ptr, akey_ptr;
+ u32 misc0;
+
+ if (WARN_ON_ONCE(!IS_ALIGNED(cfg->cxt_ctl_ptr, SZ_64)))
+ return -EFAULT;
+ if (WARN_ON_ONCE(!IS_ALIGNED(cfg->akey_ptr, SZ_4K)))
+ return -EFAULT;
+
+ akey_ptr = FIELD_PREP(SDXI_CXT_L1_ENT_AKEY_SZ, cfg->akey_sz) |
+ FIELD_PREP(SDXI_CXT_L1_ENT_AKEY_PTR,
+ cfg->akey_ptr >> L1_CXT_AKEY_PTR_SHIFT);
+
+ misc0 = FIELD_PREP(SDXI_CXT_L1_ENT_PASID, cfg->cxt_pasid) |
+ FIELD_PREP(SDXI_CXT_L1_ENT_MAX_BUFFER, cfg->max_buffer);
+
+ *ent = (typeof(*ent)) {
+ /*
+ * cxt_ctl_ptr contains the validity bit and is
+ * updated after a barrier is issued.
+ */
+ .akey_ptr = cpu_to_le64(akey_ptr),
+ .misc0 = cpu_to_le32(misc0),
+ .opb_000_enb = cpu_to_le32(cfg->opb_000_enb),
+ };
+
+ cxt_ctl_ptr = FIELD_PREP(SDXI_CXT_L1_ENT_VL, 1) |
+ FIELD_PREP(SDXI_CXT_L1_ENT_KA, cfg->ka) |
+ FIELD_PREP(SDXI_CXT_L1_ENT_PV, cfg->pv) |
+ FIELD_PREP(SDXI_CXT_L1_ENT_CXT_CTL_PTR,
+ cfg->cxt_ctl_ptr >> L1_CXT_CTRL_PTR_SHIFT);
+ /* Ensure other fields are visible before hw sees vl=1. */
+ dma_wmb();
+ WRITE_ONCE(ent->cxt_ctl_ptr, cpu_to_le64(cxt_ctl_ptr));
+
+ return 0;
+}
+
+/*
+ * Make the context control structure hierarchy valid from the POV of
+ * the SDXI implementation. This may eventually involve allocation of
+ * a L1 table page, so it needs to be fallible.
+ */
+static int sdxi_publish_cxt(const struct sdxi_cxt *cxt)
+{
+ struct sdxi_cxt_ctl_cfg ctl_cfg;
+ struct sdxi_cxt_L1_cfg L1_cfg;
+ struct sdxi_cxt_L1_ent *ent;
+ u8 l1_idx;
+ int err;
+
+ if (WARN_ONCE(cxt->id > cxt->sdxi->max_cxtid,
+ "can't install cxt with id %u (limit %u)",
+ cxt->id, cxt->sdxi->max_cxtid))
+ return -EINVAL;
+
+ ctl_cfg = (typeof(ctl_cfg)) {
+ .se = 1,
+ .csa = 1,
+ .ds_ring_ptr = cxt->sq->ring_dma,
+ .ds_ring_sz = cxt->sq->ring_size >> 6,
+ .cxt_sts_ptr = cxt->sq->cxt_sts_dma,
+ .write_index_ptr = cxt->sq->write_index_dma,
+ };
+
+ err = configure_cxt_ctl(cxt->cxt_ctl, &ctl_cfg);
+ if (err)
+ return err;
+
+ l1_idx = ID_TO_L1_INDEX(cxt->id);
+
+ ent = &cxt->sdxi->L1_table->entry[l1_idx];
+
+ L1_cfg = (typeof(L1_cfg)) {
+ .ka = 1,
+ .pv = 0,
+ .cxt_ctl_ptr = cxt->cxt_ctl_dma,
+ .akey_sz = akey_table_order(cxt->akey_table),
+ .akey_ptr = cxt->akey_table_dma,
+ .cxt_pasid = IOMMU_NO_PASID,
+ .max_buffer = 11, /* 4GB */
+ .opb_000_enb = cxt->sdxi->op_grp_cap,
+ };
+
+ /* TODO: send DSC_CXT_UPD to the admin context. */
+ return configure_L1_entry(ent, &L1_cfg);
+}
+
static void free_admin_cxt(void *ptr)
{
struct sdxi_dev *sdxi = ptr;
@@ -115,13 +267,23 @@ static void free_admin_cxt(void *ptr)
int sdxi_admin_cxt_init(struct sdxi_dev *sdxi)
{
+ int err;
+ struct sdxi_sq *sq;
+
struct sdxi_cxt *cxt __free(sdxi_cxt) = sdxi_alloc_cxt(sdxi);
if (!cxt)
return -ENOMEM;
+ sq = cxt->sq;
+ /* SDXI 1.0 4.1.8.4.b: Set CXT_STS.state to CXTV_RUN. */
+ sq->cxt_sts->state = FIELD_PREP(SDXI_CXT_STS_STATE, CXTV_RUN);
cxt->id = SDXI_ADMIN_CXT_ID;
cxt->db = sdxi->dbs + cxt->id * sdxi->db_stride;
+ err = sdxi_publish_cxt(cxt);
+ if (err)
+ return err;
+
sdxi->admin_cxt = no_free_ptr(cxt);
return devm_add_action_or_reset(sdxi->dev, free_admin_cxt, sdxi);
diff --git a/drivers/dma/sdxi/context.h b/drivers/dma/sdxi/context.h
index a29387900df7..65b773446ba3 100644
--- a/drivers/dma/sdxi/context.h
+++ b/drivers/dma/sdxi/context.h
@@ -20,6 +20,13 @@ struct sdxi_akey_table {
struct sdxi_akey_ent entry[SZ_4K / sizeof(struct sdxi_akey_ent)];
};
+/* For encoding the akey table size in CXT_L1_ENT's akey_sz. */
+static inline u8 akey_table_order(const struct sdxi_akey_table *tbl)
+{
+ static_assert(sizeof(*tbl) == SZ_4K);
+ return 0;
+}
+
/* Submission Queue */
struct sdxi_sq {
u32 ring_entries;
diff --git a/drivers/dma/sdxi/hw.h b/drivers/dma/sdxi/hw.h
index 55d63d50a01b..4b65337a5975 100644
--- a/drivers/dma/sdxi/hw.h
+++ b/drivers/dma/sdxi/hw.h
@@ -45,8 +45,16 @@ static_assert(sizeof(struct sdxi_cxt_L2_table) == 4096);
/* SDXI 1.0 Table 3-3: Context Level 1 Table Entry (CXT_L1_ENT) */
struct sdxi_cxt_L1_ent {
__le64 cxt_ctl_ptr;
+#define SDXI_CXT_L1_ENT_VL BIT_ULL(0)
+#define SDXI_CXT_L1_ENT_KA BIT_ULL(1)
+#define SDXI_CXT_L1_ENT_PV BIT_ULL(2)
+#define SDXI_CXT_L1_ENT_CXT_CTL_PTR GENMASK_ULL(63, 6)
__le64 akey_ptr;
+#define SDXI_CXT_L1_ENT_AKEY_SZ GENMASK_ULL(3, 0)
+#define SDXI_CXT_L1_ENT_AKEY_PTR GENMASK_ULL(63, 12)
__le32 misc0;
+#define SDXI_CXT_L1_ENT_PASID GENMASK(19, 0)
+#define SDXI_CXT_L1_ENT_MAX_BUFFER GENMASK(23, 20)
__le32 opb_000_enb;
__u8 rsvd_0[8];
} __packed __aligned(32);
@@ -62,10 +70,17 @@ static_assert(sizeof(struct sdxi_cxt_L1_table) == 4096);
/* SDXI 1.0 Table 3-4: Context Control (CXT_CTL) */
struct sdxi_cxt_ctl {
__le64 ds_ring_ptr;
+#define SDXI_CXT_CTL_VL BIT_ULL(0)
+#define SDXI_CXT_CTL_QOS GENMASK_ULL(3, 2)
+#define SDXI_CXT_CTL_SE BIT_ULL(4)
+#define SDXI_CXT_CTL_CSA BIT_ULL(5)
+#define SDXI_CXT_CTL_DS_RING_PTR GENMASK_ULL(63, 6)
__le32 ds_ring_sz;
__u8 rsvd_0[4];
__le64 cxt_sts_ptr;
+#define SDXI_CXT_CTL_CXT_STS_PTR GENMASK_ULL(63, 4)
__le64 write_index_ptr;
+#define SDXI_CXT_CTL_WRITE_INDEX_PTR GENMASK_ULL(63, 3)
__u8 rsvd_1[32];
} __packed __aligned(64);
static_assert(sizeof(struct sdxi_cxt_ctl) == 64);
diff --git a/drivers/dma/sdxi/sdxi.h b/drivers/dma/sdxi/sdxi.h
index f5e0cd986b9e..903bf18bd3cc 100644
--- a/drivers/dma/sdxi/sdxi.h
+++ b/drivers/dma/sdxi/sdxi.h
@@ -15,6 +15,15 @@
#include "mmio.h"
+#define ID_TO_L1_INDEX(id) ((id) & 0x7F)
+
+#define DESC_RING_BASE_PTR_SHIFT 6
+#define CXT_STATUS_PTR_SHIFT 4
+#define WRT_INDEX_PTR_SHIFT 3
+
+#define L1_CXT_CTRL_PTR_SHIFT 6
+#define L1_CXT_AKEY_PTR_SHIFT 12
+
struct sdxi_dev;
/**
--
2.54.0
* [PATCH v3 09/23] dmaengine: sdxi: Start functions on probe, stop on remove
2026-06-06 0:02 [PATCH v3 00/23] dmaengine: Smart Data Accelerator Interface (SDXI) basic support Nathan Lynch via B4 Relay
` (7 preceding siblings ...)
2026-06-06 0:02 ` [PATCH v3 08/23] dmaengine: sdxi: Install " Nathan Lynch via B4 Relay
@ 2026-06-06 0:02 ` Nathan Lynch via B4 Relay
2026-06-06 0:14 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 10/23] dmaengine: sdxi: Complete administrative context jump start Nathan Lynch via B4 Relay
` (13 subsequent siblings)
22 siblings, 1 reply; 40+ messages in thread
From: Nathan Lynch via B4 Relay @ 2026-06-06 0:02 UTC (permalink / raw)
To: Vinod Koul, Frank Li
Cc: Bjorn Helgaas, David Rientjes, John.Kariuki, Jonathan Cameron,
Kinsey Ho, Mario Limonciello, PradeepVineshReddy.Kodamati,
Shivank Garg, Stephen Bates, Tycho Andersen, Wei Huang, Wei Xu,
dmaengine, linux-kernel, linux-pci, Nathan Lynch
From: Nathan Lynch <nathan.lynch@amd.com>
Following admin context setup in the previous patch, drive each SDXI
function to active state during probe. This is done by writing
GSRV_ACTIVE to MMIO_CTL0.fn_gsr and polling MMIO_STS0.fn_gsv until the
function reaches GSV_ACTIVE or an error state. A 1-second timeout has
been sufficient in practice so far.
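The request-then-poll handshake above can be sketched in user-space C. This is only an illustration under stated assumptions, not the driver's code: fake_dev, fake_read_gsv() and start_function() are hypothetical names, the state values are placeholders rather than the SDXI encodings, and the loop bounds the wait by poll count instead of elapsed time.

```c
#include <errno.h>

/* Placeholder states; the numeric values are NOT the SDXI encodings. */
enum fake_gsv { FAKE_GSV_STOP, FAKE_GSV_INIT, FAKE_GSV_ACTIVE, FAKE_GSV_ERROR };

/* Hypothetical stand-in for a function's MMIO-visible state. */
struct fake_dev {
	enum fake_gsv gsv;      /* current state, as hardware would report it */
	int polls_until_done;   /* reads remaining before the transition ends */
	enum fake_gsv final;    /* state reached once the transition ends */
};

/* Simulated status read: the state settles after a few polls. */
static enum fake_gsv fake_read_gsv(struct fake_dev *dev)
{
	if (dev->gsv == FAKE_GSV_INIT && dev->polls_until_done-- <= 0)
		dev->gsv = dev->final;
	return dev->gsv;
}

/* Request activation, then poll until ACTIVE, ERROR, or the budget runs out. */
static int start_function(struct fake_dev *dev, int max_polls)
{
	enum fake_gsv status = fake_read_gsv(dev);

	if (status != FAKE_GSV_STOP)
		return -EBUSY;          /* only a stopped function may be started */

	dev->gsv = FAKE_GSV_INIT;       /* stands in for writing the request field */

	for (int i = 0; i < max_polls; i++) {
		status = fake_read_gsv(dev);
		if (status == FAKE_GSV_ACTIVE)
			return 0;
		if (status == FAKE_GSV_ERROR)
			return -EIO;
	}
	return -ETIMEDOUT;              /* neither terminal state was observed */
}
```

Treating the error state as a terminal condition of the poll, and only afterwards mapping it to -EIO, keeps the timeout path distinct from the hardware-reported failure path.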
Introduce sdxi_unregister() to stop the function during remove and wire
it up via the pci_driver .remove callback.
Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Nathan Lynch <nathan.lynch@amd.com>
---
drivers/dma/sdxi/device.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++-
drivers/dma/sdxi/pci.c | 7 +++++++
drivers/dma/sdxi/sdxi.h | 1 +
3 files changed, 57 insertions(+), 1 deletion(-)
diff --git a/drivers/dma/sdxi/device.c b/drivers/dma/sdxi/device.c
index 4d595e79b8ce..e8f087f758fd 100644
--- a/drivers/dma/sdxi/device.c
+++ b/drivers/dma/sdxi/device.c
@@ -90,6 +90,42 @@ static void sdxi_write_fn_gsr(struct sdxi_dev *sdxi, enum sdxi_fn_gsr cmd)
sdxi_write64(sdxi, SDXI_MMIO_CTL0, ctl0);
}
+/*
+ * Transition the function from stopped state to active.
+ * See SDXI 1.0 4.1 SDXI Function State.
+ */
+static int sdxi_dev_start(struct sdxi_dev *sdxi)
+{
+ enum sdxi_fn_gsv status = sdxi_dev_gsv(sdxi);
+ int ret;
+
+ if (status != SDXI_GSV_STOP) {
+ dev_err(sdxi->dev,
+ "can't activate busy device (unexpected gsv: %s)\n",
+ gsv_str(status));
+ return -EBUSY;
+ }
+
+ sdxi_write_fn_gsr(sdxi, SDXI_GSRV_ACTIVE);
+
+ ret = sdxi_dev_gsv_poll(sdxi, status,
+ status == SDXI_GSV_ACTIVE ||
+ status == SDXI_GSV_ERROR);
+ if (ret) {
+ dev_err(sdxi->dev, "activation timed out, current state: %s\n",
+ gsv_str(status));
+ return ret;
+ }
+
+ if (status == SDXI_GSV_ERROR) {
+ dev_err(sdxi->dev, "went to error state during activation\n");
+ return -EIO;
+ }
+
+ dev_dbg(sdxi->dev, "activated\n");
+ return 0;
+}
+
/* Get the device to the GSV_STOP state. */
static int sdxi_dev_stop(struct sdxi_dev *sdxi)
{
@@ -229,7 +265,11 @@ static int sdxi_fn_activate(struct sdxi_dev *sdxi)
if (err)
return err;
- return 0;
+ /*
+ * SDXI 1.0 4.1.8.9: Set MMIO_CTL0.fn_gsr to GSRV_ACTIVE and
+ * wait for MMIO_STS0.fn_gsv to reach GSV_ACTIVE or GSV_ERROR.
+ */
+ return sdxi_dev_start(sdxi);
}
static int sdxi_device_init(struct sdxi_dev *sdxi)
@@ -290,6 +330,14 @@ int sdxi_register(struct device *dev, const struct sdxi_bus_ops *ops)
}
EXPORT_SYMBOL_NS_GPL(sdxi_register, "SDXI");
+void sdxi_unregister(struct device *dev)
+{
+ struct sdxi_dev *sdxi = dev_get_drvdata(dev);
+
+ sdxi_dev_stop(sdxi);
+}
+EXPORT_SYMBOL_NS_GPL(sdxi_unregister, "SDXI");
+
MODULE_AUTHOR("Wei Huang");
MODULE_AUTHOR("Nathan Lynch");
MODULE_DESCRIPTION("SDXI core");
diff --git a/drivers/dma/sdxi/pci.c b/drivers/dma/sdxi/pci.c
index 42e8af008b10..4d7d6812da6d 100644
--- a/drivers/dma/sdxi/pci.c
+++ b/drivers/dma/sdxi/pci.c
@@ -63,6 +63,12 @@ static int sdxi_pci_probe(struct pci_dev *pdev,
return sdxi_register(&pdev->dev, &sdxi_pci_ops);
}
+static void sdxi_pci_remove(struct pci_dev *pdev)
+{
+ pci_disable_sriov(pdev);
+ sdxi_unregister(&pdev->dev);
+}
+
static const struct pci_device_id sdxi_id_table[] = {
{ PCI_DEVICE_CLASS(PCI_CLASS_ACCELERATOR_SDXI, 0xffffff) },
{ }
@@ -73,6 +79,7 @@ static struct pci_driver sdxi_driver = {
.name = "sdxi",
.id_table = sdxi_id_table,
.probe = sdxi_pci_probe,
+ .remove = sdxi_pci_remove,
.sriov_configure = pci_sriov_configure_simple,
};
diff --git a/drivers/dma/sdxi/sdxi.h b/drivers/dma/sdxi/sdxi.h
index 903bf18bd3cc..a15b97135308 100644
--- a/drivers/dma/sdxi/sdxi.h
+++ b/drivers/dma/sdxi/sdxi.h
@@ -64,6 +64,7 @@ struct sdxi_dev {
};
int sdxi_register(struct device *dev, const struct sdxi_bus_ops *ops);
+void sdxi_unregister(struct device *dev);
static inline u64 sdxi_read64(const struct sdxi_dev *sdxi, enum sdxi_reg reg)
{
--
2.54.0
* [PATCH v3 10/23] dmaengine: sdxi: Complete administrative context jump start
2026-06-06 0:02 [PATCH v3 00/23] dmaengine: Smart Data Accelerator Interface (SDXI) basic support Nathan Lynch via B4 Relay
` (8 preceding siblings ...)
2026-06-06 0:02 ` [PATCH v3 09/23] dmaengine: sdxi: Start functions on probe, stop on remove Nathan Lynch via B4 Relay
@ 2026-06-06 0:02 ` Nathan Lynch via B4 Relay
2026-06-06 0:12 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 11/23] dmaengine: sdxi: Add client context alloc and release APIs Nathan Lynch via B4 Relay
` (12 subsequent siblings)
22 siblings, 1 reply; 40+ messages in thread
From: Nathan Lynch via B4 Relay @ 2026-06-06 0:02 UTC (permalink / raw)
To: Vinod Koul, Frank Li
Cc: Bjorn Helgaas, David Rientjes, John.Kariuki, Jonathan Cameron,
Kinsey Ho, Mario Limonciello, PradeepVineshReddy.Kodamati,
Shivank Garg, Stephen Bates, Tycho Andersen, Wei Huang, Wei Xu,
dmaengine, linux-kernel, linux-pci, Nathan Lynch
From: Nathan Lynch <nathan.lynch@amd.com>
Now that the SDXI function has been placed in active state, the admin
context can finally be started by writing its doorbell. Introduce
a sdxi_cxt_push_doorbell() helper to simplify this for callers; it
will be used in all descriptor submission paths.
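As a rough user-space model of the doorbell write: each context's doorbell sits at base + id * stride, and the value written is the context's write index. The names fake_dbs, doorbell_addr(), push_doorbell() and peek_doorbell() are hypothetical, and a plain memory buffer stands in for the MMIO region that the driver writes with writeq().

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the mapped doorbell region. */
struct fake_dbs {
	uint8_t *base;   /* start of the doorbell region */
	size_t stride;   /* distance in bytes between consecutive doorbells */
};

/* Locate the doorbell for a context id: base + id * stride. */
static uint8_t *doorbell_addr(const struct fake_dbs *dbs, uint16_t cxt_id)
{
	return dbs->base + (size_t)cxt_id * dbs->stride;
}

/* Publish a write index; a real driver would use an MMIO accessor here. */
static void push_doorbell(const struct fake_dbs *dbs, uint16_t cxt_id,
			  uint64_t index)
{
	memcpy(doorbell_addr(dbs, cxt_id), &index, sizeof(index));
}

/* Read back the last value written, for inspection in this model only. */
static uint64_t peek_doorbell(const struct fake_dbs *dbs, uint16_t cxt_id)
{
	uint64_t index;

	memcpy(&index, doorbell_addr(dbs, cxt_id), sizeof(index));
	return index;
}
```

In this model, starting the admin context with an empty ring corresponds to push_doorbell(&dbs, 0, 0).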
Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Nathan Lynch <nathan.lynch@amd.com>
---
drivers/dma/sdxi/context.h | 6 ++++++
drivers/dma/sdxi/device.c | 15 ++++++++++++++-
2 files changed, 20 insertions(+), 1 deletion(-)
diff --git a/drivers/dma/sdxi/context.h b/drivers/dma/sdxi/context.h
index 65b773446ba3..d89e026a7736 100644
--- a/drivers/dma/sdxi/context.h
+++ b/drivers/dma/sdxi/context.h
@@ -7,6 +7,7 @@
#define DMA_SDXI_CONTEXT_H
#include <linux/dma-mapping.h>
+#include <linux/io.h>
#include <linux/types.h>
#include "hw.h"
@@ -58,4 +59,9 @@ struct sdxi_cxt {
int sdxi_admin_cxt_init(struct sdxi_dev *sdxi);
+static inline void sdxi_cxt_push_doorbell(struct sdxi_cxt *cxt, u64 index)
+{
+ writeq(index, cxt->db);
+}
+
#endif /* DMA_SDXI_CONTEXT_H */
diff --git a/drivers/dma/sdxi/device.c b/drivers/dma/sdxi/device.c
index e8f087f758fd..b875d8cdb58a 100644
--- a/drivers/dma/sdxi/device.c
+++ b/drivers/dma/sdxi/device.c
@@ -269,7 +269,20 @@ static int sdxi_fn_activate(struct sdxi_dev *sdxi)
* SDXI 1.0 4.1.8.9: Set MMIO_CTL0.fn_gsr to GSRV_ACTIVE and
* wait for MMIO_STS0.fn_gsv to reach GSV_ACTIVE or GSV_ERROR.
*/
- return sdxi_dev_start(sdxi);
+ err = sdxi_dev_start(sdxi);
+ if (err)
+ return err;
+
+ /*
+ * SDXI 1.0 4.1.8.10.b: Start the admin context using method
+ * #3 ("Jump Start 1") from 4.3.4 Starting A Context and
+ * Context Signaling. We haven't queued any descriptors to the
+ * admin context at this point, so the appropriate value for
+ * the doorbell is 0.
+ */
+ sdxi_cxt_push_doorbell(sdxi->admin_cxt, 0);
+
+ return 0;
}
static int sdxi_device_init(struct sdxi_dev *sdxi)
--
2.54.0
* [PATCH v3 11/23] dmaengine: sdxi: Add client context alloc and release APIs
2026-06-06 0:02 [PATCH v3 00/23] dmaengine: Smart Data Accelerator Interface (SDXI) basic support Nathan Lynch via B4 Relay
` (9 preceding siblings ...)
2026-06-06 0:02 ` [PATCH v3 10/23] dmaengine: sdxi: Complete administrative context jump start Nathan Lynch via B4 Relay
@ 2026-06-06 0:02 ` Nathan Lynch via B4 Relay
2026-06-06 0:22 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 12/23] dmaengine: sdxi: Add descriptor ring management Nathan Lynch via B4 Relay
` (11 subsequent siblings)
22 siblings, 1 reply; 40+ messages in thread
From: Nathan Lynch via B4 Relay @ 2026-06-06 0:02 UTC (permalink / raw)
To: Vinod Koul, Frank Li
Cc: Bjorn Helgaas, David Rientjes, John.Kariuki, Jonathan Cameron,
Kinsey Ho, Mario Limonciello, PradeepVineshReddy.Kodamati,
Shivank Garg, Stephen Bates, Tycho Andersen, Wei Huang, Wei Xu,
dmaengine, linux-kernel, linux-pci, Nathan Lynch
From: Nathan Lynch <nathan.lynch@amd.com>
Expose sdxi_cxt_new() and sdxi_cxt_exit(), which are the rest of the
driver's entry points for creating and releasing SDXI contexts.
Track client contexts in a device-wide allocating xarray, mapping
context ID to the context object. The admin context always has ID 0,
so begin allocations at 1. Define a local sdxi_cxt_id class to
facilitate early allocation (before committing more resources) and
automatic release of context IDs.
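The reserve-early, release-automatically pattern can be illustrated outside the kernel with the GCC/Clang cleanup attribute, which is what the kernel's scope-based cleanup helpers build on. Everything below is a hypothetical toy: a small fixed table replaces the xarray, MAX_ID replaces the device's maximum context ID, and id_guard, id_reserve(), id_assign() and make_object() are invented names.

```c
#include <stddef.h>

#define MAX_ID 7   /* hypothetical limit for this toy table */

/* Toy id table; index 0 is never handed out (admin context in the driver). */
static const void *slots[MAX_ID + 1];
static unsigned char reserved[MAX_ID + 1];

struct id_guard { unsigned int index; };   /* 0 means "no id held" */

/* Reserve the lowest free id starting at 1, or return an empty guard. */
static struct id_guard id_reserve(void)
{
	for (unsigned int i = 1; i <= MAX_ID; i++) {
		if (!reserved[i]) {
			reserved[i] = 1;
			return (struct id_guard){ .index = i };
		}
	}
	return (struct id_guard){ .index = 0 };
}

/* Destructor: runs when the guard leaves scope; frees an unassigned id. */
static void id_guard_release(struct id_guard *g)
{
	if (g->index)
		reserved[g->index] = 0;
}

/* Hand the id to its long-term owner and disarm the guard. */
static unsigned int id_assign(struct id_guard *g, const void *owner)
{
	unsigned int id = g->index;

	slots[id] = owner;
	g->index = 0;
	return id;
}

/* Returns the id given to owner, or 0 if setup failed part-way. */
static unsigned int make_object(const void *owner, int fail_midway)
{
	struct id_guard g __attribute__((cleanup(id_guard_release))) = id_reserve();

	if (!g.index || fail_midway)
		return 0;   /* the guard releases the id automatically */
	return id_assign(&g, owner);
}
```

Reserving the ID first means the more expensive allocations are attempted only when an ID is known to be available, and every early return gives the ID back without explicit unwinding.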
Introduce new code to invalidate a context's entry in the L1 table on
deallocation.
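The invalidation order used by this patch (clear the valid bit, issue a write barrier, then zero the entry) can be modelled with C11 atomics. This is a sketch under assumptions: fake_ent and invalidate_entry() are hypothetical, the entry has only two words, and a C11 release fence stands in for dma_wmb(), which orders writes as observed by a device and is not equivalent.

```c
#include <stdatomic.h>
#include <stdint.h>

#define ENT_VALID UINT64_C(1)   /* bit 0: valid flag, mirroring the VL bit */

/* Hypothetical two-word table entry shared with a consumer. */
struct fake_ent {
	_Atomic uint64_t ctl_ptr;   /* pointer bits plus the valid flag */
	_Atomic uint64_t aux;       /* stands in for the remaining fields */
};

static void invalidate_entry(struct fake_ent *ent)
{
	uint64_t v = atomic_load_explicit(&ent->ctl_ptr, memory_order_relaxed);

	/* Step 1: clear only the valid bit so the consumer stops trusting it. */
	atomic_store_explicit(&ent->ctl_ptr, v & ~ENT_VALID,
			      memory_order_relaxed);

	/* Step 2: order the store above before the stores below. */
	atomic_thread_fence(memory_order_release);

	/* Step 3: scrub the remaining contents. */
	atomic_store_explicit(&ent->aux, 0, memory_order_relaxed);
	atomic_store_explicit(&ent->ctl_ptr, 0, memory_order_relaxed);
}
```

Clearing the valid bit before scrubbing is intended to keep an observer from seeing a half-zeroed entry that still claims to be valid.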
Support for starting and stopping contexts will be added in changes to
follow.
The only expected user of sdxi_cxt_new() and sdxi_cxt_exit() at this
point is the DMA engine provider code where a client context per
channel will be created.
Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Nathan Lynch <nathan.lynch@amd.com>
---
drivers/dma/sdxi/context.c | 122 +++++++++++++++++++++++++++++++++++++++++++++
drivers/dma/sdxi/context.h | 13 +++++
drivers/dma/sdxi/device.c | 8 +++
drivers/dma/sdxi/sdxi.h | 2 +
4 files changed, 145 insertions(+)
diff --git a/drivers/dma/sdxi/context.c b/drivers/dma/sdxi/context.c
index cc99fe9ecf92..2f9a29df5d9c 100644
--- a/drivers/dma/sdxi/context.c
+++ b/drivers/dma/sdxi/context.c
@@ -44,6 +44,10 @@ static void sdxi_free_cxt(struct sdxi_cxt *cxt)
struct sdxi_dev *sdxi = cxt->sdxi;
struct sdxi_sq *sq = cxt->sq;
+ /* Release the id if this is a client context. */
+ if (cxt->id)
+ WARN_ON(xa_erase(&sdxi->client_cxts, cxt->id) != cxt);
+
if (cxt->cxt_ctl)
dma_pool_free(sdxi->cxt_ctl_pool, cxt->cxt_ctl,
cxt->cxt_ctl_dma);
@@ -154,6 +158,16 @@ static int configure_cxt_ctl(struct sdxi_cxt_ctl *ctl, const struct sdxi_cxt_ctl
return 0;
}
+static void invalidate_cxtl_ctl(struct sdxi_cxt_ctl *ctl)
+{
+ u64 ds_ring_ptr = le64_to_cpu(ctl->ds_ring_ptr);
+
+ FIELD_MODIFY(SDXI_CXT_CTL_VL, &ds_ring_ptr, 0);
+ WRITE_ONCE(ctl->ds_ring_ptr, cpu_to_le64(ds_ring_ptr));
+ dma_wmb();
+ *ctl = (typeof(*ctl)) { 0 };
+}
+
/*
* Logical representation of CXT_L1_ENT subfields.
*/
@@ -208,6 +222,16 @@ static int configure_L1_entry(struct sdxi_cxt_L1_ent *ent,
return 0;
}
+static void invalidate_L1_entry(struct sdxi_cxt_L1_ent *ent)
+{
+ u64 cxt_ctl_ptr = le64_to_cpu(ent->cxt_ctl_ptr);
+
+ FIELD_MODIFY(SDXI_CXT_L1_ENT_VL, &cxt_ctl_ptr, 0);
+ WRITE_ONCE(ent->cxt_ctl_ptr, cpu_to_le64(cxt_ctl_ptr));
+ dma_wmb();
+ *ent = (typeof(*ent)) { 0 };
+}
+
/*
* Make the context control structure hierarchy valid from the POV of
* the SDXI implementation. This may eventually involve allocation of
@@ -258,6 +282,17 @@ static int sdxi_publish_cxt(const struct sdxi_cxt *cxt)
/* todo: need to send DSC_CXT_UPD to admin */
}
+/* Invalidate a context. */
+static void sdxi_rescind_cxt(struct sdxi_cxt *cxt)
+{
+ u8 l1_idx = ID_TO_L1_INDEX(cxt->id);
+ struct sdxi_cxt_L1_ent *ent = &cxt->sdxi->L1_table->entry[l1_idx];
+
+ invalidate_L1_entry(ent);
+ invalidate_cxtl_ctl(cxt->cxt_ctl);
+ /* todo: need to send DSC_CXT_UPD to admin */
+}
+
static void free_admin_cxt(void *ptr)
{
struct sdxi_dev *sdxi = ptr;
@@ -288,3 +323,90 @@ int sdxi_admin_cxt_init(struct sdxi_dev *sdxi)
return devm_add_action_or_reset(sdxi->dev, free_admin_cxt, sdxi);
}
+
+/*
+ * Temporary owner for context id until it can be assigned to a
+ * context object; enables scope-based cleanup.
+ */
+struct sdxi_cxt_id {
+ struct sdxi_dev *sdxi;
+ u16 index;
+};
+
+static void sdxi_cxt_id_dtor(const struct sdxi_cxt_id *cxt_id)
+{
+ if (cxt_id->index == 0)
+ return;
+ WARN_ON(xa_erase(&cxt_id->sdxi->client_cxts, cxt_id->index) != NULL);
+}
+
+static struct sdxi_cxt_id sdxi_cxt_id_ctor(struct sdxi_dev *sdxi)
+{
+ struct xa_limit limit = XA_LIMIT(1, sdxi->max_cxtid);
+ u32 index;
+
+ return (struct sdxi_cxt_id) {
+ .sdxi = sdxi,
+ .index = xa_alloc(&sdxi->client_cxts, &index, NULL,
+ limit, GFP_KERNEL) ? 0 : (u16)index,
+ };
+}
+
+DEFINE_CLASS(sdxi_cxt_id, struct sdxi_cxt_id, sdxi_cxt_id_dtor(&_T),
+ sdxi_cxt_id_ctor(sdxi), struct sdxi_dev *sdxi)
+
+static bool sdxi_cxt_id_valid(const struct sdxi_cxt_id *cxt_id)
+{
+ return cxt_id->index > 0;
+}
+
+/*
+ * Transfer ownership of the id to the context object, recording the
+ * context pointer in the device's client_cxts xarray. sdxi_free_cxt()
+ * is responsible for releasing the id from now on.
+ */
+static void sdxi_cxt_id_assign(struct sdxi_cxt *cxt, struct sdxi_cxt_id *cxt_id)
+{
+ /* We reserved the space in the constructor so this should not fail. */
+ WARN_ON(xa_store(&cxt_id->sdxi->client_cxts,
+ cxt_id->index, cxt, GFP_KERNEL));
+ cxt->id = cxt_id->index;
+ cxt_id->index = 0;
+}
+
+/*
+ * Allocate a context for in-kernel use. Starting the context is the
+ * caller's responsibility.
+ */
+struct sdxi_cxt *sdxi_cxt_new(struct sdxi_dev *sdxi)
+{
+ /*
+ * Ensure an ID is available before allocating memory for the
+ * context and its control structures.
+ */
+ CLASS(sdxi_cxt_id, id)(sdxi);
+ if (!sdxi_cxt_id_valid(&id))
+ return NULL;
+
+ struct sdxi_cxt *cxt __free(sdxi_cxt) = sdxi_alloc_cxt(sdxi);
+ if (!cxt)
+ return NULL;
+
+ sdxi_cxt_id_assign(cxt, &id);
+
+ cxt->db = sdxi->dbs + cxt->id * sdxi->db_stride;
+
+ if (sdxi_publish_cxt(cxt))
+ return NULL;
+
+ return_ptr(cxt);
+}
+
+void sdxi_cxt_exit(struct sdxi_cxt *cxt)
+{
+ if (WARN_ON(sdxi_cxt_is_admin(cxt)))
+ return;
+
+ sdxi_rescind_cxt(cxt);
+ sdxi_free_cxt(cxt);
+}
diff --git a/drivers/dma/sdxi/context.h b/drivers/dma/sdxi/context.h
index d89e026a7736..42e377bb2446 100644
--- a/drivers/dma/sdxi/context.h
+++ b/drivers/dma/sdxi/context.h
@@ -59,6 +59,19 @@ struct sdxi_cxt {
int sdxi_admin_cxt_init(struct sdxi_dev *sdxi);
+struct sdxi_cxt *sdxi_cxt_new(struct sdxi_dev *sdxi);
+void sdxi_cxt_exit(struct sdxi_cxt *cxt);
+
+static inline struct sdxi_cxt *to_admin_cxt(const struct sdxi_cxt *cxt)
+{
+ return cxt->sdxi->admin_cxt;
+}
+
+static inline bool sdxi_cxt_is_admin(const struct sdxi_cxt *cxt)
+{
+ return cxt == to_admin_cxt(cxt);
+}
+
static inline void sdxi_cxt_push_doorbell(struct sdxi_cxt *cxt, u64 index)
{
writeq(index, cxt->db);
diff --git a/drivers/dma/sdxi/device.c b/drivers/dma/sdxi/device.c
index b875d8cdb58a..d3d1ad2f2eff 100644
--- a/drivers/dma/sdxi/device.c
+++ b/drivers/dma/sdxi/device.c
@@ -18,6 +18,7 @@
#include <linux/minmax.h>
#include <linux/slab.h>
#include <linux/time.h>
+#include <linux/xarray.h>
#include "context.h"
#include "hw.h"
@@ -333,6 +334,7 @@ int sdxi_register(struct device *dev, const struct sdxi_bus_ops *ops)
sdxi->dev = dev;
sdxi->bus_ops = ops;
+ xa_init_flags(&sdxi->client_cxts, XA_FLAGS_ALLOC1);
dev_set_drvdata(dev, sdxi);
err = sdxi->bus_ops->init(sdxi);
@@ -346,6 +348,12 @@ EXPORT_SYMBOL_NS_GPL(sdxi_register, "SDXI");
void sdxi_unregister(struct device *dev)
{
struct sdxi_dev *sdxi = dev_get_drvdata(dev);
+ struct sdxi_cxt *cxt;
+ unsigned long index;
+
+ xa_for_each(&sdxi->client_cxts, index, cxt)
+ sdxi_cxt_exit(cxt);
+ xa_destroy(&sdxi->client_cxts);
sdxi_dev_stop(sdxi);
}
diff --git a/drivers/dma/sdxi/sdxi.h b/drivers/dma/sdxi/sdxi.h
index a15b97135308..4e29d1f90dae 100644
--- a/drivers/dma/sdxi/sdxi.h
+++ b/drivers/dma/sdxi/sdxi.h
@@ -12,6 +12,7 @@
#include <linux/dev_printk.h>
#include <linux/io.h>
#include <linux/types.h>
+#include <linux/xarray.h>
#include "mmio.h"
@@ -59,6 +60,7 @@ struct sdxi_dev {
struct dma_pool *cst_blk_pool;
struct sdxi_cxt *admin_cxt;
+ struct xarray client_cxts; /* context id -> (struct sdxi_cxt *) */
const struct sdxi_bus_ops *bus_ops;
};
--
2.54.0
* [PATCH v3 12/23] dmaengine: sdxi: Add descriptor ring management
2026-06-06 0:02 [PATCH v3 00/23] dmaengine: Smart Data Accelerator Interface (SDXI) basic support Nathan Lynch via B4 Relay
` (10 preceding siblings ...)
2026-06-06 0:02 ` [PATCH v3 11/23] dmaengine: sdxi: Add client context alloc and release APIs Nathan Lynch via B4 Relay
@ 2026-06-06 0:02 ` Nathan Lynch via B4 Relay
2026-06-06 0:19 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 13/23] dmaengine: sdxi: Add unit tests for descriptor ring reservations Nathan Lynch via B4 Relay
` (10 subsequent siblings)
22 siblings, 1 reply; 40+ messages in thread
From: Nathan Lynch via B4 Relay @ 2026-06-06 0:02 UTC (permalink / raw)
To: Vinod Koul, Frank Li
Cc: Bjorn Helgaas, David Rientjes, John.Kariuki, Jonathan Cameron,
Kinsey Ho, Mario Limonciello, PradeepVineshReddy.Kodamati,
Shivank Garg, Stephen Bates, Tycho Andersen, Wei Huang, Wei Xu,
dmaengine, linux-kernel, linux-pci, Nathan Lynch
From: Nathan Lynch <nathan.lynch@amd.com>
Introduce a library for managing SDXI descriptor ring state. It
encapsulates determining the next free space in the ring to deposit
descriptors and performing the update of the write index correctly, as
well as iterating over slices (reservations) of the ring without
dealing directly with ring offsets/indexes.
The central abstraction is sdxi_ring_state, which maintains the write
index and a wait queue. An internal spin lock serializes checks for
space in the ring and updates to the write index.
Reservations (sdxi_ring_resv) are intended to be short-lived on-stack
objects representing slices of the ring for callers to populate with
descriptors. Both blocking and non-blocking reservation APIs are
provided.
Descriptor access within a reservation is provided via
sdxi_ring_resv_next() and sdxi_ring_resv_foreach().
Completion handlers must call sdxi_ring_wake_up() when descriptors
have been consumed so that blocked reservations can proceed.
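The index arithmetic behind a reservation can be shown with a small user-space model. It is a sketch, not the library itself: toy_ring, toy_resv, toy_try_reserve() and toy_slot() are hypothetical names, and locking, the wait queue, and the little-endian shared indexes are all left out. Indexes grow monotonically and are reduced modulo the ring size only when a slot is addressed.

```c
#include <errno.h>
#include <stdint.h>

struct toy_ring {
	uint64_t read_index;    /* advanced by the consumer (hardware in SDXI) */
	uint64_t write_index;   /* advanced by the producer */
	uint32_t entries;       /* ring capacity in descriptors */
};

struct toy_resv {
	uint64_t start, end;    /* inclusive range of monotonic indexes */
};

/* Non-blocking reservation of nr consecutive descriptors. */
static int toy_try_reserve(struct toy_ring *r, uint64_t nr,
			   struct toy_resv *resv)
{
	if (nr < 1 || nr > r->entries)
		return -EINVAL;         /* request can never be satisfied */
	if (r->read_index > r->write_index)
		return -EIO;            /* inconsistent index state */
	if (r->write_index + nr - r->read_index > r->entries)
		return -EBUSY;          /* not enough free slots right now */

	resv->start = r->write_index;
	r->write_index += nr;
	resv->end = r->write_index - 1;
	return 0;
}

/* Map a monotonic index to its slot in the ring array. */
static uint32_t toy_slot(const struct toy_ring *r, uint64_t index)
{
	return (uint32_t)(index % r->entries);
}
```

In this model the doorbell value for a reservation would be resv.end + 1, the write index immediately after the reservation was taken.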
Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Nathan Lynch <nathan.lynch@amd.com>
---
drivers/dma/sdxi/Makefile | 3 +-
drivers/dma/sdxi/ring.c | 159 ++++++++++++++++++++++++++++++++++++++++++++++
drivers/dma/sdxi/ring.h | 84 ++++++++++++++++++++++++
3 files changed, 245 insertions(+), 1 deletion(-)
diff --git a/drivers/dma/sdxi/Makefile b/drivers/dma/sdxi/Makefile
index cdf8a455077b..bfcb443f1e64 100644
--- a/drivers/dma/sdxi/Makefile
+++ b/drivers/dma/sdxi/Makefile
@@ -2,7 +2,8 @@
obj-$(CONFIG_SDXI_CORE) += sdxi-core.o
sdxi-core-y := \
context.o \
- device.o
+ device.o \
+ ring.o
obj-$(CONFIG_SDXI_PCI) += sdxi-pci.o
sdxi-pci-y := pci.o
diff --git a/drivers/dma/sdxi/ring.c b/drivers/dma/sdxi/ring.c
new file mode 100644
index 000000000000..91b28c7afbbf
--- /dev/null
+++ b/drivers/dma/sdxi/ring.c
@@ -0,0 +1,159 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * SDXI descriptor ring state management. Handles advancing the write
+ * index correctly and supplies "reservations" i.e. slices of the ring
+ * to be filled with descriptors.
+ *
+ * Copyright Advanced Micro Devices, Inc.
+ */
+#include <kunit/visibility.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
+#include <linux/lockdep.h>
+#include <linux/range.h>
+#include <linux/sched.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+#include <linux/wait.h>
+#include <asm/barrier.h>
+#include <asm/byteorder.h>
+#include <asm/div64.h>
+#include <asm/rwonce.h>
+
+#include "ring.h"
+#include "hw.h"
+
+/*
+ * Initialize ring management state. Caller is responsible for
+ * allocating, mapping, and initializing the actual control structures
+ * shared with hardware: the indexes and ring array.
+ */
+void sdxi_ring_state_init(struct sdxi_ring_state *rs, const __le64 *read_index,
+ __le64 *write_index, u32 entries,
+ struct sdxi_desc descs[static SZ_1K])
+{
+ WARN_ON_ONCE(!read_index);
+ WARN_ON_ONCE(!write_index);
+ /*
+ * See SDXI 1.0 Table 3-1 Memory Structure Summary. Minimum
+ * descriptor ring size in bytes is 64KB; thus 1024 64-byte
+ * entries.
+ */
+ WARN_ON_ONCE(entries < SZ_1K);
+
+ *rs = (typeof(*rs)) {
+ .write_index = le64_to_cpu(*write_index),
+ .write_index_ptr = write_index,
+ .read_index_ptr = read_index,
+ .entries = entries,
+ .entry = descs,
+ };
+ spin_lock_init(&rs->lock);
+ init_waitqueue_head(&rs->wqh);
+}
+EXPORT_SYMBOL_IF_KUNIT(sdxi_ring_state_init);
+
+static u64 sdxi_ring_state_load_ridx(struct sdxi_ring_state *rs)
+{
+ lockdep_assert_held(&rs->lock);
+ return le64_to_cpu(READ_ONCE(*rs->read_index_ptr));
+}
+
+static void sdxi_ring_state_store_widx(struct sdxi_ring_state *rs, u64 new_widx)
+{
+ lockdep_assert_held(&rs->lock);
+ rs->write_index = new_widx;
+ WRITE_ONCE(*rs->write_index_ptr, cpu_to_le64(new_widx));
+}
+
+/* Non-blocking ring reservation. Callers must handle ring full (-EBUSY). */
+int sdxi_ring_try_reserve(struct sdxi_ring_state *rs, size_t nr,
+ struct sdxi_ring_resv *resv)
+{
+ u64 new_widx;
+
+ /*
+ * Caller bug, warn and reject.
+ */
+ if (WARN_ONCE(nr < 1 || nr > rs->entries,
+ "Reservation of size %zu requested from ring of size %u\n",
+ nr, rs->entries))
+ return -EINVAL;
+
+ scoped_guard(spinlock_irqsave, &rs->lock) {
+ u64 ridx = sdxi_ring_state_load_ridx(rs);
+
+ /*
+ * Bug: the read index should never exceed the write index.
+ * TODO: sdxi_err() or similar; need a reference to
+ * the device.
+ */
+ if (ridx > rs->write_index)
+ return -EIO;
+
+ new_widx = rs->write_index + nr;
+
+ /*
+ * Not enough space available right now.
+ * TODO: sdxi_dbg() or tracepoint here.
+ */
+ if (new_widx - ridx > rs->entries)
+ return -EBUSY;
+
+ sdxi_ring_state_store_widx(rs, new_widx);
+ }
+
+ *resv = (typeof(*resv)) {
+ .rs = rs,
+ .range = {
+ .start = new_widx - nr,
+ .end = new_widx - 1,
+ },
+ .iter = new_widx - nr,
+ };
+
+ return 0;
+}
+EXPORT_SYMBOL_IF_KUNIT(sdxi_ring_try_reserve);
+
+/* Blocking ring reservation. Retries until success or non-transient error. */
+int sdxi_ring_reserve(struct sdxi_ring_state *rs, size_t nr,
+ struct sdxi_ring_resv *resv)
+{
+ int ret;
+
+ wait_event(rs->wqh,
+ (ret = sdxi_ring_try_reserve(rs, nr, resv)) != -EBUSY);
+
+ return ret;
+}
+
+/* Completion code should call this whenever descriptors have been consumed. */
+void sdxi_ring_wake_up(struct sdxi_ring_state *rs)
+{
+ wake_up_all(&rs->wqh);
+}
+
+static struct sdxi_desc *
+sdxi_desc_ring_entry(const struct sdxi_ring_state *rs, u64 index)
+{
+ return &rs->entry[do_div(index, rs->entries)];
+}
+
+struct sdxi_desc *sdxi_ring_resv_next(struct sdxi_ring_resv *resv)
+{
+ if (resv->range.start <= resv->iter && resv->iter <= resv->range.end)
+ return sdxi_desc_ring_entry(resv->rs, resv->iter++);
+ /*
+ * Caller has iterated to the end of the reservation.
+ */
+ if (resv->iter == resv->range.end + 1)
+ return NULL;
+ /*
+ * Should happen only if caller messed with internal
+ * reservation state.
+ */
+ WARN_ONCE(1, "reservation[%llu,%llu] with iter %llu\n",
+ resv->range.start, resv->range.end, resv->iter);
+ return NULL;
+}
+EXPORT_SYMBOL_IF_KUNIT(sdxi_ring_resv_next);
diff --git a/drivers/dma/sdxi/ring.h b/drivers/dma/sdxi/ring.h
new file mode 100644
index 000000000000..d5682687c05c
--- /dev/null
+++ b/drivers/dma/sdxi/ring.h
@@ -0,0 +1,84 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright Advanced Micro Devices, Inc. */
+#ifndef DMA_SDXI_RING_H
+#define DMA_SDXI_RING_H
+
+#include <linux/io-64-nonatomic-lo-hi.h>
+#include <linux/range.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+#include <linux/wait.h>
+#include <asm/barrier.h>
+#include <asm/byteorder.h>
+#include <asm/div64.h>
+#include <asm/rwonce.h>
+
+#include "hw.h"
+
+/*
+ * struct sdxi_ring_state - Descriptor ring management.
+ *
+ * @lock: Guards *read_index_ptr (RO), *write_index_ptr (RW),
+ * write_index (RW). *read_index_ptr is incremented by hw.
+ * @write_index: Cached write index value, minimizes dereferences in
+ * critical sections.
+ * @write_index_ptr: Location of the architected write index shared with
+ * the SDXI implementation.
+ * @read_index_ptr: Location of the architected read index shared with
+ * the SDXI implementation.
+ * @entries: Number of entries in the ring.
+ * @entry: The descriptor ring itself, shared with the SDXI implementation.
+ * @wqh: Pending reservations.
+ */
+struct sdxi_ring_state {
+ spinlock_t lock;
+ u64 write_index; /* Cache current value of write index. */
+ __le64 *write_index_ptr;
+ const __le64 *read_index_ptr;
+ u32 entries;
+ struct sdxi_desc *entry;
+ wait_queue_head_t wqh;
+};
+
+/*
+ * Ring reservation and iteration state.
+ */
+struct sdxi_ring_resv {
+ const struct sdxi_ring_state *rs;
+ struct range range;
+ u64 iter;
+};
+
+void sdxi_ring_state_init(struct sdxi_ring_state *ring, const __le64 *read_index,
+ __le64 *write_index, u32 entries,
+ struct sdxi_desc descs[static SZ_1K]);
+void sdxi_ring_wake_up(struct sdxi_ring_state *rs);
+int sdxi_ring_reserve(struct sdxi_ring_state *ring, size_t nr,
+ struct sdxi_ring_resv *resv);
+int sdxi_ring_try_reserve(struct sdxi_ring_state *ring, size_t nr,
+ struct sdxi_ring_resv *resv);
+struct sdxi_desc *sdxi_ring_resv_next(struct sdxi_ring_resv *resv);
+
+/* Reset reservation's internal iterator. */
+static inline void sdxi_ring_resv_reset(struct sdxi_ring_resv *resv)
+{
+ resv->iter = resv->range.start;
+}
+
+/*
+ * Return the value that should be written to the doorbell after
+ * serializing descriptors for this reservation, i.e. the value of the
+ * write index after obtaining the reservation.
+ */
+static inline u64 sdxi_ring_resv_dbval(const struct sdxi_ring_resv *resv)
+{
+ return resv->range.end + 1;
+}
+
+#define sdxi_ring_resv_foreach(resv_, desc_) \
+ for (sdxi_ring_resv_reset(resv_), \
+ desc_ = sdxi_ring_resv_next(resv_); \
+ desc_; \
+ desc_ = sdxi_ring_resv_next(resv_))
+
+#endif /* DMA_SDXI_RING_H */
--
2.54.0
* [PATCH v3 13/23] dmaengine: sdxi: Add unit tests for descriptor ring reservations
2026-06-06 0:02 [PATCH v3 00/23] dmaengine: Smart Data Accelerator Interface (SDXI) basic support Nathan Lynch via B4 Relay
` (11 preceding siblings ...)
2026-06-06 0:02 ` [PATCH v3 12/23] dmaengine: sdxi: Add descriptor ring management Nathan Lynch via B4 Relay
@ 2026-06-06 0:02 ` Nathan Lynch via B4 Relay
2026-06-06 0:16 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 14/23] dmaengine: sdxi: Attach descriptor ring state to contexts Nathan Lynch via B4 Relay
` (9 subsequent siblings)
22 siblings, 1 reply; 40+ messages in thread
From: Nathan Lynch via B4 Relay @ 2026-06-06 0:02 UTC (permalink / raw)
To: Vinod Koul, Frank Li
Cc: Bjorn Helgaas, David Rientjes, John.Kariuki, Jonathan Cameron,
Kinsey Ho, Mario Limonciello, PradeepVineshReddy.Kodamati,
Shivank Garg, Stephen Bates, Tycho Andersen, Wei Huang, Wei Xu,
dmaengine, linux-kernel, linux-pci, Nathan Lynch
From: Nathan Lynch <nathan.lynch@amd.com>
Add KUnit tests for the descriptor ring reservation API, covering:
- Valid reservations: full-ring and single-slot after advancing the
read pointer.
- Error paths: zero or over-capacity count (-EINVAL), inconsistent
index state (-EIO), and insufficient space (-EBUSY).
A .kunitconfig is included for ease of use:
$ tools/testing/kunit/kunit.py run \
--kunitconfig=drivers/dma/sdxi/.kunitconfig
No SDXI hardware is required to run these tests.
Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Nathan Lynch <nathan.lynch@amd.com>
---
drivers/dma/sdxi/.kunitconfig | 4 ++
drivers/dma/sdxi/Kconfig | 10 ++++
drivers/dma/sdxi/Makefile | 3 ++
drivers/dma/sdxi/ring_kunit.c | 105 ++++++++++++++++++++++++++++++++++++++++++
4 files changed, 122 insertions(+)
diff --git a/drivers/dma/sdxi/.kunitconfig b/drivers/dma/sdxi/.kunitconfig
new file mode 100644
index 000000000000..16a1ae04b156
--- /dev/null
+++ b/drivers/dma/sdxi/.kunitconfig
@@ -0,0 +1,4 @@
+CONFIG_KUNIT=y
+CONFIG_DMADEVICES=y
+CONFIG_SDXI_CORE=y
+CONFIG_SDXI_KUNIT_TEST=y
diff --git a/drivers/dma/sdxi/Kconfig b/drivers/dma/sdxi/Kconfig
index b91b44231a04..7217375c9216 100644
--- a/drivers/dma/sdxi/Kconfig
+++ b/drivers/dma/sdxi/Kconfig
@@ -26,3 +26,13 @@ config SDXI_PCI
To compile this driver as a module, choose M here: the module
will be named "sdxi-pci".
+
+config SDXI_KUNIT_TEST
+ tristate "SDXI unit tests" if !KUNIT_ALL_TESTS
+ depends on SDXI_CORE && KUNIT
+ default KUNIT_ALL_TESTS
+ help
+ KUnit tests for parts of the SDXI driver. Does not require
+ SDXI hardware.
+
+ If unsure, say N.
diff --git a/drivers/dma/sdxi/Makefile b/drivers/dma/sdxi/Makefile
index bfcb443f1e64..00e3f1cb0808 100644
--- a/drivers/dma/sdxi/Makefile
+++ b/drivers/dma/sdxi/Makefile
@@ -7,3 +7,6 @@ sdxi-core-y := \
obj-$(CONFIG_SDXI_PCI) += sdxi-pci.o
sdxi-pci-y := pci.o
+
+obj-$(CONFIG_SDXI_KUNIT_TEST) += \
+ ring_kunit.o
diff --git a/drivers/dma/sdxi/ring_kunit.c b/drivers/dma/sdxi/ring_kunit.c
new file mode 100644
index 000000000000..3bc7073e0c39
--- /dev/null
+++ b/drivers/dma/sdxi/ring_kunit.c
@@ -0,0 +1,105 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * SDXI descriptor ring management tests.
+ *
+ * Copyright Advanced Micro Devices, Inc.
+ */
+#include <kunit/device.h>
+#include <kunit/test-bug.h>
+#include <kunit/test.h>
+#include <linux/container_of.h>
+#include <linux/dma-mapping.h>
+#include <linux/module.h>
+#include <linux/packing.h>
+#include <linux/string.h>
+
+#include "ring.h"
+
+MODULE_IMPORT_NS("EXPORTED_FOR_KUNIT_TESTING");
+
+static void valid(struct kunit *t)
+{
+ __le64 wi, ri;
+ struct sdxi_ring_state r;
+ struct sdxi_ring_resv resv;
+ struct sdxi_desc *descs, *desc;
+
+
+ descs = kunit_kmalloc_array(t, SZ_1K, sizeof(descs[0]),
+ GFP_KERNEL | __GFP_ZERO);
+ KUNIT_ASSERT_NOT_NULL(t, descs);
+
+ ri = wi = 0;
+ sdxi_ring_state_init(&r, &ri, &wi, SZ_1K, descs);
+
+ KUNIT_EXPECT_EQ(t, sdxi_ring_try_reserve(&r, r.entries, &resv), 0);
+ KUNIT_EXPECT_EQ(t, resv.range.start, 0);
+ KUNIT_EXPECT_EQ(t, resv.range.end, r.entries - 1);
+ KUNIT_EXPECT_EQ(t, le64_to_cpu(wi), r.entries);
+ sdxi_ring_resv_foreach(&resv, desc) {
+ KUNIT_EXPECT_NOT_NULL_MSG(t, sdxi_ring_resv_next(&resv),
+ "unexpected null descriptor for index %llu", resv.iter);
+ }
+
+ ri = cpu_to_le64(1);
+ KUNIT_EXPECT_EQ(t, sdxi_ring_try_reserve(&r, 1, &resv), 0);
+ KUNIT_EXPECT_EQ(t, le64_to_cpu(wi), r.entries + 1);
+ KUNIT_EXPECT_NOT_NULL(t, sdxi_ring_resv_next(&resv));
+}
+
+static void invalid(struct kunit *t)
+{
+ __le64 wi, ri;
+ struct sdxi_ring_state rs;
+ struct sdxi_ring_resv resv;
+ struct sdxi_desc *descs;
+
+ descs = kunit_kmalloc_array(t, SZ_1K, sizeof(descs[0]),
+ GFP_KERNEL | __GFP_ZERO);
+ KUNIT_ASSERT_NOT_NULL(t, descs);
+
+ ri = wi = 0;
+ sdxi_ring_state_init(&rs, &ri, &wi, SZ_1K, descs);
+
+ KUNIT_EXPECT_EQ(t, sdxi_ring_try_reserve(&rs, 0, &resv), -EINVAL);
+ KUNIT_EXPECT_EQ(t, sdxi_ring_try_reserve(&rs, rs.entries + 1, &resv), -EINVAL);
+
+ ri = cpu_to_le64(1);
+ KUNIT_EXPECT_EQ(t, sdxi_ring_try_reserve(&rs, 1, &resv), -EIO);
+
+ ri = 0;
+ wi = cpu_to_le64(rs.entries);
+ sdxi_ring_state_init(&rs, &ri, &wi, SZ_1K, descs);
+ KUNIT_EXPECT_EQ(t, sdxi_ring_try_reserve(&rs, 1, &resv), -EBUSY);
+
+ ri = cpu_to_le64(rs.entries);
+ wi = cpu_to_le64(rs.entries + 1);
+ sdxi_ring_state_init(&rs, &ri, &wi, SZ_1K, descs);
+ KUNIT_EXPECT_EQ(t, sdxi_ring_try_reserve(&rs, rs.entries, &resv), -EBUSY);
+}
+
+static struct kunit_case testcases[] = {
+ KUNIT_CASE(valid),
+ KUNIT_CASE(invalid),
+ {}
+};
+
+static int setup_device(struct kunit *t)
+{
+ struct device *dev = kunit_device_register(t, "sdxi-mock-device");
+
+ KUNIT_ASSERT_NOT_ERR_OR_NULL(t, dev);
+ t->priv = dev;
+ return 0;
+}
+
+static struct kunit_suite generic_desc_ts = {
+ .name = "SDXI descriptor ring management",
+ .test_cases = testcases,
+ .init = setup_device,
+};
+kunit_test_suite(generic_desc_ts);
+
+MODULE_DESCRIPTION("SDXI descriptor ring tests");
+MODULE_AUTHOR("Nathan Lynch");
+MODULE_LICENSE("GPL");
--
2.54.0
* [PATCH v3 14/23] dmaengine: sdxi: Attach descriptor ring state to contexts
2026-06-06 0:02 [PATCH v3 00/23] dmaengine: Smart Data Accelerator Interface (SDXI) basic support Nathan Lynch via B4 Relay
` (12 preceding siblings ...)
2026-06-06 0:02 ` [PATCH v3 13/23] dmaengine: sdxi: Add unit tests for descriptor ring reservations Nathan Lynch via B4 Relay
@ 2026-06-06 0:02 ` Nathan Lynch via B4 Relay
2026-06-06 0:24 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 15/23] dmaengine: sdxi: Per-context access key (AKey) table entry allocator Nathan Lynch via B4 Relay
` (8 subsequent siblings)
22 siblings, 1 reply; 40+ messages in thread
From: Nathan Lynch via B4 Relay @ 2026-06-06 0:02 UTC (permalink / raw)
To: Vinod Koul, Frank Li
Cc: Bjorn Helgaas, David Rientjes, John.Kariuki, Jonathan Cameron,
Kinsey Ho, Mario Limonciello, PradeepVineshReddy.Kodamati,
Shivank Garg, Stephen Bates, Tycho Andersen, Wei Huang, Wei Xu,
dmaengine, linux-kernel, linux-pci, Frank Li, Nathan Lynch
From: Nathan Lynch <nathan.lynch@amd.com>
Attach an instance of struct sdxi_ring_state to each context upon
allocation. Each ring state has the same lifetime as its context and
is freed upon context release.
Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Nathan Lynch <nathan.lynch@amd.com>
---
drivers/dma/sdxi/context.c | 14 ++++++++++++++
drivers/dma/sdxi/context.h | 2 ++
2 files changed, 16 insertions(+)
diff --git a/drivers/dma/sdxi/context.c b/drivers/dma/sdxi/context.c
index 2f9a29df5d9c..907547ac450f 100644
--- a/drivers/dma/sdxi/context.c
+++ b/drivers/dma/sdxi/context.c
@@ -23,6 +23,7 @@
#include "context.h"
#include "hw.h"
+#include "ring.h"
#include "sdxi.h"
#define DEFAULT_DESC_RING_ENTRIES 1024
@@ -63,6 +64,7 @@ static void sdxi_free_cxt(struct sdxi_cxt *cxt)
dma_free_coherent(sdxi->dev, sq->ring_size,
sq->desc_ring, sq->ring_dma);
kfree(cxt->sq);
+ kfree(cxt->ring_state);
kfree(cxt);
}
@@ -80,6 +82,10 @@ static struct sdxi_cxt *sdxi_alloc_cxt(struct sdxi_dev *sdxi)
cxt->sdxi = sdxi;
+ cxt->ring_state = kzalloc_obj(*cxt->ring_state, GFP_KERNEL);
+ if (!cxt->ring_state)
+ return NULL;
+
cxt->sq = kzalloc_obj(*cxt->sq, GFP_KERNEL);
if (!cxt->sq)
return NULL;
@@ -314,6 +320,8 @@ int sdxi_admin_cxt_init(struct sdxi_dev *sdxi)
sq->cxt_sts->state = FIELD_PREP(SDXI_CXT_STS_STATE, CXTV_RUN);
cxt->id = SDXI_ADMIN_CXT_ID;
cxt->db = sdxi->dbs + cxt->id * sdxi->db_stride;
+ sdxi_ring_state_init(cxt->ring_state, &sq->cxt_sts->read_index,
+ sq->write_index, sq->ring_entries, sq->desc_ring);
err = sdxi_publish_cxt(cxt);
if (err)
@@ -380,6 +388,8 @@ static void sdxi_cxt_id_assign(struct sdxi_cxt *cxt, struct sdxi_cxt_id *cxt_id)
*/
struct sdxi_cxt *sdxi_cxt_new(struct sdxi_dev *sdxi)
{
+ struct sdxi_sq *sq;
+
/*
* Ensure an ID is available before allocating memory for the
* context and its control structures.
@@ -396,6 +406,10 @@ struct sdxi_cxt *sdxi_cxt_new(struct sdxi_dev *sdxi)
cxt->db = sdxi->dbs + cxt->id * sdxi->db_stride;
+ sq = cxt->sq;
+ sdxi_ring_state_init(cxt->ring_state, &sq->cxt_sts->read_index,
+ sq->write_index, sq->ring_entries, sq->desc_ring);
+
if (sdxi_publish_cxt(cxt))
return NULL;
diff --git a/drivers/dma/sdxi/context.h b/drivers/dma/sdxi/context.h
index 42e377bb2446..0aebcba3dc1e 100644
--- a/drivers/dma/sdxi/context.h
+++ b/drivers/dma/sdxi/context.h
@@ -55,6 +55,8 @@ struct sdxi_cxt {
dma_addr_t akey_table_dma;
struct sdxi_sq *sq;
+
+ struct sdxi_ring_state *ring_state;
};
int sdxi_admin_cxt_init(struct sdxi_dev *sdxi);
--
2.54.0
* [PATCH v3 15/23] dmaengine: sdxi: Per-context access key (AKey) table entry allocator
2026-06-06 0:02 [PATCH v3 00/23] dmaengine: Smart Data Accelerator Interface (SDXI) basic support Nathan Lynch via B4 Relay
` (13 preceding siblings ...)
2026-06-06 0:02 ` [PATCH v3 14/23] dmaengine: sdxi: Attach descriptor ring state to contexts Nathan Lynch via B4 Relay
@ 2026-06-06 0:02 ` Nathan Lynch via B4 Relay
2026-06-06 0:20 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 16/23] dmaengine: sdxi: Generic descriptor manipulation helpers Nathan Lynch via B4 Relay
` (7 subsequent siblings)
22 siblings, 1 reply; 40+ messages in thread
From: Nathan Lynch via B4 Relay @ 2026-06-06 0:02 UTC (permalink / raw)
To: Vinod Koul, Frank Li
Cc: Bjorn Helgaas, David Rientjes, John.Kariuki, Jonathan Cameron,
Kinsey Ho, Mario Limonciello, PradeepVineshReddy.Kodamati,
Shivank Garg, Stephen Bates, Tycho Andersen, Wei Huang, Wei Xu,
dmaengine, linux-kernel, linux-pci, Frank Li, Nathan Lynch
From: Nathan Lynch <nathan.lynch@amd.com>
Each SDXI context has a table of access keys (AKeys). SDXI descriptors
submitted to a context may refer to an AKey associated with that
context by its index in the table. AKeys describe properties of the
access that the descriptor is to perform, such as a PASID, a target
SDXI function, or an interrupt to trigger.
Use a per-context IDA to keep track of used entries in the table.
Provide sdxi_alloc_akey(), which claims an AKey table entry for the
caller to program directly; sdxi_akey_index(), which returns the
entry's index for programming into descriptors the caller intends to
submit; and sdxi_free_akey(), which clears the entry and makes it
available again.
The DMA engine provider is currently the only user and allocates a
single entry that encodes the access properties for copy operations
and a completion interrupt. More complex use patterns are possible
when user space gains access to SDXI contexts (not in this series).
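The allocate/index/free pattern described above can be sketched in user
space as follows. This is only an illustration under stated assumptions:
every demo_* name and the table size are hypothetical, and a plain
bitmask stands in for the per-context IDA that the patch uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table size; the driver derives it from the real table type. */
#define DEMO_AKEY_ENTRIES 8

/* 16 bytes per entry, matching the static_assert on struct sdxi_akey_ent. */
struct demo_akey_ent {
	uint8_t bytes[16];
};

struct demo_cxt {
	uint32_t used; /* bit i set => entry i is claimed (stand-in for the IDA) */
	struct demo_akey_ent entry[DEMO_AKEY_ENTRIES];
};

/* Claim the lowest free entry, or return NULL if the table is full. */
struct demo_akey_ent *demo_alloc_akey(struct demo_cxt *cxt)
{
	for (unsigned int i = 0; i < DEMO_AKEY_ENTRIES; i++) {
		if (!(cxt->used & (UINT32_C(1) << i))) {
			cxt->used |= UINT32_C(1) << i;
			return &cxt->entry[i];
		}
	}
	return NULL;
}

/* Recover the table index from the entry pointer, for use in descriptors. */
unsigned int demo_akey_index(const struct demo_cxt *cxt,
			     const struct demo_akey_ent *akey)
{
	ptrdiff_t idx = akey - &cxt->entry[0];

	assert(idx >= 0 && idx < DEMO_AKEY_ENTRIES);
	return (unsigned int)idx;
}

/* Clear the entry contents, then make the slot available again. */
void demo_free_akey(struct demo_cxt *cxt, struct demo_akey_ent *akey)
{
	unsigned int idx = demo_akey_index(cxt, akey);

	memset(akey, 0, sizeof(*akey));
	cxt->used &= ~(UINT32_C(1) << idx);
}
```

Returning a pointer to the entry rather than an index lets the caller
program the entry in place; the index is derived only when a descriptor
needs it.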
Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Nathan Lynch <nathan.lynch@amd.com>
---
drivers/dma/sdxi/context.c | 4 ++++
drivers/dma/sdxi/context.h | 24 ++++++++++++++++++++++++
2 files changed, 28 insertions(+)
diff --git a/drivers/dma/sdxi/context.c b/drivers/dma/sdxi/context.c
index 907547ac450f..9b0984842d9a 100644
--- a/drivers/dma/sdxi/context.c
+++ b/drivers/dma/sdxi/context.c
@@ -15,6 +15,7 @@
#include <linux/dma-mapping.h>
#include <linux/dmapool.h>
#include <linux/errno.h>
+#include <linux/idr.h>
#include <linux/iommu.h>
#include <linux/slab.h>
#include <linux/types.h>
@@ -64,6 +65,7 @@ static void sdxi_free_cxt(struct sdxi_cxt *cxt)
dma_free_coherent(sdxi->dev, sq->ring_size,
sq->desc_ring, sq->ring_dma);
kfree(cxt->sq);
+ ida_destroy(&cxt->akey_ida);
kfree(cxt->ring_state);
kfree(cxt);
}
@@ -322,6 +324,7 @@ int sdxi_admin_cxt_init(struct sdxi_dev *sdxi)
cxt->db = sdxi->dbs + cxt->id * sdxi->db_stride;
sdxi_ring_state_init(cxt->ring_state, &sq->cxt_sts->read_index,
sq->write_index, sq->ring_entries, sq->desc_ring);
+ ida_init(&cxt->akey_ida);
err = sdxi_publish_cxt(cxt);
if (err)
@@ -409,6 +412,7 @@ struct sdxi_cxt *sdxi_cxt_new(struct sdxi_dev *sdxi)
sq = cxt->sq;
sdxi_ring_state_init(cxt->ring_state, &sq->cxt_sts->read_index,
sq->write_index, sq->ring_entries, sq->desc_ring);
+ ida_init(&cxt->akey_ida);
if (sdxi_publish_cxt(cxt))
return NULL;
diff --git a/drivers/dma/sdxi/context.h b/drivers/dma/sdxi/context.h
index 0aebcba3dc1e..a8511f18db5d 100644
--- a/drivers/dma/sdxi/context.h
+++ b/drivers/dma/sdxi/context.h
@@ -6,8 +6,11 @@
#ifndef DMA_SDXI_CONTEXT_H
#define DMA_SDXI_CONTEXT_H
+#include <linux/array_size.h>
#include <linux/dma-mapping.h>
+#include <linux/idr.h>
#include <linux/io.h>
+#include <linux/string.h>
#include <linux/types.h>
#include "hw.h"
@@ -51,6 +54,7 @@ struct sdxi_cxt {
struct sdxi_cxt_ctl *cxt_ctl;
dma_addr_t cxt_ctl_dma;
+ struct ida akey_ida;
struct sdxi_akey_table *akey_table;
dma_addr_t akey_table_dma;
@@ -79,4 +83,24 @@ static inline void sdxi_cxt_push_doorbell(struct sdxi_cxt *cxt, u64 index)
writeq(index, cxt->db);
}
+static inline struct sdxi_akey_ent *sdxi_alloc_akey(struct sdxi_cxt *cxt)
+{
+ unsigned int max = ARRAY_SIZE(cxt->akey_table->entry) - 1;
+ int idx = ida_alloc_max(&cxt->akey_ida, max, GFP_KERNEL);
+
+ return idx < 0 ? NULL : &cxt->akey_table->entry[idx];
+}
+
+static inline unsigned int sdxi_akey_index(const struct sdxi_cxt *cxt,
+ const struct sdxi_akey_ent *akey)
+{
+ return akey - &cxt->akey_table->entry[0];
+}
+
+static inline void sdxi_free_akey(struct sdxi_cxt *cxt, struct sdxi_akey_ent *akey)
+{
+ memset(akey, 0, sizeof(*akey));
+ ida_free(&cxt->akey_ida, sdxi_akey_index(cxt, akey));
+}
+
#endif /* DMA_SDXI_CONTEXT_H */
--
2.54.0
* [PATCH v3 16/23] dmaengine: sdxi: Generic descriptor manipulation helpers
2026-06-06 0:02 [PATCH v3 00/23] dmaengine: Smart Data Accelerator Interface (SDXI) basic support Nathan Lynch via B4 Relay
` (14 preceding siblings ...)
2026-06-06 0:02 ` [PATCH v3 15/23] dmaengine: sdxi: Per-context access key (AKey) table entry allocator Nathan Lynch via B4 Relay
@ 2026-06-06 0:02 ` Nathan Lynch via B4 Relay
2026-06-06 0:02 ` [PATCH v3 17/23] dmaengine: sdxi: Add completion status block API Nathan Lynch via B4 Relay
` (6 subsequent siblings)
22 siblings, 0 replies; 40+ messages in thread
From: Nathan Lynch via B4 Relay @ 2026-06-06 0:02 UTC (permalink / raw)
To: Vinod Koul, Frank Li
Cc: Bjorn Helgaas, David Rientjes, John.Kariuki, Jonathan Cameron,
Kinsey Ho, Mario Limonciello, PradeepVineshReddy.Kodamati,
Shivank Garg, Stephen Bates, Tycho Andersen, Wei Huang, Wei Xu,
dmaengine, linux-kernel, linux-pci, Frank Li, Nathan Lynch
From: Nathan Lynch <nathan.lynch@amd.com>
Introduce small helper functions for manipulating certain common
properties of descriptors after their operation-specific encoding has
been performed but before they are submitted.
sdxi_desc_set_csb() associates an optional completion status block
with a descriptor.
sdxi_desc_set_fence() forces retirement of any prior descriptors in
the ring before the target descriptor is executed. This is useful for
interrupt descriptors that signal the completion of an operation.
sdxi_desc_set_sequential() ensures that all writes from prior
descriptor operations in the same context are made globally visible
prior to making writes from the target descriptor globally visible.
sdxi_desc_make_valid() sets the descriptor validity bit, transferring
ownership of the descriptor from software to the SDXI
implementation. (The implementation is allowed to execute the
descriptor at this point, but the caller is still obligated to push
the doorbell to ensure execution occurs.)
Each of the preceding functions will warn if invoked on a descriptor
that has already been released to the SDXI implementation (i.e. had
its validity bit set).
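As a rough user-space illustration of the bit layout these helpers
operate on (VL in bit 0, SE in bit 1, FE in bit 2 of the opcode word, and
the completion block address in bits 63:5 of csb_ptr), consider the
sketch below. The demo_* names are hypothetical, values are host-endian
rather than little-endian, and assert() stands in for the ratelimited
warning; the memory barrier in the real sdxi_desc_make_valid() is only
noted in a comment.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_DSC_VL (UINT32_C(1) << 0) /* validity: owned by the implementation */
#define DEMO_DSC_SE (UINT32_C(1) << 1) /* sequential */
#define DEMO_DSC_FE (UINT32_C(1) << 2) /* fence */

/* Reduced descriptor: only the two fields the helpers touch. */
struct demo_desc {
	uint32_t opcode;
	uint64_t csb_ptr;
};

/* Store the address in bits 63:5; the low five bits are dropped. */
void demo_desc_set_csb(struct demo_desc *d, uint64_t addr)
{
	assert(!(d->opcode & DEMO_DSC_VL));
	d->csb_ptr = (addr >> 5) << 5;
}

void demo_desc_set_fence(struct demo_desc *d)
{
	assert(!(d->opcode & DEMO_DSC_VL));
	d->opcode |= DEMO_DSC_FE;
}

void demo_desc_set_sequential(struct demo_desc *d)
{
	assert(!(d->opcode & DEMO_DSC_VL));
	d->opcode |= DEMO_DSC_SE;
}

/*
 * Must come last. The kernel helper issues dma_wmb() before this store
 * so the validity bit is ordered after the rest of the descriptor.
 */
void demo_desc_make_valid(struct demo_desc *d)
{
	assert(!(d->opcode & DEMO_DSC_VL));
	d->opcode |= DEMO_DSC_VL;
}
```

The ordering constraint is the point of the sketch: every setter checks
that VL is still clear, so setting VL has to be the final step.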
Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Nathan Lynch <nathan.lynch@amd.com>
---
drivers/dma/sdxi/descriptor.h | 64 +++++++++++++++++++++++++++++++++++++++++++
drivers/dma/sdxi/hw.h | 9 ++++++
2 files changed, 73 insertions(+)
diff --git a/drivers/dma/sdxi/descriptor.h b/drivers/dma/sdxi/descriptor.h
new file mode 100644
index 000000000000..c0f01b1be726
--- /dev/null
+++ b/drivers/dma/sdxi/descriptor.h
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef DMA_SDXI_DESCRIPTOR_H
+#define DMA_SDXI_DESCRIPTOR_H
+
+/*
+ * Facilities for encoding SDXI descriptors.
+ *
+ * Copyright Advanced Micro Devices, Inc.
+ */
+
+#include <linux/bitfield.h>
+#include <linux/ratelimit.h>
+#include <linux/types.h>
+#include <asm/byteorder.h>
+
+#include "hw.h"
+
+static inline void sdxi_desc_vl_expect(const struct sdxi_desc *desc, bool expected)
+{
+ u8 vl = FIELD_GET(SDXI_DSC_VL, le32_to_cpu(desc->opcode));
+
+ WARN_RATELIMIT(vl != expected, "expected vl=%u but got %u\n", expected, vl);
+}
+
+static inline void sdxi_desc_set_csb(struct sdxi_desc *desc, dma_addr_t addr)
+{
+ sdxi_desc_vl_expect(desc, 0);
+ desc->csb_ptr = cpu_to_le64(FIELD_PREP(SDXI_DSC_CSB_PTR, addr >> 5));
+}
+
+static inline void sdxi_desc_make_valid(struct sdxi_desc *desc)
+{
+ u32 opcode = le32_to_cpu(desc->opcode);
+
+ sdxi_desc_vl_expect(desc, 0);
+ FIELD_MODIFY(SDXI_DSC_VL, &opcode, 1);
+ /*
+ * Once vl is set, no more modifications to the descriptor
+ * payload are allowed. Ensure the vl update is ordered after
+ * all other initialization of the descriptor.
+ */
+ dma_wmb();
+ WRITE_ONCE(desc->opcode, cpu_to_le32(opcode));
+}
+
+static inline void sdxi_desc_set_fence(struct sdxi_desc *desc)
+{
+ u32 opcode = le32_to_cpu(desc->opcode);
+
+ sdxi_desc_vl_expect(desc, 0);
+ FIELD_MODIFY(SDXI_DSC_FE, &opcode, 1);
+ desc->opcode = cpu_to_le32(opcode);
+}
+
+static inline void sdxi_desc_set_sequential(struct sdxi_desc *desc)
+{
+ u32 opcode = le32_to_cpu(desc->opcode);
+
+ sdxi_desc_vl_expect(desc, 0);
+ FIELD_MODIFY(SDXI_DSC_SE, &opcode, 1);
+ desc->opcode = cpu_to_le32(opcode);
+}
+
+#endif /* DMA_SDXI_DESCRIPTOR_H */
diff --git a/drivers/dma/sdxi/hw.h b/drivers/dma/sdxi/hw.h
index 4b65337a5975..29aa6c7e6c23 100644
--- a/drivers/dma/sdxi/hw.h
+++ b/drivers/dma/sdxi/hw.h
@@ -140,6 +140,15 @@ struct sdxi_desc {
__u8 operation[52];
__le64 csb_ptr;
);
+
+/* For opcode field */
+#define SDXI_DSC_VL BIT(0)
+#define SDXI_DSC_SE BIT(1)
+#define SDXI_DSC_FE BIT(2)
+
+/* For csb_ptr field */
+#define SDXI_DSC_CSB_PTR GENMASK_ULL(63, 5)
+
};
} __packed __aligned(64);
static_assert(sizeof(struct sdxi_desc) == 64);
--
2.54.0
* [PATCH v3 17/23] dmaengine: sdxi: Add completion status block API
2026-06-06 0:02 [PATCH v3 00/23] dmaengine: Smart Data Accelerator Interface (SDXI) basic support Nathan Lynch via B4 Relay
` (15 preceding siblings ...)
2026-06-06 0:02 ` [PATCH v3 16/23] dmaengine: sdxi: Generic descriptor manipulation helpers Nathan Lynch via B4 Relay
@ 2026-06-06 0:02 ` Nathan Lynch via B4 Relay
2026-06-06 0:21 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 18/23] dmaengine: sdxi: Encode context start, stop, and sync descriptors Nathan Lynch via B4 Relay
` (5 subsequent siblings)
22 siblings, 1 reply; 40+ messages in thread
From: Nathan Lynch via B4 Relay @ 2026-06-06 0:02 UTC (permalink / raw)
To: Vinod Koul, Frank Li
Cc: Bjorn Helgaas, David Rientjes, John.Kariuki, Jonathan Cameron,
Kinsey Ho, Mario Limonciello, PradeepVineshReddy.Kodamati,
Shivank Garg, Stephen Bates, Tycho Andersen, Wei Huang, Wei Xu,
dmaengine, linux-kernel, linux-pci, Frank Li, Nathan Lynch
From: Nathan Lynch <nathan.lynch@amd.com>
Introduce an API for managing completion status blocks. These are
DMA-coherent buffers that may be optionally attached to SDXI
descriptors to signal completion. The SDXI implementation clears the
signal field (initialized to 1) upon completion, setting an
error bit in the flags field if problems were encountered executing
the descriptor.
Callers allocate completion blocks from a per-device DMA pool via
sdxi_completion_alloc(). sdxi_completion_attach() associates a
completion with a descriptor by encoding the completion's DMA address
into the descriptor's csb_ptr field.
sdxi_completion_poll() busy-waits until the signal field is cleared by
the implementation, and is intended for descriptors that are expected
to execute quickly.
sdxi_completion_signaled() and sdxi_completion_errored() query the
signal field and error flag of the completion, respectively.
struct sdxi_completion is kept opaque to callers. A DEFINE_FREE
cleanup handler is provided.
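The signal/error protocol described above can be sketched in user space
as follows. This is a simplified model, not the driver code: the demo_*
names are hypothetical, an iteration budget replaces the jiffies-based
one-second deadline, and volatile accesses stand in for READ_ONCE() and
dma_rmb().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_CST_BLK_ER_BIT (UINT32_C(1) << 31)

/* Reduced completion status block: signal word plus flags. */
struct demo_cst_blk {
	volatile uint64_t signal; /* 1 while pending, cleared on completion */
	volatile uint32_t flags;  /* bit 31 set if execution failed */
};

void demo_completion_init(struct demo_cst_blk *blk)
{
	blk->signal = 1;
	blk->flags = 0;
}

bool demo_completion_signaled(const struct demo_cst_blk *blk)
{
	return blk->signal == 0;
}

bool demo_completion_errored(const struct demo_cst_blk *blk)
{
	return (blk->flags & DEMO_CST_BLK_ER_BIT) != 0;
}

/*
 * Busy-wait until the signal clears or the budget runs out, then
 * report whether the error flag was set.
 */
int demo_completion_poll(const struct demo_cst_blk *blk, unsigned long budget)
{
	assert(blk);

	while (!demo_completion_signaled(blk)) {
		if (budget-- == 0)
			return -ETIMEDOUT;
	}

	return demo_completion_errored(blk) ? -EIO : 0;
}
```

Checking the error flag only after the signal clears mirrors the order
used by sdxi_completion_poll() in the patch.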
Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Nathan Lynch <nathan.lynch@amd.com>
---
drivers/dma/sdxi/Makefile | 1 +
drivers/dma/sdxi/completion.c | 87 +++++++++++++++++++++++++++++++++++++++++++
drivers/dma/sdxi/completion.h | 25 +++++++++++++
drivers/dma/sdxi/hw.h | 1 +
4 files changed, 114 insertions(+)
diff --git a/drivers/dma/sdxi/Makefile b/drivers/dma/sdxi/Makefile
index 00e3f1cb0808..eacad504a816 100644
--- a/drivers/dma/sdxi/Makefile
+++ b/drivers/dma/sdxi/Makefile
@@ -1,6 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_SDXI_CORE) += sdxi-core.o
sdxi-core-y := \
+ completion.o \
context.o \
device.o \
ring.o
diff --git a/drivers/dma/sdxi/completion.c b/drivers/dma/sdxi/completion.c
new file mode 100644
index 000000000000..7ffd034b129b
--- /dev/null
+++ b/drivers/dma/sdxi/completion.c
@@ -0,0 +1,87 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * SDXI Descriptor Completion Status Block handling.
+ *
+ * Copyright Advanced Micro Devices, Inc.
+ */
+#include <linux/cleanup.h>
+#include <linux/dma-mapping.h>
+#include <linux/dmapool.h>
+#include <linux/jiffies.h>
+#include <linux/slab.h>
+
+#include "completion.h"
+#include "descriptor.h"
+#include "hw.h"
+
+struct sdxi_completion {
+ struct sdxi_dev *sdxi;
+ struct sdxi_cst_blk *cst_blk;
+ dma_addr_t cst_blk_dma;
+};
+
+struct sdxi_completion *sdxi_completion_alloc(struct sdxi_dev *sdxi)
+{
+ struct sdxi_cst_blk *cst_blk;
+ dma_addr_t cst_blk_dma;
+
+ /*
+ * Assume callers can't tolerate GFP_KERNEL and use
+ * GFP_NOWAIT. Add a gfp_t flags parameter if that changes.
+ */
+ struct sdxi_completion *sc __free(kfree) = kmalloc(sizeof(*sc), GFP_NOWAIT);
+ if (!sc)
+ return NULL;
+
+ cst_blk = dma_pool_zalloc(sdxi->cst_blk_pool, GFP_NOWAIT, &cst_blk_dma);
+ if (!cst_blk)
+ return NULL;
+
+ cst_blk->signal = cpu_to_le64(1);
+
+ *sc = (typeof(*sc)) {
+ .sdxi = sdxi,
+ .cst_blk = cst_blk,
+ .cst_blk_dma = cst_blk_dma,
+ };
+
+ return_ptr(sc);
+}
+
+void sdxi_completion_free(struct sdxi_completion *sc)
+{
+ dma_pool_free(sc->sdxi->cst_blk_pool, sc->cst_blk, sc->cst_blk_dma);
+ kfree(sc);
+}
+
+int sdxi_completion_poll(const struct sdxi_completion *sc)
+{
+ unsigned long deadline = jiffies + msecs_to_jiffies(1000);
+
+ while (le64_to_cpu(READ_ONCE(sc->cst_blk->signal)) != 0) {
+ if (time_after(jiffies, deadline))
+ return -ETIMEDOUT;
+ cpu_relax();
+ }
+
+ return sdxi_completion_errored(sc) ? -EIO : 0;
+}
+
+bool sdxi_completion_signaled(const struct sdxi_completion *sc)
+{
+ dma_rmb();
+ return (sc->cst_blk->signal == 0);
+}
+
+bool sdxi_completion_errored(const struct sdxi_completion *sc)
+{
+ dma_rmb();
+ return FIELD_GET(SDXI_CST_BLK_ER_BIT, le32_to_cpu(sc->cst_blk->flags));
+}
+
+
+void sdxi_completion_attach(struct sdxi_desc *desc,
+ const struct sdxi_completion *cs)
+{
+ sdxi_desc_set_csb(desc, cs->cst_blk_dma);
+}
diff --git a/drivers/dma/sdxi/completion.h b/drivers/dma/sdxi/completion.h
new file mode 100644
index 000000000000..2d11568ac2b9
--- /dev/null
+++ b/drivers/dma/sdxi/completion.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright Advanced Micro Devices, Inc. */
+#ifndef DMA_SDXI_COMPLETION_H
+#define DMA_SDXI_COMPLETION_H
+
+#include <linux/compiler_attributes.h>
+#include "sdxi.h"
+
+/*
+ * Polled completion status block that can be attached to a
+ * descriptor.
+ */
+struct sdxi_completion;
+struct sdxi_desc;
+struct sdxi_completion *sdxi_completion_alloc(struct sdxi_dev *sdxi);
+void sdxi_completion_free(struct sdxi_completion *sc);
+int __must_check sdxi_completion_poll(const struct sdxi_completion *sc);
+void sdxi_completion_attach(struct sdxi_desc *desc,
+ const struct sdxi_completion *sc);
+bool sdxi_completion_signaled(const struct sdxi_completion *sc);
+bool sdxi_completion_errored(const struct sdxi_completion *sc);
+
+DEFINE_FREE(sdxi_completion, struct sdxi_completion *, if (_T) sdxi_completion_free(_T))
+
+#endif /* DMA_SDXI_COMPLETION_H */
diff --git a/drivers/dma/sdxi/hw.h b/drivers/dma/sdxi/hw.h
index 29aa6c7e6c23..5c5bef66f1fb 100644
--- a/drivers/dma/sdxi/hw.h
+++ b/drivers/dma/sdxi/hw.h
@@ -125,6 +125,7 @@ static_assert(sizeof(struct sdxi_akey_ent) == 16);
struct sdxi_cst_blk {
__le64 signal;
__le32 flags;
+#define SDXI_CST_BLK_ER_BIT BIT(31)
__u8 rsvd_0[20];
} __packed __aligned(32);
static_assert(sizeof(struct sdxi_cst_blk) == 32);
--
2.54.0
* [PATCH v3 18/23] dmaengine: sdxi: Encode context start, stop, and sync descriptors
2026-06-06 0:02 [PATCH v3 00/23] dmaengine: Smart Data Accelerator Interface (SDXI) basic support Nathan Lynch via B4 Relay
` (16 preceding siblings ...)
2026-06-06 0:02 ` [PATCH v3 17/23] dmaengine: sdxi: Add completion status block API Nathan Lynch via B4 Relay
@ 2026-06-06 0:02 ` Nathan Lynch via B4 Relay
2026-06-06 0:02 ` [PATCH v3 19/23] dmaengine: sdxi: Provide context start and stop APIs Nathan Lynch via B4 Relay
` (4 subsequent siblings)
22 siblings, 0 replies; 40+ messages in thread
From: Nathan Lynch via B4 Relay @ 2026-06-06 0:02 UTC (permalink / raw)
To: Vinod Koul, Frank Li
Cc: Bjorn Helgaas, David Rientjes, John.Kariuki, Jonathan Cameron,
Kinsey Ho, Mario Limonciello, PradeepVineshReddy.Kodamati,
Shivank Garg, Stephen Bates, Tycho Andersen, Wei Huang, Wei Xu,
dmaengine, linux-kernel, linux-pci, Frank Li, Nathan Lynch
From: Nathan Lynch <nathan.lynch@amd.com>
Introduce the low-level support for serializing three operation types
to the descriptor ring of the admin context: context start, context
stop, and sync. Each operation has its own distinct type that overlays
the generic struct sdxi_desc, along with a dedicated encoder function
that accepts an operation-specific parameter struct.
The parameter structs (sdxi_cxt_start, sdxi_cxt_stop, sdxi_sync)
expose only the subset of descriptor fields that callers currently
need, e.g. the target context range. These can be expanded over
time as needed.
Each encoder function is intended to 1) set any mandatory field values
for the descriptor type (e.g. SDXI_DSC_FE=1 for context start); and 2)
translate conventional kernel types (dma_addr_t, CPU-endian values)
from the parameter block to the descriptor in memory. While they're
expected to operate directly on descriptor ring memory, they do not
set the descriptor validity bit. That is left to the caller, which may
need to make other modifications to the descriptor, such as attaching a
completion block, before releasing it to the SDXI implementation.
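To make the opcode encoding concrete, the following user-space sketch
composes the opcode word from the field positions defined in this patch
(FE in bit 2, subtype in bits 15:8, type in bits 26:16). The demo_* names
are hypothetical and the result is host-endian; the real encoders convert
with cpu_to_le32() and fill in the rest of the descriptor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_DSC_FE            (UINT32_C(1) << 2)
#define DEMO_DSC_SUBTYPE_SHIFT 8
#define DEMO_DSC_SUBTYPE_MAX   UINT32_C(0xff)  /* bits 15:8 */
#define DEMO_DSC_TYPE_SHIFT    16
#define DEMO_DSC_TYPE_MAX      UINT32_C(0x7ff) /* bits 26:16 */

/* Values taken from the enums added to hw.h in this patch. */
#define DEMO_OP_TYPE_ADMIN           UINT32_C(0x002)
#define DEMO_OP_SUBTYPE_CXT_START_NM UINT32_C(0x03)
#define DEMO_OP_SUBTYPE_CXT_STOP     UINT32_C(0x04)
#define DEMO_OP_SUBTYPE_SYNC         UINT32_C(0x06)

/* Pack type, subtype, and the optional fence bit; VL is left clear. */
uint32_t demo_opcode(uint32_t type, uint32_t subtype, bool fence)
{
	assert(type <= DEMO_DSC_TYPE_MAX);
	assert(subtype <= DEMO_DSC_SUBTYPE_MAX);

	return (fence ? DEMO_DSC_FE : 0) |
	       (subtype << DEMO_DSC_SUBTYPE_SHIFT) |
	       (type << DEMO_DSC_TYPE_SHIFT);
}
```

Under these definitions, a context start descriptor (fence set) encodes
as 0x00020304 and a sync descriptor (no fence) as 0x00020600.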
Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Nathan Lynch <nathan.lynch@amd.com>
---
drivers/dma/sdxi/Makefile | 1 +
drivers/dma/sdxi/descriptor.c | 91 +++++++++++++++++++++++++++++++++++++++++++
drivers/dma/sdxi/descriptor.h | 46 ++++++++++++++++++++++
drivers/dma/sdxi/hw.h | 64 ++++++++++++++++++++++++++++++
4 files changed, 202 insertions(+)
diff --git a/drivers/dma/sdxi/Makefile b/drivers/dma/sdxi/Makefile
index eacad504a816..9b051eca9af7 100644
--- a/drivers/dma/sdxi/Makefile
+++ b/drivers/dma/sdxi/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_SDXI_CORE) += sdxi-core.o
sdxi-core-y := \
completion.o \
context.o \
+ descriptor.o \
device.o \
ring.o
diff --git a/drivers/dma/sdxi/descriptor.c b/drivers/dma/sdxi/descriptor.c
new file mode 100644
index 000000000000..be2a9244ce19
--- /dev/null
+++ b/drivers/dma/sdxi/descriptor.c
@@ -0,0 +1,91 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * SDXI descriptor encoding.
+ *
+ * Copyright Advanced Micro Devices, Inc.
+ */
+
+#include <kunit/visibility.h>
+#include <linux/bitfield.h>
+#include <linux/types.h>
+#include <asm/byteorder.h>
+
+#include "hw.h"
+#include "descriptor.h"
+
+int sdxi_encode_cxt_start(struct sdxi_desc *desc,
+ const struct sdxi_cxt_start *params)
+{
+ u64 csb_ptr;
+ u32 opcode;
+
+ opcode = (FIELD_PREP(SDXI_DSC_FE, 1) |
+ FIELD_PREP(SDXI_DSC_SUBTYPE, SDXI_DSC_OP_SUBTYPE_CXT_START_NM) |
+ FIELD_PREP(SDXI_DSC_TYPE, SDXI_DSC_OP_TYPE_ADMIN));
+
+ csb_ptr = FIELD_PREP(SDXI_DSC_NP, 1);
+
+ *desc = (typeof(*desc)) {
+ .cxt_start = (typeof(desc->cxt_start)) {
+ .opcode = cpu_to_le32(opcode),
+ .cxt_start = cpu_to_le16(params->range.cxt_start),
+ .cxt_end = cpu_to_le16(params->range.cxt_end),
+ .csb_ptr = cpu_to_le64(csb_ptr),
+ },
+ };
+
+ return 0;
+}
+EXPORT_SYMBOL_IF_KUNIT(sdxi_encode_cxt_start);
+
+int sdxi_encode_cxt_stop(struct sdxi_desc *desc,
+ const struct sdxi_cxt_stop *params)
+{
+ u64 csb_ptr;
+ u32 opcode;
+
+ opcode = (FIELD_PREP(SDXI_DSC_FE, 1) |
+ FIELD_PREP(SDXI_DSC_SUBTYPE, SDXI_DSC_OP_SUBTYPE_CXT_STOP) |
+ FIELD_PREP(SDXI_DSC_TYPE, SDXI_DSC_OP_TYPE_ADMIN));
+
+ csb_ptr = FIELD_PREP(SDXI_DSC_NP, 1);
+
+ *desc = (typeof(*desc)) {
+ .cxt_stop = (typeof(desc->cxt_stop)) {
+ .opcode = cpu_to_le32(opcode),
+ .cxt_start = cpu_to_le16(params->range.cxt_start),
+ .cxt_end = cpu_to_le16(params->range.cxt_end),
+ .csb_ptr = cpu_to_le64(csb_ptr),
+ },
+ };
+
+ return 0;
+}
+EXPORT_SYMBOL_IF_KUNIT(sdxi_encode_cxt_stop);
+
+int sdxi_encode_sync(struct sdxi_desc *desc, const struct sdxi_sync *params)
+{
+ u64 csb_ptr;
+ u32 opcode;
+ u8 cflags;
+
+ opcode = (FIELD_PREP(SDXI_DSC_SUBTYPE, SDXI_DSC_OP_SUBTYPE_SYNC) |
+ FIELD_PREP(SDXI_DSC_TYPE, SDXI_DSC_OP_TYPE_ADMIN));
+
+ cflags = FIELD_PREP(SDXI_DSC_SYNC_FLT, params->filter);
+
+ csb_ptr = FIELD_PREP(SDXI_DSC_NP, 1);
+
+ *desc = (typeof(*desc)) {
+ .sync = (typeof(desc->sync)) {
+ .opcode = cpu_to_le32(opcode),
+ .cflags = cflags,
+ .cxt_start = cpu_to_le16(params->range.cxt_start),
+ .cxt_end = cpu_to_le16(params->range.cxt_end),
+ .csb_ptr = cpu_to_le64(csb_ptr),
+ },
+ };
+
+ return 0;
+}
+EXPORT_SYMBOL_IF_KUNIT(sdxi_encode_sync);
diff --git a/drivers/dma/sdxi/descriptor.h b/drivers/dma/sdxi/descriptor.h
index c0f01b1be726..5b8fd7cbaa03 100644
--- a/drivers/dma/sdxi/descriptor.h
+++ b/drivers/dma/sdxi/descriptor.h
@@ -9,6 +9,7 @@
*/
#include <linux/bitfield.h>
+#include <linux/minmax.h>
#include <linux/ratelimit.h>
#include <linux/types.h>
#include <asm/byteorder.h>
@@ -61,4 +62,49 @@ static inline void sdxi_desc_set_sequential(struct sdxi_desc *desc)
desc->opcode = cpu_to_le32(opcode);
}
+struct sdxi_cxt_range {
+ u16 cxt_start;
+ u16 cxt_end;
+};
+
+static inline struct sdxi_cxt_range sdxi_cxt_range(u16 a, u16 b)
+{
+ return (struct sdxi_cxt_range) {
+ .cxt_start = min(a, b),
+ .cxt_end = max(a, b),
+ };
+}
+
+static inline struct sdxi_cxt_range sdxi_cxt_range_single(u16 nr)
+{
+ return sdxi_cxt_range(nr, nr);
+}
+
+struct sdxi_cxt_start {
+ struct sdxi_cxt_range range;
+};
+
+int sdxi_encode_cxt_start(struct sdxi_desc *desc,
+ const struct sdxi_cxt_start *params);
+
+struct sdxi_cxt_stop {
+ struct sdxi_cxt_range range;
+};
+
+int sdxi_encode_cxt_stop(struct sdxi_desc *desc,
+ const struct sdxi_cxt_stop *params);
+
+struct sdxi_sync {
+ enum sdxi_sync_filter {
+ SDXI_SYNC_FLT_CXT = 0x0,
+ SDXI_SYNC_FLT_STOP = 0x1,
+ SDXI_SYNC_FLT_AKEY = 0x2,
+ SDXI_SYNC_FLT_RKEY = 0x3,
+ SDXI_SYNC_FLT_FN = 0x4,
+ } filter;
+ struct sdxi_cxt_range range;
+};
+
+int sdxi_encode_sync(struct sdxi_desc *desc, const struct sdxi_sync *params);
+
#endif /* DMA_SDXI_DESCRIPTOR_H */
diff --git a/drivers/dma/sdxi/hw.h b/drivers/dma/sdxi/hw.h
index 5c5bef66f1fb..d340ed7dc061 100644
--- a/drivers/dma/sdxi/hw.h
+++ b/drivers/dma/sdxi/hw.h
@@ -146,12 +146,76 @@ struct sdxi_desc {
#define SDXI_DSC_VL BIT(0)
#define SDXI_DSC_SE BIT(1)
#define SDXI_DSC_FE BIT(2)
+#define SDXI_DSC_SUBTYPE GENMASK(15, 8)
+#define SDXI_DSC_TYPE GENMASK(26, 16)
/* For csb_ptr field */
+#define SDXI_DSC_NP BIT_ULL(0)
#define SDXI_DSC_CSB_PTR GENMASK_ULL(63, 5)
+#define define_sdxi_dsc(tag_, name_, op_body_) \
+ struct tag_ { \
+ __le32 opcode; \
+ op_body_ \
+ __le64 csb_ptr; \
+ } __packed __aligned(64) name_; \
+ static_assert(sizeof(struct tag_) == \
+ sizeof(struct sdxi_dsc_generic)); \
+ static_assert(offsetof(struct tag_, csb_ptr) == \
+ offsetof(struct sdxi_dsc_generic, csb_ptr))
+
+ /* SDXI 1.0 Table 6-14: DSC_CXT_START Descriptor Format */
+ define_sdxi_dsc(sdxi_dsc_cxt_start, cxt_start,
+ __u8 rsvd_0;
+ __u8 vflags;
+ __le16 vf_num;
+ __le16 cxt_start;
+ __le16 cxt_end;
+ __u8 rsvd_1[4];
+ __le64 db_value;
+ __u8 rsvd_2[32];
+ );
+
+ /* SDXI 1.0 Table 6-15: DSC_CXT_STOP Descriptor Format */
+ define_sdxi_dsc(sdxi_dsc_cxt_stop, cxt_stop,
+ __u8 rsvd_0;
+ __u8 vflags;
+ __le16 vf_num;
+ __le16 cxt_start;
+ __le16 cxt_end;
+ __u8 rsvd_1[44];
+ );
+
+ /* SDXI 1.0 Table 6-22: DSC_SYNC Descriptor Format */
+ define_sdxi_dsc(sdxi_dsc_sync, sync,
+ __u8 cflags;
+ __u8 vflags;
+ __le16 vf_num;
+ __le16 cxt_start;
+ __le16 cxt_end;
+ __le16 key_start;
+ __le16 key_end;
+ __u8 rsvd_0[40];
+ );
+/* For use with sync.cflags */
+#define SDXI_DSC_SYNC_FLT GENMASK(2, 0)
+
+#undef define_sdxi_dsc
};
} __packed __aligned(64);
static_assert(sizeof(struct sdxi_desc) == 64);
+/* SDXI 1.0 Table 6-1: SDXI Operation Groups */
+enum sdxi_dsc_type {
+ SDXI_DSC_OP_TYPE_ADMIN = 0x002,
+};
+
+/* SDXI 1.0 Table 6-2: SDXI Operation Groups, Types, and Subtypes */
+enum sdxi_dsc_subtype {
+ /* Administrative */
+ SDXI_DSC_OP_SUBTYPE_CXT_START_NM = 0x03,
+ SDXI_DSC_OP_SUBTYPE_CXT_STOP = 0x04,
+ SDXI_DSC_OP_SUBTYPE_SYNC = 0x06,
+};
+
#endif /* DMA_SDXI_HW_H */
--
2.54.0
* [PATCH v3 19/23] dmaengine: sdxi: Provide context start and stop APIs
2026-06-06 0:02 [PATCH v3 00/23] dmaengine: Smart Data Accelerator Interface (SDXI) basic support Nathan Lynch via B4 Relay
` (17 preceding siblings ...)
2026-06-06 0:02 ` [PATCH v3 18/23] dmaengine: sdxi: Encode context start, stop, and sync descriptors Nathan Lynch via B4 Relay
@ 2026-06-06 0:02 ` Nathan Lynch via B4 Relay
2026-06-06 0:22 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 20/23] dmaengine: sdxi: Encode nop, copy, and interrupt descriptors Nathan Lynch via B4 Relay
` (3 subsequent siblings)
22 siblings, 1 reply; 40+ messages in thread
From: Nathan Lynch via B4 Relay @ 2026-06-06 0:02 UTC (permalink / raw)
To: Vinod Koul, Frank Li
Cc: Bjorn Helgaas, David Rientjes, John.Kariuki, Jonathan Cameron,
Kinsey Ho, Mario Limonciello, PradeepVineshReddy.Kodamati,
Shivank Garg, Stephen Bates, Tycho Andersen, Wei Huang, Wei Xu,
dmaengine, linux-kernel, linux-pci, Nathan Lynch
From: Nathan Lynch <nathan.lynch@amd.com>
Starting and stopping SDXI client contexts is implemented by submitting
special-purpose descriptors to a function's admin context.
Introduce high-level context start and stop APIs that operate on
struct sdxi_cxt objects, encapsulating the administrative descriptor
submission and completion signaling. These are intended for use by
clients such as the DMA engine provider to come.
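The submission sequence these APIs encapsulate (reserve a ring slot,
encode, attach a completion, set the validity bit, push the doorbell,
poll) can be modeled in user space roughly as below. Everything here is
a hypothetical stand-in: the demo_* names, the four-entry ring, the mock
device that runs synchronously inside the doorbell call, and the
assumption that the doorbell value is the updated write index.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_RING_ENTRIES 4
#define DEMO_DSC_VL       (UINT32_C(1) << 0)

struct demo_desc {
	uint32_t opcode;
	uint64_t *signal; /* stand-in for csb_ptr: the completion signal word */
};

struct demo_admin {
	struct demo_desc ring[DEMO_RING_ENTRIES];
	uint64_t write_index; /* next slot software will fill */
	uint64_t read_index;  /* next slot the mock device will consume */
};

/* Mock device: consume descriptors up to dbval and clear their signals. */
void demo_doorbell(struct demo_admin *adm, uint64_t dbval)
{
	while (adm->read_index < dbval) {
		struct demo_desc *d =
			&adm->ring[adm->read_index % DEMO_RING_ENTRIES];

		assert(d->opcode & DEMO_DSC_VL); /* must have been made valid */
		if (d->signal)
			*d->signal = 0;
		adm->read_index++;
	}
}

/* Submit one descriptor and wait for it; returns 0 on success, -1 otherwise. */
int demo_submit_and_wait(struct demo_admin *adm, uint32_t opcode)
{
	uint64_t signal = 1; /* completion starts out pending */
	struct demo_desc *d;

	if (adm->write_index - adm->read_index >= DEMO_RING_ENTRIES)
		return -1; /* ring full */

	d = &adm->ring[adm->write_index++ % DEMO_RING_ENTRIES]; /* reserve */
	d->opcode = opcode & ~DEMO_DSC_VL;                      /* encode */
	d->signal = &signal;                                    /* attach */
	d->opcode |= DEMO_DSC_VL;                               /* make valid */
	demo_doorbell(adm, adm->write_index);                   /* doorbell */

	return signal == 0 ? 0 : -1;                            /* poll */
}
```

In the driver the device runs asynchronously, so the final step is a
bounded busy-wait rather than a single check.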
Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Nathan Lynch <nathan.lynch@amd.com>
---
drivers/dma/sdxi/context.c | 77 ++++++++++++++++++++++++++++++++++++++++++++++
drivers/dma/sdxi/context.h | 3 ++
2 files changed, 80 insertions(+)
diff --git a/drivers/dma/sdxi/context.c b/drivers/dma/sdxi/context.c
index 9b0984842d9a..afd341416342 100644
--- a/drivers/dma/sdxi/context.c
+++ b/drivers/dma/sdxi/context.c
@@ -22,7 +22,9 @@
#include <asm/barrier.h>
#include <asm/rwonce.h>
+#include "completion.h"
#include "context.h"
+#include "descriptor.h"
#include "hw.h"
#include "ring.h"
#include "sdxi.h"
@@ -335,6 +337,81 @@ int sdxi_admin_cxt_init(struct sdxi_dev *sdxi)
return devm_add_action_or_reset(sdxi->dev, free_admin_cxt, sdxi);
}
+int sdxi_start_cxt(struct sdxi_cxt *cxt)
+{
+ struct sdxi_cxt *adm = to_admin_cxt(cxt);
+ struct sdxi_desc *desc;
+ struct sdxi_ring_resv resv;
+ int err;
+
+ might_sleep();
+
+ struct sdxi_completion *sc __free(sdxi_completion) =
+ sdxi_completion_alloc(cxt->sdxi);
+
+ if (!sc)
+ return -ENOMEM;
+
+ /* This is not how to start the admin context. */
+ if (WARN_ON(adm == cxt))
+ return -EINVAL;
+
+ err = sdxi_ring_reserve(adm->ring_state, 1, &resv);
+ if (err)
+ return err;
+
+ desc = sdxi_ring_resv_next(&resv);
+ sdxi_encode_cxt_start(desc, &(const struct sdxi_cxt_start) {
+ .range = sdxi_cxt_range_single(cxt->id),
+ });
+ sdxi_completion_attach(desc, sc);
+ sdxi_desc_make_valid(desc);
+ sdxi_cxt_push_doorbell(adm, sdxi_ring_resv_dbval(&resv));
+
+ return sdxi_completion_poll(sc);
+}
+
+void sdxi_stop_cxt(struct sdxi_cxt *cxt)
+{
+ struct sdxi_cxt *adm = to_admin_cxt(cxt);
+ struct sdxi_desc *stop, *sync;
+ struct sdxi_ring_resv resv;
+ int err;
+
+ might_sleep();
+
+ struct sdxi_completion *sc __free(sdxi_completion) =
+ sdxi_completion_alloc(cxt->sdxi);
+
+ if (!sc)
+ return;
+
+ /* This is not how to stop the admin context. */
+ if (WARN_ON(adm == cxt))
+ return;
+
+ err = sdxi_ring_reserve(adm->ring_state, 2, &resv);
+ if (WARN_ON_ONCE(err))
+ return;
+
+ stop = sdxi_ring_resv_next(&resv);
+ sync = sdxi_ring_resv_next(&resv);
+
+ sdxi_encode_cxt_stop(stop, &(const struct sdxi_cxt_stop) {
+ .range = sdxi_cxt_range_single(cxt->id),
+ });
+ sdxi_encode_sync(sync, &(const struct sdxi_sync) {
+ .filter = SDXI_SYNC_FLT_STOP,
+ .range = sdxi_cxt_range_single(cxt->id),
+ });
+ sdxi_completion_attach(sync, sc);
+ sdxi_desc_make_valid(stop);
+ sdxi_desc_make_valid(sync);
+ sdxi_cxt_push_doorbell(adm, sdxi_ring_resv_dbval(&resv));
+
+ WARN_ON(sdxi_completion_poll(sc));
+}
+
/*
* Temporary owner for context id until it can be assigned to a
* context object; enables scope-based cleanup.
diff --git a/drivers/dma/sdxi/context.h b/drivers/dma/sdxi/context.h
index a8511f18db5d..880e7a289810 100644
--- a/drivers/dma/sdxi/context.h
+++ b/drivers/dma/sdxi/context.h
@@ -68,6 +68,9 @@ int sdxi_admin_cxt_init(struct sdxi_dev *sdxi);
struct sdxi_cxt *sdxi_cxt_new(struct sdxi_dev *sdxi);
void sdxi_cxt_exit(struct sdxi_cxt *cxt);
+int sdxi_start_cxt(struct sdxi_cxt *cxt);
+void sdxi_stop_cxt(struct sdxi_cxt *cxt);
+
static inline struct sdxi_cxt *to_admin_cxt(const struct sdxi_cxt *cxt)
{
return cxt->sdxi->admin_cxt;
--
2.54.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v3 20/23] dmaengine: sdxi: Encode nop, copy, and interrupt descriptors
2026-06-06 0:02 [PATCH v3 00/23] dmaengine: Smart Data Accelerator Interface (SDXI) basic support Nathan Lynch via B4 Relay
` (18 preceding siblings ...)
2026-06-06 0:02 ` [PATCH v3 19/23] dmaengine: sdxi: Provide context start and stop APIs Nathan Lynch via B4 Relay
@ 2026-06-06 0:02 ` Nathan Lynch via B4 Relay
2026-06-06 0:20 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 21/23] dmaengine: sdxi: Add unit tests for descriptor encoding Nathan Lynch via B4 Relay
` (2 subsequent siblings)
22 siblings, 1 reply; 40+ messages in thread
From: Nathan Lynch via B4 Relay @ 2026-06-06 0:02 UTC (permalink / raw)
To: Vinod Koul, Frank Li
Cc: Bjorn Helgaas, David Rientjes, John.Kariuki, Jonathan Cameron,
Kinsey Ho, Mario Limonciello, PradeepVineshReddy.Kodamati,
Shivank Garg, Stephen Bates, Tycho Andersen, Wei Huang, Wei Xu,
dmaengine, linux-kernel, linux-pci, Frank Li, Nathan Lynch
From: Nathan Lynch <nathan.lynch@amd.com>

Introduce low-level support for serializing three operation types to
the descriptor ring of a client context: nop, copy, and interrupt.

As with the administrative descriptor support introduced earlier, each
operation has its own distinct type that overlays the generic struct
sdxi_desc, along with a dedicated encoder function that accepts an
operation-specific parameter struct.

Copy descriptors are used to implement memcpy offload for the DMA
engine provider, and interrupt descriptors are used to signal the
completion of preceding descriptors in the ring. Nops can be used in
error paths where a ring reservation has been obtained and the caller
needs to submit valid descriptors before returning.

Conditionally expose sdxi_encode_size32() for unit testing.

Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Nathan Lynch <nathan.lynch@amd.com>
---
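Illustration only, not part of the patch: a userspace sketch of the two
checks sdxi_encode_copy() applies before building a descriptor. The
names encode_size32_sketch() and ranges_overlap_sketch() are
hypothetical. The sketch returns -1 where the driver returns -EINVAL,
skips the little-endian conversion, and spells out the overlap test by
hand instead of calling range_overlaps().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SZ_4G (1ULL << 32)

/*
 * Sizes are stored as (size - 1), so a 32-bit field covers 1..4G.
 * Returns 0 on success or -1 for a size outside that range; *dest is
 * left untouched on failure.
 */
int encode_size32_sketch(uint64_t size, uint32_t *dest)
{
	if (size == 0 || size > SKETCH_SZ_4G)
		return -1;
	*dest = (uint32_t)(size - 1);
	return 0;
}

/* True if [a, a + len - 1] and [b, b + len - 1] share at least one byte. */
bool ranges_overlap_sketch(uint64_t a, uint64_t b, uint64_t len)
{
	uint64_t a_end, b_end;

	assert(len >= 1);	/* zero length is rejected by the size check */
	a_end = a + len - 1;
	b_end = b + len - 1;

	return a <= b_end && b <= a_end;
}
```

With src 0x1000 and dst 0x2000, a length of 4096 leaves the buffers
adjacent but disjoint, while 4097 makes the last source byte coincide
with the first destination byte and is rejected.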
drivers/dma/sdxi/descriptor.c | 107 ++++++++++++++++++++++++++++++++++++++++++
drivers/dma/sdxi/descriptor.h | 25 ++++++++++
drivers/dma/sdxi/hw.h | 33 +++++++++++++
3 files changed, 165 insertions(+)
diff --git a/drivers/dma/sdxi/descriptor.c b/drivers/dma/sdxi/descriptor.c
index be2a9244ce19..41019e747528 100644
--- a/drivers/dma/sdxi/descriptor.c
+++ b/drivers/dma/sdxi/descriptor.c
@@ -7,12 +7,119 @@
#include <kunit/visibility.h>
#include <linux/bitfield.h>
+#include <linux/bug.h>
+#include <linux/range.h>
+#include <linux/sizes.h>
#include <linux/types.h>
#include <asm/byteorder.h>
#include "hw.h"
#include "descriptor.h"
+VISIBLE_IF_KUNIT int __must_check sdxi_encode_size32(u64 size, __le32 *dest)
+{
+ /*
+ * sizes are encoded as value - 1:
+ * value encoding
+ * 1 0
+ * 2 1
+ * ...
+ * 4G 0xffffffff
+ */
+ if (WARN_ON_ONCE(size > SZ_4G) ||
+ WARN_ON_ONCE(size == 0))
+ return -EINVAL;
+ size = clamp_val(size, 1, SZ_4G);
+ *dest = cpu_to_le32((u32)(size - 1));
+ return 0;
+}
+EXPORT_SYMBOL_IF_KUNIT(sdxi_encode_size32);
+
+void sdxi_serialize_nop(struct sdxi_desc *desc)
+{
+ u32 opcode = (FIELD_PREP(SDXI_DSC_SUBTYPE, SDXI_DSC_OP_SUBTYPE_NOP) |
+ FIELD_PREP(SDXI_DSC_TYPE, SDXI_DSC_OP_TYPE_DMAB));
+ u64 csb_ptr = FIELD_PREP(SDXI_DSC_NP, 1);
+
+ *desc = (typeof(*desc)) {
+ .nop = (typeof(desc->nop)) {
+ .opcode = cpu_to_le32(opcode),
+ .csb_ptr = cpu_to_le64(csb_ptr),
+ },
+ };
+
+}
+
+int sdxi_encode_copy(struct sdxi_desc *desc, const struct sdxi_copy *params)
+{
+ u64 csb_ptr;
+ u32 opcode;
+ __le32 size;
+ int err;
+
+ err = sdxi_encode_size32(params->len, &size);
+ if (err)
+ return err;
+ /*
+ * Reject overlapping src and dst. "Software ... shall not
+ * overlap the source buffer, destination buffer, Atomic
+ * Return Data, or completion status block." - SDXI 1.0 5.6
+ * Memory Consistency Model
+ */
+ if (range_overlaps(&(const struct range) {
+ .start = params->src,
+ .end = params->src + params->len - 1,
+ },
+ &(const struct range) {
+ .start = params->dst,
+ .end = params->dst + params->len - 1,
+ }))
+ return -EINVAL;
+
+ opcode = (FIELD_PREP(SDXI_DSC_SUBTYPE, SDXI_DSC_OP_SUBTYPE_COPY) |
+ FIELD_PREP(SDXI_DSC_TYPE, SDXI_DSC_OP_TYPE_DMAB));
+
+ csb_ptr = FIELD_PREP(SDXI_DSC_NP, 1);
+
+ *desc = (typeof(*desc)) {
+ .copy = (typeof(desc->copy)) {
+ .opcode = cpu_to_le32(opcode),
+ .size = size,
+ .akey0 = cpu_to_le16(params->src_akey),
+ .akey1 = cpu_to_le16(params->dst_akey),
+ .addr0 = cpu_to_le64(params->src),
+ .addr1 = cpu_to_le64(params->dst),
+ .csb_ptr = cpu_to_le64(csb_ptr),
+ },
+ };
+
+ return 0;
+}
+EXPORT_SYMBOL_IF_KUNIT(sdxi_encode_copy);
+
+int sdxi_encode_intr(struct sdxi_desc *desc,
+ const struct sdxi_intr *params)
+{
+ u64 csb_ptr;
+ u32 opcode;
+
+ opcode = (FIELD_PREP(SDXI_DSC_SUBTYPE, SDXI_DSC_OP_SUBTYPE_INTR) |
+ FIELD_PREP(SDXI_DSC_TYPE, SDXI_DSC_OP_TYPE_INTR));
+
+ csb_ptr = FIELD_PREP(SDXI_DSC_NP, 1);
+
+ *desc = (typeof(*desc)) {
+ .intr = (typeof(desc->intr)) {
+ .opcode = cpu_to_le32(opcode),
+ .akey = cpu_to_le16(params->akey),
+ .csb_ptr = cpu_to_le64(csb_ptr),
+ },
+ };
+
+ return 0;
+}
+EXPORT_SYMBOL_IF_KUNIT(sdxi_encode_intr);
+
int sdxi_encode_cxt_start(struct sdxi_desc *desc,
const struct sdxi_cxt_start *params)
{
diff --git a/drivers/dma/sdxi/descriptor.h b/drivers/dma/sdxi/descriptor.h
index 5b8fd7cbaa03..14f92c8dea1d 100644
--- a/drivers/dma/sdxi/descriptor.h
+++ b/drivers/dma/sdxi/descriptor.h
@@ -9,6 +9,7 @@
*/
#include <linux/bitfield.h>
+#include <linux/kconfig.h>
#include <linux/minmax.h>
#include <linux/ratelimit.h>
#include <linux/types.h>
@@ -16,6 +17,10 @@
#include "hw.h"
+#if IS_ENABLED(CONFIG_KUNIT)
+int __must_check sdxi_encode_size32(u64 size, __le32 *dest);
+#endif
+
static inline void sdxi_desc_vl_expect(const struct sdxi_desc *desc, bool expected)
{
u8 vl = FIELD_GET(SDXI_DSC_VL, le32_to_cpu(desc->opcode));
@@ -80,6 +85,26 @@ static inline struct sdxi_cxt_range sdxi_cxt_range_single(u16 nr)
return sdxi_cxt_range(nr, nr);
}
+void sdxi_serialize_nop(struct sdxi_desc *desc);
+
+struct sdxi_copy {
+ dma_addr_t src;
+ dma_addr_t dst;
+ u64 len;
+ u16 src_akey;
+ u16 dst_akey;
+};
+
+int sdxi_encode_copy(struct sdxi_desc *desc,
+ const struct sdxi_copy *params);
+
+struct sdxi_intr {
+ u16 akey;
+};
+
+int sdxi_encode_intr(struct sdxi_desc *desc,
+ const struct sdxi_intr *params);
+
struct sdxi_cxt_start {
struct sdxi_cxt_range range;
};
diff --git a/drivers/dma/sdxi/hw.h b/drivers/dma/sdxi/hw.h
index d340ed7dc061..566bebf3f356 100644
--- a/drivers/dma/sdxi/hw.h
+++ b/drivers/dma/sdxi/hw.h
@@ -164,6 +164,30 @@ struct sdxi_desc {
static_assert(offsetof(struct tag_, csb_ptr) == \
offsetof(struct sdxi_dsc_generic, csb_ptr))
+ /* SDXI 1.0 Table 6-6: DSC_DMAB_NOP Descriptor Format */
+ define_sdxi_dsc(sdxi_dsc_dmab_nop, nop,
+ __u8 rsvd_0[52];
+ );
+
+ /* SDXI 1.0 Table 6-8: DSC_DMAB_COPY Descriptor Format */
+ define_sdxi_dsc(sdxi_dsc_dmab_copy, copy,
+ __le32 size;
+ __u8 attr;
+ __u8 rsvd_0[3];
+ __le16 akey0;
+ __le16 akey1;
+ __le64 addr0;
+ __le64 addr1;
+ __u8 rsvd_1[24];
+ );
+
+ /* SDXI 1.0 Table 6-12: DSC_INTR Descriptor Format */
+ define_sdxi_dsc(sdxi_dsc_intr, intr,
+ __u8 rsvd_0[8];
+ __le16 akey;
+ __u8 rsvd_1[42];
+ );
+
/* SDXI 1.0 Table 6-14: DSC_CXT_START Descriptor Format */
define_sdxi_dsc(sdxi_dsc_cxt_start, cxt_start,
__u8 rsvd_0;
@@ -207,11 +231,20 @@ static_assert(sizeof(struct sdxi_desc) == 64);
/* SDXI 1.0 Table 6-1: SDXI Operation Groups */
enum sdxi_dsc_type {
+ SDXI_DSC_OP_TYPE_DMAB = 0x001,
SDXI_DSC_OP_TYPE_ADMIN = 0x002,
+ SDXI_DSC_OP_TYPE_INTR = 0x004,
};
/* SDXI 1.0 Table 6-2: SDXI Operation Groups, Types, and Subtypes */
enum sdxi_dsc_subtype {
+ /* DMA Base */
+ SDXI_DSC_OP_SUBTYPE_NOP = 0x01,
+ SDXI_DSC_OP_SUBTYPE_COPY = 0x03,
+
+ /* Interrupt */
+ SDXI_DSC_OP_SUBTYPE_INTR = 0x00,
+
/* Administrative */
SDXI_DSC_OP_SUBTYPE_CXT_START_NM = 0x03,
SDXI_DSC_OP_SUBTYPE_CXT_STOP = 0x04,
--
2.54.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v3 21/23] dmaengine: sdxi: Add unit tests for descriptor encoding
2026-06-06 0:02 [PATCH v3 00/23] dmaengine: Smart Data Accelerator Interface (SDXI) basic support Nathan Lynch via B4 Relay
` (19 preceding siblings ...)
2026-06-06 0:02 ` [PATCH v3 20/23] dmaengine: sdxi: Encode nop, copy, and interrupt descriptors Nathan Lynch via B4 Relay
@ 2026-06-06 0:02 ` Nathan Lynch via B4 Relay
2026-06-06 0:26 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 22/23] dmaengine: sdxi: MSI/MSI-X vector allocation and mapping Nathan Lynch via B4 Relay
2026-06-06 0:02 ` [PATCH v3 23/23] dmaengine: sdxi: Add DMA engine provider Nathan Lynch via B4 Relay
22 siblings, 1 reply; 40+ messages in thread
From: Nathan Lynch via B4 Relay @ 2026-06-06 0:02 UTC (permalink / raw)
To: Vinod Koul, Frank Li
Cc: Bjorn Helgaas, David Rientjes, John.Kariuki, Jonathan Cameron,
Kinsey Ho, Mario Limonciello, PradeepVineshReddy.Kodamati,
Shivank Garg, Stephen Bates, Tycho Andersen, Wei Huang, Wei Xu,
dmaengine, linux-kernel, linux-pci, Nathan Lynch
From: Nathan Lynch <nathan.lynch@amd.com>

Test the encoder function for each descriptor type currently used by
the driver.

The production code uses the GENMASK()/BIT() family of macros to
support encoding descriptors. The tests for that code use the packing
API to decode descriptors produced by that code without relying on
those bitmask definitions.

By limiting what's shared between the real code and the tests we gain
confidence in both. If both the driver code and the tests rely on the
bitfield macros, and then upon adding a new descriptor field the
author mistranslates the bit numbering from the spec, that error is
more likely to propagate to the tests undetected than if the test code
relies on a separate mechanism for decoding descriptors.

Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Nathan Lynch <nathan.lynch@amd.com>
---
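Illustration only, not part of the patch: a userspace sketch of the
"encode one way, decode another" idea described above. The encoder uses
shift-and-mask constants in the role of FIELD_PREP(); the decoder walks
bit positions as the spec numbers them and shares no constants with the
encoder. All names are hypothetical. The bit ranges (subtype 15:8, type
26:16) are the ones listed in generic_subfields[] in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Encoder side: shift-and-mask, standing in for FIELD_PREP(). */
#define SKETCH_SUBTYPE_SHIFT 8
#define SKETCH_SUBTYPE_MASK  0xffu
#define SKETCH_TYPE_SHIFT    16
#define SKETCH_TYPE_MASK     0x7ffu

void encode_opcode_sketch(uint8_t buf[4], uint32_t type, uint32_t subtype)
{
	uint32_t opcode =
		((subtype & SKETCH_SUBTYPE_MASK) << SKETCH_SUBTYPE_SHIFT) |
		((type & SKETCH_TYPE_MASK) << SKETCH_TYPE_SHIFT);

	/* Store little-endian, byte by byte. */
	for (int i = 0; i < 4; i++)
		buf[i] = (uint8_t)(opcode >> (8 * i));
}

/*
 * Decoder side: bit 0 is the least significant bit of byte 0.
 * Extracts bits high..low (inclusive) from a little-endian buffer.
 */
uint64_t get_bits_sketch(const uint8_t *buf, unsigned int high,
			 unsigned int low)
{
	uint64_t val = 0;

	assert(high >= low && high - low < 64);
	for (unsigned int bit = low; bit <= high; bit++) {
		uint64_t b = (buf[bit / 8] >> (bit % 8)) & 1u;

		val |= b << (bit - low);
	}
	return val;
}
```

If SKETCH_TYPE_SHIFT were mistyped, a decoder built on the same constant
would still round-trip cleanly; the offset-based decoder would not.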
drivers/dma/sdxi/Kconfig | 1 +
drivers/dma/sdxi/Makefile | 1 +
drivers/dma/sdxi/descriptor_kunit.c | 484 ++++++++++++++++++++++++++++++++++++
3 files changed, 486 insertions(+)
diff --git a/drivers/dma/sdxi/Kconfig b/drivers/dma/sdxi/Kconfig
index 7217375c9216..314b3e725ccb 100644
--- a/drivers/dma/sdxi/Kconfig
+++ b/drivers/dma/sdxi/Kconfig
@@ -31,6 +31,7 @@ config SDXI_KUNIT_TEST
tristate "SDXI unit tests" if !KUNIT_ALL_TESTS
depends on SDXI_CORE && KUNIT
default KUNIT_ALL_TESTS
+ select PACKING
help
KUnit tests for parts of the SDXI driver. Does not require
SDXI hardware.
diff --git a/drivers/dma/sdxi/Makefile b/drivers/dma/sdxi/Makefile
index 9b051eca9af7..5218ae2c86ae 100644
--- a/drivers/dma/sdxi/Makefile
+++ b/drivers/dma/sdxi/Makefile
@@ -11,4 +11,5 @@ obj-$(CONFIG_SDXI_PCI) += sdxi-pci.o
sdxi-pci-y := pci.o
obj-$(CONFIG_SDXI_KUNIT_TEST) += \
+ descriptor_kunit.o \
ring_kunit.o
diff --git a/drivers/dma/sdxi/descriptor_kunit.c b/drivers/dma/sdxi/descriptor_kunit.c
new file mode 100644
index 000000000000..1f3c2e7ab2dd
--- /dev/null
+++ b/drivers/dma/sdxi/descriptor_kunit.c
@@ -0,0 +1,484 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * SDXI descriptor encoding tests.
+ *
+ * Copyright Advanced Micro Devices, Inc.
+ *
+ * While the driver code uses bitfield macros (BIT, GENMASK) to encode
+ * descriptors, these tests use the packing API to decode them.
+ * Capturing the descriptor layout using PACKED_FIELD() is basically a
+ * copy-paste exercise since SDXI defines control structure fields in
+ * terms of bit offsets. Eschewing the bitfield constants such as
+ * SDXI_DSC_VL in the test code makes it possible for the tests to
+ * detect any mistakes in defining them.
+ *
+ * Note that the checks in unpack_fields() can be quite time-consuming
+ * at build time. Uncomment '#define SKIP_PACKING_CHECKS' below if
+ * that's too annoying when working on this code.
+ */
+#include <kunit/device.h>
+#include <kunit/test-bug.h>
+#include <kunit/test.h>
+#include <linux/container_of.h>
+#include <linux/dma-mapping.h>
+#include <linux/module.h>
+#include <linux/packing.h>
+#include <linux/stddef.h>
+#include <linux/string.h>
+
+#include "descriptor.h"
+
+/* #define SKIP_PACKING_CHECKS */
+
+MODULE_IMPORT_NS("EXPORTED_FOR_KUNIT_TESTING");
+
+enum {
+ SDXI_PACKING_QUIRKS = QUIRK_LITTLE_ENDIAN | QUIRK_LSW32_IS_FIRST,
+};
+
+
+#define desc_field(_high, _low, _target_struct, _member) \
+ PACKED_FIELD(_high, _low, _target_struct, _member)
+#define desc_flag(_bit, _target_struct, _member) \
+ desc_field(_bit, _bit, _target_struct, _member)
+
+/* DMAB_COPY */
+struct unpacked__copy {
+ u32 size;
+ u8 attr_src;
+ u8 attr_dst;
+ u16 akey0;
+ u16 akey1;
+ u64 addr0;
+ u64 addr1;
+};
+
+#define copy_field(_high, _low, _member) \
+ desc_field(_high, _low, struct unpacked__copy, _member)
+
+static const struct packed_field_u16 copy_subfields[] = {
+ copy_field(63, 32, size),
+ copy_field(67, 64, attr_src),
+ copy_field(71, 68, attr_dst),
+ copy_field(111, 96, akey0),
+ copy_field(127, 112, akey1),
+ copy_field(191, 128, addr0),
+ copy_field(255, 192, addr1),
+};
+
+/* DSC_INTR */
+struct unpacked__intr {
+ u16 akey;
+};
+
+#define intr_field(_high, _low, _member) \
+ desc_field(_high, _low, struct unpacked__intr, _member)
+
+static const struct packed_field_u16 intr_subfields[] = {
+ intr_field(111, 96, akey),
+};
+
+/* DSC_SYNC */
+struct unpacked__sync {
+ u8 flt;
+ bool vf;
+ u16 vf_num;
+ u16 cxt_start;
+ u16 cxt_end;
+ u16 key_start;
+ u16 key_end;
+};
+
+#define sync_field(_high, _low, _member) \
+ desc_field(_high, _low, struct unpacked__sync, _member)
+#define sync_flag(_bit, _member) sync_field(_bit, _bit, _member)
+
+static const struct packed_field_u16 sync_subfields[] = {
+ sync_field(34, 32, flt),
+ sync_flag(47, vf),
+ sync_field(63, 48, vf_num),
+ sync_field(79, 64, cxt_start),
+ sync_field(95, 80, cxt_end),
+ sync_field(111, 96, key_start),
+ sync_field(127, 112, key_end),
+};
+
+/* DSC_CXT_START */
+struct unpacked__cxt_start {
+ bool dv;
+ bool vf;
+ u16 vf_num;
+ u16 cxt_start;
+ u16 cxt_end;
+ u64 db_value;
+};
+
+#define cxt_start_field(_high, _low, _member) \
+ desc_field(_high, _low, struct unpacked__cxt_start, _member)
+#define cxt_start_flag(_bit, _member) cxt_start_field(_bit, _bit, _member)
+
+static const struct packed_field_u16 cxt_start_subfields[] = {
+ cxt_start_flag(46, dv),
+ cxt_start_flag(47, vf),
+ cxt_start_field(63, 48, vf_num),
+ cxt_start_field(79, 64, cxt_start),
+ cxt_start_field(95, 80, cxt_end),
+ cxt_start_field(191, 128, db_value),
+};
+
+/* DSC_CXT_STOP */
+struct unpacked__cxt_stop {
+ bool hs;
+ bool vf;
+ u16 vf_num;
+ u16 cxt_start;
+ u16 cxt_end;
+};
+
+#define cxt_stop_field(_high, _low, _member) \
+ desc_field(_high, _low, struct unpacked__cxt_stop, _member)
+#define cxt_stop_flag(_bit, _member) cxt_stop_field(_bit, _bit, _member)
+
+static const struct packed_field_u16 cxt_stop_subfields[] = {
+ cxt_stop_flag(45, hs),
+ cxt_stop_flag(47, vf),
+ cxt_stop_field(63, 48, vf_num),
+ cxt_stop_field(79, 64, cxt_start),
+ cxt_stop_field(95, 80, cxt_end),
+};
+
+/* DSC_GENERIC */
+struct unpacked_desc {
+ u64 csb_ptr;
+ u16 type;
+ u8 subtype;
+ bool vl;
+ bool se;
+ bool fe;
+ bool ch;
+ bool csr;
+ bool rb;
+ bool np;
+ union {
+ struct unpacked__copy copy;
+ struct unpacked__intr intr;
+ struct unpacked__sync sync;
+ struct unpacked__cxt_start cxt_start;
+ struct unpacked__cxt_stop cxt_stop;
+ };
+};
+
+#define generic_field(_high, _low, _member) \
+ desc_field(_high, _low, struct unpacked_desc, _member)
+#define generic_flag(_bit, _member) generic_field(_bit, _bit, _member)
+
+static const struct packed_field_u16 generic_subfields[] = {
+ generic_flag(0, vl),
+ generic_flag(1, se),
+ generic_flag(2, fe),
+ generic_flag(3, ch),
+ generic_flag(4, csr),
+ generic_flag(5, rb),
+ generic_field(15, 8, subtype),
+ generic_field(26, 16, type),
+ generic_flag(448, np),
+ generic_field(511, 453, csb_ptr),
+};
+
+#ifndef SKIP_PACKING_CHECKS
+#define define_unpack_fn(_T) \
+ static void unpack_ ## _T(struct unpacked_desc *to, \
+ const struct sdxi_desc *from) \
+ { \
+ unpack_fields(from, sizeof(*from), to, \
+ generic_subfields, SDXI_PACKING_QUIRKS); \
+ unpack_fields(from, sizeof(*from), &to->_T, \
+ _T ## _subfields, SDXI_PACKING_QUIRKS); \
+ }
+#else
+#define define_unpack_fn(_T) \
+ static void unpack_ ## _T(struct unpacked_desc *to, \
+ const struct sdxi_desc *from) \
+ { \
+ unpack_fields_u16(from, sizeof(*from), to, \
+ generic_subfields, \
+ ARRAY_SIZE(generic_subfields), \
+ SDXI_PACKING_QUIRKS); \
+ unpack_fields_u16(from, sizeof(*from), &to->_T, \
+ _T ## _subfields, \
+ ARRAY_SIZE(_T ## _subfields), \
+ SDXI_PACKING_QUIRKS); \
+ }
+#endif /* SKIP_PACKING_CHECKS */
+
+define_unpack_fn(intr)
+define_unpack_fn(copy)
+define_unpack_fn(sync)
+define_unpack_fn(cxt_start)
+define_unpack_fn(cxt_stop)
+
+static void desc_poison(struct sdxi_desc *d)
+{
+ memset(d, 0xff, sizeof(*d));
+}
+
+static void encode_size32(struct kunit *t)
+{
+ __le32 res = cpu_to_le32(U32_MAX);
+
+ /* Valid sizes. */
+ KUNIT_EXPECT_EQ(t, 0, sdxi_encode_size32(1, &res));
+ KUNIT_EXPECT_EQ(t, 0, le32_to_cpu(res));
+
+ KUNIT_EXPECT_EQ(t, 0, sdxi_encode_size32(SZ_4K, &res));
+ KUNIT_EXPECT_EQ(t, SZ_4K - 1, le32_to_cpu(res));
+
+ KUNIT_EXPECT_EQ(t, 0, sdxi_encode_size32(SZ_4M, &res));
+ KUNIT_EXPECT_EQ(t, SZ_4M - 1, le32_to_cpu(res));
+
+ KUNIT_EXPECT_EQ(t, 0, sdxi_encode_size32(SZ_4G - 1, &res));
+ KUNIT_EXPECT_EQ(t, SZ_4G - 2, le32_to_cpu(res));
+
+ KUNIT_EXPECT_EQ(t, 0, sdxi_encode_size32(SZ_4G, &res));
+ KUNIT_EXPECT_EQ(t, SZ_4G - 1, le32_to_cpu(res));
+
+ /* Invalid sizes. Ensure the out parameter is unmodified. */
+#define RES_VAL 0x843829
+ res = cpu_to_le32(RES_VAL);
+
+ KUNIT_EXPECT_EQ(t, -EINVAL, sdxi_encode_size32(0, &res));
+ KUNIT_EXPECT_EQ(t, RES_VAL, le32_to_cpu(res));
+
+ KUNIT_EXPECT_EQ(t, -EINVAL, sdxi_encode_size32(SZ_4G + 1, &res));
+ KUNIT_EXPECT_EQ(t, RES_VAL, le32_to_cpu(res));
+
+ KUNIT_EXPECT_EQ(t, -EINVAL, sdxi_encode_size32(SZ_8G, &res));
+ KUNIT_EXPECT_EQ(t, RES_VAL, le32_to_cpu(res));
+
+ KUNIT_EXPECT_EQ(t, -EINVAL, sdxi_encode_size32(U64_MAX, &res));
+ KUNIT_EXPECT_EQ(t, RES_VAL, le32_to_cpu(res));
+
+#undef RES_VAL
+}
+
+static void copy(struct kunit *t)
+{
+ struct unpacked_desc unpacked;
+ struct sdxi_desc desc = {};
+ struct sdxi_copy copy = {
+ .src = 0x1000,
+ .dst = 0x2000,
+ .len = 4096,
+ .src_akey = 0,
+ .dst_akey = 0,
+ };
+
+ KUNIT_EXPECT_EQ(t, 0, sdxi_encode_copy(&desc, &copy));
+
+ unpack_copy(&unpacked, &desc);
+ KUNIT_EXPECT_EQ(t, unpacked.vl, 0);
+ KUNIT_EXPECT_EQ(t, unpacked.ch, 0);
+ KUNIT_EXPECT_EQ(t, unpacked.subtype, SDXI_DSC_OP_SUBTYPE_COPY);
+ KUNIT_EXPECT_EQ(t, unpacked.type, SDXI_DSC_OP_TYPE_DMAB);
+ KUNIT_EXPECT_EQ(t, unpacked.csb_ptr, 0);
+ KUNIT_EXPECT_EQ(t, unpacked.np, 1);
+
+ KUNIT_EXPECT_EQ(t, unpacked.copy.size, copy.len - 1);
+
+ /* Zero isn't a valid size. */
+ desc_poison(&desc);
+ copy.len = 0;
+ KUNIT_EXPECT_EQ(t, -EINVAL, sdxi_encode_copy(&desc, &copy));
+
+ /* But 1 is. */
+ desc_poison(&desc);
+ copy.len = 1;
+ KUNIT_EXPECT_EQ(t, 0, sdxi_encode_copy(&desc, &copy));
+ unpack_copy(&unpacked, &desc);
+ KUNIT_EXPECT_EQ(t, unpacked.copy.size, copy.len - 1);
+
+ /* SDXI forbids overlapping source and destination. */
+ desc_poison(&desc);
+ copy.len = 4097;
+ KUNIT_EXPECT_EQ(t, -EINVAL, sdxi_encode_copy(&desc, &copy));
+ copy = (typeof(copy)) {
+ .src = 0x4000,
+ .dst = 0x4000,
+ .len = 1,
+ .src_akey = 0,
+ .dst_akey = 0,
+ };
+ KUNIT_EXPECT_EQ(t, -EINVAL, sdxi_encode_copy(&desc, &copy));
+
+ desc_poison(&desc);
+ KUNIT_EXPECT_EQ(t, 0,
+ sdxi_encode_copy(&desc,
+ &(struct sdxi_copy) {
+ .src = 0x1000,
+ .dst = 0x2000,
+ .len = 0x100,
+ .src_akey = 1,
+ .dst_akey = 2,
+ }));
+ KUNIT_EXPECT_EQ(t, 0x1000, le64_to_cpu(desc.copy.addr0));
+ KUNIT_EXPECT_EQ(t, 0x2000, le64_to_cpu(desc.copy.addr1));
+ KUNIT_EXPECT_EQ(t, 0x100, 1 + le32_to_cpu(desc.copy.size));
+ KUNIT_EXPECT_EQ(t, 1, le16_to_cpu(desc.copy.akey0));
+ KUNIT_EXPECT_EQ(t, 2, le16_to_cpu(desc.copy.akey1));
+
+ unpack_copy(&unpacked, &desc);
+ KUNIT_EXPECT_EQ(t, unpacked.vl, 0);
+ KUNIT_EXPECT_EQ(t, unpacked.ch, 0);
+ KUNIT_EXPECT_EQ(t, unpacked.subtype, SDXI_DSC_OP_SUBTYPE_COPY);
+ KUNIT_EXPECT_EQ(t, unpacked.type, SDXI_DSC_OP_TYPE_DMAB);
+ KUNIT_EXPECT_EQ(t, unpacked.csb_ptr, 0);
+ KUNIT_EXPECT_EQ(t, unpacked.np, 1);
+
+ KUNIT_EXPECT_EQ(t, unpacked.copy.size, 0x100 - 1);
+}
+
+static void intr(struct kunit *t)
+{
+ struct unpacked_desc unpacked;
+ struct sdxi_intr intr = {
+ .akey = 1234,
+ };
+ struct sdxi_desc desc;
+
+ desc_poison(&desc);
+ KUNIT_EXPECT_EQ(t, 0, sdxi_encode_intr(&desc, &intr));
+
+ unpack_intr(&unpacked, &desc);
+ KUNIT_EXPECT_EQ(t, unpacked.vl, 0);
+ KUNIT_EXPECT_EQ(t, unpacked.ch, 0);
+ KUNIT_EXPECT_EQ(t, unpacked.subtype, SDXI_DSC_OP_SUBTYPE_INTR);
+ KUNIT_EXPECT_EQ(t, unpacked.type, SDXI_DSC_OP_TYPE_INTR);
+ KUNIT_EXPECT_EQ(t, unpacked.csb_ptr, 0);
+ KUNIT_EXPECT_EQ(t, unpacked.np, 1);
+
+ KUNIT_EXPECT_EQ(t, unpacked.intr.akey, 1234);
+}
+
+static void cxt_start(struct kunit *t)
+{
+ struct unpacked_desc unpacked;
+ struct sdxi_cxt_start start = {
+ .range = sdxi_cxt_range_single(2),
+ };
+ struct sdxi_desc desc;
+
+ desc_poison(&desc);
+ KUNIT_ASSERT_EQ(t, 0, sdxi_encode_cxt_start(&desc, &start));
+
+ unpack_cxt_start(&unpacked, &desc);
+
+ /* Check op-specific fields. */
+ KUNIT_EXPECT_EQ(t, 0, desc.cxt_start.vflags);
+
+ /*
+ * Check generic fields. Some flags have mandatory values
+ * according to the operation type.
+ */
+ KUNIT_EXPECT_EQ(t, unpacked.vl, 0);
+ KUNIT_EXPECT_EQ(t, unpacked.se, 0);
+ KUNIT_EXPECT_EQ(t, unpacked.fe, 1);
+ KUNIT_EXPECT_EQ(t, unpacked.ch, 0);
+ KUNIT_EXPECT_EQ(t, unpacked.subtype, SDXI_DSC_OP_SUBTYPE_CXT_START_NM);
+ KUNIT_EXPECT_EQ(t, unpacked.type, SDXI_DSC_OP_TYPE_ADMIN);
+ KUNIT_EXPECT_EQ(t, unpacked.csb_ptr, 0);
+ KUNIT_EXPECT_EQ(t, unpacked.np, 1);
+
+ KUNIT_EXPECT_FALSE(t, unpacked.cxt_start.dv);
+ KUNIT_EXPECT_FALSE(t, unpacked.cxt_start.vf);
+ KUNIT_EXPECT_EQ(t, unpacked.cxt_start.cxt_start, 2);
+ KUNIT_EXPECT_EQ(t, unpacked.cxt_start.cxt_end, 2);
+ KUNIT_EXPECT_EQ(t, unpacked.cxt_start.vf_num, 0);
+ KUNIT_EXPECT_EQ(t, unpacked.cxt_start.db_value, 0);
+}
+
+static void cxt_stop(struct kunit *t)
+{
+ struct unpacked_desc unpacked;
+ struct sdxi_cxt_stop stop = {
+ .range = sdxi_cxt_range_single(2),
+ };
+ struct sdxi_desc desc;
+
+ desc_poison(&desc);
+ KUNIT_ASSERT_EQ(t, 0, sdxi_encode_cxt_stop(&desc, &stop));
+
+ unpack_cxt_stop(&unpacked, &desc);
+
+ /* Check op-specific fields. */
+ KUNIT_EXPECT_EQ(t, 0, desc.cxt_start.vflags);
+
+ /*
+ * Check generic fields. Some flags have mandatory values
+ * according to the operation type.
+ */
+ KUNIT_EXPECT_EQ(t, unpacked.vl, 0);
+ KUNIT_EXPECT_EQ(t, unpacked.se, 0);
+ KUNIT_EXPECT_EQ(t, unpacked.fe, 1);
+ KUNIT_EXPECT_EQ(t, unpacked.ch, 0);
+ KUNIT_EXPECT_EQ(t, unpacked.subtype, SDXI_DSC_OP_SUBTYPE_CXT_STOP);
+ KUNIT_EXPECT_EQ(t, unpacked.type, SDXI_DSC_OP_TYPE_ADMIN);
+ KUNIT_EXPECT_EQ(t, unpacked.csb_ptr, 0);
+ KUNIT_EXPECT_EQ(t, unpacked.np, 1);
+
+ KUNIT_EXPECT_FALSE(t, unpacked.cxt_stop.hs);
+ KUNIT_EXPECT_FALSE(t, unpacked.cxt_stop.vf);
+ KUNIT_EXPECT_EQ(t, unpacked.cxt_stop.cxt_start, 2);
+ KUNIT_EXPECT_EQ(t, unpacked.cxt_stop.cxt_end, 2);
+ KUNIT_EXPECT_EQ(t, unpacked.cxt_stop.vf_num, 0);
+}
+
+static void sync(struct kunit *t)
+{
+ struct sdxi_sync sync = {
+ .filter = SDXI_SYNC_FLT_STOP,
+ .range = sdxi_cxt_range(1, U16_MAX),
+ };
+ struct sdxi_desc desc;
+ struct unpacked_desc unpacked;
+
+ desc_poison(&desc);
+ KUNIT_ASSERT_EQ(t, 0, sdxi_encode_sync(&desc, &sync));
+ unpack_sync(&unpacked, &desc);
+
+ KUNIT_EXPECT_EQ(t, unpacked.type, SDXI_DSC_OP_TYPE_ADMIN);
+ KUNIT_EXPECT_EQ(t, unpacked.subtype, SDXI_DSC_OP_SUBTYPE_SYNC);
+ KUNIT_EXPECT_EQ(t, unpacked.ch, 0);
+ KUNIT_EXPECT_EQ(t, unpacked.sync.flt, SDXI_SYNC_FLT_STOP);
+ KUNIT_EXPECT_EQ(t, unpacked.sync.cxt_start, 1);
+ KUNIT_EXPECT_EQ(t, unpacked.sync.cxt_end, U16_MAX);
+}
+
+static struct kunit_case generic_desc_tcs[] = {
+ KUNIT_CASE(encode_size32),
+ KUNIT_CASE(copy),
+ KUNIT_CASE(intr),
+ KUNIT_CASE(cxt_start),
+ KUNIT_CASE(cxt_stop),
+ KUNIT_CASE(sync),
+ {}
+};
+
+static int generic_desc_setup_device(struct kunit *t)
+{
+ struct device *dev = kunit_device_register(t, "sdxi-mock-device");
+
+ KUNIT_ASSERT_NOT_ERR_OR_NULL(t, dev);
+ t->priv = dev;
+ return 0;
+}
+
+static struct kunit_suite generic_desc_ts = {
+ .name = "Generic SDXI descriptor encoding",
+ .test_cases = generic_desc_tcs,
+ .init = generic_desc_setup_device,
+};
+kunit_test_suite(generic_desc_ts);
+
+MODULE_DESCRIPTION("SDXI descriptor encoding tests");
+MODULE_AUTHOR("Nathan Lynch");
+MODULE_LICENSE("GPL");
--
2.54.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v3 22/23] dmaengine: sdxi: MSI/MSI-X vector allocation and mapping
2026-06-06 0:02 [PATCH v3 00/23] dmaengine: Smart Data Accelerator Interface (SDXI) basic support Nathan Lynch via B4 Relay
` (20 preceding siblings ...)
2026-06-06 0:02 ` [PATCH v3 21/23] dmaengine: sdxi: Add unit tests for descriptor encoding Nathan Lynch via B4 Relay
@ 2026-06-06 0:02 ` Nathan Lynch via B4 Relay
2026-06-06 0:31 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 23/23] dmaengine: sdxi: Add DMA engine provider Nathan Lynch via B4 Relay
22 siblings, 1 reply; 40+ messages in thread
From: Nathan Lynch via B4 Relay @ 2026-06-06 0:02 UTC (permalink / raw)
To: Vinod Koul, Frank Li
Cc: Bjorn Helgaas, David Rientjes, John.Kariuki, Jonathan Cameron,
Kinsey Ho, Mario Limonciello, PradeepVineshReddy.Kodamati,
Shivank Garg, Stephen Bates, Tycho Andersen, Wei Huang, Wei Xu,
dmaengine, linux-kernel, linux-pci, Frank Li, Nathan Lynch
From: Nathan Lynch <nathan.lynch@amd.com>

During PCI probe, allocate a vector per context supported by the
function as reported by the capability register, plus one for the
error log interrupt, which is always vector 0. The rest of the vector
range is available for use with interrupt-generating descriptors.

Introduce sdxi_alloc_vector() and sdxi_free_vector() which are thin
wrappers around the IDA that tracks the allocated vector range.

Introduce sdxi_vector_to_irq() which invokes a new get_irq() bus op to
translate the device-relative index to the Linux IRQ number for use
with request_irq() etc. For PCI this dispatches to pci_irq_vector().

Code such as the DMA engine provider that intends to submit interrupt
descriptors should prepare by using sdxi_alloc_vector() and
sdxi_vector_to_irq(), and clean up by using sdxi_free_vector().

Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Nathan Lynch <nathan.lynch@amd.com>
---
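Illustration only, not part of the patch: a userspace model of the
vector bookkeeping described above. Index 0 is never handed out because
it belongs to the error log interrupt; callers draw from
[SDXI_MIN_VECTORS, nr_vectors - 1]. A fixed-size bool array stands in
for the IDA, the names are hypothetical, and the sketch returns -1
where the kernel helper would return a negative errno.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_ERROR_VECTOR 0	/* reserved for the error log interrupt */
#define SKETCH_MIN_VECTORS  1	/* first index handed out to callers */
#define SKETCH_MAX_VECTORS  64	/* arbitrary bound for this sketch */

struct vec_pool_sketch {
	unsigned int nr_vectors;	/* count granted by the bus layer */
	bool used[SKETCH_MAX_VECTORS];
};

/* Returns the lowest free index in [MIN_VECTORS, nr_vectors - 1], or -1. */
int alloc_vector_sketch(struct vec_pool_sketch *p)
{
	assert(p->nr_vectors <= SKETCH_MAX_VECTORS);
	for (unsigned int i = SKETCH_MIN_VECTORS; i < p->nr_vectors; i++) {
		if (!p->used[i]) {
			p->used[i] = true;
			return (int)i;
		}
	}
	return -1;
}

/* Release an index obtained from alloc_vector_sketch(). */
void free_vector_sketch(struct vec_pool_sketch *p, unsigned int nr)
{
	if (nr >= SKETCH_MIN_VECTORS && nr < p->nr_vectors)
		p->used[nr] = false;
}
```

If the bus layer grants only the minimum (nr_vectors == 1), the range
is empty and every allocation fails, so a consumer of interrupt
descriptors has to handle that case.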
drivers/dma/sdxi/device.c | 4 ++++
drivers/dma/sdxi/pci.c | 28 ++++++++++++++++++++++-
drivers/dma/sdxi/sdxi.h | 57 +++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 88 insertions(+), 1 deletion(-)
diff --git a/drivers/dma/sdxi/device.c b/drivers/dma/sdxi/device.c
index d3d1ad2f2eff..26be376c9545 100644
--- a/drivers/dma/sdxi/device.c
+++ b/drivers/dma/sdxi/device.c
@@ -12,6 +12,7 @@
#include <linux/dma-mapping.h>
#include <linux/dmapool.h>
#include <linux/export.h>
+#include <linux/idr.h>
#include <linux/iopoll.h>
#include <linux/jiffies.h>
#include <linux/log2.h>
@@ -334,6 +335,7 @@ int sdxi_register(struct device *dev, const struct sdxi_bus_ops *ops)
sdxi->dev = dev;
sdxi->bus_ops = ops;
+ ida_init(&sdxi->vectors);
xa_init_flags(&sdxi->client_cxts, XA_FLAGS_ALLOC1);
dev_set_drvdata(dev, sdxi);
@@ -355,6 +357,8 @@ void sdxi_unregister(struct device *dev)
sdxi_cxt_exit(cxt);
xa_destroy(&sdxi->client_cxts);
+ ida_destroy(&sdxi->vectors);
+
sdxi_dev_stop(sdxi);
}
EXPORT_SYMBOL_NS_GPL(sdxi_unregister, "SDXI");
diff --git a/drivers/dma/sdxi/pci.c b/drivers/dma/sdxi/pci.c
index 4d7d6812da6d..4f981e843138 100644
--- a/drivers/dma/sdxi/pci.c
+++ b/drivers/dma/sdxi/pci.c
@@ -5,6 +5,7 @@
* Copyright Advanced Micro Devices, Inc.
*/
+#include <linux/bitfield.h>
#include <linux/dev_printk.h>
#include <linux/dma-mapping.h>
#include <linux/err.h>
@@ -13,6 +14,7 @@
#include <linux/module.h>
#include <linux/pci.h>
+#include "mmio.h"
#include "sdxi.h"
enum sdxi_mmio_bars {
@@ -29,7 +31,8 @@ static int sdxi_pci_init(struct sdxi_dev *sdxi)
{
struct pci_dev *pdev = sdxi_to_pci_dev(sdxi);
struct device *dev = &pdev->dev;
- int ret;
+ unsigned int cap1_max_cxt;
+ int vecs, ret;
ret = pcim_enable_device(pdev);
if (ret)
@@ -49,12 +52,35 @@ static int sdxi_pci_init(struct sdxi_dev *sdxi)
return dev_err_probe(dev, PTR_ERR(sdxi->dbs),
"failed to map doorbell region\n");
+ /*
+ * Allocate the minimum required set of vectors plus one for
+ * each client context supported by the function.
+ */
+ cap1_max_cxt = FIELD_GET(SDXI_MMIO_CAP1_MAX_CXT,
+ sdxi_read64(sdxi, SDXI_MMIO_CAP1));
+ vecs = pci_alloc_irq_vectors(pdev, SDXI_MIN_VECTORS,
+ SDXI_MIN_VECTORS + cap1_max_cxt,
+ PCI_IRQ_MSI | PCI_IRQ_MSIX);
+ if (vecs < 0)
+ return dev_err_probe(dev, vecs,
+ "failed to allocate MSIs (max_cxt=%u)\n",
+ cap1_max_cxt);
+
+ sdxi->nr_vectors = vecs;
+ dev_dbg(sdxi->dev, "allocated %u vectors\n", sdxi->nr_vectors);
+
pci_set_master(pdev);
return 0;
}
+static int sdxi_pci_get_irq(struct sdxi_dev *sdxi, unsigned int nr)
+{
+ return pci_irq_vector(sdxi_to_pci_dev(sdxi), nr);
+}
+
static const struct sdxi_bus_ops sdxi_pci_ops = {
.init = sdxi_pci_init,
+ .get_irq = sdxi_pci_get_irq,
};
static int sdxi_pci_probe(struct pci_dev *pdev,
diff --git a/drivers/dma/sdxi/sdxi.h b/drivers/dma/sdxi/sdxi.h
index 4e29d1f90dae..72482d7c301e 100644
--- a/drivers/dma/sdxi/sdxi.h
+++ b/drivers/dma/sdxi/sdxi.h
@@ -8,8 +8,10 @@
#ifndef DMA_SDXI_H
#define DMA_SDXI_H
+#include <linux/bug.h>
#include <linux/compiler_types.h>
#include <linux/dev_printk.h>
+#include <linux/idr.h>
#include <linux/io.h>
#include <linux/types.h>
#include <linux/xarray.h>
@@ -25,6 +27,21 @@
#define L1_CXT_CTRL_PTR_SHIFT 6
#define L1_CXT_AKEY_PTR_SHIFT 12
+enum {
+ /*
+ * Per SDXI 1.0 3.4 Error Log, the error log interrupt is
+ * always vector 0.
+ */
+ SDXI_ERROR_VECTOR = 0,
+
+ /*
+ * Request at least one vector to account for the error log
+ * interrupt. Increment this if the driver gains more
+ * dedicated interrupts (e.g. one for the admin context).
+ */
+ SDXI_MIN_VECTORS = 1,
+};
+
struct sdxi_dev;
/**
@@ -37,6 +54,10 @@ struct sdxi_bus_ops {
* function initialization.
*/
int (*init)(struct sdxi_dev *sdxi);
+ /**
+ * @get_irq: Map device interrupt index to Linux IRQ number.
+ */
+ int (*get_irq)(struct sdxi_dev *sdxi, unsigned int index);
};
struct sdxi_dev {
@@ -59,12 +80,48 @@ struct sdxi_dev {
struct dma_pool *cxt_ctl_pool;
struct dma_pool *cst_blk_pool;
+ unsigned int nr_vectors;
+ struct ida vectors;
+
struct sdxi_cxt *admin_cxt;
struct xarray client_cxts; /* context id -> (struct sdxi_cxt *) */
const struct sdxi_bus_ops *bus_ops;
};
+/**
+ * sdxi_alloc_vector() - Allocate an interrupt vector.
+ *
+ * A vector that will have the same lifetime as the device does not
+ * need to be released explicitly. Otherwise the vector must be
+ * released with sdxi_free_vector().
+ */
+static inline int sdxi_alloc_vector(struct sdxi_dev *sdxi)
+{
+ return ida_alloc_range(&sdxi->vectors, SDXI_MIN_VECTORS,
+ sdxi->nr_vectors - 1, GFP_KERNEL);
+}
+
+/**
+ * sdxi_free_vector() - Release a previously allocated index.
+ */
+static inline void sdxi_free_vector(struct sdxi_dev *sdxi, unsigned int nr)
+{
+ ida_free(&sdxi->vectors, nr);
+}
+
+/**
+ * sdxi_vector_to_irq() - Translate an allocated interrupt vector to
+ * Linux IRQ number suitable for passing to
+ * request_irq() et al.
+ */
+static inline int sdxi_vector_to_irq(struct sdxi_dev *sdxi, unsigned int nr)
+{
+ /* Moan if the index isn't currently allocated. */
+ WARN_ON_ONCE(!ida_exists(&sdxi->vectors, nr));
+ return sdxi->bus_ops->get_irq(sdxi, nr);
+}
+
int sdxi_register(struct device *dev, const struct sdxi_bus_ops *ops);
void sdxi_unregister(struct device *dev);
--
2.54.0
* [PATCH v3 23/23] dmaengine: sdxi: Add DMA engine provider
2026-06-06 0:02 [PATCH v3 00/23] dmaengine: Smart Data Accelerator Interface (SDXI) basic support Nathan Lynch via B4 Relay
` (21 preceding siblings ...)
2026-06-06 0:02 ` [PATCH v3 22/23] dmaengine: sdxi: MSI/MSI-X vector allocation and mapping Nathan Lynch via B4 Relay
@ 2026-06-06 0:02 ` Nathan Lynch via B4 Relay
2026-06-06 0:33 ` sashiko-bot
22 siblings, 1 reply; 40+ messages in thread
From: Nathan Lynch via B4 Relay @ 2026-06-06 0:02 UTC (permalink / raw)
To: Vinod Koul, Frank Li
Cc: Bjorn Helgaas, David Rientjes, John.Kariuki, Jonathan Cameron,
Kinsey Ho, Mario Limonciello, PradeepVineshReddy.Kodamati,
Shivank Garg, Stephen Bates, Tycho Andersen, Wei Huang, Wei Xu,
dmaengine, linux-kernel, linux-pci, Nathan Lynch
From: Nathan Lynch <nathan.lynch@amd.com>
Register a DMA engine provider that implements memcpy. The number of
channels per SDXI function can be controlled via a module
parameter (dma_channels). The provider uses the virt-dma library.
Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Nathan Lynch <nathan.lynch@amd.com>
---
drivers/dma/sdxi/Kconfig | 1 +
drivers/dma/sdxi/Makefile | 1 +
drivers/dma/sdxi/device.c | 2 +
drivers/dma/sdxi/dma.c | 501 ++++++++++++++++++++++++++++++++++++++++++++++
drivers/dma/sdxi/dma.h | 11 +
5 files changed, 516 insertions(+)
diff --git a/drivers/dma/sdxi/Kconfig b/drivers/dma/sdxi/Kconfig
index 314b3e725ccb..85e23608405c 100644
--- a/drivers/dma/sdxi/Kconfig
+++ b/drivers/dma/sdxi/Kconfig
@@ -4,6 +4,7 @@ config SDXI_CORE
# SDXI 1.0 9 MMIO Control Registers.
depends on 64BIT
select DMA_ENGINE
+ select DMA_VIRTUAL_CHANNELS
help
Enable support for Smart Data Accelerator Interface (SDXI)
Platform Data Mover devices. SDXI is a vendor-neutral
diff --git a/drivers/dma/sdxi/Makefile b/drivers/dma/sdxi/Makefile
index 5218ae2c86ae..d7ea5776191e 100644
--- a/drivers/dma/sdxi/Makefile
+++ b/drivers/dma/sdxi/Makefile
@@ -5,6 +5,7 @@ sdxi-core-y := \
context.o \
descriptor.o \
device.o \
+ dma.o \
ring.o
obj-$(CONFIG_SDXI_PCI) += sdxi-pci.o
diff --git a/drivers/dma/sdxi/device.c b/drivers/dma/sdxi/device.c
index 26be376c9545..d6097a6d8992 100644
--- a/drivers/dma/sdxi/device.c
+++ b/drivers/dma/sdxi/device.c
@@ -22,6 +22,7 @@
#include <linux/xarray.h>
#include "context.h"
+#include "dma.h"
#include "hw.h"
#include "mmio.h"
#include "sdxi.h"
@@ -321,6 +322,7 @@ static int sdxi_device_init(struct sdxi_dev *sdxi)
if (err)
return err;
+ sdxi_dma_register(sdxi);
return 0;
}
diff --git a/drivers/dma/sdxi/dma.c b/drivers/dma/sdxi/dma.c
new file mode 100644
index 000000000000..01c85cc2763d
--- /dev/null
+++ b/drivers/dma/sdxi/dma.c
@@ -0,0 +1,501 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * SDXI dmaengine provider
+ *
+ * Copyright Advanced Micro Devices, Inc.
+ */
+
+#include <linux/cleanup.h>
+#include <linux/delay.h>
+#include <linux/dev_printk.h>
+#include <linux/container_of.h>
+#include <linux/dma-mapping.h>
+#include <linux/dmaengine.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/overflow.h>
+#include <linux/spinlock.h>
+
+#include "../dmaengine.h"
+#include "../virt-dma.h"
+#include "completion.h"
+#include "context.h"
+#include "descriptor.h"
+#include "dma.h"
+#include "ring.h"
+#include "sdxi.h"
+
+static unsigned short dma_channels = 1;
+module_param(dma_channels, ushort, 0644);
+MODULE_PARM_DESC(dma_channels, "DMA channels per function (default: 1)");
+
+/*
+ * An SDXI context is allocated for each channel configured.
+ *
+ * Each context has a descriptor ring with a minimum of 1K entries.
+ * SDXI supports a variety of primitive operations, e.g. copy,
+ * interrupt, nop. Each Linux virtual DMA descriptor may be composed
+ * of a grouping of SDXI descriptors in the ring. E.g. two SDXI
+ * descriptors (copy, then interrupt) to implement a
+ * dma_async_tx_descriptor for memcpy with DMA_PREP_INTERRUPT flag.
+ *
+ * dma_device->device_prep_dma_* functions reserve space in the
+ * descriptor ring and serialize SDXI descriptors implementing the
+ * operation to the reserved slots, leaving their valid (vl) bits
+ * clear. A single virtual descriptor is added to the allocated list.
+ *
+ * dma_async_tx_descriptor->tx_submit() invokes vchan_tx_submit(),
+ * which merely assigns a cookie and moves the txd to the submitted
+ * list without entering the SDXI provider code.
+ *
+ * dma_device->device_issue_pending() (sdxi_dma_issue_pending()) sets vl
+ * on each SDXI descriptor reachable from the submitted list, then
+ * rings the context doorbell. The submitted txds are moved to the
+ * issued list via vchan_issue_pending().
+ */
+
+struct sdxi_dma_chan {
+ struct virt_dma_chan vchan;
+ struct sdxi_cxt *cxt;
+ unsigned int vector;
+ unsigned int irq;
+ struct sdxi_akey_ent *akey;
+};
+
+struct sdxi_dma_dev {
+ struct dma_device dma_dev;
+ size_t nr_channels;
+ struct sdxi_dma_chan sdchan[] __counted_by(nr_channels);
+};
+
+/*
+ * A virtual descriptor can correspond to a group of SDXI hardware descriptors.
+ */
+struct sdxi_dma_desc {
+ struct virt_dma_desc vdesc;
+ struct sdxi_ring_resv resv;
+ struct sdxi_completion *completion;
+};
+
+static struct sdxi_dma_chan *to_sdxi_dma_chan(const struct dma_chan *dma_chan)
+{
+ const struct virt_dma_chan *vchan;
+
+ vchan = container_of_const(dma_chan, struct virt_dma_chan, chan);
+ return container_of(vchan, struct sdxi_dma_chan, vchan);
+}
+
+static struct sdxi_dma_desc *
+to_sdxi_dma_desc(const struct virt_dma_desc *vdesc)
+{
+ return container_of(vdesc, struct sdxi_dma_desc, vdesc);
+}
+
+static void sdxi_tx_desc_free(struct virt_dma_desc *vdesc)
+{
+ struct sdxi_dma_desc *sddesc = to_sdxi_dma_desc(vdesc);
+
+ sdxi_completion_free(sddesc->completion);
+ kfree(to_sdxi_dma_desc(vdesc));
+}
+
+static struct sdxi_dma_desc *
+prep_memcpy_intr(struct dma_chan *dma_chan, const struct sdxi_copy *params)
+{
+ struct sdxi_cxt *cxt = to_sdxi_dma_chan(dma_chan)->cxt;
+ struct sdxi_akey_ent *akey = to_sdxi_dma_chan(dma_chan)->akey;
+ struct sdxi_desc *copy, *intr;
+
+ struct sdxi_completion *comp __free(sdxi_completion) = sdxi_completion_alloc(cxt->sdxi);
+ if (!comp)
+ return NULL;
+
+ struct sdxi_dma_desc *sddesc __free(kfree) = kzalloc(sizeof(*sddesc), GFP_NOWAIT);
+ if (!sddesc)
+ return NULL;
+
+ if (sdxi_ring_try_reserve(cxt->ring_state, 2, &sddesc->resv))
+ return NULL;
+
+ copy = sdxi_ring_resv_next(&sddesc->resv);
+ (void)sdxi_encode_copy(copy, params); /* Caller checked validity. */
+ sdxi_desc_set_fence(copy); /* Conservatively fence every descriptor. */
+ sdxi_completion_attach(copy, comp);
+
+ sddesc->completion = no_free_ptr(comp);
+
+ intr = sdxi_ring_resv_next(&sddesc->resv);
+ sdxi_encode_intr(intr, &(const struct sdxi_intr) {
+ .akey = sdxi_akey_index(cxt, akey),
+ });
+ /* Raise the interrupt only after the copy has completed. */
+ sdxi_desc_set_fence(intr);
+ return_ptr(sddesc);
+}
+
+static struct sdxi_dma_desc *
+prep_memcpy_nointr(struct dma_chan *dma_chan, const struct sdxi_copy *params)
+{
+ struct sdxi_cxt *cxt = to_sdxi_dma_chan(dma_chan)->cxt;
+ struct sdxi_desc *copy;
+
+ struct sdxi_completion *comp __free(sdxi_completion) = sdxi_completion_alloc(cxt->sdxi);
+ if (!comp)
+ return NULL;
+
+ struct sdxi_dma_desc *sddesc __free(kfree) = kzalloc(sizeof(*sddesc), GFP_NOWAIT);
+ if (!sddesc)
+ return NULL;
+
+ if (sdxi_ring_try_reserve(cxt->ring_state, 1, &sddesc->resv))
+ return NULL;
+
+ copy = sdxi_ring_resv_next(&sddesc->resv);
+ (void)sdxi_encode_copy(copy, params); /* Caller checked validity. */
+ sdxi_completion_attach(copy, comp);
+
+ sddesc->completion = no_free_ptr(comp);
+ return_ptr(sddesc);
+}
+
+static struct dma_async_tx_descriptor *
+sdxi_dma_prep_memcpy(struct dma_chan *dma_chan, dma_addr_t dst,
+ dma_addr_t src, size_t len, unsigned long flags)
+{
+ struct sdxi_akey_ent *akey = to_sdxi_dma_chan(dma_chan)->akey;
+ struct sdxi_cxt *cxt = to_sdxi_dma_chan(dma_chan)->cxt;
+ u16 akey_index = sdxi_akey_index(cxt, akey);
+ struct sdxi_dma_desc *sddesc;
+ struct sdxi_copy copy = {
+ .src = src,
+ .dst = dst,
+ .src_akey = akey_index,
+ .dst_akey = akey_index,
+ .len = len,
+ };
+
+ /*
+ * Perform a trial encode to a dummy descriptor on the stack
+ * so we can reject bad inputs without touching the ring
+ * state.
+ */
+ if (sdxi_encode_copy(&(struct sdxi_desc){}, &copy))
+ return NULL;
+
+ sddesc = (flags & DMA_PREP_INTERRUPT) ?
+ prep_memcpy_intr(dma_chan, &copy) :
+ prep_memcpy_nointr(dma_chan, &copy);
+
+ if (!sddesc)
+ return NULL;
+
+ return vchan_tx_prep(to_virt_chan(dma_chan), &sddesc->vdesc, flags);
+}
+
+static enum dma_status sdxi_tx_status(struct dma_chan *chan,
+ dma_cookie_t cookie,
+ struct dma_tx_state *state)
+{
+ struct sdxi_dma_chan *sdchan = to_sdxi_dma_chan(chan);
+ struct sdxi_dma_desc *sddesc;
+ enum dma_status status;
+ struct virt_dma_desc *vdesc;
+
+ status = dma_cookie_status(chan, cookie, state);
+ if (status == DMA_COMPLETE)
+ return status;
+
+ guard(spinlock_irqsave)(&sdchan->vchan.lock);
+
+ vdesc = vchan_find_desc(&sdchan->vchan, cookie);
+ if (!vdesc)
+ return status;
+
+ sddesc = to_sdxi_dma_desc(vdesc);
+
+ if (WARN_ON_ONCE(!sddesc->completion))
+ return DMA_ERROR;
+
+ if (!sdxi_completion_signaled(sddesc->completion))
+ return DMA_IN_PROGRESS;
+
+ if (sdxi_completion_errored(sddesc->completion))
+ return DMA_ERROR;
+
+ list_del(&vdesc->node);
+ vchan_cookie_complete(vdesc);
+
+ return dma_cookie_status(chan, cookie, state);
+}
+
+static void sdxi_dma_issue_pending(struct dma_chan *dma_chan)
+{
+ struct virt_dma_chan *vchan = to_virt_chan(dma_chan);
+ struct virt_dma_desc *vdesc;
+ u64 dbval = 0;
+
+ scoped_guard(spinlock_irqsave, &vchan->lock) {
+ /*
+ * This can happen with racing submitters.
+ */
+ if (list_empty(&vchan->desc_submitted))
+ return;
+
+ list_for_each_entry(vdesc, &vchan->desc_submitted, node) {
+ struct sdxi_dma_desc *sddesc = to_sdxi_dma_desc(vdesc);
+ struct sdxi_desc *hwdesc;
+
+ sdxi_ring_resv_foreach(&sddesc->resv, hwdesc)
+ sdxi_desc_make_valid(hwdesc);
+ /*
+ * The reservations ought to be ordered
+ * ascending, but use umax() just in case.
+ */
+ dbval = umax(sdxi_ring_resv_dbval(&sddesc->resv), dbval);
+ }
+
+ vchan_issue_pending(vchan);
+ }
+
+ /*
+ * The implementation is required to handle out-of-order
+ * doorbell updates; we can do this after dropping the
+ * lock.
+ */
+ sdxi_cxt_push_doorbell(to_sdxi_dma_chan(dma_chan)->cxt, dbval);
+}
+
+static int sdxi_dma_terminate_all(struct dma_chan *dma_chan)
+{
+ struct virt_dma_chan *vchan = to_virt_chan(dma_chan);
+ u64 dbval = 0;
+
+ /*
+ * Allocated and submitted txds are in the ring but not valid
+ * yet. Overwrite them with nops and then set their valid
+ * bits.
+ *
+ * The implementation may start consuming these as soon as the
+ * valid bits flip. sdxi_dma_synchronize() will ensure they're
+ * all done.
+ */
+ scoped_guard(spinlock_irqsave, &vchan->lock) {
+ struct virt_dma_desc *vdesc;
+ LIST_HEAD(head);
+
+ list_splice_tail_init(&vchan->desc_allocated, &head);
+ list_splice_tail_init(&vchan->desc_submitted, &head);
+
+ if (list_empty(&head))
+ return 0;
+
+ list_for_each_entry(vdesc, &head, node) {
+ struct sdxi_dma_desc *sddesc = to_sdxi_dma_desc(vdesc);
+ struct sdxi_desc *hwdesc;
+
+ sdxi_ring_resv_foreach(&sddesc->resv, hwdesc) {
+ sdxi_serialize_nop(hwdesc);
+ sdxi_desc_make_valid(hwdesc);
+ }
+
+ dbval = umax(sdxi_ring_resv_dbval(&sddesc->resv), dbval);
+ }
+
+ list_splice_tail(&head, &vchan->desc_terminated);
+ }
+
+ sdxi_cxt_push_doorbell(to_sdxi_dma_chan(dma_chan)->cxt, dbval);
+
+ return 0;
+}
+
+static void sdxi_dma_synchronize(struct dma_chan *dma_chan)
+{
+ struct sdxi_cxt *cxt = to_sdxi_dma_chan(dma_chan)->cxt;
+ struct sdxi_ring_resv resv;
+ struct sdxi_desc *nop;
+ int err;
+
+ /* Submit a single nop with fence and wait for it to complete. */
+
+ if (sdxi_ring_reserve(cxt->ring_state, 1, &resv))
+ return;
+
+ struct sdxi_completion *comp __free(sdxi_completion) = sdxi_completion_alloc(cxt->sdxi);
+ if (!comp)
+ return;
+
+ nop = sdxi_ring_resv_next(&resv);
+ sdxi_serialize_nop(nop);
+ sdxi_completion_attach(nop, comp);
+ sdxi_desc_set_fence(nop);
+ sdxi_desc_make_valid(nop);
+ sdxi_cxt_push_doorbell(cxt, sdxi_ring_resv_dbval(&resv));
+
+ err = sdxi_completion_poll(comp);
+ WARN_ONCE(err, "got %d polling cst_blk", err);
+
+ vchan_synchronize(to_virt_chan(dma_chan));
+}
+
+static irqreturn_t sdxi_dma_cxt_irq(int irq, void *data)
+{
+ struct sdxi_dma_chan *sdchan = data;
+ struct virt_dma_chan *vchan = &sdchan->vchan;
+ struct virt_dma_desc *vdesc;
+ bool completed = false;
+
+ guard(spinlock_irqsave)(&vchan->lock);
+
+ while ((vdesc = vchan_next_desc(vchan))) {
+ struct sdxi_dma_desc *sddesc = to_sdxi_dma_desc(vdesc);
+
+ if (!sdxi_completion_signaled(sddesc->completion))
+ break;
+
+ list_del(&vdesc->node);
+ vchan_cookie_complete(&sddesc->vdesc);
+ completed = true;
+ }
+
+ if (completed)
+ sdxi_ring_wake_up(sdchan->cxt->ring_state);
+
+ return IRQ_HANDLED;
+}
+
+static int sdxi_dma_alloc_chan_resources(struct dma_chan *dma_chan)
+{
+ struct sdxi_dev *sdxi = dev_get_drvdata(dma_chan->device->dev);
+ struct sdxi_dma_chan *sdchan = to_sdxi_dma_chan(dma_chan);
+ int vector, irq, err;
+
+ sdchan->cxt = sdxi_cxt_new(sdxi);
+ if (!sdchan->cxt)
+ return -ENOMEM;
+ /*
+ * This irq and akey setup should perhaps all be pushed into
+ * the context allocation.
+ */
+ err = vector = sdxi_alloc_vector(sdxi);
+ if (vector < 0)
+ goto exit_cxt;
+
+ sdchan->vector = vector;
+
+ err = irq = sdxi_vector_to_irq(sdxi, vector);
+ if (irq < 0)
+ goto free_vector;
+
+ sdchan->irq = irq;
+
+ /*
+ * Note this akey entry is used for both the completion
+ * interrupt and source and destination access for copies.
+ */
+ sdchan->akey = sdxi_alloc_akey(sdchan->cxt);
+ if (!sdchan->akey) {
+ err = -ENOMEM;
+ goto free_vector;
+ }
+
+ *sdchan->akey = (typeof(*sdchan->akey)) {
+ .intr_num = cpu_to_le16(FIELD_PREP(SDXI_AKEY_ENT_VL, 1) |
+ FIELD_PREP(SDXI_AKEY_ENT_IV, 1) |
+ FIELD_PREP(SDXI_AKEY_ENT_INTR_NUM,
+ vector)),
+ };
+
+ err = request_irq(sdchan->irq, sdxi_dma_cxt_irq,
+ IRQF_TRIGGER_NONE, "SDXI DMAengine", sdchan);
+ if (err)
+ goto free_akey;
+
+ err = sdxi_start_cxt(sdchan->cxt);
+ if (err)
+ goto free_irq;
+
+ return 0;
+free_irq:
+ free_irq(sdchan->irq, sdchan);
+free_akey:
+ sdxi_free_akey(sdchan->cxt, sdchan->akey);
+free_vector:
+ sdxi_free_vector(sdxi, vector);
+exit_cxt:
+ sdxi_cxt_exit(sdchan->cxt);
+ return err;
+}
+
+static void sdxi_dma_free_chan_resources(struct dma_chan *dma_chan)
+{
+ struct sdxi_dma_chan *sdchan = to_sdxi_dma_chan(dma_chan);
+
+ sdxi_stop_cxt(sdchan->cxt);
+ free_irq(sdchan->irq, sdchan);
+ sdxi_free_vector(sdchan->cxt->sdxi, sdchan->vector);
+ sdxi_free_akey(sdchan->cxt, sdchan->akey);
+ vchan_free_chan_resources(to_virt_chan(dma_chan));
+ sdxi_cxt_exit(sdchan->cxt);
+}
+
+int sdxi_dma_register(struct sdxi_dev *sdxi)
+{
+ struct device *dev = sdxi->dev;
+ struct sdxi_dma_dev *sddev;
+ struct dma_device *dma_dev;
+ int err;
+
+ if (!dma_channels)
+ return 0;
+ /*
+ * Note that this code assumes the device supports the
+ * interrupt operation group (IntrGrp), which is optional. See
+ * SDXI 1.0 Table 6-1 SDXI Operation Groups.
+ *
+ * TODO: check sdxi->op_grp_cap for IntrGrp support and error
+ * out if it's missing.
+ */
+
+ sddev = devm_kzalloc(dev, struct_size(sddev, sdchan, dma_channels),
+ GFP_KERNEL);
+ if (!sddev)
+ return -ENOMEM;
+
+ sddev->nr_channels = dma_channels;
+
+ dma_dev = &sddev->dma_dev;
+ *dma_dev = (typeof(*dma_dev)) {
+ .dev = dev,
+ .src_addr_widths = DMA_SLAVE_BUSWIDTH_64_BYTES,
+ .dst_addr_widths = DMA_SLAVE_BUSWIDTH_64_BYTES,
+ .directions = BIT(DMA_MEM_TO_MEM),
+ .residue_granularity = DMA_RESIDUE_GRANULARITY_DESCRIPTOR,
+
+ .device_alloc_chan_resources = sdxi_dma_alloc_chan_resources,
+ .device_free_chan_resources = sdxi_dma_free_chan_resources,
+
+ .device_prep_dma_memcpy = sdxi_dma_prep_memcpy,
+
+ .device_terminate_all = sdxi_dma_terminate_all,
+ .device_synchronize = sdxi_dma_synchronize,
+ .device_tx_status = sdxi_tx_status,
+ .device_issue_pending = sdxi_dma_issue_pending,
+ };
+
+ dma_cap_set(DMA_MEMCPY, dma_dev->cap_mask);
+ INIT_LIST_HEAD(&dma_dev->channels);
+
+ for (size_t i = 0; i < sddev->nr_channels; ++i) {
+ struct sdxi_dma_chan *sdchan = &sddev->sdchan[i];
+
+ sdchan->vchan.desc_free = sdxi_tx_desc_free;
+ vchan_init(&sdchan->vchan, &sddev->dma_dev);
+ }
+
+ err = dmaenginem_async_device_register(dma_dev);
+ if (err)
+ return dev_warn_probe(dev, err, "failed to register dma device\n");
+
+ return 0;
+}
diff --git a/drivers/dma/sdxi/dma.h b/drivers/dma/sdxi/dma.h
new file mode 100644
index 000000000000..d38870ea7d91
--- /dev/null
+++ b/drivers/dma/sdxi/dma.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright Advanced Micro Devices, Inc. */
+
+#ifndef DMA_SDXI_DMA_H
+#define DMA_SDXI_DMA_H
+
+struct sdxi_dev;
+
+int sdxi_dma_register(struct sdxi_dev *sdxi);
+
+#endif /* DMA_SDXI_DMA_H */
--
2.54.0
* Re: [PATCH v3 10/23] dmaengine: sdxi: Complete administrative context jump start
2026-06-06 0:02 ` [PATCH v3 10/23] dmaengine: sdxi: Complete administrative context jump start Nathan Lynch via B4 Relay
@ 2026-06-06 0:12 ` sashiko-bot
0 siblings, 0 replies; 40+ messages in thread
From: sashiko-bot @ 2026-06-06 0:12 UTC (permalink / raw)
To: Nathan Lynch via B4 Relay; +Cc: dmaengine, vkoul, linux-pci, Frank.Li
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
Pre-existing issues:
- [High] Missing explicit device stop/reset on probe failure leads to DMA use-after-free
--
commit 416e2a783d5f21c412d3a7e804c33d0f978dad21
Author: Nathan Lynch <nathan.lynch@amd.com>
dmaengine: sdxi: Complete administrative context jump start
This commit introduces a helper to push the administrative context
doorbell. It then updates the device activation sequence to start
the admin context by pushing this doorbell after the device becomes
active.
> diff --git a/drivers/dma/sdxi/device.c b/drivers/dma/sdxi/device.c
> index e8f087f758fde..b875d8cdb58af 100644
> --- a/drivers/dma/sdxi/device.c
> +++ b/drivers/dma/sdxi/device.c
> @@ -269,7 +269,20 @@ static int sdxi_fn_activate(struct sdxi_dev *sdxi)
> * SDXI 1.0 4.1.8.9: Set MMIO_CTL0.fn_gsr to GSRV_ACTIVE and
> * wait for MMIO_STS0.fn_gsv to reach GSV_ACTIVE or GSV_ERROR.
> */
> - return sdxi_dev_start(sdxi);
> + err = sdxi_dev_start(sdxi);
> + if (err)
> + return err;
[Severity: High]
This is a pre-existing issue, but does this error path in sdxi_fn_activate()
leave the hardware active while its DMA-mapped structures are freed?
If sdxi_dev_start() fails during initialization, such as by timing out while
waiting for the device to transition to the active state, the function
returns an error and the device probe fails.
The devres framework then executes cleanup actions in reverse order of their
registration. This causes devres to free the DMA-coherent memory allocated
for the L1/L2 tables and the admin context before it invokes the PCI cleanup
that calls pcim_disable_device() to stop bus mastering.
During this window, the hardware device has been commanded to become active
and retains the pointers to the now-freed DMA memory. If the device completes
its transition and performs any DMA operations, could it corrupt freed kernel
memory and lead to a DMA use-after-free?
Should the device be explicitly commanded to stop or reset in the error
handling path before returning?
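The ordering concern above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions: struct fake_fn
and the fake_*() helpers are hypothetical stand-ins, not the driver's
sdxi_dev_start()/sdxi_dev_stop(), and "tables_freed" merely models
devres releasing the L1/L2 tables after a failed probe.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one SDXI function; not the driver's struct sdxi_dev. */
struct fake_fn {
	bool start_times_out;	/* force the activation to fail */
	bool running;		/* hardware may still access DMA memory */
	bool tables_freed;	/* models devres freeing the L1/L2 tables */
};

static int fake_dev_start(struct fake_fn *fn)
{
	fn->running = true;	/* the activate request has been issued */
	return fn->start_times_out ? -ETIMEDOUT : 0;
}

static void fake_dev_stop(struct fake_fn *fn)
{
	fn->running = false;
}

/* Activation with the explicit stop that the review asks about. */
static int fake_fn_activate(struct fake_fn *fn)
{
	int err = fake_dev_start(fn);

	if (err) {
		fake_dev_stop(fn);	/* quiesce before memory can be freed */
		return err;
	}
	return 0;
}

/* Models probe failure: the tables go away once activation reports an error. */
static int fake_probe(struct fake_fn *fn)
{
	int err = fake_fn_activate(fn);

	if (err)
		fn->tables_freed = true;
	return err;
}
```

In this model the function is never left running once its tables have
been freed; whether a plain stop is sufficient on real hardware is not
something the sketch can show.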
[ ... ]
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605-sdxi-base-v3-0-4d38ca2bdffe@amd.com?part=10
* Re: [PATCH v3 04/23] dmaengine: sdxi: Feature discovery and initial configuration
2026-06-06 0:02 ` [PATCH v3 04/23] dmaengine: sdxi: Feature discovery and initial configuration Nathan Lynch via B4 Relay
@ 2026-06-06 0:14 ` sashiko-bot
0 siblings, 0 replies; 40+ messages in thread
From: sashiko-bot @ 2026-06-06 0:14 UTC (permalink / raw)
To: Nathan Lynch via B4 Relay; +Cc: vkoul, linux-pci, Frank.Li, dmaengine
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Missing explicit case for `SDXI_GSV_STOP` in `sdxi_dev_stop()`, causing an unintended hardware reset and a race condition during initialization.
- [Medium] Failure to escalate soft stop to hard stop on timeout breaks kexec recovery.
--
commit 17aa147e1376754aa9259ca4aa25570606278ae6
Author: Nathan Lynch <nathan.lynch@amd.com>
dmaengine: sdxi: Feature discovery and initial configuration
This commit discovers device capabilities, introduces register definitions,
and performs initial configuration. It also attempts to force the SDXI
function to a stopped state during initialization to recover from kexec or
prior driver bugs.
> diff --git a/drivers/dma/sdxi/device.c b/drivers/dma/sdxi/device.c
> index 0974a83bb45ca..7c6652f9c3c04 100644
> --- a/drivers/dma/sdxi/device.c
> +++ b/drivers/dma/sdxi/device.c
[ ... ]
> +/* Get the device to the GSV_STOP state. */
> +static int sdxi_dev_stop(struct sdxi_dev *sdxi)
> +{
> + enum sdxi_fn_gsv status = sdxi_dev_gsv(sdxi);
> + int ret;
> +
> + dev_dbg(sdxi->dev, "attempting stop, current state: %s\n",
> + gsv_str(status));
> +
> + switch (status) {
> + case SDXI_GSV_INIT:
> + case SDXI_GSV_ACTIVE:
> + sdxi_write_fn_gsr(sdxi, SDXI_GSRV_STOP_SF);
> + break;
> + case SDXI_GSV_STOPG_SF:
> + sdxi_write_fn_gsr(sdxi, SDXI_GSRV_STOP_HD);
> + break;
> + case SDXI_GSV_STOPG_HD:
> + case SDXI_GSV_ERROR:
> + /*
> + * If hard-stopping, there's nothing to do but wait.
> + * If in error state, the reset is issued below.
> + */
> + break;
> + default:
> + /* Unrecognized state; try a reset. */
> + sdxi_write_fn_gsr(sdxi, SDXI_GSRV_RESET);
> + break;
> + }
[Severity: High]
Does this switch statement need an explicit case for SDXI_GSV_STOP?
If the device is already in the stopped state (which is the expected
post-reset state), the switch appears to fall through to the default block
and unnecessarily issues a hardware reset.
Immediately after the reset command is written, sdxi_dev_gsv_poll() reads
the status register. Since hardware resets take time to execute, the very
first read will likely still sample the old SDXI_GSV_STOP state. This
could cause the poll loop to exit immediately with success while the reset
is still actively running in the background.
Since the caller proceeds to configure critical control registers right
after this returns, could this lead to lost writes or corrupted state while
the hardware is resetting?
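One possible shape for the state-to-request mapping with an explicit
stopped case is sketched below as standalone C. The enum names are
hypothetical simplifications of the driver's sdxi_fn_gsv and GSRV
values, and GSR_NONE is an invented marker meaning "write nothing,
just poll".

```c
/* Hypothetical, simplified state and request values for illustration. */
enum gsv {
	GSV_STOP, GSV_INIT, GSV_ACTIVE,
	GSV_STOPG_SF, GSV_STOPG_HD, GSV_ERROR, GSV_UNKNOWN,
};

enum gsr { GSR_NONE, GSR_STOP_SF, GSR_STOP_HD, GSR_RESET };

/* Pick the request to write for the observed state. */
static enum gsr stop_request_for(enum gsv status)
{
	switch (status) {
	case GSV_STOP:
		return GSR_NONE;	/* already stopped: no reset needed */
	case GSV_INIT:
	case GSV_ACTIVE:
		return GSR_STOP_SF;
	case GSV_STOPG_SF:
		return GSR_STOP_HD;
	case GSV_STOPG_HD:
	case GSV_ERROR:
		return GSR_NONE;	/* wait, or reset later from the error state */
	default:
		return GSR_RESET;	/* unrecognized state; try a reset */
	}
}
```

With the stopped state handled separately, the default branch is only
reached for values the driver does not recognize.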
> +
> + /* Wait for transition to either stop or error state. */
> + ret = sdxi_dev_gsv_poll(sdxi, status,
> + status == SDXI_GSV_STOP ||
> + status == SDXI_GSV_ERROR);
> +
> + if (ret == 0 && status == SDXI_GSV_ERROR) {
> + sdxi_write_fn_gsr(sdxi, SDXI_GSRV_RESET);
> + ret = sdxi_dev_gsv_poll(sdxi, status, status == SDXI_GSV_STOP);
> + }
> +
> + if (ret) {
> + dev_err(sdxi->dev, "stop timed out, current state: %s\n",
> + gsv_str(status));
> + return ret;
> + }
[Severity: Medium]
If a soft stop times out, should this escalate to a hard stop or reset
instead of returning an error?
The commit message mentions forcing the device to a stopped state to
recover from kexec or driver bugs. However, if the device is in
SDXI_GSV_ACTIVE and the soft stop (SDXI_GSRV_STOP_SF) times out (perhaps
due to pending PCIe transactions hanging after a crash), this simply returns
-ETIMEDOUT.
Since probe is not automatically retried, does this leave the hardware
stuck in SDXI_GSV_STOPG_SF and break the intended kexec recovery?
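A minimal sketch of the escalation being asked about follows, written
as standalone C rather than driver code. The stop_fn callback, the
request ordering (soft stop, hard stop, reset), and the fake device are
all assumptions made for illustration; whether the SDXI state machine
accepts this sequence in every state would need checking against the
specification.

```c
#include <errno.h>

/* Hypothetical request levels, ordered from least to most forceful. */
enum stop_req { REQ_STOP_SF, REQ_STOP_HD, REQ_RESET, REQ_COUNT };

/*
 * Hypothetical callback: issue the request and wait for the stopped
 * state. Returns 0 on success or -ETIMEDOUT if the wait expires.
 */
typedef int (*stop_fn)(enum stop_req req, void *ctx);

/* Try each request in turn, escalating only when the previous one timed out. */
static int stop_with_escalation(stop_fn try_stop, void *ctx)
{
	int ret = -ETIMEDOUT;

	for (int req = REQ_STOP_SF; req < REQ_COUNT; req++) {
		ret = try_stop((enum stop_req)req, ctx);
		if (ret != -ETIMEDOUT)
			break;
	}
	return ret;
}

/* Fake device that ignores every request weaker than a threshold. */
struct fake_dev {
	enum stop_req weakest_effective;
	int attempts;
};

static int fake_try_stop(enum stop_req req, void *ctx)
{
	struct fake_dev *dev = ctx;

	dev->attempts++;
	return req >= dev->weakest_effective ? 0 : -ETIMEDOUT;
}
```

A device that only responds to a hard stop is reached on the second
attempt here, while a device that ignores everything still ends in
-ETIMEDOUT after the reset attempt.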
> +
> + return 0;
> +}
[ ... ]
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605-sdxi-base-v3-0-4d38ca2bdffe@amd.com?part=4
* Re: [PATCH v3 09/23] dmaengine: sdxi: Start functions on probe, stop on remove
2026-06-06 0:02 ` [PATCH v3 09/23] dmaengine: sdxi: Start functions on probe, stop on remove Nathan Lynch via B4 Relay
@ 2026-06-06 0:14 ` sashiko-bot
0 siblings, 0 replies; 40+ messages in thread
From: sashiko-bot @ 2026-06-06 0:14 UTC (permalink / raw)
To: Nathan Lynch via B4 Relay; +Cc: vkoul, linux-pci, dmaengine, Frank.Li
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
New issues:
- [High] Device continues DMA into freed memory if activation fails during probe or if soft-stop times out during driver removal.
- [High] Missing `pci_free_irq_vectors()` in the device remove callback and error paths leaks IRQ resources.
Pre-existing issues:
- [High] Missing switch case in `sdxi_dev_stop()` for `SDXI_GSV_STOP` causes a race condition and spurious reset.
--
commit fce324da74dfb9a79bfe266368af63dc843a0386
Author: Nathan Lynch <nathan.lynch@amd.com>
dmaengine: sdxi: Start functions on probe, stop on remove
This commit drives each SDXI function to the active state during probe by
writing to the GSR register and polling for the GSV_ACTIVE state. It also
introduces an unregister function to stop the device during driver removal.
> diff --git a/drivers/dma/sdxi/device.c b/drivers/dma/sdxi/device.c
> index 4d595e79b8ce5..e8f087f758fde 100644
> --- a/drivers/dma/sdxi/device.c
> +++ b/drivers/dma/sdxi/device.c
[ ... ]
> +static int sdxi_dev_start(struct sdxi_dev *sdxi)
> +{
> + enum sdxi_fn_gsv status = sdxi_dev_gsv(sdxi);
> + int ret;
[ ... ]
> + sdxi_write_fn_gsr(sdxi, SDXI_GSRV_ACTIVE);
> +
> + ret = sdxi_dev_gsv_poll(sdxi, status,
> + status == SDXI_GSV_ACTIVE ||
> + status == SDXI_GSV_ERROR);
> + if (ret) {
> + dev_err(sdxi->dev, "activation timed out, current state: %s\n",
> + gsv_str(status));
> + return ret;
> + }
[Severity: High]
Is there a risk of DMA use-after-free here?
If the device fails to reach the ACTIVE state within the timeout, this returns
-ETIMEDOUT without forcing the device to a hard stop or reset.
Since the error propagates up and causes the probe to fail, devres will
subsequently free DMA-mapped memory such as the L1 and L2 tables. If the
hardware state machine is still running and eventually activates, could it
continue to perform DMA into the freed memory?
[ ... ]
> @@ -290,6 +330,14 @@ int sdxi_register(struct device *dev, const struct sdxi_bus_ops *ops)
> }
> EXPORT_SYMBOL_NS_GPL(sdxi_register, "SDXI");
>
> +void sdxi_unregister(struct device *dev)
> +{
> + struct sdxi_dev *sdxi = dev_get_drvdata(dev);
> +
> + sdxi_dev_stop(sdxi);
> +}
[Severity: High]
Does this safely handle a timeout during stop?
Since the return value of sdxi_dev_stop() is ignored, the unbind process
will continue and free memory even if the hardware refused to stop. This
seems like it could also allow the device to continue DMA into freed memory.
[Severity: High]
This is a pre-existing issue, but looking at sdxi_dev_stop() itself, could
the missing SDXI_GSV_STOP case cause a spurious reset?
If sdxi_dev_stop() is called when the device is already in the STOP state:
drivers/dma/sdxi/device.c:sdxi_dev_stop() {
...
switch (status) {
case SDXI_GSV_INIT:
case SDXI_GSV_ACTIVE:
sdxi_write_fn_gsr(sdxi, SDXI_GSRV_STOP_SF);
break;
default:
/* Unrecognized state; try a reset. */
sdxi_write_fn_gsr(sdxi, SDXI_GSRV_RESET);
break;
}
/* Wait for transition to either stop or error state. */
ret = sdxi_dev_gsv_poll(sdxi, status,
status == SDXI_GSV_STOP ||
status == SDXI_GSV_ERROR);
...
}
The status falls through to the default case, writing SDXI_GSRV_RESET to the
hardware. The read_poll_timeout() then immediately reads the STOP state
(before the asynchronous reset takes effect) and returns success.
Could the subsequent initialization steps then execute concurrently with the
hardware reset, causing undefined device behavior?
> diff --git a/drivers/dma/sdxi/pci.c b/drivers/dma/sdxi/pci.c
> index 42e8af008b100..4d7d6812da6d8 100644
> --- a/drivers/dma/sdxi/pci.c
> +++ b/drivers/dma/sdxi/pci.c
[ ... ]
> +static void sdxi_pci_remove(struct pci_dev *pdev)
> +{
> + pci_disable_sriov(pdev);
> + sdxi_unregister(&pdev->dev);
> +}
[Severity: High]
Does this leak the IRQ vectors allocated during probe?
The driver allocates IRQ vectors using pci_alloc_irq_vectors() in
sdxi_pci_init(), but this remove callback completes without calling
pci_free_irq_vectors().
Similarly, if sdxi_device_init() fails in sdxi_register(), the error is
returned directly without freeing the IRQ vectors.
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605-sdxi-base-v3-0-4d38ca2bdffe@amd.com?part=9
* Re: [PATCH v3 06/23] dmaengine: sdxi: Allocate DMA pools
2026-06-06 0:02 ` [PATCH v3 06/23] dmaengine: sdxi: Allocate DMA pools Nathan Lynch via B4 Relay
@ 2026-06-06 0:15 ` sashiko-bot
0 siblings, 0 replies; 40+ messages in thread
From: sashiko-bot @ 2026-06-06 0:15 UTC (permalink / raw)
To: Nathan Lynch via B4 Relay; +Cc: vkoul, dmaengine, linux-pci, Frank.Li
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
New issues:
- [Medium] The DMA pool for Context Control (CXT_CTL) blocks is hardcoded to a 64-byte alignment, causing cacheline false sharing on architectures with larger cache lines.
Pre-existing issues:
- [High] The driver programs hardware registers with physical memory addresses but lacks a hardware quiesce/cleanup path on probe failure, leading to a structural DMA Use-After-Free (UAF).
--
commit 9cf4c3500101cb5a3bcbf3a8b8a42a1772787cc2
Author: Nathan Lynch <nathan.lynch@amd.com>
dmaengine: sdxi: Allocate DMA pools
This commit introduces C structures for context control, context status,
and completion status blocks. It creates DMA pools for each during SDXI
function initialization to ensure proper alignment and avoid cacheline
sharing.
> diff --git a/drivers/dma/sdxi/device.c b/drivers/dma/sdxi/device.c
> --- a/drivers/dma/sdxi/device.c
> +++ b/drivers/dma/sdxi/device.c
[ ... ]
> @@ -218,6 +221,43 @@ static int sdxi_fn_activate(struct sdxi_dev *sdxi)
> return 0;
> }
>
> +static int sdxi_device_init(struct sdxi_dev *sdxi)
> +{
[ ... ]
> + size = sizeof(struct sdxi_cxt_sts);
> + align = max(size, SMP_CACHE_BYTES);
> + sdxi->cxt_sts_pool = dmam_pool_create("CXT_STS", dev, size, align, 0);
> + if (!sdxi->cxt_sts_pool)
> + return -ENOMEM;
> +
> + size = align = sizeof(struct sdxi_cxt_ctl);
> + sdxi->cxt_ctl_pool = dmam_pool_create("CXT_CTL", dev, size, align, 0);
[Severity: Medium]
Does this fixed alignment introduce false sharing on architectures with
larger cache lines?
The commit message states the goal is to avoid cacheline sharing for
frequently-updated objects, but setting alignment strictly to the struct
size (64 bytes) means the allocator could pack two CXT_CTL objects into
a single 128-byte cache line on platforms where SMP_CACHE_BYTES > 64.
Should this use max(size, SMP_CACHE_BYTES) similar to how the other
pools in this function are aligned?
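To make the concern concrete, here is a minimal userspace sketch, not driver code. The helper names pool_align() and objs_per_line() and the MODEL_CACHE_BYTES value are hypothetical; MODEL_CACHE_BYTES merely stands in for SMP_CACHE_BYTES on a platform with 128-byte lines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for SMP_CACHE_BYTES on a 128-byte-line platform. */
#define MODEL_CACHE_BYTES 128u

/*
 * Pick the pool alignment as the larger of the object size and the
 * cache line size. The result is used as an alignment, so this model
 * requires both inputs to be powers of two.
 */
static size_t pool_align(size_t obj_size, size_t cache_bytes)
{
	assert(obj_size != 0 && (obj_size & (obj_size - 1)) == 0);
	assert(cache_bytes != 0 && (cache_bytes & (cache_bytes - 1)) == 0);
	return obj_size > cache_bytes ? obj_size : cache_bytes;
}

/* How many objects can share one cache line at a given alignment. */
static size_t objs_per_line(size_t align, size_t cache_bytes)
{
	return align >= cache_bytes ? 1 : cache_bytes / align;
}
```

With a 64-byte object and 128-byte lines, aligning to the object size lets two objects share a line, while pool_align() yields one object per line.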
> + if (!sdxi->cxt_ctl_pool)
> + return -ENOMEM;
[ ... ]
> + err = sdxi_fn_activate(sdxi);
> + if (err)
> + return err;
> +
> + return 0;
> +}
> +
> int sdxi_register(struct device *dev, const struct sdxi_bus_ops *ops)
> {
> struct sdxi_dev *sdxi;
> @@ -235,7 +275,7 @@ int sdxi_register(struct device *dev, const struct sdxi_bus_ops *ops)
> if (err)
> return err;
>
> - return sdxi_fn_activate(sdxi);
> + return sdxi_device_init(sdxi);
> }
[Severity: High]
This isn't a bug introduced by this patch, but does the probe sequence
leave the hardware active if an error occurs here?
Looking at sdxi_fn_activate(), it programs hardware registers with the
DMA addresses of devres-managed memory:
sdxi_fn_activate() {
...
cxt_l2 = FIELD_PREP(SDXI_MMIO_CXT_L2_PTR, sdxi->L2_dma >> ilog2(SZ_4K));
sdxi_write64(sdxi, SDXI_MMIO_CXT_L2, cxt_l2);
...
}
If a subsequent step fails (like sdxi_dev_start timing out), the function
returns an error and sdxi_device_init() propagates it back up to
sdxi_register().
Because devres actions execute in reverse order, the dmam_alloc_coherent
memory is freed before the PCI bus master bit is cleared. Can the active
device perform a DMA into freed memory during this window?
Additionally, if the driver is re-probed, pci_set_master() is called
before sdxi_fn_activate() can stop the device, which might cause the
hardware to immediately resume DMA to the old freed addresses.
Is there a need for a cleanup path on failure that explicitly stops the
device?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605-sdxi-base-v3-0-4d38ca2bdffe@amd.com?part=6
* Re: [PATCH v3 13/23] dmaengine: sdxi: Add unit tests for descriptor ring reservations
2026-06-06 0:02 ` [PATCH v3 13/23] dmaengine: sdxi: Add unit tests for descriptor ring reservations Nathan Lynch via B4 Relay
@ 2026-06-06 0:16 ` sashiko-bot
0 siblings, 0 replies; 40+ messages in thread
From: sashiko-bot @ 2026-06-06 0:16 UTC (permalink / raw)
To: Nathan Lynch via B4 Relay; +Cc: vkoul, dmaengine, linux-pci, Frank.Li
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
New issues:
- [Medium] The KUnit test advances the reservation iterator twice per loop iteration, verifying only half of the reserved descriptors and evaluating unsequenced arguments.
- [Medium] The `invalid` test intentionally triggers a kernel warning (`WARN_ONCE`) without suppressing it, which pollutes the kernel log and can panic CI environments that set panic_on_warn.
Pre-existing issues:
- [High] The ring reservation API updates the hardware-visible write pointer before descriptors are populated, risking the device processing uninitialized memory (TOCTOU) and lacking necessary memory barriers.
--
commit 4fc6ef8e997ae4fb9c2e324c9e00fd0b37015590
Author: Nathan Lynch <nathan.lynch@amd.com>
dmaengine: sdxi: Add unit tests for descriptor ring reservations
Add KUnit tests for the descriptor ring reservation API, covering valid
reservations and error paths. A .kunitconfig is included for ease of use. No
SDXI hardware is required to run these tests.
> diff --git a/drivers/dma/sdxi/ring_kunit.c b/drivers/dma/sdxi/ring_kunit.c
> new file mode 100644
> index 0000000000000..3bc7073e0c39a
> --- /dev/null
> +++ b/drivers/dma/sdxi/ring_kunit.c
> @@ -0,0 +1,105 @@
[ ... ]
> +static void valid(struct kunit *t)
> +{
[ ... ]
> + sdxi_ring_resv_foreach(&resv, desc) {
> + KUNIT_EXPECT_NOT_NULL_MSG(t, sdxi_ring_resv_next(&resv),
> + "unexpected null descriptor for index %llu", resv.iter);
> + }
[Severity: Medium]
Does this loop inadvertently consume two descriptors per iteration?
The macro sdxi_ring_resv_foreach() implicitly advances the iterator:
drivers/dma/sdxi/ring.h:
#define sdxi_ring_resv_foreach(resv_, desc_) \
for (sdxi_ring_resv_reset(resv_), \
desc_ = sdxi_ring_resv_next(resv_); \
desc_; \
desc_ = sdxi_ring_resv_next(resv_))
By calling sdxi_ring_resv_next() again inside the loop body, the test
skips validation for half of the elements.
Additionally, could calling sdxi_ring_resv_next(&resv) and reading
resv.iter in the same macro expansion lead to unsequenced argument
evaluation?
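The double advance can be reproduced with a small userspace model. This is only a sketch: the model_* names are hypothetical, and the iterator semantics (next() returns NULL once the reservation is exhausted) are an assumption based on the macro quoted above.

```c
#include <assert.h>
#include <stddef.h>

struct model_desc { int id; };

struct model_resv {
	struct model_desc *base;
	size_t nr;	/* descriptors in the reservation */
	size_t iter;	/* next descriptor to hand out */
};

static void model_resv_reset(struct model_resv *r)
{
	r->iter = 0;
}

static struct model_desc *model_resv_next(struct model_resv *r)
{
	return r->iter < r->nr ? &r->base[r->iter++] : NULL;
}

/* Same shape as the macro quoted from ring.h. */
#define model_resv_foreach(resv_, desc_)		\
	for (model_resv_reset(resv_),			\
	     desc_ = model_resv_next(resv_);		\
	     desc_;					\
	     desc_ = model_resv_next(resv_))

/* Mirrors the test as written: next() is called again in the body. */
static size_t count_checked_double(struct model_resv *r)
{
	struct model_desc *d;
	size_t checked = 0;

	model_resv_foreach(r, d) {
		if (model_resv_next(r))	/* consumes a second descriptor */
			checked++;
	}
	return checked;
}

/* Checks the loop variable instead, visiting every descriptor. */
static size_t count_checked_single(struct model_resv *r)
{
	struct model_desc *d;
	size_t checked = 0;

	model_resv_foreach(r, d)
		checked++;
	return checked;
}
```

In this model an 8-entry reservation yields only 4 checks with the extra next() call, and an odd-sized reservation would make the final in-body next() return NULL.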
> +
> + ri = cpu_to_le64(1);
[ ... ]
> +static void invalid(struct kunit *t)
> +{
[ ... ]
> + ri = wi = 0;
> + sdxi_ring_state_init(&rs, &ri, &wi, SZ_1K, descs);
> +
> + KUNIT_EXPECT_EQ(t, sdxi_ring_try_reserve(&rs, 0, &resv), -EINVAL);
[Severity: Medium]
Will this intentionally trigger a warning that pollutes the kernel log?
The test passes invalid arguments designed to trigger a WARN_ONCE in
sdxi_ring_try_reserve():
drivers/dma/sdxi/ring.c:sdxi_ring_try_reserve()
if (WARN_ONCE(nr < 1 || nr > rs->entries,
Since there is no framework mechanism in this test to handle or suppress
this warning, does it permanently consume the global warned state? This
could hide future issues and cause the test suite to panic if panic_on_warn=1
is set.
[Severity: High]
This is a pre-existing issue, but does the ring reservation API update the
hardware-visible write pointer before descriptors are populated?
Looking at sdxi_ring_try_reserve() in drivers/dma/sdxi/ring.c:
drivers/dma/sdxi/ring.c:sdxi_ring_try_reserve()
if (new_widx - ridx > rs->entries)
return -EBUSY;
sdxi_ring_state_store_widx(rs, new_widx);
It updates rs->write_index_ptr in hardware-visible DMA memory immediately
upon reservation, before the caller populates the returned descriptor range.
If Thread A reserves and begins filling a descriptor, and Thread B reserves,
fills, and rings the doorbell, would the device read the updated write
pointer and process Thread A's uninitialized descriptor?
Could this also require a preceding dma_wmb() barrier to ensure descriptor
writes complete before the write pointer is updated?
> + KUNIT_EXPECT_EQ(t, sdxi_ring_try_reserve(&rs, rs.entries + 1, &resv), -EINVAL);
> +
> + ri = cpu_to_le64(1);
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605-sdxi-base-v3-0-4d38ca2bdffe@amd.com?part=13
* Re: [PATCH v3 12/23] dmaengine: sdxi: Add descriptor ring management
2026-06-06 0:02 ` [PATCH v3 12/23] dmaengine: sdxi: Add descriptor ring management Nathan Lynch via B4 Relay
@ 2026-06-06 0:19 ` sashiko-bot
0 siblings, 0 replies; 40+ messages in thread
From: sashiko-bot @ 2026-06-06 0:19 UTC (permalink / raw)
To: Nathan Lynch via B4 Relay; +Cc: dmaengine, vkoul, linux-pci, Frank.Li
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Critical] Premature hardware index update causes a race condition (Hardware TOCTOU) allowing the hardware to execute uninitialized or stale descriptors.
- [Medium] Expensive 64-bit software division in the descriptor ring iterator.
- [Low] The `sdxi_ring_resv_foreach` macro evaluates its `resv_` argument multiple times, creating side-effect hazards.
--
commit 6a855f532756bc29a5250d62a29d1775d0ef6013
Author: Nathan Lynch <nathan.lynch@amd.com>
dmaengine: sdxi: Add descriptor ring management
Introduce a library for managing SDXI descriptor ring state. It
encapsulates determining the next free space in the ring to deposit
descriptors and performing the update of the write index correctly, as
well as iterating over slices (reservations) of the ring without
dealing directly with ring offsets/indexes.
> diff --git a/drivers/dma/sdxi/ring.c b/drivers/dma/sdxi/ring.c
> new file mode 100644
> index 0000000000000..91b28c7afbbf9
> --- /dev/null
> +++ b/drivers/dma/sdxi/ring.c
[ ... ]
> +static void sdxi_ring_state_store_widx(struct sdxi_ring_state *rs, u64 new_widx)
> +{
> + lockdep_assert_held(&rs->lock);
> + rs->write_index = new_widx;
> + WRITE_ONCE(*rs->write_index_ptr, cpu_to_le64(new_widx));
> +}
[Severity: Critical]
Is it safe to update the hardware-visible write index here before the
descriptors are actually populated?
In sdxi_ring_try_reserve(), this computes the new write index and immediately
writes it to the DMA memory visible to the hardware. The lock is then dropped,
and the caller populates descriptors out-of-order.
If the ring has wrapped around, the old descriptor in a reserved slot might
still have its Validity (V) bit set to 1 from the previous pass (since the
driver does not zero out descriptors upon reclamation).
If another thread concurrently reserves slots, populates them, and rings the
doorbell with its advanced index, the hardware could process all slots up to
that doorbell value. The hardware would then check the V bit on this
preempted thread's slot, find it still set to 1, and erroneously execute the
stale descriptor again, leading to memory corruption.
Can the driver decouple the reservation index from the hardware-visible
index, or clear the V bit inside the reservation lock to prevent this?
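One of the two options, clearing the V bit while the reservation is taken, could look roughly like the userspace model below. It is a sketch under stated assumptions, not the driver's code: all model_* names are hypothetical, C11 release/acquire atomics stand in for dma_wmb()/dma_rmb(), and the device is assumed to execute a slot only if it lies below the published write index and has V set.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_ENTRIES 4u	/* hypothetical ring size, power of two */

struct model_desc {
	_Atomic bool valid;	/* stands in for the descriptor V bit */
	uint32_t payload;
};

struct model_ring {
	struct model_desc entry[MODEL_ENTRIES];
	uint64_t read_index;		/* advanced by the device */
	uint64_t write_index;		/* driver-private copy */
	_Atomic uint64_t hw_write_index;	/* device-visible copy */
};

static void model_ring_init(struct model_ring *rs)
{
	for (unsigned int i = 0; i < MODEL_ENTRIES; i++) {
		atomic_init(&rs->entry[i].valid, false);
		rs->entry[i].payload = 0;
	}
	rs->read_index = 0;
	rs->write_index = 0;
	atomic_init(&rs->hw_write_index, 0);
}

/* Reserve nr slots; the real driver would hold the ring lock here. */
static int model_reserve(struct model_ring *rs, uint64_t nr, uint64_t *start)
{
	uint64_t widx = rs->write_index;

	assert(nr >= 1 && nr <= MODEL_ENTRIES);
	if (widx + nr - rs->read_index > MODEL_ENTRIES)
		return -EBUSY;

	/* Invalidate stale slots before the new index becomes visible. */
	for (uint64_t i = widx; i < widx + nr; i++)
		atomic_store_explicit(&rs->entry[i & (MODEL_ENTRIES - 1)].valid,
				      false, memory_order_relaxed);

	rs->write_index = widx + nr;
	/* Release ordering models a barrier ahead of the index update. */
	atomic_store_explicit(&rs->hw_write_index, rs->write_index,
			      memory_order_release);
	*start = widx;
	return 0;
}

/* Populate a slot, setting V last. */
static void model_fill(struct model_ring *rs, uint64_t index, uint32_t payload)
{
	struct model_desc *d = &rs->entry[index & (MODEL_ENTRIES - 1)];

	d->payload = payload;
	atomic_store_explicit(&d->valid, true, memory_order_release);
}

/* Assumed device rule: below the published index and V set. */
static bool model_device_would_execute(struct model_ring *rs, uint64_t index)
{
	return index < atomic_load_explicit(&rs->hw_write_index,
					    memory_order_acquire) &&
	       atomic_load_explicit(&rs->entry[index & (MODEL_ENTRIES - 1)].valid,
				    memory_order_acquire);
}
```

After a wrap, a slot reserved by one submitter stays ineligible until that submitter fills it, even if a second submitter has already filled and published a later slot.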
[ ... ]
> +static struct sdxi_desc *
> +sdxi_desc_ring_entry(const struct sdxi_ring_state *rs, u64 index)
> +{
> + return &rs->entry[do_div(index, rs->entries)];
> +}
[Severity: Medium]
Does this cause performance issues on 32-bit architectures?
Using do_div() here means performing a 64-by-32-bit division on every
descriptor access, which compiles to an expensive function call (__do_div64)
on 32-bit systems.
Since descriptor rings are typically sized as powers of 2 (e.g., SZ_1K),
could a bitwise AND (index & (rs->entries - 1)) be used instead?
Alternatively, the iterator could maintain a 32-bit wrapping array index
instead of repeatedly dividing the absolute 64-bit iteration counter.
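A minimal sketch of the mask-based lookup follows, as a userspace model with hypothetical names (ring_slot(), is_pow2_u32()). It is only valid if ring sizes are guaranteed to be powers of two, which the patch would need to enforce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool is_pow2_u32(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Map a monotonically increasing 64-bit index to a ring slot without
 * dividing. Requires a power-of-two entry count.
 */
static uint32_t ring_slot(uint64_t index, uint32_t entries)
{
	assert(is_pow2_u32(entries));
	return (uint32_t)(index & (uint64_t)(entries - 1));
}
```

For a power-of-two entry count the mask gives the same result as the remainder that do_div() returns.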
[ ... ]
> diff --git a/drivers/dma/sdxi/ring.h b/drivers/dma/sdxi/ring.h
> new file mode 100644
> index 0000000000000..d5682687c05cd
> --- /dev/null
> +++ b/drivers/dma/sdxi/ring.h
[ ... ]
> +#define sdxi_ring_resv_foreach(resv_, desc_) \
> + for (sdxi_ring_resv_reset(resv_), \
> + desc_ = sdxi_ring_resv_next(resv_); \
> + desc_; \
> + desc_ = sdxi_ring_resv_next(resv_))
[Severity: Low]
Can this macro cause side-effect hazards by evaluating its resv_ argument
multiple times?
The macro passes resv_ into sdxi_ring_resv_reset() once and into
sdxi_ring_resv_next() twice during loop execution. If a caller invokes this
macro with an expression that has side effects (e.g.,
sdxi_ring_resv_foreach(&reservations[i++], desc)), the side effect will be
evaluated multiple times per loop iteration.
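One possible shape for a single-evaluation variant is sketched below as a userspace model. The model_* names, including the model_resv_begin() helper, are hypothetical and not part of the patch. The macro argument is evaluated once and cached in a loop-scoped pointer.

```c
#include <assert.h>
#include <stddef.h>

struct model_desc { int id; };

struct model_resv {
	struct model_desc *base;
	size_t nr;
	size_t iter;
};

static struct model_desc *model_resv_next(struct model_resv *r)
{
	return r->iter < r->nr ? &r->base[r->iter++] : NULL;
}

/*
 * Reset the iterator, store the first descriptor in *first, and return
 * the reservation so the macro can cache the pointer.
 */
static struct model_resv *model_resv_begin(struct model_resv *r,
					   struct model_desc **first)
{
	assert(r != NULL);
	r->iter = 0;
	*first = model_resv_next(r);
	return r;
}

/* resv_ appears exactly once in the expansion. */
#define model_resv_foreach(resv_, desc_)				\
	for (struct model_resv *resv_it_ =				\
		     model_resv_begin((resv_), &(desc_));		\
	     (desc_);							\
	     (desc_) = model_resv_next(resv_it_))

/* Walk resvs[*i] with a post-increment in the macro argument. */
static size_t visit_with_side_effect(struct model_resv *resvs, size_t *i)
{
	struct model_desc *d;
	size_t visited = 0;

	model_resv_foreach(&resvs[(*i)++], d)
		visited++;
	return visited;
}
```

The trade-off is that nested uses shadow resv_it_, and the loop needs a C99-style declaration in the for initializer.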
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605-sdxi-base-v3-0-4d38ca2bdffe@amd.com?part=12
* Re: [PATCH v3 15/23] dmaengine: sdxi: Per-context access key (AKey) table entry allocator
2026-06-06 0:02 ` [PATCH v3 15/23] dmaengine: sdxi: Per-context access key (AKey) table entry allocator Nathan Lynch via B4 Relay
@ 2026-06-06 0:20 ` sashiko-bot
0 siblings, 0 replies; 40+ messages in thread
From: sashiko-bot @ 2026-06-06 0:20 UTC (permalink / raw)
To: Nathan Lynch via B4 Relay; +Cc: linux-pci, Frank.Li, vkoul, dmaengine
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Calling `ida_destroy()` on an uninitialized `struct ida` during error cleanup will cause a 'spinlock bad magic' BUG/panic.
--
commit bae483227e10a5de8c4656d9cc0ae199cc275f05
Author: Nathan Lynch <nathan.lynch@amd.com>
dmaengine: sdxi: Per-context access key (AKey) table entry allocator
Each SDXI context has a table of access keys (AKeys). SDXI descriptors
submitted to a context may refer to an AKey associated with that
context by its index in the table.
> diff --git a/drivers/dma/sdxi/context.c b/drivers/dma/sdxi/context.c
> index 907547ac450f2..9b0984842d9a7 100644
> --- a/drivers/dma/sdxi/context.c
> +++ b/drivers/dma/sdxi/context.c
> @@ -64,6 +65,7 @@ static void sdxi_free_cxt(struct sdxi_cxt *cxt)
> dma_free_coherent(sdxi->dev, sq->ring_size,
> sq->desc_ring, sq->ring_dma);
> kfree(cxt->sq);
> + ida_destroy(&cxt->akey_ida);
[Severity: High]
Will this trigger a spinlock bad magic BUG during error cleanup?
If sdxi_alloc_cxt() fails during its memory allocations:
drivers/dma/sdxi/context.c:sdxi_alloc_cxt() {
struct sdxi_cxt *cxt __free(sdxi_cxt) = kzalloc(sizeof(*cxt), GFP_KERNEL);
...
return NULL;
}
The __free(sdxi_cxt) cleanup handler automatically calls sdxi_free_cxt().
Because ida_init(&cxt->akey_ida) is only called later by external callers
like sdxi_admin_cxt_init() or sdxi_cxt_new(), the ida remains uninitialized
on these early error paths.
Calling ida_destroy() unconditionally here will try to acquire the
uninitialized XArray spinlock with xas_lock_irqsave(), which crashes the
kernel.
[ ... ]
> @@ -322,6 +324,7 @@ int sdxi_admin_cxt_init(struct sdxi_dev *sdxi)
> cxt->db = sdxi->dbs + cxt->id * sdxi->db_stride;
> sdxi_ring_state_init(cxt->ring_state, &sq->cxt_sts->read_index,
> sq->write_index, sq->ring_entries, sq->desc_ring);
> + ida_init(&cxt->akey_ida);
>
> err = sdxi_publish_cxt(cxt);
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605-sdxi-base-v3-0-4d38ca2bdffe@amd.com?part=15
* Re: [PATCH v3 20/23] dmaengine: sdxi: Encode nop, copy, and interrupt descriptors
2026-06-06 0:02 ` [PATCH v3 20/23] dmaengine: sdxi: Encode nop, copy, and interrupt descriptors Nathan Lynch via B4 Relay
@ 2026-06-06 0:20 ` sashiko-bot
0 siblings, 0 replies; 40+ messages in thread
From: sashiko-bot @ 2026-06-06 0:20 UTC (permalink / raw)
To: Nathan Lynch via B4 Relay; +Cc: linux-pci, vkoul, dmaengine, Frank.Li
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Use of WARN_ON_ONCE() for normal input validation leading to potential DoS, followed by a redundant clamp_val() call.
--
commit 7911601fb1520af0d6b7d063a371eaff05507d92
Author: Nathan Lynch <nathan.lynch@amd.com>
dmaengine: sdxi: Encode nop, copy, and interrupt descriptors
This commit introduces low-level support for serializing nop, copy, and
interrupt operations to the descriptor ring of an SDXI client context.
> diff --git a/drivers/dma/sdxi/descriptor.c b/drivers/dma/sdxi/descriptor.c
> --- a/drivers/dma/sdxi/descriptor.c
> +++ b/drivers/dma/sdxi/descriptor.c
[ ... ]
> @@ -7,12 +7,119 @@
>
> #include <kunit/visibility.h>
> #include <linux/bitfield.h>
> +#include <linux/bug.h>
> +#include <linux/range.h>
> +#include <linux/sizes.h>
> #include <linux/types.h>
> #include <asm/byteorder.h>
>
> #include "hw.h"
> #include "descriptor.h"
>
> +VISIBLE_IF_KUNIT int __must_check sdxi_encode_size32(u64 size, __le32 *dest)
> +{
> + /*
> + * sizes are encoded as value - 1:
> + * value encoding
> + * 1 0
> + * 2 1
> + * ...
> + * 4G 0xffffffff
> + */
> + if (WARN_ON_ONCE(size > SZ_4G) ||
> + WARN_ON_ONCE(size == 0))
> + return -EINVAL;
[Severity: Medium]
Is WARN_ON_ONCE() appropriate for input validation here?
Since dmaengine_prep_dma_memcpy() doesn't filter out 0 or sizes greater
than 4 GiB before calling the driver, requesting an unsupported transfer size
will trigger a stack dump, and potentially a system crash if panic_on_warn
is enabled.
> + size = clamp_val(size, 1, SZ_4G);
Also, since the condition above already returns -EINVAL for out-of-range
sizes, is this clamp_val() call a redundant no-op?
> + *dest = cpu_to_le32((u32)(size - 1));
> + return 0;
> +}
> +EXPORT_SYMBOL_IF_KUNIT(sdxi_encode_size32);
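For reference, the value-minus-one encoding with quiet validation can be sketched in userspace as follows. The names encode_size32() and MODEL_SZ_4G are hypothetical, and the model stores the result in host byte order, whereas the driver converts with cpu_to_le32().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SZ_4G (UINT64_C(1) << 32)	/* stand-in for SZ_4G */

/*
 * Encode a transfer size as (size - 1). Out-of-range input is
 * rejected with -EINVAL and no warning; no clamp is needed once
 * the range check has passed.
 */
static int encode_size32(uint64_t size, uint32_t *dest)
{
	assert(dest != NULL);
	if (size == 0 || size > MODEL_SZ_4G)
		return -EINVAL;
	*dest = (uint32_t)(size - 1);
	return 0;
}
```

A size of 1 encodes as 0 and 4 GiB encodes as 0xffffffff, matching the table in the quoted comment.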
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605-sdxi-base-v3-0-4d38ca2bdffe@amd.com?part=20
* Re: [PATCH v3 17/23] dmaengine: sdxi: Add completion status block API
2026-06-06 0:02 ` [PATCH v3 17/23] dmaengine: sdxi: Add completion status block API Nathan Lynch via B4 Relay
@ 2026-06-06 0:21 ` sashiko-bot
0 siblings, 0 replies; 40+ messages in thread
From: sashiko-bot @ 2026-06-06 0:21 UTC (permalink / raw)
To: Nathan Lynch via B4 Relay; +Cc: vkoul, dmaengine, Frank.Li, linux-pci
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] Misplaced dma_rmb() and missing READ_ONCE()/endian conversion in sdxi_completion_signaled() allow stale DMA data reads.
- [High] Race condition in sdxi_completion_poll() can cause false timeouts if the thread is preempted.
- [High] Freeing the completion block after a timeout without aborting the hardware allows the device to write to freed DMA memory.
--
commit 7e5654d9158f4ac258b9deab508138cef7d4089d
Author: Nathan Lynch <nathan.lynch@amd.com>
dmaengine: sdxi: Add completion status block API
Introduce an API for managing completion status blocks. These are
DMA-coherent buffers that may be optionally attached to SDXI
descriptors to signal completion.
> diff --git a/drivers/dma/sdxi/completion.c b/drivers/dma/sdxi/completion.c
> new file mode 100644
> index 0000000000000..7ffd034b129b8
> --- /dev/null
> +++ b/drivers/dma/sdxi/completion.c
[ ... ]
> +int sdxi_completion_poll(const struct sdxi_completion *sc)
> +{
> + unsigned long deadline = jiffies + msecs_to_jiffies(1000);
> +
> + while (le64_to_cpu(READ_ONCE(sc->cst_blk->signal)) != 0) {
> + if (time_after(jiffies, deadline))
> + return -ETIMEDOUT;
[Severity: High]
Could this return a false timeout if the thread is preempted?
If the executing thread checks that the signal is not 0, gets preempted,
and then resumes after the deadline, time_after() will evaluate to true
and the function will return -ETIMEDOUT. Would it be safer to do a final
check of the hardware signal after detecting a timeout to ensure the
device didn't complete the operation during the preemption window?
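The final re-check could be modeled in userspace like this. It is a sketch only: struct fake_clock and poll_signal() are hypothetical test scaffolding, the clock uses plain comparisons rather than the wrap-safe time_after(), and C11 acquire loads stand in for READ_ONCE() on the signal.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical test double: every read advances time by `step`, and
 * the signal is cleared once time reaches `clear_at`. A large step
 * imitates the poller being preempted while the device completes.
 */
struct fake_clock {
	uint64_t now;
	uint64_t step;
	uint64_t clear_at;
	_Atomic uint64_t *signal;
};

static uint64_t fake_clock_read(struct fake_clock *c)
{
	c->now += c->step;
	if (c->signal && c->now >= c->clear_at)
		atomic_store_explicit(c->signal, 0, memory_order_release);
	return c->now;
}

/* Poll until the signal reads 0 or the deadline passes. */
static int poll_signal(_Atomic uint64_t *signal, struct fake_clock *clk,
		       uint64_t timeout)
{
	uint64_t deadline = clk->now + timeout;

	assert(signal != NULL);
	while (atomic_load_explicit(signal, memory_order_acquire) != 0) {
		if (fake_clock_read(clk) > deadline) {
			/* Deadline passed: look once more before failing. */
			if (atomic_load_explicit(signal,
						 memory_order_acquire) == 0)
				break;
			return -ETIMEDOUT;
		}
	}
	return 0;
}
```

In the model, a completion that lands during a long gap between the signal check and the deadline check is reported as success rather than -ETIMEDOUT.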
[Severity: High]
When this loop times out and returns -ETIMEDOUT, the caller's cleanup
guard will free the completion block. Because the hardware operation is
never explicitly aborted, could the device eventually finish the stalled
descriptor and write to the freed DMA address?
This might corrupt memory if the DMA buffer has already been reallocated
from the pool for another descriptor or structure.
> + cpu_relax();
> + }
> +
> + return sdxi_completion_errored(sc) ? -EIO : 0;
> +}
> +
> +bool sdxi_completion_signaled(const struct sdxi_completion *sc)
> +{
> + dma_rmb();
> + return (sc->cst_blk->signal == 0);
[Severity: High]
Should the read memory barrier be placed after the signal is successfully
verified as 0?
Executing dma_rmb() before checking the signal allows the CPU to
speculatively fetch payload memory before the DMA engine has finished
writing it.
Also, does reading sc->cst_blk->signal require READ_ONCE() and
le64_to_cpu() to prevent compiler optimization and endianness issues,
similar to how it is read in sdxi_completion_poll()?
> +}
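The intended ordering (read the signal once, then order any dependent reads after it) can be illustrated with C11 atomics in userspace. This is an analogy rather than the driver's implementation: the model_* names are hypothetical, and an acquire load on CPU memory only approximates READ_ONCE() followed by dma_rmb() on DMA-coherent memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_cst_blk {
	_Atomic uint64_t signal;	/* nonzero while outstanding */
	uint64_t payload;		/* data published before signal */
};

/* Single load of the signal; later reads are ordered after it. */
static bool model_completion_signaled(struct model_cst_blk *blk)
{
	assert(blk != NULL);
	return atomic_load_explicit(&blk->signal, memory_order_acquire) == 0;
}

/*
 * Read the payload only after completion has been observed. Returns
 * false and leaves *out untouched while the operation is pending.
 */
static bool model_read_payload(struct model_cst_blk *blk, uint64_t *out)
{
	if (!model_completion_signaled(blk))
		return false;
	*out = blk->payload;
	return true;
}
```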
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605-sdxi-base-v3-0-4d38ca2bdffe@amd.com?part=17
* Re: [PATCH v3 19/23] dmaengine: sdxi: Provide context start and stop APIs
2026-06-06 0:02 ` [PATCH v3 19/23] dmaengine: sdxi: Provide context start and stop APIs Nathan Lynch via B4 Relay
@ 2026-06-06 0:22 ` sashiko-bot
0 siblings, 0 replies; 40+ messages in thread
From: sashiko-bot @ 2026-06-06 0:22 UTC (permalink / raw)
To: Nathan Lynch via B4 Relay; +Cc: vkoul, linux-pci, dmaengine, Frank.Li
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Critical] Silent failures in `sdxi_stop_cxt` do not stop the hardware but provide no error to the caller, leading to subsequent use-after-free of context resources.
- [High] DMA memory use-after-free triggered by `sdxi_completion_poll` timeout in `sdxi_start_cxt` and `sdxi_stop_cxt`.
--
commit 6a36436cc92c4877bfb050c28457f1ac8dbaf6a2
Author: Nathan Lynch <nathan.lynch@amd.com>
dmaengine: sdxi: Provide context start and stop APIs
This commit introduces high-level context start and stop APIs for SDXI.
These APIs encapsulate submitting administrative descriptors and waiting
for completion signaling.
> diff --git a/drivers/dma/sdxi/context.c b/drivers/dma/sdxi/context.c
> index 9b0984842d9a7..afd3414163428 100644
> --- a/drivers/dma/sdxi/context.c
> +++ b/drivers/dma/sdxi/context.c
[ ... ]
> +int sdxi_start_cxt(struct sdxi_cxt *cxt)
> +{
> + struct sdxi_cxt *adm = to_admin_cxt(cxt);
> + struct sdxi_desc *desc;
[ ... ]
> + sdxi_completion_attach(desc, sc);
> + sdxi_desc_make_valid(desc);
> + sdxi_cxt_push_doorbell(adm, sdxi_ring_resv_dbval(&resv));
> +
> + return sdxi_completion_poll(sc);
> +}
[Severity: High]
If sdxi_completion_poll() times out here, does this return early and
trigger the scope-based cleanup of the sdxi_completion structure?
If the hardware operation is just delayed and not actually aborted, could
the hardware later write to this completion block address after it has
been freed back to the DMA pool?
> +void sdxi_stop_cxt(struct sdxi_cxt *cxt)
> +{
> + struct sdxi_cxt *adm = to_admin_cxt(cxt);
> + struct sdxi_desc *stop, *sync;
> + struct sdxi_ring_resv resv;
> + int err;
> +
> + might_sleep();
> +
> + struct sdxi_completion *sc __free(sdxi_completion) =
> + sdxi_completion_alloc(cxt->sdxi);
> +
> + if (!sc)
> + return;
[Severity: Critical]
Since sdxi_stop_cxt() returns void, do these silent early returns leave
the caller unaware that the hardware context was never actually stopped?
If a caller proceeds to tear down the context and free its backing
descriptor ring and memory, could the active DMA engine continue processing
descriptors and corrupt memory?
> + /* This is not how to stop the admin context. */
> + if (WARN_ON(adm == cxt))
> + return;
> +
> + err = sdxi_ring_reserve(adm->ring_state, 2, &resv);
> + if (WARN_ON_ONCE(err))
> + return;
[Severity: Critical]
Similarly, if reserving space in the admin ring fails here, the function
returns silently without stopping the hardware. Could this lead to the same
scenario where context memory is freed while still in use by the hardware?
> + stop = sdxi_ring_resv_next(&resv);
> + sync = sdxi_ring_resv_next(&resv);
[ ... ]
> + sdxi_completion_attach(sync, sc);
> + sdxi_desc_make_valid(stop);
> + sdxi_desc_make_valid(sync);
> + sdxi_cxt_push_doorbell(adm, sdxi_ring_resv_dbval(&resv));
> +
> + WARN_ON(sdxi_completion_poll(sc));
> +}
[Severity: High]
Just as in sdxi_start_cxt(), if the poll function times out here, does the
early return free the in-use DMA memory without aborting the pending DMA
stop/sync operation?
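One way to surface these failures is for the stop path to return an error that gates the free. The userspace model below is a sketch under that assumption; struct model_cxt, its fault-injection field, and both functions are hypothetical and do not reflect the driver's actual structures.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_cxt {
	bool hw_running;
	bool memory_freed;
	int inject_stop_error;	/* hypothetical: 0 or a negative errno */
};

/* Report failure instead of returning void. */
static int model_stop_cxt(struct model_cxt *cxt)
{
	assert(cxt != NULL);
	if (cxt->inject_stop_error)
		return cxt->inject_stop_error;
	cxt->hw_running = false;
	return 0;
}

/* Free backing memory only once the context is known to be stopped. */
static int model_release_cxt(struct model_cxt *cxt)
{
	int err = model_stop_cxt(cxt);

	if (err)
		return err;	/* hardware may still run: keep memory */
	cxt->memory_freed = true;
	return 0;
}
```

When the stop fails, the model keeps the memory allocated, trading a leak for the absence of a device write into freed memory.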
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605-sdxi-base-v3-0-4d38ca2bdffe@amd.com?part=19
* Re: [PATCH v3 11/23] dmaengine: sdxi: Add client context alloc and release APIs
2026-06-06 0:02 ` [PATCH v3 11/23] dmaengine: sdxi: Add client context alloc and release APIs Nathan Lynch via B4 Relay
@ 2026-06-06 0:22 ` sashiko-bot
0 siblings, 0 replies; 40+ messages in thread
From: sashiko-bot @ 2026-06-06 0:22 UTC (permalink / raw)
To: Nathan Lynch via B4 Relay; +Cc: vkoul, Frank.Li, linux-pci, dmaengine
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Critical] Inverted teardown sequence in `sdxi_unregister()` causes DMA Use-After-Free and host Use-After-Free by freeing contexts while active.
- [High] Context DMA memory is freed without hardware synchronization, leading to use-after-free by the device.
--
commit f66266c7bb64ddb6a5bf8b858f8554af8ba46c5a
Author: Nathan Lynch <nathan.lynch@amd.com>
Subject: dmaengine: sdxi: Add client context alloc and release APIs
This commit exposes sdxi_cxt_new() and sdxi_cxt_exit() to create and release
SDXI contexts, tracking them in an xarray. It also introduces logic to
invalidate a context's L1 entry upon deallocation.
> diff --git a/drivers/dma/sdxi/context.c b/drivers/dma/sdxi/context.c
> index cc99fe9ecf928..2f9a29df5d9ce 100644
> --- a/drivers/dma/sdxi/context.c
> +++ b/drivers/dma/sdxi/context.c
[ ... ]
> +/* Invalidate a context. */
> +static void sdxi_rescind_cxt(struct sdxi_cxt *cxt)
> +{
> + u8 l1_idx = ID_TO_L1_INDEX(cxt->id);
> + struct sdxi_cxt_L1_ent *ent = &cxt->sdxi->L1_table->entry[l1_idx];
> +
> + invalidate_L1_entry(ent);
> + invalidate_cxtl_ctl(cxt->cxt_ctl);
> + /* todo: need to send DSC_CXT_UPD to admin */
> +}
[ ... ]
> +void sdxi_cxt_exit(struct sdxi_cxt *cxt)
> +{
> + if (WARN_ON(sdxi_cxt_is_admin(cxt)))
> + return;
> +
> + sdxi_rescind_cxt(cxt);
> + sdxi_free_cxt(cxt);
> +}
[Severity: High]
Does this code free context DMA memory without synchronizing with the hardware?
By clearing the valid bits but leaving the DSC_CXT_UPD synchronization for a
later TODO in sdxi_rescind_cxt(), the device is not notified and its internal
caches may still reference these context structures.
When sdxi_cxt_exit() immediately calls sdxi_free_cxt() afterwards, could the
hardware perform DMA accesses on memory that has been returned to the
allocators, leading to a device use-after-free?
[ ... ]
> diff --git a/drivers/dma/sdxi/device.c b/drivers/dma/sdxi/device.c
> index b875d8cdb58af..d3d1ad2f2effe 100644
> --- a/drivers/dma/sdxi/device.c
> +++ b/drivers/dma/sdxi/device.c
[ ... ]
> @@ -346,6 +348,12 @@ EXPORT_SYMBOL_NS_GPL(sdxi_register, "SDXI");
> void sdxi_unregister(struct device *dev)
> {
> struct sdxi_dev *sdxi = dev_get_drvdata(dev);
> + struct sdxi_cxt *cxt;
> + unsigned long index;
> +
> + xa_for_each(&sdxi->client_cxts, index, cxt)
> + sdxi_cxt_exit(cxt);
> + xa_destroy(&sdxi->client_cxts);
>
> sdxi_dev_stop(sdxi);
> }
[Severity: Critical]
Does this teardown sequence result in DMA memory corruption and a host
double-free?
By forcefully freeing all client context DMA memory via sdxi_cxt_exit()
before calling sdxi_dev_stop(), the device hardware is still active and the
contexts themselves have never been stopped via sdxi_stop_cxt().
Furthermore, since these contexts are allocated per channel, won't the DMA
engine framework later invoke sdxi_dma_free_chan_resources() on channel
release or devres teardown, which will attempt to stop and free the already
freed sdchan->cxt?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605-sdxi-base-v3-0-4d38ca2bdffe@amd.com?part=11
* Re: [PATCH v3 14/23] dmaengine: sdxi: Attach descriptor ring state to contexts
2026-06-06 0:02 ` [PATCH v3 14/23] dmaengine: sdxi: Attach descriptor ring state to contexts Nathan Lynch via B4 Relay
@ 2026-06-06 0:24 ` sashiko-bot
0 siblings, 0 replies; 40+ messages in thread
From: sashiko-bot @ 2026-06-06 0:24 UTC (permalink / raw)
To: Nathan Lynch via B4 Relay; +Cc: Frank.Li, linux-pci, vkoul, dmaengine
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
New issues:
- [High] Driver teardown frees descriptor ring state while threads may be waiting, causing a use-after-free.
Pre-existing issues:
- [High] This problem wasn't introduced by this patch, but hardware stalls lead to permanent D-state hangs in `sdxi_ring_reserve`.
--
commit 7d20c79380e3ebc097f2724f0a53221cb5571730
Author: Nathan Lynch <nathan.lynch@amd.com>
dmaengine: sdxi: Attach descriptor ring state to contexts
This commit allocates and attaches a sdxi_ring_state structure to each
sdxi_cxt upon context allocation. The ring state's lifetime is tied to
the context and is freed upon context release.
> diff --git a/drivers/dma/sdxi/context.c b/drivers/dma/sdxi/context.c
> index 2f9a29df5d9ce..907547ac450f2 100644
> --- a/drivers/dma/sdxi/context.c
> +++ b/drivers/dma/sdxi/context.c
[ ... ]
> @@ -63,6 +64,7 @@ static void sdxi_free_cxt(struct sdxi_cxt *cxt)
> dma_free_coherent(sdxi->dev, sq->ring_size,
> sq->desc_ring, sq->ring_dma);
> kfree(cxt->sq);
> + kfree(cxt->ring_state);
[Severity: High]
Does this free cause a use-after-free if threads are still waiting in
sdxi_ring_reserve()?
During teardown, contexts may be forcefully exited without stopping DMA
channels or ensuring active users have left. If sdxi_unregister() calls
sdxi_cxt_exit() -> sdxi_free_cxt(), the ring_state struct is freed.
Threads blocked in sdxi_ring_reserve() rely on the wait queue head and lock
inside this freed allocation:
drivers/dma/sdxi/ring.c:sdxi_ring_reserve() {
wait_event(rs->wqh,
(ret = sdxi_ring_try_reserve(rs, nr, resv)) != -EBUSY);
}
When a sleeping task is spuriously woken up, it could dereference the freed
rs pointer, leading to memory corruption.
[Severity: High]
This is a pre-existing issue, but can hardware stalls lead to permanent
D-state hangs here?
Looking at sdxi_ring_reserve(), the uninterruptible wait_event() lacks
a timeout and ignores signals:
drivers/dma/sdxi/ring.c:sdxi_ring_reserve() {
wait_event(rs->wqh,
(ret = sdxi_ring_try_reserve(rs, nr, resv)) != -EBUSY);
}
If the hardware encounters a fatal error and stops processing descriptors, it
will never increment the read index. Any process attempting to submit a command
will sleep indefinitely.
> kfree(cxt);
> }
>
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605-sdxi-base-v3-0-4d38ca2bdffe@amd.com?part=14
* Re: [PATCH v3 21/23] dmaengine: sdxi: Add unit tests for descriptor encoding
2026-06-06 0:02 ` [PATCH v3 21/23] dmaengine: sdxi: Add unit tests for descriptor encoding Nathan Lynch via B4 Relay
@ 2026-06-06 0:26 ` sashiko-bot
0 siblings, 0 replies; 40+ messages in thread
From: sashiko-bot @ 2026-06-06 0:26 UTC (permalink / raw)
To: Nathan Lynch via B4 Relay; +Cc: vkoul, Frank.Li, linux-pci, dmaengine
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Low] Copy-paste error in the `cxt_stop` test causing it to assert on an incorrect union field (`desc.cxt_start.vflags`).
--
commit cabf8e94603d8e7f4dadb05e7df3a65c6cd9cd8f
Author: Nathan Lynch <nathan.lynch@amd.com>
dmaengine: sdxi: Add unit tests for descriptor encoding
This commit adds unit tests for the SDXI descriptor encoder functions.
It uses the packing API to independently decode and verify the descriptor
layouts without relying on the driver's bitfield macros.
> diff --git a/drivers/dma/sdxi/descriptor_kunit.c b/drivers/dma/sdxi/descriptor_kunit.c
> new file mode 100644
> index 0000000000000..1f3c2e7ab2ddf
> --- /dev/null
> +++ b/drivers/dma/sdxi/descriptor_kunit.c
[ ... ]
> +static void cxt_stop(struct kunit *t)
> +{
> + struct unpacked_desc unpacked;
> + struct sdxi_cxt_stop stop = {
> + .range = sdxi_cxt_range_single(2),
> + };
> + struct sdxi_desc desc;
> +
> + desc_poison(&desc);
> + KUNIT_ASSERT_EQ(t, 0, sdxi_encode_cxt_stop(&desc, &stop));
> +
> + unpack_cxt_stop(&unpacked, &desc);
> +
> + /* Check op-specific fields. */
> + KUNIT_EXPECT_EQ(t, 0, desc.cxt_start.vflags);
[Severity: Low]
Is there a copy-paste error here? This is the cxt_stop test, but it checks
desc.cxt_start.vflags instead of desc.cxt_stop.vflags.
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605-sdxi-base-v3-0-4d38ca2bdffe@amd.com?part=21
* Re: [PATCH v3 08/23] dmaengine: sdxi: Install administrative context
2026-06-06 0:02 ` [PATCH v3 08/23] dmaengine: sdxi: Install administrative context Nathan Lynch via B4 Relay
@ 2026-06-06 0:26 ` sashiko-bot
0 siblings, 0 replies; 40+ messages in thread
From: sashiko-bot @ 2026-06-06 0:26 UTC (permalink / raw)
To: Nathan Lynch via B4 Relay; +Cc: linux-pci, vkoul, dmaengine, Frank.Li
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Missing device stop on probe error path leads to hardware DMA use-after-free.
--
commit 982a4ee81c07402d7c367dae3f394c267f1ffedd
Author: Nathan Lynch <nathan.lynch@amd.com>
dmaengine: sdxi: Install administrative context
Serialize the context control block, akey table, and L1 entry for the
admin context, making its descriptor ring, write index, and context
status block visible to the SDXI implementation once it is activated.
> diff --git a/drivers/dma/sdxi/context.c b/drivers/dma/sdxi/context.c
> index 443c231303aff..cc99fe9ecf928 100644
> --- a/drivers/dma/sdxi/context.c
> +++ b/drivers/dma/sdxi/context.c
[ ... ]
[Severity: High]
Can the error path return a timeout without stopping the hardware that was
just commanded to start?
drivers/dma/sdxi/device.c:sdxi_dev_start() {
    ...
    ret = sdxi_dev_gsv_poll(sdxi, status,
                            status == SDXI_GSV_ACTIVE ||
                            status == SDXI_GSV_ERROR);
    if (ret) {
        dev_err(sdxi->dev, "activation timed out, current state: %s\n",
                gsv_str(status));
        return ret;
    }
    ...
}
During probe, sdxi_fn_activate() commands the hardware to transition to the
ACTIVE state by writing SDXI_GSRV_ACTIVE in sdxi_dev_start(), and then polls
for it. If this polling times out, sdxi_dev_start() returns -ETIMEDOUT and
the device probe fails:
drivers/dma/sdxi/device.c:sdxi_fn_activate() {
    ...
    err = sdxi_dev_start(sdxi);
    if (err)
        return err;
    ...
}
However, the driver does not invoke sdxi_dev_stop() on this error path, nor
does it register a devres action to stop the hardware on teardown. As devres
unwinds the probe failure, it frees the DMA memory for the L1 table,
L2 table, and admin context.
Because the hardware was commanded to start and never stopped, could it
transition to ACTIVE asynchronously after the timeout and access the freed
DMA memory, resulting in IOMMU faults or memory corruption?
Should sdxi_dev_stop() be explicitly called on the error path, or registered
as a devm action?
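The first option can be sketched in a small userspace model: the start routine withdraws its own start request when polling fails, so the caller never unwinds memory while the hardware is still armed. Every name below (struct demo_dev, demo_dev_start(), and so on) is a hypothetical stand-in, not the driver's API.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_ETIMEDOUT 110

/* Hypothetical device model; not the driver's struct sdxi_dev. */
struct demo_dev {
	bool start_requested;	/* hardware has been told to go ACTIVE */
	bool poll_times_out;	/* knob: simulate an activation timeout */
};

void demo_request_start(struct demo_dev *d) { d->start_requested = true; }
void demo_request_stop(struct demo_dev *d) { d->start_requested = false; }

int demo_poll_active(const struct demo_dev *d)
{
	return d->poll_times_out ? -DEMO_ETIMEDOUT : 0;
}

/* Start the device; on timeout, withdraw the request before returning. */
int demo_dev_start(struct demo_dev *d)
{
	int ret;

	demo_request_start(d);
	ret = demo_poll_active(d);
	if (ret) {
		demo_request_stop(d);	/* nothing may be freed while armed */
		return ret;
	}
	return 0;
}

/* Returns 1 if a failed start leaves the start request pending, else 0. */
int demo_failed_start_leaves_hw_armed(void)
{
	struct demo_dev d = { .poll_times_out = true };

	assert(demo_dev_start(&d) == -DEMO_ETIMEDOUT);
	return d.start_requested ? 1 : 0;
}
```

Whether a stop request is sufficient on real hardware that has not yet reached ACTIVE is a question for the specification; the model only shows the control flow.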
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605-sdxi-base-v3-0-4d38ca2bdffe@amd.com?part=8
* Re: [PATCH v3 22/23] dmaengine: sdxi: MSI/MSI-X vector allocation and mapping
2026-06-06 0:02 ` [PATCH v3 22/23] dmaengine: sdxi: MSI/MSI-X vector allocation and mapping Nathan Lynch via B4 Relay
@ 2026-06-06 0:31 ` sashiko-bot
0 siblings, 0 replies; 40+ messages in thread
From: sashiko-bot @ 2026-06-06 0:31 UTC (permalink / raw)
To: Nathan Lynch via B4 Relay; +Cc: linux-pci, dmaengine, vkoul, Frank.Li
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
New issues:
- [High] Failure to check the return value of dma_set_mask_and_coherent().
Pre-existing issues:
- [Critical] Teardown ordering risks Use-After-Free: Software tracking structures and DMA memory are destroyed before the hardware is stopped.
--
commit 31c552b5f89a074f21648e62e100ec4d29388cf3
Author: Nathan Lynch <nathan.lynch@amd.com>
dmaengine: sdxi: MSI/MSI-X vector allocation and mapping
During PCI probe, allocate a vector per context supported by the
function as reported by the capability register, plus one for the
error log interrupt. Introduce wrapper functions for vector
allocation, freeing, and translation to IRQ numbers.
> diff --git a/drivers/dma/sdxi/device.c b/drivers/dma/sdxi/device.c
> --- a/drivers/dma/sdxi/device.c
> +++ b/drivers/dma/sdxi/device.c
[ ... ]
> @@ -355,6 +357,8 @@ void sdxi_unregister(struct device *dev)
> sdxi_cxt_exit(cxt);
> xa_destroy(&sdxi->client_cxts);
>
> + ida_destroy(&sdxi->vectors);
> +
> sdxi_dev_stop(sdxi);
> }
[Severity: Critical]
This isn't a bug introduced by this patch, but does this teardown ordering
risk a use-after-free?
In sdxi_unregister(), the driver destroys all client contexts and their
DMA descriptor rings before sdxi_dev_stop() is called to quiesce the
hardware.
Because the hardware is still active, could it write to the freed status
rings or generate interrupts, causing sdxi_dma_cxt_irq() to dereference
freed memory?
Additionally, since the DMA engine framework is unregistered after
sdxi_unregister() completes during devres cleanup, could a client freeing
a channel in that phase cause sdxi_dma_free_chan_resources() to double-free
the already destroyed context?
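The ordering concern can be illustrated with a toy model of reverse-order release, the order devres uses for managed resources. If "stop hardware" is registered after the rings are allocated, it runs before they are freed. The action stack and all demo_ names are assumptions made for this sketch, not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_ACTIONS 8

typedef void (*demo_action_fn)(void *);

/* Toy cleanup stack: actions run in reverse order of registration. */
struct demo_devres {
	demo_action_fn fn[DEMO_MAX_ACTIONS];
	void *arg[DEMO_MAX_ACTIONS];
	size_t count;
};

struct demo_log { char order[DEMO_MAX_ACTIONS + 1]; size_t n; };

int demo_add_action(struct demo_devres *dr, demo_action_fn fn, void *arg)
{
	if (dr->count == DEMO_MAX_ACTIONS)
		return -1;
	dr->fn[dr->count] = fn;
	dr->arg[dr->count] = arg;
	dr->count++;
	return 0;
}

void demo_release_all(struct demo_devres *dr)
{
	while (dr->count > 0) {
		dr->count--;
		dr->fn[dr->count](dr->arg[dr->count]);
	}
}

void demo_free_rings(void *p) { struct demo_log *l = p; l->order[l->n++] = 'F'; }
void demo_stop_hw(void *p) { struct demo_log *l = p; l->order[l->n++] = 'S'; }

/* Register "free rings", then "stop hardware"; return the first step run. */
char demo_first_teardown_step(void)
{
	struct demo_devres dr = { .count = 0 };
	struct demo_log log = { .n = 0 };

	demo_add_action(&dr, demo_free_rings, &log);
	demo_add_action(&dr, demo_stop_hw, &log);
	demo_release_all(&dr);
	return log.order[0];
}
```

In this model the stop action ('S') runs first and the free action ('F') second, which is the quiesce-then-free order the comment above asks about.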
> diff --git a/drivers/dma/sdxi/pci.c b/drivers/dma/sdxi/pci.c
> --- a/drivers/dma/sdxi/pci.c
> +++ b/drivers/dma/sdxi/pci.c
> @@ -29,7 +31,8 @@ static int sdxi_pci_init(struct sdxi_dev *sdxi)
>  {
>  	struct pci_dev *pdev = sdxi_to_pci_dev(sdxi);
>  	struct device *dev = &pdev->dev;
> -	int ret;
> +	unsigned int cap1_max_cxt;
> +	int vecs, ret;
>
>  	ret = pcim_enable_device(pdev);
>  	if (ret)
[Severity: High]
Does this function need to check the return value of
dma_set_mask_and_coherent()?
Looking at sdxi_pci_init(), it unconditionally configures a 64-bit DMA mask:
    dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
If the platform or bus rejects the 64-bit mask, could the driver proceed
to allocate and map DMA buffers under the false assumption that 64-bit
addressing is supported, resulting in silently truncated physical addresses
or IOMMU faults?
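A minimal sketch of the checked form, using a userspace stand-in for the mask-setting call. demo_set_mask() and struct demo_platform are hypothetical; they only model a platform that rejects masks wider than it supports.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_EIO 5
#define DEMO_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical platform model: the widest DMA address the bus accepts. */
struct demo_platform {
	unsigned int max_dma_bits;
	uint64_t mask;
};

/* Stand-in for the real mask-setting call: fails if the mask is too wide. */
int demo_set_mask(struct demo_platform *p, uint64_t mask)
{
	if (mask > DEMO_BIT_MASK(p->max_dma_bits))
		return -DEMO_EIO;
	p->mask = mask;
	return 0;
}

/* Init path that propagates the failure instead of continuing. */
int demo_pci_init(struct demo_platform *p)
{
	int ret;

	ret = demo_set_mask(p, DEMO_BIT_MASK(64));
	if (ret)
		return ret;
	return 0;
}

/* Convenience wrapper: run init on a platform with the given width. */
int demo_init_with_bits(unsigned int bits)
{
	struct demo_platform p = { .max_dma_bits = bits, .mask = 0 };

	return demo_pci_init(&p);
}
```

Whether the driver should fail probe or retry with a narrower mask is a design decision the model does not settle.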
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605-sdxi-base-v3-0-4d38ca2bdffe@amd.com?part=22
* Re: [PATCH v3 23/23] dmaengine: sdxi: Add DMA engine provider
2026-06-06 0:02 ` [PATCH v3 23/23] dmaengine: sdxi: Add DMA engine provider Nathan Lynch via B4 Relay
@ 2026-06-06 0:33 ` sashiko-bot
0 siblings, 0 replies; 40+ messages in thread
From: sashiko-bot @ 2026-06-06 0:33 UTC (permalink / raw)
To: Nathan Lynch via B4 Relay; +Cc: linux-pci, Frank.Li, vkoul, dmaengine
Thank you for your contribution! Sashiko AI review found 8 potential issue(s) to consider:
- [Critical] Polling timeout in `sdxi_dma_synchronize()` or `sdxi_stop_cxt()` frees the completion status block while the hardware DMA engine may still write to it.
- [Critical] The interrupt handler silently ignores DMA hardware errors, treating failed transfers as successful and masking data corruption.
- [High] `sdxi_dma_terminate_all()` fails to abort or reclaim issued transfers, violating `dmaengine_terminate_all` semantics and causing DMA Use-After-Free.
- [High] Error paths in `sdxi_dma_synchronize()` bypass `vchan_synchronize()` and leave uninitialized holes in the hardware ring, permanently stalling the channel.
- [High] `sdxi_tx_status()` returns `DMA_ERROR` without unlinking the errored descriptor, causing destructive state inconsistencies.
- [High] Polling `sdxi_tx_status()` completes descriptors out of order, stranding older unpolled transfers in `desc_issued`.
- [High] Freeing an unsubmitted descriptor leaves invalid slots in the hardware ring, permanently hanging the DMA channel.
- [Medium] Missing error check for `sdxi_dma_register` in `sdxi_device_init` ignores registration failures.
--
commit c729c81d44b7c609574437378a8416c4d4f6cf2b
Author: Nathan Lynch <nathan.lynch@amd.com>
dmaengine: sdxi: Add DMA engine provider
Register a DMA engine provider that implements memcpy. The number of
channels per SDXI function can be controlled via a module
parameter (dma_channels). The provider uses the virt-dma library.
> diff --git a/drivers/dma/sdxi/device.c b/drivers/dma/sdxi/device.c
> index 26be376c9545f..d6097a6d8992a 100644
> --- a/drivers/dma/sdxi/device.c
> +++ b/drivers/dma/sdxi/device.c
> @@ -321,6 +322,7 @@ static int sdxi_device_init(struct sdxi_dev *sdxi)
>  	if (err)
>  		return err;
>
> +	sdxi_dma_register(sdxi);
[Severity: Medium]
Is it intentional to ignore the return value of sdxi_dma_register() here?
If registration fails, the device probe will succeed even though the DMA
provider interface isn't actually registered.
> return 0;
> }
>
> diff --git a/drivers/dma/sdxi/dma.c b/drivers/dma/sdxi/dma.c
> new file mode 100644
> index 0000000000000..01c85cc2763d8
> --- /dev/null
> +++ b/drivers/dma/sdxi/dma.c
[ ... ]
> +static void sdxi_tx_desc_free(struct virt_dma_desc *vdesc)
> +{
> +	struct sdxi_dma_desc *sddesc = to_sdxi_dma_desc(vdesc);
> +
> +	sdxi_completion_free(sddesc->completion);
> +	kfree(to_sdxi_dma_desc(vdesc));
> +}
[Severity: High]
If a client frees a prepared descriptor without submitting it, won't the
reserved hardware slots be left in the ring with their valid bits cleared?
When subsequent descriptors are submitted, it seems the hardware will
permanently hang waiting for these abandoned slots to become valid.
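The stall can be reproduced in a toy ring model where the consumer processes slots strictly in order and stops at the first slot whose valid bit is clear. The ring layout and the "patch the abandoned slot" remedy are assumptions for illustration; the real descriptor ring and any fix the driver adopts may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_RING_SIZE 8

/* Hypothetical ring model; the real SDXI descriptor ring differs. */
struct demo_ring {
	bool valid[DEMO_RING_SIZE];
	size_t write_idx;	/* next slot to reserve (producer) */
	size_t read_idx;	/* next slot to consume (device) */
};

/* Reserve one slot and return its position in the ring. */
size_t demo_reserve(struct demo_ring *r)
{
	return r->write_idx++ % DEMO_RING_SIZE;
}

void demo_make_valid(struct demo_ring *r, size_t slot)
{
	r->valid[slot] = true;
}

/* Device model: consume in order, stop at the first slot not yet valid. */
size_t demo_device_run(struct demo_ring *r)
{
	size_t done = 0;

	while (r->read_idx < r->write_idx &&
	       r->valid[r->read_idx % DEMO_RING_SIZE]) {
		r->valid[r->read_idx % DEMO_RING_SIZE] = false;
		r->read_idx++;
		done++;
	}
	return done;
}

/* Reserve three slots, abandon the first, publish the other two. */
size_t demo_abandoned_slot(bool patch_abandoned)
{
	struct demo_ring r = { .write_idx = 0 };
	size_t a = demo_reserve(&r);
	size_t b = demo_reserve(&r);
	size_t c = demo_reserve(&r);

	if (patch_abandoned)
		demo_make_valid(&r, a);	/* e.g. rewrite the slot as a no-op */
	demo_make_valid(&r, b);
	demo_make_valid(&r, c);
	return demo_device_run(&r);
}
```

With the first slot abandoned the model consumes nothing, even though two later slots are valid; once the abandoned slot is filled in, all three are consumed.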
[ ... ]
> +static enum dma_status sdxi_tx_status(struct dma_chan *chan,
> +				      dma_cookie_t cookie,
> +				      struct dma_tx_state *state)
> +{
[ ... ]
> +	if (!sdxi_completion_signaled(sddesc->completion))
> +		return DMA_IN_PROGRESS;
> +
> +	if (sdxi_completion_errored(sddesc->completion))
> +		return DMA_ERROR;
[Severity: High]
If sdxi_completion_errored() returns true, this returns DMA_ERROR without
unlinking the descriptor from the desc_issued list.
Will the interrupt handler sdxi_dma_cxt_irq() then process this errored
descriptor, ignore the error, and successfully complete it via
vchan_cookie_complete()? Does this cause subsequent polling to report
DMA_COMPLETE and destroy the error state?
> +
> + list_del(&vdesc->node);
> + vchan_cookie_complete(vdesc);
[Severity: High]
If this function successfully calls vchan_cookie_complete(), doesn't it
unconditionally set chan->completed_cookie to this single descriptor's cookie?
If there are older, uncompleted descriptors preceding it in desc_issued, will
subsequent status checks for those older descriptors return DMA_COMPLETE
early?
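The effect follows from how a single "last completed" cookie is interpreted. The simplified model below ignores cookie wraparound and uses hypothetical demo_ names, but keeps the key rule that any cookie at or below the completed cookie is reported complete.

```c
#include <assert.h>

enum demo_status { DEMO_COMPLETE, DEMO_IN_PROGRESS };

/* Simplified channel: one counter for issued and one for completed cookies. */
struct demo_chan {
	int cookie;		/* last cookie handed out */
	int completed_cookie;	/* last cookie marked complete */
};

int demo_assign(struct demo_chan *c) { return ++c->cookie; }

void demo_complete(struct demo_chan *c, int cookie)
{
	c->completed_cookie = cookie;
}

/* No wraparound handling: everything <= completed_cookie counts as done. */
enum demo_status demo_status_of(const struct demo_chan *c, int cookie)
{
	return cookie <= c->completed_cookie ? DEMO_COMPLETE : DEMO_IN_PROGRESS;
}

/* Issue three transfers, complete only the third, then query the first. */
enum demo_status demo_out_of_order(void)
{
	struct demo_chan c = { 0, 0 };
	int first = demo_assign(&c);
	int third;

	demo_assign(&c);
	third = demo_assign(&c);
	demo_complete(&c, third);
	return demo_status_of(&c, first);
}
```

The first transfer is reported complete although nothing in the model ever finished it, which is the early-completion hazard described above.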
> +
> +	return dma_cookie_status(chan, cookie, state);
> +}
[ ... ]
> +static int sdxi_dma_terminate_all(struct dma_chan *dma_chan)
> +{
> +	struct virt_dma_chan *vchan = to_virt_chan(dma_chan);
> +	u64 dbval = 0;
[ ... ]
> +	scoped_guard(spinlock_irqsave, &vchan->lock) {
> +		struct virt_dma_desc *vdesc;
> +		LIST_HEAD(head);
> +
> +		list_splice_tail_init(&vchan->desc_allocated, &head);
> +		list_splice_tail_init(&vchan->desc_submitted, &head);
[Severity: High]
Does this completely ignore the desc_issued list and fail to halt the
hardware context using sdxi_stop_cxt()?
If the desc_issued list is ignored, won't the hardware continue executing
active transfers and writing to destination buffers that the client may have
already freed after calling terminate_all?
> +
> + if (list_empty(&head))
> + return 0;
[ ... ]
> +static void sdxi_dma_synchronize(struct dma_chan *dma_chan)
> +{
> +	struct sdxi_cxt *cxt = to_sdxi_dma_chan(dma_chan)->cxt;
> +	struct sdxi_ring_resv resv;
> +	struct sdxi_desc *nop;
> +	int err;
> +
> +	/* Submit a single nop with fence and wait for it to complete. */
> +
> +	if (sdxi_ring_reserve(cxt->ring_state, 1, &resv))
> +		return;
> +
> +	struct sdxi_completion *comp __free(sdxi_completion) = sdxi_completion_alloc(cxt->sdxi);
> +	if (!comp)
> +		return;
[Severity: High]
If sdxi_completion_alloc() fails and returns early, does this leave the
reserved hardware slot completely uninitialized with a valid bit of 0?
Will the hardware permanently hang when it encounters this slot on a
subsequent doorbell ring? Also, does returning early bypass
vchan_synchronize() while the channel might still be active?
> +
> +
> +	nop = sdxi_ring_resv_next(&resv);
> +	sdxi_serialize_nop(nop);
> +	sdxi_completion_attach(nop, comp);
> +	sdxi_desc_set_fence(nop);
> +	sdxi_desc_make_valid(nop);
> +	sdxi_cxt_push_doorbell(cxt, sdxi_ring_resv_dbval(&resv));
> +
> +	err = sdxi_completion_poll(comp);
> +	WARN_ONCE(err, "got %d polling cst_blk", err);
[Severity: Critical]
If sdxi_completion_poll() times out and returns -ETIMEDOUT, the function
returns and comp is automatically freed by the __free() cleanup wrapper.
Does this return the physical memory to the DMA pool while the hardware DMA
engine retains the physical address pointer?
If the hardware eventually processes the descriptor, will it blindly write
status updates to freed DMA memory that may have been reallocated?
(Note: this identical polling timeout issue also seems to affect
sdxi_stop_cxt() in context.c)
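One common way to avoid this class of bug is to treat a timed-out buffer as still owned by the device and keep it alive instead of freeing it. The userspace sketch below models that idea with a small quarantine list. All names are hypothetical, and whether quarantining, leaking, or resetting the function is appropriate for this driver is an open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_ENOMEM 12
#define DEMO_ETIMEDOUT 110
#define DEMO_QUARANTINE_MAX 4

struct demo_completion { volatile int signaled; };

/* Blocks the device may still write to; never handed back to the allocator. */
struct demo_quarantine {
	struct demo_completion *held[DEMO_QUARANTINE_MAX];
	size_t count;
};

int demo_poll(const struct demo_completion *c, unsigned int spins)
{
	while (spins--) {
		if (c->signaled)
			return 0;
	}
	return -DEMO_ETIMEDOUT;
}

int demo_sync(struct demo_quarantine *q, bool device_responds)
{
	struct demo_completion *comp = calloc(1, sizeof(*comp));
	int err;

	if (!comp)
		return -DEMO_ENOMEM;
	if (device_responds)
		comp->signaled = 1;	/* simulate the device's status write */

	err = demo_poll(comp, 1000);
	if (err) {
		/* The device may still write here: keep the block alive. */
		if (q->count < DEMO_QUARANTINE_MAX)
			q->held[q->count++] = comp;
		/* else: deliberately leaked rather than freed */
		return err;
	}
	free(comp);
	return 0;
}

/* Returns how many blocks are quarantined after one sync attempt. */
size_t demo_quarantined_after(bool device_responds)
{
	struct demo_quarantine q = { .count = 0 };
	size_t held;

	demo_sync(&q, device_responds);
	held = q.count;
	while (q.count)			/* demo only: the fake device is gone */
		free(q.held[--q.count]);
	return held;
}
```

The point of the model is the ownership rule: memory is released only on the path where the device has provably finished with it.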
> +
> +	vchan_synchronize(to_virt_chan(dma_chan));
> +}
> +
> +static irqreturn_t sdxi_dma_cxt_irq(int irq, void *data)
> +{
> +	struct sdxi_dma_chan *sdchan = data;
> +	struct virt_dma_chan *vchan = &sdchan->vchan;
> +	struct virt_dma_desc *vdesc;
> +	bool completed = false;
> +
> +	guard(spinlock_irqsave)(&vchan->lock);
> +
> +	while ((vdesc = vchan_next_desc(vchan))) {
> +		struct sdxi_dma_desc *sddesc = to_sdxi_dma_desc(vdesc);
> +
> +		if (!sdxi_completion_signaled(sddesc->completion))
> +			break;
> +
> +		list_del(&vdesc->node);
> +		vchan_cookie_complete(&sddesc->vdesc);
> +		completed = true;
> +	}
[Severity: Critical]
Does this interrupt handler blindly process signaled descriptors without
calling sdxi_completion_errored() to check for hardware errors?
If the hardware reported an error bit, does this mask the failure and treat
the transfer as successful, potentially causing undetected data corruption?
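A completion loop that records the error state before retiring each descriptor could look like the model below. The descriptor struct and result codes are hypothetical; in the kernel the outcome would presumably be carried in the descriptor's result field, which is an assumption here rather than something the patch shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_result { DEMO_PENDING, DEMO_OK, DEMO_FAILED };

/* Hypothetical descriptor: completion flags plus the recorded outcome. */
struct demo_desc {
	bool signaled;
	bool errored;
	enum demo_result result;
};

/* Walk descriptors in issue order; stop at the first one not signaled. */
size_t demo_irq(struct demo_desc *descs, size_t n)
{
	size_t retired = 0;

	for (size_t i = 0; i < n; i++) {
		if (!descs[i].signaled)
			break;
		/* Record the error before completing, so it is not lost. */
		descs[i].result = descs[i].errored ? DEMO_FAILED : DEMO_OK;
		retired++;
	}
	return retired;
}

/* Three descriptors: ok, errored, still running. Returns the second result. */
enum demo_result demo_second_result(void)
{
	struct demo_desc d[3] = {
		{ .signaled = true },
		{ .signaled = true, .errored = true },
		{ .signaled = false },
	};

	demo_irq(d, 3);
	return d[1].result;
}

/* Same scenario; returns how many descriptors were retired. */
size_t demo_retired_count(void)
{
	struct demo_desc d[3] = {
		{ .signaled = true },
		{ .signaled = true, .errored = true },
		{ .signaled = false },
	};

	return demo_irq(d, 3);
}
```

Both signaled descriptors are retired, but the second carries a failure result that a client can observe instead of a silent success.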
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605-sdxi-base-v3-0-4d38ca2bdffe@amd.com?part=23
Thread overview: 40+ messages
2026-06-06 0:02 [PATCH v3 00/23] dmaengine: Smart Data Accelerator Interface (SDXI) basic support Nathan Lynch via B4 Relay
2026-06-06 0:02 ` [PATCH v3 01/23] PCI: Add SNIA SDXI accelerator sub-class Nathan Lynch via B4 Relay
2026-06-06 0:02 ` [PATCH v3 02/23] MAINTAINERS: Add entry for SDXI driver Nathan Lynch via B4 Relay
2026-06-06 0:02 ` [PATCH v3 03/23] dmaengine: sdxi: Add PCI initialization Nathan Lynch via B4 Relay
2026-06-06 0:02 ` [PATCH v3 04/23] dmaengine: sdxi: Feature discovery and initial configuration Nathan Lynch via B4 Relay
2026-06-06 0:14 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 05/23] dmaengine: sdxi: Configure context tables Nathan Lynch via B4 Relay
2026-06-06 0:02 ` [PATCH v3 06/23] dmaengine: sdxi: Allocate DMA pools Nathan Lynch via B4 Relay
2026-06-06 0:15 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 07/23] dmaengine: sdxi: Allocate administrative context Nathan Lynch via B4 Relay
2026-06-06 0:02 ` [PATCH v3 08/23] dmaengine: sdxi: Install administrative context Nathan Lynch via B4 Relay
2026-06-06 0:26 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 09/23] dmaengine: sdxi: Start functions on probe, stop on remove Nathan Lynch via B4 Relay
2026-06-06 0:14 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 10/23] dmaengine: sdxi: Complete administrative context jump start Nathan Lynch via B4 Relay
2026-06-06 0:12 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 11/23] dmaengine: sdxi: Add client context alloc and release APIs Nathan Lynch via B4 Relay
2026-06-06 0:22 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 12/23] dmaengine: sdxi: Add descriptor ring management Nathan Lynch via B4 Relay
2026-06-06 0:19 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 13/23] dmaengine: sdxi: Add unit tests for descriptor ring reservations Nathan Lynch via B4 Relay
2026-06-06 0:16 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 14/23] dmaengine: sdxi: Attach descriptor ring state to contexts Nathan Lynch via B4 Relay
2026-06-06 0:24 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 15/23] dmaengine: sdxi: Per-context access key (AKey) table entry allocator Nathan Lynch via B4 Relay
2026-06-06 0:20 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 16/23] dmaengine: sdxi: Generic descriptor manipulation helpers Nathan Lynch via B4 Relay
2026-06-06 0:02 ` [PATCH v3 17/23] dmaengine: sdxi: Add completion status block API Nathan Lynch via B4 Relay
2026-06-06 0:21 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 18/23] dmaengine: sdxi: Encode context start, stop, and sync descriptors Nathan Lynch via B4 Relay
2026-06-06 0:02 ` [PATCH v3 19/23] dmaengine: sdxi: Provide context start and stop APIs Nathan Lynch via B4 Relay
2026-06-06 0:22 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 20/23] dmaengine: sdxi: Encode nop, copy, and interrupt descriptors Nathan Lynch via B4 Relay
2026-06-06 0:20 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 21/23] dmaengine: sdxi: Add unit tests for descriptor encoding Nathan Lynch via B4 Relay
2026-06-06 0:26 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 22/23] dmaengine: sdxi: MSI/MSI-X vector allocation and mapping Nathan Lynch via B4 Relay
2026-06-06 0:31 ` sashiko-bot
2026-06-06 0:02 ` [PATCH v3 23/23] dmaengine: sdxi: Add DMA engine provider Nathan Lynch via B4 Relay
2026-06-06 0:33 ` sashiko-bot