* [RFC PATCH v3 01/35] PCI: endpoint: pci-epf-vntb: Use array_index_nospec() on mws_size[] access
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
@ 2025-12-17 15:15 ` Koichiro Den
2025-12-19 3:08 ` Frank Li
2025-12-17 15:15 ` [RFC PATCH v3 02/35] NTB: epf: Add mwN_offset support and config region versioning Koichiro Den
` (34 subsequent siblings)
35 siblings, 1 reply; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:15 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
Follow common kernel idioms for indices derived from configfs attributes
and suppress Smatch warnings:
epf_ntb_mw1_show() warn: potential spectre issue 'ntb->mws_size' [r]
epf_ntb_mw1_store() warn: potential spectre issue 'ntb->mws_size' [w]
Also fix the error message for out-of-range MW indices and %lld format
for unsigned values.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
Note: I noticed [RFC PATCH v2 01/27] resurrected the Smatch warnings
https://lore.kernel.org/all/20251129160405.2568284-2-den@valinux.co.jp/
This RFC v3 version therefore reverts to the RFC v1 style, with one
additional fix to correct the sprintf format specifier (%lld->%llu).
---
drivers/pci/endpoint/functions/pci-epf-vntb.c | 24 +++++++++++--------
1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
index 3ecc5059f92b..56aab5d354d6 100644
--- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
+++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
@@ -995,17 +995,19 @@ static ssize_t epf_ntb_##_name##_show(struct config_item *item, \
struct config_group *group = to_config_group(item); \
struct epf_ntb *ntb = to_epf_ntb(group); \
struct device *dev = &ntb->epf->dev; \
- int win_no; \
+ int win_no, idx; \
\
if (sscanf(#_name, "mw%d", &win_no) != 1) \
return -EINVAL; \
\
- if (win_no <= 0 || win_no > ntb->num_mws) { \
- dev_err(dev, "Invalid num_nws: %d value\n", ntb->num_mws); \
+ idx = win_no - 1; \
+ if (idx < 0 || idx >= ntb->num_mws) { \
+ dev_err(dev, "MW%d out of range (num_mws=%d)\n", \
+ win_no, ntb->num_mws); \
return -EINVAL; \
} \
- \
- return sprintf(page, "%lld\n", ntb->mws_size[win_no - 1]); \
+ idx = array_index_nospec(idx, ntb->num_mws); \
+ return sprintf(page, "%llu\n", ntb->mws_size[idx]); \
}
#define EPF_NTB_MW_W(_name) \
@@ -1015,7 +1017,7 @@ static ssize_t epf_ntb_##_name##_store(struct config_item *item, \
struct config_group *group = to_config_group(item); \
struct epf_ntb *ntb = to_epf_ntb(group); \
struct device *dev = &ntb->epf->dev; \
- int win_no; \
+ int win_no, idx; \
u64 val; \
int ret; \
\
@@ -1026,12 +1028,14 @@ static ssize_t epf_ntb_##_name##_store(struct config_item *item, \
if (sscanf(#_name, "mw%d", &win_no) != 1) \
return -EINVAL; \
\
- if (win_no <= 0 || win_no > ntb->num_mws) { \
- dev_err(dev, "Invalid num_nws: %d value\n", ntb->num_mws); \
+ idx = win_no - 1; \
+ if (idx < 0 || idx >= ntb->num_mws) { \
+ dev_err(dev, "MW%d out of range (num_mws=%d)\n", \
+ win_no, ntb->num_mws); \
return -EINVAL; \
} \
- \
- ntb->mws_size[win_no - 1] = val; \
+ idx = array_index_nospec(idx, ntb->num_mws); \
+ ntb->mws_size[idx] = val; \
\
return len; \
}
--
2.51.0
^ permalink raw reply related [flat|nested] 61+ messages in thread

* Re: [RFC PATCH v3 01/35] PCI: endpoint: pci-epf-vntb: Use array_index_nospec() on mws_size[] access
2025-12-17 15:15 ` [RFC PATCH v3 01/35] PCI: endpoint: pci-epf-vntb: Use array_index_nospec() on mws_size[] access Koichiro Den
@ 2025-12-19 3:08 ` Frank Li
0 siblings, 0 replies; 61+ messages in thread
From: Frank Li @ 2025-12-19 3:08 UTC (permalink / raw)
To: Koichiro Den
Cc: dave.jiang, ntb, linux-pci, dmaengine, linux-renesas-soc, netdev,
linux-kernel, mani, kwilczynski, kishon, bhelgaas, corbet,
geert+renesas, magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro,
will, robin.murphy, jdmason, allenbh, andrew+netdev, davem,
edumazet, kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k,
kurt.schwemmer, logang, jingoohan1, lpieralisi, utkarsh02t,
jbrunet, dlemoal, arnd, elfring
On Thu, Dec 18, 2025 at 12:15:35AM +0900, Koichiro Den wrote:
> Follow common kernel idioms for indices derived from configfs attributes
> and suppress Smatch warnings:
>
> epf_ntb_mw1_show() warn: potential spectre issue 'ntb->mws_size' [r]
> epf_ntb_mw1_store() warn: potential spectre issue 'ntb->mws_size' [w]
>
> Also fix the error message for out-of-range MW indices and %lld format
> for unsigned values.
>
> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> ---
Reviewed-by: Frank Li <Frank.Li@nxp.com>
> Note: I noticed [RFC PATCH v2 01/27] resurrected the Smatch warnings
> https://lore.kernel.org/all/20251129160405.2568284-2-den@valinux.co.jp/
> This RFC v3 version therefore reverts to the RFC v1 style, with one
> additional fix to correct the sprintf format specifier (%lld->%llu).
> ---
> drivers/pci/endpoint/functions/pci-epf-vntb.c | 24 +++++++++++--------
> 1 file changed, 14 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> index 3ecc5059f92b..56aab5d354d6 100644
> --- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
> +++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> @@ -995,17 +995,19 @@ static ssize_t epf_ntb_##_name##_show(struct config_item *item, \
> struct config_group *group = to_config_group(item); \
> struct epf_ntb *ntb = to_epf_ntb(group); \
> struct device *dev = &ntb->epf->dev; \
> - int win_no; \
> + int win_no, idx; \
> \
> if (sscanf(#_name, "mw%d", &win_no) != 1) \
> return -EINVAL; \
> \
> - if (win_no <= 0 || win_no > ntb->num_mws) { \
> - dev_err(dev, "Invalid num_nws: %d value\n", ntb->num_mws); \
> + idx = win_no - 1; \
> + if (idx < 0 || idx >= ntb->num_mws) { \
> + dev_err(dev, "MW%d out of range (num_mws=%d)\n", \
> + win_no, ntb->num_mws); \
> return -EINVAL; \
> } \
> - \
> - return sprintf(page, "%lld\n", ntb->mws_size[win_no - 1]); \
> + idx = array_index_nospec(idx, ntb->num_mws); \
> + return sprintf(page, "%llu\n", ntb->mws_size[idx]); \
> }
>
> #define EPF_NTB_MW_W(_name) \
> @@ -1015,7 +1017,7 @@ static ssize_t epf_ntb_##_name##_store(struct config_item *item, \
> struct config_group *group = to_config_group(item); \
> struct epf_ntb *ntb = to_epf_ntb(group); \
> struct device *dev = &ntb->epf->dev; \
> - int win_no; \
> + int win_no, idx; \
> u64 val; \
> int ret; \
> \
> @@ -1026,12 +1028,14 @@ static ssize_t epf_ntb_##_name##_store(struct config_item *item, \
> if (sscanf(#_name, "mw%d", &win_no) != 1) \
> return -EINVAL; \
> \
> - if (win_no <= 0 || win_no > ntb->num_mws) { \
> - dev_err(dev, "Invalid num_nws: %d value\n", ntb->num_mws); \
> + idx = win_no - 1; \
> + if (idx < 0 || idx >= ntb->num_mws) { \
> + dev_err(dev, "MW%d out of range (num_mws=%d)\n", \
> + win_no, ntb->num_mws); \
> return -EINVAL; \
> } \
> - \
> - ntb->mws_size[win_no - 1] = val; \
> + idx = array_index_nospec(idx, ntb->num_mws); \
> + ntb->mws_size[idx] = val; \
> \
> return len; \
> }
> --
> 2.51.0
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* [RFC PATCH v3 02/35] NTB: epf: Add mwN_offset support and config region versioning
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
2025-12-17 15:15 ` [RFC PATCH v3 01/35] PCI: endpoint: pci-epf-vntb: Use array_index_nospec() on mws_size[] access Koichiro Den
@ 2025-12-17 15:15 ` Koichiro Den
2025-12-19 3:19 ` Frank Li
2025-12-17 15:15 ` [RFC PATCH v3 03/35] PCI: dwc: ep: Support BAR subrange inbound mapping via address match iATU Koichiro Den
` (33 subsequent siblings)
35 siblings, 1 reply; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:15 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
Introduce new mwN_offset configfs attributes to specify memory window
offsets. This enables mapping multiple windows into a single BAR at
arbitrary offsets, improving layout flexibility.
Extend the control register region and add a 32-bit config version
field. Reuse NTB_EPF_TOPOLOGY (0x0C), which is currently unused, as the
version register. The endpoint function driver writes 1
(NTB_EPF_CTRL_VERSION_V1), and ntb_hw_epf reads it at probe time and
refuses to bind to unknown versions.
An endpoint running an older kernel that does not program
NTB_EPF_CTRL_VERSION will be rejected early by a host running a newer
kernel, instead of misbehaving at runtime.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/ntb/hw/epf/ntb_hw_epf.c | 44 +++++-
drivers/pci/endpoint/functions/pci-epf-vntb.c | 136 ++++++++++++++++--
2 files changed, 160 insertions(+), 20 deletions(-)
diff --git a/drivers/ntb/hw/epf/ntb_hw_epf.c b/drivers/ntb/hw/epf/ntb_hw_epf.c
index d3ecf25a5162..126ba38e32ea 100644
--- a/drivers/ntb/hw/epf/ntb_hw_epf.c
+++ b/drivers/ntb/hw/epf/ntb_hw_epf.c
@@ -30,18 +30,22 @@
#define NTB_EPF_LINK_STATUS 0x0A
#define LINK_STATUS_UP BIT(0)
-#define NTB_EPF_TOPOLOGY 0x0C
+/* 0x24 (32bit) is unused */
+#define NTB_EPF_CTRL_VERSION 0x0C
#define NTB_EPF_LOWER_ADDR 0x10
#define NTB_EPF_UPPER_ADDR 0x14
#define NTB_EPF_LOWER_SIZE 0x18
#define NTB_EPF_UPPER_SIZE 0x1C
#define NTB_EPF_MW_COUNT 0x20
-#define NTB_EPF_MW1_OFFSET 0x24
#define NTB_EPF_SPAD_OFFSET 0x28
#define NTB_EPF_SPAD_COUNT 0x2C
#define NTB_EPF_DB_ENTRY_SIZE 0x30
#define NTB_EPF_DB_DATA(n) (0x34 + (n) * 4)
#define NTB_EPF_DB_OFFSET(n) (0xB4 + (n) * 4)
+#define NTB_EPF_MW_OFFSET(n) (0x134 + (n) * 4)
+#define NTB_EPF_MW_SIZE(n) (0x144 + (n) * 4)
+
+#define NTB_EPF_CTRL_VERSION_V1 1
#define NTB_EPF_MIN_DB_COUNT 3
#define NTB_EPF_MAX_DB_COUNT 31
@@ -451,11 +455,12 @@ static int ntb_epf_peer_mw_get_addr(struct ntb_dev *ntb, int idx,
phys_addr_t *base, resource_size_t *size)
{
struct ntb_epf_dev *ndev = ntb_ndev(ntb);
- u32 offset = 0;
+ resource_size_t bar_sz;
+ u32 offset, sz;
int bar;
- if (idx == 0)
- offset = readl(ndev->ctrl_reg + NTB_EPF_MW1_OFFSET);
+ offset = readl(ndev->ctrl_reg + NTB_EPF_MW_OFFSET(idx));
+ sz = readl(ndev->ctrl_reg + NTB_EPF_MW_SIZE(idx));
bar = ntb_epf_mw_to_bar(ndev, idx);
if (bar < 0)
@@ -464,8 +469,11 @@ static int ntb_epf_peer_mw_get_addr(struct ntb_dev *ntb, int idx,
if (base)
*base = pci_resource_start(ndev->ntb.pdev, bar) + offset;
- if (size)
- *size = pci_resource_len(ndev->ntb.pdev, bar) - offset;
+ if (size) {
+ bar_sz = pci_resource_len(ndev->ntb.pdev, bar);
+ *size = sz ? min_t(resource_size_t, sz, bar_sz - offset)
+ : (bar_sz > offset ? bar_sz - offset : 0);
+ }
return 0;
}
@@ -547,6 +555,24 @@ static inline void ntb_epf_init_struct(struct ntb_epf_dev *ndev,
ndev->ntb.ops = &ntb_epf_ops;
}
+static int ntb_epf_check_version(struct ntb_epf_dev *ndev)
+{
+ struct device *dev = ndev->dev;
+ u32 ver;
+
+ ver = readl(ndev->ctrl_reg + NTB_EPF_CTRL_VERSION);
+
+ switch (ver) {
+ case NTB_EPF_CTRL_VERSION_V1:
+ break;
+ default:
+ dev_err(dev, "Unsupported NTB EPF version %u\n", ver);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
static int ntb_epf_init_dev(struct ntb_epf_dev *ndev)
{
struct device *dev = ndev->dev;
@@ -695,6 +721,10 @@ static int ntb_epf_pci_probe(struct pci_dev *pdev,
return ret;
}
+ ret = ntb_epf_check_version(ndev);
+ if (ret)
+ return ret;
+
ret = ntb_epf_init_dev(ndev);
if (ret) {
dev_err(dev, "Failed to init device\n");
diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
index 56aab5d354d6..4dfb3e40dffa 100644
--- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
+++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
@@ -39,6 +39,7 @@
#include <linux/atomic.h>
#include <linux/delay.h>
#include <linux/io.h>
+#include <linux/log2.h>
#include <linux/module.h>
#include <linux/slab.h>
@@ -61,6 +62,7 @@ static struct workqueue_struct *kpcintb_workqueue;
#define LINK_STATUS_UP BIT(0)
+#define CTRL_VERSION 1
#define SPAD_COUNT 64
#define DB_COUNT 4
#define NTB_MW_OFFSET 2
@@ -107,7 +109,7 @@ struct epf_ntb_ctrl {
u32 argument;
u16 command_status;
u16 link_status;
- u32 topology;
+ u32 version;
u64 addr;
u64 size;
u32 num_mws;
@@ -117,6 +119,8 @@ struct epf_ntb_ctrl {
u32 db_entry_size;
u32 db_data[MAX_DB_COUNT];
u32 db_offset[MAX_DB_COUNT];
+ u32 mw_offset[MAX_MW];
+ u32 mw_size[MAX_MW];
} __packed;
struct epf_ntb {
@@ -128,6 +132,7 @@ struct epf_ntb {
u32 db_count;
u32 spad_count;
u64 mws_size[MAX_MW];
+ u64 mws_offset[MAX_MW];
atomic64_t db;
u32 vbus_number;
u16 vntb_pid;
@@ -454,10 +459,13 @@ static int epf_ntb_config_spad_bar_alloc(struct epf_ntb *ntb)
ntb->reg = base;
ctrl = ntb->reg;
+ ctrl->version = CTRL_VERSION;
ctrl->spad_offset = ctrl_size;
ctrl->spad_count = spad_count;
ctrl->num_mws = ntb->num_mws;
+ memset(ctrl->mw_offset, 0, sizeof(ctrl->mw_offset));
+ memset(ctrl->mw_size, 0, sizeof(ctrl->mw_size));
ntb->spad_size = spad_size;
ctrl->db_entry_size = sizeof(u32);
@@ -689,15 +697,31 @@ static void epf_ntb_db_bar_clear(struct epf_ntb *ntb)
*/
static int epf_ntb_mw_bar_init(struct epf_ntb *ntb)
{
+ struct device *dev = &ntb->epf->dev;
+ u64 bar_ends[BAR_5 + 1] = { 0 };
+ unsigned long bars_used = 0;
+ enum pci_barno barno;
+ u64 off, size, end;
int ret = 0;
int i;
- u64 size;
- enum pci_barno barno;
- struct device *dev = &ntb->epf->dev;
for (i = 0; i < ntb->num_mws; i++) {
- size = ntb->mws_size[i];
barno = ntb->epf_ntb_bar[BAR_MW1 + i];
+ off = ntb->mws_offset[i];
+ size = ntb->mws_size[i];
+ end = off + size;
+ if (end > bar_ends[barno])
+ bar_ends[barno] = end;
+ bars_used |= BIT(barno);
+ }
+
+ for (barno = BAR_0; barno <= BAR_5; barno++) {
+ if (!(bars_used & BIT(barno)))
+ continue;
+ if (bar_ends[barno] < SZ_4K)
+ size = SZ_4K;
+ else
+ size = roundup_pow_of_two(bar_ends[barno]);
ntb->epf->bar[barno].barno = barno;
ntb->epf->bar[barno].size = size;
@@ -713,8 +737,12 @@ static int epf_ntb_mw_bar_init(struct epf_ntb *ntb)
&ntb->epf->bar[barno]);
if (ret) {
dev_err(dev, "MW set failed\n");
- goto err_alloc_mem;
+ goto err_set_bar;
}
+ }
+
+ for (i = 0; i < ntb->num_mws; i++) {
+ size = ntb->mws_size[i];
/* Allocate EPC outbound memory windows to vpci vntb device */
ntb->vpci_mw_addr[i] = pci_epc_mem_alloc_addr(ntb->epf->epc,
@@ -723,19 +751,31 @@ static int epf_ntb_mw_bar_init(struct epf_ntb *ntb)
if (!ntb->vpci_mw_addr[i]) {
ret = -ENOMEM;
dev_err(dev, "Failed to allocate source address\n");
- goto err_set_bar;
+ goto err_alloc_mem;
}
}
+ for (i = 0; i < ntb->num_mws; i++) {
+ ntb->reg->mw_offset[i] = (u32)ntb->mws_offset[i];
+ ntb->reg->mw_size[i] = (u32)ntb->mws_size[i];
+ }
+
return ret;
-err_set_bar:
- pci_epc_clear_bar(ntb->epf->epc,
- ntb->epf->func_no,
- ntb->epf->vfunc_no,
- &ntb->epf->bar[barno]);
err_alloc_mem:
- epf_ntb_mw_bar_clear(ntb, i);
+ while (--i >= 0)
+ pci_epc_mem_free_addr(ntb->epf->epc,
+ ntb->vpci_mw_phy[i],
+ ntb->vpci_mw_addr[i],
+ ntb->mws_size[i]);
+err_set_bar:
+ while (--barno >= BAR_0)
+ if (bars_used & BIT(barno))
+ pci_epc_clear_bar(ntb->epf->epc,
+ ntb->epf->func_no,
+ ntb->epf->vfunc_no,
+ &ntb->epf->bar[barno]);
+
return ret;
}
@@ -1040,6 +1080,60 @@ static ssize_t epf_ntb_##_name##_store(struct config_item *item, \
return len; \
}
+#define EPF_NTB_MW_OFF_R(_name) \
+static ssize_t epf_ntb_##_name##_show(struct config_item *item, \
+ char *page) \
+{ \
+ struct config_group *group = to_config_group(item); \
+ struct epf_ntb *ntb = to_epf_ntb(group); \
+ struct device *dev = &ntb->epf->dev; \
+ int win_no, idx; \
+ \
+ if (sscanf(#_name, "mw%d_offset", &win_no) != 1) \
+ return -EINVAL; \
+ \
+ idx = win_no - 1; \
+ if (idx < 0 || idx >= ntb->num_mws) { \
+ dev_err(dev, "MW%d out of range (num_mws=%d)\n", \
+ win_no, ntb->num_mws); \
+ return -EINVAL; \
+ } \
+ \
+ idx = array_index_nospec(idx, ntb->num_mws); \
+ return sprintf(page, "%llu\n", ntb->mws_offset[idx]); \
+}
+
+#define EPF_NTB_MW_OFF_W(_name) \
+static ssize_t epf_ntb_##_name##_store(struct config_item *item, \
+ const char *page, size_t len) \
+{ \
+ struct config_group *group = to_config_group(item); \
+ struct epf_ntb *ntb = to_epf_ntb(group); \
+ struct device *dev = &ntb->epf->dev; \
+ int win_no, idx; \
+ u64 val; \
+ int ret; \
+ \
+ ret = kstrtou64(page, 0, &val); \
+ if (ret) \
+ return ret; \
+ \
+ if (sscanf(#_name, "mw%d_offset", &win_no) != 1) \
+ return -EINVAL; \
+ \
+ idx = win_no - 1; \
+ if (idx < 0 || idx >= ntb->num_mws) { \
+ dev_err(dev, "MW%d out of range (num_mws=%d)\n", \
+ win_no, ntb->num_mws); \
+ return -EINVAL; \
+ } \
+ \
+ idx = array_index_nospec(idx, ntb->num_mws); \
+ ntb->mws_offset[idx] = val; \
+ \
+ return len; \
+}
+
#define EPF_NTB_BAR_R(_name, _id) \
static ssize_t epf_ntb_##_name##_show(struct config_item *item, \
char *page) \
@@ -1110,6 +1204,14 @@ EPF_NTB_MW_R(mw3)
EPF_NTB_MW_W(mw3)
EPF_NTB_MW_R(mw4)
EPF_NTB_MW_W(mw4)
+EPF_NTB_MW_OFF_R(mw1_offset)
+EPF_NTB_MW_OFF_W(mw1_offset)
+EPF_NTB_MW_OFF_R(mw2_offset)
+EPF_NTB_MW_OFF_W(mw2_offset)
+EPF_NTB_MW_OFF_R(mw3_offset)
+EPF_NTB_MW_OFF_W(mw3_offset)
+EPF_NTB_MW_OFF_R(mw4_offset)
+EPF_NTB_MW_OFF_W(mw4_offset)
EPF_NTB_BAR_R(ctrl_bar, BAR_CONFIG)
EPF_NTB_BAR_W(ctrl_bar, BAR_CONFIG)
EPF_NTB_BAR_R(db_bar, BAR_DB)
@@ -1130,6 +1232,10 @@ CONFIGFS_ATTR(epf_ntb_, mw1);
CONFIGFS_ATTR(epf_ntb_, mw2);
CONFIGFS_ATTR(epf_ntb_, mw3);
CONFIGFS_ATTR(epf_ntb_, mw4);
+CONFIGFS_ATTR(epf_ntb_, mw1_offset);
+CONFIGFS_ATTR(epf_ntb_, mw2_offset);
+CONFIGFS_ATTR(epf_ntb_, mw3_offset);
+CONFIGFS_ATTR(epf_ntb_, mw4_offset);
CONFIGFS_ATTR(epf_ntb_, vbus_number);
CONFIGFS_ATTR(epf_ntb_, vntb_pid);
CONFIGFS_ATTR(epf_ntb_, vntb_vid);
@@ -1148,6 +1254,10 @@ static struct configfs_attribute *epf_ntb_attrs[] = {
&epf_ntb_attr_mw2,
&epf_ntb_attr_mw3,
&epf_ntb_attr_mw4,
+ &epf_ntb_attr_mw1_offset,
+ &epf_ntb_attr_mw2_offset,
+ &epf_ntb_attr_mw3_offset,
+ &epf_ntb_attr_mw4_offset,
&epf_ntb_attr_vbus_number,
&epf_ntb_attr_vntb_pid,
&epf_ntb_attr_vntb_vid,
--
2.51.0
^ permalink raw reply related [flat|nested] 61+ messages in thread

* Re: [RFC PATCH v3 02/35] NTB: epf: Add mwN_offset support and config region versioning
2025-12-17 15:15 ` [RFC PATCH v3 02/35] NTB: epf: Add mwN_offset support and config region versioning Koichiro Den
@ 2025-12-19 3:19 ` Frank Li
2025-12-19 7:23 ` Koichiro Den
0 siblings, 1 reply; 61+ messages in thread
From: Frank Li @ 2025-12-19 3:19 UTC (permalink / raw)
To: Koichiro Den
Cc: dave.jiang, ntb, linux-pci, dmaengine, linux-renesas-soc, netdev,
linux-kernel, mani, kwilczynski, kishon, bhelgaas, corbet,
geert+renesas, magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro,
will, robin.murphy, jdmason, allenbh, andrew+netdev, davem,
edumazet, kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k,
kurt.schwemmer, logang, jingoohan1, lpieralisi, utkarsh02t,
jbrunet, dlemoal, arnd, elfring
On Thu, Dec 18, 2025 at 12:15:36AM +0900, Koichiro Den wrote:
> Introduce new mwN_offset configfs attributes to specify memory window
> offsets. This enables mapping multiple windows into a single BAR at
> arbitrary offsets, improving layout flexibility.
>
> Extend the control register region and add a 32-bit config version
> field. Reuse NTB_EPF_TOPOLOGY (0x0C), which is currently unused, as the
> version register. The endpoint function driver writes 1
> (NTB_EPF_CTRL_VERSION_V1), and ntb_hw_epf reads it at probe time and
> refuses to bind to unknown versions.
>
> Endpoint running with an older kernel that do not program
Is it zero if the EP has not programmed it?
> NTB_EPF_CTRL_VERSION will be rejected early by host with newer kernel,
> instead of misbehaving at runtime.
If the old one is 0, try to stay compatible with the old version.
Frank
>
> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> ---
> drivers/ntb/hw/epf/ntb_hw_epf.c | 44 +++++-
> drivers/pci/endpoint/functions/pci-epf-vntb.c | 136 ++++++++++++++++--
> 2 files changed, 160 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/ntb/hw/epf/ntb_hw_epf.c b/drivers/ntb/hw/epf/ntb_hw_epf.c
> index d3ecf25a5162..126ba38e32ea 100644
> --- a/drivers/ntb/hw/epf/ntb_hw_epf.c
> +++ b/drivers/ntb/hw/epf/ntb_hw_epf.c
> @@ -30,18 +30,22 @@
> #define NTB_EPF_LINK_STATUS 0x0A
> #define LINK_STATUS_UP BIT(0)
>
> -#define NTB_EPF_TOPOLOGY 0x0C
> +/* 0x24 (32bit) is unused */
> +#define NTB_EPF_CTRL_VERSION 0x0C
> #define NTB_EPF_LOWER_ADDR 0x10
> #define NTB_EPF_UPPER_ADDR 0x14
> #define NTB_EPF_LOWER_SIZE 0x18
> #define NTB_EPF_UPPER_SIZE 0x1C
> #define NTB_EPF_MW_COUNT 0x20
> -#define NTB_EPF_MW1_OFFSET 0x24
> #define NTB_EPF_SPAD_OFFSET 0x28
> #define NTB_EPF_SPAD_COUNT 0x2C
> #define NTB_EPF_DB_ENTRY_SIZE 0x30
> #define NTB_EPF_DB_DATA(n) (0x34 + (n) * 4)
> #define NTB_EPF_DB_OFFSET(n) (0xB4 + (n) * 4)
> +#define NTB_EPF_MW_OFFSET(n) (0x134 + (n) * 4)
> +#define NTB_EPF_MW_SIZE(n) (0x144 + (n) * 4)
> +
> +#define NTB_EPF_CTRL_VERSION_V1 1
>
> #define NTB_EPF_MIN_DB_COUNT 3
> #define NTB_EPF_MAX_DB_COUNT 31
> @@ -451,11 +455,12 @@ static int ntb_epf_peer_mw_get_addr(struct ntb_dev *ntb, int idx,
> phys_addr_t *base, resource_size_t *size)
> {
> struct ntb_epf_dev *ndev = ntb_ndev(ntb);
> - u32 offset = 0;
> + resource_size_t bar_sz;
> + u32 offset, sz;
> int bar;
>
> - if (idx == 0)
> - offset = readl(ndev->ctrl_reg + NTB_EPF_MW1_OFFSET);
> + offset = readl(ndev->ctrl_reg + NTB_EPF_MW_OFFSET(idx));
> + sz = readl(ndev->ctrl_reg + NTB_EPF_MW_SIZE(idx));
>
> bar = ntb_epf_mw_to_bar(ndev, idx);
> if (bar < 0)
> @@ -464,8 +469,11 @@ static int ntb_epf_peer_mw_get_addr(struct ntb_dev *ntb, int idx,
> if (base)
> *base = pci_resource_start(ndev->ntb.pdev, bar) + offset;
>
> - if (size)
> - *size = pci_resource_len(ndev->ntb.pdev, bar) - offset;
> + if (size) {
> + bar_sz = pci_resource_len(ndev->ntb.pdev, bar);
> + *size = sz ? min_t(resource_size_t, sz, bar_sz - offset)
> + : (bar_sz > offset ? bar_sz - offset : 0);
> + }
>
> return 0;
> }
> @@ -547,6 +555,24 @@ static inline void ntb_epf_init_struct(struct ntb_epf_dev *ndev,
> ndev->ntb.ops = &ntb_epf_ops;
> }
>
> +static int ntb_epf_check_version(struct ntb_epf_dev *ndev)
> +{
> + struct device *dev = ndev->dev;
> + u32 ver;
> +
> + ver = readl(ndev->ctrl_reg + NTB_EPF_CTRL_VERSION);
> +
> + switch (ver) {
> + case NTB_EPF_CTRL_VERSION_V1:
> + break;
> + default:
> + dev_err(dev, "Unsupported NTB EPF version %u\n", ver);
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> static int ntb_epf_init_dev(struct ntb_epf_dev *ndev)
> {
> struct device *dev = ndev->dev;
> @@ -695,6 +721,10 @@ static int ntb_epf_pci_probe(struct pci_dev *pdev,
> return ret;
> }
>
> + ret = ntb_epf_check_version(ndev);
> + if (ret)
> + return ret;
> +
> ret = ntb_epf_init_dev(ndev);
> if (ret) {
> dev_err(dev, "Failed to init device\n");
> diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> index 56aab5d354d6..4dfb3e40dffa 100644
> --- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
> +++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> @@ -39,6 +39,7 @@
> #include <linux/atomic.h>
> #include <linux/delay.h>
> #include <linux/io.h>
> +#include <linux/log2.h>
> #include <linux/module.h>
> #include <linux/slab.h>
>
> @@ -61,6 +62,7 @@ static struct workqueue_struct *kpcintb_workqueue;
>
> #define LINK_STATUS_UP BIT(0)
>
> +#define CTRL_VERSION 1
> #define SPAD_COUNT 64
> #define DB_COUNT 4
> #define NTB_MW_OFFSET 2
> @@ -107,7 +109,7 @@ struct epf_ntb_ctrl {
> u32 argument;
> u16 command_status;
> u16 link_status;
> - u32 topology;
> + u32 version;
> u64 addr;
> u64 size;
> u32 num_mws;
> @@ -117,6 +119,8 @@ struct epf_ntb_ctrl {
> u32 db_entry_size;
> u32 db_data[MAX_DB_COUNT];
> u32 db_offset[MAX_DB_COUNT];
> + u32 mw_offset[MAX_MW];
> + u32 mw_size[MAX_MW];
> } __packed;
>
> struct epf_ntb {
> @@ -128,6 +132,7 @@ struct epf_ntb {
> u32 db_count;
> u32 spad_count;
> u64 mws_size[MAX_MW];
> + u64 mws_offset[MAX_MW];
> atomic64_t db;
> u32 vbus_number;
> u16 vntb_pid;
> @@ -454,10 +459,13 @@ static int epf_ntb_config_spad_bar_alloc(struct epf_ntb *ntb)
> ntb->reg = base;
>
> ctrl = ntb->reg;
> + ctrl->version = CTRL_VERSION;
> ctrl->spad_offset = ctrl_size;
>
> ctrl->spad_count = spad_count;
> ctrl->num_mws = ntb->num_mws;
> + memset(ctrl->mw_offset, 0, sizeof(ctrl->mw_offset));
> + memset(ctrl->mw_size, 0, sizeof(ctrl->mw_size));
> ntb->spad_size = spad_size;
>
> ctrl->db_entry_size = sizeof(u32);
> @@ -689,15 +697,31 @@ static void epf_ntb_db_bar_clear(struct epf_ntb *ntb)
> */
> static int epf_ntb_mw_bar_init(struct epf_ntb *ntb)
> {
> + struct device *dev = &ntb->epf->dev;
> + u64 bar_ends[BAR_5 + 1] = { 0 };
> + unsigned long bars_used = 0;
> + enum pci_barno barno;
> + u64 off, size, end;
> int ret = 0;
> int i;
> - u64 size;
> - enum pci_barno barno;
> - struct device *dev = &ntb->epf->dev;
>
> for (i = 0; i < ntb->num_mws; i++) {
> - size = ntb->mws_size[i];
> barno = ntb->epf_ntb_bar[BAR_MW1 + i];
> + off = ntb->mws_offset[i];
> + size = ntb->mws_size[i];
> + end = off + size;
> + if (end > bar_ends[barno])
> + bar_ends[barno] = end;
> + bars_used |= BIT(barno);
> + }
> +
> + for (barno = BAR_0; barno <= BAR_5; barno++) {
> + if (!(bars_used & BIT(barno)))
> + continue;
> + if (bar_ends[barno] < SZ_4K)
> + size = SZ_4K;
> + else
> + size = roundup_pow_of_two(bar_ends[barno]);
>
> ntb->epf->bar[barno].barno = barno;
> ntb->epf->bar[barno].size = size;
> @@ -713,8 +737,12 @@ static int epf_ntb_mw_bar_init(struct epf_ntb *ntb)
> &ntb->epf->bar[barno]);
> if (ret) {
> dev_err(dev, "MW set failed\n");
> - goto err_alloc_mem;
> + goto err_set_bar;
> }
> + }
> +
> + for (i = 0; i < ntb->num_mws; i++) {
> + size = ntb->mws_size[i];
>
> /* Allocate EPC outbound memory windows to vpci vntb device */
> ntb->vpci_mw_addr[i] = pci_epc_mem_alloc_addr(ntb->epf->epc,
> @@ -723,19 +751,31 @@ static int epf_ntb_mw_bar_init(struct epf_ntb *ntb)
> if (!ntb->vpci_mw_addr[i]) {
> ret = -ENOMEM;
> dev_err(dev, "Failed to allocate source address\n");
> - goto err_set_bar;
> + goto err_alloc_mem;
> }
> }
>
> + for (i = 0; i < ntb->num_mws; i++) {
> + ntb->reg->mw_offset[i] = (u32)ntb->mws_offset[i];
> + ntb->reg->mw_size[i] = (u32)ntb->mws_size[i];
> + }
> +
> return ret;
>
> -err_set_bar:
> - pci_epc_clear_bar(ntb->epf->epc,
> - ntb->epf->func_no,
> - ntb->epf->vfunc_no,
> - &ntb->epf->bar[barno]);
> err_alloc_mem:
> - epf_ntb_mw_bar_clear(ntb, i);
> + while (--i >= 0)
> + pci_epc_mem_free_addr(ntb->epf->epc,
> + ntb->vpci_mw_phy[i],
> + ntb->vpci_mw_addr[i],
> + ntb->mws_size[i]);
> +err_set_bar:
> + while (--barno >= BAR_0)
> + if (bars_used & BIT(barno))
> + pci_epc_clear_bar(ntb->epf->epc,
> + ntb->epf->func_no,
> + ntb->epf->vfunc_no,
> + &ntb->epf->bar[barno]);
> +
> return ret;
> }
>
> @@ -1040,6 +1080,60 @@ static ssize_t epf_ntb_##_name##_store(struct config_item *item, \
> return len; \
> }
>
> +#define EPF_NTB_MW_OFF_R(_name) \
> +static ssize_t epf_ntb_##_name##_show(struct config_item *item, \
> + char *page) \
> +{ \
> + struct config_group *group = to_config_group(item); \
> + struct epf_ntb *ntb = to_epf_ntb(group); \
> + struct device *dev = &ntb->epf->dev; \
> + int win_no, idx; \
> + \
> + if (sscanf(#_name, "mw%d_offset", &win_no) != 1) \
> + return -EINVAL; \
> + \
> + idx = win_no - 1; \
> + if (idx < 0 || idx >= ntb->num_mws) { \
> + dev_err(dev, "MW%d out of range (num_mws=%d)\n", \
> + win_no, ntb->num_mws); \
> + return -EINVAL; \
> + } \
> + \
> + idx = array_index_nospec(idx, ntb->num_mws); \
> + return sprintf(page, "%llu\n", ntb->mws_offset[idx]); \
> +}
> +
> +#define EPF_NTB_MW_OFF_W(_name) \
> +static ssize_t epf_ntb_##_name##_store(struct config_item *item, \
> + const char *page, size_t len) \
> +{ \
> + struct config_group *group = to_config_group(item); \
> + struct epf_ntb *ntb = to_epf_ntb(group); \
> + struct device *dev = &ntb->epf->dev; \
> + int win_no, idx; \
> + u64 val; \
> + int ret; \
> + \
> + ret = kstrtou64(page, 0, &val); \
> + if (ret) \
> + return ret; \
> + \
> + if (sscanf(#_name, "mw%d_offset", &win_no) != 1) \
> + return -EINVAL; \
> + \
> + idx = win_no - 1; \
> + if (idx < 0 || idx >= ntb->num_mws) { \
> + dev_err(dev, "MW%d out of range (num_mws=%d)\n", \
> + win_no, ntb->num_mws); \
> + return -EINVAL; \
> + } \
> + \
> + idx = array_index_nospec(idx, ntb->num_mws); \
> + ntb->mws_offset[idx] = val; \
> + \
> + return len; \
> +}
> +
> #define EPF_NTB_BAR_R(_name, _id) \
> static ssize_t epf_ntb_##_name##_show(struct config_item *item, \
> char *page) \
> @@ -1110,6 +1204,14 @@ EPF_NTB_MW_R(mw3)
> EPF_NTB_MW_W(mw3)
> EPF_NTB_MW_R(mw4)
> EPF_NTB_MW_W(mw4)
> +EPF_NTB_MW_OFF_R(mw1_offset)
> +EPF_NTB_MW_OFF_W(mw1_offset)
> +EPF_NTB_MW_OFF_R(mw2_offset)
> +EPF_NTB_MW_OFF_W(mw2_offset)
> +EPF_NTB_MW_OFF_R(mw3_offset)
> +EPF_NTB_MW_OFF_W(mw3_offset)
> +EPF_NTB_MW_OFF_R(mw4_offset)
> +EPF_NTB_MW_OFF_W(mw4_offset)
> EPF_NTB_BAR_R(ctrl_bar, BAR_CONFIG)
> EPF_NTB_BAR_W(ctrl_bar, BAR_CONFIG)
> EPF_NTB_BAR_R(db_bar, BAR_DB)
> @@ -1130,6 +1232,10 @@ CONFIGFS_ATTR(epf_ntb_, mw1);
> CONFIGFS_ATTR(epf_ntb_, mw2);
> CONFIGFS_ATTR(epf_ntb_, mw3);
> CONFIGFS_ATTR(epf_ntb_, mw4);
> +CONFIGFS_ATTR(epf_ntb_, mw1_offset);
> +CONFIGFS_ATTR(epf_ntb_, mw2_offset);
> +CONFIGFS_ATTR(epf_ntb_, mw3_offset);
> +CONFIGFS_ATTR(epf_ntb_, mw4_offset);
> CONFIGFS_ATTR(epf_ntb_, vbus_number);
> CONFIGFS_ATTR(epf_ntb_, vntb_pid);
> CONFIGFS_ATTR(epf_ntb_, vntb_vid);
> @@ -1148,6 +1254,10 @@ static struct configfs_attribute *epf_ntb_attrs[] = {
> &epf_ntb_attr_mw2,
> &epf_ntb_attr_mw3,
> &epf_ntb_attr_mw4,
> + &epf_ntb_attr_mw1_offset,
> + &epf_ntb_attr_mw2_offset,
> + &epf_ntb_attr_mw3_offset,
> + &epf_ntb_attr_mw4_offset,
> &epf_ntb_attr_vbus_number,
> &epf_ntb_attr_vntb_pid,
> &epf_ntb_attr_vntb_vid,
> --
> 2.51.0
>
^ permalink raw reply [flat|nested] 61+ messages in thread

* Re: [RFC PATCH v3 02/35] NTB: epf: Add mwN_offset support and config region versioning
2025-12-19 3:19 ` Frank Li
@ 2025-12-19 7:23 ` Koichiro Den
0 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-19 7:23 UTC (permalink / raw)
To: Frank Li
Cc: dave.jiang, ntb, linux-pci, dmaengine, linux-renesas-soc, netdev,
linux-kernel, mani, kwilczynski, kishon, bhelgaas, corbet,
geert+renesas, magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro,
will, robin.murphy, jdmason, allenbh, andrew+netdev, davem,
edumazet, kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k,
kurt.schwemmer, logang, jingoohan1, lpieralisi, utkarsh02t,
jbrunet, dlemoal, arnd, elfring
On Thu, Dec 18, 2025 at 10:19:35PM -0500, Frank Li wrote:
> On Thu, Dec 18, 2025 at 12:15:36AM +0900, Koichiro Den wrote:
> > Introduce new mwN_offset configfs attributes to specify memory window
> > offsets. This enables mapping multiple windows into a single BAR at
> > arbitrary offsets, improving layout flexibility.
> >
> > Extend the control register region and add a 32-bit config version
> > field. Reuse NTB_EPF_TOPOLOGY (0x0C), which is currently unused, as the
> > version register. The endpoint function driver writes 1
> > (NTB_EPF_CTRL_VERSION_V1), and ntb_hw_epf reads it at probe time and
> > refuses to bind to unknown versions.
> >
> > Endpoint running with an older kernel that do not program
>
> Is it zero if EP have not program it?
>
> > NTB_EPF_CTRL_VERSION will be rejected early by host with newer kernel,
> > instead of misbehaving at runtime.
>
> If old one is 0, try best to compatible with old version.
Ok, I'll do so. (If the overall direction of this RFC v3 is agreed upon,
it will perhaps be addressed as part of a smaller patchset.)
Thanks for the review,
Koichiro
>
> Frank
> >
> > Signed-off-by: Koichiro Den <den@valinux.co.jp>
> > ---
> > drivers/ntb/hw/epf/ntb_hw_epf.c | 44 +++++-
> > drivers/pci/endpoint/functions/pci-epf-vntb.c | 136 ++++++++++++++++--
> > 2 files changed, 160 insertions(+), 20 deletions(-)
> >
> > diff --git a/drivers/ntb/hw/epf/ntb_hw_epf.c b/drivers/ntb/hw/epf/ntb_hw_epf.c
> > index d3ecf25a5162..126ba38e32ea 100644
> > --- a/drivers/ntb/hw/epf/ntb_hw_epf.c
> > +++ b/drivers/ntb/hw/epf/ntb_hw_epf.c
> > @@ -30,18 +30,22 @@
> > #define NTB_EPF_LINK_STATUS 0x0A
> > #define LINK_STATUS_UP BIT(0)
> >
> > -#define NTB_EPF_TOPOLOGY 0x0C
> > +/* 0x24 (32bit) is unused */
> > +#define NTB_EPF_CTRL_VERSION 0x0C
> > #define NTB_EPF_LOWER_ADDR 0x10
> > #define NTB_EPF_UPPER_ADDR 0x14
> > #define NTB_EPF_LOWER_SIZE 0x18
> > #define NTB_EPF_UPPER_SIZE 0x1C
> > #define NTB_EPF_MW_COUNT 0x20
> > -#define NTB_EPF_MW1_OFFSET 0x24
> > #define NTB_EPF_SPAD_OFFSET 0x28
> > #define NTB_EPF_SPAD_COUNT 0x2C
> > #define NTB_EPF_DB_ENTRY_SIZE 0x30
> > #define NTB_EPF_DB_DATA(n) (0x34 + (n) * 4)
> > #define NTB_EPF_DB_OFFSET(n) (0xB4 + (n) * 4)
> > +#define NTB_EPF_MW_OFFSET(n) (0x134 + (n) * 4)
> > +#define NTB_EPF_MW_SIZE(n) (0x144 + (n) * 4)
> > +
> > +#define NTB_EPF_CTRL_VERSION_V1 1
> >
> > #define NTB_EPF_MIN_DB_COUNT 3
> > #define NTB_EPF_MAX_DB_COUNT 31
> > @@ -451,11 +455,12 @@ static int ntb_epf_peer_mw_get_addr(struct ntb_dev *ntb, int idx,
> > phys_addr_t *base, resource_size_t *size)
> > {
> > struct ntb_epf_dev *ndev = ntb_ndev(ntb);
> > - u32 offset = 0;
> > + resource_size_t bar_sz;
> > + u32 offset, sz;
> > int bar;
> >
> > - if (idx == 0)
> > - offset = readl(ndev->ctrl_reg + NTB_EPF_MW1_OFFSET);
> > + offset = readl(ndev->ctrl_reg + NTB_EPF_MW_OFFSET(idx));
> > + sz = readl(ndev->ctrl_reg + NTB_EPF_MW_SIZE(idx));
> >
> > bar = ntb_epf_mw_to_bar(ndev, idx);
> > if (bar < 0)
> > @@ -464,8 +469,11 @@ static int ntb_epf_peer_mw_get_addr(struct ntb_dev *ntb, int idx,
> > if (base)
> > *base = pci_resource_start(ndev->ntb.pdev, bar) + offset;
> >
> > - if (size)
> > - *size = pci_resource_len(ndev->ntb.pdev, bar) - offset;
> > + if (size) {
> > + bar_sz = pci_resource_len(ndev->ntb.pdev, bar);
> > + *size = sz ? min_t(resource_size_t, sz, bar_sz - offset)
> > + : (bar_sz > offset ? bar_sz - offset : 0);
> > + }
> >
> > return 0;
> > }
> > @@ -547,6 +555,24 @@ static inline void ntb_epf_init_struct(struct ntb_epf_dev *ndev,
> > ndev->ntb.ops = &ntb_epf_ops;
> > }
> >
> > +static int ntb_epf_check_version(struct ntb_epf_dev *ndev)
> > +{
> > + struct device *dev = ndev->dev;
> > + u32 ver;
> > +
> > + ver = readl(ndev->ctrl_reg + NTB_EPF_CTRL_VERSION);
> > +
> > + switch (ver) {
> > + case NTB_EPF_CTRL_VERSION_V1:
> > + break;
> > + default:
> > + dev_err(dev, "Unsupported NTB EPF version %u\n", ver);
> > + return -EINVAL;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > static int ntb_epf_init_dev(struct ntb_epf_dev *ndev)
> > {
> > struct device *dev = ndev->dev;
> > @@ -695,6 +721,10 @@ static int ntb_epf_pci_probe(struct pci_dev *pdev,
> > return ret;
> > }
> >
> > + ret = ntb_epf_check_version(ndev);
> > + if (ret)
> > + return ret;
> > +
> > ret = ntb_epf_init_dev(ndev);
> > if (ret) {
> > dev_err(dev, "Failed to init device\n");
> > diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> > index 56aab5d354d6..4dfb3e40dffa 100644
> > --- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
> > +++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
> > @@ -39,6 +39,7 @@
> > #include <linux/atomic.h>
> > #include <linux/delay.h>
> > #include <linux/io.h>
> > +#include <linux/log2.h>
> > #include <linux/module.h>
> > #include <linux/slab.h>
> >
> > @@ -61,6 +62,7 @@ static struct workqueue_struct *kpcintb_workqueue;
> >
> > #define LINK_STATUS_UP BIT(0)
> >
> > +#define CTRL_VERSION 1
> > #define SPAD_COUNT 64
> > #define DB_COUNT 4
> > #define NTB_MW_OFFSET 2
> > @@ -107,7 +109,7 @@ struct epf_ntb_ctrl {
> > u32 argument;
> > u16 command_status;
> > u16 link_status;
> > - u32 topology;
> > + u32 version;
> > u64 addr;
> > u64 size;
> > u32 num_mws;
> > @@ -117,6 +119,8 @@ struct epf_ntb_ctrl {
> > u32 db_entry_size;
> > u32 db_data[MAX_DB_COUNT];
> > u32 db_offset[MAX_DB_COUNT];
> > + u32 mw_offset[MAX_MW];
> > + u32 mw_size[MAX_MW];
> > } __packed;
> >
> > struct epf_ntb {
> > @@ -128,6 +132,7 @@ struct epf_ntb {
> > u32 db_count;
> > u32 spad_count;
> > u64 mws_size[MAX_MW];
> > + u64 mws_offset[MAX_MW];
> > atomic64_t db;
> > u32 vbus_number;
> > u16 vntb_pid;
> > @@ -454,10 +459,13 @@ static int epf_ntb_config_spad_bar_alloc(struct epf_ntb *ntb)
> > ntb->reg = base;
> >
> > ctrl = ntb->reg;
> > + ctrl->version = CTRL_VERSION;
> > ctrl->spad_offset = ctrl_size;
> >
> > ctrl->spad_count = spad_count;
> > ctrl->num_mws = ntb->num_mws;
> > + memset(ctrl->mw_offset, 0, sizeof(ctrl->mw_offset));
> > + memset(ctrl->mw_size, 0, sizeof(ctrl->mw_size));
> > ntb->spad_size = spad_size;
> >
> > ctrl->db_entry_size = sizeof(u32);
> > @@ -689,15 +697,31 @@ static void epf_ntb_db_bar_clear(struct epf_ntb *ntb)
> > */
> > static int epf_ntb_mw_bar_init(struct epf_ntb *ntb)
> > {
> > + struct device *dev = &ntb->epf->dev;
> > + u64 bar_ends[BAR_5 + 1] = { 0 };
> > + unsigned long bars_used = 0;
> > + enum pci_barno barno;
> > + u64 off, size, end;
> > int ret = 0;
> > int i;
> > - u64 size;
> > - enum pci_barno barno;
> > - struct device *dev = &ntb->epf->dev;
> >
> > for (i = 0; i < ntb->num_mws; i++) {
> > - size = ntb->mws_size[i];
> > barno = ntb->epf_ntb_bar[BAR_MW1 + i];
> > + off = ntb->mws_offset[i];
> > + size = ntb->mws_size[i];
> > + end = off + size;
> > + if (end > bar_ends[barno])
> > + bar_ends[barno] = end;
> > + bars_used |= BIT(barno);
> > + }
> > +
> > + for (barno = BAR_0; barno <= BAR_5; barno++) {
> > + if (!(bars_used & BIT(barno)))
> > + continue;
> > + if (bar_ends[barno] < SZ_4K)
> > + size = SZ_4K;
> > + else
> > + size = roundup_pow_of_two(bar_ends[barno]);
> >
> > ntb->epf->bar[barno].barno = barno;
> > ntb->epf->bar[barno].size = size;
> > @@ -713,8 +737,12 @@ static int epf_ntb_mw_bar_init(struct epf_ntb *ntb)
> > &ntb->epf->bar[barno]);
> > if (ret) {
> > dev_err(dev, "MW set failed\n");
> > - goto err_alloc_mem;
> > + goto err_set_bar;
> > }
> > + }
> > +
> > + for (i = 0; i < ntb->num_mws; i++) {
> > + size = ntb->mws_size[i];
> >
> > /* Allocate EPC outbound memory windows to vpci vntb device */
> > ntb->vpci_mw_addr[i] = pci_epc_mem_alloc_addr(ntb->epf->epc,
> > @@ -723,19 +751,31 @@ static int epf_ntb_mw_bar_init(struct epf_ntb *ntb)
> > if (!ntb->vpci_mw_addr[i]) {
> > ret = -ENOMEM;
> > dev_err(dev, "Failed to allocate source address\n");
> > - goto err_set_bar;
> > + goto err_alloc_mem;
> > }
> > }
> >
> > + for (i = 0; i < ntb->num_mws; i++) {
> > + ntb->reg->mw_offset[i] = (u32)ntb->mws_offset[i];
> > + ntb->reg->mw_size[i] = (u32)ntb->mws_size[i];
> > + }
> > +
> > return ret;
> >
> > -err_set_bar:
> > - pci_epc_clear_bar(ntb->epf->epc,
> > - ntb->epf->func_no,
> > - ntb->epf->vfunc_no,
> > - &ntb->epf->bar[barno]);
> > err_alloc_mem:
> > - epf_ntb_mw_bar_clear(ntb, i);
> > + while (--i >= 0)
> > + pci_epc_mem_free_addr(ntb->epf->epc,
> > + ntb->vpci_mw_phy[i],
> > + ntb->vpci_mw_addr[i],
> > + ntb->mws_size[i]);
> > +err_set_bar:
> > + while (--barno >= BAR_0)
> > + if (bars_used & BIT(barno))
> > + pci_epc_clear_bar(ntb->epf->epc,
> > + ntb->epf->func_no,
> > + ntb->epf->vfunc_no,
> > + &ntb->epf->bar[barno]);
> > +
> > return ret;
> > }
> >
> > @@ -1040,6 +1080,60 @@ static ssize_t epf_ntb_##_name##_store(struct config_item *item, \
> > return len; \
> > }
> >
> > +#define EPF_NTB_MW_OFF_R(_name) \
> > +static ssize_t epf_ntb_##_name##_show(struct config_item *item, \
> > + char *page) \
> > +{ \
> > + struct config_group *group = to_config_group(item); \
> > + struct epf_ntb *ntb = to_epf_ntb(group); \
> > + struct device *dev = &ntb->epf->dev; \
> > + int win_no, idx; \
> > + \
> > + if (sscanf(#_name, "mw%d_offset", &win_no) != 1) \
> > + return -EINVAL; \
> > + \
> > + idx = win_no - 1; \
> > + if (idx < 0 || idx >= ntb->num_mws) { \
> > + dev_err(dev, "MW%d out of range (num_mws=%d)\n", \
> > + win_no, ntb->num_mws); \
> > + return -EINVAL; \
> > + } \
> > + \
> > + idx = array_index_nospec(idx, ntb->num_mws); \
> > + return sprintf(page, "%llu\n", ntb->mws_offset[idx]); \
> > +}
> > +
> > +#define EPF_NTB_MW_OFF_W(_name) \
> > +static ssize_t epf_ntb_##_name##_store(struct config_item *item, \
> > + const char *page, size_t len) \
> > +{ \
> > + struct config_group *group = to_config_group(item); \
> > + struct epf_ntb *ntb = to_epf_ntb(group); \
> > + struct device *dev = &ntb->epf->dev; \
> > + int win_no, idx; \
> > + u64 val; \
> > + int ret; \
> > + \
> > + ret = kstrtou64(page, 0, &val); \
> > + if (ret) \
> > + return ret; \
> > + \
> > + if (sscanf(#_name, "mw%d_offset", &win_no) != 1) \
> > + return -EINVAL; \
> > + \
> > + idx = win_no - 1; \
> > + if (idx < 0 || idx >= ntb->num_mws) { \
> > + dev_err(dev, "MW%d out of range (num_mws=%d)\n", \
> > + win_no, ntb->num_mws); \
> > + return -EINVAL; \
> > + } \
> > + \
> > + idx = array_index_nospec(idx, ntb->num_mws); \
> > + ntb->mws_offset[idx] = val; \
> > + \
> > + return len; \
> > +}
> > +
> > #define EPF_NTB_BAR_R(_name, _id) \
> > static ssize_t epf_ntb_##_name##_show(struct config_item *item, \
> > char *page) \
> > @@ -1110,6 +1204,14 @@ EPF_NTB_MW_R(mw3)
> > EPF_NTB_MW_W(mw3)
> > EPF_NTB_MW_R(mw4)
> > EPF_NTB_MW_W(mw4)
> > +EPF_NTB_MW_OFF_R(mw1_offset)
> > +EPF_NTB_MW_OFF_W(mw1_offset)
> > +EPF_NTB_MW_OFF_R(mw2_offset)
> > +EPF_NTB_MW_OFF_W(mw2_offset)
> > +EPF_NTB_MW_OFF_R(mw3_offset)
> > +EPF_NTB_MW_OFF_W(mw3_offset)
> > +EPF_NTB_MW_OFF_R(mw4_offset)
> > +EPF_NTB_MW_OFF_W(mw4_offset)
> > EPF_NTB_BAR_R(ctrl_bar, BAR_CONFIG)
> > EPF_NTB_BAR_W(ctrl_bar, BAR_CONFIG)
> > EPF_NTB_BAR_R(db_bar, BAR_DB)
> > @@ -1130,6 +1232,10 @@ CONFIGFS_ATTR(epf_ntb_, mw1);
> > CONFIGFS_ATTR(epf_ntb_, mw2);
> > CONFIGFS_ATTR(epf_ntb_, mw3);
> > CONFIGFS_ATTR(epf_ntb_, mw4);
> > +CONFIGFS_ATTR(epf_ntb_, mw1_offset);
> > +CONFIGFS_ATTR(epf_ntb_, mw2_offset);
> > +CONFIGFS_ATTR(epf_ntb_, mw3_offset);
> > +CONFIGFS_ATTR(epf_ntb_, mw4_offset);
> > CONFIGFS_ATTR(epf_ntb_, vbus_number);
> > CONFIGFS_ATTR(epf_ntb_, vntb_pid);
> > CONFIGFS_ATTR(epf_ntb_, vntb_vid);
> > @@ -1148,6 +1254,10 @@ static struct configfs_attribute *epf_ntb_attrs[] = {
> > &epf_ntb_attr_mw2,
> > &epf_ntb_attr_mw3,
> > &epf_ntb_attr_mw4,
> > + &epf_ntb_attr_mw1_offset,
> > + &epf_ntb_attr_mw2_offset,
> > + &epf_ntb_attr_mw3_offset,
> > + &epf_ntb_attr_mw4_offset,
> > &epf_ntb_attr_vbus_number,
> > &epf_ntb_attr_vntb_pid,
> > &epf_ntb_attr_vntb_vid,
> > --
> > 2.51.0
> >
^ permalink raw reply [flat|nested] 61+ messages in thread
* [RFC PATCH v3 03/35] PCI: dwc: ep: Support BAR subrange inbound mapping via address match iATU
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
2025-12-17 15:15 ` [RFC PATCH v3 01/35] PCI: endpoint: pci-epf-vntb: Use array_index_nospec() on mws_size[] access Koichiro Den
2025-12-17 15:15 ` [RFC PATCH v3 02/35] NTB: epf: Add mwN_offset support and config region versioning Koichiro Den
@ 2025-12-17 15:15 ` Koichiro Den
2025-12-19 14:19 ` Frank Li
2025-12-17 15:15 ` [RFC PATCH v3 04/35] NTB: Add offset parameter to MW translation APIs Koichiro Den
` (32 subsequent siblings)
35 siblings, 1 reply; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:15 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
Extend dw_pcie_ep_set_bar() to support Address Match Mode IB iATU
with the new 'submap' field in pci_epf_bar.
The existing dw_pcie_ep_inbound_atu(), which handles BAR match mode, is
renamed to dw_pcie_ep_ib_atu_bar(), and a new dw_pcie_ep_ib_atu_addr() is
introduced for address match mode.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
.../pci/controller/dwc/pcie-designware-ep.c | 197 ++++++++++++++++--
drivers/pci/controller/dwc/pcie-designware.h | 2 +
drivers/pci/endpoint/pci-epc-core.c | 2 +-
include/linux/pci-epf.h | 27 +++
4 files changed, 215 insertions(+), 13 deletions(-)
diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
index e94cde1a3506..9480aebaa32a 100644
--- a/drivers/pci/controller/dwc/pcie-designware-ep.c
+++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
@@ -139,9 +139,10 @@ static int dw_pcie_ep_write_header(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
return 0;
}
-static int dw_pcie_ep_inbound_atu(struct dw_pcie_ep *ep, u8 func_no, int type,
- dma_addr_t parent_bus_addr, enum pci_barno bar,
- size_t size)
+/* Bar match mode */
+static int dw_pcie_ep_ib_atu_bar(struct dw_pcie_ep *ep, u8 func_no, int type,
+ dma_addr_t parent_bus_addr, enum pci_barno bar,
+ size_t size)
{
int ret;
u32 free_win;
@@ -174,6 +175,151 @@ static int dw_pcie_ep_inbound_atu(struct dw_pcie_ep *ep, u8 func_no, int type,
return 0;
}
+struct dw_pcie_ib_map {
+ struct list_head list;
+ enum pci_barno bar;
+ u64 pci_addr;
+ u64 parent_bus_addr;
+ u64 size;
+ u32 index;
+};
+
+static struct dw_pcie_ib_map *
+dw_pcie_ep_find_ib_map(struct dw_pcie_ep *ep, enum pci_barno bar, u64 pci_addr)
+{
+ struct dw_pcie_ib_map *m;
+
+ list_for_each_entry(m, &ep->ib_map_list, list) {
+ if (m->bar == bar && m->pci_addr == pci_addr)
+ return m;
+ }
+
+ return NULL;
+}
+
+static u64 dw_pcie_ep_read_bar_assigned(struct dw_pcie_ep *ep, u8 func_no,
+ enum pci_barno bar, int flags)
+{
+ u32 reg = PCI_BASE_ADDRESS_0 + (4 * bar);
+ u32 lo, hi;
+ u64 addr;
+
+ lo = dw_pcie_ep_readl_dbi(ep, func_no, reg);
+
+ if (flags & PCI_BASE_ADDRESS_SPACE)
+ return lo & PCI_BASE_ADDRESS_IO_MASK;
+
+ addr = lo & PCI_BASE_ADDRESS_MEM_MASK;
+ if (!(flags & PCI_BASE_ADDRESS_MEM_TYPE_64))
+ return addr;
+
+ hi = dw_pcie_ep_readl_dbi(ep, func_no, reg + 4);
+ return addr | ((u64)hi << 32);
+}
+
+/* Address match mode */
+static int dw_pcie_ep_ib_atu_addr(struct dw_pcie_ep *ep, u8 func_no, int type,
+ struct pci_epf_bar *epf_bar)
+{
+ struct pci_epf_bar_submap *submap = epf_bar->submap;
+ struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
+ enum pci_barno bar = epf_bar->barno;
+ struct dw_pcie_ib_map *m, *new;
+ struct device *dev = pci->dev;
+ u64 pci_addr, parent_bus_addr;
+ u64 size, off, base;
+ unsigned long flags;
+ int free_win, ret;
+ u32 i;
+
+ if (!epf_bar->num_submap)
+ return 0;
+
+ if (!submap)
+ return -EINVAL;
+
+ base = dw_pcie_ep_read_bar_assigned(ep, func_no, bar, epf_bar->flags);
+ if (!base) {
+ dev_err(dev,
+ "BAR%u not assigned, cannot set up sub-range mappings\n",
+ bar);
+ return -EINVAL;
+ }
+
+ for (i = 0; i < epf_bar->num_submap; i++) {
+ off = submap[i].offset;
+ size = submap[i].size;
+ parent_bus_addr = submap[i].phys_addr;
+
+ if (!size)
+ continue;
+
+ if (off > (~0ULL) - base)
+ return -EINVAL;
+
+ pci_addr = base + off;
+
+ new = devm_kzalloc(dev, sizeof(*new), GFP_KERNEL);
+ if (!new)
+ return -ENOMEM;
+
+ spin_lock_irqsave(&ep->ib_map_lock, flags);
+ m = dw_pcie_ep_find_ib_map(ep, bar, pci_addr);
+ if (m) {
+ if (m->parent_bus_addr == parent_bus_addr &&
+ m->size == size) {
+ spin_unlock_irqrestore(&ep->ib_map_lock, flags);
+ devm_kfree(dev, new);
+ continue;
+ }
+
+ ret = dw_pcie_prog_inbound_atu(pci, m->index, type,
+ parent_bus_addr, pci_addr,
+ size);
+ if (!ret) {
+ m->parent_bus_addr = parent_bus_addr;
+ m->size = size;
+ }
+ spin_unlock_irqrestore(&ep->ib_map_lock, flags);
+ devm_kfree(dev, new);
+ if (ret)
+ return ret;
+ continue;
+ }
+
+ free_win = find_first_zero_bit(ep->ib_window_map,
+ pci->num_ib_windows);
+ if (free_win >= pci->num_ib_windows) {
+ spin_unlock_irqrestore(&ep->ib_map_lock, flags);
+ devm_kfree(dev, new);
+ return -ENOSPC;
+ }
+ set_bit(free_win, ep->ib_window_map);
+
+ new->bar = bar;
+ new->index = free_win;
+ new->pci_addr = pci_addr;
+ new->parent_bus_addr = parent_bus_addr;
+ new->size = size;
+ list_add_tail(&new->list, &ep->ib_map_list);
+
+ spin_unlock_irqrestore(&ep->ib_map_lock, flags);
+
+ ret = dw_pcie_prog_inbound_atu(pci, free_win, type,
+ parent_bus_addr, pci_addr, size);
+ if (ret) {
+ spin_lock_irqsave(&ep->ib_map_lock, flags);
+ list_del(&new->list);
+ clear_bit(free_win, ep->ib_window_map);
+ spin_unlock_irqrestore(&ep->ib_map_lock, flags);
+ devm_kfree(dev, new);
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
static int dw_pcie_ep_outbound_atu(struct dw_pcie_ep *ep,
struct dw_pcie_ob_atu_cfg *atu)
{
@@ -204,17 +350,34 @@ static void dw_pcie_ep_clear_bar(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
struct dw_pcie_ep *ep = epc_get_drvdata(epc);
struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
enum pci_barno bar = epf_bar->barno;
- u32 atu_index = ep->bar_to_atu[bar] - 1;
+ struct dw_pcie_ib_map *m, *tmp;
+ u32 atu_index;
- if (!ep->bar_to_atu[bar])
+ if (!ep->epf_bar[bar])
return;
__dw_pcie_ep_reset_bar(pci, func_no, bar, epf_bar->flags);
- dw_pcie_disable_atu(pci, PCIE_ATU_REGION_DIR_IB, atu_index);
- clear_bit(atu_index, ep->ib_window_map);
+ /* BAR match iATU */
+ if (ep->bar_to_atu[bar]) {
+ atu_index = ep->bar_to_atu[bar] - 1;
+ dw_pcie_disable_atu(pci, PCIE_ATU_REGION_DIR_IB, atu_index);
+ clear_bit(atu_index, ep->ib_window_map);
+ ep->bar_to_atu[bar] = 0;
+ }
+
+ /* Address match iATU */
+ guard(spinlock_irqsave)(&ep->ib_map_lock);
+ list_for_each_entry_safe(m, tmp, &ep->ib_map_list, list) {
+ if (m->bar != bar)
+ continue;
+ dw_pcie_disable_atu(pci, PCIE_ATU_REGION_DIR_IB, m->index);
+ clear_bit(m->index, ep->ib_window_map);
+ list_del(&m->list);
+ kfree(m);
+ }
+
ep->epf_bar[bar] = NULL;
- ep->bar_to_atu[bar] = 0;
}
static unsigned int dw_pcie_ep_get_rebar_offset(struct dw_pcie *pci,
@@ -364,10 +527,14 @@ static int dw_pcie_ep_set_bar(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
/*
* We can only dynamically change a BAR if the new BAR size and
* BAR flags do not differ from the existing configuration.
+ * When 'use_submap' is true and the intention is to create
+ * sub-range mappings perhaps incrementally, epf_bar->size
+ * does not mean anything so no need to validate it.
*/
if (ep->epf_bar[bar]->barno != bar ||
- ep->epf_bar[bar]->size != size ||
- ep->epf_bar[bar]->flags != flags)
+ ep->epf_bar[bar]->flags != flags ||
+ ep->epf_bar[bar]->use_submap != epf_bar->use_submap ||
+ (!epf_bar->use_submap && ep->epf_bar[bar]->size != size))
return -EINVAL;
/*
@@ -408,8 +575,12 @@ static int dw_pcie_ep_set_bar(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
else
type = PCIE_ATU_TYPE_IO;
- ret = dw_pcie_ep_inbound_atu(ep, func_no, type, epf_bar->phys_addr, bar,
- size);
+ if (epf_bar->use_submap)
+ ret = dw_pcie_ep_ib_atu_addr(ep, func_no, type, epf_bar);
+ else
+ ret = dw_pcie_ep_ib_atu_bar(ep, func_no, type,
+ epf_bar->phys_addr, bar, size);
+
if (ret)
return ret;
@@ -1120,6 +1291,8 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
struct device *dev = pci->dev;
INIT_LIST_HEAD(&ep->func_list);
+ INIT_LIST_HEAD(&ep->ib_map_list);
+ spin_lock_init(&ep->ib_map_lock);
ep->msi_iatu_mapped = false;
ep->msi_msg_addr = 0;
ep->msi_map_size = 0;
diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
index f555926a526e..1770a2318557 100644
--- a/drivers/pci/controller/dwc/pcie-designware.h
+++ b/drivers/pci/controller/dwc/pcie-designware.h
@@ -476,6 +476,8 @@ struct dw_pcie_ep {
phys_addr_t *outbound_addr;
unsigned long *ib_window_map;
unsigned long *ob_window_map;
+ struct list_head ib_map_list;
+ spinlock_t ib_map_lock;
void __iomem *msi_mem;
phys_addr_t msi_mem_phys;
struct pci_epf_bar *epf_bar[PCI_STD_NUM_BARS];
diff --git a/drivers/pci/endpoint/pci-epc-core.c b/drivers/pci/endpoint/pci-epc-core.c
index ca7f19cc973a..2b95dbc7242a 100644
--- a/drivers/pci/endpoint/pci-epc-core.c
+++ b/drivers/pci/endpoint/pci-epc-core.c
@@ -604,7 +604,7 @@ int pci_epc_set_bar(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
(epc_features->bar[bar].fixed_size != epf_bar->size))
return -EINVAL;
- if (!is_power_of_2(epf_bar->size))
+ if (!epf_bar->num_submap && !is_power_of_2(epf_bar->size))
return -EINVAL;
if ((epf_bar->barno == BAR_5 && flags & PCI_BASE_ADDRESS_MEM_TYPE_64) ||
diff --git a/include/linux/pci-epf.h b/include/linux/pci-epf.h
index 48f68c4dcfa5..126647b9f01e 100644
--- a/include/linux/pci-epf.h
+++ b/include/linux/pci-epf.h
@@ -110,6 +110,25 @@ struct pci_epf_driver {
#define to_pci_epf_driver(drv) container_of_const((drv), struct pci_epf_driver, driver)
+/**
+ * struct pci_epf_bar_submap - represents a BAR subrange for inbound mapping
+ * @phys_addr: physical address that should be mapped to the BAR subrange
+ * @size: the size of the subrange to be mapped
+ * @offset: The byte offset from the BAR base
+ * @mapped: Set to true if already mapped
+ *
+ * When @use_submap is set in struct pci_epf_bar, an EPF driver may describe
+ * multiple independent mappings within a single BAR. An EPC driver can use
+ * these descriptors to set up the required address translation (e.g. multiple
+ * inbound iATU regions) without requiring the whole BAR to be mapped at once.
+ */
+struct pci_epf_bar_submap {
+ dma_addr_t phys_addr;
+ size_t size;
+ size_t offset;
+ bool mapped;
+};
+
/**
* struct pci_epf_bar - represents the BAR of EPF device
* @phys_addr: physical address that should be mapped to the BAR
@@ -119,6 +138,9 @@ struct pci_epf_driver {
* requirement
* @barno: BAR number
* @flags: flags that are set for the BAR
+ * @use_submap: set true to request subrange mappings within this BAR
+ * @num_submap: number of entries in @submap
+ * @submap: array of subrange descriptors allocated by the caller
*/
struct pci_epf_bar {
dma_addr_t phys_addr;
@@ -127,6 +149,11 @@ struct pci_epf_bar {
size_t mem_size;
enum pci_barno barno;
int flags;
+
+ /* Optional sub-range mapping */
+ bool use_submap;
+ int num_submap;
+ struct pci_epf_bar_submap *submap;
};
/**
--
2.51.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [RFC PATCH v3 03/35] PCI: dwc: ep: Support BAR subrange inbound mapping via address match iATU
2025-12-17 15:15 ` [RFC PATCH v3 03/35] PCI: dwc: ep: Support BAR subrange inbound mapping via address match iATU Koichiro Den
@ 2025-12-19 14:19 ` Frank Li
2025-12-20 15:36 ` Koichiro Den
0 siblings, 1 reply; 61+ messages in thread
From: Frank Li @ 2025-12-19 14:19 UTC (permalink / raw)
To: Koichiro Den
Cc: dave.jiang, ntb, linux-pci, dmaengine, linux-renesas-soc, netdev,
linux-kernel, mani, kwilczynski, kishon, bhelgaas, corbet,
geert+renesas, magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro,
will, robin.murphy, jdmason, allenbh, andrew+netdev, davem,
edumazet, kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k,
kurt.schwemmer, logang, jingoohan1, lpieralisi, utkarsh02t,
jbrunet, dlemoal, arnd, elfring
On Thu, Dec 18, 2025 at 12:15:37AM +0900, Koichiro Den wrote:
> Extend dw_pcie_ep_set_bar() to support Address Match Mode IB iATU
> with the new 'submap' field in pci_epf_bar.
>
> The existing dw_pcie_ep_inbound_atu(), which handles BAR match mode, is
> renamed to dw_pcie_ep_ib_atu_bar(), and a new dw_pcie_ep_ib_atu_addr() is
> introduced for address match mode.
>
> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> ---
> .../pci/controller/dwc/pcie-designware-ep.c | 197 ++++++++++++++++--
> drivers/pci/controller/dwc/pcie-designware.h | 2 +
> drivers/pci/endpoint/pci-epc-core.c | 2 +-
> include/linux/pci-epf.h | 27 +++
> 4 files changed, 215 insertions(+), 13 deletions(-)
>
...
>
> #define to_pci_epf_driver(drv) container_of_const((drv), struct pci_epf_driver, driver)
>
> +/**
> + * struct pci_epf_bar_submap - represents a BAR subrange for inbound mapping
> + * @phys_addr: physical address that should be mapped to the BAR subrange
> + * @size: the size of the subrange to be mapped
> + * @offset: The byte offset from the BAR base
> + * @mapped: Set to true if already mapped
> + *
> + * When @use_submap is set in struct pci_epf_bar, an EPF driver may describe
> + * multiple independent mappings within a single BAR. An EPC driver can use
> + * these descriptors to set up the required address translation (e.g. multiple
> + * inbound iATU regions) without requiring the whole BAR to be mapped at once.
> + */
> +struct pci_epf_bar_submap {
> + dma_addr_t phys_addr;
> + size_t size;
> + size_t offset;
> + bool mapped;
Can we move dw_pcie_ib_map's necessary information here, such as atu_index,
so an additional list isn't needed to track it? If atu_index is assigned,
that should mean the entry is mapped.
> +};
> +
> /**
> * struct pci_epf_bar - represents the BAR of EPF device
> * @phys_addr: physical address that should be mapped to the BAR
> @@ -119,6 +138,9 @@ struct pci_epf_driver {
> * requirement
> * @barno: BAR number
> * @flags: flags that are set for the BAR
> + * @use_submap: set true to request subrange mappings within this BAR
> + * @num_submap: number of entries in @submap
> + * @submap: array of subrange descriptors allocated by the caller
> */
> struct pci_epf_bar {
> dma_addr_t phys_addr;
> @@ -127,6 +149,11 @@ struct pci_epf_bar {
> size_t mem_size;
> enum pci_barno barno;
> int flags;
> +
> + /* Optional sub-range mapping */
> + bool use_submap;
> + int num_submap;
Can we use num_submap != 0 to mean a subrange is requested?
Frank
> + struct pci_epf_bar_submap *submap;
> };
>
> /**
> --
> 2.51.0
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [RFC PATCH v3 03/35] PCI: dwc: ep: Support BAR subrange inbound mapping via address match iATU
2025-12-19 14:19 ` Frank Li
@ 2025-12-20 15:36 ` Koichiro Den
0 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-20 15:36 UTC (permalink / raw)
To: Frank Li
Cc: dave.jiang, ntb, linux-pci, dmaengine, linux-renesas-soc, netdev,
linux-kernel, mani, kwilczynski, kishon, bhelgaas, corbet,
geert+renesas, magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro,
will, robin.murphy, jdmason, allenbh, andrew+netdev, davem,
edumazet, kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k,
kurt.schwemmer, logang, jingoohan1, lpieralisi, utkarsh02t,
jbrunet, dlemoal, arnd, elfring
On Fri, Dec 19, 2025 at 09:19:26AM -0500, Frank Li wrote:
> On Thu, Dec 18, 2025 at 12:15:37AM +0900, Koichiro Den wrote:
> > Extend dw_pcie_ep_set_bar() to support Address Match Mode IB iATU
> > with the new 'submap' field in pci_epf_bar.
> >
> > The existing dw_pcie_ep_inbound_atu(), which handles BAR match mode, is
> > renamed to dw_pcie_ep_ib_atu_bar(), and a new dw_pcie_ep_ib_atu_addr() is
> > introduced for address match mode.
> >
> > Signed-off-by: Koichiro Den <den@valinux.co.jp>
> > ---
> > .../pci/controller/dwc/pcie-designware-ep.c | 197 ++++++++++++++++--
> > drivers/pci/controller/dwc/pcie-designware.h | 2 +
> > drivers/pci/endpoint/pci-epc-core.c | 2 +-
> > include/linux/pci-epf.h | 27 +++
> > 4 files changed, 215 insertions(+), 13 deletions(-)
> >
> ...
> >
> > #define to_pci_epf_driver(drv) container_of_const((drv), struct pci_epf_driver, driver)
> >
> > +/**
> > + * struct pci_epf_bar_submap - represents a BAR subrange for inbound mapping
> > + * @phys_addr: physical address that should be mapped to the BAR subrange
> > + * @size: the size of the subrange to be mapped
> > + * @offset: The byte offset from the BAR base
> > + * @mapped: Set to true if already mapped
> > + *
> > + * When @use_submap is set in struct pci_epf_bar, an EPF driver may describe
> > + * multiple independent mappings within a single BAR. An EPC driver can use
> > + * these descriptors to set up the required address translation (e.g. multiple
> > + * inbound iATU regions) without requiring the whole BAR to be mapped at once.
> > + */
> > +struct pci_epf_bar_submap {
> > + dma_addr_t phys_addr;
> > + size_t size;
> > + size_t offset;
> > + bool mapped;
>
> Can we move dw_pcie_ib_map's necessary information here, such as atu_index,
> so an additional list isn't needed to track it? If atu_index is assigned,
> that should mean the entry is mapped.
The 'mapped' field in pci_epf_bar_submap is actually a leftover from an
early draft. I'll drop it, sorry for the confusion.
I would still prefer to keep the atu index in a private structure (i.e.
dw_pcie_ib_map). pci_epf_bar_submap is part of the API and I think it should
remain a declarative description of the requested sub-range mappings,
without exposing driver-internal state back to the caller.
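To make the split concrete, a small self-contained sketch (a plain array in
place of the kernel list, and simplified stand-in struct/field names, not the
exact ones from this series):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Public, declarative descriptor the EPF driver fills in (analogous to
 * pci_epf_bar_submap, with the leftover 'mapped' field dropped).
 */
struct submap_desc {
	uint64_t phys_addr;	/* backing address for this subrange */
	size_t size;
	size_t offset;		/* byte offset from the BAR base */
};

/*
 * Private EPC-side state: which iATU window backs which (bar, pci_addr)
 * pair. Kept out of the public struct so driver internals don't leak
 * back to the caller.
 */
struct ib_map {
	int bar;
	uint64_t pci_addr;
	uint32_t atu_index;
};

/* Look up the private mapping for a (bar, pci_addr) key, if any. */
static struct ib_map *find_ib_map(struct ib_map *maps, size_t n,
				  int bar, uint64_t pci_addr)
{
	for (size_t i = 0; i < n; i++)
		if (maps[i].bar == bar && maps[i].pci_addr == pci_addr)
			return &maps[i];
	return NULL;
}
```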
>
> > +};
> > +
> > /**
> > * struct pci_epf_bar - represents the BAR of EPF device
> > * @phys_addr: physical address that should be mapped to the BAR
> > @@ -119,6 +138,9 @@ struct pci_epf_driver {
> > * requirement
> > * @barno: BAR number
> > * @flags: flags that are set for the BAR
> > + * @use_submap: set true to request subrange mappings within this BAR
> > + * @num_submap: number of entries in @submap
> > + * @submap: array of subrange descriptors allocated by the caller
> > */
> > struct pci_epf_bar {
> > dma_addr_t phys_addr;
> > @@ -127,6 +149,11 @@ struct pci_epf_bar {
> > size_t mem_size;
> > enum pci_barno barno;
> > int flags;
> > +
> > + /* Optional sub-range mapping */
> > + bool use_submap;
> > + int num_submap;
>
> Can we use num_submap != 0 to mean a subrange is requested?
Some existing pci_epc_set_bar() callers seem to use a two-stage sequence,
i.e. first they only initialize the BAR (with phys_addr == 0), and later
they program the actual BAR-match (re-)mapping (with phys_addr != 0).
If we used only num_submap != 0 as the discriminator, Address Match mode
initialization (num_submap == 0) would be indistinguishable from the
existing BAR-match initialization, and we could end up programming a
meaningless BAR-match mapping with phys_addr == 0. That's why I added an
explicit 'use_submap' flag in addition to 'num_submap'.
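A tiny sketch of that disambiguation (the names are illustrative, not the
actual dw_pcie_ep_set_bar() code):

```c
#include <assert.h>
#include <stdint.h>

struct epf_bar {
	uint64_t phys_addr;
	int use_submap;		/* request address match (subrange) mode */
	int num_submap;		/* may be 0 during the init-only stage */
};

enum ib_mode { MODE_BAR_MATCH, MODE_ADDR_MATCH };

/*
 * With only num_submap as the discriminator, an address-match "init only"
 * call (num_submap == 0) would look identical to the legacy two-stage
 * BAR-match init with phys_addr == 0; the explicit use_submap flag keeps
 * the two cases apart.
 */
static enum ib_mode pick_mode(const struct epf_bar *bar)
{
	return bar->use_submap ? MODE_ADDR_MATCH : MODE_BAR_MATCH;
}
```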
Koichiro
>
> Frank
> > + struct pci_epf_bar_submap *submap;
> > };
> >
> > /**
> > --
> > 2.51.0
> >
^ permalink raw reply [flat|nested] 61+ messages in thread
* [RFC PATCH v3 04/35] NTB: Add offset parameter to MW translation APIs
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
` (2 preceding siblings ...)
2025-12-17 15:15 ` [RFC PATCH v3 03/35] PCI: dwc: ep: Support BAR subrange inbound mapping via address match iATU Koichiro Den
@ 2025-12-17 15:15 ` Koichiro Den
2025-12-17 15:15 ` [RFC PATCH v3 05/35] PCI: endpoint: pci-epf-vntb: Propagate MW offset from configfs when present Koichiro Den
` (31 subsequent siblings)
35 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:15 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
Extend ntb_mw_set_trans() and ntb_mw_get_align() with an offset
argument. This supports subrange mapping inside a BAR for platforms that
require offset-based translations.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/ntb/hw/amd/ntb_hw_amd.c | 6 ++++--
drivers/ntb/hw/epf/ntb_hw_epf.c | 6 ++++--
drivers/ntb/hw/idt/ntb_hw_idt.c | 3 ++-
drivers/ntb/hw/intel/ntb_hw_gen1.c | 6 ++++--
drivers/ntb/hw/intel/ntb_hw_gen1.h | 2 +-
drivers/ntb/hw/intel/ntb_hw_gen3.c | 3 ++-
drivers/ntb/hw/intel/ntb_hw_gen4.c | 6 ++++--
drivers/ntb/hw/mscc/ntb_hw_switchtec.c | 6 ++++--
drivers/ntb/msi.c | 6 +++---
drivers/ntb/ntb_transport.c | 4 ++--
drivers/ntb/test/ntb_perf.c | 4 ++--
drivers/ntb/test/ntb_tool.c | 6 +++---
drivers/pci/endpoint/functions/pci-epf-vntb.c | 5 +++--
include/linux/ntb.h | 18 +++++++++++-------
14 files changed, 49 insertions(+), 32 deletions(-)
diff --git a/drivers/ntb/hw/amd/ntb_hw_amd.c b/drivers/ntb/hw/amd/ntb_hw_amd.c
index 1a163596ddf5..c0137df413c4 100644
--- a/drivers/ntb/hw/amd/ntb_hw_amd.c
+++ b/drivers/ntb/hw/amd/ntb_hw_amd.c
@@ -92,7 +92,8 @@ static int amd_ntb_mw_count(struct ntb_dev *ntb, int pidx)
static int amd_ntb_mw_get_align(struct ntb_dev *ntb, int pidx, int idx,
resource_size_t *addr_align,
resource_size_t *size_align,
- resource_size_t *size_max)
+ resource_size_t *size_max,
+ resource_size_t *offset)
{
struct amd_ntb_dev *ndev = ntb_ndev(ntb);
int bar;
@@ -117,7 +118,8 @@ static int amd_ntb_mw_get_align(struct ntb_dev *ntb, int pidx, int idx,
}
static int amd_ntb_mw_set_trans(struct ntb_dev *ntb, int pidx, int idx,
- dma_addr_t addr, resource_size_t size)
+ dma_addr_t addr, resource_size_t size,
+ resource_size_t offset)
{
struct amd_ntb_dev *ndev = ntb_ndev(ntb);
unsigned long xlat_reg, limit_reg = 0;
diff --git a/drivers/ntb/hw/epf/ntb_hw_epf.c b/drivers/ntb/hw/epf/ntb_hw_epf.c
index 126ba38e32ea..89a536562abf 100644
--- a/drivers/ntb/hw/epf/ntb_hw_epf.c
+++ b/drivers/ntb/hw/epf/ntb_hw_epf.c
@@ -167,7 +167,8 @@ static int ntb_epf_mw_count(struct ntb_dev *ntb, int pidx)
static int ntb_epf_mw_get_align(struct ntb_dev *ntb, int pidx, int idx,
resource_size_t *addr_align,
resource_size_t *size_align,
- resource_size_t *size_max)
+ resource_size_t *size_max,
+ resource_size_t *offset)
{
struct ntb_epf_dev *ndev = ntb_ndev(ntb);
struct device *dev = ndev->dev;
@@ -405,7 +406,8 @@ static int ntb_epf_db_set_mask(struct ntb_dev *ntb, u64 db_bits)
}
static int ntb_epf_mw_set_trans(struct ntb_dev *ntb, int pidx, int idx,
- dma_addr_t addr, resource_size_t size)
+ dma_addr_t addr, resource_size_t size,
+ resource_size_t offset)
{
struct ntb_epf_dev *ndev = ntb_ndev(ntb);
struct device *dev = ndev->dev;
diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
index f27df8d7f3b9..8c2cf149b99b 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
@@ -1190,7 +1190,8 @@ static int idt_ntb_mw_count(struct ntb_dev *ntb, int pidx)
static int idt_ntb_mw_get_align(struct ntb_dev *ntb, int pidx, int widx,
resource_size_t *addr_align,
resource_size_t *size_align,
- resource_size_t *size_max)
+ resource_size_t *size_max,
+ resource_size_t *offset)
{
struct idt_ntb_dev *ndev = to_ndev_ntb(ntb);
struct idt_ntb_peer *peer;
diff --git a/drivers/ntb/hw/intel/ntb_hw_gen1.c b/drivers/ntb/hw/intel/ntb_hw_gen1.c
index 079b8cd79785..6cbbd6cdf4c0 100644
--- a/drivers/ntb/hw/intel/ntb_hw_gen1.c
+++ b/drivers/ntb/hw/intel/ntb_hw_gen1.c
@@ -804,7 +804,8 @@ int intel_ntb_mw_count(struct ntb_dev *ntb, int pidx)
int intel_ntb_mw_get_align(struct ntb_dev *ntb, int pidx, int idx,
resource_size_t *addr_align,
resource_size_t *size_align,
- resource_size_t *size_max)
+ resource_size_t *size_max,
+ resource_size_t *offset)
{
struct intel_ntb_dev *ndev = ntb_ndev(ntb);
resource_size_t bar_size, mw_size;
@@ -840,7 +841,8 @@ int intel_ntb_mw_get_align(struct ntb_dev *ntb, int pidx, int idx,
}
static int intel_ntb_mw_set_trans(struct ntb_dev *ntb, int pidx, int idx,
- dma_addr_t addr, resource_size_t size)
+ dma_addr_t addr, resource_size_t size,
+ resource_size_t offset)
{
struct intel_ntb_dev *ndev = ntb_ndev(ntb);
unsigned long base_reg, xlat_reg, limit_reg;
diff --git a/drivers/ntb/hw/intel/ntb_hw_gen1.h b/drivers/ntb/hw/intel/ntb_hw_gen1.h
index 344249fc18d1..f9ebd2780b7f 100644
--- a/drivers/ntb/hw/intel/ntb_hw_gen1.h
+++ b/drivers/ntb/hw/intel/ntb_hw_gen1.h
@@ -159,7 +159,7 @@ int ndev_mw_to_bar(struct intel_ntb_dev *ndev, int idx);
int intel_ntb_mw_count(struct ntb_dev *ntb, int pidx);
int intel_ntb_mw_get_align(struct ntb_dev *ntb, int pidx, int idx,
resource_size_t *addr_align, resource_size_t *size_align,
- resource_size_t *size_max);
+ resource_size_t *size_max, resource_size_t *offset);
int intel_ntb_peer_mw_count(struct ntb_dev *ntb);
int intel_ntb_peer_mw_get_addr(struct ntb_dev *ntb, int idx,
phys_addr_t *base, resource_size_t *size);
diff --git a/drivers/ntb/hw/intel/ntb_hw_gen3.c b/drivers/ntb/hw/intel/ntb_hw_gen3.c
index a5aa96a31f4a..98722032ca5d 100644
--- a/drivers/ntb/hw/intel/ntb_hw_gen3.c
+++ b/drivers/ntb/hw/intel/ntb_hw_gen3.c
@@ -444,7 +444,8 @@ int intel_ntb3_link_enable(struct ntb_dev *ntb, enum ntb_speed max_speed,
return 0;
}
static int intel_ntb3_mw_set_trans(struct ntb_dev *ntb, int pidx, int idx,
- dma_addr_t addr, resource_size_t size)
+ dma_addr_t addr, resource_size_t size,
+ resource_size_t offset)
{
struct intel_ntb_dev *ndev = ntb_ndev(ntb);
unsigned long xlat_reg, limit_reg;
diff --git a/drivers/ntb/hw/intel/ntb_hw_gen4.c b/drivers/ntb/hw/intel/ntb_hw_gen4.c
index 22cac7975b3c..8df90ea04c7c 100644
--- a/drivers/ntb/hw/intel/ntb_hw_gen4.c
+++ b/drivers/ntb/hw/intel/ntb_hw_gen4.c
@@ -335,7 +335,8 @@ ssize_t ndev_ntb4_debugfs_read(struct file *filp, char __user *ubuf,
}
static int intel_ntb4_mw_set_trans(struct ntb_dev *ntb, int pidx, int idx,
- dma_addr_t addr, resource_size_t size)
+ dma_addr_t addr, resource_size_t size,
+ resource_size_t offset)
{
struct intel_ntb_dev *ndev = ntb_ndev(ntb);
unsigned long xlat_reg, limit_reg, idx_reg;
@@ -524,7 +525,8 @@ static int intel_ntb4_link_disable(struct ntb_dev *ntb)
static int intel_ntb4_mw_get_align(struct ntb_dev *ntb, int pidx, int idx,
resource_size_t *addr_align,
resource_size_t *size_align,
- resource_size_t *size_max)
+ resource_size_t *size_max,
+ resource_size_t *offset)
{
struct intel_ntb_dev *ndev = ntb_ndev(ntb);
resource_size_t bar_size, mw_size;
diff --git a/drivers/ntb/hw/mscc/ntb_hw_switchtec.c b/drivers/ntb/hw/mscc/ntb_hw_switchtec.c
index e38540b92716..5d8bace78d4f 100644
--- a/drivers/ntb/hw/mscc/ntb_hw_switchtec.c
+++ b/drivers/ntb/hw/mscc/ntb_hw_switchtec.c
@@ -191,7 +191,8 @@ static int peer_lut_index(struct switchtec_ntb *sndev, int mw_idx)
static int switchtec_ntb_mw_get_align(struct ntb_dev *ntb, int pidx,
int widx, resource_size_t *addr_align,
resource_size_t *size_align,
- resource_size_t *size_max)
+ resource_size_t *size_max,
+ resource_size_t *offset)
{
struct switchtec_ntb *sndev = ntb_sndev(ntb);
int lut;
@@ -268,7 +269,8 @@ static void switchtec_ntb_mw_set_lut(struct switchtec_ntb *sndev, int idx,
}
static int switchtec_ntb_mw_set_trans(struct ntb_dev *ntb, int pidx, int widx,
- dma_addr_t addr, resource_size_t size)
+ dma_addr_t addr, resource_size_t size,
+ resource_size_t offset)
{
struct switchtec_ntb *sndev = ntb_sndev(ntb);
struct ntb_ctrl_regs __iomem *ctl = sndev->mmio_peer_ctrl;
diff --git a/drivers/ntb/msi.c b/drivers/ntb/msi.c
index 6817d504c12a..8875bcbf2ea4 100644
--- a/drivers/ntb/msi.c
+++ b/drivers/ntb/msi.c
@@ -117,7 +117,7 @@ int ntb_msi_setup_mws(struct ntb_dev *ntb)
return peer_widx;
ret = ntb_mw_get_align(ntb, peer, peer_widx, &addr_align,
- NULL, NULL);
+ NULL, NULL, NULL);
if (ret)
return ret;
@@ -132,7 +132,7 @@ int ntb_msi_setup_mws(struct ntb_dev *ntb)
}
ret = ntb_mw_get_align(ntb, peer, peer_widx, NULL,
- &size_align, &size_max);
+ &size_align, &size_max, NULL);
if (ret)
goto error_out;
@@ -142,7 +142,7 @@ int ntb_msi_setup_mws(struct ntb_dev *ntb)
mw_min_size = mw_size;
ret = ntb_mw_set_trans(ntb, peer, peer_widx,
- addr, mw_size);
+ addr, mw_size, 0);
if (ret)
goto error_out;
}
diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index d5a544bf8fd6..e16a8147ddc5 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -829,7 +829,7 @@ static int ntb_set_mw(struct ntb_transport_ctx *nt, int num_mw,
return -EINVAL;
rc = ntb_mw_get_align(nt->ndev, PIDX, num_mw, &xlat_align,
- &xlat_align_size, NULL);
+ &xlat_align_size, NULL, NULL);
if (rc)
return rc;
@@ -864,7 +864,7 @@ static int ntb_set_mw(struct ntb_transport_ctx *nt, int num_mw,
/* Notify HW the memory location of the receive buffer */
rc = ntb_mw_set_trans(nt->ndev, PIDX, num_mw, mw->dma_addr,
- mw->xlat_size);
+ mw->xlat_size, 0);
if (rc) {
dev_err(&pdev->dev, "Unable to set mw%d translation", num_mw);
ntb_free_mw(nt, num_mw);
diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c
index dfd175f79e8f..b842b69e4242 100644
--- a/drivers/ntb/test/ntb_perf.c
+++ b/drivers/ntb/test/ntb_perf.c
@@ -573,7 +573,7 @@ static int perf_setup_inbuf(struct perf_peer *peer)
/* Get inbound MW parameters */
ret = ntb_mw_get_align(perf->ntb, peer->pidx, perf->gidx,
- &xlat_align, &size_align, &size_max);
+ &xlat_align, &size_align, &size_max, NULL);
if (ret) {
dev_err(&perf->ntb->dev, "Couldn't get inbuf restrictions\n");
return ret;
@@ -604,7 +604,7 @@ static int perf_setup_inbuf(struct perf_peer *peer)
}
ret = ntb_mw_set_trans(perf->ntb, peer->pidx, peer->gidx,
- peer->inbuf_xlat, peer->inbuf_size);
+ peer->inbuf_xlat, peer->inbuf_size, 0);
if (ret) {
dev_err(&perf->ntb->dev, "Failed to set inbuf translation\n");
goto err_free_inbuf;
diff --git a/drivers/ntb/test/ntb_tool.c b/drivers/ntb/test/ntb_tool.c
index 641cb7e05a47..7a7ba486bba7 100644
--- a/drivers/ntb/test/ntb_tool.c
+++ b/drivers/ntb/test/ntb_tool.c
@@ -578,7 +578,7 @@ static int tool_setup_mw(struct tool_ctx *tc, int pidx, int widx,
return 0;
ret = ntb_mw_get_align(tc->ntb, pidx, widx, &addr_align,
- &size_align, &size);
+ &size_align, &size, NULL);
if (ret)
return ret;
@@ -595,7 +595,7 @@ static int tool_setup_mw(struct tool_ctx *tc, int pidx, int widx,
goto err_free_dma;
}
- ret = ntb_mw_set_trans(tc->ntb, pidx, widx, inmw->dma_base, inmw->size);
+ ret = ntb_mw_set_trans(tc->ntb, pidx, widx, inmw->dma_base, inmw->size, 0);
if (ret)
goto err_free_dma;
@@ -652,7 +652,7 @@ static ssize_t tool_mw_trans_read(struct file *filep, char __user *ubuf,
return -ENOMEM;
ret = ntb_mw_get_align(inmw->tc->ntb, inmw->pidx, inmw->widx,
- &addr_align, &size_align, &size_max);
+ &addr_align, &size_align, &size_max, NULL);
if (ret)
goto err;
diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
index 4dfb3e40dffa..4db1fabfd8a4 100644
--- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
+++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
@@ -1384,7 +1384,7 @@ static int vntb_epf_db_set_mask(struct ntb_dev *ntb, u64 db_bits)
}
static int vntb_epf_mw_set_trans(struct ntb_dev *ndev, int pidx, int idx,
- dma_addr_t addr, resource_size_t size)
+ dma_addr_t addr, resource_size_t size, resource_size_t offset)
{
struct epf_ntb *ntb = ntb_ndev(ndev);
struct pci_epf_bar *epf_bar;
@@ -1507,7 +1507,8 @@ static u64 vntb_epf_db_read(struct ntb_dev *ndev)
static int vntb_epf_mw_get_align(struct ntb_dev *ndev, int pidx, int idx,
resource_size_t *addr_align,
resource_size_t *size_align,
- resource_size_t *size_max)
+ resource_size_t *size_max,
+ resource_size_t *offset)
{
struct epf_ntb *ntb = ntb_ndev(ndev);
diff --git a/include/linux/ntb.h b/include/linux/ntb.h
index 8ff9d663096b..d7ce5d2e60d0 100644
--- a/include/linux/ntb.h
+++ b/include/linux/ntb.h
@@ -273,9 +273,11 @@ struct ntb_dev_ops {
int (*mw_get_align)(struct ntb_dev *ntb, int pidx, int widx,
resource_size_t *addr_align,
resource_size_t *size_align,
- resource_size_t *size_max);
+ resource_size_t *size_max,
+ resource_size_t *offset);
int (*mw_set_trans)(struct ntb_dev *ntb, int pidx, int widx,
- dma_addr_t addr, resource_size_t size);
+ dma_addr_t addr, resource_size_t size,
+ resource_size_t offset);
int (*mw_clear_trans)(struct ntb_dev *ntb, int pidx, int widx);
int (*peer_mw_count)(struct ntb_dev *ntb);
int (*peer_mw_get_addr)(struct ntb_dev *ntb, int widx,
@@ -823,13 +825,14 @@ static inline int ntb_mw_count(struct ntb_dev *ntb, int pidx)
static inline int ntb_mw_get_align(struct ntb_dev *ntb, int pidx, int widx,
resource_size_t *addr_align,
resource_size_t *size_align,
- resource_size_t *size_max)
+ resource_size_t *size_max,
+ resource_size_t *offset)
{
if (!(ntb_link_is_up(ntb, NULL, NULL) & BIT_ULL(pidx)))
return -ENOTCONN;
return ntb->ops->mw_get_align(ntb, pidx, widx, addr_align, size_align,
- size_max);
+ size_max, offset);
}
/**
@@ -852,12 +855,13 @@ static inline int ntb_mw_get_align(struct ntb_dev *ntb, int pidx, int widx,
* Return: Zero on success, otherwise an error number.
*/
static inline int ntb_mw_set_trans(struct ntb_dev *ntb, int pidx, int widx,
- dma_addr_t addr, resource_size_t size)
+ dma_addr_t addr, resource_size_t size,
+ resource_size_t offset)
{
if (!ntb->ops->mw_set_trans)
return 0;
- return ntb->ops->mw_set_trans(ntb, pidx, widx, addr, size);
+ return ntb->ops->mw_set_trans(ntb, pidx, widx, addr, size, offset);
}
/**
@@ -875,7 +879,7 @@ static inline int ntb_mw_set_trans(struct ntb_dev *ntb, int pidx, int widx,
static inline int ntb_mw_clear_trans(struct ntb_dev *ntb, int pidx, int widx)
{
if (!ntb->ops->mw_clear_trans)
- return ntb_mw_set_trans(ntb, pidx, widx, 0, 0);
+ return ntb_mw_set_trans(ntb, pidx, widx, 0, 0, 0);
return ntb->ops->mw_clear_trans(ntb, pidx, widx);
}
--
2.51.0
* [RFC PATCH v3 05/35] PCI: endpoint: pci-epf-vntb: Propagate MW offset from configfs when present
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
` (3 preceding siblings ...)
2025-12-17 15:15 ` [RFC PATCH v3 04/35] NTB: Add offset parameter to MW translation APIs Koichiro Den
@ 2025-12-17 15:15 ` Koichiro Den
2025-12-17 15:15 ` [RFC PATCH v3 06/35] NTB: ntb_transport: Support partial memory windows with offsets Koichiro Den
` (30 subsequent siblings)
35 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:15 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
The NTB API functions ntb_mw_set_trans() and ntb_mw_get_align() now
support non-zero MW offsets. Update pci-epf-vntb so that
vntb_epf_mw_get_align() reports mws_offset[idx] through the new offset
parameter when it is provided. Users can then retrieve the offset and
pass it to ntb_mw_set_trans().
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/pci/endpoint/functions/pci-epf-vntb.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
index 4db1fabfd8a4..337995e2f3ce 100644
--- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
+++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
@@ -1521,6 +1521,9 @@ static int vntb_epf_mw_get_align(struct ntb_dev *ndev, int pidx, int idx,
if (size_max)
*size_max = ntb->mws_size[idx];
+ if (offset)
+ *offset = ntb->mws_offset[idx];
+
return 0;
}
--
2.51.0
* [RFC PATCH v3 06/35] NTB: ntb_transport: Support partial memory windows with offsets
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
` (4 preceding siblings ...)
2025-12-17 15:15 ` [RFC PATCH v3 05/35] PCI: endpoint: pci-epf-vntb: Propagate MW offset from configfs when present Koichiro Den
@ 2025-12-17 15:15 ` Koichiro Den
2025-12-17 15:15 ` [RFC PATCH v3 07/35] PCI: endpoint: pci-epf-vntb: Hint subrange mapping preference to EPC driver Koichiro Den
` (29 subsequent siblings)
35 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:15 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
The NTB API functions ntb_mw_set_trans() and ntb_mw_get_align() now
support non-zero MW offsets. Update ntb_transport to make use of this
capability by propagating the offset when setting up MW translations.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/ntb/ntb_transport.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index e16a8147ddc5..57b4c0511927 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -823,13 +823,14 @@ static int ntb_set_mw(struct ntb_transport_ctx *nt, int num_mw,
size_t xlat_size, buff_size;
resource_size_t xlat_align;
resource_size_t xlat_align_size;
+ resource_size_t offset;
int rc;
if (!size)
return -EINVAL;
rc = ntb_mw_get_align(nt->ndev, PIDX, num_mw, &xlat_align,
- &xlat_align_size, NULL, NULL);
+ &xlat_align_size, NULL, &offset);
if (rc)
return rc;
@@ -864,7 +865,7 @@ static int ntb_set_mw(struct ntb_transport_ctx *nt, int num_mw,
/* Notify HW the memory location of the receive buffer */
rc = ntb_mw_set_trans(nt->ndev, PIDX, num_mw, mw->dma_addr,
- mw->xlat_size, 0);
+ mw->xlat_size, offset);
if (rc) {
dev_err(&pdev->dev, "Unable to set mw%d translation", num_mw);
ntb_free_mw(nt, num_mw);
--
2.51.0
* [RFC PATCH v3 07/35] PCI: endpoint: pci-epf-vntb: Hint subrange mapping preference to EPC driver
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
` (5 preceding siblings ...)
2025-12-17 15:15 ` [RFC PATCH v3 06/35] NTB: ntb_transport: Support partial memory windows with offsets Koichiro Den
@ 2025-12-17 15:15 ` Koichiro Den
2025-12-17 15:15 ` [RFC PATCH v3 08/35] NTB: core: Add .get_private_data() to ntb_dev_ops Koichiro Den
` (28 subsequent siblings)
35 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:15 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
Now that pci_epc_set_bar() supports subrange mapping, hint that
preference to the EPC driver when calling it. For example, the DWC EPC
driver chooses Address Match Mode IB iATU mapping when 'use_submap' is
set to true.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/pci/endpoint/functions/pci-epf-vntb.c | 30 ++++++++++++++++++-
1 file changed, 29 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
index 337995e2f3ce..23bbcfd20c3b 100644
--- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
+++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
@@ -731,6 +731,12 @@ static int epf_ntb_mw_bar_init(struct epf_ntb *ntb)
PCI_BASE_ADDRESS_MEM_TYPE_64 :
PCI_BASE_ADDRESS_MEM_TYPE_32;
+ /* express preference for subrange mapping */
+ ntb->epf->bar[barno].use_submap = true;
+ ntb->epf->bar[barno].num_submap = 0;
+ if (WARN_ON(ntb->epf->bar[barno].submap))
+ dev_warn(dev, "BAR%u submap is not NULL\n", barno);
+
ret = pci_epc_set_bar(ntb->epf->epc,
ntb->epf->func_no,
ntb->epf->vfunc_no,
@@ -1391,6 +1397,7 @@ static int vntb_epf_mw_set_trans(struct ntb_dev *ndev, int pidx, int idx,
enum pci_barno barno;
int ret;
struct device *dev;
+ unsigned int sb;
dev = &ntb->ntb.dev;
barno = ntb->epf_ntb_bar[BAR_MW1 + idx];
@@ -1399,7 +1406,28 @@ static int vntb_epf_mw_set_trans(struct ntb_dev *ndev, int pidx, int idx,
epf_bar->barno = barno;
epf_bar->size = size;
- ret = pci_epc_set_bar(ntb->epf->epc, 0, 0, epf_bar);
+ /* express preference for subrange mapping */
+ epf_bar->use_submap = true;
+ for (sb = 0; sb < epf_bar->num_submap; sb++) {
+ if (epf_bar->submap[sb].offset == offset) {
+ dev_warn(dev, "offset 0x%llx is already mapped\n",
+ offset);
+ return -EBUSY;
+ }
+ }
+ epf_bar->num_submap++;
+ epf_bar->submap = devm_krealloc_array(
+ &ntb->epf->dev, epf_bar->submap,
+ epf_bar->num_submap, sizeof(*epf_bar->submap),
+ GFP_KERNEL);
+ if (!epf_bar->submap)
+ return -ENOMEM;
+ epf_bar->submap[sb].phys_addr = addr;
+ epf_bar->submap[sb].size = size;
+ epf_bar->submap[sb].offset = offset;
+
+ ret = pci_epc_set_bar(ntb->epf->epc, ntb->epf->func_no,
+ ntb->epf->vfunc_no, epf_bar);
if (ret) {
dev_err(dev, "failure set mw trans\n");
return ret;
--
2.51.0
* [RFC PATCH v3 08/35] NTB: core: Add .get_private_data() to ntb_dev_ops
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
` (6 preceding siblings ...)
2025-12-17 15:15 ` [RFC PATCH v3 07/35] PCI: endpoint: pci-epf-vntb: Hint subrange mapping preference to EPC driver Koichiro Den
@ 2025-12-17 15:15 ` Koichiro Den
2025-12-17 15:15 ` [RFC PATCH v3 09/35] NTB: epf: vntb: Implement .get_private_data() callback Koichiro Den
` (27 subsequent siblings)
35 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:15 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
Add an optional get_private_data() callback to retrieve private data
specific to the underlying hardware driver, e.g. the pci_epc device
associated with the NTB implementation.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
include/linux/ntb.h | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/include/linux/ntb.h b/include/linux/ntb.h
index d7ce5d2e60d0..0dcd9bb57f47 100644
--- a/include/linux/ntb.h
+++ b/include/linux/ntb.h
@@ -256,6 +256,7 @@ static inline int ntb_ctx_ops_is_valid(const struct ntb_ctx_ops *ops)
* @msg_clear_mask: See ntb_msg_clear_mask().
* @msg_read: See ntb_msg_read().
* @peer_msg_write: See ntb_peer_msg_write().
+ * @get_private_data: See ntb_get_private_data().
*/
struct ntb_dev_ops {
int (*port_number)(struct ntb_dev *ntb);
@@ -331,6 +332,7 @@ struct ntb_dev_ops {
int (*msg_clear_mask)(struct ntb_dev *ntb, u64 mask_bits);
u32 (*msg_read)(struct ntb_dev *ntb, int *pidx, int midx);
int (*peer_msg_write)(struct ntb_dev *ntb, int pidx, int midx, u32 msg);
+ void *(*get_private_data)(struct ntb_dev *ntb);
};
static inline int ntb_dev_ops_is_valid(const struct ntb_dev_ops *ops)
@@ -393,6 +395,9 @@ static inline int ntb_dev_ops_is_valid(const struct ntb_dev_ops *ops)
/* !ops->msg_clear_mask == !ops->msg_count && */
!ops->msg_read == !ops->msg_count &&
!ops->peer_msg_write == !ops->msg_count &&
+
+ /* Miscellaneous optional callbacks */
+ /* ops->get_private_data && */
1;
}
@@ -1567,6 +1572,21 @@ static inline int ntb_peer_msg_write(struct ntb_dev *ntb, int pidx, int midx,
return ntb->ops->peer_msg_write(ntb, pidx, midx, msg);
}
+/**
+ * ntb_get_private_data() - get private data specific to the hardware driver
+ * @ntb: NTB device context.
+ *
+ * Retrieve private data specific to the hardware driver.
+ *
+ * Return: Pointer to the private data if available, or %NULL if not.
+ */
+static inline void __maybe_unused *ntb_get_private_data(struct ntb_dev *ntb)
+{
+ if (!ntb->ops->get_private_data)
+ return NULL;
+ return ntb->ops->get_private_data(ntb);
+}
+
/**
* ntb_peer_resource_idx() - get a resource index for a given peer idx
* @ntb: NTB device context.
--
2.51.0
* [RFC PATCH v3 09/35] NTB: epf: vntb: Implement .get_private_data() callback
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
` (7 preceding siblings ...)
2025-12-17 15:15 ` [RFC PATCH v3 08/35] NTB: core: Add .get_private_data() to ntb_dev_ops Koichiro Den
@ 2025-12-17 15:15 ` Koichiro Den
2025-12-17 15:15 ` [RFC PATCH v3 10/35] dmaengine: dw-edma: Fix MSI data values for multi-vector IMWr interrupts Koichiro Den
` (26 subsequent siblings)
35 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:15 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
Implement the new get_private_data() operation for the EPF vNTB driver
to expose its associated EPC device to NTB subsystems.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/pci/endpoint/functions/pci-epf-vntb.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
index 23bbcfd20c3b..c89f5b0775fa 100644
--- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
+++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
@@ -1582,6 +1582,15 @@ static int vntb_epf_link_disable(struct ntb_dev *ntb)
return 0;
}
+static void *vntb_epf_get_private_data(struct ntb_dev *ntb)
+{
+ struct epf_ntb *ndev = ntb_ndev(ntb);
+
+ if (!ndev || !ndev->epf)
+ return NULL;
+ return (void *)ndev->epf->epc;
+}
+
static const struct ntb_dev_ops vntb_epf_ops = {
.mw_count = vntb_epf_mw_count,
.spad_count = vntb_epf_spad_count,
@@ -1603,6 +1612,7 @@ static const struct ntb_dev_ops vntb_epf_ops = {
.db_clear_mask = vntb_epf_db_clear_mask,
.db_clear = vntb_epf_db_clear,
.link_disable = vntb_epf_link_disable,
+ .get_private_data = vntb_epf_get_private_data,
};
static int pci_vntb_probe(struct pci_dev *pdev, const struct pci_device_id *id)
--
2.51.0
* [RFC PATCH v3 10/35] dmaengine: dw-edma: Fix MSI data values for multi-vector IMWr interrupts
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
` (8 preceding siblings ...)
2025-12-17 15:15 ` [RFC PATCH v3 09/35] NTB: epf: vntb: Implement .get_private_data() callback Koichiro Den
@ 2025-12-17 15:15 ` Koichiro Den
2025-12-17 15:15 ` [RFC PATCH v3 11/35] NTB: ntb_transport: Move TX memory window setup into setup_qp_mw() Koichiro Den
` (25 subsequent siblings)
35 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:15 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
When multiple MSI vectors are allocated for the DesignWare eDMA, the
driver currently records the same MSI message for all IRQs by calling
get_cached_msi_msg() per vector. For multi-vector MSI (as opposed to
MSI-X), the cached message corresponds to vector 0, so msg.data must be
adjusted by the vector index.
As a result, all eDMA interrupts share the same MSI data value and the
interrupt controller cannot distinguish between them.
Introduce dw_edma_compose_msi() to construct the correct MSI message for
each vector. For MSI-X nothing changes. For multi-vector MSI, derive the
base IRQ with msi_get_virq(dev, 0) and apply the per-vector offset to
msg.data before storing it in dw->irq[i].msi.
This makes each IMWr MSI vector use a unique MSI data value.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/dma/dw-edma/dw-edma-core.c | 28 ++++++++++++++++++++++++----
1 file changed, 24 insertions(+), 4 deletions(-)
diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
index 744c60ec9641..1b935da65d05 100644
--- a/drivers/dma/dw-edma/dw-edma-core.c
+++ b/drivers/dma/dw-edma/dw-edma-core.c
@@ -855,6 +855,28 @@ static inline void dw_edma_add_irq_mask(u32 *mask, u32 alloc, u16 cnt)
(*mask)++;
}
+static void dw_edma_compose_msi(struct device *dev, int irq, struct msi_msg *out)
+{
+ struct msi_desc *desc = irq_get_msi_desc(irq);
+ struct msi_msg msg;
+ unsigned int base;
+
+ if (!desc)
+ return;
+
+ get_cached_msi_msg(irq, &msg);
+ if (!desc->pci.msi_attrib.is_msix) {
+ /*
+ * For multi-vector MSI, the cached message corresponds to
+ * vector 0. Adjust msg.data by the IRQ index so that each
+ * vector gets a unique MSI data value for IMWr Data Register.
+ */
+ base = msi_get_virq(dev, 0);
+ msg.data |= (irq - base);
+ }
+ *out = msg;
+}
+
static int dw_edma_irq_request(struct dw_edma *dw,
u32 *wr_alloc, u32 *rd_alloc)
{
@@ -885,8 +907,7 @@ static int dw_edma_irq_request(struct dw_edma *dw,
return err;
}
- if (irq_get_msi_desc(irq))
- get_cached_msi_msg(irq, &dw->irq[0].msi);
+ dw_edma_compose_msi(dev, irq, &dw->irq[0].msi);
dw->nr_irqs = 1;
} else {
@@ -912,8 +933,7 @@ static int dw_edma_irq_request(struct dw_edma *dw,
if (err)
goto err_irq_free;
- if (irq_get_msi_desc(irq))
- get_cached_msi_msg(irq, &dw->irq[i].msi);
+ dw_edma_compose_msi(dev, irq, &dw->irq[i].msi);
}
dw->nr_irqs = i;
--
2.51.0
* [RFC PATCH v3 11/35] NTB: ntb_transport: Move TX memory window setup into setup_qp_mw()
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
` (9 preceding siblings ...)
2025-12-17 15:15 ` [RFC PATCH v3 10/35] dmaengine: dw-edma: Fix MSI data values for multi-vector IMWr interrupts Koichiro Den
@ 2025-12-17 15:15 ` Koichiro Den
2025-12-17 15:15 ` [RFC PATCH v3 12/35] NTB: ntb_transport: Dynamically determine qp count Koichiro Den
` (24 subsequent siblings)
35 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:15 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
Historically both TX and RX have assumed the same per-QP MW slice
(tx_max_entry == remote rx_max_entry), while the two are calculated
separately in different places (before and after the link-up negotiation
point). This has been safe because nt->link_is_up is never set to true
unless the pre-determined qp_count values match on both sides, and
qp_count is typically limited to nt->mw_count, which should be carefully
configured by the admin.
However, setup_qp_mw() can actually split an MW and handle multiple QPs
in one MW properly, so qp_count need not be limited by nt->mw_count.
Once that limitation is relaxed, the pre-determined qp_count can differ
between the host side and the endpoint, and link-up negotiation can
easily fail.
Move the TX MW configuration (per-QP offset and size) into
ntb_transport_setup_qp_mw() so that both RX and TX layout decisions are
centralized in a single helper. ntb_transport_init_queue() now deals
only with per-QP software state, not with MW layout.
This keeps the previous behaviour while preparing to relax the qp_count
limitation, and improves readability.
No functional change is intended.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/ntb/ntb_transport.c | 76 ++++++++++++++++---------------------
1 file changed, 32 insertions(+), 44 deletions(-)
diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index 57b4c0511927..42abd1ce02d5 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -569,7 +569,10 @@ static int ntb_transport_setup_qp_mw(struct ntb_transport_ctx *nt,
struct ntb_transport_mw *mw;
struct ntb_dev *ndev = nt->ndev;
struct ntb_queue_entry *entry;
- unsigned int rx_size, num_qps_mw;
+ phys_addr_t mw_base;
+ resource_size_t mw_size;
+ unsigned int rx_size, tx_size, num_qps_mw;
+ u64 qp_offset;
unsigned int mw_num, mw_count, qp_count;
unsigned int i;
int node;
@@ -588,13 +591,38 @@ static int ntb_transport_setup_qp_mw(struct ntb_transport_ctx *nt,
else
num_qps_mw = qp_count / mw_count;
- rx_size = (unsigned int)mw->xlat_size / num_qps_mw;
- qp->rx_buff = mw->virt_addr + rx_size * (qp_num / mw_count);
- rx_size -= sizeof(struct ntb_rx_info);
+ mw_base = nt->mw_vec[mw_num].phys_addr;
+ mw_size = nt->mw_vec[mw_num].phys_size;
+
+ if (mw_size > mw->xlat_size)
+ mw_size = mw->xlat_size;
+ if (max_mw_size && mw_size > max_mw_size)
+ mw_size = max_mw_size;
+
+ tx_size = (unsigned int)mw_size / num_qps_mw;
+ qp_offset = tx_size * (qp_num / mw_count);
+
+ qp->rx_buff = mw->virt_addr + qp_offset;
+
+ qp->tx_mw_size = tx_size;
+ qp->tx_mw = nt->mw_vec[mw_num].vbase + qp_offset;
+ if (!qp->tx_mw)
+ return -EINVAL;
+
+ qp->tx_mw_phys = mw_base + qp_offset;
+ if (!qp->tx_mw_phys)
+ return -EINVAL;
+ rx_size = tx_size;
+ rx_size -= sizeof(struct ntb_rx_info);
qp->remote_rx_info = qp->rx_buff + rx_size;
+ tx_size -= sizeof(struct ntb_rx_info);
+ qp->rx_info = qp->tx_mw + tx_size;
+
/* Due to housekeeping, there must be atleast 2 buffs */
+ qp->tx_max_frame = min(transport_mtu, tx_size / 2);
+ qp->tx_max_entry = tx_size / qp->tx_max_frame;
qp->rx_max_frame = min(transport_mtu, rx_size / 2);
qp->rx_max_entry = rx_size / qp->rx_max_frame;
qp->rx_index = 0;
@@ -1133,16 +1161,6 @@ static int ntb_transport_init_queue(struct ntb_transport_ctx *nt,
unsigned int qp_num)
{
struct ntb_transport_qp *qp;
- phys_addr_t mw_base;
- resource_size_t mw_size;
- unsigned int num_qps_mw, tx_size;
- unsigned int mw_num, mw_count, qp_count;
- u64 qp_offset;
-
- mw_count = nt->mw_count;
- qp_count = nt->qp_count;
-
- mw_num = QP_TO_MW(nt, qp_num);
qp = &nt->qp_vec[qp_num];
qp->qp_num = qp_num;
@@ -1152,36 +1170,6 @@ static int ntb_transport_init_queue(struct ntb_transport_ctx *nt,
qp->event_handler = NULL;
ntb_qp_link_context_reset(qp);
- if (mw_num < qp_count % mw_count)
- num_qps_mw = qp_count / mw_count + 1;
- else
- num_qps_mw = qp_count / mw_count;
-
- mw_base = nt->mw_vec[mw_num].phys_addr;
- mw_size = nt->mw_vec[mw_num].phys_size;
-
- if (max_mw_size && mw_size > max_mw_size)
- mw_size = max_mw_size;
-
- tx_size = (unsigned int)mw_size / num_qps_mw;
- qp_offset = tx_size * (qp_num / mw_count);
-
- qp->tx_mw_size = tx_size;
- qp->tx_mw = nt->mw_vec[mw_num].vbase + qp_offset;
- if (!qp->tx_mw)
- return -EINVAL;
-
- qp->tx_mw_phys = mw_base + qp_offset;
- if (!qp->tx_mw_phys)
- return -EINVAL;
-
- tx_size -= sizeof(struct ntb_rx_info);
- qp->rx_info = qp->tx_mw + tx_size;
-
- /* Due to housekeeping, there must be atleast 2 buffs */
- qp->tx_max_frame = min(transport_mtu, tx_size / 2);
- qp->tx_max_entry = tx_size / qp->tx_max_frame;
-
if (nt->debugfs_node_dir) {
char debugfs_name[8];
--
2.51.0
^ permalink raw reply related [flat|nested] 61+ messages in thread

* [RFC PATCH v3 12/35] NTB: ntb_transport: Dynamically determine qp count
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
` (10 preceding siblings ...)
2025-12-17 15:15 ` [RFC PATCH v3 11/35] NTB: ntb_transport: Move TX memory window setup into setup_qp_mw() Koichiro Den
@ 2025-12-17 15:15 ` Koichiro Den
2025-12-17 15:15 ` [RFC PATCH v3 13/35] NTB: ntb_transport: Introduce get_dma_dev() helper Koichiro Den
` (23 subsequent siblings)
35 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:15 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
One MW can host multiple queue pairs, so stop limiting qp_count to the
number of MWs.
Now that both TX and RX MW sizing are done in the same place, the MW
layout is derived from a single code path on both host and endpoint, so
the layout cannot diverge between the two sides.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/ntb/ntb_transport.c | 20 +++++++++++++++++---
1 file changed, 17 insertions(+), 3 deletions(-)
diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index 42abd1ce02d5..bac842177b55 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -1024,6 +1024,7 @@ static void ntb_transport_link_work(struct work_struct *work)
struct ntb_dev *ndev = nt->ndev;
struct pci_dev *pdev = ndev->pdev;
resource_size_t size;
+ u64 qp_bitmap_free;
u32 val;
int rc = 0, i, spad;
@@ -1071,8 +1072,23 @@ static void ntb_transport_link_work(struct work_struct *work)
val = ntb_spad_read(ndev, NUM_QPS);
dev_dbg(&pdev->dev, "Remote max number of qps = %d\n", val);
- if (val != nt->qp_count)
+ if (val == 0)
goto out;
+ else if (val < nt->qp_count) {
+ /*
+ * Clamp local qp_count to peer-advertised NUM_QPS to avoid
+ * mismatched queues.
+ */
+ qp_bitmap_free = nt->qp_bitmap_free;
+ for (i = val; i < nt->qp_count; i++) {
+ nt->qp_bitmap &= ~BIT_ULL(i);
+ nt->qp_bitmap_free &= ~BIT_ULL(i);
+ }
+ dev_warn(&pdev->dev,
+ "Local number of qps is reduced: %d->%d (qp_bitmap_free: 0x%llx->0x%llx)\n",
+ nt->qp_count, val, qp_bitmap_free, nt->qp_bitmap_free);
+ nt->qp_count = val;
+ }
val = ntb_spad_read(ndev, NUM_MWS);
dev_dbg(&pdev->dev, "Remote number of mws = %d\n", val);
@@ -1301,8 +1317,6 @@ static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
if (max_num_clients && max_num_clients < qp_count)
qp_count = max_num_clients;
- else if (nt->mw_count < qp_count)
- qp_count = nt->mw_count;
qp_bitmap &= BIT_ULL(qp_count) - 1;
--
2.51.0
^ permalink raw reply related [flat|nested] 61+ messages in thread

* [RFC PATCH v3 13/35] NTB: ntb_transport: Introduce get_dma_dev() helper
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
` (11 preceding siblings ...)
2025-12-17 15:15 ` [RFC PATCH v3 12/35] NTB: ntb_transport: Dynamically determine qp count Koichiro Den
@ 2025-12-17 15:15 ` Koichiro Den
2025-12-19 14:31 ` Frank Li
2025-12-17 15:15 ` [RFC PATCH v3 14/35] NTB: epf: Reserve a subset of MSI vectors for non-NTB users Koichiro Den
` (22 subsequent siblings)
35 siblings, 1 reply; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:15 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
When ntb_transport is used on top of an endpoint function (EPF) NTB
implementation, DMA mappings should be associated with the underlying
PCIe controller device rather than the virtual NTB PCI function. This
matters for IOMMU configuration and DMA mask validation.
Add a small helper, get_dma_dev(), that returns the appropriate struct
device for DMA mapping, i.e. &pdev->dev for a regular NTB host bridge
and the EPC parent device for EPF-based NTB endpoints. Use it in the
places where we set up DMA mappings or log DMA-related errors.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/ntb/ntb_transport.c | 35 ++++++++++++++++++++++++++++-------
1 file changed, 28 insertions(+), 7 deletions(-)
diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index bac842177b55..78d0469edbcc 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -63,6 +63,7 @@
#include <linux/mutex.h>
#include "linux/ntb.h"
#include "linux/ntb_transport.h"
+#include <linux/pci-epc.h>
#define NTB_TRANSPORT_VERSION 4
#define NTB_TRANSPORT_VER "4"
@@ -259,6 +260,26 @@ struct ntb_payload_header {
unsigned int flags;
};
+/*
+ * Return the device that should be used for DMA mapping.
+ *
+ * On RC, this is simply &pdev->dev.
+ * On EPF-backed NTB endpoints, use the EPC parent device so that
+ * DMA capabilities and IOMMU configuration are taken from the
+ * controller rather than the virtual NTB PCI function.
+ */
+static struct device *get_dma_dev(struct ntb_dev *ndev)
+{
+ struct device *dev = &ndev->pdev->dev;
+ struct pci_epc *epc;
+
+ epc = (struct pci_epc *)ntb_get_private_data(ndev);
+ if (epc)
+ dev = epc->dev.parent;
+
+ return dev;
+}
+
enum {
VERSION = 0,
QP_LINKS,
@@ -771,13 +792,13 @@ static void ntb_transport_msi_desc_changed(void *data)
static void ntb_free_mw(struct ntb_transport_ctx *nt, int num_mw)
{
struct ntb_transport_mw *mw = &nt->mw_vec[num_mw];
- struct pci_dev *pdev = nt->ndev->pdev;
+ struct device *dev = get_dma_dev(nt->ndev);
if (!mw->virt_addr)
return;
ntb_mw_clear_trans(nt->ndev, PIDX, num_mw);
- dma_free_coherent(&pdev->dev, mw->alloc_size,
+ dma_free_coherent(dev, mw->alloc_size,
mw->alloc_addr, mw->dma_addr);
mw->xlat_size = 0;
mw->buff_size = 0;
@@ -847,7 +868,7 @@ static int ntb_set_mw(struct ntb_transport_ctx *nt, int num_mw,
resource_size_t size)
{
struct ntb_transport_mw *mw = &nt->mw_vec[num_mw];
- struct pci_dev *pdev = nt->ndev->pdev;
+ struct device *dev = get_dma_dev(nt->ndev);
size_t xlat_size, buff_size;
resource_size_t xlat_align;
resource_size_t xlat_align_size;
@@ -877,12 +898,12 @@ static int ntb_set_mw(struct ntb_transport_ctx *nt, int num_mw,
mw->buff_size = buff_size;
mw->alloc_size = buff_size;
- rc = ntb_alloc_mw_buffer(mw, &pdev->dev, xlat_align);
+ rc = ntb_alloc_mw_buffer(mw, dev, xlat_align);
if (rc) {
mw->alloc_size *= 2;
- rc = ntb_alloc_mw_buffer(mw, &pdev->dev, xlat_align);
+ rc = ntb_alloc_mw_buffer(mw, dev, xlat_align);
if (rc) {
- dev_err(&pdev->dev,
+ dev_err(dev,
"Unable to alloc aligned MW buff\n");
mw->xlat_size = 0;
mw->buff_size = 0;
@@ -895,7 +916,7 @@ static int ntb_set_mw(struct ntb_transport_ctx *nt, int num_mw,
rc = ntb_mw_set_trans(nt->ndev, PIDX, num_mw, mw->dma_addr,
mw->xlat_size, offset);
if (rc) {
- dev_err(&pdev->dev, "Unable to set mw%d translation", num_mw);
+ dev_err(dev, "Unable to set mw%d translation", num_mw);
ntb_free_mw(nt, num_mw);
return -EIO;
}
--
2.51.0
^ permalink raw reply related [flat|nested] 61+ messages in thread

* Re: [RFC PATCH v3 13/35] NTB: ntb_transport: Introduce get_dma_dev() helper
2025-12-17 15:15 ` [RFC PATCH v3 13/35] NTB: ntb_transport: Introduce get_dma_dev() helper Koichiro Den
@ 2025-12-19 14:31 ` Frank Li
2025-12-20 15:29 ` Koichiro Den
0 siblings, 1 reply; 61+ messages in thread
From: Frank Li @ 2025-12-19 14:31 UTC (permalink / raw)
To: Koichiro Den
Cc: dave.jiang, ntb, linux-pci, dmaengine, linux-renesas-soc, netdev,
linux-kernel, mani, kwilczynski, kishon, bhelgaas, corbet,
geert+renesas, magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro,
will, robin.murphy, jdmason, allenbh, andrew+netdev, davem,
edumazet, kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k,
kurt.schwemmer, logang, jingoohan1, lpieralisi, utkarsh02t,
jbrunet, dlemoal, arnd, elfring
On Thu, Dec 18, 2025 at 12:15:47AM +0900, Koichiro Den wrote:
> When ntb_transport is used on top of an endpoint function (EPF) NTB
> implementation, DMA mappings should be associated with the underlying
> PCIe controller device rather than the virtual NTB PCI function. This
> matters for IOMMU configuration and DMA mask validation.
>
> Add a small helper, get_dma_dev(), that returns the appropriate struct
> device for DMA mapping, i.e. &pdev->dev for a regular NTB host bridge
> and the EPC parent device for EPF-based NTB endpoints. Use it in the
> places where we set up DMA mappings or log DMA-related errors.
>
> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> ---
> drivers/ntb/ntb_transport.c | 35 ++++++++++++++++++++++++++++-------
> 1 file changed, 28 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
> index bac842177b55..78d0469edbcc 100644
> --- a/drivers/ntb/ntb_transport.c
> +++ b/drivers/ntb/ntb_transport.c
> @@ -63,6 +63,7 @@
> #include <linux/mutex.h>
> #include "linux/ntb.h"
> #include "linux/ntb_transport.h"
> +#include <linux/pci-epc.h>
>
> #define NTB_TRANSPORT_VERSION 4
> #define NTB_TRANSPORT_VER "4"
> @@ -259,6 +260,26 @@ struct ntb_payload_header {
> unsigned int flags;
> };
>
> +/*
> + * Return the device that should be used for DMA mapping.
> + *
> + * On RC, this is simply &pdev->dev.
> + * On EPF-backed NTB endpoints, use the EPC parent device so that
> + * DMA capabilities and IOMMU configuration are taken from the
> + * controller rather than the virtual NTB PCI function.
> + */
> +static struct device *get_dma_dev(struct ntb_dev *ndev)
> +{
> + struct device *dev = &ndev->pdev->dev;
> + struct pci_epc *epc;
> +
> + epc = (struct pci_epc *)ntb_get_private_data(ndev);
> + if (epc)
> + dev = epc->dev.parent;
> +
> + return dev;
> +}
> +
I think we should add a .get_dma_dev() callback directly, so the vntb
EPF driver can provide the implementation. This file is common to all
NTB transports and should not include an NTB lower driver's specific
implementation.
Frank
> enum {
> VERSION = 0,
> QP_LINKS,
> @@ -771,13 +792,13 @@ static void ntb_transport_msi_desc_changed(void *data)
> static void ntb_free_mw(struct ntb_transport_ctx *nt, int num_mw)
> {
> struct ntb_transport_mw *mw = &nt->mw_vec[num_mw];
> - struct pci_dev *pdev = nt->ndev->pdev;
> + struct device *dev = get_dma_dev(nt->ndev);
>
> if (!mw->virt_addr)
> return;
>
> ntb_mw_clear_trans(nt->ndev, PIDX, num_mw);
> - dma_free_coherent(&pdev->dev, mw->alloc_size,
> + dma_free_coherent(dev, mw->alloc_size,
> mw->alloc_addr, mw->dma_addr);
> mw->xlat_size = 0;
> mw->buff_size = 0;
> @@ -847,7 +868,7 @@ static int ntb_set_mw(struct ntb_transport_ctx *nt, int num_mw,
> resource_size_t size)
> {
> struct ntb_transport_mw *mw = &nt->mw_vec[num_mw];
> - struct pci_dev *pdev = nt->ndev->pdev;
> + struct device *dev = get_dma_dev(nt->ndev);
> size_t xlat_size, buff_size;
> resource_size_t xlat_align;
> resource_size_t xlat_align_size;
> @@ -877,12 +898,12 @@ static int ntb_set_mw(struct ntb_transport_ctx *nt, int num_mw,
> mw->buff_size = buff_size;
> mw->alloc_size = buff_size;
>
> - rc = ntb_alloc_mw_buffer(mw, &pdev->dev, xlat_align);
> + rc = ntb_alloc_mw_buffer(mw, dev, xlat_align);
> if (rc) {
> mw->alloc_size *= 2;
> - rc = ntb_alloc_mw_buffer(mw, &pdev->dev, xlat_align);
> + rc = ntb_alloc_mw_buffer(mw, dev, xlat_align);
> if (rc) {
> - dev_err(&pdev->dev,
> + dev_err(dev,
> "Unable to alloc aligned MW buff\n");
> mw->xlat_size = 0;
> mw->buff_size = 0;
> @@ -895,7 +916,7 @@ static int ntb_set_mw(struct ntb_transport_ctx *nt, int num_mw,
> rc = ntb_mw_set_trans(nt->ndev, PIDX, num_mw, mw->dma_addr,
> mw->xlat_size, offset);
> if (rc) {
> - dev_err(&pdev->dev, "Unable to set mw%d translation", num_mw);
> + dev_err(dev, "Unable to set mw%d translation", num_mw);
> ntb_free_mw(nt, num_mw);
> return -EIO;
> }
> --
> 2.51.0
>
^ permalink raw reply [flat|nested] 61+ messages in thread

* Re: [RFC PATCH v3 13/35] NTB: ntb_transport: Introduce get_dma_dev() helper
2025-12-19 14:31 ` Frank Li
@ 2025-12-20 15:29 ` Koichiro Den
0 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-20 15:29 UTC (permalink / raw)
To: Frank Li
Cc: dave.jiang, ntb, linux-pci, dmaengine, linux-renesas-soc, netdev,
linux-kernel, mani, kwilczynski, kishon, bhelgaas, corbet,
geert+renesas, magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro,
will, robin.murphy, jdmason, allenbh, andrew+netdev, davem,
edumazet, kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k,
kurt.schwemmer, logang, jingoohan1, lpieralisi, utkarsh02t,
jbrunet, dlemoal, arnd, elfring
On Fri, Dec 19, 2025 at 09:31:11AM -0500, Frank Li wrote:
> On Thu, Dec 18, 2025 at 12:15:47AM +0900, Koichiro Den wrote:
> > When ntb_transport is used on top of an endpoint function (EPF) NTB
> > implementation, DMA mappings should be associated with the underlying
> > PCIe controller device rather than the virtual NTB PCI function. This
> > matters for IOMMU configuration and DMA mask validation.
> >
> > Add a small helper, get_dma_dev(), that returns the appropriate struct
> > device for DMA mapping, i.e. &pdev->dev for a regular NTB host bridge
> > and the EPC parent device for EPF-based NTB endpoints. Use it in the
> > places where we set up DMA mappings or log DMA-related errors.
> >
> > Signed-off-by: Koichiro Den <den@valinux.co.jp>
> > ---
> > drivers/ntb/ntb_transport.c | 35 ++++++++++++++++++++++++++++-------
> > 1 file changed, 28 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
> > index bac842177b55..78d0469edbcc 100644
> > --- a/drivers/ntb/ntb_transport.c
> > +++ b/drivers/ntb/ntb_transport.c
> > @@ -63,6 +63,7 @@
> > #include <linux/mutex.h>
> > #include "linux/ntb.h"
> > #include "linux/ntb_transport.h"
> > +#include <linux/pci-epc.h>
> >
> > #define NTB_TRANSPORT_VERSION 4
> > #define NTB_TRANSPORT_VER "4"
> > @@ -259,6 +260,26 @@ struct ntb_payload_header {
> > unsigned int flags;
> > };
> >
> > +/*
> > + * Return the device that should be used for DMA mapping.
> > + *
> > + * On RC, this is simply &pdev->dev.
> > + * On EPF-backed NTB endpoints, use the EPC parent device so that
> > + * DMA capabilities and IOMMU configuration are taken from the
> > + * controller rather than the virtual NTB PCI function.
> > + */
> > +static struct device *get_dma_dev(struct ntb_dev *ndev)
> > +{
> > + struct device *dev = &ndev->pdev->dev;
> > + struct pci_epc *epc;
> > +
> > + epc = (struct pci_epc *)ntb_get_private_data(ndev);
> > + if (epc)
> > + dev = epc->dev.parent;
> > +
> > + return dev;
> > +}
> > +
>
> I think we should add a .get_dma_dev() callback directly, so the vntb
> EPF driver can provide the implementation. This file is common to all
> NTB transports and should not include an NTB lower driver's specific
> implementation.
That makes sense, thanks for pointing that out.
Koichiro
>
> Frank
>
> > enum {
> > VERSION = 0,
> > QP_LINKS,
> > @@ -771,13 +792,13 @@ static void ntb_transport_msi_desc_changed(void *data)
> > static void ntb_free_mw(struct ntb_transport_ctx *nt, int num_mw)
> > {
> > struct ntb_transport_mw *mw = &nt->mw_vec[num_mw];
> > - struct pci_dev *pdev = nt->ndev->pdev;
> > + struct device *dev = get_dma_dev(nt->ndev);
> >
> > if (!mw->virt_addr)
> > return;
> >
> > ntb_mw_clear_trans(nt->ndev, PIDX, num_mw);
> > - dma_free_coherent(&pdev->dev, mw->alloc_size,
> > + dma_free_coherent(dev, mw->alloc_size,
> > mw->alloc_addr, mw->dma_addr);
> > mw->xlat_size = 0;
> > mw->buff_size = 0;
> > @@ -847,7 +868,7 @@ static int ntb_set_mw(struct ntb_transport_ctx *nt, int num_mw,
> > resource_size_t size)
> > {
> > struct ntb_transport_mw *mw = &nt->mw_vec[num_mw];
> > - struct pci_dev *pdev = nt->ndev->pdev;
> > + struct device *dev = get_dma_dev(nt->ndev);
> > size_t xlat_size, buff_size;
> > resource_size_t xlat_align;
> > resource_size_t xlat_align_size;
> > @@ -877,12 +898,12 @@ static int ntb_set_mw(struct ntb_transport_ctx *nt, int num_mw,
> > mw->buff_size = buff_size;
> > mw->alloc_size = buff_size;
> >
> > - rc = ntb_alloc_mw_buffer(mw, &pdev->dev, xlat_align);
> > + rc = ntb_alloc_mw_buffer(mw, dev, xlat_align);
> > if (rc) {
> > mw->alloc_size *= 2;
> > - rc = ntb_alloc_mw_buffer(mw, &pdev->dev, xlat_align);
> > + rc = ntb_alloc_mw_buffer(mw, dev, xlat_align);
> > if (rc) {
> > - dev_err(&pdev->dev,
> > + dev_err(dev,
> > "Unable to alloc aligned MW buff\n");
> > mw->xlat_size = 0;
> > mw->buff_size = 0;
> > @@ -895,7 +916,7 @@ static int ntb_set_mw(struct ntb_transport_ctx *nt, int num_mw,
> > rc = ntb_mw_set_trans(nt->ndev, PIDX, num_mw, mw->dma_addr,
> > mw->xlat_size, offset);
> > if (rc) {
> > - dev_err(&pdev->dev, "Unable to set mw%d translation", num_mw);
> > + dev_err(dev, "Unable to set mw%d translation", num_mw);
> > ntb_free_mw(nt, num_mw);
> > return -EIO;
> > }
> > --
> > 2.51.0
> >
^ permalink raw reply [flat|nested] 61+ messages in thread
* [RFC PATCH v3 14/35] NTB: epf: Reserve a subset of MSI vectors for non-NTB users
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
` (12 preceding siblings ...)
2025-12-17 15:15 ` [RFC PATCH v3 13/35] NTB: ntb_transport: Introduce get_dma_dev() helper Koichiro Den
@ 2025-12-17 15:15 ` Koichiro Den
2025-12-17 15:15 ` [RFC PATCH v3 15/35] NTB: ntb_transport: Move internal types to ntb_transport_internal.h Koichiro Den
` (21 subsequent siblings)
35 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:15 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
The ntb_hw_epf driver currently uses all MSI/MSI-X vectors allocated for
the endpoint as doorbell interrupts. On SoCs that also run other
functions on the same PCIe controller (e.g. DesignWare eDMA), we need to
reserve some vectors for those other consumers.
Introduce NTB_EPF_IRQ_RESERVE and track the total number of allocated
vectors in ntb_epf_dev's 'num_irqs' field. Use only (num_irqs -
NTB_EPF_IRQ_RESERVE) vectors for NTB doorbells and free all num_irqs
vectors in the teardown path, so that the remaining vectors can be used
by other endpoint functions such as the integrated DesignWare eDMA.
This makes it possible to share the PCIe controller MSI space between
NTB and other on-chip IP blocks.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/ntb/hw/epf/ntb_hw_epf.c | 34 +++++++++++++++++++++------------
1 file changed, 22 insertions(+), 12 deletions(-)
diff --git a/drivers/ntb/hw/epf/ntb_hw_epf.c b/drivers/ntb/hw/epf/ntb_hw_epf.c
index 89a536562abf..4ecc6b2177b4 100644
--- a/drivers/ntb/hw/epf/ntb_hw_epf.c
+++ b/drivers/ntb/hw/epf/ntb_hw_epf.c
@@ -49,6 +49,7 @@
#define NTB_EPF_MIN_DB_COUNT 3
#define NTB_EPF_MAX_DB_COUNT 31
+#define NTB_EPF_IRQ_RESERVE 8
#define NTB_EPF_COMMAND_TIMEOUT 1000 /* 1 Sec */
@@ -87,6 +88,8 @@ struct ntb_epf_dev {
unsigned int spad_count;
unsigned int db_count;
+ unsigned int num_irqs;
+
void __iomem *ctrl_reg;
void __iomem *db_reg;
void __iomem *peer_spad_reg;
@@ -341,7 +344,7 @@ static int ntb_epf_init_isr(struct ntb_epf_dev *ndev, int msi_min, int msi_max)
u32 argument = MSIX_ENABLE;
int irq;
int ret;
- int i;
+ int i = 0;
irq = pci_alloc_irq_vectors(pdev, msi_min, msi_max, PCI_IRQ_MSIX);
if (irq < 0) {
@@ -355,33 +358,39 @@ static int ntb_epf_init_isr(struct ntb_epf_dev *ndev, int msi_min, int msi_max)
argument &= ~MSIX_ENABLE;
}
+ ndev->num_irqs = irq;
+ irq -= NTB_EPF_IRQ_RESERVE;
+ if (irq <= 0) {
+ dev_err(dev, "Not enough irqs allocated\n");
+ ret = -ENOSPC;
+ goto err_out;
+ }
+
for (i = 0; i < irq; i++) {
ret = request_irq(pci_irq_vector(pdev, i), ntb_epf_vec_isr,
0, "ntb_epf", ndev);
if (ret) {
dev_err(dev, "Failed to request irq\n");
- goto err_request_irq;
+ goto err_out;
}
}
- ndev->db_count = irq - 1;
+ ndev->db_count = irq;
ret = ntb_epf_send_command(ndev, CMD_CONFIGURE_DOORBELL,
argument | irq);
if (ret) {
dev_err(dev, "Failed to configure doorbell\n");
- goto err_configure_db;
+ goto err_out;
}
return 0;
-err_configure_db:
- for (i = 0; i < ndev->db_count + 1; i++)
+err_out:
+ while (i-- > 0)
free_irq(pci_irq_vector(pdev, i), ndev);
-err_request_irq:
pci_free_irq_vectors(pdev);
-
return ret;
}
@@ -489,7 +498,7 @@ static int ntb_epf_peer_db_set(struct ntb_dev *ntb, u64 db_bits)
u32 db_offset;
u32 db_data;
- if (interrupt_num > ndev->db_count) {
+ if (interrupt_num >= ndev->db_count) {
dev_err(dev, "DB interrupt %d greater than Max Supported %d\n",
interrupt_num, ndev->db_count);
return -EINVAL;
@@ -499,6 +508,7 @@ static int ntb_epf_peer_db_set(struct ntb_dev *ntb, u64 db_bits)
db_data = readl(ndev->ctrl_reg + NTB_EPF_DB_DATA(interrupt_num));
db_offset = readl(ndev->ctrl_reg + NTB_EPF_DB_OFFSET(interrupt_num));
+
writel(db_data, ndev->db_reg + (db_entry_size * interrupt_num) +
db_offset);
@@ -581,8 +591,8 @@ static int ntb_epf_init_dev(struct ntb_epf_dev *ndev)
int ret;
/* One Link interrupt and rest doorbell interrupt */
- ret = ntb_epf_init_isr(ndev, NTB_EPF_MIN_DB_COUNT + 1,
- NTB_EPF_MAX_DB_COUNT + 1);
+ ret = ntb_epf_init_isr(ndev, NTB_EPF_MIN_DB_COUNT + NTB_EPF_IRQ_RESERVE,
+ NTB_EPF_MAX_DB_COUNT + NTB_EPF_IRQ_RESERVE);
if (ret) {
dev_err(dev, "Failed to init ISR\n");
return ret;
@@ -689,7 +699,7 @@ static void ntb_epf_cleanup_isr(struct ntb_epf_dev *ndev)
ntb_epf_send_command(ndev, CMD_TEARDOWN_DOORBELL, ndev->db_count + 1);
- for (i = 0; i < ndev->db_count + 1; i++)
+ for (i = 0; i < ndev->num_irqs; i++)
free_irq(pci_irq_vector(pdev, i), ndev);
pci_free_irq_vectors(pdev);
}
--
2.51.0
^ permalink raw reply related [flat|nested] 61+ messages in thread

* [RFC PATCH v3 15/35] NTB: ntb_transport: Move internal types to ntb_transport_internal.h
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
` (13 preceding siblings ...)
2025-12-17 15:15 ` [RFC PATCH v3 14/35] NTB: epf: Reserve a subset of MSI vectors for non-NTB users Koichiro Den
@ 2025-12-17 15:15 ` Koichiro Den
2025-12-17 15:15 ` [RFC PATCH v3 16/35] NTB: ntb_transport: Introduce ntb_transport_backend_ops Koichiro Den
` (20 subsequent siblings)
35 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:15 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
No functional changes intended.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/ntb/ntb_transport.c | 168 ++-------------------------
drivers/ntb/ntb_transport_internal.h | 164 ++++++++++++++++++++++++++
include/linux/ntb_transport.h | 5 +
3 files changed, 181 insertions(+), 156 deletions(-)
create mode 100644 drivers/ntb/ntb_transport_internal.h
diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index 78d0469edbcc..3969fa29a5b9 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -65,6 +65,8 @@
#include "linux/ntb_transport.h"
#include <linux/pci-epc.h>
+#include "ntb_transport_internal.h"
+
#define NTB_TRANSPORT_VERSION 4
#define NTB_TRANSPORT_VER "4"
#define NTB_TRANSPORT_NAME "ntb_transport"
@@ -76,11 +78,11 @@ MODULE_VERSION(NTB_TRANSPORT_VER);
MODULE_LICENSE("Dual BSD/GPL");
MODULE_AUTHOR("Intel Corporation");
-static unsigned long max_mw_size;
+unsigned long max_mw_size;
module_param(max_mw_size, ulong, 0644);
MODULE_PARM_DESC(max_mw_size, "Limit size of large memory windows");
-static unsigned int transport_mtu = 0x10000;
+unsigned int transport_mtu = 0x10000;
module_param(transport_mtu, uint, 0644);
MODULE_PARM_DESC(transport_mtu, "Maximum size of NTB transport packets");
@@ -96,7 +98,7 @@ static bool use_dma;
module_param(use_dma, bool, 0644);
MODULE_PARM_DESC(use_dma, "Use DMA engine to perform large data copy");
-static bool use_msi;
+bool use_msi;
#ifdef CONFIG_NTB_MSI
module_param(use_msi, bool, 0644);
MODULE_PARM_DESC(use_msi, "Use MSI interrupts instead of doorbells");
@@ -107,153 +109,12 @@ static struct dentry *nt_debugfs_dir;
/* Only two-ports NTB devices are supported */
#define PIDX NTB_DEF_PEER_IDX
-struct ntb_queue_entry {
- /* ntb_queue list reference */
- struct list_head entry;
- /* pointers to data to be transferred */
- void *cb_data;
- void *buf;
- unsigned int len;
- unsigned int flags;
- int retries;
- int errors;
- unsigned int tx_index;
- unsigned int rx_index;
-
- struct ntb_transport_qp *qp;
- union {
- struct ntb_payload_header __iomem *tx_hdr;
- struct ntb_payload_header *rx_hdr;
- };
-};
-
-struct ntb_rx_info {
- unsigned int entry;
-};
-
-struct ntb_transport_qp {
- struct ntb_transport_ctx *transport;
- struct ntb_dev *ndev;
- void *cb_data;
- struct dma_chan *tx_dma_chan;
- struct dma_chan *rx_dma_chan;
-
- bool client_ready;
- bool link_is_up;
- bool active;
-
- u8 qp_num; /* Only 64 QP's are allowed. 0-63 */
- u64 qp_bit;
-
- struct ntb_rx_info __iomem *rx_info;
- struct ntb_rx_info *remote_rx_info;
-
- void (*tx_handler)(struct ntb_transport_qp *qp, void *qp_data,
- void *data, int len);
- struct list_head tx_free_q;
- spinlock_t ntb_tx_free_q_lock;
- void __iomem *tx_mw;
- phys_addr_t tx_mw_phys;
- size_t tx_mw_size;
- dma_addr_t tx_mw_dma_addr;
- unsigned int tx_index;
- unsigned int tx_max_entry;
- unsigned int tx_max_frame;
-
- void (*rx_handler)(struct ntb_transport_qp *qp, void *qp_data,
- void *data, int len);
- struct list_head rx_post_q;
- struct list_head rx_pend_q;
- struct list_head rx_free_q;
- /* ntb_rx_q_lock: synchronize access to rx_XXXX_q */
- spinlock_t ntb_rx_q_lock;
- void *rx_buff;
- unsigned int rx_index;
- unsigned int rx_max_entry;
- unsigned int rx_max_frame;
- unsigned int rx_alloc_entry;
- dma_cookie_t last_cookie;
- struct tasklet_struct rxc_db_work;
-
- void (*event_handler)(void *data, int status);
- struct delayed_work link_work;
- struct work_struct link_cleanup;
-
- struct dentry *debugfs_dir;
- struct dentry *debugfs_stats;
-
- /* Stats */
- u64 rx_bytes;
- u64 rx_pkts;
- u64 rx_ring_empty;
- u64 rx_err_no_buf;
- u64 rx_err_oflow;
- u64 rx_err_ver;
- u64 rx_memcpy;
- u64 rx_async;
- u64 tx_bytes;
- u64 tx_pkts;
- u64 tx_ring_full;
- u64 tx_err_no_buf;
- u64 tx_memcpy;
- u64 tx_async;
-
- bool use_msi;
- int msi_irq;
- struct ntb_msi_desc msi_desc;
- struct ntb_msi_desc peer_msi_desc;
-};
-
-struct ntb_transport_mw {
- phys_addr_t phys_addr;
- resource_size_t phys_size;
- void __iomem *vbase;
- size_t xlat_size;
- size_t buff_size;
- size_t alloc_size;
- void *alloc_addr;
- void *virt_addr;
- dma_addr_t dma_addr;
-};
-
struct ntb_transport_client_dev {
struct list_head entry;
struct ntb_transport_ctx *nt;
struct device dev;
};
-struct ntb_transport_ctx {
- struct list_head entry;
- struct list_head client_devs;
-
- struct ntb_dev *ndev;
-
- struct ntb_transport_mw *mw_vec;
- struct ntb_transport_qp *qp_vec;
- unsigned int mw_count;
- unsigned int qp_count;
- u64 qp_bitmap;
- u64 qp_bitmap_free;
-
- bool use_msi;
- unsigned int msi_spad_offset;
- u64 msi_db_mask;
-
- bool link_is_up;
- struct delayed_work link_work;
- struct work_struct link_cleanup;
-
- struct dentry *debugfs_node_dir;
-
- /* Make sure workq of link event be executed serially */
- struct mutex link_event_lock;
-};
-
-enum {
- DESC_DONE_FLAG = BIT(0),
- LINK_DOWN_FLAG = BIT(1),
-};
-
struct ntb_payload_header {
unsigned int ver;
unsigned int len;
@@ -268,7 +129,7 @@ struct ntb_payload_header {
* DMA capabilities and IOMMU configuration are taken from the
* controller rather than the virtual NTB PCI function.
*/
-static struct device *get_dma_dev(struct ntb_dev *ndev)
+struct device *get_dma_dev(struct ntb_dev *ndev)
{
struct device *dev = &ndev->pdev->dev;
struct pci_epc *epc;
@@ -295,7 +156,6 @@ enum {
#define drv_client(__drv) \
container_of((__drv), struct ntb_transport_client, driver)
-#define QP_TO_MW(nt, qp) ((qp) % nt->mw_count)
#define NTB_QP_DEF_NUM_ENTRIES 100
#define NTB_LINK_DOWN_TIMEOUT 10
@@ -532,8 +392,7 @@ static int ntb_qp_debugfs_stats_show(struct seq_file *s, void *v)
}
DEFINE_SHOW_ATTRIBUTE(ntb_qp_debugfs_stats);
-static void ntb_list_add(spinlock_t *lock, struct list_head *entry,
- struct list_head *list)
+void ntb_list_add(spinlock_t *lock, struct list_head *entry, struct list_head *list)
{
unsigned long flags;
@@ -542,8 +401,7 @@ static void ntb_list_add(spinlock_t *lock, struct list_head *entry,
spin_unlock_irqrestore(lock, flags);
}
-static struct ntb_queue_entry *ntb_list_rm(spinlock_t *lock,
- struct list_head *list)
+struct ntb_queue_entry *ntb_list_rm(spinlock_t *lock, struct list_head *list)
{
struct ntb_queue_entry *entry;
unsigned long flags;
@@ -562,9 +420,8 @@ static struct ntb_queue_entry *ntb_list_rm(spinlock_t *lock,
return entry;
}
-static struct ntb_queue_entry *ntb_list_mv(spinlock_t *lock,
- struct list_head *list,
- struct list_head *to_list)
+struct ntb_queue_entry *ntb_list_mv(spinlock_t *lock, struct list_head *list,
+ struct list_head *to_list)
{
struct ntb_queue_entry *entry;
unsigned long flags;
@@ -982,7 +839,7 @@ static void ntb_qp_link_cleanup_work(struct work_struct *work)
msecs_to_jiffies(NTB_LINK_DOWN_TIMEOUT));
}
-static void ntb_qp_link_down(struct ntb_transport_qp *qp)
+void ntb_qp_link_down(struct ntb_transport_qp *qp)
{
schedule_work(&qp->link_cleanup);
}
@@ -1194,8 +1051,7 @@ static void ntb_qp_link_work(struct work_struct *work)
msecs_to_jiffies(NTB_LINK_DOWN_TIMEOUT));
}
-static int ntb_transport_init_queue(struct ntb_transport_ctx *nt,
- unsigned int qp_num)
+int ntb_transport_init_queue(struct ntb_transport_ctx *nt, unsigned int qp_num)
{
struct ntb_transport_qp *qp;
diff --git a/drivers/ntb/ntb_transport_internal.h b/drivers/ntb/ntb_transport_internal.h
new file mode 100644
index 000000000000..79c7dbcf6f91
--- /dev/null
+++ b/drivers/ntb/ntb_transport_internal.h
@@ -0,0 +1,164 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _NTB_TRANSPORT_INTERNAL_H_
+#define _NTB_TRANSPORT_INTERNAL_H_
+
+#include <linux/ntb_transport.h>
+
+extern unsigned long max_mw_size;
+extern unsigned int transport_mtu;
+extern bool use_msi;
+
+#define QP_TO_MW(nt, qp) ((qp) % nt->mw_count)
+
+struct ntb_queue_entry {
+ /* ntb_queue list reference */
+ struct list_head entry;
+ /* pointers to data to be transferred */
+ void *cb_data;
+ void *buf;
+ unsigned int len;
+ unsigned int flags;
+ int retries;
+ int errors;
+ unsigned int tx_index;
+ unsigned int rx_index;
+
+ struct ntb_transport_qp *qp;
+ union {
+ struct ntb_payload_header __iomem *tx_hdr;
+ struct ntb_payload_header *rx_hdr;
+ };
+};
+
+struct ntb_rx_info {
+ unsigned int entry;
+};
+
+struct ntb_transport_qp {
+ struct ntb_transport_ctx *transport;
+ struct ntb_dev *ndev;
+ void *cb_data;
+ struct dma_chan *tx_dma_chan;
+ struct dma_chan *rx_dma_chan;
+
+ bool client_ready;
+ bool link_is_up;
+ bool active;
+
+ u8 qp_num; /* Only 64 QP's are allowed. 0-63 */
+ u64 qp_bit;
+
+ struct ntb_rx_info __iomem *rx_info;
+ struct ntb_rx_info *remote_rx_info;
+
+ void (*tx_handler)(struct ntb_transport_qp *qp, void *qp_data,
+ void *data, int len);
+ struct list_head tx_free_q;
+ spinlock_t ntb_tx_free_q_lock;
+ void __iomem *tx_mw;
+ phys_addr_t tx_mw_phys;
+ size_t tx_mw_size;
+ dma_addr_t tx_mw_dma_addr;
+ unsigned int tx_index;
+ unsigned int tx_max_entry;
+ unsigned int tx_max_frame;
+
+ void (*rx_handler)(struct ntb_transport_qp *qp, void *qp_data,
+ void *data, int len);
+ struct list_head rx_post_q;
+ struct list_head rx_pend_q;
+ struct list_head rx_free_q;
+ /* ntb_rx_q_lock: synchronize access to rx_XXXX_q */
+ spinlock_t ntb_rx_q_lock;
+ void *rx_buff;
+ unsigned int rx_index;
+ unsigned int rx_max_entry;
+ unsigned int rx_max_frame;
+ unsigned int rx_alloc_entry;
+ dma_cookie_t last_cookie;
+ struct tasklet_struct rxc_db_work;
+
+ void (*event_handler)(void *data, int status);
+ struct delayed_work link_work;
+ struct work_struct link_cleanup;
+
+ struct dentry *debugfs_dir;
+ struct dentry *debugfs_stats;
+
+ /* Stats */
+ u64 rx_bytes;
+ u64 rx_pkts;
+ u64 rx_ring_empty;
+ u64 rx_err_no_buf;
+ u64 rx_err_oflow;
+ u64 rx_err_ver;
+ u64 rx_memcpy;
+ u64 rx_async;
+ u64 tx_bytes;
+ u64 tx_pkts;
+ u64 tx_ring_full;
+ u64 tx_err_no_buf;
+ u64 tx_memcpy;
+ u64 tx_async;
+
+ bool use_msi;
+ int msi_irq;
+ struct ntb_msi_desc msi_desc;
+ struct ntb_msi_desc peer_msi_desc;
+};
+
+struct ntb_transport_mw {
+ phys_addr_t phys_addr;
+ resource_size_t phys_size;
+ void __iomem *vbase;
+ size_t xlat_size;
+ size_t buff_size;
+ size_t alloc_size;
+ void *alloc_addr;
+ void *virt_addr;
+ dma_addr_t dma_addr;
+};
+
+struct ntb_transport_ctx {
+ struct list_head entry;
+ struct list_head client_devs;
+
+ struct ntb_dev *ndev;
+
+ struct ntb_transport_mw *mw_vec;
+ struct ntb_transport_qp *qp_vec;
+ unsigned int mw_count;
+ unsigned int qp_count;
+ u64 qp_bitmap;
+ u64 qp_bitmap_free;
+
+ bool use_msi;
+ unsigned int msi_spad_offset;
+ u64 msi_db_mask;
+
+ bool link_is_up;
+ struct delayed_work link_work;
+ struct work_struct link_cleanup;
+
+ struct dentry *debugfs_node_dir;
+
+ /* Make sure workq of link event be executed serially */
+ struct mutex link_event_lock;
+};
+
+enum {
+ DESC_DONE_FLAG = BIT(0),
+ LINK_DOWN_FLAG = BIT(1),
+};
+
+void ntb_list_add(spinlock_t *lock, struct list_head *entry, struct list_head *list);
+struct ntb_queue_entry *ntb_list_rm(spinlock_t *lock, struct list_head *list);
+struct ntb_queue_entry *ntb_list_mv(spinlock_t *lock, struct list_head *list,
+ struct list_head *to_list);
+void ntb_qp_link_down(struct ntb_transport_qp *qp);
+int ntb_transport_init_queue(struct ntb_transport_ctx *nt,
+ unsigned int qp_num);
+struct device *get_dma_dev(struct ntb_dev *ndev);
+
+#endif /* _NTB_TRANSPORT_INTERNAL_H_ */
diff --git a/include/linux/ntb_transport.h b/include/linux/ntb_transport.h
index 7243eb98a722..b128ced77b39 100644
--- a/include/linux/ntb_transport.h
+++ b/include/linux/ntb_transport.h
@@ -48,6 +48,9 @@
* Jon Mason <jon.mason@intel.com>
*/
+#ifndef __LINUX_NTB_TRANSPORT_H
+#define __LINUX_NTB_TRANSPORT_H
+
struct ntb_transport_qp;
struct ntb_transport_client {
@@ -84,3 +87,5 @@ void ntb_transport_link_up(struct ntb_transport_qp *qp);
void ntb_transport_link_down(struct ntb_transport_qp *qp);
bool ntb_transport_link_query(struct ntb_transport_qp *qp);
unsigned int ntb_transport_tx_free_entry(struct ntb_transport_qp *qp);
+
+#endif /* __LINUX_NTB_TRANSPORT_H */
--
2.51.0
^ permalink raw reply related [flat|nested] 61+ messages in thread

* [RFC PATCH v3 16/35] NTB: ntb_transport: Introduce ntb_transport_backend_ops
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
` (14 preceding siblings ...)
2025-12-17 15:15 ` [RFC PATCH v3 15/35] NTB: ntb_transport: Move internal types to ntb_transport_internal.h Koichiro Den
@ 2025-12-17 15:15 ` Koichiro Den
2025-12-17 15:15 ` [RFC PATCH v3 17/35] dmaengine: dw-edma: Add helper func to retrieve register base and size Koichiro Den
` (19 subsequent siblings)
35 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:15 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
Introduce struct ntb_transport_backend_ops to abstract queue setup and
enqueue/poll operations. The existing implementation is moved behind
this interface, and a later patch will introduce an alternative backend
implementation.
No functional changes.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/ntb/ntb_transport.c | 133 ++++++++++++++++++---------
drivers/ntb/ntb_transport_internal.h | 21 +++++
2 files changed, 112 insertions(+), 42 deletions(-)
diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index 3969fa29a5b9..bff8b41a0d3e 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -348,15 +348,9 @@ void ntb_transport_unregister_client(struct ntb_transport_client *drv)
}
EXPORT_SYMBOL_GPL(ntb_transport_unregister_client);
-static int ntb_qp_debugfs_stats_show(struct seq_file *s, void *v)
+static void ntb_transport_default_debugfs_stats_show(struct seq_file *s,
+ struct ntb_transport_qp *qp)
{
- struct ntb_transport_qp *qp = s->private;
-
- if (!qp || !qp->link_is_up)
- return 0;
-
- seq_puts(s, "\nNTB QP stats:\n\n");
-
seq_printf(s, "rx_bytes - \t%llu\n", qp->rx_bytes);
seq_printf(s, "rx_pkts - \t%llu\n", qp->rx_pkts);
seq_printf(s, "rx_memcpy - \t%llu\n", qp->rx_memcpy);
@@ -386,6 +380,17 @@ static int ntb_qp_debugfs_stats_show(struct seq_file *s, void *v)
seq_printf(s, "Using TX DMA - \t%s\n", qp->tx_dma_chan ? "Yes" : "No");
seq_printf(s, "Using RX DMA - \t%s\n", qp->rx_dma_chan ? "Yes" : "No");
seq_printf(s, "QP Link - \t%s\n", qp->link_is_up ? "Up" : "Down");
+}
+
+static int ntb_qp_debugfs_stats_show(struct seq_file *s, void *v)
+{
+ struct ntb_transport_qp *qp = s->private;
+
+ if (!qp || !qp->link_is_up)
+ return 0;
+
+ seq_puts(s, "\nNTB QP stats:\n\n");
+ qp->transport->backend_ops.debugfs_stats_show(s, qp);
seq_putc(s, '\n');
return 0;
@@ -440,8 +445,8 @@ struct ntb_queue_entry *ntb_list_mv(spinlock_t *lock, struct list_head *list,
return entry;
}
-static int ntb_transport_setup_qp_mw(struct ntb_transport_ctx *nt,
- unsigned int qp_num)
+static int ntb_transport_default_setup_qp_mw(struct ntb_transport_ctx *nt,
+ unsigned int qp_num)
{
struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
struct ntb_transport_mw *mw;
@@ -994,7 +999,7 @@ static void ntb_transport_link_work(struct work_struct *work)
for (i = 0; i < nt->qp_count; i++) {
struct ntb_transport_qp *qp = &nt->qp_vec[i];
- ntb_transport_setup_qp_mw(nt, i);
+ nt->backend_ops.setup_qp_mw(nt, i);
ntb_transport_setup_qp_peer_msi(nt, i);
if (qp->client_ready)
@@ -1095,6 +1100,46 @@ int ntb_transport_init_queue(struct ntb_transport_ctx *nt, unsigned int qp_num)
return 0;
}
+static unsigned int ntb_transport_default_tx_free_entry(struct ntb_transport_qp *qp)
+{
+ unsigned int head = qp->tx_index;
+ unsigned int tail = qp->remote_rx_info->entry;
+
+ return tail >= head ? tail - head : qp->tx_max_entry + tail - head;
+}
+
+static int ntb_transport_default_rx_enqueue(struct ntb_transport_qp *qp,
+ struct ntb_queue_entry *entry)
+{
+ ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_pend_q);
+
+ if (qp->active)
+ tasklet_schedule(&qp->rxc_db_work);
+
+ return 0;
+}
+
+static void ntb_transport_default_rx_poll(struct ntb_transport_qp *qp);
+static int ntb_transport_default_tx_enqueue(struct ntb_transport_qp *qp,
+ struct ntb_queue_entry *entry,
+ void *cb, void *data, unsigned int len,
+ unsigned int flags);
+
+static const struct ntb_transport_backend_ops default_backend_ops = {
+ .setup_qp_mw = ntb_transport_default_setup_qp_mw,
+ .tx_free_entry = ntb_transport_default_tx_free_entry,
+ .tx_enqueue = ntb_transport_default_tx_enqueue,
+ .rx_enqueue = ntb_transport_default_rx_enqueue,
+ .rx_poll = ntb_transport_default_rx_poll,
+ .debugfs_stats_show = ntb_transport_default_debugfs_stats_show,
+};
+
+static int ntb_transport_default_init(struct ntb_transport_ctx *nt)
+{
+ nt->backend_ops = default_backend_ops;
+ return 0;
+}
+
static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
{
struct ntb_transport_ctx *nt;
@@ -1129,6 +1174,10 @@ static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
nt->ndev = ndev;
+ rc = ntb_transport_default_init(nt);
+ if (rc)
+ return rc;
+
/*
* If we are using MSI, and have at least one extra memory window,
* we will reserve the last MW for the MSI window.
@@ -1538,14 +1587,10 @@ static int ntb_process_rxc(struct ntb_transport_qp *qp)
return 0;
}
-static void ntb_transport_rxc_db(unsigned long data)
+static void ntb_transport_default_rx_poll(struct ntb_transport_qp *qp)
{
- struct ntb_transport_qp *qp = (void *)data;
int rc, i;
- dev_dbg(&qp->ndev->pdev->dev, "%s: doorbell %d received\n",
- __func__, qp->qp_num);
-
/* Limit the number of packets processed in a single interrupt to
* provide fairness to others
*/
@@ -1577,6 +1622,17 @@ static void ntb_transport_rxc_db(unsigned long data)
}
}
+static void ntb_transport_rxc_db(unsigned long data)
+{
+ struct ntb_transport_qp *qp = (void *)data;
+ struct ntb_transport_ctx *nt = qp->transport;
+
+ dev_dbg(&qp->ndev->pdev->dev, "%s: doorbell %d received\n",
+ __func__, qp->qp_num);
+
+ nt->backend_ops.rx_poll(qp);
+}
+
static void ntb_tx_copy_callback(void *data,
const struct dmaengine_result *res)
{
@@ -1746,9 +1802,18 @@ static void ntb_async_tx(struct ntb_transport_qp *qp,
qp->tx_memcpy++;
}
-static int ntb_process_tx(struct ntb_transport_qp *qp,
- struct ntb_queue_entry *entry)
+static int ntb_transport_default_tx_enqueue(struct ntb_transport_qp *qp,
+ struct ntb_queue_entry *entry,
+ void *cb, void *data, unsigned int len,
+ unsigned int flags)
{
+ entry->cb_data = cb;
+ entry->buf = data;
+ entry->len = len;
+ entry->flags = flags;
+ entry->errors = 0;
+ entry->tx_index = 0;
+
if (!ntb_transport_tx_free_entry(qp)) {
qp->tx_ring_full++;
return -EAGAIN;
@@ -1775,6 +1840,7 @@ static int ntb_process_tx(struct ntb_transport_qp *qp,
static void ntb_send_link_down(struct ntb_transport_qp *qp)
{
+ struct ntb_transport_ctx *nt = qp->transport;
struct pci_dev *pdev = qp->ndev->pdev;
struct ntb_queue_entry *entry;
int i, rc;
@@ -1794,12 +1860,7 @@ static void ntb_send_link_down(struct ntb_transport_qp *qp)
if (!entry)
return;
- entry->cb_data = NULL;
- entry->buf = NULL;
- entry->len = 0;
- entry->flags = LINK_DOWN_FLAG;
-
- rc = ntb_process_tx(qp, entry);
+ rc = nt->backend_ops.tx_enqueue(qp, entry, NULL, NULL, 0, LINK_DOWN_FLAG);
if (rc)
dev_err(&pdev->dev, "ntb: QP%d unable to send linkdown msg\n",
qp->qp_num);
@@ -2086,6 +2147,7 @@ EXPORT_SYMBOL_GPL(ntb_transport_rx_remove);
int ntb_transport_rx_enqueue(struct ntb_transport_qp *qp, void *cb, void *data,
unsigned int len)
{
+ struct ntb_transport_ctx *nt = qp->transport;
struct ntb_queue_entry *entry;
if (!qp)
@@ -2103,12 +2165,7 @@ int ntb_transport_rx_enqueue(struct ntb_transport_qp *qp, void *cb, void *data,
entry->errors = 0;
entry->rx_index = 0;
- ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_pend_q);
-
- if (qp->active)
- tasklet_schedule(&qp->rxc_db_work);
-
- return 0;
+ return nt->backend_ops.rx_enqueue(qp, entry);
}
EXPORT_SYMBOL_GPL(ntb_transport_rx_enqueue);
@@ -2128,6 +2185,7 @@ EXPORT_SYMBOL_GPL(ntb_transport_rx_enqueue);
int ntb_transport_tx_enqueue(struct ntb_transport_qp *qp, void *cb, void *data,
unsigned int len)
{
+ struct ntb_transport_ctx *nt = qp->transport;
struct ntb_queue_entry *entry;
int rc;
@@ -2144,15 +2202,7 @@ int ntb_transport_tx_enqueue(struct ntb_transport_qp *qp, void *cb, void *data,
return -EBUSY;
}
- entry->cb_data = cb;
- entry->buf = data;
- entry->len = len;
- entry->flags = 0;
- entry->errors = 0;
- entry->retries = 0;
- entry->tx_index = 0;
-
- rc = ntb_process_tx(qp, entry);
+ rc = nt->backend_ops.tx_enqueue(qp, entry, cb, data, len, 0);
if (rc)
ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry,
&qp->tx_free_q);
@@ -2274,10 +2324,9 @@ EXPORT_SYMBOL_GPL(ntb_transport_max_size);
unsigned int ntb_transport_tx_free_entry(struct ntb_transport_qp *qp)
{
- unsigned int head = qp->tx_index;
- unsigned int tail = qp->remote_rx_info->entry;
+ struct ntb_transport_ctx *nt = qp->transport;
- return tail >= head ? tail - head : qp->tx_max_entry + tail - head;
+ return nt->backend_ops.tx_free_entry(qp);
}
EXPORT_SYMBOL_GPL(ntb_transport_tx_free_entry);
diff --git a/drivers/ntb/ntb_transport_internal.h b/drivers/ntb/ntb_transport_internal.h
index 79c7dbcf6f91..33c06be36dfd 100644
--- a/drivers/ntb/ntb_transport_internal.h
+++ b/drivers/ntb/ntb_transport_internal.h
@@ -120,12 +120,33 @@ struct ntb_transport_mw {
dma_addr_t dma_addr;
};
+/**
+ * struct ntb_transport_backend_ops - backend-specific transport hooks
+ * @setup_qp_mw: Set up memory windows for a given queue pair.
+ * @tx_free_entry: Return the number of free TX entries for the queue pair.
+ * @tx_enqueue: Backend-specific TX enqueue implementation.
+ * @rx_enqueue: Backend-specific RX enqueue implementation.
+ * @rx_poll: Poll for RX completions / push new RX buffers.
+ * @debugfs_stats_show: Dump backend-specific statistics, if any.
+ */
+struct ntb_transport_backend_ops {
+ int (*setup_qp_mw)(struct ntb_transport_ctx *nt, unsigned int qp_num);
+ unsigned int (*tx_free_entry)(struct ntb_transport_qp *qp);
+ int (*tx_enqueue)(struct ntb_transport_qp *qp, struct ntb_queue_entry *entry,
+ void *cb, void *data, unsigned int len, unsigned int flags);
+ int (*rx_enqueue)(struct ntb_transport_qp *qp, struct ntb_queue_entry *entry);
+ void (*rx_poll)(struct ntb_transport_qp *qp);
+ void (*debugfs_stats_show)(struct seq_file *s, struct ntb_transport_qp *qp);
+};
+
struct ntb_transport_ctx {
struct list_head entry;
struct list_head client_devs;
struct ntb_dev *ndev;
+ struct ntb_transport_backend_ops backend_ops;
+
struct ntb_transport_mw *mw_vec;
struct ntb_transport_qp *qp_vec;
unsigned int mw_count;
--
2.51.0
^ permalink raw reply related [flat|nested] 61+ messages in thread* [RFC PATCH v3 17/35] dmaengine: dw-edma: Add helper func to retrieve register base and size
From: Koichiro Den @ 2025-12-17 15:15 UTC (permalink / raw)
Remote eDMA users (e.g. NTB) may need to expose the integrated DW eDMA
register block through a memory window.
Add a helper function that returns the physical base and size for a
given DesignWare EP controller.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
.../pci/controller/dwc/pcie-designware-ep.c | 1 +
drivers/pci/controller/dwc/pcie-designware.c | 25 +++++++++++++++++++
include/linux/dma/edma.h | 24 ++++++++++++++++++
3 files changed, 50 insertions(+)
diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
index 9480aebaa32a..46d18e7945db 100644
--- a/drivers/pci/controller/dwc/pcie-designware-ep.c
+++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
@@ -12,6 +12,7 @@
#include <linux/platform_device.h>
#include "pcie-designware.h"
+#include <linux/dma/edma.h>
#include <linux/pci-epc.h>
#include <linux/pci-epf.h>
diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
index 75fc8b767fcc..1de88df7b1af 100644
--- a/drivers/pci/controller/dwc/pcie-designware.c
+++ b/drivers/pci/controller/dwc/pcie-designware.c
@@ -162,8 +162,12 @@ int dw_pcie_get_resources(struct dw_pcie *pci)
pci->edma.reg_base = devm_ioremap_resource(pci->dev, res);
if (IS_ERR(pci->edma.reg_base))
return PTR_ERR(pci->edma.reg_base);
+ pci->edma.reg_phys = res->start;
+ pci->edma.reg_size = resource_size(res);
} else if (pci->atu_size >= 2 * DEFAULT_DBI_DMA_OFFSET) {
pci->edma.reg_base = pci->atu_base + DEFAULT_DBI_DMA_OFFSET;
+ pci->edma.reg_phys = pci->atu_phys_addr + DEFAULT_DBI_DMA_OFFSET;
+ pci->edma.reg_size = pci->atu_size - DEFAULT_DBI_DMA_OFFSET;
}
}
@@ -1204,3 +1208,24 @@ resource_size_t dw_pcie_parent_bus_offset(struct dw_pcie *pci,
return cpu_phys_addr - reg_addr;
}
+
+int dw_edma_get_reg_window(struct pci_epc *epc, phys_addr_t *phys, size_t *sz)
+{
+ struct dw_pcie_ep *ep = epc_get_drvdata(epc);
+ struct dw_pcie *pci;
+
+ if (!ep)
+ return -ENODEV;
+
+ pci = to_dw_pcie_from_ep(ep);
+ if (!pci->edma.reg_base || !pci->edma.reg_phys)
+ return -ENODEV;
+
+ if (phys)
+ *phys = pci->edma.reg_phys;
+ if (sz)
+ *sz = pci->edma.reg_size;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(dw_edma_get_reg_window);
diff --git a/include/linux/dma/edma.h b/include/linux/dma/edma.h
index 3080747689f6..11d6eeb19fff 100644
--- a/include/linux/dma/edma.h
+++ b/include/linux/dma/edma.h
@@ -11,6 +11,7 @@
#include <linux/device.h>
#include <linux/dmaengine.h>
+#include <linux/pci-epc.h>
#define EDMA_MAX_WR_CH 8
#define EDMA_MAX_RD_CH 8
@@ -60,6 +61,27 @@ enum dw_edma_chip_flags {
DW_EDMA_CHIP_LOCAL = BIT(0),
};
+#if IS_REACHABLE(CONFIG_PCIE_DW)
+/**
+ * dw_edma_get_reg_window - get eDMA register base and size
+ *
+ * @epc: the EPC device with which the eDMA instance is integrated
+ * @phys: the output parameter that returns the register base address
+ * @sz: the output parameter that returns the register space size
+ *
+ * Remote eDMA users (e.g. NTB) may need to expose the integrated DW eDMA
+ * register block through a memory window. This helper returns the physical
+ * base and size for a given DesignWare EP controller.
+ */
+int dw_edma_get_reg_window(struct pci_epc *epc, phys_addr_t *phys, size_t *sz);
+#else
+static inline int dw_edma_get_reg_window(struct pci_epc *epc, phys_addr_t *phys,
+ size_t *sz)
+{
+ return -ENODEV;
+}
+#endif /* CONFIG_PCIE_DW */
+
/**
* struct dw_edma_chip - representation of DesignWare eDMA controller hardware
* @dev: struct device of the eDMA controller
@@ -85,6 +107,8 @@ struct dw_edma_chip {
u32 flags;
void __iomem *reg_base;
+ phys_addr_t reg_phys;
+ size_t reg_size;
u16 ll_wr_cnt;
u16 ll_rd_cnt;
--
2.51.0
* [RFC PATCH v3 18/35] dmaengine: dw-edma: Add per-channel interrupt routing mode
From: Koichiro Den @ 2025-12-17 15:15 UTC (permalink / raw)
DesignWare eDMA linked-list mode supports both local and remote
completion interrupts (LIE/RIE). For remote eDMA users, we need to
decide per channel whether completion should be handled locally,
remotely, or both.
Introduce a per-channel interrupt routing mode and export a small API to
configure/query it. Update v0 programming so that RIE and local
done/abort interrupt masking follow the selected mode. The default mode
keeps the original behavior, so unless the new APIs are explicitly used,
there is no functional change.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/dma/dw-edma/dw-edma-core.c | 49 +++++++++++++++++++++++++++
drivers/dma/dw-edma/dw-edma-core.h | 2 ++
drivers/dma/dw-edma/dw-edma-v0-core.c | 26 +++++++++-----
include/linux/dma/edma.h | 46 +++++++++++++++++++++++++
4 files changed, 115 insertions(+), 8 deletions(-)
diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
index 1b935da65d05..0bceca2d56c5 100644
--- a/drivers/dma/dw-edma/dw-edma-core.c
+++ b/drivers/dma/dw-edma/dw-edma-core.c
@@ -765,6 +765,7 @@ static int dw_edma_channel_setup(struct dw_edma *dw, u32 wr_alloc, u32 rd_alloc)
chan->configured = false;
chan->request = EDMA_REQ_NONE;
chan->status = EDMA_ST_IDLE;
+ chan->irq_mode = DW_EDMA_CH_IRQ_DEFAULT;
if (chan->dir == EDMA_DIR_WRITE)
chan->ll_max = (chip->ll_region_wr[chan->id].sz / EDMA_LL_SZ);
@@ -1059,6 +1060,54 @@ int dw_edma_remove(struct dw_edma_chip *chip)
}
EXPORT_SYMBOL_GPL(dw_edma_remove);
+int dw_edma_chan_irq_config(struct dma_chan *dchan,
+ enum dw_edma_ch_irq_mode mode)
+{
+ struct dw_edma_chan *chan;
+
+ /* Only LOCAL/REMOTE bits are valid. Zero keeps legacy behaviour. */
+ if (mode & ~(DW_EDMA_CH_IRQ_LOCAL | DW_EDMA_CH_IRQ_REMOTE))
+ return -EINVAL;
+
+ if (!dchan || !dchan->device ||
+ dchan->device->device_prep_slave_sg_config != dw_edma_device_prep_slave_sg_config)
+ return -ENODEV;
+
+ chan = dchan2dw_edma_chan(dchan);
+ if (!chan)
+ return -ENODEV;
+
+ chan->irq_mode = mode;
+
+ dev_vdbg(chan->dw->chip->dev, "Channel: %s[%u] set irq_mode=%u\n",
+ str_write_read(chan->dir == EDMA_DIR_WRITE),
+ chan->id, mode);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(dw_edma_chan_irq_config);
+
+bool dw_edma_chan_ignore_irq(struct dma_chan *dchan)
+{
+ struct dw_edma_chan *chan;
+ struct dw_edma *dw;
+
+ if (!dchan || !dchan->device ||
+ dchan->device->device_prep_slave_sg_config != dw_edma_device_prep_slave_sg_config)
+ return false;
+
+ chan = dchan2dw_edma_chan(dchan);
+ if (!chan)
+ return false;
+
+ dw = chan->dw;
+ if (dw->chip->flags & DW_EDMA_CHIP_LOCAL)
+ return chan->irq_mode == DW_EDMA_CH_IRQ_REMOTE;
+ else
+ return chan->irq_mode == DW_EDMA_CH_IRQ_LOCAL;
+}
+EXPORT_SYMBOL_GPL(dw_edma_chan_ignore_irq);
+
MODULE_LICENSE("GPL v2");
MODULE_DESCRIPTION("Synopsys DesignWare eDMA controller core driver");
MODULE_AUTHOR("Gustavo Pimentel <gustavo.pimentel@synopsys.com>");
diff --git a/drivers/dma/dw-edma/dw-edma-core.h b/drivers/dma/dw-edma/dw-edma-core.h
index 71894b9e0b15..8458d676551a 100644
--- a/drivers/dma/dw-edma/dw-edma-core.h
+++ b/drivers/dma/dw-edma/dw-edma-core.h
@@ -81,6 +81,8 @@ struct dw_edma_chan {
struct msi_msg msi;
+ enum dw_edma_ch_irq_mode irq_mode;
+
enum dw_edma_request request;
enum dw_edma_status status;
u8 configured;
diff --git a/drivers/dma/dw-edma/dw-edma-v0-core.c b/drivers/dma/dw-edma/dw-edma-v0-core.c
index b75fdaffad9a..42a254eb9379 100644
--- a/drivers/dma/dw-edma/dw-edma-v0-core.c
+++ b/drivers/dma/dw-edma/dw-edma-v0-core.c
@@ -256,8 +256,10 @@ dw_edma_v0_core_handle_int(struct dw_edma_irq *dw_irq, enum dw_edma_dir dir,
for_each_set_bit(pos, &val, total) {
chan = &dw->chan[pos + off];
- dw_edma_v0_core_clear_done_int(chan);
- done(chan);
+ if (!dw_edma_chan_ignore_irq(&chan->vc.chan)) {
+ dw_edma_v0_core_clear_done_int(chan);
+ done(chan);
+ }
ret = IRQ_HANDLED;
}
@@ -267,8 +269,10 @@ dw_edma_v0_core_handle_int(struct dw_edma_irq *dw_irq, enum dw_edma_dir dir,
for_each_set_bit(pos, &val, total) {
chan = &dw->chan[pos + off];
- dw_edma_v0_core_clear_abort_int(chan);
- abort(chan);
+ if (!dw_edma_chan_ignore_irq(&chan->vc.chan)) {
+ dw_edma_v0_core_clear_abort_int(chan);
+ abort(chan);
+ }
ret = IRQ_HANDLED;
}
@@ -331,7 +335,8 @@ static void dw_edma_v0_core_write_chunk(struct dw_edma_chunk *chunk)
j--;
if (!j) {
control |= DW_EDMA_V0_LIE;
- if (!(chan->dw->chip->flags & DW_EDMA_CHIP_LOCAL))
+ if (!(chan->dw->chip->flags & DW_EDMA_CHIP_LOCAL) &&
+ chan->irq_mode != DW_EDMA_CH_IRQ_LOCAL)
control |= DW_EDMA_V0_RIE;
}
@@ -407,10 +412,15 @@ static void dw_edma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
break;
}
}
- /* Interrupt unmask - done, abort */
+ /* Interrupt mask/unmask - done, abort */
tmp = GET_RW_32(dw, chan->dir, int_mask);
- tmp &= ~FIELD_PREP(EDMA_V0_DONE_INT_MASK, BIT(chan->id));
- tmp &= ~FIELD_PREP(EDMA_V0_ABORT_INT_MASK, BIT(chan->id));
+ if (chan->irq_mode == DW_EDMA_CH_IRQ_REMOTE) {
+ tmp |= FIELD_PREP(EDMA_V0_DONE_INT_MASK, BIT(chan->id));
+ tmp |= FIELD_PREP(EDMA_V0_ABORT_INT_MASK, BIT(chan->id));
+ } else {
+ tmp &= ~FIELD_PREP(EDMA_V0_DONE_INT_MASK, BIT(chan->id));
+ tmp &= ~FIELD_PREP(EDMA_V0_ABORT_INT_MASK, BIT(chan->id));
+ }
SET_RW_32(dw, chan->dir, int_mask, tmp);
/* Linked list error */
tmp = GET_RW_32(dw, chan->dir, linked_list_err_en);
diff --git a/include/linux/dma/edma.h b/include/linux/dma/edma.h
index 11d6eeb19fff..8c1b1d25fa44 100644
--- a/include/linux/dma/edma.h
+++ b/include/linux/dma/edma.h
@@ -61,6 +61,40 @@ enum dw_edma_chip_flags {
DW_EDMA_CHIP_LOCAL = BIT(0),
};
+/*
+ * enum dw_edma_ch_irq_mode - per-channel interrupt routing control
+ * @DW_EDMA_CH_IRQ_DEFAULT: LIE=1/RIE=1, local interrupt unmasked
+ * @DW_EDMA_CH_IRQ_LOCAL: LIE=1/RIE=0
+ * @DW_EDMA_CH_IRQ_REMOTE: LIE=1/RIE=1, local interrupt masked
+ *
+ * Some implementations require using LIE=1/RIE=1 with the local interrupt
+ * masked to generate a remote-only interrupt (rather than LIE=0/RIE=1).
+ * See the DesignWare endpoint databook 5.40, "Hint" below "Figure 8-22
+ * Write Interrupt Generation".
+ */
+enum dw_edma_ch_irq_mode {
+ DW_EDMA_CH_IRQ_DEFAULT = 0,
+ DW_EDMA_CH_IRQ_LOCAL,
+ DW_EDMA_CH_IRQ_REMOTE,
+};
+
+/**
+ * dw_edma_chan_irq_config - configure per-channel interrupt routing
+ * @chan: DMA channel obtained from dma_request_channel()
+ * @mode: interrupt routing mode
+ *
+ * Returns 0 on success, -EINVAL for invalid @mode, or -ENODEV if @chan does
+ * not belong to the DesignWare eDMA driver.
+ */
+int dw_edma_chan_irq_config(struct dma_chan *chan,
+ enum dw_edma_ch_irq_mode mode);
+
+/**
+ * dw_edma_chan_ignore_irq - tell whether local IRQ handling should be ignored
+ * @chan: DMA channel obtained from dma_request_channel()
+ */
+bool dw_edma_chan_ignore_irq(struct dma_chan *chan);
+
#if IS_REACHABLE(CONFIG_PCIE_DW)
/**
* dw_edma_get_reg_window - get eDMA register base and size
@@ -141,4 +175,16 @@ static inline int dw_edma_remove(struct dw_edma_chip *chip)
}
#endif /* CONFIG_DW_EDMA */
+#if !IS_ENABLED(CONFIG_DW_EDMA)
+static inline int dw_edma_chan_irq_config(struct dma_chan *chan,
+ enum dw_edma_ch_irq_mode mode)
+{
+ return -ENODEV;
+}
+static inline bool dw_edma_chan_ignore_irq(struct dma_chan *chan)
+{
+ return false;
+}
+#endif
+
#endif /* _DW_EDMA_H */
--
2.51.0
* [RFC PATCH v3 19/35] dmaengine: dw-edma: Poll completion when local IRQ handling is disabled
From: Koichiro Den @ 2025-12-17 15:15 UTC (permalink / raw)
When a channel is configured to suppress host-side interrupts (RIE=0),
the host-side driver cannot rely on IRQ-driven progress. Add an optional
polling path for such channels. Polling is only enabled for channels where
dw_edma_chan_ignore_irq() is true.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/dma/dw-edma/dw-edma-core.c | 97 ++++++++++++++++++++++++------
drivers/dma/dw-edma/dw-edma-core.h | 4 ++
2 files changed, 84 insertions(+), 17 deletions(-)
diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
index 0bceca2d56c5..09b10ad1f38a 100644
--- a/drivers/dma/dw-edma/dw-edma-core.c
+++ b/drivers/dma/dw-edma/dw-edma-core.c
@@ -303,23 +303,6 @@ static int dw_edma_device_terminate_all(struct dma_chan *dchan)
return err;
}
-static void dw_edma_device_issue_pending(struct dma_chan *dchan)
-{
- struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
- unsigned long flags;
-
- if (!chan->configured)
- return;
-
- spin_lock_irqsave(&chan->vc.lock, flags);
- if (vchan_issue_pending(&chan->vc) && chan->request == EDMA_REQ_NONE &&
- chan->status == EDMA_ST_IDLE) {
- chan->status = EDMA_ST_BUSY;
- dw_edma_start_transfer(chan);
- }
- spin_unlock_irqrestore(&chan->vc.lock, flags);
-}
-
static enum dma_status
dw_edma_device_tx_status(struct dma_chan *dchan, dma_cookie_t cookie,
struct dma_tx_state *txstate)
@@ -707,6 +690,68 @@ static irqreturn_t dw_edma_interrupt_common(int irq, void *data)
return ret;
}
+static void dw_edma_done_arm(struct dw_edma_chan *chan)
+{
+ if (!dw_edma_chan_ignore_irq(&chan->vc.chan))
+ /* no need to arm since it's not to be ignored */
+ return;
+
+ queue_delayed_work(system_wq, &chan->poll_work, 1);
+}
+
+static void dw_edma_chan_poll_done(struct dma_chan *dchan)
+{
+ struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
+ enum dma_status st;
+
+ if (!dw_edma_chan_ignore_irq(dchan))
+ /* no need to poll since it's not to be ignored */
+ return;
+
+ guard(spinlock_irqsave)(&chan->poll_lock);
+
+ if (chan->status != EDMA_ST_BUSY)
+ return;
+
+ st = dw_edma_core_ch_status(chan);
+
+ switch (st) {
+ case DMA_COMPLETE:
+ dw_edma_done_interrupt(chan);
+ if (chan->status == EDMA_ST_BUSY)
+ dw_edma_done_arm(chan);
+ break;
+ case DMA_IN_PROGRESS:
+ dw_edma_done_arm(chan);
+ break;
+ case DMA_ERROR:
+ dw_edma_abort_interrupt(chan);
+ break;
+ default:
+ break;
+ }
+}
+
+static void dw_edma_device_issue_pending(struct dma_chan *dchan)
+{
+ struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
+ unsigned long flags;
+
+ if (!chan->configured)
+ return;
+
+ dw_edma_chan_poll_done(dchan);
+
+ spin_lock_irqsave(&chan->vc.lock, flags);
+ if (vchan_issue_pending(&chan->vc) && chan->request == EDMA_REQ_NONE &&
+ chan->status == EDMA_ST_IDLE) {
+ chan->status = EDMA_ST_BUSY;
+ dw_edma_start_transfer(chan);
+ } else
+ dw_edma_done_arm(chan);
+ spin_unlock_irqrestore(&chan->vc.lock, flags);
+}
+
static int dw_edma_alloc_chan_resources(struct dma_chan *dchan)
{
struct dw_edma_chan *chan = dchan2dw_edma_chan(dchan);
@@ -1060,6 +1105,19 @@ int dw_edma_remove(struct dw_edma_chip *chip)
}
EXPORT_SYMBOL_GPL(dw_edma_remove);
+static void dw_edma_poll_work(struct work_struct *work)
+{
+ struct delayed_work *dwork = to_delayed_work(work);
+ struct dw_edma_chan *chan =
+ container_of(dwork, struct dw_edma_chan, poll_work);
+ struct dma_chan *dchan = &chan->vc.chan;
+
+ if (!chan->configured)
+ return;
+
+ dw_edma_chan_poll_done(dchan);
+}
+
int dw_edma_chan_irq_config(struct dma_chan *dchan,
enum dw_edma_ch_irq_mode mode)
{
@@ -1083,6 +1141,11 @@ int dw_edma_chan_irq_config(struct dma_chan *dchan,
str_write_read(chan->dir == EDMA_DIR_WRITE),
chan->id, mode);
+ if (dw_edma_chan_ignore_irq(&chan->vc.chan)) {
+ spin_lock_init(&chan->poll_lock);
+ INIT_DELAYED_WORK(&chan->poll_work, dw_edma_poll_work);
+ }
+
return 0;
}
EXPORT_SYMBOL_GPL(dw_edma_chan_irq_config);
diff --git a/drivers/dma/dw-edma/dw-edma-core.h b/drivers/dma/dw-edma/dw-edma-core.h
index 8458d676551a..11fe4532f0bf 100644
--- a/drivers/dma/dw-edma/dw-edma-core.h
+++ b/drivers/dma/dw-edma/dw-edma-core.h
@@ -11,6 +11,7 @@
#include <linux/msi.h>
#include <linux/dma/edma.h>
+#include <linux/workqueue.h>
#include "../virt-dma.h"
@@ -83,6 +84,9 @@ struct dw_edma_chan {
enum dw_edma_ch_irq_mode irq_mode;
+ struct delayed_work poll_work;
+ spinlock_t poll_lock;
+
enum dw_edma_request request;
enum dw_edma_status status;
u8 configured;
--
2.51.0
^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [RFC PATCH v3 20/35] dmaengine: dw-edma: Add notify-only channels support
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
` (18 preceding siblings ...)
2025-12-17 15:15 ` [RFC PATCH v3 19/35] dmaengine: dw-edma: Poll completion when local IRQ handling is disabled Koichiro Den
@ 2025-12-17 15:15 ` Koichiro Den
2025-12-17 15:15 ` [RFC PATCH v3 21/35] dmaengine: dw-edma: Add a helper to retrieve LL (Linked List) region Koichiro Den
` (15 subsequent siblings)
35 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:15 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
Remote eDMA users may want to prepare descriptors on the remote side while
the local side only needs completion notifications (no cookie-based
accounting).
Provide a lightweight per-channel notification callback infrastructure.
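As a standalone sketch of the callback plumbing (plain C, not kernel code: the real dw_edma_chan_register_notify() takes a struct dma_chan, validates it belongs to dw-edma, and forces DW_EDMA_CH_IRQ_LOCAL via dw_edma_chan_irq_config()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_chan {
	void (*notify_cb)(struct model_chan *chan, void *user);
	void *notify_cb_param;
	bool notify_only;
};

static int model_register_notify(struct model_chan *chan,
				 void (*cb)(struct model_chan *, void *),
				 void *user)
{
	if (!chan)
		return -19;	/* -ENODEV in the real helper */

	chan->notify_cb = cb;
	chan->notify_cb_param = user;
	chan->notify_only = cb != NULL;	/* a NULL cb reverts to normal mode */
	return 0;
}

/* Mirrors the early-return branch added to dw_edma_done_interrupt(). */
static void model_done_interrupt(struct model_chan *chan)
{
	if (chan->notify_only) {
		if (chan->notify_cb)
			chan->notify_cb(chan, chan->notify_cb_param);
		return;	/* no cookie accounting on this side */
	}
	/* normal vchan descriptor completion would run here */
}
```

The notify-only channel never touches the vchan descriptor path; each done interrupt simply fires the registered callback.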
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/dma/dw-edma/dw-edma-core.c | 32 ++++++++++++++++++++++++++++++
drivers/dma/dw-edma/dw-edma-core.h | 4 ++++
include/linux/dma/edma.h | 22 ++++++++++++++++++++
3 files changed, 58 insertions(+)
diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
index 09b10ad1f38a..8e262f61f02d 100644
--- a/drivers/dma/dw-edma/dw-edma-core.c
+++ b/drivers/dma/dw-edma/dw-edma-core.c
@@ -608,6 +608,13 @@ static void dw_edma_done_interrupt(struct dw_edma_chan *chan)
struct virt_dma_desc *vd;
unsigned long flags;
+ if (chan->notify_only) {
+ if (chan->notify_cb)
+ chan->notify_cb(&chan->vc.chan, chan->notify_cb_param);
+ /* no cookie on this side, just return */
+ return;
+ }
+
spin_lock_irqsave(&chan->vc.lock, flags);
vd = vchan_next_desc(&chan->vc);
if (vd) {
@@ -811,6 +818,9 @@ static int dw_edma_channel_setup(struct dw_edma *dw, u32 wr_alloc, u32 rd_alloc)
chan->request = EDMA_REQ_NONE;
chan->status = EDMA_ST_IDLE;
chan->irq_mode = DW_EDMA_CH_IRQ_DEFAULT;
+ chan->notify_cb = NULL;
+ chan->notify_cb_param = NULL;
+ chan->notify_only = false;
if (chan->dir == EDMA_DIR_WRITE)
chan->ll_max = (chip->ll_region_wr[chan->id].sz / EDMA_LL_SZ);
@@ -1171,6 +1181,28 @@ bool dw_edma_chan_ignore_irq(struct dma_chan *dchan)
}
EXPORT_SYMBOL_GPL(dw_edma_chan_ignore_irq);
+int dw_edma_chan_register_notify(struct dma_chan *dchan,
+ void (*cb)(struct dma_chan *chan, void *user),
+ void *user)
+{
+ struct dw_edma_chan *chan;
+
+ if (!dchan || !dchan->device ||
+ dchan->device->device_prep_slave_sg_config != dw_edma_device_prep_slave_sg_config)
+ return -ENODEV;
+
+ chan = dchan2dw_edma_chan(dchan);
+ if (!chan)
+ return -ENODEV;
+
+ chan->notify_cb = cb;
+ chan->notify_cb_param = user;
+ chan->notify_only = !!cb;
+
+ return dw_edma_chan_irq_config(dchan, DW_EDMA_CH_IRQ_LOCAL);
+}
+EXPORT_SYMBOL_GPL(dw_edma_chan_register_notify);
+
MODULE_LICENSE("GPL v2");
MODULE_DESCRIPTION("Synopsys DesignWare eDMA controller core driver");
MODULE_AUTHOR("Gustavo Pimentel <gustavo.pimentel@synopsys.com>");
diff --git a/drivers/dma/dw-edma/dw-edma-core.h b/drivers/dma/dw-edma/dw-edma-core.h
index 11fe4532f0bf..f652d2e38843 100644
--- a/drivers/dma/dw-edma/dw-edma-core.h
+++ b/drivers/dma/dw-edma/dw-edma-core.h
@@ -84,6 +84,10 @@ struct dw_edma_chan {
enum dw_edma_ch_irq_mode irq_mode;
+ void (*notify_cb)(struct dma_chan *chan, void *user);
+ void *notify_cb_param;
+ bool notify_only;
+
struct delayed_work poll_work;
spinlock_t poll_lock;
diff --git a/include/linux/dma/edma.h b/include/linux/dma/edma.h
index 8c1b1d25fa44..4caf5cc5c368 100644
--- a/include/linux/dma/edma.h
+++ b/include/linux/dma/edma.h
@@ -95,6 +95,21 @@ int dw_edma_chan_irq_config(struct dma_chan *chan,
*/
bool dw_edma_chan_ignore_irq(struct dma_chan *chan);
+/**
+ * dw_edma_chan_register_notify - register local completion callback for a
+ * notification-only channel
+ * @chan: DMA channel obtained from dma_request_channel()
+ * @cb: callback invoked in hardirq context when LIE interrupt is raised
+ * @user: opaque pointer passed back to @cb
+ *
+ * Intended for channels where descriptors are prepared on the remote side and
+ * the local side only wants completion notifications. This forces LOCAL mode
+ * so that the local side receives LIE interrupts.
+ */
+int dw_edma_chan_register_notify(struct dma_chan *chan,
+ void (*cb)(struct dma_chan *chan, void *user),
+ void *user);
+
#if IS_REACHABLE(CONFIG_PCIE_DW)
/**
* dw_edma_get_reg_window - get eDMA register base and size
@@ -185,6 +200,13 @@ static inline bool dw_edma_chan_ignore_irq(struct dma_chan *chan)
{
return false;
}
+static inline int dw_edma_chan_register_notify(struct dma_chan *chan,
+ void (*cb)(struct dma_chan *chan,
+ void *user),
+ void *user)
+{
+ return -ENODEV;
+}
#endif
#endif /* _DW_EDMA_H */
--
2.51.0
^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [RFC PATCH v3 21/35] dmaengine: dw-edma: Add a helper to retrieve LL (Linked List) region
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
` (19 preceding siblings ...)
2025-12-17 15:15 ` [RFC PATCH v3 20/35] dmaengine: dw-edma: Add notify-only channels support Koichiro Den
@ 2025-12-17 15:15 ` Koichiro Den
2025-12-17 15:15 ` [RFC PATCH v3 22/35] dmaengine: dw-edma: Serialize RMW on shared interrupt registers Koichiro Den
` (14 subsequent siblings)
35 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:15 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
Remote eDMA users may want to know the LL memory region addresses that
may have been configured at boot time by an SoC glue driver, so that
those regions can later be exposed to the remote host side, which will
run dw_edma_probe() to configure the remote eDMA.
Export a helper to query the LL region associated with a dma_chan.
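The validation-then-copy-out shape of the helper can be sketched standalone (plain C, not kernel code: the real dw_edma_chan_get_ll_region() validates the dma_chan, checks the DW_EDMA_CHIP_LOCAL flag on the chip, and copies out chip->ll_region_wr/rd[chan->id]):

```c
#include <assert.h>
#include <stddef.h>

enum model_dir { DIR_WRITE, DIR_READ };

struct model_region {
	unsigned long paddr;
	unsigned long sz;
};

struct model_chan {
	enum model_dir dir;
	int id;
	int chip_local;			/* DW_EDMA_CHIP_LOCAL analogue */
	struct model_region *ll_wr;	/* per-chip LL region tables */
	struct model_region *ll_rd;
};

static int model_get_ll_region(struct model_chan *c, struct model_region *out)
{
	if (!c || !out)
		return -19;	/* -ENODEV */
	if (!c->chip_local)
		return -22;	/* -EINVAL: only meaningful on the local side */

	/* pick the table matching the channel direction */
	*out = (c->dir == DIR_WRITE) ? c->ll_wr[c->id] : c->ll_rd[c->id];
	return 0;
}
```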
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/dma/dw-edma/dw-edma-core.c | 27 +++++++++++++++++++++++++++
include/linux/dma/edma.h | 14 ++++++++++++++
2 files changed, 41 insertions(+)
diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
index 8e262f61f02d..77f523f40038 100644
--- a/drivers/dma/dw-edma/dw-edma-core.c
+++ b/drivers/dma/dw-edma/dw-edma-core.c
@@ -1203,6 +1203,33 @@ int dw_edma_chan_register_notify(struct dma_chan *dchan,
}
EXPORT_SYMBOL_GPL(dw_edma_chan_register_notify);
+int dw_edma_chan_get_ll_region(struct dma_chan *dchan,
+ struct dw_edma_region *region)
+{
+ struct dw_edma_chip *chip;
+ struct dw_edma_chan *chan;
+
+ if (!dchan || !region || !dchan->device ||
+ dchan->device->device_prep_slave_sg_config != dw_edma_device_prep_slave_sg_config)
+ return -ENODEV;
+
+ chan = dchan2dw_edma_chan(dchan);
+ if (!chan)
+ return -ENODEV;
+
+ chip = chan->dw->chip;
+ if (!(chip->flags & DW_EDMA_CHIP_LOCAL))
+ return -EINVAL;
+
+ if (chan->dir == EDMA_DIR_WRITE)
+ *region = chip->ll_region_wr[chan->id];
+ else
+ *region = chip->ll_region_rd[chan->id];
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(dw_edma_chan_get_ll_region);
+
MODULE_LICENSE("GPL v2");
MODULE_DESCRIPTION("Synopsys DesignWare eDMA controller core driver");
MODULE_AUTHOR("Gustavo Pimentel <gustavo.pimentel@synopsys.com>");
diff --git a/include/linux/dma/edma.h b/include/linux/dma/edma.h
index 4caf5cc5c368..1f40e027fa56 100644
--- a/include/linux/dma/edma.h
+++ b/include/linux/dma/edma.h
@@ -110,6 +110,15 @@ int dw_edma_chan_register_notify(struct dma_chan *chan,
void (*cb)(struct dma_chan *chan, void *user),
void *user);
+/**
+ * dw_edma_chan_get_ll_region - get linked list (LL) memory for a dma_chan
+ * @chan: the target DMA channel
+ * @region: output parameter returning the corresponding LL region
+ */
+int dw_edma_chan_get_ll_region(struct dma_chan *chan,
+ struct dw_edma_region *region);
+
+
#if IS_REACHABLE(CONFIG_PCIE_DW)
/**
* dw_edma_get_reg_window - get eDMA register base and size
@@ -207,6 +216,11 @@ static inline int dw_edma_chan_register_notify(struct dma_chan *chan,
{
return -ENODEV;
}
+static inline int dw_edma_chan_get_ll_region(struct dma_chan *chan,
+ struct dw_edma_region *region)
+{
+ return -EINVAL;
+}
#endif
#endif /* _DW_EDMA_H */
--
2.51.0
^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [RFC PATCH v3 22/35] dmaengine: dw-edma: Serialize RMW on shared interrupt registers
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
` (20 preceding siblings ...)
2025-12-17 15:15 ` [RFC PATCH v3 21/35] dmaengine: dw-edma: Add a helper to retrieve LL (Linked List) region Koichiro Den
@ 2025-12-17 15:15 ` Koichiro Den
2025-12-19 14:39 ` Frank Li
2025-12-17 15:15 ` [RFC PATCH v3 23/35] NTB: ntb_transport: Split core into ntb_transport_core.c Koichiro Den
` (13 subsequent siblings)
35 siblings, 1 reply; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:15 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
The per-direction int_mask and linked_list_err_en registers are shared
between all channels. Updating them requires a read-modify-write
sequence, which can lose concurrent updates when multiple channels are
started in parallel. This may leave interrupts masked and stall
transfers under high load.
Protect the RMW sequences with dw->lock.
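The guarded sequence can be modeled standalone (plain C, single-threaded: the lock itself is elided; what is shown is the orig/tmp comparison the patch adds, which skips the register write when no bit actually changed):

```c
#include <assert.h>
#include <stdint.h>

static uint32_t int_mask;	/* stands in for the shared int_mask register */
static unsigned int writes;	/* counts SET_RW_32() register writes */

static uint32_t get_rw32(void)
{
	return int_mask;
}

static void set_rw32(uint32_t v)
{
	int_mask = v;
	writes++;
}

/* The real code holds dw->lock across this whole sequence. */
static void unmask_channel(unsigned int id)
{
	uint32_t tmp = get_rw32();
	uint32_t orig = tmp;

	tmp &= ~(1u << id);	/* unmask this channel's bit */
	if (tmp != orig)	/* skip the MMIO write when nothing changed */
		set_rw32(tmp);
}
```

Without the lock, two channels doing get/modify/set concurrently could each read the same stale value and one update would be lost; the skip-when-unchanged check also avoids redundant MMIO once a bit is already in the desired state.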
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/dma/dw-edma/dw-edma-core.h | 3 ++-
drivers/dma/dw-edma/dw-edma-v0-core.c | 13 ++++++++++---
2 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/drivers/dma/dw-edma/dw-edma-core.h b/drivers/dma/dw-edma/dw-edma-core.h
index f652d2e38843..d393976a8bfc 100644
--- a/drivers/dma/dw-edma/dw-edma-core.h
+++ b/drivers/dma/dw-edma/dw-edma-core.h
@@ -118,7 +118,8 @@ struct dw_edma {
struct dw_edma_chan *chan;
- raw_spinlock_t lock; /* Only for legacy */
+ /* For legacy + shared regs RMW among channels */
+ raw_spinlock_t lock;
struct dw_edma_chip *chip;
diff --git a/drivers/dma/dw-edma/dw-edma-v0-core.c b/drivers/dma/dw-edma/dw-edma-v0-core.c
index 42a254eb9379..770b011ba3e4 100644
--- a/drivers/dma/dw-edma/dw-edma-v0-core.c
+++ b/drivers/dma/dw-edma/dw-edma-v0-core.c
@@ -369,7 +369,8 @@ static void dw_edma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
{
struct dw_edma_chan *chan = chunk->chan;
struct dw_edma *dw = chan->dw;
- u32 tmp;
+ unsigned long flags;
+ u32 tmp, orig;
dw_edma_v0_core_write_chunk(chunk);
@@ -413,7 +414,9 @@ static void dw_edma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
}
}
/* Interrupt mask/unmask - done, abort */
+ raw_spin_lock_irqsave(&dw->lock, flags);
tmp = GET_RW_32(dw, chan->dir, int_mask);
+ orig = tmp;
if (chan->irq_mode == DW_EDMA_CH_IRQ_REMOTE) {
tmp |= FIELD_PREP(EDMA_V0_DONE_INT_MASK, BIT(chan->id));
tmp |= FIELD_PREP(EDMA_V0_ABORT_INT_MASK, BIT(chan->id));
@@ -421,11 +424,15 @@ static void dw_edma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
tmp &= ~FIELD_PREP(EDMA_V0_DONE_INT_MASK, BIT(chan->id));
tmp &= ~FIELD_PREP(EDMA_V0_ABORT_INT_MASK, BIT(chan->id));
}
- SET_RW_32(dw, chan->dir, int_mask, tmp);
+ if (tmp != orig)
+ SET_RW_32(dw, chan->dir, int_mask, tmp);
/* Linked list error */
tmp = GET_RW_32(dw, chan->dir, linked_list_err_en);
+ orig = tmp;
tmp |= FIELD_PREP(EDMA_V0_LINKED_LIST_ERR_MASK, BIT(chan->id));
- SET_RW_32(dw, chan->dir, linked_list_err_en, tmp);
+ if (tmp != orig)
+ SET_RW_32(dw, chan->dir, linked_list_err_en, tmp);
+ raw_spin_unlock_irqrestore(&dw->lock, flags);
/* Channel control */
SET_CH_32(dw, chan->dir, chan->id, ch_control1,
(DW_EDMA_V0_CCS | DW_EDMA_V0_LLE));
--
2.51.0
^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v3 22/35] dmaengine: dw-edma: Serialize RMW on shared interrupt registers
2025-12-17 15:15 ` [RFC PATCH v3 22/35] dmaengine: dw-edma: Serialize RMW on shared interrupt registers Koichiro Den
@ 2025-12-19 14:39 ` Frank Li
2025-12-20 15:21 ` Koichiro Den
0 siblings, 1 reply; 61+ messages in thread
From: Frank Li @ 2025-12-19 14:39 UTC (permalink / raw)
To: Koichiro Den
Cc: dave.jiang, ntb, linux-pci, dmaengine, linux-renesas-soc, netdev,
linux-kernel, mani, kwilczynski, kishon, bhelgaas, corbet,
geert+renesas, magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro,
will, robin.murphy, jdmason, allenbh, andrew+netdev, davem,
edumazet, kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k,
kurt.schwemmer, logang, jingoohan1, lpieralisi, utkarsh02t,
jbrunet, dlemoal, arnd, elfring
On Thu, Dec 18, 2025 at 12:15:56AM +0900, Koichiro Den wrote:
> The per-direction int_mask and linked_list_err_en registers are shared
> between all channels. Updating them requires a read-modify-write
> sequence, which can lose concurrent updates when multiple channels are
> started in parallel. This may leave interrupts masked and stall
> transfers under high load.
>
> Protect the RMW sequences with dw->lock.
>
> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> ---
I just posted a similar patch
https://lore.kernel.org/imx/20251212-edma_ll-v1-1-fc863d9f5ca3@nxp.com/
It change some method and I am working on add new request during dma engine
running.
At least, you can base on above thread.
Frank
> drivers/dma/dw-edma/dw-edma-core.h | 3 ++-
> drivers/dma/dw-edma/dw-edma-v0-core.c | 13 ++++++++++---
> 2 files changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/dma/dw-edma/dw-edma-core.h b/drivers/dma/dw-edma/dw-edma-core.h
> index f652d2e38843..d393976a8bfc 100644
> --- a/drivers/dma/dw-edma/dw-edma-core.h
> +++ b/drivers/dma/dw-edma/dw-edma-core.h
> @@ -118,7 +118,8 @@ struct dw_edma {
>
> struct dw_edma_chan *chan;
>
> - raw_spinlock_t lock; /* Only for legacy */
> + /* For legacy + shared regs RMW among channels */
> + raw_spinlock_t lock;
>
> struct dw_edma_chip *chip;
>
> diff --git a/drivers/dma/dw-edma/dw-edma-v0-core.c b/drivers/dma/dw-edma/dw-edma-v0-core.c
> index 42a254eb9379..770b011ba3e4 100644
> --- a/drivers/dma/dw-edma/dw-edma-v0-core.c
> +++ b/drivers/dma/dw-edma/dw-edma-v0-core.c
> @@ -369,7 +369,8 @@ static void dw_edma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
> {
> struct dw_edma_chan *chan = chunk->chan;
> struct dw_edma *dw = chan->dw;
> - u32 tmp;
> + unsigned long flags;
> + u32 tmp, orig;
>
> dw_edma_v0_core_write_chunk(chunk);
>
> @@ -413,7 +414,9 @@ static void dw_edma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
> }
> }
> /* Interrupt mask/unmask - done, abort */
> + raw_spin_lock_irqsave(&dw->lock, flags);
> tmp = GET_RW_32(dw, chan->dir, int_mask);
> + orig = tmp;
> if (chan->irq_mode == DW_EDMA_CH_IRQ_REMOTE) {
> tmp |= FIELD_PREP(EDMA_V0_DONE_INT_MASK, BIT(chan->id));
> tmp |= FIELD_PREP(EDMA_V0_ABORT_INT_MASK, BIT(chan->id));
> @@ -421,11 +424,15 @@ static void dw_edma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
> tmp &= ~FIELD_PREP(EDMA_V0_DONE_INT_MASK, BIT(chan->id));
> tmp &= ~FIELD_PREP(EDMA_V0_ABORT_INT_MASK, BIT(chan->id));
> }
> - SET_RW_32(dw, chan->dir, int_mask, tmp);
> + if (tmp != orig)
> + SET_RW_32(dw, chan->dir, int_mask, tmp);
> /* Linked list error */
> tmp = GET_RW_32(dw, chan->dir, linked_list_err_en);
> + orig = tmp;
> tmp |= FIELD_PREP(EDMA_V0_LINKED_LIST_ERR_MASK, BIT(chan->id));
> - SET_RW_32(dw, chan->dir, linked_list_err_en, tmp);
> + if (tmp != orig)
> + SET_RW_32(dw, chan->dir, linked_list_err_en, tmp);
> + raw_spin_unlock_irqrestore(&dw->lock, flags);
> /* Channel control */
> SET_CH_32(dw, chan->dir, chan->id, ch_control1,
> (DW_EDMA_V0_CCS | DW_EDMA_V0_LLE));
> --
> 2.51.0
>
^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v3 22/35] dmaengine: dw-edma: Serialize RMW on shared interrupt registers
2025-12-19 14:39 ` Frank Li
@ 2025-12-20 15:21 ` Koichiro Den
0 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-20 15:21 UTC (permalink / raw)
To: Frank Li
Cc: dave.jiang, ntb, linux-pci, dmaengine, linux-renesas-soc, netdev,
linux-kernel, mani, kwilczynski, kishon, bhelgaas, corbet,
geert+renesas, magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro,
will, robin.murphy, jdmason, allenbh, andrew+netdev, davem,
edumazet, kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k,
kurt.schwemmer, logang, jingoohan1, lpieralisi, utkarsh02t,
jbrunet, dlemoal, arnd, elfring
On Fri, Dec 19, 2025 at 09:39:37AM -0500, Frank Li wrote:
> On Thu, Dec 18, 2025 at 12:15:56AM +0900, Koichiro Den wrote:
> > The per-direction int_mask and linked_list_err_en registers are shared
> > between all channels. Updating them requires a read-modify-write
> > sequence, which can lose concurrent updates when multiple channels are
> > started in parallel. This may leave interrupts masked and stall
> > transfers under high load.
> >
> > Protect the RMW sequences with dw->lock.
> >
> > Signed-off-by: Koichiro Den <den@valinux.co.jp>
> > ---
>
> I just posted a similar patch
> https://lore.kernel.org/imx/20251212-edma_ll-v1-1-fc863d9f5ca3@nxp.com/
>
> It change some method and I am working on add new request during dma engine
> running.
>
> At least, you can base on above thread.
I hadn't seen it, thanks for the pointer. I'll read through it to base my
work on your series.
Koichiro
>
> Frank
>
> > drivers/dma/dw-edma/dw-edma-core.h | 3 ++-
> > drivers/dma/dw-edma/dw-edma-v0-core.c | 13 ++++++++++---
> > 2 files changed, 12 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/dma/dw-edma/dw-edma-core.h b/drivers/dma/dw-edma/dw-edma-core.h
> > index f652d2e38843..d393976a8bfc 100644
> > --- a/drivers/dma/dw-edma/dw-edma-core.h
> > +++ b/drivers/dma/dw-edma/dw-edma-core.h
> > @@ -118,7 +118,8 @@ struct dw_edma {
> >
> > struct dw_edma_chan *chan;
> >
> > - raw_spinlock_t lock; /* Only for legacy */
> > + /* For legacy + shared regs RMW among channels */
> > + raw_spinlock_t lock;
> >
> > struct dw_edma_chip *chip;
> >
> > diff --git a/drivers/dma/dw-edma/dw-edma-v0-core.c b/drivers/dma/dw-edma/dw-edma-v0-core.c
> > index 42a254eb9379..770b011ba3e4 100644
> > --- a/drivers/dma/dw-edma/dw-edma-v0-core.c
> > +++ b/drivers/dma/dw-edma/dw-edma-v0-core.c
> > @@ -369,7 +369,8 @@ static void dw_edma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
> > {
> > struct dw_edma_chan *chan = chunk->chan;
> > struct dw_edma *dw = chan->dw;
> > - u32 tmp;
> > + unsigned long flags;
> > + u32 tmp, orig;
> >
> > dw_edma_v0_core_write_chunk(chunk);
> >
> > @@ -413,7 +414,9 @@ static void dw_edma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
> > }
> > }
> > /* Interrupt mask/unmask - done, abort */
> > + raw_spin_lock_irqsave(&dw->lock, flags);
> > tmp = GET_RW_32(dw, chan->dir, int_mask);
> > + orig = tmp;
> > if (chan->irq_mode == DW_EDMA_CH_IRQ_REMOTE) {
> > tmp |= FIELD_PREP(EDMA_V0_DONE_INT_MASK, BIT(chan->id));
> > tmp |= FIELD_PREP(EDMA_V0_ABORT_INT_MASK, BIT(chan->id));
> > @@ -421,11 +424,15 @@ static void dw_edma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
> > tmp &= ~FIELD_PREP(EDMA_V0_DONE_INT_MASK, BIT(chan->id));
> > tmp &= ~FIELD_PREP(EDMA_V0_ABORT_INT_MASK, BIT(chan->id));
> > }
> > - SET_RW_32(dw, chan->dir, int_mask, tmp);
> > + if (tmp != orig)
> > + SET_RW_32(dw, chan->dir, int_mask, tmp);
> > /* Linked list error */
> > tmp = GET_RW_32(dw, chan->dir, linked_list_err_en);
> > + orig = tmp;
> > tmp |= FIELD_PREP(EDMA_V0_LINKED_LIST_ERR_MASK, BIT(chan->id));
> > - SET_RW_32(dw, chan->dir, linked_list_err_en, tmp);
> > + if (tmp != orig)
> > + SET_RW_32(dw, chan->dir, linked_list_err_en, tmp);
> > + raw_spin_unlock_irqrestore(&dw->lock, flags);
> > /* Channel control */
> > SET_CH_32(dw, chan->dir, chan->id, ch_control1,
> > (DW_EDMA_V0_CCS | DW_EDMA_V0_LLE));
> > --
> > 2.51.0
> >
^ permalink raw reply [flat|nested] 61+ messages in thread
* [RFC PATCH v3 23/35] NTB: ntb_transport: Split core into ntb_transport_core.c
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
` (21 preceding siblings ...)
2025-12-17 15:15 ` [RFC PATCH v3 22/35] dmaengine: dw-edma: Serialize RMW on shared interrupt registers Koichiro Den
@ 2025-12-17 15:15 ` Koichiro Den
2025-12-17 15:15 ` [RFC PATCH v3 24/35] NTB: ntb_transport: Add additional hooks for DW eDMA backend Koichiro Den
` (12 subsequent siblings)
35 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:15 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
Prepare ntb_transport for alternative backends by renaming the current
implementation to ntb_transport_core.c and switching the module build to
ntb_transport-y.
No functional change.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/ntb/Makefile | 2 ++
drivers/ntb/{ntb_transport.c => ntb_transport_core.c} | 0
2 files changed, 2 insertions(+)
rename drivers/ntb/{ntb_transport.c => ntb_transport_core.c} (100%)
diff --git a/drivers/ntb/Makefile b/drivers/ntb/Makefile
index 3a6fa181ff99..9b66e5fafbc0 100644
--- a/drivers/ntb/Makefile
+++ b/drivers/ntb/Makefile
@@ -4,3 +4,5 @@ obj-$(CONFIG_NTB_TRANSPORT) += ntb_transport.o
ntb-y := core.o
ntb-$(CONFIG_NTB_MSI) += msi.o
+
+ntb_transport-y := ntb_transport_core.o
diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport_core.c
similarity index 100%
rename from drivers/ntb/ntb_transport.c
rename to drivers/ntb/ntb_transport_core.c
--
2.51.0
^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [RFC PATCH v3 24/35] NTB: ntb_transport: Add additional hooks for DW eDMA backend
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
` (22 preceding siblings ...)
2025-12-17 15:15 ` [RFC PATCH v3 23/35] NTB: ntb_transport: Split core into ntb_transport_core.c Koichiro Den
@ 2025-12-17 15:15 ` Koichiro Den
2025-12-17 15:15 ` [RFC PATCH v3 25/35] NTB: hw: Introduce DesignWare eDMA helper Koichiro Den
` (11 subsequent siblings)
35 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:15 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
Add the infrastructure needed by the upcoming DW eDMA-backed backend:
- add hooks and their invocations
(.enable/.disable/.pre_link_up/.post_link_up/.qp_init/.qp_free)
- store backend-private pointers in ctx/qp
No functional changes.
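The optional-hook pattern used throughout the patch can be sketched standalone (plain C, not kernel code: struct names loosely mirror ntb_transport_backend_ops, and be_enable()/be_disable() play the role of a backend implementation):

```c
#include <assert.h>
#include <stddef.h>

struct model_ctx;

struct model_backend_ops {
	int (*enable)(struct model_ctx *nt, unsigned int *mw_count);
	void (*disable)(struct model_ctx *nt);
};

struct model_ctx {
	struct model_backend_ops ops;
	unsigned int mw_count;
	int enabled;
};

/* Every hook is optional: call it only when the backend filled it in. */
static int model_probe(struct model_ctx *nt)
{
	if (nt->ops.enable) {
		int rc = nt->ops.enable(nt, &nt->mw_count);

		if (rc)
			return rc;
	}
	return 0;
}

static void model_teardown(struct model_ctx *nt)
{
	if (nt->ops.disable)
		nt->ops.disable(nt);
}

static int be_enable(struct model_ctx *nt, unsigned int *mw_count)
{
	*mw_count = 2;	/* a backend may adjust the usable MW count */
	nt->enabled = 1;
	return 0;
}

static void be_disable(struct model_ctx *nt)
{
	nt->enabled = 0;
}
```

A context with no hooks registered behaves exactly as before, which is how the patch keeps the default (non-eDMA) transport path unchanged.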
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/ntb/ntb_transport_core.c | 34 ++++++++++++++++++++++++++++
drivers/ntb/ntb_transport_internal.h | 20 ++++++++++++++++
2 files changed, 54 insertions(+)
diff --git a/drivers/ntb/ntb_transport_core.c b/drivers/ntb/ntb_transport_core.c
index bff8b41a0d3e..40c2548f5930 100644
--- a/drivers/ntb/ntb_transport_core.c
+++ b/drivers/ntb/ntb_transport_core.c
@@ -879,6 +879,9 @@ static void ntb_transport_link_cleanup(struct ntb_transport_ctx *nt)
count = ntb_spad_count(nt->ndev);
for (i = 0; i < count; i++)
ntb_spad_write(nt->ndev, i, 0);
+
+ if (nt->backend_ops.disable)
+ nt->backend_ops.disable(nt);
}
static void ntb_transport_link_cleanup_work(struct work_struct *work)
@@ -915,6 +918,12 @@ static void ntb_transport_link_work(struct work_struct *work)
/* send the local info, in the opposite order of the way we read it */
+ if (nt->backend_ops.pre_link_up) {
+ rc = nt->backend_ops.pre_link_up(nt);
+ if (rc)
+ return;
+ }
+
if (nt->use_msi) {
rc = ntb_msi_setup_mws(ndev);
if (rc) {
@@ -996,6 +1005,12 @@ static void ntb_transport_link_work(struct work_struct *work)
nt->link_is_up = true;
+ if (nt->backend_ops.post_link_up) {
+ rc = nt->backend_ops.post_link_up(nt);
+ if (rc)
+ return;
+ }
+
for (i = 0; i < nt->qp_count; i++) {
struct ntb_transport_qp *qp = &nt->qp_vec[i];
@@ -1178,6 +1193,12 @@ static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
if (rc)
return rc;
+ if (nt->backend_ops.enable) {
+ rc = nt->backend_ops.enable(nt, &mw_count);
+ if (rc)
+ goto err;
+ }
+
/*
* If we are using MSI, and have at least one extra memory window,
* we will reserve the last MW for the MSI window.
@@ -1267,6 +1288,12 @@ static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
rc = ntb_transport_init_queue(nt, i);
if (rc)
goto err2;
+
+ if (nt->backend_ops.qp_init) {
+ rc = nt->backend_ops.qp_init(nt, i);
+ if (rc)
+ goto err2;
+ }
}
INIT_DELAYED_WORK(&nt->link_work, ntb_transport_link_work);
@@ -1298,6 +1325,9 @@ static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
}
kfree(nt->mw_vec);
err:
+ if (nt->backend_ops.disable)
+ nt->backend_ops.disable(nt);
+
kfree(nt);
return rc;
}
@@ -2021,6 +2051,7 @@ EXPORT_SYMBOL_GPL(ntb_transport_create_queue);
*/
void ntb_transport_free_queue(struct ntb_transport_qp *qp)
{
+ struct ntb_transport_ctx *nt = qp->transport;
struct pci_dev *pdev;
struct ntb_queue_entry *entry;
u64 qp_bit;
@@ -2074,6 +2105,9 @@ void ntb_transport_free_queue(struct ntb_transport_qp *qp)
cancel_delayed_work_sync(&qp->link_work);
+ if (nt->backend_ops.qp_free)
+ nt->backend_ops.qp_free(qp);
+
qp->cb_data = NULL;
qp->rx_handler = NULL;
qp->tx_handler = NULL;
diff --git a/drivers/ntb/ntb_transport_internal.h b/drivers/ntb/ntb_transport_internal.h
index 33c06be36dfd..51ff08062d73 100644
--- a/drivers/ntb/ntb_transport_internal.h
+++ b/drivers/ntb/ntb_transport_internal.h
@@ -106,6 +106,9 @@ struct ntb_transport_qp {
int msi_irq;
struct ntb_msi_desc msi_desc;
struct ntb_msi_desc peer_msi_desc;
+
+ /* Backend-specific */
+ void *priv;
};
struct ntb_transport_mw {
@@ -122,6 +125,14 @@ struct ntb_transport_mw {
/**
* struct ntb_transport_backend_ops - backend-specific transport hooks
+ * @enable: Optional. Enable backend. Called once on
+ * ntb_transport_probe().
+ * @disable: Optional. Backend teardown hook.
+ * @qp_init: Optional. QP initialization hook called on
+ * ntb_transport_probe().
+ * @qp_free: Optional. Undo qp_init.
+ * @pre_link_up: Optional. Called before link-up handshake.
+ * @post_link_up: Optional. Called after link-up handshake.
* @setup_qp_mw: Set up memory windows for a given queue pair.
* @tx_free_entry: Return the number of free TX entries for the queue pair.
* @tx_enqueue: Backend-specific TX enqueue implementation.
@@ -130,6 +141,12 @@ struct ntb_transport_mw {
* @debugfs_stats_show: Dump backend-specific statistics, if any.
*/
struct ntb_transport_backend_ops {
+ int (*enable)(struct ntb_transport_ctx *nt, unsigned int *mw_count);
+ void (*disable)(struct ntb_transport_ctx *nt);
+ int (*qp_init)(struct ntb_transport_ctx *nt, unsigned int qp_num);
+ void (*qp_free)(struct ntb_transport_qp *qp);
+ int (*pre_link_up)(struct ntb_transport_ctx *nt);
+ int (*post_link_up)(struct ntb_transport_ctx *nt);
int (*setup_qp_mw)(struct ntb_transport_ctx *nt, unsigned int qp_num);
unsigned int (*tx_free_entry)(struct ntb_transport_qp *qp);
int (*tx_enqueue)(struct ntb_transport_qp *qp, struct ntb_queue_entry *entry,
@@ -166,6 +183,9 @@ struct ntb_transport_ctx {
/* Make sure workq of link event be executed serially */
struct mutex link_event_lock;
+
+ /* Backend-specific context */
+ void *priv;
};
enum {
--
2.51.0
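The backend hooks documented above are all optional and NULL-checked at every call site (as in ntb_transport_probe() and ntb_transport_free_queue()). A minimal userspace sketch of that dispatch idiom, with stand-in context structs and hypothetical demo hooks (only the NULL-checked-call pattern mirrors the patch):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct ntb_transport_ctx; fields are illustrative only. */
struct nt_ctx { int enabled; };

/* Trimmed-down model of struct ntb_transport_backend_ops: every op may
 * be NULL, meaning "this backend does not need the hook". */
struct backend_ops {
	int  (*enable)(struct nt_ctx *nt);
	void (*disable)(struct nt_ctx *nt);
};

static int demo_enable(struct nt_ctx *nt)  { nt->enabled = 1; return 0; }
static void demo_disable(struct nt_ctx *nt) { nt->enabled = 0; }

/* NULL-safe dispatch: a missing hook is a successful no-op. */
static int backend_enable(const struct backend_ops *ops, struct nt_ctx *nt)
{
	return ops->enable ? ops->enable(nt) : 0;
}

static void backend_disable(const struct backend_ops *ops, struct nt_ctx *nt)
{
	if (ops->disable)
		ops->disable(nt);
}
```

A backend that sets no hooks (the existing memcpy transport) therefore behaves exactly as before, while the eDMA backend fills in the ops it needs.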
* [RFC PATCH v3 25/35] NTB: hw: Introduce DesignWare eDMA helper
From: Koichiro Den @ 2025-12-17 15:15 UTC (permalink / raw)
Add a helper library under drivers/ntb/hw/edma/ for use by the NTB
transport remote-eDMA backend. This is not a standalone NTB hardware
driver; it encapsulates the DesignWare eDMA-specific plumbing.
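The inbound memory-window layout programmed by ntb_edma_setup_mws() below (info page, doorbell page, register block rounded up to a power of two, then one LL page per channel) can be sketched in userspace C. The page size, channel count, and example register-window size are assumptions for illustration; the layout arithmetic mirrors the patch:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096u	/* assumed PAGE_SIZE */
#define SKETCH_CH_NUM		5u	/* NTB_EDMA_TOTAL_CH_NUM (4 + notify) */

/* Minimal stand-in for the kernel's roundup_pow_of_two(). */
static uint64_t roundup_pow_of_two_u64(uint64_t v)
{
	uint64_t r = 1;

	while (r < v)
		r <<= 1;
	return r;
}

/* Total MW space needed, as checked against ntb_mw_get_align()'s
 * size_max before any iATU mapping is programmed. */
static uint64_t mw_bytes_needed(uint64_t reg_size)
{
	uint64_t info_bytes = SKETCH_PAGE_SIZE;	/* struct ntb_edma_info */
	uint64_t db_bytes = SKETCH_PAGE_SIZE;	/* struct ntb_edma_db */
	uint64_t ll_bytes = SKETCH_CH_NUM * SKETCH_PAGE_SIZE;

	return info_bytes + db_bytes +
	       roundup_pow_of_two_u64(reg_size) + ll_bytes;
}
```

With a 4 KiB register block this yields 8 pages; a 0x1200-byte block rounds up to 8 KiB and yields 9 pages, which is why size_max must be validated before mapping.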
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/ntb/hw/edma/ntb_hw_edma.c | 754 ++++++++++++++++++++++++++++++
drivers/ntb/hw/edma/ntb_hw_edma.h | 76 +++
2 files changed, 830 insertions(+)
create mode 100644 drivers/ntb/hw/edma/ntb_hw_edma.c
create mode 100644 drivers/ntb/hw/edma/ntb_hw_edma.h
diff --git a/drivers/ntb/hw/edma/ntb_hw_edma.c b/drivers/ntb/hw/edma/ntb_hw_edma.c
new file mode 100644
index 000000000000..50c4ddee285f
--- /dev/null
+++ b/drivers/ntb/hw/edma/ntb_hw_edma.c
@@ -0,0 +1,754 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * NTB remote DesignWare eDMA helpers
+ *
+ * This file is a helper library used by the NTB transport remote-eDMA backend,
+ * not a standalone NTB hardware driver. It contains the DesignWare eDMA
+ * specific plumbing needed to expose/map peer-accessible resources via an NTB
+ * memory window and to manage DMA channels and peer notifications.
+ */
+
+#include <linux/dma/edma.h>
+#include <linux/dmaengine.h>
+#include <linux/device.h>
+#include <linux/iommu.h>
+#include <linux/irqdomain.h>
+#include <linux/ntb.h>
+#include <linux/pci.h>
+#include <linux/pci-epc.h>
+#include <linux/spinlock.h>
+
+#include "ntb_hw_edma.h"
+
+/* Default eDMA LLP memory size */
+#define DMA_LLP_MEM_SIZE PAGE_SIZE
+
+#define NTB_EDMA_MW_IDX_INVALID (-1)
+
+struct ntb_edma_ctx {
+ bool initialized;
+
+ /* Fields for the notification handling */
+ u32 qp_count;
+ u32 *notify_src_virt;
+ dma_addr_t notify_src_phys;
+ struct scatterlist sgl;
+
+ /* Host-to-EP scratch buffer conveying event info */
+ union {
+ struct ntb_edma_db *db_virt;
+ struct ntb_edma_db __iomem *db_io;
+ };
+ dma_addr_t db_phys;
+
+ /* Records kept for the teardown path */
+
+ /* For ntb_edma_info to be unmapped on teardown */
+ struct ntb_edma_info *info_virt;
+ dma_addr_t info_phys;
+ size_t info_bytes;
+
+ int mw_index;
+ bool mw_trans_set;
+
+ /* eDMA register window IOMMU mapping (EP side) */
+ bool reg_mapped;
+ struct iommu_domain *iommu_dom;
+ unsigned long reg_iova;
+ size_t reg_iova_size;
+
+ /* Read channels delegated to the host side (EP side) */
+ struct dma_chan *dchan[NTB_EDMA_TOTAL_CH_NUM];
+
+ /* RC-side state */
+ bool peer_initialized;
+ bool peer_probed;
+ struct dw_edma_chip *peer_chip;
+ void __iomem *peer_virt;
+ resource_size_t peer_virt_size;
+};
+
+typedef void (*ntb_edma_interrupt_cb_t)(void *data, int qp_num);
+
+struct ntb_edma_interrupt {
+ ntb_edma_interrupt_cb_t cb;
+ void *data;
+};
+
+struct ntb_edma_filter {
+ struct device *dma_dev;
+ u32 direction;
+};
+
+static struct ntb_edma_ctx edma_ctx;
+static struct ntb_edma_interrupt intr;
+
+static DEFINE_SPINLOCK(ntb_edma_notify_lock);
+
+static bool ntb_edma_filter_fn(struct dma_chan *chan, void *arg)
+{
+ struct ntb_edma_filter *filter = arg;
+ u32 dir = filter->direction;
+ struct dma_slave_caps caps;
+ int ret;
+
+ if (chan->device->dev != filter->dma_dev)
+ return false;
+
+ ret = dma_get_slave_caps(chan, &caps);
+ if (ret < 0)
+ return false;
+
+ return !!(caps.directions & dir);
+}
+
+static void ntb_edma_notify_cb(struct dma_chan *dchan, void *data)
+{
+ struct ntb_edma_interrupt *v = data;
+ ntb_edma_interrupt_cb_t cb;
+ struct ntb_edma_db *db;
+ void *cb_data;
+ u32 qp_count;
+ u32 i, val;
+
+ guard(spinlock_irqsave)(&ntb_edma_notify_lock);
+
+ cb = v->cb;
+ cb_data = v->data;
+ qp_count = edma_ctx.qp_count;
+ db = edma_ctx.db_virt;
+ if (!cb || !db)
+ return;
+
+ for (i = 0; i < qp_count; i++) {
+ val = READ_ONCE(db->db[i]);
+ if (!val)
+ continue;
+
+ WRITE_ONCE(db->db[i], 0);
+ cb(cb_data, i);
+ }
+}
+
+static void ntb_edma_undelegate_chans(struct ntb_edma_ctx *ctx)
+{
+ unsigned int i;
+
+ if (!ctx)
+ return;
+
+ scoped_guard(spinlock_irqsave, &ntb_edma_notify_lock) {
+ intr.cb = NULL;
+ intr.data = NULL;
+ }
+
+ for (i = 0; i < NTB_EDMA_TOTAL_CH_NUM; i++) {
+ if (!ctx->dchan[i])
+ continue;
+
+ if (i == NTB_EDMA_CH_NUM)
+ dw_edma_chan_register_notify(ctx->dchan[i], NULL, NULL);
+
+ dma_release_channel(ctx->dchan[i]);
+ ctx->dchan[i] = NULL;
+ }
+}
+
+static int ntb_edma_delegate_chans(struct device *dev, struct ntb_edma_ctx *ctx,
+ struct ntb_edma_info *info,
+ ntb_edma_interrupt_cb_t cb, void *data)
+{
+ struct ntb_edma_filter filter;
+ struct dw_edma_region region;
+ dma_cap_mask_t dma_mask;
+ struct dma_chan *chan;
+ unsigned int i;
+ int rc;
+
+ dma_cap_zero(dma_mask);
+ dma_cap_set(DMA_SLAVE, dma_mask);
+
+ filter.dma_dev = dev;
+
+ /* Configure read channels, which will be driven by the host side */
+ for (i = 0; i < NTB_EDMA_TOTAL_CH_NUM; i++) {
+ filter.direction = BIT(DMA_DEV_TO_MEM);
+ chan = dma_request_channel(dma_mask, ntb_edma_filter_fn,
+ &filter);
+ if (!chan) {
+ rc = -ENODEV;
+ goto err;
+ }
+ ctx->dchan[i] = chan;
+
+ if (i == NTB_EDMA_CH_NUM) {
+ scoped_guard(spinlock_irqsave, &ntb_edma_notify_lock) {
+ intr.cb = cb;
+ intr.data = data;
+ }
+ rc = dw_edma_chan_register_notify(
+ chan, ntb_edma_notify_cb, &intr);
+ if (rc)
+ goto err;
+ } else {
+ rc = dw_edma_chan_irq_config(chan, DW_EDMA_CH_IRQ_REMOTE);
+ if (rc)
+ dev_warn(dev, "irq config failed (i=%u %d)\n",
+ i, rc);
+ }
+
+ rc = dw_edma_chan_get_ll_region(chan, ®ion);
+ if (rc)
+ goto err;
+
+ info->ll_rd_phys[i] = region.paddr;
+ }
+
+ return 0;
+
+err:
+ ntb_edma_undelegate_chans(ctx);
+ return rc;
+}
+
+static void ntb_edma_ctx_reset(struct ntb_edma_ctx *ctx)
+{
+ ctx->initialized = false;
+ ctx->mw_index = NTB_EDMA_MW_IDX_INVALID;
+ ctx->mw_trans_set = false;
+ ctx->reg_mapped = false;
+ ctx->iommu_dom = NULL;
+ ctx->reg_iova = 0;
+ ctx->reg_iova_size = 0;
+ ctx->db_phys = 0;
+ ctx->qp_count = 0;
+ ctx->info_virt = NULL;
+ ctx->info_phys = 0;
+ ctx->info_bytes = 0;
+ ctx->db_virt = NULL;
+ memset(ctx->dchan, 0, sizeof(ctx->dchan));
+}
+
+int ntb_edma_setup_mws(struct ntb_dev *ndev, int mw_index,
+ unsigned int qp_count, ntb_edma_interrupt_cb_t cb,
+ void *data)
+{
+ struct ntb_edma_ctx *ctx = &edma_ctx;
+ const size_t info_bytes = PAGE_SIZE;
+ resource_size_t size_max, offset;
+ dma_addr_t db_phys, info_phys;
+ size_t reg_size, reg_size_mw;
+ struct ntb_edma_info *info;
+ phys_addr_t edma_reg_phys;
+ struct iommu_domain *dom;
+ struct ntb_edma_db *db;
+ size_t ll_bytes, size;
+ struct pci_epc *epc;
+ struct device *dev;
+ unsigned long iova;
+ phys_addr_t phys;
+ u64 need;
+ int rc;
+ u32 i;
+
+ if (ctx->initialized)
+ return 0;
+
+ /* Clean up stale state from a previous failed attempt. */
+ ntb_edma_teardown_mws(ndev);
+
+ epc = (struct pci_epc *)ntb_get_private_data(ndev);
+ if (!epc)
+ return -ENODEV;
+ dev = epc->dev.parent;
+
+ ntb_edma_ctx_reset(ctx);
+
+ ctx->mw_index = mw_index;
+ ctx->qp_count = qp_count;
+
+ info = dma_alloc_coherent(dev, info_bytes, &info_phys, GFP_KERNEL);
+ if (!info)
+ return -ENOMEM;
+ memset(info, 0, info_bytes);
+
+ ctx->info_virt = info;
+ ctx->info_phys = info_phys;
+ ctx->info_bytes = info_bytes;
+
+ /* Get eDMA reg base and size, IOMMU map it if necessary */
+ rc = dw_edma_get_reg_window(epc, &edma_reg_phys, ®_size);
+ if (rc) {
+ dev_err(&ndev->pdev->dev,
+ "failed to get eDMA register window: %d\n", rc);
+ goto err;
+ }
+ dom = iommu_get_domain_for_dev(dev);
+ if (dom) {
+ phys = edma_reg_phys & PAGE_MASK;
+ size = PAGE_ALIGN(reg_size + edma_reg_phys - phys);
+ iova = phys;
+
+ rc = iommu_map(dom, iova, phys, size,
+ IOMMU_READ | IOMMU_WRITE | IOMMU_MMIO,
+ GFP_KERNEL);
+ if (rc) {
+ dev_err(&ndev->dev,
+ "failed to direct map eDMA reg: %d\n", rc);
+ goto err;
+ }
+
+ ctx->reg_mapped = true;
+ ctx->iommu_dom = dom;
+ ctx->reg_iova = iova;
+ ctx->reg_iova_size = size;
+ }
+
+ /* Read channels are driven by the peer (host side) */
+ rc = ntb_edma_delegate_chans(dev, ctx, info, cb, data);
+ if (rc) {
+ dev_err(&ndev->pdev->dev,
+ "failed to prepare channels to delegate: %d\n", rc);
+ goto err;
+ }
+
+ /* Scratch buffer for notification */
+ db = dma_alloc_coherent(dev, sizeof(*db), &db_phys, GFP_KERNEL);
+ if (!db) {
+ rc = -ENOMEM;
+ goto err;
+ }
+ memset(db, 0, sizeof(*db));
+
+ ctx->db_virt = db;
+ ctx->db_phys = db_phys;
+
+ /* Prepare for inbound (IB) iATU mappings */
+ ll_bytes = NTB_EDMA_TOTAL_CH_NUM * DMA_LLP_MEM_SIZE;
+ reg_size_mw = roundup_pow_of_two(reg_size);
+ need = info_bytes + PAGE_SIZE + reg_size_mw + ll_bytes;
+
+ rc = ntb_mw_get_align(ndev, 0, mw_index, NULL, NULL, &size_max, &offset);
+ if (rc)
+ goto err;
+
+ if (size_max < need) {
+ rc = -ENOSPC;
+ goto err;
+ }
+
+ /* iATU map ntb_edma_info */
+ rc = ntb_mw_set_trans(ndev, 0, mw_index, info_phys, info_bytes, offset);
+ if (rc)
+ goto err;
+ ctx->mw_trans_set = true;
+ offset += info_bytes;
+
+ /* iATU map ntb_edma_db */
+ rc = ntb_mw_set_trans(ndev, 0, mw_index, db_phys, PAGE_SIZE, offset);
+ if (rc)
+ goto err;
+ offset += PAGE_SIZE;
+
+ /* iATU map eDMA reg */
+ rc = ntb_mw_set_trans(ndev, 0, mw_index, edma_reg_phys, reg_size_mw,
+ offset);
+ if (rc)
+ goto err;
+ offset += reg_size_mw;
+
+ /* iATU map LL location */
+ for (i = 0; i < NTB_EDMA_TOTAL_CH_NUM; i++) {
+ rc = ntb_mw_set_trans(ndev, 0, mw_index, info->ll_rd_phys[i],
+ DMA_LLP_MEM_SIZE, offset);
+ if (rc)
+ goto err;
+ offset += DMA_LLP_MEM_SIZE;
+ }
+
+ /* Fill in info */
+ info->magic = NTB_EDMA_INFO_MAGIC;
+ info->reg_size = reg_size_mw;
+ info->ch_cnt = NTB_EDMA_TOTAL_CH_NUM;
+ info->db_base = db_phys;
+
+ ctx->initialized = true;
+ return 0;
+
+err:
+ ntb_edma_teardown_mws(ndev);
+ return rc;
+}
+
+static int ntb_edma_irq_vector(struct device *dev, unsigned int nr)
+{
+ struct pci_dev *pdev = to_pci_dev(dev);
+ int ret, nvec;
+
+ nvec = pci_msi_vec_count(pdev);
+ for (; nr < nvec; nr++) {
+ ret = pci_irq_vector(pdev, nr);
+ if (!irq_has_action(ret))
+ return ret;
+ }
+ return 0;
+}
+
+static const struct dw_edma_plat_ops ntb_edma_ops = {
+ .irq_vector = ntb_edma_irq_vector,
+};
+
+int ntb_edma_setup_peer(struct ntb_dev *ndev, int mw_index,
+ unsigned int qp_count)
+{
+ struct ntb_edma_ctx *ctx = &edma_ctx;
+ struct ntb_edma_info __iomem *info;
+ struct dw_edma_chip *chip;
+ void __iomem *edma_virt;
+ resource_size_t mw_size;
+ phys_addr_t edma_phys;
+ unsigned int ch_cnt;
+ unsigned int i;
+ int ret;
+ u64 off;
+
+ if (ctx->peer_initialized)
+ return 0;
+
+ /* Clean up stale state from a previous failed attempt. */
+ ntb_edma_teardown_peer(ndev);
+
+ ret = ntb_peer_mw_get_addr(ndev, mw_index, &edma_phys, &mw_size);
+ if (ret)
+ return ret;
+
+ edma_virt = ioremap(edma_phys, mw_size);
+ if (!edma_virt)
+ return -ENOMEM;
+
+ ctx->peer_virt = edma_virt;
+ ctx->peer_virt_size = mw_size;
+
+ info = edma_virt;
+ if (readl(&info->magic) != NTB_EDMA_INFO_MAGIC) {
+ ret = -EINVAL;
+ goto err;
+ }
+
+ ch_cnt = readw(&info->ch_cnt);
+ if (ch_cnt != NTB_EDMA_TOTAL_CH_NUM) {
+ ret = -EINVAL;
+ goto err;
+ }
+
+ chip = devm_kzalloc(&ndev->dev, sizeof(*chip), GFP_KERNEL);
+ if (!chip) {
+ ret = -ENOMEM;
+ goto err;
+ }
+
+ off = 2 * PAGE_SIZE;
+ chip->dev = &ndev->pdev->dev;
+ chip->nr_irqs = 4;
+ chip->ops = &ntb_edma_ops;
+ chip->flags = 0;
+ chip->reg_base = edma_virt + off;
+ chip->mf = EDMA_MF_EDMA_UNROLL;
+ chip->ll_wr_cnt = 0;
+ chip->ll_rd_cnt = ch_cnt;
+
+ ctx->db_io = (void __iomem *)edma_virt + PAGE_SIZE;
+ ctx->qp_count = qp_count;
+ ctx->db_phys = readq(&info->db_base);
+
+ ctx->notify_src_virt = dma_alloc_coherent(&ndev->pdev->dev,
+ sizeof(*ctx->notify_src_virt),
+ &ctx->notify_src_phys,
+ GFP_KERNEL);
+ if (!ctx->notify_src_virt) {
+ ret = -ENOMEM;
+ goto err;
+ }
+
+ off += readl(&info->reg_size);
+
+ for (i = 0; i < ch_cnt; i++) {
+ chip->ll_region_rd[i].vaddr.io = edma_virt + off;
+ chip->ll_region_rd[i].paddr = readq(&info->ll_rd_phys[i]);
+ chip->ll_region_rd[i].sz = DMA_LLP_MEM_SIZE;
+ off += DMA_LLP_MEM_SIZE;
+ }
+
+ if (!pci_dev_msi_enabled(ndev->pdev)) {
+ ret = -ENXIO;
+ goto err;
+ }
+
+ ret = dw_edma_probe(chip);
+ if (ret) {
+ dev_err(&ndev->dev, "dw_edma_probe failed: %d\n", ret);
+ goto err;
+ }
+
+ ctx->peer_chip = chip;
+ ctx->peer_probed = true;
+ ctx->peer_initialized = true;
+ return 0;
+
+err:
+ ntb_edma_teardown_peer(ndev);
+ return ret;
+}
+
+void ntb_edma_teardown_mws(struct ntb_dev *ndev)
+{
+ struct ntb_edma_ctx *ctx = &edma_ctx;
+ struct device *dev = NULL;
+ struct pci_epc *epc;
+ struct ntb_edma_db *db;
+ struct ntb_edma_info *info;
+ dma_addr_t db_phys, info_phys;
+ size_t info_bytes;
+
+ epc = (struct pci_epc *)ntb_get_private_data(ndev);
+ WARN_ON(!epc);
+ if (epc)
+ dev = epc->dev.parent;
+
+ scoped_guard(spinlock_irqsave, &ntb_edma_notify_lock) {
+ db = ctx->db_virt;
+ db_phys = ctx->db_phys;
+
+ /* Make callbacks no-op first. */
+ intr.cb = NULL;
+ intr.data = NULL;
+ ctx->db_virt = NULL;
+ ctx->qp_count = 0;
+ }
+
+ info = ctx->info_virt;
+ info_phys = ctx->info_phys;
+ info_bytes = ctx->info_bytes;
+
+ /* Disconnect the MW before freeing its backing memory */
+ if (ctx->mw_trans_set && ctx->mw_index != NTB_EDMA_MW_IDX_INVALID)
+ ntb_mw_clear_trans(ndev, 0, ctx->mw_index);
+
+ ntb_edma_undelegate_chans(ctx);
+
+ if (ctx->reg_mapped)
+ iommu_unmap(ctx->iommu_dom, ctx->reg_iova, ctx->reg_iova_size);
+
+ if (db && dev)
+ dma_free_coherent(dev, sizeof(*db), db, db_phys);
+
+ if (info && dev && info_bytes)
+ dma_free_coherent(dev, info_bytes, info, info_phys);
+
+ ntb_edma_ctx_reset(ctx);
+}
+
+void ntb_edma_teardown_peer(struct ntb_dev *ndev)
+{
+ struct ntb_edma_ctx *ctx = &edma_ctx;
+ void __iomem *peer_virt = ctx->peer_virt;
+ struct dw_edma_chip *chip = ctx->peer_chip;
+ u32 *notify_src = ctx->notify_src_virt;
+ dma_addr_t notify_src_phys = ctx->notify_src_phys;
+
+ /* Stop using peer MMIO early. */
+ ctx->db_io = NULL;
+ ctx->db_phys = 0;
+ ctx->qp_count = 0;
+
+ if (ctx->peer_probed && chip)
+ dw_edma_remove(chip);
+
+ ctx->peer_initialized = false;
+ ctx->peer_probed = false;
+ ctx->peer_chip = NULL;
+
+ if (notify_src)
+ dma_free_coherent(&ndev->pdev->dev, sizeof(*notify_src),
+ notify_src, notify_src_phys);
+
+ ctx->notify_src_virt = NULL;
+ ctx->notify_src_phys = 0;
+ memset(&ctx->sgl, 0, sizeof(ctx->sgl));
+
+ if (peer_virt)
+ iounmap(peer_virt);
+
+ ctx->peer_virt = NULL;
+ ctx->peer_virt_size = 0;
+}
+
+void ntb_edma_teardown_chans(struct ntb_edma_chans *edma)
+{
+ unsigned int i;
+
+ if (!edma)
+ return;
+
+ for (i = 0; i < NTB_EDMA_CH_NUM; i++) {
+ if (!edma->chan[i])
+ continue;
+ dma_release_channel(edma->chan[i]);
+ edma->chan[i] = NULL;
+ }
+ edma->num_chans = 0;
+
+ if (edma->intr_chan) {
+ dma_release_channel(edma->intr_chan);
+ edma->intr_chan = NULL;
+ }
+
+ atomic_set(&edma->cur_chan, 0);
+}
+
+int ntb_edma_setup_chans(struct device *dev, struct ntb_edma_chans *edma,
+ bool remote)
+{
+ struct ntb_edma_filter filter;
+ dma_cap_mask_t dma_mask;
+ unsigned int i;
+ int rc;
+
+ dma_cap_zero(dma_mask);
+ dma_cap_set(DMA_SLAVE, dma_mask);
+
+ memset(edma, 0, sizeof(*edma));
+ edma->dev = dev;
+
+ mutex_init(&edma->lock);
+
+ filter.dma_dev = dev;
+ filter.direction = BIT(DMA_MEM_TO_DEV);
+ for (i = 0; i < NTB_EDMA_CH_NUM; i++) {
+ edma->chan[i] = dma_request_channel(
+ dma_mask, ntb_edma_filter_fn, &filter);
+ if (!edma->chan[i])
+ break;
+ edma->num_chans++;
+
+ if (remote)
+ rc = dw_edma_chan_irq_config(edma->chan[i],
+ DW_EDMA_CH_IRQ_REMOTE);
+ else
+ rc = dw_edma_chan_irq_config(edma->chan[i],
+ DW_EDMA_CH_IRQ_LOCAL);
+
+ if (rc) {
+ dev_err(dev, "irq config failed on ch%u: %d\n", i, rc);
+ goto err;
+ }
+ }
+
+ if (!edma->num_chans) {
+ dev_warn(dev, "Remote eDMA channels failed to initialize\n");
+ ntb_edma_teardown_chans(edma);
+ return -ENODEV;
+ }
+ return 0;
+err:
+ ntb_edma_teardown_chans(edma);
+ return rc;
+}
+
+int ntb_edma_setup_intr_chan(struct device *dev, struct ntb_edma_chans *edma)
+{
+ struct ntb_edma_filter filter;
+ dma_cap_mask_t dma_mask;
+ struct dma_slave_config cfg;
+ struct scatterlist *sgl = &edma_ctx.sgl;
+ int rc;
+
+ if (edma->intr_chan)
+ return 0;
+
+ if (!edma_ctx.notify_src_virt || !edma_ctx.db_phys)
+ return -EINVAL;
+
+ dma_cap_zero(dma_mask);
+ dma_cap_set(DMA_SLAVE, dma_mask);
+
+ filter.dma_dev = dev;
+ filter.direction = BIT(DMA_MEM_TO_DEV);
+
+ edma->intr_chan = dma_request_channel(dma_mask, ntb_edma_filter_fn,
+ &filter);
+ if (!edma->intr_chan) {
+ dev_warn(dev,
+ "Remote eDMA notify channel could not be allocated\n");
+ return -ENODEV;
+ }
+
+ rc = dw_edma_chan_irq_config(edma->intr_chan, DW_EDMA_CH_IRQ_LOCAL);
+ if (rc)
+ goto err_release;
+
+ /* Ensure store is visible before kicking DMA transfer */
+ wmb();
+
+ sg_init_table(sgl, 1);
+ sg_dma_address(sgl) = edma_ctx.notify_src_phys;
+ sg_dma_len(sgl) = sizeof(u32);
+
+ memset(&cfg, 0, sizeof(cfg));
+ cfg.dst_addr = edma_ctx.db_phys; /* The first 32bit is 'target' */
+ cfg.src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
+ cfg.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
+ cfg.direction = DMA_MEM_TO_DEV;
+
+ rc = dmaengine_slave_config(edma->intr_chan, &cfg);
+ if (rc)
+ goto err_release;
+
+ return 0;
+
+err_release:
+ dma_release_channel(edma->intr_chan);
+ edma->intr_chan = NULL;
+ return rc;
+}
+
+struct dma_chan *ntb_edma_pick_chan(struct ntb_edma_chans *edma,
+ unsigned int idx)
+{
+ return edma->chan[idx % edma->num_chans];
+}
+
+int ntb_edma_notify_peer(struct ntb_edma_chans *edma, int qp_num)
+{
+ struct dma_async_tx_descriptor *txd;
+ dma_cookie_t cookie;
+
+ if (!edma || !edma->intr_chan)
+ return -ENXIO;
+
+ if (qp_num < 0 || qp_num >= edma_ctx.qp_count)
+ return -EINVAL;
+
+ if (!edma_ctx.db_io)
+ return -EINVAL;
+
+ guard(mutex)(&edma->lock);
+
+ writel(1, &edma_ctx.db_io->db[qp_num]);
+
+ /* Ensure store is visible before kicking the DMA transfer */
+ wmb();
+
+ txd = dmaengine_prep_slave_sg(edma->intr_chan, &edma_ctx.sgl, 1,
+ DMA_MEM_TO_DEV,
+ DMA_CTRL_ACK | DMA_PREP_INTERRUPT);
+ if (!txd)
+ return -ENOSPC;
+
+ cookie = dmaengine_submit(txd);
+ if (dma_submit_error(cookie))
+ return -ENOSPC;
+
+ dma_async_issue_pending(edma->intr_chan);
+ return 0;
+}
diff --git a/drivers/ntb/hw/edma/ntb_hw_edma.h b/drivers/ntb/hw/edma/ntb_hw_edma.h
new file mode 100644
index 000000000000..46b50e504389
--- /dev/null
+++ b/drivers/ntb/hw/edma/ntb_hw_edma.h
@@ -0,0 +1,76 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _NTB_HW_EDMA_H_
+#define _NTB_HW_EDMA_H_
+
+#include <linux/completion.h>
+#include <linux/device.h>
+#include <linux/interrupt.h>
+
+#define NTB_EDMA_CH_NUM 4
+
+/* One extra channel is reserved for notification (RC to EP interrupt kick). */
+#define NTB_EDMA_TOTAL_CH_NUM (NTB_EDMA_CH_NUM + 1)
+
+#define NTB_EDMA_INFO_MAGIC 0x45444D41 /* "EDMA" */
+
+#define NTB_EDMA_NOTIFY_MAX_QP 64
+
+typedef void (*ntb_edma_interrupt_cb_t)(void *data, int qp_num);
+
+/*
+ * REMOTE_EDMA_EP:
+ * Endpoint owns the eDMA engine and pushes descriptors into a shared MW.
+ *
+ * REMOTE_EDMA_RC:
+ * Root Complex controls the endpoint eDMA through the shared MW and
+ * drives reads/writes on behalf of the host.
+ */
+typedef enum {
+ REMOTE_EDMA_UNKNOWN,
+ REMOTE_EDMA_EP,
+ REMOTE_EDMA_RC,
+} remote_edma_mode_t;
+
+struct ntb_edma_info {
+ u32 magic;
+ u32 reg_size;
+ u16 ch_cnt;
+ u64 db_base;
+ u64 ll_rd_phys[NTB_EDMA_TOTAL_CH_NUM];
+};
+
+struct ntb_edma_db {
+ u32 target;
+ u32 db[NTB_EDMA_NOTIFY_MAX_QP];
+};
+
+struct ntb_edma_chans {
+ struct device *dev;
+
+ struct dma_chan *chan[NTB_EDMA_CH_NUM];
+ struct dma_chan *intr_chan;
+
+ unsigned int num_chans;
+ atomic_t cur_chan;
+
+ struct mutex lock;
+};
+
+int ntb_edma_setup_mws(struct ntb_dev *ndev, int mw_index,
+ unsigned int qp_count, ntb_edma_interrupt_cb_t cb,
+ void *data);
+int ntb_edma_setup_peer(struct ntb_dev *ndev, int mw_index,
+ unsigned int qp_count);
+void ntb_edma_teardown_mws(struct ntb_dev *ndev);
+void ntb_edma_teardown_peer(struct ntb_dev *ndev);
+int ntb_edma_setup_chans(struct device *dma_dev, struct ntb_edma_chans *edma,
+ bool remote);
+int ntb_edma_setup_intr_chan(struct device *dma_dev,
+ struct ntb_edma_chans *edma);
+struct dma_chan *ntb_edma_pick_chan(struct ntb_edma_chans *edma,
+ unsigned int idx);
+void ntb_edma_teardown_chans(struct ntb_edma_chans *edma);
+int ntb_edma_notify_peer(struct ntb_edma_chans *edma, int qp_num);
+
+#endif
--
2.51.0
* [RFC PATCH v3 26/35] NTB: ntb_transport: Introduce DW eDMA backed transport mode
From: Koichiro Den @ 2025-12-17 15:16 UTC (permalink / raw)
Add a new ntb_transport backend that uses a DesignWare eDMA engine
located on the endpoint and driven by both the host and the endpoint.
The endpoint exposes a dedicated memory window containing the eDMA
register block, a small control structure (struct ntb_edma_info) and
per-channel linked-list (LL) rings for the read channels. The endpoint
drives its local eDMA write channels for its own transmissions, while
the host drives the remote eDMA read channels for transmissions in the
opposite direction.
A key benefit of this backend is that the memory window no longer needs
to carry data-plane payload. This makes the design less sensitive to
limited memory window space and allows scaling to multiple queue pairs.
The memory window layout is specific to the eDMA-backed backend, so
there is no automatic fallback to the memcpy-based default transport,
which requires a different layout.
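The descriptor rings added by this patch use free-running u32 head/tail counters and keep one slot unused so that full and empty states stay distinguishable. A userspace model of the occupancy math (mirroring ntb_edma_ring_used_entry()/ntb_edma_ring_free_entry(), with the WARN_ON_ONCE sanity checks omitted):

```c
#include <assert.h>
#include <stdint.h>

#define RING_ENTRIES 128u	/* NTB_EDMA_RING_ENTRIES (1 << 7) */

/* Entries produced but not yet consumed; head and tail are free-running
 * u32 counters, so the second branch handles counter wraparound. */
static uint32_t ring_used(uint32_t head, uint32_t tail)
{
	if (head >= tail)
		return head - tail;
	return UINT32_MAX - tail + head + 1;
}

/* One slot is always kept unused, hence the trailing "- 1". */
static uint32_t ring_free(uint32_t head, uint32_t tail)
{
	return RING_ENTRIES - ring_used(head, tail) - 1;
}
```

Slot indices are then derived separately by masking a counter with NTB_EDMA_RING_MASK, as ntb_edma_ring_idx() does.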
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/ntb/Kconfig | 12 +
drivers/ntb/Makefile | 2 +
drivers/ntb/ntb_transport_core.c | 15 +-
drivers/ntb/ntb_transport_edma.c | 987 +++++++++++++++++++++++++++
drivers/ntb/ntb_transport_internal.h | 15 +
5 files changed, 1029 insertions(+), 2 deletions(-)
create mode 100644 drivers/ntb/ntb_transport_edma.c
diff --git a/drivers/ntb/Kconfig b/drivers/ntb/Kconfig
index df16c755b4da..5ba6d0b7f5ba 100644
--- a/drivers/ntb/Kconfig
+++ b/drivers/ntb/Kconfig
@@ -37,4 +37,16 @@ config NTB_TRANSPORT
If unsure, say N.
+config NTB_TRANSPORT_EDMA
+ bool "NTB Transport backed by remote eDMA"
+ depends on NTB_TRANSPORT
+ depends on PCI
+ select DMA_ENGINE
+ select NTB_EDMA
+ help
+ Enable a transport backend that uses a remote DesignWare eDMA engine
+ exposed through a dedicated NTB memory window. The host uses the
+ endpoint's eDMA engine to move data in both directions.
+ Say Y here if you intend to use the 'use_remote_edma' module parameter.
+
endif # NTB
diff --git a/drivers/ntb/Makefile b/drivers/ntb/Makefile
index 9b66e5fafbc0..b9086b32ecde 100644
--- a/drivers/ntb/Makefile
+++ b/drivers/ntb/Makefile
@@ -6,3 +6,5 @@ ntb-y := core.o
ntb-$(CONFIG_NTB_MSI) += msi.o
ntb_transport-y := ntb_transport_core.o
+ntb_transport-$(CONFIG_NTB_TRANSPORT_EDMA) += ntb_transport_edma.o
+ntb_transport-$(CONFIG_NTB_TRANSPORT_EDMA) += hw/edma/ntb_hw_edma.o
diff --git a/drivers/ntb/ntb_transport_core.c b/drivers/ntb/ntb_transport_core.c
index 40c2548f5930..bd21232f26fe 100644
--- a/drivers/ntb/ntb_transport_core.c
+++ b/drivers/ntb/ntb_transport_core.c
@@ -104,6 +104,12 @@ module_param(use_msi, bool, 0644);
MODULE_PARM_DESC(use_msi, "Use MSI interrupts instead of doorbells");
#endif
+bool use_remote_edma;
+#ifdef CONFIG_NTB_TRANSPORT_EDMA
+module_param(use_remote_edma, bool, 0644);
+MODULE_PARM_DESC(use_remote_edma, "Use remote eDMA mode (when enabled, use_msi is ignored)");
+#endif
+
static struct dentry *nt_debugfs_dir;
/* Only two-ports NTB devices are supported */
@@ -156,7 +162,7 @@ enum {
#define drv_client(__drv) \
container_of((__drv), struct ntb_transport_client, driver)
-#define NTB_QP_DEF_NUM_ENTRIES 100
+#define NTB_QP_DEF_NUM_ENTRIES 128
#define NTB_LINK_DOWN_TIMEOUT 10
static void ntb_transport_rxc_db(unsigned long data);
@@ -1189,7 +1195,11 @@ static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
nt->ndev = ndev;
- rc = ntb_transport_default_init(nt);
+ if (use_remote_edma)
+ rc = ntb_transport_edma_init(nt);
+ else
+ rc = ntb_transport_default_init(nt);
+
if (rc)
return rc;
@@ -1950,6 +1960,7 @@ ntb_transport_create_queue(void *data, struct device *client_dev,
nt->qp_bitmap_free &= ~qp_bit;
+ qp->qp_bit = qp_bit;
qp->cb_data = data;
qp->rx_handler = handlers->rx_handler;
qp->tx_handler = handlers->tx_handler;
diff --git a/drivers/ntb/ntb_transport_edma.c b/drivers/ntb/ntb_transport_edma.c
new file mode 100644
index 000000000000..6ae5da0a1367
--- /dev/null
+++ b/drivers/ntb/ntb_transport_edma.c
@@ -0,0 +1,987 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * NTB transport backend for remote DesignWare eDMA.
+ *
+ * This implements the backend_ops used when use_remote_edma=1 and
+ * relies on drivers/ntb/hw/edma/ for low-level eDMA/MW programming.
+ */
+
+#include <linux/bug.h>
+#include <linux/compiler.h>
+#include <linux/debugfs.h>
+#include <linux/dmaengine.h>
+#include <linux/dma-mapping.h>
+#include <linux/errno.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
+#include <linux/ntb.h>
+#include <linux/pci.h>
+#include <linux/pci-epc.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+
+#include "hw/edma/ntb_hw_edma.h"
+#include "ntb_transport_internal.h"
+
+#define NTB_EDMA_RING_ORDER 7
+#define NTB_EDMA_RING_ENTRIES (1U << NTB_EDMA_RING_ORDER)
+#define NTB_EDMA_RING_MASK (NTB_EDMA_RING_ENTRIES - 1)
+
+#define NTB_EDMA_MAX_POLL 32
+
+/*
+ * Remote eDMA mode implementation
+ */
+struct ntb_transport_ctx_edma {
+ remote_edma_mode_t remote_edma_mode;
+ struct device *dma_dev;
+ struct workqueue_struct *wq;
+ struct ntb_edma_chans chans;
+};
+
+struct ntb_transport_qp_edma {
+ struct ntb_transport_qp *qp;
+
+ /*
+ * For ensuring peer notification in non-atomic context.
+ * ntb_peer_db_set might sleep or schedule.
+ */
+ struct work_struct db_work;
+
+ u32 rx_prod;
+ u32 rx_cons;
+ u32 tx_cons;
+ u32 tx_issue;
+
+ spinlock_t rx_lock;
+ spinlock_t tx_lock;
+
+ struct work_struct rx_work;
+ struct work_struct tx_work;
+};
+
+struct ntb_edma_desc {
+ u32 len;
+ u32 flags;
+ u64 addr; /* DMA address */
+ u64 data;
+};
+
+struct ntb_edma_ring {
+ struct ntb_edma_desc desc[NTB_EDMA_RING_ENTRIES];
+ u32 head;
+ u32 tail;
+};
+
+static inline bool ntb_qp_edma_is_rc(struct ntb_transport_qp *qp)
+{
+ struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
+
+ return ctx->remote_edma_mode == REMOTE_EDMA_RC;
+}
+
+static inline bool ntb_qp_edma_is_ep(struct ntb_transport_qp *qp)
+{
+ struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
+
+ return ctx->remote_edma_mode == REMOTE_EDMA_EP;
+}
+
+static inline bool ntb_qp_edma_enabled(struct ntb_transport_qp *qp)
+{
+ return ntb_qp_edma_is_rc(qp) || ntb_qp_edma_is_ep(qp);
+}
+
+static inline unsigned int ntb_edma_ring_sel(struct ntb_transport_qp *qp,
+ unsigned int n)
+{
+ return n ^ !!ntb_qp_edma_is_ep(qp);
+}
+
+static inline struct ntb_edma_ring *
+ntb_edma_ring_local(struct ntb_transport_qp *qp, unsigned int n)
+{
+ unsigned int r = ntb_edma_ring_sel(qp, n);
+
+ return &((struct ntb_edma_ring *)qp->rx_buff)[r];
+}
+
+static inline struct ntb_edma_ring __iomem *
+ntb_edma_ring_remote(struct ntb_transport_qp *qp, unsigned int n)
+{
+ unsigned int r = ntb_edma_ring_sel(qp, n);
+
+ return &((struct ntb_edma_ring __iomem *)qp->tx_mw)[r];
+}
+
+static inline struct ntb_edma_desc *
+ntb_edma_desc_local(struct ntb_transport_qp *qp, unsigned int n, unsigned int i)
+{
+ return &ntb_edma_ring_local(qp, n)->desc[i];
+}
+
+static inline struct ntb_edma_desc __iomem *
+ntb_edma_desc_remote(struct ntb_transport_qp *qp, unsigned int n,
+ unsigned int i)
+{
+ return &ntb_edma_ring_remote(qp, n)->desc[i];
+}
+
+static inline u32 *ntb_edma_head_local(struct ntb_transport_qp *qp,
+ unsigned int n)
+{
+ return &ntb_edma_ring_local(qp, n)->head;
+}
+
+static inline u32 __iomem *ntb_edma_head_remote(struct ntb_transport_qp *qp,
+ unsigned int n)
+{
+ return &ntb_edma_ring_remote(qp, n)->head;
+}
+
+static inline u32 *ntb_edma_tail_local(struct ntb_transport_qp *qp,
+ unsigned int n)
+{
+ return &ntb_edma_ring_local(qp, n)->tail;
+}
+
+static inline u32 __iomem *ntb_edma_tail_remote(struct ntb_transport_qp *qp,
+ unsigned int n)
+{
+ return &ntb_edma_ring_remote(qp, n)->tail;
+}
+
+/* The 'i' must be generated by ntb_edma_ring_idx() */
+#define NTB_DESC_TX_O(qp, i) ntb_edma_desc_remote(qp, 0, i)
+#define NTB_DESC_TX_I(qp, i) ntb_edma_desc_local(qp, 0, i)
+#define NTB_DESC_RX_O(qp, i) ntb_edma_desc_remote(qp, 1, i)
+#define NTB_DESC_RX_I(qp, i) ntb_edma_desc_local(qp, 1, i)
+
+#define NTB_HEAD_TX_I(qp) ntb_edma_head_local(qp, 0)
+#define NTB_HEAD_RX_O(qp) ntb_edma_head_remote(qp, 1)
+
+#define NTB_TAIL_TX_O(qp) ntb_edma_tail_remote(qp, 0)
+#define NTB_TAIL_RX_I(qp) ntb_edma_tail_local(qp, 1)
+
+/* ntb_edma_ring helpers */
+static __always_inline u32 ntb_edma_ring_idx(u32 v)
+{
+ return v & NTB_EDMA_RING_MASK;
+}
+
+static __always_inline u32 ntb_edma_ring_used_entry(u32 head, u32 tail)
+{
+ if (head >= tail) {
+ WARN_ON_ONCE((head - tail) > (NTB_EDMA_RING_ENTRIES - 1));
+ return head - tail;
+ }
+
+ WARN_ON_ONCE((U32_MAX - tail + head + 1) > (NTB_EDMA_RING_ENTRIES - 1));
+ return U32_MAX - tail + head + 1;
+}
+
+static __always_inline u32 ntb_edma_ring_free_entry(u32 head, u32 tail)
+{
+ return NTB_EDMA_RING_ENTRIES - ntb_edma_ring_used_entry(head, tail) - 1;
+}
+
+static __always_inline bool ntb_edma_ring_full(u32 head, u32 tail)
+{
+ return ntb_edma_ring_free_entry(head, tail) == 0;
+}
+
+static unsigned int ntb_transport_edma_tx_free_entry(struct ntb_transport_qp *qp)
+{
+ struct ntb_transport_qp_edma *edma = qp->priv;
+ unsigned int head, tail;
+
+ scoped_guard(spinlock_irqsave, &edma->tx_lock) {
+ /* In this scope, only 'head' might proceed */
+ tail = READ_ONCE(edma->tx_issue);
+ head = READ_ONCE(*NTB_HEAD_TX_I(qp));
+ }
+ /*
+ * The 'used' count indicates how many entries the other end has
+ * refilled, which are available for us to use for TX.
+ */
+ return ntb_edma_ring_used_entry(head, tail);
+}
+
+static void ntb_transport_edma_debugfs_stats_show(struct seq_file *s,
+ struct ntb_transport_qp *qp)
+{
+ seq_printf(s, "rx_bytes - \t%llu\n", qp->rx_bytes);
+ seq_printf(s, "rx_pkts - \t%llu\n", qp->rx_pkts);
+ seq_printf(s, "rx_err_no_buf - %llu\n", qp->rx_err_no_buf);
+ seq_printf(s, "rx_buff - \t0x%p\n", qp->rx_buff);
+ seq_printf(s, "rx_max_entry - \t%u\n", qp->rx_max_entry);
+ seq_printf(s, "rx_alloc_entry - \t%u\n\n", qp->rx_alloc_entry);
+
+ seq_printf(s, "tx_bytes - \t%llu\n", qp->tx_bytes);
+ seq_printf(s, "tx_pkts - \t%llu\n", qp->tx_pkts);
+ seq_printf(s, "tx_ring_full - \t%llu\n", qp->tx_ring_full);
+ seq_printf(s, "tx_err_no_buf - %llu\n", qp->tx_err_no_buf);
+ seq_printf(s, "tx_mw - \t0x%p\n", qp->tx_mw);
+ seq_printf(s, "tx_max_entry - \t%u\n", qp->tx_max_entry);
+ seq_printf(s, "free tx - \t%u\n", ntb_transport_tx_free_entry(qp));
+ seq_putc(s, '\n');
+
+ seq_puts(s, "Using Remote eDMA - Yes\n");
+ seq_printf(s, "QP Link - \t%s\n", qp->link_is_up ? "Up" : "Down");
+}
+
+static void ntb_transport_edma_uninit(struct ntb_transport_ctx *nt)
+{
+ struct ntb_transport_ctx_edma *ctx = nt->priv;
+
+ if (ctx->wq)
+ destroy_workqueue(ctx->wq);
+ ctx->wq = NULL;
+
+ ntb_edma_teardown_chans(&ctx->chans);
+
+ switch (ctx->remote_edma_mode) {
+ case REMOTE_EDMA_EP:
+ ntb_edma_teardown_mws(nt->ndev);
+ break;
+ case REMOTE_EDMA_RC:
+ ntb_edma_teardown_peer(nt->ndev);
+ break;
+ case REMOTE_EDMA_UNKNOWN:
+ default:
+ break;
+ }
+
+ ctx->remote_edma_mode = REMOTE_EDMA_UNKNOWN;
+}
+
+static void ntb_transport_edma_db_work(struct work_struct *work)
+{
+ struct ntb_transport_qp_edma *edma =
+ container_of(work, struct ntb_transport_qp_edma, db_work);
+ struct ntb_transport_qp *qp = edma->qp;
+
+ ntb_peer_db_set(qp->ndev, qp->qp_bit);
+}
+
+static void ntb_transport_edma_notify_peer(struct ntb_transport_qp_edma *edma)
+{
+ struct ntb_transport_qp *qp = edma->qp;
+ struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
+
+ if (!ntb_edma_notify_peer(&ctx->chans, qp->qp_num))
+ return;
+
+ /*
+ * Called from contexts that may be atomic. Since ntb_peer_db_set()
+ * may sleep, delegate the actual doorbell write to a workqueue.
+ */
+ queue_work(system_highpri_wq, &edma->db_work);
+}
+
+static void ntb_transport_edma_isr(void *data, int qp_num)
+{
+ struct ntb_transport_ctx *nt = data;
+ struct ntb_transport_qp_edma *edma;
+ struct ntb_transport_ctx_edma *ctx;
+ struct ntb_transport_qp *qp;
+
+ if (qp_num < 0 || qp_num >= nt->qp_count)
+ return;
+
+ qp = &nt->qp_vec[qp_num];
+ if (WARN_ON(!qp))
+ return;
+
+ ctx = qp->transport->priv;
+ edma = qp->priv;
+
+ queue_work(ctx->wq, &edma->rx_work);
+ queue_work(ctx->wq, &edma->tx_work);
+}
+
+static int ntb_transport_edma_rc_init(struct ntb_transport_ctx *nt)
+{
+ struct ntb_transport_ctx_edma *ctx = nt->priv;
+ struct ntb_dev *ndev = nt->ndev;
+ struct pci_dev *pdev = ndev->pdev;
+ int peer_mw;
+ int rc;
+
+ if (!use_remote_edma || ctx->remote_edma_mode != REMOTE_EDMA_UNKNOWN)
+ return 0;
+
+ peer_mw = ntb_peer_mw_count(ndev);
+ if (peer_mw <= 0)
+ return -ENODEV;
+
+ rc = ntb_edma_setup_peer(ndev, peer_mw - 1, nt->qp_count);
+ if (rc) {
+ dev_err(&pdev->dev, "Failed to enable remote eDMA: %d\n", rc);
+ return rc;
+ }
+
+ rc = ntb_edma_setup_chans(get_dma_dev(ndev), &ctx->chans, true);
+ if (rc) {
+ dev_err(&pdev->dev, "Failed to setup eDMA channels: %d\n", rc);
+ goto err_teardown_peer;
+ }
+
+ rc = ntb_edma_setup_intr_chan(get_dma_dev(ndev), &ctx->chans);
+ if (rc) {
+ dev_err(&pdev->dev, "Failed to setup eDMA notify channel: %d\n",
+ rc);
+ goto err_teardown_chans;
+ }
+
+ ctx->remote_edma_mode = REMOTE_EDMA_RC;
+ return 0;
+
+err_teardown_chans:
+ ntb_edma_teardown_chans(&ctx->chans);
+err_teardown_peer:
+ ntb_edma_teardown_peer(ndev);
+ return rc;
+}
+
+static int ntb_transport_edma_ep_init(struct ntb_transport_ctx *nt)
+{
+ struct ntb_transport_ctx_edma *ctx = nt->priv;
+ struct ntb_dev *ndev = nt->ndev;
+ struct pci_dev *pdev = ndev->pdev;
+ int peer_mw;
+ int rc;
+
+ if (!use_remote_edma || ctx->remote_edma_mode == REMOTE_EDMA_EP)
+ return 0;
+
+ /*
+ * This check assumes that the endpoint side (pci-epf-vntb.c)
+ * implements .get_private_data() in its ntb_dev_ops while the
+ * host side (ntb_hw_epf.c) does not.
+ */
+ if (!ntb_get_private_data(ndev))
+ return 0;
+
+ peer_mw = ntb_peer_mw_count(ndev);
+ if (peer_mw <= 0)
+ return -ENODEV;
+
+ rc = ntb_edma_setup_mws(ndev, peer_mw - 1, nt->qp_count,
+ ntb_transport_edma_isr, nt);
+ if (rc) {
+ dev_err(&pdev->dev,
+ "Failed to set up memory window for eDMA: %d\n", rc);
+ return rc;
+ }
+
+ rc = ntb_edma_setup_chans(get_dma_dev(ndev), &ctx->chans, false);
+ if (rc) {
+ dev_err(&pdev->dev, "Failed to setup eDMA channels: %d\n", rc);
+ ntb_edma_teardown_mws(ndev);
+ return rc;
+ }
+
+ ctx->remote_edma_mode = REMOTE_EDMA_EP;
+ return 0;
+}
+
+static int ntb_transport_edma_setup_qp_mw(struct ntb_transport_ctx *nt,
+ unsigned int qp_num)
+{
+ struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
+ struct ntb_dev *ndev = nt->ndev;
+ struct ntb_queue_entry *entry;
+ struct ntb_transport_mw *mw;
+ unsigned int mw_num, mw_count, qp_count;
+ unsigned int qp_offset, rx_info_offset;
+ unsigned int mw_size, mw_size_per_qp;
+ unsigned int num_qps_mw;
+ size_t edma_total;
+ unsigned int i;
+ int node;
+
+ mw_count = nt->mw_count;
+ qp_count = nt->qp_count;
+
+ mw_num = QP_TO_MW(nt, qp_num);
+ mw = &nt->mw_vec[mw_num];
+
+ if (!mw->virt_addr)
+ return -ENOMEM;
+
+ if (mw_num < qp_count % mw_count)
+ num_qps_mw = qp_count / mw_count + 1;
+ else
+ num_qps_mw = qp_count / mw_count;
+
+ mw_size = min(nt->mw_vec[mw_num].phys_size, mw->xlat_size);
+ if (max_mw_size && mw_size > max_mw_size)
+ mw_size = max_mw_size;
+
+ mw_size_per_qp = round_down(mw_size / num_qps_mw, SZ_64);
+ qp_offset = mw_size_per_qp * (qp_num / mw_count);
+ rx_info_offset = mw_size_per_qp - sizeof(struct ntb_rx_info);
+
+ qp->tx_mw_size = mw_size_per_qp;
+ qp->tx_mw = nt->mw_vec[mw_num].vbase + qp_offset;
+ if (!qp->tx_mw)
+ return -EINVAL;
+ qp->tx_mw_phys = nt->mw_vec[mw_num].phys_addr + qp_offset;
+ if (!qp->tx_mw_phys)
+ return -EINVAL;
+ qp->rx_info = qp->tx_mw + rx_info_offset;
+ qp->rx_buff = mw->virt_addr + qp_offset;
+ qp->remote_rx_info = qp->rx_buff + rx_info_offset;
+
+ /* Due to housekeeping, there must be at least 2 buffs */
+ qp->tx_max_frame = min(transport_mtu, mw_size_per_qp / 2);
+ qp->rx_max_frame = min(transport_mtu, mw_size_per_qp / 2);
+
+ /* In eDMA mode, decouple from MW sizing and force ring-sized entries */
+ edma_total = 2 * sizeof(struct ntb_edma_ring);
+ if (rx_info_offset < edma_total) {
+ dev_err(&ndev->dev, "Ring space requires %zuB but only %uB available\n",
+ edma_total, rx_info_offset);
+ return -EINVAL;
+ }
+ qp->tx_max_entry = NTB_EDMA_RING_ENTRIES;
+ qp->rx_max_entry = NTB_EDMA_RING_ENTRIES;
+
+ /*
+ * Checking to see if we have more entries than the default.
+ * We should add additional entries if that is the case so we
+ * can be in sync with the transport frames.
+ */
+ node = dev_to_node(&ndev->dev);
+ for (i = qp->rx_alloc_entry; i < qp->rx_max_entry; i++) {
+ entry = kzalloc_node(sizeof(*entry), GFP_KERNEL, node);
+ if (!entry)
+ return -ENOMEM;
+
+ entry->qp = qp;
+ ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry,
+ &qp->rx_free_q);
+ qp->rx_alloc_entry++;
+ }
+
+ memset(qp->rx_buff, 0, edma_total);
+
+ qp->rx_pkts = 0;
+ qp->tx_pkts = 0;
+
+ return 0;
+}
+
+static int ntb_transport_edma_rx_complete(struct ntb_transport_qp *qp)
+{
+ struct device *dma_dev = get_dma_dev(qp->ndev);
+ struct ntb_transport_qp_edma *edma = qp->priv;
+ struct ntb_queue_entry *entry;
+ struct ntb_edma_desc *in;
+ unsigned int len;
+ bool link_down;
+ u32 idx;
+
+ if (ntb_edma_ring_used_entry(READ_ONCE(*NTB_TAIL_RX_I(qp)),
+ edma->rx_cons) == 0)
+ return 0;
+
+ idx = ntb_edma_ring_idx(edma->rx_cons);
+ in = NTB_DESC_RX_I(qp, idx);
+ if (!(in->flags & DESC_DONE_FLAG))
+ return 0;
+
+ link_down = in->flags & LINK_DOWN_FLAG;
+ in->flags = 0;
+ len = in->len; /* might be smaller than entry->len */
+
+ entry = (struct ntb_queue_entry *)(uintptr_t)in->data;
+ if (WARN_ON(!entry))
+ return 0;
+
+ if (link_down) {
+ ntb_qp_link_down(qp);
+ edma->rx_cons++;
+ ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_free_q);
+ return 1;
+ }
+
+ dma_unmap_single(dma_dev, entry->addr, entry->len, DMA_FROM_DEVICE);
+
+ qp->rx_bytes += len;
+ qp->rx_pkts++;
+ edma->rx_cons++;
+
+ if (qp->rx_handler && qp->client_ready)
+ qp->rx_handler(qp, qp->cb_data, entry->cb_data, len);
+
+ ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_free_q);
+ return 1;
+}
+
+static void ntb_transport_edma_rx_work(struct work_struct *work)
+{
+ struct ntb_transport_qp_edma *edma = container_of(
+ work, struct ntb_transport_qp_edma, rx_work);
+ struct ntb_transport_qp *qp = edma->qp;
+ struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
+ unsigned int i;
+
+ for (i = 0; i < NTB_EDMA_MAX_POLL; i++) {
+ if (!ntb_transport_edma_rx_complete(qp))
+ break;
+ }
+
+ if (ntb_transport_edma_rx_complete(qp))
+ queue_work(ctx->wq, &edma->rx_work);
+}
+
+static void ntb_transport_edma_tx_work(struct work_struct *work)
+{
+ struct ntb_transport_qp_edma *edma = container_of(
+ work, struct ntb_transport_qp_edma, tx_work);
+ struct ntb_transport_qp *qp = edma->qp;
+ struct ntb_edma_desc *in, __iomem *out;
+ struct ntb_queue_entry *entry;
+ unsigned int len;
+ void *cb_data;
+ u32 idx;
+
+ while (ntb_edma_ring_used_entry(READ_ONCE(edma->tx_issue),
+ edma->tx_cons) != 0) {
+ /* Paired with smp_wmb() in ntb_transport_edma_tx_enqueue_inner() */
+ smp_rmb();
+
+ idx = ntb_edma_ring_idx(edma->tx_cons);
+ in = NTB_DESC_TX_I(qp, idx);
+ entry = (struct ntb_queue_entry *)(uintptr_t)in->data;
+ if (!entry || !(entry->flags & DESC_DONE_FLAG))
+ break;
+
+ in->data = 0;
+
+ cb_data = entry->cb_data;
+ len = entry->len;
+
+ out = NTB_DESC_TX_O(qp, idx);
+
+ WRITE_ONCE(edma->tx_cons, edma->tx_cons + 1);
+
+ /*
+ * No need to add barrier in-between to enforce ordering here.
+ * The other side proceeds only after both flags and tail are
+ * updated.
+ */
+ iowrite32(entry->flags, &out->flags);
+ iowrite32(edma->tx_cons, NTB_TAIL_TX_O(qp));
+
+ ntb_transport_edma_notify_peer(edma);
+
+ ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry,
+ &qp->tx_free_q);
+
+ if (qp->tx_handler)
+ qp->tx_handler(qp, qp->cb_data, cb_data, len);
+
+ /* stat updates */
+ qp->tx_bytes += len;
+ qp->tx_pkts++;
+ }
+}
+
+static void ntb_transport_edma_tx_cb(void *data,
+ const struct dmaengine_result *res)
+{
+ struct ntb_queue_entry *entry = data;
+ struct ntb_transport_qp *qp = entry->qp;
+ struct ntb_transport_ctx *nt = qp->transport;
+ struct device *dma_dev = get_dma_dev(qp->ndev);
+ enum dmaengine_tx_result dma_err = res->result;
+ struct ntb_transport_ctx_edma *ctx = nt->priv;
+ struct ntb_transport_qp_edma *edma = qp->priv;
+
+ switch (dma_err) {
+ case DMA_TRANS_READ_FAILED:
+ case DMA_TRANS_WRITE_FAILED:
+ case DMA_TRANS_ABORTED:
+ entry->errors++;
+ entry->len = -EIO;
+ break;
+ case DMA_TRANS_NOERROR:
+ default:
+ break;
+ }
+ dma_unmap_sg(dma_dev, &entry->sgl, 1, DMA_TO_DEVICE);
+ sg_dma_address(&entry->sgl) = 0;
+
+ entry->flags |= DESC_DONE_FLAG;
+
+ queue_work(ctx->wq, &edma->tx_work);
+}
+
+static int ntb_transport_edma_submit(struct device *d, struct dma_chan *chan,
+ size_t len, void *rc_src, dma_addr_t dst,
+ struct ntb_queue_entry *entry)
+{
+ struct scatterlist *sgl = &entry->sgl;
+ struct dma_async_tx_descriptor *txd;
+ struct dma_slave_config cfg;
+ dma_cookie_t cookie;
+ int nents, rc;
+
+ if (!d)
+ return -ENODEV;
+
+ if (!chan)
+ return -ENXIO;
+
+ if (WARN_ON(!rc_src || !dst))
+ return -EINVAL;
+
+ if (WARN_ON(sg_dma_address(sgl)))
+ return -EINVAL;
+
+ sg_init_one(sgl, rc_src, len);
+ nents = dma_map_sg(d, sgl, 1, DMA_TO_DEVICE);
+ if (nents <= 0)
+ return -EIO;
+
+ memset(&cfg, 0, sizeof(cfg));
+ cfg.dst_addr = dst;
+ cfg.src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
+ cfg.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
+ cfg.direction = DMA_MEM_TO_DEV;
+
+ txd = dmaengine_prep_slave_sg_config(chan, sgl, 1, DMA_MEM_TO_DEV,
+ DMA_CTRL_ACK | DMA_PREP_INTERRUPT, &cfg);
+ if (!txd) {
+ rc = -EIO;
+ goto out_unmap;
+ }
+
+ txd->callback_result = ntb_transport_edma_tx_cb;
+ txd->callback_param = entry;
+
+ cookie = dmaengine_submit(txd);
+ if (dma_submit_error(cookie)) {
+ rc = -EIO;
+ goto out_unmap;
+ }
+ dma_async_issue_pending(chan);
+ return 0;
+out_unmap:
+ dma_unmap_sg(d, sgl, 1, DMA_TO_DEVICE);
+ return rc;
+}
+
+static int ntb_transport_edma_tx_enqueue_inner(struct ntb_transport_qp *qp,
+ struct ntb_queue_entry *entry)
+{
+ struct device *dma_dev = get_dma_dev(qp->ndev);
+ struct ntb_transport_qp_edma *edma = qp->priv;
+ struct ntb_transport_ctx *nt = qp->transport;
+ struct ntb_edma_desc *in, __iomem *out;
+ struct ntb_transport_ctx_edma *ctx = nt->priv;
+ unsigned int len = entry->len;
+ struct dma_chan *chan;
+ u32 issue, idx, head;
+ dma_addr_t dst;
+ int rc;
+
+ WARN_ON_ONCE(entry->flags & DESC_DONE_FLAG);
+
+ scoped_guard(spinlock_irqsave, &edma->tx_lock) {
+ head = READ_ONCE(*NTB_HEAD_TX_I(qp));
+ issue = edma->tx_issue;
+ if (ntb_edma_ring_used_entry(head, issue) == 0) {
+ qp->tx_ring_full++;
+ return -ENOSPC;
+ }
+
+ /*
+ * ntb_transport_edma_tx_work() checks entry->flags
+ * so it needs to be set before tx_issue++.
+ */
+ idx = ntb_edma_ring_idx(issue);
+ in = NTB_DESC_TX_I(qp, idx);
+ in->data = (uintptr_t)entry;
+
+ /* Make in->data visible before tx_issue++ */
+ smp_wmb();
+
+ WRITE_ONCE(edma->tx_issue, edma->tx_issue + 1);
+ }
+
+ /* Publish the final transfer length to the other end */
+ out = NTB_DESC_TX_O(qp, idx);
+ iowrite32(len, &out->len);
+ ioread32(&out->len);
+
+ if (unlikely(!len)) {
+ entry->flags |= DESC_DONE_FLAG;
+ queue_work(ctx->wq, &edma->tx_work);
+ return 0;
+ }
+
+ /* Paired with dma_wmb() in ntb_transport_edma_rx_enqueue_inner() */
+ dma_rmb();
+
+ /* kick remote eDMA read transfer */
+ dst = (dma_addr_t)in->addr;
+ chan = ntb_edma_pick_chan(&ctx->chans, qp->qp_num);
+ rc = ntb_transport_edma_submit(dma_dev, chan, len,
+ entry->buf, dst, entry);
+ if (rc) {
+ entry->errors++;
+ entry->len = -EIO;
+ entry->flags |= DESC_DONE_FLAG;
+ queue_work(ctx->wq, &edma->tx_work);
+ }
+ return 0;
+}
+
+static int ntb_transport_edma_tx_enqueue(struct ntb_transport_qp *qp,
+ struct ntb_queue_entry *entry,
+ void *cb, void *data, unsigned int len,
+ unsigned int flags)
+{
+ struct device *dma_dev;
+
+ if (entry->addr) {
+ /* Deferred unmap */
+ dma_dev = get_dma_dev(qp->ndev);
+ dma_unmap_single(dma_dev, entry->addr, entry->len,
+ DMA_TO_DEVICE);
+ }
+
+ entry->cb_data = cb;
+ entry->buf = data;
+ entry->len = len;
+ entry->flags = flags;
+ entry->errors = 0;
+ entry->addr = 0;
+
+ WARN_ON_ONCE(!ntb_qp_edma_enabled(qp));
+
+ return ntb_transport_edma_tx_enqueue_inner(qp, entry);
+}
+
+static int ntb_transport_edma_rx_enqueue_inner(struct ntb_transport_qp *qp,
+ struct ntb_queue_entry *entry)
+{
+ struct device *dma_dev = get_dma_dev(qp->ndev);
+ struct ntb_transport_qp_edma *edma = qp->priv;
+ struct ntb_edma_desc *in, __iomem *out;
+ unsigned int len = entry->len;
+ void *data = entry->buf;
+ dma_addr_t dst;
+ u32 idx;
+ int rc;
+
+ dst = dma_map_single(dma_dev, data, len, DMA_FROM_DEVICE);
+ rc = dma_mapping_error(dma_dev, dst);
+ if (rc)
+ return rc;
+
+ guard(spinlock_bh)(&edma->rx_lock);
+
+ if (ntb_edma_ring_full(READ_ONCE(edma->rx_prod),
+ READ_ONCE(edma->rx_cons))) {
+ rc = -ENOSPC;
+ goto out_unmap;
+ }
+
+ idx = ntb_edma_ring_idx(edma->rx_prod);
+ in = NTB_DESC_RX_I(qp, idx);
+ out = NTB_DESC_RX_O(qp, idx);
+
+ iowrite32(len, &out->len);
+ iowrite64(dst, &out->addr);
+
+ WARN_ON(in->flags & DESC_DONE_FLAG);
+ in->data = (uintptr_t)entry;
+ entry->addr = dst;
+
+ /* Ensure len/addr are visible before the head update */
+ dma_wmb();
+
+ WRITE_ONCE(edma->rx_prod, edma->rx_prod + 1);
+ iowrite32(edma->rx_prod, NTB_HEAD_RX_O(qp));
+
+ return 0;
+out_unmap:
+ dma_unmap_single(dma_dev, dst, len, DMA_FROM_DEVICE);
+ return rc;
+}
+
+static int ntb_transport_edma_rx_enqueue(struct ntb_transport_qp *qp,
+ struct ntb_queue_entry *entry)
+{
+ int rc;
+
+ rc = ntb_transport_edma_rx_enqueue_inner(qp, entry);
+ if (rc) {
+ ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry,
+ &qp->rx_free_q);
+ return rc;
+ }
+
+ ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_pend_q);
+
+ if (qp->active)
+ tasklet_schedule(&qp->rxc_db_work);
+
+ return 0;
+}
+
+static void ntb_transport_edma_rx_poll(struct ntb_transport_qp *qp)
+{
+ struct ntb_transport_ctx *nt = qp->transport;
+ struct ntb_transport_ctx_edma *ctx = nt->priv;
+ struct ntb_transport_qp_edma *edma = qp->priv;
+
+ queue_work(ctx->wq, &edma->rx_work);
+ queue_work(ctx->wq, &edma->tx_work);
+}
+
+static int ntb_transport_edma_qp_init(struct ntb_transport_ctx *nt,
+ unsigned int qp_num)
+{
+ struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
+ struct ntb_transport_qp_edma *edma;
+ struct ntb_dev *ndev = nt->ndev;
+ int node;
+
+ node = dev_to_node(&ndev->dev);
+
+ qp->priv = kzalloc_node(sizeof(*edma), GFP_KERNEL, node);
+ if (!qp->priv)
+ return -ENOMEM;
+
+ edma = qp->priv;
+ edma->qp = qp;
+ edma->rx_prod = 0;
+ edma->rx_cons = 0;
+ edma->tx_cons = 0;
+ edma->tx_issue = 0;
+
+ spin_lock_init(&edma->rx_lock);
+ spin_lock_init(&edma->tx_lock);
+
+ INIT_WORK(&edma->db_work, ntb_transport_edma_db_work);
+ INIT_WORK(&edma->rx_work, ntb_transport_edma_rx_work);
+ INIT_WORK(&edma->tx_work, ntb_transport_edma_tx_work);
+
+ return 0;
+}
+
+static void ntb_transport_edma_qp_free(struct ntb_transport_qp *qp)
+{
+ struct ntb_transport_qp_edma *edma = qp->priv;
+
+ cancel_work_sync(&edma->db_work);
+ cancel_work_sync(&edma->rx_work);
+ cancel_work_sync(&edma->tx_work);
+
+ kfree(qp->priv);
+}
+
+static int ntb_transport_edma_pre_link_up(struct ntb_transport_ctx *nt)
+{
+ struct ntb_dev *ndev = nt->ndev;
+ struct pci_dev *pdev = ndev->pdev;
+ int rc;
+
+ rc = ntb_transport_edma_ep_init(nt);
+ if (rc)
+ dev_err(&pdev->dev, "Failed to init EP: %d\n", rc);
+
+ return rc;
+}
+
+static int ntb_transport_edma_post_link_up(struct ntb_transport_ctx *nt)
+{
+ struct ntb_dev *ndev = nt->ndev;
+ struct pci_dev *pdev = ndev->pdev;
+ int rc;
+
+ rc = ntb_transport_edma_rc_init(nt);
+ if (rc)
+ dev_err(&pdev->dev, "Failed to init RC: %d\n", rc);
+
+ return rc;
+}
+
+static int ntb_transport_edma_enable(struct ntb_transport_ctx *nt,
+ unsigned int *mw_count)
+{
+ struct ntb_dev *ndev = nt->ndev;
+ struct ntb_transport_ctx_edma *ctx = nt->priv;
+
+ if (!use_remote_edma)
+ return 0;
+
+ /*
+ * We need at least one MW for the transport plus one MW reserved
+ * for the remote eDMA window (see ntb_edma_setup_mws/peer).
+ */
+ if (*mw_count <= 1) {
+ dev_err(&ndev->dev,
+ "remote eDMA requires at least two MWs (have %u)\n",
+ *mw_count);
+ return -ENODEV;
+ }
+
+ ctx->wq = alloc_workqueue("ntb-edma-wq", WQ_UNBOUND | WQ_SYSFS, 0);
+ if (!ctx->wq) {
+ ntb_transport_edma_uninit(nt);
+ return -ENOMEM;
+ }
+
+ /* Reserve the last peer MW exclusively for the eDMA window. */
+ *mw_count -= 1;
+
+ return 0;
+}
+
+static void ntb_transport_edma_disable(struct ntb_transport_ctx *nt)
+{
+ ntb_transport_edma_uninit(nt);
+}
+
+static const struct ntb_transport_backend_ops edma_backend_ops = {
+ .enable = ntb_transport_edma_enable,
+ .disable = ntb_transport_edma_disable,
+ .qp_init = ntb_transport_edma_qp_init,
+ .qp_free = ntb_transport_edma_qp_free,
+ .pre_link_up = ntb_transport_edma_pre_link_up,
+ .post_link_up = ntb_transport_edma_post_link_up,
+ .setup_qp_mw = ntb_transport_edma_setup_qp_mw,
+ .tx_free_entry = ntb_transport_edma_tx_free_entry,
+ .tx_enqueue = ntb_transport_edma_tx_enqueue,
+ .rx_enqueue = ntb_transport_edma_rx_enqueue,
+ .rx_poll = ntb_transport_edma_rx_poll,
+ .debugfs_stats_show = ntb_transport_edma_debugfs_stats_show,
+};
+
+int ntb_transport_edma_init(struct ntb_transport_ctx *nt)
+{
+ struct ntb_dev *ndev = nt->ndev;
+ int node;
+
+ node = dev_to_node(&ndev->dev);
+ nt->priv = kzalloc_node(sizeof(struct ntb_transport_ctx_edma), GFP_KERNEL,
+ node);
+ if (!nt->priv)
+ return -ENOMEM;
+
+ nt->backend_ops = edma_backend_ops;
+ /*
+ * In remote eDMA mode, one DMA read channel is used by the host
+ * side to interrupt the EP.
+ */
+ use_msi = false;
+ return 0;
+}
diff --git a/drivers/ntb/ntb_transport_internal.h b/drivers/ntb/ntb_transport_internal.h
index 51ff08062d73..9fff65980d3d 100644
--- a/drivers/ntb/ntb_transport_internal.h
+++ b/drivers/ntb/ntb_transport_internal.h
@@ -8,6 +8,7 @@
extern unsigned long max_mw_size;
extern unsigned int transport_mtu;
extern bool use_msi;
+extern bool use_remote_edma;
#define QP_TO_MW(nt, qp) ((qp) % nt->mw_count)
@@ -29,6 +30,11 @@ struct ntb_queue_entry {
struct ntb_payload_header __iomem *tx_hdr;
struct ntb_payload_header *rx_hdr;
};
+
+#ifdef CONFIG_NTB_TRANSPORT_EDMA
+ dma_addr_t addr;
+ struct scatterlist sgl;
+#endif
};
struct ntb_rx_info {
@@ -202,4 +208,13 @@ int ntb_transport_init_queue(struct ntb_transport_ctx *nt,
unsigned int qp_num);
struct device *get_dma_dev(struct ntb_dev *ndev);
+#ifdef CONFIG_NTB_TRANSPORT_EDMA
+int ntb_transport_edma_init(struct ntb_transport_ctx *nt);
+#else
+static inline int ntb_transport_edma_init(struct ntb_transport_ctx *nt)
+{
+ return -EOPNOTSUPP;
+}
+#endif /* CONFIG_NTB_TRANSPORT_EDMA */
+
#endif /* _NTB_TRANSPORT_INTERNAL_H_ */
--
2.51.0
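The ring accounting in this patch relies on free-running u32 head/tail counters that wrap naturally, with one slot always kept empty so head == tail unambiguously means "empty". A small standalone Python model of ntb_edma_ring_idx()/ntb_edma_ring_used_entry()/ntb_edma_ring_free_entry() (a sketch for illustration only, not part of the patch; it assumes the same NTB_EDMA_RING_ORDER of 7):

```python
# Standalone model of the ring accounting in ntb_transport_edma.c.
# head/tail are free-running u32 counters; used/free are derived
# modulo 2^32, and only the masked index selects a descriptor slot.
RING_ORDER = 7
RING_ENTRIES = 1 << RING_ORDER          # 128 descriptors per ring
RING_MASK = RING_ENTRIES - 1
U32_MAX = 0xFFFFFFFF

def ring_idx(v: int) -> int:
    """Mirror ntb_edma_ring_idx(): map a free-running counter to a slot."""
    return v & RING_MASK

def used_entries(head: int, tail: int) -> int:
    """Mirror ntb_edma_ring_used_entry(): distance from tail to head,
    accounting for u32 wraparound of the free-running counters."""
    if head >= tail:
        return head - tail
    return U32_MAX - tail + head + 1

def free_entries(head: int, tail: int) -> int:
    """Mirror ntb_edma_ring_free_entry(): one slot stays empty so that
    a full ring (used == RING_ENTRIES - 1) is distinguishable from empty."""
    return RING_ENTRIES - used_entries(head, tail) - 1

def ring_full(head: int, tail: int) -> bool:
    return free_entries(head, tail) == 0
```

For example, a producer at head 0x100000002 that wrapped past U32_MAX while the consumer tail sits at 0xFFFFFFFE still reports the correct small occupancy, because both counters advance modulo 2^32.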
^ permalink raw reply related [flat|nested] 61+ messages in thread

* Re: [RFC PATCH v3 26/35] NTB: ntb_transport: Introduce DW eDMA backed transport mode
2025-12-17 15:16 ` [RFC PATCH v3 26/35] NTB: ntb_transport: Introduce DW eDMA backed transport mode Koichiro Den
@ 2025-12-19 15:00 ` Frank Li
2025-12-20 15:28 ` Koichiro Den
2026-01-06 18:51 ` Dave Jiang
1 sibling, 1 reply; 61+ messages in thread
From: Frank Li @ 2025-12-19 15:00 UTC (permalink / raw)
To: Koichiro Den
Cc: dave.jiang, ntb, linux-pci, dmaengine, linux-renesas-soc, netdev,
linux-kernel, mani, kwilczynski, kishon, bhelgaas, corbet,
geert+renesas, magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro,
will, robin.murphy, jdmason, allenbh, andrew+netdev, davem,
edumazet, kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k,
kurt.schwemmer, logang, jingoohan1, lpieralisi, utkarsh02t,
jbrunet, dlemoal, arnd, elfring
On Thu, Dec 18, 2025 at 12:16:00AM +0900, Koichiro Den wrote:
> Add a new ntb_transport backend that uses a DesignWare eDMA engine
> located on the endpoint, to be driven by both host and endpoint.
>
> The endpoint exposes a dedicated memory window which contains the eDMA
> register block, a small control structure (struct ntb_edma_info) and
> per-channel linked-list (LL) rings for read channels. Endpoint drives
> its local eDMA write channels for its transmission, while host side
> uses the remote eDMA read channels for its transmission.
I just glanced at the code. Looks like you use the standard DMA API and a
per-channel linked-list (LL) ring, which can be pure software.
So it is not necessary to bind to the DesignWare eDMA. Maybe other vendors'
PCIe built-in DMA could work with this code?
Frank
>
> A key benefit of this backend is that the memory window no longer needs
> to carry data-plane payload. This makes the design less sensitive to
> limited memory window space and allows scaling to multiple queue pairs.
> The memory window layout is specific to the eDMA-backed backend, so
> there is no automatic fallback to the memcpy-based default transport
> that requires a different layout.
>
> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> ---
> drivers/ntb/Kconfig | 12 +
> drivers/ntb/Makefile | 2 +
> drivers/ntb/ntb_transport_core.c | 15 +-
> drivers/ntb/ntb_transport_edma.c | 987 +++++++++++++++++++++++++++
> drivers/ntb/ntb_transport_internal.h | 15 +
> 5 files changed, 1029 insertions(+), 2 deletions(-)
> create mode 100644 drivers/ntb/ntb_transport_edma.c
>
> diff --git a/drivers/ntb/Kconfig b/drivers/ntb/Kconfig
> index df16c755b4da..5ba6d0b7f5ba 100644
> --- a/drivers/ntb/Kconfig
> +++ b/drivers/ntb/Kconfig
> @@ -37,4 +37,16 @@ config NTB_TRANSPORT
>
> If unsure, say N.
>
> +config NTB_TRANSPORT_EDMA
> + bool "NTB Transport backed by remote eDMA"
> + depends on NTB_TRANSPORT
> + depends on PCI
> + select DMA_ENGINE
> + select NTB_EDMA
> + help
> + Enable a transport backend that uses a remote DesignWare eDMA engine
> + exposed through a dedicated NTB memory window. The host uses the
> + endpoint's eDMA engine to move data in both directions.
> + Say Y here if you intend to use the 'use_remote_edma' module parameter.
> +
> endif # NTB
> diff --git a/drivers/ntb/Makefile b/drivers/ntb/Makefile
> index 9b66e5fafbc0..b9086b32ecde 100644
> --- a/drivers/ntb/Makefile
> +++ b/drivers/ntb/Makefile
> @@ -6,3 +6,5 @@ ntb-y := core.o
> ntb-$(CONFIG_NTB_MSI) += msi.o
>
> ntb_transport-y := ntb_transport_core.o
> +ntb_transport-$(CONFIG_NTB_TRANSPORT_EDMA) += ntb_transport_edma.o
> +ntb_transport-$(CONFIG_NTB_TRANSPORT_EDMA) += hw/edma/ntb_hw_edma.o
> diff --git a/drivers/ntb/ntb_transport_core.c b/drivers/ntb/ntb_transport_core.c
> index 40c2548f5930..bd21232f26fe 100644
> --- a/drivers/ntb/ntb_transport_core.c
> +++ b/drivers/ntb/ntb_transport_core.c
> @@ -104,6 +104,12 @@ module_param(use_msi, bool, 0644);
> MODULE_PARM_DESC(use_msi, "Use MSI interrupts instead of doorbells");
> #endif
>
> +bool use_remote_edma;
> +#ifdef CONFIG_NTB_TRANSPORT_EDMA
> +module_param(use_remote_edma, bool, 0644);
> +MODULE_PARM_DESC(use_remote_edma, "Use remote eDMA mode (when enabled, use_msi is ignored)");
> +#endif
> +
> static struct dentry *nt_debugfs_dir;
>
> /* Only two-ports NTB devices are supported */
> @@ -156,7 +162,7 @@ enum {
> #define drv_client(__drv) \
> container_of((__drv), struct ntb_transport_client, driver)
>
> -#define NTB_QP_DEF_NUM_ENTRIES 100
> +#define NTB_QP_DEF_NUM_ENTRIES 128
> #define NTB_LINK_DOWN_TIMEOUT 10
>
> static void ntb_transport_rxc_db(unsigned long data);
> @@ -1189,7 +1195,11 @@ static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
>
> nt->ndev = ndev;
>
> - rc = ntb_transport_default_init(nt);
> + if (use_remote_edma)
> + rc = ntb_transport_edma_init(nt);
> + else
> + rc = ntb_transport_default_init(nt);
> +
> if (rc)
> return rc;
>
> @@ -1950,6 +1960,7 @@ ntb_transport_create_queue(void *data, struct device *client_dev,
>
> nt->qp_bitmap_free &= ~qp_bit;
>
> + qp->qp_bit = qp_bit;
> qp->cb_data = data;
> qp->rx_handler = handlers->rx_handler;
> qp->tx_handler = handlers->tx_handler;
> diff --git a/drivers/ntb/ntb_transport_edma.c b/drivers/ntb/ntb_transport_edma.c
> new file mode 100644
> index 000000000000..6ae5da0a1367
> --- /dev/null
> +++ b/drivers/ntb/ntb_transport_edma.c
> @@ -0,0 +1,987 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * NTB transport backend for remote DesignWare eDMA.
> + *
> + * This implements the backend_ops used when use_remote_edma=1 and
> + * relies on drivers/ntb/hw/edma/ for low-level eDMA/MW programming.
> + */
> +
> +#include <linux/bug.h>
> +#include <linux/compiler.h>
> +#include <linux/debugfs.h>
> +#include <linux/dmaengine.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/errno.h>
> +#include <linux/io-64-nonatomic-lo-hi.h>
> +#include <linux/ntb.h>
> +#include <linux/pci.h>
> +#include <linux/pci-epc.h>
> +#include <linux/seq_file.h>
> +#include <linux/slab.h>
> +
> +#include "hw/edma/ntb_hw_edma.h"
> +#include "ntb_transport_internal.h"
> +
> +#define NTB_EDMA_RING_ORDER 7
> +#define NTB_EDMA_RING_ENTRIES (1U << NTB_EDMA_RING_ORDER)
> +#define NTB_EDMA_RING_MASK (NTB_EDMA_RING_ENTRIES - 1)
> +
> +#define NTB_EDMA_MAX_POLL 32
> +
> +/*
> + * Remote eDMA mode implementation
> + */
> +struct ntb_transport_ctx_edma {
> + remote_edma_mode_t remote_edma_mode;
> + struct device *dma_dev;
> + struct workqueue_struct *wq;
> + struct ntb_edma_chans chans;
> +};
> +
> +struct ntb_transport_qp_edma {
> + struct ntb_transport_qp *qp;
> +
> + /*
> + * For ensuring peer notification in non-atomic context.
> + * ntb_peer_db_set might sleep or schedule.
> + */
> + struct work_struct db_work;
> +
> + u32 rx_prod;
> + u32 rx_cons;
> + u32 tx_cons;
> + u32 tx_issue;
> +
> + spinlock_t rx_lock;
> + spinlock_t tx_lock;
> +
> + struct work_struct rx_work;
> + struct work_struct tx_work;
> +};
> +
> +struct ntb_edma_desc {
> + u32 len;
> + u32 flags;
> + u64 addr; /* DMA address */
> + u64 data;
> +};
> +
> +struct ntb_edma_ring {
> + struct ntb_edma_desc desc[NTB_EDMA_RING_ENTRIES];
> + u32 head;
> + u32 tail;
> +};
> +
> +static inline bool ntb_qp_edma_is_rc(struct ntb_transport_qp *qp)
> +{
> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
> +
> + return ctx->remote_edma_mode == REMOTE_EDMA_RC;
> +}
> +
> +static inline bool ntb_qp_edma_is_ep(struct ntb_transport_qp *qp)
> +{
> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
> +
> + return ctx->remote_edma_mode == REMOTE_EDMA_EP;
> +}
> +
> +static inline bool ntb_qp_edma_enabled(struct ntb_transport_qp *qp)
> +{
> + return ntb_qp_edma_is_rc(qp) || ntb_qp_edma_is_ep(qp);
> +}
> +
> +static inline unsigned int ntb_edma_ring_sel(struct ntb_transport_qp *qp,
> + unsigned int n)
> +{
> + return n ^ !!ntb_qp_edma_is_ep(qp);
> +}
> +
> +static inline struct ntb_edma_ring *
> +ntb_edma_ring_local(struct ntb_transport_qp *qp, unsigned int n)
> +{
> + unsigned int r = ntb_edma_ring_sel(qp, n);
> +
> + return &((struct ntb_edma_ring *)qp->rx_buff)[r];
> +}
> +
> +static inline struct ntb_edma_ring __iomem *
> +ntb_edma_ring_remote(struct ntb_transport_qp *qp, unsigned int n)
> +{
> + unsigned int r = ntb_edma_ring_sel(qp, n);
> +
> + return &((struct ntb_edma_ring __iomem *)qp->tx_mw)[r];
> +}
> +
> +static inline struct ntb_edma_desc *
> +ntb_edma_desc_local(struct ntb_transport_qp *qp, unsigned int n, unsigned int i)
> +{
> + return &ntb_edma_ring_local(qp, n)->desc[i];
> +}
> +
> +static inline struct ntb_edma_desc __iomem *
> +ntb_edma_desc_remote(struct ntb_transport_qp *qp, unsigned int n,
> + unsigned int i)
> +{
> + return &ntb_edma_ring_remote(qp, n)->desc[i];
> +}
> +
> +static inline u32 *ntb_edma_head_local(struct ntb_transport_qp *qp,
> + unsigned int n)
> +{
> + return &ntb_edma_ring_local(qp, n)->head;
> +}
> +
> +static inline u32 __iomem *ntb_edma_head_remote(struct ntb_transport_qp *qp,
> + unsigned int n)
> +{
> + return &ntb_edma_ring_remote(qp, n)->head;
> +}
> +
> +static inline u32 *ntb_edma_tail_local(struct ntb_transport_qp *qp,
> + unsigned int n)
> +{
> + return &ntb_edma_ring_local(qp, n)->tail;
> +}
> +
> +static inline u32 __iomem *ntb_edma_tail_remote(struct ntb_transport_qp *qp,
> + unsigned int n)
> +{
> + return &ntb_edma_ring_remote(qp, n)->tail;
> +}
> +
> +/* The 'i' must be generated by ntb_edma_ring_idx() */
> +#define NTB_DESC_TX_O(qp, i) ntb_edma_desc_remote(qp, 0, i)
> +#define NTB_DESC_TX_I(qp, i) ntb_edma_desc_local(qp, 0, i)
> +#define NTB_DESC_RX_O(qp, i) ntb_edma_desc_remote(qp, 1, i)
> +#define NTB_DESC_RX_I(qp, i) ntb_edma_desc_local(qp, 1, i)
> +
> +#define NTB_HEAD_TX_I(qp) ntb_edma_head_local(qp, 0)
> +#define NTB_HEAD_RX_O(qp) ntb_edma_head_remote(qp, 1)
> +
> +#define NTB_TAIL_TX_O(qp) ntb_edma_tail_remote(qp, 0)
> +#define NTB_TAIL_RX_I(qp) ntb_edma_tail_local(qp, 1)
> +
> +/* ntb_edma_ring helpers */
> +static __always_inline u32 ntb_edma_ring_idx(u32 v)
> +{
> + return v & NTB_EDMA_RING_MASK;
> +}
> +
> +static __always_inline u32 ntb_edma_ring_used_entry(u32 head, u32 tail)
> +{
> + if (head >= tail) {
> + WARN_ON_ONCE((head - tail) > (NTB_EDMA_RING_ENTRIES - 1));
> + return head - tail;
> + }
> +
> + WARN_ON_ONCE((U32_MAX - tail + head + 1) > (NTB_EDMA_RING_ENTRIES - 1));
> + return U32_MAX - tail + head + 1;
> +}
> +
> +static __always_inline u32 ntb_edma_ring_free_entry(u32 head, u32 tail)
> +{
> + return NTB_EDMA_RING_ENTRIES - ntb_edma_ring_used_entry(head, tail) - 1;
> +}
> +
> +static __always_inline bool ntb_edma_ring_full(u32 head, u32 tail)
> +{
> + return ntb_edma_ring_free_entry(head, tail) == 0;
> +}
> +
> +static unsigned int ntb_transport_edma_tx_free_entry(struct ntb_transport_qp *qp)
> +{
> + struct ntb_transport_qp_edma *edma = qp->priv;
> + unsigned int head, tail;
> +
> + scoped_guard(spinlock_irqsave, &edma->tx_lock) {
> + /* In this scope, only 'head' may advance */
> + tail = READ_ONCE(edma->tx_issue);
> + head = READ_ONCE(*NTB_HEAD_TX_I(qp));
> + }
> + /*
> + * The 'used' amount indicates how much the other end has refilled,
> + * i.e. how many entries are available for us to use for TX.
> + */
> + return ntb_edma_ring_used_entry(head, tail);
> +}
> +
> +static void ntb_transport_edma_debugfs_stats_show(struct seq_file *s,
> + struct ntb_transport_qp *qp)
> +{
> + seq_printf(s, "rx_bytes - \t%llu\n", qp->rx_bytes);
> + seq_printf(s, "rx_pkts - \t%llu\n", qp->rx_pkts);
> + seq_printf(s, "rx_err_no_buf - %llu\n", qp->rx_err_no_buf);
> + seq_printf(s, "rx_buff - \t0x%p\n", qp->rx_buff);
> + seq_printf(s, "rx_max_entry - \t%u\n", qp->rx_max_entry);
> + seq_printf(s, "rx_alloc_entry - \t%u\n\n", qp->rx_alloc_entry);
> +
> + seq_printf(s, "tx_bytes - \t%llu\n", qp->tx_bytes);
> + seq_printf(s, "tx_pkts - \t%llu\n", qp->tx_pkts);
> + seq_printf(s, "tx_ring_full - \t%llu\n", qp->tx_ring_full);
> + seq_printf(s, "tx_err_no_buf - %llu\n", qp->tx_err_no_buf);
> + seq_printf(s, "tx_mw - \t0x%p\n", qp->tx_mw);
> + seq_printf(s, "tx_max_entry - \t%u\n", qp->tx_max_entry);
> + seq_printf(s, "free tx - \t%u\n", ntb_transport_tx_free_entry(qp));
> + seq_putc(s, '\n');
> +
> + seq_puts(s, "Using Remote eDMA - Yes\n");
> + seq_printf(s, "QP Link - \t%s\n", qp->link_is_up ? "Up" : "Down");
> +}
> +
> +static void ntb_transport_edma_uninit(struct ntb_transport_ctx *nt)
> +{
> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> +
> + if (ctx->wq)
> + destroy_workqueue(ctx->wq);
> + ctx->wq = NULL;
> +
> + ntb_edma_teardown_chans(&ctx->chans);
> +
> + switch (ctx->remote_edma_mode) {
> + case REMOTE_EDMA_EP:
> + ntb_edma_teardown_mws(nt->ndev);
> + break;
> + case REMOTE_EDMA_RC:
> + ntb_edma_teardown_peer(nt->ndev);
> + break;
> + case REMOTE_EDMA_UNKNOWN:
> + default:
> + break;
> + }
> +
> + ctx->remote_edma_mode = REMOTE_EDMA_UNKNOWN;
> +}
> +
> +static void ntb_transport_edma_db_work(struct work_struct *work)
> +{
> + struct ntb_transport_qp_edma *edma =
> + container_of(work, struct ntb_transport_qp_edma, db_work);
> + struct ntb_transport_qp *qp = edma->qp;
> +
> + ntb_peer_db_set(qp->ndev, qp->qp_bit);
> +}
> +
> +static void ntb_transport_edma_notify_peer(struct ntb_transport_qp_edma *edma)
> +{
> + struct ntb_transport_qp *qp = edma->qp;
> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
> +
> + if (!ntb_edma_notify_peer(&ctx->chans, qp->qp_num))
> + return;
> +
> + /*
> + * Called from contexts that may be atomic. Since ntb_peer_db_set()
> + * may sleep, delegate the actual doorbell write to a workqueue.
> + */
> + queue_work(system_highpri_wq, &edma->db_work);
> +}
> +
> +static void ntb_transport_edma_isr(void *data, int qp_num)
> +{
> + struct ntb_transport_ctx *nt = data;
> + struct ntb_transport_qp_edma *edma;
> + struct ntb_transport_ctx_edma *ctx;
> + struct ntb_transport_qp *qp;
> +
> + if (qp_num < 0 || qp_num >= nt->qp_count)
> + return;
> +
> + qp = &nt->qp_vec[qp_num];
> + if (WARN_ON(!qp))
> + return;
> +
> + ctx = qp->transport->priv;
> + edma = qp->priv;
> +
> + queue_work(ctx->wq, &edma->rx_work);
> + queue_work(ctx->wq, &edma->tx_work);
> +}
> +
> +static int ntb_transport_edma_rc_init(struct ntb_transport_ctx *nt)
> +{
> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> + struct ntb_dev *ndev = nt->ndev;
> + struct pci_dev *pdev = ndev->pdev;
> + int peer_mw;
> + int rc;
> +
> + if (!use_remote_edma || ctx->remote_edma_mode != REMOTE_EDMA_UNKNOWN)
> + return 0;
> +
> + peer_mw = ntb_peer_mw_count(ndev);
> + if (peer_mw <= 0)
> + return -ENODEV;
> +
> + rc = ntb_edma_setup_peer(ndev, peer_mw - 1, nt->qp_count);
> + if (rc) {
> + dev_err(&pdev->dev, "Failed to enable remote eDMA: %d\n", rc);
> + return rc;
> + }
> +
> + rc = ntb_edma_setup_chans(get_dma_dev(ndev), &ctx->chans, true);
> + if (rc) {
> + dev_err(&pdev->dev, "Failed to setup eDMA channels: %d\n", rc);
> + goto err_teardown_peer;
> + }
> +
> + rc = ntb_edma_setup_intr_chan(get_dma_dev(ndev), &ctx->chans);
> + if (rc) {
> + dev_err(&pdev->dev, "Failed to setup eDMA notify channel: %d\n",
> + rc);
> + goto err_teardown_chans;
> + }
> +
> + ctx->remote_edma_mode = REMOTE_EDMA_RC;
> + return 0;
> +
> +err_teardown_chans:
> + ntb_edma_teardown_chans(&ctx->chans);
> +err_teardown_peer:
> + ntb_edma_teardown_peer(ndev);
> + return rc;
> +}
> +
> +static int ntb_transport_edma_ep_init(struct ntb_transport_ctx *nt)
> +{
> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> + struct ntb_dev *ndev = nt->ndev;
> + struct pci_dev *pdev = ndev->pdev;
> + int peer_mw;
> + int rc;
> +
> + if (!use_remote_edma || ctx->remote_edma_mode == REMOTE_EDMA_EP)
> + return 0;
> +
> + /*
> + * This check assumes that the endpoint (pci-epf-vntb.c)
> + * ntb_dev_ops implements .get_private_data() while the host side
> + * (ntb_hw_epf.c) does not.
> + */
> + if (!ntb_get_private_data(ndev))
> + return 0;
> +
> + peer_mw = ntb_peer_mw_count(ndev);
> + if (peer_mw <= 0)
> + return -ENODEV;
> +
> + rc = ntb_edma_setup_mws(ndev, peer_mw - 1, nt->qp_count,
> + ntb_transport_edma_isr, nt);
> + if (rc) {
> + dev_err(&pdev->dev,
> + "Failed to set up memory window for eDMA: %d\n", rc);
> + return rc;
> + }
> +
> + rc = ntb_edma_setup_chans(get_dma_dev(ndev), &ctx->chans, false);
> + if (rc) {
> + dev_err(&pdev->dev, "Failed to setup eDMA channels: %d\n", rc);
> + ntb_edma_teardown_mws(ndev);
> + return rc;
> + }
> +
> + ctx->remote_edma_mode = REMOTE_EDMA_EP;
> + return 0;
> +}
> +
> +static int ntb_transport_edma_setup_qp_mw(struct ntb_transport_ctx *nt,
> + unsigned int qp_num)
> +{
> + struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
> + struct ntb_dev *ndev = nt->ndev;
> + struct ntb_queue_entry *entry;
> + struct ntb_transport_mw *mw;
> + unsigned int mw_num, mw_count, qp_count;
> + unsigned int qp_offset, rx_info_offset;
> + unsigned int mw_size, mw_size_per_qp;
> + unsigned int num_qps_mw;
> + size_t edma_total;
> + unsigned int i;
> + int node;
> +
> + mw_count = nt->mw_count;
> + qp_count = nt->qp_count;
> +
> + mw_num = QP_TO_MW(nt, qp_num);
> + mw = &nt->mw_vec[mw_num];
> +
> + if (!mw->virt_addr)
> + return -ENOMEM;
> +
> + if (mw_num < qp_count % mw_count)
> + num_qps_mw = qp_count / mw_count + 1;
> + else
> + num_qps_mw = qp_count / mw_count;
> +
> + mw_size = min(nt->mw_vec[mw_num].phys_size, mw->xlat_size);
> + if (max_mw_size && mw_size > max_mw_size)
> + mw_size = max_mw_size;
> +
> + mw_size_per_qp = round_down((unsigned int)mw_size / num_qps_mw, SZ_64);
> + qp_offset = mw_size_per_qp * (qp_num / mw_count);
> + rx_info_offset = mw_size_per_qp - sizeof(struct ntb_rx_info);
> +
> + qp->tx_mw_size = mw_size_per_qp;
> + qp->tx_mw = nt->mw_vec[mw_num].vbase + qp_offset;
> + if (!qp->tx_mw)
> + return -EINVAL;
> + qp->tx_mw_phys = nt->mw_vec[mw_num].phys_addr + qp_offset;
> + if (!qp->tx_mw_phys)
> + return -EINVAL;
> + qp->rx_info = qp->tx_mw + rx_info_offset;
> + qp->rx_buff = mw->virt_addr + qp_offset;
> + qp->remote_rx_info = qp->rx_buff + rx_info_offset;
> +
> + /* Due to housekeeping, there must be at least 2 buffers */
> + qp->tx_max_frame = min(transport_mtu, mw_size_per_qp / 2);
> + qp->rx_max_frame = min(transport_mtu, mw_size_per_qp / 2);
> +
> + /* In eDMA mode, decouple from MW sizing and force ring-sized entries */
> + edma_total = 2 * sizeof(struct ntb_edma_ring);
> + if (rx_info_offset < edma_total) {
> + dev_err(&ndev->dev, "Ring space requires %zuB but only %uB is available\n",
> + edma_total, rx_info_offset);
> + return -EINVAL;
> + }
> + qp->tx_max_entry = NTB_EDMA_RING_ENTRIES;
> + qp->rx_max_entry = NTB_EDMA_RING_ENTRIES;
> +
> + /*
> + * Checking to see if we have more entries than the default.
> + * We should add additional entries if that is the case so we
> + * can be in sync with the transport frames.
> + */
> + node = dev_to_node(&ndev->dev);
> + for (i = qp->rx_alloc_entry; i < qp->rx_max_entry; i++) {
> + entry = kzalloc_node(sizeof(*entry), GFP_KERNEL, node);
> + if (!entry)
> + return -ENOMEM;
> +
> + entry->qp = qp;
> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry,
> + &qp->rx_free_q);
> + qp->rx_alloc_entry++;
> + }
> +
> + memset(qp->rx_buff, 0, edma_total);
> +
> + qp->rx_pkts = 0;
> + qp->tx_pkts = 0;
> +
> + return 0;
> +}
> +
> +static int ntb_transport_edma_rx_complete(struct ntb_transport_qp *qp)
> +{
> + struct device *dma_dev = get_dma_dev(qp->ndev);
> + struct ntb_transport_qp_edma *edma = qp->priv;
> + struct ntb_queue_entry *entry;
> + struct ntb_edma_desc *in;
> + unsigned int len;
> + bool link_down;
> + u32 idx;
> +
> + if (ntb_edma_ring_used_entry(READ_ONCE(*NTB_TAIL_RX_I(qp)),
> + edma->rx_cons) == 0)
> + return 0;
> +
> + idx = ntb_edma_ring_idx(edma->rx_cons);
> + in = NTB_DESC_RX_I(qp, idx);
> + if (!(in->flags & DESC_DONE_FLAG))
> + return 0;
> +
> + link_down = in->flags & LINK_DOWN_FLAG;
> + in->flags = 0;
> + len = in->len; /* might be smaller than entry->len */
> +
> + entry = (struct ntb_queue_entry *)(uintptr_t)in->data;
> + if (WARN_ON(!entry))
> + return 0;
> +
> + if (link_down) {
> + ntb_qp_link_down(qp);
> + edma->rx_cons++;
> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_free_q);
> + return 1;
> + }
> +
> + dma_unmap_single(dma_dev, entry->addr, entry->len, DMA_FROM_DEVICE);
> +
> + qp->rx_bytes += len;
> + qp->rx_pkts++;
> + edma->rx_cons++;
> +
> + if (qp->rx_handler && qp->client_ready)
> + qp->rx_handler(qp, qp->cb_data, entry->cb_data, len);
> +
> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_free_q);
> + return 1;
> +}
> +
> +static void ntb_transport_edma_rx_work(struct work_struct *work)
> +{
> + struct ntb_transport_qp_edma *edma = container_of(
> + work, struct ntb_transport_qp_edma, rx_work);
> + struct ntb_transport_qp *qp = edma->qp;
> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
> + unsigned int i;
> +
> + for (i = 0; i < NTB_EDMA_MAX_POLL; i++) {
> + if (!ntb_transport_edma_rx_complete(qp))
> + break;
> + }
> +
> + if (ntb_transport_edma_rx_complete(qp))
> + queue_work(ctx->wq, &edma->rx_work);
> +}
> +
> +static void ntb_transport_edma_tx_work(struct work_struct *work)
> +{
> + struct ntb_transport_qp_edma *edma = container_of(
> + work, struct ntb_transport_qp_edma, tx_work);
> + struct ntb_transport_qp *qp = edma->qp;
> + struct ntb_edma_desc *in, __iomem *out;
> + struct ntb_queue_entry *entry;
> + unsigned int len;
> + void *cb_data;
> + u32 idx;
> +
> + while (ntb_edma_ring_used_entry(READ_ONCE(edma->tx_issue),
> + edma->tx_cons) != 0) {
> + /* Paired with smp_wmb() in ntb_transport_edma_tx_enqueue_inner() */
> + smp_rmb();
> +
> + idx = ntb_edma_ring_idx(edma->tx_cons);
> + in = NTB_DESC_TX_I(qp, idx);
> + entry = (struct ntb_queue_entry *)(uintptr_t)in->data;
> + if (!entry || !(entry->flags & DESC_DONE_FLAG))
> + break;
> +
> + in->data = 0;
> +
> + cb_data = entry->cb_data;
> + len = entry->len;
> +
> + out = NTB_DESC_TX_O(qp, idx);
> +
> + WRITE_ONCE(edma->tx_cons, edma->tx_cons + 1);
> +
> + /*
> + * No need to add barrier in-between to enforce ordering here.
> + * The other side proceeds only after both flags and tail are
> + * updated.
> + */
> + iowrite32(entry->flags, &out->flags);
> + iowrite32(edma->tx_cons, NTB_TAIL_TX_O(qp));
> +
> + ntb_transport_edma_notify_peer(edma);
> +
> + ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry,
> + &qp->tx_free_q);
> +
> + if (qp->tx_handler)
> + qp->tx_handler(qp, qp->cb_data, cb_data, len);
> +
> + /* stat updates */
> + qp->tx_bytes += len;
> + qp->tx_pkts++;
> + }
> +}
> +
> +static void ntb_transport_edma_tx_cb(void *data,
> + const struct dmaengine_result *res)
> +{
> + struct ntb_queue_entry *entry = data;
> + struct ntb_transport_qp *qp = entry->qp;
> + struct ntb_transport_ctx *nt = qp->transport;
> + struct device *dma_dev = get_dma_dev(qp->ndev);
> + enum dmaengine_tx_result dma_err = res->result;
> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> + struct ntb_transport_qp_edma *edma = qp->priv;
> +
> + switch (dma_err) {
> + case DMA_TRANS_READ_FAILED:
> + case DMA_TRANS_WRITE_FAILED:
> + case DMA_TRANS_ABORTED:
> + entry->errors++;
> + entry->len = -EIO;
> + break;
> + case DMA_TRANS_NOERROR:
> + default:
> + break;
> + }
> + dma_unmap_sg(dma_dev, &entry->sgl, 1, DMA_TO_DEVICE);
> + sg_dma_address(&entry->sgl) = 0;
> +
> + entry->flags |= DESC_DONE_FLAG;
> +
> + queue_work(ctx->wq, &edma->tx_work);
> +}
> +
> +static int ntb_transport_edma_submit(struct device *d, struct dma_chan *chan,
> + size_t len, void *rc_src, dma_addr_t dst,
> + struct ntb_queue_entry *entry)
> +{
> + struct scatterlist *sgl = &entry->sgl;
> + struct dma_async_tx_descriptor *txd;
> + struct dma_slave_config cfg;
> + dma_cookie_t cookie;
> + int nents, rc;
> +
> + if (!d)
> + return -ENODEV;
> +
> + if (!chan)
> + return -ENXIO;
> +
> + if (WARN_ON(!rc_src || !dst))
> + return -EINVAL;
> +
> + if (WARN_ON(sg_dma_address(sgl)))
> + return -EINVAL;
> +
> + sg_init_one(sgl, rc_src, len);
> + nents = dma_map_sg(d, sgl, 1, DMA_TO_DEVICE);
> + if (nents <= 0)
> + return -EIO;
> +
> + memset(&cfg, 0, sizeof(cfg));
> + cfg.dst_addr = dst;
> + cfg.src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
> + cfg.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
> + cfg.direction = DMA_MEM_TO_DEV;
> +
> + txd = dmaengine_prep_slave_sg_config(chan, sgl, 1, DMA_MEM_TO_DEV,
> + DMA_CTRL_ACK | DMA_PREP_INTERRUPT, &cfg);
> + if (!txd) {
> + rc = -EIO;
> + goto out_unmap;
> + }
> +
> + txd->callback_result = ntb_transport_edma_tx_cb;
> + txd->callback_param = entry;
> +
> + cookie = dmaengine_submit(txd);
> + if (dma_submit_error(cookie)) {
> + rc = -EIO;
> + goto out_unmap;
> + }
> + dma_async_issue_pending(chan);
> + return 0;
> +out_unmap:
> + dma_unmap_sg(d, sgl, 1, DMA_TO_DEVICE);
> + return rc;
> +}
> +
> +static int ntb_transport_edma_tx_enqueue_inner(struct ntb_transport_qp *qp,
> + struct ntb_queue_entry *entry)
> +{
> + struct device *dma_dev = get_dma_dev(qp->ndev);
> + struct ntb_transport_qp_edma *edma = qp->priv;
> + struct ntb_transport_ctx *nt = qp->transport;
> + struct ntb_edma_desc *in, __iomem *out;
> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> + unsigned int len = entry->len;
> + struct dma_chan *chan;
> + u32 issue, idx, head;
> + dma_addr_t dst;
> + int rc;
> +
> + WARN_ON_ONCE(entry->flags & DESC_DONE_FLAG);
> +
> + scoped_guard(spinlock_irqsave, &edma->tx_lock) {
> + head = READ_ONCE(*NTB_HEAD_TX_I(qp));
> + issue = edma->tx_issue;
> + if (ntb_edma_ring_used_entry(head, issue) == 0) {
> + qp->tx_ring_full++;
> + return -ENOSPC;
> + }
> +
> + /*
> + * ntb_transport_edma_tx_work() checks entry->flags
> + * so it needs to be set before tx_issue++.
> + */
> + idx = ntb_edma_ring_idx(issue);
> + in = NTB_DESC_TX_I(qp, idx);
> + in->data = (uintptr_t)entry;
> +
> + /* Make in->data visible before tx_issue++ */
> + smp_wmb();
> +
> + WRITE_ONCE(edma->tx_issue, edma->tx_issue + 1);
> + }
> +
> + /* Publish the final transfer length to the other end */
> + out = NTB_DESC_TX_O(qp, idx);
> + iowrite32(len, &out->len);
> + ioread32(&out->len);
> +
> + if (unlikely(!len)) {
> + entry->flags |= DESC_DONE_FLAG;
> + queue_work(ctx->wq, &edma->tx_work);
> + return 0;
> + }
> +
> + /* Paired with dma_wmb() in ntb_transport_edma_rx_enqueue_inner() */
> + dma_rmb();
> +
> + /* kick remote eDMA read transfer */
> + dst = (dma_addr_t)in->addr;
> + chan = ntb_edma_pick_chan(&ctx->chans, qp->qp_num);
> + rc = ntb_transport_edma_submit(dma_dev, chan, len,
> + entry->buf, dst, entry);
> + if (rc) {
> + entry->errors++;
> + entry->len = -EIO;
> + entry->flags |= DESC_DONE_FLAG;
> + queue_work(ctx->wq, &edma->tx_work);
> + }
> + return 0;
> +}
> +
> +static int ntb_transport_edma_tx_enqueue(struct ntb_transport_qp *qp,
> + struct ntb_queue_entry *entry,
> + void *cb, void *data, unsigned int len,
> + unsigned int flags)
> +{
> + struct device *dma_dev;
> +
> + if (entry->addr) {
> + /* Deferred unmap */
> + dma_dev = get_dma_dev(qp->ndev);
> + dma_unmap_single(dma_dev, entry->addr, entry->len,
> + DMA_TO_DEVICE);
> + }
> +
> + entry->cb_data = cb;
> + entry->buf = data;
> + entry->len = len;
> + entry->flags = flags;
> + entry->errors = 0;
> + entry->addr = 0;
> +
> + WARN_ON_ONCE(!ntb_qp_edma_enabled(qp));
> +
> + return ntb_transport_edma_tx_enqueue_inner(qp, entry);
> +}
> +
> +static int ntb_transport_edma_rx_enqueue_inner(struct ntb_transport_qp *qp,
> + struct ntb_queue_entry *entry)
> +{
> + struct device *dma_dev = get_dma_dev(qp->ndev);
> + struct ntb_transport_qp_edma *edma = qp->priv;
> + struct ntb_edma_desc *in, __iomem *out;
> + unsigned int len = entry->len;
> + void *data = entry->buf;
> + dma_addr_t dst;
> + u32 idx;
> + int rc;
> +
> + dst = dma_map_single(dma_dev, data, len, DMA_FROM_DEVICE);
> + rc = dma_mapping_error(dma_dev, dst);
> + if (rc)
> + return rc;
> +
> + guard(spinlock_bh)(&edma->rx_lock);
> +
> + if (ntb_edma_ring_full(READ_ONCE(edma->rx_prod),
> + READ_ONCE(edma->rx_cons))) {
> + rc = -ENOSPC;
> + goto out_unmap;
> + }
> +
> + idx = ntb_edma_ring_idx(edma->rx_prod);
> + in = NTB_DESC_RX_I(qp, idx);
> + out = NTB_DESC_RX_O(qp, idx);
> +
> + iowrite32(len, &out->len);
> + iowrite64(dst, &out->addr);
> +
> + WARN_ON(in->flags & DESC_DONE_FLAG);
> + in->data = (uintptr_t)entry;
> + entry->addr = dst;
> +
> + /* Ensure len/addr are visible before the head update */
> + dma_wmb();
> +
> + WRITE_ONCE(edma->rx_prod, edma->rx_prod + 1);
> + iowrite32(edma->rx_prod, NTB_HEAD_RX_O(qp));
> +
> + return 0;
> +out_unmap:
> + dma_unmap_single(dma_dev, dst, len, DMA_FROM_DEVICE);
> + return rc;
> +}
> +
> +static int ntb_transport_edma_rx_enqueue(struct ntb_transport_qp *qp,
> + struct ntb_queue_entry *entry)
> +{
> + int rc;
> +
> + rc = ntb_transport_edma_rx_enqueue_inner(qp, entry);
> + if (rc) {
> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry,
> + &qp->rx_free_q);
> + return rc;
> + }
> +
> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_pend_q);
> +
> + if (qp->active)
> + tasklet_schedule(&qp->rxc_db_work);
> +
> + return 0;
> +}
> +
> +static void ntb_transport_edma_rx_poll(struct ntb_transport_qp *qp)
> +{
> + struct ntb_transport_ctx *nt = qp->transport;
> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> + struct ntb_transport_qp_edma *edma = qp->priv;
> +
> + queue_work(ctx->wq, &edma->rx_work);
> + queue_work(ctx->wq, &edma->tx_work);
> +}
> +
> +static int ntb_transport_edma_qp_init(struct ntb_transport_ctx *nt,
> + unsigned int qp_num)
> +{
> + struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
> + struct ntb_transport_qp_edma *edma;
> + struct ntb_dev *ndev = nt->ndev;
> + int node;
> +
> + node = dev_to_node(&ndev->dev);
> +
> + qp->priv = kzalloc_node(sizeof(*edma), GFP_KERNEL, node);
> + if (!qp->priv)
> + return -ENOMEM;
> +
> + edma = qp->priv;
> + edma->qp = qp;
> + edma->rx_prod = 0;
> + edma->rx_cons = 0;
> + edma->tx_cons = 0;
> + edma->tx_issue = 0;
> +
> + spin_lock_init(&edma->rx_lock);
> + spin_lock_init(&edma->tx_lock);
> +
> + INIT_WORK(&edma->db_work, ntb_transport_edma_db_work);
> + INIT_WORK(&edma->rx_work, ntb_transport_edma_rx_work);
> + INIT_WORK(&edma->tx_work, ntb_transport_edma_tx_work);
> +
> + return 0;
> +}
> +
> +static void ntb_transport_edma_qp_free(struct ntb_transport_qp *qp)
> +{
> + struct ntb_transport_qp_edma *edma = qp->priv;
> +
> + cancel_work_sync(&edma->db_work);
> + cancel_work_sync(&edma->rx_work);
> + cancel_work_sync(&edma->tx_work);
> +
> + kfree(qp->priv);
> +}
> +
> +static int ntb_transport_edma_pre_link_up(struct ntb_transport_ctx *nt)
> +{
> + struct ntb_dev *ndev = nt->ndev;
> + struct pci_dev *pdev = ndev->pdev;
> + int rc;
> +
> + rc = ntb_transport_edma_ep_init(nt);
> + if (rc)
> + dev_err(&pdev->dev, "Failed to init EP: %d\n", rc);
> +
> + return rc;
> +}
> +
> +static int ntb_transport_edma_post_link_up(struct ntb_transport_ctx *nt)
> +{
> + struct ntb_dev *ndev = nt->ndev;
> + struct pci_dev *pdev = ndev->pdev;
> + int rc;
> +
> + rc = ntb_transport_edma_rc_init(nt);
> + if (rc)
> + dev_err(&pdev->dev, "Failed to init RC: %d\n", rc);
> +
> + return rc;
> +}
> +
> +static int ntb_transport_edma_enable(struct ntb_transport_ctx *nt,
> + unsigned int *mw_count)
> +{
> + struct ntb_dev *ndev = nt->ndev;
> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> +
> + if (!use_remote_edma)
> + return 0;
> +
> + /*
> + * We need at least one MW for the transport plus one MW reserved
> + * for the remote eDMA window (see ntb_edma_setup_mws/peer).
> + */
> + if (*mw_count <= 1) {
> + dev_err(&ndev->dev,
> + "remote eDMA requires at least two MWs (have %u)\n",
> + *mw_count);
> + return -ENODEV;
> + }
> +
> + ctx->wq = alloc_workqueue("ntb-edma-wq", WQ_UNBOUND | WQ_SYSFS, 0);
> + if (!ctx->wq) {
> + ntb_transport_edma_uninit(nt);
> + return -ENOMEM;
> + }
> +
> + /* Reserve the last peer MW exclusively for the eDMA window. */
> + *mw_count -= 1;
> +
> + return 0;
> +}
> +
> +static void ntb_transport_edma_disable(struct ntb_transport_ctx *nt)
> +{
> + ntb_transport_edma_uninit(nt);
> +}
> +
> +static const struct ntb_transport_backend_ops edma_backend_ops = {
> + .enable = ntb_transport_edma_enable,
> + .disable = ntb_transport_edma_disable,
> + .qp_init = ntb_transport_edma_qp_init,
> + .qp_free = ntb_transport_edma_qp_free,
> + .pre_link_up = ntb_transport_edma_pre_link_up,
> + .post_link_up = ntb_transport_edma_post_link_up,
> + .setup_qp_mw = ntb_transport_edma_setup_qp_mw,
> + .tx_free_entry = ntb_transport_edma_tx_free_entry,
> + .tx_enqueue = ntb_transport_edma_tx_enqueue,
> + .rx_enqueue = ntb_transport_edma_rx_enqueue,
> + .rx_poll = ntb_transport_edma_rx_poll,
> + .debugfs_stats_show = ntb_transport_edma_debugfs_stats_show,
> +};
> +
> +int ntb_transport_edma_init(struct ntb_transport_ctx *nt)
> +{
> + struct ntb_dev *ndev = nt->ndev;
> + int node;
> +
> + node = dev_to_node(&ndev->dev);
> + nt->priv = kzalloc_node(sizeof(struct ntb_transport_ctx_edma), GFP_KERNEL,
> + node);
> + if (!nt->priv)
> + return -ENOMEM;
> +
> + nt->backend_ops = edma_backend_ops;
> + /*
> + * In remote eDMA mode, one DMA read channel is used by the host
> + * side to interrupt the EP.
> + */
> + use_msi = false;
> + return 0;
> +}
> diff --git a/drivers/ntb/ntb_transport_internal.h b/drivers/ntb/ntb_transport_internal.h
> index 51ff08062d73..9fff65980d3d 100644
> --- a/drivers/ntb/ntb_transport_internal.h
> +++ b/drivers/ntb/ntb_transport_internal.h
> @@ -8,6 +8,7 @@
> extern unsigned long max_mw_size;
> extern unsigned int transport_mtu;
> extern bool use_msi;
> +extern bool use_remote_edma;
>
> #define QP_TO_MW(nt, qp) ((qp) % nt->mw_count)
>
> @@ -29,6 +30,11 @@ struct ntb_queue_entry {
> struct ntb_payload_header __iomem *tx_hdr;
> struct ntb_payload_header *rx_hdr;
> };
> +
> +#ifdef CONFIG_NTB_TRANSPORT_EDMA
> + dma_addr_t addr;
> + struct scatterlist sgl;
> +#endif
> };
>
> struct ntb_rx_info {
> @@ -202,4 +208,13 @@ int ntb_transport_init_queue(struct ntb_transport_ctx *nt,
> unsigned int qp_num);
> struct device *get_dma_dev(struct ntb_dev *ndev);
>
> +#ifdef CONFIG_NTB_TRANSPORT_EDMA
> +int ntb_transport_edma_init(struct ntb_transport_ctx *nt);
> +#else
> +static inline int ntb_transport_edma_init(struct ntb_transport_ctx *nt)
> +{
> + return -EOPNOTSUPP;
> +}
> +#endif /* CONFIG_NTB_TRANSPORT_EDMA */
> +
> #endif /* _NTB_TRANSPORT_INTERNAL_H_ */
> --
> 2.51.0
>
^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v3 26/35] NTB: ntb_transport: Introduce DW eDMA backed transport mode
2025-12-19 15:00 ` Frank Li
@ 2025-12-20 15:28 ` Koichiro Den
2026-01-06 18:46 ` Dave Jiang
0 siblings, 1 reply; 61+ messages in thread
From: Koichiro Den @ 2025-12-20 15:28 UTC (permalink / raw)
To: Frank Li
Cc: dave.jiang, ntb, linux-pci, dmaengine, linux-renesas-soc, netdev,
linux-kernel, mani, kwilczynski, kishon, bhelgaas, corbet,
geert+renesas, magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro,
will, robin.murphy, jdmason, allenbh, andrew+netdev, davem,
edumazet, kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k,
kurt.schwemmer, logang, jingoohan1, lpieralisi, utkarsh02t,
jbrunet, dlemoal, arnd, elfring
On Fri, Dec 19, 2025 at 10:00:52AM -0500, Frank Li wrote:
> On Thu, Dec 18, 2025 at 12:16:00AM +0900, Koichiro Den wrote:
> > Add a new ntb_transport backend that uses a DesignWare eDMA engine
> > located on the endpoint, to be driven by both host and endpoint.
> >
> > The endpoint exposes a dedicated memory window which contains the eDMA
> > register block, a small control structure (struct ntb_edma_info) and
> > per-channel linked-list (LL) rings for read channels. The endpoint drives
> > its local eDMA write channels for its transmissions, while the host side
> > uses the remote eDMA read channels for its transmissions.
>
> I just glanced at the code. It looks like you use the standard DMA API and
> a per-channel linked-list (LL) ring, which can be pure software.
>
> So it is not necessary to bind to DesignWare eDMA. Maybe other vendors'
> PCIe built-in DMAs could work with this code?
Yes, the DesignWare-specific parts are encapsulated under
drivers/ntb/hw/edma/, so ntb_transport_edma itself is not tightly
coupled to DesignWare eDMA. In other words, if any such coupling
remains, that is simply a mistake on my part.

I intentionally avoided introducing an extra abstraction layer
prematurely. If we later want to support other vendors' PCIe built-in
DMA engines behind edma_backend_ops, an internal abstraction underneath
the edma_backend_ops implementation can be introduced at that point.
Do you think I should do so now?
Koichiro
>
> Frank
> >
> > A key benefit of this backend is that the memory window no longer needs
> > to carry data-plane payload. This makes the design less sensitive to
> > limited memory window space and allows scaling to multiple queue pairs.
> > The memory window layout is specific to the eDMA-backed backend, so
> > there is no automatic fallback to the memcpy-based default transport
> > that requires the different layout.
> >
> > Signed-off-by: Koichiro Den <den@valinux.co.jp>
> > ---
> > drivers/ntb/Kconfig | 12 +
> > drivers/ntb/Makefile | 2 +
> > drivers/ntb/ntb_transport_core.c | 15 +-
> > drivers/ntb/ntb_transport_edma.c | 987 +++++++++++++++++++++++++++
> > drivers/ntb/ntb_transport_internal.h | 15 +
> > 5 files changed, 1029 insertions(+), 2 deletions(-)
> > create mode 100644 drivers/ntb/ntb_transport_edma.c
> >
> > diff --git a/drivers/ntb/Kconfig b/drivers/ntb/Kconfig
> > index df16c755b4da..5ba6d0b7f5ba 100644
> > --- a/drivers/ntb/Kconfig
> > +++ b/drivers/ntb/Kconfig
> > @@ -37,4 +37,16 @@ config NTB_TRANSPORT
> >
> > If unsure, say N.
> >
> > +config NTB_TRANSPORT_EDMA
> > + bool "NTB Transport backed by remote eDMA"
> > + depends on NTB_TRANSPORT
> > + depends on PCI
> > + select DMA_ENGINE
> > + select NTB_EDMA
> > + help
> > + Enable a transport backend that uses a remote DesignWare eDMA engine
> > + exposed through a dedicated NTB memory window. The host uses the
> > + endpoint's eDMA engine to move data in both directions.
> > + Say Y here if you intend to use the 'use_remote_edma' module parameter.
> > +
> > endif # NTB
> > diff --git a/drivers/ntb/Makefile b/drivers/ntb/Makefile
> > index 9b66e5fafbc0..b9086b32ecde 100644
> > --- a/drivers/ntb/Makefile
> > +++ b/drivers/ntb/Makefile
> > @@ -6,3 +6,5 @@ ntb-y := core.o
> > ntb-$(CONFIG_NTB_MSI) += msi.o
> >
> > ntb_transport-y := ntb_transport_core.o
> > +ntb_transport-$(CONFIG_NTB_TRANSPORT_EDMA) += ntb_transport_edma.o
> > +ntb_transport-$(CONFIG_NTB_TRANSPORT_EDMA) += hw/edma/ntb_hw_edma.o
> > diff --git a/drivers/ntb/ntb_transport_core.c b/drivers/ntb/ntb_transport_core.c
> > index 40c2548f5930..bd21232f26fe 100644
> > --- a/drivers/ntb/ntb_transport_core.c
> > +++ b/drivers/ntb/ntb_transport_core.c
> > @@ -104,6 +104,12 @@ module_param(use_msi, bool, 0644);
> > MODULE_PARM_DESC(use_msi, "Use MSI interrupts instead of doorbells");
> > #endif
> >
> > +bool use_remote_edma;
> > +#ifdef CONFIG_NTB_TRANSPORT_EDMA
> > +module_param(use_remote_edma, bool, 0644);
> > +MODULE_PARM_DESC(use_remote_edma, "Use remote eDMA mode (when enabled, use_msi is ignored)");
> > +#endif
> > +
> > static struct dentry *nt_debugfs_dir;
> >
> > /* Only two-ports NTB devices are supported */
> > @@ -156,7 +162,7 @@ enum {
> > #define drv_client(__drv) \
> > container_of((__drv), struct ntb_transport_client, driver)
> >
> > -#define NTB_QP_DEF_NUM_ENTRIES 100
> > +#define NTB_QP_DEF_NUM_ENTRIES 128
> > #define NTB_LINK_DOWN_TIMEOUT 10
> >
> > static void ntb_transport_rxc_db(unsigned long data);
> > @@ -1189,7 +1195,11 @@ static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
> >
> > nt->ndev = ndev;
> >
> > - rc = ntb_transport_default_init(nt);
> > + if (use_remote_edma)
> > + rc = ntb_transport_edma_init(nt);
> > + else
> > + rc = ntb_transport_default_init(nt);
> > +
> > if (rc)
> > return rc;
> >
> > @@ -1950,6 +1960,7 @@ ntb_transport_create_queue(void *data, struct device *client_dev,
> >
> > nt->qp_bitmap_free &= ~qp_bit;
> >
> > + qp->qp_bit = qp_bit;
> > qp->cb_data = data;
> > qp->rx_handler = handlers->rx_handler;
> > qp->tx_handler = handlers->tx_handler;
> > diff --git a/drivers/ntb/ntb_transport_edma.c b/drivers/ntb/ntb_transport_edma.c
> > new file mode 100644
> > index 000000000000..6ae5da0a1367
> > --- /dev/null
> > +++ b/drivers/ntb/ntb_transport_edma.c
> > @@ -0,0 +1,987 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * NTB transport backend for remote DesignWare eDMA.
> > + *
> > + * This implements the backend_ops used when use_remote_edma=1 and
> > + * relies on drivers/ntb/hw/edma/ for low-level eDMA/MW programming.
> > + */
> > +
> > +#include <linux/bug.h>
> > +#include <linux/compiler.h>
> > +#include <linux/debugfs.h>
> > +#include <linux/dmaengine.h>
> > +#include <linux/dma-mapping.h>
> > +#include <linux/errno.h>
> > +#include <linux/io-64-nonatomic-lo-hi.h>
> > +#include <linux/ntb.h>
> > +#include <linux/pci.h>
> > +#include <linux/pci-epc.h>
> > +#include <linux/seq_file.h>
> > +#include <linux/slab.h>
> > +
> > +#include "hw/edma/ntb_hw_edma.h"
> > +#include "ntb_transport_internal.h"
> > +
> > +#define NTB_EDMA_RING_ORDER 7
> > +#define NTB_EDMA_RING_ENTRIES (1U << NTB_EDMA_RING_ORDER)
> > +#define NTB_EDMA_RING_MASK (NTB_EDMA_RING_ENTRIES - 1)
> > +
> > +#define NTB_EDMA_MAX_POLL 32
> > +
> > +/*
> > + * Remote eDMA mode implementation
> > + */
> > +struct ntb_transport_ctx_edma {
> > + remote_edma_mode_t remote_edma_mode;
> > + struct device *dma_dev;
> > + struct workqueue_struct *wq;
> > + struct ntb_edma_chans chans;
> > +};
> > +
> > +struct ntb_transport_qp_edma {
> > + struct ntb_transport_qp *qp;
> > +
> > + /*
> > + * For ensuring peer notification in non-atomic context.
> > + * ntb_peer_db_set might sleep or schedule.
> > + */
> > + struct work_struct db_work;
> > +
> > + u32 rx_prod;
> > + u32 rx_cons;
> > + u32 tx_cons;
> > + u32 tx_issue;
> > +
> > + spinlock_t rx_lock;
> > + spinlock_t tx_lock;
> > +
> > + struct work_struct rx_work;
> > + struct work_struct tx_work;
> > +};
> > +
> > +struct ntb_edma_desc {
> > + u32 len;
> > + u32 flags;
> > + u64 addr; /* DMA address */
> > + u64 data;
> > +};
> > +
> > +struct ntb_edma_ring {
> > + struct ntb_edma_desc desc[NTB_EDMA_RING_ENTRIES];
> > + u32 head;
> > + u32 tail;
> > +};
> > +
> > +static inline bool ntb_qp_edma_is_rc(struct ntb_transport_qp *qp)
> > +{
> > + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
> > +
> > + return ctx->remote_edma_mode == REMOTE_EDMA_RC;
> > +}
> > +
> > +static inline bool ntb_qp_edma_is_ep(struct ntb_transport_qp *qp)
> > +{
> > + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
> > +
> > + return ctx->remote_edma_mode == REMOTE_EDMA_EP;
> > +}
> > +
> > +static inline bool ntb_qp_edma_enabled(struct ntb_transport_qp *qp)
> > +{
> > + return ntb_qp_edma_is_rc(qp) || ntb_qp_edma_is_ep(qp);
> > +}
> > +
> > +static inline unsigned int ntb_edma_ring_sel(struct ntb_transport_qp *qp,
> > + unsigned int n)
> > +{
> > + return n ^ !!ntb_qp_edma_is_ep(qp);
> > +}
> > +
> > +static inline struct ntb_edma_ring *
> > +ntb_edma_ring_local(struct ntb_transport_qp *qp, unsigned int n)
> > +{
> > + unsigned int r = ntb_edma_ring_sel(qp, n);
> > +
> > + return &((struct ntb_edma_ring *)qp->rx_buff)[r];
> > +}
> > +
> > +static inline struct ntb_edma_ring __iomem *
> > +ntb_edma_ring_remote(struct ntb_transport_qp *qp, unsigned int n)
> > +{
> > + unsigned int r = ntb_edma_ring_sel(qp, n);
> > +
> > + return &((struct ntb_edma_ring __iomem *)qp->tx_mw)[r];
> > +}
> > +
> > +static inline struct ntb_edma_desc *
> > +ntb_edma_desc_local(struct ntb_transport_qp *qp, unsigned int n, unsigned int i)
> > +{
> > + return &ntb_edma_ring_local(qp, n)->desc[i];
> > +}
> > +
> > +static inline struct ntb_edma_desc __iomem *
> > +ntb_edma_desc_remote(struct ntb_transport_qp *qp, unsigned int n,
> > + unsigned int i)
> > +{
> > + return &ntb_edma_ring_remote(qp, n)->desc[i];
> > +}
> > +
> > +static inline u32 *ntb_edma_head_local(struct ntb_transport_qp *qp,
> > + unsigned int n)
> > +{
> > + return &ntb_edma_ring_local(qp, n)->head;
> > +}
> > +
> > +static inline u32 __iomem *ntb_edma_head_remote(struct ntb_transport_qp *qp,
> > + unsigned int n)
> > +{
> > + return &ntb_edma_ring_remote(qp, n)->head;
> > +}
> > +
> > +static inline u32 *ntb_edma_tail_local(struct ntb_transport_qp *qp,
> > + unsigned int n)
> > +{
> > + return &ntb_edma_ring_local(qp, n)->tail;
> > +}
> > +
> > +static inline u32 __iomem *ntb_edma_tail_remote(struct ntb_transport_qp *qp,
> > + unsigned int n)
> > +{
> > + return &ntb_edma_ring_remote(qp, n)->tail;
> > +}
> > +
> > +/* The 'i' must be generated by ntb_edma_ring_idx() */
> > +#define NTB_DESC_TX_O(qp, i) ntb_edma_desc_remote(qp, 0, i)
> > +#define NTB_DESC_TX_I(qp, i) ntb_edma_desc_local(qp, 0, i)
> > +#define NTB_DESC_RX_O(qp, i) ntb_edma_desc_remote(qp, 1, i)
> > +#define NTB_DESC_RX_I(qp, i) ntb_edma_desc_local(qp, 1, i)
> > +
> > +#define NTB_HEAD_TX_I(qp) ntb_edma_head_local(qp, 0)
> > +#define NTB_HEAD_RX_O(qp) ntb_edma_head_remote(qp, 1)
> > +
> > +#define NTB_TAIL_TX_O(qp) ntb_edma_tail_remote(qp, 0)
> > +#define NTB_TAIL_RX_I(qp) ntb_edma_tail_local(qp, 1)
> > +
> > +/* ntb_edma_ring helpers */
> > +static __always_inline u32 ntb_edma_ring_idx(u32 v)
> > +{
> > + return v & NTB_EDMA_RING_MASK;
> > +}
> > +
> > +static __always_inline u32 ntb_edma_ring_used_entry(u32 head, u32 tail)
> > +{
> > + if (head >= tail) {
> > + WARN_ON_ONCE((head - tail) > (NTB_EDMA_RING_ENTRIES - 1));
> > + return head - tail;
> > + }
> > +
> > + WARN_ON_ONCE((U32_MAX - tail + head + 1) > (NTB_EDMA_RING_ENTRIES - 1));
> > + return U32_MAX - tail + head + 1;
> > +}
> > +
> > +static __always_inline u32 ntb_edma_ring_free_entry(u32 head, u32 tail)
> > +{
> > + return NTB_EDMA_RING_ENTRIES - ntb_edma_ring_used_entry(head, tail) - 1;
> > +}
> > +
> > +static __always_inline bool ntb_edma_ring_full(u32 head, u32 tail)
> > +{
> > + return ntb_edma_ring_free_entry(head, tail) == 0;
> > +}
> > +
> > +static unsigned int ntb_transport_edma_tx_free_entry(struct ntb_transport_qp *qp)
> > +{
> > + struct ntb_transport_qp_edma *edma = qp->priv;
> > + unsigned int head, tail;
> > +
> > + scoped_guard(spinlock_irqsave, &edma->tx_lock) {
> > + /* In this scope, only 'head' might proceed */
> > + tail = READ_ONCE(edma->tx_issue);
> > + head = READ_ONCE(*NTB_HEAD_TX_I(qp));
> > + }
> > + /*
> > +	 * The 'used' count indicates how many entries the other end has
> > +	 * refilled and are thus available for us to use for TX.
> > + */
> > + return ntb_edma_ring_used_entry(head, tail);
> > +}
> > +
> > +static void ntb_transport_edma_debugfs_stats_show(struct seq_file *s,
> > + struct ntb_transport_qp *qp)
> > +{
> > + seq_printf(s, "rx_bytes - \t%llu\n", qp->rx_bytes);
> > + seq_printf(s, "rx_pkts - \t%llu\n", qp->rx_pkts);
> > + seq_printf(s, "rx_err_no_buf - %llu\n", qp->rx_err_no_buf);
> > + seq_printf(s, "rx_buff - \t0x%p\n", qp->rx_buff);
> > + seq_printf(s, "rx_max_entry - \t%u\n", qp->rx_max_entry);
> > + seq_printf(s, "rx_alloc_entry - \t%u\n\n", qp->rx_alloc_entry);
> > +
> > + seq_printf(s, "tx_bytes - \t%llu\n", qp->tx_bytes);
> > + seq_printf(s, "tx_pkts - \t%llu\n", qp->tx_pkts);
> > + seq_printf(s, "tx_ring_full - \t%llu\n", qp->tx_ring_full);
> > + seq_printf(s, "tx_err_no_buf - %llu\n", qp->tx_err_no_buf);
> > + seq_printf(s, "tx_mw - \t0x%p\n", qp->tx_mw);
> > + seq_printf(s, "tx_max_entry - \t%u\n", qp->tx_max_entry);
> > + seq_printf(s, "free tx - \t%u\n", ntb_transport_tx_free_entry(qp));
> > + seq_putc(s, '\n');
> > +
> > + seq_puts(s, "Using Remote eDMA - Yes\n");
> > + seq_printf(s, "QP Link - \t%s\n", qp->link_is_up ? "Up" : "Down");
> > +}
> > +
> > +static void ntb_transport_edma_uninit(struct ntb_transport_ctx *nt)
> > +{
> > + struct ntb_transport_ctx_edma *ctx = nt->priv;
> > +
> > + if (ctx->wq)
> > + destroy_workqueue(ctx->wq);
> > + ctx->wq = NULL;
> > +
> > + ntb_edma_teardown_chans(&ctx->chans);
> > +
> > + switch (ctx->remote_edma_mode) {
> > + case REMOTE_EDMA_EP:
> > + ntb_edma_teardown_mws(nt->ndev);
> > + break;
> > + case REMOTE_EDMA_RC:
> > + ntb_edma_teardown_peer(nt->ndev);
> > + break;
> > + case REMOTE_EDMA_UNKNOWN:
> > + default:
> > + break;
> > + }
> > +
> > + ctx->remote_edma_mode = REMOTE_EDMA_UNKNOWN;
> > +}
> > +
> > +static void ntb_transport_edma_db_work(struct work_struct *work)
> > +{
> > + struct ntb_transport_qp_edma *edma =
> > + container_of(work, struct ntb_transport_qp_edma, db_work);
> > + struct ntb_transport_qp *qp = edma->qp;
> > +
> > + ntb_peer_db_set(qp->ndev, qp->qp_bit);
> > +}
> > +
> > +static void ntb_transport_edma_notify_peer(struct ntb_transport_qp_edma *edma)
> > +{
> > + struct ntb_transport_qp *qp = edma->qp;
> > + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
> > +
> > + if (!ntb_edma_notify_peer(&ctx->chans, qp->qp_num))
> > + return;
> > +
> > + /*
> > + * Called from contexts that may be atomic. Since ntb_peer_db_set()
> > + * may sleep, delegate the actual doorbell write to a workqueue.
> > + */
> > + queue_work(system_highpri_wq, &edma->db_work);
> > +}
> > +
> > +static void ntb_transport_edma_isr(void *data, int qp_num)
> > +{
> > + struct ntb_transport_ctx *nt = data;
> > + struct ntb_transport_qp_edma *edma;
> > + struct ntb_transport_ctx_edma *ctx;
> > + struct ntb_transport_qp *qp;
> > +
> > + if (qp_num < 0 || qp_num >= nt->qp_count)
> > + return;
> > +
> > + qp = &nt->qp_vec[qp_num];
> > + if (WARN_ON(!qp))
> > + return;
> > +
> > + ctx = (struct ntb_transport_ctx_edma *)qp->transport->priv;
> > + edma = qp->priv;
> > +
> > + queue_work(ctx->wq, &edma->rx_work);
> > + queue_work(ctx->wq, &edma->tx_work);
> > +}
> > +
> > +static int ntb_transport_edma_rc_init(struct ntb_transport_ctx *nt)
> > +{
> > + struct ntb_transport_ctx_edma *ctx = nt->priv;
> > + struct ntb_dev *ndev = nt->ndev;
> > + struct pci_dev *pdev = ndev->pdev;
> > + int peer_mw;
> > + int rc;
> > +
> > + if (!use_remote_edma || ctx->remote_edma_mode != REMOTE_EDMA_UNKNOWN)
> > + return 0;
> > +
> > + peer_mw = ntb_peer_mw_count(ndev);
> > + if (peer_mw <= 0)
> > + return -ENODEV;
> > +
> > + rc = ntb_edma_setup_peer(ndev, peer_mw - 1, nt->qp_count);
> > + if (rc) {
> > + dev_err(&pdev->dev, "Failed to enable remote eDMA: %d\n", rc);
> > + return rc;
> > + }
> > +
> > + rc = ntb_edma_setup_chans(get_dma_dev(ndev), &ctx->chans, true);
> > + if (rc) {
> > + dev_err(&pdev->dev, "Failed to setup eDMA channels: %d\n", rc);
> > + goto err_teardown_peer;
> > + }
> > +
> > + rc = ntb_edma_setup_intr_chan(get_dma_dev(ndev), &ctx->chans);
> > + if (rc) {
> > + dev_err(&pdev->dev, "Failed to setup eDMA notify channel: %d\n",
> > + rc);
> > + goto err_teardown_chans;
> > + }
> > +
> > + ctx->remote_edma_mode = REMOTE_EDMA_RC;
> > + return 0;
> > +
> > +err_teardown_chans:
> > + ntb_edma_teardown_chans(&ctx->chans);
> > +err_teardown_peer:
> > + ntb_edma_teardown_peer(ndev);
> > + return rc;
> > +}
> > +
> > +static int ntb_transport_edma_ep_init(struct ntb_transport_ctx *nt)
> > +{
> > + struct ntb_transport_ctx_edma *ctx = nt->priv;
> > + struct ntb_dev *ndev = nt->ndev;
> > + struct pci_dev *pdev = ndev->pdev;
> > + int peer_mw;
> > + int rc;
> > +
> > + if (!use_remote_edma || ctx->remote_edma_mode == REMOTE_EDMA_EP)
> > + return 0;
> > +
> > +	/*
> > + * This check assumes that the endpoint (pci-epf-vntb.c)
> > + * ntb_dev_ops implements .get_private_data() while the host side
> > + * (ntb_hw_epf.c) does not.
> > + */
> > + if (!ntb_get_private_data(ndev))
> > + return 0;
> > +
> > + peer_mw = ntb_peer_mw_count(ndev);
> > + if (peer_mw <= 0)
> > + return -ENODEV;
> > +
> > + rc = ntb_edma_setup_mws(ndev, peer_mw - 1, nt->qp_count,
> > + ntb_transport_edma_isr, nt);
> > + if (rc) {
> > + dev_err(&pdev->dev,
> > + "Failed to set up memory window for eDMA: %d\n", rc);
> > + return rc;
> > + }
> > +
> > + rc = ntb_edma_setup_chans(get_dma_dev(ndev), &ctx->chans, false);
> > + if (rc) {
> > + dev_err(&pdev->dev, "Failed to setup eDMA channels: %d\n", rc);
> > + ntb_edma_teardown_mws(ndev);
> > + return rc;
> > + }
> > +
> > + ctx->remote_edma_mode = REMOTE_EDMA_EP;
> > + return 0;
> > +}
> > +
> > +static int ntb_transport_edma_setup_qp_mw(struct ntb_transport_ctx *nt,
> > + unsigned int qp_num)
> > +{
> > + struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
> > + struct ntb_dev *ndev = nt->ndev;
> > + struct ntb_queue_entry *entry;
> > + struct ntb_transport_mw *mw;
> > + unsigned int mw_num, mw_count, qp_count;
> > + unsigned int qp_offset, rx_info_offset;
> > + unsigned int mw_size, mw_size_per_qp;
> > + unsigned int num_qps_mw;
> > + size_t edma_total;
> > + unsigned int i;
> > + int node;
> > +
> > + mw_count = nt->mw_count;
> > + qp_count = nt->qp_count;
> > +
> > + mw_num = QP_TO_MW(nt, qp_num);
> > + mw = &nt->mw_vec[mw_num];
> > +
> > + if (!mw->virt_addr)
> > + return -ENOMEM;
> > +
> > + if (mw_num < qp_count % mw_count)
> > + num_qps_mw = qp_count / mw_count + 1;
> > + else
> > + num_qps_mw = qp_count / mw_count;
> > +
> > + mw_size = min(nt->mw_vec[mw_num].phys_size, mw->xlat_size);
> > + if (max_mw_size && mw_size > max_mw_size)
> > + mw_size = max_mw_size;
> > +
> > + mw_size_per_qp = round_down((unsigned int)mw_size / num_qps_mw, SZ_64);
> > + qp_offset = mw_size_per_qp * (qp_num / mw_count);
> > + rx_info_offset = mw_size_per_qp - sizeof(struct ntb_rx_info);
> > +
> > + qp->tx_mw_size = mw_size_per_qp;
> > + qp->tx_mw = nt->mw_vec[mw_num].vbase + qp_offset;
> > + if (!qp->tx_mw)
> > + return -EINVAL;
> > + qp->tx_mw_phys = nt->mw_vec[mw_num].phys_addr + qp_offset;
> > + if (!qp->tx_mw_phys)
> > + return -EINVAL;
> > + qp->rx_info = qp->tx_mw + rx_info_offset;
> > + qp->rx_buff = mw->virt_addr + qp_offset;
> > + qp->remote_rx_info = qp->rx_buff + rx_info_offset;
> > +
> > + /* Due to housekeeping, there must be at least 2 buffs */
> > + qp->tx_max_frame = min(transport_mtu, mw_size_per_qp / 2);
> > + qp->rx_max_frame = min(transport_mtu, mw_size_per_qp / 2);
> > +
> > + /* In eDMA mode, decouple from MW sizing and force ring-sized entries */
> > + edma_total = 2 * sizeof(struct ntb_edma_ring);
> > + if (rx_info_offset < edma_total) {
> > +		dev_err(&ndev->dev,
> > +			"Ring space requires %zuB but only %uB available\n",
> > +			edma_total, rx_info_offset);
> > + return -EINVAL;
> > + }
> > + qp->tx_max_entry = NTB_EDMA_RING_ENTRIES;
> > + qp->rx_max_entry = NTB_EDMA_RING_ENTRIES;
> > +
> > + /*
> > + * Checking to see if we have more entries than the default.
> > + * We should add additional entries if that is the case so we
> > + * can be in sync with the transport frames.
> > + */
> > + node = dev_to_node(&ndev->dev);
> > + for (i = qp->rx_alloc_entry; i < qp->rx_max_entry; i++) {
> > + entry = kzalloc_node(sizeof(*entry), GFP_KERNEL, node);
> > + if (!entry)
> > + return -ENOMEM;
> > +
> > + entry->qp = qp;
> > + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry,
> > + &qp->rx_free_q);
> > + qp->rx_alloc_entry++;
> > + }
> > +
> > + memset(qp->rx_buff, 0, edma_total);
> > +
> > + qp->rx_pkts = 0;
> > + qp->tx_pkts = 0;
> > +
> > + return 0;
> > +}
> > +
> > +static int ntb_transport_edma_rx_complete(struct ntb_transport_qp *qp)
> > +{
> > + struct device *dma_dev = get_dma_dev(qp->ndev);
> > + struct ntb_transport_qp_edma *edma = qp->priv;
> > + struct ntb_queue_entry *entry;
> > + struct ntb_edma_desc *in;
> > + unsigned int len;
> > + bool link_down;
> > + u32 idx;
> > +
> > + if (ntb_edma_ring_used_entry(READ_ONCE(*NTB_TAIL_RX_I(qp)),
> > + edma->rx_cons) == 0)
> > + return 0;
> > +
> > + idx = ntb_edma_ring_idx(edma->rx_cons);
> > + in = NTB_DESC_RX_I(qp, idx);
> > + if (!(in->flags & DESC_DONE_FLAG))
> > + return 0;
> > +
> > + link_down = in->flags & LINK_DOWN_FLAG;
> > + in->flags = 0;
> > + len = in->len; /* might be smaller than entry->len */
> > +
> > + entry = (struct ntb_queue_entry *)(uintptr_t)in->data;
> > + if (WARN_ON(!entry))
> > + return 0;
> > +
> > + if (link_down) {
> > + ntb_qp_link_down(qp);
> > + edma->rx_cons++;
> > + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_free_q);
> > + return 1;
> > + }
> > +
> > + dma_unmap_single(dma_dev, entry->addr, entry->len, DMA_FROM_DEVICE);
> > +
> > + qp->rx_bytes += len;
> > + qp->rx_pkts++;
> > + edma->rx_cons++;
> > +
> > + if (qp->rx_handler && qp->client_ready)
> > + qp->rx_handler(qp, qp->cb_data, entry->cb_data, len);
> > +
> > + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_free_q);
> > + return 1;
> > +}
> > +
> > +static void ntb_transport_edma_rx_work(struct work_struct *work)
> > +{
> > + struct ntb_transport_qp_edma *edma = container_of(
> > + work, struct ntb_transport_qp_edma, rx_work);
> > + struct ntb_transport_qp *qp = edma->qp;
> > + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
> > + unsigned int i;
> > +
> > + for (i = 0; i < NTB_EDMA_MAX_POLL; i++) {
> > + if (!ntb_transport_edma_rx_complete(qp))
> > + break;
> > + }
> > +
> > + if (ntb_transport_edma_rx_complete(qp))
> > + queue_work(ctx->wq, &edma->rx_work);
> > +}
> > +
> > +static void ntb_transport_edma_tx_work(struct work_struct *work)
> > +{
> > + struct ntb_transport_qp_edma *edma = container_of(
> > + work, struct ntb_transport_qp_edma, tx_work);
> > + struct ntb_transport_qp *qp = edma->qp;
> > + struct ntb_edma_desc *in, __iomem *out;
> > + struct ntb_queue_entry *entry;
> > + unsigned int len;
> > + void *cb_data;
> > + u32 idx;
> > +
> > + while (ntb_edma_ring_used_entry(READ_ONCE(edma->tx_issue),
> > + edma->tx_cons) != 0) {
> > + /* Paired with smp_wmb() in ntb_transport_edma_tx_enqueue_inner() */
> > + smp_rmb();
> > +
> > + idx = ntb_edma_ring_idx(edma->tx_cons);
> > + in = NTB_DESC_TX_I(qp, idx);
> > + entry = (struct ntb_queue_entry *)(uintptr_t)in->data;
> > + if (!entry || !(entry->flags & DESC_DONE_FLAG))
> > + break;
> > +
> > + in->data = 0;
> > +
> > + cb_data = entry->cb_data;
> > + len = entry->len;
> > +
> > + out = NTB_DESC_TX_O(qp, idx);
> > +
> > + WRITE_ONCE(edma->tx_cons, edma->tx_cons + 1);
> > +
> > + /*
> > + * No need to add barrier in-between to enforce ordering here.
> > + * The other side proceeds only after both flags and tail are
> > + * updated.
> > + */
> > + iowrite32(entry->flags, &out->flags);
> > + iowrite32(edma->tx_cons, NTB_TAIL_TX_O(qp));
> > +
> > + ntb_transport_edma_notify_peer(edma);
> > +
> > + ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry,
> > + &qp->tx_free_q);
> > +
> > + if (qp->tx_handler)
> > + qp->tx_handler(qp, qp->cb_data, cb_data, len);
> > +
> > + /* stat updates */
> > + qp->tx_bytes += len;
> > + qp->tx_pkts++;
> > + }
> > +}
> > +
> > +static void ntb_transport_edma_tx_cb(void *data,
> > + const struct dmaengine_result *res)
> > +{
> > + struct ntb_queue_entry *entry = data;
> > + struct ntb_transport_qp *qp = entry->qp;
> > + struct ntb_transport_ctx *nt = qp->transport;
> > + struct device *dma_dev = get_dma_dev(qp->ndev);
> > + enum dmaengine_tx_result dma_err = res->result;
> > + struct ntb_transport_ctx_edma *ctx = nt->priv;
> > + struct ntb_transport_qp_edma *edma = qp->priv;
> > +
> > + switch (dma_err) {
> > + case DMA_TRANS_READ_FAILED:
> > + case DMA_TRANS_WRITE_FAILED:
> > + case DMA_TRANS_ABORTED:
> > + entry->errors++;
> > + entry->len = -EIO;
> > + break;
> > + case DMA_TRANS_NOERROR:
> > + default:
> > + break;
> > + }
> > + dma_unmap_sg(dma_dev, &entry->sgl, 1, DMA_TO_DEVICE);
> > + sg_dma_address(&entry->sgl) = 0;
> > +
> > + entry->flags |= DESC_DONE_FLAG;
> > +
> > + queue_work(ctx->wq, &edma->tx_work);
> > +}
> > +
> > +static int ntb_transport_edma_submit(struct device *d, struct dma_chan *chan,
> > + size_t len, void *rc_src, dma_addr_t dst,
> > + struct ntb_queue_entry *entry)
> > +{
> > + struct scatterlist *sgl = &entry->sgl;
> > + struct dma_async_tx_descriptor *txd;
> > + struct dma_slave_config cfg;
> > + dma_cookie_t cookie;
> > + int nents, rc;
> > +
> > + if (!d)
> > + return -ENODEV;
> > +
> > + if (!chan)
> > + return -ENXIO;
> > +
> > + if (WARN_ON(!rc_src || !dst))
> > + return -EINVAL;
> > +
> > + if (WARN_ON(sg_dma_address(sgl)))
> > + return -EINVAL;
> > +
> > + sg_init_one(sgl, rc_src, len);
> > + nents = dma_map_sg(d, sgl, 1, DMA_TO_DEVICE);
> > + if (nents <= 0)
> > + return -EIO;
> > +
> > + memset(&cfg, 0, sizeof(cfg));
> > + cfg.dst_addr = dst;
> > + cfg.src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
> > + cfg.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
> > + cfg.direction = DMA_MEM_TO_DEV;
> > +
> > + txd = dmaengine_prep_slave_sg_config(chan, sgl, 1, DMA_MEM_TO_DEV,
> > + DMA_CTRL_ACK | DMA_PREP_INTERRUPT, &cfg);
> > + if (!txd) {
> > + rc = -EIO;
> > + goto out_unmap;
> > + }
> > +
> > + txd->callback_result = ntb_transport_edma_tx_cb;
> > + txd->callback_param = entry;
> > +
> > + cookie = dmaengine_submit(txd);
> > + if (dma_submit_error(cookie)) {
> > + rc = -EIO;
> > + goto out_unmap;
> > + }
> > + dma_async_issue_pending(chan);
> > + return 0;
> > +out_unmap:
> > + dma_unmap_sg(d, sgl, 1, DMA_TO_DEVICE);
> > + return rc;
> > +}
> > +
> > +static int ntb_transport_edma_tx_enqueue_inner(struct ntb_transport_qp *qp,
> > + struct ntb_queue_entry *entry)
> > +{
> > + struct device *dma_dev = get_dma_dev(qp->ndev);
> > + struct ntb_transport_qp_edma *edma = qp->priv;
> > + struct ntb_transport_ctx *nt = qp->transport;
> > + struct ntb_edma_desc *in, __iomem *out;
> > + struct ntb_transport_ctx_edma *ctx = nt->priv;
> > + unsigned int len = entry->len;
> > + struct dma_chan *chan;
> > + u32 issue, idx, head;
> > + dma_addr_t dst;
> > + int rc;
> > +
> > + WARN_ON_ONCE(entry->flags & DESC_DONE_FLAG);
> > +
> > + scoped_guard(spinlock_irqsave, &edma->tx_lock) {
> > + head = READ_ONCE(*NTB_HEAD_TX_I(qp));
> > + issue = edma->tx_issue;
> > + if (ntb_edma_ring_used_entry(head, issue) == 0) {
> > + qp->tx_ring_full++;
> > + return -ENOSPC;
> > + }
> > +
> > + /*
> > +		 * ntb_transport_edma_tx_work() looks up the entry via
> > +		 * in->data, so it must be set before tx_issue++.
> > + */
> > + idx = ntb_edma_ring_idx(issue);
> > + in = NTB_DESC_TX_I(qp, idx);
> > + in->data = (uintptr_t)entry;
> > +
> > + /* Make in->data visible before tx_issue++ */
> > + smp_wmb();
> > +
> > + WRITE_ONCE(edma->tx_issue, edma->tx_issue + 1);
> > + }
> > +
> > + /* Publish the final transfer length to the other end */
> > + out = NTB_DESC_TX_O(qp, idx);
> > + iowrite32(len, &out->len);
> > + ioread32(&out->len);
> > +
> > + if (unlikely(!len)) {
> > + entry->flags |= DESC_DONE_FLAG;
> > + queue_work(ctx->wq, &edma->tx_work);
> > + return 0;
> > + }
> > +
> > + /* Paired with dma_wmb() in ntb_transport_edma_rx_enqueue_inner() */
> > + dma_rmb();
> > +
> > + /* kick remote eDMA read transfer */
> > + dst = (dma_addr_t)in->addr;
> > + chan = ntb_edma_pick_chan(&ctx->chans, qp->qp_num);
> > + rc = ntb_transport_edma_submit(dma_dev, chan, len,
> > + entry->buf, dst, entry);
> > + if (rc) {
> > + entry->errors++;
> > + entry->len = -EIO;
> > + entry->flags |= DESC_DONE_FLAG;
> > + queue_work(ctx->wq, &edma->tx_work);
> > + }
> > + return 0;
> > +}
> > +
> > +static int ntb_transport_edma_tx_enqueue(struct ntb_transport_qp *qp,
> > + struct ntb_queue_entry *entry,
> > + void *cb, void *data, unsigned int len,
> > + unsigned int flags)
> > +{
> > + struct device *dma_dev;
> > +
> > + if (entry->addr) {
> > + /* Deferred unmap */
> > + dma_dev = get_dma_dev(qp->ndev);
> > + dma_unmap_single(dma_dev, entry->addr, entry->len,
> > + DMA_TO_DEVICE);
> > + }
> > +
> > + entry->cb_data = cb;
> > + entry->buf = data;
> > + entry->len = len;
> > + entry->flags = flags;
> > + entry->errors = 0;
> > + entry->addr = 0;
> > +
> > + WARN_ON_ONCE(!ntb_qp_edma_enabled(qp));
> > +
> > + return ntb_transport_edma_tx_enqueue_inner(qp, entry);
> > +}
> > +
> > +static int ntb_transport_edma_rx_enqueue_inner(struct ntb_transport_qp *qp,
> > + struct ntb_queue_entry *entry)
> > +{
> > + struct device *dma_dev = get_dma_dev(qp->ndev);
> > + struct ntb_transport_qp_edma *edma = qp->priv;
> > + struct ntb_edma_desc *in, __iomem *out;
> > + unsigned int len = entry->len;
> > + void *data = entry->buf;
> > + dma_addr_t dst;
> > + u32 idx;
> > + int rc;
> > +
> > + dst = dma_map_single(dma_dev, data, len, DMA_FROM_DEVICE);
> > + rc = dma_mapping_error(dma_dev, dst);
> > + if (rc)
> > + return rc;
> > +
> > + guard(spinlock_bh)(&edma->rx_lock);
> > +
> > + if (ntb_edma_ring_full(READ_ONCE(edma->rx_prod),
> > + READ_ONCE(edma->rx_cons))) {
> > + rc = -ENOSPC;
> > + goto out_unmap;
> > + }
> > +
> > + idx = ntb_edma_ring_idx(edma->rx_prod);
> > + in = NTB_DESC_RX_I(qp, idx);
> > + out = NTB_DESC_RX_O(qp, idx);
> > +
> > + iowrite32(len, &out->len);
> > + iowrite64(dst, &out->addr);
> > +
> > + WARN_ON(in->flags & DESC_DONE_FLAG);
> > + in->data = (uintptr_t)entry;
> > + entry->addr = dst;
> > +
> > + /* Ensure len/addr are visible before the head update */
> > + dma_wmb();
> > +
> > + WRITE_ONCE(edma->rx_prod, edma->rx_prod + 1);
> > + iowrite32(edma->rx_prod, NTB_HEAD_RX_O(qp));
> > +
> > + return 0;
> > +out_unmap:
> > + dma_unmap_single(dma_dev, dst, len, DMA_FROM_DEVICE);
> > + return rc;
> > +}
> > +
> > +static int ntb_transport_edma_rx_enqueue(struct ntb_transport_qp *qp,
> > + struct ntb_queue_entry *entry)
> > +{
> > + int rc;
> > +
> > + rc = ntb_transport_edma_rx_enqueue_inner(qp, entry);
> > + if (rc) {
> > + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry,
> > + &qp->rx_free_q);
> > + return rc;
> > + }
> > +
> > + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_pend_q);
> > +
> > + if (qp->active)
> > + tasklet_schedule(&qp->rxc_db_work);
> > +
> > + return 0;
> > +}
> > +
> > +static void ntb_transport_edma_rx_poll(struct ntb_transport_qp *qp)
> > +{
> > + struct ntb_transport_ctx *nt = qp->transport;
> > + struct ntb_transport_ctx_edma *ctx = nt->priv;
> > + struct ntb_transport_qp_edma *edma = qp->priv;
> > +
> > + queue_work(ctx->wq, &edma->rx_work);
> > + queue_work(ctx->wq, &edma->tx_work);
> > +}
> > +
> > +static int ntb_transport_edma_qp_init(struct ntb_transport_ctx *nt,
> > + unsigned int qp_num)
> > +{
> > + struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
> > + struct ntb_transport_qp_edma *edma;
> > + struct ntb_dev *ndev = nt->ndev;
> > + int node;
> > +
> > + node = dev_to_node(&ndev->dev);
> > +
> > + qp->priv = kzalloc_node(sizeof(*edma), GFP_KERNEL, node);
> > + if (!qp->priv)
> > + return -ENOMEM;
> > +
> > + edma = (struct ntb_transport_qp_edma *)qp->priv;
> > + edma->qp = qp;
> > + edma->rx_prod = 0;
> > + edma->rx_cons = 0;
> > + edma->tx_cons = 0;
> > + edma->tx_issue = 0;
> > +
> > + spin_lock_init(&edma->rx_lock);
> > + spin_lock_init(&edma->tx_lock);
> > +
> > + INIT_WORK(&edma->db_work, ntb_transport_edma_db_work);
> > + INIT_WORK(&edma->rx_work, ntb_transport_edma_rx_work);
> > + INIT_WORK(&edma->tx_work, ntb_transport_edma_tx_work);
> > +
> > + return 0;
> > +}
> > +
> > +static void ntb_transport_edma_qp_free(struct ntb_transport_qp *qp)
> > +{
> > + struct ntb_transport_qp_edma *edma = qp->priv;
> > +
> > + cancel_work_sync(&edma->db_work);
> > + cancel_work_sync(&edma->rx_work);
> > + cancel_work_sync(&edma->tx_work);
> > +
> > + kfree(qp->priv);
> > +}
> > +
> > +static int ntb_transport_edma_pre_link_up(struct ntb_transport_ctx *nt)
> > +{
> > + struct ntb_dev *ndev = nt->ndev;
> > + struct pci_dev *pdev = ndev->pdev;
> > + int rc;
> > +
> > + rc = ntb_transport_edma_ep_init(nt);
> > + if (rc)
> > + dev_err(&pdev->dev, "Failed to init EP: %d\n", rc);
> > +
> > + return rc;
> > +}
> > +
> > +static int ntb_transport_edma_post_link_up(struct ntb_transport_ctx *nt)
> > +{
> > + struct ntb_dev *ndev = nt->ndev;
> > + struct pci_dev *pdev = ndev->pdev;
> > + int rc;
> > +
> > + rc = ntb_transport_edma_rc_init(nt);
> > + if (rc)
> > + dev_err(&pdev->dev, "Failed to init RC: %d\n", rc);
> > +
> > + return rc;
> > +}
> > +
> > +static int ntb_transport_edma_enable(struct ntb_transport_ctx *nt,
> > + unsigned int *mw_count)
> > +{
> > + struct ntb_dev *ndev = nt->ndev;
> > + struct ntb_transport_ctx_edma *ctx = nt->priv;
> > +
> > + if (!use_remote_edma)
> > + return 0;
> > +
> > + /*
> > + * We need at least one MW for the transport plus one MW reserved
> > + * for the remote eDMA window (see ntb_edma_setup_mws/peer).
> > + */
> > + if (*mw_count <= 1) {
> > + dev_err(&ndev->dev,
> > +			"remote eDMA requires at least two MWs (have %u)\n",
> > + *mw_count);
> > + return -ENODEV;
> > + }
> > +
> > + ctx->wq = alloc_workqueue("ntb-edma-wq", WQ_UNBOUND | WQ_SYSFS, 0);
> > + if (!ctx->wq) {
> > + ntb_transport_edma_uninit(nt);
> > + return -ENOMEM;
> > + }
> > +
> > + /* Reserve the last peer MW exclusively for the eDMA window. */
> > + *mw_count -= 1;
> > +
> > + return 0;
> > +}
> > +
> > +static void ntb_transport_edma_disable(struct ntb_transport_ctx *nt)
> > +{
> > + ntb_transport_edma_uninit(nt);
> > +}
> > +
> > +static const struct ntb_transport_backend_ops edma_backend_ops = {
> > + .enable = ntb_transport_edma_enable,
> > + .disable = ntb_transport_edma_disable,
> > + .qp_init = ntb_transport_edma_qp_init,
> > + .qp_free = ntb_transport_edma_qp_free,
> > + .pre_link_up = ntb_transport_edma_pre_link_up,
> > + .post_link_up = ntb_transport_edma_post_link_up,
> > + .setup_qp_mw = ntb_transport_edma_setup_qp_mw,
> > + .tx_free_entry = ntb_transport_edma_tx_free_entry,
> > + .tx_enqueue = ntb_transport_edma_tx_enqueue,
> > + .rx_enqueue = ntb_transport_edma_rx_enqueue,
> > + .rx_poll = ntb_transport_edma_rx_poll,
> > + .debugfs_stats_show = ntb_transport_edma_debugfs_stats_show,
> > +};
> > +
> > +int ntb_transport_edma_init(struct ntb_transport_ctx *nt)
> > +{
> > + struct ntb_dev *ndev = nt->ndev;
> > + int node;
> > +
> > + node = dev_to_node(&ndev->dev);
> > + nt->priv = kzalloc_node(sizeof(struct ntb_transport_ctx_edma), GFP_KERNEL,
> > + node);
> > + if (!nt->priv)
> > + return -ENOMEM;
> > +
> > + nt->backend_ops = edma_backend_ops;
> > + /*
> > +	 * In remote eDMA mode, one DMA read channel is used by the host
> > +	 * side to interrupt the EP.
> > + */
> > + use_msi = false;
> > + return 0;
> > +}
> > diff --git a/drivers/ntb/ntb_transport_internal.h b/drivers/ntb/ntb_transport_internal.h
> > index 51ff08062d73..9fff65980d3d 100644
> > --- a/drivers/ntb/ntb_transport_internal.h
> > +++ b/drivers/ntb/ntb_transport_internal.h
> > @@ -8,6 +8,7 @@
> > extern unsigned long max_mw_size;
> > extern unsigned int transport_mtu;
> > extern bool use_msi;
> > +extern bool use_remote_edma;
> >
> > #define QP_TO_MW(nt, qp) ((qp) % nt->mw_count)
> >
> > @@ -29,6 +30,11 @@ struct ntb_queue_entry {
> > struct ntb_payload_header __iomem *tx_hdr;
> > struct ntb_payload_header *rx_hdr;
> > };
> > +
> > +#ifdef CONFIG_NTB_TRANSPORT_EDMA
> > + dma_addr_t addr;
> > + struct scatterlist sgl;
> > +#endif
> > };
> >
> > struct ntb_rx_info {
> > @@ -202,4 +208,13 @@ int ntb_transport_init_queue(struct ntb_transport_ctx *nt,
> > unsigned int qp_num);
> > struct device *get_dma_dev(struct ntb_dev *ndev);
> >
> > +#ifdef CONFIG_NTB_TRANSPORT_EDMA
> > +int ntb_transport_edma_init(struct ntb_transport_ctx *nt);
> > +#else
> > +static inline int ntb_transport_edma_init(struct ntb_transport_ctx *nt)
> > +{
> > + return -EOPNOTSUPP;
> > +}
> > +#endif /* CONFIG_NTB_TRANSPORT_EDMA */
> > +
> > #endif /* _NTB_TRANSPORT_INTERNAL_H_ */
> > --
> > 2.51.0
> >
^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v3 26/35] NTB: ntb_transport: Introduce DW eDMA backed transport mode
2025-12-20 15:28 ` Koichiro Den
@ 2026-01-06 18:46 ` Dave Jiang
2026-01-07 15:05 ` Koichiro Den
0 siblings, 1 reply; 61+ messages in thread
From: Dave Jiang @ 2026-01-06 18:46 UTC (permalink / raw)
To: Koichiro Den, Frank Li
Cc: ntb, linux-pci, dmaengine, linux-renesas-soc, netdev,
linux-kernel, mani, kwilczynski, kishon, bhelgaas, corbet,
geert+renesas, magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro,
will, robin.murphy, jdmason, allenbh, andrew+netdev, davem,
edumazet, kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k,
kurt.schwemmer, logang, jingoohan1, lpieralisi, utkarsh02t,
jbrunet, dlemoal, arnd, elfring
On 12/20/25 8:28 AM, Koichiro Den wrote:
> On Fri, Dec 19, 2025 at 10:00:52AM -0500, Frank Li wrote:
>> On Thu, Dec 18, 2025 at 12:16:00AM +0900, Koichiro Den wrote:
>>> Add a new ntb_transport backend that uses a DesignWare eDMA engine
>>> located on the endpoint, to be driven by both host and endpoint.
>>>
>>> The endpoint exposes a dedicated memory window which contains the eDMA
>>> register block, a small control structure (struct ntb_edma_info) and
>>> per-channel linked-list (LL) rings for read channels. The endpoint
>>> drives its local eDMA write channels for its transmissions, while the
>>> host uses the remote eDMA read channels for its transmissions.
>>
>> I just glanced at the code. It looks like you use the standard DMA API
>> and a per-channel linked-list (LL) ring, which can be pure software.
>>
>> So it is not necessary to bind to DesignWare eDMA. Maybe other vendors'
>> PCIe built-in DMA engines could work with this code?
>
> Yes, the DesignWare-specific parts are encapsulated under
> drivers/ntb/hw/edma/, so the ntb_transport_edma itself is not tightly
> coupled to DesignWare eDMA. If any such coupling remains, that is
> simply an oversight on my part.
>
> I intentionally avoided introducing an extra abstraction layer prematurely.
> If we later want to support other vendors' PCIe built-in DMA engines for
> edma_backend_ops, an additional internal abstraction under the
> 'edma_backend_ops' implementation can be introduced at that point.
> Do you think I should do so now?
I agree with Frank. Make it generic so that other vendors can use it in the future, since this is the generic transport part.
DJ
>
> Koichiro
>
>>
>> Frank
>>>
>>> A key benefit of this backend is that the memory window no longer needs
>>> to carry data-plane payload. This makes the design less sensitive to
>>> limited memory window space and allows scaling to multiple queue pairs.
>>> The memory window layout is specific to the eDMA-backed backend, so
>>> there is no automatic fallback to the memcpy-based default transport
>>> that requires the different layout.
>>>
>>> Signed-off-by: Koichiro Den <den@valinux.co.jp>
>>> ---
>>> drivers/ntb/Kconfig | 12 +
>>> drivers/ntb/Makefile | 2 +
>>> drivers/ntb/ntb_transport_core.c | 15 +-
>>> drivers/ntb/ntb_transport_edma.c | 987 +++++++++++++++++++++++++++
>>> drivers/ntb/ntb_transport_internal.h | 15 +
>>> 5 files changed, 1029 insertions(+), 2 deletions(-)
>>> create mode 100644 drivers/ntb/ntb_transport_edma.c
>>>
>>> diff --git a/drivers/ntb/Kconfig b/drivers/ntb/Kconfig
>>> index df16c755b4da..5ba6d0b7f5ba 100644
>>> --- a/drivers/ntb/Kconfig
>>> +++ b/drivers/ntb/Kconfig
>>> @@ -37,4 +37,16 @@ config NTB_TRANSPORT
>>>
>>> If unsure, say N.
>>>
>>> +config NTB_TRANSPORT_EDMA
>>> + bool "NTB Transport backed by remote eDMA"
>>> + depends on NTB_TRANSPORT
>>> + depends on PCI
>>> + select DMA_ENGINE
>>> + select NTB_EDMA
>>> + help
>>> + Enable a transport backend that uses a remote DesignWare eDMA engine
>>> + exposed through a dedicated NTB memory window. The host uses the
>>> + endpoint's eDMA engine to move data in both directions.
>>> + Say Y here if you intend to use the 'use_remote_edma' module parameter.
>>> +
>>> endif # NTB
>>> diff --git a/drivers/ntb/Makefile b/drivers/ntb/Makefile
>>> index 9b66e5fafbc0..b9086b32ecde 100644
>>> --- a/drivers/ntb/Makefile
>>> +++ b/drivers/ntb/Makefile
>>> @@ -6,3 +6,5 @@ ntb-y := core.o
>>> ntb-$(CONFIG_NTB_MSI) += msi.o
>>>
>>> ntb_transport-y := ntb_transport_core.o
>>> +ntb_transport-$(CONFIG_NTB_TRANSPORT_EDMA) += ntb_transport_edma.o
>>> +ntb_transport-$(CONFIG_NTB_TRANSPORT_EDMA) += hw/edma/ntb_hw_edma.o
>>> diff --git a/drivers/ntb/ntb_transport_core.c b/drivers/ntb/ntb_transport_core.c
>>> index 40c2548f5930..bd21232f26fe 100644
>>> --- a/drivers/ntb/ntb_transport_core.c
>>> +++ b/drivers/ntb/ntb_transport_core.c
>>> @@ -104,6 +104,12 @@ module_param(use_msi, bool, 0644);
>>> MODULE_PARM_DESC(use_msi, "Use MSI interrupts instead of doorbells");
>>> #endif
>>>
>>> +bool use_remote_edma;
>>> +#ifdef CONFIG_NTB_TRANSPORT_EDMA
>>> +module_param(use_remote_edma, bool, 0644);
>>> +MODULE_PARM_DESC(use_remote_edma, "Use remote eDMA mode (when enabled, use_msi is ignored)");
>>> +#endif
>>> +
>>> static struct dentry *nt_debugfs_dir;
>>>
>>> /* Only two-ports NTB devices are supported */
>>> @@ -156,7 +162,7 @@ enum {
>>> #define drv_client(__drv) \
>>> container_of((__drv), struct ntb_transport_client, driver)
>>>
>>> -#define NTB_QP_DEF_NUM_ENTRIES 100
>>> +#define NTB_QP_DEF_NUM_ENTRIES 128
>>> #define NTB_LINK_DOWN_TIMEOUT 10
>>>
>>> static void ntb_transport_rxc_db(unsigned long data);
>>> @@ -1189,7 +1195,11 @@ static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
>>>
>>> nt->ndev = ndev;
>>>
>>> - rc = ntb_transport_default_init(nt);
>>> + if (use_remote_edma)
>>> + rc = ntb_transport_edma_init(nt);
>>> + else
>>> + rc = ntb_transport_default_init(nt);
>>> +
>>> if (rc)
>>> return rc;
>>>
>>> @@ -1950,6 +1960,7 @@ ntb_transport_create_queue(void *data, struct device *client_dev,
>>>
>>> nt->qp_bitmap_free &= ~qp_bit;
>>>
>>> + qp->qp_bit = qp_bit;
>>> qp->cb_data = data;
>>> qp->rx_handler = handlers->rx_handler;
>>> qp->tx_handler = handlers->tx_handler;
>>> diff --git a/drivers/ntb/ntb_transport_edma.c b/drivers/ntb/ntb_transport_edma.c
>>> new file mode 100644
>>> index 000000000000..6ae5da0a1367
>>> --- /dev/null
>>> +++ b/drivers/ntb/ntb_transport_edma.c
>>> @@ -0,0 +1,987 @@
>>> +// SPDX-License-Identifier: GPL-2.0-only
>>> +/*
>>> + * NTB transport backend for remote DesignWare eDMA.
>>> + *
>>> + * This implements the backend_ops used when use_remote_edma=1 and
>>> + * relies on drivers/ntb/hw/edma/ for low-level eDMA/MW programming.
>>> + */
>>> +
>>> +#include <linux/bug.h>
>>> +#include <linux/compiler.h>
>>> +#include <linux/debugfs.h>
>>> +#include <linux/dmaengine.h>
>>> +#include <linux/dma-mapping.h>
>>> +#include <linux/errno.h>
>>> +#include <linux/io-64-nonatomic-lo-hi.h>
>>> +#include <linux/ntb.h>
>>> +#include <linux/pci.h>
>>> +#include <linux/pci-epc.h>
>>> +#include <linux/seq_file.h>
>>> +#include <linux/slab.h>
>>> +
>>> +#include "hw/edma/ntb_hw_edma.h"
>>> +#include "ntb_transport_internal.h"
>>> +
>>> +#define NTB_EDMA_RING_ORDER 7
>>> +#define NTB_EDMA_RING_ENTRIES (1U << NTB_EDMA_RING_ORDER)
>>> +#define NTB_EDMA_RING_MASK (NTB_EDMA_RING_ENTRIES - 1)
>>> +
>>> +#define NTB_EDMA_MAX_POLL 32
>>> +
>>> +/*
>>> + * Remote eDMA mode implementation
>>> + */
>>> +struct ntb_transport_ctx_edma {
>>> + remote_edma_mode_t remote_edma_mode;
>>> + struct device *dma_dev;
>>> + struct workqueue_struct *wq;
>>> + struct ntb_edma_chans chans;
>>> +};
>>> +
>>> +struct ntb_transport_qp_edma {
>>> + struct ntb_transport_qp *qp;
>>> +
>>> + /*
>>> + * For ensuring peer notification in non-atomic context.
>>> + * ntb_peer_db_set might sleep or schedule.
>>> + */
>>> + struct work_struct db_work;
>>> +
>>> + u32 rx_prod;
>>> + u32 rx_cons;
>>> + u32 tx_cons;
>>> + u32 tx_issue;
>>> +
>>> + spinlock_t rx_lock;
>>> + spinlock_t tx_lock;
>>> +
>>> + struct work_struct rx_work;
>>> + struct work_struct tx_work;
>>> +};
>>> +
>>> +struct ntb_edma_desc {
>>> + u32 len;
>>> + u32 flags;
>>> + u64 addr; /* DMA address */
>>> + u64 data;
>>> +};
>>> +
>>> +struct ntb_edma_ring {
>>> + struct ntb_edma_desc desc[NTB_EDMA_RING_ENTRIES];
>>> + u32 head;
>>> + u32 tail;
>>> +};
>>> +
>>> +static inline bool ntb_qp_edma_is_rc(struct ntb_transport_qp *qp)
>>> +{
>>> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
>>> +
>>> + return ctx->remote_edma_mode == REMOTE_EDMA_RC;
>>> +}
>>> +
>>> +static inline bool ntb_qp_edma_is_ep(struct ntb_transport_qp *qp)
>>> +{
>>> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
>>> +
>>> + return ctx->remote_edma_mode == REMOTE_EDMA_EP;
>>> +}
>>> +
>>> +static inline bool ntb_qp_edma_enabled(struct ntb_transport_qp *qp)
>>> +{
>>> + return ntb_qp_edma_is_rc(qp) || ntb_qp_edma_is_ep(qp);
>>> +}
>>> +
>>> +static inline unsigned int ntb_edma_ring_sel(struct ntb_transport_qp *qp,
>>> + unsigned int n)
>>> +{
>>> + return n ^ !!ntb_qp_edma_is_ep(qp);
>>> +}
>>> +
>>> +static inline struct ntb_edma_ring *
>>> +ntb_edma_ring_local(struct ntb_transport_qp *qp, unsigned int n)
>>> +{
>>> + unsigned int r = ntb_edma_ring_sel(qp, n);
>>> +
>>> + return &((struct ntb_edma_ring *)qp->rx_buff)[r];
>>> +}
>>> +
>>> +static inline struct ntb_edma_ring __iomem *
>>> +ntb_edma_ring_remote(struct ntb_transport_qp *qp, unsigned int n)
>>> +{
>>> + unsigned int r = ntb_edma_ring_sel(qp, n);
>>> +
>>> + return &((struct ntb_edma_ring __iomem *)qp->tx_mw)[r];
>>> +}
>>> +
>>> +static inline struct ntb_edma_desc *
>>> +ntb_edma_desc_local(struct ntb_transport_qp *qp, unsigned int n, unsigned int i)
>>> +{
>>> + return &ntb_edma_ring_local(qp, n)->desc[i];
>>> +}
>>> +
>>> +static inline struct ntb_edma_desc __iomem *
>>> +ntb_edma_desc_remote(struct ntb_transport_qp *qp, unsigned int n,
>>> + unsigned int i)
>>> +{
>>> + return &ntb_edma_ring_remote(qp, n)->desc[i];
>>> +}
>>> +
>>> +static inline u32 *ntb_edma_head_local(struct ntb_transport_qp *qp,
>>> + unsigned int n)
>>> +{
>>> + return &ntb_edma_ring_local(qp, n)->head;
>>> +}
>>> +
>>> +static inline u32 __iomem *ntb_edma_head_remote(struct ntb_transport_qp *qp,
>>> + unsigned int n)
>>> +{
>>> + return &ntb_edma_ring_remote(qp, n)->head;
>>> +}
>>> +
>>> +static inline u32 *ntb_edma_tail_local(struct ntb_transport_qp *qp,
>>> + unsigned int n)
>>> +{
>>> + return &ntb_edma_ring_local(qp, n)->tail;
>>> +}
>>> +
>>> +static inline u32 __iomem *ntb_edma_tail_remote(struct ntb_transport_qp *qp,
>>> + unsigned int n)
>>> +{
>>> + return &ntb_edma_ring_remote(qp, n)->tail;
>>> +}
>>> +
>>> +/* The 'i' must be generated by ntb_edma_ring_idx() */
>>> +#define NTB_DESC_TX_O(qp, i) ntb_edma_desc_remote(qp, 0, i)
>>> +#define NTB_DESC_TX_I(qp, i) ntb_edma_desc_local(qp, 0, i)
>>> +#define NTB_DESC_RX_O(qp, i) ntb_edma_desc_remote(qp, 1, i)
>>> +#define NTB_DESC_RX_I(qp, i) ntb_edma_desc_local(qp, 1, i)
>>> +
>>> +#define NTB_HEAD_TX_I(qp) ntb_edma_head_local(qp, 0)
>>> +#define NTB_HEAD_RX_O(qp) ntb_edma_head_remote(qp, 1)
>>> +
>>> +#define NTB_TAIL_TX_O(qp) ntb_edma_tail_remote(qp, 0)
>>> +#define NTB_TAIL_RX_I(qp) ntb_edma_tail_local(qp, 1)
>>> +
>>> +/* ntb_edma_ring helpers */
>>> +static __always_inline u32 ntb_edma_ring_idx(u32 v)
>>> +{
>>> + return v & NTB_EDMA_RING_MASK;
>>> +}
>>> +
>>> +static __always_inline u32 ntb_edma_ring_used_entry(u32 head, u32 tail)
>>> +{
>>> + if (head >= tail) {
>>> + WARN_ON_ONCE((head - tail) > (NTB_EDMA_RING_ENTRIES - 1));
>>> + return head - tail;
>>> + }
>>> +
>>> + WARN_ON_ONCE((U32_MAX - tail + head + 1) > (NTB_EDMA_RING_ENTRIES - 1));
>>> + return U32_MAX - tail + head + 1;
>>> +}
>>> +
>>> +static __always_inline u32 ntb_edma_ring_free_entry(u32 head, u32 tail)
>>> +{
>>> + return NTB_EDMA_RING_ENTRIES - ntb_edma_ring_used_entry(head, tail) - 1;
>>> +}
>>> +
>>> +static __always_inline bool ntb_edma_ring_full(u32 head, u32 tail)
>>> +{
>>> + return ntb_edma_ring_free_entry(head, tail) == 0;
>>> +}
>>> +
>>> +static unsigned int ntb_transport_edma_tx_free_entry(struct ntb_transport_qp *qp)
>>> +{
>>> + struct ntb_transport_qp_edma *edma = qp->priv;
>>> + unsigned int head, tail;
>>> +
>>> + scoped_guard(spinlock_irqsave, &edma->tx_lock) {
>>> + /* In this scope, only 'head' might proceed */
>>> + tail = READ_ONCE(edma->tx_issue);
>>> + head = READ_ONCE(*NTB_HEAD_TX_I(qp));
>>> + }
>>> + /*
>>> +	 * The 'used' amount indicates how many entries the other end has
>>> +	 * refilled, which are available for us to use for TX.
>>> + */
>>> + return ntb_edma_ring_used_entry(head, tail);
>>> +}
>>> +
>>> +static void ntb_transport_edma_debugfs_stats_show(struct seq_file *s,
>>> + struct ntb_transport_qp *qp)
>>> +{
>>> + seq_printf(s, "rx_bytes - \t%llu\n", qp->rx_bytes);
>>> + seq_printf(s, "rx_pkts - \t%llu\n", qp->rx_pkts);
>>> + seq_printf(s, "rx_err_no_buf - %llu\n", qp->rx_err_no_buf);
>>> + seq_printf(s, "rx_buff - \t0x%p\n", qp->rx_buff);
>>> + seq_printf(s, "rx_max_entry - \t%u\n", qp->rx_max_entry);
>>> + seq_printf(s, "rx_alloc_entry - \t%u\n\n", qp->rx_alloc_entry);
>>> +
>>> + seq_printf(s, "tx_bytes - \t%llu\n", qp->tx_bytes);
>>> + seq_printf(s, "tx_pkts - \t%llu\n", qp->tx_pkts);
>>> + seq_printf(s, "tx_ring_full - \t%llu\n", qp->tx_ring_full);
>>> + seq_printf(s, "tx_err_no_buf - %llu\n", qp->tx_err_no_buf);
>>> + seq_printf(s, "tx_mw - \t0x%p\n", qp->tx_mw);
>>> + seq_printf(s, "tx_max_entry - \t%u\n", qp->tx_max_entry);
>>> + seq_printf(s, "free tx - \t%u\n", ntb_transport_tx_free_entry(qp));
>>> + seq_putc(s, '\n');
>>> +
>>> + seq_puts(s, "Using Remote eDMA - Yes\n");
>>> + seq_printf(s, "QP Link - \t%s\n", qp->link_is_up ? "Up" : "Down");
>>> +}
>>> +
>>> +static void ntb_transport_edma_uninit(struct ntb_transport_ctx *nt)
>>> +{
>>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
>>> +
>>> + if (ctx->wq)
>>> + destroy_workqueue(ctx->wq);
>>> + ctx->wq = NULL;
>>> +
>>> + ntb_edma_teardown_chans(&ctx->chans);
>>> +
>>> + switch (ctx->remote_edma_mode) {
>>> + case REMOTE_EDMA_EP:
>>> + ntb_edma_teardown_mws(nt->ndev);
>>> + break;
>>> + case REMOTE_EDMA_RC:
>>> + ntb_edma_teardown_peer(nt->ndev);
>>> + break;
>>> + case REMOTE_EDMA_UNKNOWN:
>>> + default:
>>> + break;
>>> + }
>>> +
>>> + ctx->remote_edma_mode = REMOTE_EDMA_UNKNOWN;
>>> +}
>>> +
>>> +static void ntb_transport_edma_db_work(struct work_struct *work)
>>> +{
>>> + struct ntb_transport_qp_edma *edma =
>>> + container_of(work, struct ntb_transport_qp_edma, db_work);
>>> + struct ntb_transport_qp *qp = edma->qp;
>>> +
>>> + ntb_peer_db_set(qp->ndev, qp->qp_bit);
>>> +}
>>> +
>>> +static void ntb_transport_edma_notify_peer(struct ntb_transport_qp_edma *edma)
>>> +{
>>> + struct ntb_transport_qp *qp = edma->qp;
>>> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
>>> +
>>> + if (!ntb_edma_notify_peer(&ctx->chans, qp->qp_num))
>>> + return;
>>> +
>>> + /*
>>> + * Called from contexts that may be atomic. Since ntb_peer_db_set()
>>> + * may sleep, delegate the actual doorbell write to a workqueue.
>>> + */
>>> + queue_work(system_highpri_wq, &edma->db_work);
>>> +}
>>> +
>>> +static void ntb_transport_edma_isr(void *data, int qp_num)
>>> +{
>>> + struct ntb_transport_ctx *nt = data;
>>> + struct ntb_transport_qp_edma *edma;
>>> + struct ntb_transport_ctx_edma *ctx;
>>> + struct ntb_transport_qp *qp;
>>> +
>>> + if (qp_num < 0 || qp_num >= nt->qp_count)
>>> + return;
>>> +
>>> + qp = &nt->qp_vec[qp_num];
>>> + if (WARN_ON(!qp))
>>> + return;
>>> +
>>> + ctx = (struct ntb_transport_ctx_edma *)qp->transport->priv;
>>> + edma = qp->priv;
>>> +
>>> + queue_work(ctx->wq, &edma->rx_work);
>>> + queue_work(ctx->wq, &edma->tx_work);
>>> +}
>>> +
>>> +static int ntb_transport_edma_rc_init(struct ntb_transport_ctx *nt)
>>> +{
>>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
>>> + struct ntb_dev *ndev = nt->ndev;
>>> + struct pci_dev *pdev = ndev->pdev;
>>> + int peer_mw;
>>> + int rc;
>>> +
>>> + if (!use_remote_edma || ctx->remote_edma_mode != REMOTE_EDMA_UNKNOWN)
>>> + return 0;
>>> +
>>> + peer_mw = ntb_peer_mw_count(ndev);
>>> + if (peer_mw <= 0)
>>> + return -ENODEV;
>>> +
>>> + rc = ntb_edma_setup_peer(ndev, peer_mw - 1, nt->qp_count);
>>> + if (rc) {
>>> + dev_err(&pdev->dev, "Failed to enable remote eDMA: %d\n", rc);
>>> + return rc;
>>> + }
>>> +
>>> + rc = ntb_edma_setup_chans(get_dma_dev(ndev), &ctx->chans, true);
>>> + if (rc) {
>>> + dev_err(&pdev->dev, "Failed to setup eDMA channels: %d\n", rc);
>>> + goto err_teardown_peer;
>>> + }
>>> +
>>> + rc = ntb_edma_setup_intr_chan(get_dma_dev(ndev), &ctx->chans);
>>> + if (rc) {
>>> + dev_err(&pdev->dev, "Failed to setup eDMA notify channel: %d\n",
>>> + rc);
>>> + goto err_teardown_chans;
>>> + }
>>> +
>>> + ctx->remote_edma_mode = REMOTE_EDMA_RC;
>>> + return 0;
>>> +
>>> +err_teardown_chans:
>>> + ntb_edma_teardown_chans(&ctx->chans);
>>> +err_teardown_peer:
>>> + ntb_edma_teardown_peer(ndev);
>>> + return rc;
>>> +}
>>> +
>>> +
>>> +static int ntb_transport_edma_ep_init(struct ntb_transport_ctx *nt)
>>> +{
>>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
>>> + struct ntb_dev *ndev = nt->ndev;
>>> + struct pci_dev *pdev = ndev->pdev;
>>> + int peer_mw;
>>> + int rc;
>>> +
>>> + if (!use_remote_edma || ctx->remote_edma_mode == REMOTE_EDMA_EP)
>>> + return 0;
>>> +
>>> +	/*
>>> + * This check assumes that the endpoint (pci-epf-vntb.c)
>>> + * ntb_dev_ops implements .get_private_data() while the host side
>>> + * (ntb_hw_epf.c) does not.
>>> + */
>>> + if (!ntb_get_private_data(ndev))
>>> + return 0;
>>> +
>>> + peer_mw = ntb_peer_mw_count(ndev);
>>> + if (peer_mw <= 0)
>>> + return -ENODEV;
>>> +
>>> + rc = ntb_edma_setup_mws(ndev, peer_mw - 1, nt->qp_count,
>>> + ntb_transport_edma_isr, nt);
>>> + if (rc) {
>>> + dev_err(&pdev->dev,
>>> + "Failed to set up memory window for eDMA: %d\n", rc);
>>> + return rc;
>>> + }
>>> +
>>> + rc = ntb_edma_setup_chans(get_dma_dev(ndev), &ctx->chans, false);
>>> + if (rc) {
>>> + dev_err(&pdev->dev, "Failed to setup eDMA channels: %d\n", rc);
>>> + ntb_edma_teardown_mws(ndev);
>>> + return rc;
>>> + }
>>> +
>>> + ctx->remote_edma_mode = REMOTE_EDMA_EP;
>>> + return 0;
>>> +}
>>> +
>>> +
>>> +static int ntb_transport_edma_setup_qp_mw(struct ntb_transport_ctx *nt,
>>> + unsigned int qp_num)
>>> +{
>>> + struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
>>> + struct ntb_dev *ndev = nt->ndev;
>>> + struct ntb_queue_entry *entry;
>>> + struct ntb_transport_mw *mw;
>>> + unsigned int mw_num, mw_count, qp_count;
>>> + unsigned int qp_offset, rx_info_offset;
>>> + unsigned int mw_size, mw_size_per_qp;
>>> + unsigned int num_qps_mw;
>>> + size_t edma_total;
>>> + unsigned int i;
>>> + int node;
>>> +
>>> + mw_count = nt->mw_count;
>>> + qp_count = nt->qp_count;
>>> +
>>> + mw_num = QP_TO_MW(nt, qp_num);
>>> + mw = &nt->mw_vec[mw_num];
>>> +
>>> + if (!mw->virt_addr)
>>> + return -ENOMEM;
>>> +
>>> + if (mw_num < qp_count % mw_count)
>>> + num_qps_mw = qp_count / mw_count + 1;
>>> + else
>>> + num_qps_mw = qp_count / mw_count;
>>> +
>>> + mw_size = min(nt->mw_vec[mw_num].phys_size, mw->xlat_size);
>>> + if (max_mw_size && mw_size > max_mw_size)
>>> + mw_size = max_mw_size;
>>> +
>>> + mw_size_per_qp = round_down((unsigned int)mw_size / num_qps_mw, SZ_64);
>>> + qp_offset = mw_size_per_qp * (qp_num / mw_count);
>>> + rx_info_offset = mw_size_per_qp - sizeof(struct ntb_rx_info);
>>> +
>>> + qp->tx_mw_size = mw_size_per_qp;
>>> + qp->tx_mw = nt->mw_vec[mw_num].vbase + qp_offset;
>>> + if (!qp->tx_mw)
>>> + return -EINVAL;
>>> + qp->tx_mw_phys = nt->mw_vec[mw_num].phys_addr + qp_offset;
>>> + if (!qp->tx_mw_phys)
>>> + return -EINVAL;
>>> + qp->rx_info = qp->tx_mw + rx_info_offset;
>>> + qp->rx_buff = mw->virt_addr + qp_offset;
>>> + qp->remote_rx_info = qp->rx_buff + rx_info_offset;
>>> +
>>> + /* Due to housekeeping, there must be at least 2 buffs */
>>> + qp->tx_max_frame = min(transport_mtu, mw_size_per_qp / 2);
>>> + qp->rx_max_frame = min(transport_mtu, mw_size_per_qp / 2);
>>> +
>>> + /* In eDMA mode, decouple from MW sizing and force ring-sized entries */
>>> + edma_total = 2 * sizeof(struct ntb_edma_ring);
>>> + if (rx_info_offset < edma_total) {
>>> +		dev_err(&ndev->dev, "Ring space requires %zuB (have %uB)\n",
>>> +			edma_total, rx_info_offset);
>>> + return -EINVAL;
>>> + }
>>> + qp->tx_max_entry = NTB_EDMA_RING_ENTRIES;
>>> + qp->rx_max_entry = NTB_EDMA_RING_ENTRIES;
>>> +
>>> + /*
>>> + * Checking to see if we have more entries than the default.
>>> + * We should add additional entries if that is the case so we
>>> + * can be in sync with the transport frames.
>>> + */
>>> + node = dev_to_node(&ndev->dev);
>>> + for (i = qp->rx_alloc_entry; i < qp->rx_max_entry; i++) {
>>> + entry = kzalloc_node(sizeof(*entry), GFP_KERNEL, node);
>>> + if (!entry)
>>> + return -ENOMEM;
>>> +
>>> + entry->qp = qp;
>>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry,
>>> + &qp->rx_free_q);
>>> + qp->rx_alloc_entry++;
>>> + }
>>> +
>>> + memset(qp->rx_buff, 0, edma_total);
>>> +
>>> + qp->rx_pkts = 0;
>>> + qp->tx_pkts = 0;
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static int ntb_transport_edma_rx_complete(struct ntb_transport_qp *qp)
>>> +{
>>> + struct device *dma_dev = get_dma_dev(qp->ndev);
>>> + struct ntb_transport_qp_edma *edma = qp->priv;
>>> + struct ntb_queue_entry *entry;
>>> + struct ntb_edma_desc *in;
>>> + unsigned int len;
>>> + bool link_down;
>>> + u32 idx;
>>> +
>>> + if (ntb_edma_ring_used_entry(READ_ONCE(*NTB_TAIL_RX_I(qp)),
>>> + edma->rx_cons) == 0)
>>> + return 0;
>>> +
>>> + idx = ntb_edma_ring_idx(edma->rx_cons);
>>> + in = NTB_DESC_RX_I(qp, idx);
>>> + if (!(in->flags & DESC_DONE_FLAG))
>>> + return 0;
>>> +
>>> + link_down = in->flags & LINK_DOWN_FLAG;
>>> + in->flags = 0;
>>> + len = in->len; /* might be smaller than entry->len */
>>> +
>>> + entry = (struct ntb_queue_entry *)(uintptr_t)in->data;
>>> + if (WARN_ON(!entry))
>>> + return 0;
>>> +
>>> + if (link_down) {
>>> + ntb_qp_link_down(qp);
>>> + edma->rx_cons++;
>>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_free_q);
>>> + return 1;
>>> + }
>>> +
>>> + dma_unmap_single(dma_dev, entry->addr, entry->len, DMA_FROM_DEVICE);
>>> +
>>> + qp->rx_bytes += len;
>>> + qp->rx_pkts++;
>>> + edma->rx_cons++;
>>> +
>>> + if (qp->rx_handler && qp->client_ready)
>>> + qp->rx_handler(qp, qp->cb_data, entry->cb_data, len);
>>> +
>>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_free_q);
>>> + return 1;
>>> +}
>>> +
>>> +static void ntb_transport_edma_rx_work(struct work_struct *work)
>>> +{
>>> + struct ntb_transport_qp_edma *edma = container_of(
>>> + work, struct ntb_transport_qp_edma, rx_work);
>>> + struct ntb_transport_qp *qp = edma->qp;
>>> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
>>> + unsigned int i;
>>> +
>>> + for (i = 0; i < NTB_EDMA_MAX_POLL; i++) {
>>> + if (!ntb_transport_edma_rx_complete(qp))
>>> + break;
>>> + }
>>> +
>>> + if (ntb_transport_edma_rx_complete(qp))
>>> + queue_work(ctx->wq, &edma->rx_work);
>>> +}
>>> +
>>> +static void ntb_transport_edma_tx_work(struct work_struct *work)
>>> +{
>>> + struct ntb_transport_qp_edma *edma = container_of(
>>> + work, struct ntb_transport_qp_edma, tx_work);
>>> + struct ntb_transport_qp *qp = edma->qp;
>>> + struct ntb_edma_desc *in, __iomem *out;
>>> + struct ntb_queue_entry *entry;
>>> + unsigned int len;
>>> + void *cb_data;
>>> + u32 idx;
>>> +
>>> + while (ntb_edma_ring_used_entry(READ_ONCE(edma->tx_issue),
>>> + edma->tx_cons) != 0) {
>>> + /* Paired with smp_wmb() in ntb_transport_edma_tx_enqueue_inner() */
>>> + smp_rmb();
>>> +
>>> + idx = ntb_edma_ring_idx(edma->tx_cons);
>>> + in = NTB_DESC_TX_I(qp, idx);
>>> + entry = (struct ntb_queue_entry *)(uintptr_t)in->data;
>>> + if (!entry || !(entry->flags & DESC_DONE_FLAG))
>>> + break;
>>> +
>>> + in->data = 0;
>>> +
>>> + cb_data = entry->cb_data;
>>> + len = entry->len;
>>> +
>>> + out = NTB_DESC_TX_O(qp, idx);
>>> +
>>> + WRITE_ONCE(edma->tx_cons, edma->tx_cons + 1);
>>> +
>>> + /*
>>> + * No need to add barrier in-between to enforce ordering here.
>>> + * The other side proceeds only after both flags and tail are
>>> + * updated.
>>> + */
>>> + iowrite32(entry->flags, &out->flags);
>>> + iowrite32(edma->tx_cons, NTB_TAIL_TX_O(qp));
>>> +
>>> + ntb_transport_edma_notify_peer(edma);
>>> +
>>> + ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry,
>>> + &qp->tx_free_q);
>>> +
>>> + if (qp->tx_handler)
>>> + qp->tx_handler(qp, qp->cb_data, cb_data, len);
>>> +
>>> + /* stat updates */
>>> + qp->tx_bytes += len;
>>> + qp->tx_pkts++;
>>> + }
>>> +}
>>> +
>>> +static void ntb_transport_edma_tx_cb(void *data,
>>> + const struct dmaengine_result *res)
>>> +{
>>> + struct ntb_queue_entry *entry = data;
>>> + struct ntb_transport_qp *qp = entry->qp;
>>> + struct ntb_transport_ctx *nt = qp->transport;
>>> + struct device *dma_dev = get_dma_dev(qp->ndev);
>>> + enum dmaengine_tx_result dma_err = res->result;
>>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
>>> + struct ntb_transport_qp_edma *edma = qp->priv;
>>> +
>>> + switch (dma_err) {
>>> + case DMA_TRANS_READ_FAILED:
>>> + case DMA_TRANS_WRITE_FAILED:
>>> + case DMA_TRANS_ABORTED:
>>> + entry->errors++;
>>> + entry->len = -EIO;
>>> + break;
>>> + case DMA_TRANS_NOERROR:
>>> + default:
>>> + break;
>>> + }
>>> + dma_unmap_sg(dma_dev, &entry->sgl, 1, DMA_TO_DEVICE);
>>> + sg_dma_address(&entry->sgl) = 0;
>>> +
>>> + entry->flags |= DESC_DONE_FLAG;
>>> +
>>> + queue_work(ctx->wq, &edma->tx_work);
>>> +}
>>> +
>>> +static int ntb_transport_edma_submit(struct device *d, struct dma_chan *chan,
>>> + size_t len, void *rc_src, dma_addr_t dst,
>>> + struct ntb_queue_entry *entry)
>>> +{
>>> + struct scatterlist *sgl = &entry->sgl;
>>> + struct dma_async_tx_descriptor *txd;
>>> + struct dma_slave_config cfg;
>>> + dma_cookie_t cookie;
>>> + int nents, rc;
>>> +
>>> + if (!d)
>>> + return -ENODEV;
>>> +
>>> + if (!chan)
>>> + return -ENXIO;
>>> +
>>> + if (WARN_ON(!rc_src || !dst))
>>> + return -EINVAL;
>>> +
>>> + if (WARN_ON(sg_dma_address(sgl)))
>>> + return -EINVAL;
>>> +
>>> + sg_init_one(sgl, rc_src, len);
>>> + nents = dma_map_sg(d, sgl, 1, DMA_TO_DEVICE);
>>> + if (nents <= 0)
>>> + return -EIO;
>>> +
>>> + memset(&cfg, 0, sizeof(cfg));
>>> + cfg.dst_addr = dst;
>>> + cfg.src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
>>> + cfg.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
>>> + cfg.direction = DMA_MEM_TO_DEV;
>>> +
>>> + txd = dmaengine_prep_slave_sg_config(chan, sgl, 1, DMA_MEM_TO_DEV,
>>> + DMA_CTRL_ACK | DMA_PREP_INTERRUPT, &cfg);
>>> + if (!txd) {
>>> + rc = -EIO;
>>> + goto out_unmap;
>>> + }
>>> +
>>> + txd->callback_result = ntb_transport_edma_tx_cb;
>>> + txd->callback_param = entry;
>>> +
>>> + cookie = dmaengine_submit(txd);
>>> + if (dma_submit_error(cookie)) {
>>> + rc = -EIO;
>>> + goto out_unmap;
>>> + }
>>> + dma_async_issue_pending(chan);
>>> + return 0;
>>> +out_unmap:
>>> + dma_unmap_sg(d, sgl, 1, DMA_TO_DEVICE);
>>> + return rc;
>>> +}
>>> +
>>> +static int ntb_transport_edma_tx_enqueue_inner(struct ntb_transport_qp *qp,
>>> + struct ntb_queue_entry *entry)
>>> +{
>>> + struct device *dma_dev = get_dma_dev(qp->ndev);
>>> + struct ntb_transport_qp_edma *edma = qp->priv;
>>> + struct ntb_transport_ctx *nt = qp->transport;
>>> + struct ntb_edma_desc *in, __iomem *out;
>>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
>>> + unsigned int len = entry->len;
>>> + struct dma_chan *chan;
>>> + u32 issue, idx, head;
>>> + dma_addr_t dst;
>>> + int rc;
>>> +
>>> + WARN_ON_ONCE(entry->flags & DESC_DONE_FLAG);
>>> +
>>> + scoped_guard(spinlock_irqsave, &edma->tx_lock) {
>>> + head = READ_ONCE(*NTB_HEAD_TX_I(qp));
>>> + issue = edma->tx_issue;
>>> + if (ntb_edma_ring_used_entry(head, issue) == 0) {
>>> + qp->tx_ring_full++;
>>> + return -ENOSPC;
>>> + }
>>> +
>>> + /*
>>> + * ntb_transport_edma_tx_work() checks entry->flags
>>> + * so it needs to be set before tx_issue++.
>>> + */
>>> + idx = ntb_edma_ring_idx(issue);
>>> + in = NTB_DESC_TX_I(qp, idx);
>>> + in->data = (uintptr_t)entry;
>>> +
>>> + /* Make in->data visible before tx_issue++ */
>>> + smp_wmb();
>>> +
>>> + WRITE_ONCE(edma->tx_issue, edma->tx_issue + 1);
>>> + }
>>> +
>>> + /* Publish the final transfer length to the other end */
>>> + out = NTB_DESC_TX_O(qp, idx);
>>> + iowrite32(len, &out->len);
>>> + ioread32(&out->len);
>>> +
>>> + if (unlikely(!len)) {
>>> + entry->flags |= DESC_DONE_FLAG;
>>> + queue_work(ctx->wq, &edma->tx_work);
>>> + return 0;
>>> + }
>>> +
>>> + /* Paired with dma_wmb() in ntb_transport_edma_rx_enqueue_inner() */
>>> + dma_rmb();
>>> +
>>> + /* kick remote eDMA read transfer */
>>> + dst = (dma_addr_t)in->addr;
>>> + chan = ntb_edma_pick_chan(&ctx->chans, qp->qp_num);
>>> + rc = ntb_transport_edma_submit(dma_dev, chan, len,
>>> + entry->buf, dst, entry);
>>> + if (rc) {
>>> + entry->errors++;
>>> + entry->len = -EIO;
>>> + entry->flags |= DESC_DONE_FLAG;
>>> + queue_work(ctx->wq, &edma->tx_work);
>>> + }
>>> + return 0;
>>> +}
>>> +
>>> +static int ntb_transport_edma_tx_enqueue(struct ntb_transport_qp *qp,
>>> + struct ntb_queue_entry *entry,
>>> + void *cb, void *data, unsigned int len,
>>> + unsigned int flags)
>>> +{
>>> + struct device *dma_dev;
>>> +
>>> + if (entry->addr) {
>>> + /* Deferred unmap */
>>> + dma_dev = get_dma_dev(qp->ndev);
>>> + dma_unmap_single(dma_dev, entry->addr, entry->len,
>>> + DMA_TO_DEVICE);
>>> + }
>>> +
>>> + entry->cb_data = cb;
>>> + entry->buf = data;
>>> + entry->len = len;
>>> + entry->flags = flags;
>>> + entry->errors = 0;
>>> + entry->addr = 0;
>>> +
>>> + WARN_ON_ONCE(!ntb_qp_edma_enabled(qp));
>>> +
>>> + return ntb_transport_edma_tx_enqueue_inner(qp, entry);
>>> +}
>>> +
>>> +static int ntb_transport_edma_rx_enqueue_inner(struct ntb_transport_qp *qp,
>>> + struct ntb_queue_entry *entry)
>>> +{
>>> + struct device *dma_dev = get_dma_dev(qp->ndev);
>>> + struct ntb_transport_qp_edma *edma = qp->priv;
>>> + struct ntb_edma_desc *in, __iomem *out;
>>> + unsigned int len = entry->len;
>>> + void *data = entry->buf;
>>> + dma_addr_t dst;
>>> + u32 idx;
>>> + int rc;
>>> +
>>> + dst = dma_map_single(dma_dev, data, len, DMA_FROM_DEVICE);
>>> + rc = dma_mapping_error(dma_dev, dst);
>>> + if (rc)
>>> + return rc;
>>> +
>>> + guard(spinlock_bh)(&edma->rx_lock);
>>> +
>>> + if (ntb_edma_ring_full(READ_ONCE(edma->rx_prod),
>>> + READ_ONCE(edma->rx_cons))) {
>>> + rc = -ENOSPC;
>>> + goto out_unmap;
>>> + }
>>> +
>>> + idx = ntb_edma_ring_idx(edma->rx_prod);
>>> + in = NTB_DESC_RX_I(qp, idx);
>>> + out = NTB_DESC_RX_O(qp, idx);
>>> +
>>> + iowrite32(len, &out->len);
>>> + iowrite64(dst, &out->addr);
>>> +
>>> + WARN_ON(in->flags & DESC_DONE_FLAG);
>>> + in->data = (uintptr_t)entry;
>>> + entry->addr = dst;
>>> +
>>> + /* Ensure len/addr are visible before the head update */
>>> + dma_wmb();
>>> +
>>> + WRITE_ONCE(edma->rx_prod, edma->rx_prod + 1);
>>> + iowrite32(edma->rx_prod, NTB_HEAD_RX_O(qp));
>>> +
>>> + return 0;
>>> +out_unmap:
>>> + dma_unmap_single(dma_dev, dst, len, DMA_FROM_DEVICE);
>>> + return rc;
>>> +}
>>> +
>>> +static int ntb_transport_edma_rx_enqueue(struct ntb_transport_qp *qp,
>>> + struct ntb_queue_entry *entry)
>>> +{
>>> + int rc;
>>> +
>>> + rc = ntb_transport_edma_rx_enqueue_inner(qp, entry);
>>> + if (rc) {
>>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry,
>>> + &qp->rx_free_q);
>>> + return rc;
>>> + }
>>> +
>>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_pend_q);
>>> +
>>> + if (qp->active)
>>> + tasklet_schedule(&qp->rxc_db_work);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static void ntb_transport_edma_rx_poll(struct ntb_transport_qp *qp)
>>> +{
>>> + struct ntb_transport_ctx *nt = qp->transport;
>>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
>>> + struct ntb_transport_qp_edma *edma = qp->priv;
>>> +
>>> + queue_work(ctx->wq, &edma->rx_work);
>>> + queue_work(ctx->wq, &edma->tx_work);
>>> +}
>>> +
>>> +static int ntb_transport_edma_qp_init(struct ntb_transport_ctx *nt,
>>> + unsigned int qp_num)
>>> +{
>>> + struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
>>> + struct ntb_transport_qp_edma *edma;
>>> + struct ntb_dev *ndev = nt->ndev;
>>> + int node;
>>> +
>>> + node = dev_to_node(&ndev->dev);
>>> +
>>> + qp->priv = kzalloc_node(sizeof(*edma), GFP_KERNEL, node);
>>> + if (!qp->priv)
>>> + return -ENOMEM;
>>> +
>>> + edma = (struct ntb_transport_qp_edma *)qp->priv;
>>> + edma->qp = qp;
>>> + edma->rx_prod = 0;
>>> + edma->rx_cons = 0;
>>> + edma->tx_cons = 0;
>>> + edma->tx_issue = 0;
>>> +
>>> + spin_lock_init(&edma->rx_lock);
>>> + spin_lock_init(&edma->tx_lock);
>>> +
>>> + INIT_WORK(&edma->db_work, ntb_transport_edma_db_work);
>>> + INIT_WORK(&edma->rx_work, ntb_transport_edma_rx_work);
>>> + INIT_WORK(&edma->tx_work, ntb_transport_edma_tx_work);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static void ntb_transport_edma_qp_free(struct ntb_transport_qp *qp)
>>> +{
>>> + struct ntb_transport_qp_edma *edma = qp->priv;
>>> +
>>> + cancel_work_sync(&edma->db_work);
>>> + cancel_work_sync(&edma->rx_work);
>>> + cancel_work_sync(&edma->tx_work);
>>> +
>>> + kfree(qp->priv);
>>> +}
>>> +
>>> +static int ntb_transport_edma_pre_link_up(struct ntb_transport_ctx *nt)
>>> +{
>>> + struct ntb_dev *ndev = nt->ndev;
>>> + struct pci_dev *pdev = ndev->pdev;
>>> + int rc;
>>> +
>>> + rc = ntb_transport_edma_ep_init(nt);
>>> + if (rc)
>>> + dev_err(&pdev->dev, "Failed to init EP: %d\n", rc);
>>> +
>>> + return rc;
>>> +}
>>> +
>>> +static int ntb_transport_edma_post_link_up(struct ntb_transport_ctx *nt)
>>> +{
>>> + struct ntb_dev *ndev = nt->ndev;
>>> + struct pci_dev *pdev = ndev->pdev;
>>> + int rc;
>>> +
>>> + rc = ntb_transport_edma_rc_init(nt);
>>> + if (rc)
>>> + dev_err(&pdev->dev, "Failed to init RC: %d\n", rc);
>>> +
>>> + return rc;
>>> +}
>>> +
>>> +static int ntb_transport_edma_enable(struct ntb_transport_ctx *nt,
>>> + unsigned int *mw_count)
>>> +{
>>> + struct ntb_dev *ndev = nt->ndev;
>>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
>>> +
>>> + if (!use_remote_edma)
>>> + return 0;
>>> +
>>> + /*
>>> + * We need at least one MW for the transport plus one MW reserved
>>> + * for the remote eDMA window (see ntb_edma_setup_mws/peer).
>>> + */
>>> + if (*mw_count <= 1) {
>>> + dev_err(&ndev->dev,
>>> + "remote eDMA requires at least two MWS (have %u)\n",
>>> + *mw_count);
>>> + return -ENODEV;
>>> + }
>>> +
>>> + ctx->wq = alloc_workqueue("ntb-edma-wq", WQ_UNBOUND | WQ_SYSFS, 0);
>>> + if (!ctx->wq) {
>>> + ntb_transport_edma_uninit(nt);
>>> + return -ENOMEM;
>>> + }
>>> +
>>> + /* Reserve the last peer MW exclusively for the eDMA window. */
>>> + *mw_count -= 1;
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static void ntb_transport_edma_disable(struct ntb_transport_ctx *nt)
>>> +{
>>> + ntb_transport_edma_uninit(nt);
>>> +}
>>> +
>>> +static const struct ntb_transport_backend_ops edma_backend_ops = {
>>> + .enable = ntb_transport_edma_enable,
>>> + .disable = ntb_transport_edma_disable,
>>> + .qp_init = ntb_transport_edma_qp_init,
>>> + .qp_free = ntb_transport_edma_qp_free,
>>> + .pre_link_up = ntb_transport_edma_pre_link_up,
>>> + .post_link_up = ntb_transport_edma_post_link_up,
>>> + .setup_qp_mw = ntb_transport_edma_setup_qp_mw,
>>> + .tx_free_entry = ntb_transport_edma_tx_free_entry,
>>> + .tx_enqueue = ntb_transport_edma_tx_enqueue,
>>> + .rx_enqueue = ntb_transport_edma_rx_enqueue,
>>> + .rx_poll = ntb_transport_edma_rx_poll,
>>> + .debugfs_stats_show = ntb_transport_edma_debugfs_stats_show,
>>> +};
>>> +
>>> +int ntb_transport_edma_init(struct ntb_transport_ctx *nt)
>>> +{
>>> + struct ntb_dev *ndev = nt->ndev;
>>> + int node;
>>> +
>>> + node = dev_to_node(&ndev->dev);
>>> + nt->priv = kzalloc_node(sizeof(struct ntb_transport_ctx_edma), GFP_KERNEL,
>>> + node);
>>> + if (!nt->priv)
>>> + return -ENOMEM;
>>> +
>>> + nt->backend_ops = edma_backend_ops;
>>> + /*
>>> + * In remote eDMA mode, one DMA read channel is used by the host side
>>> + * to interrupt the EP.
>>> + */
>>> + use_msi = false;
>>> + return 0;
>>> +}
>>> diff --git a/drivers/ntb/ntb_transport_internal.h b/drivers/ntb/ntb_transport_internal.h
>>> index 51ff08062d73..9fff65980d3d 100644
>>> --- a/drivers/ntb/ntb_transport_internal.h
>>> +++ b/drivers/ntb/ntb_transport_internal.h
>>> @@ -8,6 +8,7 @@
>>> extern unsigned long max_mw_size;
>>> extern unsigned int transport_mtu;
>>> extern bool use_msi;
>>> +extern bool use_remote_edma;
>>>
>>> #define QP_TO_MW(nt, qp) ((qp) % nt->mw_count)
>>>
>>> @@ -29,6 +30,11 @@ struct ntb_queue_entry {
>>> struct ntb_payload_header __iomem *tx_hdr;
>>> struct ntb_payload_header *rx_hdr;
>>> };
>>> +
>>> +#ifdef CONFIG_NTB_TRANSPORT_EDMA
>>> + dma_addr_t addr;
>>> + struct scatterlist sgl;
>>> +#endif
>>> };
>>>
>>> struct ntb_rx_info {
>>> @@ -202,4 +208,13 @@ int ntb_transport_init_queue(struct ntb_transport_ctx *nt,
>>> unsigned int qp_num);
>>> struct device *get_dma_dev(struct ntb_dev *ndev);
>>>
>>> +#ifdef CONFIG_NTB_TRANSPORT_EDMA
>>> +int ntb_transport_edma_init(struct ntb_transport_ctx *nt);
>>> +#else
>>> +static inline int ntb_transport_edma_init(struct ntb_transport_ctx *nt)
>>> +{
>>> + return -EOPNOTSUPP;
>>> +}
>>> +#endif /* CONFIG_NTB_TRANSPORT_EDMA */
>>> +
>>> #endif /* _NTB_TRANSPORT_INTERNAL_H_ */
>>> --
>>> 2.51.0
>>>
^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v3 26/35] NTB: ntb_transport: Introduce DW eDMA backed transport mode
2026-01-06 18:46 ` Dave Jiang
@ 2026-01-07 15:05 ` Koichiro Den
0 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2026-01-07 15:05 UTC (permalink / raw)
To: Dave Jiang
Cc: Frank Li, ntb, linux-pci, dmaengine, linux-renesas-soc, netdev,
linux-kernel, mani, kwilczynski, kishon, bhelgaas, corbet,
geert+renesas, magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro,
will, robin.murphy, jdmason, allenbh, andrew+netdev, davem,
edumazet, kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k,
kurt.schwemmer, logang, jingoohan1, lpieralisi, utkarsh02t,
jbrunet, dlemoal, arnd, elfring
On Tue, Jan 06, 2026 at 11:46:23AM -0700, Dave Jiang wrote:
>
>
> On 12/20/25 8:28 AM, Koichiro Den wrote:
> > On Fri, Dec 19, 2025 at 10:00:52AM -0500, Frank Li wrote:
> >> On Thu, Dec 18, 2025 at 12:16:00AM +0900, Koichiro Den wrote:
> >>> Add a new ntb_transport backend that uses a DesignWare eDMA engine
> >>> located on the endpoint, to be driven by both host and endpoint.
> >>>
> >>> The endpoint exposes a dedicated memory window which contains the eDMA
> >>> register block, a small control structure (struct ntb_edma_info) and
> >>> per-channel linked-list (LL) rings for read channels. The endpoint
> >>> drives its local eDMA write channels for its transmissions, while the
> >>> host side uses the remote eDMA read channels for its transmissions.
> >>
> >> I just glanced at the code. It looks like you use the standard DMA API
> >> and a per-channel linked-list (LL) ring, which can be pure software.
> >>
> >> So it is not necessary to bind to DesignWare eDMA. Maybe other vendors'
> >> PCIe built-in DMA engines could work with this code?
> >
> > Yes, the DesignWare-specific parts are encapsulated under
> > drivers/ntb/hw/edma/, so ntb_transport_edma itself is not tightly
> > coupled to DesignWare eDMA. In other words, if that is not the case and
> > something DesignWare-specific remains, that's just my mistake.
> >
> > I intentionally avoided introducing an extra abstraction layer prematurely.
> > If we later want to support other vendors' PCIe built-in DMA engines for
> > edma_backend_ops, an additional internal abstraction under the
> > 'edma_backend_ops' implementation can be introduced at that point.
> > Do you think I should do so now?
>
> I agree with Frank. Make it generic so that other vendors can utilize it in the future, since this is the generic transport part.
Thank you for your feedback.
So I plan to introduce an additional internal abstraction, 'ntb_edma_backend'.
ntb_transport_edma selects one specific ntb_edma_backend implementation.
For now, dw-edma (drivers/ntb/hw/edma/ntb_dw_edma.c) is the only choice,
but other backends such as drivers/ntb/hw/edma/ntb_xyz.c might show up in
the future.
Regards,
Koichiro
>
> DJ
>
> >
> > Koichiro
> >
> >>
> >> Frank
> >>>
> >>> A key benefit of this backend is that the memory window no longer needs
> >>> to carry data-plane payload. This makes the design less sensitive to
> >>> limited memory window space and allows scaling to multiple queue pairs.
> >>> The memory window layout is specific to the eDMA-backed backend, so
> >>> there is no automatic fallback to the memcpy-based default transport,
> >>> which requires a different layout.
> >>>
> >>> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> >>> ---
> >>> drivers/ntb/Kconfig | 12 +
> >>> drivers/ntb/Makefile | 2 +
> >>> drivers/ntb/ntb_transport_core.c | 15 +-
> >>> drivers/ntb/ntb_transport_edma.c | 987 +++++++++++++++++++++++++++
> >>> drivers/ntb/ntb_transport_internal.h | 15 +
> >>> 5 files changed, 1029 insertions(+), 2 deletions(-)
> >>> create mode 100644 drivers/ntb/ntb_transport_edma.c
> >>>
> >>> diff --git a/drivers/ntb/Kconfig b/drivers/ntb/Kconfig
> >>> index df16c755b4da..5ba6d0b7f5ba 100644
> >>> --- a/drivers/ntb/Kconfig
> >>> +++ b/drivers/ntb/Kconfig
> >>> @@ -37,4 +37,16 @@ config NTB_TRANSPORT
> >>>
> >>> If unsure, say N.
> >>>
> >>> +config NTB_TRANSPORT_EDMA
> >>> + bool "NTB Transport backed by remote eDMA"
> >>> + depends on NTB_TRANSPORT
> >>> + depends on PCI
> >>> + select DMA_ENGINE
> >>> + select NTB_EDMA
> >>> + help
> >>> + Enable a transport backend that uses a remote DesignWare eDMA engine
> >>> + exposed through a dedicated NTB memory window. The host uses the
> >>> + endpoint's eDMA engine to move data in both directions.
> >>> + Say Y here if you intend to use the 'use_remote_edma' module parameter.
> >>> +
> >>> endif # NTB
> >>> diff --git a/drivers/ntb/Makefile b/drivers/ntb/Makefile
> >>> index 9b66e5fafbc0..b9086b32ecde 100644
> >>> --- a/drivers/ntb/Makefile
> >>> +++ b/drivers/ntb/Makefile
> >>> @@ -6,3 +6,5 @@ ntb-y := core.o
> >>> ntb-$(CONFIG_NTB_MSI) += msi.o
> >>>
> >>> ntb_transport-y := ntb_transport_core.o
> >>> +ntb_transport-$(CONFIG_NTB_TRANSPORT_EDMA) += ntb_transport_edma.o
> >>> +ntb_transport-$(CONFIG_NTB_TRANSPORT_EDMA) += hw/edma/ntb_hw_edma.o
> >>> diff --git a/drivers/ntb/ntb_transport_core.c b/drivers/ntb/ntb_transport_core.c
> >>> index 40c2548f5930..bd21232f26fe 100644
> >>> --- a/drivers/ntb/ntb_transport_core.c
> >>> +++ b/drivers/ntb/ntb_transport_core.c
> >>> @@ -104,6 +104,12 @@ module_param(use_msi, bool, 0644);
> >>> MODULE_PARM_DESC(use_msi, "Use MSI interrupts instead of doorbells");
> >>> #endif
> >>>
> >>> +bool use_remote_edma;
> >>> +#ifdef CONFIG_NTB_TRANSPORT_EDMA
> >>> +module_param(use_remote_edma, bool, 0644);
> >>> +MODULE_PARM_DESC(use_remote_edma, "Use remote eDMA mode (when enabled, use_msi is ignored)");
> >>> +#endif
> >>> +
> >>> static struct dentry *nt_debugfs_dir;
> >>>
> >>> /* Only two-ports NTB devices are supported */
> >>> @@ -156,7 +162,7 @@ enum {
> >>> #define drv_client(__drv) \
> >>> container_of((__drv), struct ntb_transport_client, driver)
> >>>
> >>> -#define NTB_QP_DEF_NUM_ENTRIES 100
> >>> +#define NTB_QP_DEF_NUM_ENTRIES 128
> >>> #define NTB_LINK_DOWN_TIMEOUT 10
> >>>
> >>> static void ntb_transport_rxc_db(unsigned long data);
> >>> @@ -1189,7 +1195,11 @@ static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
> >>>
> >>> nt->ndev = ndev;
> >>>
> >>> - rc = ntb_transport_default_init(nt);
> >>> + if (use_remote_edma)
> >>> + rc = ntb_transport_edma_init(nt);
> >>> + else
> >>> + rc = ntb_transport_default_init(nt);
> >>> +
> >>> if (rc)
> >>> return rc;
> >>>
> >>> @@ -1950,6 +1960,7 @@ ntb_transport_create_queue(void *data, struct device *client_dev,
> >>>
> >>> nt->qp_bitmap_free &= ~qp_bit;
> >>>
> >>> + qp->qp_bit = qp_bit;
> >>> qp->cb_data = data;
> >>> qp->rx_handler = handlers->rx_handler;
> >>> qp->tx_handler = handlers->tx_handler;
> >>> diff --git a/drivers/ntb/ntb_transport_edma.c b/drivers/ntb/ntb_transport_edma.c
> >>> new file mode 100644
> >>> index 000000000000..6ae5da0a1367
> >>> --- /dev/null
> >>> +++ b/drivers/ntb/ntb_transport_edma.c
> >>> @@ -0,0 +1,987 @@
> >>> +// SPDX-License-Identifier: GPL-2.0-only
> >>> +/*
> >>> + * NTB transport backend for remote DesignWare eDMA.
> >>> + *
> >>> + * This implements the backend_ops used when use_remote_edma=1 and
> >>> + * relies on drivers/ntb/hw/edma/ for low-level eDMA/MW programming.
> >>> + */
> >>> +
> >>> +#include <linux/bug.h>
> >>> +#include <linux/compiler.h>
> >>> +#include <linux/debugfs.h>
> >>> +#include <linux/dmaengine.h>
> >>> +#include <linux/dma-mapping.h>
> >>> +#include <linux/errno.h>
> >>> +#include <linux/io-64-nonatomic-lo-hi.h>
> >>> +#include <linux/ntb.h>
> >>> +#include <linux/pci.h>
> >>> +#include <linux/pci-epc.h>
> >>> +#include <linux/seq_file.h>
> >>> +#include <linux/slab.h>
> >>> +
> >>> +#include "hw/edma/ntb_hw_edma.h"
> >>> +#include "ntb_transport_internal.h"
> >>> +
> >>> +#define NTB_EDMA_RING_ORDER 7
> >>> +#define NTB_EDMA_RING_ENTRIES (1U << NTB_EDMA_RING_ORDER)
> >>> +#define NTB_EDMA_RING_MASK (NTB_EDMA_RING_ENTRIES - 1)
> >>> +
> >>> +#define NTB_EDMA_MAX_POLL 32
> >>> +
> >>> +/*
> >>> + * Remote eDMA mode implementation
> >>> + */
> >>> +struct ntb_transport_ctx_edma {
> >>> + remote_edma_mode_t remote_edma_mode;
> >>> + struct device *dma_dev;
> >>> + struct workqueue_struct *wq;
> >>> + struct ntb_edma_chans chans;
> >>> +};
> >>> +
> >>> +struct ntb_transport_qp_edma {
> >>> + struct ntb_transport_qp *qp;
> >>> +
> >>> + /*
> >>> + * For ensuring peer notification in non-atomic context.
> >>> + * ntb_peer_db_set might sleep or schedule.
> >>> + */
> >>> + struct work_struct db_work;
> >>> +
> >>> + u32 rx_prod;
> >>> + u32 rx_cons;
> >>> + u32 tx_cons;
> >>> + u32 tx_issue;
> >>> +
> >>> + spinlock_t rx_lock;
> >>> + spinlock_t tx_lock;
> >>> +
> >>> + struct work_struct rx_work;
> >>> + struct work_struct tx_work;
> >>> +};
> >>> +
> >>> +struct ntb_edma_desc {
> >>> + u32 len;
> >>> + u32 flags;
> >>> + u64 addr; /* DMA address */
> >>> + u64 data;
> >>> +};
> >>> +
> >>> +struct ntb_edma_ring {
> >>> + struct ntb_edma_desc desc[NTB_EDMA_RING_ENTRIES];
> >>> + u32 head;
> >>> + u32 tail;
> >>> +};
> >>> +
> >>> +static inline bool ntb_qp_edma_is_rc(struct ntb_transport_qp *qp)
> >>> +{
> >>> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
> >>> +
> >>> + return ctx->remote_edma_mode == REMOTE_EDMA_RC;
> >>> +}
> >>> +
> >>> +static inline bool ntb_qp_edma_is_ep(struct ntb_transport_qp *qp)
> >>> +{
> >>> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
> >>> +
> >>> + return ctx->remote_edma_mode == REMOTE_EDMA_EP;
> >>> +}
> >>> +
> >>> +static inline bool ntb_qp_edma_enabled(struct ntb_transport_qp *qp)
> >>> +{
> >>> + return ntb_qp_edma_is_rc(qp) || ntb_qp_edma_is_ep(qp);
> >>> +}
> >>> +
> >>> +static inline unsigned int ntb_edma_ring_sel(struct ntb_transport_qp *qp,
> >>> + unsigned int n)
> >>> +{
> >>> + return n ^ !!ntb_qp_edma_is_ep(qp);
> >>> +}
> >>> +
> >>> +static inline struct ntb_edma_ring *
> >>> +ntb_edma_ring_local(struct ntb_transport_qp *qp, unsigned int n)
> >>> +{
> >>> + unsigned int r = ntb_edma_ring_sel(qp, n);
> >>> +
> >>> + return &((struct ntb_edma_ring *)qp->rx_buff)[r];
> >>> +}
> >>> +
> >>> +static inline struct ntb_edma_ring __iomem *
> >>> +ntb_edma_ring_remote(struct ntb_transport_qp *qp, unsigned int n)
> >>> +{
> >>> + unsigned int r = ntb_edma_ring_sel(qp, n);
> >>> +
> >>> + return &((struct ntb_edma_ring __iomem *)qp->tx_mw)[r];
> >>> +}
> >>> +
> >>> +static inline struct ntb_edma_desc *
> >>> +ntb_edma_desc_local(struct ntb_transport_qp *qp, unsigned int n, unsigned int i)
> >>> +{
> >>> + return &ntb_edma_ring_local(qp, n)->desc[i];
> >>> +}
> >>> +
> >>> +static inline struct ntb_edma_desc __iomem *
> >>> +ntb_edma_desc_remote(struct ntb_transport_qp *qp, unsigned int n,
> >>> + unsigned int i)
> >>> +{
> >>> + return &ntb_edma_ring_remote(qp, n)->desc[i];
> >>> +}
> >>> +
> >>> +static inline u32 *ntb_edma_head_local(struct ntb_transport_qp *qp,
> >>> + unsigned int n)
> >>> +{
> >>> + return &ntb_edma_ring_local(qp, n)->head;
> >>> +}
> >>> +
> >>> +static inline u32 __iomem *ntb_edma_head_remote(struct ntb_transport_qp *qp,
> >>> + unsigned int n)
> >>> +{
> >>> + return &ntb_edma_ring_remote(qp, n)->head;
> >>> +}
> >>> +
> >>> +static inline u32 *ntb_edma_tail_local(struct ntb_transport_qp *qp,
> >>> + unsigned int n)
> >>> +{
> >>> + return &ntb_edma_ring_local(qp, n)->tail;
> >>> +}
> >>> +
> >>> +static inline u32 __iomem *ntb_edma_tail_remote(struct ntb_transport_qp *qp,
> >>> + unsigned int n)
> >>> +{
> >>> + return &ntb_edma_ring_remote(qp, n)->tail;
> >>> +}
> >>> +
> >>> +/* The 'i' must be generated by ntb_edma_ring_idx() */
> >>> +#define NTB_DESC_TX_O(qp, i) ntb_edma_desc_remote(qp, 0, i)
> >>> +#define NTB_DESC_TX_I(qp, i) ntb_edma_desc_local(qp, 0, i)
> >>> +#define NTB_DESC_RX_O(qp, i) ntb_edma_desc_remote(qp, 1, i)
> >>> +#define NTB_DESC_RX_I(qp, i) ntb_edma_desc_local(qp, 1, i)
> >>> +
> >>> +#define NTB_HEAD_TX_I(qp) ntb_edma_head_local(qp, 0)
> >>> +#define NTB_HEAD_RX_O(qp) ntb_edma_head_remote(qp, 1)
> >>> +
> >>> +#define NTB_TAIL_TX_O(qp) ntb_edma_tail_remote(qp, 0)
> >>> +#define NTB_TAIL_RX_I(qp) ntb_edma_tail_local(qp, 1)
> >>> +
> >>> +/* ntb_edma_ring helpers */
> >>> +static __always_inline u32 ntb_edma_ring_idx(u32 v)
> >>> +{
> >>> + return v & NTB_EDMA_RING_MASK;
> >>> +}
> >>> +
> >>> +static __always_inline u32 ntb_edma_ring_used_entry(u32 head, u32 tail)
> >>> +{
> >>> + if (head >= tail) {
> >>> + WARN_ON_ONCE((head - tail) > (NTB_EDMA_RING_ENTRIES - 1));
> >>> + return head - tail;
> >>> + }
> >>> +
> >>> + WARN_ON_ONCE((U32_MAX - tail + head + 1) > (NTB_EDMA_RING_ENTRIES - 1));
> >>> + return U32_MAX - tail + head + 1;
> >>> +}
> >>> +
> >>> +static __always_inline u32 ntb_edma_ring_free_entry(u32 head, u32 tail)
> >>> +{
> >>> + return NTB_EDMA_RING_ENTRIES - ntb_edma_ring_used_entry(head, tail) - 1;
> >>> +}
> >>> +
> >>> +static __always_inline bool ntb_edma_ring_full(u32 head, u32 tail)
> >>> +{
> >>> + return ntb_edma_ring_free_entry(head, tail) == 0;
> >>> +}
> >>> +
> >>> +static unsigned int ntb_transport_edma_tx_free_entry(struct ntb_transport_qp *qp)
> >>> +{
> >>> + struct ntb_transport_qp_edma *edma = qp->priv;
> >>> + unsigned int head, tail;
> >>> +
> >>> + scoped_guard(spinlock_irqsave, &edma->tx_lock) {
> >>> + /* In this scope, only 'head' might proceed */
> >>> + tail = READ_ONCE(edma->tx_issue);
> >>> + head = READ_ONCE(*NTB_HEAD_TX_I(qp));
> >>> + }
> >>> + /*
> >>> + * 'used' amount indicates how much the other end has refilled,
> >>> + * which are available for us to use for TX.
> >>> + */
> >>> + return ntb_edma_ring_used_entry(head, tail);
> >>> +}
> >>> +
> >>> +static void ntb_transport_edma_debugfs_stats_show(struct seq_file *s,
> >>> + struct ntb_transport_qp *qp)
> >>> +{
> >>> + seq_printf(s, "rx_bytes - \t%llu\n", qp->rx_bytes);
> >>> + seq_printf(s, "rx_pkts - \t%llu\n", qp->rx_pkts);
> >>> + seq_printf(s, "rx_err_no_buf - %llu\n", qp->rx_err_no_buf);
> >>> + seq_printf(s, "rx_buff - \t0x%p\n", qp->rx_buff);
> >>> + seq_printf(s, "rx_max_entry - \t%u\n", qp->rx_max_entry);
> >>> + seq_printf(s, "rx_alloc_entry - \t%u\n\n", qp->rx_alloc_entry);
> >>> +
> >>> + seq_printf(s, "tx_bytes - \t%llu\n", qp->tx_bytes);
> >>> + seq_printf(s, "tx_pkts - \t%llu\n", qp->tx_pkts);
> >>> + seq_printf(s, "tx_ring_full - \t%llu\n", qp->tx_ring_full);
> >>> + seq_printf(s, "tx_err_no_buf - %llu\n", qp->tx_err_no_buf);
> >>> + seq_printf(s, "tx_mw - \t0x%p\n", qp->tx_mw);
> >>> + seq_printf(s, "tx_max_entry - \t%u\n", qp->tx_max_entry);
> >>> + seq_printf(s, "free tx - \t%u\n", ntb_transport_tx_free_entry(qp));
> >>> + seq_putc(s, '\n');
> >>> +
> >>> + seq_puts(s, "Using Remote eDMA - Yes\n");
> >>> + seq_printf(s, "QP Link - \t%s\n", qp->link_is_up ? "Up" : "Down");
> >>> +}
> >>> +
> >>> +static void ntb_transport_edma_uninit(struct ntb_transport_ctx *nt)
> >>> +{
> >>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> >>> +
> >>> + if (ctx->wq)
> >>> + destroy_workqueue(ctx->wq);
> >>> + ctx->wq = NULL;
> >>> +
> >>> + ntb_edma_teardown_chans(&ctx->chans);
> >>> +
> >>> + switch (ctx->remote_edma_mode) {
> >>> + case REMOTE_EDMA_EP:
> >>> + ntb_edma_teardown_mws(nt->ndev);
> >>> + break;
> >>> + case REMOTE_EDMA_RC:
> >>> + ntb_edma_teardown_peer(nt->ndev);
> >>> + break;
> >>> + case REMOTE_EDMA_UNKNOWN:
> >>> + default:
> >>> + break;
> >>> + }
> >>> +
> >>> + ctx->remote_edma_mode = REMOTE_EDMA_UNKNOWN;
> >>> +}
> >>> +
> >>> +static void ntb_transport_edma_db_work(struct work_struct *work)
> >>> +{
> >>> + struct ntb_transport_qp_edma *edma =
> >>> + container_of(work, struct ntb_transport_qp_edma, db_work);
> >>> + struct ntb_transport_qp *qp = edma->qp;
> >>> +
> >>> + ntb_peer_db_set(qp->ndev, qp->qp_bit);
> >>> +}
> >>> +
> >>> +static void ntb_transport_edma_notify_peer(struct ntb_transport_qp_edma *edma)
> >>> +{
> >>> + struct ntb_transport_qp *qp = edma->qp;
> >>> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
> >>> +
> >>> + if (!ntb_edma_notify_peer(&ctx->chans, qp->qp_num))
> >>> + return;
> >>> +
> >>> + /*
> >>> + * Called from contexts that may be atomic. Since ntb_peer_db_set()
> >>> + * may sleep, delegate the actual doorbell write to a workqueue.
> >>> + */
> >>> + queue_work(system_highpri_wq, &edma->db_work);
> >>> +}
> >>> +
> >>> +static void ntb_transport_edma_isr(void *data, int qp_num)
> >>> +{
> >>> + struct ntb_transport_ctx *nt = data;
> >>> + struct ntb_transport_qp_edma *edma;
> >>> + struct ntb_transport_ctx_edma *ctx;
> >>> + struct ntb_transport_qp *qp;
> >>> +
> >>> + if (qp_num < 0 || qp_num >= nt->qp_count)
> >>> + return;
> >>> +
> >>> + qp = &nt->qp_vec[qp_num];
> >>> + if (WARN_ON(!qp))
> >>> + return;
> >>> +
> >>> + ctx = (struct ntb_transport_ctx_edma *)qp->transport->priv;
> >>> + edma = qp->priv;
> >>> +
> >>> + queue_work(ctx->wq, &edma->rx_work);
> >>> + queue_work(ctx->wq, &edma->tx_work);
> >>> +}
> >>> +
> >>> +static int ntb_transport_edma_rc_init(struct ntb_transport_ctx *nt)
> >>> +{
> >>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> >>> + struct ntb_dev *ndev = nt->ndev;
> >>> + struct pci_dev *pdev = ndev->pdev;
> >>> + int peer_mw;
> >>> + int rc;
> >>> +
> >>> + if (!use_remote_edma || ctx->remote_edma_mode != REMOTE_EDMA_UNKNOWN)
> >>> + return 0;
> >>> +
> >>> + peer_mw = ntb_peer_mw_count(ndev);
> >>> + if (peer_mw <= 0)
> >>> + return -ENODEV;
> >>> +
> >>> + rc = ntb_edma_setup_peer(ndev, peer_mw - 1, nt->qp_count);
> >>> + if (rc) {
> >>> + dev_err(&pdev->dev, "Failed to enable remote eDMA: %d\n", rc);
> >>> + return rc;
> >>> + }
> >>> +
> >>> + rc = ntb_edma_setup_chans(get_dma_dev(ndev), &ctx->chans, true);
> >>> + if (rc) {
> >>> + dev_err(&pdev->dev, "Failed to setup eDMA channels: %d\n", rc);
> >>> + goto err_teardown_peer;
> >>> + }
> >>> +
> >>> + rc = ntb_edma_setup_intr_chan(get_dma_dev(ndev), &ctx->chans);
> >>> + if (rc) {
> >>> + dev_err(&pdev->dev, "Failed to setup eDMA notify channel: %d\n",
> >>> + rc);
> >>> + goto err_teardown_chans;
> >>> + }
> >>> +
> >>> + ctx->remote_edma_mode = REMOTE_EDMA_RC;
> >>> + return 0;
> >>> +
> >>> +err_teardown_chans:
> >>> + ntb_edma_teardown_chans(&ctx->chans);
> >>> +err_teardown_peer:
> >>> + ntb_edma_teardown_peer(ndev);
> >>> + return rc;
> >>> +}
> >>> +
> >>> +
> >>> +static int ntb_transport_edma_ep_init(struct ntb_transport_ctx *nt)
> >>> +{
> >>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> >>> + struct ntb_dev *ndev = nt->ndev;
> >>> + struct pci_dev *pdev = ndev->pdev;
> >>> + int peer_mw;
> >>> + int rc;
> >>> +
> >>> + if (!use_remote_edma || ctx->remote_edma_mode == REMOTE_EDMA_EP)
> >>> + return 0;
> >>> +
> >>> + /*
> >>> + * This check assumes that the endpoint (pci-epf-vntb.c)
> >>> + * ntb_dev_ops implements .get_private_data() while the host side
> >>> + * (ntb_hw_epf.c) does not.
> >>> + */
> >>> + if (!ntb_get_private_data(ndev))
> >>> + return 0;
> >>> +
> >>> + peer_mw = ntb_peer_mw_count(ndev);
> >>> + if (peer_mw <= 0)
> >>> + return -ENODEV;
> >>> +
> >>> + rc = ntb_edma_setup_mws(ndev, peer_mw - 1, nt->qp_count,
> >>> + ntb_transport_edma_isr, nt);
> >>> + if (rc) {
> >>> + dev_err(&pdev->dev,
> >>> + "Failed to set up memory window for eDMA: %d\n", rc);
> >>> + return rc;
> >>> + }
> >>> +
> >>> + rc = ntb_edma_setup_chans(get_dma_dev(ndev), &ctx->chans, false);
> >>> + if (rc) {
> >>> + dev_err(&pdev->dev, "Failed to setup eDMA channels: %d\n", rc);
> >>> + ntb_edma_teardown_mws(ndev);
> >>> + return rc;
> >>> + }
> >>> +
> >>> + ctx->remote_edma_mode = REMOTE_EDMA_EP;
> >>> + return 0;
> >>> +}
> >>> +
> >>> +
> >>> +static int ntb_transport_edma_setup_qp_mw(struct ntb_transport_ctx *nt,
> >>> + unsigned int qp_num)
> >>> +{
> >>> + struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
> >>> + struct ntb_dev *ndev = nt->ndev;
> >>> + struct ntb_queue_entry *entry;
> >>> + struct ntb_transport_mw *mw;
> >>> + unsigned int mw_num, mw_count, qp_count;
> >>> + unsigned int qp_offset, rx_info_offset;
> >>> + unsigned int mw_size, mw_size_per_qp;
> >>> + unsigned int num_qps_mw;
> >>> + size_t edma_total;
> >>> + unsigned int i;
> >>> + int node;
> >>> +
> >>> + mw_count = nt->mw_count;
> >>> + qp_count = nt->qp_count;
> >>> +
> >>> + mw_num = QP_TO_MW(nt, qp_num);
> >>> + mw = &nt->mw_vec[mw_num];
> >>> +
> >>> + if (!mw->virt_addr)
> >>> + return -ENOMEM;
> >>> +
> >>> + if (mw_num < qp_count % mw_count)
> >>> + num_qps_mw = qp_count / mw_count + 1;
> >>> + else
> >>> + num_qps_mw = qp_count / mw_count;
> >>> +
> >>> + mw_size = min(nt->mw_vec[mw_num].phys_size, mw->xlat_size);
> >>> + if (max_mw_size && mw_size > max_mw_size)
> >>> + mw_size = max_mw_size;
> >>> +
> >>> + mw_size_per_qp = round_down((unsigned int)mw_size / num_qps_mw, SZ_64);
> >>> + qp_offset = mw_size_per_qp * (qp_num / mw_count);
> >>> + rx_info_offset = mw_size_per_qp - sizeof(struct ntb_rx_info);
> >>> +
> >>> + qp->tx_mw_size = mw_size_per_qp;
> >>> + qp->tx_mw = nt->mw_vec[mw_num].vbase + qp_offset;
> >>> + if (!qp->tx_mw)
> >>> + return -EINVAL;
> >>> + qp->tx_mw_phys = nt->mw_vec[mw_num].phys_addr + qp_offset;
> >>> + if (!qp->tx_mw_phys)
> >>> + return -EINVAL;
> >>> + qp->rx_info = qp->tx_mw + rx_info_offset;
> >>> + qp->rx_buff = mw->virt_addr + qp_offset;
> >>> + qp->remote_rx_info = qp->rx_buff + rx_info_offset;
> >>> +
> >>> + /* Due to housekeeping, there must be at least 2 buffs */
> >>> + qp->tx_max_frame = min(transport_mtu, mw_size_per_qp / 2);
> >>> + qp->rx_max_frame = min(transport_mtu, mw_size_per_qp / 2);
> >>> +
> >>> + /* In eDMA mode, decouple from MW sizing and force ring-sized entries */
> >>> + edma_total = 2 * sizeof(struct ntb_edma_ring);
> >>> + if (rx_info_offset < edma_total) {
> >>> + dev_err(&ndev->dev, "Ring space requires %zuB but only %uB available\n",
> >>> + edma_total, rx_info_offset);
> >>> + return -EINVAL;
> >>> + }
> >>> + qp->tx_max_entry = NTB_EDMA_RING_ENTRIES;
> >>> + qp->rx_max_entry = NTB_EDMA_RING_ENTRIES;
> >>> +
> >>> + /*
> >>> + * Checking to see if we have more entries than the default.
> >>> + * We should add additional entries if that is the case so we
> >>> + * can be in sync with the transport frames.
> >>> + */
> >>> + node = dev_to_node(&ndev->dev);
> >>> + for (i = qp->rx_alloc_entry; i < qp->rx_max_entry; i++) {
> >>> + entry = kzalloc_node(sizeof(*entry), GFP_KERNEL, node);
> >>> + if (!entry)
> >>> + return -ENOMEM;
> >>> +
> >>> + entry->qp = qp;
> >>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry,
> >>> + &qp->rx_free_q);
> >>> + qp->rx_alloc_entry++;
> >>> + }
> >>> +
> >>> + memset(qp->rx_buff, 0, edma_total);
> >>> +
> >>> + qp->rx_pkts = 0;
> >>> + qp->tx_pkts = 0;
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>> +static int ntb_transport_edma_rx_complete(struct ntb_transport_qp *qp)
> >>> +{
> >>> + struct device *dma_dev = get_dma_dev(qp->ndev);
> >>> + struct ntb_transport_qp_edma *edma = qp->priv;
> >>> + struct ntb_queue_entry *entry;
> >>> + struct ntb_edma_desc *in;
> >>> + unsigned int len;
> >>> + bool link_down;
> >>> + u32 idx;
> >>> +
> >>> + if (ntb_edma_ring_used_entry(READ_ONCE(*NTB_TAIL_RX_I(qp)),
> >>> + edma->rx_cons) == 0)
> >>> + return 0;
> >>> +
> >>> + idx = ntb_edma_ring_idx(edma->rx_cons);
> >>> + in = NTB_DESC_RX_I(qp, idx);
> >>> + if (!(in->flags & DESC_DONE_FLAG))
> >>> + return 0;
> >>> +
> >>> + link_down = in->flags & LINK_DOWN_FLAG;
> >>> + in->flags = 0;
> >>> + len = in->len; /* might be smaller than entry->len */
> >>> +
> >>> + entry = (struct ntb_queue_entry *)(uintptr_t)in->data;
> >>> + if (WARN_ON(!entry))
> >>> + return 0;
> >>> +
> >>> + if (link_down) {
> >>> + ntb_qp_link_down(qp);
> >>> + edma->rx_cons++;
> >>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_free_q);
> >>> + return 1;
> >>> + }
> >>> +
> >>> + dma_unmap_single(dma_dev, entry->addr, entry->len, DMA_FROM_DEVICE);
> >>> +
> >>> + qp->rx_bytes += len;
> >>> + qp->rx_pkts++;
> >>> + edma->rx_cons++;
> >>> +
> >>> + if (qp->rx_handler && qp->client_ready)
> >>> + qp->rx_handler(qp, qp->cb_data, entry->cb_data, len);
> >>> +
> >>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_free_q);
> >>> + return 1;
> >>> +}
> >>> +
> >>> +static void ntb_transport_edma_rx_work(struct work_struct *work)
> >>> +{
> >>> + struct ntb_transport_qp_edma *edma = container_of(
> >>> + work, struct ntb_transport_qp_edma, rx_work);
> >>> + struct ntb_transport_qp *qp = edma->qp;
> >>> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
> >>> + unsigned int i;
> >>> +
> >>> + for (i = 0; i < NTB_EDMA_MAX_POLL; i++) {
> >>> + if (!ntb_transport_edma_rx_complete(qp))
> >>> + break;
> >>> + }
> >>> +
> >>> + if (ntb_transport_edma_rx_complete(qp))
> >>> + queue_work(ctx->wq, &edma->rx_work);
> >>> +}
> >>> +
> >>> +static void ntb_transport_edma_tx_work(struct work_struct *work)
> >>> +{
> >>> + struct ntb_transport_qp_edma *edma = container_of(
> >>> + work, struct ntb_transport_qp_edma, tx_work);
> >>> + struct ntb_transport_qp *qp = edma->qp;
> >>> + struct ntb_edma_desc *in, __iomem *out;
> >>> + struct ntb_queue_entry *entry;
> >>> + unsigned int len;
> >>> + void *cb_data;
> >>> + u32 idx;
> >>> +
> >>> + while (ntb_edma_ring_used_entry(READ_ONCE(edma->tx_issue),
> >>> + edma->tx_cons) != 0) {
> >>> + /* Paired with smp_wmb() in ntb_transport_edma_tx_enqueue_inner() */
> >>> + smp_rmb();
> >>> +
> >>> + idx = ntb_edma_ring_idx(edma->tx_cons);
> >>> + in = NTB_DESC_TX_I(qp, idx);
> >>> + entry = (struct ntb_queue_entry *)(uintptr_t)in->data;
> >>> + if (!entry || !(entry->flags & DESC_DONE_FLAG))
> >>> + break;
> >>> +
> >>> + in->data = 0;
> >>> +
> >>> + cb_data = entry->cb_data;
> >>> + len = entry->len;
> >>> +
> >>> + out = NTB_DESC_TX_O(qp, idx);
> >>> +
> >>> + WRITE_ONCE(edma->tx_cons, edma->tx_cons + 1);
> >>> +
> >>> + /*
> >>> + * No need to add barrier in-between to enforce ordering here.
> >>> + * The other side proceeds only after both flags and tail are
> >>> + * updated.
> >>> + */
> >>> + iowrite32(entry->flags, &out->flags);
> >>> + iowrite32(edma->tx_cons, NTB_TAIL_TX_O(qp));
> >>> +
> >>> + ntb_transport_edma_notify_peer(edma);
> >>> +
> >>> + ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry,
> >>> + &qp->tx_free_q);
> >>> +
> >>> + if (qp->tx_handler)
> >>> + qp->tx_handler(qp, qp->cb_data, cb_data, len);
> >>> +
> >>> + /* stat updates */
> >>> + qp->tx_bytes += len;
> >>> + qp->tx_pkts++;
> >>> + }
> >>> +}
> >>> +
> >>> +static void ntb_transport_edma_tx_cb(void *data,
> >>> + const struct dmaengine_result *res)
> >>> +{
> >>> + struct ntb_queue_entry *entry = data;
> >>> + struct ntb_transport_qp *qp = entry->qp;
> >>> + struct ntb_transport_ctx *nt = qp->transport;
> >>> + struct device *dma_dev = get_dma_dev(qp->ndev);
> >>> + enum dmaengine_tx_result dma_err = res->result;
> >>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> >>> + struct ntb_transport_qp_edma *edma = qp->priv;
> >>> +
> >>> + switch (dma_err) {
> >>> + case DMA_TRANS_READ_FAILED:
> >>> + case DMA_TRANS_WRITE_FAILED:
> >>> + case DMA_TRANS_ABORTED:
> >>> + entry->errors++;
> >>> + entry->len = -EIO;
> >>> + break;
> >>> + case DMA_TRANS_NOERROR:
> >>> + default:
> >>> + break;
> >>> + }
> >>> + dma_unmap_sg(dma_dev, &entry->sgl, 1, DMA_TO_DEVICE);
> >>> + sg_dma_address(&entry->sgl) = 0;
> >>> +
> >>> + entry->flags |= DESC_DONE_FLAG;
> >>> +
> >>> + queue_work(ctx->wq, &edma->tx_work);
> >>> +}
> >>> +
> >>> +static int ntb_transport_edma_submit(struct device *d, struct dma_chan *chan,
> >>> + size_t len, void *rc_src, dma_addr_t dst,
> >>> + struct ntb_queue_entry *entry)
> >>> +{
> >>> + struct scatterlist *sgl = &entry->sgl;
> >>> + struct dma_async_tx_descriptor *txd;
> >>> + struct dma_slave_config cfg;
> >>> + dma_cookie_t cookie;
> >>> + int nents, rc;
> >>> +
> >>> + if (!d)
> >>> + return -ENODEV;
> >>> +
> >>> + if (!chan)
> >>> + return -ENXIO;
> >>> +
> >>> + if (WARN_ON(!rc_src || !dst))
> >>> + return -EINVAL;
> >>> +
> >>> + if (WARN_ON(sg_dma_address(sgl)))
> >>> + return -EINVAL;
> >>> +
> >>> + sg_init_one(sgl, rc_src, len);
> >>> + nents = dma_map_sg(d, sgl, 1, DMA_TO_DEVICE);
> >>> + if (nents <= 0)
> >>> + return -EIO;
> >>> +
> >>> + memset(&cfg, 0, sizeof(cfg));
> >>> + cfg.dst_addr = dst;
> >>> + cfg.src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
> >>> + cfg.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
> >>> + cfg.direction = DMA_MEM_TO_DEV;
> >>> +
> >>> + txd = dmaengine_prep_slave_sg_config(chan, sgl, 1, DMA_MEM_TO_DEV,
> >>> + DMA_CTRL_ACK | DMA_PREP_INTERRUPT, &cfg);
> >>> + if (!txd) {
> >>> + rc = -EIO;
> >>> + goto out_unmap;
> >>> + }
> >>> +
> >>> + txd->callback_result = ntb_transport_edma_tx_cb;
> >>> + txd->callback_param = entry;
> >>> +
> >>> + cookie = dmaengine_submit(txd);
> >>> + if (dma_submit_error(cookie)) {
> >>> + rc = -EIO;
> >>> + goto out_unmap;
> >>> + }
> >>> + dma_async_issue_pending(chan);
> >>> + return 0;
> >>> +out_unmap:
> >>> + dma_unmap_sg(d, sgl, 1, DMA_TO_DEVICE);
> >>> + return rc;
> >>> +}
> >>> +
> >>> +static int ntb_transport_edma_tx_enqueue_inner(struct ntb_transport_qp *qp,
> >>> + struct ntb_queue_entry *entry)
> >>> +{
> >>> + struct device *dma_dev = get_dma_dev(qp->ndev);
> >>> + struct ntb_transport_qp_edma *edma = qp->priv;
> >>> + struct ntb_transport_ctx *nt = qp->transport;
> >>> + struct ntb_edma_desc *in, __iomem *out;
> >>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> >>> + unsigned int len = entry->len;
> >>> + struct dma_chan *chan;
> >>> + u32 issue, idx, head;
> >>> + dma_addr_t dst;
> >>> + int rc;
> >>> +
> >>> + WARN_ON_ONCE(entry->flags & DESC_DONE_FLAG);
> >>> +
> >>> + scoped_guard(spinlock_irqsave, &edma->tx_lock) {
> >>> + head = READ_ONCE(*NTB_HEAD_TX_I(qp));
> >>> + issue = edma->tx_issue;
> >>> + if (ntb_edma_ring_used_entry(head, issue) == 0) {
> >>> + qp->tx_ring_full++;
> >>> + return -ENOSPC;
> >>> + }
> >>> +
> >>> + /*
> >>> + * ntb_transport_edma_tx_work() checks entry->flags
> >>> + * so it needs to be set before tx_issue++.
> >>> + */
> >>> + idx = ntb_edma_ring_idx(issue);
> >>> + in = NTB_DESC_TX_I(qp, idx);
> >>> + in->data = (uintptr_t)entry;
> >>> +
> >>> + /* Make in->data visible before tx_issue++ */
> >>> + smp_wmb();
> >>> +
> >>> + WRITE_ONCE(edma->tx_issue, edma->tx_issue + 1);
> >>> + }
> >>> +
> >>> + /* Publish the final transfer length to the other end */
> >>> + out = NTB_DESC_TX_O(qp, idx);
> >>> + iowrite32(len, &out->len);
> >>> + ioread32(&out->len);
> >>> +
> >>> + if (unlikely(!len)) {
> >>> + entry->flags |= DESC_DONE_FLAG;
> >>> + queue_work(ctx->wq, &edma->tx_work);
> >>> + return 0;
> >>> + }
> >>> +
> >>> + /* Paired with dma_wmb() in ntb_transport_edma_rx_enqueue_inner() */
> >>> + dma_rmb();
> >>> +
> >>> + /* kick remote eDMA read transfer */
> >>> + dst = (dma_addr_t)in->addr;
> >>> + chan = ntb_edma_pick_chan(&ctx->chans, qp->qp_num);
> >>> + rc = ntb_transport_edma_submit(dma_dev, chan, len,
> >>> + entry->buf, dst, entry);
> >>> + if (rc) {
> >>> + entry->errors++;
> >>> + entry->len = -EIO;
> >>> + entry->flags |= DESC_DONE_FLAG;
> >>> + queue_work(ctx->wq, &edma->tx_work);
> >>> + }
> >>> + return 0;
> >>> +}
> >>> +
> >>> +static int ntb_transport_edma_tx_enqueue(struct ntb_transport_qp *qp,
> >>> + struct ntb_queue_entry *entry,
> >>> + void *cb, void *data, unsigned int len,
> >>> + unsigned int flags)
> >>> +{
> >>> + struct device *dma_dev;
> >>> +
> >>> + if (entry->addr) {
> >>> + /* Deferred unmap */
> >>> + dma_dev = get_dma_dev(qp->ndev);
> >>> + dma_unmap_single(dma_dev, entry->addr, entry->len,
> >>> + DMA_TO_DEVICE);
> >>> + }
> >>> +
> >>> + entry->cb_data = cb;
> >>> + entry->buf = data;
> >>> + entry->len = len;
> >>> + entry->flags = flags;
> >>> + entry->errors = 0;
> >>> + entry->addr = 0;
> >>> +
> >>> + WARN_ON_ONCE(!ntb_qp_edma_enabled(qp));
> >>> +
> >>> + return ntb_transport_edma_tx_enqueue_inner(qp, entry);
> >>> +}
> >>> +
> >>> +static int ntb_transport_edma_rx_enqueue_inner(struct ntb_transport_qp *qp,
> >>> + struct ntb_queue_entry *entry)
> >>> +{
> >>> + struct device *dma_dev = get_dma_dev(qp->ndev);
> >>> + struct ntb_transport_qp_edma *edma = qp->priv;
> >>> + struct ntb_edma_desc *in, __iomem *out;
> >>> + unsigned int len = entry->len;
> >>> + void *data = entry->buf;
> >>> + dma_addr_t dst;
> >>> + u32 idx;
> >>> + int rc;
> >>> +
> >>> + dst = dma_map_single(dma_dev, data, len, DMA_FROM_DEVICE);
> >>> + rc = dma_mapping_error(dma_dev, dst);
> >>> + if (rc)
> >>> + return rc;
> >>> +
> >>> + guard(spinlock_bh)(&edma->rx_lock);
> >>> +
> >>> + if (ntb_edma_ring_full(READ_ONCE(edma->rx_prod),
> >>> + READ_ONCE(edma->rx_cons))) {
> >>> + rc = -ENOSPC;
> >>> + goto out_unmap;
> >>> + }
> >>> +
> >>> + idx = ntb_edma_ring_idx(edma->rx_prod);
> >>> + in = NTB_DESC_RX_I(qp, idx);
> >>> + out = NTB_DESC_RX_O(qp, idx);
> >>> +
> >>> + iowrite32(len, &out->len);
> >>> + iowrite64(dst, &out->addr);
> >>> +
> >>> + WARN_ON(in->flags & DESC_DONE_FLAG);
> >>> + in->data = (uintptr_t)entry;
> >>> + entry->addr = dst;
> >>> +
> >>> + /* Ensure len/addr are visible before the head update */
> >>> + dma_wmb();
> >>> +
> >>> + WRITE_ONCE(edma->rx_prod, edma->rx_prod + 1);
> >>> + iowrite32(edma->rx_prod, NTB_HEAD_RX_O(qp));
> >>> +
> >>> + return 0;
> >>> +out_unmap:
> >>> + dma_unmap_single(dma_dev, dst, len, DMA_FROM_DEVICE);
> >>> + return rc;
> >>> +}
> >>> +
> >>> +static int ntb_transport_edma_rx_enqueue(struct ntb_transport_qp *qp,
> >>> + struct ntb_queue_entry *entry)
> >>> +{
> >>> + int rc;
> >>> +
> >>> + rc = ntb_transport_edma_rx_enqueue_inner(qp, entry);
> >>> + if (rc) {
> >>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry,
> >>> + &qp->rx_free_q);
> >>> + return rc;
> >>> + }
> >>> +
> >>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_pend_q);
> >>> +
> >>> + if (qp->active)
> >>> + tasklet_schedule(&qp->rxc_db_work);
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>> +static void ntb_transport_edma_rx_poll(struct ntb_transport_qp *qp)
> >>> +{
> >>> + struct ntb_transport_ctx *nt = qp->transport;
> >>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> >>> + struct ntb_transport_qp_edma *edma = qp->priv;
> >>> +
> >>> + queue_work(ctx->wq, &edma->rx_work);
> >>> + queue_work(ctx->wq, &edma->tx_work);
> >>> +}
> >>> +
> >>> +static int ntb_transport_edma_qp_init(struct ntb_transport_ctx *nt,
> >>> + unsigned int qp_num)
> >>> +{
> >>> + struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
> >>> + struct ntb_transport_qp_edma *edma;
> >>> + struct ntb_dev *ndev = nt->ndev;
> >>> + int node;
> >>> +
> >>> + node = dev_to_node(&ndev->dev);
> >>> +
> >>> + qp->priv = kzalloc_node(sizeof(*edma), GFP_KERNEL, node);
> >>> + if (!qp->priv)
> >>> + return -ENOMEM;
> >>> +
> >>> + edma = (struct ntb_transport_qp_edma *)qp->priv;
> >>> + edma->qp = qp;
> >>> + edma->rx_prod = 0;
> >>> + edma->rx_cons = 0;
> >>> + edma->tx_cons = 0;
> >>> + edma->tx_issue = 0;
> >>> +
> >>> + spin_lock_init(&edma->rx_lock);
> >>> + spin_lock_init(&edma->tx_lock);
> >>> +
> >>> + INIT_WORK(&edma->db_work, ntb_transport_edma_db_work);
> >>> + INIT_WORK(&edma->rx_work, ntb_transport_edma_rx_work);
> >>> + INIT_WORK(&edma->tx_work, ntb_transport_edma_tx_work);
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>> +static void ntb_transport_edma_qp_free(struct ntb_transport_qp *qp)
> >>> +{
> >>> + struct ntb_transport_qp_edma *edma = qp->priv;
> >>> +
> >>> + cancel_work_sync(&edma->db_work);
> >>> + cancel_work_sync(&edma->rx_work);
> >>> + cancel_work_sync(&edma->tx_work);
> >>> +
> >>> + kfree(qp->priv);
> >>> +}
> >>> +
> >>> +static int ntb_transport_edma_pre_link_up(struct ntb_transport_ctx *nt)
> >>> +{
> >>> + struct ntb_dev *ndev = nt->ndev;
> >>> + struct pci_dev *pdev = ndev->pdev;
> >>> + int rc;
> >>> +
> >>> + rc = ntb_transport_edma_ep_init(nt);
> >>> + if (rc)
> >>> + dev_err(&pdev->dev, "Failed to init EP: %d\n", rc);
> >>> +
> >>> + return rc;
> >>> +}
> >>> +
> >>> +static int ntb_transport_edma_post_link_up(struct ntb_transport_ctx *nt)
> >>> +{
> >>> + struct ntb_dev *ndev = nt->ndev;
> >>> + struct pci_dev *pdev = ndev->pdev;
> >>> + int rc;
> >>> +
> >>> + rc = ntb_transport_edma_rc_init(nt);
> >>> + if (rc)
> >>> + dev_err(&pdev->dev, "Failed to init RC: %d\n", rc);
> >>> +
> >>> + return rc;
> >>> +}
> >>> +
> >>> +static int ntb_transport_edma_enable(struct ntb_transport_ctx *nt,
> >>> + unsigned int *mw_count)
> >>> +{
> >>> + struct ntb_dev *ndev = nt->ndev;
> >>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> >>> +
> >>> + if (!use_remote_edma)
> >>> + return 0;
> >>> +
> >>> + /*
> >>> + * We need at least one MW for the transport plus one MW reserved
> >>> + * for the remote eDMA window (see ntb_edma_setup_mws/peer).
> >>> + */
> >>> + if (*mw_count <= 1) {
> >>> + dev_err(&ndev->dev,
> >>> + "remote eDMA requires at least two MWS (have %u)\n",
> >>> + *mw_count);
> >>> + return -ENODEV;
> >>> + }
> >>> +
> >>> + ctx->wq = alloc_workqueue("ntb-edma-wq", WQ_UNBOUND | WQ_SYSFS, 0);
> >>> + if (!ctx->wq) {
> >>> + ntb_transport_edma_uninit(nt);
> >>> + return -ENOMEM;
> >>> + }
> >>> +
> >>> + /* Reserve the last peer MW exclusively for the eDMA window. */
> >>> + *mw_count -= 1;
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>> +static void ntb_transport_edma_disable(struct ntb_transport_ctx *nt)
> >>> +{
> >>> + ntb_transport_edma_uninit(nt);
> >>> +}
> >>> +
> >>> +static const struct ntb_transport_backend_ops edma_backend_ops = {
> >>> + .enable = ntb_transport_edma_enable,
> >>> + .disable = ntb_transport_edma_disable,
> >>> + .qp_init = ntb_transport_edma_qp_init,
> >>> + .qp_free = ntb_transport_edma_qp_free,
> >>> + .pre_link_up = ntb_transport_edma_pre_link_up,
> >>> + .post_link_up = ntb_transport_edma_post_link_up,
> >>> + .setup_qp_mw = ntb_transport_edma_setup_qp_mw,
> >>> + .tx_free_entry = ntb_transport_edma_tx_free_entry,
> >>> + .tx_enqueue = ntb_transport_edma_tx_enqueue,
> >>> + .rx_enqueue = ntb_transport_edma_rx_enqueue,
> >>> + .rx_poll = ntb_transport_edma_rx_poll,
> >>> + .debugfs_stats_show = ntb_transport_edma_debugfs_stats_show,
> >>> +};
> >>> +
> >>> +int ntb_transport_edma_init(struct ntb_transport_ctx *nt)
> >>> +{
> >>> + struct ntb_dev *ndev = nt->ndev;
> >>> + int node;
> >>> +
> >>> + node = dev_to_node(&ndev->dev);
> >>> + nt->priv = kzalloc_node(sizeof(struct ntb_transport_ctx_edma), GFP_KERNEL,
> >>> + node);
> >>> + if (!nt->priv)
> >>> + return -ENOMEM;
> >>> +
> >>> + nt->backend_ops = edma_backend_ops;
> >>> + /*
> >>> + * In remote eDMA mode, one DMA read channel is used by the host
> >>> + * side to interrupt the EP.
> >>> + */
> >>> + use_msi = false;
> >>> + return 0;
> >>> +}
> >>> diff --git a/drivers/ntb/ntb_transport_internal.h b/drivers/ntb/ntb_transport_internal.h
> >>> index 51ff08062d73..9fff65980d3d 100644
> >>> --- a/drivers/ntb/ntb_transport_internal.h
> >>> +++ b/drivers/ntb/ntb_transport_internal.h
> >>> @@ -8,6 +8,7 @@
> >>> extern unsigned long max_mw_size;
> >>> extern unsigned int transport_mtu;
> >>> extern bool use_msi;
> >>> +extern bool use_remote_edma;
> >>>
> >>> #define QP_TO_MW(nt, qp) ((qp) % nt->mw_count)
> >>>
> >>> @@ -29,6 +30,11 @@ struct ntb_queue_entry {
> >>> struct ntb_payload_header __iomem *tx_hdr;
> >>> struct ntb_payload_header *rx_hdr;
> >>> };
> >>> +
> >>> +#ifdef CONFIG_NTB_TRANSPORT_EDMA
> >>> + dma_addr_t addr;
> >>> + struct scatterlist sgl;
> >>> +#endif
> >>> };
> >>>
> >>> struct ntb_rx_info {
> >>> @@ -202,4 +208,13 @@ int ntb_transport_init_queue(struct ntb_transport_ctx *nt,
> >>> unsigned int qp_num);
> >>> struct device *get_dma_dev(struct ntb_dev *ndev);
> >>>
> >>> +#ifdef CONFIG_NTB_TRANSPORT_EDMA
> >>> +int ntb_transport_edma_init(struct ntb_transport_ctx *nt);
> >>> +#else
> >>> +static inline int ntb_transport_edma_init(struct ntb_transport_ctx *nt)
> >>> +{
> >>> + return -EOPNOTSUPP;
> >>> +}
> >>> +#endif /* CONFIG_NTB_TRANSPORT_EDMA */
> >>> +
> >>> #endif /* _NTB_TRANSPORT_INTERNAL_H_ */
> >>> --
> >>> 2.51.0
> >>>
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [RFC PATCH v3 26/35] NTB: ntb_transport: Introduce DW eDMA backed transport mode
2025-12-17 15:16 ` [RFC PATCH v3 26/35] NTB: ntb_transport: Introduce DW eDMA backed transport mode Koichiro Den
2025-12-19 15:00 ` Frank Li
@ 2026-01-06 18:51 ` Dave Jiang
2026-01-07 14:54 ` Koichiro Den
1 sibling, 1 reply; 61+ messages in thread
From: Dave Jiang @ 2026-01-06 18:51 UTC (permalink / raw)
To: Koichiro Den, Frank.Li, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring
On 12/17/25 8:16 AM, Koichiro Den wrote:
> Add a new ntb_transport backend that uses a DesignWare eDMA engine
> located on the endpoint, to be driven by both host and endpoint.
>
> The endpoint exposes a dedicated memory window which contains the eDMA
> register block, a small control structure (struct ntb_edma_info) and
> per-channel linked-list (LL) rings for read channels. The endpoint
> drives its local eDMA write channels for its transmit path, while the
> host drives the remote eDMA read channels for its own.
>
> A key benefit of this backend is that the memory window no longer needs
> to carry data-plane payload. This makes the design less sensitive to
> limited memory window space and allows scaling to multiple queue pairs.
> The memory window layout is specific to the eDMA-backed backend, so
> there is no automatic fallback to the memcpy-based default transport,
> which requires a different layout.
>
> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> ---
> drivers/ntb/Kconfig | 12 +
> drivers/ntb/Makefile | 2 +
> drivers/ntb/ntb_transport_core.c | 15 +-
> drivers/ntb/ntb_transport_edma.c | 987 +++++++++++++++++++++++++++
> drivers/ntb/ntb_transport_internal.h | 15 +
> 5 files changed, 1029 insertions(+), 2 deletions(-)
> create mode 100644 drivers/ntb/ntb_transport_edma.c
>
> diff --git a/drivers/ntb/Kconfig b/drivers/ntb/Kconfig
> index df16c755b4da..5ba6d0b7f5ba 100644
> --- a/drivers/ntb/Kconfig
> +++ b/drivers/ntb/Kconfig
> @@ -37,4 +37,16 @@ config NTB_TRANSPORT
>
> If unsure, say N.
>
> +config NTB_TRANSPORT_EDMA
> + bool "NTB Transport backed by remote eDMA"
> + depends on NTB_TRANSPORT
> + depends on PCI
> + select DMA_ENGINE
> + select NTB_EDMA
> + help
> + Enable a transport backend that uses a remote DesignWare eDMA engine
> + exposed through a dedicated NTB memory window. The host uses the
> + endpoint's eDMA engine to move data in both directions.
> + Say Y here if you intend to use the 'use_remote_edma' module parameter.
> +
> endif # NTB
> diff --git a/drivers/ntb/Makefile b/drivers/ntb/Makefile
> index 9b66e5fafbc0..b9086b32ecde 100644
> --- a/drivers/ntb/Makefile
> +++ b/drivers/ntb/Makefile
> @@ -6,3 +6,5 @@ ntb-y := core.o
> ntb-$(CONFIG_NTB_MSI) += msi.o
>
> ntb_transport-y := ntb_transport_core.o
> +ntb_transport-$(CONFIG_NTB_TRANSPORT_EDMA) += ntb_transport_edma.o
> +ntb_transport-$(CONFIG_NTB_TRANSPORT_EDMA) += hw/edma/ntb_hw_edma.o
> diff --git a/drivers/ntb/ntb_transport_core.c b/drivers/ntb/ntb_transport_core.c
> index 40c2548f5930..bd21232f26fe 100644
> --- a/drivers/ntb/ntb_transport_core.c
> +++ b/drivers/ntb/ntb_transport_core.c
> @@ -104,6 +104,12 @@ module_param(use_msi, bool, 0644);
> MODULE_PARM_DESC(use_msi, "Use MSI interrupts instead of doorbells");
> #endif
>
> +bool use_remote_edma;
> +#ifdef CONFIG_NTB_TRANSPORT_EDMA
> +module_param(use_remote_edma, bool, 0644);
> +MODULE_PARM_DESC(use_remote_edma, "Use remote eDMA mode (when enabled, use_msi is ignored)");
> +#endif
This seems clunky. Can ntb_transport_core determine this when things are called through ntb_transport_edma? Or maybe a set_transport_type operation could be introduced by the transport itself during allocation?
DJ
> +
> static struct dentry *nt_debugfs_dir;
>
> /* Only two-ports NTB devices are supported */
> @@ -156,7 +162,7 @@ enum {
> #define drv_client(__drv) \
> container_of((__drv), struct ntb_transport_client, driver)
>
> -#define NTB_QP_DEF_NUM_ENTRIES 100
> +#define NTB_QP_DEF_NUM_ENTRIES 128
> #define NTB_LINK_DOWN_TIMEOUT 10
>
> static void ntb_transport_rxc_db(unsigned long data);
> @@ -1189,7 +1195,11 @@ static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
>
> nt->ndev = ndev;
>
> - rc = ntb_transport_default_init(nt);
> + if (use_remote_edma)
> + rc = ntb_transport_edma_init(nt);
> + else
> + rc = ntb_transport_default_init(nt);
> +
> if (rc)
> return rc;
>
> @@ -1950,6 +1960,7 @@ ntb_transport_create_queue(void *data, struct device *client_dev,
>
> nt->qp_bitmap_free &= ~qp_bit;
>
> + qp->qp_bit = qp_bit;
> qp->cb_data = data;
> qp->rx_handler = handlers->rx_handler;
> qp->tx_handler = handlers->tx_handler;
> diff --git a/drivers/ntb/ntb_transport_edma.c b/drivers/ntb/ntb_transport_edma.c
> new file mode 100644
> index 000000000000..6ae5da0a1367
> --- /dev/null
> +++ b/drivers/ntb/ntb_transport_edma.c
> @@ -0,0 +1,987 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * NTB transport backend for remote DesignWare eDMA.
> + *
> + * This implements the backend_ops used when use_remote_edma=1 and
> + * relies on drivers/ntb/hw/edma/ for low-level eDMA/MW programming.
> + */
> +
> +#include <linux/bug.h>
> +#include <linux/compiler.h>
> +#include <linux/debugfs.h>
> +#include <linux/dmaengine.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/errno.h>
> +#include <linux/io-64-nonatomic-lo-hi.h>
> +#include <linux/ntb.h>
> +#include <linux/pci.h>
> +#include <linux/pci-epc.h>
> +#include <linux/seq_file.h>
> +#include <linux/slab.h>
> +
> +#include "hw/edma/ntb_hw_edma.h"
> +#include "ntb_transport_internal.h"
> +
> +#define NTB_EDMA_RING_ORDER 7
> +#define NTB_EDMA_RING_ENTRIES (1U << NTB_EDMA_RING_ORDER)
> +#define NTB_EDMA_RING_MASK (NTB_EDMA_RING_ENTRIES - 1)
> +
> +#define NTB_EDMA_MAX_POLL 32
> +
> +/*
> + * Remote eDMA mode implementation
> + */
> +struct ntb_transport_ctx_edma {
> + remote_edma_mode_t remote_edma_mode;
> + struct device *dma_dev;
> + struct workqueue_struct *wq;
> + struct ntb_edma_chans chans;
> +};
> +
> +struct ntb_transport_qp_edma {
> + struct ntb_transport_qp *qp;
> +
> + /*
> + * For ensuring peer notification in non-atomic context.
> + * ntb_peer_db_set might sleep or schedule.
> + */
> + struct work_struct db_work;
> +
> + u32 rx_prod;
> + u32 rx_cons;
> + u32 tx_cons;
> + u32 tx_issue;
> +
> + spinlock_t rx_lock;
> + spinlock_t tx_lock;
> +
> + struct work_struct rx_work;
> + struct work_struct tx_work;
> +};
> +
> +struct ntb_edma_desc {
> + u32 len;
> + u32 flags;
> + u64 addr; /* DMA address */
> + u64 data;
> +};
> +
> +struct ntb_edma_ring {
> + struct ntb_edma_desc desc[NTB_EDMA_RING_ENTRIES];
> + u32 head;
> + u32 tail;
> +};
> +
> +static inline bool ntb_qp_edma_is_rc(struct ntb_transport_qp *qp)
> +{
> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
> +
> + return ctx->remote_edma_mode == REMOTE_EDMA_RC;
> +}
> +
> +static inline bool ntb_qp_edma_is_ep(struct ntb_transport_qp *qp)
> +{
> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
> +
> + return ctx->remote_edma_mode == REMOTE_EDMA_EP;
> +}
> +
> +static inline bool ntb_qp_edma_enabled(struct ntb_transport_qp *qp)
> +{
> + return ntb_qp_edma_is_rc(qp) || ntb_qp_edma_is_ep(qp);
> +}
> +
> +static inline unsigned int ntb_edma_ring_sel(struct ntb_transport_qp *qp,
> + unsigned int n)
> +{
> + return n ^ !!ntb_qp_edma_is_ep(qp);
> +}
> +
> +static inline struct ntb_edma_ring *
> +ntb_edma_ring_local(struct ntb_transport_qp *qp, unsigned int n)
> +{
> + unsigned int r = ntb_edma_ring_sel(qp, n);
> +
> + return &((struct ntb_edma_ring *)qp->rx_buff)[r];
> +}
> +
> +static inline struct ntb_edma_ring __iomem *
> +ntb_edma_ring_remote(struct ntb_transport_qp *qp, unsigned int n)
> +{
> + unsigned int r = ntb_edma_ring_sel(qp, n);
> +
> + return &((struct ntb_edma_ring __iomem *)qp->tx_mw)[r];
> +}
> +
> +static inline struct ntb_edma_desc *
> +ntb_edma_desc_local(struct ntb_transport_qp *qp, unsigned int n, unsigned int i)
> +{
> + return &ntb_edma_ring_local(qp, n)->desc[i];
> +}
> +
> +static inline struct ntb_edma_desc __iomem *
> +ntb_edma_desc_remote(struct ntb_transport_qp *qp, unsigned int n,
> + unsigned int i)
> +{
> + return &ntb_edma_ring_remote(qp, n)->desc[i];
> +}
> +
> +static inline u32 *ntb_edma_head_local(struct ntb_transport_qp *qp,
> + unsigned int n)
> +{
> + return &ntb_edma_ring_local(qp, n)->head;
> +}
> +
> +static inline u32 __iomem *ntb_edma_head_remote(struct ntb_transport_qp *qp,
> + unsigned int n)
> +{
> + return &ntb_edma_ring_remote(qp, n)->head;
> +}
> +
> +static inline u32 *ntb_edma_tail_local(struct ntb_transport_qp *qp,
> + unsigned int n)
> +{
> + return &ntb_edma_ring_local(qp, n)->tail;
> +}
> +
> +static inline u32 __iomem *ntb_edma_tail_remote(struct ntb_transport_qp *qp,
> + unsigned int n)
> +{
> + return &ntb_edma_ring_remote(qp, n)->tail;
> +}
> +
> +/* The 'i' must be generated by ntb_edma_ring_idx() */
> +#define NTB_DESC_TX_O(qp, i) ntb_edma_desc_remote(qp, 0, i)
> +#define NTB_DESC_TX_I(qp, i) ntb_edma_desc_local(qp, 0, i)
> +#define NTB_DESC_RX_O(qp, i) ntb_edma_desc_remote(qp, 1, i)
> +#define NTB_DESC_RX_I(qp, i) ntb_edma_desc_local(qp, 1, i)
> +
> +#define NTB_HEAD_TX_I(qp) ntb_edma_head_local(qp, 0)
> +#define NTB_HEAD_RX_O(qp) ntb_edma_head_remote(qp, 1)
> +
> +#define NTB_TAIL_TX_O(qp) ntb_edma_tail_remote(qp, 0)
> +#define NTB_TAIL_RX_I(qp) ntb_edma_tail_local(qp, 1)
> +
> +/* ntb_edma_ring helpers */
> +static __always_inline u32 ntb_edma_ring_idx(u32 v)
> +{
> + return v & NTB_EDMA_RING_MASK;
> +}
> +
> +static __always_inline u32 ntb_edma_ring_used_entry(u32 head, u32 tail)
> +{
> + if (head >= tail) {
> + WARN_ON_ONCE((head - tail) > (NTB_EDMA_RING_ENTRIES - 1));
> + return head - tail;
> + }
> +
> + WARN_ON_ONCE((U32_MAX - tail + head + 1) > (NTB_EDMA_RING_ENTRIES - 1));
> + return U32_MAX - tail + head + 1;
> +}
> +
> +static __always_inline u32 ntb_edma_ring_free_entry(u32 head, u32 tail)
> +{
> + return NTB_EDMA_RING_ENTRIES - ntb_edma_ring_used_entry(head, tail) - 1;
> +}
> +
> +static __always_inline bool ntb_edma_ring_full(u32 head, u32 tail)
> +{
> + return ntb_edma_ring_free_entry(head, tail) == 0;
> +}
> +
> +static unsigned int ntb_transport_edma_tx_free_entry(struct ntb_transport_qp *qp)
> +{
> + struct ntb_transport_qp_edma *edma = qp->priv;
> + unsigned int head, tail;
> +
> + scoped_guard(spinlock_irqsave, &edma->tx_lock) {
> + /* In this scope, only 'head' might proceed */
> + tail = READ_ONCE(edma->tx_issue);
> + head = READ_ONCE(*NTB_HEAD_TX_I(qp));
> + }
> + /*
> + * 'used' amount indicates how much the other end has refilled,
> + * which are available for us to use for TX.
> + */
> + return ntb_edma_ring_used_entry(head, tail);
> +}
> +
> +static void ntb_transport_edma_debugfs_stats_show(struct seq_file *s,
> + struct ntb_transport_qp *qp)
> +{
> + seq_printf(s, "rx_bytes - \t%llu\n", qp->rx_bytes);
> + seq_printf(s, "rx_pkts - \t%llu\n", qp->rx_pkts);
> + seq_printf(s, "rx_err_no_buf - %llu\n", qp->rx_err_no_buf);
> + seq_printf(s, "rx_buff - \t0x%p\n", qp->rx_buff);
> + seq_printf(s, "rx_max_entry - \t%u\n", qp->rx_max_entry);
> + seq_printf(s, "rx_alloc_entry - \t%u\n\n", qp->rx_alloc_entry);
> +
> + seq_printf(s, "tx_bytes - \t%llu\n", qp->tx_bytes);
> + seq_printf(s, "tx_pkts - \t%llu\n", qp->tx_pkts);
> + seq_printf(s, "tx_ring_full - \t%llu\n", qp->tx_ring_full);
> + seq_printf(s, "tx_err_no_buf - %llu\n", qp->tx_err_no_buf);
> + seq_printf(s, "tx_mw - \t0x%p\n", qp->tx_mw);
> + seq_printf(s, "tx_max_entry - \t%u\n", qp->tx_max_entry);
> + seq_printf(s, "free tx - \t%u\n", ntb_transport_tx_free_entry(qp));
> + seq_putc(s, '\n');
> +
> + seq_puts(s, "Using Remote eDMA - Yes\n");
> + seq_printf(s, "QP Link - \t%s\n", qp->link_is_up ? "Up" : "Down");
> +}
> +
> +static void ntb_transport_edma_uninit(struct ntb_transport_ctx *nt)
> +{
> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> +
> + if (ctx->wq)
> + destroy_workqueue(ctx->wq);
> + ctx->wq = NULL;
> +
> + ntb_edma_teardown_chans(&ctx->chans);
> +
> + switch (ctx->remote_edma_mode) {
> + case REMOTE_EDMA_EP:
> + ntb_edma_teardown_mws(nt->ndev);
> + break;
> + case REMOTE_EDMA_RC:
> + ntb_edma_teardown_peer(nt->ndev);
> + break;
> + case REMOTE_EDMA_UNKNOWN:
> + default:
> + break;
> + }
> +
> + ctx->remote_edma_mode = REMOTE_EDMA_UNKNOWN;
> +}
> +
> +static void ntb_transport_edma_db_work(struct work_struct *work)
> +{
> + struct ntb_transport_qp_edma *edma =
> + container_of(work, struct ntb_transport_qp_edma, db_work);
> + struct ntb_transport_qp *qp = edma->qp;
> +
> + ntb_peer_db_set(qp->ndev, qp->qp_bit);
> +}
> +
> +static void ntb_transport_edma_notify_peer(struct ntb_transport_qp_edma *edma)
> +{
> + struct ntb_transport_qp *qp = edma->qp;
> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
> +
> + if (!ntb_edma_notify_peer(&ctx->chans, qp->qp_num))
> + return;
> +
> + /*
> + * Called from contexts that may be atomic. Since ntb_peer_db_set()
> + * may sleep, delegate the actual doorbell write to a workqueue.
> + */
> + queue_work(system_highpri_wq, &edma->db_work);
> +}
> +
> +static void ntb_transport_edma_isr(void *data, int qp_num)
> +{
> + struct ntb_transport_ctx *nt = data;
> + struct ntb_transport_qp_edma *edma;
> + struct ntb_transport_ctx_edma *ctx;
> + struct ntb_transport_qp *qp;
> +
> + if (qp_num < 0 || qp_num >= nt->qp_count)
> + return;
> +
> + qp = &nt->qp_vec[qp_num];
> + if (WARN_ON(!qp))
> + return;
> +
> + ctx = (struct ntb_transport_ctx_edma *)qp->transport->priv;
> + edma = qp->priv;
> +
> + queue_work(ctx->wq, &edma->rx_work);
> + queue_work(ctx->wq, &edma->tx_work);
> +}
> +
> +static int ntb_transport_edma_rc_init(struct ntb_transport_ctx *nt)
> +{
> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> + struct ntb_dev *ndev = nt->ndev;
> + struct pci_dev *pdev = ndev->pdev;
> + int peer_mw;
> + int rc;
> +
> + if (!use_remote_edma || ctx->remote_edma_mode != REMOTE_EDMA_UNKNOWN)
> + return 0;
> +
> + peer_mw = ntb_peer_mw_count(ndev);
> + if (peer_mw <= 0)
> + return -ENODEV;
> +
> + rc = ntb_edma_setup_peer(ndev, peer_mw - 1, nt->qp_count);
> + if (rc) {
> + dev_err(&pdev->dev, "Failed to enable remote eDMA: %d\n", rc);
> + return rc;
> + }
> +
> + rc = ntb_edma_setup_chans(get_dma_dev(ndev), &ctx->chans, true);
> + if (rc) {
> + dev_err(&pdev->dev, "Failed to setup eDMA channels: %d\n", rc);
> + goto err_teardown_peer;
> + }
> +
> + rc = ntb_edma_setup_intr_chan(get_dma_dev(ndev), &ctx->chans);
> + if (rc) {
> + dev_err(&pdev->dev, "Failed to setup eDMA notify channel: %d\n",
> + rc);
> + goto err_teardown_chans;
> + }
> +
> + ctx->remote_edma_mode = REMOTE_EDMA_RC;
> + return 0;
> +
> +err_teardown_chans:
> + ntb_edma_teardown_chans(&ctx->chans);
> +err_teardown_peer:
> + ntb_edma_teardown_peer(ndev);
> + return rc;
> +}
> +
> +
> +static int ntb_transport_edma_ep_init(struct ntb_transport_ctx *nt)
> +{
> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> + struct ntb_dev *ndev = nt->ndev;
> + struct pci_dev *pdev = ndev->pdev;
> + int peer_mw;
> + int rc;
> +
> + if (!use_remote_edma || ctx->remote_edma_mode == REMOTE_EDMA_EP)
> + return 0;
> +
> + /**
> + * This check assumes that the endpoint (pci-epf-vntb.c)
> + * ntb_dev_ops implements .get_private_data() while the host side
> + * (ntb_hw_epf.c) does not.
> + */
> + if (!ntb_get_private_data(ndev))
> + return 0;
> +
> + peer_mw = ntb_peer_mw_count(ndev);
> + if (peer_mw <= 0)
> + return -ENODEV;
> +
> + rc = ntb_edma_setup_mws(ndev, peer_mw - 1, nt->qp_count,
> + ntb_transport_edma_isr, nt);
> + if (rc) {
> + dev_err(&pdev->dev,
> + "Failed to set up memory window for eDMA: %d\n", rc);
> + return rc;
> + }
> +
> + rc = ntb_edma_setup_chans(get_dma_dev(ndev), &ctx->chans, false);
> + if (rc) {
> + dev_err(&pdev->dev, "Failed to setup eDMA channels: %d\n", rc);
> + ntb_edma_teardown_mws(ndev);
> + return rc;
> + }
> +
> + ctx->remote_edma_mode = REMOTE_EDMA_EP;
> + return 0;
> +}
> +
> +
> +static int ntb_transport_edma_setup_qp_mw(struct ntb_transport_ctx *nt,
> + unsigned int qp_num)
> +{
> + struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
> + struct ntb_dev *ndev = nt->ndev;
> + struct ntb_queue_entry *entry;
> + struct ntb_transport_mw *mw;
> + unsigned int mw_num, mw_count, qp_count;
> + unsigned int qp_offset, rx_info_offset;
> + unsigned int mw_size, mw_size_per_qp;
> + unsigned int num_qps_mw;
> + size_t edma_total;
> + unsigned int i;
> + int node;
> +
> + mw_count = nt->mw_count;
> + qp_count = nt->qp_count;
> +
> + mw_num = QP_TO_MW(nt, qp_num);
> + mw = &nt->mw_vec[mw_num];
> +
> + if (!mw->virt_addr)
> + return -ENOMEM;
> +
> + if (mw_num < qp_count % mw_count)
> + num_qps_mw = qp_count / mw_count + 1;
> + else
> + num_qps_mw = qp_count / mw_count;
> +
> + mw_size = min(nt->mw_vec[mw_num].phys_size, mw->xlat_size);
> + if (max_mw_size && mw_size > max_mw_size)
> + mw_size = max_mw_size;
> +
> + mw_size_per_qp = round_down((unsigned int)mw_size / num_qps_mw, SZ_64);
> + qp_offset = mw_size_per_qp * (qp_num / mw_count);
> + rx_info_offset = mw_size_per_qp - sizeof(struct ntb_rx_info);
> +
> + qp->tx_mw_size = mw_size_per_qp;
> + qp->tx_mw = nt->mw_vec[mw_num].vbase + qp_offset;
> + if (!qp->tx_mw)
> + return -EINVAL;
> + qp->tx_mw_phys = nt->mw_vec[mw_num].phys_addr + qp_offset;
> + if (!qp->tx_mw_phys)
> + return -EINVAL;
> + qp->rx_info = qp->tx_mw + rx_info_offset;
> + qp->rx_buff = mw->virt_addr + qp_offset;
> + qp->remote_rx_info = qp->rx_buff + rx_info_offset;
> +
> + /* Due to housekeeping, there must be at least 2 buffs */
> + qp->tx_max_frame = min(transport_mtu, mw_size_per_qp / 2);
> + qp->rx_max_frame = min(transport_mtu, mw_size_per_qp / 2);
> +
> + /* In eDMA mode, decouple from MW sizing and force ring-sized entries */
> + edma_total = 2 * sizeof(struct ntb_edma_ring);
> + if (rx_info_offset < edma_total) {
> + dev_err(&ndev->dev, "Ring space requires %zuB (>=%uB)\n",
> + edma_total, rx_info_offset);
> + return -EINVAL;
> + }
> + qp->tx_max_entry = NTB_EDMA_RING_ENTRIES;
> + qp->rx_max_entry = NTB_EDMA_RING_ENTRIES;
> +
> + /*
> + * Checking to see if we have more entries than the default.
> + * We should add additional entries if that is the case so we
> + * can be in sync with the transport frames.
> + */
> + node = dev_to_node(&ndev->dev);
> + for (i = qp->rx_alloc_entry; i < qp->rx_max_entry; i++) {
> + entry = kzalloc_node(sizeof(*entry), GFP_KERNEL, node);
> + if (!entry)
> + return -ENOMEM;
> +
> + entry->qp = qp;
> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry,
> + &qp->rx_free_q);
> + qp->rx_alloc_entry++;
> + }
> +
> + memset(qp->rx_buff, 0, edma_total);
> +
> + qp->rx_pkts = 0;
> + qp->tx_pkts = 0;
> +
> + return 0;
> +}
> +
> +static int ntb_transport_edma_rx_complete(struct ntb_transport_qp *qp)
> +{
> + struct device *dma_dev = get_dma_dev(qp->ndev);
> + struct ntb_transport_qp_edma *edma = qp->priv;
> + struct ntb_queue_entry *entry;
> + struct ntb_edma_desc *in;
> + unsigned int len;
> + bool link_down;
> + u32 idx;
> +
> + if (ntb_edma_ring_used_entry(READ_ONCE(*NTB_TAIL_RX_I(qp)),
> + edma->rx_cons) == 0)
> + return 0;
> +
> + idx = ntb_edma_ring_idx(edma->rx_cons);
> + in = NTB_DESC_RX_I(qp, idx);
> + if (!(in->flags & DESC_DONE_FLAG))
> + return 0;
> +
> + link_down = in->flags & LINK_DOWN_FLAG;
> + in->flags = 0;
> + len = in->len; /* might be smaller than entry->len */
> +
> + entry = (struct ntb_queue_entry *)(uintptr_t)in->data;
> + if (WARN_ON(!entry))
> + return 0;
> +
> + if (link_down) {
> + ntb_qp_link_down(qp);
> + edma->rx_cons++;
> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_free_q);
> + return 1;
> + }
> +
> + dma_unmap_single(dma_dev, entry->addr, entry->len, DMA_FROM_DEVICE);
> +
> + qp->rx_bytes += len;
> + qp->rx_pkts++;
> + edma->rx_cons++;
> +
> + if (qp->rx_handler && qp->client_ready)
> + qp->rx_handler(qp, qp->cb_data, entry->cb_data, len);
> +
> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_free_q);
> + return 1;
> +}
> +
> +static void ntb_transport_edma_rx_work(struct work_struct *work)
> +{
> + struct ntb_transport_qp_edma *edma = container_of(
> + work, struct ntb_transport_qp_edma, rx_work);
> + struct ntb_transport_qp *qp = edma->qp;
> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
> + unsigned int i;
> +
> + for (i = 0; i < NTB_EDMA_MAX_POLL; i++) {
> + if (!ntb_transport_edma_rx_complete(qp))
> + break;
> + }
> +
> + if (ntb_transport_edma_rx_complete(qp))
> + queue_work(ctx->wq, &edma->rx_work);
> +}
> +
> +static void ntb_transport_edma_tx_work(struct work_struct *work)
> +{
> + struct ntb_transport_qp_edma *edma = container_of(
> + work, struct ntb_transport_qp_edma, tx_work);
> + struct ntb_transport_qp *qp = edma->qp;
> + struct ntb_edma_desc *in, __iomem *out;
> + struct ntb_queue_entry *entry;
> + unsigned int len;
> + void *cb_data;
> + u32 idx;
> +
> + while (ntb_edma_ring_used_entry(READ_ONCE(edma->tx_issue),
> + edma->tx_cons) != 0) {
> + /* Paired with smp_wmb() in ntb_transport_edma_tx_enqueue_inner() */
> + smp_rmb();
> +
> + idx = ntb_edma_ring_idx(edma->tx_cons);
> + in = NTB_DESC_TX_I(qp, idx);
> + entry = (struct ntb_queue_entry *)(uintptr_t)in->data;
> + if (!entry || !(entry->flags & DESC_DONE_FLAG))
> + break;
> +
> + in->data = 0;
> +
> + cb_data = entry->cb_data;
> + len = entry->len;
> +
> + out = NTB_DESC_TX_O(qp, idx);
> +
> + WRITE_ONCE(edma->tx_cons, edma->tx_cons + 1);
> +
> + /*
> + * No need to add barrier in-between to enforce ordering here.
> + * The other side proceeds only after both flags and tail are
> + * updated.
> + */
> + iowrite32(entry->flags, &out->flags);
> + iowrite32(edma->tx_cons, NTB_TAIL_TX_O(qp));
> +
> + ntb_transport_edma_notify_peer(edma);
> +
> + ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry,
> + &qp->tx_free_q);
> +
> + if (qp->tx_handler)
> + qp->tx_handler(qp, qp->cb_data, cb_data, len);
> +
> + /* stat updates */
> + qp->tx_bytes += len;
> + qp->tx_pkts++;
> + }
> +}
> +
> +static void ntb_transport_edma_tx_cb(void *data,
> + const struct dmaengine_result *res)
> +{
> + struct ntb_queue_entry *entry = data;
> + struct ntb_transport_qp *qp = entry->qp;
> + struct ntb_transport_ctx *nt = qp->transport;
> + struct device *dma_dev = get_dma_dev(qp->ndev);
> + enum dmaengine_tx_result dma_err = res->result;
> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> + struct ntb_transport_qp_edma *edma = qp->priv;
> +
> + switch (dma_err) {
> + case DMA_TRANS_READ_FAILED:
> + case DMA_TRANS_WRITE_FAILED:
> + case DMA_TRANS_ABORTED:
> + entry->errors++;
> + entry->len = -EIO;
> + break;
> + case DMA_TRANS_NOERROR:
> + default:
> + break;
> + }
> + dma_unmap_sg(dma_dev, &entry->sgl, 1, DMA_TO_DEVICE);
> + sg_dma_address(&entry->sgl) = 0;
> +
> + entry->flags |= DESC_DONE_FLAG;
> +
> + queue_work(ctx->wq, &edma->tx_work);
> +}
> +
> +static int ntb_transport_edma_submit(struct device *d, struct dma_chan *chan,
> + size_t len, void *rc_src, dma_addr_t dst,
> + struct ntb_queue_entry *entry)
> +{
> + struct scatterlist *sgl = &entry->sgl;
> + struct dma_async_tx_descriptor *txd;
> + struct dma_slave_config cfg;
> + dma_cookie_t cookie;
> + int nents, rc;
> +
> + if (!d)
> + return -ENODEV;
> +
> + if (!chan)
> + return -ENXIO;
> +
> + if (WARN_ON(!rc_src || !dst))
> + return -EINVAL;
> +
> + if (WARN_ON(sg_dma_address(sgl)))
> + return -EINVAL;
> +
> + sg_init_one(sgl, rc_src, len);
> + nents = dma_map_sg(d, sgl, 1, DMA_TO_DEVICE);
> + if (nents <= 0)
> + return -EIO;
> +
> + memset(&cfg, 0, sizeof(cfg));
> + cfg.dst_addr = dst;
> + cfg.src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
> + cfg.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
> + cfg.direction = DMA_MEM_TO_DEV;
> +
> + txd = dmaengine_prep_slave_sg_config(chan, sgl, 1, DMA_MEM_TO_DEV,
> + DMA_CTRL_ACK | DMA_PREP_INTERRUPT, &cfg);
> + if (!txd) {
> + rc = -EIO;
> + goto out_unmap;
> + }
> +
> + txd->callback_result = ntb_transport_edma_tx_cb;
> + txd->callback_param = entry;
> +
> + cookie = dmaengine_submit(txd);
> + if (dma_submit_error(cookie)) {
> + rc = -EIO;
> + goto out_unmap;
> + }
> + dma_async_issue_pending(chan);
> + return 0;
> +out_unmap:
> + dma_unmap_sg(d, sgl, 1, DMA_TO_DEVICE);
> + return rc;
> +}
> +
> +static int ntb_transport_edma_tx_enqueue_inner(struct ntb_transport_qp *qp,
> + struct ntb_queue_entry *entry)
> +{
> + struct device *dma_dev = get_dma_dev(qp->ndev);
> + struct ntb_transport_qp_edma *edma = qp->priv;
> + struct ntb_transport_ctx *nt = qp->transport;
> + struct ntb_edma_desc *in, __iomem *out;
> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> + unsigned int len = entry->len;
> + struct dma_chan *chan;
> + u32 issue, idx, head;
> + dma_addr_t dst;
> + int rc;
> +
> + WARN_ON_ONCE(entry->flags & DESC_DONE_FLAG);
> +
> + scoped_guard(spinlock_irqsave, &edma->tx_lock) {
> + head = READ_ONCE(*NTB_HEAD_TX_I(qp));
> + issue = edma->tx_issue;
> + if (ntb_edma_ring_used_entry(head, issue) == 0) {
> + qp->tx_ring_full++;
> + return -ENOSPC;
> + }
> +
> + /*
> + * ntb_transport_edma_tx_work() checks entry->flags
> + * so it needs to be set before tx_issue++.
> + */
> + idx = ntb_edma_ring_idx(issue);
> + in = NTB_DESC_TX_I(qp, idx);
> + in->data = (uintptr_t)entry;
> +
> + /* Make in->data visible before tx_issue++ */
> + smp_wmb();
> +
> + WRITE_ONCE(edma->tx_issue, edma->tx_issue + 1);
> + }
> +
> + /* Publish the final transfer length to the other end */
> + out = NTB_DESC_TX_O(qp, idx);
> + iowrite32(len, &out->len);
> + ioread32(&out->len);
> +
> + if (unlikely(!len)) {
> + entry->flags |= DESC_DONE_FLAG;
> + queue_work(ctx->wq, &edma->tx_work);
> + return 0;
> + }
> +
> + /* Paired with dma_wmb() in ntb_transport_edma_rx_enqueue_inner() */
> + dma_rmb();
> +
> + /* kick remote eDMA read transfer */
> + dst = (dma_addr_t)in->addr;
> + chan = ntb_edma_pick_chan(&ctx->chans, qp->qp_num);
> + rc = ntb_transport_edma_submit(dma_dev, chan, len,
> + entry->buf, dst, entry);
> + if (rc) {
> + entry->errors++;
> + entry->len = -EIO;
> + entry->flags |= DESC_DONE_FLAG;
> + queue_work(ctx->wq, &edma->tx_work);
> + }
> + return 0;
> +}
> +
> +static int ntb_transport_edma_tx_enqueue(struct ntb_transport_qp *qp,
> + struct ntb_queue_entry *entry,
> + void *cb, void *data, unsigned int len,
> + unsigned int flags)
> +{
> + struct device *dma_dev;
> +
> + if (entry->addr) {
> + /* Deferred unmap */
> + dma_dev = get_dma_dev(qp->ndev);
> + dma_unmap_single(dma_dev, entry->addr, entry->len,
> + DMA_TO_DEVICE);
> + }
> +
> + entry->cb_data = cb;
> + entry->buf = data;
> + entry->len = len;
> + entry->flags = flags;
> + entry->errors = 0;
> + entry->addr = 0;
> +
> + WARN_ON_ONCE(!ntb_qp_edma_enabled(qp));
> +
> + return ntb_transport_edma_tx_enqueue_inner(qp, entry);
> +}
> +
> +static int ntb_transport_edma_rx_enqueue_inner(struct ntb_transport_qp *qp,
> + struct ntb_queue_entry *entry)
> +{
> + struct device *dma_dev = get_dma_dev(qp->ndev);
> + struct ntb_transport_qp_edma *edma = qp->priv;
> + struct ntb_edma_desc *in, __iomem *out;
> + unsigned int len = entry->len;
> + void *data = entry->buf;
> + dma_addr_t dst;
> + u32 idx;
> + int rc;
> +
> + dst = dma_map_single(dma_dev, data, len, DMA_FROM_DEVICE);
> + rc = dma_mapping_error(dma_dev, dst);
> + if (rc)
> + return rc;
> +
> + guard(spinlock_bh)(&edma->rx_lock);
> +
> + if (ntb_edma_ring_full(READ_ONCE(edma->rx_prod),
> + READ_ONCE(edma->rx_cons))) {
> + rc = -ENOSPC;
> + goto out_unmap;
> + }
> +
> + idx = ntb_edma_ring_idx(edma->rx_prod);
> + in = NTB_DESC_RX_I(qp, idx);
> + out = NTB_DESC_RX_O(qp, idx);
> +
> + iowrite32(len, &out->len);
> + iowrite64(dst, &out->addr);
> +
> + WARN_ON(in->flags & DESC_DONE_FLAG);
> + in->data = (uintptr_t)entry;
> + entry->addr = dst;
> +
> + /* Ensure len/addr are visible before the head update */
> + dma_wmb();
> +
> + WRITE_ONCE(edma->rx_prod, edma->rx_prod + 1);
> + iowrite32(edma->rx_prod, NTB_HEAD_RX_O(qp));
> +
> + return 0;
> +out_unmap:
> + dma_unmap_single(dma_dev, dst, len, DMA_FROM_DEVICE);
> + return rc;
> +}
> +
> +static int ntb_transport_edma_rx_enqueue(struct ntb_transport_qp *qp,
> + struct ntb_queue_entry *entry)
> +{
> + int rc;
> +
> + rc = ntb_transport_edma_rx_enqueue_inner(qp, entry);
> + if (rc) {
> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry,
> + &qp->rx_free_q);
> + return rc;
> + }
> +
> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_pend_q);
> +
> + if (qp->active)
> + tasklet_schedule(&qp->rxc_db_work);
> +
> + return 0;
> +}
> +
> +static void ntb_transport_edma_rx_poll(struct ntb_transport_qp *qp)
> +{
> + struct ntb_transport_ctx *nt = qp->transport;
> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> + struct ntb_transport_qp_edma *edma = qp->priv;
> +
> + queue_work(ctx->wq, &edma->rx_work);
> + queue_work(ctx->wq, &edma->tx_work);
> +}
> +
> +static int ntb_transport_edma_qp_init(struct ntb_transport_ctx *nt,
> + unsigned int qp_num)
> +{
> + struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
> + struct ntb_transport_qp_edma *edma;
> + struct ntb_dev *ndev = nt->ndev;
> + int node;
> +
> + node = dev_to_node(&ndev->dev);
> +
> + qp->priv = kzalloc_node(sizeof(*edma), GFP_KERNEL, node);
> + if (!qp->priv)
> + return -ENOMEM;
> +
> + edma = (struct ntb_transport_qp_edma *)qp->priv;
> + edma->qp = qp;
> + edma->rx_prod = 0;
> + edma->rx_cons = 0;
> + edma->tx_cons = 0;
> + edma->tx_issue = 0;
> +
> + spin_lock_init(&edma->rx_lock);
> + spin_lock_init(&edma->tx_lock);
> +
> + INIT_WORK(&edma->db_work, ntb_transport_edma_db_work);
> + INIT_WORK(&edma->rx_work, ntb_transport_edma_rx_work);
> + INIT_WORK(&edma->tx_work, ntb_transport_edma_tx_work);
> +
> + return 0;
> +}
> +
> +static void ntb_transport_edma_qp_free(struct ntb_transport_qp *qp)
> +{
> + struct ntb_transport_qp_edma *edma = qp->priv;
> +
> + cancel_work_sync(&edma->db_work);
> + cancel_work_sync(&edma->rx_work);
> + cancel_work_sync(&edma->tx_work);
> +
> + kfree(qp->priv);
> +}
> +
> +static int ntb_transport_edma_pre_link_up(struct ntb_transport_ctx *nt)
> +{
> + struct ntb_dev *ndev = nt->ndev;
> + struct pci_dev *pdev = ndev->pdev;
> + int rc;
> +
> + rc = ntb_transport_edma_ep_init(nt);
> + if (rc)
> + dev_err(&pdev->dev, "Failed to init EP: %d\n", rc);
> +
> + return rc;
> +}
> +
> +static int ntb_transport_edma_post_link_up(struct ntb_transport_ctx *nt)
> +{
> + struct ntb_dev *ndev = nt->ndev;
> + struct pci_dev *pdev = ndev->pdev;
> + int rc;
> +
> + rc = ntb_transport_edma_rc_init(nt);
> + if (rc)
> + dev_err(&pdev->dev, "Failed to init RC: %d\n", rc);
> +
> + return rc;
> +}
> +
> +static int ntb_transport_edma_enable(struct ntb_transport_ctx *nt,
> + unsigned int *mw_count)
> +{
> + struct ntb_dev *ndev = nt->ndev;
> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> +
> + if (!use_remote_edma)
> + return 0;
> +
> + /*
> + * We need at least one MW for the transport plus one MW reserved
> + * for the remote eDMA window (see ntb_edma_setup_mws/peer).
> + */
> + if (*mw_count <= 1) {
> + dev_err(&ndev->dev,
> + "remote eDMA requires at least two MWs (have %u)\n",
> + *mw_count);
> + return -ENODEV;
> + }
> +
> + ctx->wq = alloc_workqueue("ntb-edma-wq", WQ_UNBOUND | WQ_SYSFS, 0);
> + if (!ctx->wq) {
> + ntb_transport_edma_uninit(nt);
> + return -ENOMEM;
> + }
> +
> + /* Reserve the last peer MW exclusively for the eDMA window. */
> + *mw_count -= 1;
> +
> + return 0;
> +}
> +
> +static void ntb_transport_edma_disable(struct ntb_transport_ctx *nt)
> +{
> + ntb_transport_edma_uninit(nt);
> +}
> +
> +static const struct ntb_transport_backend_ops edma_backend_ops = {
> + .enable = ntb_transport_edma_enable,
> + .disable = ntb_transport_edma_disable,
> + .qp_init = ntb_transport_edma_qp_init,
> + .qp_free = ntb_transport_edma_qp_free,
> + .pre_link_up = ntb_transport_edma_pre_link_up,
> + .post_link_up = ntb_transport_edma_post_link_up,
> + .setup_qp_mw = ntb_transport_edma_setup_qp_mw,
> + .tx_free_entry = ntb_transport_edma_tx_free_entry,
> + .tx_enqueue = ntb_transport_edma_tx_enqueue,
> + .rx_enqueue = ntb_transport_edma_rx_enqueue,
> + .rx_poll = ntb_transport_edma_rx_poll,
> + .debugfs_stats_show = ntb_transport_edma_debugfs_stats_show,
> +};
> +
> +int ntb_transport_edma_init(struct ntb_transport_ctx *nt)
> +{
> + struct ntb_dev *ndev = nt->ndev;
> + int node;
> +
> + node = dev_to_node(&ndev->dev);
> + nt->priv = kzalloc_node(sizeof(struct ntb_transport_ctx_edma), GFP_KERNEL,
> + node);
> + if (!nt->priv)
> + return -ENOMEM;
> +
> + nt->backend_ops = edma_backend_ops;
> + /*
> + * In remote eDMA mode, one DMA read channel is reserved for the
> + * host side to interrupt the EP.
> + */
> + use_msi = false;
> + return 0;
> +}
> diff --git a/drivers/ntb/ntb_transport_internal.h b/drivers/ntb/ntb_transport_internal.h
> index 51ff08062d73..9fff65980d3d 100644
> --- a/drivers/ntb/ntb_transport_internal.h
> +++ b/drivers/ntb/ntb_transport_internal.h
> @@ -8,6 +8,7 @@
> extern unsigned long max_mw_size;
> extern unsigned int transport_mtu;
> extern bool use_msi;
> +extern bool use_remote_edma;
>
> #define QP_TO_MW(nt, qp) ((qp) % nt->mw_count)
>
> @@ -29,6 +30,11 @@ struct ntb_queue_entry {
> struct ntb_payload_header __iomem *tx_hdr;
> struct ntb_payload_header *rx_hdr;
> };
> +
> +#ifdef CONFIG_NTB_TRANSPORT_EDMA
> + dma_addr_t addr;
> + struct scatterlist sgl;
> +#endif
> };
>
> struct ntb_rx_info {
> @@ -202,4 +208,13 @@ int ntb_transport_init_queue(struct ntb_transport_ctx *nt,
> unsigned int qp_num);
> struct device *get_dma_dev(struct ntb_dev *ndev);
>
> +#ifdef CONFIG_NTB_TRANSPORT_EDMA
> +int ntb_transport_edma_init(struct ntb_transport_ctx *nt);
> +#else
> +static inline int ntb_transport_edma_init(struct ntb_transport_ctx *nt)
> +{
> + return -EOPNOTSUPP;
> +}
> +#endif /* CONFIG_NTB_TRANSPORT_EDMA */
> +
> #endif /* _NTB_TRANSPORT_INTERNAL_H_ */
^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v3 26/35] NTB: ntb_transport: Introduce DW eDMA backed transport mode
2026-01-06 18:51 ` Dave Jiang
@ 2026-01-07 14:54 ` Koichiro Den
2026-01-07 19:02 ` Dave Jiang
0 siblings, 1 reply; 61+ messages in thread
From: Koichiro Den @ 2026-01-07 14:54 UTC (permalink / raw)
To: Dave Jiang
Cc: Frank.Li, ntb, linux-pci, dmaengine, linux-renesas-soc, netdev,
linux-kernel, mani, kwilczynski, kishon, bhelgaas, corbet,
geert+renesas, magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro,
will, robin.murphy, jdmason, allenbh, andrew+netdev, davem,
edumazet, kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k,
kurt.schwemmer, logang, jingoohan1, lpieralisi, utkarsh02t,
jbrunet, dlemoal, arnd, elfring
On Tue, Jan 06, 2026 at 11:51:03AM -0700, Dave Jiang wrote:
>
>
> On 12/17/25 8:16 AM, Koichiro Den wrote:
> > Add a new ntb_transport backend that uses a DesignWare eDMA engine
> > located on the endpoint, to be driven by both host and endpoint.
> >
> > The endpoint exposes a dedicated memory window which contains the eDMA
> > register block, a small control structure (struct ntb_edma_info) and
> > per-channel linked-list (LL) rings for read channels. The endpoint drives
> > its local eDMA write channels for its transmissions, while the host side
> > uses the remote eDMA read channels for its own.
> >
> > A key benefit of this backend is that the memory window no longer needs
> > to carry data-plane payload. This makes the design less sensitive to
> > limited memory window space and allows scaling to multiple queue pairs.
> > The memory window layout is specific to the eDMA-backed backend, so
> > there is no automatic fallback to the memcpy-based default transport,
> > which requires a different layout.
> >
> > Signed-off-by: Koichiro Den <den@valinux.co.jp>
> > ---
> > drivers/ntb/Kconfig | 12 +
> > drivers/ntb/Makefile | 2 +
> > drivers/ntb/ntb_transport_core.c | 15 +-
> > drivers/ntb/ntb_transport_edma.c | 987 +++++++++++++++++++++++++++
> > drivers/ntb/ntb_transport_internal.h | 15 +
> > 5 files changed, 1029 insertions(+), 2 deletions(-)
> > create mode 100644 drivers/ntb/ntb_transport_edma.c
> >
> > diff --git a/drivers/ntb/Kconfig b/drivers/ntb/Kconfig
> > index df16c755b4da..5ba6d0b7f5ba 100644
> > --- a/drivers/ntb/Kconfig
> > +++ b/drivers/ntb/Kconfig
> > @@ -37,4 +37,16 @@ config NTB_TRANSPORT
> >
> > If unsure, say N.
> >
> > +config NTB_TRANSPORT_EDMA
> > + bool "NTB Transport backed by remote eDMA"
> > + depends on NTB_TRANSPORT
> > + depends on PCI
> > + select DMA_ENGINE
> > + select NTB_EDMA
> > + help
> > + Enable a transport backend that uses a remote DesignWare eDMA engine
> > + exposed through a dedicated NTB memory window. The host uses the
> > + endpoint's eDMA engine to move data in both directions.
> > + Say Y here if you intend to use the 'use_remote_edma' module parameter.
> > +
> > endif # NTB
> > diff --git a/drivers/ntb/Makefile b/drivers/ntb/Makefile
> > index 9b66e5fafbc0..b9086b32ecde 100644
> > --- a/drivers/ntb/Makefile
> > +++ b/drivers/ntb/Makefile
> > @@ -6,3 +6,5 @@ ntb-y := core.o
> > ntb-$(CONFIG_NTB_MSI) += msi.o
> >
> > ntb_transport-y := ntb_transport_core.o
> > +ntb_transport-$(CONFIG_NTB_TRANSPORT_EDMA) += ntb_transport_edma.o
> > +ntb_transport-$(CONFIG_NTB_TRANSPORT_EDMA) += hw/edma/ntb_hw_edma.o
> > diff --git a/drivers/ntb/ntb_transport_core.c b/drivers/ntb/ntb_transport_core.c
> > index 40c2548f5930..bd21232f26fe 100644
> > --- a/drivers/ntb/ntb_transport_core.c
> > +++ b/drivers/ntb/ntb_transport_core.c
> > @@ -104,6 +104,12 @@ module_param(use_msi, bool, 0644);
> > MODULE_PARM_DESC(use_msi, "Use MSI interrupts instead of doorbells");
> > #endif
> >
> > +bool use_remote_edma;
> > +#ifdef CONFIG_NTB_TRANSPORT_EDMA
> > +module_param(use_remote_edma, bool, 0644);
> > +MODULE_PARM_DESC(use_remote_edma, "Use remote eDMA mode (when enabled, use_msi is ignored)");
> > +#endif
>
> This seems clunky. Can the ntb_transport_core determine this when the things are called through ntb_transport_edma? Or maybe a set_transport_type can be introduced by the transport itself during allocation?
Agreed. I plan to drop 'use_remote_edma' and instead,
- add a module parameter: transport_type={"default","edma"} (defaulting to "default"),
- introduce ntb_transport_backend_register() for transports to self-register via
struct ntb_transport_backend { .name, .ops }, and
- have the core select the backend whose .name matches transport_type.
I think this should keep any non-default transport-specific logic out of
ntb_transport_core, or at least keep it to a minimum, while still allowing
non-default transports (ntb_transport_edma is the only one for now) to
plug in cleanly.
If you see a cleaner approach, I would appreciate it if you could elaborate
a bit more on your idea.
Thanks,
Koichiro
>
> DJ
>
> > +
> > static struct dentry *nt_debugfs_dir;
> >
> > /* Only two-ports NTB devices are supported */
> > @@ -156,7 +162,7 @@ enum {
> > #define drv_client(__drv) \
> > container_of((__drv), struct ntb_transport_client, driver)
> >
> > -#define NTB_QP_DEF_NUM_ENTRIES 100
> > +#define NTB_QP_DEF_NUM_ENTRIES 128
> > #define NTB_LINK_DOWN_TIMEOUT 10
> >
> > static void ntb_transport_rxc_db(unsigned long data);
> > @@ -1189,7 +1195,11 @@ static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
> >
> > nt->ndev = ndev;
> >
> > - rc = ntb_transport_default_init(nt);
> > + if (use_remote_edma)
> > + rc = ntb_transport_edma_init(nt);
> > + else
> > + rc = ntb_transport_default_init(nt);
> > +
> > if (rc)
> > return rc;
> >
> > @@ -1950,6 +1960,7 @@ ntb_transport_create_queue(void *data, struct device *client_dev,
> >
> > nt->qp_bitmap_free &= ~qp_bit;
> >
> > + qp->qp_bit = qp_bit;
> > qp->cb_data = data;
> > qp->rx_handler = handlers->rx_handler;
> > qp->tx_handler = handlers->tx_handler;
> > diff --git a/drivers/ntb/ntb_transport_edma.c b/drivers/ntb/ntb_transport_edma.c
> > new file mode 100644
> > index 000000000000..6ae5da0a1367
> > --- /dev/null
> > +++ b/drivers/ntb/ntb_transport_edma.c
> > @@ -0,0 +1,987 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * NTB transport backend for remote DesignWare eDMA.
> > + *
> > + * This implements the backend_ops used when use_remote_edma=1 and
> > + * relies on drivers/ntb/hw/edma/ for low-level eDMA/MW programming.
> > + */
> > +
> > +#include <linux/bug.h>
> > +#include <linux/compiler.h>
> > +#include <linux/debugfs.h>
> > +#include <linux/dmaengine.h>
> > +#include <linux/dma-mapping.h>
> > +#include <linux/errno.h>
> > +#include <linux/io-64-nonatomic-lo-hi.h>
> > +#include <linux/ntb.h>
> > +#include <linux/pci.h>
> > +#include <linux/pci-epc.h>
> > +#include <linux/seq_file.h>
> > +#include <linux/slab.h>
> > +
> > +#include "hw/edma/ntb_hw_edma.h"
> > +#include "ntb_transport_internal.h"
> > +
> > +#define NTB_EDMA_RING_ORDER 7
> > +#define NTB_EDMA_RING_ENTRIES (1U << NTB_EDMA_RING_ORDER)
> > +#define NTB_EDMA_RING_MASK (NTB_EDMA_RING_ENTRIES - 1)
> > +
> > +#define NTB_EDMA_MAX_POLL 32
> > +
> > +/*
> > + * Remote eDMA mode implementation
> > + */
> > +struct ntb_transport_ctx_edma {
> > + remote_edma_mode_t remote_edma_mode;
> > + struct device *dma_dev;
> > + struct workqueue_struct *wq;
> > + struct ntb_edma_chans chans;
> > +};
> > +
> > +struct ntb_transport_qp_edma {
> > + struct ntb_transport_qp *qp;
> > +
> > + /*
> > + * For ensuring peer notification in non-atomic context.
> > + * ntb_peer_db_set might sleep or schedule.
> > + */
> > + struct work_struct db_work;
> > +
> > + u32 rx_prod;
> > + u32 rx_cons;
> > + u32 tx_cons;
> > + u32 tx_issue;
> > +
> > + spinlock_t rx_lock;
> > + spinlock_t tx_lock;
> > +
> > + struct work_struct rx_work;
> > + struct work_struct tx_work;
> > +};
> > +
> > +struct ntb_edma_desc {
> > + u32 len;
> > + u32 flags;
> > + u64 addr; /* DMA address */
> > + u64 data;
> > +};
> > +
> > +struct ntb_edma_ring {
> > + struct ntb_edma_desc desc[NTB_EDMA_RING_ENTRIES];
> > + u32 head;
> > + u32 tail;
> > +};
> > +
> > +static inline bool ntb_qp_edma_is_rc(struct ntb_transport_qp *qp)
> > +{
> > + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
> > +
> > + return ctx->remote_edma_mode == REMOTE_EDMA_RC;
> > +}
> > +
> > +static inline bool ntb_qp_edma_is_ep(struct ntb_transport_qp *qp)
> > +{
> > + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
> > +
> > + return ctx->remote_edma_mode == REMOTE_EDMA_EP;
> > +}
> > +
> > +static inline bool ntb_qp_edma_enabled(struct ntb_transport_qp *qp)
> > +{
> > + return ntb_qp_edma_is_rc(qp) || ntb_qp_edma_is_ep(qp);
> > +}
> > +
> > +static inline unsigned int ntb_edma_ring_sel(struct ntb_transport_qp *qp,
> > + unsigned int n)
> > +{
> > + return n ^ !!ntb_qp_edma_is_ep(qp);
> > +}
> > +
> > +static inline struct ntb_edma_ring *
> > +ntb_edma_ring_local(struct ntb_transport_qp *qp, unsigned int n)
> > +{
> > + unsigned int r = ntb_edma_ring_sel(qp, n);
> > +
> > + return &((struct ntb_edma_ring *)qp->rx_buff)[r];
> > +}
> > +
> > +static inline struct ntb_edma_ring __iomem *
> > +ntb_edma_ring_remote(struct ntb_transport_qp *qp, unsigned int n)
> > +{
> > + unsigned int r = ntb_edma_ring_sel(qp, n);
> > +
> > + return &((struct ntb_edma_ring __iomem *)qp->tx_mw)[r];
> > +}
> > +
> > +static inline struct ntb_edma_desc *
> > +ntb_edma_desc_local(struct ntb_transport_qp *qp, unsigned int n, unsigned int i)
> > +{
> > + return &ntb_edma_ring_local(qp, n)->desc[i];
> > +}
> > +
> > +static inline struct ntb_edma_desc __iomem *
> > +ntb_edma_desc_remote(struct ntb_transport_qp *qp, unsigned int n,
> > + unsigned int i)
> > +{
> > + return &ntb_edma_ring_remote(qp, n)->desc[i];
> > +}
> > +
> > +static inline u32 *ntb_edma_head_local(struct ntb_transport_qp *qp,
> > + unsigned int n)
> > +{
> > + return &ntb_edma_ring_local(qp, n)->head;
> > +}
> > +
> > +static inline u32 __iomem *ntb_edma_head_remote(struct ntb_transport_qp *qp,
> > + unsigned int n)
> > +{
> > + return &ntb_edma_ring_remote(qp, n)->head;
> > +}
> > +
> > +static inline u32 *ntb_edma_tail_local(struct ntb_transport_qp *qp,
> > + unsigned int n)
> > +{
> > + return &ntb_edma_ring_local(qp, n)->tail;
> > +}
> > +
> > +static inline u32 __iomem *ntb_edma_tail_remote(struct ntb_transport_qp *qp,
> > + unsigned int n)
> > +{
> > + return &ntb_edma_ring_remote(qp, n)->tail;
> > +}
> > +
> > +/* The 'i' must be generated by ntb_edma_ring_idx() */
> > +#define NTB_DESC_TX_O(qp, i) ntb_edma_desc_remote(qp, 0, i)
> > +#define NTB_DESC_TX_I(qp, i) ntb_edma_desc_local(qp, 0, i)
> > +#define NTB_DESC_RX_O(qp, i) ntb_edma_desc_remote(qp, 1, i)
> > +#define NTB_DESC_RX_I(qp, i) ntb_edma_desc_local(qp, 1, i)
> > +
> > +#define NTB_HEAD_TX_I(qp) ntb_edma_head_local(qp, 0)
> > +#define NTB_HEAD_RX_O(qp) ntb_edma_head_remote(qp, 1)
> > +
> > +#define NTB_TAIL_TX_O(qp) ntb_edma_tail_remote(qp, 0)
> > +#define NTB_TAIL_RX_I(qp) ntb_edma_tail_local(qp, 1)
> > +
> > +/* ntb_edma_ring helpers */
> > +static __always_inline u32 ntb_edma_ring_idx(u32 v)
> > +{
> > + return v & NTB_EDMA_RING_MASK;
> > +}
> > +
> > +static __always_inline u32 ntb_edma_ring_used_entry(u32 head, u32 tail)
> > +{
> > + if (head >= tail) {
> > + WARN_ON_ONCE((head - tail) > (NTB_EDMA_RING_ENTRIES - 1));
> > + return head - tail;
> > + }
> > +
> > + WARN_ON_ONCE((U32_MAX - tail + head + 1) > (NTB_EDMA_RING_ENTRIES - 1));
> > + return U32_MAX - tail + head + 1;
> > +}
> > +
> > +static __always_inline u32 ntb_edma_ring_free_entry(u32 head, u32 tail)
> > +{
> > + return NTB_EDMA_RING_ENTRIES - ntb_edma_ring_used_entry(head, tail) - 1;
> > +}
> > +
> > +static __always_inline bool ntb_edma_ring_full(u32 head, u32 tail)
> > +{
> > + return ntb_edma_ring_free_entry(head, tail) == 0;
> > +}
> > +
> > +static unsigned int ntb_transport_edma_tx_free_entry(struct ntb_transport_qp *qp)
> > +{
> > + struct ntb_transport_qp_edma *edma = qp->priv;
> > + unsigned int head, tail;
> > +
> > + scoped_guard(spinlock_irqsave, &edma->tx_lock) {
> > + /* In this scope, only 'head' might proceed */
> > + tail = READ_ONCE(edma->tx_issue);
> > + head = READ_ONCE(*NTB_HEAD_TX_I(qp));
> > + }
> > + /*
> > + * 'used' amount indicates how much the other end has refilled,
> > + * which are available for us to use for TX.
> > + */
> > + return ntb_edma_ring_used_entry(head, tail);
> > +}
> > +
> > +static void ntb_transport_edma_debugfs_stats_show(struct seq_file *s,
> > + struct ntb_transport_qp *qp)
> > +{
> > + seq_printf(s, "rx_bytes - \t%llu\n", qp->rx_bytes);
> > + seq_printf(s, "rx_pkts - \t%llu\n", qp->rx_pkts);
> > + seq_printf(s, "rx_err_no_buf - %llu\n", qp->rx_err_no_buf);
> > + seq_printf(s, "rx_buff - \t0x%p\n", qp->rx_buff);
> > + seq_printf(s, "rx_max_entry - \t%u\n", qp->rx_max_entry);
> > + seq_printf(s, "rx_alloc_entry - \t%u\n\n", qp->rx_alloc_entry);
> > +
> > + seq_printf(s, "tx_bytes - \t%llu\n", qp->tx_bytes);
> > + seq_printf(s, "tx_pkts - \t%llu\n", qp->tx_pkts);
> > + seq_printf(s, "tx_ring_full - \t%llu\n", qp->tx_ring_full);
> > + seq_printf(s, "tx_err_no_buf - %llu\n", qp->tx_err_no_buf);
> > + seq_printf(s, "tx_mw - \t0x%p\n", qp->tx_mw);
> > + seq_printf(s, "tx_max_entry - \t%u\n", qp->tx_max_entry);
> > + seq_printf(s, "free tx - \t%u\n", ntb_transport_tx_free_entry(qp));
> > + seq_putc(s, '\n');
> > +
> > + seq_puts(s, "Using Remote eDMA - Yes\n");
> > + seq_printf(s, "QP Link - \t%s\n", qp->link_is_up ? "Up" : "Down");
> > +}
> > +
> > +static void ntb_transport_edma_uninit(struct ntb_transport_ctx *nt)
> > +{
> > + struct ntb_transport_ctx_edma *ctx = nt->priv;
> > +
> > + if (ctx->wq)
> > + destroy_workqueue(ctx->wq);
> > + ctx->wq = NULL;
> > +
> > + ntb_edma_teardown_chans(&ctx->chans);
> > +
> > + switch (ctx->remote_edma_mode) {
> > + case REMOTE_EDMA_EP:
> > + ntb_edma_teardown_mws(nt->ndev);
> > + break;
> > + case REMOTE_EDMA_RC:
> > + ntb_edma_teardown_peer(nt->ndev);
> > + break;
> > + case REMOTE_EDMA_UNKNOWN:
> > + default:
> > + break;
> > + }
> > +
> > + ctx->remote_edma_mode = REMOTE_EDMA_UNKNOWN;
> > +}
> > +
> > +static void ntb_transport_edma_db_work(struct work_struct *work)
> > +{
> > + struct ntb_transport_qp_edma *edma =
> > + container_of(work, struct ntb_transport_qp_edma, db_work);
> > + struct ntb_transport_qp *qp = edma->qp;
> > +
> > + ntb_peer_db_set(qp->ndev, qp->qp_bit);
> > +}
> > +
> > +static void ntb_transport_edma_notify_peer(struct ntb_transport_qp_edma *edma)
> > +{
> > + struct ntb_transport_qp *qp = edma->qp;
> > + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
> > +
> > + if (!ntb_edma_notify_peer(&ctx->chans, qp->qp_num))
> > + return;
> > +
> > + /*
> > + * Called from contexts that may be atomic. Since ntb_peer_db_set()
> > + * may sleep, delegate the actual doorbell write to a workqueue.
> > + */
> > + queue_work(system_highpri_wq, &edma->db_work);
> > +}
> > +
> > +static void ntb_transport_edma_isr(void *data, int qp_num)
> > +{
> > + struct ntb_transport_ctx *nt = data;
> > + struct ntb_transport_qp_edma *edma;
> > + struct ntb_transport_ctx_edma *ctx;
> > + struct ntb_transport_qp *qp;
> > +
> > + if (qp_num < 0 || qp_num >= nt->qp_count)
> > + return;
> > +
> > + qp = &nt->qp_vec[qp_num];
> > + if (WARN_ON(!qp))
> > + return;
> > +
> > + ctx = (struct ntb_transport_ctx_edma *)qp->transport->priv;
> > + edma = qp->priv;
> > +
> > + queue_work(ctx->wq, &edma->rx_work);
> > + queue_work(ctx->wq, &edma->tx_work);
> > +}
> > +
> > +static int ntb_transport_edma_rc_init(struct ntb_transport_ctx *nt)
> > +{
> > + struct ntb_transport_ctx_edma *ctx = nt->priv;
> > + struct ntb_dev *ndev = nt->ndev;
> > + struct pci_dev *pdev = ndev->pdev;
> > + int peer_mw;
> > + int rc;
> > +
> > + if (!use_remote_edma || ctx->remote_edma_mode != REMOTE_EDMA_UNKNOWN)
> > + return 0;
> > +
> > + peer_mw = ntb_peer_mw_count(ndev);
> > + if (peer_mw <= 0)
> > + return -ENODEV;
> > +
> > + rc = ntb_edma_setup_peer(ndev, peer_mw - 1, nt->qp_count);
> > + if (rc) {
> > + dev_err(&pdev->dev, "Failed to enable remote eDMA: %d\n", rc);
> > + return rc;
> > + }
> > +
> > + rc = ntb_edma_setup_chans(get_dma_dev(ndev), &ctx->chans, true);
> > + if (rc) {
> > + dev_err(&pdev->dev, "Failed to setup eDMA channels: %d\n", rc);
> > + goto err_teardown_peer;
> > + }
> > +
> > + rc = ntb_edma_setup_intr_chan(get_dma_dev(ndev), &ctx->chans);
> > + if (rc) {
> > + dev_err(&pdev->dev, "Failed to setup eDMA notify channel: %d\n",
> > + rc);
> > + goto err_teardown_chans;
> > + }
> > +
> > + ctx->remote_edma_mode = REMOTE_EDMA_RC;
> > + return 0;
> > +
> > +err_teardown_chans:
> > + ntb_edma_teardown_chans(&ctx->chans);
> > +err_teardown_peer:
> > + ntb_edma_teardown_peer(ndev);
> > + return rc;
> > +}
> > +
> > +static int ntb_transport_edma_ep_init(struct ntb_transport_ctx *nt)
> > +{
> > + struct ntb_transport_ctx_edma *ctx = nt->priv;
> > + struct ntb_dev *ndev = nt->ndev;
> > + struct pci_dev *pdev = ndev->pdev;
> > + int peer_mw;
> > + int rc;
> > +
> > + if (!use_remote_edma || ctx->remote_edma_mode == REMOTE_EDMA_EP)
> > + return 0;
> > +
> > + /*
> > + * This check assumes that the endpoint (pci-epf-vntb.c)
> > + * ntb_dev_ops implements .get_private_data() while the host side
> > + * (ntb_hw_epf.c) does not.
> > + */
> > + if (!ntb_get_private_data(ndev))
> > + return 0;
> > +
> > + peer_mw = ntb_peer_mw_count(ndev);
> > + if (peer_mw <= 0)
> > + return -ENODEV;
> > +
> > + rc = ntb_edma_setup_mws(ndev, peer_mw - 1, nt->qp_count,
> > + ntb_transport_edma_isr, nt);
> > + if (rc) {
> > + dev_err(&pdev->dev,
> > + "Failed to set up memory window for eDMA: %d\n", rc);
> > + return rc;
> > + }
> > +
> > + rc = ntb_edma_setup_chans(get_dma_dev(ndev), &ctx->chans, false);
> > + if (rc) {
> > + dev_err(&pdev->dev, "Failed to setup eDMA channels: %d\n", rc);
> > + ntb_edma_teardown_mws(ndev);
> > + return rc;
> > + }
> > +
> > + ctx->remote_edma_mode = REMOTE_EDMA_EP;
> > + return 0;
> > +}
> > +
> > +static int ntb_transport_edma_setup_qp_mw(struct ntb_transport_ctx *nt,
> > + unsigned int qp_num)
> > +{
> > + struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
> > + struct ntb_dev *ndev = nt->ndev;
> > + struct ntb_queue_entry *entry;
> > + struct ntb_transport_mw *mw;
> > + unsigned int mw_num, mw_count, qp_count;
> > + unsigned int qp_offset, rx_info_offset;
> > + unsigned int mw_size, mw_size_per_qp;
> > + unsigned int num_qps_mw;
> > + size_t edma_total;
> > + unsigned int i;
> > + int node;
> > +
> > + mw_count = nt->mw_count;
> > + qp_count = nt->qp_count;
> > +
> > + mw_num = QP_TO_MW(nt, qp_num);
> > + mw = &nt->mw_vec[mw_num];
> > +
> > + if (!mw->virt_addr)
> > + return -ENOMEM;
> > +
> > + if (mw_num < qp_count % mw_count)
> > + num_qps_mw = qp_count / mw_count + 1;
> > + else
> > + num_qps_mw = qp_count / mw_count;
> > +
> > + mw_size = min(nt->mw_vec[mw_num].phys_size, mw->xlat_size);
> > + if (max_mw_size && mw_size > max_mw_size)
> > + mw_size = max_mw_size;
> > +
> > + mw_size_per_qp = round_down((unsigned int)mw_size / num_qps_mw, SZ_64);
> > + qp_offset = mw_size_per_qp * (qp_num / mw_count);
> > + rx_info_offset = mw_size_per_qp - sizeof(struct ntb_rx_info);
> > +
> > + qp->tx_mw_size = mw_size_per_qp;
> > + qp->tx_mw = nt->mw_vec[mw_num].vbase + qp_offset;
> > + if (!qp->tx_mw)
> > + return -EINVAL;
> > + qp->tx_mw_phys = nt->mw_vec[mw_num].phys_addr + qp_offset;
> > + if (!qp->tx_mw_phys)
> > + return -EINVAL;
> > + qp->rx_info = qp->tx_mw + rx_info_offset;
> > + qp->rx_buff = mw->virt_addr + qp_offset;
> > + qp->remote_rx_info = qp->rx_buff + rx_info_offset;
> > +
> > + /* Due to housekeeping, there must be at least 2 buffs */
> > + qp->tx_max_frame = min(transport_mtu, mw_size_per_qp / 2);
> > + qp->rx_max_frame = min(transport_mtu, mw_size_per_qp / 2);
> > +
> > + /* In eDMA mode, decouple from MW sizing and force ring-sized entries */
> > + edma_total = 2 * sizeof(struct ntb_edma_ring);
> > + if (rx_info_offset < edma_total) {
> > + dev_err(&ndev->dev, "Ring space requires %zuB (>=%uB)\n",
> > + edma_total, rx_info_offset);
> > + return -EINVAL;
> > + }
> > + qp->tx_max_entry = NTB_EDMA_RING_ENTRIES;
> > + qp->rx_max_entry = NTB_EDMA_RING_ENTRIES;
> > +
> > + /*
> > + * Checking to see if we have more entries than the default.
> > + * We should add additional entries if that is the case so we
> > + * can be in sync with the transport frames.
> > + */
> > + node = dev_to_node(&ndev->dev);
> > + for (i = qp->rx_alloc_entry; i < qp->rx_max_entry; i++) {
> > + entry = kzalloc_node(sizeof(*entry), GFP_KERNEL, node);
> > + if (!entry)
> > + return -ENOMEM;
> > +
> > + entry->qp = qp;
> > + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry,
> > + &qp->rx_free_q);
> > + qp->rx_alloc_entry++;
> > + }
> > +
> > + memset(qp->rx_buff, 0, edma_total);
> > +
> > + qp->rx_pkts = 0;
> > + qp->tx_pkts = 0;
> > +
> > + return 0;
> > +}
> > +
> > +static int ntb_transport_edma_rx_complete(struct ntb_transport_qp *qp)
> > +{
> > + struct device *dma_dev = get_dma_dev(qp->ndev);
> > + struct ntb_transport_qp_edma *edma = qp->priv;
> > + struct ntb_queue_entry *entry;
> > + struct ntb_edma_desc *in;
> > + unsigned int len;
> > + bool link_down;
> > + u32 idx;
> > +
> > + if (ntb_edma_ring_used_entry(READ_ONCE(*NTB_TAIL_RX_I(qp)),
> > + edma->rx_cons) == 0)
> > + return 0;
> > +
> > + idx = ntb_edma_ring_idx(edma->rx_cons);
> > + in = NTB_DESC_RX_I(qp, idx);
> > + if (!(in->flags & DESC_DONE_FLAG))
> > + return 0;
> > +
> > + link_down = in->flags & LINK_DOWN_FLAG;
> > + in->flags = 0;
> > + len = in->len; /* might be smaller than entry->len */
> > +
> > + entry = (struct ntb_queue_entry *)(uintptr_t)in->data;
> > + if (WARN_ON(!entry))
> > + return 0;
> > +
> > + if (link_down) {
> > + ntb_qp_link_down(qp);
> > + edma->rx_cons++;
> > + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_free_q);
> > + return 1;
> > + }
> > +
> > + dma_unmap_single(dma_dev, entry->addr, entry->len, DMA_FROM_DEVICE);
> > +
> > + qp->rx_bytes += len;
> > + qp->rx_pkts++;
> > + edma->rx_cons++;
> > +
> > + if (qp->rx_handler && qp->client_ready)
> > + qp->rx_handler(qp, qp->cb_data, entry->cb_data, len);
> > +
> > + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_free_q);
> > + return 1;
> > +}
> > +
> > +static void ntb_transport_edma_rx_work(struct work_struct *work)
> > +{
> > + struct ntb_transport_qp_edma *edma = container_of(
> > + work, struct ntb_transport_qp_edma, rx_work);
> > + struct ntb_transport_qp *qp = edma->qp;
> > + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
> > + unsigned int i;
> > +
> > + for (i = 0; i < NTB_EDMA_MAX_POLL; i++) {
> > + if (!ntb_transport_edma_rx_complete(qp))
> > + break;
> > + }
> > +
> > + if (ntb_transport_edma_rx_complete(qp))
> > + queue_work(ctx->wq, &edma->rx_work);
> > +}
> > +
> > +static void ntb_transport_edma_tx_work(struct work_struct *work)
> > +{
> > + struct ntb_transport_qp_edma *edma = container_of(
> > + work, struct ntb_transport_qp_edma, tx_work);
> > + struct ntb_transport_qp *qp = edma->qp;
> > + struct ntb_edma_desc *in, __iomem *out;
> > + struct ntb_queue_entry *entry;
> > + unsigned int len;
> > + void *cb_data;
> > + u32 idx;
> > +
> > + while (ntb_edma_ring_used_entry(READ_ONCE(edma->tx_issue),
> > + edma->tx_cons) != 0) {
> > + /* Paired with smp_wmb() in ntb_transport_edma_tx_enqueue_inner() */
> > + smp_rmb();
> > +
> > + idx = ntb_edma_ring_idx(edma->tx_cons);
> > + in = NTB_DESC_TX_I(qp, idx);
> > + entry = (struct ntb_queue_entry *)(uintptr_t)in->data;
> > + if (!entry || !(entry->flags & DESC_DONE_FLAG))
> > + break;
> > +
> > + in->data = 0;
> > +
> > + cb_data = entry->cb_data;
> > + len = entry->len;
> > +
> > + out = NTB_DESC_TX_O(qp, idx);
> > +
> > + WRITE_ONCE(edma->tx_cons, edma->tx_cons + 1);
> > +
> > + /*
> > + * No need to add barrier in-between to enforce ordering here.
> > + * The other side proceeds only after both flags and tail are
> > + * updated.
> > + */
> > + iowrite32(entry->flags, &out->flags);
> > + iowrite32(edma->tx_cons, NTB_TAIL_TX_O(qp));
> > +
> > + ntb_transport_edma_notify_peer(edma);
> > +
> > + ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry,
> > + &qp->tx_free_q);
> > +
> > + if (qp->tx_handler)
> > + qp->tx_handler(qp, qp->cb_data, cb_data, len);
> > +
> > + /* stat updates */
> > + qp->tx_bytes += len;
> > + qp->tx_pkts++;
> > + }
> > +}
> > +
> > +static void ntb_transport_edma_tx_cb(void *data,
> > + const struct dmaengine_result *res)
> > +{
> > + struct ntb_queue_entry *entry = data;
> > + struct ntb_transport_qp *qp = entry->qp;
> > + struct ntb_transport_ctx *nt = qp->transport;
> > + struct device *dma_dev = get_dma_dev(qp->ndev);
> > + enum dmaengine_tx_result dma_err = res->result;
> > + struct ntb_transport_ctx_edma *ctx = nt->priv;
> > + struct ntb_transport_qp_edma *edma = qp->priv;
> > +
> > + switch (dma_err) {
> > + case DMA_TRANS_READ_FAILED:
> > + case DMA_TRANS_WRITE_FAILED:
> > + case DMA_TRANS_ABORTED:
> > + entry->errors++;
> > + entry->len = -EIO;
> > + break;
> > + case DMA_TRANS_NOERROR:
> > + default:
> > + break;
> > + }
> > + dma_unmap_sg(dma_dev, &entry->sgl, 1, DMA_TO_DEVICE);
> > + sg_dma_address(&entry->sgl) = 0;
> > +
> > + entry->flags |= DESC_DONE_FLAG;
> > +
> > + queue_work(ctx->wq, &edma->tx_work);
> > +}
> > +
> > +static int ntb_transport_edma_submit(struct device *d, struct dma_chan *chan,
> > + size_t len, void *rc_src, dma_addr_t dst,
> > + struct ntb_queue_entry *entry)
> > +{
> > + struct scatterlist *sgl = &entry->sgl;
> > + struct dma_async_tx_descriptor *txd;
> > + struct dma_slave_config cfg;
> > + dma_cookie_t cookie;
> > + int nents, rc;
> > +
> > + if (!d)
> > + return -ENODEV;
> > +
> > + if (!chan)
> > + return -ENXIO;
> > +
> > + if (WARN_ON(!rc_src || !dst))
> > + return -EINVAL;
> > +
> > + if (WARN_ON(sg_dma_address(sgl)))
> > + return -EINVAL;
> > +
> > + sg_init_one(sgl, rc_src, len);
> > + nents = dma_map_sg(d, sgl, 1, DMA_TO_DEVICE);
> > + if (nents <= 0)
> > + return -EIO;
> > +
> > + memset(&cfg, 0, sizeof(cfg));
> > + cfg.dst_addr = dst;
> > + cfg.src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
> > + cfg.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
> > + cfg.direction = DMA_MEM_TO_DEV;
> > +
> > + txd = dmaengine_prep_slave_sg_config(chan, sgl, 1, DMA_MEM_TO_DEV,
> > + DMA_CTRL_ACK | DMA_PREP_INTERRUPT, &cfg);
> > + if (!txd) {
> > + rc = -EIO;
> > + goto out_unmap;
> > + }
> > +
> > + txd->callback_result = ntb_transport_edma_tx_cb;
> > + txd->callback_param = entry;
> > +
> > + cookie = dmaengine_submit(txd);
> > + if (dma_submit_error(cookie)) {
> > + rc = -EIO;
> > + goto out_unmap;
> > + }
> > + dma_async_issue_pending(chan);
> > + return 0;
> > +out_unmap:
> > + dma_unmap_sg(d, sgl, 1, DMA_TO_DEVICE);
> > + return rc;
> > +}
> > +
> > +static int ntb_transport_edma_tx_enqueue_inner(struct ntb_transport_qp *qp,
> > + struct ntb_queue_entry *entry)
> > +{
> > + struct device *dma_dev = get_dma_dev(qp->ndev);
> > + struct ntb_transport_qp_edma *edma = qp->priv;
> > + struct ntb_transport_ctx *nt = qp->transport;
> > + struct ntb_edma_desc *in, __iomem *out;
> > + struct ntb_transport_ctx_edma *ctx = nt->priv;
> > + unsigned int len = entry->len;
> > + struct dma_chan *chan;
> > + u32 issue, idx, head;
> > + dma_addr_t dst;
> > + int rc;
> > +
> > + WARN_ON_ONCE(entry->flags & DESC_DONE_FLAG);
> > +
> > + scoped_guard(spinlock_irqsave, &edma->tx_lock) {
> > + head = READ_ONCE(*NTB_HEAD_TX_I(qp));
> > + issue = edma->tx_issue;
> > + if (ntb_edma_ring_used_entry(head, issue) == 0) {
> > + qp->tx_ring_full++;
> > + return -ENOSPC;
> > + }
> > +
> > + /*
> > +		 * ntb_transport_edma_tx_work() reads entry->flags through
> > +		 * in->data, so in->data must be published before tx_issue++.
> > + */
> > + idx = ntb_edma_ring_idx(issue);
> > + in = NTB_DESC_TX_I(qp, idx);
> > + in->data = (uintptr_t)entry;
> > +
> > + /* Make in->data visible before tx_issue++ */
> > + smp_wmb();
> > +
> > + WRITE_ONCE(edma->tx_issue, edma->tx_issue + 1);
> > + }
> > +
> > + /* Publish the final transfer length to the other end */
> > + out = NTB_DESC_TX_O(qp, idx);
> > + iowrite32(len, &out->len);
> > + ioread32(&out->len);
> > +
> > + if (unlikely(!len)) {
> > + entry->flags |= DESC_DONE_FLAG;
> > + queue_work(ctx->wq, &edma->tx_work);
> > + return 0;
> > + }
> > +
> > + /* Paired with dma_wmb() in ntb_transport_edma_rx_enqueue_inner() */
> > + dma_rmb();
> > +
> > + /* kick remote eDMA read transfer */
> > + dst = (dma_addr_t)in->addr;
> > + chan = ntb_edma_pick_chan(&ctx->chans, qp->qp_num);
> > + rc = ntb_transport_edma_submit(dma_dev, chan, len,
> > + entry->buf, dst, entry);
> > + if (rc) {
> > + entry->errors++;
> > + entry->len = -EIO;
> > + entry->flags |= DESC_DONE_FLAG;
> > + queue_work(ctx->wq, &edma->tx_work);
> > + }
> > + return 0;
> > +}
> > +
> > +static int ntb_transport_edma_tx_enqueue(struct ntb_transport_qp *qp,
> > + struct ntb_queue_entry *entry,
> > + void *cb, void *data, unsigned int len,
> > + unsigned int flags)
> > +{
> > + struct device *dma_dev;
> > +
> > + if (entry->addr) {
> > + /* Deferred unmap */
> > + dma_dev = get_dma_dev(qp->ndev);
> > + dma_unmap_single(dma_dev, entry->addr, entry->len,
> > + DMA_TO_DEVICE);
> > + }
> > +
> > + entry->cb_data = cb;
> > + entry->buf = data;
> > + entry->len = len;
> > + entry->flags = flags;
> > + entry->errors = 0;
> > + entry->addr = 0;
> > +
> > + WARN_ON_ONCE(!ntb_qp_edma_enabled(qp));
> > +
> > + return ntb_transport_edma_tx_enqueue_inner(qp, entry);
> > +}
> > +
> > +static int ntb_transport_edma_rx_enqueue_inner(struct ntb_transport_qp *qp,
> > + struct ntb_queue_entry *entry)
> > +{
> > + struct device *dma_dev = get_dma_dev(qp->ndev);
> > + struct ntb_transport_qp_edma *edma = qp->priv;
> > + struct ntb_edma_desc *in, __iomem *out;
> > + unsigned int len = entry->len;
> > + void *data = entry->buf;
> > + dma_addr_t dst;
> > + u32 idx;
> > + int rc;
> > +
> > + dst = dma_map_single(dma_dev, data, len, DMA_FROM_DEVICE);
> > + rc = dma_mapping_error(dma_dev, dst);
> > + if (rc)
> > + return rc;
> > +
> > + guard(spinlock_bh)(&edma->rx_lock);
> > +
> > + if (ntb_edma_ring_full(READ_ONCE(edma->rx_prod),
> > + READ_ONCE(edma->rx_cons))) {
> > + rc = -ENOSPC;
> > + goto out_unmap;
> > + }
> > +
> > + idx = ntb_edma_ring_idx(edma->rx_prod);
> > + in = NTB_DESC_RX_I(qp, idx);
> > + out = NTB_DESC_RX_O(qp, idx);
> > +
> > + iowrite32(len, &out->len);
> > + iowrite64(dst, &out->addr);
> > +
> > + WARN_ON(in->flags & DESC_DONE_FLAG);
> > + in->data = (uintptr_t)entry;
> > + entry->addr = dst;
> > +
> > + /* Ensure len/addr are visible before the head update */
> > + dma_wmb();
> > +
> > + WRITE_ONCE(edma->rx_prod, edma->rx_prod + 1);
> > + iowrite32(edma->rx_prod, NTB_HEAD_RX_O(qp));
> > +
> > + return 0;
> > +out_unmap:
> > + dma_unmap_single(dma_dev, dst, len, DMA_FROM_DEVICE);
> > + return rc;
> > +}
> > +
> > +static int ntb_transport_edma_rx_enqueue(struct ntb_transport_qp *qp,
> > + struct ntb_queue_entry *entry)
> > +{
> > + int rc;
> > +
> > + rc = ntb_transport_edma_rx_enqueue_inner(qp, entry);
> > + if (rc) {
> > + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry,
> > + &qp->rx_free_q);
> > + return rc;
> > + }
> > +
> > + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_pend_q);
> > +
> > + if (qp->active)
> > + tasklet_schedule(&qp->rxc_db_work);
> > +
> > + return 0;
> > +}
> > +
> > +static void ntb_transport_edma_rx_poll(struct ntb_transport_qp *qp)
> > +{
> > + struct ntb_transport_ctx *nt = qp->transport;
> > + struct ntb_transport_ctx_edma *ctx = nt->priv;
> > + struct ntb_transport_qp_edma *edma = qp->priv;
> > +
> > + queue_work(ctx->wq, &edma->rx_work);
> > + queue_work(ctx->wq, &edma->tx_work);
> > +}
> > +
> > +static int ntb_transport_edma_qp_init(struct ntb_transport_ctx *nt,
> > + unsigned int qp_num)
> > +{
> > + struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
> > + struct ntb_transport_qp_edma *edma;
> > + struct ntb_dev *ndev = nt->ndev;
> > + int node;
> > +
> > + node = dev_to_node(&ndev->dev);
> > +
> > + qp->priv = kzalloc_node(sizeof(*edma), GFP_KERNEL, node);
> > + if (!qp->priv)
> > + return -ENOMEM;
> > +
> > + edma = (struct ntb_transport_qp_edma *)qp->priv;
> > + edma->qp = qp;
> > + edma->rx_prod = 0;
> > + edma->rx_cons = 0;
> > + edma->tx_cons = 0;
> > + edma->tx_issue = 0;
> > +
> > + spin_lock_init(&edma->rx_lock);
> > + spin_lock_init(&edma->tx_lock);
> > +
> > + INIT_WORK(&edma->db_work, ntb_transport_edma_db_work);
> > + INIT_WORK(&edma->rx_work, ntb_transport_edma_rx_work);
> > + INIT_WORK(&edma->tx_work, ntb_transport_edma_tx_work);
> > +
> > + return 0;
> > +}
> > +
> > +static void ntb_transport_edma_qp_free(struct ntb_transport_qp *qp)
> > +{
> > + struct ntb_transport_qp_edma *edma = qp->priv;
> > +
> > + cancel_work_sync(&edma->db_work);
> > + cancel_work_sync(&edma->rx_work);
> > + cancel_work_sync(&edma->tx_work);
> > +
> > + kfree(qp->priv);
> > +}
> > +
> > +static int ntb_transport_edma_pre_link_up(struct ntb_transport_ctx *nt)
> > +{
> > + struct ntb_dev *ndev = nt->ndev;
> > + struct pci_dev *pdev = ndev->pdev;
> > + int rc;
> > +
> > + rc = ntb_transport_edma_ep_init(nt);
> > + if (rc)
> > + dev_err(&pdev->dev, "Failed to init EP: %d\n", rc);
> > +
> > + return rc;
> > +}
> > +
> > +static int ntb_transport_edma_post_link_up(struct ntb_transport_ctx *nt)
> > +{
> > + struct ntb_dev *ndev = nt->ndev;
> > + struct pci_dev *pdev = ndev->pdev;
> > + int rc;
> > +
> > + rc = ntb_transport_edma_rc_init(nt);
> > + if (rc)
> > + dev_err(&pdev->dev, "Failed to init RC: %d\n", rc);
> > +
> > + return rc;
> > +}
> > +
> > +static int ntb_transport_edma_enable(struct ntb_transport_ctx *nt,
> > + unsigned int *mw_count)
> > +{
> > + struct ntb_dev *ndev = nt->ndev;
> > + struct ntb_transport_ctx_edma *ctx = nt->priv;
> > +
> > + if (!use_remote_edma)
> > + return 0;
> > +
> > + /*
> > + * We need at least one MW for the transport plus one MW reserved
> > + * for the remote eDMA window (see ntb_edma_setup_mws/peer).
> > + */
> > + if (*mw_count <= 1) {
> > + dev_err(&ndev->dev,
> > +			"remote eDMA requires at least two MWs (have %u)\n",
> > +			*mw_count);
> > + return -ENODEV;
> > + }
> > +
> > + ctx->wq = alloc_workqueue("ntb-edma-wq", WQ_UNBOUND | WQ_SYSFS, 0);
> > + if (!ctx->wq) {
> > + ntb_transport_edma_uninit(nt);
> > + return -ENOMEM;
> > + }
> > +
> > + /* Reserve the last peer MW exclusively for the eDMA window. */
> > + *mw_count -= 1;
> > +
> > + return 0;
> > +}
> > +
> > +static void ntb_transport_edma_disable(struct ntb_transport_ctx *nt)
> > +{
> > + ntb_transport_edma_uninit(nt);
> > +}
> > +
> > +static const struct ntb_transport_backend_ops edma_backend_ops = {
> > + .enable = ntb_transport_edma_enable,
> > + .disable = ntb_transport_edma_disable,
> > + .qp_init = ntb_transport_edma_qp_init,
> > + .qp_free = ntb_transport_edma_qp_free,
> > + .pre_link_up = ntb_transport_edma_pre_link_up,
> > + .post_link_up = ntb_transport_edma_post_link_up,
> > + .setup_qp_mw = ntb_transport_edma_setup_qp_mw,
> > + .tx_free_entry = ntb_transport_edma_tx_free_entry,
> > + .tx_enqueue = ntb_transport_edma_tx_enqueue,
> > + .rx_enqueue = ntb_transport_edma_rx_enqueue,
> > + .rx_poll = ntb_transport_edma_rx_poll,
> > + .debugfs_stats_show = ntb_transport_edma_debugfs_stats_show,
> > +};
> > +
> > +int ntb_transport_edma_init(struct ntb_transport_ctx *nt)
> > +{
> > + struct ntb_dev *ndev = nt->ndev;
> > + int node;
> > +
> > + node = dev_to_node(&ndev->dev);
> > + nt->priv = kzalloc_node(sizeof(struct ntb_transport_ctx_edma), GFP_KERNEL,
> > + node);
> > + if (!nt->priv)
> > + return -ENOMEM;
> > +
> > + nt->backend_ops = edma_backend_ops;
> > + /*
> > +	 * In remote eDMA mode, one DMA read channel is used by the host
> > +	 * side to interrupt the EP.
> > + */
> > + use_msi = false;
> > + return 0;
> > +}
> > diff --git a/drivers/ntb/ntb_transport_internal.h b/drivers/ntb/ntb_transport_internal.h
> > index 51ff08062d73..9fff65980d3d 100644
> > --- a/drivers/ntb/ntb_transport_internal.h
> > +++ b/drivers/ntb/ntb_transport_internal.h
> > @@ -8,6 +8,7 @@
> > extern unsigned long max_mw_size;
> > extern unsigned int transport_mtu;
> > extern bool use_msi;
> > +extern bool use_remote_edma;
> >
> > #define QP_TO_MW(nt, qp) ((qp) % nt->mw_count)
> >
> > @@ -29,6 +30,11 @@ struct ntb_queue_entry {
> > struct ntb_payload_header __iomem *tx_hdr;
> > struct ntb_payload_header *rx_hdr;
> > };
> > +
> > +#ifdef CONFIG_NTB_TRANSPORT_EDMA
> > + dma_addr_t addr;
> > + struct scatterlist sgl;
> > +#endif
> > };
> >
> > struct ntb_rx_info {
> > @@ -202,4 +208,13 @@ int ntb_transport_init_queue(struct ntb_transport_ctx *nt,
> > unsigned int qp_num);
> > struct device *get_dma_dev(struct ntb_dev *ndev);
> >
> > +#ifdef CONFIG_NTB_TRANSPORT_EDMA
> > +int ntb_transport_edma_init(struct ntb_transport_ctx *nt);
> > +#else
> > +static inline int ntb_transport_edma_init(struct ntb_transport_ctx *nt)
> > +{
> > + return -EOPNOTSUPP;
> > +}
> > +#endif /* CONFIG_NTB_TRANSPORT_EDMA */
> > +
> > #endif /* _NTB_TRANSPORT_INTERNAL_H_ */
>
* Re: [RFC PATCH v3 26/35] NTB: ntb_transport: Introduce DW eDMA backed transport mode
2026-01-07 14:54 ` Koichiro Den
@ 2026-01-07 19:02 ` Dave Jiang
2026-01-08 1:25 ` Koichiro Den
0 siblings, 1 reply; 61+ messages in thread
From: Dave Jiang @ 2026-01-07 19:02 UTC (permalink / raw)
To: Koichiro Den
Cc: Frank.Li, ntb, linux-pci, dmaengine, linux-renesas-soc, netdev,
linux-kernel, mani, kwilczynski, kishon, bhelgaas, corbet,
geert+renesas, magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro,
will, robin.murphy, jdmason, allenbh, andrew+netdev, davem,
edumazet, kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k,
kurt.schwemmer, logang, jingoohan1, lpieralisi, utkarsh02t,
jbrunet, dlemoal, arnd, elfring
On 1/7/26 7:54 AM, Koichiro Den wrote:
> On Tue, Jan 06, 2026 at 11:51:03AM -0700, Dave Jiang wrote:
>>
>>
>> On 12/17/25 8:16 AM, Koichiro Den wrote:
>>> Add a new ntb_transport backend that uses a DesignWare eDMA engine
>>> located on the endpoint, to be driven by both host and endpoint.
>>>
>>> The endpoint exposes a dedicated memory window which contains the eDMA
>>> register block, a small control structure (struct ntb_edma_info) and
>>> per-channel linked-list (LL) rings for read channels. Endpoint drives
>>> its local eDMA write channels for its transmission, while host side
>>> uses the remote eDMA read channels for its transmission.
>>>
>>> A key benefit of this backend is that the memory window no longer needs
>>> to carry data-plane payload. This makes the design less sensitive to
>>> limited memory window space and allows scaling to multiple queue pairs.
>>> The memory window layout is specific to the eDMA-backed backend, so
>>> there is no automatic fallback to the memcpy-based default transport
>>> that requires the different layout.
>>>
>>> Signed-off-by: Koichiro Den <den@valinux.co.jp>
>>> ---
>>> drivers/ntb/Kconfig | 12 +
>>> drivers/ntb/Makefile | 2 +
>>> drivers/ntb/ntb_transport_core.c | 15 +-
>>> drivers/ntb/ntb_transport_edma.c | 987 +++++++++++++++++++++++++++
>>> drivers/ntb/ntb_transport_internal.h | 15 +
>>> 5 files changed, 1029 insertions(+), 2 deletions(-)
>>> create mode 100644 drivers/ntb/ntb_transport_edma.c
>>>
>>> diff --git a/drivers/ntb/Kconfig b/drivers/ntb/Kconfig
>>> index df16c755b4da..5ba6d0b7f5ba 100644
>>> --- a/drivers/ntb/Kconfig
>>> +++ b/drivers/ntb/Kconfig
>>> @@ -37,4 +37,16 @@ config NTB_TRANSPORT
>>>
>>> If unsure, say N.
>>>
>>> +config NTB_TRANSPORT_EDMA
>>> + bool "NTB Transport backed by remote eDMA"
>>> + depends on NTB_TRANSPORT
>>> + depends on PCI
>>> + select DMA_ENGINE
>>> + select NTB_EDMA
>>> + help
>>> + Enable a transport backend that uses a remote DesignWare eDMA engine
>>> + exposed through a dedicated NTB memory window. The host uses the
>>> + endpoint's eDMA engine to move data in both directions.
>>> + Say Y here if you intend to use the 'use_remote_edma' module parameter.
>>> +
>>> endif # NTB
>>> diff --git a/drivers/ntb/Makefile b/drivers/ntb/Makefile
>>> index 9b66e5fafbc0..b9086b32ecde 100644
>>> --- a/drivers/ntb/Makefile
>>> +++ b/drivers/ntb/Makefile
>>> @@ -6,3 +6,5 @@ ntb-y := core.o
>>> ntb-$(CONFIG_NTB_MSI) += msi.o
>>>
>>> ntb_transport-y := ntb_transport_core.o
>>> +ntb_transport-$(CONFIG_NTB_TRANSPORT_EDMA) += ntb_transport_edma.o
>>> +ntb_transport-$(CONFIG_NTB_TRANSPORT_EDMA) += hw/edma/ntb_hw_edma.o
>>> diff --git a/drivers/ntb/ntb_transport_core.c b/drivers/ntb/ntb_transport_core.c
>>> index 40c2548f5930..bd21232f26fe 100644
>>> --- a/drivers/ntb/ntb_transport_core.c
>>> +++ b/drivers/ntb/ntb_transport_core.c
>>> @@ -104,6 +104,12 @@ module_param(use_msi, bool, 0644);
>>> MODULE_PARM_DESC(use_msi, "Use MSI interrupts instead of doorbells");
>>> #endif
>>>
>>> +bool use_remote_edma;
>>> +#ifdef CONFIG_NTB_TRANSPORT_EDMA
>>> +module_param(use_remote_edma, bool, 0644);
>>> +MODULE_PARM_DESC(use_remote_edma, "Use remote eDMA mode (when enabled, use_msi is ignored)");
>>> +#endif
>>
>> This seems clunky. Can the ntb_transport_core determine this when the things are called through ntb_transport_edma? Or maybe a set_transport_type can be introduced by the transport itself during allocation?
>
> Agreed. I plan to drop 'use_remote_edma' and instead,
> - add a module parameter: transport_type={"default","edma"} (defaulting to "default"),
> - introduce ntb_transport_backend_register() for transports to self-register via
> struct ntb_transport_backend { .name, .ops }, and
> - have the core select the backend whose .name matches transport_type.
>
> I think this should keep any non-default transport-specific logic out of
> ntb_transport_core, or at least keep it to a minimum, while still allowing
> non-default transports (*ntb_transport_edma is the only choice for now
> though) to plug in cleanly.
>
> If you see a cleaner approach, I would appreciate it if you could elaborate
> a bit more on your idea.
Do you think it's flexible enough that we can determine a transport type per 'ntb_transport_mw', or is this an all-or-nothing type of thing? I'm trying to see if we can do away with the module param. Or, I guess, when you probe ntb_netdev the selection would happen there, and thus transport_type would live in the ntb_netdev module?
>
> Thanks,
> Koichiro
>
>>
>> DJ
>>
>>> +
>>> static struct dentry *nt_debugfs_dir;
>>>
>>> /* Only two-ports NTB devices are supported */
>>> @@ -156,7 +162,7 @@ enum {
>>> #define drv_client(__drv) \
>>> container_of((__drv), struct ntb_transport_client, driver)
>>>
>>> -#define NTB_QP_DEF_NUM_ENTRIES 100
>>> +#define NTB_QP_DEF_NUM_ENTRIES 128
>>> #define NTB_LINK_DOWN_TIMEOUT 10
>>>
>>> static void ntb_transport_rxc_db(unsigned long data);
>>> @@ -1189,7 +1195,11 @@ static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
>>>
>>> nt->ndev = ndev;
>>>
>>> - rc = ntb_transport_default_init(nt);
>>> + if (use_remote_edma)
>>> + rc = ntb_transport_edma_init(nt);
>>> + else
>>> + rc = ntb_transport_default_init(nt);
>>> +
>>> if (rc)
>>> return rc;
>>>
>>> @@ -1950,6 +1960,7 @@ ntb_transport_create_queue(void *data, struct device *client_dev,
>>>
>>> nt->qp_bitmap_free &= ~qp_bit;
>>>
>>> + qp->qp_bit = qp_bit;
>>> qp->cb_data = data;
>>> qp->rx_handler = handlers->rx_handler;
>>> qp->tx_handler = handlers->tx_handler;
>>> diff --git a/drivers/ntb/ntb_transport_edma.c b/drivers/ntb/ntb_transport_edma.c
>>> new file mode 100644
>>> index 000000000000..6ae5da0a1367
>>> --- /dev/null
>>> +++ b/drivers/ntb/ntb_transport_edma.c
>>> @@ -0,0 +1,987 @@
>>> +// SPDX-License-Identifier: GPL-2.0-only
>>> +/*
>>> + * NTB transport backend for remote DesignWare eDMA.
>>> + *
>>> + * This implements the backend_ops used when use_remote_edma=1 and
>>> + * relies on drivers/ntb/hw/edma/ for low-level eDMA/MW programming.
>>> + */
>>> +
>>> +#include <linux/bug.h>
>>> +#include <linux/compiler.h>
>>> +#include <linux/debugfs.h>
>>> +#include <linux/dmaengine.h>
>>> +#include <linux/dma-mapping.h>
>>> +#include <linux/errno.h>
>>> +#include <linux/io-64-nonatomic-lo-hi.h>
>>> +#include <linux/ntb.h>
>>> +#include <linux/pci.h>
>>> +#include <linux/pci-epc.h>
>>> +#include <linux/seq_file.h>
>>> +#include <linux/slab.h>
>>> +
>>> +#include "hw/edma/ntb_hw_edma.h"
>>> +#include "ntb_transport_internal.h"
>>> +
>>> +#define NTB_EDMA_RING_ORDER 7
>>> +#define NTB_EDMA_RING_ENTRIES (1U << NTB_EDMA_RING_ORDER)
>>> +#define NTB_EDMA_RING_MASK (NTB_EDMA_RING_ENTRIES - 1)
>>> +
>>> +#define NTB_EDMA_MAX_POLL 32
>>> +
>>> +/*
>>> + * Remote eDMA mode implementation
>>> + */
>>> +struct ntb_transport_ctx_edma {
>>> + remote_edma_mode_t remote_edma_mode;
>>> + struct device *dma_dev;
>>> + struct workqueue_struct *wq;
>>> + struct ntb_edma_chans chans;
>>> +};
>>> +
>>> +struct ntb_transport_qp_edma {
>>> + struct ntb_transport_qp *qp;
>>> +
>>> + /*
>>> + * For ensuring peer notification in non-atomic context.
>>> + * ntb_peer_db_set might sleep or schedule.
>>> + */
>>> + struct work_struct db_work;
>>> +
>>> + u32 rx_prod;
>>> + u32 rx_cons;
>>> + u32 tx_cons;
>>> + u32 tx_issue;
>>> +
>>> + spinlock_t rx_lock;
>>> + spinlock_t tx_lock;
>>> +
>>> + struct work_struct rx_work;
>>> + struct work_struct tx_work;
>>> +};
>>> +
>>> +struct ntb_edma_desc {
>>> + u32 len;
>>> + u32 flags;
>>> + u64 addr; /* DMA address */
>>> + u64 data;
>>> +};
>>> +
>>> +struct ntb_edma_ring {
>>> + struct ntb_edma_desc desc[NTB_EDMA_RING_ENTRIES];
>>> + u32 head;
>>> + u32 tail;
>>> +};
>>> +
>>> +static inline bool ntb_qp_edma_is_rc(struct ntb_transport_qp *qp)
>>> +{
>>> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
>>> +
>>> + return ctx->remote_edma_mode == REMOTE_EDMA_RC;
>>> +}
>>> +
>>> +static inline bool ntb_qp_edma_is_ep(struct ntb_transport_qp *qp)
>>> +{
>>> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
>>> +
>>> + return ctx->remote_edma_mode == REMOTE_EDMA_EP;
>>> +}
>>> +
>>> +static inline bool ntb_qp_edma_enabled(struct ntb_transport_qp *qp)
>>> +{
>>> + return ntb_qp_edma_is_rc(qp) || ntb_qp_edma_is_ep(qp);
>>> +}
>>> +
>>> +static inline unsigned int ntb_edma_ring_sel(struct ntb_transport_qp *qp,
>>> + unsigned int n)
>>> +{
>>> + return n ^ !!ntb_qp_edma_is_ep(qp);
>>> +}
>>> +
>>> +static inline struct ntb_edma_ring *
>>> +ntb_edma_ring_local(struct ntb_transport_qp *qp, unsigned int n)
>>> +{
>>> + unsigned int r = ntb_edma_ring_sel(qp, n);
>>> +
>>> + return &((struct ntb_edma_ring *)qp->rx_buff)[r];
>>> +}
>>> +
>>> +static inline struct ntb_edma_ring __iomem *
>>> +ntb_edma_ring_remote(struct ntb_transport_qp *qp, unsigned int n)
>>> +{
>>> + unsigned int r = ntb_edma_ring_sel(qp, n);
>>> +
>>> + return &((struct ntb_edma_ring __iomem *)qp->tx_mw)[r];
>>> +}
>>> +
>>> +static inline struct ntb_edma_desc *
>>> +ntb_edma_desc_local(struct ntb_transport_qp *qp, unsigned int n, unsigned int i)
>>> +{
>>> + return &ntb_edma_ring_local(qp, n)->desc[i];
>>> +}
>>> +
>>> +static inline struct ntb_edma_desc __iomem *
>>> +ntb_edma_desc_remote(struct ntb_transport_qp *qp, unsigned int n,
>>> + unsigned int i)
>>> +{
>>> + return &ntb_edma_ring_remote(qp, n)->desc[i];
>>> +}
>>> +
>>> +static inline u32 *ntb_edma_head_local(struct ntb_transport_qp *qp,
>>> + unsigned int n)
>>> +{
>>> + return &ntb_edma_ring_local(qp, n)->head;
>>> +}
>>> +
>>> +static inline u32 __iomem *ntb_edma_head_remote(struct ntb_transport_qp *qp,
>>> + unsigned int n)
>>> +{
>>> + return &ntb_edma_ring_remote(qp, n)->head;
>>> +}
>>> +
>>> +static inline u32 *ntb_edma_tail_local(struct ntb_transport_qp *qp,
>>> + unsigned int n)
>>> +{
>>> + return &ntb_edma_ring_local(qp, n)->tail;
>>> +}
>>> +
>>> +static inline u32 __iomem *ntb_edma_tail_remote(struct ntb_transport_qp *qp,
>>> + unsigned int n)
>>> +{
>>> + return &ntb_edma_ring_remote(qp, n)->tail;
>>> +}
>>> +
>>> +/* The 'i' must be generated by ntb_edma_ring_idx() */
>>> +#define NTB_DESC_TX_O(qp, i) ntb_edma_desc_remote(qp, 0, i)
>>> +#define NTB_DESC_TX_I(qp, i) ntb_edma_desc_local(qp, 0, i)
>>> +#define NTB_DESC_RX_O(qp, i) ntb_edma_desc_remote(qp, 1, i)
>>> +#define NTB_DESC_RX_I(qp, i) ntb_edma_desc_local(qp, 1, i)
>>> +
>>> +#define NTB_HEAD_TX_I(qp) ntb_edma_head_local(qp, 0)
>>> +#define NTB_HEAD_RX_O(qp) ntb_edma_head_remote(qp, 1)
>>> +
>>> +#define NTB_TAIL_TX_O(qp) ntb_edma_tail_remote(qp, 0)
>>> +#define NTB_TAIL_RX_I(qp) ntb_edma_tail_local(qp, 1)
>>> +
>>> +/* ntb_edma_ring helpers */
>>> +static __always_inline u32 ntb_edma_ring_idx(u32 v)
>>> +{
>>> + return v & NTB_EDMA_RING_MASK;
>>> +}
>>> +
>>> +static __always_inline u32 ntb_edma_ring_used_entry(u32 head, u32 tail)
>>> +{
>>> + if (head >= tail) {
>>> + WARN_ON_ONCE((head - tail) > (NTB_EDMA_RING_ENTRIES - 1));
>>> + return head - tail;
>>> + }
>>> +
>>> + WARN_ON_ONCE((U32_MAX - tail + head + 1) > (NTB_EDMA_RING_ENTRIES - 1));
>>> + return U32_MAX - tail + head + 1;
>>> +}
>>> +
>>> +static __always_inline u32 ntb_edma_ring_free_entry(u32 head, u32 tail)
>>> +{
>>> + return NTB_EDMA_RING_ENTRIES - ntb_edma_ring_used_entry(head, tail) - 1;
>>> +}
>>> +
>>> +static __always_inline bool ntb_edma_ring_full(u32 head, u32 tail)
>>> +{
>>> + return ntb_edma_ring_free_entry(head, tail) == 0;
>>> +}
>>> +
>>> +static unsigned int ntb_transport_edma_tx_free_entry(struct ntb_transport_qp *qp)
>>> +{
>>> + struct ntb_transport_qp_edma *edma = qp->priv;
>>> + unsigned int head, tail;
>>> +
>>> + scoped_guard(spinlock_irqsave, &edma->tx_lock) {
>>> + /* In this scope, only 'head' might proceed */
>>> + tail = READ_ONCE(edma->tx_issue);
>>> + head = READ_ONCE(*NTB_HEAD_TX_I(qp));
>>> + }
>>> + /*
>>> + * 'used' amount indicates how much the other end has refilled,
>>> + * which are available for us to use for TX.
>>> + */
>>> + return ntb_edma_ring_used_entry(head, tail);
>>> +}
>>> +
>>> +static void ntb_transport_edma_debugfs_stats_show(struct seq_file *s,
>>> + struct ntb_transport_qp *qp)
>>> +{
>>> + seq_printf(s, "rx_bytes - \t%llu\n", qp->rx_bytes);
>>> + seq_printf(s, "rx_pkts - \t%llu\n", qp->rx_pkts);
>>> + seq_printf(s, "rx_err_no_buf - %llu\n", qp->rx_err_no_buf);
>>> + seq_printf(s, "rx_buff - \t0x%p\n", qp->rx_buff);
>>> + seq_printf(s, "rx_max_entry - \t%u\n", qp->rx_max_entry);
>>> + seq_printf(s, "rx_alloc_entry - \t%u\n\n", qp->rx_alloc_entry);
>>> +
>>> + seq_printf(s, "tx_bytes - \t%llu\n", qp->tx_bytes);
>>> + seq_printf(s, "tx_pkts - \t%llu\n", qp->tx_pkts);
>>> + seq_printf(s, "tx_ring_full - \t%llu\n", qp->tx_ring_full);
>>> + seq_printf(s, "tx_err_no_buf - %llu\n", qp->tx_err_no_buf);
>>> + seq_printf(s, "tx_mw - \t0x%p\n", qp->tx_mw);
>>> + seq_printf(s, "tx_max_entry - \t%u\n", qp->tx_max_entry);
>>> + seq_printf(s, "free tx - \t%u\n", ntb_transport_tx_free_entry(qp));
>>> + seq_putc(s, '\n');
>>> +
>>> + seq_puts(s, "Using Remote eDMA - Yes\n");
>>> + seq_printf(s, "QP Link - \t%s\n", qp->link_is_up ? "Up" : "Down");
>>> +}
>>> +
>>> +static void ntb_transport_edma_uninit(struct ntb_transport_ctx *nt)
>>> +{
>>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
>>> +
>>> + if (ctx->wq)
>>> + destroy_workqueue(ctx->wq);
>>> + ctx->wq = NULL;
>>> +
>>> + ntb_edma_teardown_chans(&ctx->chans);
>>> +
>>> + switch (ctx->remote_edma_mode) {
>>> + case REMOTE_EDMA_EP:
>>> + ntb_edma_teardown_mws(nt->ndev);
>>> + break;
>>> + case REMOTE_EDMA_RC:
>>> + ntb_edma_teardown_peer(nt->ndev);
>>> + break;
>>> + case REMOTE_EDMA_UNKNOWN:
>>> + default:
>>> + break;
>>> + }
>>> +
>>> + ctx->remote_edma_mode = REMOTE_EDMA_UNKNOWN;
>>> +}
>>> +
>>> +static void ntb_transport_edma_db_work(struct work_struct *work)
>>> +{
>>> + struct ntb_transport_qp_edma *edma =
>>> + container_of(work, struct ntb_transport_qp_edma, db_work);
>>> + struct ntb_transport_qp *qp = edma->qp;
>>> +
>>> + ntb_peer_db_set(qp->ndev, qp->qp_bit);
>>> +}
>>> +
>>> +static void ntb_transport_edma_notify_peer(struct ntb_transport_qp_edma *edma)
>>> +{
>>> + struct ntb_transport_qp *qp = edma->qp;
>>> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
>>> +
>>> + if (!ntb_edma_notify_peer(&ctx->chans, qp->qp_num))
>>> + return;
>>> +
>>> + /*
>>> + * Called from contexts that may be atomic. Since ntb_peer_db_set()
>>> + * may sleep, delegate the actual doorbell write to a workqueue.
>>> + */
>>> + queue_work(system_highpri_wq, &edma->db_work);
>>> +}
>>> +
>>> +static void ntb_transport_edma_isr(void *data, int qp_num)
>>> +{
>>> + struct ntb_transport_ctx *nt = data;
>>> + struct ntb_transport_qp_edma *edma;
>>> + struct ntb_transport_ctx_edma *ctx;
>>> + struct ntb_transport_qp *qp;
>>> +
>>> + if (qp_num < 0 || qp_num >= nt->qp_count)
>>> + return;
>>> +
>>> + qp = &nt->qp_vec[qp_num];
>>> + if (WARN_ON(!qp))
>>> + return;
>>> +
>>> + ctx = (struct ntb_transport_ctx_edma *)qp->transport->priv;
>>> + edma = qp->priv;
>>> +
>>> + queue_work(ctx->wq, &edma->rx_work);
>>> + queue_work(ctx->wq, &edma->tx_work);
>>> +}
>>> +
>>> +static int ntb_transport_edma_rc_init(struct ntb_transport_ctx *nt)
>>> +{
>>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
>>> + struct ntb_dev *ndev = nt->ndev;
>>> + struct pci_dev *pdev = ndev->pdev;
>>> + int peer_mw;
>>> + int rc;
>>> +
>>> + if (!use_remote_edma || ctx->remote_edma_mode != REMOTE_EDMA_UNKNOWN)
>>> + return 0;
>>> +
>>> + peer_mw = ntb_peer_mw_count(ndev);
>>> + if (peer_mw <= 0)
>>> + return -ENODEV;
>>> +
>>> + rc = ntb_edma_setup_peer(ndev, peer_mw - 1, nt->qp_count);
>>> + if (rc) {
>>> + dev_err(&pdev->dev, "Failed to enable remote eDMA: %d\n", rc);
>>> + return rc;
>>> + }
>>> +
>>> + rc = ntb_edma_setup_chans(get_dma_dev(ndev), &ctx->chans, true);
>>> + if (rc) {
>>> + dev_err(&pdev->dev, "Failed to setup eDMA channels: %d\n", rc);
>>> + goto err_teardown_peer;
>>> + }
>>> +
>>> + rc = ntb_edma_setup_intr_chan(get_dma_dev(ndev), &ctx->chans);
>>> + if (rc) {
>>> + dev_err(&pdev->dev, "Failed to setup eDMA notify channel: %d\n",
>>> + rc);
>>> + goto err_teardown_chans;
>>> + }
>>> +
>>> + ctx->remote_edma_mode = REMOTE_EDMA_RC;
>>> + return 0;
>>> +
>>> +err_teardown_chans:
>>> + ntb_edma_teardown_chans(&ctx->chans);
>>> +err_teardown_peer:
>>> + ntb_edma_teardown_peer(ndev);
>>> + return rc;
>>> +}
>>> +
>>> +static int ntb_transport_edma_ep_init(struct ntb_transport_ctx *nt)
>>> +{
>>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
>>> + struct ntb_dev *ndev = nt->ndev;
>>> + struct pci_dev *pdev = ndev->pdev;
>>> + int peer_mw;
>>> + int rc;
>>> +
>>> + if (!use_remote_edma || ctx->remote_edma_mode == REMOTE_EDMA_EP)
>>> + return 0;
>>> +
>>> + /*
>>> + * This check assumes that the endpoint (pci-epf-vntb.c)
>>> + * ntb_dev_ops implements .get_private_data() while the host side
>>> + * (ntb_hw_epf.c) does not.
>>> + */
>>> + if (!ntb_get_private_data(ndev))
>>> + return 0;
>>> +
>>> + peer_mw = ntb_peer_mw_count(ndev);
>>> + if (peer_mw <= 0)
>>> + return -ENODEV;
>>> +
>>> + rc = ntb_edma_setup_mws(ndev, peer_mw - 1, nt->qp_count,
>>> + ntb_transport_edma_isr, nt);
>>> + if (rc) {
>>> + dev_err(&pdev->dev,
>>> + "Failed to set up memory window for eDMA: %d\n", rc);
>>> + return rc;
>>> + }
>>> +
>>> + rc = ntb_edma_setup_chans(get_dma_dev(ndev), &ctx->chans, false);
>>> + if (rc) {
>>> + dev_err(&pdev->dev, "Failed to setup eDMA channels: %d\n", rc);
>>> + ntb_edma_teardown_mws(ndev);
>>> + return rc;
>>> + }
>>> +
>>> + ctx->remote_edma_mode = REMOTE_EDMA_EP;
>>> + return 0;
>>> +}
>>> +
>>> +static int ntb_transport_edma_setup_qp_mw(struct ntb_transport_ctx *nt,
>>> + unsigned int qp_num)
>>> +{
>>> + struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
>>> + struct ntb_dev *ndev = nt->ndev;
>>> + struct ntb_queue_entry *entry;
>>> + struct ntb_transport_mw *mw;
>>> + unsigned int mw_num, mw_count, qp_count;
>>> + unsigned int qp_offset, rx_info_offset;
>>> + unsigned int mw_size, mw_size_per_qp;
>>> + unsigned int num_qps_mw;
>>> + size_t edma_total;
>>> + unsigned int i;
>>> + int node;
>>> +
>>> + mw_count = nt->mw_count;
>>> + qp_count = nt->qp_count;
>>> +
>>> + mw_num = QP_TO_MW(nt, qp_num);
>>> + mw = &nt->mw_vec[mw_num];
>>> +
>>> + if (!mw->virt_addr)
>>> + return -ENOMEM;
>>> +
>>> + if (mw_num < qp_count % mw_count)
>>> + num_qps_mw = qp_count / mw_count + 1;
>>> + else
>>> + num_qps_mw = qp_count / mw_count;
>>> +
>>> + mw_size = min(nt->mw_vec[mw_num].phys_size, mw->xlat_size);
>>> + if (max_mw_size && mw_size > max_mw_size)
>>> + mw_size = max_mw_size;
>>> +
>>> + mw_size_per_qp = round_down((unsigned int)mw_size / num_qps_mw, SZ_64);
>>> + qp_offset = mw_size_per_qp * (qp_num / mw_count);
>>> + rx_info_offset = mw_size_per_qp - sizeof(struct ntb_rx_info);
>>> +
>>> + qp->tx_mw_size = mw_size_per_qp;
>>> + qp->tx_mw = nt->mw_vec[mw_num].vbase + qp_offset;
>>> + if (!qp->tx_mw)
>>> + return -EINVAL;
>>> + qp->tx_mw_phys = nt->mw_vec[mw_num].phys_addr + qp_offset;
>>> + if (!qp->tx_mw_phys)
>>> + return -EINVAL;
>>> + qp->rx_info = qp->tx_mw + rx_info_offset;
>>> + qp->rx_buff = mw->virt_addr + qp_offset;
>>> + qp->remote_rx_info = qp->rx_buff + rx_info_offset;
>>> +
>>> + /* Due to housekeeping, there must be at least 2 buffs */
>>> + qp->tx_max_frame = min(transport_mtu, mw_size_per_qp / 2);
>>> + qp->rx_max_frame = min(transport_mtu, mw_size_per_qp / 2);
>>> +
>>> + /* In eDMA mode, decouple from MW sizing and force ring-sized entries */
>>> + edma_total = 2 * sizeof(struct ntb_edma_ring);
>>> + if (rx_info_offset < edma_total) {
>>> + dev_err(&ndev->dev, "Ring space requires %zuB (have only %uB)\n",
>>> + edma_total, rx_info_offset);
>>> + return -EINVAL;
>>> + }
>>> + qp->tx_max_entry = NTB_EDMA_RING_ENTRIES;
>>> + qp->rx_max_entry = NTB_EDMA_RING_ENTRIES;
>>> +
>>> + /*
>>> + * Checking to see if we have more entries than the default.
>>> + * We should add additional entries if that is the case so we
>>> + * can be in sync with the transport frames.
>>> + */
>>> + node = dev_to_node(&ndev->dev);
>>> + for (i = qp->rx_alloc_entry; i < qp->rx_max_entry; i++) {
>>> + entry = kzalloc_node(sizeof(*entry), GFP_KERNEL, node);
>>> + if (!entry)
>>> + return -ENOMEM;
>>> +
>>> + entry->qp = qp;
>>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry,
>>> + &qp->rx_free_q);
>>> + qp->rx_alloc_entry++;
>>> + }
>>> +
>>> + memset(qp->rx_buff, 0, edma_total);
>>> +
>>> + qp->rx_pkts = 0;
>>> + qp->tx_pkts = 0;
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static int ntb_transport_edma_rx_complete(struct ntb_transport_qp *qp)
>>> +{
>>> + struct device *dma_dev = get_dma_dev(qp->ndev);
>>> + struct ntb_transport_qp_edma *edma = qp->priv;
>>> + struct ntb_queue_entry *entry;
>>> + struct ntb_edma_desc *in;
>>> + unsigned int len;
>>> + bool link_down;
>>> + u32 idx;
>>> +
>>> + if (ntb_edma_ring_used_entry(READ_ONCE(*NTB_TAIL_RX_I(qp)),
>>> + edma->rx_cons) == 0)
>>> + return 0;
>>> +
>>> + idx = ntb_edma_ring_idx(edma->rx_cons);
>>> + in = NTB_DESC_RX_I(qp, idx);
>>> + if (!(in->flags & DESC_DONE_FLAG))
>>> + return 0;
>>> +
>>> + link_down = in->flags & LINK_DOWN_FLAG;
>>> + in->flags = 0;
>>> + len = in->len; /* might be smaller than entry->len */
>>> +
>>> + entry = (struct ntb_queue_entry *)(uintptr_t)in->data;
>>> + if (WARN_ON(!entry))
>>> + return 0;
>>> +
>>> + if (link_down) {
>>> + ntb_qp_link_down(qp);
>>> + edma->rx_cons++;
>>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_free_q);
>>> + return 1;
>>> + }
>>> +
>>> + dma_unmap_single(dma_dev, entry->addr, entry->len, DMA_FROM_DEVICE);
>>> +
>>> + qp->rx_bytes += len;
>>> + qp->rx_pkts++;
>>> + edma->rx_cons++;
>>> +
>>> + if (qp->rx_handler && qp->client_ready)
>>> + qp->rx_handler(qp, qp->cb_data, entry->cb_data, len);
>>> +
>>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_free_q);
>>> + return 1;
>>> +}
>>> +
>>> +static void ntb_transport_edma_rx_work(struct work_struct *work)
>>> +{
>>> + struct ntb_transport_qp_edma *edma = container_of(
>>> + work, struct ntb_transport_qp_edma, rx_work);
>>> + struct ntb_transport_qp *qp = edma->qp;
>>> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
>>> + unsigned int i;
>>> +
>>> + for (i = 0; i < NTB_EDMA_MAX_POLL; i++) {
>>> + if (!ntb_transport_edma_rx_complete(qp))
>>> + break;
>>> + }
>>> +
>>> + if (ntb_transport_edma_rx_complete(qp))
>>> + queue_work(ctx->wq, &edma->rx_work);
>>> +}
>>> +
>>> +static void ntb_transport_edma_tx_work(struct work_struct *work)
>>> +{
>>> + struct ntb_transport_qp_edma *edma = container_of(
>>> + work, struct ntb_transport_qp_edma, tx_work);
>>> + struct ntb_transport_qp *qp = edma->qp;
>>> + struct ntb_edma_desc *in, __iomem *out;
>>> + struct ntb_queue_entry *entry;
>>> + unsigned int len;
>>> + void *cb_data;
>>> + u32 idx;
>>> +
>>> + while (ntb_edma_ring_used_entry(READ_ONCE(edma->tx_issue),
>>> + edma->tx_cons) != 0) {
>>> + /* Paired with smp_wmb() in ntb_transport_edma_tx_enqueue_inner() */
>>> + smp_rmb();
>>> +
>>> + idx = ntb_edma_ring_idx(edma->tx_cons);
>>> + in = NTB_DESC_TX_I(qp, idx);
>>> + entry = (struct ntb_queue_entry *)(uintptr_t)in->data;
>>> + if (!entry || !(entry->flags & DESC_DONE_FLAG))
>>> + break;
>>> +
>>> + in->data = 0;
>>> +
>>> + cb_data = entry->cb_data;
>>> + len = entry->len;
>>> +
>>> + out = NTB_DESC_TX_O(qp, idx);
>>> +
>>> + WRITE_ONCE(edma->tx_cons, edma->tx_cons + 1);
>>> +
>>> + /*
>>> + * No need to add barrier in-between to enforce ordering here.
>>> + * The other side proceeds only after both flags and tail are
>>> + * updated.
>>> + */
>>> + iowrite32(entry->flags, &out->flags);
>>> + iowrite32(edma->tx_cons, NTB_TAIL_TX_O(qp));
>>> +
>>> + ntb_transport_edma_notify_peer(edma);
>>> +
>>> + ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry,
>>> + &qp->tx_free_q);
>>> +
>>> + if (qp->tx_handler)
>>> + qp->tx_handler(qp, qp->cb_data, cb_data, len);
>>> +
>>> + /* stat updates */
>>> + qp->tx_bytes += len;
>>> + qp->tx_pkts++;
>>> + }
>>> +}
>>> +
>>> +static void ntb_transport_edma_tx_cb(void *data,
>>> + const struct dmaengine_result *res)
>>> +{
>>> + struct ntb_queue_entry *entry = data;
>>> + struct ntb_transport_qp *qp = entry->qp;
>>> + struct ntb_transport_ctx *nt = qp->transport;
>>> + struct device *dma_dev = get_dma_dev(qp->ndev);
>>> + enum dmaengine_tx_result dma_err = res->result;
>>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
>>> + struct ntb_transport_qp_edma *edma = qp->priv;
>>> +
>>> + switch (dma_err) {
>>> + case DMA_TRANS_READ_FAILED:
>>> + case DMA_TRANS_WRITE_FAILED:
>>> + case DMA_TRANS_ABORTED:
>>> + entry->errors++;
>>> + entry->len = -EIO;
>>> + break;
>>> + case DMA_TRANS_NOERROR:
>>> + default:
>>> + break;
>>> + }
>>> + dma_unmap_sg(dma_dev, &entry->sgl, 1, DMA_TO_DEVICE);
>>> + sg_dma_address(&entry->sgl) = 0;
>>> +
>>> + entry->flags |= DESC_DONE_FLAG;
>>> +
>>> + queue_work(ctx->wq, &edma->tx_work);
>>> +}
>>> +
>>> +static int ntb_transport_edma_submit(struct device *d, struct dma_chan *chan,
>>> + size_t len, void *rc_src, dma_addr_t dst,
>>> + struct ntb_queue_entry *entry)
>>> +{
>>> + struct scatterlist *sgl = &entry->sgl;
>>> + struct dma_async_tx_descriptor *txd;
>>> + struct dma_slave_config cfg;
>>> + dma_cookie_t cookie;
>>> + int nents, rc;
>>> +
>>> + if (!d)
>>> + return -ENODEV;
>>> +
>>> + if (!chan)
>>> + return -ENXIO;
>>> +
>>> + if (WARN_ON(!rc_src || !dst))
>>> + return -EINVAL;
>>> +
>>> + if (WARN_ON(sg_dma_address(sgl)))
>>> + return -EINVAL;
>>> +
>>> + sg_init_one(sgl, rc_src, len);
>>> + nents = dma_map_sg(d, sgl, 1, DMA_TO_DEVICE);
>>> + if (nents <= 0)
>>> + return -EIO;
>>> +
>>> + memset(&cfg, 0, sizeof(cfg));
>>> + cfg.dst_addr = dst;
>>> + cfg.src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
>>> + cfg.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
>>> + cfg.direction = DMA_MEM_TO_DEV;
>>> +
>>> + txd = dmaengine_prep_slave_sg_config(chan, sgl, 1, DMA_MEM_TO_DEV,
>>> + DMA_CTRL_ACK | DMA_PREP_INTERRUPT, &cfg);
>>> + if (!txd) {
>>> + rc = -EIO;
>>> + goto out_unmap;
>>> + }
>>> +
>>> + txd->callback_result = ntb_transport_edma_tx_cb;
>>> + txd->callback_param = entry;
>>> +
>>> + cookie = dmaengine_submit(txd);
>>> + if (dma_submit_error(cookie)) {
>>> + rc = -EIO;
>>> + goto out_unmap;
>>> + }
>>> + dma_async_issue_pending(chan);
>>> + return 0;
>>> +out_unmap:
>>> + dma_unmap_sg(d, sgl, 1, DMA_TO_DEVICE);
>>> + return rc;
>>> +}
>>> +
>>> +static int ntb_transport_edma_tx_enqueue_inner(struct ntb_transport_qp *qp,
>>> + struct ntb_queue_entry *entry)
>>> +{
>>> + struct device *dma_dev = get_dma_dev(qp->ndev);
>>> + struct ntb_transport_qp_edma *edma = qp->priv;
>>> + struct ntb_transport_ctx *nt = qp->transport;
>>> + struct ntb_edma_desc *in, __iomem *out;
>>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
>>> + unsigned int len = entry->len;
>>> + struct dma_chan *chan;
>>> + u32 issue, idx, head;
>>> + dma_addr_t dst;
>>> + int rc;
>>> +
>>> + WARN_ON_ONCE(entry->flags & DESC_DONE_FLAG);
>>> +
>>> + scoped_guard(spinlock_irqsave, &edma->tx_lock) {
>>> + head = READ_ONCE(*NTB_HEAD_TX_I(qp));
>>> + issue = edma->tx_issue;
>>> + if (ntb_edma_ring_used_entry(head, issue) == 0) {
>>> + qp->tx_ring_full++;
>>> + return -ENOSPC;
>>> + }
>>> +
>>> + /*
>>> + * ntb_transport_edma_tx_work() checks entry->flags
>>> + * so it needs to be set before tx_issue++.
>>> + */
>>> + idx = ntb_edma_ring_idx(issue);
>>> + in = NTB_DESC_TX_I(qp, idx);
>>> + in->data = (uintptr_t)entry;
>>> +
>>> + /* Make in->data visible before tx_issue++ */
>>> + smp_wmb();
>>> +
>>> + WRITE_ONCE(edma->tx_issue, edma->tx_issue + 1);
>>> + }
>>> +
>>> + /* Publish the final transfer length to the other end */
>>> + out = NTB_DESC_TX_O(qp, idx);
>>> + iowrite32(len, &out->len);
>>> + ioread32(&out->len);
>>> +
>>> + if (unlikely(!len)) {
>>> + entry->flags |= DESC_DONE_FLAG;
>>> + queue_work(ctx->wq, &edma->tx_work);
>>> + return 0;
>>> + }
>>> +
>>> + /* Paired with dma_wmb() in ntb_transport_edma_rx_enqueue_inner() */
>>> + dma_rmb();
>>> +
>>> + /* kick remote eDMA read transfer */
>>> + dst = (dma_addr_t)in->addr;
>>> + chan = ntb_edma_pick_chan(&ctx->chans, qp->qp_num);
>>> + rc = ntb_transport_edma_submit(dma_dev, chan, len,
>>> + entry->buf, dst, entry);
>>> + if (rc) {
>>> + entry->errors++;
>>> + entry->len = -EIO;
>>> + entry->flags |= DESC_DONE_FLAG;
>>> + queue_work(ctx->wq, &edma->tx_work);
>>> + }
>>> + return 0;
>>> +}
>>> +
>>> +static int ntb_transport_edma_tx_enqueue(struct ntb_transport_qp *qp,
>>> + struct ntb_queue_entry *entry,
>>> + void *cb, void *data, unsigned int len,
>>> + unsigned int flags)
>>> +{
>>> + struct device *dma_dev;
>>> +
>>> + if (entry->addr) {
>>> + /* Deferred unmap */
>>> + dma_dev = get_dma_dev(qp->ndev);
>>> + dma_unmap_single(dma_dev, entry->addr, entry->len,
>>> + DMA_TO_DEVICE);
>>> + }
>>> +
>>> + entry->cb_data = cb;
>>> + entry->buf = data;
>>> + entry->len = len;
>>> + entry->flags = flags;
>>> + entry->errors = 0;
>>> + entry->addr = 0;
>>> +
>>> + WARN_ON_ONCE(!ntb_qp_edma_enabled(qp));
>>> +
>>> + return ntb_transport_edma_tx_enqueue_inner(qp, entry);
>>> +}
>>> +
>>> +static int ntb_transport_edma_rx_enqueue_inner(struct ntb_transport_qp *qp,
>>> + struct ntb_queue_entry *entry)
>>> +{
>>> + struct device *dma_dev = get_dma_dev(qp->ndev);
>>> + struct ntb_transport_qp_edma *edma = qp->priv;
>>> + struct ntb_edma_desc *in, __iomem *out;
>>> + unsigned int len = entry->len;
>>> + void *data = entry->buf;
>>> + dma_addr_t dst;
>>> + u32 idx;
>>> + int rc;
>>> +
>>> + dst = dma_map_single(dma_dev, data, len, DMA_FROM_DEVICE);
>>> + rc = dma_mapping_error(dma_dev, dst);
>>> + if (rc)
>>> + return rc;
>>> +
>>> + guard(spinlock_bh)(&edma->rx_lock);
>>> +
>>> + if (ntb_edma_ring_full(READ_ONCE(edma->rx_prod),
>>> + READ_ONCE(edma->rx_cons))) {
>>> + rc = -ENOSPC;
>>> + goto out_unmap;
>>> + }
>>> +
>>> + idx = ntb_edma_ring_idx(edma->rx_prod);
>>> + in = NTB_DESC_RX_I(qp, idx);
>>> + out = NTB_DESC_RX_O(qp, idx);
>>> +
>>> + iowrite32(len, &out->len);
>>> + iowrite64(dst, &out->addr);
>>> +
>>> + WARN_ON(in->flags & DESC_DONE_FLAG);
>>> + in->data = (uintptr_t)entry;
>>> + entry->addr = dst;
>>> +
>>> + /* Ensure len/addr are visible before the head update */
>>> + dma_wmb();
>>> +
>>> + WRITE_ONCE(edma->rx_prod, edma->rx_prod + 1);
>>> + iowrite32(edma->rx_prod, NTB_HEAD_RX_O(qp));
>>> +
>>> + return 0;
>>> +out_unmap:
>>> + dma_unmap_single(dma_dev, dst, len, DMA_FROM_DEVICE);
>>> + return rc;
>>> +}
>>> +
>>> +static int ntb_transport_edma_rx_enqueue(struct ntb_transport_qp *qp,
>>> + struct ntb_queue_entry *entry)
>>> +{
>>> + int rc;
>>> +
>>> + rc = ntb_transport_edma_rx_enqueue_inner(qp, entry);
>>> + if (rc) {
>>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry,
>>> + &qp->rx_free_q);
>>> + return rc;
>>> + }
>>> +
>>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_pend_q);
>>> +
>>> + if (qp->active)
>>> + tasklet_schedule(&qp->rxc_db_work);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static void ntb_transport_edma_rx_poll(struct ntb_transport_qp *qp)
>>> +{
>>> + struct ntb_transport_ctx *nt = qp->transport;
>>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
>>> + struct ntb_transport_qp_edma *edma = qp->priv;
>>> +
>>> + queue_work(ctx->wq, &edma->rx_work);
>>> + queue_work(ctx->wq, &edma->tx_work);
>>> +}
>>> +
>>> +static int ntb_transport_edma_qp_init(struct ntb_transport_ctx *nt,
>>> + unsigned int qp_num)
>>> +{
>>> + struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
>>> + struct ntb_transport_qp_edma *edma;
>>> + struct ntb_dev *ndev = nt->ndev;
>>> + int node;
>>> +
>>> + node = dev_to_node(&ndev->dev);
>>> +
>>> + qp->priv = kzalloc_node(sizeof(*edma), GFP_KERNEL, node);
>>> + if (!qp->priv)
>>> + return -ENOMEM;
>>> +
>>> + edma = (struct ntb_transport_qp_edma *)qp->priv;
>>> + edma->qp = qp;
>>> + edma->rx_prod = 0;
>>> + edma->rx_cons = 0;
>>> + edma->tx_cons = 0;
>>> + edma->tx_issue = 0;
>>> +
>>> + spin_lock_init(&edma->rx_lock);
>>> + spin_lock_init(&edma->tx_lock);
>>> +
>>> + INIT_WORK(&edma->db_work, ntb_transport_edma_db_work);
>>> + INIT_WORK(&edma->rx_work, ntb_transport_edma_rx_work);
>>> + INIT_WORK(&edma->tx_work, ntb_transport_edma_tx_work);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static void ntb_transport_edma_qp_free(struct ntb_transport_qp *qp)
>>> +{
>>> + struct ntb_transport_qp_edma *edma = qp->priv;
>>> +
>>> + cancel_work_sync(&edma->db_work);
>>> + cancel_work_sync(&edma->rx_work);
>>> + cancel_work_sync(&edma->tx_work);
>>> +
>>> + kfree(qp->priv);
>>> +}
>>> +
>>> +static int ntb_transport_edma_pre_link_up(struct ntb_transport_ctx *nt)
>>> +{
>>> + struct ntb_dev *ndev = nt->ndev;
>>> + struct pci_dev *pdev = ndev->pdev;
>>> + int rc;
>>> +
>>> + rc = ntb_transport_edma_ep_init(nt);
>>> + if (rc)
>>> + dev_err(&pdev->dev, "Failed to init EP: %d\n", rc);
>>> +
>>> + return rc;
>>> +}
>>> +
>>> +static int ntb_transport_edma_post_link_up(struct ntb_transport_ctx *nt)
>>> +{
>>> + struct ntb_dev *ndev = nt->ndev;
>>> + struct pci_dev *pdev = ndev->pdev;
>>> + int rc;
>>> +
>>> + rc = ntb_transport_edma_rc_init(nt);
>>> + if (rc)
>>> + dev_err(&pdev->dev, "Failed to init RC: %d\n", rc);
>>> +
>>> + return rc;
>>> +}
>>> +
>>> +static int ntb_transport_edma_enable(struct ntb_transport_ctx *nt,
>>> + unsigned int *mw_count)
>>> +{
>>> + struct ntb_dev *ndev = nt->ndev;
>>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
>>> +
>>> + if (!use_remote_edma)
>>> + return 0;
>>> +
>>> + /*
>>> + * We need at least one MW for the transport plus one MW reserved
>>> + * for the remote eDMA window (see ntb_edma_setup_mws/peer).
>>> + */
>>> + if (*mw_count <= 1) {
>>> + dev_err(&ndev->dev,
>>> + "remote eDMA requires at least two MWs (have %u)\n",
>>> + *mw_count);
>>> + return -ENODEV;
>>> + }
>>> +
>>> + ctx->wq = alloc_workqueue("ntb-edma-wq", WQ_UNBOUND | WQ_SYSFS, 0);
>>> + if (!ctx->wq) {
>>> + ntb_transport_edma_uninit(nt);
>>> + return -ENOMEM;
>>> + }
>>> +
>>> + /* Reserve the last peer MW exclusively for the eDMA window. */
>>> + *mw_count -= 1;
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static void ntb_transport_edma_disable(struct ntb_transport_ctx *nt)
>>> +{
>>> + ntb_transport_edma_uninit(nt);
>>> +}
>>> +
>>> +static const struct ntb_transport_backend_ops edma_backend_ops = {
>>> + .enable = ntb_transport_edma_enable,
>>> + .disable = ntb_transport_edma_disable,
>>> + .qp_init = ntb_transport_edma_qp_init,
>>> + .qp_free = ntb_transport_edma_qp_free,
>>> + .pre_link_up = ntb_transport_edma_pre_link_up,
>>> + .post_link_up = ntb_transport_edma_post_link_up,
>>> + .setup_qp_mw = ntb_transport_edma_setup_qp_mw,
>>> + .tx_free_entry = ntb_transport_edma_tx_free_entry,
>>> + .tx_enqueue = ntb_transport_edma_tx_enqueue,
>>> + .rx_enqueue = ntb_transport_edma_rx_enqueue,
>>> + .rx_poll = ntb_transport_edma_rx_poll,
>>> + .debugfs_stats_show = ntb_transport_edma_debugfs_stats_show,
>>> +};
>>> +
>>> +int ntb_transport_edma_init(struct ntb_transport_ctx *nt)
>>> +{
>>> + struct ntb_dev *ndev = nt->ndev;
>>> + int node;
>>> +
>>> + node = dev_to_node(&ndev->dev);
>>> + nt->priv = kzalloc_node(sizeof(struct ntb_transport_ctx_edma), GFP_KERNEL,
>>> + node);
>>> + if (!nt->priv)
>>> + return -ENOMEM;
>>> +
>>> + nt->backend_ops = edma_backend_ops;
>>> + /*
>>> + * In remote eDMA mode, one DMA read channel is used by the host
>>> + * side to interrupt the EP.
>>> + */
>>> + use_msi = false;
>>> + return 0;
>>> +}
>>> diff --git a/drivers/ntb/ntb_transport_internal.h b/drivers/ntb/ntb_transport_internal.h
>>> index 51ff08062d73..9fff65980d3d 100644
>>> --- a/drivers/ntb/ntb_transport_internal.h
>>> +++ b/drivers/ntb/ntb_transport_internal.h
>>> @@ -8,6 +8,7 @@
>>> extern unsigned long max_mw_size;
>>> extern unsigned int transport_mtu;
>>> extern bool use_msi;
>>> +extern bool use_remote_edma;
>>>
>>> #define QP_TO_MW(nt, qp) ((qp) % nt->mw_count)
>>>
>>> @@ -29,6 +30,11 @@ struct ntb_queue_entry {
>>> struct ntb_payload_header __iomem *tx_hdr;
>>> struct ntb_payload_header *rx_hdr;
>>> };
>>> +
>>> +#ifdef CONFIG_NTB_TRANSPORT_EDMA
>>> + dma_addr_t addr;
>>> + struct scatterlist sgl;
>>> +#endif
>>> };
>>>
>>> struct ntb_rx_info {
>>> @@ -202,4 +208,13 @@ int ntb_transport_init_queue(struct ntb_transport_ctx *nt,
>>> unsigned int qp_num);
>>> struct device *get_dma_dev(struct ntb_dev *ndev);
>>>
>>> +#ifdef CONFIG_NTB_TRANSPORT_EDMA
>>> +int ntb_transport_edma_init(struct ntb_transport_ctx *nt);
>>> +#else
>>> +static inline int ntb_transport_edma_init(struct ntb_transport_ctx *nt)
>>> +{
>>> + return -EOPNOTSUPP;
>>> +}
>>> +#endif /* CONFIG_NTB_TRANSPORT_EDMA */
>>> +
>>> #endif /* _NTB_TRANSPORT_INTERNAL_H_ */
* Re: [RFC PATCH v3 26/35] NTB: ntb_transport: Introduce DW eDMA backed transport mode
2026-01-07 19:02 ` Dave Jiang
@ 2026-01-08 1:25 ` Koichiro Den
2026-01-08 17:55 ` Dave Jiang
0 siblings, 1 reply; 61+ messages in thread
From: Koichiro Den @ 2026-01-08 1:25 UTC (permalink / raw)
To: Dave Jiang
Cc: Frank.Li, ntb, linux-pci, dmaengine, linux-renesas-soc, netdev,
linux-kernel, mani, kwilczynski, kishon, bhelgaas, corbet,
geert+renesas, magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro,
will, robin.murphy, jdmason, allenbh, andrew+netdev, davem,
edumazet, kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k,
kurt.schwemmer, logang, jingoohan1, lpieralisi, utkarsh02t,
jbrunet, dlemoal, arnd, elfring
On Wed, Jan 07, 2026 at 12:02:15PM -0700, Dave Jiang wrote:
>
>
> On 1/7/26 7:54 AM, Koichiro Den wrote:
> > On Tue, Jan 06, 2026 at 11:51:03AM -0700, Dave Jiang wrote:
> >>
> >>
> >> On 12/17/25 8:16 AM, Koichiro Den wrote:
> >>> Add a new ntb_transport backend that uses a DesignWare eDMA engine
> >>> located on the endpoint, to be driven by both host and endpoint.
> >>>
> >>> The endpoint exposes a dedicated memory window which contains the eDMA
> >>> register block, a small control structure (struct ntb_edma_info) and
> >>> per-channel linked-list (LL) rings for read channels. Endpoint drives
> >>> its local eDMA write channels for its transmission, while host side
> >>> uses the remote eDMA read channels for its transmission.
> >>>
> >>> A key benefit of this backend is that the memory window no longer needs
> >>> to carry data-plane payload. This makes the design less sensitive to
> >>> limited memory window space and allows scaling to multiple queue pairs.
> >>> The memory window layout is specific to the eDMA-backed backend, so
> >>> there is no automatic fallback to the memcpy-based default transport
> >>> which requires a different layout.
> >>>
> >>> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> >>> ---
> >>> drivers/ntb/Kconfig | 12 +
> >>> drivers/ntb/Makefile | 2 +
> >>> drivers/ntb/ntb_transport_core.c | 15 +-
> >>> drivers/ntb/ntb_transport_edma.c | 987 +++++++++++++++++++++++++++
> >>> drivers/ntb/ntb_transport_internal.h | 15 +
> >>> 5 files changed, 1029 insertions(+), 2 deletions(-)
> >>> create mode 100644 drivers/ntb/ntb_transport_edma.c
> >>>
> >>> diff --git a/drivers/ntb/Kconfig b/drivers/ntb/Kconfig
> >>> index df16c755b4da..5ba6d0b7f5ba 100644
> >>> --- a/drivers/ntb/Kconfig
> >>> +++ b/drivers/ntb/Kconfig
> >>> @@ -37,4 +37,16 @@ config NTB_TRANSPORT
> >>>
> >>> If unsure, say N.
> >>>
> >>> +config NTB_TRANSPORT_EDMA
> >>> + bool "NTB Transport backed by remote eDMA"
> >>> + depends on NTB_TRANSPORT
> >>> + depends on PCI
> >>> + select DMA_ENGINE
> >>> + select NTB_EDMA
> >>> + help
> >>> + Enable a transport backend that uses a remote DesignWare eDMA engine
> >>> + exposed through a dedicated NTB memory window. The host uses the
> >>> + endpoint's eDMA engine to move data in both directions.
> >>> + Say Y here if you intend to use the 'use_remote_edma' module parameter.
> >>> +
> >>> endif # NTB
> >>> diff --git a/drivers/ntb/Makefile b/drivers/ntb/Makefile
> >>> index 9b66e5fafbc0..b9086b32ecde 100644
> >>> --- a/drivers/ntb/Makefile
> >>> +++ b/drivers/ntb/Makefile
> >>> @@ -6,3 +6,5 @@ ntb-y := core.o
> >>> ntb-$(CONFIG_NTB_MSI) += msi.o
> >>>
> >>> ntb_transport-y := ntb_transport_core.o
> >>> +ntb_transport-$(CONFIG_NTB_TRANSPORT_EDMA) += ntb_transport_edma.o
> >>> +ntb_transport-$(CONFIG_NTB_TRANSPORT_EDMA) += hw/edma/ntb_hw_edma.o
> >>> diff --git a/drivers/ntb/ntb_transport_core.c b/drivers/ntb/ntb_transport_core.c
> >>> index 40c2548f5930..bd21232f26fe 100644
> >>> --- a/drivers/ntb/ntb_transport_core.c
> >>> +++ b/drivers/ntb/ntb_transport_core.c
> >>> @@ -104,6 +104,12 @@ module_param(use_msi, bool, 0644);
> >>> MODULE_PARM_DESC(use_msi, "Use MSI interrupts instead of doorbells");
> >>> #endif
> >>>
> >>> +bool use_remote_edma;
> >>> +#ifdef CONFIG_NTB_TRANSPORT_EDMA
> >>> +module_param(use_remote_edma, bool, 0644);
> >>> +MODULE_PARM_DESC(use_remote_edma, "Use remote eDMA mode (when enabled, use_msi is ignored)");
> >>> +#endif
> >>
> >> This seems clunky. Can the ntb_transport_core determine this when the things are called through ntb_transport_edma? Or maybe a set_transport_type can be introduced by the transport itself during allocation?
> >
> > Agreed. I plan to drop 'use_remote_edma' and instead,
> > - add a module parameter: transport_type={"default","edma"} (defaulting to "default"),
> > - introduce ntb_transport_backend_register() for transports to self-register via
> > struct ntb_transport_backend { .name, .ops }, and
> > - have the core select the backend whose .name matches transport_type.
> >
> > I think this should keep any non-default transport-specific logic out of
> > ntb_transport_core, or at least keep it to a minimum, while still allowing
> > non-default transports (*ntb_transport_edma is the only choice for now
> > though) to plug in cleanly.
> >
> > If you see a cleaner approach, I would appreciate it if you could elaborate
> > a bit more on your idea.
>
Thank you for the comment, let me respond inline below.
> Do you think it's flexible enough that we can determine a transport type per 'ntb_transport_mw' or is this an all or nothing type of thing?
At least in the current implementation, the remote eDMA use is an
all-or-nothing type rather than something that can be selected per
ntb_transport_mw.
The way remote eDMA consumes MWs is quite similar to how ntb_msi uses them
today. Assuming multiple MWs are available, the last MW is reserved to
expose the remote eDMA info/register/LL regions to the host by packing all
of them into a single MW. In that sense, it does not map naturally to a
per-MW selection model.
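For illustration, the packed single-MW layout described above could be
computed along these lines. The region sizes and helper names here are
hypothetical sketches, not taken from the series:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical region sizes, for illustration only. */
#define NTB_EDMA_INFO_SIZE	0x100U	/* control structure (ntb_edma_info) */
#define NTB_EDMA_REG_SIZE	0x1000U	/* eDMA register block */
#define NTB_EDMA_LL_SIZE	0x800U	/* one read channel's LL ring */

/* Offsets of the regions packed into the (last) reserved MW. */
static size_t ntb_edma_mw_info_off(void)
{
	return 0;
}

static size_t ntb_edma_mw_reg_off(void)
{
	return NTB_EDMA_INFO_SIZE;
}

static size_t ntb_edma_mw_ll_off(unsigned int chan)
{
	return NTB_EDMA_INFO_SIZE + NTB_EDMA_REG_SIZE +
	       (size_t)chan * NTB_EDMA_LL_SIZE;
}

/* Total MW space needed for nchans read channels. */
static size_t ntb_edma_mw_size(unsigned int nchans)
{
	return ntb_edma_mw_ll_off(nchans);
}
```

In that sense it mirrors how ntb_msi claims an MW for its own metadata:
the transport core simply sees one fewer usable MW.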
> I'm trying to see if we can do away with the module param.
I think it is useful to keep an explicit way for an administrator to choose
the transport type (default vs edma). Even on platforms where dw-edma is
available, there can potentially be platform-specific or hard-to-reproduce
issues (e.g. problems that only show up with certain transfer patterns),
and having a way to fall back to the long-existing traditional transport can
be valuable.
That said, I am not opposed to making the default behavior an automatic
selection, where edma is chosen when it's available and the parameter is
left unset.
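For concreteness, the self-registration plus selection I have in mind
could be sketched roughly as follows. Every name here is a hypothetical
illustration of the proposal, not code from the series:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for a self-registering transport backend. */
struct ntb_transport_backend {
	const char *name;
	const void *ops;	/* would be the backend's ops table */
};

#define NTB_MAX_BACKENDS 4
static const struct ntb_transport_backend *ntb_backends[NTB_MAX_BACKENDS];
static int ntb_nr_backends;

static int ntb_transport_backend_register(const struct ntb_transport_backend *be)
{
	if (ntb_nr_backends >= NTB_MAX_BACKENDS)
		return -1;	/* -ENOSPC in kernel terms */
	ntb_backends[ntb_nr_backends++] = be;
	return 0;
}

/*
 * Core-side selection: match transport_type, falling back to "default"
 * when the parameter is left unset (NULL or empty).
 */
static const struct ntb_transport_backend *
ntb_transport_backend_find(const char *transport_type)
{
	const char *want = (transport_type && *transport_type) ?
			   transport_type : "default";
	int i;

	for (i = 0; i < ntb_nr_backends; i++)
		if (!strcmp(ntb_backends[i]->name, want))
			return ntb_backends[i];
	return NULL;
}
```

This keeps the edma-specific bits behind the backend's ops table, with
the core only doing the name match.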
> Or I guess when you probe ntb_netdev, the selection would happen there and thus transport_type would be in ntb_netdev module?
I'm not sure how selecting the transport type at ntb_netdev probe time
would work in practice, or what additional benefit it would provide.
Kind regards,
Koichiro
>
> >
> > Thanks,
> > Koichiro
> >
> >>
> >> DJ
> >>
> >>> +
> >>> static struct dentry *nt_debugfs_dir;
> >>>
> >>> /* Only two-ports NTB devices are supported */
> >>> @@ -156,7 +162,7 @@ enum {
> >>> #define drv_client(__drv) \
> >>> container_of((__drv), struct ntb_transport_client, driver)
> >>>
> >>> -#define NTB_QP_DEF_NUM_ENTRIES 100
> >>> +#define NTB_QP_DEF_NUM_ENTRIES 128
> >>> #define NTB_LINK_DOWN_TIMEOUT 10
> >>>
> >>> static void ntb_transport_rxc_db(unsigned long data);
> >>> @@ -1189,7 +1195,11 @@ static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
> >>>
> >>> nt->ndev = ndev;
> >>>
> >>> - rc = ntb_transport_default_init(nt);
> >>> + if (use_remote_edma)
> >>> + rc = ntb_transport_edma_init(nt);
> >>> + else
> >>> + rc = ntb_transport_default_init(nt);
> >>> +
> >>> if (rc)
> >>> return rc;
> >>>
> >>> @@ -1950,6 +1960,7 @@ ntb_transport_create_queue(void *data, struct device *client_dev,
> >>>
> >>> nt->qp_bitmap_free &= ~qp_bit;
> >>>
> >>> + qp->qp_bit = qp_bit;
> >>> qp->cb_data = data;
> >>> qp->rx_handler = handlers->rx_handler;
> >>> qp->tx_handler = handlers->tx_handler;
> >>> diff --git a/drivers/ntb/ntb_transport_edma.c b/drivers/ntb/ntb_transport_edma.c
> >>> new file mode 100644
> >>> index 000000000000..6ae5da0a1367
> >>> --- /dev/null
> >>> +++ b/drivers/ntb/ntb_transport_edma.c
> >>> @@ -0,0 +1,987 @@
> >>> +// SPDX-License-Identifier: GPL-2.0-only
> >>> +/*
> >>> + * NTB transport backend for remote DesignWare eDMA.
> >>> + *
> >>> + * This implements the backend_ops used when use_remote_edma=1 and
> >>> + * relies on drivers/ntb/hw/edma/ for low-level eDMA/MW programming.
> >>> + */
> >>> +
> >>> +#include <linux/bug.h>
> >>> +#include <linux/compiler.h>
> >>> +#include <linux/debugfs.h>
> >>> +#include <linux/dmaengine.h>
> >>> +#include <linux/dma-mapping.h>
> >>> +#include <linux/errno.h>
> >>> +#include <linux/io-64-nonatomic-lo-hi.h>
> >>> +#include <linux/ntb.h>
> >>> +#include <linux/pci.h>
> >>> +#include <linux/pci-epc.h>
> >>> +#include <linux/seq_file.h>
> >>> +#include <linux/slab.h>
> >>> +
> >>> +#include "hw/edma/ntb_hw_edma.h"
> >>> +#include "ntb_transport_internal.h"
> >>> +
> >>> +#define NTB_EDMA_RING_ORDER 7
> >>> +#define NTB_EDMA_RING_ENTRIES (1U << NTB_EDMA_RING_ORDER)
> >>> +#define NTB_EDMA_RING_MASK (NTB_EDMA_RING_ENTRIES - 1)
> >>> +
> >>> +#define NTB_EDMA_MAX_POLL 32
> >>> +
> >>> +/*
> >>> + * Remote eDMA mode implementation
> >>> + */
> >>> +struct ntb_transport_ctx_edma {
> >>> + remote_edma_mode_t remote_edma_mode;
> >>> + struct device *dma_dev;
> >>> + struct workqueue_struct *wq;
> >>> + struct ntb_edma_chans chans;
> >>> +};
> >>> +
> >>> +struct ntb_transport_qp_edma {
> >>> + struct ntb_transport_qp *qp;
> >>> +
> >>> + /*
> >>> + * For ensuring peer notification in non-atomic context.
> >>> + * ntb_peer_db_set might sleep or schedule.
> >>> + */
> >>> + struct work_struct db_work;
> >>> +
> >>> + u32 rx_prod;
> >>> + u32 rx_cons;
> >>> + u32 tx_cons;
> >>> + u32 tx_issue;
> >>> +
> >>> + spinlock_t rx_lock;
> >>> + spinlock_t tx_lock;
> >>> +
> >>> + struct work_struct rx_work;
> >>> + struct work_struct tx_work;
> >>> +};
> >>> +
> >>> +struct ntb_edma_desc {
> >>> + u32 len;
> >>> + u32 flags;
> >>> + u64 addr; /* DMA address */
> >>> + u64 data;
> >>> +};
> >>> +
> >>> +struct ntb_edma_ring {
> >>> + struct ntb_edma_desc desc[NTB_EDMA_RING_ENTRIES];
> >>> + u32 head;
> >>> + u32 tail;
> >>> +};
> >>> +
> >>> +static inline bool ntb_qp_edma_is_rc(struct ntb_transport_qp *qp)
> >>> +{
> >>> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
> >>> +
> >>> + return ctx->remote_edma_mode == REMOTE_EDMA_RC;
> >>> +}
> >>> +
> >>> +static inline bool ntb_qp_edma_is_ep(struct ntb_transport_qp *qp)
> >>> +{
> >>> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
> >>> +
> >>> + return ctx->remote_edma_mode == REMOTE_EDMA_EP;
> >>> +}
> >>> +
> >>> +static inline bool ntb_qp_edma_enabled(struct ntb_transport_qp *qp)
> >>> +{
> >>> + return ntb_qp_edma_is_rc(qp) || ntb_qp_edma_is_ep(qp);
> >>> +}
> >>> +
> >>> +static inline unsigned int ntb_edma_ring_sel(struct ntb_transport_qp *qp,
> >>> + unsigned int n)
> >>> +{
> >>> + return n ^ !!ntb_qp_edma_is_ep(qp);
> >>> +}
> >>> +
> >>> +static inline struct ntb_edma_ring *
> >>> +ntb_edma_ring_local(struct ntb_transport_qp *qp, unsigned int n)
> >>> +{
> >>> + unsigned int r = ntb_edma_ring_sel(qp, n);
> >>> +
> >>> + return &((struct ntb_edma_ring *)qp->rx_buff)[r];
> >>> +}
> >>> +
> >>> +static inline struct ntb_edma_ring __iomem *
> >>> +ntb_edma_ring_remote(struct ntb_transport_qp *qp, unsigned int n)
> >>> +{
> >>> + unsigned int r = ntb_edma_ring_sel(qp, n);
> >>> +
> >>> + return &((struct ntb_edma_ring __iomem *)qp->tx_mw)[r];
> >>> +}
> >>> +
> >>> +static inline struct ntb_edma_desc *
> >>> +ntb_edma_desc_local(struct ntb_transport_qp *qp, unsigned int n, unsigned int i)
> >>> +{
> >>> + return &ntb_edma_ring_local(qp, n)->desc[i];
> >>> +}
> >>> +
> >>> +static inline struct ntb_edma_desc __iomem *
> >>> +ntb_edma_desc_remote(struct ntb_transport_qp *qp, unsigned int n,
> >>> + unsigned int i)
> >>> +{
> >>> + return &ntb_edma_ring_remote(qp, n)->desc[i];
> >>> +}
> >>> +
> >>> +static inline u32 *ntb_edma_head_local(struct ntb_transport_qp *qp,
> >>> + unsigned int n)
> >>> +{
> >>> + return &ntb_edma_ring_local(qp, n)->head;
> >>> +}
> >>> +
> >>> +static inline u32 __iomem *ntb_edma_head_remote(struct ntb_transport_qp *qp,
> >>> + unsigned int n)
> >>> +{
> >>> + return &ntb_edma_ring_remote(qp, n)->head;
> >>> +}
> >>> +
> >>> +static inline u32 *ntb_edma_tail_local(struct ntb_transport_qp *qp,
> >>> + unsigned int n)
> >>> +{
> >>> + return &ntb_edma_ring_local(qp, n)->tail;
> >>> +}
> >>> +
> >>> +static inline u32 __iomem *ntb_edma_tail_remote(struct ntb_transport_qp *qp,
> >>> + unsigned int n)
> >>> +{
> >>> + return &ntb_edma_ring_remote(qp, n)->tail;
> >>> +}
> >>> +
> >>> +/* 'i' must be an index produced by ntb_edma_ring_idx() */
> >>> +#define NTB_DESC_TX_O(qp, i) ntb_edma_desc_remote(qp, 0, i)
> >>> +#define NTB_DESC_TX_I(qp, i) ntb_edma_desc_local(qp, 0, i)
> >>> +#define NTB_DESC_RX_O(qp, i) ntb_edma_desc_remote(qp, 1, i)
> >>> +#define NTB_DESC_RX_I(qp, i) ntb_edma_desc_local(qp, 1, i)
> >>> +
> >>> +#define NTB_HEAD_TX_I(qp) ntb_edma_head_local(qp, 0)
> >>> +#define NTB_HEAD_RX_O(qp) ntb_edma_head_remote(qp, 1)
> >>> +
> >>> +#define NTB_TAIL_TX_O(qp) ntb_edma_tail_remote(qp, 0)
> >>> +#define NTB_TAIL_RX_I(qp) ntb_edma_tail_local(qp, 1)
> >>> +
> >>> +/* ntb_edma_ring helpers */
> >>> +static __always_inline u32 ntb_edma_ring_idx(u32 v)
> >>> +{
> >>> + return v & NTB_EDMA_RING_MASK;
> >>> +}
> >>> +
> >>> +static __always_inline u32 ntb_edma_ring_used_entry(u32 head, u32 tail)
> >>> +{
> >>> + if (head >= tail) {
> >>> + WARN_ON_ONCE((head - tail) > (NTB_EDMA_RING_ENTRIES - 1));
> >>> + return head - tail;
> >>> + }
> >>> +
> >>> + WARN_ON_ONCE((U32_MAX - tail + head + 1) > (NTB_EDMA_RING_ENTRIES - 1));
> >>> + return U32_MAX - tail + head + 1;
> >>> +}
> >>> +
> >>> +static __always_inline u32 ntb_edma_ring_free_entry(u32 head, u32 tail)
> >>> +{
> >>> + return NTB_EDMA_RING_ENTRIES - ntb_edma_ring_used_entry(head, tail) - 1;
> >>> +}
> >>> +
> >>> +static __always_inline bool ntb_edma_ring_full(u32 head, u32 tail)
> >>> +{
> >>> + return ntb_edma_ring_free_entry(head, tail) == 0;
> >>> +}
> >>> +
> >>> +static unsigned int ntb_transport_edma_tx_free_entry(struct ntb_transport_qp *qp)
> >>> +{
> >>> + struct ntb_transport_qp_edma *edma = qp->priv;
> >>> + unsigned int head, tail;
> >>> +
> >>> + scoped_guard(spinlock_irqsave, &edma->tx_lock) {
> >>> +		/* In this scope, only 'head' may advance */
> >>> + tail = READ_ONCE(edma->tx_issue);
> >>> + head = READ_ONCE(*NTB_HEAD_TX_I(qp));
> >>> + }
> >>> + /*
> >>> +	 * The 'used' amount indicates how much the other end has refilled,
> >>> +	 * which is available for us to use for TX.
> >>> + */
> >>> + return ntb_edma_ring_used_entry(head, tail);
> >>> +}
> >>> +
> >>> +static void ntb_transport_edma_debugfs_stats_show(struct seq_file *s,
> >>> + struct ntb_transport_qp *qp)
> >>> +{
> >>> + seq_printf(s, "rx_bytes - \t%llu\n", qp->rx_bytes);
> >>> + seq_printf(s, "rx_pkts - \t%llu\n", qp->rx_pkts);
> >>> + seq_printf(s, "rx_err_no_buf - %llu\n", qp->rx_err_no_buf);
> >>> + seq_printf(s, "rx_buff - \t0x%p\n", qp->rx_buff);
> >>> + seq_printf(s, "rx_max_entry - \t%u\n", qp->rx_max_entry);
> >>> + seq_printf(s, "rx_alloc_entry - \t%u\n\n", qp->rx_alloc_entry);
> >>> +
> >>> + seq_printf(s, "tx_bytes - \t%llu\n", qp->tx_bytes);
> >>> + seq_printf(s, "tx_pkts - \t%llu\n", qp->tx_pkts);
> >>> + seq_printf(s, "tx_ring_full - \t%llu\n", qp->tx_ring_full);
> >>> + seq_printf(s, "tx_err_no_buf - %llu\n", qp->tx_err_no_buf);
> >>> + seq_printf(s, "tx_mw - \t0x%p\n", qp->tx_mw);
> >>> + seq_printf(s, "tx_max_entry - \t%u\n", qp->tx_max_entry);
> >>> + seq_printf(s, "free tx - \t%u\n", ntb_transport_tx_free_entry(qp));
> >>> + seq_putc(s, '\n');
> >>> +
> >>> + seq_puts(s, "Using Remote eDMA - Yes\n");
> >>> + seq_printf(s, "QP Link - \t%s\n", qp->link_is_up ? "Up" : "Down");
> >>> +}
> >>> +
> >>> +static void ntb_transport_edma_uninit(struct ntb_transport_ctx *nt)
> >>> +{
> >>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> >>> +
> >>> + if (ctx->wq)
> >>> + destroy_workqueue(ctx->wq);
> >>> + ctx->wq = NULL;
> >>> +
> >>> + ntb_edma_teardown_chans(&ctx->chans);
> >>> +
> >>> + switch (ctx->remote_edma_mode) {
> >>> + case REMOTE_EDMA_EP:
> >>> + ntb_edma_teardown_mws(nt->ndev);
> >>> + break;
> >>> + case REMOTE_EDMA_RC:
> >>> + ntb_edma_teardown_peer(nt->ndev);
> >>> + break;
> >>> + case REMOTE_EDMA_UNKNOWN:
> >>> + default:
> >>> + break;
> >>> + }
> >>> +
> >>> + ctx->remote_edma_mode = REMOTE_EDMA_UNKNOWN;
> >>> +}
> >>> +
> >>> +static void ntb_transport_edma_db_work(struct work_struct *work)
> >>> +{
> >>> + struct ntb_transport_qp_edma *edma =
> >>> + container_of(work, struct ntb_transport_qp_edma, db_work);
> >>> + struct ntb_transport_qp *qp = edma->qp;
> >>> +
> >>> + ntb_peer_db_set(qp->ndev, qp->qp_bit);
> >>> +}
> >>> +
> >>> +static void ntb_transport_edma_notify_peer(struct ntb_transport_qp_edma *edma)
> >>> +{
> >>> + struct ntb_transport_qp *qp = edma->qp;
> >>> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
> >>> +
> >>> + if (!ntb_edma_notify_peer(&ctx->chans, qp->qp_num))
> >>> + return;
> >>> +
> >>> + /*
> >>> + * Called from contexts that may be atomic. Since ntb_peer_db_set()
> >>> + * may sleep, delegate the actual doorbell write to a workqueue.
> >>> + */
> >>> + queue_work(system_highpri_wq, &edma->db_work);
> >>> +}
> >>> +
> >>> +static void ntb_transport_edma_isr(void *data, int qp_num)
> >>> +{
> >>> + struct ntb_transport_ctx *nt = data;
> >>> + struct ntb_transport_qp_edma *edma;
> >>> + struct ntb_transport_ctx_edma *ctx;
> >>> + struct ntb_transport_qp *qp;
> >>> +
> >>> + if (qp_num < 0 || qp_num >= nt->qp_count)
> >>> + return;
> >>> +
> >>> + qp = &nt->qp_vec[qp_num];
> >>> + if (WARN_ON(!qp))
> >>> + return;
> >>> +
> >>> + ctx = (struct ntb_transport_ctx_edma *)qp->transport->priv;
> >>> + edma = qp->priv;
> >>> +
> >>> + queue_work(ctx->wq, &edma->rx_work);
> >>> + queue_work(ctx->wq, &edma->tx_work);
> >>> +}
> >>> +
> >>> +static int ntb_transport_edma_rc_init(struct ntb_transport_ctx *nt)
> >>> +{
> >>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> >>> + struct ntb_dev *ndev = nt->ndev;
> >>> + struct pci_dev *pdev = ndev->pdev;
> >>> + int peer_mw;
> >>> + int rc;
> >>> +
> >>> + if (!use_remote_edma || ctx->remote_edma_mode != REMOTE_EDMA_UNKNOWN)
> >>> + return 0;
> >>> +
> >>> + peer_mw = ntb_peer_mw_count(ndev);
> >>> + if (peer_mw <= 0)
> >>> + return -ENODEV;
> >>> +
> >>> + rc = ntb_edma_setup_peer(ndev, peer_mw - 1, nt->qp_count);
> >>> + if (rc) {
> >>> + dev_err(&pdev->dev, "Failed to enable remote eDMA: %d\n", rc);
> >>> + return rc;
> >>> + }
> >>> +
> >>> + rc = ntb_edma_setup_chans(get_dma_dev(ndev), &ctx->chans, true);
> >>> + if (rc) {
> >>> + dev_err(&pdev->dev, "Failed to setup eDMA channels: %d\n", rc);
> >>> + goto err_teardown_peer;
> >>> + }
> >>> +
> >>> + rc = ntb_edma_setup_intr_chan(get_dma_dev(ndev), &ctx->chans);
> >>> + if (rc) {
> >>> + dev_err(&pdev->dev, "Failed to setup eDMA notify channel: %d\n",
> >>> + rc);
> >>> + goto err_teardown_chans;
> >>> + }
> >>> +
> >>> + ctx->remote_edma_mode = REMOTE_EDMA_RC;
> >>> + return 0;
> >>> +
> >>> +err_teardown_chans:
> >>> + ntb_edma_teardown_chans(&ctx->chans);
> >>> +err_teardown_peer:
> >>> + ntb_edma_teardown_peer(ndev);
> >>> + return rc;
> >>> +}
> >>> +
> >>> +static int ntb_transport_edma_ep_init(struct ntb_transport_ctx *nt)
> >>> +{
> >>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> >>> + struct ntb_dev *ndev = nt->ndev;
> >>> + struct pci_dev *pdev = ndev->pdev;
> >>> + int peer_mw;
> >>> + int rc;
> >>> +
> >>> + if (!use_remote_edma || ctx->remote_edma_mode == REMOTE_EDMA_EP)
> >>> + return 0;
> >>> +
> >>> +	/*
> >>> + * This check assumes that the endpoint (pci-epf-vntb.c)
> >>> + * ntb_dev_ops implements .get_private_data() while the host side
> >>> + * (ntb_hw_epf.c) does not.
> >>> + */
> >>> + if (!ntb_get_private_data(ndev))
> >>> + return 0;
> >>> +
> >>> + peer_mw = ntb_peer_mw_count(ndev);
> >>> + if (peer_mw <= 0)
> >>> + return -ENODEV;
> >>> +
> >>> + rc = ntb_edma_setup_mws(ndev, peer_mw - 1, nt->qp_count,
> >>> + ntb_transport_edma_isr, nt);
> >>> + if (rc) {
> >>> + dev_err(&pdev->dev,
> >>> + "Failed to set up memory window for eDMA: %d\n", rc);
> >>> + return rc;
> >>> + }
> >>> +
> >>> + rc = ntb_edma_setup_chans(get_dma_dev(ndev), &ctx->chans, false);
> >>> + if (rc) {
> >>> + dev_err(&pdev->dev, "Failed to setup eDMA channels: %d\n", rc);
> >>> + ntb_edma_teardown_mws(ndev);
> >>> + return rc;
> >>> + }
> >>> +
> >>> + ctx->remote_edma_mode = REMOTE_EDMA_EP;
> >>> + return 0;
> >>> +}
> >>> +
> >>> +static int ntb_transport_edma_setup_qp_mw(struct ntb_transport_ctx *nt,
> >>> + unsigned int qp_num)
> >>> +{
> >>> + struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
> >>> + struct ntb_dev *ndev = nt->ndev;
> >>> + struct ntb_queue_entry *entry;
> >>> + struct ntb_transport_mw *mw;
> >>> + unsigned int mw_num, mw_count, qp_count;
> >>> + unsigned int qp_offset, rx_info_offset;
> >>> + unsigned int mw_size, mw_size_per_qp;
> >>> + unsigned int num_qps_mw;
> >>> + size_t edma_total;
> >>> + unsigned int i;
> >>> + int node;
> >>> +
> >>> + mw_count = nt->mw_count;
> >>> + qp_count = nt->qp_count;
> >>> +
> >>> + mw_num = QP_TO_MW(nt, qp_num);
> >>> + mw = &nt->mw_vec[mw_num];
> >>> +
> >>> + if (!mw->virt_addr)
> >>> + return -ENOMEM;
> >>> +
> >>> + if (mw_num < qp_count % mw_count)
> >>> + num_qps_mw = qp_count / mw_count + 1;
> >>> + else
> >>> + num_qps_mw = qp_count / mw_count;
> >>> +
> >>> + mw_size = min(nt->mw_vec[mw_num].phys_size, mw->xlat_size);
> >>> + if (max_mw_size && mw_size > max_mw_size)
> >>> + mw_size = max_mw_size;
> >>> +
> >>> + mw_size_per_qp = round_down((unsigned int)mw_size / num_qps_mw, SZ_64);
> >>> + qp_offset = mw_size_per_qp * (qp_num / mw_count);
> >>> + rx_info_offset = mw_size_per_qp - sizeof(struct ntb_rx_info);
> >>> +
> >>> + qp->tx_mw_size = mw_size_per_qp;
> >>> + qp->tx_mw = nt->mw_vec[mw_num].vbase + qp_offset;
> >>> + if (!qp->tx_mw)
> >>> + return -EINVAL;
> >>> + qp->tx_mw_phys = nt->mw_vec[mw_num].phys_addr + qp_offset;
> >>> + if (!qp->tx_mw_phys)
> >>> + return -EINVAL;
> >>> + qp->rx_info = qp->tx_mw + rx_info_offset;
> >>> + qp->rx_buff = mw->virt_addr + qp_offset;
> >>> + qp->remote_rx_info = qp->rx_buff + rx_info_offset;
> >>> +
> >>> + /* Due to housekeeping, there must be at least 2 buffs */
> >>> + qp->tx_max_frame = min(transport_mtu, mw_size_per_qp / 2);
> >>> + qp->rx_max_frame = min(transport_mtu, mw_size_per_qp / 2);
> >>> +
> >>> + /* In eDMA mode, decouple from MW sizing and force ring-sized entries */
> >>> + edma_total = 2 * sizeof(struct ntb_edma_ring);
> >>> + if (rx_info_offset < edma_total) {
> >>> +		dev_err(&ndev->dev, "Ring space requires %zuB, but only %uB is available\n",
> >>> +			edma_total, rx_info_offset);
> >>> + return -EINVAL;
> >>> + }
> >>> + qp->tx_max_entry = NTB_EDMA_RING_ENTRIES;
> >>> + qp->rx_max_entry = NTB_EDMA_RING_ENTRIES;
> >>> +
> >>> +	/*
> >>> +	 * If the ring needs more entries than were allocated by default,
> >>> +	 * allocate the difference so the entry pool stays in sync with
> >>> +	 * the transport frames.
> >>> +	 */
> >>> + node = dev_to_node(&ndev->dev);
> >>> + for (i = qp->rx_alloc_entry; i < qp->rx_max_entry; i++) {
> >>> + entry = kzalloc_node(sizeof(*entry), GFP_KERNEL, node);
> >>> + if (!entry)
> >>> + return -ENOMEM;
> >>> +
> >>> + entry->qp = qp;
> >>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry,
> >>> + &qp->rx_free_q);
> >>> + qp->rx_alloc_entry++;
> >>> + }
> >>> +
> >>> + memset(qp->rx_buff, 0, edma_total);
> >>> +
> >>> + qp->rx_pkts = 0;
> >>> + qp->tx_pkts = 0;
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>> +static int ntb_transport_edma_rx_complete(struct ntb_transport_qp *qp)
> >>> +{
> >>> + struct device *dma_dev = get_dma_dev(qp->ndev);
> >>> + struct ntb_transport_qp_edma *edma = qp->priv;
> >>> + struct ntb_queue_entry *entry;
> >>> + struct ntb_edma_desc *in;
> >>> + unsigned int len;
> >>> + bool link_down;
> >>> + u32 idx;
> >>> +
> >>> + if (ntb_edma_ring_used_entry(READ_ONCE(*NTB_TAIL_RX_I(qp)),
> >>> + edma->rx_cons) == 0)
> >>> + return 0;
> >>> +
> >>> + idx = ntb_edma_ring_idx(edma->rx_cons);
> >>> + in = NTB_DESC_RX_I(qp, idx);
> >>> + if (!(in->flags & DESC_DONE_FLAG))
> >>> + return 0;
> >>> +
> >>> + link_down = in->flags & LINK_DOWN_FLAG;
> >>> + in->flags = 0;
> >>> + len = in->len; /* might be smaller than entry->len */
> >>> +
> >>> + entry = (struct ntb_queue_entry *)(uintptr_t)in->data;
> >>> + if (WARN_ON(!entry))
> >>> + return 0;
> >>> +
> >>> + if (link_down) {
> >>> + ntb_qp_link_down(qp);
> >>> + edma->rx_cons++;
> >>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_free_q);
> >>> + return 1;
> >>> + }
> >>> +
> >>> + dma_unmap_single(dma_dev, entry->addr, entry->len, DMA_FROM_DEVICE);
> >>> +
> >>> + qp->rx_bytes += len;
> >>> + qp->rx_pkts++;
> >>> + edma->rx_cons++;
> >>> +
> >>> + if (qp->rx_handler && qp->client_ready)
> >>> + qp->rx_handler(qp, qp->cb_data, entry->cb_data, len);
> >>> +
> >>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_free_q);
> >>> + return 1;
> >>> +}
> >>> +
> >>> +static void ntb_transport_edma_rx_work(struct work_struct *work)
> >>> +{
> >>> + struct ntb_transport_qp_edma *edma = container_of(
> >>> + work, struct ntb_transport_qp_edma, rx_work);
> >>> + struct ntb_transport_qp *qp = edma->qp;
> >>> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
> >>> + unsigned int i;
> >>> +
> >>> + for (i = 0; i < NTB_EDMA_MAX_POLL; i++) {
> >>> + if (!ntb_transport_edma_rx_complete(qp))
> >>> + break;
> >>> + }
> >>> +
> >>> + if (ntb_transport_edma_rx_complete(qp))
> >>> + queue_work(ctx->wq, &edma->rx_work);
> >>> +}
> >>> +
> >>> +static void ntb_transport_edma_tx_work(struct work_struct *work)
> >>> +{
> >>> + struct ntb_transport_qp_edma *edma = container_of(
> >>> + work, struct ntb_transport_qp_edma, tx_work);
> >>> + struct ntb_transport_qp *qp = edma->qp;
> >>> + struct ntb_edma_desc *in, __iomem *out;
> >>> + struct ntb_queue_entry *entry;
> >>> + unsigned int len;
> >>> + void *cb_data;
> >>> + u32 idx;
> >>> +
> >>> + while (ntb_edma_ring_used_entry(READ_ONCE(edma->tx_issue),
> >>> + edma->tx_cons) != 0) {
> >>> + /* Paired with smp_wmb() in ntb_transport_edma_tx_enqueue_inner() */
> >>> + smp_rmb();
> >>> +
> >>> + idx = ntb_edma_ring_idx(edma->tx_cons);
> >>> + in = NTB_DESC_TX_I(qp, idx);
> >>> + entry = (struct ntb_queue_entry *)(uintptr_t)in->data;
> >>> + if (!entry || !(entry->flags & DESC_DONE_FLAG))
> >>> + break;
> >>> +
> >>> + in->data = 0;
> >>> +
> >>> + cb_data = entry->cb_data;
> >>> + len = entry->len;
> >>> +
> >>> + out = NTB_DESC_TX_O(qp, idx);
> >>> +
> >>> + WRITE_ONCE(edma->tx_cons, edma->tx_cons + 1);
> >>> +
> >>> + /*
> >>> + * No need to add barrier in-between to enforce ordering here.
> >>> + * The other side proceeds only after both flags and tail are
> >>> + * updated.
> >>> + */
> >>> + iowrite32(entry->flags, &out->flags);
> >>> + iowrite32(edma->tx_cons, NTB_TAIL_TX_O(qp));
> >>> +
> >>> + ntb_transport_edma_notify_peer(edma);
> >>> +
> >>> + ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry,
> >>> + &qp->tx_free_q);
> >>> +
> >>> + if (qp->tx_handler)
> >>> + qp->tx_handler(qp, qp->cb_data, cb_data, len);
> >>> +
> >>> + /* stat updates */
> >>> + qp->tx_bytes += len;
> >>> + qp->tx_pkts++;
> >>> + }
> >>> +}
> >>> +
> >>> +static void ntb_transport_edma_tx_cb(void *data,
> >>> + const struct dmaengine_result *res)
> >>> +{
> >>> + struct ntb_queue_entry *entry = data;
> >>> + struct ntb_transport_qp *qp = entry->qp;
> >>> + struct ntb_transport_ctx *nt = qp->transport;
> >>> + struct device *dma_dev = get_dma_dev(qp->ndev);
> >>> + enum dmaengine_tx_result dma_err = res->result;
> >>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> >>> + struct ntb_transport_qp_edma *edma = qp->priv;
> >>> +
> >>> + switch (dma_err) {
> >>> + case DMA_TRANS_READ_FAILED:
> >>> + case DMA_TRANS_WRITE_FAILED:
> >>> + case DMA_TRANS_ABORTED:
> >>> + entry->errors++;
> >>> + entry->len = -EIO;
> >>> + break;
> >>> + case DMA_TRANS_NOERROR:
> >>> + default:
> >>> + break;
> >>> + }
> >>> + dma_unmap_sg(dma_dev, &entry->sgl, 1, DMA_TO_DEVICE);
> >>> + sg_dma_address(&entry->sgl) = 0;
> >>> +
> >>> + entry->flags |= DESC_DONE_FLAG;
> >>> +
> >>> + queue_work(ctx->wq, &edma->tx_work);
> >>> +}
> >>> +
> >>> +static int ntb_transport_edma_submit(struct device *d, struct dma_chan *chan,
> >>> + size_t len, void *rc_src, dma_addr_t dst,
> >>> + struct ntb_queue_entry *entry)
> >>> +{
> >>> + struct scatterlist *sgl = &entry->sgl;
> >>> + struct dma_async_tx_descriptor *txd;
> >>> + struct dma_slave_config cfg;
> >>> + dma_cookie_t cookie;
> >>> + int nents, rc;
> >>> +
> >>> + if (!d)
> >>> + return -ENODEV;
> >>> +
> >>> + if (!chan)
> >>> + return -ENXIO;
> >>> +
> >>> + if (WARN_ON(!rc_src || !dst))
> >>> + return -EINVAL;
> >>> +
> >>> + if (WARN_ON(sg_dma_address(sgl)))
> >>> + return -EINVAL;
> >>> +
> >>> + sg_init_one(sgl, rc_src, len);
> >>> + nents = dma_map_sg(d, sgl, 1, DMA_TO_DEVICE);
> >>> + if (nents <= 0)
> >>> + return -EIO;
> >>> +
> >>> + memset(&cfg, 0, sizeof(cfg));
> >>> + cfg.dst_addr = dst;
> >>> + cfg.src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
> >>> + cfg.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
> >>> + cfg.direction = DMA_MEM_TO_DEV;
> >>> +
> >>> + txd = dmaengine_prep_slave_sg_config(chan, sgl, 1, DMA_MEM_TO_DEV,
> >>> + DMA_CTRL_ACK | DMA_PREP_INTERRUPT, &cfg);
> >>> + if (!txd) {
> >>> + rc = -EIO;
> >>> + goto out_unmap;
> >>> + }
> >>> +
> >>> + txd->callback_result = ntb_transport_edma_tx_cb;
> >>> + txd->callback_param = entry;
> >>> +
> >>> + cookie = dmaengine_submit(txd);
> >>> + if (dma_submit_error(cookie)) {
> >>> + rc = -EIO;
> >>> + goto out_unmap;
> >>> + }
> >>> + dma_async_issue_pending(chan);
> >>> + return 0;
> >>> +out_unmap:
> >>> + dma_unmap_sg(d, sgl, 1, DMA_TO_DEVICE);
> >>> + return rc;
> >>> +}
> >>> +
> >>> +static int ntb_transport_edma_tx_enqueue_inner(struct ntb_transport_qp *qp,
> >>> + struct ntb_queue_entry *entry)
> >>> +{
> >>> + struct device *dma_dev = get_dma_dev(qp->ndev);
> >>> + struct ntb_transport_qp_edma *edma = qp->priv;
> >>> + struct ntb_transport_ctx *nt = qp->transport;
> >>> + struct ntb_edma_desc *in, __iomem *out;
> >>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> >>> + unsigned int len = entry->len;
> >>> + struct dma_chan *chan;
> >>> + u32 issue, idx, head;
> >>> + dma_addr_t dst;
> >>> + int rc;
> >>> +
> >>> + WARN_ON_ONCE(entry->flags & DESC_DONE_FLAG);
> >>> +
> >>> + scoped_guard(spinlock_irqsave, &edma->tx_lock) {
> >>> + head = READ_ONCE(*NTB_HEAD_TX_I(qp));
> >>> + issue = edma->tx_issue;
> >>> + if (ntb_edma_ring_used_entry(head, issue) == 0) {
> >>> + qp->tx_ring_full++;
> >>> + return -ENOSPC;
> >>> + }
> >>> +
> >>> + /*
> >>> + * ntb_transport_edma_tx_work() checks entry->flags
> >>> + * so it needs to be set before tx_issue++.
> >>> + */
> >>> + idx = ntb_edma_ring_idx(issue);
> >>> + in = NTB_DESC_TX_I(qp, idx);
> >>> + in->data = (uintptr_t)entry;
> >>> +
> >>> + /* Make in->data visible before tx_issue++ */
> >>> + smp_wmb();
> >>> +
> >>> + WRITE_ONCE(edma->tx_issue, edma->tx_issue + 1);
> >>> + }
> >>> +
> >>> + /* Publish the final transfer length to the other end */
> >>> + out = NTB_DESC_TX_O(qp, idx);
> >>> + iowrite32(len, &out->len);
> >>> + ioread32(&out->len);
> >>> +
> >>> + if (unlikely(!len)) {
> >>> + entry->flags |= DESC_DONE_FLAG;
> >>> + queue_work(ctx->wq, &edma->tx_work);
> >>> + return 0;
> >>> + }
> >>> +
> >>> + /* Paired with dma_wmb() in ntb_transport_edma_rx_enqueue_inner() */
> >>> + dma_rmb();
> >>> +
> >>> + /* kick remote eDMA read transfer */
> >>> + dst = (dma_addr_t)in->addr;
> >>> + chan = ntb_edma_pick_chan(&ctx->chans, qp->qp_num);
> >>> + rc = ntb_transport_edma_submit(dma_dev, chan, len,
> >>> + entry->buf, dst, entry);
> >>> + if (rc) {
> >>> + entry->errors++;
> >>> + entry->len = -EIO;
> >>> + entry->flags |= DESC_DONE_FLAG;
> >>> + queue_work(ctx->wq, &edma->tx_work);
> >>> + }
> >>> + return 0;
> >>> +}
> >>> +
> >>> +static int ntb_transport_edma_tx_enqueue(struct ntb_transport_qp *qp,
> >>> + struct ntb_queue_entry *entry,
> >>> + void *cb, void *data, unsigned int len,
> >>> + unsigned int flags)
> >>> +{
> >>> + struct device *dma_dev;
> >>> +
> >>> + if (entry->addr) {
> >>> + /* Deferred unmap */
> >>> + dma_dev = get_dma_dev(qp->ndev);
> >>> + dma_unmap_single(dma_dev, entry->addr, entry->len,
> >>> + DMA_TO_DEVICE);
> >>> + }
> >>> +
> >>> + entry->cb_data = cb;
> >>> + entry->buf = data;
> >>> + entry->len = len;
> >>> + entry->flags = flags;
> >>> + entry->errors = 0;
> >>> + entry->addr = 0;
> >>> +
> >>> + WARN_ON_ONCE(!ntb_qp_edma_enabled(qp));
> >>> +
> >>> + return ntb_transport_edma_tx_enqueue_inner(qp, entry);
> >>> +}
> >>> +
> >>> +static int ntb_transport_edma_rx_enqueue_inner(struct ntb_transport_qp *qp,
> >>> + struct ntb_queue_entry *entry)
> >>> +{
> >>> + struct device *dma_dev = get_dma_dev(qp->ndev);
> >>> + struct ntb_transport_qp_edma *edma = qp->priv;
> >>> + struct ntb_edma_desc *in, __iomem *out;
> >>> + unsigned int len = entry->len;
> >>> + void *data = entry->buf;
> >>> + dma_addr_t dst;
> >>> + u32 idx;
> >>> + int rc;
> >>> +
> >>> + dst = dma_map_single(dma_dev, data, len, DMA_FROM_DEVICE);
> >>> + rc = dma_mapping_error(dma_dev, dst);
> >>> + if (rc)
> >>> + return rc;
> >>> +
> >>> + guard(spinlock_bh)(&edma->rx_lock);
> >>> +
> >>> + if (ntb_edma_ring_full(READ_ONCE(edma->rx_prod),
> >>> + READ_ONCE(edma->rx_cons))) {
> >>> + rc = -ENOSPC;
> >>> + goto out_unmap;
> >>> + }
> >>> +
> >>> + idx = ntb_edma_ring_idx(edma->rx_prod);
> >>> + in = NTB_DESC_RX_I(qp, idx);
> >>> + out = NTB_DESC_RX_O(qp, idx);
> >>> +
> >>> + iowrite32(len, &out->len);
> >>> + iowrite64(dst, &out->addr);
> >>> +
> >>> + WARN_ON(in->flags & DESC_DONE_FLAG);
> >>> + in->data = (uintptr_t)entry;
> >>> + entry->addr = dst;
> >>> +
> >>> + /* Ensure len/addr are visible before the head update */
> >>> + dma_wmb();
> >>> +
> >>> + WRITE_ONCE(edma->rx_prod, edma->rx_prod + 1);
> >>> + iowrite32(edma->rx_prod, NTB_HEAD_RX_O(qp));
> >>> +
> >>> + return 0;
> >>> +out_unmap:
> >>> + dma_unmap_single(dma_dev, dst, len, DMA_FROM_DEVICE);
> >>> + return rc;
> >>> +}
> >>> +
> >>> +static int ntb_transport_edma_rx_enqueue(struct ntb_transport_qp *qp,
> >>> + struct ntb_queue_entry *entry)
> >>> +{
> >>> + int rc;
> >>> +
> >>> + rc = ntb_transport_edma_rx_enqueue_inner(qp, entry);
> >>> + if (rc) {
> >>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry,
> >>> + &qp->rx_free_q);
> >>> + return rc;
> >>> + }
> >>> +
> >>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_pend_q);
> >>> +
> >>> + if (qp->active)
> >>> + tasklet_schedule(&qp->rxc_db_work);
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>> +static void ntb_transport_edma_rx_poll(struct ntb_transport_qp *qp)
> >>> +{
> >>> + struct ntb_transport_ctx *nt = qp->transport;
> >>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> >>> + struct ntb_transport_qp_edma *edma = qp->priv;
> >>> +
> >>> + queue_work(ctx->wq, &edma->rx_work);
> >>> + queue_work(ctx->wq, &edma->tx_work);
> >>> +}
> >>> +
> >>> +static int ntb_transport_edma_qp_init(struct ntb_transport_ctx *nt,
> >>> + unsigned int qp_num)
> >>> +{
> >>> + struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
> >>> + struct ntb_transport_qp_edma *edma;
> >>> + struct ntb_dev *ndev = nt->ndev;
> >>> + int node;
> >>> +
> >>> + node = dev_to_node(&ndev->dev);
> >>> +
> >>> + qp->priv = kzalloc_node(sizeof(*edma), GFP_KERNEL, node);
> >>> + if (!qp->priv)
> >>> + return -ENOMEM;
> >>> +
> >>> + edma = (struct ntb_transport_qp_edma *)qp->priv;
> >>> + edma->qp = qp;
> >>> + edma->rx_prod = 0;
> >>> + edma->rx_cons = 0;
> >>> + edma->tx_cons = 0;
> >>> + edma->tx_issue = 0;
> >>> +
> >>> + spin_lock_init(&edma->rx_lock);
> >>> + spin_lock_init(&edma->tx_lock);
> >>> +
> >>> + INIT_WORK(&edma->db_work, ntb_transport_edma_db_work);
> >>> + INIT_WORK(&edma->rx_work, ntb_transport_edma_rx_work);
> >>> + INIT_WORK(&edma->tx_work, ntb_transport_edma_tx_work);
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>> +static void ntb_transport_edma_qp_free(struct ntb_transport_qp *qp)
> >>> +{
> >>> + struct ntb_transport_qp_edma *edma = qp->priv;
> >>> +
> >>> + cancel_work_sync(&edma->db_work);
> >>> + cancel_work_sync(&edma->rx_work);
> >>> + cancel_work_sync(&edma->tx_work);
> >>> +
> >>> + kfree(qp->priv);
> >>> +}
> >>> +
> >>> +static int ntb_transport_edma_pre_link_up(struct ntb_transport_ctx *nt)
> >>> +{
> >>> + struct ntb_dev *ndev = nt->ndev;
> >>> + struct pci_dev *pdev = ndev->pdev;
> >>> + int rc;
> >>> +
> >>> + rc = ntb_transport_edma_ep_init(nt);
> >>> + if (rc)
> >>> + dev_err(&pdev->dev, "Failed to init EP: %d\n", rc);
> >>> +
> >>> + return rc;
> >>> +}
> >>> +
> >>> +static int ntb_transport_edma_post_link_up(struct ntb_transport_ctx *nt)
> >>> +{
> >>> + struct ntb_dev *ndev = nt->ndev;
> >>> + struct pci_dev *pdev = ndev->pdev;
> >>> + int rc;
> >>> +
> >>> + rc = ntb_transport_edma_rc_init(nt);
> >>> + if (rc)
> >>> + dev_err(&pdev->dev, "Failed to init RC: %d\n", rc);
> >>> +
> >>> + return rc;
> >>> +}
> >>> +
> >>> +static int ntb_transport_edma_enable(struct ntb_transport_ctx *nt,
> >>> + unsigned int *mw_count)
> >>> +{
> >>> + struct ntb_dev *ndev = nt->ndev;
> >>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> >>> +
> >>> + if (!use_remote_edma)
> >>> + return 0;
> >>> +
> >>> + /*
> >>> + * We need at least one MW for the transport plus one MW reserved
> >>> + * for the remote eDMA window (see ntb_edma_setup_mws/peer).
> >>> + */
> >>> + if (*mw_count <= 1) {
> >>> + dev_err(&ndev->dev,
> >>> + "remote eDMA requires at least two MWS (have %u)\n",
> >>> + *mw_count);
> >>> + return -ENODEV;
> >>> + }
> >>> +
> >>> + ctx->wq = alloc_workqueue("ntb-edma-wq", WQ_UNBOUND | WQ_SYSFS, 0);
> >>> + if (!ctx->wq) {
> >>> + ntb_transport_edma_uninit(nt);
> >>> + return -ENOMEM;
> >>> + }
> >>> +
> >>> + /* Reserve the last peer MW exclusively for the eDMA window. */
> >>> + *mw_count -= 1;
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>> +static void ntb_transport_edma_disable(struct ntb_transport_ctx *nt)
> >>> +{
> >>> + ntb_transport_edma_uninit(nt);
> >>> +}
> >>> +
> >>> +static const struct ntb_transport_backend_ops edma_backend_ops = {
> >>> + .enable = ntb_transport_edma_enable,
> >>> + .disable = ntb_transport_edma_disable,
> >>> + .qp_init = ntb_transport_edma_qp_init,
> >>> + .qp_free = ntb_transport_edma_qp_free,
> >>> + .pre_link_up = ntb_transport_edma_pre_link_up,
> >>> + .post_link_up = ntb_transport_edma_post_link_up,
> >>> + .setup_qp_mw = ntb_transport_edma_setup_qp_mw,
> >>> + .tx_free_entry = ntb_transport_edma_tx_free_entry,
> >>> + .tx_enqueue = ntb_transport_edma_tx_enqueue,
> >>> + .rx_enqueue = ntb_transport_edma_rx_enqueue,
> >>> + .rx_poll = ntb_transport_edma_rx_poll,
> >>> + .debugfs_stats_show = ntb_transport_edma_debugfs_stats_show,
> >>> +};
> >>> +
> >>> +int ntb_transport_edma_init(struct ntb_transport_ctx *nt)
> >>> +{
> >>> + struct ntb_dev *ndev = nt->ndev;
> >>> + int node;
> >>> +
> >>> + node = dev_to_node(&ndev->dev);
> >>> + nt->priv = kzalloc_node(sizeof(struct ntb_transport_ctx_edma), GFP_KERNEL,
> >>> + node);
> >>> + if (!nt->priv)
> >>> + return -ENOMEM;
> >>> +
> >>> + nt->backend_ops = edma_backend_ops;
> >>> + /*
> >>> +	 * In remote eDMA mode, one DMA read channel is used by the host
> >>> +	 * side to interrupt the EP.
> >>> + */
> >>> + use_msi = false;
> >>> + return 0;
> >>> +}
> >>> diff --git a/drivers/ntb/ntb_transport_internal.h b/drivers/ntb/ntb_transport_internal.h
> >>> index 51ff08062d73..9fff65980d3d 100644
> >>> --- a/drivers/ntb/ntb_transport_internal.h
> >>> +++ b/drivers/ntb/ntb_transport_internal.h
> >>> @@ -8,6 +8,7 @@
> >>> extern unsigned long max_mw_size;
> >>> extern unsigned int transport_mtu;
> >>> extern bool use_msi;
> >>> +extern bool use_remote_edma;
> >>>
> >>> #define QP_TO_MW(nt, qp) ((qp) % nt->mw_count)
> >>>
> >>> @@ -29,6 +30,11 @@ struct ntb_queue_entry {
> >>> struct ntb_payload_header __iomem *tx_hdr;
> >>> struct ntb_payload_header *rx_hdr;
> >>> };
> >>> +
> >>> +#ifdef CONFIG_NTB_TRANSPORT_EDMA
> >>> + dma_addr_t addr;
> >>> + struct scatterlist sgl;
> >>> +#endif
> >>> };
> >>>
> >>> struct ntb_rx_info {
> >>> @@ -202,4 +208,13 @@ int ntb_transport_init_queue(struct ntb_transport_ctx *nt,
> >>> unsigned int qp_num);
> >>> struct device *get_dma_dev(struct ntb_dev *ndev);
> >>>
> >>> +#ifdef CONFIG_NTB_TRANSPORT_EDMA
> >>> +int ntb_transport_edma_init(struct ntb_transport_ctx *nt);
> >>> +#else
> >>> +static inline int ntb_transport_edma_init(struct ntb_transport_ctx *nt)
> >>> +{
> >>> + return -EOPNOTSUPP;
> >>> +}
> >>> +#endif /* CONFIG_NTB_TRANSPORT_EDMA */
> >>> +
> >>> #endif /* _NTB_TRANSPORT_INTERNAL_H_ */
> >>
>
^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v3 26/35] NTB: ntb_transport: Introduce DW eDMA backed transport mode
2026-01-08 1:25 ` Koichiro Den
@ 2026-01-08 17:55 ` Dave Jiang
2026-01-10 13:43 ` Koichiro Den
0 siblings, 1 reply; 61+ messages in thread
From: Dave Jiang @ 2026-01-08 17:55 UTC (permalink / raw)
To: Koichiro Den
Cc: Frank.Li, ntb, linux-pci, dmaengine, linux-renesas-soc, netdev,
linux-kernel, mani, kwilczynski, kishon, bhelgaas, corbet,
geert+renesas, magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro,
will, robin.murphy, jdmason, allenbh, andrew+netdev, davem,
edumazet, kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k,
kurt.schwemmer, logang, jingoohan1, lpieralisi, utkarsh02t,
jbrunet, dlemoal, arnd, elfring
On 1/7/26 6:25 PM, Koichiro Den wrote:
> On Wed, Jan 07, 2026 at 12:02:15PM -0700, Dave Jiang wrote:
>>
>>
>> On 1/7/26 7:54 AM, Koichiro Den wrote:
>>> On Tue, Jan 06, 2026 at 11:51:03AM -0700, Dave Jiang wrote:
>>>>
>>>>
>>>> On 12/17/25 8:16 AM, Koichiro Den wrote:
>>>>> Add a new ntb_transport backend that uses a DesignWare eDMA engine
>>>>> located on the endpoint, to be driven by both host and endpoint.
>>>>>
>>>>> The endpoint exposes a dedicated memory window which contains the eDMA
>>>>> register block, a small control structure (struct ntb_edma_info) and
>>>>> per-channel linked-list (LL) rings for read channels. Endpoint drives
>>>>> its local eDMA write channels for its transmission, while host side
>>>>> uses the remote eDMA read channels for its transmission.
>>>>>
>>>>> A key benefit of this backend is that the memory window no longer needs
>>>>> to carry data-plane payload. This makes the design less sensitive to
>>>>> limited memory window space and allows scaling to multiple queue pairs.
>>>>> The memory window layout is specific to the eDMA-backed backend, so
>>>>> there is no automatic fallback to the memcpy-based default transport,
>>>>> which requires a different layout.
>>>>>
>>>>> Signed-off-by: Koichiro Den <den@valinux.co.jp>
>>>>> ---
>>>>> drivers/ntb/Kconfig | 12 +
>>>>> drivers/ntb/Makefile | 2 +
>>>>> drivers/ntb/ntb_transport_core.c | 15 +-
>>>>> drivers/ntb/ntb_transport_edma.c | 987 +++++++++++++++++++++++++++
>>>>> drivers/ntb/ntb_transport_internal.h | 15 +
>>>>> 5 files changed, 1029 insertions(+), 2 deletions(-)
>>>>> create mode 100644 drivers/ntb/ntb_transport_edma.c
>>>>>
>>>>> diff --git a/drivers/ntb/Kconfig b/drivers/ntb/Kconfig
>>>>> index df16c755b4da..5ba6d0b7f5ba 100644
>>>>> --- a/drivers/ntb/Kconfig
>>>>> +++ b/drivers/ntb/Kconfig
>>>>> @@ -37,4 +37,16 @@ config NTB_TRANSPORT
>>>>>
>>>>> If unsure, say N.
>>>>>
>>>>> +config NTB_TRANSPORT_EDMA
>>>>> + bool "NTB Transport backed by remote eDMA"
>>>>> + depends on NTB_TRANSPORT
>>>>> + depends on PCI
>>>>> + select DMA_ENGINE
>>>>> + select NTB_EDMA
>>>>> + help
>>>>> + Enable a transport backend that uses a remote DesignWare eDMA engine
>>>>> + exposed through a dedicated NTB memory window. The host uses the
>>>>> + endpoint's eDMA engine to move data in both directions.
>>>>> + Say Y here if you intend to use the 'use_remote_edma' module parameter.
>>>>> +
>>>>> endif # NTB
>>>>> diff --git a/drivers/ntb/Makefile b/drivers/ntb/Makefile
>>>>> index 9b66e5fafbc0..b9086b32ecde 100644
>>>>> --- a/drivers/ntb/Makefile
>>>>> +++ b/drivers/ntb/Makefile
>>>>> @@ -6,3 +6,5 @@ ntb-y := core.o
>>>>> ntb-$(CONFIG_NTB_MSI) += msi.o
>>>>>
>>>>> ntb_transport-y := ntb_transport_core.o
>>>>> +ntb_transport-$(CONFIG_NTB_TRANSPORT_EDMA) += ntb_transport_edma.o
>>>>> +ntb_transport-$(CONFIG_NTB_TRANSPORT_EDMA) += hw/edma/ntb_hw_edma.o
>>>>> diff --git a/drivers/ntb/ntb_transport_core.c b/drivers/ntb/ntb_transport_core.c
>>>>> index 40c2548f5930..bd21232f26fe 100644
>>>>> --- a/drivers/ntb/ntb_transport_core.c
>>>>> +++ b/drivers/ntb/ntb_transport_core.c
>>>>> @@ -104,6 +104,12 @@ module_param(use_msi, bool, 0644);
>>>>> MODULE_PARM_DESC(use_msi, "Use MSI interrupts instead of doorbells");
>>>>> #endif
>>>>>
>>>>> +bool use_remote_edma;
>>>>> +#ifdef CONFIG_NTB_TRANSPORT_EDMA
>>>>> +module_param(use_remote_edma, bool, 0644);
>>>>> +MODULE_PARM_DESC(use_remote_edma, "Use remote eDMA mode (when enabled, use_msi is ignored)");
>>>>> +#endif
>>>>
>>>> This seems clunky. Can the ntb_transport_core determine this when the things are called through ntb_transport_edma? Or maybe a set_transport_type can be introduced by the transport itself during allocation?
>>>
>>> Agreed. I plan to drop 'use_remote_edma' and instead,
>>> - add a module parameter: transport_type={"default","edma"} (defaulting to "default"),
>>> - introduce ntb_transport_backend_register() for transports to self-register via
>>> struct ntb_transport_backend { .name, .ops }, and
>>> - have the core select the backend whose .name matches transport_type.
>>>
>>> I think this should keep any non-default transport-specific logic out of
>>> ntb_transport_core, or at least keep it to a minimum, while still allowing
>>> non-default transports (*ntb_transport_edma is the only choice for now
>>> though) to plug in cleanly.
>>>
>>> If you see a cleaner approach, I would appreciate it if you could elaborate
>>> a bit more on your idea.
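
A minimal userspace sketch of how such self-registration and name matching could look; every name, signature, and limit here is illustrative, not taken from the series:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative ops table; a real backend would carry the transport
 * init/teardown and queue hooks discussed in the thread. */
struct ntb_transport_backend_ops {
	int (*init)(void *nt);
};

/* Proposed self-registration handle: the core matches .name against
 * the transport_type module parameter ("default" or "edma"). */
struct ntb_transport_backend {
	const char *name;
	const struct ntb_transport_backend_ops *ops;
};

#define MAX_BACKENDS 4
static const struct ntb_transport_backend *backends[MAX_BACKENDS];
static int nr_backends;

static int ntb_transport_backend_register(const struct ntb_transport_backend *b)
{
	if (nr_backends >= MAX_BACKENDS)
		return -1;	/* in-kernel this would be a -E errno */
	backends[nr_backends++] = b;
	return 0;
}

/* Core side: pick the backend whose name matches transport_type;
 * NULL means no match, and the caller falls back to the default. */
static const struct ntb_transport_backend *
ntb_transport_backend_find(const char *transport_type)
{
	int i;

	for (i = 0; i < nr_backends; i++)
		if (!strcmp(backends[i]->name, transport_type))
			return backends[i];
	return NULL;
}
```

The point is that ntb_transport_core only ever walks the registered list; the eDMA-specific logic stays in the backend module that registered itself.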
>>
>
> Thank you for the comment, let me respond inline below.
>
>> Do you think it's flexible enough that we can determine a transport type per 'ntb_transport_mw' or is this an all or nothing type of thing?
>
> At least in the current implementation, the remote eDMA use is an
> all-or-nothing type rather than something that can be selected per
> ntb_transport_mw.
>
> The way remote eDMA consumes MWs is quite similar to how ntb_msi uses them
> today. Assuming multiple MWs are available, the last MW is reserved to
> expose the remote eDMA info/register/LL regions to the host by packing all
> of them into a single MW. In that sense, it does not map naturally to a
> per-MW selection model.
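
To make the "pack everything into a single MW" point concrete, here is a toy offset calculation for the three regions the mail describes (eDMA info, register block, per-channel LL rings); the sizes and the 64-byte alignment are invented for illustration and do not reflect the actual patch layout:

```c
#include <assert.h>
#include <stddef.h>

static size_t align_up(size_t off, size_t a)
{
	return (off + a - 1) & ~(a - 1);	/* a must be a power of two */
}

struct edma_mw_layout {
	size_t info_off;	/* struct ntb_edma_info */
	size_t regs_off;	/* eDMA register block */
	size_t ll_off;		/* per-channel LL rings */
	size_t total;
};

/* Lay the three regions out back to back and check that they fit in
 * the one (last) memory window reserved for remote eDMA. */
static int edma_mw_pack(struct edma_mw_layout *l, size_t info_sz,
			size_t regs_sz, size_t ll_sz, size_t mw_sz)
{
	l->info_off = 0;
	l->regs_off = align_up(l->info_off + info_sz, 64);
	l->ll_off   = align_up(l->regs_off + regs_sz, 64);
	l->total    = l->ll_off + ll_sz;
	return l->total <= mw_sz ? 0 : -1;
}
```

Because all three regions share one window, the choice is inherently per-device rather than per-MW, which is why a per-ntb_transport_mw selection model does not map onto it.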
>
>> I'm trying to see if we can do away with the module param.
>
> I think it is useful to keep an explicit way for an administrator to choose
> the transport type (default vs edma). Even on platforms where dw-edma is
> available, there can potentially be platform-specific or hard-to-reproduce
> issues (e.g. problems that only show up with certain transfer patterns),
> and having a way to fall back to the long-existing traditional transport can
> be valuable.
>
> That said, I am not opposed to making the default behavior an automatic
> selection, where edma is chosen when it's available and the parameter is
> left unset.
>
>> Or I guess when you probe ntb_netdev, the selection would happen there and thus transport_type would be in ntb_netdev module?
>
> I'm not sure how selecting the transport type at ntb_netdev probe time
> would work in practice, and what additional benefit that would provide.
So currently ntb_netdev or ntb_transport are not auto-loaded, right? They are manually probed by the user. So with the new transport, the user would modprobe ntb_transport_edma.ko, and that would trigger the eDMA transport setup, right?

With the ntb_transport_core library existing, we should theoretically be able to load both ntb_transport_host and ntb_transport_edma at the same time, and ntb_netdev should be able to select one transport or the other. This is the most versatile scenario. An alternative is that only one transport can ever be loaded: when ntb_transport_edma is loaded, it just looks like the default transport, and netdev functions as it always has without knowing what the underlying transport is.

On a platform with multiple NTB ports, it would be nice to have the flexibility of letting each port choose between the current transport and the edma transport if the user desires.
DJ
>
> Kind regards,
> Koichiro
>
>>
>>>
>>> Thanks,
>>> Koichiro
>>>
>>>>
>>>> DJ
>>>>
>>>>> +
>>>>> static struct dentry *nt_debugfs_dir;
>>>>>
>>>>> /* Only two-ports NTB devices are supported */
>>>>> @@ -156,7 +162,7 @@ enum {
>>>>> #define drv_client(__drv) \
>>>>> container_of((__drv), struct ntb_transport_client, driver)
>>>>>
>>>>> -#define NTB_QP_DEF_NUM_ENTRIES 100
>>>>> +#define NTB_QP_DEF_NUM_ENTRIES 128
>>>>> #define NTB_LINK_DOWN_TIMEOUT 10
>>>>>
>>>>> static void ntb_transport_rxc_db(unsigned long data);
>>>>> @@ -1189,7 +1195,11 @@ static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
>>>>>
>>>>> nt->ndev = ndev;
>>>>>
>>>>> - rc = ntb_transport_default_init(nt);
>>>>> + if (use_remote_edma)
>>>>> + rc = ntb_transport_edma_init(nt);
>>>>> + else
>>>>> + rc = ntb_transport_default_init(nt);
>>>>> +
>>>>> if (rc)
>>>>> return rc;
>>>>>
>>>>> @@ -1950,6 +1960,7 @@ ntb_transport_create_queue(void *data, struct device *client_dev,
>>>>>
>>>>> nt->qp_bitmap_free &= ~qp_bit;
>>>>>
>>>>> + qp->qp_bit = qp_bit;
>>>>> qp->cb_data = data;
>>>>> qp->rx_handler = handlers->rx_handler;
>>>>> qp->tx_handler = handlers->tx_handler;
>>>>> diff --git a/drivers/ntb/ntb_transport_edma.c b/drivers/ntb/ntb_transport_edma.c
>>>>> new file mode 100644
>>>>> index 000000000000..6ae5da0a1367
>>>>> --- /dev/null
>>>>> +++ b/drivers/ntb/ntb_transport_edma.c
>>>>> @@ -0,0 +1,987 @@
>>>>> +// SPDX-License-Identifier: GPL-2.0-only
>>>>> +/*
>>>>> + * NTB transport backend for remote DesignWare eDMA.
>>>>> + *
>>>>> + * This implements the backend_ops used when use_remote_edma=1 and
>>>>> + * relies on drivers/ntb/hw/edma/ for low-level eDMA/MW programming.
>>>>> + */
>>>>> +
>>>>> +#include <linux/bug.h>
>>>>> +#include <linux/compiler.h>
>>>>> +#include <linux/debugfs.h>
>>>>> +#include <linux/dmaengine.h>
>>>>> +#include <linux/dma-mapping.h>
>>>>> +#include <linux/errno.h>
>>>>> +#include <linux/io-64-nonatomic-lo-hi.h>
>>>>> +#include <linux/ntb.h>
>>>>> +#include <linux/pci.h>
>>>>> +#include <linux/pci-epc.h>
>>>>> +#include <linux/seq_file.h>
>>>>> +#include <linux/slab.h>
>>>>> +
>>>>> +#include "hw/edma/ntb_hw_edma.h"
>>>>> +#include "ntb_transport_internal.h"
>>>>> +
>>>>> +#define NTB_EDMA_RING_ORDER 7
>>>>> +#define NTB_EDMA_RING_ENTRIES (1U << NTB_EDMA_RING_ORDER)
>>>>> +#define NTB_EDMA_RING_MASK (NTB_EDMA_RING_ENTRIES - 1)
>>>>> +
>>>>> +#define NTB_EDMA_MAX_POLL 32
>>>>> +
>>>>> +/*
>>>>> + * Remote eDMA mode implementation
>>>>> + */
>>>>> +struct ntb_transport_ctx_edma {
>>>>> + remote_edma_mode_t remote_edma_mode;
>>>>> + struct device *dma_dev;
>>>>> + struct workqueue_struct *wq;
>>>>> + struct ntb_edma_chans chans;
>>>>> +};
>>>>> +
>>>>> +struct ntb_transport_qp_edma {
>>>>> + struct ntb_transport_qp *qp;
>>>>> +
>>>>> + /*
>>>>> + * For ensuring peer notification in non-atomic context.
>>>>> + * ntb_peer_db_set might sleep or schedule.
>>>>> + */
>>>>> + struct work_struct db_work;
>>>>> +
>>>>> + u32 rx_prod;
>>>>> + u32 rx_cons;
>>>>> + u32 tx_cons;
>>>>> + u32 tx_issue;
>>>>> +
>>>>> + spinlock_t rx_lock;
>>>>> + spinlock_t tx_lock;
>>>>> +
>>>>> + struct work_struct rx_work;
>>>>> + struct work_struct tx_work;
>>>>> +};
>>>>> +
>>>>> +struct ntb_edma_desc {
>>>>> + u32 len;
>>>>> + u32 flags;
>>>>> + u64 addr; /* DMA address */
>>>>> + u64 data;
>>>>> +};
>>>>> +
>>>>> +struct ntb_edma_ring {
>>>>> + struct ntb_edma_desc desc[NTB_EDMA_RING_ENTRIES];
>>>>> + u32 head;
>>>>> + u32 tail;
>>>>> +};
>>>>> +
>>>>> +static inline bool ntb_qp_edma_is_rc(struct ntb_transport_qp *qp)
>>>>> +{
>>>>> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
>>>>> +
>>>>> + return ctx->remote_edma_mode == REMOTE_EDMA_RC;
>>>>> +}
>>>>> +
>>>>> +static inline bool ntb_qp_edma_is_ep(struct ntb_transport_qp *qp)
>>>>> +{
>>>>> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
>>>>> +
>>>>> + return ctx->remote_edma_mode == REMOTE_EDMA_EP;
>>>>> +}
>>>>> +
>>>>> +static inline bool ntb_qp_edma_enabled(struct ntb_transport_qp *qp)
>>>>> +{
>>>>> + return ntb_qp_edma_is_rc(qp) || ntb_qp_edma_is_ep(qp);
>>>>> +}
>>>>> +
>>>>> +static inline unsigned int ntb_edma_ring_sel(struct ntb_transport_qp *qp,
>>>>> + unsigned int n)
>>>>> +{
>>>>> + return n ^ !!ntb_qp_edma_is_ep(qp);
>>>>> +}
>>>>> +
>>>>> +static inline struct ntb_edma_ring *
>>>>> +ntb_edma_ring_local(struct ntb_transport_qp *qp, unsigned int n)
>>>>> +{
>>>>> + unsigned int r = ntb_edma_ring_sel(qp, n);
>>>>> +
>>>>> + return &((struct ntb_edma_ring *)qp->rx_buff)[r];
>>>>> +}
>>>>> +
>>>>> +static inline struct ntb_edma_ring __iomem *
>>>>> +ntb_edma_ring_remote(struct ntb_transport_qp *qp, unsigned int n)
>>>>> +{
>>>>> + unsigned int r = ntb_edma_ring_sel(qp, n);
>>>>> +
>>>>> + return &((struct ntb_edma_ring __iomem *)qp->tx_mw)[r];
>>>>> +}
>>>>> +
>>>>> +static inline struct ntb_edma_desc *
>>>>> +ntb_edma_desc_local(struct ntb_transport_qp *qp, unsigned int n, unsigned int i)
>>>>> +{
>>>>> + return &ntb_edma_ring_local(qp, n)->desc[i];
>>>>> +}
>>>>> +
>>>>> +static inline struct ntb_edma_desc __iomem *
>>>>> +ntb_edma_desc_remote(struct ntb_transport_qp *qp, unsigned int n,
>>>>> + unsigned int i)
>>>>> +{
>>>>> + return &ntb_edma_ring_remote(qp, n)->desc[i];
>>>>> +}
>>>>> +
>>>>> +static inline u32 *ntb_edma_head_local(struct ntb_transport_qp *qp,
>>>>> + unsigned int n)
>>>>> +{
>>>>> + return &ntb_edma_ring_local(qp, n)->head;
>>>>> +}
>>>>> +
>>>>> +static inline u32 __iomem *ntb_edma_head_remote(struct ntb_transport_qp *qp,
>>>>> + unsigned int n)
>>>>> +{
>>>>> + return &ntb_edma_ring_remote(qp, n)->head;
>>>>> +}
>>>>> +
>>>>> +static inline u32 *ntb_edma_tail_local(struct ntb_transport_qp *qp,
>>>>> + unsigned int n)
>>>>> +{
>>>>> + return &ntb_edma_ring_local(qp, n)->tail;
>>>>> +}
>>>>> +
>>>>> +static inline u32 __iomem *ntb_edma_tail_remote(struct ntb_transport_qp *qp,
>>>>> + unsigned int n)
>>>>> +{
>>>>> + return &ntb_edma_ring_remote(qp, n)->tail;
>>>>> +}
>>>>> +
>>>>> +/* The 'i' must be generated by ntb_edma_ring_idx() */
>>>>> +#define NTB_DESC_TX_O(qp, i) ntb_edma_desc_remote(qp, 0, i)
>>>>> +#define NTB_DESC_TX_I(qp, i) ntb_edma_desc_local(qp, 0, i)
>>>>> +#define NTB_DESC_RX_O(qp, i) ntb_edma_desc_remote(qp, 1, i)
>>>>> +#define NTB_DESC_RX_I(qp, i) ntb_edma_desc_local(qp, 1, i)
>>>>> +
>>>>> +#define NTB_HEAD_TX_I(qp) ntb_edma_head_local(qp, 0)
>>>>> +#define NTB_HEAD_RX_O(qp) ntb_edma_head_remote(qp, 1)
>>>>> +
>>>>> +#define NTB_TAIL_TX_O(qp) ntb_edma_tail_remote(qp, 0)
>>>>> +#define NTB_TAIL_RX_I(qp) ntb_edma_tail_local(qp, 1)
>>>>> +
>>>>> +/* ntb_edma_ring helpers */
>>>>> +static __always_inline u32 ntb_edma_ring_idx(u32 v)
>>>>> +{
>>>>> + return v & NTB_EDMA_RING_MASK;
>>>>> +}
>>>>> +
>>>>> +static __always_inline u32 ntb_edma_ring_used_entry(u32 head, u32 tail)
>>>>> +{
>>>>> + if (head >= tail) {
>>>>> + WARN_ON_ONCE((head - tail) > (NTB_EDMA_RING_ENTRIES - 1));
>>>>> + return head - tail;
>>>>> + }
>>>>> +
>>>>> + WARN_ON_ONCE((U32_MAX - tail + head + 1) > (NTB_EDMA_RING_ENTRIES - 1));
>>>>> + return U32_MAX - tail + head + 1;
>>>>> +}
>>>>> +
>>>>> +static __always_inline u32 ntb_edma_ring_free_entry(u32 head, u32 tail)
>>>>> +{
>>>>> + return NTB_EDMA_RING_ENTRIES - ntb_edma_ring_used_entry(head, tail) - 1;
>>>>> +}
>>>>> +
>>>>> +static __always_inline bool ntb_edma_ring_full(u32 head, u32 tail)
>>>>> +{
>>>>> + return ntb_edma_ring_free_entry(head, tail) == 0;
>>>>> +}
>>>>> +
>>>>> +static unsigned int ntb_transport_edma_tx_free_entry(struct ntb_transport_qp *qp)
>>>>> +{
>>>>> + struct ntb_transport_qp_edma *edma = qp->priv;
>>>>> + unsigned int head, tail;
>>>>> +
>>>>> + scoped_guard(spinlock_irqsave, &edma->tx_lock) {
>>>>> + /* In this scope, only 'head' might proceed */
>>>>> + tail = READ_ONCE(edma->tx_issue);
>>>>> + head = READ_ONCE(*NTB_HEAD_TX_I(qp));
>>>>> + }
>>>>> + /*
>>>>> + * 'used' amount indicates how much the other end has refilled,
>>>>> + * which are available for us to use for TX.
>>>>> + */
>>>>> + return ntb_edma_ring_used_entry(head, tail);
>>>>> +}
>>>>> +
>>>>> +static void ntb_transport_edma_debugfs_stats_show(struct seq_file *s,
>>>>> + struct ntb_transport_qp *qp)
>>>>> +{
>>>>> + seq_printf(s, "rx_bytes - \t%llu\n", qp->rx_bytes);
>>>>> + seq_printf(s, "rx_pkts - \t%llu\n", qp->rx_pkts);
>>>>> + seq_printf(s, "rx_err_no_buf - %llu\n", qp->rx_err_no_buf);
>>>>> + seq_printf(s, "rx_buff - \t0x%p\n", qp->rx_buff);
>>>>> + seq_printf(s, "rx_max_entry - \t%u\n", qp->rx_max_entry);
>>>>> + seq_printf(s, "rx_alloc_entry - \t%u\n\n", qp->rx_alloc_entry);
>>>>> +
>>>>> + seq_printf(s, "tx_bytes - \t%llu\n", qp->tx_bytes);
>>>>> + seq_printf(s, "tx_pkts - \t%llu\n", qp->tx_pkts);
>>>>> + seq_printf(s, "tx_ring_full - \t%llu\n", qp->tx_ring_full);
>>>>> + seq_printf(s, "tx_err_no_buf - %llu\n", qp->tx_err_no_buf);
>>>>> + seq_printf(s, "tx_mw - \t0x%p\n", qp->tx_mw);
>>>>> + seq_printf(s, "tx_max_entry - \t%u\n", qp->tx_max_entry);
>>>>> + seq_printf(s, "free tx - \t%u\n", ntb_transport_tx_free_entry(qp));
>>>>> + seq_putc(s, '\n');
>>>>> +
>>>>> + seq_puts(s, "Using Remote eDMA - Yes\n");
>>>>> + seq_printf(s, "QP Link - \t%s\n", qp->link_is_up ? "Up" : "Down");
>>>>> +}
>>>>> +
>>>>> +static void ntb_transport_edma_uninit(struct ntb_transport_ctx *nt)
>>>>> +{
>>>>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
>>>>> +
>>>>> + if (ctx->wq)
>>>>> + destroy_workqueue(ctx->wq);
>>>>> + ctx->wq = NULL;
>>>>> +
>>>>> + ntb_edma_teardown_chans(&ctx->chans);
>>>>> +
>>>>> + switch (ctx->remote_edma_mode) {
>>>>> + case REMOTE_EDMA_EP:
>>>>> + ntb_edma_teardown_mws(nt->ndev);
>>>>> + break;
>>>>> + case REMOTE_EDMA_RC:
>>>>> + ntb_edma_teardown_peer(nt->ndev);
>>>>> + break;
>>>>> + case REMOTE_EDMA_UNKNOWN:
>>>>> + default:
>>>>> + break;
>>>>> + }
>>>>> +
>>>>> + ctx->remote_edma_mode = REMOTE_EDMA_UNKNOWN;
>>>>> +}
>>>>> +
>>>>> +static void ntb_transport_edma_db_work(struct work_struct *work)
>>>>> +{
>>>>> + struct ntb_transport_qp_edma *edma =
>>>>> + container_of(work, struct ntb_transport_qp_edma, db_work);
>>>>> + struct ntb_transport_qp *qp = edma->qp;
>>>>> +
>>>>> + ntb_peer_db_set(qp->ndev, qp->qp_bit);
>>>>> +}
>>>>> +
>>>>> +static void ntb_transport_edma_notify_peer(struct ntb_transport_qp_edma *edma)
>>>>> +{
>>>>> + struct ntb_transport_qp *qp = edma->qp;
>>>>> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
>>>>> +
>>>>> + if (!ntb_edma_notify_peer(&ctx->chans, qp->qp_num))
>>>>> + return;
>>>>> +
>>>>> + /*
>>>>> + * Called from contexts that may be atomic. Since ntb_peer_db_set()
>>>>> + * may sleep, delegate the actual doorbell write to a workqueue.
>>>>> + */
>>>>> + queue_work(system_highpri_wq, &edma->db_work);
>>>>> +}
>>>>> +
>>>>> +static void ntb_transport_edma_isr(void *data, int qp_num)
>>>>> +{
>>>>> + struct ntb_transport_ctx *nt = data;
>>>>> + struct ntb_transport_qp_edma *edma;
>>>>> + struct ntb_transport_ctx_edma *ctx;
>>>>> + struct ntb_transport_qp *qp;
>>>>> +
>>>>> + if (qp_num < 0 || qp_num >= nt->qp_count)
>>>>> + return;
>>>>> +
>>>>> + qp = &nt->qp_vec[qp_num];
>>>>> + if (WARN_ON(!qp))
>>>>> + return;
>>>>> +
>>>>> + ctx = (struct ntb_transport_ctx_edma *)qp->transport->priv;
>>>>> + edma = qp->priv;
>>>>> +
>>>>> + queue_work(ctx->wq, &edma->rx_work);
>>>>> + queue_work(ctx->wq, &edma->tx_work);
>>>>> +}
>>>>> +
>>>>> +static int ntb_transport_edma_rc_init(struct ntb_transport_ctx *nt)
>>>>> +{
>>>>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
>>>>> + struct ntb_dev *ndev = nt->ndev;
>>>>> + struct pci_dev *pdev = ndev->pdev;
>>>>> + int peer_mw;
>>>>> + int rc;
>>>>> +
>>>>> + if (!use_remote_edma || ctx->remote_edma_mode != REMOTE_EDMA_UNKNOWN)
>>>>> + return 0;
>>>>> +
>>>>> + peer_mw = ntb_peer_mw_count(ndev);
>>>>> + if (peer_mw <= 0)
>>>>> + return -ENODEV;
>>>>> +
>>>>> + rc = ntb_edma_setup_peer(ndev, peer_mw - 1, nt->qp_count);
>>>>> + if (rc) {
>>>>> + dev_err(&pdev->dev, "Failed to enable remote eDMA: %d\n", rc);
>>>>> + return rc;
>>>>> + }
>>>>> +
>>>>> + rc = ntb_edma_setup_chans(get_dma_dev(ndev), &ctx->chans, true);
>>>>> + if (rc) {
>>>>> + dev_err(&pdev->dev, "Failed to setup eDMA channels: %d\n", rc);
>>>>> + goto err_teardown_peer;
>>>>> + }
>>>>> +
>>>>> + rc = ntb_edma_setup_intr_chan(get_dma_dev(ndev), &ctx->chans);
>>>>> + if (rc) {
>>>>> + dev_err(&pdev->dev, "Failed to setup eDMA notify channel: %d\n",
>>>>> + rc);
>>>>> + goto err_teardown_chans;
>>>>> + }
>>>>> +
>>>>> + ctx->remote_edma_mode = REMOTE_EDMA_RC;
>>>>> + return 0;
>>>>> +
>>>>> +err_teardown_chans:
>>>>> + ntb_edma_teardown_chans(&ctx->chans);
>>>>> +err_teardown_peer:
>>>>> + ntb_edma_teardown_peer(ndev);
>>>>> + return rc;
>>>>> +}
>>>>> +
>>>>> +
>>>>> +static int ntb_transport_edma_ep_init(struct ntb_transport_ctx *nt)
>>>>> +{
>>>>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
>>>>> + struct ntb_dev *ndev = nt->ndev;
>>>>> + struct pci_dev *pdev = ndev->pdev;
>>>>> + int peer_mw;
>>>>> + int rc;
>>>>> +
>>>>> + if (!use_remote_edma || ctx->remote_edma_mode == REMOTE_EDMA_EP)
>>>>> + return 0;
>>>>> +
>>>>> + /**
>>>>> + * This check assumes that the endpoint (pci-epf-vntb.c)
>>>>> + * ntb_dev_ops implements .get_private_data() while the host side
>>>>> + * (ntb_hw_epf.c) does not.
>>>>> + */
>>>>> + if (!ntb_get_private_data(ndev))
>>>>> + return 0;
>>>>> +
>>>>> + peer_mw = ntb_peer_mw_count(ndev);
>>>>> + if (peer_mw <= 0)
>>>>> + return -ENODEV;
>>>>> +
>>>>> + rc = ntb_edma_setup_mws(ndev, peer_mw - 1, nt->qp_count,
>>>>> + ntb_transport_edma_isr, nt);
>>>>> + if (rc) {
>>>>> + dev_err(&pdev->dev,
>>>>> + "Failed to set up memory window for eDMA: %d\n", rc);
>>>>> + return rc;
>>>>> + }
>>>>> +
>>>>> + rc = ntb_edma_setup_chans(get_dma_dev(ndev), &ctx->chans, false);
>>>>> + if (rc) {
>>>>> + dev_err(&pdev->dev, "Failed to setup eDMA channels: %d\n", rc);
>>>>> + ntb_edma_teardown_mws(ndev);
>>>>> + return rc;
>>>>> + }
>>>>> +
>>>>> + ctx->remote_edma_mode = REMOTE_EDMA_EP;
>>>>> + return 0;
>>>>> +}
>>>>> +
>>>>> +
>>>>> +static int ntb_transport_edma_setup_qp_mw(struct ntb_transport_ctx *nt,
>>>>> + unsigned int qp_num)
>>>>> +{
>>>>> + struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
>>>>> + struct ntb_dev *ndev = nt->ndev;
>>>>> + struct ntb_queue_entry *entry;
>>>>> + struct ntb_transport_mw *mw;
>>>>> + unsigned int mw_num, mw_count, qp_count;
>>>>> + unsigned int qp_offset, rx_info_offset;
>>>>> + unsigned int mw_size, mw_size_per_qp;
>>>>> + unsigned int num_qps_mw;
>>>>> + size_t edma_total;
>>>>> + unsigned int i;
>>>>> + int node;
>>>>> +
>>>>> + mw_count = nt->mw_count;
>>>>> + qp_count = nt->qp_count;
>>>>> +
>>>>> + mw_num = QP_TO_MW(nt, qp_num);
>>>>> + mw = &nt->mw_vec[mw_num];
>>>>> +
>>>>> + if (!mw->virt_addr)
>>>>> + return -ENOMEM;
>>>>> +
>>>>> + if (mw_num < qp_count % mw_count)
>>>>> + num_qps_mw = qp_count / mw_count + 1;
>>>>> + else
>>>>> + num_qps_mw = qp_count / mw_count;
>>>>> +
>>>>> + mw_size = min(nt->mw_vec[mw_num].phys_size, mw->xlat_size);
>>>>> + if (max_mw_size && mw_size > max_mw_size)
>>>>> + mw_size = max_mw_size;
>>>>> +
>>>>> + mw_size_per_qp = round_down((unsigned int)mw_size / num_qps_mw, SZ_64);
>>>>> + qp_offset = mw_size_per_qp * (qp_num / mw_count);
>>>>> + rx_info_offset = mw_size_per_qp - sizeof(struct ntb_rx_info);
>>>>> +
>>>>> + qp->tx_mw_size = mw_size_per_qp;
>>>>> + qp->tx_mw = nt->mw_vec[mw_num].vbase + qp_offset;
>>>>> + if (!qp->tx_mw)
>>>>> + return -EINVAL;
>>>>> + qp->tx_mw_phys = nt->mw_vec[mw_num].phys_addr + qp_offset;
>>>>> + if (!qp->tx_mw_phys)
>>>>> + return -EINVAL;
>>>>> + qp->rx_info = qp->tx_mw + rx_info_offset;
>>>>> + qp->rx_buff = mw->virt_addr + qp_offset;
>>>>> + qp->remote_rx_info = qp->rx_buff + rx_info_offset;
>>>>> +
>>>>> + /* Due to housekeeping, there must be at least 2 buffs */
>>>>> + qp->tx_max_frame = min(transport_mtu, mw_size_per_qp / 2);
>>>>> + qp->rx_max_frame = min(transport_mtu, mw_size_per_qp / 2);
>>>>> +
>>>>> + /* In eDMA mode, decouple from MW sizing and force ring-sized entries */
>>>>> + edma_total = 2 * sizeof(struct ntb_edma_ring);
>>>>> + if (rx_info_offset < edma_total) {
>>>>> + dev_err(&ndev->dev, "Ring space requires %zuB (>=%uB)\n",
>>>>> + edma_total, rx_info_offset);
>>>>> + return -EINVAL;
>>>>> + }
>>>>> + qp->tx_max_entry = NTB_EDMA_RING_ENTRIES;
>>>>> + qp->rx_max_entry = NTB_EDMA_RING_ENTRIES;
>>>>> +
>>>>> + /*
>>>>> + * Checking to see if we have more entries than the default.
>>>>> + * We should add additional entries if that is the case so we
>>>>> + * can be in sync with the transport frames.
>>>>> + */
>>>>> + node = dev_to_node(&ndev->dev);
>>>>> + for (i = qp->rx_alloc_entry; i < qp->rx_max_entry; i++) {
>>>>> + entry = kzalloc_node(sizeof(*entry), GFP_KERNEL, node);
>>>>> + if (!entry)
>>>>> + return -ENOMEM;
>>>>> +
>>>>> + entry->qp = qp;
>>>>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry,
>>>>> + &qp->rx_free_q);
>>>>> + qp->rx_alloc_entry++;
>>>>> + }
>>>>> +
>>>>> + memset(qp->rx_buff, 0, edma_total);
>>>>> +
>>>>> + qp->rx_pkts = 0;
>>>>> + qp->tx_pkts = 0;
>>>>> +
>>>>> + return 0;
>>>>> +}
>>>>> +
>>>>> +static int ntb_transport_edma_rx_complete(struct ntb_transport_qp *qp)
>>>>> +{
>>>>> + struct device *dma_dev = get_dma_dev(qp->ndev);
>>>>> + struct ntb_transport_qp_edma *edma = qp->priv;
>>>>> + struct ntb_queue_entry *entry;
>>>>> + struct ntb_edma_desc *in;
>>>>> + unsigned int len;
>>>>> + bool link_down;
>>>>> + u32 idx;
>>>>> +
>>>>> + if (ntb_edma_ring_used_entry(READ_ONCE(*NTB_TAIL_RX_I(qp)),
>>>>> + edma->rx_cons) == 0)
>>>>> + return 0;
>>>>> +
>>>>> + idx = ntb_edma_ring_idx(edma->rx_cons);
>>>>> + in = NTB_DESC_RX_I(qp, idx);
>>>>> + if (!(in->flags & DESC_DONE_FLAG))
>>>>> + return 0;
>>>>> +
>>>>> + link_down = in->flags & LINK_DOWN_FLAG;
>>>>> + in->flags = 0;
>>>>> + len = in->len; /* might be smaller than entry->len */
>>>>> +
>>>>> + entry = (struct ntb_queue_entry *)(uintptr_t)in->data;
>>>>> + if (WARN_ON(!entry))
>>>>> + return 0;
>>>>> +
>>>>> + if (link_down) {
>>>>> + ntb_qp_link_down(qp);
>>>>> + edma->rx_cons++;
>>>>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_free_q);
>>>>> + return 1;
>>>>> + }
>>>>> +
>>>>> + dma_unmap_single(dma_dev, entry->addr, entry->len, DMA_FROM_DEVICE);
>>>>> +
>>>>> + qp->rx_bytes += len;
>>>>> + qp->rx_pkts++;
>>>>> + edma->rx_cons++;
>>>>> +
>>>>> + if (qp->rx_handler && qp->client_ready)
>>>>> + qp->rx_handler(qp, qp->cb_data, entry->cb_data, len);
>>>>> +
>>>>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_free_q);
>>>>> + return 1;
>>>>> +}
>>>>> +
>>>>> +static void ntb_transport_edma_rx_work(struct work_struct *work)
>>>>> +{
>>>>> + struct ntb_transport_qp_edma *edma = container_of(
>>>>> + work, struct ntb_transport_qp_edma, rx_work);
>>>>> + struct ntb_transport_qp *qp = edma->qp;
>>>>> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
>>>>> + unsigned int i;
>>>>> +
>>>>> + for (i = 0; i < NTB_EDMA_MAX_POLL; i++) {
>>>>> + if (!ntb_transport_edma_rx_complete(qp))
>>>>> + break;
>>>>> + }
>>>>> +
>>>>> + if (ntb_transport_edma_rx_complete(qp))
>>>>> + queue_work(ctx->wq, &edma->rx_work);
>>>>> +}
>>>>> +
>>>>> +static void ntb_transport_edma_tx_work(struct work_struct *work)
>>>>> +{
>>>>> + struct ntb_transport_qp_edma *edma = container_of(
>>>>> + work, struct ntb_transport_qp_edma, tx_work);
>>>>> + struct ntb_transport_qp *qp = edma->qp;
>>>>> + struct ntb_edma_desc *in, __iomem *out;
>>>>> + struct ntb_queue_entry *entry;
>>>>> + unsigned int len;
>>>>> + void *cb_data;
>>>>> + u32 idx;
>>>>> +
>>>>> + while (ntb_edma_ring_used_entry(READ_ONCE(edma->tx_issue),
>>>>> + edma->tx_cons) != 0) {
>>>>> + /* Paired with smp_wmb() in ntb_transport_edma_tx_enqueue_inner() */
>>>>> + smp_rmb();
>>>>> +
>>>>> + idx = ntb_edma_ring_idx(edma->tx_cons);
>>>>> + in = NTB_DESC_TX_I(qp, idx);
>>>>> + entry = (struct ntb_queue_entry *)(uintptr_t)in->data;
>>>>> + if (!entry || !(entry->flags & DESC_DONE_FLAG))
>>>>> + break;
>>>>> +
>>>>> + in->data = 0;
>>>>> +
>>>>> + cb_data = entry->cb_data;
>>>>> + len = entry->len;
>>>>> +
>>>>> + out = NTB_DESC_TX_O(qp, idx);
>>>>> +
>>>>> + WRITE_ONCE(edma->tx_cons, edma->tx_cons + 1);
>>>>> +
>>>>> + /*
>>>>> + * No need to add barrier in-between to enforce ordering here.
>>>>> + * The other side proceeds only after both flags and tail are
>>>>> + * updated.
>>>>> + */
>>>>> + iowrite32(entry->flags, &out->flags);
>>>>> + iowrite32(edma->tx_cons, NTB_TAIL_TX_O(qp));
>>>>> +
>>>>> + ntb_transport_edma_notify_peer(edma);
>>>>> +
>>>>> + ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry,
>>>>> + &qp->tx_free_q);
>>>>> +
>>>>> + if (qp->tx_handler)
>>>>> + qp->tx_handler(qp, qp->cb_data, cb_data, len);
>>>>> +
>>>>> + /* stat updates */
>>>>> + qp->tx_bytes += len;
>>>>> + qp->tx_pkts++;
>>>>> + }
>>>>> +}
>>>>> +
>>>>> +static void ntb_transport_edma_tx_cb(void *data,
>>>>> + const struct dmaengine_result *res)
>>>>> +{
>>>>> + struct ntb_queue_entry *entry = data;
>>>>> + struct ntb_transport_qp *qp = entry->qp;
>>>>> + struct ntb_transport_ctx *nt = qp->transport;
>>>>> + struct device *dma_dev = get_dma_dev(qp->ndev);
>>>>> + enum dmaengine_tx_result dma_err = res->result;
>>>>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
>>>>> + struct ntb_transport_qp_edma *edma = qp->priv;
>>>>> +
>>>>> + switch (dma_err) {
>>>>> + case DMA_TRANS_READ_FAILED:
>>>>> + case DMA_TRANS_WRITE_FAILED:
>>>>> + case DMA_TRANS_ABORTED:
>>>>> + entry->errors++;
>>>>> + entry->len = -EIO;
>>>>> + break;
>>>>> + case DMA_TRANS_NOERROR:
>>>>> + default:
>>>>> + break;
>>>>> + }
>>>>> + dma_unmap_sg(dma_dev, &entry->sgl, 1, DMA_TO_DEVICE);
>>>>> + sg_dma_address(&entry->sgl) = 0;
>>>>> +
>>>>> + entry->flags |= DESC_DONE_FLAG;
>>>>> +
>>>>> + queue_work(ctx->wq, &edma->tx_work);
>>>>> +}
>>>>> +
>>>>> +static int ntb_transport_edma_submit(struct device *d, struct dma_chan *chan,
>>>>> + size_t len, void *rc_src, dma_addr_t dst,
>>>>> + struct ntb_queue_entry *entry)
>>>>> +{
>>>>> + struct scatterlist *sgl = &entry->sgl;
>>>>> + struct dma_async_tx_descriptor *txd;
>>>>> + struct dma_slave_config cfg;
>>>>> + dma_cookie_t cookie;
>>>>> + int nents, rc;
>>>>> +
>>>>> + if (!d)
>>>>> + return -ENODEV;
>>>>> +
>>>>> + if (!chan)
>>>>> + return -ENXIO;
>>>>> +
>>>>> + if (WARN_ON(!rc_src || !dst))
>>>>> + return -EINVAL;
>>>>> +
>>>>> + if (WARN_ON(sg_dma_address(sgl)))
>>>>> + return -EINVAL;
>>>>> +
>>>>> + sg_init_one(sgl, rc_src, len);
>>>>> + nents = dma_map_sg(d, sgl, 1, DMA_TO_DEVICE);
>>>>> + if (nents <= 0)
>>>>> + return -EIO;
>>>>> +
>>>>> + memset(&cfg, 0, sizeof(cfg));
>>>>> + cfg.dst_addr = dst;
>>>>> + cfg.src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
>>>>> + cfg.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
>>>>> + cfg.direction = DMA_MEM_TO_DEV;
>>>>> +
>>>>> + txd = dmaengine_prep_slave_sg_config(chan, sgl, 1, DMA_MEM_TO_DEV,
>>>>> + DMA_CTRL_ACK | DMA_PREP_INTERRUPT, &cfg);
>>>>> + if (!txd) {
>>>>> + rc = -EIO;
>>>>> + goto out_unmap;
>>>>> + }
>>>>> +
>>>>> + txd->callback_result = ntb_transport_edma_tx_cb;
>>>>> + txd->callback_param = entry;
>>>>> +
>>>>> + cookie = dmaengine_submit(txd);
>>>>> + if (dma_submit_error(cookie)) {
>>>>> + rc = -EIO;
>>>>> + goto out_unmap;
>>>>> + }
>>>>> + dma_async_issue_pending(chan);
>>>>> + return 0;
>>>>> +out_unmap:
>>>>> + dma_unmap_sg(d, sgl, 1, DMA_TO_DEVICE);
>>>>> + return rc;
>>>>> +}
>>>>> +
>>>>> +static int ntb_transport_edma_tx_enqueue_inner(struct ntb_transport_qp *qp,
>>>>> + struct ntb_queue_entry *entry)
>>>>> +{
>>>>> + struct device *dma_dev = get_dma_dev(qp->ndev);
>>>>> + struct ntb_transport_qp_edma *edma = qp->priv;
>>>>> + struct ntb_transport_ctx *nt = qp->transport;
>>>>> + struct ntb_edma_desc *in, __iomem *out;
>>>>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
>>>>> + unsigned int len = entry->len;
>>>>> + struct dma_chan *chan;
>>>>> + u32 issue, idx, head;
>>>>> + dma_addr_t dst;
>>>>> + int rc;
>>>>> +
>>>>> + WARN_ON_ONCE(entry->flags & DESC_DONE_FLAG);
>>>>> +
>>>>> + scoped_guard(spinlock_irqsave, &edma->tx_lock) {
>>>>> + head = READ_ONCE(*NTB_HEAD_TX_I(qp));
>>>>> + issue = edma->tx_issue;
>>>>> + if (ntb_edma_ring_used_entry(head, issue) == 0) {
>>>>> + qp->tx_ring_full++;
>>>>> + return -ENOSPC;
>>>>> + }
>>>>> +
>>>>> + /*
>>>>> + * ntb_transport_edma_tx_work() checks entry->flags
>>>>> + * so it needs to be set before tx_issue++.
>>>>> + */
>>>>> + idx = ntb_edma_ring_idx(issue);
>>>>> + in = NTB_DESC_TX_I(qp, idx);
>>>>> + in->data = (uintptr_t)entry;
>>>>> +
>>>>> + /* Make in->data visible before tx_issue++ */
>>>>> + smp_wmb();
>>>>> +
>>>>> + WRITE_ONCE(edma->tx_issue, edma->tx_issue + 1);
>>>>> + }
>>>>> +
>>>>> + /* Publish the final transfer length to the other end */
>>>>> + out = NTB_DESC_TX_O(qp, idx);
>>>>> + iowrite32(len, &out->len);
>>>>> + ioread32(&out->len);
>>>>> +
>>>>> + if (unlikely(!len)) {
>>>>> + entry->flags |= DESC_DONE_FLAG;
>>>>> + queue_work(ctx->wq, &edma->tx_work);
>>>>> + return 0;
>>>>> + }
>>>>> +
>>>>> + /* Paired with dma_wmb() in ntb_transport_edma_rx_enqueue_inner() */
>>>>> + dma_rmb();
>>>>> +
>>>>> + /* kick remote eDMA read transfer */
>>>>> + dst = (dma_addr_t)in->addr;
>>>>> + chan = ntb_edma_pick_chan(&ctx->chans, qp->qp_num);
>>>>> + rc = ntb_transport_edma_submit(dma_dev, chan, len,
>>>>> + entry->buf, dst, entry);
>>>>> + if (rc) {
>>>>> + entry->errors++;
>>>>> + entry->len = -EIO;
>>>>> + entry->flags |= DESC_DONE_FLAG;
>>>>> + queue_work(ctx->wq, &edma->tx_work);
>>>>> + }
>>>>> + return 0;
>>>>> +}
>>>>> +
>>>>> +static int ntb_transport_edma_tx_enqueue(struct ntb_transport_qp *qp,
>>>>> + struct ntb_queue_entry *entry,
>>>>> + void *cb, void *data, unsigned int len,
>>>>> + unsigned int flags)
>>>>> +{
>>>>> + struct device *dma_dev;
>>>>> +
>>>>> + if (entry->addr) {
>>>>> + /* Deferred unmap */
>>>>> + dma_dev = get_dma_dev(qp->ndev);
>>>>> + dma_unmap_single(dma_dev, entry->addr, entry->len,
>>>>> + DMA_TO_DEVICE);
>>>>> + }
>>>>> +
>>>>> + entry->cb_data = cb;
>>>>> + entry->buf = data;
>>>>> + entry->len = len;
>>>>> + entry->flags = flags;
>>>>> + entry->errors = 0;
>>>>> + entry->addr = 0;
>>>>> +
>>>>> + WARN_ON_ONCE(!ntb_qp_edma_enabled(qp));
>>>>> +
>>>>> + return ntb_transport_edma_tx_enqueue_inner(qp, entry);
>>>>> +}
>>>>> +
>>>>> +static int ntb_transport_edma_rx_enqueue_inner(struct ntb_transport_qp *qp,
>>>>> + struct ntb_queue_entry *entry)
>>>>> +{
>>>>> + struct device *dma_dev = get_dma_dev(qp->ndev);
>>>>> + struct ntb_transport_qp_edma *edma = qp->priv;
>>>>> + struct ntb_edma_desc *in, __iomem *out;
>>>>> + unsigned int len = entry->len;
>>>>> + void *data = entry->buf;
>>>>> + dma_addr_t dst;
>>>>> + u32 idx;
>>>>> + int rc;
>>>>> +
>>>>> + dst = dma_map_single(dma_dev, data, len, DMA_FROM_DEVICE);
>>>>> + rc = dma_mapping_error(dma_dev, dst);
>>>>> + if (rc)
>>>>> + return rc;
>>>>> +
>>>>> + guard(spinlock_bh)(&edma->rx_lock);
>>>>> +
>>>>> + if (ntb_edma_ring_full(READ_ONCE(edma->rx_prod),
>>>>> + READ_ONCE(edma->rx_cons))) {
>>>>> + rc = -ENOSPC;
>>>>> + goto out_unmap;
>>>>> + }
>>>>> +
>>>>> + idx = ntb_edma_ring_idx(edma->rx_prod);
>>>>> + in = NTB_DESC_RX_I(qp, idx);
>>>>> + out = NTB_DESC_RX_O(qp, idx);
>>>>> +
>>>>> + iowrite32(len, &out->len);
>>>>> + iowrite64(dst, &out->addr);
>>>>> +
>>>>> + WARN_ON(in->flags & DESC_DONE_FLAG);
>>>>> + in->data = (uintptr_t)entry;
>>>>> + entry->addr = dst;
>>>>> +
>>>>> + /* Ensure len/addr are visible before the head update */
>>>>> + dma_wmb();
>>>>> +
>>>>> + WRITE_ONCE(edma->rx_prod, edma->rx_prod + 1);
>>>>> + iowrite32(edma->rx_prod, NTB_HEAD_RX_O(qp));
>>>>> +
>>>>> + return 0;
>>>>> +out_unmap:
>>>>> + dma_unmap_single(dma_dev, dst, len, DMA_FROM_DEVICE);
>>>>> + return rc;
>>>>> +}
>>>>> +
>>>>> +static int ntb_transport_edma_rx_enqueue(struct ntb_transport_qp *qp,
>>>>> + struct ntb_queue_entry *entry)
>>>>> +{
>>>>> + int rc;
>>>>> +
>>>>> + rc = ntb_transport_edma_rx_enqueue_inner(qp, entry);
>>>>> + if (rc) {
>>>>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry,
>>>>> + &qp->rx_free_q);
>>>>> + return rc;
>>>>> + }
>>>>> +
>>>>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_pend_q);
>>>>> +
>>>>> + if (qp->active)
>>>>> + tasklet_schedule(&qp->rxc_db_work);
>>>>> +
>>>>> + return 0;
>>>>> +}
>>>>> +
>>>>> +static void ntb_transport_edma_rx_poll(struct ntb_transport_qp *qp)
>>>>> +{
>>>>> + struct ntb_transport_ctx *nt = qp->transport;
>>>>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
>>>>> + struct ntb_transport_qp_edma *edma = qp->priv;
>>>>> +
>>>>> + queue_work(ctx->wq, &edma->rx_work);
>>>>> + queue_work(ctx->wq, &edma->tx_work);
>>>>> +}
>>>>> +
>>>>> +static int ntb_transport_edma_qp_init(struct ntb_transport_ctx *nt,
>>>>> + unsigned int qp_num)
>>>>> +{
>>>>> + struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
>>>>> + struct ntb_transport_qp_edma *edma;
>>>>> + struct ntb_dev *ndev = nt->ndev;
>>>>> + int node;
>>>>> +
>>>>> + node = dev_to_node(&ndev->dev);
>>>>> +
>>>>> + qp->priv = kzalloc_node(sizeof(*edma), GFP_KERNEL, node);
>>>>> + if (!qp->priv)
>>>>> + return -ENOMEM;
>>>>> +
>>>>> + edma = (struct ntb_transport_qp_edma *)qp->priv;
>>>>> + edma->qp = qp;
>>>>> + edma->rx_prod = 0;
>>>>> + edma->rx_cons = 0;
>>>>> + edma->tx_cons = 0;
>>>>> + edma->tx_issue = 0;
>>>>> +
>>>>> + spin_lock_init(&edma->rx_lock);
>>>>> + spin_lock_init(&edma->tx_lock);
>>>>> +
>>>>> + INIT_WORK(&edma->db_work, ntb_transport_edma_db_work);
>>>>> + INIT_WORK(&edma->rx_work, ntb_transport_edma_rx_work);
>>>>> + INIT_WORK(&edma->tx_work, ntb_transport_edma_tx_work);
>>>>> +
>>>>> + return 0;
>>>>> +}
>>>>> +
>>>>> +static void ntb_transport_edma_qp_free(struct ntb_transport_qp *qp)
>>>>> +{
>>>>> + struct ntb_transport_qp_edma *edma = qp->priv;
>>>>> +
>>>>> + cancel_work_sync(&edma->db_work);
>>>>> + cancel_work_sync(&edma->rx_work);
>>>>> + cancel_work_sync(&edma->tx_work);
>>>>> +
>>>>> + kfree(qp->priv);
>>>>> +}
>>>>> +
>>>>> +static int ntb_transport_edma_pre_link_up(struct ntb_transport_ctx *nt)
>>>>> +{
>>>>> + struct ntb_dev *ndev = nt->ndev;
>>>>> + struct pci_dev *pdev = ndev->pdev;
>>>>> + int rc;
>>>>> +
>>>>> + rc = ntb_transport_edma_ep_init(nt);
>>>>> + if (rc)
>>>>> + dev_err(&pdev->dev, "Failed to init EP: %d\n", rc);
>>>>> +
>>>>> + return rc;
>>>>> +}
>>>>> +
>>>>> +static int ntb_transport_edma_post_link_up(struct ntb_transport_ctx *nt)
>>>>> +{
>>>>> + struct ntb_dev *ndev = nt->ndev;
>>>>> + struct pci_dev *pdev = ndev->pdev;
>>>>> + int rc;
>>>>> +
>>>>> + rc = ntb_transport_edma_rc_init(nt);
>>>>> + if (rc)
>>>>> + dev_err(&pdev->dev, "Failed to init RC: %d\n", rc);
>>>>> +
>>>>> + return rc;
>>>>> +}
>>>>> +
>>>>> +static int ntb_transport_edma_enable(struct ntb_transport_ctx *nt,
>>>>> + unsigned int *mw_count)
>>>>> +{
>>>>> + struct ntb_dev *ndev = nt->ndev;
>>>>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
>>>>> +
>>>>> + if (!use_remote_edma)
>>>>> + return 0;
>>>>> +
>>>>> + /*
>>>>> + * We need at least one MW for the transport plus one MW reserved
>>>>> + * for the remote eDMA window (see ntb_edma_setup_mws/peer).
>>>>> + */
>>>>> + if (*mw_count <= 1) {
>>>>> + dev_err(&ndev->dev,
>>>>> + "remote eDMA requires at least two MWs (have %u)\n",
>>>>> + *mw_count);
>>>>> + return -ENODEV;
>>>>> + }
>>>>> +
>>>>> + ctx->wq = alloc_workqueue("ntb-edma-wq", WQ_UNBOUND | WQ_SYSFS, 0);
>>>>> + if (!ctx->wq) {
>>>>> + ntb_transport_edma_uninit(nt);
>>>>> + return -ENOMEM;
>>>>> + }
>>>>> +
>>>>> + /* Reserve the last peer MW exclusively for the eDMA window. */
>>>>> + *mw_count -= 1;
>>>>> +
>>>>> + return 0;
>>>>> +}
>>>>> +
>>>>> +static void ntb_transport_edma_disable(struct ntb_transport_ctx *nt)
>>>>> +{
>>>>> + ntb_transport_edma_uninit(nt);
>>>>> +}
>>>>> +
>>>>> +static const struct ntb_transport_backend_ops edma_backend_ops = {
>>>>> + .enable = ntb_transport_edma_enable,
>>>>> + .disable = ntb_transport_edma_disable,
>>>>> + .qp_init = ntb_transport_edma_qp_init,
>>>>> + .qp_free = ntb_transport_edma_qp_free,
>>>>> + .pre_link_up = ntb_transport_edma_pre_link_up,
>>>>> + .post_link_up = ntb_transport_edma_post_link_up,
>>>>> + .setup_qp_mw = ntb_transport_edma_setup_qp_mw,
>>>>> + .tx_free_entry = ntb_transport_edma_tx_free_entry,
>>>>> + .tx_enqueue = ntb_transport_edma_tx_enqueue,
>>>>> + .rx_enqueue = ntb_transport_edma_rx_enqueue,
>>>>> + .rx_poll = ntb_transport_edma_rx_poll,
>>>>> + .debugfs_stats_show = ntb_transport_edma_debugfs_stats_show,
>>>>> +};
>>>>> +
>>>>> +int ntb_transport_edma_init(struct ntb_transport_ctx *nt)
>>>>> +{
>>>>> + struct ntb_dev *ndev = nt->ndev;
>>>>> + int node;
>>>>> +
>>>>> + node = dev_to_node(&ndev->dev);
>>>>> + nt->priv = kzalloc_node(sizeof(struct ntb_transport_ctx_edma), GFP_KERNEL,
>>>>> + node);
>>>>> + if (!nt->priv)
>>>>> + return -ENOMEM;
>>>>> +
>>>>> + nt->backend_ops = edma_backend_ops;
>>>>> + /*
>>>>> + * In remote eDMA mode, one DMA read channel is used by the host
>>>>> + * side to interrupt the EP.
>>>>> + */
>>>>> + use_msi = false;
>>>>> + return 0;
>>>>> +}
>>>>> diff --git a/drivers/ntb/ntb_transport_internal.h b/drivers/ntb/ntb_transport_internal.h
>>>>> index 51ff08062d73..9fff65980d3d 100644
>>>>> --- a/drivers/ntb/ntb_transport_internal.h
>>>>> +++ b/drivers/ntb/ntb_transport_internal.h
>>>>> @@ -8,6 +8,7 @@
>>>>> extern unsigned long max_mw_size;
>>>>> extern unsigned int transport_mtu;
>>>>> extern bool use_msi;
>>>>> +extern bool use_remote_edma;
>>>>>
>>>>> #define QP_TO_MW(nt, qp) ((qp) % nt->mw_count)
>>>>>
>>>>> @@ -29,6 +30,11 @@ struct ntb_queue_entry {
>>>>> struct ntb_payload_header __iomem *tx_hdr;
>>>>> struct ntb_payload_header *rx_hdr;
>>>>> };
>>>>> +
>>>>> +#ifdef CONFIG_NTB_TRANSPORT_EDMA
>>>>> + dma_addr_t addr;
>>>>> + struct scatterlist sgl;
>>>>> +#endif
>>>>> };
>>>>>
>>>>> struct ntb_rx_info {
>>>>> @@ -202,4 +208,13 @@ int ntb_transport_init_queue(struct ntb_transport_ctx *nt,
>>>>> unsigned int qp_num);
>>>>> struct device *get_dma_dev(struct ntb_dev *ndev);
>>>>>
>>>>> +#ifdef CONFIG_NTB_TRANSPORT_EDMA
>>>>> +int ntb_transport_edma_init(struct ntb_transport_ctx *nt);
>>>>> +#else
>>>>> +static inline int ntb_transport_edma_init(struct ntb_transport_ctx *nt)
>>>>> +{
>>>>> + return -EOPNOTSUPP;
>>>>> +}
>>>>> +#endif /* CONFIG_NTB_TRANSPORT_EDMA */
>>>>> +
>>>>> #endif /* _NTB_TRANSPORT_INTERNAL_H_ */
>>>>
>>
>
^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v3 26/35] NTB: ntb_transport: Introduce DW eDMA backed transport mode
2026-01-08 17:55 ` Dave Jiang
@ 2026-01-10 13:43 ` Koichiro Den
2026-01-12 15:43 ` Dave Jiang
0 siblings, 1 reply; 61+ messages in thread
From: Koichiro Den @ 2026-01-10 13:43 UTC (permalink / raw)
To: Dave Jiang
Cc: Frank.Li, ntb, linux-pci, dmaengine, linux-renesas-soc, netdev,
linux-kernel, mani, kwilczynski, kishon, bhelgaas, corbet,
geert+renesas, magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro,
will, robin.murphy, jdmason, allenbh, andrew+netdev, davem,
edumazet, kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k,
kurt.schwemmer, logang, jingoohan1, lpieralisi, utkarsh02t,
jbrunet, dlemoal, arnd, elfring
On Thu, Jan 08, 2026 at 10:55:46AM -0700, Dave Jiang wrote:
>
>
> On 1/7/26 6:25 PM, Koichiro Den wrote:
> > On Wed, Jan 07, 2026 at 12:02:15PM -0700, Dave Jiang wrote:
> >>
> >>
> >> On 1/7/26 7:54 AM, Koichiro Den wrote:
> >>> On Tue, Jan 06, 2026 at 11:51:03AM -0700, Dave Jiang wrote:
> >>>>
> >>>>
> >>>> On 12/17/25 8:16 AM, Koichiro Den wrote:
> >>>>> Add a new ntb_transport backend that uses a DesignWare eDMA engine
> >>>>> located on the endpoint, to be driven by both host and endpoint.
> >>>>>
> >>>>> The endpoint exposes a dedicated memory window which contains the eDMA
> >>>>> register block, a small control structure (struct ntb_edma_info) and
> >>>>> per-channel linked-list (LL) rings for read channels. Endpoint drives
> >>>>> its local eDMA write channels for its transmission, while host side
> >>>>> uses the remote eDMA read channels for its transmission.
> >>>>>
> >>>>> A key benefit of this backend is that the memory window no longer needs
> >>>>> to carry data-plane payload. This makes the design less sensitive to
> >>>>> limited memory window space and allows scaling to multiple queue pairs.
> >>>>> The memory window layout is specific to the eDMA-backed backend, so
> >>>>> there is no automatic fallback to the memcpy-based default transport
> >>>>> that requires the different layout.
> >>>>>
> >>>>> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> >>>>> ---
> >>>>> drivers/ntb/Kconfig | 12 +
> >>>>> drivers/ntb/Makefile | 2 +
> >>>>> drivers/ntb/ntb_transport_core.c | 15 +-
> >>>>> drivers/ntb/ntb_transport_edma.c | 987 +++++++++++++++++++++++++++
> >>>>> drivers/ntb/ntb_transport_internal.h | 15 +
> >>>>> 5 files changed, 1029 insertions(+), 2 deletions(-)
> >>>>> create mode 100644 drivers/ntb/ntb_transport_edma.c
> >>>>>
> >>>>> diff --git a/drivers/ntb/Kconfig b/drivers/ntb/Kconfig
> >>>>> index df16c755b4da..5ba6d0b7f5ba 100644
> >>>>> --- a/drivers/ntb/Kconfig
> >>>>> +++ b/drivers/ntb/Kconfig
> >>>>> @@ -37,4 +37,16 @@ config NTB_TRANSPORT
> >>>>>
> >>>>> If unsure, say N.
> >>>>>
> >>>>> +config NTB_TRANSPORT_EDMA
> >>>>> + bool "NTB Transport backed by remote eDMA"
> >>>>> + depends on NTB_TRANSPORT
> >>>>> + depends on PCI
> >>>>> + select DMA_ENGINE
> >>>>> + select NTB_EDMA
> >>>>> + help
> >>>>> + Enable a transport backend that uses a remote DesignWare eDMA engine
> >>>>> + exposed through a dedicated NTB memory window. The host uses the
> >>>>> + endpoint's eDMA engine to move data in both directions.
> >>>>> + Say Y here if you intend to use the 'use_remote_edma' module parameter.
> >>>>> +
> >>>>> endif # NTB
> >>>>> diff --git a/drivers/ntb/Makefile b/drivers/ntb/Makefile
> >>>>> index 9b66e5fafbc0..b9086b32ecde 100644
> >>>>> --- a/drivers/ntb/Makefile
> >>>>> +++ b/drivers/ntb/Makefile
> >>>>> @@ -6,3 +6,5 @@ ntb-y := core.o
> >>>>> ntb-$(CONFIG_NTB_MSI) += msi.o
> >>>>>
> >>>>> ntb_transport-y := ntb_transport_core.o
> >>>>> +ntb_transport-$(CONFIG_NTB_TRANSPORT_EDMA) += ntb_transport_edma.o
> >>>>> +ntb_transport-$(CONFIG_NTB_TRANSPORT_EDMA) += hw/edma/ntb_hw_edma.o
> >>>>> diff --git a/drivers/ntb/ntb_transport_core.c b/drivers/ntb/ntb_transport_core.c
> >>>>> index 40c2548f5930..bd21232f26fe 100644
> >>>>> --- a/drivers/ntb/ntb_transport_core.c
> >>>>> +++ b/drivers/ntb/ntb_transport_core.c
> >>>>> @@ -104,6 +104,12 @@ module_param(use_msi, bool, 0644);
> >>>>> MODULE_PARM_DESC(use_msi, "Use MSI interrupts instead of doorbells");
> >>>>> #endif
> >>>>>
> >>>>> +bool use_remote_edma;
> >>>>> +#ifdef CONFIG_NTB_TRANSPORT_EDMA
> >>>>> +module_param(use_remote_edma, bool, 0644);
> >>>>> +MODULE_PARM_DESC(use_remote_edma, "Use remote eDMA mode (when enabled, use_msi is ignored)");
> >>>>> +#endif
> >>>>
> >>>> This seems clunky. Can the ntb_transport_core determine this when the things are called through ntb_transport_edma? Or maybe a set_transport_type can be introduced by the transport itself during allocation?
> >>>
> >>> Agreed. I plan to drop 'use_remote_edma' and instead,
> >>> - add a module parameter: transport_type={"default","edma"} (defaulting to "default"),
> >>> - introduce ntb_transport_backend_register() for transports to self-register via
> >>> struct ntb_transport_backend { .name, .ops }, and
> >>> - have the core select the backend whose .name matches transport_type.
> >>>
> >>> I think this should keep any non-default transport-specific logic out of
> >>> ntb_transport_core, or at least keep it to a minimum, while still allowing
> >>> non-default transports (ntb_transport_edma is the only choice for now
> >>> though) to plug in cleanly.
> >>>
> >>> If you see a cleaner approach, I would appreciate it if you could elaborate
> >>> a bit more on your idea.
> >>
> >
> > Thank you for the comment, let me respond inline below.
> >
> >> Do you think it's flexible enough that we can determine a transport type per 'ntb_transport_mw' or is this an all or nothing type of thing?
> >
> > At least in the current implementation, the remote eDMA use is an
> > all-or-nothing type rather than something that can be selected per
> > ntb_transport_mw.
> >
> > The way remote eDMA consumes MW is quite similar to how ntb_msi uses them
> > today. Assuming multiple MWs are available, the last MW is reserved to
> > expose the remote eDMA info/register/LL regions to the host by packing all
> > of them into a single MW. In that sense, it does not map naturally to a
> > per-MW selection model.
> >
> >> I'm trying to see if we can do away with the module param.
> >
> > I think it is useful to keep an explicit way for an administrator to choose
> > the transport type (default vs edma). Even on platforms where dw-edma is
> > available, there can potentially be platform-specific or hard-to-reproduce
> > issues (e.g. problems that only show up with certain transfer patterns),
> > and having a way to fall back to the long-existing traditional transport can
> > be valuable.
> >
> > That said, I am not opposed to making the default behavior an automatic
> > selection, where edma is chosen when it's available and the parameter is
> > left unset.
> >
> >> Or I guess when you probe ntb_netdev, the selection would happen there and thus transport_type would be in ntb_netdev module?
> >
> > I'm not sure how selecting the transport type at ntb_netdev probe time
> > would work in practice, and what additional benefit that would provide.
>
> So currently ntb_netdev or ntb_transport are not auto-loaded right? They
> are manually probed by the user. So with the new transport, the user
> would modprobe ntb_transport_edma.ko. And that would trigger the eDMA
> transport setup right? With the ntb_transport_core library existing, we
> should be able to load both the ntb_transport_host and ntb_transport_edma
> at the same time theoretically. And ntb_netdev should be able to select
> one or the other transport. This is the most versatile scenario. An
> alternative is there can be only 1 transport ever loaded, and when
> ntb_transport_edma is loaded, it just looks like the default transport
> and netdev functions as it always has without knowing what the
> underneath transport is. On the platform if there are multiple NTB
> ports, it would be nice to have the flexibility of allowing each port
> choose the usage of the current transport and the edma transport if the
> user desires.
I was assuming manual load in my previous response. Also in this RFC v3,
ntb_transport_edma is not even a standalone module yet (although I do think
it should be). At this point, I feel the RFC v3 implementation is still a
bit too rough to use as a basis for discussing the ideal long-term design,
so I'd like to set it aside for a moment and focus on what the ideal shape
could look like.
My current thoughts on the ideal structure, after reading your last
comment, are as follows:
* The existing cpu/dma memcpy-based transport becomes "ntb_transport_host",
and the new eDMA-based transport becomes "ntb_transport_edma".
* Each transport is a separate kernel module, and each provides its own
ntb_client implementation (i.e. each registers independently with the
NTB core). In this model, it should be perfectly fine for both modules to
be loaded at the same time.
* Common pieces (e.g. ntb_transport_bus registration, shared helpers, and
the boundary/API exposed to ntb_transport_clients such as ntb_netdev)
should live in a shared library module, such as "ntb_transport_core" (or
"ntb_transport", naming TBD).
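A rough userspace model of the self-registration interface described above might look like the following. All names here (ntb_transport_backend_register, ntb_transport_backend_find, the struct layout) are assumptions for illustration, not existing kernel API; an in-kernel version would also need locking and module reference handling:

```c
/*
 * Userspace model of the backend self-registration idea sketched in
 * this thread. Function and struct names are assumptions, not
 * existing kernel API.
 */
#include <stddef.h>
#include <string.h>

struct ntb_transport_backend_ops;		/* opaque here */

struct ntb_transport_backend {
	const char *name;			/* e.g. "host" or "edma" */
	const struct ntb_transport_backend_ops *ops;
	struct ntb_transport_backend *next;
};

static struct ntb_transport_backend *ntb_backends;

/* Called by each transport module (ntb_transport_host/_edma) on load. */
int ntb_transport_backend_register(struct ntb_transport_backend *be)
{
	be->next = ntb_backends;
	ntb_backends = be;
	return 0;
}

/* Core-side lookup keyed by the requested transport type. */
struct ntb_transport_backend *
ntb_transport_backend_find(const char *transport_type)
{
	struct ntb_transport_backend *be;

	for (be = ntb_backends; be; be = be->next)
		if (!strcmp(be->name, transport_type))
			return be;
	return NULL;	/* caller falls back or fails the probe */
}
```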
Then, for transport type selection:
* If we want to switch the transport type (host vs edma) on a per-NTB-port
(device) basis, we can rely on the standard driver override framework
(i.e. driver_override, unbind/bind). To make that work, we first need
to add driver_override support to ntb_bus.
* In the case that ntb_netdev wants to explicitly select a transport type,
I think it should still be handled via the per-NTB-port driver_override
rather than building transport-selection logic into ntb_netdev itself
(perhaps with some extension to the boundary API for
ntb_transport_clients).
* If ntb_transport_host / ntb_transport_edma are built-in modules, a
post-boot rebind might be sufficient in most cases. If that's not
sufficient, we could also consider providing a kernel parameter to define
a boot-time policy. For example, something like:
ntb_transport.policy=edma@0000:01:00.0,host@0000:5f:00.0
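Once ntb_bus gains driver_override support, the per-port switch could look roughly like this; the sysfs paths, device name, and driver names below are assumptions modeled on how other buses (PCI, platform) implement driver_override:

```shell
# Assumed paths and driver names; ntb_bus does not support this yet.
dev=0000:01:00.0

# Route this NTB port to the eDMA transport on the next (re)bind.
echo ntb_transport_edma > /sys/bus/ntb/devices/$dev/driver_override

# Rebind the device so the override takes effect.
echo $dev > /sys/bus/ntb/devices/$dev/driver/unbind
echo $dev > /sys/bus/ntb/drivers/ntb_transport_edma/bind
```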
How does that sound? In any case, I am planning to submit RFC v4.
Thanks for the review,
Koichiro
>
> DJ
>
> >
> > Kind regards,
> > Koichiro
> >
> >>
> >>>
> >>> Thanks,
> >>> Koichiro
> >>>
> >>>>
> >>>> DJ
> >>>>
> >>>>> +
> >>>>> static struct dentry *nt_debugfs_dir;
> >>>>>
> >>>>> /* Only two-ports NTB devices are supported */
> >>>>> @@ -156,7 +162,7 @@ enum {
> >>>>> #define drv_client(__drv) \
> >>>>> container_of((__drv), struct ntb_transport_client, driver)
> >>>>>
> >>>>> -#define NTB_QP_DEF_NUM_ENTRIES 100
> >>>>> +#define NTB_QP_DEF_NUM_ENTRIES 128
> >>>>> #define NTB_LINK_DOWN_TIMEOUT 10
> >>>>>
> >>>>> static void ntb_transport_rxc_db(unsigned long data);
> >>>>> @@ -1189,7 +1195,11 @@ static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
> >>>>>
> >>>>> nt->ndev = ndev;
> >>>>>
> >>>>> - rc = ntb_transport_default_init(nt);
> >>>>> + if (use_remote_edma)
> >>>>> + rc = ntb_transport_edma_init(nt);
> >>>>> + else
> >>>>> + rc = ntb_transport_default_init(nt);
> >>>>> +
> >>>>> if (rc)
> >>>>> return rc;
> >>>>>
> >>>>> @@ -1950,6 +1960,7 @@ ntb_transport_create_queue(void *data, struct device *client_dev,
> >>>>>
> >>>>> nt->qp_bitmap_free &= ~qp_bit;
> >>>>>
> >>>>> + qp->qp_bit = qp_bit;
> >>>>> qp->cb_data = data;
> >>>>> qp->rx_handler = handlers->rx_handler;
> >>>>> qp->tx_handler = handlers->tx_handler;
> >>>>> diff --git a/drivers/ntb/ntb_transport_edma.c b/drivers/ntb/ntb_transport_edma.c
> >>>>> new file mode 100644
> >>>>> index 000000000000..6ae5da0a1367
> >>>>> --- /dev/null
> >>>>> +++ b/drivers/ntb/ntb_transport_edma.c
> >>>>> @@ -0,0 +1,987 @@
> >>>>> +// SPDX-License-Identifier: GPL-2.0-only
> >>>>> +/*
> >>>>> + * NTB transport backend for remote DesignWare eDMA.
> >>>>> + *
> >>>>> + * This implements the backend_ops used when use_remote_edma=1 and
> >>>>> + * relies on drivers/ntb/hw/edma/ for low-level eDMA/MW programming.
> >>>>> + */
> >>>>> +
> >>>>> +#include <linux/bug.h>
> >>>>> +#include <linux/compiler.h>
> >>>>> +#include <linux/debugfs.h>
> >>>>> +#include <linux/dmaengine.h>
> >>>>> +#include <linux/dma-mapping.h>
> >>>>> +#include <linux/errno.h>
> >>>>> +#include <linux/io-64-nonatomic-lo-hi.h>
> >>>>> +#include <linux/ntb.h>
> >>>>> +#include <linux/pci.h>
> >>>>> +#include <linux/pci-epc.h>
> >>>>> +#include <linux/seq_file.h>
> >>>>> +#include <linux/slab.h>
> >>>>> +
> >>>>> +#include "hw/edma/ntb_hw_edma.h"
> >>>>> +#include "ntb_transport_internal.h"
> >>>>> +
> >>>>> +#define NTB_EDMA_RING_ORDER 7
> >>>>> +#define NTB_EDMA_RING_ENTRIES (1U << NTB_EDMA_RING_ORDER)
> >>>>> +#define NTB_EDMA_RING_MASK (NTB_EDMA_RING_ENTRIES - 1)
> >>>>> +
> >>>>> +#define NTB_EDMA_MAX_POLL 32
> >>>>> +
> >>>>> +/*
> >>>>> + * Remote eDMA mode implementation
> >>>>> + */
> >>>>> +struct ntb_transport_ctx_edma {
> >>>>> + remote_edma_mode_t remote_edma_mode;
> >>>>> + struct device *dma_dev;
> >>>>> + struct workqueue_struct *wq;
> >>>>> + struct ntb_edma_chans chans;
> >>>>> +};
> >>>>> +
> >>>>> +struct ntb_transport_qp_edma {
> >>>>> + struct ntb_transport_qp *qp;
> >>>>> +
> >>>>> + /*
> >>>>> + * For ensuring peer notification in non-atomic context.
> >>>>> + * ntb_peer_db_set might sleep or schedule.
> >>>>> + */
> >>>>> + struct work_struct db_work;
> >>>>> +
> >>>>> + u32 rx_prod;
> >>>>> + u32 rx_cons;
> >>>>> + u32 tx_cons;
> >>>>> + u32 tx_issue;
> >>>>> +
> >>>>> + spinlock_t rx_lock;
> >>>>> + spinlock_t tx_lock;
> >>>>> +
> >>>>> + struct work_struct rx_work;
> >>>>> + struct work_struct tx_work;
> >>>>> +};
> >>>>> +
> >>>>> +struct ntb_edma_desc {
> >>>>> + u32 len;
> >>>>> + u32 flags;
> >>>>> + u64 addr; /* DMA address */
> >>>>> + u64 data;
> >>>>> +};
> >>>>> +
> >>>>> +struct ntb_edma_ring {
> >>>>> + struct ntb_edma_desc desc[NTB_EDMA_RING_ENTRIES];
> >>>>> + u32 head;
> >>>>> + u32 tail;
> >>>>> +};
> >>>>> +
> >>>>> +static inline bool ntb_qp_edma_is_rc(struct ntb_transport_qp *qp)
> >>>>> +{
> >>>>> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
> >>>>> +
> >>>>> + return ctx->remote_edma_mode == REMOTE_EDMA_RC;
> >>>>> +}
> >>>>> +
> >>>>> +static inline bool ntb_qp_edma_is_ep(struct ntb_transport_qp *qp)
> >>>>> +{
> >>>>> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
> >>>>> +
> >>>>> + return ctx->remote_edma_mode == REMOTE_EDMA_EP;
> >>>>> +}
> >>>>> +
> >>>>> +static inline bool ntb_qp_edma_enabled(struct ntb_transport_qp *qp)
> >>>>> +{
> >>>>> + return ntb_qp_edma_is_rc(qp) || ntb_qp_edma_is_ep(qp);
> >>>>> +}
> >>>>> +
> >>>>> +static inline unsigned int ntb_edma_ring_sel(struct ntb_transport_qp *qp,
> >>>>> + unsigned int n)
> >>>>> +{
> >>>>> + return n ^ !!ntb_qp_edma_is_ep(qp);
> >>>>> +}
> >>>>> +
> >>>>> +static inline struct ntb_edma_ring *
> >>>>> +ntb_edma_ring_local(struct ntb_transport_qp *qp, unsigned int n)
> >>>>> +{
> >>>>> + unsigned int r = ntb_edma_ring_sel(qp, n);
> >>>>> +
> >>>>> + return &((struct ntb_edma_ring *)qp->rx_buff)[r];
> >>>>> +}
> >>>>> +
> >>>>> +static inline struct ntb_edma_ring __iomem *
> >>>>> +ntb_edma_ring_remote(struct ntb_transport_qp *qp, unsigned int n)
> >>>>> +{
> >>>>> + unsigned int r = ntb_edma_ring_sel(qp, n);
> >>>>> +
> >>>>> + return &((struct ntb_edma_ring __iomem *)qp->tx_mw)[r];
> >>>>> +}
> >>>>> +
> >>>>> +static inline struct ntb_edma_desc *
> >>>>> +ntb_edma_desc_local(struct ntb_transport_qp *qp, unsigned int n, unsigned int i)
> >>>>> +{
> >>>>> + return &ntb_edma_ring_local(qp, n)->desc[i];
> >>>>> +}
> >>>>> +
> >>>>> +static inline struct ntb_edma_desc __iomem *
> >>>>> +ntb_edma_desc_remote(struct ntb_transport_qp *qp, unsigned int n,
> >>>>> + unsigned int i)
> >>>>> +{
> >>>>> + return &ntb_edma_ring_remote(qp, n)->desc[i];
> >>>>> +}
> >>>>> +
> >>>>> +static inline u32 *ntb_edma_head_local(struct ntb_transport_qp *qp,
> >>>>> + unsigned int n)
> >>>>> +{
> >>>>> + return &ntb_edma_ring_local(qp, n)->head;
> >>>>> +}
> >>>>> +
> >>>>> +static inline u32 __iomem *ntb_edma_head_remote(struct ntb_transport_qp *qp,
> >>>>> + unsigned int n)
> >>>>> +{
> >>>>> + return &ntb_edma_ring_remote(qp, n)->head;
> >>>>> +}
> >>>>> +
> >>>>> +static inline u32 *ntb_edma_tail_local(struct ntb_transport_qp *qp,
> >>>>> + unsigned int n)
> >>>>> +{
> >>>>> + return &ntb_edma_ring_local(qp, n)->tail;
> >>>>> +}
> >>>>> +
> >>>>> +static inline u32 __iomem *ntb_edma_tail_remote(struct ntb_transport_qp *qp,
> >>>>> + unsigned int n)
> >>>>> +{
> >>>>> + return &ntb_edma_ring_remote(qp, n)->tail;
> >>>>> +}
> >>>>> +
> >>>>> +/* The 'i' must be generated by ntb_edma_ring_idx() */
> >>>>> +#define NTB_DESC_TX_O(qp, i) ntb_edma_desc_remote(qp, 0, i)
> >>>>> +#define NTB_DESC_TX_I(qp, i) ntb_edma_desc_local(qp, 0, i)
> >>>>> +#define NTB_DESC_RX_O(qp, i) ntb_edma_desc_remote(qp, 1, i)
> >>>>> +#define NTB_DESC_RX_I(qp, i) ntb_edma_desc_local(qp, 1, i)
> >>>>> +
> >>>>> +#define NTB_HEAD_TX_I(qp) ntb_edma_head_local(qp, 0)
> >>>>> +#define NTB_HEAD_RX_O(qp) ntb_edma_head_remote(qp, 1)
> >>>>> +
> >>>>> +#define NTB_TAIL_TX_O(qp) ntb_edma_tail_remote(qp, 0)
> >>>>> +#define NTB_TAIL_RX_I(qp) ntb_edma_tail_local(qp, 1)
> >>>>> +
> >>>>> +/* ntb_edma_ring helpers */
> >>>>> +static __always_inline u32 ntb_edma_ring_idx(u32 v)
> >>>>> +{
> >>>>> + return v & NTB_EDMA_RING_MASK;
> >>>>> +}
> >>>>> +
> >>>>> +static __always_inline u32 ntb_edma_ring_used_entry(u32 head, u32 tail)
> >>>>> +{
> >>>>> + if (head >= tail) {
> >>>>> + WARN_ON_ONCE((head - tail) > (NTB_EDMA_RING_ENTRIES - 1));
> >>>>> + return head - tail;
> >>>>> + }
> >>>>> +
> >>>>> + WARN_ON_ONCE((U32_MAX - tail + head + 1) > (NTB_EDMA_RING_ENTRIES - 1));
> >>>>> + return U32_MAX - tail + head + 1;
> >>>>> +}
> >>>>> +
> >>>>> +static __always_inline u32 ntb_edma_ring_free_entry(u32 head, u32 tail)
> >>>>> +{
> >>>>> + return NTB_EDMA_RING_ENTRIES - ntb_edma_ring_used_entry(head, tail) - 1;
> >>>>> +}
> >>>>> +
> >>>>> +static __always_inline bool ntb_edma_ring_full(u32 head, u32 tail)
> >>>>> +{
> >>>>> + return ntb_edma_ring_free_entry(head, tail) == 0;
> >>>>> +}
> >>>>> +
> >>>>> +static unsigned int ntb_transport_edma_tx_free_entry(struct ntb_transport_qp *qp)
> >>>>> +{
> >>>>> + struct ntb_transport_qp_edma *edma = qp->priv;
> >>>>> + unsigned int head, tail;
> >>>>> +
> >>>>> + scoped_guard(spinlock_irqsave, &edma->tx_lock) {
> >>>>> + /* In this scope, only 'head' might proceed */
> >>>>> + tail = READ_ONCE(edma->tx_issue);
> >>>>> + head = READ_ONCE(*NTB_HEAD_TX_I(qp));
> >>>>> + }
> >>>>> + /*
> >>>>> + * 'used' amount indicates how much the other end has refilled,
> >>>>> + * which are available for us to use for TX.
> >>>>> + */
> >>>>> + return ntb_edma_ring_used_entry(head, tail);
> >>>>> +}
> >>>>> +
> >>>>> +static void ntb_transport_edma_debugfs_stats_show(struct seq_file *s,
> >>>>> + struct ntb_transport_qp *qp)
> >>>>> +{
> >>>>> + seq_printf(s, "rx_bytes - \t%llu\n", qp->rx_bytes);
> >>>>> + seq_printf(s, "rx_pkts - \t%llu\n", qp->rx_pkts);
> >>>>> + seq_printf(s, "rx_err_no_buf - %llu\n", qp->rx_err_no_buf);
> >>>>> + seq_printf(s, "rx_buff - \t0x%p\n", qp->rx_buff);
> >>>>> + seq_printf(s, "rx_max_entry - \t%u\n", qp->rx_max_entry);
> >>>>> + seq_printf(s, "rx_alloc_entry - \t%u\n\n", qp->rx_alloc_entry);
> >>>>> +
> >>>>> + seq_printf(s, "tx_bytes - \t%llu\n", qp->tx_bytes);
> >>>>> + seq_printf(s, "tx_pkts - \t%llu\n", qp->tx_pkts);
> >>>>> + seq_printf(s, "tx_ring_full - \t%llu\n", qp->tx_ring_full);
> >>>>> + seq_printf(s, "tx_err_no_buf - %llu\n", qp->tx_err_no_buf);
> >>>>> + seq_printf(s, "tx_mw - \t0x%p\n", qp->tx_mw);
> >>>>> + seq_printf(s, "tx_max_entry - \t%u\n", qp->tx_max_entry);
> >>>>> + seq_printf(s, "free tx - \t%u\n", ntb_transport_tx_free_entry(qp));
> >>>>> + seq_putc(s, '\n');
> >>>>> +
> >>>>> + seq_puts(s, "Using Remote eDMA - Yes\n");
> >>>>> + seq_printf(s, "QP Link - \t%s\n", qp->link_is_up ? "Up" : "Down");
> >>>>> +}
> >>>>> +
> >>>>> +static void ntb_transport_edma_uninit(struct ntb_transport_ctx *nt)
> >>>>> +{
> >>>>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> >>>>> +
> >>>>> + if (ctx->wq)
> >>>>> + destroy_workqueue(ctx->wq);
> >>>>> + ctx->wq = NULL;
> >>>>> +
> >>>>> + ntb_edma_teardown_chans(&ctx->chans);
> >>>>> +
> >>>>> + switch (ctx->remote_edma_mode) {
> >>>>> + case REMOTE_EDMA_EP:
> >>>>> + ntb_edma_teardown_mws(nt->ndev);
> >>>>> + break;
> >>>>> + case REMOTE_EDMA_RC:
> >>>>> + ntb_edma_teardown_peer(nt->ndev);
> >>>>> + break;
> >>>>> + case REMOTE_EDMA_UNKNOWN:
> >>>>> + default:
> >>>>> + break;
> >>>>> + }
> >>>>> +
> >>>>> + ctx->remote_edma_mode = REMOTE_EDMA_UNKNOWN;
> >>>>> +}
> >>>>> +
> >>>>> +static void ntb_transport_edma_db_work(struct work_struct *work)
> >>>>> +{
> >>>>> + struct ntb_transport_qp_edma *edma =
> >>>>> + container_of(work, struct ntb_transport_qp_edma, db_work);
> >>>>> + struct ntb_transport_qp *qp = edma->qp;
> >>>>> +
> >>>>> + ntb_peer_db_set(qp->ndev, qp->qp_bit);
> >>>>> +}
> >>>>> +
> >>>>> +static void ntb_transport_edma_notify_peer(struct ntb_transport_qp_edma *edma)
> >>>>> +{
> >>>>> + struct ntb_transport_qp *qp = edma->qp;
> >>>>> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
> >>>>> +
> >>>>> + if (!ntb_edma_notify_peer(&ctx->chans, qp->qp_num))
> >>>>> + return;
> >>>>> +
> >>>>> + /*
> >>>>> + * Called from contexts that may be atomic. Since ntb_peer_db_set()
> >>>>> + * may sleep, delegate the actual doorbell write to a workqueue.
> >>>>> + */
> >>>>> + queue_work(system_highpri_wq, &edma->db_work);
> >>>>> +}
> >>>>> +
> >>>>> +static void ntb_transport_edma_isr(void *data, int qp_num)
> >>>>> +{
> >>>>> + struct ntb_transport_ctx *nt = data;
> >>>>> + struct ntb_transport_qp_edma *edma;
> >>>>> + struct ntb_transport_ctx_edma *ctx;
> >>>>> + struct ntb_transport_qp *qp;
> >>>>> +
> >>>>> + if (qp_num < 0 || qp_num >= nt->qp_count)
> >>>>> + return;
> >>>>> +
> >>>>> + qp = &nt->qp_vec[qp_num];
> >>>>> + if (WARN_ON(!qp))
> >>>>> + return;
> >>>>> +
> >>>>> + ctx = (struct ntb_transport_ctx_edma *)qp->transport->priv;
> >>>>> + edma = qp->priv;
> >>>>> +
> >>>>> + queue_work(ctx->wq, &edma->rx_work);
> >>>>> + queue_work(ctx->wq, &edma->tx_work);
> >>>>> +}
> >>>>> +
> >>>>> +static int ntb_transport_edma_rc_init(struct ntb_transport_ctx *nt)
> >>>>> +{
> >>>>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> >>>>> + struct ntb_dev *ndev = nt->ndev;
> >>>>> + struct pci_dev *pdev = ndev->pdev;
> >>>>> + int peer_mw;
> >>>>> + int rc;
> >>>>> +
> >>>>> + if (!use_remote_edma || ctx->remote_edma_mode != REMOTE_EDMA_UNKNOWN)
> >>>>> + return 0;
> >>>>> +
> >>>>> + peer_mw = ntb_peer_mw_count(ndev);
> >>>>> + if (peer_mw <= 0)
> >>>>> + return -ENODEV;
> >>>>> +
> >>>>> + rc = ntb_edma_setup_peer(ndev, peer_mw - 1, nt->qp_count);
> >>>>> + if (rc) {
> >>>>> + dev_err(&pdev->dev, "Failed to enable remote eDMA: %d\n", rc);
> >>>>> + return rc;
> >>>>> + }
> >>>>> +
> >>>>> + rc = ntb_edma_setup_chans(get_dma_dev(ndev), &ctx->chans, true);
> >>>>> + if (rc) {
> >>>>> + dev_err(&pdev->dev, "Failed to setup eDMA channels: %d\n", rc);
> >>>>> + goto err_teardown_peer;
> >>>>> + }
> >>>>> +
> >>>>> + rc = ntb_edma_setup_intr_chan(get_dma_dev(ndev), &ctx->chans);
> >>>>> + if (rc) {
> >>>>> + dev_err(&pdev->dev, "Failed to setup eDMA notify channel: %d\n",
> >>>>> + rc);
> >>>>> + goto err_teardown_chans;
> >>>>> + }
> >>>>> +
> >>>>> + ctx->remote_edma_mode = REMOTE_EDMA_RC;
> >>>>> + return 0;
> >>>>> +
> >>>>> +err_teardown_chans:
> >>>>> + ntb_edma_teardown_chans(&ctx->chans);
> >>>>> +err_teardown_peer:
> >>>>> + ntb_edma_teardown_peer(ndev);
> >>>>> + return rc;
> >>>>> +}
> >>>>> +
> >>>>> +static int ntb_transport_edma_ep_init(struct ntb_transport_ctx *nt)
> >>>>> +{
> >>>>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> >>>>> + struct ntb_dev *ndev = nt->ndev;
> >>>>> + struct pci_dev *pdev = ndev->pdev;
> >>>>> + int peer_mw;
> >>>>> + int rc;
> >>>>> +
> >>>>> + if (!use_remote_edma || ctx->remote_edma_mode == REMOTE_EDMA_EP)
> >>>>> + return 0;
> >>>>> +
> >>>>> +	/*
> >>>>> + * This check assumes that the endpoint (pci-epf-vntb.c)
> >>>>> + * ntb_dev_ops implements .get_private_data() while the host side
> >>>>> + * (ntb_hw_epf.c) does not.
> >>>>> + */
> >>>>> + if (!ntb_get_private_data(ndev))
> >>>>> + return 0;
> >>>>> +
> >>>>> + peer_mw = ntb_peer_mw_count(ndev);
> >>>>> + if (peer_mw <= 0)
> >>>>> + return -ENODEV;
> >>>>> +
> >>>>> + rc = ntb_edma_setup_mws(ndev, peer_mw - 1, nt->qp_count,
> >>>>> + ntb_transport_edma_isr, nt);
> >>>>> + if (rc) {
> >>>>> + dev_err(&pdev->dev,
> >>>>> + "Failed to set up memory window for eDMA: %d\n", rc);
> >>>>> + return rc;
> >>>>> + }
> >>>>> +
> >>>>> + rc = ntb_edma_setup_chans(get_dma_dev(ndev), &ctx->chans, false);
> >>>>> + if (rc) {
> >>>>> + dev_err(&pdev->dev, "Failed to setup eDMA channels: %d\n", rc);
> >>>>> + ntb_edma_teardown_mws(ndev);
> >>>>> + return rc;
> >>>>> + }
> >>>>> +
> >>>>> + ctx->remote_edma_mode = REMOTE_EDMA_EP;
> >>>>> + return 0;
> >>>>> +}
> >>>>> +
> >>>>> +static int ntb_transport_edma_setup_qp_mw(struct ntb_transport_ctx *nt,
> >>>>> + unsigned int qp_num)
> >>>>> +{
> >>>>> + struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
> >>>>> + struct ntb_dev *ndev = nt->ndev;
> >>>>> + struct ntb_queue_entry *entry;
> >>>>> + struct ntb_transport_mw *mw;
> >>>>> + unsigned int mw_num, mw_count, qp_count;
> >>>>> + unsigned int qp_offset, rx_info_offset;
> >>>>> + unsigned int mw_size, mw_size_per_qp;
> >>>>> + unsigned int num_qps_mw;
> >>>>> + size_t edma_total;
> >>>>> + unsigned int i;
> >>>>> + int node;
> >>>>> +
> >>>>> + mw_count = nt->mw_count;
> >>>>> + qp_count = nt->qp_count;
> >>>>> +
> >>>>> + mw_num = QP_TO_MW(nt, qp_num);
> >>>>> + mw = &nt->mw_vec[mw_num];
> >>>>> +
> >>>>> + if (!mw->virt_addr)
> >>>>> + return -ENOMEM;
> >>>>> +
> >>>>> + if (mw_num < qp_count % mw_count)
> >>>>> + num_qps_mw = qp_count / mw_count + 1;
> >>>>> + else
> >>>>> + num_qps_mw = qp_count / mw_count;
> >>>>> +
> >>>>> + mw_size = min(nt->mw_vec[mw_num].phys_size, mw->xlat_size);
> >>>>> + if (max_mw_size && mw_size > max_mw_size)
> >>>>> + mw_size = max_mw_size;
> >>>>> +
> >>>>> + mw_size_per_qp = round_down((unsigned int)mw_size / num_qps_mw, SZ_64);
> >>>>> + qp_offset = mw_size_per_qp * (qp_num / mw_count);
> >>>>> + rx_info_offset = mw_size_per_qp - sizeof(struct ntb_rx_info);
> >>>>> +
> >>>>> + qp->tx_mw_size = mw_size_per_qp;
> >>>>> + qp->tx_mw = nt->mw_vec[mw_num].vbase + qp_offset;
> >>>>> + if (!qp->tx_mw)
> >>>>> + return -EINVAL;
> >>>>> + qp->tx_mw_phys = nt->mw_vec[mw_num].phys_addr + qp_offset;
> >>>>> + if (!qp->tx_mw_phys)
> >>>>> + return -EINVAL;
> >>>>> + qp->rx_info = qp->tx_mw + rx_info_offset;
> >>>>> + qp->rx_buff = mw->virt_addr + qp_offset;
> >>>>> + qp->remote_rx_info = qp->rx_buff + rx_info_offset;
> >>>>> +
> >>>>> + /* Due to housekeeping, there must be at least 2 buffs */
> >>>>> + qp->tx_max_frame = min(transport_mtu, mw_size_per_qp / 2);
> >>>>> + qp->rx_max_frame = min(transport_mtu, mw_size_per_qp / 2);
> >>>>> +
> >>>>> + /* In eDMA mode, decouple from MW sizing and force ring-sized entries */
> >>>>> + edma_total = 2 * sizeof(struct ntb_edma_ring);
> >>>>> + if (rx_info_offset < edma_total) {
> >>>>> +		dev_err(&ndev->dev, "Ring space requires %zuB (have only %uB)\n",
> >>>>> + edma_total, rx_info_offset);
> >>>>> + return -EINVAL;
> >>>>> + }
> >>>>> + qp->tx_max_entry = NTB_EDMA_RING_ENTRIES;
> >>>>> + qp->rx_max_entry = NTB_EDMA_RING_ENTRIES;
> >>>>> +
> >>>>> + /*
> >>>>> + * Checking to see if we have more entries than the default.
> >>>>> + * We should add additional entries if that is the case so we
> >>>>> + * can be in sync with the transport frames.
> >>>>> + */
> >>>>> + node = dev_to_node(&ndev->dev);
> >>>>> + for (i = qp->rx_alloc_entry; i < qp->rx_max_entry; i++) {
> >>>>> + entry = kzalloc_node(sizeof(*entry), GFP_KERNEL, node);
> >>>>> + if (!entry)
> >>>>> + return -ENOMEM;
> >>>>> +
> >>>>> + entry->qp = qp;
> >>>>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry,
> >>>>> + &qp->rx_free_q);
> >>>>> + qp->rx_alloc_entry++;
> >>>>> + }
> >>>>> +
> >>>>> + memset(qp->rx_buff, 0, edma_total);
> >>>>> +
> >>>>> + qp->rx_pkts = 0;
> >>>>> + qp->tx_pkts = 0;
> >>>>> +
> >>>>> + return 0;
> >>>>> +}
> >>>>> +
> >>>>> +static int ntb_transport_edma_rx_complete(struct ntb_transport_qp *qp)
> >>>>> +{
> >>>>> + struct device *dma_dev = get_dma_dev(qp->ndev);
> >>>>> + struct ntb_transport_qp_edma *edma = qp->priv;
> >>>>> + struct ntb_queue_entry *entry;
> >>>>> + struct ntb_edma_desc *in;
> >>>>> + unsigned int len;
> >>>>> + bool link_down;
> >>>>> + u32 idx;
> >>>>> +
> >>>>> + if (ntb_edma_ring_used_entry(READ_ONCE(*NTB_TAIL_RX_I(qp)),
> >>>>> + edma->rx_cons) == 0)
> >>>>> + return 0;
> >>>>> +
> >>>>> + idx = ntb_edma_ring_idx(edma->rx_cons);
> >>>>> + in = NTB_DESC_RX_I(qp, idx);
> >>>>> + if (!(in->flags & DESC_DONE_FLAG))
> >>>>> + return 0;
> >>>>> +
> >>>>> + link_down = in->flags & LINK_DOWN_FLAG;
> >>>>> + in->flags = 0;
> >>>>> + len = in->len; /* might be smaller than entry->len */
> >>>>> +
> >>>>> + entry = (struct ntb_queue_entry *)(uintptr_t)in->data;
> >>>>> + if (WARN_ON(!entry))
> >>>>> + return 0;
> >>>>> +
> >>>>> + if (link_down) {
> >>>>> + ntb_qp_link_down(qp);
> >>>>> + edma->rx_cons++;
> >>>>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_free_q);
> >>>>> + return 1;
> >>>>> + }
> >>>>> +
> >>>>> + dma_unmap_single(dma_dev, entry->addr, entry->len, DMA_FROM_DEVICE);
> >>>>> +
> >>>>> + qp->rx_bytes += len;
> >>>>> + qp->rx_pkts++;
> >>>>> + edma->rx_cons++;
> >>>>> +
> >>>>> + if (qp->rx_handler && qp->client_ready)
> >>>>> + qp->rx_handler(qp, qp->cb_data, entry->cb_data, len);
> >>>>> +
> >>>>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_free_q);
> >>>>> + return 1;
> >>>>> +}
> >>>>> +
> >>>>> +static void ntb_transport_edma_rx_work(struct work_struct *work)
> >>>>> +{
> >>>>> + struct ntb_transport_qp_edma *edma = container_of(
> >>>>> + work, struct ntb_transport_qp_edma, rx_work);
> >>>>> + struct ntb_transport_qp *qp = edma->qp;
> >>>>> + struct ntb_transport_ctx_edma *ctx = qp->transport->priv;
> >>>>> + unsigned int i;
> >>>>> +
> >>>>> + for (i = 0; i < NTB_EDMA_MAX_POLL; i++) {
> >>>>> + if (!ntb_transport_edma_rx_complete(qp))
> >>>>> + break;
> >>>>> + }
> >>>>> +
> >>>>> + if (ntb_transport_edma_rx_complete(qp))
> >>>>> + queue_work(ctx->wq, &edma->rx_work);
> >>>>> +}
> >>>>> +
> >>>>> +static void ntb_transport_edma_tx_work(struct work_struct *work)
> >>>>> +{
> >>>>> + struct ntb_transport_qp_edma *edma = container_of(
> >>>>> + work, struct ntb_transport_qp_edma, tx_work);
> >>>>> + struct ntb_transport_qp *qp = edma->qp;
> >>>>> + struct ntb_edma_desc *in, __iomem *out;
> >>>>> + struct ntb_queue_entry *entry;
> >>>>> + unsigned int len;
> >>>>> + void *cb_data;
> >>>>> + u32 idx;
> >>>>> +
> >>>>> + while (ntb_edma_ring_used_entry(READ_ONCE(edma->tx_issue),
> >>>>> + edma->tx_cons) != 0) {
> >>>>> + /* Paired with smp_wmb() in ntb_transport_edma_tx_enqueue_inner() */
> >>>>> + smp_rmb();
> >>>>> +
> >>>>> + idx = ntb_edma_ring_idx(edma->tx_cons);
> >>>>> + in = NTB_DESC_TX_I(qp, idx);
> >>>>> + entry = (struct ntb_queue_entry *)(uintptr_t)in->data;
> >>>>> + if (!entry || !(entry->flags & DESC_DONE_FLAG))
> >>>>> + break;
> >>>>> +
> >>>>> + in->data = 0;
> >>>>> +
> >>>>> + cb_data = entry->cb_data;
> >>>>> + len = entry->len;
> >>>>> +
> >>>>> + out = NTB_DESC_TX_O(qp, idx);
> >>>>> +
> >>>>> + WRITE_ONCE(edma->tx_cons, edma->tx_cons + 1);
> >>>>> +
> >>>>> + /*
> >>>>> + * No need to add barrier in-between to enforce ordering here.
> >>>>> + * The other side proceeds only after both flags and tail are
> >>>>> + * updated.
> >>>>> + */
> >>>>> + iowrite32(entry->flags, &out->flags);
> >>>>> + iowrite32(edma->tx_cons, NTB_TAIL_TX_O(qp));
> >>>>> +
> >>>>> + ntb_transport_edma_notify_peer(edma);
> >>>>> +
> >>>>> + ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry,
> >>>>> + &qp->tx_free_q);
> >>>>> +
> >>>>> + if (qp->tx_handler)
> >>>>> + qp->tx_handler(qp, qp->cb_data, cb_data, len);
> >>>>> +
> >>>>> + /* stat updates */
> >>>>> + qp->tx_bytes += len;
> >>>>> + qp->tx_pkts++;
> >>>>> + }
> >>>>> +}
> >>>>> +
> >>>>> +static void ntb_transport_edma_tx_cb(void *data,
> >>>>> + const struct dmaengine_result *res)
> >>>>> +{
> >>>>> + struct ntb_queue_entry *entry = data;
> >>>>> + struct ntb_transport_qp *qp = entry->qp;
> >>>>> + struct ntb_transport_ctx *nt = qp->transport;
> >>>>> + struct device *dma_dev = get_dma_dev(qp->ndev);
> >>>>> + enum dmaengine_tx_result dma_err = res->result;
> >>>>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> >>>>> + struct ntb_transport_qp_edma *edma = qp->priv;
> >>>>> +
> >>>>> + switch (dma_err) {
> >>>>> + case DMA_TRANS_READ_FAILED:
> >>>>> + case DMA_TRANS_WRITE_FAILED:
> >>>>> + case DMA_TRANS_ABORTED:
> >>>>> + entry->errors++;
> >>>>> + entry->len = -EIO;
> >>>>> + break;
> >>>>> + case DMA_TRANS_NOERROR:
> >>>>> + default:
> >>>>> + break;
> >>>>> + }
> >>>>> + dma_unmap_sg(dma_dev, &entry->sgl, 1, DMA_TO_DEVICE);
> >>>>> + sg_dma_address(&entry->sgl) = 0;
> >>>>> +
> >>>>> + entry->flags |= DESC_DONE_FLAG;
> >>>>> +
> >>>>> + queue_work(ctx->wq, &edma->tx_work);
> >>>>> +}
> >>>>> +
> >>>>> +static int ntb_transport_edma_submit(struct device *d, struct dma_chan *chan,
> >>>>> + size_t len, void *rc_src, dma_addr_t dst,
> >>>>> + struct ntb_queue_entry *entry)
> >>>>> +{
> >>>>> + struct scatterlist *sgl = &entry->sgl;
> >>>>> + struct dma_async_tx_descriptor *txd;
> >>>>> + struct dma_slave_config cfg;
> >>>>> + dma_cookie_t cookie;
> >>>>> + int nents, rc;
> >>>>> +
> >>>>> + if (!d)
> >>>>> + return -ENODEV;
> >>>>> +
> >>>>> + if (!chan)
> >>>>> + return -ENXIO;
> >>>>> +
> >>>>> + if (WARN_ON(!rc_src || !dst))
> >>>>> + return -EINVAL;
> >>>>> +
> >>>>> + if (WARN_ON(sg_dma_address(sgl)))
> >>>>> + return -EINVAL;
> >>>>> +
> >>>>> + sg_init_one(sgl, rc_src, len);
> >>>>> + nents = dma_map_sg(d, sgl, 1, DMA_TO_DEVICE);
> >>>>> + if (nents <= 0)
> >>>>> + return -EIO;
> >>>>> +
> >>>>> + memset(&cfg, 0, sizeof(cfg));
> >>>>> + cfg.dst_addr = dst;
> >>>>> + cfg.src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
> >>>>> + cfg.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
> >>>>> + cfg.direction = DMA_MEM_TO_DEV;
> >>>>> +
> >>>>> + txd = dmaengine_prep_slave_sg_config(chan, sgl, 1, DMA_MEM_TO_DEV,
> >>>>> + DMA_CTRL_ACK | DMA_PREP_INTERRUPT, &cfg);
> >>>>> + if (!txd) {
> >>>>> + rc = -EIO;
> >>>>> + goto out_unmap;
> >>>>> + }
> >>>>> +
> >>>>> + txd->callback_result = ntb_transport_edma_tx_cb;
> >>>>> + txd->callback_param = entry;
> >>>>> +
> >>>>> + cookie = dmaengine_submit(txd);
> >>>>> + if (dma_submit_error(cookie)) {
> >>>>> + rc = -EIO;
> >>>>> + goto out_unmap;
> >>>>> + }
> >>>>> + dma_async_issue_pending(chan);
> >>>>> + return 0;
> >>>>> +out_unmap:
> >>>>> + dma_unmap_sg(d, sgl, 1, DMA_TO_DEVICE);
> >>>>> + return rc;
> >>>>> +}
> >>>>> +
> >>>>> +static int ntb_transport_edma_tx_enqueue_inner(struct ntb_transport_qp *qp,
> >>>>> + struct ntb_queue_entry *entry)
> >>>>> +{
> >>>>> + struct device *dma_dev = get_dma_dev(qp->ndev);
> >>>>> + struct ntb_transport_qp_edma *edma = qp->priv;
> >>>>> + struct ntb_transport_ctx *nt = qp->transport;
> >>>>> + struct ntb_edma_desc *in, __iomem *out;
> >>>>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> >>>>> + unsigned int len = entry->len;
> >>>>> + struct dma_chan *chan;
> >>>>> + u32 issue, idx, head;
> >>>>> + dma_addr_t dst;
> >>>>> + int rc;
> >>>>> +
> >>>>> + WARN_ON_ONCE(entry->flags & DESC_DONE_FLAG);
> >>>>> +
> >>>>> + scoped_guard(spinlock_irqsave, &edma->tx_lock) {
> >>>>> + head = READ_ONCE(*NTB_HEAD_TX_I(qp));
> >>>>> + issue = edma->tx_issue;
> >>>>> + if (ntb_edma_ring_used_entry(head, issue) == 0) {
> >>>>> + qp->tx_ring_full++;
> >>>>> + return -ENOSPC;
> >>>>> + }
> >>>>> +
> >>>>> + /*
> >>>>> + * ntb_transport_edma_tx_work() checks entry->flags
> >>>>> + * so it needs to be set before tx_issue++.
> >>>>> + */
> >>>>> + idx = ntb_edma_ring_idx(issue);
> >>>>> + in = NTB_DESC_TX_I(qp, idx);
> >>>>> + in->data = (uintptr_t)entry;
> >>>>> +
> >>>>> + /* Make in->data visible before tx_issue++ */
> >>>>> + smp_wmb();
> >>>>> +
> >>>>> + WRITE_ONCE(edma->tx_issue, edma->tx_issue + 1);
> >>>>> + }
> >>>>> +
> >>>>> + /* Publish the final transfer length to the other end */
> >>>>> + out = NTB_DESC_TX_O(qp, idx);
> >>>>> + iowrite32(len, &out->len);
> >>>>> + ioread32(&out->len);
> >>>>> +
> >>>>> + if (unlikely(!len)) {
> >>>>> + entry->flags |= DESC_DONE_FLAG;
> >>>>> + queue_work(ctx->wq, &edma->tx_work);
> >>>>> + return 0;
> >>>>> + }
> >>>>> +
> >>>>> + /* Paired with dma_wmb() in ntb_transport_edma_rx_enqueue_inner() */
> >>>>> + dma_rmb();
> >>>>> +
> >>>>> + /* kick remote eDMA read transfer */
> >>>>> + dst = (dma_addr_t)in->addr;
> >>>>> + chan = ntb_edma_pick_chan(&ctx->chans, qp->qp_num);
> >>>>> + rc = ntb_transport_edma_submit(dma_dev, chan, len,
> >>>>> + entry->buf, dst, entry);
> >>>>> + if (rc) {
> >>>>> + entry->errors++;
> >>>>> + entry->len = -EIO;
> >>>>> + entry->flags |= DESC_DONE_FLAG;
> >>>>> + queue_work(ctx->wq, &edma->tx_work);
> >>>>> + }
> >>>>> + return 0;
> >>>>> +}
> >>>>> +
> >>>>> +static int ntb_transport_edma_tx_enqueue(struct ntb_transport_qp *qp,
> >>>>> + struct ntb_queue_entry *entry,
> >>>>> + void *cb, void *data, unsigned int len,
> >>>>> + unsigned int flags)
> >>>>> +{
> >>>>> + struct device *dma_dev;
> >>>>> +
> >>>>> + if (entry->addr) {
> >>>>> + /* Deferred unmap */
> >>>>> + dma_dev = get_dma_dev(qp->ndev);
> >>>>> + dma_unmap_single(dma_dev, entry->addr, entry->len,
> >>>>> + DMA_TO_DEVICE);
> >>>>> + }
> >>>>> +
> >>>>> + entry->cb_data = cb;
> >>>>> + entry->buf = data;
> >>>>> + entry->len = len;
> >>>>> + entry->flags = flags;
> >>>>> + entry->errors = 0;
> >>>>> + entry->addr = 0;
> >>>>> +
> >>>>> + WARN_ON_ONCE(!ntb_qp_edma_enabled(qp));
> >>>>> +
> >>>>> + return ntb_transport_edma_tx_enqueue_inner(qp, entry);
> >>>>> +}
> >>>>> +
> >>>>> +static int ntb_transport_edma_rx_enqueue_inner(struct ntb_transport_qp *qp,
> >>>>> + struct ntb_queue_entry *entry)
> >>>>> +{
> >>>>> + struct device *dma_dev = get_dma_dev(qp->ndev);
> >>>>> + struct ntb_transport_qp_edma *edma = qp->priv;
> >>>>> + struct ntb_edma_desc *in, __iomem *out;
> >>>>> + unsigned int len = entry->len;
> >>>>> + void *data = entry->buf;
> >>>>> + dma_addr_t dst;
> >>>>> + u32 idx;
> >>>>> + int rc;
> >>>>> +
> >>>>> + dst = dma_map_single(dma_dev, data, len, DMA_FROM_DEVICE);
> >>>>> + rc = dma_mapping_error(dma_dev, dst);
> >>>>> + if (rc)
> >>>>> + return rc;
> >>>>> +
> >>>>> + guard(spinlock_bh)(&edma->rx_lock);
> >>>>> +
> >>>>> + if (ntb_edma_ring_full(READ_ONCE(edma->rx_prod),
> >>>>> + READ_ONCE(edma->rx_cons))) {
> >>>>> + rc = -ENOSPC;
> >>>>> + goto out_unmap;
> >>>>> + }
> >>>>> +
> >>>>> + idx = ntb_edma_ring_idx(edma->rx_prod);
> >>>>> + in = NTB_DESC_RX_I(qp, idx);
> >>>>> + out = NTB_DESC_RX_O(qp, idx);
> >>>>> +
> >>>>> + iowrite32(len, &out->len);
> >>>>> + iowrite64(dst, &out->addr);
> >>>>> +
> >>>>> + WARN_ON(in->flags & DESC_DONE_FLAG);
> >>>>> + in->data = (uintptr_t)entry;
> >>>>> + entry->addr = dst;
> >>>>> +
> >>>>> + /* Ensure len/addr are visible before the head update */
> >>>>> + dma_wmb();
> >>>>> +
> >>>>> + WRITE_ONCE(edma->rx_prod, edma->rx_prod + 1);
> >>>>> + iowrite32(edma->rx_prod, NTB_HEAD_RX_O(qp));
> >>>>> +
> >>>>> + return 0;
> >>>>> +out_unmap:
> >>>>> + dma_unmap_single(dma_dev, dst, len, DMA_FROM_DEVICE);
> >>>>> + return rc;
> >>>>> +}
> >>>>> +
> >>>>> +static int ntb_transport_edma_rx_enqueue(struct ntb_transport_qp *qp,
> >>>>> + struct ntb_queue_entry *entry)
> >>>>> +{
> >>>>> + int rc;
> >>>>> +
> >>>>> + rc = ntb_transport_edma_rx_enqueue_inner(qp, entry);
> >>>>> + if (rc) {
> >>>>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry,
> >>>>> + &qp->rx_free_q);
> >>>>> + return rc;
> >>>>> + }
> >>>>> +
> >>>>> + ntb_list_add(&qp->ntb_rx_q_lock, &entry->entry, &qp->rx_pend_q);
> >>>>> +
> >>>>> + if (qp->active)
> >>>>> + tasklet_schedule(&qp->rxc_db_work);
> >>>>> +
> >>>>> + return 0;
> >>>>> +}
> >>>>> +
> >>>>> +static void ntb_transport_edma_rx_poll(struct ntb_transport_qp *qp)
> >>>>> +{
> >>>>> + struct ntb_transport_ctx *nt = qp->transport;
> >>>>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> >>>>> + struct ntb_transport_qp_edma *edma = qp->priv;
> >>>>> +
> >>>>> + queue_work(ctx->wq, &edma->rx_work);
> >>>>> + queue_work(ctx->wq, &edma->tx_work);
> >>>>> +}
> >>>>> +
> >>>>> +static int ntb_transport_edma_qp_init(struct ntb_transport_ctx *nt,
> >>>>> + unsigned int qp_num)
> >>>>> +{
> >>>>> + struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
> >>>>> + struct ntb_transport_qp_edma *edma;
> >>>>> + struct ntb_dev *ndev = nt->ndev;
> >>>>> + int node;
> >>>>> +
> >>>>> + node = dev_to_node(&ndev->dev);
> >>>>> +
> >>>>> + qp->priv = kzalloc_node(sizeof(*edma), GFP_KERNEL, node);
> >>>>> + if (!qp->priv)
> >>>>> + return -ENOMEM;
> >>>>> +
> >>>>> + edma = (struct ntb_transport_qp_edma *)qp->priv;
> >>>>> + edma->qp = qp;
> >>>>> + edma->rx_prod = 0;
> >>>>> + edma->rx_cons = 0;
> >>>>> + edma->tx_cons = 0;
> >>>>> + edma->tx_issue = 0;
> >>>>> +
> >>>>> + spin_lock_init(&edma->rx_lock);
> >>>>> + spin_lock_init(&edma->tx_lock);
> >>>>> +
> >>>>> + INIT_WORK(&edma->db_work, ntb_transport_edma_db_work);
> >>>>> + INIT_WORK(&edma->rx_work, ntb_transport_edma_rx_work);
> >>>>> + INIT_WORK(&edma->tx_work, ntb_transport_edma_tx_work);
> >>>>> +
> >>>>> + return 0;
> >>>>> +}
> >>>>> +
> >>>>> +static void ntb_transport_edma_qp_free(struct ntb_transport_qp *qp)
> >>>>> +{
> >>>>> + struct ntb_transport_qp_edma *edma = qp->priv;
> >>>>> +
> >>>>> + cancel_work_sync(&edma->db_work);
> >>>>> + cancel_work_sync(&edma->rx_work);
> >>>>> + cancel_work_sync(&edma->tx_work);
> >>>>> +
> >>>>> + kfree(qp->priv);
> >>>>> +}
> >>>>> +
> >>>>> +static int ntb_transport_edma_pre_link_up(struct ntb_transport_ctx *nt)
> >>>>> +{
> >>>>> + struct ntb_dev *ndev = nt->ndev;
> >>>>> + struct pci_dev *pdev = ndev->pdev;
> >>>>> + int rc;
> >>>>> +
> >>>>> + rc = ntb_transport_edma_ep_init(nt);
> >>>>> + if (rc)
> >>>>> + dev_err(&pdev->dev, "Failed to init EP: %d\n", rc);
> >>>>> +
> >>>>> + return rc;
> >>>>> +}
> >>>>> +
> >>>>> +static int ntb_transport_edma_post_link_up(struct ntb_transport_ctx *nt)
> >>>>> +{
> >>>>> + struct ntb_dev *ndev = nt->ndev;
> >>>>> + struct pci_dev *pdev = ndev->pdev;
> >>>>> + int rc;
> >>>>> +
> >>>>> + rc = ntb_transport_edma_rc_init(nt);
> >>>>> + if (rc)
> >>>>> + dev_err(&pdev->dev, "Failed to init RC: %d\n", rc);
> >>>>> +
> >>>>> + return rc;
> >>>>> +}
> >>>>> +
> >>>>> +static int ntb_transport_edma_enable(struct ntb_transport_ctx *nt,
> >>>>> + unsigned int *mw_count)
> >>>>> +{
> >>>>> + struct ntb_dev *ndev = nt->ndev;
> >>>>> + struct ntb_transport_ctx_edma *ctx = nt->priv;
> >>>>> +
> >>>>> + if (!use_remote_edma)
> >>>>> + return 0;
> >>>>> +
> >>>>> + /*
> >>>>> + * We need at least one MW for the transport plus one MW reserved
> >>>>> + * for the remote eDMA window (see ntb_edma_setup_mws/peer).
> >>>>> + */
> >>>>> + if (*mw_count <= 1) {
> >>>>> + dev_err(&ndev->dev,
> >>>>> +			"remote eDMA requires at least two MWs (have %u)\n",
> >>>>> + *mw_count);
> >>>>> + return -ENODEV;
> >>>>> + }
> >>>>> +
> >>>>> + ctx->wq = alloc_workqueue("ntb-edma-wq", WQ_UNBOUND | WQ_SYSFS, 0);
> >>>>> + if (!ctx->wq) {
> >>>>> + ntb_transport_edma_uninit(nt);
> >>>>> + return -ENOMEM;
> >>>>> + }
> >>>>> +
> >>>>> + /* Reserve the last peer MW exclusively for the eDMA window. */
> >>>>> + *mw_count -= 1;
> >>>>> +
> >>>>> + return 0;
> >>>>> +}
> >>>>> +
> >>>>> +static void ntb_transport_edma_disable(struct ntb_transport_ctx *nt)
> >>>>> +{
> >>>>> + ntb_transport_edma_uninit(nt);
> >>>>> +}
> >>>>> +
> >>>>> +static const struct ntb_transport_backend_ops edma_backend_ops = {
> >>>>> + .enable = ntb_transport_edma_enable,
> >>>>> + .disable = ntb_transport_edma_disable,
> >>>>> + .qp_init = ntb_transport_edma_qp_init,
> >>>>> + .qp_free = ntb_transport_edma_qp_free,
> >>>>> + .pre_link_up = ntb_transport_edma_pre_link_up,
> >>>>> + .post_link_up = ntb_transport_edma_post_link_up,
> >>>>> + .setup_qp_mw = ntb_transport_edma_setup_qp_mw,
> >>>>> + .tx_free_entry = ntb_transport_edma_tx_free_entry,
> >>>>> + .tx_enqueue = ntb_transport_edma_tx_enqueue,
> >>>>> + .rx_enqueue = ntb_transport_edma_rx_enqueue,
> >>>>> + .rx_poll = ntb_transport_edma_rx_poll,
> >>>>> + .debugfs_stats_show = ntb_transport_edma_debugfs_stats_show,
> >>>>> +};
> >>>>> +
> >>>>> +int ntb_transport_edma_init(struct ntb_transport_ctx *nt)
> >>>>> +{
> >>>>> + struct ntb_dev *ndev = nt->ndev;
> >>>>> + int node;
> >>>>> +
> >>>>> + node = dev_to_node(&ndev->dev);
> >>>>> + nt->priv = kzalloc_node(sizeof(struct ntb_transport_ctx_edma), GFP_KERNEL,
> >>>>> + node);
> >>>>> + if (!nt->priv)
> >>>>> + return -ENOMEM;
> >>>>> +
> >>>>> + nt->backend_ops = edma_backend_ops;
> >>>>> + /*
> >>>>> +	 * In remote eDMA mode, one DMA read channel is used by the host
> >>>>> +	 * side to interrupt the EP.
> >>>>> + */
> >>>>> + use_msi = false;
> >>>>> + return 0;
> >>>>> +}
> >>>>> diff --git a/drivers/ntb/ntb_transport_internal.h b/drivers/ntb/ntb_transport_internal.h
> >>>>> index 51ff08062d73..9fff65980d3d 100644
> >>>>> --- a/drivers/ntb/ntb_transport_internal.h
> >>>>> +++ b/drivers/ntb/ntb_transport_internal.h
> >>>>> @@ -8,6 +8,7 @@
> >>>>> extern unsigned long max_mw_size;
> >>>>> extern unsigned int transport_mtu;
> >>>>> extern bool use_msi;
> >>>>> +extern bool use_remote_edma;
> >>>>>
> >>>>> #define QP_TO_MW(nt, qp) ((qp) % nt->mw_count)
> >>>>>
> >>>>> @@ -29,6 +30,11 @@ struct ntb_queue_entry {
> >>>>> struct ntb_payload_header __iomem *tx_hdr;
> >>>>> struct ntb_payload_header *rx_hdr;
> >>>>> };
> >>>>> +
> >>>>> +#ifdef CONFIG_NTB_TRANSPORT_EDMA
> >>>>> + dma_addr_t addr;
> >>>>> + struct scatterlist sgl;
> >>>>> +#endif
> >>>>> };
> >>>>>
> >>>>> struct ntb_rx_info {
> >>>>> @@ -202,4 +208,13 @@ int ntb_transport_init_queue(struct ntb_transport_ctx *nt,
> >>>>> unsigned int qp_num);
> >>>>> struct device *get_dma_dev(struct ntb_dev *ndev);
> >>>>>
> >>>>> +#ifdef CONFIG_NTB_TRANSPORT_EDMA
> >>>>> +int ntb_transport_edma_init(struct ntb_transport_ctx *nt);
> >>>>> +#else
> >>>>> +static inline int ntb_transport_edma_init(struct ntb_transport_ctx *nt)
> >>>>> +{
> >>>>> + return -EOPNOTSUPP;
> >>>>> +}
> >>>>> +#endif /* CONFIG_NTB_TRANSPORT_EDMA */
> >>>>> +
> >>>>> #endif /* _NTB_TRANSPORT_INTERNAL_H_ */
> >>>>
> >>
> >
>
^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v3 26/35] NTB: ntb_transport: Introduce DW eDMA backed transport mode
2026-01-10 13:43 ` Koichiro Den
@ 2026-01-12 15:43 ` Dave Jiang
2026-01-13 2:44 ` Koichiro Den
0 siblings, 1 reply; 61+ messages in thread
From: Dave Jiang @ 2026-01-12 15:43 UTC (permalink / raw)
To: Koichiro Den
Cc: Frank.Li, ntb, linux-pci, dmaengine, linux-renesas-soc, netdev,
linux-kernel, mani, kwilczynski, kishon, bhelgaas, corbet,
geert+renesas, magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro,
will, robin.murphy, jdmason, allenbh, andrew+netdev, davem,
edumazet, kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k,
kurt.schwemmer, logang, jingoohan1, lpieralisi, utkarsh02t,
jbrunet, dlemoal, arnd, elfring
On 1/10/26 6:43 AM, Koichiro Den wrote:
> On Thu, Jan 08, 2026 at 10:55:46AM -0700, Dave Jiang wrote:
>>
>>
>> On 1/7/26 6:25 PM, Koichiro Den wrote:
>>> On Wed, Jan 07, 2026 at 12:02:15PM -0700, Dave Jiang wrote:
>>>>
>>>>
>>>> On 1/7/26 7:54 AM, Koichiro Den wrote:
>>>>> On Tue, Jan 06, 2026 at 11:51:03AM -0700, Dave Jiang wrote:
>>>>>>
>>>>>>
>>>>>> On 12/17/25 8:16 AM, Koichiro Den wrote:
>>>>>>> Add a new ntb_transport backend that uses a DesignWare eDMA engine
>>>>>>> located on the endpoint, to be driven by both host and endpoint.
>>>>>>>
>>>>>>> The endpoint exposes a dedicated memory window which contains the eDMA
>>>>>>> register block, a small control structure (struct ntb_edma_info) and
>>>>>>> per-channel linked-list (LL) rings for read channels. Endpoint drives
>>>>>>> its local eDMA write channels for its transmission, while host side
>>>>>>> uses the remote eDMA read channels for its transmission.
>>>>>>>
>>>>>>> A key benefit of this backend is that the memory window no longer needs
>>>>>>> to carry data-plane payload. This makes the design less sensitive to
>>>>>>> limited memory window space and allows scaling to multiple queue pairs.
>>>>>>> The memory window layout is specific to the eDMA-backed backend, so
>>>>>>> there is no automatic fallback to the memcpy-based default transport
>>>>>>> that requires the different layout.
>>>>>>>
>>>>>>> Signed-off-by: Koichiro Den <den@valinux.co.jp>
>>>>>>> ---
>>>>>>> drivers/ntb/Kconfig | 12 +
>>>>>>> drivers/ntb/Makefile | 2 +
>>>>>>> drivers/ntb/ntb_transport_core.c | 15 +-
>>>>>>> drivers/ntb/ntb_transport_edma.c | 987 +++++++++++++++++++++++++++
>>>>>>> drivers/ntb/ntb_transport_internal.h | 15 +
>>>>>>> 5 files changed, 1029 insertions(+), 2 deletions(-)
>>>>>>> create mode 100644 drivers/ntb/ntb_transport_edma.c
>>>>>>>
>>>>>>> diff --git a/drivers/ntb/Kconfig b/drivers/ntb/Kconfig
>>>>>>> index df16c755b4da..5ba6d0b7f5ba 100644
>>>>>>> --- a/drivers/ntb/Kconfig
>>>>>>> +++ b/drivers/ntb/Kconfig
>>>>>>> @@ -37,4 +37,16 @@ config NTB_TRANSPORT
>>>>>>>
>>>>>>> If unsure, say N.
>>>>>>>
>>>>>>> +config NTB_TRANSPORT_EDMA
>>>>>>> + bool "NTB Transport backed by remote eDMA"
>>>>>>> + depends on NTB_TRANSPORT
>>>>>>> + depends on PCI
>>>>>>> + select DMA_ENGINE
>>>>>>> + select NTB_EDMA
>>>>>>> + help
>>>>>>> + Enable a transport backend that uses a remote DesignWare eDMA engine
>>>>>>> + exposed through a dedicated NTB memory window. The host uses the
>>>>>>> + endpoint's eDMA engine to move data in both directions.
>>>>>>> + Say Y here if you intend to use the 'use_remote_edma' module parameter.
>>>>>>> +
>>>>>>> endif # NTB
>>>>>>> diff --git a/drivers/ntb/Makefile b/drivers/ntb/Makefile
>>>>>>> index 9b66e5fafbc0..b9086b32ecde 100644
>>>>>>> --- a/drivers/ntb/Makefile
>>>>>>> +++ b/drivers/ntb/Makefile
>>>>>>> @@ -6,3 +6,5 @@ ntb-y := core.o
>>>>>>> ntb-$(CONFIG_NTB_MSI) += msi.o
>>>>>>>
>>>>>>> ntb_transport-y := ntb_transport_core.o
>>>>>>> +ntb_transport-$(CONFIG_NTB_TRANSPORT_EDMA) += ntb_transport_edma.o
>>>>>>> +ntb_transport-$(CONFIG_NTB_TRANSPORT_EDMA) += hw/edma/ntb_hw_edma.o
>>>>>>> diff --git a/drivers/ntb/ntb_transport_core.c b/drivers/ntb/ntb_transport_core.c
>>>>>>> index 40c2548f5930..bd21232f26fe 100644
>>>>>>> --- a/drivers/ntb/ntb_transport_core.c
>>>>>>> +++ b/drivers/ntb/ntb_transport_core.c
>>>>>>> @@ -104,6 +104,12 @@ module_param(use_msi, bool, 0644);
>>>>>>> MODULE_PARM_DESC(use_msi, "Use MSI interrupts instead of doorbells");
>>>>>>> #endif
>>>>>>>
>>>>>>> +bool use_remote_edma;
>>>>>>> +#ifdef CONFIG_NTB_TRANSPORT_EDMA
>>>>>>> +module_param(use_remote_edma, bool, 0644);
>>>>>>> +MODULE_PARM_DESC(use_remote_edma, "Use remote eDMA mode (when enabled, use_msi is ignored)");
>>>>>>> +#endif
>>>>>>
>>>>>> This seems clunky. Can the ntb_transport_core determine this when the things are called through ntb_transport_edma? Or maybe a set_transport_type can be introduced by the transport itself during allocation?
>>>>>
>>>>> Agreed. I plan to drop 'use_remote_edma' and instead,
>>>>> - add a module parameter: transport_type={"default","edma"} (defaulting to "default"),
>>>>> - introduce ntb_transport_backend_register() for transports to self-register via
>>>>> struct ntb_transport_backend { .name, .ops }, and
>>>>> - have the core select the backend whose .name matches transport_type.
>>>>>
>>>>> I think this should keep any non-default transport-specific logic out of
>>>>> ntb_transport_core, or at least keep it to a minimum, while still allowing
>>>>> non-default transports (ntb_transport_edma is the only choice for now
>>>>> though) to plug in cleanly.
>>>>>
>>>>> If you see a cleaner approach, I would appreciate it if you could elaborate
>>>>> a bit more on your idea.
>>>>
>>>
>>> Thank you for the comment, let me respond inline below.
>>>
>>>> Do you think it's flexible enough that we can determine a transport type per 'ntb_transport_mw' or is this an all or nothing type of thing?
>>>
>>> At least in the current implementation, the remote eDMA use is an
>>> all-or-nothing type rather than something that can be selected per
>>> ntb_transport_mw.
>>>
>>> The way remote eDMA consumes MWs is quite similar to how ntb_msi uses them
>>> today. Assuming multiple MWs are available, the last MW is reserved to
>>> expose the remote eDMA info/register/LL regions to the host by packing all
>>> of them into a single MW. In that sense, it does not map naturally to a
>>> per-MW selection model.
>>>
>>>> I'm trying to see if we can do away with the module param.
>>>
>>> I think it is useful to keep an explicit way for an administrator to choose
>>> the transport type (default vs edma). Even on platforms where dw-edma is
>>> available, there can potentially be platform-specific or hard-to-reproduce
>>> issues (e.g. problems that only show up with certain transfer patterns),
>>> and having a way to fall back to the long-existing traditional transport can
>>> be valuable.
>>>
>>> That said, I am not opposed to making the default behavior an automatic
>>> selection, where edma is chosen when it's available and the parameter is
>>> left unset.
>>>
>>>> Or I guess when you probe ntb_netdev, the selection would happen there and thus transport_type would be in ntb_netdev module?
>>>
>>> I'm not sure how selecting the transport type at ntb_netdev probe time
>>> would work in practice, and what additional benefit that would provide.
>>
>> So currently ntb_netdev or ntb_transport are not auto-loaded right? They are manually probed by the user. So with the new transport, the user would modprobe ntb_transport_edma.ko. And that would trigger the eDMA transport setup right? With the ntb_transport_core library existing, we should be able to load both ntb_transport_host and ntb_transport_edma at the same time theoretically. And ntb_netdev should be able to select one or the other transport. This is the most versatile scenario. An alternative is there can be only 1 transport ever loaded, and when ntb_transport_edma is loaded, it just looks like the default transport and netdev functions as it always has without knowing what the underlying transport is. On a platform with multiple NTB ports, it would be nice to have the flexibility of allowing each port to choose between the current transport and the edma transport if the user desires.
>
> I was assuming manual load in my previous response. Also in this RFC v3,
> ntb_transport_edma is not even a standalone module yet (although I do think
> it should be). At this point, I feel the RFC v3 implementation is still a
> bit too rough to use as a basis for discussing the ideal long-term design,
> so I'd like to set it aside for a moment and focus on what the ideal shape
> could look like.
>
> My current thoughts on the ideal structure, after reading your last
> comment, are as follows:
>
> * The existing cpu/dma memcpy-based transport becomes "ntb_transport_host",
> and the new eDMA-based transport becomes "ntb_transport_edma".
> * Each transport is a separate kernel module, and each provides its own
> ntb_client implementation (i.e. each registers independently with the
> NTB core). In this model, it should be perfectly fine for both modules to
> be loaded at the same time.
> * Common pieces (e.g. ntb_transport_bus registration, shared helpers, and
> the boundary/API exposed to ntb_transport_clients such as ntb_netdev)
> should live in a shared library module, such as "ntb_transport_core" (or
> "ntb_transport", naming TBD).
>
> Then, for transport type selection:
>
> * If we want to switch the transport type (host vs edma) on a per-NTB-port
> (device) basis, we can rely on the standard driver override framework
> (ie. driver_override, unbind/bind). To make that work, at first we need
> to add driver_override support to ntb_bus.
> * In the case that ntb_netdev wants to explicitly select a transport type,
> I think it should still be handled via the per-NTB-port driver_override
> rather than building transport-selection logic into ntb_netdev itself
> (perhaps with some extension to the boundary API for
> ntb_transport_clients).
> * If ntb_transport_host / ntb_transport_edma are built-in modules, a
> post-boot rebind might be sufficient in most cases. If that's not
> sufficient, we could also consider providing a kernel parameter to define
> a boot-time policy. For example, something like:
> ntb_transport.policy=edma@0000:01:00.0,host@0000:5f:00.0
>
> How does that sound? In any case, I am planning to submit RFC v4.
Yup that sounds about what I was thinking. If you are submitting RFC v4 w/o the changes mentioned above, just mention that the progress is moving towards that in the cover letter to remind people. Thanks!
Any additional thoughts Jon?
>
> Thanks for the review,
> Koichiro
* Re: [RFC PATCH v3 26/35] NTB: ntb_transport: Introduce DW eDMA backed transport mode
2026-01-12 15:43 ` Dave Jiang
@ 2026-01-13 2:44 ` Koichiro Den
0 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2026-01-13 2:44 UTC (permalink / raw)
To: Dave Jiang
Cc: Frank.Li, ntb, linux-pci, dmaengine, linux-renesas-soc, netdev,
linux-kernel, mani, kwilczynski, kishon, bhelgaas, corbet,
geert+renesas, magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro,
will, robin.murphy, jdmason, allenbh, andrew+netdev, davem,
edumazet, kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k,
kurt.schwemmer, logang, jingoohan1, lpieralisi, utkarsh02t,
jbrunet, dlemoal, arnd, elfring
On Mon, Jan 12, 2026 at 08:43:28AM -0700, Dave Jiang wrote:
>
>
> On 1/10/26 6:43 AM, Koichiro Den wrote:
> > On Thu, Jan 08, 2026 at 10:55:46AM -0700, Dave Jiang wrote:
> >>
> >>
> >> On 1/7/26 6:25 PM, Koichiro Den wrote:
> >>> On Wed, Jan 07, 2026 at 12:02:15PM -0700, Dave Jiang wrote:
> >>>>
> >>>>
> >>>> On 1/7/26 7:54 AM, Koichiro Den wrote:
> >>>>> On Tue, Jan 06, 2026 at 11:51:03AM -0700, Dave Jiang wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 12/17/25 8:16 AM, Koichiro Den wrote:
> >>>>>>> Add a new ntb_transport backend that uses a DesignWare eDMA engine
> >>>>>>> located on the endpoint, to be driven by both host and endpoint.
> >>>>>>>
> >>>>>>> The endpoint exposes a dedicated memory window which contains the eDMA
> >>>>>>> register block, a small control structure (struct ntb_edma_info) and
> >>>>>>> per-channel linked-list (LL) rings for read channels. Endpoint drives
> >>>>>>> its local eDMA write channels for its transmission, while host side
> >>>>>>> uses the remote eDMA read channels for its transmission.
> >>>>>>>
> >>>>>>> A key benefit of this backend is that the memory window no longer needs
> >>>>>>> to carry data-plane payload. This makes the design less sensitive to
> >>>>>>> limited memory window space and allows scaling to multiple queue pairs.
> >>>>>>> The memory window layout is specific to the eDMA-backed backend, so
> >>>>>>> there is no automatic fallback to the memcpy-based default transport
> >>>>>>> which requires a different layout.
> >>>>>>>
> >>>>>>> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> >>>>>>> ---
> >>>>>>> drivers/ntb/Kconfig | 12 +
> >>>>>>> drivers/ntb/Makefile | 2 +
> >>>>>>> drivers/ntb/ntb_transport_core.c | 15 +-
> >>>>>>> drivers/ntb/ntb_transport_edma.c | 987 +++++++++++++++++++++++++++
> >>>>>>> drivers/ntb/ntb_transport_internal.h | 15 +
> >>>>>>> 5 files changed, 1029 insertions(+), 2 deletions(-)
> >>>>>>> create mode 100644 drivers/ntb/ntb_transport_edma.c
> >>>>>>>
> >>>>>>> diff --git a/drivers/ntb/Kconfig b/drivers/ntb/Kconfig
> >>>>>>> index df16c755b4da..5ba6d0b7f5ba 100644
> >>>>>>> --- a/drivers/ntb/Kconfig
> >>>>>>> +++ b/drivers/ntb/Kconfig
> >>>>>>> @@ -37,4 +37,16 @@ config NTB_TRANSPORT
> >>>>>>>
> >>>>>>> If unsure, say N.
> >>>>>>>
> >>>>>>> +config NTB_TRANSPORT_EDMA
> >>>>>>> + bool "NTB Transport backed by remote eDMA"
> >>>>>>> + depends on NTB_TRANSPORT
> >>>>>>> + depends on PCI
> >>>>>>> + select DMA_ENGINE
> >>>>>>> + select NTB_EDMA
> >>>>>>> + help
> >>>>>>> + Enable a transport backend that uses a remote DesignWare eDMA engine
> >>>>>>> + exposed through a dedicated NTB memory window. The host uses the
> >>>>>>> + endpoint's eDMA engine to move data in both directions.
> >>>>>>> + Say Y here if you intend to use the 'use_remote_edma' module parameter.
> >>>>>>> +
> >>>>>>> endif # NTB
> >>>>>>> diff --git a/drivers/ntb/Makefile b/drivers/ntb/Makefile
> >>>>>>> index 9b66e5fafbc0..b9086b32ecde 100644
> >>>>>>> --- a/drivers/ntb/Makefile
> >>>>>>> +++ b/drivers/ntb/Makefile
> >>>>>>> @@ -6,3 +6,5 @@ ntb-y := core.o
> >>>>>>> ntb-$(CONFIG_NTB_MSI) += msi.o
> >>>>>>>
> >>>>>>> ntb_transport-y := ntb_transport_core.o
> >>>>>>> +ntb_transport-$(CONFIG_NTB_TRANSPORT_EDMA) += ntb_transport_edma.o
> >>>>>>> +ntb_transport-$(CONFIG_NTB_TRANSPORT_EDMA) += hw/edma/ntb_hw_edma.o
> >>>>>>> diff --git a/drivers/ntb/ntb_transport_core.c b/drivers/ntb/ntb_transport_core.c
> >>>>>>> index 40c2548f5930..bd21232f26fe 100644
> >>>>>>> --- a/drivers/ntb/ntb_transport_core.c
> >>>>>>> +++ b/drivers/ntb/ntb_transport_core.c
> >>>>>>> @@ -104,6 +104,12 @@ module_param(use_msi, bool, 0644);
> >>>>>>> MODULE_PARM_DESC(use_msi, "Use MSI interrupts instead of doorbells");
> >>>>>>> #endif
> >>>>>>>
> >>>>>>> +bool use_remote_edma;
> >>>>>>> +#ifdef CONFIG_NTB_TRANSPORT_EDMA
> >>>>>>> +module_param(use_remote_edma, bool, 0644);
> >>>>>>> +MODULE_PARM_DESC(use_remote_edma, "Use remote eDMA mode (when enabled, use_msi is ignored)");
> >>>>>>> +#endif
> >>>>>>
> >>>>>> This seems clunky. Can the ntb_transport_core determine this when the things are called through ntb_transport_edma? Or maybe a set_transport_type can be introduced by the transport itself during allocation?
> >>>>>
> >>>>> Agreed. I plan to drop 'use_remote_edma' and instead,
> >>>>> - add a module parameter: transport_type={"default","edma"} (defaulting to "default"),
> >>>>> - introduce ntb_transport_backend_register() for transports to self-register via
> >>>>> struct ntb_transport_backend { .name, .ops }, and
> >>>>> - have the core select the backend whose .name matches transport_type.
> >>>>>
> >>>>> I think this should keep any non-default transport-specific logic out of
> >>>>> ntb_transport_core, or at least keep it to a minimum, while still allowing
> >>>>> non-default transports (*ntb_transport_edma is the only choice for now
> >>>>> though) to plug in cleanly.
> >>>>>
> >>>>> If you see a cleaner approach, I would appreciate it if you could elaborate
> >>>>> a bit more on your idea.
> >>>>
> >>>
> >>> Thank you for the comment, let me respond inline below.
> >>>
> >>>> Do you think it's flexible enough that we can determine a transport type per 'ntb_transport_mw' or is this an all or nothing type of thing?
> >>>
> >>> At least in the current implementation, the remote eDMA use is an
> >>> all-or-nothing type rather than something that can be selected per
> >>> ntb_transport_mw.
> >>>
> >>> The way remote eDMA consumes MW is quite similar to how ntb_msi uses them
> >>> today. Assuming multiple MWs are available, the last MW is reserved to
> >>> expose the remote eDMA info/register/LL regions to the host by packing all
> >>> of them into a single MW. In that sense, it does not map naturally to a
> >>> per-MW selection model.
> >>>
> >>>> I'm trying to see if we can do away with the module param.
> >>>
> >>> I think it is useful to keep an explicit way for an administrator to choose
> >>> the transport type (default vs edma). Even on platforms where dw-edma is
> >>> available, there can potentially be platform-specific or hard-to-reproduce
> >>> issues (e.g. problems that only show up with certain transfer patterns),
> >>> and having a way to fall back to the long-existing traditional transport can
> >>> be valuable.
> >>>
> >>> That said, I am not opposed to making the default behavior an automatic
> >>> selection, where edma is chosen when it's available and the parameter is
> >>> left unset.
> >>>
> >>>> Or I guess when you probe ntb_netdev, the selection would happen there and thus transport_type would be in ntb_netdev module?
> >>>
> >>> I'm not sure how selecting the transport type at ntb_netdev probe time
> >>> would work in practice, and what additional benefit that would provide.
> >>
> >> So currently ntb_netdev or ntb_transport are not auto-loaded right? They are manually probed by the user. So with the new transport, the user would modprobe ntb_transport_edma.ko. And that would trigger the eDMA transport setup right? With the ntb_transport_core library existing, we should be able to load both ntb_transport_host and ntb_transport_edma at the same time theoretically. And ntb_netdev should be able to select one or the other transport. This is the most versatile scenario. An alternative is that there can be only 1 transport ever loaded, and when ntb_transport_edma is loaded, it just looks like the default transport and netdev functions as it always has without knowing what the underlying transport is. On platforms with multiple NTB ports, it would be nice to have the flexibility of allowing each port to choose between the current transport and the edma transport if the user desires.
> >
> > I was assuming manual load in my previous response. Also in this RFC v3,
> > ntb_transport_edma is not even a standalone module yet (although I do think
> > it should be). At this point, I feel the RFC v3 implementation is still a
> > bit too rough to use as a basis for discussing the ideal long-term design,
> > so I'd like to set it aside for a moment and focus on what the ideal shape
> > could look like.
> >
> > My current thoughts on the ideal structure, after reading your last
> > comment, are as follows:
> >
> > * The existing cpu/dma memcpy-based transport becomes "ntb_transport_host",
> > and the new eDMA-based transport becomes "ntb_transport_edma".
> > * Each transport is a separate kernel module, and each provides its own
> > ntb_client implementation (i.e. each registers independently with the
> > NTB core). In this model, it should be perfectly fine for both modules to
> > be loaded at the same time.
> > * Common pieces (e.g. ntb_transport_bus registration, shared helpers, and
> > the boundary/API exposed to ntb_transport_clients such as ntb_netdev)
> > should live in a shared library module, such as "ntb_transport_core" (or
> > "ntb_transport", naming TBD).
> >
> > Then, for transport type selection:
> >
> > * If we want to switch the transport type (host vs edma) on a per-NTB-port
> > (device) basis, we can rely on the standard driver override framework
> > (ie. driver_override, unbind/bind). To make that work, at first we need
> > to add driver_override support to ntb_bus.
> > * In the case that ntb_netdev wants to explicitly select a transport type,
> > I think it should still be handled via the per-NTB-port driver_override
> > rather than building transport-selection logic into ntb_netdev itself
> > (perhaps with some extension to the boundary API for
> > ntb_transport_clients).
> > * If ntb_transport_host / ntb_transport_edma are built-in modules, a
> > post-boot rebind might be sufficient in most cases. If that's not
> > sufficient, we could also consider providing a kernel parameter to define
> > a boot-time policy. For example, something like:
> > ntb_transport.policy=edma@0000:01:00.0,host@0000:5f:00.0
> >
> > How does that sound? In any case, I am planning to submit RFC v4.
>
> Yup that sounds about what I was thinking. If you are submitting RFC v4 w/o the changes mentioned above, just mention that the progress is moving towards that in the cover letter to remind people. Thanks!
I'll include all the changes in RFC v4.
Thanks,
Koichiro
>
> Any additional thoughts Jon?
>
> >
> > Thanks for the review,
> > Koichiro
>
>
>
* [RFC PATCH v3 27/35] NTB: epf: Provide db_vector_count/db_vector_mask callbacks
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
` (25 preceding siblings ...)
2025-12-17 15:16 ` [RFC PATCH v3 26/35] NTB: ntb_transport: Introduce DW eDMA backed transport mode Koichiro Den
@ 2025-12-17 15:16 ` Koichiro Den
2025-12-17 15:16 ` [RFC PATCH v3 28/35] ntb_netdev: Multi-queue support Koichiro Den
` (8 subsequent siblings)
35 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:16 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
Provide db_vector_count() and db_vector_mask() implementations for both
ntb_hw_epf and pci-epf-vntb so that ntb_transport can map MSI vectors to
doorbell bits. Without them, the upper layer cannot identify which
doorbell vector fired and ends up scheduling rxc_db_work() for all queue
pairs, resulting in a thundering-herd effect when multiple queue pairs
(QPs) are enabled.
With this change, .peer_db_set() must honor the db_bits mask and raise
all requested doorbell interrupts, so update those implementations
accordingly.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/ntb/hw/epf/ntb_hw_epf.c | 47 ++++++++++++-------
drivers/pci/endpoint/functions/pci-epf-vntb.c | 40 +++++++++++++---
2 files changed, 63 insertions(+), 24 deletions(-)
diff --git a/drivers/ntb/hw/epf/ntb_hw_epf.c b/drivers/ntb/hw/epf/ntb_hw_epf.c
index 4ecc6b2177b4..5303a8944019 100644
--- a/drivers/ntb/hw/epf/ntb_hw_epf.c
+++ b/drivers/ntb/hw/epf/ntb_hw_epf.c
@@ -375,7 +375,7 @@ static int ntb_epf_init_isr(struct ntb_epf_dev *ndev, int msi_min, int msi_max)
}
}
- ndev->db_count = irq;
+ ndev->db_count = irq - 1;
ret = ntb_epf_send_command(ndev, CMD_CONFIGURE_DOORBELL,
argument | irq);
@@ -409,6 +409,22 @@ static u64 ntb_epf_db_valid_mask(struct ntb_dev *ntb)
return ntb_ndev(ntb)->db_valid_mask;
}
+static int ntb_epf_db_vector_count(struct ntb_dev *ntb)
+{
+ return ntb_ndev(ntb)->db_count;
+}
+
+static u64 ntb_epf_db_vector_mask(struct ntb_dev *ntb, int db_vector)
+{
+ struct ntb_epf_dev *ndev = ntb_ndev(ntb);
+
+ db_vector--; /* vector 0 is reserved for link events */
+ if (db_vector < 0 || db_vector >= ndev->db_count)
+ return 0;
+
+ return ndev->db_valid_mask & BIT_ULL(db_vector);
+}
+
static int ntb_epf_db_set_mask(struct ntb_dev *ntb, u64 db_bits)
{
return 0;
@@ -492,26 +508,21 @@ static int ntb_epf_peer_mw_get_addr(struct ntb_dev *ntb, int idx,
static int ntb_epf_peer_db_set(struct ntb_dev *ntb, u64 db_bits)
{
struct ntb_epf_dev *ndev = ntb_ndev(ntb);
- u32 interrupt_num = ffs(db_bits) + 1;
- struct device *dev = ndev->dev;
+ u32 interrupt_num;
u32 db_entry_size;
u32 db_offset;
u32 db_data;
-
- if (interrupt_num >= ndev->db_count) {
- dev_err(dev, "DB interrupt %d greater than Max Supported %d\n",
- interrupt_num, ndev->db_count);
- return -EINVAL;
- }
+ int i;
db_entry_size = readl(ndev->ctrl_reg + NTB_EPF_DB_ENTRY_SIZE);
- db_data = readl(ndev->ctrl_reg + NTB_EPF_DB_DATA(interrupt_num));
- db_offset = readl(ndev->ctrl_reg + NTB_EPF_DB_OFFSET(interrupt_num));
-
- writel(db_data, ndev->db_reg + (db_entry_size * interrupt_num) +
- db_offset);
-
+ for_each_set_bit(i, (unsigned long *)&db_bits, ndev->db_count) {
+ interrupt_num = i + 1;
+ db_data = readl(ndev->ctrl_reg + NTB_EPF_DB_DATA(interrupt_num));
+ db_offset = readl(ndev->ctrl_reg + NTB_EPF_DB_OFFSET(interrupt_num));
+ writel(db_data, ndev->db_reg + (db_entry_size * interrupt_num) +
+ db_offset);
+ }
return 0;
}
@@ -541,6 +552,8 @@ static const struct ntb_dev_ops ntb_epf_ops = {
.spad_count = ntb_epf_spad_count,
.peer_mw_count = ntb_epf_peer_mw_count,
.db_valid_mask = ntb_epf_db_valid_mask,
+ .db_vector_count = ntb_epf_db_vector_count,
+ .db_vector_mask = ntb_epf_db_vector_mask,
.db_set_mask = ntb_epf_db_set_mask,
.mw_set_trans = ntb_epf_mw_set_trans,
.mw_clear_trans = ntb_epf_mw_clear_trans,
@@ -591,8 +604,8 @@ static int ntb_epf_init_dev(struct ntb_epf_dev *ndev)
int ret;
/* One Link interrupt and rest doorbell interrupt */
- ret = ntb_epf_init_isr(ndev, NTB_EPF_MIN_DB_COUNT + NTB_EPF_IRQ_RESERVE,
- NTB_EPF_MAX_DB_COUNT + NTB_EPF_IRQ_RESERVE);
+ ret = ntb_epf_init_isr(ndev, NTB_EPF_MIN_DB_COUNT + 1 + NTB_EPF_IRQ_RESERVE,
+ NTB_EPF_MAX_DB_COUNT + 1 + NTB_EPF_IRQ_RESERVE);
if (ret) {
dev_err(dev, "Failed to init ISR\n");
return ret;
diff --git a/drivers/pci/endpoint/functions/pci-epf-vntb.c b/drivers/pci/endpoint/functions/pci-epf-vntb.c
index c89f5b0775fa..c47186fe4f75 100644
--- a/drivers/pci/endpoint/functions/pci-epf-vntb.c
+++ b/drivers/pci/endpoint/functions/pci-epf-vntb.c
@@ -1384,6 +1384,22 @@ static u64 vntb_epf_db_valid_mask(struct ntb_dev *ntb)
return BIT_ULL(ntb_ndev(ntb)->db_count) - 1;
}
+static int vntb_epf_db_vector_count(struct ntb_dev *ntb)
+{
+ return ntb_ndev(ntb)->db_count;
+}
+
+static u64 vntb_epf_db_vector_mask(struct ntb_dev *ntb, int db_vector)
+{
+ struct epf_ntb *ndev = ntb_ndev(ntb);
+
+ db_vector--; /* vector 0 is reserved for link events */
+ if (db_vector < 0 || db_vector >= ndev->db_count)
+ return 0;
+
+ return BIT_ULL(db_vector);
+}
+
static int vntb_epf_db_set_mask(struct ntb_dev *ntb, u64 db_bits)
{
return 0;
@@ -1509,20 +1525,28 @@ static int vntb_epf_peer_spad_write(struct ntb_dev *ndev, int pidx, int idx, u32
static int vntb_epf_peer_db_set(struct ntb_dev *ndev, u64 db_bits)
{
- u32 interrupt_num = ffs(db_bits) + 1;
struct epf_ntb *ntb = ntb_ndev(ndev);
u8 func_no, vfunc_no;
- int ret;
+ u64 failed = 0;
+ int i;
func_no = ntb->epf->func_no;
vfunc_no = ntb->epf->vfunc_no;
- ret = pci_epc_raise_irq(ntb->epf->epc, func_no, vfunc_no,
- PCI_IRQ_MSI, interrupt_num + 1);
- if (ret)
- dev_err(&ntb->ntb.dev, "Failed to raise IRQ\n");
+ for_each_set_bit(i, (unsigned long *)&db_bits, ntb->db_count) {
+ /*
+ * DB bit i is MSI interrupt (i + 2).
+ * Vector 0 is used for link events and MSI vectors are
+ * 1-based for pci_epc_raise_irq().
+ */
+ if (pci_epc_raise_irq(ntb->epf->epc, func_no, vfunc_no,
+ PCI_IRQ_MSI, i + 2))
+ failed |= BIT_ULL(i);
+ }
+ if (failed)
+ dev_err(&ntb->ntb.dev, "Failed to raise IRQ (0x%llx)\n", failed);
- return ret;
+ return failed ? -EIO : 0;
}
static u64 vntb_epf_db_read(struct ntb_dev *ndev)
@@ -1596,6 +1620,8 @@ static const struct ntb_dev_ops vntb_epf_ops = {
.spad_count = vntb_epf_spad_count,
.peer_mw_count = vntb_epf_peer_mw_count,
.db_valid_mask = vntb_epf_db_valid_mask,
+ .db_vector_count = vntb_epf_db_vector_count,
+ .db_vector_mask = vntb_epf_db_vector_mask,
.db_set_mask = vntb_epf_db_set_mask,
.mw_set_trans = vntb_epf_mw_set_trans,
.mw_clear_trans = vntb_epf_mw_clear_trans,
--
2.51.0
* [RFC PATCH v3 28/35] ntb_netdev: Multi-queue support
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
` (26 preceding siblings ...)
2025-12-17 15:16 ` [RFC PATCH v3 27/35] NTB: epf: Provide db_vector_count/db_vector_mask callbacks Koichiro Den
@ 2025-12-17 15:16 ` Koichiro Den
2025-12-17 15:16 ` [RFC PATCH v3 29/35] NTB: epf: Add per-SoC quirk to cap MRRS for DWC eDMA (128B for R-Car) Koichiro Den
` (7 subsequent siblings)
35 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:16 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
In eDMA-backed mode (use_remote_edma=1), ntb_transport can scale
throughput across multiple queue pairs without being constrained by
scarce PCI memory window space used for data-plane buffers. It contrasts
with the default backend mode, where even with a single queue pair, only
up to 15 in-flight descriptors fit in a 1 MiB MW.
Teach ntb_netdev to allocate multiple ntb_transport queue pairs and
expose them as a multi-queue net_device.
With this patch, up to N queue pairs are created, where N is chosen as
follows:
- By default, N is num_online_cpus(), to give each CPU its own queue.
- If the ntb_num_queues module parameter is non-zero, it overrides the
default and requests that many queues.
- In both cases the requested value is capped at a fixed upper bound
to avoid unbounded allocations, and by the number of queue pairs
actually available from ntb_transport.
If only one queue pair can be created (or ntb_num_queues=1 is set), the
driver effectively falls back to the previous single-queue behaviour.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/net/ntb_netdev.c | 341 ++++++++++++++++++++++++++++-----------
1 file changed, 243 insertions(+), 98 deletions(-)
diff --git a/drivers/net/ntb_netdev.c b/drivers/net/ntb_netdev.c
index fbeae05817e9..7aeca35b46c5 100644
--- a/drivers/net/ntb_netdev.c
+++ b/drivers/net/ntb_netdev.c
@@ -53,6 +53,8 @@
#include <linux/pci.h>
#include <linux/ntb.h>
#include <linux/ntb_transport.h>
+#include <linux/cpumask.h>
+#include <linux/slab.h>
#define NTB_NETDEV_VER "0.7"
@@ -70,26 +72,84 @@ static unsigned int tx_start = 10;
/* Number of descriptors still available before stop upper layer tx */
static unsigned int tx_stop = 5;
+/*
+ * Upper bound on how many queue pairs we will try to create even if
+ * ntb_num_queues or num_online_cpus() is very large. This is an
+ * arbitrary safety cap to avoid unbounded allocations.
+ */
+#define NTB_NETDEV_MAX_QUEUES 64
+
+/*
+ * ntb_num_queues == 0 (default) means:
+ * - use num_online_cpus() as the desired queue count, capped by
+ * NTB_NETDEV_MAX_QUEUES.
+ * ntb_num_queues > 0:
+ * - try to create exactly ntb_num_queues queue pairs (again capped
+ * by NTB_NETDEV_MAX_QUEUES), but fall back to the number of queue
+ * pairs actually available from ntb_transport.
+ */
+static unsigned int ntb_num_queues;
+module_param(ntb_num_queues, uint, 0644);
+MODULE_PARM_DESC(ntb_num_queues,
+ "Number of NTB netdev queue pairs to use (0 = per-CPU)");
+
+struct ntb_netdev;
+
+struct ntb_netdev_queue {
+ struct ntb_netdev *ntdev;
+ struct ntb_transport_qp *qp;
+ struct timer_list tx_timer;
+ u16 qid;
+};
+
struct ntb_netdev {
struct pci_dev *pdev;
struct net_device *ndev;
- struct ntb_transport_qp *qp;
- struct timer_list tx_timer;
+ unsigned int num_queues;
+ struct ntb_netdev_queue *queues;
};
#define NTB_TX_TIMEOUT_MS 1000
#define NTB_RXQ_SIZE 100
+static unsigned int ntb_netdev_default_queues(void)
+{
+ unsigned int n;
+
+ if (ntb_num_queues)
+ n = ntb_num_queues;
+ else
+ n = num_online_cpus();
+
+ if (!n)
+ n = 1;
+
+ if (n > NTB_NETDEV_MAX_QUEUES)
+ n = NTB_NETDEV_MAX_QUEUES;
+
+ return n;
+}
+
static void ntb_netdev_event_handler(void *data, int link_is_up)
{
- struct net_device *ndev = data;
- struct ntb_netdev *dev = netdev_priv(ndev);
+ struct ntb_netdev_queue *q = data;
+ struct ntb_netdev *dev = q->ntdev;
+ struct net_device *ndev = dev->ndev;
+ bool any_up = false;
+ unsigned int i;
- netdev_dbg(ndev, "Event %x, Link %x\n", link_is_up,
- ntb_transport_link_query(dev->qp));
+ netdev_dbg(ndev, "Event %x, Link %x, qp %u\n", link_is_up,
+ ntb_transport_link_query(q->qp), q->qid);
if (link_is_up) {
- if (ntb_transport_link_query(dev->qp))
+ for (i = 0; i < dev->num_queues; i++) {
+ if (ntb_transport_link_query(dev->queues[i].qp)) {
+ any_up = true;
+ break;
+ }
+ }
+
+ if (any_up)
netif_carrier_on(ndev);
} else {
netif_carrier_off(ndev);
@@ -99,7 +159,9 @@ static void ntb_netdev_event_handler(void *data, int link_is_up)
static void ntb_netdev_rx_handler(struct ntb_transport_qp *qp, void *qp_data,
void *data, int len)
{
- struct net_device *ndev = qp_data;
+ struct ntb_netdev_queue *q = qp_data;
+ struct ntb_netdev *dev = q->ntdev;
+ struct net_device *ndev = dev->ndev;
struct sk_buff *skb;
int rc;
@@ -135,7 +197,8 @@ static void ntb_netdev_rx_handler(struct ntb_transport_qp *qp, void *qp_data,
}
enqueue_again:
- rc = ntb_transport_rx_enqueue(qp, skb, skb->data, ndev->mtu + ETH_HLEN);
+ rc = ntb_transport_rx_enqueue(q->qp, skb, skb->data,
+ ndev->mtu + ETH_HLEN);
if (rc) {
dev_kfree_skb_any(skb);
ndev->stats.rx_errors++;
@@ -143,42 +206,37 @@ static void ntb_netdev_rx_handler(struct ntb_transport_qp *qp, void *qp_data,
}
}
-static int __ntb_netdev_maybe_stop_tx(struct net_device *netdev,
- struct ntb_transport_qp *qp, int size)
+static int ntb_netdev_maybe_stop_tx(struct ntb_netdev_queue *q, int size)
{
- struct ntb_netdev *dev = netdev_priv(netdev);
+ struct net_device *ndev = q->ntdev->ndev;
+
+ if (ntb_transport_tx_free_entry(q->qp) >= size)
+ return 0;
+
+ netif_stop_subqueue(ndev, q->qid);
- netif_stop_queue(netdev);
/* Make sure to see the latest value of ntb_transport_tx_free_entry()
* since the queue was last started.
*/
smp_mb();
- if (likely(ntb_transport_tx_free_entry(qp) < size)) {
- mod_timer(&dev->tx_timer, jiffies + usecs_to_jiffies(tx_time));
+ if (likely(ntb_transport_tx_free_entry(q->qp) < size)) {
+ mod_timer(&q->tx_timer, jiffies + usecs_to_jiffies(tx_time));
return -EBUSY;
}
- netif_start_queue(netdev);
- return 0;
-}
-
-static int ntb_netdev_maybe_stop_tx(struct net_device *ndev,
- struct ntb_transport_qp *qp, int size)
-{
- if (netif_queue_stopped(ndev) ||
- (ntb_transport_tx_free_entry(qp) >= size))
- return 0;
+ netif_wake_subqueue(ndev, q->qid);
- return __ntb_netdev_maybe_stop_tx(ndev, qp, size);
+ return 0;
}
static void ntb_netdev_tx_handler(struct ntb_transport_qp *qp, void *qp_data,
void *data, int len)
{
- struct net_device *ndev = qp_data;
+ struct ntb_netdev_queue *q = qp_data;
+ struct ntb_netdev *dev = q->ntdev;
+ struct net_device *ndev = dev->ndev;
struct sk_buff *skb;
- struct ntb_netdev *dev = netdev_priv(ndev);
skb = data;
if (!skb || !ndev)
@@ -194,13 +252,12 @@ static void ntb_netdev_tx_handler(struct ntb_transport_qp *qp, void *qp_data,
dev_kfree_skb_any(skb);
- if (ntb_transport_tx_free_entry(dev->qp) >= tx_start) {
+ if (ntb_transport_tx_free_entry(qp) >= tx_start) {
/* Make sure anybody stopping the queue after this sees the new
* value of ntb_transport_tx_free_entry()
*/
smp_mb();
- if (netif_queue_stopped(ndev))
- netif_wake_queue(ndev);
+ netif_wake_subqueue(ndev, q->qid);
}
}
@@ -208,16 +265,26 @@ static netdev_tx_t ntb_netdev_start_xmit(struct sk_buff *skb,
struct net_device *ndev)
{
struct ntb_netdev *dev = netdev_priv(ndev);
+ u16 qid = skb_get_queue_mapping(skb);
+ struct ntb_netdev_queue *q;
int rc;
- ntb_netdev_maybe_stop_tx(ndev, dev->qp, tx_stop);
+ if (unlikely(!dev->num_queues))
+ goto err;
+
+ if (unlikely(qid >= dev->num_queues))
+ qid = qid % dev->num_queues;
- rc = ntb_transport_tx_enqueue(dev->qp, skb, skb->data, skb->len);
+ q = &dev->queues[qid];
+
+ ntb_netdev_maybe_stop_tx(q, tx_stop);
+
+ rc = ntb_transport_tx_enqueue(q->qp, skb, skb->data, skb->len);
if (rc)
goto err;
/* check for next submit */
- ntb_netdev_maybe_stop_tx(ndev, dev->qp, tx_stop);
+ ntb_netdev_maybe_stop_tx(q, tx_stop);
return NETDEV_TX_OK;
@@ -229,80 +296,103 @@ static netdev_tx_t ntb_netdev_start_xmit(struct sk_buff *skb,
static void ntb_netdev_tx_timer(struct timer_list *t)
{
- struct ntb_netdev *dev = timer_container_of(dev, t, tx_timer);
+ struct ntb_netdev_queue *q = container_of(t, struct ntb_netdev_queue, tx_timer);
+ struct ntb_netdev *dev = q->ntdev;
struct net_device *ndev = dev->ndev;
- if (ntb_transport_tx_free_entry(dev->qp) < tx_stop) {
- mod_timer(&dev->tx_timer, jiffies + usecs_to_jiffies(tx_time));
+ if (ntb_transport_tx_free_entry(q->qp) < tx_stop) {
+ mod_timer(&q->tx_timer, jiffies + usecs_to_jiffies(tx_time));
} else {
- /* Make sure anybody stopping the queue after this sees the new
+ /*
+ * Make sure anybody stopping the queue after this sees the new
* value of ntb_transport_tx_free_entry()
*/
smp_mb();
- if (netif_queue_stopped(ndev))
- netif_wake_queue(ndev);
+ netif_wake_subqueue(ndev, q->qid);
}
}
static int ntb_netdev_open(struct net_device *ndev)
{
struct ntb_netdev *dev = netdev_priv(ndev);
+ struct ntb_netdev_queue *queue;
struct sk_buff *skb;
- int rc, i, len;
-
- /* Add some empty rx bufs */
- for (i = 0; i < NTB_RXQ_SIZE; i++) {
- skb = netdev_alloc_skb(ndev, ndev->mtu + ETH_HLEN);
- if (!skb) {
- rc = -ENOMEM;
- goto err;
- }
+ int rc = 0, i, len;
+ unsigned int q;
- rc = ntb_transport_rx_enqueue(dev->qp, skb, skb->data,
- ndev->mtu + ETH_HLEN);
- if (rc) {
- dev_kfree_skb(skb);
- goto err;
+ /* Add some empty rx bufs for each queue */
+ for (q = 0; q < dev->num_queues; q++) {
+ queue = &dev->queues[q];
+
+ for (i = 0; i < NTB_RXQ_SIZE; i++) {
+ skb = netdev_alloc_skb(ndev, ndev->mtu + ETH_HLEN);
+ if (!skb) {
+ rc = -ENOMEM;
+ goto err;
+ }
+
+ rc = ntb_transport_rx_enqueue(queue->qp, skb, skb->data,
+ ndev->mtu + ETH_HLEN);
+ if (rc) {
+ dev_kfree_skb(skb);
+ goto err;
+ }
}
- }
- timer_setup(&dev->tx_timer, ntb_netdev_tx_timer, 0);
+ timer_setup(&queue->tx_timer, ntb_netdev_tx_timer, 0);
+ }
netif_carrier_off(ndev);
- ntb_transport_link_up(dev->qp);
- netif_start_queue(ndev);
+
+ for (q = 0; q < dev->num_queues; q++)
+ ntb_transport_link_up(dev->queues[q].qp);
+
+ netif_tx_start_all_queues(ndev);
return 0;
err:
- while ((skb = ntb_transport_rx_remove(dev->qp, &len)))
- dev_kfree_skb(skb);
+ for (q = 0; q < dev->num_queues; q++) {
+ queue = &dev->queues[q];
+
+ while ((skb = ntb_transport_rx_remove(queue->qp, &len)))
+ dev_kfree_skb(skb);
+ }
return rc;
}
static int ntb_netdev_close(struct net_device *ndev)
{
struct ntb_netdev *dev = netdev_priv(ndev);
+ struct ntb_netdev_queue *queue;
struct sk_buff *skb;
+ unsigned int q;
int len;
- ntb_transport_link_down(dev->qp);
+ netif_tx_stop_all_queues(ndev);
+
+ for (q = 0; q < dev->num_queues; q++) {
+ queue = &dev->queues[q];
- while ((skb = ntb_transport_rx_remove(dev->qp, &len)))
- dev_kfree_skb(skb);
+ ntb_transport_link_down(queue->qp);
- timer_delete_sync(&dev->tx_timer);
+ while ((skb = ntb_transport_rx_remove(queue->qp, &len)))
+ dev_kfree_skb(skb);
+ timer_delete_sync(&queue->tx_timer);
+ }
return 0;
}
static int ntb_netdev_change_mtu(struct net_device *ndev, int new_mtu)
{
struct ntb_netdev *dev = netdev_priv(ndev);
+ struct ntb_netdev_queue *queue;
struct sk_buff *skb;
- int len, rc;
+ unsigned int q, i;
+ int len, rc = 0;
- if (new_mtu > ntb_transport_max_size(dev->qp) - ETH_HLEN)
+ if (new_mtu > ntb_transport_max_size(dev->queues[0].qp) - ETH_HLEN)
return -EINVAL;
if (!netif_running(ndev)) {
@@ -311,41 +401,54 @@ static int ntb_netdev_change_mtu(struct net_device *ndev, int new_mtu)
}
/* Bring down the link and dispose of posted rx entries */
- ntb_transport_link_down(dev->qp);
+ for (q = 0; q < dev->num_queues; q++)
+ ntb_transport_link_down(dev->queues[q].qp);
if (ndev->mtu < new_mtu) {
- int i;
-
- for (i = 0; (skb = ntb_transport_rx_remove(dev->qp, &len)); i++)
- dev_kfree_skb(skb);
+ for (q = 0; q < dev->num_queues; q++) {
+ queue = &dev->queues[q];
- for (; i; i--) {
- skb = netdev_alloc_skb(ndev, new_mtu + ETH_HLEN);
- if (!skb) {
- rc = -ENOMEM;
- goto err;
- }
-
- rc = ntb_transport_rx_enqueue(dev->qp, skb, skb->data,
- new_mtu + ETH_HLEN);
- if (rc) {
+ for (i = 0;
+ (skb = ntb_transport_rx_remove(queue->qp, &len));
+ i++)
dev_kfree_skb(skb);
- goto err;
+
+ for (; i; i--) {
+ skb = netdev_alloc_skb(ndev,
+ new_mtu + ETH_HLEN);
+ if (!skb) {
+ rc = -ENOMEM;
+ goto err;
+ }
+
+ rc = ntb_transport_rx_enqueue(queue->qp, skb,
+ skb->data,
+ new_mtu +
+ ETH_HLEN);
+ if (rc) {
+ dev_kfree_skb(skb);
+ goto err;
+ }
}
}
}
WRITE_ONCE(ndev->mtu, new_mtu);
- ntb_transport_link_up(dev->qp);
+ for (q = 0; q < dev->num_queues; q++)
+ ntb_transport_link_up(dev->queues[q].qp);
return 0;
err:
- ntb_transport_link_down(dev->qp);
+ for (q = 0; q < dev->num_queues; q++) {
+ struct ntb_netdev_queue *queue = &dev->queues[q];
+
+ ntb_transport_link_down(queue->qp);
- while ((skb = ntb_transport_rx_remove(dev->qp, &len)))
- dev_kfree_skb(skb);
+ while ((skb = ntb_transport_rx_remove(queue->qp, &len)))
+ dev_kfree_skb(skb);
+ }
netdev_err(ndev, "Error changing MTU, device inoperable\n");
return rc;
@@ -404,6 +507,7 @@ static int ntb_netdev_probe(struct device *client_dev)
struct net_device *ndev;
struct pci_dev *pdev;
struct ntb_netdev *dev;
+ unsigned int q, desired_queues;
int rc;
ntb = dev_ntb(client_dev->parent);
@@ -411,7 +515,9 @@ static int ntb_netdev_probe(struct device *client_dev)
if (!pdev)
return -ENODEV;
- ndev = alloc_etherdev(sizeof(*dev));
+ desired_queues = ntb_netdev_default_queues();
+
+ ndev = alloc_etherdev_mq(sizeof(*dev), desired_queues);
if (!ndev)
return -ENOMEM;
@@ -420,6 +526,15 @@ static int ntb_netdev_probe(struct device *client_dev)
dev = netdev_priv(ndev);
dev->ndev = ndev;
dev->pdev = pdev;
+ dev->num_queues = 0;
+
+ dev->queues = kcalloc(desired_queues, sizeof(*dev->queues),
+ GFP_KERNEL);
+ if (!dev->queues) {
+ rc = -ENOMEM;
+ goto err_free_netdev;
+ }
+
ndev->features = NETIF_F_HIGHDMA;
ndev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
@@ -436,26 +551,51 @@ static int ntb_netdev_probe(struct device *client_dev)
ndev->min_mtu = 0;
ndev->max_mtu = ETH_MAX_MTU;
- dev->qp = ntb_transport_create_queue(ndev, client_dev,
- &ntb_netdev_handlers);
- if (!dev->qp) {
+ for (q = 0; q < desired_queues; q++) {
+ struct ntb_netdev_queue *queue = &dev->queues[q];
+
+ queue->ntdev = dev;
+ queue->qid = q;
+ queue->qp = ntb_transport_create_queue(queue, client_dev,
+ &ntb_netdev_handlers);
+ if (!queue->qp)
+ break;
+
+ dev->num_queues++;
+ }
+
+ if (!dev->num_queues) {
rc = -EIO;
- goto err;
+ goto err_free_queues;
}
- ndev->mtu = ntb_transport_max_size(dev->qp) - ETH_HLEN;
+ rc = netif_set_real_num_tx_queues(ndev, dev->num_queues);
+ if (rc)
+ goto err_free_qps;
+
+ rc = netif_set_real_num_rx_queues(ndev, dev->num_queues);
+ if (rc)
+ goto err_free_qps;
+
+ ndev->mtu = ntb_transport_max_size(dev->queues[0].qp) - ETH_HLEN;
rc = register_netdev(ndev);
if (rc)
- goto err1;
+ goto err_free_qps;
dev_set_drvdata(client_dev, ndev);
- dev_info(&pdev->dev, "%s created\n", ndev->name);
+ dev_info(&pdev->dev, "%s created with %u queue pairs\n",
+ ndev->name, dev->num_queues);
return 0;
-err1:
- ntb_transport_free_queue(dev->qp);
-err:
+err_free_qps:
+ for (q = 0; q < dev->num_queues; q++)
+ ntb_transport_free_queue(dev->queues[q].qp);
+
+err_free_queues:
+ kfree(dev->queues);
+
+err_free_netdev:
free_netdev(ndev);
return rc;
}
@@ -464,9 +604,14 @@ static void ntb_netdev_remove(struct device *client_dev)
{
struct net_device *ndev = dev_get_drvdata(client_dev);
struct ntb_netdev *dev = netdev_priv(ndev);
+ unsigned int q;
+
unregister_netdev(ndev);
- ntb_transport_free_queue(dev->qp);
+ for (q = 0; q < dev->num_queues; q++)
+ ntb_transport_free_queue(dev->queues[q].qp);
+
+ kfree(dev->queues);
free_netdev(ndev);
}
--
2.51.0
^ permalink raw reply related [flat|nested] 61+ messages in thread* [RFC PATCH v3 29/35] NTB: epf: Add per-SoC quirk to cap MRRS for DWC eDMA (128B for R-Car)
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
` (27 preceding siblings ...)
2025-12-17 15:16 ` [RFC PATCH v3 28/35] ntb_netdev: Multi-queue support Koichiro Den
@ 2025-12-17 15:16 ` Koichiro Den
2025-12-17 15:16 ` [RFC PATCH v3 30/35] iommu: ipmmu-vmsa: Add PCIe ch0 to devices_allowlist Koichiro Den
` (6 subsequent siblings)
35 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:16 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
Some R-Car platforms using Synopsys DesignWare PCIe with the integrated
eDMA exhibit reproducible payload corruption in RC->EP remote DMA read
traffic whenever the endpoint issues 256-byte Memory Read (MRd) TLPs.
The eDMA injects multiple MRd requests of size less than or equal to
min(MRRS, MPS), so constraining the endpoint's MRd request size removes
256-byte MRd TLPs and avoids the issue. This change adds a per-SoC knob
in the ntb_hw_epf driver and sets MRRS=128 on R-Car.
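To make the effect of the cap concrete, the MRd sizing rule stated above can
be expressed as a one-line helper. This is a plain user-space C sketch, not
kernel code; the helper name is invented for illustration:

```c
#include <assert.h>

/*
 * Largest MRd request the endpoint's eDMA will issue: bounded by both
 * Max_Read_Request_Size (MRRS) and Max_Payload_Size (MPS), per the
 * behaviour described above.
 */
static int max_mrd_bytes(int mrrs, int mps)
{
	return mrrs < mps ? mrrs : mps;
}
```

With MRRS capped to 128, min(MRRS, MPS) can never reach 256, so the
endpoint stops emitting the 256-byte MRd TLPs that trigger the corruption.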
We intentionally do not change the endpoint's MPS. Per PCIe Base
Specification, MPS limits the payload size of TLPs with data transmitted
by the Function, while Max_Read_Request_Size limits the size of read
requests produced by the Function as a Requester. Limiting MRRS is
sufficient to constrain MRd Byte Count, while lowering MPS would also
throttle unrelated traffic (e.g. endpoint-originated Posted Writes and
Completions with Data) without being necessary for this fix.
This quirk is scoped to the affected endpoint only and can be removed
once the underlying issue is resolved in the controller/IP.
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/ntb/hw/epf/ntb_hw_epf.c | 66 +++++++++++++++++++++++++++++----
1 file changed, 58 insertions(+), 8 deletions(-)
diff --git a/drivers/ntb/hw/epf/ntb_hw_epf.c b/drivers/ntb/hw/epf/ntb_hw_epf.c
index 5303a8944019..efe540a8c734 100644
--- a/drivers/ntb/hw/epf/ntb_hw_epf.c
+++ b/drivers/ntb/hw/epf/ntb_hw_epf.c
@@ -74,6 +74,12 @@ enum epf_ntb_bar {
NTB_BAR_NUM,
};
+struct ntb_epf_soc_data {
+ const enum pci_barno *barno_map;
+ /* non-zero to override MRRS for this SoC */
+ int force_mrrs;
+};
+
#define NTB_EPF_MAX_MW_COUNT (NTB_BAR_NUM - BAR_MW1)
struct ntb_epf_dev {
@@ -624,11 +630,12 @@ static int ntb_epf_init_dev(struct ntb_epf_dev *ndev)
}
static int ntb_epf_init_pci(struct ntb_epf_dev *ndev,
- struct pci_dev *pdev)
+ struct pci_dev *pdev,
+ const struct ntb_epf_soc_data *soc)
{
struct device *dev = ndev->dev;
size_t spad_sz, spad_off;
- int ret;
+ int ret, cur;
pci_set_drvdata(pdev, ndev);
@@ -646,6 +653,17 @@ static int ntb_epf_init_pci(struct ntb_epf_dev *ndev,
pci_set_master(pdev);
+ if (soc && pci_is_pcie(pdev) && soc->force_mrrs) {
+ cur = pcie_get_readrq(pdev);
+ ret = pcie_set_readrq(pdev, soc->force_mrrs);
+ if (ret)
+ dev_warn(&pdev->dev, "failed to set MRRS=%d: %d\n",
+ soc->force_mrrs, ret);
+ else
+ dev_info(&pdev->dev, "capped MRRS: %d->%d for ntb-epf\n",
+ cur, soc->force_mrrs);
+ }
+
ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
if (ret) {
ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
@@ -720,6 +738,7 @@ static void ntb_epf_cleanup_isr(struct ntb_epf_dev *ndev)
static int ntb_epf_pci_probe(struct pci_dev *pdev,
const struct pci_device_id *id)
{
+ const struct ntb_epf_soc_data *soc = (const void *)id->driver_data;
struct device *dev = &pdev->dev;
struct ntb_epf_dev *ndev;
int ret;
@@ -731,16 +750,16 @@ static int ntb_epf_pci_probe(struct pci_dev *pdev,
if (!ndev)
return -ENOMEM;
- ndev->barno_map = (const enum pci_barno *)id->driver_data;
- if (!ndev->barno_map)
+ if (!soc || !soc->barno_map)
return -EINVAL;
+ ndev->barno_map = soc->barno_map;
ndev->dev = dev;
ntb_epf_init_struct(ndev, pdev);
mutex_init(&ndev->cmd_lock);
- ret = ntb_epf_init_pci(ndev, pdev);
+ ret = ntb_epf_init_pci(ndev, pdev, soc);
if (ret) {
dev_err(dev, "Failed to init PCI\n");
return ret;
@@ -812,21 +831,52 @@ static const enum pci_barno rcar_barno[NTB_BAR_NUM] = {
[BAR_MW4] = NO_BAR,
};
+static const struct ntb_epf_soc_data j721e_soc = {
+ .barno_map = j721e_map,
+};
+
+static const struct ntb_epf_soc_data mx8_soc = {
+ .barno_map = mx8_map,
+};
+
+static const struct ntb_epf_soc_data rcar_soc = {
+ .barno_map = rcar_barno,
+ /*
+ * On some R-Car platforms using the Synopsys DWC PCIe + eDMA we
+ * observe data corruption on RC->EP Remote DMA Read paths whenever
+ * the EP issues large MRd requests. The corruption consistently
+ * hits the tail of each 256-byte segment (e.g. offsets
+ * 0x00E0..0x00FF within a 256B block, and again at 0x01E0..0x01FF
+ * for larger transfers).
+ *
+ * The DMA injects multiple MRd requests of size less than or equal
+ * to the min(MRRS, MPS) into the outbound request path. By
+ * lowering MRRS to 128 we prevent 256B MRd TLPs from being
+ * generated and avoid the issue on the affected hardware. We
+ * intentionally keep MPS unchanged and scope this quirk to this
+ * endpoint to avoid impacting unrelated devices.
+ *
+ * Remove this once the issue is resolved (maybe controller/IP
+ * level) or a more preferable workaround becomes available.
+ */
+ .force_mrrs = 128,
+};
+
static const struct pci_device_id ntb_epf_pci_tbl[] = {
{
PCI_DEVICE(PCI_VENDOR_ID_TI, PCI_DEVICE_ID_TI_J721E),
.class = PCI_CLASS_MEMORY_RAM << 8, .class_mask = 0xffff00,
- .driver_data = (kernel_ulong_t)j721e_map,
+ .driver_data = (kernel_ulong_t)&j721e_soc,
},
{
PCI_DEVICE(PCI_VENDOR_ID_FREESCALE, 0x0809),
.class = PCI_CLASS_MEMORY_RAM << 8, .class_mask = 0xffff00,
- .driver_data = (kernel_ulong_t)mx8_map,
+ .driver_data = (kernel_ulong_t)&mx8_soc,
},
{
PCI_DEVICE(PCI_VENDOR_ID_RENESAS, 0x0030),
.class = PCI_CLASS_MEMORY_RAM << 8, .class_mask = 0xffff00,
- .driver_data = (kernel_ulong_t)rcar_barno,
+ .driver_data = (kernel_ulong_t)&rcar_soc,
},
{ },
};
--
2.51.0
^ permalink raw reply related [flat|nested] 61+ messages in thread* [RFC PATCH v3 30/35] iommu: ipmmu-vmsa: Add PCIe ch0 to devices_allowlist
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
` (28 preceding siblings ...)
2025-12-17 15:16 ` [RFC PATCH v3 29/35] NTB: epf: Add per-SoC quirk to cap MRRS for DWC eDMA (128B for R-Car) Koichiro Den
@ 2025-12-17 15:16 ` Koichiro Den
2025-12-17 15:16 ` [RFC PATCH v3 31/35] iommu: ipmmu-vmsa: Add support for reserved regions Koichiro Den
` (5 subsequent siblings)
35 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:16 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
Add the PCIe ch0 to the ipmmu-vmsa devices_allowlist so that traffic
routed through this PCIe instance can be translated by the IOMMU.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/iommu/ipmmu-vmsa.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index ca848288dbf2..724d67ad5ef2 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -743,7 +743,9 @@ static const char * const devices_allowlist[] = {
"ee100000.mmc",
"ee120000.mmc",
"ee140000.mmc",
- "ee160000.mmc"
+ "ee160000.mmc",
+ "e65d0000.pcie",
+ "e65d0000.pcie-ep",
};
static bool ipmmu_device_is_allowed(struct device *dev)
--
2.51.0
^ permalink raw reply related [flat|nested] 61+ messages in thread* [RFC PATCH v3 31/35] iommu: ipmmu-vmsa: Add support for reserved regions
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
` (29 preceding siblings ...)
2025-12-17 15:16 ` [RFC PATCH v3 30/35] iommu: ipmmu-vmsa: Add PCIe ch0 to devices_allowlist Koichiro Den
@ 2025-12-17 15:16 ` Koichiro Den
2025-12-17 15:16 ` [RFC PATCH v3 32/35] arm64: dts: renesas: Add Spider RC/EP DTs for NTB with remote DW PCIe eDMA Koichiro Den
` (4 subsequent siblings)
35 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:16 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
Add support for reserved regions using iommu_dma_get_resv_regions().
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/iommu/ipmmu-vmsa.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 724d67ad5ef2..4a89d95db0f8 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -25,6 +25,8 @@
#include <linux/slab.h>
#include <linux/sys_soc.h>
+#include "dma-iommu.h"
+
#if defined(CONFIG_ARM) && !defined(CONFIG_IOMMU_DMA)
#include <asm/dma-iommu.h>
#else
@@ -888,6 +890,7 @@ static const struct iommu_ops ipmmu_ops = {
.device_group = IS_ENABLED(CONFIG_ARM) && !IS_ENABLED(CONFIG_IOMMU_DMA)
? generic_device_group : generic_single_device_group,
.of_xlate = ipmmu_of_xlate,
+ .get_resv_regions = iommu_dma_get_resv_regions,
.default_domain_ops = &(const struct iommu_domain_ops) {
.attach_dev = ipmmu_attach_device,
.map_pages = ipmmu_map,
--
2.51.0
^ permalink raw reply related [flat|nested] 61+ messages in thread* [RFC PATCH v3 32/35] arm64: dts: renesas: Add Spider RC/EP DTs for NTB with remote DW PCIe eDMA
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
` (30 preceding siblings ...)
2025-12-17 15:16 ` [RFC PATCH v3 31/35] iommu: ipmmu-vmsa: Add support for reserved regions Koichiro Den
@ 2025-12-17 15:16 ` Koichiro Den
2025-12-17 15:16 ` [RFC PATCH v3 33/35] NTB: epf: Add an additional memory window (MW2) barno mapping on Renesas R-Car Koichiro Den
` (3 subsequent siblings)
35 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:16 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
Add dedicated DTs for the Spider CPU+BreakOut boards when used in PCIe
RC/EP mode with DW PCIe eDMA based NTB transport.
* r8a779f0-spider-rc.dts describes the board in RC mode.
It reserves 4 MiB of IOVA starting at 0xfe000000, which on this SoC
is the ECAM/Config aperture of the PCIe host bridge. In stress
testing with the remote eDMA, allowing generic DMA mappings to occupy
this range led to immediate instability. The exact mechanism is under
investigation, but reserving the range avoids the issue in practice.
* r8a779f0-spider-ep.dts describes the board in EP mode.
The RC interface is disabled and the EP interface is enabled. IPMMU
usage matches the RC case.
The base r8a779f0-spider.dts is intentionally left unchanged and
continues to describe the default RC-only board configuration.
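The reservation above can be thought of as a simple exclusion range on the
IOVA allocator. A plain C sketch of the invariant it establishes (addresses
taken from the commit message; the helper name is invented):

```c
#include <assert.h>
#include <stdint.h>

#define RESV_BASE 0xfe000000ull	/* ECAM/config aperture base on this SoC */
#define RESV_SIZE 0x00400000ull	/* 4 MiB reserved IOVA */

/*
 * With the reserved-memory node in place, the IOVA allocator must never
 * hand out a DMA address in [RESV_BASE, RESV_BASE + RESV_SIZE).
 */
static int iova_is_reserved(uint64_t iova)
{
	return iova >= RESV_BASE && iova < RESV_BASE + RESV_SIZE;
}
```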
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
arch/arm64/boot/dts/renesas/Makefile | 2 +
.../boot/dts/renesas/r8a779f0-spider-ep.dts | 37 +++++++++++++
.../boot/dts/renesas/r8a779f0-spider-rc.dts | 52 +++++++++++++++++++
3 files changed, 91 insertions(+)
create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-ep.dts
create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-rc.dts
diff --git a/arch/arm64/boot/dts/renesas/Makefile b/arch/arm64/boot/dts/renesas/Makefile
index 1fab1b50f20e..e8d312be515b 100644
--- a/arch/arm64/boot/dts/renesas/Makefile
+++ b/arch/arm64/boot/dts/renesas/Makefile
@@ -82,6 +82,8 @@ dtb-$(CONFIG_ARCH_R8A77995) += r8a77995-draak-panel-aa104xd12.dtb
dtb-$(CONFIG_ARCH_R8A779A0) += r8a779a0-falcon.dtb
dtb-$(CONFIG_ARCH_R8A779F0) += r8a779f0-spider.dtb
+dtb-$(CONFIG_ARCH_R8A779F0) += r8a779f0-spider-ep.dtb
+dtb-$(CONFIG_ARCH_R8A779F0) += r8a779f0-spider-rc.dtb
dtb-$(CONFIG_ARCH_R8A779F0) += r8a779f4-s4sk.dtb
dtb-$(CONFIG_ARCH_R8A779G0) += r8a779g0-white-hawk.dtb
diff --git a/arch/arm64/boot/dts/renesas/r8a779f0-spider-ep.dts b/arch/arm64/boot/dts/renesas/r8a779f0-spider-ep.dts
new file mode 100644
index 000000000000..6753f8497d0d
--- /dev/null
+++ b/arch/arm64/boot/dts/renesas/r8a779f0-spider-ep.dts
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+/*
+ * Device Tree Source for the Spider CPU and BreakOut boards
+ * (PCIe EP mode with DW PCIe eDMA used for NTB transport)
+ *
+ * Based on the base r8a779f0-spider.dts.
+ *
+ * Copyright (C) 2025 Renesas Electronics Corp.
+ */
+
+/dts-v1/;
+#include "r8a779f0-spider-cpu.dtsi"
+#include "r8a779f0-spider-ethernet.dtsi"
+
+/ {
+ model = "Renesas Spider CPU and Breakout boards based on r8a779f0";
+ compatible = "renesas,spider-breakout", "renesas,spider-cpu",
+ "renesas,r8a779f0";
+};
+
+&i2c4 {
+ eeprom@51 {
+ compatible = "rohm,br24g01", "atmel,24c01";
+ label = "breakout-board";
+ reg = <0x51>;
+ pagesize = <8>;
+ };
+};
+
+&pciec0 {
+ status = "disabled";
+};
+
+&pciec0_ep {
+ iommus = <&ipmmu_hc 32>;
+ status = "okay";
+};
diff --git a/arch/arm64/boot/dts/renesas/r8a779f0-spider-rc.dts b/arch/arm64/boot/dts/renesas/r8a779f0-spider-rc.dts
new file mode 100644
index 000000000000..c7112862e1e1
--- /dev/null
+++ b/arch/arm64/boot/dts/renesas/r8a779f0-spider-rc.dts
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+/*
+ * Device Tree Source for the Spider CPU and BreakOut boards
+ * (PCIe RC mode with remote DW PCIe eDMA used for NTB transport)
+ *
+ * Based on the base r8a779f0-spider.dts.
+ *
+ * Copyright (C) 2025 Renesas Electronics Corp.
+ */
+
+/dts-v1/;
+#include "r8a779f0-spider-cpu.dtsi"
+#include "r8a779f0-spider-ethernet.dtsi"
+
+/ {
+ model = "Renesas Spider CPU and Breakout boards based on r8a779f0";
+ compatible = "renesas,spider-breakout", "renesas,spider-cpu",
+ "renesas,r8a779f0";
+
+ reserved-memory {
+ #address-cells = <2>;
+ #size-cells = <2>;
+ ranges;
+
+ /*
+ * Reserve 4 MiB of IOVA starting at 0xfe000000. Allowing DMA
+ * writes whose DAR (destination IOVA) falls numerically inside
+ * the ECAM/config window has been observed to trigger
+ * controller misbehavior.
+ */
+ pciec0_iova_resv: pcie-iova-resv {
+ iommu-addresses = <&pciec0 0x0 0xfe000000 0x0 0x00400000>;
+ };
+ };
+};
+
+&i2c4 {
+ eeprom@51 {
+ compatible = "rohm,br24g01", "atmel,24c01";
+ label = "breakout-board";
+ reg = <0x51>;
+ pagesize = <8>;
+ };
+};
+
+&pciec0 {
+ iommus = <&ipmmu_hc 32>;
+ iommu-map = <0 &ipmmu_hc 32 1>;
+ iommu-map-mask = <0>;
+
+ memory-region = <&pciec0_iova_resv>;
+};
--
2.51.0
^ permalink raw reply related [flat|nested] 61+ messages in thread* [RFC PATCH v3 33/35] NTB: epf: Add an additional memory window (MW2) barno mapping on Renesas R-Car
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
` (31 preceding siblings ...)
2025-12-17 15:16 ` [RFC PATCH v3 32/35] arm64: dts: renesas: Add Spider RC/EP DTs for NTB with remote DW PCIe eDMA Koichiro Den
@ 2025-12-17 15:16 ` Koichiro Den
2025-12-17 15:16 ` [RFC PATCH v3 34/35] Documentation: PCI: endpoint: pci-epf-vntb: Update and add mwN_offset usage Koichiro Den
` (2 subsequent siblings)
35 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:16 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
To enable remote eDMA mode on NTB transport, one additional memory
window is required. Since a single BAR can now be split into multiple
memory windows, add MW2 to BAR2 on R-Car.
For pci_epf_vntb configfs settings, users who want to use MW2 (e.g. to
enable remote eDMA mode for NTB transport as mentioned above) may
configure as follows:
$ echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/num_mws
$ echo 0xE0000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1
$ echo 0x20000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2
$ echo 0xE0000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_offset
$ echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1_bar
$ echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_bar
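As a sanity check on the example above, the two windows tile a single BAR
without overlapping: MW1 occupies [0x0, 0xE0000) and MW2 occupies
[0xE0000, 0x100000). A plain C sketch of that check (the 1 MiB BAR size is
an assumption for illustration):

```c
#include <assert.h>
#include <stdint.h>

struct mw { uint64_t offset, size; };

/* Returns 1 iff both windows fit in the BAR and do not overlap. */
static int layout_ok(struct mw a, struct mw b, uint64_t bar_size)
{
	if (a.offset + a.size > bar_size || b.offset + b.size > bar_size)
		return 0;
	return a.offset + a.size <= b.offset || b.offset + b.size <= a.offset;
}
```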
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
drivers/ntb/hw/epf/ntb_hw_epf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/ntb/hw/epf/ntb_hw_epf.c b/drivers/ntb/hw/epf/ntb_hw_epf.c
index efe540a8c734..18d27ba9b6f4 100644
--- a/drivers/ntb/hw/epf/ntb_hw_epf.c
+++ b/drivers/ntb/hw/epf/ntb_hw_epf.c
@@ -826,7 +826,7 @@ static const enum pci_barno rcar_barno[NTB_BAR_NUM] = {
[BAR_PEER_SPAD] = BAR_0,
[BAR_DB] = BAR_4,
[BAR_MW1] = BAR_2,
- [BAR_MW2] = NO_BAR,
+ [BAR_MW2] = BAR_2,
[BAR_MW3] = NO_BAR,
[BAR_MW4] = NO_BAR,
};
--
2.51.0
^ permalink raw reply related [flat|nested] 61+ messages in thread* [RFC PATCH v3 34/35] Documentation: PCI: endpoint: pci-epf-vntb: Update and add mwN_offset usage
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
` (32 preceding siblings ...)
2025-12-17 15:16 ` [RFC PATCH v3 33/35] NTB: epf: Add an additional memory window (MW2) barno mapping on Renesas R-Car Koichiro Den
@ 2025-12-17 15:16 ` Koichiro Den
2025-12-17 15:16 ` [RFC PATCH v3 35/35] Documentation: driver-api: ntb: Document remote eDMA transport backend Koichiro Den
2025-12-19 15:12 ` [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Frank Li
35 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:16 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
Add a concrete example showing how to place multiple memory windows in
the same BAR (one for data, one for interrupts) by using 'mwN_offset'
and 'mwN_bar'.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
Documentation/PCI/endpoint/pci-vntb-howto.rst | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/Documentation/PCI/endpoint/pci-vntb-howto.rst b/Documentation/PCI/endpoint/pci-vntb-howto.rst
index 9a7a2f0a6849..bfc3e51ab79f 100644
--- a/Documentation/PCI/endpoint/pci-vntb-howto.rst
+++ b/Documentation/PCI/endpoint/pci-vntb-howto.rst
@@ -90,9 +90,9 @@ of the function device and is populated with the following NTB specific
attributes that can be configured by the user::
# ls functions/pci_epf_vntb/func1/pci_epf_vntb.0/
- ctrl_bar db_count mw1_bar mw2_bar mw3_bar mw4_bar spad_count
- db_bar mw1 mw2 mw3 mw4 num_mws vbus_number
- vntb_vid vntb_pid
+ ctrl_bar mw1 mw2 mw3 mw4 num_mws vntb_pid
+ db_bar mw1_bar mw2_bar mw3_bar mw4_bar spad_count vntb_vid
+ db_count mw1_offset mw2_offset mw3_offset mw4_offset vbus_number
A sample configuration for NTB function is given below::
@@ -111,6 +111,16 @@ A sample configuration for virtual NTB driver for virtual PCI bus::
# echo 0x080A > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_pid
# echo 0x10 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vbus_number
+When BAR resources are tight but you still need to create many memory
+windows, you can pack multiple windows into a single BAR via ``mwN_offset``
+and ``mwN_bar`` as shown below::
+
+ # echo 0xE0000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1
+ # echo 0x20000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2
+ # echo 0xE0000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_offset
+ # echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1_bar
+ # echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_bar
+
Binding pci-epf-ntb Device to EP Controller
--------------------------------------------
--
2.51.0
^ permalink raw reply related [flat|nested] 61+ messages in thread* [RFC PATCH v3 35/35] Documentation: driver-api: ntb: Document remote eDMA transport backend
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
` (33 preceding siblings ...)
2025-12-17 15:16 ` [RFC PATCH v3 34/35] Documentation: PCI: endpoint: pci-epf-vntb: Update and add mwN_offset usage Koichiro Den
@ 2025-12-17 15:16 ` Koichiro Den
2026-01-06 21:09 ` Dave Jiang
2025-12-19 15:12 ` [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Frank Li
35 siblings, 1 reply; 61+ messages in thread
From: Koichiro Den @ 2025-12-17 15:16 UTC (permalink / raw)
To: Frank.Li, dave.jiang, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring, den
Add a description of the ntb_transport backend architecture and the new
remote eDMA backed mode introduced by CONFIG_NTB_TRANSPORT_EDMA and the
use_remote_edma module parameter.
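The backend selection described in the documentation amounts to routing
queue-pair operations through a small ops table chosen once at setup time.
A user-space C sketch of that pattern (structure, handler names, and marker
return values are all invented for illustration; they are not the driver's
actual ops):

```c
#include <assert.h>

/* Hypothetical per-backend ops: each backend supplies its own handlers. */
struct qp_backend_ops {
	int (*tx_enqueue)(int len);
};

/* Marker return values distinguish the two paths in this sketch. */
static int shm_tx(int len)  { (void)len; return 1; } /* memcpy/DMA_MEMCPY path */
static int edma_tx(int len) { (void)len; return 2; } /* remote eDMA path */

static const struct qp_backend_ops shm_backend  = { .tx_enqueue = shm_tx };
static const struct qp_backend_ops edma_backend = { .tx_enqueue = edma_tx };

/*
 * Clients call through the ops table; the active backend is picked once
 * (e.g. from the use_remote_edma parameter), so the queue-pair API seen
 * by clients such as ntb_netdev is identical for both backends.
 */
static int qp_tx(const struct qp_backend_ops *ops, int len)
{
	return ops->tx_enqueue(len);
}
```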
Signed-off-by: Koichiro Den <den@valinux.co.jp>
---
Documentation/driver-api/ntb.rst | 58 ++++++++++++++++++++++++++++++++
1 file changed, 58 insertions(+)
diff --git a/Documentation/driver-api/ntb.rst b/Documentation/driver-api/ntb.rst
index a49c41383779..eb7b889d17c4 100644
--- a/Documentation/driver-api/ntb.rst
+++ b/Documentation/driver-api/ntb.rst
@@ -132,6 +132,64 @@ Transport queue pair. Network data is copied between socket buffers and the
Transport queue pair buffer. The Transport client may be used for other things
besides Netdev, however no other applications have yet been written.
+Transport backends
+~~~~~~~~~~~~~~~~~~
+
+The ``ntb_transport`` core driver implements a generic "queue pair"
+abstraction on top of the memory windows exported by the NTB hardware. Each
+queue pair has a TX and an RX ring and is used by client drivers such as
+``ntb_netdev`` to exchange variable sized payloads with the peer.
+
+There are currently two ways for ``ntb_transport`` to move payload data
+between the local system memory and the peer:
+
+* The default backend copies data between the caller buffers and the TX/RX
+ rings in the memory windows using ``memcpy()`` on the local CPU or, when
+ the ``use_dma`` module parameter is set, a local DMA engine via the
+ standard dmaengine ``DMA_MEMCPY`` interface.
+
+* When ``CONFIG_NTB_TRANSPORT_EDMA`` is enabled in the kernel configuration
+ and the ``use_remote_edma`` module parameter is set at run time, a second
+ backend uses a DesignWare eDMA engine that resides on the endpoint side
+ of the NTB. In this mode the endpoint driver exposes a dedicated peer
+ memory window that contains the eDMA register block together with a small
+ control structure and per-channel linked-list rings only for read
+ channels. The host ioremaps this window and configures a dmaengine
+ device. The endpoint uses its local eDMA write channels for its TX
+ transfer, while the host side uses the remote eDMA read channels for its
+ TX transfer.
+
+The ``ntb_transport`` core routes queue pair operations (enqueue,
+completion polling, link bring-up/teardown etc.) through a small
+backend-ops structure so that both implementations can coexist in the same
+module without affecting the public queue pair API used by clients. From a
+client driver's point of view (for example ``ntb_netdev``) the queue pair
+interface is the same regardless of which backend is active.
+
+When ``use_remote_edma`` is not enabled, ``ntb_transport`` behaves as in
+previous kernels before the optional ``use_remote_edma`` parameter was
+introduced, and continues to use the shared-memory backend. Existing
+configurations that do not select the eDMA backend therefore see no
+behavioural change.
+
+In the remote eDMA mode, host-to-endpoint notifications are delivered via a
+dedicated DMA read channel located at the endpoint. In both the default
+backend mode and the remote eDMA mode, endpoint-to-host notifications are
+backed by native MSI support on DW EPC, even when ``use_msi=0``. Because
+of this, the ``use_msi`` module parameter has no effect when
+``use_remote_edma=1`` on the host.
+
+At a high level, enabling the remote eDMA transport backend requires:
+
+* building the kernel with ``CONFIG_NTB_TRANSPORT`` and
+ ``CONFIG_NTB_TRANSPORT_EDMA`` enabled,
+* configuring the NTB endpoint so that it exposes a memory window containing
+ the eDMA register block, descriptor rings and control structure expected by
+ the helper driver, and
+* loading ``ntb_transport`` on the host with ``use_remote_edma=1`` so that
+ the eDMA-backed backend is selected instead of the default shared-memory
+ backend.
+
NTB Ping Pong Test Client (ntb\_pingpong)
-----------------------------------------
--
2.51.0
^ permalink raw reply related [flat|nested] 61+ messages in thread* Re: [RFC PATCH v3 35/35] Documentation: driver-api: ntb: Document remote eDMA transport backend
2025-12-17 15:16 ` [RFC PATCH v3 35/35] Documentation: driver-api: ntb: Document remote eDMA transport backend Koichiro Den
@ 2026-01-06 21:09 ` Dave Jiang
2026-01-07 15:13 ` Koichiro Den
0 siblings, 1 reply; 61+ messages in thread
From: Dave Jiang @ 2026-01-06 21:09 UTC (permalink / raw)
To: Koichiro Den, Frank.Li, ntb, linux-pci, dmaengine,
linux-renesas-soc, netdev, linux-kernel
Cc: mani, kwilczynski, kishon, bhelgaas, corbet, geert+renesas,
magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro, will,
robin.murphy, jdmason, allenbh, andrew+netdev, davem, edumazet,
kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k, kurt.schwemmer,
logang, jingoohan1, lpieralisi, utkarsh02t, jbrunet, dlemoal,
arnd, elfring
On 12/17/25 8:16 AM, Koichiro Den wrote:
> Add a description of the ntb_transport backend architecture and the new
> remote eDMA backed mode introduced by CONFIG_NTB_TRANSPORT_EDMA and the
> use_remote_edma module parameter.
>
> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> ---
> Documentation/driver-api/ntb.rst | 58 ++++++++++++++++++++++++++++++++
> 1 file changed, 58 insertions(+)
>
> diff --git a/Documentation/driver-api/ntb.rst b/Documentation/driver-api/ntb.rst
> index a49c41383779..eb7b889d17c4 100644
> --- a/Documentation/driver-api/ntb.rst
> +++ b/Documentation/driver-api/ntb.rst
> @@ -132,6 +132,64 @@ Transport queue pair. Network data is copied between socket buffers and the
> Transport queue pair buffer. The Transport client may be used for other things
> besides Netdev, however no other applications have yet been written.
>
> +Transport backends
> +~~~~~~~~~~~~~~~~~~
> +
> +The ``ntb_transport`` core driver implements a generic "queue pair"
> +abstraction on top of the memory windows exported by the NTB hardware. Each
> +queue pair has a TX and an RX ring and is used by client drivers such as
> +``ntb_netdev`` to exchange variable-sized payloads with the peer.
> +
> +There are currently two ways for ``ntb_transport`` to move payload data
> +between the local system memory and the peer:
> +
> +* The default backend copies data between the caller buffers and the TX/RX
> + rings in the memory windows using ``memcpy()`` on the local CPU or, when
> + the ``use_dma`` module parameter is set, a local DMA engine via the
> + standard dmaengine ``DMA_MEMCPY`` interface.
> +
> +* When ``CONFIG_NTB_TRANSPORT_EDMA`` is enabled in the kernel configuration
> + and the ``use_remote_edma`` module parameter is set at run time, a second
> + backend uses a DesignWare eDMA engine that resides on the endpoint side
I would say "embedded DMA device" instead of a specific DesignWare eDMA engine to keep the transport generic. But provide a reference or link to the DesignWare eDMA engine as a reference.
> + of the NTB. In this mode the endpoint driver exposes a dedicated peer
> + memory window that contains the eDMA register block together with a small
> + control structure and per-channel linked-list rings only for read
> + channels. The host ioremaps this window and configures a dmaengine
> + device. The endpoint uses its local eDMA write channels for its TX
> + transfer, while the host side uses the remote eDMA read channels for its
> + TX transfer.
Can you provide some more text on the data flow from one host to the other for eDMA vs via host based DMA in the current transport? i.e. currently for a transmit, user data gets copied into an skbuff by the network stack, and then the local host copies it into the ring buffer on the remote host via DMA write (or CPU). And the remote host then copies out of the ring buffer entry to a kernel skbuff and back to user space on the receiver side. How does it now work with eDMA? Also can the mechanism used by eDMA be achieved with a host DMA setup or is the eDMA mechanism specifically tied to the DW hardware design? Would be nice to move the ASCII data flow diagram in the cover to documentation so we don't lose that.
DJ
> +
> +The ``ntb_transport`` core routes queue pair operations (enqueue,
> +completion polling, link bring-up/teardown etc.) through a small
> +backend-ops structure so that both implementations can coexist in the same
> +module without affecting the public queue pair API used by clients. From a
> +client driver's point of view (for example ``ntb_netdev``) the queue pair
> +interface is the same regardless of which backend is active.
> +
> +When ``use_remote_edma`` is not enabled, ``ntb_transport`` behaves as in
> +previous kernels before the optional ``use_remote_edma`` parameter was
> +introduced, and continues to use the shared-memory backend. Existing
> +configurations that do not select the eDMA backend therefore see no
> +behavioural change.
> +
> +In the remote eDMA mode, host-to-endpoint notifications are delivered via a
> +dedicated DMA read channel located at the endpoint. In both the default
> +backend mode and the remote eDMA mode, endpoint-to-host notifications are
> +backed by native MSI support on DW EPC, even when ``use_msi=0``. Because
> +of this, the ``use_msi`` module parameter has no effect when
> +``use_remote_edma=1`` on the host.
> +
> +At a high level, enabling the remote eDMA transport backend requires:
> +
> +* building the kernel with ``CONFIG_NTB_TRANSPORT`` and
> + ``CONFIG_NTB_TRANSPORT_EDMA`` enabled,
> +* configuring the NTB endpoint so that it exposes a memory window containing
> + the eDMA register block, descriptor rings and control structure expected by
> + the helper driver, and
> +* loading ``ntb_transport`` on the host with ``use_remote_edma=1`` so that
> + the eDMA-backed backend is selected instead of the default shared-memory
> + backend.
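On the host side, the last step of the quoted list amounts to something like the following shell session. This is a sketch based on the config and parameter names from the series (`CONFIG_NTB_TRANSPORT_EDMA`, `use_remote_edma`); the exact set of modules to load and the parameter's sysfs visibility depend on the kernel configuration:

```shell
# Host side: load the NTB transport with the eDMA-backed backend selected.
# Assumes a kernel built with CONFIG_NTB_TRANSPORT and
# CONFIG_NTB_TRANSPORT_EDMA enabled.
modprobe ntb_hw_epf
modprobe ntb_transport use_remote_edma=1
modprobe ntb_netdev

# If the parameter is exported with read permissions, verify it took effect:
cat /sys/module/ntb_transport/parameters/use_remote_edma
```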
> +
> NTB Ping Pong Test Client (ntb\_pingpong)
> -----------------------------------------
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [RFC PATCH v3 35/35] Documentation: driver-api: ntb: Document remote eDMA transport backend
2026-01-06 21:09 ` Dave Jiang
@ 2026-01-07 15:13 ` Koichiro Den
0 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2026-01-07 15:13 UTC (permalink / raw)
To: Dave Jiang
Cc: Frank.Li@nxp.com, ntb@lists.linux.dev, linux-pci@vger.kernel.org,
dmaengine@vger.kernel.org, linux-renesas-soc@vger.kernel.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
mani@kernel.org, kwilczynski@kernel.org, kishon@kernel.org,
bhelgaas@google.com, corbet@lwn.net, geert+renesas@glider.be,
magnus.damm@gmail.com, robh@kernel.org, krzk+dt@kernel.org,
conor+dt@kernel.org, vkoul@kernel.org, joro@8bytes.org,
will@kernel.org, robin.murphy@arm.com, jdmason@kudzu.us,
allenbh@gmail.com, andrew+netdev@lunn.ch, davem@davemloft.net,
edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
Basavaraj.Natikar@amd.com, Shyam-sundar.S-k@amd.com,
kurt.schwemmer@microsemi.com, logang@deltatee.com,
jingoohan1@gmail.com, lpieralisi@kernel.org, utkarsh02t@gmail.com,
jbrunet@baylibre.com, dlemoal@kernel.org, arnd@arndb.de,
elfring@users.sourceforge.net
On Wed, Jan 07, 2026 at 06:09:38AM +0900, Dave Jiang wrote:
>
>
> On 12/17/25 8:16 AM, Koichiro Den wrote:
> > Add a description of the ntb_transport backend architecture and the new
> > remote eDMA backed mode introduced by CONFIG_NTB_TRANSPORT_EDMA and the
> > use_remote_edma module parameter.
> >
> > Signed-off-by: Koichiro Den <den@valinux.co.jp>
> > ---
> > Documentation/driver-api/ntb.rst | 58 ++++++++++++++++++++++++++++++++
> > 1 file changed, 58 insertions(+)
> >
> > diff --git a/Documentation/driver-api/ntb.rst b/Documentation/driver-api/ntb.rst
> > index a49c41383779..eb7b889d17c4 100644
> > --- a/Documentation/driver-api/ntb.rst
> > +++ b/Documentation/driver-api/ntb.rst
> > @@ -132,6 +132,64 @@ Transport queue pair. Network data is copied between socket buffers and the
> > Transport queue pair buffer. The Transport client may be used for other things
> > besides Netdev, however no other applications have yet been written.
> >
> > +Transport backends
> > +~~~~~~~~~~~~~~~~~~
> > +
> > +The ``ntb_transport`` core driver implements a generic "queue pair"
> > +abstraction on top of the memory windows exported by the NTB hardware. Each
> > +queue pair has a TX and an RX ring and is used by client drivers such as
> > +``ntb_netdev`` to exchange variable-sized payloads with the peer.
> > +
> > +There are currently two ways for ``ntb_transport`` to move payload data
> > +between the local system memory and the peer:
> > +
> > +* The default backend copies data between the caller buffers and the TX/RX
> > + rings in the memory windows using ``memcpy()`` on the local CPU or, when
> > + the ``use_dma`` module parameter is set, a local DMA engine via the
> > + standard dmaengine ``DMA_MEMCPY`` interface.
> > +
> > +* When ``CONFIG_NTB_TRANSPORT_EDMA`` is enabled in the kernel configuration
> > + and the ``use_remote_edma`` module parameter is set at run time, a second
> > + backend uses a DesignWare eDMA engine that resides on the endpoint side
>
> I would say "embedded DMA device" instead of a specific DesignWare eDMA engine to keep the transport generic. But provide a reference or link to the DesignWare eDMA engine as a reference.
That makes sense. I will switch the wording.
>
> > + of the NTB. In this mode the endpoint driver exposes a dedicated peer
> > + memory window that contains the eDMA register block together with a small
> > + control structure and per-channel linked-list rings only for read
> > + channels. The host ioremaps this window and configures a dmaengine
> > + device. The endpoint uses its local eDMA write channels for its TX
> > + transfer, while the host side uses the remote eDMA read channels for its
> > + TX transfer.
>
> Can you provide some more text on the data flow from one host to the other for eDMA vs via host based DMA in the current transport? i.e. currently for a transmit, user data gets copied into an skbuff by the network stack, and then the local host copies it into the ring buffer on the remote host via DMA write (or CPU). And the remote host then copies out of the ring buffer entry to a kernel skbuff and back to user space on the receiver side. How does it now work with eDMA? Also can the mechanism used by eDMA be achieved with a host DMA setup or is the eDMA mechanism specifically tied to the DW hardware design? Would be nice to move the ASCII data flow diagram in the cover to documentation so we don't lose that.
I'll add more text (and the ASCII data-flow diagram).
Thanks,
Koichiro
>
> DJ
>
> > +
> > +The ``ntb_transport`` core routes queue pair operations (enqueue,
> > +completion polling, link bring-up/teardown etc.) through a small
> > +backend-ops structure so that both implementations can coexist in the same
> > +module without affecting the public queue pair API used by clients. From a
> > +client driver's point of view (for example ``ntb_netdev``) the queue pair
> > +interface is the same regardless of which backend is active.
> > +
> > +When ``use_remote_edma`` is not enabled, ``ntb_transport`` behaves as in
> > +previous kernels before the optional ``use_remote_edma`` parameter was
> > +introduced, and continues to use the shared-memory backend. Existing
> > +configurations that do not select the eDMA backend therefore see no
> > +behavioural change.
> > +
> > +In the remote eDMA mode, host-to-endpoint notifications are delivered via a
> > +dedicated DMA read channel located at the endpoint. In both the default
> > +backend mode and the remote eDMA mode, endpoint-to-host notifications are
> > +backed by native MSI support on DW EPC, even when ``use_msi=0``. Because
> > +of this, the ``use_msi`` module parameter has no effect when
> > +``use_remote_edma=1`` on the host.
> > +
> > +At a high level, enabling the remote eDMA transport backend requires:
> > +
> > +* building the kernel with ``CONFIG_NTB_TRANSPORT`` and
> > + ``CONFIG_NTB_TRANSPORT_EDMA`` enabled,
> > +* configuring the NTB endpoint so that it exposes a memory window containing
> > + the eDMA register block, descriptor rings and control structure expected by
> > + the helper driver, and
> > +* loading ``ntb_transport`` on the host with ``use_remote_edma=1`` so that
> > + the eDMA-backed backend is selected instead of the default shared-memory
> > + backend.
> > +
> > NTB Ping Pong Test Client (ntb\_pingpong)
> > -----------------------------------------
> >
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA
2025-12-17 15:15 [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Koichiro Den
` (34 preceding siblings ...)
2025-12-17 15:16 ` [RFC PATCH v3 35/35] Documentation: driver-api: ntb: Document remote eDMA transport backend Koichiro Den
@ 2025-12-19 15:12 ` Frank Li
2025-12-20 15:44 ` Koichiro Den
35 siblings, 1 reply; 61+ messages in thread
From: Frank Li @ 2025-12-19 15:12 UTC (permalink / raw)
To: Koichiro Den
Cc: dave.jiang, ntb, linux-pci, dmaengine, linux-renesas-soc, netdev,
linux-kernel, mani, kwilczynski, kishon, bhelgaas, corbet,
geert+renesas, magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro,
will, robin.murphy, jdmason, allenbh, andrew+netdev, davem,
edumazet, kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k,
kurt.schwemmer, logang, jingoohan1, lpieralisi, utkarsh02t,
jbrunet, dlemoal, arnd, elfring
On Thu, Dec 18, 2025 at 12:15:34AM +0900, Koichiro Den wrote:
> Hi,
>
> This is RFC v3 of the NTB/PCI series that introduces NTB transport backed
> by DesignWare PCIe integrated eDMA.
>
> RFC v2: https://lore.kernel.org/all/20251129160405.2568284-1-den@valinux.co.jp/
> RFC v1: https://lore.kernel.org/all/20251023071916.901355-1-den@valinux.co.jp/
>
> The goal is to improve performance between a host and an endpoint over
> ntb_transport (typically with ntb_netdev on top). On R-Car S4, preliminary
> iperf3 results show 10~20x throughput improvement. Latency improvements are
> also observed.
Great!
>
> In this approach, payload is transferred by DMA directly between host and
> endpoint address spaces, and the NTB Memory Window is primarily used as a
> control/metadata window (and to expose the eDMA register/LL regions).
> Compared to the memcpy-based transport, this avoids extra copies,
> enables deeper rings, and scales out to multiple queue pairs.
>
> Compared to RFC v2, data plane works in a symmetric manner in both
> directions (host-to-endpoint and endpoint-to-host). The host side drives
> remote read channels for its TX transfer while the endpoint drives local
> write channels.
>
> Again, I recognize that this is quite a large series. Sorry for the volume,
> but for the RFC stage I believe presenting the full picture in a single set
> helps with reviewing the overall architecture (of course, detailed feedback
> would be appreciated as well). Once the direction is agreed, I will respin
> it split by subsystem and topic.
>
> Many thanks for all the reviews and feedback from multiple perspectives.
The next two weeks are holidays, so I won't have much time to review this
long thread. I have glanced over it as a whole.

You can do some preparatory work to speed up upstreaming this great work:

Split the preparatory work for the NTB changes into a new thread.
Split the fixes/code cleanups into a new thread.

Besides some simple cleanups,
- start with the iATU address match mode support first.
- then the eDMA changes, such as exporting the register base and LL region
  to support remote DMA mode (you can add them to pci-epf-test.c for basic
  testing).
Frank
>
>
> Data flow overview
> ==================
>
> Figure 1. RC->EP traffic via ntb_netdev+ntb_transport
> backed by Remote eDMA
>
> EP RC
> phys addr phys addr
> space space
> +-+ +-+
> | | | |
> | | || | |
> +-+-----. || | |
> EDMA REG | | \ [A] || | |
> +-+----. '---+-+ || | |
> | | \ | |<---------[0-a]----------
> +-+-----------| |<----------[2]----------.
> EDMA LL | | | | || | | :
> | | | | || | | :
> +-+-----------+-+ || [B] | | :
> | | || ++ | | :
> ---------[0-b]----------->||----------------'
> | | ++ || || | |
> | | || || ++ | |
> | | ||<----------[4]-----------
> | | ++ || | |
> | | [C] || | |
> .--|#|<------------------------[3]------|#|<-.
> : |#| || |#| :
> [5] | | || | | [1]
> : | | || | | :
> '->|#| |#|--'
> |#| |#|
> | | | |
>
>
> Figure 2. EP->RC traffic via ntb_netdev+ntb_transport
> backed by EP-Local eDMA
>
> EP RC
> phys addr phys addr
> space space
> +-+ +-+
> | | | |
> | | || | |
> +-+ || | |
> EDMA REG | | || | |
> +-+ || | |
> ^ | | || | |
> : +-+ || | |
> : EDMA LL | | || | |
> : | | || | |
> : +-+ || [C] | |
> : | | || ++ | |
> : -----------[4]----------->|| | |
> : | | ++ || || | |
> : | | || || ++ | |
> '----------------[2]-----||<--------[0-b]-----------
> | | ++ || | |
> | | [B] || | |
> .->|#|--------[3]---------------------->|#|--.
> : |#| || |#| :
> [1] | | || | | [5]
> : | | || | | :
> '--|#| |#|<-'
> |#| |#|
> | | | |
>
>
> 0-a. configure Remote eDMA
> 0-b. DMA-map and produce DAR
> 1. memcpy while building skb in ntb_netdev case
> 2. consume DAR, DMA-map SAR and kick DMA read transfer
> 3. DMA transfer
> 4. consume (commit)
> 5. memcpy to application side
>
> [A]: MemoryWindow that aggregates eDMA regs and LL.
> IB iATU translations (Address Match Mode).
> [B]: Control plane ring buffer (for "produce")
> [C]: Control plane ring buffer (for "consume")
>
> Note:
> - Figure 1 is unchanged from RFC v2.
> - Figure 2 differs from the one depicted in RFC v2 cover letter.
>
>
> Changes since RFC v2
> ====================
>
> RFCv2->RFCv3 changes:
> - Architecture
> - Have EP side use its local write channels, while leaving RC side to
> use remote read channels.
> - Abstraction/HW-specific stuff encapsulation improved.
> - Added control/config region versioning for the vNTB/EPF control region
> so that mismatched RC/EP kernels fail early instead of silently using an
> incompatible layout.
> - Reworked BAR subrange / multi-region mapping support:
> - Dropped the v2 approach that added new inbound mapping ops in the EPC
> core.
> - Introduced `struct pci_epf_bar.submap` and extended DesignWare EP to
> support BAR subrange inbound mapping via Address Match Mode IB iATU.
> - pci-epf-vntb now provides a subrange mapping hint to the EPC driver
> when offsets are used.
> - Changed .get_pci_epc() to .get_private_data()
> - Dropped two commits from RFC v2 that should be submitted separately:
> (1) ntb_transport debugfs seq_file conversion
> (2) DWC EP outbound iATU MSI mapping/cache fix (will be re-posted separately)
> - Added documentation updates.
> - Addressed assorted review nits from the RFC v2 thread (naming/structure).
>
> RFCv1->RFCv2 changes:
> - Architecture
> - Drop the generic interrupt backend + DW eDMA test-interrupt backend
> approach and instead adopt the remote eDMA-backed ntb_transport mode
> proposed by Frank Li. The BAR-sharing / mwN_offset / inbound
> mapping (Address Match Mode) infrastructure from RFC v1 is largely
> kept, with only minor refinements and code motion where necessary
> to fit the new transport-mode design.
> - For Patch 01
> - Rework the array_index_nospec() conversion to address review
> comments on "[RFC PATCH 01/25]".
>
> RFCv2: https://lore.kernel.org/all/20251129160405.2568284-1-den@valinux.co.jp/
> RFCv1: https://lore.kernel.org/all/20251023071916.901355-1-den@valinux.co.jp/
>
>
> Patch layout
> ============
>
> Patch 01-25 : preparation for Patch 26
> - 01-07: support multiple MWs in a BAR
> - 08-25: other misc preparations
> Patch 26 : main and most important patch, adds eDMA-backed transport
> Patch 27-28 : multi-queue use, thanks to the remote eDMA, performance
> scales
> Patch 29-33 : handle several SoC-specific issues so that remote eDMA
> mode ntb_transport works on R-Car S4
> Patch 34-35 : kernel doc updates
>
>
> Tested on
> =========
>
> * 2x Renesas R-Car S4 Spider (RC<->EP connected with OcuLink cable)
> * Kernel base: next-20251216 + [1] + [2] + [3]
>
> [1]: https://lore.kernel.org/all/20251210071358.2267494-2-cassel@kernel.org/
> (this is a spin-out patch from
> https://lore.kernel.org/linux-pci/20251129160405.2568284-20-den@valinux.co.jp/)
> [2]: https://lore.kernel.org/all/20251208-dma_prep_config-v1-0-53490c5e1e2a@nxp.com/
> (while it appears to still be under active discussion)
> [3]: https://lore.kernel.org/all/20251217081955.3137163-1-den@valinux.co.jp/
> (this is a spin-out patch from
> https://lore.kernel.org/all/20251129160405.2568284-14-den@valinux.co.jp/)
>
>
> Performance measurement
> =======================
>
> No serious measurements yet, because:
> * For "before the change", even use_dma/use_msi does not work on the
> upstream kernel unless we apply some patches for R-Car S4. With some
> unmerged patch series I had posted earlier (but superseded by this RFC
> attempt), it was observed that we can achieve about 7 Gbps for the
> > RC->EP direction. A pure upstream kernel achieves only around 500 Mbps,
> > though.
> * For "after the change", measurements are not mature because this
> RFC v3 patch series is not yet performance-optimized at this stage.
>
> Here are the rough measurements showing the achievable performance on
> the R-Car S4:
>
> - Before this change:
>
> * ping
> 64 bytes from 10.0.0.11: icmp_seq=1 ttl=64 time=12.3 ms
> 64 bytes from 10.0.0.11: icmp_seq=2 ttl=64 time=6.58 ms
> 64 bytes from 10.0.0.11: icmp_seq=3 ttl=64 time=1.26 ms
> 64 bytes from 10.0.0.11: icmp_seq=4 ttl=64 time=7.43 ms
> 64 bytes from 10.0.0.11: icmp_seq=5 ttl=64 time=1.39 ms
> 64 bytes from 10.0.0.11: icmp_seq=6 ttl=64 time=7.38 ms
> 64 bytes from 10.0.0.11: icmp_seq=7 ttl=64 time=1.42 ms
> 64 bytes from 10.0.0.11: icmp_seq=8 ttl=64 time=7.41 ms
>
> * RC->EP (`sudo iperf3 -ub0 -l 65480 -P 2`)
> [ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
> [ 5] 0.00-10.01 sec 344 MBytes 288 Mbits/sec 3.483 ms 51/5555 (0.92%) receiver
> [ 6] 0.00-10.01 sec 342 MBytes 287 Mbits/sec 3.814 ms 38/5517 (0.69%) receiver
> [SUM] 0.00-10.01 sec 686 MBytes 575 Mbits/sec 3.648 ms 89/11072 (0.8%) receiver
>
> * EP->RC (`sudo iperf3 -ub0 -l 65480 -P 2`)
> [ 5] 0.00-10.03 sec 334 MBytes 279 Mbits/sec 3.164 ms 390/5731 (6.8%) receiver
> [ 6] 0.00-10.03 sec 334 MBytes 279 Mbits/sec 2.416 ms 396/5741 (6.9%) receiver
> [SUM] 0.00-10.03 sec 667 MBytes 558 Mbits/sec 2.790 ms 786/11472 (6.9%) receiver
>
> Note: with `-P 2`, the best total bitrate (receiver side) was achieved.
>
> - After this change (use_remote_edma=1):
>
> * ping
> 64 bytes from 10.0.0.11: icmp_seq=1 ttl=64 time=1.42 ms
> 64 bytes from 10.0.0.11: icmp_seq=2 ttl=64 time=1.38 ms
> 64 bytes from 10.0.0.11: icmp_seq=3 ttl=64 time=1.21 ms
> 64 bytes from 10.0.0.11: icmp_seq=4 ttl=64 time=1.02 ms
> 64 bytes from 10.0.0.11: icmp_seq=5 ttl=64 time=1.06 ms
> 64 bytes from 10.0.0.11: icmp_seq=6 ttl=64 time=0.995 ms
> 64 bytes from 10.0.0.11: icmp_seq=7 ttl=64 time=0.964 ms
> 64 bytes from 10.0.0.11: icmp_seq=8 ttl=64 time=1.49 ms
>
> * RC->EP (`sudo iperf3 -ub0 -l 65480 -P 4`)
> [ 5] 0.00-10.02 sec 3.00 GBytes 2.58 Gbits/sec 0.437 ms 33053/82329 (40%) receiver
> [ 6] 0.00-10.02 sec 3.00 GBytes 2.58 Gbits/sec 0.174 ms 46379/95655 (48%) receiver
> [ 9] 0.00-10.02 sec 2.88 GBytes 2.47 Gbits/sec 0.106 ms 47672/94924 (50%) receiver
> [ 11] 0.00-10.02 sec 2.87 GBytes 2.46 Gbits/sec 0.364 ms 23694/70817 (33%) receiver
> [SUM] 0.00-10.02 sec 11.8 GBytes 10.1 Gbits/sec 0.270 ms 150798/343725 (44%) receiver
>
> * EP->RC (`sudo iperf3 -ub0 -l 65480 -P 4`)
> [ 5] 0.00-10.01 sec 3.28 GBytes 2.82 Gbits/sec 0.380 ms 38578/92355 (42%) receiver
> [ 6] 0.00-10.01 sec 3.24 GBytes 2.78 Gbits/sec 0.430 ms 14268/67340 (21%) receiver
> [ 9] 0.00-10.01 sec 2.92 GBytes 2.51 Gbits/sec 0.074 ms 0/47890 (0%) receiver
> [ 11] 0.00-10.01 sec 4.76 GBytes 4.09 Gbits/sec 0.037 ms 0/78073 (0%) receiver
> [SUM] 0.00-10.01 sec 14.2 GBytes 12.2 Gbits/sec 0.230 ms 52846/285658 (18%) receiver
>
> * configfs settings:
> # modprobe pci_epf_vntb
> # cd /sys/kernel/config/pci_ep/
> # mkdir functions/pci_epf_vntb/func1
> # echo 0x1912 > functions/pci_epf_vntb/func1/vendorid
> # echo 0x0030 > functions/pci_epf_vntb/func1/deviceid
> # echo 32 > functions/pci_epf_vntb/func1/msi_interrupts
> # echo 16 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/db_count
> # echo 128 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/spad_count
> # echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/num_mws
> # echo 0xe0000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1
> # echo 0x20000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2
> # echo 0xe0000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_offset
> # echo 0x1912 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_vid
> # echo 0x0030 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_pid
> # echo 0x10 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vbus_number
> # echo 0 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/ctrl_bar
> # echo 4 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/db_bar
> # echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1_bar
> # echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_bar
> # ln -s controllers/e65d0000.pcie-ep functions/pci_epf_vntb/func1/primary/
> # echo 1 > controllers/e65d0000.pcie-ep/start
>
>
>
> Thank you for reviewing,
>
>
> Koichiro Den (35):
> PCI: endpoint: pci-epf-vntb: Use array_index_nospec() on mws_size[]
> access
> NTB: epf: Add mwN_offset support and config region versioning
> PCI: dwc: ep: Support BAR subrange inbound mapping via address match
> iATU
> NTB: Add offset parameter to MW translation APIs
> PCI: endpoint: pci-epf-vntb: Propagate MW offset from configfs when
> present
> NTB: ntb_transport: Support partial memory windows with offsets
> PCI: endpoint: pci-epf-vntb: Hint subrange mapping preference to EPC
> driver
> NTB: core: Add .get_private_data() to ntb_dev_ops
> NTB: epf: vntb: Implement .get_private_data() callback
> dmaengine: dw-edma: Fix MSI data values for multi-vector IMWr
> interrupts
> NTB: ntb_transport: Move TX memory window setup into setup_qp_mw()
> NTB: ntb_transport: Dynamically determine qp count
> NTB: ntb_transport: Introduce get_dma_dev() helper
> NTB: epf: Reserve a subset of MSI vectors for non-NTB users
> NTB: ntb_transport: Move internal types to ntb_transport_internal.h
> NTB: ntb_transport: Introduce ntb_transport_backend_ops
> dmaengine: dw-edma: Add helper func to retrieve register base and size
> dmaengine: dw-edma: Add per-channel interrupt routing mode
> dmaengine: dw-edma: Poll completion when local IRQ handling is
> disabled
> dmaengine: dw-edma: Add notify-only channels support
> dmaengine: dw-edma: Add a helper to retrieve LL (Linked List) region
> dmaengine: dw-edma: Serialize RMW on shared interrupt registers
> NTB: ntb_transport: Split core into ntb_transport_core.c
> NTB: ntb_transport: Add additional hooks for DW eDMA backend
> NTB: hw: Introduce DesignWare eDMA helper
> NTB: ntb_transport: Introduce DW eDMA backed transport mode
> NTB: epf: Provide db_vector_count/db_vector_mask callbacks
> ntb_netdev: Multi-queue support
> NTB: epf: Add per-SoC quirk to cap MRRS for DWC eDMA (128B for R-Car)
> iommu: ipmmu-vmsa: Add PCIe ch0 to devices_allowlist
> iommu: ipmmu-vmsa: Add support for reserved regions
> arm64: dts: renesas: Add Spider RC/EP DTs for NTB with remote DW PCIe
> eDMA
> NTB: epf: Add an additional memory window (MW2) barno mapping on
> Renesas R-Car
> Documentation: PCI: endpoint: pci-epf-vntb: Update and add mwN_offset
> usage
> Documentation: driver-api: ntb: Document remote eDMA transport backend
>
> Documentation/PCI/endpoint/pci-vntb-howto.rst | 16 +-
> Documentation/driver-api/ntb.rst | 58 +
> arch/arm64/boot/dts/renesas/Makefile | 2 +
> .../boot/dts/renesas/r8a779f0-spider-ep.dts | 37 +
> .../boot/dts/renesas/r8a779f0-spider-rc.dts | 52 +
> drivers/dma/dw-edma/dw-edma-core.c | 233 ++++-
> drivers/dma/dw-edma/dw-edma-core.h | 13 +-
> drivers/dma/dw-edma/dw-edma-v0-core.c | 39 +-
> drivers/iommu/ipmmu-vmsa.c | 7 +-
> drivers/net/ntb_netdev.c | 341 ++++--
> drivers/ntb/Kconfig | 12 +
> drivers/ntb/Makefile | 4 +
> drivers/ntb/hw/amd/ntb_hw_amd.c | 6 +-
> drivers/ntb/hw/edma/ntb_hw_edma.c | 754 +++++++++++++
> drivers/ntb/hw/edma/ntb_hw_edma.h | 76 ++
> drivers/ntb/hw/epf/ntb_hw_epf.c | 187 +++-
> drivers/ntb/hw/idt/ntb_hw_idt.c | 3 +-
> drivers/ntb/hw/intel/ntb_hw_gen1.c | 6 +-
> drivers/ntb/hw/intel/ntb_hw_gen1.h | 2 +-
> drivers/ntb/hw/intel/ntb_hw_gen3.c | 3 +-
> drivers/ntb/hw/intel/ntb_hw_gen4.c | 6 +-
> drivers/ntb/hw/mscc/ntb_hw_switchtec.c | 6 +-
> drivers/ntb/msi.c | 6 +-
> .../{ntb_transport.c => ntb_transport_core.c} | 482 ++++-----
> drivers/ntb/ntb_transport_edma.c | 987 ++++++++++++++++++
> drivers/ntb/ntb_transport_internal.h | 220 ++++
> drivers/ntb/test/ntb_perf.c | 4 +-
> drivers/ntb/test/ntb_tool.c | 6 +-
> .../pci/controller/dwc/pcie-designware-ep.c | 198 +++-
> drivers/pci/controller/dwc/pcie-designware.c | 25 +
> drivers/pci/controller/dwc/pcie-designware.h | 2 +
> drivers/pci/endpoint/functions/pci-epf-vntb.c | 246 ++++-
> drivers/pci/endpoint/pci-epc-core.c | 2 +-
> include/linux/dma/edma.h | 106 ++
> include/linux/ntb.h | 38 +-
> include/linux/ntb_transport.h | 5 +
> include/linux/pci-epf.h | 27 +
> 37 files changed, 3716 insertions(+), 501 deletions(-)
> create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-ep.dts
> create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-rc.dts
> create mode 100644 drivers/ntb/hw/edma/ntb_hw_edma.c
> create mode 100644 drivers/ntb/hw/edma/ntb_hw_edma.h
> rename drivers/ntb/{ntb_transport.c => ntb_transport_core.c} (91%)
> create mode 100644 drivers/ntb/ntb_transport_edma.c
> create mode 100644 drivers/ntb/ntb_transport_internal.h
>
> --
> 2.51.0
>
^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA
2025-12-19 15:12 ` [RFC PATCH v3 00/35] NTB transport backed by endpoint DW eDMA Frank Li
@ 2025-12-20 15:44 ` Koichiro Den
0 siblings, 0 replies; 61+ messages in thread
From: Koichiro Den @ 2025-12-20 15:44 UTC (permalink / raw)
To: Frank Li
Cc: dave.jiang, ntb, linux-pci, dmaengine, linux-renesas-soc, netdev,
linux-kernel, mani, kwilczynski, kishon, bhelgaas, corbet,
geert+renesas, magnus.damm, robh, krzk+dt, conor+dt, vkoul, joro,
will, robin.murphy, jdmason, allenbh, andrew+netdev, davem,
edumazet, kuba, pabeni, Basavaraj.Natikar, Shyam-sundar.S-k,
kurt.schwemmer, logang, jingoohan1, lpieralisi, utkarsh02t,
jbrunet, dlemoal, arnd, elfring
On Fri, Dec 19, 2025 at 10:12:11AM -0500, Frank Li wrote:
> On Thu, Dec 18, 2025 at 12:15:34AM +0900, Koichiro Den wrote:
> > Hi,
> >
> > This is RFC v3 of the NTB/PCI series that introduces NTB transport backed
> > by DesignWare PCIe integrated eDMA.
> >
> > RFC v2: https://lore.kernel.org/all/20251129160405.2568284-1-den@valinux.co.jp/
> > RFC v1: https://lore.kernel.org/all/20251023071916.901355-1-den@valinux.co.jp/
> >
> > The goal is to improve performance between a host and an endpoint over
> > ntb_transport (typically with ntb_netdev on top). On R-Car S4, preliminary
> > iperf3 results show 10~20x throughput improvement. Latency improvements are
> > also observed.
>
> Great!
>
> >
> > In this approach, payload is transferred by DMA directly between host and
> > endpoint address spaces, and the NTB Memory Window is primarily used as a
> > control/metadata window (and to expose the eDMA register/LL regions).
> > Compared to the memcpy-based transport, this avoids extra copies,
> > enables deeper rings, and scales out to multiple queue pairs.
> >
> > Compared to RFC v2, data plane works in a symmetric manner in both
> > directions (host-to-endpoint and endpoint-to-host). The host side drives
> > remote read channels for its TX transfer while the endpoint drives local
> > write channels.
> >
> > Again, I recognize that this is quite a large series. Sorry for the volume,
> > but for the RFC stage I believe presenting the full picture in a single set
> > helps with reviewing the overall architecture (of course, detailed feedback
> > would be appreciated as well). Once the direction is agreed, I will respin
> > it split by subsystem and topic.
> >
> > Many thanks for all the reviews and feedback from multiple perspectives.
>
> The next two weeks are holidays, so I won't have much time to review this
> long thread. I have glanced over it as a whole.
>
> You can do some preparatory work to speed up upstreaming this great work:
>
> Split the preparatory work for the NTB changes into a new thread.
> Split the fixes/code cleanups into a new thread.
>
> Besides some simple cleanups,
> - start with the iATU address match mode support first.
> - then the eDMA changes, such as exporting the register base and LL region
>   to support remote DMA mode (you can add them to pci-epf-test.c for basic
>   testing).
Thank you for the review and for the guidance.
As suggested, I'll start preparing smaller, focused patchsets per
subsystem, dropping the RFC tag. Honestly, I haven't prepared anything for
the pci-epf-test.c addition yet, so I'll start working on that first.
Have a nice holiday,
Koichiro
>
> Frank
> >
> >
> > Data flow overview
> > ==================
> >
> > Figure 1. RC->EP traffic via ntb_netdev+ntb_transport
> > backed by Remote eDMA
> >
> > EP RC
> > phys addr phys addr
> > space space
> > +-+ +-+
> > | | | |
> > | | || | |
> > +-+-----. || | |
> > EDMA REG | | \ [A] || | |
> > +-+----. '---+-+ || | |
> > | | \ | |<---------[0-a]----------
> > +-+-----------| |<----------[2]----------.
> > EDMA LL | | | | || | | :
> > | | | | || | | :
> > +-+-----------+-+ || [B] | | :
> > | | || ++ | | :
> > ---------[0-b]----------->||----------------'
> > | | ++ || || | |
> > | | || || ++ | |
> > | | ||<----------[4]-----------
> > | | ++ || | |
> > | | [C] || | |
> > .--|#|<------------------------[3]------|#|<-.
> > : |#| || |#| :
> > [5] | | || | | [1]
> > : | | || | | :
> > '->|#| |#|--'
> > |#| |#|
> > | | | |
> >
> >
> > Figure 2. EP->RC traffic via ntb_netdev+ntb_transport
> > backed by EP-Local eDMA
> >
> > EP RC
> > phys addr phys addr
> > space space
> > +-+ +-+
> > | | | |
> > | | || | |
> > +-+ || | |
> > EDMA REG | | || | |
> > +-+ || | |
> > ^ | | || | |
> > : +-+ || | |
> > : EDMA LL | | || | |
> > : | | || | |
> > : +-+ || [C] | |
> > : | | || ++ | |
> > : -----------[4]----------->|| | |
> > : | | ++ || || | |
> > : | | || || ++ | |
> > '----------------[2]-----||<--------[0-b]-----------
> > | | ++ || | |
> > | | [B] || | |
> > .->|#|--------[3]---------------------->|#|--.
> > : |#| || |#| :
> > [1] | | || | | [5]
> > : | | || | | :
> > '--|#| |#|<-'
> > |#| |#|
> > | | | |
> >
> >
> > 0-a. configure Remote eDMA
> > 0-b. DMA-map and produce DAR
> > 1. memcpy while building skb in ntb_netdev case
> > 2. consume DAR, DMA-map SAR and kick DMA read transfer
> > 3. DMA transfer
> > 4. consume (commit)
> > 5. memcpy to application side
> >
> > [A]: MemoryWindow that aggregates eDMA regs and LL.
> > IB iATU translations (Address Match Mode).
> > [B]: Control plane ring buffer (for "produce")
> > [C]: Control plane ring buffer (for "consume")
> >
> > Note:
> > - Figure 1 is unchanged from RFC v2.
> > - Figure 2 differs from the one depicted in RFC v2 cover letter.
> >
> >
> > Changes since RFC v2
> > ====================
> >
> > RFCv2->RFCv3 changes:
> > - Architecture
> > - Have the EP side use its local write channels, while leaving the
> > RC side to use remote read channels.
> > - Improve abstraction and encapsulation of HW-specific details.
> > - Added control/config region versioning for the vNTB/EPF control region
> > so that mismatched RC/EP kernels fail early instead of silently using an
> > incompatible layout.
> > - Reworked BAR subrange / multi-region mapping support:
> > - Dropped the v2 approach that added new inbound mapping ops in the EPC
> > core.
> > - Introduced `struct pci_epf_bar.submap` and extended DesignWare EP to
> > support BAR subrange inbound mapping via Address Match Mode IB iATU.
> > - pci-epf-vntb now provides a subrange mapping hint to the EPC driver
> > when offsets are used.
> > - Changed .get_pci_epc() to .get_private_data()
> > - Dropped two commits from RFC v2 that should be submitted separately:
> > (1) ntb_transport debugfs seq_file conversion
> > (2) DWC EP outbound iATU MSI mapping/cache fix (will be re-posted separately)
> > - Added documentation updates.
> > - Addressed assorted review nits from the RFC v2 thread (naming/structure).
> >
> > RFCv1->RFCv2 changes:
> > - Architecture
> > - Drop the generic interrupt backend + DW eDMA test-interrupt backend
> > approach and instead adopt the remote eDMA-backed ntb_transport mode
> > proposed by Frank Li. The BAR-sharing / mwN_offset / inbound
> > mapping (Address Match Mode) infrastructure from RFC v1 is largely
> > kept, with only minor refinements and code motion where necessary
> > to fit the new transport-mode design.
> > - For Patch 01
> > - Rework the array_index_nospec() conversion to address review
> > comments on "[RFC PATCH 01/25]".
> >
> > RFCv2: https://lore.kernel.org/all/20251129160405.2568284-1-den@valinux.co.jp/
> > RFCv1: https://lore.kernel.org/all/20251023071916.901355-1-den@valinux.co.jp/
> >
> >
> > Patch layout
> > ============
> >
> > Patch 01-25 : preparation for Patch 26
> > - 01-07: support multiple MWs in a BAR
> > - 08-25: other misc preparations
> > Patch 26 : main and most important patch, adds eDMA-backed transport
> > Patch 27-28 : add multi-queue support; thanks to the remote eDMA,
> > performance scales
> > Patch 29-33 : handle several SoC-specific issues so that remote eDMA
> > mode ntb_transport works on R-Car S4
> > Patch 34-35 : kernel doc updates
> >
> >
> > Tested on
> > =========
> >
> > * 2x Renesas R-Car S4 Spider (RC<->EP connected with OcuLink cable)
> > * Kernel base: next-20251216 + [1] + [2] + [3]
> >
> > [1]: https://lore.kernel.org/all/20251210071358.2267494-2-cassel@kernel.org/
> > (this is a spin-out patch from
> > https://lore.kernel.org/linux-pci/20251129160405.2568284-20-den@valinux.co.jp/)
> > [2]: https://lore.kernel.org/all/20251208-dma_prep_config-v1-0-53490c5e1e2a@nxp.com/
> > (while it appears to still be under active discussion)
> > [3]: https://lore.kernel.org/all/20251217081955.3137163-1-den@valinux.co.jp/
> > (this is a spin-out patch from
> > https://lore.kernel.org/all/20251129160405.2568284-14-den@valinux.co.jp/)
> >
> >
> > Performance measurement
> > =======================
> >
> > No serious measurements yet, because:
> > * For "before the change", even use_dma/use_msi does not work on the
> > upstream kernel unless some R-Car S4 patches are applied. With an
> > unmerged patch series I had posted earlier (since superseded by this
> > RFC attempt), about 7 Gbps was observed in the RC->EP direction; a
> > pure upstream kernel only achieves around 500 Mbps.
> > * For "after the change", the measurements are not mature because this
> > RFC v3 patch series is not yet performance-optimized.
> >
> > Here are the rough measurements showing the achievable performance on
> > the R-Car S4:
> >
> > - Before this change:
> >
> > * ping
> > 64 bytes from 10.0.0.11: icmp_seq=1 ttl=64 time=12.3 ms
> > 64 bytes from 10.0.0.11: icmp_seq=2 ttl=64 time=6.58 ms
> > 64 bytes from 10.0.0.11: icmp_seq=3 ttl=64 time=1.26 ms
> > 64 bytes from 10.0.0.11: icmp_seq=4 ttl=64 time=7.43 ms
> > 64 bytes from 10.0.0.11: icmp_seq=5 ttl=64 time=1.39 ms
> > 64 bytes from 10.0.0.11: icmp_seq=6 ttl=64 time=7.38 ms
> > 64 bytes from 10.0.0.11: icmp_seq=7 ttl=64 time=1.42 ms
> > 64 bytes from 10.0.0.11: icmp_seq=8 ttl=64 time=7.41 ms
> >
> > * RC->EP (`sudo iperf3 -ub0 -l 65480 -P 2`)
> > [ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
> > [ 5] 0.00-10.01 sec 344 MBytes 288 Mbits/sec 3.483 ms 51/5555 (0.92%) receiver
> > [ 6] 0.00-10.01 sec 342 MBytes 287 Mbits/sec 3.814 ms 38/5517 (0.69%) receiver
> > [SUM] 0.00-10.01 sec 686 MBytes 575 Mbits/sec 3.648 ms 89/11072 (0.8%) receiver
> >
> > * EP->RC (`sudo iperf3 -ub0 -l 65480 -P 2`)
> > [ 5] 0.00-10.03 sec 334 MBytes 279 Mbits/sec 3.164 ms 390/5731 (6.8%) receiver
> > [ 6] 0.00-10.03 sec 334 MBytes 279 Mbits/sec 2.416 ms 396/5741 (6.9%) receiver
> > [SUM] 0.00-10.03 sec 667 MBytes 558 Mbits/sec 2.790 ms 786/11472 (6.9%) receiver
> >
> > Note: with `-P 2`, the best total bitrate (receiver side) was achieved.
> >
> > - After this change (use_remote_edma=1):
> >
> > * ping
> > 64 bytes from 10.0.0.11: icmp_seq=1 ttl=64 time=1.42 ms
> > 64 bytes from 10.0.0.11: icmp_seq=2 ttl=64 time=1.38 ms
> > 64 bytes from 10.0.0.11: icmp_seq=3 ttl=64 time=1.21 ms
> > 64 bytes from 10.0.0.11: icmp_seq=4 ttl=64 time=1.02 ms
> > 64 bytes from 10.0.0.11: icmp_seq=5 ttl=64 time=1.06 ms
> > 64 bytes from 10.0.0.11: icmp_seq=6 ttl=64 time=0.995 ms
> > 64 bytes from 10.0.0.11: icmp_seq=7 ttl=64 time=0.964 ms
> > 64 bytes from 10.0.0.11: icmp_seq=8 ttl=64 time=1.49 ms
> >
> > * RC->EP (`sudo iperf3 -ub0 -l 65480 -P 4`)
> > [ 5] 0.00-10.02 sec 3.00 GBytes 2.58 Gbits/sec 0.437 ms 33053/82329 (40%) receiver
> > [ 6] 0.00-10.02 sec 3.00 GBytes 2.58 Gbits/sec 0.174 ms 46379/95655 (48%) receiver
> > [ 9] 0.00-10.02 sec 2.88 GBytes 2.47 Gbits/sec 0.106 ms 47672/94924 (50%) receiver
> > [ 11] 0.00-10.02 sec 2.87 GBytes 2.46 Gbits/sec 0.364 ms 23694/70817 (33%) receiver
> > [SUM] 0.00-10.02 sec 11.8 GBytes 10.1 Gbits/sec 0.270 ms 150798/343725 (44%) receiver
> >
> > * EP->RC (`sudo iperf3 -ub0 -l 65480 -P 4`)
> > [ 5] 0.00-10.01 sec 3.28 GBytes 2.82 Gbits/sec 0.380 ms 38578/92355 (42%) receiver
> > [ 6] 0.00-10.01 sec 3.24 GBytes 2.78 Gbits/sec 0.430 ms 14268/67340 (21%) receiver
> > [ 9] 0.00-10.01 sec 2.92 GBytes 2.51 Gbits/sec 0.074 ms 0/47890 (0%) receiver
> > [ 11] 0.00-10.01 sec 4.76 GBytes 4.09 Gbits/sec 0.037 ms 0/78073 (0%) receiver
> > [SUM] 0.00-10.01 sec 14.2 GBytes 12.2 Gbits/sec 0.230 ms 52846/285658 (18%) receiver
> >
> > * configfs settings:
> > # modprobe pci_epf_vntb
> > # cd /sys/kernel/config/pci_ep/
> > # mkdir functions/pci_epf_vntb/func1
> > # echo 0x1912 > functions/pci_epf_vntb/func1/vendorid
> > # echo 0x0030 > functions/pci_epf_vntb/func1/deviceid
> > # echo 32 > functions/pci_epf_vntb/func1/msi_interrupts
> > # echo 16 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/db_count
> > # echo 128 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/spad_count
> > # echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/num_mws
> > # echo 0xe0000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1
> > # echo 0x20000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2
> > # echo 0xe0000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_offset
> > # echo 0x1912 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_vid
> > # echo 0x0030 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_pid
> > # echo 0x10 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vbus_number
> > # echo 0 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/ctrl_bar
> > # echo 4 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/db_bar
> > # echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1_bar
> > # echo 2 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw2_bar
> > # ln -s controllers/e65d0000.pcie-ep functions/pci_epf_vntb/func1/primary/
> > # echo 1 > controllers/e65d0000.pcie-ep/start
> >
> >
> >
> > Thank you for reviewing,
> >
> >
> > Koichiro Den (35):
> > PCI: endpoint: pci-epf-vntb: Use array_index_nospec() on mws_size[]
> > access
> > NTB: epf: Add mwN_offset support and config region versioning
> > PCI: dwc: ep: Support BAR subrange inbound mapping via address match
> > iATU
> > NTB: Add offset parameter to MW translation APIs
> > PCI: endpoint: pci-epf-vntb: Propagate MW offset from configfs when
> > present
> > NTB: ntb_transport: Support partial memory windows with offsets
> > PCI: endpoint: pci-epf-vntb: Hint subrange mapping preference to EPC
> > driver
> > NTB: core: Add .get_private_data() to ntb_dev_ops
> > NTB: epf: vntb: Implement .get_private_data() callback
> > dmaengine: dw-edma: Fix MSI data values for multi-vector IMWr
> > interrupts
> > NTB: ntb_transport: Move TX memory window setup into setup_qp_mw()
> > NTB: ntb_transport: Dynamically determine qp count
> > NTB: ntb_transport: Introduce get_dma_dev() helper
> > NTB: epf: Reserve a subset of MSI vectors for non-NTB users
> > NTB: ntb_transport: Move internal types to ntb_transport_internal.h
> > NTB: ntb_transport: Introduce ntb_transport_backend_ops
> > dmaengine: dw-edma: Add helper func to retrieve register base and size
> > dmaengine: dw-edma: Add per-channel interrupt routing mode
> > dmaengine: dw-edma: Poll completion when local IRQ handling is
> > disabled
> > dmaengine: dw-edma: Add notify-only channels support
> > dmaengine: dw-edma: Add a helper to retrieve LL (Linked List) region
> > dmaengine: dw-edma: Serialize RMW on shared interrupt registers
> > NTB: ntb_transport: Split core into ntb_transport_core.c
> > NTB: ntb_transport: Add additional hooks for DW eDMA backend
> > NTB: hw: Introduce DesignWare eDMA helper
> > NTB: ntb_transport: Introduce DW eDMA backed transport mode
> > NTB: epf: Provide db_vector_count/db_vector_mask callbacks
> > ntb_netdev: Multi-queue support
> > NTB: epf: Add per-SoC quirk to cap MRRS for DWC eDMA (128B for R-Car)
> > iommu: ipmmu-vmsa: Add PCIe ch0 to devices_allowlist
> > iommu: ipmmu-vmsa: Add support for reserved regions
> > arm64: dts: renesas: Add Spider RC/EP DTs for NTB with remote DW PCIe
> > eDMA
> > NTB: epf: Add an additional memory window (MW2) barno mapping on
> > Renesas R-Car
> > Documentation: PCI: endpoint: pci-epf-vntb: Update and add mwN_offset
> > usage
> > Documentation: driver-api: ntb: Document remote eDMA transport backend
> >
> > Documentation/PCI/endpoint/pci-vntb-howto.rst | 16 +-
> > Documentation/driver-api/ntb.rst | 58 +
> > arch/arm64/boot/dts/renesas/Makefile | 2 +
> > .../boot/dts/renesas/r8a779f0-spider-ep.dts | 37 +
> > .../boot/dts/renesas/r8a779f0-spider-rc.dts | 52 +
> > drivers/dma/dw-edma/dw-edma-core.c | 233 ++++-
> > drivers/dma/dw-edma/dw-edma-core.h | 13 +-
> > drivers/dma/dw-edma/dw-edma-v0-core.c | 39 +-
> > drivers/iommu/ipmmu-vmsa.c | 7 +-
> > drivers/net/ntb_netdev.c | 341 ++++--
> > drivers/ntb/Kconfig | 12 +
> > drivers/ntb/Makefile | 4 +
> > drivers/ntb/hw/amd/ntb_hw_amd.c | 6 +-
> > drivers/ntb/hw/edma/ntb_hw_edma.c | 754 +++++++++++++
> > drivers/ntb/hw/edma/ntb_hw_edma.h | 76 ++
> > drivers/ntb/hw/epf/ntb_hw_epf.c | 187 +++-
> > drivers/ntb/hw/idt/ntb_hw_idt.c | 3 +-
> > drivers/ntb/hw/intel/ntb_hw_gen1.c | 6 +-
> > drivers/ntb/hw/intel/ntb_hw_gen1.h | 2 +-
> > drivers/ntb/hw/intel/ntb_hw_gen3.c | 3 +-
> > drivers/ntb/hw/intel/ntb_hw_gen4.c | 6 +-
> > drivers/ntb/hw/mscc/ntb_hw_switchtec.c | 6 +-
> > drivers/ntb/msi.c | 6 +-
> > .../{ntb_transport.c => ntb_transport_core.c} | 482 ++++-----
> > drivers/ntb/ntb_transport_edma.c | 987 ++++++++++++++++++
> > drivers/ntb/ntb_transport_internal.h | 220 ++++
> > drivers/ntb/test/ntb_perf.c | 4 +-
> > drivers/ntb/test/ntb_tool.c | 6 +-
> > .../pci/controller/dwc/pcie-designware-ep.c | 198 +++-
> > drivers/pci/controller/dwc/pcie-designware.c | 25 +
> > drivers/pci/controller/dwc/pcie-designware.h | 2 +
> > drivers/pci/endpoint/functions/pci-epf-vntb.c | 246 ++++-
> > drivers/pci/endpoint/pci-epc-core.c | 2 +-
> > include/linux/dma/edma.h | 106 ++
> > include/linux/ntb.h | 38 +-
> > include/linux/ntb_transport.h | 5 +
> > include/linux/pci-epf.h | 27 +
> > 37 files changed, 3716 insertions(+), 501 deletions(-)
> > create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-ep.dts
> > create mode 100644 arch/arm64/boot/dts/renesas/r8a779f0-spider-rc.dts
> > create mode 100644 drivers/ntb/hw/edma/ntb_hw_edma.c
> > create mode 100644 drivers/ntb/hw/edma/ntb_hw_edma.h
> > rename drivers/ntb/{ntb_transport.c => ntb_transport_core.c} (91%)
> > create mode 100644 drivers/ntb/ntb_transport_edma.c
> > create mode 100644 drivers/ntb/ntb_transport_internal.h
> >
> > --
> > 2.51.0
> >