* [PATCH net-next v6 0/7] devmem/io_uring: allow more flexibility for ZC DMA devices
From: Dragos Tatulea @ 2025-08-27 14:39 UTC
To: almasrymina, asml.silence, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Jens Axboe,
Saeed Mahameed, Tariq Toukan, Mark Bloch, Leon Romanovsky,
Andrew Lunn
Cc: Dragos Tatulea, cratiu, parav, netdev, sdf, linux-kernel,
io-uring, linux-rdma
For TCP zerocopy rx (io_uring, devmem), there is an assumption that the
parent device can do DMA. However that is not always the case:
- Scalable Function netdevs [1] have the DMA device in the grandparent.
- For Multi-PF netdevs [2] queues can be associated to different DMA
devices.
The series adds an API for getting the DMA device for a netdev queue.
Drivers that have special requirements can implement the newly added
queue management op. Otherwise the parent will still be used as before.
The series then switches io_uring zcrx and devmem to this API and adds
an ndo_queue_get_dma_dev op for mlx5.
The last part of the series changes devmem rx bind to get the DMA device
per queue and blocks the case when multiple queues use different DMA
devices. The tx bind is left as is.
[1] Documentation/networking/device_drivers/ethernet/mellanox/mlx5/switchdev.rst
[2] Documentation/networking/multi-pf-netdev.rst
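As a rough illustration of the lookup rule described above (use the driver
op when present, otherwise fall back to the parent, and reject devices that
cannot do DMA), here is a minimal userspace sketch. All model_* names are
hypothetical stand-ins and not kernel types; can_dma stands in for a
non-NULL dma_mask, and the grandparent walk in model_sf_get_dma_dev() is an
assumed example of what an SF-like driver hook might return.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct device. */
struct model_device {
	int can_dma;                 /* models a non-NULL dma_mask */
	struct model_device *parent;
};

struct model_netdev;

/* Hypothetical stand-in for the queue management ops table. */
struct model_queue_ops {
	/* Optional per-queue hook; may be NULL. */
	struct model_device *(*get_dma_dev)(struct model_netdev *nd, int idx);
};

struct model_netdev {
	struct model_device dev;
	const struct model_queue_ops *ops;   /* may be NULL */
};

/* Per-queue lookup: driver hook if present, else the parent device. */
struct model_device *model_queue_get_dma_dev(struct model_netdev *nd, int idx)
{
	struct model_device *dma_dev;

	assert(nd != NULL);

	if (nd->ops && nd->ops->get_dma_dev)
		dma_dev = nd->ops->get_dma_dev(nd, idx);
	else
		dma_dev = nd->dev.parent;

	/* A device that cannot do DMA is reported as "no device". */
	return (dma_dev && dma_dev->can_dma) ? dma_dev : NULL;
}

/* Assumed SF-like hook: the DMA-capable device is the grandparent. */
struct model_device *model_sf_get_dma_dev(struct model_netdev *nd, int idx)
{
	(void)idx;   /* every queue shares one device in this model */
	return nd->dev.parent ? nd->dev.parent->parent : NULL;
}
```

In this model a netdev without the hook and with a non-DMA parent yields
NULL, which is the case the series is meant to avoid for SFs.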
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
---
Changes since v5 [6]:
- Added NL_SET_BAD_ATTR for incorrect rq idx. (patch 6)
Changes since v4 [5]:
- Dropped EXPORT_SYMBOL of netdev_queue_get_dma_dev() (patch 1).
- Fixed nits, typos and line length issues.
Changes since v3 [4]:
- Moved netdev_queue_get_dma_dev() from header to own file (patch 1).
- Used real_num_rx_queues for queue bitmap (patch 6).
- Allocate zeroed bitmap (patch 6).
- Validate queue index (patch 6).
- Dropped rxq_dma_dev check (patch 7).
- Fixed incorrect handling of extack message on bad dma dev (patch 7).
- Added conflicting queues in error message (patch 7).
- Dropped RFC status as feedback was mostly positive.
Changes since v2 [3]:
- Downgraded to RFC status until consensus is reached.
- Implemented more generic approach as discussed during
v2 review.
- Refactored devmem to get the DMA device for multiple rx queues for
multi-PF netdev support.
- Renamed series with a more generic name.
Changes since v1 [2]:
- Dropped the Fixes tag.
- Added more documentation as requested.
- Renamed the patch title to better reflect its purpose.
Changes since RFC [1]:
- Upgraded from RFC status.
- Dropped driver specific bits for generic solution.
- Implemented single patch as a fix as requested in RFC.
- Multi-PF netdevs will be handled in a subsequent patch series.
[1] RFC: https://lore.kernel.org/all/20250702172433.1738947-2-dtatulea@nvidia.com/
[2] v1: https://lore.kernel.org/all/20250709124059.516095-2-dtatulea@nvidia.com/
[3] v2: https://lore.kernel.org/all/20250711092634.2733340-2-dtatulea@nvidia.com/
[4] v3: https://lore.kernel.org/all/20250815110401.2254214-2-dtatulea@nvidia.com/
[5] v4: https://lore.kernel.org/all/20250820171214.3597901-1-dtatulea@nvidia.com/
[6] v5: https://lore.kernel.org/all/20250825063655.583454-1-dtatulea@nvidia.com/
---
Dragos Tatulea (7):
queue_api: add support for fetching per queue DMA dev
io_uring/zcrx: add support for custom DMA devices
net: devmem: get netdev DMA device via new API
net/mlx5e: add op for getting netdev DMA device
net: devmem: pull out dma_dev out of net_devmem_bind_dmabuf
net: devmem: pre-read requested rx queues during bind
net: devmem: allow binding on rx queues with same DMA devices
.../net/ethernet/mellanox/mlx5/core/en_main.c | 24 ++++
include/net/netdev_queues.h | 7 +
io_uring/zcrx.c | 3 +-
net/core/Makefile | 1 +
net/core/devmem.c | 8 +-
net/core/devmem.h | 2 +
net/core/netdev-genl.c | 122 +++++++++++++-----
net/core/netdev_queues.c | 27 ++++
8 files changed, 163 insertions(+), 31 deletions(-)
create mode 100644 net/core/netdev_queues.c
--
2.50.1
* [PATCH net-next v6 1/7] queue_api: add support for fetching per queue DMA dev
From: Dragos Tatulea @ 2025-08-27 14:39 UTC
To: almasrymina, asml.silence, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: Dragos Tatulea, cratiu, parav, netdev, sdf, linux-kernel
For zerocopy (io_uring, devmem), there is an assumption that the
parent device can do DMA. However that is not always the case:
- Scalable Function netdevs [1] have the DMA device in the grandparent.
- For Multi-PF netdevs [2] queues can be associated to different DMA
devices.
This patch introduces a queue-based interface that lets drivers expose
a different DMA device for zerocopy.
[1] Documentation/networking/device_drivers/ethernet/mellanox/mlx5/switchdev.rst
[2] Documentation/networking/multi-pf-netdev.rst
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
---
include/net/netdev_queues.h | 7 +++++++
net/core/Makefile | 1 +
net/core/netdev_queues.c | 27 +++++++++++++++++++++++++++
3 files changed, 35 insertions(+)
create mode 100644 net/core/netdev_queues.c
diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
index 6e835972abd1..b9d02bc65c97 100644
--- a/include/net/netdev_queues.h
+++ b/include/net/netdev_queues.h
@@ -127,6 +127,9 @@ void netdev_stat_queue_sum(struct net_device *netdev,
* @ndo_queue_stop: Stop the RX queue at the specified index. The stopped
* queue's memory is written at the specified address.
*
+ * @ndo_queue_get_dma_dev: Get dma device for zero-copy operations to be used
+ * for this queue. Return NULL on error.
+ *
* Note that @ndo_queue_mem_alloc and @ndo_queue_mem_free may be called while
* the interface is closed. @ndo_queue_start and @ndo_queue_stop will only
* be called for an interface which is open.
@@ -144,6 +147,8 @@ struct netdev_queue_mgmt_ops {
int (*ndo_queue_stop)(struct net_device *dev,
void *per_queue_mem,
int idx);
+ struct device * (*ndo_queue_get_dma_dev)(struct net_device *dev,
+ int idx);
};
/**
@@ -321,4 +326,6 @@ static inline void netif_subqueue_sent(const struct net_device *dev,
get_desc, start_thrs); \
})
+struct device *netdev_queue_get_dma_dev(struct net_device *dev, int idx);
+
#endif
diff --git a/net/core/Makefile b/net/core/Makefile
index b2a76ce33932..9ef2099c5426 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -20,6 +20,7 @@ obj-$(CONFIG_NETDEV_ADDR_LIST_TEST) += dev_addr_lists_test.o
obj-y += net-sysfs.o
obj-y += hotdata.o
obj-y += netdev_rx_queue.o
+obj-y += netdev_queues.o
obj-$(CONFIG_PAGE_POOL) += page_pool.o page_pool_user.o
obj-$(CONFIG_PROC_FS) += net-procfs.o
obj-$(CONFIG_NET_PKTGEN) += pktgen.o
diff --git a/net/core/netdev_queues.c b/net/core/netdev_queues.c
new file mode 100644
index 000000000000..251f27a8307f
--- /dev/null
+++ b/net/core/netdev_queues.c
@@ -0,0 +1,27 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include <net/netdev_queues.h>
+
+/**
+ * netdev_queue_get_dma_dev() - get dma device for zero-copy operations
+ * @dev: net_device
+ * @idx: queue index
+ *
+ * Get dma device for zero-copy operations to be used for this queue.
+ * When such device is not available or valid, the function will return NULL.
+ *
+ * Return: Device or NULL on error
+ */
+struct device *netdev_queue_get_dma_dev(struct net_device *dev, int idx)
+{
+ const struct netdev_queue_mgmt_ops *queue_ops = dev->queue_mgmt_ops;
+ struct device *dma_dev;
+
+ if (queue_ops && queue_ops->ndo_queue_get_dma_dev)
+ dma_dev = queue_ops->ndo_queue_get_dma_dev(dev, idx);
+ else
+ dma_dev = dev->dev.parent;
+
+ return dma_dev && dma_dev->dma_mask ? dma_dev : NULL;
+}
+
--
2.50.1
* [PATCH net-next v6 2/7] io_uring/zcrx: add support for custom DMA devices
From: Dragos Tatulea @ 2025-08-27 14:39 UTC
To: almasrymina, asml.silence, Jens Axboe
Cc: Dragos Tatulea, cratiu, parav, netdev, sdf, io-uring,
linux-kernel
Use the new API for getting a DMA device for a specific netdev queue.
This patch will allow io_uring zero-copy rx to work with devices
where the DMA device is not stored in the parent device. mlx5 SFs
are an example of such a device.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/zcrx.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index e5ff49f3425e..319eddfd30e0 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -12,6 +12,7 @@
#include <net/page_pool/helpers.h>
#include <net/page_pool/memory_provider.h>
#include <net/netlink.h>
+#include <net/netdev_queues.h>
#include <net/netdev_rx_queue.h>
#include <net/tcp.h>
#include <net/rps.h>
@@ -599,7 +600,7 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
goto err;
}
- ifq->dev = ifq->netdev->dev.parent;
+ ifq->dev = netdev_queue_get_dma_dev(ifq->netdev, ifq->if_rxq);
if (!ifq->dev) {
ret = -EOPNOTSUPP;
goto err;
--
2.50.1
* [PATCH net-next v6 3/7] net: devmem: get netdev DMA device via new API
From: Dragos Tatulea @ 2025-08-27 14:39 UTC
To: almasrymina, asml.silence, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: Dragos Tatulea, cratiu, parav, netdev, sdf, linux-kernel
Switch to the new API for fetching DMA devices for a netdev. The API is
called with queue index 0 for now, which is equivalent to the previous
behavior.
This patch will allow devmem to work with devices where the DMA device
is not stored in the parent device. mlx5 SFs are an example of such a
device.
Multi-PF netdevs are still problematic (as they were before this
change). Upcoming patches will address this for the rx binding.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
---
net/core/devmem.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/net/core/devmem.c b/net/core/devmem.c
index 24c591ab38ae..c58b24128727 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -182,6 +182,7 @@ net_devmem_bind_dmabuf(struct net_device *dev,
{
struct net_devmem_dmabuf_binding *binding;
static u32 id_alloc_next;
+ struct device *dma_dev;
struct scatterlist *sg;
struct dma_buf *dmabuf;
unsigned int sg_idx, i;
@@ -192,6 +193,13 @@ net_devmem_bind_dmabuf(struct net_device *dev,
if (IS_ERR(dmabuf))
return ERR_CAST(dmabuf);
+ dma_dev = netdev_queue_get_dma_dev(dev, 0);
+ if (!dma_dev) {
+ err = -EOPNOTSUPP;
+ NL_SET_ERR_MSG(extack, "Device doesn't support DMA");
+ goto err_put_dmabuf;
+ }
+
binding = kzalloc_node(sizeof(*binding), GFP_KERNEL,
dev_to_node(&dev->dev));
if (!binding) {
@@ -209,7 +217,7 @@ net_devmem_bind_dmabuf(struct net_device *dev,
binding->dmabuf = dmabuf;
binding->direction = direction;
- binding->attachment = dma_buf_attach(binding->dmabuf, dev->dev.parent);
+ binding->attachment = dma_buf_attach(binding->dmabuf, dma_dev);
if (IS_ERR(binding->attachment)) {
err = PTR_ERR(binding->attachment);
NL_SET_ERR_MSG(extack, "Failed to bind dmabuf to device");
--
2.50.1
* [PATCH net-next v6 4/7] net/mlx5e: add op for getting netdev DMA device
From: Dragos Tatulea @ 2025-08-27 14:39 UTC
To: almasrymina, asml.silence, Saeed Mahameed, Tariq Toukan,
Mark Bloch, Leon Romanovsky, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Dragos Tatulea, cratiu, parav, netdev, sdf, linux-rdma,
linux-kernel
For zero-copy (devmem, io_uring), the netdev DMA device used
is the parent device of the net device. However that is not
always accurate for mlx5 devices:
- SFs: The parent device is an auxdev.
- Multi-PF netdevs: The DMA device should be determined by
the queue.
This change implements the DMA device queue API that returns the DMA
device appropriately for all cases.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
---
.../net/ethernet/mellanox/mlx5/core/en_main.c | 24 +++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 21bb88c5d3dc..0e48065a46eb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -5625,12 +5625,36 @@ static int mlx5e_queue_start(struct net_device *dev, void *newq,
return 0;
}
+static struct device *mlx5e_queue_get_dma_dev(struct net_device *dev,
+ int queue_index)
+{
+ struct mlx5e_priv *priv = netdev_priv(dev);
+ struct mlx5e_channels *channels;
+ struct device *pdev = NULL;
+ struct mlx5e_channel *ch;
+
+ channels = &priv->channels;
+
+ mutex_lock(&priv->state_lock);
+
+ if (queue_index >= channels->num)
+ goto out;
+
+ ch = channels->c[queue_index];
+ pdev = ch->pdev;
+out:
+ mutex_unlock(&priv->state_lock);
+
+ return pdev;
+}
+
static const struct netdev_queue_mgmt_ops mlx5e_queue_mgmt_ops = {
.ndo_queue_mem_size = sizeof(struct mlx5_qmgmt_data),
.ndo_queue_mem_alloc = mlx5e_queue_mem_alloc,
.ndo_queue_mem_free = mlx5e_queue_mem_free,
.ndo_queue_start = mlx5e_queue_start,
.ndo_queue_stop = mlx5e_queue_stop,
+ .ndo_queue_get_dma_dev = mlx5e_queue_get_dma_dev,
};
static void mlx5e_build_nic_netdev(struct net_device *netdev)
--
2.50.1
* [PATCH net-next v6 5/7] net: devmem: pull out dma_dev out of net_devmem_bind_dmabuf
From: Dragos Tatulea @ 2025-08-27 14:39 UTC
To: almasrymina, asml.silence, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: Dragos Tatulea, cratiu, parav, netdev, sdf, linux-kernel
Fetch the DMA device before calling net_devmem_bind_dmabuf()
and pass it on as a parameter.
This is needed for an upcoming change which will read the
DMA device per queue.
This patch has no functional changes.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
---
net/core/devmem.c | 14 ++++++--------
net/core/devmem.h | 2 ++
net/core/netdev-genl.c | 12 ++++++++----
3 files changed, 16 insertions(+), 12 deletions(-)
diff --git a/net/core/devmem.c b/net/core/devmem.c
index c58b24128727..d9de31a6cc7f 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -176,30 +176,28 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
struct net_devmem_dmabuf_binding *
net_devmem_bind_dmabuf(struct net_device *dev,
+ struct device *dma_dev,
enum dma_data_direction direction,
unsigned int dmabuf_fd, struct netdev_nl_sock *priv,
struct netlink_ext_ack *extack)
{
struct net_devmem_dmabuf_binding *binding;
static u32 id_alloc_next;
- struct device *dma_dev;
struct scatterlist *sg;
struct dma_buf *dmabuf;
unsigned int sg_idx, i;
unsigned long virtual;
int err;
- dmabuf = dma_buf_get(dmabuf_fd);
- if (IS_ERR(dmabuf))
- return ERR_CAST(dmabuf);
-
- dma_dev = netdev_queue_get_dma_dev(dev, 0);
if (!dma_dev) {
- err = -EOPNOTSUPP;
NL_SET_ERR_MSG(extack, "Device doesn't support DMA");
- goto err_put_dmabuf;
+ return ERR_PTR(-EOPNOTSUPP);
}
+ dmabuf = dma_buf_get(dmabuf_fd);
+ if (IS_ERR(dmabuf))
+ return ERR_CAST(dmabuf);
+
binding = kzalloc_node(sizeof(*binding), GFP_KERNEL,
dev_to_node(&dev->dev));
if (!binding) {
diff --git a/net/core/devmem.h b/net/core/devmem.h
index 41cd6e1c9141..101150d761af 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -85,6 +85,7 @@ struct dmabuf_genpool_chunk_owner {
void __net_devmem_dmabuf_binding_free(struct work_struct *wq);
struct net_devmem_dmabuf_binding *
net_devmem_bind_dmabuf(struct net_device *dev,
+ struct device *dma_dev,
enum dma_data_direction direction,
unsigned int dmabuf_fd, struct netdev_nl_sock *priv,
struct netlink_ext_ack *extack);
@@ -170,6 +171,7 @@ static inline void net_devmem_put_net_iov(struct net_iov *niov)
static inline struct net_devmem_dmabuf_binding *
net_devmem_bind_dmabuf(struct net_device *dev,
+ struct device *dma_dev,
enum dma_data_direction direction,
unsigned int dmabuf_fd,
struct netdev_nl_sock *priv,
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 6314eb7bdf69..3e2d6aa6e060 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -876,6 +876,7 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
u32 ifindex, dmabuf_fd, rxq_idx;
struct netdev_nl_sock *priv;
struct net_device *netdev;
+ struct device *dma_dev;
struct sk_buff *rsp;
struct nlattr *attr;
int rem, err = 0;
@@ -921,8 +922,9 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
goto err_unlock;
}
- binding = net_devmem_bind_dmabuf(netdev, DMA_FROM_DEVICE, dmabuf_fd,
- priv, info->extack);
+ dma_dev = netdev_queue_get_dma_dev(netdev, 0);
+ binding = net_devmem_bind_dmabuf(netdev, dma_dev, DMA_FROM_DEVICE,
+ dmabuf_fd, priv, info->extack);
if (IS_ERR(binding)) {
err = PTR_ERR(binding);
goto err_unlock;
@@ -986,6 +988,7 @@ int netdev_nl_bind_tx_doit(struct sk_buff *skb, struct genl_info *info)
struct net_devmem_dmabuf_binding *binding;
struct netdev_nl_sock *priv;
struct net_device *netdev;
+ struct device *dma_dev;
u32 ifindex, dmabuf_fd;
struct sk_buff *rsp;
int err = 0;
@@ -1032,8 +1035,9 @@ int netdev_nl_bind_tx_doit(struct sk_buff *skb, struct genl_info *info)
goto err_unlock_netdev;
}
- binding = net_devmem_bind_dmabuf(netdev, DMA_TO_DEVICE, dmabuf_fd, priv,
- info->extack);
+ dma_dev = netdev_queue_get_dma_dev(netdev, 0);
+ binding = net_devmem_bind_dmabuf(netdev, dma_dev, DMA_TO_DEVICE,
+ dmabuf_fd, priv, info->extack);
if (IS_ERR(binding)) {
err = PTR_ERR(binding);
goto err_unlock_netdev;
--
2.50.1
* [PATCH net-next v6 6/7] net: devmem: pre-read requested rx queues during bind
From: Dragos Tatulea @ 2025-08-27 14:40 UTC
To: almasrymina, asml.silence, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: Dragos Tatulea, cratiu, parav, netdev, sdf, linux-kernel
Instead of reading the requested rx queues after binding the buffer,
read the rx queues in advance in a bitmap and iterate over them when
needed.
This is a preparation for fetching the DMA device for each queue.
This patch has no functional changes besides adding an extra
rq index bounds check.
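The pre-read step can be pictured with a small userspace sketch: validate
every requested queue index against the queue count, record it in a zeroed
bitmap, and let later stages iterate over the set bits. The model_* helpers
below are hypothetical and only mimic the idea; they are not the kernel
bitmap API, and the flat array of requested indices stands in for the
netlink attributes.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Allocate a zeroed bitmap with room for nbits bits; release with free(). */
unsigned long *model_bitmap_zalloc(size_t nbits)
{
	size_t words = (nbits + MODEL_BITS_PER_WORD - 1) / MODEL_BITS_PER_WORD;

	return calloc(words ? words : 1, sizeof(unsigned long));
}

/* Return 1 if the given bit is set, 0 otherwise. */
int model_test_bit(const unsigned long *bitmap, size_t bit)
{
	return (int)((bitmap[bit / MODEL_BITS_PER_WORD] >>
		      (bit % MODEL_BITS_PER_WORD)) & 1UL);
}

/*
 * Record the requested queue indices in the bitmap.
 * Returns 0 on success, or -1 as soon as an index is out of range.
 * On failure the bitmap may be partially filled; the caller is expected
 * to discard it.
 */
int model_read_rxq_bitmap(const unsigned int *requested, size_t n_requested,
			  size_t n_queues, unsigned long *bitmap)
{
	assert(bitmap != NULL);

	for (size_t i = 0; i < n_requested; i++) {
		size_t idx = requested[i];

		if (idx >= n_queues)
			return -1;   /* the extra bounds check */

		bitmap[idx / MODEL_BITS_PER_WORD] |=
			1UL << (idx % MODEL_BITS_PER_WORD);
	}

	return 0;
}
```

One side effect of using a bitmap, at least in this model, is that a queue
index requested twice is recorded only once.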
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
---
net/core/netdev-genl.c | 85 ++++++++++++++++++++++++++++--------------
1 file changed, 58 insertions(+), 27 deletions(-)
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 3e2d6aa6e060..739598d34657 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -869,17 +869,55 @@ int netdev_nl_qstats_get_dumpit(struct sk_buff *skb,
return err;
}
-int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
+static int netdev_nl_read_rxq_bitmap(struct genl_info *info,
+ u32 rxq_bitmap_len,
+ unsigned long *rxq_bitmap)
{
+ const int maxtype = ARRAY_SIZE(netdev_queue_id_nl_policy) - 1;
struct nlattr *tb[ARRAY_SIZE(netdev_queue_id_nl_policy)];
+ struct nlattr *attr;
+ int rem, err = 0;
+ u32 rxq_idx;
+
+ nla_for_each_attr_type(attr, NETDEV_A_DMABUF_QUEUES,
+ genlmsg_data(info->genlhdr),
+ genlmsg_len(info->genlhdr), rem) {
+ err = nla_parse_nested(tb, maxtype, attr,
+ netdev_queue_id_nl_policy, info->extack);
+ if (err < 0)
+ return err;
+
+ if (NL_REQ_ATTR_CHECK(info->extack, attr, tb, NETDEV_A_QUEUE_ID) ||
+ NL_REQ_ATTR_CHECK(info->extack, attr, tb, NETDEV_A_QUEUE_TYPE))
+ return -EINVAL;
+
+ if (nla_get_u32(tb[NETDEV_A_QUEUE_TYPE]) != NETDEV_QUEUE_TYPE_RX) {
+ NL_SET_BAD_ATTR(info->extack, tb[NETDEV_A_QUEUE_TYPE]);
+ return -EINVAL;
+ }
+
+ rxq_idx = nla_get_u32(tb[NETDEV_A_QUEUE_ID]);
+ if (rxq_idx >= rxq_bitmap_len) {
+ NL_SET_BAD_ATTR(info->extack, tb[NETDEV_A_QUEUE_ID]);
+ return -EINVAL;
+ }
+
+ bitmap_set(rxq_bitmap, rxq_idx, 1);
+ }
+
+ return 0;
+}
+
+int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
+{
struct net_devmem_dmabuf_binding *binding;
u32 ifindex, dmabuf_fd, rxq_idx;
struct netdev_nl_sock *priv;
struct net_device *netdev;
+ unsigned long *rxq_bitmap;
struct device *dma_dev;
struct sk_buff *rsp;
- struct nlattr *attr;
- int rem, err = 0;
+ int err = 0;
void *hdr;
if (GENL_REQ_ATTR_CHECK(info, NETDEV_A_DEV_IFINDEX) ||
@@ -922,37 +960,26 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
goto err_unlock;
}
+ rxq_bitmap = bitmap_zalloc(netdev->real_num_rx_queues, GFP_KERNEL);
+ if (!rxq_bitmap) {
+ err = -ENOMEM;
+ goto err_unlock;
+ }
+
+ err = netdev_nl_read_rxq_bitmap(info, netdev->real_num_rx_queues,
+ rxq_bitmap);
+ if (err)
+ goto err_rxq_bitmap;
+
dma_dev = netdev_queue_get_dma_dev(netdev, 0);
binding = net_devmem_bind_dmabuf(netdev, dma_dev, DMA_FROM_DEVICE,
dmabuf_fd, priv, info->extack);
if (IS_ERR(binding)) {
err = PTR_ERR(binding);
- goto err_unlock;
+ goto err_rxq_bitmap;
}
- nla_for_each_attr_type(attr, NETDEV_A_DMABUF_QUEUES,
- genlmsg_data(info->genlhdr),
- genlmsg_len(info->genlhdr), rem) {
- err = nla_parse_nested(
- tb, ARRAY_SIZE(netdev_queue_id_nl_policy) - 1, attr,
- netdev_queue_id_nl_policy, info->extack);
- if (err < 0)
- goto err_unbind;
-
- if (NL_REQ_ATTR_CHECK(info->extack, attr, tb, NETDEV_A_QUEUE_ID) ||
- NL_REQ_ATTR_CHECK(info->extack, attr, tb, NETDEV_A_QUEUE_TYPE)) {
- err = -EINVAL;
- goto err_unbind;
- }
-
- if (nla_get_u32(tb[NETDEV_A_QUEUE_TYPE]) != NETDEV_QUEUE_TYPE_RX) {
- NL_SET_BAD_ATTR(info->extack, tb[NETDEV_A_QUEUE_TYPE]);
- err = -EINVAL;
- goto err_unbind;
- }
-
- rxq_idx = nla_get_u32(tb[NETDEV_A_QUEUE_ID]);
-
+ for_each_set_bit(rxq_idx, rxq_bitmap, netdev->real_num_rx_queues) {
err = net_devmem_bind_dmabuf_to_queue(netdev, rxq_idx, binding,
info->extack);
if (err)
@@ -966,6 +993,8 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
if (err)
goto err_unbind;
+ bitmap_free(rxq_bitmap);
+
netdev_unlock(netdev);
mutex_unlock(&priv->lock);
@@ -974,6 +1003,8 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
err_unbind:
net_devmem_unbind_dmabuf(binding);
+err_rxq_bitmap:
+ bitmap_free(rxq_bitmap);
err_unlock:
netdev_unlock(netdev);
err_unlock_sock:
--
2.50.1
* [PATCH net-next v6 7/7] net: devmem: allow binding on rx queues with same DMA devices
From: Dragos Tatulea @ 2025-08-27 14:40 UTC
To: almasrymina, asml.silence, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: Dragos Tatulea, cratiu, parav, netdev, sdf, linux-kernel
Multi-PF netdevs have queues belonging to different PFs, which also means
different DMA devices. As a result, the DMA buffer can end up bound to
the wrong device.
This change allows devmem binding to multiple queues only when the
queues have the same DMA device. Otherwise an error is returned.
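The rule can be sketched in userspace as follows: walk the selected queues,
remember the DMA device of the first one, and fail on the first queue whose
device differs. This is a simplified model under stated assumptions, not
the code in the diff below: model_device, the queue_dev table and the
selected array are hypothetical, and the sketch uses an explicit
"first queue seen" flag rather than the patch's exact comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-in for struct device; only pointer identity matters. */
struct model_device {
	int id;
};

/*
 * queue_dev[i] is the DMA device of queue i (hypothetical lookup table).
 * selected[i] != 0 marks queue i as requested.
 *
 * Returns 0 and stores the common device in *out (NULL if no queue was
 * selected or the selected queues have no DMA device).
 * Returns -1 on a mismatch and reports the two conflicting queue indices
 * through conflict_a and conflict_b when those pointers are non-NULL.
 */
int model_common_dma_dev(struct model_device *const *queue_dev,
			 const unsigned char *selected, size_t n_queues,
			 struct model_device **out,
			 size_t *conflict_a, size_t *conflict_b)
{
	struct model_device *common = NULL;
	size_t first = 0;
	int have_first = 0;

	assert(queue_dev != NULL && selected != NULL && out != NULL);

	for (size_t i = 0; i < n_queues; i++) {
		if (!selected[i])
			continue;

		if (!have_first) {
			/* First selected queue defines the expected device. */
			common = queue_dev[i];
			first = i;
			have_first = 1;
		} else if (queue_dev[i] != common) {
			if (conflict_a)
				*conflict_a = first;
			if (conflict_b)
				*conflict_b = i;
			return -1;
		}
	}

	*out = common;
	return 0;
}
```

A caller would treat a NULL result as "device doesn't support DMA" and a
-1 result as the multi-PF mismatch case.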
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
---
net/core/netdev-genl.c | 31 ++++++++++++++++++++++++++++++-
1 file changed, 30 insertions(+), 1 deletion(-)
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 739598d34657..470fabbeacd9 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -908,6 +908,30 @@ static int netdev_nl_read_rxq_bitmap(struct genl_info *info,
return 0;
}
+static struct device *
+netdev_nl_get_dma_dev(struct net_device *netdev, unsigned long *rxq_bitmap,
+ struct netlink_ext_ack *extack)
+{
+ struct device *dma_dev = NULL;
+ u32 rxq_idx, prev_rxq_idx;
+
+ for_each_set_bit(rxq_idx, rxq_bitmap, netdev->real_num_rx_queues) {
+ struct device *rxq_dma_dev;
+
+ rxq_dma_dev = netdev_queue_get_dma_dev(netdev, rxq_idx);
+ if (dma_dev && rxq_dma_dev != dma_dev) {
+ NL_SET_ERR_MSG_FMT(extack, "DMA device mismatch between queue %u and %u (multi-PF device?)",
+ rxq_idx, prev_rxq_idx);
+ return ERR_PTR(-EOPNOTSUPP);
+ }
+
+ dma_dev = rxq_dma_dev;
+ prev_rxq_idx = rxq_idx;
+ }
+
+ return dma_dev;
+}
+
int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
{
struct net_devmem_dmabuf_binding *binding;
@@ -971,7 +995,12 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
if (err)
goto err_rxq_bitmap;
- dma_dev = netdev_queue_get_dma_dev(netdev, 0);
+ dma_dev = netdev_nl_get_dma_dev(netdev, rxq_bitmap, info->extack);
+ if (IS_ERR(dma_dev)) {
+ err = PTR_ERR(dma_dev);
+ goto err_rxq_bitmap;
+ }
+
binding = net_devmem_bind_dmabuf(netdev, dma_dev, DMA_FROM_DEVICE,
dmabuf_fd, priv, info->extack);
if (IS_ERR(binding)) {
--
2.50.1
* Re: [PATCH net-next v6 0/7] devmem/io_uring: allow more flexibility for ZC DMA devices
From: patchwork-bot+netdevbpf @ 2025-08-28 23:50 UTC
To: Dragos Tatulea
Cc: almasrymina, asml.silence, davem, edumazet, kuba, pabeni, horms,
axboe, saeedm, tariqt, mbloch, leon, andrew+netdev, cratiu, parav,
netdev, sdf, linux-kernel, io-uring, linux-rdma
Hello:
This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Wed, 27 Aug 2025 17:39:54 +0300 you wrote:
> For TCP zerocopy rx (io_uring, devmem), there is an assumption that the
> parent device can do DMA. However that is not always the case:
> - Scalable Function netdevs [1] have the DMA device in the grandparent.
> - For Multi-PF netdevs [2] queues can be associated to different DMA
> devices.
>
> The series adds an API for getting the DMA device for a netdev queue.
> Drivers that have special requirements can implement the newly added
> queue management op. Otherwise the parent will still be used as before.
>
> [...]
Here is the summary with links:
- [net-next,v6,1/7] queue_api: add support for fetching per queue DMA dev
https://git.kernel.org/netdev/net-next/c/13d8e05adf9d
- [net-next,v6,2/7] io_uring/zcrx: add support for custom DMA devices
https://git.kernel.org/netdev/net-next/c/59b8b32ac8d4
- [net-next,v6,3/7] net: devmem: get netdev DMA device via new API
https://git.kernel.org/netdev/net-next/c/7c7e94603a76
- [net-next,v6,4/7] net/mlx5e: add op for getting netdev DMA device
https://git.kernel.org/netdev/net-next/c/f1debf1a2ef4
- [net-next,v6,5/7] net: devmem: pull out dma_dev out of net_devmem_bind_dmabuf
https://git.kernel.org/netdev/net-next/c/512c88fb0e88
- [net-next,v6,6/7] net: devmem: pre-read requested rx queues during bind
https://git.kernel.org/netdev/net-next/c/1b416902cd25
- [net-next,v6,7/7] net: devmem: allow binding on rx queues with same DMA devices
https://git.kernel.org/netdev/net-next/c/b8aab4bb9585
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html