netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v13 00/10] io_uring zero copy rx
@ 2025-02-04 21:56 David Wei
  2025-02-04 21:56 ` [PATCH net-next v13 01/10] net: page_pool: don't cast mp param to devmem David Wei
                   ` (11 more replies)
  0 siblings, 12 replies; 19+ messages in thread
From: David Wei @ 2025-02-04 21:56 UTC (permalink / raw)
  To: netdev
  Cc: Pavel Begunkov, Jakub Kicinski, Paolo Abeni, David S. Miller,
	Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry,
	Stanislav Fomichev, Joe Damato, Pedro Tammela

This patchset contains net/ patches needed by a new io_uring request
implementing zero copy rx into userspace pages, eliminating a kernel
to user copy.

We configure a page pool that a driver uses to fill a hw rx queue to
hand out user pages instead of kernel pages. Any data that ends up
hitting this hw rx queue will thus be dma'd into userspace memory
directly, without needing to be bounced through kernel memory. 'Reading'
data out of a socket instead becomes a _notification_ mechanism, where
the kernel tells userspace where the data is. The overall approach is
similar to the devmem TCP proposal.

This relies on hw header/data split, flow steering and RSS to ensure
packet headers remain in kernel memory and only desired flows hit a hw
rx queue configured for zero copy. Configuring this is outside of the
scope of this patchset.

We share netdev core infra with devmem TCP. The main difference is that
io_uring is used for the uAPI and the lifetime of all objects are bound
to an io_uring instance. Data is 'read' using a new io_uring request
type. When done, data is returned via a new shared refill queue. A zero
copy page pool refills a hw rx queue from this refill queue directly. Of
course, the lifetime of these data buffers are managed by io_uring
rather than the networking stack, with different refcounting rules.

This patchset is the first step adding basic zero copy support. We will
extend this iteratively with new features e.g. dynamically allocated
zero copy areas, THP support, dmabuf support, improved copy fallback,
general optimisations and more.

In terms of netdev support, we're first targeting Broadcom bnxt. Patches
aren't included since Taehee Yoo has already sent a more comprehensive
patchset adding support in [1]. Google gve should already support this,
and Mellanox mlx5 support is WIP pending driver changes.

===========
Performance
===========

Note: Comparison with epoll + TCP_ZEROCOPY_RECEIVE isn't done yet.

Test setup:
* AMD EPYC 9454
* Broadcom BCM957508 200G
* Kernel v6.11 base [2]
* liburing fork [3]
* kperf fork [4]
* 4K MTU
* Single TCP flow

With application thread + net rx softirq pinned to _different_ cores:

+-------------------------------+
| epoll     | io_uring          |
|-----------|-------------------|
| 82.2 Gbps | 116.2 Gbps (+41%) |
+-------------------------------+

Pinned to _same_ core:

+-------------------------------+
| epoll     | io_uring          |
|-----------|-------------------|
| 62.6 Gbps | 80.9 Gbps (+29%)  |
+-------------------------------+

=====
Links
=====

Broadcom bnxt support:
[1]: https://lore.kernel.org/netdev/20241003160620.1521626-8-ap420073@gmail.com/

Linux kernel branch including io_uring bits:
[2]: https://github.com/isilence/linux.git zcrx/v13

liburing for testing:
[3]: https://github.com/isilence/liburing.git zcrx/next

kperf for testing:
[4]: https://git.kernel.dk/kperf.git

Changes in v13:
---------------
* Fix race between io_uring closing and netdev unregister
* Regenerate Netlink YAML

Changes in v12:
---------------
* Check nla_nest_start() errors
* Don't leak a netdev, add missing netdev_put()
* Warn on failed queue restart during close

Changes in v11:
---------------
* Add a shim provider helper for page_pool_set_dma_addr_netmem()
* Drop netdev in ->uninstall, pin struct device instead
* Add net_mp_open_rxq() and net_mp_close_rxq()
* Remove unneeded CFLAGS += -I/usr/include/ in selftest Makefile

Changes in v10:
---------------
* Fix !CONFIG_PAGE_POOL build
* Use acquire/release for RQ in examples
* Fix page_pool_ref_netmem for net_iov
* Move provider helpers / definitions into a new file
* Don’t export page_pool_{set,clear}_pp_info, introduce
  net_mp_niov_{set,clear}_page_pool() instead
* Remove devmem.h from net/core/page_pool_user.c
* Add Netdev yaml for io-uring attribute
* Add memory provider ops for filling in Netlink info

Changes in v9:
--------------
* Fail proof against multiple page pools running the same memory
  provider
  * Lock the consumer side of the refill queue.
  * Move scrub into io_uring exit.
  * Kill napi_execute.
  * Kill area init api and export finer grained net helpers as partial
    init now need to happen in ->alloc_netmems()
* Separate user refcounting.
  * Fix copy fallback path math.
* Add rodata check to page_pool_init()
* Fix incorrect path in documentation

Changes in v8:
--------------
* add documentation and selftest
* use io_uring regions for the refill ring

Changes in v7:
--------------
net:
* Use NAPI_F_PREFER_BUSY_POLL for napi_execute + stylistics changes.

Changes in v6:
--------------
Please note: Comparison with TCP_ZEROCOPY_RECEIVE isn't done yet.

net:
* Drop a devmem.h clean up patch.
* Migrate to netdev_get_by_index from deprecated API.
* Fix !CONFIG_NET_DEVMEM build.
* Don’t return into the page pool cache directly, use a new helper
* Refactor napi_execute

io_uring:
* Require IORING_RECV_MULTISHOT flag set.
* Add unselectable CONFIG_IO_URING_ZCRX.
* Pulled latest io_uring changes.
* Unexport io_uring_pp_zc_ops.

Changes in v5:
--------------
* Rebase on top of merged net_iov + netmem infra.
* Decouple net_iov from devmem TCP.
* Use netdev queue API to allocate an rx queue.
* Minor uAPI enhancements for future extensibility.
* QoS improvements with request throttling.

Changes in RFC v4:
------------------
* Rebased on top of Mina Almasry's TCP devmem patchset and latest
  net-next, now sharing common infra e.g.:
    * netmem_t and net_iovs
    * Page pool memory provider
* The registered buffer (rbuf) completion queue where completions from
  io_recvzc requests are posted is removed. Now these post into the main
  completion queue, using big (32-byte) CQEs. The first 16 bytes is an
  ordinary CQE, while the latter 16 bytes contain the io_uring_rbuf_cqe
  as before. This vastly simplifies the uAPI and removes a level of
  indirection in userspace when looking for payloads.
  * The rbuf refill queue is still needed for userspace to return
    buffers to kernel.
* Simplified code and uAPI on the io_uring side, particularly
  io_recvzc() and io_zc_rx_recv(). Many unnecessary lines were removed
  e.g. extra msg flags, readlen, etc.

Changes in RFC v3:
------------------
* Rebased on top of Jakub Kicinski's memory provider API RFC. The ZC
  pool added is now a backend for memory provider.
* We're also reusing ppiov infrastructure. The refcounting rules stay
  the same but it's shifted into ppiov->refcount. That lets us to
  flexibly manage buffer lifetimes without adding any extra code to the
  common networking paths. It'd also make it easier to support dmabufs
  and device memory in the future.
  * io_uring also knows about pages, and so ppiovs might unnecessarily
    break tools inspecting data, that can easily be solved later.

Many patches are not for upstream as they depend on work in progress,
namely from Mina:

* struct netmem_t
* Driver ndo commands for Rx queue configs
* struct page_pool_iov and shared pp infra

Changes in RFC v2:
------------------
* Added copy fallback support if userspace memory allocated for ZC Rx
  runs out, or if header splitting or flow steering fails.
* Added veth support for ZC Rx, for testing and demonstration. We will
  need to figure out what driver would be best for such testing
  functionality in the future. Perhaps netdevsim?
* Added socket registration API to io_uring to associate specific
  sockets with ifqs/Rx queues for ZC.
* Added multi-socket support, such that multiple connections can be
  steered into the same hardware Rx queue.
* Added Netbench server/client support.

David Wei (2):
  netdev: add io_uring memory provider info
  net: add helpers for setting a memory provider on an rx queue

Pavel Begunkov (8):
  net: page_pool: don't cast mp param to devmem
  net: prefix devmem specific helpers
  net: generalise net_iov chunk owners
  net: page_pool: create hooks for custom memory providers
  net: page_pool: add callback for mp info printing
  net: page_pool: add a mp hook to unregister_netdevice*
  net: prepare for non devmem TCP memory providers
  net: page_pool: add memory provider helpers

 Documentation/netlink/specs/netdev.yaml | 15 ++++
 include/net/netmem.h                    | 21 +++++-
 include/net/page_pool/memory_provider.h | 45 ++++++++++++
 include/net/page_pool/types.h           |  4 ++
 include/uapi/linux/netdev.h             |  7 ++
 net/core/dev.c                          | 16 ++++-
 net/core/devmem.c                       | 93 ++++++++++++++++---------
 net/core/devmem.h                       | 49 ++++++-------
 net/core/netdev-genl.c                  | 11 +--
 net/core/netdev_rx_queue.c              | 69 ++++++++++++++++++
 net/core/page_pool.c                    | 51 +++++++++++---
 net/core/page_pool_user.c               |  7 +-
 net/ipv4/tcp.c                          |  7 +-
 tools/include/uapi/linux/netdev.h       |  7 ++
 14 files changed, 321 insertions(+), 81 deletions(-)
 create mode 100644 include/net/page_pool/memory_provider.h

-- 
2.43.5


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH net-next v13 01/10] net: page_pool: don't cast mp param to devmem
  2025-02-04 21:56 [PATCH net-next v13 00/10] io_uring zero copy rx David Wei
@ 2025-02-04 21:56 ` David Wei
  2025-02-04 21:56 ` [PATCH net-next v13 02/10] net: prefix devmem specific helpers David Wei
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: David Wei @ 2025-02-04 21:56 UTC (permalink / raw)
  To: netdev
  Cc: Pavel Begunkov, Jakub Kicinski, Paolo Abeni, David S. Miller,
	Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry,
	Stanislav Fomichev, Joe Damato, Pedro Tammela

From: Pavel Begunkov <asml.silence@gmail.com>

page_pool_check_memory_provider() is a generic path and shouldn't assume
anything about the actual type of the memory provider argument. It's
fine while devmem is the only provider, but cast away the devmem
specific binding types to avoid confusion.

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David Wei <dw@davidwei.uk>
---
 net/core/page_pool_user.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index 6677e0c2e256..d5e214c30c31 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -356,7 +356,7 @@ void page_pool_unlist(struct page_pool *pool)
 int page_pool_check_memory_provider(struct net_device *dev,
 				    struct netdev_rx_queue *rxq)
 {
-	struct net_devmem_dmabuf_binding *binding = rxq->mp_params.mp_priv;
+	void *binding = rxq->mp_params.mp_priv;
 	struct page_pool *pool;
 	struct hlist_node *n;
 
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v13 02/10] net: prefix devmem specific helpers
  2025-02-04 21:56 [PATCH net-next v13 00/10] io_uring zero copy rx David Wei
  2025-02-04 21:56 ` [PATCH net-next v13 01/10] net: page_pool: don't cast mp param to devmem David Wei
@ 2025-02-04 21:56 ` David Wei
  2025-02-04 21:56 ` [PATCH net-next v13 03/10] net: generalise net_iov chunk owners David Wei
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: David Wei @ 2025-02-04 21:56 UTC (permalink / raw)
  To: netdev
  Cc: Pavel Begunkov, Jakub Kicinski, Paolo Abeni, David S. Miller,
	Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry,
	Stanislav Fomichev, Joe Damato, Pedro Tammela

From: Pavel Begunkov <asml.silence@gmail.com>

Add prefixes to all helpers that are specific to devmem TCP, i.e.
net_iov_binding[_id].

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David Wei <dw@davidwei.uk>
---
 net/core/devmem.c |  2 +-
 net/core/devmem.h | 14 +++++++-------
 net/ipv4/tcp.c    |  2 +-
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/net/core/devmem.c b/net/core/devmem.c
index 3bba3f018df0..66cd1ab9224f 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -94,7 +94,7 @@ net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding)
 
 void net_devmem_free_dmabuf(struct net_iov *niov)
 {
-	struct net_devmem_dmabuf_binding *binding = net_iov_binding(niov);
+	struct net_devmem_dmabuf_binding *binding = net_devmem_iov_binding(niov);
 	unsigned long dma_addr = net_devmem_get_dma_addr(niov);
 
 	if (WARN_ON(!gen_pool_has_addr(binding->chunk_pool, dma_addr,
diff --git a/net/core/devmem.h b/net/core/devmem.h
index 76099ef9c482..99782ddeca40 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -86,11 +86,16 @@ static inline unsigned int net_iov_idx(const struct net_iov *niov)
 }
 
 static inline struct net_devmem_dmabuf_binding *
-net_iov_binding(const struct net_iov *niov)
+net_devmem_iov_binding(const struct net_iov *niov)
 {
 	return net_iov_owner(niov)->binding;
 }
 
+static inline u32 net_devmem_iov_binding_id(const struct net_iov *niov)
+{
+	return net_devmem_iov_binding(niov)->id;
+}
+
 static inline unsigned long net_iov_virtual_addr(const struct net_iov *niov)
 {
 	struct dmabuf_genpool_chunk_owner *owner = net_iov_owner(niov);
@@ -99,11 +104,6 @@ static inline unsigned long net_iov_virtual_addr(const struct net_iov *niov)
 	       ((unsigned long)net_iov_idx(niov) << PAGE_SHIFT);
 }
 
-static inline u32 net_iov_binding_id(const struct net_iov *niov)
-{
-	return net_iov_owner(niov)->binding->id;
-}
-
 static inline void
 net_devmem_dmabuf_binding_get(struct net_devmem_dmabuf_binding *binding)
 {
@@ -171,7 +171,7 @@ static inline unsigned long net_iov_virtual_addr(const struct net_iov *niov)
 	return 0;
 }
 
-static inline u32 net_iov_binding_id(const struct net_iov *niov)
+static inline u32 net_devmem_iov_binding_id(const struct net_iov *niov)
 {
 	return 0;
 }
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0d704bda6c41..b872de9a8271 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2494,7 +2494,7 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
 
 				/* Will perform the exchange later */
 				dmabuf_cmsg.frag_token = tcp_xa_pool.tokens[tcp_xa_pool.idx];
-				dmabuf_cmsg.dmabuf_id = net_iov_binding_id(niov);
+				dmabuf_cmsg.dmabuf_id = net_devmem_iov_binding_id(niov);
 
 				offset += copy;
 				remaining_len -= copy;
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v13 03/10] net: generalise net_iov chunk owners
  2025-02-04 21:56 [PATCH net-next v13 00/10] io_uring zero copy rx David Wei
  2025-02-04 21:56 ` [PATCH net-next v13 01/10] net: page_pool: don't cast mp param to devmem David Wei
  2025-02-04 21:56 ` [PATCH net-next v13 02/10] net: prefix devmem specific helpers David Wei
@ 2025-02-04 21:56 ` David Wei
  2025-02-04 21:56 ` [PATCH net-next v13 04/10] net: page_pool: create hooks for custom memory providers David Wei
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: David Wei @ 2025-02-04 21:56 UTC (permalink / raw)
  To: netdev
  Cc: Pavel Begunkov, Jakub Kicinski, Paolo Abeni, David S. Miller,
	Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry,
	Stanislav Fomichev, Joe Damato, Pedro Tammela

From: Pavel Begunkov <asml.silence@gmail.com>

Currently net_iov stores a pointer to struct dmabuf_genpool_chunk_owner,
which serves as a useful abstraction to share data and provide a
context. However, it's too devmem specific, and we want to reuse it for
other memory providers, and for that we need to decouple net_iov from
devmem. Make net_iov to point to a new base structure called
net_iov_area, which dmabuf_genpool_chunk_owner extends.

Reviewed-by: Mina Almasry <almasrymina@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David Wei <dw@davidwei.uk>
---
 include/net/netmem.h | 21 ++++++++++++++++++++-
 net/core/devmem.c    | 25 +++++++++++++------------
 net/core/devmem.h    | 25 +++++++++----------------
 3 files changed, 42 insertions(+), 29 deletions(-)

diff --git a/include/net/netmem.h b/include/net/netmem.h
index 1b58faa4f20f..c61d5b21e7b4 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -24,11 +24,20 @@ struct net_iov {
 	unsigned long __unused_padding;
 	unsigned long pp_magic;
 	struct page_pool *pp;
-	struct dmabuf_genpool_chunk_owner *owner;
+	struct net_iov_area *owner;
 	unsigned long dma_addr;
 	atomic_long_t pp_ref_count;
 };
 
+struct net_iov_area {
+	/* Array of net_iovs for this area. */
+	struct net_iov *niovs;
+	size_t num_niovs;
+
+	/* Offset into the dma-buf where this chunk starts.  */
+	unsigned long base_virtual;
+};
+
 /* These fields in struct page are used by the page_pool and net stack:
  *
  *        struct {
@@ -54,6 +63,16 @@ NET_IOV_ASSERT_OFFSET(dma_addr, dma_addr);
 NET_IOV_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
 #undef NET_IOV_ASSERT_OFFSET
 
+static inline struct net_iov_area *net_iov_owner(const struct net_iov *niov)
+{
+	return niov->owner;
+}
+
+static inline unsigned int net_iov_idx(const struct net_iov *niov)
+{
+	return niov - net_iov_owner(niov)->niovs;
+}
+
 /* netmem */
 
 /**
diff --git a/net/core/devmem.c b/net/core/devmem.c
index 66cd1ab9224f..fb0dddcb4e60 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -33,14 +33,15 @@ static void net_devmem_dmabuf_free_chunk_owner(struct gen_pool *genpool,
 {
 	struct dmabuf_genpool_chunk_owner *owner = chunk->owner;
 
-	kvfree(owner->niovs);
+	kvfree(owner->area.niovs);
 	kfree(owner);
 }
 
 static dma_addr_t net_devmem_get_dma_addr(const struct net_iov *niov)
 {
-	struct dmabuf_genpool_chunk_owner *owner = net_iov_owner(niov);
+	struct dmabuf_genpool_chunk_owner *owner;
 
+	owner = net_devmem_iov_to_chunk_owner(niov);
 	return owner->base_dma_addr +
 	       ((dma_addr_t)net_iov_idx(niov) << PAGE_SHIFT);
 }
@@ -83,7 +84,7 @@ net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding)
 
 	offset = dma_addr - owner->base_dma_addr;
 	index = offset / PAGE_SIZE;
-	niov = &owner->niovs[index];
+	niov = &owner->area.niovs[index];
 
 	niov->pp_magic = 0;
 	niov->pp = NULL;
@@ -261,9 +262,9 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
 			goto err_free_chunks;
 		}
 
-		owner->base_virtual = virtual;
+		owner->area.base_virtual = virtual;
 		owner->base_dma_addr = dma_addr;
-		owner->num_niovs = len / PAGE_SIZE;
+		owner->area.num_niovs = len / PAGE_SIZE;
 		owner->binding = binding;
 
 		err = gen_pool_add_owner(binding->chunk_pool, dma_addr,
@@ -275,17 +276,17 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
 			goto err_free_chunks;
 		}
 
-		owner->niovs = kvmalloc_array(owner->num_niovs,
-					      sizeof(*owner->niovs),
-					      GFP_KERNEL);
-		if (!owner->niovs) {
+		owner->area.niovs = kvmalloc_array(owner->area.num_niovs,
+						   sizeof(*owner->area.niovs),
+						   GFP_KERNEL);
+		if (!owner->area.niovs) {
 			err = -ENOMEM;
 			goto err_free_chunks;
 		}
 
-		for (i = 0; i < owner->num_niovs; i++) {
-			niov = &owner->niovs[i];
-			niov->owner = owner;
+		for (i = 0; i < owner->area.num_niovs; i++) {
+			niov = &owner->area.niovs[i];
+			niov->owner = &owner->area;
 			page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov),
 						      net_devmem_get_dma_addr(niov));
 		}
diff --git a/net/core/devmem.h b/net/core/devmem.h
index 99782ddeca40..a2b9913e9a17 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -10,6 +10,8 @@
 #ifndef _NET_DEVMEM_H
 #define _NET_DEVMEM_H
 
+#include <net/netmem.h>
+
 struct netlink_ext_ack;
 
 struct net_devmem_dmabuf_binding {
@@ -51,17 +53,11 @@ struct net_devmem_dmabuf_binding {
  * allocations from this chunk.
  */
 struct dmabuf_genpool_chunk_owner {
-	/* Offset into the dma-buf where this chunk starts.  */
-	unsigned long base_virtual;
+	struct net_iov_area area;
+	struct net_devmem_dmabuf_binding *binding;
 
 	/* dma_addr of the start of the chunk.  */
 	dma_addr_t base_dma_addr;
-
-	/* Array of net_iovs for this chunk. */
-	struct net_iov *niovs;
-	size_t num_niovs;
-
-	struct net_devmem_dmabuf_binding *binding;
 };
 
 void __net_devmem_dmabuf_binding_free(struct net_devmem_dmabuf_binding *binding);
@@ -75,20 +71,17 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
 void dev_dmabuf_uninstall(struct net_device *dev);
 
 static inline struct dmabuf_genpool_chunk_owner *
-net_iov_owner(const struct net_iov *niov)
+net_devmem_iov_to_chunk_owner(const struct net_iov *niov)
 {
-	return niov->owner;
-}
+	struct net_iov_area *owner = net_iov_owner(niov);
 
-static inline unsigned int net_iov_idx(const struct net_iov *niov)
-{
-	return niov - net_iov_owner(niov)->niovs;
+	return container_of(owner, struct dmabuf_genpool_chunk_owner, area);
 }
 
 static inline struct net_devmem_dmabuf_binding *
 net_devmem_iov_binding(const struct net_iov *niov)
 {
-	return net_iov_owner(niov)->binding;
+	return net_devmem_iov_to_chunk_owner(niov)->binding;
 }
 
 static inline u32 net_devmem_iov_binding_id(const struct net_iov *niov)
@@ -98,7 +91,7 @@ static inline u32 net_devmem_iov_binding_id(const struct net_iov *niov)
 
 static inline unsigned long net_iov_virtual_addr(const struct net_iov *niov)
 {
-	struct dmabuf_genpool_chunk_owner *owner = net_iov_owner(niov);
+	struct net_iov_area *owner = net_iov_owner(niov);
 
 	return owner->base_virtual +
 	       ((unsigned long)net_iov_idx(niov) << PAGE_SHIFT);
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v13 04/10] net: page_pool: create hooks for custom memory providers
  2025-02-04 21:56 [PATCH net-next v13 00/10] io_uring zero copy rx David Wei
                   ` (2 preceding siblings ...)
  2025-02-04 21:56 ` [PATCH net-next v13 03/10] net: generalise net_iov chunk owners David Wei
@ 2025-02-04 21:56 ` David Wei
  2025-02-04 21:56 ` [PATCH net-next v13 05/10] netdev: add io_uring memory provider info David Wei
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: David Wei @ 2025-02-04 21:56 UTC (permalink / raw)
  To: netdev
  Cc: Pavel Begunkov, Jakub Kicinski, Paolo Abeni, David S. Miller,
	Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry,
	Stanislav Fomichev, Joe Damato, Pedro Tammela

From: Pavel Begunkov <asml.silence@gmail.com>

A spin off from the original page pool memory providers patch by Jakub,
which allows extending page pools with custom allocators. One of such
providers is devmem TCP, and the other is io_uring zerocopy added in
following patches.

Link: https://lore.kernel.org/netdev/20230707183935.997267-7-kuba@kernel.org/
Co-developed-by: Jakub Kicinski <kuba@kernel.org> # initial mp proposal
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David Wei <dw@davidwei.uk>
---
 include/net/page_pool/memory_provider.h | 15 +++++++++++++++
 include/net/page_pool/types.h           |  4 ++++
 net/core/devmem.c                       | 15 ++++++++++++++-
 net/core/page_pool.c                    | 23 +++++++++++++++--------
 4 files changed, 48 insertions(+), 9 deletions(-)
 create mode 100644 include/net/page_pool/memory_provider.h

diff --git a/include/net/page_pool/memory_provider.h b/include/net/page_pool/memory_provider.h
new file mode 100644
index 000000000000..e49d0a52629d
--- /dev/null
+++ b/include/net/page_pool/memory_provider.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _NET_PAGE_POOL_MEMORY_PROVIDER_H
+#define _NET_PAGE_POOL_MEMORY_PROVIDER_H
+
+#include <net/netmem.h>
+#include <net/page_pool/types.h>
+
+struct memory_provider_ops {
+	netmem_ref (*alloc_netmems)(struct page_pool *pool, gfp_t gfp);
+	bool (*release_netmem)(struct page_pool *pool, netmem_ref netmem);
+	int (*init)(struct page_pool *pool);
+	void (*destroy)(struct page_pool *pool);
+};
+
+#endif
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 7f405672b089..36eb57d73abc 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -152,8 +152,11 @@ struct page_pool_stats {
  */
 #define PAGE_POOL_FRAG_GROUP_ALIGN	(4 * sizeof(long))
 
+struct memory_provider_ops;
+
 struct pp_memory_provider_params {
 	void *mp_priv;
+	const struct memory_provider_ops *mp_ops;
 };
 
 struct page_pool {
@@ -216,6 +219,7 @@ struct page_pool {
 	struct ptr_ring ring;
 
 	void *mp_priv;
+	const struct memory_provider_ops *mp_ops;
 
 #ifdef CONFIG_PAGE_POOL_STATS
 	/* recycle stats are per-cpu to avoid locking */
diff --git a/net/core/devmem.c b/net/core/devmem.c
index fb0dddcb4e60..c81625ca57c6 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -16,6 +16,7 @@
 #include <net/netdev_queues.h>
 #include <net/netdev_rx_queue.h>
 #include <net/page_pool/helpers.h>
+#include <net/page_pool/memory_provider.h>
 #include <trace/events/page_pool.h>
 
 #include "devmem.h"
@@ -27,6 +28,8 @@
 /* Protected by rtnl_lock() */
 static DEFINE_XARRAY_FLAGS(net_devmem_dmabuf_bindings, XA_FLAGS_ALLOC1);
 
+static const struct memory_provider_ops dmabuf_devmem_ops;
+
 static void net_devmem_dmabuf_free_chunk_owner(struct gen_pool *genpool,
 					       struct gen_pool_chunk *chunk,
 					       void *not_used)
@@ -118,6 +121,7 @@ void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding)
 		WARN_ON(rxq->mp_params.mp_priv != binding);
 
 		rxq->mp_params.mp_priv = NULL;
+		rxq->mp_params.mp_ops = NULL;
 
 		rxq_idx = get_netdev_rx_queue_index(rxq);
 
@@ -153,7 +157,7 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
 	}
 
 	rxq = __netif_get_rx_queue(dev, rxq_idx);
-	if (rxq->mp_params.mp_priv) {
+	if (rxq->mp_params.mp_ops) {
 		NL_SET_ERR_MSG(extack, "designated queue already memory provider bound");
 		return -EEXIST;
 	}
@@ -171,6 +175,7 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
 		return err;
 
 	rxq->mp_params.mp_priv = binding;
+	rxq->mp_params.mp_ops = &dmabuf_devmem_ops;
 
 	err = netdev_rx_queue_restart(dev, rxq_idx);
 	if (err)
@@ -180,6 +185,7 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
 
 err_xa_erase:
 	rxq->mp_params.mp_priv = NULL;
+	rxq->mp_params.mp_ops = NULL;
 	xa_erase(&binding->bound_rxqs, xa_idx);
 
 	return err;
@@ -399,3 +405,10 @@ bool mp_dmabuf_devmem_release_page(struct page_pool *pool, netmem_ref netmem)
 	/* We don't want the page pool put_page()ing our net_iovs. */
 	return false;
 }
+
+static const struct memory_provider_ops dmabuf_devmem_ops = {
+	.init			= mp_dmabuf_devmem_init,
+	.destroy		= mp_dmabuf_devmem_destroy,
+	.alloc_netmems		= mp_dmabuf_devmem_alloc_netmems,
+	.release_netmem		= mp_dmabuf_devmem_release_page,
+};
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index f5e908c9e7ad..d632cf2c91c3 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -13,6 +13,7 @@
 
 #include <net/netdev_rx_queue.h>
 #include <net/page_pool/helpers.h>
+#include <net/page_pool/memory_provider.h>
 #include <net/xdp.h>
 
 #include <linux/dma-direction.h>
@@ -285,13 +286,19 @@ static int page_pool_init(struct page_pool *pool,
 		rxq = __netif_get_rx_queue(pool->slow.netdev,
 					   pool->slow.queue_idx);
 		pool->mp_priv = rxq->mp_params.mp_priv;
+		pool->mp_ops = rxq->mp_params.mp_ops;
 	}
 
-	if (pool->mp_priv) {
+	if (pool->mp_ops) {
 		if (!pool->dma_map || !pool->dma_sync)
 			return -EOPNOTSUPP;
 
-		err = mp_dmabuf_devmem_init(pool);
+		if (WARN_ON(!is_kernel_rodata((unsigned long)pool->mp_ops))) {
+			err = -EFAULT;
+			goto free_ptr_ring;
+		}
+
+		err = pool->mp_ops->init(pool);
 		if (err) {
 			pr_warn("%s() mem-provider init failed %d\n", __func__,
 				err);
@@ -587,8 +594,8 @@ netmem_ref page_pool_alloc_netmems(struct page_pool *pool, gfp_t gfp)
 		return netmem;
 
 	/* Slow-path: cache empty, do real allocation */
-	if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_priv)
-		netmem = mp_dmabuf_devmem_alloc_netmems(pool, gfp);
+	if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops)
+		netmem = pool->mp_ops->alloc_netmems(pool, gfp);
 	else
 		netmem = __page_pool_alloc_pages_slow(pool, gfp);
 	return netmem;
@@ -679,8 +686,8 @@ void page_pool_return_page(struct page_pool *pool, netmem_ref netmem)
 	bool put;
 
 	put = true;
-	if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_priv)
-		put = mp_dmabuf_devmem_release_page(pool, netmem);
+	if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops)
+		put = pool->mp_ops->release_netmem(pool, netmem);
 	else
 		__page_pool_release_page_dma(pool, netmem);
 
@@ -1048,8 +1055,8 @@ static void __page_pool_destroy(struct page_pool *pool)
 	page_pool_unlist(pool);
 	page_pool_uninit(pool);
 
-	if (pool->mp_priv) {
-		mp_dmabuf_devmem_destroy(pool);
+	if (pool->mp_ops) {
+		pool->mp_ops->destroy(pool);
 		static_branch_dec(&page_pool_mem_providers);
 	}
 
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v13 05/10] netdev: add io_uring memory provider info
  2025-02-04 21:56 [PATCH net-next v13 00/10] io_uring zero copy rx David Wei
                   ` (3 preceding siblings ...)
  2025-02-04 21:56 ` [PATCH net-next v13 04/10] net: page_pool: create hooks for custom memory providers David Wei
@ 2025-02-04 21:56 ` David Wei
  2025-02-04 21:56 ` [PATCH net-next v13 06/10] net: page_pool: add callback for mp info printing David Wei
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: David Wei @ 2025-02-04 21:56 UTC (permalink / raw)
  To: netdev
  Cc: Pavel Begunkov, Jakub Kicinski, Paolo Abeni, David S. Miller,
	Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry,
	Stanislav Fomichev, Joe Damato, Pedro Tammela

Add a nested attribute for io_uring memory provider info. For now it is
empty and its presence indicates that a particular page pool or queue
has an io_uring memory provider attached.

$ ./cli.py --spec netlink/specs/netdev.yaml --dump page-pool-get
[{'id': 80,
  'ifindex': 2,
  'inflight': 64,
  'inflight-mem': 262144,
  'napi-id': 525},
 {'id': 79,
  'ifindex': 2,
  'inflight': 320,
  'inflight-mem': 1310720,
  'io_uring': {},
  'napi-id': 525},
...

$ ./cli.py --spec netlink/specs/netdev.yaml --dump queue-get
[{'id': 0, 'ifindex': 1, 'type': 'rx'},
 {'id': 0, 'ifindex': 1, 'type': 'tx'},
 {'id': 0, 'ifindex': 2, 'napi-id': 513, 'type': 'rx'},
 {'id': 1, 'ifindex': 2, 'napi-id': 514, 'type': 'rx'},
...
 {'id': 12, 'ifindex': 2, 'io_uring': {}, 'napi-id': 525, 'type': 'rx'},
...

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David Wei <dw@davidwei.uk>
---
 Documentation/netlink/specs/netdev.yaml | 15 +++++++++++++++
 include/uapi/linux/netdev.h             |  7 +++++++
 tools/include/uapi/linux/netdev.h       |  7 +++++++
 3 files changed, 29 insertions(+)

diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index cbb544bd6c84..288923e965ae 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -114,6 +114,9 @@ attribute-sets:
         doc: Bitmask of enabled AF_XDP features.
         type: u64
         enum: xsk-flags
+  -
+    name: io-uring-provider-info
+    attributes: []
   -
     name: page-pool
     attributes:
@@ -171,6 +174,11 @@ attribute-sets:
         name: dmabuf
         doc: ID of the dmabuf this page-pool is attached to.
         type: u32
+      -
+        name: io-uring
+        doc: io-uring memory provider information.
+        type: nest
+        nested-attributes: io-uring-provider-info
   -
     name: page-pool-info
     subset-of: page-pool
@@ -296,6 +304,11 @@ attribute-sets:
         name: dmabuf
         doc: ID of the dmabuf attached to this queue, if any.
         type: u32
+      -
+        name: io-uring
+        doc: io_uring memory provider information.
+        type: nest
+        nested-attributes: io-uring-provider-info
 
   -
     name: qstats
@@ -572,6 +585,7 @@ operations:
             - inflight-mem
             - detach-time
             - dmabuf
+            - io-uring
       dump:
         reply: *pp-reply
       config-cond: page-pool
@@ -637,6 +651,7 @@ operations:
             - napi-id
             - ifindex
             - dmabuf
+            - io-uring
       dump:
         request:
           attributes:
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index e4be227d3ad6..6c6ee183802d 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -86,6 +86,11 @@ enum {
 	NETDEV_A_DEV_MAX = (__NETDEV_A_DEV_MAX - 1)
 };
 
+enum {
+	__NETDEV_A_IO_URING_PROVIDER_INFO_MAX,
+	NETDEV_A_IO_URING_PROVIDER_INFO_MAX = (__NETDEV_A_IO_URING_PROVIDER_INFO_MAX - 1)
+};
+
 enum {
 	NETDEV_A_PAGE_POOL_ID = 1,
 	NETDEV_A_PAGE_POOL_IFINDEX,
@@ -94,6 +99,7 @@ enum {
 	NETDEV_A_PAGE_POOL_INFLIGHT_MEM,
 	NETDEV_A_PAGE_POOL_DETACH_TIME,
 	NETDEV_A_PAGE_POOL_DMABUF,
+	NETDEV_A_PAGE_POOL_IO_URING,
 
 	__NETDEV_A_PAGE_POOL_MAX,
 	NETDEV_A_PAGE_POOL_MAX = (__NETDEV_A_PAGE_POOL_MAX - 1)
@@ -136,6 +142,7 @@ enum {
 	NETDEV_A_QUEUE_TYPE,
 	NETDEV_A_QUEUE_NAPI_ID,
 	NETDEV_A_QUEUE_DMABUF,
+	NETDEV_A_QUEUE_IO_URING,
 
 	__NETDEV_A_QUEUE_MAX,
 	NETDEV_A_QUEUE_MAX = (__NETDEV_A_QUEUE_MAX - 1)
diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h
index e4be227d3ad6..6c6ee183802d 100644
--- a/tools/include/uapi/linux/netdev.h
+++ b/tools/include/uapi/linux/netdev.h
@@ -86,6 +86,11 @@ enum {
 	NETDEV_A_DEV_MAX = (__NETDEV_A_DEV_MAX - 1)
 };
 
+enum {
+	__NETDEV_A_IO_URING_PROVIDER_INFO_MAX,
+	NETDEV_A_IO_URING_PROVIDER_INFO_MAX = (__NETDEV_A_IO_URING_PROVIDER_INFO_MAX - 1)
+};
+
 enum {
 	NETDEV_A_PAGE_POOL_ID = 1,
 	NETDEV_A_PAGE_POOL_IFINDEX,
@@ -94,6 +99,7 @@ enum {
 	NETDEV_A_PAGE_POOL_INFLIGHT_MEM,
 	NETDEV_A_PAGE_POOL_DETACH_TIME,
 	NETDEV_A_PAGE_POOL_DMABUF,
+	NETDEV_A_PAGE_POOL_IO_URING,
 
 	__NETDEV_A_PAGE_POOL_MAX,
 	NETDEV_A_PAGE_POOL_MAX = (__NETDEV_A_PAGE_POOL_MAX - 1)
@@ -136,6 +142,7 @@ enum {
 	NETDEV_A_QUEUE_TYPE,
 	NETDEV_A_QUEUE_NAPI_ID,
 	NETDEV_A_QUEUE_DMABUF,
+	NETDEV_A_QUEUE_IO_URING,
 
 	__NETDEV_A_QUEUE_MAX,
 	NETDEV_A_QUEUE_MAX = (__NETDEV_A_QUEUE_MAX - 1)
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v13 06/10] net: page_pool: add callback for mp info printing
  2025-02-04 21:56 [PATCH net-next v13 00/10] io_uring zero copy rx David Wei
                   ` (4 preceding siblings ...)
  2025-02-04 21:56 ` [PATCH net-next v13 05/10] netdev: add io_uring memory provider info David Wei
@ 2025-02-04 21:56 ` David Wei
  2025-02-04 21:56 ` [PATCH net-next v13 07/10] net: page_pool: add a mp hook to unregister_netdevice* David Wei
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: David Wei @ 2025-02-04 21:56 UTC (permalink / raw)
  To: netdev
  Cc: Pavel Begunkov, Jakub Kicinski, Paolo Abeni, David S. Miller,
	Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry,
	Stanislav Fomichev, Joe Damato, Pedro Tammela

From: Pavel Begunkov <asml.silence@gmail.com>

Add a mandatory callback that prints information about the memory
provider to netlink.

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David Wei <dw@davidwei.uk>
---
 include/net/page_pool/memory_provider.h |  5 +++++
 net/core/devmem.c                       | 10 ++++++++++
 net/core/netdev-genl.c                  | 11 ++++++-----
 net/core/page_pool_user.c               |  5 ++---
 4 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/include/net/page_pool/memory_provider.h b/include/net/page_pool/memory_provider.h
index e49d0a52629d..6d10a0959d00 100644
--- a/include/net/page_pool/memory_provider.h
+++ b/include/net/page_pool/memory_provider.h
@@ -5,11 +5,16 @@
 #include <net/netmem.h>
 #include <net/page_pool/types.h>
 
+struct netdev_rx_queue;
+struct sk_buff;
+
 struct memory_provider_ops {
 	netmem_ref (*alloc_netmems)(struct page_pool *pool, gfp_t gfp);
 	bool (*release_netmem)(struct page_pool *pool, netmem_ref netmem);
 	int (*init)(struct page_pool *pool);
 	void (*destroy)(struct page_pool *pool);
+	int (*nl_fill)(void *mp_priv, struct sk_buff *rsp,
+		       struct netdev_rx_queue *rxq);
 };
 
 #endif
diff --git a/net/core/devmem.c b/net/core/devmem.c
index c81625ca57c6..63b790326c7d 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -406,9 +406,19 @@ bool mp_dmabuf_devmem_release_page(struct page_pool *pool, netmem_ref netmem)
 	return false;
 }
 
+static int mp_dmabuf_devmem_nl_fill(void *mp_priv, struct sk_buff *rsp,
+				    struct netdev_rx_queue *rxq)
+{
+	const struct net_devmem_dmabuf_binding *binding = mp_priv;
+	int type = rxq ? NETDEV_A_QUEUE_DMABUF : NETDEV_A_PAGE_POOL_DMABUF;
+
+	return nla_put_u32(rsp, type, binding->id);
+}
+
 static const struct memory_provider_ops dmabuf_devmem_ops = {
 	.init			= mp_dmabuf_devmem_init,
 	.destroy		= mp_dmabuf_devmem_destroy,
 	.alloc_netmems		= mp_dmabuf_devmem_alloc_netmems,
 	.release_netmem		= mp_dmabuf_devmem_release_page,
+	.nl_fill		= mp_dmabuf_devmem_nl_fill,
 };
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 715f85c6b62e..5b459b4fef46 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -10,6 +10,7 @@
 #include <net/sock.h>
 #include <net/xdp.h>
 #include <net/xdp_sock.h>
+#include <net/page_pool/memory_provider.h>
 
 #include "dev.h"
 #include "devmem.h"
@@ -368,7 +369,7 @@ static int
 netdev_nl_queue_fill_one(struct sk_buff *rsp, struct net_device *netdev,
 			 u32 q_idx, u32 q_type, const struct genl_info *info)
 {
-	struct net_devmem_dmabuf_binding *binding;
+	struct pp_memory_provider_params *params;
 	struct netdev_rx_queue *rxq;
 	struct netdev_queue *txq;
 	void *hdr;
@@ -385,15 +386,15 @@ netdev_nl_queue_fill_one(struct sk_buff *rsp, struct net_device *netdev,
 	switch (q_type) {
 	case NETDEV_QUEUE_TYPE_RX:
 		rxq = __netif_get_rx_queue(netdev, q_idx);
+
 		if (rxq->napi && nla_put_u32(rsp, NETDEV_A_QUEUE_NAPI_ID,
 					     rxq->napi->napi_id))
 			goto nla_put_failure;
 
-		binding = rxq->mp_params.mp_priv;
-		if (binding &&
-		    nla_put_u32(rsp, NETDEV_A_QUEUE_DMABUF, binding->id))
+		params = &rxq->mp_params;
+		if (params->mp_ops &&
+		    params->mp_ops->nl_fill(params->mp_priv, rsp, rxq))
 			goto nla_put_failure;
-
 		break;
 	case NETDEV_QUEUE_TYPE_TX:
 		txq = netdev_get_tx_queue(netdev, q_idx);
diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index d5e214c30c31..9d8a3d8597fa 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -8,9 +8,9 @@
 #include <net/netdev_rx_queue.h>
 #include <net/page_pool/helpers.h>
 #include <net/page_pool/types.h>
+#include <net/page_pool/memory_provider.h>
 #include <net/sock.h>
 
-#include "devmem.h"
 #include "page_pool_priv.h"
 #include "netdev-genl-gen.h"
 
@@ -216,7 +216,6 @@ static int
 page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
 		  const struct genl_info *info)
 {
-	struct net_devmem_dmabuf_binding *binding = pool->mp_priv;
 	size_t inflight, refsz;
 	unsigned int napi_id;
 	void *hdr;
@@ -249,7 +248,7 @@ page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
 			 pool->user.detach_time))
 		goto err_cancel;
 
-	if (binding && nla_put_u32(rsp, NETDEV_A_PAGE_POOL_DMABUF, binding->id))
+	if (pool->mp_ops && pool->mp_ops->nl_fill(pool->mp_priv, rsp, NULL))
 		goto err_cancel;
 
 	genlmsg_end(rsp, hdr);
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v13 07/10] net: page_pool: add a mp hook to unregister_netdevice*
  2025-02-04 21:56 [PATCH net-next v13 00/10] io_uring zero copy rx David Wei
                   ` (5 preceding siblings ...)
  2025-02-04 21:56 ` [PATCH net-next v13 06/10] net: page_pool: add callback for mp info printing David Wei
@ 2025-02-04 21:56 ` David Wei
  2025-02-04 21:56 ` [PATCH net-next v13 08/10] net: prepare for non devmem TCP memory providers David Wei
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: David Wei @ 2025-02-04 21:56 UTC (permalink / raw)
  To: netdev
  Cc: Pavel Begunkov, Jakub Kicinski, Paolo Abeni, David S. Miller,
	Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry,
	Stanislav Fomichev, Joe Damato, Pedro Tammela

From: Pavel Begunkov <asml.silence@gmail.com>

Devmem TCP needs a hook in unregister_netdevice_many_notify() to upkeep
the set tracking queues it's bound to, i.e. ->bound_rxqs. Instead of
devmem sticking directly out of the genetic path, add a mp function.

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David Wei <dw@davidwei.uk>
---
 include/net/page_pool/memory_provider.h |  1 +
 net/core/dev.c                          | 16 ++++++++++-
 net/core/devmem.c                       | 36 +++++++++++--------------
 net/core/devmem.h                       |  5 ----
 4 files changed, 32 insertions(+), 26 deletions(-)

diff --git a/include/net/page_pool/memory_provider.h b/include/net/page_pool/memory_provider.h
index 6d10a0959d00..36469a7e649f 100644
--- a/include/net/page_pool/memory_provider.h
+++ b/include/net/page_pool/memory_provider.h
@@ -15,6 +15,7 @@ struct memory_provider_ops {
 	void (*destroy)(struct page_pool *pool);
 	int (*nl_fill)(void *mp_priv, struct sk_buff *rsp,
 		       struct netdev_rx_queue *rxq);
+	void (*uninstall)(void *mp_priv, struct netdev_rx_queue *rxq);
 };
 
 #endif
diff --git a/net/core/dev.c b/net/core/dev.c
index c0021cbd28fc..384b9109932a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -159,6 +159,7 @@
 #include <net/netdev_rx_queue.h>
 #include <net/page_pool/types.h>
 #include <net/page_pool/helpers.h>
+#include <net/page_pool/memory_provider.h>
 #include <net/rps.h>
 #include <linux/phy_link_topology.h>
 
@@ -11724,6 +11725,19 @@ void unregister_netdevice_queue(struct net_device *dev, struct list_head *head)
 }
 EXPORT_SYMBOL(unregister_netdevice_queue);
 
+static void dev_memory_provider_uninstall(struct net_device *dev)
+{
+	unsigned int i;
+
+	for (i = 0; i < dev->real_num_rx_queues; i++) {
+		struct netdev_rx_queue *rxq = &dev->_rx[i];
+		struct pp_memory_provider_params *p = &rxq->mp_params;
+
+		if (p->mp_ops && p->mp_ops->uninstall)
+			p->mp_ops->uninstall(rxq->mp_params.mp_priv, rxq);
+	}
+}
+
 void unregister_netdevice_many_notify(struct list_head *head,
 				      u32 portid, const struct nlmsghdr *nlh)
 {
@@ -11778,7 +11792,7 @@ void unregister_netdevice_many_notify(struct list_head *head,
 		dev_tcx_uninstall(dev);
 		dev_xdp_uninstall(dev);
 		bpf_dev_bound_netdev_unregister(dev);
-		dev_dmabuf_uninstall(dev);
+		dev_memory_provider_uninstall(dev);
 
 		netdev_offload_xstats_disable_all(dev);
 
diff --git a/net/core/devmem.c b/net/core/devmem.c
index 63b790326c7d..cbac6419fcc4 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -320,26 +320,6 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
 	return ERR_PTR(err);
 }
 
-void dev_dmabuf_uninstall(struct net_device *dev)
-{
-	struct net_devmem_dmabuf_binding *binding;
-	struct netdev_rx_queue *rxq;
-	unsigned long xa_idx;
-	unsigned int i;
-
-	for (i = 0; i < dev->real_num_rx_queues; i++) {
-		binding = dev->_rx[i].mp_params.mp_priv;
-		if (!binding)
-			continue;
-
-		xa_for_each(&binding->bound_rxqs, xa_idx, rxq)
-			if (rxq == &dev->_rx[i]) {
-				xa_erase(&binding->bound_rxqs, xa_idx);
-				break;
-			}
-	}
-}
-
 /*** "Dmabuf devmem memory provider" ***/
 
 int mp_dmabuf_devmem_init(struct page_pool *pool)
@@ -415,10 +395,26 @@ static int mp_dmabuf_devmem_nl_fill(void *mp_priv, struct sk_buff *rsp,
 	return nla_put_u32(rsp, type, binding->id);
 }
 
+static void mp_dmabuf_devmem_uninstall(void *mp_priv,
+				       struct netdev_rx_queue *rxq)
+{
+	struct net_devmem_dmabuf_binding *binding = mp_priv;
+	struct netdev_rx_queue *bound_rxq;
+	unsigned long xa_idx;
+
+	xa_for_each(&binding->bound_rxqs, xa_idx, bound_rxq) {
+		if (bound_rxq == rxq) {
+			xa_erase(&binding->bound_rxqs, xa_idx);
+			break;
+		}
+	}
+}
+
 static const struct memory_provider_ops dmabuf_devmem_ops = {
 	.init			= mp_dmabuf_devmem_init,
 	.destroy		= mp_dmabuf_devmem_destroy,
 	.alloc_netmems		= mp_dmabuf_devmem_alloc_netmems,
 	.release_netmem		= mp_dmabuf_devmem_release_page,
 	.nl_fill		= mp_dmabuf_devmem_nl_fill,
+	.uninstall		= mp_dmabuf_devmem_uninstall,
 };
diff --git a/net/core/devmem.h b/net/core/devmem.h
index a2b9913e9a17..8e999fe2ae67 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -68,7 +68,6 @@ void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding);
 int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
 				    struct net_devmem_dmabuf_binding *binding,
 				    struct netlink_ext_ack *extack);
-void dev_dmabuf_uninstall(struct net_device *dev);
 
 static inline struct dmabuf_genpool_chunk_owner *
 net_devmem_iov_to_chunk_owner(const struct net_iov *niov)
@@ -145,10 +144,6 @@ net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
 	return -EOPNOTSUPP;
 }
 
-static inline void dev_dmabuf_uninstall(struct net_device *dev)
-{
-}
-
 static inline struct net_iov *
 net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding)
 {
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v13 08/10] net: prepare for non devmem TCP memory providers
  2025-02-04 21:56 [PATCH net-next v13 00/10] io_uring zero copy rx David Wei
                   ` (6 preceding siblings ...)
  2025-02-04 21:56 ` [PATCH net-next v13 07/10] net: page_pool: add a mp hook to unregister_netdevice* David Wei
@ 2025-02-04 21:56 ` David Wei
  2025-02-04 21:56 ` [PATCH net-next v13 09/10] net: page_pool: add memory provider helpers David Wei
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: David Wei @ 2025-02-04 21:56 UTC (permalink / raw)
  To: netdev
  Cc: Pavel Begunkov, Jakub Kicinski, Paolo Abeni, David S. Miller,
	Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry,
	Stanislav Fomichev, Joe Damato, Pedro Tammela

From: Pavel Begunkov <asml.silence@gmail.com>

There is a good bunch of places in generic paths assuming that the only
page pool memory provider is devmem TCP. As we want to reuse the net_iov
and provider infrastructure, we need to patch it up and explicitly check
the provider type when we branch into devmem TCP code.

Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David Wei <dw@davidwei.uk>
---
 net/core/devmem.c | 5 +++++
 net/core/devmem.h | 7 +++++++
 net/ipv4/tcp.c    | 5 +++++
 3 files changed, 17 insertions(+)

diff --git a/net/core/devmem.c b/net/core/devmem.c
index cbac6419fcc4..7c6e0b5b6acb 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -30,6 +30,11 @@ static DEFINE_XARRAY_FLAGS(net_devmem_dmabuf_bindings, XA_FLAGS_ALLOC1);
 
 static const struct memory_provider_ops dmabuf_devmem_ops;
 
+bool net_is_devmem_iov(struct net_iov *niov)
+{
+	return niov->pp->mp_ops == &dmabuf_devmem_ops;
+}
+
 static void net_devmem_dmabuf_free_chunk_owner(struct gen_pool *genpool,
 					       struct gen_pool_chunk *chunk,
 					       void *not_used)
diff --git a/net/core/devmem.h b/net/core/devmem.h
index 8e999fe2ae67..7fc158d52729 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -115,6 +115,8 @@ struct net_iov *
 net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding);
 void net_devmem_free_dmabuf(struct net_iov *ppiov);
 
+bool net_is_devmem_iov(struct net_iov *niov);
+
 #else
 struct net_devmem_dmabuf_binding;
 
@@ -163,6 +165,11 @@ static inline u32 net_devmem_iov_binding_id(const struct net_iov *niov)
 {
 	return 0;
 }
+
+static inline bool net_is_devmem_iov(struct net_iov *niov)
+{
+	return false;
+}
 #endif
 
 #endif /* _NET_DEVMEM_H */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index b872de9a8271..7f43d31c9400 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2476,6 +2476,11 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
 			}
 
 			niov = skb_frag_net_iov(frag);
+			if (!net_is_devmem_iov(niov)) {
+				err = -ENODEV;
+				goto out;
+			}
+
 			end = start + skb_frag_size(frag);
 			copy = end - offset;
 
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v13 09/10] net: page_pool: add memory provider helpers
  2025-02-04 21:56 [PATCH net-next v13 00/10] io_uring zero copy rx David Wei
                   ` (7 preceding siblings ...)
  2025-02-04 21:56 ` [PATCH net-next v13 08/10] net: prepare for non devmem TCP memory providers David Wei
@ 2025-02-04 21:56 ` David Wei
  2025-02-04 21:56 ` [PATCH net-next v13 10/10] net: add helpers for setting a memory provider on an rx queue David Wei
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: David Wei @ 2025-02-04 21:56 UTC (permalink / raw)
  To: netdev
  Cc: Pavel Begunkov, Jakub Kicinski, Paolo Abeni, David S. Miller,
	Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry,
	Stanislav Fomichev, Joe Damato, Pedro Tammela

From: Pavel Begunkov <asml.silence@gmail.com>

Add helpers for memory providers to interact with page pools.
net_mp_niov_{set,clear}_page_pool() serve to [dis]associate a net_iov
with a page pool. If used, the memory provider is responsible to match
"set" calls with "clear" once a net_iov is not going to be used by a page
pool anymore, changing a page pool, etc.

Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David Wei <dw@davidwei.uk>
---
 include/net/page_pool/memory_provider.h | 19 +++++++++++++++++
 net/core/page_pool.c                    | 28 +++++++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/include/net/page_pool/memory_provider.h b/include/net/page_pool/memory_provider.h
index 36469a7e649f..4f0ffb8f6a0a 100644
--- a/include/net/page_pool/memory_provider.h
+++ b/include/net/page_pool/memory_provider.h
@@ -18,4 +18,23 @@ struct memory_provider_ops {
 	void (*uninstall)(void *mp_priv, struct netdev_rx_queue *rxq);
 };
 
+bool net_mp_niov_set_dma_addr(struct net_iov *niov, dma_addr_t addr);
+void net_mp_niov_set_page_pool(struct page_pool *pool, struct net_iov *niov);
+void net_mp_niov_clear_page_pool(struct net_iov *niov);
+
+/**
+  * net_mp_netmem_place_in_cache() - give a netmem to a page pool
+  * @pool:      the page pool to place the netmem into
+  * @netmem:    netmem to give
+  *
+  * Push an accounted netmem into the page pool's allocation cache. The caller
+  * must ensure that there is space in the cache. It should only be called off
+  * the mp_ops->alloc_netmems() path.
+  */
+static inline void net_mp_netmem_place_in_cache(struct page_pool *pool,
+						netmem_ref netmem)
+{
+	pool->alloc.cache[pool->alloc.count++] = netmem;
+}
+
 #endif
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index d632cf2c91c3..686bd4a117d9 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1197,3 +1197,31 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid)
 	}
 }
 EXPORT_SYMBOL(page_pool_update_nid);
+
+bool net_mp_niov_set_dma_addr(struct net_iov *niov, dma_addr_t addr)
+{
+	return page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov), addr);
+}
+
+/* Associate a niov with a page pool. Should follow with a matching
+ * net_mp_niov_clear_page_pool()
+ */
+void net_mp_niov_set_page_pool(struct page_pool *pool, struct net_iov *niov)
+{
+	netmem_ref netmem = net_iov_to_netmem(niov);
+
+	page_pool_set_pp_info(pool, netmem);
+
+	pool->pages_state_hold_cnt++;
+	trace_page_pool_state_hold(pool, netmem, pool->pages_state_hold_cnt);
+}
+
+/* Disassociate a niov from a page pool. Should only be used in the
+ * ->release_netmem() path.
+ */
+void net_mp_niov_clear_page_pool(struct net_iov *niov)
+{
+	netmem_ref netmem = net_iov_to_netmem(niov);
+
+	page_pool_clear_pp_info(netmem);
+}
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v13 10/10] net: add helpers for setting a memory provider on an rx queue
  2025-02-04 21:56 [PATCH net-next v13 00/10] io_uring zero copy rx David Wei
                   ` (8 preceding siblings ...)
  2025-02-04 21:56 ` [PATCH net-next v13 09/10] net: page_pool: add memory provider helpers David Wei
@ 2025-02-04 21:56 ` David Wei
  2025-02-05 17:44 ` [PATCH net-next v13 00/10] io_uring zero copy rx David Ahern
  2025-02-07  1:00 ` patchwork-bot+netdevbpf
  11 siblings, 0 replies; 19+ messages in thread
From: David Wei @ 2025-02-04 21:56 UTC (permalink / raw)
  To: netdev
  Cc: Pavel Begunkov, Jakub Kicinski, Paolo Abeni, David S. Miller,
	Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry,
	Stanislav Fomichev, Joe Damato, Pedro Tammela

Add helpers that properly prep or remove a memory provider for an rx
queue then restart the queue.

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David Wei <dw@davidwei.uk>
---
 include/net/page_pool/memory_provider.h |  5 ++
 net/core/netdev_rx_queue.c              | 69 +++++++++++++++++++++++++
 2 files changed, 74 insertions(+)

diff --git a/include/net/page_pool/memory_provider.h b/include/net/page_pool/memory_provider.h
index 4f0ffb8f6a0a..b3e665897767 100644
--- a/include/net/page_pool/memory_provider.h
+++ b/include/net/page_pool/memory_provider.h
@@ -22,6 +22,11 @@ bool net_mp_niov_set_dma_addr(struct net_iov *niov, dma_addr_t addr);
 void net_mp_niov_set_page_pool(struct page_pool *pool, struct net_iov *niov);
 void net_mp_niov_clear_page_pool(struct net_iov *niov);
 
+int net_mp_open_rxq(struct net_device *dev, unsigned ifq_idx,
+		    struct pp_memory_provider_params *p);
+void net_mp_close_rxq(struct net_device *dev, unsigned ifq_idx,
+		      struct pp_memory_provider_params *old_p);
+
 /**
   * net_mp_netmem_place_in_cache() - give a netmem to a page pool
   * @pool:      the page pool to place the netmem into
diff --git a/net/core/netdev_rx_queue.c b/net/core/netdev_rx_queue.c
index db82786fa0c4..db46880f37cc 100644
--- a/net/core/netdev_rx_queue.c
+++ b/net/core/netdev_rx_queue.c
@@ -3,6 +3,7 @@
 #include <linux/netdevice.h>
 #include <net/netdev_queues.h>
 #include <net/netdev_rx_queue.h>
+#include <net/page_pool/memory_provider.h>
 
 #include "page_pool_priv.h"
 
@@ -80,3 +81,71 @@ int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq_idx)
 	return err;
 }
 EXPORT_SYMBOL_NS_GPL(netdev_rx_queue_restart, "NETDEV_INTERNAL");
+
+static int __net_mp_open_rxq(struct net_device *dev, unsigned ifq_idx,
+			     struct pp_memory_provider_params *p)
+{
+	struct netdev_rx_queue *rxq;
+	int ret;
+
+	if (ifq_idx >= dev->real_num_rx_queues)
+		return -EINVAL;
+	ifq_idx = array_index_nospec(ifq_idx, dev->real_num_rx_queues);
+
+	rxq = __netif_get_rx_queue(dev, ifq_idx);
+	if (rxq->mp_params.mp_ops)
+		return -EEXIST;
+
+	rxq->mp_params = *p;
+	ret = netdev_rx_queue_restart(dev, ifq_idx);
+	if (ret) {
+		rxq->mp_params.mp_ops = NULL;
+		rxq->mp_params.mp_priv = NULL;
+	}
+	return ret;
+}
+
+int net_mp_open_rxq(struct net_device *dev, unsigned ifq_idx,
+		    struct pp_memory_provider_params *p)
+{
+	int ret;
+
+	rtnl_lock();
+	ret = __net_mp_open_rxq(dev, ifq_idx, p);
+	rtnl_unlock();
+	return ret;
+}
+
+static void __net_mp_close_rxq(struct net_device *dev, unsigned ifq_idx,
+			      struct pp_memory_provider_params *old_p)
+{
+	struct netdev_rx_queue *rxq;
+
+	if (WARN_ON_ONCE(ifq_idx >= dev->real_num_rx_queues))
+		return;
+
+	rxq = __netif_get_rx_queue(dev, ifq_idx);
+
+	/* Callers holding a netdev ref may get here after we already
+	 * went thru shutdown via dev_memory_provider_uninstall().
+	 */
+	if (dev->reg_state > NETREG_REGISTERED &&
+	    !rxq->mp_params.mp_ops)
+		return;
+
+	if (WARN_ON_ONCE(rxq->mp_params.mp_ops != old_p->mp_ops ||
+			 rxq->mp_params.mp_priv != old_p->mp_priv))
+		return;
+
+	rxq->mp_params.mp_ops = NULL;
+	rxq->mp_params.mp_priv = NULL;
+	WARN_ON(netdev_rx_queue_restart(dev, ifq_idx));
+}
+
+void net_mp_close_rxq(struct net_device *dev, unsigned ifq_idx,
+		      struct pp_memory_provider_params *old_p)
+{
+	rtnl_lock();
+	__net_mp_close_rxq(dev, ifq_idx, old_p);
+	rtnl_unlock();
+}
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v13 00/10] io_uring zero copy rx
  2025-02-04 21:56 [PATCH net-next v13 00/10] io_uring zero copy rx David Wei
                   ` (9 preceding siblings ...)
  2025-02-04 21:56 ` [PATCH net-next v13 10/10] net: add helpers for setting a memory provider on an rx queue David Wei
@ 2025-02-05 17:44 ` David Ahern
  2025-02-05 18:00   ` David Wei
  2025-02-07  1:00 ` patchwork-bot+netdevbpf
  11 siblings, 1 reply; 19+ messages in thread
From: David Ahern @ 2025-02-05 17:44 UTC (permalink / raw)
  To: David Wei, netdev
  Cc: Pavel Begunkov, Jakub Kicinski, Paolo Abeni, David S. Miller,
	Eric Dumazet, Jesper Dangaard Brouer, Mina Almasry,
	Stanislav Fomichev, Joe Damato, Pedro Tammela

On 2/4/25 2:56 PM, David Wei wrote:
> We share netdev core infra with devmem TCP. The main difference is that
> io_uring is used for the uAPI and the lifetime of all objects are bound
> to an io_uring instance. Data is 'read' using a new io_uring request
> type. When done, data is returned via a new shared refill queue. A zero
> copy page pool refills a hw rx queue from this refill queue directly. Of
> course, the lifetime of these data buffers are managed by io_uring
> rather than the networking stack, with different refcounting rules.

just to make sure I understand, working with GPU memory as well as host
memory is not a goal of this patch set?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v13 00/10] io_uring zero copy rx
  2025-02-05 17:44 ` [PATCH net-next v13 00/10] io_uring zero copy rx David Ahern
@ 2025-02-05 18:00   ` David Wei
  2025-02-05 18:43     ` David Ahern
  0 siblings, 1 reply; 19+ messages in thread
From: David Wei @ 2025-02-05 18:00 UTC (permalink / raw)
  To: David Ahern, netdev
  Cc: Pavel Begunkov, Jakub Kicinski, Paolo Abeni, David S. Miller,
	Eric Dumazet, Jesper Dangaard Brouer, Mina Almasry,
	Stanislav Fomichev, Joe Damato, Pedro Tammela

On 2025-02-05 09:44, David Ahern wrote:
> On 2/4/25 2:56 PM, David Wei wrote:
>> We share netdev core infra with devmem TCP. The main difference is that
>> io_uring is used for the uAPI and the lifetime of all objects are bound
>> to an io_uring instance. Data is 'read' using a new io_uring request
>> type. When done, data is returned via a new shared refill queue. A zero
>> copy page pool refills a hw rx queue from this refill queue directly. Of
>> course, the lifetime of these data buffers are managed by io_uring
>> rather than the networking stack, with different refcounting rules.
> 
> just to make sure I understand, working with GPU memory as well as host
> memory is not a goal of this patch set?

Yes, this patchset is only for host memory.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v13 00/10] io_uring zero copy rx
  2025-02-05 18:00   ` David Wei
@ 2025-02-05 18:43     ` David Ahern
  2025-02-05 19:04       ` Pavel Begunkov
  0 siblings, 1 reply; 19+ messages in thread
From: David Ahern @ 2025-02-05 18:43 UTC (permalink / raw)
  To: David Wei, netdev
  Cc: Pavel Begunkov, Jakub Kicinski, Paolo Abeni, David S. Miller,
	Eric Dumazet, Jesper Dangaard Brouer, Mina Almasry,
	Stanislav Fomichev, Joe Damato, Pedro Tammela

On 2/5/25 11:00 AM, David Wei wrote:
> On 2025-02-05 09:44, David Ahern wrote:
>> On 2/4/25 2:56 PM, David Wei wrote:
>>> We share netdev core infra with devmem TCP. The main difference is that
>>> io_uring is used for the uAPI and the lifetime of all objects are bound
>>> to an io_uring instance. Data is 'read' using a new io_uring request
>>> type. When done, data is returned via a new shared refill queue. A zero
>>> copy page pool refills a hw rx queue from this refill queue directly. Of
>>> course, the lifetime of these data buffers are managed by io_uring
>>> rather than the networking stack, with different refcounting rules.
>>
>> just to make sure I understand, working with GPU memory as well as host
>> memory is not a goal of this patch set?
> 
> Yes, this patchset is only for host memory.

And is GPU memory on the near term to-do list?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v13 00/10] io_uring zero copy rx
  2025-02-05 18:43     ` David Ahern
@ 2025-02-05 19:04       ` Pavel Begunkov
  0 siblings, 0 replies; 19+ messages in thread
From: Pavel Begunkov @ 2025-02-05 19:04 UTC (permalink / raw)
  To: David Ahern, David Wei, netdev
  Cc: Jakub Kicinski, Paolo Abeni, David S. Miller, Eric Dumazet,
	Jesper Dangaard Brouer, Mina Almasry, Stanislav Fomichev,
	Joe Damato, Pedro Tammela

On 2/5/25 18:43, David Ahern wrote:
> On 2/5/25 11:00 AM, David Wei wrote:
>> On 2025-02-05 09:44, David Ahern wrote:
>>> On 2/4/25 2:56 PM, David Wei wrote:
>>>> We share netdev core infra with devmem TCP. The main difference is that
>>>> io_uring is used for the uAPI and the lifetime of all objects are bound
>>>> to an io_uring instance. Data is 'read' using a new io_uring request
>>>> type. When done, data is returned via a new shared refill queue. A zero
>>>> copy page pool refills a hw rx queue from this refill queue directly. Of
>>>> course, the lifetime of these data buffers are managed by io_uring
>>>> rather than the networking stack, with different refcounting rules.
>>>
>>> just to make sure I understand, working with GPU memory as well as host
>>> memory is not a goal of this patch set?
>>
>> Yes, this patchset is only for host memory.
> 
> And is GPU memory on the near term to-do list?

Not a priority, but yes, and it's fairly easy to add as it'd
only need changes in setup disregard the fallback path.

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v13 00/10] io_uring zero copy rx
  2025-02-04 21:56 [PATCH net-next v13 00/10] io_uring zero copy rx David Wei
                   ` (10 preceding siblings ...)
  2025-02-05 17:44 ` [PATCH net-next v13 00/10] io_uring zero copy rx David Ahern
@ 2025-02-07  1:00 ` patchwork-bot+netdevbpf
  2025-02-07  1:07   ` Jakub Kicinski
  11 siblings, 1 reply; 19+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-02-07  1:00 UTC (permalink / raw)
  To: David Wei
  Cc: netdev, asml.silence, kuba, pabeni, davem, edumazet, hawk,
	dsahern, almasrymina, stfomichev, jdamato, pctammela

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue,  4 Feb 2025 13:56:11 -0800 you wrote:
> This patchset contains net/ patches needed by a new io_uring request
> implementing zero copy rx into userspace pages, eliminating a kernel
> to user copy.
> 
> We configure a page pool that a driver uses to fill a hw rx queue to
> hand out user pages instead of kernel pages. Any data that ends up
> hitting this hw rx queue will thus be dma'd into userspace memory
> directly, without needing to be bounced through kernel memory. 'Reading'
> data out of a socket instead becomes a _notification_ mechanism, where
> the kernel tells userspace where the data is. The overall approach is
> similar to the devmem TCP proposal.
> 
> [...]

Here is the summary with links:
  - [net-next,v13,01/10] net: page_pool: don't cast mp param to devmem
    https://git.kernel.org/netdev/net-next/c/8d522566ae9c
  - [net-next,v13,02/10] net: prefix devmem specific helpers
    https://git.kernel.org/netdev/net-next/c/297d389e9e5b
  - [net-next,v13,03/10] net: generalise net_iov chunk owners
    https://git.kernel.org/netdev/net-next/c/7d60fa9e1ab1
  - [net-next,v13,04/10] net: page_pool: create hooks for custom memory providers
    https://git.kernel.org/netdev/net-next/c/57afb4830157
  - [net-next,v13,05/10] netdev: add io_uring memory provider info
    https://git.kernel.org/netdev/net-next/c/dcc0113acd3b
  - [net-next,v13,06/10] net: page_pool: add callback for mp info printing
    https://git.kernel.org/netdev/net-next/c/2508a46f920a
  - [net-next,v13,07/10] net: page_pool: add a mp hook to unregister_netdevice*
    https://git.kernel.org/netdev/net-next/c/f8350a4358fc
  - [net-next,v13,08/10] net: prepare for non devmem TCP memory providers
    https://git.kernel.org/netdev/net-next/c/69e39537b662
  - [net-next,v13,09/10] net: page_pool: add memory provider helpers
    https://git.kernel.org/netdev/net-next/c/56102c013fa7
  - [net-next,v13,10/10] net: add helpers for setting a memory provider on an rx queue
    https://git.kernel.org/netdev/net-next/c/6e18ed929d3b

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v13 00/10] io_uring zero copy rx
  2025-02-07  1:00 ` patchwork-bot+netdevbpf
@ 2025-02-07  1:07   ` Jakub Kicinski
  2025-02-08 13:29     ` Pavel Begunkov
  0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2025-02-07  1:07 UTC (permalink / raw)
  To: patchwork-bot+netdevbpf
  Cc: David Wei, netdev, asml.silence, pabeni, davem, edumazet, hawk,
	dsahern, almasrymina, stfomichev, jdamato, pctammela

On Fri, 07 Feb 2025 01:00:37 +0000 patchwork-bot+netdevbpf@kernel.org
wrote:
> This series was applied to netdev/net-next.git

off of v6.14-rc1 so 71f0dd5a3293d75 should pull in cleanly

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v13 00/10] io_uring zero copy rx
  2025-02-07  1:07   ` Jakub Kicinski
@ 2025-02-08 13:29     ` Pavel Begunkov
  2025-02-11  0:11       ` Jakub Kicinski
  0 siblings, 1 reply; 19+ messages in thread
From: Pavel Begunkov @ 2025-02-08 13:29 UTC (permalink / raw)
  To: Jakub Kicinski, patchwork-bot+netdevbpf
  Cc: David Wei, netdev, pabeni, davem, edumazet, hawk, dsahern,
	almasrymina, stfomichev, jdamato, pctammela

On 2/7/25 01:07, Jakub Kicinski wrote:
> On Fri, 07 Feb 2025 01:00:37 +0000 patchwork-bot+netdevbpf@kernel.org
> wrote:
>> This series was applied to netdev/net-next.git
> 
> off of v6.14-rc1 so 71f0dd5a3293d75 should pull in cleanly

Thanks Jakub, from here we'll merge the rest of the patches through
io_uring as planned.

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v13 00/10] io_uring zero copy rx
  2025-02-08 13:29     ` Pavel Begunkov
@ 2025-02-11  0:11       ` Jakub Kicinski
  0 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2025-02-11  0:11 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: patchwork-bot+netdevbpf, David Wei, netdev, pabeni, davem,
	edumazet, hawk, dsahern, almasrymina, stfomichev, jdamato,
	pctammela

On Sat, 8 Feb 2025 13:29:56 +0000 Pavel Begunkov wrote:
> >> This series was applied to netdev/net-next.git  
> > 
> > off of v6.14-rc1 so 71f0dd5a3293d75 should pull in cleanly  
> 
> Thanks Jakub, from here we'll merge the rest of the patches through
> io_uring as planned.

You should still CC netdev, to be clear.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2025-02-11  0:11 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-04 21:56 [PATCH net-next v13 00/10] io_uring zero copy rx David Wei
2025-02-04 21:56 ` [PATCH net-next v13 01/10] net: page_pool: don't cast mp param to devmem David Wei
2025-02-04 21:56 ` [PATCH net-next v13 02/10] net: prefix devmem specific helpers David Wei
2025-02-04 21:56 ` [PATCH net-next v13 03/10] net: generalise net_iov chunk owners David Wei
2025-02-04 21:56 ` [PATCH net-next v13 04/10] net: page_pool: create hooks for custom memory providers David Wei
2025-02-04 21:56 ` [PATCH net-next v13 05/10] netdev: add io_uring memory provider info David Wei
2025-02-04 21:56 ` [PATCH net-next v13 06/10] net: page_pool: add callback for mp info printing David Wei
2025-02-04 21:56 ` [PATCH net-next v13 07/10] net: page_pool: add a mp hook to unregister_netdevice* David Wei
2025-02-04 21:56 ` [PATCH net-next v13 08/10] net: prepare for non devmem TCP memory providers David Wei
2025-02-04 21:56 ` [PATCH net-next v13 09/10] net: page_pool: add memory provider helpers David Wei
2025-02-04 21:56 ` [PATCH net-next v13 10/10] net: add helpers for setting a memory provider on an rx queue David Wei
2025-02-05 17:44 ` [PATCH net-next v13 00/10] io_uring zero copy rx David Ahern
2025-02-05 18:00   ` David Wei
2025-02-05 18:43     ` David Ahern
2025-02-05 19:04       ` Pavel Begunkov
2025-02-07  1:00 ` patchwork-bot+netdevbpf
2025-02-07  1:07   ` Jakub Kicinski
2025-02-08 13:29     ` Pavel Begunkov
2025-02-11  0:11       ` Jakub Kicinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).