* [RFC 00/19] Split netmem from struct page
@ 2025-05-09 11:51 Byungchul Park
2025-05-09 11:51 ` [RFC 01/19] netmem: rename struct net_iov to struct netmem_desc Byungchul Park
` (19 more replies)
0 siblings, 20 replies; 53+ messages in thread
From: Byungchul Park @ 2025-05-09 11:51 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
The MM subsystem is trying to reduce struct page to a single pointer.
The first step towards that is splitting struct page by its individual
users, as has already been done with folio and slab. This patchset does
that for netmem, which is used by page pools.
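For context, a rough sketch of where this is headed: the page pool gets
its own descriptor that merely overlays the page pool fields of struct
page. The field list below is inferred from the offset assertions in
patch 01, not a verbatim definition:

    struct netmem_desc {
            unsigned long __unused_padding; /* match struct page layout */
            unsigned long pp_magic;
            struct page_pool *pp;
            unsigned long dma_addr;
            atomic_long_t pp_ref_count;
    };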
Matthew Wilcox attempted the same work and stopped; you can see it at:
https://lore.kernel.org/linux-mm/20230111042214.907030-1-willy@infradead.org/
Mina Almasry has already done a lot of the prerequisite work ("by luck",
he said :)). I stacked my patches on top of his work, i.e. netmem.
This time I focused on removing the page pool members from struct page,
not on moving the page pool allocation code from net to mm. That can be
done later if needed.
There is still a lot of work to do to remove the dependency on struct
page in the network subsystem. I will continue working on this after
this base patchset is merged.
This patchset is based on mm tree's mm-unstable branch.
Byungchul Park (19):
netmem: rename struct net_iov to struct netmem_desc
netmem: introduce netmem alloc/put API to wrap page alloc/put API
page_pool: use netmem alloc/put API in __page_pool_alloc_page_order()
page_pool: rename __page_pool_alloc_page_order() to
__page_pool_alloc_large_netmem()
page_pool: use netmem alloc/put API in __page_pool_alloc_pages_slow()
page_pool: rename page_pool_return_page() to page_pool_return_netmem()
page_pool: use netmem alloc/put API in page_pool_return_netmem()
page_pool: rename __page_pool_release_page_dma() to
__page_pool_release_netmem_dma()
page_pool: rename __page_pool_put_page() to __page_pool_put_netmem()
page_pool: rename __page_pool_alloc_pages_slow() to
__page_pool_alloc_netmems_slow()
mlx4: use netmem descriptor and API for page pool
netmem: introduce page_pool_recycle_direct_netmem()
page_pool: expand scope of is_pp_{netmem,page}() to global
mm: page_alloc: do not directly access page->pp_magic but use
is_pp_page()
mlx5: use netmem descriptor and API for page pool
netmem: use _Generic to cover const casting for page_to_netmem()
netmem: remove __netmem_get_pp()
page_pool: make page_pool_get_dma_addr() just wrap
page_pool_get_dma_addr_netmem()
mm, netmem: remove the page pool members in struct page
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 46 ++++----
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 8 +-
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 4 +-
drivers/net/ethernet/mellanox/mlx5/core/en.h | 4 +-
.../net/ethernet/mellanox/mlx5/core/en/xdp.c | 18 +--
.../net/ethernet/mellanox/mlx5/core/en/xdp.h | 2 +-
.../net/ethernet/mellanox/mlx5/core/en_main.c | 15 ++-
.../net/ethernet/mellanox/mlx5/core/en_rx.c | 66 +++++------
include/linux/mm_types.h | 13 +--
include/linux/skbuff.h | 18 ++-
include/net/netmem.h | 88 +++++----------
include/net/netmem_type.h | 22 ++++
include/net/page_pool/helpers.h | 17 ++-
include/net/page_pool/memory_provider.h | 6 +-
include/net/page_pool/types.h | 2 +
io_uring/zcrx.c | 42 +++----
mm/page_alloc.c | 5 +-
net/core/devmem.c | 14 +--
net/core/devmem.h | 24 ++--
net/core/page_pool.c | 106 ++++++++++--------
net/core/skbuff.c | 5 -
net/ipv4/tcp.c | 2 +-
22 files changed, 272 insertions(+), 255 deletions(-)
create mode 100644 include/net/netmem_type.h
base-commit: fd93b3350b4314eebd8fbf0fea3ca7fe48d777e3
--
2.17.1
* [RFC 01/19] netmem: rename struct net_iov to struct netmem_desc
2025-05-09 11:51 [RFC 00/19] Split netmem from struct page Byungchul Park
@ 2025-05-09 11:51 ` Byungchul Park
2025-05-12 13:11 ` Pavel Begunkov
2025-05-09 11:51 ` [RFC 02/19] netmem: introduce netmem alloc/put API to wrap page alloc/put API Byungchul Park
` (18 subsequent siblings)
19 siblings, 1 reply; 53+ messages in thread
From: Byungchul Park @ 2025-05-09 11:51 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
To simplify struct page, the page pool members of struct page should be
moved out into their own descriptor, allowing these members to be
removed from struct page.
Reuse struct net_iov, which already mirrors the page pool members, for
system memory as well.
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
include/linux/skbuff.h | 4 +--
include/net/netmem.h | 20 ++++++------
include/net/page_pool/memory_provider.h | 6 ++--
io_uring/zcrx.c | 42 ++++++++++++-------------
net/core/devmem.c | 14 ++++-----
net/core/devmem.h | 24 +++++++-------
net/core/page_pool.c | 6 ++--
net/ipv4/tcp.c | 2 +-
8 files changed, 59 insertions(+), 59 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b974a277975a8..bf67c47319a56 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3598,10 +3598,10 @@ static inline bool skb_frag_is_net_iov(const skb_frag_t *frag)
* skb_frag_net_iov - retrieve the net_iov referred to by fragment
* @frag: the fragment
*
- * Return: the &struct net_iov associated with @frag. Returns NULL if this
+ * Return: the &struct netmem_desc associated with @frag. Returns NULL if this
* frag has no associated net_iov.
*/
-static inline struct net_iov *skb_frag_net_iov(const skb_frag_t *frag)
+static inline struct netmem_desc *skb_frag_net_iov(const skb_frag_t *frag)
{
if (!skb_frag_is_net_iov(frag))
return NULL;
diff --git a/include/net/netmem.h b/include/net/netmem.h
index c61d5b21e7b42..45c209d6cc689 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -20,7 +20,7 @@ DECLARE_STATIC_KEY_FALSE(page_pool_mem_providers);
*/
#define NET_IOV 0x01UL
-struct net_iov {
+struct netmem_desc {
unsigned long __unused_padding;
unsigned long pp_magic;
struct page_pool *pp;
@@ -31,7 +31,7 @@ struct net_iov {
struct net_iov_area {
/* Array of net_iovs for this area. */
- struct net_iov *niovs;
+ struct netmem_desc *niovs;
size_t num_niovs;
/* Offset into the dma-buf where this chunk starts. */
@@ -56,19 +56,19 @@ struct net_iov_area {
*/
#define NET_IOV_ASSERT_OFFSET(pg, iov) \
static_assert(offsetof(struct page, pg) == \
- offsetof(struct net_iov, iov))
+ offsetof(struct netmem_desc, iov))
NET_IOV_ASSERT_OFFSET(pp_magic, pp_magic);
NET_IOV_ASSERT_OFFSET(pp, pp);
NET_IOV_ASSERT_OFFSET(dma_addr, dma_addr);
NET_IOV_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
#undef NET_IOV_ASSERT_OFFSET
-static inline struct net_iov_area *net_iov_owner(const struct net_iov *niov)
+static inline struct net_iov_area *net_iov_owner(const struct netmem_desc *niov)
{
return niov->owner;
}
-static inline unsigned int net_iov_idx(const struct net_iov *niov)
+static inline unsigned int net_iov_idx(const struct netmem_desc *niov)
{
return niov - net_iov_owner(niov)->niovs;
}
@@ -118,17 +118,17 @@ static inline struct page *netmem_to_page(netmem_ref netmem)
return __netmem_to_page(netmem);
}
-static inline struct net_iov *netmem_to_net_iov(netmem_ref netmem)
+static inline struct netmem_desc *netmem_to_net_iov(netmem_ref netmem)
{
if (netmem_is_net_iov(netmem))
- return (struct net_iov *)((__force unsigned long)netmem &
+ return (struct netmem_desc *)((__force unsigned long)netmem &
~NET_IOV);
DEBUG_NET_WARN_ON_ONCE(true);
return NULL;
}
-static inline netmem_ref net_iov_to_netmem(struct net_iov *niov)
+static inline netmem_ref net_iov_to_netmem(struct netmem_desc *niov)
{
return (__force netmem_ref)((unsigned long)niov | NET_IOV);
}
@@ -168,9 +168,9 @@ static inline unsigned long netmem_pfn_trace(netmem_ref netmem)
return page_to_pfn(netmem_to_page(netmem));
}
-static inline struct net_iov *__netmem_clear_lsb(netmem_ref netmem)
+static inline struct netmem_desc *__netmem_clear_lsb(netmem_ref netmem)
{
- return (struct net_iov *)((__force unsigned long)netmem & ~NET_IOV);
+ return (struct netmem_desc *)((__force unsigned long)netmem & ~NET_IOV);
}
/**
diff --git a/include/net/page_pool/memory_provider.h b/include/net/page_pool/memory_provider.h
index ada4f968960ae..83aac2e9692c0 100644
--- a/include/net/page_pool/memory_provider.h
+++ b/include/net/page_pool/memory_provider.h
@@ -19,9 +19,9 @@ struct memory_provider_ops {
void (*uninstall)(void *mp_priv, struct netdev_rx_queue *rxq);
};
-bool net_mp_niov_set_dma_addr(struct net_iov *niov, dma_addr_t addr);
-void net_mp_niov_set_page_pool(struct page_pool *pool, struct net_iov *niov);
-void net_mp_niov_clear_page_pool(struct net_iov *niov);
+bool net_mp_niov_set_dma_addr(struct netmem_desc *niov, dma_addr_t addr);
+void net_mp_niov_set_page_pool(struct page_pool *pool, struct netmem_desc *niov);
+void net_mp_niov_clear_page_pool(struct netmem_desc *niov);
int net_mp_open_rxq(struct net_device *dev, unsigned ifq_idx,
struct pp_memory_provider_params *p);
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 0f46e0404c045..c0b66039d43df 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -34,7 +34,7 @@ static void __io_zcrx_unmap_area(struct io_zcrx_ifq *ifq,
int i;
for (i = 0; i < nr_mapped; i++) {
- struct net_iov *niov = &area->nia.niovs[i];
+ struct netmem_desc *niov = &area->nia.niovs[i];
dma_addr_t dma;
dma = page_pool_get_dma_addr_netmem(net_iov_to_netmem(niov));
@@ -55,7 +55,7 @@ static int io_zcrx_map_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
int i;
for (i = 0; i < area->nia.num_niovs; i++) {
- struct net_iov *niov = &area->nia.niovs[i];
+ struct netmem_desc *niov = &area->nia.niovs[i];
dma_addr_t dma;
dma = dma_map_page_attrs(ifq->dev, area->pages[i], 0, PAGE_SIZE,
@@ -79,7 +79,7 @@ static int io_zcrx_map_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
}
static void io_zcrx_sync_for_device(const struct page_pool *pool,
- struct net_iov *niov)
+ struct netmem_desc *niov)
{
#if defined(CONFIG_HAS_DMA) && defined(CONFIG_DMA_NEED_SYNC)
dma_addr_t dma_addr;
@@ -106,21 +106,21 @@ struct io_zcrx_args {
static const struct memory_provider_ops io_uring_pp_zc_ops;
-static inline struct io_zcrx_area *io_zcrx_iov_to_area(const struct net_iov *niov)
+static inline struct io_zcrx_area *io_zcrx_iov_to_area(const struct netmem_desc *niov)
{
struct net_iov_area *owner = net_iov_owner(niov);
return container_of(owner, struct io_zcrx_area, nia);
}
-static inline atomic_t *io_get_user_counter(struct net_iov *niov)
+static inline atomic_t *io_get_user_counter(struct netmem_desc *niov)
{
struct io_zcrx_area *area = io_zcrx_iov_to_area(niov);
return &area->user_refs[net_iov_idx(niov)];
}
-static bool io_zcrx_put_niov_uref(struct net_iov *niov)
+static bool io_zcrx_put_niov_uref(struct netmem_desc *niov)
{
atomic_t *uref = io_get_user_counter(niov);
@@ -130,12 +130,12 @@ static bool io_zcrx_put_niov_uref(struct net_iov *niov)
return true;
}
-static void io_zcrx_get_niov_uref(struct net_iov *niov)
+static void io_zcrx_get_niov_uref(struct netmem_desc *niov)
{
atomic_inc(io_get_user_counter(niov));
}
-static inline struct page *io_zcrx_iov_page(const struct net_iov *niov)
+static inline struct page *io_zcrx_iov_page(const struct netmem_desc *niov)
{
struct io_zcrx_area *area = io_zcrx_iov_to_area(niov);
@@ -242,7 +242,7 @@ static int io_zcrx_create_area(struct io_zcrx_ifq *ifq,
goto err;
for (i = 0; i < nr_iovs; i++) {
- struct net_iov *niov = &area->nia.niovs[i];
+ struct netmem_desc *niov = &area->nia.niovs[i];
niov->owner = &area->nia;
area->freelist[i] = i;
@@ -435,7 +435,7 @@ void io_unregister_zcrx_ifqs(struct io_ring_ctx *ctx)
io_zcrx_ifq_free(ifq);
}
-static struct net_iov *__io_zcrx_get_free_niov(struct io_zcrx_area *area)
+static struct netmem_desc *__io_zcrx_get_free_niov(struct io_zcrx_area *area)
{
unsigned niov_idx;
@@ -445,7 +445,7 @@ static struct net_iov *__io_zcrx_get_free_niov(struct io_zcrx_area *area)
return &area->nia.niovs[niov_idx];
}
-static void io_zcrx_return_niov_freelist(struct net_iov *niov)
+static void io_zcrx_return_niov_freelist(struct netmem_desc *niov)
{
struct io_zcrx_area *area = io_zcrx_iov_to_area(niov);
@@ -454,7 +454,7 @@ static void io_zcrx_return_niov_freelist(struct net_iov *niov)
spin_unlock_bh(&area->freelist_lock);
}
-static void io_zcrx_return_niov(struct net_iov *niov)
+static void io_zcrx_return_niov(struct netmem_desc *niov)
{
netmem_ref netmem = net_iov_to_netmem(niov);
@@ -476,7 +476,7 @@ static void io_zcrx_scrub(struct io_zcrx_ifq *ifq)
/* Reclaim back all buffers given to the user space. */
for (i = 0; i < area->nia.num_niovs; i++) {
- struct net_iov *niov = &area->nia.niovs[i];
+ struct netmem_desc *niov = &area->nia.niovs[i];
int nr;
if (!atomic_read(io_get_user_counter(niov)))
@@ -532,7 +532,7 @@ static void io_zcrx_ring_refill(struct page_pool *pp,
do {
struct io_uring_zcrx_rqe *rqe = io_zcrx_get_rqe(ifq, mask);
struct io_zcrx_area *area;
- struct net_iov *niov;
+ struct netmem_desc *niov;
unsigned niov_idx, area_idx;
area_idx = rqe->off >> IORING_ZCRX_AREA_SHIFT;
@@ -573,7 +573,7 @@ static void io_zcrx_refill_slow(struct page_pool *pp, struct io_zcrx_ifq *ifq)
spin_lock_bh(&area->freelist_lock);
while (area->free_count && pp->alloc.count < PP_ALLOC_CACHE_REFILL) {
- struct net_iov *niov = __io_zcrx_get_free_niov(area);
+ struct netmem_desc *niov = __io_zcrx_get_free_niov(area);
netmem_ref netmem = net_iov_to_netmem(niov);
net_mp_niov_set_page_pool(pp, niov);
@@ -604,7 +604,7 @@ static netmem_ref io_pp_zc_alloc_netmems(struct page_pool *pp, gfp_t gfp)
static bool io_pp_zc_release_netmem(struct page_pool *pp, netmem_ref netmem)
{
- struct net_iov *niov;
+ struct netmem_desc *niov;
if (WARN_ON_ONCE(!netmem_is_net_iov(netmem)))
return false;
@@ -678,7 +678,7 @@ static const struct memory_provider_ops io_uring_pp_zc_ops = {
.uninstall = io_pp_uninstall,
};
-static bool io_zcrx_queue_cqe(struct io_kiocb *req, struct net_iov *niov,
+static bool io_zcrx_queue_cqe(struct io_kiocb *req, struct netmem_desc *niov,
struct io_zcrx_ifq *ifq, int off, int len)
{
struct io_uring_zcrx_cqe *rcqe;
@@ -701,9 +701,9 @@ static bool io_zcrx_queue_cqe(struct io_kiocb *req, struct net_iov *niov,
return true;
}
-static struct net_iov *io_zcrx_alloc_fallback(struct io_zcrx_area *area)
+static struct netmem_desc *io_zcrx_alloc_fallback(struct io_zcrx_area *area)
{
- struct net_iov *niov = NULL;
+ struct netmem_desc *niov = NULL;
spin_lock_bh(&area->freelist_lock);
if (area->free_count)
@@ -726,7 +726,7 @@ static ssize_t io_zcrx_copy_chunk(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
while (len) {
size_t copy_size = min_t(size_t, PAGE_SIZE, len);
const int dst_off = 0;
- struct net_iov *niov;
+ struct netmem_desc *niov;
struct page *dst_page;
void *dst_addr;
@@ -784,7 +784,7 @@ static int io_zcrx_copy_frag(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
static int io_zcrx_recv_frag(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
const skb_frag_t *frag, int off, int len)
{
- struct net_iov *niov;
+ struct netmem_desc *niov;
if (unlikely(!skb_frag_is_net_iov(frag)))
return io_zcrx_copy_frag(req, ifq, frag, off, len);
diff --git a/net/core/devmem.c b/net/core/devmem.c
index 6e27a47d04935..5568478dd2ff6 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -28,7 +28,7 @@ static DEFINE_XARRAY_FLAGS(net_devmem_dmabuf_bindings, XA_FLAGS_ALLOC1);
static const struct memory_provider_ops dmabuf_devmem_ops;
-bool net_is_devmem_iov(struct net_iov *niov)
+bool net_is_devmem_iov(struct netmem_desc *niov)
{
return niov->pp->mp_ops == &dmabuf_devmem_ops;
}
@@ -43,7 +43,7 @@ static void net_devmem_dmabuf_free_chunk_owner(struct gen_pool *genpool,
kfree(owner);
}
-static dma_addr_t net_devmem_get_dma_addr(const struct net_iov *niov)
+static dma_addr_t net_devmem_get_dma_addr(const struct netmem_desc *niov)
{
struct dmabuf_genpool_chunk_owner *owner;
@@ -74,12 +74,12 @@ void __net_devmem_dmabuf_binding_free(struct net_devmem_dmabuf_binding *binding)
kfree(binding);
}
-struct net_iov *
+struct netmem_desc *
net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding)
{
struct dmabuf_genpool_chunk_owner *owner;
unsigned long dma_addr;
- struct net_iov *niov;
+ struct netmem_desc *niov;
ssize_t offset;
ssize_t index;
@@ -99,7 +99,7 @@ net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding)
return niov;
}
-void net_devmem_free_dmabuf(struct net_iov *niov)
+void net_devmem_free_dmabuf(struct netmem_desc *niov)
{
struct net_devmem_dmabuf_binding *binding = net_devmem_iov_binding(niov);
unsigned long dma_addr = net_devmem_get_dma_addr(niov);
@@ -233,7 +233,7 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
dma_addr_t dma_addr = sg_dma_address(sg);
struct dmabuf_genpool_chunk_owner *owner;
size_t len = sg_dma_len(sg);
- struct net_iov *niov;
+ struct netmem_desc *niov;
owner = kzalloc_node(sizeof(*owner), GFP_KERNEL,
dev_to_node(&dev->dev));
@@ -319,7 +319,7 @@ int mp_dmabuf_devmem_init(struct page_pool *pool)
netmem_ref mp_dmabuf_devmem_alloc_netmems(struct page_pool *pool, gfp_t gfp)
{
struct net_devmem_dmabuf_binding *binding = pool->mp_priv;
- struct net_iov *niov;
+ struct netmem_desc *niov;
netmem_ref netmem;
niov = net_devmem_alloc_dmabuf(binding);
diff --git a/net/core/devmem.h b/net/core/devmem.h
index 7fc158d527293..314ab0acdf716 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -70,7 +70,7 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
struct netlink_ext_ack *extack);
static inline struct dmabuf_genpool_chunk_owner *
-net_devmem_iov_to_chunk_owner(const struct net_iov *niov)
+net_devmem_iov_to_chunk_owner(const struct netmem_desc *niov)
{
struct net_iov_area *owner = net_iov_owner(niov);
@@ -78,17 +78,17 @@ net_devmem_iov_to_chunk_owner(const struct net_iov *niov)
}
static inline struct net_devmem_dmabuf_binding *
-net_devmem_iov_binding(const struct net_iov *niov)
+net_devmem_iov_binding(const struct netmem_desc *niov)
{
return net_devmem_iov_to_chunk_owner(niov)->binding;
}
-static inline u32 net_devmem_iov_binding_id(const struct net_iov *niov)
+static inline u32 net_devmem_iov_binding_id(const struct netmem_desc *niov)
{
return net_devmem_iov_binding(niov)->id;
}
-static inline unsigned long net_iov_virtual_addr(const struct net_iov *niov)
+static inline unsigned long net_iov_virtual_addr(const struct netmem_desc *niov)
{
struct net_iov_area *owner = net_iov_owner(niov);
@@ -111,11 +111,11 @@ net_devmem_dmabuf_binding_put(struct net_devmem_dmabuf_binding *binding)
__net_devmem_dmabuf_binding_free(binding);
}
-struct net_iov *
+struct netmem_desc *
net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding);
-void net_devmem_free_dmabuf(struct net_iov *ppiov);
+void net_devmem_free_dmabuf(struct netmem_desc *ppiov);
-bool net_is_devmem_iov(struct net_iov *niov);
+bool net_is_devmem_iov(struct netmem_desc *niov);
#else
struct net_devmem_dmabuf_binding;
@@ -146,27 +146,27 @@ net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
return -EOPNOTSUPP;
}
-static inline struct net_iov *
+static inline struct netmem_desc *
net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding)
{
return NULL;
}
-static inline void net_devmem_free_dmabuf(struct net_iov *ppiov)
+static inline void net_devmem_free_dmabuf(struct netmem_desc *ppiov)
{
}
-static inline unsigned long net_iov_virtual_addr(const struct net_iov *niov)
+static inline unsigned long net_iov_virtual_addr(const struct netmem_desc *niov)
{
return 0;
}
-static inline u32 net_devmem_iov_binding_id(const struct net_iov *niov)
+static inline u32 net_devmem_iov_binding_id(const struct netmem_desc *niov)
{
return 0;
}
-static inline bool net_is_devmem_iov(struct net_iov *niov)
+static inline bool net_is_devmem_iov(struct netmem_desc *niov)
{
return false;
}
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 7745ad924ae2d..575fdab337414 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1198,7 +1198,7 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid)
}
EXPORT_SYMBOL(page_pool_update_nid);
-bool net_mp_niov_set_dma_addr(struct net_iov *niov, dma_addr_t addr)
+bool net_mp_niov_set_dma_addr(struct netmem_desc *niov, dma_addr_t addr)
{
return page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov), addr);
}
@@ -1206,7 +1206,7 @@ bool net_mp_niov_set_dma_addr(struct net_iov *niov, dma_addr_t addr)
/* Associate a niov with a page pool. Should follow with a matching
* net_mp_niov_clear_page_pool()
*/
-void net_mp_niov_set_page_pool(struct page_pool *pool, struct net_iov *niov)
+void net_mp_niov_set_page_pool(struct page_pool *pool, struct netmem_desc *niov)
{
netmem_ref netmem = net_iov_to_netmem(niov);
@@ -1219,7 +1219,7 @@ void net_mp_niov_set_page_pool(struct page_pool *pool, struct net_iov *niov)
/* Disassociate a niov from a page pool. Should only be used in the
* ->release_netmem() path.
*/
-void net_mp_niov_clear_page_pool(struct net_iov *niov)
+void net_mp_niov_clear_page_pool(struct netmem_desc *niov)
{
netmem_ref netmem = net_iov_to_netmem(niov);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 6edc441b37023..76199c832ef82 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2483,7 +2483,7 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
*/
for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
- struct net_iov *niov;
+ struct netmem_desc *niov;
u64 frag_offset;
int end;
--
2.17.1
* [RFC 02/19] netmem: introduce netmem alloc/put API to wrap page alloc/put API
2025-05-09 11:51 [RFC 00/19] Split netmem from struct page Byungchul Park
2025-05-09 11:51 ` [RFC 01/19] netmem: rename struct net_iov to struct netmem_desc Byungchul Park
@ 2025-05-09 11:51 ` Byungchul Park
2025-05-09 13:39 ` Mina Almasry
2025-05-09 11:51 ` [RFC 03/19] page_pool: use netmem alloc/put API in __page_pool_alloc_page_order() Byungchul Park
` (17 subsequent siblings)
19 siblings, 1 reply; 53+ messages in thread
From: Byungchul Park @ 2025-05-09 11:51 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
To eliminate the use of struct page in page pool, the page pool code
should use the netmem descriptor and API instead.
As part of that work, introduce a netmem alloc/put API so the code can
use it rather than the struct page equivalents.
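For illustration, a minimal usage sketch of the new wrappers (a
hypothetical caller, not code from this series):

    netmem_ref netmem;

    /* allocate an order-0 netmem on any NUMA node */
    netmem = alloc_netmems_node(NUMA_NO_NODE, GFP_KERNEL, 0);
    if (!netmem)
            return -ENOMEM;

    /* ... use the buffer ... */

    /* drop the reference; this boils down to put_page() on the
     * backing page
     */
    put_netmem(netmem);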
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
include/net/netmem.h | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/include/net/netmem.h b/include/net/netmem.h
index 45c209d6cc689..c87a604e980b9 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -138,6 +138,24 @@ static inline netmem_ref page_to_netmem(struct page *page)
return (__force netmem_ref)page;
}
+static inline netmem_ref alloc_netmems_node(int nid, gfp_t gfp_mask,
+ unsigned int order)
+{
+ return page_to_netmem(alloc_pages_node(nid, gfp_mask, order));
+}
+
+static inline unsigned long alloc_netmems_bulk_node(gfp_t gfp, int nid,
+ unsigned long nr_netmems, netmem_ref *netmem_array)
+{
+ return alloc_pages_bulk_node(gfp, nid, nr_netmems,
+ (struct page **)netmem_array);
+}
+
+static inline void put_netmem(netmem_ref netmem)
+{
+ put_page(netmem_to_page(netmem));
+}
+
/**
* virt_to_netmem - convert virtual memory pointer to a netmem reference
* @data: host memory pointer to convert
--
2.17.1
* [RFC 03/19] page_pool: use netmem alloc/put API in __page_pool_alloc_page_order()
2025-05-09 11:51 [RFC 00/19] Split netmem from struct page Byungchul Park
2025-05-09 11:51 ` [RFC 01/19] netmem: rename struct net_iov to struct netmem_desc Byungchul Park
2025-05-09 11:51 ` [RFC 02/19] netmem: introduce netmem alloc/put API to wrap page alloc/put API Byungchul Park
@ 2025-05-09 11:51 ` Byungchul Park
2025-05-09 11:51 ` [RFC 04/19] page_pool: rename __page_pool_alloc_page_order() to __page_pool_alloc_large_netmem() Byungchul Park
` (16 subsequent siblings)
19 siblings, 0 replies; 53+ messages in thread
From: Byungchul Park @ 2025-05-09 11:51 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
Use the netmem alloc/put API instead of the page alloc/put API in
__page_pool_alloc_page_order(), and make it return netmem_ref instead
of struct page *.
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
net/core/page_pool.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 575fdab337414..9f5e07a15f707 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -498,29 +498,29 @@ static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem)
return false;
}
-static struct page *__page_pool_alloc_page_order(struct page_pool *pool,
+static netmem_ref __page_pool_alloc_page_order(struct page_pool *pool,
gfp_t gfp)
{
- struct page *page;
+ netmem_ref netmem;
gfp |= __GFP_COMP;
- page = alloc_pages_node(pool->p.nid, gfp, pool->p.order);
- if (unlikely(!page))
- return NULL;
+ netmem = alloc_netmems_node(pool->p.nid, gfp, pool->p.order);
+ if (unlikely(!netmem))
+ return 0;
- if (pool->dma_map && unlikely(!page_pool_dma_map(pool, page_to_netmem(page)))) {
- put_page(page);
- return NULL;
+ if (pool->dma_map && unlikely(!page_pool_dma_map(pool, netmem))) {
+ put_netmem(netmem);
+ return 0;
}
alloc_stat_inc(pool, slow_high_order);
- page_pool_set_pp_info(pool, page_to_netmem(page));
+ page_pool_set_pp_info(pool, netmem);
/* Track how many pages are held 'in-flight' */
pool->pages_state_hold_cnt++;
- trace_page_pool_state_hold(pool, page_to_netmem(page),
+ trace_page_pool_state_hold(pool, netmem,
pool->pages_state_hold_cnt);
- return page;
+ return netmem;
}
/* slow path */
@@ -535,7 +535,7 @@ static noinline netmem_ref __page_pool_alloc_pages_slow(struct page_pool *pool,
/* Don't support bulk alloc for high-order pages */
if (unlikely(pp_order))
- return page_to_netmem(__page_pool_alloc_page_order(pool, gfp));
+ return __page_pool_alloc_page_order(pool, gfp);
/* Unnecessary as alloc cache is empty, but guarantees zero count */
if (unlikely(pool->alloc.count > 0))
--
2.17.1
* [RFC 04/19] page_pool: rename __page_pool_alloc_page_order() to __page_pool_alloc_large_netmem()
2025-05-09 11:51 [RFC 00/19] Split netmem from struct page Byungchul Park
` (2 preceding siblings ...)
2025-05-09 11:51 ` [RFC 03/19] page_pool: use netmem alloc/put API in __page_pool_alloc_page_order() Byungchul Park
@ 2025-05-09 11:51 ` Byungchul Park
2025-05-09 11:51 ` [RFC 05/19] page_pool: use netmem alloc/put API in __page_pool_alloc_pages_slow() Byungchul Park
` (15 subsequent siblings)
19 siblings, 0 replies; 53+ messages in thread
From: Byungchul Park @ 2025-05-09 11:51 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
Now that __page_pool_alloc_page_order() uses netmem alloc/put API, not
page alloc/put API, rename it to __page_pool_alloc_large_netmem() to
reflect what it does.
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
net/core/page_pool.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 9f5e07a15f707..c03caa11fc606 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -498,7 +498,7 @@ static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem)
return false;
}
-static netmem_ref __page_pool_alloc_page_order(struct page_pool *pool,
+static netmem_ref __page_pool_alloc_large_netmem(struct page_pool *pool,
gfp_t gfp)
{
netmem_ref netmem;
@@ -535,7 +535,7 @@ static noinline netmem_ref __page_pool_alloc_pages_slow(struct page_pool *pool,
/* Don't support bulk alloc for high-order pages */
if (unlikely(pp_order))
- return __page_pool_alloc_page_order(pool, gfp);
+ return __page_pool_alloc_large_netmem(pool, gfp);
/* Unnecessary as alloc cache is empty, but guarantees zero count */
if (unlikely(pool->alloc.count > 0))
--
2.17.1
* [RFC 05/19] page_pool: use netmem alloc/put API in __page_pool_alloc_pages_slow()
2025-05-09 11:51 [RFC 00/19] Split netmem from struct page Byungchul Park
` (3 preceding siblings ...)
2025-05-09 11:51 ` [RFC 04/19] page_pool: rename __page_pool_alloc_page_order() to __page_pool_alloc_large_netmem() Byungchul Park
@ 2025-05-09 11:51 ` Byungchul Park
2025-05-09 11:51 ` [RFC 06/19] page_pool: rename page_pool_return_page() to page_pool_return_netmem() Byungchul Park
` (14 subsequent siblings)
19 siblings, 0 replies; 53+ messages in thread
From: Byungchul Park @ 2025-05-09 11:51 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
Use netmem alloc/put API instead of page alloc/put API in
__page_pool_alloc_pages_slow().
While at it, improve some comments.
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
net/core/page_pool.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index c03caa11fc606..57ad133e6dc8c 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -531,7 +531,7 @@ static noinline netmem_ref __page_pool_alloc_pages_slow(struct page_pool *pool,
unsigned int pp_order = pool->p.order;
bool dma_map = pool->dma_map;
netmem_ref netmem;
- int i, nr_pages;
+ int i, nr_netmems;
/* Don't support bulk alloc for high-order pages */
if (unlikely(pp_order))
@@ -541,21 +541,21 @@ static noinline netmem_ref __page_pool_alloc_pages_slow(struct page_pool *pool,
if (unlikely(pool->alloc.count > 0))
return pool->alloc.cache[--pool->alloc.count];
- /* Mark empty alloc.cache slots "empty" for alloc_pages_bulk */
+ /* Mark empty alloc.cache slots "empty" for alloc_netmems_bulk_node() */
memset(&pool->alloc.cache, 0, sizeof(void *) * bulk);
- nr_pages = alloc_pages_bulk_node(gfp, pool->p.nid, bulk,
- (struct page **)pool->alloc.cache);
- if (unlikely(!nr_pages))
+ nr_netmems = alloc_netmems_bulk_node(gfp, pool->p.nid, bulk,
+ pool->alloc.cache);
+ if (unlikely(!nr_netmems))
return 0;
- /* Pages have been filled into alloc.cache array, but count is zero and
- * page element have not been (possibly) DMA mapped.
+ /* Netmems have been filled into alloc.cache array, but count is
+ * zero and elements have not been (possibly) DMA mapped.
*/
- for (i = 0; i < nr_pages; i++) {
+ for (i = 0; i < nr_netmems; i++) {
netmem = pool->alloc.cache[i];
if (dma_map && unlikely(!page_pool_dma_map(pool, netmem))) {
- put_page(netmem_to_page(netmem));
+ put_netmem(netmem);
continue;
}
@@ -567,7 +567,7 @@ static noinline netmem_ref __page_pool_alloc_pages_slow(struct page_pool *pool,
pool->pages_state_hold_cnt);
}
- /* Return last page */
+ /* Return the last netmem */
if (likely(pool->alloc.count > 0)) {
netmem = pool->alloc.cache[--pool->alloc.count];
alloc_stat_inc(pool, slow);
@@ -575,7 +575,8 @@ static noinline netmem_ref __page_pool_alloc_pages_slow(struct page_pool *pool,
netmem = 0;
}
- /* When page just alloc'ed is should/must have refcnt 1. */
+ /* When a netmem has been just allocated, it should/must have
+ * refcnt 1. */
return netmem;
}
--
2.17.1
* [RFC 06/19] page_pool: rename page_pool_return_page() to page_pool_return_netmem()
2025-05-09 11:51 [RFC 00/19] Split netmem from struct page Byungchul Park
` (4 preceding siblings ...)
2025-05-09 11:51 ` [RFC 05/19] page_pool: use netmem alloc/put API in __page_pool_alloc_pages_slow() Byungchul Park
@ 2025-05-09 11:51 ` Byungchul Park
2025-05-09 11:51 ` [RFC 07/19] page_pool: use netmem alloc/put API in page_pool_return_netmem() Byungchul Park
` (13 subsequent siblings)
19 siblings, 0 replies; 53+ messages in thread
From: Byungchul Park @ 2025-05-09 11:51 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
Now that page_pool_return_page() is for returning netmem, not struct
page, rename it to page_pool_return_netmem() to reflect what it does.
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
net/core/page_pool.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 57ad133e6dc8c..bd43026ed7232 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -374,7 +374,7 @@ struct page_pool *page_pool_create(const struct page_pool_params *params)
}
EXPORT_SYMBOL(page_pool_create);
-static void page_pool_return_page(struct page_pool *pool, netmem_ref netmem);
+static void page_pool_return_netmem(struct page_pool *pool, netmem_ref netmem);
static noinline netmem_ref page_pool_refill_alloc_cache(struct page_pool *pool)
{
@@ -412,7 +412,7 @@ static noinline netmem_ref page_pool_refill_alloc_cache(struct page_pool *pool)
* (2) break out to fallthrough to alloc_pages_node.
* This limit stress on page buddy alloactor.
*/
- page_pool_return_page(pool, netmem);
+ page_pool_return_netmem(pool, netmem);
alloc_stat_inc(pool, waive);
netmem = 0;
break;
@@ -679,7 +679,7 @@ static __always_inline void __page_pool_release_page_dma(struct page_pool *pool,
* a regular page (that will eventually be returned to the normal
* page-allocator via put_page).
*/
-void page_pool_return_page(struct page_pool *pool, netmem_ref netmem)
+static void page_pool_return_netmem(struct page_pool *pool, netmem_ref netmem)
{
int count;
bool put;
@@ -796,7 +796,7 @@ __page_pool_put_page(struct page_pool *pool, netmem_ref netmem,
* will be invoking put_page.
*/
recycle_stat_inc(pool, released_refcnt);
- page_pool_return_page(pool, netmem);
+ page_pool_return_netmem(pool, netmem);
return 0;
}
@@ -835,7 +835,7 @@ void page_pool_put_unrefed_netmem(struct page_pool *pool, netmem_ref netmem,
if (netmem && !page_pool_recycle_in_ring(pool, netmem)) {
/* Cache full, fallback to free pages */
recycle_stat_inc(pool, ring_full);
- page_pool_return_page(pool, netmem);
+ page_pool_return_netmem(pool, netmem);
}
}
EXPORT_SYMBOL(page_pool_put_unrefed_netmem);
@@ -878,7 +878,7 @@ static void page_pool_recycle_ring_bulk(struct page_pool *pool,
* since put_page() with refcnt == 1 can be an expensive operation.
*/
for (; i < bulk_len; i++)
- page_pool_return_page(pool, bulk[i]);
+ page_pool_return_netmem(pool, bulk[i]);
}
/**
@@ -961,7 +961,7 @@ static netmem_ref page_pool_drain_frag(struct page_pool *pool,
return netmem;
}
- page_pool_return_page(pool, netmem);
+ page_pool_return_netmem(pool, netmem);
return 0;
}
@@ -975,7 +975,7 @@ static void page_pool_free_frag(struct page_pool *pool)
if (!netmem || page_pool_unref_netmem(netmem, drain_count))
return;
- page_pool_return_page(pool, netmem);
+ page_pool_return_netmem(pool, netmem);
}
netmem_ref page_pool_alloc_frag_netmem(struct page_pool *pool,
@@ -1042,7 +1042,7 @@ static void page_pool_empty_ring(struct page_pool *pool)
pr_crit("%s() page_pool refcnt %d violation\n",
__func__, netmem_ref_count(netmem));
- page_pool_return_page(pool, netmem);
+ page_pool_return_netmem(pool, netmem);
}
}
@@ -1075,7 +1075,7 @@ static void page_pool_empty_alloc_cache_once(struct page_pool *pool)
*/
while (pool->alloc.count) {
netmem = pool->alloc.cache[--pool->alloc.count];
- page_pool_return_page(pool, netmem);
+ page_pool_return_netmem(pool, netmem);
}
}
@@ -1194,7 +1194,7 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid)
/* Flush pool alloc cache, as refill will check NUMA node */
while (pool->alloc.count) {
netmem = pool->alloc.cache[--pool->alloc.count];
- page_pool_return_page(pool, netmem);
+ page_pool_return_netmem(pool, netmem);
}
}
EXPORT_SYMBOL(page_pool_update_nid);
--
2.17.1
* [RFC 07/19] page_pool: use netmem alloc/put API in page_pool_return_netmem()
2025-05-09 11:51 [RFC 00/19] Split netmem from struct page Byungchul Park
` (5 preceding siblings ...)
2025-05-09 11:51 ` [RFC 06/19] page_pool: rename page_pool_return_page() to page_pool_return_netmem() Byungchul Park
@ 2025-05-09 11:51 ` Byungchul Park
2025-05-09 11:51 ` [RFC 08/19] page_pool: rename __page_pool_release_page_dma() to __page_pool_release_netmem_dma() Byungchul Park
` (12 subsequent siblings)
19 siblings, 0 replies; 53+ messages in thread
From: Byungchul Park @ 2025-05-09 11:51 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
Use netmem alloc/put API, put_netmem(), instead of put_page() in
page_pool_return_netmem().
While at it, delete #include <linux/mm.h> since the last put_page() in
page_pool.c has just been removed by this patch.
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
net/core/page_pool.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index bd43026ed7232..311d0ef620ea1 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -20,7 +20,6 @@
#include <linux/dma-direction.h>
#include <linux/dma-mapping.h>
#include <linux/page-flags.h>
-#include <linux/mm.h> /* for put_page() */
#include <linux/poison.h>
#include <linux/ethtool.h>
#include <linux/netdevice.h>
@@ -677,7 +676,7 @@ static __always_inline void __page_pool_release_page_dma(struct page_pool *pool,
/* Disconnects a page (from a page_pool). API users can have a need
* to disconnect a page (from a page_pool), to allow it to be used as
* a regular page (that will eventually be returned to the normal
- * page-allocator via put_page).
+ * page-allocator via put_netmem() and then put_page()).
*/
static void page_pool_return_netmem(struct page_pool *pool, netmem_ref netmem)
{
@@ -698,7 +697,7 @@ static void page_pool_return_netmem(struct page_pool *pool, netmem_ref netmem)
if (put) {
page_pool_clear_pp_info(netmem);
- put_page(netmem_to_page(netmem));
+ put_netmem(netmem);
}
/* An optimization would be to call __free_pages(page, pool->p.order)
* knowing page is not part of page-cache (thus avoiding a
--
2.17.1
* [RFC 08/19] page_pool: rename __page_pool_release_page_dma() to __page_pool_release_netmem_dma()
2025-05-09 11:51 [RFC 00/19] Split netmem from struct page Byungchul Park
` (6 preceding siblings ...)
2025-05-09 11:51 ` [RFC 07/19] page_pool: use netmem alloc/put API in page_pool_return_netmem() Byungchul Park
@ 2025-05-09 11:51 ` Byungchul Park
2025-05-09 11:51 ` [RFC 09/19] page_pool: rename __page_pool_put_page() to __page_pool_put_netmem() Byungchul Park
` (11 subsequent siblings)
19 siblings, 0 replies; 53+ messages in thread
From: Byungchul Park @ 2025-05-09 11:51 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
Now that __page_pool_release_page_dma() is for releasing netmem, not
struct page, rename it to __page_pool_release_netmem_dma() to reflect
what it does.
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
net/core/page_pool.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 311d0ef620ea1..47164d561d1aa 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -653,7 +653,7 @@ void page_pool_clear_pp_info(netmem_ref netmem)
netmem_set_pp(netmem, NULL);
}
-static __always_inline void __page_pool_release_page_dma(struct page_pool *pool,
+static __always_inline void __page_pool_release_netmem_dma(struct page_pool *pool,
netmem_ref netmem)
{
dma_addr_t dma;
@@ -687,7 +687,7 @@ static void page_pool_return_netmem(struct page_pool *pool, netmem_ref netmem)
if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops)
put = pool->mp_ops->release_netmem(pool, netmem);
else
- __page_pool_release_page_dma(pool, netmem);
+ __page_pool_release_netmem_dma(pool, netmem);
/* This may be the last page returned, releasing the pool, so
* it is not safe to reference pool afterwards.
--
2.17.1
* [RFC 09/19] page_pool: rename __page_pool_put_page() to __page_pool_put_netmem()
2025-05-09 11:51 [RFC 00/19] Split netmem from struct page Byungchul Park
` (7 preceding siblings ...)
2025-05-09 11:51 ` [RFC 08/19] page_pool: rename __page_pool_release_page_dma() to __page_pool_release_netmem_dma() Byungchul Park
@ 2025-05-09 11:51 ` Byungchul Park
2025-05-09 11:51 ` [RFC 10/19] page_pool: rename __page_pool_alloc_pages_slow() to __page_pool_alloc_netmems_slow() Byungchul Park
` (10 subsequent siblings)
19 siblings, 0 replies; 53+ messages in thread
From: Byungchul Park @ 2025-05-09 11:51 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
Now that __page_pool_put_page() puts netmem, not struct page, rename it
to __page_pool_put_netmem() to reflect what it does.
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
net/core/page_pool.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 47164d561d1aa..f858a5518b7a4 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -755,7 +755,7 @@ static bool __page_pool_page_can_be_recycled(netmem_ref netmem)
* subsystem.
*/
static __always_inline netmem_ref
-__page_pool_put_page(struct page_pool *pool, netmem_ref netmem,
+__page_pool_put_netmem(struct page_pool *pool, netmem_ref netmem,
unsigned int dma_sync_size, bool allow_direct)
{
lockdep_assert_no_hardirq();
@@ -811,7 +811,7 @@ static bool page_pool_napi_local(const struct page_pool *pool)
/* Allow direct recycle if we have reasons to believe that we are
* in the same context as the consumer would run, so there's
* no possible race.
- * __page_pool_put_page() makes sure we're not in hardirq context
+ * __page_pool_put_netmem() makes sure we're not in hardirq context
* and interrupts are enabled prior to accessing the cache.
*/
cpuid = smp_processor_id();
@@ -830,7 +830,7 @@ void page_pool_put_unrefed_netmem(struct page_pool *pool, netmem_ref netmem,
allow_direct = page_pool_napi_local(pool);
netmem =
- __page_pool_put_page(pool, netmem, dma_sync_size, allow_direct);
+ __page_pool_put_netmem(pool, netmem, dma_sync_size, allow_direct);
if (netmem && !page_pool_recycle_in_ring(pool, netmem)) {
/* Cache full, fallback to free pages */
recycle_stat_inc(pool, ring_full);
@@ -931,7 +931,7 @@ void page_pool_put_netmem_bulk(netmem_ref *data, u32 count)
continue;
}
- netmem = __page_pool_put_page(pool, netmem, -1,
+ netmem = __page_pool_put_netmem(pool, netmem, -1,
allow_direct);
/* Approved for bulk recycling in ptr_ring cache */
if (netmem)
--
2.17.1
* [RFC 10/19] page_pool: rename __page_pool_alloc_pages_slow() to __page_pool_alloc_netmems_slow()
2025-05-09 11:51 [RFC 00/19] Split netmem from struct page Byungchul Park
` (8 preceding siblings ...)
2025-05-09 11:51 ` [RFC 09/19] page_pool: rename __page_pool_put_page() to __page_pool_put_netmem() Byungchul Park
@ 2025-05-09 11:51 ` Byungchul Park
2025-05-09 11:51 ` [RFC 11/19] mlx4: use netmem descriptor and API for page pool Byungchul Park
` (9 subsequent siblings)
19 siblings, 0 replies; 53+ messages in thread
From: Byungchul Park @ 2025-05-09 11:51 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
Now that __page_pool_alloc_pages_slow() is for allocating netmem, not
struct page, rename it to __page_pool_alloc_netmems_slow() to reflect
what it does.
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
net/core/page_pool.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index f858a5518b7a4..b61c1038f4c68 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -523,7 +523,7 @@ static netmem_ref __page_pool_alloc_large_netmem(struct page_pool *pool,
}
/* slow path */
-static noinline netmem_ref __page_pool_alloc_pages_slow(struct page_pool *pool,
+static noinline netmem_ref __page_pool_alloc_netmems_slow(struct page_pool *pool,
gfp_t gfp)
{
const int bulk = PP_ALLOC_CACHE_REFILL;
@@ -595,7 +595,7 @@ netmem_ref page_pool_alloc_netmems(struct page_pool *pool, gfp_t gfp)
if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops)
netmem = pool->mp_ops->alloc_netmems(pool, gfp);
else
- netmem = __page_pool_alloc_pages_slow(pool, gfp);
+ netmem = __page_pool_alloc_netmems_slow(pool, gfp);
return netmem;
}
EXPORT_SYMBOL(page_pool_alloc_netmems);
--
2.17.1
* [RFC 11/19] mlx4: use netmem descriptor and API for page pool
2025-05-09 11:51 [RFC 00/19] Split netmem from struct page Byungchul Park
` (9 preceding siblings ...)
2025-05-09 11:51 ` [RFC 10/19] page_pool: rename __page_pool_alloc_pages_slow() to __page_pool_alloc_netmems_slow() Byungchul Park
@ 2025-05-09 11:51 ` Byungchul Park
2025-05-09 11:51 ` [RFC 12/19] netmem: introduce page_pool_recycle_direct_netmem() Byungchul Park
` (8 subsequent siblings)
19 siblings, 0 replies; 53+ messages in thread
From: Byungchul Park @ 2025-05-09 11:51 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
To simplify struct page, each of its users needs to be separated out
into its own descriptor, and that work is ongoing for page pool.
Use the netmem descriptor and API for page pool in the mlx4 code.
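The conversion pattern, distilled from the diff below: fields that held
a struct page * now hold a netmem_ref, and page pool calls switch to
their _netmem variants:

    /* before */
    frags->page = page_pool_alloc_pages(ring->pp, gfp);
    dma = page_pool_get_dma_addr(frags->page);

    /* after */
    frags->netmem = page_pool_alloc_netmems(ring->pp, gfp);
    dma = page_pool_get_dma_addr_netmem(frags->netmem);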
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 46 +++++++++++---------
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 8 ++--
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 4 +-
3 files changed, 31 insertions(+), 27 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index b33285d755b90..82c24931fa443 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -62,18 +62,18 @@ static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
int i;
for (i = 0; i < priv->num_frags; i++, frags++) {
- if (!frags->page) {
- frags->page = page_pool_alloc_pages(ring->pp, gfp);
- if (!frags->page) {
+ if (!frags->netmem) {
+ frags->netmem = page_pool_alloc_netmems(ring->pp, gfp);
+ if (!frags->netmem) {
ring->alloc_fail++;
return -ENOMEM;
}
- page_pool_fragment_page(frags->page, 1);
+ page_pool_fragment_netmem(frags->netmem, 1);
frags->page_offset = priv->rx_headroom;
ring->rx_alloc_pages++;
}
- dma = page_pool_get_dma_addr(frags->page);
+ dma = page_pool_get_dma_addr_netmem(frags->netmem);
rx_desc->data[i].addr = cpu_to_be64(dma + frags->page_offset);
}
return 0;
@@ -83,10 +83,10 @@ static void mlx4_en_free_frag(const struct mlx4_en_priv *priv,
struct mlx4_en_rx_ring *ring,
struct mlx4_en_rx_alloc *frag)
{
- if (frag->page)
- page_pool_put_full_page(ring->pp, frag->page, false);
+ if (frag->netmem)
+ page_pool_put_full_netmem(ring->pp, frag->netmem, false);
/* We need to clear all fields, otherwise a change of priv->log_rx_info
- * could lead to see garbage later in frag->page.
+ * could lead to see garbage later in frag->netmem.
*/
memset(frag, 0, sizeof(*frag));
}
@@ -440,29 +440,33 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
unsigned int truesize = 0;
bool release = true;
int nr, frag_size;
- struct page *page;
+ netmem_ref netmem;
dma_addr_t dma;
/* Collect used fragments while replacing them in the HW descriptors */
for (nr = 0;; frags++) {
frag_size = min_t(int, length, frag_info->frag_size);
- page = frags->page;
- if (unlikely(!page))
+ netmem = frags->netmem;
+ if (unlikely(!netmem))
goto fail;
- dma = page_pool_get_dma_addr(page);
+ dma = page_pool_get_dma_addr_netmem(netmem);
dma_sync_single_range_for_cpu(priv->ddev, dma, frags->page_offset,
frag_size, priv->dma_dir);
- __skb_fill_page_desc(skb, nr, page, frags->page_offset,
+ __skb_fill_netmem_desc(skb, nr, netmem, frags->page_offset,
frag_size);
truesize += frag_info->frag_stride;
if (frag_info->frag_stride == PAGE_SIZE / 2) {
+ struct page *page = netmem_to_page(netmem);
+ atomic_long_t *pp_ref_count =
+ netmem_get_pp_ref_count_ref(netmem);
+
frags->page_offset ^= PAGE_SIZE / 2;
release = page_count(page) != 1 ||
- atomic_long_read(&page->pp_ref_count) != 1 ||
+ atomic_long_read(pp_ref_count) != 1 ||
page_is_pfmemalloc(page) ||
page_to_nid(page) != numa_mem_id();
} else if (!priv->rx_headroom) {
@@ -476,9 +480,9 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
release = frags->page_offset + frag_info->frag_size > PAGE_SIZE;
}
if (release) {
- frags->page = NULL;
+ frags->netmem = 0;
} else {
- page_pool_ref_page(page);
+ page_pool_ref_netmem(netmem);
}
nr++;
@@ -719,7 +723,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
int nr;
frags = ring->rx_info + (index << priv->log_rx_info);
- va = page_address(frags[0].page) + frags[0].page_offset;
+ va = netmem_address(frags[0].netmem) + frags[0].page_offset;
net_prefetchw(va);
/*
* make sure we read the CQE after we read the ownership bit
@@ -748,7 +752,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
/* Get pointer to first fragment since we haven't
* skb yet and cast it to ethhdr struct
*/
- dma = page_pool_get_dma_addr(frags[0].page);
+ dma = page_pool_get_dma_addr_netmem(frags[0].netmem);
dma += frags[0].page_offset;
dma_sync_single_for_cpu(priv->ddev, dma, sizeof(*ethh),
DMA_FROM_DEVICE);
@@ -788,7 +792,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
void *orig_data;
u32 act;
- dma = page_pool_get_dma_addr(frags[0].page);
+ dma = page_pool_get_dma_addr_netmem(frags[0].netmem);
dma += frags[0].page_offset;
dma_sync_single_for_cpu(priv->ddev, dma,
priv->frag_info[0].frag_size,
@@ -818,7 +822,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
if (likely(!xdp_do_redirect(dev, &mxbuf.xdp, xdp_prog))) {
ring->xdp_redirect++;
xdp_redir_flush = true;
- frags[0].page = NULL;
+ frags[0].netmem = 0;
goto next;
}
ring->xdp_redirect_fail++;
@@ -828,7 +832,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
if (likely(!mlx4_en_xmit_frame(ring, frags, priv,
length, cq_ring,
&doorbell_pending))) {
- frags[0].page = NULL;
+ frags[0].netmem = 0;
goto next;
}
trace_xdp_exception(dev, xdp_prog, act);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 87f35bcbeff8f..b564a953da09b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -354,7 +354,7 @@ u32 mlx4_en_recycle_tx_desc(struct mlx4_en_priv *priv,
struct page_pool *pool = ring->recycle_ring->pp;
/* Note that napi_mode = 0 means ndo_close() path, not budget = 0 */
- page_pool_put_full_page(pool, tx_info->page, !!napi_mode);
+ page_pool_put_full_netmem(pool, tx_info->netmem, !!napi_mode);
return tx_info->nr_txbb;
}
@@ -1191,10 +1191,10 @@ netdev_tx_t mlx4_en_xmit_frame(struct mlx4_en_rx_ring *rx_ring,
tx_desc = ring->buf + (index << LOG_TXBB_SIZE);
data = &tx_desc->data;
- dma = page_pool_get_dma_addr(frame->page);
+ dma = page_pool_get_dma_addr_netmem(frame->netmem);
- tx_info->page = frame->page;
- frame->page = NULL;
+ tx_info->netmem = frame->netmem;
+ frame->netmem = 0;
tx_info->map0_dma = dma;
tx_info->nr_bytes = max_t(unsigned int, length, ETH_ZLEN);
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index ad0d91a751848..3ef9a0a1f783d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -213,7 +213,7 @@ enum cq_type {
struct mlx4_en_tx_info {
union {
struct sk_buff *skb;
- struct page *page;
+ netmem_ref netmem;
};
dma_addr_t map0_dma;
u32 map0_byte_count;
@@ -246,7 +246,7 @@ struct mlx4_en_tx_desc {
#define MLX4_EN_CX3_HIGH_ID 0x1005
struct mlx4_en_rx_alloc {
- struct page *page;
+ netmem_ref netmem;
u32 page_offset;
};
--
2.17.1
* [RFC 12/19] netmem: introduce page_pool_recycle_direct_netmem()
2025-05-09 11:51 [RFC 00/19] Split netmem from struct page Byungchul Park
` (10 preceding siblings ...)
2025-05-09 11:51 ` [RFC 11/19] mlx4: use netmem descriptor and API for page pool Byungchul Park
@ 2025-05-09 11:51 ` Byungchul Park
2025-05-09 11:51 ` [RFC 13/19] page_pool: expand scope of is_pp_{netmem,page}() to global Byungchul Park
` (7 subsequent siblings)
19 siblings, 0 replies; 53+ messages in thread
From: Byungchul Park @ 2025-05-09 11:51 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
All the callers of page_pool_recycle_direct() need to be converted to
use the netmem descriptor and API instead of struct page.
As part of that work, introduce page_pool_recycle_direct_netmem(),
allowing the callers to use the netmem descriptor and API.
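A minimal usage sketch (a hypothetical driver RX path, mirroring how
page_pool_recycle_direct() is used today):

    /* NAPI context: recycle the buffer straight into the pool's
     * direct (lockless) cache
     */
    page_pool_recycle_direct_netmem(rx_ring->pp, frags->netmem);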
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
include/net/page_pool/helpers.h | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 582a3d00cbe23..9b7a3a996bbea 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -395,6 +395,12 @@ static inline void page_pool_recycle_direct(struct page_pool *pool,
page_pool_put_full_page(pool, page, true);
}
+static inline void page_pool_recycle_direct_netmem(struct page_pool *pool,
+ netmem_ref netmem)
+{
+ page_pool_put_full_netmem(pool, netmem, true);
+}
+
#define PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA \
(sizeof(dma_addr_t) > sizeof(unsigned long))
--
2.17.1
* [RFC 13/19] page_pool: expand scope of is_pp_{netmem,page}() to global
2025-05-09 11:51 [RFC 00/19] Split netmem from struct page Byungchul Park
` (11 preceding siblings ...)
2025-05-09 11:51 ` [RFC 12/19] netmem: introduce page_pool_recycle_direct_netmem() Byungchul Park
@ 2025-05-09 11:51 ` Byungchul Park
2025-05-12 12:46 ` Toke Høiland-Jørgensen
2025-05-09 11:51 ` [RFC 14/19] mm: page_alloc: do not directly access page->pp_magic but use is_pp_page() Byungchul Park
` (6 subsequent siblings)
19 siblings, 1 reply; 53+ messages in thread
From: Byungchul Park @ 2025-05-09 11:51 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
Code other than skbuff.c may also need to check whether a page or
netmem belongs to a page pool; for example, page_alloc.c needs to check
the page state, i.e. whether a page comes from a page pool or not, for
its own purposes.
Expand the scope of is_pp_netmem() and newly introduce is_pp_page(), so
that callers can perform the check without directly accessing the page
pool member, page->pp_magic.
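For example, a caller outside the networking core could then do (a
hypothetical sketch):

    #include <net/page_pool/types.h>

    if (is_pp_page(page))
            pr_warn("%s: page belongs to a page pool\n", __func__);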
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
include/net/page_pool/types.h | 2 ++
net/core/page_pool.c | 10 ++++++++++
net/core/skbuff.c | 5 -----
3 files changed, 12 insertions(+), 5 deletions(-)
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 36eb57d73abc6..d3e1a52f01e09 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -299,4 +299,6 @@ static inline bool is_page_pool_compiled_in(void)
/* Caller must provide appropriate safe context, e.g. NAPI. */
void page_pool_update_nid(struct page_pool *pool, int new_nid);
+bool is_pp_netmem(netmem_ref netmem);
+bool is_pp_page(struct page *page);
#endif /* _NET_PAGE_POOL_H */
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index b61c1038f4c68..9c553e5a1b555 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1225,3 +1225,13 @@ void net_mp_niov_clear_page_pool(struct netmem_desc *niov)
page_pool_clear_pp_info(netmem);
}
+
+bool is_pp_netmem(netmem_ref netmem)
+{
+ return (netmem_get_pp_magic(netmem) & ~0x3UL) == PP_SIGNATURE;
+}
+
+bool is_pp_page(struct page *page)
+{
+ return is_pp_netmem(page_to_netmem(page));
+}
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 6cbf77bc61fce..11098c204fe3e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -893,11 +893,6 @@ static void skb_clone_fraglist(struct sk_buff *skb)
skb_get(list);
}
-static bool is_pp_netmem(netmem_ref netmem)
-{
- return (netmem_get_pp_magic(netmem) & ~0x3UL) == PP_SIGNATURE;
-}
-
int skb_pp_cow_data(struct page_pool *pool, struct sk_buff **pskb,
unsigned int headroom)
{
--
2.17.1
* [RFC 14/19] mm: page_alloc: do not directly access page->pp_magic but use is_pp_page()
2025-05-09 11:51 [RFC 00/19] Split netmem from struct page Byungchul Park
` (12 preceding siblings ...)
2025-05-09 11:51 ` [RFC 13/19] page_pool: expand scope of is_pp_{netmem,page}() to global Byungchul Park
@ 2025-05-09 11:51 ` Byungchul Park
2025-05-09 11:51 ` [RFC 15/19] mlx5: use netmem descriptor and API for page pool Byungchul Park
` (5 subsequent siblings)
19 siblings, 0 replies; 53+ messages in thread
From: Byungchul Park @ 2025-05-09 11:51 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
To simplify struct page, each of its users needs to be separated out
into its own descriptor, and the work for page pool is ongoing.
To achieve that, all code should avoid accessing the page pool members
of struct page directly and instead use a safe API for the corresponding
purpose, which in this case is is_pp_page().
Use is_pp_page() instead of accessing the members directly when checking
whether a page comes from a page pool.
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
mm/page_alloc.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a6fe1e9b95941..cf672b9ab7086 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -55,6 +55,7 @@
#include <linux/delayacct.h>
#include <linux/cacheinfo.h>
#include <linux/pgalloc_tag.h>
+#include <net/page_pool/types.h> /* for page pool checking */
#include <asm/div64.h>
#include "internal.h"
#include "shuffle.h"
@@ -899,7 +900,7 @@ static inline bool page_expected_state(struct page *page,
page->memcg_data |
#endif
#ifdef CONFIG_PAGE_POOL
- ((page->pp_magic & ~0x3UL) == PP_SIGNATURE) |
+ (is_pp_page(page)) |
#endif
(page->flags & check_flags)))
return false;
@@ -928,7 +929,7 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
bad_reason = "page still charged to cgroup";
#endif
#ifdef CONFIG_PAGE_POOL
- if (unlikely((page->pp_magic & ~0x3UL) == PP_SIGNATURE))
+ if (unlikely(is_pp_page(page)))
bad_reason = "page_pool leak";
#endif
return bad_reason;
--
2.17.1
* [RFC 15/19] mlx5: use netmem descriptor and API for page pool
2025-05-09 11:51 [RFC 00/19] Split netmem from struct page Byungchul Park
` (13 preceding siblings ...)
2025-05-09 11:51 ` [RFC 14/19] mm: page_alloc: do not directly access page->pp_magic but use is_pp_page() Byungchul Park
@ 2025-05-09 11:51 ` Byungchul Park
2025-05-09 11:51 ` [RFC 16/19] netmem: use _Generic to cover const casting for page_to_netmem() Byungchul Park
` (4 subsequent siblings)
19 siblings, 0 replies; 53+ messages in thread
From: Byungchul Park @ 2025-05-09 11:51 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
To simplify struct page, each of its users needs to be separated out
into its own descriptor, and the work for page pool is ongoing.
Use the netmem descriptor and API for page pool in the mlx5 code.
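The recurring conversion pattern, as a simplified sketch of the
fragmented-page allocation path changed below (error handling trimmed):

    /* Allocate a netmem and read its DMA address without touching
     * struct page. Passing NULL offset/size requests a plain
     * page-sized netmem (see the page_pool_dev_alloc_netmem() change
     * at the end of this patch). */
    netmem_ref netmem = page_pool_dev_alloc_netmem(rq->page_pool,
                                                   NULL, NULL);
    dma_addr_t addr;

    if (unlikely(!netmem))
            return -ENOMEM;
    addr = page_pool_get_dma_addr_netmem(netmem);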
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en.h | 4 +-
.../net/ethernet/mellanox/mlx5/core/en/xdp.c | 18 ++---
.../net/ethernet/mellanox/mlx5/core/en/xdp.h | 2 +-
.../net/ethernet/mellanox/mlx5/core/en_main.c | 15 +++--
.../net/ethernet/mellanox/mlx5/core/en_rx.c | 66 +++++++++----------
include/linux/skbuff.h | 14 ++++
include/net/page_pool/helpers.h | 4 ++
7 files changed, 73 insertions(+), 50 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 32ed4963b8ada..c3992b9961540 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -551,7 +551,7 @@ struct mlx5e_icosq {
} ____cacheline_aligned_in_smp;
struct mlx5e_frag_page {
- struct page *page;
+ netmem_ref netmem;
u16 frags;
};
@@ -623,7 +623,7 @@ struct mlx5e_dma_info {
dma_addr_t addr;
union {
struct mlx5e_frag_page *frag_page;
- struct page *page;
+ netmem_ref netmem;
};
};
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index f803e1c935900..886ed930d6a1f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -61,7 +61,7 @@ static inline bool
mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq,
struct xdp_buff *xdp)
{
- struct page *page = virt_to_page(xdp->data);
+ netmem_ref netmem = virt_to_netmem(xdp->data);
struct mlx5e_xmit_data_frags xdptxdf = {};
struct mlx5e_xmit_data *xdptxd;
struct xdp_frame *xdpf;
@@ -122,7 +122,7 @@ mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq,
* mode.
*/
- dma_addr = page_pool_get_dma_addr(page) + (xdpf->data - (void *)xdpf);
+ dma_addr = page_pool_get_dma_addr_netmem(netmem) + (xdpf->data - (void *)xdpf);
dma_sync_single_for_device(sq->pdev, dma_addr, xdptxd->len, DMA_BIDIRECTIONAL);
if (xdptxd->has_frags) {
@@ -134,7 +134,7 @@ mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq,
dma_addr_t addr;
u32 len;
- addr = page_pool_get_dma_addr(skb_frag_page(frag)) +
+ addr = page_pool_get_dma_addr_netmem(skb_frag_netmem(frag)) +
skb_frag_off(frag);
len = skb_frag_size(frag);
dma_sync_single_for_device(sq->pdev, addr, len,
@@ -157,19 +157,19 @@ mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq,
(union mlx5e_xdp_info)
{ .page.num = 1 + xdptxdf.sinfo->nr_frags });
mlx5e_xdpi_fifo_push(&sq->db.xdpi_fifo,
- (union mlx5e_xdp_info) { .page.page = page });
+ (union mlx5e_xdp_info) { .page.netmem = netmem });
for (i = 0; i < xdptxdf.sinfo->nr_frags; i++) {
skb_frag_t *frag = &xdptxdf.sinfo->frags[i];
mlx5e_xdpi_fifo_push(&sq->db.xdpi_fifo,
(union mlx5e_xdp_info)
- { .page.page = skb_frag_page(frag) });
+ { .page.netmem = skb_frag_netmem(frag) });
}
} else {
mlx5e_xdpi_fifo_push(&sq->db.xdpi_fifo,
(union mlx5e_xdp_info) { .page.num = 1 });
mlx5e_xdpi_fifo_push(&sq->db.xdpi_fifo,
- (union mlx5e_xdp_info) { .page.page = page });
+ (union mlx5e_xdp_info) { .page.netmem = netmem });
}
return true;
@@ -702,15 +702,15 @@ static void mlx5e_free_xdpsq_desc(struct mlx5e_xdpsq *sq,
num = xdpi.page.num;
do {
- struct page *page;
+ netmem_ref netmem;
xdpi = mlx5e_xdpi_fifo_pop(xdpi_fifo);
- page = xdpi.page.page;
+ netmem = xdpi.page.netmem;
/* No need to check ((page->pp_magic & ~0x3UL) == PP_SIGNATURE)
* as we know this is a page_pool page.
*/
- page_pool_recycle_direct(page->pp, page);
+ page_pool_recycle_direct_netmem(netmem_get_pp(netmem), netmem);
} while (++n < num);
break;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
index 446e492c6bb8e..b37541837efba 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
@@ -96,7 +96,7 @@ union mlx5e_xdp_info {
union {
struct mlx5e_rq *rq;
u8 num;
- struct page *page;
+ netmem_ref netmem;
} page;
struct xsk_tx_metadata_compl xsk_meta;
};
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 3506024c24539..c152fd454f605 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -708,24 +708,29 @@ static void mlx5e_rq_err_cqe_work(struct work_struct *recover_work)
static int mlx5e_alloc_mpwqe_rq_drop_page(struct mlx5e_rq *rq)
{
- rq->wqe_overflow.page = alloc_page(GFP_KERNEL);
- if (!rq->wqe_overflow.page)
+ struct page *page = alloc_page(GFP_KERNEL);
+
+ if (!page)
return -ENOMEM;
- rq->wqe_overflow.addr = dma_map_page(rq->pdev, rq->wqe_overflow.page, 0,
+ rq->wqe_overflow.addr = dma_map_page(rq->pdev, page, 0,
PAGE_SIZE, rq->buff.map_dir);
if (dma_mapping_error(rq->pdev, rq->wqe_overflow.addr)) {
- __free_page(rq->wqe_overflow.page);
+ __free_page(page);
return -ENOMEM;
}
+
+ rq->wqe_overflow.netmem = page_to_netmem(page);
return 0;
}
static void mlx5e_free_mpwqe_rq_drop_page(struct mlx5e_rq *rq)
{
+ struct page *page = netmem_to_page(rq->wqe_overflow.netmem);
+
dma_unmap_page(rq->pdev, rq->wqe_overflow.addr, PAGE_SIZE,
rq->buff.map_dir);
- __free_page(rq->wqe_overflow.page);
+ __free_page(page);
}
static int mlx5e_init_rxq_rq(struct mlx5e_channel *c, struct mlx5e_params *params,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 5fd70b4d55beb..ce7052287b2ce 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -276,16 +276,16 @@ static inline u32 mlx5e_decompress_cqes_start(struct mlx5e_rq *rq,
static int mlx5e_page_alloc_fragmented(struct mlx5e_rq *rq,
struct mlx5e_frag_page *frag_page)
{
- struct page *page;
+ netmem_ref netmem;
- page = page_pool_dev_alloc_pages(rq->page_pool);
- if (unlikely(!page))
+ netmem = page_pool_dev_alloc_netmem(rq->page_pool, NULL, NULL);
+ if (unlikely(!netmem))
return -ENOMEM;
- page_pool_fragment_page(page, MLX5E_PAGECNT_BIAS_MAX);
+ page_pool_fragment_netmem(netmem, MLX5E_PAGECNT_BIAS_MAX);
*frag_page = (struct mlx5e_frag_page) {
- .page = page,
+ .netmem = netmem,
.frags = 0,
};
@@ -296,10 +296,10 @@ static void mlx5e_page_release_fragmented(struct mlx5e_rq *rq,
struct mlx5e_frag_page *frag_page)
{
u16 drain_count = MLX5E_PAGECNT_BIAS_MAX - frag_page->frags;
- struct page *page = frag_page->page;
+ netmem_ref netmem = frag_page->netmem;
- if (page_pool_unref_page(page, drain_count) == 0)
- page_pool_put_unrefed_page(rq->page_pool, page, -1, true);
+ if (page_pool_unref_netmem(netmem, drain_count) == 0)
+ page_pool_put_unrefed_netmem(rq->page_pool, netmem, -1, true);
}
static inline int mlx5e_get_rx_frag(struct mlx5e_rq *rq,
@@ -358,7 +358,7 @@ static int mlx5e_alloc_rx_wqe(struct mlx5e_rq *rq, struct mlx5e_rx_wqe_cyc *wqe,
frag->flags &= ~BIT(MLX5E_WQE_FRAG_SKIP_RELEASE);
headroom = i == 0 ? rq->buff.headroom : 0;
- addr = page_pool_get_dma_addr(frag->frag_page->page);
+ addr = page_pool_get_dma_addr_netmem(frag->frag_page->netmem);
wqe->data[i].addr = cpu_to_be64(addr + frag->offset + headroom);
}
@@ -501,7 +501,7 @@ mlx5e_add_skb_shared_info_frag(struct mlx5e_rq *rq, struct skb_shared_info *sinf
{
skb_frag_t *frag;
- dma_addr_t addr = page_pool_get_dma_addr(frag_page->page);
+ dma_addr_t addr = page_pool_get_dma_addr_netmem(frag_page->netmem);
dma_sync_single_for_cpu(rq->pdev, addr + frag_offset, len, rq->buff.map_dir);
if (!xdp_buff_has_frags(xdp)) {
@@ -514,9 +514,9 @@ mlx5e_add_skb_shared_info_frag(struct mlx5e_rq *rq, struct skb_shared_info *sinf
}
frag = &sinfo->frags[sinfo->nr_frags++];
- skb_frag_fill_page_desc(frag, frag_page->page, frag_offset, len);
+ skb_frag_fill_netmem_desc(frag, frag_page->netmem, frag_offset, len);
- if (page_is_pfmemalloc(frag_page->page))
+ if (netmem_is_pfmemalloc(frag_page->netmem))
xdp_buff_set_frag_pfmemalloc(xdp);
sinfo->xdp_frags_size += len;
}
@@ -527,27 +527,27 @@ mlx5e_add_skb_frag(struct mlx5e_rq *rq, struct sk_buff *skb,
u32 frag_offset, u32 len,
unsigned int truesize)
{
- dma_addr_t addr = page_pool_get_dma_addr(frag_page->page);
+ dma_addr_t addr = page_pool_get_dma_addr_netmem(frag_page->netmem);
u8 next_frag = skb_shinfo(skb)->nr_frags;
dma_sync_single_for_cpu(rq->pdev, addr + frag_offset, len,
rq->buff.map_dir);
- if (skb_can_coalesce(skb, next_frag, frag_page->page, frag_offset)) {
+ if (skb_can_coalesce_netmem(skb, next_frag, frag_page->netmem, frag_offset)) {
skb_coalesce_rx_frag(skb, next_frag - 1, len, truesize);
} else {
frag_page->frags++;
- skb_add_rx_frag(skb, next_frag, frag_page->page,
+ skb_add_rx_frag_netmem(skb, next_frag, frag_page->netmem,
frag_offset, len, truesize);
}
}
static inline void
mlx5e_copy_skb_header(struct mlx5e_rq *rq, struct sk_buff *skb,
- struct page *page, dma_addr_t addr,
+ netmem_ref netmem, dma_addr_t addr,
int offset_from, int dma_offset, u32 headlen)
{
- const void *from = page_address(page) + offset_from;
+ const void *from = netmem_address(netmem) + offset_from;
/* Aligning len to sizeof(long) optimizes memcpy performance */
unsigned int len = ALIGN(headlen, sizeof(long));
@@ -684,7 +684,7 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq,
goto err_unmap;
- addr = page_pool_get_dma_addr(frag_page->page);
+ addr = page_pool_get_dma_addr_netmem(frag_page->netmem);
for (int j = 0; j < MLX5E_SHAMPO_WQ_HEADER_PER_PAGE; j++) {
header_offset = mlx5e_shampo_hd_offset(index++);
@@ -794,7 +794,7 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
err = mlx5e_page_alloc_fragmented(rq, frag_page);
if (unlikely(err))
goto err_unmap;
- addr = page_pool_get_dma_addr(frag_page->page);
+ addr = page_pool_get_dma_addr_netmem(frag_page->netmem);
umr_wqe->inline_mtts[i] = (struct mlx5_mtt) {
.ptag = cpu_to_be64(addr | MLX5_EN_WR),
};
@@ -1212,7 +1212,7 @@ static void *mlx5e_shampo_get_packet_hd(struct mlx5e_rq *rq, u16 header_index)
struct mlx5e_frag_page *frag_page = mlx5e_shampo_hd_to_frag_page(rq, header_index);
u16 head_offset = mlx5e_shampo_hd_offset(header_index) + rq->buff.headroom;
- return page_address(frag_page->page) + head_offset;
+ return netmem_address(frag_page->netmem) + head_offset;
}
static void mlx5e_shampo_update_ipv4_udp_hdr(struct mlx5e_rq *rq, struct iphdr *ipv4)
@@ -1673,11 +1673,11 @@ mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi,
dma_addr_t addr;
u32 frag_size;
- va = page_address(frag_page->page) + wi->offset;
+ va = netmem_address(frag_page->netmem) + wi->offset;
data = va + rx_headroom;
frag_size = MLX5_SKB_FRAG_SZ(rx_headroom + cqe_bcnt);
- addr = page_pool_get_dma_addr(frag_page->page);
+ addr = page_pool_get_dma_addr_netmem(frag_page->netmem);
dma_sync_single_range_for_cpu(rq->pdev, addr, wi->offset,
frag_size, rq->buff.map_dir);
net_prefetch(data);
@@ -1727,10 +1727,10 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
frag_page = wi->frag_page;
- va = page_address(frag_page->page) + wi->offset;
+ va = netmem_address(frag_page->netmem) + wi->offset;
frag_consumed_bytes = min_t(u32, frag_info->frag_size, cqe_bcnt);
- addr = page_pool_get_dma_addr(frag_page->page);
+ addr = page_pool_get_dma_addr_netmem(frag_page->netmem);
dma_sync_single_range_for_cpu(rq->pdev, addr, wi->offset,
rq->buff.frame0_sz, rq->buff.map_dir);
net_prefetchw(va); /* xdp_frame data area */
@@ -2000,12 +2000,12 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
if (prog) {
/* area for bpf_xdp_[store|load]_bytes */
- net_prefetchw(page_address(frag_page->page) + frag_offset);
+ net_prefetchw(netmem_address(frag_page->netmem) + frag_offset);
if (unlikely(mlx5e_page_alloc_fragmented(rq, &wi->linear_page))) {
rq->stats->buff_alloc_err++;
return NULL;
}
- va = page_address(wi->linear_page.page);
+ va = netmem_address(wi->linear_page.netmem);
net_prefetchw(va); /* xdp_frame data area */
linear_hr = XDP_PACKET_HEADROOM;
linear_data_len = 0;
@@ -2110,8 +2110,8 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
while (++pagep < frag_page);
}
/* copy header */
- addr = page_pool_get_dma_addr(head_page->page);
- mlx5e_copy_skb_header(rq, skb, head_page->page, addr,
+ addr = page_pool_get_dma_addr_netmem(head_page->netmem);
+ mlx5e_copy_skb_header(rq, skb, head_page->netmem, addr,
head_offset, head_offset, headlen);
/* skb linear part was allocated with headlen and aligned to long */
skb->tail += headlen;
@@ -2141,11 +2141,11 @@ mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
return NULL;
}
- va = page_address(frag_page->page) + head_offset;
+ va = netmem_address(frag_page->netmem) + head_offset;
data = va + rx_headroom;
frag_size = MLX5_SKB_FRAG_SZ(rx_headroom + cqe_bcnt);
- addr = page_pool_get_dma_addr(frag_page->page);
+ addr = page_pool_get_dma_addr_netmem(frag_page->netmem);
dma_sync_single_range_for_cpu(rq->pdev, addr, head_offset,
frag_size, rq->buff.map_dir);
net_prefetch(data);
@@ -2184,7 +2184,7 @@ mlx5e_skb_from_cqe_shampo(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
struct mlx5_cqe64 *cqe, u16 header_index)
{
struct mlx5e_frag_page *frag_page = mlx5e_shampo_hd_to_frag_page(rq, header_index);
- dma_addr_t page_dma_addr = page_pool_get_dma_addr(frag_page->page);
+ dma_addr_t page_dma_addr = page_pool_get_dma_addr_netmem(frag_page->netmem);
u16 head_offset = mlx5e_shampo_hd_offset(header_index);
dma_addr_t dma_addr = page_dma_addr + head_offset;
u16 head_size = cqe->shampo.header_size;
@@ -2193,7 +2193,7 @@ mlx5e_skb_from_cqe_shampo(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
void *hdr, *data;
u32 frag_size;
- hdr = page_address(frag_page->page) + head_offset;
+ hdr = netmem_address(frag_page->netmem) + head_offset;
data = hdr + rx_headroom;
frag_size = MLX5_SKB_FRAG_SZ(rx_headroom + head_size);
@@ -2218,7 +2218,7 @@ mlx5e_skb_from_cqe_shampo(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
}
net_prefetchw(skb->data);
- mlx5e_copy_skb_header(rq, skb, frag_page->page, dma_addr,
+ mlx5e_copy_skb_header(rq, skb, frag_page->netmem, dma_addr,
head_offset + rx_headroom,
rx_headroom, head_size);
/* skb linear part was allocated with headlen and aligned to long */
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index bf67c47319a56..afec5ebed4372 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3882,6 +3882,20 @@ static inline bool skb_can_coalesce(struct sk_buff *skb, int i,
return false;
}
+static inline bool skb_can_coalesce_netmem(struct sk_buff *skb, int i,
+ const netmem_ref netmem, int off)
+{
+ if (skb_zcopy(skb))
+ return false;
+ if (i) {
+ const skb_frag_t *frag = &skb_shinfo(skb)->frags[i - 1];
+
+ return netmem == skb_frag_netmem(frag) &&
+ off == skb_frag_off(frag) + skb_frag_size(frag);
+ }
+ return false;
+}
+
static inline int __skb_linearize(struct sk_buff *skb)
{
return __pskb_pull_tail(skb, skb->data_len) ? 0 : -ENOMEM;
diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 9b7a3a996bbea..4deb0b32e4bac 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -150,6 +150,10 @@ static inline netmem_ref page_pool_dev_alloc_netmem(struct page_pool *pool,
{
gfp_t gfp = GFP_ATOMIC | __GFP_NOWARN;
+ WARN_ON((!offset && size) || (offset && !size));
+ if (!offset || !size)
+ return page_pool_alloc_netmems(pool, gfp);
+
return page_pool_alloc_netmem(pool, offset, size, gfp);
}
--
2.17.1
* [RFC 16/19] netmem: use _Generic to cover const casting for page_to_netmem()
2025-05-09 11:51 [RFC 00/19] Split netmem from struct page Byungchul Park
` (14 preceding siblings ...)
2025-05-09 11:51 ` [RFC 15/19] mlx5: use netmem descriptor and API for page pool Byungchul Park
@ 2025-05-09 11:51 ` Byungchul Park
2025-05-09 11:51 ` [RFC 17/19] netmem: remove __netmem_get_pp() Byungchul Park
` (3 subsequent siblings)
19 siblings, 0 replies; 53+ messages in thread
From: Byungchul Park @ 2025-05-09 11:51 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
The current page_to_netmem() doesn't cover const casting, so trying to
cast a const struct page * to a const netmem_ref fails to compile.
To cover that case, convert page_to_netmem() to a macro using _Generic.
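A quick sketch of what this enables (illustrative variables only;
get_some_page()/get_some_writable_page() are made-up helpers):

    const struct page *cpage = get_some_page();
    struct page *page = get_some_writable_page();

    const netmem_ref cnetmem = page_to_netmem(cpage); /* const arm */
    netmem_ref netmem = page_to_netmem(page);         /* non-const arm */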
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
include/net/netmem.h | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/include/net/netmem.h b/include/net/netmem.h
index c87a604e980b9..ce3765e675d19 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -133,10 +133,9 @@ static inline netmem_ref net_iov_to_netmem(struct netmem_desc *niov)
return (__force netmem_ref)((unsigned long)niov | NET_IOV);
}
-static inline netmem_ref page_to_netmem(struct page *page)
-{
- return (__force netmem_ref)page;
-}
+#define page_to_netmem(p) (_Generic((p), \
+ const struct page *: (__force const netmem_ref)(p), \
+ struct page *: (__force netmem_ref)(p)))
static inline netmem_ref alloc_netmems_node(int nid, gfp_t gfp_mask,
unsigned int order)
--
2.17.1
* [RFC 17/19] netmem: remove __netmem_get_pp()
2025-05-09 11:51 [RFC 00/19] Split netmem from struct page Byungchul Park
` (15 preceding siblings ...)
2025-05-09 11:51 ` [RFC 16/19] netmem: use _Generic to cover const casting for page_to_netmem() Byungchul Park
@ 2025-05-09 11:51 ` Byungchul Park
2025-05-09 13:47 ` Mina Almasry
2025-05-09 11:51 ` [RFC 18/19] page_pool: make page_pool_get_dma_addr() just wrap page_pool_get_dma_addr_netmem() Byungchul Park
` (2 subsequent siblings)
19 siblings, 1 reply; 53+ messages in thread
From: Byungchul Park @ 2025-05-09 11:51 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
There are no users of __netmem_get_pp(). Remove it.
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
include/net/netmem.h | 16 ----------------
1 file changed, 16 deletions(-)
diff --git a/include/net/netmem.h b/include/net/netmem.h
index ce3765e675d19..00064e766b889 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -190,22 +190,6 @@ static inline struct netmem_desc *__netmem_clear_lsb(netmem_ref netmem)
return (struct netmem_desc *)((__force unsigned long)netmem & ~NET_IOV);
}
-/**
- * __netmem_get_pp - unsafely get pointer to the &page_pool backing @netmem
- * @netmem: netmem reference to get the pointer from
- *
- * Unsafe version of netmem_get_pp(). When @netmem is always page-backed,
- * e.g. when it's a header buffer, performs faster and generates smaller
- * object code (avoids clearing the LSB). When @netmem points to IOV,
- * provokes invalid memory access.
- *
- * Return: pointer to the &page_pool (garbage if @netmem is not page-backed).
- */
-static inline struct page_pool *__netmem_get_pp(netmem_ref netmem)
-{
- return __netmem_to_page(netmem)->pp;
-}
-
static inline struct page_pool *netmem_get_pp(netmem_ref netmem)
{
return __netmem_clear_lsb(netmem)->pp;
--
2.17.1
* [RFC 18/19] page_pool: make page_pool_get_dma_addr() just wrap page_pool_get_dma_addr_netmem()
2025-05-09 11:51 [RFC 00/19] Split netmem from struct page Byungchul Park
` (16 preceding siblings ...)
2025-05-09 11:51 ` [RFC 17/19] netmem: remove __netmem_get_pp() Byungchul Park
@ 2025-05-09 11:51 ` Byungchul Park
2025-05-09 13:49 ` Mina Almasry
2025-05-10 7:28 ` Ilias Apalodimas
2025-05-09 11:51 ` [RFC 19/19] mm, netmem: remove the page pool members in struct page Byungchul Park
2025-05-09 14:09 ` [RFC 00/19] Split netmem from " Mina Almasry
19 siblings, 2 replies; 53+ messages in thread
From: Byungchul Park @ 2025-05-09 11:51 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
The page pool members in struct page cannot be removed as long as any
code still accesses them via struct page.
Do not access 'page->dma_addr' directly in page_pool_get_dma_addr();
instead, just wrap page_pool_get_dma_addr_netmem(), which handles the
access safely.
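This is behavior-preserving because the netmem variant applies the same
32-bit-arch shift; roughly (per the existing helper, quoted here for
reference):

    static inline dma_addr_t page_pool_get_dma_addr_netmem(netmem_ref netmem)
    {
            dma_addr_t ret = netmem_get_dma_addr(netmem);

            if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA)
                    ret <<= PAGE_SHIFT;

            return ret;
    }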
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
include/net/page_pool/helpers.h | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 4deb0b32e4bac..7e0395c70bfa2 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -441,12 +441,7 @@ static inline dma_addr_t page_pool_get_dma_addr_netmem(netmem_ref netmem)
*/
static inline dma_addr_t page_pool_get_dma_addr(const struct page *page)
{
- dma_addr_t ret = page->dma_addr;
-
- if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA)
- ret <<= PAGE_SHIFT;
-
- return ret;
+ return page_pool_get_dma_addr_netmem(page_to_netmem(page));
}
static inline void __page_pool_dma_sync_for_cpu(const struct page_pool *pool,
--
2.17.1
* [RFC 19/19] mm, netmem: remove the page pool members in struct page
2025-05-09 11:51 [RFC 00/19] Split netmem from struct page Byungchul Park
` (17 preceding siblings ...)
2025-05-09 11:51 ` [RFC 18/19] page_pool: make page_pool_get_dma_addr() just wrap page_pool_get_dma_addr_netmem() Byungchul Park
@ 2025-05-09 11:51 ` Byungchul Park
2025-05-09 17:32 ` Mina Almasry
` (2 more replies)
2025-05-09 14:09 ` [RFC 00/19] Split netmem from " Mina Almasry
19 siblings, 3 replies; 53+ messages in thread
From: Byungchul Park @ 2025-05-09 11:51 UTC (permalink / raw)
To: willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
Now that all the users of the page pool members in struct page are
gone, the members can be removed from struct page. However, the space
in struct page needs to be kept using a placeholder of the same size,
until struct netmem_desc has its own instance rather than being
overlaid onto struct page, to avoid conflicting with other members
within struct page.
Remove the page pool members from struct page and replace them with a
placeholder. The placeholder should be removed once struct netmem_desc
has its own instance.
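For readers unfamiliar with struct_group_tagged(): the new layout below
roughly expands to the following (simplified sketch), providing both
directly addressable members and a struct __netmem_desc view of the
same bytes:

    struct netmem_desc {
            unsigned long __unused_padding;
            union {
                    struct {
                            unsigned long pp_magic;
                            struct page_pool *pp;
                            struct net_iov_area *owner;
                            unsigned long dma_addr;
                            atomic_long_t pp_ref_count;
                    };
                    struct __netmem_desc {
                            unsigned long pp_magic;
                            struct page_pool *pp;
                            struct net_iov_area *owner;
                            unsigned long dma_addr;
                            atomic_long_t pp_ref_count;
                    } actual_data;
            };
    };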
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
include/linux/mm_types.h | 13 ++-----------
include/net/netmem.h | 35 +----------------------------------
include/net/netmem_type.h | 22 ++++++++++++++++++++++
3 files changed, 25 insertions(+), 45 deletions(-)
create mode 100644 include/net/netmem_type.h
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index e76bade9ebb12..69904a0855358 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -20,6 +20,7 @@
#include <linux/seqlock.h>
#include <linux/percpu_counter.h>
#include <linux/types.h>
+#include <net/netmem_type.h> /* for page pool */
#include <asm/mmu.h>
@@ -118,17 +119,7 @@ struct page {
*/
unsigned long private;
};
- struct { /* page_pool used by netstack */
- /**
- * @pp_magic: magic value to avoid recycling non
- * page_pool allocated pages.
- */
- unsigned long pp_magic;
- struct page_pool *pp;
- unsigned long _pp_mapping_pad;
- unsigned long dma_addr;
- atomic_long_t pp_ref_count;
- };
+ struct __netmem_desc place_holder_1; /* for page pool */
struct { /* Tail pages of compound page */
unsigned long compound_head; /* Bit zero is set */
};
diff --git a/include/net/netmem.h b/include/net/netmem.h
index 00064e766b889..c414de6c6ab0d 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -10,6 +10,7 @@
#include <linux/mm.h>
#include <net/net_debug.h>
+#include <net/netmem_type.h>
/* net_iov */
@@ -20,15 +21,6 @@ DECLARE_STATIC_KEY_FALSE(page_pool_mem_providers);
*/
#define NET_IOV 0x01UL
-struct netmem_desc {
- unsigned long __unused_padding;
- unsigned long pp_magic;
- struct page_pool *pp;
- struct net_iov_area *owner;
- unsigned long dma_addr;
- atomic_long_t pp_ref_count;
-};
-
struct net_iov_area {
/* Array of net_iovs for this area. */
struct netmem_desc *niovs;
@@ -38,31 +30,6 @@ struct net_iov_area {
unsigned long base_virtual;
};
-/* These fields in struct page are used by the page_pool and net stack:
- *
- * struct {
- * unsigned long pp_magic;
- * struct page_pool *pp;
- * unsigned long _pp_mapping_pad;
- * unsigned long dma_addr;
- * atomic_long_t pp_ref_count;
- * };
- *
- * We mirror the page_pool fields here so the page_pool can access these fields
- * without worrying whether the underlying fields belong to a page or net_iov.
- *
- * The non-net stack fields of struct page are private to the mm stack and must
- * never be mirrored to net_iov.
- */
-#define NET_IOV_ASSERT_OFFSET(pg, iov) \
- static_assert(offsetof(struct page, pg) == \
- offsetof(struct netmem_desc, iov))
-NET_IOV_ASSERT_OFFSET(pp_magic, pp_magic);
-NET_IOV_ASSERT_OFFSET(pp, pp);
-NET_IOV_ASSERT_OFFSET(dma_addr, dma_addr);
-NET_IOV_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
-#undef NET_IOV_ASSERT_OFFSET
-
static inline struct net_iov_area *net_iov_owner(const struct netmem_desc *niov)
{
return niov->owner;
diff --git a/include/net/netmem_type.h b/include/net/netmem_type.h
new file mode 100644
index 0000000000000..6a3ac8e908515
--- /dev/null
+++ b/include/net/netmem_type.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Author: Byungchul Park <max.byungchul.park@gmail.com>
+ */
+
+#ifndef _NET_NETMEM_TYPE_H
+#define _NET_NETMEM_TYPE_H
+
+#include <linux/stddef.h>
+
+struct netmem_desc {
+ unsigned long __unused_padding;
+ struct_group_tagged(__netmem_desc, actual_data,
+ unsigned long pp_magic;
+ struct page_pool *pp;
+ struct net_iov_area *owner;
+ unsigned long dma_addr;
+ atomic_long_t pp_ref_count;
+ );
+};
+
+#endif /* _NET_NETMEM_TYPE_H */
--
2.17.1
* Re: [RFC 02/19] netmem: introduce netmem alloc/put API to wrap page alloc/put API
2025-05-09 11:51 ` [RFC 02/19] netmem: introduce netmem alloc/put API to wrap page alloc/put API Byungchul Park
@ 2025-05-09 13:39 ` Mina Almasry
2025-05-09 14:08 ` Mina Almasry
0 siblings, 1 reply; 53+ messages in thread
From: Mina Almasry @ 2025-05-09 13:39 UTC (permalink / raw)
To: Byungchul Park
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
On Fri, May 9, 2025 at 4:51 AM Byungchul Park <byungchul@sk.com> wrote:
>
> To eliminate the use of struct page in page pool, the page pool code
> should use the netmem descriptor and API instead.
>
> As part of the work, introduce a netmem alloc/put API allowing the
> code to use it rather than struct page directly.
>
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> ---
> include/net/netmem.h | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/include/net/netmem.h b/include/net/netmem.h
> index 45c209d6cc689..c87a604e980b9 100644
> --- a/include/net/netmem.h
> +++ b/include/net/netmem.h
> @@ -138,6 +138,24 @@ static inline netmem_ref page_to_netmem(struct page *page)
> return (__force netmem_ref)page;
> }
>
> +static inline netmem_ref alloc_netmems_node(int nid, gfp_t gfp_mask,
> + unsigned int order)
> +{
> + return page_to_netmem(alloc_pages_node(nid, gfp_mask, order));
> +}
> +
> +static inline unsigned long alloc_netmems_bulk_node(gfp_t gfp, int nid,
> + unsigned long nr_netmems, netmem_ref *netmem_array)
> +{
> + return alloc_pages_bulk_node(gfp, nid, nr_netmems,
> + (struct page **)netmem_array);
> +}
> +
> +static inline void put_netmem(netmem_ref netmem)
> +{
> + put_page(netmem_to_page(netmem));
> +}
We can't really do this. netmem_ref is an abstraction that can be a
struct page or struct net_iov underneath. We can't be sure when
put_netmem is called that it is safe to convert to a page via
netmem_to_page(). This will crash if put_netmem is called on a
netmem_ref that is a net_iov underneath.
Please read the patch series that introduced netmem_ref to familiarize
yourself with the background here:
https://lkml.iu.edu/hypermail/linux/kernel/2401.2/09140.html
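A type-safe variant would have to branch on the NET_IOV LSB tag first;
a rough sketch of the shape (what to do in the net_iov case is
deliberately left open here):

    static inline void put_netmem(netmem_ref netmem)
    {
            /* A net_iov-backed netmem must not be treated as a page;
             * its lifetime is managed by its owner, not put_page(). */
            if ((__force unsigned long)netmem & NET_IOV)
                    return;

            put_page(netmem_to_page(netmem));
    }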
--
Thanks,
Mina
* Re: [RFC 17/19] netmem: remove __netmem_get_pp()
2025-05-09 11:51 ` [RFC 17/19] netmem: remove __netmem_get_pp() Byungchul Park
@ 2025-05-09 13:47 ` Mina Almasry
0 siblings, 0 replies; 53+ messages in thread
From: Mina Almasry @ 2025-05-09 13:47 UTC (permalink / raw)
To: Byungchul Park
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
On Fri, May 9, 2025 at 4:51 AM Byungchul Park <byungchul@sk.com> wrote:
>
> There are no users of __netmem_get_pp(). Remove it.
>
> Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
--
Thanks,
Mina
* Re: [RFC 18/19] page_pool: make page_pool_get_dma_addr() just wrap page_pool_get_dma_addr_netmem()
2025-05-09 11:51 ` [RFC 18/19] page_pool: make page_pool_get_dma_addr() just wrap page_pool_get_dma_addr_netmem() Byungchul Park
@ 2025-05-09 13:49 ` Mina Almasry
2025-05-10 7:28 ` Ilias Apalodimas
1 sibling, 0 replies; 53+ messages in thread
From: Mina Almasry @ 2025-05-09 13:49 UTC (permalink / raw)
To: Byungchul Park
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
On Fri, May 9, 2025 at 4:51 AM Byungchul Park <byungchul@sk.com> wrote:
>
> The page pool members in struct page cannot be removed as long as any
> code still accesses them via struct page.
>
> Do not access 'page->dma_addr' directly in page_pool_get_dma_addr();
> instead, just wrap page_pool_get_dma_addr_netmem(), which handles the
> access safely.
>
> Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
--
Thanks,
Mina
* Re: [RFC 02/19] netmem: introduce netmem alloc/put API to wrap page alloc/put API
2025-05-09 13:39 ` Mina Almasry
@ 2025-05-09 14:08 ` Mina Almasry
2025-05-12 12:30 ` Byungchul Park
0 siblings, 1 reply; 53+ messages in thread
From: Mina Almasry @ 2025-05-09 14:08 UTC (permalink / raw)
To: Byungchul Park
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
On Fri, May 9, 2025 at 6:39 AM Mina Almasry <almasrymina@google.com> wrote:
>
> On Fri, May 9, 2025 at 4:51 AM Byungchul Park <byungchul@sk.com> wrote:
> >
> > To eliminate the use of struct page in page pool, the page pool code
> > should use the netmem descriptor and API instead.
> >
> > As part of the work, introduce a netmem alloc/put API allowing the
> > code to use it rather than struct page directly.
> >
> > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > ---
> > include/net/netmem.h | 18 ++++++++++++++++++
> > 1 file changed, 18 insertions(+)
> >
> > diff --git a/include/net/netmem.h b/include/net/netmem.h
> > index 45c209d6cc689..c87a604e980b9 100644
> > --- a/include/net/netmem.h
> > +++ b/include/net/netmem.h
> > @@ -138,6 +138,24 @@ static inline netmem_ref page_to_netmem(struct page *page)
> > return (__force netmem_ref)page;
> > }
> >
> > +static inline netmem_ref alloc_netmems_node(int nid, gfp_t gfp_mask,
> > + unsigned int order)
> > +{
> > + return page_to_netmem(alloc_pages_node(nid, gfp_mask, order));
> > +}
> > +
> > +static inline unsigned long alloc_netmems_bulk_node(gfp_t gfp, int nid,
> > + unsigned long nr_netmems, netmem_ref *netmem_array)
> > +{
> > + return alloc_pages_bulk_node(gfp, nid, nr_netmems,
> > + (struct page **)netmem_array);
> > +}
> > +
> > +static inline void put_netmem(netmem_ref netmem)
> > +{
> > + put_page(netmem_to_page(netmem));
> > +}
>
> We can't really do this. netmem_ref is an abstraction that can be a
> struct page or struct net_iov underneath. We can't be sure when
> put_netmem is called that it is safe to convert to a page via
> netmem_to_page(). This will crash if put_netmem is called on a
> netmem_ref that is a net_iov underneath.
>
On a closer look, it looks like this put_netmem is only called from
code paths where you are sure the netmem_ref is a page underneath, so
this is likely fine for now. There is a net-next series that is adding
proper put_netmem support [1]. It's probably best to rebase your work
on top of that, but this should be fine in the meantime.
[1] https://lore.kernel.org/netdev/20250508004830.4100853-1-almasrymina@google.com/
--
Thanks,
Mina
* Re: [RFC 00/19] Split netmem from struct page
2025-05-09 11:51 [RFC 00/19] Split netmem from struct page Byungchul Park
` (18 preceding siblings ...)
2025-05-09 11:51 ` [RFC 19/19] mm, netmem: remove the page pool members in struct page Byungchul Park
@ 2025-05-09 14:09 ` Mina Almasry
2025-05-12 12:36 ` Byungchul Park
19 siblings, 1 reply; 53+ messages in thread
From: Mina Almasry @ 2025-05-09 14:09 UTC (permalink / raw)
To: Byungchul Park, Pavel Begunkov
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
On Fri, May 9, 2025 at 4:51 AM Byungchul Park <byungchul@sk.com> wrote:
>
> The MM subsystem is trying to reduce struct page to a single pointer.
> The first step towards that is splitting struct page by its individual
> users, as has already been done with folio and slab. This patchset does
> that for netmem which is used for page pools.
>
> Matthew Wilcox tried and stopped the same work, you can see in:
>
> https://lore.kernel.org/linux-mm/20230111042214.907030-1-willy@infradead.org/
>
> Mina Almasry already has done a lot fo prerequisite works by luck, he
> said :). I stacked my patches on the top of his work e.i. netmem.
>
> I focused on removing the page pool members in struct page this time,
> not moving the allocation code of page pool from net to mm. It can be
> done later if needed.
>
> There are still a lot of works to do, to remove the dependency on struct
> page in the network subsystem. I will continue to work on this after
> this base patchset is merged.
>
> This patchset is based on mm tree's mm-unstable branch.
>
This series largely looks good to me, but a couple of things:
- For deep changes like this to the page_pool, I think we need a
before/after run of Jesper's currently out-of-tree benchmark to see
any regressions:
https://lore.kernel.org/netdev/20250309084118.3080950-1-almasrymina@google.com/
- Also, please CC Pavel on iterations related to netmem/net_iov; they
are reusing that in the io_uring code for io_uring zcrx as well.
--
Thanks,
Mina
* Re: [RFC 19/19] mm, netmem: remove the page pool members in struct page
2025-05-09 11:51 ` [RFC 19/19] mm, netmem: remove the page pool members in struct page Byungchul Park
@ 2025-05-09 17:32 ` Mina Almasry
2025-05-09 18:11 ` Matthew Wilcox
2025-05-09 18:02 ` Matthew Wilcox
2025-05-10 7:26 ` Ilias Apalodimas
2 siblings, 1 reply; 53+ messages in thread
From: Mina Almasry @ 2025-05-09 17:32 UTC (permalink / raw)
To: Byungchul Park
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
On Fri, May 9, 2025 at 4:51 AM Byungchul Park <byungchul@sk.com> wrote:
>
> Now that all the users of the page pool members in struct page are
> gone, the members can be removed from struct page. However, the space
> in struct page needs to be kept using a placeholder of the same size,
> until struct netmem_desc has its own instance rather than being
> overlaid onto struct page, to avoid conflicting with other members
> within struct page.
>
> Remove the page pool members from struct page and replace them with a
> placeholder. The placeholder should be removed once struct netmem_desc
> has its own instance.
>
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> ---
> include/linux/mm_types.h | 13 ++-----------
> include/net/netmem.h | 35 +----------------------------------
> include/net/netmem_type.h | 22 ++++++++++++++++++++++
> 3 files changed, 25 insertions(+), 45 deletions(-)
> create mode 100644 include/net/netmem_type.h
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index e76bade9ebb12..69904a0855358 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -20,6 +20,7 @@
> #include <linux/seqlock.h>
> #include <linux/percpu_counter.h>
> #include <linux/types.h>
> +#include <net/netmem_type.h> /* for page pool */
>
> #include <asm/mmu.h>
>
> @@ -118,17 +119,7 @@ struct page {
> */
> unsigned long private;
> };
> - struct { /* page_pool used by netstack */
> - /**
> - * @pp_magic: magic value to avoid recycling non
> - * page_pool allocated pages.
> - */
> - unsigned long pp_magic;
> - struct page_pool *pp;
> - unsigned long _pp_mapping_pad;
> - unsigned long dma_addr;
> - atomic_long_t pp_ref_count;
> - };
> + struct __netmem_desc place_holder_1; /* for page pool */
> struct { /* Tail pages of compound page */
> unsigned long compound_head; /* Bit zero is set */
> };
> diff --git a/include/net/netmem.h b/include/net/netmem.h
> index 00064e766b889..c414de6c6ab0d 100644
> --- a/include/net/netmem.h
> +++ b/include/net/netmem.h
> @@ -10,6 +10,7 @@
>
> #include <linux/mm.h>
> #include <net/net_debug.h>
> +#include <net/netmem_type.h>
>
> /* net_iov */
>
> @@ -20,15 +21,6 @@ DECLARE_STATIC_KEY_FALSE(page_pool_mem_providers);
> */
> #define NET_IOV 0x01UL
>
> -struct netmem_desc {
> - unsigned long __unused_padding;
> - unsigned long pp_magic;
> - struct page_pool *pp;
> - struct net_iov_area *owner;
> - unsigned long dma_addr;
> - atomic_long_t pp_ref_count;
> -};
> -
> struct net_iov_area {
> /* Array of net_iovs for this area. */
> struct netmem_desc *niovs;
> @@ -38,31 +30,6 @@ struct net_iov_area {
> unsigned long base_virtual;
> };
>
> -/* These fields in struct page are used by the page_pool and net stack:
> - *
> - * struct {
> - * unsigned long pp_magic;
> - * struct page_pool *pp;
> - * unsigned long _pp_mapping_pad;
> - * unsigned long dma_addr;
> - * atomic_long_t pp_ref_count;
> - * };
> - *
> - * We mirror the page_pool fields here so the page_pool can access these fields
> - * without worrying whether the underlying fields belong to a page or net_iov.
> - *
> - * The non-net stack fields of struct page are private to the mm stack and must
> - * never be mirrored to net_iov.
> - */
> -#define NET_IOV_ASSERT_OFFSET(pg, iov) \
> - static_assert(offsetof(struct page, pg) == \
> - offsetof(struct netmem_desc, iov))
> -NET_IOV_ASSERT_OFFSET(pp_magic, pp_magic);
> -NET_IOV_ASSERT_OFFSET(pp, pp);
> -NET_IOV_ASSERT_OFFSET(dma_addr, dma_addr);
> -NET_IOV_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
> -#undef NET_IOV_ASSERT_OFFSET
> -
> static inline struct net_iov_area *net_iov_owner(const struct netmem_desc *niov)
> {
> return niov->owner;
> diff --git a/include/net/netmem_type.h b/include/net/netmem_type.h
> new file mode 100644
> index 0000000000000..6a3ac8e908515
> --- /dev/null
> +++ b/include/net/netmem_type.h
> @@ -0,0 +1,22 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + *
> + * Author: Byungchul Park <max.byungchul.park@gmail.com>
> + */
> +
> +#ifndef _NET_NETMEM_TYPE_H
> +#define _NET_NETMEM_TYPE_H
> +
> +#include <linux/stddef.h>
> +
> +struct netmem_desc {
> + unsigned long __unused_padding;
> + struct_group_tagged(__netmem_desc, actual_data,
> + unsigned long pp_magic;
> + struct page_pool *pp;
> + struct net_iov_area *owner;
> + unsigned long dma_addr;
> + atomic_long_t pp_ref_count;
> + );
> +};
> +
> +#endif /* _NET_NETMEM_TYPE_H */
> --
> 2.17.1
>
Currently the only restriction on net_iov is that some of its fields
need to be cache aligned with some of the fields of struct page, but
there is no restriction on new fields added to net_iov. We already
have fields in net_iov that have nothing to do with struct page and
shouldn't be part of struct page. Like net_iov_area *owner. I don't
think net_iov_area should be part of struct page and I don't think we
should add restrictions of net_iov.
What I would suggest here is, roughly:
1. Add a new struct:
struct netmem_desc {
unsigned long pp_magic;
struct page_pool *pp;
unsigned long _pp_mapping_pad;
unsigned long dma_addr;
atomic_long_t pp_ref_count;
};
2. Then update struct page to include this entry instead of the definitions:
struct page {
...
struct netmem_desc place_holder_1; /* for page pool */
...
}
3. And update struct net_iov to also include netmem_desc:
struct net_iov {
struct netmem_desc desc;
struct net_iov_area *owner;
/* More net_iov specific fields in the future */
};
And drop patch 1 which does a rename.
Essentially netmem_desc can be an encapsulation of the shared fields
between struct page and struct net_iov.
--
Thanks,
Mina
* Re: [RFC 19/19] mm, netmem: remove the page pool members in struct page
2025-05-09 11:51 ` [RFC 19/19] mm, netmem: remove the page pool members in struct page Byungchul Park
2025-05-09 17:32 ` Mina Almasry
@ 2025-05-09 18:02 ` Matthew Wilcox
2025-05-12 12:51 ` Byungchul Park
2025-05-10 7:26 ` Ilias Apalodimas
2 siblings, 1 reply; 53+ messages in thread
From: Matthew Wilcox @ 2025-05-09 18:02 UTC (permalink / raw)
To: Byungchul Park
Cc: netdev, linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
On Fri, May 09, 2025 at 08:51:26PM +0900, Byungchul Park wrote:
> +++ b/include/linux/mm_types.h
> @@ -20,6 +20,7 @@
> #include <linux/seqlock.h>
> #include <linux/percpu_counter.h>
> #include <linux/types.h>
> +#include <net/netmem_type.h> /* for page pool */
>
> #include <asm/mmu.h>
>
> @@ -118,17 +119,7 @@ struct page {
> */
> unsigned long private;
> };
> - struct { /* page_pool used by netstack */
> - /**
> - * @pp_magic: magic value to avoid recycling non
> - * page_pool allocated pages.
> - */
> - unsigned long pp_magic;
> - struct page_pool *pp;
> - unsigned long _pp_mapping_pad;
> - unsigned long dma_addr;
> - atomic_long_t pp_ref_count;
> - };
> + struct __netmem_desc place_holder_1; /* for page pool */
> struct { /* Tail pages of compound page */
> unsigned long compound_head; /* Bit zero is set */
> };
The include and the place holder aren't needed.
> diff --git a/include/net/netmem.h b/include/net/netmem.h
> index 00064e766b889..c414de6c6ab0d 100644
> --- a/include/net/netmem.h
> +++ b/include/net/netmem.h
> @@ -10,6 +10,7 @@
>
> #include <linux/mm.h>
> #include <net/net_debug.h>
> +#include <net/netmem_type.h>
... which I think means you don't need the separate header file.
> /* net_iov */
>
> @@ -20,15 +21,6 @@ DECLARE_STATIC_KEY_FALSE(page_pool_mem_providers);
> */
> #define NET_IOV 0x01UL
>
> -struct netmem_desc {
> - unsigned long __unused_padding;
> - unsigned long pp_magic;
> - struct page_pool *pp;
> - struct net_iov_area *owner;
> - unsigned long dma_addr;
> - atomic_long_t pp_ref_count;
> -};
> -
> struct net_iov_area {
> /* Array of net_iovs for this area. */
> struct netmem_desc *niovs;
> @@ -38,31 +30,6 @@ struct net_iov_area {
> unsigned long base_virtual;
> };
>
> -/* These fields in struct page are used by the page_pool and net stack:
> - *
> - * struct {
> - * unsigned long pp_magic;
> - * struct page_pool *pp;
> - * unsigned long _pp_mapping_pad;
> - * unsigned long dma_addr;
> - * atomic_long_t pp_ref_count;
> - * };
> - *
> - * We mirror the page_pool fields here so the page_pool can access these fields
> - * without worrying whether the underlying fields belong to a page or net_iov.
> - *
> - * The non-net stack fields of struct page are private to the mm stack and must
> - * never be mirrored to net_iov.
> - */
> -#define NET_IOV_ASSERT_OFFSET(pg, iov) \
> - static_assert(offsetof(struct page, pg) == \
> - offsetof(struct netmem_desc, iov))
> -NET_IOV_ASSERT_OFFSET(pp_magic, pp_magic);
> -NET_IOV_ASSERT_OFFSET(pp, pp);
> -NET_IOV_ASSERT_OFFSET(dma_addr, dma_addr);
> -NET_IOV_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
> -#undef NET_IOV_ASSERT_OFFSET
... but you do want to keep asserting that netmem_desc and
net_iov have the same offsets. And you want to assert that struct page
is big enough to hold everything in netmem_desc, like we do for slab:
static_assert(sizeof(struct slab) <= sizeof(struct page));
* Re: [RFC 19/19] mm, netmem: remove the page pool members in struct page
2025-05-09 17:32 ` Mina Almasry
@ 2025-05-09 18:11 ` Matthew Wilcox
2025-05-09 19:04 ` Mina Almasry
0 siblings, 1 reply; 53+ messages in thread
From: Matthew Wilcox @ 2025-05-09 18:11 UTC (permalink / raw)
To: Mina Almasry
Cc: Byungchul Park, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
On Fri, May 09, 2025 at 10:32:08AM -0700, Mina Almasry wrote:
> Currently the only restriction on net_iov is that some of its fields
> need to be cache aligned with some of the fields of struct page, but
Cache aligned? Do you mean alias (ie be at the same offset)?
> What I would suggest here is, roughly:
>
> 1. Add a new struct:
>
> struct netmem_desc {
> unsigned long pp_magic;
> struct page_pool *pp;
> unsigned long _pp_mapping_pad;
> unsigned long dma_addr;
> atomic_long_t pp_ref_count;
> };
>
> 2. Then update struct page to include this entry instead of the definitions:
>
> struct page {
> ...
> struct netmem_desc place_holder_1; /* for page pool */
> ...
> }
No, the point is to move these fields out of struct page entirely.
At some point (probably this year), we'll actually kmalloc the netmem_desc
(and shrink struct page), but for now, it'll overlap the other fields
in struct page.
> 3. And update struct net_iov to also include netmem_desc:
>
> struct net_iov {
> struct netmem_desc desc;
> struct net_iov_area *owner;
> /* More net_iov specific fields in the future */
> };
>
> And drop patch 1 which does a rename.
>
> Essentially netmem_desc can be an encapsulation of the shared fields
> between struct page and struct net_iov.
That is not the goal.
* Re: [RFC 19/19] mm, netmem: remove the page pool members in struct page
2025-05-09 18:11 ` Matthew Wilcox
@ 2025-05-09 19:04 ` Mina Almasry
2025-05-09 19:48 ` Matthew Wilcox
0 siblings, 1 reply; 53+ messages in thread
From: Mina Almasry @ 2025-05-09 19:04 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Byungchul Park, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
On Fri, May 9, 2025 at 11:11 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, May 09, 2025 at 10:32:08AM -0700, Mina Almasry wrote:
> > Currently the only restriction on net_iov is that some of its fields
> > need to be cache aligned with some of the fields of struct page, but
>
> Cache aligned? Do you mean alias (ie be at the same offset)?
>
> > What I would suggest here is, roughly:
> >
> > 1. Add a new struct:
> >
> > struct netmem_desc {
> > unsigned long pp_magic;
> > struct page_pool *pp;
> > unsigned long _pp_mapping_pad;
> > unsigned long dma_addr;
> > atomic_long_t pp_ref_count;
> > };
> >
> > 2. Then update struct page to include this entry instead of the definitions:
> >
> > struct page {
> > ...
> > struct netmem_desc place_holder_1; /* for page pool */
> > ...
> > }
>
> No, the point is to move these fields out of struct page entirely.
>
> At some point (probably this year), we'll actually kmalloc the netmem_desc
> (and shrink struct page), but for now, it'll overlap the other fields
> in struct page.
>
Right, all I'm saying is that if it's at all possible to keep net_iov
something that can be extended with fields unrelated to struct page,
let's do that. net_iov already has fields that should not belong in
struct page like net_iov_owner and I think more will be added.
I'm thinking netmem_desc can be the fields that are shared between
struct net_iov and struct page (but both can have more specific to the
different memory types). As you say, for now netmem_desc can currently
overlap fields in struct page and struct net_iov, and a follow up
change can replace it with something that gets kmalloced and (I
guess?) there is a pointer in struct page or struct net_iov that
refers to the netmem_desc that contains the shared fields.
--
Thanks,
Mina
* Re: [RFC 19/19] mm, netmem: remove the page pool members in struct page
2025-05-09 19:04 ` Mina Almasry
@ 2025-05-09 19:48 ` Matthew Wilcox
2025-05-12 19:10 ` Mina Almasry
0 siblings, 1 reply; 53+ messages in thread
From: Matthew Wilcox @ 2025-05-09 19:48 UTC (permalink / raw)
To: Mina Almasry
Cc: Byungchul Park, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
On Fri, May 09, 2025 at 12:04:37PM -0700, Mina Almasry wrote:
> Right, all I'm saying is that if it's at all possible to keep net_iov
> something that can be extended with fields unrelated to struct page,
> lets do that. net_iov already has fields that should not belong in
> struct page like net_iov_owner and I think more will be added.
Sure, that's fine.
> I'm thinking netmem_desc can be the fields that are shared between
> struct net_iov and struct page (but both can have more specific to the
> different memory types). As you say, for now netmem_desc can currently
> overlap fields in struct page and struct net_iov, and a follow up
> change can replace it with something that gets kmalloced and (I
> guess?) there is a pointer in struct page or struct net_iov that
> refers to the netmem_desc that contains the shared fields.
I'm sure I've pointed you at
https://kernelnewbies.org/MatthewWilcox/Memdescs before.
But I wouldn't expect to have net_iov contain a pointer to netmem_desc,
rather it would embed a netmem_desc. Unless there's a good reason to
separate them.
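I.e., roughly (reusing the struct names from earlier in this thread):

    struct net_iov {
            struct netmem_desc desc;    /* embedded, not a pointer */
            struct net_iov_area *owner;
    };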
Actually, I'd hope to do away with net_iov entirely. Networking should
handle memory-on-PCI-devices the same way everybody else does (as
hotplugged memory) rather than with its own special structures.
* Re: [RFC 19/19] mm, netmem: remove the page pool members in struct page
2025-05-09 11:51 ` [RFC 19/19] mm, netmem: remove the page pool members in struct page Byungchul Park
2025-05-09 17:32 ` Mina Almasry
2025-05-09 18:02 ` Matthew Wilcox
@ 2025-05-10 7:26 ` Ilias Apalodimas
2025-05-12 12:58 ` Byungchul Park
2 siblings, 1 reply; 53+ messages in thread
From: Ilias Apalodimas @ 2025-05-10 7:26 UTC (permalink / raw)
To: Byungchul Park
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
almasrymina, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
Hi Byungchul
On Fri, 9 May 2025 at 14:51, Byungchul Park <byungchul@sk.com> wrote:
>
> Now that all the users of the page pool members in struct page are
> gone, the members can be removed from struct page. However, the space
> in struct page needs to be kept using a placeholder of the same size,
> until struct netmem_desc has its own instance rather than being
> overlaid onto struct page, to avoid conflicting with other members
> within struct page.
>
FWIW, similar mirroring was initially proposed [0] a few years ago.
> Remove the page pool members from struct page and replace them with a
> placeholder. The placeholder should be removed once struct netmem_desc
> has its own instance.
instance? To make sure I understand this, the netmem_descs are
expected to be allocated in the future, right?
[...]
> -
> static inline struct net_iov_area *net_iov_owner(const struct netmem_desc *niov)
> {
> return niov->owner;
> diff --git a/include/net/netmem_type.h b/include/net/netmem_type.h
> new file mode 100644
> index 0000000000000..6a3ac8e908515
> --- /dev/null
> +++ b/include/net/netmem_type.h
> @@ -0,0 +1,22 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + *
> + * Author: Byungchul Park <max.byungchul.park@gmail.com>
Shouldn't Mina's authorship be preserved here?
> + */
> +
> +#ifndef _NET_NETMEM_TYPE_H
> +#define _NET_NETMEM_TYPE_H
> +
> +#include <linux/stddef.h>
> +
> +struct netmem_desc {
> + unsigned long __unused_padding;
> + struct_group_tagged(__netmem_desc, actual_data,
> + unsigned long pp_magic;
> + struct page_pool *pp;
> + struct net_iov_area *owner;
> + unsigned long dma_addr;
> + atomic_long_t pp_ref_count;
> + );
> +};
> +
> +#endif /* _NET_NETMEM_TYPE_H */
> --
> 2.17.1
>
[0] https://lore.kernel.org/netdev/1549550196-25581-1-git-send-email-ilias.apalodimas@linaro.org/
Thanks
/Ilias
* Re: [RFC 18/19] page_pool: make page_pool_get_dma_addr() just wrap page_pool_get_dma_addr_netmem()
2025-05-09 11:51 ` [RFC 18/19] page_pool: make page_pool_get_dma_addr() just wrap page_pool_get_dma_addr_netmem() Byungchul Park
2025-05-09 13:49 ` Mina Almasry
@ 2025-05-10 7:28 ` Ilias Apalodimas
1 sibling, 0 replies; 53+ messages in thread
From: Ilias Apalodimas @ 2025-05-10 7:28 UTC (permalink / raw)
To: Byungchul Park
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
almasrymina, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
On Fri, 9 May 2025 at 14:51, Byungchul Park <byungchul@sk.com> wrote:
>
> The page pool members in struct page cannot be removed until nothing
> accesses any of them via struct page.
>
> Do not access 'page->dma_addr' directly in page_pool_get_dma_addr() but
> just wrap page_pool_get_dma_addr_netmem() safely.
>
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> ---
> include/net/page_pool/helpers.h | 7 +------
> 1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
> index 4deb0b32e4bac..7e0395c70bfa2 100644
> --- a/include/net/page_pool/helpers.h
> +++ b/include/net/page_pool/helpers.h
> @@ -441,12 +441,7 @@ static inline dma_addr_t page_pool_get_dma_addr_netmem(netmem_ref netmem)
> */
> static inline dma_addr_t page_pool_get_dma_addr(const struct page *page)
> {
> - dma_addr_t ret = page->dma_addr;
> -
> - if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA)
> - ret <<= PAGE_SHIFT;
> -
> - return ret;
> + return page_pool_get_dma_addr_netmem(page_to_netmem(page));
> }
>
> static inline void __page_pool_dma_sync_for_cpu(const struct page_pool *pool,
> --
> 2.17.1
>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
* Re: [RFC 02/19] netmem: introduce netmem alloc/put API to wrap page alloc/put API
2025-05-09 14:08 ` Mina Almasry
@ 2025-05-12 12:30 ` Byungchul Park
0 siblings, 0 replies; 53+ messages in thread
From: Byungchul Park @ 2025-05-12 12:30 UTC (permalink / raw)
To: Mina Almasry
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
On Fri, May 09, 2025 at 07:08:23AM -0700, Mina Almasry wrote:
>
> On Fri, May 9, 2025 at 6:39 AM Mina Almasry <almasrymina@google.com> wrote:
> >
> > On Fri, May 9, 2025 at 4:51 AM Byungchul Park <byungchul@sk.com> wrote:
> > >
> > > To eliminate the use of struct page in page pool, the page pool code
> > > should use the netmem descriptor and API instead.
> > >
> > > As part of the work, introduce a netmem alloc/put API allowing the
> > > code to use it rather than struct page directly.
> > >
> > > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > > ---
> > > include/net/netmem.h | 18 ++++++++++++++++++
> > > 1 file changed, 18 insertions(+)
> > >
> > > diff --git a/include/net/netmem.h b/include/net/netmem.h
> > > index 45c209d6cc689..c87a604e980b9 100644
> > > --- a/include/net/netmem.h
> > > +++ b/include/net/netmem.h
> > > @@ -138,6 +138,24 @@ static inline netmem_ref page_to_netmem(struct page *page)
> > > return (__force netmem_ref)page;
> > > }
> > >
> > > +static inline netmem_ref alloc_netmems_node(int nid, gfp_t gfp_mask,
> > > + unsigned int order)
> > > +{
> > > + return page_to_netmem(alloc_pages_node(nid, gfp_mask, order));
> > > +}
> > > +
> > > +static inline unsigned long alloc_netmems_bulk_node(gfp_t gfp, int nid,
> > > + unsigned long nr_netmems, netmem_ref *netmem_array)
> > > +{
> > > + return alloc_pages_bulk_node(gfp, nid, nr_netmems,
> > > + (struct page **)netmem_array);
> > > +}
> > > +
> > > +static inline void put_netmem(netmem_ref netmem)
> > > +{
> > > + put_page(netmem_to_page(netmem));
> > > +}
> >
> > We can't really do this. netmem_ref is an abstraction that can be a
> > struct page or struct net_iov underneath. We can't be sure when
> > put_netmem is called that it is safe to convert to a page via
> > netmem_to_page(). This will crash if put_netmem is called on a
> > netmem_ref that is a net_iov underneath.
> >
>
> On a closer look, it looks like this put_netmem is only called from
> code paths where you are sure the netmem_ref is a page underneath, so
> this is likely fine for now. There is a net_next series that is adding
> proper put_netmem support [1]. It's probably best to rebase your work
> on top of that, but this should be fine in the meantime.
Indeed. Hm.. It'd be best to work on top of yours.
If it takes too long, I'll keep working as it is for now and adjust
this patch later once your work gets merged.
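In the meantime, maybe I can make the page-only assumption explicit;
just a sketch on top of this patch, using the existing
netmem_is_net_iov() helper:

	static inline void put_netmem(netmem_ref netmem)
	{
		/* Only page-backed netmem can be put via put_page();
		 * a net_iov must go through its provider's release
		 * path instead.
		 */
		if (WARN_ON_ONCE(netmem_is_net_iov(netmem)))
			return;
		put_page(netmem_to_page(netmem));
	}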
Byungchul
> [1] https://lore.kernel.org/netdev/20250508004830.4100853-1-almasrymina@google.com/
>
> --
> Thanks,
> Mina
* Re: [RFC 00/19] Split netmem from struct page
2025-05-09 14:09 ` [RFC 00/19] Split netmem from " Mina Almasry
@ 2025-05-12 12:36 ` Byungchul Park
2025-05-12 12:59 ` Pavel Begunkov
0 siblings, 1 reply; 53+ messages in thread
From: Byungchul Park @ 2025-05-12 12:36 UTC (permalink / raw)
To: Mina Almasry
Cc: Pavel Begunkov, willy, netdev, linux-kernel, linux-mm,
kernel_team, kuba, ilias.apalodimas, harry.yoo, hawk, akpm, ast,
daniel, davem, john.fastabend, andrew+netdev, edumazet, pabeni,
vishal.moola
On Fri, May 09, 2025 at 07:09:16AM -0700, Mina Almasry wrote:
> On Fri, May 9, 2025 at 4:51 AM Byungchul Park <byungchul@sk.com> wrote:
> >
> > The MM subsystem is trying to reduce struct page to a single pointer.
> > The first step towards that is splitting struct page by its individual
> > users, as has already been done with folio and slab. This patchset does
> > that for netmem which is used for page pools.
> >
> > Matthew Wilcox tried and stopped the same work; you can see it in:
> >
> > https://lore.kernel.org/linux-mm/20230111042214.907030-1-willy@infradead.org/
> >
> > Mina Almasry has already done a lot of prerequisite work, by luck, he
> > said :). I stacked my patches on top of his work, i.e. netmem.
> >
> > I focused on removing the page pool members in struct page this time,
> > not moving the allocation code of page pool from net to mm. It can be
> > done later if needed.
> >
> > There is still a lot of work to do to remove the dependency on struct
> > page in the network subsystem. I will continue to work on this after
> > this base patchset is merged.
> >
> > This patchset is based on mm tree's mm-unstable branch.
> >
>
> This series largely looks good to me, but a couple of things:
>
> - For deep changes like this to the page_pool, I think we need a
> before/after run of Jesper's currently out-of-tree benchmark to see
> any regressions:
> https://lore.kernel.org/netdev/20250309084118.3080950-1-almasrymina@google.com/
Sure. I will check it.
> - Also please CC Pavel on iterations related to netmem/net_iov, they
> are reusing that in io_uring code for zero-copy rx as well.
I will. Thank you.
Byungchul
> --
> Thanks,
> Mina
* Re: [RFC 13/19] page_pool: expand scope of is_pp_{netmem,page}() to global
2025-05-09 11:51 ` [RFC 13/19] page_pool: expand scope of is_pp_{netmem,page}() to global Byungchul Park
@ 2025-05-12 12:46 ` Toke Høiland-Jørgensen
2025-05-12 12:55 ` Byungchul Park
2025-05-14 3:00 ` Byungchul Park
0 siblings, 2 replies; 53+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-05-12 12:46 UTC (permalink / raw)
To: Byungchul Park, willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
Byungchul Park <byungchul@sk.com> writes:
> Code other than skbuff.c might need to check whether a page or netmem is
> for a page pool; for example, page_alloc.c needs to check the page
> state, i.e. whether it comes from a page pool or not, for its own
> purposes.
>
> Expand the scope of is_pp_netmem() and newly introduce is_pp_page(), so
> that those who want to check the source can do the check without
> directly accessing the page pool member, page->pp_magic.
>
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> ---
> include/net/page_pool/types.h | 2 ++
> net/core/page_pool.c | 10 ++++++++++
> net/core/skbuff.c | 5 -----
> 3 files changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index 36eb57d73abc6..d3e1a52f01e09 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -299,4 +299,6 @@ static inline bool is_page_pool_compiled_in(void)
> /* Caller must provide appropriate safe context, e.g. NAPI. */
> void page_pool_update_nid(struct page_pool *pool, int new_nid);
>
> +bool is_pp_netmem(netmem_ref netmem);
> +bool is_pp_page(struct page *page);
> #endif /* _NET_PAGE_POOL_H */
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index b61c1038f4c68..9c553e5a1b555 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -1225,3 +1225,13 @@ void net_mp_niov_clear_page_pool(struct netmem_desc *niov)
>
> page_pool_clear_pp_info(netmem);
> }
> +
> +bool is_pp_netmem(netmem_ref netmem)
> +{
> + return (netmem_get_pp_magic(netmem) & ~0x3UL) == PP_SIGNATURE;
> +}
> +
> +bool is_pp_page(struct page *page)
> +{
> + return is_pp_netmem(page_to_netmem(page));
> +}
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 6cbf77bc61fce..11098c204fe3e 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -893,11 +893,6 @@ static void skb_clone_fraglist(struct sk_buff *skb)
> skb_get(list);
> }
>
> -static bool is_pp_netmem(netmem_ref netmem)
> -{
> - return (netmem_get_pp_magic(netmem) & ~0x3UL) == PP_SIGNATURE;
> -}
> -
This has already been moved to mm.h (and the check changed) by commit:
cd3c93167da0 ("page_pool: Move pp_magic check into helper functions")
You should definitely rebase this series on top of that (and the
subsequent ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap
them when destroying the pool")), as these change the semantics of how
page_pool interacts with struct page.
Both of these are in net-next, which Mina already asked you to rebase
on, so I guess you'll pick it up there, but flagging it here just for
completeness :)
-Toke
* Re: [RFC 19/19] mm, netmem: remove the page pool members in struct page
2025-05-09 18:02 ` Matthew Wilcox
@ 2025-05-12 12:51 ` Byungchul Park
2025-05-12 14:42 ` Matthew Wilcox
0 siblings, 1 reply; 53+ messages in thread
From: Byungchul Park @ 2025-05-12 12:51 UTC (permalink / raw)
To: Matthew Wilcox
Cc: netdev, linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
On Fri, May 09, 2025 at 07:02:30PM +0100, Matthew Wilcox wrote:
> On Fri, May 09, 2025 at 08:51:26PM +0900, Byungchul Park wrote:
> > +++ b/include/linux/mm_types.h
> > @@ -20,6 +20,7 @@
> > #include <linux/seqlock.h>
> > #include <linux/percpu_counter.h>
> > #include <linux/types.h>
> > +#include <net/netmem_type.h> /* for page pool */
> >
> > #include <asm/mmu.h>
> >
> > @@ -118,17 +119,7 @@ struct page {
> > */
> > unsigned long private;
> > };
> > - struct { /* page_pool used by netstack */
> > - /**
> > - * @pp_magic: magic value to avoid recycling non
> > - * page_pool allocated pages.
> > - */
> > - unsigned long pp_magic;
> > - struct page_pool *pp;
> > - unsigned long _pp_mapping_pad;
> > - unsigned long dma_addr;
> > - atomic_long_t pp_ref_count;
> > - };
> > + struct __netmem_desc place_holder_1; /* for page pool */
> > struct { /* Tail pages of compound page */
> > unsigned long compound_head; /* Bit zero is set */
> > };
>
> The include and the place holder aren't needed.
Or netmem_desc overlaying struct page might conflict with other
fields of struct page, e.g. _mapcount, _refcount and the like, once the
layout of struct page *changes drastically* in the future before
netmem_desc has its own instance.
So placing a place holder like this is the safest way, IMO, to prevent
an unexpected result. Am I missing something?
> > diff --git a/include/net/netmem.h b/include/net/netmem.h
> > index 00064e766b889..c414de6c6ab0d 100644
> > --- a/include/net/netmem.h
> > +++ b/include/net/netmem.h
> > @@ -10,6 +10,7 @@
> >
> > #include <linux/mm.h>
> > #include <net/net_debug.h>
> > +#include <net/netmem_type.h>
>
> ... which I think means you don't need the separate header file.
Agree if I don't use the place holder in mm_types.h.
> > /* net_iov */
> >
> > @@ -20,15 +21,6 @@ DECLARE_STATIC_KEY_FALSE(page_pool_mem_providers);
> > */
> > #define NET_IOV 0x01UL
> >
> > -struct netmem_desc {
> > - unsigned long __unused_padding;
> > - unsigned long pp_magic;
> > - struct page_pool *pp;
> > - struct net_iov_area *owner;
> > - unsigned long dma_addr;
> > - atomic_long_t pp_ref_count;
> > -};
> > -
> > struct net_iov_area {
> > /* Array of net_iovs for this area. */
> > struct netmem_desc *niovs;
> > @@ -38,31 +30,6 @@ struct net_iov_area {
> > unsigned long base_virtual;
> > };
> >
> > -/* These fields in struct page are used by the page_pool and net stack:
> > - *
> > - * struct {
> > - * unsigned long pp_magic;
> > - * struct page_pool *pp;
> > - * unsigned long _pp_mapping_pad;
> > - * unsigned long dma_addr;
> > - * atomic_long_t pp_ref_count;
> > - * };
> > - *
> > - * We mirror the page_pool fields here so the page_pool can access these fields
> > - * without worrying whether the underlying fields belong to a page or net_iov.
> > - *
> > - * The non-net stack fields of struct page are private to the mm stack and must
> > - * never be mirrored to net_iov.
> > - */
> > -#define NET_IOV_ASSERT_OFFSET(pg, iov) \
> > - static_assert(offsetof(struct page, pg) == \
> > - offsetof(struct netmem_desc, iov))
> > -NET_IOV_ASSERT_OFFSET(pp_magic, pp_magic);
> > -NET_IOV_ASSERT_OFFSET(pp, pp);
> > -NET_IOV_ASSERT_OFFSET(dma_addr, dma_addr);
> > -NET_IOV_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
> > -#undef NET_IOV_ASSERT_OFFSET
>
> ... but you do want to keep asserting that netmem_desc and
> net_iov have the same offsets. And you want to assert that struct page
> is big enough to hold everything in netmem_desc, like we do for slab:
>
> static_assert(sizeof(struct slab) <= sizeof(struct page));
I will. However, as I mentioned above, it's not the total size that
matters but the layout change of struct page, I think.
Byungchul
* Re: [RFC 13/19] page_pool: expand scope of is_pp_{netmem,page}() to global
2025-05-12 12:46 ` Toke Høiland-Jørgensen
@ 2025-05-12 12:55 ` Byungchul Park
2025-05-14 3:00 ` Byungchul Park
1 sibling, 0 replies; 53+ messages in thread
From: Byungchul Park @ 2025-05-12 12:55 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
almasrymina, ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel,
davem, john.fastabend, andrew+netdev, edumazet, pabeni,
vishal.moola
On Mon, May 12, 2025 at 02:46:36PM +0200, Toke Høiland-Jørgensen wrote:
> Byungchul Park <byungchul@sk.com> writes:
>
> > Code other than skbuff.c might need to check whether a page or netmem
> > is for a page pool; for example, page_alloc.c needs to check the page
> > state, i.e. whether it comes from a page pool or not, for its own
> > purposes.
> >
> > Expand the scope of is_pp_netmem() and newly introduce is_pp_page(),
> > so that those who want to check the source can do the check without
> > directly accessing the page pool member, page->pp_magic.
> >
> > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > ---
> > include/net/page_pool/types.h | 2 ++
> > net/core/page_pool.c | 10 ++++++++++
> > net/core/skbuff.c | 5 -----
> > 3 files changed, 12 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> > index 36eb57d73abc6..d3e1a52f01e09 100644
> > --- a/include/net/page_pool/types.h
> > +++ b/include/net/page_pool/types.h
> > @@ -299,4 +299,6 @@ static inline bool is_page_pool_compiled_in(void)
> > /* Caller must provide appropriate safe context, e.g. NAPI. */
> > void page_pool_update_nid(struct page_pool *pool, int new_nid);
> >
> > +bool is_pp_netmem(netmem_ref netmem);
> > +bool is_pp_page(struct page *page);
> > #endif /* _NET_PAGE_POOL_H */
> > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > index b61c1038f4c68..9c553e5a1b555 100644
> > --- a/net/core/page_pool.c
> > +++ b/net/core/page_pool.c
> > @@ -1225,3 +1225,13 @@ void net_mp_niov_clear_page_pool(struct netmem_desc *niov)
> >
> > page_pool_clear_pp_info(netmem);
> > }
> > +
> > +bool is_pp_netmem(netmem_ref netmem)
> > +{
> > + return (netmem_get_pp_magic(netmem) & ~0x3UL) == PP_SIGNATURE;
> > +}
> > +
> > +bool is_pp_page(struct page *page)
> > +{
> > + return is_pp_netmem(page_to_netmem(page));
> > +}
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 6cbf77bc61fce..11098c204fe3e 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -893,11 +893,6 @@ static void skb_clone_fraglist(struct sk_buff *skb)
> > skb_get(list);
> > }
> >
> > -static bool is_pp_netmem(netmem_ref netmem)
> > -{
> > - return (netmem_get_pp_magic(netmem) & ~0x3UL) == PP_SIGNATURE;
> > -}
> > -
>
> This has already been moved to mm.h (and the check changed) by commit:
>
> cd3c93167da0 ("page_pool: Move pp_magic check into helper functions")
>
> You should definitely rebase this series on top of that (and the
> subsequent ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap
> them when destroying the pool")), as these change the semantics of how
> page_pool interacts with struct page.
>
> Both of these are in net-next, which Mina already asked you to rebase
> on, so I guess you'll pick it up there, but flagging it here just for
> completeness :)
I will not miss it. Thanks.
Byungchul
>
> -Toke
>
* Re: [RFC 19/19] mm, netmem: remove the page pool members in struct page
2025-05-10 7:26 ` Ilias Apalodimas
@ 2025-05-12 12:58 ` Byungchul Park
0 siblings, 0 replies; 53+ messages in thread
From: Byungchul Park @ 2025-05-12 12:58 UTC (permalink / raw)
To: Ilias Apalodimas
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
almasrymina, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
On Sat, May 10, 2025 at 10:26:00AM +0300, Ilias Apalodimas wrote:
> Hi Byungchul
>
> On Fri, 9 May 2025 at 14:51, Byungchul Park <byungchul@sk.com> wrote:
> >
> > Now that all the users of the page pool members in struct page are
> > gone, the members can be removed from struct page. However, the space
> > in struct page needs to be kept using a place holder of the same size,
> > until struct netmem_desc has its own instance, not overlaid onto struct
> > page, to avoid conflicting with other members within struct page.
> >
>
> FWIW, similar mirroring was initially proposed [0] a few years ago.
>
> > Remove the page pool members in struct page and replace them with a
> > place holder. The place holder should be removed once struct
> > netmem_desc has its own instance.
>
> instance? To make sure I understand this, the netmem_descs are
> expected to be allocated in the future, right?
Yes.
> [...]
>
> > -
> > static inline struct net_iov_area *net_iov_owner(const struct netmem_desc *niov)
> > {
> > return niov->owner;
> > diff --git a/include/net/netmem_type.h b/include/net/netmem_type.h
> > new file mode 100644
> > index 0000000000000..6a3ac8e908515
> > --- /dev/null
> > +++ b/include/net/netmem_type.h
> > @@ -0,0 +1,22 @@
> > +/* SPDX-License-Identifier: GPL-2.0
> > + *
> > + * Author: Byungchul Park <max.byungchul.park@gmail.com>
>
> Shouldn't Mina's authorship be preserved here?
Ah. I dunno what I'm supposed to do in this case. I will remove the
author part :) if that's okay.
Byungchul
> > + */
> > +
> > +#ifndef _NET_NETMEM_TYPE_H
> > +#define _NET_NETMEM_TYPE_H
> > +
> > +#include <linux/stddef.h>
> > +
> > +struct netmem_desc {
> > + unsigned long __unused_padding;
> > + struct_group_tagged(__netmem_desc, actual_data,
> > + unsigned long pp_magic;
> > + struct page_pool *pp;
> > + struct net_iov_area *owner;
> > + unsigned long dma_addr;
> > + atomic_long_t pp_ref_count;
> > + );
> > +};
> > +
> > +#endif /* _NET_NETMEM_TYPE_H */
> > --
> > 2.17.1
> >
>
> [0] https://lore.kernel.org/netdev/1549550196-25581-1-git-send-email-ilias.apalodimas@linaro.org/
> Thanks
> /Ilias
* Re: [RFC 00/19] Split netmem from struct page
2025-05-12 12:36 ` Byungchul Park
@ 2025-05-12 12:59 ` Pavel Begunkov
0 siblings, 0 replies; 53+ messages in thread
From: Pavel Begunkov @ 2025-05-12 12:59 UTC (permalink / raw)
To: Byungchul Park, Mina Almasry
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
On 5/12/25 13:36, Byungchul Park wrote:
> On Fri, May 09, 2025 at 07:09:16AM -0700, Mina Almasry wrote:
>> On Fri, May 9, 2025 at 4:51 AM Byungchul Park <byungchul@sk.com> wrote:
>>>
>>> The MM subsystem is trying to reduce struct page to a single pointer.
>>> The first step towards that is splitting struct page by its individual
>>> users, as has already been done with folio and slab. This patchset does
>>> that for netmem which is used for page pools.
>>>
>>> Matthew Wilcox tried and stopped the same work; you can see it in:
>>>
>>> https://lore.kernel.org/linux-mm/20230111042214.907030-1-willy@infradead.org/
>>>
>>> Mina Almasry has already done a lot of prerequisite work, by luck, he
>>> said :). I stacked my patches on top of his work, i.e. netmem.
>>>
>>> I focused on removing the page pool members in struct page this time,
>>> not moving the allocation code of page pool from net to mm. It can be
>>> done later if needed.
>>>
>>> There is still a lot of work to do to remove the dependency on struct
>>> page in the network subsystem. I will continue to work on this after
>>> this base patchset is merged.
>>>
>>> This patchset is based on mm tree's mm-unstable branch.
>>>
>>
>> This series largely looks good to me, but a couple of things:
>>
>> - For deep changes like this to the page_pool, I think we need a
>> before/after run of Jesper's currently out-of-tree benchmark to see
>> any regressions:
>> https://lore.kernel.org/netdev/20250309084118.3080950-1-almasrymina@google.com/
>
> Sure. I will check it.
>
>> - Also please CC Pavel on iterations related to netmem/net_iov, they
>> are reusing that in io_uring code for zero-copy rx as well.
>
> I will. Thank you.
Mina, thanks for CC'ing me. And since it's touching io_uring, future
versions need to CC it as well.
--
Pavel Begunkov
* Re: [RFC 01/19] netmem: rename struct net_iov to struct netmem_desc
2025-05-09 11:51 ` [RFC 01/19] netmem: rename struct net_iov to struct netmem_desc Byungchul Park
@ 2025-05-12 13:11 ` Pavel Begunkov
2025-05-12 13:29 ` Byungchul Park
0 siblings, 1 reply; 53+ messages in thread
From: Pavel Begunkov @ 2025-05-12 13:11 UTC (permalink / raw)
To: Byungchul Park, willy, netdev
Cc: linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
On 5/9/25 12:51, Byungchul Park wrote:
> To simplify struct page, the page pool members of struct page should be
> moved elsewhere, allowing them to be removed from struct page.
>
> Reuse struct net_iov, which already mirrored the page pool members, for
> system memory as well.
>
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> ---
> include/linux/skbuff.h | 4 +--
> include/net/netmem.h | 20 ++++++------
> include/net/page_pool/memory_provider.h | 6 ++--
> io_uring/zcrx.c | 42 ++++++++++++-------------
You're unnecessarily complicating it for yourself. It'll certainly
conflict with changes in the io_uring tree, and hence it can't
be taken normally through the net tree.
Why are you renaming it in the first place? If there are good
reasons, maybe you can try adding a temporary alias of the struct
and patch io_uring later separately.
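Even something as blunt as the below (illustrative only) would keep
io_uring compiling until it's converted in its own tree:

	/* temporary: let 'struct net_iov' users build against the
	 * renamed struct; drop once io_uring is converted
	 */
	#define net_iov netmem_desc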
--
Pavel Begunkov
* Re: [RFC 01/19] netmem: rename struct net_iov to struct netmem_desc
2025-05-12 13:11 ` Pavel Begunkov
@ 2025-05-12 13:29 ` Byungchul Park
2025-05-12 19:14 ` Mina Almasry
2025-05-13 12:49 ` Pavel Begunkov
0 siblings, 2 replies; 53+ messages in thread
From: Byungchul Park @ 2025-05-12 13:29 UTC (permalink / raw)
To: Pavel Begunkov
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
almasrymina, ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel,
davem, john.fastabend, andrew+netdev, edumazet, pabeni,
vishal.moola
On Mon, May 12, 2025 at 02:11:13PM +0100, Pavel Begunkov wrote:
> On 5/9/25 12:51, Byungchul Park wrote:
> > To simplify struct page, the page pool members of struct page should be
> > moved elsewhere, allowing them to be removed from struct page.
> >
> > Reuse struct net_iov, which already mirrored the page pool members, for
> > system memory as well.
> >
> > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > ---
> > include/linux/skbuff.h | 4 +--
> > include/net/netmem.h | 20 ++++++------
> > include/net/page_pool/memory_provider.h | 6 ++--
> > io_uring/zcrx.c | 42 ++++++++++++-------------
>
> You're unnecessarily complicating it for yourself. It'll certainly
> conflict with changes in the io_uring tree, and hence it can't
> be taken normally through the net tree.
>
> Why are you renaming it in the first place? If there are good
It's because the struct should be used not only for io vector things but
also for system memory. Current network code uses struct page as its
system memory descriptor, but the struct page fields for page pool will
be gone.

So I had to reuse struct net_iov, and I thought renaming it made more
sense. Better ideas are welcome.
Byungchul
> reasons, maybe you can try adding a temporary alias of the struct
> and patch io_uring later separately.
>
> --
> Pavel Begunkov
>
* Re: [RFC 19/19] mm, netmem: remove the page pool members in struct page
2025-05-12 12:51 ` Byungchul Park
@ 2025-05-12 14:42 ` Matthew Wilcox
2025-05-13 1:42 ` Byungchul Park
0 siblings, 1 reply; 53+ messages in thread
From: Matthew Wilcox @ 2025-05-12 14:42 UTC (permalink / raw)
To: Byungchul Park
Cc: netdev, linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
On Mon, May 12, 2025 at 09:51:03PM +0900, Byungchul Park wrote:
> On Fri, May 09, 2025 at 07:02:30PM +0100, Matthew Wilcox wrote:
> > > + struct __netmem_desc place_holder_1; /* for page pool */
> > > struct { /* Tail pages of compound page */
> > > unsigned long compound_head; /* Bit zero is set */
> > > };
> >
> > The include and the place holder aren't needed.
>
> Or netmem_desc overlaying struct page might conflict with other
> fields of struct page, e.g. _mapcount, _refcount and the like, once the
> layout of struct page *changes drastically* in the future before
> netmem_desc has its own instance.
That's not how it'll happen. When the layout of struct page changes,
it'll shrink substantially (cut in half). That means that dynamically
allocating netmem_desc must happen first (along with slab, folio, etc,
etc).
* Re: [RFC 19/19] mm, netmem: remove the page pool members in struct page
2025-05-09 19:48 ` Matthew Wilcox
@ 2025-05-12 19:10 ` Mina Almasry
0 siblings, 0 replies; 53+ messages in thread
From: Mina Almasry @ 2025-05-12 19:10 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Byungchul Park, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
On Fri, May 9, 2025 at 12:48 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, May 09, 2025 at 12:04:37PM -0700, Mina Almasry wrote:
> > Right, all I'm saying is that if it's at all possible to keep net_iov
> > something that can be extended with fields unrelated to struct page,
> > let's do that. net_iov already has fields that should not belong in
> > struct page like net_iov_owner and I think more will be added.
>
> Sure, that's fine.
>
Excellent!
> > I'm thinking netmem_desc can be the fields that are shared between
> > struct net_iov and struct page (but both can have more fields specific
> > to the different memory types). As you say, for now netmem_desc can
> > overlap fields in struct page and struct net_iov, and a follow-up
> > change can replace it with something that gets kmalloced and (I
> > guess?) there is a pointer in struct page or struct net_iov that
> > refers to the netmem_desc that contains the shared fields.
>
> I'm sure I've pointed you at
> https://kernelnewbies.org/MatthewWilcox/Memdescs before.
>
I've gone through that again. Some of it is a bit over my head
(sorry), but this page does say that page->compound_head will have a
link to memdesc:
https://kernelnewbies.org/MatthewWilcox/Memdescs/Path
That's an approach that sounds fine to me. We can have net_iov follow
that pattern if necessary (have a field in it that points to the
memdesc).
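(IIUC the end state there is roughly a two-word struct page, something
like the below, with the descriptor type encoded in the low bits of the
pointer; this is my paraphrase of that wiki, not anything merged:)

	struct page {
		unsigned long flags;
		unsigned long memdesc;	/* tagged pointer to the folio /
					 * slab / netmem_desc / ... that
					 * describes this memory
					 */
	};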
> But I wouldn't expect to have net_iov contain a pointer to netmem_desc,
> rather it would embed a netmem_desc. Unless there's a good reason to
> separate them.
>
net_iov embedding netmem_desc sounds fine as well to me.
> Actually, I'd hope to do away with net_iov entirely. Networking should
> handle memory-on-PCI-devices the same way everybody else does (as
> hotplugged memory) rather than with its own special structures.
>
Doing away with net_iov entirely is a different conversation. From the
devmem TCP side, you're much more of an expert than me, but my
experience is that for the GPU devices we initially created net_iovs
for, dmabuf is the standard way of sharing memory, and the dma-buf
importer just gets a scatterlist, and has to either work with the
scatterlist directly or create descriptors (like net_iov) to handle
chunks of the scatterlist.
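(For reference, the importer side today looks roughly like this, which
is where the per-chunk descriptors come from; 'attach' being the
dma-buf attachment:)

	struct sg_table *sgt;
	struct scatterlist *sg;
	int i;

	sgt = dma_buf_map_attachment(attach, DMA_FROM_DEVICE);
	for_each_sgtable_dma_sg(sgt, sg, i) {
		dma_addr_t addr = sg_dma_address(sg);
		unsigned int len = sg_dma_len(sg);

		/* carve net_iov-like descriptors out of [addr, addr + len) */
	}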
I think we discussed this before and you said to me you have long-term
plans to get rid of scatterlists. Once that is done, we may be able to
do away with the dma-buf use case for net_iovs, but the conversation
about migrating scatterlists to something new is well over my head and
probably needs discussion with the dma-buf maintainers.
Note also that the users of net_iov have expanded and io_uring has a
dependency on it as well.
The good news (I think) is that Byungchul's effort does not require
the removal of net_iov. From looking at this patchset I think what
he's trying to do is very compatible with net_iovs with minor
modifications.
--
Thanks,
Mina
* Re: [RFC 01/19] netmem: rename struct net_iov to struct netmem_desc
2025-05-12 13:29 ` Byungchul Park
@ 2025-05-12 19:14 ` Mina Almasry
2025-05-13 2:00 ` Byungchul Park
2025-05-13 12:49 ` Pavel Begunkov
1 sibling, 1 reply; 53+ messages in thread
From: Mina Almasry @ 2025-05-12 19:14 UTC (permalink / raw)
To: Byungchul Park
Cc: Pavel Begunkov, willy, netdev, linux-kernel, linux-mm,
kernel_team, kuba, ilias.apalodimas, harry.yoo, hawk, akpm, ast,
daniel, davem, john.fastabend, andrew+netdev, edumazet, pabeni,
vishal.moola
On Mon, May 12, 2025 at 6:29 AM Byungchul Park <byungchul@sk.com> wrote:
>
> On Mon, May 12, 2025 at 02:11:13PM +0100, Pavel Begunkov wrote:
> > On 5/9/25 12:51, Byungchul Park wrote:
> > > To simplify struct page, the page pool members of struct page should be
> > > moved elsewhere, allowing them to be removed from struct page.
> > >
> > > Reuse struct net_iov, which already mirrored the page pool members, for
> > > system memory as well.
> > >
> > > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > > ---
> > > include/linux/skbuff.h | 4 +--
> > > include/net/netmem.h | 20 ++++++------
> > > include/net/page_pool/memory_provider.h | 6 ++--
> > > io_uring/zcrx.c | 42 ++++++++++++-------------
> >
> > You're unnecessarily complicating it for yourself. It'll certainly
> > conflict with changes in the io_uring tree, and hence it can't
> > be taken normally through the net tree.
> >
> > Why are you renaming it in the first place? If there are good
>
> It's because the struct should be used not only for io vector things but
> also for system memory. Current network code uses struct page as its
> system memory descriptor, but the struct page fields for page pool will
> be gone.
>
> So I had to reuse struct net_iov, and I thought renaming it made more
> sense. Better ideas are welcome.
>
As I said in another thread, struct page should not embed struct
net_iov as-is. struct net_iov already has fields that are unrelated to
page (like net_iov_owner) and more will be added in the future.
I think what Matthew seems to agree with AFAIU in the other thread is
creating a new struct, struct netmem_desc, and having struct net_iov
embed netmem_desc.
--
Thanks,
Mina
* Re: [RFC 19/19] mm, netmem: remove the page pool members in struct page
2025-05-12 14:42 ` Matthew Wilcox
@ 2025-05-13 1:42 ` Byungchul Park
2025-05-13 3:19 ` Matthew Wilcox
0 siblings, 1 reply; 53+ messages in thread
From: Byungchul Park @ 2025-05-13 1:42 UTC (permalink / raw)
To: Matthew Wilcox
Cc: netdev, linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
On Mon, May 12, 2025 at 03:42:54PM +0100, Matthew Wilcox wrote:
> On Mon, May 12, 2025 at 09:51:03PM +0900, Byungchul Park wrote:
> > On Fri, May 09, 2025 at 07:02:30PM +0100, Matthew Wilcox wrote:
> > > > + struct __netmem_desc place_holder_1; /* for page pool */
> > > > struct { /* Tail pages of compound page */
> > > > unsigned long compound_head; /* Bit zero is set */
> > > > };
> > >
> > > The include and the place holder aren't needed.
> >
> > Or netmem_desc overlaying struct page might be conflict with other
> > fields of sturct page e.g. _mapcount, _refcount and the like, once the
> > layout of struct page *extremly changes* in the future before
> > netmem_desc has its own instance.
>
> That's not how it'll happen. When the layout of struct page changes,
> it'll shrink substantially (cut in half). That means that dynamically
> allocating netmem_desc must happen first (along with slab, folio, etc,
> etc).
Just in case, lemme explain what I meant, for *example*:
	struct page {
		unsigned long flags;
		union {
			struct A { /* 24 bytes */ };
			struct B { /* 40 bytes */ };
			struct page_pool { /* 40 bytes */ };
		};
		atomic_t _mapcount;
		atomic_t _refcount;
		unsigned long memcg_data;
		...
	};

	/* overlaid on struct page */
	struct netmem_desc { /* 8 + 40 bytes */ };
After removing page_pool fields:
	struct page {
		unsigned long flags;
		union {
			struct A { /* 24 bytes */ };
			struct B { /* 40 bytes */ };
		};
		atomic_t _mapcount;
		atomic_t _refcount;
		unsigned long memcg_data;
		...
	};

	/* overlaid on struct page */
	struct netmem_desc { /* 8 + 40 bytes */ };
The above still looks okay because operating on struct netmem_desc does
not touch _mapcount or _refcount in struct page.

However, if either the size of struct B gets reduced to 32 bytes or
struct B moves out of struct page for some reason, it will look like:
	struct page {
		unsigned long flags;
		union {
			struct A { /* 24 bytes */ };
			struct B { /* 32 bytes */ };
		};
		atomic_t _mapcount;
		atomic_t _refcount;
		unsigned long memcg_data;
		...
	};

	/* overlaid on struct page */
	struct netmem_desc { /* 8 + 40 bytes */ };
Here, operating on struct netmem_desc can unexpectedly smash _mapcount
and _refcount in struct page, even though sizeof(struct netmem_desc) <=
sizeof(struct page). That's why I think the place holder is necessary
until netmem_desc gets completely separated and has its own instance.

If you believe it's still okay, I will remove the place holder, though I
am still concerned about it.
Byungchul
* Re: [RFC 01/19] netmem: rename struct net_iov to struct netmem_desc
2025-05-12 19:14 ` Mina Almasry
@ 2025-05-13 2:00 ` Byungchul Park
2025-05-13 12:58 ` Pavel Begunkov
0 siblings, 1 reply; 53+ messages in thread
From: Byungchul Park @ 2025-05-13 2:00 UTC (permalink / raw)
To: Mina Almasry
Cc: Pavel Begunkov, willy, netdev, linux-kernel, linux-mm,
kernel_team, kuba, ilias.apalodimas, harry.yoo, hawk, akpm, ast,
daniel, davem, john.fastabend, andrew+netdev, edumazet, pabeni,
vishal.moola
On Mon, May 12, 2025 at 12:14:13PM -0700, Mina Almasry wrote:
> On Mon, May 12, 2025 at 6:29 AM Byungchul Park <byungchul@sk.com> wrote:
> >
> > On Mon, May 12, 2025 at 02:11:13PM +0100, Pavel Begunkov wrote:
> > > On 5/9/25 12:51, Byungchul Park wrote:
> > > > To simplify struct page, the page pool members of struct page should be
> > > > moved elsewhere, allowing them to be removed from struct page.
> > > >
> > > > Reuse struct net_iov, which already mirrored the page pool members, for
> > > > system memory as well.
> > > >
> > > > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > > > ---
> > > > include/linux/skbuff.h | 4 +--
> > > > include/net/netmem.h | 20 ++++++------
> > > > include/net/page_pool/memory_provider.h | 6 ++--
> > > > io_uring/zcrx.c | 42 ++++++++++++-------------
> > >
> > > You're unnecessarily complicating it for yourself. It'll certainly
> > > conflict with changes in the io_uring tree, and hence it can't
> > > be taken normally through the net tree.
> > >
> > > Why are you renaming it in the first place? If there are good
> >
> > It's because the struct should be used not only for io vector things but
> > also for system memory. Current network code uses struct page as its
> > system memory descriptor, but the struct page fields for page pool will
> > be gone.
> >
> > So I had to reuse struct net_iov, and I thought renaming it made more
> > sense. Better ideas are welcome.
> >
>
> As I said in another thread, struct page should not embed struct
I don't understand this part. Can you explain more? Do you mean not to
use the place holder?
> net_iov as-is. struct net_iov already has fields that are unrelated to
> page (like net_iov_owner) and more will be added in the future.
>
> I think what Matthew seems to agree with AFAIU in the other thread is
> creating a new struct, struct netmem_desc, and having struct net_iov
> embed netmem_desc.
This would look better. I will try.
Byungchul
> --
> Thanks,
> Mina
* Re: [RFC 19/19] mm, netmem: remove the page pool members in struct page
2025-05-13 1:42 ` Byungchul Park
@ 2025-05-13 3:19 ` Matthew Wilcox
2025-05-13 10:24 ` Byungchul Park
0 siblings, 1 reply; 53+ messages in thread
From: Matthew Wilcox @ 2025-05-13 3:19 UTC (permalink / raw)
To: Byungchul Park
Cc: netdev, linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
On Tue, May 13, 2025 at 10:42:00AM +0900, Byungchul Park wrote:
> Just in case, lemme explain what I meant, for *example*:
I understood what you meant.
> In here, operating on struct netmem_desc can smash _mapcount and
> _refcount in struct page unexpectedly, even though sizeof(struct
> netmem_desc) <= sizeof(struct page). That's why I think the place holder
> is necessary until it completely gets separated so as to have its own
> instance.
We could tighten up the assert a bit, e.g.
static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
We _can't_ shrink struct page until struct folio is dynamically
allocated. The same patch series that dynamically allocates folio will
do the same for netmem and slab and ptdesc and ...
* Re: [RFC 19/19] mm, netmem: remove the page pool members in struct page
2025-05-13 3:19 ` Matthew Wilcox
@ 2025-05-13 10:24 ` Byungchul Park
0 siblings, 0 replies; 53+ messages in thread
From: Byungchul Park @ 2025-05-13 10:24 UTC (permalink / raw)
To: Matthew Wilcox
Cc: netdev, linux-kernel, linux-mm, kernel_team, kuba, almasrymina,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
On Tue, May 13, 2025 at 04:19:03AM +0100, Matthew Wilcox wrote:
> On Tue, May 13, 2025 at 10:42:00AM +0900, Byungchul Park wrote:
> > Just in case, lemme explain what I meant, for *example*:
>
> I understood what you meant.
>
> > In here, operating on struct netmem_desc can smash _mapcount and
> > _refcount in struct page unexpectedly, even though sizeof(struct
> > netmem_desc) <= sizeof(struct page). That's why I think the place holder
> > is necessary until it completely gets separated so as to have its own
> > instance.
>
> We could tighten up the assert a bit. eg
>
> static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
This mitigates my concern. I will replace the place holder with
this (but the fields in struct page must never be relocated for any
reason until that day. I trust you :).
Byungchul
> We _can't_ shrink struct page until struct folio is dynamically
> allocated. The same patch series that dynamically allocates folio will
> do the same for netmem and slab and ptdesc and ...
* Re: [RFC 01/19] netmem: rename struct net_iov to struct netmem_desc
2025-05-12 13:29 ` Byungchul Park
2025-05-12 19:14 ` Mina Almasry
@ 2025-05-13 12:49 ` Pavel Begunkov
2025-05-14 0:07 ` Byungchul Park
1 sibling, 1 reply; 53+ messages in thread
From: Pavel Begunkov @ 2025-05-13 12:49 UTC (permalink / raw)
To: Byungchul Park
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
almasrymina, ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel,
davem, john.fastabend, andrew+netdev, edumazet, pabeni,
vishal.moola
On 5/12/25 14:29, Byungchul Park wrote:
> On Mon, May 12, 2025 at 02:11:13PM +0100, Pavel Begunkov wrote:
>> On 5/9/25 12:51, Byungchul Park wrote:
>>> To simplify struct page, the page pool members of struct page should be
>>> moved elsewhere, allowing them to be removed from struct page.
>>>
>>> Reuse struct net_iov, which already mirrored the page pool members, for
>>> system memory as well.
>>>
>>> Signed-off-by: Byungchul Park <byungchul@sk.com>
>>> ---
>>> include/linux/skbuff.h | 4 +--
>>> include/net/netmem.h | 20 ++++++------
>>> include/net/page_pool/memory_provider.h | 6 ++--
>>> io_uring/zcrx.c | 42 ++++++++++++-------------
>>
>> You're unnecessarily complicating it for yourself. It'll certainly
>> conflict with changes in the io_uring tree, and hence it can't
>> be taken normally through the net tree.
>>
>> Why are you renaming it in the first place? If there are good
>
> It's because the struct should be used not only for io vector things but
> also for system memory. Current network code uses struct page as its
Not sure what you mean by "io vector things", but it can already
point to system memory, and if anything, the use conceptually
resembles struct page more than iovec. IOW, it's just a name;
neither gives a perfect understanding until you look up the details,
so you could just leave it as net_iov. Or follow what Mina suggested;
I like that option.
> system memory descriptor, but the struct page fields for page pool will
> be gone.
>
> So I had to reuse struct net_iov, and I thought renaming it made more
> sense. Better ideas are welcome.
--
Pavel Begunkov
* Re: [RFC 01/19] netmem: rename struct net_iov to struct netmem_desc
2025-05-13 2:00 ` Byungchul Park
@ 2025-05-13 12:58 ` Pavel Begunkov
0 siblings, 0 replies; 53+ messages in thread
From: Pavel Begunkov @ 2025-05-13 12:58 UTC (permalink / raw)
To: Byungchul Park, Mina Almasry
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel, davem,
john.fastabend, andrew+netdev, edumazet, pabeni, vishal.moola
On 5/13/25 03:00, Byungchul Park wrote:
> On Mon, May 12, 2025 at 12:14:13PM -0700, Mina Almasry wrote:
>> On Mon, May 12, 2025 at 6:29 AM Byungchul Park <byungchul@sk.com> wrote:
>>>
>>> On Mon, May 12, 2025 at 02:11:13PM +0100, Pavel Begunkov wrote:
>>>> On 5/9/25 12:51, Byungchul Park wrote:
>>>>> To simplify struct page, the page pool members of struct page should be
>>>>> moved elsewhere, allowing them to be removed from struct page.
>>>>>
>>>>> Reuse struct net_iov, which already mirrored the page pool members, for
>>>>> system memory as well.
>>>>>
>>>>> Signed-off-by: Byungchul Park <byungchul@sk.com>
>>>>> ---
>>>>> include/linux/skbuff.h | 4 +--
>>>>> include/net/netmem.h | 20 ++++++------
>>>>> include/net/page_pool/memory_provider.h | 6 ++--
>>>>> io_uring/zcrx.c | 42 ++++++++++++-------------
>>>>
>>>> You're unnecessarily complicating it for yourself. It'll certainly
>>>> conflict with changes in the io_uring tree, and hence it can't
>>>> be taken normally through the net tree.
>>>>
>>>> Why are you renaming it in the first place? If there are good
>>>
>>> It's because the struct should be used not only for io vector things but
>>> also for system memory. Current network code uses struct page as its
>>> system memory descriptor, but the struct page fields for page pool will
>>> be gone.
>>>
>>> So I had to reuse struct net_iov, and I thought renaming it made more
>>> sense. Better ideas are welcome.
>>>
>>
>> As I said in another thread, struct page should not embed struct
>
> I don't understand here. Can you explain more? Do you mean do not use
> place holder?
I assume this:
	struct netmem_desc {
		...
	};

	struct net_iov {
		struct netmem_desc desc;
	};

	struct page {
		union {
			/* eventually will go away */
			struct netmem_desc desc;
			...
		};
	};
And to avoid conflicts with io_uring, you can send something like the
following to the net tree, and finish the io_uring conversion later.
	struct net_iov {
		struct_group_tagged(netmem_desc, desc,
			...
		);
	};
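The struct_group_tagged() form keeps both views working, i.e.:

	struct net_iov niov;
	struct netmem_desc *desc = &niov.desc;	/* grouped, typed view */

	niov.pp_magic = 0;	/* direct member access still compiles */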
--
Pavel Begunkov
* Re: [RFC 01/19] netmem: rename struct net_iov to struct netmem_desc
2025-05-13 12:49 ` Pavel Begunkov
@ 2025-05-14 0:07 ` Byungchul Park
0 siblings, 0 replies; 53+ messages in thread
From: Byungchul Park @ 2025-05-14 0:07 UTC (permalink / raw)
To: Pavel Begunkov
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
almasrymina, ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel,
davem, john.fastabend, andrew+netdev, edumazet, pabeni,
vishal.moola
On Tue, May 13, 2025 at 01:49:56PM +0100, Pavel Begunkov wrote:
> On 5/12/25 14:29, Byungchul Park wrote:
> > On Mon, May 12, 2025 at 02:11:13PM +0100, Pavel Begunkov wrote:
> > > On 5/9/25 12:51, Byungchul Park wrote:
> > > > To simplify struct page, the page pool members of struct page should be
> > > > moved elsewhere, allowing them to be removed from struct page.
> > > >
> > > > Reuse struct net_iov, which already mirrored the page pool members, for
> > > > system memory as well.
> > > >
> > > > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > > > ---
> > > > include/linux/skbuff.h | 4 +--
> > > > include/net/netmem.h | 20 ++++++------
> > > > include/net/page_pool/memory_provider.h | 6 ++--
> > > > io_uring/zcrx.c | 42 ++++++++++++-------------
> > >
> > > You're unnecessarily complicating it for yourself. It'll certainly
> > > conflict with changes in the io_uring tree, and hence it can't
> > > be taken normally through the net tree.
> > >
> > > Why are you renaming it in the first place? If there are good
> >
> > It's because the struct should be used not only for io vector things but
> > also for system memory. Current network code uses struct page as its
>
> Not sure what you mean by "io vector things", but it can already
> point to system memory, and if anything, the use conceptually
> resembles struct page more than iovec. IOW, it's just a name;
> neither gives a perfect understanding until you look up the details,
> so you could just leave it as net_iov. Or follow what Mina suggested;
> I like that option.
I appreciate all of your feedback and will try to apply it.
Byungchul
> > system memory descriptor, but the struct page fields for page pool will
> > be gone.
> >
> > So I had to reuse struct net_iov, and I thought renaming it made more
> > sense. Better ideas are welcome.
> --
> Pavel Begunkov
* Re: [RFC 13/19] page_pool: expand scope of is_pp_{netmem,page}() to global
2025-05-12 12:46 ` Toke Høiland-Jørgensen
2025-05-12 12:55 ` Byungchul Park
@ 2025-05-14 3:00 ` Byungchul Park
2025-05-14 11:17 ` Toke Høiland-Jørgensen
1 sibling, 1 reply; 53+ messages in thread
From: Byungchul Park @ 2025-05-14 3:00 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
almasrymina, ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel,
davem, john.fastabend, andrew+netdev, edumazet, pabeni,
vishal.moola
On Mon, May 12, 2025 at 02:46:36PM +0200, Toke Høiland-Jørgensen wrote:
> Byungchul Park <byungchul@sk.com> writes:
>
> > Code other than skbuff.c might need to check whether a page or netmem
> > is for a page pool; for example, page_alloc.c needs to check the page
> > state, i.e. whether it comes from a page pool or not, for its own
> > purposes.
> >
> > Expand the scope of is_pp_netmem() and newly introduce is_pp_page(),
> > so that those who want to check the source can do the check without
> > directly accessing the page pool member, page->pp_magic.
> >
> > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > ---
> > include/net/page_pool/types.h | 2 ++
> > net/core/page_pool.c | 10 ++++++++++
> > net/core/skbuff.c | 5 -----
> > 3 files changed, 12 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> > index 36eb57d73abc6..d3e1a52f01e09 100644
> > --- a/include/net/page_pool/types.h
> > +++ b/include/net/page_pool/types.h
> > @@ -299,4 +299,6 @@ static inline bool is_page_pool_compiled_in(void)
> > /* Caller must provide appropriate safe context, e.g. NAPI. */
> > void page_pool_update_nid(struct page_pool *pool, int new_nid);
> >
> > +bool is_pp_netmem(netmem_ref netmem);
> > +bool is_pp_page(struct page *page);
> > #endif /* _NET_PAGE_POOL_H */
> > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > index b61c1038f4c68..9c553e5a1b555 100644
> > --- a/net/core/page_pool.c
> > +++ b/net/core/page_pool.c
> > @@ -1225,3 +1225,13 @@ void net_mp_niov_clear_page_pool(struct netmem_desc *niov)
> >
> > page_pool_clear_pp_info(netmem);
> > }
> > +
> > +bool is_pp_netmem(netmem_ref netmem)
> > +{
> > + return (netmem_get_pp_magic(netmem) & ~0x3UL) == PP_SIGNATURE;
> > +}
> > +
> > +bool is_pp_page(struct page *page)
> > +{
> > + return is_pp_netmem(page_to_netmem(page));
> > +}
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 6cbf77bc61fce..11098c204fe3e 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -893,11 +893,6 @@ static void skb_clone_fraglist(struct sk_buff *skb)
> > skb_get(list);
> > }
> >
> > -static bool is_pp_netmem(netmem_ref netmem)
> > -{
> > - return (netmem_get_pp_magic(netmem) & ~0x3UL) == PP_SIGNATURE;
> > -}
> > -
>
> This has already been moved to mm.h (and the check changed) by commit:
>
> cd3c93167da0 ("page_pool: Move pp_magic check into helper functions")
>
> You should definitely rebase this series on top of that (and the
> subsequent ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap
> them when destroying the pool")), as these change the semantics of how
> page_pool interacts with struct page.
>
> Both of these are in net-next, which Mina already asked you to rebase
Is this the net-next you are mentioning? I will rebase on it if so.
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/
Byungchul
> on, so I guess you'll pick it up there, but flagging it here just for
> completeness :)
>
> -Toke
>
* Re: [RFC 13/19] page_pool: expand scope of is_pp_{netmem,page}() to global
2025-05-14 3:00 ` Byungchul Park
@ 2025-05-14 11:17 ` Toke Høiland-Jørgensen
0 siblings, 0 replies; 53+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-05-14 11:17 UTC (permalink / raw)
To: Byungchul Park
Cc: willy, netdev, linux-kernel, linux-mm, kernel_team, kuba,
almasrymina, ilias.apalodimas, harry.yoo, hawk, akpm, ast, daniel,
davem, john.fastabend, andrew+netdev, edumazet, pabeni,
vishal.moola
Byungchul Park <byungchul@sk.com> writes:
> On Mon, May 12, 2025 at 02:46:36PM +0200, Toke Høiland-Jørgensen wrote:
>> Byungchul Park <byungchul@sk.com> writes:
>>
>> > Code other than skbuff.c might need to check whether a page or netmem
>> > is for a page pool; for example, page_alloc.c needs to check the page
>> > state, i.e. whether it comes from a page pool or not, for its own
>> > purposes.
>> >
>> > Expand the scope of is_pp_netmem() and newly introduce is_pp_page(),
>> > so that those who want to check the source can do the check without
>> > directly accessing the page pool member, page->pp_magic.
>> >
>> > Signed-off-by: Byungchul Park <byungchul@sk.com>
>> > ---
>> > include/net/page_pool/types.h | 2 ++
>> > net/core/page_pool.c | 10 ++++++++++
>> > net/core/skbuff.c | 5 -----
>> > 3 files changed, 12 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
>> > index 36eb57d73abc6..d3e1a52f01e09 100644
>> > --- a/include/net/page_pool/types.h
>> > +++ b/include/net/page_pool/types.h
>> > @@ -299,4 +299,6 @@ static inline bool is_page_pool_compiled_in(void)
>> > /* Caller must provide appropriate safe context, e.g. NAPI. */
>> > void page_pool_update_nid(struct page_pool *pool, int new_nid);
>> >
>> > +bool is_pp_netmem(netmem_ref netmem);
>> > +bool is_pp_page(struct page *page);
>> > #endif /* _NET_PAGE_POOL_H */
>> > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
>> > index b61c1038f4c68..9c553e5a1b555 100644
>> > --- a/net/core/page_pool.c
>> > +++ b/net/core/page_pool.c
>> > @@ -1225,3 +1225,13 @@ void net_mp_niov_clear_page_pool(struct netmem_desc *niov)
>> >
>> > page_pool_clear_pp_info(netmem);
>> > }
>> > +
>> > +bool is_pp_netmem(netmem_ref netmem)
>> > +{
>> > + return (netmem_get_pp_magic(netmem) & ~0x3UL) == PP_SIGNATURE;
>> > +}
>> > +
>> > +bool is_pp_page(struct page *page)
>> > +{
>> > + return is_pp_netmem(page_to_netmem(page));
>> > +}
>> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> > index 6cbf77bc61fce..11098c204fe3e 100644
>> > --- a/net/core/skbuff.c
>> > +++ b/net/core/skbuff.c
>> > @@ -893,11 +893,6 @@ static void skb_clone_fraglist(struct sk_buff *skb)
>> > skb_get(list);
>> > }
>> >
>> > -static bool is_pp_netmem(netmem_ref netmem)
>> > -{
>> > - return (netmem_get_pp_magic(netmem) & ~0x3UL) == PP_SIGNATURE;
>> > -}
>> > -
>>
>> This has already been moved to mm.h (and the check changed) by commit:
>>
>> cd3c93167da0 ("page_pool: Move pp_magic check into helper functions")
>>
>> You should definitely rebase this series on top of that (and the
>> subsequent ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap
>> them when destroying the pool")), as these change the semantics of how
>> page_pool interacts with struct page.
>>
>> Both of these are in net-next, which Mina already asked you to rebase
>
> Is this the net-next you are mentioning? I will rebase on it if so.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/
Yup :)
-Toke
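
As context for the exchange above, here is a minimal, self-contained
sketch of the check under discussion, in the spirit of the patch and of
the mm.h helpers Toke cites. The mask macro, helper names, and the
page_alloc-style caller below are illustrative assumptions, not the
exact upstream identifiers:

	#include <linux/mm_types.h>	/* struct page */
	#include <linux/poison.h>	/* PP_SIGNATURE */
	#include <linux/printk.h>	/* pr_warn() */

	/*
	 * Illustrative only: mask off the low bits of pp_magic, which
	 * can be set for unrelated reasons, before comparing against
	 * the page pool signature.
	 */
	#define PP_MAGIC_MASK_SKETCH	(~0x3UL)

	static inline bool is_pp_page_sketch(const struct page *page)
	{
		return (page->pp_magic & PP_MAGIC_MASK_SKETCH) ==
			PP_SIGNATURE;
	}

	/*
	 * A caller outside net/ (e.g. page_alloc.c in patch 14/19)
	 * would then test the helper instead of touching
	 * page->pp_magic directly:
	 */
	static inline void warn_if_pp_page_sketch(struct page *page)
	{
		if (is_pp_page_sketch(page))
			pr_warn("page %p still belongs to a page_pool\n",
				page);
	}

Note that, as Toke points out, the commits he cites change how
pp_magic is used (ee62ce7a1d90 additionally packs DMA-tracking state
into it, as I understand it), so a plain ~0x3UL mask no longer tells
the whole story; that is one reason rebasing on net-next matters here.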
^ permalink raw reply [flat|nested] 53+ messages in thread
end of thread, other threads:[~2025-05-14 11:17 UTC | newest]
Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-05-09 11:51 [RFC 00/19] Split netmem from struct page Byungchul Park
2025-05-09 11:51 ` [RFC 01/19] netmem: rename struct net_iov to struct netmem_desc Byungchul Park
2025-05-12 13:11 ` Pavel Begunkov
2025-05-12 13:29 ` Byungchul Park
2025-05-12 19:14 ` Mina Almasry
2025-05-13 2:00 ` Byungchul Park
2025-05-13 12:58 ` Pavel Begunkov
2025-05-13 12:49 ` Pavel Begunkov
2025-05-14 0:07 ` Byungchul Park
2025-05-09 11:51 ` [RFC 02/19] netmem: introduce netmem alloc/put API to wrap page alloc/put API Byungchul Park
2025-05-09 13:39 ` Mina Almasry
2025-05-09 14:08 ` Mina Almasry
2025-05-12 12:30 ` Byungchul Park
2025-05-09 11:51 ` [RFC 03/19] page_pool: use netmem alloc/put API in __page_pool_alloc_page_order() Byungchul Park
2025-05-09 11:51 ` [RFC 04/19] page_pool: rename __page_pool_alloc_page_order() to __page_pool_alloc_large_netmem() Byungchul Park
2025-05-09 11:51 ` [RFC 05/19] page_pool: use netmem alloc/put API in __page_pool_alloc_pages_slow() Byungchul Park
2025-05-09 11:51 ` [RFC 06/19] page_pool: rename page_pool_return_page() to page_pool_return_netmem() Byungchul Park
2025-05-09 11:51 ` [RFC 07/19] page_pool: use netmem alloc/put API in page_pool_return_netmem() Byungchul Park
2025-05-09 11:51 ` [RFC 08/19] page_pool: rename __page_pool_release_page_dma() to __page_pool_release_netmem_dma() Byungchul Park
2025-05-09 11:51 ` [RFC 09/19] page_pool: rename __page_pool_put_page() to __page_pool_put_netmem() Byungchul Park
2025-05-09 11:51 ` [RFC 10/19] page_pool: rename __page_pool_alloc_pages_slow() to __page_pool_alloc_netmems_slow() Byungchul Park
2025-05-09 11:51 ` [RFC 11/19] mlx4: use netmem descriptor and API for page pool Byungchul Park
2025-05-09 11:51 ` [RFC 12/19] netmem: introduce page_pool_recycle_direct_netmem() Byungchul Park
2025-05-09 11:51 ` [RFC 13/19] page_pool: expand scope of is_pp_{netmem,page}() to global Byungchul Park
2025-05-12 12:46 ` Toke Høiland-Jørgensen
2025-05-12 12:55 ` Byungchul Park
2025-05-14 3:00 ` Byungchul Park
2025-05-14 11:17 ` Toke Høiland-Jørgensen
2025-05-09 11:51 ` [RFC 14/19] mm: page_alloc: do not directly access page->pp_magic but use is_pp_page() Byungchul Park
2025-05-09 11:51 ` [RFC 15/19] mlx5: use netmem descriptor and API for page pool Byungchul Park
2025-05-09 11:51 ` [RFC 16/19] netmem: use _Generic to cover const casting for page_to_netmem() Byungchul Park
2025-05-09 11:51 ` [RFC 17/19] netmem: remove __netmem_get_pp() Byungchul Park
2025-05-09 13:47 ` Mina Almasry
2025-05-09 11:51 ` [RFC 18/19] page_pool: make page_pool_get_dma_addr() just wrap page_pool_get_dma_addr_netmem() Byungchul Park
2025-05-09 13:49 ` Mina Almasry
2025-05-10 7:28 ` Ilias Apalodimas
2025-05-09 11:51 ` [RFC 19/19] mm, netmem: remove the page pool members in struct page Byungchul Park
2025-05-09 17:32 ` Mina Almasry
2025-05-09 18:11 ` Matthew Wilcox
2025-05-09 19:04 ` Mina Almasry
2025-05-09 19:48 ` Matthew Wilcox
2025-05-12 19:10 ` Mina Almasry
2025-05-09 18:02 ` Matthew Wilcox
2025-05-12 12:51 ` Byungchul Park
2025-05-12 14:42 ` Matthew Wilcox
2025-05-13 1:42 ` Byungchul Park
2025-05-13 3:19 ` Matthew Wilcox
2025-05-13 10:24 ` Byungchul Park
2025-05-10 7:26 ` Ilias Apalodimas
2025-05-12 12:58 ` Byungchul Park
2025-05-09 14:09 ` [RFC 00/19] Split netmem from " Mina Almasry
2025-05-12 12:36 ` Byungchul Park
2025-05-12 12:59 ` Pavel Begunkov
This is a public inbox; see mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for NNTP newsgroup(s).