* [RFC net-next 00/13] net: page_pool: add netlink-based introspection
@ 2023-08-16 23:42 Jakub Kicinski
2023-08-16 23:42 ` [RFC net-next 01/13] net: page_pool: split the page_pool_params into fast and slow Jakub Kicinski
` (13 more replies)
0 siblings, 14 replies; 29+ messages in thread
From: Jakub Kicinski @ 2023-08-16 23:42 UTC (permalink / raw)
To: netdev
Cc: hawk, ilias.apalodimas, aleksander.lobakin, linyunsheng,
almasrymina, Jakub Kicinski
As the page pool functionality grows we will be increasingly
constrained by the lack of any uAPI. This series is a first
step towards such a uAPI; it creates a way for users to query
information about page pools and associated statistics.
I'm purposefully exposing only the information which I'd
find useful when monitoring production workloads.
For the SET part (which, to be clear, isn't addressed by this
series at all) I think we'll need to turn to something more
along the lines of "netdev-level policy". Instead of configuring
page pools, which are somewhat ephemeral and hard to change
at runtime, we should set the "expected page pool parameters"
at the netdev level, and have the driver consult those when
instantiating pools. My ramblings from yesterday about Queue API
may make this more or less clear...
https://lore.kernel.org/all/20230815171638.4c057dcd@kernel.org/
Jakub Kicinski (13):
net: page_pool: split the page_pool_params into fast and slow
net: page_pool: avoid touching slow on the fastpath
net: page_pool: factor out uninit
net: page_pool: id the page pools
net: page_pool: record pools per netdev
net: page_pool: stash the NAPI ID for easier access
eth: link netdev to pp
net: page_pool: add nlspec for basic access to page pools
net: page_pool: implement GET in the netlink API
net: page_pool: add netlink notifications for state changes
net: page_pool: report when page pool was destroyed
net: page_pool: expose page pool stats via netlink
tools: netdev: regen after page pool changes
Documentation/netlink/specs/netdev.yaml | 158 +++++++
Documentation/networking/page_pool.rst | 5 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 1 +
.../net/ethernet/mellanox/mlx5/core/en_main.c | 1 +
drivers/net/ethernet/microsoft/mana/mana_en.c | 1 +
include/linux/list.h | 20 +
include/linux/netdevice.h | 4 +
include/net/page_pool/helpers.h | 8 +-
include/net/page_pool/types.h | 43 +-
include/uapi/linux/netdev.h | 37 ++
net/core/Makefile | 2 +-
net/core/netdev-genl-gen.c | 41 ++
net/core/netdev-genl-gen.h | 11 +
net/core/page_pool.c | 56 ++-
net/core/page_pool_priv.h | 12 +
net/core/page_pool_user.c | 409 +++++++++++++++++
tools/include/uapi/linux/netdev.h | 37 ++
tools/net/ynl/generated/netdev-user.c | 415 ++++++++++++++++++
tools/net/ynl/generated/netdev-user.h | 169 +++++++
19 files changed, 1391 insertions(+), 39 deletions(-)
create mode 100644 net/core/page_pool_priv.h
create mode 100644 net/core/page_pool_user.c
--
2.41.0
* [RFC net-next 01/13] net: page_pool: split the page_pool_params into fast and slow
2023-08-16 23:42 [RFC net-next 00/13] net: page_pool: add netlink-based introspection Jakub Kicinski
@ 2023-08-16 23:42 ` Jakub Kicinski
2023-08-17 9:31 ` Jesper Dangaard Brouer
` (2 more replies)
2023-08-16 23:42 ` [RFC net-next 02/13] net: page_pool: avoid touching slow on the fastpath Jakub Kicinski
` (12 subsequent siblings)
13 siblings, 3 replies; 29+ messages in thread
From: Jakub Kicinski @ 2023-08-16 23:42 UTC (permalink / raw)
To: netdev
Cc: hawk, ilias.apalodimas, aleksander.lobakin, linyunsheng,
almasrymina, Jakub Kicinski
struct page_pool is rather performance-critical, and we use
16B of the first cache line to store 2 pointers used only
by test code. Future patches will add more informational
(non-fast path) attributes.
It's convenient for users of the API not to have to worry
about which fields are fast path and which are slow path. Use struct
groups to split the params into the two categories internally.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
include/net/page_pool/types.h | 31 +++++++++++++++++++------------
net/core/page_pool.c | 7 ++++---
2 files changed, 23 insertions(+), 15 deletions(-)
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 887e7946a597..1c16b95de62f 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -56,18 +56,22 @@ struct pp_alloc_cache {
* @offset: DMA sync address offset for PP_FLAG_DMA_SYNC_DEV
*/
struct page_pool_params {
- unsigned int flags;
- unsigned int order;
- unsigned int pool_size;
- int nid;
- struct device *dev;
- struct napi_struct *napi;
- enum dma_data_direction dma_dir;
- unsigned int max_len;
- unsigned int offset;
+ struct_group_tagged(page_pool_params_fast, fast,
+ unsigned int flags;
+ unsigned int order;
+ unsigned int pool_size;
+ int nid;
+ struct device *dev;
+ struct napi_struct *napi;
+ enum dma_data_direction dma_dir;
+ unsigned int max_len;
+ unsigned int offset;
+ );
+ struct_group_tagged(page_pool_params_slow, slow,
/* private: used by test code only */
- void (*init_callback)(struct page *page, void *arg);
- void *init_arg;
+ void (*init_callback)(struct page *page, void *arg);
+ void *init_arg;
+ );
};
#ifdef CONFIG_PAGE_POOL_STATS
@@ -121,7 +125,7 @@ struct page_pool_stats {
#endif
struct page_pool {
- struct page_pool_params p;
+ struct page_pool_params_fast p;
long frag_users;
struct page *frag_page;
@@ -180,6 +184,9 @@ struct page_pool {
refcount_t user_cnt;
u64 destroy_cnt;
+
+ /* Slow/Control-path information follows */
+ struct page_pool_params_slow slow;
};
struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp);
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 77cb75e63aca..ffe7782d7fc0 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -173,7 +173,8 @@ static int page_pool_init(struct page_pool *pool,
{
unsigned int ring_qsize = 1024; /* Default */
- memcpy(&pool->p, params, sizeof(pool->p));
+ memcpy(&pool->p, &params->fast, sizeof(pool->p));
+ memcpy(&pool->slow, &params->slow, sizeof(pool->slow));
/* Validate only known flags were used */
if (pool->p.flags & ~(PP_FLAG_ALL))
@@ -372,8 +373,8 @@ static void page_pool_set_pp_info(struct page_pool *pool,
{
page->pp = pool;
page->pp_magic |= PP_SIGNATURE;
- if (pool->p.init_callback)
- pool->p.init_callback(page, pool->p.init_arg);
+ if (pool->slow.init_callback)
+ pool->slow.init_callback(page, pool->slow.init_arg);
}
static void page_pool_clear_pp_info(struct page *page)
--
2.41.0
* [RFC net-next 02/13] net: page_pool: avoid touching slow on the fastpath
2023-08-16 23:42 [RFC net-next 00/13] net: page_pool: add netlink-based introspection Jakub Kicinski
2023-08-16 23:42 ` [RFC net-next 01/13] net: page_pool: split the page_pool_params into fast and slow Jakub Kicinski
@ 2023-08-16 23:42 ` Jakub Kicinski
2023-08-17 9:32 ` Jesper Dangaard Brouer
` (2 more replies)
2023-08-16 23:42 ` [RFC net-next 03/13] net: page_pool: factor out uninit Jakub Kicinski
` (11 subsequent siblings)
13 siblings, 3 replies; 29+ messages in thread
From: Jakub Kicinski @ 2023-08-16 23:42 UTC (permalink / raw)
To: netdev
Cc: hawk, ilias.apalodimas, aleksander.lobakin, linyunsheng,
almasrymina, Jakub Kicinski
To fully benefit from the previous commit, add one byte of state
in the first cache line, recording whether we need to look at
the slow part.
The packing isn't all that impressive right now; we create
a 7B hole. I'm expecting Olek's rework will reshuffle this
anyway.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
include/net/page_pool/types.h | 2 ++
net/core/page_pool.c | 4 +++-
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 1c16b95de62f..1ac7ce25fbd4 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -127,6 +127,8 @@ struct page_pool_stats {
struct page_pool {
struct page_pool_params_fast p;
+ bool has_init_callback;
+
long frag_users;
struct page *frag_page;
unsigned int frag_offset;
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index ffe7782d7fc0..2c14445a353a 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -216,6 +216,8 @@ static int page_pool_init(struct page_pool *pool,
pool->p.flags & PP_FLAG_PAGE_FRAG)
return -EINVAL;
+ pool->has_init_callback = !!pool->slow.init_callback;
+
#ifdef CONFIG_PAGE_POOL_STATS
pool->recycle_stats = alloc_percpu(struct page_pool_recycle_stats);
if (!pool->recycle_stats)
@@ -373,7 +375,7 @@ static void page_pool_set_pp_info(struct page_pool *pool,
{
page->pp = pool;
page->pp_magic |= PP_SIGNATURE;
- if (pool->slow.init_callback)
+ if (pool->has_init_callback)
pool->slow.init_callback(page, pool->slow.init_arg);
}
--
2.41.0
* [RFC net-next 03/13] net: page_pool: factor out uninit
2023-08-16 23:42 [RFC net-next 00/13] net: page_pool: add netlink-based introspection Jakub Kicinski
2023-08-16 23:42 ` [RFC net-next 01/13] net: page_pool: split the page_pool_params into fast and slow Jakub Kicinski
2023-08-16 23:42 ` [RFC net-next 02/13] net: page_pool: avoid touching slow on the fastpath Jakub Kicinski
@ 2023-08-16 23:42 ` Jakub Kicinski
2023-08-17 7:40 ` Ilias Apalodimas
2023-08-16 23:42 ` [RFC net-next 04/13] net: page_pool: id the page pools Jakub Kicinski
` (10 subsequent siblings)
13 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2023-08-16 23:42 UTC (permalink / raw)
To: netdev
Cc: hawk, ilias.apalodimas, aleksander.lobakin, linyunsheng,
almasrymina, Jakub Kicinski
We'll soon need a fuller unwind path in page_pool_create(),
so create the inverse of page_pool_init().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
net/core/page_pool.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 2c14445a353a..8e71e116224d 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -238,6 +238,18 @@ static int page_pool_init(struct page_pool *pool,
return 0;
}
+static void page_pool_uninit(struct page_pool *pool)
+{
+ ptr_ring_cleanup(&pool->ring, NULL);
+
+ if (pool->p.flags & PP_FLAG_DMA_MAP)
+ put_device(pool->p.dev);
+
+#ifdef CONFIG_PAGE_POOL_STATS
+ free_percpu(pool->recycle_stats);
+#endif
+}
+
/**
* page_pool_create() - create a page pool.
* @params: parameters, see struct page_pool_params
@@ -806,14 +818,7 @@ static void page_pool_free(struct page_pool *pool)
if (pool->disconnect)
pool->disconnect(pool);
- ptr_ring_cleanup(&pool->ring, NULL);
-
- if (pool->p.flags & PP_FLAG_DMA_MAP)
- put_device(pool->p.dev);
-
-#ifdef CONFIG_PAGE_POOL_STATS
- free_percpu(pool->recycle_stats);
-#endif
+ page_pool_uninit(pool);
kfree(pool);
}
--
2.41.0
* [RFC net-next 04/13] net: page_pool: id the page pools
2023-08-16 23:42 [RFC net-next 00/13] net: page_pool: add netlink-based introspection Jakub Kicinski
` (2 preceding siblings ...)
2023-08-16 23:42 ` [RFC net-next 03/13] net: page_pool: factor out uninit Jakub Kicinski
@ 2023-08-16 23:42 ` Jakub Kicinski
2023-08-17 21:56 ` Mina Almasry
2023-08-16 23:42 ` [RFC net-next 05/13] net: page_pool: record pools per netdev Jakub Kicinski
` (9 subsequent siblings)
13 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2023-08-16 23:42 UTC (permalink / raw)
To: netdev
Cc: hawk, ilias.apalodimas, aleksander.lobakin, linyunsheng,
almasrymina, Jakub Kicinski
To give ourselves the flexibility of creating netlink commands
and the ability to refer to page pool instances in uAPIs,
create IDs for page pools.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
include/net/page_pool/types.h | 4 ++++
net/core/Makefile | 2 +-
net/core/page_pool.c | 21 +++++++++++++++-----
net/core/page_pool_priv.h | 9 +++++++++
net/core/page_pool_user.c | 36 +++++++++++++++++++++++++++++++++++
5 files changed, 66 insertions(+), 6 deletions(-)
create mode 100644 net/core/page_pool_priv.h
create mode 100644 net/core/page_pool_user.c
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 1ac7ce25fbd4..9fadf15dadfa 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -189,6 +189,10 @@ struct page_pool {
/* Slow/Control-path information follows */
struct page_pool_params_slow slow;
+ /* User-facing fields, protected by page_pools_lock */
+ struct {
+ u32 id;
+ } user;
};
struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp);
diff --git a/net/core/Makefile b/net/core/Makefile
index 731db2eaa610..4ae3d83f67d5 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -18,7 +18,7 @@ obj-y += dev.o dev_addr_lists.o dst.o netevent.o \
obj-$(CONFIG_NETDEV_ADDR_LIST_TEST) += dev_addr_lists_test.o
obj-y += net-sysfs.o
-obj-$(CONFIG_PAGE_POOL) += page_pool.o
+obj-$(CONFIG_PAGE_POOL) += page_pool.o page_pool_user.o
obj-$(CONFIG_PROC_FS) += net-procfs.o
obj-$(CONFIG_NET_PKTGEN) += pktgen.o
obj-$(CONFIG_NETPOLL) += netpoll.o
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 8e71e116224d..de199c356043 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -23,6 +23,8 @@
#include <trace/events/page_pool.h>
+#include "page_pool_priv.h"
+
#define DEFER_TIME (msecs_to_jiffies(1000))
#define DEFER_WARN_INTERVAL (60 * HZ)
@@ -264,13 +266,21 @@ struct page_pool *page_pool_create(const struct page_pool_params *params)
return ERR_PTR(-ENOMEM);
err = page_pool_init(pool, params);
- if (err < 0) {
- pr_warn("%s() gave up with errno %d\n", __func__, err);
- kfree(pool);
- return ERR_PTR(err);
- }
+ if (err < 0)
+ goto err_free;
+
+ err = page_pool_list(pool);
+ if (err)
+ goto err_uninit;
return pool;
+
+err_uninit:
+ page_pool_uninit(pool);
+err_free:
+ pr_warn("%s() gave up with errno %d\n", __func__, err);
+ kfree(pool);
+ return ERR_PTR(err);
}
EXPORT_SYMBOL(page_pool_create);
@@ -818,6 +828,7 @@ static void page_pool_free(struct page_pool *pool)
if (pool->disconnect)
pool->disconnect(pool);
+ page_pool_unlist(pool);
page_pool_uninit(pool);
kfree(pool);
}
diff --git a/net/core/page_pool_priv.h b/net/core/page_pool_priv.h
new file mode 100644
index 000000000000..6c4e4aeed02a
--- /dev/null
+++ b/net/core/page_pool_priv.h
@@ -0,0 +1,9 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#ifndef __PAGE_POOL_PRIV_H
+#define __PAGE_POOL_PRIV_H
+
+int page_pool_list(struct page_pool *pool);
+void page_pool_unlist(struct page_pool *pool);
+
+#endif
diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
new file mode 100644
index 000000000000..af4ac38a2de1
--- /dev/null
+++ b/net/core/page_pool_user.c
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <linux/mutex.h>
+#include <linux/xarray.h>
+#include <net/page_pool/types.h>
+
+#include "page_pool_priv.h"
+
+static DEFINE_XARRAY_FLAGS(page_pools, XA_FLAGS_ALLOC1);
+static DEFINE_MUTEX(page_pools_lock);
+
+int page_pool_list(struct page_pool *pool)
+{
+ static u32 id_alloc_next;
+ int err;
+
+ mutex_lock(&page_pools_lock);
+ err = xa_alloc_cyclic(&page_pools, &pool->user.id, pool, xa_limit_32b,
+ &id_alloc_next, GFP_KERNEL);
+ if (err < 0)
+ goto err_unlock;
+
+ mutex_unlock(&page_pools_lock);
+ return 0;
+
+err_unlock:
+ mutex_unlock(&page_pools_lock);
+ return err;
+}
+
+void page_pool_unlist(struct page_pool *pool)
+{
+ mutex_lock(&page_pools_lock);
+ xa_erase(&page_pools, pool->user.id);
+ mutex_unlock(&page_pools_lock);
+}
--
2.41.0
* [RFC net-next 05/13] net: page_pool: record pools per netdev
2023-08-16 23:42 [RFC net-next 00/13] net: page_pool: add netlink-based introspection Jakub Kicinski
` (3 preceding siblings ...)
2023-08-16 23:42 ` [RFC net-next 04/13] net: page_pool: id the page pools Jakub Kicinski
@ 2023-08-16 23:42 ` Jakub Kicinski
2023-08-17 7:26 ` Simon Horman
2023-08-16 23:42 ` [RFC net-next 06/13] net: page_pool: stash the NAPI ID for easier access Jakub Kicinski
` (8 subsequent siblings)
13 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2023-08-16 23:42 UTC (permalink / raw)
To: netdev
Cc: hawk, ilias.apalodimas, aleksander.lobakin, linyunsheng,
almasrymina, Jakub Kicinski
Link the page pools with netdevs. This needs to be netns-compatible,
so we have two options. Either we record the pools per netns and
have to worry about moving them as the netdev gets moved.
Or we record them directly on the netdev, so they move with the netdev
without any extra work.
Implement the latter option. Since pools may outlast their netdev,
we need a place to store orphans. In time-honored tradition, use
loopback for this purpose.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
include/linux/list.h | 20 ++++++++
include/linux/netdevice.h | 4 ++
include/net/page_pool/types.h | 4 ++
net/core/page_pool_user.c | 86 +++++++++++++++++++++++++++++++++++
4 files changed, 114 insertions(+)
diff --git a/include/linux/list.h b/include/linux/list.h
index f10344dbad4d..a65e3017a39b 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -1030,6 +1030,26 @@ static inline void hlist_move_list(struct hlist_head *old,
old->first = NULL;
}
+/**
+ * hlist_splice_init() - move all entries from one list to another
+ * @from: hlist_head from which entries will be moved
+ * @last: last entry on the @from list
+ * @to: hlist_head to which entries will be moved
+ *
+ * @to can be empty, @from must contain at least @last.
+ */
+static inline void hlist_splice_init(struct hlist_head *from,
+ struct hlist_node *last,
+ struct hlist_head *to)
+{
+ if (to->first)
+ to->first->pprev = &last->next;
+ last->next = to->first;
+ to->first = from->first;
+ from->first->pprev = &to->first;
+ from->first = NULL;
+}
+
#define hlist_entry(ptr, type, member) container_of(ptr,type,member)
#define hlist_for_each(pos, head) \
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0896aaa91dd7..7576cd43b49e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2405,6 +2405,10 @@ struct net_device {
struct rtnl_hw_stats64 *offload_xstats_l3;
struct devlink_port *devlink_port;
+#if IS_ENABLED(CONFIG_PAGE_POOL)
+ /** @page_pools: page pools created for this netdevice */
+ struct hlist_head page_pools;
+#endif
};
#define to_net_dev(d) container_of(d, struct net_device, dev)
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 9fadf15dadfa..b9db612708e4 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -5,6 +5,7 @@
#include <linux/dma-direction.h>
#include <linux/ptr_ring.h>
+#include <linux/types.h>
#define PP_FLAG_DMA_MAP BIT(0) /* Should page_pool do the DMA
* map/unmap
@@ -50,6 +51,7 @@ struct pp_alloc_cache {
* @pool_size: size of the ptr_ring
* @nid: NUMA node id to allocate from pages from
* @dev: device, for DMA pre-mapping purposes
+ * @netdev: netdev this pool will serve (leave as NULL if none or multiple)
* @napi: NAPI which is the sole consumer of pages, otherwise NULL
* @dma_dir: DMA mapping direction
* @max_len: max DMA sync memory size for PP_FLAG_DMA_SYNC_DEV
@@ -68,6 +70,7 @@ struct page_pool_params {
unsigned int offset;
);
struct_group_tagged(page_pool_params_slow, slow,
+ struct net_device *netdev;
/* private: used by test code only */
void (*init_callback)(struct page *page, void *arg);
void *init_arg;
@@ -191,6 +194,7 @@ struct page_pool {
struct page_pool_params_slow slow;
/* User-facing fields, protected by page_pools_lock */
struct {
+ struct hlist_node list;
u32 id;
} user;
};
diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index af4ac38a2de1..25977ce18e2b 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -1,14 +1,31 @@
/* SPDX-License-Identifier: GPL-2.0 */
#include <linux/mutex.h>
+#include <linux/netdevice.h>
#include <linux/xarray.h>
+#include <net/net_debug.h>
#include <net/page_pool/types.h>
#include "page_pool_priv.h"
static DEFINE_XARRAY_FLAGS(page_pools, XA_FLAGS_ALLOC1);
+/* Protects: page_pools, netdevice->page_pools, pool->slow.netdev, pool->user.
+ * Ordering: inside rtnl_lock
+ */
static DEFINE_MUTEX(page_pools_lock);
+/* Page pools are only reachable from user space (via netlink) if they are
+ * linked to a netdev at creation time. Following page pool "visibility"
+ * states are possible:
+ * - normal
+ * - user.list: linked to real netdev, netdev: real netdev
+ * - orphaned - real netdev has disappeared
+ * - user.list: linked to lo, netdev: lo
+ * - invisible - either (a) created without netdev linking, (b) unlisted due
+ * to error, or (c) the entire namespace which owned this pool disappeared
+ * - user.list: unhashed, netdev: unknown
+ */
+
int page_pool_list(struct page_pool *pool)
{
static u32 id_alloc_next;
@@ -20,6 +37,10 @@ int page_pool_list(struct page_pool *pool)
if (err < 0)
goto err_unlock;
+ if (pool->slow.netdev)
+ hlist_add_head(&pool->user.list,
+ &pool->slow.netdev->page_pools);
+
mutex_unlock(&page_pools_lock);
return 0;
@@ -32,5 +53,70 @@ void page_pool_unlist(struct page_pool *pool)
{
mutex_lock(&page_pools_lock);
xa_erase(&page_pools, pool->user.id);
+ hlist_del(&pool->user.list);
mutex_unlock(&page_pools_lock);
}
+
+static void page_pool_unreg_netdev_wipe(struct net_device *netdev)
+{
+ struct hlist_node *c, *n;
+
+ mutex_lock(&page_pools_lock);
+ hlist_for_each_safe(c, n, &netdev->page_pools)
+ hlist_del_init(c);
+ mutex_unlock(&page_pools_lock);
+}
+
+static void page_pool_unreg_netdev(struct net_device *netdev)
+{
+ struct page_pool *pool, *last;
+ struct net_device *lo;
+
+ lo = __dev_get_by_index(dev_net(netdev), 1);
+ if (!lo) {
+ netdev_err_once(netdev,
+ "can't get lo to store orphan page pools\n");
+ page_pool_unreg_netdev_wipe(netdev);
+ return;
+ }
+
+ mutex_lock(&page_pools_lock);
+ hlist_for_each_entry(pool, &netdev->page_pools, user.list) {
+ pool->slow.netdev = lo;
+ last = pool;
+ }
+
+ hlist_splice_init(&netdev->page_pools, &last->user.list,
+ &lo->page_pools);
+ mutex_unlock(&page_pools_lock);
+}
+
+static int
+page_pool_netdevice_event(struct notifier_block *nb,
+ unsigned long event, void *ptr)
+{
+ struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
+
+ if (event != NETDEV_UNREGISTER)
+ return NOTIFY_DONE;
+
+ if (hlist_empty(&netdev->page_pools))
+ return NOTIFY_OK;
+
+ if (netdev->ifindex != 1)
+ page_pool_unreg_netdev(netdev);
+ else
+ page_pool_unreg_netdev_wipe(netdev);
+ return NOTIFY_OK;
+}
+
+static struct notifier_block page_pool_netdevice_nb = {
+ .notifier_call = page_pool_netdevice_event,
+};
+
+static int __init page_pool_user_init(void)
+{
+ return register_netdevice_notifier(&page_pool_netdevice_nb);
+}
+
+subsys_initcall(page_pool_user_init);
--
2.41.0
* [RFC net-next 06/13] net: page_pool: stash the NAPI ID for easier access
2023-08-16 23:42 [RFC net-next 00/13] net: page_pool: add netlink-based introspection Jakub Kicinski
` (4 preceding siblings ...)
2023-08-16 23:42 ` [RFC net-next 05/13] net: page_pool: record pools per netdev Jakub Kicinski
@ 2023-08-16 23:42 ` Jakub Kicinski
2023-08-16 23:42 ` [RFC net-next 07/13] eth: link netdev to pp Jakub Kicinski
` (7 subsequent siblings)
13 siblings, 0 replies; 29+ messages in thread
From: Jakub Kicinski @ 2023-08-16 23:42 UTC (permalink / raw)
To: netdev
Cc: hawk, ilias.apalodimas, aleksander.lobakin, linyunsheng,
almasrymina, Jakub Kicinski
To avoid any race conditions on accessing the NAPI instance,
and having to think about the lifetime of NAPI objects
in netlink GET handlers, stash the napi_id to which the page pool
was linked at creation time.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
include/net/page_pool/types.h | 1 +
net/core/page_pool_user.c | 4 +++-
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index b9db612708e4..3017557e0c59 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -195,6 +195,7 @@ struct page_pool {
/* User-facing fields, protected by page_pools_lock */
struct {
struct hlist_node list;
+ u32 napi_id;
u32 id;
} user;
};
diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index 25977ce18e2b..6632fc19a86f 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -37,9 +37,11 @@ int page_pool_list(struct page_pool *pool)
if (err < 0)
goto err_unlock;
- if (pool->slow.netdev)
+ if (pool->slow.netdev) {
hlist_add_head(&pool->user.list,
&pool->slow.netdev->page_pools);
+ pool->user.napi_id = pool->p.napi ? pool->p.napi->napi_id : 0;
+ }
mutex_unlock(&page_pools_lock);
return 0;
--
2.41.0
* [RFC net-next 07/13] eth: link netdev to pp
2023-08-16 23:42 [RFC net-next 00/13] net: page_pool: add netlink-based introspection Jakub Kicinski
` (5 preceding siblings ...)
2023-08-16 23:42 ` [RFC net-next 06/13] net: page_pool: stash the NAPI ID for easier access Jakub Kicinski
@ 2023-08-16 23:42 ` Jakub Kicinski
2023-08-16 23:42 ` [RFC net-next 08/13] net: page_pool: add nlspec for basic access to page pools Jakub Kicinski
` (6 subsequent siblings)
13 siblings, 0 replies; 29+ messages in thread
From: Jakub Kicinski @ 2023-08-16 23:42 UTC (permalink / raw)
To: netdev
Cc: hawk, ilias.apalodimas, aleksander.lobakin, linyunsheng,
almasrymina, Jakub Kicinski
Link page pool instances to the netdev for the drivers which
already link to NAPI. Unless the driver is doing something
very weird, per-NAPI should imply per-netdev.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 1 +
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 1 +
drivers/net/ethernet/microsoft/mana/mana_en.c | 1 +
3 files changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 7be917a8da48..06776196c05e 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -3247,6 +3247,7 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
pp.pool_size = bp->rx_ring_size;
pp.nid = dev_to_node(&bp->pdev->dev);
pp.napi = &rxr->bnapi->napi;
+ pp.netdev = bp->dev;
pp.dev = &bp->pdev->dev;
pp.dma_dir = DMA_BIDIRECTIONAL;
if (PAGE_SIZE > BNXT_RX_PAGE_SIZE)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index bc9d5a5bea01..f9ca600f47e2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -839,6 +839,7 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params,
pp_params.nid = node;
pp_params.dev = rq->pdev;
pp_params.napi = rq->cq.napi;
+ pp_params.netdev = rq->netdev;
pp_params.dma_dir = rq->buff.map_dir;
pp_params.max_len = PAGE_SIZE;
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 4a16ebff3d1d..25286490e237 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -2056,6 +2056,7 @@ static int mana_create_page_pool(struct mana_rxq *rxq, struct gdma_context *gc)
pprm.pool_size = RX_BUFFERS_PER_QUEUE;
pprm.nid = gc->numa_node;
pprm.napi = &rxq->rx_cq.napi;
+ pprm.netdev = rxq->ndev;
rxq->page_pool = page_pool_create(&pprm);
--
2.41.0
* [RFC net-next 08/13] net: page_pool: add nlspec for basic access to page pools
2023-08-16 23:42 [RFC net-next 00/13] net: page_pool: add netlink-based introspection Jakub Kicinski
` (6 preceding siblings ...)
2023-08-16 23:42 ` [RFC net-next 07/13] eth: link netdev to pp Jakub Kicinski
@ 2023-08-16 23:42 ` Jakub Kicinski
2023-08-16 23:42 ` [RFC net-next 09/13] net: page_pool: implement GET in the netlink API Jakub Kicinski
` (5 subsequent siblings)
13 siblings, 0 replies; 29+ messages in thread
From: Jakub Kicinski @ 2023-08-16 23:42 UTC (permalink / raw)
To: netdev
Cc: hawk, ilias.apalodimas, aleksander.lobakin, linyunsheng,
almasrymina, Jakub Kicinski
Add a Netlink spec in YAML for getting very basic information
about page pools.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
Documentation/netlink/specs/netdev.yaml | 41 +++++++++++++++++++++++++
1 file changed, 41 insertions(+)
diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index 1c7284fd535b..ded77e61df7b 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -68,6 +68,30 @@ name: netdev
type: u32
checks:
min: 1
+ -
+ name: page-pool
+ attributes:
+ -
+ name: id
+ doc: Unique ID of a Page Pool instance.
+ type: u32
+ checks:
+ min: 1
+ -
+ name: pad
+ type: pad
+ -
+ name: ifindex
+ doc: |
+ ifindex of the netdev to which the pool belongs.
+ May be reported as 0 if the page pool was allocated for a netdev
+ which got destroyed already (page pools may outlast their netdevs
+ because they wait for all memory to be returned).
+ type: u32
+ -
+ name: napi-id
+ doc: Id of NAPI using this Page Pool instance.
+ type: u32
operations:
list:
@@ -101,6 +125,23 @@ name: netdev
doc: Notification about device configuration being changed.
notify: dev-get
mcgrp: mgmt
+ -
+ name: page-pool-get
+ doc: |
+ Get / dump information about Page Pools.
+ (Only Page Pools associated with a net_device can be listed.)
+ attribute-set: page-pool
+ do:
+ request:
+ attributes:
+ - id
+ reply: &pp-reply
+ attributes:
+ - id
+ - ifindex
+ - napi-id
+ dump:
+ reply: *pp-reply
mcast-groups:
list:
--
2.41.0
* [RFC net-next 09/13] net: page_pool: implement GET in the netlink API
2023-08-16 23:42 [RFC net-next 00/13] net: page_pool: add netlink-based introspection Jakub Kicinski
` (7 preceding siblings ...)
2023-08-16 23:42 ` [RFC net-next 08/13] net: page_pool: add nlspec for basic access to page pools Jakub Kicinski
@ 2023-08-16 23:42 ` Jakub Kicinski
2023-08-16 23:42 ` [RFC net-next 10/13] net: page_pool: add netlink notifications for state changes Jakub Kicinski
` (4 subsequent siblings)
13 siblings, 0 replies; 29+ messages in thread
From: Jakub Kicinski @ 2023-08-16 23:42 UTC (permalink / raw)
To: netdev
Cc: hawk, ilias.apalodimas, aleksander.lobakin, linyunsheng,
almasrymina, Jakub Kicinski
Expose the very basic page pool information via netlink.
Example using ynl-py for a system with 10 queues:
$ ./cli.py --no-schema --spec netlink/specs/netdev.yaml \
--dump page-pool-get
[{'id': 19, 'ifindex': 2, 'napi-id': 147},
{'id': 18, 'ifindex': 2, 'napi-id': 146},
{'id': 17, 'ifindex': 2, 'napi-id': 145},
{'id': 16, 'ifindex': 2, 'napi-id': 144},
{'id': 15, 'ifindex': 2, 'napi-id': 143},
{'id': 14, 'ifindex': 2, 'napi-id': 142},
{'id': 13, 'ifindex': 2, 'napi-id': 141},
{'id': 12, 'ifindex': 2, 'napi-id': 140},
{'id': 11, 'ifindex': 2, 'napi-id': 139},
{'id': 10, 'ifindex': 2, 'napi-id': 138}]
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
include/uapi/linux/netdev.h | 11 ++++
net/core/netdev-genl-gen.c | 17 +++++
net/core/netdev-genl-gen.h | 3 +
net/core/page_pool_user.c | 127 ++++++++++++++++++++++++++++++++++++
4 files changed, 158 insertions(+)
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index c1634b95c223..98999e0650a3 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -48,11 +48,22 @@ enum {
NETDEV_A_DEV_MAX = (__NETDEV_A_DEV_MAX - 1)
};
+enum {
+ NETDEV_A_PAGE_POOL_ID = 1,
+ NETDEV_A_PAGE_POOL_PAD,
+ NETDEV_A_PAGE_POOL_IFINDEX,
+ NETDEV_A_PAGE_POOL_NAPI_ID,
+
+ __NETDEV_A_PAGE_POOL_MAX,
+ NETDEV_A_PAGE_POOL_MAX = (__NETDEV_A_PAGE_POOL_MAX - 1)
+};
+
enum {
NETDEV_CMD_DEV_GET = 1,
NETDEV_CMD_DEV_ADD_NTF,
NETDEV_CMD_DEV_DEL_NTF,
NETDEV_CMD_DEV_CHANGE_NTF,
+ NETDEV_CMD_PAGE_POOL_GET,
__NETDEV_CMD_MAX,
NETDEV_CMD_MAX = (__NETDEV_CMD_MAX - 1)
diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c
index ea9231378aa6..c74fe69dd241 100644
--- a/net/core/netdev-genl-gen.c
+++ b/net/core/netdev-genl-gen.c
@@ -15,6 +15,11 @@ static const struct nla_policy netdev_dev_get_nl_policy[NETDEV_A_DEV_IFINDEX + 1
[NETDEV_A_DEV_IFINDEX] = NLA_POLICY_MIN(NLA_U32, 1),
};
+/* NETDEV_CMD_PAGE_POOL_GET - do */
+static const struct nla_policy netdev_page_pool_get_nl_policy[NETDEV_A_PAGE_POOL_ID + 1] = {
+ [NETDEV_A_PAGE_POOL_ID] = NLA_POLICY_MIN(NLA_U32, 1),
+};
+
/* Ops table for netdev */
static const struct genl_split_ops netdev_nl_ops[] = {
{
@@ -29,6 +34,18 @@ static const struct genl_split_ops netdev_nl_ops[] = {
.dumpit = netdev_nl_dev_get_dumpit,
.flags = GENL_CMD_CAP_DUMP,
},
+ {
+ .cmd = NETDEV_CMD_PAGE_POOL_GET,
+ .doit = netdev_nl_page_pool_get_doit,
+ .policy = netdev_page_pool_get_nl_policy,
+ .maxattr = NETDEV_A_PAGE_POOL_ID,
+ .flags = GENL_CMD_CAP_DO,
+ },
+ {
+ .cmd = NETDEV_CMD_PAGE_POOL_GET,
+ .dumpit = netdev_nl_page_pool_get_dumpit,
+ .flags = GENL_CMD_CAP_DUMP,
+ },
};
static const struct genl_multicast_group netdev_nl_mcgrps[] = {
diff --git a/net/core/netdev-genl-gen.h b/net/core/netdev-genl-gen.h
index 7b370c073e7d..a011d12abff4 100644
--- a/net/core/netdev-genl-gen.h
+++ b/net/core/netdev-genl-gen.h
@@ -13,6 +13,9 @@
int netdev_nl_dev_get_doit(struct sk_buff *skb, struct genl_info *info);
int netdev_nl_dev_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb);
+int netdev_nl_page_pool_get_doit(struct sk_buff *skb, struct genl_info *info);
+int netdev_nl_page_pool_get_dumpit(struct sk_buff *skb,
+ struct netlink_callback *cb);
enum {
NETDEV_NLGRP_MGMT,
diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index 6632fc19a86f..3bc434f7825f 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -5,8 +5,10 @@
#include <linux/xarray.h>
#include <net/net_debug.h>
#include <net/page_pool/types.h>
+#include <net/sock.h>
#include "page_pool_priv.h"
+#include "netdev-genl-gen.h"
static DEFINE_XARRAY_FLAGS(page_pools, XA_FLAGS_ALLOC1);
/* Protects: page_pools, netdevice->page_pools, pool->slow.netdev, pool->user.
@@ -26,6 +28,131 @@ static DEFINE_MUTEX(page_pools_lock);
* - user.list: unhashed, netdev: unknown
*/
+typedef int (*pp_nl_fill_cb)(struct sk_buff *rsp, const struct page_pool *pool,
+ const struct genl_info *info);
+
+static int
+netdev_nl_page_pool_get_do(struct genl_info *info, u32 id, pp_nl_fill_cb fill)
+{
+ struct page_pool *pool;
+ struct sk_buff *rsp;
+ int err;
+
+ mutex_lock(&page_pools_lock);
+ pool = xa_load(&page_pools, id);
+ if (!pool || hlist_unhashed(&pool->user.list) ||
+ !net_eq(dev_net(pool->slow.netdev), genl_info_net(info))) {
+ err = -ENOENT;
+ goto err_unlock;
+ }
+
+ rsp = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
+ if (!rsp) {
+ err = -ENOMEM;
+ goto err_unlock;
+ }
+
+ err = fill(rsp, pool, info);
+ if (err)
+ goto err_free_msg;
+
+ mutex_unlock(&page_pools_lock);
+
+ return genlmsg_reply(rsp, info);
+
+err_free_msg:
+ nlmsg_free(rsp);
+err_unlock:
+ mutex_unlock(&page_pools_lock);
+ return err;
+}
+
+struct page_pool_dump_cb {
+ unsigned long ifindex;
+ u32 pp_id;
+};
+
+static int
+netdev_nl_page_pool_get_dump(struct sk_buff *skb, struct netlink_callback *cb,
+ pp_nl_fill_cb fill)
+{
+ struct page_pool_dump_cb *state = (void *)cb->ctx;
+ const struct genl_info *info = genl_info_dump(cb);
+ struct net *net = sock_net(skb->sk);
+ struct net_device *netdev;
+ struct page_pool *pool;
+ int err = 0;
+
+ rtnl_lock();
+ mutex_lock(&page_pools_lock);
+ for_each_netdev_dump(net, netdev, state->ifindex) {
+ hlist_for_each_entry(pool, &netdev->page_pools, user.list) {
+ if (state->pp_id && state->pp_id < pool->user.id)
+ continue;
+
+ state->pp_id = pool->user.id;
+ err = fill(skb, pool, info);
+ if (err)
+ break;
+ }
+
+ state->pp_id = 0;
+ }
+ mutex_unlock(&page_pools_lock);
+ rtnl_unlock();
+
+ if (skb->len && err == -EMSGSIZE)
+ return skb->len;
+ return err;
+}
+
+static int
+page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
+ const struct genl_info *info)
+{
+ void *hdr;
+
+ hdr = genlmsg_iput(rsp, info);
+ if (!hdr)
+ return -EMSGSIZE;
+
+ if (nla_put_u32(rsp, NETDEV_A_PAGE_POOL_ID, pool->user.id))
+ goto err_cancel;
+
+ if (pool->slow.netdev->ifindex != 1 &&
+ nla_put_u32(rsp, NETDEV_A_PAGE_POOL_IFINDEX,
+ pool->slow.netdev->ifindex))
+ goto err_cancel;
+ if (pool->user.napi_id &&
+ nla_put_u32(rsp, NETDEV_A_PAGE_POOL_NAPI_ID, pool->user.napi_id))
+ goto err_cancel;
+
+ genlmsg_end(rsp, hdr);
+
+ return 0;
+err_cancel:
+ genlmsg_cancel(rsp, hdr);
+ return -EMSGSIZE;
+}
+
+int netdev_nl_page_pool_get_doit(struct sk_buff *skb, struct genl_info *info)
+{
+ u32 id;
+
+ if (GENL_REQ_ATTR_CHECK(info, NETDEV_A_PAGE_POOL_ID))
+ return -EINVAL;
+
+ id = nla_get_u32(info->attrs[NETDEV_A_PAGE_POOL_ID]);
+
+ return netdev_nl_page_pool_get_do(info, id, page_pool_nl_fill);
+}
+
+int netdev_nl_page_pool_get_dumpit(struct sk_buff *skb,
+ struct netlink_callback *cb)
+{
+ return netdev_nl_page_pool_get_dump(skb, cb, page_pool_nl_fill);
+}
+
int page_pool_list(struct page_pool *pool)
{
static u32 id_alloc_next;
--
2.41.0
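The dump handler in this patch resumes across multiple netlink messages by stashing the current ifindex and pool id in `cb->ctx`. The following is a rough pure-Python model of that walk (names and shapes are illustrative, not a real API): pools hang off each netdev newest-first (`hlist_add_head()` order), pool ids start at 1 (`XA_FLAGS_ALLOC1`) so 0 means "no filter", and a pool whose fill hit `-EMSGSIZE` is re-emitted at the start of the next round.

```python
# Pure-Python model of the resumable dump walk in
# netdev_nl_page_pool_get_dump(). Illustrative only; the kernel keeps
# this state in cb->ctx.

EMSGSIZE = -90

def dump_round(netdevs, state, budget):
    """Emit up to `budget` (ifindex, pool-id) pairs, then signal -EMSGSIZE.

    netdevs: ifindex -> pool ids, newest first (hlist_add_head() order).
    state:   [ifindex, pp_id], persisted between rounds like cb->ctx;
             pool ids start at 1 (XA_FLAGS_ALLOC1), so 0 means "no filter".
    """
    out = []
    for ifindex in sorted(netdevs):
        if ifindex < state[0]:
            continue  # for_each_netdev_dump() resumes at the saved ifindex
        state[0] = ifindex
        for pp_id in netdevs[ifindex]:
            # Skip pools already emitted in an earlier round of this list.
            if state[1] and state[1] < pp_id:
                continue
            state[1] = pp_id  # recorded before filling, like the kernel
            if len(out) == budget:
                return out, EMSGSIZE  # message full; resume here next time
            out.append((ifindex, pp_id))
        state[1] = 0  # finished this netdev's list, drop the id filter
    return out, 0

state = [0, 0]
rounds = []
while True:
    out, err = dump_round({2: [19, 18, 17], 3: [12, 11]}, state, 2)
    rounds.append(out)
    if err == 0:
        break
# rounds == [[(2, 19), (2, 18)], [(2, 17), (3, 12)], [(3, 11)]]
```

Note the re-emission on resume: because the skip test is strict (`state->pp_id < pool->user.id`), the pool whose fill overflowed the previous message is tried again, so nothing is lost and nothing is duplicated.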
* [RFC net-next 10/13] net: page_pool: add netlink notifications for state changes
2023-08-16 23:42 [RFC net-next 00/13] net: page_pool: add netlink-based introspection Jakub Kicinski
2023-08-16 23:42 ` [RFC net-next 09/13] net: page_pool: implement GET in the netlink API Jakub Kicinski
@ 2023-08-16 23:42 ` Jakub Kicinski
2023-08-16 23:43 ` [RFC net-next 11/13] net: page_pool: report when page pool was destroyed Jakub Kicinski
From: Jakub Kicinski @ 2023-08-16 23:42 UTC (permalink / raw)
To: netdev
Cc: hawk, ilias.apalodimas, aleksander.lobakin, linyunsheng,
almasrymina, Jakub Kicinski
Generate netlink notifications about page pool state changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
Documentation/netlink/specs/netdev.yaml | 17 ++++++++++++
include/uapi/linux/netdev.h | 4 +++
net/core/netdev-genl-gen.c | 1 +
net/core/netdev-genl-gen.h | 1 +
net/core/page_pool_user.c | 36 +++++++++++++++++++++++++
5 files changed, 59 insertions(+)
diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index ded77e61df7b..47ae2a45daea 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -142,8 +142,25 @@ name: netdev
- napi-id
dump:
reply: *pp-reply
+ -
+ name: page-pool-add-ntf
+ doc: Notification about page pool appearing.
+ notify: page-pool-get
+ mcgrp: page-pool
+ -
+ name: page-pool-del-ntf
+ doc: Notification about page pool disappearing.
+ notify: page-pool-get
+ mcgrp: page-pool
+ -
+ name: page-pool-change-ntf
+ doc: Notification about page pool configuration being changed.
+ notify: page-pool-get
+ mcgrp: page-pool
mcast-groups:
list:
-
name: mgmt
+ -
+ name: page-pool
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index 98999e0650a3..3c9818c1962a 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -64,11 +64,15 @@ enum {
NETDEV_CMD_DEV_DEL_NTF,
NETDEV_CMD_DEV_CHANGE_NTF,
NETDEV_CMD_PAGE_POOL_GET,
+ NETDEV_CMD_PAGE_POOL_ADD_NTF,
+ NETDEV_CMD_PAGE_POOL_DEL_NTF,
+ NETDEV_CMD_PAGE_POOL_CHANGE_NTF,
__NETDEV_CMD_MAX,
NETDEV_CMD_MAX = (__NETDEV_CMD_MAX - 1)
};
#define NETDEV_MCGRP_MGMT "mgmt"
+#define NETDEV_MCGRP_PAGE_POOL "page-pool"
#endif /* _UAPI_LINUX_NETDEV_H */
diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c
index c74fe69dd241..23d5bd111ddd 100644
--- a/net/core/netdev-genl-gen.c
+++ b/net/core/netdev-genl-gen.c
@@ -50,6 +50,7 @@ static const struct genl_split_ops netdev_nl_ops[] = {
static const struct genl_multicast_group netdev_nl_mcgrps[] = {
[NETDEV_NLGRP_MGMT] = { "mgmt", },
+ [NETDEV_NLGRP_PAGE_POOL] = { "page-pool", },
};
struct genl_family netdev_nl_family __ro_after_init = {
diff --git a/net/core/netdev-genl-gen.h b/net/core/netdev-genl-gen.h
index a011d12abff4..738097847100 100644
--- a/net/core/netdev-genl-gen.h
+++ b/net/core/netdev-genl-gen.h
@@ -19,6 +19,7 @@ int netdev_nl_page_pool_get_dumpit(struct sk_buff *skb,
enum {
NETDEV_NLGRP_MGMT,
+ NETDEV_NLGRP_PAGE_POOL,
};
extern struct genl_family netdev_nl_family;
diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index 3bc434f7825f..ba2f27f15495 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -135,6 +135,37 @@ page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
return -EMSGSIZE;
}
+static void netdev_nl_page_pool_event(const struct page_pool *pool, u32 cmd)
+{
+ struct genl_info info;
+ struct sk_buff *ntf;
+ struct net *net;
+
+ lockdep_assert_held(&page_pools_lock);
+
+ /* 'invisible' page pools don't matter */
+ if (hlist_unhashed(&pool->user.list))
+ return;
+ net = dev_net(pool->slow.netdev);
+
+ if (!genl_has_listeners(&netdev_nl_family, net, NETDEV_NLGRP_PAGE_POOL))
+ return;
+
+ genl_info_init_ntf(&info, &netdev_nl_family, cmd);
+
+ ntf = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
+ if (!ntf)
+ return;
+
+ if (page_pool_nl_fill(ntf, pool, &info)) {
+ nlmsg_free(ntf);
+ return;
+ }
+
+ genlmsg_multicast_netns(&netdev_nl_family, net, ntf,
+ 0, NETDEV_NLGRP_PAGE_POOL, GFP_KERNEL);
+}
+
int netdev_nl_page_pool_get_doit(struct sk_buff *skb, struct genl_info *info)
{
u32 id;
@@ -168,6 +199,8 @@ int page_pool_list(struct page_pool *pool)
hlist_add_head(&pool->user.list,
&pool->slow.netdev->page_pools);
pool->user.napi_id = pool->p.napi ? pool->p.napi->napi_id : 0;
+
+ netdev_nl_page_pool_event(pool, NETDEV_CMD_PAGE_POOL_ADD_NTF);
}
mutex_unlock(&page_pools_lock);
@@ -181,6 +214,7 @@ int page_pool_list(struct page_pool *pool)
void page_pool_unlist(struct page_pool *pool)
{
mutex_lock(&page_pools_lock);
+ netdev_nl_page_pool_event(pool, NETDEV_CMD_PAGE_POOL_DEL_NTF);
xa_erase(&page_pools, pool->user.id);
hlist_del(&pool->user.list);
mutex_unlock(&page_pools_lock);
@@ -212,6 +246,8 @@ static void page_pool_unreg_netdev(struct net_device *netdev)
mutex_lock(&page_pools_lock);
hlist_for_each_entry(pool, &netdev->page_pools, user.list) {
pool->slow.netdev = lo;
+ netdev_nl_page_pool_event(pool,
+ NETDEV_CMD_PAGE_POOL_CHANGE_NTF);
last = pool;
}
--
2.41.0
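A toy model of the event flow this patch adds may help: ADD is emitted from `page_pool_list()`, CHANGE when a pool is rehomed to loopback on netdev unregister, DEL from `page_pool_unlist()`, and "invisible" (unhashed) pools never generate events. The listener check (`genl_has_listeners()`) is omitted here; names are illustrative, not a real API.

```python
# Toy lifecycle model of netdev_nl_page_pool_event() gating.
# Illustrative only; the real code also skips work when the
# page-pool multicast group has no listeners.

events = []

def pp_event(cmd, pool):
    if not pool["hashed"]:          # hlist_unhashed() check in the kernel
        return                      # 'invisible' page pools don't matter
    events.append((cmd, pool["id"]))

visible = {"id": 7, "hashed": True}
hidden = {"id": 8, "hashed": False}

pp_event("add", visible)     # page_pool_list()
pp_event("add", hidden)      # skipped: pool not tied to a live netdev
pp_event("change", visible)  # page_pool_unreg_netdev() rehoming
pp_event("del", visible)     # page_pool_unlist()
# events == [("add", 7), ("change", 7), ("del", 7)]
```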
* [RFC net-next 11/13] net: page_pool: report when page pool was destroyed
2023-08-16 23:42 [RFC net-next 00/13] net: page_pool: add netlink-based introspection Jakub Kicinski
2023-08-16 23:42 ` [RFC net-next 10/13] net: page_pool: add netlink notifications for state changes Jakub Kicinski
@ 2023-08-16 23:43 ` Jakub Kicinski
2023-08-16 23:43 ` [RFC net-next 12/13] net: page_pool: expose page pool stats via netlink Jakub Kicinski
From: Jakub Kicinski @ 2023-08-16 23:43 UTC (permalink / raw)
To: netdev
Cc: hawk, ilias.apalodimas, aleksander.lobakin, linyunsheng,
almasrymina, Jakub Kicinski
Report when a page pool was destroyed, along with the number of
inflight references. This allows users to proactively check for
dead page pools without waiting to see whether error messages
show up in dmesg.
Only provide the inflight count for pools which were already
"destroyed". Inflight information could also be interesting for
"live" pools, but we don't want to have to deal with the
potentially negative values that races can produce.
Example output for a fake leaked page pool, created using some hacks
in netdevsim (one "live" pool and one "leaked" pool on the same dev):
$ ./cli.py --no-schema --spec netlink/specs/netdev.yaml \
--dump page-pool-get
[{'id': 2, 'ifindex': 3},
{'id': 1, 'ifindex': 3, 'destroyed': 133, 'inflight': 1}]
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
Documentation/netlink/specs/netdev.yaml | 16 ++++++++++++++++
include/net/page_pool/types.h | 1 +
include/uapi/linux/netdev.h | 2 ++
net/core/page_pool.c | 3 ++-
net/core/page_pool_priv.h | 3 +++
net/core/page_pool_user.c | 16 ++++++++++++++++
6 files changed, 40 insertions(+), 1 deletion(-)
diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index 47ae2a45daea..1fd3b4235251 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -92,6 +92,20 @@ name: netdev
name: napi-id
doc: Id of NAPI using this Page Pool instance.
type: u32
+ -
+ name: destroyed
+ type: u64
+ doc: |
+ Seconds in CLOCK_BOOTTIME of when Page Pool was destroyed.
+ Page Pools wait for all the memory allocated from them to be freed
+ before truly disappearing.
+ Absent if Page Pool hasn't been destroyed.
+ -
+ name: inflight
+ type: u32
+ doc: |
+ Number of outstanding references to this page pool (allocated
+ but yet to be freed pages).
operations:
list:
@@ -140,6 +154,8 @@ name: netdev
- id
- ifindex
- napi-id
+ - destroyed
+ - inflight
dump:
reply: *pp-reply
-
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 3017557e0c59..490c7419a474 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -195,6 +195,7 @@ struct page_pool {
/* User-facing fields, protected by page_pools_lock */
struct {
struct hlist_node list;
+ u64 destroyed;
u32 napi_id;
u32 id;
} user;
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index 3c9818c1962a..aac840a3849b 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -53,6 +53,8 @@ enum {
NETDEV_A_PAGE_POOL_PAD,
NETDEV_A_PAGE_POOL_IFINDEX,
NETDEV_A_PAGE_POOL_NAPI_ID,
+ NETDEV_A_PAGE_POOL_DESTROYED,
+ NETDEV_A_PAGE_POOL_INFLIGHT,
__NETDEV_A_PAGE_POOL_MAX,
NETDEV_A_PAGE_POOL_MAX = (__NETDEV_A_PAGE_POOL_MAX - 1)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index de199c356043..733ca2198d94 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -513,7 +513,7 @@ EXPORT_SYMBOL(page_pool_alloc_pages);
*/
#define _distance(a, b) (s32)((a) - (b))
-static s32 page_pool_inflight(struct page_pool *pool)
+s32 page_pool_inflight(const struct page_pool *pool)
{
u32 release_cnt = atomic_read(&pool->pages_state_release_cnt);
u32 hold_cnt = READ_ONCE(pool->pages_state_hold_cnt);
@@ -933,6 +933,7 @@ void page_pool_destroy(struct page_pool *pool)
if (!page_pool_release(pool))
return;
+ page_pool_destroyed(pool);
pool->defer_start = jiffies;
pool->defer_warn = jiffies + DEFER_WARN_INTERVAL;
diff --git a/net/core/page_pool_priv.h b/net/core/page_pool_priv.h
index 6c4e4aeed02a..892e0cddc400 100644
--- a/net/core/page_pool_priv.h
+++ b/net/core/page_pool_priv.h
@@ -3,7 +3,10 @@
#ifndef __PAGE_POOL_PRIV_H
#define __PAGE_POOL_PRIV_H
+s32 page_pool_inflight(const struct page_pool *pool);
+
int page_pool_list(struct page_pool *pool);
+void page_pool_destroyed(struct page_pool *pool);
void page_pool_unlist(struct page_pool *pool);
#endif
diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index ba2f27f15495..a74d6f18caac 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -126,6 +126,14 @@ page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
if (pool->user.napi_id &&
nla_put_u32(rsp, NETDEV_A_PAGE_POOL_NAPI_ID, pool->user.napi_id))
goto err_cancel;
+ if (pool->user.destroyed) {
+ if (nla_put_u64_64bit(rsp, NETDEV_A_PAGE_POOL_DESTROYED,
+ pool->user.destroyed,
+ NETDEV_A_PAGE_POOL_PAD) ||
+ nla_put_u32(rsp, NETDEV_A_PAGE_POOL_INFLIGHT,
+ page_pool_inflight(pool)))
+ goto err_cancel;
+ }
genlmsg_end(rsp, hdr);
@@ -211,6 +219,14 @@ int page_pool_list(struct page_pool *pool)
return err;
}
+void page_pool_destroyed(struct page_pool *pool)
+{
+ mutex_lock(&page_pools_lock);
+ pool->user.destroyed = ktime_get_boottime_seconds();
+ netdev_nl_page_pool_event(pool, NETDEV_CMD_PAGE_POOL_CHANGE_NTF);
+ mutex_unlock(&page_pools_lock);
+}
+
void page_pool_unlist(struct page_pool *pool)
{
mutex_lock(&page_pools_lock);
--
2.41.0
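`page_pool_inflight()` (made non-static in this patch) computes hold minus release on wrapping u32 counters and reinterprets the result as s32, via `#define _distance(a, b) (s32)((a) - (b))`. A small Python model of that arithmetic; the last case is exactly the transient negative value the commit message wants to avoid exposing for live pools:

```python
# Model of page_pool_inflight(): hold/release are wrapping u32 counters,
# and the kernel computes _distance(a, b) = (s32)((a) - (b)).

U32 = 1 << 32

def inflight(hold_cnt, release_cnt):
    d = (hold_cnt - release_cnt) % U32          # u32 subtraction
    return d - U32 if d >= U32 // 2 else d      # reinterpret as s32

print(inflight(100, 40))        # 60: plain case, no wrap
print(inflight(3, 0xFFFFFFFE))  # 5: hold_cnt wrapped past release_cnt
print(inflight(0, 5))           # -5: transient race, release observed
                                #     before the matching hold
```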
* [RFC net-next 12/13] net: page_pool: expose page pool stats via netlink
2023-08-16 23:42 [RFC net-next 00/13] net: page_pool: add netlink-based introspection Jakub Kicinski
2023-08-16 23:43 ` [RFC net-next 11/13] net: page_pool: report when page pool was destroyed Jakub Kicinski
@ 2023-08-16 23:43 ` Jakub Kicinski
2023-08-16 23:43 ` [RFC net-next 13/13] tools: netdev: regen after page pool changes Jakub Kicinski
2023-08-17 21:21 ` [RFC net-next 00/13] net: page_pool: add netlink-based introspection Mina Almasry
From: Jakub Kicinski @ 2023-08-16 23:43 UTC (permalink / raw)
To: netdev
Cc: hawk, ilias.apalodimas, aleksander.lobakin, linyunsheng,
almasrymina, Jakub Kicinski
Dump the stats via netlink. More clever approaches, like dumping
the stats for each CPU individually to see where packets get
consumed, can be implemented in the future.
A trimmed example from a real (but recently booted) system:
$ ./cli.py --no-schema --spec netlink/specs/netdev.yaml \
--dump page-pool-stats-get
[{'info': {'id': 19, 'ifindex': 2},
'alloc-empty': 48,
'alloc-fast': 3024,
'alloc-refill': 0,
'alloc-slow': 48,
'alloc-slow-high-order': 0,
'alloc-waive': 0,
'recycle-cache-full': 0,
'recycle-cached': 0,
'recycle-released-refcnt': 0,
'recycle-ring': 0,
'recycle-ring-full': 0},
{'info': {'id': 18, 'ifindex': 2},
'alloc-empty': 66,
'alloc-fast': 11811,
'alloc-refill': 35,
'alloc-slow': 66,
'alloc-slow-high-order': 0,
'alloc-waive': 0,
'recycle-cache-full': 1145,
'recycle-cached': 6541,
'recycle-released-refcnt': 0,
'recycle-ring': 1275,
'recycle-ring-full': 0},
{'info': {'id': 17, 'ifindex': 2},
'alloc-empty': 73,
'alloc-fast': 62099,
'alloc-refill': 413,
...
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
Documentation/netlink/specs/netdev.yaml | 84 +++++++++++++++++++
Documentation/networking/page_pool.rst | 5 +-
include/net/page_pool/helpers.h | 8 +-
include/uapi/linux/netdev.h | 20 +++++
net/core/netdev-genl-gen.c | 23 +++++
net/core/netdev-genl-gen.h | 7 ++
net/core/page_pool.c | 2 +-
net/core/page_pool_user.c | 106 ++++++++++++++++++++++++
8 files changed, 246 insertions(+), 9 deletions(-)
diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index 1fd3b4235251..412e500d5eaa 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -106,6 +106,66 @@ name: netdev
doc: |
Number of outstanding references to this page pool (allocated
but yet to be freed pages).
+ -
+ name: page-pool-info
+ subset-of: page-pool
+ attributes:
+ -
+ name: id
+ type: u32
+ checks:
+ min: 1
+ -
+ name: ifindex
+ type: u32
+ -
+ name: page-pool-stats
+ doc: |
+ Page pool statistics, see docs for struct page_pool_stats
+ for information about individual statistics.
+ attributes:
+ -
+ name: info
+ doc: Page pool identifying information.
+ type: nest
+ nested-attributes: page-pool-info
+ -
+ name: pad
+ type: pad
+ -
+ name: alloc-fast
+ type: u64
+ value: 8 # reserve some attr ids in case we need more metadata later
+ -
+ name: alloc-slow
+ type: u64
+ -
+ name: alloc-slow-high-order
+ type: u64
+ -
+ name: alloc-empty
+ type: u64
+ -
+ name: alloc-refill
+ type: u64
+ -
+ name: alloc-waive
+ type: u64
+ -
+ name: recycle-cached
+ type: u64
+ -
+ name: recycle-cache-full
+ type: u64
+ -
+ name: recycle-ring
+ type: u64
+ -
+ name: recycle-ring-full
+ type: u64
+ -
+ name: recycle-released-refcnt
+ type: u64
operations:
list:
@@ -173,6 +233,30 @@ name: netdev
doc: Notification about page pool configuration being changed.
notify: page-pool-get
mcgrp: page-pool
+ -
+ name: page-pool-stats-get
+ doc: Get page pool statistics.
+ attribute-set: page-pool-stats
+ do:
+ request:
+ attributes:
+ - info
+ reply: &pp-stats-reply
+ attributes:
+ - info
+ - alloc-fast
+ - alloc-slow
+ - alloc-slow-high-order
+ - alloc-empty
+ - alloc-refill
+ - alloc-waive
+ - recycle-cached
+ - recycle-cache-full
+ - recycle-ring
+ - recycle-ring-full
+ - recycle-released-refcnt
+ dump:
+ reply: *pp-stats-reply
mcast-groups:
list:
diff --git a/Documentation/networking/page_pool.rst b/Documentation/networking/page_pool.rst
index 215ebc92752c..d5566b7c5e7c 100644
--- a/Documentation/networking/page_pool.rst
+++ b/Documentation/networking/page_pool.rst
@@ -105,8 +105,9 @@ page_pool_get_stats() and structures described below are available.
It takes a pointer to a ``struct page_pool`` and a pointer to a struct
page_pool_stats allocated by the caller.
-The API will fill in the provided struct page_pool_stats with
-statistics about the page_pool.
+Older drivers expose page pool statistics via ethtool or debugfs.
+The same statistics are accessible via the netlink netdev family
+in a driver-independent fashion.
.. kernel-doc:: include/net/page_pool/types.h
:identifiers: struct page_pool_recycle_stats
diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 94231533a369..b004eef7071a 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -32,16 +32,12 @@
#include <net/page_pool/types.h>
#ifdef CONFIG_PAGE_POOL_STATS
+/* Deprecated driver-facing API, use netlink instead */
int page_pool_ethtool_stats_get_count(void);
u8 *page_pool_ethtool_stats_get_strings(u8 *data);
u64 *page_pool_ethtool_stats_get(u64 *data, void *stats);
-/*
- * Drivers that wish to harvest page pool stats and report them to users
- * (perhaps via ethtool, debugfs, or another mechanism) can allocate a
- * struct page_pool_stats call page_pool_get_stats to get stats for the specified pool.
- */
-bool page_pool_get_stats(struct page_pool *pool,
+bool page_pool_get_stats(const struct page_pool *pool,
struct page_pool_stats *stats);
#else
static inline int page_pool_ethtool_stats_get_count(void)
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index aac840a3849b..25754237eb2f 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -60,6 +60,25 @@ enum {
NETDEV_A_PAGE_POOL_MAX = (__NETDEV_A_PAGE_POOL_MAX - 1)
};
+enum {
+ NETDEV_A_PAGE_POOL_STATS_INFO = 1,
+ NETDEV_A_PAGE_POOL_STATS_PAD,
+ NETDEV_A_PAGE_POOL_STATS_ALLOC_FAST = 8,
+ NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW,
+ NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW_HIGH_ORDER,
+ NETDEV_A_PAGE_POOL_STATS_ALLOC_EMPTY,
+ NETDEV_A_PAGE_POOL_STATS_ALLOC_REFILL,
+ NETDEV_A_PAGE_POOL_STATS_ALLOC_WAIVE,
+ NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHED,
+ NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHE_FULL,
+ NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING,
+ NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING_FULL,
+ NETDEV_A_PAGE_POOL_STATS_RECYCLE_RELEASED_REFCNT,
+
+ __NETDEV_A_PAGE_POOL_STATS_MAX,
+ NETDEV_A_PAGE_POOL_STATS_MAX = (__NETDEV_A_PAGE_POOL_STATS_MAX - 1)
+};
+
enum {
NETDEV_CMD_DEV_GET = 1,
NETDEV_CMD_DEV_ADD_NTF,
@@ -69,6 +88,7 @@ enum {
NETDEV_CMD_PAGE_POOL_ADD_NTF,
NETDEV_CMD_PAGE_POOL_DEL_NTF,
NETDEV_CMD_PAGE_POOL_CHANGE_NTF,
+ NETDEV_CMD_PAGE_POOL_STATS_GET,
__NETDEV_CMD_MAX,
NETDEV_CMD_MAX = (__NETDEV_CMD_MAX - 1)
diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c
index 23d5bd111ddd..5dab064988c1 100644
--- a/net/core/netdev-genl-gen.c
+++ b/net/core/netdev-genl-gen.c
@@ -10,6 +10,12 @@
#include <uapi/linux/netdev.h>
+/* Common nested types */
+const struct nla_policy netdev_page_pool_info_nl_policy[NETDEV_A_PAGE_POOL_IFINDEX + 1] = {
+ [NETDEV_A_PAGE_POOL_ID] = NLA_POLICY_MIN(NLA_U32, 1),
+ [NETDEV_A_PAGE_POOL_IFINDEX] = { .type = NLA_U32, },
+};
+
/* NETDEV_CMD_DEV_GET - do */
static const struct nla_policy netdev_dev_get_nl_policy[NETDEV_A_DEV_IFINDEX + 1] = {
[NETDEV_A_DEV_IFINDEX] = NLA_POLICY_MIN(NLA_U32, 1),
@@ -20,6 +26,11 @@ static const struct nla_policy netdev_page_pool_get_nl_policy[NETDEV_A_PAGE_POOL
[NETDEV_A_PAGE_POOL_ID] = NLA_POLICY_MIN(NLA_U32, 1),
};
+/* NETDEV_CMD_PAGE_POOL_STATS_GET - do */
+static const struct nla_policy netdev_page_pool_stats_get_nl_policy[NETDEV_A_PAGE_POOL_STATS_INFO + 1] = {
+ [NETDEV_A_PAGE_POOL_STATS_INFO] = NLA_POLICY_NESTED(netdev_page_pool_info_nl_policy),
+};
+
/* Ops table for netdev */
static const struct genl_split_ops netdev_nl_ops[] = {
{
@@ -46,6 +57,18 @@ static const struct genl_split_ops netdev_nl_ops[] = {
.dumpit = netdev_nl_page_pool_get_dumpit,
.flags = GENL_CMD_CAP_DUMP,
},
+ {
+ .cmd = NETDEV_CMD_PAGE_POOL_STATS_GET,
+ .doit = netdev_nl_page_pool_stats_get_doit,
+ .policy = netdev_page_pool_stats_get_nl_policy,
+ .maxattr = NETDEV_A_PAGE_POOL_STATS_INFO,
+ .flags = GENL_CMD_CAP_DO,
+ },
+ {
+ .cmd = NETDEV_CMD_PAGE_POOL_STATS_GET,
+ .dumpit = netdev_nl_page_pool_stats_get_dumpit,
+ .flags = GENL_CMD_CAP_DUMP,
+ },
};
static const struct genl_multicast_group netdev_nl_mcgrps[] = {
diff --git a/net/core/netdev-genl-gen.h b/net/core/netdev-genl-gen.h
index 738097847100..649e4b46eccf 100644
--- a/net/core/netdev-genl-gen.h
+++ b/net/core/netdev-genl-gen.h
@@ -11,11 +11,18 @@
#include <uapi/linux/netdev.h>
+/* Common nested types */
+extern const struct nla_policy netdev_page_pool_info_nl_policy[NETDEV_A_PAGE_POOL_IFINDEX + 1];
+
int netdev_nl_dev_get_doit(struct sk_buff *skb, struct genl_info *info);
int netdev_nl_dev_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb);
int netdev_nl_page_pool_get_doit(struct sk_buff *skb, struct genl_info *info);
int netdev_nl_page_pool_get_dumpit(struct sk_buff *skb,
struct netlink_callback *cb);
+int netdev_nl_page_pool_stats_get_doit(struct sk_buff *skb,
+ struct genl_info *info);
+int netdev_nl_page_pool_stats_get_dumpit(struct sk_buff *skb,
+ struct netlink_callback *cb);
enum {
NETDEV_NLGRP_MGMT,
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 733ca2198d94..b7dac08709ea 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -71,7 +71,7 @@ static const char pp_stats[][ETH_GSTRING_LEN] = {
* is passed to this API which is filled in. The caller can then report
* those stats to the user (perhaps via ethtool, debugfs, etc.).
*/
-bool page_pool_get_stats(struct page_pool *pool,
+bool page_pool_get_stats(const struct page_pool *pool,
struct page_pool_stats *stats)
{
int cpu = 0;
diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index a74d6f18caac..f31e81b340d4 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -5,6 +5,7 @@
#include <linux/xarray.h>
#include <net/net_debug.h>
#include <net/page_pool/types.h>
+#include <net/page_pool/helpers.h>
#include <net/sock.h>
#include "page_pool_priv.h"
@@ -106,6 +107,111 @@ netdev_nl_page_pool_get_dump(struct sk_buff *skb, struct netlink_callback *cb,
return err;
}
+static int
+page_pool_nl_stats_fill(struct sk_buff *rsp, const struct page_pool *pool,
+ const struct genl_info *info)
+{
+#ifdef CONFIG_PAGE_POOL_STATS
+ const u32 pad = NETDEV_A_PAGE_POOL_STATS_PAD;
+ struct page_pool_stats stats = {};
+ struct nlattr *nest;
+ void *hdr;
+
+ if (!page_pool_get_stats(pool, &stats))
+ return 0;
+
+ hdr = genlmsg_iput(rsp, info);
+ if (!hdr)
+ return -EMSGSIZE;
+
+ nest = nla_nest_start(rsp, NETDEV_A_PAGE_POOL_STATS_INFO);
+
+ if (nla_put_u32(rsp, NETDEV_A_PAGE_POOL_ID, pool->user.id) ||
+ (pool->slow.netdev->ifindex != 1 &&
+ nla_put_u32(rsp, NETDEV_A_PAGE_POOL_IFINDEX,
+ pool->slow.netdev->ifindex)))
+ goto err_cancel_nest;
+
+ nla_nest_end(rsp, nest);
+
+ if (nla_put_u64_64bit(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_FAST,
+ stats.alloc_stats.fast, pad) ||
+ nla_put_u64_64bit(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW,
+ stats.alloc_stats.slow, pad) ||
+ nla_put_u64_64bit(rsp,
+ NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW_HIGH_ORDER,
+ stats.alloc_stats.slow_high_order, pad) ||
+ nla_put_u64_64bit(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_EMPTY,
+ stats.alloc_stats.empty, pad) ||
+ nla_put_u64_64bit(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_REFILL,
+ stats.alloc_stats.refill, pad) ||
+ nla_put_u64_64bit(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_WAIVE,
+ stats.alloc_stats.waive, pad) ||
+ nla_put_u64_64bit(rsp, NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHED,
+ stats.recycle_stats.cached, pad) ||
+ nla_put_u64_64bit(rsp, NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHE_FULL,
+ stats.recycle_stats.cache_full, pad) ||
+ nla_put_u64_64bit(rsp, NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING,
+ stats.recycle_stats.ring, pad) ||
+ nla_put_u64_64bit(rsp, NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING_FULL,
+ stats.recycle_stats.ring_full, pad) ||
+ nla_put_u64_64bit(rsp,
+ NETDEV_A_PAGE_POOL_STATS_RECYCLE_RELEASED_REFCNT,
+ stats.recycle_stats.released_refcnt, pad))
+ goto err_cancel_msg;
+
+ genlmsg_end(rsp, hdr);
+
+ return 0;
+err_cancel_nest:
+ nla_nest_cancel(rsp, nest);
+err_cancel_msg:
+ genlmsg_cancel(rsp, hdr);
+ return -EMSGSIZE;
+#else
+ GENL_SET_ERR_MSG(info, "kernel built without CONFIG_PAGE_POOL_STATS");
+ return -EOPNOTSUPP;
+#endif
+}
+
+int netdev_nl_page_pool_stats_get_doit(struct sk_buff *skb,
+ struct genl_info *info)
+{
+ struct nlattr *tb[ARRAY_SIZE(netdev_page_pool_info_nl_policy)];
+ struct nlattr *nest;
+ int err;
+ u32 id;
+
+ if (GENL_REQ_ATTR_CHECK(info, NETDEV_A_PAGE_POOL_STATS_INFO))
+ return -EINVAL;
+
+ nest = info->attrs[NETDEV_A_PAGE_POOL_STATS_INFO];
+ err = nla_parse_nested(tb, ARRAY_SIZE(tb) - 1, nest,
+ netdev_page_pool_info_nl_policy,
+ info->extack);
+ if (err)
+ return err;
+
+ if (NL_REQ_ATTR_CHECK(info->extack, nest, tb, NETDEV_A_PAGE_POOL_ID))
+ return -EINVAL;
+ if (tb[NETDEV_A_PAGE_POOL_IFINDEX]) {
+ NL_SET_ERR_MSG_ATTR(info->extack,
+ tb[NETDEV_A_PAGE_POOL_IFINDEX],
+ "selecting by ifindex not supported");
+ return -EINVAL;
+ }
+
+ id = nla_get_u32(tb[NETDEV_A_PAGE_POOL_ID]);
+
+ return netdev_nl_page_pool_get_do(info, id, page_pool_nl_stats_fill);
+}
+
+int netdev_nl_page_pool_stats_get_dumpit(struct sk_buff *skb,
+ struct netlink_callback *cb)
+{
+ return netdev_nl_page_pool_get_dump(skb, cb, page_pool_nl_stats_fill);
+}
+
static int
page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
const struct genl_info *info)
--
2.41.0
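For readers new to the stats code this patch dumps: `page_pool_get_stats()` keeps alloc counters directly in the pool (they are only touched from softirq context), while recycle counters are per-CPU and summed at read time. A rough user-space model, with field names mirroring `struct page_pool_alloc_stats` / `struct page_pool_recycle_stats` (a sketch, not the kernel implementation):

```python
# Rough model of the aggregation done by page_pool_get_stats():
# alloc stats live in the pool itself, recycle stats are summed
# across all possible CPUs.

RECYCLE_FIELDS = ("cached", "cache_full", "ring", "ring_full",
                  "released_refcnt")

def get_stats(alloc_stats, recycle_pcpu):
    """alloc_stats: dict of pool-level counters.
    recycle_pcpu: one dict of counters per possible CPU."""
    out = dict(alloc_stats)
    for name in RECYCLE_FIELDS:
        out[name] = sum(cpu.get(name, 0) for cpu in recycle_pcpu)
    return out

stats = get_stats(
    {"fast": 3024, "slow": 48, "empty": 48},
    [{"cached": 10, "ring": 5}, {"cached": 7, "ring_full": 1}],
)
# stats["cached"] == 17, stats["ring"] == 5, stats["fast"] == 3024
```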
* [RFC net-next 13/13] tools: netdev: regen after page pool changes
2023-08-16 23:42 [RFC net-next 00/13] net: page_pool: add netlink-based introspection Jakub Kicinski
2023-08-16 23:43 ` [RFC net-next 12/13] net: page_pool: expose page pool stats via netlink Jakub Kicinski
@ 2023-08-16 23:43 ` Jakub Kicinski
2023-08-17 21:21 ` [RFC net-next 00/13] net: page_pool: add netlink-based introspection Mina Almasry
From: Jakub Kicinski @ 2023-08-16 23:43 UTC (permalink / raw)
To: netdev
Cc: hawk, ilias.apalodimas, aleksander.lobakin, linyunsheng,
almasrymina, Jakub Kicinski
Regenerate the code in tools/ after the page pool additions
to the netdev netlink family.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
tools/include/uapi/linux/netdev.h | 37 +++
tools/net/ynl/generated/netdev-user.c | 415 ++++++++++++++++++++++++++
tools/net/ynl/generated/netdev-user.h | 169 +++++++++++
3 files changed, 621 insertions(+)
diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h
index c1634b95c223..25754237eb2f 100644
--- a/tools/include/uapi/linux/netdev.h
+++ b/tools/include/uapi/linux/netdev.h
@@ -48,16 +48,53 @@ enum {
NETDEV_A_DEV_MAX = (__NETDEV_A_DEV_MAX - 1)
};
+enum {
+ NETDEV_A_PAGE_POOL_ID = 1,
+ NETDEV_A_PAGE_POOL_PAD,
+ NETDEV_A_PAGE_POOL_IFINDEX,
+ NETDEV_A_PAGE_POOL_NAPI_ID,
+ NETDEV_A_PAGE_POOL_DESTROYED,
+ NETDEV_A_PAGE_POOL_INFLIGHT,
+
+ __NETDEV_A_PAGE_POOL_MAX,
+ NETDEV_A_PAGE_POOL_MAX = (__NETDEV_A_PAGE_POOL_MAX - 1)
+};
+
+enum {
+ NETDEV_A_PAGE_POOL_STATS_INFO = 1,
+ NETDEV_A_PAGE_POOL_STATS_PAD,
+ NETDEV_A_PAGE_POOL_STATS_ALLOC_FAST = 8,
+ NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW,
+ NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW_HIGH_ORDER,
+ NETDEV_A_PAGE_POOL_STATS_ALLOC_EMPTY,
+ NETDEV_A_PAGE_POOL_STATS_ALLOC_REFILL,
+ NETDEV_A_PAGE_POOL_STATS_ALLOC_WAIVE,
+ NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHED,
+ NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHE_FULL,
+ NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING,
+ NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING_FULL,
+ NETDEV_A_PAGE_POOL_STATS_RECYCLE_RELEASED_REFCNT,
+
+ __NETDEV_A_PAGE_POOL_STATS_MAX,
+ NETDEV_A_PAGE_POOL_STATS_MAX = (__NETDEV_A_PAGE_POOL_STATS_MAX - 1)
+};
+
enum {
NETDEV_CMD_DEV_GET = 1,
NETDEV_CMD_DEV_ADD_NTF,
NETDEV_CMD_DEV_DEL_NTF,
NETDEV_CMD_DEV_CHANGE_NTF,
+ NETDEV_CMD_PAGE_POOL_GET,
+ NETDEV_CMD_PAGE_POOL_ADD_NTF,
+ NETDEV_CMD_PAGE_POOL_DEL_NTF,
+ NETDEV_CMD_PAGE_POOL_CHANGE_NTF,
+ NETDEV_CMD_PAGE_POOL_STATS_GET,
__NETDEV_CMD_MAX,
NETDEV_CMD_MAX = (__NETDEV_CMD_MAX - 1)
};
#define NETDEV_MCGRP_MGMT "mgmt"
+#define NETDEV_MCGRP_PAGE_POOL "page-pool"
#endif /* _UAPI_LINUX_NETDEV_H */
diff --git a/tools/net/ynl/generated/netdev-user.c b/tools/net/ynl/generated/netdev-user.c
index 68b408ca0f7f..d58eb01c436e 100644
--- a/tools/net/ynl/generated/netdev-user.c
+++ b/tools/net/ynl/generated/netdev-user.c
@@ -18,6 +18,11 @@ static const char * const netdev_op_strmap[] = {
[NETDEV_CMD_DEV_ADD_NTF] = "dev-add-ntf",
[NETDEV_CMD_DEV_DEL_NTF] = "dev-del-ntf",
[NETDEV_CMD_DEV_CHANGE_NTF] = "dev-change-ntf",
+ [NETDEV_CMD_PAGE_POOL_GET] = "page-pool-get",
+ [NETDEV_CMD_PAGE_POOL_ADD_NTF] = "page-pool-add-ntf",
+ [NETDEV_CMD_PAGE_POOL_DEL_NTF] = "page-pool-del-ntf",
+ [NETDEV_CMD_PAGE_POOL_CHANGE_NTF] = "page-pool-change-ntf",
+ [NETDEV_CMD_PAGE_POOL_STATS_GET] = "page-pool-stats-get",
};
const char *netdev_op_str(int op)
@@ -46,6 +51,16 @@ const char *netdev_xdp_act_str(enum netdev_xdp_act value)
}
/* Policies */
+struct ynl_policy_attr netdev_page_pool_info_policy[NETDEV_A_PAGE_POOL_MAX + 1] = {
+ [NETDEV_A_PAGE_POOL_ID] = { .name = "id", .type = YNL_PT_U32, },
+ [NETDEV_A_PAGE_POOL_IFINDEX] = { .name = "ifindex", .type = YNL_PT_U32, },
+};
+
+struct ynl_policy_nest netdev_page_pool_info_nest = {
+ .max_attr = NETDEV_A_PAGE_POOL_MAX,
+ .table = netdev_page_pool_info_policy,
+};
+
struct ynl_policy_attr netdev_dev_policy[NETDEV_A_DEV_MAX + 1] = {
[NETDEV_A_DEV_IFINDEX] = { .name = "ifindex", .type = YNL_PT_U32, },
[NETDEV_A_DEV_PAD] = { .name = "pad", .type = YNL_PT_IGNORE, },
@@ -58,7 +73,86 @@ struct ynl_policy_nest netdev_dev_nest = {
.table = netdev_dev_policy,
};
+struct ynl_policy_attr netdev_page_pool_policy[NETDEV_A_PAGE_POOL_MAX + 1] = {
+ [NETDEV_A_PAGE_POOL_ID] = { .name = "id", .type = YNL_PT_U32, },
+ [NETDEV_A_PAGE_POOL_PAD] = { .name = "pad", .type = YNL_PT_IGNORE, },
+ [NETDEV_A_PAGE_POOL_IFINDEX] = { .name = "ifindex", .type = YNL_PT_U32, },
+ [NETDEV_A_PAGE_POOL_NAPI_ID] = { .name = "napi-id", .type = YNL_PT_U32, },
+ [NETDEV_A_PAGE_POOL_DESTROYED] = { .name = "destroyed", .type = YNL_PT_U64, },
+ [NETDEV_A_PAGE_POOL_INFLIGHT] = { .name = "inflight", .type = YNL_PT_U32, },
+};
+
+struct ynl_policy_nest netdev_page_pool_nest = {
+ .max_attr = NETDEV_A_PAGE_POOL_MAX,
+ .table = netdev_page_pool_policy,
+};
+
+struct ynl_policy_attr netdev_page_pool_stats_policy[NETDEV_A_PAGE_POOL_STATS_MAX + 1] = {
+ [NETDEV_A_PAGE_POOL_STATS_INFO] = { .name = "info", .type = YNL_PT_NEST, .nest = &netdev_page_pool_info_nest, },
+ [NETDEV_A_PAGE_POOL_STATS_PAD] = { .name = "pad", .type = YNL_PT_IGNORE, },
+ [NETDEV_A_PAGE_POOL_STATS_ALLOC_FAST] = { .name = "alloc-fast", .type = YNL_PT_U64, },
+ [NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW] = { .name = "alloc-slow", .type = YNL_PT_U64, },
+ [NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW_HIGH_ORDER] = { .name = "alloc-slow-high-order", .type = YNL_PT_U64, },
+ [NETDEV_A_PAGE_POOL_STATS_ALLOC_EMPTY] = { .name = "alloc-empty", .type = YNL_PT_U64, },
+ [NETDEV_A_PAGE_POOL_STATS_ALLOC_REFILL] = { .name = "alloc-refill", .type = YNL_PT_U64, },
+ [NETDEV_A_PAGE_POOL_STATS_ALLOC_WAIVE] = { .name = "alloc-waive", .type = YNL_PT_U64, },
+ [NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHED] = { .name = "recycle-cached", .type = YNL_PT_U64, },
+ [NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHE_FULL] = { .name = "recycle-cache-full", .type = YNL_PT_U64, },
+ [NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING] = { .name = "recycle-ring", .type = YNL_PT_U64, },
+ [NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING_FULL] = { .name = "recycle-ring-full", .type = YNL_PT_U64, },
+ [NETDEV_A_PAGE_POOL_STATS_RECYCLE_RELEASED_REFCNT] = { .name = "recycle-released-refcnt", .type = YNL_PT_U64, },
+};
+
+struct ynl_policy_nest netdev_page_pool_stats_nest = {
+ .max_attr = NETDEV_A_PAGE_POOL_STATS_MAX,
+ .table = netdev_page_pool_stats_policy,
+};
+
/* Common nested types */
+void netdev_page_pool_info_free(struct netdev_page_pool_info *obj)
+{
+}
+
+int netdev_page_pool_info_put(struct nlmsghdr *nlh, unsigned int attr_type,
+ struct netdev_page_pool_info *obj)
+{
+ struct nlattr *nest;
+
+ nest = mnl_attr_nest_start(nlh, attr_type);
+ if (obj->_present.id)
+ mnl_attr_put_u32(nlh, NETDEV_A_PAGE_POOL_ID, obj->id);
+ if (obj->_present.ifindex)
+ mnl_attr_put_u32(nlh, NETDEV_A_PAGE_POOL_IFINDEX, obj->ifindex);
+ mnl_attr_nest_end(nlh, nest);
+
+ return 0;
+}
+
+int netdev_page_pool_info_parse(struct ynl_parse_arg *yarg,
+ const struct nlattr *nested)
+{
+ struct netdev_page_pool_info *dst = yarg->data;
+ const struct nlattr *attr;
+
+ mnl_attr_for_each_nested(attr, nested) {
+ unsigned int type = mnl_attr_get_type(attr);
+
+ if (type == NETDEV_A_PAGE_POOL_ID) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.id = 1;
+ dst->id = mnl_attr_get_u32(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_IFINDEX) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.ifindex = 1;
+ dst->ifindex = mnl_attr_get_u32(attr);
+ }
+ }
+
+ return 0;
+}
+
/* ============== NETDEV_CMD_DEV_GET ============== */
/* NETDEV_CMD_DEV_GET - do */
void netdev_dev_get_req_free(struct netdev_dev_get_req *req)
@@ -178,6 +272,309 @@ void netdev_dev_get_ntf_free(struct netdev_dev_get_ntf *rsp)
free(rsp);
}
+/* ============== NETDEV_CMD_PAGE_POOL_GET ============== */
+/* NETDEV_CMD_PAGE_POOL_GET - do */
+void netdev_page_pool_get_req_free(struct netdev_page_pool_get_req *req)
+{
+ free(req);
+}
+
+void netdev_page_pool_get_rsp_free(struct netdev_page_pool_get_rsp *rsp)
+{
+ free(rsp);
+}
+
+int netdev_page_pool_get_rsp_parse(const struct nlmsghdr *nlh, void *data)
+{
+ struct netdev_page_pool_get_rsp *dst;
+ struct ynl_parse_arg *yarg = data;
+ const struct nlattr *attr;
+
+ dst = yarg->data;
+
+ mnl_attr_for_each(attr, nlh, sizeof(struct genlmsghdr)) {
+ unsigned int type = mnl_attr_get_type(attr);
+
+ if (type == NETDEV_A_PAGE_POOL_ID) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.id = 1;
+ dst->id = mnl_attr_get_u32(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_IFINDEX) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.ifindex = 1;
+ dst->ifindex = mnl_attr_get_u32(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_NAPI_ID) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.napi_id = 1;
+ dst->napi_id = mnl_attr_get_u32(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_DESTROYED) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.destroyed = 1;
+ dst->destroyed = mnl_attr_get_u64(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_INFLIGHT) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.inflight = 1;
+ dst->inflight = mnl_attr_get_u32(attr);
+ }
+ }
+
+ return MNL_CB_OK;
+}
+
+struct netdev_page_pool_get_rsp *
+netdev_page_pool_get(struct ynl_sock *ys, struct netdev_page_pool_get_req *req)
+{
+ struct ynl_req_state yrs = { .yarg = { .ys = ys, }, };
+ struct netdev_page_pool_get_rsp *rsp;
+ struct nlmsghdr *nlh;
+ int err;
+
+ nlh = ynl_gemsg_start_req(ys, ys->family_id, NETDEV_CMD_PAGE_POOL_GET, 1);
+ ys->req_policy = &netdev_page_pool_nest;
+ yrs.yarg.rsp_policy = &netdev_page_pool_nest;
+
+ if (req->_present.id)
+ mnl_attr_put_u32(nlh, NETDEV_A_PAGE_POOL_ID, req->id);
+
+ rsp = calloc(1, sizeof(*rsp));
+ yrs.yarg.data = rsp;
+ yrs.cb = netdev_page_pool_get_rsp_parse;
+ yrs.rsp_cmd = NETDEV_CMD_PAGE_POOL_GET;
+
+ err = ynl_exec(ys, nlh, &yrs);
+ if (err < 0)
+ goto err_free;
+
+ return rsp;
+
+err_free:
+ netdev_page_pool_get_rsp_free(rsp);
+ return NULL;
+}
+
+/* NETDEV_CMD_PAGE_POOL_GET - dump */
+void netdev_page_pool_get_list_free(struct netdev_page_pool_get_list *rsp)
+{
+ struct netdev_page_pool_get_list *next = rsp;
+
+ while ((void *)next != YNL_LIST_END) {
+ rsp = next;
+ next = rsp->next;
+
+ free(rsp);
+ }
+}
+
+struct netdev_page_pool_get_list *
+netdev_page_pool_get_dump(struct ynl_sock *ys)
+{
+ struct ynl_dump_state yds = {};
+ struct nlmsghdr *nlh;
+ int err;
+
+ yds.ys = ys;
+ yds.alloc_sz = sizeof(struct netdev_page_pool_get_list);
+ yds.cb = netdev_page_pool_get_rsp_parse;
+ yds.rsp_cmd = NETDEV_CMD_PAGE_POOL_GET;
+ yds.rsp_policy = &netdev_page_pool_nest;
+
+ nlh = ynl_gemsg_start_dump(ys, ys->family_id, NETDEV_CMD_PAGE_POOL_GET, 1);
+
+ err = ynl_exec_dump(ys, nlh, &yds);
+ if (err < 0)
+ goto free_list;
+
+ return yds.first;
+
+free_list:
+ netdev_page_pool_get_list_free(yds.first);
+ return NULL;
+}
+
+/* NETDEV_CMD_PAGE_POOL_GET - notify */
+void netdev_page_pool_get_ntf_free(struct netdev_page_pool_get_ntf *rsp)
+{
+ free(rsp);
+}
+
+/* ============== NETDEV_CMD_PAGE_POOL_STATS_GET ============== */
+/* NETDEV_CMD_PAGE_POOL_STATS_GET - do */
+void
+netdev_page_pool_stats_get_req_free(struct netdev_page_pool_stats_get_req *req)
+{
+ netdev_page_pool_info_free(&req->info);
+ free(req);
+}
+
+void
+netdev_page_pool_stats_get_rsp_free(struct netdev_page_pool_stats_get_rsp *rsp)
+{
+ netdev_page_pool_info_free(&rsp->info);
+ free(rsp);
+}
+
+int netdev_page_pool_stats_get_rsp_parse(const struct nlmsghdr *nlh,
+ void *data)
+{
+ struct netdev_page_pool_stats_get_rsp *dst;
+ struct ynl_parse_arg *yarg = data;
+ const struct nlattr *attr;
+ struct ynl_parse_arg parg;
+
+ dst = yarg->data;
+ parg.ys = yarg->ys;
+
+ mnl_attr_for_each(attr, nlh, sizeof(struct genlmsghdr)) {
+ unsigned int type = mnl_attr_get_type(attr);
+
+ if (type == NETDEV_A_PAGE_POOL_STATS_INFO) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.info = 1;
+
+ parg.rsp_policy = &netdev_page_pool_info_nest;
+ parg.data = &dst->info;
+ if (netdev_page_pool_info_parse(&parg, attr))
+ return MNL_CB_ERROR;
+ } else if (type == NETDEV_A_PAGE_POOL_STATS_ALLOC_FAST) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.alloc_fast = 1;
+ dst->alloc_fast = mnl_attr_get_u64(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.alloc_slow = 1;
+ dst->alloc_slow = mnl_attr_get_u64(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW_HIGH_ORDER) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.alloc_slow_high_order = 1;
+ dst->alloc_slow_high_order = mnl_attr_get_u64(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_STATS_ALLOC_EMPTY) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.alloc_empty = 1;
+ dst->alloc_empty = mnl_attr_get_u64(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_STATS_ALLOC_REFILL) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.alloc_refill = 1;
+ dst->alloc_refill = mnl_attr_get_u64(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_STATS_ALLOC_WAIVE) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.alloc_waive = 1;
+ dst->alloc_waive = mnl_attr_get_u64(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHED) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.recycle_cached = 1;
+ dst->recycle_cached = mnl_attr_get_u64(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHE_FULL) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.recycle_cache_full = 1;
+ dst->recycle_cache_full = mnl_attr_get_u64(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.recycle_ring = 1;
+ dst->recycle_ring = mnl_attr_get_u64(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING_FULL) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.recycle_ring_full = 1;
+ dst->recycle_ring_full = mnl_attr_get_u64(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_STATS_RECYCLE_RELEASED_REFCNT) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.recycle_released_refcnt = 1;
+ dst->recycle_released_refcnt = mnl_attr_get_u64(attr);
+ }
+ }
+
+ return MNL_CB_OK;
+}
+
+struct netdev_page_pool_stats_get_rsp *
+netdev_page_pool_stats_get(struct ynl_sock *ys,
+ struct netdev_page_pool_stats_get_req *req)
+{
+ struct ynl_req_state yrs = { .yarg = { .ys = ys, }, };
+ struct netdev_page_pool_stats_get_rsp *rsp;
+ struct nlmsghdr *nlh;
+ int err;
+
+ nlh = ynl_gemsg_start_req(ys, ys->family_id, NETDEV_CMD_PAGE_POOL_STATS_GET, 1);
+ ys->req_policy = &netdev_page_pool_stats_nest;
+ yrs.yarg.rsp_policy = &netdev_page_pool_stats_nest;
+
+ if (req->_present.info)
+ netdev_page_pool_info_put(nlh, NETDEV_A_PAGE_POOL_STATS_INFO, &req->info);
+
+ rsp = calloc(1, sizeof(*rsp));
+ yrs.yarg.data = rsp;
+ yrs.cb = netdev_page_pool_stats_get_rsp_parse;
+ yrs.rsp_cmd = NETDEV_CMD_PAGE_POOL_STATS_GET;
+
+ err = ynl_exec(ys, nlh, &yrs);
+ if (err < 0)
+ goto err_free;
+
+ return rsp;
+
+err_free:
+ netdev_page_pool_stats_get_rsp_free(rsp);
+ return NULL;
+}
+
+/* NETDEV_CMD_PAGE_POOL_STATS_GET - dump */
+void
+netdev_page_pool_stats_get_list_free(struct netdev_page_pool_stats_get_list *rsp)
+{
+ struct netdev_page_pool_stats_get_list *next = rsp;
+
+ while ((void *)next != YNL_LIST_END) {
+ rsp = next;
+ next = rsp->next;
+
+ netdev_page_pool_info_free(&rsp->obj.info);
+ free(rsp);
+ }
+}
+
+struct netdev_page_pool_stats_get_list *
+netdev_page_pool_stats_get_dump(struct ynl_sock *ys)
+{
+ struct ynl_dump_state yds = {};
+ struct nlmsghdr *nlh;
+ int err;
+
+ yds.ys = ys;
+ yds.alloc_sz = sizeof(struct netdev_page_pool_stats_get_list);
+ yds.cb = netdev_page_pool_stats_get_rsp_parse;
+ yds.rsp_cmd = NETDEV_CMD_PAGE_POOL_STATS_GET;
+ yds.rsp_policy = &netdev_page_pool_stats_nest;
+
+ nlh = ynl_gemsg_start_dump(ys, ys->family_id, NETDEV_CMD_PAGE_POOL_STATS_GET, 1);
+
+ err = ynl_exec_dump(ys, nlh, &yds);
+ if (err < 0)
+ goto free_list;
+
+ return yds.first;
+
+free_list:
+ netdev_page_pool_stats_get_list_free(yds.first);
+ return NULL;
+}
+
static const struct ynl_ntf_info netdev_ntf_info[] = {
[NETDEV_CMD_DEV_ADD_NTF] = {
.alloc_sz = sizeof(struct netdev_dev_get_ntf),
@@ -197,6 +594,24 @@ static const struct ynl_ntf_info netdev_ntf_info[] = {
.policy = &netdev_dev_nest,
.free = (void *)netdev_dev_get_ntf_free,
},
+ [NETDEV_CMD_PAGE_POOL_ADD_NTF] = {
+ .alloc_sz = sizeof(struct netdev_page_pool_get_ntf),
+ .cb = netdev_page_pool_get_rsp_parse,
+ .policy = &netdev_page_pool_nest,
+ .free = (void *)netdev_page_pool_get_ntf_free,
+ },
+ [NETDEV_CMD_PAGE_POOL_DEL_NTF] = {
+ .alloc_sz = sizeof(struct netdev_page_pool_get_ntf),
+ .cb = netdev_page_pool_get_rsp_parse,
+ .policy = &netdev_page_pool_nest,
+ .free = (void *)netdev_page_pool_get_ntf_free,
+ },
+ [NETDEV_CMD_PAGE_POOL_CHANGE_NTF] = {
+ .alloc_sz = sizeof(struct netdev_page_pool_get_ntf),
+ .cb = netdev_page_pool_get_rsp_parse,
+ .policy = &netdev_page_pool_nest,
+ .free = (void *)netdev_page_pool_get_ntf_free,
+ },
};
const struct ynl_family ynl_netdev_family = {
diff --git a/tools/net/ynl/generated/netdev-user.h b/tools/net/ynl/generated/netdev-user.h
index 0952d3261f4d..4462dbe1369b 100644
--- a/tools/net/ynl/generated/netdev-user.h
+++ b/tools/net/ynl/generated/netdev-user.h
@@ -20,6 +20,16 @@ const char *netdev_op_str(int op);
const char *netdev_xdp_act_str(enum netdev_xdp_act value);
/* Common nested types */
+struct netdev_page_pool_info {
+ struct {
+ __u32 id:1;
+ __u32 ifindex:1;
+ } _present;
+
+ __u32 id;
+ __u32 ifindex;
+};
+
/* ============== NETDEV_CMD_DEV_GET ============== */
/* NETDEV_CMD_DEV_GET - do */
struct netdev_dev_get_req {
@@ -84,4 +94,163 @@ struct netdev_dev_get_ntf {
void netdev_dev_get_ntf_free(struct netdev_dev_get_ntf *rsp);
+/* ============== NETDEV_CMD_PAGE_POOL_GET ============== */
+/* NETDEV_CMD_PAGE_POOL_GET - do */
+struct netdev_page_pool_get_req {
+ struct {
+ __u32 id:1;
+ } _present;
+
+ __u32 id;
+};
+
+static inline struct netdev_page_pool_get_req *
+netdev_page_pool_get_req_alloc(void)
+{
+ return calloc(1, sizeof(struct netdev_page_pool_get_req));
+}
+void netdev_page_pool_get_req_free(struct netdev_page_pool_get_req *req);
+
+static inline void
+netdev_page_pool_get_req_set_id(struct netdev_page_pool_get_req *req, __u32 id)
+{
+ req->_present.id = 1;
+ req->id = id;
+}
+
+struct netdev_page_pool_get_rsp {
+ struct {
+ __u32 id:1;
+ __u32 ifindex:1;
+ __u32 napi_id:1;
+ __u32 destroyed:1;
+ __u32 inflight:1;
+ } _present;
+
+ __u32 id;
+ __u32 ifindex;
+ __u32 napi_id;
+ __u64 destroyed;
+ __u32 inflight;
+};
+
+void netdev_page_pool_get_rsp_free(struct netdev_page_pool_get_rsp *rsp);
+
+/*
+ * Get / dump information about Page Pools.
+ * (Only Page Pools associated with a net_device can be listed.)
+ *
+ */
+struct netdev_page_pool_get_rsp *
+netdev_page_pool_get(struct ynl_sock *ys, struct netdev_page_pool_get_req *req);
+
+/* NETDEV_CMD_PAGE_POOL_GET - dump */
+struct netdev_page_pool_get_list {
+ struct netdev_page_pool_get_list *next;
+ struct netdev_page_pool_get_rsp obj __attribute__ ((aligned (8)));
+};
+
+void netdev_page_pool_get_list_free(struct netdev_page_pool_get_list *rsp);
+
+struct netdev_page_pool_get_list *
+netdev_page_pool_get_dump(struct ynl_sock *ys);
+
+/* NETDEV_CMD_PAGE_POOL_GET - notify */
+struct netdev_page_pool_get_ntf {
+ __u16 family;
+ __u8 cmd;
+ struct ynl_ntf_base_type *next;
+ void (*free)(struct netdev_page_pool_get_ntf *ntf);
+ struct netdev_page_pool_get_rsp obj __attribute__ ((aligned (8)));
+};
+
+void netdev_page_pool_get_ntf_free(struct netdev_page_pool_get_ntf *rsp);
+
+/* ============== NETDEV_CMD_PAGE_POOL_STATS_GET ============== */
+/* NETDEV_CMD_PAGE_POOL_STATS_GET - do */
+struct netdev_page_pool_stats_get_req {
+ struct {
+ __u32 info:1;
+ } _present;
+
+ struct netdev_page_pool_info info;
+};
+
+static inline struct netdev_page_pool_stats_get_req *
+netdev_page_pool_stats_get_req_alloc(void)
+{
+ return calloc(1, sizeof(struct netdev_page_pool_stats_get_req));
+}
+void
+netdev_page_pool_stats_get_req_free(struct netdev_page_pool_stats_get_req *req);
+
+static inline void
+netdev_page_pool_stats_get_req_set_info_id(struct netdev_page_pool_stats_get_req *req,
+ __u32 id)
+{
+ req->_present.info = 1;
+ req->info._present.id = 1;
+ req->info.id = id;
+}
+static inline void
+netdev_page_pool_stats_get_req_set_info_ifindex(struct netdev_page_pool_stats_get_req *req,
+ __u32 ifindex)
+{
+ req->_present.info = 1;
+ req->info._present.ifindex = 1;
+ req->info.ifindex = ifindex;
+}
+
+struct netdev_page_pool_stats_get_rsp {
+ struct {
+ __u32 info:1;
+ __u32 alloc_fast:1;
+ __u32 alloc_slow:1;
+ __u32 alloc_slow_high_order:1;
+ __u32 alloc_empty:1;
+ __u32 alloc_refill:1;
+ __u32 alloc_waive:1;
+ __u32 recycle_cached:1;
+ __u32 recycle_cache_full:1;
+ __u32 recycle_ring:1;
+ __u32 recycle_ring_full:1;
+ __u32 recycle_released_refcnt:1;
+ } _present;
+
+ struct netdev_page_pool_info info;
+ __u64 alloc_fast;
+ __u64 alloc_slow;
+ __u64 alloc_slow_high_order;
+ __u64 alloc_empty;
+ __u64 alloc_refill;
+ __u64 alloc_waive;
+ __u64 recycle_cached;
+ __u64 recycle_cache_full;
+ __u64 recycle_ring;
+ __u64 recycle_ring_full;
+ __u64 recycle_released_refcnt;
+};
+
+void
+netdev_page_pool_stats_get_rsp_free(struct netdev_page_pool_stats_get_rsp *rsp);
+
+/*
+ * Get page pool statistics.
+ */
+struct netdev_page_pool_stats_get_rsp *
+netdev_page_pool_stats_get(struct ynl_sock *ys,
+ struct netdev_page_pool_stats_get_req *req);
+
+/* NETDEV_CMD_PAGE_POOL_STATS_GET - dump */
+struct netdev_page_pool_stats_get_list {
+ struct netdev_page_pool_stats_get_list *next;
+ struct netdev_page_pool_stats_get_rsp obj __attribute__ ((aligned (8)));
+};
+
+void
+netdev_page_pool_stats_get_list_free(struct netdev_page_pool_stats_get_list *rsp);
+
+struct netdev_page_pool_stats_get_list *
+netdev_page_pool_stats_get_dump(struct ynl_sock *ys);
+
#endif /* _LINUX_NETDEV_GEN_H */
--
2.41.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [RFC net-next 05/13] net: page_pool: record pools per netdev
2023-08-16 23:42 ` [RFC net-next 05/13] net: page_pool: record pools per netdev Jakub Kicinski
@ 2023-08-17 7:26 ` Simon Horman
2023-08-17 16:22 ` Jakub Kicinski
0 siblings, 1 reply; 29+ messages in thread
From: Simon Horman @ 2023-08-17 7:26 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, hawk, ilias.apalodimas, aleksander.lobakin, linyunsheng,
almasrymina
On Wed, Aug 16, 2023 at 04:42:54PM -0700, Jakub Kicinski wrote:
> Link the page pools with netdevs. This needs to be netns compatible
> so we have two options. Either we record the pools per netns and
> have to worry about moving them as the netdev gets moved.
> Or we record them directly on the netdev so they move with the netdev
> without any extra work.
>
> Implement the latter option. Since pools may outlast netdev we need
> a place to store orphans. In time honored tradition use loopback
> for this purpose.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
...
> diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
...
> +static void page_pool_unreg_netdev(struct net_device *netdev)
> +{
> + struct page_pool *pool, *last;
> + struct net_device *lo;
> +
> + lo = __dev_get_by_index(dev_net(netdev), 1);
> + if (!lo) {
> + netdev_err_once(netdev,
> + "can't get lo to store orphan page pools\n");
> + page_pool_unreg_netdev_wipe(netdev);
> + return;
> + }
> +
> + mutex_lock(&page_pools_lock);
> + hlist_for_each_entry(pool, &netdev->page_pools, user.list) {
> + pool->slow.netdev = lo;
> + last = pool;
> + }
> +
> + hlist_splice_init(&netdev->page_pools, &last->user.list,
Hi Jakub.
I'm not sure if it is possible, but if the hlist loop above iterates zero
times then last will be uninitialised here.
Flagged by Smatch.
> + &lo->page_pools);
> + mutex_unlock(&page_pools_lock);
> +}
...
* Re: [RFC net-next 03/13] net: page_pool: factor out uninit
2023-08-16 23:42 ` [RFC net-next 03/13] net: page_pool: factor out uninit Jakub Kicinski
@ 2023-08-17 7:40 ` Ilias Apalodimas
2023-08-17 16:25 ` Jakub Kicinski
0 siblings, 1 reply; 29+ messages in thread
From: Ilias Apalodimas @ 2023-08-17 7:40 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: netdev, hawk, aleksander.lobakin, linyunsheng, almasrymina
Hi Jakub,
[...]
> }
>
> +static void page_pool_uninit(struct page_pool *pool)
> +{
> + ptr_ring_cleanup(&pool->ring, NULL);
> +
> + if (pool->p.flags & PP_FLAG_DMA_MAP)
> + put_device(pool->p.dev);
> +
> +#ifdef CONFIG_PAGE_POOL_STATS
> + free_percpu(pool->recycle_stats);
> +#endif
> +}
I am not sure I am following the reasoning here. The only extra thing
page_pool_free() does is disconnect the pool. So I assume no one will
call page_pool_uninit() directly. Do you expect page_pool_free() to
grow in the future, so factoring out the uninit makes the code easier
to read?
Thanks
/Ilias
> +
> /**
> * page_pool_create() - create a page pool.
> * @params: parameters, see struct page_pool_params
> @@ -806,14 +818,7 @@ static void page_pool_free(struct page_pool *pool)
> if (pool->disconnect)
> pool->disconnect(pool);
>
> - ptr_ring_cleanup(&pool->ring, NULL);
> -
> - if (pool->p.flags & PP_FLAG_DMA_MAP)
> - put_device(pool->p.dev);
> -
> -#ifdef CONFIG_PAGE_POOL_STATS
> - free_percpu(pool->recycle_stats);
> -#endif
> + page_pool_uninit(pool);
> kfree(pool);
> }
>
> --
> 2.41.0
>
* Re: [RFC net-next 01/13] net: page_pool: split the page_pool_params into fast and slow
2023-08-16 23:42 ` [RFC net-next 01/13] net: page_pool: split the page_pool_params into fast and slow Jakub Kicinski
@ 2023-08-17 9:31 ` Jesper Dangaard Brouer
2023-08-17 9:35 ` Ilias Apalodimas
2023-08-17 21:28 ` Mina Almasry
2 siblings, 0 replies; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2023-08-17 9:31 UTC (permalink / raw)
To: Jakub Kicinski, netdev
Cc: hawk, ilias.apalodimas, aleksander.lobakin, linyunsheng,
almasrymina
On 17/08/2023 01.42, Jakub Kicinski wrote:
> struct page_pool is rather performance critical and we use
> 16B of the first cache line to store 2 pointers used only
> by test code. Future patches will add more informational
> (non-fast path) attributes.
>
> It's convenient for the user of the API to not have to worry
> which fields are fast and which are slow path. Use struct
> groups to split the params into the two categories internally.
>
> Signed-off-by: Jakub Kicinski<kuba@kernel.org>
> ---
> include/net/page_pool/types.h | 31 +++++++++++++++++++------------
> net/core/page_pool.c | 7 ++++---
> 2 files changed, 23 insertions(+), 15 deletions(-)
With changes in 2/13 where you add a bool to avoid accessing "slow":
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
* Re: [RFC net-next 02/13] net: page_pool: avoid touching slow on the fastpath
2023-08-16 23:42 ` [RFC net-next 02/13] net: page_pool: avoid touching slow on the fastpath Jakub Kicinski
@ 2023-08-17 9:32 ` Jesper Dangaard Brouer
2023-08-17 9:38 ` Ilias Apalodimas
2023-08-17 21:31 ` Mina Almasry
2 siblings, 0 replies; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2023-08-17 9:32 UTC (permalink / raw)
To: Jakub Kicinski, netdev
Cc: hawk, ilias.apalodimas, aleksander.lobakin, linyunsheng,
almasrymina
On 17/08/2023 01.42, Jakub Kicinski wrote:
> To fully benefit from previous commit add one byte of state
> in the first cache line recording if we need to look at
> the slow part.
>
> The packing isn't all that impressive right now, we create
> a 7B hole. I'm expecting Olek's rework will reshuffle this,
> anyway.
>
> Signed-off-by: Jakub Kicinski<kuba@kernel.org>
> ---
> include/net/page_pool/types.h | 2 ++
> net/core/page_pool.c | 4 +++-
> 2 files changed, 5 insertions(+), 1 deletion(-)
As followup to 1/13 LGTM
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
* Re: [RFC net-next 01/13] net: page_pool: split the page_pool_params into fast and slow
2023-08-16 23:42 ` [RFC net-next 01/13] net: page_pool: split the page_pool_params into fast and slow Jakub Kicinski
2023-08-17 9:31 ` Jesper Dangaard Brouer
@ 2023-08-17 9:35 ` Ilias Apalodimas
2023-08-17 21:28 ` Mina Almasry
2 siblings, 0 replies; 29+ messages in thread
From: Ilias Apalodimas @ 2023-08-17 9:35 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: netdev, hawk, aleksander.lobakin, linyunsheng, almasrymina
Hi Jakub,
On Thu, 17 Aug 2023 at 02:43, Jakub Kicinski <kuba@kernel.org> wrote:
>
> struct page_pool is rather performance critical and we use
> 16B of the first cache line to store 2 pointers used only
> by test code. Future patches will add more informational
> (non-fast path) attributes.
>
> It's convenient for the user of the API to not have to worry
> which fields are fast and which are slow path. Use struct
> groups to split the params into the two categories internally.
LGTM, and valuable: we've been struggling to explain where new
variables should be placed so that they affect the cache line
layout as little as possible.
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> include/net/page_pool/types.h | 31 +++++++++++++++++++------------
> net/core/page_pool.c | 7 ++++---
> 2 files changed, 23 insertions(+), 15 deletions(-)
>
> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index 887e7946a597..1c16b95de62f 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -56,18 +56,22 @@ struct pp_alloc_cache {
> * @offset: DMA sync address offset for PP_FLAG_DMA_SYNC_DEV
> */
> struct page_pool_params {
> - unsigned int flags;
> - unsigned int order;
> - unsigned int pool_size;
> - int nid;
> - struct device *dev;
> - struct napi_struct *napi;
> - enum dma_data_direction dma_dir;
> - unsigned int max_len;
> - unsigned int offset;
> + struct_group_tagged(page_pool_params_fast, fast,
> + unsigned int flags;
> + unsigned int order;
> + unsigned int pool_size;
> + int nid;
> + struct device *dev;
> + struct napi_struct *napi;
> + enum dma_data_direction dma_dir;
> + unsigned int max_len;
> + unsigned int offset;
> + );
> + struct_group_tagged(page_pool_params_slow, slow,
> /* private: used by test code only */
> - void (*init_callback)(struct page *page, void *arg);
> - void *init_arg;
> + void (*init_callback)(struct page *page, void *arg);
> + void *init_arg;
> + );
> };
>
> #ifdef CONFIG_PAGE_POOL_STATS
> @@ -121,7 +125,7 @@ struct page_pool_stats {
> #endif
>
> struct page_pool {
> - struct page_pool_params p;
> + struct page_pool_params_fast p;
>
> long frag_users;
> struct page *frag_page;
> @@ -180,6 +184,9 @@ struct page_pool {
> refcount_t user_cnt;
>
> u64 destroy_cnt;
> +
> + /* Slow/Control-path information follows */
> + struct page_pool_params_slow slow;
> };
>
> struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp);
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 77cb75e63aca..ffe7782d7fc0 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -173,7 +173,8 @@ static int page_pool_init(struct page_pool *pool,
> {
> unsigned int ring_qsize = 1024; /* Default */
>
> - memcpy(&pool->p, params, sizeof(pool->p));
> + memcpy(&pool->p, ¶ms->fast, sizeof(pool->p));
> + memcpy(&pool->slow, ¶ms->slow, sizeof(pool->slow));
>
> /* Validate only known flags were used */
> if (pool->p.flags & ~(PP_FLAG_ALL))
> @@ -372,8 +373,8 @@ static void page_pool_set_pp_info(struct page_pool *pool,
> {
> page->pp = pool;
> page->pp_magic |= PP_SIGNATURE;
> - if (pool->p.init_callback)
> - pool->p.init_callback(page, pool->p.init_arg);
> + if (pool->slow.init_callback)
> + pool->slow.init_callback(page, pool->slow.init_arg);
> }
>
> static void page_pool_clear_pp_info(struct page *page)
> --
> 2.41.0
>
* Re: [RFC net-next 02/13] net: page_pool: avoid touching slow on the fastpath
2023-08-16 23:42 ` [RFC net-next 02/13] net: page_pool: avoid touching slow on the fastpath Jakub Kicinski
2023-08-17 9:32 ` Jesper Dangaard Brouer
@ 2023-08-17 9:38 ` Ilias Apalodimas
2023-08-17 21:31 ` Mina Almasry
2 siblings, 0 replies; 29+ messages in thread
From: Ilias Apalodimas @ 2023-08-17 9:38 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: netdev, hawk, aleksander.lobakin, linyunsheng, almasrymina
On Thu, 17 Aug 2023 at 02:43, Jakub Kicinski <kuba@kernel.org> wrote:
>
> To fully benefit from previous commit add one byte of state
> in the first cache line recording if we need to look at
> the slow part.
>
> The packing isn't all that impressive right now, we create
> a 7B hole. I'm expecting Olek's rework will reshuffle this,
> anyway.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> include/net/page_pool/types.h | 2 ++
> net/core/page_pool.c | 4 +++-
> 2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index 1c16b95de62f..1ac7ce25fbd4 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -127,6 +127,8 @@ struct page_pool_stats {
> struct page_pool {
> struct page_pool_params_fast p;
>
> + bool has_init_callback;
> +
> long frag_users;
> struct page *frag_page;
> unsigned int frag_offset;
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index ffe7782d7fc0..2c14445a353a 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -216,6 +216,8 @@ static int page_pool_init(struct page_pool *pool,
> pool->p.flags & PP_FLAG_PAGE_FRAG)
> return -EINVAL;
>
> + pool->has_init_callback = !!pool->slow.init_callback;
> +
> #ifdef CONFIG_PAGE_POOL_STATS
> pool->recycle_stats = alloc_percpu(struct page_pool_recycle_stats);
> if (!pool->recycle_stats)
> @@ -373,7 +375,7 @@ static void page_pool_set_pp_info(struct page_pool *pool,
> {
> page->pp = pool;
> page->pp_magic |= PP_SIGNATURE;
> - if (pool->slow.init_callback)
> + if (pool->has_init_callback)
> pool->slow.init_callback(page, pool->slow.init_arg);
> }
>
> --
> 2.41.0
>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
* Re: [RFC net-next 05/13] net: page_pool: record pools per netdev
2023-08-17 7:26 ` Simon Horman
@ 2023-08-17 16:22 ` Jakub Kicinski
0 siblings, 0 replies; 29+ messages in thread
From: Jakub Kicinski @ 2023-08-17 16:22 UTC (permalink / raw)
To: Simon Horman
Cc: netdev, hawk, ilias.apalodimas, aleksander.lobakin, linyunsheng,
almasrymina
On Thu, 17 Aug 2023 09:26:09 +0200 Simon Horman wrote:
> I'm not sure if it is possible, but if the hlist loop above iterates zero
> times then last will be uninitialised here.
>
> Flagged by Smatch.
Hm, the caller checks if the list is empty but there may be a race
condition since we don't hold the lock yet. Thanks for flagging!
* Re: [RFC net-next 03/13] net: page_pool: factor out uninit
2023-08-17 7:40 ` Ilias Apalodimas
@ 2023-08-17 16:25 ` Jakub Kicinski
2023-08-17 16:53 ` Ilias Apalodimas
0 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2023-08-17 16:25 UTC (permalink / raw)
To: Ilias Apalodimas
Cc: netdev, hawk, aleksander.lobakin, linyunsheng, almasrymina
On Thu, 17 Aug 2023 10:40:09 +0300 Ilias Apalodimas wrote:
> > +static void page_pool_uninit(struct page_pool *pool)
> > +{
> > + ptr_ring_cleanup(&pool->ring, NULL);
> > +
> > + if (pool->p.flags & PP_FLAG_DMA_MAP)
> > + put_device(pool->p.dev);
> > +
> > +#ifdef CONFIG_PAGE_POOL_STATS
> > + free_percpu(pool->recycle_stats);
> > +#endif
> > +}
>
> I am not sure I am following the reasoning here. The only extra thing
> page_pool_free() does is disconnect the pool. So I assume no one will
> call page_pool_uninit() directly. Do you expect page_pool_free() to
> grow in the future, so factoring out the uninit makes the code easier
> to read?
I'm calling it from the unwind path of page_pool_create() in the next
patch, because I'm adding another setup state after page_pool_init().
I can't put the free into _uninit() because on the unwind path of
_create() that's an individual step.
* Re: [RFC net-next 03/13] net: page_pool: factor out uninit
2023-08-17 16:25 ` Jakub Kicinski
@ 2023-08-17 16:53 ` Ilias Apalodimas
0 siblings, 0 replies; 29+ messages in thread
From: Ilias Apalodimas @ 2023-08-17 16:53 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: netdev, hawk, aleksander.lobakin, linyunsheng, almasrymina
On Thu, 17 Aug 2023 at 19:25, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 17 Aug 2023 10:40:09 +0300 Ilias Apalodimas wrote:
> > > +static void page_pool_uninit(struct page_pool *pool)
> > > +{
> > > + ptr_ring_cleanup(&pool->ring, NULL);
> > > +
> > > + if (pool->p.flags & PP_FLAG_DMA_MAP)
> > > + put_device(pool->p.dev);
> > > +
> > > +#ifdef CONFIG_PAGE_POOL_STATS
> > > + free_percpu(pool->recycle_stats);
> > > +#endif
> > > +}
> >
> > I am not sure I am following the reasoning here. The only extra thing
> > page_pool_free() does is disconnect the pool. So I assume no one will
> > call page_pool_uninit() directly. Do you expect page_pool_free() to
> > grow in the future, so factoring out the uninit makes the code easier
> > to read?
>
> I'm calling it from the unwind path of page_pool_create() in the next
> patch, because I'm adding another setup state after page_pool_init().
> I can't put the free into _uninit() because on the unwind path of
> _create() that's an individual step.
Yeah, fair enough. I went through that patch a few minutes ago, so this
one makes sense.
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
* Re: [RFC net-next 00/13] net: page_pool: add netlink-based introspection
2023-08-16 23:42 [RFC net-next 00/13] net: page_pool: add netlink-based introspection Jakub Kicinski
` (12 preceding siblings ...)
2023-08-16 23:43 ` [RFC net-next 13/13] tools: netdev: regen after page pool changes Jakub Kicinski
@ 2023-08-17 21:21 ` Mina Almasry
2023-08-18 0:13 ` Jakub Kicinski
13 siblings, 1 reply; 29+ messages in thread
From: Mina Almasry @ 2023-08-17 21:21 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, hawk, ilias.apalodimas, aleksander.lobakin, linyunsheng,
Willem de Bruijn, Praveen Kaligineedi
On Wed, Aug 16, 2023 at 4:43 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> As the page pool functionality grows we will be increasingly
> constrained by the lack of any uAPI. This series is a first
> step towards such a uAPI, it creates a way for users to query
> information about page pools and associated statistics.
>
> I'm purposefully exposing only the information which I'd
> find useful when monitoring production workloads.
>
> For the SET part (which to be clear isn't addressed by this
> series at all) I think we'll need to turn to something more
> along the lines of "netdev-level policy". Instead of configuring
> page pools, which are somewhat ephemeral, and hard to change
> at runtime, we should set the "expected page pool parameters"
> at the netdev level, and have the driver consult those when
> instantiating pools. My ramblings from yesterday about Queue API
> may make this more or less clear...
> https://lore.kernel.org/all/20230815171638.4c057dcd@kernel.org/
>
The patches themselves look good to me, and I'll provide Reviewed-by
for the ones I feel I understand well enough to review, but I'm a bit
unsure about exposing specifically page_pool stats to the user. Isn't
it better to expose rx-queue stats (slightly more generic) to the
user? In my mind, the advantages:
- we could implement better SET apis that allocate, delete, or
reconfigure rx-queues. APIs that allocate or delete page-pool make
less sense semantically maybe? The page-pool doesn't decide to
instantiate itself.
- The api can be extended for non-page-pool stats (per rx-queue
dropped packets maybe, or something like that).
- The api maybe can be extended to non-page-pool drivers. The driver
may be able to implement their own function to provide equivalent
stats (although this one may not be that important).
- rx-queue GET API fits in nicely with what you described yesterday
[1]. At the moment I'm a bit unsure because the SET api you described
yesterday sounded per-rx-queue to me. But the GET api here is
per-page-pool based. Maybe the distinction doesn't matter? Maybe
you're thinking they're unrelated APIs?
[1] https://lore.kernel.org/all/20230815171638.4c057dcd@kernel.org/
> Jakub Kicinski (13):
> net: page_pool: split the page_pool_params into fast and slow
> net: page_pool: avoid touching slow on the fastpath
> net: page_pool: factor out uninit
> net: page_pool: id the page pools
> net: page_pool: record pools per netdev
> net: page_pool: stash the NAPI ID for easier access
> eth: link netdev to pp
> net: page_pool: add nlspec for basic access to page pools
> net: page_pool: implement GET in the netlink API
> net: page_pool: add netlink notifications for state changes
> net: page_pool: report when page pool was destroyed
> net: page_pool: expose page pool stats via netlink
> tools: netdev: regen after page pool changes
>
> Documentation/netlink/specs/netdev.yaml | 158 +++++++
> Documentation/networking/page_pool.rst | 5 +-
> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 1 +
> .../net/ethernet/mellanox/mlx5/core/en_main.c | 1 +
> drivers/net/ethernet/microsoft/mana/mana_en.c | 1 +
> include/linux/list.h | 20 +
> include/linux/netdevice.h | 4 +
> include/net/page_pool/helpers.h | 8 +-
> include/net/page_pool/types.h | 43 +-
> include/uapi/linux/netdev.h | 37 ++
> net/core/Makefile | 2 +-
> net/core/netdev-genl-gen.c | 41 ++
> net/core/netdev-genl-gen.h | 11 +
> net/core/page_pool.c | 56 ++-
> net/core/page_pool_priv.h | 12 +
> net/core/page_pool_user.c | 409 +++++++++++++++++
> tools/include/uapi/linux/netdev.h | 37 ++
> tools/net/ynl/generated/netdev-user.c | 415 ++++++++++++++++++
> tools/net/ynl/generated/netdev-user.h | 169 +++++++
> 19 files changed, 1391 insertions(+), 39 deletions(-)
> create mode 100644 net/core/page_pool_priv.h
> create mode 100644 net/core/page_pool_user.c
>
> --
> 2.41.0
>
--
Thanks,
Mina
* Re: [RFC net-next 01/13] net: page_pool: split the page_pool_params into fast and slow
2023-08-16 23:42 ` [RFC net-next 01/13] net: page_pool: split the page_pool_params into fast and slow Jakub Kicinski
2023-08-17 9:31 ` Jesper Dangaard Brouer
2023-08-17 9:35 ` Ilias Apalodimas
@ 2023-08-17 21:28 ` Mina Almasry
2 siblings, 0 replies; 29+ messages in thread
From: Mina Almasry @ 2023-08-17 21:28 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, hawk, ilias.apalodimas, aleksander.lobakin, linyunsheng
On Wed, Aug 16, 2023 at 4:43 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> struct page_pool is rather performance critical and we use
> 16B of the first cache line to store 2 pointers used only
> by test code. Future patches will add more informational
> (non-fast path) attributes.
>
> It's convenient for the user of the API to not have to worry
> which fields are fast and which are slow path. Use struct
> groups to split the params into the two categories internally.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
> ---
> include/net/page_pool/types.h | 31 +++++++++++++++++++------------
> net/core/page_pool.c | 7 ++++---
> 2 files changed, 23 insertions(+), 15 deletions(-)
>
> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index 887e7946a597..1c16b95de62f 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -56,18 +56,22 @@ struct pp_alloc_cache {
> * @offset: DMA sync address offset for PP_FLAG_DMA_SYNC_DEV
> */
> struct page_pool_params {
> - unsigned int flags;
> - unsigned int order;
> - unsigned int pool_size;
> - int nid;
> - struct device *dev;
> - struct napi_struct *napi;
> - enum dma_data_direction dma_dir;
> - unsigned int max_len;
> - unsigned int offset;
> + struct_group_tagged(page_pool_params_fast, fast,
> + unsigned int flags;
> + unsigned int order;
> + unsigned int pool_size;
> + int nid;
> + struct device *dev;
> + struct napi_struct *napi;
> + enum dma_data_direction dma_dir;
> + unsigned int max_len;
> + unsigned int offset;
> + );
> + struct_group_tagged(page_pool_params_slow, slow,
> /* private: used by test code only */
> - void (*init_callback)(struct page *page, void *arg);
> - void *init_arg;
> + void (*init_callback)(struct page *page, void *arg);
> + void *init_arg;
> + );
> };
>
> #ifdef CONFIG_PAGE_POOL_STATS
> @@ -121,7 +125,7 @@ struct page_pool_stats {
> #endif
>
> struct page_pool {
> - struct page_pool_params p;
> + struct page_pool_params_fast p;
>
> long frag_users;
> struct page *frag_page;
> @@ -180,6 +184,9 @@ struct page_pool {
> refcount_t user_cnt;
>
> u64 destroy_cnt;
> +
> + /* Slow/Control-path information follows */
> + struct page_pool_params_slow slow;
> };
>
> struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp);
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 77cb75e63aca..ffe7782d7fc0 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -173,7 +173,8 @@ static int page_pool_init(struct page_pool *pool,
> {
> unsigned int ring_qsize = 1024; /* Default */
>
> - memcpy(&pool->p, params, sizeof(pool->p));
> + memcpy(&pool->p, ¶ms->fast, sizeof(pool->p));
> + memcpy(&pool->slow, ¶ms->slow, sizeof(pool->slow));
>
> /* Validate only known flags were used */
> if (pool->p.flags & ~(PP_FLAG_ALL))
> @@ -372,8 +373,8 @@ static void page_pool_set_pp_info(struct page_pool *pool,
> {
> page->pp = pool;
> page->pp_magic |= PP_SIGNATURE;
> - if (pool->p.init_callback)
> - pool->p.init_callback(page, pool->p.init_arg);
> + if (pool->slow.init_callback)
> + pool->slow.init_callback(page, pool->slow.init_arg);
> }
>
> static void page_pool_clear_pp_info(struct page *page)
> --
> 2.41.0
>
--
Thanks,
Mina
* Re: [RFC net-next 02/13] net: page_pool: avoid touching slow on the fastpath
2023-08-16 23:42 ` [RFC net-next 02/13] net: page_pool: avoid touching slow on the fastpath Jakub Kicinski
2023-08-17 9:32 ` Jesper Dangaard Brouer
2023-08-17 9:38 ` Ilias Apalodimas
@ 2023-08-17 21:31 ` Mina Almasry
2 siblings, 0 replies; 29+ messages in thread
From: Mina Almasry @ 2023-08-17 21:31 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, hawk, ilias.apalodimas, aleksander.lobakin, linyunsheng
On Wed, Aug 16, 2023 at 4:43 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> To fully benefit from previous commit add one byte of state
> in the first cache line recording if we need to look at
> the slow part.
>
> The packing isn't all that impressive right now, we create
> a 7B hole. I'm expecting Olek's rework will reshuffle this,
> anyway.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> include/net/page_pool/types.h | 2 ++
> net/core/page_pool.c | 4 +++-
> 2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index 1c16b95de62f..1ac7ce25fbd4 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -127,6 +127,8 @@ struct page_pool_stats {
> struct page_pool {
> struct page_pool_params_fast p;
>
> + bool has_init_callback;
> +
Nit: I wonder if it's slightly more appropriate for this to be
has_slow, given how I understand the intent of usage, but it doesn't
really matter. Either way:
Reviewed-by: Mina Almasry <almasrymina@google.com>
> long frag_users;
> struct page *frag_page;
> unsigned int frag_offset;
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index ffe7782d7fc0..2c14445a353a 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -216,6 +216,8 @@ static int page_pool_init(struct page_pool *pool,
> pool->p.flags & PP_FLAG_PAGE_FRAG)
> return -EINVAL;
>
> + pool->has_init_callback = !!pool->slow.init_callback;
> +
> #ifdef CONFIG_PAGE_POOL_STATS
> pool->recycle_stats = alloc_percpu(struct page_pool_recycle_stats);
> if (!pool->recycle_stats)
> @@ -373,7 +375,7 @@ static void page_pool_set_pp_info(struct page_pool *pool,
> {
> page->pp = pool;
> page->pp_magic |= PP_SIGNATURE;
> - if (pool->slow.init_callback)
> + if (pool->has_init_callback)
> pool->slow.init_callback(page, pool->slow.init_arg);
> }
>
> --
> 2.41.0
>
--
Thanks,
Mina
* Re: [RFC net-next 04/13] net: page_pool: id the page pools
2023-08-16 23:42 ` [RFC net-next 04/13] net: page_pool: id the page pools Jakub Kicinski
@ 2023-08-17 21:56 ` Mina Almasry
2023-08-18 0:08 ` Jakub Kicinski
0 siblings, 1 reply; 29+ messages in thread
From: Mina Almasry @ 2023-08-17 21:56 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, hawk, ilias.apalodimas, aleksander.lobakin, linyunsheng
On Wed, Aug 16, 2023 at 4:43 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> To give ourselves the flexibility of creating netlink commands
> and ability to refer to page pool instances in uAPIs create
> IDs for page pools.
>
Sorry maybe I'm missing something, but it's a bit curious to me that
this ID is needed. An rx queue can only ever have 1 page-pool
associated with it at a time, right? Could you instead add a pointer
to the page_pool into 'struct netdev_rx_queue', and then page-pool can
be referred to by the netdev id & the rx-queue number? Wouldn't that
make the implementation much simpler? I also believe the userspace
refers to the rx-queue by its index number for the ethtool APIs like
adding flow steering rules, so extending that to here maybe makes
sense.
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> include/net/page_pool/types.h | 4 ++++
> net/core/Makefile | 2 +-
> net/core/page_pool.c | 21 +++++++++++++++-----
> net/core/page_pool_priv.h | 9 +++++++++
> net/core/page_pool_user.c | 36 +++++++++++++++++++++++++++++++++++
> 5 files changed, 66 insertions(+), 6 deletions(-)
> create mode 100644 net/core/page_pool_priv.h
> create mode 100644 net/core/page_pool_user.c
>
> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index 1ac7ce25fbd4..9fadf15dadfa 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -189,6 +189,10 @@ struct page_pool {
>
> /* Slow/Control-path information follows */
> struct page_pool_params_slow slow;
> + /* User-facing fields, protected by page_pools_lock */
> + struct {
> + u32 id;
> + } user;
> };
>
> struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp);
> diff --git a/net/core/Makefile b/net/core/Makefile
> index 731db2eaa610..4ae3d83f67d5 100644
> --- a/net/core/Makefile
> +++ b/net/core/Makefile
> @@ -18,7 +18,7 @@ obj-y += dev.o dev_addr_lists.o dst.o netevent.o \
> obj-$(CONFIG_NETDEV_ADDR_LIST_TEST) += dev_addr_lists_test.o
>
> obj-y += net-sysfs.o
> -obj-$(CONFIG_PAGE_POOL) += page_pool.o
> +obj-$(CONFIG_PAGE_POOL) += page_pool.o page_pool_user.o
> obj-$(CONFIG_PROC_FS) += net-procfs.o
> obj-$(CONFIG_NET_PKTGEN) += pktgen.o
> obj-$(CONFIG_NETPOLL) += netpoll.o
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 8e71e116224d..de199c356043 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -23,6 +23,8 @@
>
> #include <trace/events/page_pool.h>
>
> +#include "page_pool_priv.h"
> +
> #define DEFER_TIME (msecs_to_jiffies(1000))
> #define DEFER_WARN_INTERVAL (60 * HZ)
>
> @@ -264,13 +266,21 @@ struct page_pool *page_pool_create(const struct page_pool_params *params)
> return ERR_PTR(-ENOMEM);
>
> err = page_pool_init(pool, params);
> - if (err < 0) {
> - pr_warn("%s() gave up with errno %d\n", __func__, err);
> - kfree(pool);
> - return ERR_PTR(err);
> - }
> + if (err < 0)
> + goto err_free;
> +
> + err = page_pool_list(pool);
> + if (err)
> + goto err_uninit;
>
> return pool;
> +
> +err_uninit:
> + page_pool_uninit(pool);
> +err_free:
> + pr_warn("%s() gave up with errno %d\n", __func__, err);
> + kfree(pool);
> + return ERR_PTR(err);
> }
> EXPORT_SYMBOL(page_pool_create);
>
> @@ -818,6 +828,7 @@ static void page_pool_free(struct page_pool *pool)
> if (pool->disconnect)
> pool->disconnect(pool);
>
> + page_pool_unlist(pool);
> page_pool_uninit(pool);
> kfree(pool);
> }
> diff --git a/net/core/page_pool_priv.h b/net/core/page_pool_priv.h
> new file mode 100644
> index 000000000000..6c4e4aeed02a
> --- /dev/null
> +++ b/net/core/page_pool_priv.h
> @@ -0,0 +1,9 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#ifndef __PAGE_POOL_PRIV_H
> +#define __PAGE_POOL_PRIV_H
> +
> +int page_pool_list(struct page_pool *pool);
> +void page_pool_unlist(struct page_pool *pool);
> +
> +#endif
> diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
> new file mode 100644
> index 000000000000..af4ac38a2de1
> --- /dev/null
> +++ b/net/core/page_pool_user.c
> @@ -0,0 +1,36 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#include <linux/mutex.h>
> +#include <linux/xarray.h>
> +#include <net/page_pool/types.h>
> +
> +#include "page_pool_priv.h"
> +
> +static DEFINE_XARRAY_FLAGS(page_pools, XA_FLAGS_ALLOC1);
> +static DEFINE_MUTEX(page_pools_lock);
> +
> +int page_pool_list(struct page_pool *pool)
> +{
> + static u32 id_alloc_next;
> + int err;
> +
> + mutex_lock(&page_pools_lock);
> + err = xa_alloc_cyclic(&page_pools, &pool->user.id, pool, xa_limit_32b,
> + &id_alloc_next, GFP_KERNEL);
> + if (err < 0)
> + goto err_unlock;
> +
> + mutex_unlock(&page_pools_lock);
> + return 0;
> +
> +err_unlock:
> + mutex_unlock(&page_pools_lock);
> + return err;
> +}
> +
> +void page_pool_unlist(struct page_pool *pool)
> +{
> + mutex_lock(&page_pools_lock);
> + xa_erase(&page_pools, pool->user.id);
> + mutex_unlock(&page_pools_lock);
> +}
> --
> 2.41.0
>
--
Thanks,
Mina
* Re: [RFC net-next 04/13] net: page_pool: id the page pools
2023-08-17 21:56 ` Mina Almasry
@ 2023-08-18 0:08 ` Jakub Kicinski
0 siblings, 0 replies; 29+ messages in thread
From: Jakub Kicinski @ 2023-08-18 0:08 UTC (permalink / raw)
To: Mina Almasry
Cc: netdev, hawk, ilias.apalodimas, aleksander.lobakin, linyunsheng
On Thu, 17 Aug 2023 14:56:01 -0700 Mina Almasry wrote:
> Sorry maybe I'm missing something, but it's a bit curious to me that
> this ID is needed.
Right.. now I realize that I haven't explained the queue side.
I expect a similar API will be added to dump queue info, and then page
pool ID will be added to the queue object, to link which queue is being
fed from which page pool. I guess I should have said that in the cover
letter...
> An rx queue can only ever have 1 page-pool
> associated with it at a time, right? Could you instead add a pointer
> to the page_pool into 'struct netdev_rx_queue',
100% my intention, which is why I moved the location of that struct
recently :)
> and then page-pool can be referred to by the netdev id & the rx-queue
> number?
And use/type (headers vs payloads). And then nothing stops a driver from
using one page pool for multiple queues of the same type within a NAPI.
And then the page pools can die and linger (see patch 11).
> Wouldn't that make the implementation much simpler? I also believe
> the userspace refers to the rx-queue by its index number for the
> ethtool APIs like adding flow steering rules, so extending that to
> here maybe makes sense.
* Re: [RFC net-next 00/13] net: page_pool: add netlink-based introspection
2023-08-17 21:21 ` [RFC net-next 00/13] net: page_pool: add netlink-based introspection Mina Almasry
@ 2023-08-18 0:13 ` Jakub Kicinski
0 siblings, 0 replies; 29+ messages in thread
From: Jakub Kicinski @ 2023-08-18 0:13 UTC (permalink / raw)
To: Mina Almasry
Cc: netdev, hawk, ilias.apalodimas, aleksander.lobakin, linyunsheng,
Willem de Bruijn, Praveen Kaligineedi
On Thu, 17 Aug 2023 14:21:26 -0700 Mina Almasry wrote:
> - rx-queue GET API fits in nicely with what you described yesterday
> [1]. At the moment I'm a bit unsure because the SET api you described
> yesterday sounded per-rx-queue to me. But the GET api here is
> per-page-pool based. Maybe the distinction doesn't matter? Maybe
> you're thinking they're unrelated APIs?
Right, they aren't unrelated but I do somehow prefer the more flexible
model where each object type has its own API and the objects can refer
to each other.
I think there's too much of a risk that the mapping between page pools
to queues will become n:m and then having them tied together will be
a problem.
Thread overview: 29+ messages
2023-08-16 23:42 [RFC net-next 00/13] net: page_pool: add netlink-based introspection Jakub Kicinski
2023-08-16 23:42 ` [RFC net-next 01/13] net: page_pool: split the page_pool_params into fast and slow Jakub Kicinski
2023-08-17 9:31 ` Jesper Dangaard Brouer
2023-08-17 9:35 ` Ilias Apalodimas
2023-08-17 21:28 ` Mina Almasry
2023-08-16 23:42 ` [RFC net-next 02/13] net: page_pool: avoid touching slow on the fastpath Jakub Kicinski
2023-08-17 9:32 ` Jesper Dangaard Brouer
2023-08-17 9:38 ` Ilias Apalodimas
2023-08-17 21:31 ` Mina Almasry
2023-08-16 23:42 ` [RFC net-next 03/13] net: page_pool: factor out uninit Jakub Kicinski
2023-08-17 7:40 ` Ilias Apalodimas
2023-08-17 16:25 ` Jakub Kicinski
2023-08-17 16:53 ` Ilias Apalodimas
2023-08-16 23:42 ` [RFC net-next 04/13] net: page_pool: id the page pools Jakub Kicinski
2023-08-17 21:56 ` Mina Almasry
2023-08-18 0:08 ` Jakub Kicinski
2023-08-16 23:42 ` [RFC net-next 05/13] net: page_pool: record pools per netdev Jakub Kicinski
2023-08-17 7:26 ` Simon Horman
2023-08-17 16:22 ` Jakub Kicinski
2023-08-16 23:42 ` [RFC net-next 06/13] net: page_pool: stash the NAPI ID for easier access Jakub Kicinski
2023-08-16 23:42 ` [RFC net-next 07/13] eth: link netdev to pp Jakub Kicinski
2023-08-16 23:42 ` [RFC net-next 08/13] net: page_pool: add nlspec for basic access to page pools Jakub Kicinski
2023-08-16 23:42 ` [RFC net-next 09/13] net: page_pool: implement GET in the netlink API Jakub Kicinski
2023-08-16 23:42 ` [RFC net-next 10/13] net: page_pool: add netlink notifications for state changes Jakub Kicinski
2023-08-16 23:43 ` [RFC net-next 11/13] net: page_pool: report when page pool was destroyed Jakub Kicinski
2023-08-16 23:43 ` [RFC net-next 12/13] net: page_pool: expose page pool stats via netlink Jakub Kicinski
2023-08-16 23:43 ` [RFC net-next 13/13] tools: netdev: regen after page pool changes Jakub Kicinski
2023-08-17 21:21 ` [RFC net-next 00/13] net: page_pool: add netlink-based introspection Mina Almasry
2023-08-18 0:13 ` Jakub Kicinski