* [PATCH net-next 01/15] net: page_pool: split the page_pool_params into fast and slow
2023-10-24 16:02 [PATCH net-next 00/15] net: page_pool: add netlink-based introspection Jakub Kicinski
@ 2023-10-24 16:02 ` Jakub Kicinski
2023-11-09 8:13 ` Ilias Apalodimas
2023-10-24 16:02 ` [PATCH net-next 02/15] net: page_pool: avoid touching slow on the fastpath Jakub Kicinski
` (14 subsequent siblings)
15 siblings, 1 reply; 39+ messages in thread
From: Jakub Kicinski @ 2023-10-24 16:02 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, almasrymina, hawk, ilias.apalodimas,
Jakub Kicinski
struct page_pool is rather performance critical and we use
16B of the first cache line to store 2 pointers used only
by test code. Future patches will add more informational
(non-fast path) attributes.
It's convenient for the user of the API to not have to worry
which fields are fast and which are slow path. Use struct
groups to split the params into the two categories internally.
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
include/net/page_pool/types.h | 31 +++++++++++++++++++------------
net/core/page_pool.c | 7 ++++---
2 files changed, 23 insertions(+), 15 deletions(-)
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 6fc5134095ed..23950fcc4eca 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -54,18 +54,22 @@ struct pp_alloc_cache {
* @offset: DMA sync address offset for PP_FLAG_DMA_SYNC_DEV
*/
struct page_pool_params {
- unsigned int flags;
- unsigned int order;
- unsigned int pool_size;
- int nid;
- struct device *dev;
- struct napi_struct *napi;
- enum dma_data_direction dma_dir;
- unsigned int max_len;
- unsigned int offset;
+ struct_group_tagged(page_pool_params_fast, fast,
+ unsigned int flags;
+ unsigned int order;
+ unsigned int pool_size;
+ int nid;
+ struct device *dev;
+ struct napi_struct *napi;
+ enum dma_data_direction dma_dir;
+ unsigned int max_len;
+ unsigned int offset;
+ );
+ struct_group_tagged(page_pool_params_slow, slow,
/* private: used by test code only */
- void (*init_callback)(struct page *page, void *arg);
- void *init_arg;
+ void (*init_callback)(struct page *page, void *arg);
+ void *init_arg;
+ );
};
#ifdef CONFIG_PAGE_POOL_STATS
@@ -119,7 +123,7 @@ struct page_pool_stats {
#endif
struct page_pool {
- struct page_pool_params p;
+ struct page_pool_params_fast p;
long frag_users;
struct page *frag_page;
@@ -178,6 +182,9 @@ struct page_pool {
refcount_t user_cnt;
u64 destroy_cnt;
+
+ /* Slow/Control-path information follows */
+ struct page_pool_params_slow slow;
};
struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp);
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 5e409b98aba0..5cae413de7cc 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -173,7 +173,8 @@ static int page_pool_init(struct page_pool *pool,
{
unsigned int ring_qsize = 1024; /* Default */
- memcpy(&pool->p, params, sizeof(pool->p));
+ memcpy(&pool->p, &params->fast, sizeof(pool->p));
+ memcpy(&pool->slow, &params->slow, sizeof(pool->slow));
/* Validate only known flags were used */
if (pool->p.flags & ~(PP_FLAG_ALL))
@@ -384,8 +385,8 @@ static void page_pool_set_pp_info(struct page_pool *pool,
* the overhead is negligible.
*/
page_pool_fragment_page(page, 1);
- if (pool->p.init_callback)
- pool->p.init_callback(page, pool->p.init_arg);
+ if (pool->slow.init_callback)
+ pool->slow.init_callback(page, pool->slow.init_arg);
}
static void page_pool_clear_pp_info(struct page *page)
--
2.41.0
^ permalink raw reply related [flat|nested] 39+ messages in thread

* Re: [PATCH net-next 01/15] net: page_pool: split the page_pool_params into fast and slow
2023-10-24 16:02 ` [PATCH net-next 01/15] net: page_pool: split the page_pool_params into fast and slow Jakub Kicinski
@ 2023-11-09 8:13 ` Ilias Apalodimas
0 siblings, 0 replies; 39+ messages in thread
From: Ilias Apalodimas @ 2023-11-09 8:13 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: davem, netdev, edumazet, pabeni, almasrymina, hawk
On Tue, 24 Oct 2023 at 19:02, Jakub Kicinski <kuba@kernel.org> wrote:
>
> struct page_pool is rather performance critical and we use
> 16B of the first cache line to store 2 pointers used only
> by test code. Future patches will add more informational
> (non-fast path) attributes.
>
> It's convenient for the user of the API to not have to worry
> which fields are fast and which are slow path. Use struct
> groups to split the params into the two categories internally.
>
> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Reviewed-by: Mina Almasry <almasrymina@google.com>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> include/net/page_pool/types.h | 31 +++++++++++++++++++------------
> net/core/page_pool.c | 7 ++++---
> 2 files changed, 23 insertions(+), 15 deletions(-)
>
> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index 6fc5134095ed..23950fcc4eca 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -54,18 +54,22 @@ struct pp_alloc_cache {
> * @offset: DMA sync address offset for PP_FLAG_DMA_SYNC_DEV
> */
> struct page_pool_params {
> - unsigned int flags;
> - unsigned int order;
> - unsigned int pool_size;
> - int nid;
> - struct device *dev;
> - struct napi_struct *napi;
> - enum dma_data_direction dma_dir;
> - unsigned int max_len;
> - unsigned int offset;
> + struct_group_tagged(page_pool_params_fast, fast,
> + unsigned int flags;
> + unsigned int order;
> + unsigned int pool_size;
> + int nid;
> + struct device *dev;
> + struct napi_struct *napi;
> + enum dma_data_direction dma_dir;
> + unsigned int max_len;
> + unsigned int offset;
> + );
> + struct_group_tagged(page_pool_params_slow, slow,
> /* private: used by test code only */
> - void (*init_callback)(struct page *page, void *arg);
> - void *init_arg;
> + void (*init_callback)(struct page *page, void *arg);
> + void *init_arg;
> + );
> };
>
> #ifdef CONFIG_PAGE_POOL_STATS
> @@ -119,7 +123,7 @@ struct page_pool_stats {
> #endif
>
> struct page_pool {
> - struct page_pool_params p;
> + struct page_pool_params_fast p;
>
> long frag_users;
> struct page *frag_page;
> @@ -178,6 +182,9 @@ struct page_pool {
> refcount_t user_cnt;
>
> u64 destroy_cnt;
> +
> + /* Slow/Control-path information follows */
> + struct page_pool_params_slow slow;
> };
>
> struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp);
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 5e409b98aba0..5cae413de7cc 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -173,7 +173,8 @@ static int page_pool_init(struct page_pool *pool,
> {
> unsigned int ring_qsize = 1024; /* Default */
>
> - memcpy(&pool->p, params, sizeof(pool->p));
> - memcpy(&pool->p, &params->fast, sizeof(pool->p));
> + memcpy(&pool->slow, &params->slow, sizeof(pool->slow));
>
> /* Validate only known flags were used */
> if (pool->p.flags & ~(PP_FLAG_ALL))
> @@ -384,8 +385,8 @@ static void page_pool_set_pp_info(struct page_pool *pool,
> * the overhead is negligible.
> */
> page_pool_fragment_page(page, 1);
> - if (pool->p.init_callback)
> - pool->p.init_callback(page, pool->p.init_arg);
> + if (pool->slow.init_callback)
> + pool->slow.init_callback(page, pool->slow.init_arg);
> }
>
> static void page_pool_clear_pp_info(struct page *page)
> --
> 2.41.0
>
Had time for a close look, feel free to replace my ack with
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
* [PATCH net-next 02/15] net: page_pool: avoid touching slow on the fastpath
2023-10-24 16:02 [PATCH net-next 00/15] net: page_pool: add netlink-based introspection Jakub Kicinski
2023-10-24 16:02 ` [PATCH net-next 01/15] net: page_pool: split the page_pool_params into fast and slow Jakub Kicinski
@ 2023-10-24 16:02 ` Jakub Kicinski
2023-11-09 9:00 ` Ilias Apalodimas
2023-10-24 16:02 ` [PATCH net-next 03/15] net: page_pool: factor out uninit Jakub Kicinski
` (13 subsequent siblings)
15 siblings, 1 reply; 39+ messages in thread
From: Jakub Kicinski @ 2023-10-24 16:02 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, almasrymina, hawk, ilias.apalodimas,
Jakub Kicinski
To fully benefit from previous commit add one byte of state
in the first cache line recording if we need to look at
the slow part.
The packing isn't all that impressive right now, we create
a 7B hole. I'm expecting Olek's rework will reshuffle this,
anyway.
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
include/net/page_pool/types.h | 2 ++
net/core/page_pool.c | 4 +++-
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 23950fcc4eca..e1bb92c192de 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -125,6 +125,8 @@ struct page_pool_stats {
struct page_pool {
struct page_pool_params_fast p;
+ bool has_init_callback;
+
long frag_users;
struct page *frag_page;
unsigned int frag_offset;
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 5cae413de7cc..08af9de8e8eb 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -212,6 +212,8 @@ static int page_pool_init(struct page_pool *pool,
*/
}
+ pool->has_init_callback = !!pool->slow.init_callback;
+
#ifdef CONFIG_PAGE_POOL_STATS
pool->recycle_stats = alloc_percpu(struct page_pool_recycle_stats);
if (!pool->recycle_stats)
@@ -385,7 +387,7 @@ static void page_pool_set_pp_info(struct page_pool *pool,
* the overhead is negligible.
*/
page_pool_fragment_page(page, 1);
- if (pool->slow.init_callback)
+ if (pool->has_init_callback)
pool->slow.init_callback(page, pool->slow.init_arg);
}
--
2.41.0
* Re: [PATCH net-next 02/15] net: page_pool: avoid touching slow on the fastpath
2023-10-24 16:02 ` [PATCH net-next 02/15] net: page_pool: avoid touching slow on the fastpath Jakub Kicinski
@ 2023-11-09 9:00 ` Ilias Apalodimas
0 siblings, 0 replies; 39+ messages in thread
From: Ilias Apalodimas @ 2023-11-09 9:00 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: davem, netdev, edumazet, pabeni, almasrymina, hawk
On Tue, 24 Oct 2023 at 19:02, Jakub Kicinski <kuba@kernel.org> wrote:
>
> To fully benefit from previous commit add one byte of state
> in the first cache line recording if we need to look at
> the slow part.
>
> The packing isn't all that impressive right now, we create
> a 7B hole. I'm expecting Olek's rework will reshuffle this,
> anyway.
>
> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Reviewed-by: Mina Almasry <almasrymina@google.com>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> include/net/page_pool/types.h | 2 ++
> net/core/page_pool.c | 4 +++-
> 2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index 23950fcc4eca..e1bb92c192de 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -125,6 +125,8 @@ struct page_pool_stats {
> struct page_pool {
> struct page_pool_params_fast p;
>
> + bool has_init_callback;
> +
> long frag_users;
> struct page *frag_page;
> unsigned int frag_offset;
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 5cae413de7cc..08af9de8e8eb 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -212,6 +212,8 @@ static int page_pool_init(struct page_pool *pool,
> */
> }
>
> + pool->has_init_callback = !!pool->slow.init_callback;
> +
> #ifdef CONFIG_PAGE_POOL_STATS
> pool->recycle_stats = alloc_percpu(struct page_pool_recycle_stats);
> if (!pool->recycle_stats)
> @@ -385,7 +387,7 @@ static void page_pool_set_pp_info(struct page_pool *pool,
> * the overhead is negligible.
> */
> page_pool_fragment_page(page, 1);
> - if (pool->slow.init_callback)
> + if (pool->has_init_callback)
> pool->slow.init_callback(page, pool->slow.init_arg);
> }
>
> --
> 2.41.0
>
Same here, please swap my ack with
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
* [PATCH net-next 03/15] net: page_pool: factor out uninit
2023-10-24 16:02 [PATCH net-next 00/15] net: page_pool: add netlink-based introspection Jakub Kicinski
2023-10-24 16:02 ` [PATCH net-next 01/15] net: page_pool: split the page_pool_params into fast and slow Jakub Kicinski
2023-10-24 16:02 ` [PATCH net-next 02/15] net: page_pool: avoid touching slow on the fastpath Jakub Kicinski
@ 2023-10-24 16:02 ` Jakub Kicinski
2023-10-25 18:33 ` Mina Almasry
2023-10-24 16:02 ` [PATCH net-next 04/15] net: page_pool: id the page pools Jakub Kicinski
` (12 subsequent siblings)
15 siblings, 1 reply; 39+ messages in thread
From: Jakub Kicinski @ 2023-10-24 16:02 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, almasrymina, hawk, ilias.apalodimas,
Jakub Kicinski
We'll soon (next change in the series) need a fuller unwind path
in page_pool_create() so create the inverse of page_pool_init().
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
net/core/page_pool.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 08af9de8e8eb..4aed4809b5b4 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -234,6 +234,18 @@ static int page_pool_init(struct page_pool *pool,
return 0;
}
+static void page_pool_uninit(struct page_pool *pool)
+{
+ ptr_ring_cleanup(&pool->ring, NULL);
+
+ if (pool->p.flags & PP_FLAG_DMA_MAP)
+ put_device(pool->p.dev);
+
+#ifdef CONFIG_PAGE_POOL_STATS
+ free_percpu(pool->recycle_stats);
+#endif
+}
+
/**
* page_pool_create() - create a page pool.
* @params: parameters, see struct page_pool_params
@@ -817,14 +829,7 @@ static void __page_pool_destroy(struct page_pool *pool)
if (pool->disconnect)
pool->disconnect(pool);
- ptr_ring_cleanup(&pool->ring, NULL);
-
- if (pool->p.flags & PP_FLAG_DMA_MAP)
- put_device(pool->p.dev);
-
-#ifdef CONFIG_PAGE_POOL_STATS
- free_percpu(pool->recycle_stats);
-#endif
+ page_pool_uninit(pool);
kfree(pool);
}
--
2.41.0
* Re: [PATCH net-next 03/15] net: page_pool: factor out uninit
2023-10-24 16:02 ` [PATCH net-next 03/15] net: page_pool: factor out uninit Jakub Kicinski
@ 2023-10-25 18:33 ` Mina Almasry
0 siblings, 0 replies; 39+ messages in thread
From: Mina Almasry @ 2023-10-25 18:33 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: davem, netdev, edumazet, pabeni, hawk, ilias.apalodimas
On Tue, Oct 24, 2023 at 9:02 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> We'll soon (next change in the series) need a fuller unwind path
> in page_pool_create() so create the inverse of page_pool_init().
>
> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> net/core/page_pool.c | 21 +++++++++++++--------
> 1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 08af9de8e8eb..4aed4809b5b4 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -234,6 +234,18 @@ static int page_pool_init(struct page_pool *pool,
> return 0;
> }
>
> +static void page_pool_uninit(struct page_pool *pool)
> +{
> + ptr_ring_cleanup(&pool->ring, NULL);
> +
> + if (pool->p.flags & PP_FLAG_DMA_MAP)
> + put_device(pool->p.dev);
> +
> +#ifdef CONFIG_PAGE_POOL_STATS
> + free_percpu(pool->recycle_stats);
> +#endif
> +}
> +
> /**
> * page_pool_create() - create a page pool.
> * @params: parameters, see struct page_pool_params
> @@ -817,14 +829,7 @@ static void __page_pool_destroy(struct page_pool *pool)
> if (pool->disconnect)
> pool->disconnect(pool);
>
> - ptr_ring_cleanup(&pool->ring, NULL);
> -
> - if (pool->p.flags & PP_FLAG_DMA_MAP)
> - put_device(pool->p.dev);
> -
> -#ifdef CONFIG_PAGE_POOL_STATS
> - free_percpu(pool->recycle_stats);
> -#endif
> + page_pool_uninit(pool);
> kfree(pool);
> }
>
> --
> 2.41.0
>
--
Thanks,
Mina
* [PATCH net-next 04/15] net: page_pool: id the page pools
2023-10-24 16:02 [PATCH net-next 00/15] net: page_pool: add netlink-based introspection Jakub Kicinski
` (2 preceding siblings ...)
2023-10-24 16:02 ` [PATCH net-next 03/15] net: page_pool: factor out uninit Jakub Kicinski
@ 2023-10-24 16:02 ` Jakub Kicinski
2023-10-25 18:49 ` Mina Almasry
2023-11-09 9:21 ` Ilias Apalodimas
2023-10-24 16:02 ` [PATCH net-next 05/15] net: page_pool: record pools per netdev Jakub Kicinski
` (11 subsequent siblings)
15 siblings, 2 replies; 39+ messages in thread
From: Jakub Kicinski @ 2023-10-24 16:02 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, almasrymina, hawk, ilias.apalodimas,
Jakub Kicinski
To give ourselves the flexibility of creating netlink commands
and ability to refer to page pool instances in uAPIs create
IDs for page pools.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
include/net/page_pool/types.h | 4 ++++
net/core/Makefile | 2 +-
net/core/page_pool.c | 21 +++++++++++++++-----
net/core/page_pool_priv.h | 9 +++++++++
net/core/page_pool_user.c | 36 +++++++++++++++++++++++++++++++++++
5 files changed, 66 insertions(+), 6 deletions(-)
create mode 100644 net/core/page_pool_priv.h
create mode 100644 net/core/page_pool_user.c
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index e1bb92c192de..c19f0df3bf0b 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -187,6 +187,10 @@ struct page_pool {
/* Slow/Control-path information follows */
struct page_pool_params_slow slow;
+ /* User-facing fields, protected by page_pools_lock */
+ struct {
+ u32 id;
+ } user;
};
struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp);
diff --git a/net/core/Makefile b/net/core/Makefile
index 0cb734cbc24b..821aec06abf1 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -18,7 +18,7 @@ obj-y += dev.o dev_addr_lists.o dst.o netevent.o \
obj-$(CONFIG_NETDEV_ADDR_LIST_TEST) += dev_addr_lists_test.o
obj-y += net-sysfs.o
-obj-$(CONFIG_PAGE_POOL) += page_pool.o
+obj-$(CONFIG_PAGE_POOL) += page_pool.o page_pool_user.o
obj-$(CONFIG_PROC_FS) += net-procfs.o
obj-$(CONFIG_NET_PKTGEN) += pktgen.o
obj-$(CONFIG_NETPOLL) += netpoll.o
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 4aed4809b5b4..b3b484fbacae 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -23,6 +23,8 @@
#include <trace/events/page_pool.h>
+#include "page_pool_priv.h"
+
#define DEFER_TIME (msecs_to_jiffies(1000))
#define DEFER_WARN_INTERVAL (60 * HZ)
@@ -260,13 +262,21 @@ struct page_pool *page_pool_create(const struct page_pool_params *params)
return ERR_PTR(-ENOMEM);
err = page_pool_init(pool, params);
- if (err < 0) {
- pr_warn("%s() gave up with errno %d\n", __func__, err);
- kfree(pool);
- return ERR_PTR(err);
- }
+ if (err < 0)
+ goto err_free;
+
+ err = page_pool_list(pool);
+ if (err)
+ goto err_uninit;
return pool;
+
+err_uninit:
+ page_pool_uninit(pool);
+err_free:
+ pr_warn("%s() gave up with errno %d\n", __func__, err);
+ kfree(pool);
+ return ERR_PTR(err);
}
EXPORT_SYMBOL(page_pool_create);
@@ -829,6 +839,7 @@ static void __page_pool_destroy(struct page_pool *pool)
if (pool->disconnect)
pool->disconnect(pool);
+ page_pool_unlist(pool);
page_pool_uninit(pool);
kfree(pool);
}
diff --git a/net/core/page_pool_priv.h b/net/core/page_pool_priv.h
new file mode 100644
index 000000000000..c17ea092b4ab
--- /dev/null
+++ b/net/core/page_pool_priv.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __PAGE_POOL_PRIV_H
+#define __PAGE_POOL_PRIV_H
+
+int page_pool_list(struct page_pool *pool);
+void page_pool_unlist(struct page_pool *pool);
+
+#endif
diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
new file mode 100644
index 000000000000..630d1eeecf2a
--- /dev/null
+++ b/net/core/page_pool_user.c
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/mutex.h>
+#include <linux/xarray.h>
+#include <net/page_pool/types.h>
+
+#include "page_pool_priv.h"
+
+static DEFINE_XARRAY_FLAGS(page_pools, XA_FLAGS_ALLOC1);
+static DEFINE_MUTEX(page_pools_lock);
+
+int page_pool_list(struct page_pool *pool)
+{
+ static u32 id_alloc_next;
+ int err;
+
+ mutex_lock(&page_pools_lock);
+ err = xa_alloc_cyclic(&page_pools, &pool->user.id, pool, xa_limit_32b,
+ &id_alloc_next, GFP_KERNEL);
+ if (err < 0)
+ goto err_unlock;
+
+ mutex_unlock(&page_pools_lock);
+ return 0;
+
+err_unlock:
+ mutex_unlock(&page_pools_lock);
+ return err;
+}
+
+void page_pool_unlist(struct page_pool *pool)
+{
+ mutex_lock(&page_pools_lock);
+ xa_erase(&page_pools, pool->user.id);
+ mutex_unlock(&page_pools_lock);
+}
--
2.41.0
* Re: [PATCH net-next 04/15] net: page_pool: id the page pools
2023-10-24 16:02 ` [PATCH net-next 04/15] net: page_pool: id the page pools Jakub Kicinski
@ 2023-10-25 18:49 ` Mina Almasry
2023-11-09 9:21 ` Ilias Apalodimas
1 sibling, 0 replies; 39+ messages in thread
From: Mina Almasry @ 2023-10-25 18:49 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: davem, netdev, edumazet, pabeni, hawk, ilias.apalodimas
On Tue, Oct 24, 2023 at 9:02 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> To give ourselves the flexibility of creating netlink commands
> and ability to refer to page pool instances in uAPIs create
> IDs for page pools.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
> ---
> include/net/page_pool/types.h | 4 ++++
> net/core/Makefile | 2 +-
> net/core/page_pool.c | 21 +++++++++++++++-----
> net/core/page_pool_priv.h | 9 +++++++++
> net/core/page_pool_user.c | 36 +++++++++++++++++++++++++++++++++++
> 5 files changed, 66 insertions(+), 6 deletions(-)
> create mode 100644 net/core/page_pool_priv.h
> create mode 100644 net/core/page_pool_user.c
>
> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index e1bb92c192de..c19f0df3bf0b 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -187,6 +187,10 @@ struct page_pool {
>
> /* Slow/Control-path information follows */
> struct page_pool_params_slow slow;
> + /* User-facing fields, protected by page_pools_lock */
> + struct {
> + u32 id;
> + } user;
> };
>
> struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp);
> diff --git a/net/core/Makefile b/net/core/Makefile
> index 0cb734cbc24b..821aec06abf1 100644
> --- a/net/core/Makefile
> +++ b/net/core/Makefile
> @@ -18,7 +18,7 @@ obj-y += dev.o dev_addr_lists.o dst.o netevent.o \
> obj-$(CONFIG_NETDEV_ADDR_LIST_TEST) += dev_addr_lists_test.o
>
> obj-y += net-sysfs.o
> -obj-$(CONFIG_PAGE_POOL) += page_pool.o
> +obj-$(CONFIG_PAGE_POOL) += page_pool.o page_pool_user.o
> obj-$(CONFIG_PROC_FS) += net-procfs.o
> obj-$(CONFIG_NET_PKTGEN) += pktgen.o
> obj-$(CONFIG_NETPOLL) += netpoll.o
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 4aed4809b5b4..b3b484fbacae 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -23,6 +23,8 @@
>
> #include <trace/events/page_pool.h>
>
> +#include "page_pool_priv.h"
> +
> #define DEFER_TIME (msecs_to_jiffies(1000))
> #define DEFER_WARN_INTERVAL (60 * HZ)
>
> @@ -260,13 +262,21 @@ struct page_pool *page_pool_create(const struct page_pool_params *params)
> return ERR_PTR(-ENOMEM);
>
> err = page_pool_init(pool, params);
> - if (err < 0) {
> - pr_warn("%s() gave up with errno %d\n", __func__, err);
> - kfree(pool);
> - return ERR_PTR(err);
> - }
> + if (err < 0)
> + goto err_free;
> +
> + err = page_pool_list(pool);
> + if (err)
> + goto err_uninit;
>
> return pool;
> +
> +err_uninit:
> + page_pool_uninit(pool);
> +err_free:
> + pr_warn("%s() gave up with errno %d\n", __func__, err);
> + kfree(pool);
> + return ERR_PTR(err);
> }
> EXPORT_SYMBOL(page_pool_create);
>
> @@ -829,6 +839,7 @@ static void __page_pool_destroy(struct page_pool *pool)
> if (pool->disconnect)
> pool->disconnect(pool);
>
> + page_pool_unlist(pool);
> page_pool_uninit(pool);
> kfree(pool);
> }
> diff --git a/net/core/page_pool_priv.h b/net/core/page_pool_priv.h
> new file mode 100644
> index 000000000000..c17ea092b4ab
> --- /dev/null
> +++ b/net/core/page_pool_priv.h
> @@ -0,0 +1,9 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef __PAGE_POOL_PRIV_H
> +#define __PAGE_POOL_PRIV_H
> +
> +int page_pool_list(struct page_pool *pool);
> +void page_pool_unlist(struct page_pool *pool);
> +
> +#endif
> diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
> new file mode 100644
> index 000000000000..630d1eeecf2a
> --- /dev/null
> +++ b/net/core/page_pool_user.c
> @@ -0,0 +1,36 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/mutex.h>
> +#include <linux/xarray.h>
> +#include <net/page_pool/types.h>
> +
> +#include "page_pool_priv.h"
> +
> +static DEFINE_XARRAY_FLAGS(page_pools, XA_FLAGS_ALLOC1);
> +static DEFINE_MUTEX(page_pools_lock);
> +
> +int page_pool_list(struct page_pool *pool)
> +{
> + static u32 id_alloc_next;
> + int err;
> +
> + mutex_lock(&page_pools_lock);
> + err = xa_alloc_cyclic(&page_pools, &pool->user.id, pool, xa_limit_32b,
> + &id_alloc_next, GFP_KERNEL);
> + if (err < 0)
> + goto err_unlock;
> +
> + mutex_unlock(&page_pools_lock);
> + return 0;
> +
> +err_unlock:
> + mutex_unlock(&page_pools_lock);
> + return err;
> +}
> +
> +void page_pool_unlist(struct page_pool *pool)
> +{
> + mutex_lock(&page_pools_lock);
> + xa_erase(&page_pools, pool->user.id);
> + mutex_unlock(&page_pools_lock);
> +}
> --
> 2.41.0
>
--
Thanks,
Mina
* Re: [PATCH net-next 04/15] net: page_pool: id the page pools
2023-10-24 16:02 ` [PATCH net-next 04/15] net: page_pool: id the page pools Jakub Kicinski
2023-10-25 18:49 ` Mina Almasry
@ 2023-11-09 9:21 ` Ilias Apalodimas
2023-11-09 16:22 ` Jakub Kicinski
1 sibling, 1 reply; 39+ messages in thread
From: Ilias Apalodimas @ 2023-11-09 9:21 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: davem, netdev, edumazet, pabeni, almasrymina, hawk
On Tue, 24 Oct 2023 at 19:02, Jakub Kicinski <kuba@kernel.org> wrote:
>
> To give ourselves the flexibility of creating netlink commands
> and ability to refer to page pool instances in uAPIs create
> IDs for page pools.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> include/net/page_pool/types.h | 4 ++++
> net/core/Makefile | 2 +-
> net/core/page_pool.c | 21 +++++++++++++++-----
> net/core/page_pool_priv.h | 9 +++++++++
> net/core/page_pool_user.c | 36 +++++++++++++++++++++++++++++++++++
> 5 files changed, 66 insertions(+), 6 deletions(-)
> create mode 100644 net/core/page_pool_priv.h
> create mode 100644 net/core/page_pool_user.c
>
[...]
> +int page_pool_list(struct page_pool *pool)
> +{
> + static u32 id_alloc_next;
> + int err;
> +
> + mutex_lock(&page_pools_lock);
> + err = xa_alloc_cyclic(&page_pools, &pool->user.id, pool, xa_limit_32b,
> + &id_alloc_next, GFP_KERNEL);
> + if (err < 0)
> + goto err_unlock;
A nit really, but get rid of the if/goto and just let this return err; ?
> +
> + mutex_unlock(&page_pools_lock);
> + return 0;
> +
> +err_unlock:
> + mutex_unlock(&page_pools_lock);
> + return err;
> +}
> +
> +void page_pool_unlist(struct page_pool *pool)
> +{
> + mutex_lock(&page_pools_lock);
> + xa_erase(&page_pools, pool->user.id);
> + mutex_unlock(&page_pools_lock);
> +}
> --
> 2.41.0
>
* Re: [PATCH net-next 04/15] net: page_pool: id the page pools
2023-11-09 9:21 ` Ilias Apalodimas
@ 2023-11-09 16:22 ` Jakub Kicinski
2023-11-09 16:48 ` Ilias Apalodimas
0 siblings, 1 reply; 39+ messages in thread
From: Jakub Kicinski @ 2023-11-09 16:22 UTC (permalink / raw)
To: Ilias Apalodimas; +Cc: davem, netdev, edumazet, pabeni, almasrymina, hawk
On Thu, 9 Nov 2023 11:21:32 +0200 Ilias Apalodimas wrote:
> > + mutex_lock(&page_pools_lock);
> > + err = xa_alloc_cyclic(&page_pools, &pool->user.id, pool, xa_limit_32b,
> > + &id_alloc_next, GFP_KERNEL);
> > + if (err < 0)
> > + goto err_unlock;
>
> A nit really, but get rid of the if/goto and just let this return err; ?
There's more stuff added here by a subsequent patch. It ends up like
this:
int page_pool_list(struct page_pool *pool)
{
static u32 id_alloc_next;
int err;
mutex_lock(&page_pools_lock);
err = xa_alloc_cyclic(&page_pools, &pool->user.id, pool, xa_limit_32b,
&id_alloc_next, GFP_KERNEL);
if (err < 0)
goto err_unlock;
if (pool->slow.netdev) {
hlist_add_head(&pool->user.list,
&pool->slow.netdev->page_pools);
pool->user.napi_id = pool->p.napi ? pool->p.napi->napi_id : 0;
netdev_nl_page_pool_event(pool, NETDEV_CMD_PAGE_POOL_ADD_NTF);
}
mutex_unlock(&page_pools_lock);
return 0;
err_unlock:
mutex_unlock(&page_pools_lock);
return err;
}
Do you want me to combine the error and non-error paths?
I have a weak preference for not mixing, sometimes err gets set
to a positive value and that starts to propagate, unlikely to
happen here tho.
* Re: [PATCH net-next 04/15] net: page_pool: id the page pools
2023-11-09 16:22 ` Jakub Kicinski
@ 2023-11-09 16:48 ` Ilias Apalodimas
0 siblings, 0 replies; 39+ messages in thread
From: Ilias Apalodimas @ 2023-11-09 16:48 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: davem, netdev, edumazet, pabeni, almasrymina, hawk
On Thu, 9 Nov 2023 at 18:22, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 9 Nov 2023 11:21:32 +0200 Ilias Apalodimas wrote:
> > > + mutex_lock(&page_pools_lock);
> > > + err = xa_alloc_cyclic(&page_pools, &pool->user.id, pool, xa_limit_32b,
> > > + &id_alloc_next, GFP_KERNEL);
> > > + if (err < 0)
> > > + goto err_unlock;
> >
> > A nit really, but get rid of the if/goto and just let this return err; ?
>
> There's more stuff added here by a subsequent patch. It ends up like
> this:
>
> int page_pool_list(struct page_pool *pool)
> {
> static u32 id_alloc_next;
> int err;
>
> mutex_lock(&page_pools_lock);
> err = xa_alloc_cyclic(&page_pools, &pool->user.id, pool, xa_limit_32b,
> &id_alloc_next, GFP_KERNEL);
> if (err < 0)
> goto err_unlock;
>
> if (pool->slow.netdev) {
> hlist_add_head(&pool->user.list,
> &pool->slow.netdev->page_pools);
> pool->user.napi_id = pool->p.napi ? pool->p.napi->napi_id : 0;
>
> netdev_nl_page_pool_event(pool, NETDEV_CMD_PAGE_POOL_ADD_NTF);
> }
>
> mutex_unlock(&page_pools_lock);
> return 0;
>
> err_unlock:
> mutex_unlock(&page_pools_lock);
> return err;
> }
>
> Do you want me to combine the error and non-error paths?
> I have a weak preference for not mixing, sometimes err gets set
> to a positive value and that starts to propagate, unlikely to
> happen here tho.
Nop that's fine -- I actually prefer it myself. I just haven't got the
follow up patches yet
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
* [PATCH net-next 05/15] net: page_pool: record pools per netdev
2023-10-24 16:02 [PATCH net-next 00/15] net: page_pool: add netlink-based introspection Jakub Kicinski
` (3 preceding siblings ...)
2023-10-24 16:02 ` [PATCH net-next 04/15] net: page_pool: id the page pools Jakub Kicinski
@ 2023-10-24 16:02 ` Jakub Kicinski
2023-10-24 17:31 ` David Ahern
2023-10-25 19:56 ` Mina Almasry
2023-10-24 16:02 ` [PATCH net-next 06/15] net: page_pool: stash the NAPI ID for easier access Jakub Kicinski
` (10 subsequent siblings)
15 siblings, 2 replies; 39+ messages in thread
From: Jakub Kicinski @ 2023-10-24 16:02 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, almasrymina, hawk, ilias.apalodimas,
Jakub Kicinski
Link the page pools with netdevs. This needs to be netns compatible
so we have two options. Either we record the pools per netns and
have to worry about moving them as the netdev gets moved.
Or we record them directly on the netdev so they move with the netdev
without any extra work.
Implement the latter option. Since pools may outlast netdev we need
a place to store orphans. In time honored tradition use loopback
for this purpose.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v1: fix race between page pool and netdev disappearing (Simon)
---
include/linux/list.h | 20 ++++++++
include/linux/netdevice.h | 4 ++
include/linux/poison.h | 2 +
include/net/page_pool/types.h | 4 ++
net/core/page_pool_user.c | 90 +++++++++++++++++++++++++++++++++++
5 files changed, 120 insertions(+)
diff --git a/include/linux/list.h b/include/linux/list.h
index 164b4d0e9d2a..3d8884472164 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -1111,6 +1111,26 @@ static inline void hlist_move_list(struct hlist_head *old,
old->first = NULL;
}
+/**
+ * hlist_splice_init() - move all entries from one list to another
+ * @from: hlist_head from which entries will be moved
+ * @last: last entry on the @from list
+ * @to: hlist_head to which entries will be moved
+ *
+ * @to can be empty, @from must contain at least @last.
+ */
+static inline void hlist_splice_init(struct hlist_head *from,
+ struct hlist_node *last,
+ struct hlist_head *to)
+{
+ if (to->first)
+ to->first->pprev = &last->next;
+ last->next = to->first;
+ to->first = from->first;
+ from->first->pprev = &to->first;
+ from->first = NULL;
+}
+
#define hlist_entry(ptr, type, member) container_of(ptr,type,member)
#define hlist_for_each(pos, head) \
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b8bf669212cc..224ee4680a31 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2431,6 +2431,10 @@ struct net_device {
#if IS_ENABLED(CONFIG_DPLL)
struct dpll_pin *dpll_pin;
#endif
+#if IS_ENABLED(CONFIG_PAGE_POOL)
+ /** @page_pools: page pools created for this netdevice */
+ struct hlist_head page_pools;
+#endif
};
#define to_net_dev(d) container_of(d, struct net_device, dev)
diff --git a/include/linux/poison.h b/include/linux/poison.h
index 851a855d3868..27a7dad17eef 100644
--- a/include/linux/poison.h
+++ b/include/linux/poison.h
@@ -83,6 +83,8 @@
/********** net/core/skbuff.c **********/
#define SKB_LIST_POISON_NEXT ((void *)(0x800 + POISON_POINTER_DELTA))
+/********** net/ **********/
+#define NET_PTR_POISON ((void *)(0x801 + POISON_POINTER_DELTA))
/********** kernel/bpf/ **********/
#define BPF_PTR_POISON ((void *)(0xeB9FUL + POISON_POINTER_DELTA))
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index c19f0df3bf0b..b258a571201e 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -5,6 +5,7 @@
#include <linux/dma-direction.h>
#include <linux/ptr_ring.h>
+#include <linux/types.h>
#define PP_FLAG_DMA_MAP BIT(0) /* Should page_pool do the DMA
* map/unmap
@@ -48,6 +49,7 @@ struct pp_alloc_cache {
* @pool_size: size of the ptr_ring
* @nid: NUMA node id to allocate pages from
* @dev: device, for DMA pre-mapping purposes
+ * @netdev: netdev this pool will serve (leave as NULL if none or multiple)
* @napi: NAPI which is the sole consumer of pages, otherwise NULL
* @dma_dir: DMA mapping direction
* @max_len: max DMA sync memory size for PP_FLAG_DMA_SYNC_DEV
@@ -66,6 +68,7 @@ struct page_pool_params {
unsigned int offset;
);
struct_group_tagged(page_pool_params_slow, slow,
+ struct net_device *netdev;
/* private: used by test code only */
void (*init_callback)(struct page *page, void *arg);
void *init_arg;
@@ -189,6 +192,7 @@ struct page_pool {
struct page_pool_params_slow slow;
/* User-facing fields, protected by page_pools_lock */
struct {
+ struct hlist_node list;
u32 id;
} user;
};
diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index 630d1eeecf2a..7cd6f416b87a 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -1,14 +1,31 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/mutex.h>
+#include <linux/netdevice.h>
#include <linux/xarray.h>
+#include <net/net_debug.h>
#include <net/page_pool/types.h>
#include "page_pool_priv.h"
static DEFINE_XARRAY_FLAGS(page_pools, XA_FLAGS_ALLOC1);
+/* Protects: page_pools, netdevice->page_pools, pool->slow.netdev, pool->user.
+ * Ordering: inside rtnl_lock
+ */
static DEFINE_MUTEX(page_pools_lock);
+/* Page pools are only reachable from user space (via netlink) if they are
+ * linked to a netdev at creation time. Following page pool "visibility"
+ * states are possible:
+ * - normal
+ * - user.list: linked to real netdev, netdev: real netdev
+ * - orphaned - real netdev has disappeared
+ * - user.list: linked to lo, netdev: lo
+ * - invisible - either (a) created without netdev linking, (b) unlisted due
+ * to error, or (c) the entire namespace which owned this pool disappeared
+ * - user.list: unhashed, netdev: unknown
+ */
+
int page_pool_list(struct page_pool *pool)
{
static u32 id_alloc_next;
@@ -20,6 +37,10 @@ int page_pool_list(struct page_pool *pool)
if (err < 0)
goto err_unlock;
+ if (pool->slow.netdev)
+ hlist_add_head(&pool->user.list,
+ &pool->slow.netdev->page_pools);
+
mutex_unlock(&page_pools_lock);
return 0;
@@ -32,5 +53,74 @@ void page_pool_unlist(struct page_pool *pool)
{
mutex_lock(&page_pools_lock);
xa_erase(&page_pools, pool->user.id);
+ hlist_del(&pool->user.list);
mutex_unlock(&page_pools_lock);
}
+
+static void page_pool_unreg_netdev_wipe(struct net_device *netdev)
+{
+ struct page_pool *pool;
+ struct hlist_node *n;
+
+ mutex_lock(&page_pools_lock);
+ hlist_for_each_entry_safe(pool, n, &netdev->page_pools, user.list) {
+ hlist_del_init(&pool->user.list);
+ pool->slow.netdev = NET_PTR_POISON;
+ }
+ mutex_unlock(&page_pools_lock);
+}
+
+static void page_pool_unreg_netdev(struct net_device *netdev)
+{
+ struct page_pool *pool, *last;
+ struct net_device *lo;
+
+ lo = __dev_get_by_index(dev_net(netdev), 1);
+ if (!lo) {
+ netdev_err_once(netdev,
+ "can't get lo to store orphan page pools\n");
+ page_pool_unreg_netdev_wipe(netdev);
+ return;
+ }
+
+ mutex_lock(&page_pools_lock);
+ last = NULL;
+ hlist_for_each_entry(pool, &netdev->page_pools, user.list) {
+ pool->slow.netdev = lo;
+ last = pool;
+ }
+ if (last)
+ hlist_splice_init(&netdev->page_pools, &last->user.list,
+ &lo->page_pools);
+ mutex_unlock(&page_pools_lock);
+}
+
+static int
+page_pool_netdevice_event(struct notifier_block *nb,
+ unsigned long event, void *ptr)
+{
+ struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
+
+ if (event != NETDEV_UNREGISTER)
+ return NOTIFY_DONE;
+
+ if (hlist_empty(&netdev->page_pools))
+ return NOTIFY_OK;
+
+ if (netdev->ifindex != 1)
+ page_pool_unreg_netdev(netdev);
+ else
+ page_pool_unreg_netdev_wipe(netdev);
+ return NOTIFY_OK;
+}
+
+static struct notifier_block page_pool_netdevice_nb = {
+ .notifier_call = page_pool_netdevice_event,
+};
+
+static int __init page_pool_user_init(void)
+{
+ return register_netdevice_notifier(&page_pool_netdevice_nb);
+}
+
+subsys_initcall(page_pool_user_init);
--
2.41.0
* Re: [PATCH net-next 05/15] net: page_pool: record pools per netdev
2023-10-24 16:02 ` [PATCH net-next 05/15] net: page_pool: record pools per netdev Jakub Kicinski
@ 2023-10-24 17:31 ` David Ahern
2023-10-24 17:49 ` Jakub Kicinski
2023-10-25 19:56 ` Mina Almasry
1 sibling, 1 reply; 39+ messages in thread
From: David Ahern @ 2023-10-24 17:31 UTC (permalink / raw)
To: Jakub Kicinski, davem
Cc: netdev, edumazet, pabeni, almasrymina, hawk, ilias.apalodimas
On 10/24/23 10:02 AM, Jakub Kicinski wrote:
> Link the page pools with netdevs. This needs to be netns compatible
> so we have two options. Either we record the pools per netns and
> have to worry about moving them as the netdev gets moved.
> Or we record them directly on the netdev so they move with the netdev
> without any extra work.
>
> Implement the latter option. Since pools may outlast netdev we need
> a place to store orphans. In time honored tradition use loopback
> for this purpose.
>
blackhole_netdev might be a better choice than loopback
* Re: [PATCH net-next 05/15] net: page_pool: record pools per netdev
2023-10-24 17:31 ` David Ahern
@ 2023-10-24 17:49 ` Jakub Kicinski
2023-10-24 19:19 ` David Ahern
0 siblings, 1 reply; 39+ messages in thread
From: Jakub Kicinski @ 2023-10-24 17:49 UTC (permalink / raw)
To: David Ahern
Cc: davem, netdev, edumazet, pabeni, almasrymina, hawk,
ilias.apalodimas
On Tue, 24 Oct 2023 11:31:46 -0600 David Ahern wrote:
> On 10/24/23 10:02 AM, Jakub Kicinski wrote:
> > Link the page pools with netdevs. This needs to be netns compatible
> > so we have two options. Either we record the pools per netns and
> > have to worry about moving them as the netdev gets moved.
> > Or we record them directly on the netdev so they move with the netdev
> > without any extra work.
> >
> > Implement the latter option. Since pools may outlast netdev we need
> > a place to store orphans. In time honored tradition use loopback
> > for this purpose.
>
> blackhole_netdev might be a better choice than loopback
With loopback we still keep it per netns, and still accessible
via netlink.
* Re: [PATCH net-next 05/15] net: page_pool: record pools per netdev
2023-10-24 17:49 ` Jakub Kicinski
@ 2023-10-24 19:19 ` David Ahern
2023-10-24 19:45 ` Jakub Kicinski
0 siblings, 1 reply; 39+ messages in thread
From: David Ahern @ 2023-10-24 19:19 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, almasrymina, hawk,
ilias.apalodimas
On 10/24/23 11:49 AM, Jakub Kicinski wrote:
> On Tue, 24 Oct 2023 11:31:46 -0600 David Ahern wrote:
>> On 10/24/23 10:02 AM, Jakub Kicinski wrote:
>>> Link the page pools with netdevs. This needs to be netns compatible
>>> so we have two options. Either we record the pools per netns and
>>> have to worry about moving them as the netdev gets moved.
>>> Or we record them directly on the netdev so they move with the netdev
>>> without any extra work.
>>>
>>> Implement the latter option. Since pools may outlast netdev we need
>>> a place to store orphans. In time honored tradition use loopback
>>> for this purpose.
>>
>> blackhole_netdev might be a better choice than loopback
>
> With loopback we still keep it per netns, and still accessible
> via netlink.
and if the namespace and loopback device get deleted?
* Re: [PATCH net-next 05/15] net: page_pool: record pools per netdev
2023-10-24 19:19 ` David Ahern
@ 2023-10-24 19:45 ` Jakub Kicinski
0 siblings, 0 replies; 39+ messages in thread
From: Jakub Kicinski @ 2023-10-24 19:45 UTC (permalink / raw)
To: David Ahern
Cc: davem, netdev, edumazet, pabeni, almasrymina, hawk,
ilias.apalodimas
On Tue, 24 Oct 2023 13:19:35 -0600 David Ahern wrote:
> >> blackhole_netdev might be a better choice than loopback
> >
> > With loopback we still keep it per netns, and still accessible
> > via netlink.
>
> and if the namespace and loopback device get deleted?
For now we're back to printing the warning to dmesg.
My instinct is that such double-orphans (both netdev and netns
disappeared) should be reported via a separate netlink command
accessible in init_net only. But that's for later, I hope falling
back to the print is good enough for now.
* Re: [PATCH net-next 05/15] net: page_pool: record pools per netdev
2023-10-24 16:02 ` [PATCH net-next 05/15] net: page_pool: record pools per netdev Jakub Kicinski
2023-10-24 17:31 ` David Ahern
@ 2023-10-25 19:56 ` Mina Almasry
2023-10-25 20:17 ` Jakub Kicinski
1 sibling, 1 reply; 39+ messages in thread
From: Mina Almasry @ 2023-10-25 19:56 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: davem, netdev, edumazet, pabeni, hawk, ilias.apalodimas
On Tue, Oct 24, 2023 at 9:02 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Link the page pools with netdevs. This needs to be netns compatible
> so we have two options. Either we record the pools per netns and
> have to worry about moving them as the netdev gets moved.
> Or we record them directly on the netdev so they move with the netdev
> without any extra work.
>
> Implement the latter option. Since pools may outlast netdev we need
> a place to store orphans. In time honored tradition use loopback
> for this purpose.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> v1: fix race between page pool and netdev disappearing (Simon)
> ---
> include/linux/list.h | 20 ++++++++
> include/linux/netdevice.h | 4 ++
> include/linux/poison.h | 2 +
> include/net/page_pool/types.h | 4 ++
> net/core/page_pool_user.c | 90 +++++++++++++++++++++++++++++++++++
> 5 files changed, 120 insertions(+)
>
> diff --git a/include/linux/list.h b/include/linux/list.h
> index 164b4d0e9d2a..3d8884472164 100644
> --- a/include/linux/list.h
> +++ b/include/linux/list.h
> @@ -1111,6 +1111,26 @@ static inline void hlist_move_list(struct hlist_head *old,
> old->first = NULL;
> }
>
> +/**
> + * hlist_splice_init() - move all entries from one list to another
> + * @from: hlist_head from which entries will be moved
> + * @last: last entry on the @from list
> + * @to: hlist_head to which entries will be moved
> + *
> + * @to can be empty, @from must contain at least @last.
> + */
> +static inline void hlist_splice_init(struct hlist_head *from,
> + struct hlist_node *last,
> + struct hlist_head *to)
> +{
> + if (to->first)
> + to->first->pprev = &last->next;
> + last->next = to->first;
> + to->first = from->first;
> + from->first->pprev = &to->first;
> + from->first = NULL;
> +}
> +
> #define hlist_entry(ptr, type, member) container_of(ptr,type,member)
>
> #define hlist_for_each(pos, head) \
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index b8bf669212cc..224ee4680a31 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -2431,6 +2431,10 @@ struct net_device {
> #if IS_ENABLED(CONFIG_DPLL)
> struct dpll_pin *dpll_pin;
> #endif
> +#if IS_ENABLED(CONFIG_PAGE_POOL)
> + /** @page_pools: page pools created for this netdevice */
> + struct hlist_head page_pools;
> +#endif
I wonder if this per netdev field is really necessary. Is it not
possible to do the same simply looping over the (global) page_pools
xarray? Or is that too silly of an idea. I guess on some systems you
may end up with 100s or 1000s of active or orphaned page pools and
then globally iterating over the whole page_pools xarray can be really
slow..
> };
> #define to_net_dev(d) container_of(d, struct net_device, dev)
>
> diff --git a/include/linux/poison.h b/include/linux/poison.h
> index 851a855d3868..27a7dad17eef 100644
> --- a/include/linux/poison.h
> +++ b/include/linux/poison.h
> @@ -83,6 +83,8 @@
>
> /********** net/core/skbuff.c **********/
> #define SKB_LIST_POISON_NEXT ((void *)(0x800 + POISON_POINTER_DELTA))
> +/********** net/ **********/
> +#define NET_PTR_POISON ((void *)(0x801 + POISON_POINTER_DELTA))
>
> /********** kernel/bpf/ **********/
> #define BPF_PTR_POISON ((void *)(0xeB9FUL + POISON_POINTER_DELTA))
> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index c19f0df3bf0b..b258a571201e 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -5,6 +5,7 @@
>
> #include <linux/dma-direction.h>
> #include <linux/ptr_ring.h>
> +#include <linux/types.h>
>
> #define PP_FLAG_DMA_MAP BIT(0) /* Should page_pool do the DMA
> * map/unmap
> @@ -48,6 +49,7 @@ struct pp_alloc_cache {
> * @pool_size: size of the ptr_ring
> * @nid: NUMA node id to allocate from pages from
> * @dev: device, for DMA pre-mapping purposes
> + * @netdev: netdev this pool will serve (leave as NULL if none or multiple)
Is this an existing use case (page_pools that serve null or multiple
netdevs), or a future use case? My understanding is that currently
page_pools serve at most 1 rx-queue. Spot checking a few drivers that
seems to be true.
> * @napi: NAPI which is the sole consumer of pages, otherwise NULL
> * @dma_dir: DMA mapping direction
> * @max_len: max DMA sync memory size for PP_FLAG_DMA_SYNC_DEV
> @@ -66,6 +68,7 @@ struct page_pool_params {
> unsigned int offset;
> );
> struct_group_tagged(page_pool_params_slow, slow,
> + struct net_device *netdev;
> /* private: used by test code only */
> void (*init_callback)(struct page *page, void *arg);
> void *init_arg;
> @@ -189,6 +192,7 @@ struct page_pool {
> struct page_pool_params_slow slow;
> /* User-facing fields, protected by page_pools_lock */
> struct {
> + struct hlist_node list;
> u32 id;
> } user;
> };
> diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
> index 630d1eeecf2a..7cd6f416b87a 100644
> --- a/net/core/page_pool_user.c
> +++ b/net/core/page_pool_user.c
> @@ -1,14 +1,31 @@
> // SPDX-License-Identifier: GPL-2.0
>
> #include <linux/mutex.h>
> +#include <linux/netdevice.h>
> #include <linux/xarray.h>
> +#include <net/net_debug.h>
> #include <net/page_pool/types.h>
>
> #include "page_pool_priv.h"
>
> static DEFINE_XARRAY_FLAGS(page_pools, XA_FLAGS_ALLOC1);
> +/* Protects: page_pools, netdevice->page_pools, pool->slow.netdev, pool->user.
> + * Ordering: inside rtnl_lock
> + */
> static DEFINE_MUTEX(page_pools_lock);
>
> +/* Page pools are only reachable from user space (via netlink) if they are
> + * linked to a netdev at creation time. Following page pool "visibility"
> + * states are possible:
> + * - normal
> + * - user.list: linked to real netdev, netdev: real netdev
> + * - orphaned - real netdev has disappeared
> + * - user.list: linked to lo, netdev: lo
> + * - invisible - either (a) created without netdev linking, (b) unlisted due
> + * to error, or (c) the entire namespace which owned this pool disappeared
> + * - user.list: unhashed, netdev: unknown
> + */
> +
> int page_pool_list(struct page_pool *pool)
> {
> static u32 id_alloc_next;
> @@ -20,6 +37,10 @@ int page_pool_list(struct page_pool *pool)
> if (err < 0)
> goto err_unlock;
>
> + if (pool->slow.netdev)
> + hlist_add_head(&pool->user.list,
> + &pool->slow.netdev->page_pools);
> +
> mutex_unlock(&page_pools_lock);
> return 0;
>
> @@ -32,5 +53,74 @@ void page_pool_unlist(struct page_pool *pool)
> {
> mutex_lock(&page_pools_lock);
> xa_erase(&page_pools, pool->user.id);
> + hlist_del(&pool->user.list);
> mutex_unlock(&page_pools_lock);
> }
> +
> +static void page_pool_unreg_netdev_wipe(struct net_device *netdev)
> +{
> + struct page_pool *pool;
> + struct hlist_node *n;
> +
> + mutex_lock(&page_pools_lock);
> + hlist_for_each_entry_safe(pool, n, &netdev->page_pools, user.list) {
> + hlist_del_init(&pool->user.list);
> + pool->slow.netdev = NET_PTR_POISON;
> + }
> + mutex_unlock(&page_pools_lock);
> +}
> +
> +static void page_pool_unreg_netdev(struct net_device *netdev)
> +{
> + struct page_pool *pool, *last;
> + struct net_device *lo;
> +
> + lo = __dev_get_by_index(dev_net(netdev), 1);
> + if (!lo) {
> + netdev_err_once(netdev,
> + "can't get lo to store orphan page pools\n");
> + page_pool_unreg_netdev_wipe(netdev);
> + return;
> + }
> +
> + mutex_lock(&page_pools_lock);
> + last = NULL;
> + hlist_for_each_entry(pool, &netdev->page_pools, user.list) {
> + pool->slow.netdev = lo;
> + last = pool;
> + }
> + if (last)
> + hlist_splice_init(&netdev->page_pools, &last->user.list,
> + &lo->page_pools);
> + mutex_unlock(&page_pools_lock);
> +}
> +
> +static int
> +page_pool_netdevice_event(struct notifier_block *nb,
> + unsigned long event, void *ptr)
> +{
> + struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
> +
> + if (event != NETDEV_UNREGISTER)
> + return NOTIFY_DONE;
> +
> + if (hlist_empty(&netdev->page_pools))
> + return NOTIFY_OK;
> +
> + if (netdev->ifindex != 1)
I'm guessing 1 is _always_ loopback?
> + page_pool_unreg_netdev(netdev);
> + else
> + page_pool_unreg_netdev_wipe(netdev);
> + return NOTIFY_OK;
> +}
> +
> +static struct notifier_block page_pool_netdevice_nb = {
> + .notifier_call = page_pool_netdevice_event,
> +};
> +
> +static int __init page_pool_user_init(void)
> +{
> + return register_netdevice_notifier(&page_pool_netdevice_nb);
> +}
> +
> +subsys_initcall(page_pool_user_init);
> --
> 2.41.0
>
--
Thanks,
Mina
* Re: [PATCH net-next 05/15] net: page_pool: record pools per netdev
2023-10-25 19:56 ` Mina Almasry
@ 2023-10-25 20:17 ` Jakub Kicinski
2023-11-09 3:28 ` Mina Almasry
0 siblings, 1 reply; 39+ messages in thread
From: Jakub Kicinski @ 2023-10-25 20:17 UTC (permalink / raw)
To: Mina Almasry; +Cc: davem, netdev, edumazet, pabeni, hawk, ilias.apalodimas
On Wed, 25 Oct 2023 12:56:44 -0700 Mina Almasry wrote:
> > +#if IS_ENABLED(CONFIG_PAGE_POOL)
> > + /** @page_pools: page pools created for this netdevice */
> > + struct hlist_head page_pools;
> > +#endif
>
> I wonder if this per netdev field is really necessary. Is it not
> possible to do the same simply looping over the (global) page_pools
> xarray? Or is that too silly of an idea. I guess on some systems you
> may end up with 100s or 1000s of active or orphaned page pools and
> then globally iterating over the whole page_pools xarray can be really
> slow..
I think we want the per-netdev hlist either way, on netdev
unregistration we need to find its pools to clear the pointers.
At which point we can as well use that to dump the pools.
I don't see a strong reason to use one approach over the other.
Note that other objects like napi and queues (WIP patches) also walk
netdevs and dump sub-objects from them.
> > @@ -48,6 +49,7 @@ struct pp_alloc_cache {
> > * @pool_size: size of the ptr_ring
> > * @nid: NUMA node id to allocate from pages from
> > * @dev: device, for DMA pre-mapping purposes
> > + * @netdev: netdev this pool will serve (leave as NULL if none or multiple)
>
> Is this an existing use case (page_pools that serve null or multiple
> netdevs), or a future use case? My understanding is that currently
> page_pools serve at most 1 rx-queue. Spot checking a few drivers that
> seems to be true.
I think I saw one embedded driver for a switch-like device which has
queues servicing all ports, and therefore netdevs.
We'd need some help from people using such devices to figure out what
the right way to represent them is, and what extra bits of
functionality they need.
> I'm guessing 1 is _always_ loopback?
AFAIK, yes. I should probably use LOOPBACK_IFINDEX, to make it clearer.
* Re: [PATCH net-next 05/15] net: page_pool: record pools per netdev
2023-10-25 20:17 ` Jakub Kicinski
@ 2023-11-09 3:28 ` Mina Almasry
0 siblings, 0 replies; 39+ messages in thread
From: Mina Almasry @ 2023-11-09 3:28 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: davem, netdev, edumazet, pabeni, hawk, ilias.apalodimas
On Wed, Oct 25, 2023 at 1:17 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 25 Oct 2023 12:56:44 -0700 Mina Almasry wrote:
> > > +#if IS_ENABLED(CONFIG_PAGE_POOL)
> > > + /** @page_pools: page pools created for this netdevice */
> > > + struct hlist_head page_pools;
> > > +#endif
> >
> > I wonder if this per netdev field is really necessary. Is it not
> > possible to do the same simply looping over the (global) page_pools
> > xarray? Or is that too silly of an idea. I guess on some systems you
> > may end up with 100s or 1000s of active or orphaned page pools and
> > then globally iterating over the whole page_pools xarray can be really
> > slow..
>
> I think we want the per-netdev hlist either way, on netdev
> unregistration we need to find its pools to clear the pointers.
> At which point we can as well use that to dump the pools.
>
> I don't see a strong reason to use one approach over the other.
> Note that other objects like napi and queues (WIP patches) also walk
> netdevs and dump sub-objects from them.
>
Sorry for the very late reply.
I'm not sure we need a per-netdev hlist either way, seems to me
looping over the global list and filtering by pp->netdev is the
equivalent of looping over the per-netdev hlist, I think.
The reason I was suggesting to only do the global list is because when
the same info is stored in 2 places you may end up with the info going
out of sync. Here, netdev->page_pools needs to match the entries in
each pp->netdev. But, I think your approach is more reasonable as
looping over the global array at all times may be a pain if there are
lots of page_pools on the system. I also can't find any issues with
how you're keeping netdev->page_pools & the page_pools->netdev, so
FWIW:
Reviewed-by: Mina Almasry <almasrymina@google.com>
> > > @@ -48,6 +49,7 @@ struct pp_alloc_cache {
> > > * @pool_size: size of the ptr_ring
> > > * @nid: NUMA node id to allocate from pages from
> > > * @dev: device, for DMA pre-mapping purposes
> > > + * @netdev: netdev this pool will serve (leave as NULL if none or multiple)
> >
> > Is this an existing use case (page_pools that serve null or multiple
> > netdevs), or a future use case? My understanding is that currently
> > page_pools serve at most 1 rx-queue. Spot checking a few drivers that
> > seems to be true.
>
> I think I saw one embedded driver for a switch-like device which has
> queues servicing all ports, and therefore netdevs.
> We'd need some help from people using such devices to figure out what
> the right way to represent them is, and what extra bits of
> functionality they need.
>
> > I'm guessing 1 is _always_ loopback?
>
> AFAIK, yes. I should probably use LOOPBACK_IFINDEX, to make it clearer.
--
Thanks,
Mina
* [PATCH net-next 06/15] net: page_pool: stash the NAPI ID for easier access
2023-10-24 16:02 [PATCH net-next 00/15] net: page_pool: add netlink-based introspection Jakub Kicinski
` (4 preceding siblings ...)
2023-10-24 16:02 ` [PATCH net-next 05/15] net: page_pool: record pools per netdev Jakub Kicinski
@ 2023-10-24 16:02 ` Jakub Kicinski
2023-10-24 16:02 ` [PATCH net-next 07/15] eth: link netdev to page_pools in drivers Jakub Kicinski
` (9 subsequent siblings)
15 siblings, 0 replies; 39+ messages in thread
From: Jakub Kicinski @ 2023-10-24 16:02 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, almasrymina, hawk, ilias.apalodimas,
Jakub Kicinski
To avoid any issues with race conditions on accessing napi
and having to think about the lifetime of NAPI objects
in netlink GET - stash the napi_id to which page pool
was linked at creation time.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
include/net/page_pool/types.h | 1 +
net/core/page_pool_user.c | 4 +++-
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index b258a571201e..7e47d7bb2c1e 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -193,6 +193,7 @@ struct page_pool {
/* User-facing fields, protected by page_pools_lock */
struct {
struct hlist_node list;
+ u32 napi_id;
u32 id;
} user;
};
diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index 7cd6f416b87a..5d8b1bc2474b 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -37,9 +37,11 @@ int page_pool_list(struct page_pool *pool)
if (err < 0)
goto err_unlock;
- if (pool->slow.netdev)
+ if (pool->slow.netdev) {
hlist_add_head(&pool->user.list,
&pool->slow.netdev->page_pools);
+ pool->user.napi_id = pool->p.napi ? pool->p.napi->napi_id : 0;
+ }
mutex_unlock(&page_pools_lock);
return 0;
--
2.41.0
* [PATCH net-next 07/15] eth: link netdev to page_pools in drivers
2023-10-24 16:02 [PATCH net-next 00/15] net: page_pool: add netlink-based introspection Jakub Kicinski
` (5 preceding siblings ...)
2023-10-24 16:02 ` [PATCH net-next 06/15] net: page_pool: stash the NAPI ID for easier access Jakub Kicinski
@ 2023-10-24 16:02 ` Jakub Kicinski
2023-11-09 9:11 ` Ilias Apalodimas
2023-10-24 16:02 ` [PATCH net-next 08/15] net: page_pool: add nlspec for basic access to page pools Jakub Kicinski
` (8 subsequent siblings)
15 siblings, 1 reply; 39+ messages in thread
From: Jakub Kicinski @ 2023-10-24 16:02 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, almasrymina, hawk, ilias.apalodimas,
Jakub Kicinski
Link page pool instances to netdev for the drivers which
already link to NAPI. Unless the driver is doing something
very weird per-NAPI should imply per-netdev.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 1 +
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 1 +
drivers/net/ethernet/microsoft/mana/mana_en.c | 1 +
3 files changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index d0359b569afe..04b1b53b1bf1 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -3298,6 +3298,7 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
pp.pool_size += bp->rx_ring_size;
pp.nid = dev_to_node(&bp->pdev->dev);
pp.napi = &rxr->bnapi->napi;
+ pp.netdev = bp->dev;
pp.dev = &bp->pdev->dev;
pp.dma_dir = bp->rx_dir;
pp.max_len = PAGE_SIZE;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index ea58c6917433..9e4325453d15 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -902,6 +902,7 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params,
pp_params.nid = node;
pp_params.dev = rq->pdev;
pp_params.napi = rq->cq.napi;
+ pp_params.netdev = rq->netdev;
pp_params.dma_dir = rq->buff.map_dir;
pp_params.max_len = PAGE_SIZE;
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 48ea4aeeea5d..91ad64538cb3 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -2137,6 +2137,7 @@ static int mana_create_page_pool(struct mana_rxq *rxq, struct gdma_context *gc)
pprm.pool_size = RX_BUFFERS_PER_QUEUE;
pprm.nid = gc->numa_node;
pprm.napi = &rxq->rx_cq.napi;
+ pprm.netdev = rxq->ndev;
rxq->page_pool = page_pool_create(&pprm);
--
2.41.0
* Re: [PATCH net-next 07/15] eth: link netdev to page_pools in drivers
2023-10-24 16:02 ` [PATCH net-next 07/15] eth: link netdev to page_pools in drivers Jakub Kicinski
@ 2023-11-09 9:11 ` Ilias Apalodimas
2023-11-09 16:26 ` Jakub Kicinski
0 siblings, 1 reply; 39+ messages in thread
From: Ilias Apalodimas @ 2023-11-09 9:11 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: davem, netdev, edumazet, pabeni, almasrymina, hawk
Hi Jakub,
On Tue, 24 Oct 2023 at 19:02, Jakub Kicinski <kuba@kernel.org> wrote:
>
> Link page pool instances to netdev for the drivers which
> already link to NAPI. Unless the driver is doing something
> very weird per-NAPI should imply per-netdev.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 1 +
> drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 1 +
> drivers/net/ethernet/microsoft/mana/mana_en.c | 1 +
> 3 files changed, 3 insertions(+)
Mind adding a line for netsec.c (probably in netsec_setup_rx_dring),
since that's the only NIC I currently have around with pp support?
[...]
Thanks
/Ilias
* Re: [PATCH net-next 07/15] eth: link netdev to page_pools in drivers
2023-11-09 9:11 ` Ilias Apalodimas
@ 2023-11-09 16:26 ` Jakub Kicinski
2023-11-09 16:51 ` Ilias Apalodimas
0 siblings, 1 reply; 39+ messages in thread
From: Jakub Kicinski @ 2023-11-09 16:26 UTC (permalink / raw)
To: Ilias Apalodimas; +Cc: davem, netdev, edumazet, pabeni, almasrymina, hawk
On Thu, 9 Nov 2023 11:11:04 +0200 Ilias Apalodimas wrote:
> > ---
> > drivers/net/ethernet/broadcom/bnxt/bnxt.c | 1 +
> > drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 1 +
> > drivers/net/ethernet/microsoft/mana/mana_en.c | 1 +
> > 3 files changed, 3 insertions(+)
>
> Mind adding a line for netsec.c (probably in netsec_setup_rx_dring),
> since that's the only NIC I currently have around with pp support?
It's a single queue / single napi device? So like this?
diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
index 0891e9e49ecb..384c506bb930 100644
--- a/drivers/net/ethernet/socionext/netsec.c
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -1302,6 +1302,8 @@ static int netsec_setup_rx_dring(struct netsec_priv *priv)
.dma_dir = xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE,
.offset = NETSEC_RXBUF_HEADROOM,
.max_len = NETSEC_RX_BUF_SIZE,
+ .napi = priv->napi,
+ .netdev = priv->ndev,
};
int i, err;
* Re: [PATCH net-next 07/15] eth: link netdev to page_pools in drivers
2023-11-09 16:26 ` Jakub Kicinski
@ 2023-11-09 16:51 ` Ilias Apalodimas
0 siblings, 0 replies; 39+ messages in thread
From: Ilias Apalodimas @ 2023-11-09 16:51 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: davem, netdev, edumazet, pabeni, almasrymina, hawk
On Thu, 9 Nov 2023 at 18:26, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 9 Nov 2023 11:11:04 +0200 Ilias Apalodimas wrote:
> > > ---
> > > drivers/net/ethernet/broadcom/bnxt/bnxt.c | 1 +
> > > drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 1 +
> > > drivers/net/ethernet/microsoft/mana/mana_en.c | 1 +
> > > 3 files changed, 3 insertions(+)
> >
> > Mind adding a line for netsec.c (probably in netsec_setup_rx_dring),
> > since that's the only NIC I currently have around with pp support?
>
> It's a single queue / single napi device? So like this?
Yes it is
>
> diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
> index 0891e9e49ecb..384c506bb930 100644
> --- a/drivers/net/ethernet/socionext/netsec.c
> +++ b/drivers/net/ethernet/socionext/netsec.c
> @@ -1302,6 +1302,8 @@ static int netsec_setup_rx_dring(struct netsec_priv *priv)
> .dma_dir = xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE,
> .offset = NETSEC_RXBUF_HEADROOM,
> .max_len = NETSEC_RX_BUF_SIZE,
> + .napi = priv->napi,
> + .netdev = priv->ndev,
Yea
Thanks
/Ilias
> };
> int i, err;
>
* [PATCH net-next 08/15] net: page_pool: add nlspec for basic access to page pools
2023-10-24 16:02 [PATCH net-next 00/15] net: page_pool: add netlink-based introspection Jakub Kicinski
` (6 preceding siblings ...)
2023-10-24 16:02 ` [PATCH net-next 07/15] eth: link netdev to page_pools in drivers Jakub Kicinski
@ 2023-10-24 16:02 ` Jakub Kicinski
2023-10-24 16:02 ` [PATCH net-next 09/15] net: page_pool: implement GET in the netlink API Jakub Kicinski
` (7 subsequent siblings)
15 siblings, 0 replies; 39+ messages in thread
From: Jakub Kicinski @ 2023-10-24 16:02 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, almasrymina, hawk, ilias.apalodimas,
Jakub Kicinski
Add a Netlink spec in YAML for getting very basic information
about page pools.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
Documentation/netlink/specs/netdev.yaml | 45 +++++++++++++++++++++++++
1 file changed, 45 insertions(+)
diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index 14511b13f305..cc9e87dc36e9 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -86,6 +86,34 @@ name: netdev
See Documentation/networking/xdp-rx-metadata.rst for more details.
type: u64
enum: xdp-rx-metadata
+ -
+ name: page-pool
+ attributes:
+ -
+ name: id
+ doc: Unique ID of a Page Pool instance.
+ type: uint
+ checks:
+ min: 1
+ max: u32-max
+ -
+ name: ifindex
+ doc: |
+ ifindex of the netdev to which the pool belongs.
+ May be reported as 0 if the page pool was allocated for a netdev
+ which got destroyed already (page pools may outlast their netdevs
+ because they wait for all memory to be returned).
+ type: u32
+ checks:
+ min: 1
+ max: s32-max
+ -
+ name: napi-id
+ doc: Id of NAPI using this Page Pool instance.
+ type: uint
+ checks:
+ min: 1
+ max: u32-max
operations:
list:
@@ -120,6 +148,23 @@ name: netdev
doc: Notification about device configuration being changed.
notify: dev-get
mcgrp: mgmt
+ -
+ name: page-pool-get
+ doc: |
+ Get / dump information about Page Pools.
+ (Only Page Pools associated with a net_device can be listed.)
+ attribute-set: page-pool
+ do:
+ request:
+ attributes:
+ - id
+ reply: &pp-reply
+ attributes:
+ - id
+ - ifindex
+ - napi-id
+ dump:
+ reply: *pp-reply
mcast-groups:
list:
--
2.41.0
* [PATCH net-next 09/15] net: page_pool: implement GET in the netlink API
2023-10-24 16:02 [PATCH net-next 00/15] net: page_pool: add netlink-based introspection Jakub Kicinski
` (7 preceding siblings ...)
2023-10-24 16:02 ` [PATCH net-next 08/15] net: page_pool: add nlspec for basic access to page pools Jakub Kicinski
@ 2023-10-24 16:02 ` Jakub Kicinski
2023-10-25 10:51 ` kernel test robot
2023-10-25 22:08 ` kernel test robot
2023-10-24 16:02 ` [PATCH net-next 10/15] net: page_pool: add netlink notifications for state changes Jakub Kicinski
` (6 subsequent siblings)
15 siblings, 2 replies; 39+ messages in thread
From: Jakub Kicinski @ 2023-10-24 16:02 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, almasrymina, hawk, ilias.apalodimas,
Jakub Kicinski
Expose the very basic page pool information via netlink.
Example using ynl-py for a system with 9 queues:
$ ./cli.py --no-schema --spec netlink/specs/netdev.yaml \
--dump page-pool-get
[{'id': 19, 'ifindex': 2, 'napi-id': 147},
{'id': 18, 'ifindex': 2, 'napi-id': 146},
{'id': 17, 'ifindex': 2, 'napi-id': 145},
{'id': 16, 'ifindex': 2, 'napi-id': 144},
{'id': 15, 'ifindex': 2, 'napi-id': 143},
{'id': 14, 'ifindex': 2, 'napi-id': 142},
{'id': 13, 'ifindex': 2, 'napi-id': 141},
{'id': 12, 'ifindex': 2, 'napi-id': 140},
{'id': 11, 'ifindex': 2, 'napi-id': 139},
{'id': 10, 'ifindex': 2, 'napi-id': 138}]
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
include/uapi/linux/netdev.h | 10 +++
net/core/netdev-genl-gen.c | 23 +++++++
net/core/netdev-genl-gen.h | 3 +
net/core/page_pool_user.c | 127 ++++++++++++++++++++++++++++++++++++
4 files changed, 163 insertions(+)
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index 2943a151d4f1..176665bcf0da 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -64,11 +64,21 @@ enum {
NETDEV_A_DEV_MAX = (__NETDEV_A_DEV_MAX - 1)
};
+enum {
+ NETDEV_A_PAGE_POOL_ID = 1,
+ NETDEV_A_PAGE_POOL_IFINDEX,
+ NETDEV_A_PAGE_POOL_NAPI_ID,
+
+ __NETDEV_A_PAGE_POOL_MAX,
+ NETDEV_A_PAGE_POOL_MAX = (__NETDEV_A_PAGE_POOL_MAX - 1)
+};
+
enum {
NETDEV_CMD_DEV_GET = 1,
NETDEV_CMD_DEV_ADD_NTF,
NETDEV_CMD_DEV_DEL_NTF,
NETDEV_CMD_DEV_CHANGE_NTF,
+ NETDEV_CMD_PAGE_POOL_GET,
__NETDEV_CMD_MAX,
NETDEV_CMD_MAX = (__NETDEV_CMD_MAX - 1)
diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c
index ea9231378aa6..fc5370cf654c 100644
--- a/net/core/netdev-genl-gen.c
+++ b/net/core/netdev-genl-gen.c
@@ -10,11 +10,22 @@
#include <uapi/linux/netdev.h>
+/* Integer value ranges */
+struct netlink_range_validation netdev_a_page_pool_id_range = {
+ .min = 1,
+ .max = 4294967295,
+};
+
/* NETDEV_CMD_DEV_GET - do */
static const struct nla_policy netdev_dev_get_nl_policy[NETDEV_A_DEV_IFINDEX + 1] = {
[NETDEV_A_DEV_IFINDEX] = NLA_POLICY_MIN(NLA_U32, 1),
};
+/* NETDEV_CMD_PAGE_POOL_GET - do */
+static const struct nla_policy netdev_page_pool_get_nl_policy[NETDEV_A_PAGE_POOL_ID + 1] = {
+ [NETDEV_A_PAGE_POOL_ID] = NLA_POLICY_FULL_RANGE(NLA_UINT, &netdev_a_page_pool_id_range),
+};
+
/* Ops table for netdev */
static const struct genl_split_ops netdev_nl_ops[] = {
{
@@ -29,6 +40,18 @@ static const struct genl_split_ops netdev_nl_ops[] = {
.dumpit = netdev_nl_dev_get_dumpit,
.flags = GENL_CMD_CAP_DUMP,
},
+ {
+ .cmd = NETDEV_CMD_PAGE_POOL_GET,
+ .doit = netdev_nl_page_pool_get_doit,
+ .policy = netdev_page_pool_get_nl_policy,
+ .maxattr = NETDEV_A_PAGE_POOL_ID,
+ .flags = GENL_CMD_CAP_DO,
+ },
+ {
+ .cmd = NETDEV_CMD_PAGE_POOL_GET,
+ .dumpit = netdev_nl_page_pool_get_dumpit,
+ .flags = GENL_CMD_CAP_DUMP,
+ },
};
static const struct genl_multicast_group netdev_nl_mcgrps[] = {
diff --git a/net/core/netdev-genl-gen.h b/net/core/netdev-genl-gen.h
index 7b370c073e7d..a011d12abff4 100644
--- a/net/core/netdev-genl-gen.h
+++ b/net/core/netdev-genl-gen.h
@@ -13,6 +13,9 @@
int netdev_nl_dev_get_doit(struct sk_buff *skb, struct genl_info *info);
int netdev_nl_dev_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb);
+int netdev_nl_page_pool_get_doit(struct sk_buff *skb, struct genl_info *info);
+int netdev_nl_page_pool_get_dumpit(struct sk_buff *skb,
+ struct netlink_callback *cb);
enum {
NETDEV_NLGRP_MGMT,
diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index 5d8b1bc2474b..3f90a20896c2 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -5,8 +5,10 @@
#include <linux/xarray.h>
#include <net/net_debug.h>
#include <net/page_pool/types.h>
+#include <net/sock.h>
#include "page_pool_priv.h"
+#include "netdev-genl-gen.h"
static DEFINE_XARRAY_FLAGS(page_pools, XA_FLAGS_ALLOC1);
/* Protects: page_pools, netdevice->page_pools, pool->slow.netdev, pool->user.
@@ -26,6 +28,131 @@ static DEFINE_MUTEX(page_pools_lock);
* - user.list: unhashed, netdev: unknown
*/
+typedef int (*pp_nl_fill_cb)(struct sk_buff *rsp, const struct page_pool *pool,
+ const struct genl_info *info);
+
+static int
+netdev_nl_page_pool_get_do(struct genl_info *info, u32 id, pp_nl_fill_cb fill)
+{
+ struct page_pool *pool;
+ struct sk_buff *rsp;
+ int err;
+
+ mutex_lock(&page_pools_lock);
+ pool = xa_load(&page_pools, id);
+ if (!pool || hlist_unhashed(&pool->user.list) ||
+ !net_eq(dev_net(pool->slow.netdev), genl_info_net(info))) {
+ err = -ENOENT;
+ goto err_unlock;
+ }
+
+ rsp = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
+ if (!rsp) {
+ err = -ENOMEM;
+ goto err_unlock;
+ }
+
+ err = fill(rsp, pool, info);
+ if (err)
+ goto err_free_msg;
+
+ mutex_unlock(&page_pools_lock);
+
+ return genlmsg_reply(rsp, info);
+
+err_free_msg:
+ nlmsg_free(rsp);
+err_unlock:
+ mutex_unlock(&page_pools_lock);
+ return err;
+}
+
+struct page_pool_dump_cb {
+ unsigned long ifindex;
+ u32 pp_id;
+};
+
+static int
+netdev_nl_page_pool_get_dump(struct sk_buff *skb, struct netlink_callback *cb,
+ pp_nl_fill_cb fill)
+{
+ struct page_pool_dump_cb *state = (void *)cb->ctx;
+ const struct genl_info *info = genl_info_dump(cb);
+ struct net *net = sock_net(skb->sk);
+ struct net_device *netdev;
+ struct page_pool *pool;
+ int err = 0;
+
+ rtnl_lock();
+ mutex_lock(&page_pools_lock);
+ for_each_netdev_dump(net, netdev, state->ifindex) {
+ hlist_for_each_entry(pool, &netdev->page_pools, user.list) {
+ if (state->pp_id && state->pp_id < pool->user.id)
+ continue;
+
+ state->pp_id = pool->user.id;
+ err = fill(skb, pool, info);
+ if (err)
+ break;
+ }
+
+ state->pp_id = 0;
+ }
+ mutex_unlock(&page_pools_lock);
+ rtnl_unlock();
+
+ if (skb->len && err == -EMSGSIZE)
+ return skb->len;
+ return err;
+}
+
+static int
+page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
+ const struct genl_info *info)
+{
+ void *hdr;
+
+ hdr = genlmsg_iput(rsp, info);
+ if (!hdr)
+ return -EMSGSIZE;
+
+ if (nla_put_uint(rsp, NETDEV_A_PAGE_POOL_ID, pool->user.id))
+ goto err_cancel;
+
+ if (pool->slow.netdev->ifindex != 1 &&
+ nla_put_u32(rsp, NETDEV_A_PAGE_POOL_IFINDEX,
+ pool->slow.netdev->ifindex))
+ goto err_cancel;
+ if (pool->user.napi_id &&
+ nla_put_uint(rsp, NETDEV_A_PAGE_POOL_NAPI_ID, pool->user.napi_id))
+ goto err_cancel;
+
+ genlmsg_end(rsp, hdr);
+
+ return 0;
+err_cancel:
+ genlmsg_cancel(rsp, hdr);
+ return -EMSGSIZE;
+}
+
+int netdev_nl_page_pool_get_doit(struct sk_buff *skb, struct genl_info *info)
+{
+ u32 id;
+
+ if (GENL_REQ_ATTR_CHECK(info, NETDEV_A_PAGE_POOL_ID))
+ return -EINVAL;
+
+ id = nla_get_uint(info->attrs[NETDEV_A_PAGE_POOL_ID]);
+
+ return netdev_nl_page_pool_get_do(info, id, page_pool_nl_fill);
+}
+
+int netdev_nl_page_pool_get_dumpit(struct sk_buff *skb,
+ struct netlink_callback *cb)
+{
+ return netdev_nl_page_pool_get_dump(skb, cb, page_pool_nl_fill);
+}
+
int page_pool_list(struct page_pool *pool)
{
static u32 id_alloc_next;
--
2.41.0
^ permalink raw reply related [flat|nested] 39+ messages in thread* Re: [PATCH net-next 09/15] net: page_pool: implement GET in the netlink API
2023-10-24 16:02 ` [PATCH net-next 09/15] net: page_pool: implement GET in the netlink API Jakub Kicinski
@ 2023-10-25 10:51 ` kernel test robot
2023-10-25 22:08 ` kernel test robot
1 sibling, 0 replies; 39+ messages in thread
From: kernel test robot @ 2023-10-25 10:51 UTC (permalink / raw)
To: Jakub Kicinski, davem
Cc: llvm, oe-kbuild-all, netdev, edumazet, pabeni, almasrymina, hawk,
ilias.apalodimas, Jakub Kicinski
Hi Jakub,
kernel test robot noticed the following build errors:
[auto build test ERROR on net-next/main]
url: https://github.com/intel-lab-lkp/linux/commits/Jakub-Kicinski/net-page_pool-split-the-page_pool_params-into-fast-and-slow/20231025-023128
base: net-next/main
patch link: https://lore.kernel.org/r/20231024160220.3973311-10-kuba%40kernel.org
patch subject: [PATCH net-next 09/15] net: page_pool: implement GET in the netlink API
config: um-allnoconfig (https://download.01.org/0day-ci/archive/20231025/202310251843.xbPDgsrV-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231025/202310251843.xbPDgsrV-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202310251843.xbPDgsrV-lkp@intel.com/
All errors (new ones prefixed by >>):
/usr/bin/ld: init/main.o: warning: relocation in read-only section `.ref.text'
/usr/bin/ld: warning: .tmp_vmlinux.kallsyms1 has a LOAD segment with RWX permissions
>> /usr/bin/ld: net/core/netdev-genl-gen.o:(.rodata+0x98): undefined reference to `netdev_nl_page_pool_get_doit'
>> /usr/bin/ld: net/core/netdev-genl-gen.o:(.rodata+0xc0): undefined reference to `netdev_nl_page_pool_get_dumpit'
/usr/bin/ld: warning: creating DT_TEXTREL in a PIE
clang: error: linker command failed with exit code 1 (use -v to see invocation)
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH net-next 09/15] net: page_pool: implement GET in the netlink API
2023-10-24 16:02 ` [PATCH net-next 09/15] net: page_pool: implement GET in the netlink API Jakub Kicinski
2023-10-25 10:51 ` kernel test robot
@ 2023-10-25 22:08 ` kernel test robot
1 sibling, 0 replies; 39+ messages in thread
From: kernel test robot @ 2023-10-25 22:08 UTC (permalink / raw)
To: Jakub Kicinski, davem
Cc: oe-kbuild-all, netdev, edumazet, pabeni, almasrymina, hawk,
ilias.apalodimas, Jakub Kicinski
Hi Jakub,
kernel test robot noticed the following build errors:
[auto build test ERROR on net-next/main]
url: https://github.com/intel-lab-lkp/linux/commits/Jakub-Kicinski/net-page_pool-split-the-page_pool_params-into-fast-and-slow/20231025-023128
base: net-next/main
patch link: https://lore.kernel.org/r/20231024160220.3973311-10-kuba%40kernel.org
patch subject: [PATCH net-next 09/15] net: page_pool: implement GET in the netlink API
config: csky-defconfig (https://download.01.org/0day-ci/archive/20231026/202310260559.9QsjadJ3-lkp@intel.com/config)
compiler: csky-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231026/202310260559.9QsjadJ3-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202310260559.9QsjadJ3-lkp@intel.com/
All errors (new ones prefixed by >>):
>> csky-linux-ld: net/core/netdev-genl-gen.o:(.rodata+0x48): undefined reference to `netdev_nl_page_pool_get_doit'
>> csky-linux-ld: net/core/netdev-genl-gen.o:(.rodata+0x60): undefined reference to `netdev_nl_page_pool_get_dumpit'
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* [PATCH net-next 10/15] net: page_pool: add netlink notifications for state changes
2023-10-24 16:02 [PATCH net-next 00/15] net: page_pool: add netlink-based introspection Jakub Kicinski
` (8 preceding siblings ...)
2023-10-24 16:02 ` [PATCH net-next 09/15] net: page_pool: implement GET in the netlink API Jakub Kicinski
@ 2023-10-24 16:02 ` Jakub Kicinski
2023-10-24 16:02 ` [PATCH net-next 11/15] net: page_pool: report amount of memory held by page pools Jakub Kicinski
` (5 subsequent siblings)
15 siblings, 0 replies; 39+ messages in thread
From: Jakub Kicinski @ 2023-10-24 16:02 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, almasrymina, hawk, ilias.apalodimas,
Jakub Kicinski
Generate netlink notifications about page pool state changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
Documentation/netlink/specs/netdev.yaml | 17 ++++++++++++
include/uapi/linux/netdev.h | 4 +++
net/core/netdev-genl-gen.c | 1 +
net/core/netdev-genl-gen.h | 1 +
net/core/page_pool_user.c | 36 +++++++++++++++++++++++++
5 files changed, 59 insertions(+)
diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index cc9e87dc36e9..2963ff853dd6 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -165,8 +165,25 @@ name: netdev
- napi-id
dump:
reply: *pp-reply
+ -
+ name: page-pool-add-ntf
+ doc: Notification about page pool appearing.
+ notify: page-pool-get
+ mcgrp: page-pool
+ -
+ name: page-pool-del-ntf
+ doc: Notification about page pool disappearing.
+ notify: page-pool-get
+ mcgrp: page-pool
+ -
+ name: page-pool-change-ntf
+ doc: Notification about page pool configuration being changed.
+ notify: page-pool-get
+ mcgrp: page-pool
mcast-groups:
list:
-
name: mgmt
+ -
+ name: page-pool
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index 176665bcf0da..beb158872226 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -79,11 +79,15 @@ enum {
NETDEV_CMD_DEV_DEL_NTF,
NETDEV_CMD_DEV_CHANGE_NTF,
NETDEV_CMD_PAGE_POOL_GET,
+ NETDEV_CMD_PAGE_POOL_ADD_NTF,
+ NETDEV_CMD_PAGE_POOL_DEL_NTF,
+ NETDEV_CMD_PAGE_POOL_CHANGE_NTF,
__NETDEV_CMD_MAX,
NETDEV_CMD_MAX = (__NETDEV_CMD_MAX - 1)
};
#define NETDEV_MCGRP_MGMT "mgmt"
+#define NETDEV_MCGRP_PAGE_POOL "page-pool"
#endif /* _UAPI_LINUX_NETDEV_H */
diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c
index fc5370cf654c..30b99eeb2cd5 100644
--- a/net/core/netdev-genl-gen.c
+++ b/net/core/netdev-genl-gen.c
@@ -56,6 +56,7 @@ static const struct genl_split_ops netdev_nl_ops[] = {
static const struct genl_multicast_group netdev_nl_mcgrps[] = {
[NETDEV_NLGRP_MGMT] = { "mgmt", },
+ [NETDEV_NLGRP_PAGE_POOL] = { "page-pool", },
};
struct genl_family netdev_nl_family __ro_after_init = {
diff --git a/net/core/netdev-genl-gen.h b/net/core/netdev-genl-gen.h
index a011d12abff4..738097847100 100644
--- a/net/core/netdev-genl-gen.h
+++ b/net/core/netdev-genl-gen.h
@@ -19,6 +19,7 @@ int netdev_nl_page_pool_get_dumpit(struct sk_buff *skb,
enum {
NETDEV_NLGRP_MGMT,
+ NETDEV_NLGRP_PAGE_POOL,
};
extern struct genl_family netdev_nl_family;
diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index 3f90a20896c2..0c7f065907d9 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -135,6 +135,37 @@ page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
return -EMSGSIZE;
}
+static void netdev_nl_page_pool_event(const struct page_pool *pool, u32 cmd)
+{
+ struct genl_info info;
+ struct sk_buff *ntf;
+ struct net *net;
+
+ lockdep_assert_held(&page_pools_lock);
+
+ /* 'invisible' page pools don't matter */
+ if (hlist_unhashed(&pool->user.list))
+ return;
+ net = dev_net(pool->slow.netdev);
+
+ if (!genl_has_listeners(&netdev_nl_family, net, NETDEV_NLGRP_PAGE_POOL))
+ return;
+
+ genl_info_init_ntf(&info, &netdev_nl_family, cmd);
+
+ ntf = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
+ if (!ntf)
+ return;
+
+ if (page_pool_nl_fill(ntf, pool, &info)) {
+ nlmsg_free(ntf);
+ return;
+ }
+
+ genlmsg_multicast_netns(&netdev_nl_family, net, ntf,
+ 0, NETDEV_NLGRP_PAGE_POOL, GFP_KERNEL);
+}
+
int netdev_nl_page_pool_get_doit(struct sk_buff *skb, struct genl_info *info)
{
u32 id;
@@ -168,6 +199,8 @@ int page_pool_list(struct page_pool *pool)
hlist_add_head(&pool->user.list,
&pool->slow.netdev->page_pools);
pool->user.napi_id = pool->p.napi ? pool->p.napi->napi_id : 0;
+
+ netdev_nl_page_pool_event(pool, NETDEV_CMD_PAGE_POOL_ADD_NTF);
}
mutex_unlock(&page_pools_lock);
@@ -181,6 +214,7 @@ int page_pool_list(struct page_pool *pool)
void page_pool_unlist(struct page_pool *pool)
{
mutex_lock(&page_pools_lock);
+ netdev_nl_page_pool_event(pool, NETDEV_CMD_PAGE_POOL_DEL_NTF);
xa_erase(&page_pools, pool->user.id);
hlist_del(&pool->user.list);
mutex_unlock(&page_pools_lock);
@@ -216,6 +250,8 @@ static void page_pool_unreg_netdev(struct net_device *netdev)
last = NULL;
hlist_for_each_entry(pool, &netdev->page_pools, user.list) {
pool->slow.netdev = lo;
+ netdev_nl_page_pool_event(pool,
+ NETDEV_CMD_PAGE_POOL_CHANGE_NTF);
last = pool;
}
if (last)
--
2.41.0
* [PATCH net-next 11/15] net: page_pool: report amount of memory held by page pools
2023-10-24 16:02 [PATCH net-next 00/15] net: page_pool: add netlink-based introspection Jakub Kicinski
` (9 preceding siblings ...)
2023-10-24 16:02 ` [PATCH net-next 10/15] net: page_pool: add netlink notifications for state changes Jakub Kicinski
@ 2023-10-24 16:02 ` Jakub Kicinski
2023-10-24 16:02 ` [PATCH net-next 12/15] net: page_pool: report when page pool was destroyed Jakub Kicinski
` (4 subsequent siblings)
15 siblings, 0 replies; 39+ messages in thread
From: Jakub Kicinski @ 2023-10-24 16:02 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, almasrymina, hawk, ilias.apalodimas,
Jakub Kicinski
Advanced deployments need the ability to check the memory use
of various system components. That makes it possible to make
informed decisions about memory allocation, and to find
regressions and leaks.
Report memory use of page pools. Report both number of references
and bytes held.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
Documentation/netlink/specs/netdev.yaml | 13 +++++++++++++
include/uapi/linux/netdev.h | 2 ++
net/core/page_pool.c | 13 +++++++++----
net/core/page_pool_priv.h | 2 ++
net/core/page_pool_user.c | 8 ++++++++
5 files changed, 34 insertions(+), 4 deletions(-)
diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index 2963ff853dd6..8d995760a14a 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -114,6 +114,17 @@ name: netdev
checks:
min: 1
max: u32-max
+ -
+ name: inflight
+ type: uint
+ doc: |
+ Number of outstanding references to this page pool (allocated
+ but yet to be freed pages).
+ -
+ name: inflight-mem
+ type: uint
+ doc: |
+ Amount of memory held by inflight pages.
operations:
list:
@@ -163,6 +174,8 @@ name: netdev
- id
- ifindex
- napi-id
+ - inflight
+ - inflight-mem
dump:
reply: *pp-reply
-
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index beb158872226..26ae5bdd3187 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -68,6 +68,8 @@ enum {
NETDEV_A_PAGE_POOL_ID = 1,
NETDEV_A_PAGE_POOL_IFINDEX,
NETDEV_A_PAGE_POOL_NAPI_ID,
+ NETDEV_A_PAGE_POOL_INFLIGHT,
+ NETDEV_A_PAGE_POOL_INFLIGHT_MEM,
__NETDEV_A_PAGE_POOL_MAX,
NETDEV_A_PAGE_POOL_MAX = (__NETDEV_A_PAGE_POOL_MAX - 1)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index b3b484fbacae..30c8fc91fa66 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -525,7 +525,7 @@ EXPORT_SYMBOL(page_pool_alloc_pages);
*/
#define _distance(a, b) (s32)((a) - (b))
-static s32 page_pool_inflight(struct page_pool *pool)
+s32 page_pool_inflight(const struct page_pool *pool, bool strict)
{
u32 release_cnt = atomic_read(&pool->pages_state_release_cnt);
u32 hold_cnt = READ_ONCE(pool->pages_state_hold_cnt);
@@ -533,8 +533,13 @@ static s32 page_pool_inflight(struct page_pool *pool)
inflight = _distance(hold_cnt, release_cnt);
- trace_page_pool_release(pool, inflight, hold_cnt, release_cnt);
- WARN(inflight < 0, "Negative(%d) inflight packet-pages", inflight);
+ if (strict) {
+ trace_page_pool_release(pool, inflight, hold_cnt, release_cnt);
+ WARN(inflight < 0, "Negative(%d) inflight packet-pages",
+ inflight);
+ } else {
+ inflight = max(0, inflight);
+ }
return inflight;
}
@@ -877,7 +882,7 @@ static int page_pool_release(struct page_pool *pool)
int inflight;
page_pool_scrub(pool);
- inflight = page_pool_inflight(pool);
+ inflight = page_pool_inflight(pool, true);
if (!inflight)
__page_pool_destroy(pool);
diff --git a/net/core/page_pool_priv.h b/net/core/page_pool_priv.h
index c17ea092b4ab..72fb21ea1ddc 100644
--- a/net/core/page_pool_priv.h
+++ b/net/core/page_pool_priv.h
@@ -3,6 +3,8 @@
#ifndef __PAGE_POOL_PRIV_H
#define __PAGE_POOL_PRIV_H
+s32 page_pool_inflight(const struct page_pool *pool, bool strict);
+
int page_pool_list(struct page_pool *pool);
void page_pool_unlist(struct page_pool *pool);
diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index 0c7f065907d9..c971fe9eeb01 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -110,6 +110,7 @@ static int
page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
const struct genl_info *info)
{
+ size_t inflight, refsz;
void *hdr;
hdr = genlmsg_iput(rsp, info);
@@ -127,6 +128,13 @@ page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_NAPI_ID, pool->user.napi_id))
goto err_cancel;
+ inflight = page_pool_inflight(pool, false);
+ refsz = PAGE_SIZE << pool->p.order;
+ if (nla_put_uint(rsp, NETDEV_A_PAGE_POOL_INFLIGHT, inflight) ||
+ nla_put_uint(rsp, NETDEV_A_PAGE_POOL_INFLIGHT_MEM,
+ inflight * refsz))
+ goto err_cancel;
+
genlmsg_end(rsp, hdr);
return 0;
--
2.41.0
* [PATCH net-next 12/15] net: page_pool: report when page pool was destroyed
2023-10-24 16:02 [PATCH net-next 00/15] net: page_pool: add netlink-based introspection Jakub Kicinski
` (10 preceding siblings ...)
2023-10-24 16:02 ` [PATCH net-next 11/15] net: page_pool: report amount of memory held by page pools Jakub Kicinski
@ 2023-10-24 16:02 ` Jakub Kicinski
2023-11-09 17:05 ` Dragos Tatulea
2023-10-24 16:02 ` [PATCH net-next 13/15] net: page_pool: expose page pool stats via netlink Jakub Kicinski
` (3 subsequent siblings)
15 siblings, 1 reply; 39+ messages in thread
From: Jakub Kicinski @ 2023-10-24 16:02 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, almasrymina, hawk, ilias.apalodimas,
Jakub Kicinski
Report when page pool was destroyed. Together with the inflight
/ memory use reporting this can serve as a replacement for the
warning about leaked page pools we currently print to dmesg.
Example output for a fake leaked page pool using some hacks
in netdevsim (one "live" pool, and one "leaked" on the same dev):
$ ./cli.py --no-schema --spec netlink/specs/netdev.yaml \
--dump page-pool-get
[{'id': 2, 'ifindex': 3},
{'id': 1, 'ifindex': 3, 'destroyed': 133, 'inflight': 1}]
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
Documentation/netlink/specs/netdev.yaml | 9 +++++++++
include/net/page_pool/types.h | 1 +
include/uapi/linux/netdev.h | 1 +
net/core/page_pool.c | 1 +
net/core/page_pool_priv.h | 1 +
net/core/page_pool_user.c | 12 ++++++++++++
6 files changed, 25 insertions(+)
diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index 8d995760a14a..8be8f249bed3 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -125,6 +125,14 @@ name: netdev
type: uint
doc: |
Amount of memory held by inflight pages.
+ -
+ name: destroyed
+ type: uint
+ doc: |
+ Seconds in CLOCK_BOOTTIME of when Page Pool was destroyed.
+ Page Pools wait for all the memory allocated from them to be freed
+ before truly disappearing.
+ Absent if Page Pool hasn't been destroyed.
operations:
list:
@@ -176,6 +184,7 @@ name: netdev
- napi-id
- inflight
- inflight-mem
+ - destroyed
dump:
reply: *pp-reply
-
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 7e47d7bb2c1e..f0c51ef5e345 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -193,6 +193,7 @@ struct page_pool {
/* User-facing fields, protected by page_pools_lock */
struct {
struct hlist_node list;
+ u64 destroyed;
u32 napi_id;
u32 id;
} user;
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index 26ae5bdd3187..e5bf66d2aa31 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -70,6 +70,7 @@ enum {
NETDEV_A_PAGE_POOL_NAPI_ID,
NETDEV_A_PAGE_POOL_INFLIGHT,
NETDEV_A_PAGE_POOL_INFLIGHT_MEM,
+ NETDEV_A_PAGE_POOL_DESTROYED,
__NETDEV_A_PAGE_POOL_MAX,
NETDEV_A_PAGE_POOL_MAX = (__NETDEV_A_PAGE_POOL_MAX - 1)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 30c8fc91fa66..57847fbb76a0 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -949,6 +949,7 @@ void page_pool_destroy(struct page_pool *pool)
if (!page_pool_release(pool))
return;
+ page_pool_destroyed(pool);
pool->defer_start = jiffies;
pool->defer_warn = jiffies + DEFER_WARN_INTERVAL;
diff --git a/net/core/page_pool_priv.h b/net/core/page_pool_priv.h
index 72fb21ea1ddc..7fe6f842a270 100644
--- a/net/core/page_pool_priv.h
+++ b/net/core/page_pool_priv.h
@@ -6,6 +6,7 @@
s32 page_pool_inflight(const struct page_pool *pool, bool strict);
int page_pool_list(struct page_pool *pool);
+void page_pool_destroyed(struct page_pool *pool);
void page_pool_unlist(struct page_pool *pool);
#endif
diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index c971fe9eeb01..1fb5c3cbe412 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -134,6 +134,10 @@ page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_INFLIGHT_MEM,
inflight * refsz))
goto err_cancel;
+ if (pool->user.destroyed &&
+ nla_put_uint(rsp, NETDEV_A_PAGE_POOL_DESTROYED,
+ pool->user.destroyed))
+ goto err_cancel;
genlmsg_end(rsp, hdr);
@@ -219,6 +223,14 @@ int page_pool_list(struct page_pool *pool)
return err;
}
+void page_pool_destroyed(struct page_pool *pool)
+{
+ mutex_lock(&page_pools_lock);
+ pool->user.destroyed = ktime_get_boottime_seconds();
+ netdev_nl_page_pool_event(pool, NETDEV_CMD_PAGE_POOL_CHANGE_NTF);
+ mutex_unlock(&page_pools_lock);
+}
+
void page_pool_unlist(struct page_pool *pool)
{
mutex_lock(&page_pools_lock);
--
2.41.0
* Re: [PATCH net-next 12/15] net: page_pool: report when page pool was destroyed
2023-10-24 16:02 ` [PATCH net-next 12/15] net: page_pool: report when page pool was destroyed Jakub Kicinski
@ 2023-11-09 17:05 ` Dragos Tatulea
0 siblings, 0 replies; 39+ messages in thread
From: Dragos Tatulea @ 2023-11-09 17:05 UTC (permalink / raw)
To: kuba@kernel.org, davem@davemloft.net
Cc: ilias.apalodimas@linaro.org, edumazet@google.com,
netdev@vger.kernel.org, hawk@kernel.org, pabeni@redhat.com,
almasrymina@google.com
On Tue, 2023-10-24 at 09:02 -0700, Jakub Kicinski wrote:
> Report when page pool was destroyed. Together with the inflight
> / memory use reporting this can serve as a replacement for the
> warning about leaked page pools we currently print to dmesg.
>
> Example output for a fake leaked page pool using some hacks
> in netdevsim (one "live" pool, and one "leaked" on the same dev):
>
> $ ./cli.py --no-schema --spec netlink/specs/netdev.yaml \
> --dump page-pool-get
> [{'id': 2, 'ifindex': 3},
> {'id': 1, 'ifindex': 3, 'destroyed': 133, 'inflight': 1}]
>
The destroyed timestamp really helps narrow down which tests/test ranges are
triggering the leaks. Thanks!
I was planning to add a per-page_pool "name" where the driver would encode the
rq index + creation timestamp and the printout would show this name. This
approach is much cleaner though.
For what it's worth:
Tested-by: Dragos Tatulea <dtatulea@nvidia.com>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> Documentation/netlink/specs/netdev.yaml | 9 +++++++++
> include/net/page_pool/types.h | 1 +
> include/uapi/linux/netdev.h | 1 +
> net/core/page_pool.c | 1 +
> net/core/page_pool_priv.h | 1 +
> net/core/page_pool_user.c | 12 ++++++++++++
> 6 files changed, 25 insertions(+)
>
> diff --git a/Documentation/netlink/specs/netdev.yaml
> b/Documentation/netlink/specs/netdev.yaml
> index 8d995760a14a..8be8f249bed3 100644
> --- a/Documentation/netlink/specs/netdev.yaml
> +++ b/Documentation/netlink/specs/netdev.yaml
> @@ -125,6 +125,14 @@ name: netdev
> type: uint
> doc: |
> Amount of memory held by inflight pages.
> + -
> + name: destroyed
> + type: uint
> + doc: |
> + Seconds in CLOCK_BOOTTIME of when Page Pool was destroyed.
> + Page Pools wait for all the memory allocated from them to be freed
> + before truly disappearing.
> + Absent if Page Pool hasn't been destroyed.
>
> operations:
> list:
> @@ -176,6 +184,7 @@ name: netdev
> - napi-id
> - inflight
> - inflight-mem
> + - destroyed
> dump:
> reply: *pp-reply
> -
> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index 7e47d7bb2c1e..f0c51ef5e345 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -193,6 +193,7 @@ struct page_pool {
> /* User-facing fields, protected by page_pools_lock */
> struct {
> struct hlist_node list;
> + u64 destroyed;
> u32 napi_id;
> u32 id;
> } user;
> diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
> index 26ae5bdd3187..e5bf66d2aa31 100644
> --- a/include/uapi/linux/netdev.h
> +++ b/include/uapi/linux/netdev.h
> @@ -70,6 +70,7 @@ enum {
> NETDEV_A_PAGE_POOL_NAPI_ID,
> NETDEV_A_PAGE_POOL_INFLIGHT,
> NETDEV_A_PAGE_POOL_INFLIGHT_MEM,
> + NETDEV_A_PAGE_POOL_DESTROYED,
>
> __NETDEV_A_PAGE_POOL_MAX,
> NETDEV_A_PAGE_POOL_MAX = (__NETDEV_A_PAGE_POOL_MAX - 1)
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 30c8fc91fa66..57847fbb76a0 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -949,6 +949,7 @@ void page_pool_destroy(struct page_pool *pool)
> if (!page_pool_release(pool))
> return;
>
> + page_pool_destroyed(pool);
> pool->defer_start = jiffies;
> pool->defer_warn = jiffies + DEFER_WARN_INTERVAL;
>
> diff --git a/net/core/page_pool_priv.h b/net/core/page_pool_priv.h
> index 72fb21ea1ddc..7fe6f842a270 100644
> --- a/net/core/page_pool_priv.h
> +++ b/net/core/page_pool_priv.h
> @@ -6,6 +6,7 @@
> s32 page_pool_inflight(const struct page_pool *pool, bool strict);
>
> int page_pool_list(struct page_pool *pool);
> +void page_pool_destroyed(struct page_pool *pool);
> void page_pool_unlist(struct page_pool *pool);
>
> #endif
> diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
> index c971fe9eeb01..1fb5c3cbe412 100644
> --- a/net/core/page_pool_user.c
> +++ b/net/core/page_pool_user.c
> @@ -134,6 +134,10 @@ page_pool_nl_fill(struct sk_buff *rsp, const struct
> page_pool *pool,
> nla_put_uint(rsp, NETDEV_A_PAGE_POOL_INFLIGHT_MEM,
> inflight * refsz))
> goto err_cancel;
> + if (pool->user.destroyed &&
> + nla_put_uint(rsp, NETDEV_A_PAGE_POOL_DESTROYED,
> + pool->user.destroyed))
> + goto err_cancel;
>
> genlmsg_end(rsp, hdr);
>
> @@ -219,6 +223,14 @@ int page_pool_list(struct page_pool *pool)
> return err;
> }
>
> +void page_pool_destroyed(struct page_pool *pool)
> +{
> + mutex_lock(&page_pools_lock);
> + pool->user.destroyed = ktime_get_boottime_seconds();
> + netdev_nl_page_pool_event(pool, NETDEV_CMD_PAGE_POOL_CHANGE_NTF);
> + mutex_unlock(&page_pools_lock);
> +}
> +
> void page_pool_unlist(struct page_pool *pool)
> {
> mutex_lock(&page_pools_lock);
* [PATCH net-next 13/15] net: page_pool: expose page pool stats via netlink
2023-10-24 16:02 [PATCH net-next 00/15] net: page_pool: add netlink-based introspection Jakub Kicinski
` (11 preceding siblings ...)
2023-10-24 16:02 ` [PATCH net-next 12/15] net: page_pool: report when page pool was destroyed Jakub Kicinski
@ 2023-10-24 16:02 ` Jakub Kicinski
2023-10-25 13:50 ` kernel test robot
2023-10-24 16:02 ` [PATCH net-next 14/15] net: page_pool: mute the periodic warning for visible page pools Jakub Kicinski
` (2 subsequent siblings)
15 siblings, 1 reply; 39+ messages in thread
From: Jakub Kicinski @ 2023-10-24 16:02 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, almasrymina, hawk, ilias.apalodimas,
Jakub Kicinski
Dump the stats into netlink. More clever approaches,
like dumping the stats for each CPU individually
to see where the packets get consumed, can be implemented
in the future.
A trimmed example from a real (but recently booted) system:
$ ./cli.py --no-schema --spec netlink/specs/netdev.yaml \
--dump page-pool-stats-get
[{'info': {'id': 19, 'ifindex': 2},
'alloc-empty': 48,
'alloc-fast': 3024,
'alloc-refill': 0,
'alloc-slow': 48,
'alloc-slow-high-order': 0,
'alloc-waive': 0,
'recycle-cache-full': 0,
'recycle-cached': 0,
'recycle-released-refcnt': 0,
'recycle-ring': 0,
'recycle-ring-full': 0},
{'info': {'id': 18, 'ifindex': 2},
'alloc-empty': 66,
'alloc-fast': 11811,
'alloc-refill': 35,
'alloc-slow': 66,
'alloc-slow-high-order': 0,
'alloc-waive': 0,
'recycle-cache-full': 1145,
'recycle-cached': 6541,
'recycle-released-refcnt': 0,
'recycle-ring': 1275,
'recycle-ring-full': 0},
{'info': {'id': 17, 'ifindex': 2},
'alloc-empty': 73,
'alloc-fast': 62099,
'alloc-refill': 413,
...
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
Documentation/netlink/specs/netdev.yaml | 77 ++++++++++++++++++
Documentation/networking/page_pool.rst | 10 ++-
include/net/page_pool/helpers.h | 8 +-
include/uapi/linux/netdev.h | 19 +++++
net/core/netdev-genl-gen.c | 28 +++++++
net/core/netdev-genl-gen.h | 7 ++
net/core/page_pool.c | 2 +-
net/core/page_pool_user.c | 103 ++++++++++++++++++++++++
8 files changed, 245 insertions(+), 9 deletions(-)
diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index 8be8f249bed3..432f19ea07c4 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -133,6 +133,59 @@ name: netdev
Page Pools wait for all the memory allocated from them to be freed
before truly disappearing.
Absent if Page Pool hasn't been destroyed.
+ -
+ name: page-pool-info
+ subset-of: page-pool
+ attributes:
+ -
+ name: id
+ -
+ name: ifindex
+ -
+ name: page-pool-stats
+ doc: |
+ Page pool statistics, see docs for struct page_pool_stats
+ for information about individual statistics.
+ attributes:
+ -
+ name: info
+ doc: Page pool identifying information.
+ type: nest
+ nested-attributes: page-pool-info
+ -
+ name: alloc-fast
+ type: uint
+ value: 8 # reserve some attr ids in case we need more metadata later
+ -
+ name: alloc-slow
+ type: uint
+ -
+ name: alloc-slow-high-order
+ type: uint
+ -
+ name: alloc-empty
+ type: uint
+ -
+ name: alloc-refill
+ type: uint
+ -
+ name: alloc-waive
+ type: uint
+ -
+ name: recycle-cached
+ type: uint
+ -
+ name: recycle-cache-full
+ type: uint
+ -
+ name: recycle-ring
+ type: uint
+ -
+ name: recycle-ring-full
+ type: uint
+ -
+ name: recycle-released-refcnt
+ type: uint
operations:
list:
@@ -202,6 +255,30 @@ name: netdev
doc: Notification about page pool configuration being changed.
notify: page-pool-get
mcgrp: page-pool
+ -
+ name: page-pool-stats-get
+ doc: Get page pool statistics.
+ attribute-set: page-pool-stats
+ do:
+ request:
+ attributes:
+ - info
+ reply: &pp-stats-reply
+ attributes:
+ - info
+ - alloc-fast
+ - alloc-slow
+ - alloc-slow-high-order
+ - alloc-empty
+ - alloc-refill
+ - alloc-waive
+ - recycle-cached
+ - recycle-cache-full
+ - recycle-ring
+ - recycle-ring-full
+ - recycle-released-refcnt
+ dump:
+ reply: *pp-stats-reply
mcast-groups:
list:
diff --git a/Documentation/networking/page_pool.rst b/Documentation/networking/page_pool.rst
index 60993cb56b32..9d958128a57c 100644
--- a/Documentation/networking/page_pool.rst
+++ b/Documentation/networking/page_pool.rst
@@ -41,6 +41,11 @@ Architecture overview
| Fast cache | | ptr-ring cache |
+-----------------+ +------------------+
+Monitoring
+==========
+Information about page pools on the system can be accessed via the netdev
+genetlink family (see Documentation/netlink/specs/netdev.yaml).
+
API interface
=============
The number of pools created **must** match the number of hardware queues
@@ -107,8 +112,9 @@ page_pool_get_stats() and structures described below are available.
It takes a pointer to a ``struct page_pool`` and a pointer to a struct
page_pool_stats allocated by the caller.
-The API will fill in the provided struct page_pool_stats with
-statistics about the page_pool.
+Older drivers expose page pool statistics via ethtool or debugfs.
+The same statistics are accessible via the netlink netdev family
+in a driver-independent fashion.
.. kernel-doc:: include/net/page_pool/types.h
:identifiers: struct page_pool_recycle_stats
diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 4ebd544ae977..7dc65774cde5 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -55,16 +55,12 @@
#include <net/page_pool/types.h>
#ifdef CONFIG_PAGE_POOL_STATS
+/* Deprecated driver-facing API, use netlink instead */
int page_pool_ethtool_stats_get_count(void);
u8 *page_pool_ethtool_stats_get_strings(u8 *data);
u64 *page_pool_ethtool_stats_get(u64 *data, void *stats);
-/*
- * Drivers that wish to harvest page pool stats and report them to users
- * (perhaps via ethtool, debugfs, or another mechanism) can allocate a
- * struct page_pool_stats call page_pool_get_stats to get stats for the specified pool.
- */
-bool page_pool_get_stats(struct page_pool *pool,
+bool page_pool_get_stats(const struct page_pool *pool,
struct page_pool_stats *stats);
#else
static inline int page_pool_ethtool_stats_get_count(void)
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index e5bf66d2aa31..9d0a48717d1f 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -76,6 +76,24 @@ enum {
NETDEV_A_PAGE_POOL_MAX = (__NETDEV_A_PAGE_POOL_MAX - 1)
};
+enum {
+ NETDEV_A_PAGE_POOL_STATS_INFO = 1,
+ NETDEV_A_PAGE_POOL_STATS_ALLOC_FAST = 8,
+ NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW,
+ NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW_HIGH_ORDER,
+ NETDEV_A_PAGE_POOL_STATS_ALLOC_EMPTY,
+ NETDEV_A_PAGE_POOL_STATS_ALLOC_REFILL,
+ NETDEV_A_PAGE_POOL_STATS_ALLOC_WAIVE,
+ NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHED,
+ NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHE_FULL,
+ NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING,
+ NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING_FULL,
+ NETDEV_A_PAGE_POOL_STATS_RECYCLE_RELEASED_REFCNT,
+
+ __NETDEV_A_PAGE_POOL_STATS_MAX,
+ NETDEV_A_PAGE_POOL_STATS_MAX = (__NETDEV_A_PAGE_POOL_STATS_MAX - 1)
+};
+
enum {
NETDEV_CMD_DEV_GET = 1,
NETDEV_CMD_DEV_ADD_NTF,
@@ -85,6 +103,7 @@ enum {
NETDEV_CMD_PAGE_POOL_ADD_NTF,
NETDEV_CMD_PAGE_POOL_DEL_NTF,
NETDEV_CMD_PAGE_POOL_CHANGE_NTF,
+ NETDEV_CMD_PAGE_POOL_STATS_GET,
__NETDEV_CMD_MAX,
NETDEV_CMD_MAX = (__NETDEV_CMD_MAX - 1)
diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c
index 30b99eeb2cd5..14200fb4f0a6 100644
--- a/net/core/netdev-genl-gen.c
+++ b/net/core/netdev-genl-gen.c
@@ -16,6 +16,17 @@ struct netlink_range_validation netdev_a_page_pool_id_range = {
.max = 4294967295,
};
+struct netlink_range_validation netdev_a_page_pool_ifindex_range = {
+ .min = 1,
+ .max = 2147483647,
+};
+
+/* Common nested types */
+const struct nla_policy netdev_page_pool_info_nl_policy[NETDEV_A_PAGE_POOL_IFINDEX + 1] = {
+ [NETDEV_A_PAGE_POOL_ID] = NLA_POLICY_FULL_RANGE(NLA_UINT, &netdev_a_page_pool_id_range),
+ [NETDEV_A_PAGE_POOL_IFINDEX] = NLA_POLICY_FULL_RANGE(NLA_U32, &netdev_a_page_pool_ifindex_range),
+};
+
/* NETDEV_CMD_DEV_GET - do */
static const struct nla_policy netdev_dev_get_nl_policy[NETDEV_A_DEV_IFINDEX + 1] = {
[NETDEV_A_DEV_IFINDEX] = NLA_POLICY_MIN(NLA_U32, 1),
@@ -26,6 +37,11 @@ static const struct nla_policy netdev_page_pool_get_nl_policy[NETDEV_A_PAGE_POOL
[NETDEV_A_PAGE_POOL_ID] = NLA_POLICY_FULL_RANGE(NLA_UINT, &netdev_a_page_pool_id_range),
};
+/* NETDEV_CMD_PAGE_POOL_STATS_GET - do */
+static const struct nla_policy netdev_page_pool_stats_get_nl_policy[NETDEV_A_PAGE_POOL_STATS_INFO + 1] = {
+ [NETDEV_A_PAGE_POOL_STATS_INFO] = NLA_POLICY_NESTED(netdev_page_pool_info_nl_policy),
+};
+
/* Ops table for netdev */
static const struct genl_split_ops netdev_nl_ops[] = {
{
@@ -52,6 +68,18 @@ static const struct genl_split_ops netdev_nl_ops[] = {
.dumpit = netdev_nl_page_pool_get_dumpit,
.flags = GENL_CMD_CAP_DUMP,
},
+ {
+ .cmd = NETDEV_CMD_PAGE_POOL_STATS_GET,
+ .doit = netdev_nl_page_pool_stats_get_doit,
+ .policy = netdev_page_pool_stats_get_nl_policy,
+ .maxattr = NETDEV_A_PAGE_POOL_STATS_INFO,
+ .flags = GENL_CMD_CAP_DO,
+ },
+ {
+ .cmd = NETDEV_CMD_PAGE_POOL_STATS_GET,
+ .dumpit = netdev_nl_page_pool_stats_get_dumpit,
+ .flags = GENL_CMD_CAP_DUMP,
+ },
};
static const struct genl_multicast_group netdev_nl_mcgrps[] = {
diff --git a/net/core/netdev-genl-gen.h b/net/core/netdev-genl-gen.h
index 738097847100..649e4b46eccf 100644
--- a/net/core/netdev-genl-gen.h
+++ b/net/core/netdev-genl-gen.h
@@ -11,11 +11,18 @@
#include <uapi/linux/netdev.h>
+/* Common nested types */
+extern const struct nla_policy netdev_page_pool_info_nl_policy[NETDEV_A_PAGE_POOL_IFINDEX + 1];
+
int netdev_nl_dev_get_doit(struct sk_buff *skb, struct genl_info *info);
int netdev_nl_dev_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb);
int netdev_nl_page_pool_get_doit(struct sk_buff *skb, struct genl_info *info);
int netdev_nl_page_pool_get_dumpit(struct sk_buff *skb,
struct netlink_callback *cb);
+int netdev_nl_page_pool_stats_get_doit(struct sk_buff *skb,
+ struct genl_info *info);
+int netdev_nl_page_pool_stats_get_dumpit(struct sk_buff *skb,
+ struct netlink_callback *cb);
enum {
NETDEV_NLGRP_MGMT,
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 57847fbb76a0..366ddb00778f 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -71,7 +71,7 @@ static const char pp_stats[][ETH_GSTRING_LEN] = {
* is passed to this API which is filled in. The caller can then report
* those stats to the user (perhaps via ethtool, debugfs, etc.).
*/
-bool page_pool_get_stats(struct page_pool *pool,
+bool page_pool_get_stats(const struct page_pool *pool,
struct page_pool_stats *stats)
{
int cpu = 0;
diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index 1fb5c3cbe412..f1e679a1b85d 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -5,6 +5,7 @@
#include <linux/xarray.h>
#include <net/net_debug.h>
#include <net/page_pool/types.h>
+#include <net/page_pool/helpers.h>
#include <net/sock.h>
#include "page_pool_priv.h"
@@ -106,6 +107,108 @@ netdev_nl_page_pool_get_dump(struct sk_buff *skb, struct netlink_callback *cb,
return err;
}
+static int
+page_pool_nl_stats_fill(struct sk_buff *rsp, const struct page_pool *pool,
+ const struct genl_info *info)
+{
+#ifdef CONFIG_PAGE_POOL_STATS
+ struct page_pool_stats stats = {};
+ struct nlattr *nest;
+ void *hdr;
+
+ if (!page_pool_get_stats(pool, &stats))
+ return 0;
+
+ hdr = genlmsg_iput(rsp, info);
+ if (!hdr)
+ return -EMSGSIZE;
+
+ nest = nla_nest_start(rsp, NETDEV_A_PAGE_POOL_STATS_INFO);
+
+ if (nla_put_uint(rsp, NETDEV_A_PAGE_POOL_ID, pool->user.id) ||
+ (pool->slow.netdev->ifindex != 1 &&
+ nla_put_u32(rsp, NETDEV_A_PAGE_POOL_IFINDEX,
+ pool->slow.netdev->ifindex)))
+ goto err_cancel_nest;
+
+ nla_nest_end(rsp, nest);
+
+ if (nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_FAST,
+ stats.alloc_stats.fast) ||
+ nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW,
+ stats.alloc_stats.slow) ||
+ nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW_HIGH_ORDER,
+ stats.alloc_stats.slow_high_order) ||
+ nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_EMPTY,
+ stats.alloc_stats.empty) ||
+ nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_REFILL,
+ stats.alloc_stats.refill) ||
+ nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_WAIVE,
+ stats.alloc_stats.waive) ||
+ nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHED,
+ stats.recycle_stats.cached) ||
+ nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHE_FULL,
+ stats.recycle_stats.cache_full) ||
+ nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING,
+ stats.recycle_stats.ring) ||
+ nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING_FULL,
+ stats.recycle_stats.ring_full) ||
+ nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_RECYCLE_RELEASED_REFCNT,
+ stats.recycle_stats.released_refcnt))
+ goto err_cancel_msg;
+
+ genlmsg_end(rsp, hdr);
+
+ return 0;
+err_cancel_nest:
+ nla_nest_cancel(rsp, nest);
+err_cancel_msg:
+ genlmsg_cancel(rsp, hdr);
+ return -EMSGSIZE;
+#else
+ GENL_SET_ERR_MSG(info, "kernel built without CONFIG_PAGE_POOL_STATS");
+ return -EOPNOTSUPP;
+#endif
+}
+
+int netdev_nl_page_pool_stats_get_doit(struct sk_buff *skb,
+ struct genl_info *info)
+{
+ struct nlattr *tb[ARRAY_SIZE(netdev_page_pool_info_nl_policy)];
+ struct nlattr *nest;
+ int err;
+ u32 id;
+
+ if (GENL_REQ_ATTR_CHECK(info, NETDEV_A_PAGE_POOL_STATS_INFO))
+ return -EINVAL;
+
+ nest = info->attrs[NETDEV_A_PAGE_POOL_STATS_INFO];
+ err = nla_parse_nested(tb, ARRAY_SIZE(tb) - 1, nest,
+ netdev_page_pool_info_nl_policy,
+ info->extack);
+ if (err)
+ return err;
+
+ if (NL_REQ_ATTR_CHECK(info->extack, nest, tb, NETDEV_A_PAGE_POOL_ID))
+ return -EINVAL;
+ if (tb[NETDEV_A_PAGE_POOL_IFINDEX]) {
+ NL_SET_ERR_MSG_ATTR(info->extack,
+ tb[NETDEV_A_PAGE_POOL_IFINDEX],
+ "selecting by ifindex not supported");
+ return -EINVAL;
+ }
+
+ id = nla_get_uint(tb[NETDEV_A_PAGE_POOL_ID]);
+
+ return netdev_nl_page_pool_get_do(info, id, page_pool_nl_stats_fill);
+}
+
+int netdev_nl_page_pool_stats_get_dumpit(struct sk_buff *skb,
+ struct netlink_callback *cb)
+{
+ return netdev_nl_page_pool_get_dump(skb, cb, page_pool_nl_stats_fill);
+}
+
static int
page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
const struct genl_info *info)
--
2.41.0
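The per-pool counters in the dump above can be reduced to a single recycling-efficiency figure, like the one the YNL page-pool sample later in this series prints. A rough sketch (the exact formula is an assumption — here, recycled pages as a share of all allocations, with alloc = fast + slow and recycle = cached + ring):

```python
def recycling_pct(stats):
    """Percentage of page demand served by recycling rather than allocation.

    `stats` is one entry of a page-pool-stats-get dump. Recycled pages are
    those returned to the per-CPU cache or the ptr-ring; allocations are
    fast-path hits plus slow-path allocations.
    """
    alloc = stats['alloc-fast'] + stats['alloc-slow']
    recycle = stats['recycle-cached'] + stats['recycle-ring']
    return 100.0 * recycle / alloc if alloc else 0.0

# Pool id 18 from the dump in the commit message:
pool18 = {'alloc-fast': 11811, 'alloc-slow': 66,
          'recycle-cached': 6541, 'recycle-ring': 1275}
pct = recycling_pct(pool18)
```

On a freshly booted system like the one quoted, the ratio is low and climbs as the working set of pages starts cycling through the pool.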
* Re: [PATCH net-next 13/15] net: page_pool: expose page pool stats via netlink
2023-10-24 16:02 ` [PATCH net-next 13/15] net: page_pool: expose page pool stats via netlink Jakub Kicinski
@ 2023-10-25 13:50 ` kernel test robot
0 siblings, 0 replies; 39+ messages in thread
From: kernel test robot @ 2023-10-25 13:50 UTC (permalink / raw)
To: Jakub Kicinski, davem
Cc: llvm, oe-kbuild-all, netdev, edumazet, pabeni, almasrymina, hawk,
ilias.apalodimas, Jakub Kicinski
Hi Jakub,
kernel test robot noticed the following build errors:
[auto build test ERROR on net-next/main]
url: https://github.com/intel-lab-lkp/linux/commits/Jakub-Kicinski/net-page_pool-split-the-page_pool_params-into-fast-and-slow/20231025-023128
base: net-next/main
patch link: https://lore.kernel.org/r/20231024160220.3973311-14-kuba%40kernel.org
patch subject: [PATCH net-next 13/15] net: page_pool: expose page pool stats via netlink
config: um-allnoconfig (https://download.01.org/0day-ci/archive/20231025/202310252138.dGM2gwCZ-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231025/202310252138.dGM2gwCZ-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202310252138.dGM2gwCZ-lkp@intel.com/
All errors (new ones prefixed by >>):
/usr/bin/ld: init/main.o: warning: relocation in read-only section `.ref.text'
/usr/bin/ld: warning: .tmp_vmlinux.kallsyms1 has a LOAD segment with RWX permissions
/usr/bin/ld: net/core/netdev-genl-gen.o:(.rodata+0xf8): undefined reference to `netdev_nl_page_pool_get_doit'
/usr/bin/ld: net/core/netdev-genl-gen.o:(.rodata+0x120): undefined reference to `netdev_nl_page_pool_get_dumpit'
>> /usr/bin/ld: net/core/netdev-genl-gen.o:(.rodata+0x148): undefined reference to `netdev_nl_page_pool_stats_get_doit'
>> /usr/bin/ld: net/core/netdev-genl-gen.o:(.rodata+0x170): undefined reference to `netdev_nl_page_pool_stats_get_dumpit'
/usr/bin/ld: warning: creating DT_TEXTREL in a PIE
clang: error: linker command failed with exit code 1 (use -v to see invocation)
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* [PATCH net-next 14/15] net: page_pool: mute the periodic warning for visible page pools
2023-10-24 16:02 [PATCH net-next 00/15] net: page_pool: add netlink-based introspection Jakub Kicinski
` (12 preceding siblings ...)
2023-10-24 16:02 ` [PATCH net-next 13/15] net: page_pool: expose page pool stats via netlink Jakub Kicinski
@ 2023-10-24 16:02 ` Jakub Kicinski
2023-10-24 16:02 ` [PATCH net-next 15/15] tools: ynl: add sample for getting page-pool information Jakub Kicinski
2023-11-09 8:11 ` [PATCH net-next 00/15] net: page_pool: add netlink-based introspection Ilias Apalodimas
15 siblings, 0 replies; 39+ messages in thread
From: Jakub Kicinski @ 2023-10-24 16:02 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, almasrymina, hawk, ilias.apalodimas,
Jakub Kicinski
Mute the periodic "stalled pool shutdown" warning if the page pool
is visible to user space. Rolling out a driver using page pools
to just a few hundred hosts at Meta surfaces applications which
fail to reap their broken sockets. Obviously it's best if the
applications are fixed, but we don't generally print warnings
for application resource leaks. Admins can now depend on the
netlink interface for getting page pool info to detect buggy
apps.
While at it, include the ID of the pool in the message;
in rare cases (pools from a destroyed netns) this will make
finding the pool with a debugger easier.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
net/core/page_pool.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 366ddb00778f..0a9f0f762ec8 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -893,18 +893,21 @@ static void page_pool_release_retry(struct work_struct *wq)
{
struct delayed_work *dwq = to_delayed_work(wq);
struct page_pool *pool = container_of(dwq, typeof(*pool), release_dw);
+ void *netdev;
int inflight;
inflight = page_pool_release(pool);
if (!inflight)
return;
- /* Periodic warning */
- if (time_after_eq(jiffies, pool->defer_warn)) {
+ /* Periodic warning for page pools the user can't see */
+ netdev = READ_ONCE(pool->slow.netdev);
+ if (time_after_eq(jiffies, pool->defer_warn) &&
+ (!netdev || netdev == NET_PTR_POISON)) {
int sec = (s32)((u32)jiffies - (u32)pool->defer_start) / HZ;
- pr_warn("%s() stalled pool shutdown %d inflight %d sec\n",
- __func__, inflight, sec);
+ pr_warn("%s() stalled pool shutdown: id %u, %d inflight %d sec\n",
+ __func__, pool->user.id, inflight, sec);
pool->defer_warn = jiffies + DEFER_WARN_INTERVAL;
}
--
2.41.0
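The warning gate in the patch above keeps the kernel's wrap-safe time_after_eq() comparison on jiffies. As a sketch of why a plain `>=` would misfire across a counter wrap (assuming 32-bit jiffies purely for illustration):

```python
JIFFIES_BITS = 32                 # assumption: 32-bit jiffies, for illustration
MASK = (1 << JIFFIES_BITS) - 1
HALF = 1 << (JIFFIES_BITS - 1)

def time_after_eq(a: int, b: int) -> bool:
    """True if jiffies value a is at or after b, tolerating wraparound.

    Mirrors the kernel macro, which evaluates (long)((a) - (b)) >= 0:
    the difference is taken modulo 2**32 and interpreted as signed.
    """
    return ((a - b) & MASK) < HALF

# Shortly after a wrap, naive comparison gets the ordering backwards:
wrapped, old = 5, MASK - 10
naive_says_after = wrapped >= old            # False: looks "before"
safe_says_after = time_after_eq(wrapped, old)  # True: 16 ticks "after"
```

This is why defer_warn deadlines are always checked with these helpers rather than direct comparison.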
* [PATCH net-next 15/15] tools: ynl: add sample for getting page-pool information
2023-10-24 16:02 [PATCH net-next 00/15] net: page_pool: add netlink-based introspection Jakub Kicinski
` (13 preceding siblings ...)
2023-10-24 16:02 ` [PATCH net-next 14/15] net: page_pool: mute the periodic warning for visible page pools Jakub Kicinski
@ 2023-10-24 16:02 ` Jakub Kicinski
2023-11-09 8:11 ` [PATCH net-next 00/15] net: page_pool: add netlink-based introspection Ilias Apalodimas
15 siblings, 0 replies; 39+ messages in thread
From: Jakub Kicinski @ 2023-10-24 16:02 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, almasrymina, hawk, ilias.apalodimas,
Jakub Kicinski
Regenerate the tools/ code after the netdev spec changes.
Add a sample to query page-pool info in a concise fashion:
$ ./page-pool
eth0[2] page pools: 10 (zombies: 0)
refs: 41984 bytes: 171966464 (refs: 0 bytes: 0)
recycling: 90.3% (alloc: 656:397681 recycle: 89652:270201)
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
tools/include/uapi/linux/netdev.h | 36 +++
tools/net/ynl/generated/netdev-user.c | 419 ++++++++++++++++++++++++++
tools/net/ynl/generated/netdev-user.h | 171 +++++++++++
tools/net/ynl/lib/ynl.h | 2 +-
tools/net/ynl/samples/.gitignore | 1 +
tools/net/ynl/samples/Makefile | 2 +-
tools/net/ynl/samples/page-pool.c | 147 +++++++++
7 files changed, 776 insertions(+), 2 deletions(-)
create mode 100644 tools/net/ynl/samples/page-pool.c
diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h
index 2943a151d4f1..9d0a48717d1f 100644
--- a/tools/include/uapi/linux/netdev.h
+++ b/tools/include/uapi/linux/netdev.h
@@ -64,16 +64,52 @@ enum {
NETDEV_A_DEV_MAX = (__NETDEV_A_DEV_MAX - 1)
};
+enum {
+ NETDEV_A_PAGE_POOL_ID = 1,
+ NETDEV_A_PAGE_POOL_IFINDEX,
+ NETDEV_A_PAGE_POOL_NAPI_ID,
+ NETDEV_A_PAGE_POOL_INFLIGHT,
+ NETDEV_A_PAGE_POOL_INFLIGHT_MEM,
+ NETDEV_A_PAGE_POOL_DESTROYED,
+
+ __NETDEV_A_PAGE_POOL_MAX,
+ NETDEV_A_PAGE_POOL_MAX = (__NETDEV_A_PAGE_POOL_MAX - 1)
+};
+
+enum {
+ NETDEV_A_PAGE_POOL_STATS_INFO = 1,
+ NETDEV_A_PAGE_POOL_STATS_ALLOC_FAST = 8,
+ NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW,
+ NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW_HIGH_ORDER,
+ NETDEV_A_PAGE_POOL_STATS_ALLOC_EMPTY,
+ NETDEV_A_PAGE_POOL_STATS_ALLOC_REFILL,
+ NETDEV_A_PAGE_POOL_STATS_ALLOC_WAIVE,
+ NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHED,
+ NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHE_FULL,
+ NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING,
+ NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING_FULL,
+ NETDEV_A_PAGE_POOL_STATS_RECYCLE_RELEASED_REFCNT,
+
+ __NETDEV_A_PAGE_POOL_STATS_MAX,
+ NETDEV_A_PAGE_POOL_STATS_MAX = (__NETDEV_A_PAGE_POOL_STATS_MAX - 1)
+};
+
enum {
NETDEV_CMD_DEV_GET = 1,
NETDEV_CMD_DEV_ADD_NTF,
NETDEV_CMD_DEV_DEL_NTF,
NETDEV_CMD_DEV_CHANGE_NTF,
+ NETDEV_CMD_PAGE_POOL_GET,
+ NETDEV_CMD_PAGE_POOL_ADD_NTF,
+ NETDEV_CMD_PAGE_POOL_DEL_NTF,
+ NETDEV_CMD_PAGE_POOL_CHANGE_NTF,
+ NETDEV_CMD_PAGE_POOL_STATS_GET,
__NETDEV_CMD_MAX,
NETDEV_CMD_MAX = (__NETDEV_CMD_MAX - 1)
};
#define NETDEV_MCGRP_MGMT "mgmt"
+#define NETDEV_MCGRP_PAGE_POOL "page-pool"
#endif /* _UAPI_LINUX_NETDEV_H */
diff --git a/tools/net/ynl/generated/netdev-user.c b/tools/net/ynl/generated/netdev-user.c
index b5ffe8cd1144..f9a000584493 100644
--- a/tools/net/ynl/generated/netdev-user.c
+++ b/tools/net/ynl/generated/netdev-user.c
@@ -18,6 +18,11 @@ static const char * const netdev_op_strmap[] = {
[NETDEV_CMD_DEV_ADD_NTF] = "dev-add-ntf",
[NETDEV_CMD_DEV_DEL_NTF] = "dev-del-ntf",
[NETDEV_CMD_DEV_CHANGE_NTF] = "dev-change-ntf",
+ [NETDEV_CMD_PAGE_POOL_GET] = "page-pool-get",
+ [NETDEV_CMD_PAGE_POOL_ADD_NTF] = "page-pool-add-ntf",
+ [NETDEV_CMD_PAGE_POOL_DEL_NTF] = "page-pool-del-ntf",
+ [NETDEV_CMD_PAGE_POOL_CHANGE_NTF] = "page-pool-change-ntf",
+ [NETDEV_CMD_PAGE_POOL_STATS_GET] = "page-pool-stats-get",
};
const char *netdev_op_str(int op)
@@ -59,6 +64,16 @@ const char *netdev_xdp_rx_metadata_str(enum netdev_xdp_rx_metadata value)
}
/* Policies */
+struct ynl_policy_attr netdev_page_pool_info_policy[NETDEV_A_PAGE_POOL_MAX + 1] = {
+ [NETDEV_A_PAGE_POOL_ID] = { .name = "id", .type = YNL_PT_UINT, },
+ [NETDEV_A_PAGE_POOL_IFINDEX] = { .name = "ifindex", .type = YNL_PT_U32, },
+};
+
+struct ynl_policy_nest netdev_page_pool_info_nest = {
+ .max_attr = NETDEV_A_PAGE_POOL_MAX,
+ .table = netdev_page_pool_info_policy,
+};
+
struct ynl_policy_attr netdev_dev_policy[NETDEV_A_DEV_MAX + 1] = {
[NETDEV_A_DEV_IFINDEX] = { .name = "ifindex", .type = YNL_PT_U32, },
[NETDEV_A_DEV_PAD] = { .name = "pad", .type = YNL_PT_IGNORE, },
@@ -72,7 +87,85 @@ struct ynl_policy_nest netdev_dev_nest = {
.table = netdev_dev_policy,
};
+struct ynl_policy_attr netdev_page_pool_policy[NETDEV_A_PAGE_POOL_MAX + 1] = {
+ [NETDEV_A_PAGE_POOL_ID] = { .name = "id", .type = YNL_PT_UINT, },
+ [NETDEV_A_PAGE_POOL_IFINDEX] = { .name = "ifindex", .type = YNL_PT_U32, },
+ [NETDEV_A_PAGE_POOL_NAPI_ID] = { .name = "napi-id", .type = YNL_PT_UINT, },
+ [NETDEV_A_PAGE_POOL_INFLIGHT] = { .name = "inflight", .type = YNL_PT_UINT, },
+ [NETDEV_A_PAGE_POOL_INFLIGHT_MEM] = { .name = "inflight-mem", .type = YNL_PT_UINT, },
+ [NETDEV_A_PAGE_POOL_DESTROYED] = { .name = "destroyed", .type = YNL_PT_UINT, },
+};
+
+struct ynl_policy_nest netdev_page_pool_nest = {
+ .max_attr = NETDEV_A_PAGE_POOL_MAX,
+ .table = netdev_page_pool_policy,
+};
+
+struct ynl_policy_attr netdev_page_pool_stats_policy[NETDEV_A_PAGE_POOL_STATS_MAX + 1] = {
+ [NETDEV_A_PAGE_POOL_STATS_INFO] = { .name = "info", .type = YNL_PT_NEST, .nest = &netdev_page_pool_info_nest, },
+ [NETDEV_A_PAGE_POOL_STATS_ALLOC_FAST] = { .name = "alloc-fast", .type = YNL_PT_UINT, },
+ [NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW] = { .name = "alloc-slow", .type = YNL_PT_UINT, },
+ [NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW_HIGH_ORDER] = { .name = "alloc-slow-high-order", .type = YNL_PT_UINT, },
+ [NETDEV_A_PAGE_POOL_STATS_ALLOC_EMPTY] = { .name = "alloc-empty", .type = YNL_PT_UINT, },
+ [NETDEV_A_PAGE_POOL_STATS_ALLOC_REFILL] = { .name = "alloc-refill", .type = YNL_PT_UINT, },
+ [NETDEV_A_PAGE_POOL_STATS_ALLOC_WAIVE] = { .name = "alloc-waive", .type = YNL_PT_UINT, },
+ [NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHED] = { .name = "recycle-cached", .type = YNL_PT_UINT, },
+ [NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHE_FULL] = { .name = "recycle-cache-full", .type = YNL_PT_UINT, },
+ [NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING] = { .name = "recycle-ring", .type = YNL_PT_UINT, },
+ [NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING_FULL] = { .name = "recycle-ring-full", .type = YNL_PT_UINT, },
+ [NETDEV_A_PAGE_POOL_STATS_RECYCLE_RELEASED_REFCNT] = { .name = "recycle-released-refcnt", .type = YNL_PT_UINT, },
+};
+
+struct ynl_policy_nest netdev_page_pool_stats_nest = {
+ .max_attr = NETDEV_A_PAGE_POOL_STATS_MAX,
+ .table = netdev_page_pool_stats_policy,
+};
+
/* Common nested types */
+void netdev_page_pool_info_free(struct netdev_page_pool_info *obj)
+{
+}
+
+int netdev_page_pool_info_put(struct nlmsghdr *nlh, unsigned int attr_type,
+ struct netdev_page_pool_info *obj)
+{
+ struct nlattr *nest;
+
+ nest = mnl_attr_nest_start(nlh, attr_type);
+ if (obj->_present.id)
+ mnl_attr_put_uint(nlh, NETDEV_A_PAGE_POOL_ID, obj->id);
+ if (obj->_present.ifindex)
+ mnl_attr_put_u32(nlh, NETDEV_A_PAGE_POOL_IFINDEX, obj->ifindex);
+ mnl_attr_nest_end(nlh, nest);
+
+ return 0;
+}
+
+int netdev_page_pool_info_parse(struct ynl_parse_arg *yarg,
+ const struct nlattr *nested)
+{
+ struct netdev_page_pool_info *dst = yarg->data;
+ const struct nlattr *attr;
+
+ mnl_attr_for_each_nested(attr, nested) {
+ unsigned int type = mnl_attr_get_type(attr);
+
+ if (type == NETDEV_A_PAGE_POOL_ID) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.id = 1;
+ dst->id = mnl_attr_get_uint(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_IFINDEX) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.ifindex = 1;
+ dst->ifindex = mnl_attr_get_u32(attr);
+ }
+ }
+
+ return 0;
+}
+
/* ============== NETDEV_CMD_DEV_GET ============== */
/* NETDEV_CMD_DEV_GET - do */
void netdev_dev_get_req_free(struct netdev_dev_get_req *req)
@@ -197,6 +290,314 @@ void netdev_dev_get_ntf_free(struct netdev_dev_get_ntf *rsp)
free(rsp);
}
+/* ============== NETDEV_CMD_PAGE_POOL_GET ============== */
+/* NETDEV_CMD_PAGE_POOL_GET - do */
+void netdev_page_pool_get_req_free(struct netdev_page_pool_get_req *req)
+{
+ free(req);
+}
+
+void netdev_page_pool_get_rsp_free(struct netdev_page_pool_get_rsp *rsp)
+{
+ free(rsp);
+}
+
+int netdev_page_pool_get_rsp_parse(const struct nlmsghdr *nlh, void *data)
+{
+ struct netdev_page_pool_get_rsp *dst;
+ struct ynl_parse_arg *yarg = data;
+ const struct nlattr *attr;
+
+ dst = yarg->data;
+
+ mnl_attr_for_each(attr, nlh, sizeof(struct genlmsghdr)) {
+ unsigned int type = mnl_attr_get_type(attr);
+
+ if (type == NETDEV_A_PAGE_POOL_ID) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.id = 1;
+ dst->id = mnl_attr_get_uint(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_IFINDEX) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.ifindex = 1;
+ dst->ifindex = mnl_attr_get_u32(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_NAPI_ID) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.napi_id = 1;
+ dst->napi_id = mnl_attr_get_uint(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_INFLIGHT) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.inflight = 1;
+ dst->inflight = mnl_attr_get_uint(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_INFLIGHT_MEM) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.inflight_mem = 1;
+ dst->inflight_mem = mnl_attr_get_uint(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_DESTROYED) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.destroyed = 1;
+ dst->destroyed = mnl_attr_get_uint(attr);
+ }
+ }
+
+ return MNL_CB_OK;
+}
+
+struct netdev_page_pool_get_rsp *
+netdev_page_pool_get(struct ynl_sock *ys, struct netdev_page_pool_get_req *req)
+{
+ struct ynl_req_state yrs = { .yarg = { .ys = ys, }, };
+ struct netdev_page_pool_get_rsp *rsp;
+ struct nlmsghdr *nlh;
+ int err;
+
+ nlh = ynl_gemsg_start_req(ys, ys->family_id, NETDEV_CMD_PAGE_POOL_GET, 1);
+ ys->req_policy = &netdev_page_pool_nest;
+ yrs.yarg.rsp_policy = &netdev_page_pool_nest;
+
+ if (req->_present.id)
+ mnl_attr_put_uint(nlh, NETDEV_A_PAGE_POOL_ID, req->id);
+
+ rsp = calloc(1, sizeof(*rsp));
+ yrs.yarg.data = rsp;
+ yrs.cb = netdev_page_pool_get_rsp_parse;
+ yrs.rsp_cmd = NETDEV_CMD_PAGE_POOL_GET;
+
+ err = ynl_exec(ys, nlh, &yrs);
+ if (err < 0)
+ goto err_free;
+
+ return rsp;
+
+err_free:
+ netdev_page_pool_get_rsp_free(rsp);
+ return NULL;
+}
+
+/* NETDEV_CMD_PAGE_POOL_GET - dump */
+void netdev_page_pool_get_list_free(struct netdev_page_pool_get_list *rsp)
+{
+ struct netdev_page_pool_get_list *next = rsp;
+
+ while ((void *)next != YNL_LIST_END) {
+ rsp = next;
+ next = rsp->next;
+
+ free(rsp);
+ }
+}
+
+struct netdev_page_pool_get_list *
+netdev_page_pool_get_dump(struct ynl_sock *ys)
+{
+ struct ynl_dump_state yds = {};
+ struct nlmsghdr *nlh;
+ int err;
+
+ yds.ys = ys;
+ yds.alloc_sz = sizeof(struct netdev_page_pool_get_list);
+ yds.cb = netdev_page_pool_get_rsp_parse;
+ yds.rsp_cmd = NETDEV_CMD_PAGE_POOL_GET;
+ yds.rsp_policy = &netdev_page_pool_nest;
+
+ nlh = ynl_gemsg_start_dump(ys, ys->family_id, NETDEV_CMD_PAGE_POOL_GET, 1);
+
+ err = ynl_exec_dump(ys, nlh, &yds);
+ if (err < 0)
+ goto free_list;
+
+ return yds.first;
+
+free_list:
+ netdev_page_pool_get_list_free(yds.first);
+ return NULL;
+}
+
+/* NETDEV_CMD_PAGE_POOL_GET - notify */
+void netdev_page_pool_get_ntf_free(struct netdev_page_pool_get_ntf *rsp)
+{
+ free(rsp);
+}
+
+/* ============== NETDEV_CMD_PAGE_POOL_STATS_GET ============== */
+/* NETDEV_CMD_PAGE_POOL_STATS_GET - do */
+void
+netdev_page_pool_stats_get_req_free(struct netdev_page_pool_stats_get_req *req)
+{
+ netdev_page_pool_info_free(&req->info);
+ free(req);
+}
+
+void
+netdev_page_pool_stats_get_rsp_free(struct netdev_page_pool_stats_get_rsp *rsp)
+{
+ netdev_page_pool_info_free(&rsp->info);
+ free(rsp);
+}
+
+int netdev_page_pool_stats_get_rsp_parse(const struct nlmsghdr *nlh,
+ void *data)
+{
+ struct netdev_page_pool_stats_get_rsp *dst;
+ struct ynl_parse_arg *yarg = data;
+ const struct nlattr *attr;
+ struct ynl_parse_arg parg;
+
+ dst = yarg->data;
+ parg.ys = yarg->ys;
+
+ mnl_attr_for_each(attr, nlh, sizeof(struct genlmsghdr)) {
+ unsigned int type = mnl_attr_get_type(attr);
+
+ if (type == NETDEV_A_PAGE_POOL_STATS_INFO) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.info = 1;
+
+ parg.rsp_policy = &netdev_page_pool_info_nest;
+ parg.data = &dst->info;
+ if (netdev_page_pool_info_parse(&parg, attr))
+ return MNL_CB_ERROR;
+ } else if (type == NETDEV_A_PAGE_POOL_STATS_ALLOC_FAST) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.alloc_fast = 1;
+ dst->alloc_fast = mnl_attr_get_uint(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.alloc_slow = 1;
+ dst->alloc_slow = mnl_attr_get_uint(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW_HIGH_ORDER) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.alloc_slow_high_order = 1;
+ dst->alloc_slow_high_order = mnl_attr_get_uint(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_STATS_ALLOC_EMPTY) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.alloc_empty = 1;
+ dst->alloc_empty = mnl_attr_get_uint(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_STATS_ALLOC_REFILL) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.alloc_refill = 1;
+ dst->alloc_refill = mnl_attr_get_uint(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_STATS_ALLOC_WAIVE) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.alloc_waive = 1;
+ dst->alloc_waive = mnl_attr_get_uint(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHED) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.recycle_cached = 1;
+ dst->recycle_cached = mnl_attr_get_uint(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHE_FULL) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.recycle_cache_full = 1;
+ dst->recycle_cache_full = mnl_attr_get_uint(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.recycle_ring = 1;
+ dst->recycle_ring = mnl_attr_get_uint(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING_FULL) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.recycle_ring_full = 1;
+ dst->recycle_ring_full = mnl_attr_get_uint(attr);
+ } else if (type == NETDEV_A_PAGE_POOL_STATS_RECYCLE_RELEASED_REFCNT) {
+ if (ynl_attr_validate(yarg, attr))
+ return MNL_CB_ERROR;
+ dst->_present.recycle_released_refcnt = 1;
+ dst->recycle_released_refcnt = mnl_attr_get_uint(attr);
+ }
+ }
+
+ return MNL_CB_OK;
+}
+
+struct netdev_page_pool_stats_get_rsp *
+netdev_page_pool_stats_get(struct ynl_sock *ys,
+ struct netdev_page_pool_stats_get_req *req)
+{
+ struct ynl_req_state yrs = { .yarg = { .ys = ys, }, };
+ struct netdev_page_pool_stats_get_rsp *rsp;
+ struct nlmsghdr *nlh;
+ int err;
+
+ nlh = ynl_gemsg_start_req(ys, ys->family_id, NETDEV_CMD_PAGE_POOL_STATS_GET, 1);
+ ys->req_policy = &netdev_page_pool_stats_nest;
+ yrs.yarg.rsp_policy = &netdev_page_pool_stats_nest;
+
+ if (req->_present.info)
+ netdev_page_pool_info_put(nlh, NETDEV_A_PAGE_POOL_STATS_INFO, &req->info);
+
+ rsp = calloc(1, sizeof(*rsp));
+ yrs.yarg.data = rsp;
+ yrs.cb = netdev_page_pool_stats_get_rsp_parse;
+ yrs.rsp_cmd = NETDEV_CMD_PAGE_POOL_STATS_GET;
+
+ err = ynl_exec(ys, nlh, &yrs);
+ if (err < 0)
+ goto err_free;
+
+ return rsp;
+
+err_free:
+ netdev_page_pool_stats_get_rsp_free(rsp);
+ return NULL;
+}
+
+/* NETDEV_CMD_PAGE_POOL_STATS_GET - dump */
+void
+netdev_page_pool_stats_get_list_free(struct netdev_page_pool_stats_get_list *rsp)
+{
+ struct netdev_page_pool_stats_get_list *next = rsp;
+
+ while ((void *)next != YNL_LIST_END) {
+ rsp = next;
+ next = rsp->next;
+
+ netdev_page_pool_info_free(&rsp->obj.info);
+ free(rsp);
+ }
+}
+
+struct netdev_page_pool_stats_get_list *
+netdev_page_pool_stats_get_dump(struct ynl_sock *ys)
+{
+ struct ynl_dump_state yds = {};
+ struct nlmsghdr *nlh;
+ int err;
+
+ yds.ys = ys;
+ yds.alloc_sz = sizeof(struct netdev_page_pool_stats_get_list);
+ yds.cb = netdev_page_pool_stats_get_rsp_parse;
+ yds.rsp_cmd = NETDEV_CMD_PAGE_POOL_STATS_GET;
+ yds.rsp_policy = &netdev_page_pool_stats_nest;
+
+ nlh = ynl_gemsg_start_dump(ys, ys->family_id, NETDEV_CMD_PAGE_POOL_STATS_GET, 1);
+
+ err = ynl_exec_dump(ys, nlh, &yds);
+ if (err < 0)
+ goto free_list;
+
+ return yds.first;
+
+free_list:
+ netdev_page_pool_stats_get_list_free(yds.first);
+ return NULL;
+}
+
static const struct ynl_ntf_info netdev_ntf_info[] = {
[NETDEV_CMD_DEV_ADD_NTF] = {
.alloc_sz = sizeof(struct netdev_dev_get_ntf),
@@ -216,6 +617,24 @@ static const struct ynl_ntf_info netdev_ntf_info[] = {
.policy = &netdev_dev_nest,
.free = (void *)netdev_dev_get_ntf_free,
},
+ [NETDEV_CMD_PAGE_POOL_ADD_NTF] = {
+ .alloc_sz = sizeof(struct netdev_page_pool_get_ntf),
+ .cb = netdev_page_pool_get_rsp_parse,
+ .policy = &netdev_page_pool_nest,
+ .free = (void *)netdev_page_pool_get_ntf_free,
+ },
+ [NETDEV_CMD_PAGE_POOL_DEL_NTF] = {
+ .alloc_sz = sizeof(struct netdev_page_pool_get_ntf),
+ .cb = netdev_page_pool_get_rsp_parse,
+ .policy = &netdev_page_pool_nest,
+ .free = (void *)netdev_page_pool_get_ntf_free,
+ },
+ [NETDEV_CMD_PAGE_POOL_CHANGE_NTF] = {
+ .alloc_sz = sizeof(struct netdev_page_pool_get_ntf),
+ .cb = netdev_page_pool_get_rsp_parse,
+ .policy = &netdev_page_pool_nest,
+ .free = (void *)netdev_page_pool_get_ntf_free,
+ },
};
const struct ynl_family ynl_netdev_family = {
diff --git a/tools/net/ynl/generated/netdev-user.h b/tools/net/ynl/generated/netdev-user.h
index 4fafac879df3..3f31bcc836a1 100644
--- a/tools/net/ynl/generated/netdev-user.h
+++ b/tools/net/ynl/generated/netdev-user.h
@@ -21,6 +21,16 @@ const char *netdev_xdp_act_str(enum netdev_xdp_act value);
const char *netdev_xdp_rx_metadata_str(enum netdev_xdp_rx_metadata value);
/* Common nested types */
+struct netdev_page_pool_info {
+ struct {
+ __u32 id:1;
+ __u32 ifindex:1;
+ } _present;
+
+ __u64 id;
+ __u32 ifindex;
+};
+
/* ============== NETDEV_CMD_DEV_GET ============== */
/* NETDEV_CMD_DEV_GET - do */
struct netdev_dev_get_req {
@@ -87,4 +97,165 @@ struct netdev_dev_get_ntf {
void netdev_dev_get_ntf_free(struct netdev_dev_get_ntf *rsp);
+/* ============== NETDEV_CMD_PAGE_POOL_GET ============== */
+/* NETDEV_CMD_PAGE_POOL_GET - do */
+struct netdev_page_pool_get_req {
+ struct {
+ __u32 id:1;
+ } _present;
+
+ __u64 id;
+};
+
+static inline struct netdev_page_pool_get_req *
+netdev_page_pool_get_req_alloc(void)
+{
+ return calloc(1, sizeof(struct netdev_page_pool_get_req));
+}
+void netdev_page_pool_get_req_free(struct netdev_page_pool_get_req *req);
+
+static inline void
+netdev_page_pool_get_req_set_id(struct netdev_page_pool_get_req *req, __u64 id)
+{
+ req->_present.id = 1;
+ req->id = id;
+}
+
+struct netdev_page_pool_get_rsp {
+ struct {
+ __u32 id:1;
+ __u32 ifindex:1;
+ __u32 napi_id:1;
+ __u32 inflight:1;
+ __u32 inflight_mem:1;
+ __u32 destroyed:1;
+ } _present;
+
+ __u64 id;
+ __u32 ifindex;
+ __u64 napi_id;
+ __u64 inflight;
+ __u64 inflight_mem;
+ __u64 destroyed;
+};
+
+void netdev_page_pool_get_rsp_free(struct netdev_page_pool_get_rsp *rsp);
+
+/*
+ * Get / dump information about Page Pools.
+(Only Page Pools associated with a net_device can be listed.)
+
+ */
+struct netdev_page_pool_get_rsp *
+netdev_page_pool_get(struct ynl_sock *ys, struct netdev_page_pool_get_req *req);
+
+/* NETDEV_CMD_PAGE_POOL_GET - dump */
+struct netdev_page_pool_get_list {
+ struct netdev_page_pool_get_list *next;
+ struct netdev_page_pool_get_rsp obj __attribute__ ((aligned (8)));
+};
+
+void netdev_page_pool_get_list_free(struct netdev_page_pool_get_list *rsp);
+
+struct netdev_page_pool_get_list *
+netdev_page_pool_get_dump(struct ynl_sock *ys);
+
+/* NETDEV_CMD_PAGE_POOL_GET - notify */
+struct netdev_page_pool_get_ntf {
+ __u16 family;
+ __u8 cmd;
+ struct ynl_ntf_base_type *next;
+ void (*free)(struct netdev_page_pool_get_ntf *ntf);
+ struct netdev_page_pool_get_rsp obj __attribute__ ((aligned (8)));
+};
+
+void netdev_page_pool_get_ntf_free(struct netdev_page_pool_get_ntf *rsp);
+
+/* ============== NETDEV_CMD_PAGE_POOL_STATS_GET ============== */
+/* NETDEV_CMD_PAGE_POOL_STATS_GET - do */
+struct netdev_page_pool_stats_get_req {
+ struct {
+ __u32 info:1;
+ } _present;
+
+ struct netdev_page_pool_info info;
+};
+
+static inline struct netdev_page_pool_stats_get_req *
+netdev_page_pool_stats_get_req_alloc(void)
+{
+ return calloc(1, sizeof(struct netdev_page_pool_stats_get_req));
+}
+void
+netdev_page_pool_stats_get_req_free(struct netdev_page_pool_stats_get_req *req);
+
+static inline void
+netdev_page_pool_stats_get_req_set_info_id(struct netdev_page_pool_stats_get_req *req,
+ __u64 id)
+{
+ req->_present.info = 1;
+ req->info._present.id = 1;
+ req->info.id = id;
+}
+static inline void
+netdev_page_pool_stats_get_req_set_info_ifindex(struct netdev_page_pool_stats_get_req *req,
+ __u32 ifindex)
+{
+ req->_present.info = 1;
+ req->info._present.ifindex = 1;
+ req->info.ifindex = ifindex;
+}
+
+struct netdev_page_pool_stats_get_rsp {
+ struct {
+ __u32 info:1;
+ __u32 alloc_fast:1;
+ __u32 alloc_slow:1;
+ __u32 alloc_slow_high_order:1;
+ __u32 alloc_empty:1;
+ __u32 alloc_refill:1;
+ __u32 alloc_waive:1;
+ __u32 recycle_cached:1;
+ __u32 recycle_cache_full:1;
+ __u32 recycle_ring:1;
+ __u32 recycle_ring_full:1;
+ __u32 recycle_released_refcnt:1;
+ } _present;
+
+ struct netdev_page_pool_info info;
+ __u64 alloc_fast;
+ __u64 alloc_slow;
+ __u64 alloc_slow_high_order;
+ __u64 alloc_empty;
+ __u64 alloc_refill;
+ __u64 alloc_waive;
+ __u64 recycle_cached;
+ __u64 recycle_cache_full;
+ __u64 recycle_ring;
+ __u64 recycle_ring_full;
+ __u64 recycle_released_refcnt;
+};
+
+void
+netdev_page_pool_stats_get_rsp_free(struct netdev_page_pool_stats_get_rsp *rsp);
+
+/*
+ * Get page pool statistics.
+ */
+struct netdev_page_pool_stats_get_rsp *
+netdev_page_pool_stats_get(struct ynl_sock *ys,
+ struct netdev_page_pool_stats_get_req *req);
+
+/* NETDEV_CMD_PAGE_POOL_STATS_GET - dump */
+struct netdev_page_pool_stats_get_list {
+ struct netdev_page_pool_stats_get_list *next;
+ struct netdev_page_pool_stats_get_rsp obj __attribute__ ((aligned (8)));
+};
+
+void
+netdev_page_pool_stats_get_list_free(struct netdev_page_pool_stats_get_list *rsp);
+
+struct netdev_page_pool_stats_get_list *
+netdev_page_pool_stats_get_dump(struct ynl_sock *ys);
+
#endif /* _LINUX_NETDEV_GEN_H */
diff --git a/tools/net/ynl/lib/ynl.h b/tools/net/ynl/lib/ynl.h
index e974378e3b8c..075d868f3b57 100644
--- a/tools/net/ynl/lib/ynl.h
+++ b/tools/net/ynl/lib/ynl.h
@@ -239,7 +239,7 @@ int ynl_error_parse(struct ynl_parse_arg *yarg, const char *msg);
#ifndef MNL_HAS_AUTO_SCALARS
static inline uint64_t mnl_attr_get_uint(const struct nlattr *attr)
{
- if (mnl_attr_get_len(attr) == 4)
+ if (mnl_attr_get_payload_len(attr) == 4)
return mnl_attr_get_u32(attr);
return mnl_attr_get_u64(attr);
}
diff --git a/tools/net/ynl/samples/.gitignore b/tools/net/ynl/samples/.gitignore
index 2aae60c4829f..49637b26c482 100644
--- a/tools/net/ynl/samples/.gitignore
+++ b/tools/net/ynl/samples/.gitignore
@@ -1,3 +1,4 @@
ethtool
devlink
netdev
+page-pool
\ No newline at end of file
diff --git a/tools/net/ynl/samples/Makefile b/tools/net/ynl/samples/Makefile
index 3dbb106e87d9..1afefc266b7a 100644
--- a/tools/net/ynl/samples/Makefile
+++ b/tools/net/ynl/samples/Makefile
@@ -18,7 +18,7 @@ include $(wildcard *.d)
all: $(BINS)
-$(BINS): ../lib/ynl.a ../generated/protos.a
+$(BINS): ../lib/ynl.a ../generated/protos.a $(SRCS)
@echo -e '\tCC sample $@'
@$(COMPILE.c) $(CFLAGS_$@) $@.c -o $@.o
@$(LINK.c) $@.o -o $@ $(LDLIBS)
diff --git a/tools/net/ynl/samples/page-pool.c b/tools/net/ynl/samples/page-pool.c
new file mode 100644
index 000000000000..18d359713469
--- /dev/null
+++ b/tools/net/ynl/samples/page-pool.c
@@ -0,0 +1,147 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+
+#include <stdio.h>
+#include <string.h>
+
+#include <ynl.h>
+
+#include <net/if.h>
+
+#include "netdev-user.h"
+
+struct stat {
+ unsigned int ifc;
+
+ struct {
+ unsigned int cnt;
+ size_t refs, bytes;
+ } live[2];
+
+ size_t alloc_slow, alloc_fast, recycle_ring, recycle_cache;
+};
+
+struct stats_array {
+ unsigned int i, max;
+ struct stat *s;
+};
+
+static struct stat *find_ifc(struct stats_array *a, unsigned int ifindex)
+{
+ unsigned int i;
+
+ for (i = 0; i < a->i; i++) {
+ if (a->s[i].ifc == ifindex)
+ return &a->s[i];
+ }
+
+ a->i++;
+ if (a->i == a->max) {
+ a->max *= 2;
+ a->s = reallocarray(a->s, a->max, sizeof(*a->s));
+ }
+ a->s[i].ifc = ifindex;
+ return &a->s[i];
+}
+
+static void count(struct stat *s, unsigned int l,
+ struct netdev_page_pool_get_rsp *pp)
+{
+ s->live[l].cnt++;
+ if (pp->_present.inflight)
+ s->live[l].refs += pp->inflight;
+ if (pp->_present.inflight_mem)
+ s->live[l].bytes += pp->inflight_mem;
+}
+
+int main(int argc, char **argv)
+{
+ struct netdev_page_pool_stats_get_list *pp_stats;
+ struct netdev_page_pool_get_list *pools;
+ struct stats_array a = {};
+ struct ynl_error yerr;
+ struct ynl_sock *ys;
+
+ ys = ynl_sock_create(&ynl_netdev_family, &yerr);
+ if (!ys) {
+ fprintf(stderr, "YNL: %s\n", yerr.msg);
+ return 1;
+ }
+
+ a.max = 128;
+ a.s = calloc(a.max, sizeof(*a.s));
+ if (!a.s)
+ goto err_close;
+
+ pools = netdev_page_pool_get_dump(ys);
+ if (!pools)
+ goto err_free;
+
+ ynl_dump_foreach(pools, pp) {
+ struct stat *s = find_ifc(&a, pp->ifindex);
+
+ count(s, 1, pp);
+ if (pp->_present.destroyed)
+ count(s, 0, pp);
+ }
+ netdev_page_pool_get_list_free(pools);
+
+ pp_stats = netdev_page_pool_stats_get_dump(ys);
+ if (!pp_stats)
+ goto err_free;
+
+ ynl_dump_foreach(pp_stats, pp) {
+ struct stat *s = find_ifc(&a, pp->info.ifindex);
+
+ if (pp->_present.alloc_fast)
+ s->alloc_fast += pp->alloc_fast;
+ if (pp->_present.alloc_slow)
+ s->alloc_slow += pp->alloc_slow;
+ if (pp->_present.recycle_ring)
+ s->recycle_ring += pp->recycle_ring;
+ if (pp->_present.recycle_cached)
+ s->recycle_cache += pp->recycle_cached;
+ }
+ netdev_page_pool_stats_get_list_free(pp_stats);
+
+ for (unsigned int i = 0; i < a.i; i++) {
+ char ifname[IF_NAMESIZE];
+ struct stat *s = &a.s[i];
+ const char *name;
+ double recycle;
+
+ if (!s->ifc) {
+ name = "<orphan>\t";
+ } else {
+ name = if_indextoname(s->ifc, ifname);
+ if (name)
+ printf("%8s", name);
+ printf("[%d]\t", s->ifc);
+ }
+
+ printf("page pools: %u (zombies: %u)\n",
+ s->live[1].cnt, s->live[0].cnt);
+ printf("\t\trefs: %zu bytes: %zu (refs: %zu bytes: %zu)\n",
+ s->live[1].refs, s->live[1].bytes,
+ s->live[0].refs, s->live[0].bytes);
+
+ /* We don't know how many pages are sitting in cache and ring
+ * so we will under-count the recycling rate a bit.
+ */
+ recycle = (double)(s->recycle_ring + s->recycle_cache) /
+ (s->alloc_fast + s->alloc_slow) * 100;
+ printf("\t\trecycling: %.1lf%% (alloc: %zu:%zu recycle: %zu:%zu)\n",
+ recycle, s->alloc_slow, s->alloc_fast,
+ s->recycle_ring, s->recycle_cache);
+ }
+
+ ynl_sock_destroy(ys);
+ return 0;
+
+err_free:
+ free(a.s);
+err_close:
+ fprintf(stderr, "YNL: %s\n", ys->err.msg);
+ ynl_sock_destroy(ys);
+ return 2;
+}
--
2.41.0
^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH net-next 00/15] net: page_pool: add netlink-based introspection
2023-10-24 16:02 [PATCH net-next 00/15] net: page_pool: add netlink-based introspection Jakub Kicinski
` (14 preceding siblings ...)
2023-10-24 16:02 ` [PATCH net-next 15/15] tools: ynl: add sample for getting page-pool information Jakub Kicinski
@ 2023-11-09 8:11 ` Ilias Apalodimas
2023-11-09 16:14 ` Jakub Kicinski
15 siblings, 1 reply; 39+ messages in thread
From: Ilias Apalodimas @ 2023-11-09 8:11 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: davem, netdev, edumazet, pabeni, almasrymina, hawk
Hi Jakub,
On Tue, 24 Oct 2023 at 19:02, Jakub Kicinski <kuba@kernel.org> wrote:
>
> This is a new revision of the RFC posted in August:
> https://lore.kernel.org/all/20230816234303.3786178-1-kuba@kernel.org/
> There's been a handful of fixes and tweaks but the overall
> architecture is unchanged.
>
> As a reminder the RFC was posted as the first step towards
> an API which could configure the page pools (GET API as a stepping
> stone for a SET API to come later). I wasn't sure whether we should
> commit to the GET API before the SET takes shape, hence the large
> delay between versions.
>
> Unfortunately, real deployment experience made this series much more
> urgent. We recently started to deploy newer kernels / drivers
> at Meta, making significant use of page pools for the first time.
That's nice and scary at the same time!
> We immediately run into page pool leaks both real and false positive
> warnings. As Eric pointed out/predicted there's no guarantee that
> applications will read / close their sockets so a page pool page
> may be stuck in a socket (but not leaked) forever. This happens
> a lot in our fleet. Most of these are obviously due to application
> bugs but we should not be printing kernel warnings due to minor
> application resource leaks.
Fair enough, I guess you mean 'continuous warnings'?
>
> Conversely the page pool memory may get leaked at runtime, and
> we have no way to detect / track that, unless someone reconfigures
> the NIC and destroys the page pools which leaked the pages.
>
> The solution presented here is to expose the memory use of page
> pools via netlink. This allows for continuous monitoring of memory
> used by page pools, regardless if they were destroyed or not.
> Sample in patch 15 can print the memory use and recycling
> efficiency:
>
> $ ./page-pool
> eth0[2] page pools: 10 (zombies: 0)
> refs: 41984 bytes: 171966464 (refs: 0 bytes: 0)
> recycling: 90.3% (alloc: 656:397681 recycle: 89652:270201)
>
That's reasonable, and the recycling rate is pretty impressive. Any
idea how that translated to enhancements overall? mem/cpu pressure etc
Thanks
/Ilias
> The main change compared to the RFC is that the API now exposes
> outstanding references and byte counts even for "live" page pools.
> The warning is no longer printed if page pool is accessible via netlink.
>
> Jakub Kicinski (15):
> net: page_pool: split the page_pool_params into fast and slow
> net: page_pool: avoid touching slow on the fastpath
> net: page_pool: factor out uninit
> net: page_pool: id the page pools
> net: page_pool: record pools per netdev
> net: page_pool: stash the NAPI ID for easier access
> eth: link netdev to page_pools in drivers
> net: page_pool: add nlspec for basic access to page pools
> net: page_pool: implement GET in the netlink API
> net: page_pool: add netlink notifications for state changes
> net: page_pool: report amount of memory held by page pools
> net: page_pool: report when page pool was destroyed
> net: page_pool: expose page pool stats via netlink
> net: page_pool: mute the periodic warning for visible page pools
> tools: ynl: add sample for getting page-pool information
>
> Documentation/netlink/specs/netdev.yaml | 161 +++++++
> Documentation/networking/page_pool.rst | 10 +-
> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 1 +
> .../net/ethernet/mellanox/mlx5/core/en_main.c | 1 +
> drivers/net/ethernet/microsoft/mana/mana_en.c | 1 +
> include/linux/list.h | 20 +
> include/linux/netdevice.h | 4 +
> include/linux/poison.h | 2 +
> include/net/page_pool/helpers.h | 8 +-
> include/net/page_pool/types.h | 43 +-
> include/uapi/linux/netdev.h | 36 ++
> net/core/Makefile | 2 +-
> net/core/netdev-genl-gen.c | 52 +++
> net/core/netdev-genl-gen.h | 11 +
> net/core/page_pool.c | 78 ++--
> net/core/page_pool_priv.h | 12 +
> net/core/page_pool_user.c | 414 +++++++++++++++++
> tools/include/uapi/linux/netdev.h | 36 ++
> tools/net/ynl/generated/netdev-user.c | 419 ++++++++++++++++++
> tools/net/ynl/generated/netdev-user.h | 171 +++++++
> tools/net/ynl/lib/ynl.h | 2 +-
> tools/net/ynl/samples/.gitignore | 1 +
> tools/net/ynl/samples/Makefile | 2 +-
> tools/net/ynl/samples/page-pool.c | 147 ++++++
> 24 files changed, 1586 insertions(+), 48 deletions(-)
> create mode 100644 net/core/page_pool_priv.h
> create mode 100644 net/core/page_pool_user.c
> create mode 100644 tools/net/ynl/samples/page-pool.c
>
> --
> 2.41.0
>
* Re: [PATCH net-next 00/15] net: page_pool: add netlink-based introspection
2023-11-09 8:11 ` [PATCH net-next 00/15] net: page_pool: add netlink-based introspection Ilias Apalodimas
@ 2023-11-09 16:14 ` Jakub Kicinski
0 siblings, 0 replies; 39+ messages in thread
From: Jakub Kicinski @ 2023-11-09 16:14 UTC (permalink / raw)
To: Ilias Apalodimas; +Cc: davem, netdev, edumazet, pabeni, almasrymina, hawk
On Thu, 9 Nov 2023 10:11:47 +0200 Ilias Apalodimas wrote:
> > We immediately run into page pool leaks both real and false positive
> > warnings. As Eric pointed out/predicted there's no guarantee that
> > applications will read / close their sockets so a page pool page
> > may be stuck in a socket (but not leaked) forever. This happens
> > a lot in our fleet. Most of these are obviously due to application
> > bugs but we should not be printing kernel warnings due to minor
> > application resource leaks.
>
> Fair enough, I guess you mean 'continuous warnings'?
Yes, in this case but I'm making a general statement.
Or do you mean that there's a typo / grammar issue?
> > Conversely the page pool memory may get leaked at runtime, and
> > we have no way to detect / track that, unless someone reconfigures
> > the NIC and destroys the page pools which leaked the pages.
> >
> > The solution presented here is to expose the memory use of page
> > pools via netlink. This allows for continuous monitoring of memory
> > used by page pools, regardless if they were destroyed or not.
> > Sample in patch 15 can print the memory use and recycling
> > efficiency:
> >
> > $ ./page-pool
> > eth0[2] page pools: 10 (zombies: 0)
> > refs: 41984 bytes: 171966464 (refs: 0 bytes: 0)
> > recycling: 90.3% (alloc: 656:397681 recycle: 89652:270201)
>
> That's reasonable, and the recycling rate is pretty impressive.
This is just from a test machine, fresh boot, maybe a short iperf run,
I don't remember now :) In any case not real workload.
> Any idea how that translated to enhancements overall? mem/cpu pressure etc
I haven't collected much prod data at this stage, I'm hoping to add
this to the internal kernel and then do a more thorough investigation.