* [PATCH net-next v2 1/4] net: Register system page pool as an XDP memory model
       [not found] <20240220210342.40267-1-toke@redhat.com>
@ 2024-02-20 21:03 ` Toke Høiland-Jørgensen
  2024-04-03 20:20   ` John Fastabend
  2024-02-20 21:03 ` [PATCH net-next v2 2/4] bpf: test_run: Use system page pool for XDP live frame mode Toke Høiland-Jørgensen
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Toke Høiland-Jørgensen @ 2024-02-20 21:03 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend
  Cc: Toke Høiland-Jørgensen, Alexander Lobakin, netdev, bpf

To make the system page pool usable as a source for allocating XDP
frames, we need to register it with xdp_reg_mem_model(), so that page
return works correctly. This is done in preparation for using the system
page pool for the XDP live frame mode in BPF_TEST_RUN; for the same
reason, make the per-cpu variable non-static so we can access it from
the test_run code as well.

Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Tested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
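As an aside (not part of the commit message): a rough sketch of what the
registration enables. Once the per-cpu pool has an XDP mem id, frames
built from its pages can be freed through the normal XDP return path.
build_frame_in_page() below is a hypothetical stand-in for whatever
frame construction a caller does.

static int example_alloc_and_return(void)
{
	struct page_pool *pp = this_cpu_read(system_page_pool);
	struct page *page = page_pool_dev_alloc_pages(pp);
	struct xdp_frame *frm;

	if (!page)
		return -ENOMEM;

	frm = build_frame_in_page(page);	/* hypothetical helper */
	frm->mem.type = MEM_TYPE_PAGE_POOL;
	frm->mem.id = pp->xdp_mem_id;		/* set by xdp_reg_mem_model() */

	/* resolves mem.id back to the per-cpu pool and recycles the page */
	xdp_return_frame(frm);
	return 0;
}
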
 include/linux/netdevice.h |  1 +
 net/core/dev.c            | 13 ++++++++++++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c541550b0e6e..e1dfdf0c4075 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3345,6 +3345,7 @@ static inline void input_queue_tail_incr_save(struct softnet_data *sd,
 }
 
 DECLARE_PER_CPU_ALIGNED(struct softnet_data, softnet_data);
+DECLARE_PER_CPU_ALIGNED(struct page_pool *, system_page_pool);
 
 static inline int dev_recursion_level(void)
 {
diff --git a/net/core/dev.c b/net/core/dev.c
index d8dd293a7a27..cdb916a647e7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -428,7 +428,7 @@ EXPORT_PER_CPU_SYMBOL(softnet_data);
  * PP consumers must pay attention to run APIs in the appropriate context
  * (e.g. NAPI context).
  */
-static DEFINE_PER_CPU_ALIGNED(struct page_pool *, system_page_pool);
+DEFINE_PER_CPU_ALIGNED(struct page_pool *, system_page_pool);
 
 #ifdef CONFIG_LOCKDEP
 /*
@@ -11739,12 +11739,20 @@ static int net_page_pool_create(int cpuid)
 		.pool_size = SYSTEM_PERCPU_PAGE_POOL_SIZE,
 		.nid = NUMA_NO_NODE,
 	};
+	struct xdp_mem_info info;
 	struct page_pool *pp_ptr;
+	int err;
 
 	pp_ptr = page_pool_create_percpu(&page_pool_params, cpuid);
 	if (IS_ERR(pp_ptr))
 		return -ENOMEM;
 
+	err = xdp_reg_mem_model(&info, MEM_TYPE_PAGE_POOL, pp_ptr);
+	if (err) {
+		page_pool_destroy(pp_ptr);
+		return err;
+	}
+
 	per_cpu(system_page_pool, cpuid) = pp_ptr;
 #endif
 	return 0;
@@ -11834,12 +11842,15 @@ static int __init net_dev_init(void)
 out:
 	if (rc < 0) {
 		for_each_possible_cpu(i) {
+			struct xdp_mem_info mem = { .type = MEM_TYPE_PAGE_POOL };
 			struct page_pool *pp_ptr;
 
 			pp_ptr = per_cpu(system_page_pool, i);
 			if (!pp_ptr)
 				continue;
 
+			mem.id = pp_ptr->xdp_mem_id;
+			xdp_unreg_mem_model(&mem);
 			page_pool_destroy(pp_ptr);
 			per_cpu(system_page_pool, i) = NULL;
 		}
-- 
2.43.0



* [PATCH net-next v2 2/4] bpf: test_run: Use system page pool for XDP live frame mode
       [not found] <20240220210342.40267-1-toke@redhat.com>
  2024-02-20 21:03 ` [PATCH net-next v2 1/4] net: Register system page pool as an XDP memory model Toke Høiland-Jørgensen
@ 2024-02-20 21:03 ` Toke Høiland-Jørgensen
  2024-02-21 14:48   ` Toke Høiland-Jørgensen
  2024-04-03 16:34   ` Alexander Lobakin
  2024-02-20 21:03 ` [PATCH net-next v2 3/4] bpf: test_run: Fix cacheline alignment of live XDP frame data structures Toke Høiland-Jørgensen
  2024-02-20 21:03 ` [PATCH net-next v2 4/4] page pool: Remove init_callback parameter Toke Høiland-Jørgensen
  3 siblings, 2 replies; 14+ messages in thread
From: Toke Høiland-Jørgensen @ 2024-02-20 21:03 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer
  Cc: Toke Høiland-Jørgensen, Alexander Lobakin, Eric Dumazet,
	Paolo Abeni, bpf, netdev

The BPF_TEST_RUN code in XDP live frame mode creates a new page pool
each time it is called and uses that to allocate the frames used for the
XDP run. This works well if the syscall is used with a high repetitions
number, as it allows for efficient page recycling. However, if used with
a small number of repetitions, the overhead of creating and tearing down
the page pool is significant, and can even lead to system stalls if the
syscall is called in a tight loop.

Now that we have a persistent system page pool instance, it becomes
pretty straight forward to change the test_run code to use it. The only
wrinkle is that we can no longer rely on a custom page init callback
from page_pool itself; instead, we change the test_run code to write a
random cookie value to the beginning of the page as an indicator that
the page has been initialised and can be re-used without copying the
initial data again.

The cookie is a random 128-bit value, which means the probability that
we will get accidental collisions (which would lead to recycling the
wrong page values and reading garbage) is on the order of 2^-128. This
is in the "won't happen before the heat death of the universe" range, so
this marking is safe for the intended usage.

Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Tested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
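As an aside (not part of the commit message): the usage pattern this is
optimising for, sketched from userspace with libbpf. The packet contents
and size are placeholders; the point is that a single BPF_PROG_RUN
syscall drives 'repeat' live XDP frames through the program, so the
per-syscall page pool setup cost dominates when 'repeat' is small.

#include <linux/bpf.h>
#include <bpf/bpf.h>

static int run_live_frames(int prog_fd, int repeat)
{
	char pkt[256] = { 0 };	/* dummy Ethernet frame */
	LIBBPF_OPTS(bpf_test_run_opts, opts,
		.data_in = pkt,
		.data_size_in = sizeof(pkt),
		.repeat = repeat,	/* 1 in a tight loop vs. millions */
		.flags = BPF_F_TEST_XDP_LIVE_FRAMES,
	);

	return bpf_prog_test_run_opts(prog_fd, &opts);
}
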
 net/bpf/test_run.c | 139 +++++++++++++++++++++++----------------------
 1 file changed, 71 insertions(+), 68 deletions(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index dfd919374017..60a36a4df3e1 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -94,10 +94,19 @@ static bool bpf_test_timer_continue(struct bpf_test_timer *t, int iterations,
 }
 
 /* We put this struct at the head of each page with a context and frame
- * initialised when the page is allocated, so we don't have to do this on each
- * repetition of the test run.
+ * initialised the first time a given page is used, saving the memcpy() of the
+ * data on subsequent repetition of the test run. The cookie value is used to
+ * mark the page data the first time we initialise it so we can skip it the next
+ * time we see that page.
  */
+
+struct xdp_page_cookie {
+	u64 val1;
+	u64 val2;
+};
+
 struct xdp_page_head {
+	struct xdp_page_cookie cookie;
 	struct xdp_buff orig_ctx;
 	struct xdp_buff ctx;
 	union {
@@ -111,10 +120,9 @@ struct xdp_test_data {
 	struct xdp_buff *orig_ctx;
 	struct xdp_rxq_info rxq;
 	struct net_device *dev;
-	struct page_pool *pp;
 	struct xdp_frame **frames;
 	struct sk_buff **skbs;
-	struct xdp_mem_info mem;
+	struct xdp_page_cookie cookie;
 	u32 batch_size;
 	u32 frame_cnt;
 };
@@ -126,48 +134,9 @@ struct xdp_test_data {
 #define TEST_XDP_FRAME_SIZE (PAGE_SIZE - sizeof(struct xdp_page_head))
 #define TEST_XDP_MAX_BATCH 256
 
-static void xdp_test_run_init_page(struct page *page, void *arg)
-{
-	struct xdp_page_head *head = phys_to_virt(page_to_phys(page));
-	struct xdp_buff *new_ctx, *orig_ctx;
-	u32 headroom = XDP_PACKET_HEADROOM;
-	struct xdp_test_data *xdp = arg;
-	size_t frm_len, meta_len;
-	struct xdp_frame *frm;
-	void *data;
-
-	orig_ctx = xdp->orig_ctx;
-	frm_len = orig_ctx->data_end - orig_ctx->data_meta;
-	meta_len = orig_ctx->data - orig_ctx->data_meta;
-	headroom -= meta_len;
-
-	new_ctx = &head->ctx;
-	frm = head->frame;
-	data = head->data;
-	memcpy(data + headroom, orig_ctx->data_meta, frm_len);
-
-	xdp_init_buff(new_ctx, TEST_XDP_FRAME_SIZE, &xdp->rxq);
-	xdp_prepare_buff(new_ctx, data, headroom, frm_len, true);
-	new_ctx->data = new_ctx->data_meta + meta_len;
-
-	xdp_update_frame_from_buff(new_ctx, frm);
-	frm->mem = new_ctx->rxq->mem;
-
-	memcpy(&head->orig_ctx, new_ctx, sizeof(head->orig_ctx));
-}
-
 static int xdp_test_run_setup(struct xdp_test_data *xdp, struct xdp_buff *orig_ctx)
 {
-	struct page_pool *pp;
 	int err = -ENOMEM;
-	struct page_pool_params pp_params = {
-		.order = 0,
-		.flags = 0,
-		.pool_size = xdp->batch_size,
-		.nid = NUMA_NO_NODE,
-		.init_callback = xdp_test_run_init_page,
-		.init_arg = xdp,
-	};
 
 	xdp->frames = kvmalloc_array(xdp->batch_size, sizeof(void *), GFP_KERNEL);
 	if (!xdp->frames)
@@ -177,34 +146,21 @@ static int xdp_test_run_setup(struct xdp_test_data *xdp, struct xdp_buff *orig_c
 	if (!xdp->skbs)
 		goto err_skbs;
 
-	pp = page_pool_create(&pp_params);
-	if (IS_ERR(pp)) {
-		err = PTR_ERR(pp);
-		goto err_pp;
-	}
-
-	/* will copy 'mem.id' into pp->xdp_mem_id */
-	err = xdp_reg_mem_model(&xdp->mem, MEM_TYPE_PAGE_POOL, pp);
-	if (err)
-		goto err_mmodel;
-
-	xdp->pp = pp;
-
 	/* We create a 'fake' RXQ referencing the original dev, but with an
 	 * xdp_mem_info pointing to our page_pool
 	 */
 	xdp_rxq_info_reg(&xdp->rxq, orig_ctx->rxq->dev, 0, 0);
-	xdp->rxq.mem.type = MEM_TYPE_PAGE_POOL;
-	xdp->rxq.mem.id = pp->xdp_mem_id;
+	xdp->rxq.mem.type = MEM_TYPE_PAGE_POOL; /* mem id is set per-frame below */
 	xdp->dev = orig_ctx->rxq->dev;
 	xdp->orig_ctx = orig_ctx;
 
+	/* We need a random cookie for each run as pages can stick around
+	 * between runs in the system page pool
+	 */
+	get_random_bytes(&xdp->cookie, sizeof(xdp->cookie));
+
 	return 0;
 
-err_mmodel:
-	page_pool_destroy(pp);
-err_pp:
-	kvfree(xdp->skbs);
 err_skbs:
 	kvfree(xdp->frames);
 	return err;
@@ -212,8 +168,6 @@ static int xdp_test_run_setup(struct xdp_test_data *xdp, struct xdp_buff *orig_c
 
 static void xdp_test_run_teardown(struct xdp_test_data *xdp)
 {
-	xdp_unreg_mem_model(&xdp->mem);
-	page_pool_destroy(xdp->pp);
 	kfree(xdp->frames);
 	kfree(xdp->skbs);
 }
@@ -235,8 +189,12 @@ static bool ctx_was_changed(struct xdp_page_head *head)
 		head->orig_ctx.data_end != head->ctx.data_end;
 }
 
-static void reset_ctx(struct xdp_page_head *head)
+static void reset_ctx(struct xdp_page_head *head, struct xdp_test_data *xdp)
 {
+	/* mem id can change if we migrate CPUs between batches */
+	if (head->frame->mem.id != xdp->rxq.mem.id)
+		head->frame->mem.id = xdp->rxq.mem.id;
+
 	if (likely(!frame_was_changed(head) && !ctx_was_changed(head)))
 		return;
 
@@ -246,6 +204,48 @@ static void reset_ctx(struct xdp_page_head *head)
 	xdp_update_frame_from_buff(&head->ctx, head->frame);
 }
 
+static struct xdp_page_head *
+xdp_test_run_init_page(struct page *page, struct xdp_test_data *xdp)
+{
+	struct xdp_page_head *head = phys_to_virt(page_to_phys(page));
+	struct xdp_buff *new_ctx, *orig_ctx;
+	u32 headroom = XDP_PACKET_HEADROOM;
+	size_t frm_len, meta_len;
+	struct xdp_frame *frm;
+	void *data;
+
+	/* Optimise for the recycle case, which is the normal case when doing
+	 * high-repetition REDIRECTS to drivers that return frames.
+	 */
+	if (likely(!memcmp(&head->cookie, &xdp->cookie, sizeof(head->cookie)))) {
+		reset_ctx(head, xdp);
+		return head;
+	}
+
+	head->cookie = xdp->cookie;
+
+	orig_ctx = xdp->orig_ctx;
+	frm_len = orig_ctx->data_end - orig_ctx->data_meta;
+	meta_len = orig_ctx->data - orig_ctx->data_meta;
+	headroom -= meta_len;
+
+	new_ctx = &head->ctx;
+	frm = head->frame;
+	data = head->data;
+	memcpy(data + headroom, orig_ctx->data_meta, frm_len);
+
+	xdp_init_buff(new_ctx, TEST_XDP_FRAME_SIZE, &xdp->rxq);
+	xdp_prepare_buff(new_ctx, data, headroom, frm_len, true);
+	new_ctx->data = new_ctx->data_meta + meta_len;
+
+	xdp_update_frame_from_buff(new_ctx, frm);
+	frm->mem = new_ctx->rxq->mem;
+
+	memcpy(&head->orig_ctx, new_ctx, sizeof(head->orig_ctx));
+
+	return head;
+}
+
 static int xdp_recv_frames(struct xdp_frame **frames, int nframes,
 			   struct sk_buff **skbs,
 			   struct net_device *dev)
@@ -287,6 +287,7 @@ static int xdp_test_run_batch(struct xdp_test_data *xdp, struct bpf_prog *prog,
 	struct xdp_page_head *head;
 	struct xdp_frame *frm;
 	bool redirect = false;
+	struct page_pool *pp;
 	struct xdp_buff *ctx;
 	struct page *page;
 
@@ -295,15 +296,17 @@ static int xdp_test_run_batch(struct xdp_test_data *xdp, struct bpf_prog *prog,
 	local_bh_disable();
 	xdp_set_return_frame_no_direct();
 
+	pp = this_cpu_read(system_page_pool);
+	xdp->rxq.mem.id = pp->xdp_mem_id;
+
 	for (i = 0; i < batch_sz; i++) {
-		page = page_pool_dev_alloc_pages(xdp->pp);
+		page = page_pool_dev_alloc_pages(pp);
 		if (!page) {
 			err = -ENOMEM;
 			goto out;
 		}
 
-		head = phys_to_virt(page_to_phys(page));
-		reset_ctx(head);
+		head = xdp_test_run_init_page(page, xdp);
 		ctx = &head->ctx;
 		frm = head->frame;
 		xdp->frame_cnt++;
-- 
2.43.0



* [PATCH net-next v2 3/4] bpf: test_run: Fix cacheline alignment of live XDP frame data structures
       [not found] <20240220210342.40267-1-toke@redhat.com>
  2024-02-20 21:03 ` [PATCH net-next v2 1/4] net: Register system page pool as an XDP memory model Toke Høiland-Jørgensen
  2024-02-20 21:03 ` [PATCH net-next v2 2/4] bpf: test_run: Use system page pool for XDP live frame mode Toke Høiland-Jørgensen
@ 2024-02-20 21:03 ` Toke Høiland-Jørgensen
  2024-02-20 21:03 ` [PATCH net-next v2 4/4] page pool: Remove init_callback parameter Toke Høiland-Jørgensen
  3 siblings, 0 replies; 14+ messages in thread
From: Toke Høiland-Jørgensen @ 2024-02-20 21:03 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer
  Cc: Toke Høiland-Jørgensen, Alexander Lobakin, Eric Dumazet,
	Paolo Abeni, bpf, netdev

The live XDP frame code in BPF_PROG_RUN suffered from suboptimal cache
line placement due to the forced cache line alignment of struct
xdp_rxq_info. Rearrange things so we don't waste a whole cache line on
padding, and also add explicit alignment to the data_hard_start field in
the start-of-page data structure we use for the data pages.

Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Tested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
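As an aside (not part of the commit message): a toy illustration of the
padding effect, assuming 64-byte cachelines. struct xdp_rxq_info is
____cacheline_aligned, so any member placed before it pushes it to the
next cacheline boundary and the gap is pure padding. The struct names
below are made up.

struct before {
	void *a;			/* bytes 0..7 */
	struct xdp_rxq_info rxq;	/* pushed to byte 64: 56 bytes of padding */
	void *b;			/* starts yet another cacheline */
};

struct after {
	struct xdp_rxq_info rxq;	/* starts at byte 0, no hole in front */
	void *a;			/* a and b share the cacheline after rxq, */
	void *b;			/* so the struct is one cacheline smaller  */
};
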
 net/bpf/test_run.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 60a36a4df3e1..485084c302b2 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -113,12 +113,12 @@ struct xdp_page_head {
 		/* ::data_hard_start starts here */
 		DECLARE_FLEX_ARRAY(struct xdp_frame, frame);
 		DECLARE_FLEX_ARRAY(u8, data);
-	};
+	} ____cacheline_aligned;
 };
 
 struct xdp_test_data {
-	struct xdp_buff *orig_ctx;
 	struct xdp_rxq_info rxq;
+	struct xdp_buff *orig_ctx;
 	struct net_device *dev;
 	struct xdp_frame **frames;
 	struct sk_buff **skbs;
-- 
2.43.0



* [PATCH net-next v2 4/4] page pool: Remove init_callback parameter
       [not found] <20240220210342.40267-1-toke@redhat.com>
                   ` (2 preceding siblings ...)
  2024-02-20 21:03 ` [PATCH net-next v2 3/4] bpf: test_run: Fix cacheline alignment of live XDP frame data structures Toke Høiland-Jørgensen
@ 2024-02-20 21:03 ` Toke Høiland-Jørgensen
  2024-02-29 18:12   ` Ilias Apalodimas
  3 siblings, 1 reply; 14+ messages in thread
From: Toke Høiland-Jørgensen @ 2024-02-20 21:03 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Ilias Apalodimas
  Cc: Toke Høiland-Jørgensen, Alexander Lobakin,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev

The only user of the init_callback parameter to page pool was the
BPF_TEST_RUN code. Since that has now been moved to use a different
scheme, we can get rid of the init callback entirely.

Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/net/page_pool/types.h | 4 ----
 net/core/page_pool.c          | 4 ----
 2 files changed, 8 deletions(-)

diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 3828396ae60c..2f5975ab2cd0 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -69,9 +69,6 @@ struct page_pool_params {
 	);
 	struct_group_tagged(page_pool_params_slow, slow,
 		struct net_device *netdev;
-/* private: used by test code only */
-		void (*init_callback)(struct page *page, void *arg);
-		void *init_arg;
 	);
 };
 
@@ -129,7 +126,6 @@ struct page_pool {
 	struct page_pool_params_fast p;
 
 	int cpuid;
-	bool has_init_callback;
 
 	long frag_users;
 	struct page *frag_page;
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 89c835fcf094..fd054b6f773a 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -217,8 +217,6 @@ static int page_pool_init(struct page_pool *pool,
 		 */
 	}
 
-	pool->has_init_callback = !!pool->slow.init_callback;
-
 #ifdef CONFIG_PAGE_POOL_STATS
 	pool->recycle_stats = alloc_percpu(struct page_pool_recycle_stats);
 	if (!pool->recycle_stats)
@@ -428,8 +426,6 @@ static void page_pool_set_pp_info(struct page_pool *pool,
 	 * the overhead is negligible.
 	 */
 	page_pool_fragment_page(page, 1);
-	if (pool->has_init_callback)
-		pool->slow.init_callback(page, pool->slow.init_arg);
 }
 
 static void page_pool_clear_pp_info(struct page *page)
-- 
2.43.0



* Re: [PATCH net-next v2 2/4] bpf: test_run: Use system page pool for XDP live frame mode
  2024-02-20 21:03 ` [PATCH net-next v2 2/4] bpf: test_run: Use system page pool for XDP live frame mode Toke Høiland-Jørgensen
@ 2024-02-21 14:48   ` Toke Høiland-Jørgensen
  2024-04-04 11:23     ` Jesper Dangaard Brouer
  2024-04-03 16:34   ` Alexander Lobakin
  1 sibling, 1 reply; 14+ messages in thread
From: Toke Høiland-Jørgensen @ 2024-02-21 14:48 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer
  Cc: Alexander Lobakin, Eric Dumazet, Paolo Abeni, bpf, netdev

Toke Høiland-Jørgensen <toke@redhat.com> writes:

> The cookie is a random 128-bit value, which means the probability that
> we will get accidental collisions (which would lead to recycling the
> wrong page values and reading garbage) is on the order of 2^-128. This
> is in the "won't happen before the heat death of the universe" range, so
> this marking is safe for the intended usage.

Alright, got a second opinion on this from someone better at security
than me; I'll go try out some different ideas :)

pw-bot: changes-requested



* Re: [PATCH net-next v2 4/4] page pool: Remove init_callback parameter
  2024-02-20 21:03 ` [PATCH net-next v2 4/4] page pool: Remove init_callback parameter Toke Høiland-Jørgensen
@ 2024-02-29 18:12   ` Ilias Apalodimas
  0 siblings, 0 replies; 14+ messages in thread
From: Ilias Apalodimas @ 2024-02-29 18:12 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Jesper Dangaard Brouer, Alexander Lobakin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Tue, 20 Feb 2024 at 23:03, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> The only user of the init_callback parameter to page pool was the
> BPF_TEST_RUN code. Since that has now been moved to use a different
> scheme, we can get rid of the init callback entirely.
>
> Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
>  include/net/page_pool/types.h | 4 ----
>  net/core/page_pool.c          | 4 ----
>  2 files changed, 8 deletions(-)
>
> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index 3828396ae60c..2f5975ab2cd0 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -69,9 +69,6 @@ struct page_pool_params {
>         );
>         struct_group_tagged(page_pool_params_slow, slow,
>                 struct net_device *netdev;
> -/* private: used by test code only */
> -               void (*init_callback)(struct page *page, void *arg);
> -               void *init_arg;
>         );
>  };
>
> @@ -129,7 +126,6 @@ struct page_pool {
>         struct page_pool_params_fast p;
>
>         int cpuid;
> -       bool has_init_callback;
>
>         long frag_users;
>         struct page *frag_page;
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 89c835fcf094..fd054b6f773a 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -217,8 +217,6 @@ static int page_pool_init(struct page_pool *pool,
>                  */
>         }
>
> -       pool->has_init_callback = !!pool->slow.init_callback;
> -
>  #ifdef CONFIG_PAGE_POOL_STATS
>         pool->recycle_stats = alloc_percpu(struct page_pool_recycle_stats);
>         if (!pool->recycle_stats)
> @@ -428,8 +426,6 @@ static void page_pool_set_pp_info(struct page_pool *pool,
>          * the overhead is negligible.
>          */
>         page_pool_fragment_page(page, 1);
> -       if (pool->has_init_callback)
> -               pool->slow.init_callback(page, pool->slow.init_arg);
>  }
>
>  static void page_pool_clear_pp_info(struct page *page)
> --
> 2.43.0
>

Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>


* Re: [PATCH net-next v2 2/4] bpf: test_run: Use system page pool for XDP live frame mode
  2024-02-20 21:03 ` [PATCH net-next v2 2/4] bpf: test_run: Use system page pool for XDP live frame mode Toke Høiland-Jørgensen
  2024-02-21 14:48   ` Toke Høiland-Jørgensen
@ 2024-04-03 16:34   ` Alexander Lobakin
  2024-04-03 20:39     ` John Fastabend
  1 sibling, 1 reply; 14+ messages in thread
From: Alexander Lobakin @ 2024-04-03 16:34 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	Eric Dumazet, Paolo Abeni, bpf, netdev

From: Toke Høiland-Jørgensen <toke@redhat.com>
Date: Tue, 20 Feb 2024 22:03:39 +0100

> The BPF_TEST_RUN code in XDP live frame mode creates a new page pool
> each time it is called and uses that to allocate the frames used for the
> XDP run. This works well if the syscall is used with a high repetitions
> number, as it allows for efficient page recycling. However, if used with
> a small number of repetitions, the overhead of creating and tearing down
> the page pool is significant, and can even lead to system stalls if the
> syscall is called in a tight loop.
> 
> Now that we have a persistent system page pool instance, it becomes
> pretty straight forward to change the test_run code to use it. The only
> wrinkle is that we can no longer rely on a custom page init callback
> from page_pool itself; instead, we change the test_run code to write a
> random cookie value to the beginning of the page as an indicator that
> the page has been initialised and can be re-used without copying the
> initial data again.
> 
> The cookie is a random 128-bit value, which means the probability that
> we will get accidental collisions (which would lead to recycling the
> wrong page values and reading garbage) is on the order of 2^-128. This
> is in the "won't happen before the heat death of the universe" range, so
> this marking is safe for the intended usage.
> 
> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> Tested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>

Hey,

What's the status of this series, now that the window is open?

Thanks,
Olek


* RE: [PATCH net-next v2 1/4] net: Register system page pool as an XDP memory model
  2024-02-20 21:03 ` [PATCH net-next v2 1/4] net: Register system page pool as an XDP memory model Toke Høiland-Jørgensen
@ 2024-04-03 20:20   ` John Fastabend
  2024-04-04  9:08     ` Alexander Lobakin
  0 siblings, 1 reply; 14+ messages in thread
From: John Fastabend @ 2024-04-03 20:20 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend
  Cc: Toke Høiland-Jørgensen, Alexander Lobakin, netdev, bpf

Toke Høiland-Jørgensen wrote:
> To make the system page pool usable as a source for allocating XDP
> frames, we need to register it with xdp_reg_mem_model(), so that page
> return works correctly. This is done in preparation for using the system
> page pool for the XDP live frame mode in BPF_TEST_RUN; for the same
> reason, make the per-cpu variable non-static so we can access it from
> the test_run code as well.
> 
> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> Tested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
>  include/linux/netdevice.h |  1 +
>  net/core/dev.c            | 13 ++++++++++++-
>  2 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index c541550b0e6e..e1dfdf0c4075 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3345,6 +3345,7 @@ static inline void input_queue_tail_incr_save(struct softnet_data *sd,
>  }
>  
>  DECLARE_PER_CPU_ALIGNED(struct softnet_data, softnet_data);
> +DECLARE_PER_CPU_ALIGNED(struct page_pool *, system_page_pool);
>  
>  static inline int dev_recursion_level(void)
>  {
> diff --git a/net/core/dev.c b/net/core/dev.c
> index d8dd293a7a27..cdb916a647e7 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -428,7 +428,7 @@ EXPORT_PER_CPU_SYMBOL(softnet_data);
>   * PP consumers must pay attention to run APIs in the appropriate context
>   * (e.g. NAPI context).
>   */
> -static DEFINE_PER_CPU_ALIGNED(struct page_pool *, system_page_pool);
> +DEFINE_PER_CPU_ALIGNED(struct page_pool *, system_page_pool);
>  
>  #ifdef CONFIG_LOCKDEP
>  /*
> @@ -11739,12 +11739,20 @@ static int net_page_pool_create(int cpuid)
>  		.pool_size = SYSTEM_PERCPU_PAGE_POOL_SIZE,
>  		.nid = NUMA_NO_NODE,
>  	};
> +	struct xdp_mem_info info;
>  	struct page_pool *pp_ptr;
> +	int err;
>  
>  	pp_ptr = page_pool_create_percpu(&page_pool_params, cpuid);
>  	if (IS_ERR(pp_ptr))
>  		return -ENOMEM;
>  
> +	err = xdp_reg_mem_model(&info, MEM_TYPE_PAGE_POOL, pp_ptr);
> +	if (err) {
> +		page_pool_destroy(pp_ptr);
> +		return err;
> +	}
> +
>  	per_cpu(system_page_pool, cpuid) = pp_ptr;
>  #endif
>  	return 0;
> @@ -11834,12 +11842,15 @@ static int __init net_dev_init(void)
>  out:
>  	if (rc < 0) {
>  		for_each_possible_cpu(i) {
> +			struct xdp_mem_info mem = { .type = MEM_TYPE_PAGE_POOL };
>  			struct page_pool *pp_ptr;
>  
>  			pp_ptr = per_cpu(system_page_pool, i);
>  			if (!pp_ptr)
>  				continue;
>  
> +			mem.id = pp_ptr->xdp_mem_id;
> +			xdp_unreg_mem_model(&mem);

Take it or leave it, a net_page_pool_destroy(int cpuid) would be
symmetric here.
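
Something like the below, lifted from the cleanup loop in this patch
(hypothetical, just to illustrate the suggestion):

static void net_page_pool_destroy(int cpuid)
{
	struct xdp_mem_info mem = { .type = MEM_TYPE_PAGE_POOL };
	struct page_pool *pp_ptr;

	pp_ptr = per_cpu(system_page_pool, cpuid);
	if (!pp_ptr)
		return;

	mem.id = pp_ptr->xdp_mem_id;
	xdp_unreg_mem_model(&mem);
	page_pool_destroy(pp_ptr);
	per_cpu(system_page_pool, cpuid) = NULL;
}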

>  			page_pool_destroy(pp_ptr);
>  			per_cpu(system_page_pool, i) = NULL;
>  		}

Acked-by: John Fastabend <john.fastabend@gmail.com>


* Re: [PATCH net-next v2 2/4] bpf: test_run: Use system page pool for XDP live frame mode
  2024-04-03 16:34   ` Alexander Lobakin
@ 2024-04-03 20:39     ` John Fastabend
  2024-04-04 11:43       ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 14+ messages in thread
From: John Fastabend @ 2024-04-03 20:39 UTC (permalink / raw)
  To: Alexander Lobakin, Toke Høiland-Jørgensen
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	Eric Dumazet, Paolo Abeni, bpf, netdev

Alexander Lobakin wrote:
> From: Toke Høiland-Jørgensen <toke@redhat.com>
> Date: Tue, 20 Feb 2024 22:03:39 +0100
> 
> > The BPF_TEST_RUN code in XDP live frame mode creates a new page pool
> > each time it is called and uses that to allocate the frames used for the
> > XDP run. This works well if the syscall is used with a high repetitions
> > number, as it allows for efficient page recycling. However, if used with
> > a small number of repetitions, the overhead of creating and tearing down
> > the page pool is significant, and can even lead to system stalls if the
> > syscall is called in a tight loop.
> > 
> > Now that we have a persistent system page pool instance, it becomes
> > pretty straight forward to change the test_run code to use it. The only
> > wrinkle is that we can no longer rely on a custom page init callback
> > from page_pool itself; instead, we change the test_run code to write a
> > random cookie value to the beginning of the page as an indicator that
> > the page has been initialised and can be re-used without copying the
> > initial data again.
> > 
> > The cookie is a random 128-bit value, which means the probability that
> > we will get accidental collisions (which would lead to recycling the
> > wrong page values and reading garbage) is on the order of 2^-128. This
> > is in the "won't happen before the heat death of the universe" range, so
> > this marking is safe for the intended usage.
> > 
> > Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> > Tested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> > Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> 
> Hey,
> 
> What's the status of this series, now that the window is open?
> 
> Thanks,
> Olek

Hi Toke,

I read the thread from top to bottom, so it seems someone else already
noticed that 2^128 is the number of unique values, not the collision
probability. Anyway, I'm still a bit confused: what's the use case here?
Maybe I need to understand this XDP live frame mode better?

Could another solution be to avoid calling BPF_TEST_RUN multiple times
in a row? Or perhaps have a BPF_SETUP_RUN that does the config and lets
BPF_TEST_RUN skip the page allocation? Another idea just have the first
run of BPF_TEST_RUN init a page pool and not destroy it.
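
For the last idea, a rough sketch of what that could look like (the
names are made up and teardown is ignored entirely):

static struct page_pool *test_run_pp;	/* kept around after first use */
static DEFINE_MUTEX(test_run_pp_lock);

static struct page_pool *xdp_test_run_get_pp(u32 pool_size)
{
	struct page_pool_params params = {
		.pool_size = pool_size,
		.nid = NUMA_NO_NODE,
	};
	struct page_pool *pp;

	mutex_lock(&test_run_pp_lock);
	if (!test_run_pp) {
		pp = page_pool_create(&params);
		if (IS_ERR(pp)) {
			mutex_unlock(&test_run_pp_lock);
			return pp;
		}
		test_run_pp = pp;
	}
	pp = test_run_pp;
	mutex_unlock(&test_run_pp_lock);

	return pp;
}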

Thanks,
John


* Re: [PATCH net-next v2 1/4] net: Register system page pool as an XDP memory model
  2024-04-03 20:20   ` John Fastabend
@ 2024-04-04  9:08     ` Alexander Lobakin
  0 siblings, 0 replies; 14+ messages in thread
From: Alexander Lobakin @ 2024-04-04  9:08 UTC (permalink / raw)
  To: John Fastabend, Toke Høiland-Jørgensen, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: netdev, bpf

From: John Fastabend <john.fastabend@gmail.com>
Date: Wed, 03 Apr 2024 13:20:32 -0700

> Toke Høiland-Jørgensen wrote:
>> To make the system page pool usable as a source for allocating XDP
>> frames, we need to register it with xdp_reg_mem_model(), so that page
>> return works correctly. This is done in preparation for using the system
>> page pool for the XDP live frame mode in BPF_TEST_RUN; for the same
>> reason, make the per-cpu variable non-static so we can access it from
>> the test_run code as well.
>>
>> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
>> Tested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> ---
>>  include/linux/netdevice.h |  1 +
>>  net/core/dev.c            | 13 ++++++++++++-
>>  2 files changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index c541550b0e6e..e1dfdf0c4075 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -3345,6 +3345,7 @@ static inline void input_queue_tail_incr_save(struct softnet_data *sd,
>>  }
>>  
>>  DECLARE_PER_CPU_ALIGNED(struct softnet_data, softnet_data);
>> +DECLARE_PER_CPU_ALIGNED(struct page_pool *, system_page_pool);
>>  
>>  static inline int dev_recursion_level(void)
>>  {
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index d8dd293a7a27..cdb916a647e7 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -428,7 +428,7 @@ EXPORT_PER_CPU_SYMBOL(softnet_data);
>>   * PP consumers must pay attention to run APIs in the appropriate context
>>   * (e.g. NAPI context).
>>   */
>> -static DEFINE_PER_CPU_ALIGNED(struct page_pool *, system_page_pool);
>> +DEFINE_PER_CPU_ALIGNED(struct page_pool *, system_page_pool);
>>  
>>  #ifdef CONFIG_LOCKDEP
>>  /*
>> @@ -11739,12 +11739,20 @@ static int net_page_pool_create(int cpuid)
>>  		.pool_size = SYSTEM_PERCPU_PAGE_POOL_SIZE,
>>  		.nid = NUMA_NO_NODE,
>>  	};
>> +	struct xdp_mem_info info;
>>  	struct page_pool *pp_ptr;
>> +	int err;
>>  
>>  	pp_ptr = page_pool_create_percpu(&page_pool_params, cpuid);
>>  	if (IS_ERR(pp_ptr))
>>  		return -ENOMEM;
>>  
>> +	err = xdp_reg_mem_model(&info, MEM_TYPE_PAGE_POOL, pp_ptr);
>> +	if (err) {
>> +		page_pool_destroy(pp_ptr);
>> +		return err;
>> +	}
>> +
>>  	per_cpu(system_page_pool, cpuid) = pp_ptr;
>>  #endif
>>  	return 0;
>> @@ -11834,12 +11842,15 @@ static int __init net_dev_init(void)
>>  out:
>>  	if (rc < 0) {
>>  		for_each_possible_cpu(i) {
>> +			struct xdp_mem_info mem = { .type = MEM_TYPE_PAGE_POOL };
>>  			struct page_pool *pp_ptr;
>>  
>>  			pp_ptr = per_cpu(system_page_pool, i);
>>  			if (!pp_ptr)
>>  				continue;
>>  
>> +			mem.id = pp_ptr->xdp_mem_id;
>> +			xdp_unreg_mem_model(&mem);
> 
> Take it or leave it, a net_page_pool_destroy(int cpuid) would be
> symmetric here.
> 
>>  			page_pool_destroy(pp_ptr);

I believe it's better to remove this page_pool_destroy() and let
xdp_unreg_mem_model() destroy it.

>>  			per_cpu(system_page_pool, i) = NULL;
>>  		}
> 
> Acked-by: John Fastabend <john.fastabend@gmail.com>

Thanks,
Olek


* Re: [PATCH net-next v2 2/4] bpf: test_run: Use system page pool for XDP live frame mode
  2024-02-21 14:48   ` Toke Høiland-Jørgensen
@ 2024-04-04 11:23     ` Jesper Dangaard Brouer
  2024-04-04 13:34       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 14+ messages in thread
From: Jesper Dangaard Brouer @ 2024-04-04 11:23 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller,
	Jakub Kicinski
  Cc: Alexander Lobakin, Eric Dumazet, Paolo Abeni, bpf, netdev



On 21/02/2024 15.48, Toke Høiland-Jørgensen wrote:
> Toke Høiland-Jørgensen <toke@redhat.com> writes:
> 
>> The cookie is a random 128-bit value, which means the probability that
>> we will get accidental collisions (which would lead to recycling the
>> wrong page values and reading garbage) is on the order of 2^-128. This
>> is in the "won't happen before the heat death of the universe" range, so
>> this marking is safe for the intended usage.
> 
> Alright, got a second opinion on this from someone better at security
> than me; I'll go try out some different ideas :)

It is a general security concern for me that BPF test_run gets access to
memory used by the 'system page pool', with the risk of leaking data
(from real traffic) to an attacker that can inject a BPF test_run
program via e.g. a CI pipeline.

I'm not saying we are leaking data today in BPF/XDP progs, but there is
a potential for it, because to gain performance, XDP and page_pool don't
clear memory (to avoid the cache line cost).
I guess that today, I could do a BPF tail extend and read packet data
from older frames this way, if I get access to the 'system page pool'.
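
A toy XDP program sketching that kind of read (illustrative only:
whether anything stale is actually visible depends on how the page was
used before, and the fixed offset assumes a small input packet):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int peek_stale_tail(struct xdp_md *ctx)
{
	__u8 *data, *data_end, *p;

	/* grow the frame into page memory this run never wrote */
	if (bpf_xdp_adjust_tail(ctx, 128))
		return XDP_DROP;

	data = (void *)(long)ctx->data;
	data_end = (void *)(long)ctx->data_end;

	/* for a small input packet, offset 100 lands in the grown region */
	p = data + 100;
	if (p + 1 > data_end)
		return XDP_DROP;

	bpf_printk("byte beyond original data: %x", *p);
	return XDP_DROP;
}

char _license[] SEC("license") = "GPL";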

--Jesper


* Re: [PATCH net-next v2 2/4] bpf: test_run: Use system page pool for XDP live frame mode
  2024-04-03 20:39     ` John Fastabend
@ 2024-04-04 11:43       ` Jesper Dangaard Brouer
  2024-04-04 13:09         ` Alexander Lobakin
  0 siblings, 1 reply; 14+ messages in thread
From: Jesper Dangaard Brouer @ 2024-04-04 11:43 UTC (permalink / raw)
  To: John Fastabend, Alexander Lobakin,
	Toke Høiland-Jørgensen
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller,
	Jakub Kicinski, Eric Dumazet, Paolo Abeni, bpf, netdev



On 03/04/2024 22.39, John Fastabend wrote:
> Alexander Lobakin wrote:
>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>> Date: Tue, 20 Feb 2024 22:03:39 +0100
>>
>>> The BPF_TEST_RUN code in XDP live frame mode creates a new page pool
>>> each time it is called and uses that to allocate the frames used for the
>>> XDP run. This works well if the syscall is used with a high repetitions
>>> number, as it allows for efficient page recycling. However, if used with
>>> a small number of repetitions, the overhead of creating and tearing down
>>> the page pool is significant, and can even lead to system stalls if the
>>> syscall is called in a tight loop.
>>>
>>> Now that we have a persistent system page pool instance, it becomes
>>> pretty straight forward to change the test_run code to use it. The only
>>> wrinkle is that we can no longer rely on a custom page init callback
>>> from page_pool itself; instead, we change the test_run code to write a
>>> random cookie value to the beginning of the page as an indicator that
>>> the page has been initialised and can be re-used without copying the
>>> initial data again.
>>>
>>> The cookie is a random 128-bit value, which means the probability that
>>> we will get accidental collisions (which would lead to recycling the
>>> wrong page values and reading garbage) is on the order of 2^-128. This
>>> is in the "won't happen before the heat death of the universe" range, so
>>> this marking is safe for the intended usage.
>>>
>>> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
>>> Tested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>
>> Hey,
>>
>> What's the status of this series, now that the window is open?
>>
>> Thanks,
>> Olek
> 
> Hi Toke,
> 
> I read the thread from top to bottom, so it seems someone else already
> noticed that 2^128 is the number of unique values, not the collision
> probability. Anyway, I'm still a bit confused: what's the use case here?
> Maybe I need to understand this XDP live frame mode better?
> 
> Could another solution be to avoid calling BPF_TEST_RUN multiple times
> in a row? Or perhaps have a BPF_SETUP_RUN that does the config and lets
> BPF_TEST_RUN skip the page allocation? Another idea just have the first
> run of BPF_TEST_RUN init a page pool and not destroy it.
> 

I like John's idea of "the first run of BPF_TEST_RUN init a page pool
and not destroy it".  On exit we could start a work-queue that tries to
"destroy" the PP later, if it's not in use.

Page pool (PP) performance comes from having an association with a
SINGLE RX-queue, which means no synchronization is needed when
"allocating" new pages (from the alloc cache array).

Thus, IMHO each BPF_TEST_RUN should get its own PP instance, as then the
lockless PP scheme works (and we don't have to depend on NAPI /
BH-disable).  This still works with John's idea, as we could simply have
a list of PP instances that can be reused.
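
A rough sketch of that reuse list (all names are invented, the delayed
teardown is only hinted at, and differing pool sizes are ignored):

static LIST_HEAD(test_run_pp_cache);
static DEFINE_SPINLOCK(test_run_pp_cache_lock);

struct cached_pp {
	struct list_head node;
	struct page_pool *pp;
};

static struct page_pool *test_run_pp_get(u32 pool_size)
{
	struct page_pool_params params = {
		.pool_size = pool_size,
		.nid = NUMA_NO_NODE,
	};
	struct cached_pp *c;
	struct page_pool *pp;

	spin_lock(&test_run_pp_cache_lock);
	c = list_first_entry_or_null(&test_run_pp_cache, struct cached_pp, node);
	if (c)
		list_del(&c->node);
	spin_unlock(&test_run_pp_cache_lock);

	if (c) {
		pp = c->pp;
		kfree(c);
		return pp;
	}

	return page_pool_create(&params);
}

static void test_run_pp_put(struct page_pool *pp)
{
	struct cached_pp *c = kzalloc(sizeof(*c), GFP_KERNEL);

	if (!c) {
		page_pool_destroy(pp);
		return;
	}

	c->pp = pp;
	spin_lock(&test_run_pp_cache_lock);
	list_add(&c->node, &test_run_pp_cache);
	spin_unlock(&test_run_pp_cache_lock);
	/* a delayed work item could walk the list and destroy idle pools */
}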

--Jesper


* Re: [PATCH net-next v2 2/4] bpf: test_run: Use system page pool for XDP live frame mode
  2024-04-04 11:43       ` Jesper Dangaard Brouer
@ 2024-04-04 13:09         ` Alexander Lobakin
  0 siblings, 0 replies; 14+ messages in thread
From: Alexander Lobakin @ 2024-04-04 13:09 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: John Fastabend, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller,
	Jakub Kicinski, Eric Dumazet, Paolo Abeni, bpf, netdev

From: Jesper Dangaard Brouer <hawk@kernel.org>
Date: Thu, 4 Apr 2024 13:43:12 +0200

> 
> 
> On 03/04/2024 22.39, John Fastabend wrote:
>> Alexander Lobakin wrote:
>>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>>> Date: Tue, 20 Feb 2024 22:03:39 +0100
>>>
>>>> The BPF_TEST_RUN code in XDP live frame mode creates a new page pool
>>>> each time it is called and uses that to allocate the frames used for
>>>> the
>>>> XDP run. This works well if the syscall is used with a high repetitions
>>>> number, as it allows for efficient page recycling. However, if used
>>>> with
>>>> a small number of repetitions, the overhead of creating and tearing
>>>> down
>>>> the page pool is significant, and can even lead to system stalls if the
>>>> syscall is called in a tight loop.
>>>>
>>>> Now that we have a persistent system page pool instance, it becomes
>>>> pretty straight forward to change the test_run code to use it. The only
>>>> wrinkle is that we can no longer rely on a custom page init callback
>>>> from page_pool itself; instead, we change the test_run code to write a
>>>> random cookie value to the beginning of the page as an indicator that
>>>> the page has been initialised and can be re-used without copying the
>>>> initial data again.
>>>>
>>>> The cookie is a random 128-bit value, which means the probability that
>>>> we will get accidental collisions (which would lead to recycling the
>>>> wrong page values and reading garbage) is on the order of 2^-128. This
>>>> is in the "won't happen before the heat death of the universe"
>>>> range, so
>>>> this marking is safe for the intended usage.
>>>>
>>>> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
>>>> Tested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>>
>>> Hey,
>>>
>>> What's the status of this series, now that the window is open?
>>>
>>> Thanks,
>>> Olek
>>
>> Hi Toke,
>>
>> I read the thread from top to bottom, so it seems someone else already
>> noticed that 2^128 is the number of unique values, not the collision
>> probability. Anyway, I'm still a bit confused: what's the use case here?
>> Maybe I need to understand this XDP live frame mode better?
>>
>> Could another solution be to avoid calling BPF_TEST_RUN multiple times
>> in a row? Or perhaps have a BPF_SETUP_RUN that does the config and lets
>> BPF_TEST_RUN skip the page allocation? Another idea just have the first
>> run of BPF_TEST_RUN init a page pool and not destroy it.
>>
> 
> I like John's idea of "the first run of BPF_TEST_RUN init a page pool
> and not destroy it".  On exit we could start a work-queue that tries to
> "destroy" the PP later, if it's not in use.
> 
> Page pool (PP) performance comes from having an association with a
> SINGLE RX-queue, which means no synchronization is needed when
> "allocating" new pages (from the alloc cache array).
> 
> Thus, IMHO each BPF_TEST_RUN should get its own PP instance, as then the
> lockless PP scheme works (and we don't have to depend on NAPI /
> BH-disable).  This still works with John's idea, as we could simply have
> a list of PP instances that can be reused.

The lockless PP scheme works for percpu PPs as well, via page_pool::cpuid;
it seems like you missed this change?
A percpu page_pool is CPU-local, which means it absolutely can't be
accessed from several threads simultaneously.
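
Roughly (conceptual sketch only, the real check lives inside the
page_pool internals):

static bool pp_can_recycle_directly(const struct page_pool *pool)
{
	/* percpu system pools record their owning CPU in pool->cpuid, so
	 * the lockless fast path is safe whenever we run on that CPU
	 */
	return READ_ONCE(pool->cpuid) == smp_processor_id();
}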

> 
> --Jesper

Thanks,
Olek


* Re: [PATCH net-next v2 2/4] bpf: test_run: Use system page pool for XDP live frame mode
  2024-04-04 11:23     ` Jesper Dangaard Brouer
@ 2024-04-04 13:34       ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 14+ messages in thread
From: Toke Høiland-Jørgensen @ 2024-04-04 13:34 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, David S. Miller, Jakub Kicinski
  Cc: Alexander Lobakin, Eric Dumazet, Paolo Abeni, bpf, netdev

Jesper Dangaard Brouer <hawk@kernel.org> writes:

> On 21/02/2024 15.48, Toke Høiland-Jørgensen wrote:
>> Toke Høiland-Jørgensen <toke@redhat.com> writes:
>> 
>>> The cookie is a random 128-bit value, which means the probability that
>>> we will get accidental collisions (which would lead to recycling the
>>> wrong page values and reading garbage) is on the order of 2^-128. This
>>> is in the "won't happen before the heat death of the universe" range, so
>>> this marking is safe for the intended usage.
>> 
>> Alright, got a second opinion on this from someone better at security
>> than me; I'll go try out some different ideas :)
>
> It is a general security concern for me that BPF test_run gets access to
> memory used by 'system page pool', with the concern of leaking data
> (from real traffic) to an attacker than can inject a BPF test_run
> program via e.g. a CI pipeline.
>
> I'm not saying we leaking data today in BPF/XDP progs, but there is a
> potential, because to gain performance in XDP and page_pool we don't
> clear memory to avoid cache line performance issues.
> I guess today, I could BPF tail extend and read packet data from older
> frames, in this way, if I get access to 'system page pool'.

I agree that the leak concern is non-trivial (also for the secret cookie
value), so I am not planning to re-submit with that approach. Before the
merge window I got halfway through revising the patches to use the system
PP but always re-initialise the pages. This comes with a ~7% overhead on
small packets, and probably more with big frames (due to the memcpy() of
the payload data).

Due to this, my current plan is to take a hybrid approach, depending on
the 'repetitions' parameter: for low repetition counts, just
pre-allocate a bunch of pages from the system PP at setup time,
initialise them, and don't bother with recycling. And for large
repetition counts, keep the current approach of allocating a separate PP
for each syscall invocation. The threshold for when something is a "low"
number is the kicker here, of course, but probably some static value can
be set as a threshold; I'll play around with this and see what makes
sense.

WDYT about this?

-Toke



