* [RFC PATCH net-next v3 0/2] net: veth: Optimizing page pool usage
@ 2023-08-16 12:30 Liang Chen
2023-08-16 12:30 ` [RFC PATCH net-next v3 1/2] net: veth: Improving page pool recycling Liang Chen
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Liang Chen @ 2023-08-16 12:30 UTC (permalink / raw)
To: hawk, horms, davem, edumazet, kuba, pabeni, linyunsheng
Cc: ilias.apalodimas, daniel, ast, netdev, liangchen.linux
Page pool is supported for veth, but at the moment pages are not properly
recycled for XDP_TX and XDP_REDIRECT. That prevents veth xdp from fully
leveraging the advantages of the page pool. So this RFC patchset is mainly
to make recycling work for those cases. With that in place, it can be
further optimized by utilizing the napi skb cache. Detailed figures are
presented in each commit message, and together they demonstrate a quite
noticeable improvement.
Changes from v2:
- refactor the code to make it more readable
- make use of the napi skb cache for further optimization
- take the page pool creation error-handling patch out for separate
submission
Liang Chen (2):
net: veth: Improving page pool recycling
net: veth: Optimizing skb reuse in NAPI Context
drivers/net/veth.c | 66 ++++++++++++++++++++++++++++++++++++++++------
net/core/skbuff.c | 1 +
2 files changed, 59 insertions(+), 8 deletions(-)
--
2.40.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* [RFC PATCH net-next v3 1/2] net: veth: Improving page pool recycling
2023-08-16 12:30 [RFC PATCH net-next v3 0/2] net: veth: Optimizing page pool usage Liang Chen
@ 2023-08-16 12:30 ` Liang Chen
2023-08-16 12:30 ` [RFC PATCH net-next v3 2/2] net: veth: Optimizing skb reuse in NAPI Context Liang Chen
2023-08-21 14:21 ` [RFC PATCH net-next v3 0/2] net: veth: Optimizing page pool usage Jesper Dangaard Brouer
2 siblings, 0 replies; 8+ messages in thread
From: Liang Chen @ 2023-08-16 12:30 UTC (permalink / raw)
To: hawk, horms, davem, edumazet, kuba, pabeni, linyunsheng
Cc: ilias.apalodimas, daniel, ast, netdev, liangchen.linux
Page pool is supported for veth. But for XDP_TX and XDP_REDIRECT cases,
the pages are not effectively recycled. "ethtool -S" statistics for the
page pool are as follows:
NIC statistics:
rx_pp_alloc_fast: 18041186
rx_pp_alloc_slow: 286369
rx_pp_recycle_ring: 0
rx_pp_recycle_released_ref: 18327555
This failure to recycle page pool pages is a result of the code snippet
below, which converts page pool pages into regular pages and releases
the skb data structure:
veth_xdp_get(xdp);
consume_skb(skb);
The reason behind this is that some skbs received from the veth peer are
not backed by page pool pages, and remain so after conversion to an xdp
frame. To avoid confusing __xdp_return with a mix of regular pages and
page pool pages, they are all converted to regular pages, so registering
the xdp memory model as MEM_TYPE_PAGE_SHARED is sufficient.
If we replace the above code with kfree_skb_partial, directly releasing
the skb data structure, we can retain the original page pool page behavior.
However, directly changing the xdp memory model to MEM_TYPE_PAGE_POOL is
not a solution, as explained above. Therefore, we introduce an additional
MEM_TYPE_PAGE_POOL memory model for each rq.
The following tests were conducted using pktgen to generate traffic and
evaluate the performance improvement after page pool pages get successfully
recycled in scenarios involving XDP_TX, XDP_REDIRECT, and AF_XDP.
Test environment setup:
ns1 ns2
veth0 <-peer-> veth1
veth2 <-peer-> veth3
Test Results:
pktgen -> veth1 -> veth0(XDP_TX) -> veth1(XDP_DROP)
without PP recycle: 1,855,069
with PP recycle: 2,033,439
improvement: ~10%
pktgen -> veth1 -> veth0(XDP_TX) -> veth1(XDP_PASS)
without PP recycle: 1,494,890
with PP recycle: 1,585,462
improvement: 5%
pktgen -> veth1 -> veth0(XDP_REDIRECT) -> veth2 -> veth3(XDP_DROP)
without PP recycle: 1,631,582
with PP recycle: 1,787,342
improvement: ~10%
pktgen -> veth1 -> veth0(XDP_REDIRECT) -> veth2 -> veth3(XDP_PASS)
without PP recycle: 1,309,265
with PP recycle: 1,391,587
improvement: ~6%
pktgen -> veth1 -> veth0(AF_XDP) -> user space(DROP)
without PP recycle: 1,674,021
with PP recycle: 1,811,844
improvement: ~8%
Additionally, the performance improvement was measured for the case where
converting to an xdp_buff doesn't require a buffer copy and the original
skb uses regular pages, i.e. page pool recycling is not involved. This
still gives around a 2% improvement, attributed to the change from
consume_skb to kfree_skb_partial.
Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
---
drivers/net/veth.c | 64 ++++++++++++++++++++++++++++++++++++++++------
1 file changed, 56 insertions(+), 8 deletions(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 509e901da41d..7234eb0297dd 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -62,6 +62,7 @@ struct veth_rq {
struct net_device *dev;
struct bpf_prog __rcu *xdp_prog;
struct xdp_mem_info xdp_mem;
+ struct xdp_mem_info xdp_mem_pp;
struct veth_rq_stats stats;
bool rx_notify_masked;
struct ptr_ring xdp_ring;
@@ -728,7 +729,8 @@ static void veth_xdp_get(struct xdp_buff *xdp)
static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
struct xdp_buff *xdp,
- struct sk_buff **pskb)
+ struct sk_buff **pskb,
+ bool *local_pp_alloc)
{
struct sk_buff *skb = *pskb;
u32 frame_sz;
@@ -802,6 +804,7 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
consume_skb(skb);
skb = nskb;
+ *local_pp_alloc = true;
}
/* SKB "head" area always have tailroom for skb_shared_info */
@@ -827,6 +830,37 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
return -ENOMEM;
}
+static void __skb2xdp_steal_data(struct sk_buff *skb,
+ struct xdp_buff *xdp,
+ struct veth_rq *rq,
+ bool local_pp_alloc)
+{
+ if (local_pp_alloc) {
+ /* This is the most common case where the skb was reallocated locally in
+ * veth_convert_skb_to_xdp_buff, and it's safe to use the xdp_mem_pp model.
+ */
+ xdp->rxq->mem = rq->xdp_mem_pp;
+ kfree_skb_partial(skb, true);
+ } else if (!skb->pp_recycle) {
+ /* We can safely use kfree_skb_partial here because this cannot be an fclone
+ * skb. Fclone skbs are allocated via __alloc_skb, with their head buffer
+ * allocated by kmalloc_reserve (i.e. skb->head_frag = 0), satisfying the
+ * skb_head_is_locked condition in veth_convert_skb_to_xdp_buff, and are
+ * thus reallocated.
+ */
+ xdp->rxq->mem = rq->xdp_mem;
+ kfree_skb_partial(skb, true);
+ } else {
+ /* skbs in this case may include page_pool pages from peer. We cannot use
+ * rq->xdp_mem_pp as for the local_pp_alloc case, because they might already
+ * be associated with different xdp_mem_info.
+ */
+ veth_xdp_get(xdp);
+ consume_skb(skb);
+ xdp->rxq->mem = rq->xdp_mem;
+ }
+}
+
static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
struct sk_buff *skb,
struct veth_xdp_tx_bq *bq,
@@ -836,6 +870,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
struct bpf_prog *xdp_prog;
struct veth_xdp_buff vxbuf;
struct xdp_buff *xdp = &vxbuf.xdp;
+ bool local_pp_alloc = false;
u32 act, metalen;
int off;
@@ -849,7 +884,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
}
__skb_push(skb, skb->data - skb_mac_header(skb));
- if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb))
+ if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb, &local_pp_alloc))
goto drop;
vxbuf.skb = skb;
@@ -862,9 +897,8 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
case XDP_PASS:
break;
case XDP_TX:
- veth_xdp_get(xdp);
- consume_skb(skb);
- xdp->rxq->mem = rq->xdp_mem;
+ __skb2xdp_steal_data(skb, xdp, rq, local_pp_alloc);
+
if (unlikely(veth_xdp_tx(rq, xdp, bq) < 0)) {
trace_xdp_exception(rq->dev, xdp_prog, act);
stats->rx_drops++;
@@ -874,9 +908,8 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
rcu_read_unlock();
goto xdp_xmit;
case XDP_REDIRECT:
- veth_xdp_get(xdp);
- consume_skb(skb);
- xdp->rxq->mem = rq->xdp_mem;
+ __skb2xdp_steal_data(skb, xdp, rq, local_pp_alloc);
+
if (xdp_do_redirect(rq->dev, xdp, xdp_prog)) {
stats->rx_drops++;
goto err_xdp;
@@ -1061,6 +1094,14 @@ static int __veth_napi_enable_range(struct net_device *dev, int start, int end)
goto err_page_pool;
}
+ for (i = start; i < end; i++) {
+ err = xdp_reg_mem_model(&priv->rq[i].xdp_mem_pp,
+ MEM_TYPE_PAGE_POOL,
+ priv->rq[i].page_pool);
+ if (err)
+ goto err_reg_mem;
+ }
+
for (i = start; i < end; i++) {
struct veth_rq *rq = &priv->rq[i];
@@ -1082,6 +1123,10 @@ static int __veth_napi_enable_range(struct net_device *dev, int start, int end)
for (i--; i >= start; i--)
ptr_ring_cleanup(&priv->rq[i].xdp_ring, veth_ptr_free);
i = end;
+err_reg_mem:
+ for (i--; i >= start; i--)
+ xdp_unreg_mem_model(&priv->rq[i].xdp_mem_pp);
+ i = end;
err_page_pool:
for (i--; i >= start; i--) {
page_pool_destroy(priv->rq[i].page_pool);
@@ -1117,6 +1162,9 @@ static void veth_napi_del_range(struct net_device *dev, int start, int end)
ptr_ring_cleanup(&rq->xdp_ring, veth_ptr_free);
}
+ for (i = start; i < end; i++)
+ xdp_unreg_mem_model(&priv->rq[i].xdp_mem_pp);
+
for (i = start; i < end; i++) {
page_pool_destroy(priv->rq[i].page_pool);
priv->rq[i].page_pool = NULL;
--
2.40.1
* [RFC PATCH net-next v3 2/2] net: veth: Optimizing skb reuse in NAPI Context
2023-08-16 12:30 [RFC PATCH net-next v3 0/2] net: veth: Optimizing page pool usage Liang Chen
2023-08-16 12:30 ` [RFC PATCH net-next v3 1/2] net: veth: Improving page pool recycling Liang Chen
@ 2023-08-16 12:30 ` Liang Chen
2023-08-21 14:21 ` [RFC PATCH net-next v3 0/2] net: veth: Optimizing page pool usage Jesper Dangaard Brouer
2 siblings, 0 replies; 8+ messages in thread
From: Liang Chen @ 2023-08-16 12:30 UTC (permalink / raw)
To: hawk, horms, davem, edumazet, kuba, pabeni, linyunsheng
Cc: ilias.apalodimas, daniel, ast, netdev, liangchen.linux
In cases where the skb is reallocated by veth_convert_skb_to_xdp_buff,
we leverage the NAPI version of the "head stolen" function to enable
fast skb reuse. The following test results evaluate the performance
improvement from reusing the skb in the NAPI context with
pktgen-generated traffic.
Test environment setup:
ns1 ns2
veth0 <-peer-> veth1
veth2 <-peer-> veth3
Test Results:
pktgen -> veth1 -> veth0(XDP_TX) -> veth1(XDP_DROP)
without reusing skb: 2,033,439
reusing skb: 2,167,749
improvement: ~6%
pktgen -> veth1 -> veth0(XDP_TX) -> veth1(XDP_PASS)
without reusing skb: 1,585,462
reusing skb: 1,650,572
improvement: ~4%
pktgen -> veth1 -> veth0(XDP_REDIRECT) -> veth2 -> veth3(XDP_DROP)
without reusing skb: 1,787,342
reusing skb: 1,848,516
improvement: ~3%
pktgen -> veth1 -> veth0(XDP_REDIRECT) -> veth2 -> veth3(XDP_PASS)
without reusing skb: 1,391,587
reusing skb: 1,439,866
improvement: ~3%
pktgen -> veth1 -> veth0(AF_XDP) -> user space(DROP)
without reusing skb: 1,811,844
reusing skb: 1,861,027
improvement: ~3%
Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
---
drivers/net/veth.c | 5 ++++-
net/core/skbuff.c | 1 +
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 7234eb0297dd..4e1ee110ab84 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -838,9 +838,12 @@ static void __skb2xdp_steal_data(struct sk_buff *skb,
if (local_pp_alloc) {
/* This is the most common case where the skb was reallocated locally in
* veth_convert_skb_to_xdp_buff, and it's safe to use the xdp_mem_pp model.
+ * Since the skb is "reallocated" in the NAPI context of veth, it is possible
+ * to use the NAPI version of the "head stolen" function to optimize the
+ * reuse of skb as well.
*/
xdp->rxq->mem = rq->xdp_mem_pp;
- kfree_skb_partial(skb, true);
+ napi_skb_free_stolen_head(skb);
} else if (!skb->pp_recycle) {
/* We can safely use kfree_skb_partial here because this cannot be an fclone
* skb. Fclone skbs are allocated via __alloc_skb, with their head buffer
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a298992060e6..954ba1b94840 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1284,6 +1284,7 @@ void napi_skb_free_stolen_head(struct sk_buff *skb)
}
napi_skb_cache_put(skb);
}
+EXPORT_SYMBOL(napi_skb_free_stolen_head);
void napi_consume_skb(struct sk_buff *skb, int budget)
{
--
2.40.1
* Re: [RFC PATCH net-next v3 0/2] net: veth: Optimizing page pool usage
2023-08-16 12:30 [RFC PATCH net-next v3 0/2] net: veth: Optimizing page pool usage Liang Chen
2023-08-16 12:30 ` [RFC PATCH net-next v3 1/2] net: veth: Improving page pool recycling Liang Chen
2023-08-16 12:30 ` [RFC PATCH net-next v3 2/2] net: veth: Optimizing skb reuse in NAPI Context Liang Chen
@ 2023-08-21 14:21 ` Jesper Dangaard Brouer
2023-08-21 21:54 ` Jesper Dangaard Brouer
2 siblings, 1 reply; 8+ messages in thread
From: Jesper Dangaard Brouer @ 2023-08-21 14:21 UTC (permalink / raw)
To: Liang Chen, hawk, horms, davem, edumazet, kuba, pabeni,
linyunsheng
Cc: brouer, ilias.apalodimas, daniel, ast, netdev, Lorenzo Bianconi,
Stanislav Fomichev
On 16/08/2023 14.30, Liang Chen wrote:
> Page pool is supported for veth, but at the moment pages are not properly
> recyled for XDP_TX and XDP_REDIRECT. That prevents veth xdp from fully
> leveraging the advantages of the page pool. So this RFC patchset is mainly
> to make recycling work for those cases. With that in place, it can be
> further optimized by utilizing the napi skb cache. Detailed figures are
> presented in each commit message, and together they demonstrate a quite
> noticeable improvement.
>
I'm digging into this code path today.
I'm trying to extend this and find a way to support SKBs that used
kmalloc (skb->head_frag=0), such that we can remove the
skb_head_is_locked() check in veth_convert_skb_to_xdp_buff(), which will
allow more SKBs to avoid realloc. As long as they have enough headroom,
which we can dynamically control for netdev TX-packets by adjusting
netdev->needed_headroom, e.g. when loading an XDP prog.
I noticed netif_receive_generic_xdp() and bpf_prog_run_generic_xdp() can
handle SKB kmalloc (skb->head_frag=0). Going through the code, I don't
think it is a bug that generic-XDP allows this.
Deep into this rabbit hole, I start to question our approach.
- Perhaps the veth XDP approach for SKBs is wrong?
The root-cause of this issue is that veth_xdp_rcv_skb() code path (that
handle SKBs) is calling XDP-native function "xdp_do_redirect()". I
question, why isn't it using "xdp_do_generic_redirect()"?
(I will jump into this rabbit hole now...)
--Jesper
> Changes from v2:
> - refactor the code to make it more readable
> - make use of the napi skb cache for further optimization
> - take the Page pool creation error handling patch out for separate
> submission
>
> Liang Chen (2):
> net: veth: Improving page pool recycling
> net: veth: Optimizing skb reuse in NAPI Context
>
> drivers/net/veth.c | 66 ++++++++++++++++++++++++++++++++++++++++------
> net/core/skbuff.c | 1 +
> 2 files changed, 59 insertions(+), 8 deletions(-)
>
* Re: [RFC PATCH net-next v3 0/2] net: veth: Optimizing page pool usage
2023-08-21 14:21 ` [RFC PATCH net-next v3 0/2] net: veth: Optimizing page pool usage Jesper Dangaard Brouer
@ 2023-08-21 21:54 ` Jesper Dangaard Brouer
2023-08-22 12:24 ` Yunsheng Lin
0 siblings, 1 reply; 8+ messages in thread
From: Jesper Dangaard Brouer @ 2023-08-21 21:54 UTC (permalink / raw)
To: Liang Chen, hawk, horms, davem, edumazet, kuba, pabeni,
linyunsheng
Cc: brouer, ilias.apalodimas, daniel, ast, netdev, Lorenzo Bianconi,
Stanislav Fomichev, Maryam Tahhan
On 21/08/2023 16.21, Jesper Dangaard Brouer wrote:
>
> On 16/08/2023 14.30, Liang Chen wrote:
>> Page pool is supported for veth, but at the moment pages are not properly
>> recyled for XDP_TX and XDP_REDIRECT. That prevents veth xdp from fully
>> leveraging the advantages of the page pool. So this RFC patchset is
>> mainly
>> to make recycling work for those cases. With that in place, it can be
>> further optimized by utilizing the napi skb cache. Detailed figures are
>> presented in each commit message, and together they demonstrate a quite
>> noticeable improvement.
>>
>
> I'm digging into this code path today.
>
> I'm trying to extend this and find a way to support SKBs that used
> kmalloc (skb->head_frag=0), such that we can remove the
> skb_head_is_locked() check in veth_convert_skb_to_xdp_buff(), which will
> allow more SKBs to avoid realloc. As long as they have enough headroom,
> which we can dynamically control for netdev TX-packets by adjusting
> netdev->needed_headroom, e.g. when loading an XDP prog.
>
> I noticed netif_receive_generic_xdp() and bpf_prog_run_generic_xdp() can
> handle SKB kmalloc (skb->head_frag=0). Going though the code, I don't
> think it is a bug that generic-XDP allows this.
>
> Deep into this rabbit hole, I start to question our approach.
> - Perhaps the veth XDP approach for SKBs is wrong?
>
> The root-cause of this issue is that veth_xdp_rcv_skb() code path (that
> handle SKBs) is calling XDP-native function "xdp_do_redirect()". I
> question, why isn't it using "xdp_do_generic_redirect()"?
> (I will jump into this rabbit hole now...)
It works: implemented using xdp_do_generic_redirect(), with the
skb_head_is_locked check lifted in veth_convert_skb_to_xdp_buff(), plus an
adjustment to xsk_build_skb() to allocate enough headroom.
The results[1] are good: approximately 26% faster compared to Maryam's
veth-benchmark[3] results[2].
--Jesper
[1]
https://github.com/xdp-project/xdp-project/blob/veth-benchmark02/areas/core/veth_benchmark04.org#implemented-use-generic-xdp-redirect
[2]
https://github.com/xdp-project/xdp-project/blob/veth-benchmark02/areas/core/veth_benchmark03.org#benchmark01-with-xdp-redirect-loaded-on-host-veth
[3] https://github.com/maryamtahhan/veth-benchmark/
* Re: [RFC PATCH net-next v3 0/2] net: veth: Optimizing page pool usage
2023-08-21 21:54 ` Jesper Dangaard Brouer
@ 2023-08-22 12:24 ` Yunsheng Lin
2023-08-22 13:05 ` Jesper Dangaard Brouer
0 siblings, 1 reply; 8+ messages in thread
From: Yunsheng Lin @ 2023-08-22 12:24 UTC (permalink / raw)
To: Jesper Dangaard Brouer, Liang Chen, hawk, horms, davem, edumazet,
kuba, pabeni
Cc: brouer, ilias.apalodimas, daniel, ast, netdev, Lorenzo Bianconi,
Stanislav Fomichev, Maryam Tahhan
On 2023/8/22 5:54, Jesper Dangaard Brouer wrote:
> On 21/08/2023 16.21, Jesper Dangaard Brouer wrote:
>>
>> On 16/08/2023 14.30, Liang Chen wrote:
>>> Page pool is supported for veth, but at the moment pages are not properly
>>> recyled for XDP_TX and XDP_REDIRECT. That prevents veth xdp from fully
>>> leveraging the advantages of the page pool. So this RFC patchset is mainly
>>> to make recycling work for those cases. With that in place, it can be
>>> further optimized by utilizing the napi skb cache. Detailed figures are
>>> presented in each commit message, and together they demonstrate a quite
>>> noticeable improvement.
>>>
>>
>> I'm digging into this code path today.
>>
>> I'm trying to extend this and find a way to support SKBs that used
>> kmalloc (skb->head_frag=0), such that we can remove the
>> skb_head_is_locked() check in veth_convert_skb_to_xdp_buff(), which will
>> allow more SKBs to avoid realloc. As long as they have enough headroom,
>> which we can dynamically control for netdev TX-packets by adjusting
>> netdev->needed_headroom, e.g. when loading an XDP prog.
>>
>> I noticed netif_receive_generic_xdp() and bpf_prog_run_generic_xdp() can
>> handle SKB kmalloc (skb->head_frag=0). Going though the code, I don't
>> think it is a bug that generic-XDP allows this.
Is it possible to relax other checks too, and implement something like
pskb_expand_head() in the xdp core, if the xdp core needs to modify the data?
>>
>> Deep into this rabbit hole, I start to question our approach.
>> - Perhaps the veth XDP approach for SKBs is wrong?
>>
>> The root-cause of this issue is that veth_xdp_rcv_skb() code path (that
>> handle SKBs) is calling XDP-native function "xdp_do_redirect()". I
>> question, why isn't it using "xdp_do_generic_redirect()"?
>> (I will jump into this rabbit hole now...)
Is there any reason why xdp_do_redirect() cannot handle slab-allocated
data? Can we change xdp_do_redirect() to handle slab-allocated
data, so that it can benefit other cases besides veth too?
* Re: [RFC PATCH net-next v3 0/2] net: veth: Optimizing page pool usage
2023-08-22 12:24 ` Yunsheng Lin
@ 2023-08-22 13:05 ` Jesper Dangaard Brouer
2023-08-22 18:13 ` Jesper Dangaard Brouer
0 siblings, 1 reply; 8+ messages in thread
From: Jesper Dangaard Brouer @ 2023-08-22 13:05 UTC (permalink / raw)
To: Yunsheng Lin, Jesper Dangaard Brouer, Liang Chen, hawk, horms,
davem, edumazet, kuba, pabeni
Cc: brouer, ilias.apalodimas, daniel, ast, netdev, Lorenzo Bianconi,
Stanislav Fomichev, Maryam Tahhan
On 22/08/2023 14.24, Yunsheng Lin wrote:
> On 2023/8/22 5:54, Jesper Dangaard Brouer wrote:
>> On 21/08/2023 16.21, Jesper Dangaard Brouer wrote:
>>>
>>> On 16/08/2023 14.30, Liang Chen wrote:
>>>> Page pool is supported for veth, but at the moment pages are not properly
>>>> recyled for XDP_TX and XDP_REDIRECT. That prevents veth xdp from fully
>>>> leveraging the advantages of the page pool. So this RFC patchset is mainly
>>>> to make recycling work for those cases. With that in place, it can be
>>>> further optimized by utilizing the napi skb cache. Detailed figures are
>>>> presented in each commit message, and together they demonstrate a quite
>>>> noticeable improvement.
>>>>
>>>
>>> I'm digging into this code path today.
>>>
>>> I'm trying to extend this and find a way to support SKBs that used
>>> kmalloc (skb->head_frag=0), such that we can remove the
>>> skb_head_is_locked() check in veth_convert_skb_to_xdp_buff(), which will
>>> allow more SKBs to avoid realloc. As long as they have enough headroom,
>>> which we can dynamically control for netdev TX-packets by adjusting
>>> netdev->needed_headroom, e.g. when loading an XDP prog.
>>>
>>> I noticed netif_receive_generic_xdp() and bpf_prog_run_generic_xdp() can
>>> handle SKB kmalloc (skb->head_frag=0). Going though the code, I don't
>>> think it is a bug that generic-XDP allows this.
>
> Is it possible to relaxe other checking too, and implement something like
> pskb_expand_head() in xdp core if xdp core need to modify the data?
>
Yes, I definitely hope (and plan) to relax other checks.
The XDP_PACKET_HEADROOM (256 bytes) check has IMHO become obsolete and
wrong, as many drivers today use a headroom of 192 bytes for XDP (which we
allowed). Thus, there is no reason for veth to insist on the
XDP_PACKET_HEADROOM limit. Today XDP can handle variable headroom (due
to these drivers).
>
>>>
>>> Deep into this rabbit hole, I start to question our approach.
>>> - Perhaps the veth XDP approach for SKBs is wrong?
>>>
>>> The root-cause of this issue is that veth_xdp_rcv_skb() code path (that
>>> handle SKBs) is calling XDP-native function "xdp_do_redirect()". I
>>> question, why isn't it using "xdp_do_generic_redirect()"?
>>> (I will jump into this rabbit hole now...)
>
> Is there any reason why xdp_do_redirect() can not handle the slab-allocated
> data? Can we change the xdp_do_redirect() to handle slab-allocated
> data, so that it can benefit other case beside veth too?
>
I started coding this up, but realized that it was a wrong approach.
The xdp_do_redirect() call is for native-XDP with a proper xdp_buff.
When dealing with SKBs that we pretend are an xdp_buff, we have the API
xdp_do_generic_redirect(). IMHO it is wrong to "steal" the packet-data
from an SKB in order to use the native-XDP API xdp_do_redirect().
In the use-cases I see, often the next layer will allocate a new SKB and
attach the stolen packet-data, which is pure waste, as
xdp_do_generic_redirect() keeps the SKB intact, so no new SKB allocs.
--Jesper
* Re: [RFC PATCH net-next v3 0/2] net: veth: Optimizing page pool usage
2023-08-22 13:05 ` Jesper Dangaard Brouer
@ 2023-08-22 18:13 ` Jesper Dangaard Brouer
0 siblings, 0 replies; 8+ messages in thread
From: Jesper Dangaard Brouer @ 2023-08-22 18:13 UTC (permalink / raw)
To: Yunsheng Lin, Jesper Dangaard Brouer, Liang Chen, hawk, horms,
davem, edumazet, kuba, pabeni
Cc: ilias.apalodimas, daniel, ast, netdev, Lorenzo Bianconi,
Stanislav Fomichev, Maryam Tahhan
On 22/08/2023 15.05, Jesper Dangaard Brouer wrote:
>
> On 22/08/2023 14.24, Yunsheng Lin wrote:
>> On 2023/8/22 5:54, Jesper Dangaard Brouer wrote:
>>> On 21/08/2023 16.21, Jesper Dangaard Brouer wrote:
>>>>
>>>> On 16/08/2023 14.30, Liang Chen wrote:
>>>>> Page pool is supported for veth, but at the moment pages are not properly
>>>>> recyled for XDP_TX and XDP_REDIRECT. That prevents veth xdp from fully
>>>>> leveraging the advantages of the page pool. So this RFC patchset is mainly
>>>>> to make recycling work for those cases. With that in place, it can be
>>>>> further optimized by utilizing the napi skb cache. Detailed figures are
>>>>> presented in each commit message, and together they demonstrate a quite
>>>>> noticeable improvement.
>>>>>
>>>>
>>>> I'm digging into this code path today.
>>>>
>>>> I'm trying to extend this and find a way to support SKBs that used
>>>> kmalloc (skb->head_frag=0), such that we can remove the
>>>> skb_head_is_locked() check in veth_convert_skb_to_xdp_buff(), which will
>>>> allow more SKBs to avoid realloc. As long as they have enough headroom,
>>>> which we can dynamically control for netdev TX-packets by adjusting
>>>> netdev->needed_headroom, e.g. when loading an XDP prog.
>>>>
>>>> I noticed netif_receive_generic_xdp() and bpf_prog_run_generic_xdp() can
>>>> handle SKB kmalloc (skb->head_frag=0). Going though the code, I don't
>>>> think it is a bug that generic-XDP allows this.
>>
>> Is it possible to relaxe other checking too, and implement something like
>> pskb_expand_head() in xdp core if xdp core need to modify the data?
>>
>
> Yes, I definitely hope (and plan) to relax other checks.
>
> The XDP_PACKET_HEADROOM (256 bytes) check have IMHO become obsolete and
> wrong, as many drivers today use headroom 192 bytes for XDP (which we
> allowed). Thus, there is not reason for veth to insist on this
> XDP_PACKET_HEADROOM limit. Today XDP can handle variable headroom (due
> to these drivers).
>
>
>>
>>>>
>>>> Deep into this rabbit hole, I start to question our approach.
>>>> - Perhaps the veth XDP approach for SKBs is wrong?
>>>>
>>>> The root-cause of this issue is that veth_xdp_rcv_skb() code path (that
>>>> handle SKBs) is calling XDP-native function "xdp_do_redirect()". I
>>>> question, why isn't it using "xdp_do_generic_redirect()"?
>>>> (I will jump into this rabbit hole now...)
>>
>> Is there any reason why xdp_do_redirect() can not handle the
>> slab-allocated
>> data? Can we change the xdp_do_redirect() to handle slab-allocated
>> data, so that it can benefit other case beside veth too?
>>
>
> I started coding up this, but realized that it was a wrong approach.
>
> The xdp_do_redirect() call is for native-XDP with a proper xdp_buff.
> When dealing with SKBs we pretend is a xdp_buff, we have the API
> xdp_do_generic_redirect(). IMHO it is wrong to "steal" the packet-data
> from an SKB and in-order to use the native-XDP API xdp_do_redirect().
> In the use-cases I see, often the next layer will allocate a new SKB and
> attach the stolen packet-data , which is pure-waste as
> xdp_do_generic_redirect() keeps the SKB intact, so no new SKB allocs.
>
Please see my RFC-v1 patchset[1] for a different approach, which avoids
the realloc and page_pool usage altogether (but also results in
correct recycling for PP when realloc cannot be avoided).
[1]
https://lore.kernel.org/all/169272709850.1975370.16698220879817216294.stgit@firesoul/
--Jesper