* [RFC PATCH net-next v2 1/2] net: veth: Page pool creation error handling for existing pools only
@ 2023-08-01 6:19 Liang Chen
2023-08-01 6:19 ` [RFC PATCH net-next v2 2/2] net: veth: Improving page pool pages recycling Liang Chen
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Liang Chen @ 2023-08-01 6:19 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, linyunsheng
Cc: hawk, ilias.apalodimas, daniel, ast, netdev, liangchen.linux
The failure handling procedure destroys page pools for all queues,
including those that haven't had their page pool created yet. This patch
introduces the necessary adjustment so that only pools that were actually
created are destroyed, preventing potential risks and inconsistent error
handling behavior.
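The intent can be modeled in plain userspace C (a sketch with hypothetical
names, not the driver code): on failure, unwind backwards from the failure
point so cleanup never touches a pool that was never created.

```c
#include <assert.h>
#include <stdbool.h>

#define NQ 8

/* Stand-ins for priv->rq[i].page_pool and the page_pool_destroy() calls
 * (hypothetical userspace model, not the driver code). */
static bool pool_created[NQ];
static bool destroy_called[NQ];

/* Simulate pool creation failing at queue 'fail': pools [start, fail)
 * exist, pools [fail, end) were never created. */
static int create_pools(int start, int end, int fail)
{
    for (int i = start; i < end; i++) {
        if (i == fail)
            return -1;
        pool_created[i] = true;
    }
    return 0;
}

/* The patched error path: walk backwards from the failure point so only
 * queues whose pool was actually created get destroyed. */
static void unwind(int start, int failed_at)
{
    for (int i = failed_at - 1; i >= start; i--) {
        destroy_called[i] = true;
        pool_created[i] = false;
    }
}
```

The previous forward loop over [start, end) would also "destroy" queues
fail..end-1, which only happens to be harmless because page_pool_destroy()
tolerates NULL.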
Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
---
drivers/net/veth.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 614f3e3efab0..509e901da41d 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1081,8 +1081,9 @@ static int __veth_napi_enable_range(struct net_device *dev, int start, int end)
err_xdp_ring:
for (i--; i >= start; i--)
ptr_ring_cleanup(&priv->rq[i].xdp_ring, veth_ptr_free);
+ i = end;
err_page_pool:
- for (i = start; i < end; i++) {
+ for (i--; i >= start; i--) {
page_pool_destroy(priv->rq[i].page_pool);
priv->rq[i].page_pool = NULL;
}
--
2.40.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC PATCH net-next v2 2/2] net: veth: Improving page pool pages recycling
2023-08-01 6:19 [RFC PATCH net-next v2 1/2] net: veth: Page pool creation error handling for existing pools only Liang Chen
@ 2023-08-01 6:19 ` Liang Chen
2023-08-02 12:32 ` Yunsheng Lin
2023-08-11 18:16 ` Jesper Dangaard Brouer
2023-08-01 12:18 ` [RFC PATCH net-next v2 1/2] net: veth: Page pool creation error handling for existing pools only Simon Horman
2023-08-02 8:56 ` Jesper Dangaard Brouer
2 siblings, 2 replies; 16+ messages in thread
From: Liang Chen @ 2023-08-01 6:19 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, linyunsheng
Cc: hawk, ilias.apalodimas, daniel, ast, netdev, liangchen.linux
Page pool is supported for veth. But for XDP_TX and XDP_REDIRECT cases,
the pages are not effectively recycled. "ethtool -S" statistics for the
page pool are as follows:
NIC statistics:
rx_pp_alloc_fast: 18041186
rx_pp_alloc_slow: 286369
rx_pp_recycle_ring: 0
rx_pp_recycle_released_ref: 18327555
This failure to recycle page pool pages is a result of the code snippet
below, which converts page pool pages into regular pages and releases
the skb data structure:
veth_xdp_get(xdp);
consume_skb(skb);
The reason is that some skbs received from the veth peer are not backed by
page pool pages, and remain so after conversion to an xdp frame. To avoid
confusing __xdp_return with a mix of regular pages and page pool pages,
they are all converted to regular pages, so registering the xdp memory
model as MEM_TYPE_PAGE_SHARED is sufficient.
If we replace the above code with kfree_skb_partial, directly releasing
the skb data structure, we can retain the original page pool page behavior.
However, directly changing the xdp memory model to MEM_TYPE_PAGE_POOL is
not a solution, as explained above. Therefore, an additional
MEM_TYPE_PAGE_POOL memory model is registered for each rq.
In addition, to avoid mixing up pages from page pools with different
xdp_mem_id, page pool pages directly coming from the peers are still
converted into regular pages. This is not common, as most of the time,
they will be reallocated in veth_convert_skb_to_xdp_buff.
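The resulting per-packet dispatch can be sketched as a small userspace model
(hedged: the names MEM_PP/MEM_SHARED and pick() are stand-ins for the logic
described above, not the driver's code):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the two registered xdp memory models and the two ways the
 * skb shell is released (hypothetical model, not driver code). */
enum mem_model { MEM_PP, MEM_SHARED };
enum release   { FREE_PARTIAL, CONSUME };

struct decision {
    enum mem_model mem;
    enum release   rel;
};

/* 'reallocated' corresponds to skb != skb_orig (buffer freshly allocated
 * from this rq's own page pool); 'pp_recycle' marks an skb carrying page
 * pool pages that came directly from the peer. */
static struct decision pick(bool reallocated, bool pp_recycle)
{
    if (reallocated)
        /* our own pool's pages: recycle via the MEM_TYPE_PAGE_POOL model */
        return (struct decision){ .mem = MEM_PP, .rel = FREE_PARTIAL };
    if (!pp_recycle)
        /* plain pages: steal the head, free only the skb shell */
        return (struct decision){ .mem = MEM_SHARED, .rel = FREE_PARTIAL };
    /* peer's pool pages: full release converts them to regular pages */
    return (struct decision){ .mem = MEM_SHARED, .rel = CONSUME };
}
```

Only the last branch keeps the old veth_xdp_get()/consume_skb() pair; the
other two release just the skb shell and leave the data buffer to the
appropriate memory model.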
The following tests were conducted using pktgen to generate traffic and
evaluate the performance improvement after page pool pages get successfully
recycled in scenarios involving XDP_TX, XDP_REDIRECT, and AF_XDP.
Test environment setup:
ns1 ns2
veth0 <-peer-> veth1
veth2 <-peer-> veth3
Test Results:
pktgen -> veth1 -> veth0(XDP_TX) -> veth1(XDP_DROP)
without PP recycle: 1,780,392
with PP recycle: 1,984,680
improvement: ~10%
pktgen -> veth1 -> veth0(XDP_TX) -> veth1(XDP_PASS)
without PP recycle: 1,433,491
with PP recycle: 1,511,680
improvement: 5~6%
pktgen -> veth1 -> veth0(XDP_REDIRECT) -> veth2 -> veth3(XDP_DROP)
without PP recycle: 1,527,708
with PP recycle: 1,672,101
improvement: ~10%
pktgen -> veth1 -> veth0(XDP_REDIRECT) -> veth2 -> veth3(XDP_PASS)
without PP recycle: 1,325,804
with PP recycle: 1,392,704
improvement: ~5.5%
pktgen -> veth1 -> veth0(AF_XDP) -> user space(DROP)
without PP recycle: 1,607,609
with PP recycle: 1,736,957
improvement: ~8%
Additionally, performance was measured for the case where converting to
xdp_buff doesn't require a buffer copy and the original skb uses regular
pages, i.e. page pool recycling is not involved. This still gives around
a 2% improvement, attributed to the change from consume_skb to
kfree_skb_partial.
Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
---
Changes from v1:
- pp pages from the peers are still converted into regular pages.
---
drivers/net/veth.c | 48 ++++++++++++++++++++++++++++++++++++++++------
1 file changed, 42 insertions(+), 6 deletions(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 509e901da41d..ea1b344e5db4 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -62,6 +62,7 @@ struct veth_rq {
struct net_device *dev;
struct bpf_prog __rcu *xdp_prog;
struct xdp_mem_info xdp_mem;
+ struct xdp_mem_info xdp_mem_pp;
struct veth_rq_stats stats;
bool rx_notify_masked;
struct ptr_ring xdp_ring;
@@ -836,6 +837,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
struct bpf_prog *xdp_prog;
struct veth_xdp_buff vxbuf;
struct xdp_buff *xdp = &vxbuf.xdp;
+ struct sk_buff *skb_orig;
u32 act, metalen;
int off;
@@ -848,6 +850,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
goto out;
}
+ skb_orig = skb;
__skb_push(skb, skb->data - skb_mac_header(skb));
if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb))
goto drop;
@@ -862,9 +865,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
case XDP_PASS:
break;
case XDP_TX:
- veth_xdp_get(xdp);
- consume_skb(skb);
- xdp->rxq->mem = rq->xdp_mem;
+ if (skb != skb_orig) {
+ xdp->rxq->mem = rq->xdp_mem_pp;
+ kfree_skb_partial(skb, true);
+ } else if (!skb->pp_recycle) {
+ xdp->rxq->mem = rq->xdp_mem;
+ kfree_skb_partial(skb, true);
+ } else {
+ veth_xdp_get(xdp);
+ consume_skb(skb);
+ xdp->rxq->mem = rq->xdp_mem;
+ }
+
if (unlikely(veth_xdp_tx(rq, xdp, bq) < 0)) {
trace_xdp_exception(rq->dev, xdp_prog, act);
stats->rx_drops++;
@@ -874,9 +886,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
rcu_read_unlock();
goto xdp_xmit;
case XDP_REDIRECT:
- veth_xdp_get(xdp);
- consume_skb(skb);
- xdp->rxq->mem = rq->xdp_mem;
+ if (skb != skb_orig) {
+ xdp->rxq->mem = rq->xdp_mem_pp;
+ kfree_skb_partial(skb, true);
+ } else if (!skb->pp_recycle) {
+ xdp->rxq->mem = rq->xdp_mem;
+ kfree_skb_partial(skb, true);
+ } else {
+ veth_xdp_get(xdp);
+ consume_skb(skb);
+ xdp->rxq->mem = rq->xdp_mem;
+ }
+
if (xdp_do_redirect(rq->dev, xdp, xdp_prog)) {
stats->rx_drops++;
goto err_xdp;
@@ -1061,6 +1082,14 @@ static int __veth_napi_enable_range(struct net_device *dev, int start, int end)
goto err_page_pool;
}
+ for (i = start; i < end; i++) {
+ err = xdp_reg_mem_model(&priv->rq[i].xdp_mem_pp,
+ MEM_TYPE_PAGE_POOL,
+ priv->rq[i].page_pool);
+ if (err)
+ goto err_reg_mem;
+ }
+
for (i = start; i < end; i++) {
struct veth_rq *rq = &priv->rq[i];
@@ -1082,6 +1111,10 @@ static int __veth_napi_enable_range(struct net_device *dev, int start, int end)
for (i--; i >= start; i--)
ptr_ring_cleanup(&priv->rq[i].xdp_ring, veth_ptr_free);
i = end;
+err_reg_mem:
+ for (i--; i >= start; i--)
+ xdp_unreg_mem_model(&priv->rq[i].xdp_mem_pp);
+ i = end;
err_page_pool:
for (i--; i >= start; i--) {
page_pool_destroy(priv->rq[i].page_pool);
@@ -1117,6 +1150,9 @@ static void veth_napi_del_range(struct net_device *dev, int start, int end)
ptr_ring_cleanup(&rq->xdp_ring, veth_ptr_free);
}
+ for (i = start; i < end; i++)
+ xdp_unreg_mem_model(&priv->rq[i].xdp_mem_pp);
+
for (i = start; i < end; i++) {
page_pool_destroy(priv->rq[i].page_pool);
priv->rq[i].page_pool = NULL;
--
2.40.1
* Re: [RFC PATCH net-next v2 1/2] net: veth: Page pool creation error handling for existing pools only
2023-08-01 6:19 [RFC PATCH net-next v2 1/2] net: veth: Page pool creation error handling for existing pools only Liang Chen
2023-08-01 6:19 ` [RFC PATCH net-next v2 2/2] net: veth: Improving page pool pages recycling Liang Chen
@ 2023-08-01 12:18 ` Simon Horman
2023-08-11 11:56 ` Liang Chen
2023-08-02 8:56 ` Jesper Dangaard Brouer
2 siblings, 1 reply; 16+ messages in thread
From: Simon Horman @ 2023-08-01 12:18 UTC (permalink / raw)
To: Liang Chen
Cc: davem, edumazet, kuba, pabeni, linyunsheng, hawk,
ilias.apalodimas, daniel, ast, netdev
On Tue, Aug 01, 2023 at 02:19:31PM +0800, Liang Chen wrote:
> The failure handling procedure destroys page pools for all queues,
> including those that haven't had their page pool created yet. This patch
> introduces the necessary adjustment so that only pools that were actually
> created are destroyed, preventing potential risks and inconsistent error
> handling behavior.
>
> Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
I wonder if this should this have a fixes tag and be targeted at 'net' as a
bugfix?
* Re: [RFC PATCH net-next v2 1/2] net: veth: Page pool creation error handling for existing pools only
2023-08-01 6:19 [RFC PATCH net-next v2 1/2] net: veth: Page pool creation error handling for existing pools only Liang Chen
2023-08-01 6:19 ` [RFC PATCH net-next v2 2/2] net: veth: Improving page pool pages recycling Liang Chen
2023-08-01 12:18 ` [RFC PATCH net-next v2 1/2] net: veth: Page pool creation error handling for existing pools only Simon Horman
@ 2023-08-02 8:56 ` Jesper Dangaard Brouer
2023-08-11 12:02 ` Liang Chen
2 siblings, 1 reply; 16+ messages in thread
From: Jesper Dangaard Brouer @ 2023-08-02 8:56 UTC (permalink / raw)
To: Liang Chen, davem, edumazet, kuba, pabeni, linyunsheng
Cc: hawk, ilias.apalodimas, daniel, ast, netdev
On 01/08/2023 08.19, Liang Chen wrote:
> The failure handling procedure destroys page pools for all queues,
> including those that haven't had their page pool created yet. This patch
> introduces the necessary adjustment so that only pools that were actually
> created are destroyed, preventing potential risks and inconsistent error
> handling behavior.
>
> Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
> ---
> drivers/net/veth.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index 614f3e3efab0..509e901da41d 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -1081,8 +1081,9 @@ static int __veth_napi_enable_range(struct net_device *dev, int start, int end)
> err_xdp_ring:
> for (i--; i >= start; i--)
> ptr_ring_cleanup(&priv->rq[i].xdp_ring, veth_ptr_free);
> + i = end;
> err_page_pool:
> - for (i = start; i < end; i++) {
> + for (i--; i >= start; i--) {
I'm not a fan of this coding style that iterates backwards, but I can
see you just inherited the existing style in this function.
> page_pool_destroy(priv->rq[i].page_pool);
> priv->rq[i].page_pool = NULL;
> }
The page_pool_destroy() call handles (exits early) being called with NULL.
So, I don't think the incorrect walk over all entries (start to end) can
trigger an actual bug.
Anyhow, I do think this is more correct, so you can append my ACK for
the real submission.
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
* Re: [RFC PATCH net-next v2 2/2] net: veth: Improving page pool pages recycling
2023-08-01 6:19 ` [RFC PATCH net-next v2 2/2] net: veth: Improving page pool pages recycling Liang Chen
@ 2023-08-02 12:32 ` Yunsheng Lin
2023-08-07 12:20 ` Liang Chen
2023-08-11 18:16 ` Jesper Dangaard Brouer
1 sibling, 1 reply; 16+ messages in thread
From: Yunsheng Lin @ 2023-08-02 12:32 UTC (permalink / raw)
To: Liang Chen, davem, edumazet, kuba, pabeni
Cc: hawk, ilias.apalodimas, daniel, ast, netdev
On 2023/8/1 14:19, Liang Chen wrote:
> @@ -862,9 +865,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
> case XDP_PASS:
> break;
> case XDP_TX:
> - veth_xdp_get(xdp);
> - consume_skb(skb);
> - xdp->rxq->mem = rq->xdp_mem;
> + if (skb != skb_orig) {
> + xdp->rxq->mem = rq->xdp_mem_pp;
> + kfree_skb_partial(skb, true);
For this case, I suppose that we can safely call kfree_skb_partial()
as we allocate the skb in the veth_convert_skb_to_xdp_buff(), but
I am not sure about the !skb->pp_recycle case.
> + } else if (!skb->pp_recycle) {
> + xdp->rxq->mem = rq->xdp_mem;
> + kfree_skb_partial(skb, true);
For consume_skb(), there is skb_unref() checking and other checks/operations.
Can we really assume that we can call kfree_skb_partial() with head_stolen
being true? Is it possible that skb->users is bigger than 1? If it is possible,
don't we free the 'skb' back to skbuff_cache while others may still be using
it?
> + } else {
> + veth_xdp_get(xdp);
> + consume_skb(skb);
> + xdp->rxq->mem = rq->xdp_mem;
> + }
> +
* Re: [RFC PATCH net-next v2 2/2] net: veth: Improving page pool pages recycling
2023-08-02 12:32 ` Yunsheng Lin
@ 2023-08-07 12:20 ` Liang Chen
2023-08-08 11:16 ` Yunsheng Lin
0 siblings, 1 reply; 16+ messages in thread
From: Liang Chen @ 2023-08-07 12:20 UTC (permalink / raw)
To: Yunsheng Lin
Cc: davem, edumazet, kuba, pabeni, hawk, ilias.apalodimas, daniel,
ast, netdev
On Wed, Aug 2, 2023 at 8:32 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2023/8/1 14:19, Liang Chen wrote:
>
> > @@ -862,9 +865,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
> > case XDP_PASS:
> > break;
> > case XDP_TX:
> > - veth_xdp_get(xdp);
> > - consume_skb(skb);
> > - xdp->rxq->mem = rq->xdp_mem;
> > + if (skb != skb_orig) {
> > + xdp->rxq->mem = rq->xdp_mem_pp;
> > + kfree_skb_partial(skb, true);
>
> For this case, I suppose that we can safely call kfree_skb_partial()
> as we allocate the skb in the veth_convert_skb_to_xdp_buff(), but
> I am not sure about the !skb->pp_recycle case.
>
> > + } else if (!skb->pp_recycle) {
> > + xdp->rxq->mem = rq->xdp_mem;
> > + kfree_skb_partial(skb, true);
>
> For consume_skb(), there is skb_unref() checking and other checking/operation.
> Can we really assume that we can call kfree_skb_partial() with head_stolen
> being true? Is it possible that skb->users is bigger than 1? If it is possible,
> don't we free the 'skb' back to skbuff_cache when other may still be using
> it?
>
Thanks for raising the concern. If there are multiple references to
the skb (skb->users is greater than 1), the skb will be reallocated in
veth_convert_skb_to_xdp_buff(). So it should enter the skb != skb_orig
case.
In fact, entering the !skb->pp_recycle case implies that the skb meets
the following conditions:
1. It is neither shared nor cloned.
2. It is not allocated using kmalloc.
3. It does not have fragment data.
4. The headroom of the skb is greater than XDP_PACKET_HEADROOM.
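A hedged sketch of what those four conditions amount to (userspace model;
the function name and flags are assumptions made for illustration, not the
driver's code, and the headroom bound follows the driver's check rather
than the approximate wording above):

```c
#include <assert.h>
#include <stdbool.h>

#define XDP_PACKET_HEADROOM 256  /* the kernel's value for this macro */

/* If any condition fails, veth_convert_skb_to_xdp_buff() reallocates the
 * buffer, so the skb == skb_orig path only ever sees skbs the driver can
 * treat as exclusively owned. */
static bool can_reuse(bool shared_or_cloned, bool head_from_kmalloc,
                      bool has_frags, unsigned int headroom)
{
    return !shared_or_cloned &&              /* condition 1 */
           !head_from_kmalloc &&             /* condition 2 */
           !has_frags &&                     /* condition 3 */
           headroom >= XDP_PACKET_HEADROOM;  /* condition 4 */
}
```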
Thanks,
Liang
> > + } else {
> > + veth_xdp_get(xdp);
> > + consume_skb(skb);
> > + xdp->rxq->mem = rq->xdp_mem;
> > + }
> > +
>
* Re: [RFC PATCH net-next v2 2/2] net: veth: Improving page pool pages recycling
2023-08-07 12:20 ` Liang Chen
@ 2023-08-08 11:16 ` Yunsheng Lin
2023-08-09 10:01 ` Liang Chen
0 siblings, 1 reply; 16+ messages in thread
From: Yunsheng Lin @ 2023-08-08 11:16 UTC (permalink / raw)
To: Liang Chen
Cc: davem, edumazet, kuba, pabeni, hawk, ilias.apalodimas, daniel,
ast, netdev
On 2023/8/7 20:20, Liang Chen wrote:
> On Wed, Aug 2, 2023 at 8:32 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>
>> On 2023/8/1 14:19, Liang Chen wrote:
>>
>>> @@ -862,9 +865,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
>>> case XDP_PASS:
>>> break;
>>> case XDP_TX:
>>> - veth_xdp_get(xdp);
>>> - consume_skb(skb);
>>> - xdp->rxq->mem = rq->xdp_mem;
>>> + if (skb != skb_orig) {
>>> + xdp->rxq->mem = rq->xdp_mem_pp;
>>> + kfree_skb_partial(skb, true);
>>
>> For this case, I suppose that we can safely call kfree_skb_partial()
>> as we allocate the skb in the veth_convert_skb_to_xdp_buff(), but
>> I am not sure about the !skb->pp_recycle case.
>>
>>> + } else if (!skb->pp_recycle) {
>>> + xdp->rxq->mem = rq->xdp_mem;
>>> + kfree_skb_partial(skb, true);
>>
>> For consume_skb(), there is skb_unref() checking and other checking/operation.
>> Can we really assume that we can call kfree_skb_partial() with head_stolen
>> being true? Is it possible that skb->users is bigger than 1? If it is possible,
>> don't we free the 'skb' back to skbuff_cache when other may still be using
>> it?
>>
>
> Thanks for raising the concern. If there are multiple references to
> the skb (skb->users is greater than 1), the skb will be reallocated in
> veth_convert_skb_to_xdp_buff(). So it should enter the skb != skb_orig
> case.
>
> In fact, entering the !skb->pp_recycle case implies that the skb meets
> the following conditions:
> 1. It is neither shared nor cloned.
> 2. It is not allocated using kmalloc.
> 3. It does not have fragment data.
> 4. The headroom of the skb is greater than XDP_PACKET_HEADROOM.
>
You are right, I missed the checking in veth_convert_skb_to_xdp_buff();
it seems xdp is pretty strict about the buffer owner, it needs to
have exclusive access to the whole buffer.
And it seems there is only one difference left then, with
kfree_skb_partial() calling 'kmem_cache_free(skbuff_cache, skb)' and
consume_skb() calling 'kfree_skbmem(skb)'. If we are sure that
'skb' is only allocated from 'skbuff_cache', this patch looks good to me
then.
>
* Re: [RFC PATCH net-next v2 2/2] net: veth: Improving page pool pages recycling
2023-08-08 11:16 ` Yunsheng Lin
@ 2023-08-09 10:01 ` Liang Chen
2023-08-09 12:35 ` Yunsheng Lin
0 siblings, 1 reply; 16+ messages in thread
From: Liang Chen @ 2023-08-09 10:01 UTC (permalink / raw)
To: Yunsheng Lin
Cc: davem, edumazet, kuba, pabeni, hawk, ilias.apalodimas, daniel,
ast, netdev
On Tue, Aug 8, 2023 at 7:16 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2023/8/7 20:20, Liang Chen wrote:
> > On Wed, Aug 2, 2023 at 8:32 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
> >>
> >> On 2023/8/1 14:19, Liang Chen wrote:
> >>
> >>> @@ -862,9 +865,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
> >>> case XDP_PASS:
> >>> break;
> >>> case XDP_TX:
> >>> - veth_xdp_get(xdp);
> >>> - consume_skb(skb);
> >>> - xdp->rxq->mem = rq->xdp_mem;
> >>> + if (skb != skb_orig) {
> >>> + xdp->rxq->mem = rq->xdp_mem_pp;
> >>> + kfree_skb_partial(skb, true);
> >>
> >> For this case, I suppose that we can safely call kfree_skb_partial()
> >> as we allocate the skb in the veth_convert_skb_to_xdp_buff(), but
> >> I am not sure about the !skb->pp_recycle case.
> >>
> >>> + } else if (!skb->pp_recycle) {
> >>> + xdp->rxq->mem = rq->xdp_mem;
> >>> + kfree_skb_partial(skb, true);
> >>
> >> For consume_skb(), there is skb_unref() checking and other checking/operation.
> >> Can we really assume that we can call kfree_skb_partial() with head_stolen
> >> being true? Is it possible that skb->users is bigger than 1? If it is possible,
> >> don't we free the 'skb' back to skbuff_cache when other may still be using
> >> it?
> >>
> >
> > Thanks for raising the concern. If there are multiple references to
> > the skb (skb->users is greater than 1), the skb will be reallocated in
> > veth_convert_skb_to_xdp_buff(). So it should enter the skb != skb_orig
> > case.
> >
> > In fact, entering the !skb->pp_recycle case implies that the skb meets
> > the following conditions:
> > 1. It is neither shared nor cloned.
> > 2. It is not allocated using kmalloc.
> > 3. It does not have fragment data.
> > 4. The headroom of the skb is greater than XDP_PACKET_HEADROOM.
> >
>
> You are right, I missed the checking in veth_convert_skb_to_xdp_buff(),
> it seems the xdp is pretty strict about the buffer owner, it need to
> have exclusive access to all the buffer.
>
> And it seems there is only one difference left then, with
> kfree_skb_partial() calling 'kmem_cache_free(skbuff_cache, skb)' and
> consume_skb() calling 'kfree_skbmem(skb)'. If we are true about
> 'skb' only allocated from 'skbuff_cache', this patch looks good to me
> then.
>
The difference between kmem_cache_free and kfree_skbmem lies in the
fact that kfree_skbmem checks whether the skb is an fclone (fast
clone) skb. If it is, it should be returned to the
skbuff_fclone_cache. Currently, fclone skbs can only be allocated
through __alloc_skb, and their head buffer is allocated by
kmalloc_reserve, which does not meet the condition mentioned above -
"2. It is not allocated using kmalloc.". Therefore, the fclone skb
will still be reallocated by veth_convert_skb_to_xdp_buff, leading to
the skb != skb_orig case. In other words, entering the
!skb->pp_recycle case indicates that the skb was allocated from
skbuff_cache.
Thanks for the review,
Liang
> >
* Re: [RFC PATCH net-next v2 2/2] net: veth: Improving page pool pages recycling
2023-08-09 10:01 ` Liang Chen
@ 2023-08-09 12:35 ` Yunsheng Lin
2023-08-12 1:52 ` Liang Chen
0 siblings, 1 reply; 16+ messages in thread
From: Yunsheng Lin @ 2023-08-09 12:35 UTC (permalink / raw)
To: Liang Chen
Cc: davem, edumazet, kuba, pabeni, hawk, ilias.apalodimas, daniel,
ast, netdev
On 2023/8/9 18:01, Liang Chen wrote:
> On Tue, Aug 8, 2023 at 7:16 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>
>> On 2023/8/7 20:20, Liang Chen wrote:
>>> On Wed, Aug 2, 2023 at 8:32 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>>>
>>>> On 2023/8/1 14:19, Liang Chen wrote:
>>>>
>>>>> @@ -862,9 +865,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
>>>>> case XDP_PASS:
>>>>> break;
>>>>> case XDP_TX:
>>>>> - veth_xdp_get(xdp);
>>>>> - consume_skb(skb);
>>>>> - xdp->rxq->mem = rq->xdp_mem;
>>>>> + if (skb != skb_orig) {
>>>>> + xdp->rxq->mem = rq->xdp_mem_pp;
>>>>> + kfree_skb_partial(skb, true);
>>>>
>>>> For this case, I suppose that we can safely call kfree_skb_partial()
>>>> as we allocate the skb in the veth_convert_skb_to_xdp_buff(), but
>>>> I am not sure about the !skb->pp_recycle case.
>>>>
>>>>> + } else if (!skb->pp_recycle) {
>>>>> + xdp->rxq->mem = rq->xdp_mem;
>>>>> + kfree_skb_partial(skb, true);
>>>>
>>>> For consume_skb(), there is skb_unref() checking and other checking/operation.
>>>> Can we really assume that we can call kfree_skb_partial() with head_stolen
>>>> being true? Is it possible that skb->users is bigger than 1? If it is possible,
>>>> don't we free the 'skb' back to skbuff_cache when other may still be using
>>>> it?
>>>>
>>>
>>> Thanks for raising the concern. If there are multiple references to
>>> the skb (skb->users is greater than 1), the skb will be reallocated in
>>> veth_convert_skb_to_xdp_buff(). So it should enter the skb != skb_orig
>>> case.
>>>
>>> In fact, entering the !skb->pp_recycle case implies that the skb meets
>>> the following conditions:
>>> 1. It is neither shared nor cloned.
>>> 2. It is not allocated using kmalloc.
>>> 3. It does not have fragment data.
>>> 4. The headroom of the skb is greater than XDP_PACKET_HEADROOM.
>>>
>>
>> You are right, I missed the checking in veth_convert_skb_to_xdp_buff(),
>> it seems the xdp is pretty strict about the buffer owner, it need to
>> have exclusive access to all the buffer.
>>
>> And it seems there is only one difference left then, with
>> kfree_skb_partial() calling 'kmem_cache_free(skbuff_cache, skb)' and
>> consume_skb() calling 'kfree_skbmem(skb)'. If we are true about
>> 'skb' only allocated from 'skbuff_cache', this patch looks good to me
>> then.
>>
>
> The difference between kmem_cache_free and kfree_skbmem lies in the
> fact that kfree_skbmem checks whether the skb is an fclone (fast
> clone) skb. If it is, it should be returned to the
> skbuff_fclone_cache. Currently, fclone skbs can only be allocated
> through __alloc_skb, and their head buffer is allocated by
> kmalloc_reserve, which does not meet the condition mentioned above -
> "2. It is not allocated using kmalloc.". Therefore, the fclone skb
> will still be reallocated by veth_convert_skb_to_xdp_buff, leading to
> the skb != skb_orig case. In other words, entering the
> !skb->pp_recycle case indicates that the skb was allocated from
> skbuff_cache.
It might need some comment to make it clear, or some compile-time checking
such as BUILD_BUG_ON() to ensure that, as it is not so obvious if
someone changes it to allocate an fclone skb with a frag head data in
the future.
Also I suppose the veth_xdp_rcv_skb() is called in NAPI context, and
we might be able to reuse the 'skb' if we can use something like
napi_skb_free_stolen_head().
* Re: [RFC PATCH net-next v2 1/2] net: veth: Page pool creation error handling for existing pools only
2023-08-01 12:18 ` [RFC PATCH net-next v2 1/2] net: veth: Page pool creation error handling for existing pools only Simon Horman
@ 2023-08-11 11:56 ` Liang Chen
0 siblings, 0 replies; 16+ messages in thread
From: Liang Chen @ 2023-08-11 11:56 UTC (permalink / raw)
To: Simon Horman
Cc: davem, edumazet, kuba, pabeni, linyunsheng, hawk,
ilias.apalodimas, daniel, ast, netdev
On Tue, Aug 1, 2023 at 8:18 PM Simon Horman <horms@kernel.org> wrote:
>
> On Tue, Aug 01, 2023 at 02:19:31PM +0800, Liang Chen wrote:
> > The failure handling procedure destroys page pools for all queues,
> > including those that haven't had their page pool created yet. This patch
> > introduces the necessary adjustment so that only pools that were actually
> > created are destroyed, preventing potential risks and inconsistent error
> > handling behavior.
> >
> > Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
>
> I wonder if this should this have a fixes tag and be targeted at 'net' as a
> bugfix?
It doesn't trigger a real bug at the moment, as page_pool_destroy
handles a NULL argument, so it may not be suitable for a Fixes tag.
Thanks,
Liang
* Re: [RFC PATCH net-next v2 1/2] net: veth: Page pool creation error handling for existing pools only
2023-08-02 8:56 ` Jesper Dangaard Brouer
@ 2023-08-11 12:02 ` Liang Chen
2023-08-11 16:35 ` Jesper Dangaard Brouer
0 siblings, 1 reply; 16+ messages in thread
From: Liang Chen @ 2023-08-11 12:02 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: davem, edumazet, kuba, pabeni, linyunsheng, ilias.apalodimas,
daniel, ast, netdev
On Wed, Aug 2, 2023 at 4:56 PM Jesper Dangaard Brouer <hawk@kernel.org> wrote:
>
>
>
> On 01/08/2023 08.19, Liang Chen wrote:
> > The failure handling procedure destroys page pools for all queues,
> > including those that haven't had their page pool created yet. This patch
> > introduces the necessary adjustment so that only pools that were actually
> > created are destroyed, preventing potential risks and inconsistent error
> > handling behavior.
> >
> > Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
> > ---
> > drivers/net/veth.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> > index 614f3e3efab0..509e901da41d 100644
> > --- a/drivers/net/veth.c
> > +++ b/drivers/net/veth.c
> > @@ -1081,8 +1081,9 @@ static int __veth_napi_enable_range(struct net_device *dev, int start, int end)
> > err_xdp_ring:
> > for (i--; i >= start; i--)
> > ptr_ring_cleanup(&priv->rq[i].xdp_ring, veth_ptr_free);
> > + i = end;
> > err_page_pool:
> > - for (i = start; i < end; i++) {
> > + for (i--; i >= start; i--) {
>
> I'm not a fan of this coding style, that iterates backwards, but I can
> see you just inherited the existing style in this function.
>
> > page_pool_destroy(priv->rq[i].page_pool);
> > priv->rq[i].page_pool = NULL;
> > }
>
> The page_pool_destroy() call handles(exits) if called with NULL.
> So, I don't think this incorrect walking all (start to end) can trigger
> an actual bug.
>
> Anyhow, I do think this is more correct, so you can append my ACK for
> the real submission.
>
> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
>
Thanks! I will separate this patch out and make a real submission,
since it's a small fix and not really coupled with the optimization
patch which still needs some further work after receiving feedback
from Yunsheng.
Thanks,
Liang
* Re: [RFC PATCH net-next v2 1/2] net: veth: Page pool creation error handling for existing pools only
2023-08-11 12:02 ` Liang Chen
@ 2023-08-11 16:35 ` Jesper Dangaard Brouer
2023-08-12 2:16 ` Liang Chen
0 siblings, 1 reply; 16+ messages in thread
From: Jesper Dangaard Brouer @ 2023-08-11 16:35 UTC (permalink / raw)
To: Liang Chen, Jesper Dangaard Brouer
Cc: brouer, davem, edumazet, kuba, pabeni, linyunsheng,
ilias.apalodimas, daniel, ast, netdev, Lorenzo Bianconi
On 11/08/2023 14.02, Liang Chen wrote:
> On Wed, Aug 2, 2023 at 4:56 PM Jesper Dangaard Brouer <hawk@kernel.org> wrote:
>>
[...]
>>> page_pool_destroy(priv->rq[i].page_pool);
>>> priv->rq[i].page_pool = NULL;
>>> }
>>
>> The page_pool_destroy() call handles(exits) if called with NULL.
>> So, I don't think this incorrect walking all (start to end) can trigger
>> an actual bug.
>>
>> Anyhow, I do think this is more correct, so you can append my ACK for
>> the real submission.
>>
>> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
>>
>
> Thanks! I will separate this patch out and make a real submission,
> since it's a small fix and not really coupled with the optimization
> patch which still needs some further work after receiving feedback
> from Yunsheng.
Sure, send it as a fix commit. Given it is not super critical, I think
it is okay to send it for net-next, to avoid merge issues/conflicts with
your 2/2 optimization patch. And for good order we should add a Fixes
tag, but IMHO net-next is still okay, given I don't think this can
trigger a bug.
That said, I do want to encourage you to work on the 2/2 optimization patch.
I think this is a very important optimization, and actually a fix for
when we introduced page_pool to veth. Well spotted! :-)
In 2/2 I keep getting confused by the use of kfree_skb_partial() and
whether the right conditions are met.
--Jesper
* Re: [RFC PATCH net-next v2 2/2] net: veth: Improving page pool pages recycling
2023-08-01 6:19 ` [RFC PATCH net-next v2 2/2] net: veth: Improving page pool pages recycling Liang Chen
2023-08-02 12:32 ` Yunsheng Lin
@ 2023-08-11 18:16 ` Jesper Dangaard Brouer
2023-08-12 2:02 ` Liang Chen
1 sibling, 1 reply; 16+ messages in thread
From: Jesper Dangaard Brouer @ 2023-08-11 18:16 UTC (permalink / raw)
To: Liang Chen, davem, edumazet, kuba, pabeni, linyunsheng
Cc: brouer, hawk, ilias.apalodimas, daniel, ast, netdev
On 01/08/2023 08.19, Liang Chen wrote:
[...]
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index 509e901da41d..ea1b344e5db4 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
[...]
> @@ -848,6 +850,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
> goto out;
> }
>
> + skb_orig = skb;
> __skb_push(skb, skb->data - skb_mac_header(skb));
> if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb))
> goto drop;
> @@ -862,9 +865,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
> case XDP_PASS:
> break;
> case XDP_TX:
> - veth_xdp_get(xdp);
> - consume_skb(skb);
> - xdp->rxq->mem = rq->xdp_mem;
> + if (skb != skb_orig) {
> + xdp->rxq->mem = rq->xdp_mem_pp;
> + kfree_skb_partial(skb, true);
> + } else if (!skb->pp_recycle) {
> + xdp->rxq->mem = rq->xdp_mem;
> + kfree_skb_partial(skb, true);
> + } else {
> + veth_xdp_get(xdp);
> + consume_skb(skb);
> + xdp->rxq->mem = rq->xdp_mem;
> + }
> +
The code section above and the one below look the same.
It begs for a common function.
> if (unlikely(veth_xdp_tx(rq, xdp, bq) < 0)) {
> trace_xdp_exception(rq->dev, xdp_prog, act);
> stats->rx_drops++;
> @@ -874,9 +886,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
> rcu_read_unlock();
> goto xdp_xmit;
> case XDP_REDIRECT:
> - veth_xdp_get(xdp);
> - consume_skb(skb);
> - xdp->rxq->mem = rq->xdp_mem;
> + if (skb != skb_orig) {
> + xdp->rxq->mem = rq->xdp_mem_pp;
> + kfree_skb_partial(skb, true);
> + } else if (!skb->pp_recycle) {
> + xdp->rxq->mem = rq->xdp_mem;
> + kfree_skb_partial(skb, true);
> + } else {
> + veth_xdp_get(xdp);
> + consume_skb(skb);
> + xdp->rxq->mem = rq->xdp_mem;
> + }
> +
The common function can be named to reflect the purpose of this
code section. According to my understanding, the code steals the
(packet) data section from the SKB and frees the SKB, and
prepares/associates the correct memory type in xdp_buff->rxq.
Function name proposals:
__skb_steal_data
__free_skb_and_steal_data
__free_skb_and_steal_data_for_xdp
__free_skb_and_xdp_steal_data
__skb2xdp_steal_data
When doing this in a function, it will also allow us to add some
comments explaining the different cases and assumptions. These
assumptions can get broken as a result of (future) changes in the
surrounding code, thus we need to explain our assumptions/intent (to
help our future selves).
For code readability, I think we should convert (skb != skb_orig) into a
boolean that says what this case captures, e.g. local_pp_alloc.
Func prototype:
__skb2xdp_steal_data(skb, xdp, rq, bool local_pp_alloc)
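For illustration, a rough sketch of such a helper, assembled from the two
duplicated branches above (the name, placement, and comments are only
suggestions based on this discussion, not a committed implementation):

```c
/* Sketch only: steal the packet data from the SKB, free the SKB, and
 * associate the matching memory model with the xdp_buff's rxq.
 * local_pp_alloc captures the (skb != skb_orig) case, i.e. the skb was
 * reallocated from the local page pool in veth_convert_skb_to_xdp_buff().
 */
static void __skb2xdp_steal_data(struct sk_buff *skb, struct xdp_buff *xdp,
				 struct veth_rq *rq, bool local_pp_alloc)
{
	if (local_pp_alloc) {
		/* Data pages come from the local page pool. */
		xdp->rxq->mem = rq->xdp_mem_pp;
		kfree_skb_partial(skb, true);
	} else if (!skb->pp_recycle) {
		/* skb passed the exclusivity checks (not shared/cloned,
		 * head not kmalloc'ed), so its head can be stolen safely.
		 */
		xdp->rxq->mem = rq->xdp_mem;
		kfree_skb_partial(skb, true);
	} else {
		/* Page-pool-recycled skb: take page references and let
		 * consume_skb() release it through the normal path.
		 */
		veth_xdp_get(xdp);
		consume_skb(skb);
		xdp->rxq->mem = rq->xdp_mem;
	}
}
```

Both the XDP_TX and XDP_REDIRECT cases could then call
__skb2xdp_steal_data(skb, xdp, rq, skb != skb_orig).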
Always feel free to challenge my view,
--Jesper
* Re: [RFC PATCH net-next v2 2/2] net: veth: Improving page pool pages recycling
2023-08-09 12:35 ` Yunsheng Lin
@ 2023-08-12 1:52 ` Liang Chen
0 siblings, 0 replies; 16+ messages in thread
From: Liang Chen @ 2023-08-12 1:52 UTC (permalink / raw)
To: Yunsheng Lin
Cc: davem, edumazet, kuba, pabeni, hawk, ilias.apalodimas, daniel,
ast, netdev
On Wed, Aug 9, 2023 at 8:35 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2023/8/9 18:01, Liang Chen wrote:
> > On Tue, Aug 8, 2023 at 7:16 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
> >>
> >> On 2023/8/7 20:20, Liang Chen wrote:
> >>> On Wed, Aug 2, 2023 at 8:32 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
> >>>>
> >>>> On 2023/8/1 14:19, Liang Chen wrote:
> >>>>
> >>>>> @@ -862,9 +865,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
> >>>>> case XDP_PASS:
> >>>>> break;
> >>>>> case XDP_TX:
> >>>>> - veth_xdp_get(xdp);
> >>>>> - consume_skb(skb);
> >>>>> - xdp->rxq->mem = rq->xdp_mem;
> >>>>> + if (skb != skb_orig) {
> >>>>> + xdp->rxq->mem = rq->xdp_mem_pp;
> >>>>> + kfree_skb_partial(skb, true);
> >>>>
> >>>> For this case, I suppose that we can safely call kfree_skb_partial()
> >>>> as we allocate the skb in the veth_convert_skb_to_xdp_buff(), but
> >>>> I am not sure about the !skb->pp_recycle case.
> >>>>
> >>>>> + } else if (!skb->pp_recycle) {
> >>>>> + xdp->rxq->mem = rq->xdp_mem;
> >>>>> + kfree_skb_partial(skb, true);
> >>>>
> >>>> For consume_skb(), there are skb_unref() checks and other checks/operations.
> >>>> Can we really assume that we can call kfree_skb_partial() with head_stolen
> >>>> being true? Is it possible that skb->users is bigger than 1? If so,
> >>>> don't we free the 'skb' back to skbuff_cache while others may still be
> >>>> using it?
> >>>>
> >>>
> >>> Thanks for raising the concern. If there are multiple references to
> >>> the skb (skb->users is greater than 1), the skb will be reallocated in
> >>> veth_convert_skb_to_xdp_buff(). So it should enter the skb != skb_orig
> >>> case.
> >>>
> >>> In fact, entering the !skb->pp_recycle case implies that the skb meets
> >>> the following conditions:
> >>> 1. It is neither shared nor cloned.
> >>> 2. It is not allocated using kmalloc.
> >>> 3. It does not have fragment data.
> >>> 4. The headroom of the skb is greater than XDP_PACKET_HEADROOM.
> >>>
> >>
> >> You are right, I missed the checking in veth_convert_skb_to_xdp_buff();
> >> it seems xdp is pretty strict about the buffer owner: it needs to
> >> have exclusive access to the whole buffer.
> >>
> >> And it seems there is only one difference left then, with
> >> kfree_skb_partial() calling 'kmem_cache_free(skbuff_cache, skb)' and
> >> consume_skb() calling 'kfree_skbmem(skb)'. If we are sure that
> >> 'skb' is only allocated from 'skbuff_cache', this patch looks good to me
> >> then.
> >>
> >
> > The difference between kmem_cache_free and kfree_skbmem lies in the
> > fact that kfree_skbmem checks whether the skb is an fclone (fast
> > clone) skb. If it is, it should be returned to the
> > skbuff_fclone_cache. Currently, fclone skbs can only be allocated
> > through __alloc_skb, and their head buffer is allocated by
> > kmalloc_reserve, which does not meet the condition mentioned above -
> > "2. It is not allocated using kmalloc.". Therefore, the fclone skb
> > will still be reallocated by veth_convert_skb_to_xdp_buff, leading to
> > the skb != skb_orig case. In other words, entering the
> > !skb->pp_recycle case indicates that the skb was allocated from
> > skbuff_cache.
>
> It might need some comment to make it clear, or some compile-time testing
> such as BUILD_BUG_ON() to ensure that, as it is not so obvious if
> someone changes it to allocate an fclone skb with frag head data in
> the future.
>
Sure. We will add a comment to explain that, like the one below:
/* We can safely use kfree_skb_partial here because this cannot be an
 * fclone skb. Fclone skbs are exclusively allocated via __alloc_skb, with
 * their head buffer allocated by kmalloc_reserve (so, skb->head_frag = 0),
 * satisfying the skb_head_is_locked condition in
 * veth_convert_skb_to_xdp_buff and leading to the skb being reallocated.
 */
> Also I suppose the veth_xdp_rcv_skb() is called in NAPI context, and
> we might be able to reuse the 'skb' if we can use something like
> napi_skb_free_stolen_head().
Sure. Using napi_skb_free_stolen_head seems to be a good idea; it
further accelerates the skb != skb_orig case in our preliminary test.
However, it's not suitable for the !skb->pp_recycle case, as the skb
isn't allocated in the current NAPI context.
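To make that concrete, the freeing logic might then look roughly like this
(a sketch only, under the assumptions discussed above; untested, and the
local_pp_alloc flag stands for the skb != skb_orig case):

```c
/* Sketch: only the locally reallocated skb can use
 * napi_skb_free_stolen_head(), since it was allocated in the current
 * NAPI context by veth_convert_skb_to_xdp_buff().
 */
if (local_pp_alloc) {
	xdp->rxq->mem = rq->xdp_mem_pp;
	napi_skb_free_stolen_head(skb);
} else if (!skb->pp_recycle) {
	/* Head can be stolen, but the skb did not come from this NAPI
	 * context, so fall back to kfree_skb_partial().
	 */
	xdp->rxq->mem = rq->xdp_mem;
	kfree_skb_partial(skb, true);
} else {
	veth_xdp_get(xdp);
	consume_skb(skb);
	xdp->rxq->mem = rq->xdp_mem;
}
```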
Thanks,
Liang
* Re: [RFC PATCH net-next v2 2/2] net: veth: Improving page pool pages recycling
2023-08-11 18:16 ` Jesper Dangaard Brouer
@ 2023-08-12 2:02 ` Liang Chen
0 siblings, 0 replies; 16+ messages in thread
From: Liang Chen @ 2023-08-12 2:02 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: davem, edumazet, kuba, pabeni, linyunsheng, brouer, hawk,
ilias.apalodimas, daniel, ast, netdev
On Sat, Aug 12, 2023 at 2:16 AM Jesper Dangaard Brouer
<jbrouer@redhat.com> wrote:
>
>
> On 01/08/2023 08.19, Liang Chen wrote:
> [...]
> > diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> > index 509e901da41d..ea1b344e5db4 100644
> > --- a/drivers/net/veth.c
> > +++ b/drivers/net/veth.c
> [...]
> > @@ -848,6 +850,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
> > goto out;
> > }
> >
> > + skb_orig = skb;
> > __skb_push(skb, skb->data - skb_mac_header(skb));
> > if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb))
> > goto drop;
> > @@ -862,9 +865,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
> > case XDP_PASS:
> > break;
> > case XDP_TX:
> > - veth_xdp_get(xdp);
> > - consume_skb(skb);
> > - xdp->rxq->mem = rq->xdp_mem;
> > + if (skb != skb_orig) {
> > + xdp->rxq->mem = rq->xdp_mem_pp;
> > + kfree_skb_partial(skb, true);
> > + } else if (!skb->pp_recycle) {
> > + xdp->rxq->mem = rq->xdp_mem;
> > + kfree_skb_partial(skb, true);
> > + } else {
> > + veth_xdp_get(xdp);
> > + consume_skb(skb);
> > + xdp->rxq->mem = rq->xdp_mem;
> > + }
> > +
>
> The code section above and the section below look the same.
> They beg for a common function.
>
> > if (unlikely(veth_xdp_tx(rq, xdp, bq) < 0)) {
> > trace_xdp_exception(rq->dev, xdp_prog, act);
> > stats->rx_drops++;
> > @@ -874,9 +886,18 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
> > rcu_read_unlock();
> > goto xdp_xmit;
> > case XDP_REDIRECT:
> > - veth_xdp_get(xdp);
> > - consume_skb(skb);
> > - xdp->rxq->mem = rq->xdp_mem;
> > + if (skb != skb_orig) {
> > + xdp->rxq->mem = rq->xdp_mem_pp;
> > + kfree_skb_partial(skb, true);
> > + } else if (!skb->pp_recycle) {
> > + xdp->rxq->mem = rq->xdp_mem;
> > + kfree_skb_partial(skb, true);
> > + } else {
> > + veth_xdp_get(xdp);
> > + consume_skb(skb);
> > + xdp->rxq->mem = rq->xdp_mem;
> > + }
> > +
>
> The common function can be named to reflect the purpose of this
> code section. According to my understanding, the code steals the
> (packet) data section from the SKB and frees the SKB, and
> prepares/associates the correct memory type in xdp_buff->rxq.
>
> Function name proposals:
> __skb_steal_data
> __free_skb_and_steal_data
> __free_skb_and_steal_data_for_xdp
> __free_skb_and_xdp_steal_data
> __skb2xdp_steal_data
>
> When doing this in a function, it will also allow us to add some
> comments explaining the different cases and assumptions. These
> assumptions can get broken as a result of (future) changes in the
> surrounding code, thus we need to explain our assumptions/intent (to
> help our future selves).
>
> For code readability, I think we should convert (skb != skb_orig) into a
> boolean that says what this case captures, e.g. local_pp_alloc.
>
> Func prototype:
> __skb2xdp_steal_data(skb, xdp, rq, bool local_pp_alloc)
>
>
> Always feel free to challenge my view,
> --Jesper
>
Thank you for the detailed suggestion! We will move the code into a
function as you pointed out and comment on each of the different cases
to explain our assumption and intent.
Thanks,
Liang
* Re: [RFC PATCH net-next v2 1/2] net: veth: Page pool creation error handling for existing pools only
2023-08-11 16:35 ` Jesper Dangaard Brouer
@ 2023-08-12 2:16 ` Liang Chen
0 siblings, 0 replies; 16+ messages in thread
From: Liang Chen @ 2023-08-12 2:16 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Jesper Dangaard Brouer, brouer, davem, edumazet, kuba, pabeni,
linyunsheng, ilias.apalodimas, daniel, ast, netdev,
Lorenzo Bianconi
On Sat, Aug 12, 2023 at 12:35 AM Jesper Dangaard Brouer
<jbrouer@redhat.com> wrote:
>
>
>
> On 11/08/2023 14.02, Liang Chen wrote:
> > On Wed, Aug 2, 2023 at 4:56 PM Jesper Dangaard Brouer <hawk@kernel.org> wrote:
> >>
> [...]
> >>> page_pool_destroy(priv->rq[i].page_pool);
> >>> priv->rq[i].page_pool = NULL;
> >>> }
> >>
> >> The page_pool_destroy() call handles (exits early) being called with NULL.
> >> So, I don't think this incorrect walk over all entries (start to end) can
> >> trigger an actual bug.
> >>
> >> Anyhow, I do think this is more correct, so you can append my ACK for
> >> the real submission.
> >>
> >> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
> >>
> >
> > Thanks! I will separate this patch out and make a real submission,
> > since it's a small fix and not really coupled with the optimization
> > patch which still needs some further work after receiving feedback
> > from Yunsheng.
>
> Sure, send it as a fix commit. Given it is not super critical, I think
> it is okay to send it for net-next, to avoid merge issues/conflicts with
> your 2/2 optimization patch. And for good order we should add a Fixes
> tag, but IMHO net-next is still okay, given I don't think this can
> trigger a bug.
>
> That said, I do want to encourage you to work on the 2/2 optimization patch.
> I think this is a very important optimization and actually a fix for
> when we introduced page_pool to veth. Well spotted! :-)
>
> In 2/2 I keep getting confused by the use of kfree_skb_partial() and whether
> the right conditions are met.
>
> --Jesper
>
Thank you for the encouragement! The work for the optimization patch
is going on, and we will submit another version which hopefully will
give a further performance improvement.
end of thread, other threads:[~2023-08-12 2:16 UTC | newest]
Thread overview: 16+ messages
2023-08-01 6:19 [RFC PATCH net-next v2 1/2] net: veth: Page pool creation error handling for existing pools only Liang Chen
2023-08-01 6:19 ` [RFC PATCH net-next v2 2/2] net: veth: Improving page pool pages recycling Liang Chen
2023-08-02 12:32 ` Yunsheng Lin
2023-08-07 12:20 ` Liang Chen
2023-08-08 11:16 ` Yunsheng Lin
2023-08-09 10:01 ` Liang Chen
2023-08-09 12:35 ` Yunsheng Lin
2023-08-12 1:52 ` Liang Chen
2023-08-11 18:16 ` Jesper Dangaard Brouer
2023-08-12 2:02 ` Liang Chen
2023-08-01 12:18 ` [RFC PATCH net-next v2 1/2] net: veth: Page pool creation error handling for existing pools only Simon Horman
2023-08-11 11:56 ` Liang Chen
2023-08-02 8:56 ` Jesper Dangaard Brouer
2023-08-11 12:02 ` Liang Chen
2023-08-11 16:35 ` Jesper Dangaard Brouer
2023-08-12 2:16 ` Liang Chen