* [RFC PATCH net-next 1/2] net: veth: Page pool creation error handling for existing pools only @ 2023-07-19 7:29 Liang Chen 2023-07-19 7:29 ` [RFC PATCH net-next 2/2] net: veth: Improving page pool recycling Liang Chen 2023-07-19 12:43 ` [RFC PATCH net-next 1/2] net: veth: Page pool creation error handling for existing pools only Yunsheng Lin 0 siblings, 2 replies; 7+ messages in thread From: Liang Chen @ 2023-07-19 7:29 UTC (permalink / raw) To: davem, edumazet, kuba, pabeni Cc: hawk, ilias.apalodimas, daniel, ast, linyunsheng, netdev, liangchen.linux The failure handling procedure destroys page pools for all queues, including those that haven't had their page pool created yet. this patch introduces necessary adjustments to prevent potential risks and inconsistency with the error handling behavior. Signed-off-by: Liang Chen <liangchen.linux@gmail.com> --- drivers/net/veth.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 614f3e3efab0..509e901da41d 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -1081,8 +1081,9 @@ static int __veth_napi_enable_range(struct net_device *dev, int start, int end) err_xdp_ring: for (i--; i >= start; i--) ptr_ring_cleanup(&priv->rq[i].xdp_ring, veth_ptr_free); + i = end; err_page_pool: - for (i = start; i < end; i++) { + for (i--; i >= start; i--) { page_pool_destroy(priv->rq[i].page_pool); priv->rq[i].page_pool = NULL; } -- 2.40.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [RFC PATCH net-next 2/2] net: veth: Improving page pool recycling 2023-07-19 7:29 [RFC PATCH net-next 1/2] net: veth: Page pool creation error handling for existing pools only Liang Chen @ 2023-07-19 7:29 ` Liang Chen 2023-07-21 12:18 ` Yunsheng Lin 2023-07-19 12:43 ` [RFC PATCH net-next 1/2] net: veth: Page pool creation error handling for existing pools only Yunsheng Lin 1 sibling, 1 reply; 7+ messages in thread From: Liang Chen @ 2023-07-19 7:29 UTC (permalink / raw) To: davem, edumazet, kuba, pabeni Cc: hawk, ilias.apalodimas, daniel, ast, linyunsheng, netdev, liangchen.linux Page pool is supported for veth. But for XDP_TX and XDP_REDIRECT cases, the pages are not effectively recycled. "ethtool -S" statistics for the page pool are as follows: NIC statistics: rx_pp_alloc_fast: 18041186 rx_pp_alloc_slow: 286369 rx_pp_recycle_ring: 0 rx_pp_recycle_released_ref: 18327555 This failure to recycle page pool pages is a result of the code snippet below, which converts page pool pages into regular pages and releases the skb data structure: veth_xdp_get(xdp); consume_skb(skb); The reason behind is some skbs received from the veth peer are not page pool pages, and remain so after conversion to xdp frame. In order to not confusing __xdp_return with mixed regular pages and page pool pages, they are all converted to regular pages. So registering xdp memory model as MEM_TYPE_PAGE_SHARED is sufficient. If we replace the above code with kfree_skb_partial, directly releasing the skb data structure, we can retain the original page pool page behavior. However, directly changing the xdp memory model to MEM_TYPE_PAGE_POOL is not a solution as explained above. Therefore, we introduced an additionally MEM_TYPE_PAGE_POOL model for each rq. The following tests were conducted using pktgen to generate traffic and evaluate the performance improvement after page pool pages get successfully recycled in scenarios involving XDP_TX, XDP_REDIRECT, and AF_XDP. Test environment setup: ns1 ns2 veth0 <-peer-> veth1 veth2 <-peer-> veth3 Test Results: pktgen -> veth1 -> veth0(XDP_TX) -> veth1(XDP_DROP) without PP recycle: 1,780,392 with PP recycle: 1,984,680 improvement: ~10% pktgen -> veth1 -> veth0(XDP_TX) -> veth1(XDP_PASS) without PP recycle: 1,433,491 with PP recycle: 1,511,680 improvement: 5~6% pktgen -> veth1 -> veth0(XDP_REDIRECT) -> veth2 -> veth3(XDP_DROP) without PP recycle: 1,527,708 with PP recycle: 1,672,101 improvement: ~10% pktgen -> veth1 -> veth0(XDP_REDIRECT) -> veth2 -> veth3(XDP_PASS) without PP recycle: 1,325,804 with PP recycle: 1,392,704 improvement: ~5.5% pktgen -> veth1 -> veth0(AF_XDP) -> user space(DROP) without PP recycle: 1,607,609 with PP recycle: 1,736,957 improvement: ~8% Additionally, the performance improvement were measured when converting to xdp_buff doesn't require buffer copy and original skb uses regular pages, i.e. page pool recycle not involved. This still gives around 2% improvement attributed to the changes from consume_skb to kfree_skb_partial. Signed-off-by: Liang Chen <liangchen.linux@gmail.com> --- drivers/net/veth.c | 41 ++++++++++++++++++++++------------------- 1 file changed, 22 insertions(+), 19 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 509e901da41d..a825b086f744 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -62,6 +62,7 @@ struct veth_rq { struct net_device *dev; struct bpf_prog __rcu *xdp_prog; struct xdp_mem_info xdp_mem; + struct xdp_mem_info xdp_mem_pp; struct veth_rq_stats stats; bool rx_notify_masked; struct ptr_ring xdp_ring; @@ -713,19 +714,6 @@ static void veth_xdp_rcv_bulk_skb(struct veth_rq *rq, void **frames, } } -static void veth_xdp_get(struct xdp_buff *xdp) -{ - struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp); - int i; - - get_page(virt_to_page(xdp->data)); - if (likely(!xdp_buff_has_frags(xdp))) - return; - - for (i = 0; i < sinfo->nr_frags; i++) - __skb_frag_ref(&sinfo->frags[i]); -} - static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq, struct xdp_buff *xdp, struct sk_buff **pskb) @@ -862,9 +850,9 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, case XDP_PASS: break; case XDP_TX: - veth_xdp_get(xdp); - consume_skb(skb); - xdp->rxq->mem = rq->xdp_mem; + xdp->rxq->mem = skb->pp_recycle ? rq->xdp_mem_pp : rq->xdp_mem; + kfree_skb_partial(skb, true); + if (unlikely(veth_xdp_tx(rq, xdp, bq) < 0)) { trace_xdp_exception(rq->dev, xdp_prog, act); stats->rx_drops++; @@ -874,9 +862,9 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, rcu_read_unlock(); goto xdp_xmit; case XDP_REDIRECT: - veth_xdp_get(xdp); - consume_skb(skb); - xdp->rxq->mem = rq->xdp_mem; + xdp->rxq->mem = skb->pp_recycle ? rq->xdp_mem_pp : rq->xdp_mem; + kfree_skb_partial(skb, true); + if (xdp_do_redirect(rq->dev, xdp, xdp_prog)) { stats->rx_drops++; goto err_xdp; @@ -1061,6 +1049,14 @@ static int __veth_napi_enable_range(struct net_device *dev, int start, int end) goto err_page_pool; } + for (i = start; i < end; i++) { + err = xdp_reg_mem_model(&priv->rq[i].xdp_mem_pp, + MEM_TYPE_PAGE_POOL, + priv->rq[i].page_pool); + if (err) + goto err_reg_mem; + } + for (i = start; i < end; i++) { struct veth_rq *rq = &priv->rq[i]; @@ -1082,6 +1078,10 @@ static int __veth_napi_enable_range(struct net_device *dev, int start, int end) for (i--; i >= start; i--) ptr_ring_cleanup(&priv->rq[i].xdp_ring, veth_ptr_free); i = end; +err_reg_mem: + for (i--; i >= start; i--) + xdp_unreg_mem_model(&priv->rq[i].xdp_mem_pp); + i = end; err_page_pool: for (i--; i >= start; i--) { page_pool_destroy(priv->rq[i].page_pool); @@ -1117,6 +1117,9 @@ static void veth_napi_del_range(struct net_device *dev, int start, int end) ptr_ring_cleanup(&rq->xdp_ring, veth_ptr_free); } + for (i = start; i < end; i++) + xdp_unreg_mem_model(&priv->rq[i].xdp_mem_pp); + for (i = start; i < end; i++) { page_pool_destroy(priv->rq[i].page_pool); priv->rq[i].page_pool = NULL; -- 2.40.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [RFC PATCH net-next 2/2] net: veth: Improving page pool recycling 2023-07-19 7:29 ` [RFC PATCH net-next 2/2] net: veth: Improving page pool recycling Liang Chen @ 2023-07-21 12:18 ` Yunsheng Lin 2023-07-24 9:44 ` Liang Chen 0 siblings, 1 reply; 7+ messages in thread From: Yunsheng Lin @ 2023-07-21 12:18 UTC (permalink / raw) To: Liang Chen, davem, edumazet, kuba, pabeni Cc: hawk, ilias.apalodimas, daniel, ast, netdev On 2023/7/19 15:29, Liang Chen wrote: ... > > The reason behind is some skbs received from the veth peer are not page > pool pages, and remain so after conversion to xdp frame. In order to not > confusing __xdp_return with mixed regular pages and page pool pages, they > are all converted to regular pages. So registering xdp memory model as > MEM_TYPE_PAGE_SHARED is sufficient. > > If we replace the above code with kfree_skb_partial, directly releasing > the skb data structure, we can retain the original page pool page behavior. > However, directly changing the xdp memory model to MEM_TYPE_PAGE_POOL is > not a solution as explained above. Therefore, we introduced an additionally > MEM_TYPE_PAGE_POOL model for each rq. > ... > @@ -874,9 +862,9 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, > rcu_read_unlock(); > goto xdp_xmit; > case XDP_REDIRECT: > - veth_xdp_get(xdp); > - consume_skb(skb); > - xdp->rxq->mem = rq->xdp_mem; > + xdp->rxq->mem = skb->pp_recycle ? rq->xdp_mem_pp : rq->xdp_mem; I am not really familiar with the veth here, so some question here: Is it possible that skbs received from the veth peer are also page pool pages? Does using the local rq->xdp_mem_pp for page allocated from veth peer cause some problem here? As there is type and id for a specific page_pool instance, type may be the same, but I suppose id is not the same for veth and it's veth peer. > + kfree_skb_partial(skb, true); > + ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH net-next 2/2] net: veth: Improving page pool recycling 2023-07-21 12:18 ` Yunsheng Lin @ 2023-07-24 9:44 ` Liang Chen 0 siblings, 0 replies; 7+ messages in thread From: Liang Chen @ 2023-07-24 9:44 UTC (permalink / raw) To: Yunsheng Lin Cc: davem, edumazet, kuba, pabeni, hawk, ilias.apalodimas, daniel, ast, netdev On Fri, Jul 21, 2023 at 8:18 PM Yunsheng Lin <linyunsheng@huawei.com> wrote: > > On 2023/7/19 15:29, Liang Chen wrote: > > ... > > > > > The reason behind is some skbs received from the veth peer are not page > > pool pages, and remain so after conversion to xdp frame. In order to not > > confusing __xdp_return with mixed regular pages and page pool pages, they > > are all converted to regular pages. So registering xdp memory model as > > MEM_TYPE_PAGE_SHARED is sufficient. > > > > If we replace the above code with kfree_skb_partial, directly releasing > > the skb data structure, we can retain the original page pool page behavior. > > However, directly changing the xdp memory model to MEM_TYPE_PAGE_POOL is > > not a solution as explained above. Therefore, we introduced an additionally > > MEM_TYPE_PAGE_POOL model for each rq. > > > > ... > > > @@ -874,9 +862,9 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, > > rcu_read_unlock(); > > goto xdp_xmit; > > case XDP_REDIRECT: > > - veth_xdp_get(xdp); > > - consume_skb(skb); > > - xdp->rxq->mem = rq->xdp_mem; > > + xdp->rxq->mem = skb->pp_recycle ? rq->xdp_mem_pp : rq->xdp_mem; > > I am not really familiar with the veth here, so some question here: > Is it possible that skbs received from the veth peer are also page pool pages? > Does using the local rq->xdp_mem_pp for page allocated from veth peer cause > some problem here? As there is type and id for a specific page_pool instance, > type may be the same, but I suppose id is not the same for veth and it's veth > peer. > Yeah, I understand your concern. If a skb uses a page pool page whose pool has previously been registered with a xdp memory model, this will lead to a situation where veth compose a xdp frame indicating its buffer coming from the local xdp_mem_pp pool from its xdp_mem_info.id field, and the page structure itself refers to another pool from where it was originally allocated. This may cause problems for things like xdp_return_frame_bulk. We will address it in V2. Thank you for bringing up this issue. Thanks, Liang > > + kfree_skb_partial(skb, true); > > + ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH net-next 1/2] net: veth: Page pool creation error handling for existing pools only 2023-07-19 7:29 [RFC PATCH net-next 1/2] net: veth: Page pool creation error handling for existing pools only Liang Chen 2023-07-19 7:29 ` [RFC PATCH net-next 2/2] net: veth: Improving page pool recycling Liang Chen @ 2023-07-19 12:43 ` Yunsheng Lin 2023-07-21 11:17 ` Liang Chen 1 sibling, 1 reply; 7+ messages in thread From: Yunsheng Lin @ 2023-07-19 12:43 UTC (permalink / raw) To: Liang Chen, davem, edumazet, kuba, pabeni Cc: hawk, ilias.apalodimas, daniel, ast, netdev On 2023/7/19 15:29, Liang Chen wrote: > The failure handling procedure destroys page pools for all queues, > including those that haven't had their page pool created yet. this patch > introduces necessary adjustments to prevent potential risks and > inconsistency with the error handling behavior. > > Signed-off-by: Liang Chen <liangchen.linux@gmail.com> > --- > drivers/net/veth.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/veth.c b/drivers/net/veth.c > index 614f3e3efab0..509e901da41d 100644 > --- a/drivers/net/veth.c > +++ b/drivers/net/veth.c > @@ -1081,8 +1081,9 @@ static int __veth_napi_enable_range(struct net_device *dev, int start, int end) > err_xdp_ring: > for (i--; i >= start; i--) > ptr_ring_cleanup(&priv->rq[i].xdp_ring, veth_ptr_free); > + i = end; > err_page_pool: > - for (i = start; i < end; i++) { > + for (i--; i >= start; i--) { > page_pool_destroy(priv->rq[i].page_pool); > priv->rq[i].page_pool = NULL; There is NULL checking in page_pool_destroy(), priv->rq[i].page_pool is set to NULL here, and the kcalloc() in veth_alloc_queues() ensure it is NULL initially, maybe it is fine as it is? > } > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH net-next 1/2] net: veth: Page pool creation error handling for existing pools only 2023-07-19 12:43 ` [RFC PATCH net-next 1/2] net: veth: Page pool creation error handling for existing pools only Yunsheng Lin @ 2023-07-21 11:17 ` Liang Chen 2023-07-21 12:35 ` Yunsheng Lin 0 siblings, 1 reply; 7+ messages in thread From: Liang Chen @ 2023-07-21 11:17 UTC (permalink / raw) To: Yunsheng Lin Cc: davem, edumazet, kuba, pabeni, hawk, ilias.apalodimas, daniel, ast, netdev On Wed, Jul 19, 2023 at 8:44 PM Yunsheng Lin <linyunsheng@huawei.com> wrote: > > On 2023/7/19 15:29, Liang Chen wrote: > > The failure handling procedure destroys page pools for all queues, > > including those that haven't had their page pool created yet. this patch > > introduces necessary adjustments to prevent potential risks and > > inconsistency with the error handling behavior. > > > > Signed-off-by: Liang Chen <liangchen.linux@gmail.com> > > --- > > drivers/net/veth.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/net/veth.c b/drivers/net/veth.c > > index 614f3e3efab0..509e901da41d 100644 > > --- a/drivers/net/veth.c > > +++ b/drivers/net/veth.c > > @@ -1081,8 +1081,9 @@ static int __veth_napi_enable_range(struct net_device *dev, int start, int end) > > err_xdp_ring: > > for (i--; i >= start; i--) > > ptr_ring_cleanup(&priv->rq[i].xdp_ring, veth_ptr_free); > > + i = end; > > err_page_pool: > > - for (i = start; i < end; i++) { > > + for (i--; i >= start; i--) { > > page_pool_destroy(priv->rq[i].page_pool); > > priv->rq[i].page_pool = NULL; > > There is NULL checking in page_pool_destroy(), > priv->rq[i].page_pool is set to NULL here, and the kcalloc() > in veth_alloc_queues() ensure it is NULL initially, maybe it > is fine as it is? > Sure, it doesn't cause any real problem. This was meant to align err_page_pool handling with the case above (though ptr_ring_cleanup cannot take an uninitialized ring), and it doesn't always need to loop from start to end. Thanks, Liang > > } > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH net-next 1/2] net: veth: Page pool creation error handling for existing pools only 2023-07-21 11:17 ` Liang Chen @ 2023-07-21 12:35 ` Yunsheng Lin 0 siblings, 0 replies; 7+ messages in thread From: Yunsheng Lin @ 2023-07-21 12:35 UTC (permalink / raw) To: Liang Chen Cc: davem, edumazet, kuba, pabeni, hawk, ilias.apalodimas, daniel, ast, netdev On 2023/7/21 19:17, Liang Chen wrote: >> There is NULL checking in page_pool_destroy(), >> priv->rq[i].page_pool is set to NULL here, and the kcalloc() >> in veth_alloc_queues() ensure it is NULL initially, maybe it >> is fine as it is? >> > > Sure, it doesn't cause any real problem. > > This was meant to align err_page_pool handling with the case above > (though ptr_ring_cleanup cannot take an uninitialized ring), and it > doesn't always need to loop from start to end. > I suppose it is for the preparation of the next patch, right? In that case, maybe make it clear in the commit log. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2023-07-24 9:45 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-07-19 7:29 [RFC PATCH net-next 1/2] net: veth: Page pool creation error handling for existing pools only Liang Chen 2023-07-19 7:29 ` [RFC PATCH net-next 2/2] net: veth: Improving page pool recycling Liang Chen 2023-07-21 12:18 ` Yunsheng Lin 2023-07-24 9:44 ` Liang Chen 2023-07-19 12:43 ` [RFC PATCH net-next 1/2] net: veth: Page pool creation error handling for existing pools only Yunsheng Lin 2023-07-21 11:17 ` Liang Chen 2023-07-21 12:35 ` Yunsheng Lin
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.