* [RFC PATCH net-next v3 0/2] net: veth: Optimizing page pool usage
@ 2023-08-16 12:30 Liang Chen
  2023-08-16 12:30 ` [RFC PATCH net-next v3 1/2] net: veth: Improving page pool recycling Liang Chen
  ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread

From: Liang Chen @ 2023-08-16 12:30 UTC (permalink / raw)
  To: hawk, horms, davem, edumazet, kuba, pabeni, linyunsheng
  Cc: ilias.apalodimas, daniel, ast, netdev, liangchen.linux

Page pool is supported for veth, but at the moment pages are not properly
recycled for XDP_TX and XDP_REDIRECT. That prevents veth xdp from fully
leveraging the advantages of the page pool. So this RFC patchset is mainly
to make recycling work for those cases. With that in place, it can be
further optimized by utilizing the napi skb cache. Detailed figures are
presented in each commit message, and together they demonstrate a quite
noticeable improvement.

Changes from v2:
- refactor the code to make it more readable
- make use of the napi skb cache for further optimization
- take the page pool creation error handling patch out for separate
  submission

Liang Chen (2):
  net: veth: Improving page pool recycling
  net: veth: Optimizing skb reuse in NAPI Context

 drivers/net/veth.c | 66 ++++++++++++++++++++++++++++++++++++++++------
 net/core/skbuff.c  |  1 +
 2 files changed, 59 insertions(+), 8 deletions(-)

--
2.40.1

^ permalink raw reply	[flat|nested] 8+ messages in thread
* [RFC PATCH net-next v3 1/2] net: veth: Improving page pool recycling
  2023-08-16 12:30 [RFC PATCH net-next v3 0/2] net: veth: Optimizing page pool usage Liang Chen
@ 2023-08-16 12:30 ` Liang Chen
  2023-08-16 12:30 ` [RFC PATCH net-next v3 2/2] net: veth: Optimizing skb reuse in NAPI Context Liang Chen
  2023-08-21 14:21 ` [RFC PATCH net-next v3 0/2] net: veth: Optimizing page pool usage Jesper Dangaard Brouer
  2 siblings, 0 replies; 8+ messages in thread

From: Liang Chen @ 2023-08-16 12:30 UTC (permalink / raw)
  To: hawk, horms, davem, edumazet, kuba, pabeni, linyunsheng
  Cc: ilias.apalodimas, daniel, ast, netdev, liangchen.linux

Page pool is supported for veth. But for the XDP_TX and XDP_REDIRECT cases,
the pages are not effectively recycled. "ethtool -S" statistics for the
page pool are as follows:

 NIC statistics:
  rx_pp_alloc_fast: 18041186
  rx_pp_alloc_slow: 286369
  rx_pp_recycle_ring: 0
  rx_pp_recycle_released_ref: 18327555

This failure to recycle page pool pages is a result of the code snippet
below, which converts page pool pages into regular pages and releases the
skb data structure:

 veth_xdp_get(xdp);
 consume_skb(skb);

The reason is that some skbs received from the veth peer are not page pool
pages, and remain so after conversion to an xdp frame. To avoid confusing
__xdp_return with a mix of regular pages and page pool pages, they are all
converted to regular pages, so registering the xdp memory model as
MEM_TYPE_PAGE_SHARED is sufficient.

If we replace the above code with kfree_skb_partial, directly releasing the
skb data structure, we can retain the original page pool page behavior.
However, directly changing the xdp memory model to MEM_TYPE_PAGE_POOL is
not a solution, as explained above. Therefore, we introduce an additional
MEM_TYPE_PAGE_POOL model for each rq.
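The resulting three-way release decision can be sketched as a small
userspace simulation (illustrative only, not kernel code; the enums and
the struct are hypothetical stand-ins, though the branch order mirrors the
patch's __skb2xdp_steal_data()):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the two xdp_mem_info registrations held per rq. */
enum mem_model { MEM_PP_LOCAL, MEM_SHARED };
/* How the skb shell is released once its data is stolen for the xdp_buff. */
enum free_path { FREE_PARTIAL, FREE_CONSUME };

struct steal_result {
	enum mem_model mem;
	enum free_path path;
};

/* A locally reallocated page pool skb takes the per-rq
 * MEM_TYPE_PAGE_POOL model; a non-pp_recycle skb keeps the shared model
 * but can still be freed cheaply with kfree_skb_partial(); a pp_recycle
 * skb from the peer falls back to veth_xdp_get()/consume_skb(), since
 * its pages may belong to a different xdp_mem_info. */
static struct steal_result skb2xdp_steal(bool local_pp_alloc, bool pp_recycle)
{
	if (local_pp_alloc)
		return (struct steal_result){ MEM_PP_LOCAL, FREE_PARTIAL };
	if (!pp_recycle)
		return (struct steal_result){ MEM_SHARED, FREE_PARTIAL };
	return (struct steal_result){ MEM_SHARED, FREE_CONSUME };
}
```

Only the last branch pays the full page-conversion cost, which is why the
common (locally reallocated) case now recycles through the page pool.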
The following tests were conducted using pktgen to generate traffic and
evaluate the performance improvement after page pool pages get successfully
recycled in scenarios involving XDP_TX, XDP_REDIRECT, and AF_XDP.

Test environment setup:

        ns1                 ns2
  veth0 <-peer-> veth1  veth2 <-peer-> veth3

Test Results:
 pktgen -> veth1 -> veth0(XDP_TX) -> veth1(XDP_DROP)
    without PP recycle: 1,855,069
    with PP recycle: 2,033,439
    improvement: ~10%

 pktgen -> veth1 -> veth0(XDP_TX) -> veth1(XDP_PASS)
    without PP recycle: 1,494,890
    with PP recycle: 1,585,462
    improvement: ~6%

 pktgen -> veth1 -> veth0(XDP_REDIRECT) -> veth2 -> veth3(XDP_DROP)
    without PP recycle: 1,631,582
    with PP recycle: 1,787,342
    improvement: ~10%

 pktgen -> veth1 -> veth0(XDP_REDIRECT) -> veth2 -> veth3(XDP_PASS)
    without PP recycle: 1,309,265
    with PP recycle: 1,391,587
    improvement: ~6%

 pktgen -> veth1 -> veth0(AF_XDP) -> user space(DROP)
    without PP recycle: 1,674,021
    with PP recycle: 1,811,844
    improvement: ~8%

Additionally, the performance improvement was measured for the case where
converting to an xdp_buff doesn't require a buffer copy and the original
skb uses regular pages, i.e. page pool recycling is not involved. This
still gives around a 2% improvement, attributed to the change from
consume_skb to kfree_skb_partial.
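As a sanity check, the quoted percentages follow directly from the
packets-per-second counters above (a throwaway calculation, not part of the
patch):

```c
#include <assert.h>

/* Percentage gain between a before/after pktgen pps figure. */
static double gain_pct(double before, double after)
{
	return (after - before) * 100.0 / before;
}
```

For example, gain_pct(1855069, 2033439) is about 9.6, which rounds to the
"~10%" quoted for the XDP_TX/XDP_DROP case.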
Signed-off-by: Liang Chen <liangchen.linux@gmail.com> --- drivers/net/veth.c | 64 ++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 56 insertions(+), 8 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 509e901da41d..7234eb0297dd 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -62,6 +62,7 @@ struct veth_rq { struct net_device *dev; struct bpf_prog __rcu *xdp_prog; struct xdp_mem_info xdp_mem; + struct xdp_mem_info xdp_mem_pp; struct veth_rq_stats stats; bool rx_notify_masked; struct ptr_ring xdp_ring; @@ -728,7 +729,8 @@ static void veth_xdp_get(struct xdp_buff *xdp) static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq, struct xdp_buff *xdp, - struct sk_buff **pskb) + struct sk_buff **pskb, + bool *local_pp_alloc) { struct sk_buff *skb = *pskb; u32 frame_sz; @@ -802,6 +804,7 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq, consume_skb(skb); skb = nskb; + *local_pp_alloc = true; } /* SKB "head" area always have tailroom for skb_shared_info */ @@ -827,6 +830,37 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq, return -ENOMEM; } +static void __skb2xdp_steal_data(struct sk_buff *skb, + struct xdp_buff *xdp, + struct veth_rq *rq, + bool local_pp_alloc) +{ + if (local_pp_alloc) { + /* This is the most common case where the skb was reallocated locally in + * veth_convert_skb_to_xdp_buff, and it's safe to use the xdp_mem_pp model. + */ + xdp->rxq->mem = rq->xdp_mem_pp; + kfree_skb_partial(skb, true); + } else if (!skb->pp_recycle) { + /* We can safely use kfree_skb_partial here because this cannot be an fclone + * skb. Fclone skbs are allocated via __alloc_skb, with their head buffer + * allocated by kmalloc_reserve (i.e. skb->head_frag = 0), satisfying the + * skb_head_is_locked condition in veth_convert_skb_to_xdp_buff, and are + * thus reallocated. + */ + xdp->rxq->mem = rq->xdp_mem; + kfree_skb_partial(skb, true); + } else { + /* skbs in this case may include page_pool pages from peer. 
We cannot use + * rq->xdp_mem_pp as for the local_pp_alloc case, because they might already + * be associated with different xdp_mem_info. + */ + veth_xdp_get(xdp); + consume_skb(skb); + xdp->rxq->mem = rq->xdp_mem; + } +} + static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, struct sk_buff *skb, struct veth_xdp_tx_bq *bq, @@ -836,6 +870,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, struct bpf_prog *xdp_prog; struct veth_xdp_buff vxbuf; struct xdp_buff *xdp = &vxbuf.xdp; + bool local_pp_alloc = false; u32 act, metalen; int off; @@ -849,7 +884,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, } __skb_push(skb, skb->data - skb_mac_header(skb)); - if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb)) + if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb, &local_pp_alloc)) goto drop; vxbuf.skb = skb; @@ -862,9 +897,8 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, case XDP_PASS: break; case XDP_TX: - veth_xdp_get(xdp); - consume_skb(skb); - xdp->rxq->mem = rq->xdp_mem; + __skb2xdp_steal_data(skb, xdp, rq, local_pp_alloc); + if (unlikely(veth_xdp_tx(rq, xdp, bq) < 0)) { trace_xdp_exception(rq->dev, xdp_prog, act); stats->rx_drops++; @@ -874,9 +908,8 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, rcu_read_unlock(); goto xdp_xmit; case XDP_REDIRECT: - veth_xdp_get(xdp); - consume_skb(skb); - xdp->rxq->mem = rq->xdp_mem; + __skb2xdp_steal_data(skb, xdp, rq, local_pp_alloc); + if (xdp_do_redirect(rq->dev, xdp, xdp_prog)) { stats->rx_drops++; goto err_xdp; @@ -1061,6 +1094,14 @@ static int __veth_napi_enable_range(struct net_device *dev, int start, int end) goto err_page_pool; } + for (i = start; i < end; i++) { + err = xdp_reg_mem_model(&priv->rq[i].xdp_mem_pp, + MEM_TYPE_PAGE_POOL, + priv->rq[i].page_pool); + if (err) + goto err_reg_mem; + } + for (i = start; i < end; i++) { struct veth_rq *rq = &priv->rq[i]; @@ -1082,6 +1123,10 @@ static int __veth_napi_enable_range(struct net_device *dev, int start, int end) 
for (i--; i >= start; i--) ptr_ring_cleanup(&priv->rq[i].xdp_ring, veth_ptr_free); i = end; +err_reg_mem: + for (i--; i >= start; i--) + xdp_unreg_mem_model(&priv->rq[i].xdp_mem_pp); + i = end; err_page_pool: for (i--; i >= start; i--) { page_pool_destroy(priv->rq[i].page_pool); @@ -1117,6 +1162,9 @@ static void veth_napi_del_range(struct net_device *dev, int start, int end) ptr_ring_cleanup(&rq->xdp_ring, veth_ptr_free); } + for (i = start; i < end; i++) + xdp_unreg_mem_model(&priv->rq[i].xdp_mem_pp); + for (i = start; i < end; i++) { page_pool_destroy(priv->rq[i].page_pool); priv->rq[i].page_pool = NULL; -- 2.40.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [RFC PATCH net-next v3 2/2] net: veth: Optimizing skb reuse in NAPI Context 2023-08-16 12:30 [RFC PATCH net-next v3 0/2] net: veth: Optimizing page pool usage Liang Chen 2023-08-16 12:30 ` [RFC PATCH net-next v3 1/2] net: veth: Improving page pool recycling Liang Chen @ 2023-08-16 12:30 ` Liang Chen 2023-08-21 14:21 ` [RFC PATCH net-next v3 0/2] net: veth: Optimizing page pool usage Jesper Dangaard Brouer 2 siblings, 0 replies; 8+ messages in thread From: Liang Chen @ 2023-08-16 12:30 UTC (permalink / raw) To: hawk, horms, davem, edumazet, kuba, pabeni, linyunsheng Cc: ilias.apalodimas, daniel, ast, netdev, liangchen.linux In cases where the skb is reallocated by veth_convert_skb_to_xdp_buff, we leverage the NAPI version of the "head stolen" function to enable fast skb reuse. The following test results evaluate the performance improvement resulting from reusing skb in the NAPI context with pktgen-generated traffic. Test environment setup: ns1 ns2 veth0 <-peer-> veth1 veth2 <-peer-> veth3 Test Results: pktgen -> veth1 -> veth0(XDP_TX) -> veth1(XDP_DROP) without reusing skb: 2,033,439 reusing skb: 2,167,749 improvement: ~6% pktgen -> veth1 -> veth0(XDP_TX) -> veth1(XDP_PASS) without reusing skb: 1,585,462 reusing skb: 1,650,572 improvement: ~4% pktgen -> veth1 -> veth0(XDP_REDIRECT) -> veth2 -> veth3(XDP_DROP) without reusing skb: 1,787,342 reusing skb: 1,848,516 improvement: ~3% pktgen -> veth1 -> veth0(XDP_REDIRECT) -> veth2 -> veth3(XDP_PASS) without reusing skb: 1,391,587 reusing skb: 1,439,866 improvement: ~3% pktgen -> veth1 -> veth0(AF_XDP) -> user space(DROP) without reusing skb: 1,811,844 with reusing skb: 1,861,027 improvement: ~3% Signed-off-by: Liang Chen <liangchen.linux@gmail.com> --- drivers/net/veth.c | 5 ++++- net/core/skbuff.c | 1 + 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 7234eb0297dd..4e1ee110ab84 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -838,9 +838,12 @@ static 
void __skb2xdp_steal_data(struct sk_buff *skb, if (local_pp_alloc) { /* This is the most common case where the skb was reallocated locally in * veth_convert_skb_to_xdp_buff, and it's safe to use the xdp_mem_pp model. + * Since the skb is "reallocated" in the NAPI context of veth, it is possible + * to use the NAPI version of the "head stolen" function to optimize the + * reuse of skb as well. */ xdp->rxq->mem = rq->xdp_mem_pp; - kfree_skb_partial(skb, true); + napi_skb_free_stolen_head(skb); } else if (!skb->pp_recycle) { /* We can safely use kfree_skb_partial here because this cannot be an fclone * skb. Fclone skbs are allocated via __alloc_skb, with their head buffer diff --git a/net/core/skbuff.c b/net/core/skbuff.c index a298992060e6..954ba1b94840 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1284,6 +1284,7 @@ void napi_skb_free_stolen_head(struct sk_buff *skb) } napi_skb_cache_put(skb); } +EXPORT_SYMBOL(napi_skb_free_stolen_head); void napi_consume_skb(struct sk_buff *skb, int budget) { -- 2.40.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
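Conceptually, the napi skb cache that patch 2/2 taps into is just a small
per-CPU free list: napi_skb_free_stolen_head() pushes the skb shell into
the cache so the next allocation in the same NAPI context can pop it
instead of going back to the slab. A toy model of that mechanism (far
simpler than the real code in net/core/skbuff.c; names here are
illustrative):

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SKB_CACHE_SIZE 64	/* the kernel's NAPI_SKB_CACHE_SIZE is also 64 */

struct toy_cache {
	void *slots[TOY_SKB_CACHE_SIZE];
	unsigned int count;
};

/* Freeing in NAPI context: keep the skb shell for reuse if there is room. */
static int toy_cache_put(struct toy_cache *c, void *skb)
{
	if (c->count == TOY_SKB_CACHE_SIZE)
		return 0;	/* cache full: would fall back to the slab free */
	c->slots[c->count++] = skb;
	return 1;
}

/* Allocating in NAPI context: pop a cached shell when one is available. */
static void *toy_cache_get(struct toy_cache *c)
{
	return c->count ? c->slots[--c->count] : NULL;	/* NULL: slab alloc */
}
```

The LIFO order keeps the most recently freed (and therefore cache-hot)
shell at the top, which is where the measured few-percent gain comes from.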
* Re: [RFC PATCH net-next v3 0/2] net: veth: Optimizing page pool usage
  2023-08-16 12:30 [RFC PATCH net-next v3 0/2] net: veth: Optimizing page pool usage Liang Chen
  2023-08-16 12:30 ` [RFC PATCH net-next v3 1/2] net: veth: Improving page pool recycling Liang Chen
  2023-08-16 12:30 ` [RFC PATCH net-next v3 2/2] net: veth: Optimizing skb reuse in NAPI Context Liang Chen
@ 2023-08-21 14:21 ` Jesper Dangaard Brouer
  2023-08-21 21:54 ` Jesper Dangaard Brouer
  2 siblings, 1 reply; 8+ messages in thread

From: Jesper Dangaard Brouer @ 2023-08-21 14:21 UTC (permalink / raw)
  To: Liang Chen, hawk, horms, davem, edumazet, kuba, pabeni, linyunsheng
  Cc: brouer, ilias.apalodimas, daniel, ast, netdev, Lorenzo Bianconi, Stanislav Fomichev

On 16/08/2023 14.30, Liang Chen wrote:
> Page pool is supported for veth, but at the moment pages are not properly
> recyled for XDP_TX and XDP_REDIRECT. That prevents veth xdp from fully
> leveraging the advantages of the page pool. So this RFC patchset is mainly
> to make recycling work for those cases. With that in place, it can be
> further optimized by utilizing the napi skb cache. Detailed figures are
> presented in each commit message, and together they demonstrate a quite
> noticeable improvement.
>

I'm digging into this code path today.

I'm trying to extend this and find a way to support SKBs that use
kmalloc (skb->head_frag=0), such that we can remove the
skb_head_is_locked() check in veth_convert_skb_to_xdp_buff(), which will
allow more SKBs to avoid a realloc. As long as they have enough headroom,
which we can dynamically control for netdev TX packets by adjusting
netdev->needed_headroom, e.g. when loading an XDP prog.

I noticed netif_receive_generic_xdp() and bpf_prog_run_generic_xdp() can
handle kmalloc'ed SKBs (skb->head_frag=0). Going through the code, I don't
think it is a bug that generic-XDP allows this.

Deep into this rabbit hole, I start to question our approach.
- Perhaps the veth XDP approach for SKBs is wrong?
The root-cause of this issue is that veth_xdp_rcv_skb() code path (that handle SKBs) is calling XDP-native function "xdp_do_redirect()". I question, why isn't it using "xdp_do_generic_redirect()"? (I will jump into this rabbit hole now...) --Jesper > Changes from v2: > - refactor the code to make it more readable > - make use of the napi skb cache for further optimization > - take the Page pool creation error handling patch out for separate > submission > > Liang Chen (2): > net: veth: Improving page pool recycling > net: veth: Optimizing skb reuse in NAPI Context > > drivers/net/veth.c | 66 ++++++++++++++++++++++++++++++++++++++++------ > net/core/skbuff.c | 1 + > 2 files changed, 59 insertions(+), 8 deletions(-) > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH net-next v3 0/2] net: veth: Optimizing page pool usage 2023-08-21 14:21 ` [RFC PATCH net-next v3 0/2] net: veth: Optimizing page pool usage Jesper Dangaard Brouer @ 2023-08-21 21:54 ` Jesper Dangaard Brouer 2023-08-22 12:24 ` Yunsheng Lin 0 siblings, 1 reply; 8+ messages in thread From: Jesper Dangaard Brouer @ 2023-08-21 21:54 UTC (permalink / raw) To: Liang Chen, hawk, horms, davem, edumazet, kuba, pabeni, linyunsheng Cc: brouer, ilias.apalodimas, daniel, ast, netdev, Lorenzo Bianconi, Stanislav Fomichev, Maryam Tahhan On 21/08/2023 16.21, Jesper Dangaard Brouer wrote: > > On 16/08/2023 14.30, Liang Chen wrote: >> Page pool is supported for veth, but at the moment pages are not properly >> recyled for XDP_TX and XDP_REDIRECT. That prevents veth xdp from fully >> leveraging the advantages of the page pool. So this RFC patchset is >> mainly >> to make recycling work for those cases. With that in place, it can be >> further optimized by utilizing the napi skb cache. Detailed figures are >> presented in each commit message, and together they demonstrate a quite >> noticeable improvement. >> > > I'm digging into this code path today. > > I'm trying to extend this and find a way to support SKBs that used > kmalloc (skb->head_frag=0), such that we can remove the > skb_head_is_locked() check in veth_convert_skb_to_xdp_buff(), which will > allow more SKBs to avoid realloc. As long as they have enough headroom, > which we can dynamically control for netdev TX-packets by adjusting > netdev->needed_headroom, e.g. when loading an XDP prog. > > I noticed netif_receive_generic_xdp() and bpf_prog_run_generic_xdp() can > handle SKB kmalloc (skb->head_frag=0). Going though the code, I don't > think it is a bug that generic-XDP allows this. > > Deep into this rabbit hole, I start to question our approach. > - Perhaps the veth XDP approach for SKBs is wrong? 
>
> The root-cause of this issue is that veth_xdp_rcv_skb() code path (that
> handle SKBs) is calling XDP-native function "xdp_do_redirect()". I
> question, why isn't it using "xdp_do_generic_redirect()"?
> (I will jump into this rabbit hole now...)

It works: implemented using xdp_do_generic_redirect(), with the
skb_head_is_locked check in veth_convert_skb_to_xdp_buff() lifted, plus
xsk_build_skb() adjusted to allocate enough headroom. The results[1] are
good: approx 26% faster compared to Maryam's veth-benchmark[3] results[2].

--Jesper

[1] https://github.com/xdp-project/xdp-project/blob/veth-benchmark02/areas/core/veth_benchmark04.org#implemented-use-generic-xdp-redirect
[2] https://github.com/xdp-project/xdp-project/blob/veth-benchmark02/areas/core/veth_benchmark03.org#benchmark01-with-xdp-redirect-loaded-on-host-veth
[3] https://github.com/maryamtahhan/veth-benchmark/

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: [RFC PATCH net-next v3 0/2] net: veth: Optimizing page pool usage
  2023-08-21 21:54 ` Jesper Dangaard Brouer
@ 2023-08-22 12:24 ` Yunsheng Lin
  2023-08-22 13:05 ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 8+ messages in thread

From: Yunsheng Lin @ 2023-08-22 12:24 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Liang Chen, hawk, horms, davem, edumazet, kuba, pabeni
  Cc: brouer, ilias.apalodimas, daniel, ast, netdev, Lorenzo Bianconi, Stanislav Fomichev, Maryam Tahhan

On 2023/8/22 5:54, Jesper Dangaard Brouer wrote:
> On 21/08/2023 16.21, Jesper Dangaard Brouer wrote:
>>
>> On 16/08/2023 14.30, Liang Chen wrote:
>>> Page pool is supported for veth, but at the moment pages are not properly
>>> recyled for XDP_TX and XDP_REDIRECT. That prevents veth xdp from fully
>>> leveraging the advantages of the page pool. So this RFC patchset is mainly
>>> to make recycling work for those cases. With that in place, it can be
>>> further optimized by utilizing the napi skb cache. Detailed figures are
>>> presented in each commit message, and together they demonstrate a quite
>>> noticeable improvement.
>>>
>>
>> I'm digging into this code path today.
>>
>> I'm trying to extend this and find a way to support SKBs that used
>> kmalloc (skb->head_frag=0), such that we can remove the
>> skb_head_is_locked() check in veth_convert_skb_to_xdp_buff(), which will
>> allow more SKBs to avoid realloc. As long as they have enough headroom,
>> which we can dynamically control for netdev TX-packets by adjusting
>> netdev->needed_headroom, e.g. when loading an XDP prog.
>>
>> I noticed netif_receive_generic_xdp() and bpf_prog_run_generic_xdp() can
>> handle SKB kmalloc (skb->head_frag=0). Going though the code, I don't
>> think it is a bug that generic-XDP allows this.

Is it possible to relax the other checks too, and implement something like
pskb_expand_head() in the xdp core if the xdp core needs to modify the
data?

>>
>> Deep into this rabbit hole, I start to question our approach.
>> - Perhaps the veth XDP approach for SKBs is wrong?
>>
>> The root-cause of this issue is that veth_xdp_rcv_skb() code path (that
>> handle SKBs) is calling XDP-native function "xdp_do_redirect()". I
>> question, why isn't it using "xdp_do_generic_redirect()"?
>> (I will jump into this rabbit hole now...)

Is there any reason why xdp_do_redirect() cannot handle slab-allocated
data? Can we change xdp_do_redirect() to handle slab-allocated data, so
that it can benefit other cases besides veth too?

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: [RFC PATCH net-next v3 0/2] net: veth: Optimizing page pool usage 2023-08-22 12:24 ` Yunsheng Lin @ 2023-08-22 13:05 ` Jesper Dangaard Brouer 2023-08-22 18:13 ` Jesper Dangaard Brouer 0 siblings, 1 reply; 8+ messages in thread From: Jesper Dangaard Brouer @ 2023-08-22 13:05 UTC (permalink / raw) To: Yunsheng Lin, Jesper Dangaard Brouer, Liang Chen, hawk, horms, davem, edumazet, kuba, pabeni Cc: brouer, ilias.apalodimas, daniel, ast, netdev, Lorenzo Bianconi, Stanislav Fomichev, Maryam Tahhan On 22/08/2023 14.24, Yunsheng Lin wrote: > On 2023/8/22 5:54, Jesper Dangaard Brouer wrote: >> On 21/08/2023 16.21, Jesper Dangaard Brouer wrote: >>> >>> On 16/08/2023 14.30, Liang Chen wrote: >>>> Page pool is supported for veth, but at the moment pages are not properly >>>> recyled for XDP_TX and XDP_REDIRECT. That prevents veth xdp from fully >>>> leveraging the advantages of the page pool. So this RFC patchset is mainly >>>> to make recycling work for those cases. With that in place, it can be >>>> further optimized by utilizing the napi skb cache. Detailed figures are >>>> presented in each commit message, and together they demonstrate a quite >>>> noticeable improvement. >>>> >>> >>> I'm digging into this code path today. >>> >>> I'm trying to extend this and find a way to support SKBs that used >>> kmalloc (skb->head_frag=0), such that we can remove the >>> skb_head_is_locked() check in veth_convert_skb_to_xdp_buff(), which will >>> allow more SKBs to avoid realloc. As long as they have enough headroom, >>> which we can dynamically control for netdev TX-packets by adjusting >>> netdev->needed_headroom, e.g. when loading an XDP prog. >>> >>> I noticed netif_receive_generic_xdp() and bpf_prog_run_generic_xdp() can >>> handle SKB kmalloc (skb->head_frag=0). Going though the code, I don't >>> think it is a bug that generic-XDP allows this. 
>
> Is it possible to relaxe other checking too, and implement something like
> pskb_expand_head() in xdp core if xdp core need to modify the data?
>

Yes, I definitely hope (and plan) to relax other checks.

The XDP_PACKET_HEADROOM (256 bytes) check has IMHO become obsolete and
wrong, as many drivers today use a headroom of 192 bytes for XDP (which we
allowed). Thus, there is no reason for veth to insist on this
XDP_PACKET_HEADROOM limit. Today XDP can handle variable headroom (due to
these drivers).

>
>>>
>>> Deep into this rabbit hole, I start to question our approach.
>>> - Perhaps the veth XDP approach for SKBs is wrong?
>>>
>>> The root-cause of this issue is that veth_xdp_rcv_skb() code path (that
>>> handle SKBs) is calling XDP-native function "xdp_do_redirect()". I
>>> question, why isn't it using "xdp_do_generic_redirect()"?
>>> (I will jump into this rabbit hole now...)
>
> Is there any reason why xdp_do_redirect() can not handle the slab-allocated
> data? Can we change the xdp_do_redirect() to handle slab-allocated
> data, so that it can benefit other case beside veth too?
>

I started coding this up, but realized that it was the wrong approach.

The xdp_do_redirect() call is for native XDP with a proper xdp_buff. When
dealing with SKBs that we pretend are xdp_buffs, we have the
xdp_do_generic_redirect() API. IMHO it is wrong to "steal" the packet-data
from an SKB in order to use the native-XDP API xdp_do_redirect(). In the
use-cases I see, the next layer will often allocate a new SKB and attach
the stolen packet-data, which is pure waste, as xdp_do_generic_redirect()
keeps the SKB intact, so no new SKB allocs are needed.

--Jesper

^ permalink raw reply	[flat|nested] 8+ messages in thread
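The headroom point above can be made concrete with a toy check (userspace
sketch; XDP_PACKET_HEADROOM really is 256 in the kernel uapi, but the
"needed" parameter and both function names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

#define XDP_PACKET_HEADROOM 256	/* as defined in include/uapi/linux/bpf.h */

/* Current veth behavior: insist on the full 256 bytes, so an skb built
 * with only 192 bytes of headroom gets reallocated. */
static bool headroom_ok_strict(unsigned int headroom)
{
	return headroom >= XDP_PACKET_HEADROOM;
}

/* Hypothetical relaxed check: accept whatever headroom the XDP setup
 * actually needs, as drivers running XDP with 192 bytes already show
 * is workable. */
static bool headroom_ok_relaxed(unsigned int headroom, unsigned int needed)
{
	return headroom >= needed;
}
```

Under the strict check a 192-byte-headroom skb is rejected and copied;
under the relaxed one it passes whenever the real requirement is met.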
* Re: [RFC PATCH net-next v3 0/2] net: veth: Optimizing page pool usage 2023-08-22 13:05 ` Jesper Dangaard Brouer @ 2023-08-22 18:13 ` Jesper Dangaard Brouer 0 siblings, 0 replies; 8+ messages in thread From: Jesper Dangaard Brouer @ 2023-08-22 18:13 UTC (permalink / raw) To: Yunsheng Lin, Jesper Dangaard Brouer, Liang Chen, hawk, horms, davem, edumazet, kuba, pabeni Cc: ilias.apalodimas, daniel, ast, netdev, Lorenzo Bianconi, Stanislav Fomichev, Maryam Tahhan On 22/08/2023 15.05, Jesper Dangaard Brouer wrote: > > On 22/08/2023 14.24, Yunsheng Lin wrote: >> On 2023/8/22 5:54, Jesper Dangaard Brouer wrote: >>> On 21/08/2023 16.21, Jesper Dangaard Brouer wrote: >>>> >>>> On 16/08/2023 14.30, Liang Chen wrote: >>>>> Page pool is supported for veth, but at the moment pages are not properly >>>>> recyled for XDP_TX and XDP_REDIRECT. That prevents veth xdp from fully >>>>> leveraging the advantages of the page pool. So this RFC patchset is mainly >>>>> to make recycling work for those cases. With that in place, it can be >>>>> further optimized by utilizing the napi skb cache. Detailed figures are >>>>> presented in each commit message, and together they demonstrate a quite >>>>> noticeable improvement. >>>>> >>>> >>>> I'm digging into this code path today. >>>> >>>> I'm trying to extend this and find a way to support SKBs that used >>>> kmalloc (skb->head_frag=0), such that we can remove the >>>> skb_head_is_locked() check in veth_convert_skb_to_xdp_buff(), which will >>>> allow more SKBs to avoid realloc. As long as they have enough headroom, >>>> which we can dynamically control for netdev TX-packets by adjusting >>>> netdev->needed_headroom, e.g. when loading an XDP prog. >>>> >>>> I noticed netif_receive_generic_xdp() and bpf_prog_run_generic_xdp() can >>>> handle SKB kmalloc (skb->head_frag=0). Going though the code, I don't >>>> think it is a bug that generic-XDP allows this. 
>> >> Is it possible to relaxe other checking too, and implement something like >> pskb_expand_head() in xdp core if xdp core need to modify the data? >> > > Yes, I definitely hope (and plan) to relax other checks. > > The XDP_PACKET_HEADROOM (256 bytes) check have IMHO become obsolete and > wrong, as many drivers today use headroom 192 bytes for XDP (which we > allowed). Thus, there is not reason for veth to insist on this > XDP_PACKET_HEADROOM limit. Today XDP can handle variable headroom (due > to these drivers). > > >> >>>> >>>> Deep into this rabbit hole, I start to question our approach. >>>> - Perhaps the veth XDP approach for SKBs is wrong? >>>> >>>> The root-cause of this issue is that veth_xdp_rcv_skb() code path (that >>>> handle SKBs) is calling XDP-native function "xdp_do_redirect()". I >>>> question, why isn't it using "xdp_do_generic_redirect()"? >>>> (I will jump into this rabbit hole now...) >> >> Is there any reason why xdp_do_redirect() can not handle the >> slab-allocated >> data? Can we change the xdp_do_redirect() to handle slab-allocated >> data, so that it can benefit other case beside veth too? >> > > I started coding up this, but realized that it was a wrong approach. > > The xdp_do_redirect() call is for native-XDP with a proper xdp_buff. > When dealing with SKBs we pretend is a xdp_buff, we have the API > xdp_do_generic_redirect(). IMHO it is wrong to "steal" the packet-data > from an SKB and in-order to use the native-XDP API xdp_do_redirect(). > In the use-cases I see, often the next layer will allocate a new SKB and > attach the stolen packet-data , which is pure-waste as > xdp_do_generic_redirect() keeps the SKB intact, so no new SKB allocs. > Please see my RFC-v1 patchset[1] for a different approach, which avoids the realloc and page_pool usage all together (but also results in correct recycling for PP when realloc cannot be avoided). 
[1] https://lore.kernel.org/all/169272709850.1975370.16698220879817216294.stgit@firesoul/ --Jesper ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2023-08-22 18:13 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-08-16 12:30 [RFC PATCH net-next v3 0/2] net: veth: Optimizing page pool usage Liang Chen 2023-08-16 12:30 ` [RFC PATCH net-next v3 1/2] net: veth: Improving page pool recycling Liang Chen 2023-08-16 12:30 ` [RFC PATCH net-next v3 2/2] net: veth: Optimizing skb reuse in NAPI Context Liang Chen 2023-08-21 14:21 ` [RFC PATCH net-next v3 0/2] net: veth: Optimizing page pool usage Jesper Dangaard Brouer 2023-08-21 21:54 ` Jesper Dangaard Brouer 2023-08-22 12:24 ` Yunsheng Lin 2023-08-22 13:05 ` Jesper Dangaard Brouer 2023-08-22 18:13 ` Jesper Dangaard Brouer
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).