* [PATCH v3 net-next 0/2] add multi-buff support for xdp running in generic mode
@ 2023-12-01 13:48 Lorenzo Bianconi
2023-12-01 13:48 ` [PATCH v3 net-next 1/2] xdp: rely on skb pointer reference in do_xdp_generic and netif_receive_generic_xdp Lorenzo Bianconi
2023-12-01 13:48 ` [PATCH v3 net-next 2/2] xdp: add multi-buff support for xdp running in generic mode Lorenzo Bianconi
0 siblings, 2 replies; 14+ messages in thread
From: Lorenzo Bianconi @ 2023-12-01 13:48 UTC (permalink / raw)
To: netdev
Cc: davem, edumazet, kuba, pabeni, lorenzo.bianconi, bpf, hawk, toke,
aleksander.lobakin, willemdebruijn.kernel, jasowang, sdf
Introduce multi-buffer support for xdp running in generic mode by not always
linearizing the skb in the netif_receive_generic_xdp routine.
Changes since v2:
- rely on napi_alloc_frag() and napi_build_skb() to build the new skb
Changes since v1:
- explicitly keep the skb segmented in netif_skb_check_for_generic_xdp() and
do not rely on pskb_expand_head()
Lorenzo Bianconi (2):
xdp: rely on skb pointer reference in do_xdp_generic and
netif_receive_generic_xdp
xdp: add multi-buff support for xdp running in generic mode
drivers/net/tun.c | 4 +-
include/linux/netdevice.h | 2 +-
net/core/dev.c | 165 +++++++++++++++++++++++++++++++-------
3 files changed, 137 insertions(+), 34 deletions(-)
--
2.43.0
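One practical note: with patch 2/2 applied, generic XDP still linearizes the skb
unless the attached program declares frags support (prog->aux->xdp_has_frags,
i.e. loading with BPF_F_XDP_HAS_FRAGS, which libbpf sets for the "xdp.frags"
section). A minimal, purely illustrative frags-aware program (not part of this
series) could look like:

// SPDX-License-Identifier: GPL-2.0
/* Illustrative only: a frags-aware XDP program that can be attached in
 * generic (skb) mode to exercise the non-linear path added by this series.
 */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>

SEC("xdp.frags")
int xdp_mb_pass(struct xdp_md *ctx)
{
	/* bpf_xdp_get_buff_len() accounts for all fragments, unlike
	 * ctx->data_end - ctx->data which only covers the linear area.
	 */
	__u64 len = bpf_xdp_get_buff_len(ctx);

	if (len < sizeof(struct ethhdr))
		return XDP_DROP;

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

Attaching such a program in generic (skb) mode is what exercises the new
non-linear handling in netif_receive_generic_xdp().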
^ permalink raw reply [flat|nested] 14+ messages in thread* [PATCH v3 net-next 1/2] xdp: rely on skb pointer reference in do_xdp_generic and netif_receive_generic_xdp 2023-12-01 13:48 [PATCH v3 net-next 0/2] add multi-buff support for xdp running in generic mode Lorenzo Bianconi @ 2023-12-01 13:48 ` Lorenzo Bianconi 2023-12-01 13:48 ` [PATCH v3 net-next 2/2] xdp: add multi-buff support for xdp running in generic mode Lorenzo Bianconi 1 sibling, 0 replies; 14+ messages in thread From: Lorenzo Bianconi @ 2023-12-01 13:48 UTC (permalink / raw) To: netdev Cc: davem, edumazet, kuba, pabeni, lorenzo.bianconi, bpf, hawk, toke, aleksander.lobakin, willemdebruijn.kernel, jasowang, sdf Rely on skb pointer reference instead of the skb pointer in do_xdp_generic and netif_receive_generic_xdp routine signatures. This is a preliminary patch to add multi-buff support for xdp running in generic mode. Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> --- drivers/net/tun.c | 4 ++-- include/linux/netdevice.h | 2 +- net/core/dev.c | 16 +++++++++------- 3 files changed, 12 insertions(+), 10 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index afa5497f7c35..206adddff699 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1921,7 +1921,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, rcu_read_lock(); xdp_prog = rcu_dereference(tun->xdp_prog); if (xdp_prog) { - ret = do_xdp_generic(xdp_prog, skb); + ret = do_xdp_generic(xdp_prog, &skb); if (ret != XDP_PASS) { rcu_read_unlock(); local_bh_enable(); @@ -2511,7 +2511,7 @@ static int tun_xdp_one(struct tun_struct *tun, skb_record_rx_queue(skb, tfile->queue_index); if (skb_xdp) { - ret = do_xdp_generic(xdp_prog, skb); + ret = do_xdp_generic(xdp_prog, &skb); if (ret != XDP_PASS) { ret = 0; goto out; diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index c2d74bc112dd..6195b75a520a 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3927,7 +3927,7 @@ static inline void dev_consume_skb_any(struct sk_buff *skb) u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp, struct bpf_prog *xdp_prog); void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog); -int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb); +int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff **pskb); int netif_rx(struct sk_buff *skb); int __netif_rx(struct sk_buff *skb); diff --git a/net/core/dev.c b/net/core/dev.c index 3950ced396b5..4df68d7f04a2 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4915,10 +4915,11 @@ u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp, return act; } -static u32 netif_receive_generic_xdp(struct sk_buff *skb, +static u32 netif_receive_generic_xdp(struct sk_buff **pskb, struct xdp_buff *xdp, struct bpf_prog *xdp_prog) { + struct sk_buff *skb = *pskb; u32 act = XDP_DROP; /* Reinjected packets coming from act_mirred or similar should @@ -4999,24 +5000,24 @@ void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog) static DEFINE_STATIC_KEY_FALSE(generic_xdp_needed_key); -int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb) +int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff **pskb) { if (xdp_prog) { struct xdp_buff xdp; u32 act; int err; - act = netif_receive_generic_xdp(skb, &xdp, xdp_prog); + act = netif_receive_generic_xdp(pskb, &xdp, xdp_prog); if (act != XDP_PASS) { switch (act) { case XDP_REDIRECT: - err = 
xdp_do_generic_redirect(skb->dev, skb, + err = xdp_do_generic_redirect((*pskb)->dev, *pskb, &xdp, xdp_prog); if (err) goto out_redir; break; case XDP_TX: - generic_xdp_tx(skb, xdp_prog); + generic_xdp_tx(*pskb, xdp_prog); break; } return XDP_DROP; @@ -5024,7 +5025,7 @@ int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb) } return XDP_PASS; out_redir: - kfree_skb_reason(skb, SKB_DROP_REASON_XDP); + kfree_skb_reason(*pskb, SKB_DROP_REASON_XDP); return XDP_DROP; } EXPORT_SYMBOL_GPL(do_xdp_generic); @@ -5347,7 +5348,8 @@ static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc, int ret2; migrate_disable(); - ret2 = do_xdp_generic(rcu_dereference(skb->dev->xdp_prog), skb); + ret2 = do_xdp_generic(rcu_dereference(skb->dev->xdp_prog), + &skb); migrate_enable(); if (ret2 != XDP_PASS) { -- 2.43.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v3 net-next 2/2] xdp: add multi-buff support for xdp running in generic mode 2023-12-01 13:48 [PATCH v3 net-next 0/2] add multi-buff support for xdp running in generic mode Lorenzo Bianconi 2023-12-01 13:48 ` [PATCH v3 net-next 1/2] xdp: rely on skb pointer reference in do_xdp_generic and netif_receive_generic_xdp Lorenzo Bianconi @ 2023-12-01 13:48 ` Lorenzo Bianconi 2023-12-02 3:48 ` Jakub Kicinski 1 sibling, 1 reply; 14+ messages in thread From: Lorenzo Bianconi @ 2023-12-01 13:48 UTC (permalink / raw) To: netdev Cc: davem, edumazet, kuba, pabeni, lorenzo.bianconi, bpf, hawk, toke, aleksander.lobakin, willemdebruijn.kernel, jasowang, sdf Similar to native xdp, do not always linearize the skb in netif_receive_generic_xdp routine but create a non-linear xdp_buff to be processed by the eBPF program. This allow to add multi-buffer support for xdp running in generic mode. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> --- net/core/dev.c | 151 +++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 126 insertions(+), 25 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 4df68d7f04a2..ed827b443d48 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4853,6 +4853,12 @@ u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp, xdp_init_buff(xdp, frame_sz, &rxqueue->xdp_rxq); xdp_prepare_buff(xdp, hard_start, skb_headroom(skb) - mac_len, skb_headlen(skb) + mac_len, true); + if (skb_is_nonlinear(skb)) { + skb_shinfo(skb)->xdp_frags_size = skb->data_len; + xdp_buff_set_frags_flag(xdp); + } else { + xdp_buff_clear_frags_flag(xdp); + } orig_data_end = xdp->data_end; orig_data = xdp->data; @@ -4882,6 +4888,14 @@ u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp, skb->len += off; /* positive on grow, negative on shrink */ } + /* XDP frag metadata (e.g. nr_frags) are updated in eBPF helpers + * (e.g. bpf_xdp_adjust_tail), we need to update data_len here. + */ + if (xdp_buff_has_frags(xdp)) + skb->data_len = skb_shinfo(skb)->xdp_frags_size; + else + skb->data_len = 0; + /* check if XDP changed eth hdr such SKB needs update */ eth = (struct ethhdr *)xdp->data; if ((orig_eth_type != eth->h_proto) || @@ -4915,54 +4929,141 @@ u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp, return act; } -static u32 netif_receive_generic_xdp(struct sk_buff **pskb, - struct xdp_buff *xdp, - struct bpf_prog *xdp_prog) +static int netif_skb_check_for_generic_xdp(struct sk_buff **pskb, + struct bpf_prog *prog) { struct sk_buff *skb = *pskb; - u32 act = XDP_DROP; - - /* Reinjected packets coming from act_mirred or similar should - * not get XDP generic processing. - */ - if (skb_is_redirected(skb)) - return XDP_PASS; + int err; - /* XDP packets must be linear and must have sufficient headroom - * of XDP_PACKET_HEADROOM bytes. This is the guarantee that also - * native XDP provides, thus we need to do it here as well. + /* XDP does not support fraglist so we need to linearize + * the skb. */ - if (skb_cloned(skb) || skb_is_nonlinear(skb) || - skb_headroom(skb) < XDP_PACKET_HEADROOM) { + if (skb_has_frag_list(skb) || !prog->aux->xdp_has_frags) { int hroom = XDP_PACKET_HEADROOM - skb_headroom(skb); int troom = skb->tail + skb->data_len - skb->end; /* In case we have to go down the path and also linearize, * then lets do the pskb_expand_head() work just once here. */ - if (pskb_expand_head(skb, - hroom > 0 ? ALIGN(hroom, NET_SKB_PAD) : 0, - troom > 0 ? 
troom + 128 : 0, GFP_ATOMIC)) - goto do_drop; - if (skb_linearize(skb)) - goto do_drop; + err = pskb_expand_head(skb, + hroom > 0 ? ALIGN(hroom, NET_SKB_PAD) : 0, + troom > 0 ? troom + 128 : 0, GFP_ATOMIC); + if (err) + return err; + + err = skb_linearize(skb); + if (err) + return err; + + return 0; } - act = bpf_prog_run_generic_xdp(skb, xdp, xdp_prog); + /* XDP packets must have sufficient headroom of XDP_PACKET_HEADROOM + * bytes. This is the guarantee that also native XDP provides, + * thus we need to do it here as well. + */ + if (skb_cloned(skb) || skb_shinfo(skb)->nr_frags || + skb_headroom(skb) < XDP_PACKET_HEADROOM) { + u32 mac_len = skb->data - skb_mac_header(skb); + u32 size, truesize, len, max_head_size, off; + struct sk_buff *nskb; + int i, head_off; + void *data; + + __skb_push(skb, mac_len); + max_head_size = SKB_WITH_OVERHEAD(PAGE_SIZE - + XDP_PACKET_HEADROOM); + if (skb->len > max_head_size + MAX_SKB_FRAGS * PAGE_SIZE) + return -ENOMEM; + + size = min_t(u32, skb->len, max_head_size); + truesize = SKB_HEAD_ALIGN(size) + XDP_PACKET_HEADROOM; + data = napi_alloc_frag(truesize); + if (!data) + return -ENOMEM; + + nskb = napi_build_skb(data, truesize); + if (!nskb) { + skb_free_frag(data); + return -ENOMEM; + } + + skb_reserve(nskb, XDP_PACKET_HEADROOM); + skb_copy_header(nskb, skb); + + err = skb_copy_bits(skb, 0, nskb->data, size); + if (err) { + consume_skb(nskb); + return err; + } + skb_put(nskb, size); + + head_off = skb_headroom(nskb) - skb_headroom(skb); + skb_headers_offset_update(nskb, head_off); + + off = size; + len = skb->len - off; + for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) { + struct page *page; + + size = min_t(u32, len, PAGE_SIZE); + data = napi_alloc_frag(size); + if (!data) { + consume_skb(nskb); + return -ENOMEM; + } + + page = virt_to_head_page(data); + skb_add_rx_frag(nskb, i, page, + data - page_address(page), size, size); + err = skb_copy_bits(skb, off, data, size); + if (err) { + consume_skb(nskb); + return err; + } + + len -= size; + off += size; + } + + consume_skb(skb); + *pskb = nskb; + __skb_pull(nskb, mac_len); + } + + return 0; +} + +static u32 netif_receive_generic_xdp(struct sk_buff **pskb, + struct xdp_buff *xdp, + struct bpf_prog *xdp_prog) +{ + u32 act = XDP_DROP; + + /* Reinjected packets coming from act_mirred or similar should + * not get XDP generic processing. + */ + if (skb_is_redirected(*pskb)) + return XDP_PASS; + + if (netif_skb_check_for_generic_xdp(pskb, xdp_prog)) + goto do_drop; + + act = bpf_prog_run_generic_xdp(*pskb, xdp, xdp_prog); switch (act) { case XDP_REDIRECT: case XDP_TX: case XDP_PASS: break; default: - bpf_warn_invalid_xdp_action(skb->dev, xdp_prog, act); + bpf_warn_invalid_xdp_action((*pskb)->dev, xdp_prog, act); fallthrough; case XDP_ABORTED: - trace_xdp_exception(skb->dev, xdp_prog, act); + trace_xdp_exception((*pskb)->dev, xdp_prog, act); fallthrough; case XDP_DROP: do_drop: - kfree_skb(skb); + kfree_skb(*pskb); break; } -- 2.43.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v3 net-next 2/2] xdp: add multi-buff support for xdp running in generic mode
2023-12-01 13:48 ` [PATCH v3 net-next 2/2] xdp: add multi-buff support for xdp running in generic mode Lorenzo Bianconi
@ 2023-12-02 3:48 ` Jakub Kicinski
2023-12-04 15:43 ` Lorenzo Bianconi
0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2023-12-02 3:48 UTC (permalink / raw)
To: Lorenzo Bianconi, aleksander.lobakin
Cc: netdev, davem, edumazet, pabeni, lorenzo.bianconi, bpf, hawk, toke,
    willemdebruijn.kernel, jasowang, sdf

On Fri, 1 Dec 2023 14:48:26 +0100 Lorenzo Bianconi wrote:
> Similar to native xdp, do not always linearize the skb in
> netif_receive_generic_xdp routine but create a non-linear xdp_buff to be
> processed by the eBPF program. This allow to add multi-buffer support
> for xdp running in generic mode.

Hm. How close is the xdp generic code to veth?

I wonder if it'd make sense to create a page pool instance for each
core, we could then pass it into a common "reallocate skb into a
page-pool backed, fragged form" helper. Common between this code
and veth? Perhaps we could even get rid of the veth page pools
and use the per cpu pools there?

^ permalink raw reply [flat|nested] 14+ messages in thread
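A rough sketch of the "page pool instance for each core" idea mentioned above
(all names below are hypothetical, nothing like this exists in the tree at this
point) could be:

/* Hypothetical sketch only: one page_pool per possible CPU, created at
 * boot, that generic XDP (and potentially veth) could draw from.
 */
#include <linux/cpumask.h>
#include <linux/err.h>
#include <linux/init.h>
#include <linux/numa.h>
#include <linux/percpu.h>
#include <net/page_pool/types.h>

static DEFINE_PER_CPU(struct page_pool *, system_page_pool);

static int __init generic_xdp_page_pool_init(void)
{
	int cpu;

	for_each_possible_cpu(cpu) {
		struct page_pool_params params = {
			.pool_size = 256,
			.nid = NUMA_NO_NODE,	/* no device to take the node from */
		};
		struct page_pool *pp;

		pp = page_pool_create(&params);
		if (IS_ERR(pp))
			return PTR_ERR(pp);

		per_cpu(system_page_pool, cpu) = pp;
	}
	return 0;
}

The common "reallocate the skb into a page-pool backed, fragged form" helper
would then take one of these pools as an argument instead of open-coding
napi_alloc_frag(), so veth and generic XDP could share it.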
* Re: [PATCH v3 net-next 2/2] xdp: add multi-buff support for xdp running in generic mode
2023-12-02 3:48 ` Jakub Kicinski
@ 2023-12-04 15:43 ` Lorenzo Bianconi
2023-12-04 20:01 ` Jakub Kicinski
0 siblings, 1 reply; 14+ messages in thread
From: Lorenzo Bianconi @ 2023-12-04 15:43 UTC (permalink / raw)
To: Jakub Kicinski
Cc: aleksander.lobakin, netdev, davem, edumazet, pabeni,
    lorenzo.bianconi, bpf, hawk, toke, willemdebruijn.kernel, jasowang, sdf

[-- Attachment #1: Type: text/plain, Size: 1878 bytes --]

> On Fri, 1 Dec 2023 14:48:26 +0100 Lorenzo Bianconi wrote:
> > Similar to native xdp, do not always linearize the skb in
> > netif_receive_generic_xdp routine but create a non-linear xdp_buff to be
> > processed by the eBPF program. This allow to add multi-buffer support
> > for xdp running in generic mode.
>
> Hm. How close is the xdp generic code to veth?

Actually they are quite close, the only difference is the use of
page_pool vs page_frag_cache APIs.

> I wonder if it'd make sense to create a page pool instance for each
> core, we could then pass it into a common "reallocate skb into a
> page-pool backed, fragged form" helper. Common between this code
> and veth? Perhaps we could even get rid of the veth page pools
> and use the per cpu pools there?

yes, I was thinking about it actually.
I ran some preliminary tests to check if we are introducing any performance
penalties or so.
My setup relies on a couple of veth pairs and an eBPF program to perform
XDP_REDIRECT from one pair to another one. I am running the program in xdp
driver mode (not generic one).

v00 (NS:ns0 - 192.168.0.1/24) <---> (NS:ns1 - 192.168.0.2/24) v01
v10 (NS:ns1 - 192.168.1.1/24) <---> (NS:ns2 - 192.168.1.2/24) v11

v00: iperf3 client
v11: iperf3 server

I ran the test with different MTU values (1500B, 8KB, 64KB)

net-next veth codebase:
=======================
- MTU 1500:  iperf3 ~  4.37Gbps
- MTU 8000:  iperf3 ~  9.75Gbps
- MTU 64000: iperf3 ~ 11.24Gbps

net-next veth codebase + page_frag_cache instead of page_pool:
==============================================================
- MTU 1500:  iperf3 ~  4.99Gbps (+14%)
- MTU 8000:  iperf3 ~  8.5Gbps  (-12%)
- MTU 64000: iperf3 ~ 11.9Gbps  ( +6%)

It seems there is no clear win from using either page_pool or
page_frag_cache. What do you think?

Regards,
Lorenzo

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3 net-next 2/2] xdp: add multi-buff support for xdp running in generic mode
2023-12-04 15:43 ` Lorenzo Bianconi
@ 2023-12-04 20:01 ` Jakub Kicinski
2023-12-05 23:08 ` Lorenzo Bianconi
0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2023-12-04 20:01 UTC (permalink / raw)
To: Lorenzo Bianconi
Cc: aleksander.lobakin, netdev, davem, edumazet, pabeni,
    lorenzo.bianconi, bpf, hawk, toke, willemdebruijn.kernel, jasowang, sdf

On Mon, 4 Dec 2023 16:43:56 +0100 Lorenzo Bianconi wrote:
> yes, I was thinking about it actually.
> I ran some preliminary tests to check if we are introducing any performance
> penalties or so.
> My setup relies on a couple of veth pairs and an eBPF program to perform
> XDP_REDIRECT from one pair to another one. I am running the program in xdp
> driver mode (not generic one).
>
> v00 (NS:ns0 - 192.168.0.1/24) <---> (NS:ns1 - 192.168.0.2/24) v01
> v10 (NS:ns1 - 192.168.1.1/24) <---> (NS:ns2 - 192.168.1.2/24) v11
>
> v00: iperf3 client
> v11: iperf3 server
>
> I ran the test with different MTU values (1500B, 8KB, 64KB)
>
> net-next veth codebase:
> =======================
> - MTU 1500:  iperf3 ~  4.37Gbps
> - MTU 8000:  iperf3 ~  9.75Gbps
> - MTU 64000: iperf3 ~ 11.24Gbps
>
> net-next veth codebase + page_frag_cache instead of page_pool:
> ==============================================================
> - MTU 1500:  iperf3 ~  4.99Gbps (+14%)
> - MTU 8000:  iperf3 ~  8.5Gbps  (-12%)
> - MTU 64000: iperf3 ~ 11.9Gbps  ( +6%)
>
> It seems there is no clear win from using either page_pool or
> page_frag_cache. What do you think?

Hm, interesting. Are the iperf processes running on different cores?
May be worth pinning (both same and different) to make sure the cache
effects are isolated.

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3 net-next 2/2] xdp: add multi-buff support for xdp running in generic mode
2023-12-04 20:01 ` Jakub Kicinski
@ 2023-12-05 23:08 ` Lorenzo Bianconi
2023-12-05 23:58 ` Jakub Kicinski
0 siblings, 1 reply; 14+ messages in thread
From: Lorenzo Bianconi @ 2023-12-05 23:08 UTC (permalink / raw)
To: Jakub Kicinski
Cc: aleksander.lobakin, netdev, davem, edumazet, pabeni,
    lorenzo.bianconi, bpf, hawk, toke, willemdebruijn.kernel, jasowang, sdf

[-- Attachment #1: Type: text/plain, Size: 7151 bytes --]

> On Mon, 4 Dec 2023 16:43:56 +0100 Lorenzo Bianconi wrote:
> > yes, I was thinking about it actually.
> > I ran some preliminary tests to check if we are introducing any performance
> > penalties or so.
> > My setup relies on a couple of veth pairs and an eBPF program to perform
> > XDP_REDIRECT from one pair to another one. I am running the program in xdp
> > driver mode (not generic one).
> >
> > v00 (NS:ns0 - 192.168.0.1/24) <---> (NS:ns1 - 192.168.0.2/24) v01
> > v10 (NS:ns1 - 192.168.1.1/24) <---> (NS:ns2 - 192.168.1.2/24) v11
> >
> > v00: iperf3 client
> > v11: iperf3 server
> >
> > I ran the test with different MTU values (1500B, 8KB, 64KB)
> >
> > net-next veth codebase:
> > =======================
> > - MTU 1500:  iperf3 ~  4.37Gbps
> > - MTU 8000:  iperf3 ~  9.75Gbps
> > - MTU 64000: iperf3 ~ 11.24Gbps
> >
> > net-next veth codebase + page_frag_cache instead of page_pool:
> > ==============================================================
> > - MTU 1500:  iperf3 ~  4.99Gbps (+14%)
> > - MTU 8000:  iperf3 ~  8.5Gbps  (-12%)
> > - MTU 64000: iperf3 ~ 11.9Gbps  ( +6%)
> >
> > It seems there is no clear win from using either page_pool or
> > page_frag_cache. What do you think?
>
> Hm, interesting. Are the iperf processes running on different cores?
> May be worth pinning (both same and different) to make sure the cache
> effects are isolated.

Hi Jakub,

I carried out some more tests today based on your suggestion, on both the
veth driver and the xdp_generic codebase (on a more powerful system).
Test setup:

v00 (NS:ns0 - 192.168.0.1/24) <---> (NS:ns1 - 192.168.0.2/24) v01 ==(XDP_REDIRECT)==>
v10 (NS:ns1 - 192.168.1.1/24) <---> (NS:ns2 - 192.168.1.2/24) v11

- v00: iperf3 client (pinned on core 0)
- v11: iperf3 server (pinned on core 7)

net-next veth codebase (page_pool APIs):
========================================
- MTU 1500:  ~  5.42 Gbps
- MTU 8000:  ~ 14.1 Gbps
- MTU 64000: ~ 18.4 Gbps

net-next veth codebase + page_frag_cache APIs [0]:
==================================================
- MTU 1500:  ~  6.62 Gbps
- MTU 8000:  ~ 14.7 Gbps
- MTU 64000: ~ 19.7 Gbps

xdp_generic codebase + page_frag_cache APIs (current proposed patch):
=====================================================================
- MTU 1500:  ~  6.41 Gbps
- MTU 8000:  ~ 14.2 Gbps
- MTU 64000: ~ 19.8 Gbps

xdp_generic codebase + page_frag_cache APIs [1]:
================================================
- MTU 1500:  ~  5.75 Gbps
- MTU 8000:  ~ 15.3 Gbps
- MTU 64000: ~ 21.2 Gbps

It seems page_pool APIs are working better for the xdp_generic codebase
(except the MTU 1500 case) while page_frag_cache APIs are better for the
veth driver. What do you think? Am I missing something?

Regards,
Lorenzo

[0] Here I have just used napi_alloc_frag() instead of
    page_pool_dev_alloc_va()/page_pool_dev_alloc() in
    veth_convert_skb_to_xdp_buff()

[1] I developed this PoC to use page_pool APIs for the xdp_generic code:

diff --git a/include/net/netdev_rx_queue.h b/include/net/netdev_rx_queue.h
index cdcafb30d437..5115b61f38f1 100644
--- a/include/net/netdev_rx_queue.h
+++ b/include/net/netdev_rx_queue.h
@@ -21,6 +21,7 @@ struct netdev_rx_queue {
 #ifdef CONFIG_XDP_SOCKETS
 	struct xsk_buff_pool		*pool;
 #endif
+	struct page_pool		*page_pool;
 } ____cacheline_aligned_in_smp;
 
 /*
diff --git a/net/core/dev.c b/net/core/dev.c
index ed827b443d48..06fb568427c4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -153,6 +153,8 @@
 #include <linux/prandom.h>
 #include <linux/once_lite.h>
 #include <net/netdev_rx_queue.h>
+#include <net/page_pool/types.h>
+#include <net/page_pool/helpers.h>
 
 #include "dev.h"
 #include "net-sysfs.h"
@@ -4964,6 +4966,7 @@ static int netif_skb_check_for_generic_xdp(struct sk_buff **pskb,
 	 */
 	if (skb_cloned(skb) || skb_shinfo(skb)->nr_frags ||
 	    skb_headroom(skb) < XDP_PACKET_HEADROOM) {
+		struct netdev_rx_queue *rxq = netif_get_rxqueue(skb);
 		u32 mac_len = skb->data - skb_mac_header(skb);
 		u32 size, truesize, len, max_head_size, off;
 		struct sk_buff *nskb;
@@ -4978,18 +4981,19 @@ static int netif_skb_check_for_generic_xdp(struct sk_buff **pskb,
 
 		size = min_t(u32, skb->len, max_head_size);
 		truesize = SKB_HEAD_ALIGN(size) + XDP_PACKET_HEADROOM;
-		data = napi_alloc_frag(truesize);
+		data = page_pool_dev_alloc_va(rxq->page_pool, &truesize);
 		if (!data)
 			return -ENOMEM;
 
 		nskb = napi_build_skb(data, truesize);
 		if (!nskb) {
-			skb_free_frag(data);
+			page_pool_free_va(rxq->page_pool, data, true);
 			return -ENOMEM;
 		}
 
 		skb_reserve(nskb, XDP_PACKET_HEADROOM);
 		skb_copy_header(nskb, skb);
+		skb_mark_for_recycle(nskb);
 
 		err = skb_copy_bits(skb, 0, nskb->data, size);
 		if (err) {
@@ -5005,18 +5009,21 @@ static int netif_skb_check_for_generic_xdp(struct sk_buff **pskb,
 		len = skb->len - off;
 		for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) {
 			struct page *page;
+			u32 page_off;
 
 			size = min_t(u32, len, PAGE_SIZE);
-			data = napi_alloc_frag(size);
+			truesize = size;
+			page = page_pool_dev_alloc(rxq->page_pool, &page_off,
+						   &truesize);
 			if (!data) {
 				consume_skb(nskb);
 				return -ENOMEM;
 			}
 
-			page = virt_to_head_page(data);
-			skb_add_rx_frag(nskb, i, page,
-					data - page_address(page), size, size);
-			err = skb_copy_bits(skb, off, data, size);
+			skb_add_rx_frag(nskb, i, page, page_off, size, truesize);
+			err = skb_copy_bits(skb, off,
+					    page_address(page) + page_off,
+					    size);
 			if (err) {
 				consume_skb(nskb);
 				return err;
@@ -10057,6 +10064,11 @@ EXPORT_SYMBOL(netif_stacked_transfer_operstate);
 static int netif_alloc_rx_queues(struct net_device *dev)
 {
 	unsigned int i, count = dev->num_rx_queues;
+	struct page_pool_params page_pool_params = {
+		.pool_size = 256,
+		.nid = NUMA_NO_NODE,
+		.dev = &dev->dev,
+	};
 	struct netdev_rx_queue *rx;
 	size_t sz = count * sizeof(*rx);
 	int err = 0;
@@ -10075,14 +10087,25 @@ static int netif_alloc_rx_queues(struct net_device *dev)
 		/* XDP RX-queue setup */
 		err = xdp_rxq_info_reg(&rx[i].xdp_rxq, dev, i, 0);
 		if (err < 0)
-			goto err_rxq_info;
+			goto err_rxq;
+
+		/* rx queue page pool allocator */
+		rx[i].page_pool = page_pool_create(&page_pool_params);
+		if (IS_ERR(rx[i].page_pool)) {
+			rx[i].page_pool = NULL;
+			goto err_rxq;
+		}
 	}
 	return 0;
 
-err_rxq_info:
+err_rxq:
 	/* Rollback successful reg's and free other resources */
-	while (i--)
+	while (i--) {
 		xdp_rxq_info_unreg(&rx[i].xdp_rxq);
+		if (rx[i].page_pool)
+			page_pool_destroy(rx[i].page_pool);
+	}
+
 	kvfree(dev->_rx);
 	dev->_rx = NULL;
 	return err;
@@ -10096,8 +10119,11 @@ static void netif_free_rx_queues(struct net_device *dev)
 	if (!dev->_rx)
 		return;
 
-	for (i = 0; i < count; i++)
+	for (i = 0; i < count; i++) {
 		xdp_rxq_info_unreg(&dev->_rx[i].xdp_rxq);
+		if (dev->_rx[i].page_pool)
+			page_pool_destroy(dev->_rx[i].page_pool);
+	}
 
 	kvfree(dev->_rx);
 }
-- 
2.43.0

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v3 net-next 2/2] xdp: add multi-buff support for xdp running in generic mode
2023-12-05 23:08 ` Lorenzo Bianconi
@ 2023-12-05 23:58 ` Jakub Kicinski
2023-12-06 12:41 ` Jesper Dangaard Brouer
0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2023-12-05 23:58 UTC (permalink / raw)
To: Lorenzo Bianconi
Cc: aleksander.lobakin, netdev, davem, edumazet, pabeni,
    lorenzo.bianconi, bpf, hawk, toke, willemdebruijn.kernel, jasowang, sdf

On Wed, 6 Dec 2023 00:08:15 +0100 Lorenzo Bianconi wrote:
> v00 (NS:ns0 - 192.168.0.1/24) <---> (NS:ns1 - 192.168.0.2/24) v01 ==(XDP_REDIRECT)==>
> v10 (NS:ns1 - 192.168.1.1/24) <---> (NS:ns2 - 192.168.1.2/24) v11
>
> - v00: iperf3 client (pinned on core 0)
> - v11: iperf3 server (pinned on core 7)
>
> net-next veth codebase (page_pool APIs):
> ========================================
> - MTU 1500:  ~  5.42 Gbps
> - MTU 8000:  ~ 14.1 Gbps
> - MTU 64000: ~ 18.4 Gbps
>
> net-next veth codebase + page_frag_cache APIs [0]:
> ==================================================
> - MTU 1500:  ~  6.62 Gbps
> - MTU 8000:  ~ 14.7 Gbps
> - MTU 64000: ~ 19.7 Gbps
>
> xdp_generic codebase + page_frag_cache APIs (current proposed patch):
> =====================================================================
> - MTU 1500:  ~  6.41 Gbps
> - MTU 8000:  ~ 14.2 Gbps
> - MTU 64000: ~ 19.8 Gbps
>
> xdp_generic codebase + page_frag_cache APIs [1]:
> ================================================

This one should say page pool?

> - MTU 1500:  ~  5.75 Gbps
> - MTU 8000:  ~ 15.3 Gbps
> - MTU 64000: ~ 21.2 Gbps
>
> It seems page_pool APIs are working better for the xdp_generic codebase
> (except the MTU 1500 case) while page_frag_cache APIs are better for the
> veth driver. What do you think? Am I missing something?

IDK the details of veth XDP very well but IIUC they are pretty much
the same. Are there any clues in perf -C 0 / 7?

> [0] Here I have just used napi_alloc_frag() instead of
>     page_pool_dev_alloc_va()/page_pool_dev_alloc() in
>     veth_convert_skb_to_xdp_buff()
>
> [1] I developed this PoC to use page_pool APIs for the xdp_generic code:

Why not put the page pool in softnet_data?

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3 net-next 2/2] xdp: add multi-buff support for xdp running in generic mode
2023-12-05 23:58 ` Jakub Kicinski
@ 2023-12-06 12:41 ` Jesper Dangaard Brouer
2023-12-06 13:51 ` Lorenzo Bianconi
2023-12-06 16:03 ` Jakub Kicinski
0 siblings, 2 replies; 14+ messages in thread
From: Jesper Dangaard Brouer @ 2023-12-06 12:41 UTC (permalink / raw)
To: Jakub Kicinski, Lorenzo Bianconi
Cc: aleksander.lobakin, netdev, davem, edumazet, pabeni,
    lorenzo.bianconi, bpf, toke, willemdebruijn.kernel, jasowang, sdf

On 12/6/23 00:58, Jakub Kicinski wrote:
> On Wed, 6 Dec 2023 00:08:15 +0100 Lorenzo Bianconi wrote:
>> v00 (NS:ns0 - 192.168.0.1/24) <---> (NS:ns1 - 192.168.0.2/24) v01 ==(XDP_REDIRECT)==>
>> v10 (NS:ns1 - 192.168.1.1/24) <---> (NS:ns2 - 192.168.1.2/24) v11
>>
>> - v00: iperf3 client (pinned on core 0)
>> - v11: iperf3 server (pinned on core 7)
>>
>> net-next veth codebase (page_pool APIs):
>> ========================================
>> - MTU 1500:  ~  5.42 Gbps
>> - MTU 8000:  ~ 14.1 Gbps
>> - MTU 64000: ~ 18.4 Gbps
>>
>> net-next veth codebase + page_frag_cache APIs [0]:
>> ==================================================
>> - MTU 1500:  ~  6.62 Gbps
>> - MTU 8000:  ~ 14.7 Gbps
>> - MTU 64000: ~ 19.7 Gbps
>>
>> xdp_generic codebase + page_frag_cache APIs (current proposed patch):
>> =====================================================================
>> - MTU 1500:  ~  6.41 Gbps
>> - MTU 8000:  ~ 14.2 Gbps
>> - MTU 64000: ~ 19.8 Gbps
>>
>> xdp_generic codebase + page_frag_cache APIs [1]:
>> ================================================
>
> This one should say page pool?
>
>> - MTU 1500:  ~  5.75 Gbps
>> - MTU 8000:  ~ 15.3 Gbps
>> - MTU 64000: ~ 21.2 Gbps
>>
>> It seems page_pool APIs are working better for the xdp_generic codebase
>> (except the MTU 1500 case) while page_frag_cache APIs are better for the
>> veth driver. What do you think? Am I missing something?
>
> IDK the details of veth XDP very well but IIUC they are pretty much
> the same. Are there any clues in perf -C 0 / 7?
>
>> [0] Here I have just used napi_alloc_frag() instead of
>>     page_pool_dev_alloc_va()/page_pool_dev_alloc() in
>>     veth_convert_skb_to_xdp_buff()
>>
>> [1] I developed this PoC to use page_pool APIs for the xdp_generic code:
>
> Why not put the page pool in softnet_data?

First I thought cool that Jakub is suggesting softnet_data, which will
make page_pool (PP) even more central as the netstack's memory layer.

BUT then I realized that PP has a weakness, which is the return/free
path that needs to take a normal spin_lock, as that can be called from
any CPU (unlike the RX/alloc case). Thus, I fear that making multiple
devices share a page_pool via softnet_data increases the chance of lock
contention when packets are "freed"/returned/recycled.

--Jesper

p.s. PP has the page_pool_put_page_bulk() API, but only XDP (NIC-drivers)
leverage this.

^ permalink raw reply [flat|nested] 14+ messages in thread
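For reference, the amortization the p.s. hints at looks roughly like the sketch
below: batching returns through page_pool_put_page_bulk() so the ptr_ring
producer lock is taken once per batch instead of once per page. The helper and
the batch size are made up for illustration, only the bulk API itself is real.

#include <net/page_pool/types.h>

#define PP_BULK	16

/* Illustrative helper: return @n pages belonging to @pool in batches so
 * the internal ptr_ring spinlock is only grabbed once per PP_BULK pages.
 */
static void pp_return_bulk(struct page_pool *pool, struct page **pages, int n)
{
	void *data[PP_BULK];
	int i, cnt = 0;

	for (i = 0; i < n; i++) {
		data[cnt++] = pages[i];
		if (cnt == PP_BULK) {
			page_pool_put_page_bulk(pool, data, cnt);
			cnt = 0;
		}
	}
	if (cnt)
		page_pool_put_page_bulk(pool, data, cnt);
}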
* Re: [PATCH v3 net-next 2/2] xdp: add multi-buff support for xdp running in generic mode
2023-12-06 12:41 ` Jesper Dangaard Brouer
@ 2023-12-06 13:51 ` Lorenzo Bianconi
0 siblings, 0 replies; 14+ messages in thread
From: Lorenzo Bianconi @ 2023-12-06 13:51 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Jakub Kicinski, aleksander.lobakin, netdev, davem, edumazet,
    pabeni, lorenzo.bianconi, bpf, toke, willemdebruijn.kernel, jasowang, sdf

[-- Attachment #1: Type: text/plain, Size: 2942 bytes --]

>
>
> On 12/6/23 00:58, Jakub Kicinski wrote:
> > On Wed, 6 Dec 2023 00:08:15 +0100 Lorenzo Bianconi wrote:
> > > v00 (NS:ns0 - 192.168.0.1/24) <---> (NS:ns1 - 192.168.0.2/24) v01 ==(XDP_REDIRECT)==>
> > > v10 (NS:ns1 - 192.168.1.1/24) <---> (NS:ns2 - 192.168.1.2/24) v11
> > >
> > > - v00: iperf3 client (pinned on core 0)
> > > - v11: iperf3 server (pinned on core 7)
> > >
> > > net-next veth codebase (page_pool APIs):
> > > ========================================
> > > - MTU 1500:  ~  5.42 Gbps
> > > - MTU 8000:  ~ 14.1 Gbps
> > > - MTU 64000: ~ 18.4 Gbps
> > >
> > > net-next veth codebase + page_frag_cache APIs [0]:
> > > ==================================================
> > > - MTU 1500:  ~  6.62 Gbps
> > > - MTU 8000:  ~ 14.7 Gbps
> > > - MTU 64000: ~ 19.7 Gbps
> > >
> > > xdp_generic codebase + page_frag_cache APIs (current proposed patch):
> > > =====================================================================
> > > - MTU 1500:  ~  6.41 Gbps
> > > - MTU 8000:  ~ 14.2 Gbps
> > > - MTU 64000: ~ 19.8 Gbps
> > >
> > > xdp_generic codebase + page_frag_cache APIs [1]:
> > > ================================================
> >
> > This one should say page pool?

yep, sorry

> >
> > > - MTU 1500:  ~  5.75 Gbps
> > > - MTU 8000:  ~ 15.3 Gbps
> > > - MTU 64000: ~ 21.2 Gbps
> > >
> > > It seems page_pool APIs are working better for the xdp_generic codebase
> > > (except the MTU 1500 case) while page_frag_cache APIs are better for the
> > > veth driver. What do you think? Am I missing something?
> >
> > IDK the details of veth XDP very well but IIUC they are pretty much
> > the same. Are there any clues in perf -C 0 / 7?
> >
> > > [0] Here I have just used napi_alloc_frag() instead of
> > >     page_pool_dev_alloc_va()/page_pool_dev_alloc() in
> > >     veth_convert_skb_to_xdp_buff()
> > >
> > > [1] I developed this PoC to use page_pool APIs for the xdp_generic code:
> >
> > Why not put the page pool in softnet_data?
>
> First I thought cool that Jakub is suggesting softnet_data, which will
> make page_pool (PP) even more central as the netstack's memory layer.
>
> BUT then I realized that PP has a weakness, which is the return/free
> path that needs to take a normal spin_lock, as that can be called from
> any CPU (unlike the RX/alloc case). Thus, I fear that making multiple
> devices share a page_pool via softnet_data increases the chance of lock
> contention when packets are "freed"/returned/recycled.

yep, afaik skb_attempt_defer_free() is used just by the tcp stack so far
(e.g. we will have contention for udp). Moreover it seems the page_pool
return path is not so optimized for the percpu approach (we have a lot of
atomic read/write operations and page_pool stats are already implemented
as percpu variables).

Regards,
Lorenzo

>
> --Jesper
>
> p.s. PP has the page_pool_put_page_bulk() API, but only XDP (NIC-drivers)
> leverage this.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3 net-next 2/2] xdp: add multi-buff support for xdp running in generic mode
2023-12-06 12:41 ` Jesper Dangaard Brouer
2023-12-06 13:51 ` Lorenzo Bianconi
@ 2023-12-06 16:03 ` Jakub Kicinski
2023-12-09 19:23 ` Lorenzo Bianconi
1 sibling, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2023-12-06 16:03 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Lorenzo Bianconi, aleksander.lobakin, netdev, davem, edumazet,
    pabeni, lorenzo.bianconi, bpf, toke, willemdebruijn.kernel, jasowang, sdf

On Wed, 6 Dec 2023 13:41:49 +0100 Jesper Dangaard Brouer wrote:
> BUT then I realized that PP has a weakness, which is the return/free
> path that needs to take a normal spin_lock, as that can be called from
> any CPU (unlike the RX/alloc case). Thus, I fear that making multiple
> devices share a page_pool via softnet_data increases the chance of lock
> contention when packets are "freed"/returned/recycled.

I was thinking we can add a pcpu CPU ID to page pool so that
napi_pp_put_page() has a chance to realize that it's on the "right CPU"
and feed the cache directly.

^ permalink raw reply [flat|nested] 14+ messages in thread
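Roughly, that idea could look like the sketch below. The cpuid field and both
helpers are hypothetical; only page_pool_put_full_page() and the
napi_pp_put_page() path it would hook into exist today.

/* Hypothetical sketch: recycle directly (lockless) only when running in
 * softirq context on the CPU that owns this per-CPU pool, otherwise fall
 * back to the locked ptr_ring producer path.
 */
static bool page_pool_cpu_local(const struct page_pool *pool)
{
	/* pool->cpuid does not exist today, it is the "pcpu CPU ID" above */
	return !in_hardirq() && !irqs_disabled() &&
	       READ_ONCE(pool->cpuid) == smp_processor_id();
}

/* Simplified stand-in for the recycling step in napi_pp_put_page() */
static bool napi_pp_put_page_sketch(struct page *page)
{
	struct page_pool *pp = page->pp;

	page_pool_put_full_page(pp, page, page_pool_cpu_local(pp));
	return true;
}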
* Re: [PATCH v3 net-next 2/2] xdp: add multi-buff support for xdp running in generic mode
2023-12-06 16:03 ` Jakub Kicinski
@ 2023-12-09 19:23 ` Lorenzo Bianconi
2023-12-11 17:00 ` Jakub Kicinski
0 siblings, 1 reply; 14+ messages in thread
From: Lorenzo Bianconi @ 2023-12-09 19:23 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Jesper Dangaard Brouer, aleksander.lobakin, netdev, davem, edumazet,
    pabeni, lorenzo.bianconi, bpf, toke, willemdebruijn.kernel, jasowang, sdf

[-- Attachment #1: Type: text/plain, Size: 976 bytes --]

> On Wed, 6 Dec 2023 13:41:49 +0100 Jesper Dangaard Brouer wrote:
> > BUT then I realized that PP has a weakness, which is the return/free
> > path that needs to take a normal spin_lock, as that can be called from
> > any CPU (unlike the RX/alloc case). Thus, I fear that making multiple
> > devices share a page_pool via softnet_data increases the chance of lock
> > contention when packets are "freed"/returned/recycled.
>
> I was thinking we can add a pcpu CPU ID to page pool so that
> napi_pp_put_page() has a chance to realize that it's on the "right CPU"
> and feed the cache directly.

Are we going to use these page_pools just for virtual devices (e.g. veth) or
even for hw NICs? If we do not bind the page_pool to a netdevice I think we
can't rely on it to DMA map/unmap the buffer, right?

Moreover, are we going to rework page_pool stats first? It seems a bit weird to
have a percpu struct with a percpu pointer in it, right?

Regards,
Lorenzo

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3 net-next 2/2] xdp: add multi-buff support for xdp running in generic mode
2023-12-09 19:23 ` Lorenzo Bianconi
@ 2023-12-11 17:00 ` Jakub Kicinski
2023-12-12 8:36 ` Paolo Abeni
0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2023-12-11 17:00 UTC (permalink / raw)
To: Lorenzo Bianconi
Cc: Jesper Dangaard Brouer, aleksander.lobakin, netdev, davem, edumazet,
    pabeni, lorenzo.bianconi, bpf, toke, willemdebruijn.kernel, jasowang, sdf

On Sat, 9 Dec 2023 20:23:09 +0100 Lorenzo Bianconi wrote:
> Are we going to use these page_pools just for virtual devices (e.g. veth) or
> even for hw NICs? If we do not bind the page_pool to a netdevice I think we
> can't rely on it to DMA map/unmap the buffer, right?

Right, I don't think it's particularly useful for HW NICs.
Maybe for allocating skb heads? We could possibly kill
struct page_frag_1k and use PP page / frag instead.
But not sure how Eric would react :)

> Moreover, are we going to rework page_pool stats first? It seems a bit weird to
> have a percpu struct with a percpu pointer in it, right?

The per-CPU stuff is for recycling, IIRC. Even if PP is for a single CPU
we can still end up freeing packets which used its pages anywhere in
the system. I don't disagree that we may end up with a lot of stats on
a large system, but that seems tangential to per-CPU page pools.

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3 net-next 2/2] xdp: add multi-buff support for xdp running in generic mode
2023-12-11 17:00 ` Jakub Kicinski
@ 2023-12-12 8:36 ` Paolo Abeni
0 siblings, 0 replies; 14+ messages in thread
From: Paolo Abeni @ 2023-12-12 8:36 UTC (permalink / raw)
To: Jakub Kicinski, Lorenzo Bianconi
Cc: Jesper Dangaard Brouer, aleksander.lobakin, netdev, davem, edumazet,
    lorenzo.bianconi, bpf, toke, willemdebruijn.kernel, jasowang, sdf

On Mon, 2023-12-11 at 09:00 -0800, Jakub Kicinski wrote:
> On Sat, 9 Dec 2023 20:23:09 +0100 Lorenzo Bianconi wrote:
> > Are we going to use these page_pools just for virtual devices (e.g. veth) or
> > even for hw NICs? If we do not bind the page_pool to a netdevice I think we
> > can't rely on it to DMA map/unmap the buffer, right?
>
> Right, I don't think it's particularly useful for HW NICs.
> Maybe for allocating skb heads? We could possibly kill
> struct page_frag_1k and use PP page / frag instead.
> But not sure how Eric would react :)

Side note here: we have a dedicated kmem_cache for typical skb head
allocation since commit bf9f1baa279f0758dc2297080360c5a616843927 - where
Eric mentioned we could possibly remove the page_frag_1k after that (on
my todo list since forever, sorry).

Cheers,

Paolo

^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2023-12-12 8:36 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-01 13:48 [PATCH v3 net-next 0/2] add multi-buff support for xdp running in generic mode Lorenzo Bianconi
2023-12-01 13:48 ` [PATCH v3 net-next 1/2] xdp: rely on skb pointer reference in do_xdp_generic and netif_receive_generic_xdp Lorenzo Bianconi
2023-12-01 13:48 ` [PATCH v3 net-next 2/2] xdp: add multi-buff support for xdp running in generic mode Lorenzo Bianconi
2023-12-02 3:48 ` Jakub Kicinski
2023-12-04 15:43 ` Lorenzo Bianconi
2023-12-04 20:01 ` Jakub Kicinski
2023-12-05 23:08 ` Lorenzo Bianconi
2023-12-05 23:58 ` Jakub Kicinski
2023-12-06 12:41 ` Jesper Dangaard Brouer
2023-12-06 13:51 ` Lorenzo Bianconi
2023-12-06 16:03 ` Jakub Kicinski
2023-12-09 19:23 ` Lorenzo Bianconi
2023-12-11 17:00 ` Jakub Kicinski
2023-12-12 8:36 ` Paolo Abeni