* [PATCH net 0/3] net: fec: fix some issues of ndo_xdp_xmit()
@ 2023-07-04 8:29 wei.fang
2023-07-04 8:29 ` [PATCH net 1/3] net: fec: dynamically set the NETDEV_XDP_ACT_NDO_XMIT feature of XDP wei.fang
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: wei.fang @ 2023-07-04 8:29 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, ast, daniel, hawk, john.fastabend,
shenwei.wang, xiaoning.wang, netdev
Cc: linux-imx, linux-kernel, bpf
From: Wei Fang <wei.fang@nxp.com>
We encountered some issues when testing the ndo_xdp_xmit() interface
of the fec driver on i.MX8MP and i.MX93 platforms. These issues are
easy to reproduce, and the specific reproduction steps are as follows.
step1: The ethernet port of a board (board A) is connected to the EQOS
port of i.MX8MP/i.MX93, and the FEC port of i.MX8MP/i.MX93 is connected
to another ethernet port, such as a switch port.
step2: Board A uses the pktgen_sample03_burst_single_flow.sh to generate
and send packets to i.MX8MP/i.MX93. The command is shown below.
./pktgen_sample03_burst_single_flow.sh -i eth0 -d 192.168.6.8 -m \
56:bf:0d:68:b0:9e -s 1500
step3: i.MX8MP/i.MX93 uses the xdp_redirect bpf program to redirect the
XDP frames from EQOS port to FEC port. The command is shown below.
./xdp_redirect eth1 eth0
After a few moments, warning or error logs will be printed to the
console; for more details, please refer to the commit message of each
patch.
Wei Fang (3):
net: fec: dynamically set the NETDEV_XDP_ACT_NDO_XMIT feature of XDP
net: fec: recycle pages for transmitted XDP frames
net: fec: increase the size of tx ring and update thresholds of tx
ring
drivers/net/ethernet/freescale/fec.h | 17 ++-
drivers/net/ethernet/freescale/fec_main.c | 168 +++++++++++++++-------
2 files changed, 128 insertions(+), 57 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 18+ messages in thread* [PATCH net 1/3] net: fec: dynamically set the NETDEV_XDP_ACT_NDO_XMIT feature of XDP 2023-07-04 8:29 [PATCH net 0/3] net: fec: fix some issues of ndo_xdp_xmit() wei.fang @ 2023-07-04 8:29 ` wei.fang 2023-07-04 23:41 ` Andrew Lunn 2023-07-04 8:29 ` [PATCH net 2/3] net: fec: recycle pages for transmitted XDP frames wei.fang 2023-07-04 8:29 ` [PATCH net 3/3] net: fec: increase the size of tx ring and update thresholds of tx ring wei.fang 2 siblings, 1 reply; 18+ messages in thread From: wei.fang @ 2023-07-04 8:29 UTC (permalink / raw) To: davem, edumazet, kuba, pabeni, ast, daniel, hawk, john.fastabend, shenwei.wang, xiaoning.wang, netdev Cc: linux-imx, linux-kernel, bpf From: Wei Fang <wei.fang@nxp.com> When a XDP program is installed or uninstalled, fec_restart() will be invoked to reset MAC and buffer descriptor rings. It's reasonable not to transmit any packet during the process of reset. However, the NETDEV_XDP_ACT_NDO_XMIT bit of xdp_features is enabled by default, that is to say, it's possible that the fec_enet_xdp_xmit() will be invoked even if the process of reset is not finished. In this case, the redirected XDP frames might be dropped and available transmit BDs may be incorrectly deemed insufficient. So this patch disable the NETDEV_XDP_ACT_NDO_XMIT feature by default and dynamically configure this feature when the bpf program is installed or uninstalled. 
Fixes: e4ac7cc6e5a4 ("net: fec: turn on XDP features") Signed-off-by: Wei Fang <wei.fang@nxp.com> --- drivers/net/ethernet/freescale/fec_main.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index 8fbe47703d47..9ce0319b33c3 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -3732,12 +3732,18 @@ static int fec_enet_bpf(struct net_device *dev, struct netdev_bpf *bpf) if (fep->quirks & FEC_QUIRK_SWAP_FRAME) return -EOPNOTSUPP; + if (!bpf->prog) + xdp_features_clear_redirect_target(dev); + if (is_run) { napi_disable(&fep->napi); netif_tx_disable(dev); } old_prog = xchg(&fep->xdp_prog, bpf->prog); + if (old_prog) + bpf_prog_put(old_prog); + fec_restart(dev); if (is_run) { @@ -3745,8 +3751,8 @@ static int fec_enet_bpf(struct net_device *dev, struct netdev_bpf *bpf) netif_tx_start_all_queues(dev); } - if (old_prog) - bpf_prog_put(old_prog); + if (bpf->prog) + xdp_features_set_redirect_target(dev, false); return 0; @@ -4016,8 +4022,7 @@ static int fec_enet_init(struct net_device *ndev) if (!(fep->quirks & FEC_QUIRK_SWAP_FRAME)) ndev->xdp_features = NETDEV_XDP_ACT_BASIC | - NETDEV_XDP_ACT_REDIRECT | - NETDEV_XDP_ACT_NDO_XMIT; + NETDEV_XDP_ACT_REDIRECT; fec_restart(ndev); -- 2.25.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH net 1/3] net: fec: dynamically set the NETDEV_XDP_ACT_NDO_XMIT feature of XDP 2023-07-04 8:29 ` [PATCH net 1/3] net: fec: dynamically set the NETDEV_XDP_ACT_NDO_XMIT feature of XDP wei.fang @ 2023-07-04 23:41 ` Andrew Lunn 2023-07-04 23:54 ` Toke Høiland-Jørgensen 0 siblings, 1 reply; 18+ messages in thread From: Andrew Lunn @ 2023-07-04 23:41 UTC (permalink / raw) To: wei.fang Cc: davem, edumazet, kuba, pabeni, ast, daniel, hawk, john.fastabend, shenwei.wang, xiaoning.wang, netdev, linux-imx, linux-kernel, bpf On Tue, Jul 04, 2023 at 04:29:14PM +0800, wei.fang@nxp.com wrote: > From: Wei Fang <wei.fang@nxp.com> > > When a XDP program is installed or uninstalled, fec_restart() will > be invoked to reset MAC and buffer descriptor rings. It's reasonable > not to transmit any packet during the process of reset. However, the > NETDEV_XDP_ACT_NDO_XMIT bit of xdp_features is enabled by default, > that is to say, it's possible that the fec_enet_xdp_xmit() will be > invoked even if the process of reset is not finished. In this case, > the redirected XDP frames might be dropped and available transmit BDs > may be incorrectly deemed insufficient. So this patch disable the > NETDEV_XDP_ACT_NDO_XMIT feature by default and dynamically configure > this feature when the bpf program is installed or uninstalled. I don't know much about XDP, so please excuse what might be a stupid question. Is this a generic issue? Should this xdp_features_clear_redirect_target(dev) / xdp_features_set_redirect_target(dev, false) be done in the core? Andrew ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net 1/3] net: fec: dynamically set the NETDEV_XDP_ACT_NDO_XMIT feature of XDP 2023-07-04 23:41 ` Andrew Lunn @ 2023-07-04 23:54 ` Toke Høiland-Jørgensen 2023-07-05 6:31 ` Wei Fang 0 siblings, 1 reply; 18+ messages in thread From: Toke Høiland-Jørgensen @ 2023-07-04 23:54 UTC (permalink / raw) To: Andrew Lunn, wei.fang Cc: davem, edumazet, kuba, pabeni, ast, daniel, hawk, john.fastabend, shenwei.wang, xiaoning.wang, netdev, linux-imx, linux-kernel, bpf Andrew Lunn <andrew@lunn.ch> writes: > On Tue, Jul 04, 2023 at 04:29:14PM +0800, wei.fang@nxp.com wrote: >> From: Wei Fang <wei.fang@nxp.com> >> >> When a XDP program is installed or uninstalled, fec_restart() will >> be invoked to reset MAC and buffer descriptor rings. It's reasonable >> not to transmit any packet during the process of reset. However, the >> NETDEV_XDP_ACT_NDO_XMIT bit of xdp_features is enabled by default, >> that is to say, it's possible that the fec_enet_xdp_xmit() will be >> invoked even if the process of reset is not finished. In this case, >> the redirected XDP frames might be dropped and available transmit BDs >> may be incorrectly deemed insufficient. So this patch disable the >> NETDEV_XDP_ACT_NDO_XMIT feature by default and dynamically configure >> this feature when the bpf program is installed or uninstalled. > > I don't know much about XDP, so please excuse what might be a stupid > question. > > Is this a generic issue? Should this > xdp_features_clear_redirect_target(dev) / > xdp_features_set_redirect_target(dev, false) be done in the core? No, because not all drivers require an XDP program to be attached to support being a redirect target (which is one of the reasons we introduced these feature bits in the first place :)). -Toke ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH net 1/3] net: fec: dynamically set the NETDEV_XDP_ACT_NDO_XMIT feature of XDP 2023-07-04 23:54 ` Toke Høiland-Jørgensen @ 2023-07-05 6:31 ` Wei Fang 0 siblings, 0 replies; 18+ messages in thread From: Wei Fang @ 2023-07-05 6:31 UTC (permalink / raw) To: Toke Høiland-Jørgensen, Andrew Lunn Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, ast@kernel.org, daniel@iogearbox.net, hawk@kernel.org, john.fastabend@gmail.com, Shenwei Wang, Clark Wang, netdev@vger.kernel.org, dl-linux-imx, linux-kernel@vger.kernel.org, bpf@vger.kernel.org > -----Original Message----- > From: Toke Høiland-Jørgensen <toke@redhat.com> > Sent: 2023年7月5日 7:54 > To: Andrew Lunn <andrew@lunn.ch>; Wei Fang <wei.fang@nxp.com> > Cc: davem@davemloft.net; edumazet@google.com; kuba@kernel.org; > pabeni@redhat.com; ast@kernel.org; daniel@iogearbox.net; > hawk@kernel.org; john.fastabend@gmail.com; Shenwei Wang > <shenwei.wang@nxp.com>; Clark Wang <xiaoning.wang@nxp.com>; > netdev@vger.kernel.org; dl-linux-imx <linux-imx@nxp.com>; > linux-kernel@vger.kernel.org; bpf@vger.kernel.org > Subject: Re: [PATCH net 1/3] net: fec: dynamically set the > NETDEV_XDP_ACT_NDO_XMIT feature of XDP > > Andrew Lunn <andrew@lunn.ch> writes: > > > On Tue, Jul 04, 2023 at 04:29:14PM +0800, wei.fang@nxp.com wrote: > >> From: Wei Fang <wei.fang@nxp.com> > >> > >> When a XDP program is installed or uninstalled, fec_restart() will be > >> invoked to reset MAC and buffer descriptor rings. It's reasonable not > >> to transmit any packet during the process of reset. However, the > >> NETDEV_XDP_ACT_NDO_XMIT bit of xdp_features is enabled by default, > >> that is to say, it's possible that the fec_enet_xdp_xmit() will be > >> invoked even if the process of reset is not finished. In this case, > >> the redirected XDP frames might be dropped and available transmit BDs > >> may be incorrectly deemed insufficient. 
So this patch disable the > >> NETDEV_XDP_ACT_NDO_XMIT feature by default and dynamically > configure > >> this feature when the bpf program is installed or uninstalled. > > > > I don't know much about XDP, so please excuse what might be a stupid > > question. > > > > Is this a generic issue? Should this > > xdp_features_clear_redirect_target(dev) / > > xdp_features_set_redirect_target(dev, false) be done in the core? > > No, because not all drivers require an XDP program to be attached to support > being a redirect target (which is one of the reasons we introduced these > feature bits in the first place :)). > Hi Toke, Thanks for your explanation so much. :) ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH net 2/3] net: fec: recycle pages for transmitted XDP frames 2023-07-04 8:29 [PATCH net 0/3] net: fec: fix some issues of ndo_xdp_xmit() wei.fang 2023-07-04 8:29 ` [PATCH net 1/3] net: fec: dynamically set the NETDEV_XDP_ACT_NDO_XMIT feature of XDP wei.fang @ 2023-07-04 8:29 ` wei.fang 2023-07-04 23:48 ` Andrew Lunn 2023-07-04 8:29 ` [PATCH net 3/3] net: fec: increase the size of tx ring and update thresholds of tx ring wei.fang 2 siblings, 1 reply; 18+ messages in thread From: wei.fang @ 2023-07-04 8:29 UTC (permalink / raw) To: davem, edumazet, kuba, pabeni, ast, daniel, hawk, john.fastabend, shenwei.wang, xiaoning.wang, netdev Cc: linux-imx, linux-kernel, bpf From: Wei Fang <wei.fang@nxp.com> Once the XDP frames have been successfully transmitted through the ndo_xdp_xmit() interface, it's the driver responsibility to free the frames so that the page_pool can recycle the pages and reuse them. However, this action is not implemented in the fec driver. This leads to a user-visible problem that the console will print the following warning log. [ 157.568851] page_pool_release_retry() stalled pool shutdown 1389 inflight 60 sec [ 217.983446] page_pool_release_retry() stalled pool shutdown 1389 inflight 120 sec [ 278.399006] page_pool_release_retry() stalled pool shutdown 1389 inflight 181 sec [ 338.812885] page_pool_release_retry() stalled pool shutdown 1389 inflight 241 sec [ 399.226946] page_pool_release_retry() stalled pool shutdown 1389 inflight 302 sec Therefore, to solve this issue, we free XDP frames via xdp_return_frame() while cleaning the tx BD ring. 
Fixes: 6d6b39f180b8 ("net: fec: add initial XDP support") Signed-off-by: Wei Fang <wei.fang@nxp.com> --- drivers/net/ethernet/freescale/fec.h | 15 ++- drivers/net/ethernet/freescale/fec_main.c | 148 +++++++++++++++------- 2 files changed, 115 insertions(+), 48 deletions(-) diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h index 9939ccafb556..8c0226d061fe 100644 --- a/drivers/net/ethernet/freescale/fec.h +++ b/drivers/net/ethernet/freescale/fec.h @@ -544,10 +544,23 @@ enum { XDP_STATS_TOTAL, }; +enum fec_txbuf_type { + FEC_TXBUF_T_SKB, + FEC_TXBUF_T_XDP_NDO, +}; + +struct fec_tx_buffer { + union { + struct sk_buff *skb; + struct xdp_frame *xdp; + }; + enum fec_txbuf_type type; +}; + struct fec_enet_priv_tx_q { struct bufdesc_prop bd; unsigned char *tx_bounce[TX_RING_SIZE]; - struct sk_buff *tx_skbuff[TX_RING_SIZE]; + struct fec_tx_buffer tx_buf[TX_RING_SIZE]; unsigned short tx_stop_threshold; unsigned short tx_wake_threshold; diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index 9ce0319b33c3..940d3afe1d24 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -397,7 +397,7 @@ static void fec_dump(struct net_device *ndev) fec16_to_cpu(bdp->cbd_sc), fec32_to_cpu(bdp->cbd_bufaddr), fec16_to_cpu(bdp->cbd_datlen), - txq->tx_skbuff[index]); + txq->tx_buf[index].skb); bdp = fec_enet_get_nextdesc(bdp, &txq->bd); index++; } while (bdp != txq->bd.base); @@ -654,7 +654,7 @@ static int fec_enet_txq_submit_skb(struct fec_enet_priv_tx_q *txq, index = fec_enet_get_bd_index(last_bdp, &txq->bd); /* Save skb pointer */ - txq->tx_skbuff[index] = skb; + txq->tx_buf[index].skb = skb; /* Make sure the updates to rest of the descriptor are performed before * transferring ownership. 
@@ -672,9 +672,7 @@ static int fec_enet_txq_submit_skb(struct fec_enet_priv_tx_q *txq, skb_tx_timestamp(skb); - /* Make sure the update to bdp and tx_skbuff are performed before - * txq->bd.cur. - */ + /* Make sure the update to bdp is performed before txq->bd.cur. */ wmb(); txq->bd.cur = bdp; @@ -862,7 +860,7 @@ static int fec_enet_txq_submit_tso(struct fec_enet_priv_tx_q *txq, } /* Save skb pointer */ - txq->tx_skbuff[index] = skb; + txq->tx_buf[index].skb = skb; skb_tx_timestamp(skb); txq->bd.cur = bdp; @@ -952,16 +950,33 @@ static void fec_enet_bd_init(struct net_device *dev) for (i = 0; i < txq->bd.ring_size; i++) { /* Initialize the BD for every fragment in the page. */ bdp->cbd_sc = cpu_to_fec16(0); - if (bdp->cbd_bufaddr && - !IS_TSO_HEADER(txq, fec32_to_cpu(bdp->cbd_bufaddr))) - dma_unmap_single(&fep->pdev->dev, - fec32_to_cpu(bdp->cbd_bufaddr), - fec16_to_cpu(bdp->cbd_datlen), - DMA_TO_DEVICE); - if (txq->tx_skbuff[i]) { - dev_kfree_skb_any(txq->tx_skbuff[i]); - txq->tx_skbuff[i] = NULL; + if (txq->tx_buf[i].type == FEC_TXBUF_T_SKB) { + if (bdp->cbd_bufaddr && + !IS_TSO_HEADER(txq, fec32_to_cpu(bdp->cbd_bufaddr))) + dma_unmap_single(&fep->pdev->dev, + fec32_to_cpu(bdp->cbd_bufaddr), + fec16_to_cpu(bdp->cbd_datlen), + DMA_TO_DEVICE); + if (txq->tx_buf[i].skb) { + dev_kfree_skb_any(txq->tx_buf[i].skb); + txq->tx_buf[i].skb = NULL; + } + } else { + if (bdp->cbd_bufaddr) + dma_unmap_single(&fep->pdev->dev, + fec32_to_cpu(bdp->cbd_bufaddr), + fec16_to_cpu(bdp->cbd_datlen), + DMA_TO_DEVICE); + + if (txq->tx_buf[i].xdp) { + xdp_return_frame(txq->tx_buf[i].xdp); + txq->tx_buf[i].xdp = NULL; + } + + /* restore default tx buffer type: FEC_TXBUF_T_SKB */ + txq->tx_buf[i].type = FEC_TXBUF_T_SKB; } + bdp->cbd_bufaddr = cpu_to_fec32(0); bdp = fec_enet_get_nextdesc(bdp, &txq->bd); } @@ -1360,6 +1375,7 @@ static void fec_enet_tx_queue(struct net_device *ndev, u16 queue_id) { struct fec_enet_private *fep; + struct xdp_frame *xdpf; struct bufdesc *bdp; unsigned short 
status; struct sk_buff *skb; @@ -1387,16 +1403,31 @@ fec_enet_tx_queue(struct net_device *ndev, u16 queue_id) index = fec_enet_get_bd_index(bdp, &txq->bd); - skb = txq->tx_skbuff[index]; - txq->tx_skbuff[index] = NULL; - if (!IS_TSO_HEADER(txq, fec32_to_cpu(bdp->cbd_bufaddr))) - dma_unmap_single(&fep->pdev->dev, - fec32_to_cpu(bdp->cbd_bufaddr), - fec16_to_cpu(bdp->cbd_datlen), - DMA_TO_DEVICE); - bdp->cbd_bufaddr = cpu_to_fec32(0); - if (!skb) - goto skb_done; + if (txq->tx_buf[index].type == FEC_TXBUF_T_SKB) { + skb = txq->tx_buf[index].skb; + txq->tx_buf[index].skb = NULL; + if (bdp->cbd_bufaddr && + !IS_TSO_HEADER(txq, fec32_to_cpu(bdp->cbd_bufaddr))) + dma_unmap_single(&fep->pdev->dev, + fec32_to_cpu(bdp->cbd_bufaddr), + fec16_to_cpu(bdp->cbd_datlen), + DMA_TO_DEVICE); + bdp->cbd_bufaddr = cpu_to_fec32(0); + if (!skb) + goto tx_buf_done; + } else { + xdpf = txq->tx_buf[index].xdp; + if (bdp->cbd_bufaddr) + dma_unmap_single(&fep->pdev->dev, + fec32_to_cpu(bdp->cbd_bufaddr), + fec16_to_cpu(bdp->cbd_datlen), + DMA_TO_DEVICE); + bdp->cbd_bufaddr = cpu_to_fec32(0); + if (!xdpf) { + txq->tx_buf[index].type = FEC_TXBUF_T_SKB; + goto tx_buf_done; + } + } /* Check for errors. */ if (status & (BD_ENET_TX_HB | BD_ENET_TX_LC | @@ -1415,21 +1446,11 @@ fec_enet_tx_queue(struct net_device *ndev, u16 queue_id) ndev->stats.tx_carrier_errors++; } else { ndev->stats.tx_packets++; - ndev->stats.tx_bytes += skb->len; - } - /* NOTE: SKBTX_IN_PROGRESS being set does not imply it's we who - * are to time stamp the packet, so we still need to check time - * stamping enabled flag. 
- */ - if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS && - fep->hwts_tx_en) && - fep->bufdesc_ex) { - struct skb_shared_hwtstamps shhwtstamps; - struct bufdesc_ex *ebdp = (struct bufdesc_ex *)bdp; - - fec_enet_hwtstamp(fep, fec32_to_cpu(ebdp->ts), &shhwtstamps); - skb_tstamp_tx(skb, &shhwtstamps); + if (txq->tx_buf[index].type == FEC_TXBUF_T_SKB) + ndev->stats.tx_bytes += skb->len; + else + ndev->stats.tx_bytes += xdpf->len; } /* Deferred means some collisions occurred during transmit, @@ -1438,10 +1459,32 @@ fec_enet_tx_queue(struct net_device *ndev, u16 queue_id) if (status & BD_ENET_TX_DEF) ndev->stats.collisions++; - /* Free the sk buffer associated with this last transmit */ - dev_kfree_skb_any(skb); -skb_done: - /* Make sure the update to bdp and tx_skbuff are performed + if (txq->tx_buf[index].type == FEC_TXBUF_T_SKB) { + /* NOTE: SKBTX_IN_PROGRESS being set does not imply it's we who + * are to time stamp the packet, so we still need to check time + * stamping enabled flag. 
+ */ + if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS && + fep->hwts_tx_en) && fep->bufdesc_ex) { + struct skb_shared_hwtstamps shhwtstamps; + struct bufdesc_ex *ebdp = (struct bufdesc_ex *)bdp; + + fec_enet_hwtstamp(fep, fec32_to_cpu(ebdp->ts), &shhwtstamps); + skb_tstamp_tx(skb, &shhwtstamps); + } + + /* Free the sk buffer associated with this last transmit */ + dev_kfree_skb_any(skb); + } else { + xdp_return_frame(xdpf); + + txq->tx_buf[index].xdp = NULL; + /* restore default tx buffer type: FEC_TXBUF_T_SKB */ + txq->tx_buf[index].type = FEC_TXBUF_T_SKB; + } + +tx_buf_done: + /* Make sure the update to bdp and tx_buf are performed * before dirty_tx */ wmb(); @@ -3249,9 +3292,19 @@ static void fec_enet_free_buffers(struct net_device *ndev) for (i = 0; i < txq->bd.ring_size; i++) { kfree(txq->tx_bounce[i]); txq->tx_bounce[i] = NULL; - skb = txq->tx_skbuff[i]; - txq->tx_skbuff[i] = NULL; - dev_kfree_skb(skb); + + if (txq->tx_buf[i].type == FEC_TXBUF_T_SKB) { + skb = txq->tx_buf[i].skb; + txq->tx_buf[i].skb = NULL; + dev_kfree_skb(skb); + } else { + if (txq->tx_buf[i].xdp) { + xdp_return_frame(txq->tx_buf[i].xdp); + txq->tx_buf[i].xdp = NULL; + } + + txq->tx_buf[i].type = FEC_TXBUF_T_SKB; + } } } } @@ -3817,7 +3870,8 @@ static int fec_enet_txq_xmit_frame(struct fec_enet_private *fep, ebdp->cbd_esc = cpu_to_fec32(estatus); } - txq->tx_skbuff[index] = NULL; + txq->tx_buf[index].type = FEC_TXBUF_T_XDP_NDO; + txq->tx_buf[index].xdp = frame; /* Make sure the updates to rest of the descriptor are performed before * transferring ownership. -- 2.25.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH net 2/3] net: fec: recycle pages for transmitted XDP frames 2023-07-04 8:29 ` [PATCH net 2/3] net: fec: recycle pages for transmitted XDP frames wei.fang @ 2023-07-04 23:48 ` Andrew Lunn 2023-07-05 1:56 ` Wei Fang 0 siblings, 1 reply; 18+ messages in thread From: Andrew Lunn @ 2023-07-04 23:48 UTC (permalink / raw) To: wei.fang Cc: davem, edumazet, kuba, pabeni, ast, daniel, hawk, john.fastabend, shenwei.wang, xiaoning.wang, netdev, linux-imx, linux-kernel, bpf > /* Save skb pointer */ > - txq->tx_skbuff[index] = skb; > + txq->tx_buf[index].skb = skb; What about txq->tx_buf[index].type ? > @@ -862,7 +860,7 @@ static int fec_enet_txq_submit_tso(struct fec_enet_priv_tx_q *txq, > } > > /* Save skb pointer */ > - txq->tx_skbuff[index] = skb; > + txq->tx_buf[index].skb = skb; here as well. > + /* restore default tx buffer type: FEC_TXBUF_T_SKB */ > + txq->tx_buf[i].type = FEC_TXBUF_T_SKB; Seems error prone. It would be safer to explicitly set it next to assigning .skb/.xdp. Andrew ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH net 2/3] net: fec: recycle pages for transmitted XDP frames 2023-07-04 23:48 ` Andrew Lunn @ 2023-07-05 1:56 ` Wei Fang 0 siblings, 0 replies; 18+ messages in thread From: Wei Fang @ 2023-07-05 1:56 UTC (permalink / raw) To: Andrew Lunn Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, ast@kernel.org, daniel@iogearbox.net, hawk@kernel.org, john.fastabend@gmail.com, Shenwei Wang, Clark Wang, netdev@vger.kernel.org, dl-linux-imx, linux-kernel@vger.kernel.org, bpf@vger.kernel.org > -----Original Message----- > From: Andrew Lunn <andrew@lunn.ch> > Sent: 2023年7月5日 7:48 > To: Wei Fang <wei.fang@nxp.com> > Cc: davem@davemloft.net; edumazet@google.com; kuba@kernel.org; > pabeni@redhat.com; ast@kernel.org; daniel@iogearbox.net; > hawk@kernel.org; john.fastabend@gmail.com; Shenwei Wang > <shenwei.wang@nxp.com>; Clark Wang <xiaoning.wang@nxp.com>; > netdev@vger.kernel.org; dl-linux-imx <linux-imx@nxp.com>; > linux-kernel@vger.kernel.org; bpf@vger.kernel.org > Subject: Re: [PATCH net 2/3] net: fec: recycle pages for transmitted XDP frames > > > /* Save skb pointer */ > > - txq->tx_skbuff[index] = skb; > > + txq->tx_buf[index].skb = skb; > > What about txq->tx_buf[index].type ? > we restore the buffer type to FEC_TXBUF_T_SKB when recycling the buffer descriptor, so there is no need to set it again here. > > @@ -862,7 +860,7 @@ static int fec_enet_txq_submit_tso(struct > fec_enet_priv_tx_q *txq, > > } > > > > /* Save skb pointer */ > > - txq->tx_skbuff[index] = skb; > > + txq->tx_buf[index].skb = skb; > > here as well. > > > + /* restore default tx buffer type: FEC_TXBUF_T_SKB */ > > + txq->tx_buf[i].type = FEC_TXBUF_T_SKB; > > Seems error prone. It would be safer to explicitly set it next to > assigning .skb/.xdp. I also considered this method, but in the case when skb has frags or TSO, we only need to set skb pointer for the first txq->tx_buf, but we need to explicitly set the type for all txq->tx_buf corresponding to the skb. 
This may confuse others. As for your concern about being error-prone. I don't think it should be, I reset the type to FEC_TXBUF_T_SKB in all places where buffers are recycled or cleared. > > Andrew ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH net 3/3] net: fec: increase the size of tx ring and update thresholds of tx ring 2023-07-04 8:29 [PATCH net 0/3] net: fec: fix some issues of ndo_xdp_xmit() wei.fang 2023-07-04 8:29 ` [PATCH net 1/3] net: fec: dynamically set the NETDEV_XDP_ACT_NDO_XMIT feature of XDP wei.fang 2023-07-04 8:29 ` [PATCH net 2/3] net: fec: recycle pages for transmitted XDP frames wei.fang @ 2023-07-04 8:29 ` wei.fang 2023-07-05 0:25 ` Andrew Lunn 2 siblings, 1 reply; 18+ messages in thread From: wei.fang @ 2023-07-04 8:29 UTC (permalink / raw) To: davem, edumazet, kuba, pabeni, ast, daniel, hawk, john.fastabend, shenwei.wang, xiaoning.wang, netdev Cc: linux-imx, linux-kernel, bpf From: Wei Fang <wei.fang@nxp.com> When the XDP feature is enabled and with heavy XDP frames to be transmitted, there is a considerable probability that available tx BDs are insufficient. This will lead to some XDP frames to be discarded and the "NOT enough BD for SG!" error log will appear in the console (as shown below). [ 160.013112] fec 30be0000.ethernet eth0: NOT enough BD for SG! [ 160.023116] fec 30be0000.ethernet eth0: NOT enough BD for SG! [ 160.028926] fec 30be0000.ethernet eth0: NOT enough BD for SG! [ 160.038946] fec 30be0000.ethernet eth0: NOT enough BD for SG! [ 160.044758] fec 30be0000.ethernet eth0: NOT enough BD for SG! In the case of heavy XDP traffic, sometimes the speed of recycling tx BDs may be slower than the speed of sending XDP frames. There may be several specific reasons, such as the interrupt is not responsed in time, the efficiency of the NAPI callback function is too low due to all the queues (tx queues and rx queues) share the same NAPI, and so on. After trying various methods, I think that increase the size of tx BD ring is simple and effective. Maybe the best resolution is that allocate NAPI for each queue to improve the efficiency of the NAPI callback, but this change is a bit big and I didn't try this method. Perheps this method will be implemented in a future patch. 
In addtion, this patch also updates the tx_stop_threshold and the tx_wake_threshold of the tx ring. In previous logic, the value of tx_stop_threshold is 217, however, the value of tx_wake_threshold is 147, it does not make sense that tx_wake_threshold is less than tx_stop_threshold. Besides, both XDP path and 'slow path' share the tx BD rings. So if tx_stop_threshold is 217, in the case of heavy XDP traffic, the slow path is easily to be stopped, this will have a serious impact on the slow path. Fixes: 6d6b39f180b8 ("net: fec: add initial XDP support") Signed-off-by: Wei Fang <wei.fang@nxp.com> --- drivers/net/ethernet/freescale/fec.h | 2 +- drivers/net/ethernet/freescale/fec_main.c | 7 +++---- 2 files changed, 4 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h index 8c0226d061fe..63a053dea819 100644 --- a/drivers/net/ethernet/freescale/fec.h +++ b/drivers/net/ethernet/freescale/fec.h @@ -355,7 +355,7 @@ struct bufdesc_ex { #define RX_RING_SIZE (FEC_ENET_RX_FRPPG * FEC_ENET_RX_PAGES) #define FEC_ENET_TX_FRSIZE 2048 #define FEC_ENET_TX_FRPPG (PAGE_SIZE / FEC_ENET_TX_FRSIZE) -#define TX_RING_SIZE 512 /* Must be power of two */ +#define TX_RING_SIZE 1024 /* Must be power of two */ #define TX_RING_MOD_MASK 511 /* for this to work */ #define BD_ENET_RX_INT 0x00800000 diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index 940d3afe1d24..6c255c0d6f41 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -3348,9 +3348,8 @@ static int fec_enet_alloc_queue(struct net_device *ndev) txq->bd.ring_size = TX_RING_SIZE; fep->total_tx_ring_size += fep->tx_queue[i]->bd.ring_size; - txq->tx_stop_threshold = FEC_MAX_SKB_DESCS; - txq->tx_wake_threshold = - (txq->bd.ring_size - txq->tx_stop_threshold) / 2; + txq->tx_stop_threshold = MAX_SKB_FRAGS; + txq->tx_wake_threshold = MAX_SKB_FRAGS + 1; txq->tso_hdrs = 
dma_alloc_coherent(&fep->pdev->dev, txq->bd.ring_size * TSO_HEADER_SIZE, @@ -3837,7 +3836,7 @@ static int fec_enet_txq_xmit_frame(struct fec_enet_private *fep, entries_free = fec_enet_get_free_txdesc_num(txq); if (entries_free < MAX_SKB_FRAGS + 1) { - netdev_err(fep->netdev, "NOT enough BD for SG!\n"); + netdev_err_once(fep->netdev, "NOT enough BD for SG!\n"); return -EBUSY; } -- 2.25.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH net 3/3] net: fec: increase the size of tx ring and update thresholds of tx ring 2023-07-04 8:29 ` [PATCH net 3/3] net: fec: increase the size of tx ring and update thresholds of tx ring wei.fang @ 2023-07-05 0:25 ` Andrew Lunn 2023-07-05 6:20 ` Wei Fang 0 siblings, 1 reply; 18+ messages in thread From: Andrew Lunn @ 2023-07-05 0:25 UTC (permalink / raw) To: wei.fang Cc: davem, edumazet, kuba, pabeni, ast, daniel, hawk, john.fastabend, shenwei.wang, xiaoning.wang, netdev, linux-imx, linux-kernel, bpf > After trying various methods, I think that increase the size of tx > BD ring is simple and effective. Maybe the best resolution is that > allocate NAPI for each queue to improve the efficiency of the NAPI > callback, but this change is a bit big and I didn't try this method. > Perheps this method will be implemented in a future patch. How does this affect platforms like Vybrid with its fast Ethernet? Does the burst latency go up? > In addtion, this patch also updates the tx_stop_threshold and the > tx_wake_threshold of the tx ring. In previous logic, the value of > tx_stop_threshold is 217, however, the value of tx_wake_threshold > is 147, it does not make sense that tx_wake_threshold is less than > tx_stop_threshold. What do these actually mean? I could imagine that as the ring fills you don't want to stop until it is 217/512 full. There is then some hysteresis, such that it has to drop below 147/512 before more can be added? > Besides, both XDP path and 'slow path' share the > tx BD rings. So if tx_stop_threshold is 217, in the case of heavy > XDP traffic, the slow path is easily to be stopped, this will have > a serious impact on the slow path. Please post your iperf results for various platforms, so we can see the effects of this. We generally don't accept tuning patches without benchmarks which prove the improvements, and also show there is no regression. 
And given the wide variety of SoCs using the FEC, i expect testing on a number of SoCs, but Fast and 1G. Andrew ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH net 3/3] net: fec: increase the size of tx ring and update thresholds of tx ring 2023-07-05 0:25 ` Andrew Lunn @ 2023-07-05 6:20 ` Wei Fang 2023-07-05 18:11 ` Jakub Kicinski 2023-07-08 19:03 ` Andrew Lunn 0 siblings, 2 replies; 18+ messages in thread From: Wei Fang @ 2023-07-05 6:20 UTC (permalink / raw) To: Andrew Lunn Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, ast@kernel.org, daniel@iogearbox.net, hawk@kernel.org, john.fastabend@gmail.com, Shenwei Wang, Clark Wang, netdev@vger.kernel.org, dl-linux-imx, linux-kernel@vger.kernel.org, bpf@vger.kernel.org > -----Original Message----- > From: Andrew Lunn <andrew@lunn.ch> > Sent: 2023年7月5日 8:25 > To: Wei Fang <wei.fang@nxp.com> > Cc: davem@davemloft.net; edumazet@google.com; kuba@kernel.org; > pabeni@redhat.com; ast@kernel.org; daniel@iogearbox.net; > hawk@kernel.org; john.fastabend@gmail.com; Shenwei Wang > <shenwei.wang@nxp.com>; Clark Wang <xiaoning.wang@nxp.com>; > netdev@vger.kernel.org; dl-linux-imx <linux-imx@nxp.com>; > linux-kernel@vger.kernel.org; bpf@vger.kernel.org > Subject: Re: [PATCH net 3/3] net: fec: increase the size of tx ring and update > thresholds of tx ring > > > After trying various methods, I think that increase the size of tx BD > > ring is simple and effective. Maybe the best resolution is that > > allocate NAPI for each queue to improve the efficiency of the NAPI > > callback, but this change is a bit big and I didn't try this method. > > Perheps this method will be implemented in a future patch. > > How does this affect platforms like Vybrid with its fast Ethernet? Sorry, I don't have the Vybrid platform, but I think I don't think it has much impact, at most it just takes up some more memory. > Does the burst latency go up? No, for fec, when a packet is attached to the BDs, the software will immediately trigger the hardware to send the packet. 
In addition, I think it may improve latency, because the tx ring becomes larger and more packets can be attached to the BD ring for burst traffic. > > > In addition, this patch also updates the tx_stop_threshold and the > > tx_wake_threshold of the tx ring. In the previous logic, the value of > > tx_stop_threshold is 217, while the value of tx_wake_threshold is > > 147; it does not make sense that tx_wake_threshold is less than > > tx_stop_threshold. > > What do these actually mean? I could imagine that as the ring fills you don't > want to stop until it is 217/512 full. There is then some hysteresis, such that it > has to drop below 147/512 before more can be added? > You must have misunderstood, so let me explain more clearly: the queue is stopped when the available BDs are fewer than tx_stop_threshold (217 BDs), and the queue is woken when the available BDs are greater than tx_wake_threshold (147 BDs). So in most cases the available BDs are already greater than tx_wake_threshold at the moment the queue is stopped, and the only effect is to briefly delay packet sending. In my opinion, tx_wake_threshold should be greater than tx_stop_threshold: we should stop the queue when the available BDs are not enough for an skb to be attached, and wake the queue when the available BDs are sufficient for an skb. > > Besides, both the XDP path and the 'slow path' share the tx BD rings. So if > > tx_stop_threshold is 217, in the case of heavy XDP traffic, the slow > > path is easily stopped, which has a serious impact on the > > slow path. > > Please post your iperf results for various platforms, so we can see the effects of > this. We generally don't accept tuning patches without benchmarks which > prove the improvements, and also show there is no regression. And given the > wide variety of SoCs using the FEC, I expect testing on a number of SoCs, both > Fast and 1G. > Below are the results on i.MX6UL/8MM/8MP/8ULP/93 platforms; i.MX6UL and 8ULP only support Fast Ethernet.
Others support 1G. 1.1 i.MX6UL with tx ring size 512 root@imx6ul7d:~# iperf3 -c 192.168.3.6 Connecting to host 192.168.3.6, port 5201 [ 5] local 192.168.3.9 port 47148 connected to 192.168.3.6 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.00 sec 10.1 MBytes 84.9 Mbits/sec 0 103 KBytes [ 5] 1.00-2.00 sec 9.88 MBytes 82.6 Mbits/sec 0 103 KBytes [ 5] 2.00-3.00 sec 9.82 MBytes 82.7 Mbits/sec 0 103 KBytes [ 5] 3.00-4.00 sec 9.82 MBytes 82.4 Mbits/sec 0 103 KBytes [ 5] 4.00-5.00 sec 9.88 MBytes 82.9 Mbits/sec 0 103 KBytes [ 5] 5.00-6.00 sec 9.94 MBytes 83.4 Mbits/sec 0 103 KBytes [ 5] 6.00-7.00 sec 10.1 MBytes 84.3 Mbits/sec 0 103 KBytes [ 5] 7.00-8.00 sec 9.82 MBytes 82.4 Mbits/sec 0 103 KBytes [ 5] 8.00-9.00 sec 9.82 MBytes 82.4 Mbits/sec 0 103 KBytes [ 5] 9.00-10.00 sec 9.88 MBytes 82.9 Mbits/sec 0 103 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 99.0 MBytes 83.1 Mbits/sec 0 sender [ 5] 0.00-10.01 sec 98.8 MBytes 82.8 Mbits/sec receiver 1.2 i.MX6UL with tx ring size 1024 root@imx6ul7d:~# iperf3 -c 192.168.3.6 Connecting to host 192.168.3.6, port 5201 [ 5] local 192.168.3.9 port 55670 connected to 192.168.3.6 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.00 sec 10.2 MBytes 85.4 Mbits/sec 0 236 KBytes [ 5] 1.00-2.00 sec 10.1 MBytes 84.6 Mbits/sec 0 236 KBytes [ 5] 2.00-3.00 sec 10.2 MBytes 85.5 Mbits/sec 0 249 KBytes [ 5] 3.00-4.00 sec 10.1 MBytes 85.1 Mbits/sec 0 249 KBytes [ 5] 4.00-5.00 sec 10.1 MBytes 84.7 Mbits/sec 0 249 KBytes [ 5] 5.00-6.00 sec 10.0 MBytes 84.1 Mbits/sec 0 249 KBytes [ 5] 6.00-7.00 sec 10.1 MBytes 85.1 Mbits/sec 0 249 KBytes [ 5] 7.00-8.00 sec 10.1 MBytes 84.9 Mbits/sec 0 249 KBytes [ 5] 8.00-9.00 sec 10.3 MBytes 85.9 Mbits/sec 0 249 KBytes [ 5] 9.00-10.00 sec 10.2 MBytes 85.6 Mbits/sec 0 249 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 101 MBytes 85.1 Mbits/sec 0 sender [ 5] 
0.00-10.01 sec 101 MBytes 84.5 Mbits/sec receiver 2.1 i.MX8ULP with tx ring size 512 root@imx8ulpevk:~# iperf3 -c 192.168.3.6 Connecting to host 192.168.3.6, port 5201 [ 5] local 192.168.3.9 port 54342 connected to 192.168.3.6 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.00 sec 10.8 MBytes 90.3 Mbits/sec 0 99.0 KBytes [ 5] 1.00-2.00 sec 9.94 MBytes 83.4 Mbits/sec 0 99.0 KBytes [ 5] 2.00-3.00 sec 10.2 MBytes 85.5 Mbits/sec 0 99.0 KBytes [ 5] 3.00-4.00 sec 9.94 MBytes 83.4 Mbits/sec 0 99.0 KBytes [ 5] 4.00-5.00 sec 10.2 MBytes 85.5 Mbits/sec 0 99.0 KBytes [ 5] 5.00-6.00 sec 9.94 MBytes 83.4 Mbits/sec 0 99.0 KBytes [ 5] 6.00-7.00 sec 9.69 MBytes 81.3 Mbits/sec 0 99.0 KBytes [ 5] 7.00-8.00 sec 9.94 MBytes 83.4 Mbits/sec 0 99.0 KBytes [ 5] 8.00-9.00 sec 9.69 MBytes 81.3 Mbits/sec 0 99.0 KBytes [ 5] 9.00-10.00 sec 10.2 MBytes 85.5 Mbits/sec 0 99.0 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 100 MBytes 84.3 Mbits/sec 0 sender [ 5] 0.00-9.90 sec 100 MBytes 84.7 Mbits/sec receiver 2.1 i.MX8ULP with tx ring size 1024 root@imx8ulpevk:~# iperf3 -c 192.168.3.6 Connecting to host 192.168.3.6, port 5201 [ 5] local 192.168.3.9 port 44770 connected to 192.168.3.6 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.00 sec 10.7 MBytes 90.1 Mbits/sec 0 93.3 KBytes [ 5] 1.00-2.00 sec 9.94 MBytes 83.4 Mbits/sec 0 93.3 KBytes [ 5] 2.00-3.00 sec 10.2 MBytes 85.5 Mbits/sec 0 93.3 KBytes [ 5] 3.00-4.00 sec 10.1 MBytes 85.0 Mbits/sec 0 93.3 KBytes [ 5] 4.00-5.00 sec 9.94 MBytes 83.4 Mbits/sec 0 100 KBytes [ 5] 5.00-6.00 sec 10.2 MBytes 85.5 Mbits/sec 0 100 KBytes [ 5] 6.00-7.00 sec 9.69 MBytes 81.3 Mbits/sec 0 100 KBytes [ 5] 7.00-8.00 sec 9.94 MBytes 83.4 Mbits/sec 0 100 KBytes [ 5] 8.00-9.00 sec 10.2 MBytes 85.5 Mbits/sec 0 100 KBytes [ 5] 9.00-10.00 sec 9.69 MBytes 81.3 Mbits/sec 0 100 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 
0.00-10.00 sec 101 MBytes 84.4 Mbits/sec 0 sender [ 5] 0.00-9.92 sec 100 MBytes 84.8 Mbits/sec receiver 3.1 i.MX8MM with tx ring size 512 root@imx8mmevk:~# iperf3 -c 192.168.3.6 Connecting to host 192.168.3.6, port 5201 [ 5] local 192.168.3.9 port 55734 connected to 192.168.3.6 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.00 sec 111 MBytes 934 Mbits/sec 0 577 KBytes [ 5] 1.00-2.00 sec 112 MBytes 937 Mbits/sec 0 577 KBytes [ 5] 2.00-3.00 sec 112 MBytes 942 Mbits/sec 0 609 KBytes [ 5] 3.00-4.00 sec 113 MBytes 945 Mbits/sec 0 638 KBytes [ 5] 4.00-5.00 sec 112 MBytes 941 Mbits/sec 0 638 KBytes [ 5] 5.00-6.00 sec 112 MBytes 942 Mbits/sec 0 638 KBytes [ 5] 6.00-7.00 sec 112 MBytes 942 Mbits/sec 0 638 KBytes [ 5] 7.00-8.00 sec 112 MBytes 943 Mbits/sec 0 638 KBytes [ 5] 8.00-9.00 sec 112 MBytes 943 Mbits/sec 0 638 KBytes [ 5] 9.00-10.00 sec 112 MBytes 942 Mbits/sec 0 638 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 1.10 GBytes 941 Mbits/sec 0 sender [ 5] 0.00-10.00 sec 1.09 GBytes 938 Mbits/sec receiver 3.2 i.MX8MM with tx ring size 1024 root@imx8mmevk:~# iperf3 -c 192.168.3.6 Connecting to host 192.168.3.6, port 5201 [ 5] local 192.168.3.9 port 53350 connected to 192.168.3.6 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.00 sec 114 MBytes 952 Mbits/sec 0 585 KBytes [ 5] 1.00-2.00 sec 112 MBytes 942 Mbits/sec 0 585 KBytes [ 5] 2.00-3.00 sec 113 MBytes 947 Mbits/sec 0 585 KBytes [ 5] 3.00-4.00 sec 112 MBytes 940 Mbits/sec 0 648 KBytes [ 5] 4.00-5.00 sec 112 MBytes 944 Mbits/sec 0 648 KBytes [ 5] 5.00-6.00 sec 112 MBytes 944 Mbits/sec 0 648 KBytes [ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 0 648 KBytes [ 5] 7.00-8.00 sec 112 MBytes 944 Mbits/sec 0 648 KBytes [ 5] 8.00-9.00 sec 112 MBytes 944 Mbits/sec 0 648 KBytes [ 5] 9.00-10.00 sec 112 MBytes 944 Mbits/sec 0 648 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 
0.00-10.00 sec 1.10 GBytes 943 Mbits/sec 0 sender [ 5] 0.00-10.00 sec 1.09 GBytes 940 Mbits/sec receiver 4.1 i.MX8MP with tx ring size 512 root@imx8mpevk:~# iperf3 -c 192.168.3.6 Connecting to host 192.168.3.6, port 5201 [ 5] local 192.168.3.9 port 51892 connected to 192.168.3.6 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.00 sec 114 MBytes 959 Mbits/sec 0 594 KBytes [ 5] 1.00-2.00 sec 112 MBytes 940 Mbits/sec 0 626 KBytes [ 5] 2.00-3.00 sec 113 MBytes 946 Mbits/sec 0 626 KBytes [ 5] 3.00-4.00 sec 112 MBytes 937 Mbits/sec 0 626 KBytes [ 5] 4.00-5.00 sec 112 MBytes 940 Mbits/sec 0 626 KBytes [ 5] 5.00-6.00 sec 112 MBytes 940 Mbits/sec 0 626 KBytes [ 5] 6.00-7.00 sec 113 MBytes 946 Mbits/sec 0 626 KBytes [ 5] 7.00-8.00 sec 112 MBytes 939 Mbits/sec 0 626 KBytes [ 5] 8.00-9.00 sec 111 MBytes 935 Mbits/sec 0 626 KBytes [ 5] 9.00-10.00 sec 112 MBytes 943 Mbits/sec 0 626 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 1.10 GBytes 943 Mbits/sec 0 sender [ 5] 0.00-10.00 sec 1.09 GBytes 940 Mbits/sec receiver 4.2 i.MX8MP with tx ring size 1024 root@imx8mpevk:~# iperf3 -c 192.168.3.6 Connecting to host 192.168.3.6, port 5201 [ 5] local 192.168.3.9 port 37922 connected to 192.168.3.6 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.00 sec 113 MBytes 951 Mbits/sec 0 608 KBytes [ 5] 1.00-2.00 sec 112 MBytes 937 Mbits/sec 0 608 KBytes [ 5] 2.00-3.00 sec 113 MBytes 947 Mbits/sec 0 608 KBytes [ 5] 3.00-4.00 sec 111 MBytes 934 Mbits/sec 0 608 KBytes [ 5] 4.00-5.00 sec 112 MBytes 942 Mbits/sec 0 608 KBytes [ 5] 5.00-6.00 sec 112 MBytes 939 Mbits/sec 0 608 KBytes [ 5] 6.00-7.00 sec 113 MBytes 949 Mbits/sec 0 608 KBytes [ 5] 7.00-8.00 sec 112 MBytes 942 Mbits/sec 0 608 KBytes [ 5] 8.00-9.00 sec 112 MBytes 936 Mbits/sec 0 608 KBytes [ 5] 9.00-10.00 sec 112 MBytes 942 Mbits/sec 0 608 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 
0.00-10.00 sec 1.10 GBytes 942 Mbits/sec 0 sender [ 5] 0.00-10.00 sec 1.09 GBytes 939 Mbits/sec receiver 5.1 i.MX93 with tx ring size 512 root@imx93evk:~# iperf3 -c 192.168.3.6 Connecting to host 192.168.3.6, port 5201 [ 5] local 192.168.3.9 port 44216 connected to 192.168.3.6 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.00 sec 115 MBytes 965 Mbits/sec 0 656 KBytes [ 5] 1.00-2.00 sec 111 MBytes 934 Mbits/sec 0 656 KBytes [ 5] 2.00-3.00 sec 112 MBytes 944 Mbits/sec 0 656 KBytes [ 5] 3.00-4.00 sec 112 MBytes 944 Mbits/sec 0 656 KBytes [ 5] 4.00-5.00 sec 112 MBytes 944 Mbits/sec 0 656 KBytes [ 5] 5.00-6.00 sec 111 MBytes 933 Mbits/sec 0 656 KBytes [ 5] 6.00-7.00 sec 112 MBytes 944 Mbits/sec 0 656 KBytes [ 5] 7.00-8.00 sec 112 MBytes 944 Mbits/sec 0 656 KBytes [ 5] 8.00-9.00 sec 112 MBytes 944 Mbits/sec 0 656 KBytes [ 5] 9.00-10.00 sec 112 MBytes 944 Mbits/sec 0 656 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 1.10 GBytes 944 Mbits/sec 0 sender [ 5] 0.00-10.00 sec 1.10 GBytes 941 Mbits/sec receiver 5.2 i.MX93 with tx ring size 1024 root@imx93evk:~# iperf3 -c 192.168.3.6 Connecting to host 192.168.3.6, port 5201 [ 5] local 192.168.3.9 port 51058 connected to 192.168.3.6 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.00 sec 114 MBytes 959 Mbits/sec 0 588 KBytes [ 5] 1.00-2.00 sec 112 MBytes 935 Mbits/sec 0 649 KBytes [ 5] 2.00-3.00 sec 112 MBytes 944 Mbits/sec 0 649 KBytes [ 5] 3.00-4.00 sec 112 MBytes 944 Mbits/sec 0 649 KBytes [ 5] 4.00-5.00 sec 112 MBytes 944 Mbits/sec 0 649 KBytes [ 5] 5.00-6.00 sec 112 MBytes 944 Mbits/sec 0 649 KBytes [ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 0 649 KBytes [ 5] 7.00-8.00 sec 112 MBytes 944 Mbits/sec 0 649 KBytes [ 5] 8.00-9.00 sec 112 MBytes 944 Mbits/sec 0 649 KBytes [ 5] 9.00-10.00 sec 112 MBytes 944 Mbits/sec 0 649 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 
sec 1.10 GBytes 943 Mbits/sec 0 sender [ 5] 0.00-10.00 sec 1.10 GBytes 940 Mbits/sec receiver ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net 3/3] net: fec: increase the size of tx ring and update thresholds of tx ring 2023-07-05 6:20 ` Wei Fang @ 2023-07-05 18:11 ` Jakub Kicinski 2023-07-06 1:44 ` Wei Fang 2023-07-08 19:03 ` Andrew Lunn 1 sibling, 1 reply; 18+ messages in thread From: Jakub Kicinski @ 2023-07-05 18:11 UTC (permalink / raw) To: Wei Fang Cc: Andrew Lunn, davem@davemloft.net, edumazet@google.com, pabeni@redhat.com, ast@kernel.org, daniel@iogearbox.net, hawk@kernel.org, john.fastabend@gmail.com, Shenwei Wang, Clark Wang, netdev@vger.kernel.org, dl-linux-imx, linux-kernel@vger.kernel.org, bpf@vger.kernel.org On Wed, 5 Jul 2023 06:20:26 +0000 Wei Fang wrote: > > > In addtion, this patch also updates the tx_stop_threshold and the > > > tx_wake_threshold of the tx ring. In previous logic, the value of > > > tx_stop_threshold is 217, however, the value of tx_wake_threshold is > > > 147, it does not make sense that tx_wake_threshold is less than > > > tx_stop_threshold. > > > > What do these actually mean? I could imagine that as the ring fills you don't > > want to stop until it is 217/512 full. There is then some hysteresis, such that it > > has to drop below 147/512 before more can be added? > > > You must have misunderstood, let me explain more clearly, the queue will be > stopped when the available BDs are less than tx_stop_threshold (217 BDs). And > the queue will be waked when the available BDs are greater than tx_wake_threshold > (147 BDs). So in most cases, the available BDs are greater than tx_wake_threshold > when the queue is stopped, the only effect is to delay packet sending. > In my opinion, tx_wake_threshold should be greater than tx_stop_threshold, we > should stop queue when the available BDs are not enough for a skb to be attached. > And wake the queue when the available BDs are sufficient for a skb. But you shouldn't restart the queue for a single packet either. 
Restarting for a single packet wastes CPU cycles, as there will be many more stop/start operations. Two large packets seem like the absolute minimum reasonable wake threshold. Setting tx_stop_threshold to MAX_SKB_FRAGS doesn't seem right either, as you won't be able to accept a full TSO frame. Please split the change: the netdev_err_once() should be one patch and the change to the wake threshold a separate one. -- pw-bot: cr ^ permalink raw reply [flat|nested] 18+ messages in thread
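To make the inverted thresholds concrete, the stop/wake behaviour under discussion can be modelled with a few lines of Python (an illustrative sketch, not the actual fec C code; 512/217/147 are the values quoted in the thread, and the method names are hypothetical stand-ins for netif_tx_stop_queue()/netif_tx_wake_queue()):

```python
class TxRing:
    """Toy model of a tx BD ring with stop/wake thresholds."""

    def __init__(self, size=512, stop_thresh=217, wake_thresh=147):
        self.free = size
        self.stop_thresh = stop_thresh  # stop queue when free BDs drop below this
        self.wake_thresh = wake_thresh  # wake queue when free BDs exceed this
        self.stopped = False

    def xmit(self, bds=1):
        """Transmit path: reserve BDs, stop the queue if we run low."""
        self.free -= bds
        if self.free < self.stop_thresh:
            self.stopped = True         # ~ netif_tx_stop_queue()

    def complete(self, bds=1):
        """TX completion path: reclaim BDs, maybe wake the queue."""
        self.free += bds
        if self.stopped and self.free > self.wake_thresh:
            self.stopped = False        # ~ netif_tx_wake_queue()


ring = TxRing()
while not ring.stopped:     # fill the ring until the queue stops
    ring.xmit()
assert ring.free == 216     # stopped with 216 BDs still free

ring.complete(1)            # reclaim a single BD ...
assert not ring.stopped     # ... and the queue wakes immediately, because
# wake_thresh (147) was already satisfied when we stopped: the hysteresis
# is inverted, matching the "restart for a single packet" behaviour above.
```

With these constants, the wake condition is always already true at stop time, so every completion restarts the queue, which is exactly the single-packet restart being criticised.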
* RE: [PATCH net 3/3] net: fec: increase the size of tx ring and update thresholds of tx ring 2023-07-05 18:11 ` Jakub Kicinski @ 2023-07-06 1:44 ` Wei Fang 2023-07-06 3:11 ` Jakub Kicinski 0 siblings, 1 reply; 18+ messages in thread From: Wei Fang @ 2023-07-06 1:44 UTC (permalink / raw) To: Jakub Kicinski Cc: Andrew Lunn, davem@davemloft.net, edumazet@google.com, pabeni@redhat.com, ast@kernel.org, daniel@iogearbox.net, hawk@kernel.org, john.fastabend@gmail.com, Shenwei Wang, Clark Wang, netdev@vger.kernel.org, dl-linux-imx, linux-kernel@vger.kernel.org, bpf@vger.kernel.org > -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Sent: 2023年7月6日 2:11 > To: Wei Fang <wei.fang@nxp.com> > Cc: Andrew Lunn <andrew@lunn.ch>; davem@davemloft.net; > edumazet@google.com; pabeni@redhat.com; ast@kernel.org; > daniel@iogearbox.net; hawk@kernel.org; john.fastabend@gmail.com; > Shenwei Wang <shenwei.wang@nxp.com>; Clark Wang > <xiaoning.wang@nxp.com>; netdev@vger.kernel.org; dl-linux-imx > <linux-imx@nxp.com>; linux-kernel@vger.kernel.org; bpf@vger.kernel.org > Subject: Re: [PATCH net 3/3] net: fec: increase the size of tx ring and update > thresholds of tx ring > > On Wed, 5 Jul 2023 06:20:26 +0000 Wei Fang wrote: > > > > In addtion, this patch also updates the tx_stop_threshold and the > > > > tx_wake_threshold of the tx ring. In previous logic, the value of > > > > tx_stop_threshold is 217, however, the value of tx_wake_threshold > > > > is 147, it does not make sense that tx_wake_threshold is less than > > > > tx_stop_threshold. > > > > > > What do these actually mean? I could imagine that as the ring fills > > > you don't want to stop until it is 217/512 full. There is then some > > > hysteresis, such that it has to drop below 147/512 before more can be > added? > > > > > You must have misunderstood, let me explain more clearly, the queue > > will be stopped when the available BDs are less than tx_stop_threshold > > (217 BDs). 
And the queue will be waked when the available BDs are > > greater than tx_wake_threshold > > (147 BDs). So in most cases, the available BDs are greater than > > tx_wake_threshold when the queue is stopped, the only effect is to delay > packet sending. > > In my opinion, tx_wake_threshold should be greater than > > tx_stop_threshold, we should stop queue when the available BDs are not > enough for a skb to be attached. > > And wake the queue when the available BDs are sufficient for a skb. > > But you shouldn't restart the queue for a single packet either. > Restarting for a single packet wastes CPU cycles as there will be much more > stop / start operations. Two large packets seem like the absolute minimum > reasonable wake threshold. > > Setting tx_stop_threshold to MAX_SKB_FRAGS doesn't seem right either, as > you won't be able to accept a full TSO frame. > Maybe I should keep tx_stop_threshold unchanged, so that the queue is stopped if the available BDs are not enough for a full TSO frame to be attached, and then just change tx_wake_threshold to tx_stop_threshold + 1, which I think is more reasonable. > Please split the change, the netdev_err_once() should be one patch and then > the change to wake thresh a separate one. Okay, thanks! ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net 3/3] net: fec: increase the size of tx ring and update thresholds of tx ring 2023-07-06 1:44 ` Wei Fang @ 2023-07-06 3:11 ` Jakub Kicinski 2023-07-06 6:14 ` Wei Fang 0 siblings, 1 reply; 18+ messages in thread From: Jakub Kicinski @ 2023-07-06 3:11 UTC (permalink / raw) To: Wei Fang Cc: Andrew Lunn, davem@davemloft.net, edumazet@google.com, pabeni@redhat.com, ast@kernel.org, daniel@iogearbox.net, hawk@kernel.org, john.fastabend@gmail.com, Shenwei Wang, Clark Wang, netdev@vger.kernel.org, dl-linux-imx, linux-kernel@vger.kernel.org, bpf@vger.kernel.org On Thu, 6 Jul 2023 01:44:49 +0000 Wei Fang wrote: > > But you shouldn't restart the queue for a single packet either. > > Restarting for a single packet wastes CPU cycles as there will be much more > > stop / start operations. Two large packets seem like the absolute minimum > > reasonable wake threshold. > > > > Setting tx_stop_threshold to MAX_SKB_FRAGS doesn't seem right either, as > > you won't be able to accept a full TSO frame. > > > Maybe I should keep the tx_stop_threshold unchanged, so that the queue is > to be stopped if the available BDs is not enough for a full TSO frame to be attached. > And then just change tx_wake_threshold to tx_stop_threshold + 1, which I think it's > more reasonable. How about at least tx_stop_threshold + 2 * MAX_SKB_FRAGS ? If a queue of hundreds of entries is overflowing, we should be able to apply a hysteresis of a few tens of entries. Do you see a difference in drops? The packets from the stack should preferably stay in the qdiscs instead of the driver queue, where AQM and scheduling can be applied. ^ permalink raw reply [flat|nested] 18+ messages in thread
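The suggested scheme can be sketched the same way (again an illustrative Python model, not the actual C driver code; MAX_SKB_FRAGS is assumed to be 17, its common value, and the exact constants are up to the patch):

```python
MAX_SKB_FRAGS = 17  # assumed value; depends on kernel configuration


class TxRing:
    """Toy model: stop when a full TSO frame no longer fits,
    wake only after real hysteresis, per the suggestion above."""

    def __init__(self, size=512):
        self.free = size
        self.stop_thresh = MAX_SKB_FRAGS + 1                    # room for one TSO frame
        self.wake_thresh = self.stop_thresh + 2 * MAX_SKB_FRAGS  # suggested hysteresis
        self.stopped = False

    def xmit(self, bds=1):
        self.free -= bds
        if self.free < self.stop_thresh:
            self.stopped = True          # ~ netif_tx_stop_queue()

    def complete(self, bds=1):
        self.free += bds
        if self.stopped and self.free >= self.wake_thresh:
            self.stopped = False         # ~ netif_tx_wake_queue()


ring = TxRing()
while not ring.stopped:
    ring.xmit()
assert ring.free == MAX_SKB_FRAGS    # stopped: one TSO frame no longer fits

ring.complete(1)
assert ring.stopped                  # a single reclaimed BD does not wake it
while ring.stopped:                  # the queue wakes only after roughly two
    ring.complete(1)                 # TSO frames' worth of BDs is free again
assert ring.free == ring.wake_thresh
```

Unlike the 217/147 case, the stop threshold here stays small enough not to throttle ordinary traffic, while the wake threshold provides tens of entries of hysteresis so completions are batched instead of restarting the queue per packet.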
* RE: [PATCH net 3/3] net: fec: increase the size of tx ring and update thresholds of tx ring 2023-07-06 3:11 ` Jakub Kicinski @ 2023-07-06 6:14 ` Wei Fang 0 siblings, 0 replies; 18+ messages in thread From: Wei Fang @ 2023-07-06 6:14 UTC (permalink / raw) To: Jakub Kicinski Cc: Andrew Lunn, davem@davemloft.net, edumazet@google.com, pabeni@redhat.com, ast@kernel.org, daniel@iogearbox.net, hawk@kernel.org, john.fastabend@gmail.com, Shenwei Wang, Clark Wang, netdev@vger.kernel.org, dl-linux-imx, linux-kernel@vger.kernel.org, bpf@vger.kernel.org > -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Sent: 2023年7月6日 11:12 > To: Wei Fang <wei.fang@nxp.com> > Cc: Andrew Lunn <andrew@lunn.ch>; davem@davemloft.net; > edumazet@google.com; pabeni@redhat.com; ast@kernel.org; > daniel@iogearbox.net; hawk@kernel.org; john.fastabend@gmail.com; > Shenwei Wang <shenwei.wang@nxp.com>; Clark Wang > <xiaoning.wang@nxp.com>; netdev@vger.kernel.org; dl-linux-imx > <linux-imx@nxp.com>; linux-kernel@vger.kernel.org; bpf@vger.kernel.org > Subject: Re: [PATCH net 3/3] net: fec: increase the size of tx ring and update > thresholds of tx ring > > On Thu, 6 Jul 2023 01:44:49 +0000 Wei Fang wrote: > > > But you shouldn't restart the queue for a single packet either. > > > Restarting for a single packet wastes CPU cycles as there will be > > > much more stop / start operations. Two large packets seem like the > > > absolute minimum reasonable wake threshold. > > > > > > Setting tx_stop_threshold to MAX_SKB_FRAGS doesn't seem right > > > either, as you won't be able to accept a full TSO frame. > > > > > Maybe I should keep the tx_stop_threshold unchanged, so that the queue > > is to be stopped if the available BDs is not enough for a full TSO frame to be > attached. > > And then just change tx_wake_threshold to tx_stop_threshold + 1, which > > I think it's more reasonable. > > How about at least tx_stop_threshold + 2 * MAX_SKB_FRAGS ? It's okay. 
The iperf performance is as good as before. > If a queue of hundreds of entries is overflowing, we should be able to apply a > hysteresis of a few tens of entries. Do you see a difference in drops? The I didn't see any packet loss. > packets from the stack should preferably stay in the qdiscs instead of the driver > queue, where AQM and scheduling can be applied. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net 3/3] net: fec: increase the size of tx ring and update thresholds of tx ring 2023-07-05 6:20 ` Wei Fang 2023-07-05 18:11 ` Jakub Kicinski @ 2023-07-08 19:03 ` Andrew Lunn 2023-07-10 6:16 ` Wei Fang 1 sibling, 1 reply; 18+ messages in thread From: Andrew Lunn @ 2023-07-08 19:03 UTC (permalink / raw) To: Wei Fang Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, ast@kernel.org, daniel@iogearbox.net, hawk@kernel.org, john.fastabend@gmail.com, Shenwei Wang, Clark Wang, netdev@vger.kernel.org, dl-linux-imx, linux-kernel@vger.kernel.org, bpf@vger.kernel.org > > How does this affect platforms like Vybrid with its fast Ethernet? > Sorry, I don't have the Vybrid platform, but I think I don't think it has much > impact, at most it just takes up some more memory. It has been 6 months since the page pool patches were posted and I asked about benchmark results for other platforms like Vybrid. Is it so hard to get hold of reference platforms? Fugang Duan used to have a test farm of all sorts of boards and reported to me the regressions I introduced with MDIO changes and PM changes. As somebody who seems to be an NXP FEC maintainer, I would expect you to have access to a range of hardware, especially since XDP and eBPF are a bit of a niche for the embedded processors which NXP produces. You want to be sure your changes don't regress the main use cases, which I guess are plain networking. > > Does the burst latency go up? > No, for fec, when a packet is attached to the BDs, the software will immediately > trigger the hardware to send the packet. In addition, I think it may improve the > latency, because the size of the tx ring becomes larger, and more packets can be > attached to the BD ring for burst traffic. And a bigger burst means more latency. Read about Buffer bloat. While you have iperf running saturating the link, try a ping as well. How does the ping latency change with more TX buffers?
Ideally you want enough transmit buffers to keep the link full, but not more. If the driver is using BQL, the network stack will help with this. > Below are the results on i.MX6UL/8MM/8MP/8ULP/93 platforms, i.MX6UL and > 8ULP only support Fast ethernet. Others support 1G. Thanks for the benchmark numbers. Please get into the habit of including them. We like to see justification for any sort of performance tweaks. Andrew ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH net 3/3] net: fec: increase the size of tx ring and update thresholds of tx ring 2023-07-08 19:03 ` Andrew Lunn @ 2023-07-10 6:16 ` Wei Fang 2023-07-12 1:16 ` Andrew Lunn 0 siblings, 1 reply; 18+ messages in thread From: Wei Fang @ 2023-07-10 6:16 UTC (permalink / raw) To: Andrew Lunn Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, ast@kernel.org, daniel@iogearbox.net, hawk@kernel.org, john.fastabend@gmail.com, Shenwei Wang, Clark Wang, netdev@vger.kernel.org, dl-linux-imx, linux-kernel@vger.kernel.org, bpf@vger.kernel.org > -----Original Message----- > From: Andrew Lunn <andrew@lunn.ch> > Sent: 2023年7月9日 3:04 > To: Wei Fang <wei.fang@nxp.com> > Cc: davem@davemloft.net; edumazet@google.com; kuba@kernel.org; > pabeni@redhat.com; ast@kernel.org; daniel@iogearbox.net; > hawk@kernel.org; john.fastabend@gmail.com; Shenwei Wang > <shenwei.wang@nxp.com>; Clark Wang <xiaoning.wang@nxp.com>; > netdev@vger.kernel.org; dl-linux-imx <linux-imx@nxp.com>; > linux-kernel@vger.kernel.org; bpf@vger.kernel.org > Subject: Re: [PATCH net 3/3] net: fec: increase the size of tx ring and update > thresholds of tx ring > > > > How does this affect platforms like Vybrid with its fast Ethernet? > > Sorry, I don't have the Vybrid platform, but I think I don't think it > > has much impact, at most it just takes up some more memory. > > It has been 6 months since the page pool patches were posted and i asked > about benchmark results for other platforms like Vybrid. Is it so hard to get > hold or reference platforms? Fugang Duan used to have a test farm of all sorts > of boards and reported to me the regressions i introduced with MDIO changes > and PM changes. As somebody who seems to be an NXP FEC Maintainer i > would expect you to have access to a range of hardware. Especially since XDP > and eBPF is a bit of a niche for embedded processes which NXP produce. 
You > want to be sure your changes don't regress the main use cases which i guess > are plain networking. > Sorry, the Vybrid platform is not currently maintained by us, and Vybrid is also not included in our usual Yocto SDK RC version. Even if I found a Vybrid board, it probably couldn't run with the latest kernel image, because the latest kernel image does not match the old Yocto SDK, and the new Yocto SDK does not support the Vybrid platform. I also asked my colleague in the test team who is in charge of Ethernet testing; she hadn't even heard of the Vybrid platform. > > > Does the burst latency go up? > > No, for fec, when a packet is attached to the BDs, the software will > > immediately trigger the hardware to send the packet. In addition, I > > think it may improve the latency, because the size of the tx ring > > becomes larger, and more packets can be attached to the BD ring for burst > traffic. > > And a bigger burst means more latency. Read about Buffer bloat. > Perhaps, but not necessarily. For example, in some cases burst packets may stay in the qdisc instead of the hardware queue if the ring size is small. > While you have iperf running saturating the link, try a ping as well. How does > ping latency change with more TX buffers? > Per your suggestion, I tested on the i.MX8ULP, i.MX8MM and i.MX93 platforms. The results do not appear to be very different. i.MX93 with ring size 1024: fangwei@fangwei-OptiPlex-7050:~$ ping 10.193.102.114 -c 16 PING 10.193.102.114 (10.193.102.114) 56(84) bytes of data.
64 bytes from 10.193.102.114: icmp_seq=1 ttl=63 time=3.78 ms 64 bytes from 10.193.102.114: icmp_seq=2 ttl=63 time=2.16 ms 64 bytes from 10.193.102.114: icmp_seq=3 ttl=63 time=3.31 ms 64 bytes from 10.193.102.114: icmp_seq=4 ttl=63 time=2.11 ms 64 bytes from 10.193.102.114: icmp_seq=5 ttl=63 time=3.43 ms 64 bytes from 10.193.102.114: icmp_seq=6 ttl=63 time=3.20 ms 64 bytes from 10.193.102.114: icmp_seq=7 ttl=63 time=3.20 ms 64 bytes from 10.193.102.114: icmp_seq=8 ttl=63 time=3.75 ms 64 bytes from 10.193.102.114: icmp_seq=9 ttl=63 time=3.21 ms 64 bytes from 10.193.102.114: icmp_seq=10 ttl=63 time=3.76 ms 64 bytes from 10.193.102.114: icmp_seq=11 ttl=63 time=2.16 ms 64 bytes from 10.193.102.114: icmp_seq=12 ttl=63 time=2.67 ms 64 bytes from 10.193.102.114: icmp_seq=13 ttl=63 time=3.59 ms 64 bytes from 10.193.102.114: icmp_seq=14 ttl=63 time=2.55 ms 64 bytes from 10.193.102.114: icmp_seq=15 ttl=63 time=2.54 ms 64 bytes from 10.193.102.114: icmp_seq=16 ttl=63 time=3.88 ms --- 10.193.102.114 ping statistics --- 16 packets transmitted, 16 received, 0% packet loss, time 15027ms rtt min/avg/max/mdev = 2.112/3.082/3.875/0.606 ms i.MX93 with ring size 512: fangwei@fangwei-OptiPlex-7050:~$ ping 10.193.102.114 -c 16 PING 10.193.102.114 (10.193.102.114) 56(84) bytes of data. 
64 bytes from 10.193.102.114: icmp_seq=1 ttl=63 time=2.74 ms 64 bytes from 10.193.102.114: icmp_seq=2 ttl=63 time=3.32 ms 64 bytes from 10.193.102.114: icmp_seq=3 ttl=63 time=2.72 ms 64 bytes from 10.193.102.114: icmp_seq=4 ttl=63 time=3.36 ms 64 bytes from 10.193.102.114: icmp_seq=5 ttl=63 time=3.41 ms 64 bytes from 10.193.102.114: icmp_seq=6 ttl=63 time=2.67 ms 64 bytes from 10.193.102.114: icmp_seq=7 ttl=63 time=2.77 ms 64 bytes from 10.193.102.114: icmp_seq=8 ttl=63 time=3.38 ms 64 bytes from 10.193.102.114: icmp_seq=9 ttl=63 time=2.54 ms 64 bytes from 10.193.102.114: icmp_seq=10 ttl=63 time=3.36 ms 64 bytes from 10.193.102.114: icmp_seq=11 ttl=63 time=3.44 ms 64 bytes from 10.193.102.114: icmp_seq=12 ttl=63 time=2.80 ms 64 bytes from 10.193.102.114: icmp_seq=13 ttl=63 time=2.86 ms 64 bytes from 10.193.102.114: icmp_seq=14 ttl=63 time=3.90 ms 64 bytes from 10.193.102.114: icmp_seq=15 ttl=63 time=2.50 ms 64 bytes from 10.193.102.114: icmp_seq=16 ttl=63 time=2.89 ms --- 10.193.102.114 ping statistics --- 16 packets transmitted, 16 received, 0% packet loss, time 15028ms rtt min/avg/max/mdev = 2.496/3.040/3.898/0.394 ms i.MX8MM with ring size 1024: fangwei@fangwei-OptiPlex-7050:~$ ping 10.193.102.126 -c 16 PING 10.193.102.126 (10.193.102.126) 56(84) bytes of data. 
64 bytes from 10.193.102.126: icmp_seq=1 ttl=127 time=1.34 ms 64 bytes from 10.193.102.126: icmp_seq=2 ttl=127 time=2.07 ms 64 bytes from 10.193.102.126: icmp_seq=3 ttl=127 time=2.40 ms 64 bytes from 10.193.102.126: icmp_seq=4 ttl=127 time=1.48 ms 64 bytes from 10.193.102.126: icmp_seq=5 ttl=127 time=1.69 ms 64 bytes from 10.193.102.126: icmp_seq=6 ttl=127 time=1.54 ms 64 bytes from 10.193.102.126: icmp_seq=7 ttl=127 time=2.30 ms 64 bytes from 10.193.102.126: icmp_seq=8 ttl=127 time=1.94 ms 64 bytes from 10.193.102.126: icmp_seq=9 ttl=127 time=4.25 ms 64 bytes from 10.193.102.126: icmp_seq=10 ttl=127 time=1.75 ms 64 bytes from 10.193.102.126: icmp_seq=11 ttl=127 time=1.25 ms 64 bytes from 10.193.102.126: icmp_seq=12 ttl=127 time=2.04 ms 64 bytes from 10.193.102.126: icmp_seq=13 ttl=127 time=2.31 ms 64 bytes from 10.193.102.126: icmp_seq=14 ttl=127 time=2.18 ms 64 bytes from 10.193.102.126: icmp_seq=15 ttl=127 time=2.25 ms 64 bytes from 10.193.102.126: icmp_seq=16 ttl=127 time=1.37 ms --- 10.193.102.126 ping statistics --- 16 packets transmitted, 16 received, 0% packet loss, time 15026ms rtt min/avg/max/mdev = 1.248/2.011/4.250/0.686 ms i.MX8MM with ring size 512: fangwei@fangwei-OptiPlex-7050:~$ ping 10.193.102.126 -c 16 PING 10.193.102.126 (10.193.102.126) 56(84) bytes of data. 
64 bytes from 10.193.102.126: icmp_seq=1 ttl=63 time=4.82 ms 64 bytes from 10.193.102.126: icmp_seq=2 ttl=63 time=4.67 ms 64 bytes from 10.193.102.126: icmp_seq=3 ttl=63 time=3.74 ms 64 bytes from 10.193.102.126: icmp_seq=4 ttl=63 time=3.87 ms 64 bytes from 10.193.102.126: icmp_seq=5 ttl=63 time=3.30 ms 64 bytes from 10.193.102.126: icmp_seq=6 ttl=63 time=3.79 ms 64 bytes from 10.193.102.126: icmp_seq=7 ttl=127 time=2.12 ms 64 bytes from 10.193.102.126: icmp_seq=8 ttl=127 time=1.99 ms 64 bytes from 10.193.102.126: icmp_seq=9 ttl=127 time=2.15 ms 64 bytes from 10.193.102.126: icmp_seq=10 ttl=127 time=1.82 ms 64 bytes from 10.193.102.126: icmp_seq=11 ttl=127 time=1.92 ms 64 bytes from 10.193.102.126: icmp_seq=12 ttl=127 time=1.23 ms 64 bytes from 10.193.102.126: icmp_seq=13 ttl=127 time=2.00 ms 64 bytes from 10.193.102.126: icmp_seq=14 ttl=127 time=1.66 ms 64 bytes from 10.193.102.126: icmp_seq=15 ttl=127 time=1.75 ms 64 bytes from 10.193.102.126: icmp_seq=16 ttl=127 time=2.24 ms --- 10.193.102.126 ping statistics --- 16 packets transmitted, 16 received, 0% packet loss, time 15026ms rtt min/avg/max/mdev = 1.226/2.691/4.820/1.111 ms i.MX8ULP with ring size 1024: fangwei@fangwei-OptiPlex-7050:~$ ping 10.193.102.216 -c 16 PING 10.193.102.216 (10.193.102.216) 56(84) bytes of data. 
64 bytes from 10.193.102.216: icmp_seq=1 ttl=63 time=3.40 ms
64 bytes from 10.193.102.216: icmp_seq=2 ttl=63 time=5.46 ms
64 bytes from 10.193.102.216: icmp_seq=3 ttl=63 time=5.55 ms
64 bytes from 10.193.102.216: icmp_seq=4 ttl=63 time=5.97 ms
64 bytes from 10.193.102.216: icmp_seq=5 ttl=63 time=6.26 ms
64 bytes from 10.193.102.216: icmp_seq=6 ttl=63 time=0.963 ms
64 bytes from 10.193.102.216: icmp_seq=7 ttl=63 time=4.10 ms
64 bytes from 10.193.102.216: icmp_seq=8 ttl=63 time=4.55 ms
64 bytes from 10.193.102.216: icmp_seq=9 ttl=63 time=1.24 ms
64 bytes from 10.193.102.216: icmp_seq=10 ttl=63 time=6.96 ms
64 bytes from 10.193.102.216: icmp_seq=11 ttl=63 time=3.27 ms
64 bytes from 10.193.102.216: icmp_seq=12 ttl=63 time=6.57 ms
64 bytes from 10.193.102.216: icmp_seq=13 ttl=63 time=2.99 ms
64 bytes from 10.193.102.216: icmp_seq=14 ttl=63 time=1.70 ms
64 bytes from 10.193.102.216: icmp_seq=15 ttl=63 time=1.79 ms
64 bytes from 10.193.102.216: icmp_seq=16 ttl=63 time=1.42 ms

--- 10.193.102.216 ping statistics ---
16 packets transmitted, 16 received, 0% packet loss, time 15026ms
rtt min/avg/max/mdev = 0.963/3.886/6.955/2.009 ms

i.MX8ULP with ring size 512:
fangwei@fangwei-OptiPlex-7050:~$ ping 10.193.102.216 -c 16
PING 10.193.102.216 (10.193.102.216) 56(84) bytes of data.
64 bytes from 10.193.102.216: icmp_seq=1 ttl=63 time=5.70 ms
64 bytes from 10.193.102.216: icmp_seq=2 ttl=63 time=5.89 ms
64 bytes from 10.193.102.216: icmp_seq=3 ttl=63 time=3.37 ms
64 bytes from 10.193.102.216: icmp_seq=4 ttl=63 time=5.07 ms
64 bytes from 10.193.102.216: icmp_seq=5 ttl=63 time=1.47 ms
64 bytes from 10.193.102.216: icmp_seq=6 ttl=63 time=3.45 ms
64 bytes from 10.193.102.216: icmp_seq=7 ttl=63 time=1.35 ms
64 bytes from 10.193.102.216: icmp_seq=8 ttl=63 time=6.62 ms
64 bytes from 10.193.102.216: icmp_seq=9 ttl=63 time=1.41 ms
64 bytes from 10.193.102.216: icmp_seq=10 ttl=63 time=6.43 ms
64 bytes from 10.193.102.216: icmp_seq=11 ttl=63 time=1.41 ms
64 bytes from 10.193.102.216: icmp_seq=12 ttl=63 time=6.75 ms
64 bytes from 10.193.102.216: icmp_seq=13 ttl=63 time=4.76 ms
64 bytes from 10.193.102.216: icmp_seq=14 ttl=63 time=3.85 ms
64 bytes from 10.193.102.216: icmp_seq=15 ttl=63 time=3.50 ms
64 bytes from 10.193.102.216: icmp_seq=16 ttl=63 time=1.37 ms

--- 10.193.102.216 ping statistics ---
16 packets transmitted, 16 received, 0% packet loss, time 15027ms
rtt min/avg/max/mdev = 1.349/3.900/6.749/1.985 ms

> Ideally you want enough transmit buffers to keep the link full, but not
> more. If the driver is using BQL, the network stack will help with this.
>
> > Below are the results on i.MX6UL/8MM/8MP/8ULP/93 platforms, i.MX6UL
> > and 8ULP only support Fast ethernet. Others support 1G.
>
> Thanks for the benchmark numbers. Please get into the habit of including
> them. We like to see justification for any sort of performance tweaks.
>
> 	Andrew
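[Editor's note, not part of the thread: ping's summary line can be recomputed from the per-reply samples. The sketch below uses the i.MX8ULP ring-size-512 samples above; `mdev` in iputils ping is the population standard deviation, so recomputing from the rounded display values matches the reported 1.349/3.900/6.749/1.985 line to within a thousandth of a millisecond.]

```python
import math

def ping_summary(rtts):
    """Recompute ping's "rtt min/avg/max/mdev" line from RTT samples (ms).

    iputils ping computes mdev as the population standard deviation:
    sqrt(mean(rtt^2) - mean(rtt)^2).
    """
    n = len(rtts)
    avg = sum(rtts) / n
    mdev = math.sqrt(sum(r * r for r in rtts) / n - avg * avg)
    return min(rtts), avg, max(rtts), mdev

# The i.MX8ULP ring-size-512 samples above, as displayed (i.e. rounded):
rtts = [5.70, 5.89, 3.37, 5.07, 1.47, 3.45, 1.35, 6.62,
        1.41, 6.43, 1.41, 6.75, 4.76, 3.85, 3.50, 1.37]
lo, avg, hi, mdev = ping_summary(rtts)
# ping reported 1.349/3.900/6.749/1.985; the last digit can differ because
# ping works on unrounded microsecond timestamps internally.
print(f"rtt min/avg/max/mdev = {lo:.3f}/{avg:.3f}/{hi:.3f}/{mdev:.3f} ms")
```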
* Re: [PATCH net 3/3] net: fec: increase the size of tx ring and update thresholds of tx ring
  2023-07-10  6:16 ` Wei Fang
@ 2023-07-12  1:16 ` Andrew Lunn
  0 siblings, 0 replies; 18+ messages in thread
From: Andrew Lunn @ 2023-07-12 1:16 UTC (permalink / raw)
To: Wei Fang
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, ast@kernel.org, daniel@iogearbox.net,
	hawk@kernel.org, john.fastabend@gmail.com, Shenwei Wang,
	Clark Wang, netdev@vger.kernel.org, dl-linux-imx,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org

> Sorry, the Vybrid platform is not currently maintained by us, and Vybrid
> is also not included in our usual Yocto SDK RC version.

Your Yocto SDK version does not matter. We are talking about mainline
here... You are maintaining the mainline driver, and submitting patches
to extend the mainline driver.

> Even if I find a Vybrid board, I think it probably can't run with the
> latest kernel image, because the latest kernel image does not match the
> old Yocto SDK, and the new Yocto SDK does not support the Vybrid
> platform. I also asked my colleague in the test team who is in charge
> of ethernet testing; she hadn't even heard of the Vybrid platform.

Well, 6.5-rc1 does run on Vybrid, I have a number of boards with it. You
will also find that many inflight entertainment systems have a box under
each row of seats in an aeroplane with a Vybrid controlling a Marvell
switch. Take a look at arch/arm/boot/dts/vf610-zii-*

You will also find a number of DTS files for imx51, imx6, imx7 used for
ZII products which are back-of-the-seat displays, and make heavy use of
networking with the FEC. So I have an interest in the FEC for all these
platforms.

> > And a bigger burst means more latency. Read about Buffer bloat.
>
> Perhaps, but not necessarily. For example, in some cases burst packets
> may stay in the Qdisc instead of the hardware queue if the ring size is
> small.

Which is what you want, so high priority packets can jump to the head of
the queue.
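[Editor's note, not part of the thread: the bufferbloat concern raised here can be made concrete with back-of-the-envelope arithmetic. Frames already committed to the hardware ring cannot be reordered by the Qdisc, so a deeper ring adds a worst-case head-of-line delay proportional to its size. The sketch below ignores preamble, inter-frame gap, and FCS overhead, so it is a lower bound.]

```python
def tx_ring_drain_ms(ring_entries, frame_bytes, link_bps):
    """Worst-case time for a completely full TX ring to drain, in ms.

    A newly queued packet can wait roughly this long behind frames that
    are already committed to the hardware -- the latency cost of a
    deeper ring. Wire overhead is ignored, so this is a lower bound.
    """
    return ring_entries * frame_bytes * 8 * 1000 / link_bps

# A 512-entry ring full of 1500-byte frames:
print(tx_ring_drain_ms(512, 1500, 1_000_000_000))  # 1 Gbit/s link
print(tx_ring_drain_ms(512, 1500, 100_000_000))    # Fast Ethernet link
```

At 1 Gbit/s the full ring is about 6 ms of buffering; at 100 Mbit/s (the i.MX6UL/8ULP case) the same ring holds over 60 ms, which is why ring depth matters much more on the Fast Ethernet parts.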
> > While you have iperf running saturating the link, try a ping as well.
> > How does ping latency change with more TX buffers?
>
> Per your suggestions, I tested on i.MX8ULP, i.MX8MM and i.MX93
> platforms. The results do not appear to be very different.

Thanks for the benchmark numbers. They look O.K.

	Andrew
end of thread, other threads:[~2023-07-12  1:16 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2023-07-04  8:29 [PATCH net 0/3] net: fec: fix some issues of ndo_xdp_xmit() wei.fang
2023-07-04  8:29 ` [PATCH net 1/3] net: fec: dynamically set the NETDEV_XDP_ACT_NDO_XMIT feature of XDP wei.fang
2023-07-04 23:41   ` Andrew Lunn
2023-07-04 23:54     ` Toke Høiland-Jørgensen
2023-07-05  6:31       ` Wei Fang
2023-07-04  8:29 ` [PATCH net 2/3] net: fec: recycle pages for transmitted XDP frames wei.fang
2023-07-04 23:48   ` Andrew Lunn
2023-07-05  1:56     ` Wei Fang
2023-07-04  8:29 ` [PATCH net 3/3] net: fec: increase the size of tx ring and update thresholds of tx ring wei.fang
2023-07-05  0:25   ` Andrew Lunn
2023-07-05  6:20     ` Wei Fang
2023-07-05 18:11       ` Jakub Kicinski
2023-07-06  1:44         ` Wei Fang
2023-07-06  3:11           ` Jakub Kicinski
2023-07-06  6:14             ` Wei Fang
2023-07-08 19:03   ` Andrew Lunn
2023-07-10  6:16     ` Wei Fang
2023-07-12  1:16       ` Andrew Lunn