From: Jesper Dangaard Brouer <jbrouer@redhat.com>
To: Wei Fang <wei.fang@nxp.com>,
	Jesper Dangaard Brouer <jbrouer@redhat.com>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"edumazet@google.com" <edumazet@google.com>,
	"kuba@kernel.org" <kuba@kernel.org>,
	"pabeni@redhat.com" <pabeni@redhat.com>,
	Shenwei Wang <shenwei.wang@nxp.com>,
	Clark Wang <xiaoning.wang@nxp.com>,
	"ast@kernel.org" <ast@kernel.org>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>,
	"hawk@kernel.org" <hawk@kernel.org>,
	"john.fastabend@gmail.com" <john.fastabend@gmail.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Cc: brouer@redhat.com, dl-linux-imx <linux-imx@nxp.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	Andrew Lunn <andrew@lunn.ch>
Subject: Re: [PATCH V3 net-next] net: fec: add XDP_TX feature support
Date: Wed, 2 Aug 2023 14:34:24 +0200
Message-ID: <6cfddca7-0c22-01ca-de43-a2b3f71aa75a@redhat.com>
In-Reply-To: <AM5PR04MB31394F01926FB20F95262E0A880BA@AM5PR04MB3139.eurprd04.prod.outlook.com>



On 02/08/2023 11.59, Wei Fang wrote:
>>
>> On 31/07/2023 08.00, Wei Fang wrote:
>>> The XDP_TX feature was not supported before, and all frames
>>> given the XDP_TX action were actually handled as XDP_DROP.
>>> So this patch adds XDP_TX support to the FEC driver.
>>>
>>> I tested the performance of XDP_TX feature in XDP_DRV and XDP_SKB
>>> modes on i.MX8MM-EVK and i.MX8MP-EVK platforms respectively, and
>>> the test steps and results are as follows.
>>>
>>> Step 1: Board A connects to the FEC port of the DUT and runs the
>>> pktgen_sample03_burst_single_flow.sh script to generate and send
>>> burst traffic to DUT. Note that the packet length was set to
>>> 64 bytes and the packet protocol was UDP in my test scenario.
>>>
>>> Step 2: The DUT runs the xdp2 program to transmit received UDP
>>> packets back out on the same port where they were received.
>>>
>>
>> The test results below need some more explanation, please.
>> (more inline code comments below)
>>
>>> root@imx8mmevk:~# ./xdp2 eth0
>>> proto 17:     150326 pkt/s
>>> proto 17:     141920 pkt/s
>>> proto 17:     147338 pkt/s
>>> proto 17:     140783 pkt/s
>>> proto 17:     150400 pkt/s
>>> proto 17:     134651 pkt/s
>>> proto 17:     134676 pkt/s
>>> proto 17:     134959 pkt/s
>>> proto 17:     148152 pkt/s
>>> proto 17:     149885 pkt/s
>>>
>>> root@imx8mmevk:~# ./xdp2 -S eth0
>>> proto 17:     131094 pkt/s
>>> proto 17:     134691 pkt/s
>>> proto 17:     138930 pkt/s
>>> proto 17:     129347 pkt/s
>>> proto 17:     133050 pkt/s
>>> proto 17:     132932 pkt/s
>>> proto 17:     136628 pkt/s
>>> proto 17:     132964 pkt/s
>>> proto 17:     131265 pkt/s
>>> proto 17:     135794 pkt/s
>>>
>>> root@imx8mpevk:~# ./xdp2 eth0
>>> proto 17:     135817 pkt/s
>>> proto 17:     142776 pkt/s
>>> proto 17:     142237 pkt/s
>>> proto 17:     135673 pkt/s
>>> proto 17:     139508 pkt/s
>>> proto 17:     147340 pkt/s
>>> proto 17:     133329 pkt/s
>>> proto 17:     141171 pkt/s
>>> proto 17:     146917 pkt/s
>>> proto 17:     135488 pkt/s
>>>
>>> root@imx8mpevk:~# ./xdp2 -S eth0
>>> proto 17:     133150 pkt/s
>>> proto 17:     133127 pkt/s
>>> proto 17:     133538 pkt/s
>>> proto 17:     133094 pkt/s
>>> proto 17:     133690 pkt/s
>>> proto 17:     133199 pkt/s
>>> proto 17:     133905 pkt/s
>>> proto 17:     132908 pkt/s
>>> proto 17:     133292 pkt/s
>>> proto 17:     133511 pkt/s
>>>
>>
>> For this driver, I would like to see a benchmark comparison between
>> XDP_TX and XDP_REDIRECT.
>>
> Okay, I'll do a comparison test.

Thanks.

> 
>> The code below could create a situation where XDP_REDIRECT is just
>> as fast as XDP_TX.  (Note that I expect XDP_TX to be faster than
>> XDP_REDIRECT.)
>>
> Could you explain why you expect XDP_TX should be faster than XDP_REDIRECT?

First of all: I explained the changes needed to improve XDP_TX below, as
comments on the code. Please read and address them.

XDP_TX should always be faster than XDP_REDIRECT, because it happens
locally in the driver and doesn't have to go through any generic
xdp_do_redirect code.
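
To make the difference in code paths concrete, here is a toy userspace
model (not kernel code). All names (toy_dev, toy_xdp_tx, toy_xdp_redirect,
hops) are made up for illustration, and the hop counter is only an artifact
of this model, not a measurement. It assumes the redirect path has to
resolve a target device and call through a per-device hook, while the
XDP_TX path queues directly on the receiving device.

```c
#include <assert.h>
#include <stddef.h>

/* Toy device: counts queued frames and exposes a per-device xmit hook. */
struct toy_dev {
    int tx_frames;
    int (*xmit)(struct toy_dev *dev);   /* stand-in for a device xmit hook */
};

static int hops;    /* steps taken on a path, counted only by this model */

static int toy_xmit(struct toy_dev *dev)
{
    assert(dev != NULL);
    hops++;
    dev->tx_frames++;
    return 0;
}

/* XDP_TX-style path: the driver queues on its own device directly. */
static int toy_xdp_tx(struct toy_dev *self)
{
    return toy_xmit(self);
}

/* XDP_REDIRECT-style path: generic code resolves the target first,
 * then calls into the target through its hook.
 */
static int toy_xdp_redirect(struct toy_dev **table, size_t n, size_t idx)
{
    hops++;                     /* generic target lookup */
    if (idx >= n || !table[idx])
        return -1;
    hops++;                     /* indirect call through the hook */
    return table[idx]->xmit(table[idx]);
}
```

In this model both paths end in the same toy_xmit(), but the redirect
path does extra work before it gets there. That extra generic work is
the reason for my expectation.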

For example, the mlx5 benchmarks in our XDP paper [1] show this: the
graph [2] compares XDP_TX, XDP_REDIRECT and DPDK.

  [1] https://dl.acm.org/doi/10.1145/3281411.3281443
  [2] https://github.com/xdp-project/xdp-paper/blob/master/figures/redirect-test.pdf


> What's the problem if XDP_TX is as fast as XDP_REDIRECT?
> 
>>> Signed-off-by: Wei Fang <wei.fang@nxp.com>
>>> ---
>>> V2 changes:
>>> According to Jakub's comments, the V2 patch adds two changes.
>>> 1. Call txq_trans_cond_update() in fec_enet_xdp_tx_xmit() to avoid
>>> tx timeout as XDP shares the queues with kernel stack.
>>> 2. Tx processing shouldn't call any XDP (or page pool) APIs if the
>>> "budget" is 0.
>>>
>>> V3 changes:
>>> 1. Remove the second change in V2, because this change has been
>>> separated into another patch and it has been submitted
>>> upstream [1].
>>> [1] https://lore.kernel.org/r/20230725074148.2936402-1-wei.fang@nxp.com
>>> ---
>>>    drivers/net/ethernet/freescale/fec.h      |  1 +
>>>    drivers/net/ethernet/freescale/fec_main.c | 80 ++++++++++++++++++-----
>>>    2 files changed, 65 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h
>>> index 8f1edcca96c4..f35445bddc7a 100644
>>> --- a/drivers/net/ethernet/freescale/fec.h
>>> +++ b/drivers/net/ethernet/freescale/fec.h
>>> @@ -547,6 +547,7 @@ enum {
>>>    enum fec_txbuf_type {
>>>      FEC_TXBUF_T_SKB,
>>>      FEC_TXBUF_T_XDP_NDO,
>>> +   FEC_TXBUF_T_XDP_TX,
>>>    };
>>>
>>>    struct fec_tx_buffer {
>>> diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
>>> index 14d0dc7ba3c9..2068fe95504e 100644
>>> --- a/drivers/net/ethernet/freescale/fec_main.c
>>> +++ b/drivers/net/ethernet/freescale/fec_main.c
>>> @@ -75,6 +75,8 @@
>>>
>>>    static void set_multicast_list(struct net_device *ndev);
>>>    static void fec_enet_itr_coal_set(struct net_device *ndev);
>>> +static int fec_enet_xdp_tx_xmit(struct net_device *ndev,
>>> +                           struct xdp_buff *xdp);
>>>
>>>    #define DRIVER_NAME       "fec"
>>>
>>> @@ -960,7 +962,8 @@ static void fec_enet_bd_init(struct net_device *dev)
>>>                                      txq->tx_buf[i].skb = NULL;
>>>                              }
>>>                      } else {
>>> -                           if (bdp->cbd_bufaddr)
>>> +                           if (bdp->cbd_bufaddr &&
>>> +                               txq->tx_buf[i].type == FEC_TXBUF_T_XDP_NDO)
>>>                                      dma_unmap_single(&fep->pdev->dev,
>>>                                                       fec32_to_cpu(bdp->cbd_bufaddr),
>>>                                                       fec16_to_cpu(bdp->cbd_datlen),
>>> @@ -1423,7 +1426,8 @@ fec_enet_tx_queue(struct net_device *ndev, u16 queue_id, int budget)
>>>                              break;
>>>
>>>                      xdpf = txq->tx_buf[index].xdp;
>>> -                   if (bdp->cbd_bufaddr)
>>> +                   if (bdp->cbd_bufaddr &&
>>> +                       txq->tx_buf[index].type == FEC_TXBUF_T_XDP_NDO)
>>>                              dma_unmap_single(&fep->pdev->dev,
>>>                                               fec32_to_cpu(bdp->cbd_bufaddr),
>>>                                               fec16_to_cpu(bdp->cbd_datlen),
>>> @@ -1482,7 +1486,7 @@ fec_enet_tx_queue(struct net_device *ndev, u16 queue_id, int budget)
>>>                      /* Free the sk buffer associated with this last transmit */
>>>                      dev_kfree_skb_any(skb);
>>>              } else {
>>> -                   xdp_return_frame(xdpf);
>>> +                   xdp_return_frame_rx_napi(xdpf);
>>>
>>>                      txq->tx_buf[index].xdp = NULL;
>>>                      /* restore default tx buffer type: FEC_TXBUF_T_SKB */
>>> @@ -1573,11 +1577,18 @@ fec_enet_run_xdp(struct fec_enet_private *fep, struct bpf_prog *prog,
>>>              }
>>>              break;
>>>
>>> -   default:
>>> -           bpf_warn_invalid_xdp_action(fep->netdev, prog, act);
>>> -           fallthrough;
>>> -
>>>      case XDP_TX:
>>> +           err = fec_enet_xdp_tx_xmit(fep->netdev, xdp);
>>
>> You should pass along the "sync" length value to fec_enet_xdp_tx_xmit().
>> Because we know the page comes from the same device (it is already DMA
>> mapped to it), we can do a DMA sync "to_device" with only the sync length.
>>
>>> +           if (err) {
>>
>> Add an unlikely(err), or do as the XDP_REDIRECT case above does and
>> handle the likely case "if (!err)" first.
>>
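
The branch-hint suggestion above can be sketched in userspace as follows.
This is only an illustration: the likely()/unlikely() macros are defined
locally on top of the GCC/Clang __builtin_expect builtin, and the verdict
names and the pages_recycled counter are hypothetical stand-ins, not the
driver's own symbols.

```c
#include <assert.h>

/* Userspace stand-ins for the kernel's branch hints (GCC/Clang builtin). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical verdict codes, defined only for this sketch. */
enum { TOY_XDP_CONSUMED = 1, TOY_XDP_TX = 2 };

/* Hypothetical counter standing in for "return the page to the pool". */
static int pages_recycled;

/* The error branch is marked unlikely, so the compiler can lay out the
 * successful transmit as the straight-line path.
 */
static int toy_xdp_tx_verdict(int err)
{
    assert(err <= 0);           /* 0 on success, negative errno on failure */
    if (unlikely(err)) {
        pages_recycled++;       /* failure: recycle the RX page */
        return TOY_XDP_CONSUMED;
    }
    return TOY_XDP_TX;
}
```

Writing it as "if (likely(!err)) return TOY_XDP_TX;" followed by the
error handling would be the equivalent "likely case first" form.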
>>> +                   ret = FEC_ENET_XDP_CONSUMED;
>>> +                   page = virt_to_head_page(xdp->data);
>>> +                   page_pool_put_page(rxq->page_pool, page, sync, true);
>>> +           } else {
>>> +                   ret = FEC_ENET_XDP_TX;
>>> +           }
>>> +           break;
>>> +
>>> +   default:
>>>              bpf_warn_invalid_xdp_action(fep->netdev, prog, act);
>>>              fallthrough;
>>>
>>> @@ -3793,7 +3804,8 @@ fec_enet_xdp_get_tx_queue(struct fec_enet_private *fep, int index)
>>>
>>>    static int fec_enet_txq_xmit_frame(struct fec_enet_private *fep,
>>>                                 struct fec_enet_priv_tx_q *txq,
>>> -                              struct xdp_frame *frame)
>>> +                              struct xdp_frame *frame,
>>> +                              bool ndo_xmit)
>>
>> E.g. add a dma_sync_len parameter.
>>
>>>    {
>>>      unsigned int index, status, estatus;
>>>      struct bufdesc *bdp;
>>> @@ -3813,10 +3825,24 @@ static int fec_enet_txq_xmit_frame(struct fec_enet_private *fep,
>>>
>>>      index = fec_enet_get_bd_index(bdp, &txq->bd);
>>>
>>> -   dma_addr = dma_map_single(&fep->pdev->dev, frame->data,
>>> -                             frame->len, DMA_TO_DEVICE);
>>> -   if (dma_mapping_error(&fep->pdev->dev, dma_addr))
>>> -           return -ENOMEM;
>>> +   if (ndo_xmit) {
>>> +           dma_addr = dma_map_single(&fep->pdev->dev, frame->data,
>>> +                                     frame->len, DMA_TO_DEVICE);
>>> +           if (dma_mapping_error(&fep->pdev->dev, dma_addr))
>>> +                   return -ENOMEM;
>>> +
>>> +           txq->tx_buf[index].type = FEC_TXBUF_T_XDP_NDO;
>>> +   } else {
>>> +           struct page *page = virt_to_page(frame->data);
>>> +
>>> +           dma_addr = page_pool_get_dma_addr(page) + sizeof(*frame) +
>>> +                      frame->headroom;
>>> +           dma_sync_single_for_device(&fep->pdev->dev, dma_addr,
>>> +                                      frame->len, DMA_BIDIRECTIONAL);
>>
>> Optimization: use dma_sync_len here instead of frame->len.
>>
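
A minimal userspace sketch of passing the sync length down instead of
deriving it from frame->len inside the transmit helper. Everything here is
hypothetical (toy_txq, toy_sync_len, toy_xmit_frame), and the rule used to
compute the length, the larger of the received length and the length after
the XDP program ran, is an assumption of this sketch, not something taken
from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of what the transmit helper was asked to do. */
struct toy_txq {
    size_t last_sync_len;   /* bytes passed to the (simulated) DMA sync */
    size_t last_frame_len;  /* bytes queued for transmission */
};

/* Assumed rule for this sketch: cover the larger of the received length
 * and the length after the XDP program ran, so that every byte the CPU
 * may have touched is included.
 */
static size_t toy_sync_len(size_t rx_len, size_t len_after_prog)
{
    return rx_len > len_after_prog ? rx_len : len_after_prog;
}

/* The caller supplies dma_sync_len; the helper does not derive it. */
static int toy_xmit_frame(struct toy_txq *txq, size_t frame_len,
                          size_t dma_sync_len)
{
    assert(txq != NULL);
    txq->last_frame_len = frame_len;
    txq->last_sync_len = dma_sync_len;  /* stand-in for the DMA sync call */
    return 0;
}
```

The point is only the plumbing: the RX path computes the length once and
hands it to the TX helper as an extra argument.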
>>> +           txq->tx_buf[index].type = FEC_TXBUF_T_XDP_TX;
>>> +   }
>>> +
>>> +   txq->tx_buf[index].xdp = frame;
>>>
>>>      status |= (BD_ENET_TX_INTR | BD_ENET_TX_LAST);
>>>      if (fep->bufdesc_ex)
>>> @@ -3835,9 +3861,6 @@ static int fec_enet_txq_xmit_frame(struct fec_enet_private *fep,
>>>              ebdp->cbd_esc = cpu_to_fec32(estatus);
>>>      }
>>>
>>> -   txq->tx_buf[index].type = FEC_TXBUF_T_XDP_NDO;
>>> -   txq->tx_buf[index].xdp = frame;
>>> -
>>>      /* Make sure the updates to rest of the descriptor are performed before
>>>       * transferring ownership.
>>>       */
>>> @@ -3863,6 +3886,31 @@ static int fec_enet_txq_xmit_frame(struct fec_enet_private *fep,
>>>      return 0;
>>>    }
>>>
>>> +static int fec_enet_xdp_tx_xmit(struct net_device *ndev,
>>> +                           struct xdp_buff *xdp)
>>> +{
>>
>> E.g. add a dma_sync_len parameter.
>>
>>> +   struct xdp_frame *xdpf = xdp_convert_buff_to_frame(xdp);
>>
>> XDP_TX can avoid this conversion to xdp_frame.
>> It would require some refactoring of fec_enet_txq_xmit_frame().
>>
>>> +   struct fec_enet_private *fep = netdev_priv(ndev);
>>> +   struct fec_enet_priv_tx_q *txq;
>>> +   int cpu = smp_processor_id();
>>> +   struct netdev_queue *nq;
>>> +   int queue, ret;
>>> +
>>> +   queue = fec_enet_xdp_get_tx_queue(fep, cpu);
>>> +   txq = fep->tx_queue[queue];
>>> +   nq = netdev_get_tx_queue(fep->netdev, queue);
>>> +
>>> +   __netif_tx_lock(nq, cpu);
>>
>> It is sad that XDP_TX takes a lock for each frame.
>>
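
To illustrate the per-frame locking cost, here is a toy userspace model
that only counts lock acquisitions. Batching several frames under one lock
is just one possible alternative, shown for contrast; it is not something
this review prescribes for the driver. All names (toy_queue,
toy_tx_per_frame, toy_tx_batched) are hypothetical.

```c
#include <assert.h>

/* Toy TX queue with a fake lock that only counts acquisitions. */
struct toy_queue {
    int locked;
    int lock_acquisitions;
    int frames_sent;
};

static void toy_lock(struct toy_queue *q)
{
    assert(!q->locked);
    q->locked = 1;
    q->lock_acquisitions++;
}

static void toy_unlock(struct toy_queue *q)
{
    assert(q->locked);
    q->locked = 0;
}

static void toy_xmit_one(struct toy_queue *q)
{
    assert(q->locked);          /* must be called with the lock held */
    q->frames_sent++;
}

/* What the patch does per XDP_TX frame: lock, send one frame, unlock. */
static void toy_tx_per_frame(struct toy_queue *q, int n)
{
    for (int i = 0; i < n; i++) {
        toy_lock(q);
        toy_xmit_one(q);
        toy_unlock(q);
    }
}

/* Contrast: take the lock once and send the whole batch under it. */
static void toy_tx_batched(struct toy_queue *q, int n)
{
    toy_lock(q);
    for (int i = 0; i < n; i++)
        toy_xmit_one(q);
    toy_unlock(q);
}
```

For a batch of 64 frames the first variant takes the lock 64 times and the
second takes it once, with the same number of frames sent.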
>>> +
>>> +   /* Avoid tx timeout as XDP shares the queue with kernel stack */
>>> +   txq_trans_cond_update(nq);
>>> +   ret = fec_enet_txq_xmit_frame(fep, txq, xdpf, false);
>>
>> Add/pass parameter dma_sync_len to fec_enet_txq_xmit_frame().
>>
>>
>>> +
>>> +   __netif_tx_unlock(nq);
>>> +
>>> +   return ret;
>>> +}
>>> +
>>>    static int fec_enet_xdp_xmit(struct net_device *dev,
>>>                           int num_frames,
>>>                           struct xdp_frame **frames,
>>> @@ -3885,7 +3933,7 @@ static int fec_enet_xdp_xmit(struct net_device *dev,
>>>      /* Avoid tx timeout as XDP shares the queue with kernel stack */
>>>      txq_trans_cond_update(nq);
>>>      for (i = 0; i < num_frames; i++) {
>>> -           if (fec_enet_txq_xmit_frame(fep, txq, frames[i]) < 0)
>>> +           if (fec_enet_txq_xmit_frame(fep, txq, frames[i], true) < 0)
>>>                      break;
>>>              sent_frames++;
>>>      }
> 

