From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Kurt Kanzenbach <kurt@linutronix.de>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>,
Przemek Kitszel <przemyslaw.kitszel@intel.com>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>,
"Paolo Abeni" <pabeni@redhat.com>,
Alexei Starovoitov <ast@kernel.org>,
"Daniel Borkmann" <daniel@iogearbox.net>,
Jesper Dangaard Brouer <hawk@kernel.org>,
John Fastabend <john.fastabend@gmail.com>,
Richard Cochran <richardcochran@gmail.com>,
Sriram Yagnaraman <sriram.yagnaraman@ericsson.com>,
Benjamin Steinke <benjamin.steinke@woks-audio.com>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
<intel-wired-lan@lists.osuosl.org>, <netdev@vger.kernel.org>,
<bpf@vger.kernel.org>,
Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Subject: Re: [PATCH iwl-next v9 6/6] igb: Add AF_XDP zero-copy Tx support
Date: Fri, 18 Oct 2024 12:07:00 +0200
Message-ID: <ZxIzRJlXA91Bapwt@boxer>
In-Reply-To: <20241018-b4-igb_zero_copy-v9-6-da139d78d796@linutronix.de>
On Fri, Oct 18, 2024 at 10:40:02AM +0200, Kurt Kanzenbach wrote:
> From: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
>
> Add support for AF_XDP zero-copy transmit path.
>
> A new TX buffer type IGB_TYPE_XSK is introduced to indicate that the Tx
> frame was allocated from the xsk buff pool, so igb_clean_tx_ring() and
> igb_clean_tx_irq() can clean the buffers correctly based on type.
>
> igb_xmit_zc() performs the actual packet transmit when AF_XDP zero-copy is
> enabled. We share the TX ring between slow path, XDP and AF_XDP
> zero-copy, so we use the netdev queue lock to ensure mutual exclusion.
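To illustrate the type-based cleaning described above, here is a minimal
user-space sketch, not driver code. The names buf_type, tx_buf, clean_stats
and clean_ring() are hypothetical. It also assumes that XSK frames need no
per-frame unmap because the buffer pool owns the DMA mapping, which is
inferred from the skip_for_xsk label in the diff below rather than stated in
the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: names echo the patch, but this is not igb code. */
enum buf_type { TYPE_SKB = 0, TYPE_XDP, TYPE_XSK };

struct tx_buf {
	enum buf_type type;
	int mapped;	/* 1 while the driver holds a DMA mapping for the slot */
};

struct clean_stats {
	uint32_t freed;		/* SKB/XDP buffers released and unmapped */
	uint32_t xsk_frames;	/* XSK frames to report to the pool in one go */
};

/* Walk the slots from start up to (not including) end, wrapping at count. */
static struct clean_stats clean_ring(struct tx_buf *ring, size_t count,
				     size_t start, size_t end)
{
	struct clean_stats st = { 0, 0 };
	size_t i = start;

	assert(count > 0 && start < count && end < count);

	while (i != end) {
		struct tx_buf *b = &ring[i];

		if (b->type == TYPE_XSK) {
			/* Assumed: the pool owns the mapping, so only count. */
			st.xsk_frames++;
		} else {
			/* SKB or XDP frame: release it and drop the mapping. */
			b->mapped = 0;
			st.freed++;
		}

		if (++i == count)
			i = 0;
	}

	return st;
}
```

The XSK count is accumulated and handed back once, mirroring how the patch
calls xsk_tx_completed() a single time after the loop instead of per frame.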
>
> Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
> [Kurt: Set olinfo_status in igb_xmit_zc() so that frames are transmitted,
> Use READ_ONCE() for xsk_pool and check Tx disabled and carrier in
> igb_xmit_zc(), Add FIXME for RS bit]
> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
I didn't give you my tag on this patch in the previous revision, but from
what I can see now it can stay here :)
Finally, thanks!
> ---
> drivers/net/ethernet/intel/igb/igb.h | 2 +
> drivers/net/ethernet/intel/igb/igb_main.c | 61 +++++++++++++++++++++++++-----
> drivers/net/ethernet/intel/igb/igb_xsk.c | 63 +++++++++++++++++++++++++++++++
> 3 files changed, 116 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
> index e4a85867aa18..02f340280d20 100644
> --- a/drivers/net/ethernet/intel/igb/igb.h
> +++ b/drivers/net/ethernet/intel/igb/igb.h
> @@ -258,6 +258,7 @@ enum igb_tx_flags {
> enum igb_tx_buf_type {
> IGB_TYPE_SKB = 0,
> IGB_TYPE_XDP,
> + IGB_TYPE_XSK
> };
>
> /* wrapper around a pointer to a socket buffer,
> @@ -859,6 +860,7 @@ bool igb_alloc_rx_buffers_zc(struct igb_ring *rx_ring,
> void igb_clean_rx_ring_zc(struct igb_ring *rx_ring);
> int igb_clean_rx_irq_zc(struct igb_q_vector *q_vector,
> struct xsk_buff_pool *xsk_pool, const int budget);
> +bool igb_xmit_zc(struct igb_ring *tx_ring, struct xsk_buff_pool *xsk_pool);
> int igb_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags);
>
> #endif /* _IGB_H_ */
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index 711b60cab594..4587877d1761 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -2979,6 +2979,9 @@ static int igb_xdp_xmit(struct net_device *dev, int n,
> if (unlikely(!tx_ring))
> return -ENXIO;
>
> + if (unlikely(test_bit(IGB_RING_FLAG_TX_DISABLED, &tx_ring->flags)))
> + return -ENXIO;
> +
> nq = txring_txq(tx_ring);
> __netif_tx_lock(nq, cpu);
>
> @@ -3326,7 +3329,8 @@ static int igb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> netdev->priv_flags |= IFF_SUPP_NOFCS;
>
> netdev->priv_flags |= IFF_UNICAST_FLT;
> - netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT;
> + netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
> + NETDEV_XDP_ACT_XSK_ZEROCOPY;
>
> /* MTU range: 68 - 9216 */
> netdev->min_mtu = ETH_MIN_MTU;
> @@ -4900,15 +4904,20 @@ void igb_clean_tx_ring(struct igb_ring *tx_ring)
> {
> u16 i = tx_ring->next_to_clean;
> struct igb_tx_buffer *tx_buffer = &tx_ring->tx_buffer_info[i];
> + u32 xsk_frames = 0;
>
> while (i != tx_ring->next_to_use) {
> union e1000_adv_tx_desc *eop_desc, *tx_desc;
>
> /* Free all the Tx ring sk_buffs or xdp frames */
> - if (tx_buffer->type == IGB_TYPE_SKB)
> + if (tx_buffer->type == IGB_TYPE_SKB) {
> dev_kfree_skb_any(tx_buffer->skb);
> - else
> + } else if (tx_buffer->type == IGB_TYPE_XDP) {
> xdp_return_frame(tx_buffer->xdpf);
> + } else if (tx_buffer->type == IGB_TYPE_XSK) {
> + xsk_frames++;
> + goto skip_for_xsk;
> + }
>
> /* unmap skb header data */
> dma_unmap_single(tx_ring->dev,
> @@ -4939,6 +4948,7 @@ void igb_clean_tx_ring(struct igb_ring *tx_ring)
> DMA_TO_DEVICE);
> }
>
> +skip_for_xsk:
> tx_buffer->next_to_watch = NULL;
>
> /* move us one more past the eop_desc for start of next pkt */
> @@ -4953,6 +4963,9 @@ void igb_clean_tx_ring(struct igb_ring *tx_ring)
> /* reset BQL for queue */
> netdev_tx_reset_queue(txring_txq(tx_ring));
>
> + if (tx_ring->xsk_pool && xsk_frames)
> + xsk_tx_completed(tx_ring->xsk_pool, xsk_frames);
> +
> /* reset next_to_use and next_to_clean */
> tx_ring->next_to_use = 0;
> tx_ring->next_to_clean = 0;
> @@ -6486,6 +6499,9 @@ netdev_tx_t igb_xmit_frame_ring(struct sk_buff *skb,
> return NETDEV_TX_BUSY;
> }
>
> + if (unlikely(test_bit(IGB_RING_FLAG_TX_DISABLED, &tx_ring->flags)))
> + return NETDEV_TX_BUSY;
> +
> /* record the location of the first descriptor for this packet */
> first = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
> first->type = IGB_TYPE_SKB;
> @@ -8260,13 +8276,18 @@ static int igb_poll(struct napi_struct *napi, int budget)
> **/
> static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
> {
> - struct igb_adapter *adapter = q_vector->adapter;
> - struct igb_ring *tx_ring = q_vector->tx.ring;
> - struct igb_tx_buffer *tx_buffer;
> - union e1000_adv_tx_desc *tx_desc;
> unsigned int total_bytes = 0, total_packets = 0;
> + struct igb_adapter *adapter = q_vector->adapter;
> unsigned int budget = q_vector->tx.work_limit;
> + struct igb_ring *tx_ring = q_vector->tx.ring;
> unsigned int i = tx_ring->next_to_clean;
> + union e1000_adv_tx_desc *tx_desc;
> + struct igb_tx_buffer *tx_buffer;
> + struct xsk_buff_pool *xsk_pool;
> + int cpu = smp_processor_id();
> + bool xsk_xmit_done = true;
> + struct netdev_queue *nq;
> + u32 xsk_frames = 0;
>
> if (test_bit(__IGB_DOWN, &adapter->state))
> return true;
> @@ -8297,10 +8318,14 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
> total_packets += tx_buffer->gso_segs;
>
> /* free the skb */
> - if (tx_buffer->type == IGB_TYPE_SKB)
> + if (tx_buffer->type == IGB_TYPE_SKB) {
> napi_consume_skb(tx_buffer->skb, napi_budget);
> - else
> + } else if (tx_buffer->type == IGB_TYPE_XDP) {
> xdp_return_frame(tx_buffer->xdpf);
> + } else if (tx_buffer->type == IGB_TYPE_XSK) {
> + xsk_frames++;
> + goto skip_for_xsk;
> + }
>
> /* unmap skb header data */
> dma_unmap_single(tx_ring->dev,
> @@ -8332,6 +8357,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
> }
> }
>
> +skip_for_xsk:
> /* move us one more past the eop_desc for start of next pkt */
> tx_buffer++;
> tx_desc++;
> @@ -8360,6 +8386,21 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
> q_vector->tx.total_bytes += total_bytes;
> q_vector->tx.total_packets += total_packets;
>
> + xsk_pool = READ_ONCE(tx_ring->xsk_pool);
> + if (xsk_pool) {
> + if (xsk_frames)
> + xsk_tx_completed(xsk_pool, xsk_frames);
> + if (xsk_uses_need_wakeup(xsk_pool))
> + xsk_set_tx_need_wakeup(xsk_pool);
> +
> + nq = txring_txq(tx_ring);
> + __netif_tx_lock(nq, cpu);
> + /* Avoid transmit queue timeout since we share it with the slow path */
> + txq_trans_cond_update(nq);
> + xsk_xmit_done = igb_xmit_zc(tx_ring, xsk_pool);
> + __netif_tx_unlock(nq);
> + }
> +
> if (test_bit(IGB_RING_FLAG_TX_DETECT_HANG, &tx_ring->flags)) {
> struct e1000_hw *hw = &adapter->hw;
>
> @@ -8422,7 +8463,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
> }
> }
>
> - return !!budget;
> + return !!budget && xsk_xmit_done;
> }
>
> /**
> diff --git a/drivers/net/ethernet/intel/igb/igb_xsk.c b/drivers/net/ethernet/intel/igb/igb_xsk.c
> index 3d64a9f6360c..157d43787fa0 100644
> --- a/drivers/net/ethernet/intel/igb/igb_xsk.c
> +++ b/drivers/net/ethernet/intel/igb/igb_xsk.c
> @@ -461,6 +461,69 @@ int igb_clean_rx_irq_zc(struct igb_q_vector *q_vector,
> return failure ? budget : (int)total_packets;
> }
>
> +bool igb_xmit_zc(struct igb_ring *tx_ring, struct xsk_buff_pool *xsk_pool)
> +{
> + unsigned int budget = igb_desc_unused(tx_ring);
> + u32 cmd_type, olinfo_status, nb_pkts, i = 0;
> + struct xdp_desc *descs = xsk_pool->tx_descs;
> + union e1000_adv_tx_desc *tx_desc = NULL;
> + struct igb_tx_buffer *tx_buffer_info;
> + unsigned int total_bytes = 0;
> + dma_addr_t dma;
> +
> + if (!netif_carrier_ok(tx_ring->netdev))
> + return true;
> +
> + if (test_bit(IGB_RING_FLAG_TX_DISABLED, &tx_ring->flags))
> + return true;
> +
> + nb_pkts = xsk_tx_peek_release_desc_batch(xsk_pool, budget);
> + if (!nb_pkts)
> + return true;
> +
> + while (nb_pkts-- > 0) {
> + dma = xsk_buff_raw_get_dma(xsk_pool, descs[i].addr);
> + xsk_buff_raw_dma_sync_for_device(xsk_pool, dma, descs[i].len);
> +
> + tx_buffer_info = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
> + tx_buffer_info->bytecount = descs[i].len;
> + tx_buffer_info->type = IGB_TYPE_XSK;
> + tx_buffer_info->xdpf = NULL;
> + tx_buffer_info->gso_segs = 1;
> + tx_buffer_info->time_stamp = jiffies;
> +
> + tx_desc = IGB_TX_DESC(tx_ring, tx_ring->next_to_use);
> + tx_desc->read.buffer_addr = cpu_to_le64(dma);
> +
> + /* put descriptor type bits */
> + cmd_type = E1000_ADVTXD_DTYP_DATA | E1000_ADVTXD_DCMD_DEXT |
> + E1000_ADVTXD_DCMD_IFCS;
> + olinfo_status = descs[i].len << E1000_ADVTXD_PAYLEN_SHIFT;
> +
> + /* FIXME: This sets the Report Status (RS) bit for every
> + * descriptor. One nice to have optimization would be to set it
> + * only for the last descriptor in the whole batch. See Intel
> + * ice driver for an example on how to do it.
> + */
> + cmd_type |= descs[i].len | IGB_TXD_DCMD;
> + tx_desc->read.cmd_type_len = cpu_to_le32(cmd_type);
> + tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status);
> +
> + total_bytes += descs[i].len;
> +
> + i++;
> + tx_ring->next_to_use++;
> + tx_buffer_info->next_to_watch = tx_desc;
> + if (tx_ring->next_to_use == tx_ring->count)
> + tx_ring->next_to_use = 0;
> + }
> +
> + netdev_tx_sent_queue(txring_txq(tx_ring), total_bytes);
> + igb_xdp_ring_update_tail(tx_ring);
> +
> + return nb_pkts < budget;
> +}
> +
> int igb_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
> {
> struct igb_adapter *adapter = netdev_priv(dev);
>
> --
> 2.39.5
>
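For readers following the batch loop in igb_xmit_zc(), the producer-index
wraparound and the "was the budget exhausted" return value can be modelled in
plain user-space C. This is only a sketch under stated assumptions: struct
ring, xmit_batch() and drain() are hypothetical names, the fixed 8-slot array
is arbitrary, and nothing here touches hardware descriptors. The drain()
helper only demonstrates a C language fact: an unsigned counter consumed by
`while (n-- > 0)` is left at UINT32_MAX after the loop. That is why the sketch
iterates with a separate index and leaves the packet count untouched before
comparing it with the budget. Whether this matters for the driver is not
something the sketch settles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of a Tx ring; not the igb structures. */
struct ring {
	uint32_t count;		/* number of slots in use, at most 8 here */
	uint32_t next_to_use;	/* producer index */
	uint32_t len[8];	/* per-slot frame length */
};

/*
 * Queue up to budget frames whose lengths are listed in lens.
 * Returns true when fewer frames were available than the budget allowed,
 * i.e. the source was drained and no further work is pending.
 */
static bool xmit_batch(struct ring *r, const uint32_t *lens,
		       uint32_t avail, uint32_t budget, uint32_t *total_bytes)
{
	uint32_t nb_pkts = avail < budget ? avail : budget;
	uint32_t i;

	assert(r->count > 0 && r->count <= 8);

	*total_bytes = 0;
	for (i = 0; i < nb_pkts; i++) {
		r->len[r->next_to_use] = lens[i];
		*total_bytes += lens[i];

		/* Advance the producer index and wrap at the end of the ring. */
		if (++r->next_to_use == r->count)
			r->next_to_use = 0;
	}

	/* nb_pkts was not modified by the loop, so the comparison is valid. */
	return nb_pkts < budget;
}

/* Value left in an unsigned counter after a post-decrement drain loop. */
static uint32_t drain(uint32_t n)
{
	while (n-- > 0) {
		/* one packet would be consumed here */
	}
	return n;
}
```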