* [PATCH iwl-next v4 0/4] igb: Add support for AF_XDP zero-copy
@ 2023-08-04 8:40 Sriram Yagnaraman
2023-08-04 8:40 ` [PATCH iwl-next v4 1/4] igb: prepare for AF_XDP zero-copy support Sriram Yagnaraman
` (4 more replies)
0 siblings, 5 replies; 12+ messages in thread
From: Sriram Yagnaraman @ 2023-08-04 8:40 UTC (permalink / raw)
Cc: intel-wired-lan, bpf, netdev, Jesse Brandeburg, Tony Nguyen,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Björn Töpel, Magnus Karlsson,
Maciej Fijalkowski, Jonathan Lemon, Sriram Yagnaraman
The first two patches add helper functions to prepare for AF_XDP
zero-copy support, which arrives in the last two patches, one each
for the Rx and Tx paths.
As mentioned in the v1 patchset [0], I don't have access to an actual IGB
device to provide accurate performance numbers. I have used the Intel
82576EB emulator in QEMU [1] to test the changes to the IGB driver.
The tests use one isolated vCPU for RX/TX and one isolated vCPU for the
xdp-sock application [2]. I hope these measurements give at least some
indication of the performance increase when using ZC, especially
in the TX path. It would be great if someone with a real IGB NIC could
test the patches.
AF_XDP performance using 64-byte packets, in Kpps:
Benchmark:  XDP-SKB  XDP-DRV  XDP-DRV(ZC)
rxdrop      220      235      350
txpush      1.000    1.000    410
l2fwd       1.000    1.000    200
AF_XDP performance using 1500-byte packets, in Kpps:
Benchmark:  XDP-SKB  XDP-DRV  XDP-DRV(ZC)
rxdrop      200      210      310
txpush      1.000    1.000    410
l2fwd       0.900    1.000    160
[0]: https://lore.kernel.org/intel-wired-lan/20230704095915.9750-1-sriram.yagnaraman@est.tech/
[1]: https://www.qemu.org/docs/master/system/devices/igb.html
[2]: https://github.com/xdp-project/bpf-examples/tree/master/AF_XDP-example
v3->v4:
- NULL check buffer_info in igb_dump before dereferencing (Simon Horman)
v2->v3:
- Avoid TX unit hang when using AF_XDP zero-copy by setting time_stamp
on the tx_buffer_info
- Fix uninitialized nb_buffs (Simon Horman)
v1->v2:
- Use batch XSK APIs (Maciej Fijalkowski)
- Follow reverse xmas tree convention and remove the ternary operator
use (Simon Horman)
Sriram Yagnaraman (4):
igb: prepare for AF_XDP zero-copy support
igb: Introduce XSK data structures and helpers
igb: add AF_XDP zero-copy Rx support
igb: add AF_XDP zero-copy Tx support
drivers/net/ethernet/intel/igb/Makefile | 2 +-
drivers/net/ethernet/intel/igb/igb.h | 35 +-
drivers/net/ethernet/intel/igb/igb_main.c | 186 ++++++--
drivers/net/ethernet/intel/igb/igb_xsk.c | 522 ++++++++++++++++++++++
4 files changed, 698 insertions(+), 47 deletions(-)
create mode 100644 drivers/net/ethernet/intel/igb/igb_xsk.c
--
2.34.1
* [PATCH iwl-next v4 1/4] igb: prepare for AF_XDP zero-copy support
2023-08-04 8:40 [PATCH iwl-next v4 0/4] igb: Add support for AF_XDP zero-copy Sriram Yagnaraman
@ 2023-08-04 8:40 ` Sriram Yagnaraman
2023-08-04 8:40 ` [PATCH iwl-next v4 2/4] igb: Introduce XSK data structures and helpers Sriram Yagnaraman
` (3 subsequent siblings)
4 siblings, 0 replies; 12+ messages in thread
From: Sriram Yagnaraman @ 2023-08-04 8:40 UTC (permalink / raw)
Cc: intel-wired-lan, bpf, netdev, Jesse Brandeburg, Tony Nguyen,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Björn Töpel, Magnus Karlsson,
Maciej Fijalkowski, Jonathan Lemon, Sriram Yagnaraman
Always call igb_xdp_ring_update_tail() under __netif_tx_lock, and add a
comment to indicate that. This is needed to share the same TX ring
between the XDP, XSK and slow paths.
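The locking rule above can be illustrated with a small user-space model.
This is only a sketch: every model_* name is hypothetical, the lock is a
plain flag rather than a real netdev queue lock, and ring wrap-around is
ignored. It shows the ordering this patch moves to in igb_xdp_xmit(),
where the tail is written before the queue lock is released.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins; these are not the kernel types. */
struct model_txq {
	bool locked;		/* models the per-queue TX lock */
};

struct model_ring {
	struct model_txq *txq;
	uint16_t next_to_use;	/* next descriptor software will fill */
	uint16_t hw_tail;	/* models the value written to the tail register */
};

static void model_tx_lock(struct model_txq *q)
{
	assert(!q->locked);
	q->locked = true;
}

static void model_tx_unlock(struct model_txq *q)
{
	assert(q->locked);
	q->locked = false;
}

/* Mirrors the documented contract: the caller must hold the queue lock. */
static void model_ring_update_tail(struct model_ring *r)
{
	assert(r->txq->locked);
	r->hw_tail = r->next_to_use;
}

/* Queue n descriptors and optionally flush; returns the number queued. */
static int model_xmit(struct model_ring *r, int n, bool flush)
{
	model_tx_lock(r->txq);
	r->next_to_use = (uint16_t)(r->next_to_use + n);
	if (flush)
		model_ring_update_tail(r);	/* still under the lock */
	model_tx_unlock(r->txq);

	return n;
}
```

With this ordering, no other user of the same ring can advance
next_to_use between the last descriptor write and the tail update.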
Remove the static qualifier from the following functions so that they can
be called from the XSK-specific file added in later patches:
- igb_xdp_tx_queue_mapping
- igb_xdp_ring_update_tail
- igb_clean_tx_ring
- igb_clean_rx_ring
- igb_run_xdp
- igb_process_skb_fields
Introduce igb_xdp_is_enabled() to check if an XDP program is assigned to
the device.
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
---
drivers/net/ethernet/intel/igb/igb.h | 15 ++++++++++++
drivers/net/ethernet/intel/igb/igb_main.c | 29 +++++++++++------------
2 files changed, 29 insertions(+), 15 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 015b78144114..58e79eb69f92 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -718,6 +718,8 @@ extern char igb_driver_name[];
int igb_xmit_xdp_ring(struct igb_adapter *adapter,
struct igb_ring *ring,
struct xdp_frame *xdpf);
+struct igb_ring *igb_xdp_tx_queue_mapping(struct igb_adapter *adapter);
+void igb_xdp_ring_update_tail(struct igb_ring *ring);
int igb_open(struct net_device *netdev);
int igb_close(struct net_device *netdev);
int igb_up(struct igb_adapter *);
@@ -731,12 +733,20 @@ int igb_setup_tx_resources(struct igb_ring *);
int igb_setup_rx_resources(struct igb_ring *);
void igb_free_tx_resources(struct igb_ring *);
void igb_free_rx_resources(struct igb_ring *);
+void igb_clean_tx_ring(struct igb_ring *tx_ring);
+void igb_clean_rx_ring(struct igb_ring *rx_ring);
void igb_configure_tx_ring(struct igb_adapter *, struct igb_ring *);
void igb_configure_rx_ring(struct igb_adapter *, struct igb_ring *);
void igb_setup_tctl(struct igb_adapter *);
void igb_setup_rctl(struct igb_adapter *);
void igb_setup_srrctl(struct igb_adapter *, struct igb_ring *);
netdev_tx_t igb_xmit_frame_ring(struct sk_buff *, struct igb_ring *);
+struct sk_buff *igb_run_xdp(struct igb_adapter *adapter,
+ struct igb_ring *rx_ring,
+ struct xdp_buff *xdp);
+void igb_process_skb_fields(struct igb_ring *rx_ring,
+ union e1000_adv_rx_desc *rx_desc,
+ struct sk_buff *skb);
void igb_alloc_rx_buffers(struct igb_ring *, u16);
void igb_update_stats(struct igb_adapter *);
bool igb_has_link(struct igb_adapter *adapter);
@@ -797,6 +807,11 @@ static inline struct netdev_queue *txring_txq(const struct igb_ring *tx_ring)
return netdev_get_tx_queue(tx_ring->netdev, tx_ring->queue_index);
}
+static inline bool igb_xdp_is_enabled(struct igb_adapter *adapter)
+{
+ return !!adapter->xdp_prog;
+}
+
int igb_add_filter(struct igb_adapter *adapter,
struct igb_nfc_filter *input);
int igb_erase_filter(struct igb_adapter *adapter,
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 9a2561409b06..775c78df73fb 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -117,8 +117,6 @@ static void igb_configure_tx(struct igb_adapter *);
static void igb_configure_rx(struct igb_adapter *);
static void igb_clean_all_tx_rings(struct igb_adapter *);
static void igb_clean_all_rx_rings(struct igb_adapter *);
-static void igb_clean_tx_ring(struct igb_ring *);
-static void igb_clean_rx_ring(struct igb_ring *);
static void igb_set_rx_mode(struct net_device *);
static void igb_update_phy_info(struct timer_list *);
static void igb_watchdog(struct timer_list *);
@@ -2939,7 +2937,8 @@ static int igb_xdp(struct net_device *dev, struct netdev_bpf *xdp)
}
}
-static void igb_xdp_ring_update_tail(struct igb_ring *ring)
+/* This function assumes __netif_tx_lock is held by the caller. */
+void igb_xdp_ring_update_tail(struct igb_ring *ring)
{
/* Force memory writes to complete before letting h/w know there
* are new descriptors to fetch.
@@ -2948,7 +2947,7 @@ static void igb_xdp_ring_update_tail(struct igb_ring *ring)
writel(ring->next_to_use, ring->tail);
}
-static struct igb_ring *igb_xdp_tx_queue_mapping(struct igb_adapter *adapter)
+struct igb_ring *igb_xdp_tx_queue_mapping(struct igb_adapter *adapter)
{
unsigned int r_idx = smp_processor_id();
@@ -3025,11 +3024,11 @@ static int igb_xdp_xmit(struct net_device *dev, int n,
nxmit++;
}
- __netif_tx_unlock(nq);
-
if (unlikely(flags & XDP_XMIT_FLUSH))
igb_xdp_ring_update_tail(tx_ring);
+ __netif_tx_unlock(nq);
+
return nxmit;
}
@@ -4897,7 +4896,7 @@ static void igb_free_all_tx_resources(struct igb_adapter *adapter)
* igb_clean_tx_ring - Free Tx Buffers
* @tx_ring: ring to be cleaned
**/
-static void igb_clean_tx_ring(struct igb_ring *tx_ring)
+void igb_clean_tx_ring(struct igb_ring *tx_ring)
{
u16 i = tx_ring->next_to_clean;
struct igb_tx_buffer *tx_buffer = &tx_ring->tx_buffer_info[i];
@@ -5016,7 +5015,7 @@ static void igb_free_all_rx_resources(struct igb_adapter *adapter)
* igb_clean_rx_ring - Free Rx Buffers per Queue
* @rx_ring: ring to free buffers from
**/
-static void igb_clean_rx_ring(struct igb_ring *rx_ring)
+void igb_clean_rx_ring(struct igb_ring *rx_ring)
{
u16 i = rx_ring->next_to_clean;
@@ -6631,7 +6630,7 @@ static int igb_change_mtu(struct net_device *netdev, int new_mtu)
struct igb_adapter *adapter = netdev_priv(netdev);
int max_frame = new_mtu + IGB_ETH_PKT_HDR_PAD;
- if (adapter->xdp_prog) {
+ if (igb_xdp_is_enabled(adapter)) {
int i;
for (i = 0; i < adapter->num_rx_queues; i++) {
@@ -8600,9 +8599,9 @@ static struct sk_buff *igb_build_skb(struct igb_ring *rx_ring,
return skb;
}
-static struct sk_buff *igb_run_xdp(struct igb_adapter *adapter,
- struct igb_ring *rx_ring,
- struct xdp_buff *xdp)
+struct sk_buff *igb_run_xdp(struct igb_adapter *adapter,
+ struct igb_ring *rx_ring,
+ struct xdp_buff *xdp)
{
int err, result = IGB_XDP_PASS;
struct bpf_prog *xdp_prog;
@@ -8798,9 +8797,9 @@ static bool igb_cleanup_headers(struct igb_ring *rx_ring,
* order to populate the hash, checksum, VLAN, timestamp, protocol, and
* other fields within the skb.
**/
-static void igb_process_skb_fields(struct igb_ring *rx_ring,
- union e1000_adv_rx_desc *rx_desc,
- struct sk_buff *skb)
+void igb_process_skb_fields(struct igb_ring *rx_ring,
+ union e1000_adv_rx_desc *rx_desc,
+ struct sk_buff *skb)
{
struct net_device *dev = rx_ring->netdev;
--
2.34.1
* [PATCH iwl-next v4 2/4] igb: Introduce XSK data structures and helpers
2023-08-04 8:40 [PATCH iwl-next v4 0/4] igb: Add support for AF_XDP zero-copy Sriram Yagnaraman
2023-08-04 8:40 ` [PATCH iwl-next v4 1/4] igb: prepare for AF_XDP zero-copy support Sriram Yagnaraman
@ 2023-08-04 8:40 ` Sriram Yagnaraman
2023-08-04 8:40 ` [PATCH iwl-next v4 3/4] igb: add AF_XDP zero-copy Rx support Sriram Yagnaraman
` (2 subsequent siblings)
4 siblings, 0 replies; 12+ messages in thread
From: Sriram Yagnaraman @ 2023-08-04 8:40 UTC (permalink / raw)
Cc: intel-wired-lan, bpf, netdev, Jesse Brandeburg, Tony Nguyen,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Björn Töpel, Magnus Karlsson,
Maciej Fijalkowski, Jonathan Lemon, Sriram Yagnaraman
Add the following ring flags:
- IGB_RING_FLAG_TX_DISABLED (set while the xsk pool is being set up)
- IGB_RING_FLAG_AF_XDP_ZC (set while an xsk pool is active)
Add an xdp_buff array for use with the XSK receive batch API, and a
pointer to xsk_pool in igb_ring.
Add enable/disable functions for TX and RX rings.
Add enable/disable functions for the XSK pool.
Add an xsk wakeup function.
None of the above functionality will be active until
NETDEV_XDP_ACT_XSK_ZEROCOPY is advertised in netdev->xdp_features.
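As a rough illustration of how the two ring flags gate the wakeup helper,
the sketch below models the sequence of checks performed before NAPI is
kicked. It is a user-space approximation only: all model_* names are
hypothetical, the flags are plain bit numbers in an unsigned long, and the
interrupt trigger itself is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers, in the spirit of the ring flag enum (hypothetical names). */
enum model_ring_flag {
	MODEL_RING_FLAG_TX_DISABLED,
	MODEL_RING_FLAG_AF_XDP_ZC,
};

struct model_ring {
	unsigned long flags;
	bool has_pool;		/* models ring->xsk_pool != NULL */
};

struct model_adapter {
	bool down;		/* models the adapter "down" state bit */
	bool xdp_enabled;	/* models an attached XDP program */
	uint32_t num_tx_queues;
	struct model_ring *tx_ring;	/* array of num_tx_queues rings */
};

static void model_set_bit(int nr, unsigned long *flags)
{
	assert(nr >= 0 && nr < (int)(sizeof(*flags) * 8));
	*flags |= 1UL << nr;
}

static void model_clear_bit(int nr, unsigned long *flags)
{
	assert(nr >= 0 && nr < (int)(sizeof(*flags) * 8));
	*flags &= ~(1UL << nr);
}

static bool model_test_bit(int nr, const unsigned long *flags)
{
	return (*flags >> nr) & 1UL;
}

/* Returns 0 when a wakeup may proceed, or a negative errno otherwise. */
static int model_xsk_wakeup_check(const struct model_adapter *a, uint32_t qid)
{
	const struct model_ring *ring;

	if (a->down)
		return -ENETDOWN;
	if (!a->xdp_enabled)
		return -EINVAL;
	if (qid >= a->num_tx_queues)
		return -EINVAL;

	ring = &a->tx_ring[qid];
	if (model_test_bit(MODEL_RING_FLAG_TX_DISABLED, &ring->flags))
		return -ENETDOWN;	/* ring is being reconfigured */
	if (!ring->has_pool)
		return -EINVAL;

	return 0;
}
```

The order of the checks follows igb_xsk_wakeup() in this patch; a ring
with TX_DISABLED set reports -ENETDOWN even if a pool is attached.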
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
---
drivers/net/ethernet/intel/igb/Makefile | 2 +-
drivers/net/ethernet/intel/igb/igb.h | 14 +-
drivers/net/ethernet/intel/igb/igb_main.c | 9 +
drivers/net/ethernet/intel/igb/igb_xsk.c | 210 ++++++++++++++++++++++
4 files changed, 233 insertions(+), 2 deletions(-)
create mode 100644 drivers/net/ethernet/intel/igb/igb_xsk.c
diff --git a/drivers/net/ethernet/intel/igb/Makefile b/drivers/net/ethernet/intel/igb/Makefile
index 394c1e0656b9..86d25dba507d 100644
--- a/drivers/net/ethernet/intel/igb/Makefile
+++ b/drivers/net/ethernet/intel/igb/Makefile
@@ -8,4 +8,4 @@ obj-$(CONFIG_IGB) += igb.o
igb-objs := igb_main.o igb_ethtool.o e1000_82575.o \
e1000_mac.o e1000_nvm.o e1000_phy.o e1000_mbx.o \
- e1000_i210.o igb_ptp.o igb_hwmon.o
+ e1000_i210.o igb_ptp.o igb_hwmon.o igb_xsk.o
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 58e79eb69f92..1af1a0423fba 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -20,6 +20,7 @@
#include <linux/mdio.h>
#include <net/xdp.h>
+#include <net/xdp_sock_drv.h>
struct igb_adapter;
@@ -320,6 +321,7 @@ struct igb_ring {
union { /* array of buffer info structs */
struct igb_tx_buffer *tx_buffer_info;
struct igb_rx_buffer *rx_buffer_info;
+ struct xdp_buff **rx_buffer_info_zc;
};
void *desc; /* descriptor ring memory */
unsigned long flags; /* ring specific flags */
@@ -357,6 +359,7 @@ struct igb_ring {
};
};
struct xdp_rxq_info xdp_rxq;
+ struct xsk_buff_pool *xsk_pool;
} ____cacheline_internodealigned_in_smp;
struct igb_q_vector {
@@ -384,7 +387,9 @@ enum e1000_ring_flags_t {
IGB_RING_FLAG_RX_SCTP_CSUM,
IGB_RING_FLAG_RX_LB_VLAN_BSWAP,
IGB_RING_FLAG_TX_CTX_IDX,
- IGB_RING_FLAG_TX_DETECT_HANG
+ IGB_RING_FLAG_TX_DETECT_HANG,
+ IGB_RING_FLAG_TX_DISABLED,
+ IGB_RING_FLAG_AF_XDP_ZC
};
#define ring_uses_large_buffer(ring) \
@@ -822,4 +827,11 @@ int igb_add_mac_steering_filter(struct igb_adapter *adapter,
int igb_del_mac_steering_filter(struct igb_adapter *adapter,
const u8 *addr, u8 queue, u8 flags);
+struct xsk_buff_pool *igb_xsk_pool(struct igb_adapter *adapter,
+ struct igb_ring *ring);
+int igb_xsk_pool_setup(struct igb_adapter *adapter,
+ struct xsk_buff_pool *pool,
+ u16 qid);
+int igb_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags);
+
#endif /* _IGB_H_ */
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 775c78df73fb..2c1e1d70bcf9 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2929,9 +2929,14 @@ static int igb_xdp_setup(struct net_device *dev, struct netdev_bpf *bpf)
static int igb_xdp(struct net_device *dev, struct netdev_bpf *xdp)
{
+ struct igb_adapter *adapter = netdev_priv(dev);
+
switch (xdp->command) {
case XDP_SETUP_PROG:
return igb_xdp_setup(dev, xdp);
+ case XDP_SETUP_XSK_POOL:
+ return igb_xsk_pool_setup(adapter, xdp->xsk.pool,
+ xdp->xsk.queue_id);
default:
return -EINVAL;
}
@@ -3058,6 +3063,7 @@ static const struct net_device_ops igb_netdev_ops = {
.ndo_setup_tc = igb_setup_tc,
.ndo_bpf = igb_xdp,
.ndo_xdp_xmit = igb_xdp_xmit,
+ .ndo_xsk_wakeup = igb_xsk_wakeup,
};
/**
@@ -4376,6 +4382,8 @@ void igb_configure_tx_ring(struct igb_adapter *adapter,
u64 tdba = ring->dma;
int reg_idx = ring->reg_idx;
+ ring->xsk_pool = igb_xsk_pool(adapter, ring);
+
wr32(E1000_TDLEN(reg_idx),
ring->count * sizeof(union e1000_adv_tx_desc));
wr32(E1000_TDBAL(reg_idx),
@@ -4771,6 +4779,7 @@ void igb_configure_rx_ring(struct igb_adapter *adapter,
xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq);
WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
MEM_TYPE_PAGE_SHARED, NULL));
+ ring->xsk_pool = igb_xsk_pool(adapter, ring);
/* disable the queue */
wr32(E1000_RXDCTL(reg_idx), 0);
diff --git a/drivers/net/ethernet/intel/igb/igb_xsk.c b/drivers/net/ethernet/intel/igb/igb_xsk.c
new file mode 100644
index 000000000000..925bf97f7caa
--- /dev/null
+++ b/drivers/net/ethernet/intel/igb/igb_xsk.c
@@ -0,0 +1,210 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2018 Intel Corporation. */
+
+#include <linux/bpf_trace.h>
+#include <net/xdp_sock_drv.h>
+#include <net/xdp.h>
+
+#include "e1000_hw.h"
+#include "igb.h"
+
+static int igb_realloc_rx_buffer_info(struct igb_ring *ring, bool pool_present)
+{
+ int size = pool_present ?
+ sizeof(*ring->rx_buffer_info_zc) * ring->count :
+ sizeof(*ring->rx_buffer_info) * ring->count;
+ void *buff_info = vmalloc(size);
+
+ if (!buff_info)
+ return -ENOMEM;
+
+ if (pool_present) {
+ vfree(ring->rx_buffer_info);
+ ring->rx_buffer_info = NULL;
+ ring->rx_buffer_info_zc = buff_info;
+ } else {
+ vfree(ring->rx_buffer_info_zc);
+ ring->rx_buffer_info_zc = NULL;
+ ring->rx_buffer_info = buff_info;
+ }
+
+ return 0;
+}
+
+static void igb_txrx_ring_disable(struct igb_adapter *adapter, u16 qid)
+{
+ struct igb_ring *tx_ring = adapter->tx_ring[qid];
+ struct igb_ring *rx_ring = adapter->rx_ring[qid];
+ struct e1000_hw *hw = &adapter->hw;
+
+ set_bit(IGB_RING_FLAG_TX_DISABLED, &tx_ring->flags);
+
+ wr32(E1000_TXDCTL(tx_ring->reg_idx), 0);
+ wr32(E1000_RXDCTL(rx_ring->reg_idx), 0);
+
+ /* Rx/Tx share the same napi context. */
+ napi_disable(&rx_ring->q_vector->napi);
+
+ igb_clean_tx_ring(tx_ring);
+ igb_clean_rx_ring(rx_ring);
+
+ memset(&rx_ring->rx_stats, 0, sizeof(rx_ring->rx_stats));
+ memset(&tx_ring->tx_stats, 0, sizeof(tx_ring->tx_stats));
+}
+
+static void igb_txrx_ring_enable(struct igb_adapter *adapter, u16 qid)
+{
+ struct igb_ring *tx_ring = adapter->tx_ring[qid];
+ struct igb_ring *rx_ring = adapter->rx_ring[qid];
+
+ igb_configure_tx_ring(adapter, tx_ring);
+ igb_configure_rx_ring(adapter, rx_ring);
+
+ clear_bit(IGB_RING_FLAG_TX_DISABLED, &tx_ring->flags);
+
+ /* call igb_desc_unused which always leaves
+ * at least 1 descriptor unused to make sure
+ * next_to_use != next_to_clean
+ */
+ igb_alloc_rx_buffers(rx_ring, igb_desc_unused(rx_ring));
+
+ /* Rx/Tx share the same napi context. */
+ napi_enable(&rx_ring->q_vector->napi);
+}
+
+struct xsk_buff_pool *igb_xsk_pool(struct igb_adapter *adapter,
+ struct igb_ring *ring)
+{
+ int qid = ring->queue_index;
+
+ if (!igb_xdp_is_enabled(adapter) ||
+ !test_bit(IGB_RING_FLAG_AF_XDP_ZC, &ring->flags))
+ return NULL;
+
+ return xsk_get_pool_from_qid(adapter->netdev, qid);
+}
+
+static int igb_xsk_pool_enable(struct igb_adapter *adapter,
+ struct xsk_buff_pool *pool,
+ u16 qid)
+{
+ struct net_device *netdev = adapter->netdev;
+ struct igb_ring *tx_ring, *rx_ring;
+ bool if_running;
+ int err;
+
+ if (qid >= adapter->num_rx_queues)
+ return -EINVAL;
+
+ if (qid >= netdev->real_num_rx_queues ||
+ qid >= netdev->real_num_tx_queues)
+ return -EINVAL;
+
+ err = xsk_pool_dma_map(pool, &adapter->pdev->dev, IGB_RX_DMA_ATTR);
+ if (err)
+ return err;
+
+ tx_ring = adapter->tx_ring[qid];
+ rx_ring = adapter->rx_ring[qid];
+ if_running = netif_running(adapter->netdev) && igb_xdp_is_enabled(adapter);
+ if (if_running)
+ igb_txrx_ring_disable(adapter, qid);
+
+ set_bit(IGB_RING_FLAG_AF_XDP_ZC, &tx_ring->flags);
+ set_bit(IGB_RING_FLAG_AF_XDP_ZC, &rx_ring->flags);
+
+ if (if_running) {
+ err = igb_realloc_rx_buffer_info(rx_ring, true);
+ if (!err) {
+ igb_txrx_ring_enable(adapter, qid);
+ /* Kick start the NAPI context so that receiving will start */
+ err = igb_xsk_wakeup(adapter->netdev, qid, XDP_WAKEUP_RX);
+ }
+
+ if (err) {
+ clear_bit(IGB_RING_FLAG_AF_XDP_ZC, &tx_ring->flags);
+ clear_bit(IGB_RING_FLAG_AF_XDP_ZC, &rx_ring->flags);
+ xsk_pool_dma_unmap(pool, IGB_RX_DMA_ATTR);
+ return err;
+ }
+ }
+
+ return 0;
+}
+
+static int igb_xsk_pool_disable(struct igb_adapter *adapter, u16 qid)
+{
+ struct igb_ring *tx_ring, *rx_ring;
+ struct xsk_buff_pool *pool;
+ bool if_running;
+ int err;
+
+ pool = xsk_get_pool_from_qid(adapter->netdev, qid);
+ if (!pool)
+ return -EINVAL;
+
+ tx_ring = adapter->tx_ring[qid];
+ rx_ring = adapter->rx_ring[qid];
+ if_running = netif_running(adapter->netdev) && igb_xdp_is_enabled(adapter);
+ if (if_running)
+ igb_txrx_ring_disable(adapter, qid);
+
+ xsk_pool_dma_unmap(pool, IGB_RX_DMA_ATTR);
+ clear_bit(IGB_RING_FLAG_AF_XDP_ZC, &tx_ring->flags);
+ clear_bit(IGB_RING_FLAG_AF_XDP_ZC, &rx_ring->flags);
+
+ if (if_running) {
+ err = igb_realloc_rx_buffer_info(rx_ring, false);
+ if (err)
+ return err;
+
+ igb_txrx_ring_enable(adapter, qid);
+ }
+
+ return 0;
+}
+
+int igb_xsk_pool_setup(struct igb_adapter *adapter,
+ struct xsk_buff_pool *pool,
+ u16 qid)
+{
+ return pool ? igb_xsk_pool_enable(adapter, pool, qid) :
+ igb_xsk_pool_disable(adapter, qid);
+}
+
+int igb_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
+{
+ struct igb_adapter *adapter = netdev_priv(dev);
+ struct e1000_hw *hw = &adapter->hw;
+ struct igb_ring *ring;
+ u32 eics = 0;
+
+ if (test_bit(__IGB_DOWN, &adapter->state))
+ return -ENETDOWN;
+
+ if (!igb_xdp_is_enabled(adapter))
+ return -EINVAL;
+
+ if (qid >= adapter->num_tx_queues)
+ return -EINVAL;
+
+ ring = adapter->tx_ring[qid];
+
+ if (test_bit(IGB_RING_FLAG_TX_DISABLED, &ring->flags))
+ return -ENETDOWN;
+
+ if (!ring->xsk_pool)
+ return -EINVAL;
+
+ if (!napi_if_scheduled_mark_missed(&ring->q_vector->napi)) {
+ /* Cause software interrupt to ensure Rx ring is cleaned */
+ if (adapter->flags & IGB_FLAG_HAS_MSIX) {
+ eics |= ring->q_vector->eims_value;
+ wr32(E1000_EICS, eics);
+ } else {
+ wr32(E1000_ICS, E1000_ICS_RXDMT0);
+ }
+ }
+
+ return 0;
+}
--
2.34.1
* [PATCH iwl-next v4 3/4] igb: add AF_XDP zero-copy Rx support
2023-08-04 8:40 [PATCH iwl-next v4 0/4] igb: Add support for AF_XDP zero-copy Sriram Yagnaraman
2023-08-04 8:40 ` [PATCH iwl-next v4 1/4] igb: prepare for AF_XDP zero-copy support Sriram Yagnaraman
2023-08-04 8:40 ` [PATCH iwl-next v4 2/4] igb: Introduce XSK data structures and helpers Sriram Yagnaraman
@ 2023-08-04 8:40 ` Sriram Yagnaraman
2023-08-05 14:54 ` [Intel-wired-lan] " kernel test robot
2023-08-04 8:40 ` [PATCH iwl-next v4 4/4] igb: add AF_XDP zero-copy Tx support Sriram Yagnaraman
2024-06-27 7:07 ` [PATCH iwl-next v4 0/4] igb: Add support for AF_XDP zero-copy Kurt Kanzenbach
4 siblings, 1 reply; 12+ messages in thread
From: Sriram Yagnaraman @ 2023-08-04 8:40 UTC (permalink / raw)
Cc: intel-wired-lan, bpf, netdev, Jesse Brandeburg, Tony Nguyen,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Björn Töpel, Magnus Karlsson,
Maciej Fijalkowski, Jonathan Lemon, Sriram Yagnaraman
Add support for the AF_XDP zero-copy receive path.
When AF_XDP zero-copy is enabled, the Rx buffers are allocated from the
xsk buff pool using igb_alloc_rx_buffers_zc().
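The refill has to cope with the ring wrapping and with the pool handing
out fewer buffers than requested. The sketch below is a user-space model
of that two-segment fill, assuming a hypothetical pool that is just a
counter of free buffers; the model_* names are illustrative, and
descriptor writes and the tail register update are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct model_rx_ring {
	uint16_t count;		/* number of descriptors in the ring */
	uint16_t next_to_use;	/* first descriptor to refill */
	uint16_t pool_free;	/* buffers the hypothetical pool can still give */
};

/* Stand-in for a batch allocator: may return fewer buffers than asked. */
static uint16_t model_fill(struct model_rx_ring *r, uint16_t want)
{
	uint16_t got = want < r->pool_free ? want : r->pool_free;

	r->pool_free -= got;
	return got;
}

/* Returns true only if all requested descriptors were refilled. */
static bool model_alloc_rx_buffers(struct model_rx_ring *r, uint16_t count)
{
	uint16_t first = 0, second = 0;
	uint16_t ntu = r->next_to_use;
	uint16_t total = count;

	assert(count < r->count);	/* one descriptor always stays unused */

	if (ntu + count >= r->count) {
		/* First segment: from next_to_use up to the end of the ring. */
		uint16_t to_end = r->count - ntu;

		first = model_fill(r, to_end);
		if (first != to_end) {
			ntu += first;	/* pool ran dry before the wrap */
			goto out;
		}
		ntu = 0;
		count -= first;
	}

	/* Second segment: whatever remains, starting at the current index. */
	second = model_fill(r, count);
	ntu += second;
	if (ntu == r->count)
		ntu = 0;

out:
	r->next_to_use = ntu;
	return total == first + second;
}
```

A false return tells the caller that the ring is only partially refilled,
which is the point at which a driver would typically record an allocation
failure and try again on a later poll.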
Use xsk_pool_get_rx_frame_size() to set the SRRCTL Rx buffer size when
zero-copy is enabled.
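The buffer size selection and its encoding into SRRCTL can be sketched as
below. The shift and length constants are assumptions modelled on the igb
headers (packet size field in 1 KB units, 256-byte header buffer) and
should be checked against the driver before relying on them; the
descriptor-type and timestamp bits that the driver also sets are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, modelled on the igb headers; not copied from this patch. */
#define MODEL_SRRCTL_BSIZEPKT_SHIFT	10	/* shift right: 1 KB units */
#define MODEL_SRRCTL_BSIZEHDRSIZE_SHIFT	2	/* shift left */
#define MODEL_RX_HDR_LEN		256
#define MODEL_RXBUFFER_2048		2048
#define MODEL_RXBUFFER_3072		3072

/*
 * Pick the Rx buffer size. xsk_frame_size models the value reported by
 * the xsk pool; 0 means no pool is attached to the ring.
 */
static uint32_t model_rx_buf_size(uint32_t xsk_frame_size, int large_buffer)
{
	if (xsk_frame_size)
		return xsk_frame_size;

	return large_buffer ? MODEL_RXBUFFER_3072 : MODEL_RXBUFFER_2048;
}

/* Encode the header and packet buffer sizes only. */
static uint32_t model_srrctl(uint32_t buf_size)
{
	uint32_t srrctl;

	/* Below 1 KB the packet size field would encode as zero. */
	assert(buf_size >= (1u << MODEL_SRRCTL_BSIZEPKT_SHIFT));

	srrctl = MODEL_RX_HDR_LEN << MODEL_SRRCTL_BSIZEHDRSIZE_SHIFT;
	srrctl |= buf_size >> MODEL_SRRCTL_BSIZEPKT_SHIFT;

	return srrctl;
}
```

Under these assumptions the right shift rounds down, so a frame size
that is not a multiple of 1 KB (for example 3840) is programmed as the
next lower multiple (3 KB).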
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
---
drivers/net/ethernet/intel/igb/igb.h | 4 +
drivers/net/ethernet/intel/igb/igb_main.c | 92 ++++++--
drivers/net/ethernet/intel/igb/igb_xsk.c | 262 +++++++++++++++++++++-
3 files changed, 336 insertions(+), 22 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 1af1a0423fba..39202c40116a 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -87,6 +87,7 @@ struct igb_adapter;
#define IGB_XDP_CONSUMED BIT(0)
#define IGB_XDP_TX BIT(1)
#define IGB_XDP_REDIR BIT(2)
+#define IGB_XDP_EXIT BIT(3)
struct vf_data_storage {
unsigned char vf_mac_addresses[ETH_ALEN];
@@ -832,6 +833,9 @@ struct xsk_buff_pool *igb_xsk_pool(struct igb_adapter *adapter,
int igb_xsk_pool_setup(struct igb_adapter *adapter,
struct xsk_buff_pool *pool,
u16 qid);
+bool igb_alloc_rx_buffers_zc(struct igb_ring *rx_ring, u16 count);
+void igb_clean_rx_ring_zc(struct igb_ring *rx_ring);
+int igb_clean_rx_irq_zc(struct igb_q_vector *q_vector, const int budget);
int igb_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags);
#endif /* _IGB_H_ */
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 2c1e1d70bcf9..b13cc94ac178 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -502,12 +502,17 @@ static void igb_dump(struct igb_adapter *adapter)
for (i = 0; i < rx_ring->count; i++) {
const char *next_desc;
- struct igb_rx_buffer *buffer_info;
- buffer_info = &rx_ring->rx_buffer_info[i];
+ dma_addr_t dma = (dma_addr_t)NULL;
+ struct igb_rx_buffer *buffer_info = NULL;
rx_desc = IGB_RX_DESC(rx_ring, i);
u0 = (struct my_u0 *)rx_desc;
staterr = le32_to_cpu(rx_desc->wb.upper.status_error);
+ if (!rx_ring->xsk_pool) {
+ buffer_info = &rx_ring->rx_buffer_info[i];
+ dma = buffer_info->dma;
+ }
+
if (i == rx_ring->next_to_use)
next_desc = " NTU";
else if (i == rx_ring->next_to_clean)
@@ -527,11 +532,11 @@ static void igb_dump(struct igb_adapter *adapter)
"R ", i,
le64_to_cpu(u0->a),
le64_to_cpu(u0->b),
- (u64)buffer_info->dma,
+ (u64)dma,
next_desc);
if (netif_msg_pktdata(adapter) &&
- buffer_info->dma && buffer_info->page) {
+ buffer_info && dma && buffer_info->page) {
print_hex_dump(KERN_INFO, "",
DUMP_PREFIX_ADDRESS,
16, 1,
@@ -2011,7 +2016,10 @@ static void igb_configure(struct igb_adapter *adapter)
*/
for (i = 0; i < adapter->num_rx_queues; i++) {
struct igb_ring *ring = adapter->rx_ring[i];
- igb_alloc_rx_buffers(ring, igb_desc_unused(ring));
+ if (ring->xsk_pool)
+ igb_alloc_rx_buffers_zc(ring, igb_desc_unused(ring));
+ else
+ igb_alloc_rx_buffers(ring, igb_desc_unused(ring));
}
}
@@ -3360,7 +3368,8 @@ static int igb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
netdev->priv_flags |= IFF_SUPP_NOFCS;
netdev->priv_flags |= IFF_UNICAST_FLT;
- netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT;
+ netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
+ NETDEV_XDP_ACT_XSK_ZEROCOPY;
/* MTU range: 68 - 9216 */
netdev->min_mtu = ETH_MIN_MTU;
@@ -4740,12 +4749,17 @@ void igb_setup_srrctl(struct igb_adapter *adapter, struct igb_ring *ring)
struct e1000_hw *hw = &adapter->hw;
int reg_idx = ring->reg_idx;
u32 srrctl = 0;
+ u32 buf_size;
- srrctl = IGB_RX_HDR_LEN << E1000_SRRCTL_BSIZEHDRSIZE_SHIFT;
- if (ring_uses_large_buffer(ring))
- srrctl |= IGB_RXBUFFER_3072 >> E1000_SRRCTL_BSIZEPKT_SHIFT;
+ if (ring->xsk_pool)
+ buf_size = xsk_pool_get_rx_frame_size(ring->xsk_pool);
+ else if (ring_uses_large_buffer(ring))
+ buf_size = IGB_RXBUFFER_3072;
else
- srrctl |= IGB_RXBUFFER_2048 >> E1000_SRRCTL_BSIZEPKT_SHIFT;
+ buf_size = IGB_RXBUFFER_2048;
+
+ srrctl = IGB_RX_HDR_LEN << E1000_SRRCTL_BSIZEHDRSIZE_SHIFT;
+ srrctl |= buf_size >> E1000_SRRCTL_BSIZEPKT_SHIFT;
srrctl |= E1000_SRRCTL_DESCTYPE_ADV_ONEBUF;
if (hw->mac.type >= e1000_82580)
srrctl |= E1000_SRRCTL_TIMESTAMP;
@@ -4777,9 +4791,17 @@ void igb_configure_rx_ring(struct igb_adapter *adapter,
u32 rxdctl = 0;
xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq);
- WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
- MEM_TYPE_PAGE_SHARED, NULL));
ring->xsk_pool = igb_xsk_pool(adapter, ring);
+ if (ring->xsk_pool) {
+ WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
+ MEM_TYPE_XSK_BUFF_POOL,
+ NULL));
+ xsk_pool_set_rxq_info(ring->xsk_pool, &ring->xdp_rxq);
+ } else {
+ WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
+ MEM_TYPE_PAGE_SHARED,
+ NULL));
+ }
/* disable the queue */
wr32(E1000_RXDCTL(reg_idx), 0);
@@ -4806,9 +4828,12 @@ void igb_configure_rx_ring(struct igb_adapter *adapter,
rxdctl |= IGB_RX_HTHRESH << 8;
rxdctl |= IGB_RX_WTHRESH << 16;
- /* initialize rx_buffer_info */
- memset(ring->rx_buffer_info, 0,
- sizeof(struct igb_rx_buffer) * ring->count);
+ if (ring->xsk_pool)
+ memset(ring->rx_buffer_info_zc, 0,
+ sizeof(*ring->rx_buffer_info_zc) * ring->count);
+ else
+ memset(ring->rx_buffer_info, 0,
+ sizeof(*ring->rx_buffer_info) * ring->count);
/* initialize Rx descriptor 0 */
rx_desc = IGB_RX_DESC(ring, 0);
@@ -4992,8 +5017,13 @@ void igb_free_rx_resources(struct igb_ring *rx_ring)
rx_ring->xdp_prog = NULL;
xdp_rxq_info_unreg(&rx_ring->xdp_rxq);
- vfree(rx_ring->rx_buffer_info);
- rx_ring->rx_buffer_info = NULL;
+ if (rx_ring->xsk_pool) {
+ vfree(rx_ring->rx_buffer_info_zc);
+ rx_ring->rx_buffer_info_zc = NULL;
+ } else {
+ vfree(rx_ring->rx_buffer_info);
+ rx_ring->rx_buffer_info = NULL;
+ }
/* if not set, then don't free */
if (!rx_ring->desc)
@@ -5031,6 +5061,11 @@ void igb_clean_rx_ring(struct igb_ring *rx_ring)
dev_kfree_skb(rx_ring->skb);
rx_ring->skb = NULL;
+ if (rx_ring->xsk_pool) {
+ igb_clean_rx_ring_zc(rx_ring);
+ goto skip_for_xsk;
+ }
+
/* Free all the Rx ring sk_buffs */
while (i != rx_ring->next_to_alloc) {
struct igb_rx_buffer *buffer_info = &rx_ring->rx_buffer_info[i];
@@ -5058,6 +5093,7 @@ void igb_clean_rx_ring(struct igb_ring *rx_ring)
i = 0;
}
+skip_for_xsk:
rx_ring->next_to_alloc = 0;
rx_ring->next_to_clean = 0;
rx_ring->next_to_use = 0;
@@ -8226,7 +8262,9 @@ static int igb_poll(struct napi_struct *napi, int budget)
clean_complete = igb_clean_tx_irq(q_vector, budget);
if (q_vector->rx.ring) {
- int cleaned = igb_clean_rx_irq(q_vector, budget);
+ int cleaned = q_vector->rx.ring->xsk_pool ?
+ igb_clean_rx_irq_zc(q_vector, budget) :
+ igb_clean_rx_irq(q_vector, budget);
work_done += cleaned;
if (cleaned >= budget)
@@ -8634,8 +8672,15 @@ struct sk_buff *igb_run_xdp(struct igb_adapter *adapter,
break;
case XDP_REDIRECT:
err = xdp_do_redirect(adapter->netdev, xdp, xdp_prog);
- if (err)
+ if (err) {
+ if (rx_ring->xsk_pool &&
+ xsk_uses_need_wakeup(rx_ring->xsk_pool) &&
+ err == -ENOBUFS)
+ result = IGB_XDP_EXIT;
+ else
+ result = IGB_XDP_CONSUMED;
goto out_failure;
+ }
result = IGB_XDP_REDIR;
break;
default:
@@ -8892,12 +8937,14 @@ static void igb_put_rx_buffer(struct igb_ring *rx_ring,
static int igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
{
+ unsigned int total_bytes = 0, total_packets = 0;
struct igb_adapter *adapter = q_vector->adapter;
struct igb_ring *rx_ring = q_vector->rx.ring;
- struct sk_buff *skb = rx_ring->skb;
- unsigned int total_bytes = 0, total_packets = 0;
u16 cleaned_count = igb_desc_unused(rx_ring);
+ struct sk_buff *skb = rx_ring->skb;
+ int cpu = smp_processor_id();
unsigned int xdp_xmit = 0;
+ struct netdev_queue *nq;
struct xdp_buff xdp;
u32 frame_sz = 0;
int rx_buf_pgcnt;
@@ -9025,7 +9072,10 @@ static int igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
if (xdp_xmit & IGB_XDP_TX) {
struct igb_ring *tx_ring = igb_xdp_tx_queue_mapping(adapter);
+ nq = txring_txq(tx_ring);
+ __netif_tx_lock(nq, cpu);
igb_xdp_ring_update_tail(tx_ring);
+ __netif_tx_unlock(nq);
}
u64_stats_update_begin(&rx_ring->rx_syncp);
diff --git a/drivers/net/ethernet/intel/igb/igb_xsk.c b/drivers/net/ethernet/intel/igb/igb_xsk.c
index 925bf97f7caa..5e0244772914 100644
--- a/drivers/net/ethernet/intel/igb/igb_xsk.c
+++ b/drivers/net/ethernet/intel/igb/igb_xsk.c
@@ -66,7 +66,10 @@ static void igb_txrx_ring_enable(struct igb_adapter *adapter, u16 qid)
* at least 1 descriptor unused to make sure
* next_to_use != next_to_clean
*/
- igb_alloc_rx_buffers(rx_ring, igb_desc_unused(rx_ring));
+ if (rx_ring->xsk_pool)
+ igb_alloc_rx_buffers_zc(rx_ring, igb_desc_unused(rx_ring));
+ else
+ igb_alloc_rx_buffers(rx_ring, igb_desc_unused(rx_ring));
/* Rx/Tx share the same napi context. */
napi_enable(&rx_ring->q_vector->napi);
@@ -172,6 +175,263 @@ int igb_xsk_pool_setup(struct igb_adapter *adapter,
igb_xsk_pool_disable(adapter, qid);
}
+static u16 igb_fill_rx_descs(struct xsk_buff_pool *pool, struct xdp_buff **xdp,
+ union e1000_adv_rx_desc *rx_desc, u16 count)
+{
+ dma_addr_t dma;
+ u16 buffs;
+ int i;
+
+ /* nothing to do */
+ if (!count)
+ return 0;
+
+ buffs = xsk_buff_alloc_batch(pool, xdp, count);
+ for (i = 0; i < buffs; i++) {
+ dma = xsk_buff_xdp_get_dma(*xdp);
+ rx_desc->read.pkt_addr = cpu_to_le64(dma);
+ rx_desc->wb.upper.length = 0;
+
+ rx_desc++;
+ xdp++;
+ }
+
+ return buffs;
+}
+
+bool igb_alloc_rx_buffers_zc(struct igb_ring *rx_ring, u16 count)
+{
+ u32 nb_buffs_extra = 0, nb_buffs = 0;
+ union e1000_adv_rx_desc *rx_desc;
+ u16 ntu = rx_ring->next_to_use;
+ u16 total_count = count;
+ struct xdp_buff **xdp;
+
+ rx_desc = IGB_RX_DESC(rx_ring, ntu);
+ xdp = &rx_ring->rx_buffer_info_zc[ntu];
+
+ if (ntu + count >= rx_ring->count) {
+ nb_buffs_extra = igb_fill_rx_descs(rx_ring->xsk_pool, xdp,
+ rx_desc,
+ rx_ring->count - ntu);
+ if (nb_buffs_extra != rx_ring->count - ntu) {
+ ntu += nb_buffs_extra;
+ goto exit;
+ }
+ rx_desc = IGB_RX_DESC(rx_ring, 0);
+ xdp = rx_ring->rx_buffer_info_zc;
+ ntu = 0;
+ count -= nb_buffs_extra;
+ }
+
+ nb_buffs = igb_fill_rx_descs(rx_ring->xsk_pool, xdp, rx_desc, count);
+ ntu += nb_buffs;
+ if (ntu == rx_ring->count)
+ ntu = 0;
+
+ /* clear the length for the next_to_use descriptor */
+ rx_desc = IGB_RX_DESC(rx_ring, ntu);
+ rx_desc->wb.upper.length = 0;
+
+exit:
+ if (rx_ring->next_to_use != ntu) {
+ rx_ring->next_to_use = ntu;
+
+ /* Force memory writes to complete before letting h/w
+ * know there are new descriptors to fetch. (Only
+ * applicable for weak-ordered memory model archs,
+ * such as IA-64).
+ */
+ wmb();
+ writel(ntu, rx_ring->tail);
+ }
+
+ return total_count == (nb_buffs + nb_buffs_extra);
+}
+
+void igb_clean_rx_ring_zc(struct igb_ring *rx_ring)
+{
+ u16 ntc = rx_ring->next_to_clean;
+ u16 ntu = rx_ring->next_to_use;
+
+ while (ntc != ntu) {
+ struct xdp_buff *xdp = rx_ring->rx_buffer_info_zc[ntc];
+
+ xsk_buff_free(xdp);
+ ntc++;
+ if (ntc >= rx_ring->count)
+ ntc = 0;
+ }
+}
+
+static struct sk_buff *igb_construct_skb_zc(struct igb_ring *rx_ring,
+ struct xdp_buff *xdp,
+ ktime_t timestamp)
+{
+ unsigned int totalsize = xdp->data_end - xdp->data_meta;
+ unsigned int metasize = xdp->data - xdp->data_meta;
+ struct sk_buff *skb;
+
+ net_prefetch(xdp->data_meta);
+
+ /* allocate a skb to store the frags */
+ skb = __napi_alloc_skb(&rx_ring->q_vector->napi, totalsize,
+ GFP_ATOMIC | __GFP_NOWARN);
+ if (unlikely(!skb))
+ return NULL;
+
+ if (timestamp)
+ skb_hwtstamps(skb)->hwtstamp = timestamp;
+
+ memcpy(__skb_put(skb, totalsize), xdp->data_meta,
+ ALIGN(totalsize, sizeof(long)));
+
+ if (metasize) {
+ skb_metadata_set(skb, metasize);
+ __skb_pull(skb, metasize);
+ }
+
+ return skb;
+}
+
+static void igb_update_ntc(struct igb_ring *rx_ring)
+{
+ u32 ntc = rx_ring->next_to_clean + 1;
+
+ /* fetch, update, and store next to clean */
+ ntc = (ntc < rx_ring->count) ? ntc : 0;
+ rx_ring->next_to_clean = ntc;
+
+ prefetch(IGB_RX_DESC(rx_ring, ntc));
+}
+
+int igb_clean_rx_irq_zc(struct igb_q_vector *q_vector, const int budget)
+{
+ struct igb_adapter *adapter = q_vector->adapter;
+ unsigned int total_bytes = 0, total_packets = 0;
+ struct igb_ring *rx_ring = q_vector->rx.ring;
+ int cpu = smp_processor_id();
+ unsigned int xdp_xmit = 0;
+ struct netdev_queue *nq;
+ bool failure = false;
+ u16 entries_to_alloc;
+ struct sk_buff *skb;
+
+ while (likely(total_packets < budget)) {
+ union e1000_adv_rx_desc *rx_desc;
+ struct xdp_buff *xdp;
+ ktime_t timestamp = 0;
+ unsigned int size;
+
+ rx_desc = IGB_RX_DESC(rx_ring, rx_ring->next_to_clean);
+ size = le16_to_cpu(rx_desc->wb.upper.length);
+ if (!size)
+ break;
+
+ /* This memory barrier is needed to keep us from reading
+ * any other fields out of the rx_desc until we know the
+ * descriptor has been written back
+ */
+ dma_rmb();
+
+ xdp = rx_ring->rx_buffer_info_zc[rx_ring->next_to_clean];
+ xsk_buff_set_size(xdp, size);
+ xsk_buff_dma_sync_for_cpu(xdp, rx_ring->xsk_pool);
+
+ /* pull rx packet timestamp if available and valid */
+ if (igb_test_staterr(rx_desc, E1000_RXDADV_STAT_TSIP)) {
+ int ts_hdr_len;
+
+ ts_hdr_len = igb_ptp_rx_pktstamp(rx_ring->q_vector,
+ xdp->data,
+ &timestamp);
+
+ xdp->data += ts_hdr_len;
+ xdp->data_meta += ts_hdr_len;
+ size -= ts_hdr_len;
+ }
+
+ skb = igb_run_xdp(adapter, rx_ring, xdp);
+
+ if (IS_ERR(skb)) {
+ unsigned int xdp_res = -PTR_ERR(skb);
+
+ if (likely(xdp_res & (IGB_XDP_TX | IGB_XDP_REDIR))) {
+ xdp_xmit |= xdp_res;
+ } else if (xdp_res == IGB_XDP_EXIT) {
+ failure = true;
+ break;
+ } else if (xdp_res == IGB_XDP_CONSUMED) {
+ xsk_buff_free(xdp);
+ }
+
+ total_packets++;
+ total_bytes += size;
+
+ igb_update_ntc(rx_ring);
+ continue;
+ }
+
+ skb = igb_construct_skb_zc(rx_ring, xdp, timestamp);
+
+ /* exit if we failed to retrieve a buffer */
+ if (!skb) {
+ rx_ring->rx_stats.alloc_failed++;
+ break;
+ }
+
+ xsk_buff_free(xdp);
+ igb_update_ntc(rx_ring);
+
+ if (eth_skb_pad(skb))
+ continue;
+
+ /* probably a little skewed due to removing CRC */
+ total_bytes += skb->len;
+
+ /* populate checksum, timestamp, VLAN, and protocol */
+ igb_process_skb_fields(rx_ring, rx_desc, skb);
+
+ napi_gro_receive(&q_vector->napi, skb);
+
+ /* update budget accounting */
+ total_packets++;
+ }
+
+ if (xdp_xmit & IGB_XDP_REDIR)
+ xdp_do_flush();
+
+ if (xdp_xmit & IGB_XDP_TX) {
+ struct igb_ring *tx_ring = igb_xdp_tx_queue_mapping(adapter);
+
+ nq = txring_txq(tx_ring);
+ __netif_tx_lock(nq, cpu);
+ igb_xdp_ring_update_tail(tx_ring);
+ __netif_tx_unlock(nq);
+ }
+
+ u64_stats_update_begin(&rx_ring->rx_syncp);
+ rx_ring->rx_stats.packets += total_packets;
+ rx_ring->rx_stats.bytes += total_bytes;
+ u64_stats_update_end(&rx_ring->rx_syncp);
+ q_vector->rx.total_packets += total_packets;
+ q_vector->rx.total_bytes += total_bytes;
+
+ entries_to_alloc = igb_desc_unused(rx_ring);
+ if (entries_to_alloc >= IGB_RX_BUFFER_WRITE)
+ failure |= !igb_alloc_rx_buffers_zc(rx_ring, entries_to_alloc);
+
+ if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) {
+ if (failure || rx_ring->next_to_clean == rx_ring->next_to_use)
+ xsk_set_rx_need_wakeup(rx_ring->xsk_pool);
+ else
+ xsk_clear_rx_need_wakeup(rx_ring->xsk_pool);
+
+ return (int)total_packets;
+ }
+ return failure ? budget : (int)total_packets;
+}
+
int igb_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
{
struct igb_adapter *adapter = netdev_priv(dev);
--
2.34.1
* [PATCH iwl-next v4 4/4] igb: add AF_XDP zero-copy Tx support
2023-08-04 8:40 [PATCH iwl-next v4 0/4] igb: Add support for AF_XDP zero-copy Sriram Yagnaraman
` (2 preceding siblings ...)
2023-08-04 8:40 ` [PATCH iwl-next v4 3/4] igb: add AF_XDP zero-copy Rx support Sriram Yagnaraman
@ 2023-08-04 8:40 ` Sriram Yagnaraman
2024-06-27 7:07 ` [PATCH iwl-next v4 0/4] igb: Add support for AF_XDP zero-copy Kurt Kanzenbach
4 siblings, 0 replies; 12+ messages in thread
From: Sriram Yagnaraman @ 2023-08-04 8:40 UTC (permalink / raw)
Cc: intel-wired-lan, bpf, netdev, Jesse Brandeburg, Tony Nguyen,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Björn Töpel, Magnus Karlsson,
Maciej Fijalkowski, Jonathan Lemon, Sriram Yagnaraman
Add support for AF_XDP zero-copy transmit path.
A new TX buffer type IGB_TYPE_XSK is introduced to indicate that the Tx
frame was allocated from the xsk buff pool, so igb_clean_tx_ring and
igb_clean_tx_irq can clean the buffers correctly based on type.
igb_xmit_zc performs the actual packet transmit when AF_XDP zero-copy is
enabled. We share the TX ring between the slow path, XDP and AF_XDP
zero-copy, so we use the netdev queue lock to ensure mutual exclusion.
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
---
drivers/net/ethernet/intel/igb/igb.h | 2 +
drivers/net/ethernet/intel/igb/igb_main.c | 56 +++++++++++++++++++----
drivers/net/ethernet/intel/igb/igb_xsk.c | 52 +++++++++++++++++++++
3 files changed, 101 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 39202c40116a..f52a988fe2f0 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -257,6 +257,7 @@ enum igb_tx_flags {
enum igb_tx_buf_type {
IGB_TYPE_SKB = 0,
IGB_TYPE_XDP,
+ IGB_TYPE_XSK
};
/* wrapper around a pointer to a socket buffer,
@@ -836,6 +837,7 @@ int igb_xsk_pool_setup(struct igb_adapter *adapter,
bool igb_alloc_rx_buffers_zc(struct igb_ring *rx_ring, u16 count);
void igb_clean_rx_ring_zc(struct igb_ring *rx_ring);
int igb_clean_rx_irq_zc(struct igb_q_vector *q_vector, const int budget);
+bool igb_xmit_zc(struct igb_ring *tx_ring);
int igb_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags);
#endif /* _IGB_H_ */
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index b13cc94ac178..117c3d883529 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3021,6 +3021,9 @@ static int igb_xdp_xmit(struct net_device *dev, int n,
if (unlikely(!tx_ring))
return -ENXIO;
+ if (unlikely(test_bit(IGB_RING_FLAG_TX_DISABLED, &tx_ring->flags)))
+ return -ENXIO;
+
nq = txring_txq(tx_ring);
__netif_tx_lock(nq, cpu);
@@ -4934,15 +4937,20 @@ void igb_clean_tx_ring(struct igb_ring *tx_ring)
{
u16 i = tx_ring->next_to_clean;
struct igb_tx_buffer *tx_buffer = &tx_ring->tx_buffer_info[i];
+ u32 xsk_frames = 0;
while (i != tx_ring->next_to_use) {
union e1000_adv_tx_desc *eop_desc, *tx_desc;
/* Free all the Tx ring sk_buffs or xdp frames */
- if (tx_buffer->type == IGB_TYPE_SKB)
+ if (tx_buffer->type == IGB_TYPE_SKB) {
dev_kfree_skb_any(tx_buffer->skb);
- else
+ } else if (tx_buffer->type == IGB_TYPE_XDP) {
xdp_return_frame(tx_buffer->xdpf);
+ } else if (tx_buffer->type == IGB_TYPE_XSK) {
+ xsk_frames++;
+ goto skip_for_xsk;
+ }
/* unmap skb header data */
dma_unmap_single(tx_ring->dev,
@@ -4973,6 +4981,7 @@ void igb_clean_tx_ring(struct igb_ring *tx_ring)
DMA_TO_DEVICE);
}
+skip_for_xsk:
tx_buffer->next_to_watch = NULL;
/* move us one more past the eop_desc for start of next pkt */
@@ -4987,6 +4996,9 @@ void igb_clean_tx_ring(struct igb_ring *tx_ring)
/* reset BQL for queue */
netdev_tx_reset_queue(txring_txq(tx_ring));
+ if (tx_ring->xsk_pool && xsk_frames)
+ xsk_tx_completed(tx_ring->xsk_pool, xsk_frames);
+
/* reset next_to_use and next_to_clean */
tx_ring->next_to_use = 0;
tx_ring->next_to_clean = 0;
@@ -6520,6 +6532,9 @@ netdev_tx_t igb_xmit_frame_ring(struct sk_buff *skb,
return NETDEV_TX_BUSY;
}
+ if (unlikely(test_bit(IGB_RING_FLAG_TX_DISABLED, &tx_ring->flags)))
+ return NETDEV_TX_BUSY;
+
/* record the location of the first descriptor for this packet */
first = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
first->type = IGB_TYPE_SKB;
@@ -8293,13 +8308,17 @@ static int igb_poll(struct napi_struct *napi, int budget)
**/
static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
{
- struct igb_adapter *adapter = q_vector->adapter;
- struct igb_ring *tx_ring = q_vector->tx.ring;
- struct igb_tx_buffer *tx_buffer;
- union e1000_adv_tx_desc *tx_desc;
unsigned int total_bytes = 0, total_packets = 0;
+ struct igb_adapter *adapter = q_vector->adapter;
unsigned int budget = q_vector->tx.work_limit;
+ struct igb_ring *tx_ring = q_vector->tx.ring;
unsigned int i = tx_ring->next_to_clean;
+ union e1000_adv_tx_desc *tx_desc;
+ struct igb_tx_buffer *tx_buffer;
+ int cpu = smp_processor_id();
+ bool xsk_xmit_done = true;
+ struct netdev_queue *nq;
+ u32 xsk_frames = 0;
if (test_bit(__IGB_DOWN, &adapter->state))
return true;
@@ -8330,10 +8349,14 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
total_packets += tx_buffer->gso_segs;
/* free the skb */
- if (tx_buffer->type == IGB_TYPE_SKB)
+ if (tx_buffer->type == IGB_TYPE_SKB) {
napi_consume_skb(tx_buffer->skb, napi_budget);
- else
+ } else if (tx_buffer->type == IGB_TYPE_XDP) {
xdp_return_frame(tx_buffer->xdpf);
+ } else if (tx_buffer->type == IGB_TYPE_XSK) {
+ xsk_frames++;
+ goto skip_for_xsk;
+ }
/* unmap skb header data */
dma_unmap_single(tx_ring->dev,
@@ -8365,6 +8388,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
}
}
+skip_for_xsk:
/* move us one more past the eop_desc for start of next pkt */
tx_buffer++;
tx_desc++;
@@ -8393,6 +8417,20 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
q_vector->tx.total_bytes += total_bytes;
q_vector->tx.total_packets += total_packets;
+ if (tx_ring->xsk_pool) {
+ if (xsk_frames)
+ xsk_tx_completed(tx_ring->xsk_pool, xsk_frames);
+ if (xsk_uses_need_wakeup(tx_ring->xsk_pool))
+ xsk_set_tx_need_wakeup(tx_ring->xsk_pool);
+
+ nq = txring_txq(tx_ring);
+ __netif_tx_lock(nq, cpu);
+ /* Avoid transmit queue timeout since we share it with the slow path */
+ txq_trans_cond_update(nq);
+ xsk_xmit_done = igb_xmit_zc(tx_ring);
+ __netif_tx_unlock(nq);
+ }
+
if (test_bit(IGB_RING_FLAG_TX_DETECT_HANG, &tx_ring->flags)) {
struct e1000_hw *hw = &adapter->hw;
@@ -8455,7 +8493,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
}
}
- return !!budget;
+ return !!budget && xsk_xmit_done;
}
/**
diff --git a/drivers/net/ethernet/intel/igb/igb_xsk.c b/drivers/net/ethernet/intel/igb/igb_xsk.c
index 5e0244772914..c5d01d65e7de 100644
--- a/drivers/net/ethernet/intel/igb/igb_xsk.c
+++ b/drivers/net/ethernet/intel/igb/igb_xsk.c
@@ -432,6 +432,58 @@ int igb_clean_rx_irq_zc(struct igb_q_vector *q_vector, const int budget)
return failure ? budget : (int)total_packets;
}
+bool igb_xmit_zc(struct igb_ring *tx_ring)
+{
+ unsigned int budget = igb_desc_unused(tx_ring);
+ struct xsk_buff_pool *pool = tx_ring->xsk_pool;
+ struct xdp_desc *descs = pool->tx_descs;
+ union e1000_adv_tx_desc *tx_desc = NULL;
+ struct igb_tx_buffer *tx_buffer_info;
+ u32 cmd_type, nb_pkts, i = 0;
+ unsigned int total_bytes = 0;
+ dma_addr_t dma;
+
+ nb_pkts = xsk_tx_peek_release_desc_batch(pool, budget);
+ if (!nb_pkts)
+ return true;
+
+ while (nb_pkts-- > 0) {
+ dma = xsk_buff_raw_get_dma(pool, descs[i].addr);
+ xsk_buff_raw_dma_sync_for_device(pool, dma, descs[i].len);
+
+ tx_buffer_info = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
+ tx_buffer_info->bytecount = descs[i].len;
+ tx_buffer_info->type = IGB_TYPE_XSK;
+ tx_buffer_info->xdpf = NULL;
+ tx_buffer_info->gso_segs = 1;
+ tx_buffer_info->time_stamp = jiffies;
+
+ tx_desc = IGB_TX_DESC(tx_ring, tx_ring->next_to_use);
+ tx_desc->read.buffer_addr = cpu_to_le64(dma);
+
+ /* put descriptor type bits */
+ cmd_type = E1000_ADVTXD_DTYP_DATA | E1000_ADVTXD_DCMD_DEXT |
+ E1000_ADVTXD_DCMD_IFCS;
+
+ cmd_type |= descs[i].len | IGB_TXD_DCMD;
+ tx_desc->read.cmd_type_len = cpu_to_le32(cmd_type);
+ tx_desc->read.olinfo_status = 0;
+
+ total_bytes += descs[i].len;
+
+ i++;
+ tx_ring->next_to_use++;
+ tx_buffer_info->next_to_watch = tx_desc;
+ if (tx_ring->next_to_use == tx_ring->count)
+ tx_ring->next_to_use = 0;
+ }
+
+ netdev_tx_sent_queue(txring_txq(tx_ring), total_bytes);
+ igb_xdp_ring_update_tail(tx_ring);
+
+ return nb_pkts < budget;
+}
+
int igb_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
{
struct igb_adapter *adapter = netdev_priv(dev);
--
2.34.1
* Re: [Intel-wired-lan] [PATCH iwl-next v4 3/4] igb: add AF_XDP zero-copy Rx support
2023-08-04 8:40 ` [PATCH iwl-next v4 3/4] igb: add AF_XDP zero-copy Rx support Sriram Yagnaraman
@ 2023-08-05 14:54 ` kernel test robot
0 siblings, 0 replies; 12+ messages in thread
From: kernel test robot @ 2023-08-05 14:54 UTC (permalink / raw)
To: Sriram Yagnaraman
Cc: oe-kbuild-all, Jesper Dangaard Brouer, Daniel Borkmann, netdev,
Jonathan Lemon, John Fastabend, Jesse Brandeburg,
Alexei Starovoitov, Björn Töpel, Eric Dumazet,
Sriram Yagnaraman, Tony Nguyen, Jakub Kicinski, intel-wired-lan,
bpf, Paolo Abeni, David S . Miller, Magnus Karlsson
Hi Sriram,
kernel test robot noticed the following build warnings:
[auto build test WARNING on tnguy-next-queue/dev-queue]
url: https://github.com/intel-lab-lkp/linux/commits/Sriram-Yagnaraman/igb-prepare-for-AF_XDP-zero-copy-support/20230804-164354
base: https://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue.git dev-queue
patch link: https://lore.kernel.org/r/20230804084051.14194-4-sriram.yagnaraman%40est.tech
patch subject: [Intel-wired-lan] [PATCH iwl-next v4 3/4] igb: add AF_XDP zero-copy Rx support
config: i386-debian-10.3 (https://download.01.org/0day-ci/archive/20230805/202308052204.SQxxKnNI-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce: (https://download.01.org/0day-ci/archive/20230805/202308052204.SQxxKnNI-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202308052204.SQxxKnNI-lkp@intel.com/
All warnings (new ones prefixed by >>):
drivers/net/ethernet/intel/igb/igb_main.c: In function 'igb_dump':
>> drivers/net/ethernet/intel/igb/igb_main.c:505:42: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
505 | dma_addr_t dma = (dma_addr_t)NULL;
| ^
vim +505 drivers/net/ethernet/intel/igb/igb_main.c
348
349 /* igb_dump - Print registers, Tx-rings and Rx-rings */
350 static void igb_dump(struct igb_adapter *adapter)
351 {
352 struct net_device *netdev = adapter->netdev;
353 struct e1000_hw *hw = &adapter->hw;
354 struct igb_reg_info *reginfo;
355 struct igb_ring *tx_ring;
356 union e1000_adv_tx_desc *tx_desc;
357 struct my_u0 { __le64 a; __le64 b; } *u0;
358 struct igb_ring *rx_ring;
359 union e1000_adv_rx_desc *rx_desc;
360 u32 staterr;
361 u16 i, n;
362
363 if (!netif_msg_hw(adapter))
364 return;
365
366 /* Print netdevice Info */
367 if (netdev) {
368 dev_info(&adapter->pdev->dev, "Net device Info\n");
369 pr_info("Device Name state trans_start\n");
370 pr_info("%-15s %016lX %016lX\n", netdev->name,
371 netdev->state, dev_trans_start(netdev));
372 }
373
374 /* Print Registers */
375 dev_info(&adapter->pdev->dev, "Register Dump\n");
376 pr_info(" Register Name Value\n");
377 for (reginfo = (struct igb_reg_info *)igb_reg_info_tbl;
378 reginfo->name; reginfo++) {
379 igb_regdump(hw, reginfo);
380 }
381
382 /* Print TX Ring Summary */
383 if (!netdev || !netif_running(netdev))
384 goto exit;
385
386 dev_info(&adapter->pdev->dev, "TX Rings Summary\n");
387 pr_info("Queue [NTU] [NTC] [bi(ntc)->dma ] leng ntw timestamp\n");
388 for (n = 0; n < adapter->num_tx_queues; n++) {
389 struct igb_tx_buffer *buffer_info;
390 tx_ring = adapter->tx_ring[n];
391 buffer_info = &tx_ring->tx_buffer_info[tx_ring->next_to_clean];
392 pr_info(" %5d %5X %5X %016llX %04X %p %016llX\n",
393 n, tx_ring->next_to_use, tx_ring->next_to_clean,
394 (u64)dma_unmap_addr(buffer_info, dma),
395 dma_unmap_len(buffer_info, len),
396 buffer_info->next_to_watch,
397 (u64)buffer_info->time_stamp);
398 }
399
400 /* Print TX Rings */
401 if (!netif_msg_tx_done(adapter))
402 goto rx_ring_summary;
403
404 dev_info(&adapter->pdev->dev, "TX Rings Dump\n");
405
406 /* Transmit Descriptor Formats
407 *
408 * Advanced Transmit Descriptor
409 * +--------------------------------------------------------------+
410 * 0 | Buffer Address [63:0] |
411 * +--------------------------------------------------------------+
412 * 8 | PAYLEN | PORTS |CC|IDX | STA | DCMD |DTYP|MAC|RSV| DTALEN |
413 * +--------------------------------------------------------------+
414 * 63 46 45 40 39 38 36 35 32 31 24 15 0
415 */
416
417 for (n = 0; n < adapter->num_tx_queues; n++) {
418 tx_ring = adapter->tx_ring[n];
419 pr_info("------------------------------------\n");
420 pr_info("TX QUEUE INDEX = %d\n", tx_ring->queue_index);
421 pr_info("------------------------------------\n");
422 pr_info("T [desc] [address 63:0 ] [PlPOCIStDDM Ln] [bi->dma ] leng ntw timestamp bi->skb\n");
423
424 for (i = 0; tx_ring->desc && (i < tx_ring->count); i++) {
425 const char *next_desc;
426 struct igb_tx_buffer *buffer_info;
427 tx_desc = IGB_TX_DESC(tx_ring, i);
428 buffer_info = &tx_ring->tx_buffer_info[i];
429 u0 = (struct my_u0 *)tx_desc;
430 if (i == tx_ring->next_to_use &&
431 i == tx_ring->next_to_clean)
432 next_desc = " NTC/U";
433 else if (i == tx_ring->next_to_use)
434 next_desc = " NTU";
435 else if (i == tx_ring->next_to_clean)
436 next_desc = " NTC";
437 else
438 next_desc = "";
439
440 pr_info("T [0x%03X] %016llX %016llX %016llX %04X %p %016llX %p%s\n",
441 i, le64_to_cpu(u0->a),
442 le64_to_cpu(u0->b),
443 (u64)dma_unmap_addr(buffer_info, dma),
444 dma_unmap_len(buffer_info, len),
445 buffer_info->next_to_watch,
446 (u64)buffer_info->time_stamp,
447 buffer_info->skb, next_desc);
448
449 if (netif_msg_pktdata(adapter) && buffer_info->skb)
450 print_hex_dump(KERN_INFO, "",
451 DUMP_PREFIX_ADDRESS,
452 16, 1, buffer_info->skb->data,
453 dma_unmap_len(buffer_info, len),
454 true);
455 }
456 }
457
458 /* Print RX Rings Summary */
459 rx_ring_summary:
460 dev_info(&adapter->pdev->dev, "RX Rings Summary\n");
461 pr_info("Queue [NTU] [NTC]\n");
462 for (n = 0; n < adapter->num_rx_queues; n++) {
463 rx_ring = adapter->rx_ring[n];
464 pr_info(" %5d %5X %5X\n",
465 n, rx_ring->next_to_use, rx_ring->next_to_clean);
466 }
467
468 /* Print RX Rings */
469 if (!netif_msg_rx_status(adapter))
470 goto exit;
471
472 dev_info(&adapter->pdev->dev, "RX Rings Dump\n");
473
474 /* Advanced Receive Descriptor (Read) Format
475 * 63 1 0
476 * +-----------------------------------------------------+
477 * 0 | Packet Buffer Address [63:1] |A0/NSE|
478 * +----------------------------------------------+------+
479 * 8 | Header Buffer Address [63:1] | DD |
480 * +-----------------------------------------------------+
481 *
482 *
483 * Advanced Receive Descriptor (Write-Back) Format
484 *
485 * 63 48 47 32 31 30 21 20 17 16 4 3 0
486 * +------------------------------------------------------+
487 * 0 | Packet IP |SPH| HDR_LEN | RSV|Packet| RSS |
488 * | Checksum Ident | | | | Type | Type |
489 * +------------------------------------------------------+
490 * 8 | VLAN Tag | Length | Extended Error | Extended Status |
491 * +------------------------------------------------------+
492 * 63 48 47 32 31 20 19 0
493 */
494
495 for (n = 0; n < adapter->num_rx_queues; n++) {
496 rx_ring = adapter->rx_ring[n];
497 pr_info("------------------------------------\n");
498 pr_info("RX QUEUE INDEX = %d\n", rx_ring->queue_index);
499 pr_info("------------------------------------\n");
500 pr_info("R [desc] [ PktBuf A0] [ HeadBuf DD] [bi->dma ] [bi->skb] <-- Adv Rx Read format\n");
501 pr_info("RWB[desc] [PcsmIpSHl PtRs] [vl er S cks ln] ---------------- [bi->skb] <-- Adv Rx Write-Back format\n");
502
503 for (i = 0; i < rx_ring->count; i++) {
504 const char *next_desc;
> 505 dma_addr_t dma = (dma_addr_t)NULL;
506 struct igb_rx_buffer *buffer_info = NULL;
507 rx_desc = IGB_RX_DESC(rx_ring, i);
508 u0 = (struct my_u0 *)rx_desc;
509 staterr = le32_to_cpu(rx_desc->wb.upper.status_error);
510
511 if (!rx_ring->xsk_pool) {
512 buffer_info = &rx_ring->rx_buffer_info[i];
513 dma = buffer_info->dma;
514 }
515
516 if (i == rx_ring->next_to_use)
517 next_desc = " NTU";
518 else if (i == rx_ring->next_to_clean)
519 next_desc = " NTC";
520 else
521 next_desc = "";
522
523 if (staterr & E1000_RXD_STAT_DD) {
524 /* Descriptor Done */
525 pr_info("%s[0x%03X] %016llX %016llX ---------------- %s\n",
526 "RWB", i,
527 le64_to_cpu(u0->a),
528 le64_to_cpu(u0->b),
529 next_desc);
530 } else {
531 pr_info("%s[0x%03X] %016llX %016llX %016llX %s\n",
532 "R ", i,
533 le64_to_cpu(u0->a),
534 le64_to_cpu(u0->b),
535 (u64)dma,
536 next_desc);
537
538 if (netif_msg_pktdata(adapter) &&
539 buffer_info && dma && buffer_info->page) {
540 print_hex_dump(KERN_INFO, "",
541 DUMP_PREFIX_ADDRESS,
542 16, 1,
543 page_address(buffer_info->page) +
544 buffer_info->page_offset,
545 igb_rx_bufsz(rx_ring), true);
546 }
547 }
548 }
549 }
550
551 exit:
552 return;
553 }
554
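For context on the warning at line 505: it is reported for a 32-bit
config, where pointers are 4 bytes, and "different size" implies that
dma_addr_t is wider than a pointer in that build. A minimal user-space
sketch of the usual way around it follows. It assumes a 64-bit
dma_addr_t as a stand-in for the kernel type, and the helper name
rx_dump_initial_dma() is hypothetical, not part of the driver.

```c
#include <stdint.h>

/* Stand-in for the kernel type; assumed to be 64-bit here, as it can be
 * on 32-bit builds that enable 64-bit DMA addressing.
 */
typedef uint64_t dma_addr_t;

/* Hypothetical helper: initialise the DMA address without a pointer cast. */
static dma_addr_t rx_dump_initial_dma(void)
{
	/* was: dma_addr_t dma = (dma_addr_t)NULL;  (-Wpointer-to-int-cast) */
	dma_addr_t dma = 0;

	return dma;
}
```

Since dma_addr_t is an integer type, a plain 0 expresses "no address"
without going through a pointer at all.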
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH iwl-next v4 0/4] igb: Add support for AF_XDP zero-copy
2023-08-04 8:40 [PATCH iwl-next v4 0/4] igb: Add support for AF_XDP zero-copy Sriram Yagnaraman
` (3 preceding siblings ...)
2023-08-04 8:40 ` [PATCH iwl-next v4 4/4] igb: add AF_XDP zero-copy Tx support Sriram Yagnaraman
@ 2024-06-27 7:07 ` Kurt Kanzenbach
2024-06-27 16:49 ` [Intel-wired-lan] " Benjamin Steinke
4 siblings, 1 reply; 12+ messages in thread
From: Kurt Kanzenbach @ 2024-06-27 7:07 UTC (permalink / raw)
To: Sriram Yagnaraman
Cc: intel-wired-lan, bpf, netdev, Jesse Brandeburg, Tony Nguyen,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Björn Töpel, Magnus Karlsson,
Maciej Fijalkowski, Jonathan Lemon, Sriram Yagnaraman
[-- Attachment #1: Type: text/plain, Size: 696 bytes --]
Hi Sriram,
On Fri Aug 04 2023, Sriram Yagnaraman wrote:
> The first couple of patches adds helper funcctions to prepare for AF_XDP
> zero-copy support which comes in the last couple of patches, one each
> for Rx and TX paths.
>
> As mentioned in v1 patchset [0], I don't have access to an actual IGB
> device to provide correct performance numbers. I have used Intel 82576EB
> emulator in QEMU [1] to test the changes to IGB driver.
I gave this patch series a try on a recent kernel and silicon
(i210). There was one issue in igb_xmit_zc(). But other than that it
worked very nicely.
It seems like it hasn't been merged yet. Do you have any plans for
continuing to work on this?
Thanks,
Kurt
* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/4] igb: Add support for AF_XDP zero-copy
2024-06-27 7:07 ` [PATCH iwl-next v4 0/4] igb: Add support for AF_XDP zero-copy Kurt Kanzenbach
@ 2024-06-27 16:49 ` Benjamin Steinke
2024-06-27 17:18 ` Kurt Kanzenbach
0 siblings, 1 reply; 12+ messages in thread
From: Benjamin Steinke @ 2024-06-27 16:49 UTC (permalink / raw)
To: Sriram Yagnaraman
Cc: intel-wired-lan, Maciej Fijalkowski, Jesper Dangaard Brouer,
Daniel Borkmann, netdev, Jonathan Lemon, John Fastabend,
Alexei Starovoitov, Björn Töpel, Eric Dumazet,
Sriram Yagnaraman, Tony Nguyen, Jakub Kicinski, bpf, Paolo Abeni,
David S . Miller, Magnus Karlsson, Kurt Kanzenbach
On Thursday, 27 June 2024, 09:07:55 CEST, Kurt Kanzenbach wrote:
> Hi Sriram,
>
> On Fri Aug 04 2023, Sriram Yagnaraman wrote:
> > The first couple of patches adds helper funcctions to prepare for AF_XDP
> > zero-copy support which comes in the last couple of patches, one each
> > for Rx and TX paths.
> >
> > As mentioned in v1 patchset [0], I don't have access to an actual IGB
> > device to provide correct performance numbers. I have used Intel 82576EB
> > emulator in QEMU [1] to test the changes to IGB driver.
>
> I gave this patch series a try on a recent kernel and silicon
> (i210). There was one issue in igb_xmit_zc(). But other than that it
> worked very nicely.
Hi Kurt and Sriram,
I recently tried the patches on a 6.1 kernel. On two different devices i210 &
i211 I couldn't see any packets being transmitted on the wire. Perhaps caused
by the issue in igb_xmit_zc() you mentioned, Kurt? Can you share your findings,
please?
RX seemed to work at first sight.
> It seems like it hasn't been merged yet. Do you have any plans for
> continuing to work on this?
I can offer to do testing and debugging on real hardware if this helps.
Thanks,
Benjamin
* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/4] igb: Add support for AF_XDP zero-copy
2024-06-27 16:49 ` [Intel-wired-lan] " Benjamin Steinke
@ 2024-06-27 17:18 ` Kurt Kanzenbach
2024-07-05 21:22 ` Sriram Yagnaraman
2024-07-15 11:34 ` Benjamin Steinke
0 siblings, 2 replies; 12+ messages in thread
From: Kurt Kanzenbach @ 2024-06-27 17:18 UTC (permalink / raw)
To: Benjamin Steinke, Sriram Yagnaraman
Cc: intel-wired-lan, Maciej Fijalkowski, Jesper Dangaard Brouer,
Daniel Borkmann, netdev, Jonathan Lemon, John Fastabend,
Alexei Starovoitov, Björn Töpel, Eric Dumazet,
Sriram Yagnaraman, Tony Nguyen, Jakub Kicinski, bpf, Paolo Abeni,
David S . Miller, Magnus Karlsson
[-- Attachment #1: Type: text/plain, Size: 1933 bytes --]
Hi Benjamin,
On Thu Jun 27 2024, Benjamin Steinke wrote:
> On Thursday, 27 June 2024, 09:07:55 CEST, Kurt Kanzenbach wrote:
>> Hi Sriram,
>>
>> On Fri Aug 04 2023, Sriram Yagnaraman wrote:
>> > The first couple of patches adds helper funcctions to prepare for AF_XDP
>> > zero-copy support which comes in the last couple of patches, one each
>> > for Rx and TX paths.
>> >
>> > As mentioned in v1 patchset [0], I don't have access to an actual IGB
>> > device to provide correct performance numbers. I have used Intel 82576EB
>> > emulator in QEMU [1] to test the changes to IGB driver.
>>
>> I gave this patch series a try on a recent kernel and silicon
>> (i210). There was one issue in igb_xmit_zc(). But other than that it
>> worked very nicely.
>
> Hi Kurt and Sriram,
>
> I recently tried the patches on a 6.1 kernel. On two different devices i210 &
> i211 I couldn't see any packets being transmitted on the wire. Perhaps caused
> by the issue in igb_xmit_zc() you mentioned, Kurt? Can you share your findings,
> please?
Yeah, that's exactly the issue.
Following igb_xmit_xdp_ring() I've added PAYLEN to the Tx descriptor
instead of setting it to zero:
igb_xmit_zc()
{
[...]
/* put descriptor type bits */
cmd_type = E1000_ADVTXD_DTYP_DATA | E1000_ADVTXD_DCMD_DEXT |
E1000_ADVTXD_DCMD_IFCS;
olinfo_status = descs[i].len << E1000_ADVTXD_PAYLEN_SHIFT;
cmd_type |= descs[i].len | IGB_TXD_DCMD;
tx_desc->read.cmd_type_len = cpu_to_le32(cmd_type);
tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status);
[...]
}
Afterwards packets are transmitted on the wire.
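To make the change concrete, here is a small stand-alone sketch of how
the two descriptor words are composed for a single-buffer frame. It is
not driver code: the macro values are written down from the igb headers
as I remember them and should be treated as assumptions to verify
against your tree, and the helper names zc_cmd_type() and
zc_olinfo_status() are made up for illustration.

```c
#include <stdint.h>

/* Assumed values mirroring the igb headers; verify against your tree. */
#define E1000_ADVTXD_DTYP_DATA    0x00300000u /* advanced data descriptor */
#define E1000_ADVTXD_DCMD_EOP     0x01000000u /* end of packet */
#define E1000_ADVTXD_DCMD_IFCS    0x02000000u /* insert FCS */
#define E1000_ADVTXD_DCMD_RS      0x08000000u /* report status */
#define E1000_ADVTXD_DCMD_DEXT    0x20000000u /* descriptor extension */
#define E1000_ADVTXD_PAYLEN_SHIFT 14          /* PAYLEN position in olinfo_status */
#define IGB_TXD_DCMD (E1000_ADVTXD_DCMD_EOP | E1000_ADVTXD_DCMD_RS)

/* Hypothetical helper: cmd_type_len for a one-descriptor frame of len bytes. */
static uint32_t zc_cmd_type(uint32_t len)
{
	return E1000_ADVTXD_DTYP_DATA | E1000_ADVTXD_DCMD_DEXT |
	       E1000_ADVTXD_DCMD_IFCS | IGB_TXD_DCMD | len;
}

/* Hypothetical helper: olinfo_status carrying PAYLEN instead of zero. */
static uint32_t zc_olinfo_status(uint32_t len)
{
	return len << E1000_ADVTXD_PAYLEN_SHIFT;
}
```

In the driver both values still go through cpu_to_le32() before being
written to the descriptor; the sketch only shows the host-order bit
layout.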
>
> RX seemed to work on first sight.
>
Yes, Rx works even with PTP enabled.
>> It seems like it hasn't been merged yet. Do you have any plans for
>> continuing to work on this?
>
> I can offer to do testing and debugging on real hardware if this helps.
Great. Thanks!
* RE: [Intel-wired-lan] [PATCH iwl-next v4 0/4] igb: Add support for AF_XDP zero-copy
2024-06-27 17:18 ` Kurt Kanzenbach
@ 2024-07-05 21:22 ` Sriram Yagnaraman
2024-07-08 6:51 ` Kurt Kanzenbach
2024-07-15 11:34 ` Benjamin Steinke
1 sibling, 1 reply; 12+ messages in thread
From: Sriram Yagnaraman @ 2024-07-05 21:22 UTC (permalink / raw)
To: Kurt Kanzenbach, Benjamin Steinke, Sriram Yagnaraman
Cc: intel-wired-lan@osuosl.org, Maciej Fijalkowski,
Jesper Dangaard Brouer, Daniel Borkmann, netdev@vger.kernel.org,
Jonathan Lemon, John Fastabend, Alexei Starovoitov,
Björn Töpel, Eric Dumazet, Sriram Yagnaraman,
Tony Nguyen, Jakub Kicinski, bpf@vger.kernel.org, Paolo Abeni,
David S . Miller, Magnus Karlsson
Hi,
> -----Original Message-----
> From: Kurt Kanzenbach <kurt@linutronix.de>
> Sent: Thursday, 27 June 2024 19:19
> To: Benjamin Steinke <benjamin.steinke@woks-audio.com>; Sriram
> Yagnaraman <sriram.yagnaraman@est.tech>
> Cc: intel-wired-lan@osuosl.org; Maciej Fijalkowski
> <maciej.fijalkowski@intel.com>; Jesper Dangaard Brouer <hawk@kernel.org>;
> Daniel Borkmann <daniel@iogearbox.net>; netdev@vger.kernel.org;
> Jonathan Lemon <jonathan.lemon@gmail.com>; John Fastabend
> <john.fastabend@gmail.com>; Alexei Starovoitov <ast@kernel.org>; Björn
> Töpel <bjorn@kernel.org>; Eric Dumazet <edumazet@google.com>; Sriram
> Yagnaraman <sriram.yagnaraman@est.tech>; Tony Nguyen
> <anthony.l.nguyen@intel.com>; Jakub Kicinski <kuba@kernel.org>;
> bpf@vger.kernel.org; Paolo Abeni <pabeni@redhat.com>; David S . Miller
> <davem@davemloft.net>; Magnus Karlsson <magnus.karlsson@intel.com>
> Subject: Re: [Intel-wired-lan] [PATCH iwl-next v4 0/4] igb: Add support for
> AF_XDP zero-copy
>
> Hi Benjamin,
>
> On Thu Jun 27 2024, Benjamin Steinke wrote:
> > On Thursday, 27 June 2024, 09:07:55 CEST, Kurt Kanzenbach wrote:
> >> Hi Sriram,
> >>
> >> On Fri Aug 04 2023, Sriram Yagnaraman wrote:
> >> > The first couple of patches adds helper funcctions to prepare for
> >> > AF_XDP zero-copy support which comes in the last couple of patches,
> >> > one each for Rx and TX paths.
> >> >
> >> > As mentioned in v1 patchset [0], I don't have access to an actual
> >> > IGB device to provide correct performance numbers. I have used
> >> > Intel 82576EB emulator in QEMU [1] to test the changes to IGB driver.
> >>
> >> I gave this patch series a try on a recent kernel and silicon (i210).
> >> There was one issue in igb_xmit_zc(). But other than that it worked
> >> very nicely.
> >
> > Hi Kurt and Sriram,
> >
> > I recently tried the patches on a 6.1 kernel. On two different devices
> > i210 &
> > i211 I couldn't see any packets being transmitted on the wire. Perhaps
> > caused by the issue in igb_xmit_zc() you mentioned, Kurt? Can you
> > share your findings, please?
>
> Yeah, that's exactly the issue.
>
> Following igb_xmit_xdp_ring() I've added PAYLEN to the Tx descriptor instead
> of setting it to zero:
>
> igb_xmit_zc()
> {
>         [...]
>
>         /* put descriptor type bits */
>         cmd_type = E1000_ADVTXD_DTYP_DATA |
>                    E1000_ADVTXD_DCMD_DEXT |
>                    E1000_ADVTXD_DCMD_IFCS;
>         olinfo_status = descs[i].len << E1000_ADVTXD_PAYLEN_SHIFT;
>
>         cmd_type |= descs[i].len | IGB_TXD_DCMD;
>         tx_desc->read.cmd_type_len = cpu_to_le32(cmd_type);
>         tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status);
>
>         [...]
> }
>
> Afterwards packets are transmitted on the wire.
>
> >
> > RX seemed to work on first sight.
> >
>
> Yes, Rx works even with PTP enabled.
>
> >> It seems like it hasn't been merged yet. Do you have any plans for
> >> continuing to work on this?
> >
> > I can offer to do testing and debugging on real hardware if this helps.
>
> Great. Thanks!
I have since changed my position at my company, and my new position
doesn't allow me to contribute upstream to kernel unfortunately. It
would be great if one of you can take over this and get it delivered
if possible.
Glad that others find this patch useful as well.
* RE: [Intel-wired-lan] [PATCH iwl-next v4 0/4] igb: Add support for AF_XDP zero-copy
2024-07-05 21:22 ` Sriram Yagnaraman
@ 2024-07-08 6:51 ` Kurt Kanzenbach
0 siblings, 0 replies; 12+ messages in thread
From: Kurt Kanzenbach @ 2024-07-08 6:51 UTC (permalink / raw)
To: Sriram Yagnaraman, Benjamin Steinke, Sriram Yagnaraman
Cc: intel-wired-lan@osuosl.org, Maciej Fijalkowski,
Jesper Dangaard Brouer, Daniel Borkmann, netdev@vger.kernel.org,
Jonathan Lemon, John Fastabend, Alexei Starovoitov,
Björn Töpel, Eric Dumazet, Sriram Yagnaraman,
Tony Nguyen, Jakub Kicinski, bpf@vger.kernel.org, Paolo Abeni,
David S . Miller, Magnus Karlsson
On Fri Jul 05 2024, Sriram Yagnaraman wrote:
>> >> It seems like it hasn't been merged yet. Do you have any plans for
>> >> continuing to work on this?
>> >
>> > I can offer to do testing and debugging on real hardware if this helps.
>>
>> Great. Thanks!
>
> I have since changed my position at my company, and my new position
> doesn't allow me to contribute upstream to kernel unfortunately. It
> would be great if one of you can take over this and get it delivered
> if possible.
Ok, I'll take it over.
>
> Glad that others find this patch useful as well.
Yeah, it's very useful :).
Thanks,
Kurt
* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/4] igb: Add support for AF_XDP zero-copy
2024-06-27 17:18 ` Kurt Kanzenbach
2024-07-05 21:22 ` Sriram Yagnaraman
@ 2024-07-15 11:34 ` Benjamin Steinke
1 sibling, 0 replies; 12+ messages in thread
From: Benjamin Steinke @ 2024-07-15 11:34 UTC (permalink / raw)
To: intel-wired-lan, Kurt Kanzenbach
Cc: Sriram Yagnaraman, Maciej Fijalkowski, Jesper Dangaard Brouer,
Daniel Borkmann, netdev, Jonathan Lemon, John Fastabend,
Alexei Starovoitov, Björn Töpel, Eric Dumazet,
Sriram Yagnaraman, Tony Nguyen, Jakub Kicinski, bpf, Paolo Abeni,
David S . Miller, Magnus Karlsson
On Thursday, 27 June 2024, 19:18:37 CEST, Kurt Kanzenbach wrote:
> Hi Benjamin,
>
> On Thu Jun 27 2024, Benjamin Steinke wrote:
> > On Thursday, 27 June 2024, 09:07:55 CEST, Kurt Kanzenbach wrote:
> >> Hi Sriram,
> >>
> >> On Fri Aug 04 2023, Sriram Yagnaraman wrote:
> > >> > The first couple of patches add helper functions to prepare for
> > >> > AF_XDP zero-copy support which comes in the last couple of patches,
> > >> > one each for Rx and TX paths.
> > >> >
> > >> > As mentioned in v1 patchset [0], I don't have access to an actual
> > >> > IGB device to provide correct performance numbers. I have used
> > >> > Intel 82576EB emulator in QEMU [1] to test the changes to IGB driver.
> >>
> >> I gave this patch series a try on a recent kernel and silicon
> >> (i210). There was one issue in igb_xmit_zc(). But other than that it
> >> worked very nicely.
> >
> > Hi Kurt and Sriram,
> >
> > I recently tried the patches on a 6.1 kernel. On two different devices
> > i210 & i211 I couldn't see any packets being transmitted on the wire.
> > Perhaps caused by the issue in igb_xmit_zc() you mentioned, Kurt? Can you
> > share your findings, please?
>
> Yeah, that's exactly the issue.
>
> Following igb_xmit_xdp_ring() I've added PAYLEN to the Tx descriptor
> instead of setting it to zero:
>
> igb_xmit_zc()
> {
>         [...]
>
>         /* put descriptor type bits */
>         cmd_type = E1000_ADVTXD_DTYP_DATA | E1000_ADVTXD_DCMD_DEXT |
>                    E1000_ADVTXD_DCMD_IFCS;
>         olinfo_status = descs[i].len << E1000_ADVTXD_PAYLEN_SHIFT;
>
>         cmd_type |= descs[i].len | IGB_TXD_DCMD;
>         tx_desc->read.cmd_type_len = cpu_to_le32(cmd_type);
>         tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status);
>
>         [...]
> }
>
> Afterwards packets are transmitted on the wire.
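For reference, the arithmetic behind the two descriptor words in the
snippet above can be reproduced in a small standalone C sketch. The macro
names match the snippet, but their numeric values here are written from
memory of the igb headers and should be treated as assumptions to verify
against the driver. struct tx_words and build_tx_words() are hypothetical
names used only for illustration, and the cpu_to_le32() conversion is left
out because the sketch never touches hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; verify against the driver headers before relying on them. */
#define E1000_ADVTXD_DTYP_DATA    0x00300000u /* advanced data descriptor */
#define E1000_ADVTXD_DCMD_EOP     0x01000000u /* end of packet */
#define E1000_ADVTXD_DCMD_IFCS    0x02000000u /* insert FCS (Ethernet CRC) */
#define E1000_ADVTXD_DCMD_RS      0x08000000u /* report status */
#define E1000_ADVTXD_DCMD_DEXT    0x20000000u /* descriptor extension */
#define E1000_ADVTXD_PAYLEN_SHIFT 14
#define IGB_TXD_DCMD (E1000_ADVTXD_DCMD_EOP | E1000_ADVTXD_DCMD_RS)

/* Hypothetical container for the two 32-bit words, in CPU byte order. */
struct tx_words {
	uint32_t cmd_type_len;
	uint32_t olinfo_status;
};

/* Hypothetical helper mirroring the assignments in the quoted snippet. */
static struct tx_words build_tx_words(uint32_t len)
{
	struct tx_words w;
	uint32_t cmd_type;

	/* The sketch assumes the buffer length fits in the low 16 bits. */
	assert(len <= 0xFFFFu);

	/* Descriptor type bits, then buffer length plus EOP/RS. */
	cmd_type = E1000_ADVTXD_DTYP_DATA | E1000_ADVTXD_DCMD_DEXT |
		   E1000_ADVTXD_DCMD_IFCS;
	cmd_type |= len | IGB_TXD_DCMD;
	w.cmd_type_len = cmd_type;

	/* PAYLEN carries the packet length instead of being left at zero. */
	w.olinfo_status = len << E1000_ADVTXD_PAYLEN_SHIFT;
	return w;
}
```

With these assumed values, build_tx_words(64) gives cmd_type_len
0x2B300040 and olinfo_status 0x00100000, so PAYLEN is non-zero for a
64 byte frame.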
Hi Kurt,
I can confirm this makes the transmitter work.
Thank you for taking over this patch series and continuing to bring it
upstream. I will continue testing it.
> > RX seemed to work on first sight.
>
> Yes, Rx works even with PTP enabled.
I can confirm this as well.
Best regards,
Benjamin
end of thread, other threads:[~2024-07-15 11:43 UTC | newest]
Thread overview: 12+ messages
2023-08-04 8:40 [PATCH iwl-next v4 0/4] igb: Add support for AF_XDP zero-copy Sriram Yagnaraman
2023-08-04 8:40 ` [PATCH iwl-next v4 1/4] igb: prepare for AF_XDP zero-copy support Sriram Yagnaraman
2023-08-04 8:40 ` [PATCH iwl-next v4 2/4] igb: Introduce XSK data structures and helpers Sriram Yagnaraman
2023-08-04 8:40 ` [PATCH iwl-next v4 3/4] igb: add AF_XDP zero-copy Rx support Sriram Yagnaraman
2023-08-05 14:54 ` [Intel-wired-lan] " kernel test robot
2023-08-04 8:40 ` [PATCH iwl-next v4 4/4] igb: add AF_XDP zero-copy Tx support Sriram Yagnaraman
2024-06-27 7:07 ` [PATCH iwl-next v4 0/4] igb: Add support for AF_XDP zero-copy Kurt Kanzenbach
2024-06-27 16:49 ` [Intel-wired-lan] " Benjamin Steinke
2024-06-27 17:18 ` Kurt Kanzenbach
2024-07-05 21:22 ` Sriram Yagnaraman
2024-07-08 6:51 ` Kurt Kanzenbach
2024-07-15 11:34 ` Benjamin Steinke