* [PATCH iwl-next v8 0/6] igb: Add support for AF_XDP zero-copy
@ 2024-10-11 9:00 Kurt Kanzenbach
2024-10-11 9:00 ` [PATCH iwl-next v8 1/6] igb: Remove static qualifiers Kurt Kanzenbach
` (5 more replies)
0 siblings, 6 replies; 11+ messages in thread
From: Kurt Kanzenbach @ 2024-10-11 9:00 UTC (permalink / raw)
To: Tony Nguyen, Przemek Kitszel
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Richard Cochran, Sriram Yagnaraman,
Benjamin Steinke, Sebastian Andrzej Siewior, Maciej Fijalkowski,
intel-wired-lan, netdev, bpf, Sriram Yagnaraman, Kurt Kanzenbach
This is version v8 of the AF_XDP zero-copy support for igb. Since Sriram's
duties have changed, I am sending this series instead. Additionally, I've
tested it on real hardware, an Intel i210 [1].
Changes since v7:
- Collect tags
- Split patches (Maciej)
- Use read once xsk_pool pointer in igb_alloc_rx_buffers_zc() (Maciej)
- Add FIXME about RS bit in Tx path (Maciej)
- Link to v7: https://lore.kernel.org/r/20241007-b4-igb_zero_copy-v7-0-23556668adc6@linutronix.de
Changes since v6:
- Rebase to v6.12
- Collect tags
- Merged first patch via -net
- Inline small functions (Maciej)
- Read xdp_prog only once per NAPI cycle (Maciej)
- Use u32 for stack based variables (Maciej)
- Link to v6: https://lore.kernel.org/r/20240711-b4-igb_zero_copy-v6-0-4bfb68773b18@linutronix.de
Changes since v5:
- Rebase to 6.11
- Fix set-but-unused variable warnings
- Split first patches (Maciej)
- Add READ/WRITE_ONCE() for xsk_pool and xdp_prog (Maciej)
- Add synchronize_net() (Maciej)
- Remove IGB_RING_FLAG_AF_XDP_ZC (Maciej)
- Add NETDEV_XDP_ACT_XSK_ZEROCOPY to last patch (Maciej)
- Update Rx ntc handling (Maciej)
- Move stats update and xdp finalize to common functions (Maciej)
- "Likelyfy" XDP_REDIRECT case (Maciej)
- Check Tx disabled and carrier in igb_xmit_zc() (Maciej)
- RCT (Maciej)
- Link to v5: https://lore.kernel.org/r/20240711-b4-igb_zero_copy-v5-0-f3f455113b11@linutronix.de
Changes since v4:
- Rebase to v6.10
- Fix issue reported by kernel test robot
- Provide napi_id for xdp_rxq_info_reg() so that busy polling works
- Set olinfo_status in igb_xmit_zc() so that frames are transmitted
- Link to v4: https://lore.kernel.org/intel-wired-lan/20230804084051.14194-1-sriram.yagnaraman@est.tech/
[1] - https://github.com/Linutronix/TSN-Testbench/tree/main/tests/busypolling_i210
Original cover letter:
The first couple of patches add helper functions to prepare for AF_XDP
zero-copy support, which comes in the last couple of patches, one each
for the Rx and Tx paths.
As mentioned in the v1 patchset [0], I don't have access to an actual IGB
device to provide accurate performance numbers. I have used the Intel
82576EB emulation in QEMU [1] to test the changes to the IGB driver.
The tests use one isolated vCPU for RX/TX and one isolated vCPU for the
xdp-sock application [2]. Hopefully these measurements provide at least
some indication of the performance increase when using ZC, especially in
the TX path. It would be awesome if someone with a real IGB NIC could
test the patches.
AF_XDP performance using 64 byte packets in Kpps:
  Benchmark:   XDP-SKB   XDP-DRV   XDP-DRV(ZC)
  rxdrop           220       235           350
  txpush         1.000     1.000           410
  l2fwd          1.000     1.000           200

AF_XDP performance using 1500 byte packets in Kpps:
  Benchmark:   XDP-SKB   XDP-DRV   XDP-DRV(ZC)
  rxdrop           200       210           310
  txpush         1.000     1.000           410
  l2fwd          0.900     1.000           160
[0]: https://lore.kernel.org/intel-wired-lan/20230704095915.9750-1-sriram.yagnaraman@est.tech/
[1]: https://www.qemu.org/docs/master/system/devices/igb.html
[2]: https://github.com/xdp-project/bpf-examples/tree/master/AF_XDP-example
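For reference, the xdp-sock style test binds one AF_XDP zero-copy socket per
queue. Below is a minimal setup sketch assuming libxdp's xsk helpers; the
interface name, queue id and ring sizes are placeholders, not taken from this
series:

#include <linux/if_xdp.h>
#include <xdp/xsk.h>

static struct xsk_ring_prod fill, tx;
static struct xsk_ring_cons comp, rx;

/* Register the packet buffer area (UMEM) shared with the kernel and bind a
 * zero-copy socket to queue 0 of "eth0"; error handling omitted for brevity.
 */
static struct xsk_socket *bind_zc_socket(void *umem_area, __u64 size)
{
	const struct xsk_socket_config cfg = {
		.rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
		.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
		.bind_flags = XDP_ZEROCOPY | XDP_USE_NEED_WAKEUP,
	};
	struct xsk_socket *xsk = NULL;
	struct xsk_umem *umem = NULL;

	if (xsk_umem__create(&umem, umem_area, size, &fill, &comp, NULL))
		return NULL;

	if (xsk_socket__create(&xsk, "eth0", 0, umem, &rx, &tx, &cfg))
		return NULL;

	return xsk;
}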
v3->v4:
- NULL check buffer_info in igb_dump before dereferencing (Simon Horman)
v2->v3:
- Avoid TX unit hang when using AF_XDP zero-copy by setting time_stamp
on the tx_buffer_info
- Fix uninitialized nb_buffs (Simon Horman)
v1->v2:
- Use batch XSK APIs (Maciej Fijalkowski)
- Follow reverse xmas tree convention and remove the ternary operator
use (Simon Horman)
---
Kurt Kanzenbach (1):
igb: Add XDP finalize and stats update functions
Sriram Yagnaraman (5):
igb: Remove static qualifiers
igb: Introduce igb_xdp_is_enabled()
igb: Introduce XSK data structures and helpers
igb: Add AF_XDP zero-copy Rx support
igb: Add AF_XDP zero-copy Tx support
drivers/net/ethernet/intel/igb/Makefile | 2 +-
drivers/net/ethernet/intel/igb/igb.h | 58 ++-
drivers/net/ethernet/intel/igb/igb_main.c | 248 ++++++++-----
drivers/net/ethernet/intel/igb/igb_xsk.c | 567 ++++++++++++++++++++++++++++++
4 files changed, 792 insertions(+), 83 deletions(-)
---
base-commit: f5cae3d7f24df1e8ebcc8b5890a655fa151f3461
change-id: 20240711-b4-igb_zero_copy-bb70a31ecb0f
Best regards,
--
Kurt Kanzenbach <kurt@linutronix.de>
* [PATCH iwl-next v8 1/6] igb: Remove static qualifiers
2024-10-11 9:00 [PATCH iwl-next v8 0/6] igb: Add support for AF_XDP zero-copy Kurt Kanzenbach
@ 2024-10-11 9:00 ` Kurt Kanzenbach
2024-10-11 9:01 ` [PATCH iwl-next v8 2/6] igb: Introduce igb_xdp_is_enabled() Kurt Kanzenbach
` (4 subsequent siblings)
5 siblings, 0 replies; 11+ messages in thread
From: Kurt Kanzenbach @ 2024-10-11 9:00 UTC (permalink / raw)
To: Tony Nguyen, Przemek Kitszel
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Richard Cochran, Sriram Yagnaraman,
Benjamin Steinke, Sebastian Andrzej Siewior, Maciej Fijalkowski,
intel-wired-lan, netdev, bpf, Sriram Yagnaraman, Kurt Kanzenbach
From: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Remove the static qualifiers from the following functions so that they can
be called from the XSK specific file added in later patches:
- igb_xdp_tx_queue_mapping()
- igb_xdp_ring_update_tail()
- igb_clean_tx_ring()
- igb_clean_rx_ring()
- igb_xdp_xmit_back()
- igb_process_skb_fields()
While at it, inline igb_xdp_tx_queue_mapping() and
igb_xdp_ring_update_tail(). These functions are small enough and used in
XDP hot paths.
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
[Kurt: Split patches, inline small XDP functions]
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
drivers/net/ethernet/intel/igb/igb.h | 29 ++++++++++++++++++++++++
drivers/net/ethernet/intel/igb/igb_main.c | 37 +++++--------------------------
2 files changed, 35 insertions(+), 31 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 3c2dc7bdebb5..1bfe703e73d9 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -18,6 +18,7 @@
#include <linux/i2c-algo-bit.h>
#include <linux/pci.h>
#include <linux/mdio.h>
+#include <linux/lockdep.h>
#include <net/xdp.h>
@@ -731,12 +732,18 @@ int igb_setup_tx_resources(struct igb_ring *);
int igb_setup_rx_resources(struct igb_ring *);
void igb_free_tx_resources(struct igb_ring *);
void igb_free_rx_resources(struct igb_ring *);
+void igb_clean_tx_ring(struct igb_ring *tx_ring);
+void igb_clean_rx_ring(struct igb_ring *rx_ring);
void igb_configure_tx_ring(struct igb_adapter *, struct igb_ring *);
void igb_configure_rx_ring(struct igb_adapter *, struct igb_ring *);
void igb_setup_tctl(struct igb_adapter *);
void igb_setup_rctl(struct igb_adapter *);
void igb_setup_srrctl(struct igb_adapter *, struct igb_ring *);
netdev_tx_t igb_xmit_frame_ring(struct sk_buff *, struct igb_ring *);
+int igb_xdp_xmit_back(struct igb_adapter *adapter, struct xdp_buff *xdp);
+void igb_process_skb_fields(struct igb_ring *rx_ring,
+ union e1000_adv_rx_desc *rx_desc,
+ struct sk_buff *skb);
void igb_alloc_rx_buffers(struct igb_ring *, u16);
void igb_update_stats(struct igb_adapter *);
bool igb_has_link(struct igb_adapter *adapter);
@@ -797,6 +804,28 @@ static inline struct netdev_queue *txring_txq(const struct igb_ring *tx_ring)
return netdev_get_tx_queue(tx_ring->netdev, tx_ring->queue_index);
}
+/* This function assumes __netif_tx_lock is held by the caller. */
+static inline void igb_xdp_ring_update_tail(struct igb_ring *ring)
+{
+ lockdep_assert_held(&txring_txq(ring)->_xmit_lock);
+
+ /* Force memory writes to complete before letting h/w know there
+ * are new descriptors to fetch.
+ */
+ wmb();
+ writel(ring->next_to_use, ring->tail);
+}
+
+static inline struct igb_ring *igb_xdp_tx_queue_mapping(struct igb_adapter *adapter)
+{
+ unsigned int r_idx = smp_processor_id();
+
+ if (r_idx >= adapter->num_tx_queues)
+ r_idx = r_idx % adapter->num_tx_queues;
+
+ return adapter->tx_ring[r_idx];
+}
+
int igb_add_filter(struct igb_adapter *adapter,
struct igb_nfc_filter *input);
int igb_erase_filter(struct igb_adapter *adapter,
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index f1d088168723..5a44867bcb26 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -33,7 +33,6 @@
#include <linux/bpf_trace.h>
#include <linux/pm_runtime.h>
#include <linux/etherdevice.h>
-#include <linux/lockdep.h>
#ifdef CONFIG_IGB_DCA
#include <linux/dca.h>
#endif
@@ -116,8 +115,6 @@ static void igb_configure_tx(struct igb_adapter *);
static void igb_configure_rx(struct igb_adapter *);
static void igb_clean_all_tx_rings(struct igb_adapter *);
static void igb_clean_all_rx_rings(struct igb_adapter *);
-static void igb_clean_tx_ring(struct igb_ring *);
-static void igb_clean_rx_ring(struct igb_ring *);
static void igb_set_rx_mode(struct net_device *);
static void igb_update_phy_info(struct timer_list *);
static void igb_watchdog(struct timer_list *);
@@ -2915,29 +2912,7 @@ static int igb_xdp(struct net_device *dev, struct netdev_bpf *xdp)
}
}
-/* This function assumes __netif_tx_lock is held by the caller. */
-static void igb_xdp_ring_update_tail(struct igb_ring *ring)
-{
- lockdep_assert_held(&txring_txq(ring)->_xmit_lock);
-
- /* Force memory writes to complete before letting h/w know there
- * are new descriptors to fetch.
- */
- wmb();
- writel(ring->next_to_use, ring->tail);
-}
-
-static struct igb_ring *igb_xdp_tx_queue_mapping(struct igb_adapter *adapter)
-{
- unsigned int r_idx = smp_processor_id();
-
- if (r_idx >= adapter->num_tx_queues)
- r_idx = r_idx % adapter->num_tx_queues;
-
- return adapter->tx_ring[r_idx];
-}
-
-static int igb_xdp_xmit_back(struct igb_adapter *adapter, struct xdp_buff *xdp)
+int igb_xdp_xmit_back(struct igb_adapter *adapter, struct xdp_buff *xdp)
{
struct xdp_frame *xdpf = xdp_convert_buff_to_frame(xdp);
int cpu = smp_processor_id();
@@ -4884,7 +4859,7 @@ static void igb_free_all_tx_resources(struct igb_adapter *adapter)
* igb_clean_tx_ring - Free Tx Buffers
* @tx_ring: ring to be cleaned
**/
-static void igb_clean_tx_ring(struct igb_ring *tx_ring)
+void igb_clean_tx_ring(struct igb_ring *tx_ring)
{
u16 i = tx_ring->next_to_clean;
struct igb_tx_buffer *tx_buffer = &tx_ring->tx_buffer_info[i];
@@ -5003,7 +4978,7 @@ static void igb_free_all_rx_resources(struct igb_adapter *adapter)
* igb_clean_rx_ring - Free Rx Buffers per Queue
* @rx_ring: ring to free buffers from
**/
-static void igb_clean_rx_ring(struct igb_ring *rx_ring)
+void igb_clean_rx_ring(struct igb_ring *rx_ring)
{
u16 i = rx_ring->next_to_clean;
@@ -8782,9 +8757,9 @@ static bool igb_cleanup_headers(struct igb_ring *rx_ring,
* order to populate the hash, checksum, VLAN, timestamp, protocol, and
* other fields within the skb.
**/
-static void igb_process_skb_fields(struct igb_ring *rx_ring,
- union e1000_adv_rx_desc *rx_desc,
- struct sk_buff *skb)
+void igb_process_skb_fields(struct igb_ring *rx_ring,
+ union e1000_adv_rx_desc *rx_desc,
+ struct sk_buff *skb)
{
struct net_device *dev = rx_ring->netdev;
--
2.39.5
* [PATCH iwl-next v8 2/6] igb: Introduce igb_xdp_is_enabled()
2024-10-11 9:00 [PATCH iwl-next v8 0/6] igb: Add support for AF_XDP zero-copy Kurt Kanzenbach
2024-10-11 9:00 ` [PATCH iwl-next v8 1/6] igb: Remove static qualifiers Kurt Kanzenbach
@ 2024-10-11 9:01 ` Kurt Kanzenbach
2024-10-11 9:01 ` [PATCH iwl-next v8 3/6] igb: Introduce XSK data structures and helpers Kurt Kanzenbach
` (3 subsequent siblings)
5 siblings, 0 replies; 11+ messages in thread
From: Kurt Kanzenbach @ 2024-10-11 9:01 UTC (permalink / raw)
To: Tony Nguyen, Przemek Kitszel
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Richard Cochran, Sriram Yagnaraman,
Benjamin Steinke, Sebastian Andrzej Siewior, Maciej Fijalkowski,
intel-wired-lan, netdev, bpf, Sriram Yagnaraman, Kurt Kanzenbach
From: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Introduce igb_xdp_is_enabled() to check if an XDP program is assigned to
the device. Use that wherever xdp_prog is read and evaluated.
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
[Kurt: Split patches and use READ_ONCE()]
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
drivers/net/ethernet/intel/igb/igb.h | 5 +++++
drivers/net/ethernet/intel/igb/igb_main.c | 8 +++++---
2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 1bfe703e73d9..6e2b61ecff68 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -826,6 +826,11 @@ static inline struct igb_ring *igb_xdp_tx_queue_mapping(struct igb_adapter *adap
return adapter->tx_ring[r_idx];
}
+static inline bool igb_xdp_is_enabled(struct igb_adapter *adapter)
+{
+ return !!READ_ONCE(adapter->xdp_prog);
+}
+
int igb_add_filter(struct igb_adapter *adapter,
struct igb_nfc_filter *input);
int igb_erase_filter(struct igb_adapter *adapter,
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 5a44867bcb26..fc30966282c5 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2926,7 +2926,8 @@ int igb_xdp_xmit_back(struct igb_adapter *adapter, struct xdp_buff *xdp)
/* During program transitions its possible adapter->xdp_prog is assigned
* but ring has not been configured yet. In this case simply abort xmit.
*/
- tx_ring = adapter->xdp_prog ? igb_xdp_tx_queue_mapping(adapter) : NULL;
+ tx_ring = igb_xdp_is_enabled(adapter) ?
+ igb_xdp_tx_queue_mapping(adapter) : NULL;
if (unlikely(!tx_ring))
return IGB_XDP_CONSUMED;
@@ -2959,7 +2960,8 @@ static int igb_xdp_xmit(struct net_device *dev, int n,
/* During program transitions its possible adapter->xdp_prog is assigned
* but ring has not been configured yet. In this case simply abort xmit.
*/
- tx_ring = adapter->xdp_prog ? igb_xdp_tx_queue_mapping(adapter) : NULL;
+ tx_ring = igb_xdp_is_enabled(adapter) ?
+ igb_xdp_tx_queue_mapping(adapter) : NULL;
if (unlikely(!tx_ring))
return -ENXIO;
@@ -6593,7 +6595,7 @@ static int igb_change_mtu(struct net_device *netdev, int new_mtu)
struct igb_adapter *adapter = netdev_priv(netdev);
int max_frame = new_mtu + IGB_ETH_PKT_HDR_PAD;
- if (adapter->xdp_prog) {
+ if (igb_xdp_is_enabled(adapter)) {
int i;
for (i = 0; i < adapter->num_rx_queues; i++) {
--
2.39.5
* [PATCH iwl-next v8 3/6] igb: Introduce XSK data structures and helpers
2024-10-11 9:00 [PATCH iwl-next v8 0/6] igb: Add support for AF_XDP zero-copy Kurt Kanzenbach
2024-10-11 9:00 ` [PATCH iwl-next v8 1/6] igb: Remove static qualifiers Kurt Kanzenbach
2024-10-11 9:01 ` [PATCH iwl-next v8 2/6] igb: Introduce igb_xdp_is_enabled() Kurt Kanzenbach
@ 2024-10-11 9:01 ` Kurt Kanzenbach
2024-10-11 9:01 ` [PATCH iwl-next v8 4/6] igb: Add XDP finalize and stats update functions Kurt Kanzenbach
` (2 subsequent siblings)
5 siblings, 0 replies; 11+ messages in thread
From: Kurt Kanzenbach @ 2024-10-11 9:01 UTC (permalink / raw)
To: Tony Nguyen, Przemek Kitszel
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Richard Cochran, Sriram Yagnaraman,
Benjamin Steinke, Sebastian Andrzej Siewior, Maciej Fijalkowski,
intel-wired-lan, netdev, bpf, Sriram Yagnaraman, Kurt Kanzenbach
From: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Add the following ring flag:
- IGB_RING_FLAG_TX_DISABLED (set while an xsk pool is being set up)
Add an xdp_buff array for use with the XSK receive batch API, and a
pointer to the xsk_pool in struct igb_ring.
Add enable/disable functions for TX and RX rings.
Add enable/disable functions for XSK pool.
Add xsk wakeup function.
None of the above functionality will be active until
NETDEV_XDP_ACT_XSK_ZEROCOPY is advertised in netdev->xdp_features.
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
[Kurt: Add READ/WRITE_ONCE(), synchronize_net(),
remove IGB_RING_FLAG_AF_XDP_ZC]
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
drivers/net/ethernet/intel/igb/Makefile | 2 +-
drivers/net/ethernet/intel/igb/igb.h | 13 +-
drivers/net/ethernet/intel/igb/igb_main.c | 9 ++
drivers/net/ethernet/intel/igb/igb_xsk.c | 207 ++++++++++++++++++++++++++++++
4 files changed, 229 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/Makefile b/drivers/net/ethernet/intel/igb/Makefile
index 463c0d26b9d4..6c1b702fd992 100644
--- a/drivers/net/ethernet/intel/igb/Makefile
+++ b/drivers/net/ethernet/intel/igb/Makefile
@@ -8,4 +8,4 @@ obj-$(CONFIG_IGB) += igb.o
igb-y := igb_main.o igb_ethtool.o e1000_82575.o \
e1000_mac.o e1000_nvm.o e1000_phy.o e1000_mbx.o \
- e1000_i210.o igb_ptp.o igb_hwmon.o
+ e1000_i210.o igb_ptp.o igb_hwmon.o igb_xsk.o
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 6e2b61ecff68..c30d6f9708f8 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -21,6 +21,7 @@
#include <linux/lockdep.h>
#include <net/xdp.h>
+#include <net/xdp_sock_drv.h>
struct igb_adapter;
@@ -321,6 +322,7 @@ struct igb_ring {
union { /* array of buffer info structs */
struct igb_tx_buffer *tx_buffer_info;
struct igb_rx_buffer *rx_buffer_info;
+ struct xdp_buff **rx_buffer_info_zc;
};
void *desc; /* descriptor ring memory */
unsigned long flags; /* ring specific flags */
@@ -358,6 +360,7 @@ struct igb_ring {
};
};
struct xdp_rxq_info xdp_rxq;
+ struct xsk_buff_pool *xsk_pool;
} ____cacheline_internodealigned_in_smp;
struct igb_q_vector {
@@ -385,7 +388,8 @@ enum e1000_ring_flags_t {
IGB_RING_FLAG_RX_SCTP_CSUM,
IGB_RING_FLAG_RX_LB_VLAN_BSWAP,
IGB_RING_FLAG_TX_CTX_IDX,
- IGB_RING_FLAG_TX_DETECT_HANG
+ IGB_RING_FLAG_TX_DETECT_HANG,
+ IGB_RING_FLAG_TX_DISABLED
};
#define ring_uses_large_buffer(ring) \
@@ -841,4 +845,11 @@ int igb_add_mac_steering_filter(struct igb_adapter *adapter,
int igb_del_mac_steering_filter(struct igb_adapter *adapter,
const u8 *addr, u8 queue, u8 flags);
+struct xsk_buff_pool *igb_xsk_pool(struct igb_adapter *adapter,
+ struct igb_ring *ring);
+int igb_xsk_pool_setup(struct igb_adapter *adapter,
+ struct xsk_buff_pool *pool,
+ u16 qid);
+int igb_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags);
+
#endif /* _IGB_H_ */
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index fc30966282c5..341b83e39019 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2904,9 +2904,14 @@ static int igb_xdp_setup(struct net_device *dev, struct netdev_bpf *bpf)
static int igb_xdp(struct net_device *dev, struct netdev_bpf *xdp)
{
+ struct igb_adapter *adapter = netdev_priv(dev);
+
switch (xdp->command) {
case XDP_SETUP_PROG:
return igb_xdp_setup(dev, xdp);
+ case XDP_SETUP_XSK_POOL:
+ return igb_xsk_pool_setup(adapter, xdp->xsk.pool,
+ xdp->xsk.queue_id);
default:
return -EINVAL;
}
@@ -3015,6 +3020,7 @@ static const struct net_device_ops igb_netdev_ops = {
.ndo_setup_tc = igb_setup_tc,
.ndo_bpf = igb_xdp,
.ndo_xdp_xmit = igb_xdp_xmit,
+ .ndo_xsk_wakeup = igb_xsk_wakeup,
};
/**
@@ -4337,6 +4343,8 @@ void igb_configure_tx_ring(struct igb_adapter *adapter,
u64 tdba = ring->dma;
int reg_idx = ring->reg_idx;
+ WRITE_ONCE(ring->xsk_pool, igb_xsk_pool(adapter, ring));
+
wr32(E1000_TDLEN(reg_idx),
ring->count * sizeof(union e1000_adv_tx_desc));
wr32(E1000_TDBAL(reg_idx),
@@ -4732,6 +4740,7 @@ void igb_configure_rx_ring(struct igb_adapter *adapter,
xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq);
WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
MEM_TYPE_PAGE_SHARED, NULL));
+ WRITE_ONCE(ring->xsk_pool, igb_xsk_pool(adapter, ring));
/* disable the queue */
wr32(E1000_RXDCTL(reg_idx), 0);
diff --git a/drivers/net/ethernet/intel/igb/igb_xsk.c b/drivers/net/ethernet/intel/igb/igb_xsk.c
new file mode 100644
index 000000000000..7b632be3e7e3
--- /dev/null
+++ b/drivers/net/ethernet/intel/igb/igb_xsk.c
@@ -0,0 +1,207 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2018 Intel Corporation. */
+
+#include <linux/bpf_trace.h>
+#include <net/xdp_sock_drv.h>
+#include <net/xdp.h>
+
+#include "e1000_hw.h"
+#include "igb.h"
+
+static int igb_realloc_rx_buffer_info(struct igb_ring *ring, bool pool_present)
+{
+ int size = pool_present ?
+ sizeof(*ring->rx_buffer_info_zc) * ring->count :
+ sizeof(*ring->rx_buffer_info) * ring->count;
+ void *buff_info = vmalloc(size);
+
+ if (!buff_info)
+ return -ENOMEM;
+
+ if (pool_present) {
+ vfree(ring->rx_buffer_info);
+ ring->rx_buffer_info = NULL;
+ ring->rx_buffer_info_zc = buff_info;
+ } else {
+ vfree(ring->rx_buffer_info_zc);
+ ring->rx_buffer_info_zc = NULL;
+ ring->rx_buffer_info = buff_info;
+ }
+
+ return 0;
+}
+
+static void igb_txrx_ring_disable(struct igb_adapter *adapter, u16 qid)
+{
+ struct igb_ring *tx_ring = adapter->tx_ring[qid];
+ struct igb_ring *rx_ring = adapter->rx_ring[qid];
+ struct e1000_hw *hw = &adapter->hw;
+
+ set_bit(IGB_RING_FLAG_TX_DISABLED, &tx_ring->flags);
+
+ wr32(E1000_TXDCTL(tx_ring->reg_idx), 0);
+ wr32(E1000_RXDCTL(rx_ring->reg_idx), 0);
+
+ synchronize_net();
+
+ /* Rx/Tx share the same napi context. */
+ napi_disable(&rx_ring->q_vector->napi);
+
+ igb_clean_tx_ring(tx_ring);
+ igb_clean_rx_ring(rx_ring);
+
+ memset(&rx_ring->rx_stats, 0, sizeof(rx_ring->rx_stats));
+ memset(&tx_ring->tx_stats, 0, sizeof(tx_ring->tx_stats));
+}
+
+static void igb_txrx_ring_enable(struct igb_adapter *adapter, u16 qid)
+{
+ struct igb_ring *tx_ring = adapter->tx_ring[qid];
+ struct igb_ring *rx_ring = adapter->rx_ring[qid];
+
+ igb_configure_tx_ring(adapter, tx_ring);
+ igb_configure_rx_ring(adapter, rx_ring);
+
+ synchronize_net();
+
+ clear_bit(IGB_RING_FLAG_TX_DISABLED, &tx_ring->flags);
+
+ /* call igb_desc_unused which always leaves
+ * at least 1 descriptor unused to make sure
+ * next_to_use != next_to_clean
+ */
+ igb_alloc_rx_buffers(rx_ring, igb_desc_unused(rx_ring));
+
+ /* Rx/Tx share the same napi context. */
+ napi_enable(&rx_ring->q_vector->napi);
+}
+
+struct xsk_buff_pool *igb_xsk_pool(struct igb_adapter *adapter,
+ struct igb_ring *ring)
+{
+ int qid = ring->queue_index;
+ struct xsk_buff_pool *pool;
+
+ pool = xsk_get_pool_from_qid(adapter->netdev, qid);
+
+ if (!igb_xdp_is_enabled(adapter))
+ return NULL;
+
+ return (pool && pool->dev) ? pool : NULL;
+}
+
+static int igb_xsk_pool_enable(struct igb_adapter *adapter,
+ struct xsk_buff_pool *pool,
+ u16 qid)
+{
+ struct net_device *netdev = adapter->netdev;
+ struct igb_ring *rx_ring;
+ bool if_running;
+ int err;
+
+ if (qid >= adapter->num_rx_queues)
+ return -EINVAL;
+
+ if (qid >= netdev->real_num_rx_queues ||
+ qid >= netdev->real_num_tx_queues)
+ return -EINVAL;
+
+ err = xsk_pool_dma_map(pool, &adapter->pdev->dev, IGB_RX_DMA_ATTR);
+ if (err)
+ return err;
+
+ rx_ring = adapter->rx_ring[qid];
+ if_running = netif_running(adapter->netdev) && igb_xdp_is_enabled(adapter);
+ if (if_running)
+ igb_txrx_ring_disable(adapter, qid);
+
+ if (if_running) {
+ err = igb_realloc_rx_buffer_info(rx_ring, true);
+ if (!err) {
+ igb_txrx_ring_enable(adapter, qid);
+ /* Kick start the NAPI context so that receiving will start */
+ err = igb_xsk_wakeup(adapter->netdev, qid, XDP_WAKEUP_RX);
+ }
+
+ if (err) {
+ xsk_pool_dma_unmap(pool, IGB_RX_DMA_ATTR);
+ return err;
+ }
+ }
+
+ return 0;
+}
+
+static int igb_xsk_pool_disable(struct igb_adapter *adapter, u16 qid)
+{
+ struct xsk_buff_pool *pool;
+ struct igb_ring *rx_ring;
+ bool if_running;
+ int err;
+
+ pool = xsk_get_pool_from_qid(adapter->netdev, qid);
+ if (!pool)
+ return -EINVAL;
+
+ rx_ring = adapter->rx_ring[qid];
+ if_running = netif_running(adapter->netdev) && igb_xdp_is_enabled(adapter);
+ if (if_running)
+ igb_txrx_ring_disable(adapter, qid);
+
+ xsk_pool_dma_unmap(pool, IGB_RX_DMA_ATTR);
+
+ if (if_running) {
+ err = igb_realloc_rx_buffer_info(rx_ring, false);
+ if (err)
+ return err;
+
+ igb_txrx_ring_enable(adapter, qid);
+ }
+
+ return 0;
+}
+
+int igb_xsk_pool_setup(struct igb_adapter *adapter,
+ struct xsk_buff_pool *pool,
+ u16 qid)
+{
+ return pool ? igb_xsk_pool_enable(adapter, pool, qid) :
+ igb_xsk_pool_disable(adapter, qid);
+}
+
+int igb_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
+{
+ struct igb_adapter *adapter = netdev_priv(dev);
+ struct e1000_hw *hw = &adapter->hw;
+ struct igb_ring *ring;
+ u32 eics = 0;
+
+ if (test_bit(__IGB_DOWN, &adapter->state))
+ return -ENETDOWN;
+
+ if (!igb_xdp_is_enabled(adapter))
+ return -EINVAL;
+
+ if (qid >= adapter->num_tx_queues)
+ return -EINVAL;
+
+ ring = adapter->tx_ring[qid];
+
+ if (test_bit(IGB_RING_FLAG_TX_DISABLED, &ring->flags))
+ return -ENETDOWN;
+
+ if (!READ_ONCE(ring->xsk_pool))
+ return -EINVAL;
+
+ if (!napi_if_scheduled_mark_missed(&ring->q_vector->napi)) {
+ /* Cause software interrupt */
+ if (adapter->flags & IGB_FLAG_HAS_MSIX) {
+ eics |= ring->q_vector->eims_value;
+ wr32(E1000_EICS, eics);
+ } else {
+ wr32(E1000_ICS, E1000_ICS_RXDMT0);
+ }
+ }
+
+ return 0;
+}
--
2.39.5
* [PATCH iwl-next v8 4/6] igb: Add XDP finalize and stats update functions
2024-10-11 9:00 [PATCH iwl-next v8 0/6] igb: Add support for AF_XDP zero-copy Kurt Kanzenbach
` (2 preceding siblings ...)
2024-10-11 9:01 ` [PATCH iwl-next v8 3/6] igb: Introduce XSK data structures and helpers Kurt Kanzenbach
@ 2024-10-11 9:01 ` Kurt Kanzenbach
2024-10-15 12:05 ` Maciej Fijalkowski
2024-10-11 9:01 ` [PATCH iwl-next v8 5/6] igb: Add AF_XDP zero-copy Rx support Kurt Kanzenbach
2024-10-11 9:01 ` [PATCH iwl-next v8 6/6] igb: Add AF_XDP zero-copy Tx support Kurt Kanzenbach
5 siblings, 1 reply; 11+ messages in thread
From: Kurt Kanzenbach @ 2024-10-11 9:01 UTC (permalink / raw)
To: Tony Nguyen, Przemek Kitszel
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Richard Cochran, Sriram Yagnaraman,
Benjamin Steinke, Sebastian Andrzej Siewior, Maciej Fijalkowski,
intel-wired-lan, netdev, bpf, Kurt Kanzenbach
Move XDP finalize and Rx statistics update into separate functions. This
way, they can be reused by the XDP and XDP/ZC code later.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
---
drivers/net/ethernet/intel/igb/igb.h | 3 ++
drivers/net/ethernet/intel/igb/igb_main.c | 54 ++++++++++++++++++++-----------
2 files changed, 38 insertions(+), 19 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index c30d6f9708f8..1e65b41a48d8 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -740,6 +740,9 @@ void igb_clean_tx_ring(struct igb_ring *tx_ring);
void igb_clean_rx_ring(struct igb_ring *rx_ring);
void igb_configure_tx_ring(struct igb_adapter *, struct igb_ring *);
void igb_configure_rx_ring(struct igb_adapter *, struct igb_ring *);
+void igb_finalize_xdp(struct igb_adapter *adapter, unsigned int status);
+void igb_update_rx_stats(struct igb_q_vector *q_vector, unsigned int packets,
+ unsigned int bytes);
void igb_setup_tctl(struct igb_adapter *);
void igb_setup_rctl(struct igb_adapter *);
void igb_setup_srrctl(struct igb_adapter *, struct igb_ring *);
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 341b83e39019..4d3aed6cd848 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -8852,6 +8852,38 @@ static void igb_put_rx_buffer(struct igb_ring *rx_ring,
rx_buffer->page = NULL;
}
+void igb_finalize_xdp(struct igb_adapter *adapter, unsigned int status)
+{
+ int cpu = smp_processor_id();
+ struct netdev_queue *nq;
+
+ if (status & IGB_XDP_REDIR)
+ xdp_do_flush();
+
+ if (status & IGB_XDP_TX) {
+ struct igb_ring *tx_ring = igb_xdp_tx_queue_mapping(adapter);
+
+ nq = txring_txq(tx_ring);
+ __netif_tx_lock(nq, cpu);
+ igb_xdp_ring_update_tail(tx_ring);
+ __netif_tx_unlock(nq);
+ }
+}
+
+void igb_update_rx_stats(struct igb_q_vector *q_vector, unsigned int packets,
+ unsigned int bytes)
+{
+ struct igb_ring *ring = q_vector->rx.ring;
+
+ u64_stats_update_begin(&ring->rx_syncp);
+ ring->rx_stats.packets += packets;
+ ring->rx_stats.bytes += bytes;
+ u64_stats_update_end(&ring->rx_syncp);
+
+ q_vector->rx.total_packets += packets;
+ q_vector->rx.total_bytes += bytes;
+}
+
static int igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
{
unsigned int total_bytes = 0, total_packets = 0;
@@ -8859,9 +8891,7 @@ static int igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
struct igb_ring *rx_ring = q_vector->rx.ring;
u16 cleaned_count = igb_desc_unused(rx_ring);
struct sk_buff *skb = rx_ring->skb;
- int cpu = smp_processor_id();
unsigned int xdp_xmit = 0;
- struct netdev_queue *nq;
struct xdp_buff xdp;
u32 frame_sz = 0;
int rx_buf_pgcnt;
@@ -8983,24 +9013,10 @@ static int igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
/* place incomplete frames back on ring for completion */
rx_ring->skb = skb;
- if (xdp_xmit & IGB_XDP_REDIR)
- xdp_do_flush();
-
- if (xdp_xmit & IGB_XDP_TX) {
- struct igb_ring *tx_ring = igb_xdp_tx_queue_mapping(adapter);
-
- nq = txring_txq(tx_ring);
- __netif_tx_lock(nq, cpu);
- igb_xdp_ring_update_tail(tx_ring);
- __netif_tx_unlock(nq);
- }
+ if (xdp_xmit)
+ igb_finalize_xdp(adapter, xdp_xmit);
- u64_stats_update_begin(&rx_ring->rx_syncp);
- rx_ring->rx_stats.packets += total_packets;
- rx_ring->rx_stats.bytes += total_bytes;
- u64_stats_update_end(&rx_ring->rx_syncp);
- q_vector->rx.total_packets += total_packets;
- q_vector->rx.total_bytes += total_bytes;
+ igb_update_rx_stats(q_vector, total_packets, total_bytes);
if (cleaned_count)
igb_alloc_rx_buffers(rx_ring, cleaned_count);
--
2.39.5
* [PATCH iwl-next v8 5/6] igb: Add AF_XDP zero-copy Rx support
2024-10-11 9:00 [PATCH iwl-next v8 0/6] igb: Add support for AF_XDP zero-copy Kurt Kanzenbach
` (3 preceding siblings ...)
2024-10-11 9:01 ` [PATCH iwl-next v8 4/6] igb: Add XDP finalize and stats update functions Kurt Kanzenbach
@ 2024-10-11 9:01 ` Kurt Kanzenbach
2024-10-15 12:15 ` Maciej Fijalkowski
2024-10-11 9:01 ` [PATCH iwl-next v8 6/6] igb: Add AF_XDP zero-copy Tx support Kurt Kanzenbach
5 siblings, 1 reply; 11+ messages in thread
From: Kurt Kanzenbach @ 2024-10-11 9:01 UTC (permalink / raw)
To: Tony Nguyen, Przemek Kitszel
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Richard Cochran, Sriram Yagnaraman,
Benjamin Steinke, Sebastian Andrzej Siewior, Maciej Fijalkowski,
intel-wired-lan, netdev, bpf, Sriram Yagnaraman, Kurt Kanzenbach
From: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Add support for AF_XDP zero-copy receive path.
When AF_XDP zero-copy is enabled, the rx buffers are allocated from the
xsk buff pool using igb_alloc_rx_buffers_zc().
Use xsk_pool_get_rx_frame_size() to set SRRCTL rx buf size when zero-copy
is enabled.
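For context, a userspace receive loop over such a zero-copy socket looks
roughly as follows. This is only a sketch based on libxdp's xsk ring helpers;
rx, fill and umem_area are assumed to come from the usual socket setup and
the batch size is arbitrary:

#include <xdp/xsk.h>

/* Drain up to 'batch' completed Rx descriptors and recycle the buffers
 * through the fill ring; packets are read in place, no copy is made.
 */
static unsigned int rx_drain(struct xsk_ring_cons *rx,
			     struct xsk_ring_prod *fill,
			     void *umem_area, unsigned int batch)
{
	__u32 idx_rx = 0, idx_fq = 0;
	unsigned int rcvd, i;

	rcvd = xsk_ring_cons__peek(rx, batch, &idx_rx);
	if (!rcvd)
		return 0;

	/* reserve the same number of fill ring slots for recycling */
	if (xsk_ring_prod__reserve(fill, rcvd, &idx_fq) != rcvd) {
		xsk_ring_cons__cancel(rx, rcvd);
		return 0;
	}

	for (i = 0; i < rcvd; i++) {
		const struct xdp_desc *desc = xsk_ring_cons__rx_desc(rx, idx_rx++);
		void *pkt = xsk_umem__get_data(umem_area, desc->addr);

		/* ... inspect pkt / desc->len here ... */
		(void)pkt;

		/* hand the buffer back to the kernel */
		*xsk_ring_prod__fill_addr(fill, idx_fq++) = desc->addr;
	}

	xsk_ring_prod__submit(fill, rcvd);
	xsk_ring_cons__release(rx, rcvd);

	return rcvd;
}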
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
[Kurt: Port to v6.12 and provide napi_id for xdp_rxq_info_reg(),
RCT, remove NETDEV_XDP_ACT_XSK_ZEROCOPY, update NTC handling,
READ_ONCE() xsk_pool, likelyfy for XDP_REDIRECT case]
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
---
drivers/net/ethernet/intel/igb/igb.h | 6 +
drivers/net/ethernet/intel/igb/igb_main.c | 79 ++++++--
drivers/net/ethernet/intel/igb/igb_xsk.c | 298 +++++++++++++++++++++++++++++-
3 files changed, 364 insertions(+), 19 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 1e65b41a48d8..e4a85867aa18 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -88,6 +88,7 @@ struct igb_adapter;
#define IGB_XDP_CONSUMED BIT(0)
#define IGB_XDP_TX BIT(1)
#define IGB_XDP_REDIR BIT(2)
+#define IGB_XDP_EXIT BIT(3)
struct vf_data_storage {
unsigned char vf_mac_addresses[ETH_ALEN];
@@ -853,6 +854,11 @@ struct xsk_buff_pool *igb_xsk_pool(struct igb_adapter *adapter,
int igb_xsk_pool_setup(struct igb_adapter *adapter,
struct xsk_buff_pool *pool,
u16 qid);
+bool igb_alloc_rx_buffers_zc(struct igb_ring *rx_ring,
+ struct xsk_buff_pool *xsk_pool, u16 count);
+void igb_clean_rx_ring_zc(struct igb_ring *rx_ring);
+int igb_clean_rx_irq_zc(struct igb_q_vector *q_vector,
+ struct xsk_buff_pool *xsk_pool, const int budget);
int igb_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags);
#endif /* _IGB_H_ */
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 4d3aed6cd848..711b60cab594 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -472,12 +472,17 @@ static void igb_dump(struct igb_adapter *adapter)
for (i = 0; i < rx_ring->count; i++) {
const char *next_desc;
- struct igb_rx_buffer *buffer_info;
- buffer_info = &rx_ring->rx_buffer_info[i];
+ dma_addr_t dma = (dma_addr_t)0;
+ struct igb_rx_buffer *buffer_info = NULL;
rx_desc = IGB_RX_DESC(rx_ring, i);
u0 = (struct my_u0 *)rx_desc;
staterr = le32_to_cpu(rx_desc->wb.upper.status_error);
+ if (!rx_ring->xsk_pool) {
+ buffer_info = &rx_ring->rx_buffer_info[i];
+ dma = buffer_info->dma;
+ }
+
if (i == rx_ring->next_to_use)
next_desc = " NTU";
else if (i == rx_ring->next_to_clean)
@@ -497,11 +502,11 @@ static void igb_dump(struct igb_adapter *adapter)
"R ", i,
le64_to_cpu(u0->a),
le64_to_cpu(u0->b),
- (u64)buffer_info->dma,
+ (u64)dma,
next_desc);
if (netif_msg_pktdata(adapter) &&
- buffer_info->dma && buffer_info->page) {
+ buffer_info && dma && buffer_info->page) {
print_hex_dump(KERN_INFO, "",
DUMP_PREFIX_ADDRESS,
16, 1,
@@ -1983,7 +1988,11 @@ static void igb_configure(struct igb_adapter *adapter)
*/
for (i = 0; i < adapter->num_rx_queues; i++) {
struct igb_ring *ring = adapter->rx_ring[i];
- igb_alloc_rx_buffers(ring, igb_desc_unused(ring));
+ if (ring->xsk_pool)
+ igb_alloc_rx_buffers_zc(ring, ring->xsk_pool,
+ igb_desc_unused(ring));
+ else
+ igb_alloc_rx_buffers(ring, igb_desc_unused(ring));
}
}
@@ -4405,7 +4414,8 @@ int igb_setup_rx_resources(struct igb_ring *rx_ring)
if (xdp_rxq_info_is_reg(&rx_ring->xdp_rxq))
xdp_rxq_info_unreg(&rx_ring->xdp_rxq);
res = xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev,
- rx_ring->queue_index, 0);
+ rx_ring->queue_index,
+ rx_ring->q_vector->napi.napi_id);
if (res < 0) {
dev_err(dev, "Failed to register xdp_rxq index %u\n",
rx_ring->queue_index);
@@ -4701,12 +4711,17 @@ void igb_setup_srrctl(struct igb_adapter *adapter, struct igb_ring *ring)
struct e1000_hw *hw = &adapter->hw;
int reg_idx = ring->reg_idx;
u32 srrctl = 0;
+ u32 buf_size;
- srrctl = IGB_RX_HDR_LEN << E1000_SRRCTL_BSIZEHDRSIZE_SHIFT;
- if (ring_uses_large_buffer(ring))
- srrctl |= IGB_RXBUFFER_3072 >> E1000_SRRCTL_BSIZEPKT_SHIFT;
+ if (ring->xsk_pool)
+ buf_size = xsk_pool_get_rx_frame_size(ring->xsk_pool);
+ else if (ring_uses_large_buffer(ring))
+ buf_size = IGB_RXBUFFER_3072;
else
- srrctl |= IGB_RXBUFFER_2048 >> E1000_SRRCTL_BSIZEPKT_SHIFT;
+ buf_size = IGB_RXBUFFER_2048;
+
+ srrctl = IGB_RX_HDR_LEN << E1000_SRRCTL_BSIZEHDRSIZE_SHIFT;
+ srrctl |= buf_size >> E1000_SRRCTL_BSIZEPKT_SHIFT;
srrctl |= E1000_SRRCTL_DESCTYPE_ADV_ONEBUF;
if (hw->mac.type >= e1000_82580)
srrctl |= E1000_SRRCTL_TIMESTAMP;
@@ -4738,9 +4753,17 @@ void igb_configure_rx_ring(struct igb_adapter *adapter,
u32 rxdctl = 0;
xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq);
- WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
- MEM_TYPE_PAGE_SHARED, NULL));
WRITE_ONCE(ring->xsk_pool, igb_xsk_pool(adapter, ring));
+ if (ring->xsk_pool) {
+ WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
+ MEM_TYPE_XSK_BUFF_POOL,
+ NULL));
+ xsk_pool_set_rxq_info(ring->xsk_pool, &ring->xdp_rxq);
+ } else {
+ WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
+ MEM_TYPE_PAGE_SHARED,
+ NULL));
+ }
/* disable the queue */
wr32(E1000_RXDCTL(reg_idx), 0);
@@ -4767,9 +4790,12 @@ void igb_configure_rx_ring(struct igb_adapter *adapter,
rxdctl |= IGB_RX_HTHRESH << 8;
rxdctl |= IGB_RX_WTHRESH << 16;
- /* initialize rx_buffer_info */
- memset(ring->rx_buffer_info, 0,
- sizeof(struct igb_rx_buffer) * ring->count);
+ if (ring->xsk_pool)
+ memset(ring->rx_buffer_info_zc, 0,
+ sizeof(*ring->rx_buffer_info_zc) * ring->count);
+ else
+ memset(ring->rx_buffer_info, 0,
+ sizeof(*ring->rx_buffer_info) * ring->count);
/* initialize Rx descriptor 0 */
rx_desc = IGB_RX_DESC(ring, 0);
@@ -4957,8 +4983,13 @@ void igb_free_rx_resources(struct igb_ring *rx_ring)
rx_ring->xdp_prog = NULL;
xdp_rxq_info_unreg(&rx_ring->xdp_rxq);
- vfree(rx_ring->rx_buffer_info);
- rx_ring->rx_buffer_info = NULL;
+ if (rx_ring->xsk_pool) {
+ vfree(rx_ring->rx_buffer_info_zc);
+ rx_ring->rx_buffer_info_zc = NULL;
+ } else {
+ vfree(rx_ring->rx_buffer_info);
+ rx_ring->rx_buffer_info = NULL;
+ }
/* if not set, then don't free */
if (!rx_ring->desc)
@@ -4996,6 +5027,11 @@ void igb_clean_rx_ring(struct igb_ring *rx_ring)
dev_kfree_skb(rx_ring->skb);
rx_ring->skb = NULL;
+ if (rx_ring->xsk_pool) {
+ igb_clean_rx_ring_zc(rx_ring);
+ goto skip_for_xsk;
+ }
+
/* Free all the Rx ring sk_buffs */
while (i != rx_ring->next_to_alloc) {
struct igb_rx_buffer *buffer_info = &rx_ring->rx_buffer_info[i];
@@ -5023,6 +5059,7 @@ void igb_clean_rx_ring(struct igb_ring *rx_ring)
i = 0;
}
+skip_for_xsk:
rx_ring->next_to_alloc = 0;
rx_ring->next_to_clean = 0;
rx_ring->next_to_use = 0;
@@ -8177,6 +8214,7 @@ static int igb_poll(struct napi_struct *napi, int budget)
struct igb_q_vector *q_vector = container_of(napi,
struct igb_q_vector,
napi);
+ struct xsk_buff_pool *xsk_pool;
bool clean_complete = true;
int work_done = 0;
@@ -8188,7 +8226,12 @@ static int igb_poll(struct napi_struct *napi, int budget)
clean_complete = igb_clean_tx_irq(q_vector, budget);
if (q_vector->rx.ring) {
- int cleaned = igb_clean_rx_irq(q_vector, budget);
+ int cleaned;
+
+ xsk_pool = READ_ONCE(q_vector->rx.ring->xsk_pool);
+ cleaned = xsk_pool ?
+ igb_clean_rx_irq_zc(q_vector, xsk_pool, budget) :
+ igb_clean_rx_irq(q_vector, budget);
work_done += cleaned;
if (cleaned >= budget)
diff --git a/drivers/net/ethernet/intel/igb/igb_xsk.c b/drivers/net/ethernet/intel/igb/igb_xsk.c
index 7b632be3e7e3..22d234db0fab 100644
--- a/drivers/net/ethernet/intel/igb/igb_xsk.c
+++ b/drivers/net/ethernet/intel/igb/igb_xsk.c
@@ -70,7 +70,11 @@ static void igb_txrx_ring_enable(struct igb_adapter *adapter, u16 qid)
* at least 1 descriptor unused to make sure
* next_to_use != next_to_clean
*/
- igb_alloc_rx_buffers(rx_ring, igb_desc_unused(rx_ring));
+ if (rx_ring->xsk_pool)
+ igb_alloc_rx_buffers_zc(rx_ring, rx_ring->xsk_pool,
+ igb_desc_unused(rx_ring));
+ else
+ igb_alloc_rx_buffers(rx_ring, igb_desc_unused(rx_ring));
/* Rx/Tx share the same napi context. */
napi_enable(&rx_ring->q_vector->napi);
@@ -169,6 +173,298 @@ int igb_xsk_pool_setup(struct igb_adapter *adapter,
igb_xsk_pool_disable(adapter, qid);
}
+static u16 igb_fill_rx_descs(struct xsk_buff_pool *pool, struct xdp_buff **xdp,
+ union e1000_adv_rx_desc *rx_desc, u16 count)
+{
+ dma_addr_t dma;
+ u16 buffs;
+ int i;
+
+ /* nothing to do */
+ if (!count)
+ return 0;
+
+ buffs = xsk_buff_alloc_batch(pool, xdp, count);
+ for (i = 0; i < buffs; i++) {
+ dma = xsk_buff_xdp_get_dma(*xdp);
+ rx_desc->read.pkt_addr = cpu_to_le64(dma);
+ rx_desc->wb.upper.length = 0;
+
+ rx_desc++;
+ xdp++;
+ }
+
+ return buffs;
+}
+
+bool igb_alloc_rx_buffers_zc(struct igb_ring *rx_ring,
+ struct xsk_buff_pool *xsk_pool, u16 count)
+{
+ u32 nb_buffs_extra = 0, nb_buffs = 0;
+ union e1000_adv_rx_desc *rx_desc;
+ u16 ntu = rx_ring->next_to_use;
+ u16 total_count = count;
+ struct xdp_buff **xdp;
+
+ rx_desc = IGB_RX_DESC(rx_ring, ntu);
+ xdp = &rx_ring->rx_buffer_info_zc[ntu];
+
+ if (ntu + count >= rx_ring->count) {
+ nb_buffs_extra = igb_fill_rx_descs(xsk_pool, xdp, rx_desc,
+ rx_ring->count - ntu);
+ if (nb_buffs_extra != rx_ring->count - ntu) {
+ ntu += nb_buffs_extra;
+ goto exit;
+ }
+ rx_desc = IGB_RX_DESC(rx_ring, 0);
+ xdp = rx_ring->rx_buffer_info_zc;
+ ntu = 0;
+ count -= nb_buffs_extra;
+ }
+
+ nb_buffs = igb_fill_rx_descs(xsk_pool, xdp, rx_desc, count);
+ ntu += nb_buffs;
+ if (ntu == rx_ring->count)
+ ntu = 0;
+
+ /* clear the length for the next_to_use descriptor */
+ rx_desc = IGB_RX_DESC(rx_ring, ntu);
+ rx_desc->wb.upper.length = 0;
+
+exit:
+ if (rx_ring->next_to_use != ntu) {
+ rx_ring->next_to_use = ntu;
+
+ /* Force memory writes to complete before letting h/w
+ * know there are new descriptors to fetch. (Only
+ * applicable for weak-ordered memory model archs,
+ * such as IA-64).
+ */
+ wmb();
+ writel(ntu, rx_ring->tail);
+ }
+
+ return total_count == (nb_buffs + nb_buffs_extra);
+}
+
+void igb_clean_rx_ring_zc(struct igb_ring *rx_ring)
+{
+ u16 ntc = rx_ring->next_to_clean;
+ u16 ntu = rx_ring->next_to_use;
+
+ while (ntc != ntu) {
+ struct xdp_buff *xdp = rx_ring->rx_buffer_info_zc[ntc];
+
+ xsk_buff_free(xdp);
+ ntc++;
+ if (ntc >= rx_ring->count)
+ ntc = 0;
+ }
+}
+
+static struct sk_buff *igb_construct_skb_zc(struct igb_ring *rx_ring,
+ struct xdp_buff *xdp,
+ ktime_t timestamp)
+{
+ unsigned int totalsize = xdp->data_end - xdp->data_meta;
+ unsigned int metasize = xdp->data - xdp->data_meta;
+ struct sk_buff *skb;
+
+ net_prefetch(xdp->data_meta);
+
+ /* allocate a skb to store the frags */
+ skb = napi_alloc_skb(&rx_ring->q_vector->napi, totalsize);
+ if (unlikely(!skb))
+ return NULL;
+
+ if (timestamp)
+ skb_hwtstamps(skb)->hwtstamp = timestamp;
+
+ memcpy(__skb_put(skb, totalsize), xdp->data_meta,
+ ALIGN(totalsize, sizeof(long)));
+
+ if (metasize) {
+ skb_metadata_set(skb, metasize);
+ __skb_pull(skb, metasize);
+ }
+
+ return skb;
+}
+
+static struct sk_buff *igb_run_xdp_zc(struct igb_adapter *adapter,
+ struct igb_ring *rx_ring,
+ struct xdp_buff *xdp,
+ struct xsk_buff_pool *xsk_pool,
+ struct bpf_prog *xdp_prog)
+{
+ int err, result = IGB_XDP_PASS;
+ u32 act;
+
+ prefetchw(xdp->data_hard_start); /* xdp_frame write */
+
+ act = bpf_prog_run_xdp(xdp_prog, xdp);
+
+ if (likely(act == XDP_REDIRECT)) {
+ err = xdp_do_redirect(adapter->netdev, xdp, xdp_prog);
+ if (!err) {
+ result = IGB_XDP_REDIR;
+ goto xdp_out;
+ }
+
+ if (xsk_uses_need_wakeup(xsk_pool) &&
+ err == -ENOBUFS)
+ result = IGB_XDP_EXIT;
+ else
+ result = IGB_XDP_CONSUMED;
+ goto out_failure;
+ }
+
+ switch (act) {
+ case XDP_PASS:
+ break;
+ case XDP_TX:
+ result = igb_xdp_xmit_back(adapter, xdp);
+ if (result == IGB_XDP_CONSUMED)
+ goto out_failure;
+ break;
+ default:
+ bpf_warn_invalid_xdp_action(adapter->netdev, xdp_prog, act);
+ fallthrough;
+ case XDP_ABORTED:
+out_failure:
+ trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
+ fallthrough;
+ case XDP_DROP:
+ result = IGB_XDP_CONSUMED;
+ break;
+ }
+xdp_out:
+ return ERR_PTR(-result);
+}
+
+int igb_clean_rx_irq_zc(struct igb_q_vector *q_vector,
+ struct xsk_buff_pool *xsk_pool, const int budget)
+{
+ struct igb_adapter *adapter = q_vector->adapter;
+ unsigned int total_bytes = 0, total_packets = 0;
+ struct igb_ring *rx_ring = q_vector->rx.ring;
+ u32 ntc = rx_ring->next_to_clean;
+ struct bpf_prog *xdp_prog;
+ unsigned int xdp_xmit = 0;
+ bool failure = false;
+ u16 entries_to_alloc;
+ struct sk_buff *skb;
+
+ /* xdp_prog cannot be NULL in the ZC path */
+ xdp_prog = READ_ONCE(rx_ring->xdp_prog);
+
+ while (likely(total_packets < budget)) {
+ union e1000_adv_rx_desc *rx_desc;
+ ktime_t timestamp = 0;
+ struct xdp_buff *xdp;
+ unsigned int size;
+
+ rx_desc = IGB_RX_DESC(rx_ring, ntc);
+ size = le16_to_cpu(rx_desc->wb.upper.length);
+ if (!size)
+ break;
+
+ /* This memory barrier is needed to keep us from reading
+ * any other fields out of the rx_desc until we know the
+ * descriptor has been written back
+ */
+ dma_rmb();
+
+ xdp = rx_ring->rx_buffer_info_zc[ntc];
+ xsk_buff_set_size(xdp, size);
+ xsk_buff_dma_sync_for_cpu(xdp);
+
+ /* pull rx packet timestamp if available and valid */
+ if (igb_test_staterr(rx_desc, E1000_RXDADV_STAT_TSIP)) {
+ int ts_hdr_len;
+
+ ts_hdr_len = igb_ptp_rx_pktstamp(rx_ring->q_vector,
+ xdp->data,
+ &timestamp);
+
+ xdp->data += ts_hdr_len;
+ xdp->data_meta += ts_hdr_len;
+ size -= ts_hdr_len;
+ }
+
+ skb = igb_run_xdp_zc(adapter, rx_ring, xdp, xsk_pool, xdp_prog);
+
+ if (IS_ERR(skb)) {
+ unsigned int xdp_res = -PTR_ERR(skb);
+
+ if (likely(xdp_res & (IGB_XDP_TX | IGB_XDP_REDIR))) {
+ xdp_xmit |= xdp_res;
+ } else if (xdp_res == IGB_XDP_EXIT) {
+ failure = true;
+ break;
+ } else if (xdp_res == IGB_XDP_CONSUMED) {
+ xsk_buff_free(xdp);
+ }
+
+ total_packets++;
+ total_bytes += size;
+ ntc++;
+ if (ntc == rx_ring->count)
+ ntc = 0;
+ continue;
+ }
+
+ skb = igb_construct_skb_zc(rx_ring, xdp, timestamp);
+
+ /* exit if we failed to retrieve a buffer */
+ if (!skb) {
+ rx_ring->rx_stats.alloc_failed++;
+ break;
+ }
+
+ xsk_buff_free(xdp);
+ ntc++;
+ if (ntc == rx_ring->count)
+ ntc = 0;
+
+ if (eth_skb_pad(skb))
+ continue;
+
+ /* probably a little skewed due to removing CRC */
+ total_bytes += skb->len;
+
+ /* populate checksum, timestamp, VLAN, and protocol */
+ igb_process_skb_fields(rx_ring, rx_desc, skb);
+
+ napi_gro_receive(&q_vector->napi, skb);
+
+ /* update budget accounting */
+ total_packets++;
+ }
+
+ rx_ring->next_to_clean = ntc;
+
+ if (xdp_xmit)
+ igb_finalize_xdp(adapter, xdp_xmit);
+
+ igb_update_rx_stats(q_vector, total_packets, total_bytes);
+
+ entries_to_alloc = igb_desc_unused(rx_ring);
+ if (entries_to_alloc >= IGB_RX_BUFFER_WRITE)
+ failure |= !igb_alloc_rx_buffers_zc(rx_ring, xsk_pool,
+ entries_to_alloc);
+
+ if (xsk_uses_need_wakeup(xsk_pool)) {
+ if (failure || rx_ring->next_to_clean == rx_ring->next_to_use)
+ xsk_set_rx_need_wakeup(xsk_pool);
+ else
+ xsk_clear_rx_need_wakeup(xsk_pool);
+
+ return (int)total_packets;
+ }
+ return failure ? budget : (int)total_packets;
+}
+
int igb_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
{
struct igb_adapter *adapter = netdev_priv(dev);
--
2.39.5
* [PATCH iwl-next v8 6/6] igb: Add AF_XDP zero-copy Tx support
2024-10-11 9:00 [PATCH iwl-next v8 0/6] igb: Add support for AF_XDP zero-copy Kurt Kanzenbach
` (4 preceding siblings ...)
2024-10-11 9:01 ` [PATCH iwl-next v8 5/6] igb: Add AF_XDP zero-copy Rx support Kurt Kanzenbach
@ 2024-10-11 9:01 ` Kurt Kanzenbach
2024-10-15 13:28 ` Maciej Fijalkowski
5 siblings, 1 reply; 11+ messages in thread
From: Kurt Kanzenbach @ 2024-10-11 9:01 UTC (permalink / raw)
To: Tony Nguyen, Przemek Kitszel
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Richard Cochran, Sriram Yagnaraman,
Benjamin Steinke, Sebastian Andrzej Siewior, Maciej Fijalkowski,
intel-wired-lan, netdev, bpf, Sriram Yagnaraman, Kurt Kanzenbach
From: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Add support for AF_XDP zero-copy transmit path.
A new TX buffer type IGB_TYPE_XSK is introduced to indicate that the Tx
frame was allocated from the xsk buff pool, so igb_clean_tx_ring() and
igb_clean_tx_irq() can clean the buffers correctly based on type.
igb_xmit_zc() performs the actual packet transmit when AF_XDP zero-copy is
enabled. We share the TX ring between slow path, XDP and AF_XDP
zero-copy, so we use the netdev queue lock to ensure mutual exclusion.
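For context, the userspace side of this path is roughly as follows:
descriptors are placed on the AF_XDP TX ring and, with need_wakeup enabled,
a sendto() ends up in ndo_xsk_wakeup()/igb_xsk_wakeup(), which schedules
NAPI and eventually igb_xmit_zc(). This is only a sketch using libxdp's xsk
helpers; xsk, tx, frame_addr and frame_len are assumed from earlier setup:

#include <sys/socket.h>
#include <xdp/xsk.h>

/* Queue one frame on the AF_XDP TX ring and kick the kernel if it asked
 * to be woken up.
 */
static int tx_one(struct xsk_socket *xsk, struct xsk_ring_prod *tx,
		  __u64 frame_addr, __u32 frame_len)
{
	struct xdp_desc *desc;
	__u32 idx;

	if (xsk_ring_prod__reserve(tx, 1, &idx) != 1)
		return -1;			/* TX ring full */

	desc = xsk_ring_prod__tx_desc(tx, idx);
	desc->addr = frame_addr;		/* offset of the frame in the UMEM */
	desc->len = frame_len;
	xsk_ring_prod__submit(tx, 1);

	/* only kick the kernel when it asked for it */
	if (xsk_ring_prod__needs_wakeup(tx))
		sendto(xsk_socket__fd(xsk), NULL, 0, MSG_DONTWAIT, NULL, 0);

	return 0;
}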
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
[Kurt: Set olinfo_status in igb_xmit_zc() so that frames are transmitted,
Use READ_ONCE() for xsk_pool and check Tx disabled and carrier in
igb_xmit_zc(), Add FIXME for RS bit]
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
drivers/net/ethernet/intel/igb/igb.h | 2 +
drivers/net/ethernet/intel/igb/igb_main.c | 61 ++++++++++++++++++++++++-----
drivers/net/ethernet/intel/igb/igb_xsk.c | 64 +++++++++++++++++++++++++++++++
3 files changed, 117 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index e4a85867aa18..f6ac74327bb3 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -258,6 +258,7 @@ enum igb_tx_flags {
enum igb_tx_buf_type {
IGB_TYPE_SKB = 0,
IGB_TYPE_XDP,
+ IGB_TYPE_XSK
};
/* wrapper around a pointer to a socket buffer,
@@ -859,6 +860,7 @@ bool igb_alloc_rx_buffers_zc(struct igb_ring *rx_ring,
void igb_clean_rx_ring_zc(struct igb_ring *rx_ring);
int igb_clean_rx_irq_zc(struct igb_q_vector *q_vector,
struct xsk_buff_pool *xsk_pool, const int budget);
+bool igb_xmit_zc(struct igb_ring *tx_ring);
int igb_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags);
#endif /* _IGB_H_ */
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 711b60cab594..5f396c02e3b9 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2979,6 +2979,9 @@ static int igb_xdp_xmit(struct net_device *dev, int n,
if (unlikely(!tx_ring))
return -ENXIO;
+ if (unlikely(test_bit(IGB_RING_FLAG_TX_DISABLED, &tx_ring->flags)))
+ return -ENXIO;
+
nq = txring_txq(tx_ring);
__netif_tx_lock(nq, cpu);
@@ -3326,7 +3329,8 @@ static int igb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
netdev->priv_flags |= IFF_SUPP_NOFCS;
netdev->priv_flags |= IFF_UNICAST_FLT;
- netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT;
+ netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
+ NETDEV_XDP_ACT_XSK_ZEROCOPY;
/* MTU range: 68 - 9216 */
netdev->min_mtu = ETH_MIN_MTU;
@@ -4900,15 +4904,20 @@ void igb_clean_tx_ring(struct igb_ring *tx_ring)
{
u16 i = tx_ring->next_to_clean;
struct igb_tx_buffer *tx_buffer = &tx_ring->tx_buffer_info[i];
+ u32 xsk_frames = 0;
while (i != tx_ring->next_to_use) {
union e1000_adv_tx_desc *eop_desc, *tx_desc;
/* Free all the Tx ring sk_buffs or xdp frames */
- if (tx_buffer->type == IGB_TYPE_SKB)
+ if (tx_buffer->type == IGB_TYPE_SKB) {
dev_kfree_skb_any(tx_buffer->skb);
- else
+ } else if (tx_buffer->type == IGB_TYPE_XDP) {
xdp_return_frame(tx_buffer->xdpf);
+ } else if (tx_buffer->type == IGB_TYPE_XSK) {
+ xsk_frames++;
+ goto skip_for_xsk;
+ }
/* unmap skb header data */
dma_unmap_single(tx_ring->dev,
@@ -4939,6 +4948,7 @@ void igb_clean_tx_ring(struct igb_ring *tx_ring)
DMA_TO_DEVICE);
}
+skip_for_xsk:
tx_buffer->next_to_watch = NULL;
/* move us one more past the eop_desc for start of next pkt */
@@ -4953,6 +4963,9 @@ void igb_clean_tx_ring(struct igb_ring *tx_ring)
/* reset BQL for queue */
netdev_tx_reset_queue(txring_txq(tx_ring));
+ if (tx_ring->xsk_pool && xsk_frames)
+ xsk_tx_completed(tx_ring->xsk_pool, xsk_frames);
+
/* reset next_to_use and next_to_clean */
tx_ring->next_to_use = 0;
tx_ring->next_to_clean = 0;
@@ -6486,6 +6499,9 @@ netdev_tx_t igb_xmit_frame_ring(struct sk_buff *skb,
return NETDEV_TX_BUSY;
}
+ if (unlikely(test_bit(IGB_RING_FLAG_TX_DISABLED, &tx_ring->flags)))
+ return NETDEV_TX_BUSY;
+
/* record the location of the first descriptor for this packet */
first = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
first->type = IGB_TYPE_SKB;
@@ -8260,13 +8276,18 @@ static int igb_poll(struct napi_struct *napi, int budget)
**/
static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
{
- struct igb_adapter *adapter = q_vector->adapter;
- struct igb_ring *tx_ring = q_vector->tx.ring;
- struct igb_tx_buffer *tx_buffer;
- union e1000_adv_tx_desc *tx_desc;
unsigned int total_bytes = 0, total_packets = 0;
+ struct igb_adapter *adapter = q_vector->adapter;
unsigned int budget = q_vector->tx.work_limit;
+ struct igb_ring *tx_ring = q_vector->tx.ring;
unsigned int i = tx_ring->next_to_clean;
+ union e1000_adv_tx_desc *tx_desc;
+ struct igb_tx_buffer *tx_buffer;
+ struct xsk_buff_pool *xsk_pool;
+ int cpu = smp_processor_id();
+ bool xsk_xmit_done = true;
+ struct netdev_queue *nq;
+ u32 xsk_frames = 0;
if (test_bit(__IGB_DOWN, &adapter->state))
return true;
@@ -8297,10 +8318,14 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
total_packets += tx_buffer->gso_segs;
/* free the skb */
- if (tx_buffer->type == IGB_TYPE_SKB)
+ if (tx_buffer->type == IGB_TYPE_SKB) {
napi_consume_skb(tx_buffer->skb, napi_budget);
- else
+ } else if (tx_buffer->type == IGB_TYPE_XDP) {
xdp_return_frame(tx_buffer->xdpf);
+ } else if (tx_buffer->type == IGB_TYPE_XSK) {
+ xsk_frames++;
+ goto skip_for_xsk;
+ }
/* unmap skb header data */
dma_unmap_single(tx_ring->dev,
@@ -8332,6 +8357,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
}
}
+skip_for_xsk:
/* move us one more past the eop_desc for start of next pkt */
tx_buffer++;
tx_desc++;
@@ -8360,6 +8386,21 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
q_vector->tx.total_bytes += total_bytes;
q_vector->tx.total_packets += total_packets;
+ xsk_pool = READ_ONCE(tx_ring->xsk_pool);
+ if (xsk_pool) {
+ if (xsk_frames)
+ xsk_tx_completed(xsk_pool, xsk_frames);
+ if (xsk_uses_need_wakeup(xsk_pool))
+ xsk_set_tx_need_wakeup(xsk_pool);
+
+ nq = txring_txq(tx_ring);
+ __netif_tx_lock(nq, cpu);
+ /* Avoid transmit queue timeout since we share it with the slow path */
+ txq_trans_cond_update(nq);
+ xsk_xmit_done = igb_xmit_zc(tx_ring);
+ __netif_tx_unlock(nq);
+ }
+
if (test_bit(IGB_RING_FLAG_TX_DETECT_HANG, &tx_ring->flags)) {
struct e1000_hw *hw = &adapter->hw;
@@ -8422,7 +8463,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
}
}
- return !!budget;
+ return !!budget && xsk_xmit_done;
}
/**
diff --git a/drivers/net/ethernet/intel/igb/igb_xsk.c b/drivers/net/ethernet/intel/igb/igb_xsk.c
index 22d234db0fab..d962c5e22b71 100644
--- a/drivers/net/ethernet/intel/igb/igb_xsk.c
+++ b/drivers/net/ethernet/intel/igb/igb_xsk.c
@@ -465,6 +465,70 @@ int igb_clean_rx_irq_zc(struct igb_q_vector *q_vector,
return failure ? budget : (int)total_packets;
}
+bool igb_xmit_zc(struct igb_ring *tx_ring)
+{
+ unsigned int budget = igb_desc_unused(tx_ring);
+ struct xsk_buff_pool *pool = tx_ring->xsk_pool;
+ u32 cmd_type, olinfo_status, nb_pkts, i = 0;
+ struct xdp_desc *descs = pool->tx_descs;
+ union e1000_adv_tx_desc *tx_desc = NULL;
+ struct igb_tx_buffer *tx_buffer_info;
+ unsigned int total_bytes = 0;
+ dma_addr_t dma;
+
+ if (!netif_carrier_ok(tx_ring->netdev))
+ return true;
+
+ if (test_bit(IGB_RING_FLAG_TX_DISABLED, &tx_ring->flags))
+ return true;
+
+ nb_pkts = xsk_tx_peek_release_desc_batch(pool, budget);
+ if (!nb_pkts)
+ return true;
+
+ while (nb_pkts-- > 0) {
+ dma = xsk_buff_raw_get_dma(pool, descs[i].addr);
+ xsk_buff_raw_dma_sync_for_device(pool, dma, descs[i].len);
+
+ tx_buffer_info = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
+ tx_buffer_info->bytecount = descs[i].len;
+ tx_buffer_info->type = IGB_TYPE_XSK;
+ tx_buffer_info->xdpf = NULL;
+ tx_buffer_info->gso_segs = 1;
+ tx_buffer_info->time_stamp = jiffies;
+
+ tx_desc = IGB_TX_DESC(tx_ring, tx_ring->next_to_use);
+ tx_desc->read.buffer_addr = cpu_to_le64(dma);
+
+ /* put descriptor type bits */
+ cmd_type = E1000_ADVTXD_DTYP_DATA | E1000_ADVTXD_DCMD_DEXT |
+ E1000_ADVTXD_DCMD_IFCS;
+ olinfo_status = descs[i].len << E1000_ADVTXD_PAYLEN_SHIFT;
+
+ /* FIXME: This sets the Report Status (RS) bit for every
+ * descriptor. One nice to have optimization would be to set it
+ * only for the last descriptor in the whole batch. See Intel
+ * ice driver for an example on how to do it.
+ */
+ cmd_type |= descs[i].len | IGB_TXD_DCMD;
+ tx_desc->read.cmd_type_len = cpu_to_le32(cmd_type);
+ tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status);
+
+ total_bytes += descs[i].len;
+
+ i++;
+ tx_ring->next_to_use++;
+ tx_buffer_info->next_to_watch = tx_desc;
+ if (tx_ring->next_to_use == tx_ring->count)
+ tx_ring->next_to_use = 0;
+ }
+
+ netdev_tx_sent_queue(txring_txq(tx_ring), total_bytes);
+ igb_xdp_ring_update_tail(tx_ring);
+
+ return nb_pkts < budget;
+}
+
int igb_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
{
struct igb_adapter *adapter = netdev_priv(dev);
--
2.39.5
^ permalink raw reply related [flat|nested] 11+ messages in thread
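For context, the FIXME in igb_xmit_zc() above is about requesting a descriptor
write-back once per batch instead of once per frame, as the ice driver does. A
rough sketch of the idea follows, assuming IGB_TXD_DCMD expands to
E1000_ADVTXD_DCMD_EOP | E1000_ADVTXD_DCMD_RS as in igb_main.c; note that the Tx
clean path would also have to cope with the DD bit being set only on the last
descriptor of a batch, so this is not a drop-in change:

	/* Inside the descriptor loop of igb_xmit_zc(); the rest of the loop
	 * body stays as in the patch. EOP is set for every frame, but a
	 * Report Status (RS) write-back is requested only for the last
	 * descriptor of the batch (nb_pkts is decremented by the loop
	 * condition, so 0 means "last iteration").
	 */
	cmd_type = E1000_ADVTXD_DTYP_DATA | E1000_ADVTXD_DCMD_DEXT |
		   E1000_ADVTXD_DCMD_IFCS | E1000_ADVTXD_DCMD_EOP |
		   descs[i].len;
	if (!nb_pkts)
		cmd_type |= E1000_ADVTXD_DCMD_RS;
	tx_desc->read.cmd_type_len = cpu_to_le32(cmd_type);

Keeping RS on every descriptor, as the patch does for now, trades some
write-back overhead for a much simpler cleanup path in igb_clean_tx_irq().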
* Re: [PATCH iwl-next v8 4/6] igb: Add XDP finalize and stats update functions
2024-10-11 9:01 ` [PATCH iwl-next v8 4/6] igb: Add XDP finalize and stats update functions Kurt Kanzenbach
@ 2024-10-15 12:05 ` Maciej Fijalkowski
0 siblings, 0 replies; 11+ messages in thread
From: Maciej Fijalkowski @ 2024-10-15 12:05 UTC (permalink / raw)
To: Kurt Kanzenbach
Cc: Tony Nguyen, Przemek Kitszel, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Richard Cochran,
Sriram Yagnaraman, Benjamin Steinke, Sebastian Andrzej Siewior,
intel-wired-lan, netdev, bpf
On Fri, Oct 11, 2024 at 11:01:02AM +0200, Kurt Kanzenbach wrote:
> Move XDP finalize and Rx statistics update into separate functions. This
> way, they can be reused by the XDP and XDP/ZC code later.
>
> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---
> drivers/net/ethernet/intel/igb/igb.h | 3 ++
> drivers/net/ethernet/intel/igb/igb_main.c | 54 ++++++++++++++++++++-----------
> 2 files changed, 38 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
> index c30d6f9708f8..1e65b41a48d8 100644
> --- a/drivers/net/ethernet/intel/igb/igb.h
> +++ b/drivers/net/ethernet/intel/igb/igb.h
> @@ -740,6 +740,9 @@ void igb_clean_tx_ring(struct igb_ring *tx_ring);
> void igb_clean_rx_ring(struct igb_ring *rx_ring);
> void igb_configure_tx_ring(struct igb_adapter *, struct igb_ring *);
> void igb_configure_rx_ring(struct igb_adapter *, struct igb_ring *);
> +void igb_finalize_xdp(struct igb_adapter *adapter, unsigned int status);
> +void igb_update_rx_stats(struct igb_q_vector *q_vector, unsigned int packets,
> + unsigned int bytes);
> void igb_setup_tctl(struct igb_adapter *);
> void igb_setup_rctl(struct igb_adapter *);
> void igb_setup_srrctl(struct igb_adapter *, struct igb_ring *);
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index 341b83e39019..4d3aed6cd848 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -8852,6 +8852,38 @@ static void igb_put_rx_buffer(struct igb_ring *rx_ring,
> rx_buffer->page = NULL;
> }
>
> +void igb_finalize_xdp(struct igb_adapter *adapter, unsigned int status)
> +{
> + int cpu = smp_processor_id();
> + struct netdev_queue *nq;
> +
> + if (status & IGB_XDP_REDIR)
> + xdp_do_flush();
> +
> + if (status & IGB_XDP_TX) {
> + struct igb_ring *tx_ring = igb_xdp_tx_queue_mapping(adapter);
> +
> + nq = txring_txq(tx_ring);
> + __netif_tx_lock(nq, cpu);
> + igb_xdp_ring_update_tail(tx_ring);
> + __netif_tx_unlock(nq);
> + }
> +}
> +
> +void igb_update_rx_stats(struct igb_q_vector *q_vector, unsigned int packets,
> + unsigned int bytes)
> +{
> + struct igb_ring *ring = q_vector->rx.ring;
> +
> + u64_stats_update_begin(&ring->rx_syncp);
> + ring->rx_stats.packets += packets;
> + ring->rx_stats.bytes += bytes;
> + u64_stats_update_end(&ring->rx_syncp);
> +
> + q_vector->rx.total_packets += packets;
> + q_vector->rx.total_bytes += bytes;
> +}
> +
> static int igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
> {
> unsigned int total_bytes = 0, total_packets = 0;
> @@ -8859,9 +8891,7 @@ static int igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
> struct igb_ring *rx_ring = q_vector->rx.ring;
> u16 cleaned_count = igb_desc_unused(rx_ring);
> struct sk_buff *skb = rx_ring->skb;
> - int cpu = smp_processor_id();
> unsigned int xdp_xmit = 0;
> - struct netdev_queue *nq;
> struct xdp_buff xdp;
> u32 frame_sz = 0;
> int rx_buf_pgcnt;
> @@ -8983,24 +9013,10 @@ static int igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
> /* place incomplete frames back on ring for completion */
> rx_ring->skb = skb;
>
> - if (xdp_xmit & IGB_XDP_REDIR)
> - xdp_do_flush();
> -
> - if (xdp_xmit & IGB_XDP_TX) {
> - struct igb_ring *tx_ring = igb_xdp_tx_queue_mapping(adapter);
> -
> - nq = txring_txq(tx_ring);
> - __netif_tx_lock(nq, cpu);
> - igb_xdp_ring_update_tail(tx_ring);
> - __netif_tx_unlock(nq);
> - }
> + if (xdp_xmit)
> + igb_finalize_xdp(adapter, xdp_xmit);
>
> - u64_stats_update_begin(&rx_ring->rx_syncp);
> - rx_ring->rx_stats.packets += total_packets;
> - rx_ring->rx_stats.bytes += total_bytes;
> - u64_stats_update_end(&rx_ring->rx_syncp);
> - q_vector->rx.total_packets += total_packets;
> - q_vector->rx.total_bytes += total_bytes;
> + igb_update_rx_stats(q_vector, total_packets, total_bytes);
>
> if (cleaned_count)
> igb_alloc_rx_buffers(rx_ring, cleaned_count);
>
> --
> 2.39.5
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH iwl-next v8 5/6] igb: Add AF_XDP zero-copy Rx support
2024-10-11 9:01 ` [PATCH iwl-next v8 5/6] igb: Add AF_XDP zero-copy Rx support Kurt Kanzenbach
@ 2024-10-15 12:15 ` Maciej Fijalkowski
0 siblings, 0 replies; 11+ messages in thread
From: Maciej Fijalkowski @ 2024-10-15 12:15 UTC (permalink / raw)
To: Kurt Kanzenbach
Cc: Tony Nguyen, Przemek Kitszel, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Richard Cochran,
Sriram Yagnaraman, Benjamin Steinke, Sebastian Andrzej Siewior,
intel-wired-lan, netdev, bpf, Sriram Yagnaraman
On Fri, Oct 11, 2024 at 11:01:03AM +0200, Kurt Kanzenbach wrote:
> From: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
>
> Add support for AF_XDP zero-copy receive path.
>
> When AF_XDP zero-copy is enabled, the rx buffers are allocated from the
> xsk buff pool using igb_alloc_rx_buffers_zc().
>
> Use xsk_pool_get_rx_frame_size() to set SRRCTL rx buf size when zero-copy
> is enabled.
>
> Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
> [Kurt: Port to v6.12 and provide napi_id for xdp_rxq_info_reg(),
> RCT, remove NETDEV_XDP_ACT_XSK_ZEROCOPY, update NTC handling,
> READ_ONCE() xsk_pool, likelyfy for XDP_REDIRECT case]
> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---
> drivers/net/ethernet/intel/igb/igb.h | 6 +
> drivers/net/ethernet/intel/igb/igb_main.c | 79 ++++++--
> drivers/net/ethernet/intel/igb/igb_xsk.c | 298 +++++++++++++++++++++++++++++-
> 3 files changed, 364 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
> index 1e65b41a48d8..e4a85867aa18 100644
> --- a/drivers/net/ethernet/intel/igb/igb.h
> +++ b/drivers/net/ethernet/intel/igb/igb.h
> @@ -88,6 +88,7 @@ struct igb_adapter;
> #define IGB_XDP_CONSUMED BIT(0)
> #define IGB_XDP_TX BIT(1)
> #define IGB_XDP_REDIR BIT(2)
> +#define IGB_XDP_EXIT BIT(3)
>
> struct vf_data_storage {
> unsigned char vf_mac_addresses[ETH_ALEN];
> @@ -853,6 +854,11 @@ struct xsk_buff_pool *igb_xsk_pool(struct igb_adapter *adapter,
> int igb_xsk_pool_setup(struct igb_adapter *adapter,
> struct xsk_buff_pool *pool,
> u16 qid);
> +bool igb_alloc_rx_buffers_zc(struct igb_ring *rx_ring,
> + struct xsk_buff_pool *xsk_pool, u16 count);
> +void igb_clean_rx_ring_zc(struct igb_ring *rx_ring);
> +int igb_clean_rx_irq_zc(struct igb_q_vector *q_vector,
> + struct xsk_buff_pool *xsk_pool, const int budget);
> int igb_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags);
>
> #endif /* _IGB_H_ */
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index 4d3aed6cd848..711b60cab594 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -472,12 +472,17 @@ static void igb_dump(struct igb_adapter *adapter)
>
> for (i = 0; i < rx_ring->count; i++) {
> const char *next_desc;
> - struct igb_rx_buffer *buffer_info;
> - buffer_info = &rx_ring->rx_buffer_info[i];
> + dma_addr_t dma = (dma_addr_t)0;
> + struct igb_rx_buffer *buffer_info = NULL;
> rx_desc = IGB_RX_DESC(rx_ring, i);
> u0 = (struct my_u0 *)rx_desc;
> staterr = le32_to_cpu(rx_desc->wb.upper.status_error);
>
> + if (!rx_ring->xsk_pool) {
> + buffer_info = &rx_ring->rx_buffer_info[i];
> + dma = buffer_info->dma;
> + }
> +
> if (i == rx_ring->next_to_use)
> next_desc = " NTU";
> else if (i == rx_ring->next_to_clean)
> @@ -497,11 +502,11 @@ static void igb_dump(struct igb_adapter *adapter)
> "R ", i,
> le64_to_cpu(u0->a),
> le64_to_cpu(u0->b),
> - (u64)buffer_info->dma,
> + (u64)dma,
> next_desc);
>
> if (netif_msg_pktdata(adapter) &&
> - buffer_info->dma && buffer_info->page) {
> + buffer_info && dma && buffer_info->page) {
> print_hex_dump(KERN_INFO, "",
> DUMP_PREFIX_ADDRESS,
> 16, 1,
> @@ -1983,7 +1988,11 @@ static void igb_configure(struct igb_adapter *adapter)
> */
> for (i = 0; i < adapter->num_rx_queues; i++) {
> struct igb_ring *ring = adapter->rx_ring[i];
> - igb_alloc_rx_buffers(ring, igb_desc_unused(ring));
> + if (ring->xsk_pool)
> + igb_alloc_rx_buffers_zc(ring, ring->xsk_pool,
> + igb_desc_unused(ring));
> + else
> + igb_alloc_rx_buffers(ring, igb_desc_unused(ring));
> }
> }
>
> @@ -4405,7 +4414,8 @@ int igb_setup_rx_resources(struct igb_ring *rx_ring)
> if (xdp_rxq_info_is_reg(&rx_ring->xdp_rxq))
> xdp_rxq_info_unreg(&rx_ring->xdp_rxq);
> res = xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev,
> - rx_ring->queue_index, 0);
> + rx_ring->queue_index,
> + rx_ring->q_vector->napi.napi_id);
> if (res < 0) {
> dev_err(dev, "Failed to register xdp_rxq index %u\n",
> rx_ring->queue_index);
> @@ -4701,12 +4711,17 @@ void igb_setup_srrctl(struct igb_adapter *adapter, struct igb_ring *ring)
> struct e1000_hw *hw = &adapter->hw;
> int reg_idx = ring->reg_idx;
> u32 srrctl = 0;
> + u32 buf_size;
>
> - srrctl = IGB_RX_HDR_LEN << E1000_SRRCTL_BSIZEHDRSIZE_SHIFT;
> - if (ring_uses_large_buffer(ring))
> - srrctl |= IGB_RXBUFFER_3072 >> E1000_SRRCTL_BSIZEPKT_SHIFT;
> + if (ring->xsk_pool)
> + buf_size = xsk_pool_get_rx_frame_size(ring->xsk_pool);
> + else if (ring_uses_large_buffer(ring))
> + buf_size = IGB_RXBUFFER_3072;
> else
> - srrctl |= IGB_RXBUFFER_2048 >> E1000_SRRCTL_BSIZEPKT_SHIFT;
> + buf_size = IGB_RXBUFFER_2048;
> +
> + srrctl = IGB_RX_HDR_LEN << E1000_SRRCTL_BSIZEHDRSIZE_SHIFT;
> + srrctl |= buf_size >> E1000_SRRCTL_BSIZEPKT_SHIFT;
> srrctl |= E1000_SRRCTL_DESCTYPE_ADV_ONEBUF;
> if (hw->mac.type >= e1000_82580)
> srrctl |= E1000_SRRCTL_TIMESTAMP;
> @@ -4738,9 +4753,17 @@ void igb_configure_rx_ring(struct igb_adapter *adapter,
> u32 rxdctl = 0;
>
> xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq);
> - WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
> - MEM_TYPE_PAGE_SHARED, NULL));
> WRITE_ONCE(ring->xsk_pool, igb_xsk_pool(adapter, ring));
> + if (ring->xsk_pool) {
> + WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
> + MEM_TYPE_XSK_BUFF_POOL,
> + NULL));
> + xsk_pool_set_rxq_info(ring->xsk_pool, &ring->xdp_rxq);
> + } else {
> + WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
> + MEM_TYPE_PAGE_SHARED,
> + NULL));
> + }
>
> /* disable the queue */
> wr32(E1000_RXDCTL(reg_idx), 0);
> @@ -4767,9 +4790,12 @@ void igb_configure_rx_ring(struct igb_adapter *adapter,
> rxdctl |= IGB_RX_HTHRESH << 8;
> rxdctl |= IGB_RX_WTHRESH << 16;
>
> - /* initialize rx_buffer_info */
> - memset(ring->rx_buffer_info, 0,
> - sizeof(struct igb_rx_buffer) * ring->count);
> + if (ring->xsk_pool)
> + memset(ring->rx_buffer_info_zc, 0,
> + sizeof(*ring->rx_buffer_info_zc) * ring->count);
> + else
> + memset(ring->rx_buffer_info, 0,
> + sizeof(*ring->rx_buffer_info) * ring->count);
>
> /* initialize Rx descriptor 0 */
> rx_desc = IGB_RX_DESC(ring, 0);
> @@ -4957,8 +4983,13 @@ void igb_free_rx_resources(struct igb_ring *rx_ring)
>
> rx_ring->xdp_prog = NULL;
> xdp_rxq_info_unreg(&rx_ring->xdp_rxq);
> - vfree(rx_ring->rx_buffer_info);
> - rx_ring->rx_buffer_info = NULL;
> + if (rx_ring->xsk_pool) {
> + vfree(rx_ring->rx_buffer_info_zc);
> + rx_ring->rx_buffer_info_zc = NULL;
> + } else {
> + vfree(rx_ring->rx_buffer_info);
> + rx_ring->rx_buffer_info = NULL;
> + }
>
> /* if not set, then don't free */
> if (!rx_ring->desc)
> @@ -4996,6 +5027,11 @@ void igb_clean_rx_ring(struct igb_ring *rx_ring)
> dev_kfree_skb(rx_ring->skb);
> rx_ring->skb = NULL;
>
> + if (rx_ring->xsk_pool) {
> + igb_clean_rx_ring_zc(rx_ring);
> + goto skip_for_xsk;
> + }
> +
> /* Free all the Rx ring sk_buffs */
> while (i != rx_ring->next_to_alloc) {
> struct igb_rx_buffer *buffer_info = &rx_ring->rx_buffer_info[i];
> @@ -5023,6 +5059,7 @@ void igb_clean_rx_ring(struct igb_ring *rx_ring)
> i = 0;
> }
>
> +skip_for_xsk:
> rx_ring->next_to_alloc = 0;
> rx_ring->next_to_clean = 0;
> rx_ring->next_to_use = 0;
> @@ -8177,6 +8214,7 @@ static int igb_poll(struct napi_struct *napi, int budget)
> struct igb_q_vector *q_vector = container_of(napi,
> struct igb_q_vector,
> napi);
> + struct xsk_buff_pool *xsk_pool;
> bool clean_complete = true;
> int work_done = 0;
>
> @@ -8188,7 +8226,12 @@ static int igb_poll(struct napi_struct *napi, int budget)
> clean_complete = igb_clean_tx_irq(q_vector, budget);
>
> if (q_vector->rx.ring) {
> - int cleaned = igb_clean_rx_irq(q_vector, budget);
> + int cleaned;
> +
> + xsk_pool = READ_ONCE(q_vector->rx.ring->xsk_pool);
> + cleaned = xsk_pool ?
> + igb_clean_rx_irq_zc(q_vector, xsk_pool, budget) :
> + igb_clean_rx_irq(q_vector, budget);
>
> work_done += cleaned;
> if (cleaned >= budget)
> diff --git a/drivers/net/ethernet/intel/igb/igb_xsk.c b/drivers/net/ethernet/intel/igb/igb_xsk.c
> index 7b632be3e7e3..22d234db0fab 100644
> --- a/drivers/net/ethernet/intel/igb/igb_xsk.c
> +++ b/drivers/net/ethernet/intel/igb/igb_xsk.c
> @@ -70,7 +70,11 @@ static void igb_txrx_ring_enable(struct igb_adapter *adapter, u16 qid)
> * at least 1 descriptor unused to make sure
> * next_to_use != next_to_clean
> */
> - igb_alloc_rx_buffers(rx_ring, igb_desc_unused(rx_ring));
> + if (rx_ring->xsk_pool)
> + igb_alloc_rx_buffers_zc(rx_ring, rx_ring->xsk_pool,
> + igb_desc_unused(rx_ring));
> + else
> + igb_alloc_rx_buffers(rx_ring, igb_desc_unused(rx_ring));
>
> /* Rx/Tx share the same napi context. */
> napi_enable(&rx_ring->q_vector->napi);
> @@ -169,6 +173,298 @@ int igb_xsk_pool_setup(struct igb_adapter *adapter,
> igb_xsk_pool_disable(adapter, qid);
> }
>
> +static u16 igb_fill_rx_descs(struct xsk_buff_pool *pool, struct xdp_buff **xdp,
> + union e1000_adv_rx_desc *rx_desc, u16 count)
> +{
> + dma_addr_t dma;
> + u16 buffs;
> + int i;
> +
> + /* nothing to do */
> + if (!count)
> + return 0;
> +
> + buffs = xsk_buff_alloc_batch(pool, xdp, count);
> + for (i = 0; i < buffs; i++) {
> + dma = xsk_buff_xdp_get_dma(*xdp);
> + rx_desc->read.pkt_addr = cpu_to_le64(dma);
> + rx_desc->wb.upper.length = 0;
> +
> + rx_desc++;
> + xdp++;
> + }
> +
> + return buffs;
> +}
> +
> +bool igb_alloc_rx_buffers_zc(struct igb_ring *rx_ring,
> + struct xsk_buff_pool *xsk_pool, u16 count)
> +{
> + u32 nb_buffs_extra = 0, nb_buffs = 0;
> + union e1000_adv_rx_desc *rx_desc;
> + u16 ntu = rx_ring->next_to_use;
> + u16 total_count = count;
> + struct xdp_buff **xdp;
> +
> + rx_desc = IGB_RX_DESC(rx_ring, ntu);
> + xdp = &rx_ring->rx_buffer_info_zc[ntu];
> +
> + if (ntu + count >= rx_ring->count) {
> + nb_buffs_extra = igb_fill_rx_descs(xsk_pool, xdp, rx_desc,
> + rx_ring->count - ntu);
> + if (nb_buffs_extra != rx_ring->count - ntu) {
> + ntu += nb_buffs_extra;
> + goto exit;
> + }
> + rx_desc = IGB_RX_DESC(rx_ring, 0);
> + xdp = rx_ring->rx_buffer_info_zc;
> + ntu = 0;
> + count -= nb_buffs_extra;
> + }
> +
> + nb_buffs = igb_fill_rx_descs(xsk_pool, xdp, rx_desc, count);
> + ntu += nb_buffs;
> + if (ntu == rx_ring->count)
> + ntu = 0;
> +
> + /* clear the length for the next_to_use descriptor */
> + rx_desc = IGB_RX_DESC(rx_ring, ntu);
> + rx_desc->wb.upper.length = 0;
> +
> +exit:
> + if (rx_ring->next_to_use != ntu) {
> + rx_ring->next_to_use = ntu;
> +
> + /* Force memory writes to complete before letting h/w
> + * know there are new descriptors to fetch. (Only
> + * applicable for weak-ordered memory model archs,
> + * such as IA-64).
> + */
> + wmb();
> + writel(ntu, rx_ring->tail);
> + }
> +
> + return total_count == (nb_buffs + nb_buffs_extra);
> +}
> +
> +void igb_clean_rx_ring_zc(struct igb_ring *rx_ring)
> +{
> + u16 ntc = rx_ring->next_to_clean;
> + u16 ntu = rx_ring->next_to_use;
> +
> + while (ntc != ntu) {
> + struct xdp_buff *xdp = rx_ring->rx_buffer_info_zc[ntc];
> +
> + xsk_buff_free(xdp);
> + ntc++;
> + if (ntc >= rx_ring->count)
> + ntc = 0;
> + }
> +}
> +
> +static struct sk_buff *igb_construct_skb_zc(struct igb_ring *rx_ring,
> + struct xdp_buff *xdp,
> + ktime_t timestamp)
> +{
> + unsigned int totalsize = xdp->data_end - xdp->data_meta;
> + unsigned int metasize = xdp->data - xdp->data_meta;
> + struct sk_buff *skb;
> +
> + net_prefetch(xdp->data_meta);
> +
> + /* allocate a skb to store the frags */
> + skb = napi_alloc_skb(&rx_ring->q_vector->napi, totalsize);
> + if (unlikely(!skb))
> + return NULL;
> +
> + if (timestamp)
> + skb_hwtstamps(skb)->hwtstamp = timestamp;
> +
> + memcpy(__skb_put(skb, totalsize), xdp->data_meta,
> + ALIGN(totalsize, sizeof(long)));
> +
> + if (metasize) {
> + skb_metadata_set(skb, metasize);
> + __skb_pull(skb, metasize);
> + }
> +
> + return skb;
> +}
> +
> +static struct sk_buff *igb_run_xdp_zc(struct igb_adapter *adapter,
> + struct igb_ring *rx_ring,
> + struct xdp_buff *xdp,
> + struct xsk_buff_pool *xsk_pool,
> + struct bpf_prog *xdp_prog)
> +{
> + int err, result = IGB_XDP_PASS;
> + u32 act;
> +
> + prefetchw(xdp->data_hard_start); /* xdp_frame write */
> +
> + act = bpf_prog_run_xdp(xdp_prog, xdp);
> +
> + if (likely(act == XDP_REDIRECT)) {
> + err = xdp_do_redirect(adapter->netdev, xdp, xdp_prog);
> + if (!err) {
> + result = IGB_XDP_REDIR;
> + goto xdp_out;
> + }
> +
> + if (xsk_uses_need_wakeup(xsk_pool) &&
> + err == -ENOBUFS)
> + result = IGB_XDP_EXIT;
> + else
> + result = IGB_XDP_CONSUMED;
> + goto out_failure;
> + }
> +
> + switch (act) {
> + case XDP_PASS:
> + break;
> + case XDP_TX:
> + result = igb_xdp_xmit_back(adapter, xdp);
> + if (result == IGB_XDP_CONSUMED)
> + goto out_failure;
> + break;
> + default:
> + bpf_warn_invalid_xdp_action(adapter->netdev, xdp_prog, act);
> + fallthrough;
> + case XDP_ABORTED:
> +out_failure:
> + trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
> + fallthrough;
> + case XDP_DROP:
> + result = IGB_XDP_CONSUMED;
> + break;
> + }
> +xdp_out:
> + return ERR_PTR(-result);
> +}
> +
> +int igb_clean_rx_irq_zc(struct igb_q_vector *q_vector,
> + struct xsk_buff_pool *xsk_pool, const int budget)
> +{
> + struct igb_adapter *adapter = q_vector->adapter;
> + unsigned int total_bytes = 0, total_packets = 0;
> + struct igb_ring *rx_ring = q_vector->rx.ring;
> + u32 ntc = rx_ring->next_to_clean;
> + struct bpf_prog *xdp_prog;
> + unsigned int xdp_xmit = 0;
> + bool failure = false;
> + u16 entries_to_alloc;
> + struct sk_buff *skb;
> +
> + /* xdp_prog cannot be NULL in the ZC path */
> + xdp_prog = READ_ONCE(rx_ring->xdp_prog);
> +
> + while (likely(total_packets < budget)) {
> + union e1000_adv_rx_desc *rx_desc;
> + ktime_t timestamp = 0;
> + struct xdp_buff *xdp;
> + unsigned int size;
> +
> + rx_desc = IGB_RX_DESC(rx_ring, ntc);
> + size = le16_to_cpu(rx_desc->wb.upper.length);
> + if (!size)
> + break;
> +
> + /* This memory barrier is needed to keep us from reading
> + * any other fields out of the rx_desc until we know the
> + * descriptor has been written back
> + */
> + dma_rmb();
> +
> + xdp = rx_ring->rx_buffer_info_zc[ntc];
> + xsk_buff_set_size(xdp, size);
> + xsk_buff_dma_sync_for_cpu(xdp);
> +
> + /* pull rx packet timestamp if available and valid */
> + if (igb_test_staterr(rx_desc, E1000_RXDADV_STAT_TSIP)) {
> + int ts_hdr_len;
> +
> + ts_hdr_len = igb_ptp_rx_pktstamp(rx_ring->q_vector,
> + xdp->data,
> +							 &timestamp);
> +
> + xdp->data += ts_hdr_len;
> + xdp->data_meta += ts_hdr_len;
> + size -= ts_hdr_len;
> + }
> +
> + skb = igb_run_xdp_zc(adapter, rx_ring, xdp, xsk_pool, xdp_prog);
> +
> + if (IS_ERR(skb)) {
> + unsigned int xdp_res = -PTR_ERR(skb);
> +
> + if (likely(xdp_res & (IGB_XDP_TX | IGB_XDP_REDIR))) {
> + xdp_xmit |= xdp_res;
> + } else if (xdp_res == IGB_XDP_EXIT) {
> + failure = true;
> + break;
> + } else if (xdp_res == IGB_XDP_CONSUMED) {
> + xsk_buff_free(xdp);
> + }
> +
> + total_packets++;
> + total_bytes += size;
> + ntc++;
> + if (ntc == rx_ring->count)
> + ntc = 0;
> + continue;
> + }
> +
> + skb = igb_construct_skb_zc(rx_ring, xdp, timestamp);
> +
> + /* exit if we failed to retrieve a buffer */
> + if (!skb) {
> + rx_ring->rx_stats.alloc_failed++;
> + break;
> + }
> +
> + xsk_buff_free(xdp);
> + ntc++;
> + if (ntc == rx_ring->count)
> + ntc = 0;
> +
> + if (eth_skb_pad(skb))
> + continue;
> +
> + /* probably a little skewed due to removing CRC */
> + total_bytes += skb->len;
> +
> + /* populate checksum, timestamp, VLAN, and protocol */
> + igb_process_skb_fields(rx_ring, rx_desc, skb);
> +
> + napi_gro_receive(&q_vector->napi, skb);
> +
> + /* update budget accounting */
> + total_packets++;
> + }
> +
> + rx_ring->next_to_clean = ntc;
> +
> + if (xdp_xmit)
> + igb_finalize_xdp(adapter, xdp_xmit);
> +
> + igb_update_rx_stats(q_vector, total_packets, total_bytes);
> +
> + entries_to_alloc = igb_desc_unused(rx_ring);
> + if (entries_to_alloc >= IGB_RX_BUFFER_WRITE)
> + failure |= !igb_alloc_rx_buffers_zc(rx_ring, xsk_pool,
> + entries_to_alloc);
> +
> + if (xsk_uses_need_wakeup(xsk_pool)) {
> + if (failure || rx_ring->next_to_clean == rx_ring->next_to_use)
> + xsk_set_rx_need_wakeup(xsk_pool);
> + else
> + xsk_clear_rx_need_wakeup(xsk_pool);
> +
> + return (int)total_packets;
> + }
> + return failure ? budget : (int)total_packets;
> +}
> +
> int igb_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
> {
> struct igb_adapter *adapter = netdev_priv(dev);
>
> --
> 2.39.5
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH iwl-next v8 6/6] igb: Add AF_XDP zero-copy Tx support
2024-10-11 9:01 ` [PATCH iwl-next v8 6/6] igb: Add AF_XDP zero-copy Tx support Kurt Kanzenbach
@ 2024-10-15 13:28 ` Maciej Fijalkowski
2024-10-15 17:16 ` Kurt Kanzenbach
0 siblings, 1 reply; 11+ messages in thread
From: Maciej Fijalkowski @ 2024-10-15 13:28 UTC (permalink / raw)
To: Kurt Kanzenbach
Cc: Tony Nguyen, Przemek Kitszel, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Richard Cochran,
Sriram Yagnaraman, Benjamin Steinke, Sebastian Andrzej Siewior,
intel-wired-lan, netdev, bpf, Sriram Yagnaraman
On Fri, Oct 11, 2024 at 11:01:04AM +0200, Kurt Kanzenbach wrote:
> From: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
>
> Add support for AF_XDP zero-copy transmit path.
>
> A new TX buffer type IGB_TYPE_XSK is introduced to indicate that the Tx
> frame was allocated from the xsk buff pool, so igb_clean_tx_ring() and
> igb_clean_tx_irq() can clean the buffers correctly based on type.
>
> igb_xmit_zc() performs the actual packet transmit when AF_XDP zero-copy is
> enabled. We share the TX ring between slow path, XDP and AF_XDP
> zero-copy, so we use the netdev queue lock to ensure mutual exclusion.
>
> Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
> [Kurt: Set olinfo_status in igb_xmit_zc() so that frames are transmitted,
> Use READ_ONCE() for xsk_pool and check Tx disabled and carrier in
> igb_xmit_zc(), Add FIXME for RS bit]
> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---
> drivers/net/ethernet/intel/igb/igb.h | 2 +
> drivers/net/ethernet/intel/igb/igb_main.c | 61 ++++++++++++++++++++++++-----
> drivers/net/ethernet/intel/igb/igb_xsk.c | 64 +++++++++++++++++++++++++++++++
> 3 files changed, 117 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
> index e4a85867aa18..f6ac74327bb3 100644
> --- a/drivers/net/ethernet/intel/igb/igb.h
> +++ b/drivers/net/ethernet/intel/igb/igb.h
> @@ -258,6 +258,7 @@ enum igb_tx_flags {
> enum igb_tx_buf_type {
> IGB_TYPE_SKB = 0,
> IGB_TYPE_XDP,
> + IGB_TYPE_XSK
> };
>
> /* wrapper around a pointer to a socket buffer,
> @@ -859,6 +860,7 @@ bool igb_alloc_rx_buffers_zc(struct igb_ring *rx_ring,
> void igb_clean_rx_ring_zc(struct igb_ring *rx_ring);
> int igb_clean_rx_irq_zc(struct igb_q_vector *q_vector,
> struct xsk_buff_pool *xsk_pool, const int budget);
> +bool igb_xmit_zc(struct igb_ring *tx_ring);
> int igb_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags);
>
> #endif /* _IGB_H_ */
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index 711b60cab594..5f396c02e3b9 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -2979,6 +2979,9 @@ static int igb_xdp_xmit(struct net_device *dev, int n,
> if (unlikely(!tx_ring))
> return -ENXIO;
>
> + if (unlikely(test_bit(IGB_RING_FLAG_TX_DISABLED, &tx_ring->flags)))
> + return -ENXIO;
> +
> nq = txring_txq(tx_ring);
> __netif_tx_lock(nq, cpu);
>
> @@ -3326,7 +3329,8 @@ static int igb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> netdev->priv_flags |= IFF_SUPP_NOFCS;
>
> netdev->priv_flags |= IFF_UNICAST_FLT;
> - netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT;
> + netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
> + NETDEV_XDP_ACT_XSK_ZEROCOPY;
>
> /* MTU range: 68 - 9216 */
> netdev->min_mtu = ETH_MIN_MTU;
> @@ -4900,15 +4904,20 @@ void igb_clean_tx_ring(struct igb_ring *tx_ring)
> {
> u16 i = tx_ring->next_to_clean;
> struct igb_tx_buffer *tx_buffer = &tx_ring->tx_buffer_info[i];
> + u32 xsk_frames = 0;
>
> while (i != tx_ring->next_to_use) {
> union e1000_adv_tx_desc *eop_desc, *tx_desc;
>
> /* Free all the Tx ring sk_buffs or xdp frames */
> - if (tx_buffer->type == IGB_TYPE_SKB)
> + if (tx_buffer->type == IGB_TYPE_SKB) {
> dev_kfree_skb_any(tx_buffer->skb);
> - else
> + } else if (tx_buffer->type == IGB_TYPE_XDP) {
> xdp_return_frame(tx_buffer->xdpf);
> + } else if (tx_buffer->type == IGB_TYPE_XSK) {
> + xsk_frames++;
> + goto skip_for_xsk;
> + }
>
> /* unmap skb header data */
> dma_unmap_single(tx_ring->dev,
> @@ -4939,6 +4948,7 @@ void igb_clean_tx_ring(struct igb_ring *tx_ring)
> DMA_TO_DEVICE);
> }
>
> +skip_for_xsk:
> tx_buffer->next_to_watch = NULL;
>
> /* move us one more past the eop_desc for start of next pkt */
> @@ -4953,6 +4963,9 @@ void igb_clean_tx_ring(struct igb_ring *tx_ring)
> /* reset BQL for queue */
> netdev_tx_reset_queue(txring_txq(tx_ring));
>
> + if (tx_ring->xsk_pool && xsk_frames)
> + xsk_tx_completed(tx_ring->xsk_pool, xsk_frames);
> +
> /* reset next_to_use and next_to_clean */
> tx_ring->next_to_use = 0;
> tx_ring->next_to_clean = 0;
> @@ -6486,6 +6499,9 @@ netdev_tx_t igb_xmit_frame_ring(struct sk_buff *skb,
> return NETDEV_TX_BUSY;
> }
>
> + if (unlikely(test_bit(IGB_RING_FLAG_TX_DISABLED, &tx_ring->flags)))
> + return NETDEV_TX_BUSY;
> +
> /* record the location of the first descriptor for this packet */
> first = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
> first->type = IGB_TYPE_SKB;
> @@ -8260,13 +8276,18 @@ static int igb_poll(struct napi_struct *napi, int budget)
> **/
> static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
> {
> - struct igb_adapter *adapter = q_vector->adapter;
> - struct igb_ring *tx_ring = q_vector->tx.ring;
> - struct igb_tx_buffer *tx_buffer;
> - union e1000_adv_tx_desc *tx_desc;
> unsigned int total_bytes = 0, total_packets = 0;
> + struct igb_adapter *adapter = q_vector->adapter;
> unsigned int budget = q_vector->tx.work_limit;
> + struct igb_ring *tx_ring = q_vector->tx.ring;
> unsigned int i = tx_ring->next_to_clean;
> + union e1000_adv_tx_desc *tx_desc;
> + struct igb_tx_buffer *tx_buffer;
> + struct xsk_buff_pool *xsk_pool;
> + int cpu = smp_processor_id();
> + bool xsk_xmit_done = true;
> + struct netdev_queue *nq;
> + u32 xsk_frames = 0;
>
> if (test_bit(__IGB_DOWN, &adapter->state))
> return true;
> @@ -8297,10 +8318,14 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
> total_packets += tx_buffer->gso_segs;
>
> /* free the skb */
> - if (tx_buffer->type == IGB_TYPE_SKB)
> + if (tx_buffer->type == IGB_TYPE_SKB) {
> napi_consume_skb(tx_buffer->skb, napi_budget);
> - else
> + } else if (tx_buffer->type == IGB_TYPE_XDP) {
> xdp_return_frame(tx_buffer->xdpf);
> + } else if (tx_buffer->type == IGB_TYPE_XSK) {
> + xsk_frames++;
> + goto skip_for_xsk;
> + }
>
> /* unmap skb header data */
> dma_unmap_single(tx_ring->dev,
> @@ -8332,6 +8357,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
> }
> }
>
> +skip_for_xsk:
> /* move us one more past the eop_desc for start of next pkt */
> tx_buffer++;
> tx_desc++;
> @@ -8360,6 +8386,21 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
> q_vector->tx.total_bytes += total_bytes;
> q_vector->tx.total_packets += total_packets;
>
> + xsk_pool = READ_ONCE(tx_ring->xsk_pool);
> + if (xsk_pool) {
> + if (xsk_frames)
> + xsk_tx_completed(xsk_pool, xsk_frames);
> + if (xsk_uses_need_wakeup(xsk_pool))
> + xsk_set_tx_need_wakeup(xsk_pool);
> +
> + nq = txring_txq(tx_ring);
> + __netif_tx_lock(nq, cpu);
> + /* Avoid transmit queue timeout since we share it with the slow path */
> + txq_trans_cond_update(nq);
> + xsk_xmit_done = igb_xmit_zc(tx_ring);
> + __netif_tx_unlock(nq);
> + }
> +
> if (test_bit(IGB_RING_FLAG_TX_DETECT_HANG, &tx_ring->flags)) {
> struct e1000_hw *hw = &adapter->hw;
>
> @@ -8422,7 +8463,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
> }
> }
>
> - return !!budget;
> + return !!budget && xsk_xmit_done;
> }
>
> /**
> diff --git a/drivers/net/ethernet/intel/igb/igb_xsk.c b/drivers/net/ethernet/intel/igb/igb_xsk.c
> index 22d234db0fab..d962c5e22b71 100644
> --- a/drivers/net/ethernet/intel/igb/igb_xsk.c
> +++ b/drivers/net/ethernet/intel/igb/igb_xsk.c
> @@ -465,6 +465,70 @@ int igb_clean_rx_irq_zc(struct igb_q_vector *q_vector,
> return failure ? budget : (int)total_packets;
> }
>
> +bool igb_xmit_zc(struct igb_ring *tx_ring)
> +{
> + unsigned int budget = igb_desc_unused(tx_ring);
> + struct xsk_buff_pool *pool = tx_ring->xsk_pool;
Ughh that's another read of pool ptr, you should have passed it by arg to
igb_xmit_zc()...
> + u32 cmd_type, olinfo_status, nb_pkts, i = 0;
> + struct xdp_desc *descs = pool->tx_descs;
> + union e1000_adv_tx_desc *tx_desc = NULL;
> + struct igb_tx_buffer *tx_buffer_info;
> + unsigned int total_bytes = 0;
> + dma_addr_t dma;
> +
> + if (!netif_carrier_ok(tx_ring->netdev))
> + return true;
> +
> + if (test_bit(IGB_RING_FLAG_TX_DISABLED, &tx_ring->flags))
> + return true;
> +
> + nb_pkts = xsk_tx_peek_release_desc_batch(pool, budget);
> + if (!nb_pkts)
> + return true;
> +
> + while (nb_pkts-- > 0) {
> + dma = xsk_buff_raw_get_dma(pool, descs[i].addr);
> + xsk_buff_raw_dma_sync_for_device(pool, dma, descs[i].len);
> +
> + tx_buffer_info = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
> + tx_buffer_info->bytecount = descs[i].len;
> + tx_buffer_info->type = IGB_TYPE_XSK;
> + tx_buffer_info->xdpf = NULL;
> + tx_buffer_info->gso_segs = 1;
> + tx_buffer_info->time_stamp = jiffies;
> +
> + tx_desc = IGB_TX_DESC(tx_ring, tx_ring->next_to_use);
> + tx_desc->read.buffer_addr = cpu_to_le64(dma);
> +
> + /* put descriptor type bits */
> + cmd_type = E1000_ADVTXD_DTYP_DATA | E1000_ADVTXD_DCMD_DEXT |
> + E1000_ADVTXD_DCMD_IFCS;
> + olinfo_status = descs[i].len << E1000_ADVTXD_PAYLEN_SHIFT;
> +
> + /* FIXME: This sets the Report Status (RS) bit for every
> + * descriptor. One nice to have optimization would be to set it
> + * only for the last descriptor in the whole batch. See Intel
> + * ice driver for an example on how to do it.
> + */
> + cmd_type |= descs[i].len | IGB_TXD_DCMD;
> + tx_desc->read.cmd_type_len = cpu_to_le32(cmd_type);
> + tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status);
> +
> + total_bytes += descs[i].len;
> +
> + i++;
> + tx_ring->next_to_use++;
> + tx_buffer_info->next_to_watch = tx_desc;
> + if (tx_ring->next_to_use == tx_ring->count)
> + tx_ring->next_to_use = 0;
> + }
> +
> + netdev_tx_sent_queue(txring_txq(tx_ring), total_bytes);
> + igb_xdp_ring_update_tail(tx_ring);
> +
> + return nb_pkts < budget;
> +}
> +
> int igb_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
> {
> struct igb_adapter *adapter = netdev_priv(dev);
>
> --
> 2.39.5
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH iwl-next v8 6/6] igb: Add AF_XDP zero-copy Tx support
2024-10-15 13:28 ` Maciej Fijalkowski
@ 2024-10-15 17:16 ` Kurt Kanzenbach
0 siblings, 0 replies; 11+ messages in thread
From: Kurt Kanzenbach @ 2024-10-15 17:16 UTC (permalink / raw)
To: Maciej Fijalkowski
Cc: Tony Nguyen, Przemek Kitszel, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Richard Cochran,
Sriram Yagnaraman, Benjamin Steinke, Sebastian Andrzej Siewior,
intel-wired-lan, netdev, bpf, Sriram Yagnaraman
On Tue Oct 15 2024, Maciej Fijalkowski wrote:
> On Fri, Oct 11, 2024 at 11:01:04AM +0200, Kurt Kanzenbach wrote:
>> From: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
>>
>> Add support for AF_XDP zero-copy transmit path.
>>
>> A new TX buffer type IGB_TYPE_XSK is introduced to indicate that the Tx
>> frame was allocated from the xsk buff pool, so igb_clean_tx_ring() and
>> igb_clean_tx_irq() can clean the buffers correctly based on type.
>>
>> igb_xmit_zc() performs the actual packet transmit when AF_XDP zero-copy is
>> enabled. We share the TX ring between slow path, XDP and AF_XDP
>> zero-copy, so we use the netdev queue lock to ensure mutual exclusion.
>>
>> Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
>> [Kurt: Set olinfo_status in igb_xmit_zc() so that frames are transmitted,
>> Use READ_ONCE() for xsk_pool and check Tx disabled and carrier in
>> igb_xmit_zc(), Add FIXME for RS bit]
>> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
>> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
>> ---
>> drivers/net/ethernet/intel/igb/igb.h | 2 +
>> drivers/net/ethernet/intel/igb/igb_main.c | 61 ++++++++++++++++++++++++-----
>> drivers/net/ethernet/intel/igb/igb_xsk.c | 64 +++++++++++++++++++++++++++++++
>> 3 files changed, 117 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
>> index e4a85867aa18..f6ac74327bb3 100644
>> --- a/drivers/net/ethernet/intel/igb/igb.h
>> +++ b/drivers/net/ethernet/intel/igb/igb.h
>> @@ -258,6 +258,7 @@ enum igb_tx_flags {
>> enum igb_tx_buf_type {
>> IGB_TYPE_SKB = 0,
>> IGB_TYPE_XDP,
>> + IGB_TYPE_XSK
>> };
>>
>> /* wrapper around a pointer to a socket buffer,
>> @@ -859,6 +860,7 @@ bool igb_alloc_rx_buffers_zc(struct igb_ring *rx_ring,
>> void igb_clean_rx_ring_zc(struct igb_ring *rx_ring);
>> int igb_clean_rx_irq_zc(struct igb_q_vector *q_vector,
>> struct xsk_buff_pool *xsk_pool, const int budget);
>> +bool igb_xmit_zc(struct igb_ring *tx_ring);
>> int igb_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags);
>>
>> #endif /* _IGB_H_ */
>> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
>> index 711b60cab594..5f396c02e3b9 100644
>> --- a/drivers/net/ethernet/intel/igb/igb_main.c
>> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
>> @@ -2979,6 +2979,9 @@ static int igb_xdp_xmit(struct net_device *dev, int n,
>> if (unlikely(!tx_ring))
>> return -ENXIO;
>>
>> + if (unlikely(test_bit(IGB_RING_FLAG_TX_DISABLED, &tx_ring->flags)))
>> + return -ENXIO;
>> +
>> nq = txring_txq(tx_ring);
>> __netif_tx_lock(nq, cpu);
>>
>> @@ -3326,7 +3329,8 @@ static int igb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>> netdev->priv_flags |= IFF_SUPP_NOFCS;
>>
>> netdev->priv_flags |= IFF_UNICAST_FLT;
>> - netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT;
>> + netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
>> + NETDEV_XDP_ACT_XSK_ZEROCOPY;
>>
>> /* MTU range: 68 - 9216 */
>> netdev->min_mtu = ETH_MIN_MTU;
>> @@ -4900,15 +4904,20 @@ void igb_clean_tx_ring(struct igb_ring *tx_ring)
>> {
>> u16 i = tx_ring->next_to_clean;
>> struct igb_tx_buffer *tx_buffer = &tx_ring->tx_buffer_info[i];
>> + u32 xsk_frames = 0;
>>
>> while (i != tx_ring->next_to_use) {
>> union e1000_adv_tx_desc *eop_desc, *tx_desc;
>>
>> /* Free all the Tx ring sk_buffs or xdp frames */
>> - if (tx_buffer->type == IGB_TYPE_SKB)
>> + if (tx_buffer->type == IGB_TYPE_SKB) {
>> dev_kfree_skb_any(tx_buffer->skb);
>> - else
>> + } else if (tx_buffer->type == IGB_TYPE_XDP) {
>> xdp_return_frame(tx_buffer->xdpf);
>> + } else if (tx_buffer->type == IGB_TYPE_XSK) {
>> + xsk_frames++;
>> + goto skip_for_xsk;
>> + }
>>
>> /* unmap skb header data */
>> dma_unmap_single(tx_ring->dev,
>> @@ -4939,6 +4948,7 @@ void igb_clean_tx_ring(struct igb_ring *tx_ring)
>> DMA_TO_DEVICE);
>> }
>>
>> +skip_for_xsk:
>> tx_buffer->next_to_watch = NULL;
>>
>> /* move us one more past the eop_desc for start of next pkt */
>> @@ -4953,6 +4963,9 @@ void igb_clean_tx_ring(struct igb_ring *tx_ring)
>> /* reset BQL for queue */
>> netdev_tx_reset_queue(txring_txq(tx_ring));
>>
>> + if (tx_ring->xsk_pool && xsk_frames)
>> + xsk_tx_completed(tx_ring->xsk_pool, xsk_frames);
>> +
>> /* reset next_to_use and next_to_clean */
>> tx_ring->next_to_use = 0;
>> tx_ring->next_to_clean = 0;
>> @@ -6486,6 +6499,9 @@ netdev_tx_t igb_xmit_frame_ring(struct sk_buff *skb,
>> return NETDEV_TX_BUSY;
>> }
>>
>> + if (unlikely(test_bit(IGB_RING_FLAG_TX_DISABLED, &tx_ring->flags)))
>> + return NETDEV_TX_BUSY;
>> +
>> /* record the location of the first descriptor for this packet */
>> first = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
>> first->type = IGB_TYPE_SKB;
>> @@ -8260,13 +8276,18 @@ static int igb_poll(struct napi_struct *napi, int budget)
>> **/
>> static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
>> {
>> - struct igb_adapter *adapter = q_vector->adapter;
>> - struct igb_ring *tx_ring = q_vector->tx.ring;
>> - struct igb_tx_buffer *tx_buffer;
>> - union e1000_adv_tx_desc *tx_desc;
>> unsigned int total_bytes = 0, total_packets = 0;
>> + struct igb_adapter *adapter = q_vector->adapter;
>> unsigned int budget = q_vector->tx.work_limit;
>> + struct igb_ring *tx_ring = q_vector->tx.ring;
>> unsigned int i = tx_ring->next_to_clean;
>> + union e1000_adv_tx_desc *tx_desc;
>> + struct igb_tx_buffer *tx_buffer;
>> + struct xsk_buff_pool *xsk_pool;
>> + int cpu = smp_processor_id();
>> + bool xsk_xmit_done = true;
>> + struct netdev_queue *nq;
>> + u32 xsk_frames = 0;
>>
>> if (test_bit(__IGB_DOWN, &adapter->state))
>> return true;
>> @@ -8297,10 +8318,14 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
>> total_packets += tx_buffer->gso_segs;
>>
>> /* free the skb */
>> - if (tx_buffer->type == IGB_TYPE_SKB)
>> + if (tx_buffer->type == IGB_TYPE_SKB) {
>> napi_consume_skb(tx_buffer->skb, napi_budget);
>> - else
>> + } else if (tx_buffer->type == IGB_TYPE_XDP) {
>> xdp_return_frame(tx_buffer->xdpf);
>> + } else if (tx_buffer->type == IGB_TYPE_XSK) {
>> + xsk_frames++;
>> + goto skip_for_xsk;
>> + }
>>
>> /* unmap skb header data */
>> dma_unmap_single(tx_ring->dev,
>> @@ -8332,6 +8357,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
>> }
>> }
>>
>> +skip_for_xsk:
>> /* move us one more past the eop_desc for start of next pkt */
>> tx_buffer++;
>> tx_desc++;
>> @@ -8360,6 +8386,21 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
>> q_vector->tx.total_bytes += total_bytes;
>> q_vector->tx.total_packets += total_packets;
>>
>> + xsk_pool = READ_ONCE(tx_ring->xsk_pool);
>> + if (xsk_pool) {
>> + if (xsk_frames)
>> + xsk_tx_completed(xsk_pool, xsk_frames);
>> + if (xsk_uses_need_wakeup(xsk_pool))
>> + xsk_set_tx_need_wakeup(xsk_pool);
>> +
>> + nq = txring_txq(tx_ring);
>> + __netif_tx_lock(nq, cpu);
>> + /* Avoid transmit queue timeout since we share it with the slow path */
>> + txq_trans_cond_update(nq);
>> + xsk_xmit_done = igb_xmit_zc(tx_ring);
>> + __netif_tx_unlock(nq);
>> + }
>> +
>> if (test_bit(IGB_RING_FLAG_TX_DETECT_HANG, &tx_ring->flags)) {
>> struct e1000_hw *hw = &adapter->hw;
>>
>> @@ -8422,7 +8463,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
>> }
>> }
>>
>> - return !!budget;
>> + return !!budget && xsk_xmit_done;
>> }
>>
>> /**
>> diff --git a/drivers/net/ethernet/intel/igb/igb_xsk.c b/drivers/net/ethernet/intel/igb/igb_xsk.c
>> index 22d234db0fab..d962c5e22b71 100644
>> --- a/drivers/net/ethernet/intel/igb/igb_xsk.c
>> +++ b/drivers/net/ethernet/intel/igb/igb_xsk.c
>> @@ -465,6 +465,70 @@ int igb_clean_rx_irq_zc(struct igb_q_vector *q_vector,
>> return failure ? budget : (int)total_packets;
>> }
>>
>> +bool igb_xmit_zc(struct igb_ring *tx_ring)
>> +{
>> + unsigned int budget = igb_desc_unused(tx_ring);
>> + struct xsk_buff_pool *pool = tx_ring->xsk_pool;
>
> Ughh that's another read of pool ptr, you should have passed it by arg to
> igb_xmit_zc()...
Oops, missed that somehow. Thanks!
Thanks,
Kurt
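The follow-up Maciej asks for would presumably look along these lines — a
minimal sketch only, not necessarily the actual next revision — passing the
pool pointer that igb_clean_tx_irq() already reads with READ_ONCE() instead of
dereferencing tx_ring->xsk_pool again inside igb_xmit_zc():

	/* igb_xsk.c: take the pool as a parameter instead of re-reading it */
	bool igb_xmit_zc(struct igb_ring *tx_ring, struct xsk_buff_pool *pool)
	{
		unsigned int budget = igb_desc_unused(tx_ring);
		struct xdp_desc *descs = pool->tx_descs;

		/* ... remainder of the function unchanged from the patch ... */
	}

	/* igb_main.c, igb_clean_tx_irq(): reuse the pointer read earlier */
	xsk_xmit_done = igb_xmit_zc(tx_ring, xsk_pool);

The prototype in igb.h would be updated to match.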
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread [~2024-10-15 17:16 UTC]
Thread overview: 11+ messages
2024-10-11 9:00 [PATCH iwl-next v8 0/6] igb: Add support for AF_XDP zero-copy Kurt Kanzenbach
2024-10-11 9:00 ` [PATCH iwl-next v8 1/6] igb: Remove static qualifiers Kurt Kanzenbach
2024-10-11 9:01 ` [PATCH iwl-next v8 2/6] igb: Introduce igb_xdp_is_enabled() Kurt Kanzenbach
2024-10-11 9:01 ` [PATCH iwl-next v8 3/6] igb: Introduce XSK data structures and helpers Kurt Kanzenbach
2024-10-11 9:01 ` [PATCH iwl-next v8 4/6] igb: Add XDP finalize and stats update functions Kurt Kanzenbach
2024-10-15 12:05 ` Maciej Fijalkowski
2024-10-11 9:01 ` [PATCH iwl-next v8 5/6] igb: Add AF_XDP zero-copy Rx support Kurt Kanzenbach
2024-10-15 12:15 ` Maciej Fijalkowski
2024-10-11 9:01 ` [PATCH iwl-next v8 6/6] igb: Add AF_XDP zero-copy Tx support Kurt Kanzenbach
2024-10-15 13:28 ` Maciej Fijalkowski
2024-10-15 17:16 ` Kurt Kanzenbach